Commit d4ce7447 authored 6 years ago by Vladimir Davydov
replication: fix rebootstrap crash in case master has replica's rows

During SUBSCRIBE the master sends only those rows originating from the
subscribed replica that aren't present on the replica. Such rows may
appear after a sudden power loss in case the replica doesn't issue
fdatasync() after each WAL write, which is the default behavior. This
means that a replica can write some rows to WAL, relay them to another
replica, then stop without syncing the WAL file. If this happens, we expect
the replica to read its own rows from other members of the cluster upon
restart. For more details see commit eae84efb ("replication: recover
missing local data from replica").

Obviously, this feature only makes sense for SUBSCRIBE. During JOIN
we must relay all rows. This is how it initially worked, but commit
adc28591 ("replication: do not delete relay on applier disconnect")
witlessly removed the corresponding check from relay_send_row() so that
now we don't send any rows originating from the joined replica:

  @@ -595,8 +630,7 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
           * it). In the latter case packet's LSN is less than or equal to
           * local master's LSN at the moment it received 'SUBSCRIBE' request.
           */
  -       if (relay->replica == NULL ||
  -           packet->replica_id != relay->replica->id ||
  +       if (packet->replica_id != relay->replica->id ||
              packet->lsn <= vclock_get(&relay->local_vclock_at_subscribe,
                                        packet->replica_id)) {
                  relay_send(relay, packet);

(relay->local_vclock_at_subscribe is initialized to 0 on JOIN)
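
To make the effect of the removed check concrete, here is a minimal,
self-contained sketch of the two variants of the condition. It is not
the real relay code: the toy_* types and functions are hypothetical
stand-ins, has_replica models relay->replica != NULL, and the vclock is
reduced to a plain array indexed by replica id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; only replica_id and lsn mirror the diff. */
enum { TOY_VCLOCK_MAX = 32 };

struct toy_row {
	uint32_t replica_id;	/* must be < TOY_VCLOCK_MAX */
	int64_t lsn;
};

struct toy_relay {
	/* false models relay->replica == NULL, i.e. the JOIN phase. */
	bool has_replica;
	uint32_t replica_id;
	/* Per-replica LSNs; left as all zeros on JOIN. */
	int64_t local_vclock_at_subscribe[TOY_VCLOCK_MAX];
};

/* Condition without the NULL check: rows of the peer are filtered. */
bool
toy_should_send_without_check(const struct toy_relay *relay,
			      const struct toy_row *row)
{
	return row->replica_id != relay->replica_id ||
	       row->lsn <= relay->local_vclock_at_subscribe[row->replica_id];
}

/* Condition with the NULL check: during JOIN every row is sent. */
bool
toy_should_send_with_check(const struct toy_relay *relay,
			   const struct toy_row *row)
{
	return !relay->has_replica ||
	       toy_should_send_without_check(relay, row);
}
```

With a zeroed local_vclock_at_subscribe, a row {replica_id = 2, lsn = 5}
is dropped by the first variant for a relay serving replica 2, because
5 <= 0 is false. The second variant sends it as long as has_replica is
false.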

This only affects the case of rebootstrap, automatic or manual, because
when a new replica joins a cluster there can't be any rows on the master
originating from it. On manual rebootstrap, i.e. when the replica files
are deleted by the user and the replica is restarted from an empty
directory with the same UUID (set via box.cfg.instance_uuid), this isn't
critical - the replica will still receive those rows it should have
received during JOIN once it subscribes. However, in case of automatic
rebootstrap this can result in broken order of xlog/snap files, because
the replica directory still contains old xlog/snap files created before
rebootstrap. The rebootstrap logic expects them to have strictly smaller
vclocks than the new files, but if JOIN stops prematurely, this condition
may not hold, leading to a crash when the vclock of a new xlog/snap is
inserted into the corresponding xdir.
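
The ordering expectation can be illustrated with a small sketch. It
assumes that "strictly less" means a componentwise comparison of
per-replica LSNs, which is one plausible reading and not necessarily how
xdir compares vclocks. The toy_vclock type and the function name are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy vclock: one LSN per replica id. */
enum { TOY_VCLOCK_MAX = 4 };

struct toy_vclock {
	int64_t lsn[TOY_VCLOCK_MAX];
};

/*
 * Return true if every component of a is <= the matching component
 * of b and at least one component is strictly smaller.
 */
bool
toy_vclock_strictly_less(const struct toy_vclock *a,
			 const struct toy_vclock *b)
{
	bool some_less = false;
	for (int i = 0; i < TOY_VCLOCK_MAX; i++) {
		if (a->lsn[i] > b->lsn[i])
			return false;
		if (a->lsn[i] < b->lsn[i])
			some_less = true;
	}
	return some_less;
}
```

For example, if an old file has vclock {0, 10, 5, 0} and a JOIN that
skipped the rows of replica 2 produces a new file with {0, 10, 3, 0},
the old vclock is not strictly less than the new one, so the expected
order is violated.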

This patch fixes this issue by restoring pre-adc28591 behavior: now
we create a new relay for FINAL JOIN instead of reusing the one attached
to the joined replica so that relay_send_row() can detect JOIN phase and
relay all rows in this case. It also adds a comment so that we don't
make such a mistake in the future.

Apart from fixing the issue, this patch also fixes a relay leak in
relay_initial_join() in case engine_join_xc() fails, which was also
introduced by the above mentioned commit.

A note about the xlog/panic_on_broken_lsn test: now the relay status
isn't reported by box.info.replication if FINAL JOIN failed and the
replica never subscribed (this is how it worked before commit adc28591),
so we need to tweak the test a bit to handle this.

Closes #3740
parent e0017ad6
Showing with 218 additions and 21 deletions