Files · f1a507b01afdca733f9a3d0a851a9096b995e825 · core / tarantool

replication: retry in case of XlogGapError

Vladislav Shpilevoy authored 4 years ago

Previously XlogGapError was considered a critical error stopping
the replication. That may be not so good as it looks.

XlogGapError is a perfectly fine error, which should not kill the
replication connection. It should be retried instead.

Because here is an example, when the gap can be recovered on its
own. Consider the case: node1 is a leader, it is booted with
vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
also gets vclock {1: 3}. Then node1 writes something and its
vclock becomes {1: 4}. Now node3 boots from node1, and gets the
same vclock. Vclocks now look like this:

  - node1: {1: 4}, leader, has {1: 3} snap.
  - node2: {1: 3}, booted from node1, has only snap.
  - node3: {1: 4}, booted from node1, has only snap.

If the cluster is a fullmesh, node2 will send subscribe requests
with vclock {1: 3}. If node3 receives it, it will respond with
xlog gap error, because it only has a snap with {1: 4}, nothing
else. In that case node2 should retry connecting to node3, and in
the meantime try to get newer changes from node1.

The example is totally valid. However it is unreachable now
because master registers all replicas in _cluster before allowing
them to make a join. So they all bootstrap from a snapshot
containing all their IDs. This is a bug, because such
auto-registration leads to registration of anonymous replicas, if
they are present during bootstrap. Also it blocks Raft, which
can't work if there are registered, but not yet joined nodes.

Once the registration problem will be solved in a next commit, the
XlogGapError will strike quite often during bootstrap. This patch
won't allow that happen.

Needed for #5287

f1a507b0

History

f1a507b0 4 years ago

History

Name	Last commit	Last update