Commit 634f59c7 authored by Serge Petrenko's avatar Serge Petrenko Committed by Vladimir Davydov

recovery: panic in case of recovery and replicaset vclock mismatch

We assume that no one touches the instance's WALs once it has taken
wal_dir_lock. This is not the case when upgrading from an old setup
(running tarantool 1.7.3-6 or less). Such nodes either take a lock on
the snap dir, which may differ from the wal dir, or take no lock at
all.

So, it's possible that during upgrade an old node is not stopped
properly before a new node is started in the same data directory.

The old node might even write some extra data to WAL during new node's
startup.

This is obviously bad and leads to multiple issues. For example, the new
node might start local recovery, scan the WALs and set replicaset.vclock
to some value, say {1: 5}. While the node recovers the WALs, the old
node appends to them up to vclock {1: 10}.
The node then finishes local recovery with replicaset vclock {1: 5},
even though it has recovered data up to vclock {1: 10}.

The node will then use the now-outdated replicaset vclock to subscribe
to remote peers (breaking replication with duplicate-key errors) and to
initialize the WAL (producing new xlogs with duplicate LSNs). There may
be a number of other issues we just haven't stumbled upon.

Let's prevent situations like that by panicking as soon as we see that
the initially scanned vclock (replicaset vclock) differs from the
actually recovered vclock.

Closes #6709
parent dc19be40