-
Serge Petrenko authored
Our split-brain detection machinery relies among other things on all nodes tracking the synchro queue confirmed lsn. This tracking was only added together with the split-brain detection. Only the synchro queue owner tracked the confirmed lsn before. This means that after an upgrade all the replicas remember the latest confirmed lsn as 0, and any PROMOTE/DEMOTE request from the queue owner is treated as a split brain. Let's fix this and only enable split-brain detection on the replica set once the schema version is updated. Thanks to the synchro queue freeze on restart, this can only happen after a new PROMOTE or DEMOTE entry is written by one of the nodes, and thus the correct confirmed lsn is propagated with this PROMOTE/DEMOTE to all the cluster members. Closes #8996 NO_DOC=bugfix (cherry picked from commit a844bd37)
Serge Petrenko authoredOur split-brain detection machinery relies among other things on all nodes tracking the synchro queue confirmed lsn. This tracking was only added together with the split-brain detection. Only the synchro queue owner tracked the confirmed lsn before. This means that after an upgrade all the replicas remember the latest confirmed lsn as 0, and any PROMOTE/DEMOTE request from the queue owner is treated as a split brain. Let's fix this and only enable split-brain detection on the replica set once the schema version is updated. Thanks to the synchro queue freeze on restart, this can only happen after a new PROMOTE or DEMOTE entry is written by one of the nodes, and thus the correct confirmed lsn is propagated with this PROMOTE/DEMOTE to all the cluster members. Closes #8996 NO_DOC=bugfix (cherry picked from commit a844bd37)
gh-8996-spurious-spit-brain-detected.md 158 B
bugfix/replication
- Fixed a false-positive split-brain in a replica set on the first promotion after an upgrade from versions before 2.10.1 (gh-8996).