replication: send raft terms in applier heartbeats
There was a bug that an instance could ack a transaction from an old Raft term thus allowing the old leader to CONFIRM it, even if that first instance knew there is a newer Raft term going on. As a result, the old leader could write CONFIRM even if there is already a new leader elected and the synchro quorum was > half. That led to split-brain, when bad txn reached the new leader, and PROMOTE reached the old leader. Split-brain here is totally unnecessary. If the quorum is correct, synchro timeout is infinite, and there is no async transactions, then split-brain shouldn't ever happen. The fix is as simple as attach the current Raft term number to applier heartbeats. In the testcase above if terms are attached, the old leader gets ACK + new term. That causes the old leader freeze even if the pending txn got quorum. The old leader can't CONFIRM nor ROLLBACK its pending txns until a new leader is elected. Freeze is guaranteed, because if a new leader was elected, then it had got votes from > half cluster. It means > half nodes have the new term. That in turn means the old leader during collecting ACKs for its "new" txn will get the new term number from at least one replica. When the new leader finished writing PROMOTE, it either confirms or rolls back the txn of the old leader (depending on whether it has reached the new leader before promotion). Neither result causes split brain. The rollback only causes a non-critical error on the old leader raised by the bad txn's commit attempt. There were some alternatives considered. One of the most promising ones was to make instances reject txns if they see these txns coming from an instance having an old Raft term. It would help in the test provided above. But wouldn't do in a more complicated test, when there is a third node which gets the bad transaction, then gets local term bumped, and then replicates to any other instance. Others would accept that bad txn, because the sender has a newer Raft term, even though the txn author is still in the old term. Tracking terms of txn author is not possible in too many cases so as to rely on that. Closes #7253 @TarantoolBot document Title: New iproto field in applier -> relay ACKs The applier->relay channel (from replica back to master) is used only for sending ACKs. Replication data goes the other way (relay->applier). These ACKs had 2 fields: `IPROTO_VCLOCK (0x26)` and `IPROTO_VCLOCK_SYNC (0x5a)`. Now they have a new field: `IPROTO_TERM (0x53)`. It is a unsigned number containing `box.info.election.term` of the sender node (applier, replica).
Showing
- changelogs/unreleased/gh-7253-split-brain-old-term-txn.md 5 additions, 0 deletionschangelogs/unreleased/gh-7253-split-brain-old-term-txn.md
- src/box/applier.cc 2 additions, 0 deletionssrc/box/applier.cc
- src/box/applier.h 2 additions, 0 deletionssrc/box/applier.h
- src/box/relay.cc 12 additions, 0 deletionssrc/box/relay.cc
- src/box/xrow.c 13 additions, 1 deletionsrc/box/xrow.c
- src/box/xrow.h 2 additions, 0 deletionssrc/box/xrow.h
- test/luatest_helpers/server.lua 35 additions, 16 deletionstest/luatest_helpers/server.lua
- test/replication-luatest/gh_5295_split_brain_test.lua 4 additions, 1 deletiontest/replication-luatest/gh_5295_split_brain_test.lua
- test/replication-luatest/gh_6036_qsync_order_test.lua 7 additions, 0 deletionstest/replication-luatest/gh_6036_qsync_order_test.lua
- test/replication-luatest/gh_7253_election_long_wal_write_test.lua 274 additions, 0 deletions...lication-luatest/gh_7253_election_long_wal_write_test.lua
- test/replication/gh-5195-qsync-replica-write.result 11 additions, 36 deletionstest/replication/gh-5195-qsync-replica-write.result
- test/replication/gh-5195-qsync-replica-write.test.lua 9 additions, 18 deletionstest/replication/gh-5195-qsync-replica-write.test.lua
- test/replication/gh-5213-qsync-applier-order-3.result 7 additions, 0 deletionstest/replication/gh-5213-qsync-applier-order-3.result
- test/replication/gh-5213-qsync-applier-order-3.test.lua 3 additions, 0 deletionstest/replication/gh-5213-qsync-applier-order-3.test.lua
Loading
Please register or sign in to comment