raft: leader fencing on quorum loss
Losing quorum did not change RAFT leader behaviour. This made a split-brain situation easy to reach when the current leader got disconnected from the rest of the replicaset: the old leader remained RW while a new leader could be elected in the rest of the replicaset.

Now, when the current leader loses its connection to a replica, it checks that it still has at least a quorum of alive connections to replicas. A connection is alive when both the relay and the applier for that replica are connected. If fewer than a quorum of alive connections remain, the leader resigns its role and becomes a follower without starting a new term. This makes the old leader RO. The old leader won't start new elections because of pre-vote, added in 421a0968.

The fenced leader "freezes" its limbo and won't CONFIRM or ROLLBACK any synchronous transactions until it regains leadership. If another leader is elected while the old one is fenced, the "frozen" transactions are rolled back once the old leader reconnects to the new one.

When a quorum of connections is regained and no new leader has been elected, the old leader or another node starts elections and a new leader is elected.

Closes #6661

@TarantoolBot document
Title: RAFT leader fencing

The RAFT documentation must state that, when fencing is on, the RAFT leader resigns its leadership if it has fewer than `replication_synchro_quorum` alive connections to replicas. An alive connection is one with both the relay and the applier connected. This applies to election_mode 'candidate' or 'manual'. Resigning leadership means becoming a follower in the current RAFT term, so the resigned leader becomes read-only.

Introduce a new `box.cfg` option, `election_fencing_enabled`. If set to `true`, fencing is on (the default behaviour). If set to `false`, fencing is off and the leader doesn't resign its leadership when it loses quorum. If fencing is enabled on the current leader while it doesn't have a quorum of alive connections, the leader immediately resigns its leadership.
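The fencing rule described above can be sketched in C roughly as follows. This is a minimal illustrative model, not the code from src/box/raft.c or src/box/txn_limbo.c.

- Every struct and function name here (`struct node`, `struct peer_conn`, `node_check_fencing()`, and so on) is hypothetical.
- Whether the leader counts itself toward the quorum is an assumption; this sketch counts it.
- Pre-vote and the rollback of frozen transactions after reconnecting to a new leader are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum node_state { NODE_FOLLOWER, NODE_CANDIDATE, NODE_LEADER };

/* Hypothetical peer link: alive only if relay and applier are both up. */
struct peer_conn {
	bool relay_connected;
	bool applier_connected;
};

/* Hypothetical node state; this is not Tarantool's struct raft. */
struct node {
	enum node_state state;
	uint64_t term;
	bool fencing_enabled;	/* plays the role of election_fencing_enabled */
	bool limbo_frozen;	/* no CONFIRM/ROLLBACK while set */
	int synchro_quorum;	/* plays the role of replication_synchro_quorum */
};

static bool
peer_is_alive(const struct peer_conn *c)
{
	return c->relay_connected && c->applier_connected;
}

/* ASSUMPTION: the node counts itself as one member of the quorum. */
static bool
node_has_quorum(const struct node *n, const struct peer_conn *peers,
		size_t count)
{
	assert(n->synchro_quorum > 0);
	int alive = 1;
	for (size_t i = 0; i < count; i++) {
		if (peer_is_alive(&peers[i]))
			alive++;
	}
	return alive >= n->synchro_quorum;
}

/* Run whenever a peer connection drops. */
static void
node_check_fencing(struct node *n, const struct peer_conn *peers,
		   size_t count)
{
	if (n->state != NODE_LEADER || !n->fencing_enabled)
		return;
	if (node_has_quorum(n, peers, count))
		return;
	/* Resign in the current term: the term is deliberately not bumped. */
	n->state = NODE_FOLLOWER;
	/* Freeze the limbo until leadership is regained. */
	n->limbo_frozen = true;
}

/* Changing the option; turning fencing on re-checks the quorum at once. */
static void
node_set_fencing(struct node *n, bool enabled, const struct peer_conn *peers,
		 size_t count)
{
	n->fencing_enabled = enabled;
	node_check_fencing(n, peers, count);
}

/* Winning an election in some later term unfreezes the limbo. */
static void
node_become_leader(struct node *n, uint64_t term)
{
	n->state = NODE_LEADER;
	n->term = term;
	n->limbo_frozen = false;
}

/* Only an unfenced leader may CONFIRM or ROLLBACK synchronous txns. */
static bool
node_may_finalize_synchro(const struct node *n)
{
	return n->state == NODE_LEADER && !n->limbo_frozen;
}
```

Consider a quorum of 2 with two peers. If the leader loses the applier of one peer and the relay of the other, only the leader itself is still alive. `node_check_fencing()` then demotes it to follower in the same term and freezes its limbo. Keeping the term unchanged is what distinguishes resigning from starting a new election.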
Changed files:
- changelogs/unreleased/elections-leader-fencing.md: 5 additions, 0 deletions
- src/box/box.cc: 36 additions, 0 deletions
- src/box/box.h: 1 addition, 0 deletions
- src/box/lua/cfg.cc: 9 additions, 0 deletions
- src/box/lua/load_cfg.lua: 5 additions, 0 deletions
- src/box/raft.c: 68 additions, 9 deletions
- src/box/raft.h: 15 additions, 0 deletions
- src/box/txn_limbo.c: 28 additions, 5 deletions
- src/box/txn_limbo.h: 20 additions, 0 deletions
- src/lib/raft/raft.c: 8 additions, 0 deletions
- src/lib/raft/raft.h: 7 additions, 0 deletions
- test/app-tap/init_script.result: 1 addition, 0 deletions
- test/box/admin.result: 2 additions, 0 deletions
- test/box/cfg.result: 4 additions, 0 deletions
- test/replication-luatest/election_fencing_test.lua: 291 additions, 0 deletions
- test/replication/election_replica.lua: 2 additions, 0 deletions
- test/unit/raft.c: 40 additions, 1 deletion
- test/unit/raft.result: 7 additions, 1 deletion
- test/unit/raft_test_utils.c: 8 additions, 0 deletions
- test/unit/raft_test_utils.h: 4 additions, 0 deletions