txn_limbo: filter incoming synchro requests
When we receive synchro requests we can't just apply them blindly, because in the worst case they may come from a split-brain configuration (where a cluster has split into several clusters, each of them has elected its own leader, and then the clusters try to merge back into the original one). We need to do our best to detect such disunity and force these nodes to rejoin from scratch for the sake of data consistency. Thus, when we're processing requests, we pass them to a packet filter first, which validates their contents and refuses to apply them if they violate consistency.

Depending on the request type each packet traverses an appropriate chain (illustrative sketches of the chains follow below):

filter_generic(): a common chain for any synchro packet.
1) request:replica_id = 0 is allowed for PROMOTE requests only.
2) request:replica_id should match limbo:owner_id, in other words the limbo migration should be noticed by all instances in the cluster.

filter_confirm_rollback(): a chain for CONFIRM | ROLLBACK packets.
1) A zero LSN is disallowed for such requests.

filter_promote_demote(): a chain for PROMOTE | DEMOTE packets.
1) The requests should come in with a nonzero term, otherwise the packet is corrupted.
2) The request's term should not be less than the maximal known one, in other words it should not come from nodes which didn't notice the raft epoch changes and are living in the past.

filter_queue_boundaries(): a common finalization chain.
1) If the request's LSN matches the current confirmed LSN, the packet is obviously correct to process.
2) If the request's LSN is less than the confirmed LSN, the request is wrong: we have processed the requested LSN already.
3) If the request's LSN is greater than the confirmed LSN, then:
   a) if the limbo is empty we can't do anything, since the data is already processed, and we should issue an error;
   b) if there is some data in the limbo, the requested LSN should be within the range of the limbo's [first; last] LSNs, so that the request is able to commit or roll back the limbo queue.

Note that the filtration is disabled during the initial configuration, where we apply requests from the only source of truth (either the remote master or our own journal), so no split-brain is possible.

In order to make the split-brain checks work, the applier nopify filter now passes synchro requests from an obsolete term without nopifying them. Also, ANY asynchronous request coming from an instance with an obsolete term is now treated as a split-brain. Think of it as a synchronous request committed with a malformed quorum.
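To make the chains above concrete, here is a minimal compilable sketch of filter_generic(). Everything in it — the enum, the struct layouts, the field names, and the 0/-1 return convention — is an assumption made for illustration, not the actual definitions from src/box/txn_limbo.c.

```c
#include <stdint.h>

/*
 * Hypothetical shapes of a synchro request and the limbo state,
 * for illustration only (the real ones live in src/box/).
 */
enum synchro_type {
	SYNCHRO_CONFIRM,
	SYNCHRO_ROLLBACK,
	SYNCHRO_PROMOTE,
	SYNCHRO_DEMOTE,
};

struct synchro_request {
	enum synchro_type type;
	uint32_t replica_id;	/* the limbo owner the request refers to */
	uint64_t term;		/* raft term the request was issued in */
	int64_t lsn;		/* LSN the request confirms / rolls back up to */
};

struct txn_limbo {
	uint32_t owner_id;	/* limbo owner as this instance knows it */
	uint64_t max_term;	/* maximal raft term seen so far */
	int64_t confirmed_lsn;	/* everything up to here is committed */
	int64_t first_lsn;	/* oldest queued entry, 0 when empty */
	int64_t last_lsn;	/* newest queued entry, 0 when empty */
};

/* Common chain for any synchro packet: 0 = pass, -1 = reject. */
static int
filter_generic(const struct txn_limbo *limbo,
	       const struct synchro_request *req)
{
	/* replica_id == 0 is meaningful for PROMOTE only. */
	if (req->replica_id == 0)
		return req->type == SYNCHRO_PROMOTE ? 0 : -1;
	/*
	 * Otherwise the sender must agree with us on who owns the
	 * limbo, i.e. every limbo migration was noticed by everyone.
	 */
	return req->replica_id == limbo->owner_id ? 0 : -1;
}
```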
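Reusing the hypothetical types from the sketch above, the CONFIRM | ROLLBACK and PROMOTE | DEMOTE chains reduce to a few comparisons (again an illustrative sketch, not the patch's code):

```c
/* Chain for CONFIRM | ROLLBACK packets: a zero LSN is disallowed. */
static int
filter_confirm_rollback(const struct synchro_request *req)
{
	return req->lsn != 0 ? 0 : -1;
}

/* Chain for PROMOTE | DEMOTE packets. */
static int
filter_promote_demote(const struct txn_limbo *limbo,
		      const struct synchro_request *req)
{
	/* A zero term means the packet is corrupted. */
	if (req->term == 0)
		return -1;
	/*
	 * A term below the maximal known one comes from a node
	 * which missed raft epoch changes and lives in the past.
	 */
	if (req->term < limbo->max_term)
		return -1;
	return 0;
}
```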
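The finalization chain is the LSN boundary check described above, sketched over the same assumed types (using first_lsn == 0 to stand for an empty limbo is also an assumption of this sketch):

```c
/* Common finalization chain: can the request's LSN be applied? */
static int
filter_queue_boundaries(const struct txn_limbo *limbo,
			const struct synchro_request *req)
{
	int64_t lsn = req->lsn;
	/* 1) Exactly the confirmed LSN: obviously fine. */
	if (lsn == limbo->confirmed_lsn)
		return 0;
	/* 2) Below the confirmed LSN: processed already. */
	if (lsn < limbo->confirmed_lsn)
		return -1;
	/* 3a) Above it, but the limbo is empty: nothing left to act on. */
	if (limbo->first_lsn == 0)
		return -1;
	/* 3b) Must fall into [first; last] to commit or roll back the queue. */
	if (lsn < limbo->first_lsn || lsn > limbo->last_lsn)
		return -1;
	return 0;
}
```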
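The last note can be sketched the same way: an ordinary asynchronous row from a sender with an obsolete term is refused. Only the ER_SPLIT_BRAIN name comes from the patch; the helper below is hypothetical:

```c
/*
 * Hypothetical applier-side check: an asynchronous row from a
 * sender with an obsolete term is refused as a split-brain, as
 * if it were a synchronous request committed with a malformed
 * quorum. In the real code the rejection would surface as the
 * ER_SPLIT_BRAIN error.
 */
static int
check_async_row(const struct txn_limbo *limbo, uint64_t sender_term)
{
	if (sender_term < limbo->max_term)
		return -1;	/* reject with ER_SPLIT_BRAIN */
	return 0;
}
```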
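A tiny usage example wiring the sketched chains together in the order a packet would traverse them:

```c
#include <assert.h>

int
main(void)
{
	struct txn_limbo limbo = {
		.owner_id = 1, .max_term = 3,
		.confirmed_lsn = 100, .first_lsn = 101, .last_lsn = 110,
	};
	struct synchro_request confirm = {
		.type = SYNCHRO_CONFIRM, .replica_id = 1,
		.term = 3, .lsn = 105,
	};
	/* A well-formed CONFIRM inside the queue boundaries passes. */
	assert(filter_generic(&limbo, &confirm) == 0);
	assert(filter_confirm_rollback(&confirm) == 0);
	assert(filter_queue_boundaries(&limbo, &confirm) == 0);

	/* A CONFIRM below the confirmed LSN must be refused. */
	confirm.lsn = 99;
	assert(filter_queue_boundaries(&limbo, &confirm) == -1);
	return 0;
}
```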
Closes #5295

NO_DOC=it's literally below

Co-authored-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

@TarantoolBot document
Title: new error type: ER_SPLIT_BRAIN

If for some reason the cluster had two leaders working independently (for example, the user has mistakenly lowered the quorum below N / 2 + 1), then once such leaders and their followers try connecting to each other, they will receive the ER_SPLIT_BRAIN error and the connection will be aborted. This is done to preserve data integrity. Once the user notices such an error, he or she has to manually inspect the data on both split halves, choose a way to restore the data, and rebootstrap one of the halves from the other.
Showing 9 changed files with 583 additions and 63 deletions
- changelogs/unreleased/gh-5295-split-brain-detection.md: 5 additions, 0 deletions
- src/box/applier.cc: 18 additions, 14 deletions
- src/box/box.cc: 7 additions, 0 deletions
- src/box/errcode.h: 1 addition, 0 deletions
- src/box/txn_limbo.c: 295 additions, 46 deletions
- src/box/txn_limbo.h: 15 additions, 2 deletions
- test/box/error.result: 1 addition, 0 deletions
- test/replication-luatest/gh_5295_split_brain_test.lua: 240 additions, 0 deletions
- test/replication-luatest/suite.ini: 1 addition, 1 deletion