replication: introduce orphan mode
This patch modifies the replication configuration procedure so as to fully conform to the specification presented in #2958. In a nutshell, box.cfg() now tries to synchronize all connected replicas before returning. If it fails to connect enough replicas to form a quorum, it leaves the server in a degraded 'orphan' mode, which is basically read-only. More details below.

First of all, it's worth mentioning that we already have an 'orphan' status in Tarantool (between 'loading' and 'hot_standby'), but it has nothing to do with replication. Actually, it's unclear why it was introduced in the first place, so we agreed to silently drop it.

We assume that a replica is synchronized if its lag is not greater than the value of the new configuration option box.cfg.replication_sync_lag. Otherwise a replica is considered to be syncing and has "sync" status. If replication_sync_lag is unset (nil) or set to TIMEOUT_INFINITY, a replica skips the "sync" state and switches to "follow" immediately. The default value of replication_sync_lag is 10 seconds, but it is ignored (assumed to be infinite) if the master is running a Tarantool version older than 1.7.7, which does not send heartbeat messages.

If box.cfg() is called for the very first time (bootstrap) for a given instance, then:

1. It tries to connect to all configured replicas for as long as it takes (replication_timeout isn't taken into account). If it fails to connect to at least one replica, bootstrap is aborted.
2. If this is a cluster bootstrap and the current instance turns out to be the new cluster leader, it performs local bootstrap, switches to 'running' state, and leaves box.cfg() immediately.
3. Otherwise (i.e. if this is bootstrap of a slave replica), it bootstraps from a remote master and then stays in 'orphan' state until it has synchronized with all replicas, after which it switches to 'running' state and leaves box.cfg().

If box.cfg() is called after bootstrap, in order to recover from local storage, then:

1. It recovers the last snapshot and the xlogs stored in the local directory.
2. It then switches to 'orphan' mode and tries to connect to at least as many replicas as specified by box.cfg.replication_connect_quorum, for a time period that is a multiple (4x) of box.cfg.replication_timeout. If it fails, it doesn't abort, but leaves box.cfg() in 'orphan' mode; the state will switch to 'running' asynchronously as soon as the instance has synced with replication_connect_quorum replicas.
3. If it managed to connect to enough replicas to form a quorum at step 2, it synchronizes with them: box.cfg() doesn't return until at least replication_connect_quorum replicas have been synchronized.

If box.cfg() is called after recovery to reconfigure replication, it tries to connect to all specified replicas within a time period that is a multiple (4x) of box.cfg.replication_timeout. The value of box.cfg.replication_connect_quorum isn't taken into account, and neither is box.cfg.replication_sync_lag: box.cfg() returns as soon as all configured replicas have been connected.

Just like any other status, the new one is reflected by box.info.status.

Suggested by @kostja

Follow-up #2958
Closes #999
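The behaviour described above might be exercised with a configuration like the following. This is a minimal illustrative sketch, not taken from the patch: the listen port, replica URIs, and credentials are placeholders, and the option values merely echo the defaults and rules discussed in this message.

```lua
-- Hypothetical replica configuration; URIs and credentials are
-- placeholders.
box.cfg{
    listen = 3301,
    replication = {
        'replicator:password@host1:3301',
        'replicator:password@host2:3301',
        'replicator:password@host3:3301',
    },
    -- A replica lagging by more than this many seconds is considered
    -- to be syncing ("sync" status); nil or TIMEOUT_INFINITY would
    -- skip the "sync" state entirely. 10 is the default.
    replication_sync_lag = 10,
    -- On local recovery, box.cfg() waits for this many connected
    -- replicas (for roughly 4 * replication_timeout) before giving up
    -- and returning in 'orphan' mode.
    replication_connect_quorum = 2,
    replication_timeout = 1,
}

-- Reports 'orphan' until the instance has synced with the quorum,
-- then 'running'.
box.info.status
```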
- src/box/applier.cc: 23 additions, 3 deletions
- src/box/applier.h: 4 additions, 3 deletions
- src/box/box.cc: 61 additions, 22 deletions
- src/box/box.h: 8 additions, 0 deletions
- src/box/lua/load_cfg.lua: 3 additions, 1 deletion
- src/box/replication.cc: 124 additions, 21 deletions
- src/box/replication.h: 74 additions, 8 deletions
- src/cfg.c: 10 additions, 0 deletions
- src/cfg.h: 3 additions, 0 deletions
- test/app-tap/init_script.result: 22 additions, 21 deletions
- test/box/admin.result: 2 additions, 0 deletions
- test/box/cfg.result: 4 additions, 0 deletions
- test/replication/quorum.lua: 16 additions, 5 deletions
- test/replication/quorum.result: 67 additions, 68 deletions
- test/replication/quorum.test.lua: 48 additions, 52 deletions
- test/replication/quorum1.lua: 1 addition, 0 deletions
- test/replication/quorum2.lua: 1 addition, 0 deletions
- test/replication/quorum3.lua: 1 addition, 0 deletions
- test/replication/suite.ini: 1 addition, 1 deletion