replication: disallow bootstrap of read-only masters
In a configuration with several read-only and read-write instances, if replication_connect_quorum is not greater than the amount of read-only instances and replication_connect_timeout happens to be small enough for some read-only instances to form a quorum and exceed the timeout before any of the read-write instaces start, all these read-only instances will choose themselves a read-only bootstrap leader. This 'leader' will successfully bootstrap itself, but will fail to register any of the other instances in _cluster table, since it isn't writeable. As a result, some of the read-only instances will just die unable to bootstrap from a read-only bootstrap leader, and when the read-write instances are finally up, they'll see a single read-only instance which managed to bootstrap itself and now gets a REPLICASET_UUID_MISMATCH error, since no read-write instance will choose it as bootstrap leader, and will rather bootstrap from one of its read-write mates. The described situation is clearly not what user has hoped for, so throw an error, when a read-only instance tries to initiate the bootstrap. The error will give the user a cue that he should increase replication_connect_timeout. Closes #4321 @TarantoolBot document Title: replication: forbid to bootstrap read-only masters. It is no longer possible to bootstrap a read-only instance in an emply data directory as a master. You will see the following error trying to do so: ``` ER_BOOTSTRAP_READONLY: Trying to bootstrap a local read-only instance as master ``` Now if you have a fresh instance, which has `read_only=true` in an initial `box.cfg` call, you need to set up replication from an instance which is either read-write, or has your local instance's uuid in its `_cluster` table. In case you have multiple read-only and read-write instances with replication set up, and you still see the aforementioned error message, this means that none of your read-write instances managed to start listening on their port before read_only instances have exceeded the `replication_connect_timeout`. In this case you should raise `replication_connect_timeout` to a greater value.
Showing
- src/box/box.cc 4 additions, 0 deletionssrc/box/box.cc
- src/box/errcode.h 1 addition, 0 deletionssrc/box/errcode.h
- test/app-tap/cfg.test.lua 10 additions, 8 deletionstest/app-tap/cfg.test.lua
- test/box-tap/cfg.test.lua 10 additions, 1 deletiontest/box-tap/cfg.test.lua
- test/box/misc.result 1 addition, 0 deletionstest/box/misc.result
Loading
Please register or sign in to comment