Commit 037bd58c authored 5 years ago by Serge Petrenko Committed by Kirill Yukhin 5 years ago

replication: disallow bootstrap of read-only masters

Consider a configuration with several read-only and read-write
instances. If replication_connect_quorum is not greater than the
number of read-only instances, and replication_connect_timeout is
small enough that some read-only instances form a quorum and exceed
the timeout before any read-write instance starts, these read-only
instances will choose a read-only bootstrap leader from among
themselves.

This 'leader' will successfully bootstrap itself, but will fail to
register any other instance in the _cluster table, since it isn't
writable. As a result, some of the read-only instances will simply
die, unable to bootstrap from a read-only bootstrap leader. When the
read-write instances finally come up, none of them will choose the
bootstrapped read-only instance as bootstrap leader; they will
bootstrap from one of their read-write peers instead. The single
read-only instance that managed to bootstrap itself then gets a
REPLICASET_UUID_MISMATCH error.

The described situation is clearly not what the user hoped for, so
throw an error when a read-only instance tries to initiate the
bootstrap. The error gives the user a cue that
replication_connect_timeout should be increased.

Closes #4321

@TarantoolBot document
Title: replication: forbid bootstrapping read-only masters.

It is no longer possible to bootstrap a read-only instance in an empty
data directory as a master. You will see the following error when
trying to do so:
```
ER_BOOTSTRAP_READONLY: Trying to bootstrap a local read-only instance as master
```
Now if you have a fresh instance, which has
`read_only=true` in an initial `box.cfg` call, you need to set up
replication from an instance which is either read-write, or has your
local instance's uuid in its `_cluster` table.
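
As a rough sketch of such a setup, a fresh read-only instance might be
configured as shown below. The port, host name and credentials are
hypothetical placeholders; only the option names (`listen`, `read_only`,
`replication`) are actual `box.cfg` options.
```lua
-- Fresh instance with an empty data directory, started read-only.
-- 'replicator:secret@rw-host:3301' is a hypothetical URI of an
-- instance that is read-write (or already has this instance's uuid
-- in its _cluster table); substitute your own.
box.cfg{
    listen = 3302,
    read_only = true,
    replication = {'replicator:secret@rw-host:3301'},
}
```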

If you have multiple read-only and read-write instances with
replication set up, and you still see the aforementioned error message,
this means that none of your read-write instances managed to start
listening on their ports before the read-only instances exceeded
`replication_connect_timeout`. In this case you should raise
`replication_connect_timeout` to a greater value.
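
For illustration only, a read-only instance in such a replica set could
raise the timeout as sketched below. The URIs and the value of 120
seconds are assumptions; pick a value that covers the time your
read-write instances need to start listening.
```lua
-- Read-only instance that waits longer for its peers to come up.
-- The peer URIs and the timeout value are hypothetical examples.
box.cfg{
    listen = 3302,
    read_only = true,
    replication = {
        'replicator:secret@rw-host-1:3301',
        'replicator:secret@rw-host-2:3301',
    },
    -- Timeout in seconds for connecting to the replication peers.
    replication_connect_timeout = 120,
}
```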

parent 43575303

Showing with 26 additions and 9 deletions
