replication: introduce orphan mode
This patch modifies the replication configuration procedure so as to fully conform to the specification presented in #2958. In a nutshell, box.cfg() now tries to synchronize all connected replicas before returning. If it fails to connect enough replicas to form a quorum, it leaves the server in a degraded 'orphan' mode, which is basically read-only. More details below.

First of all, it's worth mentioning that we already have an 'orphan' status in Tarantool (between 'loading' and 'hot_standby'), but it has nothing to do with replication. Actually, it's unclear why it was introduced in the first place, so we agreed to silently drop it.

We assume that a replica is synchronized if its lag is not greater than the value of the new configuration option box.cfg.replication_sync_lag. Otherwise a replica is considered to be syncing and has "sync" status. If replication_sync_lag is unset (nil) or set to TIMEOUT_INFINITY, a replica skips the "sync" state and switches to "follow" immediately. The default value of replication_sync_lag is 10 seconds, but it is ignored (assumed to be infinite) if the master is running a Tarantool version older than 1.7.7, which does not send heartbeat messages.

If box.cfg() is called for the very first time (bootstrap) for a given instance, then:

1. It tries to connect to all configured replicas for as long as it takes (replication_timeout isn't taken into account). If it fails to connect to at least one replica, bootstrap is aborted.
2. If this is a cluster bootstrap and the current instance turns out to be the new cluster leader, it performs local bootstrap, switches to 'running' state, and leaves box.cfg() immediately.
3. Otherwise (i.e. if this is bootstrap of a slave replica), it bootstraps from a remote master and then stays in 'orphan' state until it has synchronized with all replicas, after which it switches to 'running' state and leaves box.cfg().

If box.cfg() is called after bootstrap, in order to recover from local storage, then:

1. It recovers the last snapshot and the xlogs stored in the local directory.
2. It then switches to 'orphan' mode and tries to connect to at least as many replicas as specified by box.cfg.replication_connect_quorum, for a time period that is a multiple (4x) of box.cfg.replication_timeout. If it fails, it doesn't abort, but leaves box.cfg() in 'orphan' mode; the state will switch to 'running' asynchronously as soon as the instance has synced with replication_connect_quorum replicas.
3. If it managed to connect to enough replicas to form a quorum at step 2, it synchronizes with them: box.cfg() doesn't return until at least replication_connect_quorum replicas have been synchronized.

If box.cfg() is called after recovery to reconfigure replication, it tries to connect to all specified replicas within a time period that is a multiple (4x) of box.cfg.replication_timeout. The value of box.cfg.replication_connect_quorum isn't taken into account, and neither is box.cfg.replication_sync_lag: box.cfg() returns as soon as all configured replicas have been connected.

Just like any other status, the new one is reflected by box.info.status.

Suggested by @kostja

Follow-up #2958
Closes #999
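The behaviour described above might be exercised with a configuration like the following. This is a minimal illustrative sketch, not taken from the patch: the listen port, replica URIs, and credentials are placeholders, and the option values merely echo the defaults and rules discussed in this message.

```lua
-- Hypothetical replica configuration; URIs and credentials are
-- placeholders.
box.cfg{
    listen = 3301,
    replication = {
        'replicator:password@host1:3301',
        'replicator:password@host2:3301',
        'replicator:password@host3:3301',
    },
    -- A replica lagging by more than this many seconds is considered
    -- to be syncing ("sync" status); nil or TIMEOUT_INFINITY would
    -- skip the "sync" state entirely. 10 is the default.
    replication_sync_lag = 10,
    -- On local recovery, box.cfg() waits for this many connected
    -- replicas (for roughly 4 * replication_timeout) before giving up
    -- and returning in 'orphan' mode.
    replication_connect_quorum = 2,
    replication_timeout = 1,
}

-- Reports 'orphan' until the instance has synced with the quorum,
-- then 'running'.
box.info.status
```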
- src/box/applier.cc: 23 additions, 3 deletions
- src/box/applier.h: 4 additions, 3 deletions
- src/box/box.cc: 61 additions, 22 deletions
- src/box/box.h: 8 additions, 0 deletions
- src/box/lua/load_cfg.lua: 3 additions, 1 deletion
- src/box/replication.cc: 124 additions, 21 deletions
- src/box/replication.h: 74 additions, 8 deletions
- src/cfg.c: 10 additions, 0 deletions
- src/cfg.h: 3 additions, 0 deletions
- test/app-tap/init_script.result: 22 additions, 21 deletions
- test/box/admin.result: 2 additions, 0 deletions
- test/box/cfg.result: 4 additions, 0 deletions
- test/replication/quorum.lua: 16 additions, 5 deletions
- test/replication/quorum.result: 67 additions, 68 deletions
- test/replication/quorum.test.lua: 48 additions, 52 deletions
- test/replication/quorum1.lua: 1 addition, 0 deletions
- test/replication/quorum2.lua: 1 addition, 0 deletions
- test/replication/quorum3.lua: 1 addition, 0 deletions
- test/replication/suite.ini: 1 addition, 1 deletion