Commit f45ae852 authored 7 years ago by Vladimir Davydov Committed by Konstantin Osipov 7 years ago

replication: cleanup timeouts

Currently, we have three variables related to replication timeouts:
applier_timeout, relay_timeout, and replication_cfg_timeout. They are
all set to the value of box.cfg.replication_timeout. We use these
variables in four different cases:

 - Sending heartbeat messages periodically from master to slave and back
   (relay_timeout, applier_timeout).

 - Reconnecting applier after a disconnect (applier_timeout).

 - Disconnecting a replica if no heartbeat message has been received
   within the specified timeout (TIMEOUT_PERIODS * replication_timeout).

 - Waiting for box.cfg() to succeed (replication_connect_quorum_timeout).

This is confusing. Let's keep just one variable, replication_timeout,
that would determine the heartbeat interval and introduce the following
helpers for the three other cases:

 - replication_reconnect_timeout()
 - replication_disconnect_timeout()
 - replication_connect_quorum_timeout()

Also, let's make replication_connect_quorum_timeout() return 4 times the
configured timeout in the scope of this patch, because, as pointed out by
@kostja,

> We need another replication_timeout variable, using the same variable
> for everything doesn't work.  Please try setting a broken
> box.cfg.replication second time, and you'll see that it doesn't try to
> reconnect, because reconnect timeout = replication timeout. This is
> broken, reconnect_timeout should be < replication_timeout, to allow for
> at least a few reconnects.
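
The scheme described above can be sketched in C roughly as follows. This is
an illustration, not the patch itself: the helper names come from this
message, but the function bodies, the default value of replication_timeout,
and the value of TIMEOUT_PERIODS are assumptions made for the example.

```c
/* Sketch only: names follow the commit message, bodies are assumed. */

/*
 * Heartbeat interval in seconds, taken from box.cfg.replication_timeout.
 * The initial value here is arbitrary.
 */
static double replication_timeout = 1.0;

/* Placeholder: the commit message does not state the real value. */
#define TIMEOUT_PERIODS 4

/* How long an applier waits before reconnecting after a disconnect. */
static inline double
replication_reconnect_timeout(void)
{
	return replication_timeout;
}

/* How long to wait for a heartbeat before dropping a replica. */
static inline double
replication_disconnect_timeout(void)
{
	return replication_timeout * TIMEOUT_PERIODS;
}

/* How long box.cfg() waits for the replication quorum to connect. */
static inline double
replication_connect_quorum_timeout(void)
{
	return replication_timeout * 4;
}
```

With the quorum wait set to four times the reconnect interval, an applier
gets room for a few reconnect attempts before box.cfg() gives up, which is
the concern raised in the quote above.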

Suggested by @kostja

Follow-up #2958

parent e755ad24

Showing with 45 additions and 28 deletions
