replication: cleanup timeouts

Currently, we have three variables related to replication timeouts:
applier_timeout, relay_timeout, and replication_cfg_timeout. They are
all set to the value of box.cfg.replication_timeout. We use these
variables in four different cases:

- Sending heartbeat messages periodically from master to slave and
  back (relay_timeout, applier_timeout).
- Reconnecting applier after a disconnect (applier_timeout).
- Disconnecting a replica if no heartbeat message has been received
  within the specified timeout (TIMEOUT_PERIODS * replication_timeout).
- Waiting for box.cfg() to succeed (replication_connect_quorum_timeout).

This is confusing. Let's keep just one variable, replication_timeout,
which determines the heartbeat interval, and introduce the following
helpers for the three other cases:

- replication_reconnect_timeout()
- replication_disconnect_timeout()
- replication_connect_quorum_timeout()

Also, let's make replication_connect_quorum_timeout() return 4 times
the configured timeout in the scope of this patch, because, as pointed
out by @kostja,

> We need another replication_timeout variable, using the same variable
> for everything doesn't work. Please try setting a broken
> box.cfg.replication second time, and you'll see that it doesn't try to
> reconnect, because reconnect timeout = replication timeout. This is
> broken, reconnect_timeout should be < replication_timeout, to allow for
> at least a few reconnects.

Suggested by @kostja
Follow-up #2958
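
For readers without the diff at hand, the three helpers named above might look roughly like the sketch below. Only the helper names, the single replication_timeout variable, the TIMEOUT_PERIODS multiplier, and the "4 times the configured timeout" rule come from the commit message. The function bodies, the value chosen for TIMEOUT_PERIODS, and the default value are assumptions made for illustration, not the contents of src/box/replication.h.

```c
/*
 * Illustrative sketch only: bodies and constants are assumptions,
 * not the actual code added to src/box/replication.h.
 */

/* Placeholder value; the commit message does not state it. */
#define TIMEOUT_PERIODS 4

/* The single remaining variable: heartbeat interval, in seconds.
 * The initial value here is an arbitrary example. */
static double replication_timeout = 1.0;

/* How long the applier waits before reconnecting after a disconnect.
 * Assumed equal to the heartbeat interval, as the quoted comment implies. */
static inline double
replication_reconnect_timeout(void)
{
	return replication_timeout;
}

/* How long to wait for a heartbeat before dropping the connection. */
static inline double
replication_disconnect_timeout(void)
{
	return TIMEOUT_PERIODS * replication_timeout;
}

/* How long box.cfg() waits for the quorum to connect:
 * 4 times the configured timeout, leaving room for a few reconnects. */
static inline double
replication_connect_quorum_timeout(void)
{
	return 4 * replication_timeout;
}
```

With helpers of this shape, callers in the applier, relay, and box.cfg() code would read one variable through a named accessor instead of keeping their own copies of the timeout.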
Showing
- src/box/applier.cc: 3 additions, 5 deletions
- src/box/applier.h: 0 additions, 3 deletions
- src/box/box.cc: 2 additions, 9 deletions
- src/box/relay.cc: 2 additions, 8 deletions
- src/box/relay.h: 0 additions, 3 deletions
- src/box/replication.cc: 2 additions, 0 deletions
- src/box/replication.h: 36 additions, 0 deletions