relay: send heartbeats on top of replication stream
There was a problem with the leader's relay continuing to ping the remote
followers even when the leader's tx thread was hung. This tricked the
followers into thinking the leader was alive and well, even though it
couldn't serve any new requests.

The problem was partially fixed by commit 56571d83 ("raft: make followers
notice leader hang"): that commit made the relay thread stop sending
heartbeats when the tx thread is unresponsive.

Until now we didn't differentiate between heartbeats and data rows: the
receipt of either was considered a sign that the master is alive. So replicas
that are not up to date with the master keep thinking it's alive until they
are fully synced, and only then notice that there are no more heartbeats
from it.

To fix this, stop treating all data as heartbeats and start sending
heartbeats on top of an active replication stream.

Closes #7515

NO_DOC=bugfix
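The behavior change described above can be illustrated with a minimal sketch. It is not the code from src/box/applier.cc or src/box/relay.cc: `MsgKind`, `LeaderLiveness`, `relay_should_send_heartbeat` and all of their parameters are hypothetical names made up for this illustration, and time is modeled as plain seconds.

```cpp
#include <cassert>

// Hypothetical message kinds, not the actual Tarantool protocol types.
enum class MsgKind { DataRow, Heartbeat };

// Hypothetical follower-side view of leader liveness.
class LeaderLiveness {
public:
    // data_counts_as_heartbeat = true models the old behavior,
    // false models the behavior described in this commit message.
    LeaderLiveness(double start, double timeout, bool data_counts_as_heartbeat)
        : last_alive_(start), timeout_(timeout),
          data_counts_as_heartbeat_(data_counts_as_heartbeat)
    {
        assert(timeout > 0);
    }

    // Record an incoming message received at time `now`.
    void on_message(MsgKind kind, double now)
    {
        // Only explicit heartbeats refresh liveness unless the old
        // "any data is a heartbeat" rule is enabled.
        if (kind == MsgKind::Heartbeat || data_counts_as_heartbeat_)
            last_alive_ = now;
    }

    // The leader is considered alive while the last liveness signal
    // is more recent than the timeout.
    bool leader_seems_alive(double now) const
    {
        return now - last_alive_ < timeout_;
    }

private:
    double last_alive_;
    double timeout_;
    bool data_counts_as_heartbeat_;
};

// Hypothetical relay-side check: a heartbeat is due once the interval has
// passed, whether or not data rows are being streamed, but only while the
// tx thread is responsive.
inline bool relay_should_send_heartbeat(double now, double last_sent,
                                        double interval, bool tx_responsive)
{
    return tx_responsive && now - last_sent >= interval;
}
```

In this model a syncing follower that keeps receiving old data rows from a hung leader still sees the leader as alive under the old rule, while under the new rule it times out as soon as heartbeats stop arriving.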
Changed files:
- changelogs/unreleased/gh-7515-syncing-follower-notices-leader-hang.md: 4 additions, 0 deletions
- src/box/applier.cc: 10 additions, 1 deletion
- src/box/relay.cc: 18 additions, 7 deletions
- test/replication-luatest/gh_7515_sync_node_sees_leader_hang_test.lua: 98 additions, 0 deletions