replication: fix replica disconnect upon reconfiguration
Replication reconfiguration used to work as follows: upon receiving a new config, disconnect from all the existing masters and try to connect to all the masters from the new config.

This led to instances doing extra work when the old config and the new config had the same nodes in them: instead of doing nothing, tarantool reinitialized the connection.

There was another problem: when an existing connection is broken, the master takes some time to notice that. So, when a replica resets the connection, it may try to reconnect faster than the master is able to notice its absence. In this case the replica wouldn't be able to reconnect due to `duplicate connection with the same replica UUID`. So replication would stop for a replication_timeout, which may be quite large (seconds or tens of seconds).

Let's prevent tarantool from reconnecting to the same instance if there already is a working connection.

Closes #4669
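The intended behavior can be sketched as a set comparison between the peers that already have a working connection and the peers named in the new config. This is only an illustrative model, not the code from this patch: `ReconfigPlan` and `plan_reconfig` are hypothetical names, and identifying peers by a URI string is an assumption made for the example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a reconfiguration decision; these are not
// Tarantool's actual data structures.
struct ReconfigPlan {
    std::vector<std::string> keep;       // working connections left untouched
    std::vector<std::string> connect;    // peers that need a new connection
    std::vector<std::string> disconnect; // peers absent from the new config
};

// `connected` holds peers with a working connection (assumed to be
// identified by URI); `new_config` holds the peers from the new config.
ReconfigPlan plan_reconfig(const std::set<std::string> &connected,
                           const std::set<std::string> &new_config)
{
    ReconfigPlan plan;
    for (const auto &uri : new_config) {
        if (connected.count(uri) != 0)
            plan.keep.push_back(uri);    // no reconnect, so no UUID clash
        else
            plan.connect.push_back(uri);
    }
    for (const auto &uri : connected) {
        if (new_config.count(uri) == 0)
            plan.disconnect.push_back(uri);
    }
    return plan;
}
```

With identical old and new peer sets, the plan has empty `connect` and `disconnect` lists, which corresponds to the "doing nothing" case described above.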
Changed files:
- changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md (5 additions, 0 deletions)
- src/box/box.cc (23 additions, 8 deletions)
- src/box/replication.cc (29 additions, 12 deletions)
- src/box/replication.h (3 additions, 1 deletion)
- test/replication-luatest/gh_4669_applier_reconnect_test.lua (49 additions, 0 deletions)