replication: refuse to connect to master with nil UUID
The title is pretty self-explanatory. That's all this commit does. Now a couple of words on why this is needed.

Commit 2a0c4f2b ("replication: make replica subscribe to master's ballot") changed the replica connect behaviour: instead of holding a single connection to the master, the replica may now have two. The master's ballot retrieval is performed over a separate connection owned by a separate fiber called ballot_watcher. The first connection to the master is initialized as before, and then the applier fiber creates the ballot_watcher, which connects to the same address on its own.

This led to some unexpected consequences: random cartridge integration tests started failing with the following error:

tarantool/tarantool/cartridge/test-helpers/cluster.lua:209: "localhost:13303": Replication setup failed, instance orphaned

Here's what happened. Cartridge has a module named remote control. The module mimics a tarantool server and "listens" on the same socket the tarantool instance is going to listen on before box.cfg{listen = ...} is called. For example, one can see output like this in tarantool logs with cartridge:

NO_WRAP
13:07:43.210 [10265] main/132/applier/admin@localhost:13301 I> remote master 46a71a25-4328-4a41-985d-d93d6ed7fb7f at 127.0.0.1:13301 running Tarantool 2.11.0
13:07:43.210 [10265] main/133/applier/admin@localhost:13302 I> remote master 00000000-0000-0000-0000-000000000000 at 127.0.0.1:13302 running Tarantool 1.10.0
13:07:43.210 [10265] main/134/applier/admin@localhost:13303 I> remote master bcce45ad-38b7-4d8a-936a-133614a7775f at 127.0.0.1:13303 running Tarantool 2.11.0
NO_WRAP

The second "Tarantool" in the output (the one with a zero instance uuid, running Tarantool 1.10.0) is the remote control listener on a not yet configured tarantool instance.

Before the applier connection was split in two, this was not a problem: the applier would try to get the instance's ballot from the remote control listener and fail (remote control doesn't answer replication requests). The applier would keep retrying the same address until it got a reply, which meant that remote control had stopped and the real tarantool was now listening on the socket.

Now the applier has two connections, and the following situation became possible: while the applier connection is being initialized, remote control is still running, so the applier gets connected to the remote control instance. The ballot is fetched in a separate fiber, which is not created yet at this point, so no error is raised. By the time the applier creates the ballot watcher, remote control has stopped and the real tarantool is listening on the socket, so no error happens in the ballot watcher either (a normal tarantool answers replication requests, of course). We end up in an unhandled situation: the applier itself is connected to the (already dead) remote control instance, while its ballot watcher is connected to the real tarantool. The sketch below illustrates this ordering.
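Here is a minimal, self-contained sketch of that ordering. Everything in it (struct applier_sketch, connect_to, the string labels) is hypothetical and only illustrates the race described above; the real logic lives in src/box/applier.cc and uses Tarantool's coio/fiber machinery.

NO_WRAP
/*
 * Hypothetical sketch of the split-connection race. Not Tarantool code.
 */
#include <stdbool.h>
#include <stdio.h>

struct conn {
	const char *peer; /* whoever owned the socket when we connected */
};

struct applier_sketch {
	struct conn main_conn;   /* used later for auth / SUBSCRIBE */
	struct conn ballot_conn; /* owned by the ballot watcher fiber */
	bool ballot_received;
};

/* Pretend to connect; 'listener' is whoever listens on the address now. */
static void
connect_to(struct conn *c, const char *listener)
{
	c->peer = listener;
}

int
main(void)
{
	struct applier_sketch a = {{NULL}, {NULL}, false};

	/* 1. The applier connects while cartridge remote control still
	 *    owns the socket: the main connection talks to remote control. */
	connect_to(&a.main_conn, "remote control (uuid = nil)");

	/* 2. Remote control stops, the real tarantool starts listening. */

	/* 3. The ballot watcher fiber opens its own connection and gets a
	 *    valid ballot from the real instance, so it raises no error. */
	connect_to(&a.ballot_conn, "real tarantool");
	a.ballot_received = true;

	/* 4. The applier resumes work on the main connection, which points
	 *    at the already dead remote control listener -> EOF error. */
	printf("main connection peer:   %s\n", a.main_conn.peer);
	printf("ballot connection peer: %s\n", a.ballot_conn.peer);
	return 0;
}
NO_WRAP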
As soon as the applier sees that the ballot is fetched, it continues the connection process with the already dead remote control instance and gets an error:

NO_WRAP
13:07:44.214 [10265] main/133/applier/admin@localhost:13302 I> failed to authenticate
13:07:44.214 [10265] main/133/applier/admin@localhost:13302 coio.c:326 E> SocketError: unexpected EOF when reading from socket, called on fd 1620, aka 127.0.0.1:54150: Broken pipe
13:07:44.214 [10265] main/133/applier/admin@localhost:13302 I> will retry every 1.00 second
13:07:44.214 [10265] main/115/remote_control/127.0.0.1:50242 C> failed to synchronize with 1 out of 3 replicas
13:07:44.214 [10265] main/115/remote_control/127.0.0.1:50242 I> entering orphan mode
NO_WRAP

Follow-up #5272
Closes #8185

NO_CHANGELOG=not user-visible
NO_DOC=not user-visible (can't create Tarantool with zero uuid)
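For reference, below is a minimal, self-contained sketch of the kind of check the title describes: refuse to proceed when the master's greeting carries the nil (all-zero) UUID. The helper names (uuid_sketch, check_master_uuid) are hypothetical; the real change lives in src/box/applier.cc and presumably uses Tarantool's tt_uuid helpers plus the new error code added to src/box/errcode.h (its exact name is not shown here).

NO_WRAP
/*
 * Hypothetical sketch: a nil UUID in the master's greeting means the
 * peer is not a configured tarantool, so it must not be treated as a
 * replication master.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct uuid_sketch {
	unsigned char b[16];
};

/* The nil UUID is all zeros: 00000000-0000-0000-0000-000000000000. */
static bool
uuid_is_nil(const struct uuid_sketch *u)
{
	static const struct uuid_sketch nil;
	return memcmp(u, &nil, sizeof(nil)) == 0;
}

/* Return -1 (refuse to connect) when the greeting carries a nil UUID. */
static int
check_master_uuid(const struct uuid_sketch *greeting_uuid)
{
	if (uuid_is_nil(greeting_uuid)) {
		fprintf(stderr, "refusing to connect: remote peer has a nil "
			"UUID and can't be a replication master\n");
		return -1;
	}
	return 0;
}

int
main(void)
{
	struct uuid_sketch nil_uuid;
	memset(&nil_uuid, 0, sizeof(nil_uuid));
	return check_master_uuid(&nil_uuid) == 0 ? 0 : 1;
}
NO_WRAP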
Showing 4 changed files:
- src/box/applier.cc (14 additions, 0 deletions)
- src/box/errcode.h (1 addition, 0 deletions)
- test/box/error.result (1 addition, 0 deletions)
- test/replication-luatest/gh_8185_dont_connect_to_nil_uuid_test.lua (51 additions, 0 deletions)