relay: do not report vclock[0] anywhere
The remote replica's vclock is given to the master so that the master sends data starting from that position. The master does that, but to find the relevant position in the local WAL to start from, it must ignore the local rows and consider them all already "sent". For that the master replaces the remote vclock[0] with the local vclock[0]. That makes the xlog cursor skip all the local rows.

The problem is that the relay took this modified vclock as is, as if it had truly been reported by the replica. It was even saved as the "last received ACK", which it clearly is not.

When a real ACK was received, it didn't contain anything in vclock[0], and yet the relay "saw" that the previous ACK had vclock[0] > 0. That looked as if the replica had gone backwards without even closing the connection, which isn't possible. The relay then crashed on an assertion.

The fix is not to save the local vclock[0] in the last received ACK. For GC and the xlog cursor the hack is still needed.

One option to make this simpler was to set vclock[0] to INT64_MAX and never bother with any local rows at all, but that didn't work. Some assumptions in other places seem to depend on having a proper local LSN there.

Closes #10047

NO_CHANGELOG=the bug wasn't released
NO_DOC=bugfix
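The split described above can be illustrated with a minimal sketch. This is not the actual relay.cc code: `Vclock`, `RelayState`, `relay_start` and `vclock_ge` are hypothetical names, the vclock is modelled as a small fixed array, and zeroing component 0 in the stored ACK is an assumption about how "not saving the local vclock[0]" could look.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified vclock: one LSN per replica ID.
// Component 0 holds the LSN of replica-local rows.
constexpr std::size_t kMaxReplicas = 4;
using Vclock = std::array<std::int64_t, kMaxReplicas>;

// Hypothetical relay state: two positions that must not be mixed up.
struct RelayState {
    Vclock cursor_start{};   // where to start reading the local WAL (also used for GC)
    Vclock last_recv_ack{};  // what the replica has actually confirmed
};

// Build the relay state from the vclock the replica sent on subscribe.
RelayState relay_start(const Vclock &replica_vclock, const Vclock &local_vclock)
{
    RelayState st;
    // Cursor/GC position: treat all local rows as already "sent".
    st.cursor_start = replica_vclock;
    st.cursor_start[0] = local_vclock[0];
    // Last ACK: keep only what the replica reported, never the local component.
    st.last_recv_ack = replica_vclock;
    st.last_recv_ack[0] = 0;  // assumption: replicas report nothing in vclock[0]
    return st;
}

// True if no component of `next` is behind the same component of `prev`.
bool vclock_ge(const Vclock &next, const Vclock &prev)
{
    for (std::size_t i = 0; i < next.size(); ++i) {
        if (next[i] < prev[i])
            return false;
    }
    return true;
}
```

With this separation, a later ACK whose vclock[0] is 0 compares as not going backwards against `last_recv_ack`, whereas comparing it against `cursor_start` would reproduce the false "replica went backwards" condition.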
Changed files:
- src/box/relay.cc (4 additions, 4 deletions)
- src/lib/core/errinj.h (1 addition, 0 deletions)
- src/lib/vclock/vclock.h (7 additions, 0 deletions)
- test/box/errinj.result (1 addition, 0 deletions)
- test/replication-luatest/gh_10047_local_vclock_downstream_test.lua (99 additions, 0 deletions)