limbo: don't clear txn sync flag in case of fail
The limbo cleared the TXN_WAIT_SYNC and TXN_WAIT_ACK flags for all removed entries, both succeeded and failed. For succeeded entries that is fine. For failed ones it is not.

The reason is that a transaction could be rolled back after a successful WAL write but before its waiting fiber woke up. On wakeup the fiber would not see the TXN_WAIT_SYNC flag and would assert that the transaction signature is >= 0. That does not hold for txns rolled back for synchro reasons, such as a foreign PROMOTE not including this transaction.

The patch makes a failed transaction keep its TXN_WAIT_SYNC flag, so that its owner fiber on wakeup reaches txn_limbo_wait_complete(), notices the bad signature, and follows the rollback path.

TXN_WAIT_ACK is still dropped, because otherwise the transaction owner would try to call txn_limbo_ack() for the transaction even if the limbo doesn't belong to the instance anymore.

An alternative solution would be to check the signature value for all transactions even when journal_entry->res is >= 0. But that would slow down the common path even for non-synchro transactions.

Closes #6842

NO_DOC=Bugfix
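The flag handling described in the message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from src/box/txn_limbo.c: the flag names come from the message, but their values, `struct toy_txn`, `toy_limbo_remove()`, and `toy_owner_wakeup()` are hypothetical stand-ins, and the slow path is reduced to a signature check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; only the names come from the commit message. */
enum {
	TXN_WAIT_SYNC = 1 << 0,
	TXN_WAIT_ACK = 1 << 1,
};

/* Hypothetical, heavily reduced transaction. */
struct toy_txn {
	unsigned flags;
	/* Negative means the transaction was rolled back. */
	int64_t signature;
};

enum toy_outcome { TOY_COMMITTED, TOY_ROLLED_BACK };

/*
 * Toy model of removing an entry from the limbo after the patch:
 * TXN_WAIT_ACK is always dropped, TXN_WAIT_SYNC only on success.
 */
static void
toy_limbo_remove(struct toy_txn *txn, bool is_success)
{
	txn->flags &= ~(unsigned)TXN_WAIT_ACK;
	if (is_success)
		txn->flags &= ~(unsigned)TXN_WAIT_SYNC;
}

/*
 * Toy model of the owner fiber waking up after the WAL write.
 * Without TXN_WAIT_SYNC it takes the fast path, which assumes a
 * non-negative signature. With the flag it takes the slow path
 * (standing in for txn_limbo_wait_complete()) and inspects the
 * signature.
 */
static enum toy_outcome
toy_owner_wakeup(const struct toy_txn *txn)
{
	if ((txn->flags & TXN_WAIT_SYNC) == 0) {
		assert(txn->signature >= 0);
		return TOY_COMMITTED;
	}
	return txn->signature >= 0 ? TOY_COMMITTED : TOY_ROLLED_BACK;
}
```

In this model a failed entry leaves removal with TXN_WAIT_SYNC still set, so the wakeup goes through the slow path and never reaches the fast-path assertion with a negative signature. If removal cleared both flags for failed entries too, the same wakeup would trip that assertion.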
Changed files:
- changelogs/unreleased/gh-6842-qsync-assertions.md: 5 additions, 0 deletions
- src/box/txn_limbo.c: 2 additions, 1 deletion
- test/replication-luatest/gh_6842_qsync_applier_order_test.lua: 98 additions, 4 deletions