qsync: order access to the limbo terms
Limbo term tracking is shared between appliers. While one applier is
waiting for a write to complete inside the journal_write() routine,
another may need to read the term value to figure out whether a promote
request is valid to apply. Due to cooperative multitasking, access to
the terms is not consistent, so we need to be sure that other fibers
read up-to-date terms (i.e. the ones already written to the WAL).

For this sake we use a latching mechanism: when one fiber takes the lock
for updating, other readers wait until the operation is complete.

For example, here is a call graph of two appliers:

    applier 1
    ---------
    applier_apply_tx
      (promote term = 3
       current max term = 2)
      applier_synchro_filter_tx
      apply_synchro_row
        journal_write
          (sleeping)

At this moment another applier comes in with obsolete data and term 2:

    applier 2
    ---------
    applier_apply_tx
      (term 2)
      applier_synchro_filter_tx
        txn_limbo_is_replica_outdated -> false
      journal_write (sleep)

    applier 1
    ---------
    journal wakes up
      apply_synchro_row_cb
        set max term to 3

So applier 2 didn't notice that term 3 had already been seen and wrote
obsolete data. With locking, applier 2 will wait until applier 1 has
finished its write.

We introduce the following helpers:

1) txn_limbo_begin, which takes the lock
2) txn_limbo_commit and txn_limbo_rollback, which simply release
   the lock but have different names for better semantics
3) txn_limbo_process, a general function which uses the
   txn_limbo_begin and txn_limbo_commit helpers internally
4) txn_limbo_apply, which does the actual processing of the request;
   it assumes that txn_limbo_begin has already been called

Testing such an in-flight condition won't be easy, so we introduce the
"box.info.synchro.queue.busy" field to report whether the limbo is
currently latched and processing a sync request.

@TarantoolBot document
Title: synchronous replication changes

`box.info.synchro.queue` gets a new `busy` field. It is set to `true`
while a synchronous transaction is being processed but is not yet
complete. Thus any other incoming synchronous transactions will be
delayed until the active one is finished.

Part-of #6036

Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
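
As an illustration only, the race and the latch described in the commit message can be modelled outside Tarantool. The sketch below uses Python's asyncio in place of fibers and `asyncio.Lock` in place of the limbo latch. The `Limbo` class, `apply_row`, `scenario`, `run` and the delays are hypothetical names and values made up for this example; they are not Tarantool's C API and only mirror the roles of txn_limbo_begin, txn_limbo_commit, txn_limbo_rollback and the `busy` field.

```python
import asyncio


class Limbo:
    """Toy stand-in for limbo term tracking; not Tarantool code."""

    def __init__(self, max_term):
        self.max_term = max_term      # highest promote term seen so far
        self.latch = asyncio.Lock()   # plays the role of the limbo latch
        self.wal = []                 # terms of rows that reached the "WAL"

    @property
    def busy(self):
        # Same idea as box.info.synchro.queue.busy: is the latch held?
        return self.latch.locked()

    async def begin(self):
        await self.latch.acquire()

    def commit(self):
        self.latch.release()

    # Same action as commit; a separate name only for readability.
    rollback = commit

    def is_replica_outdated(self, term):
        return term < self.max_term


async def journal_write(limbo, term, delay):
    await asyncio.sleep(delay)        # the "fiber" yields here
    limbo.wal.append(term)


async def apply_row(limbo, term, delay, locked):
    if locked:
        await limbo.begin()
    try:
        if limbo.is_replica_outdated(term):
            return False              # filtered out, nothing is written
        await journal_write(limbo, term, delay)
        limbo.max_term = max(limbo.max_term, term)
        return True
    finally:
        if locked:
            limbo.commit()


async def scenario(locked):
    limbo = Limbo(max_term=2)
    applier1 = asyncio.create_task(apply_row(limbo, 3, 0.01, locked))
    await asyncio.sleep(0)            # let applier 1 reach journal_write
    applier2 = asyncio.create_task(apply_row(limbo, 2, 0.03, locked))
    await asyncio.sleep(0)            # let applier 2 start as well
    busy_in_flight = limbo.busy
    await asyncio.gather(applier1, applier2)
    return limbo.wal, busy_in_flight


def run(locked):
    return asyncio.run(scenario(locked))
```

Calling `run(locked=False)` lets the term 2 row pass the outdated check before term 3 is recorded, so it is still written afterwards. With `run(locked=True)` the second applier waits on the latch, re-checks against the updated term and is filtered out, and the busy flag reads true while the first write is in flight.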
Showing
- changelogs/unreleased/gh-6036-applier-fix-race.md: 5 additions, 0 deletions
- src/box/applier.cc: 5 additions, 1 deletion
- src/box/box.cc: 6 additions, 2 deletions
- src/box/lua/info.c: 3 additions, 1 deletion
- src/box/txn_limbo.c: 12 additions, 1 deletion
- src/box/txn_limbo.h: 39 additions, 5 deletions
- test/replication-luatest/gh_6036_qsync_order_test.lua: 211 additions, 0 deletions
- test/replication-luatest/suite.ini: 1 addition, 0 deletions