replication: fix flaky election_qsync.test
Fix the test failing occasionally with the following result mismatch:

[001] replication/election_qsync.test.lua memtx [ fail ]
[001]
[001] Test failed! Result content mismatch:
[001] --- replication/election_qsync.result Thu Jul 15 17:15:48 2021
[001] +++ var/rejects/replication/election_qsync.reject Thu Jul 15 20:46:51 2021
[001] @@ -145,8 +145,7 @@
[001]  | ...
[001]  box.space.test:select{}
[001]  | ---
[001] - | - - [1]
[001] - |   - [2]
[001] + | - - [2]
[001]  | ...
[001]  box.space.test:drop()
[001]  | ---
[001]

The issue happened because row [1] wasn't delivered to the 'default'
instance from the 'replica' at all. The test does try to wait for [1] to be
written to WAL and replicated, but sometimes the wait completes before this
event happens:

box.ctl.promote() is issued asynchronously once the instance becomes the
Raft leader. So issuing `box.ctl.wait_rw()` doesn't guarantee that the
replica has already written the PROMOTE (the limbo is initially unclaimed,
so the replica becomes writable as soon as it becomes the Raft leader).

Right after `wait_rw()` we wait for lsn propagation and for the 'default'
instance to reach the replica's lsn. It may happen that the lsn advances
because PROMOTE is written to WAL, and not because of row [1].

When this is the case, the 'default' instance doesn't receive row [1] at
all, resulting in the test error shown above.

Fix the issue by waiting for the promotion to happen explicitly.

Part of #5430
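
The race described above can be replayed in a small toy model. This is only an illustrative sketch in Python, not the test's code and not Tarantool's API: the function `run`, the `pending` queue and the convention that the lsn equals the WAL length are all assumptions made for the illustration.

```python
def run(wait_for_promote):
    """Replay the unlucky schedule; return the rows 'default' ends up with.

    Hypothetical model: the replica's WAL is a list and its lsn is the
    list length. 'pending' holds writes that reach the WAL later, in the
    order of the unlucky schedule (PROMOTE first, then row [1]).
    """
    wal = []
    pending = ["PROMOTE", [1]]

    if wait_for_promote:
        # The fix: wait explicitly until PROMOTE is in the WAL
        # before sampling the lsn.
        while "PROMOTE" not in wal:
            wal.append(pending.pop(0))

    old_lsn = len(wal)

    # "Wait for lsn propagation": any single WAL write satisfies this,
    # whether it is PROMOTE or row [1].
    while len(wal) <= old_lsn:
        wal.append(pending.pop(0))

    # 'default' catches up to this lsn, then the replica is stopped,
    # so nothing written later is ever delivered.
    target_lsn = len(wal)
    received = wal[:target_lsn]
    return [entry for entry in received if entry != "PROMOTE"]


if __name__ == "__main__":
    print("without explicit wait:", run(wait_for_promote=False))
    print("with explicit wait:   ", run(wait_for_promote=True))
```

Without the explicit wait, the lsn check in this model is satisfied by PROMOTE alone and row [1] never reaches 'default'; with it, the sampled lsn already includes PROMOTE, so the next lsn bump can only come from row [1].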