test: fix flaky replication/wal_rw_stress.test.lua
Found issue (reproduced on VBox FreeBSD machine):

  [016] --- replication/wal_rw_stress.result Fri Feb 21 11:53:21 2020
  [016] +++ replication/wal_rw_stress.reject Fri May 8 08:23:56 2020
  [016] @@ -73,7 +73,42 @@
  [016]  ...
  [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
  [016]  ---
  [016] -- true
  [016] +- version: 2.5.0-27-g32f59756a
  [016] +  id: 2
  [016] +  ro: false
  [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
  [016] +  package: Tarantool
  [016] +  cluster:
  [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
  [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
  [016] +  replication:
  [016] +    1:
  [016] +      id: 1
  [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
  [016] +      lsn: 10005
  [016] +      upstream:
  [016] +        status: follow
  [016] +        idle: 0.46353673400017
  [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
  [016] +        lag: -0.45732522010803
  [016] +      downstream:
  [016] +        status: stopped
  [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
  [016] +        system_message: Broken pipe
  [016] +    2:
  [016] +      id: 2
  [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
  [016] +      lsn: 0
  [016] +  signature: 10005
  [016] +  status: running
  [016] +  vinyl: []
  [016] +  uptime: 2
  [016] +  lsn: 0
  [016] +  sql: []
  [016] +  gc: []
  [016] +  pid: 41231
  [016] +  memory: []
  [016] +  vclock: {1: 10005}
  [016]  ...
  [016]  test_run:cmd("switch default")
  [016]  ---

To check the downstream status and its message, the test needs to wait
until the downstream appears. This prevents an attempt to index a nil
value when one of those functions is called before a record about the
peer appears in box.info.replication. This was observed on the test
replication/show_error_on_disconnect after commit c6bea65f
('replication: recfg with 0 quorum returns immediately').
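The waiting described above can be modelled in plain Lua. This is a hypothetical, self-contained sketch, not the actual test change: wait_cond(), fake_replication() and downstream_ok() are illustrative stand-ins for test_run:wait_cond() and box.info.replication, and the replication states are made-up data. It shows a nil-safe predicate that tolerates a missing peer record or downstream, succeeds once a transient 'stopped' status clears, and keeps failing on a persistent error.

```lua
-- Hypothetical model of the fix; none of these helpers are Tarantool APIs.

-- Poll cond() up to max_attempts times and return true as soon as it holds.
-- (The real test-run helper is time-based and sleeps between polls.)
local function wait_cond(cond, max_attempts)
    for _ = 1, max_attempts do
        if cond() then
            return true
        end
    end
    return false
end

-- Build a fake "box.info.replication" getter that returns one state per
-- call; the last state keeps repeating once the list is exhausted.
local function fake_replication(states)
    local calls = 0
    return function()
        calls = math.min(calls + 1, #states)
        return states[calls]
    end
end

-- Nil-safe predicate: the peer record or its downstream may not exist yet,
-- so check both before reading downstream.status.
local function downstream_ok(replication)
    local peer = replication[1]
    return peer ~= nil and peer.downstream ~= nil
        and peer.downstream.status ~= 'stopped'
end

-- Transient case: the record appears late and the downstream briefly
-- reports 'stopped' before recovering, so waiting succeeds.
local transient = fake_replication({
    {},                                -- no record for peer 1 yet
    {[1] = {}},                        -- record without a downstream
    {[1] = {downstream = {status = 'stopped', message = 'Broken pipe'}}},
    {[1] = {downstream = {status = 'follow'}}},
})
local transient_ok = wait_cond(function()
    return downstream_ok(transient())
end, 10)

-- Persistent case: the downstream stays stopped, so the wait gives up and
-- the caller can fall back to dumping box.info to show the error.
local persistent = fake_replication({
    {[1] = {downstream = {status = 'stopped',
                          message = 'tx checksum mismatch'}}},
})
local persistent_ok = wait_cond(function()
    return downstream_ok(persistent())
end, 10)

print(transient_ok, persistent_ok)
```

Without the nil checks in downstream_ok(), the first two transient states would raise "attempt to index a nil value" instead of simply returning false and letting the poll retry.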
Checked that the test still catches the error it was created for in
b9db91e1 ('xlog: fix fallocate vs read race') and successfully got the
expected error "tx checksum mismatch":

  [153] --- replication/wal_rw_stress.result Fri Jun 19 15:01:49 2020
  [153] +++ replication/wal_rw_stress.reject Fri Jun 19 15:04:02 2020
  [153] @@ -73,7 +73,43 @@
  [153]  ...
  [153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
  ...
  [153] +      downstream:
  [153] +        status: stopped
  [153] +        message: tx checksum mismatch

Note that wait_cond() lets the test ride out transient network
connectivity errors, but 'tx checksum mismatch' is a persistent one and
will still be caught.

Closes #4977