Commit 06eda0f7 authored by Alexander V. Tikhonov, committed by Kirill Yukhin

test: fix flaky replication/wal_rw_stress.test.lua

Found issue (reproduced on VBox FreeBSD machine):

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, the test needs to wait
until the downstream appears. This prevents an attempt to index a nil
value when one of those functions is called before a record about the
peer appears in box.info.replication. It was observed on the test
  replication/show_error_on_disconnect
after commit
  c6bea65f ('replication: recfg with 0 quorum returns immediately').
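
The nil-index hazard described above can be illustrated with a small sketch. The helper name downstream_status is hypothetical and is not part of this commit, and plain Lua tables stand in for box.info.replication so the snippet runs without Tarantool:

```lua
-- Hypothetical helper (not from the commit): return the downstream
-- status of a peer, or nil while its record has not appeared yet.
local function downstream_status(replication, id)
    local peer = replication[id]
    if peer == nil or peer.downstream == nil then
        -- Indexing peer.downstream.status here would raise
        -- "attempt to index a nil value".
        return nil
    end
    return peer.downstream.status
end

-- Plain tables standing in for box.info.replication (an assumption).
local not_ready = {}
local ready = {[1] = {downstream = {status = 'follow'}}}

print(downstream_status(not_ready, 1))
print(downstream_status(ready, 1))
```

A condition built on such a guard can be polled safely while the peer record is still missing.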

Checked that the test still catches the error for which it was created
in patch b9db91e1 ('xlog: fix fallocate vs read race'): it successfully
reports the expected error "tx checksum mismatch":

[153] --- replication/wal_rw_stress.result      Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject      Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Note that wait_cond() lets the test ride out transient network
connectivity errors, but 'tx checksum mismatch' is a persistent
one and will still be caught.
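
The difference between a transient and a persistent failure under polling can be sketched as follows. This is an illustration only: poll_until is a hypothetical stand-in, not the real test_run:wait_cond(), and it counts attempts instead of sleeping so that it runs in plain Lua:

```lua
-- Hypothetical polling helper (an assumption, not test_run's code):
-- call cond() up to max_attempts times and stop at the first success.
local function poll_until(cond, max_attempts)
    for attempt = 1, max_attempts do
        if cond() then
            return true, attempt
        end
        -- A real helper would sleep between attempts here.
    end
    return false, max_attempts
end

-- Transient failure: 'stopped' on the first two polls, then recovery.
local statuses = {'stopped', 'stopped', 'follow'}
local i = 0
local ok_transient = poll_until(function()
    i = i + 1
    return (statuses[i] or 'follow') ~= 'stopped'
end, 10)

-- Persistent failure: the status never leaves 'stopped'.
local ok_persistent = poll_until(function()
    local status = 'stopped'
    return status ~= 'stopped'
end, 10)

-- Mirrors the "cond or box.info" idiom from the test: on failure the
-- right-hand side is what ends up in the output.
print(ok_transient or 'diagnostics dump')
print(ok_persistent or 'diagnostics dump')
```

In the transient case the poll succeeds on the third attempt and the expression yields true. In the persistent case the attempts run out and the diagnostic value is returned instead, which is how the reject file above ends up containing the box.info dump.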

Closes #4977
parent 3e904475
......@@ -18,6 +18,7 @@ fragile = errinj.test.lua ; gh-3870
sync.test.lua ; gh-3835 gh-3877
transaction.test.lua ; gh-4312
wal_rw_stress.test.lua ; gh-4977
wal_off.test.lua ; gh-4355
replica_rejoin.test.lua ; gh-4985
recover_missing_xlog.test.lua ; gh-4989
box_set_replication_stress ; gh-4992
......@@ -2,7 +2,7 @@ test_run = require('test_run').new()
---
...
--
--- gh-3893: Replication failure: relay may report that an xlog
+-- gh-3883: Replication failure: relay may report that an xlog
-- is corrupted if it it currently being written to.
--
s = box.schema.space.create('test')
......@@ -71,7 +71,9 @@ test_run:cmd("switch replica")
box.cfg{replication = replication}
---
...
-box.info.replication[1].downstream.status ~= 'stopped' or box.info
+test_run:wait_cond(function() \
+return box.info.replication[1].downstream.status ~= 'stopped' \
+end) or box.info
---
- true
...
......
test_run = require('test_run').new()
--
--- gh-3893: Replication failure: relay may report that an xlog
+-- gh-3883: Replication failure: relay may report that an xlog
-- is corrupted if it it currently being written to.
--
s = box.schema.space.create('test')
......@@ -38,7 +38,9 @@ test_run:cmd("setopt delimiter ''");
-- are running in different threads, there shouldn't be any rw errors.
test_run:cmd("switch replica")
box.cfg{replication = replication}
-box.info.replication[1].downstream.status ~= 'stopped' or box.info
+test_run:wait_cond(function() \
+return box.info.replication[1].downstream.status ~= 'stopped' \
+end) or box.info
test_run:cmd("switch default")
-- Cleanup.
......