Commit 15a26877 authored by Alxander V. Tikhonov, committed by Kirill Yukhin

test: flaky replication/long_row_timeout.test.lua


On heavily loaded hosts or slow machines (for example, under VMware),
the following failure was observed:

  [010] --- replication/long_row_timeout.result   Fri May  8 08:56:08 2020
  [010] +++ var/rejects/replication/long_row_timeout.reject       Mon Jun 21 04:39:08 2021
  [010] @@ -23,7 +23,7 @@
  [010]  ...
  [010]  box.info.replication[2].downstream.status
  [010]  ---
  [010] -- follow
  [010] +- stopped
  [010]  ...
  [010]  -- make applier incapable of reading rows in one go, so that it
  [010]  -- yields a couple of times.
  [010]

It happened because the replication downstream status check ran too
early, while the downstream was still in the 'stopped' state. This
happens when the replica has finished its initial join but has not yet
reached subscription. To give the downstream time to reach the required
'follow' state, the check now uses the test_run:wait_downstream()
routine.
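
The difference between a one-shot status read and a polling wait can be
sketched in plain Lua as follows. This is only an illustration under
stated assumptions, not the actual test_run:wait_downstream()
implementation: wait_status, fake_status, fake_now and fake_sleep are
hypothetical names, and the clock and sleep functions are injected so
that the sketch runs without Tarantool (inside Tarantool they would
presumably be fiber.time and fiber.sleep).

```lua
-- Hypothetical helper, not the real test_run code.
-- Polls get_status() until it returns `wanted` or `timeout` elapses.
local function wait_status(get_status, wanted, timeout, now, sleep)
    local deadline = now() + timeout
    local status = get_status()
    while status ~= wanted and now() < deadline do
        sleep(0.001)
        status = get_status()
    end
    -- Return both the verdict and the last observed status.
    return status == wanted, status
end

-- Simulated downstream: 'stopped' for the first three polls, then 'follow'.
local polls = 0
local function fake_status()
    polls = polls + 1
    return polls > 3 and 'follow' or 'stopped'
end

-- Simulated clock that only advances when we "sleep".
local t = 0
local function fake_now() return t end
local function fake_sleep(dt) t = t + dt end

local ok, status = wait_status(fake_status, 'follow', 1, fake_now, fake_sleep)
print(ok, status)
```

A single read at the first poll would have seen 'stopped', which is the
failure shown in the reject diff above; the polling version keeps
reading until the state settles or the timeout expires.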

We don't see this issue anymore, but let's fix it still in case we ever
encounter it again.

Also remove the test from the fragile list.

Closes #4351

Co-developed-by: Serge Petrenko <sergepetrenko@tarantool.org>
parent b6a08c5c
@@ -21,9 +21,9 @@ test_run:cmd('start server replica')
 ---
 - true
 ...
-box.info.replication[2].downstream.status
+test_run:wait_downstream(2, {status = 'follow'})
 ---
-- follow
+- true
 ...
 -- make applier incapable of reading rows in one go, so that it
 -- yields a couple of times.
@@ -54,15 +54,16 @@ test_run:cmd('setopt delimiter ";"')
 ---
 - true
 ...
-ok = true;
+status = nil;
 ---
 ...
 start = fiber.time();
 ---
 ...
 while fiber.time() - start < 3 * box.cfg.replication_timeout do
-    if box.info.replication[2].downstream.status ~= 'follow' then
-        ok = false
+    status = box.info.replication[2].downstream.status
+    if status ~= 'follow' then
+        status = box.info.replication
         break
     end
     fiber.sleep(0.001)
@@ -73,9 +74,9 @@ test_run:cmd('setopt delimiter ""');
 ---
 - true
 ...
-ok
+status
 ---
-- true
+- follow
 ...
 s:drop()
 ---
......
@@ -8,8 +8,7 @@ test_run = require('test_run').new()
 box.schema.user.grant('guest', 'replication')
 test_run:cmd('create server replica with rpl_master=default, script="replication/replica.lua"')
 test_run:cmd('start server replica')
-box.info.replication[2].downstream.status
+test_run:wait_downstream(2, {status = 'follow'})
 -- make applier incapable of reading rows in one go, so that it
 -- yields a couple of times.
@@ -22,18 +21,19 @@ for i = 1,5 do box.space.test:replace{1, digest.urandom(1024)} collectgarbage('c
 -- replication_disconnect_timeout is 4 * replication_timeout, check that
 -- replica doesn't time out too early.
 test_run:cmd('setopt delimiter ";"')
-ok = true;
+status = nil;
 start = fiber.time();
 while fiber.time() - start < 3 * box.cfg.replication_timeout do
-    if box.info.replication[2].downstream.status ~= 'follow' then
-        ok = false
+    status = box.info.replication[2].downstream.status
+    if status ~= 'follow' then
+        status = box.info.replication
         break
     end
     fiber.sleep(0.001)
 end;
 test_run:cmd('setopt delimiter ""');
-ok
+status
 s:drop()
 test_run:cmd('stop server replica')
......
@@ -17,10 +17,6 @@ fragile = {
 "issues": [ "gh-3870" ],
 "checksums": [ "5d3f58323aafc1a11d9b9264258f7acf", "919921e13968b108d342555746ba55c9" ]
 },
-"long_row_timeout.test.lua": {
-"issues": [ "gh-4351" ],
-"checksums": [ "acd88b48b0046ec52346274eeeef0b25", "a645ff7616b5caf0fcd2099022b776bf", "eb3e92564ba71e7b7c458050223f4d57" ]
-},
 "skip_conflict_row.test.lua": {
 "issues": [ "gh-4958" ],
 "checksums": [ "a21f07339237cd9d0b8c74e144284449", "0359b0b1cc80052faf96972959513694", "ef104dfd04afa7c75087de13246e3eb0" ]
......