Commit 4336265e authored by Vladimir Davydov

test: fix flaky xlog/panic_on_broken_lsn test

The test injects an error into replication of a row with a particular
LSN. The problem is that a record about the new replica is written to
the _cluster space on join, and it may not have been flushed to the WAL
by the time the test injects the error. Because of this race, the test
may inject the error for the _cluster row instead of the test row, which
breaks its expectations.

```
[009] xlog/panic_on_broken_lsn.test.lua                               [ fail ]
[009]
[009] Test failed! Result content mismatch:
[009] --- xlog/panic_on_broken_lsn.result       Tue Nov  2 13:29:53 2021
[009] +++ var/rejects/xlog/panic_on_broken_lsn.reject   Tue Nov  2 13:29:59 2021
[009] @@ -155,8 +155,8 @@
[009]  ...
[009]  (found:gsub('^.*, req: ', ''):gsub('lsn: %d+', 'lsn: <lsn>'))
[009]  ---
[009] -- '{type: ''INSERT'', replica_id: 1, lsn: <lsn>, space_id: 9000, index_id: 0, tuple:
[009] -  [2, "v1"]}'
[009] +- '{type: ''INSERT'', replica_id: 1, lsn: <lsn>, space_id: 320, index_id: 0, tuple:
[009] +  [2, "34898d18-eed4-4f8a-97ff-4ffba7b42892"]}'
[009]  ...
[009]  test_run:cmd('cleanup server replica')
[009]  ---
[009]
```

Fix this by flushing WAL before writing a broken row.
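
The race and the fix can be illustrated with a toy model in plain Lua. This is only a sketch and not Tarantool code: the `new_wal`, `submit`, `flush` and `write` helpers are hypothetical, and the model assumes that queued rows are written in order and that a blocking write returns only after every earlier queued row has been written.

```lua
-- Toy model, not Tarantool code: rows are queued first and receive an LSN
-- only when they are flushed.
local function new_wal()
    return {lsn = 0, pending = {}, written = {}}
end

-- Queue a row without writing it (models the in-progress _cluster write).
local function submit(wal, row)
    table.insert(wal.pending, row)
end

-- Write all queued rows in order, assigning consecutive LSNs.
local function flush(wal)
    for _, row in ipairs(wal.pending) do
        wal.lsn = wal.lsn + 1
        wal.written[wal.lsn] = row
    end
    wal.pending = {}
end

-- Queue a row and wait until it is written (models a blocking insert).
local function write(wal, row)
    submit(wal, row)
    flush(wal)
end

-- Old test flow: the LSN is read while the _cluster row is still pending.
local function racy()
    local wal = new_wal()
    submit(wal, '_cluster row')
    local break_lsn = wal.lsn + 1
    write(wal, 'v1')
    return wal.written[break_lsn]
end

-- Fixed flow: an extra write forces the pending row out before the LSN is read.
local function fixed()
    local wal = new_wal()
    submit(wal, '_cluster row')
    write(wal, 'v1')
    local break_lsn = wal.lsn + 1
    write(wal, 'v2')
    return wal.written[break_lsn]
end

print(racy())   -- _cluster row
print(fixed())  -- v2
```

In the model, the row that ends up at the injected LSN is the _cluster row in the old flow and the second test row in the fixed flow, which matches the mismatch in the log above.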

Closes #4508
parent 7083b20b
@@ -114,15 +114,27 @@ box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", true)
 fiber = require('fiber')
 ---
 ...
+-- Asynchronously run a function that will:
+-- 1. Wait for the replica join to start.
+-- 2. Make sure that the record about the new replica written to
+-- the _cluster space hits the WAL by writing a row to the test
+-- space. This is important, because at the next step we need
+-- to compute the LSN of the row that is going to be written to
+-- the WAL next so we don't want to race with in-progress WAL
+-- writes.
+-- 3. Inject an error into replication of the next WAL row and write
+-- a row to the test space. This row should break replication.
+-- 4. Resume the replica join.
 test_run:cmd("setopt delimiter ';'")
 ---
 - true
 ...
 _ = fiber.create(function()
 test_run:wait_cond(function() return box.info.replication[2] ~= nil end)
+box.space.test:auto_increment{'v1'}
 lsn = box.info.vclock[1]
 box.error.injection.set("ERRINJ_RELAY_BREAK_LSN", lsn + 1)
-box.space.test:auto_increment{'v1'}
+box.space.test:auto_increment{'v2'}
 box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", false)
 end);
 ---
@@ -156,7 +168,7 @@ found = test_run:grep_log(nil, str, 256, {filename = filename})
 (found:gsub('^.*, req: ', ''):gsub('lsn: %d+', 'lsn: <lsn>'))
 ---
 - '{type: ''INSERT'', replica_id: 1, lsn: <lsn>, space_id: 9000, index_id: 0, tuple:
-[2, "v1"]}'
+[3, "v2"]}'
 ...
 test_run:cmd('cleanup server replica')
 ---
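
The normalization applied to the grepped log line in the hunk above can be tried in isolation. The sketch below uses only the standard Lua `string.gsub`. The sample log line, including its prefix and LSN value, is made up for illustration and is not the real Tarantool log format. The `normalize` wrapper is also a hypothetical name.

```lua
-- Hypothetical log line: the prefix and the LSN value are invented.
local found = "relay: injected broken lsn, req: " ..
    "{type: 'INSERT', replica_id: 1, lsn: 42, space_id: 9000, " ..
    "index_id: 0, tuple: [3, \"v2\"]}"

local function normalize(line)
    -- Drop everything up to and including ', req: ', then mask the LSN so
    -- that the result does not depend on the actual LSN value.
    -- The outer parentheses discard gsub's second return value (the count).
    return (line:gsub('^.*, req: ', ''):gsub('lsn: %d+', 'lsn: <lsn>'))
end

local normalized = normalize(found)
print(normalized)
```

Masking the LSN keeps the expected output stable across runs, so the test compares only the request type, space id and tuple.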
@@ -46,12 +46,24 @@ lsn = -1
 box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", true)
 fiber = require('fiber')
+-- Asynchronously run a function that will:
+-- 1. Wait for the replica join to start.
+-- 2. Make sure that the record about the new replica written to
+-- the _cluster space hits the WAL by writing a row to the test
+-- space. This is important, because at the next step we need
+-- to compute the LSN of the row that is going to be written to
+-- the WAL next so we don't want to race with in-progress WAL
+-- writes.
+-- 3. Inject an error into replication of the next WAL row and write
+-- a row to the test space. This row should break replication.
+-- 4. Resume the replica join.
 test_run:cmd("setopt delimiter ';'")
 _ = fiber.create(function()
 test_run:wait_cond(function() return box.info.replication[2] ~= nil end)
+box.space.test:auto_increment{'v1'}
 lsn = box.info.vclock[1]
 box.error.injection.set("ERRINJ_RELAY_BREAK_LSN", lsn + 1)
-box.space.test:auto_increment{'v1'}
+box.space.test:auto_increment{'v2'}
 box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", false)
 end);
 test_run:cmd("setopt delimiter ''");
@@ -16,10 +16,6 @@ fragile = {
 "issues": [ "gh-4952" ],
 "checksums": [ "7c9571a53f3025f02ab23703939a02d6" ]
 },
-"panic_on_broken_lsn.test.lua": {
-"issues": [ "gh-4991" ],
-"checksums": [ "005597305c925b49ed6f247a102486e0" ]
-},
 "panic_on_wal_error.test.lua": {
 "issues": [ "gh-5348" ],
 "checksums": [ "b874bcc6f69faa2c5654ecc9fe8474de", "7bebfd82b2419c3cf2235f222e835af8" ]