test: fix flaky xlog/panic_on_broken_lsn test

The test injects an error into replication of the row with a particular LSN. On join, a record about the new replica is written to the _cluster space. That record may not have been flushed to the WAL by the time the test computes the LSN and injects the error. Because of this race, the test may inject the error for the _cluster row instead of the test row, which breaks its expectations:

```
[009] xlog/panic_on_broken_lsn.test.lua [ fail ]
[009]
[009] Test failed! Result content mismatch:
[009] --- xlog/panic_on_broken_lsn.result Tue Nov 2 13:29:53 2021
[009] +++ var/rejects/xlog/panic_on_broken_lsn.reject Tue Nov 2 13:29:59 2021
[009] @@ -155,8 +155,8 @@
[009]  ...
[009]  (found:gsub('^.*, req: ', ''):gsub('lsn: %d+', 'lsn: <lsn>'))
[009]  ---
[009] -- '{type: ''INSERT'', replica_id: 1, lsn: <lsn>, space_id: 9000, index_id: 0, tuple:
[009] -  [2, "v1"]}'
[009] +- '{type: ''INSERT'', replica_id: 1, lsn: <lsn>, space_id: 320, index_id: 0, tuple:
[009] +  [2, "34898d18-eed4-4f8a-97ff-4ffba7b42892"]}'
[009]  ...
[009]  test_run:cmd('cleanup server replica')
[009]  ---
```

Fix this by flushing the WAL before computing the LSN and writing the broken row. The flush is done by writing an extra row to the test space.

Closes #4508
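The race can be illustrated with a toy model in plain Lua. This is a hypothetical sketch, not Tarantool code. It assumes a WAL that flushes queued rows strictly in submission order. `new_wal`, `submit` and `write_sync` are made-up helpers: `submit` stands for an unawaited write such as the `_cluster` record on join, and `write_sync` stands for a synchronous `box.space.test:auto_increment{...}` call. The `vclock` field stands in for `box.info.vclock[1]`. The model reproduces only the unlucky interleaving. In the real test, whether the `_cluster` row is still pending depends on timing.

```lua
-- Hypothetical toy model of the race; none of these names exist in Tarantool.
local function new_wal()
    -- vclock: LSN of the last flushed row (stand-in for box.info.vclock[1])
    -- queue:  rows submitted but not yet flushed
    -- log:    flushed rows, indexed by LSN
    return {vclock = 0, queue = {}, log = {}}
end

-- Queue a row without waiting for it to hit the WAL
-- (like the _cluster record written on replica join).
local function submit(wal, row)
    table.insert(wal.queue, row)
end

-- Write a row and wait until it is flushed. Rows are flushed in
-- submission order, so every row queued earlier is flushed first.
local function write_sync(wal, row)
    submit(wal, row)
    for _, r in ipairs(wal.queue) do
        wal.vclock = wal.vclock + 1
        wal.log[wal.vclock] = r
    end
    wal.queue = {}
end

-- Return the row hit by an error injected at "current LSN + 1",
-- mirroring ERRINJ_RELAY_BREAK_LSN = box.info.vclock[1] + 1.
local function broken_row(flush_first)
    local wal = new_wal()
    submit(wal, '_cluster')        -- pending record about the new replica
    if flush_first then
        write_sync(wal, 'v1')      -- the fix: flush pending rows first
    end
    local break_lsn = wal.vclock + 1
    write_sync(wal, flush_first and 'v2' or 'v1')
    return wal.log[break_lsn]
end

print(broken_row(false)) -- _cluster (the flaky failure)
print(broken_row(true))  -- v2 (the row the test expects)
```

In the model, the only change between the two runs is the extra synchronous write before the LSN is read. The same reordering is applied to the test in the diff below.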

Commit 4336265e authored 3 years ago by Vladimir Davydov
--- a/test/xlog/panic_on_broken_lsn.result
+++ b/test/xlog/panic_on_broken_lsn.result
@@ -114,15 +114,27 @@ box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", true)
 fiber = require('fiber')
 ---
 ...
+-- Asynchronously run a function that will:
+-- 1. Wait for the replica join to start.
+-- 2. Make sure that the record about the new replica written to
+--    the _cluster space hits the WAL by writing a row to the test
+--    space. This is important, because at the next step we need
+--    to compute the LSN of the row that is going to be written to
+--    the WAL next so we don't want to race with in-progress WAL
+--    writes.
+-- 3. Inject an error into replication of the next WAL row and write
+--    a row to the test space. This row should break replication.
+-- 4. Resume the replica join.
 test_run:cmd("setopt delimiter ';'")
 ---
 - true
 ...
 _ = fiber.create(function()
    test_run:wait_cond(function() return box.info.replication[2] ~= nil end)
+    box.space.test:auto_increment{'v1'}
    lsn = box.info.vclock[1]
    box.error.injection.set("ERRINJ_RELAY_BREAK_LSN", lsn + 1)
-    box.space.test:auto_increment{'v1'}
+    box.space.test:auto_increment{'v2'}
    box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", false)
 end);
 ---
@@ -156,7 +168,7 @@ found = test_run:grep_log(nil, str, 256, {filename = filename})
 (found:gsub('^.*, req: ', ''):gsub('lsn: %d+', 'lsn: <lsn>'))
 ---
 - '{type: ''INSERT'', replica_id: 1, lsn: <lsn>, space_id: 9000, index_id: 0, tuple:
-  [2, "v1"]}'
+  [3, "v2"]}'
 ...
 test_run:cmd('cleanup server replica')
 ---

--- a/test/xlog/panic_on_broken_lsn.test.lua
+++ b/test/xlog/panic_on_broken_lsn.test.lua
@@ -46,12 +46,24 @@ lsn = -1
 box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", true)

 fiber = require('fiber')
+-- Asynchronously run a function that will:
+-- 1. Wait for the replica join to start.
+-- 2. Make sure that the record about the new replica written to
+--    the _cluster space hits the WAL by writing a row to the test
+--    space. This is important, because at the next step we need
+--    to compute the LSN of the row that is going to be written to
+--    the WAL next so we don't want to race with in-progress WAL
+--    writes.
+-- 3. Inject an error into replication of the next WAL row and write
+--    a row to the test space. This row should break replication.
+-- 4. Resume the replica join.
 test_run:cmd("setopt delimiter ';'")
 _ = fiber.create(function()
    test_run:wait_cond(function() return box.info.replication[2] ~= nil end)
+    box.space.test:auto_increment{'v1'}
    lsn = box.info.vclock[1]
    box.error.injection.set("ERRINJ_RELAY_BREAK_LSN", lsn + 1)
-    box.space.test:auto_increment{'v1'}
+    box.space.test:auto_increment{'v2'}
    box.error.injection.set("ERRINJ_REPLICA_JOIN_DELAY", false)
 end);
 test_run:cmd("setopt delimiter ''");

--- a/test/xlog/suite.ini
+++ b/test/xlog/suite.ini
@@ -16,10 +16,6 @@ fragile = {
            "issues": [ "gh-4952" ],
            "checksums": [ "7c9571a53f3025f02ab23703939a02d6" ]
        },
-        "panic_on_broken_lsn.test.lua": {
-            "issues": [ "gh-4991" ],
-            "checksums": [ "005597305c925b49ed6f247a102486e0" ]
-        },
        "panic_on_wal_error.test.lua": {
            "issues": [ "gh-5348" ],
            "checksums": [ "b874bcc6f69faa2c5654ecc9fe8474de", "7bebfd82b2419c3cf2235f222e835af8" ]