Commit e5009dc4, authored by Vladislav Shpilevoy, committed by Kirill Yukhin

raft: don't drop GC when restart relay recovery

When a node becomes a leader, it restarts relay recovery cursors
to re-send all the data since the last acked row.

But during the recovery restart the relay lost the trigger that was
used to update the GC state in the TX thread.

The patch preserves the trigger.

Follow-up for #5433
parent a0a60102
@@ -882,8 +882,10 @@ relay_restart_recovery(struct relay *relay)
 	struct vclock restart_vclock;
 	vclock_copy(&restart_vclock, &relay->recv_vclock);
 	vclock_reset(&restart_vclock, 0, vclock_get(&relay->r->vclock, 0));
+	struct recovery *r = recovery_new(wal_dir(), false, &restart_vclock);
+	rlist_swap(&relay->r->on_close_log, &r->on_close_log);
 	recovery_delete(relay->r);
-	relay->r = recovery_new(wal_dir(), false, &restart_vclock);
+	relay->r = r;
 	recover_remaining_wals(relay->r, &relay->stream, NULL, true);
 }
......
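To illustrate why swapping the `on_close_log` lists keeps the GC trigger alive, here is a minimal, self-contained sketch. It does not use Tarantool's real `rlist`, `recovery`, or trigger types: `struct list`, `list_swap()`, `struct trigger`, `struct recovery`, and `gc_updates_after_restart()` are all hypothetical stand-ins written for this example. The only idea taken from the patch is the order of operations: create the replacement object first, move the trigger list over to it, and only then discard the old one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list head/link (not Tarantool's rlist). */
struct list {
	struct list *prev, *next;
};

static void
list_create(struct list *head)
{
	head->prev = head;
	head->next = head;
}

static void
list_add_tail(struct list *head, struct list *item)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Re-attach a head after its fields were copied from another head. */
static void
list_fix_head(struct list *head, const struct list *other)
{
	if (head->next == other) {
		/* The copied list was empty: it pointed at its own head. */
		list_create(head);
	} else {
		head->next->prev = head;
		head->prev->next = head;
	}
}

/* Exchange the contents of two distinct list heads. */
static void
list_swap(struct list *a, struct list *b)
{
	struct list tmp = *a;
	*a = *b;
	*b = tmp;
	list_fix_head(a, b);
	list_fix_head(b, a);
}

/* Hypothetical trigger: the link is the first member, so a cast recovers it. */
struct trigger {
	struct list link;
	void (*run)(struct trigger *trigger);
};

/* Hypothetical stand-in for a recovery object with an on_close_log list. */
struct recovery {
	struct list on_close_log;
};

static int gc_updates;

static void
count_gc_update(struct trigger *trigger)
{
	(void)trigger;
	gc_updates++;
}

/* Run every trigger registered for the "log closed" event. */
static void
recovery_close_log(struct recovery *r)
{
	struct list *i;
	for (i = r->on_close_log.next; i != &r->on_close_log; i = i->next) {
		struct trigger *t = (struct trigger *)i;
		assert(t->run != NULL);
		t->run(t);
	}
}

/*
 * Simulate a restart and return how many GC updates the new recovery
 * object delivers when it closes a log.
 */
int
gc_updates_after_restart(int preserve_triggers)
{
	struct recovery old_r, new_r;
	struct trigger gc_trigger = { .run = count_gc_update };
	gc_updates = 0;
	list_create(&old_r.on_close_log);
	list_add_tail(&old_r.on_close_log, &gc_trigger.link);
	/* Build the replacement first, then move the triggers over to it. */
	list_create(&new_r.on_close_log);
	if (preserve_triggers)
		list_swap(&old_r.on_close_log, &new_r.on_close_log);
	/* From here on only new_r is used; old_r would be deleted. */
	recovery_close_log(&new_r);
	return gc_updates;
}
```

In this sketch `gc_updates_after_restart(0)` returns 0, because the fresh object starts with an empty trigger list, while `gc_updates_after_restart(1)` returns 1, because the swap hands the existing trigger to the new object before the old one is dropped.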
@@ -100,6 +100,61 @@ s:drop()
| ---
| ...
-- Ensure the restarted recovery correctly propagates GC state. For that create
-- some noise xlog files, snapshots, and check if the relay reports to GC that
-- it does not use them anymore after scanning.
fiber = require('fiber')
| ---
| ...
s = box.schema.create_space('test')
| ---
| ...
_ = s:create_index('pk')
| ---
| ...
s:replace{1}
| ---
| - [1]
| ...
box.snapshot()
| ---
| - ok
| ...
s:replace{2}
| ---
| - [2]
| ...
box.snapshot()
| ---
| - ok
| ...
test_run:wait_lsn('replica', 'default')
| ---
| ...
lsn = test_run:get_lsn('replica', box.info.id)
| ---
| ...
-- Eventually GC should get the last relayed LSN as it is reported on each
-- relayed xlog file.
test_run:wait_cond(function() \
local consumers = box.info.gc().consumers \
assert(#consumers == 1) \
local vclock = consumers[1].vclock \
if vclock[box.info.id] >= lsn then \
return true \
end \
s:replace{3} \
box.snapshot() \
test_run:wait_lsn('replica', 'default') \
return false \
end)
| ---
| - true
| ...
s:drop()
| ---
| ...
test_run:cmd('stop server replica')
| ---
| - true
......
@@ -50,6 +50,34 @@ test_run:switch('default')
assert(not test_run:grep_log('default', 'XlogGapError', 1000))
s:drop()
-- Ensure the restarted recovery correctly propagates GC state. For that create
-- some noise xlog files, snapshots, and check if the relay reports to GC that
-- it does not use them anymore after scanning.
fiber = require('fiber')
s = box.schema.create_space('test')
_ = s:create_index('pk')
s:replace{1}
box.snapshot()
s:replace{2}
box.snapshot()
test_run:wait_lsn('replica', 'default')
lsn = test_run:get_lsn('replica', box.info.id)
-- Eventually GC should get the last relayed LSN as it is reported on each
-- relayed xlog file.
test_run:wait_cond(function() \
local consumers = box.info.gc().consumers \
assert(#consumers == 1) \
local vclock = consumers[1].vclock \
if vclock[box.info.id] >= lsn then \
return true \
end \
s:replace{3} \
box.snapshot() \
test_run:wait_lsn('replica', 'default') \
return false \
end)
s:drop()
test_run:cmd('stop server replica')
test_run:cmd('delete server replica')
box.cfg{ \
......