Commit e755ad24 authored by Vladimir Davydov

Fix replication freeze if slave bumps lsn while master is down

To avoid rescanning the last recovered xlog in case it has been properly
finalized, recover_remaining_wals() skips xlogs whose signature is less
than the signature of the current recovery position. This check is
incorrect if the function is used for replication. For example, consider
the following scenario in case of master -> slave replication:

 1. Master temporarily shuts down.
 2. Slave bumps its LSN while master is down.
 3. Master is brought back online.
 4. Slave reconnects to master.

In such a case the recovery vclock signature sent by slave on reconnect
will be greater than the signature of the xlog file created after master
restart, causing replication to silently freeze.
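The failing comparison can be illustrated with a minimal sketch. It
assumes that a vclock signature is the sum of its per-replica LSNs. All
sketch_* names and the LSN values used with them are hypothetical and
are not taken from the actual recovery code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SKETCH_VCLOCK_MAX = 4 };

/* Simplified vclock: one LSN per replica id (hypothetical layout). */
struct sketch_vclock {
    int64_t lsn[SKETCH_VCLOCK_MAX];
};

/* Assumption: the signature is the sum of all LSN components. */
static int64_t
sketch_vclock_signature(const struct sketch_vclock *vclock)
{
    int64_t sum = 0;
    for (int i = 0; i < SKETCH_VCLOCK_MAX; i++)
        sum += vclock->lsn[i];
    return sum;
}

/*
 * Old check: an xlog is skipped if its signature is less than the
 * signature of the current recovery position.
 */
static bool
sketch_old_check_skips(const struct sketch_vclock *xlog_vclock,
                       const struct sketch_vclock *recovery_vclock)
{
    return sketch_vclock_signature(xlog_vclock) <
           sketch_vclock_signature(recovery_vclock);
}
```

For instance, if master (id 1) restarts at LSN 100 and slave (id 2)
has written 5 rows of its own in the meantime, the xlog created after
restart has signature 100 while the position sent by slave has
signature 105, so the old check skips the very file that will receive
new master rows.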

Instead of comparing xlog signature to recovery position, we should
compare it to the signature of the last scanned xlog. To do that, we
need to remove TRASH() from xlog_cursor_close() so that xlog cursor
meta isn't overwritten on close. To make sure nobody attempts to use a
closed cursor, let's add corresponding assertions to each public xlog
cursor function.
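A rough sketch of the idea follows. It is not the actual patch: the
sketch_* names, the state enum, the -1 "nothing scanned yet" marker and
the use of <= in the comparison are all assumptions made for
illustration. The point is that close() only changes the state and
leaves the meta readable, while every other entry point asserts that
the cursor is open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_cursor_state {
    SKETCH_CURSOR_CLOSED = 0,
    SKETCH_CURSOR_ACTIVE,
};

/* Hypothetical cursor: state plus the signature from the xlog meta. */
struct sketch_cursor {
    enum sketch_cursor_state state;
    int64_t meta_signature; /* -1 means no xlog has been scanned yet */
};

static bool
sketch_cursor_is_open(const struct sketch_cursor *cursor)
{
    return cursor->state != SKETCH_CURSOR_CLOSED;
}

static void
sketch_cursor_open(struct sketch_cursor *cursor, int64_t xlog_signature)
{
    assert(!sketch_cursor_is_open(cursor));
    cursor->state = SKETCH_CURSOR_ACTIVE;
    cursor->meta_signature = xlog_signature;
}

/* Stand-in for any public cursor function: refuses a closed cursor. */
static int
sketch_cursor_next(struct sketch_cursor *cursor)
{
    assert(sketch_cursor_is_open(cursor));
    return 0;
}

/* Close without trashing the struct, so meta survives the call. */
static void
sketch_cursor_close(struct sketch_cursor *cursor)
{
    assert(sketch_cursor_is_open(cursor));
    cursor->state = SKETCH_CURSOR_CLOSED;
}

/*
 * New check: compare against the last scanned xlog rather than the
 * recovery position. Using <= here is an assumption of this sketch.
 */
static bool
sketch_new_check_skips(int64_t xlog_signature,
                       const struct sketch_cursor *last_scanned)
{
    return xlog_signature <= last_scanned->meta_signature;
}
```

With a last scanned xlog of signature 50, a new xlog of signature 100
is not skipped under this check, regardless of how far the recovery
position reported by slave has advanced.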

Fixes b25c60f0 ("recovery: do not rescan last xlog")

Closes #3038
parent 7d843674