recovery: do not stall replication on unfinished xlog

recover_remaining_wals() doesn't proceed to the next xlog until the current one is finalized (i.e. the EOF marker is written). As a result, if an xlog wasn't finalized for some reason (an IO error occurred or tarantool was killed with SIGKILL), replication will stall on the unfinished xlog forever.

To prevent that, let's differentiate WAL write and WAL rotation events in the WAL watcher and force recover_remaining_wals() to rescan the WAL directory on any WAL rotation, so that it continues to the next WAL even if the current one wasn't properly finalized.

In case of hot standby, we can't reliably detect WAL rotation (e.g. on Mac OS inotify may not work), so also rescan the WAL directory every wal_dir_rescan_delay seconds.

Closes #2294
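
The idea can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: the event flags, the `Xlog` and `Recovery` structs, and both functions are hypothetical stand-ins, and the WAL directory is modeled as an in-memory vector. It only shows why passing a "rescan" hint on rotation lets the reader step past an xlog that never got its EOF marker.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event flags; the real names in wal.h may differ.
enum WalEvent : unsigned {
    WAL_EVENT_WRITE  = 1u << 0, // rows were appended to the current xlog
    WAL_EVENT_ROTATE = 1u << 1, // a new xlog file was created
};

// Simplified model of one xlog file in the WAL directory.
struct Xlog {
    int rows;       // rows written so far
    bool finalized; // true if the EOF marker was written
};

// Simplified reader state (a stand-in for the recovery object).
struct Recovery {
    std::size_t current = 0; // index of the xlog being read
    int rows_read = 0;       // rows already read from that xlog
    int rows_applied = 0;    // total rows applied so far
};

// Read everything available. Move on to the next xlog only if the
// current one is finalized, or if the caller asked for a rescan.
void recover_remaining(Recovery &r, const std::vector<Xlog> &dir, bool rescan)
{
    while (r.current < dir.size()) {
        const Xlog &x = dir[r.current];
        assert(r.rows_read <= x.rows);
        r.rows_applied += x.rows - r.rows_read;
        r.rows_read = x.rows;
        bool has_next = r.current + 1 < dir.size();
        if (!has_next || !(x.finalized || rescan))
            break; // nothing newer, or not allowed to leave this xlog
        r.current++;
        r.rows_read = 0;
    }
}

// Watcher callback: only a rotation event forces a directory rescan.
void on_wal_event(Recovery &r, const std::vector<Xlog> &dir, unsigned events)
{
    recover_remaining(r, dir, (events & WAL_EVENT_ROTATE) != 0);
}
```

With a directory holding an unfinalized xlog followed by a newer one, a write-only event leaves the reader on the first file, while a rotation event moves it to the second. A periodic timer (every wal_dir_rescan_delay seconds in hot standby) could call the same function with the rescan hint set.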
Showing
- src/box/recovery.cc: 40 additions, 32 deletions
- src/box/recovery.h: 1 addition, 1 deletion
- src/box/relay.cc: 5 additions, 3 deletions
- src/box/wal.cc: 19 additions, 15 deletions
- src/box/wal.h: 16 additions, 6 deletions
- src/box/xlog.cc: 17 additions, 2 deletions
- src/errinj.h: 1 addition, 0 deletions
- test/box/errinj.result: 2 additions, 0 deletions
- test/replication/errinj.result: 28 additions, 0 deletions
- test/replication/errinj.test.lua: 9 additions, 0 deletions