fix: bug in raft snapshot application
Summary
- fix: bug in raft snapshot application
There was a hard-to-reproduce bug in our snapshot application code. We always compact the raft log before applying the snapshot, because the snapshot replaces the entries and some of the logic in raft-rs seems to rely on this. The problem was, that our compact_log function would not remove any unapplied entries, which makes sense for compaction triggered automatically by raft log size, but doesn't make sense for raft snapshot, as the snapshot contains the state corresponding to the newer entries. The fix is simple: don't guard from unapplied entry compaction in case the compaction is for raft snapshot.
We don't add any regression tests for this, because the implementation would be too difficult and would need us to pollute the code with error injection logic, which is not a worthy trade off in this case. But also the logic will still be tested, because this bug was responsible for a large amount of flaky tests, so we should see a significant reduction in flakiness from now on in tests concerning raft snapshots.
- test: reduce test_large_snapshot flakiness
- Close #1167 (closed), #779 (closed)
- Cherry-pick to: none
- Docs follow-up: not necessary
Merge request reports
Activity
added 1 commit
- fb211b2e - refactor: assert index is <= applied in compact_log instead of truncating
added 1 commit
- 3491c4d7 - refactor: assert index is <= applied in compact_log instead of truncating
added 1 commit
- b89e5e11 - refactor: assert index is <= applied in compact_log instead of truncating
added 1 commit
- 45ba4e17 - refactor: assert index is <= applied in compact_log instead of truncating
added 1 commit
- ceb07c78 - refactor: assert index is <= applied in compact_log instead of truncating
added 1 commit
- fbbd4d6d - refactor: assert index is <= applied in compact_log instead of truncating
- Resolved by Georgy Moshkin
- Resolved by Georgy Moshkin
There might be tests marked
xfail
for being too flacky due to this issue. Have you checked for them?Edited by Егор Ивков
requested review from @e-ivkov
assigned to @gmoshkin
added 8 commits
- 12cb955f - chore: document tips on debugging python tests
- 9d98ba5b - ci: annotate test job log using collapsible sections
- 62413f03 - chore: fix adr naming convention so files are properly ordered by date
- cd1802d9 - feat(sql): support VARCHAR without limit
- 644918d9 - test: reduce test_large_snapshot flakiness
- 87f169a0 - fix: bug in raft snapshot application
- dd0adf33 - test: fix CI=1 hack for local runs
- f5b5055b - refactor: assert index is <= applied in compact_log instead of truncating
Toggle commit list