Skip to content
Snippets Groups Projects

fix: bug in raft snapshot application

Merged Georgy Moshkin requested to merge gmoshkin/raft-msg-reject-term-0 into master

Summary

  • fix: bug in raft snapshot application

There was a hard-to-reproduce bug in our snapshot application code. We always compact the raft log before applying the snapshot, because the snapshot replaces the entries and some of the logic in raft-rs seems to rely on this. The problem was, that our compact_log function would not remove any unapplied entries, which makes sense for compaction triggered automatically by raft log size, but doesn't make sense for raft snapshot, as the snapshot contains the state corresponding to the newer entries. The fix is simple: don't guard from unapplied entry compaction in case the compaction is for raft snapshot.

We don't add any regression tests for this, because the implementation would be too difficult and would need us to pollute the code with error injection logic, which is not a worthy trade off in this case. But also the logic will still be tested, because this bug was responsible for a large amount of flaky tests, so we should see a significant reduction in flakiness from now on in tests concerning raft snapshots.

  • test: reduce test_large_snapshot flakiness

Edited by Georgy Moshkin

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Егор Ивков approved this merge request

    approved this merge request

  • requested review from @e-ivkov

  • Georgy Moshkin added 8 commits

    added 8 commits

    • 12cb955f - chore: document tips on debugging python tests
    • 9d98ba5b - ci: annotate test job log using collapsible sections
    • 62413f03 - chore: fix adr naming convention so files are properly ordered by date
    • cd1802d9 - feat(sql): support VARCHAR without limit
    • 644918d9 - test: reduce test_large_snapshot flakiness
    • 87f169a0 - fix: bug in raft snapshot application
    • dd0adf33 - test: fix CI=1 hack for local runs
    • f5b5055b - refactor: assert index is <= applied in compact_log instead of truncating

    Compare with previous version

  • Georgy Moshkin changed the description

    changed the description

  • Georgy Moshkin resolved all threads

    resolved all threads

  • Georgy Moshkin resolved all threads

    resolved all threads

  • Georgy Moshkin added 4 commits

    added 4 commits

    • 36a59324 - test: reduce test_large_snapshot flakiness
    • e7c8582d - fix: bug in raft snapshot application
    • 1161519a - test: fix CI=1 hack for local runs
    • 0957624d - refactor: assert index is <= applied in compact_log instead of truncating

    Compare with previous version

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading