
Fix/large snapshots

Georgy Moshkin requested to merge fix/large-snapshots into master

Summary

  • Bump tarantool-module -> master

  • Add support for receiving raft snapshot in chunks.

    • This is implemented by opening a read view (which required patching tarantool) onto the global spaces for the duration of snapshot application. The follower will block its raft main loop until one of the following happens:

      • raft leader becomes unknown
      • the snapshot producer responds with an error saying that the snapshot read view has disappeared
      • all of the chunks are received and the whole snapshot data is handled

      The follower may still respond to RPC messages during this time, but will not send any raft messages to anybody, which AFAIK is OK with respect to raft-rs.
    • If the snapshot fits into a single chunk, the read view is closed immediately and only the chunk is cached. Otherwise the read view is kept open for as long as someone needs it (simple reference counting is used) or until a configurable timeout is reached.

    • When a snapshot is requested by a new follower, the cached one is reused, unless it was generated before the last log truncation.

    • The snapshot cache is cleaned up only when a new snapshot is generated.

  • New pico properties added:

    • snapshot_chunk_max_size which determines (approximately) the size of the snapshot chunk
    • snapshot_read_view_close_timeout which determines when the old snapshot cache (and read views) should be cleaned up
  • Raft main loop has also been refactored to accommodate the new blocking snapshot application process:

    • Raft soft state is handled before anything else, so that we know who's leader ASAP
    • Raft read states are handled right after the first batch of messages, so that any fibers waiting for the read barrier may proceed while the snapshot chunks are being fetched. Although now that I've written this out, it makes no sense: how is a read state going to arrive in the same batch as a snapshot? Oh well..
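
For illustration, the chunked receive described above could be sketched roughly as follows. This is not the code from this merge request: `split_into_chunks`, `receive_snapshot`, `ChunkResponse` and `ApplyOutcome` are hypothetical names, and the rule "close a chunk once it reaches the limit" is only an assumption about what "approximately" means for snapshot_chunk_max_size. The loop exits on exactly the three conditions listed in the summary.

```rust
/// Why the follower stopped applying a chunked snapshot (hypothetical type).
#[derive(Debug, PartialEq)]
enum ApplyOutcome {
    /// All chunks were received and handled.
    Applied { chunks: usize },
    /// The raft leader became unknown while fetching.
    LeaderUnknown,
    /// The producer reported that the snapshot read view has disappeared.
    ReadViewGone,
}

/// Producer's answer to a "give me the next chunk" request (hypothetical type).
enum ChunkResponse {
    Chunk { data: Vec<u8>, is_last: bool },
    ReadViewNotAvailable,
}

/// Producer side: groups serialized tuples into chunks.
/// Assumption: a chunk is closed as soon as its size reaches `max_size`,
/// so a chunk may exceed the limit by at most one tuple.
fn split_into_chunks(tuples: &[Vec<u8>], max_size: usize) -> Vec<Vec<u8>> {
    let mut chunks: Vec<Vec<u8>> = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    for tuple in tuples {
        current.extend_from_slice(tuple);
        if current.len() >= max_size {
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Follower side: blocks until one of the three exit conditions is met.
/// `first` is the chunk that arrived with the raft snapshot message,
/// `fetch_next(n)` asks the producer for chunk number `n` (zero-based).
fn receive_snapshot(
    first: ChunkResponse,
    mut leader_known: impl FnMut() -> bool,
    mut fetch_next: impl FnMut(usize) -> ChunkResponse,
    mut apply: impl FnMut(&[u8]),
) -> ApplyOutcome {
    let mut response = first;
    let mut received = 0;
    loop {
        match response {
            ChunkResponse::ReadViewNotAvailable => return ApplyOutcome::ReadViewGone,
            ChunkResponse::Chunk { data, is_last } => {
                apply(&data);
                received += 1;
                if is_last {
                    return ApplyOutcome::Applied { chunks: received };
                }
            }
        }
        // Check leadership before asking for more data.
        if !leader_known() {
            return ApplyOutcome::LeaderUnknown;
        }
        response = fetch_next(received);
    }
}
```

In this sketch a snapshot that fits into a single chunk arrives with `is_last: true` and the loop returns without ever calling `fetch_next`, which mirrors the case where the producer can close the read view immediately.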

Close #335 (closed)

Ensure that

  • New code is covered by tests
  • API is documented
  • Changelog is up to date
  • (if Lua API changed) Lua API version is bumped in luamod.rs
  • (if API docs changed) A follow-up doc issue is created in picodata/docs and linked here

Edited by Yaroslav Dynnikov
