Commit 2c547c26 authored by Vladimir Davydov, committed by Konstantin Osipov

box: rework internal garbage collection API

The current gc implementation has a number of flaws:

 - It tracks checkpoints, not consumers, which makes it impossible to
   identify the reason why gc isn't invoked. All we can see is the
   number of users of each particular checkpoint (reference counter),
   while it would be good to know what references it (replica or
   backup).

 - While tracking checkpoints works well for backup and initial join, it
   is a poor fit for subscribe, because a replica is supposed to track
   a vclock, not a checkpoint.

 - Tracking checkpoints from box/gc also violates encapsulation:
   checkpoints are, in fact, memtx snapshots, so they should be tracked
   by memtx engine, not by gc, as they are now. This results in
   atrocities, like having two snap xdirs - one in memtx, another in gc.

 - Garbage collection is invoked by a special internal function,
   box.internal.gc.run(), which is passed the signature of the oldest
   checkpoint to keep. This function is then used by the snapshot daemon
   to maintain the configured number of checkpoints. This brings
   unjustified complexity to the snapshot daemon implementation: instead
   of just calling box.snapshot() periodically, it has to take on the
   responsibility of invoking the garbage collector with the right
   signature. This also means that garbage collection is disabled unless
   the snapshot daemon is running, which is confusing, as the snapshot
   daemon is disabled by default.

So this patch reworks box/gc as follows:

 - Checkpoints are now tracked by the memtx engine and can be accessed
   via a new module box/src/checkpoint.[hc], which provides simple
   wrappers around the corresponding MemtxEngine methods.

 - box/gc.[hc] now tracks not checkpoints, but individual consumers that
   can be registered, unregistered, and advanced. Each consumer has a
   human-readable name displayed by box.internal.gc.info():

   tarantool> box.internal.gc.info()
   ---
   - consumers:
     - name: backup
       signature: 8
     - name: replica 885a81a9-a286-4f06-9cb1-ed665d7f5566
       signature: 12
     - name: replica 5d3e314f-bc03-49bf-a12b-5ce709540c87
       signature: 12
     checkpoints:
     - signature: 8
     - signature: 11
     - signature: 12
   ...

 - box.internal.gc.run() is removed. Garbage collection is now invoked
   automatically by box.snapshot() and doesn't require the snapshot
   daemon to be up and running.
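
The consumer model described above can be illustrated with a minimal,
standalone sketch. It is not the code from this patch: the struct
layout, the fixed-size table, the function signatures and the helper
gc_oldest_needed() are all assumptions made for illustration. Only the
three operations (register, unregister, advance) and the idea that each
consumer has a name and a signature come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { GC_NAME_MAX = 64, GC_CONSUMER_MAX = 16 };

/* Hypothetical consumer record: a human-readable name plus the
 * oldest signature this consumer still needs. */
struct gc_consumer {
	char name[GC_NAME_MAX];
	int64_t signature;
	int in_use;
};

/* Fixed-size table, used here only to keep the sketch short. */
static struct gc_consumer consumers[GC_CONSUMER_MAX];

/* Register a consumer; returns NULL if the table is full. */
struct gc_consumer *
gc_consumer_register(const char *name, int64_t signature)
{
	for (int i = 0; i < GC_CONSUMER_MAX; i++) {
		if (!consumers[i].in_use) {
			snprintf(consumers[i].name, GC_NAME_MAX, "%s", name);
			consumers[i].signature = signature;
			consumers[i].in_use = 1;
			return &consumers[i];
		}
	}
	return NULL;
}

/* Unregister a consumer so that it no longer holds back gc. */
void
gc_consumer_unregister(struct gc_consumer *consumer)
{
	consumer->in_use = 0;
}

/* Advance a consumer; signatures are assumed to only move forward. */
void
gc_consumer_advance(struct gc_consumer *consumer, int64_t signature)
{
	assert(signature >= consumer->signature);
	consumer->signature = signature;
}

/* Hypothetical helper: the oldest signature that must be kept.
 * Returns `newest` when no registered consumer needs anything older. */
int64_t
gc_oldest_needed(int64_t newest)
{
	int64_t oldest = newest;
	for (int i = 0; i < GC_CONSUMER_MAX; i++) {
		if (consumers[i].in_use && consumers[i].signature < oldest)
			oldest = consumers[i].signature;
	}
	return oldest;
}
```

With the consumers from the example output (backup at 8, two replicas
at 12), such a helper would report 8 as the oldest signature in use,
which is consistent with the checkpoint at signature 8 still being
listed there.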
parent 5e20557c
Showing with 626 additions and 354 deletions