Commit 2993a758 authored by Vladimir Davydov, committed by Roman Tsisyk

vinyl: rework replication

Currently, on initial join we send the current vinyl state. To do that,
we open a read iterator over a space's primary index and send statements
returned by it. Such an approach has a number of inherent problems:

 - An open read iterator blocks compaction, which is unacceptable for
   such a long operation as join. To avoid blocking compaction, we open
   the iterator in the dirty mode, i.e. it skims over the tops. This,
   however, introduces a different kind of problem: the threshold
   between the initial and final join phases becomes hazy. Statements
   sent on final join may or may not be among those sent during the
   initial join, and there's no efficient way to differentiate between
   them without sending extra information.

 - The replica expects LSNs to be growing monotonically. This constraint
   is imposed by the lsregion allocator used for storing statements in
   memory, but the read iterator returns statements ordered by key, not
   by LSN. Currently, the replica simply crashes if statements are sent
   in non-chronological order, which renders vinyl replication
   unusable. In the scope of the current model, we can't fix
   this by assigning fake LSNs to statements received on initial join,
   because there's no strict LSN threshold between initial and final
   join phases (see the previous paragraph).

 - In the initial join phase, the replica is only aware of spaces that
   were created before the last snapshot, while vinyl sends statements
   from spaces that exist now. As a result, if a space was created after
   the most recent snapshot, the replica won't be able to receive its
   tuples and will fail.
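
The ordering problem in the second item can be illustrated with a small
sketch. It is not vinyl code: the (key, lsn) pairs and the helper name
is_strictly_growing are made up for illustration. It only shows that
walking statements in key order generally does not yield growing LSNs.

```python
# Hypothetical (key, lsn) statements; the values are made up for illustration.
statements = [("a", 7), ("b", 3), ("c", 9), ("d", 5)]


def is_strictly_growing(lsns):
    """Return True if every LSN is greater than the one before it."""
    return all(prev < cur for prev, cur in zip(lsns, lsns[1:]))


# A read iterator walks the index in key order.
lsns_in_key_order = [lsn for _key, lsn in sorted(statements)]

# Chronological order is the order in which the statements were written.
lsns_in_lsn_order = sorted(lsn for _key, lsn in statements)

print(is_strictly_growing(lsns_in_key_order))  # False
print(is_strictly_growing(lsns_in_lsn_order))  # True
```

A receiver that requires growing LSNs would reject the key-ordered
stream at the second statement.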

To address the above-mentioned problems, we make vinyl initial join send
the latest snapshot, just as memtx does. We implement this by
loading the vinyl state from the last snapshot of the metadata log and
sending statements of all runs from the snapshot as is (including
deletes and updates), to be applied by the replica. To make lsregion at
the receiving end happy, we assign fake monotonically growing LSNs to
statements received on initial join. This is OK, because

  any LSN from final join > max real LSN from initial join
  max real LSN from initial join >= max fake LSN

hence

  any LSN from final join > any fake LSN from initial join
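
The argument above can be checked on toy data. The sketch below is only
an illustration, not the patch's implementation: assign_fake_lsns, the
counter starting at 1, and the sample statements are all assumptions.
In this toy data every statement has a distinct positive real LSN, so
the number of statements cannot exceed the largest real LSN.

```python
def assign_fake_lsns(statements, first_lsn=1):
    """Replace each real LSN with a counter that grows in arrival order."""
    return [(key, fake) for fake, (key, _real) in enumerate(statements, first_lsn)]


# Hypothetical snapshot contents, sent in key order: (key, real_lsn).
initial_join = [("a", 7), ("b", 3), ("c", 9), ("d", 5)]
# Hypothetical statements written after the snapshot was taken.
final_join = [("e", 10), ("b", 11)]

received = assign_fake_lsns(initial_join)
max_real = max(lsn for _key, lsn in initial_join)
max_fake = max(lsn for _key, lsn in received)

# max real LSN from initial join >= max fake LSN
assert max_real >= max_fake
# any LSN from final join > any fake LSN from initial join
assert all(lsn > max_fake for _key, lsn in final_join)

print(received)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```

The replica then sees one growing LSN sequence across both phases, even
though the initial join statements arrived in key order.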

Besides fixing vinyl replication, this patch also enables the
replication test suite for the vinyl engine (except for hot_standby)
and makes engine/replica_join cover the following test cases:
 - secondary indexes
 - delete and update statements
 - keys added in an order different from LSN
 - recreate space after checkpoint

Closes #1911
Closes #2001
parent ac87c66b