gc: rely on minimal vclock components instead of signatures
The current WAL GC implementation tracks consumers (i.e. remote replicas) by their vclock signature, which is the sum of all vclock components. This approach is wrong, as the following example shows. The example is somewhat synthetic, but it illustrates the problem.

Say you have 2 masters, A and B, with ids 1 and 2 respectively, and a replica C with id 3. C replicates from both A and B, and there is no replication between A and B (say, the instances were reconfigured to not replicate from each other).

Now, say replica C has followed A and B to vclock {1:5, 2:13}. At the same time, A has lsn 10 and B has lsn 15. A and B do not know about each other's changes, so A's vclock is {1:10} and B's vclock is {2:15}.

Now imagine A does a snapshot and creates a new xlog with signature 10. A's directory will look like:

    00…000.xlog
    00…010.snap
    00…010.xlog

Replica C reports its vclock {1:5, 2:13} to A, and A uses the vclock to update the corresponding GC consumer. Since signatures are used, the GC consumer is assigned signature 13 + 5 = 18. This is greater than the signature of the last xlog on A (which is 10), so the previous xlog (00…000.xlog) can be deleted (at least A assumes it can be). In fact, the replica still needs 00…000.xlog, because it contains the rows corresponding to vclocks {1:6} - {1:10}, which haven't been replicated yet.

If GC consumers used vclocks instead of vclock signatures, this problem wouldn't arise. The replica would report its vclock {1:5, 2:13}. That vclock is NOT strictly greater than A's most recent xlog vclock ({1:10}), so the previous log is kept until the replica reports a vclock {1:10, 2:something} or {1:11, …} and so on.

Rewrite gc to perform cleanup based on the minimal vclock components present in at least one of the consumer vclocks, instead of just comparing vclock signatures.

Prerequisite #4114
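The difference between the two checks in the example above can be sketched in a few lines of Python. This is only an illustration, not the actual gc code: the helper names (`signature`, `min_components`, `can_delete_by_signature`, `can_delete_by_vclock`) are hypothetical, a vclock is modelled as a plain dict from replica id to lsn, and a component missing from a consumer vclock is assumed to be 0.

```python
from typing import Dict, List

Vclock = Dict[int, int]  # replica id -> lsn (illustrative model only)


def signature(vclock: Vclock) -> int:
    """Vclock signature: the sum of all vclock components."""
    return sum(vclock.values())


def min_components(consumers: List[Vclock]) -> Vclock:
    """For each replica id present in at least one consumer vclock,
    take the minimal lsn among the consumers that have it."""
    result: Vclock = {}
    for vclock in consumers:
        for replica_id, lsn in vclock.items():
            result[replica_id] = min(result.get(replica_id, lsn), lsn)
    return result


def can_delete_by_signature(consumers: List[Vclock], next_xlog: Vclock) -> bool:
    """Old check: the previous xlog is deletable once every consumer
    signature has reached the signature of the next xlog."""
    return all(signature(c) >= signature(next_xlog) for c in consumers)


def can_delete_by_vclock(consumers: List[Vclock], next_xlog: Vclock) -> bool:
    """Sketched new check: every component of the next xlog vclock must be
    reached by the minimal consumer components (missing component = 0,
    which is an assumption of this sketch)."""
    if not consumers:
        return True
    mins = min_components(consumers)
    return all(mins.get(rid, 0) >= lsn for rid, lsn in next_xlog.items())


# The scenario from the example: C reports {1:5, 2:13} to A, whose
# newest xlog starts at vclock {1:10}.
consumers = [{1: 5, 2: 13}]
next_xlog = {1: 10}
print(can_delete_by_signature(consumers, next_xlog))  # True: 18 >= 10, xlog wrongly dropped
print(can_delete_by_vclock(consumers, next_xlog))     # False: 5 < 10, xlog kept
```

Once C reports {1:10, 2:13}, the component-wise check also allows the previous xlog to be removed, which matches the behaviour described above.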