vinyl: fix secondary index divergence on update
If an UPDATE request doesn't touch key parts of a secondary index, we don't need to re-index it in the in-memory secondary index, as this would only increase IO load. Historically, we use column mask set by the UPDATE operation to skip secondary indexes that are not affected by the operation on commit. However, there's a problem here: the column mask isn't precise - it may have a bit set even if the corresponding column value isn't changed by the update operation, e.g. consider {'+', 2, 0}. Not taking this into account may result in appearance of phantom tuples on disk as the write iterator assumes that statements that have no effect aren't written to secondary indexes (this is needed to apply INSERT+DELETE "annihilation" optimization). We fixed that by clearing column mask bits in vy_tx_set in case we detect that the key isn't changed, for more details see #3607 and commit e72867cb ("vinyl: fix appearance of phantom tuple in secondary index after update"). It was rather an ugly hack, but it worked. However, it turned out that apart from looking hackish this code has a nasty bug that may lead to tuples missing from secondary indexes. Consider the following example: s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 1, 1} box.begin() s:update(1, {{'=', 2, 2}}) s:update(1, {{'=', 3, 2}}) box.commit() The first update operation writes DELETE{1,1} and REPLACE{2,1} to the secondary index write set. The second update replaces REPLACE{2,1} with DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with REPLACE{2,1} in the write set, we assume that the update doesn't modify secondary index key parts and clear the column mask so as not to commit a pointless request, see vy_tx_set. As a result, we skip the first update too and get key {2,1} missing from the secondary index. Actually, it was a dumb idea to use column mask to skip statements in the first place, as there's a much easier way to filter out statements that have no effect for secondary indexes. The thing is every DELETE statement inserted into a secondary index write set acts as a "single DELETE", i.e. there's exactly one older statement it is supposed to purge. This is, because in contrast to the primary index we don't write DELETE statements blindly - we always look up the tuple overwritten in the primary index first. This means that REPLACE+DELETE for the same key is basically a no-op and can be safely skip. Moreover, DELETE+REPLACE can be treated as no-op, too, because secondary indexes don't store full tuples hence all REPLACE statements for the same key are equivalent. By marking both statements as no-op in vy_tx_set, we guarantee that no-op statements don't make it to secondary index memory or disk levels. Closes #4242
Showing
- src/box/vinyl.c 3 additions, 4 deletionssrc/box/vinyl.c
- src/box/vy_tx.c 21 additions, 49 deletionssrc/box/vy_tx.c
- src/box/vy_tx.h 6 additions, 11 deletionssrc/box/vy_tx.h
- test/engine/update.result 30 additions, 0 deletionstest/engine/update.result
- test/engine/update.test.lua 12 additions, 0 deletionstest/engine/update.test.lua
- test/vinyl/write_iterator.result 1 addition, 1 deletiontest/vinyl/write_iterator.result
- test/vinyl/write_iterator.test.lua 1 addition, 1 deletiontest/vinyl/write_iterator.test.lua
Loading
Please register or sign in to comment