Commit 69aee6fc authored 5 years ago by Vladimir Davydov
vinyl: fix secondary index divergence on update

If an UPDATE request doesn't touch key parts of a secondary index, we
don't need to re-index it in the in-memory secondary index, as this
would only increase IO load. Historically, we use column mask set by the
UPDATE operation to skip secondary indexes that are not affected by the
operation on commit. However, there's a problem here: the column mask
isn't precise - it may have a bit set even if the corresponding column
value isn't changed by the update operation, e.g. consider {'+', 2, 0}.
Not taking this into account may result in appearance of phantom tuples
on disk as the write iterator assumes that statements that have no
effect aren't written to secondary indexes (this is needed to apply
INSERT+DELETE "annihilation" optimization). We fixed that by clearing
column mask bits in vy_tx_set in case we detect that the key isn't
changed, for more details see #3607 and commit e72867cb ("vinyl: fix
appearance of phantom tuple in secondary index after update"). It was
rather an ugly hack, but it worked.

However, it turned out that apart from looking hackish this code has
a nasty bug that may lead to tuples missing from secondary indexes.
Consider the following example:

  s = box.schema.space.create('test', {engine = 'vinyl'})
  s:create_index('pk')
  s:create_index('sk', {parts = {2, 'unsigned'}})
  s:insert{1, 1, 1}

  box.begin()
  s:update(1, {{'=', 2, 2}})
  s:update(1, {{'=', 3, 2}})
  box.commit()

The first update operation writes DELETE{1,1} and REPLACE{2,1} to the
secondary index write set. The second update replaces REPLACE{2,1} with
DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with
REPLACE{2,1} in the write set, we assume that the update doesn't modify
secondary index key parts and clear the column mask so as not to commit
a pointless request, see vy_tx_set. As a result, we skip the first
update too and get key {2,1} missing from the secondary index.

Actually, it was a dumb idea to use column mask to skip statements in
the first place, as there's a much easier way to filter out statements
that have no effect for secondary indexes. The thing is every DELETE
statement inserted into a secondary index write set acts as a "single
DELETE", i.e. there's exactly one older statement it is supposed to
purge. This is, because in contrast to the primary index we don't write
DELETE statements blindly - we always look up the tuple overwritten in
the primary index first. This means that REPLACE+DELETE for the same key
is basically a no-op and can be safely skip. Moreover, DELETE+REPLACE
can be treated as no-op, too, because secondary indexes don't store full
tuples hence all REPLACE statements for the same key are equivalent.
By marking both statements as no-op in vy_tx_set, we guarantee that
no-op statements don't make it to secondary index memory or disk levels.

Closes #4242
parent d07e3ac2
No related branches found
No related tags found
No related merge requests found
Hide whitespace changes
Inline Side-by-side
Showing with 74 additions and 66 deletions
Please register or to comment