Commit 5a61c471 authored by Nikita Pettik

vinyl: rework upsert operation

The previous upsert implementation had a few drawbacks which led to a
number of bugs and issues.

Issue #5092 (redundant update operations execution)

In a nutshell, applying an upsert on top of another upsert consists of
two actions (see vy_apply_upsert()): execute and squash. Consider the
following example:

insert({1, 1})  -- terminal statement, stored on disk
upsert({1}, {{'-', 2, 20}}) -- old ups1
upsert({1}, {{'+', 2, 10}}) -- new ups2

The 'execute' step takes the update operations of the new upsert and
applies them to the tuple of the old upsert. Here {1} + {'+', 2, 10}
can't be evaluated, since the old upsert's tuple consists of only one
field. Note that when an upsert doesn't fold into an insert, the
upsert's tuple and the tuple stored in the index can differ. In this
particular case the tuple stored on disk has two fields ({1, 1}), so the
upsert's update operation can be applied to it:
{1, 1} + {'+', 2, 10} --> {1, 11}. If an upsert's operations can't be
executed on the tuple of the old upsert, we simply proceed to the squash
step.
'Squash', in turn, combines the update operations: arithmetic operations
on the same field are folded into one, and the remaining operations are
merged into a single array. As a result, we get one upsert with squashed
operations: upsert({1}, {{'+', 2, -10}}).
Then vy_apply_upsert() is called again to apply the new upsert on top of
the terminal statement, insert({1, 1}). Since that tuple has a second
field, the update operations can be executed, and the result is
{1, -9}. This is the final result of the upsert application procedure.
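
The execute and squash steps described above can be sketched with a
small toy model. This is plain Python, not vinyl code: the helper names
(apply_ops, squash, old_apply_upsert) are hypothetical, tuples are
modelled as lists with 1-based field numbers, and only '+' and '-' are
supported.

```python
def apply_ops(tup, ops):
    """Apply arithmetic ops to a copy of tup; fail if a field is missing."""
    result = list(tup)
    for op, field, value in ops:
        if field > len(result):
            raise IndexError(f"field {field} does not exist")
        result[field - 1] += value if op == '+' else -value
    return result


def squash(old_ops, new_ops):
    """Fold arithmetic ops on the same field into a single '+' op."""
    delta = {}
    for op, field, value in old_ops + new_ops:
        delta[field] = delta.get(field, 0) + (value if op == '+' else -value)
    return [('+', field, d) for field, d in delta.items()]


def old_apply_upsert(old, new):
    """Old scheme: try 'execute' on the old upsert's tuple, then 'squash'."""
    old_tuple, old_ops = old
    _, new_ops = new
    try:
        tup = apply_ops(old_tuple, new_ops)   # 'execute' step
    except IndexError:
        tup = list(old_tuple)                 # not executable: keep the tuple
    return tup, squash(old_ops, new_ops)      # 'squash' step always runs


ups1 = ([1], [('-', 2, 20)])   # old upsert
ups2 = ([1], [('+', 2, 10)])   # new upsert
merged = old_apply_upsert(ups1, ups2)
# Apply the squashed upsert on top of the terminal statement insert({1, 1}).
result = apply_ops([1, 1], merged[1])
print(merged, result)
```

In this model the execute step fails on the one-field tuple {1}, the
squash step yields {'+', 2, -10}, and the terminal statement turns into
{1, -9}, matching the walkthrough above.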
Now imagine that we have the following upserts:

upsert({1, 1}, {{'-', 2, 20}}) -- old ups1
upsert({1}, {{'+', 2, 10}}) -- new ups2

In this case the execute step succeeds and modifies the old upsert's
tuple: {1, 1} + {'+', 2, 10} --> {1, 11}
However, we still have to squash/accumulate the update operations, since
they may later be applied to the tuple stored on disk. In the end we get
the following upsert: upsert({1, 11}, {{'+', 2, -10}}). It is then
applied on top of insert({1, 1}) and we get the same result as in the
first case: {1, -9}. The only difference is that the upsert's tuple was
modified.
As one can see, executing update operations on the upsert's tuple is
redundant when the index already contains a tuple with the same key
(i.e. when the upsert turns into an update). Instead, we can merely
accumulate/squash the update operations. When the last upsert is
applied, we execute all update operations either on the tuple fetched
from the index (i.e. the upsert is an update) or on the tuple specified
in the first upsert (i.e. the first upsert is an insert).
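
The accumulate-only idea can be illustrated with another toy model
(plain Python, not vinyl code; resolve and apply_ops are hypothetical
names). Operations are only collected per upsert and executed once, at
the moment the chain is resolved against a terminal statement or against
nothing at all.

```python
def apply_ops(tup, ops):
    """Apply '+'/'-' ops (1-based field numbers) to a copy of tup."""
    result = list(tup)
    for op, field, value in ops:
        result[field - 1] += value if op == '+' else -value
    return result


def resolve(terminal, upserts):
    """Resolve a chain of upserts, oldest first.

    terminal is the tuple found in the index, or None if there is none.
    Each upsert is a (tuple, ops) pair.
    """
    if terminal is not None:
        # Every upsert acts as an update of the stored tuple.
        tup, pending = terminal, upserts
    else:
        # The first upsert acts as an insert; only later ops are executed.
        tup, pending = upserts[0][0], upserts[1:]
    for _, ops in pending:
        tup = apply_ops(tup, ops)
    return list(tup)


upserts = [
    ([1, 1], [('-', 2, 20)]),   # old ups1
    ([1],    [('+', 2, 10)]),   # new ups2
]
as_update = resolve([1, 1], upserts)   # index already holds {1, 1}
as_insert = resolve(None, upserts)     # nothing stored under this key
print(as_update, as_insert)
```

With a stored tuple the model gives {1, -9}; without one it gives
{1, 11}. Nothing is executed on an upsert's tuple before the chain is
resolved.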

Issue #5105 (upsert doesn't follow associative property)

Secondly, the current approach breaks the associative property: after
the update operations of several upserts are merged into one array, some
of them (those belonging to one upsert) may need to be skipped while the
rest are applied. For instance:

-- The space name is arbitrary; the engine must be vinyl.
s = box.schema.space.create('test', {engine = 'vinyl'})
-- Index is over second field.
i = s:create_index('pk', {parts={2, 'uint'}})
s:replace{1, 2, 3, 'default'}
s:upsert({2, 2, 2}, {{'=', 4, 'upserted'}})
-- First update operation modifies primary key, so upsert must be ignored.
s:upsert({2, 2, 2}, {{'#', 1, 1}, {'!', 3, 1}})

After merging the two upserts we get the following one:
upsert({2, 2, 2}, {{'=', 4, 'upserted'}, {'#', 1, 1}, {'!', 3, 1}})

While executing update operations, we don't distinguish between
operations from different upserts. Thus, if one operation fails, the
rest are ignored as well. As a result, the first upsert (in the general
case, all preceding squashed upserts) won't be applied, even though it
is absolutely correct. What is more, the user gets no error or warning
about this.
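
A toy model (plain Python, not vinyl code) makes the difference
visible. It assumes that an upsert is rejected when the primary key
field of the result differs from the original one; apply_ops and
apply_or_ignore are hypothetical helpers supporting only '=', '#' and
'!'.

```python
PK_FIELD = 2  # the index is over the second field, as in the example


def apply_ops(tup, ops):
    """Apply '=', '#', '!' ops (1-based fields); reject primary key changes."""
    result = list(tup)
    for op, field, arg in ops:
        if op == '=':
            result[field - 1] = arg
        elif op == '#':
            del result[field - 1:field - 1 + arg]   # delete arg fields
        elif op == '!':
            result.insert(field - 1, arg)           # insert before field
    if len(result) < PK_FIELD or result[PK_FIELD - 1] != tup[PK_FIELD - 1]:
        raise ValueError("primary key is modified")
    return result


def apply_or_ignore(tup, ops):
    """All-or-nothing: if the op array fails, the tuple is left untouched."""
    try:
        return apply_ops(tup, ops)
    except ValueError:
        return list(tup)


stored = [1, 2, 3, 'default']
first = [('=', 4, 'upserted')]
second = [('#', 1, 1), ('!', 3, 1)]

# Upserts applied one after another: only the second one is ignored.
one_by_one = apply_or_ignore(apply_or_ignore(stored, first), second)
# Upserts merged into one array first: both are silently dropped.
merged = apply_or_ignore(stored, first + second)
print(one_by_one, merged)
```

Applied separately, the first upsert survives and field 4 becomes
'upserted'; merged into one array, the valid first upsert is lost
together with the invalid second one.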

Issue #1622 (no upsert result validation)

After an upsert is applied, there's no check verifying that the result
satisfies the space's format: number of fields, their types, overflows,
etc. Because of this, tuples violating the format may appear in the
space, which in turn may lead to unpredictable consequences.

To resolve these issues, let's group the update operations of each
upsert into a separate array, so that the operations related to a
particular upsert are stored together. In terms of the previous example
we get:
upsert({2, 2, 2}, {{{'=', 4, 'upserted'}}, {{'#', 1, 1}, {'!', 3, 1}}})

Also note that we no longer have to apply update operations to the
tuple in vy_apply_upsert() when it deals with two upserts: this can be
done once we reach a terminal statement; or, if there's no underlying
statement (i.e. it is a delete statement or there is no statement at
all), we apply all update arrays except the first one to the upsert's
tuple. If one of the operations in an array fails, we skip the rest of
the operations in that array and proceed to the next array. After
successfully applying the update operations of each array, we check
that the resulting tuple fits the space format. If it doesn't, we roll
back the applied operations, log an error and move on to the next group
of operations.
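
The per-group processing can be sketched as follows. This is a toy
model in plain Python, not the actual vinyl implementation: the format
is reduced to a list of Python types, the primary key check is the same
assumption as before, and all names (apply_ops, validate, apply_groups)
are hypothetical.

```python
def apply_ops(tup, ops):
    """Apply '=', '#', '!' ops (1-based field numbers) to a copy of tup."""
    result = list(tup)
    for op, field, arg in ops:
        if op == '=':
            result[field - 1] = arg
        elif op == '#':
            del result[field - 1:field - 1 + arg]
        elif op == '!':
            result.insert(field - 1, arg)
    return result


def validate(before, after, pk_field, fmt):
    """Reject results that change the primary key or violate the format."""
    if len(after) < pk_field or after[pk_field - 1] != before[pk_field - 1]:
        raise ValueError("primary key is modified")
    if len(after) != len(fmt):
        raise ValueError("wrong number of fields")
    for value, expected in zip(after, fmt):
        if not isinstance(value, expected):
            raise ValueError("field type mismatch")


def apply_groups(tup, groups, pk_field, fmt, log):
    """Apply each upsert's op group in turn; a bad group is rolled back."""
    current = list(tup)
    for ops in groups:
        try:
            candidate = apply_ops(current, ops)
            validate(current, candidate, pk_field, fmt)
        except ValueError as err:
            log.append(str(err))   # report and go on to the next group
            continue               # 'current' is untouched, i.e. rolled back
        current = candidate
    return current


errors = []
result = apply_groups(
    [1, 2, 3, 'default'],
    [
        [('=', 4, 'upserted')],            # valid
        [('#', 1, 1), ('!', 3, 1)],        # modifies the primary key
        [('=', 3, 'not a number')],        # violates the field type
    ],
    pk_field=2,
    fmt=[int, int, int, str],
    log=errors,
)
print(result, errors)
```

Only the first group takes effect; the other two are rolled back and
each leaves one entry in the log.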

Finally, arithmetic operations can no longer be combined: this
requirement arises from issue #5105. Otherwise, the result of combining
upserts might turn out to be inapplicable to the tuple stored on disk
(e.g. applying the result to the tuple leads to integer overflow; in
this case only the last upsert, the one leading to the overflow, must be
ignored).
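
The overflow case can be illustrated with a small toy model (plain
Python, not vinyl code). It assumes an unsigned 64-bit field; the
helpers add_checked and apply_deltas are hypothetical, and each upsert
is reduced to a single '+' delta on that field.

```python
UINT64_MAX = 2**64 - 1


def add_checked(value, delta):
    """Add delta to value, failing if the result leaves the uint64 range."""
    result = value + delta
    if not 0 <= result <= UINT64_MAX:
        raise OverflowError("integer overflow")
    return result


def apply_deltas(value, deltas):
    """Apply each delta separately, skipping only the ones that overflow."""
    for delta in deltas:
        try:
            value = add_checked(value, delta)
        except OverflowError:
            pass   # this upsert is ignored, the others are unaffected
    return value


stored = UINT64_MAX - 5
separate = apply_deltas(stored, [3, 10])    # +3 applies, +10 overflows
combined = apply_deltas(stored, [3 + 10])   # squashed +13 overflows as a whole
print(separate == UINT64_MAX - 2, combined == stored)
```

Kept apart, only the upsert that causes the overflow is dropped; once
the two deltas are squashed into +13, the valid +3 is lost as well.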

Closes #1622
Closes #5105
Closes #5092
Part of #5107
parent 33870e37
Showing 1074 additions and 132 deletions