- Aug 22, 2018
-
-
Vladimir Davydov authored
The space is a blackhole. It will be used for writing deferred DELETE statements generated by vinyl compaction tasks to WAL so that we can recover deferred DELETEs that hadn't been dumped to disk before the server was restarted. Since the new space doesn't depend on other system spaces, let's assign the minimal possible id to it, i.e. 257. Needed for #2129
-
Vladimir Davydov authored
In the scope of #2129 we will defer insertion of certain DELETE statements into secondary indexes until primary index compaction. However, by the time we invoke compaction, new statements might have been inserted into the space for the same set of keys. If that happens, insertion of a deferred DELETE will break the invariant which the read iterator relies upon: that for any key older sources store older statements. To avoid that, let's add a new per statement flag, VY_STMT_SKIP_READ, and make the read iterator ignore statements marked with it. Needed for #2129
-
Vladimir Davydov authored
In the scope of #2129, we won't delete the overwritten tuple from secondary indexes immediately on REPLACE. Instead we will defer generation of the DELETE statement until the primary index compaction. However, it may happen that the overwritten tuple and the tuple that overwrote it have the same secondary key parts, in which case the deferred DELETE is not needed and should be discarded on secondary index compaction. This patch makes the write iterator heap comparator function discard such useless deferred DELETEs. Note, this patch also removes the code that prioritises terminal statements over UPSERTs in the write iterator, which, according to the comment, may happen only during forced recovery. I don't see why we should do that, even during forced recovery, neither have I managed to find the reason in the commit history, so I dropped this code in order not to overburden the write iterator logic with some esoteric cases. Needed for #2129
-
Vladimir Davydov authored
A REPLACE/DELETE request is supposed to delete the old tuple from all indexes. In order to generate a DELETE statement for a secondary index, we need to look up the old tuple in the primary index, which is costly as it implies a random disk access. In the scope of #2129 we are planning to optimize out the lookup by deferring generation of the DELETE statement until primary index compaction. To do that, we need to differentiate statements for which DELETE was deferred from those for which it was inserted when the request was executed (as it is the case for UPDATE). So this patch introduces a per statement flag, VY_STMT_DEFERRED_DELETE. If set for a REPLACE or DELETE statement, it will make the write iterator return the overwritten statement to the caller via a callback. Needed for #2129
-
Vladimir Davydov authored
Currently, tuple meta is only needed for storing statement flags in run files. In the scope of #2129 two statement flags will be introduced, VY_STMT_SKIP_READ and VY_STMT_DEFERRED_DELETE. None of them makes any sense for secondary indexes. If we encode meta for secondary index statements, we will have to either clear the flags on the upper level (e.g. in the write iterator) or filter them out before encoding a statement. Alternatively, we can skip encoding meta for secondary index statements altogether, and this is what this patch does, because it's the simplest and clearest method for now. If tuple meta is ever used for storing anything else besides statement flags or a new statement flag appears that may be used with secondary index statements, we will recover the code and mask out those flags for secondary indexes.
-
Serge Petrenko authored
Previously the only existing entities in access control were space, function and sequence. Added user and role entities, so it is now possible to create users or roles without the create privilege on universe. Also added all the needed checks and modified tests accordingly. Closes #3524 Needed for #3530
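A minimal Lua sketch of how the new entity-level privilege is meant to be used; the user name 'alice' is illustrative, not from the commit:
```
-- Before: creating users required 'create' on the whole universe:
-- box.schema.user.grant('alice', 'create', 'universe')
-- With the new 'user' and 'role' entities it should be enough to grant
-- 'create' on the entity itself (assumption based on the message above):
box.schema.user.grant('alice', 'create', 'user')
box.schema.user.grant('alice', 'create', 'role')
```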
-
Serge Petrenko authored
When granting or revoking a privilege on an entire entity, id 0 was used to indicate the fact that we don't grant a privilege on a single object, but on a whole entity. This caused confusion, because for entity USER, for example, id 0 is a valid object id (user 'guest' uses it). Any non-zero id dedicated to this cause obviously may be confused as well. Fix this by creating separate schema_object_types for entities: SC_ENTITY_SPACE, SC_ENTITY_SEQUENCE, etc. Closes #3574 Needed for #3524
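A hedged sketch of the ambiguity being fixed: user 'guest' has object id 0, so an entity-wide grant used to be indistinguishable from a grant on 'guest' at the id level (the user name 'alice' is illustrative):
```
-- Grant on a single object: the user 'guest' (object id 0):
box.schema.user.grant('alice', 'alter', 'user', 'guest')
-- Grant on the whole 'user' entity, now recorded with its own
-- SC_ENTITY_USER type instead of object id 0:
box.schema.user.grant('alice', 'alter', 'user')
```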
-
Vladimir Davydov authored
We never use vy_mem::min_lsn so let's zap it. As for max_lsn, we only need it to update vy_lsm::dump_lsn (max LSN stored on disk). Let's rename it appropriately. There's another reason to do that. Once we start storing deferred DELETE statements in memory (see #2129), it won't be the max statement LSN stored in vy_mem anymore, because we will account WAL LSN of deferred DELETE statements there too. Renaming it to dump_lsn will help avoid confusion. Needed for #2129
-
Vladimir Davydov authored
Commit 8e710090 ("vinyl: simplify vylog recovery from backup") broke backup in case the vylog directory is empty at the time of recovery (i.e. vinyl isn't in use): before the commit we added the last checkpoint vclock to the vylog directory index in this case while now we don't. As a result, if the user starts using vinyl (creates a vinyl space) after recovery, backup will not return the newly created vylog, because it hasn't been indexed. Fix this issue by restoring the code that adds the last checkpoint vclock to the vylog directory index in case no vylog exists on recovery. Closes #3624
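A hedged Lua illustration of the affected scenario: vinyl is first used only after recovery, and the subsequent backup should list the newly created vylog (the space name 'v' is illustrative):
```
-- Start using vinyl after recovering from a vinyl-free checkpoint:
s = box.schema.space.create('v', {engine = 'vinyl'})
s:create_index('pk')
box.snapshot()
-- The returned file list should now include the new vylog:
files = box.backup.start()
for _, f in ipairs(files) do print(f) end
box.backup.stop()
```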
-
Vladimir Davydov authored
We add the vclock of a new snapshot/xlog/vylog file to the corresponding xdir, because we need it for garbage collection and backup. We use the xdir_add_vclock() function for that. Currently, the function protocol is rather abstruse: it expects that the caller allocates a vclock with malloc() and passes the ownership on to the xdir by calling this function. This is done that way, because we add a vclock to an xdir after committing a new xlog file, where we can't fail anymore. Since malloc() can theoretically fail, we allocate a vclock before writing an xlog and add it to an xdir only after the xlog is committed. This makes the code look rather complicated. Actually, as explained in #3534, malloc() doesn't normally fail so this complexity is in fact needless and only makes it more difficult to patch the code. Let's simplify the code by making xdir_add_vclock() allocate vclock by itself and, since it can't fail, panic on allocation error.
-
- Aug 21, 2018
-
-
Vladimir Davydov authored
-
Konstantin Belyavskiy authored
During startup tarantoolctl ignores the 'pid_file' option and sets it to the default value. This causes a fault if the user tries to execute a config with the option set. When started with tarantoolctl, shadow this option with an additional wrapper around box.cfg. Closes #3214
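A hedged sketch of an instance file that sets the option in question; the path is illustrative. Under tarantoolctl the wrapper now shadows pid_file instead of faulting:
```
-- e.g. /etc/tarantool/instances.enabled/myapp.lua
box.cfg{
    listen = 3301,
    pid_file = '/var/run/tarantool/myapp.pid',  -- previously caused a fault
}
```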
-
Konstantin Belyavskiy authored
Fix the build under FreeBSD: Undefined symbol "iconv_open". Add a compile-time build check with FindICONV.cmake and a wrapper include file to import the relevant symbol names. Closes #3441
-
Serge Petrenko authored
lua_pushapplier() had an inexplicably small buffer for the uri representation. Enlarged the buffer. Also lua_pushapplier() didn't take into account that uri_format() could return a value larger than the buffer size. Fixed. Closes #3630
-
Serge Petrenko authored
`tarantoolctl enter` didn't check whether a connection to the socket was established when the socket file existed; it just started a local console. Fix this by adding a check and an error, and add a test case. Closes #3364
-
- Aug 20, 2018
-
-
Vladislav Shpilevoy authored
Responses to requests that are still pending when the socket is closed can never arrive. The bug was introduced by me during the big netbox refactoring and removal of request retries here: 62ba7ba7. Such requests evidently can be discarded immediately. Closes #3629
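A hedged net.box sketch of the case in question: a request still in flight when the connection is closed can never get a response, so waiting on it should fail right away rather than hang (the function name and URI are illustrative):
```
net_box = require('net.box')
conn = net_box.connect('localhost:3301')
future = conn:call('some_slow_function', {}, {is_async = true})
conn:close()
-- The pending request is discarded; waiting on it reports an error
-- instead of lingering until the timeout.
ok, err = pcall(future.wait_result, future, 1)
print(ok, err)
```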
-
Vladimir Davydov authored
If `tarantoolctl eval` fails, apart from returning 3 and printing the eval error to stderr, tarantoolctl will also emit the following message: "Error while reloading config:". This message is quite confusing and useless too, as we have the return code for that. Let's zap it. Closes #3560
-
- Aug 17, 2018
-
-
Vladimir Davydov authored
Currently, there's only vy_stmt_new_surrogate_delete(), which takes a tuple. Let's add vy_stmt_new_surrogate_delete_raw(), which takes raw msgpack data. Needed for #2129
-
Vladimir Davydov authored
For some reason, vy_stmt_new_surrogate_delete() checks that the source tuple has all fields mandated by the space format (min_field_count). This is pointless, because to generate a surrogate DELETE statement, we don't need all tuple fields - if a field is absent it will be replaced with NULL. We haven't stepped on this assertion, because we always create surrogate DELETEs from full tuples. However, to implement #2129 we need to be able to create surrogate DELETEs from tuples that only have indexed fields. So let's remove this assertion. Needed for #2129
-
Vladimir Davydov authored
Currently, the last statement returned by the write iterator is referenced indirectly, via a read view. This works, because the write iterator can only return a statement if it corresponds to a certain read view. However, in the scope of #2129, the write iterator will also have to keep statements for which a deferred DELETE hasn't been generated yet, even if no read view needs it. So let's make the write iterator reference the last returned statement explicitly, i.e. via a dedicated member of the write_iterator struct. Needed for #2129
-
Vladimir Davydov authored
In the scope of #2129 we need to mark REPLACE statements for which we generated DELETE in secondary indexes so that we don't generate DELETE again on compaction. We also need to mark DELETE statements that were generated on compaction so that we can skip them on SELECT. Let's add a flags field to struct vy_stmt. Flags are stored both in memory and on disk - they are encoded in tuple meta in the latter case. Needed for #2129
-
Vladimir Davydov authored
This patch set allows storing a msgpack map with arbitrary keys inside a request. In particular, this is needed to store vinyl statement flags in run files. Needed for #2129
-
Vladislav Shpilevoy authored
On commit/rollback triggers are already implemented within Tarantool internals. The patch just exposes them to Lua. The API, described below, deserves attention though. Closes #857

@TarantoolBot document
Title: Document box.on_commit/on_rollback triggers

On commit/rollback triggers can be set similarly to space:on_replace triggers:

    box.on_commit/rollback(new_trigger, old_trigger)

A trigger can be set only inside an active transaction. When a trigger is called, it takes one parameter: an iterator over the transaction statements.

    box.on_commit/on_rollback(function(iterator)
        for i, old_tuple, new_tuple, space_id in iterator() do
            -- Do something with tuples and space ...
        end
    end)

On each step the iterator returns 4 values: the statement number (grows from 1 to the statement count), the old tuple or nil, the new tuple or nil, and the space id. The old tuple is not nil when the statement updated or deleted an existing tuple. The new tuple is not nil when the statement updated or inserted a tuple.

Limitations:
* the iterator can not be used outside of the trigger, otherwise it throws an error;
* a trigger can not do any database requests (DML, DDL, DQL) - the behaviour is undefined;
* on_commit/rollback triggers shall not fail, otherwise Tarantool exits with a panic.
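A concrete usage sketch following the API described above (the space name 'test' is illustrative and assumed to exist):
```
box.begin()
box.space.test:replace{1, 'hello'}
box.on_commit(function(iterator)
    for i, old_tuple, new_tuple, space_id in iterator() do
        print(i, old_tuple, new_tuple, space_id)
    end
end)
box.commit()  -- the trigger runs here and prints the single REPLACE
```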
-
Kirill Shcherbatov authored
This problem triggered ASAN checks when starting tarantool with an existing xlog. We should not touch even static non-initialized memory.
-
Serge Petrenko authored
If a relay thread is already exiting (but hasn't executed relay_stop() yet) and relay_cancel() is called, we may encounter an error trying to call pthread_cancel() after the thread has exited. Handle this case. Follow-up #3485
-
Vladimir Davydov authored
Apart from vy_check_is_unique, the callers of vy_check_is_unique_primary and vy_check_is_unique_secondary only run when the vinyl engine is online. So let's move the optimization that skips the uniqueness check on recovery to vy_check_is_unique and remove the env argument.
-
Serge Petrenko authored
One possible case when two applier errors happen one after another wasn't handled in replica_on_applier_disconnect(), which led to occasional test failures and crashes. Handle this case and add a regression test. Part of #3510
-
Serge Petrenko authored
Fix a bug where the crash_expected option led to a test hang.
-
Serge Petrenko authored
When `tarantoolctl status` is called immediately after `tarantoolctl stop` there is a chance that tarantool hasn't exited yet, so the pid file still exists, which is reported by `tarantoolctl status`. This leads to occasional test failures. Fix this by waiting till tarantool exits before calling `status`. Closes #3557
-
- Aug 16, 2018
-
-
N.Tatunov authored
Add string.fromhex method. Add test for string.fromhex(). Closes #2562
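A short usage sketch of the new method (the round trip via the existing string.hex() is an assumption, not part of the commit):
```
string = require('string')
s = string.fromhex('48656c6c6f')
print(s)               -- Hello
print(string.hex(s))   -- 48656c6c6f
```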
-
Serge Petrenko authored
A field's is_nullable option must be the same in index parts and in the space format. This causes problems with altering the option: the only way to do it is to drop the space format, alter nullability in the index, and then reset the space format. Not too convenient. Fix this by allowing different nullability in the space format and indices, which makes it possible to change nullability in the space format and in an index separately. If at least one of the options is set to false, the resulting nullability is also false. Closes #3430
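A hedged Lua sketch of the relaxed check (space and field names are illustrative): nullability may now differ between the format and an index part, and the effective constraint is the stricter one:
```
s = box.schema.space.create('t')
s:format({{name = 'id', type = 'unsigned'},
          {name = 'val', type = 'unsigned', is_nullable = true}})
s:create_index('pk', {parts = {{1, 'unsigned'}}})
-- Nullable in the format, non-nullable in the index part: allowed now,
-- and the resulting nullability is false, so {1} still can't be inserted.
s:create_index('sk', {parts = {{2, 'unsigned', is_nullable = false}}})
```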
-
Serge Petrenko authored
Relay threads keep using tx upon shutdown, which leads to occasional segmentation faults and assertion fails (e.g. in replication test suite). Fix this by forcefully cancelling (with pthread_cancel) and joining relay threads before proceeding to tx destruction. Closes #3485
-
- Aug 15, 2018
-
-
Eugine Blikh authored
We can throw any Lua object as a Lua error, but the current behaviour won't convert it to a string, so the diag error object will have NULL instead of a string. luaT_tolstring honors the __tostring metamethod and thus can convert a table to its string representation. For example, the old behaviour is:
```
tarantool> fiber.create(error, 'help')
LuajitError: help
tarantool> fiber.create(error, { message = 'help' })
LuajitError:
tarantool> fiber.create(error, setmetatable({ message = 'help' }, { __tostring = function(self) return self.message end }))
LuajitError:
```
The new behaviour is:
```
tarantool> fiber.create(error, 'help')
LuajitError: help
tarantool> fiber.create(error, { 'help' })
LuajitError: table: 0x0108fa2790
tarantool> fiber.create(error, setmetatable({ message = 'help' }, { __tostring = function(self) return self.message end }))
LuajitError: help
```
It won't break anything, but adds new behaviour.
-
Eugine Blikh authored
`lua_tostring`/`lua_tolstring` ignore the __tostring metamethod and return NULL for values like tables, booleans and nil, but sometimes behaviour similar to the Lua tostring() function is needed. Lua 5.1 and LuaJIT ignore __tostring by default; Lua 5.2 introduced the auxiliary function luaL_to(l)string with support for __tostring. This function is a backport of lauxlib.h's luaL_tolstring into the luaT namespace.
-
Vladimir Davydov authored
If a tuple inserted into a secondary index under construction had the same key parts, both primary and secondary, as a tuple already stored in the index (i.e. it wasn't the indexed fields that got updated), the uniqueness check failed before commit fc3834c0 ("vinyl: check key uniqueness before modifying tx write set"). The commit added a piece of code to vy_check_is_unique_secondary() that compares the new and the found tuple by primary key parts and doesn't raise an error if they match (they point to the same tuple and hence the inserted tuple doesn't actually modify the index, let alone violate the unique constraint). This patch adds a test for that fix. Closes #3578
-
Vladimir Davydov authored
Currently, we handle INSERT/REPLACE/UPDATE requests by iterating over all space indexes starting from the primary and inserting the corresponding statements to the tx write set, checking key uniqueness if necessary. This means that by the time we write a REPLACE to the write set of a secondary index, it has already been written to the primary index write set. This is OK, and vy_tx_prepare() relies on that to implement the common memory level. However, this also means that when we check uniqueness of a secondary index, the new REPLACE can be found via the primary index. This is OK now, because all indexes are fully independent, but it isn't going to fly after #2129 is implemented. The problem is that, in order to check if a tuple is present in a secondary index, we will have to look up the corresponding full tuple in the primary index. To illustrate the problem, consider the following situation:

Primary index covers field 1.
Secondary index covers field 2.

Committed statements:

    REPLACE{1, 10, lsn=1} - present in both indexes
    DELETE{1, lsn=2} - present only in the primary index

Transaction:

    REPLACE{1, 10}

When we check uniqueness of the secondary index, we find committed statement REPLACE{1, 10, lsn=1}, then look up the corresponding full tuple in the primary index and find REPLACE{1, 10}. Since the two tuples match, we mistakenly assume that there's a conflict. To avoid a situation like that, let's check uniqueness before modifying the write set of any index. Needed for #2129
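A hedged Lua rendering of the scenario from the message (space and index names are illustrative):
```
s = box.schema.space.create('t', {engine = 'vinyl'})
s:create_index('pk', {parts = {1, 'unsigned'}})                -- covers field 1
s:create_index('sk', {parts = {2, 'unsigned'}, unique = true}) -- covers field 2
s:replace{1, 10}   -- REPLACE{1, 10, lsn=1}
s:delete{1}        -- DELETE{1, lsn=2}
box.begin()
s:replace{1, 10}   -- must not be reported as a duplicate in 'sk'
box.commit()
```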
-
- Aug 14, 2018
-
-
Olga Arkhangelskaia authored
We should test the log_nonblock mode: in some cases the loss of this flag led to tarantool hanging forever. This test checks for such a possibility. Follow-up #3615
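A hedged example of the configuration being exercised; the syslog destination is illustrative:
```
box.cfg{
    log = 'syslog:identity=tarantool',
    log_nonblock = true,  -- the flag whose loss could hang tarantool
}
```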
-
Vladimir Davydov authored
-
Serge Petrenko authored
On bootstrap and after initial configuration replication_connect_quorum was ignored. The instance tried to connect to every replica listed in the replication parameter, and failed if it wasn't possible. The patch alters this behaviour: an instance still tries to connect to every node listed in box.cfg.replication, but does not raise an error if it was able to connect to at least replication_connect_quorum instances. Closes #3428

@TarantoolBot document
Title: replication_connect_quorum is not ignored

Now on replica set bootstrap and in case of replication reconfiguration (e.g. calling box.cfg{replication=...} for the second time) tarantool doesn't fail if it couldn't connect to every replica but could connect to replication_connect_quorum replicas. If after replication_connect_timeout seconds the instance is not connected to at least replication_connect_quorum other instances, we throw an error.
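A hedged box.cfg sketch of the documented behaviour (URIs are illustrative): with three peers listed but a quorum of 2, configuration succeeds even if one peer is unreachable:
```
box.cfg{
    listen = 3301,
    replication = {'replicator:pass@host1:3301',
                   'replicator:pass@host2:3301',
                   'replicator:pass@host3:3301'},
    replication_connect_quorum = 2,
    replication_connect_timeout = 30,
}
```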
-
Serge Petrenko authored
Add start arguments to replication test instances to control replication_timeout and replication_connect_timeout settings between restarts. Needed for #3428
-