Commits · 6d85c35ca96526408ac6b00c2eb2f968bffe0caf · core / tarantool

Jul 31, 2018

vinyl: make point lookup always return the latest tuple version · 6d85c35c

Currently, vy_point_lookup(), in contrast to vy_read_iterator, doesn't
rescan the memory level after reading disk, so if the caller doesn't
track the key before calling this function, the caller won't be sent to
a read view in case the key gets updated during yield and hence will
be returned a stale tuple. This is OK now, because we always track the
key before calling vy_point_lookup(), either in the primary or in a
secondary index. However, for #2129 we need it to always return the
latest tuple version, no matter if the key is tracked or not.

The point is that in the scope of #2129 we won't write DELETE statements to
secondary indexes corresponding to a tuple replaced in the primary
index. Instead after reading a tuple from a secondary index we will
check whether it matches the tuple corresponding to it in the primary
index: if it does not, it means that the tuple read from the secondary
index was overwritten and should be skipped. E.g. suppose we have the
primary index over the first field and a secondary index over the second
field and the following statements in the space:

  REPLACE{1, 10}
  REPLACE{1, 20}

Then reading {10} from the secondary index will return REPLACE{1, 10}, but
lookup of {1} in the primary index will return REPLACE{1, 20} which
doesn't match REPLACE{1, 10} read from the secondary index hence the
latter was overwritten and should be skipped.

The problem is that in the example above we don't want to track key {1} in
the primary index before lookup, because we don't actually read its
value. So for the check to work correctly, we need the point lookup to
guarantee that the returned tuple is always the newest one. It's fairly
easy to do - we just need to rescan the memory level after yielding on
disk if its version changed.
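
The check described above can be sketched with a toy model. This is an illustration only, written in Python rather than taken from vinyl: the two dictionaries and the `replace` and `secondary_lookup` helpers are hypothetical, and the real point lookup also has to deal with yields and read views.

```python
# Toy model: the primary index maps field 1 to the latest tuple. The
# secondary index keeps every entry ever written, because no DELETE is
# written for an overwritten tuple.
primary = {}
secondary = {}

def replace(tup):
    primary[tup[0]] = tup
    secondary[tup[1]] = tup  # the stale entry of the old tuple stays

def secondary_lookup(key):
    """Return the tuple for a secondary key, or None if it is stale."""
    candidate = secondary.get(key)
    if candidate is None:
        return None
    latest = primary.get(candidate[0])
    # Skip the candidate if the primary index holds a different tuple.
    return candidate if latest == candidate else None

replace((1, 10))
replace((1, 20))
print(secondary_lookup(10))  # None
print(secondary_lookup(20))  # (1, 20)
```

The sketch only works because `primary` always holds the newest tuple, which is the guarantee this patch adds to the point lookup.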

Needed for #2129

6d85c35c

Add tarantoolctl rocks pack/unpack subcommands · 0746fdb4
Konstantin Nazarov authored 6 years ago
```
The subcommands are used to create binary rock distributions.
In context of #3525
```
0746fdb4

Jul 30, 2018

vinyl: implement rebootstrap support · 06658416

Vladimir Davydov authored 6 years ago

If vy_log_bootstrap() finds a vylog file in the vinyl directory, it
assumes it has to be rebootstrapped and calls vy_log_rebootstrap().
The latter scans the old vylog file to find the max vinyl object id,
from which it will start numbering objects created during rebootstrap to
avoid conflicts with old objects, then it writes VY_LOG_REBOOTSTRAP
record to the old vylog to denote the beginning of a rebootstrap
section. After that initial join proceeds as usual, writing information
about new objects to the old vylog file after VY_LOG_REBOOTSTRAP marker.
Upon successful rebootstrap completion, checkpoint, which is always
called right after bootstrap, rotates the old vylog and marks all
objects created before the VY_LOG_REBOOTSTRAP marker as dropped in the
new vylog. The old objects will be purged by the garbage collector as
usual.

In case rebootstrap fails and checkpoint never happens, local recovery
writes VY_LOG_ABORT_REBOOTSTRAP record to the vylog. This marker
indicates that the rebootstrap attempt failed and all objects created
during rebootstrap should be discarded. They will be purged by the
garbage collector on checkpoint. Thus even if rebootstrap fails, it is
possible to recover the database to the state that existed right before
a failed rebootstrap attempt.
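
The effect of the two markers can be sketched as a replay over a simplified log. This is a toy model in Python, not the vylog format: the record layout and the `surviving_objects` helper are assumptions made for illustration, and a rebootstrap section that is never aborted is treated as completed.

```python
REBOOTSTRAP = "VY_LOG_REBOOTSTRAP"
ABORT_REBOOTSTRAP = "VY_LOG_ABORT_REBOOTSTRAP"

def surviving_objects(records):
    """Replay a simplified vylog and return the ids of objects to keep.

    Each record is either one of the two markers or ("create", object_id).
    """
    before, during = set(), set()
    in_rebootstrap = False
    for rec in records:
        if rec == REBOOTSTRAP:
            in_rebootstrap = True
            during = set()
        elif rec == ABORT_REBOOTSTRAP:
            # Failed attempt: discard everything created since the marker.
            in_rebootstrap = False
            during = set()
        elif in_rebootstrap:
            during.add(rec[1])
        else:
            before.add(rec[1])
    # Completed rebootstrap: objects created before the marker are dropped.
    return during if in_rebootstrap else before

ok = [("create", 1), ("create", 2), REBOOTSTRAP, ("create", 3)]
failed = ok + [ABORT_REBOOTSTRAP]
print(surviving_objects(ok))      # {3}
print(surviving_objects(failed))  # {1, 2}
```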

Closes #461

06658416

vinyl: simplify vylog recovery from backup · 8e710090

Vladimir Davydov authored 6 years ago

Since we don't create snapshot files for vylog, but instead append
records written after checkpoint to the same file, we have to use the
previous vylog file for backup (see vy_log_backup_path()). So when
recovering from a backup we need to rotate the last vylog to keep vylog
and checkpoint signatures in sync. Currently, we do it on recovery
completion and we use vy_log_create() instead of vy_log_rotate() for it.
This is done so that we can reuse the context that was used for recovery
instead of rereading vylog for rotation. Actually, there's no point in
this micro-optimization, because we rotate vylog only when recovering
from a backup. Let's remove it and use vy_log_rotate() for this.

Needed for #461

8e710090

replication: print master uuid when (re)bootstrapping · 71cec841

Vladimir Davydov authored 6 years ago

Currently only the remote address is printed. Let's also print the UUID,
because replicas are identified by UUID everywhere in tarantool, not by
the address. An example of the output is below:

I> can't follow eb81a67e-99ee-40bb-8601-99b03fa20124 at [::1]:58083: required {1: 8} available {1: 12}
C> replica is too old, initiating rebootstrap
I> bootstrapping replica from eb81a67e-99ee-40bb-8601-99b03fa20124 at [::1]:58083

I> can't follow eb81a67e-99ee-40bb-8601-99b03fa20124 at [::1]:58083: required {1: 17, 2: 1} available {1: 20}
I> can't rebootstrap from eb81a67e-99ee-40bb-8601-99b03fa20124 at [::1]:58083: replica has local rows: local {1: 17, 2: 1} remote {1: 23}
I> recovery start
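
One plausible reading of the two excerpts above is as component-wise vclock comparisons. The Python sketch below is only that reading, not the actual check in tarantool; the helper names are hypothetical and vclocks are modeled as plain dictionaries.

```python
def vclock_le(a, b):
    """True if every component of vclock a is <= the same component of b."""
    return all(lsn <= b.get(replica_id, 0) for replica_id, lsn in a.items())

def can_follow(required, available):
    # The master must still have every row the replica has not seen yet.
    return vclock_le(available, required)

def has_local_rows(local, remote):
    # Rows the master does not know about would be lost on rebootstrap.
    return not vclock_le(local, remote)

print(can_follow({1: 8}, {1: 12}))             # False
print(has_local_rows({1: 17, 2: 1}, {1: 23}))  # True
```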

Suggested by @kostja.

Follow-up ea69a0cd ("replication: rebootstrap instance on startup
if it fell behind").

71cec841

vinyl: zap tx_manager_vlsn · 5a772639

Vladimir Davydov authored 6 years ago

This function is not used anywhere since commit a1e005d8
("vinyl: write_iterator merges vlsns subsequnces")

5a772639

Jul 26, 2018

Merge branch '1.9' into 1.10 · fe07ada1
Konstantin Osipov authored 6 years ago

fe07ada1
lua: fix fio.rmtree to work with non empty dirs · 9917edc7
Konstantin Belyavskiy authored 6 years ago
```
Fix 'fio.rmtree' to remove non-empty directories.
And update the test.

Closes #3258
```
9917edc7
lua: fix fio.rmtree to work with non empty dirs · 564a053c
Konstantin Belyavskiy authored 6 years ago
```
Fix 'fio.rmtree' to remove non-empty directories.
And update the test.

Closes #3258
```
564a053c

Make access_check_ddl check for entity privileges. · d2e70f18

Serge Petrenko authored 6 years ago

Function access_check_ddl checked only for universal access, thus
granting entity or single object access to a user would have no effect in
scope of this function.
Fix this by adding entity access checks.

Also attaching an existing sequence to a space checked for
create privilege on both space and sequence
(instead of read + write on sequence). Fixed it and changed the tests
accordingly.

Closes #3516

d2e70f18

Jul 24, 2018
- Allow to mix blackhole statements in other engines' transactions · d512174a
  Vladimir Davydov authored 6 years ago
  
  Blackhole doesn't need transaction control as it doesn't actually store anything so we can mark it with ENGINE_BYPASS_TX.
  d512174a
- Merge branch '1.9' into 1.10 · b9fd0b3b
  Vladimir Davydov authored 6 years ago
  
  b9fd0b3b
Jul 23, 2018

replication: rebootstrap instance on startup if it fell behind · ea69a0cd

Vladimir Davydov authored 6 years ago

If a replica fell too much behind its peers in the cluster and xlog
files needed for it to get up to speed have been removed, it won't be
able to proceed without rebootstrap. This patch makes the recovery
procedure detect such cases and initiate rebootstrap procedure if
necessary.

Note, rebootstrap is currently only supported by memtx engine. If there
are vinyl spaces on the replica, rebootstrap will fail. This is fixed by
the following patches.

Part of #461

ea69a0cd

tx: exclude sysview engine from transaction control · 0ecabde8

Vladimir Davydov authored 6 years ago

Sysview is a special engine that is used for filtering out objects that
a user can't access due to lack of privileges. Since it's treated as a
separate engine by the transaction manager, we can't query sysview
spaces from a memtx/vinyl transaction. In particular, if called from a
transaction space:format() will return

  error: A multi-statement transaction can not use multiple storage engines

which is inconvenient.

To fix this, let's mark sysview engine with a new ENGINE_BYPASS_TX flag
and make the transaction manager skip binding a transaction to an engine
in case this flag is set.
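
The idea can be sketched as follows. This is a toy transaction manager in Python, not tarantool's txn.c: the `Engine` and `Txn` classes and the `begin_in_engine` method are hypothetical, and the `bypass_tx` attribute merely stands in for ENGINE_BYPASS_TX.

```python
class Engine:
    def __init__(self, name, bypass_tx=False):
        self.name = name
        self.bypass_tx = bypass_tx  # stands in for ENGINE_BYPASS_TX

class Txn:
    def __init__(self):
        self.engine = None

    def begin_in_engine(self, engine):
        if engine.bypass_tx:
            return  # do not bind the transaction to this engine
        if self.engine is None:
            self.engine = engine
        elif self.engine is not engine:
            raise RuntimeError("A multi-statement transaction can not "
                               "use multiple storage engines")

memtx, vinyl = Engine("memtx"), Engine("vinyl")
sysview = Engine("sysview", bypass_tx=True)

txn = Txn()
txn.begin_in_engine(memtx)
txn.begin_in_engine(sysview)  # allowed: sysview bypasses the binding
```

In this model, calling `txn.begin_in_engine(vinyl)` afterwards would still raise, because the transaction is already bound to memtx.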

Closes #3528

0ecabde8

Introduce blackhole engine · cdf3ed8f

Vladimir Davydov authored 6 years ago

Blackhole is a very simple engine that allows creating spaces that may
be written to, but not read from. It only supports INSERT/REPLACE requests.
It doesn't support any indexes hence SELECT is impossible. It does check
space format though and supports on_replace and before_replace triggers.

The whole purpose of this new engine is writing arbitrary rows to WAL
without storing them anywhere. In particular, we need this engine to
write deferred DELETEs generated for vinyl spaces to WAL.

Needed for #2129

cdf3ed8f

space: call before_replace trigger even if space has no indexes · 00204b6a

Vladimir Davydov authored 6 years ago

Needed for blackhole spaces, which don't support indexes per se, but
still may have a before_replace trigger installed.

00204b6a

Jul 22, 2018

replication: unregister replica with gc if deleted from cluster · ea28a925

Vladimir Davydov authored 6 years ago

When a replica is removed from the cluster table, the corresponding
replica struct isn't destroyed unless both the relay and the applier
attached to it are stopped, see replica_clear_id(). Since replica struct
is a holder of the garbage collection state, this means that in case an
evicted replica has an applier or a relay that fails to exit for some
reason, garbage collection will hang.

A relay thread stops as soon as the replica it was started for receives
a row that tries to delete it from the cluster table (because this isn't
allowed by the cluster space trigger, see on_replace_dd_cluster()).
If a replica isn't running, the corresponding relay can't run either,
because writing to a closed socket isn't allowed. Hence a relay
can't block garbage collection.

An applier, however, is deleted only when replication is reconfigured.
So if a replica that was evicted from the cluster was configured as a
master, its replica struct will hang around blocking garbage collection
for as long as the replica remains in box.cfg.replication. This is what
happens in #3546.

Fix this issue by forcefully unregistering a replica with the garbage
collector when it is deleted from the cluster table. This is OK as it
won't be able to resubscribe and so we don't need to keep WALs for it
any longer. Note, the relay thread may still be running when a replica
is deleted from the cluster table, in which case we can't unregister it
with the garbage collector right away, because the relay may need to
access the garbage collection state. In such a case, leave the job to
replica_clear_relay, which is called as soon as the relay thread exits.

Closes #3546

ea28a925

Jul 21, 2018

txn: unify txn_stmt tuples reference counting rules · efed5d7f

Vladimir Davydov authored 6 years ago

Currently, the way txn_stmt::old_tuple and new_tuple are referenced
depends on the engine. For vinyl, the rules are straightforward: if
txn_stmt::{old_tuple,new_tuple} is not NULL, then the reference to the
corresponding tuple is elevated. Hence when a transaction is committed
or rolled back, vinyl calls tuple_unref on both txn_stmt::old_tuple and
new_tuple. For memtx, things are different: the engine doesn't
explicitly increment the reference counter of the tuples - it simply
sets them to the newly inserted tuple and the replaced tuple. On commit,
the reference counter of the old tuple is decreased to delete the
replaced tuple, while on rollback the reference counter of the new tuple
is decreased to delete the new tuple.

Because of this, we can't implement the blackhole engine (aka /dev/null)
without implementing commit and rollback engine methods - even though
such an engine doesn't store anything it still has to set the new_tuple
for on_replace trigger and hence it is responsible for releasing it on
commit or rollback. Since commit/rollback are rather inappropriate for
this kind of engine, let's instead unify txn_stmt reference counting
rules and make txn.c unreference the tuples no matter what the engine is.
This doesn't change vinyl, because it already conforms. For memtx, this
means that we need to increase the reference counter when we insert a
new tuple into a space - not a big deal as tuple_ref is almost free.

efed5d7f

Rework memtx replace function · d361b1f7

Nikita Pettik authored 7 years ago

Now the replace function takes the new tuple and the old tuple as arguments
instead of a single txn_stmt. This was done to avoid abusing txn_stmt:
the only usage was extracting tuples from it.
As a result, this function can be used by ephemeral tables
without any patching.

(cherry picked from commit 880712c9)

d361b1f7

Merge sysview_index.[hc] and sysview_engine.[hc] · 44fc192d
Vladimir Davydov authored 6 years ago
```
They are fairly small and closely related so let's merge them and call
the result sysview.[hc].
```
44fc192d
Add generic engine, space, index method stubs · 38a27423
Vladimir Davydov authored 6 years ago
```
This should reduce maintenance burden and help us introduce a new
engine.
```
38a27423

Include oldest vclock available on the instance in IPROTO_BALLOT · 989bb8f0

Vladimir Davydov authored 6 years ago

It will be used to check if a replica fell too much behind its peers and
so needs to be rebootstrapped.

Needed for #461

989bb8f0

Get rid of IPROTO_SERVER_IS_RO · 0ade0880

Vladimir Davydov authored 6 years ago

Not needed anymore as we now use the new IPROTO_VOTE command instead of
IPROTO_VOTE_DEPRECATED. Let's remove it altogether and reuse its code
for IPROTO_BALLOT (they are never decoded together so no conflict should
happen). The worst that can happen is we choose a read-only master when
bootstrapping an older version of tarantool.

0ade0880

IPROTO_VOTE command - follow-up fixes · 42a0ebfa

Vladimir Davydov authored 6 years ago

This patch contains some follow-up fixes for fe8ae607
("Introduce IPROTO_VOTE command"):
 - Rename 'status' to 'ballot' everywhere in the comments.
 - Rename IPROTO_REQUEST_VOTE to IPROTO_VOTE_DEPRECATED and
   iproto_reply_request_vote to iproto_reply_vote_deprecated
   to emphasize the fact that this iproto command has been
   deprecated and IPROTO_VOTE should be used instead.
 - Only send an IPROTO_VOTE request to a master if it is
   running tarantool 1.10.1 or newer.

42a0ebfa

Jul 20, 2018

Introduce IPROTO_VOTE command · fe8ae607

Vladimir Davydov authored 6 years ago

The new command is supposed to supersede IPROTO_REQUEST_VOTE, which is
difficult to extend, because it uses the global iproto key namespace.
The new command returns a map (IPROTO_BALLOT), to which we can add
various information without polluting the global namespace. Currently,
the map contains IPROTO_BALLOT_IS_RO and IPROTO_BALLOT_VCLOCK keys,
but soon info needed for the replica rebootstrap feature will be added to it.

Needed for #461

fe8ae607

Jul 19, 2018

Merge branch '1.9' into 1.10 · 712108d2
Kirill Yukhin authored 6 years ago

712108d2

say: fix invalid arguments · 1046f851

Kirill Shcherbatov authored 6 years ago

The _say function was called with invalid arguments.
Thanks to @sorc1 for the patch.

Closes #3433.

1046f851

say: add missing strdup failure check · c422b267
Olga Arkhangelskaia authored 6 years ago
```
Strdup may silently fail without any message from tarantool.
This patch adds the missing checks.
```
c422b267

third_party: fix strings "true"/"false" in yaml · a2d7643c

Kirill Shcherbatov authored 6 years ago

Strings containing "true" and "false" were converted
to a boolean type when serializing. Fixed.
Example:
type(yaml.decode(yaml.encode('false'))) == 'string'
type(yaml.decode(yaml.encode('true'))) == 'string'

Closes #3476.

a2d7643c

lua: fix strange behaviour of tonumber64 · 455e8898

Kirill Shcherbatov authored 6 years ago

Function tonumber64 worked incorrectly with values less
than INT64_MIN.
Now it works in the interval [INT64_MIN, UINT64_MAX], returning
nil otherwise.

Closes #3466.

455e8898

net.box: fix invalid index:count() with iterator · b793994d

Kirill Shcherbatov authored 6 years ago

Net.box didn't pass options containing an iterator to
the server side.
There were also invalid results for two :count tests in
net.box.result file.

Thanks to @ademenev for reporting the problem and helping to
locate it.

Closes #3262.

b793994d

vinyl: pass flags to vy_recovery_new · aa08773a

Vladimir Davydov authored 6 years ago

Currently, this function takes a single boolean argument, but I'm
planning to add another one. Since two bool arguments look rather
confusing, let's turn these arguments into flags.
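
The readability argument can be illustrated with a small Python sketch. The flag names and the `recovery_new` stand-in below are hypothetical and chosen only for illustration; they are not the names used in vinyl.

```python
from enum import Flag

class RecoveryFlags(Flag):
    # Hypothetical flag names, chosen only for illustration.
    NONE = 0
    ONLY_CHECKPOINT = 1
    LOAD_CHECKPOINT = 2

def recovery_new(signature, flags=RecoveryFlags.NONE):
    """Describe what a call would do; stands in for the real constructor."""
    return {
        "signature": signature,
        "only_checkpoint": bool(flags & RecoveryFlags.ONLY_CHECKPOINT),
        "load_checkpoint": bool(flags & RecoveryFlags.LOAD_CHECKPOINT),
    }

# Reads better at the call site than recovery_new(42, True, False).
print(recovery_new(42, RecoveryFlags.ONLY_CHECKPOINT))
```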

Needed for #461

aa08773a

gc: return gc_consumer_signature() to avoid vclock_copy() in box.info.gc() · 56124ebe
Konstantin Osipov authored 6 years ago
```
A small cleanup to avoid a potentially inefficient use of
gc_consumer_vclock().

Follow up on a patch for gh-461.
```
56124ebe

gc: keep track of vclocks instead of signatures · 887845a1

Vladimir Davydov authored 6 years ago

In order to check if a replica needs to be rebootstrapped, we need to
know the vclock of the oldest WAL stored on the master, but the garbage
collector works with signatures and hence can't report the vclock it was
last called for. Actually, all gc users have a vclock and can pass it
instead of signature so it's pretty easy to switch garbage collection
infrastructure to vclock.
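
A minimal Python sketch of why a signature is not enough, assuming a vclock is modeled as a dictionary from replica id to LSN and a signature is the sum of its components (the `signature` helper is illustrative, not tarantool code):

```python
def signature(vclock):
    """A signature is the sum of all vclock components."""
    return sum(vclock.values())

a = {1: 10, 2: 5}
b = {1: 5, 2: 10}
print(signature(a), signature(b))  # 15 15
print(a == b)                      # False
```

Two different vclocks can share one signature, so the original vclock cannot be recovered from the signature alone.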

Needed for #461

887845a1

Jul 18, 2018

xrow: factor out function for decoding vclock · fdb1e715
Vladimir Davydov authored 6 years ago
```
We will need it in other places.
```
fdb1e715

recovery: clean up WAL dir scan code · 9f1e0f44

Vladimir Davydov authored 6 years ago

 - Remove extra scan of the WAL directory from local_recovery() - we
   scan the directory in recovery_end_vclock() hence we can skip scan in
   recover_remaining_wals() by passing scan_dir = false.

 - Rename recovery_end_vclock() to recovery_scan() to emphasize the fact
   that this function scans the WAL directory. Write a comment to this
   function.

 - Add comments to wal.c explaining why we scan the WAL directory there.

Follow-up 0695fbbb ("box: retrieve end vclock before starting local
recovery").

9f1e0f44

Update test-run · c9bb2492
Vladimir Davydov authored 6 years ago
```
To bring in the crash_expected option of the "start server" command.
```
c9bb2492

Add errors for non-existent privileges and entities. · aecbbfd7

Serge Petrenko authored 6 years ago

There were no checks for granting and revoking a non-existent
privilege or a privilege to a non-existent entity.
Added the checks, and a test case.

Closes #3417

aecbbfd7

Jul 17, 2018

net.box: fix invalid index:count() with iterator · 25b9f0f0

Kirill Shcherbatov authored 6 years ago

Net.box didn't pass options containing an iterator to
the server side.
There were also invalid results for two :count tests in
net.box.result file.

Thanks to @ademenev for reporting the problem and helping to
locate it.

Closes #3262.

25b9f0f0

vinyl: fix potential use-after-free in vy_read_view_merge · 18d1acbd

Vladimir Davydov authored 6 years ago

If is_first_insert flag is set and vy_stmt_type(rv->tuple) equals
IPROTO_DELETE, we free rv->tuple, but then we dereference it via
an on-stack variable to check if we need to turn a REPLACE into an
INSERT or vice versa. Fix this.

18d1acbd