Commits · eddc8cc03bc8a00c4f2edfc9782910b88b0fdd33 · core / tarantool

Dec 11, 2019

test: update box/errinj.test.lua result file · eddc8cc0

Vladislav Shpilevoy authored 5 years ago

The test started failing after my commit ca07088c
(func: fix not unloading of unused modules), because I forgot to
update the result file.

Follow up #4648

Dec 10, 2019

errinj: provide 'get' method in Lua · c3c6d3fc

Vladislav Shpilevoy authored 5 years ago

Error injections are used to simulate an error. They are
represented as a flag or a number, and are used in Lua tests. But
they don't provide any feedback, which makes it impossible to use the
injections to check that something has happened, even when that
really needs to be checked and cannot be checked any other way.

More specifically, the patch is motivated by the need to count
loaded dynamic libraries to ensure that they are loaded and
unloaded when expected. This is impossible to do in a
platform-independent way. But an error injection used as a
debug-only counter would solve the problem.
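
As an illustration of the idea only, the following C sketch shows how an integer error injection could double as a debug-only counter that a test reads back through a getter. The `struct errinj` layout, the `errinj_get_int()` helper, the injection names, and the `module_on_load()`/`module_on_unload()` hooks are all hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error-injection registry; names and layout are illustrative. */
enum errinj_type { ERRINJ_BOOL, ERRINJ_INT };

struct errinj {
	const char *name;
	enum errinj_type type;
	union {
		bool bparam;
		int64_t iparam;
	};
};

static struct errinj errinjs[] = {
	{ .name = "ERRINJ_WAL_DELAY", .type = ERRINJ_BOOL, .bparam = false },
	{ .name = "ERRINJ_DYN_MODULE_COUNT", .type = ERRINJ_INT, .iparam = 0 },
};

static struct errinj *
errinj_by_name(const char *name)
{
	for (size_t i = 0; i < sizeof(errinjs) / sizeof(errinjs[0]); i++) {
		if (strcmp(errinjs[i].name, name) == 0)
			return &errinjs[i];
	}
	return NULL;
}

/* Read side: this is what a Lua-level 'get' would wrap. */
static int64_t
errinj_get_int(const char *name)
{
	struct errinj *inj = errinj_by_name(name);
	assert(inj != NULL && inj->type == ERRINJ_INT);
	return inj->iparam;
}

/* Write side: debug-only hooks in a hypothetical module loader. */
static void
module_on_load(void)
{
	errinj_by_name("ERRINJ_DYN_MODULE_COUNT")->iparam++;
}

static void
module_on_unload(void)
{
	errinj_by_name("ERRINJ_DYN_MODULE_COUNT")->iparam--;
}
```

A test can then load and unload modules and compare the counter with the expected number of live libraries, which needs no platform-specific introspection.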

Needed for #4648

Oct 21, 2019

recovery: build secondary index in hot standby mode · 5aa243de

Ilya Kosarev authored 5 years ago

End recovery (which means building secondary indexes) right after
the last known log file has been read. This allows a fast switch to a
hot standby instance without waiting for secondary indexes to be built.
Because engine_end_recovery now runs earlier, xdir_collect_inprogress,
which used to be called from it, has been moved to the garbage collector.

Closes #4135

Sep 13, 2019

relay: join new replicas off read view · 6332aca6

Vladimir Davydov authored 5 years ago

Historically, we join a new replica off the last checkpoint. As a
result, we must always keep the last memtx snapshot and all vinyl data
files corresponding to it. Actually, there's no need to use the last
checkpoint for joining a replica. Instead, we can use the current read
view, as both memtx and vinyl support it. This should speed up the
process of joining a new replica, because we don't need to replay all
xlogs written after the last checkpoint, only those that are accumulated
while we are relaying the current read view. This should also allow us
to avoid creating a snapshot file on bootstrap, because the only reason
we need it is to allow replicas to join. Besides, this is a step
towards decoupling the vinyl metadata log from checkpointing in
particular and from xlogs in general.

Closes #1271

Aug 20, 2019

sql: GREATEST, LEAST instead of MIN/MAX overload · a46b5200

Kirill Shcherbatov authored 5 years ago

This patch does two things: renames existing scalar min/max
functions and reserves names for them in NoSQL cache.

Moreover, it is an important step towards getting rid of function name
overloading, which is required to replace the FuncDef cache with
Tarantool's function cache.

Closes #4405
Needed for #2200, #4113, #2233

@TarantoolBot document
Title: Scalar functions MIN/MAX are renamed to LEAST/GREATEST

The MIN/MAX functions are typically used only as aggregate
functions in other RDBMSs (MSSQL, PostgreSQL, MySQL, Oracle), while
Tarantool's SQLite legacy code also used them as the GREATEST/LEAST
scalar functions. Now this is fixed.

Aug 14, 2019

wal: make wal_sync fail on write error · 2d5e56ff

Vladimir Davydov authored 5 years ago

wal_sync() simply flushes the tx<->wal request queue; it doesn't
guarantee that all pending writes are successfully committed to disk.
This works for now, but in order to implement replica join off the
current read view, we need to make sure that all pending writes have
been persisted and won't be rolled back before we can use memtx
snapshot iterators. So this patch adds a return code to wal_sync():
from now on it returns -1 if rollback is in progress and hence
some in-memory changes are going to be rolled back. We will use
this method after opening the memtx snapshot iterators used for feeding
a consistent read view to a newly joined replica, so as to ensure that
the changes frozen by the iterators have made it to disk.
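
A hedged sketch of the contract change, assuming a drastically simplified writer: `struct wal_writer`, its fields, and `wal_flush_queue()` are hypothetical, and only the "-1 while a rollback is in progress" behaviour comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified WAL writer state. */
struct wal_writer {
	int queued;        /* requests sitting in the tx<->wal queue */
	bool in_rollback;  /* a write failed and rollback is running */
};

/* Old behaviour: drain the queue; says nothing about success. */
static void
wal_flush_queue(struct wal_writer *writer)
{
	writer->queued = 0;
}

/*
 * New contract: after draining the queue, report -1 if pending
 * in-memory changes are about to be rolled back, 0 otherwise.
 */
static int
wal_sync(struct wal_writer *writer)
{
	wal_flush_queue(writer);
	return writer->in_rollback ? -1 : 0;
}
```

A caller that has just opened read-view iterators can treat -1 as "the frozen data may not be durable" and abort the join instead of feeding it to the replica.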

Jul 05, 2019

test: redo some swim tests using error injections · a0d6ac29

Vladislav Shpilevoy authored 5 years ago

There were tests relying on certain content of SWIM messages.
After the next patches, these conditions won't hold without
explicit intervention via error injections.

The patchset moves these tests to separate files that are
disabled in release builds.

Part of #4253

Jul 04, 2019

Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY · 3d5da41c

Vladimir Davydov authored 5 years ago

Timeout injections are unstable and difficult to use. Injecting a delay is much more convenient.

Jun 11, 2019

txn: Fire a trigger after a transaction finalization · 831a35d4

Georgy Kirichenko authored 5 years ago

Fire transaction triggers after the transaction has been finalized. This
prevents triggers from seeing the changes discarded by a rollback.

Fixes: #4276

Jun 04, 2019

test: move vinyl space format test case to engine suite · 0a59d73d

Serge Petrenko authored 5 years ago

After making memtx space format check non-blocking, move the appropriate
vinyl test case to engine suite. Introduce a new errinj,
ERRINJ_CHECK_FORMAT_DELAY, to unify the test case for both engines.

Follow-up #3976

May 30, 2019

test: move background index build test to engine suite from vinyl · 39d0e427

Serge Petrenko authored 5 years ago

Since we have implemented memtx background index build, the
corresponding vinyl test cases are now also suitable for memtx,
so move them to the engine suite so that both engines are tested.
Also add some tests to check that an ongoing index build is aborted if
a tuple violating the unique constraint or the format of the new index
is inserted.
Add some error injections to unify the corresponding memtx/vinyl tests.

Closes #3976

May 07, 2019

box: zap field_map_get_size · 55e1a140

Vladimir Davydov authored 5 years ago

Turns out we don't really need it as we can use data_offset + bsize
(i.e. the value returned by tuple_size() helper function) to get the
size of a tuple to free. We only need to take into account the offset
of the base tuple struct in the derived struct (memtx_tuple).

There's a catch though:

 - We use sizeof(struct memtx_tuple) + field_map_size + bsize for
   allocation size.
 - We set data_offset to sizeof(struct tuple) + field_map_size.
 - struct tuple is packed, which makes its size 10 bytes.
 - memtx_tuple embeds struct tuple (base) at 4 byte offset, but since
   it is not packed, its size is 16 bytes, NOT 4 + 10 = 14 bytes as
   one might expect!
 - This means data_offset + bsize + offsetof(struct memtx_tuple, base)
   doesn't equal allocation size.

To fix that, let's mark memtx_tuple packed. The only side effect it has
is that we save 2 bytes per memtx tuple. It won't affect tuple data
layout at all, because struct memtx_tuple already has a packed layout,
so 'packed' will only affect its size, which is only used for
computing allocation size.
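
The size arithmetic above can be checked with a small C program. The field lists of `struct tuple` and of the two wrapper structs are illustrative assumptions, chosen only so that the packed base is 10 bytes and sits at a 4-byte offset as the commit message describes; `alloc_size()` and `size_to_free()` are hypothetical helpers, and `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed base struct: 2 + 2 + 2 + 4 = 10 bytes (fields are illustrative). */
struct __attribute__((packed)) tuple {
	uint16_t refs;
	uint16_t format_id;
	uint16_t data_offset;
	uint32_t bsize;
};

/* Not packed: base lands at offset 4, but 4-byte alignment pads 14 to 16. */
struct memtx_tuple_unpacked {
	uint32_t version;
	struct tuple base;
};

/* Packed: same member offsets, no tail padding, so the size is 14. */
struct __attribute__((packed)) memtx_tuple_packed {
	uint32_t version;
	struct tuple base;
};

/* Allocation: sizeof(wrapper) + field_map_size + bsize. */
static size_t
alloc_size(size_t wrapper_size, size_t field_map_size, size_t bsize)
{
	return wrapper_size + field_map_size + bsize;
}

/* Size recomputed at free time: data_offset + bsize + offsetof(base). */
static size_t
size_to_free(size_t base_offset, size_t field_map_size, size_t bsize)
{
	size_t data_offset = sizeof(struct tuple) + field_map_size;
	return data_offset + bsize + base_offset;
}
```

With an 8-byte field map and a 100-byte body, the unpacked wrapper allocates 124 bytes while the recomputed size is 122; the 2-byte difference is exactly the tail padding that marking the wrapper packed removes.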

My bad, I overlooked it during review.

Follow-up f1d9f257 ("box: introduce multikey indexes in memtx").

Apr 29, 2019

core/coio_file: Use eio_sendfile_sync instead of a chunk mode · 04bf646f

Cyrill Gorcunov authored 5 years ago

The eio library provides a portable version of the sendfile syscall,
which is much more efficient than explicitly copying a file
in 4K chunks.

Apr 16, 2019

vinyl: fix crash during index build · ccd46a27

Vladimir Davydov authored 5 years ago

To propagate changes applied to a space while a new index is being
built, we install an on_replace trigger. In case the on_replace
trigger callback fails, we abort the DDL operation.

The problem is the trigger may yield, e.g. to check the unique
constraint of the new index. This opens a time window for the DDL
operation to complete and clear the trigger. If this happens, the
trigger will try to access the outdated build context and crash:

 | #0  0x558f29cdfbc7 in print_backtrace+9
 | #1  0x558f29bd37db in _ZL12sig_fatal_cbiP9siginfo_tPv+1e7
 | #2  0x7fe24e4ab0e0 in __restore_rt+0
 | #3  0x558f29bfe036 in error_unref+1a
 | #4  0x558f29bfe0d1 in diag_clear+27
 | #5  0x558f29bfe133 in diag_move+1c
 | #6  0x558f29c0a4e2 in vy_build_on_replace+236
 | #7  0x558f29cf3554 in trigger_run+7a
 | #8  0x558f29c7b494 in txn_commit_stmt+125
 | #9  0x558f29c7e22c in box_process_rw+ec
 | #10 0x558f29c81743 in box_process1+8b
 | #11 0x558f29c81d5c in box_upsert+c4
 | #12 0x558f29caf110 in lbox_upsert+131
 | #13 0x558f29cfed97 in lj_BC_FUNCC+34
 | #14 0x558f29d104a4 in lua_pcall+34
 | #15 0x558f29cc7b09 in luaT_call+29
 | #16 0x558f29cc1de5 in lua_fiber_run_f+74
 | #17 0x558f29bd30d8 in _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_+1e
 | #18 0x558f29cdca33 in fiber_loop+41
 | #19 0x558f29e4e8cd in coro_init+4c

To fix this issue, let's recall that when a DDL operation completes,
all pending transactions that affect the altered space are aborted by
the space_invalidate callback. So to avoid the crash, we just need to
bail out early from the on_replace trigger callback if we detect that
the current transaction has been aborted.
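
The shape of the fix can be sketched as follows. `struct txn`, `struct build_ctx`, the `yield_fn` hook, and the two simulation helpers are hypothetical stand-ins; the yield is simulated by a callback that may mark the transaction aborted, the way space_invalidate would when the DDL completes. Returning 0 on the early exit is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real transaction and build context. */
struct txn {
	bool is_aborted;        /* set when DDL completion aborts the txn */
};

struct build_ctx {
	int rows_applied;       /* becomes invalid once the DDL completes */
};

/* Simulates a yield point, e.g. a disk read during a unique check. */
typedef void (*yield_fn)(struct txn *txn);

static void
yield_no_ddl(struct txn *txn)
{
	(void)txn;              /* nothing happens while we are suspended */
}

static void
yield_ddl_completes(struct txn *txn)
{
	txn->is_aborted = true; /* DDL finished and aborted pending txns */
}

/* Sketch of the on_replace trigger callback after the fix. */
static int
build_on_replace(struct txn *txn, struct build_ctx *ctx, yield_fn yield)
{
	yield(txn);
	/*
	 * If the DDL completed while we were yielding, ctx may already be
	 * gone. The transaction is aborted anyway, so just bail out.
	 */
	if (txn->is_aborted)
		return 0;
	ctx->rows_applied++;
	return 0;
}
```

The key point is that the check comes after the yield, because that is the only place where the build context can become stale.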

Closes #4152

Apr 10, 2019

test: fix vinyl/errinj_ddl failure · f41d2999

Vladimir Davydov authored 5 years ago

This patch fixes the following three test failures:

 | --- vinyl/errinj_ddl.result	Tue Mar 19 17:52:48 2019
 | +++ vinyl/errinj_ddl.reject	Tue Mar 19 19:05:36 2019
 | @@ -358,7 +358,7 @@
 | ...
 | s.index.sk:stat().memory.rows
 | ---
 | -- 27
 | +- 23
 | ...
 | test_run:cmd('restart server default')
 | fiber = require('fiber')

This happens, because creation of the test index can happen later than
we expect. Fix it by adding an appropriate wait_cond.

 | --- vinyl/errinj_ddl.result	Tue Mar 19 17:52:48 2019
 | +++ vinyl/errinj_ddl.reject	Tue Mar 19 18:07:55 2019
 | @@ -504,6 +504,7 @@
 | ...
 | _ = s1:create_index('sk', {parts = {2, 'unsigned'}})
 | ---
 | +- error: Tuple field 2 required by space format is missing
 | ...
 | errinj.set("ERRINJ_VY_READ_PAGE_TIMEOUT", 0)
 | ---

This one is due to a test transaction completing before DDL starts so
that the transaction isn't aborted by DDL, as we expect. Fix it by
making sure the transaction won't commit before DDL starts, again with
the aid of wait_cond.

 | --- vinyl/errinj_ddl.result     Wed Apr 10 18:59:57 2019
 | +++ vinyl/errinj_ddl.reject     Wed Apr 10 19:05:35 2019
 | @@ -779,7 +779,7 @@
 |  ...
 |  ch1:get()
 |  ---
 | -- Transaction has been aborted by conflict
 | +- Duplicate key exists in unique index 'i1' in space 'test'
 |  ...
 |  ch2:get()
 |  ---

This test case fails, because we use a timeout to stall reading DML
operations. This was initially a bad call, because under severe load
(e.g. parallel test run), the timeout may fire before we get to execute
the DDL request, which is supposed to abort the DML operations, in which
case they won't be aborted. Fix this by replacing the timeout with a
delay, as we should have done right from the start.

Closes #4056
Closes #4057

Mar 27, 2019

sql: store regular identifiers in case-normal form · e7558062

Kirill Shcherbatov authored 6 years ago

Introduce a new sql_normalize_name routine performing SQL name
conversion to case-normal form via Unicode character folding.
For example, ß is converted to SS. The result is similar to the SQL
UPPER function.
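
To make the ß example concrete, here is a toy C function. It is not sql_normalize_name: the hypothetical `toy_normalize_name()` upper-cases ASCII letters and special-cases only U+00DF (UTF-8 bytes C3 9F) as "SS", copying every other byte unchanged, whereas real case normalization would need full Unicode case mapping.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy "case-normal form": upper-case ASCII letters and map U+00DF
 * (UTF-8 C3 9F) to "SS"; all other bytes are copied unchanged.
 * Returns the output length, or -1 if out_size is too small.
 */
static int
toy_normalize_name(const char *in, char *out, size_t out_size)
{
	assert(in != NULL && out != NULL && out_size > 0);
	size_t o = 0;
	for (size_t i = 0; in[i] != '\0'; i++) {
		unsigned char c = (unsigned char)in[i];
		if (c == 0xC3 && (unsigned char)in[i + 1] == 0x9F) {
			/* 'ß' upper-cases to two letters. */
			if (o + 2 >= out_size)
				return -1;
			out[o++] = 'S';
			out[o++] = 'S';
			i++;    /* skip the second byte of the sequence */
		} else {
			if (o + 1 >= out_size)
				return -1;
			out[o++] = (char)(c < 0x80 ? toupper(c) : c);
		}
	}
	out[o] = '\0';
	return (int)o;
}
```

Note that the output can be longer than the input ("straße" has 6 characters, "STRASSE" has 7), so a normalizing routine has to size its buffer accordingly.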

Closes #3931

Mar 26, 2019

test: fix long_row_timeout.test.lua failure in parallel mode · 17acae1f

Serge Petrenko authored 6 years ago

The test used to write big rows (20 MB in size), so when run in parallel
mode, it put high load on the disk and processor, which made appliers
time out multiple times during read and caused the test to fail
occasionally.
So, instead of writing huge rows in the test, introduce a new error
injection that restricts sio from reading more than a couple of bytes per
request. This keeps the test relevant and makes it a lot
more lightweight.

Closes #4062

Feb 08, 2019

wal: do not promote wal vclock for failed writes · 066b929b

Georgy Kirichenko authored 6 years ago

WAL used to promote the vclock before writing the row. This led to a
situation where a master's row would be skipped forever if there was
an error writing it. However, some errors are transient, and we
might be able to successfully apply the same row later. So we do not
promote the writer vclock until the write succeeds, in order to be able
to restart replication from the failing point.
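
A minimal sketch of the ordering change, assuming a toy vector clock indexed by replica id. `struct vclock` here, `wal_write_row()`, and the `write_fn` hooks are hypothetical and stand in for the real WAL writer and xlog write.

```c
#include <assert.h>
#include <stdint.h>

#define VCLOCK_MAX 32

/* Toy vector clock: one LSN per replica id. */
struct vclock {
	int64_t lsn[VCLOCK_MAX];
};

/* Hypothetical write hook standing in for the xlog write; 0 or -1. */
typedef int (*write_fn)(uint32_t replica_id, int64_t lsn);

static int
write_ok(uint32_t replica_id, int64_t lsn)
{
	(void)replica_id;
	(void)lsn;
	return 0;
}

static int
write_fails(uint32_t replica_id, int64_t lsn)
{
	(void)replica_id;
	(void)lsn;
	return -1;              /* e.g. a transient ENOSPC */
}

/*
 * Promote the writer vclock only after the row is written. On failure
 * the vclock still points before the row, so replication can resume
 * from it.
 */
static int
wal_write_row(struct vclock *writer, uint32_t replica_id, int64_t lsn,
	      write_fn write)
{
	assert(replica_id < VCLOCK_MAX);
	if (write(replica_id, lsn) != 0)
		return -1;
	writer->lsn[replica_id] = lsn;
	return 0;
}
```

With the old order (promote, then write), a failed write would leave the vclock at the failed LSN, and a retry of the same row would look like a duplicate.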

Obsoletes xlog/panic_on_lsn_gap.test.

Needed for #2283

Jan 25, 2019

Allow to reuse tuple_formats for ephemeral spaces · dbbd9317

Kirill Yukhin authored 6 years ago

Since ephemeral spaces might be used extensively under heavy load
with SQL queries, it is possible to run out of tuple_formats for
such spaces. This occurs because a tuple_format is not immediately
deleted when an ephemeral space is dropped. Instead, its removal is
postponed and triggered only when tuple memory is exhausted.
Since there's no way to alter an ephemeral space's format,
let's reuse formats for multiple ephemeral spaces when
they're identical.
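
A hedged sketch of format reuse: look for an identical existing format and bump its reference count, allocating a new slot only on a miss. `struct format`, the fixed-size table, and `ephemeral_format_get()` are illustrative; the commit does not say how the lookup is implemented, and a linear scan is used here only to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8
#define MAX_FORMATS 16

/* Hypothetical, simplified format: just the field types. */
struct format {
	int field_count;
	char field_type[MAX_FIELDS];   /* e.g. 'u' unsigned, 's' string */
	int refs;
};

static struct format formats[MAX_FORMATS];
static int format_count;

static int
format_equal(const struct format *f, int field_count, const char *types)
{
	return f->field_count == field_count &&
	       memcmp(f->field_type, types, (size_t)field_count) == 0;
}

/*
 * Ephemeral formats are immutable, so an identical existing format can
 * be shared instead of consuming another format slot.
 */
static struct format *
ephemeral_format_get(int field_count, const char *types)
{
	assert(field_count <= MAX_FIELDS);
	for (int i = 0; i < format_count; i++) {
		if (format_equal(&formats[i], field_count, types)) {
			formats[i].refs++;
			return &formats[i];
		}
	}
	if (format_count == MAX_FORMATS)
		return NULL;               /* out of format slots */
	struct format *f = &formats[format_count++];
	f->field_count = field_count;
	memcpy(f->field_type, types, (size_t)field_count);
	f->refs = 1;
	return f;
}
```

Twenty ephemeral spaces with the same layout now consume one format slot instead of twenty.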

Closes #3924

Dec 06, 2018

test: errinj for pause relay_send · 1c34c91f

Sergei Voronezhskii authored 6 years ago

Instead of using a timeout, we just need to pause `relay_send`. We can't
rely on a timeout because of varying system load in parallel mode. Add a
new errinj that is checked in a loop: while it is `true`, `relay_send`
does not proceed to the next statement.

To check read-only mode, we need to modify a tuple. It is enough to call
the `replace` method, instead of calling `delete` and then uselessly
verifying with the `get` method that the tuple has not been deleted.

Also, look up the xlog files in a loop with a short sleep until the file
count is as expected.

Update box/errinj.result because a new errinj was added.

Part of #2436, #3232

Nov 29, 2018

test: fix vinyl/errinj spurious failure · 8e13153b

Vladimir Davydov authored 6 years ago

The failing test case checks that modifications done to the space during
the final dump of a newly built index are recovered properly. It assumes
that a series of operations will complete in 0.1 seconds, but it may not
happen if the disk is slow (like on Travis CI). This results in spurious
failures. To fix this issue, let's replace ERRINJ_VY_RUN_WRITE_TIMEOUT
used by the test with ERRINJ_VY_RUN_WRITE_DELAY, which blocks index
creation until it is disabled instead of injecting a time delay as its
predecessor did.

Closes #3756

Oct 25, 2018

wal: delete old wal files when running out of disk space · 8a1bdc82

Vladimir Davydov authored 6 years ago

Now if the WAL thread fails to preallocate disk space needed to commit
a transaction, it will delete old WAL files until it succeeds or it
deletes all files that are not needed for local recovery from the oldest
checkpoint. After it deletes a file, it notifies the garbage collector
via the WAL watcher interface. The latter then deactivates consumers
that would need deleted files.

The user doesn't see an ENOSPC error if the WAL thread successfully
allocates disk space after deleting old files. Here's what's printed
to the log when this happens:

wal/101/main C> ran out of disk space, try to delete old WAL files
wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000005.xlog
wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000006.xlog
wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000007.xlog
main/105/main C> deactivated WAL consumer replica 82d0fa3f-6881-4bc5-a2c0-a0f5dcf80120 at {1: 5}
main/105/main C> deactivated WAL consumer replica 98dce0a8-1213-4824-b31e-c7e3c4eaf437 at {1: 7}
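
The deletion loop can be sketched as follows, assuming a toy model in which xlog files are numbered consecutively, each deleted file frees a fixed number of bytes, and a counter stands in for the WAL watcher notification. `struct wal_dir` and `wal_preallocate()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical WAL directory state; files are numbered oldest first. */
struct wal_dir {
	int oldest;        /* first existing xlog file */
	int needed_from;   /* oldest file needed for local recovery */
	long free_bytes;
	long file_bytes;   /* bytes freed per deleted file (assumed fixed) */
	int notified;      /* deletions reported to the garbage collector */
};

static bool
try_preallocate(const struct wal_dir *dir, long need)
{
	return dir->free_bytes >= need;
}

/*
 * On ENOSPC, delete old files one by one until preallocation succeeds
 * or only files needed for local recovery remain. Each deletion is
 * reported so the GC can deactivate consumers that needed the file.
 */
static int
wal_preallocate(struct wal_dir *dir, long need)
{
	while (!try_preallocate(dir, need)) {
		if (dir->oldest >= dir->needed_from)
			return -1;       /* nothing left to delete: ENOSPC */
		dir->oldest++;
		dir->free_bytes += dir->file_bytes;
		dir->notified++;         /* WAL watcher notification */
	}
	return 0;
}
```

With files 5, 6 and 7 deletable and 8 needed for recovery, the first case in the test deletes exactly those three, mirroring the removals in the log excerpt above.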

Closes #3397

Sep 22, 2018

test: fix spurious box/access_sysview test failure · 144c58b3

Vladimir Davydov authored 6 years ago

Due to a missing privilege revocation in box/errinj, box/access_sysview
fails if executed after it.

Fixes commit af6b554b ("test: remove universal grants from tests").

Sep 20, 2018

test: remove universal grants from tests · af6b554b

Serge Petrenko authored 6 years ago

This patch rewrites all tests to grant only necessary privileges, not
privileges to universe. This was made possible by bugfixes in access
control, patches #3516, #3574, #3524, #3530.

Follow-up #3530

Sep 19, 2018

vinyl: keep track of compaction queue length · 06e70cad

Vladimir Davydov authored 6 years ago

Currently, there's no way to figure out whether compaction keeps up
with dumps or not, while this is essential for implementing transaction
throttling. This patch adds a metric that is supposed to help answer
this question: the compaction queue size. It is calculated per
range and per LSM tree as the total size of slices awaiting compaction.
We update the metric along with the compaction priority of a range, in
vy_range_update_compact_priority(), and account it to an LSM tree in
vy_lsm_acct_range(). For now, the new metric is reported only on a
per-index basis, in index.stat() under disk.compact.queue.
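
A simplified sketch of the accounting, assuming a range is described by the sizes of its slices (newest first) and that the compaction priority is the number of newest slices to compact together. `struct range`, `struct lsm`, and the helper functions are hypothetical and only mimic the roles of vy_range_update_compact_priority() and vy_lsm_acct_range().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLICES 8

/* Hypothetical simplified range: slice sizes, newest first. */
struct range {
	int slice_count;
	int64_t slice_size[MAX_SLICES];
	int compact_priority;   /* how many newest slices to compact */
	int64_t compact_queue;  /* bytes awaiting compaction */
};

/* Hypothetical LSM tree: aggregates the queue over its ranges. */
struct lsm {
	int64_t compact_queue;
};

/* Recompute the queue together with the compaction priority. */
static void
range_update_compact_queue(struct range *range)
{
	assert(range->slice_count <= MAX_SLICES);
	range->compact_queue = 0;
	for (int i = 0; i < range->compact_priority &&
			i < range->slice_count; i++)
		range->compact_queue += range->slice_size[i];
}

static void
lsm_acct_range(struct lsm *lsm, const struct range *range)
{
	lsm->compact_queue += range->compact_queue;
}

static void
lsm_unacct_range(struct lsm *lsm, const struct range *range)
{
	lsm->compact_queue -= range->compact_queue;
}
```

Unaccounting a range before recomputing it and accounting it again afterwards keeps the per-tree total equal to the sum over its ranges.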

Aug 29, 2018

xlog: add request details to panic message for broken LSN · 48f55559

Sergei Kalashnikov authored 6 years ago

Aid the debugging of replication issues related to out-of-order
requests. Add the details of the request/tuple to the diagnostic
message whenever possible.

Closes #3105

Aug 08, 2018

test: fix box/errinj.test.lua sporadic failure · 8c06a069

Mergen Imeev authored 6 years ago

In some cases, box.snapshot() takes longer than expected.
This leads to situations where the previous error is reported instead
of the new one. Now these errors are completely separated.

Closes #3599

Jul 30, 2018

vinyl: implement rebootstrap support · 06658416

Vladimir Davydov authored 6 years ago

If vy_log_bootstrap() finds a vylog file in the vinyl directory, it
assumes it has to be rebootstrapped and calls vy_log_rebootstrap().
The latter scans the old vylog file to find the max vinyl object id,
from which it will start numbering objects created during rebootstrap to
avoid conflicts with old objects, then it writes VY_LOG_REBOOTSTRAP
record to the old vylog to denote the beginning of a rebootstrap
section. After that initial join proceeds as usual, writing information
about new objects to the old vylog file after VY_LOG_REBOOTSTRAP marker.
Upon successful rebootstrap completion, checkpoint, which is always
called right after bootstrap, rotates the old vylog and marks all
objects created before the VY_LOG_REBOOTSTRAP marker as dropped in the
new vylog. The old objects will be purged by the garbage collector as
usual.

In case rebootstrap fails and checkpoint never happens, local recovery
writes VY_LOG_ABORT_REBOOTSTRAP record to the vylog. This marker
indicates that the rebootstrap attempt failed and all objects created
during rebootstrap should be discarded. They will be purged by the
garbage collector on checkpoint. Thus even if rebootstrap fails, it is
possible to recover the database to the state that existed right before
a failed rebootstrap attempt.

Closes #461

Jun 28, 2018

xdir: remove inprogress files after restart · f41aac61

Vladimir Davydov authored 6 years ago

If tarantool is stopped while writing a snapshot or a vinyl run file,
inprogress files will never be removed. Fix this by collecting those
files on recovery completion.

Original patch by @IlyaMarkovMipt. Reworked by @locker.

Closes #3406

test: update test results · aaa9bdbe

Konstantin Osipov authored 6 years ago

A minor follow-up on the fix for gh-3452 (http.client timeout bug).

Jun 14, 2018

memtx: don't delay deletion of temporary tuples during snapshot · f9299c43

Vladimir Davydov authored 6 years ago

Since tuples stored in temporary spaces are never written to disk, we
can always delete them immediately, even when a snapshot is in progress.

Closes #3432

Jun 01, 2018

vinyl: fix compaction vs checkpoint race resulting in invalid gc · b25e3168

Vladimir Davydov authored 6 years ago

The callback invoked upon compaction completion uses checkpoint_last()
to determine whether compacted runs may be deleted: if the max LSN
stored in a compacted run (run->dump_lsn) is greater than the LSN of the
last checkpoint (gc_lsn) then the run doesn't belong to the last
checkpoint and hence is safe to delete, see commit 35db70fa ("vinyl:
remove runs not referenced by any checkpoint immediately").

The problem is checkpoint_last() isn't synced with vylog rotation - it
returns the signature of the last successfully created memtx snapshot
and is updated in memtx_engine_commit_checkpoint() after vylog is
rotated. If a compaction task completes after vylog is rotated but
before snap file is renamed, it will assume that compacted runs do not
belong to the last checkpoint, although they do (as they have been
appended to the rotated vylog), and delete them.

To eliminate this race, let's use vylog signature instead of snap
signature in vy_task_compact_complete().
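
The race reduces to a single comparison. In the hypothetical state below, a new checkpoint at LSN 200 has rotated vylog but not yet renamed the snap file, so the snap signature still says 100; `struct checkpoint_state` and `run_deletable()` are illustrative names, not the vinyl API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the two signatures during a checkpoint. */
struct checkpoint_state {
	int64_t snap_signature;   /* updated when the snap file is renamed */
	int64_t vylog_signature;  /* updated earlier, on vylog rotation */
};

/*
 * A compacted run may be deleted immediately only if it does not
 * belong to the last checkpoint, i.e. its max LSN exceeds gc_lsn.
 */
static bool
run_deletable(int64_t dump_lsn, int64_t gc_lsn)
{
	return dump_lsn > gc_lsn;
}
```

A run with dump_lsn 150 was already recorded in the rotated vylog, so it belongs to the new checkpoint: comparing against the snap signature wrongly marks it deletable, while comparing against the vylog signature keeps it.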

Closes #3437

May 31, 2018

vinyl: fix false-positive assertion at exit · ff02157f

Vladimir Davydov authored 6 years ago

latch_destroy() and fiber_cond_destroy() are basically no-op. All they
do is check that latch/cond is not used. When a global latch or cond
object is destroyed at exit, it may still have users and this is OK as
we don't stop fibers at exit. In vinyl this results in the following
false-positive assertion failures:

  src/latch.h:81: latch_destroy: Assertion `l->owner == NULL' failed.

  src/fiber_cond.c:49: fiber_cond_destroy: Assertion `rlist_empty(&c->waiters)' failed.

Remove "destruction" of vy_log::latch to suppress the first one. Wake up
all fibers waiting on vy_quota::cond before destruction to suppress the
second one. Add some test cases.

Closes #3412

May 25, 2018

test: rework test case for memtx async garbage collection · c5f98b91

Vladimir Davydov authored 6 years ago

Do not use errinj as it is unreliable. Check that:
 - No memory is freed immediately after space drop (WAL is off).
 - All memory is freed asynchronously after yield.

May 21, 2018

memtx: free tuples asynchronously when primary index is dropped · 2a1482f3

Vladimir Davydov authored 6 years ago

When a memtx space is dropped or truncated, we have to unreference all
tuples stored in it. Currently, we do it synchronously, thus blocking
the tx thread. If a space is big, tx thread may remain blocked for
several seconds, which is unacceptable. This patch makes drop/truncate
hand actual work to a background fiber.

Before this patch, drop of a space with 10M 64-byte records took more
than 0.5 seconds. After this patch, it takes less than 1 millisecond.

Closes #3408

May 08, 2018

iproto: fix error with unstoppable batching · 01bfa59b

Vladislav Shpilevoy authored 6 years ago

An IProto connection stops reading input when the request limit is
reached. But when multiple requests arrive in a batch, IProto does not
check the limit, so it can be violated.

Let's check the limit after each message during batch parsing too,
not only once before parsing.
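
A hedged sketch of the corrected loop: the limit is checked before accepting each message of a batch instead of once per batch. `struct conn`, its fields, and `conn_parse_batch()` are hypothetical; real IProto code tracks requests and input buffers differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connection state; field names are illustrative. */
struct conn {
	int in_progress;   /* requests accepted but not yet answered */
	int limit;         /* request limit per connection */
	int buffered;      /* complete requests in the input buffer */
	bool reading;      /* whether input is being read */
};

/*
 * Parse buffered requests, checking the limit before every message so
 * a large batch cannot push in_progress past it. Returns the number
 * of requests accepted.
 */
static int
conn_parse_batch(struct conn *c)
{
	int accepted = 0;
	while (c->buffered > 0) {
		if (c->in_progress >= c->limit) {
			c->reading = false;  /* resume once replies drain */
			break;
		}
		c->buffered--;
		c->in_progress++;
		accepted++;
	}
	return accepted;
}
```

Checking only before the loop would have accepted all ten buffered requests in the test below, more than three times the limit.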

Apr 22, 2018

test: fix unstable test · 0879a488

Vladislav Shpilevoy authored 6 years ago

test: fix unstable test · 320c66cd

Vladislav Shpilevoy authored 6 years ago

Apr 10, 2018

space: space_vtab::build_secondary_key => build_index · 4cbfdede

Vladimir Davydov authored 6 years ago

The build_secondary_key method of space vtab is used not only for
building secondary indexes, but also for rebuilding primary indexes.
To avoid confusion, let's rename it to build_index and pass to it
the source space, the new index, and the new tuple format.

Apr 07, 2018

vinyl: use ERRINJ_DOUBLE for ERRINJ_VY_READ_PAGE_TIMEOUT · 8dc9895f

Vladimir Davydov authored 6 years ago

We use ERRINJ_DOUBLE for all other timeout injections. This makes them
more flexible as we can inject an arbitrary timeout in tests, not just
enable some hard-coded timeout. Besides, it makes tests easier to
follow. So let's use ERRINJ_DOUBLE for ERRINJ_VY_READ_PAGE_TIMEOUT too.
