Commits · 49b374f9c3eb2c4bddd254226ee2adc8d366d858 · core / tarantool

Jun 13, 2024

relay: rename vclock args and make const · 49b374f9

Vladislav Shpilevoy authored 9 months ago

It wasn't clear which of them are inputs and which are outputs.
The patch explicitly marks the input vclocks as const. It makes
the code a bit easier to read inside of relay.cc knowing that
these vclocks shouldn't change.

Alongside that, "replica_clock" in subscribe is renamed to
"start_vclock" to make it consistent with relay_final_join() and
to signify that technically it doesn't have to be a replica
vclock. It isn't really: box.cc alters the replica's vclock before
giving it to relay, which means it is no longer a "replica clock".

In scope of #10047

NO_TEST=refactoring
NO_CHANGELOG=refactoring
NO_DOC=refactoring

(cherry picked from commit 5ebbed77)

49b374f9

relay: move gc subscriber creation out of it · 605752e5

Vladislav Shpilevoy authored 9 months ago

GC consumer creation and destruction seemed to happen only in box.cc,
with one exception in relay_subscribe(). Let's move it out for
consistency. Now relay can only notify GC consumers, but can't
manage them.

That also makes it harder to misuse the GC by passing some wrong
vclock to it, similar to what was happening in #10047.

In scope of #10047

NO_TEST=refactoring
NO_CHANGELOG=refactoring
NO_DOC=refactoring

(cherry picked from commit 4dc0c1ea)

605752e5

box: introduce box_localize_vclock · 149fc1f7

Vladislav Shpilevoy authored 9 months ago

The function takes the burden of explaining why this hack about
setting local component in a remote vclock is needed. It also
creates a new vclock rather than altering an existing one. This is
to signify that the vclock is no longer what was received from a
remote host.

Otherwise it is too easy to mistreat this mutant vclock as a remote
vclock. That, by the way, did happen and is fixed in the following
commits.
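
The idea of building a new vclock instead of altering the received one
can be sketched as below. This is only an illustrative model, not the
actual C code: vclocks are plain dicts, `localize_vclock` and `LOCAL_ID`
are hypothetical names, and it is an assumption of the sketch that
component 0 is the local one.

```python
from types import MappingProxyType

LOCAL_ID = 0  # assumption: component 0 tracks local (non-replicated) changes


def localize_vclock(remote, instance):
    """Return a NEW vclock: the remote components with the local one
    taken from this instance. The input is never modified."""
    localized = dict(remote)                      # copy, do not alter
    localized[LOCAL_ID] = instance.get(LOCAL_ID, 0)
    return localized


# A read-only view stands in for a `const` input vclock.
remote = MappingProxyType({0: 7, 1: 100, 2: 50})
instance = {0: 3, 1: 120, 2: 50}

localized = localize_vclock(remote, instance)
```

Since the result is a separate object, the value received from the
remote host stays intact and can't be confused with the localized one.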

In scope of #10047

NO_TEST=refactoring
NO_CHANGELOG=refactoring
NO_DOC=refactoring

(cherry picked from commit b8463960)

149fc1f7

ci: add a workflow to check for entrypoint tags · 426bff55

Nikolay Shirokovskiy authored 1 year ago

See the comment in check-entrypoint.sh for an explanation of what an
entrypoint tag is. The workflow fails if the current branch does not have
the most recent entrypoint tag that it should have.

Part of #8319

NO_TEST=ci
NO_CHANGELOG=ci
NO_DOC=ci

(cherry picked from commit c06d0d14)

426bff55

vinyl: fix gc vs vylog race leading to duplicate record · 085279aa

Vladimir Davydov authored 9 months ago

Vinyl run files aren't always deleted immediately after compaction,
because we need to keep run files corresponding to checkpoints for
backups. Such run files are deleted by the garbage collection procedure,
which performs the following steps:

 1. Loads information about all run files from the last vylog file.
 2. For each loaded run record that is marked as dropped:
    a. Tries to remove the run files.
    b. On success, writes a "forget" record for the dropped run,
       which will make vylog purge the run record on the next
       vylog rotation (checkpoint).

(see `vinyl_engine_collect_garbage()`)

The garbage collection procedure writes the "forget" records
asynchronously using `vy_log_tx_try_commit()`, see `vy_gc_run()`.
This procedure can be successfully executed during vylog rotation,
because it doesn't take the vylog latch. It simply appends records
to a memory buffer which is flushed either on the next synchronous
vylog write or vylog recovery.

The problem is that the garbage collection doesn't necessarily load
the latest vylog file, because the vylog file may be rotated between
its calls to `vy_log_signature()` and `vy_recovery_new()`. This may
result in a "forget" record being written twice to the same vylog file
for the same run file, as follows:

  1. GC loads last vylog N.
  2. GC starts removing dropped run files.
  3. CHECKPOINT starts vylog rotation.
  4. CHECKPOINT loads vylog N.
  5. GC writes a "forget" record for run A to the buffer.
  6. GC is completed.
  7. GC is restarted.
  8. GC finds that the last vylog is N and blocks on the vylog latch
     trying to load it.
  9. CHECKPOINT saves vylog M (M > N).
 10. GC loads vylog N. This triggers flushing the forget record for
     run A to vylog M (not to vylog N), because vylog M is the last
     vylog at this point of time.
 11. GC starts removing dropped run files.
 12. GC writes a "forget" record for run A to the buffer again,
     because in vylog N it's still marked as dropped and not forgotten.
     (The previous "forget" record was written to vylog M).
 13. Now we have two "forget" records for run A in vylog M.

Such duplicate run records aren't tolerated by the vylog recovery
procedure, resulting in a permanent error on the next checkpoint:

```
ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run XXXX forgotten but not registered
```

To fix this issue, we move the `vy_log_signature()` call into
`vy_recovery_new()`, under the vylog latch. This makes sure that GC will
see the vylog records that it wrote during the previous execution.
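
The difference between reading the signature before and after taking
the latch can be sketched with a toy model. The class and function
names here are hypothetical, and the latch wait is simulated by a
callback during which a checkpoint rotates the log:

```python
class Vylog:
    """Toy stand-in for the metadata log; it only tracks the signature."""

    def __init__(self, signature):
        self.signature = signature

    def rotate(self, new_signature):
        self.signature = new_signature


def load_racy(vylog, wait_for_latch):
    signature = vylog.signature   # read before the latch is taken
    wait_for_latch()              # a checkpoint may rotate the log here
    return signature              # possibly stale


def load_fixed(vylog, wait_for_latch):
    wait_for_latch()              # take the latch first
    return vylog.signature        # read under the latch: always current


log_a = Vylog(signature=1)
stale = load_racy(log_a, lambda: log_a.rotate(2))

log_b = Vylog(signature=1)
fresh = load_fixed(log_b, lambda: log_b.rotate(2))
```

In the racy variant the caller ends up with signature 1 although the
log is already at 2, which corresponds to step 10 of the scenario above.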

Catching this race in a functional test would require a bunch of ugly
error injections so let's assume that it'll be tested by fuzzing.

Closes #10128

NO_DOC=bug fix
NO_TEST=tested manually with fuzzer

(cherry picked from commit 9d3859b2)

085279aa

box: prevent demoted leader from being a candidate in the next elections · 22a9cfd8

Georgiy Lebedev authored 10 months ago


Currently, the demoted leader sees that nobody has requested a vote in the
newly persisted term (because it has just written it without voting, and
nobody had time to see the new term yet), and hence votes for itself,
becoming the most probable winner of the next elections.

To prevent this from happening, let's forbid the demoted leader to be a
candidate in the next elections using `box_raft_leader_step_off`.

Closes #9855

NO_DOC=<bugfix>

Co-authored-by: Serge Petrenko <sergepetrenko@tarantool.org>
(cherry picked from commit 05d03a1c)

22a9cfd8

box: refactor `box_demote` to make it more comprehensible · 49747a4b

Georgiy Lebedev authored 10 months ago


Suggested by Nikita Zheleztsov in the scope of #9855.

Needed for #9855

NO_CHANGELOG=<refactoring>
NO_DOC=<refactoring>
NO_TEST=<refactoring>

Co-authored-by: Nikita Zheleztsov <n.zheleztsov@proton.me>
(cherry picked from commit ff010fe9)

49747a4b

election: fix box.ctl.demote() nop in off-mode · 42631d5b

Vladislav Shpilevoy authored 1 year ago

box.ctl.demote() used not to do anything with election_mode='off'
if the synchro queue didn't belong to the caller in the same term
as the election state.

The reason could be that if the synchro queue term is "outdated",
there is no guarantee that some other instance doesn't own it in
the latest term right now.

The "problem" is that this could be worked around easily by just
calling promote + demote together.

There isn't much sense in fixing it for the off-mode because the
only reasons off-mode exists are 1) for people who don't use
synchro at all, 2) who did use it and want to stop. Hence they
need demote just to disown the queue.

The patch "legalizes" the mentioned workaround by allowing demote to
be performed in off-mode even if the synchro queue term is old.

Closes #6860

NO_DOC=bugfix

(cherry picked from commit 1afe2274)

42631d5b

tuple: don't use offset_slot_cache in vinyl threads · 7d90a94c

Vladimir Davydov authored 9 months ago

`key_part::offset_slot_cache` and `key_part::format_epoch` are used for
speeding up tuple field lookup in `tuple_field_raw_by_part()`. These
structure members are accessed and updated without any locks, assuming
this code is executed exclusively in the tx thread. However, this isn't
necessarily true because we also perform tuple field lookups in vinyl
read threads. Apparently, this can result in unexpected races and bugs,
for example:

```
  #1  0x590be9f7eb6d in crash_collect+256
  #2  0x590be9f7f5a9 in crash_signal_cb+100
  #3  0x72b111642520 in __sigaction+80
  #4  0x590bea385e3c in load_u32+35
  #5  0x590bea231eba in field_map_get_offset+46
  #6  0x590bea23242a in tuple_field_raw_by_path+417
  #7  0x590bea23282b in tuple_field_raw_by_part+203
  #8  0x590bea23288c in tuple_field_by_part+91
  #9  0x590bea24cd2d in unsigned long tuple_hint<(field_type)5, false, false>(tuple*, key_def*)+103
  #10 0x590be9d4fba3 in tuple_hint+40
  #11 0x590be9d50acf in vy_stmt_hint+178
  #12 0x590be9d53531 in vy_page_stmt+168
  #13 0x590be9d535ea in vy_page_find_key+142
  #14 0x590be9d545e6 in vy_page_read_cb+210
  #15 0x590be9f94ef0 in cbus_call_perform+44
  #16 0x590be9f94eae in cmsg_deliver+52
  #17 0x590be9f9583e in cbus_process+100
  #18 0x590be9f958a5 in cbus_loop+28
  #19 0x590be9d512da in vy_run_reader_f+381
  #20 0x590be9cb4147 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+34
  #21 0x590be9f8b697 in fiber_loop+219
  #22 0x590bea374bb6 in coro_init+120
```

Fix this by skipping this optimization for threads other than tx.

No test is added because reproducing this race is tricky. Ideally, bugs
like this one should be caught by fuzzing tests or thread sanitizers.

Closes #10123

NO_DOC=bug fix
NO_TEST=tested manually with fuzzer

(cherry picked from commit 19d1f1cc)

7d90a94c

vinyl: fix cache iterator skipping tuples in read view · 1bad1afc

Vladimir Davydov authored 9 months ago

The tuple cache doesn't store older tuple versions so if a reader is
in a read view, it must skip tuples that are newer than the read view,
see `vy_cache_iterator_stmt_is_visible()`. A reader must also ignore
cached intervals if any of the tuples used as a boundary is invisible
from the read view, see `vy_cache_iterator_skip_to_read_view()`.
There's a bug in `vy_cache_iterator_restore()` because of which such
an interval may be returned to the reader: when we step backwards
from the last returned tuple we consider only one of the boundaries.
As a result, if the other boundary is invisible from the read view,
the reader will assume there's nothing in the index between the
boundaries and skip reading older sources (memory, disk). Fix this by
always checking if the other boundary is visible.
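
The visibility rule for a cached interval can be sketched as follows.
This is a simplified model with hypothetical names, in which a tuple is
visible from a read view if its LSN doesn't exceed the read view LSN:

```python
def is_visible(stmt, read_view_lsn):
    # Simplified rule: a cached tuple is visible if it isn't newer
    # than the read view.
    return stmt["lsn"] <= read_view_lsn


def interval_usable_one_side(left, right, read_view_lsn):
    # The kind of check the bug amounts to: only one boundary is looked at.
    return is_visible(left, read_view_lsn)


def interval_usable(left, right, read_view_lsn):
    # A cached interval may be trusted only if BOTH boundaries are visible.
    return is_visible(left, read_view_lsn) and is_visible(right, read_view_lsn)


left = {"key": 10, "lsn": 5}
right = {"key": 20, "lsn": 20}   # newer than the read view below
read_view_lsn = 10

one_side = interval_usable_one_side(left, right, read_view_lsn)
both_sides = interval_usable(left, right, read_view_lsn)
```

With the one-sided check the interval is accepted and the reader would
skip older sources, while the two-sided check rejects it.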

Closes #10109

NO_DOC=bug fix

(cherry picked from commit 7b72080d)

1bad1afc

vinyl: fix run iterator skipping tuples following non-terminal statement · 56b6ed79

Vladimir Davydov authored 9 months ago

If a run iterator is positioned at a non-terminal statement (UPSERT or
UPDATE), `vy_run_iterator_next()` will iterate over older statements
with the same key using `vy_run_iterator_next_lsn()` to build the key
history. While doing so, it may reach the end of the run file (if the
current key is the last in the run). This would stop iteration
permanently, which is apparently wrong for reverse iterators (LE or LT):
if this happens the run iterator won't return any keys preceding the
last one in the run file. Fix this by removing `vy_run_iterator_stop()`
from `vy_run_iterator_next_lsn()`.

Part of #10109

NO_DOC=bug fix
NO_CHANGELOG=next commit

(cherry picked from commit 72763f94)

56b6ed79

Jun 10, 2024

ci: fix RPM package builds on aarch64 runners · 715abaaf

Yaroslav Lobankov authored 9 months ago

We're using LXD containers as aarch64 runners. For some reason, the OOM
killer kills the compilation process during package building when
`make -j $(nproc)` is used. The issue happens only with builds where LTO
is enabled. We found that `-j6` works fine, while bigger values cause
problems.

NO_DOC=ci
NO_TEST=ci
NO_CHANGELOG=ci

715abaaf

test: bump test-run to new version · 4d8dc4f2

Yaroslav Lobankov authored 9 months ago

Bump test-run to new version with the following improvements:

- Calculate parallel jobs based on available CPUs [1]
- Bump luatest to 1.0.1-15 (--list-test-cases) [2]
- luatest: detox test searching code [3]
- luatest: allow to run test cases in parallel [4]

[1] tarantool/test-run@182aa77
[2] tarantool/test-run@1fbbf9a
[3] tarantool/test-run@3b0ccd0
[4] tarantool/test-run@dd00063

NO_DOC=test
NO_TEST=test
NO_CHANGELOG=test

(cherry picked from commit 32bcea7d)

4d8dc4f2

ci: disable workaround for LuaJIT profiling tests on aarch64 runners · 307e3377

Yaroslav Lobankov authored 9 months ago

Disable workaround for LuaJIT profiling tests on aarch64 runners due to
the following error:

    mount: /tmp/luajit-test-vardir: mount failed: Operation not permitted

Looks like it happens because our aarch64 runners are LXD containers.

NO_DOC=ci
NO_TEST=ci
NO_CHANGELOG=ci

(cherry picked from commit e64457d9)

307e3377

vinyl: fix crash on invalid upsert · ca21e6d5

Vladimir Davydov authored 9 months ago

`vy_apply_result_does_cross_pk()` must be called after the new tuple
format is validated, otherwise it may crash in case the new tuple has
fields conflicting with the primary key definition.

While we are at it, fix the operation cursor (`ups_ops`) not being
advanced on this kind of error. This resulted in skipped `upsert`
statements following an invalid `upsert` statement in a transaction.

Closes #10099

NO_DOC=bug fix

(cherry picked from commit dd0ac814)

ca21e6d5

Jun 07, 2024

vinyl: fix crash on extending secondary key parts with primary · 05fa2f74

Vladimir Davydov authored 9 months ago

If a secondary index is altered in such a way that its key parts are
extended with the primary key parts, rebuild isn't required because
`cmp_def` doesn't change, see `vinyl_index_def_change_requires_rebuild`.
In this case `vinyl_index_update_def` will try to update `key_def` and
`cmp_def` in-place with `key_def_copy`. This will lead to a crash
because the number of parts in the new `key_def` is greater.

We can't use `key_def_dup` instead of `key_def_copy` there because
there may be read iterators using the old `key_def` by pointer so
there's no other option but to force rebuild in this case.

The bug was introduced in commit 64817066 ("vinyl: use update_def
index method to update vy_lsm on ddl").

Closes #10095

NO_DOC=bug fix

(cherry picked from commit 9b817848)

05fa2f74

vinyl: fix crash in index drop if there is DML request reading from it · f7f01196

Vladimir Davydov authored 9 months ago

A DML request (insert, replace, update) can yield while reading from
the disk in order to check unique constraints. In the meantime the index
can be dropped. The DML request can't crash in this case thanks to
commit d3e12369 ("vinyl: abort affected transactions when space is
removed from cache"), but the DDL operation can because:
 - It unreferences the index in `alter_space_commit`, which may result
   in dropping the LSM tree with `vy_lsm_delete`.
 - `vy_lsm_delete` may yield in `vy_range_tree_free_cb` while waiting
   for disk readers to complete.
 - Yielding in commit triggers isn't allowed (crashes).

We already fixed a similar issue when `index.get` crashed if raced
with index drop, see commit 75f03a50 ("vinyl: fix crash if space is
dropped while space.get is reading from it"). Let's fix this issue in
the same way - by taking a reference to the LSM tree while checking
unique constraints. To do that it's enough to move `vy_lsm_ref` from
`vinyl_index_get` to `vy_get`.

Also, let's replace `vy_slice_wait_pinned` with an assertion checking
that the slice pin count is 0 in `vy_range_tree_free_cb` because
`vy_lsm_delete` must not yield.

Closes #10094

NO_DOC=bug fix

(cherry picked from commit bde28f0f)

f7f01196

tuple: fix crash on hashing tuple with double fields · 73dd3a8e

Vladimir Davydov authored 9 months ago

`tuple_hash_field()` doesn't advance the MsgPack cursor after hashing
a tuple field with the type `double`, which can result in crashes both
in memtx (while inserting a tuple into a hash index) and in vinyl
(while writing a bloom filter on dump or compaction).
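
Why a cursor that isn't advanced breaks the following fields can be
sketched with a tiny MsgPack walker. This is an illustration only: it
handles just positive fixints and float 64 values, and `walk_fields` is
a hypothetical name, not a Tarantool function.

```python
import struct


def walk_fields(data, field_count):
    """Decode `field_count` fields and return (values, final cursor)."""
    pos = 0
    values = []
    for _ in range(field_count):
        tag = data[pos]
        if tag <= 0x7F:                 # positive fixint: a single byte
            values.append(tag)
            pos += 1
        elif tag == 0xCB:               # float 64: tag + 8 bytes, big-endian
            (value,) = struct.unpack_from(">d", data, pos + 1)
            values.append(value)
            pos += 9                    # the step that must not be skipped
        else:
            raise ValueError("type not supported by this sketch")
    return values, pos


# Three fields: 1, 2.5 (as a double), 3.
data = bytes([0x01]) + b"\xcb" + struct.pack(">d", 2.5) + bytes([0x03])
values, end = walk_fields(data, 3)
```

If the `pos += 9` step were missing, the next iteration would read the
double's tag again instead of the third field.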

The bug was introduced by commit 51af059c ("box: compare and hash
msgpack value of double key field as double").

Closes #10090

NO_DOC=bug fix

(cherry picked from commit bc0daf99)

73dd3a8e

Jun 06, 2024

test: bump test-run to new version · 9b8fb7ab

Nikolay Shirokovskiy authored 10 months ago

Bump test-run to new version with the following improvements:

- Bump luatest to 1.0.1-14-gdfee2f3 [1]
- Adjust test result report width to terminal size [2]
- dispatcher: lift pipe buffer size restriction [3]
- flake8: fix E721 do not compare types [4]

[1] tarantool/test-run@84ebae5
[2] tarantool/test-run@1724211
[3] tarantool/test-run@81259c4
[4] tarantool/test-run@1037299

We also have to fix several tests that check that scripts with luatest
assertions have empty stderr output. test-run brings Luatest, which
logs assertions at the 'info' level.

Note that gh_8433_raft_is_candidate_test is different. The original
assertion involves logging huge tables that have closed sockets
somewhere inside, and 'socket.__tostring' currently raises an error
for closed sockets.

NO_DOC=submodule bump
NO_TEST=submodule bump
NO_CHANGELOG=submodule bump

(cherry picked from commit 97a801e1)

9b8fb7ab

test: bump test-run to new version · ac9e8897

Oleg Chaplashkin authored 11 months ago

Bump test-run to new version with the following improvements:

- Bump luatest to 1.0.1-5-g105c69d [1]
- tap13: fix worker fail on failed TAP13 parsing [2]

[1] tarantool/test-run@ed5b623
[2] tarantool/test-run@7c1a0a7

NO_DOC=test
NO_TEST=test
NO_CHANGELOG=test

(cherry picked from commit 4466deaf)

ac9e8897

May 30, 2024

ci: fix submodule_update.yml workflow · daa080ed

Yaroslav Lobankov authored 9 months ago

- Fix the `test-sdk` job according to recent changes in SDK
- Fix sending VK Teams message on failure

NO_DOC=ci
NO_TEST=ci
NO_CHANGELOG=ci

daa080ed

May 29, 2024

txn: run statement `on_rollback` triggers before rolling back statement · 41af99a2

Georgiy Lebedev authored 11 months ago

Logically, we call triggers after running statements. These triggers can
make significant changes (for instance, DDL triggers), so, for consistency,
we should call the statement's `on_rollback` triggers before rolling back
the statement. This also adheres to the logic that transaction
`on_rollback` triggers are called before rolling back individual
transaction statements.

One particular bug that this patch fixes is rolling back of DDL on the
`_space` space. DDL is essentially a replace operation on the `_space`
space, which also invokes the `on_replace_dd_space` trigger. In this
trigger, among other things, we swap the indexes of the original space,
`alter->old_space`, which is equal to the corresponding transaction
`stmt->space`, with the indexes of the newly created space,
`alter->new_space`:
https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/alter.cc#L1036-L1047

If then a rollback happens, we first rollback the replace operation, using
`stmt->space`, and only after that do we swap back the indexes in
`alter_space_rollback`:
https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/memtx_engine.cc#L659-L669
https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/alter.cc#L916-L925

For DDL on the _space space, the replace operation and DDL occur on the
same space. This means that during rollback of the replace, we will try to
do a replace in the empty indexes that were created for `alter->new_space`.
Not only does this break the replace operation, but also the newly inserted
tuple, which remains in the index, gets deleted, and access to it causes
undefined behavior (heap-use-after-free).

As part of the work on this patch, tests of rollback of DDL on system
spaces which use `on_rollback` triggers were enumerated:
* `_sequence` — box/sequence.test.lua;
* `_sequence_data` — box/sequence.test.lua;
* `_space_sequence` — box/sequence.test.lua;
* `_trigger` — sql/ddl.test.lua, sql/errinj.test.lua;
* `_collation` — engine-luatest/gh_4544_collation_drop_test.lua,
                 box/ddl_collation.test.lua;
* `_space` — box/transaction.test.lua, sql/ddl.test.lua;
* `_index` — box/transaction.test.lua, sql/ddl.test.lua;
* `_cluster` — box/transaction.test.lua;
* `_func` — box/transaction.test.lua, box/function1.test.lua;
* `_priv` — box/errinj.test.lua,
            box-luatest/rollback_ddl_on__priv_space_test.lua;
* `_user` — box/transaction.test.lua,
            box-luatest/gh_4348_transactional_ddl_test.lua.

Closes #9893

NO_DOC=<bugfix>

(cherry picked from commit d529082f)

41af99a2

box: pass statement being rolled back (if any) to `priv_grant` · 83ae9be8

Georgiy Lebedev authored 10 months ago

In scope of #9893 we are going to run statement `on_rollback` triggers
before rolling back the corresponding statement. During rollback of DDL in
the `_priv` space, the database is accessed from `user_reload_privs` to
reload user privileges, so we need it to account for the current statement
being rolled back: i.e., the new tuple that was introduced (if any) must
not be used, while the old tuple (if any) must be used.

Needed for #9893

NO_CHANGELOG=<refactoring>
NO_DOC=<refactoring>

(cherry picked from commit 797c04ff)

83ae9be8

txn: pass txn_stmt instead of txn to on_commit/on_rollback · 817697e8

Ilya Verbin authored 1 year ago

Currently on_rollback triggers are called on rollback of the whole
transaction. To make it possible to invoke them on rollback to a
savepoint, we need to pass a statement at which the savepoint was
created.

Needed for #9340

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit a1d85827)

817697e8

test: fix flaky downstream lag test · abf52e08

Vladislav Shpilevoy authored 9 months ago

It could fail in ASAN build. Can't tell why just there.

The main reason was that in a topology server1 + server2->server3
one of the cases
- did a txn on server1,
- then enabled server2->server3 replication,
- then waited for server2->server3 sync,
- and instantly assumed the txn reached server3.

Surely it didn't always. At the time of the server2->server3 sync the
txn might not have reached server2 itself yet.

The fix is as simple as explicitly ensuring the txn is on server2
before waiting for the server2->server3 sync.

Another potential source of flakiness was that the default timeout in
luatest.helpers.retrying is super low, just 5 seconds. The patch
manually bumps it to 60 seconds to be sure any future failures
wouldn't be related to a too small timeout.

Closes #10031

NO_DOC=test
NO_CHANGELOG=test

(cherry picked from commit d4ea121b)

abf52e08

static-build: bump the OpenSSL library version to 3.2.1 · 5d1f8c48

Georgiy Lebedev authored 1 year ago


Bump the OpenSSL library version to 3.2.1 and remove OpenSSL patches which
are already present in the updated library version.

Disable modules in OpenSSL configuration to make sure the OpenSSL 3.0
legacy provider is compiled into the library.

Closes #7502

NO_DOC=<dependency bump>
NO_TEST=<dependency bump>

Co-authored-by: Sergey Bronnikov <sergeyb@tarantool.org>
(cherry picked from commit 8de22969)

5d1f8c48

May 23, 2024

core: build fix for recent gcc · 69f2ddfc

Nikolay Shirokovskiy authored 10 months ago

```
/home/shiny/dev/tarantool/src/lib/core/coio_task.c:114:58:
	error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument
	and not in the later argument [-Werror=calloc-transposed-args]
  114 |         struct cord *cord = (struct cord *)calloc(sizeof(struct cord), 1);
```

NO_TEST=build fix
NO_CHANGELOG=build fix
NO_DOC=build fix

(cherry picked from commit fb6b6c60)

69f2ddfc

May 22, 2024

space_upgrade: respect min_field_count of both old and new formats · 5372ef8a

Andrey Saranchin authored 10 months ago

When upgrading a space, attribute `has_optional_parts` of indexes can be
changed. So in order to correctly index both old and new tuples we should
set new min_field_count value to the minimal min_field_count of old and
new formats. Actual value will be set when space upgrade completes.

Part of tarantool/tarantool-ee#698
Part of tarantool/tarantool-ee#750

NO_TEST=in ee
NO_CHANGELOG=in ee
NO_DOC=bugfix

(cherry picked from commit c449ada4)

5372ef8a

May 21, 2024

wal: fix wal_queue_max_size assignment during initial box.cfg · 359df4fc

Serge Petrenko authored 10 months ago

wal_queue_max_size took effect only after the initial box.cfg call,
meaning that users with non-zero `replication_sync_timeout` still synced
using the default 16 Mb queue size. In some cases the default was too
big and the same issues described in #5536 arose.

Fix this.

Closes #10013

NO_DOC=bugfix

(cherry picked from commit ab0f7913)

359df4fc

May 20, 2024

vinyl: fix index name in duplicate key error message · 77fb489a

Vladimir Davydov authored 10 months ago

The code setting ER_TUPLE_FOUND uses index_name_by_id() to find
the index name, but it passes an index in the dense index map to
it while the function expects an index in the sparse index map.
Apparently, this doesn't work as expected after an index is removed
from the middle of the index map. This bug was introduced by
commit fc3834c0 ("vinyl: check key uniqueness before modifying
tx write set").
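
The difference between the two maps can be sketched as below. This is a
toy model: the sparse map is a list where the position is the index id
and dropped indexes leave holes, the dense map has the holes squeezed
out, and `index_name_by_id` here is a hypothetical Python stand-in.

```python
def index_name_by_id(sparse_map, index_id):
    """Look an index up by its id; the position in the sparse map is the id."""
    index = sparse_map[index_id] if index_id < len(sparse_map) else None
    return None if index is None else index["name"]


# Indexes 0, 1, 2 were created, then index 1 was dropped.
sparse_map = [{"id": 0, "name": "pk"}, None, {"id": 2, "name": "by_name"}]
dense_map = [index for index in sparse_map if index is not None]

dense_pos = 1                          # position of "by_name" in the dense map
real_id = dense_map[dense_pos]["id"]   # its actual index id is 2

wrong = index_name_by_id(sparse_map, dense_pos)   # dense position used as id
right = index_name_by_id(sparse_map, real_id)
```

As long as no index is dropped from the middle, both maps coincide and
the mistake stays unnoticed.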

Instead of just fixing the index passed to index_name_by_id(), we do
a bit of refactoring. We stop passing index_name and space_name to
vy_check_is_unique_*() functions and instead get them right before
raising ER_TUPLE_FOUND. Note, to get the space name, we need to call
space_by_id() but it should be fine because (a) the space is very likely
to be cached as the last accessed one and (b) this is an error path so
it isn't performance critical. We also drop index_name_by_id() and
extract the index name from the LSM tree object.

Closes #5975

NO_DOC=bug fix

(cherry picked from commit 2cfba5eb)

77fb489a

vinyl: fix index build crash on invalid UPSERT · 83d7fe10

Vladimir Davydov authored 10 months ago

Like UPDATE, UPSERT must not modify primary key parts. Unlike UPDATE,
such an invalid UPSERT statement doesn't fail (raise an error) - we
just log the error and ignore the statement. The problem is, we don't
clear txn_stmt. As a result, if we're currently building a new index,
the on_replace trigger installed by the build procedure will try to
process this statement, triggering the assertion in the transaction
manager that doesn't expect any statements in a secondary index without
the corresponding statement in the primary index:

  ./src/box/vy_tx.c:728: vy_tx_prepare:
    Assertion `lsm->space_id == current_space_id' failed.

Let's fix this by clearing the txn_stmt corresponding to a skipped
UPSERT.

Note, this also means that on_replace triggers installed by the user
won't run on invalid UPSERT (hence test/vinyl/on_replace.result update),
but this is consistent with the memtx engine, which doesn't run them
in this case, either.

Closes #10026

NO_DOC=bug fix

(cherry picked from commit 5ac0d26a)

83d7fe10

May 17, 2024

relay: update lag on any acked txn · 79a8e82e

Vladislav Shpilevoy authored 10 months ago

Not only for own txns, but also on the txns authored by other
instances.

Note that the lag isn't updated when the replica got new txns from
another master. The lag still only reflects the replication
between this relay and its specific applier.

The motivation is that otherwise the lag sometimes shows
irrelevant things, like that the replica is very outdated, while
it keeps replicating just fine, only not the txns of this specific
master, which might even have turned into a replica itself already.

Closes #9748

NO_DOC=bugfix

(cherry picked from commit 39af9fbe)

79a8e82e

relay: enforce prev and new ack vclocks relation · aeaca11e

Vladislav Shpilevoy authored 10 months ago

From the code it isn't obvious, but relay->status_msg.vclock and
relay->last_recv_ack.vclock are both coming from the applier.
Status_msg is the previous ack, last_recv_ack is the latest ack.

They can never go down, and they are not affected in any way by the
master committing its own transactions. I.e. master can commit something,
relay->r->vclock (recovery cursor) will go up, and recovery vclock
might become incomparable with the last ACK vclock. But the prev
and last ACK vclocks are always comparable and always go up.
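
What "comparable" and "incomparable" mean here can be sketched with
vclocks modeled as dicts from replica ID to LSN. The helper below is a
hypothetical illustration, not the C `vclock_compare()` itself:

```python
def vclock_compare(a, b):
    """Return -1, 0 or 1 if the vclocks are ordered, None if incomparable."""
    le = ge = True
    for replica_id in set(a) | set(b):
        x, y = a.get(replica_id, 0), b.get(replica_id, 0)
        if x < y:
            ge = False
        if x > y:
            le = False
    if le and ge:
        return 0
    if le:
        return -1
    if ge:
        return 1
    return None   # some components are ahead, some are behind


prev_ack = {1: 10, 2: 5}    # previous ack from the applier
last_ack = {1: 10, 2: 8}    # latest ack from the applier
recovery = {1: 12, 2: 6}    # recovery cursor after the master's own commits

acks_order = vclock_compare(prev_ack, last_ack)
cursor_vs_ack = vclock_compare(recovery, last_ack)
```

The two acks are ordered, while the recovery cursor is ahead of the
latest ack in component 1 and behind it in component 2.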

This invariant was broken though, because relay on restart didn't
nullify the current applier status (status_msg). It could break
if the replica lost its xlog files or its ID was taken by another
instance - then its vclock would go down, making
last_recv_ack.vclock < status_msg.vclock. But that is not right
and is fixed in this patch.

In scope of #9748

NO_DOC=bugfix
NO_TEST=test 5158 already covers it
NO_CHANGELOG=bugfix

(cherry picked from commit 71dbb47c)

aeaca11e

relay: move ack handling into new function · 68fdba93

Vladislav Shpilevoy authored 10 months ago

To reduce the insane indentation level. And to isolate the further
changes in next commits more.

Part of #9748

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit d6f15a10)

68fdba93

applier: drop lag to zero on reconnect · 9ce12d45

Vladislav Shpilevoy authored 10 months ago

Before the patch if the applier was reconnected, the master would
see downstream lag equal to the time since it replicated the last
txn to this applier.

This happened because applier between reconnects kept the txn
timestamp used for acks. On the master's side the relay was
recreated, received the ack, thought the applier just applied this
txn, and displayed this as a lag.
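
The effect can be sketched with a toy model. The names are hypothetical,
and it is an assumption of the sketch that the fix amounts to resetting
the timestamp on reconnect and that a zero timestamp in an ack means
"no lag to report":

```python
class Applier:
    def __init__(self):
        self.txn_last_tm = 0.0    # 0 means nothing applied on this connection

    def apply(self, txn_tm):
        self.txn_last_tm = txn_tm

    def reconnect(self):
        self.txn_last_tm = 0.0    # the reset modeled here

    def ack(self):
        return self.txn_last_tm


def relay_lag(now, ack_tm):
    """Lag as computed by a freshly created relay."""
    return now - ack_tm if ack_tm != 0 else 0.0


applier = Applier()
applier.apply(100.0)
lag_live = relay_lag(100.5, applier.ack())       # normal replication

lag_without_reset = relay_lag(160.0, 100.0)      # stale timestamp after a pause
applier.reconnect()
lag_after_reset = relay_lag(160.0, applier.ack())
```

Without the reset, a new relay would report the whole idle period as lag.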

The test makes a master restart because this is the easiest way to
reproduce it. Most importantly, the applier shouldn't be
re-created, and relay should restart.

Part of #9748

NO_DOC=bugfix
NO_CHANGELOG=later

(cherry picked from commit dda42035)

9ce12d45

applier: move applier_txn_last_tm into applier · bda619c2

Vladislav Shpilevoy authored 10 months ago

It was stored in struct replica, now it is in struct applier. The
motivation is that applier-specific data must be inside the
applier.

Also it makes the next commits look more logical. They are going
to change this timestamp when applier progresses through its state
machine. It looks strange when the applier is changing the replica
object. Replica is on an upper level in the hierarchy. It owns the
applier and the applier ideally mustn't know about struct replica
(hardly possible to achieve), or at least not change it (this is
feasible).

In scope of #9748

NO_DOC=internal
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 8e5d9f2a)

bda619c2

vinyl: fix bug when tuple not committed to unique nullable index · 4db12994

Vladimir Davydov authored 10 months ago

A unique nullable key definition extended with primary key parts
(cmp_def) assumes that two tuples are equal *without* comparing
primary key fields if all secondary key fields are equal and not
nulls, see tuple_compare_slowpath(). This is a hack required to
ignore the uniqueness constraint for nulls in memtx. The memtx
engine can't use the secondary key definition as is (key_def) for
comparing tuples in the index tree, as it does for a non-nullable
unique index, because this wouldn't allow insertion of any
duplicates, including nulls. It couldn't use cmp_def without this
hack, either, because then conflicting tuples with the same
secondary key fields would always compare as not equal due to
different primary key parts.

For Vinyl, this hack isn't required because it explicitly skips
the uniqueness check if any of the indexed fields are nulls, see
vy_check_is_unique_secondary(). Furthermore, this hack is harmful
because Vinyl relies on the fact that two tuples compare as equal by
cmp_def if and only if *all* key fields (both secondary and primary)
are equal. For example, this is used in the transaction manager,
which overwrites statements equal by cmp_def, see vy_tx_set_entry().
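
The effect of the hack on comparison can be sketched on tuples of the
form (primary key, secondary key). This is a simplified model with
hypothetical names, not `tuple_compare_slowpath()` itself:

```python
def cmp(a, b):
    return (a > b) - (a < b)


def cmp_field(a, b):
    # None stands for null and sorts before any value.
    if a is None or b is None:
        return cmp(a is not None, b is not None)
    return cmp(a, b)


def compare_by_cmp_def(t1, t2, unique_hack):
    """Compare (pk, sk) tuples by the secondary key extended with the pk."""
    (pk1, sk1), (pk2, sk2) = t1, t2
    rc = cmp_field(sk1, sk2)
    if rc != 0:
        return rc
    if unique_hack and sk1 is not None:
        return 0              # equal non-null secondary keys: pk is skipped
    return cmp(pk1, pk2)


with_hack = compare_by_cmp_def((1, "a"), (2, "a"), unique_hack=True)
without_hack = compare_by_cmp_def((1, "a"), (2, "a"), unique_hack=False)
nulls = compare_by_cmp_def((1, None), (2, None), unique_hack=True)
```

With the hack enabled, two different tuples sharing a non-null secondary
key compare as equal, which is the case that doesn't fit Vinyl's
expectation described above.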

Let's disable this hack by resetting unique_part_count in cmp_def.

Closes #9769

NO_DOC=bug fix

(cherry picked from commit 2e689063)

4db12994

May 16, 2024

vinyl: fix use-after-free of LSM tree in scheduler · 660c355f

Vladimir Davydov authored 10 months ago

Between picking an LSM tree from a heap and taking a reference to it in
vy_task_new() there are a few places where the scheduler may yield:
 - in vy_worker_pool_get() to start a worker pool;
 - in vy_task_dump_new() to wait for a memory tree to be unpinned;
 - in vy_task_compaction_new() to commit an entry to the metadata log
   after splitting or coalescing a range.

If a concurrent fiber drops and deletes the LSM tree in the meantime,
the scheduler will crash. To avoid that, let's take a reference to
the LSM tree.

It's quite difficult to write a functional test for it without a bunch
of ugly error injections so we rely on fuzzing tests.

Closes #9995

NO_DOC=bug fix
NO_TEST=fuzzing

(cherry picked from commit 1c4605bb)

660c355f

May 14, 2024

test/interactive: add connect() function · ba66508f

Alexander Turenko authored 10 months ago

It encapsulates all the needed actions to connect to a remote console
using a Unix socket.

Part of #9985

NO_DOC=testing helper change
NO_CHANGELOG=see NO_DOC

(cherry picked from commit bb430c55)

ba66508f

test/interactive: disable hide/show prompt feature · c9d5b345

Alexander Turenko authored 10 months ago

See #7169 for details about the hide/show prompt feature. In short, it
hides readline's prompt before `print()` or `log.<level>()` calls and
restores the prompt afterwards.

This feature sometimes badly interferes with
`test.interactive_tarantool` heuristics about readline's command
echoing.

This commit disables the feature in `test.interactive_tarantool` by
default and enables it explicitly where needed.

Part of #9985

NO_DOC=testing helper change
NO_CHANGELOG=see NO_DOC

(cherry picked from commit 23094b6f)

c9d5b345