Commits · 737dc12a0bf5bc1d47b058b34128f0fc98bf77ca · core / tarantool

Jul 16, 2024

box: update synchro quorum in on_commit trigger instead of on_replace · 737dc12a

Georgiy Lebedev authored 9 months ago

Currently, we update the synchronous replication quorum from the
`on_replace` trigger of the `_cluster` space when registering a new
replica. However, during the join process, the replica cannot ack its own
insertion into the `_cluster` space. In the scope of #9723, we are going to
enable synchronous replication for most of the system spaces, including the
`_cluster` space. There are several problems with this:

1. Joining a replica to a 1-member cluster without manual changing of
quorum won't work: it is impossible to commit the insertion into the
`_cluster` space with only 1 node, since the quorum will equal to 2 right
after the insertion.

2. Joining a replica to a 3-member cluster may fail: the quorum will become
equal to 3 right after the insertion, the newly joined replica cannot ACK
its own insertion into the `_cluster` space — if one out of original 3
nodes fails, then reconfiguration will fail.

Generally speaking, it will be impossible to join a new replica to the
cluster, if a quorum, which includes the newly added replica (which cannot
ACK), cannot be gathered.

To solve these problems, let's update the quorum in the `on_commit`
trigger. This way we’ll be able to insert a node regardless of the current
configuration. This somewhat contradicts with the Raft specification, which
requires application of all configuration changes in the `on_replace`
trigger (i.e., as soon as they are persisted in the WAL, without quorum
confirmation), but still forbids several reconfigurations at the same time.

Closes #10087

NO_DOC=<no special documentation page devoted to cluster reconfiguration>

(cherry picked from commit 29d1c0fa)

737dc12a

Jul 15, 2024

vinyl: use broadcast instead of signal to notify about dump completion · 04347ee7

Vladimir Davydov authored 8 months ago

There may be more than one fiber waiting on `vy_scheduler::dump_cond`:

```
box.snapshot
  vinyl_engine_wait_checkpoint
    vy_scheduler_wait_checkpoint

space.create_index
  vinyl_space_build_index
    vy_scheduler_dump
```

To avoid hang, we should use `fiber_cond_broadcast`.

Closes #10233

NO_DOC=bug fix

(cherry picked from commit 30547157)

04347ee7

Jul 08, 2024

fiber: phohibit fiber self join · 2131743e

Nikolay Shirokovskiy authored 8 months ago

In this case join will just hang. Instead let's raise an error in case
of Lua API and panic in case of C API.

Closes #10196

NO_DOC=minor

(cherry picked from commit 1e1bf36d)

2131743e

fiber: make the concurrent fiber_join safer · cf3def52

Magomed Kostoev authored 1 year ago

Prior to this patch a bunch of illegal conditions was possible:
1. The joinability of a fiber could be changed while the fiber is
   being joined by someone. This could lead to double recycling:
   the first one happened on the fiber finish, and the second one
   in the fiber join.
2. The joinability of a dead joinable fiber could be altered, this
   led to inability jo join the dead fiber and free its resources.
3. A running fiber could be joined concurrently by two or more
   fibers, so the fiber could be recycled more than once (once
   per each concurrent join).
4. A dead recycled fiber could be made joinable and joined leading
   to the double recycle.

Fixed these issues by adding a new FIBER_JOIN_BEEN_INVOKED flag: now
the `fiber_set_joinable` and `fiber_join_timeout` functions detect
the double join. Because of the API limitations both of them panic
when an invalid condition is met:
- The `fiber_set_joinable` was not designed to report errors.
- The `fiber_join_timeout` can't raise any error unless a timeout
  is met, because the `fiber_join` users don't expect to receive
  any error from this function at all (except the one generated
  by the joined fiber).

It's still possible that a fiber join is performed on a struct which
has been recycled and, if the new fiber is joinable too, this can't
be detected. The current fiber API does not allow to fix this, so
this is to be the user's responsibility, they should be warned about
the fact the double join to the same fiber is illegal.

Closes #7562

@TarantoolBot document
Title: `fiber_join`, `fiber_join_timeout` and `fiber_set_joinable`
behave differently now.

`fiber_join` and `fiber_join_timeout` now panic in case if double
join of the given fiber is detected.

`fiber_set_joinable` now panics if the given fiber is dead or is
joined already. This prevents some amount of error conditions that
could happen when using the API in an unexpected way, including:
- Making a dead joinable fiber non-joinable could lead to a memory
  leak: one can't join the fiber anymore.
- Making a dead joinable fiber joinable again is a sign of attempt
  to join the fiber later. That means the fiber struct may be joined
  later, when it's been recycled and reused. This could lead to a
  very hard to debug double join.
- Making an alive joined fiber non-joinable would lead to the double
  free: once on the fiber function finish, and secondly in the active
  fiber join finish. Risks of making it joinable are described above.
- Making a dead and recycled fiber joinable allowed to join the fiber
  once again leading to a double free.

Any given by the API `struct fiber` should only be joined once. If a
fiber is joined after the first join on it has finished the behavior
is undefined: it can either be a panic or an incidental join to a
totally foreign fiber.

(cherry picked from commit 44401529)

cf3def52

Jul 04, 2024

fiber: fix leak on dead joinable fiber search · e97b01f6

Nikolay Shirokovskiy authored 8 months ago

When fiber is accessed from Lua we create a userdata object and keep the
reference for future accesses. The reference is cleared when fiber is
stopped. But if fiber is joinable is still can be found with
`fiber.find`. In this case we create userdata object again.
Unfortunately as fiber is already stopped we fail to clear the
reference. The trigger memory that clear the reference is also leaked.
As well as fiber storage if it is accessed after fiber is stopped.

Let's add `on_destroy` trigger to fiber and clear the references there.

Note that with current set of LSAN suppressions the trigger memory leak
of the issue is not reported.

Closes #10187

NO_DOC=bugfix

(cherry picked from commit 7db4de75)

e97b01f6

Jun 26, 2024

box: fix memleak on functional index drop · 432789dc

Nikolay Shirokovskiy authored 8 months ago

We just don't free functional index keys on functional index drop now.
Let's approach keys deletion as in the case of primary index drop ie
let's drop these keys in background.

We should set `use_hint` to `true` in case of MEMTX_TREE_VTAB_DISABLED
tree index methods because `memtx_tree_disabled_index_vtab` uses
`memtx_tree_index_destroy<true>`. Otherwise we get read outside of index
structure for stub functional index on destroy for introduced `is_func`
field (which is reported by ASAN).

Closes #10163

NO_DOC=bugfix

(cherry picked from commit 319357d5)

432789dc

Jun 13, 2024

relay: do not report vclock[0] anywhere · 4f2e67f5

Vladislav Shpilevoy authored 9 months ago

Remote replica's vclock is given to master to send data starting
from that position. The master does that, but, in order to find
the relevant position in local WAL to start from, the master must
ignore the local rows. Consider them all already "sent". For that
the master replaces the remote vclock[0] with the local vclock[0].
That makes xlog cursor skip all the local rows.

The problem is that this vclock was taken by relay as is, like if
it was truly reported by the replica. It was even saved as the
"last received ACK". Which clearly isn't the case.

When a real ACK was received, it didn't contain anything in
vclock[0], and yet relay "saw" that the previous ACK has
vclock[0] > 0. That looked like the replica went backwards without
even closing connection, which isn't possible. That made the relay
crash from cringe (on assert).

The fix is not to save the local vclock[0] in the last received
ACK.

For GC and xlog cursor the hack is still needed. An option how to
make it easier was to set vclock[0] to INT64_MAX to just never
even bother with any local rows, but that didn't work. Some
assumptions in other places seem to depend on having a proper
local LSN in these places.

Closes #10047

NO_CHANGELOG=the bug wasn't released
NO_DOC=bugfix

(cherry picked from commit 1f75231a)

4f2e67f5

box: prevent demoted leader from being a candidate in the next elections · 22a9cfd8

Georgiy Lebedev authored 10 months ago


Currently, the demoted leader sees that nobody has requested a vote in the
newly persisted term (because it has just written it without voting, and
nobody had time to see the new term yet), and hence votes for itself,
becoming the most probable winner of the next elections.

To prevent this from happening, let's forbid the demoted leader to be a
candidate in the next elections using `box_raft_leader_step_off`.

Closes #9855

NO_DOC=<bugfix>

Co-authored-by: Serge Petrenko <sergepetrenko@tarantool.org>
(cherry picked from commit 05d03a1c)

22a9cfd8

election: fix box.ctl.demote() nop in off-mode · 42631d5b

Vladislav Shpilevoy authored 1 year ago

box.ctl.demote() used not to do anything with election_mode='off'
if the synchro queue didn't belong to the caller in the same term
as the election state.

The reason could be that if the synchro queue term is "outdated",
there is no guarantee that some other instance doesn't own it in
the latest term right now.

The "problem" is that this could be workarounded easily by just
calling promote + demote together.

There isn't much sense in fixing it for the off-mode because the
only reasons off-mode exists are 1) for people who don't use
synchro at all, 2) who did use it and want to stop. Hence they
need demote just to disown the queue.

The patch "legalizes" the mentioned workaround by allowing to
perform demote in off-mode even if the synchro queue term is old.

Closes #6860

NO_DOC=bugfix

(cherry picked from commit 1afe2274)

42631d5b

vinyl: fix cache iterator skipping tuples in read view · 1bad1afc

Vladimir Davydov authored 9 months ago

The tuple cache doesn't store older tuple versions so if a reader is
in a read view, it must skip tuples that are newer than the read view,
see `vy_cache_iterator_stmt_is_visible()`. A reader must also ignore
cached intervals if any of the tuples used as a boundary is invisible
from the read view, see `vy_cache_iterator_skip_to_read_view()`.
There's a bug in `vy_cache_iterator_restore()` because of which such
an interval may be returned to the reader: when we step backwards
from the last returned tuple we consider only one of the boundaries.
As a result, if the other boundary is invisible from the read view,
the reader will assume there's nothing in the index between the
boundaries and skip reading older sources (memory, disk). Fix this by
always checking if the other boundary is visible.

Closes #10109

NO_DOC=bug fix

(cherry picked from commit 7b72080d)

1bad1afc

vinyl: fix run iterator skipping tuples following non-terminal statement · 56b6ed79

Vladimir Davydov authored 9 months ago

If a run iterator is positioned at a non-terminal statement (UPSERT or
UPDATE), `vy_run_iterator_next()` will iterate over older statements
with the same key using `vy_run_iterator_next_lsn()` to build the key
history. While doing so, it may reach the end of the run file (if the
current key is the last in the run). This would stop iteration
permanently, which is apparently wrong for reverse iterators (LE or LT):
if this happens the run iterator won't return any keys preceding the
last one in the run file. Fix this by removing `vy_run_iterator_stop()`
from `vy_run_iterator_next_lsn()`.

Part of #10109

NO_DOC=bug fix
NO_CHANGELOG=next commit

(cherry picked from commit 72763f94)

56b6ed79

Jun 10, 2024

vinyl: fix crash on invalid upsert · ca21e6d5

Vladimir Davydov authored 9 months ago

`vy_apply_result_does_cross_pk()` must be called after the new tuple
format is validated, otherwise it may crash in case the new tuple has
fields conflicting with the primary key definition.

While we are at it, fix the operation cursor (`ups_ops`) not advanced
on this kind of error. This resulted in skipped `upsert` statements
following an invalid `upsert` statement in a transaction.

Closes #10099

NO_DOC=bug fix

(cherry picked from commit dd0ac814)

ca21e6d5

Jun 07, 2024

vinyl: fix crash on extending secondary key parts with primary · 05fa2f74

Vladimir Davydov authored 9 months ago

If a secondary index is altered in such a way that its key parts are
extended with the primary key parts, rebuild isn't required because
`cmp_def` doesn't change, see `vinyl_index_def_change_requires_rebuild`.
In this case `vinyl_index_update_def` will try to update `key_def` and
`cmp_def` in-place with `key_def_copy`. This will lead to a crash
because the number of parts in the new `key_def` is greater.

We can't use `key_def_dup` instead of `key_def_copy` there because
there may be read iterators using the old `key_def` by pointer so
there's no other option but to force rebuild in this case.

The bug was introduced in commit 64817066 ("vinyl: use update_def
index method to update vy_lsm on ddl").

Closes #10095

NO_DOC=bug fix

(cherry picked from commit 9b817848)

05fa2f74

vinyl: fix crash in index drop if there is DML request reading from it · f7f01196

Vladimir Davydov authored 9 months ago

A DML request (insert, replace, update) can yield while reading from
the disk in order to check unique constraints. In the meantime the index
can be dropped. The DML request can't crash in this case thanks to
commit d3e12369 ("vinyl: abort affected transactions when space is
removed from cache"), but the DDL operation can because:
 - It unreferences the index in `alter_space_commit`, which may result
   in dropping the LSM tree with `vy_lsm_delete`.
 - `vy_lsm_delete` may yield in `vy_range_tree_free_cb` while waiting
   for disk readers to complete.
 - Yielding in commit triggers isn't allowed (crashes).

We already fixed a similar issue when `index.get` crashed if raced
with index drop, see commit 75f03a50 ("vinyl: fix crash if space is
dropped while space.get is reading from it"). Let's fix this issue in
the same way - by taking a reference to the LSM tree while checking
unique constraints. To do that it's enough to move `vy_lsm_ref` from
`vinyl_index_get` to `vy_get`.

Also, let's replace `vy_slice_wait_pinned` with an assertion checking
that the slice pin count is 0 in `vy_range_tree_free_cb` because
`vy_lsm_delete` must not yield.

Closes #10094

NO_DOC=bug fix

(cherry picked from commit bde28f0f)

f7f01196

tuple: fix crash on hashing tuple with double fields · 73dd3a8e

Vladimir Davydov authored 9 months ago

`tuple_hash_field()` doesn't advance the MsgPack cursor after hashing
a tuple field with the type `double`, which can result in crashes both
in memtx (while inserting a tuple into a hash index) and in vinyl
(while writing a bloom filter on dump or compaction).

The bug was introduced by commit 51af059c ("box: compare and hash
msgpack value of double key field as double").

Closes #10090

NO_DOC=bug fix

(cherry picked from commit bc0daf99)

73dd3a8e

Jun 06, 2024

test: bump test-run to new version · 9b8fb7ab

Nikolay Shirokovskiy authored 10 months ago

Bump test-run to new version with the following improvements:

- Bump luatest to 1.0.1-14-gdfee2f3 [1]
- Adjust test result report width to terminal size [2]
- dispatcher: lift pipe buffer size restriction [3]
- flake8: fix E721 do not compare types [4]

[1] tarantool/test-run@84ebae5
[2] tarantool/test-run@1724211
[3] tarantool/test-run@81259c4
[4] tarantool/test-run@1037299

We also have to fix several tests that check that script with luatest
assertions have empty stderr output. test-run brings Luatest which
logs assertions at 'info' level.

Note that gh_8433_raft_is_candidate_test is different. Original
assertion involves logging huge tables that have somewhere closed
sockets inside. And 'socket.__tostring' currently raises error for
closed sockets.

NO_DOC=submodule bump
NO_TEST=submodule bump
NO_CHANGELOG=submodule bump

(cherry picked from commit 97a801e1)

9b8fb7ab

May 29, 2024

txn: run statement `on_rollback` triggers before rolling back statement · 41af99a2

Georgiy Lebedev authored 11 months ago

Logically, we call triggers after running statements. These triggers can
make significant changes (for instance, DDL triggers), so, for consistency,
we should call the statement's `on_rollback` triggers before rolling back
the statement. This also adheres to the logic that transaction
`on_rollback` triggers are called before rolling back individual
transaction statements.

One particular bug that this patch fixes is rolling back of DDL on the
`_space` space. DDL is essentially a replace operation on the `_space`
space, which also invokes the `on_replace_dd_space` trigger. In this
trigger, among other things, we swap the indexes of the original space,
`alter->old_space`, which is equal to the corresponding transaction
`stmt->space`, with the indexes of the newly created space,
`alter->new_space`:
https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/alter.cc#L1036-L1047

If then a rollback happens, we first rollback the replace operation, using
`stmt->space`, and only after that do we swap back the indexes in
`alter_space_rollback`:
https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/memtx_engine.cc#L659-L669
https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/alter.cc#L916-L925

For DDL on the _space space, the replace operation and DDL occur on the
same space. This means that during rollback of the replace, we will try to
do a replace in the empty indexes that were created for `alter->new_space`.
Not only does this break the replace operation, but also the newly inserted
tuple, which remains in the index, gets deleted, and access to it causes
undefined behavior (heap-use-after-free).

As part of the work on this patch, tests of rollback of DDL on system
spaces which use `on_rollback` triggers were enumerated:
* `_sequence` — box/sequence.test.lua;
* `_sequence_data` — box/sequence.test.lua;
* `_space_sequence` — box/sequence.test.lua;
* `_trigger` — sql/ddl.test.lua, sql/errinj.test.lua;
* `_collation` — engine-luatest/gh_4544_collation_drop_test.lua,
                 box/ddl_collation.test.lua;
* `_space` — box/transaction.test.lua, sql/ddl.test.lua;
* `_index` — box/transaction.test.lua, sql/ddl.test.lua;
* `_cluster` — box/transaction.test.lua;
* `_func` — box/transaction.test.lua, box/function1.test.lua;
* `_priv` — box/errinj.test.lua,
            box-luatest/rollback_ddl_on__priv_space_test.lua;
* `_user` — box/transaction.test.lua,
            box-luatest/gh_4348_transactional_ddl_test.lua.

Closes #9893

NO_DOC=<bugfix>

(cherry picked from commit d529082f)

41af99a2

box: pass statement being rolled back (if any) to `priv_grant` · 83ae9be8

Georgiy Lebedev authored 10 months ago

In scope of #9893 we are going to run statement `on_rollback` triggers
before rolling back the corresponding statement. During rollback of DDL in
the `_priv` space, the database is accessed from `user_reload_privs` to
reload user privileges, so we need it to account for the current statement
being rolled back: i.e., the new tuple that was introduced (if any) must
not be used, while the old tuple (if any) must be used.

Needed for #9893

NO_CHANGELOG=<refactoring>
NO_DOC=<refactoring>

(cherry picked from commit 797c04ff)

83ae9be8

test: fix flaky downstream lag test · abf52e08

Vladislav Shpilevoy authored 9 months ago

It could fail in ASAN build. Can't tell why just there.

The main reason was that in a topology server1 + server2->server3
one of the cases
- did a txn on server1,
- then enabled server2->server3 replication,
- then waited for server2->server3 sync,
- and instantly assumed the txn reached server3.

Surely it not always did. At the server2->server3 sync the txn
might not had reached server2 itself yet.

The fix is as simple as explicitly ensure the txn is on server2
before waiting server2->server3 sync.

Another potential for flakiness was that the default timeout in
luatest.helpers.retrying is super low, just 5 seconds. The patch
manually bumps it to 60 seconds to be sure any future failures
wouldn't be related to too small timeout.

Closes #10031

NO_DOC=test
NO_CHANGELOG=test

(cherry picked from commit d4ea121b)

abf52e08

May 21, 2024

wal: fix wal_queue_max_size assignment during initial box.cfg · 359df4fc

Serge Petrenko authored 10 months ago

wal_queue_max_size took effect only after the initial box.cfg call,
meaning that users with non-zero `replication_sync_timeout` still synced
using the default 16 Mb queue size. In some cases the default was too
big and the same issues described in #5536 arose.

Fix this.

Closes #10013

NO_DOC=bugfix

(cherry picked from commit ab0f7913)

359df4fc

May 20, 2024

vinyl: fix index name in duplicate key error message · 77fb489a

Vladimir Davydov authored 10 months ago

The code setting ER_TUPLE_FOUND uses index_name_by_id() to find
the index name, but it passes an index in the dense index map to
it while the function expects an index in the sparse index map.
Apparently, this doesn't work as expected after an index is removed
from the middle of the index map. This bug was introduced by
commit fc3834c0 ("vinyl: check key uniqueness before modifying
tx write set").

Instead of just fixing the index passed to index_name_by_id(), we do
a bit of refactoring. We stop passing index_name and space_name to
vy_check_is_unique_*() functions and instead get them right before
raising ER_TUPLE_FOUND. Note, to get the space name, we need to call
space_by_id() but it should be fine because (a) the space is very likely
to be cached as the last accessed one and (b) this is an error path so
it isn't performance critical. We also drop index_name_by_id() and
extract the index name from the LSM tree object.

Closes #5975

NO_DOC=bug fix

(cherry picked from commit 2cfba5eb)

77fb489a

vinyl: fix index build crash on invalid UPSERT · 83d7fe10

Vladimir Davydov authored 10 months ago

Like UPDATE, UPSERT must not modify primary key parts. Unlike UPDATE,
such an invalid UPSERT statement doesn't fail (raise an error) - we
just log the error and ignore the statement. The problem is, we don't
clear txn_stmt. As a result, if we're currently building a new index,
the on_replace trigger installed by the build procedure will try to
process this statement, triggering the assertion in the transaction
manager that doesn't expect any statements in a secondary index without
the corresponding statement in the primary index:

  ./src/box/vy_tx.c:728: vy_tx_prepare:
    Assertion `lsm->space_id == current_space_id' failed.

Let's fix this by clearing the txn_stmt corresponding to a skipped
UPSERT.

Note, this also means that on_replace triggers installed by the user
won't run on invalid UPSERT (hence test/vinyl/on_replace.result update),
but this is consistent with the memtx engine, which doesn't run them
in this case, either.

Closes #10026

NO_DOC=bug fix

(cherry picked from commit 5ac0d26a)

83d7fe10

May 17, 2024

relay: update lag on any acked txn · 79a8e82e

Vladislav Shpilevoy authored 10 months ago

Not only for own txns, but also on the txns authored by other
instances.

Note that the lag isn't updated when the replica got new txns from
another master. The lag still only reflects the replication
between this relay and its specific applier.

The motivation is that otherwise the lag sometimes shows
irrelevant things, like that the replica is very outdated, while
it keeps replicating just fine. Only not txns of this specific
master, who might even turned into a replica itself already.

Closes #9748

NO_DOC=bugfix

(cherry picked from commit 39af9fbe)

79a8e82e

applier: drop lag to zero on reconnect · 9ce12d45

Vladislav Shpilevoy authored 10 months ago

Before the patch if the applier was reconnected, the master would
see downstream lag equal to the time since it replicated the last
txn to this applier.

This happened because applier between reconnects kept the txn
timestamp used for acks. On the master's side the relay was
recreated, received the ack, thought the applier just applied this
txn, and displayed this as a lag.

The test makes a master restart because this is the easiest way to
reproduce it. Most importantly, the applier shouldn't be
re-created, and relay should restart.

Part of #9748

NO_DOC=bugfix
NO_CHANGELOG=later

(cherry picked from commit dda42035)

9ce12d45

vinyl: fix bug when tuple not committed to unique nullable index · 4db12994

Vladimir Davydov authored 10 months ago

A unique nullable key definition extended with primary key parts
(cmp_def) assumes that two tuples are equal *without* comparing
primary key fields if all secondary key fields are equal and not
nulls, see tuple_compare_slowpath(). This is a hack required to
ignore the uniqueness constraint for nulls in memtx. The memtx
engine can't use the secondary key definition as is (key_def) for
comparing tuples in the index tree, as it does for a non-nullable
unique index, because this wouldn't allow insertion of any
duplicates, including nulls. It couldn't use cmp_def without this
hack, either, because then conflicting tuples with the same
secondary key fields would always compare as not equal due to
different primary key parts.

For Vinyl, this hack isn't required because it explicitly skips
the uniqueness check if any of the indexed fields are nulls, see
vy_check_is_unique_secondary(). Furthermore, this hack is harmful
because Vinyl relies on the fact that two tuples compare as equal by
cmp_def if and only if *all* key fields (both secondary and primary)
are equal. For example, this is used in the transaction manager,
which overwrites statements equal by cmp_def, see vy_tx_set_entry().

Let's disable this hack by resetting unique_part_count in cmp_def.

Closes #9769

NO_DOC=bug fix

(cherry picked from commit 2e689063)

4db12994

May 14, 2024

test/interactive: add connect() function · ba66508f

Alexander Turenko authored 10 months ago

It encapsulates all the needed actions to connect to a remote console
using a Unix socket.

Part of #9985

NO_DOC=testing helper change
NO_CHANGELOG=see NO_DOC

(cherry picked from commit bb430c55)

ba66508f

test/interactive: disable hide/show prompt feature · c9d5b345

Alexander Turenko authored 10 months ago

See #7169 for details about the hide/show prompt feature. In short, it
hides readline's prompt before `print()` or `log.<level>()` calls and
restores the prompt afterwards.

This feature sometimes badly interferes with
`test.interactive_tarantool` heuristics about readline's command
echoing.

This commit disables the feature in `test.interactive_tarantool` by
default and enables it explicitly where needed.

Part of #9985

NO_DOC=testing helper change
NO_CHANGELOG=see NO_DOC

(cherry picked from commit 23094b6f)

c9d5b345

test/interactive: allow to assert for nil response · 73e03318

Alexander Turenko authored 10 months ago

Before this patch the `:roundtrip()` method in the
`test.interative_tarantool` instance considered the following calls as
equivalent:

```lua
g.it = it.new()

-- Doesn't check the response.
g.it:roundtrip('x')

-- Before the patch it was the same as above.
--
-- Now it checks that the response is nil.
local expected = nil
g.it:roundtrip('x', expected)

-- It is the same as previous.
g.it:roundtrip('x', nil)
```

Now the response is checked against the provided expected value if the
value is passed to arguments, even if it is `nil`.

Also, a command's response is now returned from the method. It may be
useful if the response returns some dynamic information (such as a TCP
port number or a file descriptor) that is used later in the test or if
the response should be verified in some non-trivial way, not just a deep
compare.

The `:roundtrip()` method is just `:execute_command()` plus
`:read_response()` plus `luatest.assert_equals()`. However, I like using
`:roundtrip()` even when the assertion is not needed, because it is
shorter and because using the same method brings less context to a
reader.

For example,

```lua
g.it = it.new()
g.it:roundtrip('x = 2')
g.it:roundtrip('y = 3')
g.it:roundtrip('x + y', 6)
```

Part of #9985

NO_DOC=testing helper change
NO_CHANGELOG=see NO_DOC
NO_TEST=see NO_DOC

(cherry picked from commit 7d9e8569)

73e03318

May 08, 2024

relay: fix timeout on large xlog subscribe start · df6988e6

Vladislav Shpilevoy authored 11 months ago

When a replica subscribes, it might in the beginning try to
position its reader cursor to the end of a large xlog file.

Positioning inside of this file can take significant time during
which the WAL reader yielded and tried to send heartbeats, but
couldn't, because the relay thread wasn't communicating with the
TX thread.

When there are no messages from TX for too long time, the
heartbeats to the replica are not being sent
(commit 56571d83 ("raft: make
followers notice leader hang")).

The relay must communicate with the TX thread even when subscribe
is just being started and opens a large xlog file.

This isn't the first time when the missing heartbeats result into
timeouts. See more here:

- commit 30ad4a55 ("relay: yield
    explicitly every N sent rows").

- commit 17289440 ("recovery: make
    it yield when positioning in a WAL").

- commit ee6de025 ("relay: send
    heartbeats while reading a WAL").

Given that this is fixed fourth time, it might suggest that the
relay has not the best architecture having some slight drawbacks.
See more in #9968.

Closes #9094

NO_DOC=bugfix

(cherry picked from commit f7e6686a)

df6988e6

Apr 23, 2024

netbox: close transport after stopping worker loop and wait for the stop · c4c022e2

Georgiy Lebedev authored 1 year ago


Currently, we close the transport from transport from
`luaT_netbox_transport_stop`, and we do not wait for the worker fiber to
stop. This causes several problems.

Firstly, the worker can switch context by yielding (`coio_wait`) or
entering the Lua VM (`netbox_on_state_change`). During a context switch,
the connection can get closed. When the connection is closed, its receive
buffer is reset. If there was some pending response that was partially
retrieved (e.g., a large select), then after resetting the buffer we will
read some inconsistent data. We must not allow this to happen, so let's
check for this case after returning from places where the worker can switch
context. In between closing the connection and cancelling the connection's
worker, an `on_disconnect` trigger can be called, which, in turn, can
also yield, returning control to the worker before it gets cancelled.

Secondly, when the worker enters the Lua VM, garbage collection can be
triggered and the connection owning the worker could get closed
unexpectedly to the worker.

The fundamental source of these problems is that we close the transport
before the worker's loop stops. Instead, we should close it after the
worker's loop stops. In `luaT_netbox_transport_stop`, we should only cancel
the worker, and either wait for the worker to stop, if we are not executing
on it, or otherwise throw an exception (`luaL_testcancel`) to stop the
worker's loop. The user will still have the opportunity to catch this
exception and prevent stoppage of the worker at his own risk. To safeguard
from this scenario, we will now keep the `is_closing` flag enabled once
`luaT_netbox_transport_stop` is called and never disable it.

There also still remains a special case of the connection getting garbage
collected, when it is impossible to stop the worker's loop, since we cannot
join the worker (yielding is forbidden from finalizers), and an exception
will not go past the finalizer. However, this case is safe, since the
connection is not going to be used by this point, so the worker can simply
stop on its own at some point. The only thing we need to account for is
that we cannot wait for the worker to stop: we can reuse the `wait` option
of `luaT_netbox_transport_stop` for this.

Closes #9621
Closes #9826

NO_DOC=<bugfix>

Co-authored-by: Vladimir Davydov <vdavydov@tarantool.org>
(cherry picked from commit fcf7f5c4)

Cherry pick note: Dropped gh_9621_netbox_worker_crash_test because
box.iproto.encode helpers aren't available on 2.11.

c4c022e2

Apr 16, 2024

memtx: fix a bug with mvcc and exclude_null option · 364af9bf

Aleksandr Lyapunov authored 11 months ago

Before this patch MVCC engine expected that if index_replace
sets `result` to NULL then index_replace sets `successor` to
something (NULL or existing tuple, depending on index type).
That looked fine because by contract `successor` is set when
true insertion was happened.

Unfortunately it was not considered that in case of part with
`exclude_null` option in index the insertion can be silently
skipped and thus `successor` can be not set. The latter access
of it was actually an UB.

Fix it by explicit check of tuple_key_is_excluded and work on
this case correctly.

Note that logically `index_replace` should return a flag whether
the new tuple was filtered (excluded) by key_def. But on the other
hand this flag is required only for mvcc while the function is
already has lots of arguments and it's very cheap to determine
this flag right from memtx_tx, so I decided to make the most
simple patch.

NO_DOC=bugfix

(cherry picked from commit 14e21297)

364af9bf

Apr 05, 2024

salad: light: limit table size by (2^32)-8 records · 2bc49964

Ilya Verbin authored 11 months ago

This is the maximum possible capacity of a hash table with 32-bit record
identifiers and 8-element `LIGHT_GROW_INCREMENT`.

Needed for #9864

NO_DOC=see next commit
NO_CHANGELOG=see next commit

(cherry picked from commit f955ca0c)

2bc49964

salad: light: fix LIGHT(slot)() for table sizes > 2^31 · 6c6c1157

Ilya Verbin authored 11 months ago

This function contains a bitwise optimization that returns wrong result
when table size is greater than 2^31. E.g., if table size is 0xB0000000
and hash is 0, it returns 0x80000000 instead of 0. Fix it.

Needed for #9864

NO_DOC=see next commit
NO_CHANGELOG=see next commit

(cherry picked from commit d279368c)

6c6c1157

Apr 03, 2024

net.box: update connection state before calling any triggers · 3bef5baa

Georgiy Lebedev authored 1 year ago

Currently, the connection state is updated after calling triggers. However,
the triggers can, in turn, cause a new state change. The state will be
updated the state in the wrong order, and the original state change will
overwrite the state change from the trigger. To fix this, let's update the
connection state before calling any triggers.

Closes #9827

NO_DOC=<bugfix>

(cherry picked from commit bb38b059)

3bef5baa

net.box: make `on_schema_reload` behave the same way as `on_disconnect` · e519d493

Georgiy Lebedev authored 1 year ago

For the `on_connect` trigger, if the trigger execution fails and an
exception happens, the connection is terminated and its state changes to
'error'. It allows the following filtering semantic: the client checked
some condition from the trigger and decided that the connection does not
suite him — the exception is thrown to indicate that the connection should
be terminated.

Currently, the `on_schema_reload` trigger behaves the same way. However,
filtering a connection from the `on_schema_reload` trigger or waiting for a
schema update does not seem has neither a reasonable semantic, nor a valid
use case. Let's make the `on_schema_reload` trigger behave the same way as
the `on_disconnect` trigger, i.e, log the exception, but otherwise ignore
it.

Closes #9679

@TarantoolBot document
Title: `on_schema_reload` trigger of `net.box` connections behaviour update
Product: Tarantool
Since: 3.1
Root documents: https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/#lua-function.conn.on_schema_reload

When an error is thrown from the `on_schema_reload` trigger, it now behaves
the same way as the `on_disconnect` trigger [^1]:
> If the trigger function causes an error, the error is logged but
otherwise is ignored.

[^1]: https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/#lua-function.conn.on_disconnect

(cherry picked from commit bbd8b684)

e519d493

net.box: fix `on_disconnect` inconsistency with documentation · cc8ecda6

Georgiy Lebedev authored 1 year ago

According to the documentation [1]:
> If the trigger function causes an error, the error is logged but
otherwise is ignored.

However, currently, the `on_disconnect` trigger behaves the same way as the
`on_connect` trigger, i.e., the connection is terminated and its state
changes to 'error'. Let's fix this inconsistency and log errors from the
`on_disconnect` trigger, but otherwise ignore them.

Closes #9677
Closes #9797

NO_DOC=<bugfix>

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/#lua-function.conn.on_disconnect

(cherry picked from commit 1d6d6a3a)

cc8ecda6

Mar 29, 2024

build: include header "unit.h" after c++ headers · c014944e

Andrey Saranchin authored 11 months ago

Header "unit.h" contains `ok` and `is` macros used to check test cases.
The problem is such simple names can be used in C++ STL library headers
(it's OK because such short names can be hidden in a namespace), so when
including, for example, header "vector" after "unit.h", build can fail
because function declaration or definition in the C++ header will turn
into a macro invocation. I faced this problem building Tarantool on
MacOS with SDK of 14.4 version.

NO_TEST=fix build
NO_CHANGELOG=fix build
NO_DOC=fix build

(cherry picked from commit 025ba32f)

c014944e

Mar 28, 2024

box: support exclude_null option in functional indexes · d41901a0

Andrey Saranchin authored 1 year ago

Currently, exclude_null option doesn't affect functional indexes at all.
It seems that we just forgot to check if tuple should be inserted to the
index - the patch simply adds missing check in replace and build_next
methods of functional memtx_tree index.

Closes #9732

NO_DOC=bugfix

(cherry picked from commit c56998fa)

d41901a0

net.box: reference the connection object from asynchronous requests · 3e86dc91

Georgiy Lebedev authored 1 year ago

In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes #9629

NO_DOC=<bugfix>

(cherry picked from commit fb5bf51c)

3e86dc91

Mar 26, 2024

test: increase process termination waiting timeout · fe4a9e24

Alexander Turenko authored 1 year ago

This commit increases a time to wait of the process termination. It may
take longer than 5 seconds, when tarantool is built with an address
sanitizer. The address sanitizer generates a report at the process
termination and it is not always a fast thing.

NO_DOC=test fix
NO_CHANGELOG=see NO_DOC

(cherry picked from commit 5260bc2a)

fe4a9e24