- Jul 14, 2022
Serge Petrenko authored
The upgrade script first tries to determine if the node is booted from old snaps not recoverable on current Tarantool versions. If this is the case, it sets up special triggers so that snaps are automatically converted to a suitable format. This happens before box.cfg{}, so the workdir is not set at this point in time, and the upgrade script should take configured work_dir into account explicitly. Fix this. Closes #7232 NO_DOC=bugfix
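The fix above boils down to resolving snapshot paths against the configured work_dir before box.cfg{} has taken effect. A toy Python sketch of that idea (function and names are illustrative, not Tarantool's actual code):

```python
import os

def snap_path(work_dir, snap_name):
    # Resolve relative snapshot names against the configured work_dir;
    # absolute paths are left as-is. Before box.cfg{} runs, the process
    # has not chdir'ed into work_dir, so this must be done explicitly.
    if os.path.isabs(snap_name):
        return snap_name
    return os.path.join(work_dir, snap_name)

resolved = snap_path("/var/lib/tarantool", "00000000000000000000.snap")
```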
-
- Jul 12, 2022
Mergen Imeev authored
Prior to this patch, some opcodes could use a pointer to struct space that was set during parsing. However, the pointer to struct space is not really something that defines spaces. A space can be identified by its ID or name. In most cases, specifying a space by pointer works fine, but truncate() changes the pointer to the space, resulting in a segfault for prepared statements using the above opcodes. To avoid this problem, a new opcode has been introduced. This opcode uses the space ID to determine the pointer to the struct space at runtime and stores it in the MEM, which is later used in the mentioned opcodes. Closes #7358 NO_DOC=bugfix
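The essence of the fix can be shown with a toy Python model (all names hypothetical): a prepared statement keeps only the space ID and resolves the live object at execution time, so truncate() swapping the underlying object cannot leave a dangling reference.

```python
class Space:
    def __init__(self, space_id):
        self.space_id = space_id

class SpaceCache:
    """Maps space IDs to live space objects; truncate() replaces the object."""
    def __init__(self):
        self._by_id = {}
    def put(self, space):
        self._by_id[space.space_id] = space
    def find(self, space_id):
        return self._by_id[space_id]

cache = SpaceCache()
old_space = Space(512)
cache.put(old_space)

stmt = {"space_id": 512}   # a prepared statement stores only the ID

new_space = Space(512)     # truncate() swaps in a fresh space object
cache.put(new_space)

# Resolving by ID at execution time yields the current object, whereas
# a pointer cached at parse time would now be dangling.
resolved = cache.find(stmt["space_id"])
```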
-
Mergen Imeev authored
Opcode IteratorReopen is not used and should be dropped. Part of #7358 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
We used to use it for allocating functional keys, but now we allocate those as tuples. Let's drop the legacy infrastructure and make alloc and free MemtxAllocator methods private. Follow-up #7376 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
Functional index keys are allocated and freed with MemtxAllocator's alloc and free methods. In contrast to tuples, which are allocated and freed with alloc_tuple and free_tuple, freeing a functional index key happens immediately, irrespective of whether there's a snapshot in progress or not. It's acceptable, because snapshot only uses primary indexes, which can't be functional. However, to reuse the snapshot infrastructure for creating general purpose user read views, we will need to guarantee that functional index keys stay alive until all read views using them are closed. To achieve that, this commit turns functional index keys into tuples, which automatically makes them linger if there's an open read view. We use the same global tuple format for allocating functional keys, because the key format is checked in key_list_iterator_next. Closes #7376 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
- Jul 11, 2022
Ilya Verbin authored
fiber_wakeup has been adapted to spurious wakeups, so this protection is no longer needed. Part of #7166 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Ilya Verbin authored
Currently the latch doesn't guarantee the order in which it is acquired by the fibers that requested it. E.g. a fiber yielding in latch_lock may be woken up spuriously: it is removed from l->queue by fiber_make_ready and then inserted into l->queue again, but this time at the head of the list instead of its original place in the queue. Fix this by using a latch_waiter structure, which is linked into l->queue. Part of #7166 @TarantoolBot document Title: Update box_latch_lock description Since: 2.11 Add "Locks are acquired in the strict order as they were requested." to the box_latch_lock description in C API reference - Module latch.
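A toy Python sketch of the fairness idea (not Tarantool's C code): each waiter is enqueued exactly once, at the tail, so a spurious wakeup that merely re-checks the condition can never move a waiter to the head of the line.

```python
from collections import deque

class FairLatch:
    """FIFO latch: waiters keep their queue position across spurious wakeups."""
    def __init__(self):
        self.owner = None
        self.queue = deque()   # waiters in strict request order

    def try_lock(self, fiber):
        if self.owner is None and not self.queue:
            self.owner = fiber
            return True
        self.queue.append(fiber)   # enqueued exactly once, at the tail
        return False

    def unlock(self):
        # Ownership is handed to the head of the queue, never to a
        # later requester that happened to wake up first.
        self.owner = self.queue.popleft() if self.queue else None

latch = FairLatch()
latch.try_lock("A")   # A owns the latch
latch.try_lock("B")   # B waits
latch.try_lock("C")   # C waits behind B
latch.unlock()        # B acquires, not C
```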
-
- Jul 08, 2022
Nikolay Shirokovskiy authored
Handle a status header response like 'HTTP/2 200', where the version has no dot. Closes #7319 NO_DOC=bugfix
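A minimal Python sketch of the accepting grammar (not the actual curl/Tarantool parser): the minor version component is made optional, so both dotted and dotless versions parse.

```python
import re

# Status-line pattern: major version required, ".minor" optional.
STATUS_RE = re.compile(r"^HTTP/(\d+)(?:\.(\d+))? (\d{3})")

def parse_status(line):
    m = STATUS_RE.match(line)
    if m is None:
        raise ValueError("bad status line: %r" % line)
    major, minor, code = m.groups()
    # A missing minor version (HTTP/2, HTTP/3) defaults to 0.
    return int(major), int(minor or 0), int(code)
```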
-
Nikolay Shirokovskiy authored
The parser does not modify its input data during parsing, and neither does its caller. NO_DOC=internal NO_CHANGELOG=internal NO_TEST=refactoring
-
Nikolay Shirokovskiy authored
We use the LuaJIT 'bit' module for bitwise operations. For platform interoperability it truncates its arguments to 32 bits and returns a signed result. Thus, when granting rights with bit.bor to the admin user, which has 0xffffffff rights (from the bootstrap snapshot), we get -1 as a result. This later leads to the type check error given in the issue. Closes #7226 NO_DOC=minor bugfix
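The arithmetic behind the bug can be reproduced in a few lines of Python that mimic LuaJIT bit.bor semantics (32-bit truncation plus signed reinterpretation; the helper name is ours):

```python
def bor32(a, b):
    """Mimic LuaJIT bit.bor: truncate to 32 bits, return a signed result."""
    r = (a | b) & 0xffffffff
    # Reinterpret the top bit as the sign bit, as the 'bit' module does.
    return r - 2**32 if r >= 2**31 else r

full_access = 0xffffffff          # admin rights from the bootstrap snapshot
result = bor32(full_access, 0x1)  # granting any right yields -1, not a
                                  # valid unsigned rights mask
```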
-
Vladimir Davydov authored
Passing format->is_temporary to MemtxAllocator::free_tuple complicates the allocator API and the code using it. Let's introduce a per-tuple flag for such tuples. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
Currently, we call memtx_enter_delayed_free_mode() per each index, in index::create_snapshot_iterator(), but there's actually no need to bump the snapshot_version more than once per snapshot. Let's move it to the place where we start checkpoint/join and drop memtx wrappers around memtx_allocators functions. This will simplify reworking the memtx read view API, see #7364. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
Let's hide all the logic regarding delayed freeing of memtx tuples to MemtxAllocator and provide memtx_engine with methods for allocating and freeing tuples (not memtx_tuples, just generic tuples). All the tuple and snapshot version manipulation stuff is now done entirely in MemtxAllocator. This is a preparation for implementing a general-purpose tuple read view API in MemtxAllocator, see #7364. Note, since memtx_engine now deals with the size of a regular tuple, which is 4 bytes less than the size of memtx_tuple, this changes the size reported by OOM messages and the meaning of memtx_max_tuple_size, which now limits the size of a tuple, not memtx_tuple. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
It isn't necessary to prefix all static class members with the class name specifier in the class methods. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Mergen Imeev authored
This patch fixes a bug where the ANY field type was replaced by the SCALAR field type in the ephemeral space used in ORDER BY. Closes #7345 NO_DOC=bugfix
-
Mergen Imeev authored
After this patch, the result type of arithmetic between two unsigned values is INTEGER. Closes #7295 NO_DOC=bugfix
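A hypothetical Python sketch of the type-deduction rule (the helper and type names are illustrative, not the SQL engine's code): arithmetic between two UNSIGNED operands is typed INTEGER because, e.g., subtraction can produce a negative value.

```python
def arith_result_type(lhs, rhs):
    """Deduce the result type of lhs <op> rhs for numeric SQL types."""
    numeric = {"UNSIGNED", "INTEGER", "DOUBLE"}
    if lhs not in numeric or rhs not in numeric:
        raise TypeError("non-numeric operand")
    if "DOUBLE" in (lhs, rhs):
        return "DOUBLE"
    # UNSIGNED op UNSIGNED is INTEGER too: e.g. 1 - 2 is negative.
    return "INTEGER"
```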
-
- Jul 07, 2022
Ilya Verbin authored
coio_fill_addrinfo allocates ai_local->ai_addr, which should be freed in case of error. Part of #7370 NO_DOC=bugfix NO_TEST=memory leak NO_CHANGELOG=minor bug
-
- Jul 06, 2022
Georgiy Lebedev authored
The current implementation of tracking statements that delete a story has a flaw; consider the following example:

tx1('box.space.s:replace{0, 0}') -- statement 1
tx2('box.space.s:replace{0, 1}') -- statement 2
tx2('box.space.s:delete{0}')     -- statement 3
tx2('box.space.s:replace{0, 2}') -- statement 4

When statement 1 is prepared, both statements 2 and 4 will be linked to the delete statement list of {0, 0}'s story, though, apparently, statement 4 does not delete {0, 0}. Let us notice the following: statement 4 is "pure" in the sense that, in the transaction's scope, it is guaranteed not to replace any tuple — we can retrieve this information when we check whether the insert statement violates replacement rules, use it to determine "pure" insert statements, and skip them later on when, during preparation of insert statements, we handle other insert statements which assume they do not replace anything (i.e., have no visible old tuple). On the contrary, statements 1 and 2 are "dirty": they assume that they replaced nothing (i.e., there was no visible tuple in the index) — when one of them gets prepared, the other one needs to be either aborted or relinked to replace the prepared tuple. We also need to fix relinking of delete statements from the older story (in terms of the history chain) to the new one during preparation of insert statements: a statement needs to be relinked iff it comes from a different transaction (to be precise, there must actually be no more than one delete statement from the same transaction). Additionally, add assertions to verify the invariant that the story's add (delete) psn is equal to the psn of the add (delete) statement's transaction. Closes #7214 Closes #7217 NO_DOC=bugfix
-
- Jul 05, 2022
Ilya Verbin authored
Currently it throws an error when it encounters binary data; print a <binary> tag instead. Closes #7040 NO_DOC=bugfix
-
- Jul 04, 2022
Boris Stepanenko authored
Since 0.26.0 luacheck emits a warning on the `_box` variable. From luacheck v.0.26.0 release notes: "Function arguments that start with a single underscore get an "unused hint". Leaving them unused doesn't result in a warning. Using them, on the other hand, is a new warning (№ 214)." Renamed `_box` to `__box`, which isn't considered unused. Closes #7304. NO_DOC=testing NO_TEST=testing NO_CHANGELOG=testing
-
Vladimir Davydov authored
We must not throw exceptions from C code. Currently, there's only one C function that uses diag_raise(): space_cache_find_xc. We move it under ifdef(__cplusplus). Follow-up #4735 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
current_session() is called from C code so it must not throw, but it may if it fails to allocate a session. Practically, this is hardly possible, because we don't limit the runtime arena, which is used for allocation of session objects. Still, this looks potentially dangerous. Gracefully handling an allocation failure in all places where current_session() may be called would be complicated. Since it's more of a theoretical issue, let's panic on a session allocation error, like we do if we fail to allocate other mission critical system objects. Closes #4735 NO_DOC=code health NO_TEST=code health NO_CHANGELOG=code health
-
Vladimir Davydov authored
The functions allocate and free a session so they should be called new/delete, not create/destroy, according to our naming convention. While we are at it, also delete obsolete comments to these functions: they don't invoke session triggers. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
C++ features are not used in this file. Note, we need to move ifdef(__cplusplus) in user.h to make the guest_user and admin_user variables accessible from C code. Also, we need to move initialization of session_vtab_registry to session_init(), because most C compilers don't allow initializing a global variable with the value of another global variable. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Serge Petrenko authored
Our txn_limbo_is_replica_outdated check works correctly only when there is a stream of PROMOTE requests. Only the author of the latest PROMOTE is writable and may issue transactions, no matter synchronous or asynchronous. So txn_limbo_is_replica_outdated assumes that everyone but the node with the greatest PROMOTE/DEMOTE term is outdated. This isn't true for DEMOTE requests. There is only one server which issues the DEMOTE request, but once it's written, it's fine to accept asynchronous transactions from everyone. Now the check is too strict. Every time there is an asynchronous transaction from someone who isn't the author of the latest PROMOTE or DEMOTE, replication is broken with ER_SPLIT_BRAIN. Let's relax it: when the limbo owner is 0, it's fine to accept asynchronous transactions from everyone, no matter the term of their latest PROMOTE and DEMOTE. This means that now after a DEMOTE we will miss one case of true split-brain: when the old leader continues writing data in an obsolete term, and the new leader first issues PROMOTE and then DEMOTE. This is a tradeoff for making async master-master work after DEMOTE. The completely correct fix would be to write the term the transaction was written in with each transaction and replace txn_limbo_is_replica_outdated with txn_limbo_is_request_outdated, so that we decide whether to filter the request or not judging by the term it was applied in, not by the term we have seen in some past PROMOTE from the node. This fix seems too costly though, given that we only miss one case of split-brain at the moment when the user enables master-master replication (by writing a DEMOTE). And in master-master there is no such thing as a split-brain. Follow-up #5295 Closes #7286 NO_DOC=internal change
-
Serge Petrenko authored
Currently there's only one place where applier_synchro_filter_tx accesses limbo state under a latch: this place is txn_limbo_is_replica_outdated. Soon there will be more accesses to limbo parameters and all of them should be guarded as well. Let's simplify things a bit and guard the whole synchro_filter_tx with the limbo latch. While we are at it remove txn_limbo_is_replica_outdated as not needed anymore. Part-of #7286 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Serge Petrenko authored
Starting with commit deca9749 ("replication: unify replication filtering with and without elections"), the filter always works, even when elections are turned off. Reflect that in the comments for applier_synchro_filter_tx and txn_limbo_is_replica_outdated. Follow-up #6133 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
- Jul 01, 2022
Vladimir Davydov authored
Vinyl doesn't support the hot standby mode. There's a ticket to implement it, see #2013. The behavior is undefined if running an instance in the hot standby mode in case the master has Vinyl spaces. It may result in a crash or even data corruption. Let's raise an explicit error in this case. Closes #6565 NO_DOC=bug fix
-
Vladimir Davydov authored
Since commit d2537d9d ("relay: cleanup error handling") recover_remaining_wals() doesn't log the error it throws - now callers of this function should catch and log the error. hot_standby_f() doesn't catch the error so the diagnostic message is lost if we fail to apply a row in the hot standby mode. Fix this. NO_DOC=bug fix NO_TEST=checked in next commit NO_CHANGELOG=minor bug in logging
-
Vladimir Davydov authored
If a nested tuple field is indexed, it can be accessed by [*] aka the multikey or any token:

s = box.schema.create_space('test')
s:create_index('pk')
s:create_index('sk', {parts = {{2, 'unsigned', path = '[1][1]'}}})
t = s:replace{1, {{1}}}
t['[2][1][*]'] -- returns 1!

If a nested field isn't indexed (remove creation of the secondary index in the example above), then access by [*] returns nil. Call graph:

lbox_tuple_field_by_path
  tuple_field_raw_by_full_path
    tuple_field_raw_by_path
      tuple_format_field_by_path
        json_tree_lookup_entry
          json_tree_lookup

And json_tree_lookup matches the first node if the key is [*]. We shouldn't match anything to [*]. Closes #5226 NO_DOC=bug fix
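A toy Python model of the corrected lookup (names are ours, not the json_tree C API): a concrete child is matched only by its exact token, so the any token [*] no longer grabs the first node it finds.

```python
ANY = "*"

def tree_lookup(children, token):
    """Look up a child node in a format tree by one path token.

    A concrete child such as "1" is matched only by its exact token;
    [*] matches only an explicitly registered ANY child, never a
    concrete one.
    """
    return children.get(token)

tree = {"1": {"1": "leaf"}}   # format tree for the indexed path [1][1]
by_index = tree_lookup(tree, "1")   # exact token still matches
by_any = tree_lookup(tree, ANY)     # [*] now finds nothing here
```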
-
- Jun 30, 2022
Boris Stepanenko authored
__gcov_flush was removed in gcc11. Since gcc11, __gcov_dump calls __gcov_lock at the start and __gcov_unlock before returning; the same is true for __gcov_reset. Because of that, calling __gcov_reset right after __gcov_dump on gcc11 is equivalent to calling __gcov_flush before gcc11. Closes #7302 NO_CHANGELOG=internal NO_DOC=internal NO_TEST=internal
-
Boris Stepanenko authored
Covered most of box_promote and box_demote with tests:

1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue to be emptied
10. Interfering promote while waiting for limbo to be acked (similar to replication/gh-5430-qsync-promote-crash.test.lua)

Closes #6033 NO_DOC=testing stuff NO_CHANGELOG=testing stuff
-
Vladimir Davydov authored
After scanning disk, the Vinyl read iterator checks if it should restore the iterator over the active memory tree, because new tuples could have been inserted into it while we yielded reading disk. We assume that existing tuples can't be deleted from the memory tree, but that's not always true - a tuple may actually be deleted by rollback after a failed WAL write. Let's reevaluate all scanned sources and reposition the read iterator to the next statement if this happens. Initially, the issue was fixed by commit 83462a5c ("vinyl: restart read iterator in case L0 is changed"), but it introduced a performance degradation and was reverted (see #5700). NO_DOC=bug fix NO_TEST=already there NO_CHANGELOG=already there
-
Vladimir Davydov authored
The Vinyl read iterator, which is used for serving range select requests, works as follows:

1. Scan in-memory sources. If found an exact match or a chain in the cache, return it.
2. If not found, scan disk sources. This operation may yield.
3. If any new data was inserted into the active memory tree, go to step 1, effectively restarting the iteration from the same key.

Apparently, such an algorithm doesn't guarantee any progress of a read operation at all - when we yield reading disk on step 2 after a restart, even newer data may be inserted into the active memory tree, forcing us to restart again. In other words, in presence of an intensive write workload, read ops rate may drop down to literally 0. It hasn't always been like so. Before commit 83462a5c ("vinyl: restart read iterator in case L0 is changed"), we only restored the memory tree iterator after a yield, without restarting the whole procedure. This makes sense, because only the memory tree may change after a yield so there's no point in rescanning other sources, including disk. By restarting iteration after a yield, the above-mentioned commit fixed bug #3395: initially we assumed that statements may never be deleted from a memory tree while actually they can be deleted by rollback after a failed WAL write. Let's revert this commit to fix the performance degradation. We will re-fix bug #3395 in the next commit. Closes #5700 NO_DOC=bug fix NO_TEST=should be checked by performance tests
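The restore-vs-restart distinction can be sketched with a toy Python model (ordered lists standing in for memory/disk sources; names are ours): after a yield, only the in-memory source may have changed, so it suffices to re-seek that one source past the last returned key instead of restarting the whole merge from the same key.

```python
import bisect

def next_from(source, last_key):
    """Return the smallest key in sorted `source` strictly greater
    than last_key, or None if there is none."""
    i = bisect.bisect_right(source, last_key)
    return source[i] if i < len(source) else None

memory = [10, 20, 30]
disk = [15, 25]
last_returned = 15        # we yielded while reading disk ...
memory.insert(0, 5)       # ... and a new tuple arrived in memory

# Restoring: re-seek only the memory source past the last returned key;
# progress is preserved even though memory changed during the yield.
restored = next_from(memory, last_returned)
```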
-
Vladimir Davydov authored
Normally, there shouldn't be any upserts on disk if the space has secondary indexes, because we can't generate an upsert without a lookup in the primary index, hence we convert upserts to replace+delete in this case. The deferred delete optimization only makes sense if the space has secondary indexes. So we ignore upserts while generating deferred deletes, see vy_write_iterator_deferred_delete. There's an exception to this rule: a secondary index could be created after some upserts were used on the space. In this case, because of the deferred delete optimization, we may never generate deletes for some tuples for the secondary index, as demonstrated in #3638. We could fix this issue by properly handling upserts in the write iterator while generating deferred deletes, but this wouldn't be easy, because in case of a minor compaction there may be no replace/insert to apply the upsert to, so we'd have to keep intermediate upserts even if there is a newer delete statement. Since this situation is rare (happens only once in a space's lifetime), it doesn't look like we should complicate the write iterator to fix it. Another way to fix it is to force major compaction of the primary index after a secondary index is created. This looks doable, but it could slow down creation of secondary indexes. Let's instead simply disable the deferred delete optimization if the primary index has upsert statements. This way the optimization will be enabled sooner or later, when the primary index major compaction occurs. After all, it's just an optimization and it can be disabled for other reasons (e.g. if the space has on_replace triggers). Closes #3638 NO_DOC=bug fix
-
Ilya Verbin authored
* doc: add allocators hierarchy diagram
* build: fix CMake warning
* small: fix compilation on macOS 12

NO_DOC=small submodule bump NO_TEST=small submodule bump NO_CHANGELOG=small submodule bump
-
Ilya Verbin authored
Asserts are disabled in the Release build, that leads to:

$ cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS=-Werror
$ make
[ 43%] Building C object src/lib/bitset/CMakeFiles/bitset.dir/bitset.c.o
src/lib/bitset/bitset.c:169:9: error: variable 'cardinality_check' set but not used [-Werror,-Wunused-but-set-variable]
        size_t cardinality_check = 0;
               ^

NO_DOC=build fix NO_TEST=build fix NO_CHANGELOG=build fix
-
- Jun 29, 2022
Ilya Verbin authored
Now fiber.top() does not use x86-specific instructions, so it can be enabled for ARM. Closes #4573 NO_TEST=<Tested in test/app/fiber.test.lua> NO_DOC=<x86 or ARM are not mentioned in the fiber.top doc>
-
Ilya Verbin authored
It doesn't make sense after switching from RDTSCP to clock_gettime(CLOCK_MONOTONIC). Part of #5869 @TarantoolBot document Title: fiber: get rid of cpu_misses in fiber.top() Since: 2.11 Remove any mentions of `cpu_misses` in `fiber.top()` description.
-
Ilya Verbin authored
clock_gettime(CLOCK_MONOTONIC) is implemented via the RDTSCP instruction on x86 and has the following advantages over the raw instruction:

* It checks for RDTSCP availability in CPUID. If RDTSCP is not supported, it switches to RDTSC.
* Linux guarantees that the clock is monotonic, hence the CPU miss detection is not needed.
* It works on ARM.

As for the disadvantage, this function is about 2x slower compared to a single RDTSCP instruction. Performance degradation measured by the fiber switch benchmark [1] is about 3-7% for num_fibers == 10-1000. Closes #5869 [1] https://github.com/tarantool/tarantool/issues/2694#issuecomment-546381304 NO_DOC=bugfix NO_TEST=<Tested in test/app/fiber.test.lua>
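The monotonicity guarantee is the key property. A minimal Python illustration (time.monotonic_ns is Python's portable wrapper over a monotonic clock such as CLOCK_MONOTONIC): the delta between two samples is never negative, so no "miss" detection is needed, unlike raw per-core TSC reads.

```python
import time

t0 = time.monotonic_ns()
busy = sum(range(1000))   # some work between samples
t1 = time.monotonic_ns()

# A monotonic clock never goes backwards, even across CPU migrations,
# so per-fiber time accounting can simply subtract consecutive samples.
elapsed = t1 - t0
```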
-