Commits · 019bacbe118294f10895c976a0d84470f880f2db · core / tarantool

Jul 23, 2024

errinj: log error injection value · 019bacbe

Vladimir Davydov authored 7 months ago

Let's log the new value when an error injection is set in order to ease
debugging in tests.

NO_DOC=logging
NO_TEST=logging
NO_CHANGELOG=logging

019bacbe

Jul 22, 2024

config/schema: support field deletion using :set() · 8a0013ea

Alexander Turenko authored 8 months ago

This commit implements the `<schema object>:set()` algorithm in a more
accurate way and it solves several drawbacks of the previous
implementation.

* It was impossible to set a field that is nested to a record or a map
  that has the box.NULL value (#10190).
* It was impossible to set a field to the box.NULL value (#10193).
* It was impossible to delete a field, now `nil` RHS value means the
  deletion (#10194).
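
The three cases above can be modelled with a small sketch. It is not the
Tarantool implementation: it is a Python toy in which the hypothetical
`NULL` sentinel stands in for `box.NULL`, Python `None` stands in for
Lua `nil`, and `set_path` is an illustrative name.

```python
NULL = object()  # hypothetical stand-in for box.NULL


def set_path(data, path, value):
    """Set a nested field; value None (Lua nil) deletes the field."""
    if not path:
        raise ValueError("path must not be empty")
    node = data
    for key in path[:-1]:
        child = node.get(key)
        # A missing or NULL intermediate record/map is replaced by a table.
        if child is None or child is NULL:
            child = {}
            node[key] = child
        node = child
    if value is None:
        node.pop(path[-1], None)  # nil RHS means deletion
    else:
        node[path[-1]] = value    # NULL is stored like any other value
    return data
```

With `cfg = {'a': NULL}`, calling `set_path(cfg, ['a', 'b'], 1)` replaces
the NULL record with a table holding `b`, and a later
`set_path(cfg, ['a', 'b'], None)` removes `b` again.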

Fixes #10190
Fixes #10193
Fixes #10194

NO_DOC=Included into https://github.com/tarantool/doc/issues/4279

8a0013ea

test: add the extended transactional DDL tests · 4e9b8bfe

Magomed Kostoev authored 1 year ago

Added more tests for various scenarios of the transactional DDL usage:
1. A DDL transaction which creates a space, formats it to have an
   integer column, fills it with some data, changes the format to
   'any', changes the data values to 'string', and changes the format
   to 'string'.
2. A DDL transaction that creates a space, user, grants privileges to
   the user to use the new space and accesses the space on behalf of
   the new user.
3. Creation of a space and its secondary indexes in one transaction.
4. Creation of a space with a sequence attached in one transaction.
5. A DDL transaction which sets the on_rollback trigger and creates a
   space, the space triggers, inserts a bunch of data and then commits
   or rolls back.
6. Transactional drop of a space with various indexes, field and tuple
   constraints and a sequence attached. All of these are also created
   in a single transaction.

Closes #4349

NO_DOC=test
NO_CHANGELOG=test

4e9b8bfe

lua: bump metrics module · e79982e7

Georgy Moiseev authored 8 months ago

Bump metrics package submodule. Commits from PRs [1-4] affect
Tarantool, the other ones are related to module infrastructure.

1. https://github.com/tarantool/metrics/pull/482
2. https://github.com/tarantool/metrics/pull/483
3. https://github.com/tarantool/metrics/pull/484
4. https://github.com/tarantool/metrics/pull/491

NO_DOC=doc is a part of submodule

e79982e7

tuple: allocate formats table statically · a2da1de7

Vladimir Davydov authored 7 months ago

The tuple formats table may be accessed with `tuple_format_by_id()` from
any thread, not just tx. For example, it's accessed by a vinyl writer
thread when it deletes a tuple. If a thread happens to access the table
while it's being reallocated by tx, see `tuple_format_register()`,
the accessing thread may crash with a use-after-free or NULL pointer
dereference bug, like the one below:

```
 # 1  0x64bd45c09e22 in crash_signal_cb+162
 # 2  0x76ce74e45320 in __sigaction+80
 # 3  0x64bd45ab070c in vy_run_writer_append_stmt+700
 # 4  0x64bd45ada32a in vy_task_write_run+234
 # 5  0x64bd45ad84fe in vy_task_f+46
 # 6  0x64bd45a4aba0 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+16
 # 7  0x64bd45c13e66 in fiber_loop+70
 # 8  0x64bd45e83b9c in coro_init+76
```

To avoid that, let's make the tuple formats table statically allocated.
This shouldn't increase actual memory usage because system memory is
allocated lazily, on page fault. The max number of tuple formats isn't
that big (64K) to care about the increase in virtual memory usage.
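
A back-of-the-envelope check of the memory argument, as a sketch. The
slot size and page size are assumptions that the commit message does not
state: one 8-byte pointer per format slot and 4 KiB pages.

```python
MAX_FORMATS = 64 * 1024  # "64K" from the text above
POINTER_SIZE = 8         # assumption: one 64-bit pointer per slot
PAGE_SIZE = 4096         # assumption: 4 KiB pages


def table_bytes(slots=MAX_FORMATS, slot_size=POINTER_SIZE):
    """Virtual memory reserved by a statically allocated table."""
    return slots * slot_size


def resident_bytes(used_slots, slot_size=POINTER_SIZE, page_size=PAGE_SIZE):
    """Memory backed by pages if only the first used_slots are touched."""
    used = used_slots * slot_size
    pages = -(-used // page_size)  # ceiling division
    return pages * page_size
```

Under these assumptions the whole table reserves 512 KiB of virtual
memory, while an instance that registers 100 formats touches one page.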

Closes #10278

NO_DOC=bug fix
NO_TEST=mt race

a2da1de7

config/schema: allow to index 'any' type in :get() · 0f8c715d

Alexander Turenko authored 8 months ago

`<schema object>:get()` now can access a field inside the `any` type if
it is a `table` or `nil`/`box.NULL`.

`config:get()` now can access fields inside `app.cfg.<key>` and
`roles_cfg.<key>`.
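
A toy model of the lookup rule, not the actual `config` code: `NULL` is
a hypothetical stand-in for `box.NULL`, `None` stands for `nil`, and
raising an error on an attempt to index a scalar is an assumption made
for illustration.

```python
NULL = object()  # hypothetical stand-in for box.NULL


def get_path(data, path):
    """Walk nested tables; indexing nil or NULL yields nil (None)."""
    node = data
    for key in path:
        if node is None or node is NULL:
            return None
        if not isinstance(node, dict):
            # Assumption: a scalar inside 'any' cannot be indexed.
            raise TypeError("cannot index a %s" % type(node).__name__)
        node = node.get(key)
    return node
```

For example, `get_path({'roles_cfg': {'r': NULL}}, ['roles_cfg', 'r', 'y'])`
returns `None` instead of failing on the NULL value.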

Fixes #10205

NO_DOC=The `<schema object>:get()` update is included into
       https://github.com/tarantool/doc/issues/4279.
       The `config:get()` reference on the website doesn't mention the
       constraint, so it doesn't need an update.

0f8c715d

lua: introduce is_interval for datetime.interval · b3db87df

Mergen Imeev authored 8 months ago

This patch introduces is_interval() function for
require('datetime').interval.

Closes #10269

@TarantoolBot document
Title: Function require('datetime').interval.is_interval()

All modules representing field types, with the exception of `interval`
(i.e. `varbinary`, `uuid`, `decimal`, `datetime`), have an `is()` or
an `is_<type_name>()` function. Now such a function `is_interval()` is
provided for the `interval` module.

b3db87df

Introduce protobuf encoder · f83fded7

Col-Waltz authored 1 year ago

Introducing lua protobuf encoder.
Encoder can create a new protocol and encode data according to it.
As a result a binary string is returned that can be transported
by wire or decoded back by another protobuf decoder.
Future versions will add support for protocol creation from .proto
files and a decode method for encoded data.

Part of #9844

@TarantoolBot document
Title: Protobuf module
Product: Tarantool
Root document: -

Protobuf encoder API

Introducing protobuf encoder. All the Protocol Buffers wire types
are supported except group start/end, which are deprecated in proto3.
To encode data you need to create a protocol according to which
data will be encoded.

The two main components of the protocol are messages and enums.
To create them, the .message and .enum functions are used.

Each message consists of name of the message and fields.
Each field has a name, a type and an id. Id must be unique
for all fields in one message. For example this is a message with
six fields:

```lua
protobuf.message('KeyValue', {
    key = {'bytes', 1},
    create_revision = {'int64', 2},
    mod_revision = {'int64', 3},
    version = {'int64', 4},
    value = {'bytes', 5},
    lease = {'int64', 6},
})
```

This implementation supports recursive definition for fields in
message. Depth of recursion is defined by input data because all
fields are optional by default. Example of recursive message:

```lua
protobuf.message('Node', {
    number = {'int64', 1},
    data = {'bytes', 2},
    next_node = {'Node', 3},
})
```

Each enum type consists of name of type and values.
Values must have a zero value to be set as default as in example:

```lua
protobuf.enum('EventType', {
    ['PUT'] = 0,
    ['DELETE'] = 1,
})
```

To create a protocol .protocol function is used. This function
supports forward declared types and nested messages so the tuple
can be set according to example:

```lua
schema = protobuf.protocol({
    protobuf.message(<...>),
    protobuf.message(<...>),
    protobuf.enum(<...>),
    protobuf.enum(<...>),
})
```

The resulting protocol can then be used to encode input data with the
`encode` method. This method converts input data into the protobuf
wire format according to the chosen message definition from the
protocol. All fields in a message definition are optional, so
missing input data is simply not encoded. Input data can be
passed as Lua types or as cdata (for example, an int64 value),
as in the example below.

```lua
result = schema:encode('KeyValue',
    {
        key = 'protocol',
        version = 2,
        lease = 5,
    }
)
```

Output result will be a binary string encoded according to the
protobuf standard and can be transmitted to another user.
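
To make the wire format concrete, here is a minimal Python sketch of how
the `KeyValue` input above maps to bytes under the Protocol Buffers
encoding rules (varint tags, wire type 0 for int64, wire type 2 for
bytes). It is independent of the Lua module: the helper names are
illustrative, and emitting fields in field id order is an assumption,
since the text does not say in which order the encoder writes them.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_id, wire_type):
    return encode_varint((field_id << 3) | wire_type)


def encode_int64_field(field_id, value):
    # Wire type 0; negative values use 64-bit two's complement.
    return encode_tag(field_id, 0) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_bytes_field(field_id, data):
    # Wire type 2: tag, payload length, payload.
    return encode_tag(field_id, 2) + encode_varint(len(data)) + data


encoded = (encode_bytes_field(1, b'protocol')  # key
           + encode_int64_field(4, 2)          # version
           + encode_int64_field(6, 5))         # lease
```

The fields that were not supplied (`create_revision`, `mod_revision`,
`value`) contribute no bytes at all, which is what "optional" means on
the wire.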

f83fded7

test/fuzz: add an engine fuzzing test · 33670eae

Sergey Bronnikov authored 8 months ago

The test for Tarantool allows you to randomly generate DDL and DML
operations, apply these operations to vinyl and memtx spaces and
toggle random error injections simultaneously. Everything randomly
generated by the test depends on a random seed that can
be passed via a command-line argument.
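
The reproducibility idea can be sketched as follows. This is not the
fuzzing test itself: the operation names and the `generate_ops` helper
are hypothetical, and only the seeding pattern matters.

```python
import random

ENGINES = ['memtx', 'vinyl']
OPS = ['insert', 'replace', 'delete', 'create_index', 'drop_index']  # hypothetical


def generate_ops(seed, count):
    """Return a reproducible list of (engine, operation, key) triples."""
    rng = random.Random(seed)  # private generator, no global state involved
    return [
        (rng.choice(ENGINES), rng.choice(OPS), rng.randrange(1000))
        for _ in range(count)
    ]
```

Running the generator twice with the same seed yields the same sequence,
so a failure found with one seed can be replayed by passing that seed
again.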

Bugs found by the test:

- https://github.com/tarantool/tarantool/issues/9995
- https://github.com/tarantool/tarantool/issues/10026
- https://github.com/tarantool/tarantool/issues/10033
- https://github.com/tarantool/tarantool/issues/10082
- https://github.com/tarantool/tarantool/issues/10090
- https://github.com/tarantool/tarantool/issues/10094
- https://github.com/tarantool/tarantool/issues/10095
- https://github.com/tarantool/tarantool/issues/10096
- https://github.com/tarantool/tarantool/issues/10097
- https://github.com/tarantool/tarantool/issues/10099
- https://github.com/tarantool/tarantool/issues/10100
- https://github.com/tarantool/tarantool/issues/10109
- https://github.com/tarantool/tarantool/issues/10123
- https://github.com/tarantool/tarantool/issues/10128
- https://github.com/tarantool/tarantool/issues/10134
- https://github.com/tarantool/tarantool/issues/10147
- https://github.com/tarantool/tarantool/issues/10148
- https://github.com/tarantool/tarantool/issues/10153
- https://github.com/tarantool/tarantool/issues/10233
- https://github.com/tarantool/tarantool/issues/10235
- https://github.com/tarantool/tarantool/issues/10236
- https://github.com/tarantool/tarantool/issues/10237
- https://github.com/tarantool/tarantool/issues/10245
- https://github.com/tarantool/tarantool/issues/10262
- https://github.com/tarantool/tarantool/issues/10265
- https://github.com/tarantool/tarantool/issues/10267
- https://github.com/tarantool/tarantool/issues/10277
- https://github.com/tarantool/tarantool/issues/10278
- https://github.com/tarantool/tarantool/issues/10283

Part of #4349
Closes #5076

NO_CHANGELOG=fuzzing
NO_DOC=fuzzing
NO_TEST=fuzzing

33670eae

salad: fix typo · 56de615c

Sergey Bronnikov authored 8 months ago

NO_CHANGELOG=typo
NO_DOC=typo
NO_TEST=typo

56de615c

Jul 19, 2024

perf/lua: add context section to test output · a3ef8fb6

Sergey Bronnikov authored 8 months ago

Google Benchmark output format contains a section "context" that
describes useful information about test environment.

Google Benchmark output format has been supported in Lua
microbenchmarks since commit 3110ef9a
("perf: introduce benchmark.lua helper module"). However, the produced
output contains test results only and the "context" section is missing.
The patch adds a "context" section with the following fields:
date, load average, hostname, Tarantool version, build flags
and the name of the build target.

```
$ tarantool uri_escape_unescape.lua --output=res.json --output_format=json
$ jq ".context" res.json
{
  "build_target": "Linux-x86_64-RelWithDebInfo",
  "host_name": "pony",
  "date": "2024-07-04 19:09:11",
  "tarantool_version": "3.2.0-entrypoint-114-g9e5dca29ad",
  "build_flags": " -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/sergeyb/sources/MRG/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -Wno-cast-function-type -O2 -g -DNDEBUG -ggdb -O2 ",
  "load_avg": [
    "0.76",
    "0.74",
    "0.63"
  ]
}
```
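
For illustration only, the host-related part of such a "context" section
could be gathered as in the Python sketch below. The key names follow
the output above; the function name is hypothetical, and the
Tarantool-specific fields (version, build flags, build target) are left
out because they are only available inside Tarantool.

```python
import os
import socket
import time


def collect_context():
    """Gather the host-related context fields shown above."""
    try:
        load = os.getloadavg()  # 1, 5 and 15 minute load averages
    except (AttributeError, OSError):
        load = ()               # not available on this platform
    return {
        'date': time.strftime('%Y-%m-%d %H:%M:%S'),
        'host_name': socket.gethostname(),
        'load_avg': ['%.2f' % value for value in load],
    }
```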

NO_CHANGELOG=perf
NO_DOC=perf
NO_TEST=perf

a3ef8fb6

perf/lua: protect require of column module · 18661ed7

Sergey Bronnikov authored 8 months ago

NO_CHANGELOG=perf
NO_DOC=perf
NO_TEST=perf

18661ed7

ci: enable BENCH_CMD · 3951a88c

Sergey Bronnikov authored 8 months ago

The patch enables the environment variable `BENCH_CMD` introduced in
a previous commit. `taskset` alone pins all the process
threads to a single (random) isolated CPU; there's a ticket [1]
about this in the Linux kernel bugtracker. The workaround is to use the
realtime scheduler for the isolated task via `chrt` [2], e.g.:
`taskset 0xef chrt 50`.

1. https://bugzilla.kernel.org/show_bug.cgi?id=116701
2. https://www.man7.org/linux/man-pages/man1/chrt.1.html

NO_CHANGELOG=performance testing
NO_DOC=performance testing
NO_TEST=performance testing

3951a88c

perf: introduce BENCH_CMD environment variable · 03317b16

Sergey Bronnikov authored 8 months ago

The patch introduces BENCH_CMD, an environment variable that can be
set at the CMake configuration stage to a string with a command and
its arguments. This string is used as a pre-command
for executing performance tests. Examples of such commands are
`taskset` [1] and `numactl` [2], or any other utilities, see [3].

1. https://man7.org/linux/man-pages/man1/taskset.1.html
2. https://man7.org/linux/man-pages/man8/numactl.8.html
3. https://github.com/tarantool/tarantool/wiki/Benchmarking#run-benchmarks
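
The pre-command idea amounts to prepending one argument list to another.
The sketch below shows this in Python; it is not the CMake logic from
the patch, and `build_command` and the benchmark file name are
hypothetical.

```python
import shlex


def build_command(bench_cmd, test_argv):
    """Prefix the test command line with the optional pre-command."""
    prefix = shlex.split(bench_cmd) if bench_cmd else []
    return prefix + list(test_argv)
```

For instance, `build_command('taskset 0xef chrt 50', ['tarantool', 'bench.lua'])`
gives `['taskset', '0xef', 'chrt', '50', 'tarantool', 'bench.lua']`, and an
empty pre-command leaves the test command unchanged.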

NO_CHANGELOG=performance infra
NO_DOC=performance infra
NO_TEST=performance infra

03317b16

perf: add a script for setting environment · f3ca5c93

Sergey Bronnikov authored 8 months ago

The "Benchmarking" article [0] in Tarantool's wiki contains a lot of
recommendations that help to set up the Linux operating system and
avoid potential reproducibility pitfalls when executing
performance tests in a Linux-based environment. These
recommendations are written in plain text with examples of commands
that can be executed manually. We want to execute benchmarks
automatically and continuously, therefore we need a way to
set up the test environment automatically before running
benchmarks.

There are many guides with benchmarking tips, but unfortunately
there is no script that will do these steps automatically.
I found only temci [1] and pyperf (`pyperf system` [3]) projects.

The patch adds a script for setting the environment before running
performance tests. All settings used in the proposed script are
described in the article [0]. Note that uncertain settings were
not implemented.

0. https://github.com/tarantool/tarantool/wiki/Benchmarking
1. https://github.com/parttimenerd/temci
2. https://github.com/travisdowns/uarch-bench/blob/master/uarch-bench.sh
3. https://pyperf.readthedocs.io/en/latest/cli.html#system-cmd

NO_CHANGELOG=performance
NO_DOC=performance
NO_TEST=performance

f3ca5c93

Jul 18, 2024

vinyl: wake up waiters after clearing checkpoint_in_progress flag · fc3196dc

Vladimir Davydov authored 8 months ago

The function `vy_space_build_index`, which builds a new index on DDL,
calls `vy_scheduler_dump` on completion. If there's a checkpoint in
progress, the latter will wait on `vy_scheduler::dump_cond` until
`vy_scheduler::checkpoint_in_progress` is cleared. The problem is
`vy_scheduler_end_checkpoint` doesn't broadcast `dump_cond` when it
clears the flag. Usually, everything works fine because the condition
variable is broadcast on any dump completion, and vinyl checkpoint
implies a dump, but under certain conditions this may lead to a fiber
hang. Let's broadcast `dump_cond` in `vy_scheduler_end_checkpoint`
to be on the safe side.
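
The missed wakeup can be illustrated with a generic condition-variable
sketch. It uses Python threads as a stand-in for Tarantool fibers, and
the class is a toy model whose method names only mirror the functions
mentioned above.

```python
import threading


class Scheduler:
    def __init__(self):
        self.dump_cond = threading.Condition()
        self.checkpoint_in_progress = False

    def begin_checkpoint(self):
        with self.dump_cond:
            self.checkpoint_in_progress = True

    def end_checkpoint(self):
        with self.dump_cond:
            self.checkpoint_in_progress = False
            # The fix: wake up everyone waiting for the flag to clear.
            self.dump_cond.notify_all()

    def dump(self):
        with self.dump_cond:
            # Without the notify_all() above, a waiter that arrived during
            # the checkpoint would sleep until some unrelated dump finishes.
            while self.checkpoint_in_progress:
                self.dump_cond.wait()
            return 'dumped'
```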

While we are at it, let's also inject a dump delay to the original
test to make it more robust.

Closes #10267
Follow-up #10234

NO_DOC=bug fix

fc3196dc

Jul 17, 2024

iproto: introduce FETCH_SNAPSHOT_CURSOR feature · 62c49367

Nikita Zheleztsov authored 8 months ago

This commit introduces FETCH_SNAPSHOT_CURSOR feature, which is available
only in EE. The feature is not returned in response to IPROTO_ID and is
not shown in box.iproto.protocol_features in Community Edition. Its id
is shown only in box.iproto.feature, which is a list of all available
features in the current version.

Needed for tarantool/tarantool-ee#741

NO_CHANGELOG=minor

@TarantoolBot document
Title: Document iproto feature FETCH_SNAPSHOT_CURSOR

Root document: https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/#net-box-connect

FETCH_SNAPSHOT_CURSOR feature requires cursor FETCH_SNAPSHOT on the
server. Its ID is IPROTO_FEATURE_FETCH_SNAPSHOT_CURSOR. IPROTO version
is 8 or more, Enterprise Edition is also required.

62c49367

engine: introduce stubs for checkpoint FETCH_SNAPSHOT · 2fca5c13

Nikita Zheleztsov authored 8 months ago

This commit introduces engine stubs that enable a new method
of fetching snapshots for anonymous replicas. Instead of using
the traditional read-view join approach, this update allows
file snapshot fetching. Note that file snapshot fetching
is only available in Tarantool EE.

Checkpoint fetching is done via IPROTO_IS_CHECKPOINT_JOIN,
IPROTO_CHECKPOINT_VCLOCK and IPROTO_CHECKPOINT_LSN fields.

If IPROTO_IS_CHECKPOINT_JOIN is set to true, join will be done from
files: .snap for memtx, .run for vinyl, if false - from read view.

Checkpoint join allows the client to continue from the place where it
stopped if snapshot fetching fails. This avoids
rebootstrapping an anonymous client. To do so, the client specifies
CHECKPOINT_VCLOCK, which tells the server from which file to continue
the join; the client gets the vclock at the beginning of the join.
Specifying CHECKPOINT_LSN allows continuing from some position in the
checkpoint: the server sends all data >= CHECKPOINT_LSN.

If CHECKPOINT_VCLOCK is not specified, fetching is done from the latest
available checkpoint. If CHECKPOINT_LSN is not specified - start from
the beginning of the snap. So, specifying only IS_CHECKPOINT_JOIN
triggers fetching the latest checkpoint from files.
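
The defaulting rules above can be summarised with a toy model. It is
purely illustrative and not the EE implementation: checkpoints are
represented as a dictionary keyed by vclock signature, each holding
`(lsn, row)` pairs, and `select_checkpoint_rows` is a hypothetical name.

```python
def select_checkpoint_rows(checkpoints, vclock_signature=None, lsn=None):
    """Pick the rows a checkpoint join would send in this toy model."""
    if not checkpoints:
        raise LookupError("no checkpoints available")
    if vclock_signature is None:
        # CHECKPOINT_VCLOCK not given: use the latest checkpoint.
        vclock_signature = max(checkpoints)
    rows = checkpoints[vclock_signature]
    if lsn is None:
        # CHECKPOINT_LSN not given: start from the beginning.
        return list(rows)
    return [(row_lsn, row) for row_lsn, row in rows if row_lsn >= lsn]
```

A client that lost the connection after receiving the row with LSN 1 of
checkpoint 10 would retry with `vclock_signature=10, lsn=2` and receive
only the remaining rows.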

Needed for tarantool/tarantool-ee#741

NO_DOC=ee
NO_TEST=ee
NO_CHANGELOG=ee

2fca5c13

engine: send vclock with 0th component during join · 56058393

Nikita Zheleztsov authored 8 months ago

This commit makes the engine send the vclock without ignoring the 0th
component during join, which is needed for checkpoint FETCH SNAPSHOT.

Currently engine join functions are invoked only from
relay_initial_join, which is done during JOIN or FETCH SNAPSHOT.
They respond with vclock of the read view we're going to send.

The following commit introduces checkpoint FETCH SNAPSHOT,
which responds with the vclock of the checkpoint we're going to send.
Such a vclock may include the 0th component, and it's crucial to send it
to the client: in case of a connection failure, the client will send us
the same vclock and we'll have to use its signature to figure out which
checkpoint the client wants.

So, we have to send and receive 0th component of the vclock during
FETCH_SNAPSHOT. This commit also introduces decoding vclocks without
ignoring 0th component, as they'll be used in the following commit too.
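
Why the 0th component matters can be seen in a small sketch. A vclock is
modelled here as a Python dictionary from component id to LSN, and its
signature as the sum of all components; the function names are
illustrative and do not correspond to the C functions.

```python
def encode_vclock(vclock, ignore0):
    """Return the components that would be put on the wire."""
    return {
        component: lsn
        for component, lsn in sorted(vclock.items())
        if not (ignore0 and component == 0)
    }


def signature(vclock):
    """Sum of all components, including the 0th one."""
    return sum(vclock.values())
```

If the checkpoint vclock is `{0: 5, 1: 100, 2: 7}`, dropping component 0
changes the signature from 112 to 107, so a client echoing the truncated
vclock would no longer identify the same checkpoint.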

Needed for tarantool/tarantool-ee#741

NO_DOC=internal
NO_TEST=ee
NO_CHANGELOG=internal

56058393

xrow: rename xrow_encode_vclock · 313bd730

Nikita Zheleztsov authored 8 months ago

This commit renames xrow_encode_vclock to xrow_encode_vclock_ignore0
since the next commit will introduce encoding vclock without ignoring
0th component, which is needed during sending the response to fetch
snapshot request.

This commit also removes internal field inside the replication_request
structure, as the following commit will use 'vclock' for
encoding/decoding vclock without ignoring component.

Needed for tarantool/tarantool-ee#741

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

313bd730

relay: refactor relay_initial_join · 72cc2b3e

Nikita Zheleztsov authored 8 months ago

From now on during initial join memtx engine prepares vclock, raft and
limbo states, it also sends them during memtx_engine_join.

It's done in order to simplify the code of initial join, as in the
subsequent commit checkpoint initial join will be introduced and we want
relay code to handle it the same as read-view join without confusing
conditions.

Needed for tarantool/tarantool-ee#741

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

72cc2b3e

engine: move raft and limbo states after system data in checkpoint · 3da31b83

Nikita Zheleztsov authored 8 months ago

Before this commit raft and limbo states were written at the end of the
checkpoint, which makes it very costly to access them.

Checkpoint join needs to access limbo and raft state in order to send
them during JOIN_META stage. We cannot use the latest states, like it's
done for read-view snapshot fetching: states may be far ahead of the
data, written to the checkpoint, which we're going to send.

This commit moves raft and limbo states after data from the system
spaces but before user data. We cannot put them right at the beginning
of the snapshot, because then we'll have to patch recovery process,
which currently strongly relies on the fact, that system spaces are
at the beginning of the snapshot (this was done in order to apply force
recovery only for user data). If we patch recovery process, then old
versions, where it's unpatched, won't be able to recover from the
snapshots done by the newer version, compatibility of snapshots will be
broken.

The current change is not breaking, old Tarantool versions can restore
from the snapshot made by the newer one.

Needed for tarantool/tarantool-ee#741

NO_DOC=internal
NO_CHANGELOG=internal

3da31b83

Jul 16, 2024

perf: fix warnings in column_scan_module.c · 4cac1677

Ilya Verbin authored 8 months ago

Fix the following warnings (with ENABLE_READ_VIEW defined):

```
./perf/lua/column_scan_module.c:59:18: error: unused variable ‘index_id’ [-Werror=unused-variable]
   59 |         uint32_t index_id = luaL_checkinteger(L, 2);
      |                  ^~~~~~~~

./perf/lua/column_scan_module.c:149:18: error: unused variable ‘index_id’ [-Werror=unused-variable]
  149 |         uint32_t index_id = luaL_checkinteger(L, 2);
      |                  ^~~~~~~~
```

NO_DOC=perf test
NO_TEST=perf test
NO_CHANGELOG=perf test

4cac1677

applier: fix assertion failure after split brain · 5ce010c5

Nikita Zheleztsov authored 8 months ago

After receiving async transaction from an old term applier_apply_tx
exits without unlocking the latch. If the same applier tries to
subscribe for replication, it fails with assertion, as the latch is
already locked.

Let's fix the function that raises the error so that it just sets
diag and returns -1.
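
The pattern of the fix, sketched in Python with a plain lock in place of
the latch. Names and the diag text are hypothetical; the point is that a
check which reports failure through a return code lets the caller reach
its unlock on every path.

```python
import threading


class Applier:
    def __init__(self, current_term):
        self.latch = threading.Lock()
        self.current_term = current_term
        self.diag = None
        self.applied = 0

    def check_term(self, tx_term):
        """Set diag and return -1 instead of raising."""
        if tx_term < self.current_term:
            self.diag = 'transaction from an old term'  # hypothetical text
            return -1
        return 0

    def apply_tx(self, tx_term):
        self.latch.acquire()
        rc = self.check_term(tx_term)
        if rc == 0:
            self.applied += 1
        # Reached on both paths because check_term never raises.
        self.latch.release()
        return rc
```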

Closes #10073

NO_DOC=bugfix
NO_CHANGELOG=no crash on release version

5ce010c5

perf: add column insert test · e5c4bd63

Ilya Verbin authored 8 months ago

The test creates an empty space with 1000 nullable columns storing uint64
values. Then it initializes a dataset that consists of 10 columns and
1 million rows (row count and both column counts are configurable), then
it inserts the dataset into the space.

By default the test uses serial C API but one may switch to the Arrow API
for batch insertion (the feature is exclusive to the Enterprise Edition).

It's also possible to specify the engine and wal_mode to use (defaults are
memtx, write).

Needed for tarantool/tarantool-ee#712

NO_DOC=perf test
NO_TEST=perf test
NO_CHANGELOG=perf test

e5c4bd63

third_party: initial import of arrow/abi.h · 8cd677da

Ilya Verbin authored 11 months ago

Needed for tarantool/tarantool-ee#712

NO_DOC=for enterprise edition
NO_TEST=for enterprise edition
NO_CHANGELOG=for enterprise edition

8cd677da

lua/utils: export luaL_pushnull and luaL_isnull functions · a6140a3e

Ilya Verbin authored 8 months ago

They are useful in C modules.

Needed for tarantool/tarantool-ee#712

@TarantoolBot document
Title: Update C API reference > Module lua/utils
Product: Tarantool
Root documents: https://www.tarantool.io/en/doc/latest/dev_guide/reference_capi/utils/

The following functions are missing from the documentation:

 * luaL_iscallable
 * luaL_iscdata
 * luaL_isnull
 * luaL_pushnull
 * luaT_call
 * luaT_checktuple
 * luaT_isdecimal
 * luaT_newdecimal
 * luaT_pushdecimal
 * luaT_toibuf
 * luaT_tolstring
 * luaT_tuple_encode
 * luaT_tuple_new

See also: https://github.com/tarantool/doc/issues/2011

a6140a3e

mpstream: introduce mpstream_encode_int64() helper · f8be986d

Ilya Verbin authored 8 months ago

Needed for tarantool/tarantool-ee#712

NO_TEST=EE
NO_DOC=internal
NO_CHANGELOG=internal

f8be986d

error: introduce ERRINJ_TUPLE_ALLOC_COUNTDOWN · 52926402

Ilya Verbin authored 8 months ago

Needed for tarantool/tarantool-ee#712

NO_DOC=internal
NO_TEST=internal
NO_CHANGELOG=internal

52926402

test: do not test errinj.info() output · dc0fd81c

Ilya Verbin authored 8 months ago

There is not much sense in testing it, but it is sensitive to source code
changes, especially `ERRINJ_*_COUNTDOWN` injections, e.g. see commit
697123d0 ("box: use maximal space id instead of _schema.max_id").

Needed for tarantool/tarantool-ee#712

NO_DOC=test
NO_CHANGELOG=test

dc0fd81c

sio: fix error message displaying bind address · a5214bfc

Lev Kats authored 8 months ago

Now `sio_bind` function prints address into error message directly
instead of relying on `fd` used in `bind` that failed to execute.

`sio_bind` used `sio_socketname_to_buffer` for error message
effectively attempting printing address bound to `fd` while there
actually was an error in binding that address to that socket in the
first place.
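
The same idea in a generic form: when `bind` fails, the message should
be built from the address that was requested, because the socket has no
bound name to query. The Python sketch below is an illustration with a
hypothetical helper name and message text, not the `sio` code.

```python
def bind_or_error(sock, address):
    """Bind sock to (host, port); return an error message or None."""
    host, port = address
    try:
        sock.bind(address)
    except OSError as exc:
        # Use the requested address: the socket is not bound to anything.
        reason = exc.strerror or str(exc)
        return "can't bind to %s:%d: %s" % (host, port, reason)
    return None
```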

Fixes #5925

NO_DOC=bugfix
NO_CHANGELOG=minor

a5214bfc

test: cover split-brain during promote · 06b87e27

Nikita Zheleztsov authored 8 months ago

This test checks that when a PROMOTE from the previous term is
encountered, we immediately notice the split-brain situation and break
replication without corrupting data.

Closes #9943

NO_DOC=test
NO_CHANGELOG=test

06b87e27

Jul 15, 2024

applier: drop apply_final_join_tx · da158b9b

Vladislav Shpilevoy authored 9 months ago

Can use the regular applier_apply_tx(), they do the same. The
latter is just more protective, but doesn't matter much in this
case if the code does a few latch locks.

The patch also drops an old test about double-received row panic
during final join. The logic is that absolutely the same situation
could happen during subscribe, but it was always filtered out by
checking replicaset.applier.vclock and skipping duplicate rows.

There doesn't seem to be a reason why final join must be any
different. It is, after all, same subscribe logic but the received
rows go into replica's initial snapshot instead of xlogs. Now it
even uses the same txn processing function applier_apply_tx().

The patch also moves `replication_skip_conflict` option setting
after bootstrap is finished. In theory, final join could deliver
a conflicting row and it must not be ignored. The problem is that
it can't be reproduced anyhow without illegal error injection
(which would corrupt something in an unrealistic way). But let's
anyway move it below bootstrap for clarity.

Follow-up #10113

NO_DOC=refactoring
NO_CHANGELOG=refactoring

da158b9b

box: make instance_vclock const · 19b2cc20

Vladislav Shpilevoy authored 8 months ago

No code besides box.cc can now update instance's vclock
explicitly. That is a protection against hacks like #9916.

Closes #10113

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

19b2cc20

box: make final join vclock update only in box.cc · fe338ed4

Vladislav Shpilevoy authored 9 months ago

The goal is to make sure that no files except box.cc can change
instance_vclock_storage directly. That leads to all sorts of hacks
which in turn lead to bugs - #9916 is a good example.

Now applier on final join only sends rows into the journal. The
journal then is handled by box.cc where vclock is properly
updated.

Part of #10113

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

fe338ed4

journal: extract journal_write_row from limbo · 7d10096c

Vladislav Shpilevoy authored 9 months ago

The function writes a single xrow into the journal in a blocking
way. It isn't so simple, so makes sense to keep as a function,
especially given that it will be used more in the next commit.

Part of #10113

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

7d10096c

box: move recovery_journal creation · 2620eb9e

Vladislav Shpilevoy authored 9 months ago

Recovery journal uses word "recovery" to say that it works with
xlogs. For snapshot recovery there is bootstrap_journal. Let's use
it during local snapshot recovery.

The reasoning is that while right now there is no difference, in
next commits the recovery_journal will do more.

Part of #10113

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

2620eb9e

box: move replicaset.vclock into instance_vclock · f1e8e4e1

Vladislav Shpilevoy authored 9 months ago

Storing vclock of the instance in replicaset.vclock wasn't right.
It wasn't vclock of the whole replicaset. It was local to this
instance. There is no such thing as "replicaset vclock".

The patch moves it to box.h/cc.

Part of #10113

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

f1e8e4e1

applier: treat register txns like regular ones · 51751f87

Vladislav Shpilevoy authored 9 months ago

Applier during the registration waiting (for registering a new ID
or a name) could keep doing the master txns received before the
registration was started. They could still be inside WAL doing a
disk write, when the replica sends a register request.

Before this commit, it could cause an assertion failure in debug
and a double LSN error in release.

The reason was that during the registration waiting the applier
treated all incoming txns as "final join" txns. I.e. it wasn't
checking if those txns were already received, but not committed
yet.

During normal subscribe process the appliers (potentially
multiple) protect themselves from that by keeping track of the
vclocks which are already applied and also being applied right now
(replicaset.applier.vclock).

Such protection ensures that receiving same row from 2 appliers
wouldn't result into its double write. It also protects from the
case when a txn was received, goes to WAL, but then the applier
reconnects, resubscribes, and gets the same txn again - it
shouldn't be applied.
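
The protection described above can be modelled in a few lines. This is a
simplified sketch, not the applier code: `applier_vclock` plays the role
of `replicaset.applier.vclock`, and the class and method names are
hypothetical.

```python
class Replica:
    def __init__(self):
        # Component id -> highest LSN already applied or being applied.
        self.applier_vclock = {}
        self.wal = []

    def receive(self, replica_id, lsn, row):
        """Queue the row for WAL unless it was already received."""
        if lsn <= self.applier_vclock.get(replica_id, 0):
            return False  # duplicate: applied or still in flight
        # Advance before the write completes, so a second applier or a
        # resubscribe delivering the same row is filtered out.
        self.applier_vclock[replica_id] = lsn
        self.wal.append((replica_id, lsn, row))
        return True
```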

The patch makes so that the registration waiting after recovery
works like subscribe. Registration during recovery would mean
bootstrap via join. And outside of recovery it means the instance
is already running.

Closes #9916

NO_DOC=bugfix

51751f87

lua: shutdown tasks worker fiber · 6e403753

Nikolay Shirokovskiy authored 8 months ago

As this fiber is made system in the commit bf620650 ("box: finish
client fibers on shutdown") we do not need the existing protection from
cancelling. So first remove it. Now make it managed on shutdown.

Note that we may have issues as we finish this fiber too early. The
tasks scheduled but not executed at this moment will never be executed.
Neither will the tasks scheduled after the fiber is finished. Now that we
don't use the worker fiber for swim gc this will not cause leaks. And
leaking fd on Tarantool shutdown in fio is not a problem.

Closes #9722

NO_CHANGELOG=internal
NO_DOC=internal

6e403753