Commits · 1262e99dbd6f1409ed7d5bb9612559b1e9de4609 · core / tarantool

Dec 18, 2023

iproto: fix deadlock when dropping connections from iproto · 1262e99d

Nikolay Shirokovskiy authored 1 year ago

The deadlock is described in detail in the ticket. Let's not wait until the
connection that called iproto_drop_connections is deleted. It will be
deleted eventually, when the request executing iproto_drop_connections is
finished and all other in-progress requests from this connection are
finished too (they are cancelled just like requests of other connections).

CE part of https://github.com/tarantool/tarantool-ee/issues/609

NO_DOC=bugfix
NO_CHANGELOG=feature is not released yet
NO_TEST=test is in EE


config: fix automatic names apply before schema upgrade · c041863d

Serge Petrenko authored 1 year ago

Each config reload on a node with an old schema (less than 3.0.0) led to
the ER_SCHEMA_NEEDS_UPGRADE error. This happened because the relevant
code didn't check the schema version before trying to apply the names.

Fix this.

Closes #9502

NO_DOC=bugfix
NO_CHANGELOG=fix in a not yet released behavior


config: fix name applying code checking fixed schema version · c3ef0c57

Serge Petrenko authored 1 year ago

The code in names_apply() and associated triggers considered the schema
to be up to date, if its version was >= 3.0.0. This was correct only for
the upcoming 3.0 release, but not for any releases coming after it.

Let's fix this by comparing schema version to mkversion.get_latest()
instead of a fixed 3.0.0 version.
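
The difference between the two checks can be sketched as follows. This is a minimal Python illustration, not the Lua code of the config module: the function names and the 3.1.0 version used in the note below are hypothetical.

```python
def parse_version(text):
    """Convert 'X.Y.Z' into a tuple of integers that compares numerically."""
    return tuple(int(part) for part in text.split("."))


def is_up_to_date_fixed(schema_version):
    # Old check: a hard-coded threshold, correct only for the 3.0 release.
    return parse_version(schema_version) >= (3, 0, 0)


def is_up_to_date(schema_version, latest_version):
    # New check: compare against the latest schema version known to the
    # build (what mkversion.get_latest() is described to return).
    return parse_version(schema_version) >= parse_version(latest_version)
```

With a hypothetical build whose latest schema version is 3.1.0, a 3.0.0 schema passes the old check but not the new one.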

In-scope-of #9502

NO_DOC=bugfix
NO_TEST=hard to test separately, covered by existing tests
NO_CHANGELOG=changes not yet released behavior


mkversion: introduce get_latest method · 11daa099

Serge Petrenko authored 1 year ago

mkversion.get_latest() returns the latest schema version known to this
tarantool build. It is going to be used in config module to determine
when dd operations are allowed.

It will be used in places where `schema_needs_upgrade()` can't be used,
because it compares only the current schema version, not any version you
pass to it.

In-scope-of #9502

NO_DOC=internal
NO_TEST=covered by the next commit
NO_CHANGELOG=internal


config: support conditional sections · 6206f744

Alexander Turenko authored 1 year ago

Fixes #9452

@TarantoolBot document
Title: config: conditional sections for upgrading

See https://github.com/tarantool/tarantool/issues/9452 for the problem
statement. In short: some upgrade scenarios may need to configure
tarantool instances differently depending on a tarantool version.

A new top level configuration block is added for this purpose:
`conditional`. Let's look at an example:

```yaml
conditional:
- if: tarantool_version >= 3.99.0 && tarantool_version < 4.0.0
  # This section shouldn't be validated and shouldn't be applied.
  replication:
    new_option: foo
- if: tarantool_version < 3.99.0
  # This section is to be applied.
  process:
    title: '{{ instance_name }} -- in upgrade'
```

The block contains an array of conditional sections, each accompanied by
an `if` predicate that determines whether to apply the section on a
particular tarantool version. (`if` is required.)

If a section is not to be applied on the given version, it is not
validated at all and may contain unknown options.

If a section is to be applied, it must match the cluster configuration
schema like the main config.

If the same option is set by several sections with true predicate, the
last section wins.

The `if` expression supports one data type: `version`. A value may be
referenced in two ways:

1. Version literal: `1.2.3` (three components, not less, not more).
2. Variable: `tarantool_version` (only this variable is supported).

`tarantool_version` is assumed to be a three-component version, say, 3.0.0.

The operations are the following.

1. Logical OR: `||`
2. Logical AND: `&&`
3. Compare: `>`, `<`, `>=`, `<=`, `==`, `!=`
4. Parentheses: `(`, `)`

All comparisons treat the versions as three-component ones.
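
The way sections are selected and combined can be sketched in Python. This is only an illustration under assumptions: `apply_conditional` and `predicate_holds` are hypothetical names, the predicate evaluation is delegated to a caller-supplied function, and "the last section wins" is interpreted as a per-option (deep) merge. How the result is combined with the main config is not covered here.

```python
def merge(base, override):
    """Recursively merge `override` into a copy of `base`."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def apply_conditional(sections, predicate_holds):
    """Combine the sections whose `if` predicate is true, in order."""
    config = {}
    for section in sections:
        if "if" not in section:
            raise ValueError("a conditional section requires an `if` predicate")
        if not predicate_holds(section["if"]):
            # A skipped section is neither validated nor applied, so it
            # may contain options unknown to this version.
            continue
        body = {key: value for key, value in section.items() if key != "if"}
        config = merge(config, body)  # later sections override earlier ones
    return config
```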


config: add expression.evaluate · 02b84c95

Alexander Turenko authored 1 year ago

Also add expression.eval shortcut to parse, validate and evaluate an
expression against given variables.

This commit completes expression module implementation.
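
A rough idea of the evaluation step, as a Python sketch rather than the module's Lua code. The tuple-based AST shape (`("version", text)`, `("variable", name)`, `("operation", op, left, right)`) is an assumption made for this illustration, and the input is assumed to be already validated.

```python
import operator

COMPARE = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


def to_tuple(version):
    """'3.10.0' -> (3, 10, 0), so that components compare as numbers."""
    return tuple(int(part) for part in version.split("."))


def evaluate(node, variables):
    """Evaluate a validated AST against a {name: 'X.Y.Z'} mapping."""
    kind = node[0]
    if kind == "version":
        return to_tuple(node[1])
    if kind == "variable":
        return to_tuple(variables[node[1]])
    _, op, left, right = node
    if op == "&&":
        return evaluate(left, variables) and evaluate(right, variables)
    if op == "||":
        return evaluate(left, variables) or evaluate(right, variables)
    return COMPARE[op](evaluate(left, variables), evaluate(right, variables))
```

Comparing integer tuples instead of strings matters for versions like 3.10.0, which must be greater than 3.9.0.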

Part of #9452

NO_DOC=It is a supplementary module for config's conditional section
       predicates. To be documented in the last commit of the series.
NO_CHANGELOG=see NO_DOC


config: add expression.validate · 1707ebef

Alexander Turenko authored 1 year ago

It verifies AST invariants:

* comparison operators have to be applied to version literals and
  variables
* logical operators have to be applied to boolean expressions
* the root AST node should be a boolean expression

It also verifies variables:

* variables referenced by the given expression have to be provided
* provided variables have to contain three-component versions
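
The checks listed above can be sketched in Python. The tuple-based AST shape (`("version", text)`, `("variable", name)`, `("operation", op, left, right)`) and the function names are assumptions of this illustration, not the module's actual structures.

```python
import re

COMPARE = {"<", ">", "<=", ">=", "==", "!="}
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def node_type(node, variables):
    """Return 'version' or 'boolean' for a node, raising on a violation."""
    kind = node[0]
    if kind == "version":
        return "version"
    if kind == "variable":
        if node[1] not in variables:
            raise ValueError(f"unknown variable: {node[1]}")
        return "version"
    _, op, left, right = node
    # Comparisons take versions, logical operators take booleans.
    expected = "version" if op in COMPARE else "boolean"
    for operand in (left, right):
        if node_type(operand, variables) != expected:
            raise ValueError(f"operator {op} expects {expected} operands")
    return "boolean"


def validate(ast, variables):
    """Check the provided variables first, then the AST invariants."""
    for name, value in variables.items():
        if not VERSION_RE.fullmatch(value):
            raise ValueError(f"variable {name} is not a three-component version")
    if node_type(ast, variables) != "boolean":
        raise ValueError("the root of the expression must be boolean")
```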

Part of #9452

NO_DOC=It is a supplementary module for config's conditional section
       predicates. To be documented in the last commit of the series.
NO_CHANGELOG=see NO_DOC


config: add version expression parsing · e095bd49

Alexander Turenko authored 1 year ago

A typical expression looks like `tarantool_version >= 3.1.0 &&
tarantool_version < 4.0.0`, but all the comparison operators, logical
and/or, and grouping are supported. See the previous commit for the
list of operators.

The module implements Pratt's expression parsing algorithm.
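
For readers unfamiliar with the technique, here is a compact Python sketch of Pratt parsing (precedence climbing over binding powers) for such expressions. It is not the module's Lua code: the binding powers assume C-like precedence (`||` loosest, then `&&`, then comparisons), and the tuple-based AST is made up for the illustration.

```python
import re

TOKEN_RE = re.compile(
    r"\s*(\d+\.\d+\.\d+|[a-zA-Z_][a-zA-Z0-9_]*|<=|>=|!=|==|&&|\|\||[()<>])")

# Assumed binding powers: a higher number binds tighter.
BINDING_POWER = {"||": 1, "&&": 2,
                 "<": 3, ">": 3, "<=": 3, ">=": 3, "==": 3, "!=": 3}


def parse(text):
    tokens = TOKEN_RE.findall(text)
    # findall() skips what it cannot match, so detect leftovers explicitly.
    if "".join(tokens) != "".join(text.split()):
        raise ValueError("unexpected characters in expression")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_primary():
        token = advance()
        if token is None:
            raise ValueError("unexpected end of expression")
        if token == "(":
            node = parse_expr(0)
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if token in BINDING_POWER or token == ")":
            raise ValueError(f"unexpected token {token!r}")
        kind = "version" if token[0].isdigit() else "variable"
        return (kind, token)

    def parse_expr(min_power):
        left = parse_primary()
        # Consume operators that bind tighter than the caller's operator.
        while peek() in BINDING_POWER and BINDING_POWER[peek()] > min_power:
            op = advance()
            right = parse_expr(BINDING_POWER[op])
            left = ("operation", op, left, right)
        return left

    ast = parse_expr(0)
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return ast
```

The parser only builds the tree: rejecting a chain like `a < b < c` is left to a separate validation step.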

Part of #9452

NO_DOC=It is a supplementary module for config's conditional section
       predicates. To be documented in the last commit of the series.
NO_CHANGELOG=see NO_DOC


config: add version expression lexer · 1821006b

Alexander Turenko authored 1 year ago

The expression syntax is C-like, except for version literals. The
expressions are oriented toward version comparisons, so arithmetic
operators are not implemented: only comparison and logical operators.

Supported tokens:

* Three-component version literal: `\d+\.\d+\.\d+`.
* Variable: `[a-zA-Z_][a-zA-Z0-9_]*`.
* Operators: `(`, `)`, `<`, `>`, `<=`, `>=`, `!=`, `==`, `&&`, `||`.

The operators are token separators: `x>=y` is the same as `x >= y`.
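
The token list maps naturally onto a regular-expression tokenizer. The following Python sketch is an illustration only (the actual lexer is part of the Lua config module); the group names and the `tokenize` function are hypothetical.

```python
import re

# One named group per token kind; two-character operators are listed
# before one-character ones so that `>=` is not split into `>` and `=`.
TOKEN_RE = re.compile(r"""
    (?P<version>\d+\.\d+\.\d+)
  | (?P<variable>[a-zA-Z_][a-zA-Z0-9_]*)
  | (?P<operator><=|>=|!=|==|&&|\|\||[()<>])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Split an expression into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character at position {pos}: {text[pos]!r}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Because operators act as separators, `tokenize("x>=y")` and `tokenize("x >= y")` produce the same token list.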

The idea of how to structure the code is borrowed from @locker's
`src/lua/xml.c`: see commit 5f596b25 ("lua: add internal xml
parser").

Part of #9452

NO_DOC=It is a supplementary module for config's conditional section
       predicates. To be documented in the last commit of the series.
NO_CHANGELOG=see NO_DOC


config: remove config.version option · c3cd225a

Alexander Turenko authored 1 year ago

The idea of `config.version` is to differentiate a configuration that is
written for the given tarantool version (and any unknown field is to be
considered as an error) from a configuration that deliberately contains
options that are unknown on the given tarantool version (to apply them
on newer tarantool versions).

This option was planned as a mandatory one: a user must set it in their
config. OTOH, a user likely copy-pastes some baseline configuration and
adds some options from documentation pages or another project. The user
is unlikely to want to track the supported version range of each option
used in order to deduce the right `config.version` value.

The problem is mostly about conflict of a strict validation (no unknown
options) with schema evolution (ignore options for newer tarantool
versions). See #9452 for possible approaches to solve it.

One of the proposed mechanisms to support schema evolution together
with strict validation is implemented in the next commits of the series.

Part of #9452

@TarantoolBot document
Title: config: `config.version` is removed

Another schema evolution mechanism is implemented in
https://github.com/tarantool/tarantool/issues/9452.


config: better diagnostic for a corrupted snapshot · 29d5eda2

Alexander Turenko authored 1 year ago

If a given snapshot has no instance or replicaset UUID, give a
meaningful diagnostic and fail.

Part of #8862

NO_DOC=No API changes.
NO_TEST=Would be complicated, tested manually.


config: add compat options · 317dc037

Alexander Turenko authored 1 year ago

Fixes #9497

@TarantoolBot document
Title: config: compat options
The new section `compat` is added to the declarative configuration.

The options are the same as ones provided by the `compat` module. Each
can be set to `old` or `new`.

Example:

```yaml
compat:
  json_escape_forward_slash: old
```

The list of currently supported options with their defaults:

```yaml
compat:
  json_escape_forward_slash: new
  yaml_pretty_multiline: new
  fiber_channel_close_mode: new
  box_cfg_replication_sync_timeout: new
  sql_seq_scan_default: new
  fiber_slice_default: new
  box_info_cluster_meaning: new
  binary_data_decoding: new
  box_tuple_new_vararg: new
  box_session_push_deprecation: old
  sql_priv: new
  c_func_iproto_multireturn: new
  box_space_execute_priv: new
  box_tuple_extension: new
  box_space_max: new
...
```

The `box_cfg_replication_sync_timeout` option is non-dynamic and it
can't be changed after a startup.

Technically, all the other options could be changed at runtime (by
changing the configuration file and calling `config:reload()`), but I
would not generally recommend it.

At least the code that handles the `fiber_channel_close_mode` option has
the following comment, see commit de9b9308 ("fiber: add channel close
mode option to compat").

> The behavior is unspecified for already created channels.
> Choose the mode at an early stage of application's
> initialization.


config: support relative path in --config · a9e798c9

Alexander Turenko authored 1 year ago

Before this change, the following conditions led to an error on
`config:reload()`.

* `--config <...>` CLI option is passed with a relative config path
* `process.work_dir` config option is set to a non-null value

It is fixed by calculating an absolute config path on startup.
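
The idea can be illustrated with a short Python sketch. It presumes that the failure came from resolving the relative path against the new working directory after `process.work_dir` was applied, which the commit message does not state explicitly. The function name and the paths are hypothetical, and POSIX-style paths are assumed.

```python
import os


def resolve_config_path(config_path, startup_cwd):
    """Turn a possibly relative --config value into an absolute path.

    The path is anchored to the directory the process was started from,
    so a later change of the working directory no longer affects it.
    """
    # os.path.join() keeps `config_path` as is when it is already absolute.
    return os.path.normpath(os.path.join(startup_cwd, config_path))
```

For example, `resolve_config_path("conf/config.yaml", "/opt/app")` gives `/opt/app/conf/config.yaml`, and a reload can keep using that path regardless of the current working directory.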

Part of #8862

NO_DOC=bugfix


test: support process.work_dir in server helper · b8f85113

Alexander Turenko authored 1 year ago

I'm going to write a test case with non-null `process.work_dir` for the
next commit. Let's support this case in the testing helper.

Part of #8862

NO_DOC=testing helper change
NO_CHANGELOG=see NO_DOC


test: support relative config path in server helper · b49cd5ec

Alexander Turenko authored 1 year ago

We need to test `--config <...>` with a relative path. Let's support it
in the testing helper.

Part of #8862

NO_DOC=testing helper change
NO_CHANGELOG=see NO_DOC


iproto: don't use cord_cancel_and_join for iproto shutdown · 26acba83

Nikolay Shirokovskiy authored 1 year ago

Also we now can free all resources allocated in iproto threads.

Part of #8423

NO_TEST=rely on existing tests
NO_CHANGELOG=internal
NO_DOC=internal


config: introduce sharding role rebalancer · 171fd9c5

Mergen Imeev authored 1 year ago

Part of #8862

@TarantoolBot document
Title: rebalancer sharding role in config

A new role for sharding was introduced - 'rebalancer'. This role can be
present at replicaset scope or above. A new change has also been made
to address this requirement: sharding.roles can now only be present in
replicaset scope or above.

There can be at most one replicaset with the rebalancer role.
Additionally, this replicaset must also have a storage role. If a
replicaset with this role exists, then the vshard rebalancer can only be
present in the replicaset with the rebalancer role.

If the rebalancer role is not specified, a rebalancer is selected
automatically from among the masters of the replicasets.
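
The constraints above can be expressed as a small check. This Python sketch is only an illustration: the function name and the shape of the `replicasets` argument (a mapping from a replicaset name to its list of sharding roles) are assumptions, not the config module's data structures.

```python
def find_rebalancer(replicasets):
    """Return the name of the replicaset with the rebalancer role.

    Returns None when the role is not assigned, in which case the
    rebalancer is chosen automatically.
    """
    names = [name for name, roles in replicasets.items()
             if "rebalancer" in roles]
    if len(names) > 1:
        raise ValueError("at most one replicaset can have the rebalancer role")
    if not names:
        return None
    if "storage" not in replicasets[names[0]]:
        raise ValueError("the rebalancer role requires the storage role")
    return names[0]
```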


config: forbid sharding.roles at instance scope · 18422e7c

Mergen Imeev authored 1 year ago

After this patch, sharding.roles can only be present in scope above
"instance".

Part of #8862

NO_DOC=will be described later


main: hide --failover CLI option · af4bad43

Alexander Turenko authored 1 year ago

The failover agent is not ready at the moment. Let's hide the CLI option
from the help message, but still handle it: it is useful to continue the
development of the agent.

See tarantool/tarantool-ee#564

NO_DOC=The option is not present in the documentation.
NO_CHANGELOG=No behavior changes, just help message.


config: don't show box.cfg table in debug logs · b7941d8c

Alexander Turenko authored 1 year ago

It may contain passwords in the box.cfg.replication option, and the
changed options are printed to the logs anyway. There is not much sense
in printing the full table.

NO_DOC=No API changes.
NO_CHANGELOG=Almost invisible for a user.
NO_TEST=A change in a debug log message, it doesn't seem worth covering
        by a test.


config: introduce options for storage · 79a0eb91

Albert Skalt authored 1 year ago

This patch adds config options for the tarantool config storage,
as this feature will be implemented in EE.

Part of https://github.com/tarantool/tarantool-ee/issues/593

NO_DOC=EE feature supplement
NO_CHANGELOG=EE feature supplement


config: add config source file to the build · befe0dfe

Albert Skalt authored 1 year ago

NO_DOC=supplement change
NO_TEST=supplement change
NO_CHANGELOG=supplement change

Part of https://github.com/tarantool/tarantool-ee/issues/593


Dec 15, 2023

config: support new vshard options · f6d6e4ba

Mergen Imeev authored 1 year ago

This patch introduces support for the vshard 'box_cfg_mode' and
'schema_management_mode' options in the config module.

Part of #8862

NO_DOC=internal
NO_CHANGELOG=internal


box.ctl: add iproto_lockdown option to security · 79ec68dd

Gleb Kashkin authored 1 year ago

The new option is configured by `box.ctl.iproto_lockdown()`. It is
available only in Enterprise Edition builds.

Needed for tarantool/tarantool-ee#585

NO_DOC=Will be added to Enterprise Edition
NO_CHANGELOG=see NO_DOC
NO_TEST=No logic is added to Community Edition, will be tested in
        Enterprise Edition


ci: add optional submodule bump step · 51ed892c

Maksim Kokryashkin authored 1 year ago

Currently, if there is a need to test a submodule's integration with
Tarantool, it is required to create a PR. That is inconvenient, so this
patch introduces the option to run the same jobs that are triggered by
the `full-ci` label as reusable workflows with the desired submodule
revision. This allows for integration testing of submodules within
their designated repositories.

NO_DOC=CI
NO_TEST=CI
NO_CHANGELOG=CI


trigger: fix NULL dereference and memory leak · 101345c9

Andrey Saranchin authored 1 year ago

Using the Lua C API, one can create a callable number: it is callable,
and at the same time lua_topointer returns NULL when such a number is
passed to it. The function luaT_event_reset_trigger_with_flags relies on
the assumption that lua_topointer cannot return NULL when it is called
with a callable object. The assumption is wrong, so let's rewrite the
function without relying on it.

NB: if you are using old trigger API with such exotic handlers, you can
have only one such trigger in an event - it will occupy name '0x0'.

Closes #9287

NO_CHANGELOG=bugfix for unreleased feature
NO_DOC=bugfix


test: fix sending_arbitrary_iproto_packets_test hanging shutdown · 1c737bff

Nikolay Shirokovskiy authored 1 year ago

The test_box_iproto_send_errors test adds an on-disconnect callback
handler that is not cleaned up. Since the handler writes to a channel
with capacity 1 and no one reads from it, every new connection hangs on
disconnect. And we make a new connection for every test in the
before-each hook. As a result, we fail to shut down the test server in
time: iproto shutdown hangs waiting for the connections to disconnect.

Part of #8423

NO_CHANGELOG=internal
NO_DOC=internal


iproto: drop connections on iproto shutdown · 7b45b9ad

Nikolay Shirokovskiy authored 1 year ago

This is the first patch in the series on graceful Tarantool shutdown. We
already have graceful shutdown for client code (through
`box.ctl.on_shutdown()`) or network clients (through the "box.shutdown"
event). This is graceful shutdown of the server itself, when we exit all
threads (instead of abruptly cancelling them) and thoroughly free all
resources.

We need to introduce iproto_drop_connections for
tarantool/tarantool-ee#585 and this is part of graceful iproto shutdown
too. During client graceful iproto shutdown we stop accepting new
connections thus iproto_drop_connections on iproto shutdown will stop
all iproto activities.

On connection drop we stop IO for the connection, cancel all in-progress
requests of the connection and wait until the requests are finished and
the connection is freed.

Relay requests are just long-polling requests of iproto. So on graceful
iproto shutdown the fiber processing relay request is cancelled. This
will cancel join or subscribe cords of relay because cord_cojoin cancels
cord threads.

Part of #8423

NO_TEST=rely on existing tests
NO_CHANGELOG=internal
NO_DOC=internal


coio: make coio write and read a cancellation point · 3f1db5bf

Nikolay Shirokovskiy authored 1 year ago

It is nice to make these functions a cancellation point. Currently, if we
use them in a loop and the fiber is cancelled, then whether the function
checks for cancellation depends on the peer's speed. If the peer is fast
enough, we won't wait and won't check for cancellation, so we would have
to add a cancellation check to the loop itself. Instead, let's add this
check to the functions. We do not change the accept and connect
functions, as such a case is unlikely for them.

Part of #8423

NO_TEST=rely on existing tests
NO_CHANGELOG=internal
NO_DOC=internal


iproto: introduce is_in_replication connection flag · 31079f50

Nikolay Shirokovskiy authored 1 year ago

Connections that are used for replication are handled a bit differently.
For example, during processing of JOIN or SUBSCRIBE requests all IO is
done outside of iproto. Thus we need to drop regular and replication
connections differently. Let's introduce the `is_in_replication` flag for
this purpose. While at it, we can refactor other pieces of code where we
differentiate connections to use this flag.

Part of #8423

NO_TEST=refactoring
NO_CHANGELOG=refactoring
NO_DOC=refactoring


fiber: make cord_cojoin cancellable · 6f8b0b64

Nikolay Shirokovskiy authored 1 year ago

We may need to cancel fiber that waits for cord to finish. For this
purpose let's cancel fiber started by cord_costart inside the cord.

Note that there is a race between stopping cancel_event in cord and
triggering it using ev_async_send in joining thread. AFAIU it is safe.

We also need to fix stopping the wal cord to address the
stack-use-after-return issue shown below. It arises because we did not
stop the async that resides in the wal endpoint, and the endpoint resides
on the stack. Later, when we stop the introduced cancel_event, we access
the not stopped async, which by this moment has gone out of scope.

It is simple, we only need to destroy the wal endpoint. But then we get
an assertion on deleting the endpoint, as endpoints in other cords are
not destroyed, and when we delete the wal endpoint we access rlist links
that reside in the freed memory of other endpoints. So if we want to
clean up properly, we need to stop the vinyl loop properly, which means
reverting the commit e463128e ("vinyl: cancel reader and writer threads
on shutdown"). So this issue needs more attention. Let's postpone it by
temporarily suppressing the ASAN issue.

```
==3224698==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f654b3b0170 at pc 0x555a2817c282 bp 0x7f654ca55b30 sp 0x7f654ca55b28
WRITE of size 4 at 0x7f654b3b0170 thread T3
#0 0x555a2817c281 in ev_async_stop /home/shiny/dev/tarantool/third_party/libev/ev.c:5492:37
#1 0x555a27827738 in cord_thread_func /home/shiny/dev/tarantool/src/lib/core/fiber.c:1990:2
#2 0x7f65574aa9ea in start_thread /usr/src/debug/glibc/glibc/nptl/pthread_create.c:444:8
#3 0x7f655752e7cb in clone3 /usr/src/debug/glibc/glibc/misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
```

Part of #8423

NO_DOC=internal
NO_CHANGELOG=internal


Dec 14, 2023

replication: don't rollback qsync limbo wait on fiber cancel · 7a2bc0bb

Nikolay Shirokovskiy authored 1 year ago

During iproto graceful shutdown, which is WIP, we cancel all iproto
requests in progress. This causes the election_qsync_stress test failure.

We shut down the master while it waits for transaction confirmation from
a quorum (which never exists in this test). Currently, on shutdown we
roll back a transaction in this state. But when the previous master is
restarted after a new master is elected, we don't expect the rollback on
the previous master.

Let's keep the transaction in the limbo if the fiber is cancelled, as our
direction is to do only quorum rollbacks.

Part of #8423
Closes #9480

NO_DOC=bugfix


test: increase timeouts in on_shutdown tests · dfca3c6c

Alexander Turenko authored 1 year ago

I got four fails on the given tests in a row on debug-asan job in CI for
tarantool-ee.

It seems, tarantool-ee is more sensitive to small timeouts, when the
address sanitizer slows down the execution. Or I'm just lucky.

Anyway, the given tests don't really need small timeouts: increasing them
doesn't break any test logic, doesn't increase the duration of a test in
the successful case and doesn't increase it in case of a failure.

The tests are more stable after the change: I verified it locally by
running each of the tests in parallel many times on tarantool built with
enabled address sanitizer.

See the following commits for details about the given test cases and the
problems behind.

* commit 1fcfb8c2 ("app: start init script event loop explicitly")
* commit 786eb2ac ("main: don't break graceful shutdown on init
  script exit")

Follows up #9266
Follows up #9411

NO_DOC=test adjustment
NO_CHANGELOG=see NO_DOC


Dec 13, 2023

box: add integrity check option · ac58289f

Gleb Kashkin authored 1 year ago

The new module can be enabled only in Enterprise Edition builds
via the `--integrity-check` CLI option.

Needed for tarantool/tarantool-ee#585

NO_DOC=will be added to Enterprise Edition
NO_CHANGELOG=see NO_DOC


test: allow user-defined args in luatest_helpers.server · a8cceef2

Gleb Kashkin authored 1 year ago

Before this patch, all user-defined CLI arguments passed to `server:new()`
were ignored. Now the config-specific arguments that used to replace the
user-defined ones are added to the end of the `args` table instead.

Part of tarantool/tarantool-ee#585

NO_DOC=test helper change
NO_CHANGELOG=see NO_DOC
NO_TEST=see NO_DOC


changelog: reword gh-9235 changelog · 97f9c2b1

Serge Petrenko authored 1 year ago


Follow-up #9235

NO_DOC=changelog
NO_TEST=changelog

Co-authored-by: Kseniia Antonova <73473519+xuniq@users.noreply.github.com>


config: add `config.storage` role file to the build · aab00969

Albert Skalt authored 1 year ago

This patch adds the `config.storage` role file to the build.

Part of https://github.com/tarantool/tarantool-ee/issues/593

NO_DOC=supplement change
NO_TEST=supplement change
NO_CHANGELOG=supplement change


Dec 12, 2023

test: bump test-run with luatest update to 1.0.0-3 · 161ca17b

Alexander Turenko authored 1 year ago

This commit updates test-run and the only change in test-run is a bunch
of luatest updates. The list of luatest updates can be found in
tarantool/test-run#415 or below.

- assertions: Improved error message for one assert function [1]
- TAP output: add missing tabulation to artifacts [2]
- utils: add `version_current_ge_than()` [3]
- server: fix unix socket path length check [4]
- server: accept `new_box_uri` as a table [5]

The list excludes changes that are not related to test-run's usage:
documentation, testing of luatest itself, packaging of luatest and so
on.

[1]: tarantool/luatest@2a26c32
[2]: tarantool/luatest@5e8c3e3
[3]: tarantool/luatest@7b6f167
[4]: tarantool/luatest@a8b0389
[5]: tarantool/luatest@f37b353

NO_DOC=testing framework update
NO_CHANGELOG=see NO_DOC
NO_TEST=see NO_DOC


wal: fix failing assertion in box_wait_limbo_acked · 59b817ef

Astronomax authored 1 year ago

Before this patch there was an execution sequence in which the
assertion in box_wait_limbo_acked would fail. The assertion is that
the lsn of the last entry in limbo is always positive after wal_sync.
Fix it.

Closes #9235

NO_DOC=bugfix


Dec 07, 2023

iproto: don't account message twice in case of override fallback · 21112b06

Nikolay Shirokovskiy authored 1 year ago

We need to call `tx_accept_msg` in `tx_process_override` before we pass
the message to the override handler. Unfortunately, if the handler
responds with IPROTO_HANDLER_FALLBACK, we call the builtin handler for
the message, which calls `tx_accept_msg` again, and that is not expected.
Some actions of this function are idempotent and some are not.

Let's make the function a no-op if it is called again.
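
The general pattern (a guard that turns a repeated call into a no-op) can be sketched in Python. This is not the C code of the fix: the `Message` class, its fields and the `accept_msg` function are hypothetical stand-ins, and the real patch may detect the repeated call differently.

```python
class Message:
    """A hypothetical stand-in for an iproto message."""

    def __init__(self):
        self.is_accepted = False
        self.accounted = 0  # how many times the message was accounted


def accept_msg(msg):
    """Account the message once, a repeated call does nothing."""
    if msg.is_accepted:
        return
    msg.is_accepted = True
    msg.accounted += 1  # the non-idempotent part runs only once
```

With such a guard, the fallback path can call the function a second time without accounting the message twice.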

Closes #9345

NO_DOC=bugfix
