Commits · 24cbcbe7c9c41fc908177a1eaefe4026268c6848 · core / tarantool

Oct 26, 2023

fiber: make madvise(2) arguments page aligned with ASAN slab cache · 24cbcbe7

Nikolay Shirokovskiy authored 1 year ago

Normally the fiber stack slab is page aligned, so the upper stack border
is page aligned too when the stack grows down. But with the ASAN-friendly
slab cache implementation this border is not page aligned. As a result,
a madvise call on the stack may zero memory beyond the stack slab, which
causes heap corruption. In a debug build the corruption is detected by
an assertion:

NO_WRAP
 >  Fatal glibc error: malloc.c:2593 (sysmalloc): assertion failed: (old_top
 >  == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >=
 >  MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize
 >  - 1)) == 0)
NO_WRAP

Interestingly enough, the issue cannot be investigated using ASAN. The
memory is zeroed by kernel code, which is not instrumented, so the write
is invisible to the sanitizer.

Looks like non-ASAN builds are not affected. Even if stack_size is
not page aligned the slab allocated for stack is page aligned. Thus
memory zeroing will be inside the slab and there will be no memory
corruption.

Also, when the stack grows up, the lower stack border is not aligned even
with the regular small implementation. So the madvise call will fail with
EINVAL, as the start address is required to be page aligned. We ignore
the error though. Let's fix this issue too while we are at it.

Let's introduce fiber_madvise_aligned to align madvise range with proper
direction before calling madvise(2). To justify its usage note that
besides fixing the issues described above, in case of stack growing down
fiber->stack is page aligned and in case of stack growing up
fiber->stack + fiber->stack_size is page aligned.
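
The alignment idea can be sketched in C as follows. This is only an
illustration and not the actual fiber_madvise_aligned code: the helper
names are hypothetical, and instead of aligning in the direction of stack
growth it conservatively shrinks the range from both ends so that
madvise(2) only ever sees whole pages inside the range.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to a multiple of page_size (must be a power of two). */
static uintptr_t
page_align_down(uintptr_t addr, uintptr_t page_size)
{
	assert((page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/* Round addr up to a multiple of page_size (must be a power of two). */
static uintptr_t
page_align_up(uintptr_t addr, uintptr_t page_size)
{
	return page_align_down(addr + page_size - 1, page_size);
}

/*
 * Hypothetical helper: advise only the whole pages that lie fully
 * inside [begin, end), so the kernel never touches memory outside
 * the range and never sees an unaligned start address.
 */
static int
madvise_inner_pages(void *begin, void *end, int advice)
{
	uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t lo = page_align_up((uintptr_t)begin, page_size);
	uintptr_t hi = page_align_down((uintptr_t)end, page_size);
	if (lo >= hi)
		return 0; /* No whole page inside the range, nothing to do. */
	return madvise((void *)lo, hi - lo, advice);
}
```

With a 4096-byte page, a range starting at 4097 is advised from 8192
onwards, so the partially covered first page is left untouched.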

Part of #7327

NO_TEST=tested by ASAN (debug build)
NO_CHANGELOG=has effect only with newly introduced ASAN friendly slab cache
NO_DOC=has effect only with newly introduced ASAN friendly slab cache

(cherry picked from commit 130c7807)

fiber: don't unpoison fiber stack · 8c1f93bf

Nikolay Shirokovskiy authored 1 year ago

The unpoison was added in the initial commit 1.7.2-68-gafd229393 that
supported ASAN. It is not clear why we need it, as we don't poison
stack memory manually.

Part of #7327

NO_TEST=removing unfunctional code
NO_CHANGELOG=removing unfunctional code
NO_DOC=removing unfunctional code

(cherry picked from commit 0784f7b7)

test: tune tests hitting quota for ASAN · d7bd586a

Nikolay Shirokovskiy authored 1 year ago

The ASAN small object allocator implementation leases quota in a slightly
different pattern when allocating memory. So we may need to allocate more
objects to hit the quota, etc.

Part of #7327

NO_CHANGELOG=test tuning
NO_DOC=test tuning

(cherry picked from commit d456a986)

sql: remove legacy code from vdbesort.c · 96505c61

Mergen Imeev authored 1 year ago

This patch removes some deprecated code. This code had no user-visible
effect, but caused problems when running the test with ASAN enabled.

Closes #8761

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit d63a4bf2)

misc: avoid allocations of size 0 for region · a51e5647

Nikolay Shirokovskiy authored 1 year ago

The regular region implementation supports allocations of size 0 with no
extra effort: it returns a non-NULL pointer in this case. However, the
ASAN-friendly implementation would require special care for this case.
Instead, let's avoid allocations of size 0 for region.

Also use xregion_ macros for allocations. Our current policy is to panic
on OOM on runtime allocations.

Part of tarantool/tarantool#7327

NO_TEST=internal
NO_CHANGELOG=internal
NO_DOC=internal

(cherry picked from commit 8159347d)

misc: get rid of small _xc functions · 601a5802

Nikolay Shirokovskiy authored 1 year ago

Small library currently depends on Tarantool core through 'exception.h'.
This is not the way to go. Let's drop this dependency and instead of
moving _xc functions to Tarantool repo we can just stop using them. Our
current policy is to panic on OOM in case of runtime allocation.
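
The "panic on OOM" policy can be illustrated with a minimal C sketch.
The name alloc_or_panic is hypothetical and the sketch wraps malloc(3)
rather than the small library allocators; it only shows why callers no
longer need an error path or an exception-throwing _xc variant.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: either returns a valid pointer or terminates
 * the process, so the caller never has to check for NULL.
 */
static void *
alloc_or_panic(size_t size)
{
	assert(size > 0);
	void *ptr = malloc(size);
	if (ptr == NULL) {
		fprintf(stderr, "out of memory: failed to allocate %zu bytes\n",
			size);
		abort();
	}
	return ptr;
}
```

A caller simply writes `char *buf = alloc_or_panic(len);` and uses the
result without any failure branch.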

Part of #7327

NO_DOC=<OOM behaviour is not documented>
NO_CHANGELOG=<no OOM expectations>
NO_TEST=<no test harness for checking OOM>

(cherry picked from commit 3fccfc8f)

box: drop debug log on tuple new/delete · e955b447

Nikolay Shirokovskiy authored 1 year ago

They are rather noisy. Also delete debug log on arena creation. These
two make sense only with each other.

Part of #7327

NO_TEST=internal
NO_DOC=internal
NO_CHANGELOG=internal

(cherry picked from commit 0dc37356)

update: panic on OOM · 669daeeb

Nikolay Shirokovskiy authored 1 year ago

Panic if we fail to allocate internal temporary objects on region. We do
not test allocation failures, and they should not normally happen anyway
(see #3534).

Part of #8658

NO_DOC=code cleanup
NO_TEST=code cleanup
NO_CHANGELOG=code cleanup

(cherry picked from commit b1a03a49)

sql: use xregion_*() functions · e9a42b8d

Mergen Imeev authored 1 year ago

This patch replaces region_*() functions with xregion_*() functions.

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 1ba84fe3)

trivia: rework xregion_alloc_* macros · 5f282bc1

Mergen Imeev authored 1 year ago

This patch removes the 'size' argument from macros, as it was only used
to set an error on failure, which is not possible for x* versions. In
addition, both macros now cast the value to the specified type, as is
done in the original macros.
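
The shape of such macros can be sketched as below. The names
XALLOC_OBJECT and XALLOC_ARRAY are hypothetical, and the sketch is backed
by malloc(3) instead of a region, so it is not the actual xregion_alloc_*
definition. It only shows the two points from the message: no 'size'
output argument is needed because failure never returns, and the result
is cast to the requested type.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate or terminate: there is no error to report to the caller. */
static void *
alloc_or_panic(size_t size)
{
	assert(size > 0);
	void *ptr = malloc(size);
	if (ptr == NULL) {
		fprintf(stderr, "out of memory: failed to allocate %zu bytes\n",
			size);
		abort();
	}
	return ptr;
}

/* Hypothetical typed macros: the result already has type T *. */
#define XALLOC_OBJECT(T) ((T *)alloc_or_panic(sizeof(T)))
#define XALLOC_ARRAY(T, count) ((T *)alloc_or_panic(sizeof(T) * (count)))
```

For example, `int *values = XALLOC_ARRAY(int, 4);` needs neither a cast
nor a size variable at the call site.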

Closes #8522

NO_DOC=internal
NO_TEST=internal
NO_CHANGELOG=internal

(cherry picked from commit ae02f0cd)

sql: fix memory leaks · 6627755f

Mergen Imeev authored 1 year ago

This patch fixes SQL memory leaks found by static analyzers and SQL
fuzzer.

Part of tarantool/security#120

NO_DOC=fix for memleak
NO_TEST=fix for memleak
NO_CHANGELOG=fix for memleak

(cherry picked from commit cd173ce5)

mpstream: get rid of mpstream_reset · b487e0d7

Nikolay Shirokovskiy authored 1 year ago

Proposed ASAN implementation of region allocator does not support double
reservation for the sake of simplicity. Every reservation is supposed to
be followed by one or more allocations.

This restriction does not work well with mpstream currently. The issue is
that mpstream_init/mpstream_reserve do a reservation of size 0. For
example, in the case of region, a slab of min order is reserved (a chunk
of memory of page size currently). If the first data we want to write to
mpstream is larger than the reservation done, then we make a reservation
again.

Let's get rid of this reservation at the beginning, as it is suboptimal
behaviour. Moreover, let's get rid of mpstream_reset, as mpstream_init
is lightweight and we can create a new mpstream instead of reusing an
existing one.

Also, while we are at it, avoid allocation of size 0 in mpstream_flush, as
is done in mpstream_reserve_slow (see 3.0.0-alpha3-19-g8159347d0 "misc:
avoid allocations of size 0 for region" for details).

NO_TEST=internal
NO_CHANGELOG=internal
NO_DOC=internal

(cherry picked from commit 3b1de78d)

lua: provide tarantool build info before loading lua modules · 248d23b0

Nikolay Shirokovskiy authored 1 year ago

This way we will have access to build info in those modules. In
particular, the build.asan flag is going to be used in buffer.lua in the
scope of #7327.

Part of #7327

NO_TEST=internal
NO_DOC=internal
NO_CHANGELOG=internal

(cherry picked from commit f58cc96f)

lua: provide whether ASAN build in tarantool.build.asan · 33c63d72

Nikolay Shirokovskiy authored 1 year ago

We already use this info in one of the tests and are going to use it more.

Part of #7327

@TarantoolBot document
Title: new tarantool.build.asan flag

It is `true` if `ENABLE_ASAN` build option is set and `false` otherwise.

(cherry picked from commit 23012356)

lua: move check param helpers to internal.utils · 96ac7b91

Vladimir Davydov authored 1 year ago

The check_param and check_param_table Lua helpers are defined in
box/lua/schema.lua but used across the whole code base. The problem is
we can't use them in files that are loaded before box/lua/schema.lua,
like box/lua/session.lua. Let's move them to a separate source file
lua/utils.lua to overcome this limitation. Also, let's add some tests.

NO_DOC=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit d8d267c5)

test: add WA for #3807 to wal_off/oom.test · a54ef00a

Nikolay Shirokovskiy authored 1 year ago

We hit #3807 in release/2.11 for release ASAN build with ASAN-friendly
small allocators.

Follow-up #7327

NO_CHANGELOG=internal
NO_DOC=internal

(cherry picked from commit 3fbd7fcb)

Oct 24, 2023

log: make log.cfg{modules=...} work as box.cfg{log_modules=...} · 9c0dcd7d

Vladimir Davydov authored 1 year ago

Configuring log modules works differently with log.cfg and box.cfg:
box.cfg{log_modules=...} overwrites the current config completely, while
log.cfg{modules=...} overwrites the current config only for the
specified modules. Let's fix this inconsistency by making log.cfg behave
exactly as box.cfg.

Closes #7962

NO_DOC=bug fix

(cherry picked from commit c13e59a5)

Oct 20, 2023

fiber: use alternative signal stack · a4efd470

Vladimir Davydov authored 1 year ago

We install a signal handler that prints the stack trace on SIGSEGV,
SIGBUS, SIGILL, SIGFPE. The signal handler uses the current stack.
This works fine for most issues, but not for stack overflow, because
the latter makes the current stack unusable, leading to a crash in
the signal handler. Let's install an alternative signal stack in each
thread so that we can print the stack trace on stack overflow.

Note that we skip this for ASAN because it installs its own signal
stack. (Installing a custom stack would result in a crash.)
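
A minimal sketch of the technique, using only the POSIX sigaltstack(2)
and sigaction(2) calls, is shown below. The function and constant names
are hypothetical, the handler prints a fixed message instead of a stack
trace, and the stack size is an arbitrary assumption; this is not the
code from the patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed size of the alternate stack; the real value may differ. */
#define ALT_STACK_SIZE ((size_t)64 * 1024)

/* One alternate stack per thread that calls the install function. */
static _Thread_local char alt_stack[ALT_STACK_SIZE];

static void
crash_handler(int signo)
{
	static const char msg[] = "fatal signal caught on alternate stack\n";
	(void)signo;
	/* write(2) is async-signal-safe, unlike printf(3). */
	(void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
	_exit(EXIT_FAILURE);
}

/* Returns 0 on success, -1 on failure. Call once in every thread. */
static int
install_alt_signal_stack(void)
{
	stack_t ss;
	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = alt_stack;
	ss.ss_size = ALT_STACK_SIZE;
	if (sigaltstack(&ss, NULL) != 0)
		return -1;

	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = crash_handler;
	sigemptyset(&sa.sa_mask);
	/* SA_ONSTACK makes the handler run on the alternate stack. */
	sa.sa_flags = SA_ONSTACK;
	const int signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGFPE};
	for (size_t i = 0; i < sizeof(signals) / sizeof(signals[0]); i++) {
		if (sigaction(signals[i], &sa, NULL) != 0)
			return -1;
	}
	return 0;
}
```

Because the handler no longer depends on the overflowed stack, it can
still run and report the failure after a stack overflow.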

Closes #9222

NO_DOC=bug fix

(cherry picked from commit cb8e903b)

Oct 17, 2023

app: start init script event loop explicitly · e72eaa8a

Nikolay Shirokovskiy authored 1 year ago

The motivation is to reduce time slip on Tarantool startup before
running init scripts. Internal ev time is set in fiber_init/ev_default_loop
and is not updated until the event loop starts. This causes timeouts to
slip by up to 0.3 in the debug ASAN build in the init script (see #9261).

Let's run event loop right at the beginning of the run_script_f before
executing any script. This way besides updating internal ev time we make
an explicit place of starting script event loop. Currently it is started
lazily when config script yields.

This will fix CI for PR https://github.com/tarantool/tarantool-ee/pull/572
for debug ASAN workflow.

We can also remove the start_loop condition. It does not make sense now.
It was added in the commit 3a851430 ("Fix tarantool -e "os.exit()"
hang"), but since then we have started stopping the event loop after
handling os.exit().

Also, this fixes #9266. The issue is that we don't have an event loop to
run on_shutdown triggers if the -e command line expression adds such a
trigger and then calls os.exit().

Follow-up #7327
Closes #9266

NO_DOC=bugfix

(cherry picked from commit 1fcfb8c2)

tarantoolctl: fix luarocks warnings issue · dc5edaa4

Pavel Balaev authored 1 year ago

This patch fixes the following issue:

$ tarantoolctl rocks --version 1>/dev/null
Warning: failed to load command module luarocks.cmd.help

NO_DOC=bugfix
NO_CHANGELOG=not released yet

(cherry picked from commit d6ae403e)

Oct 16, 2023

console: forward original URI to net.box when connecting over IPROTO · 6bb09cec

Vladimir Davydov authored 1 year ago

Tarantool supports two console protocols: text and binary. The binary
protocol is implemented with IPROTO EVAL request so the console module
reuses the net.box module to establish and maintain a binary connection.
Currently, instead of passing the original URI specified by the user to
net.box.connect as is, the console module parses the URI and passes the
host and port. As a result, extra information that may be specified in
URI parameters is lost. This prevents the user from connecting to the
binary console using the SSL transport because to use the SSL transport
the user must specify transport=ssl URI parameter.

Needed for tarantool/tarantool-ee#567

NO_DOC=no visible changes in CE
NO_TEST=no visible changes in CE
NO_CHANGELOG=no visible changes in CE

(cherry picked from commit 33e72567)

Oct 13, 2023

box: fix space:bsize() handling on space alter · 1babcf1e

Ilya Verbin authored 1 year ago

While an index is being built in the background, a transaction can
perform a DML request that affects the space size (e.g. a replace), but
the size will remain the same, because bsize is moved from the old space
to the new space in memtx_space_prepare_alter() prior to
space_execute_dml(). Fix this issue by calling space_finish_alter() in
alter_space_do().

In fact, this patch partially reverts commit 9ec3b1a4 ("alter: zap
space_vtab::commit_alter").

NO_DOC=bugfix

Closes #9247

(cherry picked from commit 54a42186)

Oct 12, 2023

test: update gh_8083, gh_8445 and gh_7434 tests · 5e147360

Oleg Chaplashkin authored 1 year ago

These tests fail after the commit [1] has been added to the Luatest:

- app-luatest/gh_8083_fatal_signal_handler_test.lua
- app-luatest/gh_8445_crash_during_crash_report_test.lua
- box-luatest/gh_7434_yield_in_on_shutdown_trigger_test.lua

The issue is due to lack of necessary directories:

    sh: 1: cd: can't cd to /tmp/t/001_app-luatest/server-XXX

Just update the tests to use the simple `fio` module instead of
`luatest.server`.

[1] tarantool/luatest@7d1358c

NO_CHANGELOG=internal
NO_DOC=internal

(cherry picked from commit 23b61351)

test: bump test-run to new version · d6f47f7d

Oleg Chaplashkin authored 1 year ago

Bump test-run to new version with the following improvements:

- luatest: bump luatest to 0.5.7-48-g18859f6 [1]
- Adapt use luatest with new --no-clean option [2]
- luatest: bump luatest to 0.5.7-49-g9c7710e [3]

[1] tarantool/test-run@aa3b34d
[2] tarantool/test-run@8ebb3aa
[3] tarantool/test-run@82542d3

NO_DOC=test
NO_TEST=test
NO_CHANGELOG=test

(cherry picked from commit f4bc53e8)

Oct 11, 2023

unit: fix undefined behaviour in prbuf test · 31716bbd

Nikolay Shirokovskiy authored 1 year ago

The test started to fail in CI in the osx_debug (x86_64) workflow:

```
[033]  	*** test_buffer_foreach_copy_number ***
[033] -ok 13 - prbuf(size=256, payload=16, iterations=16) has been validated
[033] -ok 14 - prbuf(size=256, payload=16, iterations=32) has been validated
[033] -ok 15 - prbuf(size=256, payload=16, iterations=64) has been validated
[033] +ok 13 - prbuf(size=256, payload=4294967312, iterations=16) has been validated
[033] +ok 14 - prbuf(size=256, payload=4294967312, iterations=32) has been validated
[033] +ok 15 - prbuf(size=256, payload=4294967312, iterations=64) has been validated
[033]  	*** test_buffer_foreach_copy_number: done ***
```

NO_CHANGELOG=test fix
NO_DOC=test fix

(cherry picked from commit 4a868563)

Oct 10, 2023

sql: assign collation to indexes in CREATE TABLE · b215f125

Mergen Imeev authored 1 year ago

Before this patch, if an index was created due to a column's UNIQUE
constraint or a column's PRIMARY KEY constraint before adding a
collation, and if the column's fieldno was not equal to the index's
position in space->index, the collation would not be assigned to the
index.

Also, this patch fixes an assertion in debug build for the case when an
index with more than one field was created before a collation was added.

Closes #9229

NO_DOC=bugfix

(cherry picked from commit 65608d87)

ci: add debug_asan_clang workflow · 7b67f9be

Nikolay Shirokovskiy authored 1 year ago

Similarly to release_asan_clang but to test debug build. It is also run
only under `asan-ci` and `full-ci` labels.

Fiber stack size is 2 times bigger than in the release workflow for luajit
tests to pass. Note that this factor is a wild guess.

Part of #7327

NO_TEST=ci
NO_CHANGELOG=ci
NO_DOC=ci

(cherry picked from commit 980ad3f4)

vinyl: purge cache at exit for ASAN · 0426cc7b

Vladimir Davydov authored 1 year ago

Required to suppress the ASAN leak detector.

Closes #9158

NO_DOC=ASAN
NO_TEST=ASAN
NO_CHANGELOG=ASAN

(cherry picked from commit bf62170f)

test: fix flaky gh-2717-no-quit-sigint · 23dd75fa

Nikolay Shirokovskiy authored 1 year ago

This test is quite flaky in the debug ASAN build. Let's fix it before
turning debug ASAN on in CI.

The issue is that under heavy load popen.read may return nil with a
'TimedOut: timed out' error. Just read again, as in the other cases of
this test.

Part of #7327

NO_CHANGELOG=internal
NO_DOC=internal

(cherry picked from commit 6f48b8d7)

asan: temporarily suppress leak reports related to luajit · 679e568f

Nikolay Shirokovskiy authored 1 year ago

This currently blocks us from turning on debug ASAN CI. The ticket for
the leak is #9213.

Part of #7327

NO_TEST=internal
NO_CHANGELOG=internal
NO_DOC=internal

(cherry picked from commit 37d0fdbf)

Oct 09, 2023

box: fix force recovery for transactions with local rows · 4643a26a

Serge Petrenko authored 1 year ago

Force recovery first tries to collect all rows of a transaction into a
single list, and only then applies those rows.

The problem was that it collected rows based on the row replica_id. For
local rows replica_id is set to 0, but actually such rows can be part
of a transaction coming from any instance.

Fix recovery of such rows.

Follow-up #8746
Follow-up #7932

NO_DOC=bugfix
NO_CHANGELOG=the broken behaviour couldn't be seen due to bug #8746

(cherry picked from commit 85df1c96)

box: get rid of dummy NOPs after transactions ending with local rows · 9bde48a8

Serge Petrenko authored 1 year ago

In order to preserve transaction boundaries over replication, Tarantool
writes a global NOP row after the last transaction row, if this row
happens to be local. This is done to make sure that the is_commit flag,
which is set only in the last transaction row, reaches the replica. This
wouldn't happen if the last row was local.

This workaround works fine for transactions completely authored by one
instance: when both global and local rows come from operations of a
single master.

However, it's possible to append local rows to a remote master's
transaction on a replica. For example, one can use on_replace triggers
to write to replica's local space on each new transaction coming from
master.

In this case essentially a global NOP entry is added at the end of a
remote master's transaction. This leads to several problems.

First of all, this bumps replica's LSN, which is counter-intuitive,
given that the replica might even be read-only. Besides, in a star
topology this leads to master being unable to connect to the replica
later on due to their vclocks becoming incompatible.

Secondly, even if replication channel between master and replica is
bidirectional, it creates a new row which should be replicated from
replica to master, but at the same time is the last row of the master's
transaction. Once master receives this row, it breaks its connection to
replica due to transaction boundary violation (the last row of the
transaction is received without its beginning).

Adding a NOP row became extraneous since the previous commit, which made
relay find transaction boundaries by itself.

Closes #8958

NO_DOC=bugfix

(cherry picked from commit f5e52b2c)

relay: send rows transactionally · 8f2e2be9

Serge Petrenko authored 1 year ago

Some time ago we started writing transaction boundaries to WAL and
respecting them in the replication stream: replicas wait for a full
transaction receipt before applying it.

However, during all these changes relay remained transaction-agnostic:
it simply read single rows from WAL and sent them over to the receiver.

This led to a handful of ugly crutches: for example, tsn is not always
equal to the lsn of the first global row of the transaction: if the
first row is local, tsn is deduced from the first global row of the
transaction.

Also, a dummy NOP was appended to the end of a transaction ending with a
local row, so that the is_commit flag wasn't lost by the replication.

Let's make relay read a full transaction, filter out all the unnecessary
rows, set the transaction boundaries accordingly and then send the
transaction at once.
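
The filtering step can be sketched roughly as follows. The row structure
and function name are hypothetical and heavily simplified: the real relay
works with xrow headers and filters on more than the local flag. The
sketch only shows the idea of dropping unneeded rows first and
recomputing tsn and is_commit on what remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a WAL row. */
struct row_sketch {
	int64_t lsn;
	int64_t tsn;
	bool is_local;
	bool is_commit;
};

/*
 * Drop local rows in place, then set the transaction boundaries on
 * the remaining rows: tsn is the lsn of the first row that is sent,
 * and only the last row that is sent carries is_commit.
 * Returns the number of rows to send.
 */
static size_t
prepare_tx_for_send(struct row_sketch *rows, size_t count)
{
	assert(rows != NULL || count == 0);
	size_t kept = 0;
	for (size_t i = 0; i < count; i++) {
		if (rows[i].is_local)
			continue;
		rows[kept++] = rows[i];
	}
	for (size_t i = 0; i < kept; i++) {
		rows[i].tsn = rows[0].lsn;
		rows[i].is_commit = (i == kept - 1);
	}
	return kept;
}
```

Since the boundaries are computed after filtering, a transaction that
starts or ends with a local row needs neither a deduced tsn nor a
trailing dummy NOP.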

Since in relay a single fiber sends data to the remote peer, there is no
chance for a heartbeat to get in between rows of a single transaction:
they're all sent at once. Hence the deletion of a corresponding guard
`relay->is_sending_tx`.

Prerequisite #8958

NO_DOC=internal change
NO_CHANGELOG=internal change
NO_TEST=covered by existing tests

(cherry picked from commit f96782b5)

wal: fix transaction boundaries for replicated transactions · c8594fbd

Serge Petrenko authored 1 year ago

Transaction boundaries were not updated correctly for transactions in
which local space writes were made from a replication trigger. Existing
transaction boundaries and row flags from the master were written as is
on the replica. Actually, the replica should recalculate transaction
boundaries and even WAIT_SYNC/WAIT_ACK flags.

Transaction boundaries should be recalculated when a replica appends a
local write at the end of the master's transaction, and
WAIT_SYNC/WAIT_ACK should be overwritten when nopifying synchronous
transactions coming from an old term.

The latter fix has uncovered the bug in skipping outdated synchronous
transactions: if one replica replaces a transaction from an old term
with NOPs and then passes that transaction to the other replica, the
other replica raises a split brain error. It believes the NOPs are an
async transaction from an old term. This worked before the fix, because
the rows were written with the original WAIT_ACK = true bit. Now this
is fixed properly: we allow fully NOP async transactions from the old
term.

Closes #8746

NO_DOC=bugfix
NO_CHANGELOG=covered by the next commit

(cherry picked from commit 099cb2da)

Oct 05, 2023

sql: fix memory leak in SQL parser for column default rule · e3f175ff

Nikolay Shirokovskiy authored 1 year ago

If a non-terminal symbol is referenced in C code, then the destructor for
the expression is not called. Thus we don't need to duplicate the
expression; otherwise we get a memory leak.

See https://www.sqlite.org/cgi/src/doc/trunk/doc/lemon.html#destructor

Close #9159

NO_DOC=bugfix
NO_TEST=tested by debug ASAN CI (to be turned on)

(cherry picked from commit 36ef3fb4)

Oct 03, 2023

ci: run asan workflow on 'asan-ci' label · 17b3ecd5

Nikolay Shirokovskiy authored 1 year ago

It is convenient to have a label to run ASAN CI without running full CI.

NO_DOC=ci
NO_TEST=ci
NO_CHANGELOG=ci

(cherry picked from commit c0025ffb)

ci: run performance tests · 29211065

Sergey Bronnikov authored 1 year ago

Performance tests added to the perf directory are not automated, and
currently we run these tests manually from time to time. On the other
hand, source code that is rarely used can suffer from software rot [1].

The patch adds CMake target "test-perf" and GitHub workflow, that runs
these tests in CI. Workflow is based on workflow release.yml, it builds
performance tests and runs them.

1. https://en.wikipedia.org/wiki/Software_rot

NO_CHANGELOG=testing
NO_DOC=testing
NO_TEST=testing

(cherry picked from commit 5edcb712)

cmake: build performance tests only with release build · 4ff80886

Sergey Bronnikov authored 1 year ago

Note that targets for running performance tests are generated only when
CMAKE_BUILD_TYPE is equal to Release or RelWithDebInfo. Additionally, C++
performance tests require the Google Benchmark library. Using a non-debug
build and having the Google Benchmark library installed is a rare case,
so I suppose we don't need to introduce a CMake option for performance
testing.

NO_CHANGELOG=testing
NO_DOC=testing
NO_TEST=testing infrastructure

(cherry picked from commit a63d291b)

perf: add targets for running C performance tests · 4771c1b8

Sergey Bronnikov authored 1 year ago

The patch adds a target for each C performance test in the perf/
directory and a separate target "test-c-perf" that runs all C performance
tests at once.

NO_CHANGELOG=testing
NO_DOC=testing
NO_TEST=test infrastructure

(cherry picked from commit 68623381)

perf: add targets for running Lua performance tests · 81b624fb

Sergey Bronnikov authored 1 year ago

The patch adds a target for each Lua performance test in the perf/lua/
directory (1mops_write_perftest, box_select_perftest,
uri_escape_unescape_perftest) and a separate target "test-lua-perf" that
runs all Lua performance tests at once.

NO_CHANGELOG=testing
NO_DOC=testing
NO_TEST=test infrastructure

(cherry picked from commit 49d9a874)
