- Jul 08, 2022
-
-
Vladimir Davydov authored
The gh_6565 test doesn't stop the hot standby replica it started, because the replica should fail to initialize and exit eventually anyway. However, if the replica lingers until the next test due to https://github.com/tarantool/test-run/issues/345, the next test may successfully connect to it, which is likely to lead to a failure, because UNIX socket paths used by luatest servers are not randomized. For example, here the gh_6568 test fails after gh_6565 because it uses the same alias ('replica') for the test instance:

NO_WRAP
[008] vinyl-luatest/gh_6565_hot_standby_unsupported_> [ pass ]
[008] vinyl-luatest/gh_6568_replica_initial_join_rem> [ fail ]
[008] Test failed! Output from reject file /tmp/t/rejects/vinyl-luatest/gh_6568_replica_initial_join_removal_of_compacted_run_files.reject:
[008] TAP version 13
[008] 1..1
[008] # Started on Fri Jul 8 15:30:47 2022
[008] # Starting group: gh-6568-replica-initial-join-removal-of-compacted-run-files
[008] not ok 1 gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
[008] # builtin/fio.lua:242: fio.pathjoin(): undefined path part 1
[008] # stack traceback:
[008] # builtin/fio.lua:242: in function 'pathjoin'
[008] # ...ica_initial_join_removal_of_compacted_run_files_test.lua:43: in function 'gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup'
[008] # ...
[008] # [C]: in function 'xpcall'
[008] replica | 2022-07-08 15:30:48.311 [832856] main/103/default.lua F> can't initialize storage: unlink, called on fd 30, aka unix/:(socket), peer of unix/:(socket): Address already in use
[008] # Ran 1 tests in 0.722 seconds, 0 succeeded, 1 errored
NO_WRAP

Let's fix this by explicitly killing the hot standby replica. Since it could have exited voluntarily, we need to use pcall, because server.stop fails if the instance is already dead. This issue is similar to the one fixed by commit 85040161 ("test: stop server started by vinyl-luatest/update_optimize test"). NO_DOC=test NO_CHANGELOG=test
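A minimal sketch of the fix, assuming the usual luatest group hooks (the group name and hook choice are illustrative, not taken from the patch):

```lua
local t = require('luatest')
local g = t.group('gh-6565-hot-standby-unsupported')

g.after_each(function(cg)
    -- The replica may have already exited on its own, and server.stop()
    -- raises for a dead instance, so wrap the call in pcall.
    if cg.replica ~= nil then
        pcall(cg.replica.stop, cg.replica)
    end
end)
```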
-
Nikolay Shirokovskiy authored
Handle a status line like 'HTTP/2 200', where the HTTP version has no dot. Closes #7319 NO_DOC=bugfix
-
Nikolay Shirokovskiy authored
We use the LuaJIT 'bit' module for bitwise operations. For platform interoperability it truncates its arguments to 32 bits and returns a signed result. Thus, when granting rights with bit.bor to the admin user, which has 0xffffffff rights (from the bootstrap snapshot), we get -1 as the result. This later leads to the type check error reported in the issue. Closes #7226 NO_DOC=minor bugfix
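The truncation is easy to observe directly (a minimal illustration):

```lua
local bit = require('bit')
-- LuaJIT bitwise operations work on 32-bit signed integers, so OR-ing
-- a full 32-bit mask yields -1 rather than 4294967295:
print(bit.bor(0xffffffff, 0)) -- -1
```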
-
Vladimir Davydov authored
Let's hide all the logic regarding delayed freeing of memtx tuples in MemtxAllocator and provide memtx_engine with methods for allocating and freeing tuples (not memtx_tuples, just generic tuples). All tuple and snapshot version manipulation is now done entirely in MemtxAllocator. This is a preparation for implementing a general-purpose tuple read view API in MemtxAllocator, see #7364. Note: since memtx_engine now deals with the size of a regular tuple, which is 4 bytes less than the size of a memtx_tuple, this changes the size reported by OOM messages and the meaning of memtx_max_tuple_size, which now limits the size of a tuple, not a memtx_tuple. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Mergen Imeev authored
This patch fixes a bug where the ANY field type was replaced by the SCALAR field type in the ephemeral space used in ORDER BY. Closes #7345 NO_DOC=bugfix
-
Mergen Imeev authored
After this patch, the result type of arithmetic between two unsigned values will be INTEGER. Closes #7295 NO_DOC=bugfix
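A hedged illustration via box.execute() metadata (the query itself is an assumption, not taken from the patch):

```lua
-- The result type reported for arithmetic on two UNSIGNED operands is
-- now INTEGER, so a negative result (e.g. from subtraction) no longer
-- contradicts the inferred type.
local res = box.execute([[SELECT CAST(1 AS UNSIGNED) + CAST(2 AS UNSIGNED);]])
print(res.metadata[1].type) -- integer
```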
-
- Jul 07, 2022
-
-
Ilya Verbin authored
This makes the test more realistic and avoids having to deal, in the child process, with memory that was allocated before the fork. Closes #7370 NO_DOC=test fix NO_CHANGELOG=test fix
-
Igor Munkin authored
Since "x64/LJ_GC64: Fix fallback case of asm_fuseloadk64()." (42853793ec3e6e36bc0f7dff9d483d64ba0d8d28) is backported into tarantool/luajit trunk, box/bitset.test.lua and box/function1.test.lua tests are no more fragile. Follows up tarantool/tarantool-qa#234 Follows up tarantool/tarantool-qa#235 NO_DOC=test changes NO_CHANGELOG=test changes NO_TEST=test changes
-
- Jul 06, 2022
-
-
Yaroslav Lobankov authored
This patch fixes `box-py/args.test.py` test and allows it to work against tarantool installed from a package. Closes tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Yaroslav Lobankov authored
This patch fixes `app-tap/tarantoolctl.test.lua` test and allows it to work against tarantool installed from a package. Part of tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Yaroslav Lobankov authored
This patch fixes `gh-1700-abort-recording-on-fiber-switch.test.lua` test and allows it to work against tarantool installed from a package. Part of tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Yaroslav Lobankov authored
This patch adds new `make` test targets to run unit and functional tests independently of each other, which can be useful in some cases. New test targets:
* `test-unit` - run unit tests and exit after the first failure
* `test-unit-force` - run unit tests without stopping on failure
* `test-func` - run functional tests and exit after the first failure
* `test-func-force` - run functional tests without stopping on failure
Note that tests for the 'small' lib are considered unit tests as well. Part of tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Nikolay Shirokovskiy authored
If the readline 'show-mode-in-prompt' option is on, the test fails because it does not handle the prefix added to the prompt in this mode. Let's use the default (compiled-in) readline configuration instead of the one provided by the user or system config. NO_DOC=test changes NO_CHANGELOG=test changes NO_TEST=test changes
-
Georgiy Lebedev authored
The current implementation of tracking statements that delete a story has a flaw. Consider the following example:

tx1('box.space.s:replace{0, 0}') -- statement 1
tx2('box.space.s:replace{0, 1}') -- statement 2
tx2('box.space.s:delete{0}')     -- statement 3
tx2('box.space.s:replace{0, 2}') -- statement 4

When statement 1 is prepared, both statements 2 and 4 will be linked to the delete statement list of {0, 0}'s story, though, apparently, statement 4 does not delete {0, 0}. Let us notice the following: statement 4 is "pure" in the sense that, in the transaction's scope, it is guaranteed not to replace any tuple. We can retrieve this information when we check whether the insert statement violates replacement rules, use it to determine "pure" insert statements, and skip them later on when, during preparation of insert statements, we handle other insert statements which assume they do not replace anything (i.e., have no visible old tuple). On the contrary, statements 1 and 2 are "dirty": they assume that they replaced nothing (i.e., there was no visible tuple in the index). When one of them gets prepared, the other one needs to be either aborted or relinked to replace the prepared tuple. We also need to fix relinking of delete statements from the older story (in terms of the history chain) to the new one during preparation of insert statements: a statement needs to be relinked iff it comes from a different transaction (to be precise, there must be no more than one delete statement from the same transaction). Additionally, add assertions to verify the invariant that the story's add (delete) psn is equal to the psn of the add (delete) statement's transaction. Closes #7214 Closes #7217 NO_DOC=bugfix
-
- Jul 05, 2022
-
-
Vladimir Davydov authored
Normally, if a server created by a test isn't stopped, it should be forcefully killed by luatest or test-run. For some reason this sometimes doesn't happen, which may lead to the next test failing to bind, because all test servers that belong to the same luatest suite and have the same alias share the same socket path (although they use different directories). This looks like a test-run or luatest bug. The vinyl-luatest/update_optimize test doesn't stop the test server, so because of this test-run/luatest bug, the next vinyl-luatest test occasionally fails:

NO_WRAP
[001] vinyl-luatest/update_optimize_test.lua [ pass ]
[001] vinyl-luatest/gh_6568_replica_initial_join_rem> [ fail ]
[001] Test failed! Output from reject file /tmp/t/rejects/vinyl-luatest/gh_6568_replica_initial_join_removal_of_compacted_run_files.reject:
[001] TAP version 13
[001] 1..1
[001] # Started on Tue Jul 5 13:30:37 2022
[001] # Starting group: gh-6568-replica-initial-join-removal-of-compacted-run-files
[001] master | 2022-07-05 13:30:37.530 [189564] main/103/default.lua F> can't initialize storage: unlink, called on fd 25, aka unix/:(socket), peer of unix/:(socket): Address already in use
[001] ok 1 gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
[001] not ok 1 gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
[001] # Failure in after_all hook: /home/vlad/.rocks/share/tarantool/luatest/process.lua:100: kill failed: 256
[001] # stack traceback:
[001] # .../src/tarantool/tarantool/test/luatest_helpers/server.lua:206: in function 'stop'
[001] # ...src/tarantool/tarantool/test/luatest_helpers/cluster.lua:44: in function 'drop'
[001] # ...ica_initial_join_removal_of_compacted_run_files_test.lua:34: in function <...ica_initial_join_removal_of_compacted_run_files_test.lua:33>
[001] # ...
[001] # [C]: in function 'xpcall'
[001] # Ran 1 tests in 1.682 seconds, 0 succeeded, 1 errored
NO_WRAP

Let's fix this by stopping the test server started by the vinyl-luatest/update_optimize test. NO_DOC=test NO_CHANGELOG=test
-
Vladimir Davydov authored
The idea behind the new test is the same as the one used by vinyl/select_consistency.test.lua: create a space with a few compound secondary indexes that share the first part, then run SELECT requests under heavy write load and check that results match. However, in comparison to its predecessor, the new test has a few improvements:
1. It generates DML requests in multi-statement transactions.
2. It checks non-unique indexes.
3. It checks multikey indexes.
4. It triggers L0 dumps not by box.snapshot, but by exceeding the box.cfg.vinyl_memory limit.
5. It starts 20 write and 5 read fibers.
6. It reruns the test after restart to check that recovery works fine.
7. It checks that there are no phantom statements stored in the space indexes after the test.
8. It runs the test with deferred DELETEs enabled and disabled (see box.cfg.vinyl_defer_deletes).
9. It is written in luatest.
The test takes about 20 seconds to finish, so it's marked as long run. Closes #4251 NO_DOC=test NO_CHANGELOG=test
-
Ilya Verbin authored
Currently it throws an error when it encounters binary data; print a <binary> tag instead. Closes #7040 NO_DOC=bugfix
-
- Jul 04, 2022
-
-
Serge Petrenko authored
Our txn_limbo_is_replica_outdated check works correctly only when there is a stream of PROMOTE requests. Only the author of the latest PROMOTE is writable and may issue transactions, no matter synchronous or asynchronous. So txn_limbo_is_replica_outdated assumes that everyone but the node with the greatest PROMOTE/DEMOTE term is outdated. This isn't true for DEMOTE requests. There is only one server which issues the DEMOTE request, but once it's written, it's fine to accept asynchronous transactions from everyone. Now the check is too strict: every time there is an asynchronous transaction from someone who isn't the author of the latest PROMOTE or DEMOTE, replication is broken with ER_SPLIT_BRAIN. Let's relax it: when the limbo owner is 0, it's fine to accept asynchronous transactions from everyone, no matter the term of their latest PROMOTE or DEMOTE. This means that after a DEMOTE we will now miss one case of true split-brain: when the old leader continues writing data in an obsolete term, and the new leader first issues PROMOTE and then DEMOTE. This is a tradeoff for making async master-master work after DEMOTE. The completely correct fix would be to write the term the transaction was written in with each transaction and replace txn_limbo_is_replica_outdated with txn_limbo_is_request_outdated, so that we decide whether to filter the request judging by the term it was applied in, not by the term we have seen in some past PROMOTE from the node. This fix seems too costly though, given that we only miss one case of split-brain at the moment the user enables master-master replication (by writing a DEMOTE), and in master-master there is no such thing as a split-brain. Follow-up #5295 Closes #7286 NO_DOC=internal change
-
- Jul 01, 2022
-
-
Yaroslav Lobankov authored
The 'small' lib test suite was not run for out-of-source builds, because the wrong symlink was created for test binaries and test-run couldn't find them. Now it is fixed. When test-run loads tests, it first searches for the suite.ini file, and if it exists, test-run considers the dir a test suite. So it makes sense to create a permanent link for the 'small' lib tests. Closes #4485 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Vladimir Davydov authored
Vinyl doesn't support the hot standby mode. There's a ticket to implement it, see #2013. The behavior is undefined when an instance runs in hot standby mode and the master has Vinyl spaces. It may result in a crash or even data corruption. Let's raise an explicit error in this case. Closes #6565 NO_DOC=bug fix
-
Vladimir Davydov authored
If a nested tuple field is indexed, it can be accessed by [*] aka multikey or any token:

s = box.schema.create_space('test')
s:create_index('pk')
s:create_index('sk', {parts = {{2, 'unsigned', path = '[1][1]'}}})
t = s:replace{1, {{1}}}
t['[2][1][*]'] -- returns 1!

If a nested field isn't indexed (remove creation of the secondary index in the example above), then access by [*] returns nil. Call graph:

lbox_tuple_field_by_path
  tuple_field_raw_by_full_path
    tuple_field_raw_by_path
      tuple_format_field_by_path
        json_tree_lookup_entry
          json_tree_lookup

And json_tree_lookup matches the first node if the key is [*]. We shouldn't match anything to [*]. Closes #5226 NO_DOC=bug fix
-
- Jun 30, 2022
-
-
Boris Stepanenko authored
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to WAL
9. Interfering promote/demote while waiting for synchro queue to be emptied
10. Interfering promote while waiting for limbo to be acked (similar to replication/gh-5430-qsync-promote-crash.test.lua)
Closes #6033 NO_DOC=testing stuff NO_CHANGELOG=testing stuff
-
Serge Petrenko authored
The test failed with the following output:

TAP version 13
1..3
# Started on Tue Jun 28 13:36:03 2022
# Starting group: pre-vote
not ok 1 pre-vote.test_no_direct_connection
# .../election_pre_vote_test.lua:46: expected: a value evaluating to true, actual: false
# stack traceback:
# .../election_pre_vote_test.lua:65: in function 'retrying'
# .../election_pre_vote_test.lua:64: in function 'pre-vote.test_no_direct_connection'
# ...
# [C]: in function 'xpcall'
ok 2 pre-vote.test_no_quorum
ok 3 pre-vote.test_promote_no_quorum
# Ran 3 tests in 6.994 seconds, 2 succeeded, 1 failed

This is the moment when one of the followers disconnects from the leader and expects its `box.info.election.leader_idle` to grow. It wasn't taken into account that this disconnect might lead to the leader resigning due to fencing, after which a new leader would emerge and `leader_idle` would still be small. IOW, the leader starts with fencing turned off, and only resumes fencing once it has connected to a quorum of nodes (one replica in this test). If the replica we just connected to happens to be the one we disconnect in the test, the leader might fence if it hasn't yet connected to the other replica, because it immediately loses a quorum of healthy connections right after gaining it for the first time. Fix this by waiting until everyone follows everyone before each test case. The test, of course, could be fixed by turning fencing off, but that might hide possible future problems with fencing. Follow-up #6654 Follow-up #6661 NO_CHANGELOG=test fix NO_DOC=test fix
-
Vladimir Davydov authored
Normally, there shouldn't be any upserts on disk if the space has secondary indexes, because we can't generate an upsert without a lookup in the primary index; hence we convert upserts to replace+delete in this case. The deferred delete optimization only makes sense if the space has secondary indexes, so we ignore upserts while generating deferred deletes, see vy_write_iterator_deferred_delete. There's an exception to this rule: a secondary index could be created after some upserts were used on the space. In this case, because of the deferred delete optimization, we may never generate deletes for some tuples for the secondary index, as demonstrated in #3638. We could fix this issue by properly handling upserts in the write iterator while generating deferred deletes, but this wouldn't be easy, because in case of a minor compaction there may be no replace/insert to apply the upsert to, so we'd have to keep intermediate upserts even if there is a newer delete statement. Since this situation is rare (it happens only once in a space's life time), it doesn't look like we should complicate the write iterator to fix it. Another way to fix it is to force major compaction of the primary index after a secondary index is created. This looks doable, but it could slow down creation of secondary indexes. Let's instead simply disable the deferred delete optimization if the primary index has upsert statements. This way the optimization will be enabled sooner or later, when the primary index major compaction occurs. After all, it's just an optimization, and it can be disabled for other reasons (e.g. if the space has on_replace triggers). Closes #3638 NO_DOC=bug fix
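A sketch of the rare scenario, assuming a simplified reproduction of #3638 (space layout and values are illustrative):

```lua
local s = box.schema.create_space('test', {engine = 'vinyl'})
s:create_index('pk')
s:upsert({1, 1}, {{'+', 2, 1}}) -- an upsert reaches the primary index
box.snapshot()                  -- ...and ends up on disk
-- The secondary index is created after the upsert was written:
s:create_index('sk', {parts = {{2, 'unsigned'}}, unique = false})
-- With deferred DELETEs, the DELETE for 'sk' could be skipped here,
-- leaving garbage in the secondary index; the fix disables the
-- optimization while the primary index still contains upserts.
s:delete{1}
```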
-
- Jun 29, 2022
-
-
Ilya Verbin authored
It doesn't make sense after switching from RDTSCP to clock_gettime(CLOCK_MONOTONIC). Part of #5869 @TarantoolBot document Title: fiber: get rid of cpu_misses in fiber.top() Since: 2.11 Remove any mentions of `cpu_misses` in `fiber.top()` description.
-
- Jun 27, 2022
-
-
Timur Safin authored
We did not correctly retain the `hour` attribute when the `min`, `sec`, or `nsec` attributes were modified via the `:set` method:
```
tarantool> a = dt.parse '2022-05-05T00:00:00'
tarantool> a:set{min = 0, sec = 0, nsec = 0}
---
- 2022-05-05T12:00:00Z
...
```
Closes #7298 NO_DOC=bugfix
-
- Jun 24, 2022
-
-
Vladimir Davydov authored
The optimization is mostly useless, because it only works if there's no data on disk. As explained in #5080, it contains a potential bug: if an L0 dump is triggered between 'prepare' and 'commit', it will insert a statement into a sealed vy_mem. Let's drop it. Part of #5080 NO_DOC=bug fix NO_CHANGELOG=later
-
Nikita Pettik authored
gh_6634_different_log_on_tuple_new_and_free_test.lua verifies that a proper debug message gets into the logs for tuple_new() and tuple_delete(): occasionally tuple_delete() printed a wrong tuple address. However, there are still two debug logs: one in tuple_delete() and another in memtx_tuple_delete(). So, to avoid any possible confusion, let's fix the regular expression so that it definitely finds the memtx_tuple_delete() log. NO_CHANGELOG=<Test fix> NO_DOC=<Test fix>
-
Vladimir Davydov authored
Net.box triggers (on_connect, on_schema_reload) are executed by the net.box connection worker fiber, so a request issued by a trigger callback can't be processed until the trigger returns execution to the net.box fiber. Currently, an attempt to issue a synchronous request from a net.box trigger leads to a silent hang of the connection, which is confusing. Let's instead raise an error until #7291 is implemented. We need to add the check to three places in the code:
1. luaT_netbox_wait_result for future:wait_result()
2. luaT_netbox_iterator_next for future:pairs()
3. conn._request for all synchronous requests. (We can't add the check to luaT_netbox_transport_perform_request, because conn._request may also call conn.wait_state, which would hang if called from an on_connect or on_schema_reload trigger.)
We also add an assertion to netbox_request_wait to ensure that we never wait for a request completion in the net.box worker fiber. Closes #5358 @TarantoolBot document Title: Synchronous requests are not allowed in net.box triggers An attempt to issue a synchronous request (e.g. `call`) from a net.box trigger (`on_connect`, `on_schema_reload`) now raises an error: "Synchronous requests are not allowed in net.box trigger" (Before https://github.com/tarantool/tarantool/issues/5358 was fixed, it silently hung.) Invoking an asynchronous request (see the `is_async` option) is allowed, but the request will not be processed until the trigger returns, and an attempt to wait for the request completion with `future:pairs()` or `future:wait_result()` will raise the same error.
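A minimal sketch of the new behavior (the URI, trigger body, and assertions are illustrative):

```lua
local net_box = require('net.box')
local conn = net_box.connect('localhost:3301')
conn:on_connect(function(c)
    -- A synchronous call from the trigger now raises
    -- "Synchronous requests are not allowed in net.box trigger"
    -- instead of hanging:
    local ok = pcall(c.call, c, 'box.info')
    assert(not ok)
    -- An async request is accepted, but it won't be processed until
    -- the trigger returns; waiting on the future from inside the
    -- trigger raises the same error.
    local fut = c:call('box.info', {}, {is_async = true})
    local ok2 = pcall(fut.wait_result, fut)
    assert(not ok2)
end)
```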
-
- Jun 23, 2022
-
-
Vladimir Davydov authored
exclude_null is a special index option which makes the index ignore tuples that contain null in any of the indexed fields. Currently, it doesn't work for json and multikey indexes, because:
1. index_filter_tuple ignores the json path.
2. index_filter_tuple ignores multikey indexes.
Issue no. 1 is easy to fix: we just need to use tuple_field_by_part instead of tuple_field when checking if a key field is null. Issue no. 2 is more complicated, because when we call index_filter_tuple we don't know the multikey index. We address this issue by pushing the index_filter_tuple call down to the engine-specific index implementation. For Vinyl, we make vy_stmt_foreach_entry, which iterates over multikey tuple entries, skip entries that contain nulls. For memtx, we move the check to the index-specific index_replace function implementation. Fortunately, only tree indexes support nullable fields, so we just need to update the memtx tree implementation. Ideally, we should handle multikey indexes in memtx at the top level, because the implementation should essentially be the same for all kinds of indexes, but this refactoring is complicated and will be done later. For now, just fix the bug. Closes #5861 NO_DOC=bug fix
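A hedged sketch of the now-working case for a multikey part with a json path (field names and values are illustrative):

```lua
local s = box.schema.create_space('test')
s:create_index('pk')
-- A multikey part with a json path; entries with null in the indexed
-- field should now be excluded from the index:
s:create_index('sk', {
    unique = false,
    parts = {{field = 2, type = 'string', path = '[*].name',
              is_nullable = true, exclude_null = true}},
})
s:replace{1, {{name = 'a'}, {name = box.NULL}}}
-- Only the {name = 'a'} entry is indexed; the null entry is ignored.
```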
-
Vladimir Davydov authored
For some reason, some test cases create memtx spaces irrespective of the value of the engine parameter. NO_DOC=test NO_CHANGELOG=test
-
- Jun 22, 2022
-
-
Nikita Pettik authored
NO_CHANGELOG=<No functional changes> NO_DOC=<Later for EE>
-
Nikita Pettik authored
These fields correspond to the tuple before a DML request is executed (old) and after it, i.e. the result (new). For example, let the index store the tuple {1, 1}:

replace{1, 2} -- old == {1, 1}, new == {1, 2}

These fields make the most sense for the update operation, which holds a key and an array of update operations (not the old tuple). `old_tuple` and `new_tuple` are going to be used as WAL extensions available in the enterprise version. Alongside this, let's reserve the 0x2c and 0x2d iproto keys for these members. NO_DOC=<No functional changes> NO_TEST=<No functional changes> NO_CHANGELOG=<No functional changes>
-
Georgiy Lebedev authored
When point holes are checked on insertion, we must conflict only transactions other than the one that read the hole. NO_CHANGELOG=internal bugfix NO_DOC=bugfix Closes #7234 Closes #7235
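A sketch of the scenario in the txn-proxy notation used earlier in this log (a simplified assumption, not the actual test):

```lua
tx1('box.space.s:get{0}')        -- tx1 reads a point hole: key 0 is absent
tx1('box.space.s:replace{0, 0}') -- tx1 then fills the hole itself
-- tx1 must not be conflicted by its own insertion; only other readers
-- of the hole are sent to conflict.
tx1:commit()
```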
-
Georgiy Lebedev authored
When full scans are checked on writes, we must conflict only transactions other than the one that did the full scan. NO_CHANGELOG=internal bugfix NO_DOC=bugfix Closes #7221
-
- Jun 21, 2022
-
-
Vladimir Davydov authored
Commit 4d52199e ("box: fix transaction "read-view" and "conflicted" states") updated vy_tx_send_to_read_view so that now it aborts all RW transactions right away instead of sending them to read view and aborting them on commit. It also updated vy_tx_begin_statement to fail if a transaction sent to a read view tries to do DML. With all that, we assume that there cannot possibly be an RW transaction sent to read view so we have an assertion checking that in vy_tx_commit. However, this assertion may fail, because a DML statement may yield on disk read before it writes anything to the write set. If this is the first statement in a transaction, the transaction is technically read-only and we will send it to read-view instead of aborting it. Once it completes the disk read, it will apply the statement and hence become read-write, breaking our assumption in vy_tx_commit. Fix this by aborting RW transactions sent to read-view in vy_tx_set. Follow-up #7240 NO_DOC=bug fix NO_CHANGELOG=unreleased
-
- Jun 17, 2022
-
-
Cyrill Gorcunov authored
When a fiber has finished its work, it ends up in one of two states:
1) if the "joinable" attribute is not set, the fiber is simply recycled;
2) otherwise, it continues hanging around waiting to be joined.
Our API allows calling fiber_wakeup() for dead but joinable fibers (2) in release builds without any side effects: such fibers are simply ignored. In debug builds, however, this triggers an assertion. We can't change our API for backward compatibility's sake, but at the same time we must not keep different behaviour between release and debug builds, since this brings inconsistency. Thus, let's get rid of the assertion and allow calling fiber_wakeup() in debug builds as well. Fixes #5843 NO_DOC=bug fix Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
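The same behaviour is observable from Lua; a minimal sketch (the empty fiber body is illustrative):

```lua
local fiber = require('fiber')
local f = fiber.new(function() end) -- created, runs on the next yield
f:set_joinable(true)
fiber.yield()  -- let the fiber run to completion; it is now dead
f:wakeup()     -- previously an assertion failure in debug builds;
               -- now a harmless no-op in all builds
f:join()
```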
-
Serge Petrenko authored
Once the split-brain detection is in place, it's fine to nopify obsolete data even on a node with elections disabled. Let's not keep a bug around anymore. This behaviour change requires changing "gh_6842_qsync_applier_order_test.lua" a bit. It actually relied on the old and buggy behaviour: it assumed old transactions would not be nopified and would trigger a replication error. This doesn't happen anymore, because nopify works correctly, and the transactions are not followed by a conflicting CONFIRM. The test for this commit simply alters gh_5295_split_brain_detection_test.lua to work with elections disabled. Closes #6133 Follow-up #5295 NO_DOC=internal change NO_CHANGELOG=internal change
-
Cyrill Gorcunov authored
When we receive synchro requests we can't just apply them blindly, because in the worst case they may come from a split-brain configuration (where a cluster has split into several clusters, each one has elected its own leader, and then the clusters are trying to merge back into the original one). We need to do our best to detect such disunity and force these nodes to rejoin from scratch for data consistency's sake. Thus, when we're processing requests, we pass them to the packet filter first, which validates their contents and refuses to apply them if they violate consistency. Depending on the request type, each packet traverses an appropriate chain.

filter_generic(): a common chain for any synchro packet.
1) request:replica_id = 0 is allowed for PROMOTE requests only.
2) request:replica_id should match limbo:owner_id, IOW the limbo migration should be noticed by all instances in the cluster.

filter_confirm_rollback(): a chain for CONFIRM | ROLLBACK packets.
1) Zero lsn is disallowed for such requests.

filter_promote_demote(): a chain for PROMOTE | DEMOTE packets.
1) The requests should come in with a nonzero term, otherwise the packet is corrupted.
2) The request's term should not be less than the maximal known one, IOW it should not come from nodes which didn't notice raft epoch changes and are living in the past.

filter_queue_boundaries(): a common finalization chain.
1) If the LSN of the request matches the current confirmed LSN, the packet is obviously correct to process.
2) If the LSN is less than the confirmed LSN, the request is wrong: we have processed the requested LSN already.
3) If the LSN is greater than the confirmed LSN, then:
a) If the limbo is empty, we can't do anything, since the data is already processed, and should issue an error;
b) If there is some data in the limbo, then the requested LSN should be in the range of the limbo's [first; last] LSNs, so that the request will be able to commit or rollback the limbo queue.

Note the filtration is disabled during initial configuration, where we apply requests from the only source of truth (either the remote master, or our own journal), so no split brain is possible. In order to make split-brain checks work, the applier nopify filter now passes synchro requests from an obsolete term without nopifying them. Also, now ANY asynchronous request coming from an instance with an obsolete term is treated as a split-brain. Think of it as of a synchronous request committed with a malformed quorum. Closes #5295 NO_DOC=it's literally below Co-authored-by:
Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: new error type: ER_SPLIT_BRAIN If for some reason the cluster had 2 leaders working independently (for example, the user has mistakenly lowered the quorum below N / 2 + 1), then once such leaders and their followers try connecting to each other, they will receive the ER_SPLIT_BRAIN error, and the connection will be aborted. This is done to preserve data integrity. Once the user notices such an error, he or she has to manually inspect the data on both of the split halves, choose a way to restore the data, and rebootstrap one of the halves from the other.
-
Serge Petrenko authored
It's important for the synchro queue owner not to finalize any of the pending synchronous transactions after restart. Since the node was down for some time, the chances are pretty high it was deposed by some new leader during its downtime. It means the node might not know yet that its transactions were already finalized by someone else. So, any arbitrary finalization might lead to a future split-brain once the remote PROMOTE finally reaches the local node. Let's fix this by adding a new reason for the limbo to be frozen: a queue owner has recovered but has not issued a new PROMOTE locally and hasn't received any PROMOTE requests from the remote nodes. Once the first PROMOTE is issued or received, it's safe to return to the old mode of operation. So, now the synchro queue owner starts in the "frozen" state and can't CONFIRM, ROLLBACK or issue new transactions until it either issues a PROMOTE or receives a PROMOTE from some remote node. This also required modifying box.ctl.promote() behaviour: it's no longer a no-op on a synchro queue owner when elections are disabled and the queue is frozen due to restart. Also fix the tests, which assumed the queue owner is writable after a restart. The gh-5298 test was partially deleted because it became pointless. And while we are at it, remove the double run of the gh-5288 test: it is storage engine agnostic, so there's no point in running it for both memtx and vinyl. Part-of #5295 NO_CHANGELOG=covered by previous commit @TarantoolBot document Title: ER_READONLY error receives new reasons When box.info.ro_reason is "synchro" and some operation throws an ER_READONLY error, this error now might include the following reason:
```
Can't modify data on a read-only instance - synchro queue with term 2 belongs to 1 (06c05d18-456e-4db3-ac4c-b8d0f291fd92) and is frozen due to fencing
```
This means that the current instance is indeed the synchro queue owner, but it has noticed that someone else in the cluster might start new elections or might overtake the synchro queue soon. This may also be detected by `box.info.election.term` becoming greater than `box.info.synchro.queue.term` (this is the case for the second error message). There is also a slightly different error message:
```
Can't modify data on a read-only instance - synchro queue with term 2 belongs to 1 (06c05d18-456e-4db3-ac4c-b8d0f291fd92) and is frozen until promotion
```
This means that the node simply cannot guarantee that it is still the synchro queue owner (for example, after a restart, when a node still thinks it is the queue owner, but someone else in the cluster has already overtaken the queue).
-