- Jul 08, 2022
-
-
Vladimir Davydov authored
The gh_6565 test doesn't stop the hot standby replica it started, because the replica should fail to initialize and exit eventually anyway. However, if the replica lingers until the next test due to https://github.com/tarantool/test-run/issues/345, the next test may successfully connect to it, which is likely to lead to a failure, because UNIX socket paths used by luatest servers are not randomized. For example, here the gh_6568 test fails after gh_6565 because it uses the same alias ('replica') for the test instance:

NO_WRAP
[008] vinyl-luatest/gh_6565_hot_standby_unsupported_> [ pass ]
[008] vinyl-luatest/gh_6568_replica_initial_join_rem> [ fail ]
[008] Test failed! Output from reject file /tmp/t/rejects/vinyl-luatest/gh_6568_replica_initial_join_removal_of_compacted_run_files.reject:
[008] TAP version 13
[008] 1..1
[008] # Started on Fri Jul 8 15:30:47 2022
[008] # Starting group: gh-6568-replica-initial-join-removal-of-compacted-run-files
[008] not ok 1 gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
[008] # builtin/fio.lua:242: fio.pathjoin(): undefined path part 1
[008] # stack traceback:
[008] # builtin/fio.lua:242: in function 'pathjoin'
[008] # ...ica_initial_join_removal_of_compacted_run_files_test.lua:43: in function 'gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup'
[008] # ...
[008] # [C]: in function 'xpcall'
[008] replica | 2022-07-08 15:30:48.311 [832856] main/103/default.lua F> can't initialize storage: unlink, called on fd 30, aka unix/:(socket), peer of unix/:(socket): Address already in use
[008] # Ran 1 tests in 0.722 seconds, 0 succeeded, 1 errored
NO_WRAP

Let's fix this by explicitly killing the hot standby replica. Since it could have exited voluntarily, we need to use pcall, because server.stop fails if the instance is already dead. This issue is similar to the one fixed by commit 85040161 ("test: stop server started by vinyl-luatest/update_optimize test"). NO_DOC=test NO_CHANGELOG=test
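A minimal sketch of the fix, assuming the usual luatest group hooks (the group name and hook choice are illustrative, not taken from the patch):

```lua
local t = require('luatest')
local g = t.group('gh-6565-hot-standby-unsupported')

g.after_each(function(cg)
    -- The replica may have already exited on its own, and server.stop()
    -- raises for a dead instance, so wrap the call in pcall.
    if cg.replica ~= nil then
        pcall(cg.replica.stop, cg.replica)
    end
end)
```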
-
Nikolay Shirokovskiy authored
Handle a status line like 'HTTP/2 200', where the HTTP version has no dot. Closes #7319 NO_DOC=bugfix
-
Nikolay Shirokovskiy authored
We use the LuaJIT 'bit' module for bitwise operations. For platform interoperability it truncates its arguments to 32 bits and returns a signed result. Thus, when granting rights with bit.bor to the admin user, which has 0xffffffff rights (from the bootstrap snapshot), we get -1 as the result. This later leads to the type check error reported in the issue. Closes #7226 NO_DOC=minor bugfix
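The truncation is easy to observe directly (a minimal illustration):

```lua
local bit = require('bit')
-- LuaJIT bitwise operations work on 32-bit signed integers, so OR-ing
-- a full 32-bit mask yields -1 rather than 4294967295:
print(bit.bor(0xffffffff, 0)) -- -1
```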
-
Vladimir Davydov authored
Let's hide all the logic regarding delayed freeing of memtx tuples in MemtxAllocator and provide memtx_engine with methods for allocating and freeing tuples (not memtx_tuples, just generic tuples). All tuple and snapshot version manipulation is now done entirely in MemtxAllocator. This is a preparation for implementing a general-purpose tuple read view API in MemtxAllocator, see #7364. Note: since memtx_engine now deals with the size of a regular tuple, which is 4 bytes less than the size of a memtx_tuple, this changes the size reported by OOM messages and the meaning of memtx_max_tuple_size, which now limits the size of a tuple, not a memtx_tuple. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Mergen Imeev authored
This patch fixes a bug where the ANY field type was replaced by the SCALAR field type in the ephemeral space used in ORDER BY. Closes #7345 NO_DOC=bugfix
-
Mergen Imeev authored
After this patch, the result type of arithmetic between two unsigned values will be INTEGER. Closes #7295 NO_DOC=bugfix
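A hedged illustration via box.execute() metadata (the query itself is an assumption, not taken from the patch):

```lua
-- The result type reported for arithmetic on two UNSIGNED operands is
-- now INTEGER, so a negative result (e.g. from subtraction) no longer
-- contradicts the inferred type.
local res = box.execute([[SELECT CAST(1 AS UNSIGNED) + CAST(2 AS UNSIGNED);]])
print(res.metadata[1].type) -- integer
```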
-
- Jul 07, 2022
-
-
Ilya Verbin authored
This makes the test more realistic and avoids having to deal, in the child process, with memory that was allocated before the fork. Closes #7370 NO_DOC=test fix NO_CHANGELOG=test fix
-
Igor Munkin authored
Since "x64/LJ_GC64: Fix fallback case of asm_fuseloadk64()." (42853793ec3e6e36bc0f7dff9d483d64ba0d8d28) is backported into tarantool/luajit trunk, box/bitset.test.lua and box/function1.test.lua tests are no more fragile. Follows up tarantool/tarantool-qa#234 Follows up tarantool/tarantool-qa#235 NO_DOC=test changes NO_CHANGELOG=test changes NO_TEST=test changes
-
- Jul 06, 2022
-
-
Yaroslav Lobankov authored
This patch fixes `box-py/args.test.py` test and allows it to work against tarantool installed from a package. Closes tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Yaroslav Lobankov authored
This patch fixes `app-tap/tarantoolctl.test.lua` test and allows it to work against tarantool installed from a package. Part of tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Yaroslav Lobankov authored
This patch fixes `gh-1700-abort-recording-on-fiber-switch.test.lua` test and allows it to work against tarantool installed from a package. Part of tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Yaroslav Lobankov authored
This patch adds new `make` test targets to run unit and functional tests independently of each other, which can be useful in some cases. New test targets:
* `test-unit` - run unit tests and exit after the first failure
* `test-unit-force` - run unit tests without stopping on failure
* `test-func` - run functional tests and exit after the first failure
* `test-func-force` - run functional tests without stopping on failure
Note that tests for the 'small' lib are considered unit tests as well. Part of tarantool/tarantool-qa#246 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Nikolay Shirokovskiy authored
If the readline 'show-mode-in-prompt' option is on, the test fails because it does not handle the prefix added to the prompt in this mode. Let's use the default (compiled-in) readline configuration instead of the one provided by the user or system config. NO_DOC=test changes NO_CHANGELOG=test changes NO_TEST=test changes
-
Georgiy Lebedev authored
The current implementation of tracking statements that delete a story has a flaw. Consider the following example:

tx1('box.space.s:replace{0, 0}') -- statement 1
tx2('box.space.s:replace{0, 1}') -- statement 2
tx2('box.space.s:delete{0}')     -- statement 3
tx2('box.space.s:replace{0, 2}') -- statement 4

When statement 1 is prepared, both statements 2 and 4 will be linked to the delete statement list of {0, 0}'s story, though, apparently, statement 4 does not delete {0, 0}. Let us notice the following: statement 4 is "pure" in the sense that, in the transaction's scope, it is guaranteed not to replace any tuple. We can retrieve this information when we check whether the insert statement violates replacement rules, use it to determine "pure" insert statements, and skip them later on when, during preparation of insert statements, we handle other insert statements which assume they do not replace anything (i.e., have no visible old tuple). On the contrary, statements 1 and 2 are "dirty": they assume that they replaced nothing (i.e., there was no visible tuple in the index). When one of them gets prepared, the other one needs to be either aborted or relinked to replace the prepared tuple. We also need to fix relinking of delete statements from the older story (in terms of the history chain) to the new one during preparation of insert statements: a statement needs to be relinked iff it comes from a different transaction (to be precise, there must be no more than one delete statement from the same transaction). Additionally, add assertions to verify the invariant that the story's add (delete) psn is equal to the psn of the add (delete) statement's transaction. Closes #7214 Closes #7217 NO_DOC=bugfix
-
- Jul 05, 2022
-
-
Vladimir Davydov authored
Normally, if a server created by a test isn't stopped, it should be forcefully killed by luatest or test-run. For some reason this sometimes doesn't happen, which may lead to the next test failing to bind, because all test servers that belong to the same luatest suite and have the same alias share the same socket path (although they use different directories). This looks like a test-run or luatest bug. The vinyl-luatest/update_optimize test doesn't stop the test server, so because of this test-run/luatest bug, the next vinyl-luatest test occasionally fails:

NO_WRAP
[001] vinyl-luatest/update_optimize_test.lua [ pass ]
[001] vinyl-luatest/gh_6568_replica_initial_join_rem> [ fail ]
[001] Test failed! Output from reject file /tmp/t/rejects/vinyl-luatest/gh_6568_replica_initial_join_removal_of_compacted_run_files.reject:
[001] TAP version 13
[001] 1..1
[001] # Started on Tue Jul 5 13:30:37 2022
[001] # Starting group: gh-6568-replica-initial-join-removal-of-compacted-run-files
[001] master | 2022-07-05 13:30:37.530 [189564] main/103/default.lua F> can't initialize storage: unlink, called on fd 25, aka unix/:(socket), peer of unix/:(socket): Address already in use
[001] ok 1 gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
[001] not ok 1 gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
[001] # Failure in after_all hook: /home/vlad/.rocks/share/tarantool/luatest/process.lua:100: kill failed: 256
[001] # stack traceback:
[001] # .../src/tarantool/tarantool/test/luatest_helpers/server.lua:206: in function 'stop'
[001] # ...src/tarantool/tarantool/test/luatest_helpers/cluster.lua:44: in function 'drop'
[001] # ...ica_initial_join_removal_of_compacted_run_files_test.lua:34: in function <...ica_initial_join_removal_of_compacted_run_files_test.lua:33>
[001] # ...
[001] # [C]: in function 'xpcall'
[001] # Ran 1 tests in 1.682 seconds, 0 succeeded, 1 errored
NO_WRAP

Let's fix this by stopping the test server started by the vinyl-luatest/update_optimize test. NO_DOC=test NO_CHANGELOG=test
-
Vladimir Davydov authored
The idea behind the new test is the same as the one used by vinyl/select_consistency.test.lua: create a space with a few compound secondary indexes that share the first part, then run SELECT requests under heavy write load and check that results match. However, in comparison to its predecessor, the new test has a few improvements:
1. It generates DML requests in multi-statement transactions.
2. It checks non-unique indexes.
3. It checks multikey indexes.
4. It triggers L0 dumps not by box.snapshot, but by exceeding the box.cfg.vinyl_memory limit.
5. It starts 20 write and 5 read fibers.
6. It reruns the test after restart to check that recovery works fine.
7. It checks that there are no phantom statements stored in the space indexes after the test.
8. It runs the test with deferred DELETEs enabled and disabled (see box.cfg.vinyl_defer_deletes).
9. It is written in luatest.
The test takes about 20 seconds to finish, so it's marked as long run. Closes #4251 NO_DOC=test NO_CHANGELOG=test
-
Ilya Verbin authored
Currently it throws an error when it encounters binary data; print a <binary> tag instead. Closes #7040 NO_DOC=bugfix
-
- Jul 04, 2022
-
-
Serge Petrenko authored
Our txn_limbo_is_replica_outdated check works correctly only when there is a stream of PROMOTE requests. Only the author of the latest PROMOTE is writable and may issue transactions, no matter synchronous or asynchronous. So txn_limbo_is_replica_outdated assumes that everyone but the node with the greatest PROMOTE/DEMOTE term is outdated. This isn't true for DEMOTE requests. There is only one server which issues the DEMOTE request, but once it's written, it's fine to accept asynchronous transactions from everyone. Now the check is too strict: every time there is an asynchronous transaction from someone who isn't the author of the latest PROMOTE or DEMOTE, replication is broken with ER_SPLIT_BRAIN. Let's relax it: when the limbo owner is 0, it's fine to accept asynchronous transactions from everyone, no matter the term of their latest PROMOTE or DEMOTE. This means that after a DEMOTE we will now miss one case of true split-brain: when the old leader continues writing data in an obsolete term, and the new leader first issues PROMOTE and then DEMOTE. This is a tradeoff for making async master-master work after DEMOTE. The completely correct fix would be to write the term the transaction was written in with each transaction and replace txn_limbo_is_replica_outdated with txn_limbo_is_request_outdated, so that we decide whether to filter the request judging by the term it was applied in, not by the term we have seen in some past PROMOTE from the node. This fix seems too costly though, given that we only miss one case of split-brain at the moment the user enables master-master replication (by writing a DEMOTE), and in master-master there is no such thing as a split-brain. Follow-up #5295 Closes #7286 NO_DOC=internal change
-
- Jul 01, 2022
-
-
Yaroslav Lobankov authored
The 'small' lib test suite was not run for out-of-source builds, because the wrong symlink was created for test binaries and test-run couldn't find them. Now it is fixed. When test-run loads tests, it first searches for the suite.ini file, and if it exists, test-run considers the dir a test suite. So it makes sense to create a permanent link for the 'small' lib tests. Closes #4485 NO_DOC=testing stuff NO_TEST=testing stuff NO_CHANGELOG=testing stuff
-
Vladimir Davydov authored
Vinyl doesn't support the hot standby mode. There's a ticket to implement it, see #2013. The behavior is undefined when an instance runs in hot standby mode and the master has Vinyl spaces. It may result in a crash or even data corruption. Let's raise an explicit error in this case. Closes #6565 NO_DOC=bug fix
-
Vladimir Davydov authored
If a nested tuple field is indexed, it can be accessed by [*] aka multikey or any token:

s = box.schema.create_space('test')
s:create_index('pk')
s:create_index('sk', {parts = {{2, 'unsigned', path = '[1][1]'}}})
t = s:replace{1, {{1}}}
t['[2][1][*]'] -- returns 1!

If a nested field isn't indexed (remove creation of the secondary index in the example above), then access by [*] returns nil. Call graph:

lbox_tuple_field_by_path
  tuple_field_raw_by_full_path
    tuple_field_raw_by_path
      tuple_format_field_by_path
        json_tree_lookup_entry
          json_tree_lookup

And json_tree_lookup matches the first node if the key is [*]. We shouldn't match anything to [*]. Closes #5226 NO_DOC=bug fix
-
- Jun 30, 2022
-
-
Boris Stepanenko authored
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to WAL
9. Interfering promote/demote while waiting for synchro queue to be emptied
10. Interfering promote while waiting for limbo to be acked (similar to replication/gh-5430-qsync-promote-crash.test.lua)
Closes #6033 NO_DOC=testing stuff NO_CHANGELOG=testing stuff
-
Serge Petrenko authored
The test failed with the following output:

TAP version 13
1..3
# Started on Tue Jun 28 13:36:03 2022
# Starting group: pre-vote
not ok 1 pre-vote.test_no_direct_connection
# .../election_pre_vote_test.lua:46: expected: a value evaluating to true, actual: false
# stack traceback:
# .../election_pre_vote_test.lua:65: in function 'retrying'
# .../election_pre_vote_test.lua:64: in function 'pre-vote.test_no_direct_connection'
# ...
# [C]: in function 'xpcall'
ok 2 pre-vote.test_no_quorum
ok 3 pre-vote.test_promote_no_quorum
# Ran 3 tests in 6.994 seconds, 2 succeeded, 1 failed

This is the moment when one of the followers disconnects from the leader and expects its `box.info.election.leader_idle` to grow. It wasn't taken into account that this disconnect might lead to the leader resigning due to fencing, after which a new leader would emerge and `leader_idle` would still be small. IOW, the leader starts with fencing turned off, and only resumes fencing once it has connected to a quorum of nodes (one replica in this test). If the replica we just connected to happens to be the one we disconnect in the test, the leader might fence if it hasn't yet connected to the other replica, because it immediately loses a quorum of healthy connections right after gaining it for the first time. Fix this by waiting until everyone follows everyone before each test case. The test, of course, could be fixed by turning fencing off, but that might hide possible future problems with fencing. Follow-up #6654 Follow-up #6661 NO_CHANGELOG=test fix NO_DOC=test fix
-
Vladimir Davydov authored
Normally, there shouldn't be any upserts on disk if the space has secondary indexes, because we can't generate an upsert without a lookup in the primary index; hence we convert upserts to replace+delete in this case. The deferred delete optimization only makes sense if the space has secondary indexes, so we ignore upserts while generating deferred deletes, see vy_write_iterator_deferred_delete. There's an exception to this rule: a secondary index could be created after some upserts were used on the space. In this case, because of the deferred delete optimization, we may never generate deletes for some tuples for the secondary index, as demonstrated in #3638. We could fix this issue by properly handling upserts in the write iterator while generating deferred deletes, but this wouldn't be easy, because in case of a minor compaction there may be no replace/insert to apply the upsert to, so we'd have to keep intermediate upserts even if there is a newer delete statement. Since this situation is rare (it happens only once in a space's life time), it doesn't look like we should complicate the write iterator to fix it. Another way to fix it is to force major compaction of the primary index after a secondary index is created. This looks doable, but it could slow down creation of secondary indexes. Let's instead simply disable the deferred delete optimization if the primary index has upsert statements. This way the optimization will be enabled sooner or later, when the primary index major compaction occurs. After all, it's just an optimization, and it can be disabled for other reasons (e.g. if the space has on_replace triggers). Closes #3638 NO_DOC=bug fix
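A sketch of the rare scenario, assuming a simplified reproduction of #3638 (space layout and values are illustrative):

```lua
local s = box.schema.create_space('test', {engine = 'vinyl'})
s:create_index('pk')
s:upsert({1, 1}, {{'+', 2, 1}}) -- an upsert reaches the primary index
box.snapshot()                  -- ...and ends up on disk
-- The secondary index is created after the upsert was written:
s:create_index('sk', {parts = {{2, 'unsigned'}}, unique = false})
-- With deferred DELETEs, the DELETE for 'sk' could be skipped here,
-- leaving garbage in the secondary index; the fix disables the
-- optimization while the primary index still contains upserts.
s:delete{1}
```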
-
- Jun 29, 2022
-
-
Ilya Verbin authored
It doesn't make sense after switching from RDTSCP to clock_gettime(CLOCK_MONOTONIC). Part of #5869 @TarantoolBot document Title: fiber: get rid of cpu_misses in fiber.top() Since: 2.11 Remove any mentions of `cpu_misses` in `fiber.top()` description.
-
- Jun 27, 2022
-
-
Timur Safin authored
We did not correctly retain the `hour` attribute when the `min`, `sec`, or `nsec` attributes were modified via the `:set` method:
```
tarantool> a = dt.parse '2022-05-05T00:00:00'
tarantool> a:set{min = 0, sec = 0, nsec = 0}
---
- 2022-05-05T12:00:00Z
...
```
Closes #7298 NO_DOC=bugfix
-
- Jun 24, 2022
-
-
Vladimir Davydov authored
The optimization is mostly useless, because it only works if there's no data on disk. As explained in #5080, it contains a potential bug: if an L0 dump is triggered between 'prepare' and 'commit', it will insert a statement into a sealed vy_mem. Let's drop it. Part of #5080 NO_DOC=bug fix NO_CHANGELOG=later
-
Nikita Pettik authored
gh_6634_different_log_on_tuple_new_and_free_test.lua verifies that a proper debug message gets into the logs for tuple_new() and tuple_delete(): occasionally tuple_delete() printed a wrong tuple address. However, there are still two debug logs: one in tuple_delete() and another in memtx_tuple_delete(). So, to avoid any possible confusion, let's fix the regular expression so that it definitely finds the memtx_tuple_delete() log. NO_CHANGELOG=<Test fix> NO_DOC=<Test fix>
-
Vladimir Davydov authored
Net.box triggers (on_connect, on_schema_reload) are executed by the net.box connection worker fiber, so a request issued by a trigger callback can't be processed until the trigger returns execution to the net.box fiber. Currently, an attempt to issue a synchronous request from a net.box trigger leads to a silent hang of the connection, which is confusing. Let's instead raise an error until #7291 is implemented. We need to add the check to three places in the code:
1. luaT_netbox_wait_result for future:wait_result()
2. luaT_netbox_iterator_next for future:pairs()
3. conn._request for all synchronous requests. (We can't add the check to luaT_netbox_transport_perform_request, because conn._request may also call conn.wait_state, which would hang if called from an on_connect or on_schema_reload trigger.)
We also add an assertion to netbox_request_wait to ensure that we never wait for a request completion in the net.box worker fiber. Closes #5358 @TarantoolBot document Title: Synchronous requests are not allowed in net.box triggers An attempt to issue a synchronous request (e.g. `call`) from a net.box trigger (`on_connect`, `on_schema_reload`) now raises an error: "Synchronous requests are not allowed in net.box trigger" (Before https://github.com/tarantool/tarantool/issues/5358 was fixed, it silently hung.) Invoking an asynchronous request (see the `is_async` option) is allowed, but the request will not be processed until the trigger returns, and an attempt to wait for the request completion with `future:pairs()` or `future:wait_result()` will raise the same error.
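A minimal sketch of the new behavior (the URI, trigger body, and assertions are illustrative):

```lua
local net_box = require('net.box')
local conn = net_box.connect('localhost:3301')
conn:on_connect(function(c)
    -- A synchronous call from the trigger now raises
    -- "Synchronous requests are not allowed in net.box trigger"
    -- instead of hanging:
    local ok = pcall(c.call, c, 'box.info')
    assert(not ok)
    -- An async request is accepted, but it won't be processed until
    -- the trigger returns; waiting on the future from inside the
    -- trigger raises the same error.
    local fut = c:call('box.info', {}, {is_async = true})
    local ok2 = pcall(fut.wait_result, fut)
    assert(not ok2)
end)
```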
-
- Jun 23, 2022
-
-
Vladimir Davydov authored
exclude_null is a special index option which makes the index ignore tuples that contain null in any of the indexed fields. Currently, it doesn't work for json and multikey indexes, because:
1. index_filter_tuple ignores the json path.
2. index_filter_tuple ignores multikey indexes.
Issue no. 1 is easy to fix: we just need to use tuple_field_by_part instead of tuple_field when checking if a key field is null. Issue no. 2 is more complicated, because when we call index_filter_tuple we don't know the multikey index. We address this issue by pushing the index_filter_tuple call down to the engine-specific index implementation. For Vinyl, we make vy_stmt_foreach_entry, which iterates over multikey tuple entries, skip entries that contain nulls. For memtx, we move the check to the index-specific index_replace function implementation. Fortunately, only tree indexes support nullable fields, so we just need to update the memtx tree implementation. Ideally, we should handle multikey indexes in memtx at the top level, because the implementation should essentially be the same for all kinds of indexes, but this refactoring is complicated and will be done later. For now, just fix the bug. Closes #5861 NO_DOC=bug fix
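A hedged sketch of the now-working case for a multikey part with a json path (field names and values are illustrative):

```lua
local s = box.schema.create_space('test')
s:create_index('pk')
-- A multikey part with a json path; entries with null in the indexed
-- field should now be excluded from the index:
s:create_index('sk', {
    unique = false,
    parts = {{field = 2, type = 'string', path = '[*].name',
              is_nullable = true, exclude_null = true}},
})
s:replace{1, {{name = 'a'}, {name = box.NULL}}}
-- Only the {name = 'a'} entry is indexed; the null entry is ignored.
```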
-
Vladimir Davydov authored
For some reason, some test cases create memtx spaces irrespective of the value of the engine parameter. NO_DOC=test NO_CHANGELOG=test
-
- Jun 22, 2022
-
-
Nikita Pettik authored
NO_CHANGELOG=<No functional changes> NO_DOC=<Later for EE>
-
Nikita Pettik authored
These fields correspond to the tuple before a DML request is executed (old) and after it, i.e. the result (new). For example, let the index store the tuple {1, 1}:

replace{1, 2} -- old == {1, 1}, new == {1, 2}

These fields make the most sense for the update operation, which holds a key and an array of update operations (not the old tuple). `old_tuple` and `new_tuple` are going to be used as WAL extensions available in the enterprise version. Alongside this, let's reserve the 0x2c and 0x2d iproto keys for these members. NO_DOC=<No functional changes> NO_TEST=<No functional changes> NO_CHANGELOG=<No functional changes>
-
Georgiy Lebedev authored
When point holes are checked on insertion, we must conflict only transactions other than the one that read the hole. NO_CHANGELOG=internal bugfix NO_DOC=bugfix Closes #7234 Closes #7235
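A sketch of the scenario in the txn-proxy notation used earlier in this log (a simplified assumption, not the actual test):

```lua
tx1('box.space.s:get{0}')        -- tx1 reads a point hole: key 0 is absent
tx1('box.space.s:replace{0, 0}') -- tx1 then fills the hole itself
-- tx1 must not be conflicted by its own insertion; only other readers
-- of the hole are sent to conflict.
tx1:commit()
```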
-
Georgiy Lebedev authored
When full scans are checked on writes, we must conflict only transactions other than the one that did the full scan. NO_CHANGELOG=internal bugfix NO_DOC=bugfix Closes #7221
-
- Jun 21, 2022
-
-
Vladimir Davydov authored
Commit 4d52199e ("box: fix transaction "read-view" and "conflicted" states") updated vy_tx_send_to_read_view so that now it aborts all RW transactions right away instead of sending them to read view and aborting them on commit. It also updated vy_tx_begin_statement to fail if a transaction sent to a read view tries to do DML. With all that, we assume that there cannot possibly be an RW transaction sent to read view so we have an assertion checking that in vy_tx_commit. However, this assertion may fail, because a DML statement may yield on disk read before it writes anything to the write set. If this is the first statement in a transaction, the transaction is technically read-only and we will send it to read-view instead of aborting it. Once it completes the disk read, it will apply the statement and hence become read-write, breaking our assumption in vy_tx_commit. Fix this by aborting RW transactions sent to read-view in vy_tx_set. Follow-up #7240 NO_DOC=bug fix NO_CHANGELOG=unreleased
-
- Jun 17, 2022
-
-
Cyrill Gorcunov authored
When a fiber has finished its work, it ends up in one of two states:
1) if the "joinable" attribute is not set, the fiber is simply recycled;
2) otherwise, it continues hanging around waiting to be joined.
Our API allows calling fiber_wakeup() for dead but joinable fibers (2) in release builds without any side effects: such fibers are simply ignored. In debug builds, however, this triggers an assertion. We can't change our API for backward compatibility's sake, but at the same time we must not keep different behaviour between release and debug builds, since this brings inconsistency. Thus, let's get rid of the assertion and allow calling fiber_wakeup() in debug builds as well. Fixes #5843 NO_DOC=bug fix Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
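The same behaviour is observable from Lua; a minimal sketch (the empty fiber body is illustrative):

```lua
local fiber = require('fiber')
local f = fiber.new(function() end) -- created, runs on the next yield
f:set_joinable(true)
fiber.yield()  -- let the fiber run to completion; it is now dead
f:wakeup()     -- previously an assertion failure in debug builds;
               -- now a harmless no-op in all builds
f:join()
```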
-
Serge Petrenko authored
Once the split-brain detection is in place, it's fine to nopify obsolete data even on a node with elections disabled. Let's not keep a bug around anymore. This behaviour change requires changing "gh_6842_qsync_applier_order_test.lua" a bit. It actually relied on the old and buggy behaviour: it assumed old transactions would not be nopified and would trigger a replication error. This doesn't happen anymore, because nopify works correctly, and the transactions are not followed by a conflicting CONFIRM. The test for this commit simply alters gh_5295_split_brain_detection_test.lua to work with elections disabled. Closes #6133 Follow-up #5295 NO_DOC=internal change NO_CHANGELOG=internal change
-
Cyrill Gorcunov authored
When we receive synchro requests we can't just apply them blindly, because in the worst case they may come from a split-brain configuration (where a cluster has split into several clusters, each one has elected its own leader, and then the clusters are trying to merge back into the original one). We need to do our best to detect such disunity and force these nodes to rejoin from scratch for data consistency's sake. Thus, when we're processing requests, we pass them to the packet filter first, which validates their contents and refuses to apply them if they violate consistency. Depending on the request type, each packet traverses an appropriate chain.

filter_generic(): a common chain for any synchro packet.
1) request:replica_id = 0 is allowed for PROMOTE requests only.
2) request:replica_id should match limbo:owner_id, IOW the limbo migration should be noticed by all instances in the cluster.

filter_confirm_rollback(): a chain for CONFIRM | ROLLBACK packets.
1) Zero lsn is disallowed for such requests.

filter_promote_demote(): a chain for PROMOTE | DEMOTE packets.
1) The requests should come in with a nonzero term, otherwise the packet is corrupted.
2) The request's term should not be less than the maximal known one, IOW it should not come from nodes which didn't notice raft epoch changes and are living in the past.

filter_queue_boundaries(): a common finalization chain.
1) If the LSN of the request matches the current confirmed LSN, the packet is obviously correct to process.
2) If the LSN is less than the confirmed LSN, the request is wrong: we have processed the requested LSN already.
3) If the LSN is greater than the confirmed LSN, then:
a) If the limbo is empty, we can't do anything, since the data is already processed, and should issue an error;
b) If there is some data in the limbo, then the requested LSN should be in the range of the limbo's [first; last] LSNs, so that the request will be able to commit or rollback the limbo queue.

Note the filtration is disabled during initial configuration, where we apply requests from the only source of truth (either the remote master, or our own journal), so no split brain is possible. In order to make split-brain checks work, the applier nopify filter now passes synchro requests from an obsolete term without nopifying them. Also, now ANY asynchronous request coming from an instance with an obsolete term is treated as a split-brain. Think of it as of a synchronous request committed with a malformed quorum. Closes #5295 NO_DOC=it's literally below Co-authored-by:
Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: new error type: ER_SPLIT_BRAIN If for some reason the cluster had 2 leaders working independently (for example, the user has mistakenly lowered the quorum below N / 2 + 1), then once such leaders and their followers try connecting to each other, they will receive the ER_SPLIT_BRAIN error, and the connection will be aborted. This is done to preserve data integrity. Once the user notices such an error, he or she has to manually inspect the data on both of the split halves, choose a way to restore the data, and rebootstrap one of the halves from the other.
-
Serge Petrenko authored
It's important for the synchro queue owner not to finalize any of the pending synchronous transactions after restart. Since the node was down for some time, the chances are pretty high it was deposed by some new leader during its downtime. It means the node might not know yet that its transactions were already finalized by someone else. So, any arbitrary finalization might lead to a future split-brain once the remote PROMOTE finally reaches the local node. Let's fix this by adding a new reason for the limbo to be frozen: a queue owner has recovered but has not issued a new PROMOTE locally and hasn't received any PROMOTE requests from the remote nodes. Once the first PROMOTE is issued or received, it's safe to return to the old mode of operation. So, now the synchro queue owner starts in the "frozen" state and can't CONFIRM, ROLLBACK or issue new transactions until it either issues a PROMOTE or receives a PROMOTE from some remote node. This also required modifying box.ctl.promote() behaviour: it's no longer a no-op on a synchro queue owner when elections are disabled and the queue is frozen due to restart. Also fix the tests, which assumed the queue owner is writable after a restart. The gh-5298 test was partially deleted because it became pointless. And while we are at it, remove the double run of the gh-5288 test: it is storage engine agnostic, so there's no point in running it for both memtx and vinyl. Part-of #5295 NO_CHANGELOG=covered by previous commit @TarantoolBot document Title: ER_READONLY error receives new reasons When box.info.ro_reason is "synchro" and some operation throws an ER_READONLY error, this error now might include the following reason:
```
Can't modify data on a read-only instance - synchro queue with term 2 belongs to 1 (06c05d18-456e-4db3-ac4c-b8d0f291fd92) and is frozen due to fencing
```
This means that the current instance is indeed the synchro queue owner, but it has noticed that someone else in the cluster might start new elections or might overtake the synchro queue soon. This may also be detected by `box.info.election.term` becoming greater than `box.info.synchro.queue.term` (this is the case for the second error message). There is also a slightly different error message:
```
Can't modify data on a read-only instance - synchro queue with term 2 belongs to 1 (06c05d18-456e-4db3-ac4c-b8d0f291fd92) and is frozen until promotion
```
This means that the node simply cannot guarantee that it is still the synchro queue owner (for example, after a restart, when a node still thinks it is the queue owner, but someone else in the cluster has already overtaken the queue).
-