- Jan 24, 2023
Serge Petrenko authored
There is a false assertion in raft_stop_candidate(): it assumes that the node must always have a running timer whenever it sees the leader. This is not true when the node is busy writing the new term on disk. Cover the mentioned case in the assertion. Closes #8169 NO_DOC=bugfix Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
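The relaxed assertion can be modelled as a predicate over the node state. This is a minimal sketch, not Tarantool's raft code: the struct `raft_sketch` and its fields `leader_id`, `timer_is_active` and `is_write_in_progress` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a raft node's state. */
struct raft_sketch {
	uint32_t leader_id;        /* 0 means no leader is known. */
	bool timer_is_active;      /* Election timer is running. */
	bool is_write_in_progress; /* New term is being written to disk. */
};

/* Old invariant: seeing a leader implies a running timer. */
bool
old_invariant(const struct raft_sketch *r)
{
	return r->leader_id == 0 || r->timer_is_active;
}

/*
 * Relaxed invariant: the timer may be stopped while the node is
 * busy persisting the new term.
 */
bool
relaxed_invariant(const struct raft_sketch *r)
{
	return r->leader_id == 0 || r->timer_is_active ||
	       r->is_write_in_progress;
}
```

A node that sees a leader while its term write is in flight violates the old predicate but satisfies the relaxed one.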
-
Vladimir Davydov authored
We define the flight_recorder_cfg struct, the box_get_flightrec_cfg function, and the box.internal.cfg_configure_flightrec Lua function in the CE repository although they are actually needed only in the EE repository. Let's drop them all from the CE repository and instead define stub functions box_check_flightrec and box_set_flightrec that check/apply the box.cfg flight recorder parameters. While we are at it, add missing comments to the flightrec function stubs. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Serge Petrenko authored
The title is pretty self-explanatory. That's all this commit does. Now a couple of words on why this is needed.
Commit 2a0c4f2b ("replication: make replica subscribe to master's ballot") changed replica connect behaviour: instead of holding a single connection to the master, a replica may have two. Master's ballot retrieval is now performed in a separate connection owned by a separate fiber called ballot_watcher. The first connection to the master is initialized as always, and then the applier fiber creates the ballot_watcher, which connects to the same address on its own.
This led to some unexpected consequences: random cartridge integration tests started failing with the following error:
tarantool/tarantool/cartridge/test-helpers/cluster.lua:209: "localhost:13303": Replication setup failed, instance orphaned
Here's what happened. Cartridge has a module named remote control. The module mimics a tarantool server and "listens" on the same socket tarantool is intended to listen on, before box.cfg{listen=...} is called. For example, one can see such output in tarantool logs with cartridge:
NO_WRAP
13:07:43.210 [10265] main/132/applier/admin@localhost:13301 I> remote master 46a71a25-4328-4a41-985d-d93d6ed7fb7f at 127.0.0.1:13301 running Tarantool 2.11.0
13:07:43.210 [10265] main/133/applier/admin@localhost:13302 I> remote master 00000000-0000-0000-0000-000000000000 at 127.0.0.1:13302 running Tarantool 1.10.0
13:07:43.210 [10265] main/134/applier/admin@localhost:13303 I> remote master bcce45ad-38b7-4d8a-936a-133614a7775f at 127.0.0.1:13303 running Tarantool 2.11.0
NO_WRAP
The second "Tarantool" in the output (with zero instance uuid and running Tarantool 1.10.0) is the remote control on an unconfigured tarantool instance.
Before the applier connection was split in two, this was no problem: applier would try to get the instance's ballot from the remote control listener and fail (remote control doesn't answer replication requests). Applier would retry connecting to the same address until it got a reply, which meant that remote control had stopped and the real tarantool had started listening on the socket.
Now applier has two connections, and the following situation became possible: when the applier connection is initialized, remote control is still working, and applier is connected to the remote control instance. Applier performs ballot retrieval in a separate fiber, which is not yet initialized, so no errors are raised. Then, as applier creates the ballot watcher, remote control is stopped and the real tarantool starts listening on the socket. This means that no error happens in the ballot watcher either (a normal tarantool answers replication requests, of course). And we get to an unhandled situation where applier itself is connected to the (already dead) remote control instance, while its ballot watcher is connected to the real tarantool. As soon as applier sees the ballot is fetched, it continues the connection process with the already dead remote control instance and gets an error:
NO_WRAP
13:07:44.214 [10265] main/133/applier/admin@localhost:13302 I> failed to authenticate
13:07:44.214 [10265] main/133/applier/admin@localhost:13302 coio.c:326 E> SocketError: unexpected EOF when reading from socket, called on fd 1620, aka 127.0.0.1:54150: Broken pipe
13:07:44.214 [10265] main/133/applier/admin@localhost:13302 I> will retry every 1.00 second
13:07:44.214 [10265] main/115/remote_control/127.0.0.1:50242 C> failed to synchronize with 1 out of 3 replicas
13:07:44.214 [10265] main/115/remote_control/127.0.0.1:50242 I> entering orphan mode
NO_WRAP
Follow-up #5272 Closes #8185 NO_CHANGELOG=not user-visible NO_DOC=not user-visible (can't create Tarantool with zero uuid)
-
- Jan 23, 2023
Georgiy Lebedev authored
When we roll back a prepared statement that deletes an MVCC story, we need to reset the deleted story's PSN. Closes #7930 NO_DOC=bugfix
-
Georgiy Lebedev authored
During transaction rollback, we unconditionally assign a PSN to the transaction: we should do this only when necessary, i.e., when the transaction is RW and is not already prepared. Needed for #7930 NO_CHANGELOG=refactoring NO_DOC=refactoring NO_TEST=refactoring
-
Georgiy Lebedev authored
Currently, if transaction preparation fails, the transaction is left in an inconsistent state: it has a PSN assigned to it, but its status is not 'prepared' — fix this by resetting its PSN. Needed for #7930 NO_CHANGELOG=refactoring NO_DOC=refactoring NO_TEST=refactoring
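The two PSN rules from the commits above (assign a PSN on rollback only to RW transactions that are not prepared yet; reset the PSN when preparation fails) can be sketched as follows. The struct `txn_sketch`, its fields and the global `next_psn` counter are hypothetical and only model the bookkeeping described in the messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transaction: psn == 0 means no PSN is assigned. */
struct txn_sketch {
	bool is_rw;
	bool is_prepared;
	int64_t psn;
};

/* Hypothetical global PSN counter. */
static int64_t next_psn = 1;

/* Rollback assigns a PSN only to RW transactions not yet prepared. */
void
rollback_assign_psn_sketch(struct txn_sketch *txn)
{
	if (txn->is_rw && !txn->is_prepared)
		txn->psn = next_psn++;
}

/*
 * Failed preparation resets the PSN, so the transaction is not left
 * with a PSN while its status is not "prepared".
 */
void
prepare_failed_sketch(struct txn_sketch *txn)
{
	txn->is_prepared = false;
	txn->psn = 0;
}
```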
-
Georgiy Lebedev authored
During preparation of insert statements in MVCC, we define an old story and abort all transactions that delete this story. If there exists an older story in the history chain, but the story is deleted by a prepared (not necessarily committed) transaction, we consider that it de-facto does not exist anymore — this logic is consistent, since during preparation of the transaction deleting this story, the conflict resolution described above was already done. In this manner, there can be no more than one prepared statement deleting a story at any point in time. Closes #8104 NO_DOC=bugfix
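The conflict rule for preparing an insert can be modelled in a few lines. This is a simplified sketch, not the memtx MVCC code: `txn_sketch`, `story_sketch`, `prepare_insert()` and the fixed-size deleter array are hypothetical, and a non-zero `psn` stands for "prepared".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DELETERS 8

/* Hypothetical transaction: psn != 0 means it has been prepared. */
struct txn_sketch {
	int64_t psn;
	bool is_aborted;
};

/* Hypothetical story together with the transactions deleting it. */
struct story_sketch {
	struct txn_sketch *deleters[MAX_DELETERS];
	size_t deleter_count;
};

/* True if some other prepared transaction already deletes the story. */
bool
story_is_deleted_by_prepared(const struct story_sketch *s,
			     const struct txn_sketch *self)
{
	for (size_t i = 0; i < s->deleter_count; i++) {
		const struct txn_sketch *t = s->deleters[i];
		if (t != self && t->psn != 0)
			return true;
	}
	return false;
}

/*
 * Prepare an insert whose old story is `old_story`. A story deleted by
 * a prepared transaction is treated as non-existent: its other deleters
 * were already aborted when that transaction was prepared. Otherwise,
 * abort every other transaction deleting the old story.
 * Returns the number of transactions aborted by this call.
 */
size_t
prepare_insert(struct story_sketch *old_story, const struct txn_sketch *self)
{
	if (old_story == NULL || story_is_deleted_by_prepared(old_story, self))
		return 0;
	size_t aborted = 0;
	for (size_t i = 0; i < old_story->deleter_count; i++) {
		struct txn_sketch *t = old_story->deleters[i];
		if (t != self && !t->is_aborted) {
			t->is_aborted = true;
			aborted++;
		}
	}
	return aborted;
}
```

Because every prepare either aborts all competing deleters or finds the story already "gone", at most one prepared statement deletes a given story at a time.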
-
- Jan 19, 2023
Vladislav Shpilevoy authored
In a few places visible to users, and in iproto naming, the term "cluster" really means "replicaset". One of those places is a part of the public API - box.iproto.key.CLUSTER_UUID - which is not yet released. The commit renames "cluster" in those places as a preparation for the introduction of an actual "cluster", i.e. a set of replicasets. It will start with the introduction of a cluster name in addition to the replicaset uuid/name. There are places which still mention 'cluster', but renaming them would be a breaking change. This will be addressed in the scope of a bigger patchset. Part of #5029 NO_CHANGELOG=Was not released @TarantoolBot document Title: Rename `IPROTO_CLUSTER_UUID` to `IPROTO_REPLICASET_UUID` This is a name for one of the IProto keys. The key value doesn't change and the protocol is still backward compatible. But it is better to refer to it as `IPROTO_REPLICASET_UUID`, because in the future `IPROTO_CLUSTER_UUID` will most likely mean a different thing.
-
Serge Petrenko authored
The event is used by appliers as a better alternative to IPROTO_VOTE. Besides, event subscribers receive exactly the same payload as the ones sending IPROTO_VOTE. So there's no need to guard against subscription to this particular event as long as IPROTO_VOTE isn't guarded. Follow-up #5272 NO_DOC=no user-visible changes NO_CHANGELOG=no user-visible changes NO_TEST=tested by ee
-
- Jan 18, 2023
Ilya Verbin authored
The function remove_root_directory, which is used for obtaining module names for per-module logging, throws an error when the current working directory is `/'. Rewrite it to fix the bug and rename it to strip_cwd_from_path to make the name clearer. Closes #8158 NO_DOC=unreleased NO_CHANGELOG=unreleased
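One way to strip a working-directory prefix without special-casing `/' is to drop trailing slashes from the cwd first, so `/' becomes an empty prefix. This is a sketch under that assumption; `strip_cwd_sketch()` and its signature are hypothetical and are not the actual strip_cwd_from_path implementation.

```c
#include <assert.h>
#include <string.h>

/*
 * Return `path` relative to `cwd` if `path` lies under `cwd`, otherwise
 * return `path` unchanged. Trailing slashes of `cwd` are ignored, so
 * cwd == "/" reduces to an empty prefix instead of being a special case.
 */
const char *
strip_cwd_sketch(const char *cwd, const char *path)
{
	size_t len = strlen(cwd);
	while (len > 0 && cwd[len - 1] == '/')
		len--;
	/* The prefix must match and be followed by a path separator. */
	if (strncmp(path, cwd, len) != 0 || path[len] != '/')
		return path;
	return path + len + 1;
}
```

The separator check keeps `/home/u` from matching `/home/user2/...`.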
-
Serge Petrenko authored
See the docbot request for details. Closes #5272 @TarantoolBot document Title: new `bootstrap_strategy` configuration option
The default behaviour of replica set bootstrap, replica recovery when connecting to remote nodes, and replication reconfiguration has changed. The new behaviour is controlled by the option `bootstrap_strategy`, which has the default value "auto". Now the `replication_connect_quorum` configuration option has no effect, and the effective quorum value for each stage of configuration (quorum of established connections, quorum of synced nodes) is determined automatically.
On replica set bootstrap, the nodes will refuse to boot unless a majority is reached (for example, this would mean replication_connect_quorum = 3 when #box.cfg.replication is 4 or 5, or replication_connect_quorum = 2 when #box.cfg.replication is 2 or 3). Moreover, the bootstrap leader will fail to boot unless it sees that every connected node chose it as the bootstrap leader.
When a new replica joins an existing cluster, the replica will fail to boot only if it couldn't connect to anyone. As long as at least one connection is established, the replica will try to join like before. Moreover, the replica will check that its box.cfg.replication table contains every registered node in the cluster, thus ensuring that it has tried to connect to everyone and has chosen the best bootstrap leader possible.
On replication reconfiguration on a working instance and on recovery from local WAL files, the node will try to connect to everyone specified in box.cfg.replication. Any number of connections (even no connections) will be deemed a success, but the replica will stay in orphan mode until it is synced with everyone connected.
If you wish to return to the old behaviour, a deprecated setting `bootstrap_strategy` = "legacy" is left for now. With `bootstrap_strategy` = "legacy", the node behaves exactly like before: quorum for both connection and synchronisation is determined by `replication_connect_quorum`, and neither the bootstrap leader nor joining replicas perform any additional checks on bootstrap.
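The majority in the bootstrap examples (2 for 2 or 3 entries in box.cfg.replication, 3 for 4 or 5) is plain integer arithmetic. A minimal sketch; the function name `bootstrap_majority()` is hypothetical and this is not Tarantool's code.

```c
#include <assert.h>

/*
 * Majority of `replication_count` peers, where replication_count is the
 * number of entries in box.cfg.replication.
 */
int
bootstrap_majority(int replication_count)
{
	return replication_count / 2 + 1;
}
```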
-
Serge Petrenko authored
The only observable behaviour of non-zero replication_sync_timeout is that it delays the return from box.cfg{replication=...} until either the node is synced with others or the timeout passes. If the timeout passes without reaching sync, box.cfg{} returns and the node enters "orphan" state, in which it can't write anything until either a reconfiguration happens or the replicaset is finally synced. While the previous box.cfg{} call is running (probably waiting for replication_sync_timeout), the user can't issue another box.cfg{} call. So basically, while giving no guarantees that the node exits box.cfg{} in a fully synced state, the timeout makes reconfiguration harder: even if the user knows that the sync won't be achieved, they will have to wait until the full timeout passes in order to reconfigure replication. Let's make the default value of replication_sync_timeout 0 instead of 300 seconds. The user may still set the timeout to whatever they like. Besides, we have recently introduced box.ctl.on_recovery_state triggers, which have a "synced" event, and this is the new recommended way to wait until the node is synced with others. Part-of #5272 @TarantoolBot document Title: Changed default value for `box.cfg.replication_sync_timeout` The default value for the `replication_sync_timeout` configuration option was changed from 300 seconds to 0.
-
Serge Petrenko authored
Now the instance appends a list of registered replica set members it knows of to its ballot. Prerequisite #5272 NO_CHANGELOG=not user-visible @TarantoolBot document Title: New fields in instance's ballot. Instance's ballot (a response to IPROTO_VOTE sent on replica connect) receives two new fields: 1) The uuid of the node this instance considers the bootstrap leader. Key: IPROTO_BALLOT_BOOTSTRAP_LEADER_UUID = 0x08 Value: uuid, encoded as a 36-byte string (like "bfd2b31c-b740-43e5-bf3c-28538a74c9a6"). 2) An array of uuids of registered replica set members. Key: IPROTO_BALLOT_REGISTERED_REPLICA_UUIDS = 0x09 Value: an MP_ARRAY of uuids, each uuid encoded as a 36-byte string (as in the example above).
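Per the MessagePack specification, a 36-byte string does not fit a fixstr (at most 31 bytes), so each uuid in these fields takes a str 8 header. The sketch below hand-rolls that encoding to show the byte layout; the helper names are hypothetical, and real Tarantool code would normally go through the msgpuck library (e.g. mp_encode_str) instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_STR_LEN 36

/* Size of a MsgPack array header holding `n` elements. */
size_t
sketch_sizeof_array(uint32_t n)
{
	if (n <= 15)
		return 1; /* fixarray */
	if (n <= UINT16_MAX)
		return 3; /* array 16 */
	return 5;         /* array 32 */
}

/* Encode a uuid string as str 8: 0xd9, a 1-byte length, then the text. */
char *
sketch_encode_uuid_str(char *buf, const char *uuid)
{
	*buf++ = (char)0xd9;
	*buf++ = (char)UUID_STR_LEN;
	memcpy(buf, uuid, UUID_STR_LEN);
	return buf + UUID_STR_LEN;
}

/* Encoded size of an MP_ARRAY of `count` uuid strings. */
size_t
sketch_sizeof_uuid_array(uint32_t count)
{
	return sketch_sizeof_array(count) + count * (2 + UUID_STR_LEN);
}
```

Each uuid therefore costs 38 bytes, and a three-member list costs 1 + 3 * 38 = 115 bytes.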
-
Serge Petrenko authored
Note that bootstrap leader uuid is not set when an anonymous replica registers, because technically it's not performing a bootstrap. Prerequisite #5272 NO_DOC=appended to next commit's doc request NO_CHANGELOG=not user-visible
-
Serge Petrenko authored
Previously replicas chose the remote master to boot from by comparing master ballots, which are received in response to an IPROTO_VOTE request right on connection init. Such information is not enough in some scenarios. For example, when implementing anonymous replicas and retrying replica join, we had to restart all connections in order to get the latest ballot information. Let's change that: make the replica subscribe to the built-in "internal.ballot" event instead of relying on the request-response scheme of IPROTO_VOTE. Now replicas always have up-to-date ballot information and there is no need to reinitialize connections to update the ballots. Introduce a new fiber running in the tx thread for this purpose: applier ballot watcher. The fiber subscribes to the "internal.ballot" event and watches it all the time while the connection to the master is alive. In case the master isn't aware of the IPROTO_WATCH request or of the "internal.ballot" event, the old behaviour is also implemented: the ballot watcher simply waits for the IPROTO_VOTE response and exits. The ballot watcher is started whenever the replica tries to connect or reconnect to the remote master and is cancelled whenever its parent connection to the master is closed. We do not put much effort into restarting the fiber and retrying to connect in case it fails. For now ballot info is only used during bootstrap, and not trying to keep the fiber alive at all costs simplifies the code quite a lot. Later on ballot subscriptions will play a more significant role in choosing the bootstrap leader: replicas will re-check remote ballots every now and then during the bootstrap leader election. Part-of #5272 NO_CHANGELOG=internal change NO_TEST=tested by existing replication tests NO_DOC=internal change
-
Serge Petrenko authored
Extract common connection initialization code in a helper. It'll be used in the next commit by auxiliary fibers connecting to the same master. Part-of #5272 NO_CHANGELOG=refactoring NO_TEST=refactoring NO_DOC=refactoring
-
Serge Petrenko authored
Extract ballot body decoding logic from xrow_decode_ballot, it will be reused to decode "internal.ballot" event in the next commit. Prerequisite #5272 NO_CHANGELOG=refactoring NO_TEST=refactoring NO_DOC=refactoring
-
Serge Petrenko authored
Add a new builtin event carrying instance's ballot information (that is, what this instance would normally send in reply to IPROTO_VOTE request). The event will be watched by connecting replicas to find the bootstrap leader. Prerequisite #5272 NO_DOC=technically user-visible, but not intended for users NO_CHANGELOG=see NO_DOC
-
Serge Petrenko authored
In-scope-of #5272 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Serge Petrenko authored
box.iproto table with iproto features and constants was exported to Lua in commit fe89aabe ("box: export IPROTO constants and features to Lua"). Add the table to the whitelist of what's available even before box.cfg. Prerequisite #5272 Closes #8053 NO_DOC=intermediate state wasn't released, no changes necessary NO_CHANGELOG=see NO_DOC NO_TEST=used in next commit's tests
-
Serge Petrenko authored
Extract mp_sizeof_ballot_max() and mp_encode_ballot() helpers from iproto_reply_vote(), since they will be used by the builtin "internal.ballot" event soon. While I'm at it, fix the mp_sizeof_ballot() calculation: add a forgotten map element and replace mp_sizeof_uint(UINT*_MAX) with sizes of the actual values to be encoded. Prerequisite #5272 NO_CHANGELOG=refactoring NO_TEST=refactoring NO_DOC=refactoring
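Why sizing with UINT*_MAX differs from sizing the actual values follows from MsgPack's variable-length unsigned integer encoding. The sketch below applies the MessagePack specification's size rules (the same rules msgpuck's mp_sizeof_uint follows); the name `sketch_sizeof_uint()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode an unsigned integer in MsgPack. */
size_t
sketch_sizeof_uint(uint64_t v)
{
	if (v <= 127)
		return 1; /* positive fixint */
	if (v <= UINT8_MAX)
		return 2; /* uint 8 */
	if (v <= UINT16_MAX)
		return 3; /* uint 16 */
	if (v <= UINT32_MAX)
		return 5; /* uint 32 */
	return 9;         /* uint 64 */
}
```

Sizing a field with UINT64_MAX always reserves 9 bytes, even for a value such as 1 that needs only one.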
-
Georgiy Lebedev authored
Prompt bookkeeping introduced in 66ca6252 is not thread-safe, whilst the logging environment is multithreaded: keep this feature only in the main (transaction) thread. Closes #8124 NO_CHANGELOG=<gh-7169 was not released yet> NO_DOC=bugfix NO_TEST=<hard to make Tarantool flood log from multiple threads using current test harness>
-
Georgiy Lebedev authored
Both of the callbacks in the `print` wrapper are expected to be called, but `print` may throw errors, e.g., `print(setmetatable({}, {__tostring = error}))`, so we need to call it in a protected environment and execute the 'after' callback even if `print` throws. Closes #8136 NO_CHANGELOG=<gh-7186 was not released yet> NO_DOC=bugfix
-
- Jan 17, 2023
Serge Petrenko authored
Every libunwind error during backtrace collection is reported with `say_error`. Since commit 19abfd2a ("misc: get rid of fiber_gc") backtraces are collected on each fiber gc allocation, of which there are plenty. For some reason (https://github.com/tarantool/tarantool/issues/7980) each unw_step fails on macOS, and an error is spammed to instance logs, even though the backtrace is actually collected. Silence the errors, since there is not much use for them anyway, and silence all of them just to be consistent. This doesn't close #7980, because that issue still needs a proper fix, although its severity is reduced now. In-scope-of #7980 NO_DOC=bugfix NO_CHANGELOG=bugfix NO_TEST=nothing to test
-
- Jan 16, 2023
Vladimir Davydov authored
lbox_push_event_f and lbox_push_event_f callback functions used for passing the statement between txn and space on/before replace Lua triggers don't assume that the transaction may be aborted by yield after the current statement began (this may happen if a trigger callback yields). In this case, all statements in txn would be rolled back and txn_current_stmt would return NULL, leading to a crash. Let's fix this by checking if the transaction is still active and raising an error immediately if it isn't, thus skipping Lua triggers. Notes: - We merged lbox_pop_txn_stmt_and_check_format into lbox_pop_txn_stmt, because the latter is only called by the former. - Since lbox_push_event_f callback may now fail, we have to update lbox_trigger_run to handle it. Closes #8027 NO_DOC=bug fix
-
- Jan 13, 2023
Vladimir Davydov authored
A remote space object presented by a net.box connection mimics the API of a local space object presented by box.space. Currently, it lacks information about sequences. Let's add it. Note that we have to handle the case when the recently introduced _vspace_sequence system space view is missing on the remote host. To check that this works correctly, we reuse the 2.10.4 test data created by commit 1c33484d ("box: add auth_history and last_modified fields to _user space"). We also add a 'gen.lua' script that can be used to regenerate the data. Closes #7858 NO_DOC=bug fix
-
Vladimir Davydov authored
Note, this patch will be backported to 2.10, so we add an upgrade function for 2.10.5, not for 2.11.0. Needed for #7858 @TarantoolBot document Title: Document `_space_sequence` and `_vspace_sequence` system spaces The `_space_sequence` system space was added a long time ago (in 1.7.5) along with the `_sequence` and `_sequence_data` system spaces, but it was never documented. The space is used to attach sequences to spaces and has the following fields:
1. 'id', type 'unsigned'. Space id.
2. 'sequence_id', type 'unsigned'. Id of the attached sequence.
3. 'is_generated', type 'boolean'. True if the sequence was created automatically (`space:create_index('pk', {sequence = true})`).
4. 'field', type 'unsigned'. Id of the space field that is set using the attached sequence.
5. 'path', type 'string'. Path to the data within the field that is set using the attached sequence.
The `_vspace_sequence` is a system space view of the `_space_sequence` space that, like any other system space view, shows only rows accessible by the current user. It will be introduced in Tarantool 2.10.5.
-
Vladimir Davydov authored
Since commit 85ebbcc0 ("box: reset system space formats for bootstrap"), it's illegal to use field names in the upgrade script. The commit missed one place: upgrade of collation strength. Fix it. NO_DOC=unreleased NO_CHANGELOG=unreleased NO_TEST=checked by upgrade script tests
-
Aleksandr Lyapunov authored
If the first argument of box.atomic is a non-callable table, treat it as an options table for box.begin{}. For test and debug purposes, introduce an internal getter of the current transaction isolation level: box.internal.txn_isolation(). Closes #7202 @TarantoolBot document Title: Options in box.atomic Now it's allowed to pass transaction options in the first argument of a box.atomic(..) call. The options must be a table, exactly as in box.begin(..). If options are passed as the first argument, the second argument is expected to be a function and the rest are its arguments, like in a usual box.atomic call.
-
Aleksandr Lyapunov authored
When a transaction is in the read-confirmed state, it must ignore all prepared changes, and if it actually ignores something, it must fall to the read-view state. By mistake, the check relied not on the actual skipping of a prepared statement, but on the mere existence of a deleting statement. That leads to excess conflicts for transactions with the read-committed isolation level. Fix it by raising a conflict only if a deleting statement is skipped. Closes #8122 Needed for #7202 NO_DOC=bugfix
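The difference between the old and the fixed check can be shown on a simplified model where the deleting statement has one of a few states. This is a hypothetical sketch, not memtx code: `enum del_stmt_state` and both functions are illustrative names, and the model ignores other isolation-level details.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of the statement that deletes a story. */
enum del_stmt_state {
	DEL_NONE,        /* no deleting statement */
	DEL_IN_PROGRESS, /* deleting tx is not prepared yet */
	DEL_PREPARED,    /* prepared, not yet confirmed */
	DEL_CONFIRMED,   /* committed and confirmed */
};

/* Buggy check: any deleting statement sends the reader to a read view. */
bool
must_enter_read_view_old(enum del_stmt_state s)
{
	return s != DEL_NONE;
}

/*
 * Fixed check: a read-confirmed reader enters a read view only if it
 * actually skips a prepared (not yet confirmed) deletion.
 */
bool
must_enter_read_view(enum del_stmt_state s)
{
	return s == DEL_PREPARED;
}
```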
-
Aleksandr Lyapunov authored
Read lists (the read set and other similar lists) are used only for detecting a conflict when another transaction is committed. Once a transaction is prepared (whether successfully or not), those lists are no longer needed. Moreover, some parts of the code expect that an already prepared tx has no read set. So let's clean those lists once a transaction is prepared. Closes #7945 NO_DOC=bugfix
-
Aleksandr Lyapunov authored
There were two functions - check_dup_clean and check_dup_dirty. Merge them into one. Also extract phantom checks from check_dup and call them explicitly. That will additionally simplify check_dup and will allow us to get rid of temporary conflict trackers - memtx_tx_conflict. Note that this kind of object will remain in memory monitoring for now. It will be removed later. No logical changes. Part of #8122 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Aleksandr Lyapunov authored
If a transaction writes to a gap that is tracked by another transaction, the interval must (usually) be broken into parts, and the new story must be explicitly added to the read set of the reading transaction. Now both a read tracker and a conflict tracker are set in this case. But the read tracker is enough: when the writing tx is prepared, the reading transaction will get a conflict, and that's all we need. Let's leave only the read tracker. Part of #8122 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Aleksandr Lyapunov authored
Each story has two ends: the beginning and the end. For each transaction, either end of a story can be visible or not. Now there's a function that checks the visibility of both ends of a story. It can distinguish three cases: both ends are visible, both ends are invisible, and the beginning is visible while the end is not. The function returns true in the first and the last cases; the actual case is clarified with an additional function argument, visible_tuple, which is set to null in one of the cases. Let's make two different functions for checking the visibility of the beginning and the end of a story. This is simply a split of the function into two parts. The visible_tuple argument will no longer be needed. No logical changes. Part of #8122 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
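With two separate predicates, tuple visibility becomes "the beginning is visible and the end is not". The sketch below assumes a simplified model in which an end is visible to a reader if it was prepared with a PSN below the reader's read-view PSN; `story_sketch`, `rv_psn` and the function names are hypothetical, and real memtx visibility also depends on the reader's own statements and isolation level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical story: PSNs of the creating and deleting transactions;
 * 0 means "not prepared (yet)".
 */
struct story_sketch {
	int64_t add_psn;
	int64_t del_psn;
};

/* An end is visible if it was prepared before the reader's snapshot. */
static bool
psn_is_visible(int64_t psn, int64_t rv_psn)
{
	return psn != 0 && psn < rv_psn;
}

bool
story_begin_is_visible(const struct story_sketch *s, int64_t rv_psn)
{
	return psn_is_visible(s->add_psn, rv_psn);
}

bool
story_end_is_visible(const struct story_sketch *s, int64_t rv_psn)
{
	return psn_is_visible(s->del_psn, rv_psn);
}

/* The tuple is visible iff its beginning is visible and its end is not. */
bool
story_tuple_is_visible(const struct story_sketch *s, int64_t rv_psn)
{
	return story_begin_is_visible(s, rv_psn) &&
	       !story_end_is_visible(s, rv_psn);
}
```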
-
Aleksandr Lyapunov authored
There's no harm but also no sense in it. Part of #8122 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Aleksandr Lyapunov authored
Hide structures and functions that are not required for API. No logical changes. Part of #8122 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Aleksandr Lyapunov authored
Option memtx_tx_manager_use_mvcc_engine changes the behaviour of the transaction execution workflow. Usually that is implemented as a direct check of memtx_tx_manager_use_mvcc_engine. But there are places in the code that rely on the fact that some pointers are non-null if the engine is enabled. That's a bit confusing. Let's always check the memtx_tx_manager_use_mvcc_engine option when we need to determine which workflow must be executed. Note that checking the memtx_tx_manager_use_mvcc_engine option is more correct: in case of a delete of nothing (a delete statement when no tuple was found by the given key) all the pointers, including old_tuple and new_tuple, are null, while logically we still need to use the mvcc execution workflow. Note also that in this case the mvcc engine does (and must do) almost nothing, so there was no bug in the previous behaviour. Part of #8122 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Aleksandr Lyapunov authored
No logical changes. Part of #8122 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
- Jan 12, 2023
Vladimir Davydov authored
The status isn't used anywhere - to set the proper error when someone attempts to use an aborted transaction, we check transaction flags (TXN_IS_CONFLICTED, TXN_IS_ABORTED_BY_YIELD, TXN_IS_ABORTED_BY_TIMEOUT). Let's use TXN_ABORTED instead. While we are at it, also set the transaction status to TXN_ABORTED when a transaction is aborted by yield or timeout and use it instead of checking flags where appropriate, since it's more convenient. Follow-up #8123 NO_DOC=code cleanup NO_TEST=code cleanup NO_CHANGELOG=code cleanup
-
Vladimir Davydov authored
We fail write statements if the current transaction was aborted by yield or timeout. We should fail read-only statements in this case, as well. Note, we already fail read-only statements if the current transaction was aborted by conflict. Closes #8123 NO_DOC=bug fix
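After these two commits, one status check can guard every statement, read-only or not, with the flags only selecting the error. This is a hypothetical sketch: the status and flag names mirror the commit messages, but their values, the structs and `txn_check_can_continue_sketch()` are illustrative, not Tarantool's definitions.

```c
#include <assert.h>
#include <stdbool.h>

enum txn_status_sketch { TXN_INPROGRESS, TXN_ABORTED };

/* Flag names follow the commit message; the values are arbitrary. */
enum {
	TXN_IS_CONFLICTED = 1 << 0,
	TXN_IS_ABORTED_BY_YIELD = 1 << 1,
	TXN_IS_ABORTED_BY_TIMEOUT = 1 << 2,
};

enum txn_check_result {
	TXN_CHECK_OK,
	TXN_CHECK_CONFLICT,
	TXN_CHECK_ABORTED_BY_YIELD,
	TXN_CHECK_TIMEOUT,
	TXN_CHECK_ABORTED, /* aborted for some other reason */
};

struct txn_sketch {
	enum txn_status_sketch status;
	unsigned flags;
};

/*
 * Called before every statement, read-only or not: an aborted
 * transaction fails with an error describing why it was aborted.
 */
enum txn_check_result
txn_check_can_continue_sketch(const struct txn_sketch *txn)
{
	if (txn->status != TXN_ABORTED)
		return TXN_CHECK_OK;
	if (txn->flags & TXN_IS_CONFLICTED)
		return TXN_CHECK_CONFLICT;
	if (txn->flags & TXN_IS_ABORTED_BY_YIELD)
		return TXN_CHECK_ABORTED_BY_YIELD;
	if (txn->flags & TXN_IS_ABORTED_BY_TIMEOUT)
		return TXN_CHECK_TIMEOUT;
	return TXN_CHECK_ABORTED;
}
```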
-