- Dec 07, 2023
Nikolay Shirokovskiy authored
We need to call `tx_accept_msg` in `tx_process_override` before we pass the message to the override handler. Unfortunately, if the handler responds with IPROTO_HANDLER_FALLBACK, we call the built-in handler for the message, which calls `tx_accept_msg` again, and that is not expected. Some actions of this function are idempotent and some are not. Let's make the function a NOP if it is called a second time. Closes #9345 NO_DOC=bugfix
- Dec 06, 2023
Alexander Turenko authored
This commit allows bootstrapping an anonymous replica from a replicaset where all the instances are in read-only mode. The reason for the change is that there are no technical reasons to forbid this action. An anonymous replica is not registered in the `_cluster` system space, so it can join a replicaset even if there are no writable instances.

Fixes #9432

@TarantoolBot document
Title: config: anonymous replica is now supported

The `replication.anon: true` option now works. There are configuration constraints related to anonymous replicas:

* A replicaset must contain at least one non-anonymous instance.
* An anonymous replica can't be configured as a writable instance using the `database.mode` or `<replicaset>.leader` options.
* An anonymous replica can't be configured with `replication.election_mode` set to `candidate`, `voter` or `manual` (only `off` is allowed).

A few more nuances about anonymous replicas:

* Anonymous replicas are filtered out from the default upstream list.
* A `replication.failover: election` replicaset can contain anonymous replicas, but `replication.election_mode` defaults to `off` for them (unlike non-anonymous instances, where the default is `candidate`).
* `replication.failover: supervised` skips anonymous replicas when choosing a bootstrap leader.
* An anonymous replica can join a replicaset where all the instances are in read-only mode (unlike a non-anonymous instance).

See details in [1] and [2].

[1]: https://github.com/tarantool/tarantool/issues/9432
[2]: https://github.com/tarantool/tarantool/pull/9418
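The declarative `replication.anon` option maps to the box-level `replication_anon` setting. A minimal sketch of the box-level equivalent (the upstream URI is illustrative; an anonymous replica must also be read-only):

```Lua
-- An anonymous replica is not registered in _cluster and must stay read-only.
box.cfg({
    replication_anon = true,
    read_only = true,
    -- Upstream to fetch data from; after this change even a read-only
    -- replicaset member works as a bootstrap source.
    replication = {'replicator:password@127.0.0.1:3301'},
})
```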
Alexander Turenko authored
This commit adds several checks that are specific to the `replication.failover` mode:

* `replication.failover: off`: an anonymous replica shouldn't be set to read-write mode using the `database.mode` option.
* `replication.failover: manual`: an anonymous replica shouldn't be configured as a replicaset leader using the `<replicaset>.leader` option.
* `replication.failover: election`: an anonymous replica can't be configured with a `replication.election_mode` other than `off`.

This commit also adjusts the default `replication.election_mode` to `off` for an anonymous replica if it is part of a `replication.failover: election` replicaset (the default for a non-anonymous instance is `candidate`).

Part of #9432

NO_DOC=The documentation request is in the last commit of the series.
Alexander Turenko authored
A replicaset that contains only anonymous replicas can't be bootstrapped, because anonymous replicas must be in read-only mode. Part of #9432 NO_DOC=The documentation request is in the last commit of the series.
Alexander Turenko authored
Filter out anonymous replicas when choosing a bootstrap leader in `replication.failover: supervised` mode. An anonymous replica can't be in read-write mode, so it can't be a replicaset bootstrap leader. Part of #9432 NO_DOC=It is a bugfix. However, this detail is mentioned in the documentation request in the last commit of the series, just in case.
Alexander Turenko authored
This commit effectively allows setting `replication.anon: true` without specifying `replication.peers`. Without filtering out anonymous replicas from the list of upstreams, we get an error about an attempt to use an anonymous replica as an upstream for a non-anonymous instance. Also, anonymous replicas are excluded from the autogenerated upstream list for other anonymous replicas. This makes the list the same on all the peers. A user can configure a custom data flow using the `replication.peers` option. Part of #9432 NO_DOC=The documentation request is in the last commit of the series.
Alexander Turenko authored
The commit effectively enables support of anonymous replicas in the declarative configuration. It has several caveats (see the changelog entry), which will be resolved in the following commits of the patchset. An attempt to persist the instance name of an anonymous replica can't succeed, because such a replica has no entry in the `_cluster` system space. Such an attempt leads to an ER_INSTANCE_NAME_MISMATCH error. This commit patches the configuration applying logic to skip the `box.cfg({instance_name = <...>})` call if the instance is configured as an anonymous replica using the `replication.anon: true` option. Part of #9432 NO_DOC=The replication.anon option is already documented in the scope of https://github.com/tarantool/doc/issues/3851. The bugfix shouldn't affect the documentation pages much; however, the related constraints are summarized in a documentation request in the last commit of the series.
Alexander Turenko authored
NO_DOC=code owners file change, no code changes NO_TEST=see NO_DOC NO_CHANGELOG=see NO_DOC
Sergey Ostanevich authored
Some changes are not new features but rather developer tooling updates and improvements. There are also a number of tweaks we can introduce to improve testing and backporting of tests across branches, which are considered neither a feature nor a bug fix. NO_DOC=changelog NO_TEST=changelog NO_CHANGELOG=changelog
Nikita Zheleztsov authored
Sometimes the test fails with a "Peer closed" error, and the log says that a fatal error happened: cfg_get('read_only'). This is caused by the instance processing its own ballot. The problem is that we set box.cfg to a function in the before_all trigger, so it's impossible to access the box.cfg table during the whole time of the test execution. Let's instead set box.cfg to a function at the start of every test and restore box.cfg at the end. This way we decrease the window in which such a fatal error can happen. Even though it's still possible to hit it in theory, the problem is not reproduced anymore. The alternative solution of introducing an error injection seems to be overkill here. Closes tarantool/tarantool-qa#329 NO_DOC=testfix NO_CHANGELOG=testfix
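A sketch of the described pattern (luatest-style hooks; the stub body is illustrative and the actual test wiring is omitted):

```Lua
local t = require('luatest')
local g = t.group()

-- Shadow box.cfg only for the duration of each test case.
g.before_each(function(cg)
    cg.saved_box_cfg = box.cfg
    box.cfg = function(cfg)
        -- test-specific stub goes here
    end
end)

-- Restore the real box.cfg so other code can read the table again.
g.after_each(function(cg)
    box.cfg = cg.saved_box_cfg
end)
```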
Nikita Zheleztsov authored
We need to apply instance/replicaset_name as soon as the instance becomes RW, so currently we try to do so at every box.status broadcast. Even though the broadcast happens pretty often, it's not enough. The bug is reproduced in config-luatest/set_names_reload, which checks the following situation:

1. The cluster is recovered from the xlogs without names set.
2. The user forgets to set the UUID for one replica and starts the cluster.
3. The replica whose UUID has not been set fails to start.
4. The user notices that and updates the config, reloading it on the instances that succeeded to start and starting the failed one.
5. The master must apply the name for the failed replica.

The test worked all right in the majority of runs, because the box.status broadcast happens often: e.g. it's broadcast when the master's applier syncs with a replica. However, under heavy CPU load the test sometimes failed, when the master fails to subscribe to the replica and the broadcast doesn't happen. Let's set names not only when box.status is broadcast, but also immediately after reload, as new names that must be set might appear at this time. Let's also change the test so that it doesn't rely on the broadcast anymore.

Closes tarantool/tarantool-qa#328

NO_DOC=bugfix
Alexander Turenko authored
See the details in the documentation request below.

Fixes #9431

@TarantoolBot document
Title: config: failover mode and election mode consistency

`replication.failover: election` enables the RAFT-based leader election mechanism on a replicaset. The instances can be configured in the following election modes: `off`, `candidate`, `voter`, `manual`. This is controlled by the `replication.election_mode` parameter. However, the election mode parameter is meaningless and confusing for the other failover modes (`off`, `manual`, `supervised`). So, it is forbidden to set election modes other than `off` in failover modes other than `election`. Summary:

* `replication.failover: off`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: FAIL
  * `replication.election_mode: voter`: FAIL
  * `replication.election_mode: manual`: FAIL
* `replication.failover: manual`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: FAIL
  * `replication.election_mode: voter`: FAIL
  * `replication.election_mode: manual`: FAIL
* `replication.failover: election`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: OK
  * `replication.election_mode: voter`: OK
  * `replication.election_mode: manual`: OK
* `replication.failover: supervised`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: FAIL
  * `replication.election_mode: voter`: FAIL
  * `replication.election_mode: manual`: FAIL
Alexander Turenko authored
The next commit adds one more check, and the resulting snippet of code becomes too large to be part of the constructor function. NO_DOC=refactoring NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
Alexander Turenko authored
It allows starting a replicaset from a given declarative configuration, performing checks on all or particular instances, adding a new instance into the replicaset (without stopping existing instances), and updating the config file and reloading it on the instances. NO_DOC=testing helper NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
Alexander Turenko authored
The module now has only one function, which allows reducing the boilerplate code needed to verify a scenario in which all instances of a replicaset should fail with the same error message. The module will be extended with replicaset management functions later. NO_DOC=testing helper NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
Alexander Turenko authored
It allows constructing a declarative configuration for a test case with less boilerplate code/options, especially when a replicaset is to be tested rather than a single instance. See the description at the top of the file for details. NO_DOC=testing helper NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
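A hypothetical usage sketch of such a builder-style helper (the module path and method names are assumptions for illustration, not necessarily the helper's actual API):

```Lua
-- Build a two-instance replicaset config with a fixed leader using a
-- fluent builder instead of spelling out the whole configuration table.
local cbuilder = require('test.config-luatest.cbuilder')

local config = cbuilder.new()
    :set_replicaset_option('replication.failover', 'manual')
    :set_replicaset_option('leader', 'instance-001')
    :add_instance('instance-001', {})
    :add_instance('instance-002', {})
    :config()
```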
- Dec 05, 2023
Serge Petrenko authored
Relay sometimes decodes the PROMOTE packets to be sent in order to conditionally delay their dispatch. It was believed that the instance's own WALs can't be corrupted and hence there is no point in checking that decoding succeeds. Let's be more strict here and check the decoding result before proceeding. Closes #9265 NO_DOC=bugfix NO_TEST=hard to test NO_CHANGELOG=not user-visible
Mergen Imeev authored
Closes #9435

@TarantoolBot document
Title: `login` and `password` fields in the URI table

Now `uri.parse()` can parse a table containing the `login` and `password` fields. Values from these fields take precedence over the values obtained from the URI string. For example, the login and password of `{uri = 'one:two@localhost:3301', login = 'alpha', password = 'omega'}` will be `alpha` and `omega` respectively. If the `login` field is set and the `password` field is not set, the password is set to `nil`. If the `password` field is set, the `login` field must be present.
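A minimal sketch of the described behavior (the result fields follow the `uri.parse()` result table):

```Lua
local uri = require('uri')

-- The table fields override the credentials embedded in the URI string.
local u = uri.parse({
    uri = 'one:two@localhost:3301',
    login = 'alpha',
    password = 'omega',
})
-- u.login == 'alpha', u.password == 'omega'

-- With only `login` set, the password from the string is dropped as well.
local v = uri.parse({uri = 'one:two@localhost:3301', login = 'alpha'})
-- v.login == 'alpha', v.password == nil
```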
Magomed Kostoev authored
Before this patch it could happen that deletion of a function from the function cache didn't delete it from the funcs_by_name map. The reason is that the check whether the function exists in the map compares the search result with the `end` bucket ID of the wrong hash table. The situation in which this could happen is the following:

1. Insertion of a new function into the cache triggers a resize of the funcs_by_value map, but the size of the funcs map remains the same.
2. Then the user deletes a function. This removes the function from the funcs map. Then we check if the function exists in the funcs_by_value map. The function exists there, but it so happens that its bucket ID equals the funcs map bucket count, so the incorrect check whether the function exists in the funcs_by_value map states that the function does not exist there, and it's not dropped from the map.
3. Now we have the following result: the function is referenced in the funcs_by_value map but not in the funcs map. This triggers the assertion failure on any attempt to insert a new function with the same name.

Closes #9426

NO_DOC=bugfix
- Dec 04, 2023
Mergen Imeev authored
This patch adds support for automatic master discovery in vshard. There is no longer a need to reapply the vshard storage configuration every time an instance becomes a master or ceases to be one. Automatic master discovery also solves the problem with the rebalancer. Previously, the rebalancer could not work correctly if the masters of some replicasets were unknown, and since the vshard config generated by the config module did not contain information about all the masters, the rebalancer was disabled. The rebalancer can now perform master discovery on its own, which is why it is now enabled. Before this patch, the config module was based on the "test: use dofile() for configs instead of require" commit in vshard. At a minimum, the commit "storage: fix assertion error in conn_manager" is now required. Part of #8862 NO_DOC=will be added along with documentation for the rebalancer role NO_CHANGELOG=will be added along with changelog for the rebalancer role
Maxim Kokryashkin authored
This module became unused as a result of the LuaJIT bump made in commit 88333d13 ("luajit: bump new version"), so it can be purged safely from the Tarantool sources. Part of #8700 NO_DOC=internal NO_TEST=internal NO_CHANGELOG=added within the aforementioned commit
- Nov 30, 2023
Serge Petrenko authored
The current split-brain detector implementation raises an error each time a CONFIRM or ROLLBACK entry is received from the previous synchronous transaction queue owner. It is assumed that the new queue owner must have witnessed all the previous CONFIRMs. Besides, according to Raft, a ROLLBACK should never happen. Actually, there is a case when a CONFIRM from an old term is legal: it's possible that during a leader transition the old leader writes a CONFIRM for the same transaction that is confirmed by the new leader's PROMOTE. If the PROMOTE and CONFIRM lsns match, there is nothing bad about such a situation. Symmetrically, when an old leader issues a ROLLBACK with the lsn right after the new leader's PROMOTE lsn, it is not a split-brain. Allow such cases by tracking the last confirmed lsn for each synchronous transaction queue owner and silently nopifying CONFIRMs with an lsn less than the one recorded and ROLLBACKs with an lsn greater than that. Closes #9138 NO_DOC=bugfix
Serge Petrenko authored
Previously the replicas only persisted the confirmed lsn of the current synchronous transaction queue owner. As soon as the owner changed, the info about which lsn was confirmed by the previous owner was lost. Actually, this info is needed to correctly filter synchro requests coming from the old term, so start tracking a confirmed vclock instead of the confirmed lsn on replicas. In-scope of #9138 NO_TEST=covered by the next commit NO_CHANGELOG=internal change

@TarantoolBot document
Title: Document the new IPROTO_RAFT_PROMOTE request field

IPROTO_RAFT_PROMOTE and IPROTO_RAFT_DEMOTE requests receive a new key-value pair: IPROTO_VCLOCK : MP_MAP. The vclock holds the confirmed vclock of the node sending the request.
Serge Petrenko authored
Synchronous requests will soon receive a new field encoding a full vclock. Theoretically a vclock may take up to ~300-400 bytes (3 bytes for a map header + 32 components, each taking up 1 byte for the replica id and up to 9 bytes for the lsn). So it makes no sense to increase SYNCHRO_BODY_LEN_MAX from 32 to 400-500: it would become almost the same as the plain BODY_LEN_MAX. Simply reuse the latter everywhere. In-scope-of #9138 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
Serge Petrenko authored
There was an error in xrow_decode_synchro: it compared the expected type of the value to the type of the key (MP_UINT) instead of the type of the actual value. This went unnoticed because all values in synchro requests were integers. This is going to change soon, when PROMOTE requests start holding a vclock, so fix the wrong type check. In-scope-of #9138 NO_DOC=bugfix NO_CHANGELOG=not user-visible
Sergey Kaplun authored
Without checking the return value of `lua_pcall()` in `lua_field_inspect_ucdata()`, the error message itself is returned as a serialized result. The result status of `lua_pcall()` is no longer ignored now. NO_DOC=bugfix Closes #9396
Nikolay Shirokovskiy authored
Netbox internally watches the 'box.shutdown' event for the sake of graceful shutdown. The event subscription is asynchronous with respect to the connection API. Additionally, we check the error count on the server using a different connection. As a result, we may or may not account the error for the netbox internal watch failure. Let's account the internal watch failure reliably. Also, while we are at it, let's get rid of the races in the error count check. Close #9423 NO_CHANGELOG=internal NO_DOC=internal
Alexander Turenko authored
If `config.etcd` is present and non-empty, `config.etcd.prefix` is required. This validation check was not performed due to a mistake in a schema node wrapper that adds a validator checking for an attempt to use an Enterprise Edition option on Community Edition. Part of #8862 NO_DOC=bugfix
- Nov 29, 2023
Nikolay Shirokovskiy authored
Looks like this is a typo introduced in commit 0704ebb7 ("xlog: rework writer API"). Close #9428 NO_TEST=will be tested when the fiber_cxx_invoke suppression is removed NO_CHANGELOG=introduced in 3.0.0-alpha3 NO_DOC=bugfix
Serge Petrenko authored
Starting with commit f1c2127d ("replication: add META stage to JOIN"), the replication master appends a special section, called IPROTO_JOIN_META, to the initial snapshot sent to the replica. This section contains the latest raft term and the synchronous transaction queue owner and term. The section is only sent to nodes that have a non-zero version_id. For some reason, version_id encoding for FETCH_SNAPSHOT (the analog of JOIN for anonymous replicas) wasn't added in that commit, so anonymous replicas do not receive the synchronous queue state. This leads to them raising ER_SPLIT_BRAIN errors after join, when the first synchronous row arrives. In order to fix this, start encoding version_id in FETCH_SNAPSHOT requests.

Closes #9401

@TarantoolBot document
Title: new field in the `IPROTO_FETCH_SNAPSHOT` request

The `IPROTO_FETCH_SNAPSHOT` request was bodyless (only contained a header) until now, but now it receives a body with a single field: `IPROTO_SERVER_VERSION` : MP_UINT -- an encoded representation of the server version of the replica issuing the request.
Yan Shtunder authored
Added a new `is_sync` parameter to `box.begin()`, `box.commit()`, and `box.atomic()`. To make a transaction synchronous, set the `is_sync` option to `true`. If any value other than `true`/`nil` is set, for example `is_sync = "some string"`, then an error is thrown. Example:

```Lua
-- Sync transactions
box.atomic({is_sync = true}, function() ... end)

box.begin({is_sync = true}) ... box.commit({is_sync = true})
box.begin({is_sync = true}) ... box.commit()
box.begin() ... box.commit({is_sync = true})

-- Async transactions
box.atomic(function() ... end)

box.begin() ... box.commit()
```

Closes #8650

@TarantoolBot document
Title: box.atomic({is_sync = true})

Added the new `is_sync` parameter to `box.atomic()`. To make a transaction synchronous, set the `is_sync` option to `true`. Setting `is_sync = false` is prohibited. If any value other than `true` is set, for example `is_sync = "some string"`, then an error is thrown.
Mergen Imeev authored
This patch adds dependency support for roles.

Part of #9078

@TarantoolBot document
Title: dependencies for roles

Roles can now have dependencies. This means that the verify() and apply() methods will be executed for these roles taking the dependencies into account. Dependencies should be written in the `dependencies` field, which is an array. Note that the roles will be loaded (not applied!) in the same order in which they were specified, i.e. without taking dependencies into account. Example:

* Dependencies of role A: B, C
* Dependencies of role B: D
* No other role has dependencies.
* Order in which roles were given: [E, C, A, B, D, G]
* They will be loaded in the same order: [E, C, A, B, D, G]
* The order in which the verify() and apply() functions will be executed: [E, C, D, B, A, G]
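A minimal sketch of a role module with dependencies (the module paths are illustrative; the callback names follow the commit message):

```Lua
-- roles/a.lua: a role that requires roles B and C to be handled first.
return {
    dependencies = {'roles.b', 'roles.c'},
    verify = function(cfg)
        -- Validate this role's configuration; runs respecting dependencies.
    end,
    apply = function(cfg)
        -- Apply the configuration; runs only after roles.b and roles.c.
    end,
}
```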
Vladimir Davydov authored
Closes #9405

@TarantoolBot document
Title: Document the new built-in system event `box.wal_error`

The new event is broadcast whenever Tarantool fails to commit a transaction to the write-ahead log (WAL), which usually means there's a problem with the underlying disk storage. The new event's payload is a table that currently contains a single field, `count`, storing the number of WAL errors that have happened so far, or nil if there haven't been any WAL errors.
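A minimal sketch of subscribing to the event (assuming a configured instance):

```Lua
-- box.watch() delivers the current value on subscription and again on
-- every broadcast, so the nil initial payload must be handled too.
box.watch('box.wal_error', function(key, payload)
    if payload ~= nil then
        require('log').error('WAL errors so far: %d', payload.count)
    end
end)
```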
- Nov 28, 2023
Nikolay Shirokovskiy authored
A test suite run can produce coredumps in case of bugs. Unfortunately, coredumps related to bugs are mixed with coredumps produced by special test conditions, for example when we test the Tarantool response to a deadly signal. Avoid producing coredumps in a correct test suite run. NO_CHANGELOG=internal NO_DOC=internal
Vladimir Davydov authored
The fix is simple: look up the function in `box.func` by name and, if found, execute its `call` method. The only tricky part is to avoid the lookup before `box.cfg` is called, because `box.func` is unavailable at that time. We achieve that by checking `box.ctl.is_recovery_finished`. Closes #9131 NO_DOC=bug fix
Nikolay Shirokovskiy authored
On Tarantool shutdown we destroy all the fibers in some sequence. We don't require that all the fibers are finished before shutdown, so it may turn out that we first destroy some alive fiber and then destroy another alive fiber that joins the first one. Currently we have a use-after-free issue in this case, because clearing the `link` field of the second fiber changes the `wake` field of the first fiber. Close #9406 NO_DOC=bugfix
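A sketch of the scenario described above (timings are illustrative; both fibers are still alive when the shutdown sequence destroys them):

```Lua
local fiber = require('fiber')

-- First fiber: sleeps forever, so it is alive at shutdown.
local f1 = fiber.new(function() fiber.sleep(math.huge) end)
f1:set_joinable(true)

-- Second fiber: blocks joining the first one. Destroying f1 and then f2
-- used to touch freed memory through the join wait list.
local f2 = fiber.new(function() f1:join() end)
f2:set_joinable(true)
fiber.yield() -- let both fibers start and block

os.exit(0) -- shutdown destroys both alive fibers
```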
Nikolay Shirokovskiy authored
Graceful shutdown is done in a special fiber, which is started, for example, on SIGTERM. So it can run concurrently with the fiber executing the Tarantool init script. On init fiber exit we break the event loop to pass control back to the Tarantool initialization code, but we fail to run the event loop a bit longer to finish the graceful shutdown. The test is a bit contrived. A more real-world case is when Tarantool is terminated during a lingering box.cfg(). Close #9411 NO_DOC=bugfix
- Nov 27, 2023
Alexander Turenko authored
It was suggested by Igor Munkin (@igormunkin) in PR #9288. Part of #8862 Follows up PR #9288 NO_DOC=the help message is not an API, nothing to document NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
Mergen Imeev authored
According to ANSI, EXISTS is a predicate that tests a given subquery and returns true if it returns more than 0 rows, and false otherwise. However, after 2a720d11, EXISTS worked correctly only if there were exactly 0 or 1 rows, and in all other cases it gave an error. This patch makes EXISTS work properly. Closes #8676 NO_DOC=bugfix
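A minimal sketch of the fixed behavior (run via the SQL interface on a configured instance; the schema is illustrative):

```Lua
box.execute([[CREATE TABLE t (i INT PRIMARY KEY)]])
box.execute([[INSERT INTO t VALUES (1), (2), (3)]])

-- The subquery yields three rows; EXISTS must evaluate to TRUE instead of
-- failing with an "expression subquery returned more than 1 row" error.
box.execute([[SELECT EXISTS (SELECT i FROM t WHERE i > 0)]])
```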
Magomed Kostoev authored
Before this commit the space rollback had been treated as a new space creation, so it caused the creation of a new space object in Lua's box.space namespace. Since the preceding space drop removed the space object from the namespace, after the space rollback all the Lua users of the space lost track of its changes: the original space object was never updated anymore. This is fixed by detecting the space rollback and restoring the old space object instead of creating a new one. Closes #9120 NO_DOC=bugfix
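A minimal sketch of the scenario (assuming transactional DDL is available and the space name is illustrative):

```Lua
local s = box.schema.space.create('test')

box.begin()
box.space.test:drop()
box.rollback()

-- After the rollback the original Lua object must track the space again
-- instead of being replaced by a fresh box.space.test object.
assert(s == box.space.test)
```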