- Dec 14, 2023
-
-
Nikolay Shirokovskiy authored
During iproto graceful shutdown, which is WIP, we cancel all iproto requests in progress. This causes the election_qsync_stress test to fail. We shut down the master while it is waiting for transaction confirmation from a quorum (which never exists in this test). Currently, on shutdown we roll back a transaction in this state. However, when the previous master is restarted after a new master is elected, we don't expect the rollback on the previous master. Let's keep the transaction in limbo if the fiber is cancelled, as our direction is to do only quorum rollbacks. Part of #8423 Closes #9480 NO_DOC=bugfix
-
Alexander Turenko authored
I got four fails on the given tests in a row on debug-asan job in CI for tarantool-ee. It seems, tarantool-ee is more sensitive to small timeouts, when the address sanitizer slows down the execution. Or I'm just lucky. Anyway, the given tests don't really need small timeouts: increasing it doesn't break any test logic, doesn't increase duration of the test in a successful case and doesn't increase it in case of a failure. The tests are more stable after the change: I verified it locally by running each of the tests in parallel many times on tarantool built with enabled address sanitizer. See the following commits for details about the given test cases and the problems behind. * commit 1fcfb8c2 ("app: start init script event loop explicitly") * commit 786eb2ac ("main: don't break graceful shutdown on init script exit") Follows up #9266 Follows up #9411 NO_DOC=test adjustment NO_CHANGELOG=see NO_DOC
-
- Dec 13, 2023
-
-
Gleb Kashkin authored
The new module can be enabled only in the Enterprise Edition builds via the `--integrity-check` CLI option. Needed for tarantool/tarantool-ee#585 NO_DOC=will be added to Enterprise Edition NO_CHANGELOG=see NO_DOC
-
Gleb Kashkin authored
Before this patch, all user-defined CLI arguments passed to `server:new()` were ignored. Now the config-specific arguments that used to replace the user-defined ones are added to the end of the `args` table instead. Part of tarantool/tarantool-ee#585 NO_DOC=test helper change NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
-
Serge Petrenko authored
Follow-up #9235 NO_DOC=changelog NO_TEST=changelog Co-authored-by: Kseniia Antonova <73473519+xuniq@users.noreply.github.com>
-
Albert Skalt authored
This patch adds the `config.storage` role file to the build. Part of https://github.com/tarantool/tarantool-ee/issues/593 NO_DOC=supplement change NO_TEST=supplement change NO_CHANGELOG=supplement change
-
- Dec 12, 2023
-
-
Alexander Turenko authored
This commit updates test-run and the only change in test-run is a bunch of luatest updates. The list of luatest updates can be found in tarantool/test-run#415 or below.

- assertions: Improved error message for one assert function [1]
- TAP output: add missing tabulation to artifacts [2]
- utils: add `version_current_ge_than()` [3]
- server: fix unix socket path length check [4]
- server: accept `new_box_uri` as a table [5]

The list excludes changes that are not related to test-run's usage: documentation, testing of luatest itself, packaging of luatest and so on.

[1]: tarantool/luatest@2a26c32
[2]: tarantool/luatest@5e8c3e3
[3]: tarantool/luatest@7b6f167
[4]: tarantool/luatest@a8b0389
[5]: tarantool/luatest@f37b353

NO_DOC=testing framework update NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
-
Astronomax authored
Before this patch there was an execution sequence in which the assertion in box_wait_limbo_acked would fail. The assertion is that the lsn of the last entry in limbo is always positive after wal_sync. Fix it. Closes #9235 NO_DOC=bugfix
-
- Dec 07, 2023
-
-
Nikolay Shirokovskiy authored
We need to call `tx_accept_msg` in `tx_process_override` before we pass the message to the override handler. Unfortunately, if the handler responds with IPROTO_HANDLER_FALLBACK, we call the builtin handler for the message, which calls `tx_accept_msg` again, and that is not expected. Some actions of this function are idempotent and some are not. Let's make the function a NOP if it is called again. Closes #9345 NO_DOC=bugfix
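A minimal Python model of the "NOP on the second call" idea, not the actual C code. The `Msg` class, the `accepted` flag, the `accept_count` counter and the `"FALLBACK"` return value are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for an iproto message."""
    accepted: bool = False
    accept_count: int = 0  # counts the non-idempotent side effects


def accept_msg(msg: Msg) -> None:
    """Accept a message; repeated calls on the same message do nothing."""
    if msg.accepted:
        return
    msg.accepted = True
    msg.accept_count += 1


def process_override(msg: Msg, handler, builtin) -> str:
    """Run the override handler, falling back to the builtin one."""
    accept_msg(msg)
    if handler(msg) == "FALLBACK":
        # The builtin path accepts the message again; the guard in
        # accept_msg() turns this second call into a no-op.
        accept_msg(msg)
        return builtin(msg)
    return "handled"
```

With a handler that always falls back, the side effects of accepting the message happen exactly once.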
-
- Dec 06, 2023
-
-
Alexander Turenko authored
This commit allows bootstrapping an anonymous replica from a replicaset where all the instances are in read-only mode. The reason for the change is that there are no technical reasons to forbid this action. An anonymous replica is not registered in the `_cluster` system space, so it can join a replicaset even if there are no writable instances.

Fixes #9432

@TarantoolBot document
Title: config: anonymous replica is now supported

The `replication.anon: true` option is now working.

There are configuration constraints that are related to anonymous replicas.

* A replicaset must contain at least one non-anonymous instance.
* An anonymous replica can't be configured as a writable instance using the `database.mode` or `<replicaset>.leader` options.
* An anonymous replica can't be configured with `replication.election_mode` equal to `candidate`, `voter` or `manual` (only `off` is allowed).

A few more nuances about anonymous replicas:

* Anonymous replicas are filtered out from the default upstream list.
* A `replication.failover: election` replicaset can contain anonymous replicas, but `replication.election_mode` defaults to `off` for them (unlike non-anonymous instances, where the default is `candidate`).
* `replication.failover: supervised` skips anonymous replicas when choosing a bootstrap leader.
* An anonymous replica can join a replicaset that has all the instances in read-only mode (unlike a non-anonymous instance).

See details in [1] and [2].

[1]: https://github.com/tarantool/tarantool/issues/9432
[2]: https://github.com/tarantool/tarantool/pull/9418
-
Alexander Turenko authored
This commit adds several checks that are specific for `replication.failover` mode.

* `replication.failover: off`: an anonymous replica shouldn't be set to read-write mode using `database.mode` option.
* `replication.failover: manual`: an anonymous replica shouldn't be configured as a replicaset leader using `<replicaset>.leader` option.
* `replication.failover: election`: an anonymous replica can't be configured with `replication.election_mode` other than `off`.

This commit also adjusts default `replication.election_mode` to `off` for an anonymous replica if it is part of a `replication.failover: election` replicaset (the default for a non-anonymous instance is `candidate`).

Part of #9432 NO_DOC=The documentation request is in the last commit of the series.
-
Alexander Turenko authored
A replicaset that contains only anonymous replicas can't be bootstrapped, because all the instances must be in read-only mode. Part of #9432 NO_DOC=The documentation request is in the last commit of the series.
-
Alexander Turenko authored
Filter out anonymous replicas when choosing a bootstrap leader in `replication.failover: supervised` mode. An anonymous replica can't be in read-write mode, so it can't be a replicaset bootstrap leader. Part of #9432 NO_DOC=It is a bugfix. However, this detail is mentioned in the documentation request in the last commit of the series just in case.
-
Alexander Turenko authored
This commit effectively allows to set `replication.anon: true` without specifying `replication.peers`. Without filtering out anonymous replicas from the list of upstreams we get an error regarding attempt to use an anonymous replica as an upstream for a non-anonymous instance. Also, anonymous replicas are excluded from autogenerated upstream list for other anonymous replicas. It makes the list the same on all the peers. A user can configure a custom data flow using `replication.peers` option. Part of #9432 NO_DOC=The documentation request is in the last commit of the series.
-
Alexander Turenko authored
The commit effectively enables support of anonymous replicas in the declarative configuration. It has several caveats (see the changelog entry), which will be resolved in the following commits of the patchset. An attempt to persist an instance name of an anonymous replica can't be successful, because it has no entry in `_cluster` system space. Such an attempt leads to ER_INSTANCE_NAME_MISMATCH error. This commit patches the configuration applying logic to skip attempt to set `box.cfg({instance_name = <...>})` if the instance is configured as an anonymous replica using `replication.anon: true` option. Part of #9432 NO_DOC=replication.anon option is already documented in the scope of https://github.com/tarantool/doc/issues/3851. The bugfix shouldn't affect the documentation pages much, however related constraints are summarized in a documentation request in the last commit of the series.
-
Alexander Turenko authored
NO_DOC=code owners file change, no code changes NO_TEST=see NO_DOC NO_CHANGELOG=see NO_DOC
-
Sergey Ostanevich authored
Some changes are not new features, but rather updates and improvements to developer tools. There are also a number of tweaks we can introduce to improve testing and the backporting of tests across branches, which are considered neither a feature nor a bug fix. NO_DOC=changelog NO_TEST=changelog NO_CHANGELOG=changelog
-
Nikita Zheleztsov authored
Sometimes the test fails with a "Peer closed" error, and the logs say that a fatal error happened: cfg_get('read_only'). This is caused by the instance processing its own ballot. The problem is that we set box.cfg to a function in the before_all trigger, so it's impossible to access the box.cfg table during the whole time of test execution. Let's instead set box.cfg to a function at the start of every test and restore box.cfg at the end. This way we'll shorten the time window in which such a fatal error can happen. Even though it's still possible to get it in theory, the problem is not reproduced anymore. The alternative solution of introducing an errinj seems to be overkill here. Closes tarantool/tarantool-qa#329 NO_DOC=testfix NO_CHANGELOG=testfix
-
Nikita Zheleztsov authored
We need to apply instance/replicaset_name as soon as the instance becomes RW, so currently we try to do so at every box.status broadcast. Even though the broadcast happens pretty often, it's not enough. The bug is reproduced in config-luatest/set_names_reload, which checks the following situation:

1. The cluster is recovered from the xlogs without names set.
2. The user forgets to set a UUID for one replica and starts the cluster.
3. The replica whose UUID has not been set fails to start.
4. The user notices that and updates the config, reloading it on the instances that succeeded to start and starting the failed one.
5. The master must apply the name for the failed replica.

The test worked all right in the majority of runs, because the box.status broadcast happens often: e.g. it's broadcast when the master's applier has synced with a replica. However, under heavy CPU load the test sometimes failed, when the master failed to subscribe to the replica and the broadcast didn't happen.

Let's try to set names not only when box.status is broadcast, but also immediately after reload, as at this time new names that must be set might appear. Let's also change the test so that it doesn't rely on the broadcast anymore.

Closes tarantool/tarantool-qa#328 NO_DOC=bugfix
-
Alexander Turenko authored
See the details in the documentation request below.

Fixes #9431

@TarantoolBot document
Title: config: failover mode and election mode consistency

`replication.failover: election` enables the RAFT based leader election mechanism on a replicaset. The instances can be configured in the following election modes: `off`, `candidate`, `voter`, `manual`. It is controlled by the `replication.election_mode` parameter.

However, the election mode parameter makes no sense and is confusing for other failover modes (`off`, `manual`, `supervised`). So, it is forbidden to set election modes other than `off` in failover modes != `election`.

Summary:

* `replication.failover: off`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: FAIL
  * `replication.election_mode: voter`: FAIL
  * `replication.election_mode: manual`: FAIL
* `replication.failover: manual`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: FAIL
  * `replication.election_mode: voter`: FAIL
  * `replication.election_mode: manual`: FAIL
* `replication.failover: election`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: OK
  * `replication.election_mode: voter`: OK
  * `replication.election_mode: manual`: OK
* `replication.failover: supervised`
  * `replication.election_mode: off`: OK
  * `replication.election_mode: candidate`: FAIL
  * `replication.election_mode: voter`: FAIL
  * `replication.election_mode: manual`: FAIL
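The OK/FAIL summary above reduces to a single rule, sketched here in Python as an illustrative model rather than the config module's Lua code. The function name `election_mode_allowed` is hypothetical; the mode names are taken from the summary.

```python
FAILOVER_MODES = ("off", "manual", "election", "supervised")
ELECTION_MODES = ("off", "candidate", "voter", "manual")


def election_mode_allowed(failover: str, election_mode: str) -> bool:
    """Return True if the pair corresponds to an OK row of the summary."""
    if failover not in FAILOVER_MODES:
        raise ValueError(f"unknown failover mode: {failover}")
    if election_mode not in ELECTION_MODES:
        raise ValueError(f"unknown election mode: {election_mode}")
    # Any election mode is fine under `election` failover; every other
    # failover mode accepts only `off`.
    return failover == "election" or election_mode == "off"
```

For example, `election_mode_allowed("supervised", "voter")` is `False`, while `election_mode_allowed("election", "voter")` is `True`.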
-
Alexander Turenko authored
A next commit adds one more check and it becomes too large snippet of the code to be part of the constructor function. NO_DOC=refactoring NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
-
Alexander Turenko authored
It allows to start a replicaset from the given declarative configuration, perform checks on all or particular instances, add a new instance into the replicaset (without stopping existing instances), update the config file and reload it on the instances. NO_DOC=testing helper NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
-
Alexander Turenko authored
The module now has only one function, which reduces the boilerplate code needed to verify a scenario where all instances of a replicaset should fail with the same error message. The module will be extended with replicaset management functions later. NO_DOC=testing helper NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
-
Alexander Turenko authored
It allows constructing a declarative configuration for a test case using less boilerplate code/options, especially when a replicaset is to be tested, not a single instance. See the description at the top of the file for details. NO_DOC=testing helper NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC
-
- Dec 05, 2023
-
-
Serge Petrenko authored
Relay sometimes decodes the PROMOTE packets to be sent in order to conditionally delay their dispatch. It was believed that own WALs can't be corrupted and hence there is no point in checking that decoding succeeds. Let's be more strict here and check the decoding result before proceeding. Closes #9265 NO_DOC=bugfix NO_TEST=hard to test NO_CHANGELOG=not user-visible
-
Mergen Imeev authored
Closes #9435

@TarantoolBot document
Title: `login` and `password` fields in URI table

Now `uri.parse()` can parse a table containing the `login` and `password` fields. Values from these fields take precedence over values obtained from the string URI. For example, the login and password of `{uri = 'one:two@localhost:3301', login = 'alpha', password = 'omega'}` will be `alpha` and `omega` respectively. If the `login` field is set and the `password` field is not set, the password is set to `nil`. If the `password` field is set, the `login` field must be present.
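The precedence rules can be modelled in Python as follows. This is a sketch of the described behavior only, not the `uri.parse()` implementation. The helper name `resolve_credentials` is hypothetical, and the string URI is assumed to carry credentials in the `login:password@host:port` form.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def resolve_credentials(
    uri: str,
    login: Optional[str] = None,
    password: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Pick login/password from explicit fields, else from the URI."""
    if password is not None and login is None:
        # A password without a login is rejected.
        raise ValueError("`password` requires `login`")
    if login is not None:
        # Explicit fields win; a missing password stays None (nil).
        return login, password
    # No explicit fields: fall back to the string URI.
    parts = urlsplit("//" + uri)
    return parts.username, parts.password
```

Calling `resolve_credentials('one:two@localhost:3301', login='alpha', password='omega')` yields `('alpha', 'omega')`, matching the example in the documentation request.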
-
Magomed Kostoev authored
Before this patch it could happen that deletion of a function from the function cache didn't delete it from the funcs_by_name map. The reason is that the check of whether the function exists in the map compares the search result with the `end` bucket ID of the wrong hash table. The situation in which this could happen is the following:

1. Insertion of a new function into the cache triggers a resize of the funcs_by_value map, but the size of the funcs map remains the same.
2. Then the user deletes a function. This removes the function from the funcs map. Then we check if the function exists in the funcs_by_value map. The function exists there, but it so happens that its bucket ID equals the funcs map bucket count, so the incorrect check states that the function does not exist in the funcs_by_value map, and it's not dropped from the map.
3. Now we have the following result: the function is referenced in the funcs_by_value map, but not in the funcs map. This triggers the assertion failure on any attempt to insert a new function with the same name.

Closes #9426 NO_DOC=bugfix
-
- Dec 04, 2023
-
-
Mergen Imeev authored
This patch adds support for automatic master discovery in vshard. There is no longer a need to reapply the vshard storage configuration every time an instance becomes a master or ceases to be a master. Automatic master discovery also solves the problem with the rebalancer. Previously, the rebalancer could not work correctly if the masters of some replicasets were unknown, and since the vshard config generated by the config module did not contain information about all the masters, the rebalancer was disabled. The rebalancer can now perform master discovery on its own, which is why it is enabled. Before this patch, the config module was based on the "test: use dofile() for configs instead of require" commit in vshard. At a minimum, commit "storage: fix assertion error in conn_manager" is now required. Part of #8862 NO_DOC=will be added along with documentation for the rebalancer role NO_CHANGELOG=will be added along with changelog for the rebalancer role
-
Maxim Kokryashkin authored
This module became unused as a result of LuaJIT bump made in the commit 88333d13 ("luajit: bump new version"), so it can be purged safely from the Tarantool sources. Part of #8700 NO_DOC=internal NO_TEST=internal NO_CHANGELOG=added within the aforementioned commit
-
- Nov 30, 2023
-
-
Serge Petrenko authored
Current split-brain detector implementation raises an error each time a CONFIRM or ROLLBACK entry is received from the previous synchronous transaction queue owner. It is assumed that the new queue owner must have witnessed all the previous CONFIRMS. Besides, according to Raft, ROLLBACK should never happen. Actually there is a case when a CONFIRM from an old term is legal: it's possible that during leader transition old leader writes a CONFIRM for the same transaction that is confirmed by the new leader's PROMOTE. If PROMOTE and CONFIRM lsns match there is nothing bad about such situation. Symmetrically, when an old leader issues a ROLLBACK with the lsn right after the new leader's PROMOTE lsn, it is not a split-brain. Allow such cases by tracking the last confirmed lsn for each synchronous transaction queue owner and silently nopifying CONFIRMs with an lsn less than the one recorded and ROLLBACKs with lsn greater than that. Closes #9138 NO_DOC=bugfix
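The filtering rule can be sketched as a small Python model, not the actual C code. The function name and the string return values are hypothetical. Treating a CONFIRM whose lsn equals the recorded one as legal is an assumption based on the "lsns match" remark above.

```python
def classify_old_owner_request(kind: str, lsn: int, confirmed_lsn: int) -> str:
    """Classify a synchro request from a previous queue owner.

    `confirmed_lsn` is the last confirmed lsn tracked for that owner.
    """
    if kind == "CONFIRM" and lsn <= confirmed_lsn:
        # Already covered by what the new owner confirmed: ignore it.
        # (Including the equal case is an assumption of this sketch.)
        return "nop"
    if kind == "ROLLBACK" and lsn > confirmed_lsn:
        # Rolls back only entries past the confirmed point: ignore it.
        return "nop"
    return "split-brain"
```

Under this model, a stale CONFIRM at the confirmed lsn is ignored, while a CONFIRM beyond it, or a ROLLBACK at or below it, is still reported as a split-brain.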
-
Serge Petrenko authored
Previously the replicas only persisted the confirmed lsn of the current synchronous transaction queue owner. As soon as the owner changed, the info about which lsn was confirmed by the previous owner was lost. Actually, this info is needed to correctly filter synchro requests coming from the old term, so start tracking the confirmed vclock instead of the confirmed lsn on replicas.

In-scope of #9138 NO_TEST=covered by the next commit NO_CHANGELOG=internal change

@TarantoolBot document
Title: Document new IPROTO_RAFT_PROMOTE request field

IPROTO_RAFT_PROMOTE and IPROTO_RAFT_DEMOTE requests receive a new key value pair: IPROTO_VCLOCK : MP_MAP

The vclock holds a confirmed vclock of the node sending the request.
-
Serge Petrenko authored
Synchronous requests will receive a new field encoding a full vclock soon. Theoretically a vclock may take up to ~ 300-400 bytes (3 bytes for a map header + 32 components each taking up 1 byte for replica id and up to 9 bytes for lsn). So it makes no sense to increase SYNCHRO_BODY_LEN_MAX from 32 to 400-500. It would become almost the same as plain BODY_LEN_MAX. Simply reuse the latter everywhere. In-scope-of #9138 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
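The size estimate above can be checked with a few lines of Python. The constants come straight from the message; the names themselves are hypothetical.

```python
MAP_HEADER_BYTES = 3     # map header
VCLOCK_COMPONENTS = 32   # number of vclock components
REPLICA_ID_BYTES = 1     # bytes per replica id
LSN_MAX_BYTES = 9        # upper bound on bytes per lsn


def vclock_max_encoded_size() -> int:
    """Upper bound on the encoded size of a full vclock, in bytes."""
    per_component = REPLICA_ID_BYTES + LSN_MAX_BYTES
    return MAP_HEADER_BYTES + VCLOCK_COMPONENTS * per_component


print(vclock_max_encoded_size())  # 3 + 32 * (1 + 9) = 323
```

The result, 323 bytes, falls inside the "~ 300-400 bytes" range quoted in the message and is far above the old limit of 32.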
-
Serge Petrenko authored
There was an error in xrow_decode_synchro: it compared the expected type of the value to the type of the key (MP_UINT) instead of the type of the actual value. This went unnoticed because all values in synchro requests were integers. This is going to change soon, when PROMOTE requests will start holding a vclock, so fix the wrong type check. In-scope-of #9138 NO_DOC=bugfix NO_CHANGELOG=not user-visible
-
Sergey Kaplun authored
Without checking the return value of `lua_pcall()` in `lua_field_inspect_ucdata()`, the error message itself is returned as a serialized result. The result status of `lua_pcall()` is not ignored now. NO_DOC=bugfix Closes #9396
-
Nikolay Shirokovskiy authored
Netbox internally watches 'box.shutdown' for the sake of graceful shutdown. The event subscription is asynchronous with respect to the connection API. Additionally, we check the error count on the server using a different connection. As a result, we may or may not account for the error caused by the netbox internal watch failure. Let's account for the internal watch failure reliably. Also, while we are at it, let's get rid of races in the error count check. Close #9423 NO_CHANGELOG=internal NO_DOC=internal
-
Alexander Turenko authored
If `config.etcd` is present and non-empty, `config.etcd.prefix` is required. This validation check was not performed due to a mistake in a schema node wrapper that adds a validator that checks an attempt to use an Enterprise Edition option on Community Edition. Part of #8862 NO_DOC=bugfix
-
- Nov 29, 2023
-
-
Nikolay Shirokovskiy authored
Looks like this is typo introduced in the commit 0704ebb7 ("xlog: rework writer API"). Close #9428 NO_TEST=will be tested when fiber_cxx_invoke suppression will be removed NO_CHANGELOG=introduced in 3.0.0-alpha3 NO_DOC=bugfix
-
Serge Petrenko authored
Starting with commit f1c2127d ("replication: add META stage to JOIN") replication master appends a special section, called IPROTO_JOIN_META to the initial snapshot sent to the replica. This section contains the latest raft term and synchronous transaction queue owner and term. The section is only sent to nodes, which have a non-zero version_id. For some reason, version_id encoding for FETCH_SNAPSHOT (analog of JOIN for anonymous replicas) wasn't added in that commit, so anonymous replicas do not receive synchronous queue state. This leads to them raising ER_SPLIT_BRAIN errors later after join, when the first synchronous row arrives. In order to fix this, start encoding version_id in FETCH_SNAPSHOT requests. Closes #9401 @TarantoolBot document Title: new field in `IPROTO_FETCH_SNAPSHOT` request `IPROTO_FETCH_SNAPSHOT` request was bodyless (only contained a header) until now, but now it receives a body with a single field: `IPROTO_SERVER_VERSION` : MP_UINT -- an encoded representation of the server version of a replica issuing the request.
-
Yan Shtunder authored
Added a new `is_sync` parameter to `box.begin()`, `box.commit()`, and `box.atomic()`. To make the transaction synchronous, set the `is_sync` option to `true`. If any value other than `true/nil` is set, for example `is_sync = "some string"`, then an error will be thrown.

Example:

```Lua
-- Sync transactions
box.atomic({is_sync = true}, function() ... end)

box.begin({is_sync = true}) ... box.commit({is_sync = true})

box.begin({is_sync = true}) ... box.commit()

box.begin() ... box.commit({is_sync = true})

-- Async transactions
box.atomic(function() ... end)

box.begin() ... box.commit()
```

Closes #8650

@TarantoolBot document
Title: box.atomic({is_sync = true})

Added the new `is_sync` parameter to `box.atomic()`. To make the transaction synchronous, set the `is_sync` option to `true`. Setting `is_sync = false` is prohibited. If any value other than `true` is set, for example `is_sync = "some string"`, then an error will be thrown.
-
Mergen Imeev authored
This patch adds dependencies support for roles.

Part of #9078

@TarantoolBot document
Title: dependencies for roles

Roles can now have dependencies. This means that the verify() and apply() methods will be executed for these roles, taking into account the dependencies. Dependencies should be written in the "dependencies" field of the array type. Note, the roles will be loaded (not applied!) in the same order in which they were specified, i.e. not taking dependencies into account.

Example:

Dependencies of role A: B, C
Dependencies of role B: D
No other role has dependencies.

Order in which roles were given: [E, C, A, B, D, G]
They will be loaded in the same order: [E, C, A, B, D, G]
The order in which functions verify() and apply() will be executed: [E, C, D, B, A, G].
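One ordering strategy that reproduces the example above is a depth-first walk over the roles in the given order, emitting each role after its dependencies. The Python sketch below is an illustration under that assumption, not the config module's algorithm. The function name `apply_order` is hypothetical, and cycle detection is omitted.

```python
from typing import Dict, List


def apply_order(roles: List[str], deps: Dict[str, List[str]]) -> List[str]:
    """Order roles so that each one comes after its dependencies."""
    order: List[str] = []
    visited = set()

    def visit(role: str) -> None:
        if role in visited:
            return
        visited.add(role)
        # Place all dependencies first, then the role itself.
        for dep in deps.get(role, []):
            visit(dep)
        order.append(role)

    # Walk the roles in the order they were given.
    for role in roles:
        visit(role)
    return order


roles = ["E", "C", "A", "B", "D", "G"]
deps = {"A": ["B", "C"], "B": ["D"]}
print(apply_order(roles, deps))  # ['E', 'C', 'D', 'B', 'A', 'G']
```

Roles without dependencies keep their relative position, and only A moves behind B, C and D.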
-