- Feb 20, 2024
-
-
Nikolay Shirokovskiy authored
Currently, if there is an uncancellable iproto request, Tarantool shutdown hangs. Let's instead give it some time and then panic. Along the way, it is good to make iproto_drop_connections() fail on timeout. It is used in `box.ctl.iproto_lockdown`, which is also better off failing on timeout than hanging indefinitely. In Tarantool CI, which is run with TEST_BUILD set, we set the timeout to infinity. This is on par with the current fiber_shutdown() behaviour. We will not change the latter for a while because there are already several tests that count on it. Also, it is currently easier to test that there is no hang than to test the exit status. Part of #8423 NO_CHANGELOG=internal NO_DOC=internal
-
- Feb 19, 2024
-
-
Ilya Verbin authored
Currently, it's only possible to set an error cause with the set_prev method. This isn't very convenient, because one has to construct a new error without raising it, then set its cause, and only then raise it. To simplify this, a new argument `prev` is added to the error constructor. Closes #9103 @TarantoolBot document Title: Document `prev` argument to table constructor of `box.error.new` Product: Tarantool Since: 3.1 Root documents: https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_error/new/ and https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_error/error/ [Link to the design document](https://www.notion.so/tarantool/Error-subsystem-improvements-90faa0a4714b4143abaf8bed2c10b2fc?pvs=4#c72c870f24734020aae1fbf34e2b8569)
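To illustrate the convenience described in this commit, here is a minimal Python sketch that models an error object whose constructor accepts its cause directly. The `Error` class, its fields, and the `chain` helper are hypothetical stand-ins for illustration, not Tarantool's `box.error` API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Error:
    """Toy error with an optional cause (hypothetical model, not box.error)."""
    message: str
    prev: Optional[Error] = None

    def set_prev(self, cause: Error) -> None:
        # Old style: the cause is attached after construction.
        self.prev = cause

    def chain(self) -> List[str]:
        # Walk from this error down to the root cause.
        messages = []
        err: Optional[Error] = self
        while err is not None:
            messages.append(err.message)
            err = err.prev
        return messages


cause = Error("disk is full")

# Old style: construct, attach the cause, and only then raise.
old = Error("cannot save snapshot")
old.set_prev(cause)

# New style: pass the cause to the constructor in one step.
new = Error("cannot save snapshot", prev=cause)
```

Both objects end up with the same cause chain; the second form simply removes the intermediate step.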
-
Timur Safin authored
Google fuzzing efforts revealed yet another boundary condition we don't handle well in the `tnt_strptime` function:
  - for format `%m%g%W`;
  - and input string `07001`.

We failed with an assertion failure:
```
| datetime_strptime_fuzzer: ./src/lib/core/datetime.c:148: \
_Bool tm_to_datetime(struct tnt_tm *, struct datetime *): \
Assertion `mday >= 1 && mday <= 31' failed.
```
Closes #8525 NO_TEST=updated fuzzer corpus NO_CHANGELOG=internal NO_DOC=internal
-
Nikolay Shirokovskiy authored
We create a snapshot on the SIGUSR1 signal in a newly spawned system fiber. It can interfere with Tarantool shutdown. In particular, there is an assertion failure on shutdown during such a snapshot because of the cord making the snapshot. Let's just trigger making the snapshot in the gc subsystem, in its own worker fiber.
```
#5 0x00007e7ec9a54d26 in __assert_fail (
    assertion=0x63ad06748400 "pm_atomic_load(&cord_count) == 0",
    file=0x63ad067478b8 "./src/lib/core/fiber.c", line=2290,
    function=0x63ad06748968 <__PRETTY_FUNCTION__.6> "fiber_free") at assert.c:101
#6 0x000063ad061a6a91 in fiber_free () at /home/shiny/dev/tarantool/src/lib/core/fiber.c:2290
#7 0x000063ad05edc216 in tarantool_free () at /home/shiny/dev/tarantool/src/main.cc:632
#8 0x000063ad05edd144 in main (argc=1, argv=0x63ad079ca3b0)
```
Part of #8423 NO_CHANGELOG=internal NO_DOC=internal
-
- Feb 16, 2024
-
-
Mergen Imeev authored
Before this patch, it was possible that the config status could change to 'ready' even if there were pending alerts. This could happen if there were 'missed_privilege' warnings. Before this patch, warnings were cleared and then refilled if necessary. However, after clearing all these warnings, if there were no other warnings, the status was set to 'ready'. And if new warnings were added during the 'refilling' stage, the status did not change from 'ready' to 'check_warnings' or 'check_errors'. Part of #9689 NO_DOC=bugfix NO_CHANGELOG=unreleased bug
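The ordering problem described above can be sketched in Python. This is only a model: the alert fields `type` and `kind`, their values, and both function names are assumptions made for illustration, and the commit does not state that the fix is implemented this way.

```python
def derive_status(alerts):
    """Map the current alert list to a status name taken from the commit text."""
    if any(a["type"] == "error" for a in alerts):
        return "check_errors"
    if any(a["type"] == "warn" for a in alerts):
        return "check_warnings"
    return "ready"


def refresh_status_early(alerts, refilled):
    # Problematic order: the status is derived right after clearing,
    # before the warnings are refilled.
    kept = [a for a in alerts if a.get("kind") != "missed_privilege"]
    status = derive_status(kept)
    return status, kept + refilled


def refresh_status_late(alerts, refilled):
    # Safer order: refill first, then derive the status from the final list.
    kept = [a for a in alerts if a.get("kind") != "missed_privilege"]
    current = kept + refilled
    return derive_status(current), current


old_alerts = [{"type": "warn", "kind": "missed_privilege"}]
refilled = [{"type": "warn", "kind": "missed_privilege"}]

early_status, _ = refresh_status_early(old_alerts, refilled)
late_status, _ = refresh_status_late(old_alerts, refilled)
```

With the early derivation the status reads `ready` although a warning is pending; deriving it from the final list gives `check_warnings`.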
-
Ilya Verbin authored
By design, the error references its cause, see error_set_prev() in diag.c. Referencing the error from error_set_prev() in error.lua is wrong. Introduced by commit 2e3c81de ("error: use int64_t as reference counter"). Closes #9694 NO_DOC=bugfix NO_TEST=memory leak
-
Nikolay Shirokovskiy authored
`sql-tap/intpkey.test` started to flake under load after the commit fe769b0a ("vinyl: add graceful shutdown"). The issue is that vy_scheduler_complete_tasks() may yield. So on shutdown the `vinyl.scheduler` fiber may be cancelled during this yield, and then we go to sleep waiting for new tasks forever. As a result, vinyl shutdown hangs. Part of #8423 NO_TEST=fix flaky test NO_CHANGELOG=fix flaky test NO_DOC=fix flaky test
-
Georgiy Lebedev authored
Currently, the error message is specified in the `reason` argument to the table constructor of `box.error.new`. This is confusing, because the message is accessed and printed by `error:unpack` as `message`. Let’s allow users to pass the error message in the `message` argument. We won’t drop the `reason` argument so as not to break potential users. Let's also allow omitting the message key by treating the first table constructor entry as the error message, if present. The error message setting has the following priority: first table constructor entry > `message` > `reason`. This change is backwards-compatible and does not require a new `compat` option. Closes #9102 @TarantoolBot document Title: Document `message` argument to table constructor of `box.error.new` Product: Tarantool Since: 3.1 Root documents: https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_error/new/ and https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_error/error [Link to the design document](https://www.notion.so/tarantool/Error-subsystem-improvements-90faa0a4714b4143abaf8bed2c10b2fc?pvs=4#c984e766372743c99a4a7be79a5783f6)
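The priority rule stated above (first table constructor entry > `message` > `reason`) can be modelled with a short Python sketch. The function name `resolve_message` is hypothetical, and the Lua table is approximated by a dict in which the first positional entry is stored under the integer key `1`.

```python
def resolve_message(args):
    """Pick the error message using the priority from the commit text.

    `args` models the Lua table passed to the constructor: the first
    positional entry lives under key 1, named entries under string keys.
    """
    for key in (1, "message", "reason"):
        if key in args:
            return args[key]
    return None


positional_wins = resolve_message({1: "first", "message": "msg", "reason": "why"})
message_wins = resolve_message({"message": "msg", "reason": "why"})
reason_only = resolve_message({"reason": "why"})
```

Existing callers that pass only `reason` keep working, which matches the backwards-compatibility claim in the commit.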
-
Maxim Kokryashkin authored
In commit 3daf2399 ("ci: fix step parameters for reusable runs") the integration workflow was made reusable, but the concurrency group pattern modification applied to the other workflows made reusable in the same patch was not applied to it. This patch fixes that. NO_DOC=CI NO_TEST=CI NO_CHANGELOG=CI
-
Astronomax authored
Prior to this patch, the raft leader continued to send heartbeats even if it couldn't write anything, so others saw it as alive. As a result, once the leader encountered a disk error, it couldn't write anything, but elections did not start. Now the leader resigns on the first encounter with an `ER_WAL_IO` write error. Closes #9399 @TarantoolBot document Title: raft: leader resigns on the first encounter with `ER_WAL_IO` * Now the leader resigns on the first encounter with `ER_WAL_IO` write error (the leader broadcasts this).
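As a toy illustration of the new behaviour, the following Python sketch models a leader that stops sending heartbeats once a write fails with `ER_WAL_IO`. The `Leader` class and its methods are hypothetical and greatly simplified; they are not Tarantool's raft implementation.

```python
class Leader:
    """Toy raft leader that resigns on the first WAL I/O error (illustrative)."""

    def __init__(self):
        self.is_leader = True
        self.broadcasts = []

    def heartbeat(self):
        # Heartbeats are only sent while the node still considers itself leader.
        return self.is_leader

    def on_write_result(self, error=None):
        # The first ER_WAL_IO error makes the leader resign and announce it,
        # so the other nodes are free to start elections.
        if self.is_leader and error == "ER_WAL_IO":
            self.is_leader = False
            self.broadcasts.append("resigned")


node = Leader()
node.on_write_result()             # successful write
alive_before = node.heartbeat()
node.on_write_result("ER_WAL_IO")  # disk error
alive_after = node.heartbeat()
```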
-
Astronomax authored
Before this patch, it was possible to go into an infinite loop using `box.watch()`. Also, Tarantool got stuck in a loop when trying to call `box_register_watcher` during initialization in `box_init`. Fix it. Closes #9632 NO_DOC=bugfix
-
Ilya Verbin authored
See the docbot request for details. Closes #9104

@TarantoolBot document
Title: Document custom error payload fields
Product: Tarantool
Since: 3.1
Root documents: https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_error/new/ and https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_error/error/

Since Tarantool 3.1 it is possible to add a custom payload to an error at construction. The payload is passed as key-value pairs, where `key` is a string and `value` is any Lua object. The key name can be arbitrary, except that it must not match any of the built-in error field names: reason, code, type, base_type, custom_type, errno, message, prev, trace.

NO_WRAP
Replace:
box.error.new({ reason = string[, code = number, type = string] }])
with:
box.error.new({ reason = string[, code = number, type = string, key1 = value1, ...] }])

Replace:
box.error({ reason = string[, code = number, type = string] }])
with:
box.error({ reason = string[, code = number, type = string, key1 = value1, ...] }])
NO_WRAP
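The key rule above can be sketched in Python as a filter that separates custom payload from built-in fields. The function name `extract_payload` is hypothetical, and simply skipping built-in names is an assumption of this sketch: the text only says such names should not be used as payload keys.

```python
# Built-in error field names, as listed in the docbot request.
BUILTIN_FIELDS = frozenset({
    "reason", "code", "type", "base_type", "custom_type",
    "errno", "message", "prev", "trace",
})


def extract_payload(args):
    """Return the entries of `args` that qualify as custom payload.

    A payload key must be a string and must not collide with a
    built-in error field name; the value may be any object.
    """
    return {
        key: value
        for key, value in args.items()
        if isinstance(key, str) and key not in BUILTIN_FIELDS
    }


args = {"reason": "Bucket is not found", "code": 42, "bucket_id": 17}
payload = extract_payload(args)
```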
-
Nikolay Shirokovskiy authored
Fix test_shutdown_during_memtx_snapshot, which fails from time to time in CI on macOS x86_64 due to hitting the fiber slice limit. The issue is that we insert 10k tuples in a row. Part of #8423 NO_CHANGELOG=test fix NO_DOC=test fix
-
Nikolay Shirokovskiy authored
So that graceful shutdown during heavy secondary index build is possible. Part of #8423 NO_CHANGELOG=internal NO_DOC=internal
-
Nikolay Shirokovskiy authored
Let's stop all vinyl internal fibers and threads. In the case of the scheduler, it looks like we revert the commit e463128e ("vinyl: cancel reader and writer threads on shutdown"), so we could again get a delay on shutdown in 'vinyl/replica_quota.test'. I guess we should not. At the time of that commit, deferring deletes was the default behavior, and there is a secondary index in the test space. Deferred deletes involve TX thread communication, and at the moment of stopping the scheduler worker threads the TX event loop was not running. This could result in worker threads hanging on stop. In this patch we stop the worker threads in the shutdown phase, while the TX event loop is still active. We delete part of the test for #3412, as we now finish the fibers that may use the latch. We also restore destroying the latch. Part of #8423 NO_CHANGELOG=internal NO_DOC=internal
-
Nikolay Shirokovskiy authored
engine_shutdown() is called on the free step, so let's rename it accordingly, like other subsystems' freeing functions. Also, let's introduce engine_shutdown() again; it will be used during Tarantool shutdown. Part of #8423 NO_TEST=refactoring/stubbing NO_CHANGELOG=refactoring/stubbing NO_DOC=refactoring/stubbing
-
Yaroslav Lobankov authored
NO_DOC=ci NO_TEST=ci NO_CHANGELOG=ci
-
- Feb 15, 2024
-
-
Maksim Kokryashkin authored
This patch fixes three issues:
1. It changes the condition for workflows so they can be run not only from the Tarantool repository but from any repository in the Tarantool organization.
2. Reusable workflows substitute the `${{ github.workflow }}` context variable with the name of their top-level workflow. This behavior causes concurrency group clashes when several reusable workflows are called from a single top-level workflow. This patch adds an additional constant part to the concurrency group pattern to solve the issue.
3. The checkout actions use the reference from the repository in which the top-level workflow is located instead of the one where the reusable workflow is located. This patch solves the issue by passing the reference explicitly.

NO_DOC=CI NO_TEST=CI NO_CHANGELOG=CI
-
Maksim Kokryashkin authored
Some workflows are not relevant for integration testing. This patch disables them. NO_DOC=CI NO_TEST=CI NO_CHANGELOG=CI
-
Igor Munkin authored
* test: fix initialization in lj-549-lua-load.test.c
* codehealth: add `nd` to the codespell ignore list
* LJ_GC64: Always snapshot functions for non-base frames.
* Avoid assertion in case of stack overflow from stitched trace.
* Fix recording of __concat metamethod.
* Avoid out-of-range number of results when compiling select(k, ...).
* Consider slots used by upvalues in use-def analysis.
* Only emit proper parent references in snapshot replay.
* Optimize table.new() with constant args to (sinkable) IR_TNEW.
* Followup fix for embedded bytecode loader.
* Fix embedded bytecode loader.
* LJ_GC64: Fix HREFK optimization.
* Fix unsinking of IR_FSTORE for NULL metatable.
* Fix zero stripping in %g number formatting.
* Follow-up fix for stack overflow handling cleanup.
* Cleanup stack overflow handling.
* Improve error reporting on stack overflow.
* sysprof: disable runtime host symtab updates
* codehealth: fix the typo
* Simplify handling of instable types in TNEW/TDUP load forwarding.
* Respect jit.off() on pending trace exit.
* Limit exponent range in number parsing.
* ARM64: Allow building with unwinding disabled.
* Emit sunk IR_NEWREF only once per key on snapshot replay.

Closes #7937 Closes #8140 Part of #9145 Part of #9595 NO_DOC=LuaJIT submodule bump NO_TEST=LuaJIT submodule bump
-
Mikhail Elhimov authored
Closes #8632 NO_DOC=gdb extension NO_CHANGELOG=gdb extension NO_TEST=gdb extension
-
- Feb 14, 2024
-
-
Sergey Vorontsov authored
On Debian-based Linux systems, libraries are installed in architecture-dependent paths, for example /usr/lib/x86_64-linux-gnu/ or /usr/lib/aarch64-linux-gnu/. Some packages may be installed in these paths, but Tarantool does not look for libraries there. This patch solves the problem. Also remove redundant OS-dependent `if` branches. Fix #9580 NO_DOC=bugfix NO_TEST=bugfix
-
- Feb 13, 2024
-
-
Yaroslav Lobankov authored
Bump test-run to new version with the following improvements:
  - Bump luatest to 1.0.0-5-gf31fe34 [1]
  - get_iproto_port: remove duplicates [2]
  - requirements: bump gevent to 22.10.2 [3]
  - Fix decoding error when reading server's log file [4]

[1] tarantool/test-run@bfcc9e8
[2] tarantool/test-run@da98d7f
[3] tarantool/test-run@bc1c473
[4] tarantool/test-run@434cbec

NO_DOC=test NO_TEST=test NO_CHANGELOG=test
-
Ilya Verbin authored
Switch the legacy `box.iproto.override()' interface to the newly introduced event triggers. This change is mostly not user-visible, except:
  - Now it can be called before `box.cfg{}';
  - Now request type can be set as a string;
  - Some changes in error messages;
  - The "overriding does not support ... request type" error is logged, rather than raised;
  - The internal trigger is visible via the `trigger' module.

If some request type is overridden by both interfaces (legacy `box.iproto.override()' and new `trigger.set()'), the order of invocation of the handlers is unspecified. Closes #8138 NO_DOC=internal
-
Ilya Verbin authored
This patch allows overriding IPROTO request handlers by setting triggers on the corresponding events after the initial `box.cfg{}' call. Part of #8138

@TarantoolBot document
Title: Document iproto override using event triggers
Product: Tarantool
Since: 3.1
Root document: New page - https://www.tarantool.io/en/doc/latest/reference/reference_lua/trigger/

Since Tarantool 3.1 there are 2 ways to override iproto request handlers:
1. Using `box.iproto.override()`, introduced in Tarantool 2.11: https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_iproto/override/
2. Using universal trigger registry: tarantool/doc#3988

To override an iproto request handler for the given request type, one can set a trigger (or multiple triggers) on the corresponding event. There are 2 types of iproto-overriding events:
1. set by request type id, e.g.:
  - box.iproto.override[1]
  - box.iproto.override[-1]
2. set by request type name (the name must be in lowercase), e.g.:
  - box.iproto.override.select
  - box.iproto.override.unknown

Override-by-id allows setting a handler for a particular request type that is not known by the given version of Tarantool. This is not possible with override-by-name, where a type name must be known by Tarantool. There is also a special type name "unknown" and a type id box.iproto.type.UNKNOWN (== -1) that allow setting a single handler for all unknown request types.

Multiple triggers can be associated with a single event. The triggers are called in reverse order of their installation; however, triggers set by id are called before triggers set by name. If a trigger returns `false`, the next trigger in the list is called, or a system handler if there are no more triggers. If a trigger returns `true`, no more triggers or system handlers are called.

If some request type is overridden by both interfaces (legacy `box.iproto.override()' and new `trigger.set()'), the order of invocation of those handlers is unspecified.

Co-authored-by:
Andrey Saranchin <Andrey22102001@gmail.com>
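The invocation rules described in this commit message can be modelled with a small Python sketch. The `dispatch` function and the way triggers are stored (two lists kept in installation order) are assumptions made for illustration; this is not the trigger registry API.

```python
def dispatch(request, by_id, by_name, system_handler):
    """Run override triggers for one request, following the documented order.

    `by_id` and `by_name` hold triggers in installation order. Within each
    list the most recently installed trigger runs first, and every by-id
    trigger runs before any by-name trigger.
    """
    for trigger in list(reversed(by_id)) + list(reversed(by_name)):
        if trigger(request):
            # A trigger returning True stops the chain entirely.
            return "trigger"
    # Every trigger returned False: fall back to the system handler.
    system_handler(request)
    return "system"


calls = []


def make_trigger(name, result):
    def trigger(request):
        calls.append(name)
        return result
    return trigger


by_id = [make_trigger("id-1", False), make_trigger("id-2", False)]
by_name = [make_trigger("name-1", True), make_trigger("name-2", False)]
outcome = dispatch({"type": 1}, by_id, by_name,
                   lambda request: calls.append("system"))
```

In this run the system handler is never reached, because `name-1` returns `True` after the three earlier triggers have declined the request.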
-
Ilya Verbin authored
This patch allows overriding IPROTO request handlers by setting triggers on the corresponding events before the initial `box.cfg{}' call. For triggers that are set after the first `box.cfg{}', see the next commit.

There are 2 types of iproto-overriding events:
1. set by request type id, e.g.:
  - box.iproto.override[1]
  - box.iproto.override[-1]
2. set by request type name (the name must be in lowercase), e.g.:
  - box.iproto.override.select
  - box.iproto.override.unknown

Override-by-id allows setting a handler for a particular request type that is not known by the given version of Tarantool. This is not possible with override-by-name, where a type name must be known by Tarantool. There is also a special type name "unknown" and a type id box.iproto.type.UNKNOWN (== -1) that allow setting a single handler for all unknown request types.

Multiple triggers can be associated with a single event. The triggers are called in reverse order of their installation; however, triggers set by id are called before triggers set by name. If a trigger returns `false`, the next trigger in the list is called, or a system handler if there are no more triggers. If a trigger returns `true`, no more triggers or system handlers are called.

Part of #8138 NO_DOC=see next commit NO_CHANGELOG=see next commit

Co-authored-by:
Andrey Saranchin <Andrey22102001@gmail.com>
-
Ilya Verbin authored
So far only one type of a handler can be set for a particular request type, but the following commit will bring two more handlers. This patch prepares `struct iproto_req_handler` for this extension to simplify next commits. Part of #8138 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Ilya Verbin authored
This patch adds `iproto_key_translation' to the MsgPack decoding context, that is used to decode xrow header and body in the space recovery triggers. NO_DOC=Minor improvement NO_CHANGELOG=Minor improvement
-
Ilya Verbin authored
It works like func_adapter_push_msgpack, but with an additional MsgPack decoding context. The context is required to support translation of first-level `MP_MAP` keys. Needed for #8138 NO_DOC=internal NO_CHANGELOG=internal
-
Andrey Saranchin authored
Since we are going to move the iproto override mechanism to the trigger registry, we need to allow setting core triggers on the on-change event, because we need to notify iproto threads when a handler is overridden. Let's add a new method that accepts a core trigger and adds it to an internal trigger list. Needed for #8138 NO_DOC=internal NO_CHANGELOG=internal
-
Andrey Saranchin authored
Test box-luatest/iproto_request_handlers_overriding_test.lua has two test cases with the same name. As a result, the test case declared first is replaced by the second one and is never launched. The commit renames one of the cases. NO_CHANGELOG=test NO_DOC=test
-
- Feb 09, 2024
-
-
Ilya Verbin authored
This function returns a key_def part by a field number. However, currently it returns NULL for parts that contain a JSON path to indexed data. Fix it. Needed for tarantool/tarantool-ee#671 NO_DOC=bugfix NO_CHANGELOG=not visible in CE
-
Mergen Imeev authored
Closes #9657 @TarantoolBot document Title: The `sharding.rebalancer_mode` option The `sharding.rebalancer_mode` option can have one of three values: `manual`, `auto` and `off`. Default value is `auto`. If the option is set to `manual`, one of the replicasets must have the `rebalancer` sharding role. The rebalancer will be in this replicaset. If the option value is `auto` and there are no replicasets with the sharding role `rebalancer`, the replicaset with rebalancer will be selected automatically among all replicasets. If the value of the parameter is `auto` and one of the replicasets has the sharding role `rebalancer`, then the rebalancer will be in that replicaset. If the option value is `off`, rebalancing will be disabled regardless of whether a replicaset with the sharding role `rebalancer` exists or no such replicaset exists.
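The decision rules for `sharding.rebalancer_mode` can be summarised with a Python sketch. The function name, the input shape (a dict mapping replicaset names to their sharding roles), the `AUTO_SELECTED` marker, and the error raised in `manual` mode are all assumptions of this sketch; the text does not say how the automatic choice is made or what happens when several replicasets carry the role.

```python
AUTO_SELECTED = "<selected automatically>"  # placeholder, selection rule unspecified


def rebalancer_location(mode, replicasets):
    """Return where the rebalancer runs, or None if rebalancing is disabled."""
    explicit = [name for name, roles in replicasets.items()
                if "rebalancer" in roles]
    if mode == "off":
        # Disabled regardless of whether any replicaset has the role.
        return None
    if mode == "manual":
        if not explicit:
            raise ValueError("manual mode requires a replicaset "
                             "with the 'rebalancer' sharding role")
        return explicit[0]
    if mode == "auto":
        # An explicitly assigned role wins; otherwise the choice is automatic.
        return explicit[0] if explicit else AUTO_SELECTED
    raise ValueError(f"unknown rebalancer_mode: {mode!r}")


cluster = {"rs-1": ["storage"], "rs-2": ["storage", "rebalancer"]}
plain = {"rs-1": ["storage"], "rs-2": ["storage"]}

auto_explicit = rebalancer_location("auto", cluster)
auto_implicit = rebalancer_location("auto", plain)
disabled = rebalancer_location("off", cluster)
```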
-
Nikita Zheleztsov authored
This commit adds a new configuration option, wal_retention_period, and function stubs for it. It's needed to avoid rebootstrap on anonymous replicas, as Tarantool doesn't save xlogs for them. The new option takes a floating point number that sets, for every xlog file, the period during which the file cannot be deleted by the garbage collector. The default value is 0, which means no delay. The option can be set dynamically.

Note:
  - The delay is applied after xlog closing
  - During instance restart delay becomes box.cfg.wal_retention_period - last modification time of xlog.
  - The minimum vclock (same as xlog file name) can be found with box.info.gc().wal_retention_vclock.

The option value is stored and used in C code, so we define configuration callbacks in EE: cfg_set_wal_retention_period. Needed for tarantool/tarantool-ee#513 NO_DOC=EE NO_CHANGELOG=EE
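The retention arithmetic can be illustrated with a Python sketch. It reads the restart note as "the remaining delay is the configured period minus the time elapsed since the xlog was last modified", which is an interpretation; the function names and the use of plain timestamps in seconds are likewise assumptions, not the EE implementation.

```python
def remaining_delay(period, last_modified, now):
    """Seconds during which the xlog must still be kept (never negative)."""
    elapsed = now - last_modified
    return max(0.0, period - elapsed)


def can_delete(period, last_modified, now):
    # The garbage collector may remove the file once the delay has expired.
    # A period of 0 means no delay at all.
    return remaining_delay(period, last_modified, now) <= 0


# An xlog closed at t=1000 with a one-hour retention period.
kept = can_delete(3600, last_modified=1000, now=1600)
left = remaining_delay(3600, last_modified=1000, now=1600)
expired = can_delete(3600, last_modified=1000, now=5000)
no_delay = can_delete(0, last_modified=1000, now=1000)
```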
-
Nikita Zheleztsov authored
This commit introduces new methods for the vclock library. Sometimes we need to take into account the 0-th component of a vclock, as is done in the following commit; that's why vclock_min/max are added. vclockset_foreach is just a macro that allows iterating over a vclockset conveniently. NO_DOC=internal NO_CHANGELOG=internal
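A minimal Python sketch of component-wise minimum and maximum over vclocks, with the 0-th component included, is given below. It assumes that the new functions combine two vclocks component by component and that a missing component counts as 0. The dict representation and the Python function names are illustrative, not the C signatures.

```python
def vclock_min(a, b):
    """Component-wise minimum of two vclocks, including component 0."""
    ids = sorted(set(a) | set(b))
    return {i: min(a.get(i, 0), b.get(i, 0)) for i in ids}


def vclock_max(a, b):
    """Component-wise maximum of two vclocks, including component 0."""
    ids = sorted(set(a) | set(b))
    return {i: max(a.get(i, 0), b.get(i, 0)) for i in ids}


# Component 0 tracks local rows; components 1 and 2 belong to replicas.
first = {0: 5, 1: 10, 2: 7}
second = {0: 3, 1: 12}

lowest = vclock_min(first, second)
highest = vclock_max(first, second)

# Iterating over a collection of vclocks, in the spirit of vclockset_foreach.
total_local_rows = sum(v.get(0, 0) for v in (first, second))
```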
-
- Feb 08, 2024
-
-
Serge Petrenko authored
We've had numerous problems with transaction boundaries in replication. They were mostly caused by various cases when either the beginning or end of the transaction happened to be a local row. Local rows are not replicated, so the peer saw "corrupted" transactions with either no beginning or no end flag, even though the transaction contents were fine.

The problem with starting a transaction with a local row was solved in commit f41d1ddd ("wal: fix tx boundaries"), and that fix seems to continue working fine to this day.

The problem with ending transactions with a local row was first fixed in commit 25382617 ("replication: append NOP as the last tx row"); however, there were problems with this approach: when a user tried to write to local spaces on a replica from a replication trigger, it made it impossible to ever start replicating from the replica back to the master.

Another fix was proposed: in commit f96782b5 ("relay: send rows transactionally") we made relay read a full transaction into memory and then send it all at once, adjusting the transaction start and end flags when necessary. After that the NOPs were removed in commit f5e52b2c ("box: get rid of dummy NOPs after transactions ending with local rows"), since relay became capable of fixing transaction boundaries itself.

It turns out that the assumption that relay always sees a full transaction and may correctly set transaction boundaries is wrong: when a replica reconnects to the master, we set its starting vclock[0] to the one the master has at the moment of reconnect, so when recovery reads local rows with lsns less than vclock[0], it silently skips them without showing them to relay. When such skipped rows contain the is_commit flag for a currently sent transaction, we get the same problem as described before.

Let's make recovery track whether it has pushed any transaction rows to relay or not, and if yes, recover rows with the is_commit flag regardless of whether the rows were already applied. To prevent recovering the same data twice, recovery replaces the contents of such rows with NOPs. Basically, the row is "recovered" only for the sake of showing its is_commit flag to relay. Relay will skip the row anyway, since it remains local.

Follow-up #8958 Closes #9491 NO_DOC=bugfix
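The recovery rule described in this commit can be modelled with a simplified Python sketch. The row layout (dicts with `replica_id`, `lsn`, `is_commit`, `type`, `body`), the `recover` function, and the "lsn not greater than the vclock component" skip test are assumptions made for illustration; the real logic lives in Tarantool's recovery and relay code.

```python
def recover(rows, vclock):
    """Yield the rows that recovery shows to relay.

    Rows already covered by `vclock` are normally skipped. If such a row
    ends a transaction whose earlier rows were already pushed, it is still
    emitted, but as a NOP that only carries the is_commit flag.
    """
    pushed = []
    tx_in_progress = False
    for row in rows:
        already_applied = row["lsn"] <= vclock.get(row["replica_id"], 0)
        if already_applied:
            if tx_in_progress and row["is_commit"]:
                # Keep the transaction boundary, drop the contents.
                pushed.append({**row, "type": "NOP", "body": None})
                tx_in_progress = False
            continue
        pushed.append(row)
        tx_in_progress = not row["is_commit"]
    return pushed


rows = [
    {"replica_id": 1, "lsn": 5, "is_commit": False, "type": "INSERT", "body": "global"},
    # Local row (replica_id 0) that closes the transaction but is already
    # covered by vclock[0] after the reconnect.
    {"replica_id": 0, "lsn": 3, "is_commit": True, "type": "INSERT", "body": "local"},
]
result = recover(rows, vclock={0: 10, 1: 4})
```

Without the NOP, relay would see a transaction that begins but never ends; with it, the last emitted row carries `is_commit` and no data.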
-
Serge Petrenko authored
It doesn't make sense to assert that replica_id is correct in a row after using that replica id to make some decisions based on it. Let's switch the order of operations: first assert that replica_id is correct, then compare row lsn with the already recovered one. In-scope-of #9491 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
- Feb 07, 2024
-
-
Mergen Imeev authored
Roles are now started and stopped at the "post_apply" stage rather than at the "apply" stage. This allows require('config'):get() to correctly return the configuration that is being applied. Closes #9649 NO_DOC=bugfix
-
- Feb 06, 2024
-
-
Nikita Zheleztsov authored
This commit adds a test that covers the integration of Tarantool's config module with vshard and verifies the correctness of the changes in #9514 and tarantool/vshard#458. It checks that we're able to upgrade several vshard clusters without downtime. NO_DOC=test NO_CHANGELOG=test
-
Nikolay Shirokovskiy authored
We stop client fibers in the process of Tarantool shutdown in order to be sure that subsystems that will be shut down later are not being used. See the commit bf620650 ("box: finish client fibers on shutdown"). But there is one more way for client code to be executed: in a watcher callback. So let's shut down the watcher too. After the shutdown, the watcher API is still usable, so we can shut down client fibers next, but notifications are stopped. Part of #8423 NO_CHANGELOG=internal NO_DOC=internal
-