- Aug 23, 2022
-
-
Gleb Kashkin authored
When multiline commands were loaded from .tarantool_history, they were treated as a series of one-line commands. Now readline is configured to write timestamps to .tarantool_history as delimiters, so multiline commands are handled correctly. If a .tarantool_history file already exists, readline will add timestamps automatically and nothing will be lost. Closes #7320 NO_DOC=bugfix NO_TEST=impossible to check readline history from lua
-
- Aug 18, 2022
-
-
Vladimir Davydov authored
Currently, we create a database read view only to create a memtx snapshot or join a replica, but there's already quite a bit of code duplication between these two scenarios. In the future, we will need the same functionality to create a user read view. So let's factor out this code into a separate module - read_view. The API of the read_view module is quite simple - there are just two methods: open and close a read view. The user can pass a space and index filter while opening a read view to skip certain spaces or indexes. E.g. we skip all temporary spaces and secondary indexes when we create a memtx snapshot. A read_view object has a list of space_read_view objects, one per space included in the read view. A space_read_view object, in turn, has a map of all index_read_view objects (introduced earlier) corresponding to space indexes. There's nothing like a space cache - the user can create one if required. An engine that supports creation of a read view (currently, only memtx) is supposed to set the ENGINE_SUPPORTS_READ_VIEW flag and implement the create_read_view engine method in addition to the create_read_view index method. The engine method should do some engine-wide read view related preparations. For example, in the case of memtx, it suspends tuple garbage collection. Closes #7363 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
- Aug 17, 2022
-
-
Serge Petrenko authored
Downstream lag is the difference in time between the moment a transaction was written to the master's WAL and the moment an ack for it arrived. Its calculation relies on replicas sending the timestamp of the last applied row. When there is no replication, the last applied row timestamp stays the same, so in this case downstream lag grows as time passes.

Once an old master is replaced by a new one, it notices changes in peer vclocks and tries to update downstream lag unconditionally. This makes the lag appear to be growing indefinitely, showing the time since the last transaction on the old master:

```
downstream:
  status: follow
  idle: 0.018218606001028
  vclock: {1: 3, 2: 2}
  lag: 34.623061401367
```

The commit 56571d83 ("raft: make followers notice leader hang") made relay exchange information with tx even when there are no new transactions, so the issue became even easier to reproduce. The issue itself has been present since the introduction of downstream lag in commit 29025bce ("relay: provide information about downstream lag").

Closes #7581
NO_DOC=bugfix
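The lag arithmetic described above can be sketched as follows. This is a toy model, not the relay code: `struct lag_tracker`, `naive_lag()` and `lag_tracker_on_ack()` are hypothetical names, and the guard on the acknowledged LSN is only one conceivable way to avoid recomputing the lag from a stale timestamp; the commit may do it differently.

```c
#include <stdbool.h>

/* Hypothetical, simplified state a master keeps for one replica. */
struct lag_tracker {
    /* Highest LSN of the master's own rows acknowledged so far. */
    long long acked_lsn;
    /* Last computed downstream lag, in seconds. */
    double lag;
};

/*
 * Unconditional update: with a stale last_row_tm the result keeps
 * growing as `now` advances, which is the symptom described above.
 */
double
naive_lag(double now, double last_row_tm)
{
    return now - last_row_tm;
}

/*
 * Guarded update (an assumption, not necessarily the actual fix):
 * recompute the lag only when the ack confirms new rows.
 * Returns true if the lag was updated.
 */
bool
lag_tracker_on_ack(struct lag_tracker *t, double now, double last_row_tm,
                   long long acked_lsn)
{
    if (acked_lsn <= t->acked_lsn)
        return false;
    t->acked_lsn = acked_lsn;
    t->lag = now - last_row_tm;
    return true;
}
```

With the naive version, an ack that carries no new rows still refreshes the lag, so the reported value tracks the time elapsed since the last transaction rather than the replication delay.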
-
Cyrill Gorcunov authored
The 'log' module uses fibers internally for log rotation, and before we can free the log's resources (on program exit) we need to wait until rotation is complete, which implies that the event loop is still running. But we break the event loop in the `on_shutdown_f` trigger, and calling any event-based functionality later causes unexpected results because fibers are no longer valid to use. Thus move the `say_logger_free` call into the `on_shutdown_f` body, where fibers are still alive. N.B. Testing the issue is sensitive to timings: local tests showed that a minimal delay of 1ms is enough to trigger it, thus ERRINJ_LOG_ROTATE is increased. Fixes #4450 NO_DOC=bugfix Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Some functions in src/main.cc are declared as global while they are used in file scope only. Declare them as appropriate. NO_DOC=cleanup NO_CHANGELOG=cleanup NO_TEST=cleanup Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Aug 16, 2022
-
-
Ilya Verbin authored
If two or more fibers are yielding in fiber_join_timeout(), one of them will eventually join and recycle the fiber, while the rest will crash on accessing the recycled fiber's struct. Fix this by doing fiber_find() again after each waiting attempt in lbox_fiber_join(). Closes #7489 Closes #7531 NO_DOC=bugfix
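The idea of looking the fiber up again after every wait can be illustrated with a toy registry. All names below (`toy_fiber`, `toy_fiber_find()`, `toy_join_after_wakeup()`) are hypothetical and the model has no real scheduler; it only shows why a joiner must not keep a pointer across a yield.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_FIBERS = 4 };

/* Hypothetical fiber slot; id 0 marks a free (recycled) slot. */
struct toy_fiber {
    unsigned id;
    bool is_dead;
};

static struct toy_fiber toy_registry[TOY_MAX_FIBERS];

/* Look a fiber up by id; returns NULL if it has been recycled. */
struct toy_fiber *
toy_fiber_find(unsigned id)
{
    if (id == 0)
        return NULL;
    for (size_t i = 0; i < TOY_MAX_FIBERS; i++) {
        if (toy_registry[i].id == id)
            return &toy_registry[i];
    }
    return NULL;
}

/*
 * What a joiner does each time its wait returns: it looks the fiber
 * up again instead of reusing a pointer saved before the yield.
 * Returns 1 if this caller joined and recycled the fiber, 0 if the
 * fiber is still running, -1 if someone else already recycled it.
 */
int
toy_join_after_wakeup(unsigned id)
{
    struct toy_fiber *f = toy_fiber_find(id);
    if (f == NULL)
        return -1;
    if (!f->is_dead)
        return 0;
    f->id = 0;
    f->is_dead = false;
    return 1;
}
```

When two joiners wake up for the same dead fiber, the first call returns 1 and the second returns -1 instead of touching the recycled slot.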
-
Ilya Verbin authored
It is separated from fiber_join_timeout(), and will be used in lbox_fiber_join() too. Part of #7489 Part of #7531 NO_DOC=internal NO_CHANGELOG=internal
-
- Aug 15, 2022
-
-
Gleb Kashkin authored
Since the underlying problem behind this injection is fixed in #7357, the injection can be removed and the `-i` flag can be used as initially intended. Closes #7554 Requires #7357 NO_DOC=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
We will add all source files related to user read views under this option. Needed for https://github.com/tarantool/tarantool-ee/issues/191 NO_DOC=internal NO_TEST=internal NO_CHANGELOG=internal
-
Vladimir Davydov authored
We need these functions to implement format-less tuple comparison. Needed for https://github.com/tarantool/tarantool-ee/issues/191 NO_DOC=internal NO_TEST=internal NO_CHANGELOG=internal
-
- Aug 11, 2022
-
-
Vladimir Davydov authored
To make a memtx snapshot, we use the create_snapshot_iterator index method. The method creates a 'frozen' iterator over an index - changes done to the index after the iterator was created don't affect the iterator output. Also, the iterator is safe to use from any thread. This API works just fine for snapshots, but it's too limited to allow creation of user read views, so we need to rework it. To make the existing snapshot infrastructure suitable for user read views, this commit replaces the create_snapshot_iterator method with create_read_view. The new method returns an index_read_view object, which has an API similar to the read-only API of an index. A read view object may only be created and destroyed in the tx thread, but it may be used in any thread. Currently, index_read_view has only one method - create_iterator, which takes an iterator type and a key and returns an index_read_view_iterator object. The iterator type and key arguments are ignored and we always assume the iterator type to be ITER_ALL (asserted), but later on we will fix this and also add a method to look up a tuple by key. Closes #7194 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
Since commit f167c1af ("memtx: decompress tuples in snapshot iterator") a snapshot iterator may allocate the result tuple on the fiber region - the caller is supposed to clean the region after usage. So we don't need to store the tuple in sequence_data_iterator anymore - we can allocate it on the fiber region instead, which is simpler and more straightforward. NO_DOC=internal NO_TEST=internal NO_CHANGELOG=internal
-
Vladimir Davydov authored
The create_snapshot_iterator index callback is used by the memtx engine to create a consistent read view of data stored in memtx so that it can be written to a snapshot or sent to a remote replica. We also define and use this callback internally in vinyl to implement initial join. Actually, there's no need to have this code wrapped in a callback in vinyl, because it's never called from outside the vinyl internals. Let's inline it and drop the callback for vinyl. This will simplify further refactoring of the internal index read view API. Needed for #7194 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
'get_raw' is a misleading name, because usually we append the '_raw' suffix to functions that work with raw MsgPack while 'get_raw' actually returns a formatted tuple. The function is used internally in memtx to implement tuple compression. Let's call it 'get_internal' to emphasize that. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Boris Stepanenko authored
With the current leader fencing implementation, the old leader doesn't resign its leadership before a new leader may be elected. Because of this, several "leaders" might coexist in a replicaset for some time. This commit changes replication_disconnect_timeout so that it is twice as short for the current raft leader (2*replication_timeout) if strict fencing is enabled. Assuming that replication_timeout is the same for every replica in the replicaset, this makes it less probable that a new leader can be elected before the old one resigns its leadership. The old fencing behaviour can be enabled by setting fencing to soft mode. This is useful when connection death timeouts shouldn't be affected (e.g. different replication_timeouts are set to prioritize some replicas as leader over the others).

Closes #7110

@TarantoolBot document
Title: Strict fencing

In `box.cfg`, the option `election_fencing_enabled` is deprecated in favor of `election_fencing_mode`. `election_fencing_mode` can be set to one of the following values:

'off' - fencing turned off (same as `election_fencing_enabled` set to false before). Connection death timeout is 4*replication_timeout for all nodes.

'soft' (default) - fencing turned on, but connection death timeout is the same for leader and followers in the replicaset. This is enough to solve the problem of the cluster being read-only and not being able to elect a new leader in some situations because of pre-vote. Connection death timeout is 4*replication_timeout for all nodes.

'strict' - fencing turned on. In this mode the leader tries its best to resign leadership before a new leader can be elected. This is achieved by halving the death timeout on the leader. Connection death timeout is 4*replication_timeout for followers and 2*replication_timeout for the current leader.
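The timeout rule from the description above can be written down as a small function. This is a sketch only: the enum and `toy_disconnect_timeout()` are hypothetical names, not the actual replication code, and the multipliers are taken directly from the text.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three election_fencing_mode values. */
enum toy_fencing_mode {
    TOY_FENCING_OFF,
    TOY_FENCING_SOFT,
    TOY_FENCING_STRICT,
};

/*
 * Connection death timeout as described in the commit message:
 * 2 * replication_timeout for the current leader in strict mode,
 * 4 * replication_timeout in every other case.
 */
double
toy_disconnect_timeout(double replication_timeout,
                       enum toy_fencing_mode mode, bool is_leader)
{
    if (mode == TOY_FENCING_STRICT && is_leader)
        return 2 * replication_timeout;
    return 4 * replication_timeout;
}
```

With `replication_timeout = 1`, a strict-mode leader gives up on a silent peer after 2 seconds while followers wait 4 seconds, which is what leaves the old leader time to resign before a new election can succeed.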
-
Boris Stepanenko authored
Currently box_raft asserts that raft is initialized when it is called. For strict fencing, box_raft will be called in replication_disconnect_timeout to set different timeouts for leader and follower. Sometimes replication_disconnect_timeout is called before raft is initialized. This commit changes box_raft behaviour: the assertion is removed, and NULL is returned instead of a pointer to the global raft state if raft isn't initialized. This makes it possible to call box_raft even before raft has been initialized, provided the caller checks that the return value isn't NULL. Assuming that this assertion didn't trigger anywhere else, there is no need to check for box_raft returning NULL anywhere except in new calls. Even if this changes in the future, it will trigger a segmentation fault and the problem can be easily localized. Part of #7110 NO_DOC=internal changes NO_TEST=internal changes NO_CHANGELOG=internal changes
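The accessor pattern described above, returning NULL instead of asserting, might look roughly like this. The `toy_` names and the single `is_leader` field are illustrative assumptions, not the real raft structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global raft state. */
struct toy_raft {
    bool is_leader;
};

static struct toy_raft toy_raft_global;
static bool toy_raft_is_initialized;

/* Returns NULL instead of asserting when raft is not initialized yet. */
struct toy_raft *
toy_box_raft(void)
{
    return toy_raft_is_initialized ? &toy_raft_global : NULL;
}

/* A new caller that may run early and therefore checks for NULL. */
bool
toy_is_raft_leader(void)
{
    struct toy_raft *raft = toy_box_raft();
    return raft != NULL && raft->is_leader;
}
```

Callers that existed before the change keep dereferencing the result directly; only call sites that can run before initialization need the NULL check.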
-
- Aug 09, 2022
-
-
Gleb Kashkin authored
The interactive mode used to be ignored when stdin was not a tty; this is no longer the case. Now the results of another command can be handled by tarantool.

Before the patch:

```
$ echo 42 | tarantool -i
LuajitError: stdin:1: unexpected symbol near '42'
fatal error, exiting the event loop
```

After the patch:

```
$ echo 42 | tarantool -i
Tarantool 2.5.0-130-ge3cf64a6c
type 'help' for interactive help
tarantool> 42
---
- 42
...
```

Closes #5064
NO_DOC=bugfix
-
Alexander Turenko authored
It is counter-intuitive to see options of a component that is disabled at build time, especially when the returned value means that the component is enabled (while it is not).

Before this patch (on a `-DENABLE_FEEDBACK_DAEMON=OFF` build):

```yaml
tarantool> box.cfg()
tarantool> box.cfg.feedback_enabled
---
- true
...
```

After this patch (on a `-DENABLE_FEEDBACK_DAEMON=OFF` build):

```yaml
tarantool> box.cfg()
tarantool> box.cfg.feedback_enabled
---
- null
...
```

NB: The following test cases in cartridge fail with `-DENABLE_FEEDBACK_DAEMON=OFF` (both before and after the patch):

* integration.feedback.test_feedback
* integration.feedback.test_rocks

Since they verify cartridge's additions for the feedback daemon, this is the expected outcome of disabling the component entirely. Ideally we should conditionally disable those test cases, but it is out of scope here.

Follows up #3308

NO_DOC=I think it is expected behavior and unlikely it requires any change in the documentation
NO_TEST=a test would verify behavior of the particular build type, but we have no such configuration in CI, so the test would be pretty useless
NO_CHANGELOG=seems too minor to highlight it for users
-
- Aug 08, 2022
-
-
Ilya Verbin authored
To avoid potential buffer overflows and to make static analyzers happy.

Fixed CWE-120:
- sprintf: does not check for buffer overflows
- strcpy: does not check for buffer overflows when copying to destination
- strcat: does not check for buffer overflows when concatenating to destination

Closes #7534
NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Ilya Verbin authored
strlcat is a function from BSD, which is designed to be safer, more consistent, and less error prone replacement for strcat and strncat. NO_DOC=internal NO_CHANGELOG=internal Part of #7534
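For readers unfamiliar with the BSD semantics, here is a minimal sketch of a strlcat-like function. It is written from the documented behaviour (always NUL-terminate when there is room, return the length the concatenation would have needed) and is named `toy_strlcat()` to make clear that it is not the implementation added in this commit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src to dst, where size is the full size of the dst buffer.
 * At most size - 1 bytes end up in dst and the result is
 * NUL-terminated whenever dst was terminated within size bytes.
 * Returns the length of the string it tried to create, so a return
 * value >= size means the output was truncated.
 */
size_t
toy_strlcat(char *dst, const char *src, size_t size)
{
    size_t dst_len = 0;
    while (dst_len < size && dst[dst_len] != '\0')
        dst_len++;
    size_t src_len = strlen(src);
    /* No terminator within size bytes: nothing can be appended. */
    if (dst_len == size)
        return size + src_len;
    size_t room = size - dst_len - 1;
    size_t n = src_len < room ? src_len : room;
    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';
    return dst_len + src_len;
}
```

Unlike strcat, the caller passes the total buffer size once and can detect truncation with a single comparison of the return value against that size.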
-
- Aug 05, 2022
-
-
Vladimir Davydov authored
The hash index doesn't create a snapshot clarifier, which is used for filtering out uncommitted tuples from a snapshot. Fix this. Also fix a bug in hash_snapshot_iterator_next, where we passed a wrong argument to tuple_data_range. It hasn't fired, because the clarifier didn't work. Fixes commit ee8ed065 ("txm: clarify all fetched tuples"). Fixes commit f167c1af ("memtx: decompress tuples in snapshot iterator"). Closes #7539 NO_DOC=bug fix
-
Georgiy Lebedev authored
Gap tracking does not handle gap writes when the key has the same value as the gap item: review the whole gap write handling logic, refactor it and fix handling of corner cases along the way. Co-authored-by: Alexander Lyapunov <alyapunov@tarantool.org> Closes #7375 NO_DOC=bugfix
-
Georgiy Lebedev authored
Since `ITER_ALL` is an alias to `ITER_GE` in context of TREE index, denormalize it during iterator creation. Needed for #7375 NO_CHANGELOG=refactoring NO_DOC=refactoring NO_TEST=refactoring
-
Georgiy Lebedev authored
The problem is described in #7073. It was fixed only for `tree_iterator_start_raw` next method, but other methods used for reverse iterators are also subject to this bug: move tuple clarification from the wrapper of iterator `next` methods to individual iterator methods. Closes #7432 NO_DOC=bugfix
-
Alexander Turenko authored
The Rust module (see the issue) needs a getter and a setter for decimal values on the Lua stack. Let's make them part of the module API.

Part of #7228

@TarantoolBot document
Title: Lua/C functions for decimals in the module API

The following functions are added into the module API:

```c
/**
 * Allocate a new decimal on the Lua stack and return
 * a pointer to it.
 */
API_EXPORT box_decimal_t *
luaT_newdecimal(struct lua_State *L);

/**
 * Allocate a new decimal on the Lua stack with copy of given
 * decimal and return a pointer to it.
 */
API_EXPORT box_decimal_t *
luaT_pushdecimal(struct lua_State *L, const box_decimal_t *dec);

/**
 * Check whether a value on the Lua stack is a decimal.
 *
 * Returns a pointer to the decimal on a successful check,
 * NULL otherwise.
 */
API_EXPORT box_decimal_t *
luaT_isdecimal(struct lua_State *L, int index);
```
-
Alexander Turenko authored
This change follows the previous commits regarding decimal, uuid and datetime functions. See them for details. Part of #7228 NO_DOC=refactoring, no user-visible changes NO_TEST=refactoring, no behavior changes NO_CHANGELOG=refactoring, no user-visible changes
-
Alexander Turenko authored
This change follows the previous commits regarding `luaT_{new,push}decimal()` and `luaT_{new,push}uuid()`. See them for details. Part of #7228 NO_DOC=refactoring, no user-visible changes NO_TEST=refactoring, no behavior changes NO_CHANGELOG=refactoring, no user-visible changes
-
Alexander Turenko authored
This change follows the previous commit regarding `luaT_newdecimal()` and `luaT_pushdecimal()`, see explanation and details there. Also changed the `luaL_` prefix to more appropriate `luaT_`. The `struct tt_uuid` is our own type, the functions are specific to tarantool. So `luaT_`. Part of #7228 NO_DOC=refactoring, no user-visible changes NO_TEST=refactoring, no behavior changes NO_CHANGELOG=refactoring, no user-visible changes
-
Alexander Turenko authored
`luaT_pushdecimal()` now accepts a decimal argument to copy into the Lua managed memory. `luaT_newdecimal()` now does what `luaT_pushdecimal()` did before: it just allocates storage for a decimal on the Lua stack. This naming looks much more friendly. It also seems to follow the Lua API naming: `lua_push*()` accepts what to push, `lua_new*()` doesn't.

A couple of notes around the change:

* At first glance it seems that `luaT_pushdecimal()` is redundant, because it can be written using `luaT_newdecimal()` + copying. That's true in contexts where we know the size of the internal `decimal_t` structure. A user of the module API doesn't know it and should pass a `box_decimal_t *` pointer to `luaT_pushdecimal()` to write the value.
* I use `memcpy()` instead of just `*a = *b` in `luaT_pushdecimal()` to copy the padding byte content. Who knows, maybe this not-so-legal way to hold extra information may be crucial for some use case or will allow us to add one field into the structure.

This is a preparatory commit for exposing the `luaT_*decimal()` functions into the module API. The next commits will change uuid, datetime and interval functions in the same way.

Part of #7228
NO_DOC=refactoring, no user-visible changes
NO_TEST=will be tested in a next commit, after exposing to the module API
NO_CHANGELOG=refactoring, no user-visible changes
-
Alexander Turenko authored
This way we can use just luaT_isdecimal() instead of two calls: luaT_isdecimal() + luaT_checkdecimal() or luaT_isdecimal() + luaL_checkcdata(). It is convenient and we already follow this approach in luaT_istuple(). The difference from luaT_checkdecimal() is that luaT_isdecimal() does not raise a Lua exception, which may be undesirable and/or complicated to handle in some contexts. This is the preparation for exposing luaT_isdecimal() into the module API. Part of #7228 NO_DOC=refactoring, no user-visible changes NO_TEST=will be tested in a next commit, after exposing to the module API NO_CHANGELOG=refactoring, no user-visible changes
-
Alexander Turenko authored
We use the tarantool specific prefix for functions that are working with tarantool specific types. lua_ or luaL_ prefix may be confusing, because it is not always clear what is the origin of the function and where to find its documentation. This change is the preparation for exposing luaT_pushdecimal() and luaT_isdecimal() into the module API. While I'm here, I made several tidy changes: * Added `static` where appropriate. * Removed luaT_pushdecimalstr() from the header file, because it is not used outside of the compilation unit. Part of #7228 NO_DOC=refactoring, no user-visible changes NO_TEST=refactoring, nothing new to test NO_CHANGELOG=refactoring, no user-visible changes
-
Serge Petrenko authored
It's possible to hang an instance by some non-yielding request. The simplest example is `while true do end`. A more true to life one would be a `select{}` from a large space, or `pairs` iteration over a space without yields. Any such request makes the instance unresponsive - it can serve neither reads nor writes. At the same time, the instance appears alive to other cluster members: relay thread used to communicate with others is not hung and continues to send heartbeats every replication_timeout. The problem is the most severe with Raft leader elections: followers believe the leader is fine and do not start elections despite leader being unable to serve reads or writes. Closes #7512 NO_DOC=bugfix
-
- Aug 04, 2022
-
-
Vladimir Davydov authored
Currently, there's no notion of a BPS tree read view per se - one can create an iterator over a regular tree and then "freeze" it. This works just fine for snapshotting and joining replicas, but this spartan API doesn't let us implement user read views, because to do that we need to do lookups and create iterators over a frozen tree as many times as we want, not just once.

So this patch introduces a concept of bps_tree_view, which contains a frozen image of a bps_tree and implements a subset of non-modifying bps_tree methods:
- bps_tree_view_size
- bps_tree_view_find
- bps_tree_view_first
- bps_tree_view_last
- bps_tree_view_lower_bound
- bps_tree_view_lower_bound_elem
- bps_tree_view_upper_bound
- bps_tree_view_upper_bound_elem
- bps_tree_view_iterator_get_elem
- bps_tree_view_iterator_prev
- bps_tree_view_iterator_next
- bps_tree_view_iterator_is_equal

Note, bps_tree and bps_tree_view share bps_tree_iterator, because iterator methods (get_elem, next, prev, is_equal) take bps_tree or bps_tree_view. The bps_tree_iterator now contains only block index and offset.

We could also implement the rest of the non-modifying methods, but didn't do that, because they are not needed to implement user read views:
- bps_tree_random
- bps_tree_approximate_count
- bps_tree_debug_check
- bps_tree_print

To create a bps_tree_view from a bps_tree, one is supposed to call bps_tree_view_create. If a bps_tree_view is no longer needed, it should be destroyed with bps_tree_view_destroy.

Old methods used for creating frozen iterators were dropped:
- bps_tree_iterator_freeze
- bps_tree_iterator_destroy

To avoid code duplication, we factored out the common part of bps_tree and bps_tree_view into a new structure, named bps_tree_common. Basically, the new structure contains all bps_tree members except matras, which is stored in bps_tree. The difference between bps_tree_view and bps_tree is that the former stores matras_view instead of matras. The common part contains pointers to matras and matras_view, which are used by the internal implementation to look up bps_tree blocks. All internal methods now take bps_tree_common instead of bps_tree. For all public methods that are implemented both for bps_tree and bps_tree_view, we have the common implementation defined in an _impl suffixed private function, which is called by the corresponding public functions.

To ensure that a modifying method isn't called on a bps_tree_common object corresponding to a bps_tree_view because of a bug in the bps_tree implementation, we added a !matras_is_read_view_created assertion to bps_tree_touch_block.

Closes #7191
NO_DOC=refactoring NO_CHANGELOG=refactoring
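The layering described above (a common part, a shared `_impl` function, and thin public wrappers for the container and its view) can be illustrated with a toy sorted array. Everything here is hypothetical: the `toy_` names are invented, and the view copies the data eagerly, whereas the real code relies on matras read views.

```c
#include <stddef.h>
#include <string.h>

enum { TOY_CAPACITY = 8 };

/* Part shared by the mutable container and its read view. */
struct toy_common {
    const int *elems; /* sorted in ascending order */
    size_t size;
};

struct toy_tree {
    struct toy_common common;
    int storage[TOY_CAPACITY];
};

struct toy_tree_view {
    struct toy_common common;
    int frozen[TOY_CAPACITY];
};

/* Shared implementation: index of the first element >= key. */
static size_t
toy_lower_bound_impl(const struct toy_common *c, int key)
{
    size_t lo = 0, hi = c->size;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->elems[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Public wrappers only pick the common part and delegate. */
size_t
toy_tree_lower_bound(const struct toy_tree *t, int key)
{
    return toy_lower_bound_impl(&t->common, key);
}

size_t
toy_tree_view_lower_bound(const struct toy_tree_view *v, int key)
{
    return toy_lower_bound_impl(&v->common, key);
}

void
toy_tree_create(struct toy_tree *t)
{
    t->common.elems = t->storage;
    t->common.size = 0;
}

/* Append a value; the caller keeps the order ascending. */
int
toy_tree_append(struct toy_tree *t, int value)
{
    if (t->common.size == TOY_CAPACITY)
        return -1;
    t->storage[t->common.size++] = value;
    return 0;
}

/* Freeze the current contents (eager copy in this toy model). */
void
toy_tree_view_create(struct toy_tree_view *v, const struct toy_tree *t)
{
    memcpy(v->frozen, t->storage, t->common.size * sizeof(int));
    v->common.elems = v->frozen;
    v->common.size = t->common.size;
}
```

After a view is created, further appends to the tree are visible through `toy_tree_lower_bound()` but not through `toy_tree_view_lower_bound()`, and both go through the same search code.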
-
Vladimir Davydov authored
We have a method for getting the number of elements stored in a BPS tree. Let's use it instead of accessing BPS tree internals directly so that we can freely refactor BPS tree internals. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Vladimir Davydov authored
- Add bps_tree_delete_value to the comment and declarations. (All other public methods are there.) - Fix typo in a comment: approxiamte -> approximate. - Fix comment to bps_tree_random. - Remove repeated word 'count' from comments. NO_DOC=no change NO_TEST=no change NO_CHANGELOG=no change
-
Vladimir Davydov authored
- Rename bps_tree_iterator_are_equal to bps_tree_iterator_is_equal for consistency with other methods that check two objects for equality (for example, tt_uuid_is_equal). - Rename bps_tree_iterator_first and bps_tree_iterator_last to bps_tree_first and bps_tree_last, because these are methods of bps_tree, not bps_tree_iterator. Omitting _iterator is also consistent with bps_tree_lower_bound and bps_tree_upper_bound methods, which also create bps_tree_iterator objects. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
-
Georgiy Lebedev authored
In case of reverse iterators, due to index limitations, we need to clarify the successor tuple early: this implies that the successor's story is not always at the top of the history chain, whilst we need to add the gap item to the story currently present in the index. Fix this by reusing the iterators' check logic to set the current iterator's tuple (which is considered the successor) to a tuple in the index. Closes #7409 NO_DOC=bugfix
-
Georgiy Lebedev authored
All the 'base' TREE index iterator `next` methods (including `tree_iterator_start`) internally set the iterator's current tuple to the one found in the index satisfying the conditions: setting the iterator's current tuple to the clarified one is redundant and moreover gives a performance penalty each iteration because of the iterator check logic. Needed for #7409 NO_CHANGELOG=refactoring NO_DOC=refactoring NO_TEST=refactoring
-
Georgiy Lebedev authored
The `next` method of memtx HASH index 'GT' iterator is initially set to 'GT' and is supposed to be set to 'GE' after first iteration: it is mistakenly set to the 'base' method instead of the full method which also does tuple clarification — this allows dirty reads. Move the `next` method change on first iteration to `WRAP_ITERATOR_METHOD` for clarity and correctness. Closes #7477 NO_DOC=bugfix
-
Vladimir Davydov authored
Currently, there's no notion of a LIGHT hash table read view per se - one can create an iterator over a regular hash table and then "freeze" it. This works just fine for snapshotting and joining replicas, but this spartan API doesn't let us implement user read views, because to do that we need to do lookups and create iterators over a frozen hash table as many times as we want, not just once.

So this patch introduces a concept of LIGHT(view), which contains a frozen image of a LIGHT(core) and implements a subset of non-modifying LIGHT(core) methods:
- LIGHT(view_count)
- LIGHT(view_find)
- LIGHT(view_find_key)
- LIGHT(view_get)
- LIGHT(view_iterator_begin)
- LIGHT(view_iterator_key)
- LIGHT(view_iterator_get_and_next)

Note, LIGHT(core) and LIGHT(view) share LIGHT(iterator), because iterator methods (begin, key, get_and_next) take LIGHT(core) or LIGHT(view). The LIGHT(iterator) now contains only a hash table slot.

We could also implement the rest of the non-modifying methods, but didn't do that, because they are not needed to implement user read views:
- LIGHT(random)
- LIGHT(selfcheck)

To create a LIGHT(view) from a LIGHT(core), one is supposed to call LIGHT(view_create). If a LIGHT(view) is no longer needed, it should be destroyed with LIGHT(view_destroy).

Old methods used for creating frozen iterators were dropped:
- LIGHT(iterator_freeze)
- LIGHT(iterator_destroy)

To avoid code duplication, we factored out the common part of LIGHT(core) and LIGHT(view) into a new structure, named LIGHT(common). Basically, the new structure contains all LIGHT(core) members except matras, which is stored in LIGHT(core). The difference between LIGHT(view) and LIGHT(core) is that the former stores matras_view instead of matras. The common part contains pointers to matras and matras_view, which are used by the internal implementation to look up LIGHT(record). All internal methods now take LIGHT(common) instead of LIGHT(core). For all public methods that are implemented both for LIGHT(core) and LIGHT(view), we have the common implementation defined in an _impl suffixed private function, which is called by the corresponding public functions.

To ensure that a modifying method isn't called on a LIGHT(common) object corresponding to a LIGHT(view) because of a bug in the LIGHT code, we added a !matras_is_read_view_created assertion to LIGHT(touch_record), LIGHT(prepare_first_insert), and LIGHT(grow).

Closes #7192
NO_DOC=refactoring NO_CHANGELOG=refactoring
-