- Oct 28, 2022
-
-
Vladimir Davydov authored
This commit fixes BEGIN, COMMIT, and ROLLBACK counters in the box.stat() output. Before this commit, they always showed 0. Now, they report the number of started, committed, and rolled back transactions, respectively. Closes #7583 NO_DOC=bug fix
-
Ilya Verbin authored
Currently, if a snapshot contains some correct entries, but doesn't include system spaces, Tarantool crashes with segmentation fault, or for Debug build: void diag_raise(): Assertion `e != NULL' failed. This happens because memtx_engine_recover_snapshot returns -1, while diag is not set. Let's panic instead of a crash. Closes #7800 NO_DOC=bugfix
-
- Oct 26, 2022
-
-
Vladimir Davydov authored
0xc1 isn't a valid MsgPack header, but it was allowed by mp_check. As a result, msgpack.decode crashed while trying to decode it. This commit updates the msgpuck library to fix this issue. Closes #7818 NO_DOC=bug fix
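As an illustration of the kind of check involved, here is a minimal Python sketch. It is not msgpuck's code, and the function name is hypothetical. It relies only on the fact that the MessagePack format marks `0xc1` as "never used", while every other byte value starts some valid type.

```python
# Hypothetical helper, not part of msgpuck.
MSGPACK_RESERVED = 0xC1  # marked "(never used)" in the MessagePack format


def is_valid_msgpack_header(byte: int) -> bool:
    """Return True if `byte` can be the first byte of a MessagePack value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("header must be a single byte")
    return byte != MSGPACK_RESERVED


print(is_valid_msgpack_header(0xC0))  # nil header
print(is_valid_msgpack_header(0xC1))  # reserved byte
```

A decoder that runs such a check before dispatching on the header byte can report an error instead of reaching an unhandled case.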
-
- Oct 25, 2022
-
-
Nikolay Shirokovskiy authored
So one can easily check current box status. NO_DOC=minor change Closes #7255
-
Vladimir Davydov authored
We're planning to introduce a basic C API for user read views (EE-only). Like all other box C API functions, the new API functions will use the existing box error C API for reporting errors. The problem is that a read view created using C API should be usable from user threads (started with the pthread lib) while the box error C API doesn't work in user threads, because those threads don't have the cord pointer initialized (a diagnostic area is stored in a cord object). To address this issue, let's create a new cord object automatically on first use of cord() if it wasn't created explicitly. Automatically created object is destroyed at thread exit (to achieve that, we use the C++ RAII concept). Closes #7814 NO_DOC=The C API documentation doesn't say anything about threads. Let's keep it this way for now. We're planning to introduce a new C API to work with threads in C modules. We'll update the doc when it's ready.
-
Serge Petrenko authored
Closes #7797 NO_DOC=security fix NO_TEST=security fix
-
Serge Petrenko authored
getenv() return values cannot be trusted, because an attacker might set them. For instance, we shouldn't expect that getenv() returns a value of some sane size. Another problem is that getenv() returns a pointer to one of the `char **environ` members, which might change upon the next setenv().

Introduce a wrapper, getenv_safe(), which returns the value only when it fits in a buffer of a specified size, and copies the value into the buffer. Use this wrapper everywhere in our code.

Below is a slightly decorated output of `grep -rwn getenv ./src --include *.c --include *.h --include *.cc --include *.cpp --include *.hpp --exclude *.lua.c` as of 2022-10-14. `-` marks invalid occurrences (comments, for example), `*` marks the places that are already guarded before this patch, `X` marks the places guarded in this patch, and `^` marks places fixed in the next commit:

NO_WRAP
```
* ./src/lib/core/coio_file.c:509: const char *tmpdir = getenv("TMPDIR");
X ./src/lib/core/errinj.c:75: const char *env_value = getenv(inj->name);
- ./src/proc_title.c:202: * that might try to hang onto a getenv() result.)
- ./src/proc_title.c:241: * is mandatory to flush internal libc caches on getenv/setenv
X ./src/systemd.c:54: sd_unix_path = getenv("NOTIFY_SOCKET");
* ./src/box/module_cache.c:300: const char *tmpdir = getenv("TMPDIR");
X ./src/box/sql/os_unix.c:1441: azDirs[0] = getenv("SQL_TMPDIR");
X ./src/box/sql/os_unix.c:1446: azDirs[1] = getenv("TMPDIR");
* ./src/box/lua/console.c:394: const char *envvar = getenv("TT_CONSOLE_HIDE_SHOW_PROMPT");
^ ./src/box/lua/console.lua:771: local home_dir = os.getenv('HOME')
^ ./src/box/lua/load_cfg.lua:1007: local raw_value = os.getenv(env_var_name)
X ./src/lua/init.c:575: const char *path = getenv(envname);
X ./src/lua/init.c:592: const char *home = getenv("HOME");
* ./src/find_path.c:77: snprintf(buf, sizeof(buf) - 1, "%s", getenv("_"));
```
NO_WRAP

Part-of #7797
NO_DOC=security
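The contract of such a wrapper can be sketched in Python. This is only an analogue under assumptions: the commit message does not show the real C signature, so the parameter names and the `None` return are hypothetical.

```python
import os
from typing import Optional


def getenv_safe(name: str, buf_size: int) -> Optional[str]:
    """Return the variable's value, or None if it is unset or too large.

    Hypothetical analogue of the C wrapper described above.
    """
    value = os.environ.get(name)
    if value is None:
        return None
    # Mirror a C buffer: the value plus its terminating NUL must fit.
    if len(os.fsencode(value)) + 1 > buf_size:
        return None
    # Python strings are immutable, so a later change to the environment
    # cannot alter the returned value; in C this requires an explicit copy.
    return value
```

The caller chooses the size limit, so an oversized value is rejected up front instead of being truncated or overflowing a buffer later.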
-
- Oct 24, 2022
-
-
Mergen Imeev authored
This patch fixes the issue described in issue #5310 when the tuple format has more fields than the space format. This solution is more general than the solution in 89057a21. Follow-up #5310 Closes #4666 NO_DOC=bugfix
-
- Oct 21, 2022
-
-
Georgiy Lebedev authored
Since our diagnostics use the `__FILE__` macro, they provide absolute paths, which is redundant and inconsistent: replace them with relative ones. As for debugging information, replacing absolute paths with relative ones also requires an extra command to tell the debugger where to find the source files, which is not convenient for developers: provide a new `DEV_BUILD` option (turned off by default), which replaces absolute paths with relative ones in debugging information if turned off. Strip the prefix map flags from compiler flags exported to tarantool via `src/trivia/config.h`. Closes #7808 NO_DOC=<verbosity> NO_TEST=<verbosity>
-
- Oct 20, 2022
-
-
Andrey Saranchin authored
If we raise different errors in case of entering an invalid password and entering the login of a non-existent user during authorization, it will open the door for an unauthorized person to enumerate users. So let's unify raised errors in the cases described above. Closes #tarantool/security#16 NO_DOC=security fix
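A minimal Python sketch of the idea follows. The user store, hashing scheme, and error text are all hypothetical and are not Tarantool's; the point is only that both failure modes take the same path and raise the same error.

```python
import hashlib
import hmac


class AuthError(Exception):
    """Single error type for every authentication failure."""


# Hypothetical user store: login -> password digest.
USERS = {"alice": hashlib.sha256(b"secret").digest()}
_DUMMY_DIGEST = hashlib.sha256(b"").digest()


def authenticate(login: str, password: str) -> None:
    stored = USERS.get(login)
    supplied = hashlib.sha256(password.encode()).digest()
    # Compare even for unknown users so both cases do similar work.
    matches = hmac.compare_digest(
        stored if stored is not None else _DUMMY_DIGEST, supplied
    )
    if stored is None or not matches:
        # Same message whether the user is missing or the password is wrong.
        raise AuthError("User not found or supplied credentials are invalid")
```

With this shape, a client probing `authenticate("bob", ...)` learns nothing about whether `bob` exists.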
-
- Oct 19, 2022
-
-
Timur Safin authored
The debugger is not yet compatible with readline support in the Tarantool console. Detect this situation when the debugger starts and bail out. NO_TEST=interactive NO_DOC=Markdown updated
-
Timur Safin authored
NO_TEST=see it elsewhere
Part of #7593

@TarantoolBot document
Title: Console debugger for Lua

Console debugger luadebug.lua
==============================

Module `luadebug.lua` is available as a console debugger for Lua scripts. It is activated via:

```
local debugger = require 'luadebug'
debugger()
```

Originally we used third-party code from slembcke/debugger.lua, but it has been significantly refactored since then.

Currently available console shell commands are:

```
c|cont|continue - continue execution
d|down - move down the stack by one frame
e|eval $expression - execute the statement
f|finish|step_out - step forward until exiting the current function
h|help|? - print this help message
l|locals - print the function arguments, locals and upvalues
n|next|step_over - step forward by one line (skipping over functions)
p|print $expression - execute the expression and print the result
q|quit - exit debugger
s|st|step|step_into - step forward by one line (into functions)
t|trace|bt - print the stack trace
u|up - move up the stack by one frame
w|where $linecount - print source code around the current line
```

The console debugger `luadebug.lua` can show the sources of built-in Tarantool modules (e.g. `@builtin/datetime.lua`). It uses a new function introduced for that purpose, `tarantool.debug.getsources()`. This function can also be used in any external GUI debugger (e.g. VS Code or JetBrains) to show the sources of built-in modules while they are being debugged.

> Please see third_party/lua/README-luadebug.md for a fuller description
> of an original luadebug.lua implementation.
-
Mergen Imeev authored
The _vfunc system space is the sysview for the _func system space. However, the _vfunc format is different from the _func format. This patch makes the _vfunc format the same as the _func format. Closes #7822 NO_DOC=bugfix
-
- Oct 18, 2022
-
-
Georgiy Lebedev authored
Allocation of URI parameters and their values dynamic arrays is done inefficiently: they are reallocated each time a new parameter or parameter value is added — grow them exponentially instead. `struct uri_param` and `struct uri` are exposed in Lua via FFI (see src/lua/uri.lua): add warnings about the necessity of reflecting changes to them in `ffi.cdecl`. Closes #7155 NO_DOC=optimization NO_TEST=optimization
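The effect of the change can be estimated with a small Python model that counts reallocations. The doubling factor is an assumption for illustration; the commit message only says "exponentially".

```python
def count_reallocations(n_items: int, exponential: bool) -> int:
    """Count how many times a dynamic array is reallocated while
    appending `n_items` elements one by one."""
    capacity = 0
    reallocs = 0
    for size in range(1, n_items + 1):
        if size > capacity:
            # Old behaviour: grow to exactly the needed size.
            # New behaviour (assumed factor 2): double the capacity.
            capacity = max(1, capacity * 2) if exponential else size
            reallocs += 1
    return reallocs


print(count_reallocations(1000, exponential=False))  # 1000
print(count_reallocations(1000, exponential=True))   # 11
```

Growing by one element costs a reallocation per append, while doubling needs only a logarithmic number of them.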
-
Timur Safin authored
We used to ignore timezone difference (in `tzoffset`) for datetime subtraction operation:

```
tarantool> datetime.new{tz='MSK'} - datetime.new{tz='UTC'}
---
- +0 seconds
...

tarantool> datetime.new{tz='MSK'}.timestamp - datetime.new{tz='UTC'}.timestamp
---
- -10800
...
```

Now we accumulate tzoffset difference in the minute component of a resultant interval:

```
tarantool> datetime.new{tz='MSK'} - datetime.new{tz='UTC'}
---
- -180 minutes
...
```

Closes #7698
NO_DOC=bugfix
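The expected arithmetic can be cross-checked with Python's standard `datetime` module. This only mirrors the calculation and is not Tarantool code; MSK is modelled as a fixed +03:00 offset.

```python
from datetime import datetime, timedelta, timezone

MSK = timezone(timedelta(hours=3), "MSK")  # fixed +03:00 offset
UTC = timezone.utc

# The same wall-clock time in two different zones.
wall_clock = dict(year=1970, month=1, day=1)
diff = datetime(tzinfo=MSK, **wall_clock) - datetime(tzinfo=UTC, **wall_clock)
minutes = diff.total_seconds() / 60

print(minutes)  # -180.0
```

Midnight in MSK happens three hours before midnight in UTC, so the difference is -10800 seconds, i.e. -180 minutes, matching the timestamp difference shown above.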
-
Timur Safin authored
We did not take into account that date/time arithmetic may yield a result with a different timezone offset if a DST boundary is crossed during the operation.

```
tarantool> datetime.new{year=2008, month=1, day=1, tz='Europe/Moscow'} + datetime.interval.new{month=6}
---
- 2008-07-01T01:00:00 Europe/Moscow
...
```

Now we resolve tzoffset at the end of the operation if tzindex is not 0.

Fixes #7700
NO_DOC=bugfix
-
Ilya Verbin authored
Currently, when recovering from an old snapshot, Tarantool allows DDL operations on an instance with a non-upgraded schema. This leads to various unpredictable errors (because the DDL code assumes that the schema is already upgraded). This patch forbids the following operations unless the user has the most recent schema version:

- box.schema.space.create
- box.schema.space.drop
- box.schema.space.alter
- box.schema.index.create
- box.schema.index.drop
- box.schema.index.alter
- box.schema.sequence.create
- box.schema.sequence.drop
- box.schema.sequence.alter
- box.schema.func.create
- box.schema.func.drop

Closes #7149
NO_DOC=bugfix
-
- Oct 14, 2022
-
-
Mergen Imeev authored
This patch fixed the assertion when JOIN uses index of unsupported type. Closes #5678 NO_DOC=bugfix
-
Vladimir Davydov authored
This commit adds support for the transaction isolation levels introduced earlier for memtx mvcc by commit ec750af6 ("txm: introduce transaction isolation levels"). The isolation levels work exactly the same way as in memtx:

- Unless a transaction explicitly specifies the 'read-committed' isolation level, it'll skip prepared statements, even if they are visible from its read view. The background for this was implemented in the previous patches, which added the is_prepared_ok flag to cache and mem iterators.
- If a transaction skips a prepared statement, which would otherwise be visible from its read view, it's sent to the most recent read view preceding the prepared statement LSN. Note, older prepared statements are still visible from this read view and can actually be selected if committed later.
- A transaction using the 'best-effort' isolation level (default) is switched to 'read-committed' when it executes the first write statement.

The implementation is tested by the existing memtx mvcc tests that were made multi-engine in the scope of this commit. However, we add one more test case: the one that checks that a 'best-effort' read view is properly updated in case there is more than one prepared transaction. Also, there are a few tests that relied upon the old implementation and assumed that select from Vinyl may return unconfirmed tuples. We update those tests here as well.

Closes #5522
NO_DOC=already documented
-
- Oct 13, 2022
-
-
Vladislav Shpilevoy authored
There was a bug where an instance could ack a transaction from an old Raft term, thus allowing the old leader to CONFIRM it, even if that instance knew there was a newer Raft term going on. As a result, the old leader could write CONFIRM even if a new leader was already elected and the synchro quorum was > half. That led to split-brain when the bad txn reached the new leader and PROMOTE reached the old leader.

Split-brain here is totally unnecessary. If the quorum is correct, the synchro timeout is infinite, and there are no async transactions, then split-brain shouldn't ever happen.

The fix is as simple as attaching the current Raft term number to applier heartbeats. In the test case above, if terms are attached, the old leader gets ACK + new term. That causes the old leader to freeze even if the pending txn got quorum. The old leader can neither CONFIRM nor ROLLBACK its pending txns until a new leader is elected.

Freeze is guaranteed, because if a new leader was elected, then it got votes from > half of the cluster. It means > half of the nodes have the new term. That in turn means the old leader, while collecting ACKs for its "new" txn, will get the new term number from at least one replica.

When the new leader finishes writing PROMOTE, it either confirms or rolls back the txn of the old leader (depending on whether it reached the new leader before promotion). Neither result causes split-brain. The rollback only causes a non-critical error on the old leader raised by the bad txn's commit attempt.

Some alternatives were considered. One of the most promising ones was to make instances reject txns if they see these txns coming from an instance having an old Raft term. It would help in the test provided above, but wouldn't do in a more complicated test, when there is a third node which gets the bad transaction, then gets its local term bumped, and then replicates to any other instance. Others would accept that bad txn, because the sender has a newer Raft term, even though the txn author is still in the old term. Tracking the term of the txn author is impossible in too many cases to rely on it.

Closes #7253

@TarantoolBot document
Title: New iproto field in applier -> relay ACKs

The applier->relay channel (from replica back to master) is used only for sending ACKs. Replication data goes the other way (relay->applier). These ACKs had 2 fields: `IPROTO_VCLOCK (0x26)` and `IPROTO_VCLOCK_SYNC (0x5a)`. Now they have a new field: `IPROTO_TERM (0x53)`. It is an unsigned number containing `box.info.election.term` of the sender node (applier, replica).
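The freeze behaviour described above can be modelled in a few lines of Python. This is a toy model, not Tarantool's implementation: the class, its fields, and the assumption that the leader counts its own ack are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OldLeader:
    """Toy model of a leader collecting ACKs for one pending synchronous txn."""
    term: int
    quorum: int
    acks: int = 1           # assumption: the leader counts its own ack
    frozen: bool = False
    confirmed: bool = False

    def on_ack(self, sender_term: int) -> None:
        # The ACK carries the sender's Raft term (the new IPROTO_TERM field).
        if sender_term > self.term:
            self.term = sender_term
            self.frozen = True  # a newer term exists: stop acting as leader
        self.acks += 1
        # A frozen leader must not CONFIRM even if the quorum is reached.
        if self.acks >= self.quorum and not self.frozen:
            self.confirmed = True


leader = OldLeader(term=2, quorum=2)
leader.on_ack(sender_term=3)
print(leader.frozen, leader.confirmed)  # True False
```

Without the term in the ACK, the same call sequence would reach the quorum and write CONFIRM, which is the split-brain scenario the commit fixes.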
-
Ilya Verbin authored
Currently if a non-string type is passed to luaT_key_def_set_part, lua_tolstring returns null-pointer type_name, which is passed to a string formatting function in diag_set. Closes #5222 NO_DOC=bugfix
-
Ilya Verbin authored
Don't accept an empty string or leading part of "str" or "num" as a valid field type. Closes #5940 NO_DOC=Partial field types weren't documented Co-authored-by:
Alexander Turenko <alexander.turenko@tarantool.org>
-
Aleksandr Lyapunov authored
Since the function is actually an eval, by default there should be no execute access right in public role. Closes tarantool/security#14 NO_DOC=bugfix
-
Mergen Imeev authored
Prior to this patch, it was possible to call box.execute() before box was initialized, i.e. before calling box.cfg(). This, however, caused box.cfg() to be called automatically, which could be problematic as some parameters could not be changed after box.cfg() was called. After this patch, box.execute() will only be available when the box has been initialized. Closes #4726 @TarantoolBot document Title: box.execute() now available only after initialization of box Previously, it was possible to call box.execute() before the box was configured, in which case the box was configured automatically, which could lead to problems with box parameters. Now box.execute() can only be called after the box has been properly configured. It is also forbidden to set language to SQL in a console with an unconfigured box.
-
- Oct 12, 2022
-
-
Aleksandr Lyapunov authored
Fix a simple typo that caused the problem. Closes #7645 NO_DOC=bugfix
-
- Oct 11, 2022
-
-
Mergen Imeev authored
This patch introduces new rules to determine type of NULLIF() built-in function. Closes #6990 @TarantoolBot document Title: New rules to determine type of result of NULLIF The type of the result of NULLIF() function now matches the type of the first argument.
-
Mergen Imeev authored
This patch introduces new rules to determine type of CASE operation. Part of #6990

@TarantoolBot document
Title: New rules to determine type of result of CASE

New rules are applied to determine the type of the CASE operation. If all values are NULL with no type, or if a bind variable exists among the possible results, then the type of CASE is ANY. Otherwise, all NULL values with no type are ignored, and the type of CASE is determined using the following rules:

1) if all values are of the same type, then the type of CASE is this type;
2) otherwise, if any of the possible results is of one of the incomparable types, then the type of CASE is ANY;
3) otherwise, if any of the possible results is of one of the non-numeric types, then the type of CASE is SCALAR;
4) otherwise, if any of the possible results is of type NUMBER, then the type of CASE is NUMBER;
5) otherwise, if any of the possible results is of type DECIMAL, then the type of CASE is DECIMAL;
6) otherwise, if any of the possible results is of type DOUBLE, then the type of CASE is DOUBLE;
7) otherwise the type of CASE is INTEGER.
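The rules above translate directly into a decision function. The Python sketch below is an illustration only: the function name is hypothetical, and the membership of the `INCOMPARABLE` and `NUMERIC` sets is an assumption, since the commit message does not list which types fall into each class.

```python
# Assumed type classes; the commit message does not enumerate them.
INCOMPARABLE = {"MAP", "ARRAY"}
NUMERIC = {"INTEGER", "UNSIGNED", "DOUBLE", "DECIMAL", "NUMBER"}


def case_result_type(results, has_bind_variable=False):
    """`results` holds the type name of each possible result;
    None stands for a NULL with no type."""
    typed = [t for t in results if t is not None]
    if not typed or has_bind_variable:
        return "ANY"
    if len(set(typed)) == 1:                   # rule 1
        return typed[0]
    if any(t in INCOMPARABLE for t in typed):  # rule 2
        return "ANY"
    if any(t not in NUMERIC for t in typed):   # rule 3
        return "SCALAR"
    for candidate in ("NUMBER", "DECIMAL", "DOUBLE"):  # rules 4-6
        if candidate in typed:
            return candidate
    return "INTEGER"                           # rule 7


print(case_result_type(["INTEGER", None, "DOUBLE"]))  # DOUBLE
```

Untyped NULLs are dropped before rule 1 is applied, so `["INTEGER", None]` resolves to INTEGER rather than to a wider type.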
-
- Oct 06, 2022
-
-
Serge Petrenko authored
The commit c1c77782 ("replication: fix bootstrap failing with ER_READONLY") made applier retry connection infinitely upon receiving an ER_READONLY error on join. At the time of writing that commit, this was the only way to make join retriable, because there were no retries in scope of bootstrap_from_master. The join either succeeded or failed.

Later on, bootstrap_from_master was made retriable in commit f2ad1dee ("replication: retry join automatically"). Now when bootstrap_from_master fails, the replica reconnects to all the remote nodes, thus updating their ballots, chooses a new (probably different from the previous one) bootstrap leader, and retries booting from it.

The second approach is preferable, and here's why. Imagine bootstrapping a cluster of 3 nodes, A, B and C in a full-mesh topology. B and C connect to all the remote peers almost instantly, and both independently decide that B will be the bootstrap leader (it means it has the smallest uuid among A, B, C).

At the same time, A can't connect to C. B bootstraps the cluster, and joins C. After C is joined, A finally connects to C. Now A can choose a bootstrap leader. It has an old B's ballot (smallest uuid, but not yet booted) and C's ballot (already booted). This is because C's ballot is received after cluster bootstrap, and B's ballot was received earlier than that. So A believes C is a better bootstrap leader, and tries to boot from it.

A will fail joining to C, because at the same time C tries to sync with everyone, including A, and thus stays read-only. Since A retries joining to the same instance over and over again, this situation makes A and C stuck forever.

Let's retry ER_READONLY on another level: instead of trying to join to the same bootstrap leader over and over, try to choose a new bootstrap leader and boot from it.

In the situation described above, this means that A would try to join to C once, fail due to ER_READONLY, re-fetch new ballots from everyone and choose B as a join master (now it has the smallest uuid and is booted).

The issue was discovered due to linearizable_test.lua hanging occasionally with the following output:

NO_WRAP
No output during 40 seconds. Will abort after 320 seconds without output. List of workers not reporting the status:
- 059_replication-luatest [replication-luatest/linearizable_test.lua, None] at /tmp/t/059_replication-luatest/linearizable.result:0
[059] replication-luatest/linearizable_test.lua [ fail ]
[059] Test failed! Output from reject file /tmp/t/rejects/replication-luatest/linearizable.reject:
[059] TAP version 13
[059] 1..6
[059] # Started on Thu Sep 29 10:30:45 2022
[059] # Starting group: linearizable-read
[059] not ok 1 linearizable-read.test_wait_others
[059] # ....11.0~entrypoint.531.dev/test/luatest_helpers/server.lua:104: Waiting for "readiness" on server server_1-q7berSRY4Q_E (PID 53608) timed out
[059] # stack traceback:
[059] # ....11.0~entrypoint.531.dev/test/luatest_helpers/server.lua:104: in function 'wait_for_readiness'
[059] # ...11.0~entrypoint.531.dev/test/luatest_helpers/cluster.lua:92: in function 'start'
[059] # ...t.531.dev/test/replication-luatest/linearizable_test.lua:50: in function <...t.531.dev/test/replication-luatest/linearizable_test.lua:20>
[059] # ...
[059] # [C]: in function 'xpcall'
NO_WRAP

Part-of #7737
NO_DOC=bugfix
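The scenario can be replayed with a simplified Python model of the leader choice. The ranking used here (booted nodes first, then the smallest uuid) is inferred from the description above and is an assumption; the real ballot comparison may consider more fields. The `Ballot` type and the uuid strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ballot:
    uuid: str
    is_booted: bool


def choose_bootstrap_leader(ballots):
    """Assumed ranking: prefer booted nodes, then the smallest uuid."""
    return min(ballots, key=lambda b: (not b.is_booted, b.uuid))


# What A sees first: a stale ballot from B and a fresh one from C.
stale_view = [Ballot("uuid-b", is_booted=False), Ballot("uuid-c", is_booted=True)]
# What A sees after re-fetching ballots from everyone.
fresh_view = [Ballot("uuid-b", is_booted=True), Ballot("uuid-c", is_booted=True)]

print(choose_bootstrap_leader(stale_view).uuid)  # uuid-c
print(choose_bootstrap_leader(fresh_view).uuid)  # uuid-b
```

Retrying the join against the stale choice keeps picking C, while re-running the choice on fresh ballots moves A to B, which is the behaviour this commit switches to.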
-
Mergen Imeev authored
This patch fixed the assertion when using INDEXED BY with an index that is at least the third in space. Closes #5976 NO_DOC=bugfix
-
Mergen Imeev authored
If the length of the tuple is greater than the number of fields in the format, it is possible that the cursor in the VDBE will be overridden with zeros. Closes #5310 NO_DOC=bugfix
-
Igor Munkin authored
* FFI: Always fall back to metamethods for cdata length/concat. * FFI: Add tonumber() specialization for failed conversions. * build: introduce LUAJIT_ENABLE_CHECKHOOK option * Fix overflow check in unpack(). * gdb: refactor iteration over frames while dumping stack * gdb: adjust to support Python 2 (CentOS 7) Closes #7458 Closes #7655 Needed for #7762 Part of #7230 NO_DOC=LuaJIT submodule bump NO_TEST=LuaJIT submodule bump
-
- Oct 04, 2022
-
-
Georgiy Lebedev authored
For reasons described in #7231 HASH index 'GT' iterator type is deprecated: print a warning exactly once about the deprecation. Closes #7231 @TarantoolBot document Title: memtx HASH index 'GT' iterator deprecation memtx HASH index 'GT' iterator is deprecated since Tarantool 2.11 (tarantool/tarantool#7231) and will be removed in a future release of Tarantool: the user will get a warning when using it.
-
- Sep 30, 2022
-
-
Georgiy Lebedev authored
Concurrent transactions can try to insert tuples that intersect only by parts of secondary index: in this case when one of them gets prepared, the others get conflicted, but the committed story does not get retained (because the conflicting statements are not added to the committed story's delete statement list as opposed to primary index) and is lost after garbage collection: retain stories if there is a newer uncommitted story in the secondary indexes' history chain. Closes #7712 NO_DOC=bugfix
-
- Sep 29, 2022
-
-
Serge Petrenko authored
When using vclockset_psearch, the resulting vclock may be incomparable to the search key. For example, with a vclock set { } (empty vclock), {0: 1, 1: 10}, {0: 2, 1: 11}, vclockset_psearch(set, {0: 2, 1: 9}) might return {0: 1, 1: 10}, and not { }. This is known and avoided in other places, for example recover_remaining_wals(), where vclockset_match() is used instead. vclockset_match() starts with the same result as vclockset_psearch() and then unwinds the result until the first vclock which is less than or equal to the search key is found. Having vclockset_psearch in wal_collect_garbage_f could lead to issues even before local space changes started being written to the 0-th vclock component. Once a replica subscribes, its gc consumer is set to the vclock that the replica sent in the subscribe request. This vclock might be incomparable with xlog vclocks of the master, leading to the same issue of potentially deleting a needed xlog during gc. Closes #7584 NO_DOC=bugfix
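The difference between the two lookups can be reproduced with a short Python model using the example from the message. The function names and the lexicographic ordering of the set are assumptions made for illustration; they are not the C implementation.

```python
from bisect import bisect_right


def as_tuple(vclock, width):
    """Expand a sparse vclock dict into a fixed-width tuple (missing = 0)."""
    return tuple(vclock.get(i, 0) for i in range(width))


def vclock_le(a, b, width):
    """Partial order: a <= b only if every component of a is <= that of b."""
    return all(x <= y for x, y in zip(as_tuple(a, width), as_tuple(b, width)))


def psearch(sorted_set, key, width):
    """Index of the last element not greater than `key` in the set's total
    (here: lexicographic, an assumption) order, or -1 if there is none."""
    keys = [as_tuple(v, width) for v in sorted_set]
    return bisect_right(keys, as_tuple(key, width)) - 1


def match(sorted_set, key, width):
    """Start from psearch() and unwind until a vclock <= key is found."""
    i = psearch(sorted_set, key, width)
    while i >= 0 and not vclock_le(sorted_set[i], key, width):
        i -= 1
    return sorted_set[i] if i >= 0 else None


vclock_set = [{}, {0: 1, 1: 10}, {0: 2, 1: 11}]
key = {0: 2, 1: 9}
print(vclock_set[psearch(vclock_set, key, 2)])  # {0: 1, 1: 10}
print(match(vclock_set, key, 2))                # {}
```

The `psearch` result has component 1 equal to 10, which is ahead of the key's 9, so it is incomparable to the key; `match` steps back to the empty vclock, which is safely less than or equal to it.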
-
- Sep 28, 2022
-
-
Alexander Turenko authored
The idea is borrowed from [1]: hide and save prompt, user's input and cursor position before writing to stdout/stderr and return everything back afterwards.

Not every stdout/stderr write is handled this way: only tarantool's logger (when it writes to stderr) and tarantool's print() Lua function perform the prompt hide/show actions. For example, `io.stdout:write(<...>)` Lua call or `write(STDOUT_FILENO, <...>)` C call may mix readline's prompt with actual output. However, the logger and print() are likely enough for the vast majority of usages.

The readline's interactive search state (usually invoked by Ctrl+R) is not covered by this patch. Sadly, I didn't find a way to properly save and restore readline's output in this case.

Implementation details
----------------------

Several words about the allocation strategy. At first glance it may look worthwhile to pre-allocate a buffer to store prompt and user's input data and reallocate it on demand. However, rl_set_prompt() already performs free() plus malloc() at each call[^1], so avoiding malloc() on our side would not change the picture much. Moreover, this code interacts with a human, who is many orders of magnitude slower than a machine and will not notice a difference. So I decided to keep the code simpler.

[^1]: Verified on readline 8.1 sources. However, it is worth noting that rl_replace_line() keeps the buffer and performs realloc() on demand.

The code is organized so that the say and print modules call some callbacks without knowledge about their origin and without a dependency on the console module (or whatever other module would implement this interaction with readline). The downside here is that console needs to know all places to set the callbacks. OTOH, it offers an explicit list of such callbacks in one place and, on the whole, keeps the relevant code together.

We can redefine the print() function from every place in the code, but I prefer to make it as explicit as possible, so I added the new internal print.lua module.

We could redefine _G.print on demand instead of setting callbacks for a function assigned to _G.print once. The downside here is that if a user saves/captures the old _G.print value, it'll use the raw print() directly instead of our replacement. The current implementation seems to be safer.

Alternatives considered
-----------------------

I guess we can clear readline's prompt and user input manually and don't let readline know that something was changed (and restore the prompt/user input afterwards). It would save allocations and string copying, but likely would lean on readline internals too much and repeat some of its functionality. I considered this option unstable and declined it.

We can redefine behavior for all writes to stdout and stderr. There are different ways to do so:

1. Redefine libc's write() with our own implementation, which will call the original libc's write()[^2]. It is defined as a weak symbol in libc (at least in glibc), so there is no problem to do so.
2. Use pipe(), dup() and dup2() to execute our own code at STDOUT_FILENO, STDERR_FILENO writes.

[^2]: There is a good article about pitfalls on this road: [2]. It is about LD_PRELOAD, but I guess everything is very similar with wrapping libc's function from an executable.

In my opinion, those options are dangerous, because they implicitly change behavior of a lot of code, which is unlikely to expect something of this kind. The second option (use pipe()) adds more user space/kernel space context switches, more copying and also would add a possible implicit fiber yield at any `write(STD*_FILENO, <...>)` call -- it is unlikely that all user code is ready for that.

Fixes #7169

[1]: https://metacpan.org/dist/AnyEvent-ReadLine-Gnu/source/Gnu.pm
[2]: https://tbrindus.ca/correct-ld-preload-hooking-libc/

NO_DOC=this patch prevents mixing of output streams on a terminal and it is what a user actually expects; no reason to describe how bad would be his/her life without it
-
Georgiy Lebedev authored
When we rollback a transaction statement, we relink its read trackers to a newer story in the history chain, if present (6c990a7b), but we do not handle the case when there is no newer story.

If there is an older story in the history chain, we can relink the rollbacked story's reader to it, but if the rollbacked story is the only one left, we need to retain it, because it stores the reader list needed for conflict resolution. Such stories are distinguished by the rollbacked flag, and there can be no more than one such story located strictly at the end of a given history chain (which means a story can be fully unlinked from some indexes and present at the end of others).

There are several nuances we need to account for:

Firstly, such rollbacked stories must be impossible to read from an index: this is ensured by `memtx_tx_story_is_visible`.

Secondly, rollbacked transactions need to be treated as prepared with stories that have `add_psn == del_psn`, so that they are correctly deleted during garbage collection.

After this logical change we have the following partially ordered set over tuple stories:

———————————————————————————————————————————————————————> serialization time
|- - - - - - - -|— — — — — -|— — — — — |— — — — — — -|— — — — — — — -
| No more than  | Committed | Prepared | In-progress | One dirty
| one rollbacked|           |          |             | story in index
| story         |           |          |             |
|- - - - - - - -|— — — — — -| — — — — —|— — — — — — -|— — — — — — — —

Closes #7343
NO_DOC=bugfix
-
Serge Petrenko authored
Linearizability is a property of operations when an operation performed on any node sees all the operations performed earlier on any other node of the cluster. More strictly speaking, it's a property demanding that if a response for some write request arrived earlier than some read request was made, this read request must see the results of that (or any earlier) write request.

This patch introduces a new transaction isolation level: 'linearizable'. When the option is set, box.begin() is stalled until the node receives the latest data from at least one member of the quorum. This is needed to make sure that the node sees all the writes committed on a quorum. The transaction is served only after the node sees the relevant data, thus implementing linearizable semantics.

The node working on a linearizable request uses its relay's vclock sync mechanism in order to know the fresh vclock of remote nodes.

Closes #6707

@TarantoolBot document
Title: New transaction isolation level - linearizable

There is a new transaction isolation level - linearizable. You may call `box.begin` with `txn_isolation = 'linearizable'`, but you can't set the default transaction isolation level to 'linearizable'.

Linearizable transactions may only perform requests to synchronous, local or temporary memtx spaces (vinyl engine support will be added later). Starting a linearizable transaction requires `box.cfg.memtx_use_mvcc_engine` to be on.

Note: starting a linearizable transaction requires that the node is the replication **source** for at least N - Q + 1 remote replicas. Here `N` is the count of registered nodes in the cluster and `Q` is the `replication_synchro_quorum` value (the same as `box.info.synchro.quorum`). This is an implementation limitation. For example, you may start linearizable transactions on any node of a cluster in full-mesh topology, but you can't perform linearizable transactions on anonymous replicas, because no one replicates **from** them.

When a transaction is linearizable it sees the latest changes performed on the quorum of nodes in the cluster. For example, if you use linearizable transactions to read data on a replica, such a transaction will never read stale data: all the committed writes performed on the master will be seen by the transaction.

Making a transaction linearizable requires some waiting until the node receives all the committed data. In case the node can't contact enough remote peers to determine which data is committed, an error is returned. Waiting for committed data may time out: if the data isn't received during the timeout specified by the `timeout` option of `box.begin()`, an error is returned.

When called with `{txn_isolation = 'linearizable'}`, `box.begin()` yields until the instance receives enough data from remote peers to be sure that the transaction is linearizable.
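The `N - Q + 1` bound from the note above is easy to compute for a given cluster. The Python helper below is a hypothetical illustration of that arithmetic, not a Tarantool API.

```python
def min_sources_for_linearizable(n_registered: int, synchro_quorum: int) -> int:
    """Number of remote replicas the node must be a replication source for,
    per the N - Q + 1 rule stated in the doc request above."""
    if not 1 <= synchro_quorum <= n_registered:
        raise ValueError("quorum must be between 1 and the number of nodes")
    return n_registered - synchro_quorum + 1


print(min_sources_for_linearizable(3, 2))  # 2
print(min_sources_for_linearizable(5, 3))  # 3
```

A plausible reading of the bound is a counting argument: any group of N - Q + 1 nodes must overlap with any group of Q nodes, since together they would otherwise need N + 1 distinct nodes, so at least one contacted node has seen every write committed on a quorum.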
-
- Sep 26, 2022
-
-
Vladislav Shpilevoy authored
If an update operation tried to insert a new key into a map or an array which was created by a previous update operation, then the process would fail an assertion. That was because the first operation was stored as a bar update. The second operation tried to branch it assuming that the entire bar update's JSON path must exist, but it wasn't so for the newly created part of the path. The solution is to fall back to branching before the entire bar path ends if we can see that the next part of the path can't be found. Closes #7705 NO_DOC=bugfix
-
- Sep 23, 2022
-
-
Georgiy Lebedev authored
TREE (HASH) index implements `random` method: if the space is empty from the transaction's perspective, which means we have to return nothing, add gap tracking of whole range (full scan tracking), since this result is equivalent to `index:select{}`, otherwise repeatedly call `random` and clarify result, until we get a non-empty one. We do not care about performance here, since all operations in context of transaction management currently have O(number of dirty tuples) complexity. Closes #7670 NO_DOC=bugfix
-
Georgiy Lebedev authored
If TREE index `get` result is empty, the key part count is incorrectly compared to the tree's `cmp_def->part_count`, though it should be compared with `cmp_def->unique_part_count`. But we can actually assume that by the time we get to the index's `get` method the part count is equal to the unique part count (partial keys are rejected and `get` is not supported for non-unique indexes): change check to correct assertion. Closes #7685 NO_DOC=<bugfix>
-