- Jul 06, 2020
-
-
Serge Petrenko authored
Since we stopped sending local space operations in replication, the last tx row has to be global in order to preserve tx boundaries on the replica. If the last row happens to be a local one, the replica never receives the tx end marker, which yields the following error: `ER_UNSUPPORTED: replication does not support interleaving transactions`. To fix the problem, append a global NOP row at the tx end if it happens to end on a local row. Follow-up #4114 Closes #4928 Reviewed-by:
Cyrill Gorcunov <gorcunov@gmail.com>
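The fix described in this commit can be illustrated with a minimal sketch. The row model, its field names and the helper name are hypothetical and are not taken from the Tarantool sources; the sketch only shows the idea of moving the tx end marker to an appended global NOP row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row model: only the fields needed for the sketch. */
enum nop_row_type { NOP_ROW_DML, NOP_ROW_NOP };

struct nop_row {
    enum nop_row_type type;
    bool is_local;  /* local-space operation, not sent to replicas */
    bool is_commit; /* marks the last row of the transaction */
};

/*
 * Make sure the transaction ends with a global (replicated) row.
 * `rows` must have room for at least n + 1 entries.
 * Returns the new row count.
 */
static size_t
tx_close_with_global_row(struct nop_row *rows, size_t n)
{
    assert(n > 0);
    if (!rows[n - 1].is_local)
        return n; /* already ends on a global row */
    /* Move the end marker from the local row to an appended global NOP. */
    rows[n - 1].is_commit = false;
    rows[n] = (struct nop_row){
        .type = NOP_ROW_NOP, .is_local = false, .is_commit = true
    };
    return n + 1;
}
```

With this shape, a replica that filters out local rows still sees a row carrying the end marker.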
-
Serge Petrenko authored
In order to preserve transaction boundaries in the replication protocol, WAL assigns each tx row a transaction sequence number (tsn). The tsn is equal to the lsn of the first transaction row. Starting with commit 7eb4650e, local space requests are assigned a special replica id, 0, and have their own lsns. These operations are not replicated. If a transaction starting with a local space operation ends up in the WAL, it gets a tsn equal to the lsn of the local space request. Then, when such a transaction is replicated, the local space request is omitted, and the replica receives the global part of the transaction with a seemingly random tsn, yielding an ER_PROTOCOL error: "Transaction id must be equal to LSN of the first row in the transaction". To fix the problem, assign the tsn as equal to the lsn of the first global row in the transaction, and assign the tsn as before for fully local transactions. Follow-up #4114 Part-of #4928 Reviewed-by:
Cyrill Gorcunov <gorcunov@gmail.com>
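The tsn rule from this commit can be sketched as follows. The struct and function names are hypothetical; only the rule itself (first global row, falling back to the first row for fully local transactions) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row model for the sketch. */
struct tsn_row {
    int64_t lsn;
    int64_t tsn;
    bool is_local; /* local-space row, skipped during replication */
};

/* Assign one tsn to every row of a transaction. */
static void
tx_assign_tsn(struct tsn_row *rows, size_t n)
{
    assert(n > 0);
    /* Fallback for a fully local transaction: lsn of the first row. */
    int64_t tsn = rows[0].lsn;
    for (size_t i = 0; i < n; i++) {
        if (!rows[i].is_local) {
            /* The first global row defines the tsn. */
            tsn = rows[i].lsn;
            break;
        }
    }
    for (size_t i = 0; i < n; i++)
        rows[i].tsn = tsn;
}
```

After local rows are filtered out, the first row the replica sees has an lsn equal to the tsn, which is what the protocol check quoted above expects.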
-
Serge Petrenko authored
In case there are 2 "new" instances running tarantool 2.2+, a master and a replica, and one "old" instance running an earlier tarantool version, all in a full-mesh cluster, it may happen that the "new" replica receives part of a tx from the "old" instance and the remaining part from the "new" instance. Since "new" instances preserve tx boundaries, the "new" replica would skip the rest of the tx, assuming it has already applied the full tx if it has applied the first tx row. This leads to gaps in the "new" replica's WAL and to skipping the remaining part of the tx forever. Fix this behaviour so that in mixed clusters the full tx is applied even if its beginning is already applied. Closes #5125
-
- Jul 03, 2020
-
-
Alexander V. Tikhonov authored
Issue:
[014] --- box/net.box_readahead_gh-3958.result Mon Jun 15 15:33:23 2020
[014] +++ box/net.box_readahead_gh-3958.reject Tue Jun 16 02:24:04 2020
[014] @@ -46,6 +46,7 @@
[014] ...
[014] test_run:wait_log('default', 'readahead limit is reached', 1024, 0.1)
[014] ---
[014] +- readahead limit is reached
[014] ...
[014] s:drop()
[014] ---
[014]
[014] Last 15 lines of Tarantool Log file [Instance "box"][/tarantool/test/var/014_box/box.log]:
[014] 2020-06-16 02:24:03.792 [5585] main/121/console/unix/: I> set 'read_only' configuration option to false
[014] 2020-06-16 02:24:03.834 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
[014] 2020-06-16 02:24:03.951 [5585] main/121/console/unix/: space.h:336 E> ER_NO_SUCH_INDEX_ID: No index #1 is defined in space '_space'
[014] 2020-06-16 02:24:04.180 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 128
[014] 2020-06-16 02:24:04.183 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 102400
[014] 2020-06-16 02:24:04.189 [5585] main/453/console/unix/: I> set 'readahead' configuration option to 16320
Found that the root cause of the issue was the test 'box/net.box_call_blocks_gh-946.test.lua', previously run on the same worker. In this case the log output is mistakenly matched by the wait_log/grep_log test_run function, which finds the searched string in the log of the previous test. To avoid this, the tests can be swapped in the worker's running queue, and in this case both tests pass. Check the swapped log output:
2020-06-17 10:57:39.881 [69372] main C> entering the event loop
2020-06-17 10:57:39.896 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 128
2020-06-17 10:57:39.898 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 102400
2020-06-17 10:57:40.003 [69372] main/156/console/unix/: I> set 'readahead' configuration option to 16320
2020-06-17 10:57:40.053 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.063 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
2020-06-17 10:57:40.067 [69372] main C> got signal 15 - Terminated
Also found that the 'readahead' message from the first test blocks its later printing to the log file because the message gets suppressed. To fix this issue, the default server must be restarted at the very start of the test. Closes #5082
-
Alexander V. Tikhonov authored
Found that some perf jobs had not been updated with the local cleanup routine, as was done for the other jobs in commit 892a188b "Correct cleanup gitlab-ci". Follows up #5036
-
Alexander V. Tikhonov authored
Building Tarantool sources with the make command may fail with:
[ 10%] make[2]: *** [test/small] Error 1
[ 10%] make[1]: *** [test/CMakeFiles/symlink_small_tests.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
The root cause of the issue is that Dockerfile.staticbuild uses a local copy of the sources:
COPY . /tarantool
which may have broken links in tests, like:
$ ls -al test
...
luajit-tap -> /<wrong path>/third_party/luajit/test
small -> /<wrong path>/src/lib/small/test/
...
To fix the issue, these links should be removed from the Docker local copy of the sources before the build, like:
rm -rf test/small test/luajit-tap
Closes #5025
-
- Jul 02, 2020
-
-
Chris Sosnin authored
This function will be used to determine whether we can safely convert to an integer without information loss. Needed for #4415
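The commit message does not name the source type of the conversion. As an illustration only, the sketch below assumes a C double and uses a hypothetical helper name; it shows what a "lossless conversion to integer" check can look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if `d` can be converted to int64_t without losing
 * information: it must be inside the int64_t range and have no
 * fractional part. NaN fails the range check because every
 * comparison with NaN is false.
 */
static bool
double_fits_int64(double d)
{
    /* Both bounds are powers of two, exactly representable as double. */
    if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0))
        return false;
    /* The cast is well defined here because the range was checked. */
    return (double)(int64_t)d == d;
}
```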
-
Chris Sosnin authored
The behavior is similar to that of other strto* functions: parse the valid beginning of the string and optionally store the pointer to the first invalid character. Needed for tarantool/tarantool#4415
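For reference, the strto* contract mentioned above can be seen with the standard strtoll(). The wrapper name below is hypothetical; the function added by the commit itself is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse the leading base-10 integer of `str` into `*out`.
 * Returns the number of characters consumed, i.e. the offset of the
 * first character that is not part of the number.
 */
static size_t
parse_int_prefix(const char *str, long long *out)
{
    assert(str != NULL && out != NULL);
    char *end;
    *out = strtoll(str, &end, 10);
    return (size_t)(end - str);
}
```

If nothing can be parsed, strtoll() returns 0 and leaves the end pointer at the start of the string, so the consumed length is 0.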
-
Chris Sosnin authored
-
Cyrill Gorcunov authored
Make sure we're allowed to set up the json formatter before the box.cfg() call, i.e. for the so-called boot-time logger. Part-of #5121 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
There is no reason to disallow the json formatter at the early logging stage. We add verification that box.cfg{log="syslog:", log_format="json"} or require('log').cfg{log="syslog:", format="json"} triggers an error, since syslog output requires a predefined structure and can't use json. Fixes #5121 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
For some reason, commit 09832455 disabled the json format in the boot-time logger. There is no reason to do so. Only the syslog output format is predefined and must not be changed; the json format, in turn, is just a decoration over the output stream, so we can use it whenever requested. Part-of #5121 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
- Jun 30, 2020
-
-
Nikita Pettik authored
An assignment was accidentally used in assertions instead of a comparison operation. Let's fix this mistake and use comparison.
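The bug class fixed here can be shown in isolation. The two helpers below are hypothetical and only evaluate the condition that would sit inside assert(); they are not the assertions touched by the commit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Value of the buggy condition `(x = expected)`: it overwrites x and
 * is true for any non-zero `expected`, so the assertion never fires.
 */
static int
buggy_condition(int *x, int expected)
{
    assert(x != NULL);
    return (*x = expected) != 0;
}

/* Value of the intended condition `x == expected`: no side effects. */
static int
fixed_condition(const int *x, int expected)
{
    assert(x != NULL);
    return *x == expected;
}
```

Besides hiding the check, the assignment form changes program state only in debug builds, since assert() is compiled out when NDEBUG is defined.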
-
- Jun 29, 2020
-
-
Serge Petrenko authored
-
Cyrill Gorcunov authored
We never use this method so no need to waste space. In-scope-of #4842 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Introduced in 157beda5 and never used since. In-scope-of #4842 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Last time used in 1d979029 In-scope-of #4842 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Introduced in 157beda5 but never used since. In-scope-of #4842 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
- Jun 26, 2020
-
-
Alexander V. Tikhonov authored
Found an issue running the test on a FreeBSD VBox host:
[011] --- box/net.box_wait_connected_gh-3856.result Mon Jun 15 09:39:49 2020
[011] +++ box/net.box_wait_connected_gh-3856.reject Fri May 8 08:23:30 2020
[011] @@ -12,7 +12,8 @@
[011] - opts:
[011] wait_connected: false
[011] host: 8.8.8.8
[011] - state: initial
[011] + state: error
[011] + error: Invalid argument
[011] port: '123456'
[011] ...
[011] c:close()
A. Turenko made a deep investigation and found that the reason for the failure was that getaddrinfo() returns EAI_SERVICE for an incorrect TCP/IP port on FreeBSD, but truncates it modulo 65536 on Linux/glibc. Checked with his local script './getaddrinfo':
(Linux/glibc)
$ ./getaddrinfo 8.8.8.8 123456
----
family: AF_INET
socktype: SOCK_STREAM
protocol: IPPROTO_TCP
host: 8.8.8.8
serv: 57920
(FreeBSD)
$ ./getaddrinfo 8.8.8.8 123456
getaddrinfo: Service was not recognized for socket type
So the obvious fix is to change 123456 to something less than or equal to 65535. Say, 1234.
The test depended on the order in which fibers were scheduled (net_box.connect() creates a separate fiber for connecting in the background using fiber.create(), which yields). It is unlikely that our fiber would not get execution time during the connection attempt, so it was more of a formal thing. But we can decrease the probability of this situation even more if we grab all connection fields right when net_box.connect() returns, not after a yield in the console (which happens while waiting for the next command from test-run). Closes #5083 Co-authored-by:
Alexander Turenko <alexander.turenko@tarantool.org> Co-authored-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
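The 'serv: 57920' value in the Linux/glibc output above is simply 123456 modulo 65536. A small sketch of that arithmetic, with hypothetical helper names (this is not the './getaddrinfo' script from the commit):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Truncate a port number to 16 bits, i.e. take it modulo 65536. */
static uint16_t
port_wrap16(unsigned long port)
{
    return (uint16_t)(port % 65536UL);
}

/* A TCP/IP port fits into 16 bits, so valid values are 0..65535. */
static bool
port_is_valid(unsigned long port)
{
    return port <= 65535UL;
}
```

Here 123456 - 65536 = 57920, which matches the reported value, while the replacement port 1234 is already in range and is handled the same way on both systems.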
-
Alexander V. Tikhonov authored
Found an issue (reproduced on a VBox FreeBSD machine):
[016] --- replication/wal_rw_stress.result Fri Feb 21 11:53:21 2020
[016] +++ replication/wal_rw_stress.reject Fri May 8 08:23:56 2020
[016] @@ -73,7 +73,42 @@
[016] ...
[016] box.info.replication[1].downstream.status ~= 'stopped' or box.info
[016] ---
[016] -- true
[016] +- version: 2.5.0-27-g32f59756a
[016] + id: 2
[016] + ro: false
[016] + uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
[016] + package: Tarantool
[016] + cluster:
[016] + uuid: 397c196f-9105-11ea-96ab-08002739cbd6
[016] + listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
[016] + replication:
[016] + 1:
[016] + id: 1
[016] + uuid: 397a1886-9105-11ea-96ab-08002739cbd6
[016] + lsn: 10005
[016] + upstream:
[016] + status: follow
[016] + idle: 0.46353673400017
[016] + peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
[016] + lag: -0.45732522010803
[016] + downstream:
[016] + status: stopped
[016] + message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
[016] + system_message: Broken pipe
[016] + 2:
[016] + id: 2
[016] + uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
[016] + lsn: 0
[016] + signature: 10005
[016] + status: running
[016] + vinyl: []
[016] + uptime: 2
[016] + lsn: 0
[016] + sql: []
[016] + gc: []
[016] + pid: 41231
[016] + memory: []
[016] + vclock: {1: 10005}
[016] ...
[016] test_run:cmd("switch default")
[016] ---
To check the downstream status and its message, we need to wait until the downstream appears. This prevents an attempt to index a nil value when one of those functions is called before a record about a peer appears in box.info.replication. It was observed on test: replication/show_error_on_disconnect after commit c6bea65f ('replication: recfg with 0 quorum returns immediately').
Checked that the test still checks the error for which it was created in the b9db91e1 ('xlog: fix fallocate vs read race') patch and successfully got the needed error "tx checksum mismatch":
[153] --- replication/wal_rw_stress.result Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153] ...
[153] test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] + downstream:
[153] + status: stopped
[153] + message: tx checksum mismatch
Note that wait_cond() allows overcoming transient network connectivity errors, but 'tx checksum mismatch' is a persistent one and will be caught. Closes #4977
-
Alexander V. Tikhonov authored
Found issue:
[003] --- replication/wal_off.result Thu Apr 25 13:10:18 2019
[003] +++ replication/wal_off.reject Tue Jul 16 17:10:31 2019
[003] @@ -95,6 +95,8 @@
[003] ...
[003] while string.find(box.info.replication[wal_off_id].upstream.message, check) == nil do fiber.sleep(0.01) end
[003] ---
[003] +- error: '[string "while string.find(box.info.replication[wal_of..."]:1: bad argument
[003] + #1 to ''find'' (string expected, got nil)'
[003] ...
[003] box.cfg { replication = "" }
[003] ---
To check the upstream status and its message, we need to wait until the upstream appears. This prevents an attempt to index a nil value when one of those functions is called before a record about a peer appears in box.info.replication. It was observed on test: replication/show_error_on_disconnect after commit c6bea65f ('replication: recfg with 0 quorum returns immediately'). Closes #4355
-
Igor Munkin authored
<box_process_lua> function created a new GCfunc object for a handler having no upvalues depending on the request context on each call. The change introduces the following mapping: | <handler id> -> <handler GCfunc object> Initializing this mapping on Tarantool startup is aimed to reduce Lua GC memory usage. Reviewed-by:
Sergey Ostanevich <sergos@tarantool.org> Reviewed-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Igor Munkin <imun@tarantool.org>
-
Igor Munkin authored
JIT compiler can generate an invalid trace for <fun.chain> iterator (i.e. chain_gen_r1) breaking its semantics (see LuaJIT/LuaJIT#584). Since interpreter works fine and produces the right results, disabling JIT for this function stops execution failures. As a result box-tap/key_def.test.lua is removed from box-tap suite fragile tests list. Relates to LuaJIT/LuaJIT#584 Fixes #4252 Reviewed-by:
Alexander V. Tikhonov <avtikhon@tarantool.org> Reviewed-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Igor Munkin <imun@tarantool.org>
-
- Jun 23, 2020
-
-
Nikita Pettik authored
Data read in vinyl is known to yield in case of disk access, which opens a window for modifications of the in-memory level. Imagine the following scenario: right before data selection, a tuple is inserted into the space. It passes the first stage of the commit procedure, i.e. it is prepared to be committed but has not yet reached the WAL. Meanwhile, an iterator starts to read the same key. At this moment the prepared statement is already inserted into the in-memory tree and is therefore visible to the read iterator. So the read iterator fetches this statement and proceeds to the disk scan. In turn, the disk scan yields, and at this moment the WAL fails to write the statement to disk. Next, two cases are possible:
1. The WAL thread has enough time to complete the rollback procedure.
2. The WAL thread fails to finish the rollback in this time gap.
In the first case the read iterator should skip the statement: the version of the in-memory tree has diverged from the iterator's one, so we fall back to the iterator restoration procedure. The mem iterator might become invalid, so the only choice is to restart the whole 'advance' routine. Let's not try to restore it and always restart the iteration cycle if the L0 level has changed during the yield. In the second case nothing changes for the read iterator, so it simply returns the prepared statement (and this is considered to be OK). Closes #3395
-
Nikita Pettik authored
vy_page_find_key() assumes that the equal_key parameter is initialized, since it is used unconditionally. Originally, the function was designed with the assumption that the parameter is initialized by the caller. Since then it has been used in several other places, but some callers don't initialize this parameter to 'false'. Let's fix it and set this output parameter to false by default inside vy_page_find_key(). Closes #5078
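The pattern of the fix, a callee that initializes its own output parameter, can be sketched on a simplified lookup. The function below is a hypothetical stand-in over a sorted int array and is not the vinyl page search itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Lower-bound search in a sorted array. Returns the index of the
 * first element that is >= key (n if there is none) and reports
 * through `equal_key` whether an exact match was found.
 */
static size_t
find_key(const int *keys, size_t n, int key, bool *equal_key)
{
    assert(equal_key != NULL);
    /* Initialize the output here so callers do not have to. */
    *equal_key = false;
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < n && keys[lo] == key)
        *equal_key = true;
    return lo;
}
```

With the default set inside the function, a caller that passes an uninitialized (or stale) flag still gets a correct answer.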
-
- Jun 22, 2020
-
-
Maria authored
Calling box.cfg{} more than once does not normally cause any errors (even though it might not have any effect). In contrast, assigning it to some variable and then using it after the box was configured caused an error since the method was overwritten by the initial call of <load_cfg>. The patch fixes this issue making box.cfg behave consistently in both scenarios. Follow-up #4231 Co-developed-by:
Alexander Turenko <alexander.turenko@tarantool.org>
-
Alexander Turenko authored
<box_load_and_execute> checks whether box is configured with appropriate locking and configures it when necessary. However it is not so for <lbox_execute>. We should replace the former with the latter only when box is fully loaded. Follow-up #4231
-
Maria authored
box.execute() initializes box if it is not initialized. For this sake, box.execute() is another function (so called <box_load_and_execute>) when box is not loaded: it loads box and calls 'real' box.execute(). However it is not enough: <box_load_and_execute> may be saved by a user before box initialization and called after box loading or during box loading from a separate fiber. Note: calling <box_load_and_execute> during box loading is safe now, but calling of box.execute() is not: the 'real' box.execute() does not verify whether box is configured. It will be fixed in a further commit. This commit changes <box_load_and_execute> to verify whether box is initialized and to load box only when it is not loaded. Also it adds appropriate locking around load_cfg() invocation from <box_load_and_execute>. While we're here, clarified contracts of functions that set box configuration options. Closes #4231 Co-developed-by:
Alexander Turenko <alexander.turenko@tarantool.org>
-
Kirill Yukhin authored
- test: don't use not aligned size for mempool
-
Cyrill Gorcunov authored
No need for +1 byte here, PATH_MAX already implies end of string. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
This is more consistent than relying on the array size remaining PATH_MAX forever. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Similar to dirname, there is no need for the +1 byte. At the same time, make sure xlog_open never ends up without a trailing zero. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
PATH_MAX is the longest path including the end of string, so there is no need for the +1 byte. At the same time, use sizeof(dirname) so as not to depend on how exactly dirname is declared. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
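The two points made in these PATH_MAX commits (no extra byte, and sizeof() instead of a repeated constant) can be sketched as below. The helper name is hypothetical, and the fallback definition of PATH_MAX is an assumption for systems where <limits.h> does not provide it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumed fallback for the sketch */
#endif

/*
 * Join `dir` and `name` into `dst`. Returns 0 on success and -1 if
 * the result was truncated. snprintf() always NUL-terminates when
 * dst_size > 0, so no extra byte beyond the buffer is needed.
 */
static int
build_path(char *dst, size_t dst_size, const char *dir, const char *name)
{
    assert(dst != NULL && dst_size > 0);
    int rc = snprintf(dst, dst_size, "%s/%s", dir, name);
    return (rc >= 0 && (size_t)rc < dst_size) ? 0 : -1;
}
```

A caller would declare `char path[PATH_MAX];` and pass `sizeof(path)`, so the call stays correct if the declaration of the array changes later.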
-
Cyrill Gorcunov authored
This makes the structure smaller and eliminates useless padding (both the enum and the fd are integers 4 bytes long). Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
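The effect of field order on padding can be illustrated with two hypothetical structures; they are not the structure changed by the commit, and the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/*
 * An 8-byte-aligned pointer between two 4-byte integers: on a typical
 * 64-bit ABI the compiler pads after `type` and after `fd`.
 */
struct padded_layout {
    int type;
    void *ptr;
    int fd;
};

/* The two 4-byte integers placed together share one 8-byte slot. */
struct compact_layout {
    void *ptr;
    int type;
    int fd;
};

/* The compact layout is never larger than the padded one. */
_Static_assert(sizeof(struct compact_layout) <= sizeof(struct padded_layout),
               "grouping the integers must not grow the structure");
```

On a common x86-64 ABI this is expected to give 24 bytes for the first layout and 16 bytes for the second.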
-
Cyrill Gorcunov authored
There is serious "inline disease" in the code: it spread left and right without a serious reason. The box_cfg_xc function is a pretty big one and doesn't require being inlined anyhow. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
- Jun 19, 2020
-
-
Kirill Yukhin authored
Since the query planner is currently unable to use HASH indexes and an attempt to use one will likely lead to a SEGFAULT, this patch raises an error on an attempt to open a VDBE cursor against a HASH index. @TarantoolBot document Title: Document allowed index type for SQL Before the change, the Tarantool query planner segfaulted on an attempt to use a non-tree index. This is now blocked with an appropriate error message. The behaviour needs to be documented. It should be noted that this restriction might be relaxed in the future. Closes #4659
-
- Jun 17, 2020
-
-
Cyrill Gorcunov authored
Writing fewer bytes than requested is perfectly fine. It turns out that the fio.write/pwrite API simply returns 'true' even if only some part of a buffer has been written. Thus make coio_write and coio_pwrite write the whole data in a loop. Note that in most situations there will be only one pass; partial writes are really rare cases. Note that we're not handling nonblocking writes here (which could return EAGAIN) simply because we would need another API that accepts timeouts. Fixes #4651 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
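The "write the whole data in a loop" idea can be sketched with the plain POSIX write(). This is not the coio code from the commit: the helper name is hypothetical, and the EINTR retry and the zero-progress guard are additions made for the sketch. As in the commit, EAGAIN on nonblocking descriptors is not handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly `count` bytes to `fd`, retrying after partial writes.
 * Returns `count` on success and -1 on error (errno is set).
 */
static ssize_t
write_all(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    const char *p = buf;
    size_t left = count;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing, retry */
            return -1;
        }
        if (n == 0) {
            /* Defensive guard: no progress, avoid spinning forever. */
            errno = EIO;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```

In the common case the loop body runs once; further iterations happen only when the kernel accepts a part of the buffer.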
-
- Jun 16, 2020
-
-
Ilya Kosarev authored
Since 527b02a2 (memtx: add yields during index build), memtx_build_on_replace has been used to handle concurrent updates. The problem was that the tuples handled by this trigger did not get their reference counters incremented, leading to a number of cases of wrong behavior. Now this problem is solved. It was found through primary index altering with updates in a background fiber. A corresponding test is introduced. Closes #4973
-
Vladislav Shpilevoy authored
Clang undefined behaviour sanitizer was turned on using the -fsanitize=undefined flag, which is supposed to turn on all the checks except a few. Unneeded checks were turned off explicitly using -fno-sanitize=<type> flags. However, it turned out that this does not work with some flags. For example, nullability checks can't be turned off when -fsanitize=undefined is used. Nullability checks lead to lots of false-positive failures such as typeof(*obj) where obj is NULL, or memcpy() with a NULL destination but 0 size. The patch splits -fsanitize=undefined into separate flags and never turns on nullability checks. Part of #4609
-
Vladislav Shpilevoy authored
SQL heavily depends on box, and box on SQL. So they can't be separate libraries. The build started failing with undefined box symbols in SQL, when code of the latter has slightly changed in one of the recent commits. The build failed only with UB sanitizer enabled, but 'VERBOSE=1 make' showed that both with UB and without UB the build command was the same (not counting -fsanitize flags). So the sanitizer has nothing to do with it. The patch makes SQL sources being built as a part of box library. Closes #5067
-