- Oct 14, 2020
-
-
Vladislav Shpilevoy authored
Raft state machine now has a trigger invoked each time any of the visible Raft attributes changes: state, term, vote. The trigger is needed to commit synchronous transactions of an old leader when a new leader is elected. A trigger is used so that the Raft code does not depend on box too much, which would make it harder to extract into a separate module later. The trigger is executed in the Raft worker fiber so as not to interrupt the state machine transitions, which currently contain no yields, whereas clearing the synchronous transaction queue requires a yield to write CONFIRM and ROLLBACK records to the WAL.
Part of #5339
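A minimal sketch of the idea, assuming a heavily simplified model: the names `struct raft`, `raft_set_term()`, `raft_set_state()`, `raft_worker_step()`, and the `has_update` flag are hypothetical and do not match Tarantool's real raft code. Transitions never yield; they only mark that the trigger is due, and the worker fiber (modeled as a plain function call) runs it later, where yielding would be allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: not Tarantool's real struct raft. */
enum raft_state { RAFT_STATE_FOLLOWER, RAFT_STATE_CANDIDATE, RAFT_STATE_LEADER };

struct raft;
typedef void (*raft_trigger_f)(struct raft *raft, void *ctx);

struct raft {
	enum raft_state state;
	uint64_t term;
	uint32_t vote;
	/* Set by transitions, consumed by the worker. */
	bool has_update;
	raft_trigger_f on_update;
	void *on_update_ctx;
};

/* State transitions never yield: they only record that a trigger is due. */
static void
raft_set_term(struct raft *raft, uint64_t term)
{
	assert(term >= raft->term);
	if (term == raft->term)
		return;
	raft->term = term;
	raft->vote = 0; /* a new term starts without a vote */
	raft->has_update = true;
}

static void
raft_set_state(struct raft *raft, enum raft_state state)
{
	if (state == raft->state)
		return;
	raft->state = state;
	raft->has_update = true;
}

/*
 * One iteration of the worker fiber. In the real system this is where
 * yields are allowed (e.g. writing CONFIRM/ROLLBACK to the WAL); here the
 * trigger is simply called.
 */
static void
raft_worker_step(struct raft *raft)
{
	if (!raft->has_update)
		return;
	raft->has_update = false;
	if (raft->on_update != NULL)
		raft->on_update(raft, raft->on_update_ctx);
}

/* Example trigger: counts how many times it fired. */
static void
count_updates(struct raft *raft, void *ctx)
{
	(void)raft;
	++*(int *)ctx;
}
```

In this sketch, several changes made before the worker runs are coalesced into a single trigger call; whether the real implementation behaves the same way is not stated in the commit.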
-
Vladislav Shpilevoy authored
When a Raft node was configured to be a candidate via election_mode, it didn't do anything if there was an active leader. But it should have started monitoring the leader's health in order to initiate a new election round when the leader dies. The patch fixes this bug. It does not contain a test, because the fix will be covered by a test for #5339.
Needed for #5339
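A minimal sketch of the fixed behaviour, assuming a toy model: `struct raft_node`, `raft_node_cfg_is_candidate()`, and the `death_timer_armed` flag are hypothetical names, not Tarantool's API. Becoming a candidate while a leader is known now arms a "leader death" watch instead of doing nothing, and a timeout on that watch leads to a new election.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; names do not match Tarantool's raft code. */
struct raft_node {
	bool is_candidate;
	uint32_t leader;          /* 0 means "no known leader" */
	bool death_timer_armed;   /* watches the leader's health */
	bool election_started;
};

static void
raft_node_cfg_is_candidate(struct raft_node *node, bool is_candidate)
{
	node->is_candidate = is_candidate;
	if (!is_candidate) {
		/* In this sketch a non-candidate never starts elections. */
		node->death_timer_armed = false;
		return;
	}
	if (node->leader != 0) {
		/* The bug: nothing was done here. Now the leader is watched. */
		node->death_timer_armed = true;
	} else {
		/* A real node would likely wait a randomized timeout first. */
		node->election_started = true;
	}
}

/* Called when the death timer fires: the leader is considered dead. */
static void
raft_node_on_death_timeout(struct raft_node *node)
{
	assert(node->death_timer_armed);
	node->death_timer_armed = false;
	node->leader = 0;
	if (node->is_candidate)
		node->election_started = true;
}
```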
-
Vladislav Shpilevoy authored
Raft has a worker fiber to perform async tasks such as WAL writes and state broadcasts. The worker was created and woken up from two places, which led to code duplication at the very least. The patch wraps this into a new function raft_worker_wakeup() and uses it. The patch is not needed for anything functional, but was created while working on #5339 and trying out ideas. It is submitted anyway because it is a good refactoring that makes the code simpler.
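A minimal sketch of the "single entry point" refactoring, assuming the worker is created lazily on first wakeup (the commit does not say so explicitly). `struct worker` is a hypothetical stand-in for a fiber; a real implementation would use Tarantool's fiber API rather than `calloc()` and a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for a fiber; hypothetical, not Tarantool's struct fiber. */
struct worker {
	int wakeups;
};

struct raft_ctx {
	struct worker *worker;
	int worker_creations;
};

/*
 * Single entry point: create the worker on first use, then wake it up.
 * Callers no longer duplicate the create-or-wakeup logic.
 */
static void
raft_worker_wakeup(struct raft_ctx *raft)
{
	assert(raft != NULL);
	if (raft->worker == NULL) {
		raft->worker = calloc(1, sizeof(*raft->worker));
		if (raft->worker == NULL)
			abort();
		raft->worker_creations++;
	}
	raft->worker->wakeups++;
}
```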
-
Vladislav Shpilevoy authored
The test is long, about 10 seconds, but its name is too general; the name would be better used for a simpler, more basic test. This is going to happen in the next commits: election_qsync.test.lua will check whether election and qsync work together correctly without any stress cases.
Needed for #5339
-
Alexander V. Tikhonov authored
Added a new checksum for the flaky failure at line 427 of vinyl/gh.test.lua.
Part of #5141
-
Alexander V. Tikhonov authored
The following issue was found on heavily loaded hosts:

    [151] --- replication/replica_rejoin.result Tue Sep 29 10:57:26 2020
    [151] +++ replication/replica_rejoin.reject Tue Sep 29 10:57:48 2020
    [151] @@ -230,7 +230,12 @@
    [151] return box.info ~= nil and box.info.replication[1] ~= nil
    [151] end)
    [151] ---
    [151] -- true
    [151] +- error: "builtin/box/load_cfg.lua:601: Please call box.cfg{} first\nstack traceback:\n\tbuiltin/box/load_cfg.lua:601:
    [151] + in function '__index'\n\t[string \"return test_run:wait_cond(function() ...\"]:1:
    [151] + in function 'cond'\n\t/tmp/tnt/151_replication/test_run.lua:411: in function </tmp/tnt/151_replication/test_run.lua:404>\n\t[C]:
    [151] + in function 'pcall'\n\tbuiltin/box/console.lua:402: in function 'eval'\n\tbuiltin/box/console.lua:708:
    [151] + in function 'repl'\n\tbuiltin/box/console.lua:842: in function <builtin/box/console.lua:828>\n\t[C]:
    [151] + in function 'pcall'\n\tbuiltin/socket.lua:1081: in function <builtin/socket.lua:1079>"
    [151] ...
    [151] test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
    [151] ---

It happened because box.cfg was not yet ready to provide information. In fact, there is no need for a local check of replication information availability, because the wait_upstream() call below performs this check itself.
Part of #4985
-
- Oct 13, 2020
-
-
Igor Munkin authored
Fixes the regression from e5039742 ('luajit: bump new version').
Reported-by: Alexander Tikhonov <avtikhon@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
Ilya Kosarev authored
key_def didn't support key definitions with array, map, varbinary, and any fields, so such fields couldn't be extracted with key_def_object:extract_key(). Since the restriction existed only because values of these types cannot be compared, this patch lifts it for field extraction and keeps it only for comparison.
Closes #4538
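A minimal sketch of the split between the two checks, assuming a reduced field-type enum: `enum field_type`, `field_type_is_comparable()`, and `key_part_type_is_allowed()` here are hypothetical and not Tarantool's real definitions. Extraction only copies field bytes, so it accepts every type; comparison still rejects types without a defined ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of field types; not Tarantool's enum field_type. */
enum field_type {
	FIELD_TYPE_UNSIGNED,
	FIELD_TYPE_STRING,
	FIELD_TYPE_ARRAY,
	FIELD_TYPE_MAP,
	FIELD_TYPE_VARBINARY,
	FIELD_TYPE_ANY,
};

/* Values of these types cannot be compared (per the commit message). */
static bool
field_type_is_comparable(enum field_type type)
{
	switch (type) {
	case FIELD_TYPE_ARRAY:
	case FIELD_TYPE_MAP:
	case FIELD_TYPE_VARBINARY:
	case FIELD_TYPE_ANY:
		return false;
	default:
		return true;
	}
}

enum key_def_usage { KEY_DEF_EXTRACT, KEY_DEF_COMPARE };

/* After the patch: only comparison checks the type; extraction accepts all. */
static bool
key_part_type_is_allowed(enum field_type type, enum key_def_usage usage)
{
	if (usage == KEY_DEF_EXTRACT)
		return true;
	return field_type_is_comparable(type);
}
```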
-
Kirill Yukhin authored
* misc: add C and Lua API for platform metrics
* core: introduce various platform metrics
-
Alexander V. Tikhonov authored
Added for the following tests with open issues:
- app/socket.test.lua gh-4978
- box/access.test.lua gh-5411
- box/access_misc.test.lua gh-5401
- box/gh-5135-invalid-upsert.test.lua gh-5376
- box/hash_64bit_replace.test.lua gh-5410
- box/hash_replace.test.lua gh-5400
- box/huge_field_map_long.test.lua gh-5375
- box/net.box_huge_data_gh-983.test.lua gh-5402
- replication/anon.test.lua gh-5381
- replication/autoboostrap.test.lua gh-4933
- replication/box_set_replication_stress.test.lua gh-4992
- replication/election_basic.test.lua gh-5368
- replication/election_qsync.test.lua gh-5395
- replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
- replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
- replication/gh-5287-boot-anon.test.lua gh-5412
- replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
- replication/show_error_on_disconnect.test.lua gh-5371
- replication/status.test.lua gh-5409
- swim/swim.test.lua gh-5403
- unit/swim.test gh-5399
- vinyl/gc.test.lua gh-5383
- vinyl/gh-4864-stmt-alloc-fail-compact.test.lua gh-5408
- vinyl/gh-4957-too-many-upserts.test.lua gh-5378
- vinyl/gh.test.lua gh-5141
- vinyl/quota.test.lua gh-5377
- vinyl/snapshot.test.lua gh-4984
- vinyl/stat.test.lua gh-4951
- vinyl/upsert.test.lua gh-5398
-
Alexander V. Tikhonov authored
Some tests were previously disabled on FreeBSD 12 to avoid flaky failures. Now test-run can handle such failures using checksums for failures with open issues, so this patch re-enables 7 tests on FreeBSD 12.
Closes #4271
-
Alexander V. Tikhonov authored
Flaky failures were observed in test replication/gh-3637-misc-error-on-replica-auth-fail.test.lua. Memory leaks were found:

    [093] Last 15 lines of Tarantool Log file [Instance "replica_auth"][/builds/DtQXhC5e/0/tarantool/tarantool/test/var/093_replication/replica_auth.log]:
    [093] #3 0xa13df8 in coio_on_call /builds/DtQXhC5e/0/tarantool/tarantool/src/lib/core/coio_task.c:264:16
    [093] #4 0xfcedbe in eio_execute /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/eio.c:2015:9
    [093] #5 0xfcedbe in etp_proc /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/etp.c:373
    [093] #6 0x7f8c8260ffa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
    [093]
    [093] Indirect leak of 4 byte(s) in 1 object(s) allocated from:
    [093] #0 0x525dfa in calloc (/builds/DtQXhC5e/0/tarantool/tarantool/src/tarantool+0x525dfa)
    [093] #1 0xa2eb4a in mh_i64ptr_new /builds/DtQXhC5e/0/tarantool/tarantool/src/lib/salad/mhash.h:408:22
    [093] #2 0x8a516d in vy_recovery_new_f /builds/DtQXhC5e/0/tarantool/tarantool/src/box/vy_log.c:2321:23
    [093] #3 0xa13df8 in coio_on_call /builds/DtQXhC5e/0/tarantool/tarantool/src/lib/core/coio_task.c:264:16
    [093] #4 0xfcedbe in eio_execute /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/eio.c:2015:9
    [093] #5 0xfcedbe in etp_proc /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/etp.c:373
    [093] #6 0x7f8c8260ffa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)

To stabilize testing, these leaks were added as suppressions to the ASAN suppression list.
Part of #5343
-
Alexander V. Tikhonov authored
Print the error message to the log output in test: vinyl/gc.test.lua
-
Alexander V. Tikhonov authored
Print the error message to the log output in test: vinyl/snapshot.test.lua
-
Alexander V. Tikhonov authored
Print the error message to the log output in test: replication/gh-4402-info-errno.test.lua
-
Alexander V. Tikhonov authored
Print the error message to the log output in test: replication/replica_rejoin.test.lua
-
Alexander V. Tikhonov authored
Print the error message to the log output in test: replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
-
Kirill Yukhin authored
* Restart server on each failed test in worker
-
- Oct 12, 2020
-
-
Vladislav Shpilevoy authored
The new option can take one of 3 values: 'off', 'candidate', 'voter'. It replaces 2 old options: election_is_enabled and election_is_candidate. These flags looked strange: it was possible to set candidate to true while disabling election at the same time. They would also scale poorly if another mode were ever introduced, for example a data-less sentinel node used only for voting. Anyway, the single-option approach is easier to configure and to extend.
- 'off' means the election is disabled on the node. It is the same as election_is_enabled = false in the old config;
- 'voter' means the node can vote and is never writable. It is the same as election_is_enabled = true + election_is_candidate = false in the old config;
- 'candidate' means the node is a full-featured cluster member, which may eventually become a leader. It is the same as election_is_enabled = true + election_is_candidate = true in the old config.
Part of #1146
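A minimal sketch of how the three string values map onto the two replaced flags, taken directly from the list above. The names `enum election_mode`, `election_mode_by_name()`, and `election_mode_to_old_flags()` are hypothetical and only illustrate the mapping; they are not Tarantool's real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum mirroring the three documented values. */
enum election_mode {
	ELECTION_MODE_INVALID = -1,
	ELECTION_MODE_OFF,
	ELECTION_MODE_VOTER,
	ELECTION_MODE_CANDIDATE,
};

/* Parse the string value of the option; unknown strings are rejected. */
static enum election_mode
election_mode_by_name(const char *name)
{
	if (strcmp(name, "off") == 0)
		return ELECTION_MODE_OFF;
	if (strcmp(name, "voter") == 0)
		return ELECTION_MODE_VOTER;
	if (strcmp(name, "candidate") == 0)
		return ELECTION_MODE_CANDIDATE;
	return ELECTION_MODE_INVALID;
}

/* Equivalent values of the two replaced boolean options. */
static void
election_mode_to_old_flags(enum election_mode mode, bool *is_enabled,
			   bool *is_candidate)
{
	assert(mode != ELECTION_MODE_INVALID);
	*is_enabled = mode != ELECTION_MODE_OFF;
	*is_candidate = mode == ELECTION_MODE_CANDIDATE;
}
```

The contradictory old combination (election disabled, candidate true) simply has no representation in the single option.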
-
- Oct 07, 2020
-
-
Aleksandr Lyapunov authored
space:fselect and index:fselect fetch data like an ordinary select, but format the result like MySQL does: with columns, column names, etc. fselect converts tuples to strings using JSON, padding them with spaces and cutting the tail if necessary. It is designed for visual analysis of select results and shouldn't be used in stored procedures.

index:fselect(<key>, <opts>, <fselect_opts>)
space:fselect(<key>, <opts>, <fselect_opts>)

There are some options that can be specified in different ways:
- among other common options (<opts>) with the 'fselect_' prefix (e.g. 'fselect_type=..');
- in the special <fselect_opts> map (with or without the prefix);
- in global variables with the 'fselect_' prefix.

The possible options are:
- type:
  - 'sql' - like a MySQL result (default);
  - 'gh' (or 'github' or 'markdown') - markdown syntax, for copy-pasting to GitHub;
  - 'jira' - Jira table syntax, for copy-pasting to Jira.
- widths: array with the desired widths of columns.
- max_width: limit on the entire length of a row string; the longest fields will be cut if necessary. Set to 0 (default) to detect and use the screen width. Set to -1 for no limit.
- print: (default - false) print each line instead of adding it to the result.
- use_nbsp: (default - true) add invisible spaces to improve readability in YAML output. Not applicable when print=true.

There is also a pair of shortcuts:
- index/space:gselect - same as fselect, but with type='gh';
- index/space:jselect - same as fselect, but with type='jira'.

See test/engine/select.test.lua for examples.
Closes #5161
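The real fselect is written in Lua; the following is only a minimal C sketch of the two layout steps described above, under stated assumptions: `fit_cell()` pads or cuts a cell to a fixed width (byte-based, ignoring multibyte UTF-8), and `shrink_widths()` is a hypothetical way to honour max_width by repeatedly cutting the currently widest column. The special values 0 (screen width) and -1 (no limit) are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write `src` into `dst` padded with spaces or truncated so that it is
 * exactly `width` bytes long. `dst` must have room for width + 1 bytes.
 */
static void
fit_cell(char *dst, const char *src, size_t width)
{
	size_t len = strlen(src);
	if (len > width)
		len = width;
	memcpy(dst, src, len);
	memset(dst + len, ' ', width - len);
	dst[width] = '\0';
}

/*
 * Shrink column widths until the row fits in max_width, always cutting
 * the currently widest column. `overhead` is the width taken by
 * separators. Returns 0 on success, -1 if even the separators don't fit.
 */
static int
shrink_widths(size_t *widths, size_t ncols, size_t overhead, size_t max_width)
{
	if (overhead > max_width)
		return -1;
	size_t total = overhead;
	for (size_t i = 0; i < ncols; i++)
		total += widths[i];
	while (total > max_width) {
		size_t widest = 0;
		for (size_t i = 1; i < ncols; i++)
			if (widths[i] > widths[widest])
				widest = i;
		/* total > max_width >= overhead, so some column is non-zero. */
		widths[widest]--;
		total--;
	}
	return 0;
}
```

For example, columns of widths {10, 4, 6} with 4 bytes of separators squeezed into a 16-byte row end up as {4, 4, 4}.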
-
Sergey Kaplun authored
When building without `ENABLE_FIBER_TOP`, `struct fiber` has no `clock_stat` field and the `FIBER_TIME_RES` constant is not defined. This patch adds the corresponding ifdef directive to avoid compilation errors.
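A minimal sketch of the guarding pattern, assuming a simplified, hypothetical layout of `struct fiber`, a made-up `struct clock_stat` with an `acc` member, and an illustrative value for `FIBER_TIME_RES`; only the names `ENABLE_FIBER_TOP`, `clock_stat`, and `FIBER_TIME_RES` come from the commit. Every use of the optional field or constant must sit under the same `#ifdef` as its definition.

```c
#include <assert.h>
#include <stdint.h>

/* Uncomment to emulate a build with fiber.top support. */
/* #define ENABLE_FIBER_TOP 1 */

#ifdef ENABLE_FIBER_TOP
/* Illustrative value only; the real constant lives in Tarantool's fiber code. */
#define FIBER_TIME_RES 1000
/* Hypothetical layout of the per-fiber clock statistics. */
struct clock_stat {
	uint64_t acc;
};
#endif

/* Simplified, hypothetical layout of struct fiber. */
struct fiber {
	uint32_t fid;
#ifdef ENABLE_FIBER_TOP
	struct clock_stat clock_stat;
#endif
};

/* Uses of the optional field and constant are guarded as well. */
static uint64_t
fiber_cpu_time(const struct fiber *f)
{
#ifdef ENABLE_FIBER_TOP
	return f->clock_stat.acc / FIBER_TIME_RES;
#else
	(void)f;
	return 0;
#endif
}
```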
-
- Oct 06, 2020
-
-
Kirill Yukhin authored
Replace Clang 8 testing with the most recent version of Clang, 11. Clang 8 is still in the image.
Closes #5386
-
Kirill Yukhin authored
File mp_error.cc was using a C99 feature called "designated initializers", which was caught by Clang v12. Remove this usage.
-
Kirill Yukhin authored
Variable `delete` wasn't free()-ed in case of error. Free it properly. Found by static analyzer.
-
Igor Munkin authored
While a GC hook (i.e. a __gc metamethod) is running, the garbage collector engine is "stopped": the memory penalty threshold is set to LJ_MAX_MEM, and as a result the incremental GC step is not triggered. Ergo, yielding the execution within the finalizer body leaves the platform running with LuaJIT GC disabled. It is not re-enabled until the yielded fiber gets the execution back.

This changeset extends the <cord_on_yield> routine with a check whether a GC hook is active. If the switch-over occurs within a __gc metamethod, the platform is forced to stop its execution with EXIT_FAILURE and calls the panic routine before the exit.

Relates to #4518
Follows up #4727
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
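A minimal sketch of why a yield inside a finalizer is dangerous and what the added check does, using a toy pacing model: `struct gc_model`, `gc_enter_hook()`, `gc_step_is_due()`, `on_yield_check_gc()`, and `MODEL_MAX_MEM` are hypothetical stand-ins for LuaJIT internals (the real code inspects LuaJIT's global state and uses LJ_MAX_MEM). The check returns a verdict instead of calling panic and exiting, so that it can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for LuaJIT's LJ_MAX_MEM; the value here is illustrative only. */
#define MODEL_MAX_MEM SIZE_MAX

/* Hypothetical, heavily simplified model of the GC pacing state. */
struct gc_model {
	size_t total;      /* bytes currently allocated */
	size_t threshold;  /* a GC step runs once total exceeds this */
	bool hook_active;  /* a __gc metamethod is running */
};

/* Entering a finalizer "stops" the GC by raising the threshold. */
static size_t
gc_enter_hook(struct gc_model *gc)
{
	size_t saved = gc->threshold;
	gc->threshold = MODEL_MAX_MEM;
	gc->hook_active = true;
	return saved;
}

static void
gc_leave_hook(struct gc_model *gc, size_t saved)
{
	gc->threshold = saved;
	gc->hook_active = false;
}

static bool
gc_step_is_due(const struct gc_model *gc)
{
	return gc->total > gc->threshold;
}

enum yield_verdict { YIELD_OK, YIELD_FATAL };

/* The check added to the yield hook; the real code panics instead. */
static enum yield_verdict
on_yield_check_gc(const struct gc_model *gc)
{
	return gc->hook_active ? YIELD_FATAL : YIELD_OK;
}
```

While the hook is active no GC step is ever due, which is exactly the state other fibers would inherit if the finalizer yielded.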
-
Serge Petrenko authored
-
- Oct 02, 2020
-
-
Igor Munkin authored
Since Tarantool fibers don't respect the Lua coroutine switch mechanism, the JIT machinery is not notified when one lua_State substitutes another one. As a result, if trace recording hasn't been aborted prior to a fiber switch, the recording proceeds using the new lua_State and leads to a failure either in a further compiler phase or while the compiled trace is executed.

This changeset extends the <cord_on_yield> routine to abort trace recording when the fiber switches to another one. If the switch-over occurs while mcode is being run, the platform finishes its execution with the EXIT_FAILURE code and calls the panic routine prior to the exit.

Closes #1700
Fixes #4491
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
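A minimal sketch of the decision taken on yield, assuming a three-phase toy model of the JIT: `enum jit_phase`, `struct jit_model`, and `on_yield_check_jit()` are hypothetical names, not LuaJIT or Tarantool symbols. A recording in progress is aborted because the next fiber runs on a different lua_State; yielding from mcode is treated as unrecoverable. The real routine panics and exits with EXIT_FAILURE rather than returning a verdict.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the JIT state relevant to a fiber switch. */
enum jit_phase {
	JIT_IDLE,       /* interpreter is running, nothing special */
	JIT_RECORDING,  /* a trace is being recorded for the current lua_State */
	JIT_MCODE,      /* compiled machine code is being executed */
};

struct jit_model {
	enum jit_phase phase;
	int aborted_traces;
};

enum yield_verdict { YIELD_OK, YIELD_FATAL };

/*
 * Yield-time check: abort an in-progress recording, since the next fiber
 * runs on another lua_State; a switch from inside mcode is fatal.
 */
static enum yield_verdict
on_yield_check_jit(struct jit_model *jit)
{
	switch (jit->phase) {
	case JIT_RECORDING:
		jit->phase = JIT_IDLE;
		jit->aborted_traces++;
		return YIELD_OK;
	case JIT_MCODE:
		return YIELD_FATAL;
	default:
		return YIELD_OK;
	}
}
```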
-
Igor Munkin authored
Tarantool integrates several complex environments together, and issues occurring at their junction lead to platform failures. E.g. fiber switch-over is implemented outside the Lua world, so when one lua_State substitutes another one, the main LuaJIT engines, such as JIT and GC, are not notified, which leads to further platform misbehaviour.

To solve this severe integration drawback, the <cord_on_yield> function is introduced. This routine encloses the checks and actions to be done when the running fiber yields the execution.

Unfortunately, the way the callback is implemented introduces a circular dependency. Given how the linker resolves symbols in a static build, an auxiliary translation unit is added to the affected tests to mock (i.e. export) the otherwise undefined <cord_on_yield> symbol.

Part of #1700
Relates to #4491
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
- Oct 01, 2020
-
-
Cyrill Gorcunov authored
There is a mixture of types, and clang prefers an explicit conversion (since @value is a double).
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
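The exact expressions touched by these commits are not shown here, so the following is only a sketch of the kind of mixed integer/double comparison that recent clang flags (assuming the `-Wimplicit-int-float-conversion` diagnostic): INT64_MAX is not exactly representable as a double, so an implicit conversion silently rounds it to 2^63. The hypothetical helper `double_fits_int64()` makes that conversion explicit and relies on the rounded bound being excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Does a double hold an integral value representable as int64_t?
 * (double)INT64_MAX rounds up to exactly 2^63, which is out of range,
 * hence the strict "<". The explicit casts document that rounding.
 * Short-circuiting ensures (int64_t)d is only evaluated when in range.
 */
static bool
double_fits_int64(double d)
{
	return d >= (double)INT64_MIN && d < (double)INT64_MAX &&
	       d == (double)(int64_t)d;
}
```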
-
Cyrill Gorcunov authored
d is a "double", so use an explicit conversion to placate clang.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
pIn3->u.r is a "double", so use an explicit conversion to placate clang.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
@r is a "double" value, so use an explicit conversion to placate the clang compiler.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Convert to uint64_t explicitly.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Sep 30, 2020
-
-
Kirill Yukhin authored
Without an explicit cast, we get warnings.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Sep 29, 2020
-
-
Vladislav Shpilevoy authored
Part of #1146
-
Vladislav Shpilevoy authored
box.info.election returns a table of the form:

    {state: <string>, term: <number>, vote: <instance ID>, leader: <instance ID>}

The fields correspond one to one to the Raft concepts of the same names. This info dump is intended primarily to help with tests, and also with investigating problems in a real cluster. The API doesn't mention 'Raft' on purpose, so as not to depend specifically on Raft and not to confuse users who know nothing about Raft (not even that it is about leader election and synchronous replication).
Part of #1146
-
Vladislav Shpilevoy authored
The commit is a core part of the Raft implementation. It introduces the Raft state machine implementation and its integration into the instance's life cycle.

The implementation follows the protocol to the letter except for a few important details.

Firstly, the original Raft assumes that all nodes share the same log record numbers. In Tarantool they are called LSNs. But in Tarantool each node has its own LSN in its own component of the vclock. That makes the election messages a bit heavier, because the nodes need to send and compare each other's complete vclocks instead of a single number as in the original Raft. But the logic becomes simpler, because in the original Raft there is uncertainty about what to do with the records of an old leader right after a new leader is elected: they could be rolled back or confirmed depending on circumstances. The issue disappears when the vclock is used.

Secondly, leader election works differently during cluster bootstrap, until the number of bootstrapped replicas reaches the election quorum. This arises from the specifics of replica bootstrap and the order of subsystem initialization. In short: during bootstrap, a leader election may use a smaller election quorum than the configured one. See more details in the code.

Part of #1146
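A minimal sketch of comparing complete vclocks instead of a single log index, assuming a vote is granted only when the candidate's vclock is component-wise greater than or equal to the voter's own. That exact rule is an assumption for illustration; term checks and the bootstrap quorum logic are omitted. `struct vclock`, `VCLOCK_MAX`, `vclock_ge()`, and `raft_can_vote_for()` are hypothetical and differ from Tarantool's real vclock code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-size vclock; Tarantool's struct vclock is different. */
#define VCLOCK_MAX 32

struct vclock {
	int64_t lsn[VCLOCK_MAX]; /* lsn[i] = last LSN seen from replica i */
};

/* True if `a` has seen at least everything `b` has, component by component. */
static bool
vclock_ge(const struct vclock *a, const struct vclock *b)
{
	for (int i = 0; i < VCLOCK_MAX; i++) {
		if (a->lsn[i] < b->lsn[i])
			return false;
	}
	return true;
}

/*
 * Raft's "candidate's log is at least as up-to-date as mine" rule with a
 * vclock in place of a single last-log index: vote only for a candidate
 * that is not missing any record this node has.
 */
static bool
raft_can_vote_for(const struct vclock *self, const struct vclock *candidate)
{
	return vclock_ge(candidate, self);
}
```

Note that two vclocks can be incomparable (each ahead in some component); under this sketch's rule neither node would vote for the other in that case.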
-
sergepetrenko authored
The patch introduces a new type of system message used to notify the followers of the instance's Raft status updates. It is the relay's responsibility to deliver the new system rows to its peers. The notification system reuses and extends the same row type used to persist the Raft state in the WAL and snapshot.
Part of #1146
Part of #5204
-
Vladislav Shpilevoy authored
The new options are:
- election_is_enabled - enable/disable leader election (via Raft). When disabled, the node is supposed to work as if Raft did not exist, as before;
- election_is_candidate - a flag whether the instance can try to become a leader. Note that it can vote for other nodes regardless of the value of this option;
- election_timeout - how long to wait for an election to end, in seconds.

The options don't do anything yet. They are added separately in order to keep such mundane changes out of the main Raft commit and simplify its review.

Option names don't mention 'Raft' on purpose, because:
- not all users know what Raft is, so they may not even know it is related to leader election;
- in the future, the algorithm may change from Raft to something else, so it is better not to depend on it too much in the public API.

Part of #1146
-
Vladislav Shpilevoy authored
The patch introduces a skeleton of the Raft module and a method to persist the Raft state in a snapshot, not bound to any space.
Part of #1146
-