- Oct 06, 2020
-
-
Kirill Yukhin authored
Update Clang 8 testing to the most recent version of Clang, 11. Clang 8 is still present in the image. Closes #5386
-
Kirill Yukhin authored
File mp_error.cc used a C99 feature called "designated initializers", which was caught by Clang v12. Remove this usage.
-
Kirill Yukhin authored
Variable `delete` wasn't free()-ed in case of error. Free it properly. Found by static analyzer.
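The kind of fix described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `struct record`, `str_copy()`, `record_new()` and `record_delete()` are hypothetical names, and treating `count == 0` as an error is an assumption made only to have an error path that is reached after an allocation has already succeeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct record {
	char *name;
	int *values;
};

/* Copy a string to the heap; returns NULL if the allocation fails. */
static char *
str_copy(const char *s)
{
	size_t size = strlen(s) + 1;
	char *copy = malloc(size);
	if (copy != NULL)
		memcpy(copy, s, size);
	return copy;
}

/* Release a record and everything it owns; NULL is accepted. */
void
record_delete(struct record *rec)
{
	if (rec == NULL)
		return;
	free(rec->name);
	free(rec->values);
	free(rec);
}

/* Create a record; every error path releases what was allocated so far. */
struct record *
record_new(const char *name, size_t count)
{
	assert(name != NULL);
	struct record *rec = malloc(sizeof(*rec));
	if (rec == NULL)
		return NULL;
	/* Start from NULL so the cleanup code can free() unconditionally. */
	rec->name = NULL;
	rec->values = NULL;
	rec->name = str_copy(name);
	if (rec->name == NULL || count == 0)
		goto error;
	rec->values = calloc(count, sizeof(*rec->values));
	if (rec->values == NULL)
		goto error;
	return rec;
error:
	/* Without this call the partially built record would leak. */
	record_delete(rec);
	return NULL;
}
```

Routing all failures through one label keeps the cleanup in a single place, which is the pattern static analyzers tend to verify most easily.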
-
Igor Munkin authored
While running a GC hook (i.e. the __gc metamethod) the garbage collector engine is "stopped": the memory penalty threshold is set to LJ_MAX_MEM, so the incremental GC step is not triggered. Ergo, yielding the execution in the finalizer body leaves the platform running with the LuaJIT GC disabled. It is not re-enabled until the yielded fiber gets the execution back.

This changeset extends the <cord_on_yield> routine with a check whether a GC hook is active. If the switch-over occurs in the scope of a __gc metamethod, the platform is forced to stop its execution with EXIT_FAILURE and calls the panic routine before the exit.

Relates to #4518
Follows up #4727

Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
Serge Petrenko authored
-
- Oct 02, 2020
-
-
Igor Munkin authored
Since Tarantool fibers don't respect the Lua coroutine switch mechanism, the JIT machinery stays unnotified when one lua_State substitutes another one. As a result, if trace recording hasn't been aborted prior to the fiber switch, the recording proceeds using the new lua_State and leads to a failure either in a later compiler phase or while the compiled trace is executed.

This changeset extends the <cord_on_yield> routine so that it aborts trace recording when the fiber switches to another one. If the switch-over occurs while mcode is being run, the platform finishes its execution with the EXIT_FAILURE code and calls the panic routine prior to the exit.

Closes #1700
Fixes #4491

Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
Igor Munkin authored
Tarantool integrates several complex environments, and issues occurring at their junction lead to platform failures. E.g. fiber switch-over is implemented outside the Lua world, so when one lua_State substitutes another one, the main LuaJIT engines, such as JIT and GC, are left unnotified, which leads to further platform misbehaviour.

To solve this severe integration drawback the <cord_on_yield> function is introduced. This routine encloses the checks and actions to be done when the running fiber yields the execution.

Unfortunately, the way the callback is implemented introduces a circular dependency. Given how the linker resolves symbols in a static build, an auxiliary translation unit that mocks (i.e. exports) the otherwise undefined <cord_on_yield> symbol is added to particular tests.

Part of #1700
Relates to #4491

Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
- Oct 01, 2020
-
-
Cyrill Gorcunov authored
There is a mixture of types and clang prefers an explicit conversion (since @value is a double).
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
d is a "double", thus placate clang.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
pIn3->u.r is a "double", thus placate clang.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
@r is a "double" value, thus use an explicit conversion to placate the clang compiler.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
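A minimal sketch of the kind of explicit conversion these commits describe. The function name `double_fits_int64()` is hypothetical, and the commits do not say which diagnostic was triggered; recent clang versions can report such mixed comparisons, for example under -Wimplicit-int-float-conversion, because INT64_MAX is not exactly representable as a double.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check whether a double can be converted to int64_t without overflow.
 *
 * (double)INT64_MAX rounds up to 2^63, which is already out of range,
 * hence the strict '<'. (double)INT64_MIN is exactly -2^63, which is
 * in range, hence '>='. NaN fails both comparisons and is rejected.
 * The casts make the integer-to-double conversions explicit instead of
 * leaving them to the usual arithmetic conversions.
 */
bool
double_fits_int64(double d)
{
	return d >= (double)INT64_MIN && d < (double)INT64_MAX;
}
```

Spelling out the cast does not change the generated comparison; it only documents that the rounding is intended.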
-
Cyrill Gorcunov authored
Convert to uint64_t explicitly.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Sep 30, 2020
-
-
Kirill Yukhin authored
Without an explicit cast we're getting warnings.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Sep 29, 2020
-
-
Vladislav Shpilevoy authored
Part of #1146
-
Vladislav Shpilevoy authored
box.info.election returns a table of the form:

    {
        state: <string>,
        term: <number>,
        vote: <instance ID>,
        leader: <instance ID>
    }

The fields correspond one to one to the Raft concepts of the same names. This info dump is supposed to help with the tests, first of all, and with the investigation of problems in a real cluster.

The API doesn't mention 'Raft' on purpose, to keep it from depending specifically on Raft, and not to confuse users who don't know anything about Raft (even that it is about leader election and synchronous replication).

Part of #1146
-
Vladislav Shpilevoy authored
The commit is a core part of the Raft implementation. It introduces the Raft state machine implementation and its integration into the instance's life cycle.

The implementation follows the protocol to the letter except for a few important details.

Firstly, the original Raft assumes that all nodes share the same log record numbers. In Tarantool they are called LSNs. But in Tarantool each node has its own LSN in its own component of vclock. That makes the election messages a bit heavier, because the nodes need to send and compare complete vclocks of each other instead of a single number like in the original Raft. But the logic becomes simpler, because in the original Raft there is a problem of uncertainty about what to do with records of an old leader right after a new leader is elected. They could be rolled back or confirmed depending on circumstances. The issue disappears when vclock is used.

Secondly, leader election works differently during cluster bootstrap, until the number of bootstrapped replicas becomes >= election quorum. That arises from the specifics of replica bootstrap and the order of systems initialization. In short: during bootstrap a leader election may use a smaller election quorum than the configured one. See more details in the code.

Part of #1146
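To make "compare complete vclocks" concrete, here is a minimal sketch of a component-wise comparison. All names (`toy_vclock`, `toy_vclock_covers()`, `toy_can_vote_for()`) and the fixed size of four components are hypothetical. The voting rule shown, granting a vote only when the candidate's vclock is at least as new in every component, is an assumption for illustration and not a statement about the actual Tarantool code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, tiny vclock: one LSN per node in the cluster. */
enum { TOY_VCLOCK_SIZE = 4 };

struct toy_vclock {
	int64_t lsn[TOY_VCLOCK_SIZE];
};

/* True if every component of `a` is >= the same component of `b`. */
bool
toy_vclock_covers(const struct toy_vclock *a, const struct toy_vclock *b)
{
	for (int i = 0; i < TOY_VCLOCK_SIZE; i++) {
		if (a->lsn[i] < b->lsn[i])
			return false;
	}
	return true;
}

/*
 * Assumed rule: a voter supports a candidate only if the candidate has
 * seen everything the voter has seen. With a single log index this is
 * one integer comparison; with vclocks it is a loop over components,
 * and two vclocks may turn out to be incomparable.
 */
bool
toy_can_vote_for(const struct toy_vclock *voter,
		 const struct toy_vclock *candidate)
{
	return toy_vclock_covers(candidate, voter);
}
```

Two vclocks such as {1, 2, 3, 0} and {2, 1, 3, 0} are incomparable: neither covers the other, so under the assumed rule neither node would vote for the other.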
-
sergepetrenko authored
The patch introduces a new type of system message used to notify the followers of the instance's raft status updates. It's relay's responsibility to deliver the new system rows to its peers. The notification system reuses and extends the same row type used to persist raft state in WAL and snapshot. Part of #1146 Part of #5204
-
Vladislav Shpilevoy authored
The new options are:

- election_is_enabled - enable/disable leader election (via Raft). When disabled, the node is supposed to work as if Raft did not exist, like before;
- election_is_candidate - a flag whether the instance can try to become a leader. Note, it can vote for other nodes regardless of the value of this option;
- election_timeout - how long to wait until the election ends, in seconds.

The options don't do anything now. They are added separately in order to keep such mundane changes out of the main Raft commit, to simplify its review.

Option names don't mention 'Raft' on purpose, because:

- Not all users know what Raft is, so they may not even know it is related to leader election;
- In the future the algorithm may change from Raft to something else, so it is better not to depend on it too much in the public API.

Part of #1146
-
Vladislav Shpilevoy authored
The patch introduces a skeleton of the Raft module and a method to persist a Raft state in a snapshot, not bound to any space. Part of #1146
-
Vladislav Shpilevoy authored
Struct replicaset didn't store the number of registered replicas, only an array, which had to be fully scanned each time the count was needed. The count is going to be needed in Raft to calculate the election quorum. The patch makes the count tracked, so that it can be found in constant time by simply reading an integer. Needed for #1146
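The idea of tracking the count next to the array can be sketched like this. The structure and function names are hypothetical and much simpler than the real struct replicaset; the point is only that the counter is updated at the same places where the array changes.

```c
#include <stdbool.h>

/* Hypothetical upper bound on replica IDs, for illustration only. */
enum { TOY_REPLICA_MAX = 32 };

struct toy_replicaset {
	/* Registration flags indexed by replica ID. */
	bool is_registered[TOY_REPLICA_MAX];
	/* Number of true flags above, kept in sync on every change. */
	int registered_count;
};

/* Register a replica; out-of-range and repeated IDs are ignored. */
void
toy_replicaset_register(struct toy_replicaset *rs, int id)
{
	if (id < 0 || id >= TOY_REPLICA_MAX || rs->is_registered[id])
		return;
	rs->is_registered[id] = true;
	rs->registered_count++;
}

/* Unregister a replica; unknown IDs are ignored. */
void
toy_replicaset_unregister(struct toy_replicaset *rs, int id)
{
	if (id < 0 || id >= TOY_REPLICA_MAX || !rs->is_registered[id])
		return;
	rs->is_registered[id] = false;
	rs->registered_count--;
}
```

Reading `registered_count` is then a single load instead of a scan over `TOY_REPLICA_MAX` slots.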
-
Vladislav Shpilevoy authored
relay.cc and box.cc obtained the box.cfg.wal_dir value using a cfg_gets() call, to initialize WAL and create struct recovery objects. That is not only a bit dangerous (cfg_gets() uses Lua API and can throw a Lua error) and slow, but also unnecessary: the wal_dir parameter is constant, it can't be changed after instance start. It means the value can be stored somewhere once and then used without Lua. The main motivation is that the WAL directory path will be needed inside relay threads to restart their recovery iterators in the Raft patch. They can't use cfg_gets(), because Lua lives in the TX thread, but they can access a constant global variable exposed by this patch (the variable existed before, but now has a method to get it). Needed for #1146
-
Vladislav Shpilevoy authored
An instance is writable if box.cfg.read_only is false, and it is not an orphan. An update of the final read-only state of the instance needs to fire read-only update triggers and notify the engines. These 2 flags were easy and cheap to check on each operation, and the triggers were easy to use since both flags are stored and updated inside box.cc.

That is going to change when Raft is introduced. Raft will add 2 more checks:

- A flag whether Raft is enabled on the node. If it is not, then the Raft state won't affect whether the instance is writable;
- When Raft is enabled, it will allow writes on a leader only.

It means a check for being read-only would look like this:

    is_ro || is_orphan || (raft_is_enabled() && !raft_is_leader())

This is significantly slower. Besides, Raft would somehow need to access the read-only triggers and the engine API, which looks wrong.

The patch introduces a new flag is_ro_summary. The flag incorporates all the read-only conditions into one flag. When some subsystem may change the read-only state of the instance, it needs to call box_update_ro_summary(), and the function takes care of updating the summary flag, running the triggers, and notifying the engines. Raft will use this function when its state or config changes.

Needed for #1146
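A minimal sketch of the summary-flag idea, using the expression quoted above. The `toy_instance` structure, its fields and `toy_update_ro_summary()` are hypothetical stand-ins, the trigger run is modelled by a plain counter, and firing it only when the summary actually changes is an assumption of this sketch rather than something the message states.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state that lives in box.cc and Raft. */
struct toy_instance {
	bool is_ro;
	bool is_orphan;
	bool raft_is_enabled;
	bool raft_is_leader;
	/* Cached result of the full read-only condition. */
	bool is_ro_summary;
	/* Stand-in for "run triggers and notify engines". */
	int trigger_runs;
};

/*
 * Recompute the summary after any input flag has changed. Hot paths
 * then read is_ro_summary alone instead of evaluating the whole
 * expression on each operation.
 */
void
toy_update_ro_summary(struct toy_instance *inst)
{
	bool old_summary = inst->is_ro_summary;
	inst->is_ro_summary = inst->is_ro || inst->is_orphan ||
		(inst->raft_is_enabled && !inst->raft_is_leader);
	/* Assumption: subscribers care only about real transitions. */
	if (inst->is_ro_summary != old_summary)
		inst->trigger_runs++;
}
```

Every subsystem that flips one of the inputs calls the update function, so the knowledge about triggers and engines stays in one place.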
-
Vladislav Shpilevoy authored
Applier is going to need its numeric ID in order to tell the future Raft module who is a sender of a Raft message. An alternative would be to add sender ID to each Raft message, but this looks like a crutch. Moreover, applier still needs to know its numeric ID in order to notify Raft about heartbeats from the peer node. Needed for #1146
-
Sergey Kaplun authored
Found and fixed an unclosed va_list 'ap' with cppcheck: [src/httpc.c:190]: (error) va_list 'ap' was opened but not closed by va_end().
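For reference, the rule cppcheck enforces here is that every va_start() is paired with a va_end() in the same function. A minimal sketch with a hypothetical `format_message()` helper (not the function from src/httpc.c):

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Format a message into `buf`. Returns what vsnprintf() returns: the
 * length the full message would have, or a negative value on error.
 */
int
format_message(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	va_start(ap, fmt);
	int rc = vsnprintf(buf, size, fmt, ap);
	/* Must be called before returning, on every path. */
	va_end(ap);
	return rc;
}
```

Storing the result in a local variable first is what makes it possible to call va_end() before the return statement.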
-
- Sep 28, 2020
-
-
Roman Khabibov authored
Ban the ability to modify a view at the box level. Since a view is in fact a named select and not a table, altering a view is not a valid operation.
-
Alexander V. Tikhonov authored
Added for tests with issues:

app/fiber.test.lua gh-5341
app-tap/debug.test.lua gh-5346
app-tap/http_client.test.lua gh-5346
app-tap/inspector.test.lua gh-5346
box/gh-2763-session-credentials-update.test.lua gh-5363
box/hash_collation.test.lua gh-5247
box/lua.test.lua gh-5351
box/net.box_connect_triggers_gh-2858.test.lua gh-5247
box/net.box_incompatible_index-gh-1729.test.lua gh-5360
box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
box/protocol.test.lua gh-5247
box/update.test.lua gh-5247
box-tap/net.box.test.lua gh-5346
replication/autobootstrap.test.lua gh-4533
replication/autobootstrap_guest.test.lua gh-4533
replication/ddl.test.lua gh-5337
replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
replication/long_row_timeout.test.lua gh-4351
replication/on_replace.test.lua gh-5344, gh-5349
replication/prune.test.lua gh-5361
replication/qsync_advanced.test.lua gh-5340
replication/qsync_basic.test.lua gh-5355
replication/replicaset_ro_mostly.test.lua gh-5342
replication/wal_rw_stress.test.lua gh-5347
replication-py/multi.test.py gh-5362
sql/prepared.test.lua test gh-5359
sql-tap/selectG.test.lua gh-5350
vinyl/ddl.test.lua gh-5338
vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
vinyl/iterator.test.lua gh-5336
vinyl/write_iterator_rand.test.lua gh-5356
xlog/panic_on_wal_error.test.lua gh-5348
-
Sergey Kaplun authored
Found and fixed a null pointer dereference with cppcheck: [src/box/alter.cc:395]: (error) Null pointer dereference
-
Sergey Kaplun authored
[src/lua/fiber.c:245] -> [src/lua/fiber.c:217]: (warning) Either the condition 'if(func)' is redundant or there is possible null pointer dereference: func.
-
- Sep 26, 2020
-
-
Alexander Turenko authored
Updated test_run:wait_upstream() and test_run:wait_downstream() to wait until box is configured and an instance with the given ID appears in box.info.replication. See https://github.com/tarantool/test-run/issues/221 Fixes #5317 Fixes #5329
-
- Sep 25, 2020
-
-
Alexander Turenko authored
Justify columns in the output. https://github.com/tarantool/test-run/pull/222
-
Alexander V. Tikhonov authored
Removed a stray line left over from a merge.
-
Alexander V. Tikhonov authored
test-run now implements a new format of the fragile lists. It is based on JSON and is set as the 'fragile' option in the 'suite.ini' file of each suite:

    fragile = {
        "retries": 10,
        "tests": {
            "bitset.test.lua": {
                "issues": [ "gh-4095" ],
                "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
            }
        }}

Added the ability to compute the checksum of the results file when a test fails and compare it with the checksums of the known issues mentioned in the fragile list. Also added the ability to set the 'retries' option, which sets the number of accepted reruns for failed tests from the 'fragile' list that have checksums for their fails.

Closes #5050
-
Alexander V. Tikhonov authored
Found flaky failures when running the replication/anon.test.lua test multiple times on a single worker:

[007] --- replication/anon.result Fri Jun 5 09:02:25 2020
[007] +++ replication/anon.reject Mon Jun 8 01:19:37 2020
[007] @@ -55,7 +55,7 @@
[007]
[007]  box.info.status
[007]   | ---
[007] - | - running
[007] + | - orphan
[007]   | ...
[007]  box.info.id
[007]   | ---

[094] --- replication/anon.result Sat Jun 20 06:02:43 2020
[094] +++ replication/anon.reject Tue Jun 23 19:35:28 2020
[094] @@ -154,7 +154,7 @@
[094]  -- Test box.info.replication_anon.
[094]  box.info.replication_anon
[094]   | ---
[094] - | - count: 1
[094] + | - count: 2
[094]   | ...
[094]  #box.info.replication_anon()
[094]   | ---
[094]

It happened because replications may stay active from previous runs on the common tarantool instance of the test-run worker. To avoid this, the tarantool instance is now restarted at the very start of the test.

Closes #5058
-
Alexander V. Tikhonov authored
Assigned the opensuse jobs to the test group to make sure that they run with artifacts collection and without extra parallelization of gitlab-ci jobs.
-
Alexander V. Tikhonov authored
Added an artifacts saver to all gitlab-ci jobs with testing. Gitlab-ci jobs save their result files in the following paths:

1. base jobs for testing different features:
   - test/var/artifacts
2. OSX jobs:
   - ${OSX_VARDIR}/artifacts
3. pack/deploy jobs:
   - build/usr/src/*/tarantool-*/test/var/artifacts
4. VBOX jobs (freebsd_12) on virtual host:
   - ~/tarantool/test/var/artifacts

An 'after_script' section is added to the gitlab-ci configuration, with a script which collects the 'artifacts' directories created by the test-run tool from the different test places. It saves the 'artifacts' directories as the root path in the artifacts packages. A user will be able to download these packages using either the gitlab-ci GUI or the API.

Additionally added the OSX_VARDIR environment variable to be able to set up a common path for artifacts and OSX shell scripts options.

    OSX_VARDIR: /tmp/tnt

Part of #5050
-
Sergey Bronnikov authored
Running Jepsen tests creates a directory with the Terraform state and a directory with the Jepsen tests source code in the build directory. This is fine for an out-of-source build in a separate directory, but when building in the project root directory these directories appear in `git status` output. This patch adds ignores for these directories.
-
Sergey Bronnikov authored
For running Jepsen tests we need to check out an external repository with the tests source code at the build stage. This behaviour breaks a Tarantool build under Gentoo. The WITH_JEPSEN option enables the targets only when they are needed. Closes #5325
-
- Sep 24, 2020
-
-
Alexander Turenko authored
Retry a failed test when it is marked as fragile (and several other conditions are met, see below).

test-run already allows setting a list of fragile tests. They are run one-by-one after all parallel ones in order to eliminate possible resource starvation and to bring timings close to the ones under which the tests pass. See [1].

In practice this approach does not help much against our problem with flaky tests. We decided to retry failed tests when they are known to be fragile. See [2].

The core idea is to split responsibility: known flaky fails will not distract a developer, but each fragile test will be marked explicitly, tracked and analyzed by the quality assurance team.

The default behaviour is not changed: each test from the fragile list will be run once after all parallel ones. But now it is possible to set the number of retries.

Beware: the implementation does not allow setting just the retries count, it also requires providing an md5sum of the failed test output (the so-called reject file). The idea here is to ensure that we retry the test only in case of a known fail, not some other fail within the test.

This approach has a limitation: in case of a fail a test may output information that varies from run to run or depends on the base directory. We should always verify the output before putting its checksum into the configuration file.

Despite doubts regarding this approach, it looks simple and we decided to try it and revisit it if there is a need.

See a configuration example in [3].

[1]: https://github.com/tarantool/test-run/issues/187
[2]: https://github.com/tarantool/test-run/issues/189
[3]: https://github.com/tarantool/test-run/pull/217

Part of #5050
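The retry rule described above boils down to two conditions: retries are still available, and the checksum of the reject file matches a known one. A minimal sketch of that decision follows; test-run itself is not written in C, and `toy_should_retry()` with all its parameters is a hypothetical name. The checksums are passed in as ready-made hex strings, so computing the md5sum is out of scope here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a failed fragile test should be rerun.
 *
 * reject_md5    - checksum of the output of the failed run;
 * known_md5     - checksums of known fails from the fragile list;
 * known_count   - number of entries in known_md5;
 * retries_done  - how many reruns were already spent;
 * retries_max   - the configured 'retries' value.
 */
bool
toy_should_retry(const char *reject_md5, const char *const *known_md5,
		 size_t known_count, int retries_done, int retries_max)
{
	if (retries_done >= retries_max)
		return false;
	for (size_t i = 0; i < known_count; i++) {
		/* Retry only on a fail that was seen and recorded before. */
		if (strcmp(reject_md5, known_md5[i]) == 0)
			return true;
	}
	/* Unknown output: a new kind of fail, report it right away. */
	return false;
}
```

An empty checksum list therefore never triggers a retry, which matches the requirement that a retries count alone is not enough.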
-
- Sep 23, 2020
-
-
Aleksandr Lyapunov authored
Closes #4897
-
Aleksandr Lyapunov authored
txn_proxy is a special utility for transaction tests. Formerly it was used only for vinyl tests and thus was placed in vinyl folder. Now the time has come to test memtx transactions and the utility must be placed amongst other utils - in box/lua. Needed for #4897
-