- Dec 22, 2020
-
-
Alexander V. Tikhonov authored
Since all CI activity is moving from Gitlab-CI to GitHub Actions, the docker image creation routine was updated with the new image naming and container registry: GITLAB_REGISTRY?=registry.gitlab.com changed to DOCKER_REGISTRY?=docker.io Part of #5294
-
Alexander V. Tikhonov authored
Added a test-run filter on the box.snapshot error message 'Invalid VYLOG file: Slice [0-9]+ deleted but not registered' to keep changing data out of the results file, so that its checksum can be used in the test-run fragile list to rerun the test as a flaky issue. Found issues: 1) vinyl/deferred_delete.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/913623306#L4552 [036] 2020-12-15 19:10:01.996 [16602] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=744, } [036] 2020-12-15 19:10:01.996 [16602] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 744 deleted but not registered 2) vinyl/gh-4864-stmt-alloc-fail-compact.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/913810422#L4835 [052] @@ -56,9 +56,11 @@ [052] -- [052] dump(true) [052] | --- [052] - | ... [052] -dump() [052] - | --- [052] + | - error: 'Invalid VYLOG file: Slice 253 deleted but not registered' [052] + | ... 3) vinyl/misc.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/913727925#L5284 [014] @@ -62,14 +62,14 @@ [014] ... [014] box.snapshot() [014] --- [014] -- ok [014] +- error: 'Invalid VYLOG file: Slice 1141 deleted but not registered' [014] ... 
4) vinyl/quota.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/914016074#L4595 [025] 2020-12-15 22:56:50.192 [25576] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=522, } [025] 2020-12-15 22:56:50.193 [25576] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 522 deleted but not registered 5) vinyl/update_optimize.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/913728098#L2512 [051] 2020-12-15 20:18:43.365 [17147] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=350, } [051] 2020-12-15 20:18:43.365 [17147] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 350 deleted but not registered 6) vinyl/upsert.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/913623510#L6132 [008] @@ -441,7 +441,7 @@ [008] -- Mem has DELETE [008] box.snapshot() [008] --- [008] -- ok [008] +- error: 'Invalid VYLOG file: Slice 1411 deleted but not registered' [008] ... 7) vinyl/replica_quota.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/914272656#L5739 [023] @@ -41,7 +41,7 @@ [023] ... [023] box.snapshot() [023] --- [023] -- ok [023] +- error: 'Invalid VYLOG file: Slice 232 deleted but not registered' [023] ... 8) vinyl/ddl.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/914309343#L4538 [039] @@ -81,7 +81,7 @@ [039] ... [039] box.snapshot() [039] --- [039] -- ok [039] +- error: 'Invalid VYLOG file: Slice 206 deleted but not registered' [039] ... 9) vinyl/write_iterator.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/920646297#L4694 [059] @@ -80,7 +80,7 @@ [059] ... [059] box.snapshot() [059] --- [059] -- ok [059] +- error: 'Invalid VYLOG file: Slice 351 deleted but not registered' [059] ... [059] -- [059] -- Create a couple of tiny runs on disk, to increate the "number of runs" 10) vinyl/gc.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/920441445#L4691 [050] @@ -59,6 +59,7 @@ [050] ... 
[050] gc() [050] --- [050] +- error: 'Invalid VYLOG file: Run 1176 deleted but not registered' [050] ... [050] files = ls_data() [050] --- 11) vinyl/gh-3395-read-prepared-uncommitted.test.lua https://gitlab.com/tarantool/tarantool/-/jobs/921944705#L4258 [019] @@ -38,7 +38,7 @@ [019] | ... [019] box.snapshot() [019] | --- [019] - | - ok [019] + | - error: 'Invalid VYLOG file: Slice 634 deleted but not registered' [019] | ... [019] [019] c = fiber.channel(1)
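The idea behind such a filter can be sketched as follows. This is a minimal illustration, not test-run's actual filter API: the function names and the `<num>` placeholder are hypothetical, and only the regular expression is taken from the commit message above.

```python
import hashlib
import re

# Pattern quoted in the commit message; everything else here is illustrative.
VYLOG_ERROR_RE = re.compile(
    r"Invalid VYLOG file: Slice [0-9]+ deleted but not registered")
# Hypothetical placeholder that replaces the changing slice id.
PLACEHOLDER = "Invalid VYLOG file: Slice <num> deleted but not registered"


def normalize_result(text: str) -> str:
    """Replace the changing slice id so the output is stable across runs."""
    return VYLOG_ERROR_RE.sub(PLACEHOLDER, text)


def result_checksum(text: str) -> str:
    """Checksum of the normalized output, usable as a 'known flaky' marker."""
    return hashlib.md5(normalize_result(text).encode("utf-8")).hexdigest()
```

With the slice id masked, two failures that differ only in the id produce the same checksum, which is what allows a single entry in a fragile list to match all of them.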
-
Alexander V. Tikhonov authored
Found that running the vinyl test suite in parallel with the test-run vardir on a real hard drive may cause a lot of tests to fail. This happens because the hard drive becomes a bottleneck, with usage up to 100%, which can be seen with tools like atop while vinyl tests run in parallel. To avoid it, all heavily loaded testing processes should use tmpfs for the vardir path. Found that the out-of-source (OOS) build had to be updated to use tmpfs as well. This patch adds a tmpfs mount point for the test-run vardir to the docker run process of the OOS build. The mount point is set using the '--tmpfs' flag because '--mount' does not support the 'exec' option, which is needed to be able to execute commands in it [2][3]. Issues met on OOS before the patch, as described in #5504 and [1]: Test hung! Result content mismatch: --- vinyl/write_iterator.result Fri Nov 20 14:48:24 2020 +++ /rw_bins/test/var/081_vinyl/write_iterator.result Fri Nov 20 15:01:54 2020 @@ -200,831 +200,3 @@ --- ... for i = 1, 100 do space:insert{i, ''..i} if i % 2 == 0 then box.snapshot() end end ---- -... -space:delete{1} ---- -... Closes #5622 Part of #5504 [1] - https://gitlab.com/tarantool/tarantool/-/jobs/863266476#L5009 [2] - https://stackoverflow.com/questions/54729130/how-to-mount-docker-tmpfs-with-exec-rw-flags [3] - https://github.com/moby/moby/issues/35890
-
Sergey Kaplun authored
Part of #5187
-
- Dec 21, 2020
-
-
Vladislav Shpilevoy authored
If the death timeout was decreased while waiting for leader death or discovery, to a new value that made the current death wait end immediately, it could crash in libev, because the remaining time until leader death became negative. The negative timeout was passed to libev without any checks, and there is an assertion that a timeout should always be >= 0. This commit brings raft code coverage to almost 100%, not counting one 'unreachable()' place. Closes #5303
-
Vladislav Shpilevoy authored
If the election timeout was decreased during an election to a new value that made the current election expire immediately, it could crash in libev, because the remaining time until the election end became negative. The negative timeout was passed to libev without any checks, and there is an assertion that a timeout should always be >= 0. Part of #5303
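The failure mode described in the two commits above can be illustrated with a small sketch. It assumes the simplest possible guard, clamping the remaining time at zero before re-arming a timer; the function names are hypothetical and this is not necessarily how the raft code implements the fix.

```python
def remaining_timeout(deadline: float, now: float) -> float:
    """Time left until the deadline, clamped so it is never negative."""
    return max(0.0, deadline - now)


def rearm_after_timeout_change(start: float, new_timeout: float,
                               now: float) -> float:
    """Recompute the timer delay after the configured timeout changed.

    If the new timeout is already exceeded, the result is 0 (fire
    immediately) instead of a negative value, which a libev-like timer
    would reject with an assertion.
    """
    return remaining_timeout(start + new_timeout, now)
```

For example, a wait started at t=10 and checked at t=12 has 3 seconds left with a 5 second timeout, and 0 seconds left (not -1) once the timeout is reduced to 1 second.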
-
Vladislav Shpilevoy authored
raft_process_msg() only validated that the state is specified, but it didn't check whether the state is inside the allowed value range. Messages with an out-of-range state were considered valid, and even their other fields were accepted. For instance, an invalid message could bump term. It is safer to reject such messages. Part of #5303
-
Vladislav Shpilevoy authored
Term in raft can never be 0. It starts from 1 and can only grow. It was assumed it can't be received from other nodes because they do the same. There was an assertion for that. But in raft_msg, used as a transport unit between raft nodes, it was still possible to send 0 term. It could happen as a result of a bug, or if someone would try to mimic the protocol but made a mistake. That led to a crash in the assert in debug build. Part of #5303
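The two validation rules from the commits above (state must be in the allowed range, term must be positive) can be sketched like this. The class names and the numeric state values are assumptions made for the illustration; the real raft_msg layout and state constants live in the C sources.

```python
from dataclasses import dataclass
from enum import IntEnum


class RaftState(IntEnum):
    # Hypothetical numeric values, chosen only for this sketch.
    FOLLOWER = 1
    CANDIDATE = 2
    LEADER = 3


@dataclass
class RaftMsg:
    term: int
    state: int


def is_valid_msg(msg: RaftMsg) -> bool:
    """Reject a message before any of its fields are applied."""
    if msg.term <= 0:
        # Term starts from 1 and can only grow, so 0 is never valid.
        return False
    try:
        RaftState(msg.state)
    except ValueError:
        # State is outside of the allowed value range.
        return False
    return True
```

The point of validating first is that a rejected message must not have side effects, such as bumping the local term.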
-
Vladislav Shpilevoy authored
The Raft algorithm was tested only by functional Lua tests, as a part of the Tarantool executable. Functional testing of something like the raft algorithm has drawbacks: - It is impossible or too hard to cover some cases without error injections and/or good stability, such as invalid messages, small time durations, or a complex case which needs multiple steps to be reproduced. For instance, start a WAL write, receive a message, finish the WAL write, and see if the expected thing happens. - The tests take too long to run when they need timeouts that are not tiny. On the other hand, with tiny timeouts the tests would become unstable. - Poor reproducibility due to randomness used in raft, system load, and the number of other tests running in parallel. - Hard to debug, because for raft it is necessary to start one Tarantool process per raft instance. - Too many other systems are involved, such as threads, the real journal, relays, appliers, and so on. They can affect the process as a whole and reduce reproducibility and debug simplicity even more. Exactly the same problem existed for the SWIM algorithm implemented as a module 'swim'. In order to test it, swim was moved to a separate library, refactored to be able to start many swims in the process (no global variables), and its functions talking to other systems were virtualized (swim_ev, swim_transport) and substituted in the unit tests with fake analogue systems. In the unit tests these virtual functions were implemented differently, but the core swim algorithm was left intact and properly tested. The same is done for raft. This patch implements a set of helper functions and objects to unit test raft in raft_test_utils.c/.h files, and uses it to cover almost all the raft algorithm code. During implementation of the tests some bugs were found, which are not covered here, but fixed and covered in the next commits. Part of #5303
-
Vladislav Shpilevoy authored
Raft_ev.h/.c encapsulates usage of libev, to a certain extent. All libev functions are wrapped into raft_ev_* wrappers. Objects and types are left intact. This is done in order to be able to replace raft_ev.c in the upcoming unit tests with fakeev functions. That will allow simulating time and catching all the raft events with 100% reproducibility, without actually waiting for the events in real time. A similar approach is used for swim unit tests. The original raft core file is built into a new library 'raft_algo'. It is done for the sake of code coverage in unit tests. A test could be built by directly referencing raft.c in unit/CMakeLists.txt, but then it can't apply compilation options to it, including gcov options. When raft.c is built into a library right where it is defined, it gets the gcov options, and the code covered by unit tests can be properly shown. Part of #5303
-
Vladislav Shpilevoy authored
ev_timer_remaining() in libev returns number of seconds until the timer will expire. It is used in raft code. Raft is going to be tested using fakesys, and it means it needs fakeev analogue of ev_timer_remaining(). Part of #5303
-
Vladislav Shpilevoy authored
fakeev timers didn't set 'active' flag for ev_watcher objects. Because of that if fakeev_timer_start() was called, the timer wasn't visible as 'active' via libev API. Fakeev is supposed to be a drop-in simulation of libev, so it should "work" exactly the same.
-
mechanik20051988 authored
There was a problem with the on_schema_init trigger. This trigger gives a way to create an on_replace trigger that will modify temporary or is_local spaces during recovery from a snapshot, but at that stage of the recovery process all space indexes are in a special build mode where no uniqueness checks are made. I added a new function 'is_recovery_finished' in box.ctl, which lets the user check whether we are in the snapshot recovery stage and can't insert/replace/update/upsert anything. I also added a check for the corresponding operations: now they fail if the user tries to do them during snapshot recovery. @TarantoolBot document Title: Add 'is_recovery_finished' function Add 'is_recovery_finished' function in box.ctl to let the user check whether we are in the snapshot recovery stage and can't insert/replace/update/upsert anything Closes #5304
-
- Dec 20, 2020
-
-
Vladislav Shpilevoy authored
swim_quit yields, because it joins the event handler fiber. Hence it can't be called via FFI, where a yield might lead to platform panic. Closes #4570
-
- Dec 19, 2020
-
-
Alexander Turenko authored
Added `--test-timeout <seconds>` argument. 110 seconds by default. The main idea is to avoid reaching --no-output-timeout when possible and so be able to restart a failed test (according to fragile test checksums) and store artifacts. PR #244. Fixed various cases when test-run doesn't wait for a stopping instance, doesn't try to stop it at all, or issues just SIGTERM without SIGKILL after some delay. PR #257.
-
- Dec 16, 2020
-
-
Leonid Vasiliev authored
The bug was a failure when working with temporary files created by VDBE to sort a large result of a `SELECT` statement with `ORDER BY`, `GROUP BY` clauses. What happens (step by step): - We have two instances on one node (sharded cluster). - A query is created that executes on both. - The first instance creates the name of the temporary file and checks whether a file with that name exists. - The second instance creates the name of the temporary file (the same as in the first instance) and checks whether a file with that name exists. - The first instance creates a file with the `SQL_OPEN_DELETEONCLOSE` flag. - The second instance opens (tries to open) the same file. - The first instance closes (and removes) the temporary file. - The second instance tries to work with the file and fails. Why did it happen: The temporary file name format has a random part, but the random generator uses a fixed seed. The commit where it was decided to use a fixed seed: 32cb1ad2 ("sql: drop useless code from os_unix.c") How the patch fixes the problem: The patch injects the PID into the temporary file name format. The generated name is unique within a single process (due to the random part) and unique between processes (due to the PID part). Alternatives: 1) Use `O_TMPFILE` or `tmpfile()` (IMHO the best way to work with temporary files). In both cases, we need to update a significant part of the code, and some degradation can be added. It's hard to review. 2) Return a random seed for the generator. As far as I understand, we want to have well reproducible system behavior, in which case it's good to use a fixed seed. 3) Add reopening of the file with the flags `O_CREAT | O_EXCL` until we win the fight. Now we set such flags when opening a temporary file, but after that we try to open the file in `READONLY` mode and if ok - return the descriptor. This is strange logic to me and I don't want to add any additional logic here. Also, such a solution will add additional attempts to open the file. 
So, it looks like such minimal changes will work fine and are simple to review. Co-authored-by: Mergen Imeev <imeevma@gmail.com> Fixes #5537
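The collision and the fix described in this commit can be reproduced in miniature. The sketch below is an illustration only: the name format, the prefix, and the function name are hypothetical and do not mirror the actual format string used in os_unix.c.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def make_temp_name(rng: random.Random, pid: int, prefix: str = "tmp_") -> str:
    """Build a temporary file name from the PID and a random suffix.

    The random part keeps names distinct within one process; the PID part
    keeps them distinct between processes even when every process seeds
    its generator with the same fixed value.
    """
    suffix = "".join(rng.choice(ALPHABET) for _ in range(12))
    return f"{prefix}{pid:x}_{suffix}"


# Two "instances" with the same fixed seed, as in the bug description.
first = make_temp_name(random.Random(0), pid=100)
second = make_temp_name(random.Random(0), pid=101)
```

Here the random suffixes of `first` and `second` are identical because of the shared seed, yet the full names differ because the PIDs differ.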
-
Artem Starshov authored
Fixed luacheck warning 111 (setting non-standard global variable) in the test/sql-tap directory. Enabled this directory for checking W111 in the config file (.luacheckrc). Changed almost all variables in test/sql-tap from globals to locals. In the cases where variables need to be global, added an explicit _G. prefix (table of globals). Fixes #5173 Part-of #5464
-
Artem Starshov authored
Zero-length arrays are a GNU C extension. ISO C99 provides the flexible array member, which is the preferred mechanism to declare variable-length types. A flexible array member has an incomplete type, so applying the sizeof operator to it is a compile-time error. There are other reasons why it is better to implement such structures via FAM: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html In this issue it fixed the gcc 10 warning: "warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]" Closes #4966 Closes #5564
-
Artem Starshov authored
GCC 10 produces the following error: cc1: warning: function may return address of local variable [-Wreturn-local-addr] Fix it. Part-of #4966
-
- Dec 11, 2020
-
-
Serge Petrenko authored
tuple_field_raw is an alias to tuple_field_raw_by_path with zero path. This involves multiple path != NULL checks which aren't needed for tuple field access by field number. The checks make this function rather slow compared to its 1.10 counterpart (see results below). In order to fix perf problems when JSON path indices aren't involved, factor out the part of tuple_field_raw_by_path which is responsible for direct field access by number and place it in tuple_field_raw. This patch was tested by the snapshot recovery part involving secondary index building for a 1.5G snapshot with one space and one secondary index over 4 integer and one string field. Comparison table is below:

Version         | Time (min:sec) | Change relative to 1.10
----------------|----------------|------------------------
1.10            | 2:24           | -/-
2.x (unpatched) | 3:03           | + 27%
2.x (patched)   | 2:10           | - 10%

Numbers below show cumulative time spent in tuple_compare_slowpath, for 1.10 / 2.x(unpatched) / 2.x(patched) for 15, 19 and 14 second profiles respectively: 13.9 / 17.8 / 12.5. tuple_field_raw() isn't measured directly, since it's inlined, but all its uses come from tuple_compare_slowpath. As the results show, we manage to be even faster than 1.10 used to be in this test. This must be due to tuple comparison hints, which are present only in 2.x. Closes #4774
-
Serge Petrenko authored
Since the introduction of JSON path indices tuple_init_field_map, which was quite a simple routine traversing a tuple and storing its field data offsets in the field map, was renamed to tuple_field_map_create and optimised for working with JSON path indices. The main difference is that tuple format fields are now organised in a tree rather than an array, and the tuple itself may have indexed fields, which are not plain array members, but rather members of some sub-array or map. This requires more complex iteration over tuple format fields and additional tuple parsing. All the changes were, however, unneeded for tuple formats not supporting fields indexed by JSON paths. Rework tuple_field_map_create so that it doesn't go through all the unnecessary JSON path-related checks for simple cases and restore most of the lost performance. Below are some benchmark results for the same workload that pointed to the degradation initially. Snapshot recovery time on RelWithDebInfo build for a 1.5G snapshot containing a single memtx space with one secondary index over 4 integer and 1 string field:

Version                    | Time (s) | Difference relative to 1.10
---------------------------|----------|----------------------------
1.10 (the golden standard) | 28       | -/-
2.x (degradation)          | 39       | + 39%
2.x (patched)              | 31       | + 11%

Profile shows that the main difference is in memtx_tuple_new due to tuple_init_field_map/tuple_field_map_create performance difference. Numbers below show cumulative time spent in tuple_init_field_map (1.10) / tuple_field_map_create (unpatched) / tuple_field_map_create (patched): 2.44 s / 8.61 s / 3.19 s. More benchmark results can be seen at #4774 Part of #4774
-
Mergen Imeev authored
Due to the fact that space_cache_find() is called unnecessarily, it is possible to set diag "Space '0' does not exist", although a space id of 0 is not an error in this case. Part of #5592
-
Ilya Kosarev authored
Tarantool codebase had at least two functions to generate random integer in given range and both of them had problems at least with integer overflow. This patch brings nice functions to generate random int64_t in given range without overflow while preserving uniform random generator distribution using unbiased bitmask with rejection method. It is now possible to use xoshiro256++ PRNG or random bytes as a basis. Most relevant replacements have been made. Needed tests are introduced. Closes #5075
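The "bitmask with rejection" method mentioned above can be sketched as follows. This is an illustration of the general technique, not the Tarantool implementation: the function name and the injectable bit source are hypothetical, and `secrets.randbits` stands in for whichever generator (xoshiro256++ or random bytes) is used as a basis.

```python
import secrets


def random_in_range(lo: int, hi: int, random_bits=secrets.randbits) -> int:
    """Uniform random integer in [lo, hi] via bitmask with rejection.

    random_bits(n) must return a uniform n-bit non-negative integer.
    """
    if lo > hi:
        raise ValueError("empty range")
    # In C this width must be computed in uint64_t to avoid signed
    # overflow; Python integers are unbounded, so plain subtraction works.
    span = hi - lo
    nbits = span.bit_length()
    if nbits == 0:
        return lo
    while True:
        # Draw just enough bits to cover the span (the "bitmask" step).
        candidate = random_bits(nbits)
        # Rejecting out-of-range draws keeps the distribution unbiased,
        # unlike taking the value modulo (span + 1).
        if candidate <= span:
            return lo + candidate
```

Because the mask is the smallest power of two above the span, each draw is accepted with probability greater than 1/2, so the expected number of iterations is below two.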
-
Alexander V. Tikhonov authored
Found issue: [079] @@ -115,5 +115,14 @@ [079] -- connection is deleted by 'collect'. [079] weak.c [079] --- [079] -- null [079] +- peer_uuid: 035d7b36-f205-45f4-9e16-e5b0b99a9b0b [079] + opts: [079] + reconnect_after: 0.1 [079] + host: unix/ [079] + schema_version: 78 [079] + protocol: Binary [079] + state: error_reconnect [079] + error: Connection refused [079] + peer_version_id: 132864 [079] + port: /tmp/tnt/079_box/proxy.socket-iproto [079] ... The test could not be restarted by checksum because the UUID value changes on each run. To avoid this, added a filter on the 'peer_uuid:' output.
-
Alexander V. Tikhonov authored
Added a test-run filter on the box.snapshot error message 'Invalid VYLOG file: Slice [0-9]+ deleted but not registered' to keep changing data out of the results file, so that its checksum can be used in the test-run fragile list to rerun the test as a flaky issue. Needed for #4346
-
- Dec 10, 2020
-
-
Alexander V. Tikhonov authored
Found that test replication/skip_conflict_row.test.lua fails with the following output in the results file: [260] @@ -117,11 +117,23 @@ [260] -- lsn is not promoted [260] lsn1 == box.info.vclock[1] [260] --- [260] -- true [260] +- false [260] ... [260] test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"}) [260] --- [260] -- true [260] +- false [260] +- id: 1 [260] + uuid: bdbf6673-6ee4-47eb-a88d-81164f4e61c9 The test could not be restarted by checksum because values like the UUID change on each failure. It happened because test-run uses an internal chain of functions wait_upstream() -> gen_box_info_replication_cond(), which returns instance information on failure. To avoid this, the output was redirected to the log file instead of the results file.
-
Alexander V. Tikhonov authored
Found that some tests use box.info* calls to print information on failure, like: [024] --- replication/wal_rw_stress.result Mon Nov 30 10:02:43 2020 [024] +++ var/rejects/replication/wal_rw_stress.reject Sun Dec 6 16:06:46 2020 [024] @@ -77,7 +77,45 @@ [024] r.downstream.status ~= 'stopped') \ [024] end) or box.info [024] --- [024] -- true [024] +- version: 2.7.0-109-g0b3ad5d8a0 [024] + id: 2 [024] + ro: false [024] + uuid: e0b8863f-7b50-4eb5-947f-77f92c491827 This prevents test-run from rerunning these tests using checksums, because the output changes on each failure, like the 'version:' or 'uuid:' field values above. To avoid this, the output of these calls should be redirected to log files using log.error(). The same fix was made for tests that print fio.listdir() and fio.stat() results on errors.
-
Alexander V. Tikhonov authored
Found an issue on the Tarantool package build for Ubuntu 19.10 [1]: E: The repository 'http://archive.ubuntu.com/ubuntu eoan Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu eoan-updates Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu eoan-backports Release' does not have a Release file. E: The repository 'http://security.ubuntu.com/ubuntu eoan-security Release' does not have a Release file. Also found that Ubuntu 19.10 reached end of life [2] on July 17, 2020, so the packaging jobs for this OS were removed from Gitlab-CI. [1] - https://gitlab.com/tarantool/tarantool/-/jobs/902339975#L172 [2] - https://fridge.ubuntu.com/2020/07/17/ubuntu-19-10-eoan-ermine-end-of-life-reached-on-july-17-2020/#:~:text=Ubuntu%20announced%20its%2019.10%20(Eoan,updated%20packages%20for%20Ubuntu%2019.10.
-
- Dec 08, 2020
-
-
Sergey Ostanevich authored
The problem was that the gh-4834-netbox-fiber-cancel test left a request hanging, so the net_msg_max test failed when it ran next on the same runner.
-
- Dec 07, 2020
-
-
Oleg Babin authored
Before this patch it was impossible to compare uuid values with string representations of uuid. However, we have cases where such comparisons are possible (e.g. "decimal", where we can compare decimal values with strings and numbers). This patch extends the uuid comparators (__eq, __lt and __le): an attempt is made to convert every string argument to a uuid value before comparing. Follow-up #5511 @TarantoolBot document Title: uuid comparison rules Currently comparison between uuid values is supported. Example:
```lua
u1 = uuid.fromstr('aaaaaaaa-aaaa-4000-b000-000000000001')
u2 = uuid.fromstr('bbbbbbbb-bbbb-4000-b000-000000000001')
u1 > u2 -- false
u1 >= u2 -- false
u1 <= u2 -- true
u1 < u2 -- true
```
Also it's possible to compare uuid values with their string representations:
```lua
u1_str = 'aaaaaaaa-aaaa-4000-b000-000000000001'
u1 = uuid.fromstr(u1_str)
u2_str = 'bbbbbbbb-bbbb-4000-b000-000000000001'
u1 == u1_str -- true
u1 == u2_str -- false
u1 >= u1_str -- true
u1 < u2_str -- true
```
-
Oleg Babin authored
Since Tarantool has uuid data type sometimes we want to compare uuid values as it's possible for primitive types and decimals. This patch exports function for uuid comparison and implements __le and __lt metamethods for uuid type. Closes #5511
-
Kirill Yukhin authored
* x64: Fix __call metamethod return dispatch.
-
- Dec 06, 2020
-
-
Alexander V. Tikhonov authored
In the previous commit the .tarantoolctl configuration file was placed into the test-run submodule repository as: <tarantool repository>/test-run/.tarantoolctl This commit removes it from the tarantool repository. In fact, it unblocks the `./test-run.py --replication-sync-timeout <seconds>` option and now all tests will actually receive test-run's value for the box.cfg() option (100 seconds by default instead of 300 seconds, which is tarantool's default). Updated the tests that check the replication_sync_timeout value: the value is now hidden in the output, because it can be set to a non-default value via the test-run command-line options. Found that there is no need to copy the tarantoolctl configuration file to the binary path any more, after it was moved to the test-run repository, so reverted the changes from: aa609de2 ('cmake for tests updated: copy ctl config in builddir') Needed for #5504
-
Alexander Turenko authored
See commits in the PR [1] for a detailed description of the changes. User visible changes are the following. 1. Now test-run.py can be invoked from any directory without changing the current working directory to `test/`. 2. The `test/.tarantoolctl` configuration file is not mandatory and can be removed. It is shipped now within the test-run repository. 3. test-run sets the `replication_sync_timeout` box.cfg() option when `test/.tarantoolctl` is not present in a parent repository. The value is controlled by the --replication-sync-timeout argument and defaults to 100 seconds (unlike tarantool's default, which is 300 seconds). The reason for the changes is to set the default `replication_sync_timeout` for all tests to a value lower than `--no-output-timeout` (120 seconds), to allow instances to step into the orphan mode before this deadline and give a more descriptive picture when it leads to a test failure. What is also important, when a test fails before the `--no-output-timeout`, we are able to restart it based on the `fragile` suite.ini option and / or collect artifacts to store them in CI. The `--no-output-timeout` deadline remains the show-stopper. We'll introduce a test execution timeout later to step into the general `--no-output-timeout` only in quite rare and unusual cases. The next commit will actually remove `test/.tarantoolctl`, so the new `replication_sync_timeout` will be in effect. [1]: https://github.com/tarantool/test-run/pull/242 Part of #5504
-
- Dec 04, 2020
-
-
Vladislav Shpilevoy authored
Fakesys is a collection of fake implementations of deep system things such as libev and libc. The fake subsystems will provide API just like their original counterparts (except for function names), but with full control of their behaviour in user-space for the sake of unit testing. Fakeev is a bogus version of libev, whose main feature is virtual time. Fakeev has internal clock, which is fully controllable in user-space. That allows to roll hours of tests in milliseconds of real time. Fakeev is used in SWIM tests, and will be used in Raft tests. Part of #5303
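The virtual time idea behind fakeev can be shown with a small sketch. It is a simplified illustration under stated assumptions, not the fakeev API: the class and method names are hypothetical, and only one-shot timers are modelled.

```python
import heapq
import itertools


class FakeLoop:
    """Event loop with a fully controllable virtual clock."""

    def __init__(self):
        self.now = 0.0
        self._timers = []
        # Tie-breaker so timers with equal deadlines fire in start order.
        self._seq = itertools.count()

    def timer_start(self, delay, callback):
        """Schedule a one-shot callback `delay` virtual seconds from now."""
        if delay < 0:
            raise ValueError("timeout must be >= 0")
        deadline = self.now + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), callback))

    def run_until(self, deadline):
        """Fire every timer due by `deadline`, jumping the clock forward."""
        while self._timers and self._timers[0][0] <= deadline:
            when, _, callback = heapq.heappop(self._timers)
            # No sleeping: the clock is simply set to the timer deadline.
            self.now = when
            callback()
        self.now = max(self.now, deadline)


fired = []
loop = FakeLoop()
loop.timer_start(3600, lambda: fired.append(loop.now))
loop.timer_start(60, lambda: fired.append(loop.now))
loop.run_until(7200)
```

After `run_until(7200)` both callbacks have fired in deadline order and two virtual hours have passed, while almost no real time was spent, which is the property that makes long-timeout tests fast and reproducible.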
-
Vladislav Shpilevoy authored
SWIM unit tests contain a special library for emulating the event loop: swim_test_ev. It provides API similar to libev, but implemented entirely in user-space, including clock functions. The latter is the most important point, as the original libev does not allow to define your own timing functions - internally it relies on select/kqueue/epoll/poll/select/... with true clock. Because of that it is impossible to perform long tests with the original libev, which could last for minutes or even tens of seconds if their count is big. swim_test_ev uses virtual time, where hours can be played in milliseconds. -- This commit extracts all swim code to swim_test_ev.c. Now this file is nothing but an implementation of swim_ev.h on top of fakeev API. Fakeev, in turn, does not depend on SWIM anymore, and can be moved to fakesys library. Part of #5303
-
Vladislav Shpilevoy authored
SWIM unit tests contain a special library for emulating the event loop: swim_test_ev. It provides API similar to libev, but implemented entirely in user-space, including clock functions. The latter is the most important point, as the original libev does not allow to define your own timing functions - internally it relies on select/kqueue/epoll/poll/select/... with true clock. Because of that it is impossible to perform long tests with the original libev, which could last for minutes or even tens of seconds if their count is big. swim_test_ev uses virtual time, where hours can be played in milliseconds. The fake libev is going to be re-used for Raft unit tests. But for that it is necessary to detach it from all SWIM dependencies. -- The patch renames swim_test_ev.c/.h to fakeev.c/.h because they will contain only fakeev functions soon. The swim methods, implementing swim_ev.h via fakeev, are moved to their own file in a separate commit. Because their file will be swim_test_ev.c. If they would be moved here, git would treat it like everything *except* swim functions was moved to fakeev.h/.c. It would ruin git history, and is split in 2 commits to avoid this. Part of #5303
-
Vladislav Shpilevoy authored
SWIM unit tests contain a special library for emulating the event loop: swim_test_ev. It provides API similar to libev, but implemented entirely in user-space, including clock functions. The latter is the most important point, as the original libev does not allow to define your own timing functions - internally it relies on select/kqueue/epoll/poll/select/... with true clock. Because of that it is impossible to perform long tests with the original libev, which could last for minutes or even tens of seconds if their count is big. swim_test_ev uses virtual time, where hours can be played in milliseconds. The fake libev is going to be re-used for Raft unit tests. But for that it is necessary to detach it from all SWIM dependencies. -- This commit makes all swim_test_ev functions have 'fakeev' prefix instead of 'swim'. The functions, implementing swim_ev.h API, are kept as one-line proxies to the fakeev functions. Part of #5303
-
Vladislav Shpilevoy authored
Fakesys is going to be a collection of fake implementations of deep system things such as libev and libc. The fake subsystems will provide API just like their original counterparts (except for function names), but with full control of their behaviour in user-space for the sake of unit testing. This commit introduces first part of fakesys - a subset of libc network API: sendto(), recvfrom(), bind(), close(), getifaddrs(). Main features of fakenet are: - Integration with event loop via fakenet_loop_update(). Although this could be also considered an issue if it will be ever necessary to implement fake epoll, or sockets not bound to any event loop; - Filters to decide which packets to drop depending on their src, dst, and content; - Socket block to suspend packets delivery until the socket is unblocked. Fakenet implements connection-less API, for UDP sockets. This is exactly what is needed in SWIM. Raft fake transport will need reliable sockets with broadcast API. Reliability can be ensured by setting drop rate to 0 (which is default). Broadcast functionality is already present - there is a broadcast interface in fakenet_getifaddrs() result. Part of #5303
-
Vladislav Shpilevoy authored
SWIM unit tests contain special libraries for emulating event loop and network: swim_test_ev and swim_test_transport. They provide API similar to libev and to network part of libc, which internally is implemented entirely in user-space and allows to simulate all kinds of errors, any time durations, etc. These test libraries are going to be re-used for Raft unit tests. But for that it is necessary to detach them from all SWIM dependencies. -- This commit extracts all swim code to swim_test_transport.c. Now this file is nothing but an implementation of swim_transport.h on top of fakenet API. Fakenet, in turn, does not depend on SWIM anymore, and can be moved to its own library. Part of #5303
-