- Nov 18, 2020
-
-
Roman Khabibov authored
Rename TK_COLUMN, used for tokens treated as a column name, to TK_COLUMN_REF. This is needed to allow typing the COLUMN keyword in an <ALTER TABLE ADD COLUMN> statement. Needed for #3075
-
- Nov 17, 2020
-
-
Nikita Pettik authored
After the upsert rework in 5a61c471 (issue #5107), update operations are applied consistently with the upserts they belong to: if an update operation from a single upsert fails, all update operations from that upsert are skipped. But the rest of the update operations related to other upserts (after squashing two upserts) are still applied. So let's update the #4957 test case: now an upsert operation can be refused only if it contains more than BOX_UPDATE_OP_CNT_MAX (4000) operations on its own, before squashing with any other upsert. Follow-up #4957 Follow-up #5107
-
- Nov 13, 2020
-
-
Cyrill Gorcunov authored
We have the VCLOCK_MASK constant which limits the number of regular nodes in a replication cluster. This limit is bound to the vclock_map_t bitset type. Thus when we're tracking the voting process in a Raft node, we'd better use this type for the vote_mask member (otherwise there is room for error if we ever change VCLOCK_MASK and extend the width of the bitset). Suggested-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
- Nov 12, 2020
-
-
Kirill Yukhin authored
The variable `old`, which contains a tuple reference, was never unrefed. Fix this.
-
- Nov 11, 2020
-
-
Mary Feofanova authored
A degradation caused by #2866 forbade specifying index options in key parts, which is actually right. The tests were fixed and the behaviour was extended to the other way of specifying index parts. Closes #5473
-
- Nov 10, 2020
-
-
Vladislav Shpilevoy authored
Vclock is used in raft, which is going to be moved to src/lib. That means vclock also should be moved there. It is easy, because vclock does not depend on anything in box/. Needed for #5303
-
Vladislav Shpilevoy authored
Since box_raft is now initialized at runtime and is used from several subsystems (memtx for snapshots; applier for accepting rows; box.info for monitoring), it is easy to get the initialization order wrong and accidentally use the not yet initialized global raft object. This patch adds a sanity check ensuring that does not happen. The raft state is set to 0 at program start. Then any access to the global raft object first checks that the state is not 0. The initialization order will get trickier when raft stops using globals from replication and from box, and is used from them more extensively. Part of #5303
-
Vladislav Shpilevoy authored
All raft functions worked with a global raft object. That made it impossible to move raft to a separate module, where it could be properly unit-tested with multiple raft nodes in each test. The patch adds an explicit raft pointer argument to each raft function as a first step of moving raft to a separate library. The global object is renamed to box_raft_global to emphasize that this is a global box object, not one from the future raft library. Access to it now should go through the box_raft() function, which will get some sanity checks in the next commit. Part of #5303
-
Vladislav Shpilevoy authored
Struct fiber has a member va_list f_data. It is used to forward arguments to the fiber function when fiber_start() is called, right from the caller's stack. But it is useless when a fiber is started asynchronously, with fiber_new + fiber_wakeup, and there is no way to pass anything into such a fiber. This patch adds a new member 'void *f_arg', which shares memory with va_list f_data and can be used to pass something into the fiber. The feature is going to be used by raft. Currently the raft worker fiber works only with global variables, but soon it will need to have its own pointer to a struct raft object. And it can't be started with fiber_start(), because the raft code does not yield anywhere in its state machine. Needed for #5303
-
Vladislav Shpilevoy authored
Raft state machine crashed if it was configured to be a candidate during a WAL write with a known leader. It tried to start waiting for the leader death, but should have waited for the WAL write end first. The code tried to handle it, but the order of 'if' conditions was wrong. WAL write being in progress was checked last, but should have been checked first. Closes #5506
-
Vladislav Shpilevoy authored
The Raft state machine crashed if it was restarted while a WAL write was in progress. When the machine was started, it didn't assume there could still be an unfinished WAL write from the time it was enabled earlier. The patch makes it continue waiting for the end of the write. Part of #5506
-
Vladislav Shpilevoy authored
Raft didn't broadcast its state when the state machine was started. It could lead to the state never being sent until some other node generated a term number bigger than the local one. That happened when a node participated in some elections, accumulated a big term number, then the election was turned off, and a new replica was connected in a 'candidate' state. Then the first node was configured to be a 'voter'. The first node didn't send anything to the replica, because at the moment of its connection the election was off. So the replica started from term 1 and tried to start elections in this term, but was ignored by the first node. It waited for the election timeout, bumped the term to 2, and the process was repeated until the replica reached the first node's term + 1. It could take a very long time. The patch fixes it so that Raft now broadcasts its state when it is enabled, to cover the replicas connected while it was disabled. Closes #5499
-
Vladislav Shpilevoy authored
The typo led to not resetting the election timeout to the default value. It was left 1000, and as a result the next election tests could work extremely long. Part of #5499
-
Vladislav Shpilevoy authored
The Raft worker fiber does all the heavy and yielding jobs. There are two of them: disk write and network broadcast. The disk write yields. The network broadcast is slow, so it happens at most once per event loop iteration. On each iteration the worker checks whether any of these two jobs is active, and if not, it goes to sleep until an explicit wakeup. But there was a bug. Before going to sleep it did a yield plus a check that there is nothing to do. However, during the yield new tasks could appear, the check failed, and that led to a crash. The patch reorganizes this part of the code so that the worker does not yield between checking for new tasks and going to sleep. No test, because it is extremely hard to reproduce, and we don't want to clog this part of the code with error injections.
-
- Nov 03, 2020
-
-
Sergey Ostanevich authored
Static buffer overflow in thread local pool causes random fails on OSX platform. This was caused by an incorrect use of the allocator result. Fixes #5312 Co-authored-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
-
Vladislav Shpilevoy authored
"Too long WAL write" is supposed to warn a user that either the disk write was too long, or the event loop is too slow, maybe due to certain fibers not yielding often enough. It was printed by the code doing the transaction commit. As a result, for synchronous transactions the check also included the replication time, often exceeding the threshold and printing "too long WAL write" even when it had nothing to do with a WAL write or the event loop being too slow. The patch makes the warning checked and printed right after the WAL write, not after commit. Closes #5139
-
Vladislav Shpilevoy authored
txn_complete used to handle all the transaction outcomes: manual rollback; error at WAL write; successful WAL write and commit; successful WAL write and waiting for synchronization with replicas. The code became a mess after synchronous replication was introduced. This patch splits txn_complete's code into multiple pieces. Now WAL write success and failure are handled exclusively by txn_on_journal_write(). It also runs the WAL write triggers; it was very strange to call them from txn_complete(). txn_on_journal_write() also checks whether the transaction is synchronous, and if it is not, completes it with txn_complete_success(), whose code is simple now and only commits the changes. In case of failure the transaction always ends up in txn_complete_fail(). These success and fail functions are now used by the limbo as well. It turned out all the places finishing a transaction always know whether they want to fail it or complete it successfully. This should also remove a few ifs from the hot code of transaction commit. The patch simplifies the code in order to fix the false warning about a too long WAL write for synchronous transactions, which currently is printed at commit, not at WAL write. These two events are far from each other for synchro requests. Part of #5139
-
Vladislav Shpilevoy authored
The function is called only by the journal when the write is finished. Besides, it may not complete the transaction: in case of synchronous replication a finished write is not enough for completion. It means it can't have 'complete' in its name. Also the function is never used outside of txn.c, so it is removed from txn.h and is now static. The patch is a preparation for not spamming "too long WAL write" on synchronous transactions, because it is simply misleading. Part of #5139
-
- Nov 02, 2020
-
-
Cyrill Gorcunov authored
We should never trust the request data; it can carry anything. Thus let's make sure we're not decoding some trash into a string. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
The size should match the real size of the buffer; otherwise there is room for a mistake. At the same time, make sure we're not overrunning the buffer. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
- Nov 01, 2020
-
-
Alexander V. Tikhonov authored
Found that the previously fixed vinyl/gh.test.lua test (commit 94dc5bdd, 'test: gh test hangs after gh-4957-too-many-upserts', which added a fiber.sleep(1) workaround to avoid an error raised by the previously run vinyl/gh-4957-too-many-upserts.test.lua test) can be changed in another way. On the one hand, the new change leaves the found issue untouched, so it can be resolved within the open GitHub issue. On the other hand, it lets the test-run tool avoid the issue using the fragile list feature, keeping the testing stable, since the issue is flaky and can pass on reruns. The fix changes the forever-waiting loop to the test_run:wait_cond() routine, created especially for such situations, which has a timeout and thus avoids hanging the test until the global timeout occurs. It lets the testing continue even after the fail. Needed for #5141
-
Alexander V. Tikhonov authored
Added a restart of the current server to resolve issue #5141, reproduced in the test vinyl/gh-5141-invalid-vylog-file.test.lua. Added a test-run filter on the box.snapshot error message: 'Invalid VYLOG file: Slice [0-9]+ deleted but not registered' to avoid printing changing data in the result file, so that its checksum can be used in the test-run fragile list to rerun the test as a flaky issue. Part of #5141
-
Alexander V. Tikhonov authored
Created a stable reproducer for issue #5141, where box.snapshot() flakily returned the error 'Invalid VYLOG file: Slice <NUM> deleted but not registered' instead of 'ok' in vinyl/ suite tests running after vinyl/gh-4957-too-many-upserts.test.lua. The reproducer is a new standalone test, vinyl/gh-5141-invalid-vylog-file.test.lua, based on vinyl/gh-4957-too-many-upserts.test.lua. Since the issue is not reproduced on FreeBSD 12, the test is skipped there via vinyl/gh-5141-invalid-vylog-file.skipcond. Needed for #5141
-
Alexander V. Tikhonov authored
Added a test-run filter on the box.snapshot error message: 'Invalid VYLOG file: Slice [0-9]+ deleted but not registered' to avoid printing changing data in the result file, so that its checksum can be used in the test-run fragile list to rerun the test as a flaky issue. Also added checksums to the fragile list for the following tests: vinyl/iterator.test.lua (gh-5141), vinyl/snapshot.test.lua (gh-4984). Needed for #5141 Needed for #4984
-
Alexander V. Tikhonov authored
Sometimes it is convenient to use the default compiler on CentOS 7. Added a test job which compiles with the default compilers: CC=/usr/bin/gcc CXX=/usr/bin/g++ Closes #4941
-
Alexander V. Tikhonov authored
Added the ENABLE_WERROR flag to the build options to enable -Werror. Part of #4941
-
Alexander V. Tikhonov authored
Building with gcc-4.8.5 on CentOS 7 revealed an issue: cd /source/build/usr/src/debug/tarantool-2.6.0.144/test/unit && /usr/bin/g++ ... -Wp,-D_FORTIFY_SOURCE=2 ... -O2 ... -O0 -o CMakeFiles/fiber.test.dir/fiber.cc.o -c /source/build/usr/src/debug/tarantool-2.6.0.144/test/unit/fiber.cc In file included from /usr/include/inttypes.h:25:0, from /source/build/usr/src/debug/tarantool-2.6.0.144/src/lib/small/small/region.h:34, from /source/build/usr/src/debug/tarantool-2.6.0.144/src/lib/core/memory.h:33, from /source/build/usr/src/debug/tarantool-2.6.0.144/test/unit/fiber.cc:1: /usr/include/features.h:330:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp] It happened because the _FORTIFY_SOURCE=2 flag needs -O[1|2] optimization, but the last one set in the command was -O0. To fix it, the unneeded '-O0' flag was removed from the test/unit/CMakeLists.txt file. This flag became unneeded after commit aa78a941 ("test/uint: fiber"), when the test was completely rewritten. Needed for #4941
-
- Oct 30, 2020
-
-
Alexander Turenko authored
Store *.reject files in ${BUILD}/test/var/rejects/<...>/ instead of ${SOURCE}/test/<...>/. The past approach leads to problems with testing when an out-of-source build is used and the sources are on a read-only filesystem. The main problem is when a test fails but is marked as fragile and should be run again. The test failure assumes storing the .reject file, and the testing fails on the attempt to write to the read-only filesystem, so the re-run is not performed. Side effect for the in-source build: since the test/var/ directory is gitignored, the *.reject files will not be shown in `git status` output anymore. https://github.com/tarantool/test-run/pull/209 Follows up #4874
-
Mary Feofanova authored
Closes #5071 @TarantoolBot document Title: memtx: varbinary supported in bitset indexes Now it is possible to create bitset indexes on the fields of varbinary type, e.g.: s:create_index('b', {type = 'bitset', parts = {1, "varbinary"}})
-
Kirill Yukhin authored
* test: fix warnings spotted by luacheck
-
Mary Feofanova authored
Previously accepted formats of index parts: parts = {field1, type1, field2, type2}, or parts = {{field1, type1, ...}, {field2, type2, ...}} Now it is allowed to write without extra braces if there is only one part: parts = {field1, type1, ...} Closes #2866
-
Sergey Bronnikov authored
To run Jepsen tests in different configurations we need to parametrize the run script with options, so lein options and the number of nodes are passed with environment variables. By default the script runs testing with Tarantool built from the latest commit. Added these configurations: - single instance - single instance with enabled TXM - cluster with enabled Raft - cluster with enabled Raft and TXM Closes #5437
-
- Oct 29, 2020
-
-
Olga Arkhangelskaia authored
When the connection was not established yet, netbox reported an empty error while executing a remote request. Closes #4787
-
- Oct 23, 2020
-
-
Kirill Yukhin authored
-
- Oct 22, 2020
-
-
Aleksandr Lyapunov authored
Tx stories must be linked into a correct doubly-linked list. Preserve it. Part of #5423
-
Aleksandr Lyapunov authored
There was a mess in tuple referencing in the TX history. Now it is remade under the following assumptions: * a clean tuple belongs to the space, and the space implicitly holds a reference to the tuple; * a dirty tuple belongs to the TX manager, and a reference is held in the corresponding story. Closes #5423
-
Serge Petrenko authored
When an instance is configured as candidate, it has a leader death timer ticking constantly to schedule an election as soon as leader disappears. When the instance receives the leader's heartbeat, it resets the timer to its initial value. When being a voter, the instance ignores heartbeats, since it has nothing to wait for. So its timer must be stopped. Otherwise it'll try to schedule a new election and fail. Stop the timer on transition from candidate to voter.
-
Vladislav Shpilevoy authored
When a node becomes a leader, it restarts relay recovery cursors to re-send all the data since the last acked row. But during recovery restart the relay lost the trigger, which used to update GC state in TX thread. The patch preserves the trigger. Follow up for #5433
-
Vladislav Shpilevoy authored
When a Raft node is elected as a leader, it should resend all its data to the followers from the last acked vclock, because while the node was not a leader, the other instances ignored all the changes from it. The resending is done via a restart of the recovery cursor in the relay thread. When the cursor was restarted, it used the last acked vclock to find the needed xlog file. But it didn't set the local LSN component, which was 0 (replicas don't send it). When in reality the component was not zero, the recovery cursor tried to find the oldest xlog file having the first local row, and it couldn't: the first created local row may have been gone for a long time. The patch makes the restart keep the local LSN component unchanged, as it was used by the previous recovery cursor before the restart. Closes #5433
-
Alexander V. Tikhonov authored
box/net.box_incorrect_iterator_gh-841.test.lua gh-5434
replication/election_basic.test.lua gh-5368
replication/election_qsync.test.lua gh-5430
replication/election_qsync_stress.test.lua gh-5395
replication/gh-4402-info-errno.test.lua gh-5366
replication/gh-5426-election-on-off.test.lua gh-5433
wal_off/snapshot_stress.test.lua gh-5431
-