- Dec 04, 2020
Alexander Turenko authored
* Added --snapshot and --disable-schema-upgrade arguments (#240).
* Fixed reporting of an error for conflicting arguments (#241).
The `--snapshot path/to/snapshot` argument copies a given snapshot to a snapshot directory before starting a tarantool instance. This allows verifying various functionality when tarantool is upgraded from a snapshot left by an older tarantool version (as opposed to testing it on a freshly bootstrapped instance). There are limitations: when a test spawns a replica set, the option does not work correctly, because the same instance UUIDs (and IDs) cannot be used by different instances in a replica set. There may be other pitfalls as well.
The `--disable-schema-upgrade` argument instructs tarantool to skip execution of the schema upgrade script (using ERRINJ_AUTO_UPGRADE). This way we can verify that, when an instance works on an old schema version, a functionality works or at least gives a correct error message.
This commit only brings the new options into test-run. It does NOT add any new testing targets / rules.
Part of #4801
-
- Dec 03, 2020
Serge Petrenko authored
Follow-up #5440
-
Alexander V. Tikhonov authored
Added replication_connect_timeout to the replication/*quorum.lua scripts to halve the run time of the replication/quorum.test.lua test, which was 150 seconds before. Before the patch, the test run time was close to the 'test-timeout' limit of 110 seconds and even to the 'no-output-timeout' limit of 120 seconds, which caused the test to fail. The test also waits for the 3rd replica until it is connected, and this timeout helps to avoid long waits.
-
Kirill Yukhin authored
The index variable ran from 1 .. 5 and was used to index an array of size 4. Use iv - 1 instead. Discovered by Coverity.
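A minimal C sketch of this off-by-one pattern, assuming the 1 .. 5 range is half-open (1 <= iv < 5) and the array has 4 slots. The names `sum_slots` and `NSLOTS` are hypothetical and do not come from the patched code.

```c
#include <assert.h>

enum { NSLOTS = 4 };

/*
 * The loop variable is 1-based (1, 2, 3, 4), so the array must be
 * indexed with iv - 1. Using slots[iv] would read slots[4], one
 * element past the end, which is what Coverity flags.
 */
static int
sum_slots(const int slots[NSLOTS])
{
	int sum = 0;
	for (int iv = 1; iv < NSLOTS + 1; iv++)
		sum += slots[iv - 1];
	return sum;
}
```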
-
Sergey Voinov authored
Check the schema version (stored in box.space._schema) on start and print a warning if it doesn't match the last available schema version. This is needed because some users forget to call box.schema.upgrade() after a Tarantool update and get stuck with an old schema version until they encounter some hard-to-debug problems. Closes #4574
Co-developed-by: Roman Khabibov <roman.habibov@tarantool.org>
-
- Dec 02, 2020
Sergey Ostanevich authored
Before this patch, fiber.cond():wait() just returned for a cancelled fiber. In contrast, fiber.channel():get() throws a "fiber is canceled" error. This patch unifies the behaviour of channels and condvars. It also fixes a related net.box module problem #4834, since fiber.cond now checks for fiber cancellation. Closes #4834 Closes #5013
Co-authored-by: Oleg Babin <olegrok@tarantool.org>
@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled
Currently fiber.cond():wait() throws an error if the waiting fiber is cancelled.
-
Sergey Ostanevich authored
fiber_cond_wait() sets an error if the fiber is cancelled. As a result, the current diag of the fiber can be overwritten during wal_clear_watcher(). To prevent such an overwrite, copying the diag from the relay into the current fiber is moved to the exit of relay_subscribe_f(). Part of #5013
-
- Dec 01, 2020
Serge Petrenko authored
Users usually call box.ctl.wait_rw() to determine the moment when the instance becomes writeable. Since synchronous replication was introduced, this function has become pointless, because even when an instance is writeable, it may fail to write something because its limbo is not empty. To fix the problem, introduce a new helper, txn_limbo_is_ro(), and start using it in box_update_ro_summary(). Call box_update_ro_summary() every time the limbo gets emptied out or changes an owner. Closes #5440
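A minimal C sketch of how a limbo check can feed a read-only summary. It assumes the limbo blocks writes while it is non-empty and owned by another instance; the struct, its fields, and both functions are hypothetical simplifications, not Tarantool's actual box_update_ro_summary() or txn_limbo_is_ro().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the read-only summary. */
struct ro_inputs {
	bool cfg_read_only;  /* box.cfg{read_only = true} */
	bool is_orphan;      /* not yet synced with the replica set */
	bool raft_is_ro;     /* election follower */
	bool limbo_is_ro;    /* limbo blocks local writes */
};

/* The instance is writable only if no input forbids writes. */
static bool
is_ro_summary(const struct ro_inputs *in)
{
	return in->cfg_read_only || in->is_orphan || in->raft_is_ro ||
	       in->limbo_is_ro;
}

/* Assumed rule: pending synchronous transactions owned by another
 * instance make this one read-only until the queue is emptied. */
static bool
limbo_is_ro(int owner_id, int self_id, int queue_len)
{
	return queue_len > 0 && owner_id != self_id;
}
```

Under this sketch the summary has to be recomputed whenever the queue empties or the owner changes, which is why the commit calls box_update_ro_summary() at exactly those points.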
-
Cyrill Gorcunov authored
Since commit ae7e2103 we use an internal serializer, so we no longer need the serpent code. The patch removes the references from the source code and the .gitmodules file. Still, depending on the local git version, one might need to run

    git submodule deinit -f third_party/serpent

manually to clean up the working tree. Closes #5517
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Nov 26, 2020
Cyrill Gorcunov authored
When we load a fresh module, we put it into the module cache first, which allows us not to load the same module twice (there could be several functions in the same module). But if the module is loaded for the first time and symbol resolution fails, we keep the module loaded even though there may be no further use for it. Thus, clean it up if needed. As far as I know, there is no portable way to verify this in a test, only manually via "lsof -p `pidof tarantool`". Fixes #5475
Reported-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
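A minimal C model of the cleanup rule, with dlopen()/dlclose() replaced by boolean flags so it runs without a real shared object. The `struct module` layout and `module_func_load()` are hypothetical, not the patched Tarantool code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cached module: several functions may share one load. */
struct module {
	bool loaded;  /* stands in for a live dlopen() handle */
	bool cached;  /* present in the module cache */
	int refs;     /* functions resolved from this module */
};

/*
 * Resolve one function; `symbol_found` stands in for dlsym() success.
 * If this call loaded the module and resolution fails, roll back both
 * the cache insertion and the load. A module that was already cached
 * stays loaded, since other functions may still use it.
 */
static bool
module_func_load(struct module *m, bool symbol_found)
{
	bool fresh = !m->loaded;
	if (fresh) {
		m->loaded = true;  /* dlopen() */
		m->cached = true;  /* insert into the module cache */
	}
	if (!symbol_found) {
		if (fresh) {
			m->cached = false;  /* remove from the cache */
			m->loaded = false;  /* dlclose() */
		}
		return false;
	}
	m->refs++;
	return true;
}
```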
-
Alexander Turenko authored
Improve logging and error reporting of the testing system. The most visible change is the new --debug option, which shows debug logs on the terminal. See details in [1].
[1]: https://github.com/tarantool/test-run/pull/237
-
Roman Khabibov authored
Print the true name of the _session_settings space in error messages. Closes #4732
-
Roman Khabibov authored
The context is a string containing a few characters before and after the wrong token, the wrong token itself, and a symbolic arrow pointing to this token. Closes #4339
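A minimal C sketch of building such a context: up to `radius` characters on each side of the bad token, followed by a second line with a caret under the token. The two-line layout and the `token_context()` helper are illustrative assumptions, not the exact format of Tarantool's SQL error messages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<ctx>\n<padding>^" into `out`, where <ctx> is the token plus
 * up to `radius` characters around it, and the caret sits under the
 * token's first character.
 */
static void
token_context(const char *sql, size_t tok_off, size_t tok_len,
	      size_t radius, char *out, size_t out_size)
{
	size_t len = strlen(sql);
	size_t begin = tok_off > radius ? tok_off - radius : 0;
	size_t end = tok_off + tok_len + radius;
	if (end > len)
		end = len;
	int n = snprintf(out, out_size, "%.*s\n", (int)(end - begin),
			 sql + begin);
	if (n < 0 || (size_t)n >= out_size)
		return; /* truncated: keep what snprintf wrote */
	size_t pos = (size_t)n;
	/* Pad up to the token column, then draw the arrow. */
	for (size_t i = begin; i < tok_off && pos + 2 < out_size; i++)
		out[pos++] = ' ';
	if (pos + 1 < out_size)
		out[pos++] = '^';
	out[pos] = '\0';
}
```

For example, with "SELECT 1 FRM t", the token "FRM" at offset 9, and a radius of 3, the result is " 1 FRM t" followed by a line with three spaces and a caret.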
-
Roman Khabibov authored
Print tokens themselves instead of token names "T_*" in the error messages. Part of #4339
-
Alexander V. Tikhonov authored
Implemented the ability to remove opensuse-leap OS packages.
-
Alexander V. Tikhonov authored
Updated the help message for the remove option.
-
Alexander V. Tikhonov authored
Added a message showing which file is going to be removed, to make sure that the right files were found for removal.
-
Alexander V. Tikhonov authored
Found that the Sources file gets destroyed when a module is uploaded without sources. The same could happen to the Packages file when a module is uploaded without binaries. To fix it, the file is now additionally downloaded from S3 if it was not updated by the uploaded modules and the routine was not used.
-
Alexander V. Tikhonov authored
Added checksums of result files for the following flaky tests:
  app-tap/logger.test.lua gh-5346
  app-tap/tarantoolctl.test.lua gh-5059
  box/access.test.lua gh-5373 gh-5411
  box/alter.test.lua gh-5557
  box/before_replace.test.lua gh-5546
  box/cfg.test.lua gh-5530
  box/ddl_call_twice_gh-2336.test.lua gh-5560
  box/ddl_collation_deleted_gh-3290.test.lua gh-5555
  box/gh-4703-on_shutdown-bug.test.lua gh-5560
  box/hash_gh-1467.test.lua gh-5476 gh-5504
  box/iterator.test.lua gh-5523
  box/leak.test.lua gh-5548
  box/net.box_connect_timeout_gh-2054.test.lua gh-5548
  box/net.box_count_inconsistent_gh-3262.test.lua gh-5532
  box/net.box_field_names_gh-2978.test.lua gh-5554
  box/net.box_get_connection_object.test.lua gh-5549
  box/net.box_gibberish_gh-3900.test.lua gh-5548
  box/net.box_incorrect_iterator_gh-841.test.lua gh-5434
  box/net.box_index_unique_flag_gh-4091.test.lua gh-5551
  box/net.box_iproto_hangs_gh-3464.test.lua gh-5548
  box/net.box_log_corrupted_rows_gh-4040.test.lua gh-5548
  box/net.box_reload_schema_gh-636.test.lua gh-5550
  box/net.box_schema_change_gh-2666.test.lua gh-5547
  box/on_shutdown.test.lua gh-5562
  box/schema_reload.test.lua gh-5552
  box/select.test.lua gh-5548
  box/tree_pk_multipart.test.lua gh-5528 gh-5556
  box-tap/gh-4231-box-execute-locking.test.lua gh-5558
  box-tap/session.test.lua gh-5346
  box-tap/session.storage.test.lua gh-5346
  engine/conflict.test.lua gh-5516
  engine/tuple.test.lua gh-5480
  replication/bootstrap_leader.test.lua gh-5478
  replication/box_set_replication_stress.test.lua gh-4992
  replication/gh-3160-misc-heartbeats-on-master-changes.test.> gh-4940
  replication/ddl.test.lua gh-5337
  replication/election_basic.test.lua gh-5368
  replication/election_qsync.test.lua gh-5430
  replication/election_qsync_stress.test.lua gh-5395
  replication/gh-5287-boot-anon.test.lua gh-5412
  replication/gh-5426-election-on-off.test.lua gh-5506
  replication/prune.test.lua gh-5361
  replication/rebootstrap.test.lua gh-5524
  replication/show_error_on_disconnect.test.lua gh-5371
  replication/sync.test.lua gh-3835
  replication/transaction.test.lua gh-5563
  sql/prepared.test.lua gh-5359
  sql/checks.test.lua gh-5477
  sql/gh2808-inline-unique-persistency-check.test.lua gh-5479
  swim/swim.test.lua gh-5403 gh-5561
  vinyl/deferred_delete.test.lua gh-5089
  vinyl/errinj_tx.test.lua gh-5539
  vinyl/gh-4810-dump-during-index-build.test.lua gh-5031
  vinyl/gh-4957-too-many-upserts.test.lua gh-5378
  vinyl/gh-5141-invalid-vylog-file.test.lua gh-5141
  vinyl/gc.test.lua gh-5474
  vinyl/iterator.test.lua gh-5141
  vinyl/replica_rejoin.test.lua gh-4985
  vinyl/snapshot.test.lua gh-4984
  vinyl/tx_gap_lock.test.lua gh-4309
  xlog/panic_on_broken_lsn.test.lua gh-4991
-
- Nov 23, 2020
Cyrill Gorcunov authored
It is never used and was placed here accidentally.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
The instance_id name is too general: we use it for node identification, while the limbo simply "belongs" to whoever tracks the current transaction queue. Let's rename it to owner_id to distinguish it from the global instance_id and for better grepability.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
To make sure we don't access the lsn array out of bounds.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
We rely heavily on VCLOCK_MAX being 32 bits wide, and using "VCLOCK_MAX - 1" as a mask for safe access to the replica id is simply misleading. Use an assert here instead, because we might change the number of supported replicas one day, and it does not have to be a power of 2. Also drop this completely useless comment.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
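A minimal C sketch contrasting the two access styles. VCLOCK_MAX = 32 matches the replica limit discussed above; `struct vclock_sketch` and both getters are hypothetical simplifications of the real vclock code.

```c
#include <assert.h>
#include <stdint.h>

enum { VCLOCK_MAX = 32 };

struct vclock_sketch {
	int64_t lsn[VCLOCK_MAX];
};

/* Old style: the mask never goes out of bounds, but it silently maps
 * an invalid id onto another replica's slot (33 -> 1), and it is only
 * a valid mask while VCLOCK_MAX is a power of two. */
static int64_t
lsn_get_masked(const struct vclock_sketch *v, uint32_t replica_id)
{
	return v->lsn[replica_id & (VCLOCK_MAX - 1)];
}

/* New style: an out-of-range id is a bug, so debug builds abort. */
static int64_t
lsn_get_checked(const struct vclock_sketch *v, uint32_t replica_id)
{
	assert(replica_id < VCLOCK_MAX);
	return v->lsn[replica_id];
}
```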
-
Vladislav Shpilevoy authored
The Raft algorithm code no longer depends on box and is moved to src/lib/raft. This is done to be able to unit test raft similarly to swim: with a virtual event loop, network, time, and disk, using any number of instances. That will allow covering all the crazy and rare cases possible in raft, without the stability problems and clumsiness of functional tests. Part of #5303
-
Vladislav Shpilevoy authored
The last piece of src/box used in raft code was error.h. It was added to be able to raise ClientErrors. To get rid of such a dependency, libraries usually introduce their own error type available from src/lib/core, such as CollationError, SwimError, CryptoError. This patch adds RaftError and removes the last box dependency from raft code. Part of #5303
-
Vladislav Shpilevoy authored
The RO summary update is supposed to make the instance read-only when it becomes a follower, and read-write when it becomes a leader. But it makes raft depend on box and prevents moving raft to a separate library. The patch moves the RO update to box-raft. This became possible after some preparatory work was done to make the raft update triggers non-yielding and invoked right after a state change (without a yield between the change and the triggers). Part of #5303
-
Vladislav Shpilevoy authored
Raft used to call the on_update trigger from the worker fiber, because the trigger could yield. This is not the case anymore: the only yielding operation was box_clear_synchro_queue(), which is no longer called from the trigger. That makes it possible to call the trigger from within the state machine, which removes the yield between the raft state change and the trigger invocation. This, in turn, allows moving all box-related urgent updates, such as box_update_ro_summary(), to the trigger. Part of #5303
-
Vladislav Shpilevoy authored
The synchro queue was cleared from the raft on_update trigger installed by box. This was fine as long as the trigger was called from the worker fiber, because the synchro queue clearance yields, and yielding there does not block the state machine. But soon the trigger is going to be called from the raft state machine directly, because it will need to call box_update_ro_summary() right after the raft state is updated, without a yield to switch to the worker fiber. This will be done as part of getting rid of box in the raft library. It means the trigger can't call box_clear_synchro_queue(). But it can schedule its execution for later, since the worker fiber now belongs to box. The patch does that. Part of #5303
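A minimal C sketch of the "schedule from the trigger, run in the worker" split. Fibers are modelled as plain function calls: `on_update()` must not yield, so it only sets a flag, and `worker_step()` stands in for one iteration of the worker fiber, which may yield. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical box-side raft state. */
struct box_raft_sketch {
	bool is_leader;
	bool need_clear_queue;  /* deferred, yielding work */
	int queue_clears;       /* how many clearances actually ran */
};

/* on_update trigger: runs inside the state machine and must not
 * yield, so it records the request (real code would also wake the
 * worker fiber here). */
static void
on_update(struct box_raft_sketch *r, bool became_leader)
{
	r->is_leader = became_leader;
	if (became_leader)
		r->need_clear_queue = true;
}

/* One worker fiber iteration: free to yield, so it performs the
 * scheduled work in place of box_clear_synchro_queue(). */
static void
worker_step(struct box_raft_sketch *r)
{
	if (r->need_clear_queue) {
		r->need_clear_queue = false;
		r->queue_clears++;
	}
}
```

Using a flag rather than a queue of jobs means repeated state changes before the worker runs collapse into a single clearance.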
-
Vladislav Shpilevoy authored
The worker fiber is used by the raft library to perform yielding tasks like WAL writes, and simply long tasks like network broadcast. That allows not blocking the raft state machine, and collecting multiple updates during an event loop iteration to flush them all at once. While the worker fiber was inside the raft library, it wasn't possible to use it for anything else. And that is exactly what is going to be needed.
The reason chain is quite long. It all starts with the fact that eliminating all box appearances from the raft library also includes relocating box_update_ro_summary(). The only place it can be moved to is the box_raft_on_update trigger. The trigger is currently called from the raft worker fiber. This means there is a yield between the raft state update and the trigger invocation. If box_update_ro_summary() were blindly moved to the trigger, users could sometimes observe oddities like the instance role being 'follower' while the node is still writable (if it was a leader before), because box_raft_on_update wasn't invoked yet and didn't update the RO summary.
Suppose the on_update triggers are invoked by raft not in the worker fiber, but right from the state machine. Then box_update_ro_summary() would always follow a state change without a yield. However, that creates another problem: the trigger also calls box_clear_synchro_queue(), which yields, while on_update triggers must not yield so as not to block the state machine. This can be easily solved if box_clear_synchro_queue() can be scheduled from the on_update trigger to be executed later. After this patch it becomes possible, because the worker fiber can now be used not only to handle raft library async work, but also for box-raft async work, like the synchro queue clearance. Part of #5303
-
Vladislav Shpilevoy authored
This is a general practice throughout the code. If a fiber is not cancellable, it always means a system fiber which can't be woken up or cancelled until its work is done. It is used at least in gc (for xlogs), recovery, vinyl, and WAL. Previously raft used the raft.is_write_in_progress flag. But that won't work soon, because the worker fiber will move to box/raft.c, where it would be incorrect to rely on deeply internal parts of struct raft, such as is_write_in_progress. Hence, this patch makes raft use a more traditional way of spurious wakeup avoidance. Part of #5303
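A minimal C model of wakeup avoidance based on a cancellability flag. A real fiber and fiber_wakeup() are replaced by a counter, and `struct fiber_sketch`, `worker_wakeup()`, and `worker_write()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fiber. */
struct fiber_sketch {
	bool is_cancellable;  /* false while doing non-interruptible work */
	int wakeups;          /* how many wakeups were actually delivered */
};

/* Wake the worker only while it waits interruptibly. During a
 * non-cancellable section (e.g. a disk write) a wakeup would be
 * spurious, so it is skipped; the worker re-checks its state when
 * the section ends. */
static void
worker_wakeup(struct fiber_sketch *f)
{
	if (f->is_cancellable)
		f->wakeups++;  /* fiber_wakeup(f) in real code */
}

/* Bracket a blocking write with the cancellability flag. `during`
 * models events that arrive while the write is in progress. */
static void
worker_write(struct fiber_sketch *f, void (*during)(struct fiber_sketch *))
{
	f->is_cancellable = false;
	during(f);
	f->is_cancellable = true;
}
```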
-
Vladislav Shpilevoy authored
Raft used to depend on xrow, because it used raft_request as a communication and persistence unit. Xrow is a part of the src/box library set, so it blocked raft extraction into src/lib/raft. This patch makes raft not depend on xrow. For that, raft introduces a new communication and persistence unit: struct raft_msg. Interestingly, throughout its source code raft already uses the term 'message' to describe requests, so this patch also restores consistency. The raft_request name was used for consistency with the other *_request structs in xrow.h. Now raft does not depend on them and can use its own name. Struct raft_msg repeats raft_request literally, but this actually makes sense: once raft is extracted into a new library, it may start evolving independently. Its raft_msg may get new members, or their behaviour may change depending on how the algorithm evolves. But inside box it will be possible to tweak and extend raft_msg whenever necessary, via struct raft_request, without changing the basic library. For instance, in the future we may want to make nodes forward messages to each other during voting to speed the process up, and for that we may want to add an explicit 'source' field to raft_request, while it won't be necessary at the level of raft_msg. There is a new compatibility layer in src/box/raft.h which hides raft_msg details from other box code and does the msg <-> request conversions. Part of #5303
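A minimal C sketch of the msg <-> request compatibility layer. The fields (term, vote, state, vclock) are an assumption about what both structs carry, and the conversion function names are hypothetical; only the struct names come from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct vclock; /* opaque in this sketch */

/* Library-side unit, owned by the raft library. */
struct raft_msg {
	uint64_t term;
	uint32_t vote;
	uint32_t state;
	const struct vclock *vclock;
};

/* Box-side unit: mirrors raft_msg today, but box may extend it on its
 * own (e.g. with a 'source' field) without touching the library. */
struct raft_request {
	uint64_t term;
	uint32_t vote;
	uint32_t state;
	const struct vclock *vclock;
};

static void
request_to_msg(const struct raft_request *req, struct raft_msg *msg)
{
	msg->term = req->term;
	msg->vote = req->vote;
	msg->state = req->state;
	msg->vclock = req->vclock;
}

static void
msg_to_request(const struct raft_msg *msg, struct raft_request *req)
{
	/* Zero first so that any future box-only fields start empty. */
	memset(req, 0, sizeof(*req));
	req->term = msg->term;
	req->vote = msg->vote;
	req->state = msg->state;
	req->vclock = msg->vclock;
}
```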
-
Vladislav Shpilevoy authored
Raft is being moved to a separate library in src/lib. This means it can't depend on anything from box/. The patch makes raft stop using the replicaset and journal objects. They were used to broadcast messages to all the other nodes and to persist updates. Now raft does the same through a vtab, which is configured by box. Broadcast still sends messages via relays, and disk writes still use the journal. But raft does not depend on any specific journal or network API. Part of #5303
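A minimal C sketch of the vtab approach: the library only calls function pointers, and the embedder (box, or a unit test) supplies them. The field layout of `struct raft_vtab` and `raft_bump_term()` are assumptions made for illustration; the mocks show how a test could count writes and broadcasts without any journal or network.

```c
#include <assert.h>
#include <stdint.h>

struct raft;

struct raft_msg {
	uint64_t term;
	uint32_t vote;
};

/* Callbacks the library uses instead of calling box directly. */
struct raft_vtab {
	void (*broadcast)(struct raft *raft, const struct raft_msg *msg);
	void (*write)(struct raft *raft, const struct raft_msg *msg);
};

struct raft {
	const struct raft_vtab *vtab;
	uint64_t term;
};

/* Library code: persist a new term first, then tell everybody. */
static void
raft_bump_term(struct raft *raft)
{
	struct raft_msg msg = {.term = raft->term + 1, .vote = 0};
	raft->vtab->write(raft, &msg);     /* box: journal write */
	raft->term = msg.term;
	raft->vtab->broadcast(raft, &msg); /* box: send via relays */
}

/* Test doubles that stand in for disk and network. */
static int mock_writes;
static int mock_broadcasts;

static void
mock_write(struct raft *raft, const struct raft_msg *msg)
{
	(void)raft;
	(void)msg;
	mock_writes++;
}

static void
mock_broadcast(struct raft *raft, const struct raft_msg *msg)
{
	(void)raft;
	(void)msg;
	mock_broadcasts++;
}

static const struct raft_vtab mock_vtab = {
	.broadcast = mock_broadcast,
	.write = mock_write,
};
```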
-
Vladislav Shpilevoy authored
Raft is being moved to a separate library in src/lib. This means it can't depend on anything from box/. The patch makes raft stop using replicaset.vclock. Instead, it has a new option 'vclock'. It is stored inside struct raft by pointer and should be configured using raft_cfg_vclock(). Box configures it to point at replicaset.vclock like before, but now the raftlib code does not depend on it explicitly. The vclock is stored in raft by pointer instead of by value so as not to update it on each transaction. That would be too high a price to pay for raft independence from box. Part of #5303
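A minimal C sketch of why a pointer avoids per-transaction updates: raft keeps a `const` pointer to a vclock owned by box, so box's own writes are visible immediately and raft cannot modify it. `struct vclock_lite` (4 slots instead of the real size) and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified vclock owned by box. */
struct vclock_lite {
	int64_t lsn[4];
};

struct raft_sketch {
	/* Configured once; const because raft never changes it. */
	const struct vclock_lite *vclock;
};

static void
raft_sketch_cfg_vclock(struct raft_sketch *raft,
		       const struct vclock_lite *vclock)
{
	raft->vclock = vclock;
}

/* Reads always see the owner's latest state, with no copying. */
static int64_t
raft_sketch_vclock_sum(const struct raft_sketch *raft)
{
	int64_t sum = 0;
	for (int i = 0; i < 4; i++)
		sum += raft->vclock->lsn[i];
	return sum;
}
```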
-
Vladislav Shpilevoy authored
Raft is never supposed to change a vclock: neither the stored one nor the received ones. The patch makes the compiler check this. The patch is mostly motivated by the next patch, which makes raft use an externally configured vclock that can't be changed. Since raft uses raft_request to carry the vclock in a few places, the request's vclock must also become const. Part of #5303
-
Vladislav Shpilevoy authored
Raft is being moved to a separate library in src/lib. This means it can't depend on anything from box/. The patch makes raft stop using instance_id. Instead, it has a new option 'instance_id'. It is stored inside struct raft as 'self' and should be configured using raft_cfg_instance_id(). The configuration is done when bootstrap ends and the instance_id is either recovered successfully, or the instance is anonymous.
While working on this, I also considered introducing a new function raft_boot() instead of raft_cfg_instance_id(), which I would also use to configure vclock later. Raft_boot() would be meant to be called only once with the non-dynamic parameters instance_id and vclock. But then I decided to keep adding new raft_cfg_*() functions, because:
- It is more consistent with the existing options;
- It does not require thinking about too many different functions like raft_create(), raft_boot(), raft_cfg_*() and in which order to call them.
I also considered introducing a single raft_cfg(), like swim_cfg() in swim, to reduce the number of raft_cfg_*() functions, but decided it would be even worse with so many options. Part of #5303
-
Vladislav Shpilevoy authored
Raft is being moved to a separate library in src/lib. This means it can't depend on anything from box/, including global replication parameters such as replication_synchro_quorum. The patch makes raft stop using replication_synchro_quorum. Instead, it has a new option 'election_quorum'. Note that this is just the raft API. The box API still uses replication_synchro_quorum, but it is used to calculate the final quorum in src/box/raft, not in src/box/raftlib, and to pass it to the base raft implementation. Part of #5303
-
Vladislav Shpilevoy authored
Raft is being moved to a separate library in src/lib. This means it can't depend on anything from box/, including global replication parameters such as replication_timeout, and functions like replication_disconnect_timeout(). The patch makes raft stop using replication_disconnect_timeout(). Instead, it stores the death timeout in struct raft. Box configures it at the same time as replication_timeout. Part of #5303
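A minimal C sketch of box pushing a derived death timeout into raft whenever replication_timeout changes. It assumes the disconnect timeout is 4 * replication_timeout, as Tarantool documents for replication heartbeats; the factor constant, struct, and function names are hypothetical.

```c
#include <assert.h>

/* Assumption: a peer is considered dead after 4 * replication_timeout
 * without messages. */
enum { DISCONNECT_FACTOR = 4 };

struct raft_sketch {
	double death_timeout; /* owned by raft, set by box */
};

/* Box-side setter: raft only sees the resulting number and never
 * reads replication_timeout itself. */
static void
box_set_replication_timeout_sketch(struct raft_sketch *raft,
				   double replication_timeout)
{
	raft->death_timeout = replication_timeout * DISCONNECT_FACTOR;
}
```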
-
Vladislav Shpilevoy authored
The commit moves raft functions and objects specific to box from src/box/box and src/box/raftlib to src/box/raft. The goal is to gradually eliminate all box dependencies from src/box/raftlib and move it to src/lib/raft. This makes the build work again after the previous commit broke it. Part of #5303
-
Vladislav Shpilevoy authored
The commit renames raft.h and raft.c to raftlib.h and raftlib.c. This is done to prepare for splitting raft into src/box/ and src/lib/raft. The commit is not atomic: the build won't work here. This is because if raft is renamed to raftlib and new raft.c and raft.h are added in the same commit, git thinks the original file was changed, which ruins all the git history. Splitting the move of raft to raftlib and the introduction of box/raft into 2 commits preserves the git history. Part of #5303
-
- Nov 22, 2020
Sergey Ostanevich authored
A number of places in sql.c call box_process_rw() directly, which does not check the read-only setting. Fixed by using the intended interface, box_process1(). Closes #5231
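A minimal C model of the difference: the raw write path applies a request unconditionally, while the public entry point first refuses writes on a read-only instance. Both functions and the globals are hypothetical stand-ins for box_process_rw() and box_process1(), not their real signatures.

```c
#include <assert.h>
#include <stdbool.h>

static bool instance_is_ro;
static int rows_written;

/* Raw path: performs the write without any mode checks. */
static int
process_rw_sketch(void)
{
	rows_written++;
	return 0;
}

/* Intended interface: check writability first (real code would set
 * an ER_READONLY error), then delegate to the raw path. */
static int
process1_sketch(void)
{
	if (instance_is_ro)
		return -1;
	return process_rw_sketch();
}
```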
-