- Apr 10, 2019
-
-
Vladimir Davydov authored
The patch fixes the following test failures:

| [022] --- vinyl/errinj_stat.result Tue Mar 19 17:52:48 2019
| [022] +++ vinyl/errinj_stat.reject Wed Mar 20 08:08:41 2019
| [022] @@ -229,7 +229,7 @@
| [022] ...
| [022] stat.tasks_inprogress == 0
| [022] ---
| [022] -- true
| [022] +- false
| [022] ...
| [022] stat.tasks_completed == 1
| [022] ---

| [013] --- vinyl/errinj_stat.result Tue Mar 19 17:52:48 2019
| [013] +++ vinyl/errinj_stat.reject Wed Mar 20 08:11:15 2019
| [013] @@ -168,7 +168,7 @@
| [013] ...
| [013] stat.tasks_inprogress > 0
| [013] ---
| [013] -- true
| [013] +- false
| [013] ...
| [013] stat.tasks_completed == 0
| [013] ---
| [013] @@ -183,7 +183,7 @@
| [013] ...
| [013] box.stat.vinyl().scheduler.tasks_inprogress > 0
| [013] ---
| [013] -- true
| [013] +- false
| [013] ...
| [013] errinj.set('ERRINJ_VY_RUN_WRITE_DELAY', false)
| [013] ---

The problem occurred because the test didn't make sure that an asynchronous dump/compaction task has actually started/completed. In fact, even box.snapshot() doesn't guarantee that a dump task is complete. This patch adds wait_cond's to guarantee the test never fails like that anymore.

Closes #4059
Closes #4060
-
Alexander Tikhonov authored
When a system is under heavy load (say, when tests are run in parallel), disk writes may stall for some time. This can make a check performed by a test fail, so now we retry such checks for up to 60 seconds until the condition is met. This change targets the replication test suite.
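For illustration, the retry-until-timeout idea can be sketched in C as below. This is only a sketch under assumptions, not the test-run helper itself: `retry_clock`, `retry_until`, the fake clock and `countdown_done` are hypothetical names. The clock is injected so the loop can be exercised without real sleeping.

```c
#include <stdbool.h>

/* Hypothetical clock abstraction: lets the loop run on real or fake time. */
struct retry_clock {
	double (*now)(void *ctx);
	void (*sleep)(void *ctx, double seconds);
	void *ctx;
};

typedef bool (*retry_cond_f)(void *arg);

/*
 * Re-check cond every `delay` seconds until it holds or `timeout`
 * seconds have passed. Returns true if the condition was met.
 */
bool
retry_until(const struct retry_clock *clock, retry_cond_f cond, void *arg,
	    double timeout, double delay)
{
	double deadline = clock->now(clock->ctx) + timeout;
	for (;;) {
		if (cond(arg))
			return true;
		if (clock->now(clock->ctx) >= deadline)
			return false;
		clock->sleep(clock->ctx, delay);
	}
}

/* Fake clock for demonstration: ctx points to the current time. */
double
fake_now(void *ctx)
{
	return *(double *)ctx;
}

void
fake_sleep(void *ctx, double seconds)
{
	*(double *)ctx += seconds;
}

/* Example condition: becomes true once the counter reaches zero. */
bool
countdown_done(void *arg)
{
	int *left = arg;
	if (*left > 0)
		(*left)--;
	return *left == 0;
}
```

A check that used to be a single assertion would then be wrapped as `retry_until(&clock, cond, arg, 60.0, 0.1)`, failing only when the condition is still false after the whole timeout.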
-
Alexander Turenko authored
Needed for running the test suite in parallel. Use the default replication_connect_timeout (30 seconds) instead of 0.5 seconds. This doesn't change the meaning of the test cases. Increase replication_timeout from 0.01 to 0.1. These changes allow running the test 100 times in 50 parallel jobs successfully.
-
Alexander Turenko authored
All changes are needed to eliminate sporadic failures when testing is run with, say, 30 parallel jobs. First, replication_connect_timeout is increased to 30 seconds. This parameter doesn't change the meaning of the test cases. Second, replication_timeout is increased from 0.01 to 0.03. We usually set it to 0.1 in tests, but the duration of the gh-3160 test case ('Send heartbeats if there are changes from a remote master only') is around 100 * replication_timeout seconds and we don't want to make this test much longer. Runs of the test case (w/o the other ones that are in replication/misc.test.lua) in 30 parallel jobs show that 0.03 is enough for the gh-3160 case to pass stably and hopefully enough for the following test cases too.
-
Alexander Turenko authored
It allows running `./test-run.py -j 1 replication/misc <...> replication/misc`, which can be useful when debugging a flaky problem. This ability was broken after 7474c14e ('test: enable cleaning of a test environment'), because test-run started to clean package.loaded between runs, so each time the test is run it calls ffi.cdef() under require('rlimit'). This ffi.cdef() call defines a structure, so the second and following attempts to call it give a Lua error. This commit does not change anything in regular testing, because each test runs once (unless stated otherwise in a configuration list).
-
Vladislav Shpilevoy authored
Before this patch, a UUID update was the same as introducing a new member and waiting until the old one is dropped as 'dead' by the failure detection component. It could take 2.5 minutes with the default ack timeout. What is more, with GC turned off the old UUID would never be deleted. On a UUID update, the patch marks the old UUID as a 'left' member. In the best and most common case it guarantees that the old UUID will be dropped no later than after 2 complete rounds, and marked as 'left' everywhere within log(cluster_size) round steps, even with GC turned off. Part of #3234
-
Vladislav Shpilevoy authored
Quit allows an instance to gracefully leave a cluster. Other members will not consider the instance that quit as dead, and will drop it much earlier than it would happen via failure detection. Quit works as follows: a special message is sent to each member. Members that receive the message mark the source as 'left', then keep and disseminate that change for one round. In the best case, after one round the left member will be marked as such in the whole cluster. A 'left' member will not be added back, because it is explicitly prohibited to add new 'left' members. Part of #3234
-
Vladislav Shpilevoy authored
Before this patch the swim test event loop worked like this: pop a new event, set the global watch to its deadline, process the event, repeat while the deadlines are the same. These events usually generate IO events, which are processed next. But once swim_quit() is introduced, it becomes possible to insert new IO events before the protocol's events like round steps and ack checks. Because of that it would be impossible to process only new IO events, either with timeout = 0, or with timeout > 0 but without changing the global clock. For example, a typical test would call swim_quit() on a swim instance and expect that it has sent all the quit messages immediately, without delays. But before this patch it would be necessary to run at least one swim round to get to the IO processing. The patch splits the processing logic of the protocol's events and IO events into two functions and calls them explicitly in swim_wait_timeout(), the main function to check something in the swim tests. Part of #3234
-
Vladislav Shpilevoy authored
The packet's originator has already got an OK status and expects these messages to be sent even if the originator is closed right after that. This commit does it the TCP way and sends all the pending messages before actually closing the fake fd. Part of #3234
-
Vladislav Shpilevoy authored
Until now it was impossible in swim tests to drop a SWIM instance from the cluster. It had to be either restarted or blocked, but a real drop led to an assertion on any attempt to use one of the methods like swim_wait_timeout(). That was due to the inability to get an instance's UUID without the instance itself, even if the UUID was stored in the membership tables of other instances. This patch makes swim_cluster store swim instances and UUIDs separately. This is going to be used to test the swim_quit() API. Also, some cfg parameters are saved as well, like ack timeout and gc mode. They are used to restart a node with exactly the same cfg as it had before the restart, even if the original struct swim * is no longer valid. Part of #3234
-
- Apr 09, 2019
-
-
Konstantin Osipov authored
Before this patch, it was possible to create a trigger without a FOR EACH ROW clause, for example: CREATE TRIGGER trg AFTER DELETE ON tbl BEGIN ; END; In ANSI SQL, if trigger-timing-clause is not specified, FOR EACH STATEMENT is used. Tarantool, however, did not support FOR EACH STATEMENT and assumed FOR EACH ROW. This could break future applications once FOR EACH STATEMENT is added. Thus, make the FOR EACH ROW clause mandatory. Update tests. No docs ticket since there are no docs for this feature yet :/ - will document the fixed behaviour right away.
-
Konstantin Osipov authored
Rename event_queue -> dissemination_queue
-
Konstantin Osipov authored
-
Vladislav Shpilevoy authored
The dissemination component broadcasts events about member status updates. When any member attribute is updated (incarnation, status, UUID, address), the member is put into an event queue. Members from the queue are encoded into each round step message with a higher priority and before the anti-entropy section. It means that even if a cluster consists of hundreds of members and one of them was updated on one of the instances, this update will be disseminated regardless of whether this member is encoded into the anti-entropy section or not. According to the SWIM paper, it drastically speeds up event dissemination, and this is noticeable in the tests. Part of #3234
-
Vladislav Shpilevoy authored
Before the dissemination component it was enough in the tests to either drop all packets to/from a certain member, or not drop any at all. But with dissemination it is time to test a more granular packet loss table: not 0/100, but a 5/10/20/50/.../100 packet loss rate. Part of #3234
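A percentage-based drop decision can be sketched as follows. This is a minimal illustration, not the filter from the swim test harness: `struct drop_filter` and `drop_filter_check()` are hypothetical names, and plain rand() is assumed to be a good enough source of randomness for a test.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical drop filter, not the actual test harness structure. */
struct drop_filter {
	/* Percentage of packets to drop, from 0 to 100. */
	int loss_percent;
};

/* Return true if the current packet should be dropped. */
bool
drop_filter_check(const struct drop_filter *filter)
{
	/* rand() % 100 is in 0..99, so 0 never drops and 100 always drops. */
	return rand() % 100 < filter->loss_percent;
}
```

With this shape the old all-or-nothing behaviour is just the two edge cases, `loss_percent = 0` and `loss_percent = 100`.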
-
Vladislav Shpilevoy authored
The test checks that if a member has failed in a big cluster, it is eventually deleted from all instances. But it takes too much real time despite the usage of virtual time. This is because total deletion of a member takes O(N + ack_timeout * 5) time: N to wait until every member has pinged the failed one at least once, + 3 * ack_timeout to learn that it is dead, and + 2 * ack_timeout to drop it. Of course, it is an upper bound, and usually it is faster, but not by much. For example, on a cluster of size 50 it easily takes 55 virtual seconds. On the contrary, just learning that a member is dead on every instance takes O(log(N)) according to the SWIM paper. In the same test with a 50-instance cluster it takes ~15 virtual seconds to disseminate the 'dead' status of the failed member to every instance, and that is even without the dissemination component, with anti-entropy only. Looking ahead, the subsequent patches show that with the dissemination component it takes only ~6 virtual seconds. In summary, without losing test coverage it is much faster to turn off SWIM GC and wait until the failed member looks dead on all instances. Part of #3234
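The upper bound above can be written out as a small helper. This is only a worked example: `full_deletion_bound()` is a hypothetical name, and the assumption that each member sends one ping per `step_period` (taken as 1 virtual second below, together with a 1 second ack timeout) is mine, chosen because it reproduces the 55 seconds quoted for 50 members.

```c
/*
 * Upper bound on the time to delete a failed member everywhere:
 * N steps for every member to ping it, 3 ack timeouts to declare
 * it dead, 2 more ack timeouts to drop it.
 */
double
full_deletion_bound(int cluster_size, double step_period, double ack_timeout)
{
	double ping_everyone = cluster_size * step_period;
	double declare_dead = 3 * ack_timeout;
	double drop_dead = 2 * ack_timeout;
	return ping_everyone + declare_dead + drop_dead;
}
```

Under those assumptions `full_deletion_bound(50, 1.0, 1.0)` gives 55, which grows linearly with the cluster size, unlike the O(log(N)) spread of the 'dead' status.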
-
Vladislav Shpilevoy authored
At this moment SWIM protocol stores array of members only in one place: inside the anti-entropy component. Its decoding is a simple loop taking the member definitions one by one and upserting them into the member table. But the dissemination also has something kinda like members array: an array of events. The trick is that an event is basically the same as a member +/- a couple of optional fields. Events are also decoded into the member definition structure. It means that anti-entropy decoder can be easily reused. Part of #3234
-
Vladislav Shpilevoy authored
Each member stored in components dissemination and anti-entropy should carry a unique identifier, a status, and an address. Those attributes are UUID, IP, Port, enum swim_member_status, incarnation. Now they are sent only in scope of anti-entropy, but forthcoming dissemination component also would like to use these attributes for each event. This commit makes the vital attributes and their code more reusable by encapsulation of them into a binary passport structure. Part of #3234
-
- Apr 08, 2019
-
-
Konstantin Osipov authored
Rename vy_set() and vy_set_with_colmask() to vy_tx_set() and vy_tx_set_with_colmask(). These methods really belong to the vy_tx module, so move them there.
-
Serge Petrenko authored
Add the type of operation which is being executed to before_replace and on_replace triggers.

Closes #4099

@TarantoolBot document
Title: new parameter for space before_replace and on_replace triggers

Now before_replace and on_replace triggers accept an additional parameter: the type of operation that is being executed (INSERT/REPLACE/DELETE/UPDATE/UPSERT). For example, a trigger function may now look like this:
```
function before_replace_trig(old, new, space_name, op_type)
    if op_type == 'INSERT' then
        return old
    else
        return new
    end
end
```
It will restrict all INSERTs, but allow REPLACEs, UPSERTs, DELETEs and UPDATEs.
-
Roman Tokarev authored
-
- Apr 07, 2019
-
-
Alexander Turenko authored
Add more logging into wait_fullmesh() and return immediately with false when 'stopped' status is observed. The purpose of the change is to provide more information in case of a master-master replication bootstrap failure.
-
Vladimir Davydov authored
Apart from speeding up statement comparisons and hence index lookups, this is also a prerequisite for multikey indexes, which will reuse tuple comparison hints as offsets in indexed arrays. Albeit huge, this patch is pretty straightforward - all it does is replace struct tuple with struct vy_entry (which is tuple + hint pair) practically everywhere in the code. Now statements are stored and compared without hints only in a few places, primarily at the very top level. Hints are also computed at the top level so it should be pretty easy to replace them with multikey offsets when the time comes.
-
Vladimir Davydov authored
This patch adds a helper struct vy_entry, which unites a statement with a hint. We will use this struct to store hinted statements in vinyl data structures, such as cache or memory tree. Note, it's defined in a separate file to minimize dependencies.
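The statement + hint pairing can be illustrated with a simplified sketch. It is not the vinyl code: `struct stmt` stands in for a real tuple, `struct hinted_entry` and its helpers are hypothetical names, and the hint here is assumed to be the first 8 key bytes packed big-endian, which is just one way to get a hint whose order agrees with the full comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t hint_t;

/* Stand-in for a statement: a NUL-terminated string key. */
struct stmt {
	const char *key;
};

/* A statement together with its precomputed comparison hint. */
struct hinted_entry {
	struct stmt *stmt;
	hint_t hint;
};

/* Pack the first 8 key bytes big-endian, zero-padded. */
hint_t
stmt_hint(const struct stmt *stmt)
{
	hint_t hint = 0;
	size_t len = strlen(stmt->key);
	for (size_t i = 0; i < sizeof(hint); i++) {
		hint <<= 8;
		if (i < len)
			hint |= (unsigned char)stmt->key[i];
	}
	return hint;
}

/* Compute the hint once, when the entry is created. */
struct hinted_entry
hinted_entry_make(struct stmt *stmt)
{
	struct hinted_entry entry = { stmt, stmt_hint(stmt) };
	return entry;
}

/* Compare by hint first; fall back to the full key only on a tie. */
int
hinted_entry_compare(struct hinted_entry a, struct hinted_entry b)
{
	if (a.hint != b.hint)
		return a.hint < b.hint ? -1 : 1;
	return strcmp(a.stmt->key, b.stmt->key);
}
```

The entry is small enough to pass by value, and most comparisons are decided by one integer comparison without touching the statement body.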
-
Vladimir Davydov authored
This patch adds vy_set and vy_set_with_colmask functions. For now they simply forward all arguments to vy_tx_set, but once comparison hints are introduced, they will also compute a hint for the inserted statement. Later, with the appearance of multikey indexes, they will also extract multikey offsets.
-
Vladimir Davydov authored
It's a trivial one-line function, which can be folded without hurting readability, i.e. it only obfuscates the code. Let's kill it.
-
Vladimir Davydov authored
For aesthetic purposes. No functional changes.
-
Vladimir Davydov authored
In the next patch I'm planning to introduce the concept of vy_entry, which will encapsulate a statement stored in a container. Let's rename vy_cache_entry to vy_cache_node so as not to mix the two concepts.
-
Vladimir Davydov authored
So as not to include heavy key_def.h when we only need hint_t.
-
Kirill Shcherbatov authored
The msgpack dependency has been updated because the new version introduces the new method mp_stack_top for the mp_stack class which we will use to store a pointer for a multikey frame to initialize a field_map in case of multikey index. As the library API has changed, the code has been modified correspondingly. @locker: add missing frame update in vy_stmt_new_surrogate_delete. Needed for #1012
-
Serge Petrenko authored
Improve row printing to log. Since say only has a 16k buffer, there is no point in printing the whole packet, which can have arbitrary length, in one go. So, print the header row by row, 16 bytes in a row, and format the output to match `xxd` output:
```
[001] 2019-04-05 18:22:46.679 [11859] iproto V> Got a corrupted row:
[001] 2019-04-05 18:22:46.679 [11859] iproto V> 00000000: A3 02 D6 5A E4 D9 E7 68 A1 53 8D 53 60 5F 20 3F
[001] 2019-04-05 18:22:46.679 [11859] iproto V> 00000010: D8 E2 D6 E2 A3 02 D6 5A E4 D9 E7 68 A1 53 8D 53
```
Now we can get rid of malloc, and use a preallocated tt_static_buf instead. Also, replace a big macro with a small macro and a helper function.

Followup to f645119f
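Row-by-row formatting into a fixed buffer can be sketched like this. It is an illustration only, not the helper added by the commit: `hex_row_format()` is a hypothetical name and the caller-provided buffer stands in for the preallocated static buffer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format up to 16 bytes of data starting at offset as one row:
 * an 8-digit hex offset, a colon, then space-separated hex bytes.
 * Returns the number of characters the full row needs.
 */
int
hex_row_format(char *buf, size_t size, const unsigned char *data,
	       size_t len, size_t offset)
{
	size_t end = offset + 16 < len ? offset + 16 : len;
	int pos = snprintf(buf, size, "%08zX:", offset);
	/* Stop early if the buffer is full, so buf + pos stays in bounds. */
	for (size_t i = offset; i < end && pos >= 0 && (size_t)pos < size; i++)
		pos += snprintf(buf + pos, size - pos, " %02X",
				(unsigned)data[i]);
	return pos;
}
```

A caller would loop over the packet with `offset += 16` and log one row per iteration, so no allocation proportional to the packet size is needed.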
-
Vladimir Davydov authored
This reverts commit 8be593ce. Now, as the use-after-free bug in space_truncate() implementation has been fixed, we can enable this test again. Follow-up #4093
-
Vladimir Davydov authored
space_truncate allocates a statement on the stack which is grossly incorrect as the stack may be purged once the function returns while box_process_rw expects the statement to be valid until the end of the transaction. By happy accident, it worked fine until commit 1f7b0d65 ("Require for single statement not autocommit in case of ddl"), which made it possible to run this function from a transaction and hence increased the probability of hitting the use-after-free bug. The fix is trivial: allocate a truncation statement on the region. Fixes commit 353bcdc5 ("Rework space truncation"). Closes #4093
-
Alexander Turenko authored
This commit enables the pretest_clean test-run option on 'core = tarantool' test suites with Lua tests and on 'core = app' test suites. See #4094 for an example of a problem that is eliminated by this option. For 'core = tarantool': this option drops non-system spaces, resets data in system spaces and global variables to the initial state, and unloads packages except built-in ones. For 'core = app': this option deletes xlog and snap files before running a test. test-run doesn't remove global variables that are listed in the 'protected_globals' global variable. Use it for, say, functions that are defined in an instance file and called from tests. See test-run/README.md for information on how exactly the option works. Removed the unused cfg_filter() function from test/engine/box.lua. Fixes #4094.
-
- Apr 05, 2019
-
-
Alexander Turenko authored
This reverts commit 14a87bb7. The test cases generate corrupted xlog files (see #4093) and don't allow other tests to proceed successfully, so we need to temporarily disable these cases. They should be enabled back in the scope of #4093.
-
Alexander Turenko authored
* Added default timeout for wait_cond() (60 sec).
* Updated pyyaml version in requirements.txt.
* Fixed reporting of non-default server fail at start.
* Stop 'proxy' when a new non-default instance fails.
* Added user-defined protected globals for pretest_clean.
-
Nikita Pettik authored
In SQL, the type of a constant literal (e.g. 1, 2.5, 'abc') is assigned right after parsing and saved into struct Expr. Occasionally, the type is re-assigned before emitting opcodes to store the literal into VDBE memory. What is more, for a floating point number the type is changed to "integer". This patch fixes this obvious misbehaviour.
-
Vladimir Davydov authored
Using the const qualifier for complex structures like tuple is bad. We already have to cast it to drop the const qualifier now and then, e.g. to increment/decrement the reference counter. We are planning to wrap struct tuple in a helper struct (aka entry) to store it in vinyl containers along with a comparison hint (cache, memory tree, etc). We will be passing this struct by value so we won't be able to retain const qualifier, because in contrast to a const pointer, one must initialize a const struct upon definition. That said, it's time to drop const qualifier of struct tuple everywhere, like we have already done in case of struct key_def and tuple_format.
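The initialization constraint can be seen in a short sketch. The types below are simplified stand-ins, not the real definitions: `struct tuple` is reduced to a reference counter, and `struct entry` and `entry_pick()` are hypothetical names.

```c
#include <stdint.h>

/* Simplified stand-in for a reference-counted tuple. */
struct tuple {
	int refs;
};

/* Taking a reference modifies the tuple, so it cannot be const. */
void
tuple_ref(struct tuple *tuple)
{
	tuple->refs++;
}

/* Tuple wrapped together with a comparison hint, passed by value. */
struct entry {
	struct tuple *tuple;
	uint64_t hint;
};

/* Pick the entry with the smaller hint and take a reference to it. */
struct entry
entry_pick(struct entry a, struct entry b)
{
	/*
	 * A non-const struct can be declared first and assigned later.
	 * `const struct entry result;` would have to be initialized
	 * right here and could not be assigned in the branches below.
	 */
	struct entry result;
	if (a.hint <= b.hint)
		result = a;
	else
		result = b;
	tuple_ref(result.tuple);
	return result;
}
```

With a `const struct tuple *` member, the tuple_ref() call would additionally need a cast to drop the qualifier, which is the kind of cast the commit message refers to.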
-
Mergen Imeev authored
These tables won't be used anymore and should be deleted. Note, this patch breaks backward compatibility between 2.1.1 and 2.1.2, but that's okay as 2.1.1 was beta and we didn't recommend anyone to use it. Part of #2843 Follow up #4069
-
Mergen Imeev authored
Currently, the memory for index_id is not allocated in VDBE code in the sql_code_drop_table() and sql_drop_index() functions. This may lead to SEGMENTATION FAULT. Needed for #2843
-