- Mar 15, 2019
-
-
Vladimir Davydov authored
An open iterator may disrupt the following test run, because it may prevent dump/compaction from purging stale rows. In particular, iterators left by vinyl/iterator result in the following test failure: | --- vinyl/deferred_delete.result Mon Feb 11 19:14:01 2019 | +++ vinyl/deferred_delete.reject Fri Mar 15 16:21:11 2019 | @@ -155,7 +155,7 @@ | ... | pk:stat().rows -- 5 new REPLACEs | --- | -- 5 | +- 10 | ... | i1:stat().rows -- 10 old REPLACE + 5 new REPLACEs + 10 deferred DELETEs | --- Fix this by calling the Lua garbage collector to delete all dangling iterators in the end of vinyl/iterator test. Closes #3862
-
- Mar 14, 2019
-
-
Serge Petrenko authored
The function evio_timeout_update() failed to update the starting time point, which lead to timeouts happening much faster than they should if there were consecutive calls to the function. This lead, for example, to applier timing out while reading a several megabyte-size row in 0.2 seconds even if replication_timeout was set to 15 seconds. Closes #4042
-
- Mar 13, 2019
-
-
Alexander Turenko authored
Run unit tests from a var directory. This is needed, say, to allow a test to write a log file to a gitignored directory. The only behaviour change observed in tarantool's tests is that unit/swim.test and unit/swim_proto.test write a log.txt file to the test var directory ./test/xxx_unit instead of ./test.
-
Sergei Voronezhskii authored
Fixed issues:
- box-py/iproto.test.py
  1) Fixed receive_response() to wait for the whole response.
  2) Clean up the _cluster space.
- replication-py/cluster.test.py
  1) Clean up the _cluster space.
- replication-py/multi.test.py
  1) Removed the vclock check, because it fails if a previous test makes some DML and vclock is incremented. It looks like it was used for debugging and is not part of this test case.
  2) Fixed a typo in the 'Synchronize' block.

The following test sequences failed due to unexpected IDs in the _cluster space:
- [box-py/iproto.test.py, null]
- [box-py/bootstrap.test.py, null]
- [replication-py/cluster.test.py, null]
- [replication-py/multi.test.py, null]

Part of #3232
-
Sergei Voronezhskii authored
Also enabled it for luajit-tap. Part of #3232
-
Vladimir Davydov authored
It had been used only in Vinyl's vy_stmt_new_surrogate_from_key, which was deleted by the previous patches, so we can drop it as well.
-
Vladimir Davydov authored
There are three places where we use this expensive function while we could get along with a cheaper one:
- Deferred DELETE space on_replace trigger. Here we can use the simple vy_stmt_new_delete, because the trigger is already passed a surrogate DELETE statement.
- Secondary index build on_replace trigger. Here we can extract the secondary key, set its type to DELETE, and insert it into the index. We don't need all the other indexed fields.
- Secondary index build recovery procedure. Similarly to the previous case, we can use the extracted key here rather than building a surrogate DELETE statement.
-
Vladimir Davydov authored
This heavy function isn't needed anymore, as we can now insert key statements into the memory level.
-
Vladimir Davydov authored
In contrast to a primary index, which stores full tuples, secondary indexes only store extended (secondary + primary) keys on disk. To make them look like tuples, we fill missing fields with nulls (aka tuple surrogate). This isn't going to work nicely with multikey indexes though: how would you make a surrogate array from a key? We could special-case multikey index handling, but that would look cumbersome. So this patch removes nulls from secondary tuples restored from disk altogether. To achieve that, it's enough to use key_format for them - then the comparators will detect that it's actually a key, not a tuple and use the appropriate primitive.
-
Vladimir Davydov authored
It's actually only needed to initialize disk streams so let's pass it to vy_write_iterator_new_slice() instead.
-
Vladimir Davydov authored
By convention we have two methods in each write iterator stream implementation (including the write iterator itself, as it implements the interface too): 'stop' and 'close'. The 'stop' method is called in a worker thread. It reverses the effect of 'start'. We need it to unreference all tuples referenced during the iteration (we must do it in the worker thread, where the tuples were referenced in the first place, so as not to unreference tuple formats, see vy_tuple_delete). The 'close' method is called from the tx thread to unreference tuple formats if necessary and release memory. For the write iterator itself we follow this convention. However, for individual sources - for the vy_slice_stream source, to be more exact - we do not: the write iterator calls both 'stop' and 'close' from its own 'stop' method. Let's clean up this mess and make the write iterator follow the convention. We'll need it in the next patch.
-
Vladimir Davydov authored
Use the format of the given statement instead. Passing a format is a legacy from the time when we had a separate format for UPSERTs. Nowadays it only obfuscates the code.
-
Vladimir Davydov authored
A Vinyl statement may be either a key or a tuple, and we must use different functions for the two kinds when working with a bloom filter. Let's introduce helpers incorporating that logic. Notes:
- Currently we never add keys to bloom filters, but after the next patch we will, so this patch adds the tuple_bloom_builder_add_key helper.
- According to the function protocol, tuple_bloom_builder_add may fail with out-of-memory, but we never checked that. Fix that while we are at it.
-
Vladimir Davydov authored
No functional changes, just move a piece of code, so as not to mix it in the next patch.
-
Vladimir Davydov authored
Tuple bloom filter is an array of bloom filters, each of which reflects lookups by all possible partial keys. To optimize the overall bloom filter size, we need to know how many unique elements there are for each partial key. To achieve that, we require the caller to pass the number of key parts that have been hashed for the given tuple. Here's how it looks in Vinyl:

uint32_t hashed_parts = writer->last_stmt == NULL ? 0 :
        tuple_common_key_parts(stmt, writer->last_stmt, writer->key_def);
tuple_bloom_builder_add(writer->bloom, stmt, writer->key_def, hashed_parts);

Actually, there's no need for such a requirement: instead, we can calculate the hash value for the given tuple, compare it with the hash of the tuple added last time, and add the new hash only if the two values differ. This should be accurate enough while allowing us to get rid of the cumbersome tuple_common_key_parts helper. Note, such a check will only work if tuples are added in the order defined by the key definition, but that already holds - anyway, one wouldn't be able to use tuple_common_key_parts either if it weren't true. While we are at it, refresh the obsolete comment to tuple_bloom_builder.
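The dedup-by-last-hash idea can be sketched in a few lines of C. This is a toy model, not tarantool's tuple_bloom code: instead of a real bloom filter it just counts unique elements per partial-key level, and it deduplicates by remembering the hash added last time, which works because tuples arrive in key order, so equal partial keys are adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the described scheme (hypothetical names, not the
 * actual tuple_bloom API). */
enum { PART_COUNT_MAX = 4 };

struct bloom_builder {
	uint32_t part_count;
	uint32_t unique[PART_COUNT_MAX];   /* unique elements per level */
	uint64_t last_hash[PART_COUNT_MAX];
	int has_last;
};

/* FNV-1a style hash of the first 'part_count' key parts. */
static uint64_t
partial_key_hash(const uint32_t *key, uint32_t part_count)
{
	uint64_t h = 14695981039346656037ULL;
	for (uint32_t i = 0; i < part_count; i++) {
		h ^= key[i];
		h *= 1099511628211ULL;
	}
	return h;
}

static void
bloom_builder_add(struct bloom_builder *b, const uint32_t *key)
{
	for (uint32_t i = 0; i < b->part_count; i++) {
		uint64_t h = partial_key_hash(key, i + 1);
		if (b->has_last && b->last_hash[i] == h)
			continue; /* same partial key as last time - skip */
		b->last_hash[i] = h;
		b->unique[i]++;
	}
	b->has_last = 1;
}
```

Adding the ordered keys (1,1), (1,2), (2,2) yields 2 unique one-part keys and 3 unique two-part keys, with no caller-supplied hashed_parts counter.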
-
Vladimir Davydov authored
To differentiate between key and tuple statements in comparators, we set IPROTO_SELECT type for key statements. As a result, we can't use key statements in the run iterator directly although secondary index runs do store statements in key format. Instead we create surrogate tuples filling missing fields with NULLs. This won't play nicely with multikey indexes so we need to teach iterators to deal with statements in key format. The first step in this direction is dropping IPROTO_SELECT in favor of identifying key statements by format.
-
Vladimir Davydov authored
Currently, it's called vy_stmt_new_select, but soon a key statement will be allowed to have any type, not just IPROTO_SELECT. So let's rename it to vy_key_new.
-
Vladimir Davydov authored
Store tuple_format_vtab, max_tuple_size, and key_format there. This will allow us to determine a statement type (key or tuple) by checking its format against key_format.
-
Vladimir Davydov authored
A vinyl statement (the vy_stmt struct) may represent either a tuple or a key. We differentiate between the two kinds by statement type: we use SELECT for keys and other types for tuples. This was done so that we could pass both tuples and keys to a read iterator as a search key. To avoid branching in comparators when the types of the compared statements are known in advance, we provide several comparators, each of which expects certain statement types, e.g. a tuple and a key. Actually, such a micro-optimization looks like overkill, because a typical comparator is called by function pointer and has a lot of comparisons in its code, see tuple_compare_slowpath for instance. Eliminating one branch will hardly make the code perform better. At the same time, it makes the code more difficult to write. Besides, once we remove nils from statements read from disk (aka surrogate tuples), which will ease the implementation of multikey indexes, the number of places where the types of the compared statements are known will diminish drastically. That said, let's remove the optimized comparators and always use vy_stmt_compare, which checks the types of the compared statements and calls the appropriate comparator.
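The single-entry-point comparator this patch settles on can be sketched as follows. All names here are hypothetical, not vinyl's actual ones: a statement is either a full tuple or a (possibly partial) key, and one dispatching function compares only the parts both operands define, instead of forcing callers to pick among specialized comparators.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of a dispatching comparator (hypothetical
 * names, not vinyl's vy_stmt_compare itself). */
enum stmt_kind { STMT_TUPLE, STMT_KEY };

struct stmt {
	enum stmt_kind kind;
	uint32_t part_count; /* defined parts; keys may be partial */
	int parts[4];
};

static int
compare_parts(const struct stmt *a, const struct stmt *b, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++) {
		if (a->parts[i] != b->parts[i])
			return a->parts[i] < b->parts[i] ? -1 : 1;
	}
	return 0;
}

/* Single entry point: compare only the parts both statements
 * define, so a partial key can match a longer tuple. */
static int
stmt_compare(const struct stmt *a, const struct stmt *b)
{
	uint32_t n = a->part_count < b->part_count ? a->part_count
						   : b->part_count;
	return compare_parts(a, b, n);
}
```

The branch on part counts is exactly the kind of check the removed specialized comparators tried to avoid; as the commit argues, one predictable branch is cheap next to an indirect call.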
-
Vladimir Davydov authored
We advance replica->gc state only when an xlog file is fully recovered, see recovery_close_log and relay_on_close_log_f. It may turn out that an xlog file is fully recovered, but isn't closed properly by relay (i.e. recovery_close_log isn't called), because the replica closes connection for some reason (e.g. timeout). If this happens, the old xlog file won't be removed when the replica reconnects, because we don't advance replica->gc state on reconnect, so the useless xlog file won't be removed until the next xlog file is relayed. This results in occasional replication/gc.test.lua failures. Fix this by updating replica->gc on reconnect with the current replica vclock. Closes #4034
-
Kirill Shcherbatov authored
In the case of a replace operation rollback, the function on_replace_trigger_rollback was called with an incorrect argument, as a result of which memory that was still in use was freed.
-
Kirill Shcherbatov authored
The set_system_triggers and erase routines in upgrade.lua did not process the _fk_constraint space.
-
- Mar 12, 2019
-
-
Kirill Shcherbatov authored
Reworked the memtx_tree class to use the memtx_tree_data structure as a tree node. This makes it possible to extend it with a service field to implement tuple hints and multikey indexes in subsequent patches. Needed for #3961
-
Kirill Shcherbatov authored
The http library intelligently sets the "Accept", "Connection", and "Keep-Alive" headers. However, when the user explicitly specified them in the headers section of the call options, they could be written to the HTTP request twice. We postponed the auto-header setup until just before the https_execute call. Now they are set only if they were not set by the user. Closes #3955
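The idea of the fix can be shown with a minimal C sketch. Everything here is hypothetical (fixed-size header table, helper names), not the actual http client code: default headers are appended only after the user-supplied ones, and only when no header with the same name is already present, so nothing is sent twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Toy header table; hypothetical names, not the real http client. */
enum { HEADERS_MAX = 16 };

struct headers {
	size_t count;
	char name[HEADERS_MAX][32];
	char value[HEADERS_MAX][64];
};

static int
headers_contain(const struct headers *h, const char *name)
{
	for (size_t i = 0; i < h->count; i++) {
		if (strcasecmp(h->name[i], name) == 0)
			return 1;
	}
	return 0;
}

static void
headers_set(struct headers *h, const char *name, const char *value)
{
	if (h->count >= HEADERS_MAX)
		return;
	snprintf(h->name[h->count], sizeof(h->name[0]), "%s", name);
	snprintf(h->value[h->count], sizeof(h->value[0]), "%s", value);
	h->count++;
}

/* Called right before the request is executed: defaults are added
 * only for header names the user has not set. */
static void
headers_set_defaults(struct headers *h)
{
	if (!headers_contain(h, "Connection"))
		headers_set(h, "Connection", "Keep-Alive");
	if (!headers_contain(h, "Accept"))
		headers_set(h, "Accept", "*/*");
}
```

A user-supplied "Connection: close" survives untouched, and only the missing "Accept" default is appended.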
-
- Mar 11, 2019
-
-
Nikita Pettik authored
When we allowed using a HAVING clause without GROUP BY (b40f2443), one possible combination went untested:

SELECT 1 FROM te40 HAVING SUM(s1) < 0; -- And SUM(s1) >= 0, i.e. the HAVING condition is false.

In other words, the result set contains no aggregates, but HAVING does, and its condition is false. In this case no byte-code related to aggregate execution is emitted at all. Hence, the query above is equal to a simple SELECT 1; and, unfortunately, the result of such a query was the same even when the condition under the HAVING clause was unsatisfied. To fix this behaviour, it is enough to tell the byte-code generator that we should analyze aggregates not only in ORDER BY clauses, but also in the HAVING clause. Closes #3932 Follow-up #2364
-
Nikita Pettik authored
Functions such as trim(), substr() etc. should return a result with a collation derived from their arguments. So let's add to the SQL function definition a flag indicating that the collation of the first argument must be applied to the function's result. Using this flag, we can derive the appropriate collation in sql_expr_coll(). Part of #3932
-
Georgy Kirichenko authored
Form a separate transaction with local changes in the case of replication. This is important because we should be able to replicate such changes (e.g. those made within an on_replace trigger) back. Otherwise, local changes would be incorporated into the originating transaction and skipped by the originating replica. Needed for #2798
-
- Mar 07, 2019
-
-
Vladimir Davydov authored
Fixes commit 8031071e ("Lightweight vclock_create and vclock_copy"). Closes #4033
-
Nikita Pettik authored
BLOB column type is represented by the SCALAR field type in NoSQL terms. We attempted to emulate BLOB behaviour, but such efforts turned out to be not good enough. For this reason, we've decided to abandon these attempts and replace it with the SCALAR column type. SCALAR acts the same way as it does in NoSQL: it is an aggregator type for the INTEGER, NUMBER and STRING types. So, a column declared with this type can contain values of these three (available in SQL) types. It is worth mentioning that the CAST operator in this case does nothing. Still, we consider BLOB values as entries encoded in msgpack with the MP_BIN format. To make this happen, values to be operated on should be represented in BLOB form x'...' (e.g. x'000000'). What is more, there are two built-in functions returning BLOBs: randomblob() and zeroblob(). On the other hand, columns with the STRING NoSQL type don't accept BLOB values.

Closes #4019
Closes #4023

@TarantoolBot document
Title: SQL types changes

There are a couple of recently introduced changes connected with SQL types.

Firstly, we've removed support of DATE/TIME types from the parser due to the confusing behaviour of these types: they were mapped to the NUMBER NoSQL type and had nothing in common with the generally accepted DATE/TIME types (like in other DBs). In addition, all built-in functions related to these types (julianday(), date(), time(), datetime(), current_time(), current_date() etc.) are disabled until we reimplement TIME-like types as native NoSQL ones (see issue #3694).

Secondly, we've removed the CHAR type (i.e. the alias to VARCHAR and TEXT). The reason is that according to ANSI SQL, CHAR(len) must accept only strings whose length is exactly equal to the one given in the type definition. Obviously, we don't provide such checks now. The VARCHAR and TEXT types are still legal. For the same reason, we've removed the NUMERIC and DECIMAL types, which were aliases to the NUMBER NoSQL type. REAL, FLOAT and DOUBLE still exist as aliases.

Finally, we've renamed the BLOB column type to SCALAR. We've decided that all our attempts to emulate BLOB behaviour using the SCALAR NoSQL type don't seem good enough: without a native NoSQL BLOB type there will always be inconsistency, especially taking into account possible NoSQL-SQL interactions. In SQL the SCALAR type works exactly the same way as in NoSQL: it can store values of the INTEGER, FLOAT and TEXT SQL types at the same time. Also, with this change the behaviour of the CAST operator has been slightly corrected: a cast to SCALAR now doesn't affect the type of the value at all. A couple of examples:

CREATE TABLE t1 (a SCALAR PRIMARY KEY);
INSERT INTO t1 VALUES ('1');
SELECT * FROM t1 WHERE a = 1; -- []

The result is an empty set, since column "a" contains the string literal value '1', not the integer value 1.

CAST(123 AS SCALAR); -- Returns 123 (integer)
CAST('abc' AS SCALAR); -- Returns 'abc' (string)

Note that in NoSQL, values of the BLOB type are defined as ones encoded in msgpack with the MP_BIN format. In SQL there are still a few ways to force this format: declaring a literal in the "BLOB" format (x'...') or using one of two built-in functions (randomblob() and zeroblob()). The TEXT and VARCHAR SQL types don't accept BLOB values:

CREATE TABLE t (a TEXT PRIMARY KEY);
INSERT INTO t VALUES (randomblob(5));
---
- error: 'Tuple field 1 type does not match one required: expected string'
...

BLOB itself is going to be reimplemented in the scope of #3650.
-
Nikita Pettik authored
NUMERIC and DECIMAL were allowed to be specified as column types. But in fact, they were just synonyms for the FLOAT type and mapped to the NUMBER Tarantool NoSQL type. So, we've decided to remove this type from the parser and bring it back when NUMERIC is implemented as a native type. Part of #4019
-
Nikita Pettik authored
Since no checks connected with the length of a string are performed now, it might be misleading to allow specifying this type. Instead, users must rely on the VARCHAR type. Part of #4019
-
Nikita Pettik authored
Currently, there are no native (in Tarantool terms) types to represent time-like values. So, until we add an implementation of those types, it makes no sense to allow specifying them in a table definition. Note that previously they were mapped to the NUMBER type. For the same reason, all built-in functions connected with DATE/TIME are disabled as well. Part of #4019
-
Vladislav Shpilevoy authored
SWIM - Scalable Weakly-consistent Infection-style Process Group Membership Protocol. It consists of two components: event dissemination and failure detection, and stores in memory a table of known remote hosts - members. Also, some SWIM implementations have an additional component: anti-entropy - a periodical broadcast of a random subset of the member table. The dissemination component spreads changes that happen to members over the cluster. Failure detection constantly searches for failed (dead) members. Anti-entropy simply sends all known information about a member at once, so as to synchronize it among all other members in case some events were not disseminated (UDP problems). Anti-entropy is the most vital component, since it can work without dissemination and failure detection, but they cannot work properly without it. Consider an example: two SWIM nodes, both alive. Nothing happens, so the event list is empty, and only pings are sent periodically. Then a third node appears. It knows about one of the existing nodes. How should it learn about the other one? Sure, its known counterpart can try to notify the other one, but this is UDP, so the event can get lost. Anti-entropy is an extremely simple component: it just piggybacks a random part of the member table onto each regular round message. In the example above, the new node will sooner or later learn about the other existing node via the anti-entropy messages of its known counterpart. This is why anti-entropy is the first component implemented. Part of #3234
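The piggybacking step can be sketched in C. This is an illustrative toy, not the real SWIM code (the member struct, the fill helper, and the small PRNG are all invented here): before each round message, a bounded random slice of the member table is attached, so knowledge about every member eventually spreads even when individual events are lost.

```c
#include <assert.h>
#include <stddef.h>

/* Toy anti-entropy sketch; hypothetical names, not tarantool's SWIM. */
enum { ANTI_ENTROPY_MAX = 3 };

struct member {
	int id;
};

/* Tiny deterministic PRNG so the sketch has no libc dependencies. */
static unsigned
prng_next(unsigned *seed)
{
	*seed = *seed * 1103515245u + 12345u;
	return *seed >> 16;
}

/* Attach up to ANTI_ENTROPY_MAX members starting from a random
 * offset (wrapping around); returns how many were attached. */
static size_t
anti_entropy_fill(const struct member *table, size_t size,
		  struct member *out, unsigned *seed)
{
	if (size == 0)
		return 0;
	size_t n = size < ANTI_ENTROPY_MAX ? size : ANTI_ENTROPY_MAX;
	size_t start = prng_next(seed) % size;
	for (size_t i = 0; i < n; i++)
		out[i] = table[(start + i) % size];
	return n;
}
```

Because the slice is bounded, the per-message overhead stays constant regardless of cluster size; randomizing the starting offset is one simple way to make sure every member gets broadcast eventually.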
-
Kirill Shcherbatov authored
In order to give a user the ability to use a delimiter symbol within code, the real delimiter is the user-provided 'delim' plus "\n". Since telnet sends "\r\n" on a line break, the expected expression delim + "\n" could not be found in the sequence data + delim + "\r\n", so the delimiter feature did not work at all. Added a delim + "\r" check along with delim + "\n", which solves the described problem and does not violate backward compatibility. Closes #2027
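The accepted-suffix logic can be sketched in C. The helper names are hypothetical, not the console's actual code: the input line has already been split at "\n", and a telnet client leaves a trailing "\r", so the delimiter must be accepted both at the very end of the line and right before a final "\r".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers illustrating the delim + "\r" check. */
static int
ends_with(const char *s, size_t len, const char *suffix)
{
	size_t slen = strlen(suffix);
	return len >= slen && memcmp(s + len - slen, suffix, slen) == 0;
}

/* 'line' is everything read before "\n"; tolerate telnet's "\r\n"
 * by stripping one trailing "\r" before matching the delimiter. */
static int
line_has_delim(const char *line, const char *delim)
{
	size_t len = strlen(line);
	if (len > 0 && line[len - 1] == '\r')
		len--;
	return ends_with(line, len, delim);
}
```

A plain client sending "do_stuff$\n" and a telnet client sending "do_stuff$\r\n" are now both recognized, which is why the change is backward compatible.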
-
Georgy Kirichenko authored
Remove xstream dependency and use direct box interface to apply all replication rows. This is refactoring needed for transactional replication. Needed for #2798
-
Mergen Imeev authored
The module table.c is not used and should be removed.
-
- Mar 06, 2019
-
-
Vladimir Davydov authored
The test creates a space, but doesn't drop it, which leads to a box-tap/on_schema_init failure:

| box-tap/trigger_yield.test.lua [ pass ]
| box-tap/on_schema_init.test.lua [ fail ]
| Test failed! Output from reject file box-tap/on_schema_init.reject:
| TAP version 13
| 1..7
| ok - on_schema_init trigger set
| ok - system spaces are accessible
| ok - before_replace triggers
| ok - on_replace triggers
| ok - set on_replace trigger
| ok - on_schema_init trigger works
|
| Last 15 lines of Tarantool Log file [Instance "app_server"][/Users/travis/build/tarantool/tarantool/test/var/002_box-tap/on_schema_init.test.lua.tarantool.log]:
| 2019-03-06 17:00:12.057 [87410] main/102/on_schema_init.test.lua F> Space 'test' already exists

Fix this.
-
Serge Petrenko authored
This patch introduces an on_schema_init trigger. The trigger may be set before box.cfg() is called and is fired during box.cfg() right after prototypes of system spaces, such as _space, are created. This makes it possible to set triggers on system spaces before any other non-system data is recovered. For example, it is possible to set an on_replace trigger on _space, which will work even during recovery.

Part of #3159

@TarantoolBot document
Title: document box.ctl.on_schema_init triggers

on_schema_init triggers are set before the first call to box.cfg() and are fired during box.cfg() before user data recovery starts. To set the trigger, say
```
box.ctl.on_schema_init(new_trig, old_trig)
```
where `old_trig` may be omitted. This will replace `old_trig` with `new_trig`.

Such triggers let you, for example, set triggers on system spaces before recovery of any data, so that the triggers are fired even during recovery. For example, such triggers make it possible to change a specific space's storage engine or make a replicated space replica-local on a freshly bootstrapped replica. If you want to change the storage engine of the space `space_name` to `vinyl`, you may say:
```
function trig(old, new)
    if new[3] == 'space_name' and new[4] ~= 'vinyl' then
        return new:update{{'=', 4, 'vinyl'}}
    end
end
```
Such a trigger may be set on `_space` as a `before_replace` trigger. And thanks to `on_schema_init` triggers, it will happen before any non-system spaces are recovered, so the trigger will work for all user-created spaces:
```
box.ctl.on_schema_init(function() box.space._space:before_replace(trig) end)
```
Note that the above steps are done before the initial `box.cfg{}` call. Otherwise the spaces will already be recovered by the time you set any triggers. Now you can say `box.cfg{replication='master_uri', ...}`, and the replica will have the space `space_name` with the same contents as on the master, but with the `vinyl` storage engine.
-
Vladislav Shpilevoy authored
SWIM wants to allow binding to zero ports so that the kernel can choose any free port automatically. This is needed mainly for tests. A zero port means that the real port is known only after bind() has been called, and getsockname() should be used to get it. SWIM uses the sio library for such low-level APIs. This is why this function is added to sio. Needed for #3234
-
Kirill Shcherbatov authored
Before commit d9f82b17 ("More than one row in fixheader. Zstd compression"), xrow_header_decode treated everything up to 'end' as the packet body, while currently it allows a packet to end before 'end'. iproto_msg_decode may receive invalid msgpack, but it still assumes that xrow_header_decode sets an error in such a case and uses an assert to test it, but it is not so. Introduced a new boolean flag to control the routine's behaviour. When the flag is set, xrow_header_decode raises a 'packet body' error unless the packet ends exactly at 'end'. @locker: renamed ensure_package_read to end_is_exact; fixed comments. Closes #3900
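The contract of the new flag can be sketched with a stand-in decoder. This is not the real xrow_header_decode; the body format here (one length byte followed by that many bytes) is invented purely to show the flag's effect: with end_is_exact set, a packet that does not consume the buffer exactly up to 'end' is an error rather than a silently accepted short packet.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real decoder: consumes a length-prefixed body,
 * where the first byte says how many bytes follow. Returns 0 on
 * success, -1 on a "packet body" error. */
static int
header_decode(const char **pos, const char *end, int end_is_exact)
{
	const char *p = *pos;
	if (p >= end)
		return -1;
	size_t body_len = (size_t)(unsigned char)*p++;
	if ((size_t)(end - p) < body_len)
		return -1; /* truncated packet */
	p += body_len;
	if (end_is_exact && p != end)
		return -1; /* trailing garbage: "packet body" error */
	*pos = p;
	return 0;
}
```

With the flag cleared the decoder keeps the old permissive behaviour; with it set, the caller (here, the iproto path) gets the error it was already asserting on.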
-