- Mar 13, 2019
-
-
Vladimir Davydov authored
There are three places where we use this expensive function while we could get along with a cheaper one:

- Deferred DELETE space on_replace trigger. Here we can use the simple vy_stmt_new_delete, because the trigger is already passed a surrogate DELETE statement.
- Secondary index build on_replace trigger. Here we can extract the secondary key, set its type to DELETE, and insert it into the index. We don't need any of the other indexed fields.
- Secondary index build recovery procedure. Similarly to the previous case, we can use the extracted key here rather than building a surrogate DELETE statement.
-
Vladimir Davydov authored
This heavy function isn't needed anymore, as we can now insert key statements into the memory level.
-
Vladimir Davydov authored
In contrast to a primary index, which stores full tuples, secondary indexes only store extended (secondary + primary) keys on disk. To make them look like tuples, we fill the missing fields with nulls (aka surrogate tuples). This isn't going to work nicely with multikey indexes though: how would you make a surrogate array from a key? We could special-case multikey index handling, but that would look cumbersome. So this patch removes nulls from secondary tuples restored from disk altogether. To achieve that, it's enough to use key_format for them: the comparators will then detect that a statement is actually a key, not a tuple, and use the appropriate primitive.
-
Vladimir Davydov authored
It's actually only needed to initialize disk streams, so let's pass it to vy_write_iterator_new_slice() instead.
-
Vladimir Davydov authored
By convention, we have two methods in each write iterator stream implementation (including the write iterator itself, as it implements the interface too): 'stop' and 'close'. The 'stop' method is called in a worker thread. It reverses the effect of 'start'. We need it to unreference all tuples referenced during the iteration (we must do it in the worker thread, where the tuples were referenced in the first place, so as not to unreference tuple formats, see vy_tuple_delete). The 'close' method is called from the tx thread to unreference tuple formats if necessary and release memory. The write iterator itself follows this convention. However, individual sources do not - for the vy_slice_stream source, to be more exact, the write iterator calls both 'stop' and 'close' from its own 'stop' method. Let's clean up this mess and make the write iterator follow the convention. We'll need it in the next patch.
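A minimal sketch of the convention described above, with simplified types; the real interface has more methods (e.g. 'next') and may be laid out differently:

```
struct vy_stmt_stream;

/*
 * Trimmed-down stream interface: 'start' and 'stop' run in a
 * worker thread and (un)reference tuples there; 'close' runs in
 * the tx thread and releases memory (and tuple formats, if
 * necessary).
 */
struct vy_stmt_stream_iface {
    int  (*start)(struct vy_stmt_stream *stream); /* worker thread */
    void (*stop)(struct vy_stmt_stream *stream);  /* worker thread */
    void (*close)(struct vy_stmt_stream *stream); /* tx thread */
};

struct vy_stmt_stream {
    const struct vy_stmt_stream_iface *iface;
};
```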
-
Vladimir Davydov authored
Use the format of the given statement instead. Passing the format is a legacy from the time when we had a separate format for UPSERTs. Nowadays it only obfuscates the code.
-
Vladimir Davydov authored
A Vinyl statement may be either a key or a tuple. We must use different functions for the two kinds when working with a bloom filter. Let's introduce helpers incorporating that logic. Notes:

- Currently, we never add keys to bloom filters, but after the next patch we will, so this patch adds the tuple_bloom_builder_add_key helper.
- According to the function protocol, tuple_bloom_builder_add may fail with out-of-memory, but we never checked that. Fix that while we are at it.
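A sketch of what such a helper could look like; the prototypes are simplified and the dispatching helper's name is hypothetical, but it shows the key-vs-tuple dispatch and the out-of-memory check the note refers to:

```
#include <stdbool.h>

struct tuple;
struct key_def;
struct tuple_bloom_builder;

/* Primitives mentioned in the message (prototypes simplified). */
int tuple_bloom_builder_add(struct tuple_bloom_builder *builder,
                            struct tuple *tuple, struct key_def *key_def);
int tuple_bloom_builder_add_key(struct tuple_bloom_builder *builder,
                                struct tuple *key, struct key_def *key_def);
bool vy_stmt_is_key(struct tuple *stmt); /* format check, see later patches */

/*
 * Hypothetical helper: one entry point regardless of the statement
 * kind. Returns -1 on out-of-memory, which callers must now check.
 */
static inline int
vy_stmt_bloom_builder_add(struct tuple_bloom_builder *builder,
                          struct tuple *stmt, struct key_def *key_def)
{
    return vy_stmt_is_key(stmt) ?
           tuple_bloom_builder_add_key(builder, stmt, key_def) :
           tuple_bloom_builder_add(builder, stmt, key_def);
}
```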
-
Vladimir Davydov authored
No functional changes, just moving a piece of code so as not to mix it into the next patch.
-
Vladimir Davydov authored
A tuple bloom filter is an array of bloom filters, one per partial key length, so that it covers lookups by all possible partial keys. To optimize the overall bloom filter size, we need to know how many unique elements there are for each partial key. To achieve that, we require the caller to pass the number of key parts that have been hashed for the given tuple. Here's how it looks in Vinyl:

    uint32_t hashed_parts = writer->last_stmt == NULL ? 0 :
            tuple_common_key_parts(stmt, writer->last_stmt,
                                   writer->key_def);
    tuple_bloom_builder_add(writer->bloom, stmt, writer->key_def,
                            hashed_parts);

Actually, there's no need for such a requirement: instead we can calculate the hash value for the given tuple, compare it with the hash of the tuple added last time, and add the new hash only if the two values differ. This is accurate enough while allowing us to get rid of the cumbersome tuple_common_key_parts helper. Note, such a check only works if tuples are added in the order defined by the key definition, but that already holds - one wouldn't be able to use tuple_common_key_parts either if it weren't true. While we are at it, refresh the obsolete comment on tuple_bloom_builder.
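A standalone model of the trick, with a toy hash standing in for the real tuple field hash; it shows why comparing against the previous hash counts unique elements when the input is sorted:

```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy FNV-1a hash standing in for the real tuple field hash. */
static uint64_t
toy_hash(const char *data)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *data != '\0'; data++) {
        h ^= (unsigned char)*data;
        h *= 1099511628211ULL;
    }
    return h;
}

int
main(void)
{
    /* Partial keys in key_def order, duplicates adjacent. */
    const char *keys[] = {"a", "a", "b", "b", "b", "c"};
    uint64_t last_hash = 0;
    bool has_last = false;
    int unique = 0;
    for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
        uint64_t h = toy_hash(keys[i]);
        /* Add the hash only if it differs from the previous one. */
        if (!has_last || h != last_hash)
            unique++;
        last_hash = h;
        has_last = true;
    }
    printf("unique partial keys: %d\n", unique); /* prints 3 */
    return 0;
}
```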
-
Vladimir Davydov authored
To differentiate between key and tuple statements in comparators, we set the IPROTO_SELECT type for key statements. As a result, we can't use key statements in the run iterator directly, although secondary index runs do store statements in key format. Instead, we create surrogate tuples, filling missing fields with NULLs. This won't play nicely with multikey indexes, so we need to teach iterators to deal with statements in key format. The first step in this direction is dropping IPROTO_SELECT in favor of identifying key statements by format.
-
Vladimir Davydov authored
Currently, it's called vy_stmt_new_select, but soon a key statement will be allowed to have any type, not just IPROTO_SELECT. So let's rename it to vy_key_new.
-
Vladimir Davydov authored
Store tuple_format_vtab, max_tuple_size, and key_format there. This will allow us to determine a statement type (key or tuple) by checking its format against key_format.
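A sketch of the resulting check, with the struct layouts assumed for illustration (the real statement type is richer than shown):

```
#include <stdbool.h>

struct tuple_format;

/* Assumed layout of the shared statement environment. */
struct vy_stmt_env {
    /* tuple_format_vtab and max_tuple_size live here too */
    struct tuple_format *key_format; /* shared by all key statements */
};

struct vy_stmt {
    struct tuple_format *format;
};

/* A statement is a key iff it was created with the key format. */
static inline bool
vy_stmt_is_key(const struct vy_stmt_env *env, const struct vy_stmt *stmt)
{
    return stmt->format == env->key_format;
}
```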
-
Vladimir Davydov authored
A vinyl statement (vy_stmt struct) may represent either a tuple or a key. We differentiate between the two kinds by statement type - we use SELECT for keys and other types for tuples. It was done that way so that we could pass both tuples and keys to a read iterator as a search key. To avoid branching in comparators when the types of compared statements are known in advance, we provide several comparators, each of which expects certain statement types, e.g. a tuple and a key.

Actually, such a micro-optimization looks like overkill, because a typical comparator is called by function pointer and has a lot of comparisons in its code, see tuple_compare_slowpath for instance. Eliminating one branch will hardly make the code perform better. At the same time, it makes the code more difficult to write. Besides, once we remove nils from statements read from disk (aka surrogate tuples), which will ease the implementation of multikey indexes, the number of places where the types of compared statements are known will diminish drastically. So let's remove the optimized comparators and always use vy_stmt_compare, which checks the types of compared statements and calls the appropriate comparator.
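A sketch of the branching that such a unified comparator ends up doing; the comparator names and prototypes below are simplified stand-ins, not the exact Vinyl ones:

```
#include <stdbool.h>

struct vy_stmt;
struct key_def;

bool vy_stmt_is_key(const struct vy_stmt *stmt); /* format check */
int  vy_key_compare(const struct vy_stmt *a, const struct vy_stmt *b,
                    struct key_def *key_def);
int  vy_tuple_compare(const struct vy_stmt *a, const struct vy_stmt *b,
                      struct key_def *key_def);
int  vy_tuple_compare_with_key(const struct vy_stmt *tuple,
                               const struct vy_stmt *key,
                               struct key_def *key_def);

/* One comparator for all statement kinds: a couple of cheap branches
 * instead of a zoo of specialized entry points. */
static inline int
vy_stmt_compare(const struct vy_stmt *a, const struct vy_stmt *b,
                struct key_def *key_def)
{
    bool a_is_key = vy_stmt_is_key(a);
    bool b_is_key = vy_stmt_is_key(b);
    if (a_is_key && b_is_key)
        return vy_key_compare(a, b, key_def);
    if (a_is_key)
        return -vy_tuple_compare_with_key(b, a, key_def);
    if (b_is_key)
        return vy_tuple_compare_with_key(a, b, key_def);
    return vy_tuple_compare(a, b, key_def);
}
```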
-
Vladimir Davydov authored
We advance replica->gc state only when an xlog file is fully recovered, see recovery_close_log and relay_on_close_log_f. It may turn out that an xlog file is fully recovered but isn't closed properly by the relay (i.e. recovery_close_log isn't called), because the replica closes the connection for some reason (e.g. a timeout). If this happens, the old xlog file won't be removed when the replica reconnects, because we don't advance replica->gc state on reconnect, so the useless xlog file won't be removed until the next xlog file is relayed. This results in occasional replication/gc.test.lua failures. Fix this by updating replica->gc on reconnect with the current replica vclock. Closes #4034
-
Kirill Shcherbatov authored
In the case of a replace operation rollback, the function on_replace_trigger_rollback was called with an incorrect argument, as a result of which memory that was still in use was freed.
-
Kirill Shcherbatov authored
The set_system_triggers and erase routines in upgrade.lua did not process the _fk_constraint space.
-
- Mar 12, 2019
-
-
Kirill Shcherbatov authored
Reworked the memtx_tree class to use the memtx_tree_data structure as a tree node. This makes it possible to extend it with a service field to implement tuple hints and multikey indexes in subsequent patches. Needed for #3961
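A sketch of the node layout this rework enables; the service field shown in the comment is what the follow-up patches are expected to add, its exact name and type are assumptions:

```
struct tuple;

/*
 * Tree node: a structure instead of a bare 'struct tuple *'.
 * Wrapping the pointer costs nothing now, but leaves room for a
 * service field, e.g. a comparison hint or a multikey array index.
 */
struct memtx_tree_data {
    struct tuple *tuple;
    /* uint64_t hint; -- to be added by subsequent patches (#3961) */
};
```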
-
Kirill Shcherbatov authored
The http library intelligently sets the "Accept", "Connection", and "Keep-Alive" headers. However, when the user explicitly specified them in the headers section of the call options, they could be written to the HTTP request twice. We postponed the auto headers setup until right before the request is executed: now they are set only if they were not set by the user. Closes #3955
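A toy model of the described behaviour, under the assumption that "set only if not set by the user" boils down to a case-insensitive name lookup in the user-provided headers (the real code works on the curl request in C):

```
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Does the user-provided header list already contain 'name'? */
static bool
has_header(const char **user, int n, const char *name)
{
    for (int i = 0; i < n; i++) {
        if (strncasecmp(user[i], name, strlen(name)) == 0)
            return true;
    }
    return false;
}

int
main(void)
{
    const char *user[] = {"Connection: close"};
    const char *defaults[] = {"Accept: */*", "Connection: Keep-Alive",
                              "Keep-Alive: 115"};
    for (int i = 0; i < 3; i++) {
        char name[32];
        sscanf(defaults[i], "%31[^:]", name);
        if (!has_header(user, 1, name))
            printf("auto: %s\n", defaults[i]);
        else
            printf("user wins: skip %s\n", name); /* no duplicates */
    }
    return 0;
}
```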
-
- Mar 11, 2019
-
-
Nikita Pettik authored
When we allowed using a HAVING clause without GROUP BY (b40f2443), one possible combination was left untested:

    SELECT 1 FROM te40 HAVING SUM(s1) < 0;
    -- And SUM(s1) >= 0, i.e. the HAVING condition is false.

In other words, the result set contains no aggregates, but the HAVING clause does, and its condition is false. In this case, no byte code related to aggregate execution is emitted at all. Hence, the query above is compiled to a simple SELECT 1; and, unfortunately, returns the same result even when the condition under the HAVING clause is unsatisfied. To fix this behaviour, it is enough to tell the byte-code generator to analyze aggregates not only in ORDER BY clauses, but also in the HAVING clause. Closes #3932 Follow-up #2364
-
Nikita Pettik authored
Functions such as trim(), substr(), etc. should return a result with the collation derived from their arguments. So let's add a flag to the SQL function definition indicating that the collation of the first argument must be applied to the function's result. Using this flag, we can derive the appropriate collation in sql_expr_coll(). Part of #3932
-
Georgy Kirichenko authored
Form a separate transaction for local changes in the case of replication. This is important because we should be able to replicate such changes (e.g. made within an on_replace trigger) back. Otherwise, local changes would be incorporated into the originating transaction and skipped by the originator replica. Needed for #2798
-
- Mar 07, 2019
-
-
Vladimir Davydov authored
Fixes commit 8031071e ("Lightweight vclock_create and vclock_copy"). Closes #4033
-
Nikita Pettik authored
The BLOB column type is represented by the SCALAR field type in NoSQL terms. We attempted to emulate BLOB behaviour, but those efforts turned out not to be good enough, so we've decided to abandon them and replace BLOB with the SCALAR column type. SCALAR acts in the same way as it does in NoSQL: it is an aggregator type for the INTEGER, NUMBER, and STRING types, so a column declared with this type can contain values of these three (available in SQL) types. It is worth mentioning that the CAST operator in this case does nothing. Still, we consider BLOB values as entries encoded in msgpack with the MP_BIN format. To produce them, values should be represented in the BLOB form x'...' (e.g. x'000000'). What is more, there are two built-in functions returning BLOBs: randomblob() and zeroblob(). On the other hand, columns with the STRING NoSQL type don't accept BLOB values.
Closes #4019
Closes #4023

@TarantoolBot document Title: SQL types changes

There are a couple of recently introduced changes connected with SQL types.

Firstly, we've removed support of the DATE/TIME types from the parser due to the confusing behaviour of these types: they were mapped to the NUMBER NoSQL type and had nothing in common with generally accepted DATE/TIME types (like in other DBs). In addition, all built-in functions related to these types (julianday(), date(), time(), datetime(), current_time(), current_date(), etc.) are disabled until we reimplement TIME-like types as native NoSQL ones (see issue #3694).

Secondly, we've removed the CHAR type (i.e. the alias to VARCHAR and TEXT). The reason is that according to ANSI SQL, CHAR(len) must accept only strings whose length is exactly equal to the one given in the type definition. Obviously, we don't provide such checks now. The VARCHAR and TEXT types are still legal. For the same reason, we've removed the NUMERIC and DECIMAL types, which were aliases to the NUMBER NoSQL type. REAL, FLOAT, and DOUBLE still exist as aliases.

Finally, we've renamed the BLOB column type to SCALAR. We've decided that all our attempts to emulate BLOB behaviour using the SCALAR NoSQL type aren't good enough: without a native NoSQL BLOB type there will always be inconsistency, especially taking into account possible NoSQL-SQL interactions. In SQL, the SCALAR type works exactly the same way as in NoSQL: it can store values of the INTEGER, FLOAT, and TEXT SQL types at the same time. Also, with this change, the behaviour of the CAST operator has been slightly corrected: a cast to SCALAR now doesn't affect the type of the value at all. A couple of examples:

    CREATE TABLE t1 (a SCALAR PRIMARY KEY);
    INSERT INTO t1 VALUES ('1');
    SELECT * FROM t1 WHERE a = 1;
    -- []

The result is an empty set, since column "a" contains the string literal value '1', not the integer value 1.

    CAST(123 AS SCALAR);   -- Returns 123 (integer)
    CAST('abc' AS SCALAR); -- Returns 'abc' (string)

Note that in NoSQL, values of the BLOB type are defined as ones encoded in msgpack with the MP_BIN format. In SQL, there are still a few ways to force this format: declaring a literal in the "BLOB" format (x'...') or using one of the two built-in functions (randomblob() and zeroblob()). The TEXT and VARCHAR SQL types don't accept BLOB values:

    CREATE TABLE t (a TEXT PRIMARY KEY);
    INSERT INTO t VALUES (randomblob(5));
    ---
    - error: 'Tuple field 1 type does not match one required: expected string'
    ...

BLOB itself is going to be reimplemented in the scope of #3650.
-
Nikita Pettik authored
NUMERIC and DECIMAL were allowed to be specified as column types, but in fact they were just synonyms for the FLOAT type and mapped to the NUMBER Tarantool NoSQL type. So, we've decided to remove these types from the parser and bring them back when NUMERIC is implemented as a native type. Part of #4019
-
Nikita Pettik authored
Since no checks connected with the string length are currently performed, it might be misleading to allow specifying this type. Instead, users must rely on the VARCHAR type. Part of #4019
-
Nikita Pettik authored
Currently, there are no native (in Tarantool terms) types to represent time-like values. So, until we add an implementation of those types, it makes no sense to allow specifying them in a table definition. Note that previously they were mapped to the NUMBER type. For the same reason, all built-in functions connected with DATE/TIME are disabled as well. Part of #4019
-
Vladislav Shpilevoy authored
SWIM - Scalable Weakly-consistent Infection-style Process Group Membership Protocol. It consists of two components: events dissemination and failure detection, and stores in memory a table of known remote hosts - members. Also, some SWIM implementations have an additional component: anti-entropy - a periodical broadcast of a random subset of the members table.

The dissemination component spreads over the cluster the changes that occurred to members. Failure detection constantly searches for failed (dead) members. Anti-entropy just sends all known information about a member at once, so as to synchronize it among all other members in case some events were not disseminated (UDP problems).

Anti-entropy is the most vital component, since it can work without dissemination and failure detection, but they cannot work properly without it. Consider an example: two SWIM nodes, both alive. Nothing happens, so the events list is empty and only pings are sent periodically. Then a third node appears. It knows about one of the existing nodes. How should it learn about the other one? Sure, its known counterpart can try to notify the other one, but it is UDP, so this event can get lost. Anti-entropy is an extra simple component: it just piggybacks a random part of the members table on each regular round message. In the example above, the new node will sooner or later learn about the remaining node via anti-entropy messages from its known counterpart. This is why anti-entropy is the first implemented component. Part of #3234
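A toy model of the anti-entropy piggyback, assuming a flat member array; the real implementation packs members into UDP-sized packets and tracks incarnations per member:

```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct member {
    const char *uri;
    int incarnation;
};

/*
 * Piggyback a random subset of the members table onto a round
 * message: even if an event was lost, every peer's full record
 * is eventually delivered this way.
 */
static void
piggyback_anti_entropy(const struct member *table, int size, int batch)
{
    for (int i = 0; i < batch && size > 0; i++) {
        const struct member *m = &table[rand() % size];
        printf("  anti-entropy: %s (incarnation %d)\n",
               m->uri, m->incarnation);
    }
}

int
main(void)
{
    srand((unsigned)time(NULL));
    struct member table[] = {
        {"127.0.0.1:3301", 1},
        {"127.0.0.1:3302", 3},
        {"127.0.0.1:3303", 2},
    };
    printf("round message to a random peer, plus:\n");
    piggyback_anti_entropy(table, 3, 2);
    return 0;
}
```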
-
Kirill Shcherbatov authored
In order to give a user the ability to use the delimiter symbol within code, the real delimiter is the user-provided 'delim' plus "\n". Since telnet sends "\r\n" on a line break, the expression delim + "\n" could not be found in the sequence data + delim + "\r\n", so the delimiter feature did not work at all. Added a delim + "\r" check along with the delim + "\n" one, which solves the described problem and does not violate backward compatibility. Closes #2027
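A standalone model of the fixed check, assuming the "buffer ends with delimiter" logic described above (the real code lives in the console input handling):

```
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool
ends_with(const char *buf, size_t len, const char *suffix)
{
    size_t slen = strlen(suffix);
    return len >= slen && memcmp(buf + len - slen, suffix, slen) == 0;
}

/*
 * A chunk is complete when it ends with delim + "\n" or, for
 * telnet-style clients that send "\r\n", with delim + "\r" right
 * before the trailing "\n".
 */
static bool
chunk_is_complete(const char *buf, size_t len, const char *delim)
{
    if (len == 0 || buf[len - 1] != '\n')
        return false;
    len--; /* strip "\n" */
    if (ends_with(buf, len, delim))
        return true;
    /* Tolerate the "\r" telnet inserts before "\n". */
    return len > 0 && buf[len - 1] == '\r' &&
           ends_with(buf, len - 1, delim);
}

int
main(void)
{
    printf("%d\n", chunk_is_complete("foo!\n", 5, "!"));   /* 1 */
    printf("%d\n", chunk_is_complete("foo!\r\n", 6, "!")); /* 1: telnet */
    printf("%d\n", chunk_is_complete("foo!", 4, "!"));     /* 0 */
    return 0;
}
```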
-
Georgy Kirichenko authored
Remove the xstream dependency and use the direct box interface to apply all replication rows. This is a refactoring needed for transactional replication. Needed for #2798
-
Mergen Imeev authored
The module table.c is not used and should be removed.
-
- Mar 06, 2019
-
-
Vladimir Davydov authored
The test creates a space, but doesn't drop it, which leads to box-tap/on_schema_init failure:

    | box-tap/trigger_yield.test.lua [ pass ]
    | box-tap/on_schema_init.test.lua [ fail ]
    | Test failed! Output from reject file box-tap/on_schema_init.reject:
    | TAP version 13
    | 1..7
    | ok - on_schema_init trigger set
    | ok - system spaces are accessible
    | ok - before_replace triggers
    | ok - on_replace triggers
    | ok - set on_replace trigger
    | ok - on_schema_init trigger works
    |
    | Last 15 lines of Tarantool Log file [Instance "app_server"][/Users/travis/build/tarantool/tarantool/test/var/002_box-tap/on_schema_init.test.lua.tarantool.log]:
    | 2019-03-06 17:00:12.057 [87410] main/102/on_schema_init.test.lua F> Space 'test' already exists

Fix this.
-
Serge Petrenko authored
This patch introduces an on_schema_init trigger. The trigger may be set before box.cfg() is called and is fired during box.cfg() right after prototypes of system spaces, such as _space, are created. This makes it possible to set triggers on system spaces before any other non-system data is recovered. For example, it is possible to set an on_replace trigger on _space, which will work even during recovery. Part of #3159

@TarantoolBot document Title: document box.ctl.on_schema_init triggers

on_schema_init triggers are set before the first call to box.cfg() and are fired during box.cfg() before user data recovery starts. To set the trigger, say

```
box.ctl.on_schema_init(new_trig, old_trig)
```

where `old_trig` may be omitted. This will replace `old_trig` with `new_trig`. Such triggers let you set triggers on system spaces before recovery of any data, so that the triggers are fired even during recovery. For example, they make it possible to change a specific space's storage engine or make a replicated space replica-local on a freshly bootstrapped replica. If you want to change the storage engine of the space `space_name` to `vinyl`, you may say:

```
function trig(old, new)
    if new[3] == 'space_name' and new[4] ~= 'vinyl' then
        return new:update{{'=', 4, 'vinyl'}}
    end
end
```

Such a trigger may be set on `_space` as a `before_replace` trigger. Thanks to `on_schema_init` triggers, it will happen before any non-system spaces are recovered, so the trigger will work for all user-created spaces:

```
box.ctl.on_schema_init(function()
    box.space._space:before_replace(trig)
end)
```

Note that the above steps are done before the initial `box.cfg{}` call. Otherwise the spaces will already be recovered by the time you set any triggers. Now you can say `box.cfg{replication='master_uri', ...}`, and the replica will have the space `space_name` with the same contents as on the master, but on the `vinyl` storage engine.
-
Vladislav Shpilevoy authored
SWIM wants to allow binding to zero ports so that the kernel can choose any free port automatically. This is needed mainly for tests. A zero port means that the real port is known only after bind() has been called, and getsockname() should be used to get it. SWIM uses the sio library for such low-level API calls, which is why that function is added to sio. Needed for #3234
-
Kirill Shcherbatov authored
Before commit d9f82b17 ("More than one row in fixheader. Zstd compression"), xrow_header_decode treated everything up to 'end' as the packet body, while currently it allows a packet to end before 'end'. iproto_msg_decode may receive invalid msgpack, but it still assumes that xrow_header_decode sets an error in such a case and uses an assert to test it, but this is not so. Introduced a new boolean flag to control the routine's behaviour. When the flag is set, xrow_header_decode should raise a 'packet body' error unless the packet ends exactly at 'end'. @locker: renamed ensure_package_read to end_is_exact; fixed comments. Closes #3900
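A sketch of the resulting contract; the prototype is simplified, check xrow.h for the real one:

```
#include <stdbool.h>

struct xrow_header;

/*
 * Decode a packet from [*pos, end). If end_is_exact is true, a
 * packet that does not end exactly at 'end' gets a "packet body"
 * error set and -1 returned, so callers like iproto_msg_decode
 * may rely on the diagnostic being there for invalid msgpack.
 */
int
xrow_header_decode(struct xrow_header *header, const char **pos,
                   const char *end, bool end_is_exact);
```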
-
- Mar 05, 2019
-
-
Vladimir Davydov authored
A Vinyl transaction may yield while having a non-empty write set. This opens a time window for the instance to switch to read-only mode. Since we check ro flag only before executing a DML request, the transaction would successfully commit in such a case, breaking the assumption that no writes are possible on an instance after box.cfg{read_only=true} returns. In particular, this breaks master-replica switching logic. Fix this by aborting all local rw transactions before switching to read-only mode. Note, remote rw transactions must not be aborted, because they ignore ro flag. Closes #4016
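A minimal model of the fix, assuming simplified types: walk the list of rw transactions and abort only the local ones, since remote (applier) transactions deliberately ignore the ro flag:

```
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a Vinyl transaction on the writers list. */
struct vy_tx {
    bool is_applier;    /* remote tx replicated from the master */
    bool is_aborted;    /* commit will fail with a read-only error */
    struct vy_tx *next; /* stand-in for the real list link */
};

/* Called right before box.cfg{read_only=true} returns. */
static void
tx_manager_abort_writers(struct vy_tx *writers)
{
    for (struct vy_tx *tx = writers; tx != NULL; tx = tx->next) {
        if (!tx->is_applier)
            tx->is_aborted = true;
    }
}
```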
-
Vladimir Davydov authored
We will use this callback to abort rw transactions in Vinyl when an instance is switched to read-only mode. Needed for #4016
-
Vladimir Davydov authored
Currently, we add a transaction to the list of writers when executing a DML request, i.e. in vy_tx_set. The problem is a transaction can yield on read before calling vy_tx_set, e.g. to check a uniqueness constraint, which opens a time window when a transaction is not yet on the list, but it will surely proceed to DML after it continues execution. If we need to abort writers in this time window, we'll miss it. To prevent this, let's add a transaction to the list of writers in vy_tx_begin_statement. Note, after this patch, when a transaction is aborted for DDL, it may have an empty write set - it happens if tx_manager_abort_writers is called between vy_tx_begin_statement and vy_tx_set. Hence we have to remove the corresponding assertion from tx_manager_abort_writers. Needed for #4016
-
Vladimir Davydov authored
Rename vy_tx_rollback_to_savepoint to vy_tx_rollback_statement and vy_tx_savepoint to vy_tx_begin_statement, because soon we will do some extra work there. Needed for #4016
-
- Mar 04, 2019
-
-
Stanislav Zudin authored
Adds collation analysis to the construction of a composite key for index tuples. The key parts of a secondary index consist of the parts defined for the index itself combined with the parts defined for the primary key, with duplicate parts ignored. But the search for duplicates didn't take collation into consideration: if a non-unique secondary index contained primary key columns, the corresponding parts of the primary key were omitted even though their collations could differ. This caused an issue. @locker: comments, renames. Closes #3537
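A sketch of the corrected duplicate test, with simplified types: a primary-key part is skipped only if the secondary index already has a part with both the same field and the same collation:

```
#include <stdbool.h>
#include <stdint.h>

struct coll;

struct key_part {
    uint32_t fieldno;
    struct coll *coll; /* NULL means binary collation */
};

/* Does the part list already contain an equivalent part? */
static bool
key_def_contains_part(const struct key_part *parts, uint32_t part_count,
                      const struct key_part *part)
{
    for (uint32_t i = 0; i < part_count; i++) {
        if (parts[i].fieldno == part->fieldno &&
            parts[i].coll == part->coll) /* collation now matters */
            return true;
    }
    return false;
}
```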
-
Cyrill Gorcunov authored
When building "tags" target we scan the whole working directory which is redundant. In particular .git,.pc,patches directories should not be scanned for sure.
-