- Jan 29, 2019
-
-
Mergen Imeev authored
Currently, function sql_response_dump() puts data into an already created map. Moving the map creation to sql_response_dump() simplifies the code and allows us to use sql_response_dump() as one of the port_sql methods. Needed for #3505
-
Vladimir Davydov authored
The buffer is defined in a nested {} block. This gives the compiler the liberty to overwrite it once the block has been executed, which would be incorrect since the content of the buffer is used outside the {} block. This results in box/hash and vinyl/bloom test failures when tarantool is compiled in release mode. Fix this by moving the buffer definition to the beginning of the function. Fixes commit 0dfd99c4 ("tuple: fix hashing of integer numbers").
-
Vladimir Davydov authored
Integer numbers stored in tuples as MP_FLOAT/MP_DOUBLE are hashed differently from integer numbers stored as MP_INT/MP_UINT. This breaks select() for memtx hash indexes and vinyl indexes (the latter use bloom filters). Fix this by converting MP_FLOAT/MP_DOUBLE to MP_INT/MP_UINT before hashing if the value can be stored as an integer. This is consistent with the behavior of tuple comparators, which treat MP_FLOAT and MP_INT as equal in case they represent the same number. Closes #3907
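The normalization described above can be sketched as follows; this is a minimal illustration, not Tarantool's actual tuple_hash_field() code, and hash_u64, hash_int, and hash_number are hypothetical names:

```c
#include <stdint.h>

/* splitmix64 finalizer: a simple, invertible 64-bit mixer. */
static uint64_t
hash_u64(uint64_t x)
{
	x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27; x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Hash an integer value the way an MP_INT/MP_UINT field would be hashed. */
static uint64_t
hash_int(int64_t v)
{
	return hash_u64((uint64_t)v);
}

/* Hash an MP_FLOAT/MP_DOUBLE value: if it holds a whole number that
 * fits in int64, hash it exactly like the integer encoding, so that
 * 1.0 and 1 land in the same hash bucket. */
static uint64_t
hash_number(double d)
{
	if (d >= -9223372036854775808.0 && d < 9223372036854775808.0 &&
	    (double)(int64_t)d == d)
		return hash_int((int64_t)d);
	/* Not representable as an integer: hash the raw bit pattern. */
	union { double d; uint64_t u; } u = { .d = d };
	return hash_u64(u.u);
}
```

With this normalization, hash_number(1.0) equals hash_int(1), matching the comparator behavior the commit message describes.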
-
- Jan 25, 2019
-
-
Vladimir Davydov authored
In contrast to TX thread, WAL thread performs garbage collection synchronously, blocking all concurrent writes. We expected file removal to happen instantly so we didn't bother to offload this job to eio threads. However, it turned out that sometimes removal of a single xlog file can take 50 or even 100 ms. If there are a dozen files to be removed, this means a second delay and 'too long WAL write' warnings. To fix this issue, let's make WAL garbage collection fully asynchronous. Simply submit a job to eio and assume it will complete successfully sooner or later. This means that if unlink() fails for some reason, we will log an error and never retry file removal until the server is restarted. Not a big deal. We can live with it assuming unlink() doesn't normally fail. Closes #3938
-
Vladimir Davydov authored
We build the checkpoint list from the list of memtx snap files. So to ensure that it is always possible to recover from any checkpoint present in box.info.gc() output, we abort garbage collection if we fail to unlink a snap file. This introduces extra complexity to the garbage collection code, which makes it difficult to make WAL file removal fully asynchronous. Actually, it looks like we are being way too overcautious here, because unlink() doesn't normally fail so an error while removing a snap file is highly unlikely to occur. Besides, even if it happens, it still won't be critical, because we never delete the last checkpoint, which is usually used for backups/recovery. So let's simplify the code by removing that check. Needed for #3938
-
Kirill Yukhin authored
Since under heavy load with SQL queries ephemeral spaces might be used extensively, it is possible to run out of tuple_formats for such spaces. This occurs because a tuple_format is not immediately deleted when an ephemeral space is dropped. Its removal is postponed instead and triggered only when tuple memory is exhausted. As far as there's no way to alter an ephemeral space's format, let's re-use formats for multiple ephemeral spaces when they are identical. Closes #3924
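The reuse idea can be sketched like this; the structures and names below are illustrative stand-ins for the real tuple_format machinery, with a single field count standing in for a full format signature:

```c
#include <stddef.h>
#include <stdint.h>

struct ephemeral_format {
	uint32_t field_count;	/* stands in for a full format signature */
	int refs;
};

enum { FORMAT_CACHE_SIZE = 16 };
static struct ephemeral_format format_cache[FORMAT_CACHE_SIZE];
static size_t format_cache_len;

/* Return a format for a new ephemeral space: an identical cached
 * format is reused (refcounted) instead of consuming a new format id. */
static struct ephemeral_format *
ephemeral_format_new(uint32_t field_count)
{
	for (size_t i = 0; i < format_cache_len; i++) {
		if (format_cache[i].field_count == field_count) {
			format_cache[i].refs++;
			return &format_cache[i];
		}
	}
	if (format_cache_len == FORMAT_CACHE_SIZE)
		return NULL; /* ran out of formats */
	struct ephemeral_format *f = &format_cache[format_cache_len++];
	f->field_count = field_count;
	f->refs = 1;
	return f;
}
```

Reuse is safe here precisely because, as the commit notes, an ephemeral space's format can never be altered after creation.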
-
- Jan 24, 2019
-
-
Kirill Yukhin authored
This is a trivial patch which sets the error kind if an ephemeral space cannot be created due to Tarantool's backend (e.g. there is no more memory or no more formats available).
-
Kirill Yukhin authored
Before the patch, when an ephemeral space was created, the is_temporary flag was set after the space was actually created, which in turn led to the corresponding flag of tuple_format being set to `false`. So heavy load using ephemeral spaces (almost any SQL query) combined with snapshotting could lead to OOM, since tuples of ephemeral spaces were not marked as temporary and were not gc-ed. This patch sets the flag in the space definition.
-
Kirill Yukhin authored
There were three extra fields of tuple_format which were set up after it was created. Fix that by extending the tuple_format constructor with three new arguments: engine, is_temporary, exact_field_count.
-
Vladimir Davydov authored
Currently, if we encounter an unknown key while parsing a .run, .index, or .vylog file we raise an error. As a result, if we add a new key to either of those entities, we will break forward compatibility although there's actually no reason for that. To avoid that, let's silently ignore unknown keys, as we do in case of xrow header keys.
-
Vladimir Davydov authored
Upon LSM tree dump completion, we iterate over all ranges of the LSM tree to update their priority and the position in the compaction heap. Since typically we need to update all ranges, we better use update_all heap method instead of updating the heap entries one by one.
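The gain comes from classic heap theory: rebuilding a heap bottom-up costs O(n), while updating n entries one by one costs O(n log n). A minimal sketch of the update_all idea on a plain int max-heap (not the actual vinyl heap API):

```c
#include <stddef.h>

static void
sift_down(int *h, size_t n, size_t i)
{
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, m = i;
		if (l < n && h[l] > h[m]) m = l;
		if (r < n && h[r] > h[m]) m = r;
		if (m == i)
			return;
		int tmp = h[i]; h[i] = h[m]; h[m] = tmp;
		i = m;
	}
}

/* heap_update_all: after every entry's priority has changed, restore
 * the heap invariant in one O(n) bottom-up pass (Floyd's heap
 * construction) instead of sifting entries individually. */
static void
heap_update_all(int *h, size_t n)
{
	for (size_t i = n / 2; i-- > 0; )
		sift_down(h, n, i);
}
```

This matches the dump-completion case well because typically every range's priority changes at once.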
-
Nikita Pettik authored
SQLite discards the type and collation of the IN operator when it comes with only one operand. This leads to different results for a straight comparison using the '=' operator and IN:
SELECT x FROM t1 WHERE x IN (1.0); -- Result is empty set
SELECT x FROM t1 WHERE x = 1.0; -- ['1']
Let's remove this strange ignorance and always take into consideration the types and collations of operands. Closes #3934
-
Alexander Turenko authored
* Fixed wait_vclock() LSN problem with nil handling (#3895). * Enabled HangWatcher under --long. * Show result file for a hang test once at the end. * Show diff against a result file for a hung test.
-
- Jan 16, 2019
-
-
Vladimir Davydov authored
In order to estimate space amplification of a vinyl database, we need to know the size of data stored at the last LSM tree level. So this patch adds such a counter both per index and globally. Per index it is reported under disk.last_level, in rows, bytes, bytes after compression, and pages, just like any other disk counter. Globally it is reported in bytes only, under disk.data_compacted. Note, to be consistent with disk.data, it doesn't include the last level of secondary indexes.
-
Vladimir Davydov authored
This patch adds dump_time and compaction_time to the scheduler section of global vinyl statistics and disk.dump.time and disk.compaction.time to per-index statistics. They report the total time spent doing dump and compaction tasks, respectively, and can be useful for estimating average disk write rate, which is required for compaction-aware throttling.
-
Vladimir Davydov authored
This patch adds the following new fields to box.stat.vinyl():
scheduler.tasks_inprogress - the number of currently running tasks
scheduler.tasks_completed - the number of successfully completed tasks
scheduler.tasks_failed - the number of aborted tasks
tasks_failed can be useful for monitoring disk write errors, while tasks_inprogress and tasks_completed can shed light on worker thread pool efficiency.
-
Vladimir Davydov authored
Indexes of the same space share a memory level so we should account them to box.stat.vinyl().scheduler.dump_input once per space dump, not each time an index is dumped.
-
Vladimir Davydov authored
This patch adds scheduler.dump_count to box.stat.vinyl(), which shows the number of memory level dumps that have happened since the instance startup or box.stat.reset(). It's useful for estimating an average size of a single memory dump, which in turn can be used for calculating LSM tree fanout.
-
Vladimir Davydov authored
Although it's convenient to maintain dump/compaction input/output metrics in vy_lsm_env, semantically it's incorrect, as those metrics characterize the scheduler, not the LSM environment. Also, we can't easily extend those stats with e.g. the number of completed dumps or the number of tasks in progress, because those are only known to the scheduler. That said, let's introduce a 'scheduler' section in box.stat.vinyl() and move dump/compaction stats from 'disk' to the new section. Let's also move the stats accounting from vy_lsm.c to vy_scheduler.c. The 'disk' section now stores only the size of data and index on disk and no cumulative statistics, which makes it similar to the 'memory' section. Note, this patch flattens the stats (disk.compaction.input is moved to scheduler.compaction_input and so forth), because all other global stats are reported without using nested tables.
-
- Jan 15, 2019
-
-
Vladimir Davydov authored
During local recovery we may encounter an LSM tree marked as dropped. This means that the LSM tree was dropped before restart and hence will be deleted before recovery completion. There's no need to add such trees to the vinyl scheduler - it looks confusing and can potentially result in mistakes when the code gets modified.
-
Vladimir Davydov authored
Currently, we bump range->version in vy_scheduler.c. This looks like an encapsulation violation, and may easily result in an error (as we have to be cautious to inc range->version whenever we modify a range). That said, let's bump range version right in vy_range.c.
-
Vladimir Davydov authored
compact_input sounds confusing, because 'compact' works as an adjective here. Saving 3 characters per variable/stat name related to compaction doesn't justify this. Let's rename 'compact' to 'compaction' both in stats and in the code.
-
Vladimir Davydov authored
'in' is a reserved keyword in Lua, so using 'in' as a map key was a bad decision - one has to access it with [] rather than simply with dot. Let's rename 'in' to 'input' and 'out' to 'output' both in the output and in the code.
-
Vladimir Davydov authored
This test is huge and takes a long time to complete. Let's move ddl, tx, and stat related stuff to separate files.
-
Vladimir Davydov authored
The test was called 'info' in the first place, because back when it was introduced vinyl statistics were reported by 'info' method. Today, stats are reported by 'stat' so let's rename the test as well to conform.
-
- Jan 14, 2019
-
-
Konstantin Osipov authored
Add box.info.gc().checkpoint_is_in_progress, which is true when there is an ongoing checkpoint/snapshot and false otherwise. Closes gh-3935
@TarantoolBot document
Title: box.info.gc().checkpoint_is_in_progress
Extend the box.info.gc() documentation with a new member - checkpoint_is_in_progress, which is true if there is an ongoing checkpoint, false otherwise.
-
- Jan 10, 2019
-
-
Mergen Imeev authored
Not really critical as obuf_alloc() fails only on OOM, i.e. never in practice.
-
Kirill Shcherbatov authored
Introduced a new OP_Update opcode executing Tarantool's native UPDATE operation. In case of UPDATE OR REPLACE we can't use the new OP_Update, as OR REPLACE has complex SQL-specific semantics:
CREATE TABLE tj (s1 INT PRIMARY KEY, s2 INT);
INSERT INTO tj VALUES (1, 3), (2, 4), (3, 5);
CREATE UNIQUE INDEX i ON tj (s2);
SELECT * FROM tj;
[1, 3], [2, 4], [3, 5]
UPDATE OR REPLACE tj SET s2 = s2 + 1;
SELECT * FROM tj;
[1, 4], [3, 6]
I.e. the [1, 3] tuple is updated to [1, 4] and has replaced the tuple [2, 4]. In SQL this logic is implemented as preventive deletion of tuples from all corresponding indexes. The other significant change is that primary key UPDATE is forbidden. It was possible to deal with it the same way as with the OR REPLACE specifier, but we need an atomic UPDATE step for the #3691 ticket to support the OR IGNORE/OR ABORT/OR FAIL specifiers. Reworked tests to avoid primary key UPDATE where possible. Closes #3850
-
Kirill Shcherbatov authored
Introduced new sql_vdbe_mem_encode_tuple and mpstream_encode_vdbe_mem routines to perform Vdbe memory to msgpack encoding on a region without a preliminary size estimation call. Got rid of the sqlite3VdbeMsgpackRecordLen and sqlite3VdbeMsgpackRecordPut functions that became useless. This approach also resolves the invalid size estimation problem (#3035), because estimation is not required anymore. Needed for #3850 Closes #3035
-
Kirill Shcherbatov authored
The UPDATE operation doesn't fail when an fkey self-reference condition is unsatisfied and the table has other records. To avoid raising an error where it is not necessary, Vdbe inspects the parent table with OP_Found. This branch is not valid for a self-referencing table, since it is looking for a tuple affected by the UPDATE operation, and since the foreign key has already detected a conflict, it must be raised. Example:
CREATE TABLE t6(a INTEGER PRIMARY KEY, b TEXT, c INT, d TEXT, UNIQUE(a, b), FOREIGN KEY(c, d) REFERENCES t6(a, b));
INSERT INTO t6 VALUES(1, 'a', 1, 'a');
INSERT INTO t6 VALUES(100, 'one', 100, 'one');
UPDATE t6 SET c = 1, d = 'a' WHERE a = 100; -- fk conflict must be raised here
Needed for #3850 Closes #3918
-
Kirill Shcherbatov authored
Function sql_vdbe_mem_alloc_region(), which constructs the value of a Vdbe Mem object, used to change only type-related flags. However, it is also required to erase other flags (for instance, flags related to allocation policy: static, dynamic, etc.), since their combination may be invalid. In a typical Vdbe scenario, OP_MakeRecord and OP_RowData release memory with sqlite3VdbeMemRelease() and allocate on region with sql_vdbe_mem_alloc_region(). An integrity assert based on sqlite3VdbeCheckMemInvariants() would fire here due to an incompatible combination of flags: MEM_Static | (MEM_Blob | MEM_Ephem). Needed for #3850
-
Kirill Shcherbatov authored
Removed the Vdbe code generation performing type checks from vdbe_emit_constraint_checks, as it is useless now that strict types have been introduced.
-
- Jan 09, 2019
-
-
Georgy Kirichenko authored
Reclaim the memory used while recovering a previous page, not only the last one. There is no specific test case. Fixes #3920
-
- Jan 05, 2019
-
-
Alexander Turenko authored
It was caught by ASAN at build time (lemon is executed to generate parse.[ch]), so tarantool couldn't be built with -DENABLE_ASAN=ON.
-
- Dec 29, 2018
-
-
Kirill Shcherbatov authored
Reworked tuple_init_field_map to fill a local bitmap and compare it with a template required_fields bitmap containing information about required fields. Each field is mapped into the bitmap by field->id, a unique field identifier. This approach to checking the required fields will work even after the introduction of JSON paths, when the field tree becomes multilevel. @locker: massive code refactoring, comments. Needed for #1012
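The bitmap comparison can be sketched as follows; the names and the 64-field limit are illustrative only, not the actual tuple_format code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mark a field as seen while parsing the tuple, keyed by its flat
 * field id (works for nested JSON-path fields too, since ids are
 * flat). Limited to 64 fields here for brevity. */
static inline void
bitmap_set(uint64_t *bm, uint32_t field_id)
{
	*bm |= 1ULL << field_id;
}

/* All required fields are present iff no required bit is missing
 * from the bits actually seen. */
static inline bool
required_fields_ok(uint64_t seen, uint64_t required)
{
	return (required & ~seen) == 0;
}
```

A single AND-NOT over the two bitmaps replaces any per-field tree walk, which is what makes the check cheap regardless of nesting depth.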
-
Kirill Shcherbatov authored
@locker: comments. Needed for #1012
-
Nikita Pettik authored
Closes #3906
-
Kirill Shcherbatov authored
Allowed SELECT requests that have a HAVING clause without GROUP BY. This is possible when both the left and right parts of the request have an aggregate function or a constant value. Closes #2364.
@TarantoolBot document
Title: HAVING without GROUP BY clause
A query with a HAVING clause should also have a GROUP BY clause. If you omit GROUP BY, all the rows not excluded by the WHERE clause are returned as a single group. Because no grouping is performed between the WHERE and HAVING clauses, they cannot act independently of each other. HAVING acts like WHERE because it affects the rows in a single group rather than groups, except the HAVING clause can still use aggregates. HAVING without GROUP BY is not supported for SELECT from multiple tables. 2011 SQL standard "Part 2: Foundation" 7.10 <having clause> p.381
Example:
SELECT MIN(s1) FROM te40 HAVING SUM(s1) > 0; -- is valid
SELECT 1 FROM te40 HAVING SUM(s1) > 0; -- is valid
SELECT NULL FROM te40 HAVING SUM(s1) > 0; -- is valid
SELECT date() FROM te40 HAVING SUM(s1) > 0; -- is valid
-
Vladimir Davydov authored
xlog and xlog_cursor must be opened and closed in the same thread, because they use cord's slab allocator. Follow-up #3910
-
Vladimir Davydov authored
An xlog_cursor created and used by a relay via recovery context is destroyed by the main thread once the relay thread has exited. This is incorrect, because xlog_cursor uses cord's slab allocator and therefore must be destroyed in the same thread it was created by, otherwise we risk getting a use-after-free bug. So this patch moves recovery_delete() invocation to the end of the relay thread routine. No test is added, because our existing tests already cover this case - crashes don't usually happen, because we are lucky. The next patch will add some assertions to make the bug 100% reproducible. Closes #3910
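One way to make such bugs immediately reproducible is an explicit same-thread ownership check on the cursor; this is an illustrative sketch, not the actual cord/xlog code:

```c
#include <pthread.h>
#include <stdbool.h>

struct cursor {
	pthread_t owner;	/* thread that created the cursor */
	bool open;
};

static void
cursor_create(struct cursor *c)
{
	c->owner = pthread_self();
	c->open = true;
}

/* Must run in the thread that created the cursor, because a
 * thread-local allocator (like the cord slab allocator) owns its
 * memory. Returns false instead of silently corrupting memory. */
static bool
cursor_destroy(struct cursor *c)
{
	if (!pthread_equal(c->owner, pthread_self()))
		return false;
	c->open = false;
	return true;
}
```

In the real code an assertion at this point is what the next patch adds to turn a lucky use-after-free into a deterministic failure.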
-