- Feb 04, 2019
-
-
Konstantin Osipov authored
-
Vladimir Davydov authored
The patch adds the missing -fPIC option for clang, without which the msgpuck library might fail to compile.
-
Kirill Shcherbatov authored
Implemented a more convenient interface for creating an index by JSON path. Instead of specifying a fieldno and a relative path, it is now possible to pass the full JSON path to the data. Closes #1012
@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure. When this structure is consistent across the whole space, you can create an index by JSON path. Example:
```
s = box.schema.space.create('sample')
format = {{'id', 'unsigned'}, {'data', 'map'}}
s:format(format)
-- explicit JSON index creation
age_idx = s:create_index('age', {parts = {{2, 'number', path = "age"}}})
-- user-friendly syntax for JSON index creation
parts = {{'data.FIO["fname"]', 'str'}, {'data.FIO["sname"]', 'str'},
         {'data.age', 'number'}}
info_idx = s:create_index('info', {parts = parts})
s:insert({1, {FIO={fname="James", sname="Bond"}, age=35}})
```
-
Kirill Shcherbatov authored
tuple_field_by_part looks up the tuple_field corresponding to the given key part in tuple_format in order to quickly retrieve the offset of indexed data from the tuple field map. For regular indexes this operation is blazing fast; for JSON indexes, however, it is not, as we have to parse the path to the data and then do multiple lookups in a JSON tree. Since tuple_field_by_part is used by comparators, we should strive to make this routine as fast as possible for all kinds of indexes.

This patch introduces an optimization that is supposed to make tuple_field_by_part for JSON indexes as fast as it is for regular indexes in most cases. We do that by caching the offset slot right in key_part. There's a catch here, however: we create a new format whenever an index is dropped or created, and we don't reindex old tuples. As a result, there may be several generations of tuples in the same space, all using different formats, while there's only one key_def used for comparison. To overcome this problem, we introduce the notion of a tuple_format epoch. This is a counter incremented each time a new format is created. We store it in tuple_format and key_def, and we only use the offset slot cached in a key_def if its epoch coincides with the epoch of the tuple format. If they don't match, we look up the tuple_field as before and then update the cached value together with the epoch of the tuple format. Part of #1012
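A minimal Lua sketch of the idea (all the field and function names here - format_epoch, offset_slot, lookup_slot - are illustrative, not the actual C identifiers): the cached slot is trusted only while the key_def and the tuple's format share the same epoch.
```
-- Toy model of the fast/slow path of tuple_field_by_part.
local function field_by_part(tuple, part, format)
    if part.format_epoch ~= format.epoch then
        -- Slow path: parse the JSON path and walk the format's field
        -- tree, then cache the resulting offset slot and the epoch.
        part.offset_slot = format.lookup_slot(part.path)
        part.format_epoch = format.epoch
    end
    -- Fast path: a single field-map lookup by the cached slot.
    return tuple.field_map[part.offset_slot]
end
```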
-
Kirill Shcherbatov authored
Introduced a has_json_path flag for the compare, hash and extract function templates (which are really hot) to make it possible to skip looking at the path field for flat indexes without any JSON paths. Part of #1012
-
Kirill Shcherbatov authored
The new JSON indexes allow indexing document content. First, introduced the new key_part fields path and path_len representing the JSON path string specified by the user. The modified tuple_format_use_key_part routine constructs the corresponding tuple_field chain in the tuple_format::fields tree leading to the indexed data. The resulting tree is used for type checking and for allocating offset slots for indexed fields. Then, the refined tuple_init_field_map routine parses the tuple msgpack in depth using a stack allocated on the region and initializes the field map with the corresponding tuple_format::field, if any. Finally, to size memory allocations for vinyl secondary keys restored from extracted keys loaded from disk without a fields-tree traversal, introduced the format::min_tuple_size field - the size of a tuple in this format as if all its leaf fields were zero.
Example: to create a new JSON index, specify the path to the document data as a part of key_part:
```
parts = {{3, 'str', path = '.FIO.fname', is_nullable = false}}
idx = s:create_index('json_idx', {parts = parts})
idx:select("Ivanov")
```
Part of #1012
-
Kirill Shcherbatov authored
Introduced a new function, tuple_field_raw_by_path, used to get tuple fields by field index and a relative JSON path. This routine uses the tuple_format's field_map when possible. It will be further extended to use JSON indexes. The old tuple_field_raw_by_path routine, which used to work with full JSON paths, is renamed to tuple_field_raw_by_full_path. Its return value type is changed to const char * because the other similar functions, tuple_field_raw and tuple_field_by_part_raw, use this convention. Got rid of reporting the error position for the 'invalid JSON path' error in lbox_tuple_field_by_path: we can't extend the other routines to behave this way, which would make the API inconsistent; moreover, such errors are useless and confusing. Needed for #1012
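On the Lua side this is what backs tuple field access by a full JSON path; a small console-style usage sketch (assuming the 2.x path-access syntax):
```
t = box.tuple.new({1, {FIO = {fname = 'James', sname = 'Bond'}}})
t['[2].FIO.fname'] -- 'James': field number 2, then down the document
```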
-
Kirill Shcherbatov authored
The msgpack dependency has been updated because the new version introduces the new mp_stack class, which we will use to parse tuples without recursion when initializing the field map. Needed for #1012
-
- Jan 30, 2019
-
-
Serge Petrenko authored
Move the call to tarantool_free() to the end of main(). We needn't call atexit() at all anymore: since we've implemented on_shutdown triggers and patched os.exit(), control always reaches the call to tarantool_free() unless we are exiting due to a fatal signal (in which case no cleanup routines are called anyway).
-
Serge Petrenko authored
Make os.exit() call tarantool_exit(), just like the signal handler does. Now the only case when on_shutdown triggers are not run is when a fatal signal is received. Closes #1607
@TarantoolBot document
Title: Document box.ctl.on_shutdown triggers
on_shutdown triggers may be set similarly to space:on_replace triggers:
```
box.ctl.on_shutdown(new_trigger, old_trigger)
```
The triggers will be run when tarantool exits due to receiving one of the signals `SIGTERM`, `SIGINT`, `SIGHUP`, or when the user executes `os.exit()`. Note that the triggers will not be run if tarantool receives a fatal signal: `SIGSEGV`, `SIGABRT` or any other signal causing immediate program termination.
-
Serge Petrenko authored
Add on_shutdown triggers, which are run by a preallocated fiber on shutdown, and make it possible to register them via box.ctl.on_shutdown(). Make use of the new triggers: a dedicated on_shutdown trigger now breaks the event loop instead of doing it explicitly from the signal handler. This trigger is run last, so that all other on_shutdown triggers may yield, sleep and so on. Also make sure we can register lbox_triggers without a push_event function in case we don't need one. Part of #1607
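A minimal usage sketch (the trigger body is illustrative): because the triggers are run by a fiber rather than from the signal handler, they are free to yield.
```
local fiber = require('fiber')
local log = require('log')

box.ctl.on_shutdown(function()
    log.info('running cleanup before exit')
    fiber.sleep(0.1) -- yielding is fine: triggers run in a fiber
end)
```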
-
Stanislav Zudin authored
The "box.sql.execute('values(blob)')" causes an accert in the expression processing, because the parser doesn't distinguish the keyword "BLOB" from the binary value (in the form X'hex'). This fix adds an additional checks in the SQL grammar. Thus the expressions such as "VALUES(BLOB)", "SELECT FLOAT" and so on are treated as a syntax errors. Closes #3888
-
- Jan 29, 2019
-
-
Mergen Imeev authored
Currently, the function sql_response_dump() puts data into an already created map. Moving the map creation into sql_response_dump() simplifies the code and allows us to use sql_response_dump() as one of the port_sql methods. Needed for #3505
-
Vladimir Davydov authored
The buffer is defined in a nested {} block. This gives the compiler the liberty to overwrite it once the block has been executed, which would be incorrect since the content of the buffer is used outside the {} block. This results in box/hash and vinyl/bloom test failures when tarantool is compiled in release mode. Fix this by moving the buffer definition to the beginning of the function. Fixes commit 0dfd99c4 ("tuple: fix hashing of integer numbers").
-
Vladimir Davydov authored
Integer numbers stored in tuples as MP_FLOAT/MP_DOUBLE are hashed differently from integer numbers stored as MP_INT/MP_UINT. This breaks select() for memtx hash indexes and vinyl indexes (the latter use bloom filters). Fix this by converting MP_FLOAT/MP_DOUBLE to MP_INT/MP_UINT before hashing if the value can be stored as an integer. This is consistent with the behavior of tuple comparators, which treat MP_FLOAT and MP_INT as equal in case they represent the same number. Closes #3907
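A minimal reproducer sketch for the memtx hash case (space and index names are illustrative):
```
s = box.schema.space.create('test')
s:create_index('pk', {type = 'hash', parts = {{1, 'number'}}})
s:insert({1})           -- the key is stored as MP_UINT
s.index.pk:select(1.0)  -- MP_DOUBLE key: used to hash differently and
                        -- miss the tuple; now returns [1]
```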
-
- Jan 25, 2019
-
-
Vladimir Davydov authored
In contrast to the TX thread, the WAL thread performs garbage collection synchronously, blocking all concurrent writes. We expected file removal to happen instantly, so we didn't bother to offload this job to eio threads. However, it turned out that sometimes removal of a single xlog file can take 50 or even 100 ms. If there are a dozen files to be removed, this means a second of delay and 'too long WAL write' warnings. To fix this issue, let's make WAL garbage collection fully asynchronous: simply submit a job to eio and assume it will successfully complete sooner or later. This means that if unlink() fails for some reason, we will log an error and never retry file removal until the server is restarted. Not a big deal - we can live with it, assuming unlink() doesn't normally fail. Closes #3938
-
Vladimir Davydov authored
We build the checkpoint list from the list of memtx snap files. So to ensure that it is always possible to recover from any checkpoint present in box.info.gc() output, we abort garbage collection if we fail to unlink a snap file. This introduces extra complexity to the garbage collection code, which makes it difficult to make WAL file removal fully asynchronous. Actually, it looks like we are being way too overcautious here, because unlink() doesn't normally fail so an error while removing a snap file is highly unlikely to occur. Besides, even if it happens, it still won't be critical, because we never delete the last checkpoint, which is usually used for backups/recovery. So let's simplify the code by removing that check. Needed for #3938
-
Kirill Yukhin authored
Since, under heavy load with SQL queries, ephemeral spaces may be used extensively, it is possible to run out of tuple_formats for such spaces. This occurs because a tuple_format is not deleted immediately when its ephemeral space is dropped; its removal is postponed instead and triggered only when tuple memory is exhausted. As there's no way to alter an ephemeral space's format, let's re-use formats for multiple ephemeral spaces in case they're identical. Closes #3924
-
- Jan 24, 2019
-
-
Kirill Yukhin authored
This is a trivial patch which sets the error kind if an ephemeral space cannot be created due to Tarantool's backend (e.g. no more memory or formats are available).
-
Kirill Yukhin authored
Before the patch, when an ephemeral space was created, the is_temporary flag was set only after the space had actually been created, which in turn led to the corresponding tuple_format flag being set to `false`. So, heavy load using ephemeral spaces (almost any SQL query) combined with snapshotting at the same time could lead to OOM, since tuples of ephemeral spaces were not marked as temporary and were not gc-ed. The patch sets the flag in the space definition.
-
Kirill Yukhin authored
There were three extra fields of tuple_format which were set up after it was created. Fix that by extending the tuple_format constructor with three new arguments: engine, is_temporary, exact_field_count.
-
Vladimir Davydov authored
Currently, if we encounter an unknown key while parsing a .run, .index, or .vylog file, we raise an error. As a result, if we add a new key to any of those entities, we will break forward compatibility, although there's actually no reason for that. To avoid this, let's silently ignore unknown keys, as we do in the case of xrow header keys.
-
Vladimir Davydov authored
Upon LSM tree dump completion, we iterate over all ranges of the LSM tree to update their priority and their position in the compaction heap. Since we typically need to update all ranges, we'd better use the update_all heap method instead of updating the heap entries one by one.
-
Nikita Pettik authored
SQLite discards the type and collation of the IN operator when it comes with only one operand. This leads to different results for a straight comparison using the '=' operator and IN:
```
SELECT x FROM t1 WHERE x IN (1.0); -- result is an empty set
SELECT x FROM t1 WHERE x = 1.0;    -- ['1']
```
Let's remove this strange ignorance and always take into consideration the types and collations of operands. Closes #3934
-
Alexander Turenko authored
* Fixed wait_vclock() LSN problem with nil handling (#3895).
* Enabled HangWatcher under --long.
* Show the result file for a hung test once at the end.
* Show the diff against the result file for a hung test.
-
- Jan 16, 2019
-
-
Vladimir Davydov authored
In order to estimate the space amplification of a vinyl database, we need to know the size of data stored at the last LSM tree level. So this patch adds such a counter, both per index and globally. Per index, it is reported under disk.last_level - in rows, bytes, bytes after compression, and pages, just like any other disk counter. Globally, it is reported in bytes only, under disk.data_compacted. Note, to be consistent with disk.data, it doesn't include the last level of secondary indexes.
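A sketch of how these counters could be used (assumes a vinyl space 'test' with a primary index 'pk'; the stat field names follow the description above):
```
-- global: bytes of data at the last level, secondary indexes excluded
box.stat.vinyl().disk.data_compacted
-- per index: a rough space amplification estimate
info = box.space.test.index.pk:stat()
amp = info.disk.bytes / info.disk.last_level.bytes
```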
-
Vladimir Davydov authored
This patch adds dump_time and compaction_time to the scheduler section of global vinyl statistics, and disk.dump.time and disk.compaction.time to per-index statistics. They report the total time spent doing dump and compaction tasks, respectively, and can be useful for estimating the average disk write rate, which is required for compaction-aware throttling.
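For instance, an average dump write rate can be derived like this (a sketch; assumes the scheduler.dump_output counter from the stats restructuring further down this page):
```
st = box.stat.vinyl().scheduler
-- bytes written by dumps per second of dump time (guard against zero)
dump_rate = st.dump_time > 0 and st.dump_output / st.dump_time or 0
```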
-
Vladimir Davydov authored
This patch adds the following new fields to box.stat.vinyl():
scheduler.tasks_inprogress - number of currently running tasks
scheduler.tasks_completed - number of successfully completed tasks
scheduler.tasks_failed - number of aborted tasks
tasks_failed can be useful for monitoring disk write errors, while tasks_inprogress and tasks_completed can shed light on worker thread pool efficiency.
-
Vladimir Davydov authored
Indexes of the same space share a memory level so we should account them to box.stat.vinyl().scheduler.dump_input once per space dump, not each time an index is dumped.
-
Vladimir Davydov authored
This patch adds scheduler.dump_count to box.stat.vinyl(), which shows the number of memory level dumps that have happened since the instance startup or the last box.stat.reset(). It's useful for estimating the average size of a single memory dump, which in turn can be used for calculating the LSM tree fanout.
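E.g., the average size of a single memory dump (a sketch using the scheduler counters):
```
st = box.stat.vinyl().scheduler
avg_dump_size = st.dump_count > 0 and st.dump_output / st.dump_count or 0
```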
-
Vladimir Davydov authored
Although it's convenient to maintain dump/compaction input/output metrics in vy_lsm_env, semantically it's incorrect, as those metrics characterize the scheduler, not the LSM environment. Also, we can't easily extend those stats with e.g. the number of completed dumps or the number of tasks in progress, because those are only known to the scheduler. That said, let's introduce a 'scheduler' section in box.stat.vinyl() and move the dump/compaction stats from 'disk' to the new section. Let's also move the stats accounting from vy_lsm.c to vy_scheduler.c. The 'disk' section now stores only the size of data and index on disk and no cumulative statistics, which makes it similar to the 'memory' section. Note, this patch flattens the stats (disk.compaction.input is moved to scheduler.compaction_input and so forth), because all other global stats are reported without nested tables.
-
- Jan 15, 2019
-
-
Vladimir Davydov authored
During local recovery we may encounter an LSM tree marked as dropped. This means that the LSM tree was dropped before restart and hence will be deleted before recovery completion. There's no need to add such trees to the vinyl scheduler - it looks confusing and can potentially result in mistakes when the code gets modified.
-
Vladimir Davydov authored
Currently, we bump range->version in vy_scheduler.c. This looks like an encapsulation violation, and may easily result in an error (as we have to be cautious to inc range->version whenever we modify a range). That said, let's bump range version right in vy_range.c.
-
Vladimir Davydov authored
compact_input sounds confusing, because 'compact' works as an adjective here. Saving 3 characters per variable/stat name related to compaction doesn't justify this. Let's rename 'compact' to 'compaction' both in stats and in the code.
-
Vladimir Davydov authored
'in' is a reserved keyword in Lua, so using 'in' as a map key was a bad decision - one has to access it with [] rather than simply with dot. Let's rename 'in' to 'input' and 'out' to 'output' both in the output and in the code.
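The problem and the fix, illustrated (stat layout as of this commit):
```
st = box.stat.vinyl()
-- st.disk.dump.in      -- syntax error: 'in' is a Lua keyword
v = st.disk.dump['in']  -- the old workaround
v = st.disk.dump.input  -- after the rename, plain dot access works
```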
-
Vladimir Davydov authored
This test is huge and takes long to complete. Let's move ddl, tx, and stat related stuff to separate files.
-
Vladimir Davydov authored
The test was called 'info' in the first place, because back when it was introduced vinyl statistics were reported by 'info' method. Today, stats are reported by 'stat' so let's rename the test as well to conform.
-
- Jan 14, 2019
-
-
Konstantin Osipov authored
Add box.info.gc().checkpoint_is_in_progress, which is true when there is an ongoing checkpoint/snapshot and false otherwise. Closes gh-3935
@TarantoolBot document
Title: box.info.gc().checkpoint_is_in_progress
Extend the box.info.gc() documentation with a new member - checkpoint_is_in_progress, which is true if there is an ongoing checkpoint, false otherwise.
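A console-style usage sketch:
```
local fiber = require('fiber')
box.info.gc().checkpoint_is_in_progress -- false
fiber.create(box.snapshot)              -- start a checkpoint in background
box.info.gc().checkpoint_is_in_progress -- true until the snapshot completes
```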
-
- Jan 10, 2019
-
-
Mergen Imeev authored
Not really critical as obuf_alloc() fails only on OOM, i.e. never in practice.
-
Kirill Shcherbatov authored
Introduced a new opcode, OP_Update, executing Tarantool's native update operation. In case of UPDATE OR REPLACE we can't use the new OP_Update, as it has complex SQL-specific semantics:
```
CREATE TABLE tj (s1 INT PRIMARY KEY, s2 INT);
INSERT INTO tj VALUES (1, 3), (2, 4), (3, 5);
CREATE UNIQUE INDEX i ON tj (s2);
SELECT * FROM tj;
-- [1, 3], [2, 4], [3, 5]
UPDATE OR REPLACE tj SET s2 = s2 + 1;
SELECT * FROM tj;
-- [1, 4], [3, 6]
```
I.e. the [1, 3] tuple is updated to [1, 4] and replaces the tuple [2, 4]. In SQL this logic is implemented as preventive deletion of tuples from all corresponding indexes. The other significant change is that updating the primary key is forbidden. It was possible to deal with it the same way as with the OR REPLACE specifier, but we need an atomic UPDATE step for ticket #3691 to support the OR IGNORE/OR ABORT/OR FAIL specifiers. Reworked the tests to avoid primary key UPDATE where possible. Closes #3850
-