- Feb 11, 2019
-
-
Alexander Turenko authored
Nikita Pettik pointed out to me that free(NULL) is a no-op according to POSIX. This is a follow-up to 9dbcaa3a.
-
Konstantin Belyavskiy authored
This is a draft paper covering the following topics:
1. A draft protocol for discovering and maintaining network topology in the case of a large arbitrary network.
2. A list of required changes to support this feature.
3. Open questions and alternatives.
Changes in V2, based on Vlad's review:
1. Rewrote a couple of sections to make them clearer.
2. Added more details and examples for clarity.
3. Fixed an error.
RFC for #3294
-
Vladimir Davydov authored
When computing the number of runs that need to be compacted for a range to conform to the target LSM tree shape, we use the newest run size as the size of the first LSM tree level. This isn't quite correct, for two reasons.

First, the size of the newest run is unstable - it may vary in a relatively wide range from dump to dump. This leads to frequent changes in the target LSM tree shape and, as a result, to unpredictable compaction behavior. In particular, this breaks compaction randomization, which is supposed to smooth out the IO load generated by compaction.

Second, this can increase space amplification. We trigger compaction at the last level when there's more than one run, irrespective of the value of the run_count_per_level configuration option. We expect this to keep space amplification below 2, provided run_count_per_level is not greater than (run_size_ratio - 1). However, if the newest run happens to have such a size that multiplying it by run_size_ratio several times gives us a value only slightly less than the size of the oldest run, we can accumulate up to run_count_per_level more runs that are approximately as big as the last-level run without triggering compaction, thus increasing space amplification by up to run_count_per_level.

To fix these problems, let's use the oldest run size for computing the size of the first LSM tree level - simply divide it by run_size_ratio for as long as the result still exceeds the size of the newest run.

Follow-up #3657
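
A sketch of the new first-level size computation (hypothetical Lua rendering of the rule above, not the actual vinyl code):

    -- Scale the oldest run size down by run_size_ratio while the
    -- result still exceeds the newest run size; what remains is the
    -- target size of the first LSM tree level.
    local function first_level_size(oldest_run_size, newest_run_size,
                                    run_size_ratio)
        local size = oldest_run_size
        while size / run_size_ratio >= newest_run_size do
            size = size / run_size_ratio
        end
        return size
    end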
-
Vladimir Davydov authored
While a replica is being bootstrapped from a remote master, the vinyl engine may need to perform compaction, which means that it may write to the _vinyl_deferred_delete system space. Compaction proceeds fully asynchronously, i.e. a write may occur after the join stage is complete but before the WAL is initialized, in which case the new replica will crash. To make sure a race like that can't happen, let's set up the WAL before making the initial checkpoint. The WAL writer is now initialized right before starting the WAL thread, so we don't need to split the WAL struct into the thread and the writer parts anymore. Closes #3968
-
- Feb 08, 2019
-
-
Vladimir Davydov authored
If this test is executed after some other test that bumps the LSN, the output line gets truncated differently, because greater LSNs may increase its length. Fix this by filtering out the LSN manually. Closes #3970
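
A sketch of the kind of fix involved, using the test-run filter mechanism (the exact filter pattern used by the test may differ):

    test_run = require('test_run').new()
    -- mask the variable-length LSN so the output no longer depends
    -- on how many tests ran before this one
    test_run:cmd("push filter 'lsn: [0-9]+' to 'lsn: <lsn>'")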
-
Ivan Koptelov authored
Currently, all ON CONFLICT actions are silently ignored for CHECK constraints. This patch adds an explicit parse-time error. Closes #3345
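
Illustrative DDL of the kind affected (table name is hypothetical; the exact error text is not shown in this message):

    box.sql.execute([[CREATE TABLE t1 (id INT PRIMARY KEY,
                                       a INT CHECK (a > 0) ON CONFLICT REPLACE)]])
    -- before: the ON CONFLICT REPLACE clause was silently ignored
    -- after: the statement fails at parse time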
-
Nikita Pettik authored
Closes #3698
-
Nikita Pettik authored
Replace the remains of affinity usage in the SQL parser, query optimizer and VDBE. Don't add affinity to the field definition when a table is encoded into msgpack. Remove the field type <-> affinity converters, since now we can operate directly on field types. Part of #3698
-
Nikita Pettik authored
This patch also resolves an issue with wrong query plans for selects on spaces created from Lua: in most cases a table scan was used instead of an index search. This happened because indexes were checked for affinity compatibility with the space format, so if a space was created without affinity in its format, its indexes were never used. Now all checks are based on field types, and as a result the query optimizer is able to choose the correct index. Closes #3886 Part of #3698
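
A sketch of the previously affected scenario (space and index names are illustrative; box.sql.execute is the SQL entry point of that period):

    -- a space created from Lua, with no affinity in its format
    s = box.schema.space.create('T',
        {format = {{'A', 'unsigned'}, {'B', 'unsigned'}}})
    s:create_index('PK', {parts = {{'A', 'unsigned'}}})
    -- before this patch the planner tended to fall back to a table
    -- scan here; now the index is used for the lookup
    box.sql.execute('EXPLAIN QUERY PLAN SELECT * FROM "T" WHERE "A" = 1')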
-
Nikita Pettik authored
This stage of affinity removal requires introducing an auxiliary intermediate function to convert an array of affinity values to field type values. The rest of the job done in this commit is straightforward refactoring. Part of #3698
-
Nikita Pettik authored
Let's use field_type instead of affinity as the type of the return value of a user function registered in SQL. Moreover, let's assign the return value type to the expression representing the function, which allows taking it into account during derived type calculation. Part of #3698
-
Nikita Pettik authored
Numeric affinity in SQLite means the same as real, except that it forces floating point values into integer representation when they can be converted without loss (e.g. 2.0 -> 2). Since in the Tarantool core there is no difference between numeric and real values (both are stored as values of the Tarantool type NUMBER), let's remove numeric affinity and use real instead. The only real pitfall is the implicit conversion mentioned above: we can't pass *.0 as an iterator value, since our fast comparators (TupleCompare, TupleCompareWithKey) are designed to work only with values of the same MP_ type. They do not use the slow tuple_compare_field(), which is able to compare a double and an integer. The solution to this problem is simple: let's always attempt to encode floats as ints when the conversion is lossless.

This is a straightforward approach, but to implement it we need to take care of the reverse (decoding) situation. OP_Column fetches the msgpack field with the given number and stores it as a native VDBE memory object. The type of that memory object is based on the type of the msgpack value. So, if a space field is of type NUMBER and holds the value 1, the type of the VDBE memory will be INT (after decoding), not float 1.0. As a result, further calculations may be wrong: for instance, instead of floating point division we could get integer division. To cope with this problem, let's add an auxiliary conversion to the decoding routine, which uses the space format of the tuple being decoded. It is worth mentioning that ephemeral spaces don't have a space format, so for them we rely on the types of the key parts.

Finally, the internal VDBE merge sorter also operates on entries encoded into msgpack. To fix this case, we check the types of the ORDER BY/GROUP BY arguments: if they are of type float, we emit an additional opcode, OP_AffinityReal, to force the float type after encoding.

Part of #3698
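
A minimal sketch of the lossless float-to-int encoding rule described above (illustrative Lua, not the actual VDBE serialization code):

    -- 2.0 can be represented exactly as an integer, so it is encoded
    -- as the msgpack integer 2; 2.5 cannot, so it stays a double
    local function encode_sql_number(x)
        if math.floor(x) == x then
            return math.floor(x)  -- encoded as MP_INT/MP_UINT
        end
        return x                  -- encoded as MP_DOUBLE
    end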
-
Nikita Pettik authored
Also, this allows delaying affinity assignment to the field def until the table format is encoded. Part of #3698
-
Nikita Pettik authored
Code under this define is dead. What is more, it uses affinity, so let's remove it along with the tests related to it. Needed for #3698
-
Georgy Kirichenko authored
The applier used to promote the vclock prior to applying a row. This led to a situation where a master's row would be skipped forever if an error occurred while trying to apply it. However, some errors are transient, and we might be able to successfully apply the same row later. While we're at it, make the WAL writer the only one responsible for advancing the replicaset vclock. It was already doing so for rows coming from the local instance; besides, this makes the code cleaner, since now we advance the vclock directly from the WAL batch reply, and it lets us get rid of unnecessary checks of whether the applier or the WAL has already advanced the vclock. Closes #2283 Prerequisite #980
-
Georgy Kirichenko authored
The WAL used to promote the vclock prior to writing a row. This led to a situation where a master's row would be skipped forever if an error occurred while trying to write it. However, some errors are transient, and we might be able to successfully apply the same row later. So we no longer promote the writer vclock, in order to be able to restart replication from the failing point. Obsoletes xlog/panic_on_lsn_gap.test. Needed for #2283
-
- Feb 07, 2019
-
-
Vladimir Davydov authored
Follow-up dd30970e ("replication: log replica_id in addition to lsn on conflict").
-
Vladimir Davydov authored
Without replica_id, lsn of the conflicting row doesn't make much sense.
-
Serge Petrenko authored
On replica subscribe, the master checks that the replica's cluster id matches the master's own, and disallows replication in case of a mismatch. This behaviour blocks the implementation of anonymous replicas, which shouldn't pollute the _cluster space and could accumulate changes from multiple clusters at once. So let's move the check to the replica, to let it decide which action to take in case of a mismatch. Needed for #3186 Closes #3704
-
Stanislav Zudin authored
The VDBE returns an error if a LIMIT or OFFSET expression is cast to a negative integer value. If the expression in the LIMIT clause can't be converted to an integer without data loss, the VDBE returns SQL_TARANTOOL_ERROR with the message "Only positive integers are allowed in the LIMIT clause" instead of SQLITE_MISMATCH. The same holds for the OFFSET clause. Closes #3467
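
Illustrative statements that now fail with the new error (table name is hypothetical):

    box.sql.execute('SELECT * FROM t LIMIT -1')   -- negative integer: error
    box.sql.execute('SELECT * FROM t LIMIT 2.5')  -- lossy cast: error
    -- "Only positive integers are allowed in the LIMIT clause"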
-
- Feb 06, 2019
-
-
Vladimir Davydov authored
Historically, when considering splitting or coalescing a range or updating compaction priority, we have used the sizes of compressed runs (see bytes_compressed). This makes the algorithms dependent on whether compression is used and on how effective it is, which is weird, because compression is a way of storing data on disk - it shouldn't affect the way data is partitioned. E.g. if we turned off compression at the first LSM tree level, which would make sense because that level is relatively small, we would thereby affect the compaction algorithm. So let's use uncompressed run sizes when considering range tree transformations.
-
Serge Petrenko authored
After the patch that made os.exit() execute on_shutdown triggers (see commit 6dc4c8d7), we relied on the on_shutdown triggers to break the ev_loop and exit tarantool. However, there is an auxiliary event loop run in tarantool_lua_run_script() to reschedule the fiber that executes the chunks of code passed via the -e option and runs interactive mode. This event loop is started only to execute interactive mode and doesn't exist during execution of the -e chunks. Make sure we don't start it if os.exit() was already executed in one of the chunks. Closes #3966
-
Serge Petrenko authored
In case a fiber joining another fiber gets cancelled, it stays suspended forever and never finishes joining. This happens because fiber_cancel() wakes the fiber and removes it from all execution queues. Fix this by adding the fiber back to the wakeup queue of the joined fiber after each yield. Closes #3948
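
A reproducer sketch of the fixed scenario (uses the standard fiber API; timings are illustrative):

    fiber = require('fiber')
    -- the worker exits shortly; the joiner waits on it
    worker = fiber.create(function() fiber.sleep(0.1) end)
    worker:set_joinable(true)
    joiner = fiber.create(function()
        -- before the fix, a cancelled joiner stayed suspended here forever
        worker:join()
    end)
    joiner:cancel()  -- with the fix, the joiner is woken up again and
                     -- finishes joining once the worker exits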
-
Serge Petrenko authored
Start showing downstream status for relays in "follow" state. Also refactor lbox_pushrelay to unify code for different relay states. Closes #3904
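
After this change, a relay that is actively feeding a replica reports its state in box.info (the output shape below is illustrative):

    tarantool> box.info.replication[2].downstream
    ---
    - status: follow
      vclock: {1: 42}
    ...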
-
- Feb 05, 2019
-
-
Konstantin Osipov authored
Initially, the tuple_field_* getters were placed in tuple_format.h to avoid including tuple_format.h in tuple.h. Now we include tuple_format.h in tuple.h anyway, so move the code where it belongs. Besides, a bunch of new getters have been added to tuple.h since then, so the code has rotted a bit. This is a preparation for an overhaul of the tuple_field_* getter naming.
-
Konstantin Osipov authored
-
Konstantin Osipov authored
-
Konstantin Osipov authored
We use the tuple_field_raw_ prefix for other similar members.
-
Konstantin Osipov authored
Use cached tuple data and format in tuple_hash.c.
-
Konstantin Osipov authored
-
Konstantin Osipov authored
Add a comment explaining the logic behind intermediate lookups in the json_tree_lookup_path() function.
-
- Feb 04, 2019
-
-
Konstantin Osipov authored
-
Vladimir Davydov authored
The patch adds the missing -fPIC option for clang, without which the msgpuck library might fail to compile.
-
Kirill Shcherbatov authored
Implemented a more convenient interface for creating an index by JSON path. Instead of specifying a fieldno and a relative path, it is now possible to pass the full JSON path to the data. Closes #1012

@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure. When this structure is consistent across the whole space, you can create an index by JSON path. Example:

    s = box.schema.space.create('sample')
    format = {{'id', 'unsigned'}, {'data', 'map'}}
    s:format(format)
    -- explicit JSON index creation
    age_idx = s:create_index('age', {parts = {{2, 'number', path = "age"}}})
    -- user-friendly syntax for JSON index creation
    parts = {{'data.FIO["fname"]', 'str'}, {'data.FIO["sname"]', 'str'},
             {'data.age', 'number'}}
    info_idx = s:create_index('info', {parts = parts})
    s:insert({1, {FIO={fname="James", sname="Bond"}, age=35}})
-
Kirill Shcherbatov authored
tuple_field_by_part looks up the tuple_field corresponding to the given key part in the tuple_format in order to quickly retrieve the offset of indexed data from the tuple field map. For regular indexes this operation is blazing fast; however, for JSON indexes it is not, as we have to parse the path to the data and then do multiple lookups in a JSON tree. Since tuple_field_by_part is used by comparators, we should strive to make this routine as fast as possible for all kinds of indexes.

This patch introduces an optimization that is supposed to make tuple_field_by_part for JSON indexes as fast as it is for regular indexes in most cases. We do that by caching the offset slot right in the key_part. There's a catch here, however - we create a new format whenever an index is dropped or created, and we don't reindex old tuples. As a result, there may be several generations of tuples in the same space, all using different formats, while there's only one key_def used for comparison.

To overcome this problem, we introduce the notion of a tuple_format epoch. This is a counter incremented each time a new format is created. We store it in both tuple_format and key_def, and we only use the offset slot cached in a key_def if its epoch coincides with the epoch of the tuple format. If they don't match, we look up the tuple_field as before, and then update the cached value along with the epoch of the tuple format. Part of #1012
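
A schematic of the lookup fast path described above (Lua-style pseudocode; lookup_offset_slot() and tuple_field_at_slot() are hypothetical helpers standing in for the real C internals):

    local function tuple_field_by_part(format, tuple, part)
        if part.format_epoch == format.epoch then
            -- fast path: reuse the offset slot cached in the key_part
            return tuple_field_at_slot(tuple, part.offset_slot)
        end
        -- slow path: resolve the JSON path, then refresh the cache
        local slot = lookup_offset_slot(format, part)
        part.offset_slot, part.format_epoch = slot, format.epoch
        return tuple_field_at_slot(tuple, slot)
    end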
-
Kirill Shcherbatov authored
Introduced a has_json_path flag for the compare, hash and extract function templates (which are really hot) to make it possible to avoid looking at the path field for flat indexes without any JSON paths. Part of #1012
-
Kirill Shcherbatov authored
New JSON indexes allow indexing document content.

First, new key_part fields path and path_len were introduced, representing the JSON path string specified by the user. The modified tuple_format_use_key_part routine constructs the corresponding chain of tuple_fields in the tuple_format::fields tree leading to the indexed data. The resulting tree is used for type checking and for allocating offset slots for indexed fields.

Then, the refined tuple_init_field_map routine parses the tuple msgpack in depth using a stack allocated on the region, and initializes the field map with the corresponding tuple_format::field, if any.

Finally, to allow memory allocation for vinyl's secondary keys, which are restored from extracted keys loaded from disk without traversing the fields tree, a format::min_tuple_size field was introduced - the size of a tuple of this format as if all its leaf fields were zero.

Example: to create a new JSON index, specify the path to the document data as part of a key_part:

    parts = {{3, 'str', path = '.FIO.fname', is_nullable = false}}
    idx = s:create_index('json_idx', {parts = parts})
    idx:select("Ivanov")

Part of #1012
-
Kirill Shcherbatov authored
Introduced a new function, tuple_field_raw_by_path, used to get tuple fields by field index and a relative JSON path. This routine uses the tuple_format's field_map when possible. It will be further extended to use JSON indexes. The old tuple_field_raw_by_path routine, which used to work with full JSON paths, is renamed to tuple_field_raw_by_full_path. Its return value type is changed to const char *, because the other similar functions, tuple_field_raw and tuple_field_by_part_raw, use this convention. Got rid of reporting the error position for the 'invalid JSON path' error in lbox_tuple_field_by_path, because we can't extend the other routines to behave that way without making the API inconsistent; moreover, such errors are useless and confusing. Needed for #1012
-
Kirill Shcherbatov authored
The msgpack dependency has been updated, because the new version introduces the new mp_stack class, which we will use to parse tuples without recursion when initializing the field map. Needed for #1012
-
- Jan 30, 2019
-
-
Serge Petrenko authored
Move the call to tarantool_free() to the end of main(). We needn't call atexit() at all anymore, since we've implemented on_shutdown triggers and patched os.exit() so that, unless we're exiting due to a fatal signal (in which case no cleanup routines are called anyway), control always reaches the call to tarantool_free().
-