- Aug 18, 2021
-
-
mechanik20051988 authored
Since Tarantool can use different allocators, we need a single interface for getting statistics that is suitable for all of them. Follow-up #5419
-
mechanik20051988 authored
The ability to select an allocator for memtx has been added to Tarantool. To test a new allocator type, all tests must also be run with it. Implemented a new option that allows setting the allocator for memtx. If you want to choose the allocator type for tests, run test-run.py with --memtx-allocator="small" or --memtx-allocator="system". The allocator type is passed to the test via the MEMTX_ALLOCATOR environment variable.
-
Nikita Pettik authored
Firstly, Allocator::stats() must accept a callback function and its argument to fulfill the small library's interface. Secondly, unify stats and store both system and small allocator statistics in the same structure in order to use the ::stats() method in the foreach_allocator() helper. Follow-up #5419
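A rough sketch of what such a unified interface could look like (the struct layout, names, and callback signature are assumptions for illustration, not the actual Tarantool code): both allocators report into one statistics structure, and ::stats() takes a callback plus an opaque argument so it can match the small library's iteration-style interface.
```
#include <cstddef>

/* One structure accumulating statistics of every allocator (assumed layout). */
struct allocator_stats {
	struct { size_t used; size_t total; } small;
	struct { size_t used; size_t total; } sys;
};

/* Callback + opaque argument, mirroring an iteration-style stats interface. */
typedef int (*allocator_stats_cb)(const void *stats_entry, void *cb_ctx);

struct SysAllocator {
	static size_t used_bytes;
	static size_t quota_bytes;

	static void stats(struct allocator_stats *out,
			  allocator_stats_cb cb, void *cb_ctx)
	{
		out->sys.used = used_bytes;
		out->sys.total = quota_bytes;
		/* malloc has no per-pool entries to iterate, so the callback is unused. */
		(void) cb;
		(void) cb_ctx;
	}
};
size_t SysAllocator::used_bytes = 0;
size_t SysAllocator::quota_bytes = 0;
```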
-
Nikita Pettik authored
- Remove two delayed free lists from memtx_engine;
- Use memtx_allocators_init/memtx_allocators_destroy to initialize and destroy memtx allocators;
- Use memtx_allocators_set_mode() to set delayed free mode.
Follow-up #5419
-
Nikita Pettik authored
That includes:
- foreach_memtx_allocator() - for each basic allocator (Small and Sys so far) we declare a corresponding MemtxAllocator, so to iterate over all allocators and invoke the same function we are going to use this helper;
- memtx_allocators_init() - invoke ::create() of each Allocator and the corresponding MemtxAllocator;
- memtx_allocators_destroy() - invoke ::destroy() of each Allocator and the corresponding MemtxAllocator;
- memtx_allocators_set_mode() - set the given delayed free mode for each MemtxAllocator.
Follow-up #5419
-
Nikita Pettik authored
It is a wrapper (MemtxAllocator is parameterized by an ordinary allocator like Small or System) that encapsulates the allocator and provides its own delayed deletion strategy. Follow-up #5419
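A minimal sketch of the wrapper idea (member names and the delayed-free bookkeeping are assumptions, not the actual memtx code): MemtxAllocator forwards to the underlying allocator and, while delayed mode is on, defers frees until they are explicitly collected.
```
#include <cstddef>
#include <utility>
#include <vector>

template <class Allocator>
struct MemtxAllocator {
	static void *alloc(size_t size) { return Allocator::alloc(size); }

	static void free(void *ptr, size_t size)
	{
		if (delayed_mode)
			delayed.emplace_back(ptr, size); /* defer, e.g. while a snapshot is in progress */
		else
			Allocator::free(ptr, size);
	}

	/* Release everything that was deferred while delayed_mode was on. */
	static void collect_garbage()
	{
		for (auto &item : delayed)
			Allocator::free(item.first, item.second);
		delayed.clear();
	}

	static bool delayed_mode;
	static std::vector<std::pair<void *, size_t>> delayed;
};

template <class Allocator>
bool MemtxAllocator<Allocator>::delayed_mode = false;
template <class Allocator>
std::vector<std::pair<void *, size_t>> MemtxAllocator<Allocator>::delayed;
```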
-
Nikita Pettik authored
This is a helper that sets the proper tuple_format vtable depending on the allocator's symbolic name. Follow-up #5419
-
Nikita Pettik authored
Let's initialize them in the dedicated allocator.cc source file. Follow-up #5419
-
Nikita Pettik authored
It is supposed to generalize the workflow with allocators so that the same function (like create()/destroy() etc.) can be applied to each existing allocator. The newborn functions are not used yet since we are going to add more wrappers. Warning: dark template magic is involved, do not try what you're about to see at home. Follow-up #5419
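For readers unfamiliar with the pattern, here is a small self-contained sketch of how such a helper can be built (the placeholder allocators and the Action interface are assumptions, not the exact Tarantool helpers): a variadic template walks a compile-time list of allocator types and invokes the same static action on each one.
```
/* Placeholder allocators standing in for the real Small/System ones. */
struct SmallAlloc { static void create() {} static void destroy() {} };
struct SysAlloc   { static void create() {} static void destroy() {} };

/* Base case: the allocator list is exhausted. */
template <class Action>
static void
foreach_allocator_internal() {}

/* Apply the action to the head of the list, then recurse into the tail. */
template <class Action, class Allocator, class... Tail>
static void
foreach_allocator_internal()
{
	Action::template invoke<Allocator>();
	foreach_allocator_internal<Action, Tail...>();
}

/* The public entry point lists every supported allocator once. */
template <class Action>
static void
foreach_allocator()
{
	foreach_allocator_internal<Action, SmallAlloc, SysAlloc>();
}

/* Example action: destroy each allocator. */
struct allocator_destroy {
	template <class Allocator>
	static void
	invoke()
	{
		Allocator::destroy();
	}
};

/* Usage: foreach_allocator<allocator_destroy>(); */
```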
-
Nikita Pettik authored
It is assumed to accumulate all allocation settings across all allocators in order to unify the Allocator::create() interface. Follow-up #5419
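A guess at the shape of such a structure (the field names are purely illustrative): it collects every knob any allocator may need, so Allocator::create() can take one common argument.
```
#include <cstdint>

struct quota; /* shared memory quota, declared elsewhere */

struct allocator_settings {
	struct quota *memory_quota; /* common quota shared by all allocators */
	/* Tuning used only by the small allocator. */
	uint32_t objsize_min;
	float alloc_factor;
	/* The system allocator needs nothing beyond the quota. */
};
```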
-
mechanik20051988 authored
Add a new 'memtx_allocator' option to box.cfg{} which allows selecting the appropriate allocator for memtx tuples if necessary. Possible values are "system" for the malloc allocator and "small" for the default small allocator. Closes #5419
@TarantoolBot document
Title: Add new 'memtx_allocator' option to box.cfg{}
Add a new 'memtx_allocator' option to box.cfg{} which allows selecting the appropriate allocator for memtx tuples if necessary. Possible values are "system" for the malloc allocator and "small" for the default small allocator.
-
mechanik20051988 authored
The slab allocator, which is used for tuple allocation, has a certain disadvantage: it tends toward unresolvable fragmentation on certain workloads (size migration). In this case the user should be able to choose another allocator. The system allocator is based on the malloc function, but it is restricted by the same quota as the slab allocator. The system allocator does not allocate all memory at start; instead, it allocates memory as needed, checking that the quota is not exceeded. Part of #5419
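A simplified sketch of the quota-checked malloc behaviour described above (names and the quota mechanism here are assumptions for illustration): nothing is reserved up front; each allocation first charges a shared counter and fails if the limit would be exceeded.
```
#include <atomic>
#include <cstdlib>

static std::atomic<size_t> quota_used{0};
static size_t quota_limit = 256 * 1024 * 1024; /* example limit */

static void *
sys_alloc(size_t size)
{
	/* Charge the quota first; roll back and fail if it would be exceeded. */
	size_t old = quota_used.fetch_add(size);
	if (old + size > quota_limit) {
		quota_used.fetch_sub(size);
		return nullptr;
	}
	return malloc(size);
}

static void
sys_free(void *ptr, size_t size)
{
	free(ptr);
	quota_used.fetch_sub(size);
}
```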
-
mechanik20051988 authored
This patch prepares the ability to select a memory allocator. Changed tuple allocation functions to templates parameterized by the memory allocator type. Part of #5419
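A simplified sketch of the direction of the change (the tuple layout and signatures here are assumptions, not the real memtx code): the allocation entry points become templates parameterized by the allocator, so one code path can be instantiated for both the small and the system allocator.
```
#include <cstddef>
#include <cstdlib>

/* Placeholder allocator; in memtx the real parameters are Small/System. */
struct MallocAllocator {
	static void *alloc(size_t size) { return malloc(size); }
	static void free(void *ptr, size_t size) { (void) size; ::free(ptr); }
};

struct tuple {
	size_t bsize; /* the real tuple header carries much more */
};

template <class Allocator>
static struct tuple *
memtx_tuple_new(size_t data_size)
{
	struct tuple *t = (struct tuple *)
		Allocator::alloc(sizeof(struct tuple) + data_size);
	if (t != nullptr)
		t->bsize = data_size;
	return t;
}

template <class Allocator>
static void
memtx_tuple_delete(struct tuple *t)
{
	Allocator::free(t, sizeof(struct tuple) + t->bsize);
}

/* Usage: struct tuple *t = memtx_tuple_new<MallocAllocator>(64); */
```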
-
mechanik20051988 authored
In the patch with the choice of allocator for memtx, it was decided to use templates, so we need to change all associated files from *.c to *.cc. At the same time, changes were made to ensure C++ compilation: added explicit type casts and refactored code with goto statements that cross variable initialization. Part of #5419
-
mechanik20051988 authored
Previously, direct memtx_tuple_new/memtx_tuple_delete function calls were used in memtx space. Also, pointers to the functions used to allocate/free memory for memtx tuples are stored in tuple_format_vtab. Replaced the direct memtx_tuple_new and memtx_tuple_delete calls in memtx_space with calls via the pointers from the vtab. Part of #5419
-
mechanik20051988 authored
The delayed free mode in the small allocator is only used to free tuple memory during snapshot creation. This is not directly related to the small allocator itself, so this code was moved to Tarantool's memtx_engine.c, where tuple memory allocation/deallocation occurs.
-
mechanik20051988 authored
Changed the small allocator strategy. In the previous version, small allocated memory from the most appropriate pool. This strategy has one significant disadvantage: during memory allocation for tuples with highly distributed sizes, we allocate an entire slab for one or two objects; moreover, when they are later deleted, the slab is not released (see the spare slab in mempool). Now we divide mempools with the same slab size into groups containing no more than 32 pools. First, we allocate memory from the mempool with the largest size in the group; then, when the memory wasted for a certain object size as a result of non-optimal pool selection becomes larger than slab size / 4, we start allocating memory for such objects from the most suitable mempool. At the same time, for other objects we can use both of these mempools, in case the new mempool has a larger objsize. With this strategy we avoid losing memory. Also changed the allocator behaviour regarding saving the spare slab. With the new allocator strategy we don't need to save a spare slab for all mempools, we need to save it only for the last mempool in a group. This strategy solves both problems: there is no unnecessary memory loss on spare slabs, and we prevent oscillations when a single object is repeatedly allocated. Closes #3633
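A toy calculation of the waste criterion described above (the numbers and variable names are made up for illustration; the real logic lives in the small library): while objects of a given size are served from the largest pool in their group, each allocation wastes (pool objsize - object size) bytes, and once the accumulated waste exceeds slab size / 4 a dedicated best-fit pool takes over.
```
#include <cstdint>
#include <cstdio>

int main()
{
	const uint64_t slab_size = 4u * 1024 * 1024; /* example slab size */
	const uint32_t pool_objsize = 512;           /* largest objsize in the group */
	const uint32_t object_size = 300;            /* size actually requested */

	uint64_t waste = 0;
	uint32_t allocations = 0;
	while (waste <= slab_size / 4) {
		waste += pool_objsize - object_size; /* overhead of the non-optimal pool */
		allocations++;
	}
	printf("best-fit pool kicks in after %u allocations\n", allocations);
	return 0;
}
```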
-
- Aug 17, 2021
-
-
Vladimir Davydov authored
The test assumes that a version string looks like this: 2.9.0-123-gabcabcababc. We want to append a flow string after <major>.<minor>.<patch>. Fix the test accordingly. Needed for #6183
-
Igor Munkin authored
* Fix bytecode register allocation for comparisons.
* gdb: support LJ_DUALNUM mode
Closes #6224 Closes #6227 Part of #5629
-
Serge Petrenko authored
Direct upgrade support from pre-1.7.5 versions was removed in commit 7d3b80e7 (Forbid upgrade from Tarantool < 1.7.5 and refactor upgrade.lua). The reason for that was the mandatory space format checks introduced back then. With these space format checks, an old schema couldn't be recovered on new Tarantool versions, because newer versions had different system space formats. So the old schema couldn't be upgraded because it couldn't even be recovered. Actually this was rather inconvenient. One had to perform an extra upgrade step when upgrading from, say, 1.6 to 2.x: instead of performing a direct upgrade one had to do a 1.6 -> 1.10 -> 2.x upgrade, which takes twice the time. Make it possible to boot from snapshots coming from Tarantool version 1.6.8 and above. In order to do so, introduce before_replace triggers on system spaces, which work during snapshot/xlog recovery. The triggers will set tuple formats to the ones supported by current Tarantool (2.x). This way the recovered data will have the correct format for a usual schema upgrade. Also add an upgrade_to_1_7_5() handler, which finishes the transformation of the old schema to 1.7.5. The handler is fired together with other box.schema.upgrade() handlers, so there's no user-visible behaviour change. Side note: it would be great to use the same technique to allow booting from pre-1.6.8 snapshots. Unfortunately, this is not possible. The current triggers don't break the order of schema upgrades, so 1.7.1 upgrades come before 1.7.2 and 1.7.5. This is because all the upgrades in these versions replace existing tuples rather than insert new ones, so the upgrades may be handled by the before_replace triggers. Upgrade to 1.6.8 requires inserting new tuples: creating sysviews like _vspace, _vuser and so on. This can't be done from the before_replace triggers, so we would have to run triggers for 1.7.x first, which would allow Tarantool to recover the snapshot, and then run an upgrade handler for 1.6.8. This looks really messy. Closes #5894
-
Serge Petrenko authored
Introduce table.equals for comparing tables. The method respects the __eq metamethod, if provided. Needed-for #5894
@TarantoolBot document
Title: lua: new method table.equals
Document the new Lua method table.equals. It compares two tables deeply. For example:
```
tarantool> t1 = {a=3}
---
...
tarantool> t2 = {a=3}
---
...
tarantool> t1 == t2
---
- false
...
tarantool> table.equals(t1, t2)
---
- true
...
```
The method respects the __eq metamethod. When both tables being compared have the same __eq metamethod, it's used for comparison (just like this is done in Lua 5.1).
-
Serge Petrenko authored
Found the following error in our CI:
Test failed! Result content mismatch:
--- replication/election_basic.result	Fri Aug 13 13:50:26 2021
+++ /build/usr/src/debug/tarantool-2.9.0.276/test/var/rejects/replication/election_basic.reject	Sat Aug 14 08:14:17 2021
@@ -116,6 +116,7 @@
 | ...
 box.ctl.demote()
 | ---
+ | - error: box.ctl.demote does not support simultaneous invocations
 | ...
Even though box.ctl.demote() or box.ctl.promote() isn't called above the failing line, promote() is issued internally once the instance becomes the leader. Wait until the previous promote is finished (i.e. box.info.synchro.queue.owner is set).
-
Serge Petrenko authored
upstream.lag is the delta between the moment when a row was written to master's journal and the moment when it was received by the replica. It's an important metric to check whether the replica has fallen too far behind master. Not all the rows coming from master have a valid time of creation. For example, RAFT system messages don't have one, and we can't assign correct time to them: these messages do not originate from the journal, and assigning current time to them would lead to jumps in upstream.lag results. Stop updating upstream.lag for rows which don't have creation time assigned.
The upstream.lag calculation changes were meant to fix the flaky replication/errinj.test:
Test failed! Result content mismatch:
--- replication/errinj.result	Fri Aug 13 15:15:35 2021
+++ /tmp/tnt/rejects/replication/errinj.reject	Fri Aug 13 15:40:39 2021
@@ -310,7 +310,7 @@
 ...
 box.info.replication[1].upstream.lag < 1
 ---
-- true
+- false
 ...
But the changes were not enough, because now the test may see the initial lag value (TIMEOUT_INFINITY). So fix the test as well by waiting until upstream.lag becomes < 1.
-
- Aug 16, 2021
-
-
Vladimir Davydov authored
Before commit 954194a1 ("net.box: rewrite request implementation in C"), a net.box future was a plain Lua table, so the caller could attach extra information to it. This isn't true anymore: a future is a userdata object, and it doesn't have indexing methods. For backward compatibility, let's add __index and __newindex fields and store user-defined fields in a Lua table, which is created lazily on the first __newindex invocation. __index falls back on the metatable methods if a field isn't found in the table. Follow-up #6241 Closes #6306
-
Vladimir Davydov authored
It didn't yield before commit 954194a1 ("net.box: rewrite request implementation in C"). It shouldn't yield now. Follow-up #6241
-
Nikita Pettik authored
To avoid sharing a metadata object between different transactions (and hence phantom reads) in MVCC mode, let's do the following. Firstly, let's set an on_replace trigger on all system spaces (a content change in a system space is considered a DDL operation) which disables yields until the transaction is committed. The only exceptions are index build and space format check: during these operations yields are allowed, since they may take a while (so without yields they would block execution). Actually this is not a problem, because these two operations must be first-in-transaction: as a result, a transaction can't contain two yielding statements. So after any cache modification no yields take place for sure. Secondly, on committing a transaction that provides DDL changes, let's abort all other transactions, since they may refer to obsolete schema objects. The last restriction may seem too strict, but it is OK as a primitive workaround until transactional DDL is introduced. In fact we should only abort transactions that have read dirty (i.e. modified) objects. Closes #5998 Closes #6140 Workaround for #6138
-
- Aug 14, 2021
-
-
Aleksandr Lyapunov authored
It seems that the issue was fixed in one of the previous commits. Just add the test. No logical changes. Closes #5801
-
Aleksandr Lyapunov authored
The return code was not checked, and thus in case of a memory error we could lose conflicts. Fix it. Follow-up #6040
-
Aleksandr Lyapunov authored
There was a bug when a transaction makes a wrong statement that is aborted because of a duplicate tuple in a primary or secondary index. The problem is that the check for an existing tuple is an implicit read that has the usual side effects. This patch tracks that kind of read like ordinary reads. Part of #5999
-
Aleksandr Lyapunov authored
After the previous patch it became possible to link read trackers to in-progress stories. This patch uses one read tracker instead of a bunch of direct conflicts in tuple_clarify. This is a bit more accurate. It also allows avoiding an unnecessary conflict when a transaction reads its own change. Part of #5999
-
Aleksandr Lyapunov authored
Before this patch, when a transaction performed a write to a read gap (interval), a conflict record was created for the reader of this gap. That is wrong, since the next writer of the same value will not find a gap - the gap has been split into parts. This patch fixes that and creates a special read tracker that was designed specifically for further tracking of writes. This also requires the writer to search for read trackers not only in prepared stories but in in-progress stories too. Part of #5999
-
Aleksandr Lyapunov authored
There was an obvious bug in the transactional manager's GC. There can be stories about deleted tuples. In other words, tuples were deleted, but their stories remain in history for some time. That means that pointers to dirty tuples are left in indexes, while the stories say that those tuples are deleted. When GC comes, it must remove the pointers to those tuples from indexes too. That is simple to check: if a story is on top of a chain, it must be in the index, and if it is a story about a deleted tuple, it must be removed from the index. But that story must also be unlinked from the chain, and the next story becomes the top of the chain, but (1) in turn it must not try to delete its tuple from the index - we have already done it when deleting the first tuple. For this purpose we mark the next story with space = NULL. The problem is that setting space = NULL works for every index at once, while sometimes we have to handle each index independently. Fortunately the previous commit introduced the in_index member of a story's link, NULL by default. We can just leave that NULL in the older story as a mark that it is not in the index. This commit does so and fixes the bug. Closes #6234
-
Aleksandr Lyapunov authored
There was a tricky problem in the TX manager that could lead to a crash after deletion of a space. When a space is deleted, the TX manager uses a special callback to remove dirty tuples from indexes. It is necessary for the correct destruction of the space and its indexes. The problem is that the actual space drop works in several steps, deleting secondary indexes and then deleting primary indexes. Each step is an independent alter. And alters are tricky. For example, we had a struct space instance, namely S1, with two indexes I1 and I2. At the first step we have to delete the second index. By design, for that purpose a new instance of the space is created, namely S2, with one empty index I3. Then the spaces exchange their indexes: S1 ends up with I3 and I2, and S2 owns I1. After that S1 is deleted. That is good until we try to make a story cleanup: all the dirty tuples remain in S2.I1, while we try to clean the empty S1.I3. The only way to fix it is to store the index pointer right in the story to make sure we are cleaning the right index. Part of #6234 Closes #6274
-
Egor Elchinov authored
Previously, MVCC did not track hash index writes. This patch fixes the problem by transferring readers that use `ITER_ALL` or `ITER_GT` iterators of a hash index to a read view after any subsequent external write to this index. Closes #6040
-
Aleksandr Lyapunov authored
The previous commit fixed a bug that caused a dirty read but also introduced a much less significant problem - excess conflicts in some cases. Usually, if a reader reads a tuple, a special record is stored in its story. Any write that replaces or deletes that tuple can then cause a conflict of the current transaction. The problem happened when a reader tried to execute a select from some index, but only a deleted story was found there. The record is stored and that is good - we must know when somebody inserts a tuple into this place in the index. But actually we need to know it only for the index from which the reader executed the select. This patch introduces a special index mask in the read tracker that is used in the case above to be more precise in conflict detection. Closes #6206
-
Aleksandr Lyapunov authored
In order to preserve repeated reads, the transactional manager tracks the reads of each transaction. Generally, reads can be of two types: those that have read a tuple and those that have found nothing. The first are stored in the tuple story, the second in special gap and hole structures. The problem was that a read which found a dirty tuple that was invisible to this transaction (the story says that it is deleted) was stored neither in a story nor in gaps/holes. This patch fixes that. Part of #6206
-
Aleksandr Lyapunov authored
During iteration a memtx tree index must write gap records to the TX manager. This is done in order to detect further writes to those gaps and execute some logic preventing phantom reads. There are two cases when such a gap is stored:
* The iterator reads the next tuple; the gap is between two tuples.
* The iterator finished reading; the gap is between the previous tuple and the key boundary.
By mistake these two cases were not distinguished correctly, and that led to excess conflicts. This patch fixes it. Part of #6206
-
Aleksandr Lyapunov authored
Just add a function that allocates and initializes the structure. No logical changes. Part of #6206
-
Aleksandr Lyapunov authored
No logical changes, only a simplification for the next commit. Part of #6206
-
Aleksandr Lyapunov authored
Implement a check_dup_common() function that calls either check_dup_clean() or check_dup_dirty(). No logical changes. Follow-up #6132
-