  1. Aug 18, 2021
    • memtx: implement unified interface for getting allocator statistics · 8f97c051
      mechanik20051988 authored
      Since Tarantool can use different allocators, we need a single
      interface for getting statistics that is suitable for all allocators.
      Follow-up #5419
    • tests: add ability to run box/engine tests with different allocators · dcf6fdb3
      mechanik20051988 authored
      The ability to select an allocator for memtx has been added to
      Tarantool. To test the new type of allocator, all tests must also
      be run with it. Implemented a new option that sets the allocator
      for memtx. If you want to choose the allocator type for tests, run
      test-run.py with --memtx-allocator="small" or
      --memtx-allocator="system". The allocator type is passed to the
      test via the MEMTX_ALLOCATOR environment variable.
    • memtx: provide correct stats calculation for allocators · 279ca6fb
      Nikita Pettik authored
      Firstly, Allocator::stats() must accept a callback function and its
      argument to fulfill small's interface.
      Secondly, unify stats and store both system and small allocator
      statistics in the same structure in order to use the ::stats()
      method in the foreach_allocator() helper.

      Follow-up #5419
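      As a rough illustration of the unified interface idea (the type and
      field names below are invented, not Tarantool's actual definitions):
      one stats structure serves both allocators, and ::stats() takes a
      callback in the style of the small library.
      ```
      #include <cstddef>

      /* One stats structure shared by all allocators (illustrative). */
      struct allocator_stats {
          size_t used;   /* bytes handed out for tuples */
          size_t total;  /* bytes reserved against the quota */
      };

      /* Callback invoked by Allocator::stats(), small-style. */
      typedef void (*allocator_stats_cb)(const void *stats, void *cb_ctx);

      struct SmallAlloc {
          /* Would walk small's mempools, reporting each via the callback. */
          static void stats(struct allocator_stats *stats,
                            allocator_stats_cb cb, void *cb_ctx);
      };

      struct SysAlloc {
          /* Accepts the callback too, purely to keep the interface uniform. */
          static void stats(struct allocator_stats *stats,
                            allocator_stats_cb cb, void *cb_ctx);
      };
      ```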
    • memtx: use MemtxAllocator in memtx engine · 6552cfab
      Nikita Pettik authored
      - Remove two delayed free lists from memtx_engine;
      - Use memtx_allocators_init/memtx_allocators_destroy to initialize and
        destroy memtx allocators;
      - Use memtx_allocators_set_mode() to set delayed free mode.
      
      Follow-up #5419
    • memtx: introduce interfaces for MemtxAllocator · f9f7ff66
      Nikita Pettik authored
      That includes:
       - foreach_memtx_allocator() - for each basic allocator (Small and Sys
         so far) we declare a corresponding MemtxAllocator. To iterate over
         all allocators and invoke the same function we are going to use
         this helper;
       - memtx_allocators_init() - invoke ::create() of each Allocator and
         corresponding MemtxAllocator;
       - memtx_allocators_destroy() - invoke ::destroy() of each Allocator
         and corresponding MemtxAllocator;
       - memtx_allocators_set_mode() - set given delayed free mode for each
         MemtxAllocator.
      
      Follow-up #5419
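      A self-contained sketch of how these helpers could compose (all
      bodies here are trivial stand-ins, not the real implementation):
      ```
      /* Stand-in allocators and wrapper so the sketch compiles alone. */
      struct SmallAlloc { static void create() {} static void destroy() {} };
      struct SysAlloc { static void create() {} static void destroy() {} };

      template <class Allocator>
      struct MemtxAllocator {
          static void create() { Allocator::create(); }
          static void destroy() { Allocator::destroy(); }
          static void set_delayed_free_mode(bool) {}
      };

      /* Visit every MemtxAllocator with the same functor. */
      template <class F>
      static void
      foreach_memtx_allocator(F &f)
      {
          f.template invoke<MemtxAllocator<SmallAlloc>>();
          f.template invoke<MemtxAllocator<SysAlloc>>();
      }

      struct set_mode_f {
          bool delayed;
          template <class A> void invoke() { A::set_delayed_free_mode(delayed); }
      };

      static void
      memtx_allocators_set_mode(bool delayed)
      {
          set_mode_f f{delayed};
          foreach_memtx_allocator(f);
      }
      ```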
    • memtx: introduce MemtxAllocator template class · ce2dcad5
      Nikita Pettik authored
      It is a wrapper (MemtxAllocator is parameterized by an ordinary
      allocator like Small or System) that encapsulates the allocator and
      provides its own delayed deletion strategy.

      Follow-up #5419
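      A minimal sketch of that wrapper idea, assuming a simple delayed
      list (the real class is more involved; names are illustrative):
      ```
      #include <cstddef>
      #include <list>

      template <class Allocator>
      class MemtxAllocator {
      public:
          static void *alloc(size_t size) { return Allocator::alloc(size); }

          static void free(void *ptr, size_t size)
          {
              if (delayed_free_mode)
                  delayed.push_back(Item{ptr, size}); /* free after snapshot */
              else
                  Allocator::free(ptr, size);
          }

          static void set_delayed_free_mode(bool enable)
          {
              delayed_free_mode = enable;
              if (!enable)
                  collect_garbage();
          }

      private:
          struct Item { void *ptr; size_t size; };

          static void collect_garbage()
          {
              for (Item &i : delayed)
                  Allocator::free(i.ptr, i.size);
              delayed.clear();
          }

          static inline bool delayed_free_mode = false;
          static inline std::list<Item> delayed;
      };
      ```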
    • memtx: introduce memtx_set_tuple_format_vtab() · 7417aed7
      Nikita Pettik authored
      This is a helper to set the proper tuple_format vtable depending on
      the allocator's symbolic name.

      Follow-up #5419
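      A hedged sketch of the helper's shape (the vtab instance names are
      invented; the real function likely installs the vtab rather than
      returning it):
      ```
      #include <cstring>
      #include <cassert>

      struct tuple_format_vtab; /* holds tuple_new/tuple_delete pointers */

      /* One vtab per allocator, defined elsewhere (names invented). */
      extern struct tuple_format_vtab memtx_small_format_vtab;
      extern struct tuple_format_vtab memtx_system_format_vtab;

      static struct tuple_format_vtab *
      memtx_set_tuple_format_vtab(const char *allocator)
      {
          if (strcmp(allocator, "small") == 0)
              return &memtx_small_format_vtab;
          if (strcmp(allocator, "system") == 0)
              return &memtx_system_format_vtab;
          assert(false);
          return nullptr;
      }
      ```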
    • memtx: move initialization of Small and Sys allocators to source file · 7104a533
      Nikita Pettik authored
      Let's initialize them in the dedicated allocator.cc source file
      
      Follow-up #5419
    • memtx: introduce foreach_allocator template walker · 446236e1
      Nikita Pettik authored
      It is supposed to generalize the workflow with allocators so that the
      same function (like create()/destroy() etc.) can be applied to each
      existing allocator. The newborn functions are not used yet since we
      are going to add more wrappers.
      
      Warning: dark template magic is involved, do not try what you're about
      to see at home.
      
      Follow-up #5419
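      The "dark template magic" boils down to something like this sketch
      (illustrative, using a C++17 fold expression over a type list):
      ```
      /* Stand-in allocators so the sketch compiles on its own. */
      struct SmallAlloc { static void create() {} };
      struct SysAlloc { static void create() {} };

      /* Apply f.invoke<Allocator>() to every allocator in the type list. */
      template <class F, class... Allocators>
      static void
      foreach_allocator_internal(F &f)
      {
          (f.template invoke<Allocators>(), ...);
      }

      template <class F>
      static void
      foreach_allocator(F &f)
      {
          foreach_allocator_internal<F, SmallAlloc, SysAlloc>(f);
      }

      /* Example functor: create() every allocator. */
      struct allocator_create {
          template <class Allocator>
          void invoke() { Allocator::create(); }
      };

      int main()
      {
          allocator_create f;
          foreach_allocator(f); /* SmallAlloc::create(), then SysAlloc::create() */
      }
      ```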
    • memtx: introduce allocator_settings structure · 96793697
      Nikita Pettik authored
      It is assumed to accumulate all allocation settings across all
      allocators in order to unify the Allocator::create() interface.

      Follow-up #5419
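      For illustration, such a structure might look like this sketch
      (field names are guesses, not the actual layout):
      ```
      #include <cstddef>

      struct quota; /* engine-wide memory quota, opaque here */

      /* Everything any Allocator::create() may need, in one place. */
      struct allocator_settings {
          struct quota *quota;  /* shared limit for small and system */
          size_t objsize_min;   /* small: minimal object size */
          float alloc_factor;   /* small: pool growth factor */
      };
      ```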
    • memtx: implement api for memory allocator selection · 1847ae07
      mechanik20051988 authored
      Add a new 'memtx_allocator' option to box.cfg{} which allows
      selecting the appropriate allocator for memtx tuples if necessary.
      Possible values are "system" for the malloc allocator and
      "small" for the default small allocator.

      Closes #5419

      @TarantoolBot document
      Title: Add new 'memtx_allocator' option to box.cfg{}
      Add a new 'memtx_allocator' option to box.cfg{} which allows
      selecting the appropriate allocator for memtx tuples if necessary.
      Possible values are "system" for the malloc allocator and
      "small" for the default small allocator.
    • memtx: implement system allocator, based on malloc · bc88b325
      mechanik20051988 authored
      The slab allocator, which is used for tuple allocation, has a
      certain disadvantage: it tends toward unresolvable fragmentation on
      certain workloads (size migration). In such cases the user should be
      able to choose another allocator. The system allocator is based on
      the malloc function but is restricted by the same quota as the slab
      allocator. It does not allocate all memory at start; instead, it
      allocates memory as needed, checking that the quota is not exceeded.
      Part of #5419
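      The core idea can be sketched like this (simplified; the real code
      uses Tarantool's quota object rather than a bare atomic):
      ```
      #include <atomic>
      #include <cstddef>
      #include <cstdlib>

      static std::atomic<size_t> used{0};
      static size_t quota_limit = 256 << 20; /* e.g. memtx_memory */

      static void *
      sys_alloc(size_t size)
      {
          /* Charge the quota first; roll back if it is exceeded. */
          if (used.fetch_add(size) + size > quota_limit) {
              used.fetch_sub(size);
              return nullptr;
          }
          void *ptr = malloc(size);
          if (ptr == nullptr)
              used.fetch_sub(size);
          return ptr;
      }

      static void
      sys_free(void *ptr, size_t size)
      {
          free(ptr);
          used.fetch_sub(size);
      }
      ```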
    • memtx: implement template tuple allocation · 94e137bc
      mechanik20051988 authored
      A patch which prepares the ability to select a memory allocator.
      Changed tuple allocation functions to templates parameterized by
      the memory allocator type.
      Part of #5419
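      In sketch form (signatures simplified), the change turns the
      allocation path into:
      ```
      #include <cstddef>

      struct tuple {}; /* header; msgpack data follows in the real layout */

      template <class Allocator>
      static struct tuple *
      memtx_tuple_new(size_t total)
      {
          /* Same code path for any allocator: small, system, ... */
          return static_cast<struct tuple *>(Allocator::alloc(total));
      }

      template <class Allocator>
      static void
      memtx_tuple_delete(struct tuple *t, size_t total)
      {
          Allocator::free(t, total);
      }
      ```
      A call site then instantiates memtx_tuple_new<SmallAlloc>() or
      memtx_tuple_new<SysAlloc>() depending on the configured allocator.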
    • memtx: convert some *.c files to *.cc · 02f71754
      mechanik20051988 authored
      In the patch with the choice of allocator for memtx, it was
      decided to use templates, so we need to change all associated
      files from *.c to *.cc. At the same time, changes were made to
      ensure C++ compilation: added explicit type casts and refactored
      code with goto statements that cross variable initialization.
      Part of #5419
    • memtx: replace direct function calls to calls via pointers from vtab · 801c906d
      mechanik20051988 authored
      Previously, memtx space used direct memtx_tuple_new/memtx_tuple_delete
      function calls, even though pointers to the functions used to
      allocate/free memory for memtx tuples are stored in tuple_format_vtab.
      Replaced the direct memtx_tuple_new and memtx_tuple_delete calls in
      memtx_space with calls via pointers from the vtab.
      Part of #5419
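      Schematically (simplified signatures), the call site change looks
      like this sketch:
      ```
      struct tuple;
      struct tuple_format;

      struct tuple_format_vtab {
          struct tuple *(*tuple_new)(struct tuple_format *format,
                                     const char *data, const char *end);
          void (*tuple_delete)(struct tuple_format *format, struct tuple *t);
      };

      struct tuple_format {
          struct tuple_format_vtab vtab;
      };

      static struct tuple *
      space_new_tuple(struct tuple_format *format, const char *data,
                      const char *end)
      {
          /* was: return memtx_tuple_new(format, data, end); */
          return format->vtab.tuple_new(format, data, end);
      }
      ```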
    • memtx: move delayed tuples deletion from small allocator to memtx · 2dd126fb
      mechanik20051988 authored
      The delayed free mode in the small allocator is only used to free
      tuple memory during snapshot creation. This is not directly related
      to the small allocator itself, so this code is moved into Tarantool's
      memtx_engine.c, where memtx tuple allocation/deallocation occurs.
    • memtx: fix memory allocation errors for distributed size tuples · fcfa6bcf
      mechanik20051988 authored
      Changed the small allocator strategy. In the previous version small
      allocated memory from the most appropriate pool. This strategy has
      one significant disadvantage: during memory allocation for tuples
      with highly distributed sizes, we allocate an entire slab for one or
      two objects; moreover, when they are later deleted, the slab is not
      released (see the spare slab in mempool). Now we divide mempools with
      the same slab size into groups containing no more than 32 pools.
      First, we allocate memory from the mempool with the largest size in
      the group; then, when the memory wasted for a certain object size as
      a result of non-optimal pool selection becomes larger than slab
      size / 4, we start allocating memory for that size from the most
      suitable mempool. At the same time, for other objects, we can use
      both of these mempools, in case the new mempool has a larger objsize.
      With this strategy we avoid losing memory.
      Also change the allocator behaviour with regard to saving the spare
      slab. With the new strategy we don't need to save a spare slab for
      every mempool, only for the last mempool in a group. This solves both
      problems: there is no unnecessary memory loss on spare slabs, and we
      prevent oscillations when a single object is repeatedly allocated.

      Closes #3633
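      A rough sketch of the selection rule described above (the threshold
      follows the message; the data structures are invented):
      ```
      #include <cstddef>

      /* Pools within one group, sorted by ascending objsize. */
      struct pool_info {
          size_t objsize;
          size_t waste;  /* bytes lost so far by serving this size elsewhere */
          bool active;   /* allocate from this exact pool? */
      };

      static pool_info *
      select_pool(pool_info *group, size_t nr, size_t size, size_t slab_size)
      {
          /* Best-fit pool for the requested size. */
          size_t i = 0;
          while (i < nr - 1 && group[i].objsize < size)
              i++;
          pool_info *fit = &group[i];
          if (fit->active)
              return fit;
          /* Serve from the group's largest pool, accounting the waste. */
          pool_info *largest = &group[nr - 1];
          fit->waste += largest->objsize - size;
          if (fit->waste > slab_size / 4) {
              fit->active = true; /* now worth a pool of its own */
              return fit;
          }
          return largest;
      }
      ```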
  2. Aug 17, 2021
    • test: fix box-py/args.test.py for new release policy · 7dfad83f
      Vladimir Davydov authored
      The test assumes that a version string looks like this:
      2.9.0-123-gabcabcababc. We want to append a flow string
      after <major>.<minor>.<patch>. Fix the test accordingly.

      Needed for #6183
    • luajit: bump new version · ba5398c7
      Igor Munkin authored
      * Fix bytecode register allocation for comparisons.
      * gdb: support LJ_DUALNUM mode
      
      Closes #6224
      Closes #6227
      Part of #5629
    • box: allow upgrading from version 1.6 · 9d9e9289
      Serge Petrenko authored
      Direct upgrade support from pre-1.7.5 versions was removed in commit
      7d3b80e7 (Forbid upgrade from Tarantool < 1.7.5 and refactor
      upgrade.lua).
      The reason for that was the mandatory space format checks introduced
      back then. With these space format checks, old schema couldn't be
      recovered on new Tarantool versions, because newer versions had
      different system space formats. So old schema couldn't be upgraded
      because it couldn't even be recovered.
      
      Actually this was rather inconvenient. One had to perform an extra
      upgrade step when upgrading from, say, 1.6 to 2.x: instead of
      performing a direct upgrade one had to do 1.6 -> 1.10 -> 2.x upgrade
      which takes twice the time.
      
      Make it possible to boot from snapshots coming from Tarantool version
      1.6.8 and above.
      
      In order to do so, introduce before_replace triggers on system spaces,
      which work during snapshot/xlog recovery. The triggers will set tuple
      formats to the ones supported by current Tarantool (2.x). This way the
      recovered data will have the correct format for a usual schema upgrade.
      
      Also add upgrade_to_1_7_5() handler, which finishes transformation of
      old schema to 1.7.5. The handler is fired together with other
      box.schema.upgrade() handlers, so there's no user-visible behaviour
      change.
      
      Side note: it would be great to use the same technique to allow booting
      from pre-1.6.8 snapshots. Unfortunately, this is not possible.
      
      Current triggers don't break the order of schema upgrades, so 1.7.1
      upgrades come before 1.7.2 and 1.7.5. This is because all the upgrades
      in these versions are replacing existing tuples and not inserting new
      ones, so the upgrades may be handled by the before_replace triggers.
      
      Upgrade to 1.6.8 requires inserting new tuples: creating sysviews, like
      _vspace, _vuser and so on. This can't be done from the before_replace
      triggers, so we would have to run triggers for 1.7.x first which would
      allow Tarantool to recover the snapshot, and then run an upgrade handler for
      1.6.8. This looks really messy.
      
      Closes #5894
    • lua: introduce table.equals method · 0afe1f78
      Serge Petrenko authored
      Introduce table.equals for comparing tables.
      The method respects __eq metamethod, if provided.
      
      Needed-for #5894
      
      @TarantoolBot document
      Title: lua: new method table.equals
      
      Document the new lua method table.equals
      It compares two tables deeply. For example:
      ```
      tarantool> t1 = {a=3}
      ---
      ...
      
      tarantool> t2 = {a=3}
      ---
      ...
      
      tarantool> t1 == t2
      ---
      - false
      ...
      
      tarantool> table.equals(t1, t2)
      ---
      - true
      ...
      ```
      The method respects the __eq metamethod. When both tables being
      compared have the same __eq metamethod, it is used for comparison
      (just like it is done in Lua 5.1).
    • replication: fix flaky election_basic test · fc3e6986
      Serge Petrenko authored
      Found the following error in our CI:
      
       Test failed! Result content mismatch:
       --- replication/election_basic.result	Fri Aug 13 13:50:26 2021
       +++ /build/usr/src/debug/tarantool-2.9.0.276/test/var/rejects/replication/election_basic.reject	Sat Aug 14 08:14:17 2021
       @@ -116,6 +116,7 @@
         | ...
        box.ctl.demote()
         | ---
       + | - error: box.ctl.demote does not support simultaneous invocations
         | ...
        --
      
      Even though box.ctl.demote() or box.ctl.promote() isn't called above the
      failing line, promote() is issued internally once the instance becomes
      the leader.
      
      Wait until the previous promote is finished
      (i.e. box.info.synchro.queue.owner is set).
    • applier: fix upstream.lag calculations · 884e93ad
      Serge Petrenko authored
      upstream.lag is the delta between the moment when a row was written to
      master's journal and the moment when it was received by the replica.
      It's an important metric to check whether the replica has fallen too far
      behind master.
      
      Not all the rows coming from master have a valid time of creation. For
      example, RAFT system messages don't have one, and we can't assign
      correct time to them: these messages do not originate from the journal,
      and assigning current time to them would lead to jumps in upstream.lag
      results.
      
      Stop updating upstream.lag for rows which don't have creation time
      assigned.
      
      The upstream.lag calculation changes were meant to fix the flaky
      replication/errinj.test:
      
       Test failed! Result content mismatch:
       --- replication/errinj.result	Fri Aug 13 15:15:35 2021
       +++ /tmp/tnt/rejects/replication/errinj.reject	Fri Aug 13 15:40:39 2021
       @@ -310,7 +310,7 @@
        ...
        box.info.replication[1].upstream.lag < 1
        ---
       -- true
       +- false
        ...
      
      But the changes were not enough, because now the test
      may see the initial lag value (TIMEOUT_INFINITY).
      So fix the test as well by waiting until upstream.lag becomes < 1.
  3. Aug 16, 2021
    • net.box: allow to store user-defined fields in future object · 7ffae819
      Vladimir Davydov authored
      Before commit 954194a1 ("net.box:
      rewrite request implementation in C"), net.box future was a plain Lua
      table so that the caller could attach extra information to it. Now it
      isn't true anymore - a future is a userdata object, and it doesn't have
      indexing methods.
      
      For backward compatibility, let's add __index and __newindex fields and
      store user-defined fields in a Lua table, which is created lazily on the
      first __newindex invocation. __index falls back on the metatable methods
      if a field isn't found in the table.
      
      Follow-up #6241
      Closes #6306
    • net.box: do not yield in future.wait_result(0) · 7b4eb172
      Vladimir Davydov authored
      It didn't yield before commit 954194a1 ("net.box: rewrite request
      implementation in C"). It shouldn't yield now.
      
      Follow-up #6241
    • txm: disallow yields after DDL operation in TX · 8f4be322
      Nikita Pettik authored
      To avoid sharing metadata objects (ergo phantom reads) between
      different transactions in MVCC mode, let's do the following things.
      Firstly, let's set an on_replace trigger on all system spaces (a
      content change in a system space is considered a DDL operation)
      which disables yields until the transaction is committed. The only
      exceptions are index build and space format check: during these
      operations yields are allowed since they may take a while (so
      without yields they would block execution). Actually this is not a
      problem 'cause these two operations must be first-in-transaction:
      as a result a transaction can't contain two yielding statements. So
      after any cache modification no yields take place for sure.
      Secondly, on committing a transaction that makes DDL changes, let's
      abort all other transactions since they may refer to obsolete schema
      objects. The last restriction may seem too strict, but it is OK as a
      primitive workaround until transactional DDL is introduced. In fact
      we should only abort transactions that have read dirty (i.e.
      modified) objects.

      Closes #5998
      Closes #6140
      Workaround for #6138
  4. Aug 14, 2021
    • txm: add one more test · d551a758
      Aleksandr Lyapunov authored
      It seems that the issue was fixed in one of the previous commits.
      Just add the test. No logical changes.

      Closes #5801
    • txm: check memtx_tx_handle_gap_write return code · 3b59f4eb
      Aleksandr Lyapunov authored
      The return code was not checked, and thus in case of a memory error
      we could lose conflicts. Fix it.

      Follow-up #6040
    • txm: track duplicated tuples · 2cafa623
      Aleksandr Lyapunov authored
      There was a bug when a transaction makes a wrong statement that is
      aborted because of a duplicate tuple in a primary or secondary index.
      The problem is that the check for an existing tuple is an implicit
      read with the usual side effects.

      This patch tracks that kind of read like ordinary reads.

      Part of #5999
    • txm: track read instead of direct conflict in clarify · b7065d33
      Aleksandr Lyapunov authored
      After the previous patch it became possible to link read trackers
      to in-progress stories.

      This patch uses one read tracker instead of a bunch of direct
      conflicts in tuple_clarify. This is a bit more accurate. It also
      allows avoiding an unnecessary conflict when a transaction reads
      its own change.

      Part of #5999
    • txm: use read trackers instead of direct conflict · 32911677
      Aleksandr Lyapunov authored
      Before this patch, when a transaction performed a write to a read
      gap (interval), a conflict record was created for the reader of
      this gap. That is wrong since the next writer of the same value
      will not find a gap - the gap has been split into parts.

      This patch fixes that and creates a special read tracker that is
      designed specifically for further tracking of writes.

      This also requires the writer to search for read trackers not only
      in prepared stories but in in-progress stories too.

      Part of #5999
    • txm: make replace in indexes less dependent · cc24a1d3
      Aleksandr Lyapunov authored
      There was an obvious bug in the transactional manager's GC.

      There can be stories about deleted tuples. In other words, tuples
      were deleted, but their stories remain in history for some time.
      That means that pointers to dirty tuples are left in indexes,
      while the stories say that those tuples are deleted.

      When GC comes, it must remove the pointers to the tuple from the
      indexes too. That is simple to check: if a story is on top of a
      chain, it must be in the index, and if it is a story about a
      deleted tuple, it must be removed from the index. But that story
      must also be unlinked from the chain, and the next story becomes
      the top of the chain, but (1) it in turn must not try to delete its
      tuple from the index - we have already done that when deleting the
      first tuple. For this purpose we mark the next story with
      space = NULL.

      The problem is that setting space = NULL works for every index at
      once, while sometimes we have to handle each index independently.

      Fortunately, the previous commit introduced the in_index member of
      the story's link, NULL by default. We can just leave that NULL in
      the older story as a mark that it is not in the index. This commit
      does so and fixes the bug.

      Closes #6234
    • txm: store pointers to indexes in story · 914a36da
      Aleksandr Lyapunov authored
      There was a tricky problem in the TX manager that could lead to a
      crash after deletion of a space.

      When a space is deleted, the TX manager uses a special callback to
      remove dirty tuples from indexes. It is necessary for correct
      destruction of the space and indexes.

      The problem is that the actual space drop works in several steps,
      deleting secondary indexes and then deleting the primary index.
      Each step is an independent alter. And alters are tricky.

      For example, we had a struct space instance, namely S1, with
      two indexes I1 and I2. At the first step we have to delete the
      second index. By design, for that purpose a new instance of the
      space is created, namely S2, with one empty index I3. Then the
      spaces exchange their indexes, so S1 ends up with I3 and I2,
      and S2 owns I1. After that S1 is deleted. That is fine until we
      try to do a story cleanup: all the dirty tuples remain in S2.I1,
      while we try to clean the empty S1.I3.

      The only way to fix it is to store the index pointer right in the
      story to make sure we are cleaning the right index.

      Part of #6234
      Closes #6274
    • txm: fix iterators for hash index · 6571afcb
      Egor Elchinov authored
      MVCC used not to track hash index writes.
      This patch fixes the problem by transferring readers that use
      `ITER_ALL` or `ITER_GT` iterators of a hash index to a read view
      after any subsequent external write to this index.

      Closes #6040
    • txm: track read more carefully · 9db816b1
      Aleksandr Lyapunov authored
      The previous commit fixed a bug that caused dirty reads but also
      introduced a much less significant problem: excess conflicts in
      some cases.

      Usually, if a reader reads a tuple, a special record is stored in
      its story. Any write that replaces or deletes that tuple can then
      cause a conflict of the current transaction.

      The problem happened when a reader tried to execute a select from
      some index, but only a deleted story was found there. The record is
      stored and that is good - we must know when somebody inserts a
      tuple into this place in the index. But actually we need to know it
      only for the index from which the reader executed the select.

      This patch introduces a special index mask in the read tracker that
      is used in the case above to be more precise in conflict detection.

      Closes #6206
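      The mask idea in sketch form (field and function names invented):
      ```
      #include <cstdint>

      struct tx_read_tracker {
          uint64_t index_mask; /* bit i set: the read went through index i */
      };

      static void
      track_read(struct tx_read_tracker *tracker, uint32_t index_id)
      {
          tracker->index_mask |= 1ULL << index_id;
      }

      static bool
      write_conflicts(const struct tx_read_tracker *tracker, uint32_t index_id)
      {
          /* Conflict only if the write hits an index the reader touched. */
          return (tracker->index_mask & (1ULL << index_id)) != 0;
      }
      ```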
    • txm: track deleted stories · dade56ac
      Aleksandr Lyapunov authored
      In order to preserve repeated reads, the transactional manager
      tracks the reads of each transaction. Generally reads can be of two
      types: those that have read a tuple and those that have found
      nothing. The first are stored in the tuple story, the second in
      special gap and hole structures.

      The problem was that reads that found a dirty tuple invisible to
      this transaction (the story says that it is deleted) were stored
      neither in a story nor in gaps/holes.

      This patch fixes that.

      Part of #6206
    • txm: avoid excess conflict while reading gaps · 9d42ad47
      Aleksandr Lyapunov authored
      During iteration a memtx tree index must write gap records to the
      TX manager. It is done in order to detect further writes to those
      gaps and execute some logic preventing phantom reads.

      There are two cases when such a gap is stored:
       * The iterator reads the next tuple: the gap is between two tuples.
       * The iterator finished reading: the gap is between the previous
      tuple and the key boundary.

      By mistake these two cases were not distinguished correctly, and
      that led to excess conflicts.

      This patch fixes it.

      Part of #6206
    • txm: simplify construction of tx_read_tracker · b6fab015
      Aleksandr Lyapunov authored
      Just add a function that allocates and initializes the structure.
      No logical changes.
      
      Part of #6206
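      I.e., something of this shape (the fields are illustrative; the
      real structure carries more links):
      ```
      #include <cstdint>
      #include <cstdlib>

      struct txn;         /* the reading transaction */
      struct memtx_story; /* the story the read touched */

      struct tx_read_tracker {
          struct txn *reader;
          struct memtx_story *story;
          uint64_t index_mask;
      };

      /* Allocate and initialize a tracker in one helper. */
      static struct tx_read_tracker *
      tx_read_tracker_new(struct txn *reader, struct memtx_story *story,
                          uint64_t index_mask)
      {
          struct tx_read_tracker *t =
              static_cast<struct tx_read_tracker *>(malloc(sizeof(*t)));
          if (t == nullptr)
              return nullptr;
          t->reader = reader;
          t->story = story;
          t->index_mask = index_mask;
          return t;
      }
      ```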
    • txm: split memtx_tx_track_read method into two parts · 2bf4484f
      Aleksandr Lyapunov authored
      No logical changes; only a simplification for the next commit.

      Part of #6206
    • txm: simplify code with check_dup_common function · eebf7ba0
      Aleksandr Lyapunov authored
      Implement check_dup_common function that calls either
      check_dup_clean or check_dup_dirty.
      
      No logical changes.
      
      Follow-up #6132