  1. Dec 11, 2019
  2. Dec 10, 2019
    • Vladislav Shpilevoy's avatar
      errinj: provide 'get' method in Lua · c3c6d3fc
      Vladislav Shpilevoy authored
      Error injections are used to simulate an error. They are
      represented as a flag or a number, and are used in Lua tests. But
      they provide no feedback. That makes it impossible to use the
      injections to check that something has happened, even when it
      really needs to be checked and cannot be checked any other way.
      
      More specifically, the patch is motivated by the need to count
      loaded dynamic libraries to ensure that they are loaded and
      unloaded when expected. This is impossible to do in a
      platform-independent way. But an error injection used as a
      debug-only counter would solve the problem.
      
      Needed for #4648
      c3c6d3fc
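
      The idea of a readable, debug-only counter can be sketched as
      follows. This is a minimal illustration, not Tarantool's errinj
      code: the registry layout, the SKETCH_* names, and every function
      below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical injection kinds: a boolean flag or an integer value. */
enum errinj_sketch_type { SKETCH_BOOL, SKETCH_INT };

struct errinj_sketch {
	const char *name;
	enum errinj_sketch_type type;
	union {
		bool bparam;
		long long iparam;
	} u;
};

/* Hypothetical registry; these names are invented for the example. */
static struct errinj_sketch sketch_registry[] = {
	{ "SKETCH_FAIL_WRITE", SKETCH_BOOL, { .bparam = false } },
	{ "SKETCH_DYN_LIB_COUNT", SKETCH_INT, { .iparam = 0 } },
};

/* Find an injection by name, or return NULL if there is none. */
struct errinj_sketch *
errinj_sketch_by_name(const char *name)
{
	size_t n = sizeof(sketch_registry) / sizeof(sketch_registry[0]);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(sketch_registry[i].name, name) == 0)
			return &sketch_registry[i];
	}
	return NULL;
}

/* Production-side hook: count a library that has just been loaded. */
void
sketch_on_lib_load(void)
{
	struct errinj_sketch *e = errinj_sketch_by_name("SKETCH_DYN_LIB_COUNT");
	assert(e != NULL && e->type == SKETCH_INT);
	e->u.iparam++;
}

/* Production-side hook: count a library that has just been unloaded. */
void
sketch_on_lib_unload(void)
{
	struct errinj_sketch *e = errinj_sketch_by_name("SKETCH_DYN_LIB_COUNT");
	assert(e != NULL && e->type == SKETCH_INT);
	e->u.iparam--;
}

/* Test-side read: the 'get' half that a set-only interface lacks. */
long long
errinj_sketch_get_int(const char *name)
{
	struct errinj_sketch *e = errinj_sketch_by_name(name);
	assert(e != NULL && e->type == SKETCH_INT);
	return e->u.iparam;
}
```

      A test would call the getter before and after the action under
      test and compare the two values.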
  3. Oct 21, 2019
    • Ilya Kosarev's avatar
      recovery: build secondary index in hot standby mode · 5aa243de
      Ilya Kosarev authored
      End recovery (which means building secondary indexes) right after
      the last known log file has been read. This allows a fast switch to
      a hot standby instance without any delay for secondary indexes to
      be built. Because the engine_end_recovery call is moved,
      xdir_collect_inprogress, which used to be called from it, is now
      moved to the garbage collector.
      
      Closes #4135
      5aa243de
  4. Sep 13, 2019
    • Vladimir Davydov's avatar
      relay: join new replicas off read view · 6332aca6
      Vladimir Davydov authored
      Historically, we join a new replica off the last checkpoint. As a
      result, we must always keep the last memtx snapshot and all vinyl data
      files corresponding to it. Actually, there's no need to use the last
      checkpoint for joining a replica. Instead we can use the current read
      view as both memtx and vinyl support it. This should speed up the
      process of joining a new replica, because we don't need to replay all
      xlogs written after the last checkpoint, only those that are accumulated
      while we are relaying the current read view. This should also allow us
      to avoid creating a snapshot file on bootstrap, because the only reason
      why we need it is allowing joining replicas. Besides, this is a step
      towards decoupling the vinyl metadata log from checkpointing in
      particular and from xlogs in general.
      
      Closes #1271
      6332aca6
  5. Aug 20, 2019
    • Kirill Shcherbatov's avatar
      sql: GREATEST, LEAST instead of MIN/MAX overload · a46b5200
      Kirill Shcherbatov authored
      This patch does two things: renames existing scalar min/max
      functions and reserves names for them in NoSQL cache.
      
      Moreover, it is an important step towards getting rid of function
      name overloading, which is required to replace the FuncDef cache
      with Tarantool's function cache.
      
      Closes #4405
      Needed for #2200, #4113, #2233
      
      @TarantoolBot document
      Title: Scalar functions MIN/MAX are renamed to LEAST/GREATEST
      
      The MIN/MAX functions are typically used only as aggregate
      functions in other RDBMSs (MSSQL, PostgreSQL, MySQL, Oracle), while
      Tarantool's legacy SQLite code also uses them as the scalar
      functions GREATEST/LEAST. This is now fixed.
      a46b5200
  6. Aug 14, 2019
    • Vladimir Davydov's avatar
      wal: make wal_sync fail on write error · 2d5e56ff
      Vladimir Davydov authored
      wal_sync() simply flushes the tx<->wal request queue; it doesn't
      guarantee that all pending writes are successfully committed to disk.
      This works for now, but in order to implement replica join off the
      current read view, we need to make sure that all pending writes have
      been persisted and won't be rolled back before we can use memtx
      snapshot iterators. So this patch adds a return code to wal_sync():
      from now on it returns -1 if rollback is in progress and hence
      some in-memory changes are going to be rolled back. We will use
      this method after opening memtx snapshot iterators used for feeding
      a consistent read view to a newly joined replica so as to ensure that
      changes frozen by the iterators have made it to the disk.
      2d5e56ff
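
      The contract described above might look like this in miniature.
      The structure and both functions are hypothetical stand-ins: the
      real wal_sync() talks to the WAL thread, which this sketch reduces
      to two fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the WAL writer state. */
struct wal_sketch {
	int pending_requests;
	bool in_rollback;
};

/*
 * Flush the request queue and report whether the flushed writes can
 * be trusted: 0 on success, -1 if a rollback is in progress.
 */
int
wal_sync_sketch(struct wal_sketch *wal)
{
	assert(wal != NULL);
	wal->pending_requests = 0; /* stand-in for flushing the queue */
	return wal->in_rollback ? -1 : 0;
}

/*
 * Caller side: open the read view first, then make sure everything
 * it froze has reached the disk. Opening the iterators is omitted.
 */
int
read_view_open_sketch(struct wal_sketch *wal)
{
	if (wal_sync_sketch(wal) != 0)
		return -1; /* frozen changes may still be rolled back */
	return 0;
}
```

      The ordering is the point of the sketch: syncing after the
      iterators are opened covers every change the read view can see.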
  7. Jul 05, 2019
    • Vladislav Shpilevoy's avatar
      test: redo some swim tests using error injections · a0d6ac29
      Vladislav Shpilevoy authored
      There were tests relying on certain content of SWIM messages.
      After the next patches these conditions won't hold without
      explicit intervention via error injections.
      
      The patchset moves these tests to separate release-disabled
      files.
      
      Part of #4253
      a0d6ac29
  8. Jul 04, 2019
  9. Jun 11, 2019
  10. Jun 04, 2019
  11. May 30, 2019
    • Serge Petrenko's avatar
      test: move background index build test to engine suite from vinyl · 39d0e427
      Serge Petrenko authored
      Since we have implemented memtx background index build, the
      corresponding vinyl test cases are now also suitable for memtx,
      so move them to engine suite so that both engines are tested.
      Also add some tests to check that an ongoing index build is aborted in
      case a tuple violating unique constraint or format of the new index is
      inserted.
      Add some error injections to unify appropriate memtx/vinyl tests.
      
      Closes #3976
      39d0e427
  12. May 07, 2019
    • Vladimir Davydov's avatar
      box: zap field_map_get_size · 55e1a140
      Vladimir Davydov authored
      Turns out we don't really need it as we can use data_offset + bsize
      (i.e. the value returned by tuple_size() helper function) to get the
      size of a tuple to free. We only need to take into account the offset
      of the base tuple struct in the derived struct (memtx_tuple).
      
      There's a catch though:
      
       - We use sizeof(struct memtx_tuple) + field_map_size + bsize for
         allocation size.
       - We set data_offset to sizeof(struct tuple) + field_map_size.
       - struct tuple is packed, which makes its size 10 bytes.
       - memtx_tuple embeds struct tuple (base) at 4 byte offset, but since
         it is not packed, its size is 16 bytes, NOT 4 + 10 = 14 bytes as
         one might expect!
       - This means data_offset + bsize + offsetof(struct memtx_tuple, base)
         doesn't equal allocation size.
      
      To fix that, let's mark memtx_tuple packed. The only side effect it has
      is that we save 2 bytes per each memtx tuple. It won't affect tuple data
      layout at all, because struct memtx_tuple already has a packed layout
      and so 'packed' will only affect its size, which is only used for
      computing allocation size.
      
      My bad, I overlooked it during review.
      
      Follow-up f1d9f257 ("box: introduce multikey indexes in memtx").
      55e1a140
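
      The size mismatch described in the catch list can be reproduced
      with a small sketch. The field names below are hypothetical and
      chosen only so that the packed base struct is 10 bytes and sits at
      a 4-byte offset, as in the message; `__attribute__((packed))` is
      the GCC/Clang spelling, and the sizes assume a platform where
      uint32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed base struct: 2 + 2 + 4 + 2 = 10 bytes, alignment 1. */
struct __attribute__((packed)) tuple_sketch {
	uint16_t refs;
	uint16_t format_id;
	uint32_t bsize;
	uint16_t data_offset;
};

/* Not packed: 4 + 10 = 14 bytes of members, padded up to 16. */
struct memtx_tuple_unpacked {
	uint32_t version;
	struct tuple_sketch base;
};

/* Packed: exactly 14 bytes, same member offsets as above. */
struct __attribute__((packed)) memtx_tuple_packed {
	uint32_t version;
	struct tuple_sketch base;
};

/* Size used at allocation time with the unpacked derived struct. */
size_t
alloc_size_unpacked(size_t field_map_size, size_t bsize)
{
	return sizeof(struct memtx_tuple_unpacked) + field_map_size + bsize;
}

/* Size recomputed at free time from data_offset + bsize. */
size_t
free_size_unpacked(size_t field_map_size, size_t bsize)
{
	size_t data_offset = sizeof(struct tuple_sketch) + field_map_size;
	return offsetof(struct memtx_tuple_unpacked, base) + data_offset + bsize;
}

/* The same two computations with the packed derived struct. */
size_t
alloc_size_packed(size_t field_map_size, size_t bsize)
{
	return sizeof(struct memtx_tuple_packed) + field_map_size + bsize;
}

size_t
free_size_packed(size_t field_map_size, size_t bsize)
{
	size_t data_offset = sizeof(struct tuple_sketch) + field_map_size;
	return offsetof(struct memtx_tuple_packed, base) + data_offset + bsize;
}
```

      With the unpacked struct the two sizes differ by the 2 bytes of
      tail padding; with the packed one they agree, while the offset of
      the base member stays at 4 in both variants.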
  13. Apr 29, 2019
  14. Apr 16, 2019
    • Vladimir Davydov's avatar
      vinyl: fix crash during index build · ccd46a27
      Vladimir Davydov authored
      To propagate changes applied to a space while a new index is being
      built, we install an on_replace trigger. In case the on_replace
      trigger callback fails, we abort the DDL operation.
      
      The problem is the trigger may yield, e.g. to check the unique
      constraint of the new index. This opens a time window for the DDL
      operation to complete and clear the trigger. If this happens, the
      trigger will try to access the outdated build context and crash:
      
       | #0  0x558f29cdfbc7 in print_backtrace+9
       | #1  0x558f29bd37db in _ZL12sig_fatal_cbiP9siginfo_tPv+1e7
       | #2  0x7fe24e4ab0e0 in __restore_rt+0
       | #3  0x558f29bfe036 in error_unref+1a
       | #4  0x558f29bfe0d1 in diag_clear+27
       | #5  0x558f29bfe133 in diag_move+1c
       | #6  0x558f29c0a4e2 in vy_build_on_replace+236
       | #7  0x558f29cf3554 in trigger_run+7a
       | #8  0x558f29c7b494 in txn_commit_stmt+125
       | #9  0x558f29c7e22c in box_process_rw+ec
       | #10 0x558f29c81743 in box_process1+8b
       | #11 0x558f29c81d5c in box_upsert+c4
       | #12 0x558f29caf110 in lbox_upsert+131
       | #13 0x558f29cfed97 in lj_BC_FUNCC+34
       | #14 0x558f29d104a4 in lua_pcall+34
       | #15 0x558f29cc7b09 in luaT_call+29
       | #16 0x558f29cc1de5 in lua_fiber_run_f+74
       | #17 0x558f29bd30d8 in _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_+1e
       | #18 0x558f29cdca33 in fiber_loop+41
       | #19 0x558f29e4e8cd in coro_init+4c
      
      To fix this issue, let's recall that when a DDL operation completes,
      all pending transactions that affect the altered space are aborted by
      the space_invalidate callback. So to avoid the crash, we just need to
      bail out early from the on_replace trigger callback if we detect that
      the current transaction has been aborted.
      
      Closes #4152
      ccd46a27
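
      The early bail-out can be illustrated with a reduced model of the
      trigger. All types and functions here are hypothetical; in
      particular, returning 0 for an already aborted transaction is an
      assumption of this sketch, not something the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct txn_sketch { bool is_aborted; };
struct build_ctx_sketch { bool is_failed; };

/* A check that may yield, e.g. a unique constraint lookup. */
typedef int (*check_fn)(struct txn_sketch *txn);

int
build_on_replace_sketch(struct txn_sketch *txn,
			struct build_ctx_sketch *ctx, check_fn check_unique)
{
	/* May yield: the DDL can complete and abort txn meanwhile. */
	int rc = check_unique(txn);
	/* The build context may be gone by now, so do not touch it. */
	if (txn->is_aborted)
		return 0;
	assert(ctx != NULL);
	if (rc != 0)
		ctx->is_failed = true; /* abort the DDL operation */
	return rc;
}

/* Simulates a yield during which the DDL completes. */
int
check_aborted_meanwhile(struct txn_sketch *txn)
{
	txn->is_aborted = true;
	return -1;
}

/* Simulates a plain constraint violation with no concurrent DDL. */
int
check_fails(struct txn_sketch *txn)
{
	(void)txn;
	return -1;
}
```

      Passing a NULL context together with check_aborted_meanwhile
      shows that the aborted path never dereferences the context.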
  15. Apr 10, 2019
    • Vladimir Davydov's avatar
      test: fix vinyl/errinj_ddl failure · f41d2999
      Vladimir Davydov authored
      This patch fixes the following test failures:
      
       | --- vinyl/errinj_ddl.result	Tue Mar 19 17:52:48 2019
       | +++ vinyl/errinj_ddl.reject	Tue Mar 19 19:05:36 2019
       | @@ -358,7 +358,7 @@
       | ...
       | s.index.sk:stat().memory.rows
       | ---
       | -- 27
       | +- 23
       | ...
       | test_run:cmd('restart server default')
       | fiber = require('fiber')
      
      This happens, because creation of the test index can happen later than
      we expect. Fix it by adding an appropriate wait_cond.
      
       | --- vinyl/errinj_ddl.result	Tue Mar 19 17:52:48 2019
       | +++ vinyl/errinj_ddl.reject	Tue Mar 19 18:07:55 2019
       | @@ -504,6 +504,7 @@
       | ...
       | _ = s1:create_index('sk', {parts = {2, 'unsigned'}})
       | ---
       | +- error: Tuple field 2 required by space format is missing
       | ...
       | errinj.set("ERRINJ_VY_READ_PAGE_TIMEOUT", 0)
       | ---
      
      This one is due to a test transaction completing before DDL starts so
      that the transaction isn't aborted by DDL, as we expect. Fix it by
      making sure the transaction won't commit before DDL starts, again with
      the aid of wait_cond.
      
       | --- vinyl/errinj_ddl.result     Wed Apr 10 18:59:57 2019
       | +++ vinyl/errinj_ddl.reject     Wed Apr 10 19:05:35 2019
       | @@ -779,7 +779,7 @@
       |  ...
       |  ch1:get()
       |  ---
       | -- Transaction has been aborted by conflict
       | +- Duplicate key exists in unique index 'i1' in space 'test'
       |  ...
       |  ch2:get()
       |  ---
      
      This test case fails, because we use a timeout to stall reading DML
      operations. This was initially a bad call, because under severe load
      (e.g. parallel test run), the timeout may fire before we get to execute
      the DDL request, which is supposed to abort the DML operations, in which
      case they won't be aborted. Fix this by replacing the timeout with a
      delay, as we should have done right from the start.
      
      Closes #4056
      Closes #4057
      f41d2999
  16. Mar 27, 2019
  17. Mar 26, 2019
    • Serge Petrenko's avatar
      test: fix long_row_timeout.test.lua failure in parallel mode · 17acae1f
      Serge Petrenko authored
      The test used to write big rows (20 MB in size), so when run in parallel
      mode, it put a high load on the disk and processor, which made appliers
      time out multiple times during read, and caused the test to fail
      occasionally.
      So, instead of writing huge rows in the test, introduce a new error
      injection restricting sio from reading more than a couple of bytes per
      request. This ensures that the test is still relevant and makes it a lot
      more lightweight.
      
      Closes #4062
      17acae1f
  18. Feb 08, 2019
    • Georgy Kirichenko's avatar
      wal: do not promote wal vclock for failed writes · 066b929b
      Georgy Kirichenko authored
      WAL used to promote the vclock prior to writing the row. This led
      to a situation where the master's row would be skipped forever if
      there was an error while writing it. However, some errors are
      transient, and we might be able to successfully apply the same row
      later. So we no longer promote the writer vclock before the write
      succeeds, in order to be able to restart replication from the
      failing point.
      
      Obsoletes xlog/panic_on_lsn_gap.test.
      
      Needed for #2283
      066b929b
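
      The ordering change can be sketched with a single LSN standing in
      for the vclock. The writer struct, the callback type, and the two
      fake write functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical writer state: one LSN instead of a full vclock. */
struct wal_writer_sketch { long long vclock_lsn; };

typedef int (*write_fn)(long long lsn);

/* Promote the vclock only after the row has been written. */
int
wal_write_row_sketch(struct wal_writer_sketch *w, long long lsn,
		     write_fn do_write)
{
	assert(w != NULL && lsn > w->vclock_lsn);
	if (do_write(lsn) != 0)
		return -1; /* vclock untouched: the row can be retried */
	w->vclock_lsn = lsn;
	return 0;
}

/* Fake disk writes for the example. */
int
failing_write(long long lsn)
{
	(void)lsn;
	return -1;
}

int
ok_write(long long lsn)
{
	(void)lsn;
	return 0;
}
```

      After a failed write the same LSN is still ahead of the vclock,
      so a retry of the same row is accepted rather than skipped.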
  19. Jan 25, 2019
    • Kirill Yukhin's avatar
      Allow to reuse tuple_formats for ephemeral spaces · dbbd9317
      Kirill Yukhin authored
      Under heavy load with SQL queries, ephemeral spaces might be
      used extensively, so it is possible to run out of tuple_formats
      for such spaces. This occurs because a tuple_format is not
      deleted immediately when an ephemeral space is dropped. Its
      removal is postponed instead and triggered only when tuple
      memory is exhausted.
      Since there's no way to alter an ephemeral space's format,
      let's reuse formats for multiple ephemeral spaces in case
      they're identical.
      
      Closes #3924
      dbbd9317
  20. Dec 06, 2018
    • Sergei Voronezhskii's avatar
      test: errinj for pause relay_send · 1c34c91f
      Sergei Voronezhskii authored
      Instead of using a timeout, we just need to pause `relay_send`. We
      can't rely on a timeout because system load varies in parallel
      mode. Add a new errinj that checks a boolean in a loop and does not
      let `relay_send` proceed to the next statement while it is `True`.

      To check the read-only mode, we need to modify a tuple. Calling
      the `replace` method is enough, instead of calling `delete` and
      then needlessly verifying with `get` that the tuple has not been
      deleted.

      Also, look up the xlog files in a loop with a short sleep until the
      file count is as expected.
      
      Update box/errinj.result because new errinj was added.
      
      Part of #2436, #3232
      1c34c91f
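
      A pause-style injection of the kind described above could be
      modelled like this. The struct, the yield callback, and the helper
      that releases the pause are hypothetical; in the real code the
      yield would hand control to other fibers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pause flag plus a counter of loop iterations. */
struct pause_sketch {
	bool paused;
	int spins;
};

typedef void (*yield_fn)(void *arg);

/*
 * Block the caller for as long as the flag is set. Unlike a timeout,
 * this cannot expire early on a loaded machine: only clearing the
 * flag lets execution continue.
 */
void
pause_point_sketch(struct pause_sketch *p, yield_fn yield, void *arg)
{
	assert(p != NULL && yield != NULL);
	while (p->paused) {
		p->spins++;
		yield(arg);
	}
}

/* Stand-in for the test clearing the flag from another fiber. */
void
release_after_three(void *arg)
{
	struct pause_sketch *p = arg;
	if (p->spins >= 3)
		p->paused = false;
}
```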
  21. Nov 29, 2018
    • Vladimir Davydov's avatar
      test: fix vinyl/errinj spurious failure · 8e13153b
      Vladimir Davydov authored
      The failing test case checks that modifications done to the space during
      the final dump of a newly built index are recovered properly. It assumes
      that a series of operations will complete in 0.1 seconds, but it may not
      happen if the disk is slow (like on Travis CI). This results in spurious
      failures. To fix this issue, let's replace ERRINJ_VY_RUN_WRITE_TIMEOUT
      used by the test with ERRINJ_VY_RUN_WRITE_DELAY, which blocks index
      creation until it is disabled instead of injecting a time delay as its
      predecessor did.
      
      Closes #3756
      8e13153b
  22. Oct 25, 2018
    • Vladimir Davydov's avatar
      wal: delete old wal files when running out of disk space · 8a1bdc82
      Vladimir Davydov authored
      Now if the WAL thread fails to preallocate disk space needed to commit
      a transaction, it will delete old WAL files until it succeeds or it
      deletes all files that are not needed for local recovery from the oldest
      checkpoint. After it deletes a file, it notifies the garbage collector
      via the WAL watcher interface. The latter then deactivates consumers
      that would need deleted files.
      
      The user doesn't see an ENOSPC error if the WAL thread successfully
      allocates disk space after deleting old files. Here's what's printed
      to the log when this happens:
      
        wal/101/main C> ran out of disk space, try to delete old WAL files
        wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000005.xlog
        wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000006.xlog
        wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000007.xlog
        main/105/main C> deactivated WAL consumer replica 82d0fa3f-6881-4bc5-a2c0-a0f5dcf80120 at {1: 5}
        main/105/main C> deactivated WAL consumer replica 98dce0a8-1213-4824-b31e-c7e3c4eaf437 at {1: 7}
      
      Closes #3397
      8a1bdc82
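
      The retry loop described above can be sketched against a toy
      model of the WAL directory. Everything here is hypothetical: the
      struct, the byte accounting, and the function names; notifying the
      garbage collector is reduced to a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model of a WAL directory. */
struct wal_dir_sketch {
	long free_bytes;
	long file_size[8];
	int first_file;   /* index of the oldest existing WAL file */
	int first_needed; /* oldest file needed for local recovery */
	int removed;      /* number of files deleted so far */
};

/* Fake preallocation: fails with ENOSPC when space is short. */
int
preallocate_sketch(struct wal_dir_sketch *d, long need)
{
	if (d->free_bytes < need) {
		errno = ENOSPC;
		return -1;
	}
	d->free_bytes -= need;
	return 0;
}

/*
 * Reserve space for a transaction, deleting old WAL files one by one
 * until the reservation succeeds or only files needed for recovery
 * from the oldest checkpoint remain.
 */
int
wal_reserve_sketch(struct wal_dir_sketch *d, long need)
{
	assert(d != NULL && need >= 0);
	while (preallocate_sketch(d, need) != 0) {
		if (errno != ENOSPC)
			return -1;
		if (d->first_file >= d->first_needed) {
			errno = ENOSPC; /* nothing left to delete */
			return -1;
		}
		d->free_bytes += d->file_size[d->first_file++];
		d->removed++;
		/* Real code would notify the garbage collector here. */
	}
	return 0;
}
```

      The caller sees ENOSPC only in the second case, when every
      remaining file is still needed for recovery.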
  23. Sep 22, 2018
  24. Sep 20, 2018
    • Serge Petrenko's avatar
      test: remove universal grants from tests · af6b554b
      Serge Petrenko authored
      This patch rewrites all tests to grant only necessary privileges, not
      privileges to universe. This was made possible by bugfixes in access
      control, patches #3516, #3574, #3524, #3530.
      
      Follow-up #3530
      af6b554b
  25. Sep 19, 2018
    • Vladimir Davydov's avatar
      vinyl: keep track of compaction queue length · 06e70cad
      Vladimir Davydov authored
      Currently, there's no way to figure out whether compaction keeps up
      with dumps, although this is essential for implementing transaction
      throttling. This patch adds a metric that is supposed to help answer
      this question: the compaction queue size. It is calculated per
      range and per LSM tree as the total size of slices awaiting compaction.
      We update the metric along with the compaction priority of a range, in
      vy_range_update_compact_priority(), and account it to an LSM tree in
      vy_lsm_acct_range(). For now, the new metric is reported only on a
      per-index basis, in index.stat() under disk.compact.queue.
      06e70cad
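
      The accounting can be sketched as below. The structs and function
      names are hypothetical, and the sketch assumes that the compaction
      priority of a range is the number of its newest slices scheduled
      to be compacted together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range: slice sizes ordered from newest to oldest. */
struct range_sketch {
	size_t slice_size[8];
	int slice_count;
	int compact_priority; /* assumed: number of slices to compact */
	size_t compact_queue; /* total size of slices awaiting compaction */
};

/* Hypothetical LSM tree: only the aggregated queue size. */
struct lsm_sketch { size_t compact_queue; };

/* Recompute the per-range queue size from the current priority. */
void
range_update_compact_queue_sketch(struct range_sketch *r)
{
	assert(r->compact_priority >= 0);
	assert(r->compact_priority <= r->slice_count);
	size_t total = 0;
	for (int i = 0; i < r->compact_priority; i++)
		total += r->slice_size[i];
	r->compact_queue = total;
}

/* Add a range's queue size to the LSM tree total. */
void
lsm_acct_range_sketch(struct lsm_sketch *lsm, const struct range_sketch *r)
{
	lsm->compact_queue += r->compact_queue;
}

/* Remove a range's queue size before the range is updated. */
void
lsm_unacct_range_sketch(struct lsm_sketch *lsm, const struct range_sketch *r)
{
	assert(lsm->compact_queue >= r->compact_queue);
	lsm->compact_queue -= r->compact_queue;
}
```

      A tree-level value that keeps growing from one dump to the next
      would suggest that compaction is falling behind.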
  26. Aug 29, 2018
  27. Aug 08, 2018
    • Mergen Imeev's avatar
      test: fix box/errinj.test.lua sporadic failure · 8c06a069
      Mergen Imeev authored
      In some cases box.snapshot() takes longer than expected.
      This leads to situations where the previous error is reported
      instead of the new one. Now these errors are completely separated.
      
      Closes #3599
      8c06a069
  28. Jul 30, 2018
    • Vladimir Davydov's avatar
      vinyl: implement rebootstrap support · 06658416
      Vladimir Davydov authored
      If vy_log_bootstrap() finds a vylog file in the vinyl directory, it
      assumes it has to be rebootstrapped and calls vy_log_rebootstrap().
      The latter scans the old vylog file to find the max vinyl object id,
      from which it will start numbering objects created during rebootstrap to
      avoid conflicts with old objects, then it writes VY_LOG_REBOOTSTRAP
      record to the old vylog to denote the beginning of a rebootstrap
      section. After that initial join proceeds as usual, writing information
      about new objects to the old vylog file after VY_LOG_REBOOTSTRAP marker.
      Upon successful rebootstrap completion, checkpoint, which is always
      called right after bootstrap, rotates the old vylog and marks all
      objects created before the VY_LOG_REBOOTSTRAP marker as dropped in the
      new vylog. The old objects will be purged by the garbage collector as
      usual.
      
      In case rebootstrap fails and checkpoint never happens, local recovery
      writes VY_LOG_ABORT_REBOOTSTRAP record to the vylog. This marker
      indicates that the rebootstrap attempt failed and all objects created
      during rebootstrap should be discarded. They will be purged by the
      garbage collector on checkpoint. Thus even if rebootstrap fails, it is
      possible to recover the database to the state that existed right before
      a failed rebootstrap attempt.
      
      Closes #461
      06658416
  29. Jun 28, 2018
  30. Jun 14, 2018
  31. Jun 01, 2018
    • Vladimir Davydov's avatar
      vinyl: fix compaction vs checkpoint race resulting in invalid gc · b25e3168
      Vladimir Davydov authored
      The callback invoked upon compaction completion uses checkpoint_last()
      to determine whether compacted runs may be deleted: if the max LSN
      stored in a compacted run (run->dump_lsn) is greater than the LSN of the
      last checkpoint (gc_lsn) then the run doesn't belong to the last
      checkpoint and hence is safe to delete, see commit 35db70fa ("vinyl:
      remove runs not referenced by any checkpoint immediately").
      
      The problem is checkpoint_last() isn't synced with vylog rotation - it
      returns the signature of the last successfully created memtx snapshot
      and is updated in memtx_engine_commit_checkpoint() after vylog is
      rotated. If a compaction task completes after vylog is rotated but
      before snap file is renamed, it will assume that compacted runs do not
      belong to the last checkpoint, although they do (as they have been
      appended to the rotated vylog), and delete them.
      
      To eliminate this race, let's use vylog signature instead of snap
      signature in vy_task_compact_complete().
      
      Closes #3437
      b25e3168
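
      The race comes down to which signature is passed to one
      comparison. The function below is a hypothetical reduction of that
      check, and the LSN values in the note after it are invented for
      illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A compacted run is treated as not belonging to the last checkpoint,
 * and therefore safe to delete, if its max LSN exceeds gc_lsn.
 */
bool
run_is_unreferenced_sketch(int64_t dump_lsn, int64_t gc_lsn)
{
	assert(dump_lsn >= 0 && gc_lsn >= 0);
	return dump_lsn > gc_lsn;
}
```

      Suppose the vylog has already been rotated at signature 100 while
      the snap file has not been renamed yet, so the last snapshot
      signature is still 50. For a run with dump_lsn 80, comparing
      against 50 reports it as deletable although it was appended to
      the rotated vylog; comparing against 100 keeps it.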
  32. May 31, 2018
    • Vladimir Davydov's avatar
      vinyl: fix false-positive assertion at exit · ff02157f
      Vladimir Davydov authored
      latch_destroy() and fiber_cond_destroy() are basically no-ops. All they
      do is check that the latch/cond is not used. When a global latch or cond
      object is destroyed at exit, it may still have users and this is OK as
      we don't stop fibers at exit. In vinyl this results in the following
      false-positive assertion failures:
      
        src/latch.h:81: latch_destroy: Assertion `l->owner == NULL' failed.
      
        src/fiber_cond.c:49: fiber_cond_destroy: Assertion `rlist_empty(&c->waiters)' failed.
      
      Remove "destruction" of vy_log::latch to suppress the first one. Wake up
      all fibers waiting on vy_quota::cond before destruction to suppress the
      second one. Add some test cases.
      
      Closes #3412
      ff02157f
  33. May 25, 2018
  34. May 21, 2018
    • Vladimir Davydov's avatar
      memtx: free tuples asynchronously when primary index is dropped · 2a1482f3
      Vladimir Davydov authored
      When a memtx space is dropped or truncated, we have to unreference all
      tuples stored in it. Currently, we do it synchronously, thus blocking
      the tx thread. If a space is big, tx thread may remain blocked for
      several seconds, which is unacceptable. This patch makes drop/truncate
      hand actual work to a background fiber.
      
      Before this patch, drop of a space with 10M 64-byte records took more
      than 0.5 seconds. After this patch, it takes less than 1 millisecond.
      
      Closes #3408
      2a1482f3
  35. May 08, 2018
    • Vladislav Shpilevoy's avatar
      iproto: fix error with unstoppable batching · 01bfa59b
      Vladislav Shpilevoy authored
      An IProto connection stops reading input once the request limit
      is reached. But when multiple requests arrive in a batch, IProto
      does not check the limit, so it can be violated.

      Let's check the limit during batch parsing after each message too,
      not only once before parsing.
      01bfa59b
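
      The per-message check can be sketched with a counter of in-flight
      requests. The connection struct and the function are hypothetical;
      decoding a message is reduced to incrementing the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection state. */
struct conn_sketch {
	size_t in_flight; /* requests accepted but not yet answered */
	size_t limit;     /* maximum allowed value of in_flight */
};

/*
 * Parse up to batch_len messages from one input batch and return how
 * many were accepted. The limit is checked once before parsing and
 * again after each message, so a large batch cannot exceed it.
 */
size_t
parse_batch_sketch(struct conn_sketch *c, size_t batch_len)
{
	assert(c != NULL);
	size_t parsed = 0;
	if (c->in_flight >= c->limit)
		return 0;
	while (parsed < batch_len) {
		c->in_flight++; /* stand-in for decoding one message */
		parsed++;
		if (c->in_flight >= c->limit)
			break; /* leave the rest of the batch unread */
	}
	return parsed;
}
```

      Without the check inside the loop, a batch of ten messages
      arriving with three requests in flight and a limit of five would
      push the counter to thirteen.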
  36. Apr 22, 2018
  37. Apr 10, 2018
    • Vladimir Davydov's avatar
      space: space_vtab::build_secondary_key => build_index · 4cbfdede
      Vladimir Davydov authored
      The build_secondary_key method of space vtab is used not only for
      building secondary indexes, but also for rebuilding primary indexes.
      To avoid confusion, let's rename it to build_index and pass to it
      the source space, the new index, and the new tuple format.
      4cbfdede
  38. Apr 07, 2018
    • Vladimir Davydov's avatar
      vinyl: use ERRINJ_DOUBLE for ERRINJ_VY_READ_PAGE_TIMEOUT · 8dc9895f
      Vladimir Davydov authored
      We use ERRINJ_DOUBLE for all other timeout injections. This makes them
      more flexible as we can inject an arbitrary timeout in tests, not just
      enable some hard-coded timeout. Besides, it makes tests easier to
      follow. So let's use ERRINJ_DOUBLE for ERRINJ_VY_READ_PAGE_TIMEOUT too.
      8dc9895f