  1. Sep 17, 2024
    • memtx: handle all statements related to the space on DDL · f7f491b3
      Andrey Saranchin authored
      When DDL happens, we remove statements of concurrent transactions
      from MVCC. When removing statements, we set their `engine_savepoint` to
      `NULL` so that they won't be rolled back because we've already handled
      them.
      
      However, we remove statements only from stories, and not all statements
      can be accessed in this way. For example, when we have several delete
      statements of one story, and one of them gets prepared, others are
      unlinked. It leads to use-after-free, but it's read-only and doesn't
      affect anything, so only ASAN can catch it. It happens when the
      statement is being rolled back in `memtx_engine_rollback_statement`:
      we check if `space->upgrade` is not `NULL` (space can be already deleted)
      but this check affects instruction flow only if `stmt->new_tuple != NULL`
      and in our case that's not so. Anyway, let's iterate over all statements
      of all transactions and remove savepoints for ones related to the space
      that is being invalidated. It takes more time, but anyway, we are doing
      DDL that is heavy, so it doesn't really matter.
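      
      A minimal sketch of the iteration described above. The types
      `struct sk_space`, `struct sk_stmt` and `struct sk_txn` and the flat
      array layout are assumptions made for illustration, not the actual
      memtx structures:
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      
      /* Hypothetical, simplified stand-ins for a space, a statement and a txn. */
      struct sk_space { int id; };
      
      struct sk_stmt {
          struct sk_space *space;
          void *engine_savepoint; /* NULL: already handled, do not roll back */
      };
      
      struct sk_txn {
          struct sk_stmt *stmts;
          size_t stmt_count;
      };
      
      /*
       * Walk every statement of every transaction and drop the savepoint of
       * those that touch the invalidated space. Returns the number of
       * statements whose savepoint was cleared.
       */
      size_t
      sk_clear_savepoints(struct sk_txn *txns, size_t txn_count,
                          const struct sk_space *invalidated)
      {
          size_t cleared = 0;
          for (size_t i = 0; i < txn_count; i++) {
              for (size_t j = 0; j < txns[i].stmt_count; j++) {
                  struct sk_stmt *stmt = &txns[i].stmts[j];
                  if (stmt->space == invalidated &&
                      stmt->engine_savepoint != NULL) {
                      stmt->engine_savepoint = NULL;
                      cleared++;
                  }
              }
          }
          return cleared;
      }
      ```
      
      The cost is linear in the total number of statements, which matches
      the remark that the extra time is negligible next to the DDL itself.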
      
      Along the way, the commit removes helper `memtx_tx_history_remove_stmt`
      and all its helpers because they are not needed anymore. This helper
      unlinks added story from history chain, links all its delete statements
      to the previous story, if any, unlinks the statement from related stories
      and sets `engine_savepoint` to `NULL`. Since we already do all of these
      things except for unlinking statements from stories, let's simply call
      `memtx_tx_story_unlink_added[deleted]_by` instead. This change makes the
      code much more straightforward.
      
      Closes #10146
      
      NO_DOC=bugfix
      
      (cherry picked from commit ac112b73192ad96271a02ee85dba3e9737fdaa9d)
    • memtx: do not rollback prepared statements on DDL · 59e4cf61
      Andrey Saranchin authored
      Currently, when DDL is being committed, we delete all the stories and
      rollback prepared delete statements. The problem is such rollback is
      likely to fail because of assertion. When transaction is rolled back,
      all the statements are rolled back in reversed order, but when rollback
      happened because of DDL, order is not specified and some invariants are
      violated. Let's simply unlink delete statement instead of rollback.
      
      Rollback does two things: unlink delete statement and abort readers
      (including gaps) of prepared stories. The commit actually drops the
      second part - it's safe because after the previous commits we delete
      stories right after aborting all concurrent transactions so there is no
      need to abort anything anymore.
      
      Part of #10146
      Closes #10474
      
      NO_CHANGELOG=later
      NO_DOC=bugfix
      
      (cherry picked from commit 6a11224c85c1be28e7d1570cd4ba01efc033c34f)
    • memtx: remove all the space stories right on DDL · 74fdd1c8
      Andrey Saranchin authored
      Currently, every DDL transaction has the following properties:
      1. DDL and DML operations can be mixed inside one transaction, order is
         not restricted.
      2. DDL has an effect right after the operation, not after it's prepared
         or committed.
      3. After the first DDL operation, transaction is not allowed to yield.
      4. If DDL yields (format check, index build), operation must be the
         first in its transaction. If DDL yields, space cache is updated
         strictly after yields are over. In other words, transaction cannot
         yield after DDL changes became visible.
      
      Keeping these properties in mind, we can recognize several flaws in
      the way memtx MVCC handles DDL now. Here is the current approach:
      1. All transactions concurrent with DDL are aborted when DDL gets
         prepared. Memtx stories are not deleted here.
      2. If DDL gets committed, all memtx stories of the old space are
         deleted. It means that only stories created before DDL are deleted.
      
      Such design is bad. Firstly, if a transaction does a DDL and then a DML,
      stories belonging to the new and the old schema will be mixed in the
      indexes. Starting a DML transaction after DDL is prepared but before
      it's committed leads to the same problem. If stories of different
      schemas are mixed, Tarantool is most likely to crash since MVCC does
      not handle this case at all. Secondly, transactions started after DDL is
      prepared but before it's committed can read stories belonging to the old
      schema. In this case, after DDL is committed and the stories are
      deleted, we cannot do proper mvcc for such transactions anymore because
      we've just deleted the tuple it has read.
      
      Here is the new approach that current commit implements:
      1. Abort all concurrent transactions and delete all memtx stories on
         every replace in space cache (right on DDL operation and on its
         rollback). In this case, every time a space and its schema is
         replaced with a new one, all the mvcc objects belonging to the old
         schema are deleted. In order not to break isolation of concurrent
         transactions, we have to abort them all right before deleting
         stories - it makes sense because we don't support old schema anymore
         and all its readers should be aborted.
      2. When DDL gets committed, nothing happens since nothing has been
         changed.
      3. Abort all concurrent transactions when DDL is being prepared. It is
         needed for DDL that does not update any space. For example, update in
         space `_cluster` or `_schema`. It worked this way before the commit
         so it is needed to provide the old behavior in such cases. Note that
         if DDL updates any space, all concurrent transactions are aborted
         right on DDL operation, when replace in space cache happens, and
         since DDL cannot yield, we have nothing to abort here so it's noop
         in such cases.
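      
      The first step of the new approach can be sketched roughly as follows.
      The names `sk_txn`, `sk_space` and `sk_on_space_cache_replace` are
      hypothetical, and a story counter stands in for the real story
      objects; this only illustrates the ordering (abort readers first,
      then drop stories):
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      
      enum sk_txn_state { SK_TXN_ACTIVE, SK_TXN_ABORTED };
      
      struct sk_txn { enum sk_txn_state state; };
      
      struct sk_space {
          size_t story_count; /* MVCC stories of this schema version */
      };
      
      /*
       * Called on every replace in the space cache (on the DDL operation
       * and on its rollback). Aborts every transaction except the one doing
       * DDL, then drops all stories of the old space. Returns the number
       * of transactions aborted by this call.
       */
      size_t
      sk_on_space_cache_replace(struct sk_space *old_space, struct sk_txn *txns,
                                size_t txn_count, const struct sk_txn *ddl_owner)
      {
          size_t aborted = 0;
          for (size_t i = 0; i < txn_count; i++) {
              if (&txns[i] == ddl_owner || txns[i].state == SK_TXN_ABORTED)
                  continue;
              txns[i].state = SK_TXN_ABORTED;
              aborted++;
          }
          /* Only after the readers are aborted is it safe to drop stories. */
          old_space->story_count = 0;
          return aborted;
      }
      ```
      
      A second call with the same transactions aborts nothing, which mirrors
      the note that the abort on prepare is a noop when the DDL has already
      replaced a space.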
      
      Since `memtx_tx_on_space_delete` was supposed to be called on commit, it
      inserts prepared tuples to indexes. Since we want it to be called on
      actual DDL, the commit makes it insert all tuples visible to the
      transaction doing DDL.
      
      Also, if we delete all the stories on DDL, we should correctly rollback
      all non-committed DML statements - DML statements of the transaction
      doing DDL if they happened before DDL and all transactions prepared
      before DDL - if WAL write fails, we will need to roll them back without
      stories. For this purpose, firstly, let's remove `engine_savepoint` only
      for statements of aborted transactions in `memtx_tx_history_remove_stmt`
      (it is needed not to roll them back because they are already handled) so
      that statements of transaction doing DDL can be rolled back. And,
      secondly, let's add a new helper `memtx_tx_history_rollback_empty_stmt`
      that handles rollback of statements without stories (see helper's
      description for elaboration).
      
      Along the way, let's clean read lists of transactions aborted on DDL -
      they are not needed anymore and potentially keeping them can lead to
      use-after-free in future. Moreover, we should remove transactions
      aborted by DDL from `read_view_txs` list so that they won't affect
      memtx story GC until they are deleted - cleaning of transactions
      does it as well.
      
      Part of #10146
      Part of #10474
      Closes #10171
      Closes #10096
      Closes #10097
      
      NO_CHANGELOG=later
      NO_DOC=bugfix
      
      (cherry picked from commit 959027de5553078dbe442aacb52e01e9b46c542c)
    • memtx: refactor `memtx_tx_unlink_top_on_space_delete` · 584a8d9c
      Andrey Saranchin authored
      The helper consists of `unlink_top_common` and
      `unlink_top_on_space_delete_light` helpers. It would make sense if
      `unlink_top_common` would be used in any other place, but it's used only
      here actually. Let's inline these helpers - it will be easier to read
      this code and it will make it simple to patch this function in future
      commits.
      
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      NO_DOC=refactoring
      
      (cherry picked from commit 7b2243828937413db9b1c3ffcf8abd8fd92daf02)
    • memtx: fix use-after-free in mvcc on ddl · 8cbe42eb
      Andrey Saranchin authored
      When space is being altered, `memtx_tx_space_on_delete` is called - it
      deletes all the stories associated with the old schema. However, before
      deleting a story, its `reader_list` member is not unlinked from the list
      so other nodes can still access this memory. The commit fixes this
      problem and adds an assertion that checks that a story is always unlinked
      from reader list when it is being deleted.
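      
      A sketch of the idea with a hypothetical intrusive list (`sk_link`
      and the `sk_story` layout are assumptions, loosely modelled on a
      circular doubly linked list). The list head embedded in the story is
      detached before the story goes away, and an assertion guards it:
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      
      /* Circular doubly linked list node; self-linked when empty. */
      struct sk_link { struct sk_link *prev, *next; };
      
      void sk_link_create(struct sk_link *l) { l->prev = l->next = l; }
      bool sk_link_empty(const struct sk_link *l) { return l->next == l; }
      
      void sk_link_add(struct sk_link *head, struct sk_link *item)
      {
          item->prev = head;
          item->next = head->next;
          head->next->prev = item;
          head->next = item;
      }
      
      void sk_link_del(struct sk_link *item)
      {
          item->prev->next = item->next;
          item->next->prev = item->prev;
          sk_link_create(item);
      }
      
      struct sk_story { struct sk_link reader_list; bool deleted; };
      
      void sk_story_delete(struct sk_story *story)
      {
          /* Detach neighbours so that nobody points into freed memory. */
          sk_link_del(&story->reader_list);
          assert(sk_link_empty(&story->reader_list));
          story->deleted = true; /* real code would free the story here */
      }
      ```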
      
      Part of #10146
      
      NO_CHANGELOG=later
      NO_DOC=bugfix
      
      (cherry picked from commit a32f56dfbb4b56b410ac376fce079613cac0ccb6)
    • memtx: do not use memtx_build_on_replace trigger with mvcc enabled · 74584869
      Andrey Saranchin authored
      Now background build of index uses index iterator that collects
      conflicts during iteration if MVCC is enabled. Thus, trigger
      `memtx_build_on_replace` is not needed - if someone writes to
      prefix we already scanned, it will lead to transaction conflict.
      Moreover, `memtx_ddl_state` that is needed for rollback is allocated
      on stack of function called from DDL transaction, so if conflicted
      transaction rolls back after DDL is over (and it's possible only
      with MVCC enabled), segmentation fault will happen. So let's simply
      not set the trigger if MVCC is enabled.
      
      Closes #10147
      
      NO_CHANGELOG=later
      NO_DOC=bugfix
      
      (cherry picked from commit 9fe60c5754cf77686404fc7ee3d24af32b6c486c)
    • memtx: unlink all delete statements of mvcc stories on space delete · 1d72b80f
      Andrey Saranchin authored
      Since one tuple can be deleted by many concurrent transactions, member
      `del_stmt` of `struct memtx_story` is actually a list. It seems we
      forgot about it when implementing `memtx_tx_on_space_delete` so the
      function unlinks only one of the delete statements. The commit fixes this
      mistake.
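      
      A sketch of unlinking the whole list rather than its head only. The
      field names `del_story` and `next_in_del_list` and the two structs
      are assumptions made for illustration:
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      
      struct sk_story;
      
      struct sk_stmt {
          struct sk_story *del_story;       /* story this statement deletes */
          struct sk_stmt *next_in_del_list; /* next deleter of the same story */
      };
      
      struct sk_story {
          struct sk_stmt *del_stmt; /* head of the list of delete statements */
      };
      
      /* Unlink every delete statement of the story; returns how many. */
      size_t
      sk_story_unlink_all_deleted_by(struct sk_story *story)
      {
          size_t unlinked = 0;
          while (story->del_stmt != NULL) {
              struct sk_stmt *stmt = story->del_stmt;
              story->del_stmt = stmt->next_in_del_list;
              stmt->next_in_del_list = NULL;
              stmt->del_story = NULL;
              unlinked++;
          }
          return unlinked;
      }
      ```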
      
      Part of #10146
      
      NO_CHANGELOG=later
      NO_DOC=bugfix
      
      (cherry picked from commit 5a31551467308f26b8471a9de233b94e380f23cf)
  2. Sep 16, 2024
    • box: fix crash on rollback on memtx memory OOM and massive index change · e9fc51d0
      Nikolay Shirokovskiy authored
      We cannot tolerate index extent memory allocation failure on rollback.
      At the same time it is not practical to reserve memory because a whole
      index can easily be changed on rollback if read view is created before
      rollback.
      
      So in case of rollback and memtx memory OOM let's allocate outside the
      memtx arena limited by quota.
      
      Now part of the index can reside outside the memtx arena. But regular
      index changes will move this part back to the memtx arena, until the
      next such situation, of course.
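      
      The fallback can be sketched as below. `sk_arena`, `sk_extent` and
      `sk_extent_alloc` are hypothetical names, and both memory pools are
      simulated with `malloc()`; only the decision logic is the point:
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdlib.h>
      
      struct sk_arena { size_t quota; size_t used; };
      
      struct sk_extent { bool in_arena; void *mem; };
      
      /*
       * Allocate an index extent. If the quota is exhausted, fail on the
       * regular path but fall back to memory outside the arena on rollback.
       */
      bool
      sk_extent_alloc(struct sk_arena *arena, size_t size, bool in_rollback,
                      struct sk_extent *out)
      {
          bool fits = arena->used + size <= arena->quota;
          if (!fits && !in_rollback)
              return false; /* regular path: report OOM to the caller */
          void *mem = malloc(size); /* both pools are simulated here */
          if (mem == NULL)
              return false;
          if (fits)
              arena->used += size;
          out->mem = mem;
          out->in_arena = fits;
          return true;
      }
      ```
      
      Extents allocated outside the arena do not count against the quota,
      so rollback can always make progress as long as the process itself
      has memory.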
      
      Closes #10551
      
      NO_DOC=bugfix
      
      (cherry picked from commit 32ea713af0a4f27f9ae37bb767c21722ee8c6742)
    • memtx: free extents on exit · 1fb5a7cc
      Nikolay Shirokovskiy authored
      Part-of #10211
      
      NO_TEST=internal
      NO_CHANGELOG=internal
      NO_DOC=internal
      
      (cherry picked from commit 134a2a4f7f0a3bad15bc42e2dc051708c3583fed)
    • core: add (void *) set definition · e320972a
      Nikolay Shirokovskiy authored
      Part-of #10551
      
      NO_TEST=declarative code
      NO_CHANGELOG=internal
      NO_DOC=internal
      
      (cherry picked from commit 398c7031c915380bd6e93b7aeab9145cf0ebe511)
  3. Sep 13, 2024
    • small: bump version · e60f5fbd
      Nikolay Shirokovskiy authored
      New commits:
      * slab cache: fix slab alignment to 16 bytes
      
      NO_TEST=submodule bump
      NO_CHANGELOG=submodule bump
      NO_DOC=submodule bump
      
      (cherry picked from commit 2300704e8317f2d8a545cde1394f8cbbb7e95741)
    • sptree: don't use variable length arrays · 977ef353
      Vladimir Davydov authored
      This causes warnings if compiled with clang-18. Let's define a sane
      upper limit for the max tree depth and use it for allocating arrays
      on stack. Note that we don't really care about performance because
      sptree is used only in unit tests.
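      
      A sketch of the technique on a plain binary search tree. The limit
      `SK_MAX_DEPTH = 48` and all names are assumptions for illustration;
      the actual limit chosen for sptree is not stated here:
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      
      /* Assumed upper bound on the tree depth. */
      enum { SK_MAX_DEPTH = 48 };
      
      struct sk_node { int key; struct sk_node *left, *right; };
      
      /*
       * Find the parent of the node holding `key`. The search path is kept
       * in a fixed-size array on the stack instead of a variable length
       * array. Returns NULL if the key is absent, is stored in the root,
       * or lies deeper than SK_MAX_DEPTH.
       */
      const struct sk_node *
      sk_find_parent(const struct sk_node *root, int key)
      {
          const struct sk_node *path[SK_MAX_DEPTH];
          int depth = 0;
          const struct sk_node *n = root;
          while (n != NULL && depth < SK_MAX_DEPTH) {
              path[depth++] = n;
              if (n->key == key)
                  return depth >= 2 ? path[depth - 2] : NULL;
              n = key < n->key ? n->left : n->right;
          }
          return NULL;
      }
      ```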
      
      Closes #10354
      
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      
      (cherry picked from commit 187d288f0c3b008ed2d281e8bb43159e44c4106e)
    • test: disable flaky testcases in http_client_test · 7d120035
      Sergey Bronnikov authored
      The testcase "http_client.sock_family:\"AF_UNIX\".test_follow_location"
      is flaky in each run of `release_clang_asan` and
      `debug_asan_clang` workflows. Disabling a single testcase does not
      help. The patch disables a group of testcases executed with Unix
      domain socket.
      
      Needed for #9854
      
      NO_CHANGELOG=testing
      NO_DOC=testing
      
      (cherry picked from commit 8fae8004f79ecd555537960c60c6e646b037c4cc)
    • test: fix luacheck warnings · e393ee29
      Sergey Bronnikov authored
      The patch fixes a warning produced by luacheck:
      
      NO_WRAP
      test/app-luatest/http_client_test.lua:27:8: Error prone negation: negation is executed before relational operator.
      test/app-luatest/http_client_test.lua:28:8: Error prone negation: negation is executed before relational operator.
      NO_WRAP
      
      Found by Luacheck 1.2.0.
      
      Closes #10037
      
      NO_CHANGELOG=codehealth
      NO_DOC=codehealth
      NO_TEST=codehealth
      
      (cherry picked from commit 8fd37731b68e1e1d8e258ab919d65907d52ec764)
  4. Sep 09, 2024
    • test: fix flaky #10148 test · adbb726a
      Vladimir Davydov authored
      The test may exceed the default fiber slice (1 second):
      
      ```
      [060] server | 2024-09-09 09:16:16.329 [33093] main/111/main fiber.h:1132 W> fiber has not yielded for more than 0.500 seconds
      [060] server | 2024-09-09 09:16:16.825 [33093] main/111/main/test-run.lib.luatest.luatest.log I> Assert "FiberSliceIsExceeded" equals to "OutOfMemory"
      [060] not ok 1	box-luatest.gh_10148_fix_crash_low_slab_alloc_factor.test_low_slab_alloc_factor
      [060] #   ...uatest/gh_10148_fix_crash_low_slab_alloc_factor_test.lua:36: expected: "OutOfMemory"
      [060] #   actual: "FiberSliceIsExceeded"
      [060] #   stack traceback:
      [060] #   	...uatest/gh_10148_fix_crash_low_slab_alloc_factor_test.lua:30: in function 'box-luatest.gh_10148_fix_crash_low_slab_alloc_factor.test_low_slab_alloc_factor'
      [060] #   	...
      [060] #   	[C]: in function 'xpcall'
      [060] #   artifacts:
      [060] #   	server -> /tmp/t/060_box-luatest/artifacts/server-RulP4Fj6qEoI
      [060] luatest | 2024-09-09 09:16:16.839 [32904] main/104/luatest/test-run.lib.luatest.luatest.log I> End test "box-luatest.gh_10148_fix_crash_low_slab_alloc_factor.test_low_slab_alloc_factor"
      [060] server | 2024-09-09 09:16:16.849 [33093] main/116/iproto.shutdown I> tx_binary: stopped
      [060] # Ran 1 tests in 2.388 seconds, 0 succeeded, 1 failed
      ```
      
      Let's set the fiber slice to a sufficiently big value.
      
      Fixes commit e4ce9e111483 ("test: add test for #10148").
      
      NO_DOC=test fix
      NO_CHANGELOG=test fix
      
      (cherry picked from commit 565cda7f2f0d74b2b726b475d2b7ed0c3344920e)
    • vinyl: fix ERRINJ_VY_DELAY_PK_LOOKUP · 7c1d6841
      Vladimir Davydov authored
      Enabling `ERRINJ_VY_DELAY_PK_LOOKUP` makes Vinyl yield in a place where
      it wouldn't normally do. If the transaction is aborted in the meantime,
      we'll get the assertion failure:
      
      ```
      ./src/box/vy_point_lookup.c:219: vy_point_lookup: Assertion 'tx == NULL || tx->state == VINYL_TX_READY' failed.
      ```
      
      To prevent this from happening, let's replace this invalid error
      injection with the new one `ERRINJ_VY_POINT_LOOKUP_DELAY` that injects
      a delay to `vy_point_lookup()` before reading disk. This doesn't have
      exactly the same effect as the old error injection because it also
      delays direct lookups in the primary index. Fortunately, the old error
      injection is used in the only test, where the new one works as expected
      if we make the secondary index created in the test non-unique and enable
      deferred writes (this makes the `s:replace{2, 2}` statement bypass
      a lookup in the primary index).
      
      Also, let's replace `VY_POINT_ITER_WAIT` with the new error injection
      because they have a very similar meaning and `VY_POINT_LOOKUP_DELAY`
      works in the test using it with a very small adjustment (we need to
      clear it explicitly after `box.snapshot()`).
      
      Closes #10517
      
      NO_DOC=errinj fix
      NO_CHANGELOG=errinj fix
      
      (cherry picked from commit 926196359eaa46bbc670d196103730e196c31437)
    • vinyl: use VERBOSE level for logging ranges · a9765933
      Vladimir Davydov authored
      Whenever a range is compacted, split, or coalesced, we log the range
      boundaries. This gets really annoying if there's an index that has
      a lot of key parts or contains binary strings. Let's lower the level
      used for logging these events down to VERBOSE so that they are not
      shown by default but can be enabled if needed.
      
      Closes #10524
      
      NO_DOC=bug fix
      
      (cherry picked from commit 06fa83947b0b63c39732efba4c9d67578f113612)
    • test: add test for #10148 · 6bebc1b5
      Nikolay Shirokovskiy authored
      The fix itself is in the small submodule which is bumped in the previous
      commit.
      
      Closes #10148
      
      NO_DOC=bugfix
      
      (cherry picked from commit e4ce9e111483a24d66e078f4f05679d309fcb94d)
    • small: bump version · e1bb094f
      Nikolay Shirokovskiy authored
      New commits:
      
      * small: small: fix crash with low alloc_factor and high memory pressure
      * test: get rid of debug message
      * test: assign label to tests
      * test: introduce a CMake function create_test
      
      Part of #10148
      
      NO_TEST=submodule bump
      NO_CHANGELOG=submodule bump
      NO_DOC=submodule bump
      
      (cherry picked from commit f3dd6960852f1885ca14587a9c72769fad6b9f55)
    • small: bump version · 387dcbaa
      Serge Petrenko authored
      New commits:
      * test: fix memory leaks reported by LSAN
      * region: fix memleak in ASAN version
      * matras: introduce `matras_needs_touch` and `matras_touch_no_check`
      * lsregion: implement lsregion_reserve for asan build
      
      Prerequisite #10161
      
      NO_CHANGELOG=submodule bump
      NO_TEST=submodule bump
      NO_DOC=submodule bump
      
      (cherry picked from commit c191a1bbe96a67405cbdbb3e421dbf7ea543bf47)
  5. Sep 06, 2024
    • vinyl: handle error loading statement from disk during key lookup · 69450ca7
      Vladimir Davydov authored
      `vy_page_stmt()` may fail (return NULL) if:
       - the statement is corrupted;
       - memory allocation for the statement fails;
       - the statement size exceeds `box.cfg.vinyl_max_tuple_size`.
      
      If this happens `vy_page_find_key()` won't return an error. Instead,
      it'll either point the caller to a wrong statement or claim that there's
      no statement matching the key in this page. This may result in invalid
      index selection results and, later on, a crash caused by inconsistencies
      in the tuple cache. The issue was introduced by commit ac8ce023
      ("vinyl: factor out function to lookup key in page").
      
      All of the three cases are actually very unlikely to happen in
      production:
       - If a statement stored in a run file is corrupted, we'll probably fail
         to load the whole page due to failed checksums and never even get to
         `vy_page_stmt()`.
       - Statements are allocated with `malloc()`, which doesn't normally
         fail (instead the whole process would be terminated by OOM).
       - Users don't tend to lower the tuple size limit after restart.
      
      Still, let's fix the issue by implementing proper error handling for
      `vy_page_find_key()`.
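      
      A sketch of the kind of error handling meant here, on a simplified
      page of integer keys. `sk_page`, `sk_page_load` and the return
      convention are assumptions, not the actual `vy_page_find_key()`
      signature. The point is that a load failure is reported separately
      from "not found":
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      
      /* Hypothetical page: sorted keys, some of which fail to load. */
      struct sk_page {
          const int *keys;
          const bool *broken; /* broken[i]: loading item i fails */
          int count;
      };
      
      /* Load item `pos`; returns 0 on success, -1 on failure. */
      int
      sk_page_load(const struct sk_page *page, int pos, int *key)
      {
          if (page->broken[pos])
              return -1;
          *key = page->keys[pos];
          return 0;
      }
      
      /*
       * Binary search for the first item >= key. On success returns 0 and
       * sets *pos (page->count if there is no such item) and *equal.
       * Returns -1 if an item could not be loaded, instead of silently
       * reporting a wrong position.
       */
      int
      sk_page_find_key(const struct sk_page *page, int key, int *pos,
                       bool *equal)
      {
          int lo = 0, hi = page->count;
          *equal = false;
          while (lo < hi) {
              int mid = lo + (hi - lo) / 2;
              int mid_key;
              if (sk_page_load(page, mid, &mid_key) != 0)
                  return -1; /* propagate the error to the caller */
              if (mid_key < key) {
                  lo = mid + 1;
              } else {
                  if (mid_key == key)
                      *equal = true;
                  hi = mid;
              }
          }
          *pos = lo;
          return 0;
      }
      ```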
      
      Closes #10512
      
      NO_DOC=bug fix
      
      (cherry picked from commit 9dbaa6a9bc0d65984b417f8a76aa8373b6125d16)
  6. Aug 30, 2024
    • lua: fix iconv memory leak · 08c80081
      Nikolay Shirokovskiy authored
      `ffi.C.tnt_iconv_open` returns pointer to `struct iconv`. In this case
      `__gc` in metatable is not bound to the object.
      
      Closes #10487
      Part-of #10211
      
      NO_TEST=covered by existing tests
      NO_DOC=bugfix
      
      (cherry picked from commit 105e6188ee6cc8de71ca2ab077f78f51be07559d)
    • vinyl: fix memory leak on dump/compaction failure · b6cd6bbe
      Nikolay Shirokovskiy authored
      The issue is we increment `page_count` only on page write. If we fail
      for some reason before that, the page info `min_key` is leaked.
      
      LSAN report for 'vinyl/recovery_quota.test.lua':
      
      ```
      2024-07-05 13:30:34.605 [478603] main/103/on_shutdown vy_scheduler.c:1668 E> 512/0: failed to compact range (-inf..inf)
      
      =================================================================
      ==478603==ERROR: LeakSanitizer: detected memory leaks
      
      Direct leak of 4 byte(s) in 1 object(s) allocated from:
          #0 0x5e4ebafcae09 in malloc (/home/shiny/dev/tarantool/build-asan-debug/src/tarantool+0x1244e09) (BuildId: 20c5933d67a3831c4f43f6860379d58d35b81974)
          #1 0x5e4ebb3f9b69 in vy_key_dup /home/shiny/dev/tarantool/src/box/vy_stmt.c:308:14
          #2 0x5e4ebb49b615 in vy_page_info_create /home/shiny/dev/tarantool/src/box/vy_run.c:257:23
          #3 0x5e4ebb48f59f in vy_run_writer_start_page /home/shiny/dev/tarantool/src/box/vy_run.c:2196:6
          #4 0x5e4ebb48c6b6 in vy_run_writer_append_stmt /home/shiny/dev/tarantool/src/box/vy_run.c:2287:6
          #5 0x5e4ebb72877f in vy_task_write_run /home/shiny/dev/tarantool/src/box/vy_scheduler.c:1132:8
          #6 0x5e4ebb73305e in vy_task_compaction_execute /home/shiny/dev/tarantool/src/box/vy_scheduler.c:1485:9
          #7 0x5e4ebb73e152 in vy_task_f /home/shiny/dev/tarantool/src/box/vy_scheduler.c:1795:6
          #8 0x5e4ebb01e0b1 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*) /home/shiny/dev/tarantool/src/lib/core/fiber.h:1331:10
          #9 0x5e4ebc389ee0 in fiber_loop /home/shiny/dev/tarantool/src/lib/core/fiber.c:1182:18
          #10 0x5e4ebd3e9595 in coro_init /home/shiny/dev/tarantool/third_party/coro/coro.c:108:3
      
      SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
      ```
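      
      One possible way to avoid this kind of leak, sketched with
      hypothetical types (`sk_writer`, `sk_page_info` and the
      `page_started` flag are assumptions; the actual fix may be
      structured differently). The cleanup also frees the page info that
      was started but never written:
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>
      #include <string.h>
      
      enum { SK_MAX_PAGES = 8 };
      
      struct sk_page_info { char *min_key; };
      
      struct sk_writer {
          struct sk_page_info pages[SK_MAX_PAGES];
          int page_count;    /* pages fully written */
          bool page_started; /* pages[page_count] owns an allocated min_key */
      };
      
      int
      sk_writer_start_page(struct sk_writer *w, const char *min_key)
      {
          if (w->page_count == SK_MAX_PAGES)
              return -1;
          size_t len = strlen(min_key) + 1;
          char *copy = malloc(len);
          if (copy == NULL)
              return -1;
          memcpy(copy, min_key, len);
          w->pages[w->page_count].min_key = copy;
          w->page_started = true;
          return 0;
      }
      
      void
      sk_writer_end_page(struct sk_writer *w)
      {
          w->page_started = false;
          w->page_count++; /* the counter moves only on a successful write */
      }
      
      /* Free all page infos, including a started but unwritten one. */
      int
      sk_writer_destroy(struct sk_writer *w)
      {
          int total = w->page_count + (w->page_started ? 1 : 0);
          for (int i = 0; i < total; i++) {
              free(w->pages[i].min_key);
              w->pages[i].min_key = NULL;
          }
          w->page_count = 0;
          w->page_started = false;
          return total;
      }
      ```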
      
      Closes #10489
      Part-of #10211
      
      NO_TEST=covered by existing tests
      NO_DOC=bugfix
      
      (cherry picked from commit 84101f60947dc9322b6bb31d2b3c536101c723c7)
    • box: fix memory leak on user DDL when access is denied · 97902542
      Nikolay Shirokovskiy authored
      Besides the mentioned #10485, we also fix a similar memleak (updating user)
      that was introduced by the same commit 5b32bb7f ("alter: Refactor
      access_check outside constructors").
      
      Closes #10485
      Part-of #10211
      
      NO_TEST=covered by existing tests
      NO_DOC=bugfix
      
      (cherry picked from commit 84f10be00824348844c9e1997bd813b881836928)
    • test: add test_ prefix to a function name · 12246244
      Maksim Tiushev authored
      The test function `g.jit_off_on_macOS_by_default` in `gh_8252` was
      silently ignored by the luatest due to its lack of the required
      `test_` prefix. This commit renames the function to
      `test_jit_off_on_macOS_by_default`, ensuring that it is recognized
      and executed by the luatest.
      
      Closes #10210
      
      NO_DOC=codehealth
      NO_CHANGELOG=codehealth
      
      (cherry picked from commit eca4f17b3588d38a4d61a71af8371f5ed15de248)
  7. Aug 28, 2024
  8. Aug 26, 2024
    • vinyl: do not discard run on dump/compaction abort if index was dropped · 5b5a0568
      Vladimir Davydov authored
      If an index is dropped while a dump or compaction task is in progress
      we must not write any information about it to the vylog when the task
      completes otherwise there's a risk of getting a vylog recovery failure
      in case the garbage collector manages to purge the index from the vylog.
      
      We disabled logging on successful completion of a dump task quite a
      while ago, in commit 29e2931c ("vinyl: fix race between compaction
      and gc of dropped LSM"), and for compaction only recently, in commit
      ae6a02eb ("vinyl: do not log dump if index was dropped"), but the
      issue remains for a dump/compaction failure, when we log a discard
      record for a run file we failed to write. This results in errors like:
      
      ```
      ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run 6 deleted twice
      ```
      
      or
      
      ```
      ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run 5768 deleted but not registered
      ```
      
      Let's fix these issues in exactly the same way as we fixed them for
      successful dump/compaction completion - by skipping writing to vylog
      in case the index is marked as dropped.
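      
      The guard can be sketched as below. `sk_lsm`, `sk_vylog` and
      `sk_task_abort` are hypothetical names, and a counter stands in for
      the real vylog:
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      
      struct sk_lsm { bool is_dropped; };
      
      struct sk_vylog { int discard_records; };
      
      /*
       * On dump/compaction failure, log a discard record for the run we
       * failed to write, unless the index has been dropped meanwhile.
       * Returns true if a record was written.
       */
      bool
      sk_task_abort(const struct sk_lsm *lsm, struct sk_vylog *log)
      {
          if (lsm->is_dropped)
              return false; /* GC may have purged the index from the log */
          log->discard_records++;
          return true;
      }
      ```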
      
      Closes #10452
      
      NO_DOC=bug fix
      
      (cherry picked from commit de59504c2bdb0369cdd27af892301f8515293fe1)
    • memtx: skip excluded tuples in index count with MVCC enabled · a9b5ae1d
      Andrey Saranchin authored
      Excluded tuples actually have their own history chains in MVCC - such
      chains consist of only one `memtx_story` containing excluded tuple
      itself. Such chains should be skipped when counting invisible tuples
      because they are not inserted to the index - that's what the commit
      does.
      
      Closes #10396
      
      NO_DOC=bugfix
      
      (cherry picked from commit 8947cb04f59423e2944d48b8a1effec2fb11b1db)
  9. Aug 23, 2024
    • memtx: do not pass pagination key to MVCC · 76bd0d99
      Andrey Saranchin authored
      Currently, when starting an iterator in memtx tree on a range request,
      we pass key from `start_data` to memtx MVCC. The problem is `start_data`
      can contain pagination key that is extracted with `cmp_def`, but MVCC
      performs all comparisons with `key_def`. Fortunately, the first parts of
      `cmp_def` are actually `key_def` of the index, so let's crop `start_data`
      by passing `part_count` not greater than `key_def->part_count` to MVCC.
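      
      The cropping amounts to a clamp on the part count. A sketch with a
      hypothetical `sk_key_def` struct and helper name:
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      struct sk_key_def { uint32_t part_count; };
      
      /*
       * Number of key parts that may be handed to MVCC: the parts of the
       * start key, but never more than key_def has.
       */
      uint32_t
      sk_mvcc_part_count(uint32_t start_part_count,
                         const struct sk_key_def *key_def)
      {
          return start_part_count < key_def->part_count ?
                 start_part_count : key_def->part_count;
      }
      ```
      
      With a two-part `key_def`, a three-part pagination key is cropped to
      two parts, while a one-part partial key is passed through unchanged.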
      
      Closes #10448
      
      NO_DOC=bugfix
      
      (cherry picked from commit 0dca0076c0fdaee142020cdeddb031bc0e2238cb)
    • vinyl: enable exact match optimization for unique secondary indexes · 93a8edbc
      Vladimir Davydov authored
      If the iterator type is EQ/REQ/LE/GE and the search key is exact (that
      is, there may be at most one tuple matching the key in the index),
      there's no need to scan disk levels if we found a statement for this
      key in the memory level. We've had this optimization for ages but it
      worked only for full keys in terms of `cmp_def` (key definition extended
      with primary key parts). Apparently, a lookup in a secondary index
      performed by the user wouldn't match these criteria unless the secondary
      index explicitly included all primary key parts.
      
      This commit improves on that. Now, we enable the optimization if the
      search key is **exact**. We consider a key **exact** if either of the
      following conditions is true:
      
       - The key statement is a tuple (tuple has all key parts).
       - The key statement is a full key in terms of `cmp_def`.
       - The key statement is a full key in terms of `key_def`, it doesn't
         contain nulls, and the index is unique. The check for nulls is
         necessary because even a unique nullable index may contain more than
         one equal key with nulls.
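      
      The three conditions above can be written down as a predicate. The
      structs and field names here are assumptions made for illustration,
      not the actual vinyl types:
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      struct sk_index {
          uint32_t key_def_part_count; /* parts of the index key */
          uint32_t cmp_def_part_count; /* key parts plus primary key parts */
          bool is_unique;
      };
      
      struct sk_key {
          bool is_tuple;       /* a tuple carries all key parts */
          uint32_t part_count; /* number of parts in the search key */
          bool has_null;       /* any of the parts is null */
      };
      
      /* True if at most one tuple in the index may match the key. */
      bool
      sk_key_is_exact(const struct sk_key *key, const struct sk_index *index)
      {
          if (key->is_tuple)
              return true;
          if (key->part_count >= index->cmp_def_part_count)
              return true;
          return index->is_unique && !key->has_null &&
                 key->part_count >= index->key_def_part_count;
      }
      ```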
      
      Note, this patch slightly refactors the optimization, adding a few
      comments and hopefully making it more understandable. In particular,
      we remove the one-result-tuple optimization for exact EQ/REQ from
      `vy_read_iterator_advance` and put it in `vy_read_iterator_evaluate_src`
      instead. This way the whole optimization resides in one place.
      
      Closes #10442
      
      NO_DOC=bug fix
      
      (cherry picked from commit 850673db5a69df2c7250d174ab15305624b2634a)
  10. Aug 22, 2024
    • test: fix flaky gh-5998-one-tx-for-ddl.test.lua · 3067139b
      Vladimir Davydov authored
      The test expects that any DDL operation aborts **all** concurrent
      transactions, but since commit f5f061d051dc ("vinyl: do not abort
      unrelated transactions on DDL") this isn't exactly true: transactions
      that haven't read/written anything aren't aborted. In the test we expect
      a transaction that hasn't done anything to be aborted by DDL and it
      **is** aborted most of the time but for a different reason: it reads
      data that are later modified, because `box.schema.user.create()` reads
      `box.space._user:max()` to generate an id for the new user first. Since
      it reads before writing anything, it has the "read-confirmed" isolation
      level hence it's aborted by the transaction creating another user
      because the latter updates `box.space._user:max()`. However, sometimes
      both users are created and the test fails. This happens if the first
      transaction manages to commit before the second one reads the `_user`
      system space.
      
      To fix the test and make the transaction creating the second user fail
      due to DDL, let's add a read of the `_user` system space before putting
      it to sleep. Actually, this even makes the test closer to the "original
      test from #5998".
      
      Closes #10444
      
      NO_DOC=test fix
      NO_CHANGELOG=test fix
      
      (cherry picked from commit 62c051e22109369f9079b5adf4de30e0c53f6ca7)
  11. Aug 21, 2024
  12. Aug 20, 2024
  13. Aug 16, 2024
    • engine: introduce stubs for checkpoint FETCH_SNAPSHOT · 23c7899e
      Nikita Zheleztsov authored
      This commit introduces engine stubs that enable a new method
      of fetching snapshots for anonymous replicas. Instead of using
      the traditional read-view join approach, this update allows
      file snapshot fetching. Note that file snapshot fetching
      is only available in Tarantool EE.
      
      Checkpoint fetching is done via IPROTO_IS_CHECKPOINT_JOIN,
      IPROTO_CHECKPOINT_VCLOCK and IPROTO_CHECKPOINT_LSN fields.
      
      If IPROTO_IS_CHECKPOINT_JOIN is set to true, the join is done from
      files (.snap for memtx, .run for vinyl); if false, from a read view.
      
      Checkpoint join allows the client to continue from the place where it
      stopped if snapshot fetching fails, which avoids a rebootstrap of an
      anonymous client. To do so, the client specifies CHECKPOINT_VCLOCK,
      which tells the server which file to continue the join from (the
      client gets the vclock at the beginning of the join). Specifying
      CHECKPOINT_LSN allows continuing from some position in the checkpoint:
      the server sends all data >= CHECKPOINT_LSN.
      
      If CHECKPOINT_VCLOCK is not specified, fetching is done from the latest
      available checkpoint. If CHECKPOINT_LSN is not specified - start from
      the beginning of the snap. So, specifying only IS_CHECKPOINT_JOIN
      triggers fetching the latest checkpoint from files.
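      
      The defaulting rules above can be sketched as follows. The request
      struct, the enum and both helpers are hypothetical; in particular,
      using LSN 0 to mean "from the beginning of the snap" is an assumption
      of this sketch, not something the protocol description states:
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Hypothetical decoded request; fields mirror the IPROTO keys. */
      struct sk_fetch_request {
          bool is_checkpoint_join;
          bool has_checkpoint_vclock;
          bool has_checkpoint_lsn;
          int64_t checkpoint_lsn;
      };
      
      enum sk_join_source {
          SK_JOIN_READ_VIEW,
          SK_JOIN_LATEST_CHECKPOINT,
          SK_JOIN_GIVEN_CHECKPOINT,
      };
      
      /* Where the join data comes from. */
      enum sk_join_source
      sk_pick_join_source(const struct sk_fetch_request *req)
      {
          if (!req->is_checkpoint_join)
              return SK_JOIN_READ_VIEW;
          return req->has_checkpoint_vclock ? SK_JOIN_GIVEN_CHECKPOINT :
                                              SK_JOIN_LATEST_CHECKPOINT;
      }
      
      /* First LSN to send; 0 stands for the beginning of the snap here. */
      int64_t
      sk_join_start_lsn(const struct sk_fetch_request *req)
      {
          return req->has_checkpoint_lsn ? req->checkpoint_lsn : 0;
      }
      ```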
      
      Needed for tarantool/tarantool-ee#741
      
      NO_DOC=ee
      NO_TEST=ee
      NO_CHANGELOG=ee
      
      (cherry picked from commit 2fca5c13)