  1. Jul 15, 2024
    • vinyl: use broadcast instead of signal to notify about dump completion · 04347ee7
      Vladimir Davydov authored
      There may be more than one fiber waiting on `vy_scheduler::dump_cond`:
      
      ```
      box.snapshot
        vinyl_engine_wait_checkpoint
          vy_scheduler_wait_checkpoint
      
      space.create_index
        vinyl_space_build_index
          vy_scheduler_dump
      ```
      
      To avoid a hang, we should use `fiber_cond_broadcast`.
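
A toy model of the difference, written in Python for brevity. It assumes a single-threaded stand-in for the fiber condition variable; `ToyCond`, `notify_dump_done`, and `demo` are hypothetical names, not Tarantool APIs. With two waiters, a signal-style wakeup leaves one of them waiting, while a broadcast wakes both.

```python
from collections import deque


class ToyCond:
    """A toy, single-threaded stand-in for a fiber condition variable."""

    def __init__(self):
        self.waiters = deque()

    def wait(self, name):
        # Register a waiter; in a real scheduler the fiber would yield here.
        self.waiters.append(name)

    def signal(self):
        # Wake at most one waiter (the oldest one).
        return [self.waiters.popleft()] if self.waiters else []

    def broadcast(self):
        # Wake every waiter that is currently registered.
        woken = list(self.waiters)
        self.waiters.clear()
        return woken


def notify_dump_done(cond, use_broadcast):
    """Return (woken, still_waiting) after one dump-completion notification."""
    woken = cond.broadcast() if use_broadcast else cond.signal()
    return woken, list(cond.waiters)


def demo(use_broadcast):
    cond = ToyCond()
    cond.wait("box.snapshot")
    cond.wait("space.create_index")
    return notify_dump_done(cond, use_broadcast)
```

Here `demo(False)` leaves the second waiter registered forever unless another notification arrives, which is the hang described in this commit.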
      
      Closes #10233
      
      NO_DOC=bug fix
      
      (cherry picked from commit 30547157)
    • small: bump new version with UBSan fixes · cf278f56
      Lev Kats authored
      This patch bumps small to a new version that does not trigger
      UBSan with *_entry* macros and should support the new oss-fuzz builder.
      
      New commits:
      
      * rlist: make its methods accept const arguments
      * lsregion: introduce lsregion_to_iovec method
      * rlist: make foreach_entry_* macros not to use UB
      
      Fixes: #10143
      
      NO_DOC=small submodule bump
      NO_TEST=small submodule bump
      NO_CHANGELOG=small submodule bump
      
      (cherry picked from commit 3e183044)
    • trivia: use __builtin* for offsetof macro · 25146985
      Lev Kats authored
      Changed the default tarantool `offsetof` macro implementation so that it
      doesn't access members of a null pointer in typeof, which triggers UBSan.
      
      Needed for #10143
      
      NO_DOC=bugfix
      NO_CHANGELOG=minor
      NO_TEST=tested manually with fuzzer
      
      (cherry picked from commit 27e94824)
  2. Jul 08, 2024
    • fiber: prohibit fiber self join · 2131743e
      Nikolay Shirokovskiy authored
      In this case join will just hang. Instead let's raise an error in case
      of Lua API and panic in case of C API.
      
      Closes #10196
      
      NO_DOC=minor
      
      (cherry picked from commit 1e1bf36d)
    • fiber: make the concurrent fiber_join safer · cf3def52
      Magomed Kostoev authored
      Prior to this patch a bunch of illegal conditions were possible:
      1. The joinability of a fiber could be changed while the fiber is
         being joined by someone. This could lead to double recycling:
         the first one happened on the fiber finish, and the second one
         in the fiber join.
      2. The joinability of a dead joinable fiber could be altered, this
         led to an inability to join the dead fiber and free its resources.
      3. A running fiber could be joined concurrently by two or more
         fibers, so the fiber could be recycled more than once (once
         per each concurrent join).
      4. A dead recycled fiber could be made joinable and joined leading
         to the double recycle.
      
      Fixed these issues by adding a new FIBER_JOIN_BEEN_INVOKED flag: now
      the `fiber_set_joinable` and `fiber_join_timeout` functions detect
      the double join. Because of the API limitations both of them panic
      when an invalid condition is met:
      - The `fiber_set_joinable` was not designed to report errors.
      - The `fiber_join_timeout` can't raise any error unless a timeout
        is met, because the `fiber_join` users don't expect to receive
        any error from this function at all (except the one generated
        by the joined fiber).
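
The detection can be sketched roughly as follows, in Python for brevity. This is a simplified model, not the actual C implementation: `ToyFiber`, `FiberPanic`, and `raises_panic` are hypothetical names, an exception stands in for the panic, and the `join_invoked` attribute plays the role of the FIBER_JOIN_BEEN_INVOKED flag.

```python
class FiberPanic(Exception):
    """Stands in for a panic; the real C API aborts the process instead."""


class ToyFiber:
    def __init__(self):
        self.joinable = False
        self.dead = False
        self.join_invoked = False  # models FIBER_JOIN_BEEN_INVOKED

    def set_joinable(self, yesno):
        # Changing joinability of a dead or already joined fiber is rejected.
        if self.dead or self.join_invoked:
            raise FiberPanic("fiber is dead or is already joined")
        self.joinable = yesno

    def finish(self):
        self.dead = True

    def join(self):
        # A second join on the same fiber struct is rejected.
        if self.join_invoked:
            raise FiberPanic("the fiber is already joined")
        self.join_invoked = True


def raises_panic(fn, *args):
    """Return True if calling fn(*args) triggers the modeled panic."""
    try:
        fn(*args)
    except FiberPanic:
        return True
    return False
```

The model deliberately omits recycling, so it cannot show the remaining undetectable case where the struct has been reused by a new joinable fiber.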
      
      It's still possible that a fiber join is performed on a struct which
      has been recycled and, if the new fiber is joinable too, this can't
      be detected. The current fiber API does not allow fixing this, so
      it remains the user's responsibility: users should be warned that
      a double join of the same fiber is illegal.
      
      Closes #7562
      
      @TarantoolBot document
      Title: `fiber_join`, `fiber_join_timeout` and `fiber_set_joinable`
      behave differently now.
      
      `fiber_join` and `fiber_join_timeout` now panic if a double
      join of the given fiber is detected.
      
      `fiber_set_joinable` now panics if the given fiber is dead or is
      joined already. This prevents some amount of error conditions that
      could happen when using the API in an unexpected way, including:
      - Making a dead joinable fiber non-joinable could lead to a memory
        leak: one can't join the fiber anymore.
      - Making a dead joinable fiber joinable again is a sign of attempt
        to join the fiber later. That means the fiber struct may be joined
        later, when it's been recycled and reused. This could lead to a
        very hard to debug double join.
      - Making an alive joined fiber non-joinable would lead to the double
        free: once on the fiber function finish, and secondly in the active
        fiber join finish. Risks of making it joinable are described above.
      - Making a dead and recycled fiber joinable allowed to join the fiber
        once again leading to a double free.
      
      Any `struct fiber` given by the API should only be joined once. If a
      fiber is joined after the first join on it has finished the behavior
      is undefined: it can either be a panic or an incidental join to a
      totally foreign fiber.
      
      (cherry picked from commit 44401529)
    • luajit: bump new version · 03d9038c
      Sergey Kaplun authored
      * Correct fix for stack check when recording BC_VARG.
      * test: remove inline suppressions of _TARANTOOL
      * FFI: Fix ffi.alignof() for reference types.
      * FFI: Fix sizeof expression in C parser for reference types.
      * FFI: Allow ffi.metatype() for typedefs with attributes.
      * FFI: Fix ffi.metatype() for non-raw types.
      * Maintain chain invariant in DCE.
      * build: introduce option LUAJIT_ENABLE_TABLE_BUMP
      * ci: add tablebump flavor for exotic builds
      * test: allow `jit.parse` to return aborted traces
      * Handle all types of errors during trace stitching.
      * Use generic trace error for OOM during trace stitching.
      * Check for IR_HREF vs. IR_HREFK aliasing in non-nil store check.
      * cmake: set cmake_minimum_required only once
      * cmake: fix warning about minimum required version
      * ci: add a workflow for testing with AVX512 enabled
      * test: introduce a helper read_file
      * OSX/iOS/ARM64: Fix generation of Mach-O object files.
      * OSX/iOS/ARM64: Fix bytecode embedding in Mach-O object file.
      * build: introduce LUAJIT_USE_UBSAN option
      * ci: enable UBSan for sanitizers testing workflow
      * cmake: add the build directory to the .gitignore
      * Prevent sanitizer warning in snap_restoredata().
      * Avoid negation of signed integers in C that may hold INT*_MIN.
      * Show name of NYI bytecode in -jv and -jdump.
      
      Closes #9924
      Closes #8473
      
      NO_DOC=LuaJIT submodule bump
      NO_TEST=LuaJIT submodule bump
  3. Jul 04, 2024
    • fiber: fix leak on dead joinable fiber search · e97b01f6
      Nikolay Shirokovskiy authored
      When a fiber is accessed from Lua we create a userdata object and keep
      the reference for future accesses. The reference is cleared when the
      fiber is stopped. But if the fiber is joinable it can still be found
      with `fiber.find`. In this case we create the userdata object again.
      Unfortunately, as the fiber is already stopped, we fail to clear the
      reference. The trigger memory that clears the reference is also leaked,
      as is the fiber storage if it is accessed after the fiber is stopped.
      
      Let's add `on_destroy` trigger to fiber and clear the references there.
      
      Note that with current set of LSAN suppressions the trigger memory leak
      of the issue is not reported.
      
      Closes #10187
      
      NO_DOC=bugfix
      
      (cherry picked from commit 7db4de75)
  4. Jun 26, 2024
    • box: fix memleak on functional index drop · 432789dc
      Nikolay Shirokovskiy authored
      We just don't free functional index keys on functional index drop now.
      Let's approach keys deletion as in the case of primary index drop, i.e.
      let's drop these keys in background.
      
      We should set `use_hint` to `true` in case of MEMTX_TREE_VTAB_DISABLED
      tree index methods because `memtx_tree_disabled_index_vtab` uses
      `memtx_tree_index_destroy<true>`. Otherwise we get read outside of index
      structure for stub functional index on destroy for introduced `is_func`
      field (which is reported by ASAN).
      
      Closes #10163
      
      NO_DOC=bugfix
      
      (cherry picked from commit 319357d5)
  5. Jun 25, 2024
  6. Jun 22, 2024
    • sio: use kern.ipc.somaxconn for listen() on Mac · 23e58efb
      Vladislav Shpilevoy authored
      listen() on Mac used to take SOMAXCONN as the backlog size. It is
      just 128, which is too small when connections are incoming too
      fast. They get rejected.
      
      Increase of the queue size wasn't possible, because the limit was
      hardcoded. But now sio takes the runtime limit from
      kern.ipc.somaxconn sysctl setting.
      
      One weird thing is that when set too high, it seems to have no
      effect, as if nothing was changed. Specifically, values above
      32767 are not doing anything, even though they stay visible in
      kern.ipc.somaxconn.
      
      It seems listen() on Mac internally might be using 'short' or
      int16_t to store the queue size and it gets broken when anything
      above INT16_MAX is used. The code truncates the queue size to this
      value if the given one is too high.
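
The truncation amounts to a simple clamp. A minimal sketch in Python, assuming the value has already been read from the sysctl (`clamp_listen_backlog` is a hypothetical helper, not the sio function):

```python
INT16_MAX = 32767


def clamp_listen_backlog(somaxconn):
    """Truncate a backlog taken from kern.ipc.somaxconn to INT16_MAX."""
    return min(somaxconn, INT16_MAX)
```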
      
      Closes #8130
      
      NO_DOC=bugfix
      NO_TEST=requires root privileges for testing
      
      (cherry picked from commit 7e9a872f)
  7. Jun 20, 2024
    • ci: add workflow to check downgrade versions · 4ab1dcfd
      Nikolay Shirokovskiy authored
      Tarantool has a hardcoded list of versions it can downgrade to. This list
      should consist of all the released versions lower than the Tarantool version.
      This workflow helps to make sure we update the list before release.
      
      It is run on pushing release tag to the repo, checks the list and fails
      if it misses some released version less than current. In this case we
      are supposed to update downgrade list (with required downgrade code) and
      update the release tag.
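
The check itself could look roughly like this. It is a sketch in Python under stated assumptions, not the workflow's actual script: versions are plain `X.Y.Z` strings, and `parse_version` and `missing_downgrade_versions` are hypothetical helpers.

```python
def parse_version(text):
    """Turn '3.1.0' into (3, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def missing_downgrade_versions(released, downgrade_list, current):
    """Return released versions below `current` absent from the list."""
    known = {parse_version(v) for v in downgrade_list}
    limit = parse_version(current)
    return [
        v for v in released
        if parse_version(v) < limit and parse_version(v) not in known
    ]
```

A non-empty result would correspond to the failing case, in which the downgrade list has to be updated before the release tag is pushed again.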
      
      Closes #8319
      
      NO_TEST=ci
      NO_CHANGELOG=ci
      NO_DOC=ci
      
      (cherry picked from commit 6d856347)
  8. Jun 14, 2024
  9. Jun 13, 2024
    • ci: followup fix RPM package builds on aarch64 runners · 74223a2d
      Serge Petrenko authored
      Commit 715abaaf ("ci: fix RPM package builds on aarch64 runners")
      has limited number of parallel jobs to 6 on these runners to fix the
      OOM, but it turns out this isn't enough: almalinux_9_aarch64 workflow
      fails constantly even with this setting. Let's try to reduce the amount
      of jobs to 4.
      
      NO_CHANGELOG=ci
      NO_TEST=ci
      NO_DOC=ci
    • relay: do not report vclock[0] anywhere · 4f2e67f5
      Vladislav Shpilevoy authored
      Remote replica's vclock is given to master to send data starting
      from that position. The master does that, but, in order to find
      the relevant position in local WAL to start from, the master must
      ignore the local rows. Consider them all already "sent". For that
      the master replaces the remote vclock[0] with the local vclock[0].
      That makes xlog cursor skip all the local rows.
      
      The problem is that this vclock was taken by relay as is, as if
      it was truly reported by the replica. It was even saved as the
      "last received ACK". Which clearly isn't the case.
      
      When a real ACK was received, it didn't contain anything in
      vclock[0], and yet relay "saw" that the previous ACK has
      vclock[0] > 0. That looked like the replica went backwards without
      even closing connection, which isn't possible. That made the relay
      crash from cringe (on assert).
      
      The fix is not to save the local vclock[0] in the last received
      ACK.
      
      For GC and xlog cursor the hack is still needed. One option to
      make it easier was to set vclock[0] to INT64_MAX to just never
      even bother with any local rows, but that didn't work. Some
      assumptions in other places seem to depend on having a proper
      local LSN in these places.
      
      Closes #10047
      
      NO_CHANGELOG=the bug wasn't released
      NO_DOC=bugfix
      
      (cherry picked from commit 1f75231a)
    • relay: rename vclock args and make const · 49b374f9
      Vladislav Shpilevoy authored
      It wasn't clear which of them are inputs and which are outputs.
      The patch explicitly marks the input vclocks as const. It makes
      the code a bit easier to read inside of relay.cc knowing that
      these vclocks shouldn't change.
      
      Alongside "replica_clock" in subscribe is renamed to
      "start_vclock". To make it consistent with relay_final_join(), and
      to signify that technically it doesn't have to be a replica
      vclock. It isn't really. Box.cc alters the replica's vclock before
      giving it to relay, which means it is no longer "replica clock".
      
      In scope of #10047
      
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      NO_DOC=refactoring
      
      (cherry picked from commit 5ebbed77)
    • relay: move gc subscriber creation out of it · 605752e5
      Vladislav Shpilevoy authored
      GC consumer creation and destroy seemed to only happen in box.cc
      with one exception in relay_subscribe(). Let's move it out for
      consistency. Now relay can only notify GC consumers, but can't
      manage them.
      
      That also makes it harder to misuse the GC by passing some wrong
      vclock to it, similar to what was happening in #10047.
      
      In scope of #10047
      
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      NO_DOC=refactoring
      
      (cherry picked from commit 4dc0c1ea)
    • box: introduce box_localize_vclock · 149fc1f7
      Vladislav Shpilevoy authored
      The function takes the burden of explaining why this hack about
      setting local component in a remote vclock is needed. It also
      creates a new vclock, not alters an existing one. This is to
      signify that the vclock is no longer what was received from a
      remote host.
      
      Otherwise it is too easy to actually mistreat this mutant vclock as
      a remote vclock. That btw did happen and is fixed in following
      commits.
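
The copy-instead-of-mutate idea can be illustrated with a small Python model. It assumes a vclock is represented as a mapping from component id to LSN; `localize_vclock` is a hypothetical stand-in and does not reproduce the signature of `box_localize_vclock`.

```python
def localize_vclock(remote, local):
    """Return a new vclock: `remote` with component 0 taken from `local`."""
    result = dict(remote)        # copy, so the received vclock stays intact
    result[0] = local.get(0, 0)  # component 0 tracks local rows
    return result
```

Because the input is never modified, code that later needs the vclock exactly as the remote host sent it can keep using the original object.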
      
      In scope of #10047
      
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      NO_DOC=refactoring
      
      (cherry picked from commit b8463960)
    • ci: add a workflow to check for entrypoint tags · 426bff55
      Nikolay Shirokovskiy authored
      See the comment in check-entrypoint.sh for an explanation of what an
      entrypoint tag is. The workflow fails if the current branch does not have
      the most recent entrypoint tag that it should have.
      
      Part of #8319
      
      NO_TEST=ci
      NO_CHANGELOG=ci
      NO_DOC=ci
      
      (cherry picked from commit c06d0d14)
    • vinyl: fix gc vs vylog race leading to duplicate record · 085279aa
      Vladimir Davydov authored
      Vinyl run files aren't always deleted immediately after compaction,
      because we need to keep run files corresponding to checkpoints for
      backups. Such run files are deleted by the garbage collection procedure,
      which performs the following steps:
      
       1. Loads information about all run files from the last vylog file.
       2. For each loaded run record that is marked as dropped:
          a. Tries to remove the run files.
          b. On success, writes a "forget" record for the dropped run,
             which will make vylog purge the run record on the next
             vylog rotation (checkpoint).
      
      (see `vinyl_engine_collect_garbage()`)
      
      The garbage collection procedure writes the "forget" records
      asynchronously using `vy_log_tx_try_commit()`, see `vy_gc_run()`.
      This procedure can be successfully executed during vylog rotation,
      because it doesn't take the vylog latch. It simply appends records
      to a memory buffer which is flushed either on the next synchronous
      vylog write or vylog recovery.
      
      The problem is that the garbage collection doesn't necessarily load
      the latest vylog file because the vylog file may be rotated between
      its calls to `vy_log_signature()` and `vy_recovery_new()`. This may
      result in a "forget" record written twice to the same vylog file
      for the same run file, as follows:
      
        1. GC loads last vylog N.
        2. GC starts removing dropped run files.
        3. CHECKPOINT starts vylog rotation.
        4. CHECKPOINT loads vylog N.
        5. GC writes a "forget" record for run A to the buffer.
        6. GC is completed.
        7. GC is restarted.
        8. GC finds that the last vylog is N and blocks on the vylog latch
           trying to load it.
        9. CHECKPOINT saves vylog M (M > N).
       10. GC loads vylog N. This triggers flushing the forget record for
           run A to vylog M (not to vylog N), because vylog M is the last
           vylog at this point of time.
       11. GC starts removing dropped run files.
       12. GC writes a "forget" record for run A to the buffer again,
           because in vylog N it's still marked as dropped and not forgotten.
           (The previous "forget" record was written to vylog M).
       13. Now we have two "forget" records for run A in vylog M.
      
      Such duplicate run records aren't tolerated by the vylog recovery
      procedure, resulting in a permanent error on the next checkpoint:
      
      ```
      ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run XXXX forgotten but not registered
      ```
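
Why a second "forget" record is fatal can be seen in a toy replay loop, sketched here in Python. The record names (`create`, `drop`, `forget`), `replay_vylog`, and `InvalidVylog` are simplifications chosen for illustration and do not mirror the real vylog format or recovery code.

```python
class InvalidVylog(Exception):
    pass


def replay_vylog(records):
    """Replay (action, run_id) records; return the set of registered runs."""
    registered = set()
    for action, run_id in records:
        if action == "create":
            registered.add(run_id)
        elif action == "drop":
            # A dropped run stays registered until it is forgotten.
            pass
        elif action == "forget":
            if run_id not in registered:
                raise InvalidVylog(
                    "Run %d forgotten but not registered" % run_id)
            registered.remove(run_id)
    return registered
```

In this model the first "forget" unregisters the run, so a duplicate record for the same run always hits the error branch.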
      
      To fix this issue, we move `vy_log_signature()` under the vylog latch
      to `vy_recovery_new()`. This makes sure that GC will see vylog records
      that it's written during the previous execution.
      
      Catching this race in a functional test would require a bunch of ugly
      error injections so let's assume that it'll be tested by fuzzing.
      
      Closes #10128
      
      NO_DOC=bug fix
      NO_TEST=tested manually with fuzzer
      
      (cherry picked from commit 9d3859b2)
    • box: prevent demoted leader from being a candidate in the next elections · 22a9cfd8
      Georgiy Lebedev authored
      
      Currently, the demoted leader sees that nobody has requested a vote in the
      newly persisted term (because it has just written it without voting, and
      nobody had time to see the new term yet), and hence votes for itself,
      becoming the most probable winner of the next elections.
      
      To prevent this from happening, let's forbid the demoted leader to be a
      candidate in the next elections using `box_raft_leader_step_off`.
      
      Closes #9855
      
      NO_DOC=<bugfix>
      
      Co-authored-by: Serge Petrenko <sergepetrenko@tarantool.org>
      (cherry picked from commit 05d03a1c)
    • box: refactor `box_demote` to make it more comprehensible · 49747a4b
      Georgiy Lebedev authored
      
      Suggested by Nikita Zheleztsov in the scope of #9855.
      
      Needed for #9855
      
      NO_CHANGELOG=<refactoring>
      NO_DOC=<refactoring>
      NO_TEST=<refactoring>
      
      Co-authored-by: Nikita Zheleztsov <n.zheleztsov@proton.me>
      (cherry picked from commit ff010fe9)
    • election: fix box.ctl.demote() nop in off-mode · 42631d5b
      Vladislav Shpilevoy authored
      box.ctl.demote() used not to do anything with election_mode='off'
      if the synchro queue didn't belong to the caller in the same term
      as the election state.
      
      The reason could be that if the synchro queue term is "outdated",
      there is no guarantee that some other instance doesn't own it in
      the latest term right now.
      
      The "problem" is that this could easily be worked around by just
      calling promote + demote together.
      
      There isn't much sense in fixing it for the off-mode because the
      only reasons off-mode exists are 1) for people who don't use
      synchro at all, 2) who did use it and want to stop. Hence they
      need demote just to disown the queue.
      
      The patch "legalizes" the mentioned workaround by allowing to
      perform demote in off-mode even if the synchro queue term is old.
      
      Closes #6860
      
      NO_DOC=bugfix
      
      (cherry picked from commit 1afe2274)
    • tuple: don't use offset_slot_cache in vinyl threads · 7d90a94c
      Vladimir Davydov authored
      `key_part::offset_slot_cache` and `key_part::format_epoch` are used for
      speeding up tuple field lookup in `tuple_field_raw_by_part()`. These
      structure members are accessed and updated without any locks, assuming
      this code is executed exclusively in the tx thread. However, this isn't
      necessarily true because we also perform tuple field lookups in vinyl
      read threads. Apparently, this can result in unexpected races and bugs,
      for example:
      
      ```
        #1  0x590be9f7eb6d in crash_collect+256
        #2  0x590be9f7f5a9 in crash_signal_cb+100
        #3  0x72b111642520 in __sigaction+80
        #4  0x590bea385e3c in load_u32+35
        #5  0x590bea231eba in field_map_get_offset+46
        #6  0x590bea23242a in tuple_field_raw_by_path+417
        #7  0x590bea23282b in tuple_field_raw_by_part+203
        #8  0x590bea23288c in tuple_field_by_part+91
        #9  0x590bea24cd2d in unsigned long tuple_hint<(field_type)5, false, false>(tuple*, key_def*)+103
        #10 0x590be9d4fba3 in tuple_hint+40
        #11 0x590be9d50acf in vy_stmt_hint+178
        #12 0x590be9d53531 in vy_page_stmt+168
        #13 0x590be9d535ea in vy_page_find_key+142
        #14 0x590be9d545e6 in vy_page_read_cb+210
        #15 0x590be9f94ef0 in cbus_call_perform+44
        #16 0x590be9f94eae in cmsg_deliver+52
        #17 0x590be9f9583e in cbus_process+100
        #18 0x590be9f958a5 in cbus_loop+28
        #19 0x590be9d512da in vy_run_reader_f+381
        #20 0x590be9cb4147 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+34
        #21 0x590be9f8b697 in fiber_loop+219
        #22 0x590bea374bb6 in coro_init+120
      ```
      
      Fix this by skipping this optimization for threads other than tx.
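
The shape of the fix, using an unsynchronized cache only from the thread that owns it, can be sketched in Python. Everything here is an illustrative assumption: the thread that loads the module plays the role of tx, and `KeyPart`, `slow_lookup`, and `field_offset` are hypothetical names rather than the real tuple API.

```python
import threading

# Assumption: the thread that loads this module plays the role of "tx".
TX_THREAD = threading.current_thread()


class KeyPart:
    def __init__(self, fieldno):
        self.fieldno = fieldno
        self.offset_slot_cache = None  # unsynchronized, tx-only state


def slow_lookup(part):
    # Placeholder for resolving the offset slot without the cache.
    return part.fieldno * 4


def field_offset(part):
    if threading.current_thread() is not TX_THREAD:
        # Other threads neither read nor update the shared cache.
        return slow_lookup(part)
    if part.offset_slot_cache is None:
        part.offset_slot_cache = slow_lookup(part)
    return part.offset_slot_cache
```

Non-tx callers pay for the slow path on every lookup, which trades some speed for never touching state they do not own.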
      
      No test is added because reproducing this race is tricky. Ideally, bugs
      like this one should be caught by fuzzing tests or thread sanitizers.
      
      Closes #10123
      
      NO_DOC=bug fix
      NO_TEST=tested manually with fuzzer
      
      (cherry picked from commit 19d1f1cc)
    • vinyl: fix cache iterator skipping tuples in read view · 1bad1afc
      Vladimir Davydov authored
      The tuple cache doesn't store older tuple versions so if a reader is
      in a read view, it must skip tuples that are newer than the read view,
      see `vy_cache_iterator_stmt_is_visible()`. A reader must also ignore
      cached intervals if any of the tuples used as a boundary is invisible
      from the read view, see `vy_cache_iterator_skip_to_read_view()`.
      There's a bug in `vy_cache_iterator_restore()` because of which such
      an interval may be returned to the reader: when we step backwards
      from the last returned tuple we consider only one of the boundaries.
      As a result, if the other boundary is invisible from the read view,
      the reader will assume there's nothing in the index between the
      boundaries and skip reading older sources (memory, disk). Fix this by
      always checking if the other boundary is visible.
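
Reduced to its core, the rule is that a cached interval may be trusted only when both of its boundary tuples are visible. A minimal Python sketch, assuming visibility means "LSN not newer than the read view" (`is_visible` and `interval_is_usable` are hypothetical helpers, not the vinyl functions):

```python
def is_visible(stmt_lsn, read_view_lsn):
    # Assumption: a tuple is visible if it is not newer than the read view.
    return stmt_lsn <= read_view_lsn


def interval_is_usable(left_lsn, right_lsn, read_view_lsn):
    """A cached interval counts only if both boundary tuples are visible."""
    return (is_visible(left_lsn, read_view_lsn) and
            is_visible(right_lsn, read_view_lsn))
```

Checking a single boundary, as the buggy restore path effectively did, would accept the second and third cases in the test data even though one boundary is invisible.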
      
      Closes #10109
      
      NO_DOC=bug fix
      
      (cherry picked from commit 7b72080d)
    • vinyl: fix run iterator skipping tuples following non-terminal statement · 56b6ed79
      Vladimir Davydov authored
      If a run iterator is positioned at a non-terminal statement (UPSERT or
      UPDATE), `vy_run_iterator_next()` will iterate over older statements
      with the same key using `vy_run_iterator_next_lsn()` to build the key
      history. While doing so, it may reach the end of the run file (if the
      current key is the last in the run). This would stop iteration
      permanently, which is apparently wrong for reverse iterators (LE or LT):
      if this happens the run iterator won't return any keys preceding the
      last one in the run file. Fix this by removing `vy_run_iterator_stop()`
      from `vy_run_iterator_next_lsn()`.
      
      Part of #10109
      
      NO_DOC=bug fix
      NO_CHANGELOG=next commit
      
      (cherry picked from commit 72763f94)
  10. Jun 10, 2024
    • ci: fix RPM package builds on aarch64 runners · 715abaaf
      Yaroslav Lobankov authored
      We're using LXD containers as aarch64 runners. For some reason, the OOM
      killer kills the compilation process during package building when
      `make -j $(nproc)` is used. The issue happens only with builds where LTO
      is enabled. It was found that `-j6` works fine; bigger values cause
      problems.
      
      NO_DOC=ci
      NO_TEST=ci
      NO_CHANGELOG=ci
    • test: bump test-run to new version · 4d8dc4f2
      Yaroslav Lobankov authored
      Bump test-run to new version with the following improvements:
      
      - Calculate parallel jobs based on available CPUs [1]
      - Bump luatest to 1.0.1-15 (--list-test-cases) [2]
      - luatest: detox test searching code [3]
      - luatest: allow to run test cases in parallel [4]
      
      [1] tarantool/test-run@182aa77
      [2] tarantool/test-run@1fbbf9a
      [3] tarantool/test-run@3b0ccd0
      [4] tarantool/test-run@dd00063
      
      NO_DOC=test
      NO_TEST=test
      NO_CHANGELOG=test
      
      (cherry picked from commit 32bcea7d)
    • ci: disable workaround for LuaJIT profiling tests on aarch64 runners · 307e3377
      Yaroslav Lobankov authored
      Disable workaround for LuaJIT profiling tests on aarch64 runners due to
      the following error:
      
          mount: /tmp/luajit-test-vardir: mount failed: Operation not permitted
      
      Looks like it happens because our aarch64 runners are LXD containers.
      
      NO_DOC=ci
      NO_TEST=ci
      NO_CHANGELOG=ci
      
      (cherry picked from commit e64457d9)
    • vinyl: fix crash on invalid upsert · ca21e6d5
      Vladimir Davydov authored
      `vy_apply_result_does_cross_pk()` must be called after the new tuple
      format is validated, otherwise it may crash in case the new tuple has
      fields conflicting with the primary key definition.
      
      While we are at it, fix the operation cursor (`ups_ops`) not advanced
      on this kind of error. This resulted in skipped `upsert` statements
      following an invalid `upsert` statement in a transaction.
      
      Closes #10099
      
      NO_DOC=bug fix
      
      (cherry picked from commit dd0ac814)
  11. Jun 07, 2024
    • vinyl: fix crash on extending secondary key parts with primary · 05fa2f74
      Vladimir Davydov authored
      If a secondary index is altered in such a way that its key parts are
      extended with the primary key parts, rebuild isn't required because
      `cmp_def` doesn't change, see `vinyl_index_def_change_requires_rebuild`.
      In this case `vinyl_index_update_def` will try to update `key_def` and
      `cmp_def` in-place with `key_def_copy`. This will lead to a crash
      because the number of parts in the new `key_def` is greater.
      
      We can't use `key_def_dup` instead of `key_def_copy` there because
      there may be read iterators using the old `key_def` by pointer so
      there's no other option but to force rebuild in this case.
      
      The bug was introduced in commit 64817066 ("vinyl: use update_def
      index method to update vy_lsm on ddl").
      
      Closes #10095
      
      NO_DOC=bug fix
      
      (cherry picked from commit 9b817848)
    • vinyl: fix crash in index drop if there is DML request reading from it · f7f01196
      Vladimir Davydov authored
      A DML request (insert, replace, update) can yield while reading from
      the disk in order to check unique constraints. In the meantime the index
      can be dropped. The DML request can't crash in this case thanks to
      commit d3e12369 ("vinyl: abort affected transactions when space is
      removed from cache"), but the DDL operation can because:
       - It unreferences the index in `alter_space_commit`, which may result
         in dropping the LSM tree with `vy_lsm_delete`.
       - `vy_lsm_delete` may yield in `vy_range_tree_free_cb` while waiting
         for disk readers to complete.
       - Yielding in commit triggers isn't allowed (crashes).
      
      We already fixed a similar issue when `index.get` crashed if raced
      with index drop, see commit 75f03a50 ("vinyl: fix crash if space is
      dropped while space.get is reading from it"). Let's fix this issue in
      the same way - by taking a reference to the LSM tree while checking
      unique constraints. To do that it's enough to move `vy_lsm_ref` from
      `vinyl_index_get` to `vy_get`.
      
      Also, let's replace `vy_slice_wait_pinned` with an assertion checking
      that the slice pin count is 0 in `vy_range_tree_free_cb` because
      `vy_lsm_delete` must not yield.
      
      Closes #10094
      
      NO_DOC=bug fix
      
      (cherry picked from commit bde28f0f)
    • tuple: fix crash on hashing tuple with double fields · 73dd3a8e
      Vladimir Davydov authored
      `tuple_hash_field()` doesn't advance the MsgPack cursor after hashing
      a tuple field with the type `double`, which can result in crashes both
      in memtx (while inserting a tuple into a hash index) and in vinyl
      (while writing a bloom filter on dump or compaction).
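
The invariant that was broken is that hashing a field must leave the cursor just past that field, whatever its type. A toy illustration in Python follows; the encoding is made up for the example (it is not MsgPack), and `encode`, `hash_field`, and `hash_tuple` are hypothetical names.

```python
import struct
import zlib


def encode(fields):
    """Toy encoding: b'd' + 8-byte double or b'u' + 4-byte unsigned."""
    out = b""
    for value in fields:
        if isinstance(value, float):
            out += b"d" + struct.pack(">d", value)
        else:
            out += b"u" + struct.pack(">I", value)
    return out


def hash_field(buf, pos, h):
    """Hash one field at `pos`; return (new_hash, position after field)."""
    size = 8 if buf[pos:pos + 1] == b"d" else 4
    end = pos + 1 + size
    h = zlib.crc32(buf[pos:end], h)
    return h, end  # the cursor is advanced for every field type


def hash_tuple(buf):
    """Hash all fields in order; return (hash, number of fields seen)."""
    pos, h, count = 0, 0, 0
    while pos < len(buf):
        h, pos = hash_field(buf, pos, h)
        count += 1
    return h, count
```

If `hash_field` returned `pos` unchanged for the double case, the loop in `hash_tuple` would keep reinterpreting the same bytes instead of moving on to the next field.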
      
      The bug was introduced by commit 51af059c ("box: compare and hash
      msgpack value of double key field as double").
      
      Closes #10090
      
      NO_DOC=bug fix
      
      (cherry picked from commit bc0daf99)
  12. Jun 06, 2024
    • test: bump test-run to new version · 9b8fb7ab
      Nikolay Shirokovskiy authored
      Bump test-run to new version with the following improvements:
      
      - Bump luatest to 1.0.1-14-gdfee2f3 [1]
      - Adjust test result report width to terminal size [2]
      - dispatcher: lift pipe buffer size restriction [3]
      - flake8: fix E721 do not compare types [4]
      
      [1] tarantool/test-run@84ebae5
      [2] tarantool/test-run@1724211
      [3] tarantool/test-run@81259c4
      [4] tarantool/test-run@1037299
      
      We also have to fix several tests that check that a script with luatest
      assertions has empty stderr output. test-run now brings a Luatest that
      logs assertions at the 'info' level.
      
      Note that gh_8433_raft_is_candidate_test is different. The original
      assertion involves logging huge tables that contain closed sockets
      somewhere inside, and 'socket.__tostring' currently raises an error
      for closed sockets.
      
      NO_DOC=submodule bump
      NO_TEST=submodule bump
      NO_CHANGELOG=submodule bump
      
      (cherry picked from commit 97a801e1)
      9b8fb7ab
    • Oleg Chaplashkin's avatar
      test: bump test-run to new version · ac9e8897
      Oleg Chaplashkin authored
      Bump test-run to new version with the following improvements:
      
      - Bump luatest to 1.0.1-5-g105c69d [1]
      - tap13: fix worker fail on failed TAP13 parsing [2]
      
      [1] tarantool/test-run@ed5b623
      [2] tarantool/test-run@7c1a0a7
      
      NO_DOC=test
      NO_TEST=test
      NO_CHANGELOG=test
      
      (cherry picked from commit 4466deaf)
      ac9e8897
  13. May 30, 2024
  14. May 29, 2024
    • Georgiy Lebedev's avatar
      txn: run statement `on_rollback` triggers before rolling back statement · 41af99a2
      Georgiy Lebedev authored
      Logically, we call triggers after running statements. These triggers can
      make significant changes (for instance, DDL triggers), so, for consistency,
      we should call the statement's `on_rollback` triggers before rolling back
      the statement. This also adheres to the logic that transaction
      `on_rollback` triggers are called before rolling back individual
      transaction statements.
      
      One particular bug that this patch fixes is the rollback of DDL on the
      `_space` space. DDL is essentially a replace operation on the `_space`
      space, which also invokes the `on_replace_dd_space` trigger. In this
      trigger, among other things, we swap the indexes of the original space,
      `alter->old_space`, which is equal to the corresponding transaction
      `stmt->space`, with the indexes of the newly created space,
      `alter->new_space`:
      https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/alter.cc#L1036-L1047
      
      If a rollback then happens, we first roll back the replace operation, using
      `stmt->space`, and only after that do we swap back the indexes in
      `alter_space_rollback`:
      https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/memtx_engine.cc#L659-L669
      https://github.com/tarantool/tarantool/blob/de80e0264f7deb58ea86ef85b37b92653a803430/src/box/alter.cc#L916-L925
      
      For DDL on the `_space` space, the replace operation and DDL occur on the
      same space. This means that during rollback of the replace, we will try to
      do a replace in the empty indexes that were created for `alter->new_space`.
      Not only does this break the replace operation, but also the newly inserted
      tuple, which remains in the index, gets deleted, and access to it causes
      undefined behavior (heap-use-after-free).
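
The ordering problem can be modelled with a small C sketch. Everything here is a simplified assumption: `toy_index`, `toy_space`, `toy_alter` and the helper functions are hypothetical, an index is reduced to a tuple counter, and the trigger is reduced to a pointer swap.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_index { int tuple_count; };
struct toy_space { struct toy_index *index; };

/* Models the alter state: the space and the index set it is swapped with. */
struct toy_alter {
	struct toy_space *space;
	struct toy_index *other;
};

/* Models both the on_replace trigger and its on_rollback counterpart. */
static void
toy_swap_index(struct toy_alter *alter)
{
	struct toy_index *tmp = alter->space->index;
	alter->space->index = alter->other;
	alter->other = tmp;
}

static void
toy_replace(struct toy_space *space)
{
	space->index->tuple_count++;
}

/* Undo a replace; fails if the current index does not hold the tuple. */
static bool
toy_rollback_replace(struct toy_space *space)
{
	if (space->index->tuple_count == 0)
		return false;
	space->index->tuple_count--;
	return true;
}

/*
 * Run a DDL statement and roll it back in one of two orders.
 * *leftover receives the tuple count left in the original index.
 */
static bool
toy_ddl_then_rollback(bool trigger_first, int *leftover)
{
	struct toy_index old_index = {0}, new_index = {0};
	struct toy_space space = {&old_index};
	struct toy_alter alter = {&space, &new_index};

	toy_replace(&space);    /* the statement itself */
	toy_swap_index(&alter); /* on_replace trigger swaps the indexes */

	bool ok;
	if (trigger_first) {
		toy_swap_index(&alter);            /* on_rollback trigger */
		ok = toy_rollback_replace(&space); /* sees the original index */
	} else {
		ok = toy_rollback_replace(&space); /* sees the empty index */
		toy_swap_index(&alter);
	}
	*leftover = old_index.tuple_count;
	return ok;
}
```

With `trigger_first` set, the replace is undone in the index that actually holds the tuple. With the old order, the undo hits the empty index and the tuple stays behind in the original one.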
      
      As part of the work on this patch, tests of rollback of DDL on system
      spaces which use `on_rollback` triggers were enumerated:
      * `_sequence` — box/sequence.test.lua;
      * `_sequence_data` — box/sequence.test.lua;
      * `_space_sequence` — box/sequence.test.lua;
      * `_trigger` — sql/ddl.test.lua, sql/errinj.test.lua;
      * `_collation` — engine-luatest/gh_4544_collation_drop_test.lua,
                       box/ddl_collation.test.lua;
      * `_space` — box/transaction.test.lua, sql/ddl.test.lua;
      * `_index` — box/transaction.test.lua, sql/ddl.test.lua;
      * `_cluster` — box/transaction.test.lua;
      * `_func` — box/transaction.test.lua, box/function1.test.lua;
      * `_priv` — box/errinj.test.lua,
                  box-luatest/rollback_ddl_on__priv_space_test.lua;
      * `_user` — box/transaction.test.lua,
                  box-luatest/gh_4348_transactional_ddl_test.lua.
      
      Closes #9893
      
      NO_DOC=<bugfix>
      
      (cherry picked from commit d529082f)
      41af99a2
    • Georgiy Lebedev's avatar
      box: pass statement being rolled back (if any) to `priv_grant` · 83ae9be8
      Georgiy Lebedev authored
      In scope of #9893 we are going to run statement `on_rollback` triggers
      before rolling back the corresponding statement. During rollback of DDL in
      the `_priv` space, the database is accessed from `user_reload_privs` to
      reload user privileges, so we need it to account for the current statement
      being rolled back: i.e., the new tuple that was introduced (if any) must
      not be used, while the old tuple (if any) must be used.
      
      Needed for #9893
      
      NO_CHANGELOG=<refactoring>
      NO_DOC=<refactoring>
      
      (cherry picked from commit 797c04ff)
      83ae9be8
    • Ilya Verbin's avatar
      txn: pass txn_stmt instead of txn to on_commit/on_rollback · 817697e8
      Ilya Verbin authored
      Currently on_rollback triggers are called on rollback of the whole
      transaction. To make it possible to invoke them on rollback to a
      savepoint, we need to pass a statement at which the savepoint was
      created.
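
A rough C sketch of per-statement rollback triggers, under loud assumptions: `toy_txn`, `toy_stmt`, `toy_rollback_to_savepoint` and the recording callback are hypothetical, and a savepoint is modelled simply as the number of statements present when it was taken. The real trigger interface is not shown in this commit message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_stmt;

/* Trigger callback that receives the statement, not the transaction. */
typedef void (*toy_trigger_f)(struct toy_stmt *stmt);

struct toy_stmt {
	int id;
	toy_trigger_f on_rollback;
};

enum { TOY_MAX_STMTS = 8 };

struct toy_txn {
	struct toy_stmt stmts[TOY_MAX_STMTS];
	int n_stmts;
};

/* Log of rolled back statement ids, in the order triggers fired. */
static int toy_rolled_back[TOY_MAX_STMTS];
static int toy_n_rolled_back;

static void
toy_record_rollback(struct toy_stmt *stmt)
{
	toy_rolled_back[toy_n_rolled_back++] = stmt->id;
}

static void
toy_txn_add_stmt(struct toy_txn *txn, int id, toy_trigger_f cb)
{
	assert(txn->n_stmts < TOY_MAX_STMTS);
	txn->stmts[txn->n_stmts].id = id;
	txn->stmts[txn->n_stmts].on_rollback = cb;
	txn->n_stmts++;
}

/*
 * Roll back every statement added after the savepoint, newest first,
 * firing each statement's trigger with that statement as the argument.
 */
static void
toy_rollback_to_savepoint(struct toy_txn *txn, int savepoint)
{
	while (txn->n_stmts > savepoint) {
		struct toy_stmt *stmt = &txn->stmts[--txn->n_stmts];
		if (stmt->on_rollback != NULL)
			stmt->on_rollback(stmt);
	}
}
```

Because the callback is handed a statement, the same trigger code works whether the whole transaction or only its tail after a savepoint is being undone.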
      
      Needed for #9340
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      
      (cherry picked from commit a1d85827)
      817697e8