  1. Jul 19, 2021
    • VitaliyaIoffe's avatar
      build: fix uninitialized variables · 7e0d751e
      VitaliyaIoffe authored
      The build for Fedora 34 fails because of possibly uninitialized
      variables in a few places. For example:
       [100%] Built target merger.test
      /source/build/usr/src/debug/tarantool-2.9.0.116/src/box/sql.c: In function 'tarantoolSqlNextSeqId':
      /source/build/usr/src/debug/tarantool-2.9.0.116/src/box/sql.c:1186:13: error: 'key' may be used uninitialized [-Werror=maybe-uninitialized]
       1186 |         if (box_index_max(BOX_SEQUENCE_ID, 0 /* PK */, key,
      
      Needed for: #6074
      7e0d751e
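The class of warning quoted above can be illustrated with a minimal sketch. It is not the code from sql.c: the helper and function names are hypothetical, and zero-initializing the buffer is only one possible way to silence -Wmaybe-uninitialized, not necessarily what the commit does.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Tarantool source): writes the
 * decimal form of `id` into `buf` and returns 0, or returns -1 and
 * leaves `buf` untouched when `id` is negative.
 */
static int
sketch_encode_key(int id, char *buf, size_t size)
{
	if (id < 0)
		return -1;
	snprintf(buf, size, "%d", id);
	return 0;
}

/*
 * Reads `key` regardless of whether the helper filled it. Without the
 * `= {0}` initializer the failure path would read indeterminate bytes,
 * which is the kind of code a compiler may flag as maybe-uninitialized.
 */
size_t
sketch_key_length(int id)
{
	char key[16] = {0};
	(void)sketch_encode_key(id, key, sizeof(key));
	return strlen(key);
}
```

With the initializer in place the failure path sees an empty string instead of stack garbage.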
  2. Jul 16, 2021
    • Cyrill Gorcunov via Tarantool-patches's avatar
      github-ci: freebsd -- filter out `-notest` · 1575f3c0
      
      We use the `-notest` postfix when we want to share
      the code only, without running tests. The FreeBSD
      template lacked the corresponding snippet.
      Add it.
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
      1575f3c0
    • Serge Petrenko's avatar
      replication: stop pushing TimedOut error to the replica · 61b95d16
      Serge Petrenko authored
      Every error that happens while the master processes a join or subscribe
      request is sent to the replica for better diagnostics.
      
      This could lead to the following situation with the TimedOut error:
      it could be written on top of a half-written row and make the replica stop
      replication with ER_INVALID_MSGPACK error. The error is unrecoverable
      and the only way to resume replication after it happens is to reset
      box.cfg.replication.
      
      Here's what happened:
      
      1) Replica is under heavy load, meaning its event loop is occupied by
         some fiber not yielding control to others.
      
      2) applier and other fibers aren't scheduled while the event loop is
         blocked. This means applier doesn't send heartbeat messages to the
         master and doesn't read any data coming from the master.
      
      3) The unread master's data piles up. First in replica's receive buffer, then
         in master's send buffer.
      
      4) Once master's send buffer is full, the corresponding socket stops
         being writeable and the relay yields waiting for the socket to become
         writeable again. The send buffer might contain a partially written
         row by now.
      
      5) Replication timeout happens on master, because it hasn't heard from
         replica for a while. An exception is raised, and the exception is
         pushed to the replica's socket. Now two situations are possible:
      
        a) the socket becomes writeable by the time the exception is raised.
           In this case the exception is written to the buffer right after
           a partially written row. Once the replica receives the half-written
           row with an exception written on top, it fails with
           ER_INVALID_MSGPACK. Replication is broken.

        b) the socket still isn't writeable (the most probable scenario).
           The exception isn't written to the socket and the connection is
           closed. The replica eventually receives a partially written row and
           retries the connection to the master normally.
      
      In order to prevent case a) from happening, let's not push TimedOut
      errors to the socket at all. They're the only errors that could be
      raised while a row is being written, i.e. the only errors that could
      lead to the situation described in 5a.
      
      Closes #4040
      61b95d16
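The rule described above (never push a TimedOut error to the replica's socket) can be sketched as a simple filter. All names here are hypothetical and the stream is reduced to a counter; this only models the decision, not the relay code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error kinds; only the TimedOut case matters here. */
enum sketch_err_kind {
	SKETCH_ERR_OTHER,
	SKETCH_ERR_TIMED_OUT,
};

/* Hypothetical stand-in for the socket the master writes to. */
struct sketch_stream {
	size_t errors_written;
};

/*
 * Writes an error to the stream unless it is a timeout. A timeout may
 * fire while a row is only partially written, so appending anything
 * at that point could corrupt the row on the receiving side.
 * Returns true if the error was written.
 */
bool
sketch_push_error(struct sketch_stream *stream, enum sketch_err_kind kind)
{
	if (kind == SKETCH_ERR_TIMED_OUT)
		return false;
	stream->errors_written++;
	return true;
}
```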
  3. Jul 15, 2021
  4. Jul 14, 2021
    • Mergen Imeev's avatar
      sql: make type mismatch errors more informative · 9a635680
      Mergen Imeev authored
      Prior to this patch, the type mismatch error description showed the
      value in some cases and the type of the value in others. After this
      patch, both the type and the value are shown. The inconsistent types
      error description also becomes more informative: previously it
      contained only the type of the value, now it contains the value and
      its type.
      
      Close #6176
      9a635680
    • Mergen Imeev's avatar
      sql: use proper type names in error descriptions · 44e75c98
      Mergen Imeev authored
      Prior to this patch, the type mismatch error description and the
      inconsistent types error description in some cases displayed type names
      that were different from the default ones. After this patch, all types
      in these descriptions are described using the default names.
      
      Part of #6176
      44e75c98
    • Mergen Imeev's avatar
      sql: properly show values in type mismatch error · e1ee3bab
      Mergen Imeev authored
      Currently, some values are displayed improperly in the type mismatch
      error description. For VARBINARY, the word "varbinary" is printed
      instead of the value. STRING values are printed without quotes, which
      can be confusing in some cases, such as when it consists of spaces.
      
      This patch introduces the following changes:
      1) VARBINARY value will be printed as x'<value in hexadecimal format>'.
      2) STRING value will be printed in single quotes.
      3) UUID value will be printed in single quotes.
      
      UUID value does not need to be enclosed in single quotes, since there
      are no literals for UUIDs, but it looks more convenient.
      
      Part of #6176
      e1ee3bab
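The formatting rules listed above for VARBINARY and STRING values can be sketched as follows. The function names are hypothetical, and the use of uppercase hexadecimal digits is an assumption; the commit message only says "hexadecimal format".

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Formats `len` bytes as x'<hex>' into `out`.
 * Returns the length of the result, or -1 if `out` is too small.
 */
int
sketch_format_varbinary(const unsigned char *data, size_t len,
			char *out, size_t size)
{
	/* "x'" + two hex digits per byte + "'" + terminating NUL. */
	size_t need = 2 + 2 * len + 1 + 1;
	if (size < need)
		return -1;
	char *p = out;
	*p++ = 'x';
	*p++ = '\'';
	for (size_t i = 0; i < len; i++) {
		snprintf(p, 3, "%02X", (unsigned)data[i]);
		p += 2;
	}
	*p++ = '\'';
	*p = '\0';
	return (int)(p - out);
}

/*
 * Formats a string value in single quotes.
 * Returns the length of the result, or -1 if it did not fit.
 */
int
sketch_format_string(const char *str, char *out, size_t size)
{
	int n = snprintf(out, size, "'%s'", str);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Quoting makes a string that consists of a single space visible as `' '` rather than as an apparently missing value.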
    • Mergen Imeev's avatar
      sql: truncate values in type mismatch error · 484a48b7
      Mergen Imeev authored
      STRING, MAP, and ARRAY values that are too long can make the type
      mismatch error description less descriptive than necessary. This patch
      truncates values that are too long and adds "..." to indicate that the
      value has been truncated.
      
      Part of #6176
      484a48b7
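Truncation with a trailing "..." can be sketched like this. The function name and the `max_len` parameter are hypothetical, the actual length limit used by the patch is not stated in the commit message, and the sketch cuts at a byte boundary, which assumes single-byte characters.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copies `value` into `out`. If `value` is longer than `max_len`
 * bytes, only the first `max_len` bytes are kept and "..." is
 * appended. Returns what snprintf() returns.
 */
int
sketch_truncate_value(const char *value, size_t max_len,
		      char *out, size_t size)
{
	if (strlen(value) <= max_len)
		return snprintf(out, size, "%s", value);
	return snprintf(out, size, "%.*s...", (int)max_len, value);
}
```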
  5. Jul 12, 2021
  6. Jul 09, 2021
    • Andrey Kulikov's avatar
      build: Fix build with backtraces enabled on arm64 (aarch64) · b68ae47b
      Andrey Kulikov authored
      Fix build errors on arm64 with backtraces being enabled.
      
      Fixes #6142
      
      See also:
       - https://github.com/libunwind/libunwind/pull/221
       - #5471
       - #6142
      b68ae47b
    • Aleksandr Lyapunov's avatar
      txm: add a test that creates an index in transaction · 4da3fb5e
      Aleksandr Lyapunov authored
      The problem was fixed in #5515, this commit just verifies that the
      test case works fine.
      
      Closes #6137
      4da3fb5e
    • Aleksandr Lyapunov's avatar
      txm: use index itself instead of index_id · 8bd94bc0
      Aleksandr Lyapunov authored
      There was a serious problem in txm: index_id from struct index was
      used as an index in some arrays (for example in array of links in
      stories). As a result, if a user had created an index specifying ID
      that is not sequential, the array access would have been out of
      range which could lead to segfault.
      
      This patch makes use of indexes directly, and when it comes to
      array access, a dense_id is used, which fits perfectly for that.
      
      As a part of #5515 this patch makes the cases in it at least stable.
      
      Part of #5515
      8bd94bc0
    • Aleksandr Lyapunov's avatar
      box: store compact_id for indexes · 234a32fe
      Aleksandr Lyapunov authored
      Historically, an index of a space may be accessed by iid (index ID),
      that is the ID set in the index definition, or by sequential ID, that
      is a number in [0..space->index_count]. In other words, a space holds
      two arrays of indexes: 1) sparse (by iid) and 2) dense (by sequential
      ID).

      Since an instance of an index belongs to one and only one space, any
      index implicitly has this sequential ID. We can simply save this
      ID in the index and distinguish indexes by it too.

      We could call this member 'sequential_id', but this name has too
      general a meaning, while dense_id directly refers to the dense array
      of a space.
      
      Part of #5515
      234a32fe
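The difference between the sparse and the dense numbering described in the two commits above can be sketched as follows. The struct layout, the field names other than dense_id, and the fixed limit are all hypothetical; the point is only that a per-index array sized by the number of indexes must be addressed by dense_id, not by iid.

```c
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_MAX_INDEX_ID = 16 };

struct sketch_index {
	uint32_t iid;      /* ID from the index definition, may have gaps */
	uint32_t dense_id; /* position in the dense array, no gaps */
};

struct sketch_space {
	struct sketch_index *by_iid[SKETCH_MAX_INDEX_ID]; /* sparse */
	struct sketch_index *dense[SKETCH_MAX_INDEX_ID];  /* dense */
	uint32_t index_count;
};

/*
 * Registers an index under a user-chosen iid and remembers its
 * sequential position in the index itself.
 * Returns 0 on success, -1 if the iid is out of range or taken.
 */
int
sketch_space_add_index(struct sketch_space *space,
		       struct sketch_index *index, uint32_t iid)
{
	if (iid >= SKETCH_MAX_INDEX_ID || space->by_iid[iid] != NULL)
		return -1;
	index->iid = iid;
	index->dense_id = space->index_count;
	space->by_iid[iid] = index;
	space->dense[space->index_count++] = index;
	return 0;
}
```

An index created with iid 5 in a space that has one other index gets dense_id 1, so an array of two elements can be addressed safely by dense_id but not by iid.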
    • Aleksandr Lyapunov's avatar
      txm: postpone garbage collector · f8d21cca
      Aleksandr Lyapunov authored
      Before this patch the garbage collector was executed right before
      the allocation of a new story. That means that, for example, in
      memtx_tx_history_add_stmt the GC could be called a couple of times.
      
      The garbage collector is free to delete stories if they are no
      longer used. Removing a story can cause an index modification with
      a subsequent tuple deletion.
      
      For example imagine a space with one index, where one tuple
      {1, 1, 1} is placed. Then a transaction comes, deletes that tuple
      and commits. In this moment the tuple {1, 1, 1} can be still in
      index, marked as 'dirty' and having a corresponding story, which
      states that the tuple is deleted.
      
      This is a valid situation, even a necessary one, for the case when
      another transaction is in a read view and must see that {1, 1, 1} is
      not yet deleted. But when possible, the GC would try to delete the
      story and remove the tuple from the index.
      
      Now imagine that this GC happens when a new transaction inserts,
      for example, {1, 1, 1, 4}. In memtx_tx_history_add_stmt the new
      tuple replaces the old one in index, but the story of new tuple
      is not created yet. Then the new story is created, that causes
      GC, that tries to remove {1, 1, 1} from index and delete it from
      memory. At this moment memtx_tx_history_add_stmt relies on the
      existence of {1, 1, 1}, which no longer exists.
      
      That is an example of a general problem: cleanup should not be
      done in the middle of a complex function that can be in a
      half-made, invalid intermediate state. Cleanup, including GC,
      should be done at the end of functions.
      
      This patch moves story GC to the end of the functions that use it.
      
      Part of #5515
      f8d21cca
  7. Jul 08, 2021
    • Cyrill Gorcunov's avatar
      raft: change request state to uint64_t · 7b835767
      Cyrill Gorcunov authored
      
      When new raft message comes in from the network we need
      to be sure that the payload is suitable for processing,
      in particular `raft_msg::state` must be valid because
      our code logic depends on it.
      
      For this reason make `raft_msg::state` a uint64_t,
      which makes verification of the state field
      easier.

      At the same time use panic() instead of the unreachable() macro
      because the check for a valid state must be enabled all
      the time.
      
      Closes #6067
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      7b835767
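One reason a 64-bit field simplifies validation can be sketched as follows: a value decoded from the network is checked at full width, so an out-of-range number cannot be narrowed into a valid-looking one first. The constants and the function name are hypothetical, and which values count as valid is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state values, for illustration only. */
enum {
	SKETCH_STATE_FOLLOWER = 1,
	SKETCH_STATE_CANDIDATE = 2,
	SKETCH_STATE_LEADER = 3,
};

/* Checks a state exactly as it was decoded, without narrowing. */
bool
sketch_raft_state_is_valid(uint64_t state)
{
	return state >= SKETCH_STATE_FOLLOWER &&
	       state <= SKETCH_STATE_LEADER;
}
```

For example, 2^32 + 1 is rejected here, while the same value narrowed to 32 bits would become 1 and pass the check.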
  8. Jul 07, 2021
  9. Jul 05, 2021
    • Aleksandr Lyapunov's avatar
      txm: fix a crash in rollback when the space was deleted · 8620e61f
      Aleksandr Lyapunov authored
      With MVCC a case may happen:
      TX1 does something with some space and yields.
      TX2 deletes the space and commits.
      TX1 rolls back.
      
      The problem was that TX1 does something with already deleted space.
      This commit fixes that.
      
      Part of #6140
      8620e61f
    • Aleksandr Lyapunov's avatar
      debug: trash memory before freeing a space · 15f5a2cc
      Aleksandr Lyapunov authored
      That's a good practice in general.
      In particular it makes the test case from #6140 stable.
      
      Part of #6140
      15f5a2cc
    • Aleksandr Lyapunov's avatar
      txm: fix a crash in conflict collection · d12f656a
      Aleksandr Lyapunov authored
      The crash happened when the MVCC engine was enabled and a transaction
      that had been sent to a read view due to a conflict tried to read the
      key that caused the conflict.
      
      Closes #6131
      d12f656a
    • Alexander V. Tikhonov's avatar
      test: add new checksums to replication, box suites · 75193e5d
      Alexander V. Tikhonov authored
      Updated:
      
        box/net.box_reconnect_after_gh-3164.test.lua gh-5081
        replication/errinj.test.lua                  gh-3870
        replication/qsync_basic.test.lua             gh-5355
        replication/anon.test.lua                    gh-5381
        replication/status.test.lua                  gh-5409
        replication/election_qsync.test.lua          gh-5430
      
      Added new:
      
        box-py/iproto.test.py                             gh-qa-132
        replication/gh-5435-qsync-clear-synchro-queue-co> gh-qa-129
        replication/gh-5445-leader-inconsistency.test.lua gh-qa-129
        replication/gh-3055-election-promote.test.lua     gh-qa-127
        replication/election_basic.test.lua               gh-qa-133
      75193e5d
  10. Jul 02, 2021
    • mechanik20051988's avatar
      memtx: fix corrupted snapshot name in log file · da23a346
      mechanik20051988 authored
      The static buffer that stores the snapshot filename is reused later in
      the `xlog_cursor_open` function. So when we log this name after that
      call, we get a corrupted name that has nothing to do with the real
      name. We should use `cursor.name` instead.
      da23a346
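The static-buffer pitfall described above can be reproduced with a minimal sketch. The function names and the name format are hypothetical. The actual fix logs `cursor.name`; the sketch keeps a private copy instead, which addresses the same problem in a different way.

```c
#include <stdio.h>
#include <string.h>

/*
 * Formats a name into a function-local static buffer. Every call
 * overwrites the result of the previous one, so a pointer returned
 * earlier silently changes its contents.
 */
const char *
sketch_static_name(const char *prefix, int number)
{
	static char buf[64];
	snprintf(buf, sizeof(buf), "%s.%d", prefix, number);
	return buf;
}

/* Keeps a private copy that later calls cannot overwrite. */
void
sketch_keep_name(char *dst, size_t size, const char *src)
{
	snprintf(dst, size, "%s", src);
}
```

A pointer obtained for "snap.1" reads "xlog.2" after a second call with different arguments, while a copy made in between still reads "snap.1".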
    • mechanik20051988's avatar
      test: fix flaky force_recovery test · 127044fa
      mechanik20051988 authored
      The test checked that a truncated snapshot contains less valid data
      than a snapshot with garbage written to it. That is often true, but
      it depends on where the truncation or the garbage is located. This
      check is removed, because the snapshot size differs slightly between
      systems and between test runs, so the check does not pass every time.
      Follow-up #5422
      127044fa
    • Nikita Pettik's avatar
      txm: skip ephemeral spaces while positioning iterator · 3a1405bb
      Nikita Pettik authored
      In tree_iterator_start() it was assumed that iterator always contains
      valid space id. However, ephemeral spaces are known to have zero space
      id. So in case we are starting iterator which belongs to ephemeral
      space, we can't simply find that space in space cache. Moreover, we
      don't need to track ephemeral spaces in MVCC at all since they can be
      accessed only via pointers and their lifespan is restricted by SQL query
      execution. So let's skip any MVCC-related routine while starting an
      iterator.
      
      Closes #6095
      3a1405bb
    • Aleksandr Lyapunov's avatar
      8a645be5
    • Vladimir Davydov's avatar
      Add changelog entry for gh-5436 · 15cb0bd5
      Vladimir Davydov authored
      Follow-up 29e2931c ("vinyl: fix race between compaction and gc of
      dropped LSM").
      15cb0bd5
    • Alexander V. Tikhonov's avatar
      github-ci: add GitHub Actions workflow for Odroid · 72c77166
      Alexander V. Tikhonov authored
      Odroid is GNU/Linux ARM64 platform. In scope of this commit new GitHub
      Actions workflows for testing Tarantool on Odroid hosts are added:
      
        Release: .github/workflows/odroid_arm64.yml
        Debug: .github/workflows/odroid_debug_arm64.yml
      
      Introduced new targets in .travis.mk Makefile:
      
        deps_odroid: Installs required dependencies.
      
        build_odroid: Builds Tarantool with the following flags set
          in env of .github/workflows/odroid_debug_arm64.yml file:
            1. to avoid the issue #6142:
               -DENABLE_BACKTRACE=OFF
            2. to avoid the issue #6143:
               -DCMAKE_C_FLAGS="-Wno-type-limits "
               -DCMAKE_BUILD_TYPE=Debug
      
        test_odroid: Builds and tests `LuaJIT-test` suite on Odroid.
      
      Also, the v1 version of the GitHub checkout action is used, because
      action version v2 needs git version 2.18.0 or newer [1]. The latest
      available version on Odroid is the following:
      
        git is already the newest version (1:2.17.1-1ubuntu0.8).
      
      [1]: https://github.com/actions/checkout#readme
      
      Closes tarantool/tarantool-qa#121
      72c77166
  11. Jul 01, 2021
    • Vladimir Davydov's avatar
      vinyl: fix race between compaction and gc of dropped LSM · 29e2931c
      Vladimir Davydov authored
      An LSM tree (space index, that is) can be dropped while compaction is in
      progress for it. In this case compaction will still commit the new run
      to vylog upon completion. This usually works fine, but not if gc has
      already purged all the information about the dropped LSM tree from vylog
      by that time, in which case an attempt to commit the new run will result
      in permanently broken vylog (because compaction will write vylog records
      for a non-existing object):
      
      ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 13 deleted but not registered
      
      To prevent this from happening, let's make compaction silently drop the
      new run without committing it to vylog if the LSM tree has been dropped.
      This should work just fine - since the LSM tree isn't used anymore we
      don't need to have it compacted, neither do we need to delete the run,
      since gc will eventually clean up all artefacts left from the dropped
      LSM tree.
      
      One thing to be noted is that we also must exclude dropped LSM trees
      from further compaction - if we don't do that, we might end up picking
      the dropped LSM tree for compaction over and over again (because it
      isn't actually compacted).
      
      This patch also drops the gh-5141-invalid-vylog-file test, because the
      latter just ensured that the issue fixed by this patch is there.
      
      Closes #5436
      29e2931c
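Excluding dropped LSM trees from further compaction can be sketched as a filter in the selection step. The struct, its fields, and the priority model are hypothetical; the sketch only shows that a dropped tree is never picked, no matter how much it appears to need compaction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced view of an LSM tree. */
struct sketch_lsm {
	bool is_dropped;
	int compaction_priority; /* > 0 means compaction is wanted */
};

/*
 * Returns the position of the tree with the highest positive
 * priority among the trees that are not dropped, or -1 if there is
 * no such tree.
 */
int
sketch_pick_for_compaction(const struct sketch_lsm *lsms, size_t count)
{
	int best = -1;
	for (size_t i = 0; i < count; i++) {
		if (lsms[i].is_dropped)
			continue;
		if (lsms[i].compaction_priority <= 0)
			continue;
		if (best < 0 || lsms[i].compaction_priority >
				lsms[best].compaction_priority)
			best = (int)i;
	}
	return best;
}
```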
    • Egor Elchinov's avatar
      fiber: hide only backtraces of idle fibers · 77838aa8
      Egor Elchinov authored
      Now idle fibers are present in fiber.info()
      but without their stacks.
      
      Added test ensuring that fiber.info doesn't
      get cluttered by idle fibers stacks after
      dispatching multiple requests in short time.
      
      Closes #4235
      77838aa8
    • Egor Elchinov's avatar
      fiber: add FIBER_IS_IDLE flag · 0cf240e2
      Egor Elchinov authored
      In some cases it's good to have an opportunity to detect
      if fiber is idle in a fiber_pool.
      Now this can be done as fiber->flags & FIBER_IS_IDLE.
      
      Needed for: #4235
      0cf240e2
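The `fiber->flags & FIBER_IS_IDLE` check mentioned above follows the usual bit-flag pattern. In this sketch the flag values and the struct are hypothetical stand-ins; only the testing pattern is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical flag values; the real ones live in the fiber header. */
enum {
	SKETCH_FIBER_IS_CANCELLABLE = 1 << 0,
	SKETCH_FIBER_IS_IDLE = 1 << 1,
};

struct sketch_fiber {
	unsigned flags;
};

/* True if the idle bit is set, regardless of the other flags. */
bool
sketch_fiber_is_idle(const struct sketch_fiber *fiber)
{
	return (fiber->flags & SKETCH_FIBER_IS_IDLE) != 0;
}
```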
  12. Jun 24, 2021
  13. Jun 23, 2021
    • Cyrill Gorcunov's avatar
      relay: provide information about downstream lag · 29025bce
      Cyrill Gorcunov authored
      
      We already have the `box.replication.upstream.lag` entry for
      monitoring. At the same time, in synchronous replication timeouts are
      key properties of the quorum gathering procedure. Thus we would like to
      know how long it takes a transaction to traverse the `initiator WAL ->
      network -> remote applier -> initiator ACK reception` path.
      
      Typical output is
      
       | tarantool> box.info.replication[2].downstream
       | ---
       | - status: follow
       |   idle: 0.61753897101153
       |   vclock: {1: 147}
       |   lag: 0
       | ...
       | tarantool> box.space.sync:insert{69}
       | ---
       | - [69]
       | ...
       |
       | tarantool> box.info.replication[2].downstream
       | ---
       | - status: follow
       |   idle: 0.75324084801832
       |   vclock: {1: 151}
       |   lag: 0.0011014938354492
       | ...
      
      Closes #5447
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      
      @TarantoolBot document
      Title: Add `box.info.replication[n].downstream.lag` entry
      
      `replication[n].downstream.lag` represents the lag between the moment
      the main node writes a certain transaction to its own WAL and the
      moment it receives an ack for this transaction from a replica.
      29025bce
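The lag defined above is a difference of two timestamps taken on the same node. A minimal sketch, with a hypothetical function name and with clamping of negative values added as an assumption that the commit message does not mention:

```c
/*
 * Both arguments are timestamps in seconds taken on the initiator:
 * when the transaction was written to its WAL and when the ack for
 * it arrived from the replica.
 */
double
sketch_downstream_lag(double wal_write_time, double ack_receive_time)
{
	double lag = ack_receive_time - wal_write_time;
	/* Assumption of this sketch: never report a negative lag. */
	return lag > 0 ? lag : 0;
}
```

Because both timestamps come from the initiator's clock, the value does not depend on clock synchronization between nodes.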
    • Cyrill Gorcunov's avatar
      applier: send transaction's first row WAL time in the applier_writer_f · 45edc9bb
      Cyrill Gorcunov authored
      
      Applier fiber sends current vclock of the node to remote relay reader,
      pointing current state of fetched WAL data so the relay will know which
      new data should be sent. The packet applier sends carries xrow_header::tm
      field as a zero but we can reuse it to provide information about first
      timestamp in a transaction we wrote to our WAL. Since old instances of
      Tarantool simply ignore this field such extension won't cause any
      problems.
      
      The timestamp will be needed to account lag of downstream replicas
      suitable for information purpose and cluster health monitoring.
      
      We update applier statistics in WAL callbacks but since both
      apply_synchro_row and apply_plain_tx are used not only in real data
      application but in final join stage as well (in this stage we're not
      writing the data yet) the apply_synchro_row is extended with replica_id
      argument which is non zero when applier is subscribed.
      
      The calculation of the downstream lag itself will be addressed
      in the next patch because sending the timestamp and observing it
      are independent actions.
      
      Part-of #5447
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      45edc9bb
    • Alexander V. Tikhonov's avatar
      test: remove fixed tests from fragile lists · 4053a355
      Alexander V. Tikhonov authored
      Checked and found that:
      
        #4353 -> tarantool/tarantool-qa#13:
          engine/ddl.test.lua fixed in #6102.
        #4926, tarantool/tarantool#115:
          box/alter_limits.test.lua fixed in
          tarantool/tarantool-qa#126.
        #5547 -> tarantool/tarantool-qa#50:
          box/net.box_schema_change_gh-2666.test.lua fixed in
          tarantool/tarantool-qa#126.
        #5583 -> tarantool/tarantool-qa#22:
          box/net.box_methods_gh-3107.test.lua fixed in
          tarantool/tarantool-qa#126.
      
      Closes tarantool/tarantool-qa#13
      Closes tarantool/tarantool-qa#115
      Closes #4926
      Closes tarantool/tarantool-qa#50
      Closes tarantool/tarantool-qa#22
      4053a355