  1. Jun 15, 2023
    • build: replace `string` with JOIN option to `string_join` helper in CMake · 3250910f
      Georgiy Lebedev authored
      `string` with JOIN option is only available since CMake 3.12, but we have
      developers using CMake 3.1: implement a utility `string_join` function to
      remove this dependency.
      
      Closes #5881
      
      NO_CHANGELOG=<build fix>
      NO_DOC=<build fix>
      NO_TEST=<build fix>
      
      (cherry picked from commit 55298308)
    • xrow: fix large bit shift error in xrow_decode_dml · 992ea525
      Vladimir Davydov authored
      Reported by ASAN. The issue was fixed in the master branch in commit
      b9550f19 ("box: support space and index names in IPROTO requests").
      
      NO_TEST=asan
      NO_DOC=bug fix
      NO_CHANGELOG=minor
      
      (cherry picked from commit d64a639b)
    • xrow: ignore unknown IPROTO keys on decode · 5bbc2b6e
      Vladimir Davydov authored
      The xrow_decode_* functions are written in such a way that they ignore
      unknown IPROTO keys. This is required for connectivity between different
      Tarantool versions. However, there's a bug in the code connected with the
      value type checking: we fail if the key is >= iproto_key_MAX. This
      worked fine as long as we added new IPROTO keys in the middle of the key
      space, without bumping iproto_key_MAX, but this assumption broke when we
      added IPROTO_AUTH_TYPE. The issue is exacerbated by the fact that
      IPROTO_AUTH_TYPE is used by IPROTO_ID, which is sent unconditionally on
      connect. Let's fix the value type check and add some tests.
      
      Notes:
       - xrow_decode_heartbeat turns out to be unused. Drop it.
       - Fix the net.box helpers response_body_decode and netbox_decode_table
         to handle unknown keys and empty body. This is needed to properly
         decode a response to an injection in tests.
       - Testing unknown keys in replication requests would be complicated.
         Instead we add a bunch of unit tests.
       - Convert the xrow unit test to TAP.
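
The intended decoding behaviour can be illustrated with a minimal Python sketch. It is not the Tarantool decoder: the key numbers, the `KNOWN_KEY_TYPES` table and the `decode_body` function are hypothetical names chosen for this example. The point it shows is that the value type is checked only for keys the decoder knows about, while any other key, whatever its number, is skipped.

```python
# Hypothetical key table: key number -> expected Python type of the value.
KNOWN_KEY_TYPES = {
    0x10: int,   # stands in for an integer-valued key
    0x21: list,  # stands in for an array-valued key
}


def decode_body(body):
    """Decode a map of integer keys, ignoring keys that are not known."""
    decoded = {}
    for key, value in body.items():
        expected = KNOWN_KEY_TYPES.get(key)
        if expected is None:
            # Unknown key, possibly sent by a newer peer: skip it instead
            # of failing, no matter how large its number is.
            continue
        if not isinstance(value, expected):
            raise ValueError(f"key {key:#x}: expected {expected.__name__}")
        decoded[key] = value
    return decoded
```

With this table, `decode_body({0x10: 512, 0x5B: "new"})` keeps only the `0x10` entry, while a known key carrying a value of the wrong type still raises an error.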
      
      Closes #8745
      
      NO_DOC=bug fix
      
      (cherry picked from commit ee0660b8)
    • box: refactor net.box response body decoding · 81ce2185
      Georgiy Lebedev authored
      Response body decoding of DML and call/eval requests is very ad hoc and
      hard to extend: introduce a new `response_body_decode` helper that decodes
      the response body similarly to `xrow_decode_dml`. This will allow us to
      separate decoding from processing.
      
      Needed for #8147
      
      NO_CHANGELOG=refactoring
      NO_DOC=refactoring
      NO_TEST=refactoring
      
      (cherry picked from commit 1d6043fa)
  2. Jun 13, 2023
  3. Jun 08, 2023
  4. Jun 07, 2023
  5. Jun 06, 2023
    • test: write bad snap with errinj in gh_7974_force_recovery_bugs_test · ab299c6a
      Vladimir Davydov authored
      Corrupted snap files that are used in the test were generated manually
      using a now old Tarantool version that has an outdated system schema.
      In the scope of #7149 DDL was forbidden until the system schema is
      upgraded. The problem is that luatest tries to grant super privileges to
      the guest user (which is a DDL operation) after starting a test instance
      unless they are already granted. Since the snap files don't store the
      required privileges, luatest fails.
      
      To fix this issue, let's generate corrupted snap files right in the test
      using error injection.
      
      Closes #8702
      
      NO_DOC=test
      NO_CHANGELOG=test
      
      (cherry picked from commit 67598073)
  6. Jun 02, 2023
    • test: ban direct calling of box.cfg() · 67dc40e2
      Oleg Chaplashkin authored
      Calling box.cfg() directly to configure the runner instance is now
      prohibited. If you need to test something with a specific configuration,
      use a server instance instead (see the luatest.Server module).
      
      In-scope-of tarantool/luatest#245
      
      NO_DOC=ban calling box.cfg
      NO_TEST=ban calling box.cfg
      NO_CHANGELOG=ban calling box.cfg
      
      (cherry picked from commit fc3426d8)
  7. May 29, 2023
    • raft: fix spurious split-vote · 3e0229fb
      Serge Petrenko authored
      Due to a typo, a raft candidate counted a vote for another node as a vote
      for itself in its split-vote detector. This could lead to spurious
      split-vote detection when another node wins the election with the bare
      minimum of votes (exactly a quorum).
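
A split-vote check of the kind mentioned above can be sketched in Python as follows. This is an illustration under assumptions, not the Tarantool raft code: `is_split_vote`, its parameters and the vote-count dictionary are hypothetical. The check reports a split vote only when no candidate can reach a quorum even if it receives every vote that has not been cast yet, which is why each vote has to be credited to the node it was actually given to.

```python
def is_split_vote(votes, cluster_size, quorum):
    """Return True if no candidate can still collect `quorum` votes.

    `votes` maps a candidate name to the number of votes cast for it.
    """
    cast = sum(votes.values())
    remaining = cluster_size - cast
    best = max(votes.values(), default=0)
    return best + remaining < quorum


# Four nodes, quorum of three: two votes each can never become a quorum.
print(is_split_vote({"A": 2, "B": 2}, cluster_size=4, quorum=3))  # True
# Three nodes, quorum of two: B has won with exactly a quorum of votes.
print(is_split_vote({"A": 1, "B": 2}, cluster_size=3, quorum=2))  # False
```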
      
      Closes #8698
      
      NO_DOC=bugfix
      
      (cherry picked from commit 2afde5b1)
    • raft: make promote bump term and vote at once · 27c550cf
      Serge Petrenko authored
      box.ctl.promote() was implemented as follows: an instance bumps the
      term and marks itself a candidate, but doesn't vote for self
      immediately. Instead it relies on the machinery which makes a candidate
      vote for self as soon as it persists a new term.
      
      This differs from a normal election start due to leader timeout: there
      term and vote are bumped at once.
      
      Besides, this increases the probability of box.ctl.promote() resulting in
      another node getting elected: if a node first broadcasts a term without a
      vote, it is not considered a candidate, so other candidates might start
      elections and vote for themselves.
      
      Let's bring promote into line with automatic elections.
      
      Closes #8497
      
      NO_DOC=bugfix
      
      (cherry picked from commit 17371215)
    • raft: persist vote for self together with term bump · 657e3f92
      Serge Petrenko authored
      Commit c9155ac8 ("raft: persist new term and vote separately") made
      the nodes persist new term and vote separately, using 2 WAL writes.
      Writing the term first is needed to flush all the ongoing transactions,
      so that the node's vclock is updated and can be checked against the
      candidate's vclock. Otherwise it could happen that the node persists a
      vote for some candidate only to find that its vclock would actually
      become incomparable with the candidate's.

      Actually, this guard is not needed when checking a vote for self,
      because a node can always vote for self. Besides, splitting the term bump
      and the vote can increase the probability of a split vote. It may happen
      that a candidate bumps and broadcasts the new term without a vote,
      making other nodes vote for themselves. Let's go back to writing term and
      vote together for self votes.

      This change makes a raft candidate persist the term bump and the vote for
      self in one WAL write instead of two, so all the tests which count WAL
      writes or expect 2 separate state updates for term and vote are rewritten.
      
      Prerequisite #8497
      
      NO_DOC=not user-visible
      NO_CHANGELOG=not user-visible
      
      (cherry picked from commit 8a124e50)
  8. May 24, 2023
  9. May 23, 2023
    • luajit: bump new version · 43297db7
      Igor Munkin authored
      * LJ_GC64: Make ASMREF_L references 64 bit.
      * lldb: introduce luajit-lldb
      * x64/LJ_GC64: Fix emit_rma().
      * Limit path length passed to C library loader.
      
      Part of #4808
      Part of #8069
      Part of #8516
      
      NO_DOC=LuaJIT submodule bump
      NO_TEST=LuaJIT submodule bump
    • replication: replicaset state machine assert fail · 9128f50e
      Nikita Zheleztsov authored
      Currently the replicaset state machine, which tracks the number of
      connected, loading and synced appliers, may decrement these counters
      more times than necessary. In a debug build this may lead to an
      assertion failure. Here is how it may happen:
        1. Any kind of exception occurs in the applier thread and leads to
           invoking its destructor (applier_thread_data_destroy), which
           is set with a scoped guard.
        2. A cbus call is made in order to remove the corresponding applier
           from the thread. Since cbus_call is synchronous, we yield,
           waiting for the result from the applier thread.
        3. During the yield the user triggers a reconfiguration, which
           invokes replicaset_update. Old appliers are pruned: for every
           replica, the trigger that updates the state machine counters is
           deleted, after which we stop the fiber and wait for it to be
           joined.
        4. If the first replica in replicaset_foreach is not the errored
           one and the errored fiber wakes up while we are yielding in
           fiber_join, then a counter that is already zero is decremented.

      Let's clear the above-mentioned triggers for all replicas first,
      and only after that stop and join their applier fibers.
      
      Closes #7590
      
      NO_DOC=bugfix
      
      (cherry picked from commit 7ec82674)
    • replication: make ER_READONLY non-retriable for applier · 25c77df6
      Serge Petrenko authored
      The commit c1c77782 ("replication: fix bootstrap failing with
      ER_READONLY") made the applier retry the connection infinitely upon
      receiving an ER_READONLY error on join. At the time of writing that
      commit, this was the only way to make join retriable, because there were
      no retries in the scope of bootstrap_from_master: the join either
      succeeded or failed.
      
      Later on, bootstrap_from_master was made retriable in commit
      f2ad1dee ("replication: retry join automatically"). Now when
      bootstrap_from_master fails, the replica reconnects to all the remote
      nodes, thus updating their ballots, chooses a new bootstrap leader
      (possibly different from the one chosen on the previous attempt), and
      retries booting from it.

      The second approach is preferable, and here's why. Imagine
      bootstrapping a cluster of 3 nodes, A, B and C in a full-mesh topology.
      B and C connect to all the remote peers almost instantly, and both
      independently decide that B will be the bootstrap leader (meaning it
      has the smallest uuid among A, B, C).
      
      At the same time, A can't connect to C. B bootstraps the cluster, and
      joins C. After C is joined, A finally connects to C. Now A can choose a
      bootstrap leader. It has an old B's ballot (smallest uuid, but not yet
      booted) and C's ballot (already booted). This is because C's ballot is
      received after cluster bootstrap, and B's ballot was received earlier
      than that. So A believes C is a better bootstrap leader, and tries to
      boot from it.
      
      A will fail to join C, because at the same time C tries to sync with
      everyone, including A, and thus stays read-only. Since A retries joining
      the same instance over and over again, A and C get stuck forever.
      
      Let's retry ER_READONLY on another level: instead of trying to join to
      the same bootstrap leader over and over, try to choose a new bootstrap
      leader and boot from it.
      
      In the situation described above, this means that A would try to join
      C once, fail due to ER_READONLY, re-fetch new ballots from everyone and
      choose B as the join master (now it has the smallest uuid and is booted).
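
The difference between the two retry levels can be sketched in Python. This is a simplified model, not the Tarantool implementation: `Ballot`, `choose_bootstrap_leader` and `join_with_reselect` are hypothetical names, and the selection rule here looks only at the two properties mentioned in this message (whether the instance is booted, and its uuid).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ballot:
    uuid: str        # instance uuid, compared as a plain string here
    is_booted: bool  # whether the instance has already bootstrapped


def choose_bootstrap_leader(ballots):
    """Prefer booted instances, then the smallest uuid."""
    return min(ballots, key=lambda b: (not b.is_booted, b.uuid))


def join_with_reselect(fetch_ballots, try_join, max_attempts=3):
    """After a failed join, re-fetch the ballots and choose a leader again."""
    for _ in range(max_attempts):
        leader = choose_bootstrap_leader(fetch_ballots())
        if try_join(leader):
            return leader
    raise RuntimeError("could not join any bootstrap leader")
```

With a stale ballot for B (not booted) and a fresh one for C (booted), the first choice is C. Once the ballots are re-fetched and B is reported as booted, the same rule selects B, so the loop does not keep retrying against C.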
      
      The issue was discovered due to linearizable_test.lua hanging
      occasionally with the following output:
      NO_WRAP
       No output during 40 seconds. Will abort after 320 seconds without output. List of workers not reporting the status:
      - 059_replication-luatest [replication-luatest/linearizable_test.lua, None] at /tmp/t/059_replication-luatest/linearizable.result:0
      [059] replication-luatest/linearizable_test.lua                       [ fail ]
      [059] Test failed! Output from reject file /tmp/t/rejects/replication-luatest/linearizable.reject:
      [059] TAP version 13
      [059] 1..6
      [059] # Started on Thu Sep 29 10:30:45 2022
      [059] # Starting group: linearizable-read
      [059] not ok 1	linearizable-read.test_wait_others
      [059] #   ....11.0~entrypoint.531.dev/test/luatest_helpers/server.lua:104: Waiting for "readiness" on server server_1-q7berSRY4Q_E (PID 53608) timed out
      [059] #   stack traceback:
      [059] #   	....11.0~entrypoint.531.dev/test/luatest_helpers/server.lua:104: in function 'wait_for_readiness'
      [059] #   	...11.0~entrypoint.531.dev/test/luatest_helpers/cluster.lua:92: in function 'start'
      [059] #   	...t.531.dev/test/replication-luatest/linearizable_test.lua:50: in function <...t.531.dev/test/replication-luatest/linearizable_test.lua:20>
      [059] #   	...
      [059] #   	[C]: in function 'xpcall'
      NO_WRAP
      
      Part-of #7737
      
      NO_DOC=bugfix
      
      (cherry picked from commit 09c18907)
    • replication: retry join automatically · 8a04b374
      Yan Shtunder authored
      If the error is non-critical, the instance retries join
      automatically.
      
      @TarantoolBot document
      Title: Retry join automatically (for a timeout)
      
      There are two types of errors: critical and non-critical.
      An instance can recover from a non-critical error. For example,
      the master it connects to turns out to be read-only, which looks
      like a configuration error. If the error is non-critical, the
      instance retries join automatically. After a critical error
      there is no way to recover, so retrying would not help. For
      example, vinyl may have created some files, and it's not clear
      what to do with them in order to retry bootstrap.
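
The distinction between the two error types can be sketched as a retry loop in Python. This is only an illustration: the exception classes, `join_with_retry` and its parameters are hypothetical and do not correspond to Tarantool's internal API.

```python
import time


class CriticalJoinError(Exception):
    """Hypothetical error class: retrying cannot help."""


class NonCriticalJoinError(Exception):
    """Hypothetical error class: for example, the master is read-only."""


def join_with_retry(join, timeout, delay=0.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call `join` until it succeeds or the timeout expires.

    Non-critical errors are retried; any other exception, including
    CriticalJoinError, propagates to the caller immediately.
    """
    deadline = clock() + timeout
    while True:
        try:
            return join()
        except NonCriticalJoinError:
            if clock() >= deadline:
                raise
            sleep(delay)
```

Passing `clock` and `sleep` as parameters is a design choice made for this sketch so that the loop can be exercised without real waiting.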
      
      Closes #6126
      
      (cherry picked from commit f2ad1dee)
    • replication: reshuffle names of the state · 16c2211f
      Yan Shtunder authored
      A new state is about to be introduced, and the most suitable name
      for it is `APPLIER_FETCH_SNAPSHOT`, which is already taken. So the
      state names are reshuffled a bit:
      
          `APPLIER_FETCH_SNAPSHOT -> APPLIER_WAIT_SNAPSHOT;`
          `APPLIER_INITIAL_JOIN -> APPLIER_WAIT_SNAPSHOT;`
      
      Part of #6126
      
      NO_DOC=preparatory commit
      NO_CHANGELOG=preparatory commit
      NO_TEST=preparatory commit
      
      (cherry picked from commit 218a62c4)
    • sql: fix assertion and check in PRINTF() · 95e46e1a
      Mergen Imeev authored
      In 2.11 and later, there is no STRACCUM_NOMEM error in printf(). However,
      this is not the case in 2.10, which results in an assertion failure or a
      segmentation fault. This patch fixes that.
      
      Follow-up #tarantool/security#122
      
      NO_DOC=backport fix
      NO_TEST=backport fix
      NO_CHANGELOG=backport fix
    • sql: check printf() for failure · 9aa5772d
      Mergen Imeev authored
      This patch adds a check that sqlXPrintf() does not fail in the built-in
      SQL function printf(). There are two possible problems: the result might
      get too large, or there might be an integer overflow because internally
      int values are converted to size_t.
      
      Closes #tarantool/security#122
      
      NO_DOC=bugfix
      
      (cherry picked from commit 13159230)
    • sql: assert in xferOptimization() · 4ea1fb25
      Mergen Imeev authored
      This patch fixes problems with INSERT INTO ... SELECT FROM optimization.
      These problems appeared after 6b8acd8f, where the check became redundant,
      but was not updated. Two problems arose:
      1) an assertion or segmentation fault when optimization was used and the
      source space does not have an index;
      2) optimization can be used even if the indexes are incompatible.
      
      The second problem does not result in changes that are user-visible, so
      there is no test.
      
      Closes #8661
      
      NO_DOC=bugfix
      
      (cherry picked from commit 039f714d)
  10. May 16, 2023
    • replication: fix updating is_candidate in raft · 8622994d
      Nikita Zheleztsov authored
      Currently, when an applier dies, `is_candidate` is updated after trying
      to start an election. So raft assumes it has a healthy quorum and
      bumps the term even when there are not enough healthy nodes to do that.

      The trigger that updates the above-mentioned flag is run in
      `replicaset_on_health_change`. So let's move it before the call to
      `raft_notify_is_leader_seen`, which tries to start an election.
      
      Closes #8433
      
      NO_DOC=bugfix
      
      (cherry picked from commit f077ebf6)
    • datetime: fix invalid representation of timestamps with fraction part · ca8813a0
      Oleg Babin authored
      Sometimes we need negative timestamps to work with dates before
      1970. But it seems such cases were not even covered by tests, so
      there wasn't any handling of negative timestamps with a fraction part.
      Such datetime objects had an incorrect string representation (e.g.
      "1963-11-22T12:30:02.-999"). This patch fixes it.
      
      Closes #8570
      
      NO_DOC=bugfix
      
      (cherry picked from commit 8e7514b9)
    • datetime: fix negative nsec handling · d92331a6
      Oleg Babin authored
      It seems that the problematic code was ported from Lua as is. But the
      modulo operator behaves differently in C and in Lua: with a positive
      divisor, Lua always returns a non-negative value, while in C the result
      can be negative. Because of this difference, the nsec part of a datetime
      object could become negative after subtraction, which yielded a weird
      result on an attempt to get the string representation
      (e.g. "2008-02-03T03:36:43.-100Z"). This patch fixes it.
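
The two modulo conventions can be compared in a short Python sketch. Python's `%` happens to follow the same floored convention as Lua's, while `math.fmod` follows the truncating convention of C's `%`. The helper names and the `normalize` function are hypothetical and only illustrate one way to keep the nanosecond part non-negative; they are not the code from this patch.

```python
import math

NANOS_PER_SEC = 1_000_000_000


def c_style_mod(a, b):
    """Remainder with the sign of the dividend, as C's `%` gives for ints.

    math.fmod works on floats, which is exact for the small values used here.
    """
    return int(math.fmod(a, b))


def floored_mod(a, b):
    """Remainder with the sign of the divisor, as Lua's `%` gives."""
    return a % b


def normalize(secs, nsec):
    """Fold any nsec value into [0, NANOS_PER_SEC) and carry into secs."""
    carry, nsec = divmod(nsec, NANOS_PER_SEC)
    return secs + carry, nsec


print(c_style_mod(-100, NANOS_PER_SEC))  # -100
print(floored_mod(-100, NANOS_PER_SEC))  # 999999900
print(normalize(10, -100))               # (9, 999999900)
```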
      
      Part of #8570
      
      NO_DOC=bugfix
      NO_CHANGELOG=see next commit
      
      (cherry picked from commit a9c7639a)
  11. May 15, 2023
    • datetime: fix error when timestamp is set with nsec/usec/msec · 8798b899
      Oleg Babin authored
      This patch fixes the case when a timestamp is passed to the datetime.set
      function together with nsec, usec or msec. This works fine for
      datetime.new, but some logic was missing in the set function. Here we
      fix that and add a test.
      
      Closes #8583
      
      NO_DOC=bugfix
      
      (cherry picked from commit e0855097)
    • box: exclude uncommitted alter records from snapshot · 709e735a
      Vladimir Davydov authored
      With MVCC off (box.cfg.memtx_use_mvcc_engine = false), a memtx space
      read view may include a dirty (not committed to WAL) record. To prevent
      such records from being written to a snapshot, we sync WAL after
      creating a read view for a snapshot. The problem is that it doesn't work
      for long (yielding) DDL operations, such as building a new index,
      because such operations yield before waiting on WAL. As a result,
      a dirty DDL record may make it to a snapshot even though it may fail
      eventually. To fix that, let's keep track of all yielding DDL statements
      and exclude them from a read view using the memtx snapshot cleaner.
      
      Closes #8530
      
      NO_DOC=bug fix
      
      (cherry picked from commit a532e375)
    • iproto: send IPROTO_WATCH sync number in IPROTO_EVENT packet · 1651ab08
      Vladimir Davydov authored
      We don't use this functionality in net.box (sync number is always 0 for
      all watch/event packets), but other clients may actually use it.
      
      Closes #8393
      
      @TarantoolBot document
      Title: Document that `IPROTO_EVENT` has sync number
      
      Initially the sync number sent by a client in an `IPROTO_WATCH` request
      was ignored and `IPROTO_EVENT` packet didn't have a sync number. There
      were complaints about it from users so we consider this to be a bug.
      Now the server sends the same sync number in an `IPROTO_EVENT` packet
      as the one sent by the client in the last corresponding `IPROTO_WATCH`
      request.
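
The described behaviour can be modelled with a toy Python class. It is a sketch under assumptions, not the server code: `WatchRegistry`, its methods and the dictionary used as a packet are hypothetical. It only shows that the event carries the sync number of the most recent watch request for the same key.

```python
class WatchRegistry:
    """Toy model of the watchers registered on one connection."""

    def __init__(self):
        self._sync_by_key = {}

    def watch(self, key, sync):
        # Remember the sync number of the latest watch request for this key.
        self._sync_by_key[key] = sync

    def make_event(self, key, value):
        # Build an event packet that echoes the remembered sync number.
        return {
            "type": "IPROTO_EVENT",
            "sync": self._sync_by_key[key],
            "key": key,
            "value": value,
        }
```

If a client watches the same key twice with sync numbers 7 and 9, an event built afterwards for that key carries sync 9.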
      
      (cherry picked from commit 99389ac6)
  12. May 12, 2023
    • flaky: fix the qsync_advanced test · 6d6c03a7
      Sergey Ostanevich authored
      Add the necessary wait for replication to reach the replica and enforce
      the correct txn isolation to avoid a memtx/vinyl discrepancy.
      Remove the test from the fragile list.
      
      Closes tarantool/tarantool-qa#292
      
      NO_DOC=test fix
      NO_CHANGELOG=test fix
      
      (cherry picked from commit 3a220dad)
  13. May 05, 2023
    • test: fix format of dictionaries · ef1b0d00
      Sergey Bronnikov authored
      According to the libFuzzer documentation [1], a backslash must be escaped.
      
      1. https://llvm.org/docs/LibFuzzer.html#dictionaries
      
      ```
      $ swim_proto_meta_fuzzer -dict=swim_proto_meta_fuzzer.dict
      ParseDictionaryFile: error in line 1
                      "\001\000\000\004"
      $ swim_proto_member_fuzzer -dict=swim_proto_member_fuzzer.dict
      ParseDictionaryFile: error in line 1
                      "\022\000\000\000\000\000\000\000"
      ```
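
For illustration, a dictionary value in the syntax described in [1] can be produced with a small Python helper. The `dict_entry` function is hypothetical and is not part of this commit or of libFuzzer. It assumes the documented rules: the value is a quoted string, a backslash and a double quote are escaped with a backslash, and arbitrary bytes are written as `\xNN` hex escapes.

```python
def dict_entry(data: bytes) -> str:
    """Format bytes as a quoted libFuzzer dictionary value."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in ('\\', '"'):
            out.append('\\' + ch)         # escape backslash and quote
        elif 0x20 <= byte < 0x7F:
            out.append(ch)                # printable ASCII stays as is
        else:
            out.append(f"\\x{byte:02X}")  # everything else as \xNN
    return '"' + "".join(out) + '"'


print(dict_entry(b"\x01\x00\x00\x04"))  # "\x01\x00\x00\x04"
```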
      
      NO_CHANGELOG=internal
      NO_DOC=internal
      NO_TEST=internal
      
      (cherry picked from commit 62d03f15)
  14. Apr 27, 2023
    • static-build: enable compiler optimizations for dependencies · 5e7d999b
      Vladimir Davydov authored
      An autoconf-generated configure script doesn't enable compiler
      optimization flags if CFLAGS / CXXFLAGS options are set explicitly.
      We started setting CFLAGS / CXXFLAGS in commit e6abe1c9
      ("cmake: add extra security compiler options"). As a result, users
      started experiencing performance degradation issues, like the one
      described in tarantool/tarantool-ee#440.
      
      Let's set -O2 in CFLAGS / CXXFLAGS explicitly to fix that.
      
      Closes #8606
      Needed for tarantool/tarantool-ee#440
      
      NO_DOC=build
      NO_TEST=build
      
      (cherry picked from commit 52f6ed4d)