  1. Jul 08, 2022
    • Vladimir Davydov's avatar
      test: fix cleanup in vinyl-luatest/gh_6565_hot_standby_unsupported test · 6213907c
      Vladimir Davydov authored
      The gh_6565 test doesn't stop the hot standby replica it started,
      because the replica should fail to initialize and exit eventually
      anyway. However, if the replica lingers until the next test due to
      https://github.com/tarantool/test-run/issues/345, the next test may
      successfully connect to it, which is likely to lead to a failure,
      because UNIX socket paths used by luatest servers are not randomized.
      
      For example, here gh_6568 test fails after gh_6565, because it uses the
      same alias for the test instance ('replica'):
      
      NO_WRAP
      [008] vinyl-luatest/gh_6565_hot_standby_unsupported_>                 [ pass ]
      [008] vinyl-luatest/gh_6568_replica_initial_join_rem>                 [ fail ]
      [008] Test failed! Output from reject file /tmp/t/rejects/vinyl-luatest/gh_6568_replica_initial_join_removal_of_compacted_run_files.reject:
      [008] TAP version 13
      [008] 1..1
      [008] # Started on Fri Jul  8 15:30:47 2022
      [008] # Starting group: gh-6568-replica-initial-join-removal-of-compacted-run-files
      [008] not ok 1  gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
      [008] #   builtin/fio.lua:242: fio.pathjoin(): undefined path part 1
      [008] #   stack traceback:
      [008] #         builtin/fio.lua:242: in function 'pathjoin'
      [008] #         ...ica_initial_join_removal_of_compacted_run_files_test.lua:43: in function 'gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup'
      [008] #         ...
      [008] #         [C]: in function 'xpcall'
      [008] replica | 2022-07-08 15:30:48.311 [832856] main/103/default.lua F> can't initialize storage: unlink, called on fd 30, aka unix/:(socket), peer of unix/:(socket): Address already in use
      [008] # Ran 1 tests in 0.722 seconds, 0 succeeded, 1 errored
      NO_WRAP
      
      Let's fix this by explicitly killing the hot standby replica. Since it
      could have exited voluntarily, we need to use pcall, because server.stop
      fails if the instance is already dead.
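
      A minimal sketch of the cleanup described above, assuming the usual
      luatest group hook and a `cg.replica` server object; the names are
      illustrative, not taken from the patch:

      ```
      g.after_all(function(cg)
          -- The replica may have already exited on its own, and server.stop
          -- fails on an already-dead instance, so wrap the call in pcall.
          pcall(cg.replica.stop, cg.replica)
      end)
      ```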
      
      This issue is similar to the one fixed by commit 85040161 ("test:
      stop server started by vinyl-luatest/update_optimize test").
      
      NO_DOC=test
      NO_CHANGELOG=test
      6213907c
    • Nikolay Shirokovskiy's avatar
      http_parser: fix parsing HTTP protocol version · 9ee7e568
      Nikolay Shirokovskiy authored
      Handle a status line like 'HTTP/2 200', where the protocol version has
      no dot.
      
      Closes #7319
      
      NO_DOC=bugfix
      9ee7e568
    • Nikolay Shirokovskiy's avatar
      box: fix unexpected error on granting privileges to admin · aaf6f8e9
      Nikolay Shirokovskiy authored
      We use the LuaJIT 'bit' module for bitwise operations. Due to platform
      interoperability it truncates its arguments to 32 bits and returns a
      signed result. Thus, when granting privileges with bit.bor to the admin
      user, which has 0xffffffff rights (from the bootstrap snapshot), we get
      -1 as a result. Later in execution this leads to the type check error
      mentioned in the issue.
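
      A minimal standalone illustration of the underlying LuaJIT behavior (not
      taken from the patch):

      ```
      local bit = require('bit')

      -- LuaJIT bit operations truncate their arguments to 32 bits and return
      -- a signed result, so the admin's "all rights" mask comes back negative:
      print(bit.tobit(0xffffffff))   -- -1
      print(bit.bor(0xffffffff, 2))  -- -1, not 4294967295
      ```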
      
      Closes #7226
      
      NO_DOC=minor bugfix
      aaf6f8e9
    • Vladimir Davydov's avatar
      memtx: move allocator stuff from memtx_engine to MemtxAllocator · c7e3eae9
      Vladimir Davydov authored
      Let's hide all the logic regarding delayed freeing of memtx tuples to
      MemtxAllocator and provide memtx_engine with methods for allocating and
      freeing tuples (not memtx_tuples, just generic tuples). All the tuple
      and snapshot version manipulation stuff is now done entirely in
      MemtxAllocator.
      
      This is a preparation for implementing a general-purpose tuple read view
      API in MemtxAllocator, see #7364.
      
      Note, since memtx_engine now deals with the size of a regular tuple,
      which is 4 bytes less than the size of memtx_tuple, this changes the
      size reported by OOM messages and the meaning of memtx_max_tuple_size,
      which now limits the size of a tuple, not memtx_tuple.
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      c7e3eae9
    • Mergen Imeev's avatar
      sql: fix error in ORDER BY ephemeral space format · 64cdb80c
      Mergen Imeev authored
      This patch fixes a bug where the ANY field type was replaced by the
      SCALAR field type in the ephemeral space used in ORDER BY.
      
      Closes #7345
      
      NO_DOC=bugfix
      64cdb80c
    • Mergen Imeev's avatar
      sql: arithmetic of unsigned values · 3715f632
      Mergen Imeev authored
      After this patch, the result type of arithmetic between two unsigned
      values will be INTEGER.
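
      A hedged illustration of the resulting behavior; the query and the
      metadata access are assumptions for demonstration, not taken from the
      patch:

      ```
      -- after the patch, UNSIGNED + UNSIGNED is reported as INTEGER
      -- in the result set metadata
      local res = box.execute([[SELECT CAST(1 AS UNSIGNED) + CAST(2 AS UNSIGNED);]])
      print(res.metadata[1].type)  -- 'integer'
      ```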
      
      Closes #7295
      
      NO_DOC=bugfix
      3715f632
  2. Jul 07, 2022
    • Ilya Verbin's avatar
      coio: do exec after fork in unit/coio.test · 8b5bcd92
      Ilya Verbin authored
      This makes the test closer to real-life usage and means the child process
      does not have to deal with memory allocated before the fork.
      
      Closes #7370
      
      NO_DOC=test fix
      NO_CHANGELOG=test fix
      8b5bcd92
    • Igor Munkin's avatar
      test: remove tests from fragile list · c245e201
      Igor Munkin authored
      Since "x64/LJ_GC64: Fix fallback case of asm_fuseloadk64()."
      (42853793ec3e6e36bc0f7dff9d483d64ba0d8d28) is backported into
      tarantool/luajit trunk, box/bitset.test.lua and box/function1.test.lua
      tests are no more fragile.
      
      Follows up tarantool/tarantool-qa#234
      Follows up tarantool/tarantool-qa#235
      
      NO_DOC=test changes
      NO_CHANGELOG=test changes
      NO_TEST=test changes
      c245e201
  3. Jul 06, 2022
    • Yaroslav Lobankov's avatar
      test: fix box-py/args.test.py · 11166382
      Yaroslav Lobankov authored
      This patch fixes `box-py/args.test.py` test and allows it to work
      against tarantool installed from a package.
      
      Closes tarantool/tarantool-qa#246
      
      NO_DOC=testing stuff
      NO_TEST=testing stuff
      NO_CHANGELOG=testing stuff
      11166382
    • Yaroslav Lobankov's avatar
      test: fix app-tap/tarantoolctl.test.lua · 91143500
      Yaroslav Lobankov authored
      This patch fixes `app-tap/tarantoolctl.test.lua` test and allows it to
      work against tarantool installed from a package.
      
      Part of tarantool/tarantool-qa#246
      
      NO_DOC=testing stuff
      NO_TEST=testing stuff
      NO_CHANGELOG=testing stuff
      91143500
    • Yaroslav Lobankov's avatar
      test: fix gh-1700-abort-recording-on-fiber-switch.test.lua · d6a6fc23
      Yaroslav Lobankov authored
      This patch fixes `gh-1700-abort-recording-on-fiber-switch.test.lua` test
      and allows it to work against tarantool installed from a package.
      
      Part of tarantool/tarantool-qa#246
      
      NO_DOC=testing stuff
      NO_TEST=testing stuff
      NO_CHANGELOG=testing stuff
      d6a6fc23
    • Yaroslav Lobankov's avatar
      test: add new `make` test targets · 9adedc1f
      Yaroslav Lobankov authored
      This patch adds new `make` test targets to run unit and functional
      tests independently of each other, which can be useful in some cases.
      
      New test targets:
      
      * `test-unit` - run unit tests and exit after the first failure
      * `test-unit-force` - run unit tests
      * `test-func` - run functional tests and exit after the first failure
      * `test-func-force` - run functional tests
      
      Note that tests for the 'small' lib are treated as unit tests as well.
      
      Part of tarantool/tarantool-qa#246
      
      NO_DOC=testing stuff
      NO_TEST=testing stuff
      NO_CHANGELOG=testing stuff
      9adedc1f
    • Nikolay Shirokovskiy's avatar
      test: use default readline configuration for a test · 1cd1a2df
      Nikolay Shirokovskiy authored
      If the readline 'show-mode-in-prompt' option is on, the test fails because
      it does not handle the prefix added to the prompt in this mode. Let's use
      the default (compiled-in) readline configuration instead of the one
      provided by the user or system config.
      
      NO_DOC=test changes
      NO_CHANGELOG=test changes
      NO_TEST=test changes
      1cd1a2df
    • Georgiy Lebedev's avatar
      memtx: fix story delete statement list · 654cf498
      Georgiy Lebedev authored
      Current implementation of tracking statements that delete a story has a
      flaw, consider the following example:
      
      tx1('box.space.s:replace{0, 0}') -- statement 1
      
      tx2('box.space.s:replace{0, 1}') -- statement 2
      tx2('box.space.s:delete{0}') -- statement 3
      tx2('box.space.s:replace{0, 2}') -- statement 4
      
      When statement 1 is prepared, both statements 2 and 4 will be linked to the
      delete statement list of {0, 0}'s story, though, apparently, statement 4
      does not delete {0, 0}.
      
      Let us notice the following: statement 4 is "pure" in the sense that, in
      the transaction's scope, it is guaranteed not to replace any tuple — we
      can retrieve this information when we check where the insert statement
      violates replacement rules, use it to determine "pure" insert statements,
      and skip them later on when, during preparation of insert statements, we
      handle other insert statements which assume they do not replace anything
      (i.e., have no visible old tuple).
      
      On the contrary, statements 1 and 2 are "dirty": they assume that they
      replaced nothing (i.e., there was no visible tuple in the index) — when one
      of them gets prepared — the other one needs to be either aborted or
      relinked to replace the prepared tuple.
      
      We also need to fix relinking of delete statements from the older story
      (in terms of the history chain) to the new one during preparation of insert
      statements: a statement needs to be relinked iff it comes from a different
      transaction (to be precise, there must, actually, be no more than one
      delete statement from the same transaction).
      
      Additionally, add assertions to verify the invariant that the story's
      add (delete) psn is equal to the psn of the add (delete) statement's
      transaction psn.
      
      Closes #7214
      Closes #7217
      
      NO_DOC=bugfix
      654cf498
  4. Jul 05, 2022
    • Vladimir Davydov's avatar
      test: stop server started by vinyl-luatest/update_optimize test · 85040161
      Vladimir Davydov authored
      Normally, if a server created by a test isn't stopped, it should be
      forcefully killed by luatest or test-run. For some reason this sometimes
      doesn't happen, which may cause the next test to fail to bind, because
      all test servers that belong to the same luatest suite and have the same
      alias share the same socket path (although they use different
      directories). This looks like a test-run or luatest bug.
      
      The vinyl-luatest/update_optimize test doesn't stop the test server,
      so because of this test-run/luatest bug the next vinyl-luatest test
      occasionally fails:
      
      NO_WRAP
      [001] vinyl-luatest/update_optimize_test.lua                          [ pass ]
      [001] vinyl-luatest/gh_6568_replica_initial_join_rem>                 [ fail ]
      [001] Test failed! Output from reject file /tmp/t/rejects/vinyl-luatest/gh_6568_replica_initial_join_removal_of_compacted_run_files.reject:
      [001] TAP version 13
      [001] 1..1
      [001] # Started on Tue Jul  5 13:30:37 2022
      [001] # Starting group: gh-6568-replica-initial-join-removal-of-compacted-run-files
      [001] master | 2022-07-05 13:30:37.530 [189564] main/103/default.lua F> can't initialize storage: unlink, called on fd 25, aka unix/:(socket), peer of unix/:(socket): Address already in use
      [001] ok     1  gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
      [001] not ok 1  gh-6568-replica-initial-join-removal-of-compacted-run-files.test_replication_compaction_cleanup
      [001] #   Failure in after_all hook: /home/vlad/.rocks/share/tarantool/luatest/process.lua:100: kill failed: 256
      [001] #   stack traceback:
      [001] #         .../src/tarantool/tarantool/test/luatest_helpers/server.lua:206: in function 'stop'
      [001] #         ...src/tarantool/tarantool/test/luatest_helpers/cluster.lua:44: in function 'drop'
      [001] #         ...ica_initial_join_removal_of_compacted_run_files_test.lua:34: in function <...ica_initial_join_removal_of_compacted_run_files_test.lua:33>
      [001] #         ...
      [001] #         [C]: in function 'xpcall'
      [001] # Ran 1 tests in 1.682 seconds, 0 succeeded, 1 errored
      NO_WRAP
      
      Let's fix this by stopping the test server started by the
      vinyl-luatest/update_optimize test.
      
      NO_DOC=test
      NO_CHANGELOG=test
      85040161
    • Vladimir Davydov's avatar
      test: add improved test for SELECT consistency · 4e9a94a3
      Vladimir Davydov authored
      The idea behind the new test is the same as the one used by
      vinyl/select_consistency.test.lua: create a space with a few
      compound secondary indexes that share the first part, then run
      SELECT requests under heavy write load and check that results
      match. However, in comparison to its predecessor, the new test
      has a few improvements:
      
       1. It generates DML requests in multi-statement transactions.
       2. It checks non-unique indexes.
       3. It checks multikey indexes.
       4. It triggers L0 dumps not by box.snapshot, but by exceeding
          the box.cfg.vinyl_memory limit.
       5. It starts 20 write and 5 read fibers.
       6. It reruns the test after restart to check that recovery works fine.
       7. It checks that there are no phantom statements stored in
          the space indexes after the test.
       8. It runs the test with deferred DELETEs enabled and disabled
          (see box.cfg.vinyl_defer_deletes).
       9. It is written in luatest.
      
      The test takes about 20 seconds to finish so it's marked as long run.
      
      Closes #4251
      
      NO_DOC=test
      NO_CHANGELOG=test
      4e9a94a3
    • Ilya Verbin's avatar
      box: fix `fselect()` behavior on binary data · 915ccdf1
      Ilya Verbin authored
      Currently `fselect()` throws an error when it encounters binary data;
      print a <binary> tag instead.
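
      A hedged sketch of the fixed behavior; the table, column names, and exact
      rendering are assumptions for illustration:

      ```
      box.execute([[CREATE TABLE b (id INTEGER PRIMARY KEY, data VARBINARY);]])
      box.execute([[INSERT INTO b VALUES (1, X'DEADBEEF');]])
      -- the data column is now shown as a <binary> tag instead of raising an error
      box.space.B:fselect()
      ```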
      
      Closes #7040
      
      NO_DOC=bugfix
      915ccdf1
  5. Jul 04, 2022
    • Serge Petrenko's avatar
      replication: relax split-brain checks after DEMOTE · b5811f15
      Serge Petrenko authored
      Our txn_limbo_is_replica_outdated check works correctly only when there
      is a stream of PROMOTE requests: only the author of the latest PROMOTE
      is writable and may issue transactions, no matter whether they are
      synchronous or asynchronous.
      
      So txn_limbo_is_replica_outdated assumes that everyone but the node with
      the greatest PROMOTE/DEMOTE term is outdated.
      
      This isn't true for DEMOTE requests. There is only one server which
      issues the DEMOTE request, but once it's written, it's fine to accept
      asynchronous transactions from everyone.
      
      Now the check is too strict. Every time there is an asynchronous
      transaction from someone who isn't the author of the latest PROMOTE or
      DEMOTE, replication is broken with ER_SPLIT_BRAIN.
      
      Let's relax it: when limbo owner is 0, it's fine to accept asynchronous
      transactions from everyone, no matter the term of their latest PROMOTE
      and DEMOTE.
      
      This means that now after a DEMOTE we will miss one case of true
      split-brain: when old leader continues writing data in an obsolete term,
      and the new leader first issues PROMOTE and then DEMOTE.
      
      This is a tradeoff for making async master-master work after DEMOTE.
      
      The completely correct fix would be to write the term the transaction
      was written in with each transaction and replace
      txn_limbo_is_replica_outdated with txn_limbo_is_request_outdated, so
      that we decide whether to filter the request or not judging by the term
      it was applied in, not by the term we have seen in some past PROMOTE from
      the node. This fix seems too costly though, given that we only miss one case
      of split-brain at the moment when the user enables master-master
      replication (by writing a DEMOTE). And in master-master there is no such
      thing as a split-brain.
      
      Follow-up #5295
      Closes #7286
      
      NO_DOC=internal change
      b5811f15
  6. Jul 01, 2022
    • Yaroslav Lobankov's avatar
      test: fix running 'small' lib tests for OOS build · 28426f67
      Yaroslav Lobankov authored
      The 'small' lib test suite was not run for out-of-source builds since
      the wrong symlink was created for test binaries and test-run couldn't
      find them. Now it is fixed.
      
      When test-run loads tests, it first looks for a suite.ini file, and if
      the file exists, test-run considers the directory a test suite. So it
      made sense to create a permanent link for the 'small' lib tests.
      
      Closes #4485
      
      NO_DOC=testing stuff
      NO_TEST=testing stuff
      NO_CHANGELOG=testing stuff
      28426f67
    • Vladimir Davydov's avatar
      vinyl: explicitly disable hot standby mode · 008ab8d3
      Vladimir Davydov authored
      Vinyl doesn't support the hot standby mode. There's a ticket to
      implement it, see #2013. The behavior is undefined when running an
      instance in hot standby mode if the master has Vinyl spaces.
      It may result in a crash or even data corruption.
      
      Let's raise an explicit error in this case.
      
      Closes #6565
      
      NO_DOC=bug fix
      008ab8d3
    • Vladimir Davydov's avatar
      json: don't match any nodes if there's [*] in the path · 35802a23
      Vladimir Davydov authored
      If a nested tuple field is indexed, it can be accessed by [*] aka
      multikey or any token:
      
        s = box.schema.create_space('test')
        s:create_index('pk')
        s:create_index('sk', {parts = {{2, 'unsigned', path = '[1][1]'}}})
        t = s:replace{1, {{1}}}
        t['[2][1][*]'] -- returns 1!
      
      If a nested field isn't indexed (remove creation of the secondary index
      in the example above), then access by [*] returns nil.
      
      Call graph:
      
        lbox_tuple_field_by_path:
          tuple_field_raw_by_full_path
            tuple_field_raw_by_path
              tuple_format_field_by_path
                json_tree_lookup_entry
                  json_tree_lookup
      
      And json_tree_lookup matches the first node if the key is [*].
      We shouldn't match anything to [*].
      
      Closes #5226
      
      NO_DOC=bug fix
      35802a23
  7. Jun 30, 2022
    • Boris Stepanenko's avatar
      test: box_promote and box_demote · 5a8dca70
      Boris Stepanenko authored
      Covered most of box_promote and box_demote with tests:
      1. Promote/demote unconfigured box
      2. Promoting current leader with elections on and off
      3. Demoting follower with elections on and off
      4. Promoting current leader, but not limbo owner with elections on
      5. Demoting current leader with elections on and off
      6. Simultaneous promote/demote
      7. Promoting voter
      8. Interfering promote/demote while writing new term to wal
      9. Interfering promote/demote while waiting for synchro queue
         to be emptied
      10. Interfering promote while waiting for limbo to be acked
          (similar to replication/gh-5430-qsync-promote-crash.test.lua)
      
      Closes #6033
      
      NO_DOC=testing stuff
      NO_CHANGELOG=testing stuff
      5a8dca70
    • Serge Petrenko's avatar
      test: fix election_pre_vote flaky failure · a10958e2
      Serge Petrenko authored
      The test failed with the following output:
      
       TAP version 13
       1..3
       # Started on Tue Jun 28 13:36:03 2022
       # Starting group: pre-vote
       not ok 1	pre-vote.test_no_direct_connection
       #   .../election_pre_vote_test.lua:46: expected: a value evaluating to
      					true, actual: false
       #   stack traceback:
       #   .../election_pre_vote_test.lua:65: in function 'retrying'
       #   .../election_pre_vote_test.lua:64: in function
      				    'pre-vote.test_no_direct_connection'
       #   ...
       #   [C]: in function 'xpcall'
       ok     2	pre-vote.test_no_quorum
       ok     3	pre-vote.test_promote_no_quorum
       # Ran 3 tests in 6.994 seconds, 2 succeeded, 1 failed
      
      This is the moment when one of the followers disconnects from
      the leader and expects its `box.info.election.leader_idle` to grow.
      
      It wasn't taken into account that this disconnect might make the leader
      resign due to fencing, and then a new leader would emerge and
      `leader_idle` would still be small.
      
      IOW, the leader starts with fencing turned off and only resumes
      fencing once it has connected to a quorum of nodes (one replica in this
      test). If the replica we have just connected happens to be the one we
      disconnect in the test, the leader might fence if it hasn't yet
      connected to the other replica, because it immediately loses a quorum of
      healthy connections right after gaining it for the first time.
      
      Fix this by waiting until everyone follows everyone before each test
      case.
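
      A hedged sketch of such a wait, assuming the luatest `retrying` helper
      and the `server:exec()` method from the test harness; the helper name and
      timeout are illustrative:

      ```
      local t = require('luatest')

      -- wait until every server reports every peer's upstream in 'follow' state
      local function wait_fullmesh(servers)
          for _, server in ipairs(servers) do
              t.helpers.retrying({timeout = 10}, function()
                  server:exec(function()
                      for _, r in pairs(box.info.replication) do
                          if r.id ~= box.info.id then
                              assert(r.upstream ~= nil and r.upstream.status == 'follow')
                          end
                      end
                  end)
              end)
          end
      end
      ```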
      
      The test, of course, could be fixed by turning fencing off, but this
      might hide any possible future problems with fencing.
      
      Follow-up #6654
      Follow-up #6661
      
      NO_CHANGELOG=test fix
      NO_DOC=test fix
      a10958e2
    • Vladimir Davydov's avatar
      vinyl: disable deferred deletes if there are upserts on disk · a85629a6
      Vladimir Davydov authored
      Normally, there shouldn't be any upserts on disk if the space has
      secondary indexes, because we can't generate an upsert without a lookup
      in the primary index hence we convert upserts to replace+delete in this
      case. The deferred delete optimization only makes sense if the space has
      secondary indexes. So we ignore upserts while generating deferred
      deletes, see vy_write_iterator_deferred_delete.
      
      There's an exception to this rule: a secondary index could be created
      after some upserts were used on the space. In this case, because of the
      deferred delete optimization, we may never generate deletes for some
      tuples for the secondary index, as demonstrated in #3638.
      
      We could fix this issue by properly handling upserts in the write
      iterator while generating deferred deletes, but this wouldn't be easy:
      in case of a minor compaction there may be no replace/insert to apply
      the upsert to, so we'd have to keep intermediate upserts even if there
      is a newer delete statement. Since this situation is rare (it happens
      only once in a space's lifetime), it doesn't look like we should
      complicate the write iterator to fix it.
      
      Another way to fix it is to force major compaction of the primary index
      after a secondary index is created. This looks doable, but it could slow
      down creation of secondary indexes. Let's instead simply disable the
      deferred delete optimization if the primary index has upsert statements.
      This way the optimization will be enabled sooner or later, when the
      primary index major compaction occurs. After all, it's just an
      optimization and it can be disabled for other reasons (e.g. if the space
      has on_replace triggers).
      
      Closes #3638
      
      NO_DOC=bug fix
      a85629a6
  8. Jun 29, 2022
    • Ilya Verbin's avatar
      fiber: get rid of cpu_misses in fiber.top() · 390311bb
      Ilya Verbin authored
      It doesn't make sense after switching from RDTSCP to
      clock_gettime(CLOCK_MONOTONIC).
      
      Part of #5869
      
      @TarantoolBot document
      Title: fiber: get rid of cpu_misses in fiber.top()
      Since: 2.11
      
      Remove any mentions of `cpu_misses` in `fiber.top()` description.
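
      A hedged sketch of what the change means for callers, assuming fiber.top
      collection is enabled; the per-fiber field names follow the existing
      documentation:

      ```
      local fiber = require('fiber')

      fiber.top_enable()
      fiber.sleep(0.1)  -- let some statistics accumulate
      local top = fiber.top()
      -- the top-level 'cpu_misses' counter is gone; per-fiber entries under
      -- top.cpu still carry the 'instant', 'average' and 'time' metrics
      assert(top.cpu_misses == nil)
      ```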
      390311bb
  9. Jun 27, 2022
    • Timur Safin's avatar
      datetime: fix set with hour=nil · ba140128
      Timur Safin authored
      We did not correctly retain the `hour` attribute when the `min`, `sec`,
      or `nsec` attributes were modified via the `:set` method.
      
      ```
      tarantool> a = dt.parse '2022-05-05T00:00:00'
      
      tarantool> a:set{min = 0, sec = 0, nsec = 0}
      --
      - 2022-05-05T12:00:00Z
      ...
      ```
      
      Closes #7298
      
      NO_DOC=bugfix
      ba140128
  10. Jun 24, 2022
    • Vladimir Davydov's avatar
      vinyl: drop UPSERT squashing optimization when there is no disk data · afce0913
      Vladimir Davydov authored
      The optimization is mostly useless, because it only works if there's no
      data on disk. As explained in #5080, it contains a potential bug: if L0
      dump is triggered between 'prepare' and 'commit', it will insert a
      statement to a sealed vy_mem. Let's drop it.
      
      Part of #5080
      
      NO_DOC=bug fix
      NO_CHANGELOG=later
      afce0913
    • Nikita Pettik's avatar
      Fix gh_6634 test case · 47ad3bc9
      Nikita Pettik authored
      gh_6634_different_log_on_tuple_new_and_free_test.lua verifies that the
      proper debug message gets into the logs for tuple_new() and tuple_delete():
      occasionally tuple_delete() printed the wrong tuple address. However,
      there are still two debug logs: one in tuple_delete() and another in
      memtx_tuple_delete(). So, to avoid any possible confusion, let's fix the
      regular expression so that it definitely matches the log line from
      memtx_tuple_delete().
      
      NO_CHANGELOG=<Test fix>
      NO_DOC=<Test fix>
      47ad3bc9
    • Vladimir Davydov's avatar
      net.box: explicitly forbid synchronous requests in triggers · 0d944f90
      Vladimir Davydov authored
      Net.box triggers (on_connect, on_schema_reload) are executed
      by the net.box connection worker fiber so a request issued by
      a trigger callback can't be processed until the trigger returns
      execution to the net.box fiber. Currently, an attempt to issue
      a synchronous request from a net.box trigger leads to a silent
      hang of the connection, which is confusing. Let's instead raise
      an error until #7291 is implemented.
      
      We need to add the check to three places in the code:
       1. luaT_netbox_wait_result for future:wait_result()
       2. luaT_netbox_iterator_next for future:pairs()
       3. conn._request for all synchronous requests.
          (We can't add the check to luaT_netbox_transport_perform_request,
          because conn._request may also call conn.wait_state, which would
          hang if called from on_connect or on_schema_reload trigger.)
      
      We also add an assertion to netbox_request_wait to ensure that we
      never wait for a request completion in the net.box worker fiber.
      
      Closes #5358
      
      @TarantoolBot document
      Title: Synchronous requests are not allowed in net.box triggers
      
      An attempt to issue a synchronous request (e.g. `call`) from
      a net.box trigger (`on_connect`, `on_schema_reload`) now raises
      an error: "Synchronous requests are not allowed in net.box trigger"
      (Before https://github.com/tarantool/tarantool/issues/5358 was
      fixed, it silently hung.)
      
      Invoking an asynchronous request (see `is_async` option) is allowed,
      but the request will not be processed until the trigger returns and
      an attempt to wait for the request completion with `future:pairs()`
      or `future:wait_result()` will raise the same error.
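
      A hedged sketch of the new behavior; the URI and the trigger body are
      illustrative:

      ```
      local net_box = require('net.box')

      local conn = net_box.connect('localhost:3301', {wait_connected = false})
      conn:on_connect(function(c)
          -- a synchronous call here now raises
          -- "Synchronous requests are not allowed in net.box trigger":
          -- c:call('box.info')
          -- an asynchronous request is allowed, but it is not processed until
          -- the trigger returns, and waiting on the future here would raise too:
          local future = c:call('box.info', {}, {is_async = true})
      end)
      ```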
      0d944f90
  11. Jun 23, 2022
    • Vladimir Davydov's avatar
      box: fix exclude_null for json and multikey indexes · 30cb5d6e
      Vladimir Davydov authored
      exclude_null is a special index option, which makes the index ignore
      tuples that contain null in any of the indexed fields. Currently, it
      doesn't work for json and multikey indexes, because:
       1. index_filter_tuple ignores json path.
       2. index_filter_tuple ignores multikey index.
      
      Issue no. 1 is easy to fix - we just need to use tuple_field_by_part
      instead of tuple_field when checking if a key field is null.
      
      Issue no. 2 is more complicated, because when we call index_filter_tuple
      we don't know the multikey index. We address this issue by pushing the
      index_filter_tuple call down to engine-specific index implementation.
      
      For Vinyl, we make vy_stmt_foreach_entry, which iterates over multikey
      tuple entries, skip entries that contain nulls.
      
      For memtx, we move the check to index-specific index_replace function
      implementation.  Fortunately, only tree indexes support nullable fields
      so we just need to update the memtx tree implementation.
      
      Ideally, we should handle multikey indexes in memtx at the top level,
      because the implementation should essentially be the same for all kinds
      of indexes, but this refactoring is complicated and will be done later.
      For now, just fix the bug.
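
      A hedged sketch of exclude_null combined with a JSON-path key part; the
      space, index, and field names are illustrative:

      ```
      local s = box.schema.create_space('excl_null_demo', {if_not_exists = true})
      s:create_index('pk', {if_not_exists = true})
      s:create_index('sk', {
          if_not_exists = true,
          parts = {{2, 'unsigned', path = 'value',
                    is_nullable = true, exclude_null = true}},
      })
      s:replace{1, {value = box.NULL}}  -- skipped by 'sk' after the fix
      s:replace{2, {value = 10}}        -- indexed by 'sk'
      print(s.index.sk:count())         -- 1
      ```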
      
      Closes #5861
      
      NO_DOC=bug fix
      30cb5d6e
    • Vladimir Davydov's avatar
      test: make all engine/null test cases multi-engine · 7e605b13
      Vladimir Davydov authored
      For some reason, some test cases create memtx spaces irrespective of
      the value of the engine parameter.
      
      NO_DOC=test
      NO_CHANGELOG=test
      7e605b13
  12. Jun 22, 2022
    • Nikita Pettik's avatar
      box: introduce stubs for wal_ext · 6b6b5598
      Nikita Pettik authored
      NO_CHANGELOG=<No functional changes>
      NO_DOC=<Later for EE>
      6b6b5598
    • Nikita Pettik's avatar
      xrow: add old_tuple, new_tuple to struct request · 03e249b9
      Nikita Pettik authored
      These fields correspond to the tuple before the DML request is executed
      (old) and to the resulting tuple after it (new). For example, let the
      index store tuple {1, 1}:
      replace{1, 2} -- old == {1, 1}, new == {1, 2}

      These fields are especially useful for the update operation, which holds
      a key and an array of update operations (not the old tuple).
      
      `old_tuple` and `new_tuple` are going to be used as WAL extensions
      available in the enterprise version. Along with that, let's reserve the
      0x2c and 0x2d Iproto keys for these members.
      
      NO_DOC=<No functional changes>
      NO_TEST=<No functional changes>
      NO_CHANGELOG=<No functional changes>
      03e249b9
    • Georgiy Lebedev's avatar
      memtx: fix DML after select of key causing self-conflict of transaction · b3f085f2
      Georgiy Lebedev authored
      When point holes are checked on insertion, we must only conflict
      transactions other than the one that read the hole.
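
      A hedged sketch of the scenario that used to self-conflict; the space
      name is illustrative and memtx MVCC is assumed to be enabled at startup:

      ```
      -- box.cfg{memtx_use_mvcc_engine = true} is assumed at instance startup
      local s = box.schema.create_space('point_hole_demo', {if_not_exists = true})
      s:create_index('pk', {if_not_exists = true})

      box.begin()
      s:select{1}      -- reads a "point hole": key 1 does not exist yet
      s:replace{1, 1}  -- must not make the transaction conflict with itself
      box.commit()     -- succeeds after the fix
      ```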
      
      NO_CHANGELOG=internal bugfix
      NO_DOC=bugfix
      
      Closes #7234
      Closes #7235
      b3f085f2
    • Georgiy Lebedev's avatar
      memtx: fix DML causing transaction self-conflict after full scan in HASH · 3e546d49
      Georgiy Lebedev authored
      When full scans are checked on writes, we must only conflict transactions
      other than the one that did the full scan.
      
      NO_CHANGELOG=internal bugfix
      NO_DOC=bugfix
      
      Closes #7221
      3e546d49
  13. Jun 21, 2022
    • Vladimir Davydov's avatar
      vinyl: fix !vy_tx_is_in_read_view assertion failure in vy_tx_prepare · 2971f691
      Vladimir Davydov authored
      Commit 4d52199e ("box: fix transaction "read-view" and "conflicted"
      states") updated vy_tx_send_to_read_view so that now it aborts all RW
      transactions right away instead of sending them to read view and
      aborting them on commit. It also updated vy_tx_begin_statement to fail
      if a transaction sent to a read view tries to do DML. With all that,
      we assume that there cannot possibly be an RW transaction sent to read
      view so we have an assertion checking that in vy_tx_commit.
      
      However, this assertion may fail, because a DML statement may yield
      on disk read before it writes anything to the write set. If this is
      the first statement in a transaction, the transaction is technically
      read-only and we will send it to read-view instead of aborting it.
      Once it completes the disk read, it will apply the statement and hence
      become read-write, breaking our assumption in vy_tx_commit.
      
      Fix this by aborting RW transactions sent to read-view in vy_tx_set.
      
      Follow-up #7240
      
      NO_DOC=bug fix
      NO_CHANGELOG=unreleased
      2971f691
  14. Jun 17, 2022
    • Cyrill Gorcunov's avatar
      fiber: don't crash on wakeup with dead fibers · 206137e7
      Cyrill Gorcunov authored
      
      When a fiber has finished its work, one of two things happens:
      1) If the "joinable" attribute is not set, the fiber is
         simply recycled.
      2) Otherwise it keeps hanging around waiting to be
         joined.

      Our API allows calling fiber_wakeup() for dead but joinable
      fibers (case 2) in release builds without any side effects: such
      fibers are simply ignored. In debug builds, however, this
      triggers an assertion. We can't change our API for backward
      compatibility's sake, but at the same time we must not keep
      different behaviour between release and debug builds, since this
      brings inconsistency. Thus let's get rid of the assertion and
      allow calling fiber_wakeup() on such fibers in debug builds as well.
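
      A hedged Lua-level illustration of the now-allowed call sequence,
      mirroring the C-level fiber_wakeup() described above:

      ```
      local fiber = require('fiber')

      local f = fiber.new(function() end)
      f:set_joinable(true)
      fiber.yield()  -- let the fiber run to completion; being joinable, it lingers
      f:wakeup()     -- a no-op on a dead joinable fiber; no assertion in debug builds
      f:join()
      ```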
      
      Fixes #5843
      
      NO_DOC=bug fix
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      206137e7
    • Serge Petrenko's avatar
      replication: unify replication filtering with and without elections · deca9749
      Serge Petrenko authored
      Once the split-brain detection is in place, it's fine to nopify obsolete
      data even on a node with elections disabled. Let's not keep a bug around
      anymore.
      
      This behaviour change leads to changing
      "gh_6842_qsync_applier_order_test.lua" a bit. It actually relied on old
      and buggy behaviour: it assumed old transactions would not be nopified
      and would trigger replication error.
      
      This doesn't happen anymore, because nopify works correctly, and the
      transactions are not followed by a conflicting CONFIRM.
      
      The test for this commit is simply altering the
      gh_5295_split_brain_detection_test.lua to work with elections disabled.
      
      Closes #6133
      Follow-up #5295
      
      NO_DOC=internal change
      NO_CHANGELOG=internal change
      deca9749
    • Cyrill Gorcunov's avatar
      txn_limbo: filter incoming synchro requests · af7d703f
      Cyrill Gorcunov authored
      
      When we receive synchro requests we can't just apply them blindly,
      because in the worst case they may come from a split-brain configuration
      (where a cluster split into several clusters, each one elected its own
      leader, and then the clusters are trying to merge back into the original
      one). We need to do our best to detect such disunity and force these
      nodes to rejoin from scratch for the sake of data consistency.

      Thus, when we're processing requests, we pass them to a packet filter
      first, which validates their contents and refuses to apply them if they
      violate consistency.
      
      Depending on request type each packet traverses an appropriate chain.
      
      filter_generic(): a common chain for any synchro packet.
       1) request:replica_id = 0 allowed for PROMOTE request only.
       2) request:replica_id should match limbo:owner_id, IOW the
          limbo migration should be noticed by all instances in the
          cluster.
      
      filter_confirm_rollback(): a chain for CONFIRM | ROLLBACK packets.
       1) Zero lsn is disallowed for such requests.
      
      filter_promote_demote(): a chain for PROMOTE | DEMOTE packets.
       1) The requests should come in with nonzero term, otherwise
          the packet is corrupted.
       2) The request's term should not be less than the maximal known
          one, IOW it should not come in from nodes which didn't notice
          the raft epoch changes and are living in the past.
      
      filter_queue_boundaries(): a common finalization chain.
       1) If LSN of the request matches current confirmed LSN the packet
          is obviously correct to process.
       2) If LSN is less than confirmed LSN then the request is wrong,
          we have processed the requested LSN already.
       3) If LSN is greater than confirmed LSN then
          a) If limbo is empty we can't do anything, since data is already
             processed and should issue an error;
          b) If there is some data in the limbo then requested LSN should
             be in range of limbo's [first; last] LSNs, thus the request
             will be able to commit and rollback limbo queue.
      
      Note the filtration is disabled during initial configuration where we
      apply requests from the only source of truth (either the remote master,
      or our own journal), so no split brain is possible.
      
      In order to make split-brain checks work, the applier nopify filter now
      passes synchro requests from obsolete term without nopifying them.
      
      Also, now ANY asynchronous request coming from an instance with an
      obsolete term is treated as a split-brain. Think of it as a synchronous
      request committed with a malformed quorum.
      
      Closes #5295
      
      NO_DOC=it's literally below
      
      Co-authored-by: Serge Petrenko <sergepetrenko@tarantool.org>
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      
      @TarantoolBot document
      Title: new error type: ER_SPLIT_BRAIN
      
      If for some reason the cluster had 2 leaders working independently (for
      example, the user has mistakenly lowered the quorum below N / 2 + 1), then
      once such leaders and their followers try connecting to each other, they
      will receive the ER_SPLIT_BRAIN error, and the connection will be
      aborted. This is done to preserve data integrity. Once the user notices
      such an error he or she has to manually inspect the data on both the
      split halves, choose a way to restore the data, and rebootstrap one of
      the halves from the other.
      af7d703f
    • Serge Petrenko's avatar
      txn_limbo: do not confirm/rollback anything after restart · 6cc1b1f2
      Serge Petrenko authored
      It's important for the synchro queue owner to not finalize any of the
      pending synchronous transactions after restart.
      
      Since the node was down for some time, the chances are pretty high it was
      deposed by some new leader during its downtime. It means that the node
      might not know yet that its transactions were already finalized by someone
      else.
      
      So, any arbitrary finalization might lead to a future split-brain, once the
      remote PROMOTE finally reaches the local node.
      
      Let's fix this by adding a new reason for the limbo to be frozen - a
      queue owner has recovered but has not issued a new PROMOTE locally and
      hasn't received any PROMOTE requests from the remote nodes.
      
      Once the first PROMOTE is issued or received, it's safe to return to the
      old mode of operation.
      
      So, now the synchro queue owner starts in "frozen" state and can't
      CONFIRM, ROLLBACK or issue new transactions until either issuing a
      PROMOTE or receiving a PROMOTE from some remote node.
      
      This also required modifying box.ctl.promote() behaviour: it's no
      longer a no-op on a synchro queue owner, when elections are disabled and
      the queue is frozen due to restart.
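
      A hedged illustration of the resulting operator flow on a restarted queue
      owner with elections disabled; the space name is illustrative:

      ```
      -- right after restart the queue owner is frozen and read-only
      -- (box.info.ro_reason == 'synchro', "frozen until promotion")
      box.ctl.promote()          -- no longer a no-op here: unfreezes the queue
      box.space.test:replace{1}  -- writes are allowed again
      ```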
      
      Also fix the tests, which assumed the queue owner is writeable after a
      restart. gh-5298 test was partially deleted, because it became pointless.
      
      And while we are at it, remove the double run of gh-5288 test. It is
      storage engine agnostic, so there's no point in running it for both
      memtx and vinyl.
      
      Part-of #5295
      
      NO_CHANGELOG=covered by previous commit
      
      @TarantoolBot document
      Title: ER_READONLY error receives new reasons
      
      When box.info.ro_reason is "synchro" and some operation throws an
      ER_READONLY error, this error now might include the following reason:
      ```
      Can't modify data on a read-only instance - synchro queue with term 2
      belongs to 1 (06c05d18-456e-4db3-ac4c-b8d0f291fd92) and is frozen due to
      fencing
      ```
      This means that the current instance is indeed the synchro queue owner,
      but it has noticed that someone else in the cluster might start new
      elections or might overtake the synchro queue soon.
      This may also be detected by `box.info.election.term` becoming greater
      than `box.info.synchro.queue.term` (this is the case for the second error
      message).
      There is also a slightly different error message:
      ```
      Can't modify data on a read-only instance - synchro queue with term 2
      belongs to 1 (06c05d18-456e-4db3-ac4c-b8d0f291fd92) and is frozen until
      promotion
      ```
      This means that the node simply cannot guarantee that it is still the
      synchro queue owner (for example, after a restart, when a node still thinks
      it is the queue owner, but someone else in the cluster has already
      overtaken the queue).
      6cc1b1f2