  1. Jul 06, 2020
    • replication: append NOP as the last tx row · 25382617
      Serge Petrenko authored
      
      Since we stopped sending local space operations in replication, the last
      tx row has to be global in order to preserve tx boundaries on replica.
      If the last row happens to be a local one, replica will never receive
      the tx end marker, yielding the following error:
      `ER_UNSUPPORTED: replication does not support interleaving
      transactions`.
      
      To fix the problem, append a global NOP row at the end of the tx if
      it happens to end on a local row.
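      A minimal Python sketch of the idea, not the actual Tarantool code.
      `Row`, `finalize_tx` and `replicate` are hypothetical names, and the
      sketch assumes that a fully local transaction needs no NOP because
      none of its rows are sent to the replica:

```python
from dataclasses import dataclass, replace


@dataclass
class Row:
    op: str
    is_local: bool
    is_commit: bool = False


def finalize_tx(rows):
    """Return a copy of the tx whose last row is global when the tx is mixed."""
    rows = [replace(r) for r in rows]
    has_global = any(not r.is_local for r in rows)
    if has_global and rows[-1].is_local:
        # The tx would end on a local row: append a global NOP instead.
        rows.append(Row("NOP", is_local=False))
    if rows:
        # The tx end marker always lives on the last row.
        rows[-1].is_commit = True
    return rows


def replicate(rows):
    """Local space operations are not sent to the replica."""
    return [r for r in rows if not r.is_local]
```

      With the NOP appended, the row carrying the end marker survives the
      filtering, so the replica still sees where the transaction ends.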
      
      Follow-up #4114
      Closes #4928
      
      Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
    • wal: fix tx boundaries · f41d1ddd
      Serge Petrenko authored
      
      In order to preserve transaction boundaries in replication protocol, wal
      assigns each tx row a transaction sequence number (tsn). Tsn is equal to
      the lsn of the first transaction row.
      
      Starting with commit 7eb4650e, local
      space requests are assigned a special replica id, 0, and have their own
      lsns. These operations are not replicated.
      
      If a transaction starting with a local space operation ends up in the
      WAL, it gets a tsn equal to the lsn of the local space request. Then,
      during replication, when such a transaction is replicated, the local
      space request is omitted, and replica receives a global part of the
      transaction with a seemingly random tsn, yielding an ER_PROTOCOL error:
      "Transaction id must be equal to LSN of the first row in the transaction".
      
      Assign tsn as equal to the lsn of the first global row in the
      transaction to fix the problem, and assign tsn as before for fully local
      transactions.
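      The assignment rule can be illustrated with a short Python sketch.
      `assign_tsn` and the `(lsn, is_local)` tuples are hypothetical
      stand-ins for the WAL row structures, not the real implementation:

```python
def assign_tsn(rows):
    """Pick the tsn for a transaction.

    rows: non-empty list of (lsn, is_local) tuples in transaction order.
    """
    global_lsns = [lsn for lsn, is_local in rows if not is_local]
    if global_lsns:
        # The first global row is the first row the replica receives.
        return global_lsns[0]
    # Fully local transaction: keep the previous behaviour.
    return rows[0][0]
```

      For a transaction with rows `(5, local), (10, global), (11, global)`
      this yields 10, which matches the lsn of the first row the replica
      actually sees.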
      
      Follow-up #4114
      Part-of #4928
      
      Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
    • applier: fix tx boundary check for half-applied txns · 9fcbbb3e
      Serge Petrenko authored
      In case there are 2 "new" instances, running tarantool 2.2+,
      master and replica, and one "old" instance, running an earlier tarantool
      version, in a full-mesh cluster, it may happen that the "new" replica
      receives part of a tx from an "old" instance, and the remaining part
      from a "new" instance.
      
      Since "new" instances preserve tx boundaries, the "new" replica would
      skip the remainder of the tx, assuming it has already applied the full
      tx if it has applied the first tx row. This leads to gaps in the "new"
      replica's WAL and to skipping the remaining part of the tx forever.
      
      Fix this behaviour to apply the full tx even if its beginning is
      already applied in mixed clusters.
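      One way to read the fix, sketched in Python. Both functions are
      hypothetical illustrations and the real applier code may differ:
      the old check decides by the first row only, while the new one
      looks at every row of the tx.

```python
def old_skips_tx(tx_lsns, applied_lsn):
    """Old check: the whole tx is skipped if its first row is applied."""
    return tx_lsns[0] <= applied_lsn


def rows_to_apply(tx_lsns, applied_lsn):
    """New check: keep every row of the tx that is not applied yet.

    tx_lsns: LSNs of the tx rows, in order.
    applied_lsn: highest LSN already applied from this source.
    """
    return [lsn for lsn in tx_lsns if lsn > applied_lsn]
```

      For a tx with LSNs 10, 11, 12 whose first row came from an "old"
      instance, the old check drops rows 11 and 12, while the per-row
      check still applies them.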
      
      Closes #5125
  2. Jul 03, 2020
    • test: fix flaky box/net.box_readahead_gh-3958 test · 4c7d8281
      Alexander V. Tikhonov authored
      Issue:
      
      [014] --- box/net.box_readahead_gh-3958.result Mon Jun 15 15:33:23 2020
      [014] +++ box/net.box_readahead_gh-3958.reject Tue Jun 16 02:24:04 2020
      [014] @@ -46,6 +46,7 @@
      [014]  ...
      [014]  test_run:wait_log('default', 'readahead limit is reached', 1024, 0.1)
      [014]  ---
      [014] +- readahead limit is reached
      [014]  ...
      [014]  s:drop()
      [014]  ---
      [014]
      [014] Last 15 lines of Tarantool Log file [Instance "box"][/tarantool/test/var/014_box/box.log]:
      [014] 2020-06-16 02:24:03.792 [5585] main/121/console/unix/: I> set 'read_only' configuration option to false
      [014] 2020-06-16 02:24:03.834 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      [014] 2020-06-16 02:24:03.951 [5585] main/121/console/unix/: space.h:336 E> ER_NO_SUCH_INDEX_ID: No index #1 is defined in space '_space'
      [014] 2020-06-16 02:24:04.180 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 128
      [014] 2020-06-16 02:24:04.183 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 102400
      [014] 2020-06-16 02:24:04.189 [5585] main/453/console/unix/: I> set 'readahead' configuration option to 16320
      
      The root cause of the issue was the previously run test
      'box/net.box_call_blocks_gh-946.test.lua' on the same worker: the
      wait_log/grep_log test_run function mistakenly matched the searched
      string in the log output left by the previous test. To avoid this,
      the tests can be swapped in the worker's run queue, in which case
      both tests pass. Log output with the tests swapped:
      
      2020-06-17 10:57:39.881 [69372] main C> entering the event loop
      2020-06-17 10:57:39.896 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 128
      2020-06-17 10:57:39.898 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 102400
      2020-06-17 10:57:40.003 [69372] main/156/console/unix/: I> set 'readahead' configuration option to 16320
      2020-06-17 10:57:40.053 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.063 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
      2020-06-17 10:57:40.067 [69372] main C> got signal 15 - Terminated
      
      Also found that the 'readahead' message produced by the first test
      blocks it from being printed to the log file again, because the
      message is suppressed. To fix this issue the default server must be
      restarted at the very start of the test.
      
      Closes #5082
    • Correct cleanup gitlab-ci for perf jobs · 27ee9382
      Alexander V. Tikhonov authored
      Found that some perf jobs had not been updated with the local cleanup
      routine, as was done for the other jobs in commit:
      
        892a188b "Correct cleanup gitlab-ci"
      
      Follows up #5036
    • build: static build needs more cleanup in sources · b74a4623
      Alexander V. Tikhonov authored
      Building Tarantool sources with the make command may fail with:
      
        [ 10%] make[2]: *** [test/small] Error 1
        [ 10%] make[1]: *** [test/CMakeFiles/symlink_small_tests.dir/all] Error 2
        make[1]: *** Waiting for unfinished jobs....
      
      The root cause of the issue is that Dockerfile.staticbuild
      uses a local copy of the sources:
      
        COPY . /tarantool
      
      This copy may contain broken links in tests, like:
      
        $ ls -al test
        ...
        luajit-tap -> /<wrong path>/third_party/luajit/test
        small -> /<wrong path>/src/lib/small/test/
        ...
      
      To fix the issue, these links should be removed from
      the Docker local copy of the sources before the build, like:
      
        rm -rf test/small test/luajit-tap
      
      Closes #5025
  3. Jul 02, 2020
  4. Jun 30, 2020
  5. Jun 29, 2020
  6. Jun 26, 2020
    • test: flaky box/net.box_wait_connected_gh-3856 · d51be6f4
      Alexander V. Tikhonov authored
      
      Found issue running test on FreeBSD VBox host:
      
       [011] --- box/net.box_wait_connected_gh-3856.result	Mon Jun 15 09:39:49 2020
       [011] +++ box/net.box_wait_connected_gh-3856.reject	Fri May  8 08:23:30 2020
       [011] @@ -12,7 +12,8 @@
       [011]  - opts:
       [011]      wait_connected: false
       [011]    host: 8.8.8.8
       [011] -  state: initial
       [011] +  state: error
       [011] +  error: Invalid argument
       [011]    port: '123456'
       [011]  ...
       [011]  c:close()
      
      A. Turenko investigated in depth and found that the failure was
      caused by getaddrinfo() returning EAI_SERVICE for an out-of-range
      TCP/IP port on FreeBSD, whereas Linux/glibc truncates the port
      modulo 65536. Checked with his local script './getaddrinfo':
      
        (Linux/glibc) $ ./getaddrinfo 8.8.8.8 123456
        ----
        family: AF_INET
        socktype: SOCK_STREAM
        protocol: IPPROTO_TCP
        host: 8.8.8.8
        serv: 57920
      
        (FreeBSD) $ ./getaddrinfo 8.8.8.8 123456
        getaddrinfo: Service was not recognized for socket type
      
      So the obvious fix is to change 123456 to something less than or
      equal to 65535, say, 1234.
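      The arithmetic behind the Linux/glibc result can be checked with a
      short Python sketch. `truncated_port` and `is_valid_port` are
      hypothetical helpers that only model the truncation described
      above; they do not call getaddrinfo():

```python
MAX_PORT = 65535


def truncated_port(port):
    """Port value after truncation modulo 65536."""
    return port % 65536


def is_valid_port(port):
    """True if the port fits into the 16-bit TCP/IP port range."""
    return 0 <= port <= MAX_PORT
```

      `truncated_port(123456)` gives 57920, the `serv` value printed on
      Linux/glibc, while `is_valid_port(123456)` is false and
      `is_valid_port(1234)` is true.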
      
      The test also depended on the order in which fibers were scheduled
      (net_box.connect() creates a separate fiber for connecting in the
      background using fiber.create(), which yields). It is unlikely that
      our fiber would not get execution time during the connection attempt,
      so this was more of a formal concern.
      
      But we can decrease the probability of this situation even further if
      we grab all connection fields right when net_box.connect() returns,
      not after a yield in the console (which happens while waiting for the
      next command from test-run).
      
      Closes #5083
      
      Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>
      Co-authored-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
    • test: fix flaky replication/wal_rw_stress.test.lua · 06eda0f7
      Alexander V. Tikhonov authored
      Found issue (reproduced on VBox FreeBSD machine):
      
       [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
       [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
       [016] @@ -73,7 +73,42 @@
       [016]  ...
       [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
       [016]  ---
       [016] -- true
       [016] +- version: 2.5.0-27-g32f59756a
       [016] +  id: 2
       [016] +  ro: false
       [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
       [016] +  package: Tarantool
       [016] +  cluster:
       [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
       [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
       [016] +  replication:
       [016] +    1:
       [016] +      id: 1
       [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
       [016] +      lsn: 10005
       [016] +      upstream:
       [016] +        status: follow
       [016] +        idle: 0.46353673400017
       [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
       [016] +        lag: -0.45732522010803
       [016] +      downstream:
       [016] +        status: stopped
       [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
       [016] +        system_message: Broken pipe
       [016] +    2:
       [016] +      id: 2
       [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
       [016] +      lsn: 0
       [016] +  signature: 10005
       [016] +  status: running
       [016] +  vinyl: []
       [016] +  uptime: 2
       [016] +  lsn: 0
       [016] +  sql: []
       [016] +  gc: []
       [016] +  pid: 41231
       [016] +  memory: []
       [016] +  vclock: {1: 10005}
       [016]  ...
       [016]  test_run:cmd("switch default")
       [016]  ---
      
      To check the downstream status and its message, the test needs to wait
      until a downstream appears. This prevents an attempt to index a nil
      value when one of those functions is called before a record about a
      peer appears in box.info.replication. It was observed on test:
        replication/show_error_on_disconnect
      after commit
        c6bea65f ('replication: recfg with 0 quorum returns immediately').
      
      Checked that the test still catches the error for which it was created
      in the b9db91e1 ('xlog: fix fallocate vs read race') patch, and
      successfully got the needed error "tx checksum mismatch":
      
      [153] --- replication/wal_rw_stress.result      Fri Jun 19 15:01:49 2020
      [153] +++ replication/wal_rw_stress.reject      Fri Jun 19 15:04:02 2020
      [153] @@ -73,7 +73,43 @@
      [153]  ...
      [153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
      ...
      [153] +      downstream:
      [153] +        status: stopped
      [153] +        message: tx checksum mismatch
      
      Note that wait_cond() allows overcoming transient network
      connectivity errors, but 'tx checksum mismatch' is a persistent
      one and will be caught.
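      The difference between transient and persistent errors can be shown
      with a small polling helper in Python. This `wait_cond` is a
      hypothetical re-implementation for illustration, not test-run's
      code, and its `timeout` and `delay` parameters are assumptions:

```python
import time


def wait_cond(cond, timeout=5.0, delay=0.01):
    """Poll cond() until it returns true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if cond():
            return True
        if time.monotonic() >= deadline:
            # The condition never held: a persistent failure.
            return False
        time.sleep(delay)
```

      A status that is 'stopped' only for a few polls lets the helper
      return true, while a status that stays 'stopped' makes it return
      false once the timeout expires, so the test can then dump
      diagnostics such as box.info.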
      
      Closes #4977
    • test: fix flaky replication/wal_off.test.lua · 3e904475
      Alexander V. Tikhonov authored
      Found issue:
      
      [003] --- replication/wal_off.result	Thu Apr 25 13:10:18 2019
      [003] +++ replication/wal_off.reject	Tue Jul 16 17:10:31 2019
      [003] @@ -95,6 +95,8 @@
      [003]  ...
      [003]  while string.find(box.info.replication[wal_off_id].upstream.message, check) == nil do fiber.sleep(0.01) end
      [003]  ---
      [003] +- error: '[string "while string.find(box.info.replication[wal_of..."]:1: bad argument
      [003] +    #1 to ''find'' (string expected, got nil)'
      [003]  ...
      [003]  box.cfg { replication = "" }
      [003]  ---
      
      To check the upstream status and its message, the test needs to wait
      until an upstream appears. This prevents an attempt to index a nil
      value when one of those functions is called before a record about a
      peer appears in box.info.replication. It was observed on test:
        replication/show_error_on_disconnect
      after commit
        c6bea65f ('replication: recfg with 0 quorum returns immediately').
      
      Closes #4355
    • box: reduce box_process_lua Lua GC memory usage · e88c0d21
      Igor Munkin authored
      
      <box_process_lua> function created a new GCfunc object for a handler
      having no upvalues depending on the request context on each call.
      
      The change introduces the following mapping:
      | <handler id> -> <handler GCfunc object>
      Initializing this mapping on Tarantool startup is aimed to reduce Lua GC
      memory usage.
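      
      A rough Python analogy of the mapping, given only as a sketch.
      Python function objects stand in for Lua GCfunc objects here, and
      `HANDLERS`, `process` and the handler names are hypothetical:

```python
def _make_handlers():
    """Build every handler exactly once; none captures request state."""

    def handle_call(request):
        return ("call", request)

    def handle_eval(request):
        return ("eval", request)

    return {"call": handle_call, "eval": handle_eval}


# <handler id> -> <handler function object>, initialized at startup.
HANDLERS = _make_handlers()


def process(handler_id, request):
    """Dispatch through the prebuilt mapping; no new function per call."""
    return HANDLERS[handler_id](request)
```

      Since the handlers do not depend on the request context, the same
      function objects can be reused for every request instead of being
      allocated again and left to the garbage collector.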
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
    • test: disable JIT for Lua Fun chain iterator · 5fa7ded2
      Igor Munkin authored
      
      JIT compiler can generate an invalid trace for <fun.chain> iterator
      (i.e. chain_gen_r1) breaking its semantics (see LuaJIT/LuaJIT#584).
      Since interpreter works fine and produces the right results, disabling
      JIT for this function stops execution failures.
      
      As a result box-tap/key_def.test.lua is removed from box-tap suite
      fragile tests list.
      
      Relates to LuaJIT/LuaJIT#584
      Fixes #4252
      
      Reviewed-by: Alexander V. Tikhonov <avtikhon@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
  7. Jun 23, 2020
    • vinyl: restart read iterator in case L0 is changed · 83462a5c
      Nikita Pettik authored
      Data reads in vinyl are known to yield on disk access, which opens a
      window for modifications of the in-memory level. Imagine the following
      scenario: right before the data selection, a tuple is inserted into the
      space. It passes the first stage of the commit procedure, i.e. it is
      prepared to be committed but has not yet reached the WAL. Meanwhile, an
      iterator starts reading the same key. At this moment the prepared
      statement is already inserted into the in-memory tree and is therefore
      visible to the read iterator. So the read iterator fetches this
      statement and proceeds to the disk scan. In turn, the disk scan yields,
      and at this moment the WAL fails to write the statement to disk. Next,
      two cases are possible:
      1. WAL thread has enough time to complete rollback procedure.
      2. WAL thread fails to finish rollback in this time gap.
      
      In the first case the read iterator should skip the statement: the
      version of the in-memory tree has diverged from the iterator's one, so
      we fall back to the iterator restoration procedure. The mem iterator
      might become invalid, so the only choice is to restart the whole
      'advance' routine. Let's not try to restore it and always restart the
      iteration cycle if the L0 level has changed during the yield.
      
      In the second case nothing changes for the read iterator, so it simply
      returns the prepared statement (and this is considered to be OK).
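      
      The restart-on-change idea from the first case can be sketched in
      Python. This is a toy model, not vinyl's code: `Mem`, `read_key` and
      the `disk_scan` callback are hypothetical names, and a plain counter
      stands in for the version of the in-memory tree:

```python
class Mem:
    """Toy in-memory level (L0) with a version bumped on every change."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def insert(self, key, value):
        self.data[key] = value
        self.version += 1

    def rollback(self, key):
        self.data.pop(key, None)
        self.version += 1


def read_key(mem, key, disk_scan):
    """Return the newest visible value for key, restarting if L0 changed."""
    while True:
        version = mem.version
        mem_value = mem.data.get(key)
        # The disk scan may "yield": other code can modify mem meanwhile.
        disk_value = disk_scan(key)
        if mem.version != version:
            # L0 changed during the yield: do not try to restore, restart.
            continue
        return mem_value if mem_value is not None else disk_value
```

      If the prepared statement is rolled back while the disk scan runs,
      the version check fails, the loop starts over, and the second pass
      no longer sees the rolled-back statement.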
      
      Closes #3395
    • vinyl: fix passing uninitialized parameter to vy_page_find_key() · f84cb1aa
      Nikita Pettik authored
      vy_page_find_key() assumes that the equal_key parameter is initialized,
      since it is used unconditionally. Originally, the function was designed
      with the assumption that the parameter is initialized by the caller.
      Since then it has been used in several other places, but some callers
      don't initialize this parameter to 'false'. Let's fix it by setting this
      output parameter to false by default inside vy_page_find_key().
      
      Closes #5078
  8. Jun 22, 2020
  9. Jun 19, 2020
    • sql: raise an error on attempt to use HASH index in SQL · e7a70be4
      Kirill Yukhin authored
      Since the query planner is currently unable to use HASH indexes
      and an attempt to use one will likely lead to a SEGFAULT, this
      patch raises an error on an attempt to open a VDBE cursor
      against a HASH index.
      
      @TarantoolBot document
      Title: Document allowed index types for SQL
      Before the change, the Tarantool query planner segfaulted on
      an attempt to use a non-tree index. This is now blocked with an
      appropriate error message. The behaviour needs to be documented.
      It should be noted that this restriction might be relaxed in the future.
      
      Closes #4659
  10. Jun 17, 2020
    • fio/coio: handle partial writes · a9276dae
      Cyrill Gorcunov authored
      
      Writing fewer bytes than requested is perfectly fine. It turns out
      that the fio.write/pwrite API simply returns 'true' even if only
      part of a buffer has been written.
      
      Thus make coio_write and coio_pwrite write the whole data in
      a loop. Note that in most situations there will be only one pass;
      partial writes are really rare cases.
      
      Note that we're not handling nonblocking writes here (which
      could return EAGAIN) simply because we would need another API
      that accepts timeouts.
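      
      The write loop can be sketched in Python as follows. This is an
      illustration of the technique only, not the coio code: `write_all`
      and its `write` parameter are hypothetical, and, as in the commit,
      only blocking descriptors are assumed (EAGAIN is not handled):

```python
import os


def write_all(fd, data, write=os.write):
    """Write all of data to fd, looping over partial writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # write() may accept fewer bytes than offered.
        written = write(fd, view[total:])
        if written == 0:
            raise OSError("write made no progress")
        total += written
    return total
```

      In the common case the loop body runs once. The `write` parameter
      exists only so that a short-writing stand-in can be passed in to
      exercise the partial-write path.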
      
      Fixes #4651
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
  11. Jun 16, 2020
    • memtx: fix tuples references on concurrent replaces · 8c53942e
      Ilya Kosarev authored
      Since 527b02a2 (memtx: add yields during index build),
      memtx_build_on_replace has been used to handle concurrent updates.
      The problem was that the tuples handled by this trigger did not get
      their reference counters incremented, leading to a number of
      incorrect behaviors. Now this problem is solved.
      This problem was found by altering the primary index with updates in
      a background fiber. A corresponding test is introduced.
      
      Closes #4973
    • cmake: split UB sanitations into separate flags. · 5115d9f3
      Vladislav Shpilevoy authored
      Clang undefined behaviour sanitizer was turned on using the
      -fsanitize=undefined flag, which is supposed to turn on all the
      sanitizations except a few. Unneeded sanitizations were
      turned off explicitly, using -fno-sanitize=<type> flags. However,
      it turned out that this does not work with some flags. For example,
      nullability sanitizations can't be turned off when
      -fsanitize=undefined is used.
      
      Nullability sanitizations lead to lots of false-positive failures
      such as typeof(*obj) where obj is NULL, or memcpy() with NULL
      destination but 0 size.
      
      The patch splits -fsanitize=undefined into separate flags and
      never turns on nullability checks.
      
      Part of #4609
    • sql: don't build sql as a separate library · 35473d5d
      Vladislav Shpilevoy authored
      SQL heavily depends on box, and box on SQL. So they can't be
      separate libraries. The build started failing with undefined box
      symbols in SQL, when code of the latter has slightly changed in
      one of the recent commits.
      
      The build failed only with UB sanitizer enabled, but
      'VERBOSE=1 make' showed that both with UB and without UB the build
      command was the same (not counting -fsanitize flags). So the
      sanitizer has nothing to do with it.
      
      The patch builds the SQL sources as a part of the box library.
      
      Closes #5067