  1. Aug 31, 2020
    • gitlab-ci: add openSUSE packages build jobs · d07e5f96
      Alexander V. Tikhonov authored
      Implemented openSUSE packages build with testing for images:
      opensuse-leap:15.[0-2]
      
      Added %{sle_version} checks in Tarantool spec file according to
      https://en.opensuse.org/openSUSE:Packaging_for_Leap#RPM_Distro_Version_Macros
      
      Added opensuse-leap versions 15.1 and 15.2 to the GitLab CI package
      building/deploying jobs.
      
      Closes #4562
      d07e5f96
    • vinyl: fix check vinyl_dir existence at bootstrap · 9600b895
      Alexander V. Tikhonov authored
      
      While implementing the openSUSE build with testing, the test
      box-tap/cfg.test.lua failed. It turned out that when memtx_dir did
      not exist, vinyl_dir existed, and errno was set to ENOENT, box
      configuration succeeded, although it should not have. The reason
      was that not all failure paths in xdir_scan() set errno, but the
      caller assumed they did.

      Debugging showed that the errno check after xdir_scan() was
      incorrect when it returned a negative value. xdir_scan() is not a
      system call, and a negative return value from it does not mean
      that errno is set too. When errno was left over from earlier calls
      and xdir_scan() itself returned a negative value, the check gave
      the wrong result.

      The old check was meant to catch the ENOENT error set inside
      xdir_scan() in order to handle the case when vinyl_dir does not
      exist. It failed because, when checking ENOENT outside xdir_scan(),
      we have to be sure that ENOENT really came from the xdir_scan()
      call and not from some function called before it. One possible fix
      would be to reset errno before the xdir_scan() call.

      However, as mentioned above, xdir_scan() is not a system call: it
      can change in any way and can return any result without setting
      errno. So an errno check outside of this function can break.

      To avoid that, errno must not be checked after the call. A better
      solution is to pass a flag to xdir_scan() that tells whether the
      directory must exist. So the errno check was removed and replaced
      with a flag-based check for vinyl_dir existence.
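
The difference between the two checks can be sketched as follows. This is a minimal illustration, not the actual patch: `scan_dir()` is a hypothetical stand-in for xdir_scan(), and both wrapper functions are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for xdir_scan(): it is not a system call and
 * may fail (return -1) without ever touching errno.
 */
int
scan_dir(const char *path, bool is_dir_required)
{
	struct stat st;
	if (path == NULL)
		return -1; /* Failure path that leaves errno untouched. */
	if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode)) {
		/* A missing directory is an error only when required. */
		return is_dir_required ? -1 : 0;
	}
	return 0;
}

/* Fragile: trusts errno, which may be left over from an earlier call. */
bool
scan_ok_by_errno(const char *path)
{
	return scan_dir(path, true) == 0 || errno == ENOENT;
}

/* Robust: the callee decides, based on an explicit flag. */
bool
scan_ok_by_flag(const char *path, bool is_dir_required)
{
	return scan_dir(path, is_dir_required) == 0;
}
```

With a stale `errno == ENOENT`, `scan_ok_by_errno(NULL)` reports success even though the scan failed for an unrelated reason, while the flag-based variant does not depend on errno at all.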
      
      Closes #4594
      Needed for #4562
      
      Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>
      9600b895
  2. Aug 25, 2020
    • tuple: drop extra restrictions for multikey index · bfeb61b3
      Ilya Kosarev authored
      Multikey index did not work properly with nullable root field in
      tuple_raw_multikey_count(). Now it is fixed and corresponding
      restrictions are dropped. This also means that we can drop implicit
      nullability update for array/map fields and make all fields nullable
      by default, as it was until e1d3fe8a
      (tuple format: don't allow null where array/map is expected), since
      default non-nullability itself doesn't solve any real problems while
      providing confusing behavior (gh-5027).
      
      Follow-up #5027
      Closes #5192
      bfeb61b3
  3. Aug 24, 2020
    • box: introduce space:alter() · 8c965989
      Vladislav Shpilevoy authored
      There was no way to change certain space parameters without
      recreating the space or manually updating the internal system
      space _space, even though some of them were legal to update:
      field_count, owner, the temporary flag, and the is_sync flag.
      
      The patch introduces function space:alter(), which accepts a
      subset of parameters from box.schema.space.create which are
      mutable, and 'name' parameter. There is a method space:rename(),
      but still the parameter is added to space:alter() too, to be
      consistent with index:alter(), which also accepts a new name.
      
      Closes #5155
      
      @TarantoolBot document
      Title: New function space:alter(options)
      
      Space objects in Lua (stored in `box.space` table) now have a new
      method: `space:alter(options)`.
      
      The method accepts a table with parameters `field_count`, `user`,
      `format`, `temporary`, `is_sync`, and `name`. All parameters have
      the same meaning as in `box.schema.space.create(name, options)`.
      
      Note, `name` parameter in `box.schema.space.create` is separated
      from `options` table. It is not so in `space:alter(options)` -
      here all parameters are specified in the `options` table.
      
      The function returns nothing on success and throws an error on
      failure.
      
      On the 'Synchronous replication' page, in 'Limitations and known
      problems', it is necessary to delete the note about "no way to
      enable synchronous replication for existing spaces". Instead it
      should say that it can be enabled using
      `space:alter({is_sync = true})` and disabled by setting
      `is_sync = false`.
      https://www.tarantool.io/en/doc/2.5/book/replication/repl_sync/#limitations-and-known-problems
      
      The function will appear in >= 2.5.2.
      8c965989
    • xrow: drop xrow_header_dup_body · 9dd2e2e4
      Cyrill Gorcunov authored
      
      We no longer use it.
      
      Closes #5129
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      9dd2e2e4
    • txn: txn_add_redo -- drop synchro processing · 1d7e256b
      Cyrill Gorcunov authored
      
      Since we no longer use txn engine for synchro
      packets processing this code is never executed.
      
      Part-of #5129
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      1d7e256b
    • applier: process synchro requests without txn engine · cfccfd44
      Cyrill Gorcunov authored
      
      Transaction processing code is very heavy simply because
      transactions carry various data and involve a number
      of other mechanisms to proceed.

      In turn, when we receive a confirm or rollback packet from
      another node in a cluster we just need to inspect the limbo
      queue and write this packet into a WAL journal. So calling
      a bunch of txn engine helpers is simply a waste of cycles.

      Thus let's rather handle them in a special light way:
      
       - allocate synchro_entry structure which would carry
         the journal entry itself and encoded message
       - process limbo queue to mark confirmed/rollback'ed
         messages
       - finally write this synchro_entry into a journal
      
      Which is way simpler.
      
      Part-of #5129
      
      Suggested-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      cfccfd44
    • qsync: direct write of CONFIRM/ROLLBACK into a journal · 41b31ff0
      Cyrill Gorcunov authored
      
      When we need to write a CONFIRM or ROLLBACK message (which is
      a binary record in msgpack format) into a journal, we use txn code
      to allocate a new transaction, encode a message there, and make it
      walk the long txn path before it hits the journal. This not only
      wastes resources but is also somewhat strange from an architectural
      point of view.

      Instead let's encode a record on the stack and write it to the journal
      directly.
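
As a rough illustration of encoding such a record on the stack, the sketch below writes a two-pair MessagePack map into a fixed-size buffer. The function names, the key constants, and the choice of fields are assumptions made for this example; only the MessagePack byte layout (fixmap, positive fixint, uint 64) is taken from the format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative map keys, chosen for this sketch only. */
enum { KEY_REPLICA_ID = 0x02, KEY_LSN = 0x03 };

/* Worst case: 1 byte map header + 2 * (1 byte key + 9 byte uint 64). */
enum { SYNCHRO_BODY_MAX = 1 + 2 * (1 + 9) };

/* Encode an unsigned integer in MessagePack, return the new position. */
uint8_t *
encode_uint(uint8_t *p, uint64_t v)
{
	if (v <= 0x7f) {
		*p++ = (uint8_t)v; /* positive fixint */
		return p;
	}
	*p++ = 0xcf; /* uint 64 marker, big-endian payload follows */
	for (int shift = 56; shift >= 0; shift -= 8)
		*p++ = (uint8_t)(v >> shift);
	return p;
}

/* Encode {KEY_REPLICA_ID: replica_id, KEY_LSN: lsn}, return its size. */
size_t
encode_synchro_body(uint8_t *buf, uint32_t replica_id, uint64_t lsn)
{
	uint8_t *p = buf;
	*p++ = 0x82; /* fixmap with 2 key-value pairs */
	p = encode_uint(p, KEY_REPLICA_ID);
	p = encode_uint(p, replica_id);
	p = encode_uint(p, KEY_LSN);
	p = encode_uint(p, lsn);
	assert((size_t)(p - buf) <= SYNCHRO_BODY_MAX);
	return (size_t)(p - buf);
}
```

Because the maximum size is known at compile time, a caller can declare `uint8_t buf[SYNCHRO_BODY_MAX]` as a local variable and avoid any heap allocation.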
      
      Part-of #5129
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      41b31ff0
    • qsync: provide a binary form of syncro entries · 7e1ce153
      Cyrill Gorcunov authored
      
      These msgpack entries will be needed to write them
      down to a journal without involving the txn engine. At the same
      time we would like to be able to allocate them on the stack,
      so the binary form is predefined.
      
      Part-of #5129
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      7e1ce153
    • journal: add journal_entry_create helper · 580abaee
      Cyrill Gorcunov authored
      
      To create raw journal entries. We will use it
      to write confirm/rollback entries.
      
      Part-of #5129
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      580abaee
    • journal: bind asynchronous write completion to an entry · fd145ed5
      Cyrill Gorcunov authored
      
      In commit 77ba0e35 we've redesigned
      wal journal operations such that asynchronous write completion
      is a single instance per journal.
      
      It turned out that such simplification is too tight and doesn't
      allow us to pass entries into the journal with custom completions.
      
      So let's bring this ability back. We will need it to be able
      to write "confirm" records into wal directly without touching
      transactions code at all.
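
The idea of binding the completion to the entry rather than to the journal can be sketched like this. All type and function names here are hypothetical and only illustrate the design; they are not the journal API.

```c
#include <assert.h>
#include <stddef.h>

struct entry;
typedef void (*entry_done_f)(struct entry *entry);

/* Hypothetical journal entry that carries its own completion callback. */
struct entry {
	int res;              /* Write result, set by the journal. */
	entry_done_f on_done; /* Invoked once the write is finished. */
	void *on_done_data;   /* Opaque context for the callback. */
};

void
entry_create(struct entry *e, entry_done_f on_done, void *data)
{
	e->res = -1;
	e->on_done = on_done;
	e->on_done_data = data;
}

/* Toy journal: records the result and fires the entry's own callback. */
void
journal_complete(struct entry *e, int res)
{
	assert(e->on_done != NULL);
	e->res = res;
	e->on_done(e);
}

/* Example completion: count how many entries finished successfully. */
void
count_success(struct entry *e)
{
	if (e->res >= 0)
		++*(int *)e->on_done_data;
}
```

With this shape, a transaction and a raw "confirm" record can share one journal while each supplies a different completion function.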
      
      Part-of #5129
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      fd145ed5
  4. Aug 20, 2020
  5. Aug 17, 2020
    • xrow: introduce struct synchro_request · ee07eab4
      Vladislav Shpilevoy authored
      All requests saved to WAL and transmitted through network have
      their own request structure with parameters:
      - struct request for DML;
      - struct call_request for CALL/EVAL;
      - struct auth_request for AUTH;
      - struct ballot for VOTE;
      - struct sql_request for SQL;
      - struct greeting for greeting.
      
      It is done for a reason - not to pass all the request parameters
      into each function one by one, and manage them all at once
      instead.
      
      For the synchronous requests IPROTO_CONFIRM and IPROTO_ROLLBACK
      this was not done, because so far it was not too hard to carry
      just 2 parameters, lsn and replica_id, from their body.

      But that will change in #5129. In fact these requests have more
      parameters, but they were filled by the txn module, since
      synchro requests were saved to WAL via transactions (due to the
      lack of an alternative API to access WAL).
      
      After #5129 it will be necessary to save LSN and replica_id of the
      request author. This patch introduces struct synchro_request to
      simplify extension of the synchro parameters.
      
      Closes #5151
      Needed for #5129
      ee07eab4
  6. Aug 15, 2020
    • applier: drop a couple of unnecessary arguments · 7e16e45e
      Vladislav Shpilevoy authored
      Applier on_rollback and on_wal_write don't need any arguments -
      they either work with a global state, or with the signaled applier
      stored inside the trigger.
      
      However, the transaction object was passed into on_wal_write() and
      on_rollback(), unused.
      
      Even if it would be used, it should have been fixed, because soon
      these triggers will be fired not only for traditional 'txn'
      transactions. They will be used by the synchro request WAL writes
      too - they don't have 'transactions'.
      
      Part of #5129
      7e16e45e
  7. Aug 13, 2020
    • Ensure all curl symbols are exported · 29ec6289
      Yaroslav Dynnikov authored
      In the recent update of libcurl (2.5.0-278-g807c7fa58) its layout has
      changed: private function `Curl_version_init()` which used to fill-in
      info structure was eliminated. As a result, no symbols for
      `libcurl_la-version.o` remained used, so it wasn't included in tarantool
      binary. And `curl_version` and `curl_version_info` symbols went missing.
      
      According to libcurl naming conventions all exported symbols are named
      as `curl_*`. This patch lists them all explicitly in `exports.h` and
      adds the test.
      
      Close #5223
      29ec6289
  8. Aug 12, 2020
    • tuple: fix access by JSON path starting from '[*]' · 718267aa
      Vladislav Shpilevoy authored
      Tuple JSON field access crashed when '[*]' was used as a first
      part of the JSON path. The patch makes it treated like 'field not
      found'.
      
      Follow-up #5224
      718267aa
    • tuple: fix multikey field JSON access crash · 5c15df68
      Vladislav Shpilevoy authored
      When a tuple had format with multikey indexes in it, any attempt
      to get a multikey indexed field by a JSON path from Lua led to a
      crash.
      
      That was because of incorrect interpretation of offset slot value
      in tuple's field map.
      
      Tuple field map is an array stored before the tuple's MessagePack
      data. Each element is a 4 byte offset to an indexed value to be
      able to get it for O(1) time without MessagePack decoding of all
      the previous fields.
      
      At least it was so before multikeys. Now tuple field map is not
      just an array. It is rather a 2-level array, somewhat similar to
      ext4 FS. Some elements of the root array are positive numbers
      pointing at data. Some elements point at a second 'indirect'
      array, so called 'extra', size of which is individual for each
      tuple. These second arrays are used by multikey indexes to store
      offsets to each multikey indexed value in a tuple.
      
      It means that if there is an offset slot, it can't just be used
      as is. That is allowed only if the field is not multikey. Otherwise
      it is necessary to somehow get an index in the second 'indirect'
      array.
      
      This is what was happening - a multikey field was found, its
      offset slot was valid, but it was pointing at an 'indirect' array,
      not at the data. JSON tuple field access tried to use it as a data
      offset.
      
      The patch makes JSON field access degrade to fullscan when a field
      is multikey, but no multikey array index is provided.
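
The two-level lookup described above might look roughly like the sketch below. The encoding of the root array (positive value is a data offset, negative value refers to an 'extra' array, zero means no offset) is an assumption made for this example and is not the actual field map layout; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Second-level ('extra') array: one offset per multikey item. */
struct extra {
	uint32_t size;
	const uint32_t *offsets;
};

/*
 * Toy field map. Assumed encoding for this sketch: root[i] > 0 is a
 * data offset, root[i] < 0 refers to extras[-root[i] - 1], and 0
 * means that no offset is stored.
 */
struct field_map {
	const int32_t *root;
	const struct extra *extras;
};

enum { MULTIKEY_NONE = -1 };

/* Return the data offset, or 0 when the caller must do a full scan. */
uint32_t
field_map_get_offset(const struct field_map *map, int slot, int multikey_idx)
{
	int32_t v = map->root[slot];
	if (v >= 0)
		return (uint32_t)v;
	if (multikey_idx == MULTIKEY_NONE)
		return 0; /* The slot points at an extra array, not at data. */
	const struct extra *e = &map->extras[-v - 1];
	if ((uint32_t)multikey_idx >= e->size)
		return 0;
	return e->offsets[multikey_idx];
}
```

The important branch is the one for `MULTIKEY_NONE`: a multikey slot without an array index is never interpreted as a data offset, which corresponds to the fallback to a full scan.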
      
      Closes #5224
      5c15df68
  9. Aug 11, 2020
    • box: snapshot should not include rolled back data · 6f70020d
      Vladislav Shpilevoy authored
      Box.snapshot() could include rolled back data in case synchronous
      transaction ROLLBACK arrived during WAL rotation in preparation of
      a checkpoint.
      
      More specifically, snapshot consists of fixating the engines'
      content (creation of a read-view), doing WAL rotation, and writing
      the snapshot itself. All data changes after content fixation won't
      go into the snap. So if ROLLBACK arrives during WAL rotation, the
      fixated content will have rolled back data, not present in the
      newest dataset.
      
      The patch makes it fail if during WAL rotation anything was rolled
      back. The bug sometimes appeared in an existing test about qsync
      snapshots, but with a very poor reproducibility. In a new test
      file it is reproduced 100% without the patch.
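
One way to detect a rollback that happens during WAL rotation is to compare a counter before and after the rotation. The sketch below assumes such a counter exists; the `limbo` structure, the callbacks, and `checkpoint()` are all hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter bumped on every rolled back sync transaction. */
struct limbo {
	int64_t rollback_count;
};

typedef void (*rotate_f)(struct limbo *limbo);

/* Toy WAL rotation during which nothing happens. */
void
rotate_quiet(struct limbo *limbo)
{
	(void)limbo;
}

/* Toy WAL rotation during which a ROLLBACK arrives. */
void
rotate_with_rollback(struct limbo *limbo)
{
	limbo->rollback_count++;
}

/* Return 0 if the snapshot may be written, -1 if it must be aborted. */
int
checkpoint(struct limbo *limbo, rotate_f rotate_wal)
{
	/* The engines' content is fixated (read view created) here. */
	int64_t before = limbo->rollback_count;
	rotate_wal(limbo); /* May yield, so a ROLLBACK can arrive meanwhile. */
	if (limbo->rollback_count != before)
		return -1; /* The read view may hold rolled back data. */
	/* The snapshot itself would be written here. */
	return 0;
}
```

Failing the checkpoint is the simple option: the caller can retry later, when the fixated content and the newest dataset agree again.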
      
      Closes #5167
      6f70020d
  10. Jul 31, 2020
    • txn_limbo: handle duplicate ACKs · 6e11674d
      Vladislav Shpilevoy authored
      Replica can send the same ACK multiple times. This is relatively
      easy to achieve.
      
      ACK is a message from the replica containing its vclock. It is
      sent on each replica's vclock update. The update does not necessarily
      mean that master's LSN was changed. Replica could write
      something locally, with its own instance_id. Vclock is changed,
      sent to the master, but from the limbo's point of view it looks
      like duplicated ACK, because the received master's LSN didn't
      change.
      
      The patch makes limbo ignore duplicated ACKs.
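
A duplicate can be recognized by remembering, per replica, the last master LSN it acknowledged. The following is a minimal sketch under that assumption; the structure and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { REPLICA_MAX = 32 };

/* Last master LSN acknowledged by each replica (hypothetical layout). */
struct ack_tracker {
	int64_t acked_lsn[REPLICA_MAX];
};

/* Return true if the ACK carries new information, false if duplicate. */
bool
ack_tracker_update(struct ack_tracker *t, uint32_t replica_id,
		   int64_t master_lsn)
{
	assert(replica_id < REPLICA_MAX);
	if (master_lsn <= t->acked_lsn[replica_id])
		return false; /* Vclock changed, but not the master's part. */
	t->acked_lsn[replica_id] = master_lsn;
	return true;
}
```

Only ACKs for which the function returns true would need to be counted towards the quorum.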
      
      Closes #5195
      Part of #5219
      6e11674d
    • txn_limbo: handle CONFIRM during ROLLBACK · e7559bfe
      Vladislav Shpilevoy authored
      Limbo could try to CONFIRM LSN whose ROLLBACK is in progress. This
      is how it could happen:
      
      - A synchronous transaction is created, written to WAL;
      - The fiber sleeps in the limbo waiting for CONFIRM or timeout;
      - Timeout happens. ROLLBACK for this and all next LSNs is sent to
        WAL;
      - Replica receives the transaction, sends ACK;
      - Master receives ACK, starts writing CONFIRM for the LSN, whose
        ROLLBACK is in progress right now.
      
      Another case - attempt to lower synchro quorum during ROLLBACK
      write. It also could try to write CONFIRM.
      
      The patch skips CONFIRM if there is a ROLLBACK in progress. It is
      not even necessary to check LSNs, because ROLLBACK always reverts
      the entire limbo queue, so it will cancel all pending transactions
      with all LSNs, and new commits are rolled back even before they
      try to go to WAL. CONFIRM can't help with anything at that point.
      
      Part of #5185
      e7559bfe
  11. Jul 30, 2020
    • txn_limbo: handle ROLLBACK during CONFIRM · 849ba7dc
      Vladislav Shpilevoy authored
      Limbo could try to ROLLBACK LSN whose CONFIRM is in progress. This
      is how it could happen:
      
      - A synchronous transaction is created, written to WAL;
      - The fiber sleeps in the limbo waiting for CONFIRM or timeout;
      - Replica receives the transaction, sends ACK;
      - Master receives ACK, starts writing CONFIRM;
      - The first fiber times out and tries to write ROLLBACK for the
        LSN, whose CONFIRM is in progress right now.
      
      The patch adds more checks to the 'timed out' code path to see if
      it isn't too late to write ROLLBACK. If CONFIRM is in progress,
      the fiber will wait for its finish.
      
      Part of #5185
      849ba7dc
    • txn_limbo: panic when synchro WAL write fails · 61385877
      Vladislav Shpilevoy authored
      CONFIRM and ROLLBACK go to WAL. Their WAL write can fail just like
      any other WAL write. However it is not clear what to do in that
      case, especially in case of ROLLBACK fail.
      
      The patch adds panic() stub so as to at least terminate the
      instance. Before the patch it would work like nothing happened,
      with undefined behaviour.
      
      Closes #5159
      61385877
  12. Jul 29, 2020
    • txn_limbo: reduce fiber_set_cancellable() calls · a6ab3771
      Vladislav Shpilevoy authored
      The calls were added before and after each cond_wait() so that the
      fiber couldn't be woken up externally, for example, from Lua.
      
      But it is not necessary to flip the flag on each wait() call. It
      is enough to make it 2 times: forbid cancellation in the beginning
      of txn_limbo_wait_complete(), and return the old value back in the
      end.
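
The "flip the flag twice" pattern can be sketched as below. The global state and both functions are hypothetical stand-ins written for this example; they only mimic a setter that returns the previous value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fiber state, for illustration only. */
static bool is_cancellable = true;
static int flag_flips = 0;

/* Set the flag and return its previous value. */
bool
set_cancellable(bool yesno)
{
	bool prev = is_cancellable;
	is_cancellable = yesno;
	flag_flips++;
	return prev;
}

/* Wait for `wakeups` wakeups, flipping the flag only twice in total. */
int
wait_complete(int wakeups)
{
	bool was_cancellable = set_cancellable(false);
	int waits = 0;
	while (waits < wakeups) {
		assert(!is_cancellable); /* Stays off across every wait. */
		waits++;                 /* Stands in for one cond_wait(). */
	}
	set_cancellable(was_cancellable);
	return waits;
}
```

However many waits happen inside the loop, the flag is changed once on entry and restored once on exit.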
      a6ab3771
    • txn_limbo: don't duplicate confirmations in WAL · 4920782e
      Vladislav Shpilevoy authored
      When an ACK was received for an already confirmed transaction
      whose CONFIRM WAL write is in progress, it produced a second
      CONFIRM in WAL with the same LSN.
      
      That was unnecessary work taking time and disk space for WAL
      records. It didn't lead to any bugs, it was just very
      inefficient.

      This patch makes the confirmation LSN grow monotonically. If more
      ACKs are received for an already confirmed LSN, its confirmation
      is not written a second time.
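
A monotonically growing confirmation LSN can be sketched with a single comparison. The `limbo` structure and its fields below are assumptions for this example, with a counter standing in for actual WAL writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limbo state relevant to confirmations. */
struct limbo {
	int64_t confirmed_lsn; /* Highest LSN whose CONFIRM was started. */
	int confirm_writes;    /* Number of CONFIRM records sent to WAL. */
};

/* Start a CONFIRM write only if it advances the confirmed LSN. */
bool
limbo_confirm(struct limbo *l, int64_t lsn)
{
	if (lsn <= l->confirmed_lsn)
		return false; /* Already confirmed or being confirmed. */
	l->confirmed_lsn = lsn;
	l->confirm_writes++;
	return true;
}
```

Updating `confirmed_lsn` when the write starts, not when it finishes, is what prevents a second CONFIRM while the first one is still in progress.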
      
      Closes #5144
      4920782e
    • lua/utils: improve luaT_newthread performance · 604b98ee
      Igor Munkin authored
      
      <luaT_newthread> created a new GCfunc object for the helper invoked in a
      protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
      call. The change introduces a static reference to a GCfunc object for
      <luaT_newthread_wrapper> to be initialized on Tarantool startup to
      reduce Lua GC memory usage.
      
      Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
      created Lua coroutine needs to be pushed back to prevent its sweep. So
      to reduce guest stack manipulations <lua_cpcall> is replaced with
      <lua_pcall> and the resulting Lua thread is obtained via guest stack.
      
      Part of #5201
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
      604b98ee
  13. Jul 28, 2020
    • test: fix flaky qsync_with_anon.test.lua again · cd292adb
      Vladislav Shpilevoy authored
      One of the test cases had 2 problems.
      
      - The same as in the previous commit - it started a sync
        transaction on master, switched to replica assuming it sees
        everything up to this sync transaction, but it still can see
        data from the previous test case;
      
      - The test case tried to write a sync transaction on master, got
        timeout, switched to replica to ensure the data is removed here
        too, but since dirty reads are possible, it could happen that the
        data was delivered to the replica but ROLLBACK was not yet. On
        the replica the rolled back data still could be visible.
      
      The first issue is solved by flushing master's state to replica
      via making a successful sync transaction.
      
      The second issue is fixed by splitting it into more steps, not
      depending on timeouts (1000 is considered infinity).
      
      Closes #5196
      cd292adb
    • test: fix flaky qsync_snapshots.test.lua again · 3290d48a
      Vladislav Shpilevoy authored
      One of the test cases started a sync transaction on master,
      switched to replica, and tried to do some actions assuming that
      the latest master data has arrived here.
      
      But in fact the replica could be far behind the master. It could
      still contain data from the previous test case. That led to a
      bug where it looked as if the replica had some data committed
      on it but not committed on master. This was just data from the
      previous test case.
      
      The issue is solved by flushing master's state to replica via
      making a successful sync transaction.
      
      Closes #5167
      3290d48a
    • box: remove unnecessary bootstrap. file · 47551843
      Timur Safin authored
      The `src/box/bootstrap.` file (please pay attention to the trailing
      dot) prevents checkout of Tarantool sources on some file systems
      (e.g. Windows NTFS) which disallow creation of such files.

      The funny thing is that this file is unnecessary for the build
      process and can simply be deleted.
      
      Closes #4781
      47551843
    • gitlab-ci: restore lto testing on OSX · 412fd7b9
      Alexander V. Tikhonov authored
      Found that after commit:
      
        7faa1abe "gitlab-ci: implement OSX 10.14 testing on mac mini"
      
      the environment variable CMAKE_EXTRA_PARAMS, which turns on the LTO
      flag for OSX compilation, was mistakenly not updated, so LTO was in
      fact disabled in OSX testing. Some later commits then introduced LTO
      errors that went unnoticed; the current commit fixes them.
      
      Closes #5160
      412fd7b9
    • lua: introduce function to check that passed value is uuid · ab3e332c
      Oleg Babin authored
      
      We already have the is_decimal function, which checks whether a
      value is a decimal. Now that Tarantool supports the UUID type, it
      will quite often be necessary to check that a value has the UUID
      type as well. This patch introduces the "is_uuid" function for this
      purpose.
      
      Closes #5171
      
      Signed-off-by: Oleg Babin <olegrok@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Reviewed-by: Leonid Vasiliev <lvasiliev@tarantool.org>
      
      @TarantoolBot document
      Title: uuid.is_uuid
      
      is_uuid function returns "true" if specified value has uuid type
      and "false" otherwise.
      ab3e332c
    • lua/log: add missing exports · 625095d9
      Cyrill Gorcunov authored
      
      When building with LTO non exported symbols might
      be discarded while we have implicit use in log.lua
      code.
      
      Introduced by a94a9b3f
      
      Fixes #5160
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      625095d9
    • lua: return getaddrinfo() errors · acec5a82
      Roman Khabibov authored
      Add getaddrinfo() error reporting to several functions of the
      socket module. Now getaddrinfo() returns a pair of values (nil and
      an error message) in case of error.
      
      Closes #4138
      
      @TarantoolBot document
      Title: socket API changes
      
      * socket.getaddrinfo()
      
      Return error message as the second return value.
      
      Example:
      tarantool> socket.getaddrinfo('non_exists_hostname', 3301)
      ---
      - null
      - 'getaddrinfo: nodename nor servname provided, or not known'
      ...
      
      * socket.tcp_connect()
      
      Return error message as the second return value.
      
      Example:
      tarantool> socket.tcp_connect('non_exists_hostname', 3301)
      ---
      - null
      - 'getaddrinfo: nodename nor servname provided, or not known'
      ...
      
      * socket.bind()
      
      Return error message as the second return value.
      
      Example:
      tarantool> socket.tcp_connect('non_exists_hostname', 3301)
      ---
      - null
      - 'getaddrinfo: nodename nor servname provided, or not known'
      ...
      
      * socket.tcp_server()
      
      Return error message as the second return value.
      
      Example:
      tarantool> socket.tcp_connect('non_exists_hostname', 3301)
      ---
      - null
      - 'getaddrinfo: nodename nor servname provided, or not known'
      ...
      acec5a82
    • coio/say: fix getaddrinfo error handling on macOS · eb73eda1
      Roman Khabibov authored
      Before this patch, the branch handling getaddrinfo() error codes
      couldn't be reached on macOS, because the codes are greater than 0
      there (the assumption "rc < 0" in commit ea1da04d is incorrect for
      macOS).
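
A portable check relies only on what POSIX guarantees: getaddrinfo() returns zero on success and a nonzero code on failure. The helper below is a small illustration written for this note, not code from the patch; `is_numeric_host()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Return true if `host` is a valid numeric address. AI_NUMERICHOST
 * keeps getaddrinfo() from doing any name resolution.
 */
bool
is_numeric_host(const char *host)
{
	struct addrinfo hints;
	struct addrinfo *res = NULL;
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_NUMERICHOST;
	int rc = getaddrinfo(host, NULL, &hints, &res);
	/*
	 * Portable check: only a nonzero code is promised on failure.
	 * `rc < 0` would miss errors where EAI_* values are positive.
	 */
	if (rc != 0)
		return false;
	freeaddrinfo(res);
	return true;
}
```

The message for a failed call can be obtained with gai_strerror(rc), independent of the sign of the code.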
      
      Note: diag_log() in say.c was added because otherwise the error would
      be hidden in the case of panic(). Also, two diag_set() calls were
      added in syslog_connect_unix() to avoid asserts in this diag_log().
      
      Needed for #4138
      eb73eda1
  14. Jul 27, 2020
    • txn_limbo: introduce cascading rollback · dcaae66e
      Vladislav Shpilevoy authored
      Cascading rollback is a state when existing transactions are being
      rolled back right now, and newer transactions can't be committed
      either, to preserve the 'reversed rollback order' rule.
      
      WAL writer can enter such state when something goes wrong with
      writing to disk. Limbo didn't have that feature until now.
      
      Consider an example why limbo should be able to turn on cascading
      rollback. Without cascading rollback it can happen that a
      transaction is seemingly rolled back, but after restart it is
      committed and visible. The scenario:
      
          * Master writes a sync transaction to WAL with LSN1;
      
          * It starts waiting for ACKs;
      
          * No ACKs for timeout - it starts writing to WAL the command
            ROLLBACK(LSN1). To rollback everything with LSN >= LSN1
            but < LSN of the ROLLBACK record itself;
      
          * Another fiber starts a new transaction, while ROLLBACK is in
            progress;
      
          * Limbo is not empty, so the new transaction is added there.
            Then it also starts writing itself to WAL;
      
          * ROLLBACK finishes WAL write. It rolls back all the
            transactions in the limbo to conform with the 'reversed
            rollback order' rule. Including the latest transaction;
      
          * The latest transaction finished its WAL write with LSN2 and
            sees that it was rolled back by the limbo already.
      
      All seems to be fine, but actually what happened is that
      ROLLBACK(LSN1) is written to WAL *before* the latest transaction
      with LSN2. Now when restart happens, ROLLBACK(LSN1) is replayed
      first, and then the latest LSN2 transaction is replayed second -
      it will be committed successfully, and will be visible.
      
      In summary: a transaction cancelled its own rollback after an
      instance restart. The expected behaviour is that while ROLLBACK is
      in progress, newer transactions should not even try going to WAL.
      They should be rolled back immediately.
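
The scenario can be replayed with a toy WAL model. Everything below is a simplified sketch with hypothetical names: a ROLLBACK record undoes only the transaction records written before it, and the `cascading` argument stands in for the new limbo behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum rec_type { REC_TXN, REC_ROLLBACK };

struct wal_rec {
	enum rec_type type;
	int64_t lsn; /* TXN: its LSN. ROLLBACK: the first LSN to undo. */
};

enum { WAL_MAX = 16 };

struct wal {
	struct wal_rec recs[WAL_MAX];
	int count;
	bool is_in_rollback;
};

/* Append a transaction unless cascading rollback forbids it. */
bool
wal_write_txn(struct wal *w, int64_t lsn, bool cascading)
{
	if (cascading && w->is_in_rollback)
		return false; /* Rolled back before reaching the WAL. */
	assert(w->count < WAL_MAX);
	w->recs[w->count++] = (struct wal_rec){REC_TXN, lsn};
	return true;
}

/* Start writing ROLLBACK for everything with LSN >= from_lsn. */
void
wal_begin_rollback(struct wal *w, int64_t from_lsn)
{
	assert(w->count < WAL_MAX);
	w->is_in_rollback = true;
	w->recs[w->count++] = (struct wal_rec){REC_ROLLBACK, from_lsn};
}

/* Replay after restart: count transactions that end up visible. */
int
wal_replay(const struct wal *w)
{
	bool alive[WAL_MAX] = {false};
	for (int i = 0; i < w->count; i++) {
		if (w->recs[i].type == REC_TXN) {
			alive[i] = true;
			continue;
		}
		/* ROLLBACK undoes only the records written before it. */
		for (int j = 0; j < i; j++) {
			if (w->recs[j].type == REC_TXN &&
			    w->recs[j].lsn >= w->recs[i].lsn)
				alive[j] = false;
		}
	}
	int visible = 0;
	for (int i = 0; i < w->count; i++)
		visible += alive[i];
	return visible;
}
```

Without cascading, the log ends up as TXN, ROLLBACK, TXN and the last transaction survives the replay. With cascading, the second transaction never reaches the log, so nothing is visible after restart.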
      
      The patch implements the cascading rollback for the limbo.
      
      Closes #5140
      dcaae66e
  15. Jul 24, 2020
  16. Jul 22, 2020
    • txn: remove TXN_IS_DONE check from txn_commit() · 8c50e069
      Vladislav Shpilevoy authored
      
      TXN_IS_DONE was used in txn_commit() to finalize the transaction
      in case it is not finished yet. But this happens only in rather
      uncommon cases: during bootstrap and recovery. During normal
      operation the transaction is always finished when the WAL thread
      returns it to the TX thread after a disk write.

      So this is a matter of the journal, and should be solved there, not
      in txn code with some crutch, especially in such a hot path.

      This commit makes sure that after journal_write() the transaction is
      always already done, i.e. txn_complete_async() was called. Nothing
      changes for the normal operation mode except that there is one 'if'
      fewer.
      
      Also the commit disables snap_quorum_delay.test, which uses
      internal API of replication and txn modules assuming the journal
      is initialized somewhere inside. But now it is not, and it can't
      be fixed in a sane way inside the test. It will be
      deleted/rewritten later.
      
      Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
      8c50e069
    • txn_limbo: single function to confirm transactions · 71863597
      Vladislav Shpilevoy authored
      
      Before this patch there were some structural problems with the
      limbo code (still are now, but less).
      
      It wasn't clear when lifetime of a transaction ends. When it is
      committed, when rolled back, when completion is called. With
      rollback things are more or less bearable, but not with commit.
      
      Transaction's completion could be done at the same time with
      setting limbo_entry.is_commit flag. Or with setting the flag +
      yield + completion. This led to having a weird crutch in
      txn_limbo_wait_complete() to make async transactions explicitly
      wait until the previous sync transactions are written to WAL.
      
      The commit flag could be set in 3!!! different places - parameters
      update handler, ACK handler, and confirmation handler. In these 3
      places there were various assumptions making it really hard to
      understand what is happening here and there, what is the
      difference. Not counting how much code was duplicated.
      
      This patch makes the commit path always come through confirmation
      reader, as the most logical place for that, and already covering
      almost all the tricky cases.
      
      Now there is a guarantee that a transaction is completed if it is
      removed from the limbo queue. No need to check TXN_IS_DONE, wait
      for anything, whatsoever.
      
      Also the patch is a preparation for removal of TXN_IS_DONE check
      from the main path of txn_commit(). txn_limbo_wait_complete()
      shouldn't ever return a not finished transaction for that.
      
      Part of #5143
      
      Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
      71863597
    • stailq: provide better names for args · be84b6de
      Cyrill Gorcunov authored
      
      Use self explanatory dest and src (like in strcat).
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      be84b6de
    • luajit: bump new version · bc58c067
      Kirill Yukhin authored
      * gdb: fix the extension to be loaded with Python 2
      bc58c067
    • gitlab-ci: setup local cleanup/checkout processes · fca966a2
      Alexander V. Tikhonov authored
      Set up Docker-based cleanup for all of the jobs to avoid failures
      when a previous job used Docker and reassigned the temporary files
      in the working directory. If Docker is not available, there is no
      need to use it for cleanup, and cleanup runs using the shell.
      GitLab CI clean flags are disabled and cleanup is reorganized
      locally as shown at [1].

      Disabled the default GitLab CI checkout process as shown at [2] to
      be able to fix the repository before the checkout. It was found
      that previously run GitLab CI jobs could change file permissions in
      the repository, which broke checkouts in subsequent jobs. The
      checkout strategy follows [3], and the submodule update strategy
      follows [4]. The local submodule update routine in the .gitlab.mk
      file became unneeded and was removed.
      
      List of steps done locally instead of GitLab CI preparations:
      
      1. Check/clone the Tarantool repository with submodules.
      2. For shell based jobs change ownership of all the sources
         to 'gitlab-runner' user. (NOTE: in Docker based jobs the
         'gitlab-runner' user is not known.)
      3. Fetch Tarantool sources with branches and force checkout
         of the testing commit.
      4. Update submodules recursively (use force where supported).
      5. Clean the source tree of all files that are not part of the
         repository.
      
      [1] https://docs.gitlab.com/ee/ci/yaml/README.html#git-clean-flags
      [2] https://docs.gitlab.com/ee/ci/yaml/README.html#git-strategy
      [3] https://docs.gitlab.com/ee/ci/yaml/README.html#git-checkout
      [4] https://docs.gitlab.com/ee/ci/yaml/README.html#git-submodule-strategy
      
      Follows up #5036
      fca966a2