Skip to content
Snippets Groups Projects
  1. Dec 23, 2020
    • Mergen Imeev's avatar
      box: remove unnecessary rights from peristent functions · 4a50e1c4
      Mergen Imeev authored
      After this patch, the persistent functions "box.schema.user.info" and
      "LUA" will have the same rights as the user who executed them.
      
      The problem was that setuid was unnecessarily set. Because of this,
      these functions had the same rights as the user who created them.
      However, they must have the same rights as the user who used them.
      
      Fixes tarantool/security#1
      4a50e1c4
    • Sergey Kaplun's avatar
      lua: avoid panic if HOOK_GC is not an active hook · 95aa7d20
      Sergey Kaplun authored
      
      Platform panic occurs when fiber.yield() is used within any active
      (i.e. being executed now) hook.
      
      It is a regression caused by 96dbc49d
      ('lua: prohibit fiber yield when GC hook is active').
      
      This patch fixes false positive panic in cases when VM is not running
      a GC hook.
      
      Relates to #4518
      Closes #5649
      
      Reported-by: default avatarMichael Filonenko <filonenko.mikhail@gmail.com>
      95aa7d20
    • Alexander V. Tikhonov's avatar
      gitlab-ci: add packaging for Fedora 32 · 1507c47f
      Alexander V. Tikhonov authored
      Added packaging jobs for Fedora 32.
      
      Closes #4966
      1507c47f
    • Alexander V. Tikhonov's avatar
      test: filter replication/skip_conflict_row output · 2828f912
      Alexander V. Tikhonov authored
      Found that test replication/skip_conflict_row.test.lua fails with
      output message in results file:
      
        [035] @@ -139,7 +139,19 @@
        [035]  -- applier is not in follow state
        [035]  test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
        [035]  ---
        [035] -- true
        [035] +- false
        [035] +- id: 1
        [035] +  uuid: f2084d3c-93f2-4267-925f-015df034d0a5
        [035] +  lsn: 553
        [035] +  upstream:
        [035] +    status: follow
        [035] +    idle: 0.0024020448327065
        [035] +    peer: unix/:/builds/4BUsapPU/0/tarantool/tarantool/test/var/035_replication/master.socket-iproto
        [035] +    lag: 0.0046234130859375
        [035] +  downstream:
        [035] +    status: follow
        [035] +    idle: 0.086121961474419
        [035] +    vclock: {2: 3, 1: 553}
        [035]  ...
        [035]  --
        [035]  -- gh-3977: check that NOP is written instead of conflicting row.
      
      Test could not be restarted with checksum because of changing values
      like UUID on each fail. It happend because test-run uses internal
      chain of functions wait_upstream() -> gen_box_info_replication_cond()
      which returns instance information on its fails. To avoid of it this
      output was redirected to log file instead of results file.
      2828f912
    • Alexander V. Tikhonov's avatar
      github-ci: set same workflow name as job · fa85c848
      Alexander V. Tikhonov authored
      Due to current testing schema uses separate pipelines per each testing
      job then workflow names should be the same as jobs to make it more
      visible on github actions results page [1].
      
      [1] - https://github.com/tarantool/tarantool/actions
      fa85c848
  2. Dec 22, 2020
  3. Dec 21, 2020
    • Vladislav Shpilevoy's avatar
      raft: fix crash on death timeout decrease · 4042b5c0
      Vladislav Shpilevoy authored
      If death timeout was decreased during waiting for leader death or
      discovery to a new value making the current death waiting end
      immediately, it could crash in libev.
      
      Because it would mean the remaining time until leader death became
      negative. The negative timeout was passed to libev without any
      checks, and there is an assertion, that a timeout should always
      be >= 0.
      
      This commit makes raft code covered almost on 100%, not counting
      one 'unreachable()' place.
      
      Closes #5303
      4042b5c0
    • Vladislav Shpilevoy's avatar
      raft: fix crash on election timeout decrease · ad713399
      Vladislav Shpilevoy authored
      If election timeout was decreased during election to a new value
      making the current election expired immediately, it could crash in
      libev.
      
      Because it would mean the remaining time until election end became
      negative. The negative timeout was passed to libev without any
      checks, and there is an assertion, that a timeout should always
      be >= 0.
      
      Part of #5303
      ad713399
    • Vladislav Shpilevoy's avatar
      raft: fix ignorance of bad state receipt · 3fe5367c
      Vladislav Shpilevoy authored
      raft_process_msg() only validated that the state is specified. But
      it didn't check if the state is inside of the allowed value range.
      
      Such messages were considered valid, and even their other fields
      were accepted. For instance, an invalid message could bump term.
      
      It is safer to reject such messages.
      
      Part of #5303
      3fe5367c
    • Vladislav Shpilevoy's avatar
      raft: fix crash when received 0 term message · 2f5522dd
      Vladislav Shpilevoy authored
      Term in raft can never be 0. It starts from 1 and can only grow.
      It was assumed it can't be received from other nodes because they
      do the same. There was an assertion for that.
      
      But in raft_msg, used as a transport unit between raft nodes, it
      was still possible to send 0 term. It could happen as a result of
      a bug, or if someone would try to mimic the protocol but made a
      mistake.
      
      That led to a crash in the assert in debug build.
      
      Part of #5303
      2f5522dd
    • Vladislav Shpilevoy's avatar
      test: introduce raft unit tests · e688280b
      Vladislav Shpilevoy authored
      Raft algorithm was tested only by functional Lua tests, as a part
      of the Tarantool executable.
      
      Functional testing of something like raft algorithm has drawbacks:
      
      - Not possible or too hard to cover some cases without error
        injections and/or good stability. Such as invalid messages, or
        small time durations, or a complex case which needs multiple
        steps to be reproduced. For instance, start WAL write, receive a
        message, finish the WAL write, and see if an expected thing
        happens.
      
      - Too long time to run when need to test not tiny timeouts. On the
        other hand, with tiny timeouts the test would become unstable.
      
      - Poor reproducibility due to random used in raft, due to system
        load, and number of other tests running in parallel.
      
      - Hard to debug, because for raft it is necessary to start one
        Tarantool process per raft instance.
      
      - Involves too much other systems, such as threads, real journal,
        relays, appliers, and so on. They can affect the process on the
        whole and reduce reproducibility and debug simplicity even more.
      
      Exactly the same problem existed for SWIM algorithm implemented as
      a module 'swim'. In order to test it, swim was moved to a separate
      library, refactored to be able to start many swims in the process
      (no global variables), its functions talking to other systems
      were virtualized (swim_ev, swim_transport), and substituted in the
      unit tests with fake analogue systems.
      
      In the unit tests these virtual functions were implemented
      differently, but the core swim algorithm was left intact and
      properly tested.
      
      The same is done for raft. This patch implements a set of helper
      functions and objects to unit test raft in raft_test_utils.c/.h
      files, and uses it to cover almost all the raft algorithm code.
      
      During implementation of the tests some bugs were found, which are
      not covered here, but fixed and covered in next commits.
      
      Part of #5303
      e688280b
    • Vladislav Shpilevoy's avatar
      raft: introduce raft_ev · 1d01394e
      Vladislav Shpilevoy authored
      Raft_ev.h/.c encapsulates usage of libev, to a certain extent. All
      libev functions are wrapped into raft_ev_* wrappers. Objects and
      types are left intact.
      
      This is done in order to be able to replace raft_ev.c in the
      soon coming unit tests with fakeev functions. That will allow to
      simulate time and catch all the raft events with 100%
      reproducibility, and without actual waiting for the events in
      real time.
      
      The similar approach is used for swim unit tests.
      
      Original raft core file is built into a new library 'raft_algo'.
      It is done for the sake of code coverage in unit tests. A test
      could be built by directly referencing raft.c in
      unit/CMakeLists.txt, but it can't apply compilation options to it,
      including gcov options.
      
      When raft.c is built into a library right where it is defined, it
      gets the gcov options, and the code covered by unit tests can be
      properly shown.
      
      Part of #5303
      1d01394e
    • Vladislav Shpilevoy's avatar
      fakesys: introduce fakeev_timer_remaining() · bce18d3c
      Vladislav Shpilevoy authored
      ev_timer_remaining() in libev returns number of seconds until the
      timer will expire. It is used in raft code.
      
      Raft is going to be tested using fakesys, and it means it needs
      fakeev analogue of ev_timer_remaining().
      
      Part of #5303
      bce18d3c
    • Vladislav Shpilevoy's avatar
      fakesys: fix ev_is_active not working on fake timers · bacf206c
      Vladislav Shpilevoy authored
      fakeev timers didn't set 'active' flag for ev_watcher objects.
      
      Because of that if fakeev_timer_start() was called, the timer
      wasn't visible as 'active' via libev API.
      
      Fakeev is supposed to be a drop-in simulation of libev, so it
      should "work" exactly the same.
      bacf206c
    • mechanik20051988's avatar
      memtx: fix a bug with insertion to space during recovery · 3bc4a156
      mechanik20051988 authored
      There was a problem whith on_schema_init trigger.
      This trigger gives a way to create on_replace trigger
      that will modify temporary or is_local spaces during recovery
      from snapshot, but on that stage of recovery process
      all space indexes are in special build mode when no check
      for uniqueness are made. I added a new function
      'is_recovery_finished' in box.ctl, which gives
      user ability to check that we are in snapshot recovery stage
      and can't insert/replace/update/upsert something. Also i added a
      check for corresponding operations, now they are failed
      if user tries to do them during snapshot recovery.
      
      @TarantoolBot document
      Title: Add 'is_recovery_finished' function
      Add 'is_recovery_finished' function in box.ctl
      to add user ability to check that we are
      in snapshot recovery stage and can't
      insert/replace/update/upsert something
      
      Closes #5304
      3bc4a156
  4. Dec 20, 2020
  5. Dec 19, 2020
    • Alexander Turenko's avatar
      test: update test-run (--test-timeout) · 6e6e7b29
      Alexander Turenko authored
      Added `--test-timeout <seconds>` argument. 110 seconds by default. The
      main idea is to don't reach --no-output-timeout when possible and so be
      able to restart a failed test (according to fragile test checksums) and
      store artifacts. PR #244.
      
      Fixed various cases when test-run doesn't wait for a stopping instance,
      doesn't try to stop it at all or issue just SIGTERM, without SIGKILL
      after some delay. PR #257.
      Unverified
      6e6e7b29
  6. Dec 16, 2020
    • Leonid Vasiliev's avatar
      sql: update temporary file name format · 427fdaa7
      Leonid Vasiliev authored
      
      The bug was consisted in fail when working with temporary files
      created by VDBE to sort large result of a `SELECT` statement with
      `ORDER BY`, `GROUP BY` clauses.
      
      Whats happen (step by step):
      - We have two instances on one node (sharded cluster).
      - A query is created that executes on both.
      - The first instance creates the name of the temporary file and
        checks a file with such name on existence.
      - The second instance creates the name of the temporary file
        (the same as in  first instance) and checks a file with such name
        on existence.
      - The first instance creates a file with the `SQL_OPEN_DELETEONCLOSE`
        flag.
      - The second instance opens(try to open) the same file.
      - The first instance closes (and removes) the temporary file.
      - The second instance tries to work with the file and fails.
      
      Why did it happen:
      The temporary file name format has a random part, but the random
      generator uses a fixed seed.
      
      When it was decided to use a fixed seed:
      32cb1ad2
      ("sql: drop useless code from os_unix.c")
      
      How the patch fixes the problem:
      The patch injects the PID in the temporary file name format.
      The generated name is unique for a single process (due to a random part)
      and unique between processes (due to the PID part).
      
      Alternatives:
      1) Use `O_TMPFILE` or `tmpfile()` (IMHO the best way to work with
        temporary files). In both cases, we need to update a significant
        part of the code, and some degradation can be added. It's hard to
        review.
      2) Return a random seed for the generator. As far as I understand,
        we want to have good reproducible system behavior, in which case
        it's good to use a fixed seed.
      3) Add reopening file with the flags `O_CREAT | O_EXCL` until we
        win the fight. Now we set such flags when opening a temporary file,
        but after that we try to open the file in `READONLY` mode and
        if ok - return the descriptor. This is strange logic for me and I
        don't want to add any aditional logic here. Also, such solution will
        add additional attempts to open the file.
      
      So, it look like such minimal changes will work fine and are simple
      to review.
      
      Co-authored-by: default avatarMergen <Imeev&lt;imeevma@gmail.com>
      
      Fixes #5537
      427fdaa7
    • Artem Starshov's avatar
      luacheck: change global vars to local in sql-tap · b1a4fed6
      Artem Starshov authored
      Fixed luacheck warning 111 (setting non-standard global variable)
      in test/sql-tap directory.
      Enabled this directory for checking W111 in
      config file(.luacheckrc).
      
      Changed almost all variables in test/sql-tap from globals
      to locals. In any cases, where variables need to be global,
      added explicit _G. prefix (table of globals).
      
      Fixes #5173
      Part-of #5464
      b1a4fed6
    • Artem Starshov's avatar
      bitset: replace zero-length array with flexible-array member · 9b0278f4
      Artem Starshov authored
      Zero-lenght arrays are GNU C extension.
      There's ISO C99 flexible array member, which
      is preffered mechanism to declare variable-length types.
      
      Flexible array member allows us to avoid applying sizeof
      operator cause it's incomplete type, so it will be an error
      at compile time. There're any moments else why it's better
      way to implement such structures via FAM:
      https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      
      In this issue it fixed gcc 10 warning:
      "warning: writing 1 byte into
      a region of size 0 [-Wstringop-overflow=]"
      
      Closes #4966
      Closes #5564
      9b0278f4
    • Artem Starshov's avatar
      sql: fix build with GCC 10 · a65d9c50
      Artem Starshov authored
      GCC 10 produces the following error:
      cc1: warning: function may return address of local variable [-Wreturn-local-addr]
      
      Fix it.
      
      Part-of #4966
      a65d9c50
  7. Dec 11, 2020
    • Serge Petrenko's avatar
      box: refactor tuple_field_raw to omit path checks · 28f3b2f1
      Serge Petrenko authored
      tuple_field_raw is an alias to tuple_field_raw_by_path with zero path.
      This involves multiple path != NULL checks which aren't needed for tuple
      field access by field number. The checks make this function rather slow
      compared to its 1.10 counterpart (see results below).
      
      In order to fix perf problems when JSON path indices aren't involved,
      factor out the part of tuple_field_raw_by_path which is responsible for
      direct field access by number and place it in tuple_field_raw.
      
      This patch was tested by snapshot recovery part involving secondary
      index building for a 1.5G snapshot with
      one space and one secondary index over 4 integer and one string field.
      Comparison table is below:
      
          Version    | time(seconds)  | Change relative to 1.10
      ---------------|----------------|------------------------
      1.10           |      2:24      |           -/-
      2.x(unpatched) |      3:03      |          + 27%
      2.x (patched)  |      2:10      |          - 10%
      
      Numbers below show cumulative time spent in tuple_compare_slowpath,
      for 1.10 / 2.x(unpatched) / 2.x(patched) for 15, 19 and 14 second
      profiles respectively: 13.9 / 17.8 / 12.5.
      
      tuple_field_raw() isn't measured directly, since it's inlined, but all
      its uses come from tuple_compare_slowpath.
      
      As the results show, we manage to be even faster, than 1.10 used to be
      in this test. This must be due to tuple comparison hints, which are
      present only in 2.x.
      
      Closes #4774
      28f3b2f1
    • Serge Petrenko's avatar
      box: speed up tuple_field_map_create · 420bacb2
      Serge Petrenko authored
      Since the introduction of JSON path indices tuple_init_field_map, which
      was quite a simple routine traversing a tuple and storing its field data
      offsets in the field map, was renamed to tuple_field_map_create and
      optimised for working with JSON path indices.
      
      The main difference is that tuple format fields are now organised in a
      tree rather than an array, and the tuple itself may have indexed fields,
      which are not plain array members, but rather members of some sub-array
      or map. This requires more complex iteration over tuple format fields
      and additional tuple parsing.
      
      All the changes were, however, unneeded for tuple formats not supporting
      fields indexed by JSON paths.
      
      Rework tuple_field_map_create so that it doesn't go through all the
      unnecessary JSON path-related checks for simple cases and restore most
      of the lost performance.
      
      Below are some benchmark results for the same workload that pointed to
      the degradation initially.
      Snapshot recovery time on RelWithDebInfo build for a 1.5G snapshot
      containing a single memtx space with one secondary index over 4 integer
      and 1 string field:
      
              Version            | Time (s) | Difference relative to 1.10
      ---------------------------|----------|----------------------------
      1.10 (the golden standard) |    28    |             -/-
      2.x (degradation)          |    39    |            + 39%
      2.x (patched)              |    31    |            + 11%
      
      Profile shows that the main difference is in memtx_tuple_new due to
      tuple_init_field_map/tuple_field_map_create performance difference.
      
      Numbers below show cumulative time spent in tuple_init_field_map (1.10) /
      tuple_field_map_create (unpatched) / tuple_field_map_create (patched).
      2.44 s / 8.61 s / 3.19 s
      
      More benchmark results can be seen at #4774
      
      Part of #4774
      420bacb2
    • Mergen Imeev's avatar
      sql: remove unecessary execute of space_cache_find() · bc16e5df
      Mergen Imeev authored
      Due to the fact that space_cache_find () is called unnecessarily, it is
      possible to set diag "Space '0' does not exist", although in this case
      it is not a wrong situation when the space id is 0.
      
      Part of #5592
      bc16e5df
    • Ilya Kosarev's avatar
      core: introduce evenly distributed int64 random in range · 31bf9ef1
      Ilya Kosarev authored
      Tarantool codebase had at least two functions to generate random
      integer in given range and both of them had problems at least with
      integer overflow. This patch brings nice functions to generate random
      int64_t in given range without overflow while preserving uniform random
      generator distribution using unbiased bitmask with rejection method.
      It is now possible to use xoshiro256++ PRNG or random bytes as a basis.
      Most relevant replacements have been made. Needed tests are introduced.
      
      Closes #5075
      31bf9ef1
    • Alexander V. Tikhonov's avatar
      test: add filter to box/net.*after_gh-3164 test · e519954a
      Alexander V. Tikhonov authored
      Found issue:
      
        [079] @@ -115,5 +115,14 @@
        [079]  -- connection is deleted by 'collect'.
        [079]  weak.c
        [079]  ---
        [079] -- null
        [079] +- peer_uuid: 035d7b36-f205-45f4-9e16-e5b0b99a9b0b
        [079] +  opts:
        [079] +    reconnect_after: 0.1
        [079] +  host: unix/
        [079] +  schema_version: 78
        [079] +  protocol: Binary
        [079] +  state: error_reconnect
        [079] +  error: Connection refused
        [079] +  peer_version_id: 132864
        [079] +  port: /tmp/tnt/079_box/proxy.socket-iproto
        [079]  ...
      
      Which could not be restarted with checksum because of changing UUID
      value each run. To avoid of it added filter on 'peer_uuid:' output.
      e519954a
    • Alexander V. Tikhonov's avatar
      test: add test filter for vinyl/errinj.test.lua · 718cba1b
      Alexander V. Tikhonov authored
      Added test-run filter on box.snapshot error message:
      
        'Invalid VYLOG file: Slice [0-9]+ deleted but not registered'
      
      to avoid of printing changing data in results file to be able to use
      its checksums in fragile list of test-run to rerun it as flaky issue.
      
      Needed for #4346
      718cba1b
  8. Dec 10, 2020
  9. Dec 08, 2020
Loading