  1. Jan 25, 2022
    • Vladislav Shpilevoy's avatar
      raft: fix ev_timer.at incorrect usage · bd44ab51
      Vladislav Shpilevoy authored
      ev_timer.at was used as timeout. But after ev_timer_start() it
      turns into the deadline - totally different value.
      
      The patch makes sure ev_timer.at is not used in raft at all.
      
      To test that, the fakeev subsystem is patched to start its time not
      from 0. Otherwise ev_timer.at would often match the timeout
      even for an active timer.
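
The timeout-versus-deadline confusion described above can be sketched with a small mock. This is not libev code: `struct mock_timer` and the `mock_timer_*` helpers are hypothetical names introduced only for illustration, and the clock is passed in explicitly. The sketch also shows why a clock starting at 0 hides the bug: with `now == 0` the deadline equals the timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a libev-style timer, for illustration only.
 * `at` holds a relative timeout while the timer is inactive and an
 * absolute deadline while it is active.
 */
struct mock_timer {
	double at;
	bool is_active;
};

/* Arm the timer: the relative timeout becomes an absolute deadline. */
static void
mock_timer_start(struct mock_timer *t, double now)
{
	assert(t != NULL);
	if (t->is_active)
		return;
	t->at += now;
	t->is_active = true;
}

/* Disarm the timer: `at` goes back to the remaining relative time. */
static void
mock_timer_stop(struct mock_timer *t, double now)
{
	assert(t != NULL);
	if (!t->is_active)
		return;
	t->at -= now;
	t->is_active = false;
}

/* Time left until expiry, correct for active and inactive timers. */
static double
mock_timer_remaining(const struct mock_timer *t, double now)
{
	assert(t != NULL);
	return t->is_active ? t->at - now : t->at;
}
```

With a clock starting at 100, a 5-second timer has `at == 105` once started, so code that reads `at` as a timeout gets a wrong value. Keeping the configured timeout in a separate variable avoids depending on the field at all.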
      
      (cherry picked from commit e51c61ae)
      bd44ab51
    • Vladislav Shpilevoy's avatar
      raft: fix crash on election_timeout reconfig · b428b2de
      Vladislav Shpilevoy authored
      It used to crash if done during an election on a node that has
      voted for anybody, is a candidate, doesn't know a leader yet, but
      has a WAL write in progress.
      
      Thus it could only happen if the term was bumped by a message from
      a non-leader node and wasn't flushed to the disk yet.
      
      The patch makes the reconfig check whether a WAL write is in
      progress and, if so, do nothing.
      
      It could also check for a volatile vote instead of a persistent one,
      but that would create the same problem for the case when the node has
      started writing a vote for itself and hasn't finished yet: reconfig
      would crash.
      
      (cherry picked from commit 82757e55)
      b428b2de
  2. Jan 19, 2022
  3. Jan 14, 2022
    • Vladimir Davydov's avatar
      test: fix flaky vinyl/gh-4810-dump-during-index-build test · eed93fbb
      Vladimir Davydov authored
      The commit fixes the following test failure:
      
      ```
      [082] vinyl/gh-4810-dump-during-index-build.test.lua                  Test timeout of 310 secs reached	[ fail ]
      [082]
      [082] Test failed! Result content mismatch:
      [082] --- vinyl/gh-4810-dump-during-index-build.result	Thu Dec  9 05:31:17 2021
      [082] +++ /build/usr/src/debug/tarantool-2.10.0~beta1.324.dev/test/var/rejects/vinyl/gh-4810-dump-during-index-build.reject	Thu Dec  9 06:51:03 2021
      [082] @@ -117,34 +117,3 @@
      [082]  for i = 1, ch:size() do
      [082]      ch:get()
      [082]  end;
      [082] - | ---
      [082] - | ...
      [082] -
      ...
      ```
      
      The test hangs waiting for the test fibers to exit. There are two test
      fibers - one builds an index, another populates the test space. The
      latter uses pcall so it always returns. The one that builds an index,
      however, doesn't. The problem is index build may fail because it builds
      a unique index while the fiber populating the space may insert
      non-unique values. Fix this by building a non-unique index instead,
      which should never fail. To reproduce the issue that the test verifies
      is fixed, one can build any index, unique or non-unique, so this should
      be fine.
      
      Closes #5508
      
      (cherry picked from commit 5cd399b7)
      eed93fbb
    • Vladimir Davydov's avatar
      test: fix flaky vinyl/gh test failure · 698c3c7a
      Vladimir Davydov authored
      The commit fixes the following test failure:
      
      ```
      [005] vinyl/gh.test.lua                                               [ fail ]
      [005]
      [005] Test failed! Result content mismatch:
      [005] --- vinyl/gh.result	Mon Dec 13 15:03:45 2021
      [005] +++ /root/actions-runner/_work/tarantool/tarantool/test/var/rejects/vinyl/gh.reject	Fri Dec 17 10:41:24 2021
      [005] @@ -716,7 +716,7 @@
      [005]  ...
      [005]  test_run:wait_cond(function() return finished == 2 end)
      [005]  ---
      [005] -- true
      [005] +- false
      [005]  ...
      [005]  s:drop()
      [005]  ---
      ```
      
      The reason for the failure is that the fiber doing checkpoints fails,
      because a checkpoint may be already running by the checkpoint daemon.
      Invoke box.snapshot() under pcall to make the test more robust.
      
      Part of #5141
      
      (cherry picked from commit cc6c328d)
      698c3c7a
    • Vladimir Davydov's avatar
      test: fix flaky vinyl/deferred_delete test · 93f96a58
      Vladimir Davydov authored
      The commit fixes the following test failure:
      
      ```
      [019] vinyl/deferred_delete.test.lua                                  [ fail ]
      [019]
      [019] Test failed! Result content mismatch:
      [019] --- vinyl/deferred_delete.result	Tue Jan 11 11:10:22 2022
      [019] +++ /build/usr/src/debug/tarantool-2.10.0~beta2.37.dev/test/var/rejects/vinyl/deferred_delete.reject	Fri Jan 14 11:45:26 2022
      [019] @@ -964,7 +964,7 @@
      [019]  ...
      [019]  sk:stat().disk.dump.count -- 1
      [019]  ---
      [019] -- 1
      [019] +- 0
      [019]  ...
      [019]  sk:stat().rows - dummy_rows -- 120 old REPLACEs + 120 new REPLACEs + 120 deferred DELETEs
      [019]  ---
      ```
      
      The test checks that compaction of a primary index triggers dump of
      secondary indexes of the same space, because it generates deferred
      DELETE statements. There's no guarantee that by the time compaction
      completes, the secondary index dump has completed as well, because
      compaction may ignore the memory quota (it uses vy_quota_force_use in
      vy_deferred_delete_on_replace). Make the check more robust by using
      wait_cond.
      
      Follow-up #5089
      
      (cherry picked from commit 7f8c549b)
      93f96a58
    • Vladimir Davydov's avatar
      test: use wait_cond in vinyl/deferred_delete test · a930f8a0
      Vladimir Davydov authored
      It's better than hand-written busy-wait.
      
      (cherry picked from commit 8c913a10)
      a930f8a0
    • Vladimir Davydov's avatar
      test: fix flaky vinyl/gc test · 191cf6e9
      Vladimir Davydov authored
      The commit fixes the following test failure:
      
      ```
      [013] vinyl/gc.test.lua                                               [ fail ]
      [013]
      [013] Test failed! Result content mismatch:
      [013] --- vinyl/gc.result	Fri Dec 24 12:27:33 2021
      [013] +++ /build/usr/src/debug/tarantool-2.10.0~beta2.18.dev/test/var/rejects/vinyl/gc.reject	Thu Dec 30 10:29:29 2021
      [013] @@ -102,7 +102,7 @@
      [013]  ...
      [013]  check_files_number(2)
      [013]  ---
      [013] -- true
      [013] +- null
      [013]  ...
      [013]  -- All records should have been purged from the log by now
      [013]  -- so we should only keep the previous log file.
      ```
      
      The reason for the failure is that vylog files are deleted asynchronously
      (`box.snapshot()` doesn't wait for `unlink` to complete) since commit
      8e429f4b ("wal: remove old xlog files asynchronously"). So to fix the
      test, we just need to make it wait for garbage collection to complete.
      
      Follow-up #5383
      
      (cherry picked from commit cd9fd77e)
      191cf6e9
  4. Jan 13, 2022
  5. Jan 12, 2022
    • Yaroslav Lobankov's avatar
      ci: mark 'unicode_de__phonebook_s3' as unstable · eded8278
      Yaroslav Lobankov authored
      The test for the 'unicode_de__phonebook_s3' collation from
      sql-tap/collation_unicode.test.lua fails if the ICU version is >= 70.1.
      So let's temporarily mark it as unstable until the issue is resolved.
      
      See tarantool/tarantool#6695 for more details.
      eded8278
  6. Dec 30, 2021
  7. Dec 29, 2021
  8. Dec 27, 2021
    • Serge Petrenko's avatar
      recovery: panic in case of recovery and replicaset vclock mismatch · f9e26802
      Serge Petrenko authored
      We assume that no one touches the instance's WALs, once it has taken the
      wal_dir_lock. This is not the case when upgrading from an old setup
      (running tarantool 1.7.3-6 or less). Such nodes either take a lock on
      snap dir, which may be different from wal dir, or don't take the lock at
      all.
      
      So, it's possible that during upgrade an old node is not stopped
      properly before a new node is started in the same data directory.
      
      The old node might even write some extra data to WAL during new node's
      startup.
      
      This is obviously bad and leads to multiple issues. For example, new node
      might start local recovery, scan the WALs and set replicaset.vclock to
      some value {1 : 5}. While the node recovers the WALs, the old node
      appends to them up to vclock {1 : 10}.
      The node finishes local recovery with replicaset vclock {1 : 5}, but
      with data recovered up to vclock {1 : 10}.
      
      The node will use the now outdated replicaset vclock to subscribe to
      remote peers (leading to replication breaking due to duplicate keys
      found), to initialize WAL (leading to new xlogs appearing with duplicate
      LSNs). There might be a number of other issues we just haven't stumbled
      upon.
      
      Let's prevent situations like that and panic as soon as we see that the
      initially scanned vclock (replicaset vclock) differs from actually
      recovered vclock.
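
The check described above amounts to comparing two vector clocks after recovery. The following is a minimal sketch, not the actual Tarantool code: `struct sketch_vclock`, its fixed size, and both helper functions are hypothetical and exist only to illustrate the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size vclock: one LSN per replica id. */
enum { SKETCH_VCLOCK_MAX = 32 };

struct sketch_vclock {
	int64_t lsn[SKETCH_VCLOCK_MAX];
};

/* Two vclocks are equal only if every component matches. */
static bool
sketch_vclock_is_equal(const struct sketch_vclock *a,
		       const struct sketch_vclock *b)
{
	assert(a != NULL && b != NULL);
	for (int i = 0; i < SKETCH_VCLOCK_MAX; i++) {
		if (a->lsn[i] != b->lsn[i])
			return false;
	}
	return true;
}

/*
 * Return 0 if the vclock scanned before recovery matches the vclock
 * actually recovered, -1 otherwise. In this sketch the caller is
 * expected to panic on -1.
 */
static int
sketch_check_recovery(const struct sketch_vclock *scanned,
		      const struct sketch_vclock *recovered)
{
	return sketch_vclock_is_equal(scanned, recovered) ? 0 : -1;
}
```

For the example from the message, a scanned vclock {1 : 5} and a recovered vclock {1 : 10} differ in component 1, so the check returns -1.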
      
      Closes #6709
      
      (cherry picked from commit 634f59c7)
      f9e26802
  9. Dec 24, 2021
  10. Dec 23, 2021
    • Vladimir Davydov's avatar
      iproto: clear request::header for client requests · 3a9f8899
      Vladimir Davydov authored
      To apply a client request, we only need to know its type and body. All
      the meta information, such as LSN, TSN, or replica id, must be set by
      WAL. Currently, however, it isn't necessarily true: iproto leaves a
      request header received over iproto as is, and tx will reuse the header
      instead of allocating a new one in this case, which is needed to process
      replication requests, see txn_add_redo().
      
      Unless a client actually sets one of those meta fields, this causes no
      problems. However, if we added transaction support to the replication
      protocol, reusing the header would result in broken xlog, because
      currently, all requests received over iproto have the is_commit field
      set in xrow_header for the lack of TSN, while is_commit must only be set
      for the final statement in a transaction. One way to fix it would be
      clearing is_commit explicitly in iproto, but ignoring the whole header
      received over iproto looks more logical and error-proof.
      
      Needed for #5860
      
      (cherry picked from commit 4fefb519)
      3a9f8899
  11. Dec 22, 2021
    • Kirill Yukhin's avatar
      Generate changelog for 2.8.3 · 0d76f62a
      Kirill Yukhin authored
      0d76f62a
    • Alexander Turenko's avatar
      ci: add linter job for changelog entries · 09b6408e
      Alexander Turenko authored
      What is bad: a considerable amount of boilerplate code should be added
      to just run a simple script. I hope we'll do something with this
      in #6604.
      
      (cherry picked from commit 8b1ce351)
      09b6408e
    • Alexander Turenko's avatar
      ci: rename luacheck.yml to lint.yml · ca003721
      Alexander Turenko authored
      The idea is to allow adding more checks here: I'm going to add
      a changelog check in the next commit.
      
      (cherry picked from commit f0980af8)
      ca003721
    • Alexander Turenko's avatar
      ci: drop useless chown call · 53bb2a09
      Alexander Turenko authored
      AFAIK we anyway run self-hosted runners from root due to all those
      problems like [1]. No reason to include the hack into every workflow
      file. If we're going to start runners from a non-root user, it is better
      to pass `--user $PROPER_UID` to all docker jobs instead, see [2]: at
      least it does not require to place a boilerplate into all workflow
      files.
      
      Didn't touch perf_* jobs: don't want to dig inside this part of the
      infrastructure ATM.
      
      It reverts PR #5953.
      
      [1]: https://github.com/actions/checkout/issues/211
      [2]: https://github.com/actions/runner/issues/691
      
      (cherry picked from commit a0a2e8b9)
      53bb2a09
    • Alexander Turenko's avatar
      ci: drop useless fail-fast clause · 1b62dc6b
      Alexander Turenko authored
      AFAIU it only has meaning for jobs constructed with matrix expansion.
      See the documentation: [1].
      
      It is relevant to [2] as well, but already fixed on nektos/act side.
      
      [1]: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idstrategyfail-fast
      [2]: https://github.com/tarantool/tarantool-qa/issues/118
      
      (cherry picked from commit 474d1cc2)
      1b62dc6b
    • Vladimir Davydov's avatar
      ci: run full ci only on PRs with 'full-ci' label · 5beb9b29
      Vladimir Davydov authored
      After this commit, only three workflows are run on a pull request or push to
      a developer branch:
       - luacheck
       - release
       - debug_coverage
      
      To run all other tests, one should either name the branch `*-full-ci`
      and push it to the main repository or set the 'full-ci' label on the
      pull request.
      
      It is also possible to disable all tests on push by naming the branch
      `*-notest` or setting the 'notest' label on the pull request.
      
      **Caveats**:
       - Unfortunately, currently it doesn't seem to be possible to run
         workflows automatically when a particular label is set - the best we
         can do is run workflows when *any* label is set. So labeling a PR
         that has the 'full-ci' label set will trigger all workflows!
       - For the same reason, removing the 'notest' label doesn't trigger ci.
         One has to synchronize the PR afterwards. We could trigger ci on the
         'unlabel' event, but this would trigger tests when any label is
         removed, not necessarily 'notest'. Since 'notest' is supposed to be
         used only by developers, who can sync the branch, this should be
         acceptable.
      
      While we are at it:
       - Remove the check disabling certain workflow runs on forks - it's
         pointless, because forks don't have ci. Anyway, we don't bother
         disabling most of our workflows on forks, even those that we run on
         self-hosted machines, so that would only be consistent.
       - Remove the condition from the coverity workflow - coverity doesn't
         run on push or PR so it doesn't make any sense.
       - Remove the condition from the 'source' workflow. Instead trigger it
         only when a tag is pushed. This is needed to avoid showing it as a
         skipped workflow in PRs and commits.
      
      Closes #6605
      
      (cherry picked from commit 6c2e664e)
      5beb9b29
    • Alexander Turenko's avatar
      github-ci: don't deploy tarballs per push · ca8bf66e
      Alexander Turenko authored
      Follows up #6185
      
      (cherry picked from commit f820a3a8)
      ca8bf66e
    • Yaroslav Lobankov's avatar
      ci: cancel outdated workflow runs · e6552caa
      Yaroslav Lobankov authored
      Given the large number of commits pushed by developers, we should
      cancel all outdated workflow runs (previously scheduled and no longer
      relevant due to new changes) to make CI more efficient. GitHub Actions
      provides the 'concurrency' feature [1] for this purpose, and this
      patch starts using it.
      
      How does it work?
      
      Basically, an update of a developer branch cancels the previously
      scheduled workflow run for this branch. However, the 'master' branch,
      release branch (1.10, 2.8, etc.), and tag workflow runs are never
      canceled.
      
      [1] https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency
      
      Closes tarantool/tarantool-qa#100
      
      (cherry picked from commit d3f32d18)
      e6552caa
  12. Dec 21, 2021
    • Andrey Saranchin's avatar
      build: fix build with glibc-2.34 · bba7a2fa
      Andrey Saranchin authored
      The SIGSTKSZ macro used to be an integral constant, but
      in glibc-2.34 it turned into a runtime function call, so it
      can no longer be used as a compile-time constant size for arrays.
      
      Besides, SIGSTKSZ is not large enough for the alternate signal stack
      when ASAN is used, so the size was increased.
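
One way to avoid depending on SIGSTKSZ for a static buffer is to size the alternate stack with a project-defined constant. The sketch below is an illustration under assumptions, not the code from this commit: the `SKETCH_ALT_STACK_SIZE` name, its 64 KiB value, and the function name are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Assumed size, chosen for illustration only. Unlike SIGSTKSZ in
 * glibc-2.34 as described above, it is a true compile-time constant.
 */
#define SKETCH_ALT_STACK_SIZE (64 * 1024)

/* A compile-time check like this needs a constant expression. */
static_assert(SKETCH_ALT_STACK_SIZE >= 16 * 1024,
	      "alternate signal stack is too small");

/* Statically sized buffer; this is what a non-constant size breaks. */
static char sketch_alt_stack[SKETCH_ALT_STACK_SIZE];

/* Install the buffer as the alternate signal stack; 0 on success. */
static int
sketch_install_alt_stack(void)
{
	stack_t ss;
	ss.ss_sp = sketch_alt_stack;
	ss.ss_size = sizeof(sketch_alt_stack);
	ss.ss_flags = 0;
	return sigaltstack(&ss, NULL);
}
```

Handlers that should run on this stack still have to be registered with `SA_ONSTACK` in `sigaction()`.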
      
      Closes #6686
      
      (cherry picked from commit 9c01b325)
      bba7a2fa
    • Vladimir Davydov's avatar
      Move xmalloc to trivia/util.h · 8662fb74
      Vladimir Davydov authored
      We want to use the xmalloc helper throughout the code, not only in
      the core lib. Move its definition to trivia/util.h and use fprintf+exit
      instead of say/panic in order not to create circular dependencies.
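
A helper of this kind can be sketched as follows. This is a minimal illustration of the fprintf+exit approach, not the definition from trivia/util.h: the `sketch_` names and the error message text are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate `size` bytes or terminate the process. Only libc is used,
 * so the helper does not depend on any logging subsystem.
 * Note: malloc(0) may return NULL on some platforms; this sketch does
 * not special-case a zero size.
 */
static void *
sketch_xmalloc(size_t size)
{
	void *ptr = malloc(size);
	if (ptr == NULL) {
		fprintf(stderr, "Can't allocate %zu bytes\n", size);
		exit(EXIT_FAILURE);
	}
	return ptr;
}

/* Duplicate a string; never returns NULL. */
static char *
sketch_xstrdup(const char *str)
{
	assert(str != NULL);
	size_t len = strlen(str) + 1;
	char *copy = sketch_xmalloc(len);
	memcpy(copy, str, len);
	return copy;
}
```

Callers can then drop their own NULL checks, for example `char *name = sketch_xstrdup("raft");`, and release the result with `free()` as usual.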
      
      (cherry picked from commit f3b5ad97)
      8662fb74
    • Vladimir Davydov's avatar
      core: add x* memory allocation functions · badf030e
      Vladimir Davydov authored
      This patch adds xmalloc, xcalloc, xrealloc, xstrdup, and xstrndup helper
      functions. Each of them calls the corresponding memory allocation
      function and panics if it fails. See the issue description for the full
      justification.
      
      Closes #3534
      
      (cherry picked from commit 60dc88ea)
      badf030e
  13. Dec 13, 2021