  1. Dec 30, 2020
    • Sergey Nikiforov's avatar
      base64: Properly ignore invalid characters · 726b96f0
      Sergey Nikiforov authored
      Not all invalid characters were ignored by the base64 decoder,
      causing data corruption and reads beyond the decode table (faults
      under ASAN).
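
      As an illustration, here is a minimal sketch of the intended
      behaviour (not Tarantool's actual decoder): any input byte whose
      decode-table entry is marked invalid is simply skipped instead of
      being consumed as data or used to read past the table.

      /* Toy base64 decoder: invalid characters are ignored. */
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      static uint8_t b64_table[256];

      static void
      b64_table_init(void)
      {
          const char *alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                              "abcdefghijklmnopqrstuvwxyz0123456789+/";
          memset(b64_table, 0xFF, sizeof(b64_table)); /* 0xFF = invalid */
          for (int i = 0; i < 64; i++)
              b64_table[(unsigned char)alpha[i]] = i;
      }

      static size_t
      b64_decode(const char *in, size_t in_len, uint8_t *out)
      {
          size_t out_len = 0;
          unsigned acc = 0;
          int bits = 0;
          for (size_t i = 0; i < in_len; i++) {
              uint8_t v = b64_table[(unsigned char)in[i]];
              if (v == 0xFF)
                  continue; /* skip '=', newlines and any other junk */
              acc = (acc << 6) | v;
              bits += 6;
              if (bits >= 8) {
                  bits -= 8;
                  out[out_len++] = (uint8_t)(acc >> bits);
              }
          }
          return out_len;
      }

      int main(void)
      {
          uint8_t buf[16];
          b64_table_init();
          /* "hello" with a space, a newline and a stray '!' inside. */
          size_t n = b64_decode("aGV s\nbG8=!", 11, buf);
          printf("%.*s\n", (int)n, (const char *)buf);
          return 0;
      }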
      
      Added a corresponding check to the base64 unit test.
      
      Fixes: #5627
      726b96f0
    • Igor Munkin's avatar
      luajit: bump new version · 94203c14
      Igor Munkin authored
      * core: remove excess assertion inside memprof
      * core: fix resources leak in memory profiler
      * misc: fix build with disabled memory profiler
      
      Follows up #5442
      94203c14
    • Alexander Turenko's avatar
      build: don't re-export libcurl.so/dylib symbols · 47c19eeb
      Alexander Turenko authored
      Export libcurl's symbols only when they are provided by tarantool
      itself, that is, when the library is linked statically into
      tarantool's executable. There is not much sense in exporting the
      symbols when we link against the library dynamically.
      
      Regarding the motivation for the change: since 2.6.0-36-g29ec62891
      ('Ensure all curl symbols are exported') the curl_multi_poll()
      function is exported from the tarantool executable. It leads to a
      failure in Homebrew's build, because there we link (dynamically) with
      the system libcurl. On Mac OS 10.15 it is libcurl 7.64.1, while the
      function only appeared in libcurl 7.66.0. So the linker reports an
      undefined symbol: `curl_multi_poll`.
      
      Now the symbols are not exported when linking dynamically with
      libcurl, so the linker is happy.
      
      This commit relaxes the requirements for dynamic linking, but an
      attempt to link statically with a libcurl older than 7.66.0 still
      leads to a linking failure. The box-tap/gh-5223-curl-exports.test.lua
      test still fails when tarantool is linked (dynamically) against an
      old libcurl.
      
      It looks like a good compromise. When libcurl functionality is
      provided by tarantool itself, *all* functions listed in the test are
      present (otherwise the linker will complain). But tarantool does not
      enforce a newer libcurl version when it just *uses* this
      functionality and does not provide it for modules and stored
      procedures. It is not tarantool's responsibility in that case.
      
      We should possibly skip the box-tap/gh-5223-curl-exports.test.lua
      test when tarantool is built against libcurl dynamically, or revisit
      the described approach. I'll leave it as a possible follow-up
      activity.
      
      Fixes #5542
      47c19eeb
    • Aleksandr Lyapunov's avatar
      txm: fix tuple ownership strategy · ecc3f3d2
      Aleksandr Lyapunov authored
      Hotfix of 88b76800
      
      Fix an obvious bug: tuple ref/unref manipulation must be done only
      when we handle the primary index. Even the code comment states that.
      
      Part of #5628
      ecc3f3d2
  2. Dec 29, 2020
    • Alexander Turenko's avatar
      github-ci: add --init option for docker containers · 18e24209
      Alexander Turenko authored
      
      We currently have a PID 1 zombie reaping problem: zombie processes
      launched in Docker aren't collected by init, as they would be if we
      launched them on a real host.

      It is fixed by adding the --init option when a container is created
      or started.
      
      Co-authored-by: Artem Starshov <artemreyt@tarantool.org>
      
      Follows up #4983
      18e24209
    • Cyrill Gorcunov's avatar
      crash: allow to build on non x86-64 machines · eaa61b5b
      Cyrill Gorcunov authored
      
      The general purpose registers were optional earlier; let's make them
      optional again, allowing the code to be compiled on non-x86-64
      machines.
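
      A hedged sketch of the approach (invented names, glibc-specific; not
      the actual crash.c code): the register dump is only filled on x86-64,
      so the file still compiles on other architectures.

      #define _GNU_SOURCE
      #include <stdint.h>
      #include <stdio.h>
      #include <ucontext.h>

      struct crash_greg {
      #ifdef __x86_64__
          uint64_t rip, rsp, rbp;
      #else
          char unused; /* no general purpose registers collected */
      #endif
      };

      static void
      crash_collect_registers(struct crash_greg *greg, const ucontext_t *uc)
      {
      #ifdef __x86_64__
          greg->rip = uc->uc_mcontext.gregs[REG_RIP];
          greg->rsp = uc->uc_mcontext.gregs[REG_RSP];
          greg->rbp = uc->uc_mcontext.gregs[REG_RBP];
      #else
          (void)greg;
          (void)uc; /* nothing to do on non-x86-64 targets */
      #endif
      }

      int main(void)
      {
          ucontext_t uc;
          struct crash_greg greg;
          if (getcontext(&uc) != 0)
              return 1;
          crash_collect_registers(&greg, &uc);
      #ifdef __x86_64__
          printf("rip = 0x%llx\n", (unsigned long long)greg.rip);
      #endif
          return 0;
      }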
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
      eaa61b5b
    • Cyrill Gorcunov's avatar
      crash: extend report with instance data · 7cf4c487
      Cyrill Gorcunov authored
      
      The product team would prefer to have more data included in a crash
      report.

      So we add an "instance" key with appropriate values (just like a
      regular feedback entry has). For example:
      
      | {
      |   "crashdump": {
      |     "version": "1",
      |     "data": {
      |       "uname": {
      |         "sysname": "Linux",
      |         "release": "5.9.14-100.fc32.x86_64",
      |         "version": "#1 SMP Fri Dec 11 14:30:38 UTC 2020",
      |         "machine": "x86_64"
      |       },
      |       "instance": {
      |         "server_id": "336bfbfd-9e71-4728-91e3-ba84aec4d7ea",
      |         "cluster_id": "176f3669-488f-46a5-a744-1be0b8a31029",
      |         "uptime": "3"
      |       },
      |       "build": {
      |         "version": "2.7.0-183-g02970b402",
      |         "cmake_type": "Linux-x86_64-Debug"
      |       },
      |       "signal": {
      |         "signo": 11,
      |         "si_code": 0,
      |         "si_addr": "0x3e800095fb9",
      |         "backtrace": "#0  0x6317ab in crash_collect+bf...",
      |         "timestamp": "2020-12-28 21:09:29 MSK"
      |       }
      |     }
      |   }
      | }
      
      Closes #5668
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
      7cf4c487
    • Aleksandr Lyapunov's avatar
      txm: change tuple ownership strategy · 88b76800
      Aleksandr Lyapunov authored
      Since a space holds pointers to tuples it must increment
      reference counters of its tuples and must decrement counters
      of tuples that are deleted from the space.
      
      The memtx TX manager also holds references to the tuples it is
      processing, in the same way.
      
      Before this patch the logic was: while a tuple is dirty it belongs
      to the TX manager and does not belong to the space. Only when a tuple
      is cleared while still in the space does it become referenced by the
      space.
      
      That logic leads to crashes in some DDL requests since they work
      with indexes directly. For example, deleting an index causes
      dereferencing of all its tuples, even dirty ones.
      
      This patch changes the logic. Now all tuples that are physically in
      the primary index of the space are referenced. Once removed from the
      primary index, a tuple is dereferenced. The TX manager references
      tuples as before: every tuple it holds is referenced.
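
      A toy sketch of the new rule (hypothetical types, not the real memtx
      code): only the primary index changes a tuple's reference counter;
      secondary indexes and the TX manager take their own references
      independently.

      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>

      struct tuple { int refs; };

      static void tuple_ref(struct tuple *t)   { t->refs++; }
      static void tuple_unref(struct tuple *t) { if (--t->refs == 0) free(t); }

      /* Called for every index a statement touches; only the primary index
       * owns the tuple. */
      static void
      index_replace(struct tuple *old_tuple, struct tuple *new_tuple,
                    bool is_primary)
      {
          if (!is_primary)
              return;                  /* secondary indexes do not own tuples */
          if (new_tuple != NULL)
              tuple_ref(new_tuple);    /* physically inserted into the space */
          if (old_tuple != NULL)
              tuple_unref(old_tuple);  /* physically removed from the space */
      }

      int main(void)
      {
          struct tuple *t = calloc(1, sizeof(*t));
          index_replace(NULL, t, true);  /* insert into primary: refs == 1 */
          index_replace(NULL, t, false); /* secondary index: refs unchanged */
          assert(t->refs == 1);
          index_replace(t, NULL, true);  /* delete from primary: tuple freed */
          return 0;
      }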
      
      Part of #5628
      88b76800
    • Aleksandr Lyapunov's avatar
      txm: free resource gracefully on server shutdown · a3741f6b
      Aleksandr Lyapunov authored
      Part of #5628
      a3741f6b
    • Aleksandr Lyapunov's avatar
      txm: fix another simple bug in tx manager · de6a4849
      Aleksandr Lyapunov authored
      There was a typo in the collection of the read set: a dirty tuple
      was added instead of a clean one.
      
      Closes #5559
      de6a4849
    • Aleksandr Lyapunov's avatar
      txm: fix a simple bug in tx manager · 122dc47f
      Aleksandr Lyapunov authored
      The problem happened when a tuple story was deleted by two
      statements, one committed and one not committed.
      
      Part of #5628
      122dc47f
    • mechanik20051988's avatar
      memtx: change small allocator behavior · 4d175bff
      mechanik20051988 authored
      Previously, in the small allocator, memory pools were allocated on
      request, which in the case of a small slab_alloc_factor led to using
      pools with incorrect sizes. This patch changes the small allocator
      behavior: now we allocate pools at the stage of allocator creation.
      Also we use a special function to find the appropriate pool, which is
      faster than the previous version with an rbtree. This change fixes
      #5216.
      
      Also moved the check that slab_alloc_factor is in the range
      (1.0, 2.0] from the small allocator to memtx_engine. If the factor is
      not in the range, it is changed to 1.0001 or 2.0 respectively.
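
      A rough sketch of the idea (not the actual small library code): all
      pools are created up front from the (clamped) slab_alloc_factor, and
      the pool for a request is computed directly instead of being looked
      up in an rbtree.

      #include <math.h>
      #include <stddef.h>
      #include <stdio.h>

      enum { MIN_SIZE = 8, POOL_COUNT = 32 };

      static double pool_sizes[POOL_COUNT];

      static double
      pools_create(double factor)
      {
          /* Clamp the factor to (1.0, 2.0], as memtx_engine now does. */
          if (factor <= 1.0)
              factor = 1.0001;
          else if (factor > 2.0)
              factor = 2.0;
          double size = MIN_SIZE;
          for (int i = 0; i < POOL_COUNT; i++) {
              pool_sizes[i] = size;
              size *= factor;
          }
          return factor;
      }

      /* O(1) pool selection: the smallest class whose size fits the request. */
      static int
      pool_index(size_t size, double factor)
      {
          if (size <= MIN_SIZE)
              return 0;
          int idx = (int)ceil(log((double)size / MIN_SIZE) / log(factor));
          return idx < POOL_COUNT ? idx : POOL_COUNT - 1;
      }

      int main(void) /* link with -lm */
      {
          double factor = pools_create(1.5);
          size_t request = 100;
          int idx = pool_index(request, factor);
          printf("request %zu -> pool %d of size %.0f\n",
                 request, idx, pool_sizes[idx]);
          return 0;
      }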
      
      Closes #5216
      4d175bff
    • Kirill Yukhin's avatar
      small: bump new version · 43aceb45
      Kirill Yukhin authored
      * Fix CentOS 6 build failure
      * Fix build error
      * Remove Ubuntu 19.04 Disco
      * small: implement new size class evaluation
      * test: add small allocator performance test
      * small: changed small allocator pool management
      43aceb45
  3. Dec 28, 2020
  4. Dec 27, 2020
    • Artem Starshov's avatar
      lua: fix running init lua script · a8f3a6cb
      Artem Starshov authored
      When tarantool is launched with the -e flag and there is an error in
      the script, the program hangs. This happens because the sched fiber
      launches a separate fiber for the init user script and starts an
      auxiliary event loop. That fiber is supposed to stop the loop, but in
      case of an error in the script the fiber tries to stop the loop
      before it has even started.
      
      Added a flag that tracks whether the loop has started, so when the
      fiber calls `ev_break()` we can be sure that the loop is already
      running.
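
      A hedged illustration of the race (invented names, libev replaced by
      stubs): the script fiber must not break the auxiliary event loop
      before the scheduler has actually started it.

      #include <stdbool.h>
      #include <stdio.h>

      static bool loop_started = false;   /* the flag added by the fix */
      static bool break_requested = false;

      /* Runs at the end of the -e script fiber (also on error). */
      static void
      script_fiber_done(void)
      {
          if (loop_started)
              printf("ev_break(): stop the running loop\n");
          else
              break_requested = true; /* remember the request, don't hang */
      }

      /* The scheduler starts the auxiliary loop after spawning the fiber. */
      static void
      run_auxiliary_loop(void)
      {
          if (break_requested)
              return;                 /* the script already finished/failed */
          loop_started = true;
          printf("ev_run(): waiting for the script fiber\n");
      }

      int main(void)
      {
          /* Error in the -e script: the fiber finishes before the loop starts. */
          script_fiber_done();
          run_auxiliary_loop();       /* returns instead of hanging forever */
          return 0;
      }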
      
      Fixes #4983
      a8f3a6cb
    • Alexander Turenko's avatar
      test: update test-run (pass timeouts via env) · a598f3f5
      Alexander Turenko authored
      The following variables now control timeouts (if corresponding command
      line options are not passed): TEST_TIMEOUT, NO_OUTPUT_TIMEOUT,
      REPLICATION_SYNC_TIMEOUT. See [1] for details.
      
      I set the following values in the GitLab CI web interface:
      
      | Variable                 | Value                                                   |
      | ------------------------ | ------------------------------------------------------- |
      | REPLICATION_SYNC_TIMEOUT | 300                                                     |
      | TEST_TIMEOUT             | 310                                                     |
      | NO_OUTPUT_TIMEOUT        | 320                                                     |
      | PRESERVE_ENVVARS         | REPLICATION_SYNC_TIMEOUT,TEST_TIMEOUT,NO_OUTPUT_TIMEOUT |
      
      See packpack change [2] and the commit 'ci: preserve certain environment
      variables' regarding the PRESERVE_ENVVARS variable.
      
      The reason why we need to increase timeouts comes from the following
      facts:
      
      - We use self-hosted runners to serve GitLab CI jobs. So, the machine
        resources are limited.
      - We run testing with a high level of parallelism to speed it up.
      - We have a bunch of vinyl tests, which intensively use disk.
      
      Disk accesses may be quite long within this infrastructure, and the
      obvious way to work around the problem is to increase timeouts.
      
      In the long term we should scale resources depending on the testing
      needs. We'll try to use GitHub-hosted runners or, if we reach some
      limits, will set up GitHub runners on the Mail.Ru Cloud Solutions
      infrastructure.
      
      [1]: https://github.com/tarantool/test-run/issues/258
      [2]: https://github.com/packpack/packpack/pull/135
      a598f3f5
    • Alexander Turenko's avatar
      ci: preserve certain environment variables · d2f4bd68
      Alexander Turenko authored
      We want to increase testing timeouts for GitLab CI, where we use our own
      runners and observe stalls and high disk pressure when several vinyl
      tests are run in parallel. The idea is to set the variables in the
      GitLab CI web interface and read them from test-run (see [1]).
      
      First, we need to pass the variables into inner environments. GitLab CI
      jobs run the testing using packpack, Docker or VirtualBox.
      
      Packpack already preserves environment variables that are listed in the
      PRESERVE_ENVVARS variable (see [2]).
      
      This commit passes the variables that are listed in the
      PRESERVE_ENVVARS variable into the Docker and VirtualBox
      environments. So, all jobs will have the given variables in the
      environment. (Also dropped the unused EXTRA_ENV variable.)
      
      The next commit will update the test-run submodule with support for
      setting timeouts via environment variables.
      
      [1]: https://github.com/tarantool/test-run/issues/258
      [2]: https://github.com/packpack/packpack/pull/135
      d2f4bd68
  5. Dec 26, 2020
    • Alexander V. Tikhonov's avatar
      test: remove obvious part in rpm spec for Travis · d9c25b7a
      Alexander V. Tikhonov authored
      Removed the obvious part from the RPM spec for Travis CI, since it
      is no longer in use.
      
      ---- Comments from @Totktonada ----
      
      This change is a kind of reversion of the commit
      d48406d5 ('test: add more tests to
      packaging testing'), which closed #4599.

      Here I describe the story: why the change was made and why it is
      reverted now.
      
      We run testing during an RPM package build: it may catch some
      distribution-specific problems. We had reduced the quantity of tests
      and switched to single-thread test execution to keep the testing
      stable and not break package builds and deployment due to known
      fragile tests.
      
      Our CI used to run on Travis CI, but we were in transition to GitLab
      CI to use our own machines and not hit the Travis CI limit of five
      jobs running in parallel.
      
      We moved package builds to GitLab CI, but kept build+deploy jobs on
      Travis CI for a while: GitLab CI was new for us and we wanted to make
      this transition smooth for users of our APT / YUM repositories.
      
      After enabling package builds on GitLab CI, we wanted to enable more
      tests (to catch more problems) and to enable parallel execution of
      tests to speed up testing (and reduce the amount of time a developer
      waits for results).
      
      We observed that if we enabled more tests and parallel execution on
      Travis CI, the testing results would become much less stable, and so
      we would often have holes in deployed packages and a red CI.
      
      So, we decided to keep the old way of testing on Travis CI and make
      all the changes (more tests, more parallelism) only for GitLab CI.
      
      We guessed that we had enough machine resources and would be able to
      do some load balancing to overcome flaky fails on our own machines,
      but in fact we picked another approach later (see below).
      
      That's the whole story behind #4599. What has changed since those
      days?
      
      We moved deployment jobs to GitLab CI[^1] and have now completely
      disabled Travis CI (see #4410 and #4894). All jobs were moved either
      to GitLab CI or straight to GitHub Actions[^2].
      
      We revisited our approach to improving the stability of testing.
      Attempts to do some load balancing together with attempts to keep a
      not-so-large execution time failed. We should increase parallelism
      for speed, but decrease it for stability at the same time. There is
      no optimal balance.
      
      So we decided to track flaky fails in the issue tracker and restart
      a test after a known fail (see details in [1]). This way we don't
      need to exclude tests or disable parallelism in order to get stable
      and fast testing[^3]. At least in theory. We're on the way to
      verifying this guess, but hopefully we'll stick with some adequate
      defaults that will work everywhere[^4].
      
      To sum up, there are several reasons to remove the old workaround, which
      was implemented in the scope of #4599: no Travis CI, no foreseeable
      reasons to exclude tests and reduce parallelism depending on a CI
      provider.
      
      Footnotes:
      
      [^1]: This is a simplification. Travis CI deployment jobs were not
            moved as is. GitLab CI jobs push packages to the new
            repositories backend (#3380). Travis CI jobs were disabled
            later (as part of #4947), after proof that the new
            infrastructure works fine. However, that is another story.
      
      [^2]: Now we're going to use GitHub Actions for all jobs, mainly
            because GitLab CI is poorly integrated with GitHub pull
            requests (when the source branch is in a forked repository).
      
      [^3]: Some work in this direction is still to be done:

            First, the 'replication' test suite is still excluded from
            testing during the RPM package build. It seems we should just
            enable it back; this is tracked by #4798.

            Second, there is issue [2] about getting rid of ancient traces
            of the old attempts to keep the testing stable (on the
            test-run side). It'll give us more parallelism in testing.
      
      [^4]: Of course, we investigate flaky fails and fix the code and
            testing problems this feeds back to us. However, it appears to
            be a long activity.
      
      References:
      
      [1]: https://github.com/tarantool/test-run/pull/217
      [2]: https://github.com/tarantool/test-run/issues/251
      d9c25b7a
  6. Dec 25, 2020
    • Sergey Bronnikov's avatar
      test: integrate with OSS Fuzz · 7680948f
      Sergey Bronnikov authored
      To run the Tarantool fuzzers on the OSS Fuzz infrastructure we need
      to pass the $LIB_FUZZING_ENGINE library to the linker and use
      external CFLAGS and CXXFLAGS. A full description of how to integrate
      with OSS Fuzz is in [1] and [2].
      
      The patch to the OSS Fuzz repository [3] is ready to merge.
      
      We need to pass options with "-fsanitize=fuzzer" two times
      (in cmake/profile.cmake and test/fuzz/CMakeLists.txt) because:
      
      - cmake/profile.cmake is for project source files: the
        -fsanitize=fuzzer-no-link option allows instrumenting project
        source files for fuzzing, but LibFuzzer will not replace main() in
        these files.

      - test/fuzz/CMakeLists.txt uses -fsanitize=fuzzer and not
        -fsanitize=fuzzer-no-link because we want the automatically
        generated main() for each fuzzer.
      
      1. https://google.github.io/oss-fuzz/getting-started/new-project-guide/
      2. https://google.github.io/oss-fuzz/advanced-topics/ideal-integration/
      3. https://github.com/google/oss-fuzz/pull/4723
      
      Closes #1809
      7680948f
    • Sergey Bronnikov's avatar
      travis: build tarantool with ENABLE_FUZZER · af126b90
      Sergey Bronnikov authored
      OSS Fuzz has a limited number of runs per day, and currently it is 4
      runs. The ENABLE_FUZZER option is enabled to make sure that building
      the fuzzers is not broken.
      
      Part of #1809
      af126b90
    • Sergey Bronnikov's avatar
      test: add corpus to be used with fuzzers · 8c1bb620
      Sergey Bronnikov authored
      Fuzzing tools use evolutionary algorithms. Supplying a seed corpus
      consisting of good sample inputs is one of the best ways to improve a
      fuzz target's coverage. The patch adds corpora that can be used with
      the existing fuzzers. The name of each file in a corpus is the SHA-1
      checksum of its contents.
      
      A corpus with HTTP headers was added from [1] and [2].
      
      1. https://google.github.io/oss-fuzz/getting-started/new-project-guide/
      2. https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
      3. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
      
      libFuzzer allows minimizing a corpus with the help of the `-merge`
      flag: when 1 is passed, any corpus inputs from the 2nd, 3rd, etc.
      corpus directories that trigger new code coverage will be merged into
      the first corpus directory; when 0 is passed, an existing corpus will
      be minimized.
      
      All corpora provided in the patch were minimized.
      
      Part of #1809
      8c1bb620
    • Sergey Bronnikov's avatar
      test: add fuzzers and support for fuzzing testing · 2ad7caca
      Sergey Bronnikov authored
      There are a number of bugs related to parsing and encoding/decoding
      data. Examples:

      - csv: #2692, #4497
      - uri: #585

      One of the effective methods to find such issues is fuzzing. The
      patch introduces a CMake flag to enable building fuzzers
      (ENABLE_FUZZER) and adds fuzzers based on LibFuzzer [1] to the csv,
      http_parser and uri modules. Note that fuzzers must return exit code
      0 only; other exit codes are not supported [2].
      
      NOTE: LibFuzzer requires Clang compiler.
      
      1. https://llvm.org/docs/LibFuzzer.html
      2. http://llvm.org/docs/LibFuzzer.html#id22
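
      For reference, a minimal LibFuzzer target has the following shape
      (illustrative only; the real fuzzers call into the csv, http_parser
      and uri modules). Clang's -fsanitize=fuzzer supplies main() and calls
      this hook repeatedly with generated inputs.

      #include <stddef.h>
      #include <stdint.h>

      static void
      parse_under_test(const uint8_t *data, size_t size)
      {
          /* placeholder for the parser being fuzzed */
          (void)data;
          (void)size;
      }

      int
      LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
      {
          parse_under_test(data, size);
          return 0; /* any other return code is not supported */
      }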
      
      How-To Use:
      
      $ mkdir build && cd build
      $ cmake -DENABLE_FUZZER=ON \
      	-DENABLE_ASAN=ON \
      	-DCMAKE_BUILD_TYPE=Debug \
      	-DCMAKE_C_COMPILER="/usr/bin/clang" \
      	-DCMAKE_CXX_COMPILER="/usr/bin/clang++" ..
      $ make -j
      $ ./test/fuzz/csv_fuzzer -workers=4 ../test/static/corpus/csv
      
      Part of #1809
      2ad7caca
    • Sergey Bronnikov's avatar
      luacheck: remove unneeded comment · e0417b4d
      Sergey Bronnikov authored
      The serpent module was dropped in commit
      b53cb2ae
      ("console: drop unused serpent module"), but a comment that belongs
      to the module was left in the luacheck config.
      e0417b4d
    • Serge Petrenko's avatar
      test: fix box/error · 038c8abe
      Serge Petrenko authored
      Follow-up #5435
      038c8abe
    • Serge Petrenko's avatar
      txn_limbo: ignore CONFIRM/ROLLBACK for a foreign master · cab99888
      Serge Petrenko authored
      We designed the limbo so that it errors on receiving a CONFIRM or
      ROLLBACK for another instance's data. Actually, this error is
      pointless, and even harmful. Here's why:
      
      Imagine you have 3 instances, 1, 2 and 3.
      First 1 writes some synchronous transactions, but dies before writing CONFIRM.
      
      Now 2 has to write CONFIRM instead of 1 to take limbo ownership.
      From now on 2 is the limbo owner and in case of high enough load it constantly
      has some data in the limbo.
      
      Once 1 restarts, it first recovers its xlogs, and fills its limbo with
      its own unconfirmed transactions from the previous run. Now replication
      between 1, 2 and 3 is started and the first thing 1 sees is that 2 and 3
      ack its old transactions. So 1 writes CONFIRM for its own transactions
      even before the same CONFIRM written by 2 reaches it.
      Once the CONFIRM written by 1 is replicated to 2 and 3, they error
      and stop replication, since their limbo contains entries from 2, not
      from 1. Actually, there's no need to error, since it's just a really
      old CONFIRM which has already been processed by both 2 and 3.
      
      So, ignore CONFIRM/ROLLBACK when it references a wrong limbo owner.
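
      A hedged sketch with invented names (not the real txn_limbo code): a
      CONFIRM whose origin is not the current limbo owner is skipped
      instead of raising an error and breaking replication.

      #include <stdint.h>
      #include <stdio.h>

      struct txn_limbo { uint32_t owner_id; };

      static int
      limbo_process_confirm(struct txn_limbo *limbo, uint32_t origin_id,
                            int64_t lsn)
      {
          if (origin_id != limbo->owner_id) {
              /* Stale CONFIRM from a former leader: already handled, skip. */
              printf("ignore CONFIRM{origin=%u, lsn=%lld}\n",
                     origin_id, (long long)lsn);
              return 0;
          }
          printf("apply CONFIRM{origin=%u, lsn=%lld}\n",
                 origin_id, (long long)lsn);
          return 0;
      }

      int main(void)
      {
          struct txn_limbo limbo = { .owner_id = 2 }; /* 2 owns the limbo now */
          limbo_process_confirm(&limbo, 1, 100); /* old CONFIRM from instance 1 */
          limbo_process_confirm(&limbo, 2, 105);
          return 0;
      }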
      
      The issue was discovered with test replication/election_qsync_stress.
      
      Follow-up #5435
      cab99888
    • Serge Petrenko's avatar
      test: fix replication/election_qsync_stress test · bf0fbf3a
      Serge Petrenko authored
      The test involves writing synchronous transactions on one node and
      making other nodes confirm these transactions after its death.
      In order for the test to work properly we need to make sure the old
      node replicates all its transactions to peers before killing it.
      Otherwise, once the node is resurrected it'll have newer data, not
      present on the other nodes, which leads to their vclocks being
      incompatible, no one becoming the new leader, and the test hanging.
      
      Follow-up #5435
      bf0fbf3a
    • Serge Petrenko's avatar
      box: rework clear_synchro_queue to commit everything · 5c7dae44
      Serge Petrenko authored
      
      It is possible that a new leader (elected either via raft or manually or
      via some user-written election algorithm) loses the data that the old
      leader has successfully committed and confirmed.
      
      Imagine such a situation: there are N nodes in a replicaset, the old
      leader, denoted A, tries to apply some synchronous transaction. It is
      written on the leader itself and N/2 other nodes, one of which is B.
      The transaction has thus gathered quorum, N/2 + 1 acks.
      
      Now A writes CONFIRM and commits the transaction, but dies before the
      confirmation reaches any of its followers. B is elected the new leader and it
      sees that A's last transaction is present on N/2 nodes, so it doesn't have a
      quorum (A was one of the N/2 + 1).
      
      Current `clear_synchro_queue()` implementation makes B roll the transaction
      back, leading to rollback after commit, which is unacceptable.
      
      To fix the problem, make `clear_synchro_queue()` wait until all the rows from
      the previous leader gather `replication_synchro_quorum` acks.
      
      In case the quorum isn't achieved during replication_synchro_timeout,
      roll back nothing and wait for the user's intervention.
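
      An illustrative sketch with invented helpers (not the real box code):
      the new leader waits until every pending row gathers quorum acks; on
      timeout it rolls back nothing and leaves the queue untouched.

      #include <stdbool.h>
      #include <stdio.h>

      struct pending_row { long lsn; int ack_count; };

      /* Stand-in for "did more acks arrive within the timeout?". */
      static bool
      wait_for_acks(struct pending_row *row, double timeout)
      {
          (void)timeout;
          row->ack_count++; /* stub: pretend one more replica acked */
          return true;
      }

      static int
      clear_synchro_queue(struct pending_row *rows, int row_count,
                          int quorum, double timeout)
      {
          for (int i = 0; i < row_count; i++) {
              while (rows[i].ack_count < quorum) {
                  if (!wait_for_acks(&rows[i], timeout)) {
                      /* Quorum not reached in time: roll back nothing and
                       * wait for the user's intervention. */
                      fprintf(stderr, "quorum wait timed out at lsn %ld\n",
                              rows[i].lsn);
                      return -1;
                  }
              }
              /* Quorum collected: the row can be confirmed. */
          }
          return 0;
      }

      int main(void)
      {
          struct pending_row rows[] = { { 101, 1 }, { 102, 0 } };
          int rc = clear_synchro_queue(rows, 2, /*quorum=*/2, /*timeout=*/5.0);
          printf("rc = %d\n", rc);
          return 0;
      }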
      
      Closes #5435
      
      Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      5c7dae44
    • Serge Petrenko's avatar
      txn_limbo: introduce txn_limbo_last_synchro_entry method · 618e8269
      Serge Petrenko authored
      It'll be useful for box_clear_synchro_queue rework.
      
      Prerequisite #5435
      618e8269
    • Vladislav Shpilevoy's avatar
      replication: introduce on_ack trigger · 0941aaa1
      Vladislav Shpilevoy authored
      The trigger is fired every time any of the relays notifies tx of a
      change in a replica's known vclock.
      
      The trigger will be used to collect synchronous transactions quorum for
      old leader's transactions.
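
      A generic sketch of the mechanism (plain function pointers instead of
      the actual trigger API): every time a relay reports a replica's newer
      vclock, all registered on_ack callbacks are run, so the limbo can
      count acks for the old leader's rows.

      #include <stdio.h>

      enum { MAX_TRIGGERS = 8 };

      typedef void (*on_ack_f)(unsigned replica_id, long vclock_sum);

      static on_ack_f on_ack_triggers[MAX_TRIGGERS];
      static int on_ack_count;

      static void
      on_ack_register(on_ack_f cb)
      {
          if (on_ack_count < MAX_TRIGGERS)
              on_ack_triggers[on_ack_count++] = cb;
      }

      /* Called from a relay whenever its replica reports a newer vclock. */
      static void
      relay_report_ack(unsigned replica_id, long vclock_sum)
      {
          for (int i = 0; i < on_ack_count; i++)
              on_ack_triggers[i](replica_id, vclock_sum);
      }

      static void
      limbo_on_ack(unsigned replica_id, long vclock_sum)
      {
          printf("replica %u acked up to %ld\n", replica_id, vclock_sum);
      }

      int main(void)
      {
          on_ack_register(limbo_on_ack);
          relay_report_ack(2, 105);
          return 0;
      }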
      
      Part of #5435
      0941aaa1