  1. Jul 23, 2024
  2. Jul 22, 2024
  3. Jul 19, 2024
    • perf/lua: add context section to test output · a3ef8fb6
      Sergey Bronnikov authored
      The Google Benchmark output format contains a "context" section
      that describes the test environment.
      
      The Google Benchmark output format has been supported in Lua
      microbenchmarks since commit 3110ef9a
      ("perf: introduce benchmark.lua helper module"). However, the produced
      output contains test results only, and the "context" section is missing.
      The patch adds a "context" section with the following fields:
      date, load average, hostname, Tarantool version, build flags,
      and the name of the build target.
      
      ```
      $ tarantool uri_escape_unescape.lua --output=res.json --output_format=json
      $ jq ".context" res.json
      {
        "build_target": "Linux-x86_64-RelWithDebInfo",
        "host_name": "pony",
        "date": "2024-07-04 19:09:11",
        "tarantool_version": "3.2.0-entrypoint-114-g9e5dca29ad",
        "build_flags": " -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/sergeyb/sources/MRG/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -Wno-cast-function-type -O2 -g -DNDEBUG -ggdb -O2 ",
        "load_avg": [
          "0.76",
          "0.74",
          "0.63"
        ]
      }
      ```
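      
      For illustration only, a minimal Python sketch of how a context with
      the same field names can be assembled. The function name `make_context`
      and its parameters are hypothetical; the actual patch is written in Lua.
      
      ```python
      import json
      import os
      import socket
      from datetime import datetime


      def make_context(tarantool_version, build_flags, build_target):
          """Collect the fields listed in the commit message (hypothetical helper)."""
          load1, load5, load15 = os.getloadavg()  # 1, 5 and 15 minute averages
          return {
              "date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
              "host_name": socket.gethostname(),
              # The sample output above stores load averages as strings.
              "load_avg": [f"{x:.2f}" for x in (load1, load5, load15)],
              "tarantool_version": tarantool_version,
              "build_flags": build_flags,
              "build_target": build_target,
          }


      context = make_context("3.2.0-entrypoint", "-O2 -g", "Linux-x86_64-RelWithDebInfo")
      print(json.dumps({"context": context}, indent=2))
      ```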
      
      NO_CHANGELOG=perf
      NO_DOC=perf
      NO_TEST=perf
      a3ef8fb6
    • perf/lua: protect require of column module · 18661ed7
      Sergey Bronnikov authored
      NO_CHANGELOG=perf
      NO_DOC=perf
      NO_TEST=perf
      18661ed7
    • ci: enable BENCH_CMD · 3951a88c
      Sergey Bronnikov authored
      The patch enables the environment variable `BENCH_CMD` introduced in
      a previous commit. `taskset` alone pins all the process
      threads to a single (random) isolated CPU; there is a ticket [1]
      about this in the Linux kernel bug tracker. The workaround is to use
      a realtime scheduler for the isolated task via `chrt` [2], e.g.:
      `taskset 0xef chrt 50`.
      
      1. https://bugzilla.kernel.org/show_bug.cgi?id=116701
      2. https://www.man7.org/linux/man-pages/man1/chrt.1.html
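      
      As a worked example, the affinity mask `0xef` from the command above
      can be decoded into CPU numbers (the lowest bit is CPU 0). The helper
      name `cpus_from_mask` is hypothetical and not part of the patch.
      
      ```python
      def cpus_from_mask(mask):
          """Return the CPU numbers whose bits are set in a taskset-style mask."""
          return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


      # 0xef is 0b11101111: every CPU from 0 to 7 except CPU 4.
      print(cpus_from_mask(0xEF))  # [0, 1, 2, 3, 5, 6, 7]
      ```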
      
      NO_CHANGELOG=performance testing
      NO_DOC=performance testing
      NO_TEST=performance testing
      3951a88c
    • perf: introduce BENCH_CMD environment variable · 03317b16
      Sergey Bronnikov authored
      The patch introduces BENCH_CMD, an environment variable that
      can be set at the CMake configuration stage to a string with a
      command and its arguments; this string is then used as a pre-command
      for executing performance tests. Examples of such commands are
      `taskset` [1] and `numactl` [2], or any other utility, see [3].
      
      1. https://man7.org/linux/man-pages/man1/taskset.1.html
      2. https://man7.org/linux/man-pages/man8/numactl.8.html
      3. https://github.com/tarantool/tarantool/wiki/Benchmarking#run-benchmarks
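      
      The pre-command idea can be modelled in a few lines of Python. This is
      only a sketch: the real patch wires BENCH_CMD into CMake, and the
      function name `with_bench_cmd` is hypothetical.
      
      ```python
      import shlex


      def with_bench_cmd(bench_cmd, test_argv):
          """Prepend the BENCH_CMD pre-command (if any) to a test command line."""
          prefix = shlex.split(bench_cmd) if bench_cmd else []
          return prefix + list(test_argv)


      argv = with_bench_cmd("taskset 0xef chrt 50",
                            ["tarantool", "uri_escape_unescape.lua"])
      print(argv)
      ```
      
      With an empty BENCH_CMD the test command line is left unchanged.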
      
      NO_CHANGELOG=performance infra
      NO_DOC=performance infra
      NO_TEST=performance infra
      03317b16
    • perf: add a script for setting environment · f3ca5c93
      Sergey Bronnikov authored
      The "Benchmarking" article [0] in Tarantool's wiki contains a lot of
      recommendations that help set up the Linux operating system and
      avoid potential reproducibility pitfalls when executing
      performance tests in a Linux-based environment. These
      recommendations are written in plain text with examples of commands
      that can be executed manually. We want to execute benchmarks
      automatically and continuously, therefore we need a way to
      set up the test environment automatically before running
      benchmarks.
      
      There are many guides with benchmarking tips, but unfortunately
      there is no script that performs these steps automatically.
      I found only the temci [1] and pyperf (`pyperf system` [3]) projects.
      
      The patch adds a script for setting up the environment before running
      performance tests. All settings used in the proposed script are
      described in the article [0]. Note that uncertain settings were
      not implemented.
      
      0. https://github.com/tarantool/tarantool/wiki/Benchmarking
      1. https://github.com/parttimenerd/temci
      2. https://github.com/travisdowns/uarch-bench/blob/master/uarch-bench.sh
      3. https://pyperf.readthedocs.io/en/latest/cli.html#system-cmd
      
      NO_CHANGELOG=performance
      NO_DOC=performance
      NO_TEST=performance
      f3ca5c93
  4. Jul 18, 2024
    • vinyl: wake up waiters after clearing checkpoint_in_progress flag · fc3196dc
      Vladimir Davydov authored
      The function `vy_space_build_index`, which builds a new index on DDL,
      calls `vy_scheduler_dump` on completion. If there's a checkpoint in
      progress, the latter will wait on `vy_scheduler::dump_cond` until
      `vy_scheduler::checkpoint_in_progress` is cleared. The problem is
      `vy_scheduler_end_checkpoint` doesn't broadcast `dump_cond` when it
      clears the flag. Usually, everything works fine because the condition
      variable is broadcast on any dump completion, and vinyl checkpoint
      implies a dump, but under certain conditions this may lead to a fiber
      hang. Let's broadcast `dump_cond` in `vy_scheduler_end_checkpoint`
      to be on the safe side.
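      
      The pattern behind the fix can be sketched with a Python condition
      variable. This is a simplified model, not the vinyl code: Tarantool uses
      fibers rather than threads, and the class and method names here are
      hypothetical stand-ins for the scheduler functions named above.
      
      ```python
      import threading


      class Scheduler:
          def __init__(self):
              self.dump_cond = threading.Condition()
              self.checkpoint_in_progress = False

          def begin_checkpoint(self):
              with self.dump_cond:
                  self.checkpoint_in_progress = True

          def end_checkpoint(self):
              with self.dump_cond:
                  self.checkpoint_in_progress = False
                  # The fix: wake up everyone waiting for the flag to clear.
                  # Without this call a waiter sleeps until some unrelated
                  # notification (or its timeout) arrives.
                  self.dump_cond.notify_all()

          def dump(self, timeout=None):
              """Wait until no checkpoint is in progress; True on success."""
              with self.dump_cond:
                  return self.dump_cond.wait_for(
                      lambda: not self.checkpoint_in_progress, timeout=timeout)


      scheduler = Scheduler()
      scheduler.begin_checkpoint()
      waiter = threading.Thread(target=scheduler.dump, kwargs={"timeout": 5})
      waiter.start()
      scheduler.end_checkpoint()  # wakes the waiter promptly
      waiter.join()
      ```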
      
      While we are at it, let's also inject a dump delay into the original
      test to make it more robust.
      
      Closes #10267
      Follow-up #10234
      
      NO_DOC=bug fix
      fc3196dc
  5. Jul 17, 2024
    • iproto: introduce FETCH_SNAPSHOT_CURSOR feature · 62c49367
      Nikita Zheleztsov authored
      This commit introduces FETCH_SNAPSHOT_CURSOR feature, which is available
      only in EE. The feature is not returned in response to IPROTO_ID and is
      not shown in box.iproto.protocol_features in Community Edition. Its id
      is shown only in box.iproto.feature, which is a list of all available
      features in the current version.
      
      Needed for tarantool/tarantool-ee#741
      
      NO_CHANGELOG=minor
      
      @TarantoolBot document
      Title: Document iproto feature FETCH_SNAPSHOT_CURSOR
      
      Root document: https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/#net-box-connect
      
      The FETCH_SNAPSHOT_CURSOR feature requires cursor FETCH_SNAPSHOT support
      on the server. Its ID is IPROTO_FEATURE_FETCH_SNAPSHOT_CURSOR. It requires
      IPROTO version 8 or later and Enterprise Edition.
      62c49367
    • engine: introduce stubs for checkpoint FETCH_SNAPSHOT · 2fca5c13
      Nikita Zheleztsov authored
      This commit introduces engine stubs that enable a new method
      of fetching snapshots for anonymous replicas. Instead of using
      the traditional read-view join approach, this update allows
      file snapshot fetching. Note that file snapshot fetching
      is only available in Tarantool EE.
      
      Checkpoint fetching is controlled via the IPROTO_IS_CHECKPOINT_JOIN,
      IPROTO_CHECKPOINT_VCLOCK and IPROTO_CHECKPOINT_LSN fields.
      
      If IPROTO_IS_CHECKPOINT_JOIN is set to true, the join is done from
      files (.snap for memtx, .run for vinyl); if false, from a read view.
      
      Checkpoint join allows the client to continue from the place where it
      stopped if snapshot fetching fails, which avoids rebootstrapping
      an anonymous client. This is done by specifying CHECKPOINT_VCLOCK,
      which tells the server which file to continue the join from; the client
      gets this vclock at the beginning of the join. Specifying
      CHECKPOINT_LSN allows continuing from some position in the checkpoint:
      the server sends all data >= CHECKPOINT_LSN.
      
      If CHECKPOINT_VCLOCK is not specified, fetching is done from the latest
      available checkpoint. If CHECKPOINT_LSN is not specified, fetching starts
      from the beginning of the snap. So, specifying only IS_CHECKPOINT_JOIN
      triggers fetching the latest checkpoint from files.
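      
      The field combinations described above can be summarized as a small
      decision function. This is a Python model of the rules as stated in
      this message only; `plan_join` and the returned dictionary keys are
      hypothetical and do not reflect the EE implementation.
      
      ```python
      def plan_join(is_checkpoint_join=False, checkpoint_vclock=None,
                    checkpoint_lsn=None):
          """Describe where the join data comes from for a given request."""
          if not is_checkpoint_join:
              return {"source": "read view"}
          return {
              "source": "checkpoint files",
              # No vclock given: use the latest available checkpoint.
              "checkpoint": "latest" if checkpoint_vclock is None
                            else checkpoint_vclock,
              # No LSN given: start from the beginning of the file,
              # otherwise send all data >= checkpoint_lsn.
              "start": "beginning" if checkpoint_lsn is None else checkpoint_lsn,
          }


      print(plan_join())
      print(plan_join(is_checkpoint_join=True))
      print(plan_join(True, checkpoint_vclock={0: 5, 1: 100}, checkpoint_lsn=42))
      ```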
      
      Needed for tarantool/tarantool-ee#741
      
      NO_DOC=ee
      NO_TEST=ee
      NO_CHANGELOG=ee
      2fca5c13
    • engine: send vclock with 0th component during join · 56058393
      Nikita Zheleztsov authored
      This commit makes the engine send the vclock without ignoring the 0th
      component during join, which is needed for checkpoint FETCH SNAPSHOT.
      
      Currently the engine join functions are invoked only from
      relay_initial_join, which is called during JOIN or FETCH SNAPSHOT.
      They respond with the vclock of the read view we're going to send.
      
      The following commit introduces checkpoint FETCH SNAPSHOT,
      which responds with the vclock of the checkpoint we're going to send.
      Such a vclock may include the 0th component, and it's crucial to send
      it to the client: in case of a connection failure, the client will send
      us the same vclock, and we'll have to use its signature to figure out
      which checkpoint the client wants.
      
      So, we have to send and receive the 0th component of the vclock during
      FETCH_SNAPSHOT. This commit also introduces decoding vclocks without
      ignoring the 0th component, as it will be used in the following commit
      too.
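      
      A small Python model shows why dropping the 0th component breaks the
      checkpoint lookup. It assumes that a vclock signature is the sum of all
      components, including the 0th; the function names are hypothetical and
      only mirror the "ignore0" naming used in this series.
      
      ```python
      def encode_vclock(vclock, ignore0):
          """Return the components that would go on the wire as {id: lsn}."""
          return {rid: lsn for rid, lsn in sorted(vclock.items())
                  if not (ignore0 and rid == 0)}


      def signature(vclock):
          """Assumed definition: the sum of all vclock components."""
          return sum(vclock.values())


      checkpoint_vclock = {0: 5, 1: 100, 2: 40}
      full = encode_vclock(checkpoint_vclock, ignore0=False)
      trimmed = encode_vclock(checkpoint_vclock, ignore0=True)

      # Only the full encoding round-trips to the checkpoint's signature.
      print(signature(full), signature(trimmed))  # 145 140
      ```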
      
      Needed for tarantool/tarantool-ee#741
      
      NO_DOC=internal
      NO_TEST=ee
      NO_CHANGELOG=internal
      56058393
    • xrow: rename xrow_encode_vclock · 313bd730
      Nikita Zheleztsov authored
      This commit renames xrow_encode_vclock to xrow_encode_vclock_ignore0,
      since the next commit will introduce encoding a vclock without ignoring
      the 0th component, which is needed when sending the response to a fetch
      snapshot request.
      
      This commit also removes an internal field inside the
      replication_request structure, as the following commit will use 'vclock'
      for encoding/decoding a vclock without ignoring the 0th component.
      
      Needed for tarantool/tarantool-ee#741
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      313bd730
    • relay: refactor relay_initial_join · 72cc2b3e
      Nikita Zheleztsov authored
      From now on, during initial join the memtx engine prepares the vclock,
      raft and limbo states, and also sends them during memtx_engine_join.
      
      This is done in order to simplify the code of initial join: the
      subsequent commit introduces checkpoint initial join, and we want the
      relay code to handle it the same way as read-view join, without
      confusing conditions.
      
      Needed for tarantool/tarantool-ee#741
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      72cc2b3e
    • engine: move raft and limbo states after system data in checkpoint · 3da31b83
      Nikita Zheleztsov authored
      Before this commit, raft and limbo states were written at the end of
      the checkpoint, which made them very costly to access.
      
      Checkpoint join needs to access the limbo and raft states in order to
      send them during the JOIN_META stage. We cannot use the latest states,
      as is done for read-view snapshot fetching: the states may be far ahead
      of the data written to the checkpoint that we're going to send.
      
      This commit moves raft and limbo states after the data from the system
      spaces but before user data. We cannot put them right at the beginning
      of the snapshot, because then we'd have to patch the recovery process,
      which currently relies strongly on the fact that system spaces are
      at the beginning of the snapshot (this was done in order to apply force
      recovery only to user data). If we patched the recovery process, then
      old versions, where it's unpatched, wouldn't be able to recover from
      snapshots made by the newer version, and snapshot compatibility would
      be broken.
      
      The current change is not breaking: old Tarantool versions can recover
      from a snapshot made by the newer one.
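      
      The cost difference can be illustrated with a toy model of the row
      order in a checkpoint. The function names, the "RAFT"/"LIMBO" markers
      and their relative order are assumptions made for this sketch only.
      
      ```python
      def layout_before(system_rows, user_rows):
          """Old order: states are written at the very end."""
          return system_rows + user_rows + ["RAFT", "LIMBO"]


      def layout_after(system_rows, user_rows):
          """New order: states follow system spaces, before user data."""
          return system_rows + ["RAFT", "LIMBO"] + user_rows


      def rows_read_until_states(layout):
          """How many rows a sequential reader scans to have both states."""
          return layout.index("LIMBO") + 1


      system = ["_schema", "_space", "_index"]
      user = [f"user_row_{i}" for i in range(1000)]

      print(rows_read_until_states(layout_before(system, user)))  # 1005
      print(rows_read_until_states(layout_after(system, user)))   # 5
      ```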
      
      Needed for tarantool/tarantool-ee#741
      
      NO_DOC=internal
      NO_CHANGELOG=internal
      3da31b83
  6. Jul 16, 2024
    • perf: fix warnings in column_scan_module.c · 4cac1677
      Ilya Verbin authored
      Fix the following warnings (with ENABLE_READ_VIEW defined):
      
      ```
      ./perf/lua/column_scan_module.c:59:18: error: unused variable ‘index_id’ [-Werror=unused-variable]
         59 |         uint32_t index_id = luaL_checkinteger(L, 2);
            |                  ^~~~~~~~
      
      ./perf/lua/column_scan_module.c:149:18: error: unused variable ‘index_id’ [-Werror=unused-variable]
        149 |         uint32_t index_id = luaL_checkinteger(L, 2);
            |                  ^~~~~~~~
      ```
      
      NO_DOC=perf test
      NO_TEST=perf test
      NO_CHANGELOG=perf test
      4cac1677
    • applier: fix assertion failure after split brain · 5ce010c5
      Nikita Zheleztsov authored
      After receiving an async transaction from an old term, applier_apply_tx
      exits without unlocking the latch. If the same applier then tries to
      subscribe for replication, it fails with an assertion, as the latch is
      already locked.
      
      Let's fix the function that raises the error so that it just sets
      diag and returns -1.
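      
      The failure mode and the fix can be modelled in Python. This is a
      sketch under assumptions: a `threading.Lock` stands in for the latch, an
      exception stands in for the raised error, and all names are hypothetical.
      
      ```python
      import threading


      class Applier:
          def __init__(self, term=5):
              self.latch = threading.Lock()
              self.term = term
              self.diag = None  # last error, set instead of raising

          def check_term_raising(self, row_term):
              if row_term < self.term:
                  raise RuntimeError("async transaction from an old term")

          def check_term(self, row_term):
              """Fixed style: record the error and return -1."""
              if row_term < self.term:
                  self.diag = "async transaction from an old term"
                  return -1
              return 0

          def apply_tx_buggy(self, row_term):
              self.latch.acquire()
              self.check_term_raising(row_term)  # raising skips the release
              self.latch.release()
              return 0

          def apply_tx_fixed(self, row_term):
              self.latch.acquire()
              rc = self.check_term(row_term)
              self.latch.release()  # reached on both success and error
              return rc


      applier = Applier()
      print(applier.apply_tx_fixed(row_term=1), applier.latch.locked())
      ```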
      
      Closes #10073
      
      NO_DOC=bugfix
      NO_CHANGELOG=no crash on release version
      5ce010c5
    • perf: add column insert test · e5c4bd63
      Ilya Verbin authored
      The test creates an empty space with 1000 nullable columns storing uint64
      values. Then it initializes a dataset that consists of 10 columns and
      1 million rows (the row count and both column counts are configurable),
      and inserts the dataset into the space.
      
      By default the test uses the serial C API, but one may switch to the
      Arrow API for batch insertion (the feature is exclusive to the
      Enterprise Edition).
      
      It's also possible to specify the engine and wal_mode to use (the
      defaults are memtx, write).
      
      Needed for tarantool/tarantool-ee#712
      
      NO_DOC=perf test
      NO_TEST=perf test
      NO_CHANGELOG=perf test
      e5c4bd63
    • third_party: initial import of arrow/abi.h · 8cd677da
      Ilya Verbin authored
      Needed for tarantool/tarantool-ee#712
      
      NO_DOC=for enterprise edition
      NO_TEST=for enterprise edition
      NO_CHANGELOG=for enterprise edition
      8cd677da
    • lua/utils: export luaL_pushnull and luaL_isnull functions · a6140a3e
      Ilya Verbin authored
      They are useful in C modules.
      
      Needed for tarantool/tarantool-ee#712
      
      @TarantoolBot document
      Title: Update C API reference > Module lua/utils
      Product: Tarantool
      Root documents: https://www.tarantool.io/en/doc/latest/dev_guide/reference_capi/utils/
      
      The following functions are missing from the documentation:
      
       * luaL_iscallable
       * luaL_iscdata
       * luaL_isnull
       * luaL_pushnull
       * luaT_call
       * luaT_checktuple
       * luaT_isdecimal
       * luaT_newdecimal
       * luaT_pushdecimal
       * luaT_toibuf
       * luaT_tolstring
       * luaT_tuple_encode
       * luaT_tuple_new
      
      See also: https://github.com/tarantool/doc/issues/2011
      a6140a3e
    • mpstream: introduce mpstream_encode_int64() helper · f8be986d
      Ilya Verbin authored
      Needed for tarantool/tarantool-ee#712
      
      NO_TEST=EE
      NO_DOC=internal
      NO_CHANGELOG=internal
      f8be986d
    • error: introduce ERRINJ_TUPLE_ALLOC_COUNTDOWN · 52926402
      Ilya Verbin authored
      Needed for tarantool/tarantool-ee#712
      
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      52926402
    • test: do not test errinj.info() output · dc0fd81c
      Ilya Verbin authored
      There is not much sense in testing it, and it is sensitive to source code
      changes, especially `ERRINJ_*_COUNTDOWN` injections, e.g. see commit
      697123d0 ("box: use maximal space id instead of _schema.max_id").
      
      Needed for tarantool/tarantool-ee#712
      
      NO_DOC=test
      NO_CHANGELOG=test
      dc0fd81c
    • sio: fix error message displaying bind address · a5214bfc
      Lev Kats authored
      Now the `sio_bind` function prints the address into the error message
      directly instead of relying on the `fd` passed to the failed `bind` call.
      
      Previously `sio_bind` used `sio_socketname_to_buffer` for the error
      message, effectively attempting to print the address bound to `fd`
      even though binding that address to that socket had failed in the
      first place.
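      
      The same idea in a Python sketch: when `bind` fails, the message is
      built from the address that was requested, not from what the socket
      reports about itself. The helper name `try_bind` and the message text
      are hypothetical and are not Tarantool's actual format.
      
      ```python
      import socket


      def try_bind(sock, addr):
          """Bind sock to addr; return None on success or an error message."""
          try:
              sock.bind(addr)
              return None
          except OSError as exc:
              host, port = addr
              # Use the requested address: the socket itself is still unbound,
              # so asking it for its name would report a meaningless address.
              return f"bind to {host}:{port} failed: {exc.strerror}"


      first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      try:
          first.bind(("127.0.0.1", 0))       # let the OS pick a free port
          busy_port = first.getsockname()[1]
          print(try_bind(second, ("127.0.0.1", busy_port)))
      finally:
          first.close()
          second.close()
      ```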
      
      Fixes #5925
      
      NO_DOC=bugfix
      NO_CHANGELOG=minor
      a5214bfc
    • test: cover split-brain during promote · 06b87e27
      Nikita Zheleztsov authored
      This test checks that when a PROMOTE from the previous term is
      encountered, we immediately notice the split-brain situation and break
      replication without corrupting data.
      
      Closes #9943
      
      NO_DOC=test
      NO_CHANGELOG=test
      06b87e27
  7. Jul 15, 2024
    • applier: drop apply_final_join_tx · da158b9b
      Vladislav Shpilevoy authored
      We can use the regular applier_apply_tx() instead: they do the same.
      The latter is just more protective, but it doesn't matter much in this
      case if the code does a few latch locks.
      
      The patch also drops an old test about a double-received row panic
      during final join. The reasoning is that exactly the same situation
      could happen during subscribe, but it was always filtered out by
      checking replicaset.applier.vclock and skipping duplicate rows.
      
      There doesn't seem to be a reason why final join must be any
      different. It is, after all, the same subscribe logic, but the received
      rows go into the replica's initial snapshot instead of xlogs. Now it
      even uses the same txn processing function, applier_apply_tx().
      
      The patch also moves setting the `replication_skip_conflict` option
      to after bootstrap is finished. In theory, final join could deliver
      a conflicting row, and it must not be ignored. The problem is that
      this can't be reproduced without illegal error injection
      (which would corrupt something in an unrealistic way). But let's
      move it below bootstrap anyway for clarity.
      
      Follow-up #10113
      
      NO_DOC=refactoring
      NO_CHANGELOG=refactoring
      da158b9b
    • box: make instance_vclock const · 19b2cc20
      Vladislav Shpilevoy authored
      No code besides box.cc can now update instance's vclock
      explicitly. That is a protection against hacks like #9916.
      
      Closes #10113
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      19b2cc20
    • box: make final join vclock update only in box.cc · fe338ed4
      Vladislav Shpilevoy authored
      The goal is to make sure that no files except box.cc can change
      instance_vclock_storage directly. That leads to all sorts of hacks
      which in turn lead to bugs - #9916 is a good example.
      
      Now applier on final join only sends rows into the journal. The
      journal then is handled by box.cc where vclock is properly
      updated.
      
      Part of #10113
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      fe338ed4
    • journal: extract journal_write_row from limbo · 7d10096c
      Vladislav Shpilevoy authored
      The function writes a single xrow into the journal in a blocking
      way. It isn't that simple, so it makes sense to keep it as a function,
      especially given that it will be used more in the next commit.
      
      Part of #10113
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      7d10096c
    • box: move recovery_journal creation · 2620eb9e
      Vladislav Shpilevoy authored
      The recovery journal uses the word "recovery" to say that it works with
      xlogs. For snapshot recovery there is bootstrap_journal. Let's use
      it during local snapshot recovery.
      
      The reasoning is that while right now there is no difference, in
      the next commits the recovery_journal will do more.
      
      Part of #10113
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      2620eb9e
    • box: move replicaset.vclock into instance_vclock · f1e8e4e1
      Vladislav Shpilevoy authored
      Storing the vclock of the instance in replicaset.vclock wasn't right.
      It wasn't the vclock of the whole replicaset. It was local to this
      instance. There is no such thing as a "replicaset vclock".
      
      The patch moves it to box.h/cc.
      
      Part of #10113
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      f1e8e4e1
    • applier: treat register txns like regular ones · 51751f87
      Vladislav Shpilevoy authored
      While waiting for registration (of a new ID or a name), the applier
      could keep applying master txns received before the registration
      was started. They could still be inside the WAL doing a
      disk write when the replica sends a register request.
      
      Before this commit, it could cause an assertion failure in debug
      and a double LSN error in release.
      
      The reason was that during the registration waiting the applier
      treated all incoming txns as "final join" txns. I.e. it wasn't
      checking if those txns were already received, but not committed
      yet.
      
      During normal subscribe process the appliers (potentially
      multiple) protect themselves from that by keeping track of the
      vclocks which are already applied and also being applied right now
      (replicaset.applier.vclock).
      
      Such protection ensures that receiving the same row from 2 appliers
      doesn't result in a double write. It also protects from the
      case when a txn is received and goes to the WAL, but then the applier
      reconnects, resubscribes, and gets the same txn again: it
      shouldn't be applied.
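      
      The protection described above boils down to comparing each incoming
      row's LSN with a vclock that is advanced as soon as the row is accepted,
      before its disk write completes. A simplified Python model, with
      hypothetical names and a list standing in for the WAL:
      
      ```python
      class ApplierState:
          def __init__(self):
              # replica id -> highest LSN applied or currently being applied
              self.vclock = {}
              self.journal = []

          def apply(self, replica_id, lsn, body):
              """Accept the row unless it was already received; True if accepted."""
              if lsn <= self.vclock.get(replica_id, 0):
                  return False  # duplicate: applied or still in flight
              self.vclock[replica_id] = lsn  # advance before the write ends
              self.journal.append((replica_id, lsn, body))
              return True


      state = ApplierState()
      print(state.apply(1, 10, "insert"))   # True: first time we see LSN 10
      print(state.apply(1, 10, "insert"))   # False: same row, second applier
      print(state.apply(1, 11, "replace"))  # True: next row
      ```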
      
      The patch makes the registration waiting after recovery
      work like subscribe. Registration during recovery would mean
      bootstrap via join. And outside of recovery it means the instance
      is already running.
      
      Closes #9916
      
      NO_DOC=bugfix
      51751f87
    • lua: shutdown tasks worker fiber · 6e403753
      Nikolay Shirokovskiy authored
      As this fiber was made a system one in commit bf620650 ("box: finish
      client fibers on shutdown"), we do not need the existing protection from
      cancelling. So first remove it. Then make it managed on shutdown.
      
      Note that we may have issues, as we finish this fiber too early. The
      tasks scheduled but not executed at this moment will never be executed.
      The same goes for tasks scheduled after the fiber is finished. Now
      that we don't use the worker fiber for swim gc, this will not cause
      leaks. And leaking an fd on Tarantool shutdown in fio is not a problem.
      
      Closes #9722
      
      NO_CHANGELOG=internal
      NO_DOC=internal
      6e403753