  1. Oct 14, 2020
    • raft: introduce on_update trigger · 43d42969
      Vladislav Shpilevoy authored
      The Raft state machine now has a trigger invoked each time any of
      the visible Raft attributes changes: state, term, vote.
      
      The trigger is needed to commit synchronous transactions of an old
      leader, when a new leader is elected. This is done via a trigger
      so as not to depend on box in raft code too much. That would make
      it harder to extract it into a new module later.
      
      The trigger is executed in the Raft worker fiber, so as not to
      interrupt the state machine transitions, which currently don't
      contain a single yield. The synchronous transaction queue
      clearance, in contrast, requires a yield to write CONFIRM and
      ROLLBACK records to WAL.
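
A minimal Python model of the idea described above, not Tarantool's C code. It assumes the state machine only marks itself as changed, and a separate worker step runs the triggers later, where yielding would be allowed. The names RaftModel, on_update, and run_worker are hypothetical.

```python
class RaftModel:
    """Toy model: attribute changes are recorded, triggers run later in a worker."""

    def __init__(self):
        self.state = "follower"
        self.term = 1
        self.vote = 0
        self._on_update = []      # registered trigger callbacks
        self._is_updated = False  # set when a visible attribute changes

    def on_update(self, callback):
        self._on_update.append(callback)

    def _set(self, **attrs):
        # State transitions never call triggers directly, so they never yield.
        for name, value in attrs.items():
            if getattr(self, name) != value:
                setattr(self, name, value)
                self._is_updated = True

    def become_leader(self):
        self._set(state="leader")

    def start_new_term(self, term):
        self._set(term=term, vote=0, state="follower")

    def run_worker(self):
        # Stand-in for the worker fiber: the only place where triggers run.
        if self._is_updated:
            self._is_updated = False
            for callback in self._on_update:
                callback(self)


events = []
raft = RaftModel()
raft.on_update(lambda r: events.append((r.state, r.term, r.vote)))
raft.start_new_term(2)
raft.become_leader()
assert events == []  # nothing runs during the transitions themselves
raft.run_worker()
assert events == [("leader", 2, 0)]
```

In this model several transitions are coalesced into one trigger invocation; whether the real implementation coalesces them the same way is not stated in the commit message.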
      
      Part of #5339
    • raft: new candidate should wait for leader death · 6b43f103
      Vladislav Shpilevoy authored
      When a raft node was configured to be a candidate via
      election_mode, it didn't do anything if there was an active
      leader.
      
      But it should have started monitoring its health in order to
      initiate a new election round when it dies.
      
      The patch fixes this bug. It does not contain a test, because it
      will be covered by a test for #5339.
      
      Needed for #5339
    • raft: factor out the code to wakeup worker fiber · da4998ea
      Vladislav Shpilevoy authored
      Raft has a worker fiber to perform async tasks such as WAL writes
      and state broadcast.
      
      The worker was created and woken up from 2 places, which led at
      least to code duplication. The patch wraps this into a new
      function raft_worker_wakeup(), and uses it.
      
      The patch is not needed for anything functional, but was created
      while working on #5339 and trying ideas. It seems to be a good
      refactoring that makes the code simpler, and therefore it is
      submitted.
    • test: add '_stress' suffix to election_qsync test · bd0da669
      Vladislav Shpilevoy authored
      The test is long, about 10 seconds, but its name is too general.
      The name would be better used for a simpler, more basic test. This
      is going to happen in the next commits.
      
      election_qsync.test.lua will check if the election and qsync work
      fine together without any stress cases.
      
      Needed for #5339
    • test: flaky vinyl/gh.test.lua fails on 427 line · 77848aa8
      Alexander V. Tikhonov authored
      Added a new checksum for the flaky failure at vinyl/gh.test.lua:427.
      
      Part of #5141
    • test: flaky replication/replica_rejoin.test.lua · d9d1deac
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following issue was found:
      
        [151] --- replication/replica_rejoin.result     Tue Sep 29 10:57:26 2020
        [151] +++ replication/replica_rejoin.reject     Tue Sep 29 10:57:48 2020
        [151] @@ -230,7 +230,12 @@
        [151]      return box.info ~= nil and box.info.replication[1] ~= nil
        [151]  end)
        [151]  ---
        [151] -- true
        [151] +- error: "builtin/box/load_cfg.lua:601: Please call box.cfg{} first\nstack traceback:\n\tbuiltin/box/load_cfg.lua:601:
        [151] +    in function '__index'\n\t[string \"return test_run:wait_cond(function()         ...\"]:1:
        [151] +    in function 'cond'\n\t/tmp/tnt/151_replication/test_run.lua:411: in function </tmp/tnt/151_replication/test_run.lua:404>\n\t[C]:
        [151] +    in function 'pcall'\n\tbuiltin/box/console.lua:402: in function 'eval'\n\tbuiltin/box/console.lua:708:
        [151] +    in function 'repl'\n\tbuiltin/box/console.lua:842: in function <builtin/box/console.lua:828>\n\t[C]:
        [151] +    in function 'pcall'\n\tbuiltin/socket.lua:1081: in function <builtin/socket.lua:1079>"
        [151]  ...
        [151]  test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
        [151]  ---
        [151]
      
      It happened because box.cfg was not ready to provide information. In
      fact, there is no need for a local check that the replication
      information is available, because the wait_upstream() function used
      below does this check itself.
      
      Part of #4985
  2. Oct 13, 2020
    • build: another fix for luajit-tap tests cmake · e7203e38
      Igor Munkin authored
      
      Fixes the regression from e5039742
      ('luajit: bump new version').
      
      Reported-by: Alexander Tikhonov <avtikhon@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
    • key_def: support composite types extraction · 9a8ac59c
      Ilya Kosarev authored
      key_def didn't support key definitions with array, map, varbinary &
      any fields, so such keys couldn't be extracted with
      key_def_object:extract_key(). The restriction existed only because
      values of these types cannot be compared, so this patch removes it
      for field extraction and leaves it only for comparison.
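
A minimal Python model of the split described above, not the real key_def API. It assumes extraction accepts any field type while comparison still rejects the four types named in the message. KeyDefModel, its zero-based part layout, and the error text are hypothetical.

```python
NON_COMPARABLE = {"array", "map", "varbinary", "any"}  # types named in the commit message


class KeyDefModel:
    def __init__(self, parts):
        # parts: list of (zero-based field index, type name); a hypothetical layout
        self.parts = parts

    def extract_key(self, tup):
        # Extraction only copies fields, so any field type is acceptable.
        return [tup[fieldno] for fieldno, _ in self.parts]

    def compare(self, a, b):
        # Comparison still rejects types that have no defined ordering.
        for _, type_name in self.parts:
            if type_name in NON_COMPARABLE:
                raise TypeError(f"cannot compare fields of type '{type_name}'")
        ka, kb = self.extract_key(a), self.extract_key(b)
        return (ka > kb) - (ka < kb)


kd = KeyDefModel([(0, "unsigned"), (2, "map")])
assert kd.extract_key([7, "x", {"a": 1}]) == [7, {"a": 1}]
```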
      
      Closes #4538
    • luajit: bump new version · e5039742
      Kirill Yukhin authored
      * misc: add C and Lua API for platform metrics
      * core: introduce various platform metrics
    • Add flaky tests checksums to fragile 2nd part · 3bc455f7
      Alexander V. Tikhonov authored
      Added for tests with issues:
      
        app/socket.test.lua				gh-4978
        box/access.test.lua				gh-5411
        box/access_misc.test.lua			gh-5401
        box/gh-5135-invalid-upsert.test.lua		gh-5376
        box/hash_64bit_replace.test.lua test		gh-5410
        box/hash_replace.test.lua			gh-5400
        box/huge_field_map_long.test.lua		gh-5375
        box/net.box_huge_data_gh-983.test.lua		gh-5402
        replication/anon.test.lua			gh-5381
        replication/autoboostrap.test.lua		gh-4933
        replication/box_set_replication_stress.test.lua gh-4992
        replication/election_basic.test.lua		gh-5368
        replication/election_qsync.test.lua test	gh-5395
        replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
        replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
        replication/gh-5287-boot-anon.test.lua	gh-5412
        replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
        replication/show_error_on_disconnect.test.lua	gh-5371
        replication/status.test.lua			gh-5409
        swim/swim.test.lua				gh-5403
        unit/swim.test				gh-5399
        vinyl/gc.test.lua				gh-5383
        vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
        vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
        vinyl/gh.test.lua				gh-5141
        vinyl/quota.test.lua				gh-5377
        vinyl/snapshot.test.lua			gh-4984
        vinyl/stat.test.lua				gh-4951
        vinyl/upsert.test.lua				gh-5398
    • test: enable flaky tests on FreeBSD 12 · 8bcb6409
      Alexander V. Tikhonov authored
      Testing on FreeBSD 12 had some tests blocked to avoid flaky
      failures. Now test-run can tolerate them using checksums for
      failures that have open issues. So 7 tests are added back to
      testing on FreeBSD 12.
      
      Closes #4271
    • asan: add leak suppressions for flaky test · 389c12b4
      Alexander V. Tikhonov authored
      Flaky failures were observed in the test:
      
        replication/gh-3637-misc-error-on-replica-auth-fail.test.lua
      
      Found memory leaks:
      
      [093] Last 15 lines of Tarantool Log file [Instance "replica_auth"][/builds/DtQXhC5e/0/tarantool/tarantool/test/var/093_replication/replica_auth.log]:
      [093]     #3 0xa13df8 in coio_on_call /builds/DtQXhC5e/0/tarantool/tarantool/src/lib/core/coio_task.c:264:16
      [093]     #4 0xfcedbe in eio_execute /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/eio.c:2015:9
      [093]     #5 0xfcedbe in etp_proc /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/etp.c:373
      [093]     #6 0x7f8c8260ffa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
      [093]
      [093] Indirect leak of 4 byte(s) in 1 object(s) allocated from:
      [093]     #0 0x525dfa in calloc (/builds/DtQXhC5e/0/tarantool/tarantool/src/tarantool+0x525dfa)
      [093]     #1 0xa2eb4a in mh_i64ptr_new /builds/DtQXhC5e/0/tarantool/tarantool/src/lib/salad/mhash.h:408:22
      [093]     #2 0x8a516d in vy_recovery_new_f /builds/DtQXhC5e/0/tarantool/tarantool/src/box/vy_log.c:2321:23
      [093]     #3 0xa13df8 in coio_on_call /builds/DtQXhC5e/0/tarantool/tarantool/src/lib/core/coio_task.c:264:16
      [093]     #4 0xfcedbe in eio_execute /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/eio.c:2015:9
      [093]     #5 0xfcedbe in etp_proc /builds/DtQXhC5e/0/tarantool/tarantool/third_party/libeio/etp.c:373
      [093]     #6 0x7f8c8260ffa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
      
      To stabilize testing, these leaks were added as suppressions to the
      asan list.
      
      Part of #5343
    • test: move error messages into logs gh-5383 · fa66c295
      Alexander V. Tikhonov authored
      Set error message to log output in test:
      
        vinyl/gc.test.lua
    • test: move error messages into logs gh-4984 · e95aec95
      Alexander V. Tikhonov authored
      Set error message to log output in test:
      
        vinyl/snapshot.test.lua
    • test: move error messages into logs gh-5366 · c34d1d67
      Alexander V. Tikhonov authored
      Set error message to log output in test:
      
        replication/gh-4402-info-errno.test.lua
    • test: move error messages into logs gh-4985 · ca0c2799
      Alexander V. Tikhonov authored
      Set error message to log output in test:
      
        replication/replica_rejoin.test.lua
    • test: move error messages into logs gh-4940 · 1433ed8e
      Alexander V. Tikhonov authored
      Set error message to log output in test:
      
        replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
    • test-run: bump new version · 69b5759c
      Kirill Yukhin authored
      * Restart server on each failed test in worker
  3. Oct 12, 2020
    • raft: introduce election_mode configuration option · 24974f36
      Vladislav Shpilevoy authored
      The new option can have one of 3 values: 'off', 'candidate',
      'voter'. It replaces 2 old options: election_is_enabled and
      election_is_candidate. These flags looked strange in that it was
      possible to set candidate to true but disable election at the
      same time. They also would not fit well if we ever decided to
      introduce another mode, for example a data-less sentinel node
      used just for voting.
      
      Anyway, the single option approach looks easier to configure and
      to extend.
      
      - 'off' means the election is disabled on the node. It is the same
        as election_is_enabled = false in the old config;
      
      - 'voter' means the node can vote and is never writable. The same
        as election_is_enabled = true + election_is_candidate = false in
        the old config;
      
      - 'candidate' means the node is a full-featured cluster member,
        which eventually may become a leader. The same as
        election_is_enabled = true + election_is_candidate = true in the
        old config.
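
The mapping in the list above can be written as a small Python function. This is only an illustration of the correspondence between the old flags and the new option; the function name election_mode_from_flags is hypothetical and nothing like it is claimed to exist in Tarantool.

```python
def election_mode_from_flags(election_is_enabled, election_is_candidate):
    """Translate the two old boolean options into the single new mode string."""
    if not election_is_enabled:
        # The candidate flag is irrelevant when election is disabled.
        return "off"
    return "candidate" if election_is_candidate else "voter"


assert election_mode_from_flags(False, False) == "off"
assert election_mode_from_flags(True, False) == "voter"
assert election_mode_from_flags(True, True) == "candidate"
```

The odd combination of a disabled election with the candidate flag set collapses to 'off', which is the ambiguity the single option removes.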
      
      Part of #1146
  4. Oct 07, 2020
    • Introduce fselect - formatted select · 0dc72812
      Aleksandr Lyapunov authored
      space:fselect and index:fselect fetch data like an ordinary select,
      but format the result like mysql does - with columns, column
      names etc. fselect converts tuples to strings using json,
      padding with spaces and cutting the tail if necessary. It is
      designed for visual analysis of a select result and shouldn't
      be used in stored procedures.
      
      index:fselect(<key>, <opts>, <fselect_opts>)
      space:fselect(<key>, <opts>, <fselect_opts>)
      
      There are some options that can be specified in different ways:
       - among other common options (<opts>) with 'fselect_' prefix.
         (e.g. 'fselect_type=..')
       - in special <fselect_opts> map (with or without prefix).
       - in global variables with 'fselect_' prefix.
      
      The possible options are:
       - type:
          - 'sql' - like mysql result (default).
          - 'gh' (or 'github' or 'markdown') - markdown syntax, for
            copy-pasting to github.
          - 'jira' - jira table syntax (for copy-pasting to jira).
       - widths: array with desired widths of columns.
       - max_width: limit entire length of a row string, longest fields
         will be cut if necessary. Set to 0 (default) to detect and use
         screen width. Set to -1 for no limit.
       - print: (default - false) - print each line instead of adding
         to result.
       - use_nbsp: (default - true) - add invisible spaces to improve
         readability in YAML output. Not applicable when print=true.
      
      There is also a pair of shortcuts:
      index/space:gselect - same as fselect, but with type='gh'.
      index/space:jselect - same as fselect, but with type='jira'.
      
      See test/engine/select.test.lua for examples.
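
A rough Python sketch of what the three 'type' values could produce. The commit message does not show the exact output of fselect, so the layouts below are assumptions based on the usual MySQL client, GitHub markdown, and Jira table conventions. format_table and table_type are hypothetical names, and options such as widths, max_width, and use_nbsp are left out.

```python
import json


def pad(values, widths):
    return [v.ljust(w) for v, w in zip(values, widths)]


def format_table(names, rows, table_type="sql"):
    """Render rows as a list of text lines; cell values are JSON-encoded."""
    cells = [[json.dumps(v) for v in row] for row in rows]
    widths = [max([len(n)] + [len(r[i]) for r in cells])
              for i, n in enumerate(names)]
    header = pad(names, widths)
    body = [pad(r, widths) for r in cells]
    if table_type in ("gh", "github", "markdown"):
        sep = ["-" * w for w in widths]
        return ["| " + " | ".join(line) + " |" for line in [header, sep] + body]
    if table_type == "jira":
        return (["|| " + " || ".join(header) + " ||"]
                + ["| " + " | ".join(r) + " |" for r in body])
    if table_type == "sql":
        border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
        return ([border, "| " + " | ".join(header) + " |", border]
                + ["| " + " | ".join(r) + " |" for r in body]
                + [border])
    raise ValueError("unknown table type: " + table_type)


print("\n".join(format_table(["id", "name"], [[1, "a"], [2, "bob"]])))
```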
      
      Closes #5161
    • fiber: fix build for disabled fiber top · ab64b120
      Sergey Kaplun authored
      When building without `ENABLE_FIBER_TOP`, `struct fiber` has no
      `clock_stat` field and the `FIBER_TIME_RES` constant is not
      defined.
      This patch adds the corresponding ifdef directive to avoid
      compilation errors.
  5. Oct 06, 2020
  6. Oct 02, 2020
    • lua: abort trace recording on fiber yield · 2711797b
      Igor Munkin authored
      
      Since Tarantool fibers don't respect the Lua coroutine switch
      mechanism, the JIT machinery is not notified when one lua_State
      substitutes another one. As a result, if trace recording hasn't been
      aborted prior to a fiber switch, the recording proceeds using the new
      lua_State and leads to a failure either in a later compiler phase or
      while the compiled trace is executed.
      
      This changeset extends the <cord_on_yield> routine to abort trace
      recording when the fiber switches to another one. If the switch-over
      occurs while mcode is being run, the platform finishes its execution
      with the EXIT_FAILURE code and calls the panic routine prior to the
      exit.
      
      Closes #1700
      Fixes #4491
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
    • fiber: introduce a callback for fibers switch-over · a390ec55
      Igor Munkin authored
      
      Tarantool integrates several complex environments together and there are
      issues occurring at their junction leading to the platform failures.
      E.g. fiber switch-over is implemented outside the Lua world, so when one
      lua_State substitutes another one, main LuaJIT engines, such as JIT and
      GC, are left unnotified leading to the further platform misbehaviour.
      
      To solve this severe integration drawback <cord_on_yield> function is
      introduced. This routine encloses the checks and actions to be done when
      the running fiber yields the execution.
      
      Unfortunately, the way the callback is implemented introduces a
      circular dependency. Because of how the linker resolves symbols in a
      static build, an auxiliary translation unit that mocks (i.e. exports)
      the otherwise undefined <cord_on_yield> symbol is added to the
      affected tests.
      
      Part of #1700
      Relates to #4491
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
  7. Oct 01, 2020
  8. Sep 30, 2020
  9. Sep 29, 2020
    • raft: add tests · cf799645
      Vladislav Shpilevoy authored
      Part of #1146
    • raft: introduce box.info.election · 15fc8449
      Vladislav Shpilevoy authored
      box.info.election returns a table of the form:
      
          {
              state: <string>,
              term: <number>,
              vote: <instance ID>,
              leader: <instance ID>
          }
      
      The fields correspond one to one to the Raft concepts of the same
      names. This info dump is supposed to help with the tests, first of
      all, and with investigation of problems in a real cluster.
      
      The API doesn't mention 'Raft' on purpose, to keep it not
      depending specifically on Raft, and not to confuse users who
      don't know anything about Raft (even that it is about leader
      election and synchronous replication).
      
      Part of #1146
    • raft: introduce state machine · 27399889
      Vladislav Shpilevoy authored
      The commit is a core part of Raft implementation. It introduces
      the Raft state machine implementation and its integration into the
      instance's life cycle.
      
      The implementation follows the protocol to the letter except for a
      few important details.
      
      Firstly, the original Raft assumes that all nodes share the same
      log record numbers. In Tarantool they are called LSNs. But in
      Tarantool each node has its own LSN in its own component of
      vclock. That makes the election messages a bit heavier, because
      the nodes need to send and compare complete vclocks of each other
      instead of a single number as in the original Raft. But the logic
      becomes simpler, because in the original Raft there is
      uncertainty about what to do with records of an old leader right
      after a new leader is elected: they could be rolled back or
      confirmed depending on circumstances. The issue disappears when
      vclock is used.
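
A minimal Python sketch of comparing complete vclocks instead of a single log index. A vclock is modeled as a dict from replica ID to LSN, with missing components treated as 0. The vote rule in can_vote_for is an assumption modeled on Raft's "candidate's log is at least as up to date" check; the commit message does not spell out the exact rule, and both function names are hypothetical.

```python
def vclock_compare(a, b):
    """Return -1, 0, or 1 if a is behind, equal to, or ahead of b; None if incomparable."""
    ids = set(a) | set(b)
    le = all(a.get(i, 0) <= b.get(i, 0) for i in ids)
    ge = all(a.get(i, 0) >= b.get(i, 0) for i in ids)
    if le and ge:
        return 0
    if le:
        return -1
    if ge:
        return 1
    return None  # each side has records the other one lacks


def can_vote_for(candidate_vclock, own_vclock):
    # Assumed rule: grant a vote only if the candidate is not behind us
    # in any component of the vclock.
    return vclock_compare(candidate_vclock, own_vclock) in (0, 1)


assert can_vote_for({1: 10, 2: 6}, {1: 10, 2: 5})
assert not can_vote_for({1: 11, 2: 5}, {1: 10, 2: 6})
```

Unlike a single number, two vclocks can be incomparable, which is why the comparison has a fourth outcome.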
      
      Secondly, leader election works differently during cluster
      bootstrap, until the number of bootstrapped replicas becomes >=
      the election quorum. That arises from the specifics of replica
      bootstrap and the order of systems initialization. In short:
      during bootstrap a leader election may use a smaller election
      quorum than the configured one. See more details in the code.
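
One way to picture the smaller bootstrap quorum, as a Python sketch. The formula is an assumption: the commit message only says the quorum may be smaller than the configured one and refers to the code for details. Both function names are hypothetical.

```python
def effective_election_quorum(configured_quorum, bootstrapped_replicas):
    # Assumption: an election cannot require more votes than there are
    # bootstrapped replicas, so the quorum is capped by their number.
    return min(configured_quorum, bootstrapped_replicas)


def has_won(votes, configured_quorum, bootstrapped_replicas):
    return votes >= effective_election_quorum(configured_quorum, bootstrapped_replicas)


# With a configured quorum of 3 but only 1 bootstrapped replica,
# a single vote is enough in this model.
assert has_won(1, configured_quorum=3, bootstrapped_replicas=1)
# Once 3 replicas are bootstrapped, the configured quorum applies.
assert not has_won(1, configured_quorum=3, bootstrapped_replicas=3)
```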
      
      Part of #1146
    • raft: relay status updates to followers · 67b60f08
      sergepetrenko authored
      The patch introduces a new type of system message used to notify the
      followers of the instance's raft status updates.
      It's relay's responsibility to deliver the new system rows to its peers.
      The notification system reuses and extends the same row type used to
      persist raft state in WAL and snapshot.
      
      Part of #1146
      Part of #5204
    • raft: introduce box.cfg.election_* options · 1d329f0b
      Vladislav Shpilevoy authored
      The new options are:
      
      - election_is_enabled - enable/disable leader election (via
        Raft). When disabled, the node is supposed to work as if Raft
        did not exist, as before;
      
      - election_is_candidate - a flag whether the instance can try to
        become a leader. Note, it can vote for other nodes regardless
        of the value of this option;
      
      - election_timeout - how long to wait for an election to end, in
        seconds.
      
      The options don't do anything now. They are added separately in
      order to keep such mundane changes out of the main Raft commit,
      to simplify its review.
      
      Option names don't mention 'Raft' on purpose, because
      - Not all users know what Raft is, so they may not even know it
        is related to leader election;
      - In the future the algorithm may change from Raft to something
        else, so it is better not to depend on it too much in the
        public API.
      
      Part of #1146
    • raft: introduce persistent raft state · 4f0f7c8f
      Vladislav Shpilevoy authored
      The patch introduces a skeleton of the Raft module and a method to
      persist the Raft state in a snapshot, not bound to any space.
      
      Part of #1146