  1. Oct 13, 2020
    • Alexander V. Tikhonov's avatar
      Add flaky tests checksums to fragile 2nd part · 3bc455f7
      Alexander V. Tikhonov authored
      Added checksums for tests with known issues:
      
        app/socket.test.lua				gh-4978
        box/access.test.lua				gh-5411
        box/access_misc.test.lua			gh-5401
        box/gh-5135-invalid-upsert.test.lua		gh-5376
        box/hash_64bit_replace.test.lua		gh-5410
        box/hash_replace.test.lua			gh-5400
        box/huge_field_map_long.test.lua		gh-5375
        box/net.box_huge_data_gh-983.test.lua		gh-5402
        replication/anon.test.lua			gh-5381
        replication/autobootstrap.test.lua		gh-4933
        replication/box_set_replication_stress.test.lua gh-4992
        replication/election_basic.test.lua		gh-5368
        replication/election_qsync.test.lua		gh-5395
        replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
        replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
        replication/gh-5287-boot-anon.test.lua	gh-5412
        replication/gh-5298-qsync-recovery-snap.test.lua gh-5379
        replication/show_error_on_disconnect.test.lua	gh-5371
        replication/status.test.lua			gh-5409
        swim/swim.test.lua				gh-5403
        unit/swim.test				gh-5399
        vinyl/gc.test.lua				gh-5383
        vinyl/gh-4864-stmt-alloc-fail-compact.test.lua gh-5408
        vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
        vinyl/gh.test.lua				gh-5141
        vinyl/quota.test.lua				gh-5377
        vinyl/snapshot.test.lua			gh-4984
        vinyl/stat.test.lua				gh-4951
        vinyl/upsert.test.lua				gh-5398
      3bc455f7
    • Alexander V. Tikhonov's avatar
      test: enable flaky tests on FreeBSD 12 · 8bcb6409
      Alexander V. Tikhonov authored
      Some tests were previously disabled on FreeBSD 12 to avoid flaky
      failures. Now test-run can recognize such failures by matching the
      checksums of their results against open issues, so 7 tests are
      added back to testing on FreeBSD 12.
      
      Closes #4271
      8bcb6409
    • Alexander V. Tikhonov's avatar
      test: move error messages into logs gh-5383 · fa66c295
      Alexander V. Tikhonov authored
      Moved the error message to the log output in the test:
      
        vinyl/gc.test.lua
      fa66c295
    • Alexander V. Tikhonov's avatar
      test: move error messages into logs gh-4984 · e95aec95
      Alexander V. Tikhonov authored
      Moved the error message to the log output in the test:
      
        vinyl/snapshot.test.lua
      e95aec95
    • Alexander V. Tikhonov's avatar
      test: move error messages into logs gh-5366 · c34d1d67
      Alexander V. Tikhonov authored
      Moved the error message to the log output in the test:
      
        replication/gh-4402-info-errno.test.lua
      c34d1d67
    • Alexander V. Tikhonov's avatar
      test: move error messages into logs gh-4985 · ca0c2799
      Alexander V. Tikhonov authored
      Moved the error message to the log output in the test:
      
        replication/replica_rejoin.test.lua
      ca0c2799
    • Alexander V. Tikhonov's avatar
      test: move error messages into logs gh-4940 · 1433ed8e
      Alexander V. Tikhonov authored
      Moved the error message to the log output in the test:
      
        replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
      1433ed8e
  2. Oct 12, 2020
    • Vladislav Shpilevoy's avatar
      raft: introduce election_mode configuration option · 24974f36
      Vladislav Shpilevoy authored
      The new option can be one of 3 values: 'off', 'candidate',
      'voter'. It replaces the 2 old options election_is_enabled and
      election_is_candidate. Those flags looked strange: it was
      possible to set candidate to true but disable the election at
      the same time. They also would not scale well if we ever decided
      to introduce another mode, such as a data-less sentinel node
      that exists just for voting.
      
      Anyway, the single-option approach is easier to configure and
      to extend.
      
      - 'off' means the election is disabled on the node. It is the same
        as election_is_enabled = false in the old config;
      
      - 'voter' means the node can vote and is never writable. The same
        as election_is_enabled = true + election_is_candidate = false in
        the old config;
      
      - 'candidate' means the node is a full-featured cluster member,
        which eventually may become a leader. The same as
        election_is_enabled = true + election_is_candidate = true in the
        old config.
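      
      For illustration, a minimal sketch of configuring the new option
      (the listen port is an arbitrary assumption, not part of this
      commit):
      
        -- full-featured member: votes and may become a leader
        box.cfg{listen = 3301, election_mode = 'candidate'}
        -- read-only participant that only votes:
        -- box.cfg{election_mode = 'voter'}
        -- node that does not take part in the election at all:
        -- box.cfg{election_mode = 'off'}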
      
      Part of #1146
      24974f36
  3. Oct 07, 2020
    • Aleksandr Lyapunov's avatar
      Introduce fselect - formatted select · 0dc72812
      Aleksandr Lyapunov authored
      space:fselect and index:fselect fetch data like an ordinary
      select, but format the result the way mysql does - with columns,
      column names etc. fselect converts tuples to strings using json,
      padding them with spaces and cutting the tail if necessary. It is
      designed for visual analysis of a select result and shouldn't be
      used in stored procedures.
      
      index:fselect(<key>, <opts>, <fselect_opts>)
      space:fselect(<key>, <opts>, <fselect_opts>)
      
      There are some options that can be specified in different ways:
       - among other common options (<opts>) with 'fselect_' prefix.
         (e.g. 'fselect_type=..')
       - in special <fselect_opts> map (with or without prefix).
       - in global variables with 'fselect_' prefix.
      
      The possible options are:
       - type:
          - 'sql' - like mysql result (default).
          - 'gh' (or 'github' or 'markdown') - markdown syntax, for
            copy-pasting to github.
          - 'jira' - jira table syntax (for copy-pasting to jira).
       - widths: array with desired widths of columns.
       - max_width: limit entire length of a row string, longest fields
         will be cut if necessary. Set to 0 (default) to detect and use
         screen width. Set to -1 for no limit.
       - print: (default - false) - print each line instead of adding
         to result.
       - use_nbsp: (default - true) - add invisible spaces to improve
         readability in YAML output. Not applicable when print=true.
      
      There is also a pair of shortcuts:
      index/space:gselect - same as fselect, but with type='gh'.
      index/space:jselect - same as fselect, but with type='jira'.
      
      See test/engine/select.test.lua for examples.
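      
      For illustration, a rough usage sketch (the space, its contents
      and the exact rendering are hypothetical; only the call shapes
      follow the description above):
      
        s = box.schema.space.create('t')
        s:create_index('pk')
        s:replace{1, 'one'}
        s:replace{2, 'two'}
        -- sql-style table, the default type
        s:fselect()
        -- markdown table for copy-pasting to github, rows cut at 80 chars
        s:fselect(nil, nil, {type = 'gh', max_width = 80})
        -- the same via the shortcut
        s:gselect()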
      
      Closes #5161
      0dc72812
  4. Oct 06, 2020
  5. Oct 02, 2020
    • Igor Munkin's avatar
      lua: abort trace recording on fiber yield · 2711797b
      Igor Munkin authored
      
      Since Tarantool fibers don't respect the Lua coroutine switch
      mechanism, the JIT machinery stays unnotified when one lua_State
      substitutes another one. As a result, if trace recording hasn't
      been aborted prior to the fiber switch, the recording proceeds
      using the new lua_State and leads to a failure either in a
      further compiler phase or while the compiled trace is executed.
      
      This changeset extends the <cord_on_yield> routine to abort trace
      recording when the fiber switches to another one. If the
      switch-over occurs while mcode is being run, the platform finishes
      its execution with EXIT_FAILURE code and calls the panic routine
      prior to the exit.
      
      Closes #1700
      Fixes #4491
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
      2711797b
    • Igor Munkin's avatar
      fiber: introduce a callback for fibers switch-over · a390ec55
      Igor Munkin authored
      
      Tarantool integrates several complex environments, and there are
      issues occurring at their junction that lead to platform failures.
      E.g. fiber switch-over is implemented outside the Lua world, so
      when one lua_State substitutes another one, the main LuaJIT
      engines, such as JIT and GC, are left unnotified, leading to
      further platform misbehaviour.
      
      To solve this severe integration drawback the <cord_on_yield>
      function is introduced. This routine encloses the checks and
      actions to be done when the running fiber yields execution.
      
      Unfortunately, the way the callback is implemented introduces a
      circular dependency. Considering linker symbol resolution for the
      static build, an auxiliary translation unit is added to the
      particular tests mocking (i.e. exporting) the <cord_on_yield>
      undefined symbol.
      
      Part of #1700
      Relates to #4491
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
      a390ec55
  6. Oct 01, 2020
  7. Sep 29, 2020
    • Vladislav Shpilevoy's avatar
      raft: add tests · cf799645
      Vladislav Shpilevoy authored
      Part of #1146
      cf799645
    • Vladislav Shpilevoy's avatar
      raft: introduce box.info.election · 15fc8449
      Vladislav Shpilevoy authored
      Box.info.election returns a table of the form:
      
          {
              state: <string>,
              term: <number>,
              vote: <instance ID>,
              leader: <instance ID>
          }
      
      The fields correspond one to one to the same-named Raft concepts.
      This info dump is supposed to help with the tests, first of all,
      and with the investigation of problems in a real cluster.
      
      The API doesn't mention 'Raft' on purpose, to avoid tying it to
      Raft specifically and to avoid confusing users who don't know
      anything about Raft (even that it is about leader election and
      synchronous replication).
      
      Part of #1146
      15fc8449
    • Vladislav Shpilevoy's avatar
      raft: introduce box.cfg.election_* options · 1d329f0b
      Vladislav Shpilevoy authored
      The new options are:
      
      - election_is_enabled - enable/disable leader election (via
        Raft). When disabled, the node is supposed to work as if Raft
        did not exist, like before;
      
      - election_is_candidate - a flag telling whether the instance can
        try to become a leader. Note, it can vote for other nodes
        regardless of the value of this option;
      
      - election_timeout - how long to wait for the election to end,
        in seconds.
      
      The options don't do anything yet. They are added separately in
      order to keep such mundane changes out of the main Raft commit,
      to simplify its review.
      
      Option names don't mention 'Raft' on purpose, because
      - not all users know what Raft is, so they may not even know it
        is related to leader election;
      - in the future the algorithm may change from Raft to something
        else, so better not to depend on it too much in the public API.
      
      Part of #1146
      1d329f0b
  8. Sep 28, 2020
    • Roman Khabibov's avatar
      box: disallow to alter SQL view · c5cb8d31
      Roman Khabibov authored
      Ban the ability to modify a view at the box level. Since a view
      is in fact a named select, not a table, altering a view is not a
      valid operation.
      c5cb8d31
    • Alexander V. Tikhonov's avatar
      Add flaky tests checksums to fragile · 75ba744b
      Alexander V. Tikhonov authored
      Added checksums for tests with known issues:
        app/fiber.test.lua				gh-5341
        app-tap/debug.test.lua			gh-5346
        app-tap/http_client.test.lua			gh-5346
        app-tap/inspector.test.lua			gh-5346
        box/gh-2763-session-credentials-update.test.lua gh-5363
        box/hash_collation.test.lua			gh-5247
        box/lua.test.lua				gh-5351
        box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
        box/net.box_incompatible_index-gh-1729.test.lua gh-5360
        box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
        box/protocol.test.lua				gh-5247
        box/update.test.lua				gh-5247
        box-tap/net.box.test.lua			gh-5346
        replication/autobootstrap.test.lua		gh-4533
        replication/autobootstrap_guest.test.lua	gh-4533
        replication/ddl.test.lua			gh-5337
        replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
        replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5357
        replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
        replication/long_row_timeout.test.lua		gh-4351
        replication/on_replace.test.lua		gh-5344, gh-5349
        replication/prune.test.lua			gh-5361
        replication/qsync_advanced.test.lua		gh-5340
        replication/qsync_basic.test.lua		gh-5355
        replication/replicaset_ro_mostly.test.lua	gh-5342
        replication/wal_rw_stress.test.lua		gh-5347
        replication-py/multi.test.py			gh-5362
        sql/prepared.test.lua			gh-5359
        sql-tap/selectG.test.lua			gh-5350
        vinyl/ddl.test.lua				gh-5338
        vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
        vinyl/iterator.test.lua			gh-5336
        vinyl/write_iterator_rand.test.lua	gh-5356
        xlog/panic_on_wal_error.test.lua		gh-5348
      75ba744b
  9. Sep 25, 2020
    • Alexander V. Tikhonov's avatar
      test: fix mistake in replication/suite.ini · 6b98017e
      Alexander V. Tikhonov authored
      Removed a stray line left over from a merge.
      6b98017e
    • Alexander V. Tikhonov's avatar
      Enable test reruns on failed fragiled tests · 74328386
      Alexander V. Tikhonov authored
      test-run now implements a new format of the fragile lists: JSON,
      set as the 'fragile' option in the 'suite.ini' file of each suite:
      
         fragile = {
              "retries": 10,
              "tests": {
                  "bitset.test.lua": {
                      "issues": [ "gh-4095" ],
                      "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
                  }
              }}
      
      Added the ability to compute the checksum of the result file when
      a test fails and to compare it with the checksums of the known
      issues mentioned in the fragile list.
      
      Also added the ability to set the 'retries' option, which defines
      the number of accepted reruns for tests from the 'fragile' list
      that failed with a known checksum.
      
      Closes #5050
      74328386
    • Alexander V. Tikhonov's avatar
      test: flaky replication/anon.test.lua test · bb856247
      Alexander V. Tikhonov authored
      Found flaky failures when running the replication/anon.test.lua
      test multiple times on a single worker:
      
       [007] --- replication/anon.result	Fri Jun  5 09:02:25 2020
       [007] +++ replication/anon.reject	Mon Jun  8 01:19:37 2020
       [007] @@ -55,7 +55,7 @@
       [007]
       [007]  box.info.status
       [007]   | ---
       [007] - | - running
       [007] + | - orphan
       [007]   | ...
       [007]  box.info.id
       [007]   | ---
      
       [094] --- replication/anon.result       Sat Jun 20 06:02:43 2020
       [094] +++ replication/anon.reject       Tue Jun 23 19:35:28 2020
       [094] @@ -154,7 +154,7 @@
       [094]  -- Test box.info.replication_anon.
       [094]  box.info.replication_anon
       [094]   | ---
       [094] - | - count: 1
       [094] + | - count: 2
       [094]   | ...
       [094]  #box.info.replication_anon()
       [094]   | ---
       [094]
      
      It happened because replication connections could stay active
      from previous runs on the common tarantool instance of the
      test-run worker. To avoid this, the tarantool instance is now
      restarted at the very start of the test, as sketched below.
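      
      A sketch of that kind of guard, assuming the usual test_run
      'restart server default' command (the exact invocation in the
      test may differ):
      
        test_run = require('test_run').new()
        -- drop any replication state left by previous runs on this worker
        test_run:cmd('restart server default')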
      
      Closes #5058
      bb856247
  10. Sep 23, 2020
    • Aleksandr Lyapunov's avatar
      txm: add a test · 0018398d
      Aleksandr Lyapunov authored
      Closes #4897
      0018398d
    • Aleksandr Lyapunov's avatar
      test: move txn_proxy.lua to box/lua · 6f9f57fa
      Aleksandr Lyapunov authored
      txn_proxy is a special utility for transaction tests. Formerly
      it was used only for vinyl tests and thus was placed in the vinyl
      folder. Now the time has come to test memtx transactions, and the
      utility must be placed amongst the other utils - in box/lua.
      
      Needed for #4897
      6f9f57fa
    • Aleksandr Lyapunov's avatar
      txm: introduce memtx tx manager · bd1ed6dd
      Aleksandr Lyapunov authored
      Define the memtx TX manager. It will store data for MVCC and the
      conflict manager. Also define the 'memtx_use_mvcc_engine' config
      option that enables the MVCC engine.
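      
      For illustration, a minimal sketch of enabling it (the option is
      assumed to be set in the initial box.cfg call; the space below is
      hypothetical):
      
        box.cfg{memtx_use_mvcc_engine = true}
        s = box.schema.space.create('test')
        s:create_index('pk')
        -- memtx transactions now go through the TX manager
        box.begin()
        s:replace{1}
        box.commit()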
      
      Part of #4897
      bd1ed6dd
  11. Sep 18, 2020
    • Vladislav Shpilevoy's avatar
      tests: fix replication/prune.test.lua hang · f7bcdf4c
      Vladislav Shpilevoy authored
      The test tried to start a replica whose box.cfg would hang, with
      replication_connect_quorum = 0 to make it return immediately.
      
      But the quorum parameter was added and removed during work on
      44421317 ("replication: do not
      register outgoing connections"). Instead, to start the replica
      without blocking on box.cfg it is necessary to pass 'wait=False'
      with the test_run:cmd('start server') command.
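      
      A sketch of the resulting call, assuming the replica server name
      used by the test:
      
        -- do not block the test on the replica's box.cfg
        test_run:cmd('start server replica with wait=False')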
      
      Closes #5311
      f7bcdf4c
  12. Sep 17, 2020
    • Vladislav Shpilevoy's avatar
      replication: do not register outgoing connections · 44421317
      Vladislav Shpilevoy authored
      Replication protocol's first stage for non-anonymous replicas is
      that the replica should be registered in _cluster to get a unique
      ID number.
      
      That happens when the replica connects to a writable node, which
      performs the registration. It means registration always happens
      on the master node when an *incoming* request appears, explicitly
      asking for a registration. Only the relay can do that.
      
      That wasn't the case for bootstrap. If box.cfg.replication wasn't
      empty on the master node doing the cluster bootstrap, it
      registered all the outgoing connections in _cluster. Note, the
      target node could even be anonymous, but it was still registered.
      
      That breaks the protocol, and leads to registration of anon
      replicas sometimes. The patch drops it.
      
      Another motivation here is Raft cluster bootstrap specifics.
      During Raft bootstrap it is going to be very important that
      non-joined replicas should not be registered in _cluster. A
      replica can only register after its JOIN request was accepted, and
      its snapshot download has started.
      
      Closes #5287
      Needed for #1146
      44421317
    • Vladislav Shpilevoy's avatar
      replication: retry in case of XlogGapError · f1a507b0
      Vladislav Shpilevoy authored
      Previously XlogGapError was considered a critical error stopping
      the replication. That may be not so good as it looks.
      
      XlogGapError is a perfectly fine error, which should not kill the
      replication connection. It should be retried instead.
      
      Here is an example where the gap can be recovered on its own.
      Consider the case: node1 is a leader, it is booted with
      vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
      also gets vclock {1: 3}. Then node1 writes something and its
      vclock becomes {1: 4}. Now node3 boots from node1, and gets the
      same vclock. Vclocks now look like this:
      
        - node1: {1: 4}, leader, has {1: 3} snap.
        - node2: {1: 3}, booted from node1, has only snap.
        - node3: {1: 4}, booted from node1, has only snap.
      
      If the cluster is a fullmesh, node2 will send subscribe requests
      with vclock {1: 3}. If node3 receives it, it will respond with
      xlog gap error, because it only has a snap with {1: 4}, nothing
      else. In that case node2 should retry connecting to node3, and in
      the meantime try to get newer changes from node1.
      
      The example is totally valid. However it is unreachable now,
      because the master registers all replicas in _cluster before
      allowing them to join. So they all bootstrap from a snapshot
      containing all their IDs. This is a bug, because such
      auto-registration leads to registration of anonymous replicas if
      they are present during bootstrap. It also blocks Raft, which
      can't work if there are registered but not yet joined nodes.
      
      Once the registration problem is solved in the next commit,
      XlogGapError will strike quite often during bootstrap. This patch
      won't allow that to happen.
      
      Needed for #5287
      f1a507b0
    • Vladislav Shpilevoy's avatar
      xlog: introduce an error code for XlogGapError · fc8e2297
      Vladislav Shpilevoy authored
      XlogGapError object didn't have a code in ClientError code space.
      Because of that it was not possible to handle the gap error
      together with client errors in some switch-case statement.
      
      Now the gap error has a code.
      
      This is going to be used in applier code to handle XlogGapError
      among other errors using its code instead of RTTI.
      
      Needed for #5287
      fc8e2297
  13. Sep 15, 2020
    • Alexander V. Tikhonov's avatar
      test: flaky replication/gh-3704-misc-* · db3dd8dd
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following issue was found:
      
        [037] --- replication/gh-3704-misc-replica-checks-cluster-id.result	Thu Sep 10 18:05:22 2020
        [037] +++ replication/gh-3704-misc-replica-checks-cluster-id.reject	Fri Sep 11 11:09:38 2020
        [037] @@ -25,7 +25,7 @@
        [037]  ...
        [037]  box.info.replication[2].downstream.status
        [037]  ---
        [037] -- follow
        [037] +- stopped
        [037]  ...
        [037]  -- change master's cluster uuid and check that replica doesn't connect.
        [037]  test_run:cmd("stop server replica")
      
      It happened because the replication downstream status check
      occurred too early, when the downstream was still in the 'stopped'
      state. To let the status check reach the needed 'follow' state,
      the test needs to wait for it using the test_run:wait_downstream()
      routine.
      
      Closes #5293
      db3dd8dd
  14. Sep 14, 2020
    • Vladislav Shpilevoy's avatar
      memtx: force async snapshot transactions · c620735c
      Vladislav Shpilevoy authored
      Snapshot rows do not contain real LSNs. Instead, their LSNs are
      signatures, ordinal numbers. Rows in the snap have LSNs from 1 to
      the number of rows. This is because LSNs are not stored with every
      tuple in the storages, and there is no way to store real LSNs in
      the snapshot.
      
      These artificial LSNs broke the synchronous replication limbo.
      After snap recovery was done, the limbo vclock was broken - it
      contained numbers not related to reality, and it was affected by
      rows from local spaces.
      
      Also the recovery could get stuck because ACKs in the limbo
      stopped working after the first row - the vclock was set to the
      final signature right away.
      
      This patch makes all snapshot-recovered rows async, because they
      are confirmed by definition. So now the limbo is not involved in
      the snapshot recovery.
      
      Closes #5298
      c620735c
  15. Sep 12, 2020
    • Vladislav Shpilevoy's avatar
      limbo: don't wake self fiber on CONFIRM write · a0477827
      Vladislav Shpilevoy authored
      During recovery WAL writes end immediately, without yields.
      Therefore WAL write completion callback is executed in the
      currently active fiber.
      
      Txn limbo on CONFIRM WAL write wakes up the waiting fiber, which
      appears to be the same as the active fiber during recovery.
      
      That breaks the fiber scheduler, because apparently it is not safe
      to wake the currently active fiber unless it is going to call
      fiber_yield() immediately after. See a comment in fiber_wakeup()
      implementation about that way of usage.
      
      The patch simply stops waking the waiting fiber, if it is the
      currently active one.
      
      Closes #5288
      Closes #5232
      a0477827
  16. Sep 11, 2020
    • Alexander V. Tikhonov's avatar
      test: replication/status.test.lua fails on Debug · 008e732c
      Alexander V. Tikhonov authored
      
      Found 2 issues on a Debug build:
      
        [009] --- replication/status.result	Fri Sep 11 10:04:53 2020
        [009] +++ replication/status.reject	Fri Sep 11 13:16:21 2020
        [009] @@ -174,7 +174,8 @@
        [009]  ...
        [009]  test_run:wait_downstream(replica_id, {status == 'follow'})
        [009]  ---
        [009] -- true
        [009] +- error: '[string "return test_run:wait_downstream(replica_id, {..."]:1: variable
        [009] +    ''status'' is not declared'
        [009]  ...
        [009]  -- wait for the replication vclock
        [009]  test_run:wait_cond(function()                    \
        [009] @@ -226,7 +227,8 @@
        [009]  ...
        [009]  test_run:wait_upstream(master_id, {status == 'follow'})
        [009]  ---
        [009] -- true
        [009] +- error: '[string "return test_run:wait_upstream(master_id, {sta..."]:1: variable
        [009] +    ''status'' is not declared'
        [009]  ...
        [009]  master.upstream.lag < 1
        [009]  ---
      
      It happened because of the change introduced in commit [1], where
      wait_upstream()/wait_downstream() were mistakenly called as:
      
        test_run:wait_*stream(*_id, {status == 'follow'})
      
      i.e. with the status set using '==' instead of '='. The status
      variable cannot be read when the strict mode is enabled, and it
      is enabled by default on Debug builds.
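      
      For reference, the corrected calls pass the expected status as a
      table field:
      
        -- wrong: reads an undeclared global 'status' under strict mode
        -- test_run:wait_downstream(replica_id, {status == 'follow'})
        -- right:
        test_run:wait_downstream(replica_id, {status = 'follow'})
        test_run:wait_upstream(master_id, {status = 'follow'})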
      
      Follows up #5110
      Closes #5297
      
      Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
      Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>
      
      [1] - a08b4f3a ("test: flaky replication/status.test.lua status")
      008e732c
    • Oleg Babin's avatar
      lua: fix panic in case when log.cfg.log incorrecly specified · 85f19a87
      Oleg Babin authored
      This patch makes the log.cfg{log = ...} behaviour the same as in
      box.cfg{log = ...} and fixes a panic when "log" is incorrectly
      specified. For this purpose we export the "say_parse_logger_type"
      function and use it for logger type validation and parsing.
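      
      For illustration, a hedged sketch of the intended behaviour (the
      logger strings below are made-up examples):
      
        log = require('log')
        -- a valid logger specification is accepted, as with box.cfg
        log.cfg{log = 'tarantool.log'}
        -- an invalid logger type now raises a Lua error instead of
        -- panicking the process
        ok, err = pcall(function() log.cfg{log = 'badtype:foo'} end)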
      
      Closes #5130
      85f19a87
    • Alexander V. Tikhonov's avatar
      test: flaky replication/gh-5195-qsync-* · a43414a5
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following issue was found:
      
         box.cfg{replication_synchro_quorum = 2}
          | ---
        + | - error: '[string "test_run:wait_cond(function()                ..."]:1: attempt to
        + |     index field ''vclock'' (a nil value)'
          | ...
      
      The failure shown above is attributed to the wrong statement
      because of a wrong output list; the command that actually caused
      the issue was the previous one:
      
        test_run:wait_cond(function()                                                   \
                local info = box.info.replication[replica_id]                           \
                local lsn = info.downstream.vclock[replica_id]                          \
                return lsn and lsn >= replica_lsn                                       \
        end)
      
      It happened because the replication vclock field did not yet
      exist at the moment of the check. To fix the issue, the test has
      to wait for the vclock field to become available using the
      test_run:wait_cond() routine.
      
      Closes #5230
      a43414a5
    • Alexander V. Tikhonov's avatar
      test: flaky replication/wal_off.test.lua test · ad4d0564
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following issue was found:
      
        [035] --- replication/wal_off.result	Fri Jul  3 04:29:56 2020
        [035] +++ replication/wal_off.reject	Mon Sep  7 15:32:46 2020
        [035] @@ -47,6 +47,8 @@
        [035]  ...
        [035]  while box.info.replication[wal_off_id].upstream.message ~= check do fiber.sleep(0) end
        [035]  ---
        [035] +- error: '[string "while box.info.replication[wal_off_id].upstre..."]:1: attempt to
        [035] +    index field ''upstream'' (a nil value)'
        [035]  ...
        [035]  box.info.replication[wal_off_id].upstream ~= nil
        [035]  ---
      
      It happened because the replication upstream status check
      occurred too early, when its state was not set yet. To let the
      status check reach the needed 'stopped' state, the test needs to
      wait for it using the test_run:wait_upstream() routine.
      
      Closes #5278
      ad4d0564
    • Alexander V. Tikhonov's avatar
      test: flaky replication/status.test.lua status · a08b4f3a
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following 3 issues were found:
      
      line 174:
      
       [026] --- replication/status.result	Thu Jun 11 12:07:39 2020
       [026] +++ replication/status.reject	Sun Jun 14 03:20:21 2020
       [026] @@ -174,15 +174,17 @@
       [026]  ...
       [026]  replica.downstream.status == 'follow'
       [026]  ---
       [026] -- true
       [026] +- false
       [026]  ...
      
      It happened because the replication downstream status check
      occurred too early. To let the status check reach the needed
      'follow' state, the test needs to wait for it using the
      test_run:wait_downstream() routine.
      
      line 178:
      
      [024] --- replication/status.result	Mon Sep  7 00:22:52 2020
      [024] +++ replication/status.reject	Mon Sep  7 00:36:01 2020
      [024] @@ -178,11 +178,13 @@
      [024]  ...
      [024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
      [024]  ---
      [024] -- true
      [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
      [024] +    index field ''vclock'' (a nil value)'
      [024]  ...
      [024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
      [024]  ---
      [024] -- true
      [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
      [024] +    index field ''vclock'' (a nil value)'
      [024]  ...
      [024]  --
      [024]  -- Replica
      
      It happened because the replication vclock field did not yet
      exist at the moment of the check. To fix the issue, the test has
      to wait for the vclock field to become available using the
      test_run:wait_cond() routine. The replication downstream data
      also has to be read at that same moment.
      
      line 224:
      
      [014] --- replication/status.result	Fri Jul  3 04:29:56 2020
      [014] +++ replication/status.reject	Mon Sep  7 00:17:30 2020
      [014] @@ -224,7 +224,7 @@
      [014]  ...
      [014]  master.upstream.status == "follow"
      [014]  ---
      [014] -- true
      [014] +- false
      [014]  ...
      [014]  master.upstream.lag < 1
      [014]  ---
      
      It happened because the replication upstream status check
      occurred too early. To let the status check reach the needed
      'follow' state, the test needs to wait for it using the
      test_run:wait_upstream() routine.
      
      Removed the test from the test-run 'fragile' list so that it runs
      in parallel.
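      
      A sketch of the vclock wait described for the second issue
      (variable names follow the diffs above):
      
        -- wait until the downstream vclock appears before comparing it
        test_run:wait_cond(function()
            local downstream = box.info.replication[replica_id].downstream
            return downstream ~= nil and downstream.vclock ~= nil
        end)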
      
      Closes #5110
      a08b4f3a
    • Alexander V. Tikhonov's avatar
      test: flaky replication/gh-4606-admin-creds test · 11ba3322
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following issue was found:
      
        [021] --- replication/gh-4606-admin-creds.result	Wed Apr 15 15:47:41 2020
        [021] +++ replication/gh-4606-admin-creds.reject	Sun Sep  6 20:23:09 2020
        [021] @@ -36,7 +36,42 @@
        [021]   | ...
        [021]  i.replication[i.id % 2 + 1].upstream.status == 'follow' or i
        [021]   | ---
        [021] - | - true
        [021] + | - version: 2.6.0-52-g71a24b9f2
        [021] + |   id: 2
        [021] + |   ro: false
        [021] + |   uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
        [021] + |   package: Tarantool
        [021] + |   cluster:
        [021] + |     uuid: f27dfdfe-2802-486a-bc47-abc83b9097cf
        [021] + |   listen: unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/replica_auth.socket-iproto
        [021] + |   replication_anon:
        [021] + |     count: 0
        [021] + |   replication:
        [021] + |     1:
        [021] + |       id: 1
        [021] + |       uuid: a07cad18-d27f-48c4-8d56-96b17026702e
        [021] + |       lsn: 3
        [021] + |       upstream:
        [021] + |         peer: admin@unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/master.socket-iproto
        [021] + |         lag: 0.0030207633972168
        [021] + |         status: disconnected
        [021] + |         idle: 0.44824500009418
        [021] + |         message: timed out
        [021] + |         system_message: Operation timed out
        [021] + |     2:
        [021] + |       id: 2
        [021] + |       uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
        [021] + |       lsn: 0
        [021] + |   signature: 3
        [021] + |   status: running
        [021] + |   vclock: {1: 3}
        [021] + |   uptime: 1
        [021] + |   lsn: 0
        [021] + |   sql: []
        [021] + |   gc: []
        [021] + |   vinyl: []
        [021] + |   memory: []
        [021] + |   pid: 40326
        [021]   | ...
        [021]  test_run:switch('default')
        [021]   | ---
      
      It happened because the replication upstream status check
      occurred too early, when the upstream was still in the
      'disconnected' state. To let the status check reach the needed
      'follow' state, the test needs to wait for it using the
      test_run:wait_upstream() routine.
      
      Closes #5233
      11ba3322
    • Alexander V. Tikhonov's avatar
      test: flaky replication/gh-4402-info-errno.test.lua · 2b1f8f9b
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following issue was found:
      
        [004] --- replication/gh-4402-info-errno.result	Wed Jul 22 06:13:34 2020
        [004] +++ replication/gh-4402-info-errno.reject	Wed Jul 22 06:41:14 2020
        [004] @@ -32,7 +32,39 @@
        [004]   | ...
        [004]  d ~= nil and d.status == 'follow' or i
        [004]   | ---
        [004] - | - true
        [004] + | - version: 2.6.0-10-g8df49e4
        [004] + |   id: 1
        [004] + |   ro: false
        [004] + |   uuid: 41c4e3bf-cc3b-443d-88c9-39a9a8fe2df9
        [004] + |   package: Tarantool
        [004] + |   cluster:
        [004] + |     uuid: 6ec7bcce-68e7-41a4-b84b-dc9236621579
        [004] + |   listen: unix/:(socket)
        [004] + |   replication_anon:
        [004] + |     count: 0
        [004] + |   replication:
        [004] + |     1:
        [004] + |       id: 1
        [004] + |       uuid: 41c4e3bf-cc3b-443d-88c9-39a9a8fe2df9
        [004] + |       lsn: 52
        [004] + |     2:
        [004] + |       id: 2
        [004] + |       uuid: 8a989231-177a-4eb8-8030-c148bc752b0e
        [004] + |       lsn: 0
        [004] + |       downstream:
        [004] + |         status: stopped
        [004] + |         message: timed out
        [004] + |         system_message: Connection timed out
        [004] + |   signature: 52
        [004] + |   status: running
        [004] + |   vclock: {1: 52}
        [004] + |   uptime: 27
        [004] + |   lsn: 52
        [004] + |   sql: []
        [004] + |   gc: []
        [004] + |   vinyl: []
        [004] + |   memory: []
        [004] + |   pid: 99
        [004]   | ...
        [004]
        [004]  test_run:cmd('stop server replica')
      
      It happened because the replication downstream status check
      occurred too early, when the downstream was still in the 'stopped'
      state. To let the status check reach the needed 'follow' state,
      the test needs to wait for it using the test_run:wait_downstream()
      routine.
      
      Closes #5235
      2b1f8f9b
    • Alexander V. Tikhonov's avatar
      test: flaky replication/gh-4928-tx-boundaries test · 5410e592
      Alexander V. Tikhonov authored
      On heavily loaded hosts the following issue was found:
      
        [089] --- replication/gh-4928-tx-boundaries.result	Wed Jul 29 04:08:29 2020
        [089] +++ replication/gh-4928-tx-boundaries.reject	Wed Jul 29 04:24:02 2020
        [089] @@ -94,7 +94,7 @@
        [089]   | ...
        [089]  box.info.replication[1].upstream.status
        [089]   | ---
        [089] - | - follow
        [089] + | - disconnected
        [089]   | ...
        [089]
        [089]  box.space.glob:select{}
      
      It happened because the replication upstream status check
      occurred too early, when the upstream was still in the
      'disconnected' state. To let the status check reach the needed
      'follow' state, the test needs to wait for it using the
      test_run:wait_upstream() routine.
      
      Closes #5234
      5410e592