  1. Oct 06, 2020
  2. Oct 02, 2020
    • lua: abort trace recording on fiber yield · 2711797b
      Igor Munkin authored
      
      Since Tarantool fibers don't respect the Lua coroutine switch mechanism,
      the JIT machinery stays unnotified when one lua_State substitutes another.
      As a result, if trace recording hasn't been aborted prior to the fiber
      switch, the recording proceeds using the new lua_State and fails either
      in a later compiler phase or while the compiled trace is executed.
      
      This changeset extends <cord_on_yield> routine aborting trace recording
      when the fiber switches to another one. If the switch-over occurs while
      mcode is being run the platform finishes its execution with EXIT_FAILURE
      code and calls panic routine prior to the exit.
      
      Closes #1700
      Fixes #4491
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
    • fiber: introduce a callback for fibers switch-over · a390ec55
      Igor Munkin authored
      
      Tarantool integrates several complex environments together and there are
      issues occurring at their junction leading to the platform failures.
      E.g. fiber switch-over is implemented outside the Lua world, so when one
      lua_State substitutes another one, main LuaJIT engines, such as JIT and
      GC, are left unnotified leading to the further platform misbehaviour.
      
      To solve this severe integration drawback <cord_on_yield> function is
      introduced. This routine encloses the checks and actions to be done when
      the running fiber yields the execution.
      
      Unfortunately, the way the callback is implemented introduces a circular
      dependency. Because of how the linker resolves symbols in a static build,
      an auxiliary translation unit is added to the particular tests, mocking
      (i.e. exporting) the otherwise undefined <cord_on_yield> symbol.
      
      Part of #1700
      Relates to #4491
      
      Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
      Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      Signed-off-by: Igor Munkin <imun@tarantool.org>
  3. Oct 01, 2020
  4. Sep 30, 2020
  5. Sep 29, 2020
    • raft: add tests · cf799645
      Vladislav Shpilevoy authored
      Part of #1146
    • raft: introduce box.info.election · 15fc8449
      Vladislav Shpilevoy authored
      box.info.election returns a table of the form:
      
          {
              state: <string>,
              term: <number>,
              vote: <instance ID>,
              leader: <instance ID>
          }
      
      The fields correspond one to one to the Raft concepts of the same
      names. This info dump is meant to help with the tests, first of all,
      and with the investigation of problems in a real cluster.
      
      The API doesn't mention 'Raft' on purpose, to keep it not
      depending specifically on Raft, and not to confuse users who
      don't know anything about Raft (even that it is about leader
      election and synchronous replication).
      
      Part of #1146
    • raft: introduce state machine · 27399889
      Vladislav Shpilevoy authored
      The commit is a core part of Raft implementation. It introduces
      the Raft state machine implementation and its integration into the
      instance's life cycle.
      
      The implementation follows the protocol to the letter except a few
      important details.
      
      Firstly, the original Raft assumes that all nodes share the same
      log record numbers. In Tarantool they are called LSNs, but each
      node has its own LSN in its own component of vclock. That makes
      the election messages a bit heavier, because the nodes need to
      send and compare complete vclocks of each other instead of a
      single number as in the original Raft. But the logic becomes
      simpler: in the original Raft there is uncertainty about what to
      do with records of an old leader right after a new leader is
      elected. They could be rolled back or confirmed depending on
      circumstances. The issue disappears when vclock is used.
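
Comparing complete vclocks instead of a single log index can be illustrated with a minimal Python sketch. It is not the Tarantool implementation: the dict-based vclock, the Python function names, and the voting rule (grant a vote only to a candidate whose vclock is not behind the voter's in any component) are assumptions made for illustration.

```python
def vclock_compare(a, b):
    """Compare two vclocks given as dicts {replica_id: lsn}.

    Returns -1 if a <= b in every component (and they differ),
    0 if they are equal, 1 if a >= b in every component,
    and None if the vclocks are incomparable.
    """
    a_le_b = True
    a_ge_b = True
    for replica_id in set(a) | set(b):
        x = a.get(replica_id, 0)  # a missing component counts as LSN 0
        y = b.get(replica_id, 0)
        if x < y:
            a_ge_b = False
        if x > y:
            a_le_b = False
    if a_le_b and a_ge_b:
        return 0
    if a_le_b:
        return -1
    if a_ge_b:
        return 1
    return None


def can_vote_for(voter_vclock, candidate_vclock):
    """Hypothetical rule: vote only if the candidate is not behind."""
    order = vclock_compare(candidate_vclock, voter_vclock)
    return order is not None and order >= 0
```

With a single log index two nodes are always comparable; with vclocks two nodes can each hold rows the other lacks, and this sketch refuses the vote in that case.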
      
      Secondly, leader election works differently during cluster
      bootstrap, until the number of bootstrapped replicas becomes >=
      the election quorum. That arises from the specifics of replica
      bootstrap and the order in which systems are initialized. In
      short: during bootstrap a leader election may use a smaller
      election quorum than the configured one. See more details in the code.
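
One way to picture the smaller bootstrap quorum is the Python sketch below. The commit message only says that a smaller quorum may be used and points to the code for details, so the exact rule here (cap the configured quorum by the number of registered replicas) and both function names are assumptions.

```python
def election_quorum(configured_quorum, registered_count):
    """Effective quorum: never larger than the number of replicas
    that are registered so far (assumed rule, see the lead-in)."""
    return min(configured_quorum, registered_count)


def has_won_election(votes, configured_quorum, registered_count):
    """A candidate wins once it has collected the effective quorum."""
    return votes >= election_quorum(configured_quorum, registered_count)
```

Under this assumption a single bootstrapped instance with a configured quorum of 3 can elect itself with its own vote, and the configured value takes over once enough replicas are registered.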
      
      Part of #1146
    • raft: relay status updates to followers · 67b60f08
      sergepetrenko authored
      The patch introduces a new type of system message used to notify the
      followers of the instance's raft status updates.
      It's relay's responsibility to deliver the new system rows to its peers.
      The notification system reuses and extends the same row type used to
      persist raft state in WAL and snapshot.
      
      Part of #1146
      Part of #5204
    • raft: introduce box.cfg.election_* options · 1d329f0b
      Vladislav Shpilevoy authored
      The new options are:
      
      - election_is_enabled - enable/disable leader election (via
        Raft). When disabled, the node is supposed to work as if Raft
        did not exist, as before;

      - election_is_candidate - a flag whether the instance can try to
        become a leader. Note, it can vote for other nodes regardless
        of the value of this option;

      - election_timeout - how long to wait until the election ends, in
        seconds.
      
      The options don't do anything yet. They are added separately in
      order to keep such mundane changes out of the main Raft commit and
      simplify its review.
      
      Option names don't mention 'Raft' on purpose, because
      - Not all users know what Raft is, so they may not even know it
        is related to leader election;
      - In future the algorithm may change from Raft to something else,
        so better not to depend on it too much in the public API.
      
      Part of #1146
    • raft: introduce persistent raft state · 4f0f7c8f
      Vladislav Shpilevoy authored
      The patch introduces a skeleton of the Raft module and a method to
      persist the Raft state in a snapshot, not bound to any space.
      
      Part of #1146
    • replication: track registered replica count · 764a548a
      Vladislav Shpilevoy authored
      Struct replicaset didn't store the number of registered replicas,
      only an array, which had to be fully scanned each time the count
      was needed.

      The count is going to be needed in Raft to calculate the election
      quorum. The patch tracks it so that it can be found in constant
      time by simply reading an integer.
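
The difference between scanning the array and reading a tracked counter can be sketched in Python as follows. The class and its method names are hypothetical stand-ins for struct replicaset and do not mirror the actual C code.

```python
class ReplicaSet:
    """Toy model: a fixed-size array of replicas plus a tracked count."""

    def __init__(self, max_replicas=32):
        self.replica_by_id = [None] * max_replicas
        self.registered_count = 0  # kept in sync on every change

    def register(self, replica_id, replica):
        # Count only the transition from "empty slot" to "occupied".
        if self.replica_by_id[replica_id] is None:
            self.registered_count += 1
        self.replica_by_id[replica_id] = replica

    def unregister(self, replica_id):
        if self.replica_by_id[replica_id] is not None:
            self.registered_count -= 1
        self.replica_by_id[replica_id] = None

    def count_by_scan(self):
        # The old approach: a full scan on every query.
        return sum(1 for r in self.replica_by_id if r is not None)
```

Reading `registered_count` is a single attribute access, while `count_by_scan()` walks the whole array each time it is called.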
      
      Needed for #1146
    • wal: don't touch box.cfg.wal_dir more than once · 40335790
      Vladislav Shpilevoy authored
      Relay.cc and box.cc obtained the box.cfg.wal_dir value using a
      cfg_gets() call, to initialize WAL and create struct recovery
      objects.
      
      That is not only a bit dangerous (cfg_gets() uses Lua API and can
      throw a Lua error) and slow, but also not necessary - wal_dir
      parameter is constant, it can't be changed after instance start.
      
      It means, the value can be stored somewhere one time and then used
      without Lua.
      
      Main motivation is that the WAL directory path will be needed
      inside relay threads to restart their recovery iterators in the
      Raft patch. They can't use cfg_gets(), because Lua lives in TX
      thread. But can access a constant global variable, introduced in
      this patch (it existed before, but now has a method to get it).
      
      Needed for #1146
    • box: introduce summary RO flag · 31dc4faf
      Vladislav Shpilevoy authored
      An instance is writable if box.cfg.read_only is false, and it is
      not orphan. Update of the final read-only state of the instance
      needs to fire read-only update triggers, and notify the engines.
      These 2 flags were easy and cheap to check on each operation, and
      the triggers were easy to use since both flags are stored and
      updated inside box.cc.
      
      That is going to change when Raft is introduced. Raft will add 2
      more checks:
      
        - A flag if Raft is enabled on the node. If it is not, then Raft
          state won't affect whether the instance is writable;
      
        - When Raft is enabled, it will allow writes on a leader only.
      
      It means a check for being read-only would look like this:
      
          is_ro || is_orphan || (raft_is_enabled() && !raft_is_leader())
      
      This is significantly slower. Besides, Raft somehow needs to
      access the read-only triggers and engine API - this looks wrong.
      
      The patch introduces a new flag is_ro_summary. The flag
      incorporates all the read-only conditions into one flag. When some
      subsystem may change read-only state of the instance, it needs to
      call box_update_ro_summary(), and the function takes care of
      updating the summary flag, running the triggers, and notifying the
      engines.
      
      Raft will use this function when its state or config changes.
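
The summary-flag idea can be sketched in Python as below. The flag names `is_ro`, `is_orphan` and `is_ro_summary` come from the commit message; the `Instance` class, the trigger list and the choice to fire triggers only when the summary actually changes are assumptions for illustration, not the box.cc code.

```python
class Instance:
    """Toy model of the read-only state of an instance."""

    def __init__(self):
        self.is_ro = False
        self.is_orphan = False
        self.raft_is_enabled = False
        self.raft_is_leader = False
        self.is_ro_summary = False
        self.on_ro_change = []  # hypothetical read-only triggers

    def update_ro_summary(self):
        """Recompute the summary flag; notify only on a real change."""
        old = self.is_ro_summary
        self.is_ro_summary = (
            self.is_ro
            or self.is_orphan
            or (self.raft_is_enabled and not self.raft_is_leader)
        )
        if self.is_ro_summary != old:
            for trigger in self.on_ro_change:
                trigger(self.is_ro_summary)

    def is_writable(self):
        # The hot path reads one precomputed flag.
        return not self.is_ro_summary
```

A subsystem changes its own flag and then calls `update_ro_summary()`, so the per-operation check stays a single flag read however many conditions are added later.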
      
      Needed for #1146
    • applier: store instance_id in struct applier · 7c14819f
      Vladislav Shpilevoy authored
      Applier is going to need its numeric ID in order to tell the
      future Raft module who is a sender of a Raft message. An
      alternative would be to add sender ID to each Raft message, but
      this looks like a crutch. Moreover, applier still needs to know
      its numeric ID in order to notify Raft about heartbeats from the
      peer node.
      
      Needed for #1146
    • httpc: src/httpc.c missed va_end() macro · 90108875
      Sergey Kaplun authored
      Found and fixed an unclosed va_list 'ap' with cppcheck:
      
      [src/httpc.c:190]: (error) va_list 'ap' was opened but not closed by va_end().
  6. Sep 28, 2020
    • box: disallow to alter SQL view · c5cb8d31
      Roman Khabibov authored
      Ban the ability to modify a view at the box level. Since a view is
      a named select, not a table, altering a view is not a valid
      operation.
    • Add flaky tests checksums to fragile · 75ba744b
      Alexander V. Tikhonov authored
      Added for tests with issues:
        app/fiber.test.lua				gh-5341
        app-tap/debug.test.lua			gh-5346
        app-tap/http_client.test.lua			gh-5346
        app-tap/inspector.test.lua			gh-5346
        box/gh-2763-session-credentials-update.test.lua gh-5363
        box/hash_collation.test.lua			gh-5247
        box/lua.test.lua				gh-5351
        box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
        box/net.box_incompatible_index-gh-1729.test.lua gh-5360
        box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
        box/protocol.test.lua				gh-5247
        box/update.test.lua				gh-5247
        box-tap/net.box.test.lua			gh-5346
        replication/autobootstrap.test.lua		gh-4533
        replication/autobootstrap_guest.test.lua	gh-4533
        replication/ddl.test.lua			gh-5337
        replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
        replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
        replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
        replication/long_row_timeout.test.lua		gh-4351
        replication/on_replace.test.lua		gh-5344, gh-5349
        replication/prune.test.lua			gh-5361
        replication/qsync_advanced.test.lua		gh-5340
        replication/qsync_basic.test.lua		gh-5355
        replication/replicaset_ro_mostly.test.lua	gh-5342
        replication/wal_rw_stress.test.lua		gh-5347
        replication-py/multi.test.py			gh-5362
        sql/prepared.test.lua test			gh-5359
        sql-tap/selectG.test.lua			gh-5350
        vinyl/ddl.test.lua				gh-5338
        vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
        vinyl/iterator.test.lua			gh-5336
        vinyl/write_iterator_rand.test.lua	gh-5356
        xlog/panic_on_wal_error.test.lua		gh-5348
    • alter: box/alter.cc null pointer dereference · 641c1710
      Sergey Kaplun authored
      Found and fixed Null pointer dereference with cppcheck:
      
      [src/box/alter.cc:395]: (error) Null pointer dereference
    • fiber: src/lua/fiber.c null pointer dereference · 58f84d1a
      Sergey Kaplun authored
      [src/lua/fiber.c:245] -> [src/lua/fiber.c:217]: (warning) Either the condition 'if(func)' is redundant or there is possible null pointer dereference: func.
  7. Sep 26, 2020
  8. Sep 25, 2020
    • test: update test-run · 59dca189
      Alexander Turenko authored
      Justify columns in the output.
      
      https://github.com/tarantool/test-run/pull/222
    • test: fix mistake in replication/suite.ini · 6b98017e
      Alexander V. Tikhonov authored
      Removed a stray line left over from the merge.
    • Enable test reruns on failed fragile tests · 74328386
      Alexander V. Tikhonov authored
      test-run now implements a new format of fragile lists, based on
      JSON and set as the fragile option in the 'suite.ini' file of each suite:
      
          fragile = {
              "retries": 10,
              "tests": {
                  "bitset.test.lua": {
                      "issues": [ "gh-4095" ],
                      "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
                  }
              }
          }
      
      Added the ability to check the checksum of the results file when a
      test fails and to compare it with the checksums of the known issues
      listed in the fragile list.

      Also added the 'retries' option, which sets the number of reruns
      allowed for failed tests from the 'fragile' list whose failures
      match one of the listed checksums.
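
The retry decision described above can be sketched in Python roughly as follows. This is not the test-run code: the function name `should_retry`, its parameters and the use of MD5 over the raw bytes of the failed output are assumptions; only the shape of the `fragile` structure is taken from the example in this message.

```python
import hashlib


def should_retry(test_name, failed_output, fragile, attempts_done):
    """Decide whether a failed test may be rerun.

    failed_output is the content of the failed results file as bytes,
    fragile is the parsed JSON value of the 'fragile' option.
    """
    entry = fragile.get("tests", {}).get(test_name)
    if entry is None:
        return False  # the test is not in the fragile list
    checksum = hashlib.md5(failed_output).hexdigest()
    if checksum not in entry.get("checksums", []):
        return False  # an unknown failure must not be hidden by a rerun
    return attempts_done < fragile.get("retries", 0)
```

A failure whose output does not match a listed checksum is reported immediately, which keeps reruns limited to the known issues.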
      
      Closes #5050
    • test: flaky replication/anon.test.lua test · bb856247
      Alexander V. Tikhonov authored
      Found flaky failures when running the replication/anon.test.lua test
      multiple times on a single worker:
      
       [007] --- replication/anon.result	Fri Jun  5 09:02:25 2020
       [007] +++ replication/anon.reject	Mon Jun  8 01:19:37 2020
       [007] @@ -55,7 +55,7 @@
       [007]
       [007]  box.info.status
       [007]   | ---
       [007] - | - running
       [007] + | - orphan
       [007]   | ...
       [007]  box.info.id
       [007]   | ---
      
       [094] --- replication/anon.result       Sat Jun 20 06:02:43 2020
       [094] +++ replication/anon.reject       Tue Jun 23 19:35:28 2020
       [094] @@ -154,7 +154,7 @@
       [094]  -- Test box.info.replication_anon.
       [094]  box.info.replication_anon
       [094]   | ---
       [094] - | - count: 1
       [094] + | - count: 2
       [094]   | ...
       [094]  #box.info.replication_anon()
       [094]   | ---
       [094]
      
      It happened because replications may stay active from the previous
      runs on the common tarantool instance of the test-run worker. To
      avoid this, the tarantool instance is now restarted at the very
      start of the test.
      
      Closes #5058
    • gitlab-ci: set opensuse jobs to test group · f56c8c4d
      Alexander V. Tikhonov authored
      Set opensuse jobs to the test group to be sure that they are run with
      artifact collection and without extra gitlab-ci job parallelization.
    • gitlab-ci: save failed test results artifacts · 814d3e27
      Alexander V. Tikhonov authored
      Added an artifacts saver to all gitlab-ci jobs with testing.

      Gitlab-ci jobs save their results files in the following paths:
      
        1. base jobs for testing different features:
          - test/var/artifacts
      
        2. OSX jobs:
          - ${OSX_VARDIR}/artifacts
      
        3. pack/deploy jobs:
          - build/usr/src/*/tarantool-*/test/var/artifacts
      
        4. VBOX jobs (freebsd_12) on virtual host:
          - ~/tarantool/test/var/artifacts
      
      The gitlab-ci configuration gets an 'after_script' section with a
      script that collects the 'artifacts' directories created by the
      test-run tool from the different test locations. It saves the
      'artifacts' directories as the root path in artifact packages. Users
      will be able to download these packages using the gitlab-ci GUI or API.

      Additionally, added the OSX_VARDIR environment variable to be able to
      set up a common path for artifacts and OSX shell script options.
      
        OSX_VARDIR: /tmp/tnt
      
      Part of #5050
    • gitignore: ignore directories made on running jepsen tests · 25fc1c06
      Sergey Bronnikov authored
      Running Jepsen tests creates a directory with the Terraform state and a
      directory with the Jepsen tests source code in the build directory. This is
      fine for an out-of-source build in a separate directory, but when building in
      the project root directory these directories appear in `git status` output.
      This patch adds ignore rules for these directories.
    • cmake: move jepsen targets under option WITH_JEPSEN · a36749de
      Sergey Bronnikov authored
      For running Jepsen tests we need to check out an external repository with the
      tests source code at the build stage. This behaviour breaks the Tarantool build
      under Gentoo. The WITH_JEPSEN option enables the targets only when they are needed.
      
      Closes #5325
  9. Sep 24, 2020
    • test: update test-run · 43482eed
      Alexander Turenko authored
      Retry a failed test when it is marked as fragile (and several other
      conditions are met, see below).
      
      The test-run already allows to set a list of fragile tests. They are run
      one-by-one after all parallel ones in order to eliminate possible
      resource starvation and fit timings to ones when the tests pass. See
      [1].
      
      In practice this approach does not help much against our problem with
      flaky tests. We decided to retry failed tests when they are known to be
      fragile. See [2].

      The core idea is to split responsibility: known flaky fails will not
      distract a developer, but each fragile test will be marked
      explicitly, tracked in the issue tracker and analyzed by the quality
      assurance team.
      
      The default behaviour is not changed: each test from the fragile list
      will be run once after all parallel ones. But now it is possible to set
      retries amount.
      
      Beware: the implementation does not allow setting just the retries
      count; it also requires an md5sum of the failed test output (the
      so-called reject file). The idea here is to ensure that we retry the
      test only in case of a known fail, not some other fail within the test.

      This approach has a limitation: in case of a fail a test may output
      information that varies from run to run or depends on the base directory.
      We should always verify the output before putting its checksum into the
      configuration file.
      
      Despite doubts regarding this approach, it looks simple and we decided
      to try and revisit it if there will be a need.
      
      See configuration example in [3].
      
      [1]: https://github.com/tarantool/test-run/issues/187
      [2]: https://github.com/tarantool/test-run/issues/189
      [3]: https://github.com/tarantool/test-run/pull/217
      
      Part of #5050
  10. Sep 23, 2020
    • txm: add a test · 0018398d
      Aleksandr Lyapunov authored
      Closes #4897
    • test: move txn_proxy.lua to box/lua · 6f9f57fa
      Aleksandr Lyapunov authored
      txn_proxy is a special utility for transaction tests.
      Formerly it was used only for vinyl tests and thus was placed in
      vinyl folder.
      Now the time has come to test memtx transactions and the utility
      must be placed amongst other utils - in box/lua.
      
      Needed for #4897