  1. Dec 09, 2018
    • sio: remove unused functions · 4ed1c41d
      Vladislav Shpilevoy authored
      The next patches remove exceptions from sio and convert it
      to C. To avoid dealing with unused functions in those
      patches, they are deleted here.
      4ed1c41d
    • wal: trigger checkpoint if there are too many WALs · db8c7aa3
      Vladimir Davydov authored
      Closes #1082
      
      @TarantoolBot document
      Title: Document box.cfg.checkpoint_wal_threshold
      
      Tarantool makes checkpoints every box.cfg.checkpoint_interval seconds
      and keeps last box.cfg.checkpoint_count checkpoints. It also keeps all
      intermediate WAL files. Currently, it isn't possible to set a checkpoint
      trigger based on the total size of WAL files, which makes it difficult to
      estimate the minimal amount of disk space that needs to be allotted to a
      Tarantool instance for storing WALs to eliminate the possibility of
      ENOSPC errors. For example, under normal conditions a Tarantool instance
      may write 1 GB of WAL files every box.cfg.checkpoint_interval seconds
      and so one may assume that 1 GB times box.cfg.checkpoint_count should be
      enough for the WAL partition, but there's no guarantee it won't write 10
      GB between checkpoints when the load is extreme.
      
      So we've agreed that we must provide users with one more configuration
      option that can be used to impose a limit on the total size of WAL
      files. The new option is called box.cfg.checkpoint_wal_threshold. Once
      the configured threshold is exceeded, the WAL thread notifies the
      checkpoint daemon that it's time to make a new checkpoint and delete
      old WAL files. Note, the new option only limits the size of WAL files
      created since the last checkpoint, because backup WAL files are not
      needed for recovery and can be deleted in case of emergency ENOSPC, for
      more details see tarantool/tarantool#1082, tarantool/tarantool#3397,
      tarantool/tarantool#3822.
      
      The default value of the new option is 1 exabyte (10^18 bytes), which
      effectively means that the feature is disabled.
      db8c7aa3
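The threshold logic described in the commit above can be sketched roughly as follows. This is a minimal illustration, not Tarantool's actual code: the struct and function names (`wal_size_tracker` and friends) are hypothetical, and the real WAL thread notifies the checkpoint daemon through its own messaging rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not Tarantool's actual code. */
struct wal_size_tracker {
	/* Bytes written to WAL files since the last checkpoint. */
	int64_t size_since_checkpoint;
	/* Configured limit (the role of box.cfg.checkpoint_wal_threshold). */
	int64_t threshold;
	/* Set once a checkpoint has been requested, to avoid repeats. */
	bool checkpoint_requested;
};

static void
wal_size_tracker_create(struct wal_size_tracker *t, int64_t threshold)
{
	assert(threshold > 0);
	t->size_since_checkpoint = 0;
	t->threshold = threshold;
	t->checkpoint_requested = false;
}

/*
 * Account a write of `bytes` bytes. Returns true exactly once per
 * checkpoint cycle: when the threshold is first exceeded.
 */
static bool
wal_size_tracker_account(struct wal_size_tracker *t, int64_t bytes)
{
	t->size_since_checkpoint += bytes;
	if (t->checkpoint_requested ||
	    t->size_since_checkpoint <= t->threshold)
		return false;
	t->checkpoint_requested = true;
	return true;
}

/* Called when a checkpoint completes: only newer WAL files count again. */
static void
wal_size_tracker_reset(struct wal_size_tracker *t)
{
	t->size_since_checkpoint = 0;
	t->checkpoint_requested = false;
}
```

Resetting the counter on checkpoint completion mirrors the note in the commit message that only WAL files created since the last checkpoint are counted against the limit.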
    • wal: pass struct instead of vclock to checkpoint methods · f33fbbf8
      Vladimir Davydov authored
      Currently, we only need to pass a vclock between TX and WAL during
      checkpointing. However, in order to implement auto-checkpointing
      triggered when WAL size exceeds a certain threshold, we will need to
      pass some extra info so that we can properly reset the counter
      accounting the WAL size in the WAL thread. To make it possible, let's
      move wal_checkpoint struct, which is used internally by WAL to pass a
      checkpoint vclock, to the header and require the caller to pass it to
      wal_begin/commit_checkpoint instead of just a vclock.
      f33fbbf8
    • Rewrite checkpoint daemon in C · 4c04808a
      Vladimir Davydov authored
      A long time ago, when the checkpoint daemon was added to Tarantool, it was
      responsible not only for making periodic checkpoints, but also for
      maintaining the configured number of checkpoints and removing old snap
      and xlog files, so it was much easier to implement it in Lua than in C.
      However, over time, all its responsibilities have been reimplemented in
      C and moved to the server code so that now it just calls box.snapshot()
      periodically. Let's rewrite this simple procedure in C as well - this
      will allow us to easily add more complex logic there, e.g. triggering
      checkpoint when WAL files exceed a configured threshold.
      
      Note, this patch removes a few cases from xlog/checkpoint_daemon test
      that tested the internal state of the checkpoint daemon, which isn't
      available in Lua anymore. This is OK as those cases are covered by
      unit/checkpoint_schedule test.
      4c04808a
    • Introduce checkpoint schedule module · 382568b1
      Vladimir Davydov authored
      This is a very simple module that incorporates the logic for calculating
      the time of the next scheduled checkpoint given the configured interval
      between checkpoints. It doesn't have any dependencies, which allows us to
      cover it with a unit test. It will be used by the checkpoint daemon once
      we rewrite it in C. Rationale: in future we might want to introduce more
      complex rules for scheduling checkpoints (cron-like, maybe) and it will
      be really nice to have this logic neatly separated and tested.
      382568b1
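A dependency-free scheduling helper of the kind described in the commit above might look like the sketch below. The names (`checkpoint_sched`, `checkpoint_sched_cfg`, `checkpoint_sched_timeout`) are assumptions made for illustration, and the rule shown (fixed-period ticks counted from the moment of configuration) is a simplification, not necessarily what the actual module does.

```c
#include <assert.h>

/* Hypothetical, simplified schedule: not the actual Tarantool module. */
struct checkpoint_sched {
	/* Time the schedule was configured, in seconds. */
	double start_time;
	/* Interval between checkpoints, in seconds; <= 0 disables it. */
	double interval;
};

static void
checkpoint_sched_cfg(struct checkpoint_sched *s, double now, double interval)
{
	s->start_time = now;
	s->interval = interval;
}

/*
 * Return the number of seconds from `now` until the next scheduled
 * checkpoint, or a negative value if periodic checkpoints are disabled.
 */
static double
checkpoint_sched_timeout(const struct checkpoint_sched *s, double now)
{
	if (s->interval <= 0)
		return -1;
	assert(now >= s->start_time);
	double elapsed = now - s->start_time;
	/* Whole intervals that have already passed since start_time. */
	long long periods = (long long)(elapsed / s->interval);
	double into_period = elapsed - (double)periods * s->interval;
	return s->interval - into_period;
}
```

Because the helper takes the current time as an argument instead of reading a clock, a unit test can drive it with arbitrary timestamps, which is the testability benefit the commit message points at.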
    • gc: some renames · bdb6825b
      Vladimir Davydov authored
      GC module is responsible not only for garbage collection, but also for
      tracking consumers and making checkpoints. Soon it will also incorporate
      the checkpoint daemon. Let's prefix all members related to the cleanup
      procedure accordingly to avoid confusion.
      bdb6825b
    • box: move checkpointing to gc module · 4ef765d5
      Vladimir Davydov authored
      Garbage collection module seems to be the best way to accommodate the
      checkpoint daemon, but to move it there, we first need to move the code
      performing checkpoints there to avoid cross-dependency between box.cc
      and gc.c.
      4ef765d5
    • box: don't use box_checkpoint_is_in_progress outside box.cc · b4650492
      Vladimir Davydov authored
      We only use box_checkpoint_is_in_progress in SIGUSR1 signal handler to
      print a warning in case checkpointing cannot be started, because it's
      already done by another fiber. Actually, it's not necessary to use it
      there - instead we can simply log the error returned by box_checkpoint,
      which will be a self-explanatory ER_CHECKPOINT_IN_PROGRESS in this case.
      So let's make box_checkpoint_is_in_progress private to box.cc - this
      will simplify moving the checkpoint daemon to the gc module.
      
      While we are at it, remove the unused snapshot_version declaration.
      b4650492
    • box: fix certain cfg options initialized twice on recovery · a6f22d19
      Vladimir Davydov authored
      Certain dynamic configuration options are initialized right in box.cc,
      because they are needed for recovery. All such options are supposed to
      be present in dynamic_cfg_skip_at_load table so that load_cfg.lua won't
      try to set them again upon recovery completion. However, not all of them
      happen to be there - sometimes we simply forgot to patch this table after
      introduction of a new configuration option. This patch adds all the
      missing ones except checkpoint_count - there's no point in initializing
      checkpoint_count in box.cc, so the patch removes it from box.cc instead.
      a6f22d19
    • wal: simplify watcher API · f2db4075
      Vladimir Davydov authored
      This patch reverts changes done in order to make WAL watcher API
      suitable for notifying TX about WAL garbage collection triggered on
      ENOSPC, namely:
      
       b073b017 wal: add event_mask to wal_watcher
       7077341e wal: pass wal_watcher_msg to wal_watcher callback
      
      We don't need them anymore, because now we piggyback the notification
      on the WAL request message that triggered ENOSPC.
      f2db4075
    • gc: do not use WAL watcher API for deactivating stale consumers · d32166a6
      Vladimir Davydov authored
      The WAL thread may delete old WAL files if it gets ENOSPC error.
      Currently, we use WAL watcher API to notify the TX thread about it so
      that it can shoot off stale replicas. This looks ugly, because WAL
      watcher API was initially designed to propagate WAL changes to relay
      threads and the new event WAL_EVENT_GC, which was introduced for
      notifying about ENOSPC-driven garbage collection, isn't used anywhere
      else. Besides, there's already a pipe from WAL to TX - we could reuse it
      instead of opening another one.
      
      If we went down that path, then in order to trigger a checkpoint
      from the WAL thread (see #1082), we would have to introduce yet another
      esoteric WAL watcher event, making the whole design look even uglier.
      That said, let's rewrite the garbage collection notification procedure
      using a plain callback instead of abusing WAL watcher API.
      d32166a6
    • netbox: fix wait_connected ignorance · a539fdc2
      Vladislav Shpilevoy authored
      After patch d2468dac it became possible to
      wrap an existing connection into netbox API. The regular
      netbox.connect function was refactored so as to reuse
      the connection establishment code.
      
      But a connection should be established in a worker
      fiber, not in the caller's one. Otherwise it is
      impossible to skip waiting for the connect result.
      
      The patch just moves connection establishment into a
      worker fiber, without any functional changes.
      
      Closes #3856
      a539fdc2
    • Fix premature cdata collecting in luaT_pusherror() · 88de7c34
      Alexander Turenko authored
      This is a follow-up to 28c7e667 to fix
      luaT_pusherror() itself, not only luaT_error().
      
      Fixes #1955 (again).
      88de7c34
  2. Dec 06, 2018
    • test: replication parallel mode on · f5c8b825
      Sergei Voronezhskii authored
      Part of #2436, #3232
      f5c8b825
    • test: use wait_cond to check follow status · f41548b7
      Sergei Voronezhskii authored
      After setting timeouts in `box.cfg` and before making a `replace`, the
      test needs to wait for replicas to reach `follow` status. If
      `wait_follow()` finds a status other than `follow`, it returns true,
      which immediately causes an error.
      
      Fixes #3734
      Part of #2436, #3232
      f41548b7
    • test: put require in proper places · d2f28afa
      Sergei Voronezhskii authored
      * put `require('fiber')` after each switch server command, because
        the test sometimes failed with a 'fiber' not defined error
      * use `require('fio')` after `require('test_run').new()`, because
        the test sometimes failed with a 'fio' not defined error
      
      Part of #2436, #3232
      d2f28afa
    • test: errinj for pause relay_send · 1c34c91f
      Sergei Voronezhskii authored
      Instead of using a timeout, we just need to pause `relay_send`. We can't
      rely on a timeout because the system load varies in parallel mode. Add a
      new errinj which checks a boolean in a loop and does not let `relay_send`
      proceed to the next statement until the boolean is no longer `True`.
      
      To check the read-only mode, we need to attempt to modify a tuple. It
      is enough to call the `replace` method, instead of calling `delete` and
      then pointlessly verifying with `get` that the tuple has not been
      deleted.
      
      Also, look up the xlog files in a loop with a short sleep until the file
      count matches the expected one.
      
      Update box/errinj.result because new errinj was added.
      
      Part of #2436, #3232
      1c34c91f
    • test: cleanup replication tests · 848a0b03
      Sergei Voronezhskii authored
      - at the end of tests which create any replication config, we need to call:
        * `test_run:cmd('delete server ...')`, which removes the server object
          from the `TestState.servers` list; this behaviour was taken
          from the `test_run:drop_cluster()` function
        * `test_run:cleanup_cluster()`, which clears `box.space._cluster`
      - switch on `use_unix_sockets` because of 'Address already in use'
        problems
      - the `once` test needs to clean up `once*` schemas
      
      Part of #2436, #3232
      848a0b03
    • sql: fix tarantoolSqlite3TupleColumnFast · 2bfe8ac5
      Kirill Shcherbatov authored
      The tarantoolSqlite3TupleColumnFast routine used to look up
      offset_slot in unallocated memory in some cases.
      The assert on exact_field_count, as well as the motivation given
      in 7a8de281 for replacing the old, correct assert on field_count,
      is not correct:
      assert(format->exact_field_count == 0 ||
             fieldno < format->exact_field_count);
      The tarantoolSqlite3TupleColumnFast routine requires an offset_slot
      that has been allocated during the tuple_format_create call. This
      value is stored in an indexed field whose index is limited by
      index_field_count, which is <= field_count. See
      tuple_format_alloc for more details.
      
      The format in the cursor that triggers the valid assertion has such a
      structure because the first 4 tuples in _space (257, 272, 276 and
      280) have an old format of _space with only one field
      (format->field_count == 1).
      This happens because these 4 tuples are recovered no later than the
      tuple with id 280, which stores the actual format of _space. After tuple
      280 is recovered, the actual format is set in struct space of
      _space and all subsequent tuples have full-featured formats.
      
      So for these 4 tuples tarantoolSqlite3TupleColumnFast can fail
      even if a field exists, is indexed and has a name. Those
      features are only described in the newer format.
      (Thanks to Gerold103 for the problem explanation.)
      
      Closes #3772
      2bfe8ac5
    • sql: fix parser.parse_only mode for triggers · ac73e345
      Kirill Shcherbatov authored
      Because the parse_only flag did not work correctly for SQL triggers,
      sql_trigger_compile leaked Vdbe memory.
      
      Closes #3838
      ac73e345
    • box: fix checkpoint_delete · 3c8330ea
      Kirill Shcherbatov authored
      The rlist_foreach_entry iterator was used for freeing resources.
      As a result, the next step of the for-loop accessed already freed
      memory.
      Replaced it with rlist_foreach_entry_safe, which is valid for
      destructors.
      
      Closes #3858
      3c8330ea
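The class of bug fixed in the commit above can be illustrated with a plain singly linked list. This is only a sketch: it does not use Tarantool's rlist macros, and `struct node`, `list_push` and `list_destroy` are hypothetical names. The point is that a loop which frees the current element must read the pointer to the next element before the free, which is what a "safe" iteration variant does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal list, standing in for an rlist-based one. */
struct node {
	struct node *next;
	int value;
};

/* Prepend a new node and return the new head. */
static struct node *
list_push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));
	assert(n != NULL);
	n->value = value;
	n->next = head;
	return n;
}

/* Free every node of the list and return how many were freed. */
static int
list_destroy(struct node *head)
{
	int freed = 0;
	struct node *next;
	for (struct node *n = head; n != NULL; n = next) {
		/*
		 * Save the successor first. Writing `n = n->next` in the
		 * loop header instead would read n after free(n).
		 */
		next = n->next;
		free(n);
		freed++;
	}
	return freed;
}
```

An unsafe variant often appears to work because freed memory is not overwritten immediately, which is why such bugs tend to surface only under sanitizers or particular allocators.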
  3. Dec 04, 2018
    • c6e5bf48
    • box: move info_handler interface into src/info · f1a114ca
      Vladislav Shpilevoy authored
      Box/info.h defines the info_handler interface with a set
      of virtual functions. It allows hiding Lua from code that
      does not depend on this language, and is used in things
      like index:info() and box.info() to build a Lua table with
      some info. But it does not depend on box/, so move it
      to src/.
      
      Also, this API is needed for the forthcoming SWIM
      module which is going to be placed into src/lib and
      needs info to dump its state to Lua from C without
      strict Lua dependency.
      
      @locker:
       - remove pointless _GNU_SOURCE definition from
         box/lua/info.c
       - remove luaT_info_handler_create declaration from
         box/lua/info.h
      
      Needed for #3234
      f1a114ca
  4. Dec 03, 2018
    • lua: getpwall/getgrall error handling - follow-up fixes · a1606e91
      Vladimir Davydov authored
       - Add the forgotten errno(0) to getgrall.
       - Throw errors from getgrall/getpwall instead of returning nil in
         case the underlying system function fails.
       - Fix the error message in getgr.
       - Remove pointless and confusing asterisk sign from error messages.
       - Do not hide a stack frame on error.
      
      Follow-up efccac69 ("lua: fix error handling in getpwall and
      getgrall").
      a1606e91
    • lua: fix error handling in getpwall and getgrall · efccac69
      Alexander Turenko authored
      This commit fixes app-tap/pwd.test.lua test. It seems that the problem
      appears after updating to glibc-2.28.
      
      It seems that the usual way to handle errors in Unix is to check errno
      only when a return value indicates the possibility of an error.
      
      Related to #3766.
      efccac69
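The errno convention mentioned in the commit above can be shown with `strtol`, used here as a deterministic stand-in for `getpwent`/`getgrent` (which follow the same rule: clear errno before the call and inspect it only when the return value, NULL in their case, may indicate an error). The helper name `parse_long` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal integer. Returns 0 and stores the result in *out
 * on success, or -1 on malformed input or overflow.
 */
static int
parse_long(const char *str, long *out)
{
	char *end;
	assert(str != NULL && out != NULL);
	/* Clear errno so that a stale value is not mistaken for a failure. */
	errno = 0;
	long value = strtol(str, &end, 10);
	if (end == str || *end != '\0')
		return -1; /* no digits at all, or trailing garbage */
	/*
	 * Only LONG_MAX/LONG_MIN can signal overflow, so errno is
	 * consulted only when the return value is one of them.
	 */
	if ((value == LONG_MAX || value == LONG_MIN) && errno == ERANGE)
		return -1;
	*out = value;
	return 0;
}
```

Checking errno unconditionally after a successful call is unreliable, because library functions are allowed to modify errno even when they succeed.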
    • Remove deprecated getaddrinfo() flags · b601d0be
      Alexander Turenko authored
      AI_IDN_ALLOW_UNASSIGNED and AI_IDN_USE_STD3_ASCII_RULES flags are
      deprecated by glibc-2.28 and the deprecation warnings caused the
      Debug build to fail because of -Werror.
      
      Fixes #3766.
      b601d0be
    • box: move port to src/ · 1730b39a
      Vladislav Shpilevoy authored
      Basic port structure does not depend on anything but
      standard types. It just gives an interface and calls
      virtual functions.
      
      Its location in box/ was ok since it was not used
      anywhere in src/. But the next commits will add a new
      method to mpstream so as to dump a port. Mpstream is
      implemented in src/, so let's move port over here.
      
      Needed for #3505
      1730b39a
    • test: fix app/fiber.test.lua flaky fails · 0e19478c
      Alexander Turenko authored
      Fixes #3852.
      0e19478c
    • test: fix hardcoded port in box/net.box.test.lua · f36568c0
      Alexander Turenko authored
      This allows running the test many times in parallel to investigate flaky
      test failures and decreases the probability that the test fails because
      the port is already in use by, say, some other test.
      f36568c0
    • test: fix http_client.test.lua with curl-7.62 · 10518cc1
      Alexander Turenko authored
      curl-7.61.1
      
      ```
      tarantool> require('http.client').new():get('http://localhost:0')
      ---
      - status: 595
        reason: Couldn't connect to server
      ```
      
      curl-7.62
      
      ```
      tarantool> require('http.client').new():get('http://localhost:0')
      ---
      - error: 'curl: URL using bad/illegal format or missing URL'
      ...
      ```
      
      curl-7.62 returns CURLE_URL_MALFORMAT in case of a zero port and tarantool
      raises an error in that case. I think this behaviour is valid, so I fixed
      the test.
      10518cc1
  5. Nov 29, 2018
    • gc: run garbage collection in background · 07191842
      Vladimir Davydov authored
      Currently, garbage collection is executed synchronously by functions
      that may trigger it, such as gc_consumer_advance or gc_add_checkpoint.
      As a result, one has to be very cautious when using those functions as
      they may yield at their will. For example, we can't shoot off stale
      consumers right in tx_prio handler - we have to use rather clumsy WAL
      watcher interface instead. Besides, in future, when the garbage
      collector state is persisted, we will need to call those functions from
      on_commit trigger callback, where yielding is not normally allowed.
      
      Actually, there's no reason to remove old files synchronously - we could
      as well do it in the background. So this patch introduces a background
      garbage collection fiber that executes gc_run when woken up. Now all
      functions that might trigger garbage collection wake up this fiber
      instead of executing gc_run directly.
      07191842
    • recovery: restore garbage collector vclock after restart · baf28a59
      Vladimir Davydov authored
      After restart the garbage collector vclock is reset to the vclock of the
      oldest preserved checkpoint, which is incorrect - it may be less in case
      there is a replica that lagged behind, and it may be greater as well in
      case the WAL thread hit ENOSPC and had to remove some WAL files to
      continue. Fix it.
      
      A note about xlog/panic_on_wal_error test. To check that replication
      stops if some xlogs are missing, the test first removes xlogs on the
      master, then restarts the master, then tries to start the replica
      expecting that replication should fail. Well, it shouldn't - the replica
      should rebootstrap instead. It didn't rebootstrap before this patch
      though, because the master reported wrong garbage collector vclock (as
      it didn't recover it on restart). After this patch the replica would
      rebootstrap and the test would hang. Fix this by restarting the master
      before removing xlog files.
      baf28a59
    • wal: remove files needed for recovery from backup checkpoints on ENOSPC · bd7f7116
      Vladimir Davydov authored
      Tarantool always keeps box.cfg.checkpoint_count latest checkpoints. It
      also never deletes WAL files needed for recovery from any of them for
      the sake of redundancy, even if it gets ENOSPC while trying to write to
      WAL. This patch changes that behavior: now the WAL thread is allowed to
      delete backup WAL files in case of emergency ENOSPC - after all it's
      better than stopping operation.
      
      Closes #3822
      bd7f7116
    • wal: separate checkpoint and flush paths · 74d8db74
      Vladimir Davydov authored
      Currently, wal_checkpoint() is used for two purposes. First, to make a
      checkpoint (rotate = true). Second, to flush all pending WAL requests
      (rotate = false). Since checkpointing has to fail if cascading rollback
      is in progress, so does flushing. This is confusing. Let's separate the
      two paths.
      
      While we are at it, let's also rewrite WAL checkpointing using cbus_call
      instead of cpipe_push as it's a more convenient way of exchanging simple
      two-hop messages between two threads.
      74d8db74
    • json: some renames · b56103f5
      Kirill Shcherbatov authored
      We are planning to link json_path_node objects in a tree and attach some
      extra information to them so that they could be used to describe a json
      document structure. Let's rename it to json_token as it sounds more
      appropriate for the purpose.
      
      Also, rename json_path_parser to json_lexer as it isn't a parser,
      really, it's rather a tokenizer or lexer. Besides, the new name is
      shorter.
      
      Needed for #1012
      b56103f5
    • test: fix vinyl/errinj spurious failure · 8e13153b
      Vladimir Davydov authored
      The failing test case checks that modifications done to the space during
      the final dump of a newly built index are recovered properly. It assumes
      that a series of operations will complete in 0.1 seconds, but it may not
      happen if the disk is slow (like on Travis CI). This results in spurious
      failures. To fix this issue, let's replace ERRINJ_VY_RUN_WRITE_TIMEOUT
      used by the test with ERRINJ_VY_RUN_WRITE_DELAY, which blocks index
      creation until it is disabled instead of injecting a time delay as its
      predecessor did.
      
      Closes #3756
      8e13153b
    • Don't repeat SQL stress tests with vinyl engine. · 6e07131d
      Konstantin Osipov authored
      These tests stress some of the parser/vdbe features, so there is no
      point in replaying them against vinyl. They could just as well run in
      wal_mode="none".
      6e07131d
    • Disable gh-3332-tuple-format-leak.test, gh-3083-ephemeral-unref-tuples.test · 52a212f3
      Konstantin Osipov authored
      Disable these tests in the regular suite until they are sped up within
      the scope of gh-3845.
      52a212f3
    • lua: moving lua error functions to separate file · 27a04953
      Ilya Markov authored
      Refactoring. Move lua error functions to a separate file.
      
      A prerequisite for #677
      27a04953
    • test: skip test backtrace if no libunwind support · 2aa25ba5
      Sergei Voronezhskii authored
      Closes #3824
      2aa25ba5