  1. Oct 25, 2018
    • wal: delete old wal files when running out of disk space · 8a1bdc82
      Vladimir Davydov authored
      Now if the WAL thread fails to preallocate disk space needed to commit
      a transaction, it will delete old WAL files until it succeeds or it
      deletes all files that are not needed for local recovery from the oldest
      checkpoint. After it deletes a file, it notifies the garbage collector
      via the WAL watcher interface. The latter then deactivates consumers
      that would need deleted files.
      
      The user doesn't see an ENOSPC error if the WAL thread successfully
      allocates disk space after deleting old files. Here's what's printed
      to the log when this happens:
      
        wal/101/main C> ran out of disk space, try to delete old WAL files
        wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000005.xlog
        wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000006.xlog
        wal/101/main I> removed /home/vlad/src/tarantool/test/var/001_replication/master/00000000000000000007.xlog
        main/105/main C> deactivated WAL consumer replica 82d0fa3f-6881-4bc5-a2c0-a0f5dcf80120 at {1: 5}
        main/105/main C> deactivated WAL consumer replica 98dce0a8-1213-4824-b31e-c7e3c4eaf437 at {1: 7}
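
      The retry loop described above can be sketched as follows. This is
      a simplified Python model, not the C code from the patch:
      preallocate, wal_files, first_needed, and on_delete are hypothetical
      stand-ins for the disk space reservation, the list of xlog files,
      the oldest file needed for local recovery, and the WAL watcher
      notification.

```python
import errno


def commit_with_gc(preallocate, wal_files, first_needed, on_delete):
    """Try to preallocate space, deleting old WAL files on ENOSPC.

    preallocate  -- callable that raises OSError(ENOSPC) when the disk is full
    wal_files    -- sorted list of file signatures, oldest first (hypothetical)
    first_needed -- signature of the oldest file needed for local recovery
    on_delete    -- callback standing in for the WAL watcher notification
    """
    while True:
        try:
            preallocate()
            return True
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
        # Out of space: drop the oldest file unless recovery still needs it.
        if not wal_files or wal_files[0] >= first_needed:
            return False
        on_delete(wal_files.pop(0))
```

      In this model the caller sees a failure only when the function
      returns False, that is, when nothing else can be deleted.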
      
      Closes #3397
  2. Oct 24, 2018
    • vinyl: account disk statements of each type · b2f85642
      Vladimir Davydov authored
      This patch adds a new entry to per index statistics reported by
      index.stat():
      
        disk.statement
          inserts
          replaces
          deletes
          upserts
      
      It shows the number of statements of each type stored in run files.
      The new statistics are persisted in index files. We will need this
      information so that we can force major compaction when there are too
      many DELETE statements accumulated in run files.
      
      Needed for #3225
    • tuple: zap tuple_extra · e65ba254
      Vladimir Davydov authored
      tuple_extra() allows storing arbitrary metadata inside tuples.
      To use it, one should set extra_size when creating a tuple_format.
      It was introduced for storing the UPSERT counter or column mask inside
      vinyl statements. It turned out that it wasn't really needed, as the
      UPSERT counter can be stored on lsregion while the column mask doesn't
      need to be stored at all.
      
      Actually, the whole idea of tuple_extra() is rather crooked: why
      would we need it if we can inherit struct tuple instead, as we do
      in case of memtx_tuple and vy_stmt? Accessing an inherited struct
      is much more convenient than using tuple_extra().
      
      So this patch gets rid of tuple_extra(). To do that, it partially
      reverts the following commits:
      
      6c0842e0 vinyl: refactor vy_stmt_alloc()
      74ff46d8 vinyl: add special format for tuples with column mask
      11eb7816 Add extra size to tuple_format->field_map_size
    • vinyl: zap vy_stmt_column_mask and mem_format_with_colmask · 08afd57f
      Vladimir Davydov authored
      Finally, these atrocities are not used anywhere and can be removed.
    • vinyl: move update optimization from write iterator to tx · 9d0ccd66
      Vladimir Davydov authored
      An UPDATE operation is written as DELETE + REPLACE to secondary indexes.
      We write those statements to the memory level even if the UPDATE doesn't
      actually update columns indexed by a secondary key. We filter them out
      in the write iterator when the memory level is dumped. That's what we
      use vy_stmt_column_mask for.
      
      Actually, there's no point in keeping those statements until dump - we
      could as well filter them out when the transaction is committed. This
      would even save some memory. This wouldn't hurt read operations, because
      point lookup doesn't work for secondary indexes by design and so we have
      to read all sources, including disk, on every read from a secondary
      index.
      
      That said, let's move update optimization from the write iterator to
      vy_tx_commit. This is a step towards removing vy_stmt_column_mask.
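
      The check that decides whether an UPDATE needs to touch a secondary
      index can be illustrated with a bit mask. The sketch below is a
      Python model under stated assumptions, not the vinyl code: the
      function names are hypothetical, and folding fields 63 and above
      into the top bit is an assumption of this model.

```python
def column_mask(field_numbers):
    """Build a 64-bit mask with one bit per field number (0-based).

    Assumption of this sketch: fields 63 and above share the top bit.
    """
    mask = 0
    for fieldno in field_numbers:
        mask |= 1 << min(fieldno, 63)
    return mask


def secondary_statements(update_mask, key_mask):
    """Return the statements kept for a secondary index at commit time."""
    if (update_mask & key_mask) == 0:
        # The UPDATE did not touch any indexed column: nothing to write.
        return []
    return ["DELETE", "REPLACE"]
```

      Here update_mask describes the fields changed by the UPDATE and
      key_mask the fields covered by the secondary key.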
  3. Oct 13, 2018
    • replication: fix rebootstrap crash in case master has replica's rows · d4ce7447
      Vladimir Davydov authored
      During SUBSCRIBE the master sends only those rows originating from the
      subscribed replica that aren't present on the replica. Such rows may
      appear after a sudden power loss in case the replica doesn't issue
      fdatasync() after each WAL write, which is the default behavior. This
      means that a replica can write some rows to WAL, relay them to another
      replica, then stop without syncing WAL file. If this happens we expect
      the replica to read its own rows from other members of the cluster upon
      restart. For more details see commit eae84efb ("replication: recover
      missing local data from replica").
      
      Obviously, this feature only makes sense for SUBSCRIBE. During JOIN
      we must relay all rows. This is how it initially worked, but commit
      adc28591 ("replication: do not delete relay on applier disconnect")
      witlessly removed the corresponding check from relay_send_row() so that
      now we don't send any rows originating from the joined replica:
      
        @@ -595,8 +630,7 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
                 * it). In the latter case packet's LSN is less than or equal to
                 * local master's LSN at the moment it received 'SUBSCRIBE' request.
                 */
        -       if (relay->replica == NULL ||
        -           packet->replica_id != relay->replica->id ||
        +       if (packet->replica_id != relay->replica->id ||
                    packet->lsn <= vclock_get(&relay->local_vclock_at_subscribe,
                                              packet->replica_id)) {
                        relay_send(relay, packet);
      
      (relay->local_vclock_at_subscribe is initialized to 0 on JOIN)
      
      This only affects the case of rebootstrap, automatic or manual, because
      when a new replica joins a cluster there can't be any rows on the master
      originating from it. On manual rebootstrap, i.e. when the replica files
      are deleted by the user and the replica is restarted from an empty
      directory with the same UUID (set via box.cfg.instance_uuid), this isn't
      critical - the replica will still receive those rows it should have
      received during JOIN once it subscribes. However, in case of automatic
      rebootstrap this can result in broken order of xlog/snap files, because
      the replica directory still contains old xlog/snap files created before
      rebootstrap. The rebootstrap logic expects them to have strictly smaller
      vclocks than new files, but if JOIN stops prematurely, this condition
      may not hold, leading to a crash when the vclock of a new xlog/snap is
      inserted into the corresponding xdir.
      
      This patch fixes this issue by restoring pre eae84efb behavior: now
      we create a new relay for FINAL JOIN instead of reusing the one attached
      to the joined replica so that relay_send_row() can detect JOIN phase and
      relay all rows in this case. It also adds a comment so that we don't
      make such a mistake in future.
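
      The condition restored in relay_send_row() can be restated as a
      small Python predicate. This is an illustrative model of the check
      shown in the diff above, with hypothetical argument names:
      peer_replica_id is None plays the role of relay->replica == NULL.

```python
def should_relay(row_replica_id, row_lsn, peer_replica_id, lsn_at_subscribe):
    """Decide whether the master relays a row to its peer.

    peer_replica_id  -- None while the peer is joining (no replica is
                        attached to the relay), otherwise the peer's id
    lsn_at_subscribe -- the master's LSN for the row's origin at the
                        moment SUBSCRIBE was received
    """
    if peer_replica_id is None:
        return True  # JOIN: relay all rows
    if row_replica_id != peer_replica_id:
        return True  # rows from other instances are always relayed
    # The peer's own rows: send only those it may have lost.
    return row_lsn <= lsn_at_subscribe
```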
      
      Apart from fixing the issue, this patch also fixes a relay leak in
      relay_initial_join() in case engine_join_xc() fails, which was also
      introduced by the above mentioned commit.
      
      A note about xlog/panic_on_broken_lsn test. Now the relay status isn't
      reported by box.info.replication if FINAL JOIN failed and the replica
      never subscribed (this is how it worked before commit eae84efb) so
      we need to tweak the test a bit to handle this.
      
      Closes #3740
  4. Oct 12, 2018
    • vinyl: implement basic transaction throttling · c0d8063b
      Vladimir Davydov authored
      If the rate at which transactions are ready to write to the database is
      greater than the dump bandwidth, memory will get depleted before the
      previously scheduled dump is complete and all newer transactions will
      have to wait, which may take seconds or even minutes:
      
        W> waited for 555 bytes of vinyl memory quota for too long: 15.750 sec
      
      This patch set implements basic transaction throttling that is supposed
      to help avoid unpredictably long stalls. Now the transaction write rate
      is always capped by the observed dump bandwidth, because it doesn't make
      sense to consume memory at a greater rate than it can be freed. On top
      of that, when a dump begins, we estimate the amount of time it is going
      to take and limit the transaction write rate accordingly.
      
      Note, this patch doesn't take into account compaction when setting the
      rate limit so compaction threads may still fail to keep up with dumps,
      increasing the read amplification. It will be addressed later.
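
      As a rough illustration of the two limits described above, here is
      a Python sketch. The exact formula used by the patch is not given
      in this message, so the second limit (free memory divided by the
      estimated dump time) is an assumption, and all names are
      hypothetical.

```python
def write_rate_limit(dump_bandwidth, dump_size=None, mem_free=None):
    """Return a write rate cap in bytes per second.

    dump_bandwidth -- observed dump bandwidth, bytes per second
    dump_size      -- bytes being dumped, or None if no dump is in progress
    mem_free       -- bytes of memory still available to transactions
    """
    # Never consume memory faster than a dump can free it.
    limit = float(dump_bandwidth)
    if dump_size and mem_free is not None:
        # Estimated time until the dump in progress completes.
        dump_time = dump_size / dump_bandwidth
        # Do not let writers exhaust mem_free before the dump completes.
        limit = min(limit, mem_free / dump_time)
    return limit
```

      For example, with a bandwidth of 10 MB/s, a 100 MB dump is expected
      to take 10 seconds, so 50 MB of free memory yields a 5 MB/s cap.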
      
      Closes #1862
    • vinyl: bypass format validation for statements loaded from disk · 3846d9b2
      Vladimir Davydov authored
      When the format of a space is altered, we walk over all tuples stored in
      the primary index and check them against the new format. This doesn't
      guarantee that all *statements* stored in the primary index conform to
      the new format though, because the check isn't performed for deleted or
      overwritten statements, e.g.
      
        s = box.schema.space.create('test', {engine = 'vinyl'})
        s:create_index('primary')
        s:insert{1}
        box.snapshot()
        s:delete{1}
      
        -- The following command will succeed, because the space is empty,
        -- however one of the runs contains REPLACE{1}, which doesn't conform
        -- to the new format.
        s:create_index('secondary', {parts = {2, 'unsigned'}})
      
      This is OK as we will never return such overwritten statements to the
      user, however we may still need to read them. Currently, this leads
      either to an assertion failure or to a read error in
      
        vy_stmt_decode
         vy_stmt_new_with_ops
          tuple_init_field_map
      
      We could probably force major compaction of the primary index to purge
      such statements, but it is complicated as there may be a read view
      preventing the write iterator from squashing such a statement, and
      currently there's no way to force destruction of a read view.
      
      So this patch simply disables format validation for all tuples loaded
      from disk (actually we already skip format validation for all secondary
      index statements and for DELETE statements in primary indexes so this
      isn't as bad as it may seem). To do that, it adds a boolean parameter to
      tuple_init_field_map() that controls format validation, and then makes
      vy_stmt_new_with_ops(), which is used for constructing vinyl statements,
      set it to false. This is OK as all statements inserted into a vinyl
      space are validated explicitly with tuple_validate() anyway.
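
      The effect of the new parameter can be modelled in a few lines of
      Python. This is only a sketch: init_field_map and the list-of-types
      format are hypothetical simplifications, not the signature of
      tuple_init_field_map().

```python
def init_field_map(fields, format_types, validate):
    """Build a field map; check the tuple against the format only if asked.

    format_types -- list of Python types, one per required field
                    (a hypothetical stand-in for a tuple format)
    """
    if validate:
        if len(fields) < len(format_types):
            raise ValueError("field %d is missing" % (len(fields) + 1))
        for i, (value, ftype) in enumerate(zip(fields, format_types)):
            if not isinstance(value, ftype):
                raise TypeError("field %d has a wrong type" % (i + 1))
    # Stand-in for the real offset table: map field number to value.
    return {i + 1: value for i, value in enumerate(fields)}
```

      With validate set to False, the stale REPLACE{1} from the example
      above can still be decoded even though the new format requires a
      second field.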
      
      This is rather a workaround for the lack of a better solution.
      
      Closes #3540
    • test: fix spurious box/sql test failure · 23e71c6e
      Vladimir Davydov authored
      For some reason this test uses 555 for space id, which may be taken by
      a previously created space:
      
      Test failed! Result content mismatch:
      --- box/sql.result        Fri Oct  5 17:23:25 2018
      +++ box/sql.reject        Fri Oct 12 19:38:51 2018
      @@ -12,12 +12,14 @@
       ...
       _ = box.schema.space.create('test1', { id = 555 })
       ---
      +- error: Duplicate key exists in unique index 'primary' in space '_space'
       ...
      
      Reproduce file:
      
      ---
      - [box/rtree_point.test.lua, null]
      - [box/transaction.test.lua, null]
      - [box/tree_pk.test.lua, null]
      - [box/access.test.lua, null]
      - [box/cfg.test.lua, null]
      - [box/admin.test.lua, null]
      - [box/lua.test.lua, null]
      - [box/bitset.test.lua, null]
      - [box/role.test.lua, null]
      - [box/sql.test.lua, null]
      ...
      
      Remove { id = 555 } to make sure it never happens.
  5. Oct 10, 2018
    • socket: fix polling in case of spurious wakeup · e6bd7748
      Georgy Kirichenko authored
      socket_writable/socket_readable now handle spurious wakeups of
      socket.iowait: they keep waiting until the event happens or the
      timeout expires.
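
      The general pattern of surviving spurious wakeups is to wait
      against a fixed deadline and recompute the remaining time after
      each wakeup. A minimal Python sketch, assuming a hypothetical
      iowait(remaining) callable that returns True when the event has
      occurred and False on a timeout or spurious wakeup:

```python
import time


def wait_ready(iowait, timeout, clock=time.monotonic):
    """Wait until iowait reports the event or the timeout expires."""
    deadline = clock() + timeout
    remaining = timeout
    while True:
        if iowait(remaining):
            return True
        # Woken up without the event: wait again, but only for the
        # time that is left until the original deadline.
        remaining = deadline - clock()
        if remaining <= 0:
            return False
```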
      
      Closes #3344
    • vinyl: fix for deferred DELETE overwriting newer statement · 63912c30
      Vladimir Davydov authored
      A deferred DELETE may be generated after a newer statement for the same
      key was inserted into a secondary index and hence land in a newer run.
      Since the read iterator assumes that newer sources always contain newer
      statements for the same key, we mark all deferred DELETE statements with
      VY_STMT_SKIP_READ flag, which makes run/mem iterators ignore them. The
      flag must be persisted when a statement is written to disk, but it is
      not. Fix this.
      
      Fixes commit 504bc805 ("vinyl: do not store meta in secondary index
      runs").
    • test: disable feedback daemon test on Mac OS in CI · ab868a6b
      Alexander Turenko authored
      The failure is known and should not influence our CI results.
      
      The test should be re-enabled once #3558 is fixed.
  6. Oct 06, 2018
  7. Oct 05, 2018
    • replication: ref checkpoint needed to join replica · bae6f037
      Vladimir Davydov authored
      Before joining a new replica we register a gc_consumer to prevent
      garbage collection of files needed for join and following subscribe.
      Before commit 9c5d851d ("replication: remove old snapshot files not
      needed by replicas") a consumer would pin both checkpoints and WALs so
      that would work as expected. However, the above mentioned commit
      introduced consumer types and marked a consumer registered on replica
      join as WAL-only so if the garbage collector was invoked during join, it
      could delete files corresponding to the relayed checkpoint resulting in
      replica join failure. Fix this issue by pinning the checkpoint used for
      joining a replica with gc_ref_checkpoint and unpinning once join is
      complete.
      
      The issue can only be reproduced if there are vinyl spaces, because
      deletion of an open snap file doesn't prevent the relay from reading it.
      The existing replication/gc test would catch the issue if it triggered
      compaction on the master so we simply tweak it accordingly instead of
      adding a new test case.
      
      Closes #3708
    • vinyl: force deletion of runs left from unfinished indexes on restart · eb6280d0
      Vladimir Davydov authored
      If an instance is restarted while building a new vinyl index, there will
      probably be some run files left. Currently, we won't delete such files
      until box.snapshot() is called, even though there's no point in keeping
      them around. Let's tweak vy_gc_lsm() so that it marks all runs that
      belong to an unfinished index as incomplete to force vy_gc() to remove
      them immediately after recovery is complete.
      
      This also removes files left from a failed rebootstrap attempt so we can
      remove a call to box.snapshot() from vinyl/replica_rejoin.test.lua.
    • vinyl: fix master crash on replica join failure · 626dfb2c
      Vladimir Davydov authored
      This patch fixes a trivial error on vy_send_range() error path which
      results in a master crash in case a file needed to join a replica is
      missing or corrupted.
      
      See #3708
  8. Oct 03, 2018
    • utf8: allow empty strings in utf8.upper/lower · 129099bc
      Vladislav Shpilevoy authored
      Closes #3709
    • replication: fix assertion with duplicate connection · 03a9bb1a
      Olga Arkhangelskaia authored
      This patch fixes the behavior when a replica tries to connect to the
      same master more than once. If this happens during initial
      configuration, we raise an exception. Otherwise, we print the error
      and disconnect the applier.
      
      @locker: minor test cleanup.
      
      Closes #3610
    • vinyl: zap vy_env::memory, read_threads, and write_threads · 69a4b786
      Vladimir Davydov authored
      They are only used to set corresponding members of vy_quota, vy_run_env,
      and vy_scheduler when vy_env is created. No point in keeping them around
      all the time.
    • vinyl: factor load regulator out of quota · 90ffaa8d
      Vladimir Davydov authored
      Turned out that throttling isn't going to be as simple as maintaining
      the write rate below the estimated dump bandwidth, because we also need
      to take into account whether compaction keeps up with dumps. Tracking
      compaction progress isn't a trivial task and mixing it in a module
      responsible for resource limiting, which vy_quota is, doesn't seem to be
      a good idea. Let's factor out the related code into a separate module
      and call it vy_regulator. Currently, the new module only keeps track of
      the write rate and the dump bandwidth and sets the memory watermark
      accordingly, but soon we will extend it to configure throttling as well.
      
      Since write rate and dump bandwidth are now a part of the regulator
      subsystem, this patch renames 'quota' entry of box.stat.vinyl() to
      'regulator'. It also removes 'quota.usage' and 'quota.limit' altogether,
      because memory usage is reported under 'memory.level0' while the limit
      can be read from box.cfg.vinyl_memory, and renames 'use_rate' to
      'write_rate', because the latter seems to be a more appropriate name.
      
      Needed for #1862
  9. Sep 26, 2018
    • replication: don't stop syncing on configuration errors · 4baa71bc
      Vladimir Davydov authored
      When replication is restarted with the same replica set configuration
      (i.e. box.cfg{replication = box.cfg.replication}), there's a chance that
      an old relay will be still running on the master at the time when a new
      applier tries to subscribe. In this case the applier will get an error:
      
        main/152/applier/localhost:62649 I> can't join/subscribe
        main/152/applier/localhost:62649 xrow.c:891 E> ER_CFG: Incorrect value for
            option 'replication': duplicate connection with the same replica UUID
      
      Such an error won't stop the applier - it will keep trying to reconnect:
      
        main/152/applier/localhost:62649 I> will retry every 1.00 second
      
      However, it will stop synchronization so that box.cfg() will return
      without an error, but leave the replica in the orphan mode:
      
        main/151/console/::1:42606 C> failed to synchronize with 1 out of 1 replicas
        main/151/console/::1:42606 C> entering orphan mode
        main/151/console/::1:42606 I> set 'replication' configuration option to
          "localhost:62649"
      
      In a second, the stray relay on the master will probably exit and the
      applier will manage to subscribe so that the replica will leave the
      orphan mode:
      
        main/152/applier/localhost:62649 C> leaving orphan mode
      
      This is very annoying, because there's no need to enter the orphan mode
      in this case - we could as well keep trying to synchronize until the
      applier finally manages to subscribe or replication_sync_timeout
      expires.
      
      So this patch makes appliers enter "loading" state on configuration
      errors, the same state they enter if they detect that bootstrap hasn't
      finished yet. This guarantees that configuration errors, like the one
      above, won't break synchronization and leave the user gaping at the
      unprovoked orphan mode.
      
      Apart from the issue in question (#3636), this patch also fixes spurious
      replication-py/multi test failures that happened for exactly the same
      reason (#3692).
      
      Closes #3636
      Closes #3692
  10. Sep 25, 2018
    • recovery: fix incorrect handling of empty-body requests. · f8956e05
      Serge Petrenko authored
      In some cases no-ops are written to xlog. They have no effect but are
      needed to bump lsn.
      
      Some time ago (see commit 89e5b784) such ops were made bodiless, but
      empty-body requests are not handled in xrow_header_decode(). This leads
      to recovery errors in a special case: when a multi-statement transaction
      containing no-ops is written to xlog, then upon recovery from that xlog
      all data from the end of the no-op till the start of the next
      transaction becomes the no-op's body, so, effectively, it is ignored.
      Here's an example of `tarantoolctl cat` output showing this (BODY
      contains the next request's data):
      
          ---
          HEADER:
            lsn: 5
            replica_id: 1
            type: NOP
            timestamp: 1536656270.5092
          BODY:
            type: 3
            timestamp: 1536656270.5092
            lsn: 6
            replica_id: 1
          ---
          HEADER:
            type: 0
          ...
      
      This patch handles no-ops correctly in xrow_header_decode().
      
      @locker: refactored the test case so as not to restart the server for
      a second time.
      
      Closes #3678
    • tarantoolctl: fix cat and play for empty body requests · 24a87ff2
      Serge Petrenko authored
      If space.before_replace returns the old tuple, the operation turns into
      no-op, but is still written to WAL as IPROTO_NOP for the sake of
      replication. Such a request doesn't have a body, and tarantoolctl failed
      to parse such requests in `tarantoolctl cat` and `tarantoolctl play`.
      Fix this by checking whether a request has a body. Also skip such
      requests in `play`, since they have no effect, and, while we're at it,
      make sure `play` and `cat` do not read excess rows with lsn>=to in case
      these rows are skipped.
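
      The filtering described above can be sketched in Python. This is
      not the tarantoolctl code (which is Lua); the row layout mirrors
      the `tarantoolctl cat` output, and rows_to_play is a hypothetical
      name. Rows are assumed to arrive in LSN order.

```python
def rows_to_play(rows, lsn_from, lsn_to):
    """Yield rows with lsn_from <= lsn < lsn_to, skipping bodiless no-ops.

    rows -- iterable of dicts with a 'HEADER' key and an optional
            'BODY' key, in LSN order
    """
    for row in rows:
        lsn = row["HEADER"]["lsn"]
        if lsn >= lsn_to:
            break  # stop reading instead of scanning the rest of the file
        if lsn < lsn_from:
            continue
        if row.get("BODY") is None:
            continue  # a no-op has no body: nothing to apply
        yield row
```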
      
      Closes #3675
  11. Sep 22, 2018
  12. Sep 21, 2018
  13. Sep 20, 2018
  14. Sep 19, 2018
    • vinyl: add global disk stats · fe06b124
      vinyl: add global disk stats · fe06b124
      Vladimir Davydov authored
      This patch adds some essential disk statistics that are already
      collected and reported on a per-index basis to box.stat.vinyl().
      The new statistics are shown under the 'disk' section and currently
      include the following fields:
      
       - data: size of data stored on disk.
       - index: size of index stored on disk.
       - dump.in: size of dump input.
       - dump.out: size of dump output.
       - compact.in: size of compaction input.
       - compact.out: size of compaction output.
       - compact.queue: size of compaction queue.
      
      All the counters are given in bytes without taking into account
      disk compression. Dump/compaction in/out counters can be reset with
      box.stat.reset().
    • vinyl: keep track of compaction queue length · 06e70cad
      Vladimir Davydov authored
      Currently, there's no way to figure out whether compaction keeps up
      with dumps or not while this is essential for implementing transaction
      throttling. This patch adds a metric that is supposed to help answer
      this question. This is the compaction queue size. It is calculated per
      range and per LSM tree as the total size of slices awaiting compaction.
      We update the metric along with the compaction priority of a range, in
      vy_range_update_compact_priority(), and account it to an LSM tree in
      vy_lsm_acct_range(). For now, the new metric is reported only on a per
      index basis, in index.stat() under disk.compact.queue.
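
      A toy Python model of the metric follows. It assumes that the
      compaction priority of a range is the number of its newest slices
      to be compacted together and that a priority of 0 or 1 means there
      is nothing to compact; both are assumptions of this sketch, and the
      function names are hypothetical.

```python
def range_compact_queue(slice_sizes, compact_priority):
    """Size of the slices of one range that await compaction.

    slice_sizes      -- slice sizes in bytes, newest first
    compact_priority -- number of newest slices to compact together
                        (0 or 1 means nothing to compact in this model)
    """
    if compact_priority <= 1:
        return 0
    return sum(slice_sizes[:compact_priority])


def lsm_compact_queue(ranges):
    """Total queue size of an LSM tree: the sum over its ranges."""
    return sum(range_compact_queue(sizes, prio) for sizes, prio in ranges)
```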
    • vinyl: report pages and bytes_compressed in dump/compact in/out stats · 8a1e507d
      Vladimir Davydov authored
      There's no reason not to report pages and bytes_compressed under
      disk.stat.dump.out and disk.stat.compact.{in,out} apart from using
      the same struct for dump and compaction statistics (vy_compact_stat).
      The statistics are going to differ anyway once compaction queue size
      is added to disk.stat.compact so let's zap struct vy_compact_stat
      and report as much info as we can.
  15. Sep 17, 2018
    • lua: fix assertion failure after an error in box.session.su() · ac77418f
      Serge Petrenko authored
      If an error occurred during execution of a function called from
      box.session.su(), we assumed that the fiber diagnostics area was not
      empty and tried to print an error message using data from it.
      However, this assumption does not hold when a Lua error happens.
      Consider this case:
      
        box.session.su('admin', function(x) return #x end, 3)
      
      A lua error would be pushed on the stack but the diagnostics would be
      empty, and we would get an assertion failure when trying to print the
      error message. Handle this by using lua_error() instead of luaT_error().
      
      Closes #3659
  16. Sep 15, 2018
    • Fix Debug build on GCC 8 · 8c538963
      Alexander Turenko authored
      Fixed false positive -Wimplicit-fallthrough in http_parser.c by adding a
      break. The code jumps anyway, so the execution flow is not changed.
      
      Fixed false positive -Wparentheses in reflection.h by removing the
      parentheses. The argument 'method' of the macro 'type_foreach_method' is
      just name of the loop variable and is passed to the macro for
      readability reasons.
      
      Fixed false positive -Wcast-function-type triggered by reflection.h by
      adding -Wno-cast-function-type for sources and unit tests. We cast a
      pointer to a member function to another pointer to a member function to
      store it in a structure, but we cast it back before making a call. It is
      legal and does not lead to undefined behaviour.
      
      Fixes #3685.
  17. Sep 14, 2018
    • Fix http error test · 76a8bd32
      AKhatskevich authored
      The test expected http:get to yield. However, with a very fast
      unix socket and parallel test execution, a context switch during
      the call led to no yield and an instant reply. That caused an
      error during `fiber:cancel`.
      
      The problem is solved by increasing http server response time.
      
      Closes #3480
  18. Sep 13, 2018
    • json: add options to json.encode() · 1663bdc4
      Roman Khabibov authored
      Add an ability to pass options to json.encode()/decode().
      
      Closes: #2888.
      
      @TarantoolBot document
      Title: json.encode() json.decode()
      Add an ability to pass options to
      json.encode() and json.decode().
      These are the same options that
      are used globally in json.cfg().
  19. Sep 09, 2018
    • vinyl: add global memory stats · e78ebb77
      Vladimir Davydov authored
      box.info.memory() gives you some insight on what memory is used for,
      but it's very coarse. For vinyl we need finer grained global memory
      statistics.
      
      This patch adds them: they are reported under box.stat.vinyl().memory
      and consist of the following entries:
      
       - level0: total size of level 0 of all LSM trees.
       - tx: size of memory used by tx write and read sets.
       - tuple_cache: size of memory occupied by tuple cache.
       - page_index: size of memory used for storing page indexes.
       - bloom_filter: size of memory used for storing bloom filters.
      
      It also removes box.stat.vinyl().cache, as the size of cache is now
      reported under memory.tuple_cache.
    • vinyl: fix accounting of secondary index cache statements · 16faada1
      Vladimir Davydov authored
      Since commit 0c5e6cc8 ("vinyl: store full tuples in secondary index
      cache"), we store primary index tuples in secondary index cache, but we
      still account them as separate tuples. Fix that.
      
      Follow-up #3478
      Closes #3655
    • vinyl: set box.cfg.vinyl_write_threads to 4 by default · fe1e4694
      Vladimir Davydov authored
      Any LSM-based database design implies high level of write amplification
      so there should be more compaction threads than dump threads. With the
      default value of 2 for box.cfg.vinyl_write_threads, which we have now,
      we start only one compaction thread. Let's increase the default up to 4
      so that three compaction threads are started by default, which better
      fits an LSM-based design.
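
      The thread counts quoted above are consistent with a split where
      a quarter of the pool, but at least one thread, is reserved for
      dumps. The Python sketch below encodes that rule; the rule itself
      is an assumption chosen to reproduce the numbers in this message,
      and split_write_threads is a hypothetical name.

```python
def split_write_threads(write_threads):
    """Split the worker pool into (dump_threads, compaction_threads)."""
    # Assumed rule: a quarter of the pool, at least one thread, for dumps.
    dump_threads = max(1, write_threads // 4)
    return dump_threads, write_threads - dump_threads
```

      Under this rule a pool of 2 gives one dump thread and one
      compaction thread, while a pool of 4 gives one dump thread and
      three compaction threads.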
  20. Sep 04, 2018
    • box: sync on replication configuration update · 113ade24
      Vladimir Davydov authored
      Now box.cfg() doesn't return until 'quorum' appliers are in sync not
      only on initial configuration, but also on replication configuration
      update. If it fails to synchronize within replication_sync_timeout,
      box.cfg() returns without an error, but the instance enters 'orphan'
      state, which is basically read-only mode. In the meantime, appliers
      will keep trying to synchronize in the background, and the instance
      will leave 'orphan' state as soon as enough appliers are in sync.
      
      Note, this patch also changes logging a bit:
       - 'ready to accept request' is printed on startup before syncing
         with the replica set, because although the instance is read-only
         at that time, it can indeed accept all sorts of ro requests.
       - For 'connecting', 'connected', 'synchronizing' messages, we now
         use the 'info' logging level, not 'verbose' as before, because
         those messages are important: they give the admin an idea of what's
         going on with the instance, and they can't flood logs.
       - 'sync complete' message is also printed as 'info', not 'crit',
         because there's nothing critical about it (it's not an error).
      
      Also note that we only enter 'orphan' state if we failed to synchronize.
      In particular, if the instance manages to synchronize with all replicas
      within a timeout, it will jump from 'loading' straight into 'running'
      bypassing 'orphan' state. This is done for the sake of consistency
      between initial configuration and reconfiguration.
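
      The resulting state transition can be summarized with a small
      Python model. It is a sketch only: sync_times is a hypothetical
      input describing when each applier catches up, and timeout stands
      in for replication_sync_timeout.

```python
def state_after_sync(sync_times, quorum, timeout):
    """Return the state box.cfg() leaves the instance in.

    sync_times -- for each applier, seconds until it is in sync,
                  or None if it never syncs (hypothetical input)
    quorum     -- number of appliers that must be in sync
    timeout    -- stand-in for replication_sync_timeout, in seconds
    """
    in_time = [t for t in sync_times if t is not None and t <= timeout]
    if len(in_time) >= quorum:
        return "running"  # 'loading' -> 'running', 'orphan' is bypassed
    return "orphan"  # read-only until enough appliers catch up
```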
      
      Closes #3427
      
      @TarantoolBot document
      Title: Sync on replication configuration update
      The behavior of box.cfg() on replication configuration update is
      now consistent with initial configuration, that is box.cfg() will
      not return until it synchronizes with as many masters as specified
      by the replication_connect_quorum configuration option or the timeout
      specified by replication_sync_timeout expires. On timeout, it will
      return without an error, but the instance will enter 'orphan' state.
      It will leave 'orphan' state as soon as enough appliers have synced.
    • box: add replication_sync_timeout configuration option · ca9fc33a
      Olga Arkhangelskaia authored
      In the scope of #3427 we need a timeout in case an instance waits for
      synchronization for too long, or even forever. The default value is 300.
      
      Closes #3674
      
      @locker: moved dynamic config check to box/cfg.test.lua; code cleanup
      
      @TarantoolBot document
      Title: Introduce new configuration option replication_sync_timeout
      After initial bootstrap or after a replication configuration change we
      need to sync up with the replication quorum. Sometimes sync can take
      too long, or replication_sync_lag can be smaller than the network
      latency, in which case the replica gets stuck in a sync loop that
      can't be cancelled. To avoid such situations, replication_sync_timeout
      can be used. When the time set in replication_sync_timeout has passed,
      the replica enters the orphan state. The option can be set dynamically.
      The default value is 300 seconds.