  1. May 29, 2018
  2. May 25, 2018
    • replication: display downstream status at upstream · 3db1dee9
      Konstantin Belyavskiy authored
      This fix improves the 'box.info.replication' output.
      If a downstream fails and therefore disconnects from its upstream,
      print 'status: disconnected' and the error message on both sides
      (master and replica).
      
      Closes #3365
      3db1dee9
    • replication: do not delete relay on applier disconnect · adc28591
      Konstantin Belyavskiy authored
      This is part of a larger task aimed at improving logging.
      Do not destroy the relay, since it stores the last error, which
      can be useful for diagnostics.
      The relay is now created together with the replica and always
      exists, so several NULL checks are removed as well.
      Add relay_state { OFF, FOLLOW, STOPPED } to track replica
      presence: once a replica has connected, its relay is either
      FOLLOW or STOPPED until the master is reset.
      Updated according to @kostja's proposal.
      
      Used for #3365.
      adc28591
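      The relay lifecycle described above can be sketched as a small state
      machine. This is an illustration only: the three state names come from
      the commit message, while the Relay class, its methods, and the exact
      transitions are assumptions, not Tarantool's C implementation.

```python
from enum import Enum


class RelayState(Enum):
    """States named in the commit message; the transitions are assumed."""
    OFF = "off"          # replica has not connected since the last reset
    FOLLOW = "follow"    # replica is connected and receiving changes
    STOPPED = "stopped"  # replica disconnected; the last error is kept


class Relay:
    """Hypothetical stand-in for a relay that lives as long as its replica."""

    def __init__(self):
        self.state = RelayState.OFF
        self.last_error = None

    def start(self):
        self.state = RelayState.FOLLOW
        self.last_error = None

    def stop(self, error=None):
        # The relay object is kept, so the error stays available
        # for diagnostics after the replica disconnects.
        self.state = RelayState.STOPPED
        self.last_error = error

    def reset(self):
        self.state = RelayState.OFF
        self.last_error = None
```

      Because the object is never destroyed, callers can read its state and
      last_error without checking whether the relay exists.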
    • vinyl: purge dropped indexes from vylog on garbage collection · a2d1d2a2
      Vladimir Davydov authored
      Currently, when an index is dropped, we remove all ranges/slices
      associated with it and mark all runs as dropped in vylog immediately.
      To find ranges/slices/runs, we use vy_lsm struct, see vy_log_lsm_prune.
      
      The problem is vy_lsm struct may be inconsistent with the state stored
      in vylog if index drop races with compaction, because we first write
      changes done by compaction task to vylog and only then update vy_lsm
      struct, see vy_task_compact_complete. Since write to vylog yields, this
      opens a time window during which the index can be dropped. If this
      happens, objects that were created by compaction but haven't been logged
      yet (such as new runs, slices, ranges) will be deleted from vylog by
      index drop, and this will permanently break vylog, making recovery
      impossible.
      
      To fix this issue, let's rework garbage collection of objects associated
      with dropped indexes as follows. Now when an index is dropped, we write
      a single record to vylog, VY_LOG_DROP_LSM, i.e. just mark the index as
      dropped without deleting associated objects. Actual index cleanup takes
      place in the garbage collection procedure, see vy_gc, which purges all
      ranges/slices linked to marked indexes from vylog and marks all their
      runs as dropped. When all runs are actually deleted from disk and
      "forgotten" in vylog, we remove the index record from vylog by writing
      VY_LOG_FORGET_LSM record. Since garbage collection procedure uses vylog
      itself instead of vy_lsm struct for iterating over vinyl objects, no
      race between index drop and dump/compaction can now lead to broken
      vylog.
      
      Closes #3416
      a2d1d2a2
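      The reworked scheme can be sketched with a toy in-memory log. The
      VyLog class below and its method names are hypothetical and do not
      reflect the real vylog record format; the sketch only shows the order
      of operations: drop marks the index, gc purges by walking the log,
      and the index record disappears once all its runs are forgotten.

```python
class VyLog:
    """Toy in-memory model of the log; not the real vylog format."""

    def __init__(self):
        self.indexes = {}  # index id -> {"dropped", "ranges", "runs"}

    def create_lsm(self, index_id, ranges, runs):
        self.indexes[index_id] = {
            "dropped": False,
            "ranges": set(ranges),
            "runs": {run: "live" for run in runs},
        }

    def drop_lsm(self, index_id):
        # Index drop writes a single record: mark only, delete nothing.
        self.indexes[index_id]["dropped"] = True

    def forget_run(self, index_id, run):
        # Called once the run file has actually been deleted from disk.
        del self.indexes[index_id]["runs"][run]

    def gc(self):
        # Iterate over the log itself, not over in-memory index structs.
        for index_id in list(self.indexes):
            index = self.indexes[index_id]
            if not index["dropped"]:
                continue
            index["ranges"].clear()
            for run in index["runs"]:
                index["runs"][run] = "dropped"
            if not index["runs"]:
                # All runs are forgotten: remove the index record too.
                del self.indexes[index_id]
```

      Since drop_lsm touches nothing but the flag, objects logged by a
      concurrent compaction are still present when gc later walks the log.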
    • vinyl: store lsn of index drop record in vylog · 264f7e3f
      Vladimir Davydov authored
      This is required to rework garbage collection in vinyl.
      264f7e3f
    • alter: pass lsn of index drop record to engine · 1af04afe
      Vladimir Davydov authored
      We pass lsn of index alter/create records, let's pass lsn of drop record
      for consistency. This is also needed by vinyl to store it in vylog (see
      the next patch).
      1af04afe
    • vinyl: do not reuse lsm objects during recovery from vylog · 31ab8e03
      Vladimir Davydov authored
      If an index was dropped and then recreated, then while replaying vylog
      we will reuse vy_lsm_recovery_info object corresponding to it. There's
      no reason why we do that instead of simply allocating a new object -
      amount of memory saved is negligible, but the code looks more complex.
      Let's simplify the code - whenever we see VY_LOG_CREATE_LSM, create a
      new vy_lsm_recovery_info object and replace the old incarnation if any
      in the hash map.
      31ab8e03
    • test: update replication_connect_timeout in tests to a lower value · e9bf00fc
      Konstantin Osipov authored
      replication: make replication_connect_timeout dynamic
      e9bf00fc
    • test: rework test case for memtx async garbage collection · c5f98b91
      Vladimir Davydov authored
      Do not use errinj as it is unreliable. Check that:
       - No memory is freed immediately after space drop (WAL is off).
       - All memory is freed asynchronously after yield.
      c5f98b91
    • replication: fix log message in case of sync failure · 6c35bf9b
      Vladimir Davydov authored
      replicaset_sync() returns not only if the instance synchronized to
      connected replicas, but also if some replicas have disconnected and
      the quorum can't be formed any more. Nevertheless, it always prints
      that sync has been completed. Fix it.
      
      See #3422
      6c35bf9b
    • replication: do not stop syncing if replicas are loading · 1785e79c
      Vladimir Davydov authored
      If a replica disconnects while sync is in progress, box.cfg{} may stop
      syncing leaving the instance in 'orphan' mode. This will happen if not
      enough replicas are connected to form a quorum. This makes sense e.g. on
      network error, but not when a replica is loading, because in the latter
      case it should be up and running quite soon. Let's account for
      replicas that disconnected because they haven't completed initial
      configuration yet and continue syncing if connected + loading > quorum.
      
      Closes #3422
      1785e79c
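      The rule above can be written out as a small predicate. The function
      name and return strings are hypothetical, and the comparison is copied
      literally from the commit message; the actual check in the Tarantool
      source may be phrased differently.

```python
def sync_decision(connected, loading, quorum):
    """Decide whether to keep syncing, following the rule in the message.

    connected: replicas currently connected
    loading:   replicas that disconnected because they are still
               completing their initial configuration
    quorum:    number of replicas required to form a quorum
    """
    if connected + loading > quorum:
        return "keep syncing"
    return "stop syncing (orphan)"
```

      Counting loading replicas keeps the instance syncing while they finish
      starting up, whereas replicas lost to a network error are not counted.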
    • replication: use applier_state to check quorum · ca53ab91
      Konstantin Belyavskiy authored
      Small refactoring: remove 'enum replica_state' and instead reuse a
      subset of the applier state machine, 'enum applier_state', to check
      whether we have achieved replication quorum and hence can leave
      read-only mode.
      ca53ab91
    • replication: change default replication_connect_timeout to 30 seconds · 06a63686
      Konstantin Osipov authored
      The default of 4 seconds is too low to bootstrap a large cluster.
      06a63686
    • iproto: 'iproto_msg_max' -> 'net_msg_max' in message · 020fb77f
      Vladislav Shpilevoy authored
      Closes #3425
      020fb77f
  3. May 24, 2018
    • replication: add strict ordering for appliers operating in a full mesh · edd76a2a
      Georgy Kirichenko authored
      In some cases, when an applier yielded while processing a
      transaction, another applier could start a conflicting operation
      and break replication and database consistency.
      Now an applier locks a per-server-id latch before processing a
      transaction. This guarantees that at most one applier request per
      server is in progress at any given moment.
      
      The problem was very rare until full mesh topologies in vinyl
      became commonplace.
      
      Fixes gh-3339
      edd76a2a
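      The per-server-id latch can be illustrated with ordinary locks. This
      is a sketch under assumptions: Tarantool appliers are cooperative
      fibers, not OS threads, and the ApplierLatches class and
      apply_transaction function are hypothetical names used only here.

```python
import threading
from collections import defaultdict


class ApplierLatches:
    """One lock per originating server id (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._latches = defaultdict(threading.Lock)

    def latch_for(self, server_id):
        # Guard the dict so concurrent callers get the same lock object.
        with self._guard:
            return self._latches[server_id]


def apply_transaction(latches, server_id, rows, apply_row):
    # Hold the latch for the whole transaction, so a second applier
    # delivering changes that originate from the same server waits
    # until this one has finished.
    with latches.latch_for(server_id):
        for row in rows:
            apply_row(row)
```

      Transactions from different server ids take different latches and can
      still interleave; only requests for the same server are serialized.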
    • memtx: run garbage collection on demand · 39c8b526
      Vladimir Davydov authored
      When a memtx space is dropped or truncated, we delegate freeing tuples
      stored in it to a background fiber so as not to block the caller (and tx
      thread) for too long. Turns out it doesn't work out well for ephemeral
      spaces, which share the destruction code with normal spaces: the problem
      is the user might issue a lot of complex SQL SELECT statements that
      create a lot of ephemeral spaces and do not yield and hence don't give
      the garbage collection fiber a chance to clean up. There's a test that
      emulates this, 2.0:test/sql-tap/gh-3083-ephemeral-unref-tuples.test.lua.
      For this test to pass, let's run garbage collection procedure on demand,
      i.e. when any of memtx allocation functions fails to allocate memory.
      
      Follow-up #3408
      39c8b526
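      The on-demand idea can be sketched with a toy allocator: when an
      allocation does not fit, run garbage collection and retry instead of
      failing at once. The Arena class and its methods are hypothetical and
      model memory as plain byte counts, not memtx's real allocators.

```python
class Arena:
    """Toy fixed-size arena with a queue of garbage awaiting collection."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.garbage = []  # sizes of objects queued for background freeing

    def drop(self, size):
        # Dropping only queues the memory; it is not free yet.
        self.garbage.append(size)

    def run_gc_step(self):
        # Free one queued object; report whether anything was freed.
        if not self.garbage:
            return False
        self.used -= self.garbage.pop()
        return True

    def alloc(self, size):
        # On failure, run gc on demand and retry until gc has nothing left.
        while self.used + size > self.capacity:
            if not self.run_gc_step():
                raise MemoryError("arena exhausted")
        self.used += size
```

      A caller that never yields still gets its memory back, because the
      collection happens inside the failing allocation itself.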
    • memtx: rework background garbage collection procedure · cc0e5b4c
      Vladimir Davydov authored
      Currently, the engine has no control over yields issued during
      asynchronous index destruction. As a result, it can't force gc when
      there's not enough memory. To fix that, let's make gc callback stateful:
      now it's supposed to free some objects and return true if there's still
      more objects to free or false otherwise. Yields are now done by the
      memtx engine itself after each gc callback invocation.
      cc0e5b4c
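      A stateful gc callback of the kind described above might look as
      follows. The names make_index_gc and run_gc, the batch size, and the
      yield_fn parameter are assumptions made for this sketch; only the
      contract (free some objects, return true while more remain, let the
      engine yield between calls) comes from the commit message.

```python
def make_index_gc(tuples, batch=2):
    """Build a stateful gc callback for one dropped index (hypothetical)."""
    pending = list(tuples)

    def gc_step():
        # Free up to `batch` objects, then report whether more remain.
        del pending[:batch]
        return bool(pending)

    return gc_step


def run_gc(callbacks, yield_fn):
    """Engine-side loop: the engine, not the callback, decides when to yield."""
    for gc_step in callbacks:
        more = True
        while more:
            more = gc_step()
            yield_fn()
```

      Because yielding lives in the engine loop, the same callbacks can also
      be driven to completion without yielding when memory runs out.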
  4. May 22, 2018
  5. May 21, 2018
    • Remove unused FDGuard · f57fd113
      Vladislav Shpilevoy authored
      f57fd113
    • memtx: free tuples asynchronously when primary index is dropped · 2a1482f3
      Vladimir Davydov authored
      When a memtx space is dropped or truncated, we have to unreference all
      tuples stored in it. Currently, we do it synchronously, thus blocking
      the tx thread. If a space is big, tx thread may remain blocked for
      several seconds, which is unacceptable. This patch makes drop/truncate
      hand actual work to a background fiber.
      
      Before this patch, drop of a space with 10M 64-byte records took more
      than 0.5 seconds. After this patch, it takes less than 1 millisecond.
      
      Closes #3408
      2a1482f3
    • vinyl: implement index compact method · db9e214a
      Vladimir Davydov authored
      Force major compaction of all ranges when index.compact() is called.
      Note, the function only triggers compaction, it doesn't wait until
      compaction is complete.
      
      Closes #3139
      db9e214a
    • index: add compact method · 9abd0192
      Vladimir Davydov authored
      This patch adds index.compact() Lua method. The new method is backed by
      index_vtab::compact. Currently, it's a no-op for all kinds of indexes.
      It will be used by Vinyl engine in order to trigger major compaction.
      
      Part of #3139
      9abd0192
  6. May 19, 2018
    • replication: stability fix for test recover_missing_xlog · 73354bb7
      Konstantin Belyavskiy authored
      This test fails from time to time because the .xlog file may have
      a different number in its name (and using box.info.lsn is not an
      option here).
      Since this is a setup of two masters, there may be one or two xlogs
      in the folder, so first get a list of all matching files and then
      delete the last one.
      73354bb7
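      The approach of listing the matching files and deleting the last one
      can be sketched as below. The real test is written in Lua; this
      Python helper and its name are assumptions, and it relies on xlog
      file names being zero-padded numbers so that a plain sort puts the
      newest file last.

```python
import glob
import os


def remove_last_xlog(directory):
    """Delete the last .xlog in `directory`; return its path, or None."""
    # Zero-padded names sort lexicographically in numeric order.
    files = sorted(glob.glob(os.path.join(directory, "*.xlog")))
    if not files:
        return None
    os.remove(files[-1])
    return files[-1]
```

      The helper works whether the directory holds one xlog or two, which
      is the case the fix needs to cover.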
  7. May 18, 2018
  8. May 17, 2018
    • lua: introduce utf8 built-in globally visible module · a4f3fff8
      Vladislav Shpilevoy authored
      utf8 is a module partially compatible with the Lua 5.3 utf8 module
      and the third-party lua-utf8 module.
      Partially means that not all functions are implemented.
      
      The patch introduces these ones:
      upper, lower, len, char, sub, next.
      
      len and char work exactly as in Lua 5.3. The other functions work
      as in lua-utf8, because they are not present in Lua 5.3.
      
      Tarantool utf8 has extensions:
      
      * isupper/lower/alpha/digit, which check a property of a symbol
        given either the symbol itself or its code;
      
      * cmp/casecmp, which compare two UTF-8 strings.
      
      Closes #3290
      Closes #3385
      Closes #3081
      a4f3fff8
    • collation: introduce collation fingerprint · f3348764
      Vladislav Shpilevoy authored
      Collation fingerprint is a formatted string unique for a set
      of collation properties. Equal collations with different names
      have the same fingerprint.
      
      This new property is used to build collation fingerprint cache
      to use in Tarantool internals, where collation name does not
      matter.
      
      An insertion into the fingerprint cache can never conflict with
      or replace an existing entry. This means that, for example, the
      utf8 module created in this patchset can fill the collation cache
      with its own collations without affecting users or other modules.
      f3348764
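      A fingerprint-keyed cache can be sketched as follows. The property
      set (locale, strength, case sensitivity), the string layout, and the
      FingerprintCache class are hypothetical; the real fingerprint format
      is not given in the commit message.

```python
def coll_fingerprint(locale, strength, case_sensitive):
    """Format a (hypothetical) set of properties into one string."""
    return f"locale:{locale};strength:{strength};case:{int(case_sensitive)}"


class FingerprintCache:
    """Collations shared by fingerprint; names play no part in lookup."""

    def __init__(self):
        self._by_fingerprint = {}

    def get_or_create(self, locale, strength, case_sensitive):
        key = coll_fingerprint(locale, strength, case_sensitive)
        # Insertion never replaces: an equal collation is simply reused.
        return self._by_fingerprint.setdefault(
            key, {"fingerprint": key}
        )
```

      Two requests with equal properties receive the same object, so an
      internal user of the cache cannot displace an entry someone else holds.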
    • collation: split collation into coll and id objects · 97a6a4c5
      Vladislav Shpilevoy authored
      Issue #3290 exposed an important problem: Tarantool cannot
      create completely internal collations with no ID, name, or
      owner, intended for internal use only.
      
      The original struct coll cannot be used for this because
      * it has fields that are not needed internally;
      * the collation name is public and the collation cache uses
        it, so users would have to be forbidden from using some
        system names;
      * when multiple collations have the same comparator and differ
        only in their names/owners/IDs, separate UCollator objects
        are created, although it would be better to reference a
        single one.
      
      This patch renames coll to coll_id and coll_def to coll_id_def, and
      introduces coll, a pure collation object with no user-defined
      attributes.
      
      Needed for #3290.
      97a6a4c5