  1. May 30, 2018
    • Allow to increase box.cfg.vinyl_memory and memtx_memory at runtime · 30492862
      Vladimir Davydov authored
      The slab arena can grow dynamically, so all we need to do is increase the
      quota limit. Decreasing the limits is still explicitly prohibited,
      because the slab arena never unmaps slabs.
      
      Closes #2634
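
      For illustration, raising the limits on a running instance might look
      like this (a minimal sketch; the sizes are arbitrary):

          box.cfg{memtx_memory = 512 * 1024 * 1024} -- grow memtx quota to 512 MB
          box.cfg{vinyl_memory = 256 * 1024 * 1024} -- grow vinyl quota to 256 MB
          -- Decreasing a limit is still prohibited (the slab arena never unmaps
          -- slabs), so e.g. box.cfg{memtx_memory = 64 * 1024 * 1024} raises an error.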
    • vinyl: update recovery context with records written during recovery · d135f39c
      Vladimir Davydov authored
      During recovery, we may write VY_LOG_CREATE_LSM and VY_LOG_DROP_LSM
      records we failed to write before restart (because those records are
      written after WAL and hence may not make it to vylog). Right after
      recovery we invoke garbage collection to drop incomplete runs. Once
      VY_LOG_PREPARE_LSM record is introduced, we will also collect incomplete
      LSM trees there (those we failed to build). However, there may be LSM
      trees we managed to build but failed to write VY_LOG_CREATE_LSM for.
      This is OK as we will retry vylog write, but currently it isn't
      reflected in the recovery context used for garbage collection. To avoid
      purging such LSM trees, let's update the recovery context with records
      written during recovery.
      
      Needed for #1653
  2. May 29, 2018
  3. May 25, 2018
    • replication: display downstream status at upstream · 3db1dee9
      Konstantin Belyavskiy authored
      This fix improves the 'box.info.replication' output. If the downstream
      fails and thus disconnects from the upstream, improve logging by
      printing 'status: disconnected' and the error message on both sides
      (master and replica).
      
      Closes #3365
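
      A sketch of inspecting the new output on the master (the 'message'
      field is an assumption based on the description above):

          for id, r in pairs(box.info.replication) do
              if r.downstream ~= nil and r.downstream.status == 'disconnected' then
                  print(id, r.downstream.message) -- error message added by this fix
              end
          end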
    • replication: do not delete relay on applier disconnect · adc28591
      Konstantin Belyavskiy authored
      This is part of a more complex task aiming to improve logging.
      Do not destroy the relay, since it stores the last error, which can
      be useful for diagnostics.
      Now the relay is created together with the replica and always exists,
      so several NULL checks can be removed.
      Add relay_state { OFF, FOLLOW, STOPPED } to track replica presence:
      once connected, the relay is either FOLLOW or STOPPED until the
      master is reset.
      Updated with @kostja's proposal.
      
      Used for #3365.
    • vinyl: purge dropped indexes from vylog on garbage collection · a2d1d2a2
      Vladimir Davydov authored
      Currently, when an index is dropped, we remove all ranges/slices
      associated with it and mark all runs as dropped in vylog immediately.
      To find ranges/slices/runs, we use vy_lsm struct, see vy_log_lsm_prune.
      
      The problem is that the vy_lsm struct may be inconsistent with the state
      stored in vylog if an index drop races with compaction: we first write
      the changes made by a compaction task to vylog and only then update the
      vy_lsm struct, see vy_task_compact_complete. Since writing to vylog
      yields, this opens a time window during which the index can be dropped.
      If this happens, objects that were created by the compaction but haven't
      been logged yet (such as new runs, slices, and ranges) will be deleted
      from vylog by the index drop, which permanently breaks vylog and makes
      recovery impossible.
      
      To fix this issue, let's rework garbage collection of objects associated
      with dropped indexes as follows. Now when an index is dropped, we write
      a single record to vylog, VY_LOG_DROP_LSM, i.e. just mark the index as
      dropped without deleting associated objects. Actual index cleanup takes
      place in the garbage collection procedure, see vy_gc, which purges all
      ranges/slices linked to marked indexes from vylog and marks all their
      runs as dropped. When all runs are actually deleted from disk and
      "forgotten" in vylog, we remove the index record from vylog by writing
      VY_LOG_FORGET_LSM record. Since garbage collection procedure uses vylog
      itself instead of vy_lsm struct for iterating over vinyl objects, no
      race between index drop and dump/compaction can now lead to broken
      vylog.
      
      Closes #3416
    • vinyl: store lsn of index drop record in vylog · 264f7e3f
      Vladimir Davydov authored
      This is required to rework garbage collection in vinyl.
    • alter: pass lsn of index drop record to engine · 1af04afe
      Vladimir Davydov authored
      We pass the lsn of index alter/create records; let's pass the lsn of
      the drop record as well, for consistency. This is also needed by vinyl
      to store it in vylog (see the next patch).
    • vinyl: do not reuse lsm objects during recovery from vylog · 31ab8e03
      Vladimir Davydov authored
      If an index was dropped and then recreated, then while replaying vylog
      we reuse the vy_lsm_recovery_info object corresponding to it. There is
      no reason to do that instead of simply allocating a new object: the
      amount of memory saved is negligible, while the code looks more complex.
      Let's simplify the code: whenever we see VY_LOG_CREATE_LSM, create a
      new vy_lsm_recovery_info object and replace the old incarnation, if
      any, in the hash map.
    • test: update replication_connect_timeout in tests to a lower value · e9bf00fc
      Konstantin Osipov authored
      replication: make replication_connect_timeout dynamic
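
      Since the option is now dynamic, tests can lower it on a running
      instance, e.g. (the value is arbitrary):

          box.cfg{replication_connect_timeout = 0.5} -- previously only settable at bootstrap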
    • test: rework test case for memtx async garbage collection · c5f98b91
      Vladimir Davydov authored
      Do not use errinj as it is unreliable. Check that:
       - No memory is freed immediately after the space drop (WAL is off).
       - All memory is freed asynchronously after a yield.
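
      A rough sketch of the checked pattern (assumes a filled memtx space s
      and box.cfg{wal_mode = 'none'}; using box.slab.info() for the
      bookkeeping is an assumption):

          local fiber = require('fiber')
          local used = box.slab.info().items_used
          s:drop()
          -- right after the drop, items_used is still (nearly) unchanged:
          -- the tuples were only handed over to the background gc fiber
          fiber.sleep(0) -- yield so the gc fiber can run
          -- after one or more yields, items_used drops back down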
    • replication: fix log message in case of sync failure · 6c35bf9b
      Vladimir Davydov authored
      replicaset_sync() returns not only when the instance has synchronized
      with the connected replicas, but also when some replicas have
      disconnected and the quorum can no longer be formed. Nevertheless, it
      always prints that sync has been completed. Fix it.
      
      See #3422
    • replication: do not stop syncing if replicas are loading · 1785e79c
      Vladimir Davydov authored
      If a replica disconnects while sync is in progress, box.cfg{} may stop
      syncing, leaving the instance in 'orphan' mode. This will happen if not
      enough replicas are connected to form a quorum. This makes sense e.g.
      on a network error, but not when a replica is loading, because in the
      latter case it should be up and running quite soon. Let's account for
      replicas that disconnected because they haven't completed initial
      configuration yet, and continue syncing if connected + loading > quorum.
      
      Closes #3422
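
      For context, this affects configurations like the following sketch (the
      URIs are hypothetical): if a listed peer is merely still loading,
      box.cfg{} now keeps syncing instead of giving up on the quorum:

          box.cfg{
              replication = {'host1:3301', 'host2:3301'},
              replication_connect_quorum = 2,
          }
          -- box.info.status is left at 'orphan' only if the quorum
          -- really cannot be formed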
    • replication: use applier_state to check quorum · ca53ab91
      Konstantin Belyavskiy authored
      Small refactoring: remove 'enum replica_state' and instead reuse a
      subset of the applier state machine ('enum applier_state') to check
      whether we have achieved replication quorum and hence can leave
      read-only mode.
    • replication: change default replication_connect_timeout to 30 seconds · 06a63686
      Konstantin Osipov authored
      The default of 4 seconds is too low to bootstrap a large cluster.
    • iproto: 'iproto_msg_max' -> 'net_msg_max' in message · 020fb77f
      Vladislav Shpilevoy authored
      Closes #3425
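
      The option named in the corrected message (a sketch; the value is
      arbitrary):

          box.cfg{net_msg_max = 1024} -- the limit the error message refers to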
  4. May 24, 2018
    • replication: add strict ordering for appliers operating in a full mesh · edd76a2a
      Georgy Kirichenko authored
      In some cases, when applier processing yielded, another applier could
      start a conflicting operation and break replication and database
      consistency.
      Now an applier locks a per-server-id latch before processing a
      transaction. This guarantees that at any given moment there is only
      one applier request in progress for each server.

      The problem was very rare until full mesh topologies in vinyl
      became commonplace.
      
      Fixes gh-3339
    • memtx: run garbage collection on demand · 39c8b526
      Vladimir Davydov authored
      When a memtx space is dropped or truncated, we delegate freeing the
      tuples stored in it to a background fiber so as not to block the caller
      (and the tx thread) for too long. It turns out this doesn't work well
      for ephemeral spaces, which share the destruction code with normal
      spaces: the user might issue a lot of complex SQL SELECT statements
      that create many ephemeral spaces without yielding, and hence never
      give the garbage collection fiber a chance to clean up. There's a test
      that emulates this,
      2.0:test/sql-tap/gh-3083-ephemeral-unref-tuples.test.lua.
      For this test to pass, let's run the garbage collection procedure on
      demand, i.e. whenever a memtx allocation function fails to allocate
      memory.
      
      Follow-up #3408
    • memtx: rework background garbage collection procedure · cc0e5b4c
      Vladimir Davydov authored
      Currently, the engine has no control over yields issued during
      asynchronous index destruction. As a result, it can't force gc when
      there's not enough memory. To fix that, let's make the gc callback
      stateful: it is now supposed to free some objects and return true if
      there are more objects left to free, or false otherwise. Yields are
      now performed by the memtx engine itself after each gc callback
      invocation.
  5. May 22, 2018
  6. May 21, 2018
    • Remove unused FDGuard · f57fd113
      Vladislav Shpilevoy authored
    • memtx: free tuples asynchronously when primary index is dropped · 2a1482f3
      Vladimir Davydov authored
      When a memtx space is dropped or truncated, we have to unreference all
      tuples stored in it. Currently, we do this synchronously, thus blocking
      the tx thread. If a space is big, the tx thread may remain blocked for
      several seconds, which is unacceptable. This patch makes drop/truncate
      hand the actual work to a background fiber.
      
      Before this patch, drop of a space with 10M 64-byte records took more
      than 0.5 seconds. After this patch, it takes less than 1 millisecond.
      
      Closes #3408
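
      A sketch of measuring the effect (assumes a big memtx space s; clock
      is Tarantool's built-in module):

          local clock = require('clock')
          local t0 = clock.monotonic()
          s:drop()
          print(clock.monotonic() - t0) -- returns almost immediately; the
                                        -- tuples are unreferenced later by
                                        -- a background fiber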
    • vinyl: implement index compact method · db9e214a
      Vladimir Davydov authored
      Force major compaction of all ranges when index.compact() is called.
      Note that the function only triggers compaction; it does not wait
      until compaction is complete.
      
      Closes #3139
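
      Usage sketch (the space and index names are hypothetical):

          box.space.test.index.primary:compact() -- schedules major compaction
                                                 -- of all ranges and returns
                                                 -- without waiting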
    • index: add compact method · 9abd0192
      Vladimir Davydov authored
      This patch adds the index.compact() Lua method. The new method is
      backed by index_vtab::compact. Currently, it's a no-op for all kinds
      of indexes. It will be used by the Vinyl engine to trigger major
      compaction.
      
      Part of #3139
  7. May 19, 2018
    • replication: stability fix for test recover_missing_xlog · 73354bb7
      Konstantin Belyavskiy authored
      This test fails from time to time because an .xlog file may have a
      different number in its name (and using box.info.lsn is not an option
      here).
      Since the setup consists of two masters, there may be one or two xlogs
      in the folder, so first get a list of all matching files and then
      delete the last one.
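
      The approach, sketched with Tarantool's built-in fio module (the
      wal_dir-based path is an assumption):

          local fio = require('fio')
          local xlogs = fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog'))
          table.sort(xlogs)
          fio.unlink(xlogs[#xlogs]) -- delete the newest xlog, whatever its number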
  8. May 18, 2018