  1. Mar 21, 2017
    • box: minor cleanups · 37534436
      Konstantin Osipov authored
      * reduce the scope of wal_stream in box_cfg_xc()
      * remove unused start_offset/end_offset from wal_request
      * remove unused rmean_wal_tx_bus
    • recovery: avoid using the global recovery->vclock · 010662df
      Konstantin Osipov authored
      Do not use recovery->vclock in lua/info.cc and for truncate.
      
      Maintenance of recovery->vclock after initial recovery is finished
      is an artefact of the Tarantool 1.5 architecture and will be removed
      in the future.
    • vinyl: start scheduler for remote recovery · e0dc453f
      Vladimir Davydov authored
      The amount of data sent when bootstrapping a replica is limited only by
      the master's disk size, which can exceed the size of available memory by
      orders of magnitude. So we need the scheduler to be up and running while
      bootstrapping from the remote host so that it could schedule dumps when
      the quota limit is hit.
    • vinyl: rework replication · 2993a758
      Vladimir Davydov authored
      Currently, on initial join we send the current vinyl state. To do that,
      we open a read iterator over a space's primary index and send statements
      returned by it. Such an approach has a number of inherent problems:
      
       - An open read iterator blocks compaction, which is unacceptable for
         such a long operation as join. To avoid blocking compaction, we open
         the iterator in the dirty mode, i.e. it skims over the tops. This,
         however, introduces a different kind of problem: it makes the
         boundary between the initial and final join phases hazy - statements
         sent on final join may or may not be among those sent during the
         initial join, and there's no efficient way to differentiate between
         them w/o sending extra information.
      
       - The replica expects LSNs to be growing monotonically. This constraint
         is imposed by the lsregion allocator used for storing statements in
         memory, but read iterator returns statements ordered by key, not by
         LSN. Currently, the replica simply crashes if statements happen to be
         sent in an order different from chronological, which renders vinyl
         replication unusable. In the scope of the current model, we can't fix
         this by assigning fake LSNs to statements received on initial join,
         because there's no strict LSN threshold between initial and final
         join phases (see the previous paragraph).
      
       - In the initial join phase, the replica is only aware of spaces that were
         created before the last snapshot, while vinyl sends statements from
         spaces that exist now. As a result, if a space was created after the
         most recent snapshot, the replica won't be able to receive its tuples
         and will fail.
      
      To address the above-mentioned problems, we make vinyl initial join send
      the latest snapshot, just like in the case of memtx. We implement this by
      loading the vinyl state from the last snapshot of the metadata log and
      sending statements of all runs from the snapshot as is (including
      deletes and updates), to be applied by the replica. To make lsregion at
      the receiving end happy, we assign fake monotonically growing LSNs to
      statements received on initial join. This is OK, because
      
        any LSN from final join > max real LSN from initial join
        max real LSN from initial join >= max fake LSN
      
      hence
      
        any LSN from final join > any fake LSN from initial join
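
      A minimal Lua sketch of the fake-LSN scheme (illustrative only; the
      actual implementation lives in C):

        -- Assign fake, monotonically growing LSNs to statements
        -- received on initial join, as the lsregion allocator requires.
        local fake_lsn = 0
        local function assign_fake_lsn(stmt)
            fake_lsn = fake_lsn + 1
            stmt.lsn = fake_lsn
            return stmt
        end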
      
      Besides fixing vinyl replication, this patch also enables the
      replication test suite for the vinyl engine (except for hot_standby)
      and makes engine/replica_join cover the following test cases:
       - secondary indexes
       - delete and update statements
       - keys added in an order different from LSN
       - recreate space after checkpoint
      
      Closes #1911
      Closes #2001
    • txn: remove recovery vclock promotion from recovery_fill_lsn() · ac87c66b
      Konstantin Osipov authored
      * promote recovery vclock in recover_xlog(), after we apply
        the recovered row
      * remove unnecessary promotions from WAL and relays: they
        should not use recovery vclock going forward.
      * update the format specifier for error code ER_UNKNOWN_REPLICA
        to expect a string rather than an integer, since it's passed
        a string for the replica id, not an integer
      * remove unused code in relay.cc
    • recovery: move check for alien xrows to applier · 7bdaff31
      Konstantin Osipov authored
      Assume we can trust everything we read from the local
      recovery - the xlog signatures and checksums are on guard for this.
    • Implement space:bsize(). Issue #2043. · 7a2bb91b
      Roman Tokarev authored
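      Hypothetical usage in Lua (the space and data are examples; bsize()
      returns the size, in bytes, of the tuples stored in the space):

        local s = box.schema.space.create('test')
        s:create_index('pk')
        s:replace{1, 'abc'}
        s:bsize() -- total size of the space's tuples, in bytes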
    • info: don't use wal_checkpoint() to display the current vclock · b44c1fd9
      Konstantin Osipov authored
      Using wal_checkpoint() incurs extra, unnecessary overhead in the
      tx thread. Monitoring inquiries are supposed to be quick and to
      incur close to no overhead on the tx thread.
      
      The state of replicaset vclock is good enough for monitoring.
      
      Revert the test results broken by f2bccc18 by @rtsisyk.
      The original results were correct, as indicated by the comment
      left unchanged by f2bccc18.
    • 8ff0a6a8
      Konstantin Osipov authored
    • wal: initialize row timestamp in wal_request_write(). · fa148512
      Konstantin Osipov authored
      Initialize the row timestamp before writing the row, in
      wal_request_write(). The row timestamp does not bear any semantics;
      it's for informational purposes only. Initialize it in the WAL
      thread. This fixes timestamps of the xctl file as well, which were
      up until now left uninitialized.
      
      The patch prepares for removal of recovery_fill_lsn().
    • recovery: remove recovery.h from applier.cc · 38313524
      Konstantin Osipov authored
      Use replicaset_vclock in SUBSCRIBE. Ensure it is correctly initialized
      after local recovery, so that it contains the LSNs of remote
      servers even after their rows were saved in the local WAL and
      recovered from it.
  2. Mar 17, 2017
    • xctl: fix xlog fd leak on rotation · 81405c33
      Vladimir Davydov authored
      Follow up 5102bfbc
    • Fix flaky app/ipc.test.lua · 0d5b047c
      Roman Tsisyk authored
      Closes #1845
    • Add missing replication/prune.result · 5cc226ce
      Roman Tsisyk authored
      Follow up f232756b
    • box: fix LSN assigned on final join · d45bcc5a
      Vladimir Davydov authored
      The number of rows sent during initial join may be greater than the LSN
      of the checkpoint sent by the master, because there are rows that do not
      contribute to LSN (system spaces, etc). If this happens, LSNs assigned
      on final join will be greater than LSNs assigned after bootstrapping is
      complete, which breaks Vinyl logic. Fix that by resetting recovery
      vclock to the checkpoint LSN before getting to final join.
    • Consolidate garbage collection · 666c0337
      Vladimir Davydov authored
      Currently, old snapshots and xlogs are deleted by the snapshot daemon
      while vinyl files are removed from engine_commit_checkpoint().
      For the sake of backups and replication, which need to temporarily
      disable garbage collection, we should bring all garbage collection
      routines together in one place. That's why this patch introduces
      the box.internal.gc() method, which takes the LSN of the latest
      snapshot to keep as its argument. When called, it deletes all xlog
      files as well as engine-specific files (memtx snapshots, vinyl runs)
      that are not required to recover from a snapshot with an LSN greater
      than or equal to the given one. For removal of engine-specific files,
      a new engine callback is introduced, Engine::collectGarbage. The
      snapshot daemon now calls box.internal.gc() to clean up instead of
      deleting snap and xlog files by itself.
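
      An illustrative call in Lua (the LSN value is an example):

        -- Delete everything not needed to recover from a snapshot
        -- with an LSN greater than or equal to snap_lsn.
        local snap_lsn = 12345 -- example: LSN of the latest snapshot to keep
        box.internal.gc(snap_lsn)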
    • xctl: separate gc from metadata log rotation · 6c5329af
      Vladimir Davydov authored
      This patch is a preparation for centralized garbage collection. It
      extracts the code doing garbage collection from xctl_rotate() and
      places it in a separate function, xctl_collect_garbage(). The latter
      takes a signature that determines the minimal age an object must have
      to be deleted: the function only removes files left over from objects
      that were deleted before the log received the given signature. This
      is needed to make vinyl respect box.cfg.snapshot_count.
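
      A Lua model of the predicate (field names are made up; the real
      code is C):

        -- An object's files may be collected only if the object was
        -- deleted before the log received the given signature.
        local function can_collect(object, gc_signature)
            return object.is_deleted and object.delete_signature < gc_signature
        end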
    • vinyl: make snapshot consistent · bc6457a1
      Vladimir Davydov authored
      A set of run files created by a snapshot is inconsistent: without
      replaying xlog, it is not guaranteed to contain the database state
      that existed when the snapshot was taken. This is because we dump all
      ranges independently and each range as a whole, so that if a statement
      happens to be inserted into a range after the snapshot was started and before
      the range is dumped, it will be included in the dump. This peculiarity
      stands in the way of backups and replication, both of which require a
      consistent database state.
      
      To make the snapshot consistent, let's force rotation of all in-memory
      trees on snapshot and make the dump task dump only the trees that need
      to be snapshotted while a snapshot is in progress. The rotation is done
      lazily, on insertion to the tree, similarly to how we handle DDL. The
      difference is that instead of sc_version we check vy_mem->min_lsn
      against checkpoint_lsn.
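
      A Lua model of the lazy rotation check on insertion (field names
      follow the message; the logic is a sketch, not the real C code):

        local function mem_needs_rotation(mem, sc_version, checkpoint_lsn)
            if mem.sc_version ~= sc_version then
                return true -- DDL: schema changed since the mem was created
            end
            -- snapshot: the mem holds statements older than the checkpoint
            return checkpoint_lsn ~= nil and mem.min_lsn <= checkpoint_lsn
        end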
    • vinyl: remember which mems are dumped by lsn · c1b25490
      Vladimir Davydov authored
      Since range->mem can be rotated while dump is in progress, we have to
      remember which mems we are dumping. Commit 818208c4 ("vinyl: fix
      unwritten mem dropped if ddl") does this by remembering the number of
      frozen mems at the time of dump preparation on the dump task. Currently,
      this works fine, because we always dump all frozen mems. However, this
      condition won't hold when consistent snapshot is introduced. The point
      is that in order to make snapshot consistent, we need to dump only
      in-memory trees which were created before WAL checkpoint during
      snapshot. These mems are not even guaranteed to be at the end of the
      range->frozen list because of range coalescing. So in this patch we use
      the current LSN to remember which mems are going to be dumped: all mems
      created after the dump task was created will have min_lsn > the LSN of
      task creation, so on task completion we should delete only mems with
      min_lsn <= that LSN.
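
      A Lua model of the completion step (illustrative names):

        -- On dump completion, drop only the mems covered by the task
        -- (min_lsn <= the LSN taken at task creation); keep newer ones.
        local function release_dumped_mems(frozen, dump_lsn)
            local kept = {}
            for _, mem in ipairs(frozen) do
                if mem.min_lsn > dump_lsn then
                    table.insert(kept, mem)
                end
            end
            return kept
        end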
    • vinyl: make sure all statements with LSN <= snapshot LSN are dumped · db180cde
      Vladimir Davydov authored
      In contrast to the memtx engine, which populates in-memory trees from
      Engine::prepare(), in the case of Vinyl statements are inserted into
      in-memory trees after the WAL write, from the Engine::commit() callback.
      Therefore, to make sure all statements inserted before snapshot are
      dumped, we must initiate checkpoint after WAL rotation. Currently, it is
      not true - checkpoint is initiated from Engine::beginCheckpoint(). To
      make Vinyl snapshots consistent (not requiring xlog replay), we have to
      fix that, so introduce a new callback, Engine::prepareWaitCheckpoint(),
      which is called right after WAL rotation, and trigger Vinyl checkpoint
      from it.
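
      An illustrative Lua model of the resulting sequence (the real
      callbacks are C++ engine methods; only beginCheckpoint and
      prepareWaitCheckpoint come from this message):

        local function checkpoint(engines, wal)
            for _, e in ipairs(engines) do e:beginCheckpoint() end
            wal:rotate() -- rows committed before this point are in memory
            -- vinyl triggers its checkpoint here, right after WAL rotation:
            for _, e in ipairs(engines) do e:prepareWaitCheckpoint() end
        end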
    • vinyl: zap vy_mem_update_formats() + cleanup · 95b9af65
      Vladimir Davydov authored
      vy_mem_update_formats() is used to update mem formats when mem rotation
      is skipped, because the mem is empty. This doesn't work as expected,
      because vy_mem_update_formats() does not update mem->sc_version, so that
      in case of ddl the next insertion will rotate it anyway. Instead of
      updating sc_version in vy_mem_update_formats(), let's fix this by
      zapping the helper altogether and simply recreating mem - it isn't a big
      deal, because this does not happen often. While we are at it, let's
      also:
       - reorder arguments of vy_mem_new() to keep key_def close to format
       - remove extra arguments of vy_range_rotate_mem() - we can get all
         of them right there (in fact we already do in case of ->format).
    • vinyl: add frozen mems of splitting range to read iterator · ee447dd0
      Vladimir Davydov authored
      Currently, if the range is splitting, we only add active in-memory
      indexes of the resulting ranges to the read iterator, see
      vy_read_iterator_add_mem(). This is because until recently a mem could
      only be frozen on dump/compaction task preparation, which is disabled
      while split is in progress. However, this is no longer true - a mem can
      be rotated on txn_commit() in case of DDL, hence we must always add all
      in-memory indexes, including frozen ones, when opening a read iterator.
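
      A Lua sketch of the fixed behavior (illustrative field names):

        -- When opening a read iterator over a range, add the active
        -- in-memory index and every frozen one.
        local function read_iterator_add_mems(itr, range)
            itr:add(range.mem)
            for _, mem in ipairs(range.frozen) do
                itr:add(mem)
            end
        end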