  1. Jul 08, 2017
    • Altering a space takes effect immediately. · 8631ffb2
      Georgy Kirichenko authored
      Before this patch, the new space created by an alter
      specification was put into the space cache only after a
      successful WAL write.

      This behaviour is not linearizable: on a replica, the WAL is
      replayed sequentially, and the order of events could differ
      from that on the master.

      Besides, it could crash, as demonstrated by the gh-2074 test
      case.

      Since we use a cascading rollback for all transactions on a WAL
      write error, it's OK to put the space into the space cache
      before the WAL write, so that new transactions apply to the new
      space.

      This patch does exactly that: all subsequent requests are
      executed against the new space.

      It also removes the on_replace trigger on the old space, since
      all checks against the new tuple format are performed using the
      new space.
      
      Fixes #2074.
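
      A minimal sketch of the new ordering, with illustrative
      stand-ins for the real space cache and WAL calls (none of the
      signatures below are the actual Tarantool API):

        #include <stdbool.h>

        struct space;

        /* Hypothetical helpers: install a space in the cache and
         * return the old one; write the alter record to the WAL. */
        struct space *space_cache_replace(struct space *new_space);
        bool wal_write_alter_record(void);
        void rollback_all_transactions(void); /* cascading rollback */

        static void alter_space_do(struct space *new_space)
        {
            /* Install the new space immediately, so that all
             * subsequent requests are executed against it. */
            space_cache_replace(new_space);
            if (!wal_write_alter_record()) {
                /* A WAL error triggers a cascading rollback of all
                 * transactions, which also undoes the cache
                 * replacement, so installing early is safe. */
                rollback_all_transactions();
            }
        }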
  2. Jul 06, 2017
    • alter: MoveIndex review fixes · 7b1028a4
      Konstantin Osipov authored
      * update comments
      * add a test case for altering a primary key on the fly
      * rename AddIndex to CreateIndex
      * factor out common code into a function
    • alter: introduce MoveIndex and RebuildIndex operations · 8c854d7f
      Georgy Kirichenko authored
      The MoveIndex operation is used to move an existing index from
      the old space to the new one. Semantically it's a no-op.

      RebuildIndex is introduced for the case when essential index
      properties are changed, making it necessary to drop the old
      index and create a new one in its place in the new space.

      AlterSpaceOp::prepare() is removed: all checks are moved from it
      to the on_replace trigger on the _index system space, so they
      are done before any alter operation is created.
      
      Necessary for gh-2074 and gh-1796.
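
      A sketch of the two operations' semantics on toy types (not the
      real AlterSpaceOp machinery):

        struct index;
        struct index_def;
        struct space { struct index *index[16]; }; /* toy fixed array */

        struct index *index_new(const struct index_def *def);
        void index_delete(struct index *index);

        /* MoveIndex: the old index object is reused by the new space;
         * no data is touched, so semantically this is a no-op. */
        static void move_index(struct space *old_space,
                               struct space *new_space, unsigned iid)
        {
            new_space->index[iid] = old_space->index[iid];
            old_space->index[iid] = NULL;
        }

        /* RebuildIndex: essential properties changed, so the old index
         * is dropped and a new one is built in its place. */
        static void rebuild_index(struct space *old_space,
                                  struct space *new_space, unsigned iid,
                                  const struct index_def *new_def)
        {
            index_delete(old_space->index[iid]);
            old_space->index[iid] = NULL;
            new_space->index[iid] = index_new(new_def);
            /* ...data is then re-inserted into the new index... */
        }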
    • alter: move modify vs rebuild index check · 4c87d443
      alyapunov authored
      Move the check which decides on an alter strategy whenever a row
      in the _index space is changed from AlterSpaceOp::prepare() to
      the on_replace trigger on the _index space.

      The check chooses between two options: a heavy-weight index
      rebuild, invoked when the index definition, such as key parts,
      is changed, vs. a lightweight modify, invoked when the index
      name or minor options are modified.

      Before this patch, index alteration created a pair of operations
      (DropIndex + AddIndex) in all cases, but later replaced the two
      operations with one at the AlterSpaceOp::prepare() phase.

      This is bad for several reasons:

      - it's done while traversing a linked list of operations, and it
        changes the list being traversed;

      - a particular order in the list of operations is required for
        this to work: drop must precede add;

      - needless allocation and deallocation of operations makes the
        logic unnecessarily complex.
      
      Necessary for gh-1796.
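
      A sketch of the check itself, assuming a hypothetical key_def
      comparison helper:

        #include <stdbool.h>

        struct key_def;
        struct index_def {
            const char *name;
            struct key_def *key_def;
            /* minor options omitted */
        };

        bool key_def_is_equal(const struct key_def *a,
                              const struct key_def *b);

        /* Decided once, in the on_replace trigger on _index, before
         * any alter operation is created. */
        static bool index_change_needs_rebuild(const struct index_def *old_def,
                                               const struct index_def *new_def)
        {
            /* Key parts changed => heavy-weight rebuild; a change of
             * name or minor options only => lightweight modify. */
            return !key_def_is_equal(old_def->key_def, new_def->key_def);
        }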
    • space: create the primary key index first. · 25a99f42
      Konstantin Osipov authored
      Always create the primary key index in a space first. Put the
      primary key's key_def first in the array of key_defs passed
      into tuple_format_new(). This is necessary for gh-1796.
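
      A sketch of the reordering step (key_is_primary() is a
      hypothetical predicate, not the real API):

        #include <stdbool.h>

        struct key_def;
        bool key_is_primary(const struct key_def *def);

        /* Make sure the primary key def comes first in the array that
         * will be passed to tuple_format_new(). */
        static void put_pk_first(struct key_def **keys, unsigned key_count)
        {
            for (unsigned i = 1; i < key_count; i++) {
                if (key_is_primary(keys[i])) {
                    struct key_def *tmp = keys[0];
                    keys[0] = keys[i];
                    keys[i] = tmp;
                    break;
                }
            }
        }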
    • space: add an assert on the state of the space object · 5bca3f7c
      Konstantin Osipov authored
      Assert that we can't create a space with a secondary key but no
      primary key.
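
      The invariant, roughly (the struct below is a toy, not the real
      struct space):

        #include <assert.h>
        #include <stdbool.h>

        struct space {
            unsigned index_count;
            bool has_primary; /* an index with iid == 0 exists */
        };

        static void space_check(const struct space *space)
        {
            /* A space may have no indexes at all, but it can't have
             * secondary keys without a primary one. */
            assert(space->index_count == 0 || space->has_primary);
        }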
    • Move the drop primary index check to the stage before new space creation · f6d06b9f
      alyapunov authored
      Currently, the drop primary index checks are made in alter
      triggers after the new space is created. Such an implementation
      leads to the temporary creation of a space with an invalid index
      set. Fix it by checking the index set before the space_new()
      call.
    • Add swap_index_def function · 8ec3e3ed
      Georgy Kirichenko authored
      A non-destructive swap_index_def function should be used because
      memtx_tree stores a pointer to the index_def used during tree
      creation.

      Fixes #2570
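
      A sketch of why the swap has to be non-destructive: contents are
      exchanged in place, so the index_def pointer stored inside
      memtx_tree stays valid (simplified struct):

        struct key_def;
        struct index_def {
            char *name;
            struct key_def *key_def;
        };

        /* Swap the contents, not the pointers: anyone holding a
         * pointer to either struct keeps seeing a valid definition. */
        static void swap_index_def(struct index_def *a, struct index_def *b)
        {
            struct index_def tmp = *a;
            *a = *b;
            *b = tmp;
        }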
  3. Jul 04, 2017
    • vinyl: fix snapshot consistency · 4586f2d6
      Vladimir Davydov authored
      Commit dbfd515f ("vinyl: fix crash if snapshot is called while dump
      is in progress") introduced a bug that can result in statements
      inserted after a WAL checkpoint being included in a snapshot. This
      happens because vy_begin_checkpoint() doesn't force rotation of
      in-memory trees anymore: it bumps checkpoint_generation, but
      doesn't touch scheduler->generation, which is used to trigger
      in-memory tree rotation.

      To fix this issue, this patch zaps scheduler->checkpoint_generation
      and makes vy_begin_checkpoint() bump scheduler->generation directly,
      as it used to. To guarantee dump consistency (the issue fixed by
      commit dbfd515f), scheduler->dump_generation is introduced - it
      defines the generation of the in-memory data that are currently
      being dumped. The scheduler won't start dumping newer trees until
      all trees whose generation equals dump_generation have been dumped.
      The counter is only bumped by the scheduler itself, when all old
      in-memory trees have been dumped. Together, this guarantees that
      each dump contains data of the same generation, i.e. is consistent.
      
      While we are at it, let's also remove vy_scheduler->dump_fifo, the
      list of all in-memory trees sorted in chronological order. The
      scheduler uses it to keep track of the oldest in-memory tree, which
      is needed to invoke lsregion_gc(). However, since we no longer
      remove indexes from the dump_heap, as we did until recently, we can
      use the heap for this. The only problem is that indexes which are
      currently being dumped are moved off the top of the heap, but we
      can detect this case by maintaining a counter of dump tasks in
      progress: if dump_task_count is > 0 when a dump task is completed,
      we must not call lsregion_gc(), irrespective of the generation of
      the index at the top of the heap. A good thing about getting rid of
      vy_scheduler->dump_fifo is that it is a step towards making
      vy_index independent of vy_scheduler, so that it can be moved to a
      separate source file.
      
      Closes #2541
      Needed for #1906
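
      A sketch of the gating logic, with toy fields standing in for
      the real scheduler state:

        #include <stdbool.h>
        #include <stdint.h>

        struct vy_scheduler {
            int64_t generation;      /* current generation */
            int64_t dump_generation; /* generation being dumped */
            int dump_task_count;     /* dump tasks in flight */
        };

        struct vy_mem { int64_t generation; };

        /* Only trees of the generation currently being dumped may be
         * scheduled; newer trees wait their turn. */
        static bool may_dump(const struct vy_scheduler *s,
                             const struct vy_mem *mem)
        {
            return mem->generation <= s->dump_generation;
        }

        /* dump_generation is bumped by the scheduler itself, once all
         * trees of the old generation have been dumped. */
        static void on_dump_complete(struct vy_scheduler *s)
        {
            if (s->dump_task_count == 0 &&
                s->dump_generation < s->generation)
                s->dump_generation++;
        }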
    • vinyl: zap vy_index->generation · 199836db
      Vladimir Davydov authored
      vy_index->generation equals the generation of the oldest in-memory
      tree, which can be looked up efficiently since the vy_index->sealed
      list is sorted by generation. So let's zap the member and add a
      vy_index_generation() function instead.
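
      A sketch of the accessor (the list layout here is illustrative):

        #include <stddef.h>
        #include <stdint.h>

        struct vy_mem { int64_t generation; };

        struct vy_index {
            struct vy_mem *mem;           /* active in-memory tree */
            struct vy_mem *sealed_oldest; /* tail of the sealed list,
                                             which is sorted by
                                             generation */
        };

        static int64_t vy_index_generation(const struct vy_index *index)
        {
            const struct vy_mem *oldest = index->sealed_oldest != NULL ?
                index->sealed_oldest : index->mem;
            return oldest->generation;
        }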
    • vinyl: move vy_range to its own source file · 448b643e
      Vladimir Davydov authored
      Needed for #1906
    • Move iterator_type to its own source file · 63fe5e6d
      Vladimir Davydov authored
      Including index.h just for the sake of iterator_type, as we do in
      vy_run.h and vy_mem.h, is a bit of overkill. Let's move its definition
      to a separate source file, iterator_type.h.
    • vinyl: remove dependency of vy_range on vy_index · e0dae391
      Vladimir Davydov authored
       - Replace vy_range->index with key_def.
       - Replace vy_range_iterator->index with vy_range_tree_t.
      
      Needed for #1906
    • vinyl: store indexes in scheduler->compact_heap · 47c3aa20
      Vladimir Davydov authored
      The compact_heap, used by the scheduler to schedule range
      compaction, contains all ranges except those that are currently
      being compacted. Since the appropriate vy_index object is required
      to schedule a range compaction, we have to store a pointer to the
      index a range belongs to in vy_range->index. This makes it
      impossible to move the vy_range struct and its implementation to a
      separate source file.
      
      To address this, let's rework the scheduler as follows:
      
       - Make compact_heap store indexes, not ranges. An index is prioritized
         by the greatest compact_priority among its ranges.
      
       - Add a heap of ranges to each index, prioritized by compact_priority.
         A range is removed from the heap while it's being compacted.
      
       - Do not remove indexes from dump_heap or compact_heap when a task is
         scheduled (otherwise we could only schedule one compaction per
         index). Instead just update the index position in the heaps.
      
      Needed for #1906
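
      A sketch of the new prioritization (a toy array-backed heap in
      place of Tarantool's heap implementation):

        #include <stdint.h>

        struct vy_range { int64_t compact_priority; };

        struct vy_index {
            /* Heap of the index's ranges, prioritized by
             * compact_priority; a range is removed from it while it
             * is being compacted. */
            struct vy_range **range_heap;
            int range_count;
        };

        /* The index's priority in scheduler->compact_heap is the
         * greatest compact_priority among its ranges, i.e. the top of
         * its own range heap. */
        static int64_t index_compact_priority(const struct vy_index *index)
        {
            return index->range_count > 0 ?
                index->range_heap[0]->compact_priority : 0;
        }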
  4. Jul 03, 2017
    • replica: advance gc state only when xlog is closed · ba09475f
      Vladimir Davydov authored
      Advancing replica->gc on every status update is inefficient, as gc
      can only be invoked when we move on to the next xlog file.
      Currently this is acceptable, because status is only updated once
      per second, but there's no guarantee that it won't be updated, say,
      every millisecond in the future, in which case advancing
      replica->gc on every status update may become too costly.

      So introduce a trigger invoked every time an xlog is closed by
      recover_remaining_wals(), and use it in the relay to send a
      special gc message.
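
      A sketch of the trigger-based flow (the registration function
      and the relay callback below are hypothetical names, not the
      real recovery/relay interfaces):

        struct trigger;
        struct recovery;

        /* Hypothetical: register a trigger fired every time
         * recover_remaining_wals() closes an xlog file. */
        void recovery_on_close_log(struct recovery *r, struct trigger *t);

        /* Relay-side handler: instead of advancing replica->gc on
         * every status update, send one special gc message per
         * closed xlog. */
        static void relay_on_close_log(struct trigger *t, void *event)
        {
            (void)t;
            (void)event;
            /* relay_send_gc_message(...); */
        }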
    • cbus: fix cbus_endpoint_destroy loop exit condition · f986bc96
      Vladimir Davydov authored
      ipc_cond_wait() always returns 0, so the body of the loop waiting for
      the endpoint to be ready for destruction is only invoked once.
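
      The shape of the bug, roughly (illustrative, not the exact
      code): a loop keyed on the return value of ipc_cond_wait() runs
      its body exactly once, because the function always returns 0;
      the exit condition has to be the endpoint state itself:

        struct ipc_cond;
        int ipc_cond_wait(struct ipc_cond *cond); /* always returns 0 */

        struct cbus_endpoint {
            struct ipc_cond *cond;
            int n_pending; /* messages not yet processed */
        };

        static void endpoint_wait_ready_to_destroy(struct cbus_endpoint *e)
        {
            while (e->n_pending > 0)
                ipc_cond_wait(e->cond);
        }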
    • relay: fix potential thread hang on exit · b49bb458
      Vladimir Davydov authored
      To make sure there is no status message pending in the tx pipe,
      relay_cbus_detach() waits on relay->status_cond before proceeding
      to relay destruction. The status_cond is signaled by the status
      message completion routine (relay_status_update()), handled by
      cbus on the relay's side. The problem is that by the time we call
      relay_cbus_detach(), the cbus loop has been stopped (see
      relay_subscribe_f()), i.e. there's no one to process the message
      that is supposed to signal status_cond. That means that if there
      happens to be a status message en route when the relay is stopped,
      the relay thread will hang forever.

      To fix this issue, let's introduce a new helper function,
      cbus_flush(), which blocks the caller until all cbus messages
      queued on a pipe have been processed, and use it in
      relay_cbus_detach() to wait for in-flight status messages to
      complete. Apart from the source and destination pipes, this new
      function takes a callback to be used for processing incoming cbus
      messages, so it can be used even if the loop that is supposed to
      invoke cbus_process() has stopped.
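
      A sketch of the idea behind cbus_flush() on toy types: push a
      barrier message down the pipe and process incoming messages
      ourselves, with the caller-supplied callback, until the barrier
      completes (none of these signatures are the real cbus API):

        #include <stdbool.h>

        struct cpipe;
        struct cmsg;

        void cpipe_push(struct cpipe *pipe, struct cmsg *msg);
        bool barrier_completed(const struct cmsg *barrier);
        void process_one_incoming(struct cpipe *from,
                                  void (*cb)(struct cmsg *));

        /* Blocks until everything queued on dest ahead of the barrier
         * has been processed; safe to call after the regular
         * cbus_process() loop has stopped. */
        static void flush_pipe(struct cpipe *dest, struct cpipe *src,
                               struct cmsg *barrier,
                               void (*cb)(struct cmsg *))
        {
            cpipe_push(dest, barrier);
            while (!barrier_completed(barrier))
                process_one_incoming(src, cb);
        }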
    • recovery: refactor recover_remaining_wals · 2fbafbf5
      Vladimir Davydov authored
       - Fold in the wal dir scan. It's pretty easy to detect if we need
         to rescan the wal dir - we do iff the current wal is closed (if
         it isn't, we need to recover it to the end first) - so there's
         no point in keeping the scan apart.

       - Close the last recovered wal on eof. We used to keep it open to
         avoid rereading it in case recover_remaining_wals() was called
         again before a new wal was added to the wal dir. Instead, we
         can detect this case by checking whether the signature of the
         last wal stored in the wal dir has increased after rescanning
         the dir.

       - Don't abort recovery and print the 'xlog is deleted under our
         feet' message if the current wal file is removed. This is
         pointless, really - it's OK to remove an open file in Unix.
         Besides, the check for a deleted file is only relevant if the
         wal dir has been rescanned, which is only done when we proceed
         to the next wal, i.e. it doesn't really detect anything.
      
      A good side effect of this rework is that now we can invoke the
      garbage collector right from recovery_close_log().
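
      A sketch of the rescan and reread-detection logic (toy fields,
      not the real recovery struct):

        #include <stdbool.h>
        #include <stdint.h>

        struct recovery {
            bool cursor_is_open;    /* current wal is being recovered */
            int64_t last_signature; /* greatest wal signature seen */
        };

        int64_t wal_dir_greatest_signature(const struct recovery *r);

        /* Rescan iff the current wal is closed; an open wal has to be
         * recovered to the end first. */
        static bool need_rescan(const struct recovery *r)
        {
            return !r->cursor_is_open;
        }

        /* A new wal has been added iff the rescan shows that the
         * greatest signature in the dir has increased; otherwise
         * already-recovered files need not be reopened. */
        static bool wal_dir_has_grown(struct recovery *r)
        {
            int64_t sig = wal_dir_greatest_signature(r);
            bool grown = sig > r->last_signature;
            r->last_signature = sig;
            return grown;
        }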
  5. Jun 30, 2017
    • vinyl: point iterator merge fixes. · e2ba4c89
      Konstantin Osipov authored
      * use per-index statistics
      * remove step_count as it is no longer maintained
      * add statistics for txw, mem, and index overall
    • vinyl: implement special iterator for full-key EQ case · a20f0ddb
      alyapunov authored
      The old iterator has several problems:

      - the restoration system is too complex and might cause the same
        statements to be read from disk several times;

      - upserts are applied in a direct way (squash all upserts and
        apply them to the terminal statement), and the code doesn't
        leave a chance to optimize this.

      Implement an iterator for the full-key EQ case that fixes the
      problems above.
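
      A sketch of the lookup order such an iterator can use: with a
      full key and EQ, the sources are consulted newest-first and the
      scan stops at the first terminal statement (all names below are
      illustrative):

        #include <stdbool.h>
        #include <stddef.h>

        struct tuple;

        struct tuple *lookup_txw(void);   /* TX write set */
        struct tuple *lookup_cache(void);
        struct tuple *lookup_mem(void);   /* in-memory trees */
        struct tuple *lookup_disk(void);  /* runs */
        bool stmt_is_terminal(const struct tuple *t); /* REPLACE/DELETE */
        struct tuple *apply_collected_upserts(struct tuple *terminal);

        static struct tuple *full_key_eq_lookup(void)
        {
            struct tuple *(*const sources[])(void) = {
                lookup_txw, lookup_cache, lookup_mem, lookup_disk,
            };
            for (size_t i = 0; i < sizeof(sources) / sizeof(sources[0]);
                 i++) {
                struct tuple *t = sources[i]();
                /* Non-terminal statements (upserts) are collected on
                 * the way down and applied once, to the terminal
                 * statement. */
                if (t != NULL && stmt_is_terminal(t))
                    return apply_collected_upserts(t);
            }
            return apply_collected_upserts(NULL);
        }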
    • vinyl: split index->version into mem_list_version/range_tree_version · 08bb9b27
      alyapunov authored
      There is a version member in vy_index that is incremented on each
      modification of the mem list and the range tree. Split it into
      two members that correspond to the mem list and the range tree
      respectively.

      This is needed for more precise tracking of changes in iterators.
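
      A sketch of how split versions are consumed by iterators
      (illustrative):

        #include <stdbool.h>
        #include <stdint.h>

        struct vy_index {
            uint32_t mem_list_version;   /* bumped on mem list changes */
            uint32_t range_tree_version; /* bumped on range tree changes */
        };

        struct vy_mem_iterator {
            const struct vy_index *index;
            uint32_t mem_list_version; /* version seen when positioned */
        };

        /* An iterator over the mem list only has to restore its
         * position when the mem list itself changed; range tree
         * changes no longer disturb it. */
        static bool mem_iterator_is_stale(const struct vy_mem_iterator *it)
        {
            return it->mem_list_version != it->index->mem_list_version;
        }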
  6. Jun 27, 2017
    • vinyl: add per-index txw statistics · 44305191
      Vladimir Davydov authored
      This patch adds the following counters to index.info:
      
        txw
          count               # number of statements in the TX write set
            rows
            bytes
          iterator
            lookup            # number of lookups in the TX write set
            get               # number of statements returned by the iterator
              rows
              bytes
      
      Needed for #1662
    • vinyl: add per-index cache statistics · 1bb7a8b2
      Vladimir Davydov authored
      This patch adds the cache section to index.info with the following
      counters in it:
      
        cache
          rows                # number of tuples in the cache
          bytes               # cache memory size
          lookup              # lookups in the cache
          get                 # reads from the cache
            rows
            bytes
          put                 # writes to the cache
            rows
            bytes
          invalidate          # overwrites in the cache
            rows
            bytes
          evict               # evictions due to memory quota
            rows
            bytes
      
      Needed for #1662
    • vinyl: do not (ab)use vy_quota for vy_cache · 8e5834b8
      Vladimir Davydov authored
      Using vy_quota, which was implemented to support watermarking,
      throttling, and timeouts, for accounting cached tuples is
      overkill. Replace it with mem_used and mem_quota counters.
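
      A sketch of the simpler accounting (toy fields):

        #include <stdbool.h>
        #include <stddef.h>

        struct vy_cache_env {
            size_t mem_used;  /* memory occupied by cached tuples */
            size_t mem_quota; /* eviction threshold */
        };

        /* No watermarks, throttling or timeouts: just bump mem_used
         * on insertion and evict while it exceeds mem_quota. */
        static bool vy_cache_needs_eviction(const struct vy_cache_env *env)
        {
            return env->mem_used > env->mem_quota;
        }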
    • vinyl: add per-index disk write statistics · 542ce685
      Vladimir Davydov authored
      This patch adds the following counters to the disk section of
      index.info:
      
        dump                  # dump statistics:
          count               #   number of invocations
          in                  #   number of input statements
            rows
            bytes
          out                 #   number of output statements
            rows
            bytes
      
        compact               # compaction statistics:
          count               #   number of invocations
          in                  #   number of input statements
            rows
            bytes
          out                 #   number of output statements
            rows
            bytes
      
      Needed for #1662