  1. Jul 11, 2017
    • Alexandr Lyapunov's avatar
      vinyl: fix broken rollback to savepoint · b9bdcd31
      Alexandr Lyapunov authored
      Currently, vinyl's rollback to a TX savepoint does not work when a
      single TX has several statements modifying the same key.
      
      A simple example is presented in gh ticket 2589.
      
      Fix it.
      
      Add a big test that compares vinyl with memtx when a transaction
      executes many failing (caught with pcall) statements.
      
      Fixes #2589
      b9bdcd31
    • Alexandr Lyapunov's avatar
      vinyl: fix TX log order · 9648cfd5
      Alexandr Lyapunov authored
      Currently, if a TX (in vinyl) changes the same key again, it reuses the
      previously created struct txv and updates the tuple in it.
      Therefore the change is not added to the transaction log; instead,
      the old entry in the log is updated.
      
      At the same time, the order of the TX log is important, especially when
      a vinyl space has several indexes: tx_prepare traverses the TX log,
      determines the beginning of a tarantool statement (which may consist
      of several vinyl statements) and treats the adjoining txvs as a
      whole.
      
      Fix it by leaving the old txv in place and always creating a new txv.
      
      Fixes #2577
      9648cfd5
    • Alexandr Lyapunov's avatar
      vinyl: fix a pair of bugs · b525e8fc
      Alexandr Lyapunov authored
      - don't reference the upserted tuple twice in vy_tx_set(..)
      - handle txv allocation error properly in vy_tx_set(..)
      
      The return value of txv_new was not checked.
      b525e8fc
  2. Jul 10, 2017
    • Vladimir Davydov's avatar
      vinyl: get rid of vy_run_iterator->coio_read · eeb9a81e
      Vladimir Davydov authored
      Instead introduce vy_run_env_enable_coio(), which starts reader threads,
      and make vy_run_iterator automatically switch to coio if reader threads
      are running. With this patch, vy_read_iterator doesn't need a pointer to
      vy_env to add a run as a source, only vy_run_env.
      
      While we are at it, clean up vy_conf a bit.
      
      Needed for #1906
      eeb9a81e
    • Vladimir Davydov's avatar
      vinyl: fix use-after-free of last_stmt in vy_run_write_page · 63f59df4
      Vladimir Davydov authored
      vy_run_write_page() doesn't take a reference to last_stmt, because it
      assumes that the write iterator guarantees it won't be deleted until
      'next' is called again. The iterator does pin a statement if it is read
      from a run file - see vy_write_iterator_set_tuple() - however there's a
      case when the last returned statement can go away under us. This will
      happen if the iterator is used for major compaction and the last source
      statement is a DELETE. In this case the iterator will unreference the
      last statement it returned to the caller, take a reference to the DELETE
      instead, but won't return the DELETE - see vy_write_iterator_next(). As
      a result, the caller, i.e. vy_run_write_page(), will hit use-after-free
      on an attempt to read last_stmt.
      
      To fix this bug, let's make vy_run_write_page() take a reference to
      last_stmt as it used to before the write iterator was reworked. A test
      case will be added later, after all iterator-related issues have been
      fixed.
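
      Below is a minimal, self-contained C sketch of the pinning discipline
      the fix restores; the types, helpers and loop structure are illustrative
      stand-ins, not the actual vy_run_write_page()/write-iterator code.

          #include <assert.h>
          #include <stdlib.h>

          struct stmt {
                  int refs;
          };

          static void
          stmt_ref(struct stmt *s) { s->refs++; }

          static void
          stmt_unref(struct stmt *s)
          {
                  assert(s->refs > 0);
                  if (--s->refs == 0)
                          free(s);
          }

          /*
           * The page writer must not rely on the iterator keeping the last
           * returned statement alive: it takes its own reference and drops
           * it only once the statement is no longer needed.
           */
          static void
          write_page(struct stmt *(*next)(void *), void *it)
          {
                  struct stmt *last = NULL, *s;
                  while ((s = next(it)) != NULL) {
                          /* ... encode s into the page ... */
                          if (last != NULL)
                                  stmt_unref(last); /* drop the previous pin */
                          last = s;
                          stmt_ref(last); /* pin: survives the next call */
                  }
                  if (last != NULL)
                          stmt_unref(last);
          }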
      
      Closes #2578
      63f59df4
    • Georgy Kirichenko's avatar
      Add iconv support · aff6235c
      Georgy Kirichenko authored
      Iconv is a library to convert a sequence of characters in one
      character encoding to a sequence of characters in another character
      encoding. The example below converts a UTF-16 big-endian string into a
      UTF-8 string:
      
          convertor = require('iconv').new('UTF-16BE', 'UTF-8')
          converted_string = convertor(source_string)
      
      Closes #2587
      aff6235c
    • Vladimir Davydov's avatar
      vinyl: add missing mem_list_version increment · 3a73e5dc
      Vladimir Davydov authored
      vy_task_dump_new() deletes empty in-memory trees right away, but
      doesn't increment vy_index->mem_list_version, which may result in
      a read iterator crash accessing a deleted vy_mem.
      3a73e5dc
    • alyapunov's avatar
      Use open MP sort optimisation only for huge arrays · 2e864b5a
      alyapunov authored
      Currently, the OpenMP sort is used for an array of any size.
      For small arrays it is overkill and can even cause overhead
      due to thread pool creation.
      
      Invoke the OpenMP sort only for big arrays and use the good old
      single-threaded qsort for small arrays.
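
      A minimal sketch of such a size-based dispatch in C with OpenMP
      (compile with -fopenmp); the threshold value, element type and helper
      names are assumptions, not the actual Tarantool code.

          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>

          enum { PARALLEL_SORT_THRESHOLD = 1 << 16 }; /* assumed cut-off */

          static int
          cmp_u32(const void *a, const void *b)
          {
                  uint32_t l = *(const uint32_t *)a, r = *(const uint32_t *)b;
                  return l < r ? -1 : l > r;
          }

          static void
          sort_u32(uint32_t *data, size_t n)
          {
                  if (n < PARALLEL_SORT_THRESHOLD) {
                          /* Small array: thread pool start-up would dominate. */
                          qsort(data, n, sizeof(*data), cmp_u32);
                          return;
                  }
                  /* Big array: sort the halves in parallel, then merge. */
                  size_t mid = n / 2;
                  #pragma omp parallel sections
                  {
                          #pragma omp section
                          qsort(data, mid, sizeof(*data), cmp_u32);
                          #pragma omp section
                          qsort(data + mid, n - mid, sizeof(*data), cmp_u32);
                  }
                  uint32_t *tmp = malloc(n * sizeof(*tmp));
                  if (tmp == NULL) {
                          qsort(data, n, sizeof(*data), cmp_u32);
                          return;
                  }
                  size_t i = 0, j = mid, k = 0;
                  while (i < mid && j < n)
                          tmp[k++] = data[i] <= data[j] ? data[i++] : data[j++];
                  while (i < mid)
                          tmp[k++] = data[i++];
                  while (j < n)
                          tmp[k++] = data[j++];
                  memcpy(data, tmp, n * sizeof(*tmp));
                  free(tmp);
          }

      The exact cut-off here is an assumption and would have to be chosen
      empirically for the target workload.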
      
      Fix #2431
      2e864b5a
    • Roman Tsisyk's avatar
      Fix logging of box.cfg.replication option · 6015e0df
      Roman Tsisyk authored
      Print the original URI as is if it doesn't contain sensitive
      information.
      
      Closes #2292
      6015e0df
  3. Jul 09, 2017
  4. Jul 08, 2017
    • Vladimir Davydov's avatar
      vinyl: move read_set from vy_index to tx_manager · 56871b94
      Vladimir Davydov authored
      Needed to remove the dependency of vy_index on struct txv.
      
      Needed for #1906
      56871b94
    • Georgy Kirichenko's avatar
      Altering a space takes effect immediately. · 8631ffb2
      Georgy Kirichenko authored
      Before this patch, the new space created by an alter specification
      would be put into the space cache only after a successful WAL write.
      This behaviour is not linearizable: on a replica, the WAL is
      replayed sequentially, so the order of events could differ from that
      on the master.
      
      Besides, it could crash, as demonstrated by the gh-2074 test case.
      
      Since we use a cascading rollback for all transactions on a WAL
      write error, it's OK to put a space into the space cache
      before the WAL write, so that new transactions apply to the new
      space.
      
      This patch does exactly that.
      
      All subsequent requests are executed against the new space.
      
      This patch also removes the on_replace trigger on the old space, since
      all checks against the new tuple format are performed using the new
      space.
      
      Fixes #2074.
      8631ffb2
  5. Jul 07, 2017
  6. Jul 06, 2017
    • Konstantin Osipov's avatar
      alter: MoveIndex review fixes · 7b1028a4
      Konstantin Osipov authored
      * update comments
      * add a test case for altering a primary key on the fly
      * rename AddIndex to CreateIndex
      * factor out common code into a function
      7b1028a4
    • Georgy Kirichenko's avatar
      alter: introduce MoveIndex and RebuildIndex operations · 8c854d7f
      Georgy Kirichenko authored
      The MoveIndex operation is used to move an existing index from the old
      space to the new one. Semantically it's a no-op.
      
      RebuildIndex is introduced for the case when essential index properties
      change, so it is necessary to drop the old index and create a new one in
      its place in the new space.
      
      AlterSpaceOp::prepare() is removed: all of its checks are moved to the
      on_replace trigger on the _index system space.
      All checks are done before any alter operation is created.
      
      Necessary for gh-2074 and gh-1796.
      8c854d7f
    • alyapunov's avatar
      alter: move modify vs rebuild index check · 4c87d443
      alyapunov authored
      Move the check which decides on an alter strategy whenever
      a row in the _index space is changed, from AlterSpaceOp::prepare()
      to the on_replace trigger on the _index space.
      
      The check chooses between two options: a heavy-weight index rebuild,
      invoked when the index definition, such as key parts, is changed, vs. a
      lightweight modify, invoked when the index name or minor options are
      modified.
      
      Before this patch, index alteration created a pair of operations
      (DropIndex + AddIndex) in all cases, but later replaced the two
      operations with one at the AlterSpaceOp::prepare() phase.
      
      This is bad for several reasons:
      
      - it's done while traversing a linked list of operations,
        and it changes the list being traversed.
      
      - a particular order in the list of operations is required for this
        to work: drop must precede add.
      
      - needless allocation and deallocation of operations make the logic
        unnecessarily complex.
      
      Necessary for gh-1796.
      4c87d443
    • Konstantin Osipov's avatar
      space: create the primary key index first. · 25a99f42
      Konstantin Osipov authored
      Always create the primary key index in a space first. Put
      the primary key's key_def first in the array of key_defs passed
      into tuple_format_new(). This is necessary for gh-1796.
      25a99f42
    • Konstantin Osipov's avatar
      space: add an assert on the state of the space object · 5bca3f7c
      Konstantin Osipov authored
      Assert that we can't create a space with a secondary key but no
      primary key.
      5bca3f7c
    • alyapunov's avatar
      Move drop primary index check to the stage before new space creation · f6d06b9f
      alyapunov authored
      Currently, the drop-primary-index checks are made in alter triggers
      after the new space is created. Such an implementation leads to the
      temporary creation of a space with an invalid index set. Fix it and
      check the index set before the space_new call.
      f6d06b9f
    • Georgy Kirichenko's avatar
      Add swap_index_def function · 8ec3e3ed
      Georgy Kirichenko authored
      A non-destructive swap_index_def function should be used because
      memtx_tree stores a pointer to the index_def used during tree creation.
      
      Fixes #2570
      8ec3e3ed
  7. Jul 05, 2017
  8. Jul 04, 2017
    • Vladimir Davydov's avatar
      vinyl: fix snapshot consistency · 4586f2d6
      Vladimir Davydov authored
      Commit dbfd515f ("vinyl: fix crash if snapshot is called while dump
      is in progress") introduced a bug that can result in statements inserted
      after a WAL checkpoint being included in a snapshot. This happens because
      vy_begin_checkpoint() doesn't force rotation of in-memory trees anymore:
      it bumps checkpoint_generation, but doesn't touch scheduler->generation,
      which is used to trigger in-memory tree rotation.
      
      To fix this issue, this patch zaps scheduler->checkpoint_generation and
      makes vy_begin_checkpoint() bump scheduler->generation directly as it
      used to. To guarantee dump consistency (the issue fixed by commit
      dbfd515f), scheduler->dump_generation is introduced: it defines the
      generation of the in-memory data that is currently being dumped. The
      scheduler won't start dumping newer trees until all trees whose
      generation equals dump_generation have been dumped. The counter is only
      bumped by the scheduler itself when all old in-memory trees have been
      dumped. Together, this guarantees that each dump contains data of the
      same generation, i.e. is consistent.
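
      A minimal sketch of the generation bookkeeping described above; the
      struct layout and helper names are assumptions for illustration, not
      the actual vy_scheduler code.

          #include <stdbool.h>
          #include <stdint.h>

          struct scheduler {
                  int64_t generation;      /* current in-memory generation */
                  int64_t dump_generation; /* generation being dumped now */
          };

          /* Checkpoint forces in-memory tree rotation by bumping the
           * generation directly. */
          static void
          begin_checkpoint(struct scheduler *s)
          {
                  s->generation++;
          }

          /* A tree may be dumped only if it is not newer than the generation
           * currently being dumped; newer trees have to wait. */
          static bool
          may_dump_tree(const struct scheduler *s, int64_t tree_generation)
          {
                  return tree_generation <= s->dump_generation;
          }

          /* Only the scheduler advances dump_generation, and only after every
           * tree of that generation has been dumped, so each dump covers
           * exactly one generation and stays consistent. */
          static void
          complete_dump_round(struct scheduler *s)
          {
                  if (s->dump_generation < s->generation)
                          s->dump_generation++;
          }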
      
      While we are at it, let's also remove vy_scheduler->dump_fifo, the list
      of all in-memory trees sorted in chronological order. The scheduler
      uses it to keep track of the oldest in-memory tree, which is needed to
      invoke lsregion_gc(). However, since we no longer remove indexes from
      the dump_heap, as we did until not so long ago, we can use the heap for
      this. The only problem is that indexes which are currently being dumped
      are moved off the top of the heap, but we can detect this case by
      maintaining a counter of dump tasks in progress: if dump_task_count is
      > 0 when a dump task is completed, we must not call lsregion_gc()
      irrespective of the generation of the index at the top of the heap. A
      good thing about getting rid of vy_scheduler->dump_fifo is that it is a
      step towards making vy_index independent of vy_scheduler so that it can
      be moved to a separate source file.
      
      Closes #2541
      Needed for #1906
      4586f2d6
    • Vladimir Davydov's avatar
      vinyl: zap vy_index->generation · 199836db
      Vladimir Davydov authored
      vy_index->generation equals the generation of the oldest in-memory
      tree, which can be looked up efficiently since the vy_index->sealed
      list is sorted by generation. So let's zap the field and add a
      vy_index_generation() function instead.
      199836db
    • Vladimir Davydov's avatar
      vinyl: move vy_range to its own source file · 448b643e
      Vladimir Davydov authored
      Needed for #1906
      448b643e
    • Vladimir Davydov's avatar
      Move iterator_type to its own source file · 63fe5e6d
      Vladimir Davydov authored
      Including index.h just for the sake of iterator_type, as we do in
      vy_run.h and vy_mem.h, is a bit of overkill. Let's move its definition
      to a separate source file, iterator_type.h.
      63fe5e6d
    • Vladimir Davydov's avatar
      vinyl: remove dependency of vy_range from vy_index · e0dae391
      Vladimir Davydov authored
       - Replace vy_range->index with key_def.
       - Replace vy_range_iterator->index with vy_range_tree_t.
      
      Needed for #1906
      e0dae391
    • Vladimir Davydov's avatar
      vinyl: store indexes in scheduler->compact_heap · 47c3aa20
      Vladimir Davydov authored
      The compact_heap, used by the scheduler to schedule range compaction,
      contains all ranges except those that are currently being compacted.
      Since the appropriate vy_index object is required to schedule a range
      compaction, we have to store a pointer to the index a range belongs to
      in vy_range->index. This makes it impossible to move the vy_range struct
      and its implementation to a separate source file.
      
      To address this, let's rework the scheduler as follows:
      
       - Make compact_heap store indexes, not ranges. An index is prioritized
         by the greatest compact_priority among its ranges.
      
       - Add a heap of ranges to each index, prioritized by compact_priority.
         A range is removed from the heap while it's being compacted.
      
       - Do not remove indexes from dump_heap or compact_heap when a task is
         scheduled (otherwise we could only schedule one compaction per
         index). Instead just update the index position in the heaps (see the
         sketch below).
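
      An illustrative sketch of the two-level prioritization; the structs and
      the helper below are assumptions, not the actual vy_scheduler code.

          #include <stdbool.h>
          #include <stddef.h>
          #include <stdint.h>

          struct range {
                  uint32_t compact_priority; /* how badly compaction is needed */
                  bool in_compaction;        /* out of the range heap while set */
          };

          struct index {
                  /* Top of the per-index heap of ranges ordered by
                   * compact_priority; ranges being compacted are not in it. */
                  struct range *top_range;
          };

          /* The scheduler's compact_heap orders indexes by the greatest
           * compact_priority among their ranges, so scheduling one compaction
           * only updates the index position instead of removing the index. */
          static uint32_t
          index_compact_priority(const struct index *index)
          {
                  return index->top_range != NULL ?
                         index->top_range->compact_priority : 0;
          }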
      
      Needed for #1906
      47c3aa20
  9. Jul 03, 2017
    • Vladimir Davydov's avatar
      replica: advance gc state only when xlog is closed · ba09475f
      Vladimir Davydov authored
      Advancing replica->gc on every status update is inefficient as gc can
      only be invoked when we move to the next xlog file. Currently, it's
      acceptable because the status is only updated once per second, but
      there's no guarantee that it won't be updated, say, every millisecond
      in the future, in which case advancing replica->gc on every status
      update may become
      too costly.
      
      So introduce a trigger invoked every time an xlog is closed by
      recover_remaining_wals() and use it in the relay to send a special gc
      message.
      ba09475f
    • Vladimir Davydov's avatar
      cbus: fix cbus_endpoint_destroy loop exit condition · f986bc96
      Vladimir Davydov authored
      ipc_cond_wait() always returns 0, so the body of the loop waiting for
      the endpoint to be ready for destruction is only invoked once.
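
      A minimal pthread-based sketch of the intended loop shape (Tarantool's
      ipc_cond is fiber-based and has a different API; the struct and field
      names here are assumptions): the loop must re-check the predicate, not
      the return value of the wait call.

          #include <pthread.h>

          struct endpoint {
                  pthread_mutex_t mutex;
                  pthread_cond_t cond;
                  int n_pending; /* messages still queued on the endpoint */
          };

          static void
          endpoint_wait_empty(struct endpoint *ep)
          {
                  pthread_mutex_lock(&ep->mutex);
                  /* Correct: loop on the predicate after every wakeup. */
                  while (ep->n_pending > 0)
                          pthread_cond_wait(&ep->cond, &ep->mutex);
                  pthread_mutex_unlock(&ep->mutex);
          }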
      f986bc96
    • Vladimir Davydov's avatar
      relay: fix potential thread hang on exit · b49bb458
      Vladimir Davydov authored
      To make sure there is no status message pending in the tx pipe,
      relay_cbus_detach() waits on relay->status_cond before proceeding to
      relay destruction. The status_cond is signaled by the status message
      completion routine (relay_status_update()) handled by cbus on the
      relay's side. The problem is that by the time we call relay_cbus_detach(),
      the cbus loop has been stopped (see relay_subscribe_f()), i.e. there's
      no one to process the message that is supposed to signal status_cond.
      That means that if there happens to be a status message en route when the
      relay is stopped, the relay thread will hang forever.
      
      To fix this issue, let's introduce a new helper function, cbus_flush(),
      which blocks the caller until all cbus messages queued on a pipe have
      been processed, and use it in relay_cbus_detach() to wait for in-flight
      status messages to complete. Apart from source and destination pipes,
      this new function takes a callback to be used for processing incoming
      cbus messages, so it can be used even if the loop that is supposed to
      invoke cbus_process() has stopped.
      b49bb458
    • Vladimir Davydov's avatar
      recovery: refactor recover_remaining_wals · 2fbafbf5
      Vladimir Davydov authored
       - Fold in the wal dir scan. It's pretty easy to detect whether we need
         to rescan the wal dir - we do iff the current wal is closed (if it
         isn't, we need to recover it first), so there's no point in keeping
         the scan apart.
      
       - Close the last recovered wal on eof. Currently we don't close it, to
         avoid rereading it in case recover_remaining_wals() is called again
         before a new wal is added to the wal dir. Whether a new wal has
         appeared can be detected by checking if the signature of the last
         wal stored in the wal dir has increased after rescanning the dir
         (see the sketch after this list).
      
       - Don't abort recovery and print the 'xlog is deleted under our feet'
         message if the current wal file is removed. This is pointless,
         really - it's OK to remove an open file on Unix. Besides, the check
         for a deleted file is only relevant if the wal dir has been
         rescanned, which is only done when we proceed to the next wal, i.e.
         it doesn't really detect anything.
      
      A good side effect of this rework is that now we can invoke the garbage
      collector right from recovery_close_log().
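
      A minimal sketch of the eof check mentioned in the second item above;
      the struct, field names and the int64_t signature representation are
      assumptions for illustration, not the actual recovery code.

          #include <stdbool.h>
          #include <stdint.h>

          struct recovery {
                  int64_t current_wal_signature; /* signature of the open wal */
                  int64_t last_dir_signature;    /* greatest signature in dir */
          };

          /* Called after the open wal hits eof and the wal dir was rescanned:
           * close the current wal (and run gc) only if a newer wal appeared;
           * otherwise keep it open to avoid rereading it next time. */
          static bool
          should_close_current_wal(const struct recovery *r)
          {
                  return r->last_dir_signature > r->current_wal_signature;
          }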
      2fbafbf5