  1. Jun 08, 2017
    • Vladimir Davydov's avatar
      vinyl: convert vy_index->tree to pointer · 801f32c7
      Vladimir Davydov authored
      Space truncate rework done by the next patch requires the ability to
      swap data stored on disk between two indexes on recovery so as not to
      reload all runs every time a space gets truncated. Since we can't swap
      the contents of two rb trees (due to rbt_nil), convert vy_index->tree
      to a pointer.
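
      A minimal sketch of why holding the tree by pointer makes the swap
      trivial. It assumes a tree with an embedded sentinel node that the
      tree's own links point at; toy_tree, toy_node and toy_index are
      hypothetical stand-ins, not the actual vinyl structures.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_node {
	struct toy_node *left, *right;
};

/* Hypothetical tree with an embedded sentinel: an empty tree's root
 * points at its own nil node, so copying the struct by value would
 * leave the copy pointing into the original object. */
struct toy_tree {
	struct toy_node nil;
	struct toy_node *root;
};

/* Hypothetical stand-in for an index that holds its tree by pointer. */
struct toy_index {
	struct toy_tree *tree;
};

static struct toy_tree *
toy_tree_new(void)
{
	struct toy_tree *t = malloc(sizeof(*t));
	if (t == NULL)
		return NULL;
	t->nil.left = t->nil.right = &t->nil;
	t->root = &t->nil;
	return t;
}

/* Swapping the pointers leaves each tree's self-references intact. */
static void
toy_index_swap_trees(struct toy_index *a, struct toy_index *b)
{
	struct toy_tree *tmp = a->tree;
	a->tree = b->tree;
	b->tree = tmp;
}
```

      After the swap each index sees the other's tree, and the sentinel
      links inside both trees still point at their own nil node.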
    • Roman Tsisyk's avatar
      Fix -Wunused on Clang · 85009195
      Roman Tsisyk authored
    • Georgy Kirichenko's avatar
      Lock schema for space and index alteration · 5a200cb3
      Georgy Kirichenko authored
      Lock schema before any changes to space and index dictionary and unlock
      only after commit or rollback. This allows many parallel data definition
      statements. Issue #2075
    • Georgy Kirichenko's avatar
      Add before statement trigger for spaces · c60fa224
      Georgy Kirichenko authored
      We need to lock the box schema while editing a ddl space. The lock must
      be taken before any changes are made to the ddl space. A before trigger
      is a good place to take the schema lock. See #2075
    • Vladimir Davydov's avatar
      box: require box.cfg.checkpoint_count to be >= 1 · 27b86b1d
      Vladimir Davydov authored
      We must store at least one snapshot, otherwise we wouldn't recover
      after restart, so if checkpoint_count is set to 0, we disable garbage
      collection. This contravenes the notion followed everywhere else in
      tarantool: if we want an option value (timeout, checkpoint count, etc)
      to be infinite, we should set it to a very big number, not to 0.
      Make checkpoint_count comply.
    • Vladimir Davydov's avatar
      box: rework internal garbage collection API · 2c547c26
      Vladimir Davydov authored
      The current gc implementation has a number of flaws:
      
       - It tracks checkpoints, not consumers, which makes it impossible to
         identify the reason why gc isn't invoked. All we can see is the
         number of users of each particular checkpoint (reference counter),
         while it would be good to know what references it (replica or
         backup).
      
       - While tracking checkpoints works well for backup and initial join,
         it is a poor fit for subscribe, because a replica is supposed to
         track a vclock, not a checkpoint.
      
       - Tracking checkpoints from box/gc also violates encapsulation:
         checkpoints are, in fact, memtx snapshots, so they should be tracked
         by memtx engine, not by gc, as they are now. This results in
         atrocities, like having two snap xdirs - one in memtx, another in gc.
      
       - Garbage collection is invoked by a special internal function,
         box.internal.gc.run(), which is passed the signature of the oldest
         checkpoint to save. This function is then used by the snapshot daemon
         to maintain the configured number of checkpoints. This brings
         unjustified complexity to the snapshot daemon implementation: instead
         of just calling box.snapshot() periodically it has to take on
         responsibility to invoke the garbage collector with the right
         signature. This also means that garbage collection is disabled unless
         snapshot daemon is configured to be running, which is confusing, as
         snapshot daemon is disabled by default.
      
      So this patch reworks box/gc as follows:
      
       - Checkpoints are now tracked by memtx engine and can be accessed via a
         new module box/src/checkpoint.[hc], which provides simple wrappers
         around corresponding MemtxEngine methods.
      
       - box/gc.[hc] now tracks not checkpoints, but individual consumers that
         can be registered, unregistered, and advanced. Each consumer has a
         human-readable name displayed by box.internal.gc.info():
      
         tarantool> box.internal.gc.info()
         ---
         - consumers:
           - name: backup
             signature: 8
           - name: replica 885a81a9-a286-4f06-9cb1-ed665d7f5566
             signature: 12
           - name: replica 5d3e314f-bc03-49bf-a12b-5ce709540c87
             signature: 12
           checkpoints:
           - signature: 8
           - signature: 11
           - signature: 12
         ...
      
       - box.internal.gc.run() is removed. Garbage collection is now invoked
         automatically by box.snapshot() and doesn't require the snapshot
         daemon to be up and running.
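
      A rough sketch of the consumer model described above: named
      consumers are registered, advanced and unregistered, and the
      collector may only drop files older than the oldest signature still
      needed. All toy_* names and the fixed-size table are hypothetical;
      this is not the actual box/gc interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_GC_MAX_CONSUMERS = 8 };

/* Hypothetical consumer: a name and the oldest signature it still needs. */
struct toy_gc_consumer {
	const char *name;
	int64_t signature;
	int in_use;
};

struct toy_gc {
	struct toy_gc_consumer consumers[TOY_GC_MAX_CONSUMERS];
};

/* Register a consumer; returns NULL if the table is full. */
static struct toy_gc_consumer *
toy_gc_register(struct toy_gc *gc, const char *name, int64_t signature)
{
	for (size_t i = 0; i < TOY_GC_MAX_CONSUMERS; i++) {
		struct toy_gc_consumer *c = &gc->consumers[i];
		if (!c->in_use) {
			c->name = name;
			c->signature = signature;
			c->in_use = 1;
			return c;
		}
	}
	return NULL;
}

static void
toy_gc_unregister(struct toy_gc_consumer *c)
{
	c->in_use = 0;
}

/* A consumer only ever moves forward. */
static void
toy_gc_advance(struct toy_gc_consumer *c, int64_t signature)
{
	assert(signature >= c->signature);
	c->signature = signature;
}

/* Oldest signature that must survive: the oldest checkpoint we want to
 * keep, lowered further by any consumer that lags behind it. */
static int64_t
toy_gc_oldest_needed(const struct toy_gc *gc, int64_t oldest_checkpoint)
{
	int64_t min = oldest_checkpoint;
	for (size_t i = 0; i < TOY_GC_MAX_CONSUMERS; i++) {
		const struct toy_gc_consumer *c = &gc->consumers[i];
		if (c->in_use && c->signature < min)
			min = c->signature;
	}
	return min;
}
```

      With the consumers from the output above (backup at 8, replicas at
      12) and checkpoint 12 as the oldest one to keep, this sketch
      reports 8 until the backup consumer is unregistered, and 12
      afterwards.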
    • Konstantin Osipov's avatar
      gh-1716: improve comments · 5e20557c
      Konstantin Osipov authored
      Fix spelling and rephrase a few comments.
    • Vladislav Shpilevoy's avatar
      Improve column mask calculation · 22d9b97d
      Vladislav Shpilevoy authored
      If the update operation changes a field with number >= 64, the
      column mask of the update op is set to UINT64_MAX.
      Let's use the last bit of the column mask as a flag meaning that
      any field with number >= 63 could be changed. Then, if the indexed
      positions are less than 64, the column mask will always work.
      
      Closes #1716
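
      A small sketch of the scheme described above, assuming zero-based
      field numbers so that bits 0..62 map to individual fields and bit
      63 is shared by every field >= 63. The function names are
      hypothetical, not the ones used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask bit for a field: fields 0..62 get their own bit, all fields
 * with number >= 63 share the last bit. */
static uint64_t
toy_column_mask_bit(uint32_t field_no)
{
	if (field_no >= 63)
		return (uint64_t)1 << 63;
	return (uint64_t)1 << field_no;
}

/* Nonzero if an update with update_mask may touch a key whose parts
 * are described by key_mask. */
static int
toy_key_may_change(uint64_t key_mask, uint64_t update_mask)
{
	return (key_mask & update_mask) != 0;
}
```

      An update of field 70 no longer looks like a change to a key on
      field 1, while a key on field 64 is still conservatively treated
      as possibly changed.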
    • Vladislav Shpilevoy's avatar
      vinyl: speed up update_optimize test · 3ce12ff2
      Vladislav Shpilevoy authored
      Stop waiting for the dump of the secondary indexes to finish.
      According to commit 0d99714f, the primary index is always dumped
      after the secondary ones, so we can wait for the primary index
      only instead of all indexes.
  2. Jun 07, 2017
  3. Jun 05, 2017
  4. Jun 02, 2017
  5. Jun 01, 2017
    • Vladimir Davydov's avatar
      relay: fix cbus endpoint name conflict · 78721a42
      Vladimir Davydov authored
      Names of the cbus endpoints used in a relay must be unique. To ensure
      that, we name endpoints after replica->id. This isn't quite correct,
      as replica can be deleted from the cluster table while its relay
      remains active. If the replica's id happens to be reused for another
      replica, we will get a cbus endpoint name conflict, which typically
      results in a crash. Fix this by naming endpoints after the relay
      in-memory address.
      
      Closes #2497
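
      A sketch of the naming idea, assuming the name is simply formatted
      from the relay pointer. The "relay_%p" format, the toy_relay struct
      and the helper name are illustrative assumptions, not taken from
      the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical relay object; only its address matters here. */
struct toy_relay {
	int unused;
};

/* Build an endpoint name from the relay's in-memory address. Two live
 * relays never share an address, so their names cannot collide even if
 * a replica id gets reused. Returns what snprintf() returns. */
static int
toy_relay_endpoint_name(const struct toy_relay *relay, char *buf, size_t size)
{
	return snprintf(buf, size, "relay_%p", (void *)relay);
}
```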
    • Konstantin Osipov's avatar
      test: fix a sporadically failing box/select.test.lua · 5e26af20
      Konstantin Osipov authored
      A local reference could be sometimes garbage collected and sometimes
      not, leading to flaky test results.
    • Konstantin Osipov's avatar
      test: update test results · e524af4c
      Konstantin Osipov authored
      A follow-up for 06099023
      
      gh-2491, invalid default value of wal_max_size
    • Vladimir Davydov's avatar
      vinyl: log index creation after wal write · d9908432
      Vladimir Davydov authored
      Currently, we log index creation before writing the index row to WAL,
      via VinylEngine::addPrimaryKey() and VinylEngine::buildSecondaryKey().
      This is incorrect, because it isn't guaranteed that WAL write will
      succeed. If it doesn't, the index creation is aborted, but the index
      record is left in the metadata log. At best this will result in garbage
      in the metadata log, at worst we will get a permanent failure to rotate
      the log. The latter can happen if the index id in the metadata log (i.e.
      LSN) is reused by another index, e.g. after the following piece of Lua
      code is executed by tarantool, box.snapshot() will keep failing:
      
          tarantool> box.cfg{}
          ---
          ...
      
          tarantool> s = box.schema.space.create("test", {engine='vinyl'})
          ---
          ...
      
          tarantool> box.error.injection.set('ERRINJ_WAL_IO', true)
          ---
          - ok
          ...
      
          tarantool> _ = box.space.test:create_index('pk')
          ---
          - error: Failed to write to disk
          ...
      
          tarantool> box.error.injection.set('ERRINJ_WAL_IO', false)
          ---
          - ok
          ...
      
          tarantool> _ = box.space.test:create_index('pk')
          ---
          ...
      
          tarantool> box.snapshot()
          ---
          - error: 'Invalid VYLOG file: Duplicate index id 2'
          ...
      
      Fix this by moving vinyl index creation logging to the new Index method
      commitCreate(), which is called after WAL write. Since we can't fail
      there, we use the same technique as in commitDrop() to make sure the log
      record is committed sooner or later, namely in case of vylog write
      failure we leave the record in the vylog buffer to be flushed along with
      the next write. If it doesn't get flushed before shutdown, we will
      replay it on local recovery from WAL.
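
      A simplified sketch of the "leave the record in the buffer"
      technique: the commit step never fails; a failed write keeps the
      records queued so they go out with the next write. The toy_log
      structure and the fake disk below are hypothetical and only
      illustrate the idea, not the real vylog code.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_LOG_BUF_MAX = 16 };

/* Fake disk used for illustration: fails while toy_disk_ok is zero. */
static int toy_disk_ok = 1;
static size_t toy_written = 0;

static int
toy_write(const int *records, size_t n)
{
	(void)records;
	if (!toy_disk_ok)
		return -1;
	toy_written += n;
	return 0;
}

struct toy_log {
	int buf[TOY_LOG_BUF_MAX];	/* records not yet on disk */
	size_t n_pending;
	int (*write)(const int *records, size_t n);	/* 0 on success */
};

/* Queue a record and try to flush everything pending. On write failure
 * the records simply stay in the buffer until the next attempt. */
static void
toy_log_commit(struct toy_log *log, int record)
{
	assert(log->n_pending < TOY_LOG_BUF_MAX);
	log->buf[log->n_pending++] = record;
	if (log->write(log->buf, log->n_pending) == 0)
		log->n_pending = 0;
}
```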
    • Vladimir Davydov's avatar
      vinyl: fix vy_index leak on index creation error path · 3ddfe560
      Vladimir Davydov authored
      VinylIndex::open() doesn't free vy_index if vy_index_open() fails.
      Also, VinylIndex leaks vy_index if we fail to log index creation in WAL,
      because in this case VinylIndex::commitDrop(), which releases the
      underlying vy_index, isn't called. To fix the leaks, make VinylIndex
      reference the vy_index object it wraps in its constructor and
      unreference it in its destructor.
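
      A minimal sketch of the ownership rule, written in plain C with a
      reference counter: the wrapper takes a reference when it is created
      and drops it when it is destroyed, so the wrapped object is freed on
      every path. All toy_* names are hypothetical; the real code uses the
      VinylIndex constructor and destructor.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of live toy_vy_index objects, kept only to observe leaks. */
static int toy_live = 0;

struct toy_vy_index {
	int refs;
};

/* The creator gets the first reference. */
static struct toy_vy_index *
toy_vy_index_new(void)
{
	struct toy_vy_index *index = malloc(sizeof(*index));
	if (index == NULL)
		return NULL;
	index->refs = 1;
	toy_live++;
	return index;
}

static void
toy_vy_index_ref(struct toy_vy_index *index)
{
	index->refs++;
}

/* Free the object when the last reference goes away. */
static void
toy_vy_index_unref(struct toy_vy_index *index)
{
	assert(index->refs > 0);
	if (--index->refs == 0) {
		toy_live--;
		free(index);
	}
}

/* Hypothetical wrapper playing the role of VinylIndex. */
struct toy_wrapper {
	struct toy_vy_index *index;
};

static void
toy_wrapper_create(struct toy_wrapper *w, struct toy_vy_index *index)
{
	w->index = index;
	toy_vy_index_ref(index);
}

static void
toy_wrapper_destroy(struct toy_wrapper *w)
{
	toy_vy_index_unref(w->index);
	w->index = NULL;
}
```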
      
      While we are at it, remove useless TODOs from vy_index_open() and
      vy_index_commit_drop() asking not to drop and recreate index on
      recovery, because we don't reload index directory on drop/create since
      the time vylog was introduced.
    • Vladimir Davydov's avatar
      vinyl: zap vy_env->indexes list · 053f8335
      Vladimir Davydov authored
      Before commit a63595db ("Implement index:info() introspection"),
      we needed the list of all vinyl indexes to show index stats via
      box.info.vinyl(). Now the list is only used in vy_env_delete() to
      clean up on shutdown. This code doesn't make much sense as it doesn't
      delete VinylIndex structures - indexes should be deleted from the
      upper level instead - so remove the cleanup and delete vy_env->indexes
      as its meaning is quite obscure (e.g. it isn't quite clear when we
      should add an index to it, from vy_index_new() or from vy_index_open()).
    • Vladimir Davydov's avatar
      Make Handler::dropIndex() method of Index · bfb0252e
      Vladimir Davydov authored
      This method only needs Index; it doesn't have anything to do with space.
      Apart from moving it from Handler to Index, rename it to commitDrop() to
      emphasize the fact that it is called after WAL write.
    • Vladislav Shpilevoy's avatar
      vinyl: fix return of out-of-range stmts from read_iter · 17eb36d7
      Vladislav Shpilevoy authored
      When the read iterator gets the next range, it then calls
      read_iterator_merge_next_key. But this function can call
      read_iterator_restore, which also restores the range_iterator.
      This restore returns the previous range, so the next statements
      are out of the range.
      Example:
      
        curr_range
        curr_stmt      next_stmt
      +-------------+--------------+
      | -inf        x         +inf |
      +-------------+--------------+
      
      read_iterator_next_range
      
        curr_stmt      curr_range
                       next_stmt
      +-------------+--------------+
      | -inf        x         +inf |
      +-------------+--------------+
      
      read_iterator_merge_next_keys fails the version check and calls
      read_iterator_restore, which calls range_iterator_restore to restore
      to curr_stmt. The merge_iterator then returns next_stmt, which is
      out of curr_range.
      
        curr_range     curr_stmt
      +-------------+--------------+
      | -inf        x         +inf |
      +-------------+--------------+
    • Roman Tsisyk's avatar
      Fix invalid default value of wal_max_size · 06099023
      Roman Tsisyk authored
      It should be 256 MB instead of 256 GB.
      
      Fixes #2491
    • Vladimir Davydov's avatar
      Add replication/gc test · 13d3a27e
      Vladimir Davydov authored
      Check that:
       - garbage collector doesn't delete files used by initial join,
         final join, or subscribe
       - old checkpoints are deleted as the replica advances
       - checkpoint pinned by a stale replica is released when the
         replica is unregistered