  1. Apr 06, 2018
    • alter: zap space_vtab::commit_alter · 9ec3b1a4
      Vladimir Davydov authored
      space_vtab::commit_alter is implemented only by memtx, which uses it
      to set bsize for a new space. However, it isn't necessary to use
      commit_alter for this - instead we can set bsize in prepare_alter and
      reset it in drop_primary_key, along with memtx_space::replace. Let's
      do it and zap space_vtab::commit_alter altogether, because the callback
      is confusing: judging by its name it should be called after WAL write,
      but it is called before.
      9ec3b1a4
    • memtx: do not use space_vtab::commit_alter for freeing tuples · 3bb5a4b1
      Vladimir Davydov authored
      When the last index of a memtx space is dropped, we need to delete all
      tuples stored in the space. We do it in space_vtab::commit_alter, but
      this is wrong, because this function is not rolled back and so we may
      get use-after-free error if we fail to write a DDL operation to WAL.
      To avoid that, let's delete tuples in index_vtab::commit_drop, which is
      called after WAL write.
      
      There's a nuance here: index_vtab::commit_drop is called if an index is
      rebuilt (because essentially it is drop + create) so we must elevate the
      reference counter of every tuple added to the new index during rebuild
      and, respectively, drop all the references in index_vtab::abort_create,
      which is called if index creation is aborted for some reason. This also
      means that now we iterate over all tuples twice when a primary key is
      rebuilt - first to build the new index, then to unreference all tuples
      stored in the old index. This is OK as we can make the last step
      asynchronous, which will also speed up the more common case of space
      drop.
      
      Closes #3289
      3bb5a4b1
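The reference-counting scheme described in this commit can be modelled roughly as follows. This is a minimal Python sketch, not the memtx code: `Tuple`, `Index`, `make_index` and `rebuild_pk` are hypothetical names, and setting `freed` stands in for returning memory to the allocator. Only `commit_drop` and `abort_create` mirror callback names from the commit message.

```python
class Tuple:
    def __init__(self, key):
        self.key = key
        self.refs = 0
        self.freed = False


def tuple_ref(t):
    t.refs += 1


def tuple_unref(t):
    assert t.refs > 0 and not t.freed
    t.refs -= 1
    if t.refs == 0:
        t.freed = True  # stands in for freeing the tuple's memory


class Index:
    def __init__(self):
        self.tuples = []

    def build_from(self, old):
        # Rebuild: every tuple added to the new index gets an extra reference.
        for t in old.tuples:
            tuple_ref(t)
            self.tuples.append(t)

    def commit_drop(self):
        # Runs after a successful WAL write: release this index's references.
        for t in self.tuples:
            tuple_unref(t)
        self.tuples = []

    def abort_create(self):
        # Runs if index creation fails: undo the references taken by build_from().
        for t in self.tuples:
            tuple_unref(t)
        self.tuples = []


def make_index(keys):
    # Hypothetical helper: a primary index holding one reference per tuple.
    idx = Index()
    for k in keys:
        t = Tuple(k)
        tuple_ref(t)
        idx.tuples.append(t)
    return idx


def rebuild_pk(old, wal_write_ok):
    """Rebuild the primary key and return the index that survives."""
    new = Index()
    new.build_from(old)
    if wal_write_ok:
        old.commit_drop()   # second pass over all tuples
        return new
    new.abort_create()
    return old
```

In this model, both the success and the failure path leave every tuple with exactly one reference, so nothing is freed until the surviving index itself is dropped.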
    • netbox: don't cancel pending requests on schema change · 62ba7ba7
      Vladislav Shpilevoy authored
      When a schema version change is detected, there is no reason to
      cancel and retry requests that have already been sent. They may
      already have been executed on the server, and retrying them leads
      to repeated execution.
      
      A request must be retried only if the server responded to that
      exact request with a WRONG_SCHEMA_VERSION error.
      
      Closes #3325
      62ba7ba7
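The retry rule can be illustrated with a small synchronous model. This is a hedged Python sketch and not the net.box implementation (which is written in Lua and is asynchronous): `FakeServer`, `Client` and the reply dictionaries are made up for illustration. Only the WRONG_SCHEMA_VERSION error name comes from the commit message.

```python
WRONG_SCHEMA_VERSION = "WRONG_SCHEMA_VERSION"


class FakeServer:
    """Hypothetical server that rejects requests carrying a stale schema version."""

    def __init__(self):
        self.schema_version = 1
        self.executed = []

    def handle(self, request, schema_version):
        if schema_version != self.schema_version:
            return {"error": WRONG_SCHEMA_VERSION,
                    "schema_version": self.schema_version}
        self.executed.append(request)
        return {"ok": request, "schema_version": self.schema_version}


class Client:
    def __init__(self, server):
        self.server = server
        self.schema_version = server.schema_version

    def call(self, request, max_attempts=3):
        for _ in range(max_attempts):
            reply = self.server.handle(request, self.schema_version)
            # Always pick up the newest schema version from the reply...
            self.schema_version = reply["schema_version"]
            # ...but resend only if this very request was rejected.
            if reply.get("error") != WRONG_SCHEMA_VERSION:
                return reply
        raise RuntimeError("schema keeps changing")
```

A request that the server has already executed is never resent in this model, which is the property the commit is after: each request runs at most once on the server.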
    • rfc: add rfc template · 51411a7a
      Konstantin Osipov authored
      Introducing a formal, trackable process for server enhancement.
      
      Before working on a complex feature, please write an RFC document
      describing what you want to change and how, and get it approved.
      
      All historical RFCs are kept in doc/rfc.
      51411a7a
  2. Apr 05, 2018
    • vinyl: do not use space_vtab::commit_alter for preparing new indexes · 48a47600
      Vladimir Davydov authored
      Currently, space_vtab::commit_alter is called before WAL write so we can
      use it for preparing new indexes in vinyl. However, this is going to
      change soon, because actually space_vtab::commit_alter should be called
      after WAL write, like index_vtab::commit_drop or commit_create. Calling
      it before WAL write may result in use-after-free in memtx (see #3289).
      Besides, using this function for iterating over space indexes just feels
      wrong, as we have index methods invoked by AlterSpaceOp descendants for
      this.
      
      So let's move all the operations performed by vinyl_space_commit_alter
      somewhere else. Fortunately, it's easy to do without damaging code
      readability or efficiency:
      
       - Update of vy_lsm::pk can be done by vinyl_space_swap_index and
         vinyl_build_secondary_key.
      
       - vy_lsm->check_is_unique can be computed by vinyl_engine_create_space
         and then set by vinyl_space_swap_index.
      
       - Call to key_def_update_optionality is implied by key_def_swap, which
         is already called by vinyl_space_swap_index, hence it can be removed.
      
      Needed for #3289
      48a47600
    • memtx: don't call begin_build and end_build for new pk after recovery · 30db003a
      Vladimir Davydov authored
      Basically, index_begin_build() followed by index_end_build() is a no-op.
      There's absolutely no point in calling it for primary indexes after
      initial recovery has completed.
      30db003a
    • vinyl: use disk_format in vy_run_rebuild_index · 828a33bf
      Vladimir Davydov authored
      We read tuples from disk hence we should use disk_format, not
      mem_format. Fix it. While we are at it, let's also update the
      outdated comment to vy_run_rebuild_index.
      828a33bf
    • memtx: rtree: remove pointless index_vtab::begin_build implementation · 50fcc202
      Vladimir Davydov authored
      The rtree is empty when this function is called (in fact, it is called
      right after creating the index), so there's no need to purge it.
      50fcc202
    • vinyl: allow to modify key definition if it does not require rebuild · a0299db8
      Vladimir Davydov authored
      To allow extending key definition for non-empty vinyl spaces, this patch
      performs the following steps:
      
       - Revert commit c31dd19a ("vinyl: forbid vinyl index key definition
         alter") that forbade any key def alter. It isn't needed anymore.
      
       - Update key_def and cmp_def in vinyl_space_swap_index(). We simply
         swap the definitions between the old and new indexes in memory.
         Since all vinyl objects reference either vy_lsm::key_def or
         vy_lsm::cmp_def or both, and the change is compatible (does not
         change the order for existing tuples), this should work just fine.
      
       - Update key definition in vylog on ALTER. For this, we introduce a new
         vylog record type, VY_LOG_MODIFY_LSM, which updates key definition.
         To be able to replay it on recovery in case we failed to flush it
         before restart, we also store the LSN of the WAL record that
         triggered the ALTER.
      
      It also adds the following test cases:
      
       - Modify key definition of primary and secondary indexes of a non-empty
         space (engine/ddl).
      
       - Modify key definition before snapshot and relay it to a newly joined
         replica (engine/replica_join).
      
       - Make sure key definition is updated in vylog on ALTER (vinyl/layout).
      a0299db8
    • vinyl: do not write VY_LOG_DUMP_LSM record to snapshot · 48e97cd3
      Vladimir Davydov authored
      There's no point in writing this record to snapshot, because we can
      store LSN of the last index dump right in VY_LOG_CREATE_LSM record.
      48e97cd3
    • vinyl: rename vy_log_record::commit_lsn to create_lsn · aadd7901
      Vladimir Davydov authored
      So as to draw the line between LSN of index creation and LSN of last
      index modification, which is introduced later in the series.
      aadd7901
    • vinyl: use source tuple format when copying field map · d647f27a
      Vladimir Davydov authored
      There are two functions in vy_stmt.c that blindly copy tuple field map,
      vy_stmt_dup() and vy_stmt_replace_from_upsert(). Both these functions
      take a tuple format to use for the new statement and require this format
      to be the same as the source tuple format in terms of fields definition,
      otherwise they'll just crash. The only reason why we did that is that
      back when these functions were written we used a separate format for
      UPSERT statements so we needed this extra argument for creating a
      REPLACE from UPSERT. Now it's not needed, and we can use the source
      tuple format instead. Moreover, passing the current tuple format to any
      of those functions is even harmful, because tuple format can be extended
      by ALTER, in which case these functions will crash if called on a
      statement created before ALTER. That being said, let's drop the tuple
      format argument.
      d647f27a
    • vinyl: remove pointless is_nullable initialization for disk_format · b2469121
      Vladimir Davydov authored
      space->format and cmp_def must be compatible, i.e. space->format has
      is_nullable flag set for a field iff it is set for all key parts
      indexing this field. Therefore there's no point in setting is_nullable
      for disk_format as it must have been initialized by tuple_format_create().
      Remove the pointless loop.
      
      Also, while we are at it, fix the minor memory leak - disk_format is
      referenced twice for the primary key.
      b2469121
    • vinyl: zap vy_mem_update_formats · 0ce7381f
      Vladimir Davydov authored
      A piece of code left from the inglorious past, which doesn't even have
      a forward declaration, let alone any callers. Remove it.
      0ce7381f
    • vinyl: zap vy_lsm_validate_formats · fac96568
      Vladimir Davydov authored
      We allocate index formats in only one place, vy_lsm_new, so there's no
      point in this debug-only check anymore.
      fac96568
    • vinyl: do not reallocate tuple formats on alter · d75c24a4
      Vladimir Davydov authored
      We create new formats for all indexes of the new space in
      vinyl_space_commit_alter() while we don't actually need to
      do this, because the new formats have already been created
      by vy_lsm_new() - all we need to do is reuse them somehow.
      
      This patch does the trick: it implements the swap_index()
      space virtual method for vinyl so that it swaps tuple formats
      between the old and new spaces.
      d75c24a4
    • space: make space_swap_index method virtual · 1cb16edb
      Vladimir Davydov authored
      This function is called by MoveIndex and ModifyIndex ALTER operations,
      i.e. when the index definition is not changed at all or is extended.
      Making this method virtual will allow us to avoid reallocating vinyl
      formats in vinyl_space_commit_alter().
      1cb16edb
    • alter: do not rebuild secondary indexes on compatible pk changes · 7e214255
      Vladimir Davydov authored
      If the new cmp_def of a secondary index is compatible with the old one
      after the primary key parts have changed, we don't need to rebuild it,
      we just need to update its definition.
      7e214255
    • alter: require rebuild of all secondary vinyl indexes if pk changes · d6d3e2c0
      Vladimir Davydov authored
      If the primary key is modified, we schedule rebuild of all non-unique
      (including nullable) secondary TREE indexes. This is valid for memtx,
      but is not quite right for vinyl. For vinyl we have to rebuild all
      secondary indexes, because they are all non-clustered (i.e. point to
      tuples via primary key parts). This doesn't result in any bugs for now,
      because rebuild of vinyl indexes is not supported, but hopefully this is
      going to change soon. So let's introduce a new virtual index method,
      index_vtab::depends_on_pk, which returns true iff the index needs to be
      updated if the primary key changes, and define this new method for vinyl
      and memtx TREE indexes.
      d6d3e2c0
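The per-engine decision can be pictured as a virtual method that each index type overrides. The following is only a Python sketch of the idea: the class names, constructor arguments and `indexes_to_rebuild` helper are hypothetical, and the memtx condition is read off the "non-unique (including nullable)" wording above rather than taken from the source code.

```python
class Index:
    def __init__(self, name, unique=True, nullable=False):
        self.name = name
        self.unique = unique
        self.nullable = nullable

    def depends_on_pk(self):
        # Default: the index does not need updating when the pk changes.
        return False


class MemtxTreeIndex(Index):
    def depends_on_pk(self):
        # Assumption based on the commit text: non-unique and nullable
        # secondary TREE indexes are the ones tied to primary key parts.
        return not self.unique or self.nullable


class VinylIndex(Index):
    def depends_on_pk(self):
        # Secondary vinyl indexes are non-clustered: they always point
        # to tuples via primary key parts.
        return True


def indexes_to_rebuild(secondary_indexes):
    """Names of secondary indexes that must be updated after a pk change."""
    return [i.name for i in secondary_indexes if i.depends_on_pk()]
```

With this split, the ALTER code does not need to know which engine it is dealing with; it just asks each secondary index.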
    • index: add commit_modify virtual method · 28c31d69
      Vladimir Davydov authored
      The new method is called after successful update of index definition.
      It is passed the signature of the WAL record that committed the
      operation. It will be used by Vinyl to update key definition in vylog.
      28c31d69
    • log: Fix syslog logger · 7c7a2fa1
      Ilya Markov authored
      * Remove rewriting format of default logger in case of syslog option.
      * Add facility option parsing and use parsed results in format message
        according to RFC3164. Possible values and default value of syslog
        facility are taken from nginx (https://nginx.ru/en/docs/syslog.html)
      * Move initialization of logger type and format function before
        initialization of descriptor in log_XXX_init, so that we can test
        format function of syslog logger.
      
      Closes gh-3244.
      7c7a2fa1
  3. Apr 04, 2018
    • vinyl: zap upsert_format · 3a73c2c6
      Vladimir Davydov authored
      The only difference between format of UPSERT statements and format of
      other DML statements of the same index is that the former reserves one
      byte for UPSERT counter, which is needed to schedule UPSERT squashing.
      Since we store UPSERT counter on lsregion now, we don't need a special
      format for UPSERTs anymore. Remove it.
      3a73c2c6
    • vinyl: allocate upsert counter on lsregion · e8147fe7
      Vladimir Davydov authored
      Currently, we store upsert counter in tuple metadata (that's what
      upsert_format is for), but since it's only relevant for tuples of
      the memory level, we can store it on lsregion, right before tuple
      data. Let's do it now so that we can get rid of upsert_format.
      e8147fe7
    • Merge branch '1.9' into 1.10 · 402f066c
      Kirill Yukhin authored
      402f066c
    • Add 'key_def_new_with_parts' (temporary) · 7d089bbd
      Alexander Turenko authored
      Filed gh-3311 to remove this export soon.
      
      Fixes #3310.
      7d089bbd
  4. Apr 03, 2018
  5. Apr 02, 2018
  6. Mar 30, 2018
    • replication: recover missing local data from replica · eae84efb
      Konstantin Belyavskiy authored
      In case of a sudden power loss, if data was not written to the WAL
      but was already sent to a remote replica, the local instance can't
      recover properly and the datasets diverge. Fix it by using the
      remote replica's data and LSN comparison.
      
      Based on @GeorgyKirichenko's proposal and @locker's race-free check.
      
      Closes #3210
      eae84efb
    • replication: stay in orphan mode until replica is synced by vclock · 7ebc8ae4
      Konstantin Belyavskiy authored
      Stay in orphan (read-only) mode while the local vclock is lower than
      master's to make sure that datasets are the same across replicaset.
      Update replication/catch test to reflect the change.
      
      Suggested by @kostja
      
      Needed for #3210
      7ebc8ae4
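A vclock comparison of this kind might look like the sketch below. It is an illustration under assumptions, not Tarantool's code: the vclock is modelled as a dict mapping replica id to LSN, `vclock_is_synced` and `instance_mode` are hypothetical names, and the real condition for leaving orphan mode may involve more than this check.

```python
def vclock_is_synced(local, master):
    """True once local has reached master's LSN in every component."""
    return all(local.get(replica_id, 0) >= lsn
               for replica_id, lsn in master.items())


def instance_mode(local, master):
    # "orphan" is the read-only mode named in the commit message;
    # "running" is just a label chosen for this sketch.
    return "running" if vclock_is_synced(local, master) else "orphan"
```

For example, an instance at `{1: 10, 2: 5}` following a master at `{1: 10, 2: 7}` still lags in component 2, so it stays in orphan mode until it has applied the missing rows.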
    • Update LuaRocks · 3171288c
      Vladimir Davydov authored
      Closes #3148
      3171288c
    • console: do not try to prevent SIGPIPE in text console · 427795fa
      Vladislav Shpilevoy authored
      The text console tried to detect a closed socket before a write
      could raise SIGPIPE by doing a read before each write. If a socket
      is readable but read returns 0, then it is closed, and writing to
      it can raise SIGPIPE. But Tarantool ignores SIGPIPE, so the process
      will not be terminated; write() just returns -1.
      
      The original code checks for SIGPIPE, because when Tarantool is
      run under debugger (gdb or lldb), the debugger by default sets
      its own signal handlers, and SIGPIPE terminates the process.
      
      But debugger settings can be changed to ignore SIGPIPE too, so
      let's remove this overengineering from the console code.
      427795fa
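Both behaviours mentioned in this commit can be reproduced in a few lines of Python on a POSIX system. This is a sketch for illustration only; `peer_closed` and `write_after_close` are hypothetical helper names, and the example relies on CPython normally ignoring SIGPIPE at startup, much as the commit says Tarantool does.

```python
import os
import socket


def peer_closed(sock):
    """Read-before-write probe: a readable socket whose recv() returns
    an empty result has been closed by the peer.
    Note: this switches the socket to non-blocking mode."""
    sock.setblocking(False)
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        return False  # nothing to read, so the peer is still open


def write_after_close():
    """Write to a pipe whose read end is closed and return the errno.
    With SIGPIPE ignored, the write fails with EPIPE instead of
    terminating the process."""
    r, w = os.pipe()
    os.close(r)
    try:
        os.write(w, b"x")
    except BrokenPipeError as e:
        return e.errno
    finally:
        os.close(w)
    return None
```

The second helper shows why the probe is unnecessary once SIGPIPE is ignored: the failed write is reported as an ordinary error that the caller can handle.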
    • netbox: fix a bug with ignored reconnect_after · f278d3f0
      Vladislav Shpilevoy authored
      If a remote host is unreachable on the first connection attempt,
      and reconnect_after is set, then netbox state machine enters
      error state, but it must enter error_reconnect. Do it.
      
      The bug was introduced by me in
      d2468dac.
      f278d3f0
    • libev: use clock_gettime on OS X if available · 10af1cb1
      Vladimir Davydov authored
      EV_USE_REALTIME and EV_USE_MONOTONIC, which force libev to use
      clock_gettime, are enabled automatically on Linux, but not on OS X. We
      used to forcefully enable them for performance reasons, but this broke
      compilation on certain OS X versions and so was disabled by commit
      d36ba279 ("Fix gh-1777: clock_gettime detected but unavailable in
      macos"). Today we need these features enabled not just because of
      performance, but also to avoid crashes when time changes on the host -
      see issue #2527 and commit a6c87bf9 ("Use ev_monotonic_now/time
      instead of ev_now/time for timeouts"). Fortunately, we have this cmake
      defined macro HAVE_CLOCKGETTIME_DECL, which is set if clock_gettime is
      available. Let's enable EV_USE_REALTIME and EV_USE_MONOTONIC if this
      macro is defined.
      
      Closes #3299
      10af1cb1
  7. Mar 29, 2018
    • Fix net.box test · 405446e0
      Vladislav Shpilevoy authored
      405446e0
    • vinyl: fix discrepancy between vy_log.tx_size and actual tx len · 94569f65
      Vladimir Davydov authored
      When a vylog transaction is rolled back, we always reset vy_log.tx_size.
      Generally speaking, this is incorrect as rollback doesn't necessarily
      remove all pending records from the tx buffer - there still may be
      records committed with vy_log_tx_try_commit() that were left in the
      buffer due to write errors. We don't roll back such records, but we
      still reset tx_size, which leads to a discrepancy between vy_log.tx_size
      and the actual length of vy_log.tx list, which further on results in an
      assertion failure:
      
        src/box/vy_log.c:698: vy_log_flush: Assertion `i < vy_log.tx_size' failed.
      
      We need vy_log.tx_size to allocate xrow_header array of a proper size so
      that we can flush pending vylog records to disk. This isn't a hot path,
      because vylog operations are rare. Besides, we iterate over all
      records anyway to fill the xrow_header array. That said, let's remove
      vy_log.tx_size altogether and instead calculate the vy_log.tx list
      length right in place.
      94569f65
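The reasoning behind dropping the counter can be shown with a toy model of the pending-record buffer. This Python sketch is not the vy_log code: `VyLogTx` and its methods are hypothetical, and it only demonstrates why a length computed from the list itself cannot drift the way a counter that is reset on every rollback can.

```python
class VyLogTx:
    """Toy model of the pending-record buffer (not the real vy_log structures)."""

    def __init__(self):
        self.records = []    # pending records, oldest first
        self.committed = 0   # records[0:committed] were left over by a failed flush

    def write(self, record):
        self.records.append(record)

    def try_commit(self, flush_ok):
        if flush_ok:
            self.records.clear()
            self.committed = 0
        else:
            # Write error: keep the records in the buffer for a later retry.
            self.committed = len(self.records)

    def rollback(self):
        # Drop only the records that were never committed.
        del self.records[self.committed:]

    def pending_count(self):
        # Derived from the list itself, so it always matches the number
        # of records that a flush has to write.
        return len(self.records)
```

If a separate counter were zeroed in `rollback()`, it would read 0 while one record is still pending after a failed flush, which is the kind of mismatch behind the assertion failure quoted above.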
    • vinyl: use rlist for iterating over objects recovered from vylog · 197e1ef0
      Vladimir Davydov authored
      Currently, we use mh_foreach, but each object is on an rlist, which
      suits better for iteration.
      197e1ef0
    • index: add abort_create virtual method · 7dee93a0
      Vladimir Davydov authored
      The new method is called if index creation failed, either due to WAL
      write error or build error. It will be used by Vinyl to purge prepared
      LSM tree from vylog.
      7dee93a0