  1. Apr 05, 2018
  2. Apr 04, 2018
    • vinyl: zap upsert_format · 3a73c2c6
      Vladimir Davydov authored
      The only difference between format of UPSERT statements and format of
      other DML statements of the same index is that the former reserves one
      byte for UPSERT counter, which is needed to schedule UPSERT squashing.
      Since we store UPSERT counter on lsregion now, we don't need a special
      format for UPSERTs anymore. Remove it.
      3a73c2c6
    • vinyl: allocate upsert counter on lsregion · e8147fe7
      Vladimir Davydov authored
      Currently, we store upsert counter in tuple metadata (that's what
      upsert_format is for), but since it's only relevant for tuples of
      the memory level, we can store it on lsregion, right before tuple
      data. Let's do it now so that we can get rid of upsert_format.
      e8147fe7
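The layout described in this commit, a counter placed right before the tuple data, can be sketched roughly as follows. This is a minimal illustration only: it uses `malloc()` in place of lsregion, and the names `mem_tuple_alloc`, `mem_tuple_upsert_counter` and `mem_tuple_free` are hypothetical, not taken from the Tarantool sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical layout: [ 1-byte upsert counter ][ tuple data ... ]
 * The caller only ever sees a pointer to the tuple data.
 * malloc() stands in for the lsregion allocator here (assumption).
 */
static void *
mem_tuple_alloc(const void *data, size_t size)
{
	uint8_t *raw = malloc(sizeof(uint8_t) + size);
	if (raw == NULL)
		return NULL;
	raw[0] = 0;                  /* the counter starts at zero */
	memcpy(raw + 1, data, size); /* tuple data follows the counter */
	return raw + 1;
}

/* The counter lives in the byte right before the tuple data. */
static uint8_t *
mem_tuple_upsert_counter(void *tuple)
{
	assert(tuple != NULL);
	return (uint8_t *)tuple - 1;
}

static void
mem_tuple_free(void *tuple)
{
	assert(tuple != NULL);
	free((uint8_t *)tuple - 1);
}
```

Since the counter sits outside the tuple itself, the tuple format does not need to reserve space for it. A real allocator would also have to keep the tuple data properly aligned, which this sketch ignores.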
    • Merge branch '1.9' into 1.10 · 402f066c
      Kirill Yukhin authored
      402f066c
    • Add 'key_def_new_with_parts' (temporary) · 7d089bbd
      Alexander Turenko authored
      Filed gh-3311 to remove this export soon.
      
      Fixes #3310.
      7d089bbd
  3. Apr 03, 2018
  4. Apr 02, 2018
  5. Mar 30, 2018
    • replication: recover missing local data from replica · eae84efb
      Konstantin Belyavskiy authored
      In case of a sudden power loss, if data was not written to the WAL but
      was already sent to a remote replica, the local instance can't recover
      properly and the datasets diverge. Fix it by using the remote replica's
      data and LSN comparison.
      
      Based on @GeorgyKirichenko proposal and @locker race free check.
      
      Closes #3210
      eae84efb
    • replication: stay in orphan mode until replica is synced by vclock · 7ebc8ae4
      Konstantin Belyavskiy authored
      Stay in orphan (read-only) mode while the local vclock is lower than
      the master's to make sure that datasets are the same across the
      replica set. Update the replication/catch test to reflect the change.
      
      Suggested by @kostja
      
      Needed for #3210
      7ebc8ae4
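The "local vclock is lower than master's" condition can be illustrated with a component-wise comparison. This is a simplified sketch: vclocks are modelled as plain arrays of LSNs indexed by replica id, and `vclock_is_behind` is a hypothetical helper, not the function used in Tarantool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if at least one component of the local vclock is
 * lower than the matching component of the master's vclock, i.e.
 * the local instance has not yet seen everything the master has.
 * Both arrays hold n LSNs (assumed representation).
 */
static bool
vclock_is_behind(const int64_t *local, const int64_t *master, size_t n)
{
	assert(local != NULL && master != NULL);
	for (size_t i = 0; i < n; i++) {
		if (local[i] < master[i])
			return true;
	}
	return false;
}
```

Under this model an instance would stay read-only while `vclock_is_behind()` returns true and leave orphan mode once it returns false.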
    • Update LuaRocks · 3171288c
      Vladimir Davydov authored
      Closes #3148
      3171288c
    • console: do not try to prevent SIGPIPE in text console · 427795fa
      Vladislav Shpilevoy authored
      Text console tried to detect an imminent SIGPIPE by doing a read
      before each write. If a socket is readable, but read returns 0, then
      it is closed, and writing to it can raise SIGPIPE. But Tarantool
      ignores SIGPIPE, so the process will not be terminated;
      write() just returns -1.
      
      The original code checks for SIGPIPE, because when Tarantool is
      run under debugger (gdb or lldb), the debugger by default sets
      its own signal handlers, and SIGPIPE terminates the process.
      
      But debugger settings can be changed to ignore SIGPIPE too, so
      let's remove this overengineering from the console code.
      427795fa
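The behaviour the message relies on, that with SIGPIPE ignored a write to a closed peer simply fails, can be checked with a small POSIX sketch. A pipe is used here instead of a socket for brevity, and `write_to_closed_pipe` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Writes one byte to a pipe whose read end is already closed, with
 * SIGPIPE ignored. Returns the errno reported by write(), 0 if the
 * write unexpectedly succeeded, or -1 if the pipe could not be
 * created.
 */
static int
write_to_closed_pipe(void)
{
	int fds[2];
	signal(SIGPIPE, SIG_IGN);  /* same disposition the message describes */
	if (pipe(fds) != 0)
		return -1;
	close(fds[0]);             /* nobody will ever read from the pipe */
	ssize_t rc = write(fds[1], "x", 1);
	int err = (rc == -1) ? errno : 0;  /* save errno before close() */
	close(fds[1]);
	return err;
}
```

With the signal ignored, the process survives and the error surfaces as `EPIPE`, so a read-before-write probe is not needed to stay alive.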
    • netbox: fix a bug with ignored reconnect_after · f278d3f0
      Vladislav Shpilevoy authored
      If a remote host is unreachable on the first connection attempt,
      and reconnect_after is set, then the netbox state machine enters
      the error state, but it must enter error_reconnect. Fix it.
      
      The bug was introduced by me in
      d2468dac.
      f278d3f0
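The intended transition can be written down as a tiny decision function. This is only an illustration of the rule stated above: netbox itself is implemented in Lua, and the enum, the function name and the convention that a non-positive value means "not set" are assumptions made for this sketch.

```c
#include <assert.h>

/* The two states named in the commit message. */
enum conn_state {
	STATE_ERROR,
	STATE_ERROR_RECONNECT,
};

/*
 * State to enter after a failed connection attempt. A positive
 * reconnect_after (seconds) means that reconnects are enabled;
 * zero or a negative value is treated as "not set" (assumption).
 */
static enum conn_state
state_after_connect_failure(double reconnect_after)
{
	assert(reconnect_after == reconnect_after); /* reject NaN */
	return reconnect_after > 0 ? STATE_ERROR_RECONNECT : STATE_ERROR;
}
```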
    • libev: use clock_gettime on OS X if available · 10af1cb1
      Vladimir Davydov authored
      EV_USE_REALTIME and EV_USE_MONOTONIC, which force libev to use
      clock_gettime, are enabled automatically on Linux, but not on OS X. We
      used to forcefully enable them for performance reasons, but this broke
      compilation on certain OS X versions and so was disabled by commit
      d36ba279 ("Fix gh-1777: clock_gettime detected but unavailable in
      macos"). Today we need these features enabled not just because of
      performance, but also to avoid crashes when time changes on the host -
      see issue #2527 and commit a6c87bf9 ("Use ev_monotonic_now/time
      instead of ev_now/time for timeouts"). Fortunately, we have this cmake
      defined macro HAVE_CLOCKGETTIME_DECL, which is set if clock_gettime is
      available. Let's enable EV_USE_REALTIME and EV_USE_MONOTONIC if this
      macro is defined.
      
      Closes #3299
      10af1cb1
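A rough sketch of what the change amounts to: the libev feature macros are enabled only when the build system reports that `clock_gettime` is declared, and time is then read from the monotonic clock. Where exactly the guard lives in the build is an assumption, and `monotonic_now` is a hypothetical helper, not a libev function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() in strict modes */
#endif

#include <assert.h>
#include <time.h>

/* Shape of the guard described above; its placement is an assumption. */
#ifdef HAVE_CLOCKGETTIME_DECL
# define EV_USE_REALTIME 1
# define EV_USE_MONOTONIC 1
#endif

/*
 * Seconds since an unspecified starting point, read from the
 * monotonic clock. Unlike wall-clock time, the value does not jump
 * when the host time is changed.
 */
static double
monotonic_now(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
	assert(rc == 0);
	(void)rc;
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Timeouts computed from such a clock are not affected by changes of the host time, which is the crash scenario the message refers to.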
  6. Mar 29, 2018
    • Fix net.box test · 405446e0
      Vladislav Shpilevoy authored
      405446e0
    • vinyl: fix discrepancy between vy_log.tx_size and actual tx len · 94569f65
      Vladimir Davydov authored
      When a vylog transaction is rolled back, we always reset vy_log.tx_size.
      Generally speaking, this is incorrect as rollback doesn't necessarily
      remove all pending records from the tx buffer - there still may be
      records committed with vy_log_tx_try_commit() that were left in the
      buffer due to write errors. We don't roll back such records, but we
      still reset tx_size, which leads to a discrepancy between vy_log.tx_size
      and the actual length of vy_log.tx list, which further on results in an
      assertion failure:
      
        src/box/vy_log.c:698: vy_log_flush: Assertion `i < vy_log.tx_size' failed.
      
      We need vy_log.tx_size to allocate xrow_header array of a proper size so
      that we can flush pending vylog records to disk. This isn't a hot path,
      because vylog operations are rare. Besides, we iterate over all
      records anyway to fill the xrow_header array. That said, let's remove
      vy_log.tx_size altogether and instead calculate the vy_log.tx list
      length right in place.
      94569f65
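Computing the length in place, instead of maintaining a separate counter that can drift out of sync, might look like the sketch below. The `struct record` type and `tx_list_length` are hypothetical; the real code works on Tarantool's own list structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending record, linked into a singly-linked list. */
struct record {
	struct record *next;
	int payload;
};

/*
 * Counts the records by walking the list. The result cannot
 * disagree with the list contents, unlike a counter that is kept
 * separately and reset on rollback.
 */
static size_t
tx_list_length(const struct record *head)
{
	size_t n = 0;
	for (const struct record *r = head; r != NULL; r = r->next)
		n++;
	assert(head == NULL || n > 0);
	return n;
}
```

The walk is linear in the number of pending records, which matches the argument above that all records are visited anyway when the array is filled.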
    • vinyl: use rlist for iterating over objects recovered from vylog · 197e1ef0
      Vladimir Davydov authored
      Currently, we use mh_foreach, but each object is on an rlist, which
      is better suited for iteration.
      197e1ef0
    • index: add abort_create virtual method · 7dee93a0
      Vladimir Davydov authored
      The new method is called if index creation failed, either due to WAL
      write error or build error. It will be used by Vinyl to purge prepared
      LSM tree from vylog.
      7dee93a0
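The general pattern of such a virtual method, a function pointer in a vtab with a no-op default and an engine-specific override, can be sketched as follows. Only the method name `abort_create` comes from the commit; the structures, the `cleanup_calls` field and the other function names are simplified assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct index;

/* A reduced vtab with only the method discussed here. */
struct index_vtab {
	/* Called if index creation failed (WAL write or build error). */
	void (*abort_create)(struct index *index);
};

struct index {
	const struct index_vtab *vtab;
	int cleanup_calls; /* hypothetical, for demonstration only */
};

/* Default for engines that have nothing to clean up. */
static void
generic_index_abort_create(struct index *index)
{
	(void)index;
}

/* Stand-in for an engine that must undo prepared state. */
static void
demo_index_abort_create(struct index *index)
{
	index->cleanup_calls++;
}

static const struct index_vtab generic_vtab = { generic_index_abort_create };
static const struct index_vtab demo_vtab = { demo_index_abort_create };

/* Dispatch wrapper that a caller would invoke on failure. */
static void
index_abort_create(struct index *index)
{
	assert(index != NULL && index->vtab != NULL);
	index->vtab->abort_create(index);
}
```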
    • Fix net.box test · d9e254f8
      Vladislav Shpilevoy authored
      d9e254f8
    • Merge branch '1.9' into 1.10 · 97cc085f
      Konstantin Osipov authored
      97cc085f
    • log: Fix logging large objects · 5ab4581d
      Ilya Markov authored
      The bug was that when logging, we passed to the write function a
      number of bytes that could exceed the size of the buffer. This could
      happen because we format the log string with vsnprintf, which returns
      the number of bytes that would have been written had the buffer been
      large enough, not the number actually written.
      
      Fix this by limiting the number of bytes passed to the write function.
      
      Close #3248
      5ab4581d
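The pitfall and the fix can be shown with a small wrapper around `vsnprintf`. This is a generic sketch of the clamping idea, not the code from the patch; `format_clamped` is a hypothetical name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Formats into buf and returns the number of bytes that are really
 * stored there (excluding the terminating zero), which is safe to
 * pass on to write(). vsnprintf() itself returns the length the
 * full string would have had, which may exceed the buffer size.
 */
static size_t
format_clamped(char *buf, size_t size, const char *fmt, ...)
{
	assert(size > 0);
	va_list ap;
	va_start(ap, fmt);
	int rc = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (rc < 0)
		return 0;          /* formatting error: nothing to write */
	if ((size_t)rc >= size)
		return size - 1;   /* output was truncated to the buffer */
	return (size_t)rc;
}
```

For example, formatting a 10-character string into an 8-byte buffer makes `vsnprintf` return 10, while only 7 characters plus the terminating zero are stored; the wrapper returns 7.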
    • Merge branch '1.9' into 1.10 · 180af15f
      Konstantin Osipov authored
      180af15f
    • vinyl: improve latency stat · f3a84293
      Vladimir Davydov authored
      To facilitate performance analysis, let's report not only 99th
      percentile, but also 50th, 75th, 90th, and 95th. Also, let's add
      microsecond-granular buckets to the latency histogram.
      
      Closes #3207
      f3a84293
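How several percentiles can be read off one bucketed histogram is sketched below. The bucket layout (ascending upper bounds with a count per bucket) and the function `latency_percentile` are assumptions for illustration; they are not the histogram API used by vinyl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * bounds[i] is the upper bound of bucket i (for example, in
 * microseconds), in ascending order; counts[i] is the number of
 * observations that fell into bucket i. Returns the upper bound of
 * the first bucket at which the cumulative count reaches pct
 * percent of all observations.
 */
static int64_t
latency_percentile(const int64_t *bounds, const uint64_t *counts,
		   size_t n, int pct)
{
	assert(n > 0 && pct >= 0 && pct <= 100);
	uint64_t total = 0;
	for (size_t i = 0; i < n; i++)
		total += counts[i];
	/* Rank of the requested observation, rounded up. */
	uint64_t rank = (total * (uint64_t)pct + 99) / 100;
	uint64_t seen = 0;
	for (size_t i = 0; i < n; i++) {
		seen += counts[i];
		if (seen >= rank)
			return bounds[i];
	}
	return bounds[n - 1];
}
```

The precision of the result is limited by the bucket width, which is why finer buckets at the low end of the range make the reported values more useful.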
    • say: Fix log_rotate · 26a4effe
      Ilya Markov authored
      * Refactor tests.
      * Add ev_async and fiber_cond for thread-safe log_rotate usage.
      
      Follow up #3015
      26a4effe
    • log: Fix logger.test.lua · d0dcc8b9
      Ilya Markov authored
      Fix a race condition in the log_rotate test.
      The test opened a file that must be created by log_rotate and read
      from it. But as log_rotate is executed in a separate thread, the file
      may not be created or the log may not be written yet by the time the
      test opens it.
      
      Fix this by waiting for the file to be created and for the line to
      become readable.
      d0dcc8b9
    • netbox: deprecate console support · bd06e32a
      Vladislav Shpilevoy authored
      Print a warning about that. After a while the console support will
      be deleted from netbox.
      bd06e32a
    • console: do not use netbox for console text connections · 1730c538
      Vladislav Shpilevoy authored
      Netbox console support complicates both netbox and console. Let's
      use sockets directly for text protocol.
      
      Part of #2677
      1730c538
    • netbox: allow to create a netbox connection from existing socket · d2468dac
      Vladislav Shpilevoy authored
      It is needed to create a binary console connection, when a
      socket is already created and a greeting is read and decoded.
      d2468dac
    • bloom: drop spectrum · bc859dce
      Vladimir Davydov authored
      As it was pointed out earlier, the bloom spectrum concept is rather
      dubious, because its overhead for a reasonable false positive rate is
      about 10 bytes per record while storing all hashes in an array takes
      only 4 bytes per record so one can stash all hashes and count records
      first, then create the optimal bloom filter and add all hashes there.
      bc859dce
    • bloom: optimize tuple bloom filter size · 4357bcf3
      Vladimir Davydov authored
      When we check if a multi-part key is hashed in a bloom filter, we check
      all its sub keys as well so the resulting false positive rate will be
      equal to the product of the false positive rates of the bloom filters
      created for each sub key.
      
      The false positive rate of a bloom filter is given by the formula:
      
        f = (1 - exp(-kn/m)) ^ k
      
      where m is the number of bits in the bloom filter, k is the number of
      hash functions, and n is the number of elements hashed in the filter.
      By varying n, we can estimate the false positive rate of an existing
      bloom filter when used for a greater number of elements, in other words
      we can estimate the false positive rate of a bloom filter created for
      checking sub keys when used for checking full keys.
      
      Knowing this, we can adjust the target false positive rate of a bloom
      filter used for checking keys of a particular length based on false
      positive rates of bloom filters used for checking its sub keys. This
      will reduce the number of hash functions required to conform to the
      configured false positive rate and hence the bloom filter size.
      
      Follow-up #3177
      4357bcf3
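The adjustment described in this commit can be written out as a short derivation. It is a sketch under two assumptions that the message does not spell out: the filters for different sub keys produce false positives independently, and each sub key filter is evaluated at the number of full keys. The symbols f_i, n_p, p and the "target" rate are notation introduced here.

```latex
% False positive rate of the filter for sub keys of length i
% (m_i bits, k_i hash functions) when used for n elements:
f_i(n) = \left(1 - e^{-k_i n / m_i}\right)^{k_i}

% A lookup of a key with p parts checks all p filters, so the
% resulting rate is the product over the filters, evaluated at
% the number of full keys n_p:
f_{\mathrm{total}} = \prod_{i=1}^{p} f_i(n_p)

% To meet the configured rate f, the filter for the full key only
% has to achieve
f_p^{\mathrm{target}} = \frac{f}{\prod_{i=1}^{p-1} f_i(n_p)}

% Example: f = 0.05 and a single sub key filter with
% f_1(n_2) = 0.5 give f_2^{target} = 0.05 / 0.5 = 0.1.
```

For an optimally sized filter the number of bits per element is -ln(f) / (ln 2)^2, so relaxing the target from 0.05 to 0.1 in this example lowers the cost from about 6.2 to about 4.8 bits per element.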
    • vinyl: introduce bloom filters for partial key lookups · fc654aaf
      Vladimir Davydov authored
      Currently, we store and use bloom only for full-key lookups. However,
      there are use cases when we can also benefit from maintaining bloom
      filters for partial keys as well - see #3177 for example. So this patch
      replaces the current full-key bloom filter with a multipart one, which
      is basically a set of bloom filters, one per each partial key. Old bloom
      filters stored on disk will be recovered as is so users will see the
      benefit of this patch only after major compaction takes place.
      
      When a key or tuple is checked against a multipart bloom filter, we
      check all its partial keys to reduce the false positive rate.
      Nevertheless there's no size optimization as of now. E.g. even if the
      cardinality of a partial key is the same as of the full key, we will
      still store two full-sized bloom filters although we could probably save
      some space in this case by assuming that checking against the bloom
      corresponding to a partial key would reduce the false positive rate of
      full key lookups. This is addressed later in the series.
      
      Before this patch we used a bloom spectrum object to construct a bloom
      filter. A bloom spectrum is basically a set of bloom filters ranging in
      size. The point of using a spectrum is that we don't know what the run
      size will be while we are writing it so we create 10 bloom filters and
      choose the best of them after we are done. With the default bloom fpr of
      0.05 it is 10 byte overhead per record, which seems to be OK. However,
      if we try to optimize other parameters as well, e.g. the number of hash
      functions, the cost of a spectrum will become prohibitive. Funny thing
      is a tuple hash is only 4 bytes long, which means if we stored all
      hashes in an array and built a bloom filter after we'd written a run, we
      would reduce the memory footprint by more than half! And that would only
      slightly increase the run write time as scanning a memory map of hashes
      and constructing a bloom filter is cheap in comparison to merging runs.
      Putting it all together, we stop using bloom spectrum in this patch,
      instead we stash all hashes in a new bloom builder object and use them
      to build a perfect bloom filter after the run has been written and we
      know the cardinality of each partial key.
      
      Closes #3177
      fc654aaf
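The "stash hashes first, build the filter afterwards" idea can be sketched as follows. This is a simplified stand-in, not the bloom builder from the patch: the structures and functions are hypothetical (only the name `bloom_maybe_has` mirrors the rename elsewhere in this log), the filter size is given as a bits-per-element parameter instead of being derived from a false positive rate, and the probe positions are derived from the single 32-bit hash by simple double hashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Growable array of 4-byte hashes collected while a run is written. */
struct hash_stash {
	uint32_t *hashes;
	size_t count;
	size_t capacity;
};

static int
hash_stash_add(struct hash_stash *s, uint32_t hash)
{
	if (s->count == s->capacity) {
		size_t cap = s->capacity ? s->capacity * 2 : 64;
		uint32_t *p = realloc(s->hashes, cap * sizeof(*p));
		if (p == NULL)
			return -1;
		s->hashes = p;
		s->capacity = cap;
	}
	s->hashes[s->count++] = hash;
	return 0;
}

struct simple_bloom {
	uint8_t *bits;
	size_t nbits;
	unsigned nhashes;
};

/* Position of the i-th probe for a hash (double hashing). */
static size_t
bloom_bit(const struct simple_bloom *b, uint32_t hash, unsigned i)
{
	uint32_t step = ((hash >> 16) | (hash << 16)) | 1u;
	return (size_t)(((uint64_t)hash + (uint64_t)i * step) % b->nbits);
}

static bool
bloom_maybe_has(const struct simple_bloom *b, uint32_t hash)
{
	for (unsigned i = 0; i < b->nhashes; i++) {
		size_t bit = bloom_bit(b, hash, i);
		if ((b->bits[bit / 8] & (1u << (bit % 8))) == 0)
			return false;
	}
	return true;
}

/*
 * Builds the filter once the number of elements is known, so the
 * bitmap is sized for the actual cardinality.
 */
static int
bloom_build(struct simple_bloom *b, const struct hash_stash *s,
	    size_t bits_per_elem, unsigned nhashes)
{
	assert(bits_per_elem > 0 && nhashes > 0);
	size_t nbits = s->count * bits_per_elem;
	if (nbits == 0)
		nbits = 8;
	b->bits = calloc((nbits + 7) / 8, 1);
	if (b->bits == NULL)
		return -1;
	b->nbits = nbits;
	b->nhashes = nhashes;
	for (size_t n = 0; n < s->count; n++) {
		for (unsigned i = 0; i < nhashes; i++) {
			size_t bit = bloom_bit(b, s->hashes[n], i);
			b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
		}
	}
	return 0;
}
```

The stash costs 4 bytes per record while the run is being written, and exactly one filter is allocated at the end, sized for the number of hashes that were actually collected.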
    • bloom: rename bloom_possible_has to bloom_maybe_has · f03fd4db
      Vladimir Davydov authored
      Suggested by @kostja
      f03fd4db
    • bloom: use malloc for bitmap allocations · 78df5acd
      Vladimir Davydov authored
      There's absolutely no point in using mmap() instead of malloc() for
      bitmap allocation - malloc() will fall back on mmap() anyway provided
      the allocation is large enough.
      
      Note about the unit test: since we don't round the bloom filter size up
      to a multiple of page size anymore, we have to use a more sophisticated
      hash function for the test to pass.
      78df5acd
    • test: vinyl/layout: fix bloom filter filtering in output · 88c4c19a
      Vladimir Davydov authored
      We filter bloom filters, because they depend on ICU version and hence
      the test output may vary from one platform to another (see commit
      0a37ccad "Filter out bloom_filter in vinyl/layout.test.lua").
      However, using test_run for this is unreliable, because a bloom string
      can contain newline characters and hence be split in multiple lines in
      console output, in which case the filter won't work. Fix this by
      filtering bloom_filter manually.
      88c4c19a
    • Merge branch '1.9' into 1.10 · 7ee84c95
      Vladislav Shpilevoy authored
      7ee84c95
    • netbox: show is_nullable and collation fields · cc935d24
      Kirill Shcherbatov authored
      Netbox does not need nullability or collation info, but some
      customers do. Let's fill index parts with these fields.
      
      Fixes #3256
      cc935d24
  7. Mar 28, 2018
    • tuple: add names_only option to build true dictionary · 5ad26fe2
      Kirill Shcherbatov authored
      Now the tuple:tomap() method returns a map keyed by both field names
      and field indexes, which refer to the same field values. It is done to
      1) allow to still access tomap() result like a tuple, by indexes;
      2) allow to access non-named fields.
      
      But this is not useful when the resulting map must be saved somewhere,
      for example, in JSON, where all keys must be strings. So allow to
      get this behaviour using tuple:tomap({names_only = true}).
      
      Fixes #3280
      5ad26fe2