  1. Feb 11, 2019
• sql: clean up lemon acttab_free() a bit · 8acf2939
      Alexander Turenko authored
Nikita Pettik pointed out that free(NULL) is a no-op according to POSIX.
      
This is a follow-up to 9dbcaa3a.
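For illustration, a minimal sketch of the kind of cleanup this enables
(hypothetical code, not the actual lemon source): a NULL check guarding
free() can simply be dropped.

#include <stdlib.h>

struct acttab { int *aAction; int *aLookahead; };

/* Before the cleanup each free() was guarded by an explicit NULL
 * check; free(NULL) being a no-op makes such guards redundant. */
void acttab_free(struct acttab *p)
{
        if (p == NULL)
                return;
        free(p->aAction);
        free(p->aLookahead);
        free(p);
}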
• replication: do not fetch records twice · ae938677
      Konstantin Belyavskiy authored
This is a draft paper covering the following topics:
1. A draft protocol for discovering and maintaining the network topology
in the case of a large arbitrary network.
2. A list of required changes to support this feature.
3. Open questions and alternatives.

Changes in V2 (based on Vlad's review):
1. Rewrote a couple of sections to make them clearer.
2. Clarified details and added examples.
3. Fixed an error.
      
      RFC for #3294
• vinyl: fix compaction priority calculation · d5ceb204
      Vladimir Davydov authored
      When computing the number of runs that need to be compacted for a range
      to conform to the target LSM tree shape, we use the newest run size for
      the size of the first LSM tree level. This isn't quite correct for two
      reasons.
      
      First, the size of the newest run is unstable - it may vary in a
      relatively wide range from dump to dump. This leads to frequent changes
      in the target LSM tree shape and, as a result, unpredictable compaction
      behavior. In particular this breaks compaction randomization, which is
      supposed to smooth out IO load generated by compaction.
      
Second, this can increase space amplification. We trigger compaction at
the last level when there's more than one run, irrespective of the value
of the run_count_per_level configuration option. We expect this to keep
      space amplification below 2 provided run_count_per_level is not greater
      than (run_size_ratio - 1). However, if the newest run happens to have
      such a size that multiplying it by run_size_ratio several times gives us
      a value only slightly less than the size of the oldest run, we can
      accumulate up to run_count_per_level more runs that are approximately as
      big as the last level run without triggering compaction, thus increasing
      space amplification by up to run_count_per_level.
      
To fix these problems, let's use the oldest run size for computing the
size of the first LSM tree level: simply divide it by run_size_ratio
while the result still exceeds the size of the newest run.
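A minimal sketch of that rule (illustrative names, not the actual vinyl
source; both sizes are assumed positive and run_size_ratio greater
than 1):

#include <stdint.h>

/* Derive the target size of the first LSM tree level from the oldest
 * (last-level) run, which is stable from dump to dump, instead of the
 * volatile newest run. */
static int64_t
first_level_size(int64_t oldest_run_size, int64_t newest_run_size,
                 double run_size_ratio)
{
        int64_t size = oldest_run_size;
        /* Divide by the ratio while the result would still exceed
         * the newest run size. */
        while (size / run_size_ratio >= newest_run_size)
                size /= run_size_ratio;
        return size;
}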
      
      Follow-up #3657
• box: enable WAL before making initial checkpoint · c6743038
      Vladimir Davydov authored
While a replica is being bootstrapped from a remote master, the vinyl
engine may need to perform compaction, which means that it may write to
the _vinyl_deferred_delete system space. Compaction proceeds fully
asynchronously, i.e. a write may occur after the join stage is
complete, but before the WAL is initialized, in which case the new
replica will crash. To make sure a race like that can't happen, let's
set up the WAL before making the initial checkpoint. The WAL writer is
now initialized right before starting the WAL thread, so we don't need
to split the WAL struct into the thread and the writer parts anymore.
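A self-contained sketch of the ordering change, with stand-in functions
(illustrative, not the actual box bootstrap code):

#include <stdio.h>

/* Stand-ins for the real bootstrap, WAL and checkpoint routines. */
static void bootstrap_from_master(void)   { puts("join stage done"); }
static void wal_enable(void)              { puts("wal ready"); }
static void make_initial_checkpoint(void) { puts("checkpoint made"); }

int main(void)
{
        bootstrap_from_master();
        /* The WAL comes up first, so an asynchronous vinyl compaction
         * write to _vinyl_deferred_delete can no longer hit an
         * uninitialized WAL. */
        wal_enable();
        make_initial_checkpoint();
        return 0;
}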
      
      Closes #3968
  2. Feb 08, 2019
• test: fix xlog/panic_on_broken_lsn spurious failure · f1bd33a8
      Vladimir Davydov authored
If this test is executed after some other test that bumps the LSN,
the output line gets truncated differently, because greater LSNs may
increase its length. Fix this by filtering out the LSN manually.
      
      Closes #3970
• sql: raise an err on CHECK constraint with ON CONFLICT action · 6aa0ef4a
      Ivan Koptelov authored
Currently, all ON CONFLICT actions are silently ignored for CHECK
constraints. This patch adds an explicit parse-time error.
      
      Closes #3345
• Remove affinity from field definition · b9afef16
      Nikita Pettik authored
      Closes #3698
• sql: clean-up affinity from SQL source code · 037e2e44
      Nikita Pettik authored
Replace the remains of affinity usage in the SQL parser, query
optimizer and VDBE. Don't add affinity to the field definition when a
table is encoded into msgpack. Remove the field type <-> affinity
converters, since now we can operate directly on field types.
      
      Part of #3698
• sql: replace affinity with field type in struct Expr · 2dd8444f
      Nikita Pettik authored
Also, this patch resolves an issue with wrong query plans for selects
on spaces created from Lua: in most cases a table scan was used
instead of an index search. This happened because indexes were checked
for affinity compatibility with the space format, so if a space was
created without affinity in its format, its indexes would never be
used. Now all checks are based on field types, and as a result the
query optimizer is able to choose the correct index.
      
      Closes #3886
      Part of #3698
• sql: replace affinity with field type for VDBE runtime · 5a561326
      Nikita Pettik authored
This stage of affinity removal requires introducing an auxiliary
intermediate function to convert an array of affinity values to field
type values, as sketched below. The rest of the job done in this
commit is straightforward refactoring.
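A minimal sketch of such a converter (hypothetical names; the affinity
character codes follow SQLite's encoding and are shown for
illustration only):

/* Tarantool field types, abridged for the sketch. */
enum field_type {
        FIELD_TYPE_INTEGER,
        FIELD_TYPE_NUMBER,
        FIELD_TYPE_STRING,
        FIELD_TYPE_SCALAR,
};

/* Map one legacy SQLite affinity character to a field type. */
static enum field_type
affinity_to_field_type(char affinity)
{
        switch (affinity) {
        case 'D': return FIELD_TYPE_INTEGER; /* INTEGER affinity */
        case 'E': return FIELD_TYPE_NUMBER;  /* REAL affinity */
        case 'B': return FIELD_TYPE_STRING;  /* TEXT affinity */
        default:  return FIELD_TYPE_SCALAR;  /* BLOB and the rest */
        }
}

/* The auxiliary routine the commit describes: convert a whole array
 * of affinities at once. */
static void
affinities_to_field_types(const char *aff, int n, enum field_type *out)
{
        for (int i = 0; i < n; i++)
                out[i] = affinity_to_field_type(aff[i]);
}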
      
      Part of #3698
• sql: replace affinity with field type for func · 00758981
      Nikita Pettik authored
Let's use field_type instead of affinity as the type of the return
value of a user function registered in SQL. Moreover, let's assign the
return value type to the expression representing a function call,
which allows taking it into account during derived type calculation.
      
      Part of #3698
• sql: remove numeric affinity · 758ab1a4
      Nikita Pettik authored
Numeric affinity in SQLite means the same as real, except that it
forces floating point values into integer representation when they can
be converted without loss (e.g. 2.0 -> 2). Since in the Tarantool core
there is no difference between numeric and real values (both are
stored as values of the Tarantool type NUMBER), let's remove numeric
affinity and use real instead.

The only real pitfall is the implicit conversion mentioned above. We
can't pass *.0 as an iterator value, since our fast comparators
(TupleCompare, TupleCompareWithKey) are designed to work only with
values of the same MP_ type: they do not use the slow
tuple_compare_field(), which is able to compare a double and an
integer. The solution to this problem is simple: let's always attempt
to encode floats as ints when the conversion is lossless (see the
sketch below). This is a straightforward approach, but to implement it
we need to take care of the reverse (decoding) situation.
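A sketch of the encode-side idea, assuming the msgpuck API
(mp_encode_uint(), mp_encode_int(), mp_encode_double()); this is
illustrative, not the actual VDBE encoder, and range checks for huge
doubles are omitted:

#include <stdint.h>
#include "msgpuck.h"

/* Encode a double as a msgpack integer when the conversion is
 * lossless (e.g. 2.0 -> 2); otherwise keep it a double. */
static char *
mp_encode_number_compact(char *pos, double v)
{
        int64_t i = (int64_t)v;
        if ((double)i == v)
                return i >= 0 ? mp_encode_uint(pos, (uint64_t)i)
                              : mp_encode_int(pos, i);
        return mp_encode_double(pos, v);
}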
      
OP_Column fetches the msgpack field with the given number and stores
it as a native VDBE memory object. The type of that memory is based on
the type of the msgpack value. So, if a space field is of type NUMBER
and holds the value 1, the type of the VDBE memory will be INT (after
decoding), not the float 1.0. As a result, further calculations may be
wrong: for instance, we could get integer division instead of floating
point division. To cope with this problem, let's add an auxiliary
conversion to the decoding routine, which uses the space format of the
tuple being decoded. It is worth mentioning that ephemeral spaces
don't have a space format, so there we rely on the types of the key
parts. Finally, the internal VDBE merge sorter also operates on
entries encoded into msgpack. To fix this case, we check the types of
the ORDER BY/GROUP BY arguments: if they are of type float, we emit an
additional opcode, OP_AffinityReal, to force the float type after
encoding.
      
      Part of #3698
• sql: use field type instead of affinity for type_def · 82298c55
      Nikita Pettik authored
Also, this allows delaying affinity assignment to the field def until
the table format is encoded.
      
      Part of #3698
• sql: remove SQLITE_ENABLE_UPDATE_DELETE_LIMIT define · 43ed060f
      Nikita Pettik authored
The code under this define is dead. What is more, it uses affinity, so
let's remove it along with the tests related to it.
      
      Needed for #3698
• replication: promote tx vclock only after successful wal write · 056deb2c
      Georgy Kirichenko authored
The applier used to promote the vclock prior to applying a row. This
led to a situation where a master's row would be skipped forever if an
error occurred while trying to apply it. However, some errors are
transient, and we might be able to successfully apply the same row
later.

While we're at it, make the WAL writer the only one responsible for
advancing the replicaset vclock. It was already doing so for rows
coming from the local instance. Besides, this makes the code cleaner,
since now we advance the vclock directly from the WAL batch reply, and
it lets us get rid of unnecessary checks of whether the applier or the
WAL has already advanced the vclock.
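A self-contained sketch of the new ordering, with stand-in types
(illustrative pseudocode, not the actual applier/WAL source):

#include <stdbool.h>

/* Stand-in per-replica vclock and a WAL write that may fail
 * transiently. */
static long long vclock[32];
bool wal_write_row(int replica_id, long long lsn);

/* Promote the vclock only after the write has succeeded: a failed
 * write leaves the LSN unconsumed, so the same row can be re-applied
 * later. */
static bool
apply_row(int replica_id, long long lsn)
{
        if (!wal_write_row(replica_id, lsn))
                return false;        /* vclock untouched, row may retry */
        vclock[replica_id] = lsn;    /* advance only on success */
        return true;
}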
      
      Closes #2283
      Prerequisite #980
• wal: do not promote wal vclock for failed writes · 066b929b
      Georgy Kirichenko authored
The WAL used to promote the vclock prior to writing a row. This led to
a situation where a master's row would be skipped forever if an error
occurred while trying to write it. However, some errors are transient,
and we might be able to successfully apply the same row later. So we
do not promote the writer vclock, in order to be able to restart
replication from the point of failure.

Obsoletes xlog/panic_on_lsn_gap.test.
      
      Needed for #2283
  3. Feb 07, 2019
  4. Feb 06, 2019
• vinyl: use uncompressed run size for range split/coalesce/compaction · 3313009d
      Vladimir Davydov authored
Historically, when considering splitting or coalescing a range or
updating compaction priority, we used the sizes of compressed runs
(see bytes_compressed). This makes the algorithms dependent on whether
compression is used and on how effective it is, which is weird,
because compression is a way of storing data on disk - it shouldn't
affect the way data is partitioned. E.g. if we turned off compression
at the first LSM tree level, which would make sense because it's
relatively small, we would thereby affect the compaction algorithm.

With that in mind, let's use uncompressed run sizes when considering
range tree transformations.
• Fix tarantool -e "os.exit()" hang · 3a851430
      Serge Petrenko authored
After the patch that made os.exit() execute on_shutdown triggers
(see commit 6dc4c8d7), we relied on on_shutdown triggers to break the
ev_loop and exit tarantool. However, there is an auxiliary event loop
run in tarantool_lua_run_script() to reschedule the fiber executing
chunks of code passed via the -e option and running the interactive
mode. This event loop is started only to execute the interactive mode
and doesn't exist during the execution of -e chunks. Make sure we
don't start it if os.exit() was already executed in one of the chunks.
      
      Closes #3966
• Fix fiber_join() hang in case fiber_cancel() was called · d69c149f
      Serge Petrenko authored
If a fiber joining another fiber gets cancelled, it stays suspended
forever and never finishes joining. This happens because
fiber_cancel() wakes the fiber up and removes it from all execution
queues. Fix this by adding the fiber back to the wakeup queue of the
joined fiber after each yield, as sketched below.
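A sketch of the joining loop after the fix (illustrative fragment:
fiber(), fiber_yield(), fiber_is_dead() and the rlist helpers are
Tarantool internals, and the exact code differs):

/* Re-subscribe to the joined fiber's wake list before every yield:
 * even if fiber_cancel() removed the joiner from all queues, the next
 * iteration registers it again, so it cannot stay suspended forever. */
while (!fiber_is_dead(joined)) {
        rlist_add_tail_entry(&joined->wake, fiber(), state);
        fiber_yield();
}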
      
      Closes #3948
• replication: downstream status reporting in box.info · fcf43533
      Serge Petrenko authored
Start showing the downstream status for relays in the "follow" state.
Also refactor lbox_pushrelay to unify the code for different relay
states.
      
      Closes #3904
  5. Feb 05, 2019
  6. Feb 04, 2019
• Update msgpuck library to fix compilation on clang · 76edf94b
      Vladimir Davydov authored
The patch adds the missing -fPIC option for clang, without which the
msgpuck library might fail to compile.
• box: specify indexes in user-friendly form · a754980d
      Kirill Shcherbatov authored
Implemented a more convenient interface for creating an index by JSON
path. Instead of specifying a fieldno and a relative path, it is now
possible to pass the full JSON path to the data.
      
      Closes #1012
      
      @TarantoolBot document
      Title: Indexes by JSON path
Sometimes field data has a complex document structure. When this
structure is consistent across the whole space, you can create an
index by JSON path.
      
      Example:
      s = box.schema.space.create('sample')
      format = {{'id', 'unsigned'}, {'data', 'map'}}
      s:format(format)
      -- explicit JSON index creation
age_idx = s:create_index('age', {parts = {{2, 'number', path = "age"}}})
      -- user-friendly syntax for JSON index creation
      parts = {{'data.FIO["fname"]', 'str'}, {'data.FIO["sname"]', 'str'},
           {'data.age', 'number'}}
info_idx = s:create_index('info', {parts = parts})
      s:insert({1, {FIO={fname="James", sname="Bond"}, age=35}})
• box: introduce offset_slot cache in key_part · e2df0af2
      Kirill Shcherbatov authored
tuple_field_by_part looks up the tuple_field corresponding to the
given key part in tuple_format in order to quickly retrieve the offset
of indexed data from the tuple field map. For regular indexes this
operation is blazing fast; however, for JSON indexes it is not, as we
have to parse the path to the data and then do multiple lookups in a
JSON tree. Since tuple_field_by_part is used by comparators, we should
strive to make this routine as fast as possible for all kinds of
indexes.

This patch introduces an optimization that is supposed to make
tuple_field_by_part for JSON indexes as fast as it is for regular
indexes in most cases. We do that by caching the offset slot right in
key_part. There's a catch here, however: we create a new format
whenever an index is dropped or created, and we don't reindex old
tuples. As a result, there may be several generations of tuples in the
same space, all using different formats, while there's only one
key_def used for comparison.

To overcome this problem, we introduce the notion of a tuple_format
epoch. This is a counter incremented each time a new format is
created. We store it in tuple_format and key_def, and we only use the
offset slot cached in a key_def if its epoch coincides with the epoch
of the tuple format. If they don't match, we look up the tuple_field
as before and then update the cached value along with the stored
epoch.
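A minimal sketch of the resulting fast path (illustrative names, not
the actual source):

#include <stdint.h>

struct tuple_format {
        uint64_t epoch;            /* bumped on each new format */
        /* ... */
};

struct key_part {
        uint64_t format_epoch;     /* epoch the cache was filled at */
        int32_t offset_slot_cache; /* cached field map slot */
        /* ... */
};

/* Slow path: parse the JSON path and walk the field tree. */
int32_t
offset_slot_slow(struct tuple_format *f, struct key_part *p);

static int32_t
key_part_offset_slot(struct tuple_format *f, struct key_part *p)
{
        if (p->format_epoch == f->epoch)
                return p->offset_slot_cache;   /* fast path */
        int32_t slot = offset_slot_slow(f, p); /* JSON lookup */
        p->offset_slot_cache = slot;           /* refresh the cache */
        p->format_epoch = f->epoch;
        return slot;
}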
      
      Part of #1012
• box: introduce has_json_paths flag in templates · 8e091047
      Kirill Shcherbatov authored
Introduced a has_json_paths flag for the compare, hash and extract
function templates (which are really hot) to make it possible to avoid
looking at the path field for flat indexes without any JSON paths.
      
      Part of #1012
• box: introduce JSON Indexes · 4273ec52
      Kirill Shcherbatov authored
New JSON indexes allow indexing document content.
First, introduced new key_part fields, path and path_len,
representing the JSON path string specified by the user. The modified
tuple_format_use_key_part routine constructs the corresponding
tuple_field chain in the tuple_format::fields tree for the indexed
data. The resulting tree is used for type checking and for allocating
indexed field offset slots.

Then, the refined tuple_init_field_map routine parses the tuple
msgpack in depth, using a stack allocated on the region, and
initializes the field map with the corresponding tuple_format::field,
if any. Finally, to size the memory allocation for vinyl secondary
keys restored from extracted keys loaded from disk without traversing
the fields tree, introduced the format::min_tuple_size field - the
size of a tuple of this format as if all its leaf fields were zero.
      
      Example:
To create a new JSON index, specify the path to the document data as
part of the key_part:
parts = {{3, 'str', path = '.FIO.fname', is_nullable = false}}
idx = s:create_index('json_idx', {parts = parts})
      idx:select("Ivanov")
      
      Part of #1012
• box: introduce tuple_field_raw_by_path routine · e4a565db
      Kirill Shcherbatov authored
Introduced a new function, tuple_field_raw_by_path, used to get
tuple fields by field index and relative JSON path. This routine
uses the tuple_format's field_map if possible. It will be further
extended to use JSON indexes.
The old tuple_field_raw_by_path routine used to work with full
JSON paths and is renamed to tuple_field_raw_by_full_path. Its
return value type is changed to const char * because the other
similar functions, tuple_field_raw and tuple_field_by_part_raw,
use this convention.
Got rid of reporting the error position for the 'invalid JSON path'
error in lbox_tuple_field_by_path, because we can't extend the other
routines to behave the same way without making the API inconsistent;
moreover, such errors are useless and confusing.
      
      Needed for #1012
• Update msgpuck library · c4f2ffb8
      Kirill Shcherbatov authored
The msgpuck dependency has been updated because the new version
introduces the new mp_stack class, which we will use to parse tuples
without recursion when initializing the field map.
      
      Needed for #1012
  7. Jan 30, 2019
• box: get rid of atexit() for calling cleanup routines · 1bc1fcda
      Serge Petrenko authored
Move the call to tarantool_free() to the end of main().
We needn't call atexit() at all anymore: since we've implemented
on_shutdown triggers and patched os.exit(), when not exiting due to a
fatal signal (in which case no cleanup routines are called anyway),
control always reaches the call to tarantool_free().