  1. Jan 29, 2019
    • iproto: move map creation to sql_response_dump() · 7609069c
      Mergen Imeev authored
      Currently, function sql_response_dump() puts data into an already
      created map. Moving the map creation to sql_response_dump()
      simplifies the code and allows us to use sql_response_dump() as
      one of the port_sql methods.
      
      Needed for #3505
      7609069c
    • tuple: fix on-stack buffer allocation in tuple_hash_field · c267b0b8
      Vladimir Davydov authored
      The buffer is defined in a nested {} block. This gives the compiler the
      liberty to overwrite it once the block has been executed, which would be
      incorrect since the content of the buffer is used outside the {} block.
      This results in box/hash and vinyl/bloom test failures when tarantool is
      compiled in the release mode. Fix this by moving the buffer definition
      to the beginning of the function.
      
      Fixes commit 0dfd99c4 ("tuple: fix hashing of integer numbers").
      c267b0b8
    • tuple: fix hashing of integer numbers · 0dfd99c4
      Vladimir Davydov authored
      Integer numbers stored in tuples as MP_FLOAT/MP_DOUBLE are hashed
      differently from integer numbers stored as MP_INT/MP_UINT. This breaks
      select() for memtx hash indexes and vinyl indexes (the latter use bloom
      filters). Fix this by converting MP_FLOAT/MP_DOUBLE to MP_INT/MP_UINT
      before hashing if the value can be stored as an integer. This is
      consistent with the behavior of tuple comparators, which treat MP_FLOAT
      and MP_INT as equal in case they represent the same number.
      
      Closes #3907
      0dfd99c4
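The normalization described in the commit above can be sketched as follows. This is a minimal illustration, not Tarantool's code: the tagged encoding, the CRC32 hash, and the names `encode`, `normalize`, and `field_hash` are all assumptions made for the example.

```python
import struct
import zlib


def encode(value):
    """Toy tagged encoding (hypothetical, not real MsgPack): type tag + payload."""
    if isinstance(value, int):
        return b"i" + struct.pack(">q", value)
    return b"d" + struct.pack(">d", value)


def normalize(value):
    """Turn a float into an int when it holds an exact signed 64-bit integer."""
    if isinstance(value, float) and value.is_integer() and -2**63 <= value < 2**63:
        return int(value)
    return value


def field_hash(value):
    """Hash the normalized value so that 1 and 1.0 land in the same bucket."""
    return zlib.crc32(encode(normalize(value)))
```

With this normalization, `field_hash(1.0)` and `field_hash(1)` agree, which matches the comparator behavior the commit refers to; without it, the two encodings differ and a hash lookup would miss.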
  2. Jan 25, 2019
    • wal: remove old xlog files asynchronously · 8e429f4b
      Vladimir Davydov authored
      In contrast to TX thread, WAL thread performs garbage collection
      synchronously, blocking all concurrent writes. We expected file removal
      to happen instantly so we didn't bother to offload this job to eio
      threads. However, it turned out that sometimes removal of a single xlog
      file can take 50 or even 100 ms. If there are a dozen files to be
      removed, this means a second delay and 'too long WAL write' warnings.
      
      To fix this issue, let's make WAL garbage collection fully asynchronous.
      Simply submit the jobs to eio and assume they will complete successfully
      sooner or later.  This means that if unlink() fails for some reason, we
      will log an error and not retry file removal until the server is
      restarted. Not a big deal. We can live with it assuming unlink() doesn't
      normally fail.
      
      Closes #3938
      8e429f4b
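The fire-and-forget removal described above might look like the sketch below. It uses a Python thread pool as a stand-in for eio threads; the pool, the logger name, and the function names are assumptions for illustration only.

```python
import logging
import os
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("gc")
# Stand-in for the eio thread pool (an assumption of this sketch).
_pool = ThreadPoolExecutor(max_workers=2)


def _unlink_job(path):
    """Remove one file; on failure, log the error and do not retry."""
    try:
        os.unlink(path)
    except OSError as exc:
        log.error("failed to remove %s: %s", path, exc)


def remove_files_async(paths):
    """Submit one removal job per file and return immediately."""
    return [_pool.submit(_unlink_job, path) for path in paths]
```

The caller never waits on the returned futures in normal operation, so a slow unlink() delays only a worker thread and not the thread that handles writes.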
    • gc: do not abort garbage collection if failed to unlink snap file · 783662fb
      Vladimir Davydov authored
      We build the checkpoint list from the list of memtx snap files. So to
      ensure that it is always possible to recover from any checkpoint present
      in box.info.gc() output, we abort garbage collection if we fail to
      unlink a snap file. This introduces extra complexity to the garbage
      collection code, which makes it difficult to make WAL file removal fully
      asynchronous.
      
      Actually, it looks like we are being overcautious here, because
      unlink() doesn't normally fail so an error while removing a snap file is
      highly unlikely to occur. Besides, even if it happens, it still won't be
      critical, because we never delete the last checkpoint, which is usually
      used for backups/recovery. So let's simplify the code by removing that
      check.
      
      Needed for #3938
      783662fb
    • Allow to reuse tuple_formats for ephemeral spaces · dbbd9317
      Kirill Yukhin authored
      Since ephemeral spaces might be used extensively under heavy
      load with SQL queries, it is possible to run out of
      tuple_formats for such spaces. This occurs because a
      tuple_format is not deleted immediately when an ephemeral space
      is dropped. Its removal is postponed instead and triggered only
      when tuple memory is exhausted.
      Since there's no way to alter an ephemeral space's format,
      let's reuse formats for multiple ephemeral spaces in case
      they're identical.
      
      Closes #3924
      dbbd9317
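One way to picture the reuse described above is a registry keyed by the format definition with a reference count per entry. The class below is a hypothetical sketch; `FormatRegistry` and its methods are not Tarantool names.

```python
class FormatRegistry:
    """Hypothetical registry sharing identical formats between ephemeral spaces."""

    def __init__(self):
        self._formats = {}  # definition key -> [format id, reference count]
        self._next_id = 0

    def acquire(self, field_types):
        """Return the id of an existing identical format or register a new one."""
        key = tuple(field_types)
        entry = self._formats.get(key)
        if entry is None:
            entry = [self._next_id, 0]
            self._next_id += 1
            self._formats[key] = entry
        entry[1] += 1
        return entry[0]

    def release(self, field_types):
        """Drop one reference; forget the format when nobody uses it."""
        key = tuple(field_types)
        entry = self._formats[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self._formats[key]

    def __len__(self):
        return len(self._formats)
```

Because an ephemeral space's format cannot be altered, sharing one entry between identical definitions is safe, and the number of live formats grows with the number of distinct definitions rather than the number of spaces.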
  3. Jan 24, 2019
    • sql: set error type in case of ephemeral space creation failure · 65cd0b11
      Kirill Yukhin authored
      This is a trivial patch which sets the error kind if ephemeral
      spaces cannot be created due to Tarantool's backend (e.g. there's
      no more memory or formats available).
      65cd0b11
    • Set is_temporary flag for formats of ephemeral spaces · 7225084f
      Kirill Yukhin authored
      Before the patch, when an ephemeral space was created, the
      is_temporary flag was set after the space was actually created,
      which in turn led to the corresponding flag of tuple_format
      being set to `false`.
      So, a heavy load using ephemeral spaces (almost any SQL query)
      combined with snapshotting might lead to OOM, since tuples of
      ephemeral spaces were not marked as temporary and were not gc-ed.
      The patch sets the flag in the space definition.
      7225084f
    • Pass necessary fields to tuple_format constructor · 3a18f81d
      Kirill Yukhin authored
      There were three extra fields of tuple_format which were set up
      after it was created. Fix that by extending the tuple_format
      constructor with three new arguments: engine, is_temporary,
      exact_field_count.
      3a18f81d
    • vinyl: ignore unknown .run, .index and .vylog keys · 7bd128ae
      Vladimir Davydov authored
      Currently, if we encounter an unknown key while parsing a .run, .index,
      or .vylog file we raise an error. As a result, if we add a new key to
      any of those entities, we will break forward compatibility although
      there's actually no reason for that. To avoid that, let's silently
      ignore unknown keys, as we do in case of xrow header keys.
      7bd128ae
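The forward-compatible parsing described above amounts to skipping keys the reader does not recognize. A minimal sketch, where the key names and `decode_run_info` are made up for illustration and do not reflect the real file formats:

```python
# Hypothetical set of keys this reader version understands.
KNOWN_KEYS = {"min_lsn", "max_lsn", "page_count"}


def decode_run_info(raw):
    """Keep the known keys of a decoded record and silently skip the rest."""
    info = {}
    for key, value in raw.items():
        if key not in KNOWN_KEYS:
            # A newer writer may have added this key; ignore it, don't fail.
            continue
        info[key] = value
    return info
```

An older reader then loads a file produced by a newer writer and simply does not see the fields it cannot interpret.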
    • vinyl: update lsm->range_heap in one go on dump completion · b07ad8b7
      Vladimir Davydov authored
      Upon LSM tree dump completion, we iterate over all ranges of the LSM
      tree to update their priority and the position in the compaction heap.
      Since typically we need to update all ranges, we'd better use the update_all
      heap method instead of updating the heap entries one by one.
      b07ad8b7
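The difference between the two approaches can be illustrated with Python's `heapq`, which is only an analogy for the heap used in vinyl. Repositioning entries one at a time costs O(log n) per entry, while rebuilding the whole heap at once is O(n). The function names and the priority callback are assumptions of this sketch.

```python
import heapq


def update_one_by_one(ranges, new_priority):
    """Reinsert every range separately: O(n log n) in total."""
    heap = []
    for r in ranges:
        # Negate the priority because heapq implements a min-heap.
        heapq.heappush(heap, (-new_priority(r), r))
    return heap


def update_all(ranges, new_priority):
    """Recompute all priorities first, then restore heap order in one O(n) pass."""
    heap = [(-new_priority(r), r) for r in ranges]
    heapq.heapify(heap)
    return heap
```

Both functions produce a valid heap with the same top element; the second one does less work when nearly every entry changes, which is the situation after a dump.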
    • sql: make IN operator stop ignoring type and collation · d5373cd2
      Nikita Pettik authored
      SQLite discards type and collation of IN operator when it comes with
      only one operand. This leads to different results of straight comparison
      using '=' operator and IN:
      
      SELECT x FROM t1 WHERE x IN (1.0);
      -- Result is empty set
      SELECT x FROM t1 WHERE x = 1.0;
      - - ['1']
      
      Let's remove this strange behavior and always take into consideration
      types and collations of operands.
      
      Closes #3934
      d5373cd2
    • test: update test-run · 0fc536c7
      Alexander Turenko authored
      * Fixed wait_vclock() LSN problem with nil handling (#3895).
      * Enabled HangWatcher under --long.
      * Show result file for a hung test once at the end.
      * Show diff against a result file for a hung test.
      0fc536c7
  4. Jan 16, 2019
    • vinyl: add last level size to statistics · f583b6c8
      Vladimir Davydov authored
      In order to estimate space amplification of a vinyl database, we need to
      know the size of data stored at the last LSM tree level. So this patch
      adds such a counter both per index and globally.
      
      Per-index it is reported under disk.last_level, in rows, bytes, bytes
      after compression, and pages, just like any other disk counter.
      
      Globally it is reported in bytes only under disk.data_compacted. Note,
      to be consistent with disk.data, it doesn't include the last level of
      secondary indexes.
      f583b6c8
    • vinyl: add dump/compaction time to statistics · 64fcd367
      Vladimir Davydov authored
      This patch adds dump_time and compaction_time to the scheduler section
      of global vinyl statistics and disk.dump.time and disk.compaction.time
      to per-index statistics. They report the total time spent doing dump and
      compaction tasks, respectively, and can be useful for estimating average
      disk write rate, which is required for compaction-aware throttling.
      64fcd367
    • vinyl: add task accounting to global scheduler statistics · 01e5d62c
      Vladimir Davydov authored
      This patch adds the following new fields to box.stat.vinyl():
      
        scheduler.tasks_inprogress - number of currently running tasks
        scheduler.tasks_completed - number of successfully completed tasks
        scheduler.tasks_failed - number of aborted tasks
      
      tasks_failed can be useful for monitoring disk write errors while
      tasks_inprogress and tasks_completed can shed light on worker thread
      pool efficiency.
      01e5d62c
    • vinyl: don't account secondary indexes to scheduler.dump_input · ea79b624
      Vladimir Davydov authored
      Indexes of the same space share a memory level so we should account them
      to box.stat.vinyl().scheduler.dump_input once per space dump, not each
      time an index is dumped.
      ea79b624
    • vinyl: add dump count to global scheduler statistics · 39ba4a5f
      Vladimir Davydov authored
      This patch adds scheduler.dump_count to box.stat.vinyl(), which shows
      the number of memory level dumps that have happened since the instance
      startup or box.stat.reset(). It's useful for estimating an average size
      of a single memory dump, which in turn can be used for calculating LSM
      tree fanout.
      39ba4a5f
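The estimate mentioned above can be written as a one-line division. This sketch assumes that scheduler.dump_input holds the cumulative number of bytes dumped over the same period that scheduler.dump_count covers; the commit does not spell out the formula, and the helper name is hypothetical.

```python
def average_dump_size(scheduler_stats):
    """Estimate the average size of one memory dump.

    Assumes `dump_input` is cumulative bytes and `dump_count` is the number
    of dumps over the same period (an assumption of this sketch).
    """
    count = scheduler_stats["dump_count"]
    if count == 0:
        # No dumps since startup or the last box.stat.reset().
        return 0.0
    return scheduler_stats["dump_input"] / count
```

The argument is expected to be a plain dictionary shaped like the scheduler section of the statistics, for example `{"dump_input": 300, "dump_count": 3}`.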
    • vinyl: move global dump/compaction statistics to scheduler · 1d23a1b1
      Vladimir Davydov authored
      Although it's convenient to maintain dump/compaction input/output
      metrics in vy_lsm_env, semantically it's incorrect as those metrics
      characterize the scheduler, not the LSM environment. Also, we can't
      easily extend those stats with e.g. the number of completed dumps or
      the number of tasks in progress, because those are only known to the
      scheduler.
      
      That said, let's introduce 'scheduler' section in box.stat.vinyl() and
      move dump/compaction stats from 'disk' to the new section. Let's also
      move the stats accounting from vy_lsm.c to vy_scheduler.c. The 'disk'
      section now stores only the size of data and index on disk and no
      cumulative statistics, which makes it similar to the 'memory' section.
      
      Note, this patch flattens the stats (disk.compaction.input is moved to
      scheduler.compaction_input and so forth), because all other global stats
      are reported without using nested tables.
      1d23a1b1
  5. Jan 15, 2019
    • vinyl: don't add dropped LSM trees to the scheduler during recovery · 6a4b0667
      Vladimir Davydov authored
      During local recovery we may encounter an LSM tree marked as dropped.
      This means that the LSM tree was dropped before restart and hence will
      be deleted before recovery completion. There's no need to add such trees
      to the vinyl scheduler - it looks confusing and can potentially result
      in mistakes when the code gets modified.
      6a4b0667
    • vinyl: bump range version in vy_range.c · c8fa604e
      Vladimir Davydov authored
      Currently, we bump range->version in vy_scheduler.c. This looks like an
      encapsulation violation, and may easily result in an error (as we have
      to be cautious to increment range->version whenever we modify a range). That
      said, let's bump range version right in vy_range.c.
      c8fa604e
    • vinyl: rename compact to compaction · a3efe351
      Vladimir Davydov authored
      compact_input sounds confusing, because 'compact' works as an adjective
      here. Saving 3 characters per variable/stat name related to compaction
      doesn't justify this. Let's rename 'compact' to 'compaction' both in
      stats and in the code.
      a3efe351
    • vinyl: rename dump/compact in/out to input/output · 8f7f7c03
      Vladimir Davydov authored
      'in' is a reserved keyword in Lua, so using 'in' as a map key was a bad
      decision - one has to access it with [] rather than simply with dot.
      Let's rename 'in' to 'input' and 'out' to 'output' both in the output
      and in the code.
      8f7f7c03
    • test: split vinyl/errinj · 0a7dbedb
      Vladimir Davydov authored
      This test is huge and takes long to complete. Let's move ddl, tx, and
      stat related stuff to separate files.
      0a7dbedb
    • test: rename vinyl/info to vinyl/stat · 2ce012e2
      Vladimir Davydov authored
      The test was called 'info' in the first place, because back when it was
      introduced vinyl statistics were reported by 'info' method. Today, stats
      are reported by 'stat' so let's rename the test as well to conform.
      2ce012e2
  6. Jan 14, 2019
    • box: expose snapshot status in box.info.gc() · dfa7a61d
      Konstantin Osipov authored
      Add box.info.gc().checkpoint_is_in_progress, which is true when
      there is an ongoing checkpoint/snapshot and false otherwise.
      
      Closes gh-3935
      
      @TarantoolBot document
      Title: box.info.gc().checkpoint_is_in_progress
      
      Extend box.info.gc() documentation with a new member
      - checkpoint_is_in_progress, which is true if there is an ongoing
      checkpoint, false otherwise.
      dfa7a61d
  7. Jan 10, 2019
    • box: add missing diag_set to port_tuple_dump_msgpack · 949d6737
      Mergen Imeev authored
      Not really critical as obuf_alloc() fails only on OOM, i.e. never in
      practice.
      949d6737
    • sql: do not use OP_Delete+OP_Insert for UPDATES · c5c1a389
      Kirill Shcherbatov authored
      Introduced a new OP_Update opcode that executes the native Tarantool
      update operation.
      In the case of UPDATE OR REPLACE we can't use the new OP_Update, as it
      has complex SQL-specific semantics:
      
      CREATE TABLE tj (s1 INT PRIMARY KEY, s2 INT);
      INSERT INTO tj VALUES (1, 3),(2, 4),(3,5);
      CREATE UNIQUE INDEX i ON tj (s2);
      SELECT * FROM tj;
      [1, 3], [2, 4], [3, 5]
      UPDATE OR REPLACE tj SET s2 = s2 + 1;
      SELECT * FROM tj;
      [1, 4], [3, 6]
      
      I.e. the [1, 3] tuple is updated to [1, 4] and replaces the [2, 4]
      tuple. This logic is implemented in SQL as preventive deletion of
      tuples by all corresponding indexes.

      The other significant change is that updating a primary key is now
      forbidden. It was possible to handle it the same way as the OR
      REPLACE specifier, but we need an atomic UPDATE step for ticket
      #3691 to support the "OR IGNORE/OR ABORT/OR FAIL" specifiers.
      Reworked the tests to avoid primary key UPDATE where possible.
      
      Closes #3850
      c5c1a389
    • sql: encode tuples with mpstream on Vdbe run · 486f0d24
      Kirill Shcherbatov authored
      Introduced new sql_vdbe_mem_encode_tuple and
      mpstream_encode_vdbe_mem routines that encode Vdbe memory to
      msgpack on a region without a prior size estimation call.
      Got rid of the sqlite3VdbeMsgpackRecordLen and
      sqlite3VdbeMsgpackRecordPut functions, which became useless. This
      approach also resolves the invalid size estimation problem (#3035),
      because size estimation is no longer required.
      
      Needed for #3850
      Closes #3035
      486f0d24
    • sql: fix fkey exception for self-referenced table · c8c73713
      Kirill Shcherbatov authored
      The UPDATE operation doesn't fail when the fkey self-reference
      condition is unsatisfied and the table has other records.
      To avoid raising an error where it is not necessary, Vdbe inspects
      the parent table with OP_Found. This branch is not valid for a
      self-referenced table, since it looks for a tuple affected by the
      UPDATE operation, and since the foreign key has already detected
      a conflict, it must be raised.
      
      Example:
      CREATE TABLE t6(a INTEGER PRIMARY KEY, b TEXT, c INT, d TEXT, UNIQUE(a, b),
                      FOREIGN KEY(c, d) REFERENCES t6(a, b));
      INSERT INTO t6 VALUES(1, 'a', 1, 'a');
      INSERT INTO t6 VALUES(100, 'one', 100, 'one');
      UPDATE t6 SET c = 1, d = 'a' WHERE a = 100;
      -- fk conflict must be raised here
      
      Needed for #3850
      Closes #3918
      c8c73713
    • sql: fix sql_vdbe_mem_alloc_region result memory · db7e4757
      Kirill Shcherbatov authored
      Function sql_vdbe_mem_alloc_region() that constructs the value
      of Vdbe Mem object used to change only type related flags.
      However, it is also required to erase other flags (for instance
      flags related to allocation policy: static, dynamic etc), since
      their combination may be invalid.
      In a typical Vdbe scenario, OP_MakeRecord and OP_RowData release
      memory with sqlite3VdbeMemRelease() and allocate on region with
      sql_vdbe_mem_alloc_region(). An integrity assert based on
      sqlite3VdbeCheckMemInvariants() would fire here due to
      incompatible combination of flags:
      MEM_Static | (MEM_Blob | MEM_Ephem).
      
      Needed for #3850
      db7e4757
    • sql: clean-up vdbe_emit_constraint_checks · 1c51153a
      Kirill Shcherbatov authored
      Removed vdbe code generation making type checks from
      vdbe_emit_constraint_checks as it is useless since strict types
      have been introduced.
      1c51153a
  8. Jan 09, 2019
  9. Jan 05, 2019
  10. Dec 29, 2018
    • box: use bitmap to check for missing fields in tuple_init_field_map · f6a31c03
      Kirill Shcherbatov authored
      Reworked tuple_init_field_map to fill a local bitmap and
      compare it with template required_fields bitmap containing
      information about required fields. Each field is mapped to
      bitmap with field:id - unique field identifier.
      This approach to check the required fields will work even after
      the introduction of JSON paths, when the field tree becomes
      multilevel.
      
      @locker: massive code refactoring, comments.
      
      Needed for #1012
      f6a31c03
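The bitmap check described above can be sketched with a Python integer used as a bit set. The helper names are hypothetical, and field ids are assumed to be small non-negative integers.

```python
def make_required_bitmap(required_ids):
    """Build the template bitmap: one set bit per required field id."""
    bitmap = 0
    for field_id in required_ids:
        bitmap |= 1 << field_id
    return bitmap


def missing_fields(required_bitmap, present_ids):
    """Fill a local bitmap from the fields seen, then compare with the template."""
    seen = 0
    for field_id in present_ids:
        seen |= 1 << field_id
    # Bits still set here belong to required fields that never showed up.
    missing = required_bitmap & ~seen
    return [i for i in range(missing.bit_length()) if (missing >> i) & 1]
```

Because the check depends only on field ids and not on a field's position in the tuple, the same comparison keeps working when fields are nested at several levels.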
    • box: introduce bitmap_size helper · b905d4b7
      Kirill Shcherbatov authored
      @locker: comments.
      
      Needed for #1012
      b905d4b7
    • sql: fix decoding data of type BOOLEAN · 5f18717d
      Nikita Pettik authored
      Closes #3906
      5f18717d
    • sql: support HAVING without GROUP BY clause · b40f2443
      Kirill Shcherbatov authored
      Allowed SELECT requests that have a HAVING clause without
      GROUP BY. This is possible when both the left and right parts of
      the request have an aggregate function or a constant value.
      
      Closes #2364.
      
      @TarantoolBot document
      Title: HAVING without GROUP BY clause
      A query with a having clause should also have a group by clause.
      If you omit group by, all the rows not excluded by the where
      clause return as a single group.
      Because no grouping is performed between the where and having
      clauses, they cannot act independently of each other. Having
      acts like where because it affects the rows in a single group
      rather than groups, except the having clause can still use
      aggregates.
      Having without group by is not supported for select from
      multiple tables.
      
      2011 SQL standard "Part 2: Foundation" 7.10 <having clause> p.381
      
      Example:
      SELECT MIN(s1) FROM te40 HAVING SUM(s1) > 0; -- is valid
      SELECT 1 FROM te40 HAVING SUM(s1) > 0;       -- is valid
      SELECT NULL FROM te40 HAVING SUM(s1) > 0;    -- is valid
      SELECT date() FROM te40 HAVING SUM(s1) > 0;  -- is valid
      b40f2443
    • xlog: assure xlog is opened and closed in the same thread · 847aab99
      Vladimir Davydov authored
      xlog and xlog_cursor must be opened and closed in the same thread,
      because they use cord's slab allocator.
      
      Follow-up #3910
      847aab99
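A same-thread rule like the one above can be enforced by remembering the opening thread and checking it on close. The class below is a hypothetical Python analogy; `ThreadBoundLog` is not a Tarantool name, and a thread id stands in for the owning cord.

```python
import threading


class ThreadBoundLog:
    """Hypothetical resource that must be closed by the thread that opened it."""

    def __init__(self):
        self.owner = None

    def open(self):
        # Remember which thread owns the per-thread resources.
        self.owner = threading.get_ident()

    def close(self):
        if self.owner != threading.get_ident():
            raise RuntimeError("closed in a different thread than it was opened in")
        self.owner = None
```

Failing loudly on every cross-thread close makes the mistake reproducible, instead of depending on whether the freed memory happens to be reused.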
    • relay: close xlog cursor in relay thread · 21726f69
      Vladimir Davydov authored
      An xlog_cursor created and used by a relay via recovery context is
      destroyed by the main thread once the relay thread has exited. This is
      incorrect, because xlog_cursor uses cord's slab allocator and therefore
      must be destroyed in the same thread it was created by, otherwise we
      risk getting a use-after-free bug. So this patch moves recovery_delete()
      invocation to the end of the relay thread routine.
      
      No test is added, because our existing tests already cover this case -
      crashes don't usually happen, because we are lucky. The next patch will
      add some assertions to make the bug 100% reproducible.
      
      Closes #3910
      21726f69