- Jul 06, 2017
-
-
Kirill Yukhin authored
LLVM is more eager for frame size than GCC, so the current maximal call chain length may corrupt the stack. Reduce the maximal call chain length to 30. Related to #2550
-
Kirill Yukhin authored
If, after recovery, the first client to invoke SQL differs from ADMIN, then initialization of the SQL subsystem will fail due to insufficient access rights. The problem is that SQL invokes create_from_tuple() to fill the space_def structure, which in turn verifies access rights. Since SQL is a static part of Tarantool, there's no sense in initializing the system on demand. This change performs data dictionary initialization upon config completion. A regression test is added as well. Closes #2483
-
- Jul 04, 2017
-
-
Vladislav Shpilevoy authored
Needed for #2285
-
Vladimir Davydov authored
Commit dbfd515f ("vinyl: fix crash if snapshot is called while dump is in progress") introduced a bug that can result in statements inserted after a WAL checkpoint being included in a snapshot. This happens because vy_begin_checkpoint() doesn't force rotation of in-memory trees anymore: it bumps checkpoint_generation, but doesn't touch scheduler->generation, which is used to trigger in-memory tree rotation.

To fix this issue, this patch zaps scheduler->checkpoint_generation and makes vy_begin_checkpoint() bump scheduler->generation directly, as it used to. To guarantee dump consistency (the issue fixed by commit dbfd515f), scheduler->dump_generation is introduced - it defines the generation of the in-memory data that are currently being dumped. The scheduler won't start dumping newer trees until all trees whose generation equals dump_generation have been dumped. The counter is only bumped by the scheduler itself when all old in-memory trees have been dumped. Together, this guarantees that each dump contains data of the same generation, i.e. is consistent.

While we are at it, let's also remove vy_scheduler->dump_fifo, the list of all in-memory trees sorted in chronological order. The scheduler uses it to keep track of the oldest in-memory tree, which is needed to invoke lsregion_gc(). However, since we do not remove indexes from the dump_heap, as we used to not so long ago, we can use the heap for this. The only problem is that indexes that are currently being dumped are moved off the top of the heap, but we can detect this case by maintaining a counter of dump tasks in progress: if dump_task_count is > 0 when a dump task is completed, we must not call lsregion_gc() irrespective of the generation of the index at the top of the heap.

A good thing about getting rid of vy_scheduler->dump_fifo is that it is a step forward towards making vy_index independent of vy_scheduler so that it can be moved to a separate source file.

Closes #2541
Needed for #1906
-
Vladimir Davydov authored
vy_index->generation equals the generation of the oldest in-memory tree, which can be looked up efficiently since the vy_index->sealed list is sorted by generation. So let's zap it and add a vy_index_generation() function instead.
-
Vladimir Davydov authored
Needed for #1906
-
Vladimir Davydov authored
Including index.h just for the sake of iterator_type, as we do in vy_run.h and vy_mem.h, is a bit of overkill. Let's move its definition to a separate source file, iterator_type.h.
-
Vladimir Davydov authored
- Replace vy_range->index with key_def.
- Replace vy_range_iterator->index with vy_range_tree_t.

Needed for #1906
-
Vladimir Davydov authored
The compact_heap, used by the scheduler to schedule range compaction, contains all ranges except those that are currently being compacted. Since the appropriate vy_index object is required to schedule a range compaction, we have to store a pointer to the index a range belongs to in vy_range->index. This makes it impossible to move vy_range struct and its implementation to a separate source file. To address this, let's rework the scheduler as follows:

- Make compact_heap store indexes, not ranges. An index is prioritized by the greatest compact_priority among its ranges.
- Add a heap of ranges to each index, prioritized by compact_priority. A range is removed from the heap while it's being compacted.
- Do not remove indexes from dump_heap or compact_heap when a task is scheduled (otherwise we could only schedule one compaction per index). Instead just update the index position in the heaps.

Needed for #1906
-
Vladimir Davydov authored
Advancing replica->gc on every status update is inefficient as gc can only be invoked when we move to the next xlog file. Currently, it's acceptable, because status is only updated once per second, but there's no guarantee that it won't be updated, say, every millisecond in the future, in which case advancing replica->gc on every status update may become too costly. So introduce a trigger invoked every time an xlog is closed by recover_remaining_wals() and use it in relay to send a special gc message.
-
Vladimir Davydov authored
ipc_cond_wait() always returns 0, so the body of the loop waiting for the endpoint to be ready for destruction is only invoked once.
-
Vladimir Davydov authored
To make sure there is no status message pending in the tx pipe, relay_cbus_detach() waits on relay->status_cond before proceeding to relay destruction. The status_cond is signaled by the status message completion routine (relay_status_update()) handled by cbus on the relay's side. The problem is by the time we call relay_cbus_detach(), the cbus loop has been stopped (see relay_subscribe_f()), i.e. there's no one to process the message that is supposed to signal status_cond. That means, if there happens to be a status message en route when the relay is stopped, the relay thread will hang forever. To fix this issue, let's introduce a new helper function, cbus_flush(), which blocks the caller until all cbus messages queued on a pipe have been processed, and use it in relay_cbus_detach() to wait for in-flight status messages to complete. Apart from source and destination pipes, this new function takes a callback to be used for processing incoming cbus messages, so it can be used even if the loop that is supposed to invoke cbus_process() stopped.
-
Vladimir Davydov authored
- Fold in wal dir scan. It's pretty easy to detect if we need to rescan wal dir - we do iff the current wal is closed (if it isn't, we need to recover it first), so there's no point in keeping it apart.
- Close the last recovered wal on eof. We don't close it to avoid rereading it in case recover_remaining_wals() is called again before a new wal is added to wal dir. We can detect this case by checking if the signature of the last wal stored in wal dir has increased after rescanning the dir.
- Don't abort recovery and print the 'xlog is deleted under our feet' message if the current wal file is removed. This is pointless, really - it's OK to remove an open file in Unix. Besides, the check for a deleted file is only relevant if wal dir has been rescanned, which is only done when we proceed to the next wal, i.e. it doesn't really detect anything.

A good side effect of this rework is that now we can invoke the garbage collector right from recovery_close_log().
-
Konstantin Osipov authored
-
Konstantin Osipov authored
* use per-index statistics
* remove step_count as it is no longer maintained
* add statistics for txw, mem, and index overall
-
alyapunov authored
The old iterator has several problems:
- The restoration system is too complex and might cause several reads of the same statements from disk.
- Applying upserts works in a direct way (squash all upserts and apply them to the terminal statement) and the code doesn't leave a chance to optimize it.

Implement an iterator for the full-key EQ case that fixes the problems above.
-
alyapunov authored
There is a version member in vy_index that is incremented on each modification of the mem list and range tree. Split it into two members that correspond to the mem list and the range tree respectively. This is needed for more precise tracking of changes in iterators.
-
- Jul 03, 2017
-
-
Kirill Yukhin authored
* Right now Tarantool runs compilation of a query on top of the fiber's stack, so the number of trigger compilations set off by a chain should be limited. Introduce a dedicated global variable which is decreased each time a new compilation in the back track is initiated, and raise an error if zero is reached. Set it to the limit at parser start.
* Add tests for various cases.
* Fix out-of-order build for maintainer mode.
* Regenerate the parser.

Closes #2550
-
Kirill Yukhin authored
Right now the SQL query compiler is run on top of the fiber's stack, which is limited to 64KB by default, so the maximum number of entities in a compound SELECT statement should be less than 50 (verified by experiment) or the stack guard will be triggered. In the future we'll introduce a heuristic to detect that a query is 'complex' and run compilation in a separate thread with a larger stack. This will also allow us not to block Tarantool while compiling the query. Closes #2548
-
khatskevich authored
This is a first implementation of box.internal.sql_create_function. It works properly on a limited set of cases. The main purpose of this feature for now is tests. Some test cases relying on this feature were enabled. Part of #2233
-
- Jun 29, 2017
-
-
Konstantin Osipov authored
-
Konstantin Osipov authored
-
Vladimir Davydov authored
If a vylog record doesn't get flushed to disk due to an error, objects it refers to (index->key_def, range->begin and range->end) may get destroyed, resulting in a crash. To avoid that, we must copy those objects to vylog buffer. Closes #2532
-
Konstantin Osipov authored
-
Vladimir Davydov authored
- Remove tx and cursor latencies as they are useless - they actually account for how long a tx/cursor was open, not latencies.
- Remove vy_stat->get_latency as it doesn't account for the latency of select; besides, we now have per-index read latency.
- Remove vy_stat->dumped_statements and dump_total as these statistics are reported per index as well.
- VY_STAT_TX_OPS is currently unused (always 0). Let's use it for accounting the total number of statements committed in a tx instead of vy_stat->write_count.
-
Vladimir Davydov authored
index.info() is supposed to show index stats, not options. box.space.<space_name>.index.<index_name>.options looks like a better place for reporting index options. Needed for #1662
-
Vladimir Davydov authored
This patch adds 'latency' field to index.info. It shows the latency of reads from the index. The latency is computed as 99-percentile of all delays incurred by vy_read_iterator_next(). Needed for #1662
-
Vladimir Davydov authored
Replace box.info.vinyl().performance.upsert_{squashed,applied} with per index index.info().upsert.{squashed,applied}. Needed for #1662
-
Vladimir Davydov authored
Add the following counters to index.info:

    lookup  # number of lookups (read iter start)
    get     # number of statements read (read iter next)
        rows
        bytes
    put     # number of statements written
        rows
        bytes

Needed for #1662
-
Eugine Blikh authored
closes gh-2516
-
Konstantin Osipov authored
-
Kirill Yukhin authored
This change removes SQL's WAL support. Most of the code related to it was removed. The OP_Checkpoint op-code was commented out, but not removed (as was PRAGMA wal_mode): this might be useful in the future if Tarantool's WAL is adopted there.
-
- Jun 28, 2017
-
-
khatskevich authored
* Changes to sqltester:
  - Change lsearch to be closer to the tcl implementation: return the index of the first match or return -1
  - Make execsql2 work with multistatements
* Converted tests include cases for large (64b) integers

Part of #2381
-
Vladislav Shpilevoy authored
Big numbers are returned from SQL to Lua as double or as long long depending on their size (whether they fit into the mantissa or not). Remember, in Tarantool you can create a variable of type long long using the LL postfix (e.g. 1235453452343LL). closes #2469
-
- Jun 27, 2017
-
-
Konstantin Osipov authored
-
Vladimir Davydov authored
This patch adds the following counters to index.info:

    txw
        count       # number of statements in the TX write set
            rows
            bytes
        iterator
            lookup  # number of lookups in the TX write set
            get     # number of statements returned by the iterator
                rows
                bytes

Needed for #1662
-
Vladimir Davydov authored
This patch adds the cache section to index.info with the following counters in it:

    cache
        rows        # number of tuples in the cache
        bytes       # cache memory size
        lookup      # lookups in the cache
        get         # reads from the cache
            rows
            bytes
        put         # writes to the cache
            rows
            bytes
        invalidate  # overwrites in the cache
            rows
            bytes
        evict       # evictions due to memory quota
            rows
            bytes

Needed for #1662
-
Vladimir Davydov authored
Using vy_quota, which was implemented to support watermarking, throttling, and timeouts, for accounting cached tuples is overkill. Replace it with mem_used and mem_quota counters.
-
Vladimir Davydov authored
This patch adds the following counters to the disk section of index.info:

    dump            # dump statistics:
        count       # number of invocations
        in          # number of input statements
            rows
            bytes
        out         # number of output statements
            rows
            bytes
    compact         # compaction statistics:
        count       # number of invocations
        in          # number of input statements
            rows
            bytes
        out         # number of output statements
            rows
            bytes

Needed for #1662
-
Vladimir Davydov authored
Replace box.info.vinyl().performance.iterator.{run,mem} global counters with the following per index counters:

    memory
        iterator
            lookup  # number of lookups in the memory tree
            get     # number of statements returned by mem iterator
                rows
                bytes
    disk
        iterator
            lookup  # number of lookups in the page index
            get     # number of statements returned by run iterator
                rows
                bytes
            bloom   # number of times bloom filter
                hit     # allowed to avoid a disk read
                miss    # failed to prevent a disk read
            read    # number of statements actually read from disk
                rows
                bytes
                bytes_compressed
                pages

Needed for #1662
-