Commits · c2de45c4686405d199eb22273706a3d52cadea3b · core / tarantool

Sep 21, 2018

test: skip ddl test for vinyl on travis · c2de45c4
Sergei Voronezhskii authored 6 years ago
```
Until the bug in #3420 is fixed
```
c2de45c4

The only difference between struct key_part_def and struct key_part is
that the former stores only the id of a collation while the latter also
stores a pointer to speed up tuple comparisons. It isn't worth keeping a
separate struct just because of that. Let's use struct key_part
everywhere and assume that key_part->coll is NULL if the part is needed
solely for storing a decoded key part definition and isn't NULL if it is
used for tuple comparisons (i.e. is attached to a key_def).
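
The convention described above can be sketched roughly as follows. This is a simplified illustration, not the actual definition: the field set, `struct coll` contents, and the helpers `key_part_decode`, `key_part_attach` and `key_part_is_attached` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a collation object; its contents are hypothetical. */
struct coll {
	uint32_t id;
};

/* One struct serves both as a decoded definition and as a comparison part. */
struct key_part {
	uint32_t fieldno;  /* index of the tuple field */
	uint32_t coll_id;  /* collation id, always stored */
	struct coll *coll; /* NULL until the part is attached to a key_def */
};

/* Build a part that only stores a decoded definition (coll stays NULL). */
static struct key_part
key_part_decode(uint32_t fieldno, uint32_t coll_id)
{
	struct key_part part = { fieldno, coll_id, NULL };
	return part;
}

/* Resolve the collation pointer so the part can be used for comparisons. */
static void
key_part_attach(struct key_part *part, struct coll *coll)
{
	assert(coll != NULL);
	part->coll = coll;
}

/* In this sketch a part is "attached" exactly when coll is not NULL. */
static int
key_part_is_attached(const struct key_part *part)
{
	return part->coll != NULL;
}
```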

ea3a2b5f

box: introduce tuple_field_by_part routine · 48a3dc96

Kirill Shcherbatov authored 6 years ago

Start using the tuple_field_by_part(_raw) routine in the *extract,
*compare, and *hash functions. The new function uses a key_part to
retrieve the field data that the key_part refers to. For now it is just
a wrapper for tuple_field_raw, but it will work differently once JSON
paths are introduced.
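
A minimal sketch of the wrapper idea, under heavy simplification: the tuple is modelled as an array of field pointers, and both function signatures below are stand-ins that do not match the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified key part: only the field number matters in this sketch. */
struct key_part {
	uint32_t fieldno;
};

/*
 * Stand-in for tuple_field_raw: the tuple is modelled as an array of
 * pointers to field data, which is NOT how real tuples are stored.
 */
static const char *
tuple_field_raw(const char *const *fields, uint32_t field_count,
		uint32_t fieldno)
{
	return fieldno < field_count ? fields[fieldno] : NULL;
}

/*
 * Callers pass the whole key_part instead of a bare field number, so a
 * later change (e.g. a JSON path lookup) only has to touch this function.
 */
static const char *
tuple_field_by_part_raw(const char *const *fields, uint32_t field_count,
			const struct key_part *part)
{
	assert(part != NULL);
	return tuple_field_raw(fields, field_count, part->fieldno);
}
```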

Needed for #1012

48a3dc96

box: refactor API to use non-constant key_def · 93354623

Kirill Shcherbatov authored 6 years ago

To introduce JSON indexes we need a mutable key_def whose key_part
definitions will, in the following patches, store a JSON path, an offset
slot, and a slot epoch.

Needed for #1012

93354623

Merge branch '1.9' into 1.10 · 18bd613b
Kirill Yukhin authored 6 years ago

18bd613b

Sep 20, 2018

Fix clang build (fails on -Wno-cast-function-type) · cea113cf

Alexander Turenko authored 6 years ago

The problem is that clang does not support the -Wno-cast-function-type
flag. This is a regression introduced by 8c538963.

Follow up of #3685.
Fixes #3701.

cea113cf

test: remove universal grants from tests · af6b554b

Serge Petrenko authored 6 years ago

This patch rewrites all tests to grant only necessary privileges, not
privileges to universe. This was made possible by bugfixes in access
control, patches #3516, #3574, #3524, #3530.

Follow-up #3530

af6b554b

Merge branch '1.9' into 1.10 · 4149e97b
Kirill Yukhin authored 6 years ago

4149e97b

Sep 19, 2018

vinyl: add global disk stats · fe06b124

Vladimir Davydov authored 6 years ago

This patch adds some essential disk statistics that are already
collected and reported on per index basis to box.stat.vinyl().
The new statistics are shown under the 'disk' section and currently
include the following fields:

 - data: size of data stored on disk.
 - index: size of index stored on disk.
 - dump.in: size of dump input.
 - dump.out: size of dump output.
 - compact.in: size of compaction input.
 - compact.out: size of compaction output.
 - compact.queue: size of compaction queue.

All the counters are given in bytes without taking into account
disk compression. Dump/compaction in/out counters can be reset with
box.stat.reset().

fe06b124

vinyl: factor out helpers for accounting dump/compaction · 346cb06b

Vladimir Davydov authored 6 years ago

So that we can easily extend them to account the stats not only per LSM
tree, but also globally, in vy_lsm_env.

346cb06b

vinyl: keep track of compaction queue length · 06e70cad

Vladimir Davydov authored 6 years ago

Currently, there's no way to figure out whether compaction keeps up
with dumps or not while this is essential for implementing transaction
throttling. This patch adds a metric that is supposed to help answer
this question. This is the compaction queue size. It is calculated per
range and per LSM tree as the total size of slices awaiting compaction.
We update the metric along with the compaction priority of a range, in
vy_range_update_compact_priority(), and account it to an LSM tree in
vy_lsm_acct_range(). For now, the new metric is reported only on per
index basis, in index.stat() under disk.compact.queue.
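
The accounting can be sketched as below. The struct layouts and the rule that the first `compact_priority` slices are the ones awaiting compaction are assumptions made for illustration; only the idea of summing slice sizes per range and accumulating them per LSM tree comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

struct slice {
	int64_t bytes; /* size of the slice on disk */
};

struct range {
	const struct slice *slices; /* assumed order: newest first */
	int slice_count;
	int compact_priority;       /* number of slices awaiting compaction */
	int64_t compact_queue;      /* total size of those slices */
};

struct lsm {
	int64_t compact_queue;      /* sum over all accounted ranges */
};

/* Recompute the per-range queue size from the current priority. */
static void
range_update_compact_queue(struct range *range)
{
	assert(range->compact_priority <= range->slice_count);
	range->compact_queue = 0;
	for (int i = 0; i < range->compact_priority; i++)
		range->compact_queue += range->slices[i].bytes;
}

/* Account a range to its LSM tree. */
static void
lsm_acct_range(struct lsm *lsm, const struct range *range)
{
	lsm->compact_queue += range->compact_queue;
}

/* Undo the accounting, e.g. before the range is modified. */
static void
lsm_unacct_range(struct lsm *lsm, const struct range *range)
{
	lsm->compact_queue -= range->compact_queue;
}
```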

06e70cad

vinyl: add helpers for resetting statement counters · 58d2b9db

Vladimir Davydov authored 6 years ago

Currently, we call memset() on vy_stmt_counter and vy_disk_stmt_counter
directly, but that looks rather ugly, especially when a counter has a
long name. Let's introduce helper functions for that.
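
Such a helper might look roughly like this. The struct name, its fields, and `stmt_counter_reset` are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified counter; the field set is an assumption of this sketch. */
struct stmt_counter {
	int64_t rows;
	int64_t bytes;
};

/* One short call replaces memset(&some.long.counter, 0, sizeof(...)). */
static inline void
stmt_counter_reset(struct stmt_counter *counter)
{
	assert(counter != NULL);
	memset(counter, 0, sizeof(*counter));
}
```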

58d2b9db

vinyl: report pages and bytes_compressed in dump/compact in/out stats · 8a1e507d

Vladimir Davydov authored 6 years ago

There's no reason not to report pages and bytes_compressed under
disk.stat.dump.out and disk.stat.compact.{in,out} apart from using
the same struct for dump and compaction statistics (vy_compact_stat).
The statistics are going to differ anyway once compaction queue size
is added to disk.stat.compact so let's zap struct vy_compact_stat
and report as much info as we can.

8a1e507d

vinyl: annotate info_table_end with comment · 7ef02518

Vladimir Davydov authored 6 years ago

The code is difficult to follow when there are nested info tables,
because info_table_end() doesn't refer to the table name. Let's
annotate info_table_end() with a comment to make it easier to follow.
No functional changes.

7ef02518

vinyl: update compact priority usual way on range split/coalesce · e0f8aefb

Vladimir Davydov authored 6 years ago

When a few ranges are coalesced, we "force" compaction of the resulting
range by raising its compaction priority to max (slice count). There's
actually no point in that, because as long as the shape of the resulting
LSM tree is OK, we don't need to do extra compaction work. Moreover, it
actually doesn't work if a new slice is added to the resulting range by
dump before it gets compacted, which is fairly likely, because then its
compaction priority will be recalculated as usual. So let's simply call
vy_range_update_compact_priority() for the resulting range.

When a range is split, the produced ranges will inherit its compaction
priority. This is actually incorrect, because range split may change the
shape of the tree so let's recalculate priority for each part the usual
way, i.e. by calling vy_range_update_compact_priority().

After this patch, there's only one place where we can update the
compaction priority of a range: vy_range_update_compact_priority().

e0f8aefb

vinyl: fix force compaction logic · f3134f2f

Vladimir Davydov authored 6 years ago

This patch addresses a few problems index.compact() is suffering from,
namely:

 - When a range is split or coalesced, it should inherit the value of
   needs_compaction flag from the source ranges. Currently, the flag is
   cleared so that the resulting range may not be compacted.

 - If a range has no slices, we shouldn't set needs_compaction flag for
   it, because obviously it can't be compacted, but we do.

 - The needs_compaction flag should be cleared as soon as we schedule a
   range for compaction, not when all slices have been compacted into
   one, as we presently expect, because the latter may never happen
   under a write-intensive load.
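
The three rules above can be sketched as follows. The struct and function names are hypothetical, and summing slice counts on coalesce is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

struct range {
	int slice_count;
	bool needs_compaction;
};

/* Rule 1: the result inherits the flag if any source range had it set. */
static struct range
range_coalesce(const struct range *sources, int count)
{
	struct range result = { 0, false };
	for (int i = 0; i < count; i++) {
		result.slice_count += sources[i].slice_count;
		if (sources[i].needs_compaction)
			result.needs_compaction = true;
	}
	return result;
}

/* Rule 2: a range without slices can't be compacted, so don't flag it. */
static void
range_force_compaction(struct range *range)
{
	if (range->slice_count > 0)
		range->needs_compaction = true;
}

/* Rule 3: clear the flag at scheduling time, not on full compaction. */
static void
range_schedule_compaction(struct range *range)
{
	assert(range->slice_count > 0);
	range->needs_compaction = false;
}
```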

f3134f2f

Sep 17, 2018

lua: fix assertion failure after an error in box.session.su() · ac77418f

Serge Petrenko authored 6 years ago

If some error occurred during execution of a function called from
box.session.su(), we assumed that fiber diagnostics area was not empty,
and tried to print an error message using data from the diagnostics.
However, this assumption is not true when some lua error happens.
Imagine such a case:

  box.session.su('admin', function(x) return #x end, 3)

A lua error would be pushed on the stack but the diagnostics would be
empty, and we would get an assertion failure when trying to print the
error message. Handle this by using lua_error() instead of luaT_error().

Closes #3659

ac77418f

Sep 15, 2018

Fix Debug build on GCC 8 · 8c538963

Alexander Turenko authored 6 years ago

Fixed false positive -Wimplicit-fallthrough in http_parser.c by adding a
break. The code jumps anyway, so the execution flow is not changed.

Fixed false positive -Wparentheses in reflection.h by removing the
parentheses. The argument 'method' of the macro 'type_foreach_method' is
just the name of the loop variable and is passed to the macro for
readability reasons.

Fixed false positive -Wcast-function-type triggered by reflection.h by
adding -Wno-cast-function-type for sources and unit tests. We cast a
pointer to a member function to another pointer to a member function to
store it in a structure, but we cast it back before making a call. This
is legal and does not lead to undefined behaviour.
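
The store-then-cast-back pattern can be illustrated in plain C (the original code is C++ and deals with pointers to member functions, so this is only an analogy). All names below are hypothetical. The storage type here is `void (*)(void)`, which GCC's documentation describes as matching everything for this warning, so this particular variant is not expected to trigger it.

```c
#include <assert.h>

/* Generic storage type for any function pointer. */
typedef void (*generic_fn)(void);

struct method {
	const char *name;
	generic_fn fn; /* stored with its real type erased */
};

static int
add(int a, int b)
{
	return a + b;
}

/*
 * Cast the stored pointer back to its original type before calling it.
 * Calling through the erased type instead would be undefined behaviour.
 */
static int
method_call_int2(const struct method *m, int a, int b)
{
	int (*fn)(int, int) = (int (*)(int, int))m->fn;
	assert(fn != 0);
	return fn(a, b);
}
```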

Fixes #3685.

8c538963

Sep 14, 2018

Fix http error test · 76a8bd32

AKhatskevich authored 6 years ago

The test expected http:get to yield. However, with a very fast
unix_socket and parallel test execution, a context switch during the
call could lead to an instant reply with no yield. That caused an error
during `fiber:cancel`.

The problem is solved by increasing http server response time.

Closes #3480

76a8bd32

Sep 13, 2018

json: add options to json.encode() · 1663bdc4

Roman Khabibov authored 6 years ago

Add an ability to pass options to json.encode()/decode().

Closes: #2888.

@TarantoolBot document
Title: json.encode() json.decode()
Add an ability to pass options to
json.encode() and json.decode().
These are the same options that
are used globally in json.cfg().

1663bdc4

Sep 10, 2018

Fix libgomp linking for static build · 0a3186c4

Kirill Yukhin authored 6 years ago

Since adding -fopenmp to the compiler flags also adds -lgomp at the
link stage, pass -fno-openmp to the link stage in case of a static
build. In that case OMP functions are statically linked into libmisc.

Also, emit an error on an attempt to perform a static build using
clang.

0a3186c4

Sep 09, 2018

vinyl: add global memory stats · e78ebb77

Vladimir Davydov authored 6 years ago

box.info.memory() gives you some insight on what memory is used for,
but it's very coarse. For vinyl we need finer grained global memory
statistics.

This patch adds such: they are reported under box.stat.vinyl().memory
and consist of the following entries:

 - level0: sum size of level-0 of all LSM trees.
 - tx: size of memory used by tx write and read sets.
 - tuple_cache: size of memory occupied by tuple cache.
 - page_index: size of memory used for storing page indexes.
 - bloom_filter: size of memory used for storing bloom filters.

It also removes box.stat.vinyl().cache, as the size of cache is now
reported under memory.tuple_cache.

e78ebb77

vinyl: fix accounting of secondary index cache statements · 16faada1

Vladimir Davydov authored 6 years ago

Since commit 0c5e6cc8 ("vinyl: store full tuples in secondary index
cache"), we store primary index tuples in secondary index cache, but we
still account them as separate tuples. Fix that.

Follow-up #3478
Closes #3655

16faada1

vinyl: set box.cfg.vinyl_write_threads to 4 by default · fe1e4694

Vladimir Davydov authored 6 years ago

Any LSM-based database design implies high level of write amplification
so there should be more compaction threads than dump threads. With the
default value of 2 for box.cfg.vinyl_write_threads, which we have now,
we start only one compaction thread. Let's increase the default to 4
so that three compaction threads are started by default, which fits an
LSM-based design better.

fe1e4694

vinyl: don't start scheduler fiber until local recovery is complete · 7069eab5

Vladimir Davydov authored 6 years ago

We must not schedule any background jobs during local recovery, because
they may disrupt yet to be recovered data stored on disk. Since we start
the scheduler fiber as soon as the engine is initialized, we have to
pull some tricks to make sure it doesn't schedule any tasks: the
scheduler fiber function yields immediately upon startup; we assume
that it won't be woken up until local recovery is complete, because we
don't set the memory limit until then.

This looks rather flimsy, because the logic is spread among several
seemingly unrelated functions: the scheduler fiber (vy_scheduler_f),
the quota watermark callback (vy_env_quota_exceeded_cb), and the engine
recovery callback (vinyl_engine_begin_initial_recovery), where we leave
the memory limit unset until recovery is complete. The latter isn't even
mentioned in comments, which makes the code difficult to follow. Think
how everything would fall apart should we try to wake up the scheduler
fiber somewhere else for some reason.

This patch attempts to make the code more straightforward by postponing
startup of the scheduler fiber until recovery completion. It also moves
the comment explaining why we can't schedule tasks during local recovery
from vy_env_quota_exceeded_cb to vinyl_engine_begin_initial_recovery,
because this is where we actually omit the scheduler fiber startup.

Note, since the scheduler fiber now goes straight to business once
started, we can't start worker threads in the fiber function as we used
to, because then worker threads would be running even if vinyl was
unused. So we move this code to vy_worker_pool_get, which is called when
a worker is actually needed to run a task.
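
The lazy start described in the last paragraph might be sketched like this. The pool layout and function names are hypothetical, and thread creation is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>

struct worker_pool {
	int size;     /* configured number of workers */
	int idle;     /* workers currently available */
	bool started; /* false until the first worker is requested */
};

static void
worker_pool_create(struct worker_pool *pool, int size)
{
	assert(size > 0);
	pool->size = size;
	pool->idle = 0;
	pool->started = false;
}

/* Real code would spawn threads here; the sketch only counts them. */
static void
worker_pool_start(struct worker_pool *pool)
{
	pool->idle = pool->size;
	pool->started = true;
}

/* Workers are started on first use, so an unused pool costs nothing. */
static bool
worker_pool_get(struct worker_pool *pool)
{
	if (!pool->started)
		worker_pool_start(pool);
	if (pool->idle == 0)
		return false;
	pool->idle--;
	return true;
}
```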

7069eab5

vinyl: zap vy_worker_pool::idle_worker_count · 0ff58856
Vladimir Davydov authored 6 years ago
```
It is not used anywhere anymore.
```
0ff58856

vinyl: use separate thread pools for dump and compaction tasks · 3e76f7b9

Vladimir Davydov authored 6 years ago

Using the same thread pool for both dump and compaction tasks makes
estimation of dump bandwidth unstable. For instance, if we have four
worker threads, then the observed dump bandwidth may vary from X if
there's high compaction demand and all worker threads tend to be busy
with compaction tasks to 4 * X if there's no compaction demand. As a
result, we can overestimate the dump bandwidth and trigger dump when
it's too late, which will result in hitting the limit before dump is
complete and hence stalling write transactions, which is unacceptable.

To avoid that, let's separate thread pools used for dump and compaction
tasks. Since LSM tree based design typically implies high levels of
write amplification, let's allocate 1/4th of all threads for dump tasks
and use the rest exclusively for compaction.
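
A rough sketch of the split, assuming integer division with at least one dump thread and a minimum of two threads in total. The rounding rule and the names are assumptions; the message only states the 1/4 proportion.

```c
#include <assert.h>

struct pool_sizes {
	int dump;    /* threads reserved for dump tasks */
	int compact; /* threads reserved for compaction tasks */
};

/* Give about a quarter of the threads to dump, the rest to compaction. */
static struct pool_sizes
split_write_threads(int write_threads)
{
	struct pool_sizes sizes;
	assert(write_threads >= 2);
	sizes.dump = write_threads / 4;
	if (sizes.dump < 1)
		sizes.dump = 1;
	sizes.compact = write_threads - sizes.dump;
	return sizes;
}
```

Under these assumptions, 4 threads give one dump thread and three compaction threads, while 2 threads give one of each.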

3e76f7b9

vinyl: move worker allocation closer to task creation · ba7abf6f

Vladimir Davydov authored 6 years ago

Call vy_worker_pool_get() from vy_scheduler_peek_{dump,compaction} so
that we can use different worker pools for dump and compaction tasks.

ba7abf6f

vinyl: factor out worker pool from scheduler struct · 49595189

Vladimir Davydov authored 6 years ago

A worker pool is an independent entity that provides the scheduler with
worker threads on demand. Let's factor it out so that we can introduce
separate pools for dump and compaction tasks.

49595189

vinyl: don't use mempool for allocating background tasks · 661763ed

Vladimir Davydov authored 6 years ago

Background tasks are allocated infrequently, not more often than once
per several seconds, so using mempool for them is unnecessary and only
clutters vy_scheduler struct. Let's allocate them with malloc().

661763ed

vinyl: add helper to check whether dump is in progress · 04a735b2
Vladimir Davydov authored 6 years ago
```
Needed solely to improve code readability. No functional changes.
```
04a735b2

Sep 06, 2018

cmake: revert the change which breaks cmake build with llvm + openmp · 2500a7a4
Konstantin Osipov authored 6 years ago

2500a7a4

Tarantool static build ability · cb1c72da

Georgy Kirichenko authored 6 years ago

A possibility to build tarantool with included library dependencies.
Use the flag -DBUILD_STATIC=ON to build statically against curl, readline,
ncurses, icu and z.
Use the flag -DOPENSSL_USE_STATIC_LIBS=ON to build with static
openssl

Changes:
  * Add FindOpenSSL.cmake because some distributions do not support the use of
  openssl static libraries.
  * Find libssl before curl because of build dependency.
  * Collect the API of all bundled libraries and export it in case of a static
  build.
  * Rename crc32 internal functions to avoid a name clash with linked libraries.

Notes:
  * Bundled libyaml is not properly exported, use the system one.
  * Dockerfile to build static with docker is included

Fixes #3445

cb1c72da

Sep 04, 2018

Merge branch '1.9' into 1.10 · 8bf936f7
Vladimir Davydov authored 6 years ago

8bf936f7

box: sync on replication configuration update · 113ade24

Vladimir Davydov authored 6 years ago

Now box.cfg() doesn't return until 'quorum' appliers are in sync not
only on initial configuration, but also on replication configuration
update. If it fails to synchronize within replication_sync_timeout,
box.cfg() returns without an error, but the instance enters 'orphan'
state, which is basically read-only mode. In the meantime, appliers
will keep trying to synchronize in the background, and the instance
will leave 'orphan' state as soon as enough appliers are in sync.

Note, this patch also changes logging a bit:
 - 'ready to accept request' is printed on startup before syncing
   with the replica set, because although the instance is read-only
   at that time, it can indeed accept all sorts of ro requests.
 - For 'connecting', 'connected', 'synchronizing' messages, we now
   use 'info' logging level, not 'verbose' as they used to be, because
   those messages are important as they give the admin idea what's
   going on with the instance, and they can't flood logs.
 - 'sync complete' message is also printed as 'info', not 'crit',
   because there's nothing critical about it (it's not an error).

Also note that we only enter 'orphan' state if we fail to synchronize.
In particular, if the instance manages to synchronize with all replicas
within a timeout, it will jump from 'loading' straight into 'running',
bypassing 'orphan' state. This is done for the sake of consistency
between initial configuration and reconfiguration.
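
The state choice can be summarized with a small sketch. The enum, the function, and the idea of evaluating it both when the wait ends and later in the background are illustrative assumptions, not the actual implementation.

```c
#include <assert.h>

enum instance_state {
	STATE_LOADING,
	STATE_ORPHAN,
	STATE_RUNNING,
};

/*
 * State picked when the sync wait ends, either because enough appliers
 * are in sync or because the timeout expired. The same check can be
 * repeated later to leave 'orphan' once the quorum is reached.
 */
static enum instance_state
state_after_sync(int synced_appliers, int quorum)
{
	assert(synced_appliers >= 0 && quorum >= 0);
	return synced_appliers >= quorum ? STATE_RUNNING : STATE_ORPHAN;
}
```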

Closes #3427

@TarantoolBot document
Title: Sync on replication configuration update
The behavior of box.cfg() on replication configuration update is
now consistent with initial configuration, that is box.cfg() will
not return until it synchronizes with as many masters as specified
by replication_connect_quorum configuration option or the timeout
specified by replication_sync_timeout occurs. On timeout, it will
return without an error, but the instance will enter 'orphan' state.
It will leave 'orphan' state as soon as enough appliers have synced.

113ade24

box: add replication_sync_timeout configuration option · ca9fc33a

Olga Arkhangelskaia authored 6 years ago

In the scope of #3427 we need a timeout in case an instance waits for
synchronization for too long, or even forever. The default value is 300.

Closes #3674

@locker: moved dynamic config check to box/cfg.test.lua; code cleanup

@TarantoolBot document
Title: Introduce new configuration option replication_sync_timeout
After initial bootstrap or after replication configuration changes we
need to sync up with the replication quorum. Sometimes the sync can take
too long, or replication_sync_lag can be smaller than network latency,
so the replica gets stuck in a sync loop that can't be cancelled. To
avoid such situations, replication_sync_timeout can be used. When the
time set in replication_sync_timeout has passed, the replica enters
orphan state. Can be set dynamically. Default value is 300 seconds.

ca9fc33a

box: make replication_sync_lag option dynamic · 5eb5c181

Olga Arkhangelskaia authored 6 years ago

In #3427 replication_sync_lag should be taken into account during
replication reconfiguration. In order to configure replication properly
this parameter is made dynamic and can be changed on demand.

@locker: moved dynamic config check to box/cfg.test.lua

@TarantoolBot document
Title: replication_sync_lag option can be set dynamically
box.cfg.replication_sync_lag can now be set at any time.

5eb5c181

Sep 03, 2018

box: make sure box.ctl is available before box.cfg{} · ad6a2498

Konstantin Osipov authored 6 years ago

Ensure box.ctl.wait_ro() and box.ctl.wait_rw() produce
meaningful results even when invoked before box.cfg{}: wait
for box.cfg{} to complete and the server to enter the right state.
Add a test case.

In scope of gh-3159

ad6a2498

Aug 31, 2018

Merge remote-tracking branch 'origin/1.9' into 1.10 · 2b573166
Konstantin Osipov authored 6 years ago

2b573166

Merge branch '1.9' into 1.10 · 5f601d7f
Vladimir Davydov authored 6 years ago

5f601d7f