Commits · 5e28b70a8f4a7f1ebafbd20dadc7be893e2b0660 · core / tarantool

Nov 03, 2017

vinyl: do not use rmean for calculating quota use rate · 5e28b70a

We have a timer for updating watermark every second. Let's reuse it for
quota use rate calculation. This will allow us to get rid of legacy
vinyl statistics.

Also, let's use an exponentially weighted moving average (EWMA) for
calculating the average. It is a more efficient and common method, which
makes it easy to tune the period over which the value is averaged.
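
A minimal sketch of the idea, not the actual vinyl code: a timer that fires once per second folds the bytes consumed since the previous tick into an EWMA. The struct, the function names, and the `weight` parameter are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state; this is not the real vy_quota layout. */
struct use_rate {
	/* Smoothing factor in (0, 1]; a larger value averages over a shorter period. */
	double weight;
	/* Current average, in bytes per timer period. */
	double rate;
	/* Bytes consumed since the last timer tick. */
	size_t used_curr;
};

/* Called by quota consumers whenever they use some memory. */
static void
use_rate_account(struct use_rate *r, size_t bytes)
{
	r->used_curr += bytes;
}

/* Called by the (assumed) once-per-second timer. */
static void
use_rate_on_timer(struct use_rate *r)
{
	assert(r->weight > 0 && r->weight <= 1);
	r->rate = (1 - r->weight) * r->rate +
		  r->weight * (double)r->used_curr;
	r->used_curr = 0;
}
```

Tuning the averaging period then comes down to changing a single constant (`weight` here) instead of resizing a window of samples.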

5e28b70a

box: add box.NULL alias for msgpack.NULL · 043ba278
Roman Tsisyk authored 7 years ago
```
Follow up #1557
```
043ba278

Nov 02, 2017

Reject attempts to create non-string index part with collation · 4c10b711

Alexandr Lyapunov authored 7 years ago

Collation was simply ignored for non-string parts, which could
confuse users.

Generate a readable error in this case.

Fix #2862 part 2

4c10b711

Make collation work with scalar fields · 2601fcd3

Alexandr Lyapunov authored 7 years ago

Currently, collation is silently ignored for type='scalar' parts.

Use collation for string scalar fields.

Fix #2862 part 1

2601fcd3

Show collation in lua index object · 76b6e110
Alexandr Lyapunov authored 7 years ago
```
Show collation name (if present) in space.index.name.parts[no].

Fix #2862 part 4
```
76b6e110

Make collation by name lookup case insensitive · 3f9a73a7

Alexandr Lyapunov authored 7 years ago

test:create_index('unicode_s1', {parts = {{1, 'STR', collation =
'UNICODE'}}}) will work now.

Fix #2862 part 3

3f9a73a7

schema: allow to store smaller field count than specified in format · 2f53308e

Vladislav Shpilevoy authored 7 years ago

If a field is not indexed and there are no more indexed or non-nullable
fields after it, then allow skipping it on insertion. The value of such
a field looks like MP_NIL, but MP_NIL is not explicitly stored.
Named access to this field in Lua returns nil.

Example:
format =
{{'field1'},
 {'field2'},
 {'field3', is_nullable = true},
 {'field4', is_nullable = true}}

t = space:insert{1, 2} -- ok.

t.field1 == 1, t.field2 == 2, t.field3 == nil, t.field4 == nil
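
The rule above can be sketched in C as follows. This is an illustration under assumptions, not the actual tuple format code: `struct field_def` and both function names are hypothetical. The minimal field count is one past the last field that is indexed or not nullable; any tuple with at least that many fields is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified field definition. */
struct field_def {
	const char *name;
	bool is_nullable;
	bool is_indexed;
};

/* One past the last field that must be physically present in a tuple. */
static size_t
format_min_field_count(const struct field_def *fields, size_t count)
{
	size_t min = 0;
	for (size_t i = 0; i < count; i++) {
		if (fields[i].is_indexed || !fields[i].is_nullable)
			min = i + 1;
	}
	return min;
}

/* A tuple may omit trailing fields that are nullable and not indexed. */
static bool
tuple_field_count_is_valid(const struct field_def *fields, size_t count,
			   size_t tuple_field_count)
{
	assert(fields != NULL || count == 0);
	return tuple_field_count >= format_min_field_count(fields, count);
}
```

For the four-field format in the example, the minimal count is 2, so inserting {1, 2} passes the check and inserting {1} does not.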

Closes #2880

2f53308e

schema: allow to store custom fields in format's field definition · f688ef36

Vladislav Shpilevoy authored 7 years ago

Some users store custom keys in format fields. But the current
opts parser does not allow storing any unknown keys. Let's allow it.

Example:
format = {}
format[1] = {name = 'field1', type = 'unsigned', custom_field = 'custom_value'}
s = box.schema.create_space('test', {format = format})
s:format()[1].custom_field == 'custom_value'

Closes #2839

f688ef36

vinyl: forbid DDL/DML if wal is disabled · b75ab0e0

Vladimir Davydov authored 7 years ago

Using DML/DDL on a Vinyl index with wal_mode = 'none' is likely to
result in unrecoverable errors like:

  F> can't initialize storage: Invalid VYLOG file: Index 512/0 created twice

To avoid data corruption in case the user tries to use an existing Vinyl
database in conjunction with wal_mode = 'none', let's explicitly forbid
it until we figure out how to fix it.

Workaround #2278

b75ab0e0

vinyl: ignore quota timeout during bootstrap from master · 19ac10c9

Vladimir Davydov authored 7 years ago

During initial join, a replica receives all data accumulated on the
master for its whole lifetime, which may be quite a lot. If the network
connection is fast enough, the replica might fail to keep up with dumps,
in which case replication fails with ER_VY_QUOTA_TIMEOUT. To avoid that,
let's ignore quota timeout until bootstrap is complete.

Note, replication may still fail during the 'subscribe' stage for the
same reason, but it's unlikely, because the rate at which the master
sends data is limited by the number of requests served by the master per
unit of time, and it should become nearly impossible once throttling
is introduced (See #1862).

Closes #2873

19ac10c9

vinyl: abort bootstrap if vinyl directory is not empty · 4d796a8c

Vladimir Davydov authored 7 years ago

If the user sets snap_dir to an empty directory by mistake while leaving
vinyl_dir the same, tarantool will still bootstrap, but there are likely
to be errors like:

  vinyl.c:835 E> 512/0: dump failed: file './512/0/00000000000000000001.run' already exists
  vy_log.c:1095 E> failed to rotate metadata log: file './00000000000000000005.vylog' already exists

Even worse, it may eventually fail to restart with:

  vy_log.c:886 E> ER_MISSING_SNAPSHOT: Can't find snapshot

To avoid that, let's check the vinyl_dir on bootstrap and abort if it
contains vylog files left from previous setups.

Closes #2872

4d796a8c

vinyl: embed scheduler in env · 76481655

Vladimir Davydov authored 7 years ago

The only reason why it was allocated is that struct vy_scheduler was
defined after struct vy_env, which is not a problem any more. Embedding
it allows us to drop the extra argument to vy_scheduler_need_dump_f().

76481655

vinyl: move scheduler implementation to separate source file · 9f140646

Vladimir Davydov authored 7 years ago

It's a big independent entity, so let's isolate its code in
a separate file.

While we are at it, add missing comments to vy_scheduler
struct members.

9f140646

vinyl: remove dependency of scheduler on environment · 32da4d7c

Vladimir Davydov authored 7 years ago

Instead of storing a pointer to vy_env in vy_scheduler, let's:

 - Add pointers to tx_manager::read_views and vy_env::run_env to
   vy_scheduler struct. They are needed to create a write iterator
   for a dump/compaction task.

 - Add a callback to struct vy_scheduler that is called upon dump
   completion to free memory. This allows us to eliminate accesses to
   vy_env::quota and vy_env::allocator from vy_scheduler code.

 - Move the assert that assures that the scheduler isn't started during
   local recovery from vy_scheduler_f() to vy_env_quota_exceeded_cb()
   callback so that we don't need to access vy_env::status from the
   scheduler code. Note, after this change we have to set vy_env::status
   to VINYL_ONLINE before calling vy_quota_set_limit(), because the
   latter might schedule a dump.

 - Check if we have anything to dump from vy_begin_checkpoint() instead
   of vy_scheduler_begin_checkpoint().

This will allow us to isolate the scheduler code in a separate file.

32da4d7c

vinyl: rework dump trigger · f927a87b

Vladimir Davydov authored 7 years ago

Currently, dump is triggered (by bumping the memory generation) by the
scheduler fiber while quota consumers just wake it up. As a result, the
scheduler depends on the quota - it has to access the quota to check if
it needs to trigger dump. In order to move the scheduler to a separate
source file, we need to get rid of this dependency.

Let's rework this code as follows:

 - Remove vy_scheduler_trigger_dump() from vy_scheduler_peek_dump(). The
   scheduler fiber now just dumps all indexes eligible for dump and
   completes dump by bumping dump_generation. It doesn't trigger dump by
   bumping generation anymore. As a result, it doesn't need to access
   the quota.

 - Make quota consumers call vy_scheduler_trigger_dump() instead of just
   waking up the scheduler. This function will become a public one once
   the scheduler is moved out of vinyl.c. The function logic is changed
   a bit. First, besides bumping generation, it now also wakes up the
   scheduler fiber. Second, it does nothing if dump is already in
   progress or can't be scheduled because of concurrent checkpoint.
   In the latter case, though, it sets a special flag that will force
   the scheduler to trigger dump upon checkpoint completion.

 - vy_scheduler_begin_checkpoint() can't use vy_scheduler_trigger_dump()
   anymore due to additional checks added to the function, so it bumps
   the generation directly. This looks fine.

 - Such a design has a subtlety regarding how quota consumers notify the
   scheduler and how they are notified back about available quota.
   In extreme cases, quota released by a dump may not be enough to
   satisfy all consumers, in which case we need to reschedule dump.
   Since the scheduler doesn't check the quota anymore and doesn't
   reschedule dump, it has to be done by the remaining consumers. So
   consumers have to call the quota_exceeded_cb callback (which now
   triggers a dump) every time they are woken up and see there's not
   enough quota. The vy_quota_use() is reworked accordingly.

   Also, since the quota usage may exceed the limit (because of
   vy_quota_force_use()), the quota usage may remain higher than the
   limit after a dump completion, in which case vy_quota_release()
   doesn't wake up consumers and again there's no one to trigger another
   dump. So we must wake up all consumers every time vy_quota_release()
   is called.
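
The consumer side of this design can be modelled roughly as below. This is a single-threaded sketch under assumptions, not the real vy_quota code: the struct layout, `quota_use()`, `quota_release()`, and `trigger_dump()` are hypothetical, the "dump" frees a fixed 30 bytes synchronously, and `max_wakeups` stands in for the quota timeout. In real code the consumer would sleep on a condition variable between attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct quota;
typedef void (*quota_exceeded_f)(struct quota *q);

/* Hypothetical, simplified quota object. */
struct quota {
	size_t limit;
	size_t used;
	/* Invoked by a consumer that finds the quota exhausted. */
	quota_exceeded_f exceeded_cb;
};

/* Give memory back; never drops below zero. */
static void
quota_release(struct quota *q, size_t size)
{
	q->used -= size < q->used ? size : q->used;
	/*
	 * Real code would wake up ALL waiting consumers here, even if
	 * used is still above limit, so that one of them re-triggers dump.
	 */
}

/* Try to reserve size bytes, re-triggering dump on every wakeup. */
static bool
quota_use(struct quota *q, size_t size, int max_wakeups)
{
	assert(q->exceeded_cb != NULL);
	while (q->used + size > q->limit) {
		if (max_wakeups-- <= 0)
			return false; /* analogue of a quota timeout */
		q->exceeded_cb(q);
		/* Real code: wait here until quota_release() signals. */
	}
	q->used += size;
	return true;
}

static int dumps_triggered;

/* Stand-in for a dump that frees 30 bytes of memory. */
static void
trigger_dump(struct quota *q)
{
	dumps_triggered++;
	quota_release(q, 30);
}
```

The point of the loop is that nobody but the waiting consumers re-checks the quota, so each wakeup that still finds too little quota has to trigger another dump.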

f927a87b

vinyl: move throttling code to vy_quota · 0d751d84

Vladimir Davydov authored 7 years ago

quota_cond, which is used for throttling quota consumers, doesn't really
belong to vy_scheduler. It would fit much better in vy_quota. Let's move
it there. This also allows us to remove the two callbacks from vy_quota
struct, quota_throttled_cb and quota_released_cb, and make the code more
straightforward.

While we are at it, let's also rename vy_scheduler_quota_exceeded_cb()
to vy_env_quota_exceeded_cb().

0d751d84

Nov 01, 2017

memtx: handle xlog_flush() error when writing snapshot · d542e0f9

Vladimir Davydov authored 7 years ago

If xlog_flush() fails, box.snapshot() will still succeed, but
recovery from such an incomplete snapshot will fail. Fix it and
add the corresponding test case.

d542e0f9

Oct 31, 2017

net_box: update space objects instead of replacing them · 300bc7da

Ilya authored 7 years ago

Update netbox.space objects in-place instead of re-creating them
on every schema change.

Closes #2401

300bc7da

string: introduce string.strip · 6007e6dc

Ilya authored 7 years ago

* Add support for string.strip, string.lstrip, and string.rstrip

Closes #2785

6007e6dc

Oct 30, 2017

replication: finish bootstrap from 1.6 only when replica id is received · c383806d

Vladimir Davydov authored 7 years ago

In 1.7 the join procedure consists of two phases: initial, during which
we send the last snapshot, and final, when we send xlogs written after
the snapshot. Between the two phases, the replica uuid is added to the
cluster table on the master, so by the time join is finished, the
replica should have received its id.

However, on 1.6 there's no final join phase, instead the master expects
the replica to receive xlogs upon subscription. As a result, the replica
doesn't receive its id until it sends the subscribe request. This is
not expected by 1.7 clients - they fail with ER_UNKNOWN_REPLICA.

Fix this problem by making 1.7 replicas proceed to subscription and wait
until the id is received before completing bootstrap from 1.6 master.

Closes #2702

c383806d

Oct 27, 2017

Update small to fix memory leak · 685416d9

Vladimir Davydov authored 7 years ago

There was a bug in small garbage collection that resulted in tuple
leak in case box.snapshot() races with DML. The leak was indicated
by constantly growing box.slab.info().items_used. Update the small
library to fix it.

Closes #2842

685416d9

Revert "box: start database with default without error" · 72c5fe23
Roman Tsisyk authored 7 years ago
```
This reverts commit 8b6cefd0.

This feature is too good to be pushed into 1.7. Sorry.
```
72c5fe23

Oct 26, 2017
- tarantoolctl: add instance name to log messages · fda3002a
  Roman Tsisyk authored 7 years ago
  
  Improve usability.
  fda3002a
- tarantoolctl: don't chmod/chown control socket in usermode · 13c83f93
  Roman Tsisyk authored 7 years ago
  
  Follow up 9297ec36 "chmod and chown control socket"
  13c83f93
- Travis CI: enable Fedora 26 packages · 8c55b499
  Roman Tsisyk authored 7 years ago
  
  8c55b499
- Update test plans, minimal test and test-run · b7744576
  Alexander Turenko authored 7 years ago
  
  Fixes #2852. Fixes #2849.
  b7744576
- http: fix headers parsing · 69e94271
  Ilya authored 7 years ago
  
  * Fix parsing in case of unexpected headers
  * Fix duplicating headers in case of retransmitting responses

  A workaround for #2836
  69e94271
- lua: introduce tuple:tomap() · 9c434a9f
  Vladislav Shpilevoy authored 7 years ago
  
  tomap() creates a Lua table with both name and number indexes. Each
  named field is stored by its name AND by its index in the tuple. For
  example, if a tuple is {'a', 'b', 'c'} and its format is
  {'field1', 'field2'}, then t.field1 is the same as t[1], and t.field2
  is the same as t[2]. Unnamed fields can be accessed only by their
  indexes: in the example above, 'c' can be accessed only as t[3].

  Closes #2821
  9c434a9f
- box: start database with default without error · 8b6cefd0
  Ilya authored 7 years ago
  
  * Call box.cfg() instead of raising an error on the first access to box.XXX

  Fixes #2559
  8b6cefd0
- string: fix error messages · 713ba869
  Ilya authored 7 years ago
  
  Fix typos in type checks in the string module.

  Closes #2775
  713ba869
- Fix timeout calculation in latch_lock_timeout() · 57a7fe31
  Georgy Kirichenko authored 7 years ago
  
  57a7fe31
Oct 25, 2017
- xrow: fix error in request_str · 302a9a3d
  Vladislav Shpilevoy authored 7 years ago
  
  Check request fields for NULL before printing.
  302a9a3d
- iproto: warn when we reach the limit on input or working fibers · 0b9fa597
  Konstantin Osipov authored 7 years ago
  
  0b9fa597
Oct 24, 2017
- netbox: add on_disconnect/connect trigger · 40a7b0e2
  Vladislav Shpilevoy authored 7 years ago
  
  Needed to remove monitoring fibers from shard and to use only the netbox API to track whether a connection is closed or reopened.

  Closes #2858
  40a7b0e2
- replication: log a row in case of apply_row error · 0854397f
  Vladislav Shpilevoy authored 7 years ago
  
  Closes #2779
  0854397f
Oct 19, 2017

Add support for tarantoolctl rocks make · 2427c360

Konstantin Nazarov authored 7 years ago

luarocks make <rockspec> allows one to build a rock from a local
directory.

In addition to the "rocks make" argument, one additional option is
needed in tarantoolctl: --chdir. This is because we need to build
inside the rock directory, but output the result to
<project_root>/.rocks.

Implements #2846

2427c360

Fix compilation on Mac OS · 39276fe1

Vladimir Davydov authored 7 years ago

> src/box/txn.c:454:40: error: '_Alignof' applied to an expression is a GNU extension [-Werror,-Wgnu-alignof-expression]
>                 diag_set(OutOfMemory, sizeof(*svp) + alignof(*svp) - 1,
>                                                      ^

Do not try to be smart and guess allocation size using alignof.

> src/box/memtx_tree.c:391:11: error: comparison of unsigned enum expression < 0 is always false [-Werror,-Wtautological-compare]
>         if (type < 0 || type > ITER_GT) { /* Unsupported type */
>             ~~~~ ^ ~

> src/box/vinyl_index.c:184:29: error: comparison of unsigned enum expression < 0 is always false [-Werror,-Wtautological-compare]
>         if (type > ITER_GT || type < 0) {
>                               ~~~~ ^ ~

Move the check for illegal params (i.e. 'type < 0') to the box API.
In index callbacks, only check that the iterator type is supported
by the index.
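
A hedged sketch of the resulting split of responsibilities. The enum below is a simplified stand-in for the real iterator type list, and both function names are hypothetical. The API boundary validates the raw integer supplied by the caller, where a negative value is still representable, while the index callback only asks whether the type is supported.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in; the real enum has more members. */
enum iterator_type {
	ITER_EQ = 0,
	ITER_REQ,
	ITER_ALL,
	ITER_LT,
	ITER_LE,
	ITER_GE,
	ITER_GT,
	ITER_OVERLAPS,
	iterator_type_MAX
};

/* API-level check on the raw value: 'type < 0' is meaningful for an int. */
static bool
api_iterator_type_is_valid(int type)
{
	return type >= 0 && type < iterator_type_MAX;
}

/* Index-level check: this (assumed) tree index supports types up to ITER_GT. */
static bool
tree_index_supports(enum iterator_type type)
{
	/* The range was already validated at the API boundary. */
	assert(api_iterator_type_is_valid((int)type));
	return type <= ITER_GT;
}
```

Because the index callback never compares the enum against zero, the tautological-compare warning no longer applies there.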

39276fe1

index: introduce and use internal iterator API · d4d6b613

Vladimir Davydov authored 7 years ago

Since we already have the index_create_iterator() method to create an
iterator, the API basically consists of two functions: iterator_next()
and iterator_delete(). While iterator_delete() is just a trivial wrapper
around iterator::free callback, iterator_next() is more than that: it
also checks schema version and invalidates the iterator in case there
was a DDL that affected the index. Previously, this was done only by the
box API, but the overhead of this check seems to be negligible, so it
makes sense to do it in the internal API so that an internal API user
doesn't need to care about DDL once they have opened an iterator.
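
A minimal sketch of the two functions described above. It assumes a global `schema_version` counter that is bumped on DDL and an iterator that records the version it was created under; it also assumes that an invalidated iterator simply reports end of data. The `counting_iterator` is a hypothetical demo implementation, not part of the described API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the global schema version, bumped on every DDL. */
static uint32_t schema_version;

struct tuple { int id; };

struct iterator {
	int (*next)(struct iterator *it, struct tuple **ret);
	void (*free)(struct iterator *it);
	/* Schema version at the time the iterator was created. */
	uint32_t schema_version;
};

/* Check for DDL first, then delegate to the engine callback. */
static int
iterator_next(struct iterator *it, struct tuple **ret)
{
	assert(it->next != NULL);
	if (it->schema_version != schema_version) {
		/* Assumption: an invalidated iterator looks exhausted. */
		*ret = NULL;
		return 0;
	}
	return it->next(it, ret);
}

/* Trivial wrapper around the iterator::free callback. */
static void
iterator_delete(struct iterator *it)
{
	it->free(it);
}

/* Hypothetical demo iterator that yields a fixed number of tuples. */
struct counting_iterator {
	struct iterator base;
	int left;
	int freed;
};

static struct tuple demo_tuple = { 1 };

static int
counting_next(struct iterator *it, struct tuple **ret)
{
	struct counting_iterator *c = (struct counting_iterator *)it;
	if (c->left > 0) {
		c->left--;
		*ret = &demo_tuple;
	} else {
		*ret = NULL;
	}
	return 0;
}

static void
counting_free(struct iterator *it)
{
	((struct counting_iterator *)it)->freed = 1;
}
```

With the check living in `iterator_next()`, an internal caller gets the same DDL protection as a box API caller without any extra code.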

Needed for #2776

d4d6b613

space: drop execute_select virtual method · f1a75fe4

Vladimir Davydov authored 7 years ago

This virtual method was added to make use of the 'position' optimization
implemented in memtx. Since the optimization was removed recently, we
don't need it anymore.

f1a75fe4

index: implement generic versions of min(), max(), and count() · 4e3bb53e

Vladimir Davydov authored 7 years ago

The primary reason for these methods to be implemented differently
for memtx and vinyl was the 'position' optimization exploited by
the memtx engine: since selects from memtx do not yield, we could
use a preallocated iterator there.

Now, as the 'position' optimization became redundant and was
removed due to the switch to memory pools for iterator allocations,
the only idiosyncrasy left in the memtx implementation is the count()
optimization: count() falls back on size() for ITER_ALL. Since this
optimization consists of just a few lines of code, we don't really
need memtx_index_count() co-used by all memtx index implementations:
we can implement it in each memtx index separately.

That being said, let us:
 - implement generic versions of min(), max(), and count();
 - make vinyl, memtx, and sysview engines use generic versions of
   the above-mentioned methods if appropriate;
 - remove memtx_index.[hc].

As a side effect, this patch enables min(), max(), and count() in
the sysview engine, but that is not bad considering that this engine
implements a general-purpose iterator for its indexes.
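
The split between a generic count() and an engine-specific shortcut might look like the sketch below. It is an illustration under assumptions: the array-backed `struct index`, the two-member enum, and all function names are hypothetical, and the generic version scans the keys directly where the real one would go through an iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the iterator type list. */
enum iterator_type { ITER_ALL, ITER_GE };

/* Hypothetical index over a sorted array of integer keys. */
struct index {
	const int *keys;
	size_t size;
};

static size_t
index_size(const struct index *idx)
{
	return idx->size;
}

/* Generic version: visit every entry and count the matching ones. */
static size_t
generic_index_count(const struct index *idx, enum iterator_type type, int key)
{
	assert(idx->keys != NULL || idx->size == 0);
	size_t n = 0;
	for (size_t i = 0; i < idx->size; i++) {
		if (type == ITER_ALL || idx->keys[i] >= key)
			n++;
	}
	return n;
}

/* Engine-specific override: ITER_ALL falls back on size(). */
static size_t
memtx_like_index_count(const struct index *idx, enum iterator_type type,
		       int key)
{
	if (type == ITER_ALL)
		return index_size(idx);
	return generic_index_count(idx, type, key);
}
```

The override is only a few lines long, which is why it can live in each index implementation rather than in a shared helper.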

4e3bb53e