- Sep 26, 2017
-
-
Konstantin Osipov authored
No need to allocate a tuple buffer for every sequence value.
-
Vladimir Davydov authored
Currently, _sequence_data is written to the snapshot the same way as any other memtx space - the snapshot iterator simply walks over the space tuples and dumps them one-by-one. This is OK now, because _sequence_data is always up-to-date. However, after the next patch, its contents may become stale - due to technical reasons we won't update _sequence_data if a sequence is used for auto increment in a space. To make sure we still snapshot sequence values properly, let's introduce a special iterator for this space that walks over the sequence cache instead of the space contents. Needed for #389
-
Vladimir Davydov authored
Currently, Index::createSnapshotIterator() returns a pointer to struct iterator, so the iterator implementation has to deal with struct tuple, which might not always be an option. Since the iterator is only used for writing a snapshot, struct tuple isn't really necessary: raw msgpack would be enough. So let's introduce snapshot_iterator, which returns raw msgpack instead of struct tuple. It is required by the next patch to make a snapshot of the _sequence_data space. Needed for #389
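As a rough illustration of the idea, a msgpack-returning snapshot iterator can be reduced to a tiny virtual interface like the sketch below; the field and callback names are assumptions made for this example, not the exact definitions from the patch.

    #include <stdint.h>

    /*
     * Sketch of a snapshot iterator that yields raw msgpack instead
     * of struct tuple. Names and signatures are illustrative only.
     */
    struct snapshot_iterator {
        /*
         * Advance the iterator and return the next record as raw
         * msgpack in *data/*size, or set *data = NULL at the end.
         * Returns 0 on success, -1 on error.
         */
        int (*next)(struct snapshot_iterator *it,
                    const char **data, uint32_t *size);
        /* Release all resources held by the iterator. */
        void (*free)(struct snapshot_iterator *it);
    };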
-
Vladimir Davydov authored
-
Vladimir Davydov authored
Currently, the read iterator adds only final statements (i.e. statements returned to the user) to the conflict manager while intermediate DELETEs are silently ignored. This can result in the following select() inconsistency. Suppose there are REPLACE{10}, DELETE{20} and REPLACE{30} (it does not matter which source owns which statement in this example) and select({}) is called. The read iterator returns REPLACE{10} and is called again. It scans the sources, finds DELETE{20} and silently skips to the next statement. If its fiber yields now, and REPLACE{15} is inserted, the read iterator will ignore it, because its curr_stmt is {20}, and so it will return {30}, i.e. the select() will return [{10}, {30}]. The same select() called from the same transaction afterwards will, however, return [{10}, {20}, {30}], which is inconsistent with the previous result. This happens because interval [{10}, {20}] was not added to the conflict manager before the yield, and so the transaction was not sent to read view. To avoid this, we must track not only final statements, but also intermediate DELETEs.

There's a nuance to this patch regarding the REQ case that needs to be noted explicitly. Since source iterators can't handle REQ, in contrast to EQ, we change vy_read_iterator->iterator_type from REQ to LE and set the need_check_eq flag when the iterator is opened. The need_check_eq flag makes the read iterator check if the resulting statement equals the search key before returning it to the user. This check must be performed before vy_tx_track() is called, otherwise we might add a larger interval to the conflict manager than necessary, resulting in unnecessary conflicts (the vinyl/tx_gap_lock test will surely catch that). So this patch moves the call to vy_tx_track() along with the need_check_eq check immediately after vy_read_iterator_next_key(), irrespective of the statement type. The problem is that vy_cache_add() still has to be called on the final statement:

* WAS:
    vy_cache_add(result)
    need_check_eq check; may set result to NULL
    vy_tx_track(result)
* NOW:
    need_check_eq check; may set result to NULL
    vy_tx_track(result)
    vy_cache_add(result)

Besides the result, vy_cache_add() is passed the iterator type, so it should handle NULL properly, and it sure does in case of EQ, but not in case of REQ, because REQ was changed to LE. That is, when vy_cache_add() is passed NULL (search end marker) in case of REQ, it thinks it saw the leftmost statement across all keys, not the leftmost statement equal to the search key. I decided that the best way to overcome this is to make the read iterator handle REQ separately, just like EQ, and only replace it with LE right before passing it down to a source iterator. This gives vy_cache_add() enough info to handle the REQ case properly and makes the code look more symmetrical IMO.
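A minimal, self-contained sketch of the new call order; every type and helper below is invented for illustration, and only the ordering itself (eq-check, then conflict tracking, then cache insertion) reflects the patch.

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    struct stmt_sketch { const char *key; };

    /* Stand-ins for vy_tx_track() and vy_cache_add(). */
    static void track_read(const struct stmt_sketch *stmt) { (void)stmt; }
    static void cache_add(const struct stmt_sketch *stmt) { (void)stmt; }

    static const struct stmt_sketch *
    finish_step(const struct stmt_sketch *stmt, const char *search_key,
                bool need_check_eq)
    {
        /* 1. EQ/REQ check first, so that the interval registered in
         *    the conflict manager is never wider than necessary. */
        if (need_check_eq && stmt != NULL &&
            strcmp(stmt->key, search_key) != 0)
            stmt = NULL;
        /* 2. Register the read in the conflict manager. */
        track_read(stmt);
        /* 3. Only then add the final statement to the tuple cache. */
        cache_add(stmt);
        return stmt;
    }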
-
Vladimir Davydov authored
Suppose a source iterator was positioned at the minimal statement across all sources before yielding to read a disk source. Then it has front_id equal to the front_id of the read iterator, which indicates that the source iterator needs to be advanced on the next iteration. However, if the statement was removed during the yield, the iterator gets repositioned to the next statement by the restoration procedure (see vy_read_iterator_next_key()). In this case it isn't positioned at the minimal statement any more and hence must not be advanced on the next iteration, but its front_id remains the same, which means it will be advanced, possibly skipping a statement. The situation described above can happen only in the case of the tuple cache, because other sources are append-only. Obviously, to avoid this kind of trouble, we must reset the front_id if a source iterator is restored to a statement that is greater than the minimal statement.
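A toy sketch of the fix under the stated assumptions; the struct layout and the helper name below are hypothetical, not the actual vinyl code.

    #include <stdint.h>
    #include <stdbool.h>

    struct read_iterator_sketch { uint32_t front_id; };
    struct read_source_sketch  { uint32_t front_id; };

    /*
     * Called after a source was restored following a yield.
     * 'moved_past_min' is true if restoration repositioned the source
     * to a statement greater than the minimal one seen so far.
     */
    static void
    source_restored(struct read_iterator_sketch *itr,
                    struct read_source_sketch *src, bool moved_past_min)
    {
        if (moved_past_min) {
            /*
             * The source no longer points at the minimal statement,
             * so it must not be advanced on the next iteration:
             * reset its front_id so it differs from the iterator's.
             */
            src->front_id = 0;
        } else {
            /* Still at the front: keep it scheduled for advancement. */
            src->front_id = itr->front_id;
        }
    }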
-
Vladimir Davydov authored
We drop a reference to itr->curr_stmt before restoring the iterator position so we must not drop it while iterating over the cache tree. This mistake results in funky crashes in various places, e.g. in vy_stmt_compare(). Shame on me: I missed that during self review. Fixes: 073c8d07 ("vinyl: rework cache iterator restore")
-
Vladimir Davydov authored
Initially, the merge sort implementation was supposed to be shared between the read and write iterators, hence we got vy_merge_iterator. However, over time such a design decision proved to be inadequate, because it turned out that the two iterators can barely share a few lines - so different are the tasks they solve. The write iterator has already been rewritten without the use of vy_merge_iterator, so we can now fold the latter into vy_read_iterator. While we are at it, spruce up some comments a bit.
-
Vladimir Davydov authored
So as not to confuse it with vy_merge_iterator->curr_stmt.
-
Vladimir Davydov authored
- search_started is not needed, (curr_stmt != NULL) can be used instead.
- unique_optimization and one_value can be checked in-place.
-
Vladimir Davydov authored
Range iterator is trivial and only used by the read iterator, so it doesn't deserve a layer of abstraction. Fold it in.
-
Vladimir Davydov authored
Historically, we have this 'belong_range/range_ended' optimization in the read iterator: we move to the next range iff all sources that are marked as 'belong_range' are done. It is supposed to spare us a comparison with a range boundary on each iteration. However, with the introduction of the tuple cache and the unified memory level, it became more of a pain in the ass than a valuable asset. Thanks to it, we have to read all runs that belong to the current range before we can jump to the next range, even if all statements spanned by the current range are in the cache. Also, when all runs in a range have been read, we immediately jump to the next range and read its runs, even if the user intends to stop the iterator after reading a memory statement that is within the current range. So we effectively trade disk accesses for simple in-memory comparison operations, which is silly. Let's get rid of this "optimization" and make iteration over ranges straightforward: after the merge iterator returns a statement, compare it to the current range boundary and jump to the next range if it is outside.
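The straightforward scheme boils down to one comparison after each returned statement. A minimal sketch, with integer keys standing in for real statements and all names assumed for illustration:

    #include <stddef.h>

    /* Hypothetical range with an exclusive upper boundary. */
    struct range_sketch {
        int end;
        struct range_sketch *next;
    };

    /*
     * After the merge iterator returns a statement with key 'key',
     * jump to the next range if the key falls outside the current one.
     */
    static struct range_sketch *
    advance_range(struct range_sketch *curr, int key)
    {
        while (curr != NULL && key >= curr->end)
            curr = curr->next;
        return curr;
    }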
-
Vladislav Shpilevoy authored
* no-wrap: do not wrap the result onto a new line every 72 symbols;
* no-pad: do not add padding at the end of the result;
* urlsafe: no-wrap + no-pad + replace '+' -> '-' and '/' -> '_'.

Closes #2777
Closes #2478
Closes #2479
-
Alexander Turenko authored
The option is '-DENABLE_BUNDLED_ZSTD' and defaults to ON. There are two goals of making this conditional, both related to building Tarantool on Gentoo Linux from an ebuild:
* Avoid the bundled ZStd build issue without having to investigate it.
* Allow the user to choose between the system and the bundled library.
-
- Sep 25, 2017
-
-
Vladimir Davydov authored
A value stored in a light hash must be convertible to int and have such a size that sizeof(LIGHT(record)) is a power of two. Let's remove these limitations.
-
Vladimir Davydov authored
- Use 'static inline' instead of 'inline'
- Move GROW_INCREMENT enum out of LIGHT(record) struct
-
Vladimir Davydov authored
So as not to confuse the user, let's start reader threads when recovery completes only if there are Vinyl indexes. Otherwise, start them on the first Vinyl index creation. See #2664
-
Vladimir Davydov authored
Vinyl starts its worker threads on checkpoint, even if there are no Vinyl spaces to dump out there. This is a mess. Let's wake up the scheduler only if we actually have something to dump. BTW, this patch removes that annoying 'vinyl checkpoint done' message in case nothing was dumped. See #2664
-
Vladimir Davydov authored
Introduce the following functions:

  vy_scheduler_begin_checkpoint()
  vy_scheduler_wait_checkpoint()
  vy_scheduler_end_checkpoint()

and make the Vinyl API methods use them instead of playing with the scheduler directly. Rationale:
- It will help move the scheduler to a separate file in the future.
- We want to avoid waking the scheduler on checkpoint if Vinyl is not used, because the scheduler starts worker threads on the first wakeup. At the same time, we have to call vy_log_rotate() on checkpoint no matter if Vinyl is used or not (it needs to update its internal housekeeping). Factoring out the scheduler functions will allow us to achieve this goal neatly.
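A sketch of how an engine-level checkpoint could drive these three entry points; only the function names come from this commit, while the prototypes and the wrapper below are assumptions for illustration.

    struct vy_scheduler;

    /* Assumed prototypes; the real signatures may differ. */
    int  vy_scheduler_begin_checkpoint(struct vy_scheduler *scheduler);
    int  vy_scheduler_wait_checkpoint(struct vy_scheduler *scheduler);
    void vy_scheduler_end_checkpoint(struct vy_scheduler *scheduler);

    static int
    vinyl_checkpoint_sketch(struct vy_scheduler *scheduler)
    {
        /* Schedule a dump of all in-memory data. */
        if (vy_scheduler_begin_checkpoint(scheduler) != 0)
            return -1;
        /* Block until every dump scheduled above has completed. */
        int rc = vy_scheduler_wait_checkpoint(scheduler);
        /* Let the scheduler resume its normal dump/compaction work. */
        vy_scheduler_end_checkpoint(scheduler);
        return rc;
    }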
-
Vladimir Davydov authored
There's no point in maintaining the LSN of the last checkpoint in Vinyl: we have checkpoint_last() for retrieving it. Make this helper a little bit more friendly by allowing the vclock argument to be omitted, and use it wherever appropriate. Here's why I need to do that now. I'm planning to omit calling scheduler checkpoint functions altogether if there's nothing to dump (so as not to start worker threads and print the 'vinyl checkpoint done' message in case Vinyl is not used), but one of those functions updates last_checkpoint, which other pieces of code (gc primarily) rely on. I could factor last_checkpoint out of vy_scheduler, but why bother if I can simply use checkpoint_last() instead.
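A tiny usage sketch under the assumption that checkpoint_last() returns the LSN and accepts a NULL vclock after this patch; the exact types are assumed.

    #include <stdint.h>

    struct vclock;
    int64_t checkpoint_last(struct vclock *vclock);  /* assumed prototype */

    static int64_t
    last_checkpoint_lsn(void)
    {
        /* Only the LSN is needed, so the vclock out-argument is omitted. */
        return checkpoint_last(NULL);
    }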
-
Vladimir Davydov authored
Currently, box_set_checkpoint_count() not only changes the corresponding configuration option, but also invokes garbage collection. The problem is that it is called at the beginning of box_cfg(), before recovering engine data. At that time engines are not ready to invoke garbage collection (e.g. Vinyl hasn't scanned the vylog directory yet). Besides, the server might be started in the hot standby mode, in which case calling garbage collection is obviously wrong. To fix this, we could somehow detect if box_set_checkpoint_count() is called from box_cfg() and avoid invoking the garbage collector if so, but I think it isn't worth it. Instead, let's just not invoke garbage collection when this option changes. It's not a big deal - old files will be removed by the next snapshot anyway.
-
Vladimir Davydov authored
It's inappropriate to fail startup if the vinyl data directory does not exist, because vinyl might not even be in use. Move the check to the time when a vinyl index is opened. See #2664
-
Vladimir Davydov authored
If an index is dropped while a dump/compaction task is in progress, the task will be aborted, but the VY_LOG_DROP_RUN record won't be written to the metadata log, so the run will remain incomplete (VY_LOG_PREPARE_RUN) and pin the dropped index in the log until the server is restarted and all incomplete runs are purged. This results in occasional failures of the vinyl/gc test, which checks that vylog files are removed after all vinyl spaces are dropped. There's actually no reason not to log deletion of an incomplete run that belongs to a dropped index, so let's do that to fix the problem. Closes #2486
-
Vladimir Davydov authored
So that BPS tree can be redefined in files that include memtx_tree.h
-
- Sep 22, 2017
-
-
Roman Tsisyk authored
Fixes #2761
-
Roman Tsisyk authored
Apple's version of libcurl is outdated and buggy. Use libcurl from homebrew by default. Closes #2772
-
Eugene Leonovich authored
queue.start() is not needed anymore.
-
Roman Tsisyk authored
A follow-up to the previous commit.
-
Vladimir Davydov authored
The cache iterator restoration function is muddy and suboptimal:
- If last_stmt equals NULL, it doesn't attempt to change the iterator position although it should restart the iterator. Moreover, it doesn't even check EQ condition in this case, i.e. it may return a statement that doesn't satisfy the search criteria.
- It may restore iterator position to last_stmt, but the restoration semantics mandates that the returned statement must be strictly greater than last_stmt if the iterator was restored.
- If the iterator was invalidated, it first positions the iterator to the last known position (curr_stmt) and then steps back to the statement closest to last_stmt although it could jump to last_stmt immediately.
Let's rewrite it.
-
Vladimir Davydov authored
vy_merge_iterator_squash_upserts() returns an UPSERT, which is then converted to a REPLACE by vy_read_iterator_next(). Fold the latter into the former to make the workflow easier to follow. Also, remove useless suppress_error argument.
-
Vladimir Davydov authored
Not only the restore() callback can restore the iterator position after the source changed - next_key() and next_lsn() can do that too, which brings additional complexity. Do we really need it? Not really:
- We can avoid handling restoration in next_key() by simply calling restore() before it. This is cheap, as restore() is a no-op if the source was not changed.
- Regarding next_lsn(): when it comes to calling it, we iterate over sources from the most recent one to the oldest, which means that the only sources that may yield (on-disk runs) are guaranteed to be iterated last, after all mutable sources (txw, cache, mems), so none of the mutable sources needs to be restored in this case.
Simplify the read iterator as per the above.
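A sketch of the simplified control flow, with a hypothetical callback table standing in for the real source iterators; only the idea of calling restore() right before next_key() comes from the commit.

    struct source_sketch;

    struct source_iface_sketch {
        /* Re-position the source if it changed under our feet;
         * a cheap no-op when nothing changed. */
        int (*restore)(struct source_sketch *src);
        /* Advance the source to the next key. */
        int (*next_key)(struct source_sketch *src);
    };

    static int
    advance_source(struct source_sketch *src,
                   const struct source_iface_sketch *iface)
    {
        /* Restore first, so next_key() never has to handle it itself. */
        if (iface->restore(src) != 0)
            return -1;
        return iface->next_key(src);
    }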
-
Vladimir Davydov authored
Currently, ->next_lsn() may be (and actually is) called right after opening the iterator, which means that apart from advancing the iterator to the next version of the same statement it has to handle the case of starting iteration. This complicates the callback semantics so let's require ->next_key() to be called first.
-
- Sep 21, 2017
-
-
Roman Tsisyk authored
Closes #2774
-
- Sep 20, 2017
-
-
Vladimir Davydov authored
We generally do not allow dropping objects that have dependents.
-
Vladimir Davydov authored
In contrast to PostgreSQL, this method doesn't make sense in Tarantool, because we don't have sessions.
-
- Sep 19, 2017
-
-
Vladimir Davydov authored
This patch implements a new object type, persistent sequences. Sequences are created with the function box.schema.sequence.create(name, options). Options include min/max, start value, increment and cache size, just like in PostgreSQL, although 'cache' is ignored for now. All sequences can be accessed via box.sequence.<name>, similarly to spaces. To generate a sequence value, use the seq:next() method. To retrieve the last generated value, use seq:get(). A sequence value can also be reset to the start value or set to any other value using the seq:reset() and seq:set() methods. Needed for #389
-
Vladimir Davydov authored
Currently, it is done by space_cache_replace/delete, which violates encapsulation. Let's introduce a trigger that is fired after a change in a space definition is committed and use it to propagate changes to Lua. Patch by @kostja.
-
lenkis authored
-
Vladimir Davydov authored
-
- Sep 18, 2017
-
-
Konstantin Osipov authored
Implement a helper function to swap space triggers. Patch by @locker.
-