  1. May 04, 2017
  2. May 03, 2017
      vinyl: pin run slice in iterator · d40d611d
      Vladimir Davydov authored
      Currently, we take a reference to vy_slice while waiting for IO in the
      run iterator to avoid use-after-free. Since a slice references a run, we
      also need a reference counter in vy_run. We can't use the same reference
      counter for counting the number of active slices, because it also covers
      deleted slices that stay allocated only because they are pinned by
      iterators, so on top of that we add vy_run->slice_count. And all this
      machinery exists solely for the sake of the run iterator!
      
      This patch reworks this as follows. It removes vy_run->refs and
      vy_slice->refs, leaving only vy_run->slice_count since it is needed for
      detecting unused runs. Instead, it adds vy_slice->pin_count, similar to
      vy_mem->pin_count. As long as pin_count > 0, the slice can't be deleted.
      Whoever wants to delete the slice (compaction, split, index removal) has
      to wait until the slice is unpinned. The run iterator pins the slice
      while waiting for IO. All in all, this should make the code easier
      to follow.
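
      A minimal sketch of the pinning scheme, with the struct layout and
      helper names simplified (they are illustrative assumptions, not the
      actual vinyl source):

          #include <assert.h>
          #include <stdbool.h>

          /* Simplified stand-in for the real vy_slice. */
          struct vy_slice {
              int pin_count;   /* > 0 while a run iterator waits for IO */
          };

          static void
          vy_slice_pin(struct vy_slice *slice)
          {
              slice->pin_count++;
          }

          static void
          vy_slice_unpin(struct vy_slice *slice)
          {
              assert(slice->pin_count > 0);
              slice->pin_count--;
              /* The real code would wake up a fiber waiting to delete
               * the slice here. */
          }

          /* Compaction, split and index removal may delete the slice only
           * when this returns true; otherwise they have to wait. */
          static bool
          vy_slice_may_delete(const struct vy_slice *slice)
          {
              return slice->pin_count == 0;
          }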
      vinyl: tx_serial.test: do not append very long strings in lua. · 4b5c6d2b
      Alexandr Lyapunov authored
      Patch f57151941ab9abc103c1d5f79d24c48238ab39cc introduced
      generation of reproduction code and dumping it to the log. The
      problem is that the code was built up in one big Lua string by
      repeated concatenation in a loop, which performs very poorly
      because Lua strings are immutable and every concatenation copies
      the whole accumulated string. Avoid repeated concatenation of Lua
      strings in tx_serial.test.
      vinyl: improve tx_conflict.test · 1c543ab6
      Vladimir Davydov authored
      The test now generates Lua code that reproduces the found problem.
      The generated code is saved in the log.
      
      Copied from tx_serial.test
      vinyl: use stailq instead of rlist for linking log records · b6047032
      Vladimir Davydov authored
      We don't need a doubly-linked list for this. Singly-linked will do.
      vinyl: yield while adding run slices to ranges on dump · 10f0a1b5
      Vladimir Davydov authored
      The loop over all ranges can take a long time, so we should yield once
      in a while in order not to stall the TX thread. The problem is that we
      can't delete dumped in-memory trees until we've added a slice of the
      new run to each range, so if we yield while adding slices, a concurrent
      fiber may see a range with a slice containing statements that are still
      present in the in-memory trees, which breaks the merge iterator's
      assumption that its sources don't have duplicates. Handle this by
      filtering out newly dumped
      runs by LSN in vy_read_iterator_add_disk().
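
      The yield itself might look like the sketch below; the batch size,
      the vy_range_add_slice() helper, and the use of fiber_sleep(0) are
      illustrative assumptions:

          #include <tarantool/module.h>   /* fiber_sleep() */

          struct vy_range;
          /* Hypothetical helper: insert a slice of the new run into one
           * range. */
          void vy_range_add_slice(struct vy_range *range);

          /* Add a slice to each of `count` ranges, yielding every 128
           * ranges so that a huge index doesn't stall the TX thread. */
          static void
          add_run_slices(struct vy_range **ranges, int count)
          {
              for (int i = 0; i < count; i++) {
                  vy_range_add_slice(ranges[i]);
                  if ((i + 1) % 128 == 0)
                      fiber_sleep(0);   /* let other fibers run */
              }
          }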
      vinyl: don't create empty slices on dump · 7a18727a
      Vladimir Davydov authored
      Adding an empty slice to a range is pointless; besides, it triggers
      compaction for no reason, which is especially harmful in the case of a
      time-series-like workload. On dump we can omit creating slices for
      ranges that are not intersected by the new run. Note how it affects the
      coalesce test: now we have to insert a statement into each range to
      trigger compaction, not just into the first one.
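
      The intersection check itself reduces to two key comparisons. In
      this sketch, key_compare() and the NULL-means-unbounded convention
      are assumptions:

          #include <stdbool.h>
          #include <stddef.h>

          struct key;
          /* Hypothetical three-way key comparison. */
          int key_compare(const struct key *a, const struct key *b);

          /* Only ranges intersected by the new run [run_min, run_max] get
           * a slice; NULL range bounds stand for infinity. */
          static bool
          run_intersects_range(const struct key *run_min,
                               const struct key *run_max,
                               const struct key *begin,
                               const struct key *end)
          {
              if (end != NULL && key_compare(run_min, end) >= 0)
                  return false;   /* run begins at or after range end */
              if (begin != NULL && key_compare(run_max, begin) < 0)
                  return false;   /* run ends before the range begins */
              return true;
          }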
      vinyl: write index dump lsn to metadata log · aace2e14
      Vladimir Davydov authored
      When replaying local WAL, we filter out statements that were written to
      disk before restart by checking stmt->lsn against run->max_lsn: if the
      latter is greater, the statement was dumped. Although it is undoubtedly
      true, this check isn't quite correct. The thing is, run->max_lsn might
      be less than the actual LSN at the time the run was dumped, because max_lsn
      is computed as the maximum among all statements present in the run file,
      which doesn't include deleted statements. If this happens, we might
      replay some statements for nothing: they will cancel each other anyway.
      This may be dangerous, because the number of such statements can be
      huge. Suppose, a whole run consists of deleted statements, i.e. there's
      no run file at all. Then we replay all statements in-memory, which might
      result in OOM, because the scheduler isn't started until the local
      recovery is completed.
      
      To avoid that, introduce a new record type in the metadata log,
      VY_LOG_DUMP_INDEX, which is written on each index dump, even if no file
      is created, and contains the LSN of the dump. Use this LSN on recovery
      to detect statements that don't need to be replayed.
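
      With the new record, the replay-time filter becomes a single
      comparison (names in this sketch are illustrative):

          #include <stdbool.h>
          #include <stdint.h>

          /* dump_lsn comes from the last VY_LOG_DUMP_INDEX record; a
           * statement needs replaying only if it is newer than the last
           * dump. */
          static bool
          stmt_needs_replay(int64_t stmt_lsn, int64_t dump_lsn)
          {
              return stmt_lsn > dump_lsn;
          }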
      vinyl: delete empty runs right away · f5645474
      Vladimir Davydov authored
      This reverts commit a366b5bb ("vinyl: keep track of empty runs").
      
      The former single memory level design required knowledge of max LSN of
      each run. Since this information can't be extracted from the run file in
      general (the newest key might have been deleted by compaction), we added
      it to the metadata log. Since we can get an empty run (i.e. a run w/o a
      file on disk) as a result of compaction or dump, we had to add a special
      per-run flag to the log, is_empty, so that we could store a run record
      while skipping loading of the run file. Thanks to the concept of slices,
      this is not needed any more, so we can move min/max LSN back to the
      index file and remove the is_empty flag from the log. This patch starts
      by removing the is_empty flag.
      vinyl: allocate records for metadata log dynamically · b82d5ac8
      Vladimir Davydov authored
      Currently, we use a fixed-size buffer, which can accommodate up to 64
      records. With the single memory level it can easily overflow, as we
      create a slice for each range on dump in a single transaction, i.e. if
      there are > 64 ranges in an index, we may get a panic. So this patch
      makes vylog use a list of dynamically allocated records instead of a
      static array.
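
      A sketch of the list-based scheme, using the BSD sys/queue.h STAILQ
      macros to stand in for Tarantool's stailq:

          #include <stdlib.h>
          #include <sys/queue.h>

          /* A dynamically allocated record instead of a slot in a fixed
           * 64-entry array: a transaction is now limited only by memory. */
          struct vy_log_record {
              int type;   /* e.g. an insert-slice or delete-slice record */
              STAILQ_ENTRY(vy_log_record) in_tx;
          };

          STAILQ_HEAD(vy_log_tx, vy_log_record);

          static int
          vy_log_tx_append(struct vy_log_tx *tx, int type)
          {
              struct vy_log_record *record = malloc(sizeof(*record));
              if (record == NULL)
                  return -1;
              record->type = type;
              STAILQ_INSERT_TAIL(tx, record, in_tx);
              return 0;
          }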
      63ceb698
      Export box_index_key_def() to public C API · f42e1c15
      Roman Tsisyk authored
      Closes #2386
      net.box: minor renames · 0a0bc4d6
      Roman Tsisyk authored
      Rename `remote_check` to `check_remote_arg` to follow conventions
      in schema.lua
      net.box: remove varargs from call() and eval() · 386df3d3
      Roman Tsisyk authored
      Change conn:call() and conn:eval() API to accept Lua table instead of
      varargs for function/expression arguments:
      
          conn:call(func_name, arg1, arg2, ...)
            =>
          conn:call(func_name, {arg1, arg2, ...}, opts)
      
          conn:eval(expr, arg1, arg2, ...)
            =>
          conn:eval(expr, {arg1, arg2, ...}, opts)
      
      This breaking change is needed to extend the call() and eval() API with
      per-request options, like `timeout` and `buffer` (see #2195):
      
          c:call("echo", {1, 2, 3}, {timeout = 0.2})
      
          c:call("echo", {1, 2, 3}, {buffer = ibuf})
          ibuf.rpos, result = msgpack.ibuf_decode(ibuf.rpos)
          result
      
      Tarantool 1.6.x behaviour can be turned on by the `call_16` per-connection option:
      
          c = net.connect(box.cfg.listen, {call_16 = true})
          c:call('echo', 1, 2, 3)
      
      This is a breaking change for 1.7.x.
      
      Needed for #2285
      Closes #2195
      Add support for space:format() to net.box · 905d44b0
      Konstantin Nazarov authored
      Getting the space format should be safe, as it is tied to schema_id,
      and net.box makes sure that schema_id stays consistent.
      
      It means that when you receive a tuple from net.box, you may be sure
      that its space format is consistent with the remote.
      
      Fixes #2402
      Add a test case for space:format() · 9e7749eb
      Roman Tsisyk authored
      Fixes #2391
      Fix error when setting space format · 75b8cc9b
      Konstantin Nazarov authored
      Previously the format in space:format() wasn't allowed to be nil.
      
      In context of #2391
  3. May 02, 2017
      vinyl: implement single memory level · de27e278
      Vladimir Davydov authored
       - In-memory trees are now created per index, not per range as before.
       - Dump is scheduled per index and writes the whole in-memory tree to a
         single run file. Upon completion it creates a slice for each range of
         the index.
       - Compaction is scheduled per range as before, but now it doesn't
         include in-memory trees, only on-disk runs (via slices). Compaction
         and dump of the same index can happen simultaneously.
       - Range split, just like coalescing, is done immediately by creating
         new slices and doesn't require long-term operations involving disk
         writes.
      vinyl: teach memory iterator to skip statements on start · 1e154ede
      Vladimir Davydov authored
      With a single in-memory tree per index, the read iterator will reopen
      the memory iterator for each range, as it already does for the txw and
      cache iterators, so the memory iterator needs to learn to skip to the
      statement following the last key returned by the read iterator. This
      patch adds a new parameter to the memory iterator, before_first, which,
      if not NULL, makes it start iteration from the first statement
      following the key of before_first.
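
      Conceptually, before_first selects the start position as in the
      sketch below; the tree API here is a simplified assumption:

          #include <stddef.h>

          struct key;
          struct stmt;
          struct mem_tree;

          /* Hypothetical lookups into the ordered in-memory tree. */
          struct stmt *mem_tree_first(struct mem_tree *tree);
          struct stmt *mem_tree_upper_bound(struct mem_tree *tree,
                                            const struct key *key);

          /* With before_first set, begin at the first statement strictly
           * after it, so the read iterator can reopen this source for
           * every range without seeing the same keys twice. */
          static struct stmt *
          mem_iterator_start(struct mem_tree *tree,
                             const struct key *before_first)
          {
              if (before_first == NULL)
                  return mem_tree_first(tree);
              return mem_tree_upper_bound(tree, before_first);
          }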
      vinyl: move write_iterator->key unref from cleanup to delete · f9c0b99e
      Vladimir Davydov authored
      The key is created in the main cord so there's absolutely no point in
      deleting it in a worker thread. Moving key unref from cleanup to delete
      will simplify some of the workflows of the single memory level patch.
      vinyl: drop only_disk read iterator argument · da3f11a5
      Vladimir Davydov authored
      This parameter was needed for replication before it was redesigned.
      Currently, it is always false.
      vinyl: sort slices by lsn on recovery · 3a6c2ff3
      Vladimir Davydov authored
      To ease recovery, vy_recovery_iterate() iterates over slices of the same
      range in the chronological order. It is easy to do, because we always
      log slices of the same range in the chronological order, as there can't
      be concurrent dump and compaction of the same range. However, this will
      not hold once the single memory level is introduced: a dump, which adds
      new slices to all ranges, may occur while compaction is in progress, so
      when compaction finishes, the record corresponding to the slice created
      by compaction will appear after the record for the slice created by
      dump, although the latter is newer. To prevent this from breaking the
      assumption made by iterators that newer slices are closer to the head of
      vy_range->slices list, let's sort the list on recovery/join.
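
      The sort itself is an ordinary descending sort by LSN; the lsn field
      on the slice is an assumption of this sketch:

          #include <stdint.h>
          #include <stdlib.h>

          struct vy_slice {
              int64_t lsn;   /* LSN of the dump/compaction that created
                              * the slice (field name is an assumption) */
          };

          /* Newer slices must end up closer to the head of
           * vy_range->slices, hence descending LSN order. */
          static int
          slice_cmp(const void *a, const void *b)
          {
              const struct vy_slice *sa = *(const struct vy_slice **)a;
              const struct vy_slice *sb = *(const struct vy_slice **)b;
              return sa->lsn < sb->lsn ? 1 : sa->lsn > sb->lsn ? -1 : 0;
          }

          static void
          sort_slices(struct vy_slice **slices, size_t count)
          {
              qsort(slices, count, sizeof(*slices), slice_cmp);
          }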
      vinyl: don't recover the same run for each its slice · 10a739b5
      Vladimir Davydov authored
      Currently, on recovery we create and load a new vy_run for each slice,
      so if more than one slice was created for a run, we will have the same
      run duplicated in memory. To avoid that, maintain a hash of all runs
      loaded during recovery of the current index, and look the run up there
      when a slice is created instead of creating a new run.
      
      Note, we don't need to do anything like this on initial join, as we
      delete the run right after sending it to the replica, so we can just
      create a new run each time we make a slice.
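
      The lookup-or-create pattern might look as follows; the real patch
      uses a hash table, while this sketch scans a plain array for
      brevity:

          #include <stddef.h>
          #include <stdint.h>

          struct vy_run {
              int64_t id;
              /* ... */
          };

          /* Look up a run already loaded during recovery of the current
           * index; on a miss the caller loads the file and registers the
           * new run. */
          static struct vy_run *
          recovery_lookup_run(struct vy_run **runs, size_t count,
                              int64_t run_id)
          {
              for (size_t i = 0; i < count; i++) {
                  if (runs[i]->id == run_id)
                      return runs[i];   /* reuse: don't load it twice */
              }
              return NULL;
          }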
      vinyl: store run slices in metadata log · f18dbce6
      Vladimir Davydov authored
      In order to recover run slices, we need to store info about them in the
      metadata log, so this patch introduces two new records:
       - VY_LOG_INSERT_SLICE: takes IDs of the slice, the range to insert the
         slice into, and the run the slice is for. It also takes the slice
         boundaries, because after coalescing two ranges a slice inserted into
         the resulting range may be narrower than the range.
       - VY_LOG_DELETE_SLICE: takes ID of the slice to delete.
      
      Also, it renames VY_LOG_INSERT_RUN and VY_LOG_DELETE_RUN to
      VY_LOG_CREATE_RUN and VY_LOG_DROP_RUN.
      
      Note, we no longer need to keep deleted ranges (and slices) in the log
      until garbage collection wipes them away, because they are not needed
      by the deleted run records that garbage collection targets.
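
      A sketch of the payloads the two records carry; field names are
      illustrative, not the actual vy_log_record layout:

          #include <stdint.h>

          struct key;

          struct vy_log_insert_slice {
              int64_t slice_id;
              int64_t range_id;          /* range the slice goes into */
              int64_t run_id;            /* run the slice is for */
              const struct key *begin;   /* slice bounds: may be narrower */
              const struct key *end;     /* than the range after coalescing */
          };

          struct vy_log_delete_slice {
              int64_t slice_id;
          };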
      vinyl: rename range_{begin,end} keys to {begin,end} in vy_log · b80a2cf8
      Vladimir Davydov authored
      The same keys will be used to specify slice boundaries, so let's call
      them in a neutral way. No functional changes.
      vinyl: count number of slices per run · d716f54f
      Vladimir Davydov authored
      Currently, there can't be more than one slice per run, but this will
      change once the single memory level is introduced. Then we will have to
      count the number of slices per run so that we don't unaccount the same
      run more than once as slices are deleted. Unfortunately, we can't use
      vy_run->refs to count the number of slices created for each run,
      because, although vy_run->refs is incremented for each slice
      allocated for the run, it also covers slices that were removed from
      ranges and stay allocated only because they are pinned by open
      iterators. So we add one more counter to vy_run, slice_count, and
      introduce new helpers to be used for slice creation/destruction,
      vy_run_make_slice() and vy_run_destroy_slice(), which inc/dec the
      counter.
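
      A minimal sketch of the two helpers, with memory management and
      error handling simplified:

          #include <assert.h>
          #include <stdlib.h>

          struct vy_run {
              int slice_count;   /* slices created and not yet destroyed */
          };

          struct vy_slice {
              struct vy_run *run;
          };

          /* All slice creation/destruction goes through these helpers so
           * that the per-run counter stays exact. */
          static struct vy_slice *
          vy_run_make_slice(struct vy_run *run)
          {
              struct vy_slice *slice = calloc(1, sizeof(*slice));
              if (slice == NULL)
                  return NULL;
              slice->run = run;
              run->slice_count++;
              return slice;
          }

          static void
          vy_run_destroy_slice(struct vy_slice *slice)
          {
              assert(slice->run->slice_count > 0);
              slice->run->slice_count--;   /* run unused when this is 0 */
              free(slice);
          }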
      vinyl: make check for empty range on split more thorough · 09d56944
      Vladimir Davydov authored
      There's a sanity check in vy_range_needs_split() that assures the
      resulting ranges are not going to be empty: it checks the split key
      against the oldest run's min key. The check is not enough for the slice
      concept, because even if the split key is > the min key, it can still
      be < the beginning of the slice.
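
      The extra condition amounts to one more comparison; key_compare()
      and the NULL-means-unbounded convention are assumptions of this
      sketch:

          #include <stdbool.h>
          #include <stddef.h>

          struct key;
          /* Hypothetical three-way key comparison. */
          int key_compare(const struct key *a, const struct key *b);

          /* The split key must lie strictly inside the oldest slice:
           * being above the run's min key is not enough, because the
           * slice may begin later than the run. */
          static bool
          split_key_is_sane(const struct key *split_key,
                            const struct key *slice_begin)
          {
              return slice_begin == NULL ||
                     key_compare(split_key, slice_begin) > 0;
          }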
      vinyl: add slice size estimate · 3740ff9d
      Vladimir Davydov authored
      We use run->info.keys to estimate the size of a new run's bloom filter.
      We use run->info.size to trigger range split/coalescing. If a range
      contains a slice that spans only a part of a run, we can't use run->info
      stats, so this patch introduces the following slice stats: number of
      keys (for the bloom filter) and the size on disk (for split/coalesce).
      These two counters are not accurate; they are only estimates, because
      calculating exact numbers would require disk reads. Instead we simply
      take the corresponding run's stat and multiply it by
      
          slice page count / run page count
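
      In code the estimate is a simple proportion (names are
      illustrative; run_pages is assumed to be non-zero):

          #include <stdint.h>

          struct vy_run_info {
              uint64_t keys;   /* number of keys in the run */
              uint64_t size;   /* run size on disk */
          };

          /* Scale the run's stats by the fraction of its pages the slice
           * spans; exact numbers would require disk reads. */
          static void
          slice_estimate_stats(const struct vy_run_info *run_info,
                               uint32_t slice_pages, uint32_t run_pages,
                               uint64_t *keys, uint64_t *size)
          {
              *keys = run_info->keys * slice_pages / run_pages;
              *size = run_info->size * slice_pages / run_pages;
          }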
      vinyl: separate accounting of ranges and runs · fccaa3f1
      Vladimir Davydov authored
      There will be more than one slice per run, i.e. the same run will be
      used jointly by multiple ranges. To make sure that a run isn't accounted
      twice, separate run accounting from range accounting.
      vinyl: teach run iterator to respect slice boundaries · c0bb544d
      Vladimir Davydov authored
      Make sure that we start iteration within the given slice and end it as
      soon as the current position leaves the slice boundaries. Note, the
      overhead caused by extra comparisons is only incurred if the slice has
      non-NULL boundaries, which is only the case if the run is shared among
      ranges.
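
      For forward iteration the boundary check amounts to the sketch
      below; key_compare() and the NULL-means-unbounded convention are
      assumptions:

          #include <stdbool.h>
          #include <stddef.h>

          struct key;
          /* Hypothetical three-way key comparison. */
          int key_compare(const struct key *a, const struct key *b);

          /* NULL bounds skip the comparisons entirely, so runs that are
           * not shared among ranges pay no extra cost. */
          static bool
          key_within_slice(const struct key *key,
                           const struct key *begin, const struct key *end)
          {
              if (begin != NULL && key_compare(key, begin) < 0)
                  return false;   /* before the slice begins */
              if (end != NULL && key_compare(key, end) >= 0)
                  return false;   /* reached the slice end: stop */
              return true;
          }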