- May 04, 2017
-
-
Roman Tsisyk authored
Fixes #2412
-
Roman Tsisyk authored
-
Roman Tsisyk authored
Fixes #2265
-
Konstantin Nazarov authored
-
Vladimir Davydov authored
We do it in order to reuse the code that starts the iteration for restore. This looks ugly; instead, we'd better pass the contrived type and key as function arguments. Also, rename vy_{run,mem}_iterator_do_start() to vy_{run,mem}_iterator_start_from(). Closes #2405
-
Kirill Yukhin authored
* src/lua/init.c: Explicitly include <ctype.h>. `isspace` is used in the source. Normally <ctype.h> is included through <readline/readline.h>, but the native `Readline` on OSX doesn't include <ctype.h>.
-
Roman Tsisyk authored
The output doesn't fit into the 4MB limit for logs.
-
Roman Tsisyk authored
Apply a patch from Yura Sokolov: The default "fast" string hash function samples only a few positions in a string; the remaining bytes don't affect the function's result. The function performs well for short strings; however, long strings can yield extremely high collision rates. An adaptive scheme is implemented instead. Two hash functions are used simultaneously. A bucket is picked based on the output of the fast hash function. If an item is to be inserted into a collision chain longer than a certain threshold, another bucket is picked based on the stronger hash function. Since two hash functions are used simultaneously, insert has to consider two buckets. The second bucket is often NOT considered thanks to the bloom filter. The filter is rebuilt during the GC cycle.
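A minimal standalone sketch of the adaptive two-hash idea described above, not the actual hash implementation from the patch: the names (hash_fast, hash_strong, CHAIN_THRESHOLD, struct table) and the degenerate one-hash bloom bitmap are illustrative assumptions.

#include <stdint.h>
#include <string.h>

#define NBUCKETS 1024u        /* number of buckets in the toy table */
#define CHAIN_THRESHOLD 4     /* switch to the strong hash past this length */

/* "Fast" hash: samples only a few positions of the string. */
static uint32_t
hash_fast(const char *s, size_t len)
{
	uint32_t h = (uint32_t)len;
	if (len > 0) {
		h = h * 31 + (unsigned char)s[0];
		h = h * 31 + (unsigned char)s[len / 2];
		h = h * 31 + (unsigned char)s[len - 1];
	}
	return h;
}

/* "Strong" hash: looks at every byte (FNV-1a). */
static uint32_t
hash_strong(const char *s, size_t len)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < len; i++)
		h = (h ^ (unsigned char)s[i]) * 16777619u;
	return h;
}

struct node {
	const char *key;
	size_t len;
	struct node *next;
};

struct table {
	struct node *buckets[NBUCKETS];
	/* One-hash "bloom" bitmap of buckets that received strong-hash
	 * inserts; in the real patch the filter is rebuilt on GC. */
	uint8_t bloom[NBUCKETS / 8];
};

static int
chain_len(struct node *n)
{
	int l = 0;
	for (; n != NULL; n = n->next)
		l++;
	return l;
}

static void
table_insert(struct table *t, struct node *n)
{
	uint32_t b = hash_fast(n->key, n->len) % NBUCKETS;
	if (chain_len(t->buckets[b]) >= CHAIN_THRESHOLD) {
		/* Long chain: fall back to the strong hash and remember
		 * that fact in the filter. */
		b = hash_strong(n->key, n->len) % NBUCKETS;
		t->bloom[b / 8] |= 1u << (b % 8);
	}
	n->next = t->buckets[b];
	t->buckets[b] = n;
}

static struct node *
chain_find(struct node *n, const char *key, size_t len)
{
	for (; n != NULL; n = n->next)
		if (n->len == len && memcmp(n->key, key, len) == 0)
			return n;
	return NULL;
}

static struct node *
table_find(struct table *t, const char *key, size_t len)
{
	struct node *n =
		chain_find(t->buckets[hash_fast(key, len) % NBUCKETS], key, len);
	if (n != NULL)
		return n;
	/* The second bucket is probed only if the filter says the
	 * strong-hash bucket may contain anything at all. */
	uint32_t b = hash_strong(key, len) % NBUCKETS;
	if ((t->bloom[b / 8] & (1u << (b % 8))) == 0)
		return NULL;
	return chain_find(t->buckets[b], key, len);
}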
-
Roman Tsisyk authored
Highlights from Mike Pall [1]:

The major addition over beta2 is the LJ_GC64 mode JIT compiler backend contributed by Peter Cawley. Previously, only the x64 and ARM64 interpreters could be built in this mode. This mode removes the 32 bit limitation for garbage collected memory on 64 bit systems. LuaJIT for x64 can optionally be built for LJ_GC64 mode by enabling the -DLUAJIT_ENABLE_GC64 line in src/Makefile or via 'msvcbuild.bat gc64'.

Cisco Systems, Inc. and Linaro have sponsored the development of the JIT compiler backend for ARM64. Contributors are Djordje Kovacevic and Stefan Pejic from RT-RK, Charles Baylis from Linaro and Zheng Xu from ARM. ARM64 big endian mode is now supported, too.

Cisco Systems, Inc. has sponsored the development of the MIPS64 interpreter and JIT compiler backend. Contributors are Djordje Kovacevic and Stefan Pejic from RT-RK.

Peter Cawley has contributed the changes for full exception interoperability on Windows/x86 (32 bit).

François Perrad has contributed various extensions from Lua 5.2 and Lua 5.3. Note: some left-over compatibility defines for Lua 5.0 have been removed from the header files.

[1]: https://www.freelists.org/post/luajit/LuaJIT210beta3

In context of #2396
-
Roman Tsisyk authored
See https://github.com/LuaJIT/LuaJIT/commit/dc320ca70f In context of #2393
- May 03, 2017
-
-
Vladimir Davydov authored
Currently, we take a reference to vy_slice while waiting for IO in the run iterator to avoid use-after-free. Since a slice references a run, we also need a reference counter in vy_run. We can't use the same reference counter for counting the number of active slices, because it includes deleted slices that stay allocated only because of being pinned by iterators, hence on top of that we add vy_run->slice_count. And all this machinery exists solely for the sake of the run iterator!

This patch reworks this as follows. It removes vy_run->refs and vy_slice->refs, leaving only vy_run->slice_count, since it is needed for detecting unused runs. Instead it adds vy_slice->pin_count, similar to vy_mem->pin_count. As long as pin_count > 0, the slice can't be deleted. Whoever wants to delete the slice (compaction, split, index removal) has to wait until the slice is unpinned. The run iterator pins the slice while waiting for IO.

All in all, this should make the code easier to follow.
-
Alexandr Lyapunov authored
Patch f57151941ab9abc103c1d5f79d24c48238ab39cc introduced generation of reproducer code and dumping it to the log. The problem is that the code is initially built up in one big Lua string by repeated concatenation in a loop, and such use of Lua strings performs very poorly. Avoid repeated concatenation of Lua strings in tx_serial.test.
-
Vladimir Davydov authored
The test now generates Lua code that reproduces the found problem. The generated code is saved to the log. Copied from tx_serial.test.
-
Vladimir Davydov authored
We don't need a doubly-linked list for this. Singly-linked will do.
-
Vladimir Davydov authored
The loop over all ranges can take a long time, so we should yield once in a while in order not to stall the TX thread. The problem is we can't delete dumped in-memory trees until we've added a slice of the new run to each range, so if we yield while adding slices, a concurrent fiber will see a range with a slice containing statements that are still present in in-memory trees, which breaks the assumption made by the merge iterator that its sources don't have duplicates. Handle this by filtering out newly dumped runs by LSN in vy_read_iterator_add_disk().
-
Vladimir Davydov authored
Adding an empty slice to a range is pointless; besides, it triggers compaction for no reason, which is especially harmful in case of a time-series-like workload. On dump we can omit creating slices for ranges that are not intersected by the new run. Note how it affects the coalesce test: now we have to insert a statement into each range to trigger compaction, not just into the first one.
-
Vladimir Davydov authored
When replaying the local WAL, we filter out statements that were written to disk before restart by checking stmt->lsn against run->max_lsn: if the latter is greater, the statement was dumped. Although it is undoubtedly true, this check isn't quite correct. The thing is, run->max_lsn might be less than the actual LSN at the time the run was dumped, because max_lsn is computed as the maximum among all statements present in the run file, which doesn't include deleted statements. If this happens, we might replay some statements for nothing: they will cancel each other anyway. This may be dangerous, because the number of such statements can be huge. Suppose a whole run consists of deleted statements, i.e. there's no run file at all. Then we replay all statements in memory, which might result in OOM, because the scheduler isn't started until the local recovery is completed. To avoid that, introduce a new record type in the metadata log, VY_LOG_DUMP_INDEX, which is written on each index dump, even if no file is created, and contains the LSN of the dump. Use this LSN on recovery to detect statements that don't need to be replayed.
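A small illustrative sketch of the replay filter described above; the struct and function names (index_recovery_info, stmt_needs_replay) are made up for this example and are not the actual vinyl code.

#include <stdbool.h>
#include <stdint.h>

struct index_recovery_info {
	/* LSN recorded by the latest VY_LOG_DUMP_INDEX record,
	 * or -1 if the index has never been dumped. */
	int64_t dump_lsn;
};

/* A statement from the local WAL is rebuilt in memory only if it is
 * newer than the last dump; everything else already made it to disk
 * (or was cancelled by the dump) and can be skipped. */
static bool
stmt_needs_replay(const struct index_recovery_info *info, int64_t stmt_lsn)
{
	return stmt_lsn > info->dump_lsn;
}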
-
Vladimir Davydov authored
This reverts commit a366b5bb ("vinyl: keep track of empty runs"). The former single memory level design required knowledge of the max LSN of each run. Since this information can't be extracted from the run file in general (the newest key might have been deleted by compaction), we added it to the metadata log. Since we can get an empty run (i.e. a run w/o a file on disk) as a result of compaction or dump, we had to add a special per-run flag to the log, is_empty, so that we could store a run record while omitting loading the run file. Thanks to the concept of slices, this is not needed any more, so we can move min/max LSN back to the index file and remove the is_empty flag from the log. This patch starts by removing the is_empty flag.
-
Vladimir Davydov authored
Currently, we use a fixed size buffer, which can accommodate up to 64 records. With the single memory level it can easily overflow, as we create a slice for each range on dump in a single transaction, i.e. if there are > 64 ranges in an index, we may get a panic. So this patch makes vylog use a list of dynamically allocated records instead of a static array.
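An illustrative sketch of the approach (struct and function names below are placeholders for this example, not the actual vylog API): records buffered in a transaction are kept on a dynamically allocated singly-linked list, so their number is bounded only by available memory rather than by a 64-entry array.

#include <stdlib.h>

struct log_record {
	int type;			/* e.g. "insert slice", "drop run", ... */
	struct log_record *next;
};

struct log_tx {
	struct log_record *head;
	struct log_record *tail;
};

/* Append a record to the transaction buffer; there is no fixed capacity,
 * so a dump that creates a slice per range cannot overflow it. */
static int
log_tx_append(struct log_tx *tx, int type)
{
	struct log_record *rec = malloc(sizeof(*rec));
	if (rec == NULL)
		return -1;
	rec->type = type;
	rec->next = NULL;
	if (tx->tail != NULL)
		tx->tail->next = rec;
	else
		tx->head = rec;
	tx->tail = rec;
	return 0;
}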
-
Vladimir Davydov authored
Closes #2394
-
Roman Tsisyk authored
Closes #2386
-
Roman Tsisyk authored
Rename `remote_check` to `check_remote_arg` to follow conventions in schema.lua
-
Roman Tsisyk authored
Change conn:call() and conn:eval() API to accept a Lua table instead of varargs for function/expression arguments:

conn:call(func_name, arg1, arg2, ...) => conn:call(func_name, {arg1, arg2, ...}, opts)
conn:eval(expr, arg1, arg2, ...)      => conn:eval(expr, {arg1, arg2, ...}, opts)

This breaking change is needed to extend the call() and eval() API with per-request options, like `timeout` and `buffer` (see #2195):

c:call("echo", {1, 2, 3}, {timeout = 0.2})
c:call("echo", {1, 2, 3}, {buffer = ibuf})
ibuf.rpos, result = msgpack.ibuf_decode(ibuf.rpos)
result

Tarantool 1.6.x behaviour can be turned on by the `call_16` per-connection option:

c = net.connect(box.cfg.listen, {call_16 = true})
c:call('echo', 1, 2, 3)

This is a breaking change for 1.7.x.

Needed for #2285
Closes #2195
-
Konstantin Nazarov authored
Getting the space format should be safe, as it is tied to schema_id, and net.box makes sure that schema_id stays consistent. It means that when you receive a tuple from net.box, you may be sure that its space format is consistent with the remote. Fixes #2402
-
Roman Tsisyk authored
Fixes #2391
-
Konstantin Nazarov authored
Previously the format in space:format() wasn't allowed to be nil. In context of #2391
-
- May 02, 2017
-
-
Vladimir Davydov authored
- In-memory trees are now created per index, not per range as before.
- Dump is scheduled per index and writes the whole in-memory tree to a single run file. Upon completion it creates a slice for each range of the index.
- Compaction is scheduled per range as before, but now it doesn't include in-memory trees, only on-disk runs (via slices). Compaction and dump of the same index can happen simultaneously.
- Range split, just like coalescing, is done immediately by creating new slices and doesn't require long-term operations involving disk writes.
-
Vladimir Davydov authored
With a single in-memory tree per index, the read iterator will reopen the memory iterator per each range, as it already does in case of the txw and cache iterators, so we need to teach the memory iterator to skip to the statement following the last key returned by the read iterator. This patch therefore adds a new parameter to the memory iterator, before_first, which, if not NULL, makes it start iteration from the first statement following the key of before_first.
-
Vladimir Davydov authored
The key is created in the main cord so there's absolutely no point in deleting it in a worker thread. Moving key unref from cleanup to delete will simplify some of the workflows of the single memory level patch.
-
Vladimir Davydov authored
This parameter was needed for replication before it was redesigned. Currently, it is always false.
-
Vladimir Davydov authored
To ease recovery, vy_recovery_iterate() iterates over slices of the same range in chronological order. This is easy to do, because we always log slices of the same range in chronological order, as there can't be concurrent dump and compaction of the same range. However, this will not hold when the single memory level is introduced: a dump, which adds new slices to all ranges, may occur while compaction is in progress, so that when compaction is finished, the record corresponding to the slice created by compaction will appear after the slice created by dump, although the latter is newer. To prevent this from breaking the assumption made by iterators that newer slices are closer to the head of the vy_range->slices list, let's sort the list on recovery/join.
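A toy sketch of the invariant the sort restores (the node layout, the id-based ordering, and the function name are assumptions made for this example; the real code sorts struct vy_slice objects): newer slices end up closer to the head of the list.

#include <stdint.h>
#include <stddef.h>

struct slice_node {
	int64_t id;			/* assumed to grow with creation time */
	struct slice_node *next;
};

/* Insert keeping the list ordered by id, largest (newest) first, so
 * iterators can rely on "newer slices are closer to the head". */
static void
slice_list_insert_sorted(struct slice_node **head, struct slice_node *s)
{
	struct slice_node **p = head;
	while (*p != NULL && (*p)->id > s->id)
		p = &(*p)->next;
	s->next = *p;
	*p = s;
}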
-
Vladimir Davydov authored
Currently, on recovery we create and load a new vy_run per each slice, so if there's more than one slice created for a run, we will have the same run duplicated in memory. To avoid that, maintain a hash of all runs loaded during recovery of the current index, and look the run up there when a slice is created instead of creating a new run. Note, we don't need to do anything like this on initial join, as we delete the run right after sending it to the replica, so we can just create a new run each time we make a slice.
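For illustration only (the real code uses a hash table keyed by run id; the linear list and names below are simplifications made for this sketch): reuse an already loaded run instead of loading a duplicate for every recovered slice.

#include <stdint.h>
#include <stdlib.h>

struct loaded_run {
	int64_t id;
	struct loaded_run *next;
};

/* Return the run with the given id if it was already loaded during
 * recovery of this index, otherwise register a new one. */
static struct loaded_run *
run_lookup_or_create(struct loaded_run **loaded, int64_t run_id)
{
	for (struct loaded_run *r = *loaded; r != NULL; r = r->next)
		if (r->id == run_id)
			return r;	/* reuse, don't load twice */
	struct loaded_run *r = malloc(sizeof(*r));
	if (r == NULL)
		return NULL;
	r->id = run_id;
	r->next = *loaded;
	*loaded = r;
	return r;
}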
-
Vladimir Davydov authored
In order to recover run slices, we need to store info about them in the metadata log, so this patch introduces two new records:
- VY_LOG_INSERT_SLICE: takes the IDs of the slice, the range to insert the slice into, and the run the slice is for. It also takes the slice boundaries, because after coalescing two ranges a slice inserted into the resulting range may be narrower than the range.
- VY_LOG_DELETE_SLICE: takes the ID of the slice to delete.
Also, it renames VY_LOG_INSERT_RUN and VY_LOG_DELETE_RUN to VY_LOG_CREATE_RUN and VY_LOG_DROP_RUN.
Note, we no longer need to keep deleted ranges (and slices) in the log until garbage collection wipes them away, because they are not needed by deleted run records, which garbage collection targets.
-
Vladimir Davydov authored
The same keys will be used to specify slice boundaries, so let's call them in a neutral way. No functional changes.
-
Vladimir Davydov authored
Currently, there can't be more than one slice per run, but this will change once the single memory level is introduced. Then we will have to count the number of slices per each run so as not to unaccount the same run more than once as its slices are deleted. Unfortunately, we can't use vy_run->refs to count the number of slices created per each run, because, although vy_run->refs is incremented per each slice allocated for the run, it also covers slices that were removed from ranges and stay allocated only because of being pinned by open iterators. So we add one more counter to vy_run, slice_count, and introduce new helpers to be used for slice creation/destruction, vy_run_make_slice() and vy_run_destroy_slice(), which inc/dec the counter.
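A bare-bones sketch of the counting helpers mentioned above (struct layouts are heavily simplified, only the parts relevant to slice_count are shown, and the names are illustrative, not the actual vinyl code):

struct run_sketch {
	int slice_count;	/* slices currently created from this run */
};

struct slice_sketch {
	struct run_sketch *run;
};

/* Called when a slice is created for a run. */
static void
run_make_slice(struct run_sketch *run, struct slice_sketch *slice)
{
	slice->run = run;
	run->slice_count++;
}

/* Called when a slice is destroyed; returns 1 if this was the last
 * slice, i.e. the run may now be unaccounted, exactly once. */
static int
run_destroy_slice(struct slice_sketch *slice)
{
	return --slice->run->slice_count == 0;
}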
-
Vladimir Davydov authored
There's a sanity check in vy_range_needs_split() that assures the resulting ranges are not going to be empty: it checks the split key against the oldest run's min key. The check is not enough for the slice concept, because even if the split key is > min key, it still can be < the beginning of the slice.
-
Vladimir Davydov authored
We use run->info.keys to estimate the size of a new run's bloom filter. We use run->info.size to trigger range split/coalescing. If a range contains a slice that spans only a part of a run, we can't use run->info stats, so this patch introduces the following slice stats: the number of keys (for the bloom filter) and the size on disk (for split/coalesce). These two counters are not accurate, they are only estimates, because calculating exact numbers would require disk reads. Instead we simply take the corresponding run's stat and multiply it by the ratio of the slice page count to the run page count.
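The estimate in code form, as a small sketch (the struct and function names are invented for this example): the run-wide stats are simply scaled by the fraction of the run's pages covered by the slice.

#include <stdint.h>

struct slice_stat {
	uint64_t keys;	/* estimated number of keys in the slice */
	uint64_t size;	/* estimated size of the slice on disk */
};

static struct slice_stat
slice_stat_estimate(uint64_t run_keys, uint64_t run_size,
		    uint32_t run_page_count, uint32_t slice_page_count)
{
	struct slice_stat stat;
	/* Scale the run stats by slice_page_count / run_page_count. */
	stat.keys = run_keys * slice_page_count / run_page_count;
	stat.size = run_size * slice_page_count / run_page_count;
	return stat;
}

For example, a slice covering 10 of a run's 100 pages, where the run holds 1,000,000 keys and 50 MB on disk, would be estimated at 100,000 keys and 5 MB.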
-
Vladimir Davydov authored
There will be more than one slice per run, i.e. the same run will be used jointly by multiple ranges. To make sure that a run isn't accounted twice, separate run accounting from range accounting.
-
Vladimir Davydov authored
Make sure that we start iteration within the given slice and end it as soon as the current position leaves the slice boundaries. Note, the overhead caused by extra comparisons is only incurred if the slice has non-NULL boundaries, which is only the case if the run is shared among ranges.
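A simplified sketch of the extra boundary check (the key representation and comparator are placeholders; the real code compares statements using the index key definition): both comparisons vanish when the slice has NULL boundaries, i.e. when it covers the whole run.

#include <stdbool.h>
#include <stddef.h>

typedef int (*key_cmp_f)(const char *a, const char *b);

/* Return true if the key lies within [begin, end); a NULL boundary
 * means the slice is unbounded on that side, so no comparison is done. */
static bool
slice_contains_key(const char *begin, const char *end, const char *key,
		   key_cmp_f cmp)
{
	if (begin != NULL && cmp(key, begin) < 0)
		return false;
	if (end != NULL && cmp(key, end) >= 0)
		return false;
	return true;
}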
-