- Jun 08, 2017
-
-
Vladimir Davydov authored
The space truncate rework done by the next patch requires the ability to swap data stored on disk between two indexes on recovery, so as not to reload all runs every time a space gets truncated. Since we can't swap the content of two rb trees (due to rbt_nil), convert vy_index->tree to a pointer.
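A minimal sketch of why the pointer makes the swap safe, assuming illustrative struct and field names rather than the actual vinyl definitions:

```c
struct index_tree;

struct vy_index {
        struct index_tree *tree; /* was an embedded tree, now a pointer */
        /* ... other fields ... */
};

/*
 * Swapping two embedded rb trees is unsafe: internal sentinel
 * pointers (rbt_nil) would keep referring to the old locations.
 * Swapping plain pointers has no such problem.
 */
static void
vy_index_swap_trees(struct vy_index *a, struct vy_index *b)
{
        struct index_tree *tmp = a->tree;
        a->tree = b->tree;
        b->tree = tmp;
}
```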
-
Roman Tsisyk authored
-
Georgy Kirichenko authored
Lock the schema before any changes to the space and index dictionaries and unlock it only after commit or rollback. This allows many parallel data definition statements. Issue #2075
-
Georgy Kirichenko authored
We need to lock the box schema while editing a DDL space. The lock must be taken before any changes are made to a DDL space, so a before trigger is a good place to issue it. See #2075
-
Vladimir Davydov authored
We must store at least one snapshot, otherwise we couldn't recover after restart, so if checkpoint_count is set to 0, garbage collection is disabled. This contravenes the convention followed everywhere else in tarantool: if we want an option value (timeout, checkpoint count, etc.) to be infinite, we set it to a very big number, not to 0. Make checkpoint_count comply.
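For illustration, a hedged sketch of the resulting validation rule (the function name is made up; the real check lives in the box.cfg machinery):

```c
#include <stdint.h>

/*
 * Illustrative rule: checkpoint_count must be >= 1, so garbage
 * collection is always enabled.  To keep checkpoints "forever",
 * set the option to a very big number instead of 0.
 */
static int
checkpoint_count_check(int64_t checkpoint_count)
{
        return checkpoint_count >= 1 ? 0 : -1;
}
```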
-
Vladimir Davydov authored
The current gc implementation has a number of flaws:

- It tracks checkpoints, not consumers, which makes it impossible to identify the reason why gc isn't invoked. All we can see is the number of users of each particular checkpoint (reference counter), while it would be good to know what references it (replica or backup).
- While tracking checkpoints suits backup and initial join well, it doesn't look good when used for subscribe, because a replica is supposed to track a vclock, not a checkpoint.
- Tracking checkpoints from box/gc also violates encapsulation: checkpoints are, in fact, memtx snapshots, so they should be tracked by the memtx engine, not by gc, as they are now. This results in atrocities, like having two snap xdirs - one in memtx, another in gc.
- Garbage collection is invoked by a special internal function, box.internal.gc.run(), which is passed the signature of the oldest checkpoint to save. This function is then used by the snapshot daemon to maintain the configured number of checkpoints. This brings unjustified complexity to the snapshot daemon implementation: instead of just calling box.snapshot() periodically, it has to take on responsibility for invoking the garbage collector with the right signature. This also means that garbage collection is disabled unless the snapshot daemon is configured to be running, which is confusing, as the snapshot daemon is disabled by default.

So this patch reworks box/gc as follows:

- Checkpoints are now tracked by the memtx engine and can be accessed via a new module box/src/checkpoint.[hc], which provides simple wrappers around the corresponding MemtxEngine methods.
- box/gc.[hc] now tracks not checkpoints, but individual consumers that can be registered, unregistered, and advanced. Each consumer has a human-readable name displayed by box.internal.gc.info():

      tarantool> box.internal.gc.info()
      ---
      - consumers:
        - name: backup
          signature: 8
        - name: replica 885a81a9-a286-4f06-9cb1-ed665d7f5566
          signature: 12
        - name: replica 5d3e314f-bc03-49bf-a12b-5ce709540c87
          signature: 12
        checkpoints:
        - signature: 8
        - signature: 11
        - signature: 12
      ...

- box.internal.gc.run() is removed. Garbage collection is now invoked automatically by box.snapshot() and doesn't require the snapshot daemon to be up and running.
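A minimal sketch of a consumer registry along these lines, in C (names, types, and the list layout are illustrative, not the actual box/gc API). Tracking the minimum over named consumers is what makes "why is gc stuck" answerable directly from gc.info():

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One registered consumer: a replica or a backup. */
struct gc_consumer {
        char name[64];          /* human-readable name for gc.info() */
        int64_t signature;      /* oldest vclock signature still in use */
        struct gc_consumer *next;
};

static struct gc_consumer *consumers;

struct gc_consumer *
gc_consumer_register(const char *name, int64_t signature)
{
        struct gc_consumer *c = calloc(1, sizeof(*c));
        if (c == NULL)
                return NULL;
        snprintf(c->name, sizeof(c->name), "%s", name);
        c->signature = signature;
        c->next = consumers;
        consumers = c;
        return c;
}

/*
 * Called from box.snapshot(): everything strictly older than the
 * minimum signature over all consumers and retained checkpoints
 * may be removed.
 */
int64_t
gc_min_signature(int64_t oldest_checkpoint_signature)
{
        int64_t min = oldest_checkpoint_signature;
        for (struct gc_consumer *c = consumers; c != NULL; c = c->next)
                if (c->signature < min)
                        min = c->signature;
        return min;
}
```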
-
Konstantin Osipov authored
Fix spelling and rephrase a few comments.
-
Vladislav Shpilevoy authored
If the update operation changes a field with number >= 64, the column mask of the update op is set to UINT64_MAX. Let's use the last bit of the column mask as a flag meaning that all fields with numbers >= 63 could be changed. Then, if the indexed positions are less than 64, the column mask always works. Closes #1716
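A short sketch of the scheme, assuming 0-based field numbers and hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bits 0..62 track individual fields; bit 63 is a flag meaning
 * "some field with number >= 63 may have changed".
 */
static inline uint64_t
column_mask_set_fieldno(uint64_t mask, uint32_t fieldno)
{
        if (fieldno >= 63)
                return mask | ((uint64_t) 1 << 63);
        return mask | ((uint64_t) 1 << fieldno);
}

/*
 * A secondary index whose parts all have numbers < 63 can safely
 * test the mask: if no indexed bit is set, the update doesn't
 * touch the index.
 */
static inline bool
key_update_can_be_skipped(uint64_t key_mask, uint64_t update_mask)
{
        return (key_mask & update_mask) == 0;
}
```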
-
Vladislav Shpilevoy authored
Remove waiting for the end of the dump of the secondary indexes. According to commit 0d99714f, the primary index is always dumped after the secondary ones, so we can wait for the primary alone instead of all indexes.
-
- Jun 07, 2017
-
-
Konstantin Osipov authored
Replace coeio_init() with coio_init(): remove the coeio prefix and use coio for everything.
-
Konstantin Osipov authored
We use the _init() suffix for library-wide initializers.
-
Konstantin Osipov authored
Originally, coio was used for socket io based on libev, and coeio for everything based on libeio. This turns out to be hard to remember, as the previous commit demonstrates. Some of the APIs are already mixed up and use the wrong prefix (see, for example, struct coio_task). Merge the two sets of calls together and use the same prefix for all. First patch in the set, renaming the files and adjusting header guards and makefiles.
-
Vladimir Davydov authored
Since xdir_collect_garbage() uses unlink() to remove files, vylog calls it from a coeio thread so as not to block tx. This is error-prone, because the vylog xdir is modified from tx, so if garbage collection races with log rotation, the behavior is undefined. Let's add an argument to xdir_collect_garbage() specifying whether it should unlink files with unlink() or coeio_unlink(), and call it directly from tx, without the use of a coeio task. Also, add a return code to xdir_collect_garbage(), which is set to 0 on success or to -1 if the function failed to delete a file. It will be used in the following patch.
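A hedged sketch of what the reworked prototype could look like (simplified; the real declaration lives in the xlog module):

```c
#include <stdbool.h>
#include <stdint.h>

struct xdir;

/*
 * Remove files with signatures older than @signature.  When
 * @use_coio is true, files are unlinked via the eio thread pool
 * (safe for contexts that must not block); otherwise plain
 * unlink() is used and the caller may block briefly.
 * Returns 0 on success, -1 if a file could not be deleted.
 */
int
xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio);
```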
-
Roman Tsisyk authored
Move all xrow encoding/decoding functions into a common place. In context of #2507
-
Roman Tsisyk authored
Pass schema_version explicitly. In context of #2507
-
Vladislav Shpilevoy authored
Actually, the iproto header is the same as the xrow header, so its encoding should be moved to xrow.h. Needed for #2285
-
Vladimir Davydov authored
This is a cleanup, no functional changes intended.
-
Vladimir Davydov authored
This is a cleanup, no functional changes intended.
-
Vladislav Shpilevoy authored
Follow up #2492
-
Vladislav Shpilevoy authored
Return a bps_tree iterator positioned on the inserted element from the bps_tree insert. It is useful when one needs to insert a statement and then iterate from the inserted position, as, for example, in #1988. Closes #2492
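Since bps_tree is macro-generated, here is a self-contained toy analogue of the pattern in C: an insert that reports the position of the new element so the caller can iterate from it without a second lookup.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy analogue: insert into a sorted array and return the position
 * of the new element.  Assumes the array has spare capacity for
 * one more element.
 */
static size_t
sorted_insert(int *arr, size_t *len, int value)
{
        size_t pos = 0;
        while (pos < *len && arr[pos] < value)
                pos++;
        memmove(&arr[pos + 1], &arr[pos], (*len - pos) * sizeof(*arr));
        arr[pos] = value;
        (*len)++;
        return pos; /* "iterator" positioned on the inserted element */
}

int
main(void)
{
        int arr[8] = {1, 3, 7};
        size_t len = 3;
        /* Insert 5, then iterate from the inserted position. */
        for (size_t i = sorted_insert(arr, &len, 5); i < len; i++)
                printf("%d\n", arr[i]); /* prints 5, 7 */
        return 0;
}
```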
-
- Jun 05, 2017
-
-
bigbes authored
Remove extra definitions from the public API in the following files:
* src/box/index.h
* src/box/tuple_format.h
-
Konstantin Osipov authored
Convert port.cc to plain C
-
Vladislav Shpilevoy authored
-
Vladislav Shpilevoy authored
Let's write the enum max key without an explicit +1 after the penultimate key. C automatically assigns the max key the value of the penultimate key + 1.
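For illustration, the two equivalent spellings side by side (the enum and its keys are made up):

```c
/* Before: the max key is written with an explicit +1. */
enum iter_type_old {
        ITER_OLD_EQ = 0,
        ITER_OLD_GT,
        ITER_OLD_LT,
        iter_type_old_MAX = ITER_OLD_LT + 1,
};

/* After: C assigns penultimate key + 1 automatically. */
enum iter_type {
        ITER_EQ = 0,
        ITER_GT,
        ITER_LT,
        iter_type_MAX,
};
```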
-
- Jun 02, 2017
-
-
alyapunov authored
-
alyapunov authored
Extend stmt_stream and implement the write iterator using the stmt_stream virtual interface.
-
Alexandr Lyapunov authored
Use the write iterator that was implemented in the previous commit. Remove the old version of the iterator. Fixes #2312
-
Alexandr Lyapunov authored
Introduce the concept of a stream - a simple iterator over a mem or a run that is used for iterating over all the tuples in the source. Create streams for mem and run. Create a new simple write iterator in a separate file.
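A minimal sketch of such a virtual stream interface in C (simplified and with illustrative names; the real vinyl vtable has more methods):

```c
#include <stddef.h>

struct tuple;
struct vy_stmt_stream;

/* Virtual method table shared by the mem and run streams. */
struct vy_stmt_stream_iface {
        int  (*start)(struct vy_stmt_stream *stream);
        int  (*next)(struct vy_stmt_stream *stream, struct tuple **ret);
        void (*stop)(struct vy_stmt_stream *stream);
};

/* Base "class": concrete streams embed it as the first member. */
struct vy_stmt_stream {
        const struct vy_stmt_stream_iface *iface;
};

/* The write iterator can consume any source through the interface. */
static int
stream_drain(struct vy_stmt_stream *stream)
{
        struct tuple *stmt;
        if (stream->iface->start(stream) != 0)
                return -1;
        while (stream->iface->next(stream, &stmt) == 0 && stmt != NULL)
                ; /* merge/apply the statement here */
        stream->iface->stop(stream);
        return 0;
}
```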
-
Alexandr Lyapunov authored
Add the vy_page_index_find_page function and comment it well. Use it in slice boundary determination and in the run iterator. Fix slice page boundary determination: now it works correctly when a key spreads over several pages.
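A hedged sketch of the lookup idea: binary-search for the last page whose minimal key is <= the search key (keys are plain ints here for illustration; vinyl compares serialized statements):

```c
#include <stddef.h>

/*
 * Return the index of the page that may contain @key: the last
 * page whose minimal key is <= key.  Assumes page_count > 0.
 */
static size_t
page_index_find_page(const int *page_min_keys, size_t page_count, int key)
{
        size_t begin = 0, end = page_count;
        while (begin + 1 < end) {
                size_t mid = begin + (end - begin) / 2;
                if (page_min_keys[mid] <= key)
                        begin = mid;
                else
                        end = mid;
        }
        return begin;
}
```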
-
- Jun 01, 2017
-
-
Vladimir Davydov authored
Names of the cbus endpoints used in relays must be unique. To assure that, we name endpoints after replica->id. This isn't quite correct, as a replica can be deleted from the cluster table while its relay remains active. If the replica's id happens to be reused by another replica, we will get a cbus endpoint name conflict, which typically results in a crash. Fix this by naming endpoints after the relay's in-memory address. Closes #2497
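A sketch of the naming scheme (the format string and helper are illustrative):

```c
#include <stdio.h>

struct relay;

/*
 * Derive a unique cbus endpoint name from the relay's in-memory
 * address: unlike replica->id, a heap address can't be reused
 * while the relay it belongs to is still alive.
 */
static void
relay_endpoint_name(char *buf, size_t size, const struct relay *relay)
{
        snprintf(buf, size, "relay-%p", (const void *) relay);
}
```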
-
Konstantin Osipov authored
A local reference could sometimes be garbage collected and sometimes not, leading to flaky test results.
-
Konstantin Osipov authored
A follow-up for 06099023 (gh-2491, invalid default value of wal_max_size).
-
Vladimir Davydov authored
Currently, we log index creation before writing the index row to WAL, via VinylEngine::addPrimaryKey() and VinylEngine::buildSecondaryKey(). This is incorrect, because it isn't guaranteed that the WAL write will succeed. If it doesn't, the index creation is aborted, but the index record is left in the metadata log. At best this will result in garbage in the metadata log, at worst we will get a permanent failure to rotate the log. The latter can happen if the index id in the metadata log (i.e. LSN) is reused by another index; e.g. after the following piece of Lua code is executed by tarantool, box.snapshot() will keep failing:

    tarantool> box.cfg{}
    ---
    ...
    tarantool> s = box.schema.space.create("test", {engine='vinyl'})
    ---
    ...
    tarantool> box.error.injection.set('ERRINJ_WAL_IO', true)
    ---
    - ok
    ...
    tarantool> _ = box.space.test:create_index('pk')
    ---
    - error: Failed to write to disk
    ...
    tarantool> box.error.injection.set('ERRINJ_WAL_IO', false)
    ---
    - ok
    ...
    tarantool> _ = box.space.test:create_index('pk')
    ---
    ...
    tarantool> box.snapshot()
    ---
    - error: 'Invalid VYLOG file: Duplicate index id 2'
    ...

Fix this by moving vinyl index creation logging to the new Index method commitCreate(), which is called after the WAL write. Since we can't fail there, we use the same technique as in commitDrop() to make sure the log record is committed sooner or later: in case of a vylog write failure we leave the record in the vylog buffer to be flushed along with the next write. If it doesn't get flushed before shutdown, we will replay it on local recovery from WAL.
-
Vladimir Davydov authored
VinylIndex::open() doesn't free vy_index if vy_index_open() fails. Also, VinylIndex leaks vy_index if we fail to log index creation in WAL, because in this case VinylIndex::commitDrop(), which releases the underlying vy_index, isn't called. To fix the leaks, make VinylIndex reference the vy_index object it wraps in its constructor and unreference it in its destructor. While we are at it, remove useless TODOs from vy_index_open() and vy_index_commit_drop() asking not to drop and recreate the index on recovery, because we haven't reloaded the index directory on drop/create since vylog was introduced.
-
Vladimir Davydov authored
Before commit a63595db ("Implement index:info() introspection"), we needed the list of all vinyl indexes to show index stats via box.info.vinyl(). Now the list is only used in vy_env_delete() to clean up on shutdown. This code doesn't make much sense, as it doesn't delete the VinylIndex structures - indexes should be deleted from the upper level instead - so remove the cleanup and delete vy_env->indexes, as its meaning is quite obscure (e.g. it isn't quite clear whether an index should be added to it from vy_index_new() or from vy_index_open()).
-
Vladimir Davydov authored
This method only needs Index; it doesn't have anything to do with space. Apart from moving it from Handler to Index, rename it to commitDrop() to emphasize that it is called after the WAL write.
-
Vladislav Shpilevoy authored
When the read iterator gets the next range, it calls read_iterator_merge_next_key. But this function can call read_iterator_restore, which also restores the range_iterator. That restore returns the previous range, so the next statements end up out of the range. Example:

        curr_range
               curr_stmt   next_stmt
    +-------------+--------------+
    | -inf      x           +inf |
    +-------------+--------------+

read_iterator_next_range:

               curr_stmt
                     curr_range
                          next_stmt
    +-------------+--------------+
    | -inf      x           +inf |
    +-------------+--------------+

read_iterator_merge_next_keys fails version checking and calls read_iterator_restore, which calls range_iterator_restore to the curr_stmt. Then the merge_iterator returns the next_stmt, which is out of the curr_range:

        curr_range
                          curr_stmt
    +-------------+--------------+
    | -inf      x           +inf |
    +-------------+--------------+
-
Roman Tsisyk authored
It should be 256 MB instead of 256 GB. Fixes #2491
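The difference, spelled out as illustrative C constants (the actual default is defined in the box configuration code):

```c
#include <stdint.h>

/* Broken default: 256 GB lets WAL files grow practically unbounded. */
static const int64_t WAL_MAX_SIZE_BROKEN = 256LL * 1024 * 1024 * 1024;

/* Intended default: 256 MB. */
static const int64_t WAL_MAX_SIZE = 256LL * 1024 * 1024;
```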
-
Vladislav Shpilevoy authored
-
Vladimir Davydov authored
Check that:
- garbage collector doesn't delete files used by initial join, final join, or subscribe
- old checkpoints are deleted as the replica advances
- checkpoint pinned by a stale replica is released when the replica is unregistered
-