- Nov 16, 2017
-
-
Georgy Kirichenko authored
Start the applier->writer fiber only after SUBSCRIBE. Otherwise the writer will send an ACK during FINAL JOIN and break the replication protocol. Fixes #2726
-
- Nov 15, 2017
-
-
Vladimir Davydov authored
Make sure the master receives an ack from the replica and performs garbage collection before checking the checkpoint count.
-
Vladimir Davydov authored
We remove old xlog files as soon as we have sent them to all replicas. However, the fact that we have successfully sent something to a replica doesn't necessarily mean the replica has received it. If a replica fails to apply a row (for instance, it is out of memory), replication will stop, but the data files will have already been deleted on the master, so that when the replica is back online, the master won't find an appropriate xlog to feed to the replica and replication will stop again. The user-visible effect is the following error message in the log and in the replica status:
Missing .xlog file between LSN 306 {1: 306} and 311 {1: 311}
There is no way to recover from this but to re-bootstrap the replica from scratch. The issue was introduced by commit ba09475f ("replica: advance gc state only when xlog is closed"), which aimed to make the status update procedure as lightweight and fast as possible and so moved gc_consumer_advance() from tx_status_update() to a special gc message. A gc message is created and sent to TX as soon as an xlog is relayed. Let's rework this so that gc messages are appended to a special queue first and scheduled only when the relay receives the receipt confirmation from the replica. Closes #2825
-
Vladimir Davydov authored
Engine callbacks that perform garbage collection may sleep, because they use coio for removing files to avoid blocking the TX thread. If garbage collection is called concurrently from different fibers (e.g. from relay fibers), we may attempt to delete the same file multiple times. What is worse, xdir_collect_garbage(), used by engine callbacks to remove files, isn't safe against concurrent execution - it first unlinks a file via coio, which involves a yield, and only then removes the corresponding vclock from the directory index. This opens a race window for another fiber to read the same vclock and yield; in the interim the vclock can be freed by the first fiber:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f105ceda3fa in __GI_abort () at abort.c:89
#2  0x000055e4c03f4a3d in sig_fatal_cb (signo=11) at main.cc:184
#3  <signal handler called>
#4  0x000055e4c066907a in vclockset_remove (rbtree=0x55e4c1010e58, node=0x55e4c1023d20) at box/vclock.c:215
#5  0x000055e4c06256af in xdir_collect_garbage (dir=0x55e4c1010e28, signature=342, use_coio=true) at box/xlog.c:620
#6  0x000055e4c0417dcc in memtx_engine_collect_garbage (engine=0x55e4c1010df0, lsn=342) at box/memtx_engine.c:784
#7  0x000055e4c0414dbf in engine_collect_garbage (lsn=342) at box/engine.c:155
#8  0x000055e4c04a36c7 in gc_run () at box/gc.c:192
#9  0x000055e4c04a38f2 in gc_consumer_advance (consumer=0x55e4c1021360, signature=342) at box/gc.c:262
#10 0x000055e4c04b4da8 in tx_gc_advance (msg=0x7f1028000aa0) at box/relay.cc:250
#11 0x000055e4c04eb854 in cmsg_deliver (msg=0x7f1028000aa0) at cbus.c:353
#12 0x000055e4c04ec871 in fiber_pool_f (ap=0x7f1056800ec0) at fiber_pool.c:64
#13 0x000055e4c03f4784 in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=0x55e4c04ec6d4 <fiber_pool_f>, ap=0x7f1056800ec0) at fiber.h:665
#14 0x000055e4c04e6816 in fiber_loop (data=0x0) at fiber.c:631
#15 0x000055e4c0687dab in coro_init () at /home/vlad/src/tarantool/third_party/coro/coro.c:110
Fix this by serializing concurrent execution of garbage collection callbacks with a latch.
-
Vladimir Davydov authored
Currently, box.schema.upgrade() is called automatically after box.cfg() if the upgrade is considered safe (currently, only upgrade to 1.7.5 is "safe"). However, no upgrade is safe in case replication is configured, because it can easily result in replication conflicts. Let's disable auto upgrade if the 'replication' configuration option is set. Closes #2886
-
- Nov 13, 2017
-
-
Vladimir Davydov authored
Before commit 29d00dca ("alter: forbid to drop space with truncate record"), a space record was removed before the corresponding record in the _truncate system space, so we should disable the check that the space being dropped doesn't have a record in _truncate when recovering data generated by tarantool < 1.7.6. Closes #2909
-
- Nov 06, 2017
-
-
Roman Tsisyk authored
-
Roman Tsisyk authored
The Bloom filter depends on the hash function, which depends on the ICU version, which may vary.
-
Roman Tsisyk authored
-
Roman Tsisyk authored
-
Roman Tsisyk authored
Don't use id=0 for collations. Follow up #2649
-
Vladimir Davydov authored
Fix tuple_hash_field() to handle the following cases properly:
- Nullable string field (crash in vinyl on dump).
- Scalar field with collation enabled (crash in memtx hash index).
Add corresponding test cases.
-
Vladimir Davydov authored
First, unique but nullable indexes are not rebuilt when the primary key is altered although they should be, because they can contain multiple NULLs. Second, when rebuilding such indexes we use a wrong key def (index_def->key_def instead of cmp_def), which results in lost stable order after recovery. Fix both these issues and add a test case.
-
Vladimir Davydov authored
Needed to check that the key definition loaded from vylog (used to send initial data to a replica) has its collation properly recovered.
-
Vladimir Davydov authored
It isn't stored currently, but this doesn't break anything, because the primary key, which is the only key whose definition is used after having been loaded from vylog, can't be nullable. Let's store it there just in case. Update the vinyl/layout test to check that.
-
Vladimir Davydov authored
Collations were disabled in vinyl by commit 2097908f ("Fix collation test on some platforms and disable collation in vinyl"), because a key_def referencing a collation could not be loaded from vylog on recovery (collation objects are created after vylog is recovered). Now, it isn't a problem anymore, because the decoding procedure, key_def_decode_parts(), deals with struct key_part_def, which references a collation by id and hence doesn't need a collation object to be created. So we can enable collations in vinyl. This patch partially reverts the aforementioned commit (it can't do a full revert, because that commit also fixed some tests along the way). Closes #2822
-
Vladimir Davydov authored
We can't use key_def_decode_parts() when recovering vylog if key_def has a collation, because vylog is recovered before the snapshot, i.e. when collation objects haven't been created yet, while key_def_decode_parts() tries to look up the collation by id. As a result, we can't enable collations for vinyl indexes. To fix this, let's rework the decoding procedure so that it works with struct key_part_def instead of key_part. The only difference between the two structures is that the former references the collation by id while the latter by pointer. Needed for #2822
-
Georgy Kirichenko authored
The writer fiber should be stopped before reconnecting, to avoid sending unwanted IPROTO_OK replication acknowledgements. Fixes #2726
-
Georgy Kirichenko authored
The SUBSCRIBE command is not multiplexed in the binary protocol. When the relay exits with an error during subscribe, the remote replica still continues to send IPROTO_OK replication acknowledgements to the master. These packets are unwanted by the IPROTO decoder. Close the socket on errors during SUBSCRIBE. Fixes #2726
- Nov 04, 2017
-
-
Vladimir Davydov authored
The corresponding comparator is missing, which leads to a crash. Fix it and add a test case checking that nullable indexes work fine with all available types.
-
Georgy Kirichenko authored
Symbol resolving can be expensive. Introduce an option for fiber.info():
fiber.info({ backtrace = true })
fiber.info({ bt = true })
Fixes #2878
-
Ilya authored
Change the signature of access_check_func(): it now returns a status instead of a function. Close #2816
-
- Nov 03, 2017
-
-
Vladimir Davydov authored
> src/tarantool/src/box/vinyl.c:2111:33: error: initializer element is not a compile-time constant
>     static const double weight = 1 - exp(-VY_QUOTA_UPDATE_INTERVAL /
>                                       ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Remove the "static" qualifier. It is not really needed, as any sane compiler will pre-calculate the value of 'weight' at compile time (checked on gcc 6.3.0 with -O0).
-
Vladimir Davydov authored
- Remove vy_stat::rmean statistics, which were left from Sophia, as now we have per-index statistics which are much more verbose than those.
- Move vy_stat::dump_bw to vy_env and remove struct vy_stat as there's nothing left in it.
- Move quota statistics from box.info.vinyl().performance.memory to box.info.vinyl().quota. Remove 'ratio', which equals used / limit, as this kind of calculation should be done by a script aggregating statistics. Report 'use_rate' and 'dump_bandwidth' there.
- Report 'limit' in cache statistics to make them consistent with 'quota' statistics, where 'limit' is reported. Rename 'cache.count' to 'cache.tuples'. Remove vy_cache_env::cache_count, use mempool stats instead.
- Move 'tx_allocated', 'txv_allocated', 'read_interval', 'read_view' from box.info.vinyl().performance to box.info.vinyl().tx and name them 'transactions', 'statements', 'gap_locks', and 'read_views', respectively. Remove vy_tx_stat::active and 'tx.active' as the same value is shown by 'tx.transactions', extracted from the mempool.
- Zap box.info.vinyl().performance - there's nothing left there.
Now global statistics look like:
tarantool> box.info.vinyl()
---
- cache:
    limit: 134217728
    tuples: 32344
    used: 34898794
  tx:
    conflict: 1
    commit: 324
    rollback: 13
    statements: 10
    transactions: 3
    gap_locks: 4
    read_views: 1
  quota:
    dump_bandwidth: 10000000
    watermark: 119488351
    use_rate: 1232703
    limit: 134217728
    used: 34014634
...
Closes #2861
-
Vladimir Davydov authored
We have a timer for updating the watermark every second. Let's reuse it for quota use rate calculation. This will allow us to get rid of legacy vinyl statistics. Also, let's use an EWMA (exponentially weighted moving average) for calculating the average. It is a more efficient and common method, which makes it easy to tune the period over which the value is averaged.
-
Roman Tsisyk authored
Follow up #1557
-
- Nov 02, 2017
-
-
Alexandr Lyapunov authored
Collation was simply ignored for non-string parts, which could confuse a potential user. Generate a readable error in this case. Fix #2862 part 2
-
Alexandr Lyapunov authored
Now collation is silently ignored for type='scalar' parts. Use collation for string scalar fields. Fix #2862 part 1
-
Alexandr Lyapunov authored
Show collation name (if present) in space.index.name.parts[no]. Fix #2862 part 4
-
Alexandr Lyapunov authored
test:create_index('unicode_s1', {parts = {{1, 'STR', collation = 'UNICODE'}}}) will work now. Fix #2862 part 3
-
Vladislav Shpilevoy authored
If a field is not indexed and there are no indexed or non-nullable fields after it, allow it to be skipped on insertion. Such a field's value looks like MP_NIL, but MP_NIL is not explicitly stored. Named access to this field in Lua returns nil. Example:
format = {{'field1'}, {'field2'}, {'field3', is_nullable = true}, {'field4', is_nullable = true}}
t = space:insert{1, 2} -- ok. t.field1 == 1, t.field2 == 2, t.field3 == nil, t.field4 == nil
Closes #2880
-
Vladislav Shpilevoy authored
Some users store their own custom keys in format fields, but the current opts parser does not allow storing unknown keys. Let's allow it. Example:
format = {}
format[1] = {name = 'field1', type = 'unsigned', custom_field = 'custom_value'}
s = box.schema.create_space('test', {format = format})
s:format()[1].custom_field == 'custom_value'
Closes #2839
-
Vladimir Davydov authored
Using DML/DDL on a Vinyl index with wal_mode = 'none' is likely to result in unrecoverable errors like:
F> can't initialize storage: Invalid VYLOG file: Index 512/0 created twice
To avoid data corruption in case the user tries to use an existing Vinyl database in conjunction with wal_mode = 'none', let's explicitly forbid it until we figure out how to fix it. Workaround #2278
-
Vladimir Davydov authored
During initial join, a replica receives all data accumulated on the master for its whole lifetime, which may be quite a lot. If the network connection is fast enough, the replica might fail to keep up with dumps, in which case replication fails with ER_VY_QUOTA_TIMEOUT. To avoid that, let's ignore the quota timeout until bootstrap is complete. Note, replication may still fail during the 'subscribe' stage for the same reason, but it's unlikely, because the rate at which the master sends data is limited by the number of requests served by the master per unit of time, and it should become nearly impossible once throttling is introduced (see #1862). Closes #2873
-
Vladimir Davydov authored
If the user sets snap_dir to an empty directory by mistake while leaving vinyl_dir the same, tarantool will still bootstrap, but there are likely to be errors like:
vinyl.c:835 E> 512/0: dump failed: file './512/0/00000000000000000001.run' already exists
vy_log.c:1095 E> failed to rotate metadata log: file './00000000000000000005.vylog' already exists
Even worse, it may eventually fail to restart with:
vy_log.c:886 E> ER_MISSING_SNAPSHOT: Can't find snapshot
To avoid that, let's check the vinyl_dir on bootstrap and abort if it contains vylog files left from previous setups. Closes #2872
-
Vladimir Davydov authored
The only reason why it was allocated is that struct vy_scheduler was defined after struct vy_env, which is not a problem any more. Embedding it allows us to drop the extra argument to vy_scheduler_need_dump_f().
-
Vladimir Davydov authored
It's a big independent entity, let's isolate its code in a separate file. While we are at it, add missing comments to vy_scheduler struct members.
-
Vladimir Davydov authored
Instead of storing a pointer to vy_env in vy_scheduler, let's:
- Add pointers to tx_manager::read_views and vy_env::run_env to the vy_scheduler struct. They are needed to create a write iterator for a dump/compaction task.
- Add a callback to struct vy_scheduler that is called upon dump completion to free memory. This allows us to eliminate accesses to vy_env::quota and vy_env::allocator from vy_scheduler code.
- Move the assert that assures the scheduler isn't started during local recovery from vy_scheduler_f() to the vy_env_quota_exceeded_cb() callback, so that we don't need to access vy_env::status from the scheduler code. Note, after this change we have to set vy_env::status to VINYL_ONLINE before calling vy_quota_set_limit(), because the latter might schedule a dump.
- Check if we have anything to dump from vy_begin_checkpoint() instead of vy_scheduler_begin_checkpoint(). This will allow us to isolate the scheduler code in a separate file.
-
Vladimir Davydov authored
Currently, dump is triggered (by bumping the memory generation) by the scheduler fiber while quota consumers just wake it up. As a result, the scheduler depends on the quota - it has to access the quota to check if it needs to trigger dump. In order to move the scheduler to a separate source file, we need to get rid of this dependency. Let's rework this code as follows:
- Remove vy_scheduler_trigger_dump() from vy_scheduler_peek_dump(). The scheduler fiber now just dumps all indexes eligible for dump and completes dump by bumping dump_generation. It doesn't trigger dump by bumping generation anymore. As a result, it doesn't need to access the quota.
- Make quota consumers call vy_scheduler_trigger_dump() instead of just waking up the scheduler. This function will become a public one once the scheduler is moved out of vinyl.c. The function's logic is changed a bit. First, besides bumping the generation, it now also wakes up the scheduler fiber. Second, it does nothing if dump is already in progress or can't be scheduled because of a concurrent checkpoint. In the latter case it sets a special flag, though, that will force the scheduler to trigger dump upon checkpoint completion.
- vy_scheduler_begin_checkpoint() can't use vy_scheduler_trigger_dump() anymore due to the additional checks added to the function, so it bumps the generation directly. This looks fine.
- This design has a subtlety regarding how quota consumers notify the scheduler and how they are notified back about available quota. In extreme cases, the quota released by a dump may not be enough to satisfy all consumers, in which case we need to reschedule dump. Since the scheduler doesn't check the quota anymore and doesn't reschedule dump, it has to be done by the remaining consumers. So consumers have to call the quota_exceeded_cb callback (which now triggers a dump) every time they wake up and see there's not enough quota. vy_quota_use() is reworked accordingly. Also, since quota usage may exceed the limit (because of vy_quota_force_use()), usage may remain above the limit after a dump completes, in which case vy_quota_release() doesn't wake up consumers and again there's no one to trigger another dump. So we must wake up all consumers every time vy_quota_release() is called.
-