- May 29, 2018
-
Konstantin Osipov authored
Update the error message for dynamic changes of instance_uuid and replicaset_uuid
-
Konstantin Osipov authored
-
Georgy Kirichenko authored
Handle the case when instance_uuid and replicaset_uuid are present in box.cfg and have the same values as those already set. Fixes #3421
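A minimal illustration of the intended behavior (the second UUID is a made-up example):

    tarantool> box.cfg{instance_uuid = box.info.uuid}
    -- same value as already set: accepted as a no-op
    tarantool> box.cfg{instance_uuid = 'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa'}
    -- a different value is still rejected: the option cannot be changed dynamically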
-
- May 25, 2018
-
Konstantin Belyavskiy authored
This fix improves 'box.info.replication' output. If the downstream fails and thus disconnects from the upstream, improve logging by printing 'status: disconnected' and an error message on both sides (master and replica). Closes #3365
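Illustrative console output after this change (the exact message text will vary):

    tarantool> box.info.replication[2].downstream
    ---
    - status: disconnected
      message: 'unexpected EOF when reading from socket'
    ...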
-
Konstantin Belyavskiy authored
This is part of a more complex task aiming to improve logging. Do not destroy the relay, since it stores the last error, which can be useful for diagnostics. Now the relay is created together with the replica and always exists, so several NULL checks are removed as well. Add relay_state { OFF, FOLLOW, STOPPED } to track replica presence: once connected, the relay is either FOLLOW or STOPPED until the master is reset. Updated with @kostja's proposal. Used for #3365.
-
Konstantin Osipov authored
-
Vladimir Davydov authored
Currently, when an index is dropped, we immediately remove all ranges/slices associated with it and mark all its runs as dropped in vylog. To find the ranges/slices/runs, we use the vy_lsm struct, see vy_log_lsm_prune. The problem is that the vy_lsm struct may be inconsistent with the state stored in vylog if an index drop races with compaction, because we first write the changes done by a compaction task to vylog and only then update the vy_lsm struct, see vy_task_compact_complete. Since writing to vylog yields, this opens a time window during which the index can be dropped. If this happens, objects that were created by compaction but haven't been logged yet (such as new runs, slices, and ranges) will be deleted from vylog by the index drop, which will permanently break vylog, making recovery impossible.

To fix this issue, let's rework garbage collection of objects associated with dropped indexes as follows. Now, when an index is dropped, we write a single record to vylog, VY_LOG_DROP_LSM, i.e. just mark the index as dropped without deleting the associated objects. Actual index cleanup takes place in the garbage collection procedure, see vy_gc, which purges all ranges/slices linked to marked indexes from vylog and marks all their runs as dropped. When all the runs have actually been deleted from disk and "forgotten" in vylog, we remove the index record from vylog by writing a VY_LOG_FORGET_LSM record. Since the garbage collection procedure uses vylog itself instead of the vy_lsm struct for iterating over vinyl objects, no race between index drop and dump/compaction can lead to a broken vylog anymore. Closes #3416
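The new flow can be sketched with a toy Lua model (names and data layout here are hypothetical, not the actual C implementation; the point is that gc iterates over vylog's own state rather than vy_lsm):

    -- vylog state: lsm id -> {dropped = bool, ranges = {...}, runs = {...}}
    local vylog = {}

    local function drop_lsm(id)
        vylog[id].dropped = true      -- single VY_LOG_DROP_LSM record
    end

    local function gc()
        for id, lsm in pairs(vylog) do
            if lsm.dropped then
                lsm.ranges = {}       -- purge ranges/slices from vylog
                lsm.runs = {}         -- runs marked dropped, deleted from disk, forgotten
                vylog[id] = nil       -- finally, a VY_LOG_FORGET_LSM record
            end
        end
    end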
-
Vladimir Davydov authored
This is required to rework garbage collection in vinyl.
-
Vladimir Davydov authored
We pass the LSN of index alter/create records; let's pass the LSN of the drop record too, for consistency. This is also needed by vinyl to store it in vylog (see the next patch).
-
Vladimir Davydov authored
If an index was dropped and then recreated, then while replaying vylog we reuse the vy_lsm_recovery_info object corresponding to it. There's no reason to do that instead of simply allocating a new object: the amount of memory saved is negligible, while the code looks more complex. Let's simplify the code: whenever we see VY_LOG_CREATE_LSM, create a new vy_lsm_recovery_info object and replace the old incarnation, if any, in the hash map.
-
Konstantin Osipov authored
replication: make replication_connect_timeout dynamic
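After this change the timeout can be updated at runtime:

    box.cfg{replication_connect_timeout = 30}  -- seconds; no restart required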
-
Konstantin Osipov authored
-
Vladimir Davydov authored
Do not use errinj, as it is unreliable. Check that:
- no memory is freed immediately after the space drop (WAL is off);
- all memory is freed asynchronously after a yield.
-
Vladimir Davydov authored
replicaset_sync() returns not only when the instance has synchronized with the connected replicas, but also when some replicas have disconnected and the quorum can't be formed anymore. Nevertheless, it always prints that sync has been completed. Fix it. See #3422
-
Vladimir Davydov authored
If a replica disconnects while sync is in progress, box.cfg{} may stop syncing, leaving the instance in 'orphan' mode. This will happen if not enough replicas are connected to form a quorum. This makes sense e.g. on a network error, but not when a replica is loading, because in the latter case it should be up and running quite soon. Let's take into account replicas that disconnected because they haven't completed initial configuration yet, and continue syncing if connected + loading > quorum. Closes #3422
-
Konstantin Belyavskiy authored
Small refactoring: remove 'enum replica_state', since we reuse a subset of the applier state machine 'enum applier_state' to check whether we have achieved replication quorum and hence can leave read-only mode.
-
Konstantin Osipov authored
The default of 4 seconds is too low to bootstrap a large cluster.
-
Vladislav Shpilevoy authored
Closes #3425
-
- May 24, 2018
-
Georgy Kirichenko authored
In some cases, when applier processing yielded, another applier might start a conflicting operation and break replication and database consistency. Now an applier locks a per-server-id latch before processing a transaction. This guarantees that at most one applier request per server is in progress at any given moment. The problem was very rare until full-mesh topologies in vinyl became commonplace. Fixes gh-3339
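The actual latch lives in C, but the serialization idea can be sketched in Lua using a fiber.channel of size 1 as a mutex (all names below are hypothetical):

    local fiber = require('fiber')
    local latches = {}  -- replica server id -> channel used as a latch

    local function apply_tx_serialized(server_id, apply_fn)
        local latch = latches[server_id]
        if latch == nil then
            latch = fiber.channel(1)
            latches[server_id] = latch
        end
        latch:put(true)   -- acquire: blocks while another fiber holds the latch
        local ok, err = pcall(apply_fn)
        latch:get()       -- release
        assert(ok, err)
    end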
-
Vladimir Davydov authored
When a memtx space is dropped or truncated, we delegate freeing the tuples stored in it to a background fiber so as not to block the caller (and the tx thread) for too long. Turns out this doesn't work out well for ephemeral spaces, which share the destruction code with normal spaces: the problem is that the user might issue a lot of complex SQL SELECT statements that create a lot of ephemeral spaces and do not yield, hence never giving the garbage collection fiber a chance to clean up. There's a test that emulates this, 2.0:test/sql-tap/gh-3083-ephemeral-unref-tuples.test.lua. For this test to pass, let's run the garbage collection procedure on demand, i.e. when any of the memtx allocation functions fails to allocate memory. Follow-up #3408
-
Vladimir Davydov authored
Currently, the engine has no control over yields issued during asynchronous index destruction. As a result, it can't force gc when there's not enough memory. To fix that, let's make the gc callback stateful: now it's supposed to free some objects and return true if there are still more objects to free, or false otherwise. Yields are now done by the memtx engine itself after each gc callback invocation.
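The contract can be illustrated with a Lua sketch (hypothetical names; the real callback is C):

    -- Free a batch of objects; return true if more remain, so the engine
    -- knows to yield and invoke the callback again.
    local function gc_step(state)
        for _ = 1, 100 do
            local obj = table.remove(state.objects)
            if obj == nil then
                return false  -- nothing left to free
            end
            -- release obj here ...
        end
        return true  -- more objects to free after the yield
    end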
-
- May 22, 2018
-
Konstantin Osipov authored
Avoid goto, a follow-up to gh-3257.
-
Konstantin Belyavskiy authored
Another broken case. Adding a new replica to the cluster:

    + if (replica->applier->remote_is_ro &&
    +     replica->applier->vclock.signature == 0)

In this case we may get an ER_READONLY, since the signature is not 0. So leader election now has two phases:
1. Select among read-write replicas.
2. If no such replica is found, try the old algorithm for backward compatibility (the case when all replicas exist in the cluster table).
Closes #3257
-
Konstantin Osipov authored
-
Vladimir Davydov authored
No point in this level of indirection. We embed the bps tree implementation into memtx_tree_index, so why don't we do the same for the hash index? A good side effect is that we can now define iterators in headers for both memtx_tree_index and memtx_hash_index, which is required to improve the memtx garbage collection mechanism.
-
Vladimir Davydov authored
Since it is created when the memtx engine is initialized, we should destroy it on engine shutdown.
-
Vladimir Davydov authored
All functions that need it are now explicitly passed the engine, so we can consolidate all variables related to memtx engine state in one place.
-
Vladimir Davydov authored
We need this so that we can force garbage collection when we are short on memory. There are two such functions: one is used for allocating index extents, the other for allocating tuples. The index allocation function has an opaque context, so we simply reuse it for passing the memtx engine. To pass the memtx engine to the tuple allocation function, we add an opaque engine-specific pointer to tuple_format and set it to memtx_engine for memtx spaces.
-
Vladimir Davydov authored
The two files are too closely related: memtx_arena is defined and used in memtx_engine.c, but initialized in memtx_tuple.cc. Since memtx_tuple.cc is small, let's fold it into memtx_engine.c.
-
Vladimir Davydov authored
Postponing it until a memtx index is created for the first time saves us neither memory nor CPU; it only makes the code more difficult to follow.
-
- May 21, 2018
-
Vladislav Shpilevoy authored
-
Vladimir Davydov authored
When a memtx space is dropped or truncated, we have to unreference all tuples stored in it. Currently, we do it synchronously, thus blocking the tx thread. If a space is big, the tx thread may remain blocked for several seconds, which is unacceptable. This patch makes drop/truncate hand the actual work off to a background fiber. Before this patch, dropping a space with 10M 64-byte records took more than 0.5 seconds. After this patch, it takes less than 1 millisecond. Closes #3408
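The effect can be reproduced from the console (a sketch; timings will vary):

    local clock = require('clock')
    local s = box.schema.space.create('test')
    s:create_index('pk')
    for i = 1, 1000000 do s:insert{i} end
    local t0 = clock.monotonic()
    s:drop()
    print(clock.monotonic() - t0)  -- near-instant: tuples are freed by a background fiber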
-
Vladimir Davydov authored
Force major compaction of all ranges when index.compact() is called. Note that the function only triggers compaction; it doesn't wait until compaction is complete. Closes #3139
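Usage example (space and index names are illustrative):

    box.space.test.index.pk:compact()  -- schedules major compaction and returns immediately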
-
Vladimir Davydov authored
This patch adds the index.compact() Lua method. The new method is backed by index_vtab::compact. Currently, it's a no-op for all kinds of indexes. It will be used by the Vinyl engine to trigger major compaction. Part of #3139
-
- May 19, 2018
-
Konstantin Belyavskiy authored
This test fails from time to time, because the .xlog may have a different number in its name (and using box.info.lsn is not an option here). Since this is a setup of two masters, there could be one or two xlogs in the folder, so first get a list of all matching files and then delete the last one.
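The approach can be sketched like this (paths are illustrative):

    local fio = require('fio')
    local xlogs = fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog'))
    table.sort(xlogs)
    fio.unlink(xlogs[#xlogs])  -- remove the last (newest) matching xlog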
-
- May 18, 2018
-
Vladislav Shpilevoy authored
Simplify collation code.
-
- May 17, 2018
-
Vladislav Shpilevoy authored
utf8 is a module partially compatible with the Lua 5.3 utf8 module and the third-party lua-utf8 module. Partially means that not all functions are implemented. The patch introduces these: upper, lower, len, char, sub, next. len and char work exactly as in Lua 5.3. The other functions work as in lua-utf8, because they are not present in Lua 5.3. Tarantool utf8 has extensions:
* isupper/islower/isalpha/isdigit, which check a property of a symbol or of its code;
* cmp/casecmp, which compare two UTF8 strings.
Closes #3290 Closes #3385 Closes #3081
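A few usage examples (results shown in comments):

    utf8.upper('привет')              -- 'ПРИВЕТ'
    utf8.len('привет')                -- 6
    utf8.char(1082, 1086, 1090)       -- 'кот'
    utf8.isdigit('5')                 -- true
    utf8.casecmp('ПРИВЕТ', 'привет')  -- 0: equal ignoring case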
-
Vladislav Shpilevoy authored
Collation fingerprint is a formatted string unique for a set of collation properties. Equal collations with different names have the same fingerprint. This new property is used to build a collation fingerprint cache for use in Tarantool internals, where the collation name does not matter. Insertions into the fingerprint cache can never conflict or cause a replace. This means that, for example, the utf8 module created in this patchset can fill the collation cache with its own collations, and it will affect neither users nor other modules.
-
Vladislav Shpilevoy authored
Issue #3290 revealed an important problem: Tarantool cannot create completely internal collations with no ID, name, or owner, just for internal usage. The original struct coll cannot be used for this, since:
* it has fields that are not needed internally;
* the collation name is a public thing, and the collation cache uses it, so it would be necessary to forbid users from using certain system names;
* when multiple collations have the same comparator and differ only in their names/owners/IDs, separate UCollator objects are created, but it would be good to be able to reference a single one.
This patch renames coll to coll_id and coll_def to coll_id_def, and introduces coll - a pure collation object with no user-defined things. Needed for #3290.
-