- May 31, 2018
-
-
Vladislav Shpilevoy authored
Session salt is 32 random bytes that are used to encode the password when a user is authenticated. The salt is not used in non-binary sessions, so it can be moved to the iproto connection.
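For context, the salt travels to the client in the IPROTO greeting and is only needed for binary-protocol authentication, which is why it belongs to the connection rather than to the session. A rough client-side sketch, assuming the standard greeting layout (a 64-byte version line followed by a base64-encoded salt line) and an illustrative address:

    local socket = require('socket')
    local digest = require('digest')

    local s = socket.tcp_connect('127.0.0.1', 3301)
    local greeting = s:read(128)  -- fixed-size greeting
    local salt = digest.base64_decode(greeting:sub(65, 108)):sub(1, 20)
    -- The client then sends IPROTO_AUTH with a scramble computed as
    -- xor(sha1(password), sha1(salt .. sha1(sha1(password)))).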
-
Vladislav Shpilevoy authored
Yaml.decode tag_only option allows decoding only the tag of a YAML document. For #2677 it is needed to detect different push types in the text console: print pushes via console.print(), and actual pushes via box.session.push(). YAML tags will be used to distinguish them: for each message, a client console will try to find a tag.

If a tag is absent, the message is a plain response to a request.

If the tag is !print!, the document consists of a single string that must be printed. Such a document must be decoded to get the printed string, so the call sequence is yaml.decode(tag_only) + yaml.decode. The reason a print message must be decoded is that the result of print() on the server side may not be well-formatted YAML, so it must be encoded into YAML to be sent correctly. For example, when I do something like this on the server side: console.print('very bad YAML string') - the result of print() is not a YAML document, and to be sent it must be encoded into YAML on the server side.

If the tag is !push!, the document was sent via box.session.push and must not be decoded; it can be simply printed or ignored.

Needed for #2677
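On the client side this could look roughly like the sketch below; the exact return value of the tag-only decode and the exact tag spelling are assumptions made for illustration:

    local yaml = require('yaml')

    local function handle_console_message(text)
        local tag = yaml.decode(text, {tag_only = true})
        if tag == nil then
            -- An ordinary response to a request.
            return yaml.decode(text)
        elseif tag:find('print') then
            -- console.print() payload: a single string that has to be
            -- decoded before it can be written to stdout.
            io.write(yaml.decode(text)[1], '\n')
        else
            -- box.session.push() message: show it as is, do not decode.
            io.write(text)
        end
    end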
-
Vladislav Shpilevoy authored
Encode_tagged is a workaround for the inability to pass options to yaml.encode(). Before the patch yaml.encode() in fact had this signature: yaml.encode(...), so it was impossible to add any options to this function - all of them would be treated as values to encode. But the documentation https://tarantool.io/en/doc/1.9/reference/reference_lua/yaml.html?highlight=yaml#lua-function.yaml.encode says that the function has this signature: yaml.encode(value). I hope that anyone who uses yaml.encode() does so according to the documentation, so I can add the {tag_prefix, tag_handle} options to yaml.encode() and remove the yaml.encode_tagged() workaround.
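A hedged sketch of how the new options might be used once yaml.encode() accepts them (the prefix string below is made up for illustration):

    local yaml = require('yaml')

    -- Attach one global tag to the encoded document.
    local doc = yaml.encode({'hello'}, {
        tag_handle = '!push!',
        tag_prefix = 'tag:example.org/push,2018', -- illustrative prefix
    })
    -- The resulting document carries a %TAG directive mapping the !push!
    -- handle to the prefix above, followed by the encoded value.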
-
- May 30, 2018
-
-
Vladislav Shpilevoy authored
Encode_tagged allows defining one global YAML tag for a document. Tagged YAML documents are going to be used for console text pushes to distinguish actual box.session.push() from console.print(): the former will have the tag !push, and the latter - !print.
-
Vladimir Davydov authored
None of engine_wait_checkpoint, engine_commit_checkpoint, engine_join, engine_backup needs to modify the vclock argument.
-
Vladimir Davydov authored
Slab arena can grow dynamically, so all we need to do is increase the quota limit. Decreasing the limit is still explicitly prohibited, because the slab arena never unmaps slabs. Closes #2634
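Assuming the quota limit in question is the one controlled by box.cfg.memtx_memory (per #2634), the user-visible effect is roughly:

    box.cfg{memtx_memory = 256 * 1024 * 1024}
    box.cfg{memtx_memory = 512 * 1024 * 1024} -- ok: the limit may only grow
    box.cfg{memtx_memory = 128 * 1024 * 1024} -- error: shrinking is prohibited,
                                              -- the slab arena never unmaps slabs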
-
Vladimir Davydov authored
During recovery, we may write VY_LOG_CREATE_LSM and VY_LOG_DROP_LSM records we failed to write before restart (because those records are written after WAL and hence may not make it to vylog). Right after recovery we invoke garbage collection to drop incomplete runs. Once the VY_LOG_PREPARE_LSM record is introduced, we will also collect incomplete LSM trees there (those we failed to build). However, there may be LSM trees we managed to build but failed to write VY_LOG_CREATE_LSM for. This is OK, as we will retry the vylog write, but currently it isn't reflected in the recovery context used for garbage collection. To avoid purging such LSM trees, let's update the recovery context with records written during recovery. Needed for #1653
-
- May 29, 2018
-
-
Vladimir Davydov authored
Allocation of vy_lsm_recovery_info::key_parts is a part of the struct initialization, which is handled by vy_recovery_do_create_lsm().
-
Konstantin Osipov authored
Update the error message for dynamic changes of instance_uuid and replicaset_uuid
-
Konstantin Osipov authored
-
Georgy Kirichenko authored
Handle the case when instance_uuid and replicaset_uuid are present in box.cfg{} and have the same values as the ones already set. Fixes #3421
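A minimal sketch of the case being fixed - reconfiguring with the UUIDs that are already in effect must be a no-op rather than an error:

    -- On an already bootstrapped instance:
    box.cfg{
        instance_uuid = box.info.uuid,
        replicaset_uuid = box.info.cluster.uuid,
    }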
-
- May 25, 2018
-
-
Konstantin Belyavskiy authored
This fix improves the 'box.info.replication' output. If a downstream fails and thus disconnects from the upstream, improve logging by printing 'status: disconnected' and the error message on both sides (master and replica). Closes #3365
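What the improved output might look like on the master side; the error text is made up and the field layout is an approximation of box.info.replication:

    tarantool> box.info.replication[2].downstream
    ---
    - status: disconnected
      message: 'unexpected EOF when reading from socket'
    ...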
-
Konstantin Belyavskiy authored
This is a part of a more complex task aiming to improve logging. Do not destroy the relay, since it stores the last error, which can be useful for diagnostics. Now the relay is created together with the replica and always exists, so several NULL checks are removed as well. Add relay_state { OFF, FOLLOW, STOPPED } to track replica presence: once connected, it is either FOLLOW or STOPPED until the master is reset. Updated with @kostja's proposal. Used for #3365.
-
Konstantin Osipov authored
-
Vladimir Davydov authored
Currently, when an index is dropped, we remove all ranges/slices associated with it and mark all runs as dropped in vylog immediately. To find ranges/slices/runs, we use vy_lsm struct, see vy_log_lsm_prune.

The problem is vy_lsm struct may be inconsistent with the state stored in vylog if index drop races with compaction, because we first write changes done by compaction task to vylog and only then update vy_lsm struct, see vy_task_compact_complete. Since write to vylog yields, this opens a time window during which the index can be dropped. If this happens, objects that were created by compaction but haven't been logged yet (such as new runs, slices, ranges) will be deleted from vylog by index drop, and this will permanently break vylog, making recovery impossible.

To fix this issue, let's rework garbage collection of objects associated with dropped indexes as follows. Now when an index is dropped, we write a single record to vylog, VY_LOG_DROP_LSM, i.e. just mark the index as dropped without deleting associated objects. Actual index cleanup takes place in the garbage collection procedure, see vy_gc, which purges all ranges/slices linked to marked indexes from vylog and marks all their runs as dropped. When all runs are actually deleted from disk and "forgotten" in vylog, we remove the index record from vylog by writing VY_LOG_FORGET_LSM record.

Since garbage collection procedure uses vylog itself instead of vy_lsm struct for iterating over vinyl objects, no race between index drop and dump/compaction can now lead to broken vylog.

Closes #3416
-
Vladimir Davydov authored
This is required to rework garbage collection in vinyl.
-
Vladimir Davydov authored
We pass the LSN of index alter/create records; let's pass the LSN of the drop record as well, for consistency. This is also needed by vinyl to store it in vylog (see the next patch).
-
Vladimir Davydov authored
If an index was dropped and then recreated, then while replaying vylog we reuse the vy_lsm_recovery_info object corresponding to it. There's no reason to do that instead of simply allocating a new object - the amount of memory saved is negligible, while the code becomes more complex. Let's simplify the code: whenever we see VY_LOG_CREATE_LSM, create a new vy_lsm_recovery_info object and replace the old incarnation, if any, in the hash map.
-
Konstantin Osipov authored
replication: make replication_connect_timeout dynamic
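Being dynamic, the timeout can now be adjusted at runtime, e.g. before reconfiguring replication on a large cluster (the value is just an example):

    box.cfg{replication_connect_timeout = 30}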
-
Konstantin Osipov authored
-
Vladimir Davydov authored
Do not use errinj as it is unreliable. Check that:
- no memory is freed immediately after space drop (WAL is off);
- all memory is freed asynchronously after a yield.
-
Vladimir Davydov authored
replicaset_sync() returns not only when the instance has synchronized with the connected replicas, but also when some replicas have disconnected and the quorum can't be formed any more. Nevertheless, it always prints that sync has been completed. Fix it. See #3422
-
Vladimir Davydov authored
If a replica disconnects while sync is in progress, box.cfg{} may stop syncing, leaving the instance in 'orphan' mode. This will happen if not enough replicas are connected to form a quorum. This makes sense e.g. on a network error, but not when a replica is still loading, because in the latter case it should be up and running quite soon. Let's account for replicas that disconnected because they haven't completed initial configuration yet, and continue syncing while connected + loading > quorum. Closes #3422
-
Konstantin Belyavskiy authored
Small refactoring: remove 'enum replica_state' and instead reuse a subset of the applier state machine states to check whether we have achieved replication quorum and hence can leave read-only mode.
-
Konstantin Osipov authored
The default of 4 seconds is too low to bootstrap a large cluster.
-
Vladislav Shpilevoy authored
Closes #3425
-
- May 24, 2018
-
-
Georgy Kirichenko authored
In some cases, when applier processing yielded, another applier might start a conflicting operation and break replication and database consistency. Now an applier locks a per-server-id latch before processing a transaction. This guarantees that there is only one applier request in progress for each server at any given moment. The problem was very rare until full-mesh topologies in vinyl became commonplace. Fixes gh-3339
-
Vladimir Davydov authored
When a memtx space is dropped or truncated, we delegate freeing the tuples stored in it to a background fiber so as not to block the caller (and the tx thread) for too long. Turns out this doesn't work out well for ephemeral spaces, which share the destruction code with normal spaces: the problem is that the user might issue a lot of complex SQL SELECT statements that create a lot of ephemeral spaces and do not yield, and hence don't give the garbage collection fiber a chance to clean up. There's a test that emulates this, 2.0:test/sql-tap/gh-3083-ephemeral-unref-tuples.test.lua. For this test to pass, let's run the garbage collection procedure on demand, i.e. whenever any of the memtx allocation functions fails to allocate memory. Follow-up #3408
-
Vladimir Davydov authored
Currently, the engine has no control over yields issued during asynchronous index destruction. As a result, it can't force gc when there's not enough memory. To fix that, let's make the gc callback stateful: it is now supposed to free some objects and return true if there are still more objects to free, or false otherwise. Yields are now done by the memtx engine itself after each gc callback invocation.
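A minimal sketch, in Lua for illustration only (the real callback lives in C inside the memtx engine), of the stateful callback contract described above: each invocation frees a batch of objects and reports whether more work remains, and the engine yields between invocations:

    local fiber = require('fiber')

    local function run_background_gc(gc_step)
        while gc_step() do -- frees one batch; true means "call me again"
            fiber.yield()  -- let other fibers run between batches
        end
    end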
-
- May 22, 2018
-
-
Konstantin Osipov authored
Avoid goto, a follow up on gh-3257.
-
Konstantin Belyavskiy authored
Another broken case: adding a new replica to the cluster while the leader is chosen by the condition if (replica->applier->remote_is_ro && replica->applier->vclock.signature == 0). In this case we may get ER_READONLY, since the signature is not 0. So leader election now has two phases: 1. Select among read-write replicas. 2. If no such replica is found, fall back to the old algorithm for backward compatibility (the case when all replicas already exist in the cluster table). Closes #3257
-
Konstantin Osipov authored
-
Vladimir Davydov authored
No point in this level of indirection. We embed the bps tree implementation into memtx_tree_index, so why don't we do the same for the hash index? A good side effect is that we can now define iterators in headers for both memtx_tree_index and memtx_hash_index, which is required to improve the memtx garbage collection mechanism.
-
Vladimir Davydov authored
Since it is created when the memtx engine is initialized, we should destroy it on engine shutdown.
-
Vladimir Davydov authored
All functions that need them are now explicitly passed the engine, so we can consolidate all variables related to the memtx engine state in one place.
-
Vladimir Davydov authored
We need this so that we can force garbage collection when we are short on memory. There are two such functions: one is used for allocating index extents, the other for allocating tuples. The index allocation function has an opaque context, so we simply reuse it for passing the memtx engine. To pass the memtx engine to the tuple allocation function, we add an opaque engine-specific pointer to tuple_format and set it to memtx_engine for memtx spaces.
-
Vladimir Davydov authored
The two files are too closely related: memtx_arena is defined and used in memtx_engine.c, but initialized in memtx_tuple.cc. Since memtx_tuple.cc is small, let's fold it into memtx_engine.c.
-
Vladimir Davydov authored
Postponing it until a memtx index is created for the first time saves us no memory or CPU; it only makes the code more difficult to follow.
-
- May 21, 2018
-
-
Vladislav Shpilevoy authored
-
Vladimir Davydov authored
When a memtx space is dropped or truncated, we have to unreference all tuples stored in it. Currently, we do this synchronously, thus blocking the tx thread. If a space is big, the tx thread may remain blocked for several seconds, which is unacceptable. This patch makes drop/truncate hand the actual work over to a background fiber. Before this patch, dropping a space with 10M 64-byte records took more than 0.5 seconds; after this patch, it takes less than 1 millisecond. Closes #3408
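A rough way to observe the effect from Lua (the timings come from the message above; the space setup is illustrative):

    local clock = require('clock')
    local s = box.schema.space.create('test')
    s:create_index('pk')
    -- ... fill the space with ~10M small tuples ...
    local t = clock.monotonic()
    s:drop() -- returns almost immediately; tuples are freed in the background
    print(clock.monotonic() - t)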
-