- Mar 23, 2017
-
-
Roman Tsisyk authored
* Suppress warnings in third-party code.
* Fix false-positive "variable may be used uninitialized in this function"
* Add MAYBE_UNUSED to variables used only for assertions
* Fix compilation on GCC 6.3.0 20170205
* Fix compilation on GCC 7.0.1 20170316
* Fix compilation on Clang 5.0.0-svn294894-1
-
Vladimir Davydov authored
It will be used for backups.
-
Vladimir Davydov authored
On backup we copy files corresponding to the most recent checkpoint. Since xctl does not create snapshots of its log files, but instead appends records written after checkpoint to the most recent log file, the signature of the xctl file corresponding to the last checkpoint equals the signature of the previous checkpoint. So upon successful recovery from a backup we need to rotate the log to keep checkpoint and xctl signatures in sync.
-
Vladimir Davydov authored
Add vclocks of rotated logs to xdir so that they can be deleted on garbage collection with xdir_collect_garbage().
-
Vladimir Davydov authored
We are going to scan the directory to find the latest xctl log. Therefore we must create an xctl file on each checkpoint, even if there are no records in it; otherwise we can load stale metadata.
-
Vladimir Davydov authored
So that we don't have to reimplement directory scanning logic when we need it for looking up the latest xctl log.
-
Vladimir Davydov authored
So that we can use xdir for manipulating metadata logs.
-
Ilya authored
- Fix box.cfg() without options
- Fix box.cfg { replication = {} }

Closes #2191
-
Alexandr Lyapunov authored
Fixes #2104
-
Georgy Kirichenko authored
This feature has never truly been implemented. There weren't even any test cases for it. This is a breaking change since 1.7.2 alpha.
- Remove index:upsert() and keep only space:upsert()
- Remove index_id from box_upsert()
- Ignore index_id for UPSERT in IPROTO.

Fixes #2226
-
- Mar 22, 2017
-
-
bigbes authored
-
- Mar 21, 2017
-
-
Konstantin Osipov authored
* reduce the scope of wal_stream in box_cfg_xc()
* remove unused start_offset/end_offset from wal_request
* remove unused rmean_wal_tx_bus
-
Konstantin Osipov authored
Do not use recovery->vclock in lua/info.cc and for truncate. Maintenance of recovery->vclock after initial recovery is finished is an artefact of the Tarantool 1.5 architecture and will be removed in the future.
-
Vladimir Davydov authored
-
Vladimir Davydov authored
The amount of data sent when bootstrapping a replica is limited only by the master's disk size, which can exceed the size of available memory by orders of magnitude. So we need the scheduler to be up and running while bootstrapping from the remote host, so that it can schedule dumps when the quota limit is hit.
-
Vladimir Davydov authored
Currently, on initial join we send the current vinyl state. To do that, we open a read iterator over a space's primary index and send statements returned by it. Such an approach has a number of inherent problems:

- An open read iterator blocks compaction, which is unacceptable for such a long operation as join. To avoid blocking compaction, we open the iterator in the dirty mode, i.e. it skims over the tops. This, however, introduces a different kind of problem: it makes the threshold between the initial and final join phases hazy - statements sent on final join may or may not be among those sent during the initial join, and there's no efficient way to differentiate between them w/o sending extra information.

- The replica expects LSNs to be growing monotonically. This constraint is imposed by the lsregion allocator used for storing statements in memory, but the read iterator returns statements ordered by key, not by LSN. Currently, the replica simply crashes if statements happen to be sent in an order different from chronological, which renders vinyl replication unusable. In the scope of the current model, we can't fix this by assigning fake LSNs to statements received on initial join, because there's no strict LSN threshold between the initial and final join phases (see the previous paragraph).

- In the initial join phase, the replica is only aware of spaces that were created before the last snapshot, while vinyl sends statements from spaces that exist now. As a result, if a space was created after the most recent snapshot, the replica won't be able to receive its tuples and will fail.

To address the above-mentioned problems, we make vinyl initial join send the latest snapshot, just like in case of memtx. We implement this by loading the vinyl state from the last snapshot of the metadata log and sending statements of all runs from the snapshot as is (including deletes and updates), to be applied by the replica.
To make lsregion at the receiving end happy, we assign fake monotonically growing LSNs to statements received on initial join. This is OK, because:

  any LSN from final join > max real LSN from initial join
  max real LSN from initial join >= max fake LSN

hence any LSN from final join > any fake LSN from initial join.

Besides fixing vinyl replication, this patch also enables the replication test suite for the vinyl engine (except for hot_standby) and makes engine/replica_join cover the following test cases:
- secondary indexes
- delete and update statements
- keys added in an order different from LSN
- recreate space after checkpoint

Closes #1911
Closes #2001
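The fake-LSN assignment described above can be sketched as a toy model (Python for illustration; the real implementation is C inside Tarantool, and all names here are hypothetical):

```python
def assign_fake_lsns(stmts, start_lsn=1):
    """Assign monotonically growing fake LSNs to statements received on
    initial join.  The statements arrive ordered by key, not by LSN, so
    their real LSNs may be non-monotonic -- which the lsregion allocator
    on the replica cannot accept."""
    fake = []
    lsn = start_lsn
    for stmt in stmts:
        fake.append((lsn, stmt))
        lsn += 1
    return fake

# Real LSNs arrive out of order because the run is sorted by key.
stmts = ["k1@real_lsn7", "k2@real_lsn3", "k3@real_lsn5"]
fake = assign_fake_lsns(stmts)
# Fake LSNs 1, 2, 3 are monotonic; as long as the max fake LSN stays
# <= the max real LSN of the initial join (7 here), any LSN seen during
# final join is strictly greater than every fake LSN, so the two join
# phases never collide.
```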
-
Konstantin Osipov authored
* promote recovery vclock in recover_xlog(), after we apply the recovered row
* remove unnecessary promotions from WAL and relays: they should not use the recovery vclock going forward
* update the format specifier for error code ER_UNKNOWN_REPLICA to expect a string rather than an integer, since it's passed a string for the replica id
* remove unused code in relay.cc
-
Konstantin Osipov authored
Assume we can trust everything we read from the local recovery - the xlog signatures and checksums are on guard for this.
-
Roman Tokarev authored
-
Konstantin Osipov authored
Using wal_checkpoint() incurs extra and unnecessary overhead in the tx thread. Monitoring inquiries are supposed to be quick and incur close to no overhead on the tx thread. The state of the replicaset vclock is good enough for monitoring.

Revert the test results broken by f2bccc18 by @rtsisyk: the original results were correct, as indicated by the comment not changed by f2bccc18.
-
Konstantin Osipov authored
-
Konstantin Osipov authored
Initialize the row timestamp before writing the row, in wal_request_write(). The row timestamp does not bear any semantics; it's for informational purposes only. Initialize it in the WAL thread. This fixes timestamps of the xctl file as well, which were up until now left uninitialized. The patch prepares for removal of recovery_fill_lsn().
-
Konstantin Osipov authored
Use replicaset_vclock in SUBSCRIBE. Ensure it is correctly initialized after local recovery, so that it contains LSNs of remote servers even after they were saved in the local WAL and recovered from it.
-
- Mar 20, 2017
-
-
Konstantin Osipov authored
Reset the global vclock after initial join to ensure engine events are not filtered out by recovery.
-
Konstantin Osipov authored
-
Konstantin Osipov authored
This reverts commit d45bcc5a.
-
Vladimir Davydov authored
Empty ranges are added to the dump heap, so if:
- we exceed the quota
- there's a range dump in progress
- there's a range that has already been dumped

we will keep picking the empty range for dumping over and over again until dumping of the non-empty range is complete, flooding the log with informational messages.
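The idea of the fix can be sketched as: keep empty ranges out of dump candidate selection (a Python toy model; vinyl's scheduler is C and these names are made up):

```python
import heapq

def pick_dump_candidate(ranges):
    """Pick the range with the largest in-memory size, skipping empty
    ranges so the scheduler never selects them over and over while a
    real dump is still in progress."""
    heap = [(-r["mem_size"], i) for i, r in enumerate(ranges)
            if r["mem_size"] > 0]
    if not heap:
        return None  # nothing to dump -- avoids the log-flooding loop
    heapq.heapify(heap)
    _, i = heap[0]
    return ranges[i]

ranges = [
    {"name": "empty-range", "mem_size": 0},
    {"name": "busy-range", "mem_size": 4096},
]
picked = pick_dump_candidate(ranges)
# -> busy-range; empty-range is never picked, so no message flood
```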
-
Vladimir Davydov authored
This breaks multi-level compaction and triggers coalescing of all ranges after recovery, which may result in xctl buffer overflow and panic if there are a lot of runs.
-
- Mar 17, 2017
-
-
Vladimir Davydov authored
Follow up 5102bfbc
-
Roman Tsisyk authored
Closes #1845
-
Roman Tsisyk authored
Follow up f232756b
-
Vladimir Davydov authored
The number of rows sent during initial join may be greater than the LSN of the checkpoint sent by the master, because there are rows that do not contribute to the LSN (system spaces, etc). If this happens, LSNs assigned on final join will be greater than LSNs assigned after bootstrapping is complete, which breaks Vinyl logic. Fix that by resetting the recovery vclock to the checkpoint LSN before getting to final join.
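The overshoot and the reset can be illustrated with a small numeric model (Python; the actual vclock handling lives in Tarantool's C code, and this helper name is invented):

```python
def final_join_start_lsn(rows_sent_on_initial_join, checkpoint_lsn,
                         reset_vclock):
    """Model the replica's recovery vclock after initial join.  Every
    received row bumps it, including rows that don't contribute to the
    master's LSN (system spaces, etc.), so without a reset it can
    overshoot the checkpoint LSN."""
    vclock = rows_sent_on_initial_join
    if reset_vclock:
        vclock = checkpoint_lsn  # the fix: clamp back to the checkpoint
    return vclock

# 120 rows sent, but the checkpoint signature is only 100.
buggy = final_join_start_lsn(120, 100, reset_vclock=False)
fixed = final_join_start_lsn(120, 100, reset_vclock=True)
# With the fix, final-join LSNs continue from 101, matching the master;
# without it they would start from 121, leaving a gap.
```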
-
Vladimir Davydov authored
Currently, old snapshots and xlogs are deleted by the snapshot daemon, while vinyl files are removed from engine_commit_checkpoint(). For the sake of backups and replication, which need to temporarily disable garbage collection, we should bring all garbage collection routines together in one place.

That's why this patch introduces the box.internal.gc() method, which takes the LSN of the latest snapshot to keep as its argument. When called, it deletes all xlog files as well as engine-specific files (memtx snapshots, vinyl runs) that are not required to recover from a snapshot with LSN greater than or equal to the given one. For removal of engine-specific files, a new engine callback is introduced, Engine::collectGarbage.

The snapshot daemon now calls box.internal.gc() to clean up instead of deleting snap and xlog files by itself.
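The gc rule - keep only files needed to recover from a snapshot with LSN >= the given one - can be sketched like this (a Python toy model; signatures stand for file LSNs, and nothing here mirrors Tarantool's actual API):

```python
def collect_garbage(checkpoints, xlogs, keep_lsn):
    """checkpoints and xlogs are lists of signatures (LSNs).  Delete
    everything not required to recover from some snapshot whose LSN is
    >= keep_lsn: i.e. keep the oldest such snapshot and every file with
    a signature at or above it."""
    oldest_kept = min((s for s in checkpoints if s >= keep_lsn),
                      default=None)
    if oldest_kept is None:
        return checkpoints, xlogs  # no snapshot qualifies; delete nothing
    return ([s for s in checkpoints if s >= oldest_kept],
            [s for s in xlogs if s >= oldest_kept])

snaps, logs = collect_garbage(checkpoints=[100, 200, 300],
                              xlogs=[90, 150, 210, 290],
                              keep_lsn=200)
# snaps -> [200, 300]; logs -> [210, 290]
```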
-
Vladimir Davydov authored
This patch is a preparation for centralized garbage collection. It extracts the garbage collection code from xctl_rotate() and places it in a separate function, xctl_collect_garbage(). The latter takes a signature that determines the minimal age an object must have to be deleted: the function only removes files left from objects that were deleted before the log received the given signature. This is needed to make vinyl respect box.cfg.snapshot_count.
-
Vladimir Davydov authored
A set of run files created by a snapshot is inconsistent, meaning that w/o replaying xlog it is not guaranteed to contain the database state that existed when the snapshot was taken. This is because we dump all ranges independently and each range as a whole, so that if a statement happens to be inserted into a range after the snapshot was started and before the range is dumped, it will be included in the dump. This peculiarity stands in the way of backups and replication, both of which require a consistent database state.

To make the snapshot consistent, let's force rotation of all in-memory trees on snapshot and make the dump task dump only those trees that need to be snapshotted while a snapshot is in progress. The rotation is done lazily, on insertion to the tree, similarly to how we handle DDL. The difference is that instead of sc_version we check vy_mem->min_lsn against checkpoint_lsn.
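The lazy rotation can be modeled as follows (a Python sketch; vy_insert and the dict-based mem are illustrative stand-ins for the C structures):

```python
def vy_insert(rng, stmt, checkpoint_lsn):
    """On insertion, if the active in-memory tree holds statements from
    before the WAL checkpoint (min_lsn <= checkpoint_lsn), freeze it and
    start a fresh one, so the snapshot dump sees a consistent tree."""
    mem = rng["mem"]
    if mem["stmts"] and mem["min_lsn"] <= checkpoint_lsn:
        rng["frozen"].append(mem)
        mem = {"stmts": [], "min_lsn": stmt["lsn"]}
        rng["mem"] = mem
    mem["stmts"].append(stmt)
    mem["min_lsn"] = min(mem["min_lsn"], stmt["lsn"])

rng = {"mem": {"stmts": [{"lsn": 5}], "min_lsn": 5}, "frozen": []}
vy_insert(rng, {"lsn": 12}, checkpoint_lsn=10)
# the old mem (min_lsn 5 <= 10) is frozen; the lsn-12 statement lands
# in a brand-new mem that is not part of the snapshot
```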
-
Vladimir Davydov authored
Since range->mem can be rotated while a dump is in progress, we have to remember which mems we are dumping. Commit 818208c4 ("vinyl: fix unwritten mem dropped if ddl") does this by remembering, on the dump task, the number of frozen mems at the time of dump preparation. Currently, this works fine, because we always dump all frozen mems. However, this condition won't hold once consistent snapshots are introduced: to make a snapshot consistent, we need to dump only in-memory trees that were created before the WAL checkpoint during snapshot. These mems are not even guaranteed to be at the end of the range->frozen list, because of range coalescing. So in this patch we use the current LSN to remember which mems are going to be dumped: all mems created after the dump task was created will have min_lsn > the LSN of task creation, so on task completion we should delete only mems with min_lsn <= that LSN.
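Selecting mems by LSN instead of by count can be sketched as (Python; the field names are invented for illustration):

```python
def mems_to_delete(frozen_mems, task_lsn):
    """On dump-task completion, delete only mems that existed when the
    task was created: any mem created later has min_lsn > task_lsn and
    must survive for the next dump."""
    return [m for m in frozen_mems if m["min_lsn"] <= task_lsn]

frozen = [
    {"id": 1, "min_lsn": 10},
    {"id": 2, "min_lsn": 25},
    {"id": 3, "min_lsn": 40},  # created after the task (task_lsn = 30)
]
done = mems_to_delete(frozen, task_lsn=30)
# -> mems 1 and 2; mem 3 is kept for the next dump
```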
-
Vladimir Davydov authored
In contrast to the memtx engine, which populates in-memory trees from Engine::prepare(), in the case of Vinyl statements are inserted into in-memory trees after the WAL write, from the Engine::commit() callback. Therefore, to make sure all statements inserted before the snapshot are dumped, we must initiate checkpoint after WAL rotation. Currently, this is not the case: checkpoint is initiated from Engine::beginCheckpoint(). To make Vinyl snapshots consistent (not requiring xlog replay), we have to fix that, so introduce a new callback, Engine::prepareWaitCheckpoint(), which is called right after WAL rotation, and trigger Vinyl checkpoint from it.
-
Vladimir Davydov authored
vy_mem_update_formats() is used to update mem formats when mem rotation is skipped because the mem is empty. This doesn't work as expected, because vy_mem_update_formats() does not update mem->sc_version, so in case of DDL the next insertion will rotate the mem anyway. Instead of updating sc_version in vy_mem_update_formats(), let's fix this by zapping the helper altogether and simply recreating the mem - it isn't a big deal, because this does not happen often.

While we are at it, let's also:
- reorder arguments of vy_mem_new() to keep key_def close to format
- remove extra arguments of vy_range_rotate_mem() - we can get all of them right there (in fact we already do in case of ->format)
-
Vladimir Davydov authored
Currently, if the range is splitting, we only add active in-memory indexes of the resulting ranges to the read iterator, see vy_read_iterator_add_mem(). This is because until recently a mem could only be frozen on dump/compaction task preparation, which is disabled while a split is in progress. However, this is not true any more: a mem can be rotated on txn_commit() in case of DDL, hence we must always add all in-memory indexes, including frozen ones, when opening a read iterator.
-
- Mar 16, 2017
-
-
Konstantin Osipov authored
-