- Jul 10, 2020
-
-
Serge Petrenko authored
Introduce a new function to box.ctl API: box.ctl.clear_synchro_queue(). The function makes sure that, after it is executed, the txn_limbo is free of any transactions issued on a remote instance. To achieve this, the instance first waits for two replication_synchro_timeout intervals so that confirmations and rollbacks from the remote instance can reach it. If the limbo remains non-empty, the instance starts figuring out which transactions should be confirmed and which should be rolled back. To do so, the instance scans the vclocks of all the instances that replicate from it and determines the last LSN of the old leader that has been reached by replication_synchro_quorum replicas. Then the instance writes the appropriate CONFIRM and ROLLBACK entries. After these actions the limbo must be empty. Closes #4849
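
The quorum scan described above can be modelled in a few lines. This is a minimal Python sketch, not the C code from the patch: the function name `quorum_lsn` and the representation of a vclock as a dict from replica id to LSN are assumptions made for illustration. Whether the instance's own vclock takes part in the count is not stated in the message, so the sketch leaves that choice to the caller.

```python
def quorum_lsn(replica_vclocks, old_leader_id, quorum):
    """Return the highest old-leader LSN reached by at least `quorum` vclocks.

    replica_vclocks: list of dicts {replica_id: lsn} (hypothetical shape).
    Returns 0 when fewer than `quorum` vclocks are available.
    """
    # Take the old leader's component from every vclock; a missing
    # component means nothing was received from that leader yet.
    lsns = sorted(
        (vclock.get(old_leader_id, 0) for vclock in replica_vclocks),
        reverse=True,
    )
    if quorum < 1 or len(lsns) < quorum:
        return 0
    # The quorum-th largest value is the last LSN that `quorum`
    # vclocks have all reached.
    return lsns[quorum - 1]
```

With vclocks `[{1: 10}, {1: 7}, {1: 9}]`, old leader id 1 and a quorum of 2, the result is 9: two vclocks have reached LSN 9, but only one has reached LSN 10.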
-
Serge Petrenko authored
The comparator will be needed in other files too, e.g. box.cc. Prerequisite #4849
-
Vladislav Shpilevoy authored
Fully local transactions are expected to be blocked if there is an unfinished synchronous transaction. There is also a special case for a transaction that is not local but has a local row at the end, related to #4928.
-
Sergey Bronnikov authored
Part of #5055
-
Sergey Bronnikov authored
Part of #5055
-
Sergey Bronnikov authored
Part of #5055
-
Vladislav Shpilevoy authored
When synchro quorum is 1, the final commit and confirmation write are done by the fiber that created the transaction, right after the WAL write. This case got special handling in the previous patches, and this commit adds a test for it. Closes #5123
-
Vladislav Shpilevoy authored
Follow-up #4845
-
Serge Petrenko authored
Final join (or register) stage is needed to deliver the replica its _cluster registration. Since this stage is followed by a snapshot on replica, the data received during this stage must be confirmed. Make master check that there are no rollbacks for the data to be sent during final join and that all the data is confirmed before final join starts. Closes #5097
-
Serge Petrenko authored
All the data that master sends during the join stage (both initial and final) is embedded into the first snapshot created on replica, so this data mustn't contain any unconfirmed or rolled back synchronous transactions. Make sure that master starts sending the initial data, which contains a snapshot-like dump of all the spaces, only after the latest synchronous tx it has is confirmed. In case of rollback, the replica may retry joining. Part of #5097
-
Serge Petrenko authored
Add failure reason to txn_limbo_wait_confirm. Prerequisite #5097
-
Vladislav Shpilevoy authored
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
-
Vladislav Shpilevoy authored
Applier has a writer fiber sending the vclock of the instance to the master after each WAL write or when the heartbeat timeout passes. However, it missed WAL writes that happened *during* the sending of an ACK for a previous WAL write. That made the applier sleep for the heartbeat timeout even though it had not sent the data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
-
Vladislav Shpilevoy authored
With synchronous replication a sync transaction passes 2 stages: WAL write + commit. These are separate events, in contrast to async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
-
Serge Petrenko authored
Follow-up #4847 Follow-up #4848
-
Serge Petrenko authored
Follow-up #4847 Follow-up #4848 Closes #4851
-
Serge Petrenko authored
Local recovery should use asynchronous txn commit procedure in order to get to CONFIRM and ROLLBACK statements for a transaction that needs confirmation before confirmation timeout happens. Using async txn commit doesn't harm other transactions, since the journal used during local recovery fakes writes and its write_async() method may reuse plain write(). Follow-up #4847 Follow-up #4848
-
Serge Petrenko authored
Now txn_limbo writes a ROLLBACK entry to WAL when one of the limbo entries fails to gather quorum during a txn_limbo_confirm_timeout. All the limbo entries, starting with the failed one, are rolled back in reverse order. Closes #4848
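
The rollback order described above can be illustrated with a small model. This is a hedged Python sketch under stated assumptions: the limbo is modelled as a list of LSNs in insertion order, and `rollback_from` is a hypothetical helper name, not a function from the patch.

```python
def rollback_from(limbo_lsns, failed_lsn):
    """Roll back the failed entry and everything queued after it.

    limbo_lsns: LSNs of limbo entries in insertion order (hypothetical shape).
    Returns (rolled_back, remaining), where rolled_back lists the entries
    in the order they are undone: newest first, failed entry last.
    """
    # index() raises ValueError if the failed entry is not in the limbo.
    idx = limbo_lsns.index(failed_lsn)
    rolled_back = limbo_lsns[idx:][::-1]
    remaining = limbo_lsns[:idx]
    return rolled_back, remaining
```

For a limbo holding entries 5, 6, 7 and 8 where entry 6 fails to gather a quorum, entries 8, 7 and 6 are rolled back in that order and entry 5 stays in the limbo.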
-
Serge Petrenko authored
Now txn_limbo_wait_complete() waits for acks only for txn_limbo_confirm_timeout seconds. If a timeout is reached, the entry and all the ones following it must be rolled back. Part-of #4848
-
Leonid Vasiliev authored
To support qsync replication, the snapshot machinery now waits, up to a timeout, for confirmation of the current "sync" transactions. In case of a rollback or timeout expiration, the snapshot is cancelled. Closes #4850
-
Serge Petrenko authored
Make txn_limbo write a CONFIRM entry as soon as a batch of entries receives its acks. The CONFIRM entry is written to WAL and later replicated to all the replicas. Now replicas put synchronous transactions into txn_limbo and wait for the corresponding confirmation entries to arrive and end up in their WAL before committing the transactions. Closes #4847
-
Serge Petrenko authored
Transaction on_rollback triggers will need to distinguish txn_limbo-issued rollbacks from rollbacks that happened due to a failed WAL write or memory error. Prerequisite #4847 Prerequisite #4848
-
Serge Petrenko authored
Add methods to encode/decode CONFIRM entry. A CONFIRM entry will be written to WAL by the synchronous replication master as soon as it finds that the transaction was applied on a quorum of replicas. CONFIRM rows share the same header with other rows in WAL, but their body differs: it's just a map containing replica_id and lsn of the last confirmed transaction. A ROLLBACK request contains the same data as a CONFIRM request. The only difference is the request semantics. While a CONFIRM request releases all the limbo entries up to the given lsn, the ROLLBACK request rolls back all the entries with lsn greater than the given one. Part-of #4847 Part-of #4848
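
The difference in semantics between the two requests can be shown with a toy model. This is a sketch only, assuming a Python dict in place of the real encoded row: the `type` and `body` keys and both function names are hypothetical, and the actual wire format and key constants are not reproduced here.

```python
def make_request(kind, replica_id, lsn):
    """Build a toy CONFIRM or ROLLBACK request (hypothetical dict layout)."""
    if kind not in ("CONFIRM", "ROLLBACK"):
        raise ValueError("unknown request kind: %r" % (kind,))
    # Both request kinds carry the same body: replica_id and lsn.
    return {"type": kind, "body": {"replica_id": replica_id, "lsn": lsn}}


def apply_request(limbo_lsns, request):
    """Return (affected, remaining) LSNs after applying the request."""
    lsn = request["body"]["lsn"]
    up_to = [entry for entry in limbo_lsns if entry <= lsn]
    after = [entry for entry in limbo_lsns if entry > lsn]
    if request["type"] == "CONFIRM":
        # CONFIRM releases every entry up to and including the given lsn.
        return up_to, after
    # ROLLBACK undoes every entry with an lsn greater than the given one.
    return after, up_to
```

Applied to a limbo with entries 8, 9 and 10, a CONFIRM with lsn 9 releases 8 and 9 and leaves 10 waiting, while a ROLLBACK with lsn 9 rolls back 10 and leaves 8 and 9 in place.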
-
Vladislav Shpilevoy authored
A synchronous transaction (one that changes anything in a synchronous space) waits before commit until it is replicated onto a quorum of replicas. While there is an uncommitted synchronous transaction, any attempt to commit a subsequent transaction is suspended, even if it is an async transaction. This restriction comes from the theoretically possible dependency of what is written in the async transactions on what was written in the previous sync transactions. So far all the 'synchronousness' is basically the same as the well-known 'wait_lsn' technique, with the exception that the transaction really is not committed until replicated. The problem of wait_lsn is still present, though, in case the master restarts, because there is no 'confirm' record in WAL telling which transactions are replicated and can be applied. Closes #4844 Closes #4845
-
- Jul 09, 2020
-
-
Vladislav Shpilevoy authored
Synchronous transactions are supposed to be replicated on a specified number of replicas before being committed on master. The number of replicas can be specified using the replication_synchro_quorum option. It is 1 by default, so sync transactions work like asynchronous ones unless configured otherwise. 1 means a successful WAL write on master is enough for commit. When replication_synchro_quorum is greater than 1, an instance has to wait for the specified number of replicas to reply with success. If enough replies aren't collected during replication_synchro_timeout, the instance rolls back the tx in question. Part of #4844 Part of #5073
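
The commit decision described above can be sketched as a pure function. This is a rough Python model under a loudly stated assumption: the master's own successful WAL write is counted as one acknowledgement towards the quorum, which is how the text's "1 means successful WAL write on master is enough" is read here. The function name and its parameters are hypothetical.

```python
def sync_txn_outcome(replica_acks, quorum, waited, timeout):
    """Decide what happens to a sync transaction after the master's WAL write.

    replica_acks: number of replicas that replied with success so far.
    quorum: value of replication_synchro_quorum.
    waited, timeout: seconds waited so far and replication_synchro_timeout.
    """
    # Assumption: the master's own WAL write counts as the first ack.
    acks = 1 + replica_acks
    if acks >= quorum:
        return "commit"
    if waited >= timeout:
        return "rollback"
    return "wait"
```

With the default quorum of 1 the function returns "commit" without any replica acks, which matches the statement that sync transactions then behave like asynchronous ones.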
-
Vladislav Shpilevoy authored
Synchronous space makes every transaction, affecting its data, wait until it is replicated on a quorum of replicas before it is committed. Part of #4844 Part of #5073
-
Nikita Pettik authored
xrow_upsert_execute() can fail and return NULL for various reasons. However, in vy_apply_upsert() the result of xrow_upsert_execute() is used unconditionally, which may lead to a crash. Let's fix it: if xrow_upsert_execute() fails, return NULL from vy_apply_upsert(). Part of #4957
-
Ilya Kosarev authored
Tarantool's box.backup.start() returns the list of files needed to successfully run a new instance. The problem was that it didn't include the log directories of empty indexes. This means that after starting the instance from the backed up files and inserting something into a previously empty index we could run into a log file creation error, for example on a box.snapshot() call. Now this is not a problem, since the corresponding directories are created if needed. A corresponding test case is added. Closes #5090
-
- Jul 08, 2020
-
-
Cyrill Gorcunov authored
Fixes #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
And drop duplicate declaration of cbus_init. Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #4718 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-