- Jul 13, 2020
-
-
Vladislav Shpilevoy authored
In one of the test cases, 2 fibers each started a transaction. The transaction in the first fiber was rolled back, and the second fiber was expected to roll back too. It did, but not always immediately after the first one, because the first fiber could not just roll back right away: it had to write a ROLLBACK entry into WAL before applying the rollback to all the following transactions. This led to a yield, during which it was possible to observe the second fiber not dead yet. The patch makes the test explicitly wait for the fibers' death. Closes #5162
-
- Jul 12, 2020
-
-
Aleksandr Lyapunov authored
-
- Jul 11, 2020
-
-
Cyrill Gorcunov authored
Instead of open coding. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Basically this is the same as what txn_limbo_abort does. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
The limbo variable is accessed unconditionally, thus there is no need for a fake reference. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
The limbo variable is used when accounting acks, so there is no need for a formal read here. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Otherwise it is not clear why we should set up a flag here. Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
- Jul 10, 2020
-
-
Alexander V. Tikhonov authored
Synchronized the suites' 'fragile' lists with the actual list of flaky tests.
-
Nikita Pettik authored
sql_value_type() and mem_mp_type() do the same thing: return the MessagePack type corresponding to the value stored in a memory cell. However, sql_value_type() operates on an opaque API wrapper - sql_value*. To avoid duplicating code, let's invoke mem_mp_type() in sql_value_type(). At the same time, let's take into account that mp_type can now be not only _BIN, but also _ARRAY and _MAP - this fact will be used when we introduce arrays and maps in SQL.
-
Nikita Pettik authored
It takes a memory cell object and returns the MessagePack type corresponding to its value (i.e. it maps MEM_* types onto MP_* types). It's an internal analogue of sql_value_type(). In other words, it operates directly on struct Mem *.
-
Vladislav Shpilevoy authored
Transactions are always rolled back in reverse order. For some reason, the limbo removed rolled back transactions from the beginning, not from the end. The test ensures this is no longer the case. Closes #5147
-
Vladislav Shpilevoy authored
In the original issue there were 2 bugs: one memory leak and one memory corruption. The leak was in txn_limbo_write_confirm_rollback(). This function used the fiber->gc region to encode CONFIRM/ROLLBACK, but never freed it. The corruption was in applier.cc in process_confirm_rollback(). CONFIRM/ROLLBACK were stored on the applier's ibuf. As a result, if the applier experienced a relatively intensive load, the ibuf could be quickly recycled right during a WAL write of the CONFIRM/ROLLBACK stored on it. DML requests would have the same problem, but they were copied in txn_add_redo() inside of the xrow_encode_dml() call. The test checks whether CONFIRM/ROLLBACK are also copied. Closes #5138
-
Vladislav Shpilevoy authored
Synchronous replication options - replication_synchro_quorum and replication_synchro_timeout - were not updated for the existing transactions on change. As a result, there could be weird inconsistencies, when a new transaction could require a smaller quorum than a previous transaction, and could implicitly confirm it. The same could be said about rollback on timeout - new transactions could wake up earlier than older transactions. This patch makes the configuration dynamic. So if the mentioned options are updated, they are applied to the existing transactions too. This opens up wide administrative capabilities. For example, when the replica count becomes less than the quorum, an administrator can lower the quorum dynamically, and it will be applied to all the existing transactions. Closes #5119
-
Serge Petrenko authored
Introduce a new function to the box.ctl API: box.ctl.clear_synchro_queue(). The function makes sure that, after it is executed, the txn_limbo is free of any transactions issued on a remote instance. To achieve this, the instance first waits for 2 replication_synchro_timeouts so that confirmations and rollbacks from the remote instance can reach it. If the limbo remains non-empty, the instance starts figuring out which transactions should be confirmed and which should be rolled back. To do so, the instance scans through the vclocks of all the instances that replicate from it and determines which LSN of the old leader is the last one reached by replication_synchro_quorum of replicas. Then the instance writes the appropriate CONFIRM and ROLLBACK entries. After these actions the limbo must be empty. Closes #4849
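The LSN selection described above can be illustrated with a minimal Python sketch. It is a model only, not the actual box.cc code: the helper name quorum_lsn, the dict-based vclocks and the exact counting rule (which instances count towards the quorum) are assumptions made for illustration.

```python
def quorum_lsn(vclocks, leader_id, quorum):
    """Return the highest LSN of `leader_id` that at least `quorum`
    of the given vclocks have reached, or 0 if there are too few vclocks.

    Hypothetical helper: each vclock is modelled as a dict
    {instance_id: lsn}; a missing component counts as 0.
    """
    lsns = sorted((vc.get(leader_id, 0) for vc in vclocks), reverse=True)
    if len(lsns) < quorum:
        return 0
    # The quorum-th largest value is reached by at least `quorum` instances.
    return lsns[quorum - 1]


# Three replicas report how far they got with the old leader (id 1).
vclocks = [{1: 10}, {1: 8}, {1: 5}]
confirm_lsn = quorum_lsn(vclocks, leader_id=1, quorum=2)

# Pending limbo entries of the old leader, oldest first.
pending = [6, 7, 8, 9, 10]
to_confirm = [lsn for lsn in pending if lsn <= confirm_lsn]
to_rollback = [lsn for lsn in pending if lsn > confirm_lsn]
```

In this example two of the three replicas have reached LSN 8, so entries up to 8 would be covered by CONFIRM and the rest by ROLLBACK.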
-
Serge Petrenko authored
The comparator will be needed in other files too, e.g. box.cc. Prerequisite #4849
-
Vladislav Shpilevoy authored
Fully local transactions are expected to be blocked if there is an unfinished synchronous transaction. There is also a special case when a transaction is not local, but has a local row at the end, related to #4928.
-
Sergey Bronnikov authored
Part of #5055
-
Sergey Bronnikov authored
Part of #5055
-
Sergey Bronnikov authored
Part of #5055
-
Vladislav Shpilevoy authored
When the synchro quorum is 1, the final commit and the confirmation write are done by the fiber that created the transaction, right after the WAL write. This case got special handling in the previous patches, and this commit adds a test for that. Closes #5123
-
Vladislav Shpilevoy authored
Follow-up #4845
-
Serge Petrenko authored
Final join (or register) stage is needed to deliver the replica its _cluster registration. Since this stage is followed by a snapshot on replica, the data received during this stage must be confirmed. Make master check that there are no rollbacks for the data to be sent during final join and that all the data is confirmed before final join starts. Closes #5097
-
Serge Petrenko authored
All the data that master sends during the join stage (both initial and final) is embedded into the first snapshot created on replica, so this data mustn't contain any unconfirmed or rolled back synchronous transactions. Make sure that master starts sending the initial data, which contains a snapshot-like dump of all the spaces, only after the latest synchronous tx it has is confirmed. In case of rollback, the replica may retry joining. Part of #5097
-
Serge Petrenko authored
Add failure reason to txn_limbo_wait_confirm. Prerequisite #5097
-
Vladislav Shpilevoy authored
Applier used to send ACKs to master when commit happens. But for sync transactions this is not enough - their ACK should be sent after WAL write. Master doesn't really care whether a commit will happen after WAL write on the replica. The only thing which matters is whether the replica managed to persist the sync transaction. Now applier uses WAL write event instead of commit to send ACKs. Nothing changed for async transactions (for them WAL write == commit). But sync transactions now send ACKs immediately, without waiting for heartbeat timeout. Closes #5100 Closes #5127
-
Vladislav Shpilevoy authored
Applier has a writer fiber sending the vclock of the instance to the master after each WAL write or when the heartbeat timeout passes. However, it missed WAL writes that happened *during* sending an ACK for a previous WAL write. That made the applier sleep for the heartbeat timeout even though it had unsent data. It is not a problem for async replication, but becomes a bug when sync transactions appear. For them an ACK should be sent as soon as possible. Part of #5100
-
Vladislav Shpilevoy authored
With synchronous replication a sync transaction passes 2 stages: WAL write + commit. These are separate events, in contrast to async transactions, where WAL write == commit. The WAL write event is needed on non-leader nodes to be able to send an ACK to the master. Part of #5100
-
Serge Petrenko authored
Follow-up #4847 Follow-up #4848
-
Serge Petrenko authored
Follow-up #4847 Follow-up #4848 Closes #4851
-
Serge Petrenko authored
Local recovery should use asynchronous txn commit procedure in order to get to CONFIRM and ROLLBACK statements for a transaction that needs confirmation before confirmation timeout happens. Using async txn commit doesn't harm other transactions, since the journal used during local recovery fakes writes and its write_async() method may reuse plain write(). Follow-up #4847 Follow-up #4848
-
Serge Petrenko authored
Now txn_limbo writes a ROLLBACK entry to WAL when one of the limbo entries fails to gather quorum during a txn_limbo_confirm_timeout. All the limbo entries, starting with the failed one, are rolled back in reverse order. Closes #4848
-
Serge Petrenko authored
Now txn_limbo_wait_complete() waits for acks only for txn_limbo_confirm_timeout seconds. If a timeout is reached, the entry and all the ones following it must be rolled back. Part-of #4848
-
Leonid Vasiliev authored
To support qsync replication, the snapshot machinery now waits, with a timeout, for confirmation of the current "sync" transactions. In case of a rollback or timeout expiration, the snapshot is cancelled. Closes #4850
-
Serge Petrenko authored
Make txn_limbo write a CONFIRM entry as soon as a batch of entries receives its acks. The CONFIRM entry is written to WAL and later replicated to all the replicas. Now replicas put synchronous transactions into txn_limbo and wait for the corresponding confirmation entries to arrive and end up in their WAL before committing the transactions. Closes #4847
-
Serge Petrenko authored
Transaction on_rollback triggers will need to distinguish txn_limbo-issued rollbacks from rollbacks that happened due to a failed WAL write or memory error. Prerequisite #4847 Prerequisite #4848
-
Serge Petrenko authored
Add methods to encode/decode CONFIRM entry. A CONFIRM entry will be written to WAL by synchronous replication master as soon as it finds that the transaction was applied on a quorum of replicas. CONFIRM rows share the same header with other rows in WAL, but their body differs: it's just a map containing replica_id and lsn of the last confirmed transaction. ROLLBACK request contains the same data as CONFIRM request. The only difference is the request semantics. While a CONFIRM request releases all the limbo entries up to the given lsn, the ROLLBACK request rolls back all the entries with lsn greater than given one. Part-of #4847 Part-of #4848
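The difference between the two request semantics can be shown with a small Python model. It is only a sketch under assumptions: the LimboModel class and its methods are hypothetical, the limbo is modelled as a queue of LSNs, and the "up to" and "greater than" boundaries are taken literally from the description above.

```python
from collections import deque


class LimboModel:
    """Toy model of a queue of pending synchronous transactions."""

    def __init__(self):
        # LSNs of pending entries, oldest first.
        self.queue = deque()

    def append(self, lsn):
        self.queue.append(lsn)

    def confirm(self, lsn):
        """Release all entries with LSN <= lsn, oldest first."""
        confirmed = []
        while self.queue and self.queue[0] <= lsn:
            confirmed.append(self.queue.popleft())
        return confirmed

    def rollback(self, lsn):
        """Roll back all entries with LSN > lsn, newest first."""
        rolled_back = []
        while self.queue and self.queue[-1] > lsn:
            rolled_back.append(self.queue.pop())
        return rolled_back


limbo = LimboModel()
for lsn in range(1, 6):
    limbo.append(lsn)

confirmed = limbo.confirm(2)      # releases entries 1 and 2
rolled_back = limbo.rollback(3)   # rolls back 5, then 4
remaining = list(limbo.queue)     # entry 3 is still pending
```

CONFIRM consumes the queue from the head and ROLLBACK from the tail, so a single LSN is enough to describe either request.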
-
Vladislav Shpilevoy authored
A synchronous transaction (one which changes anything in a synchronous space) waits before commit until it is replicated onto a quorum of replicas. When there is an uncommitted synchronous transaction, any attempt to commit the next transaction is suspended, even if it is an async transaction. This restriction comes from the theoretically possible dependency of what is written in the async transactions on what was written in the previous sync transactions. So far all the 'synchronousness' is basically the same as the well known 'wait_lsn' technique, with the exception that the transaction really is not committed until replicated. The problem of wait_lsn is still present though, in case the master restarts, because there is no 'confirm' record in WAL telling which transactions are replicated and can be applied. Closes #4844 Closes #4845
-
- Jul 09, 2020
-
-
Vladislav Shpilevoy authored
Synchronous transactions are supposed to be replicated on a specified number of replicas before being committed on master. The number of replicas can be specified using the replication_synchro_quorum option. It is 1 by default, so sync transactions work like asynchronous ones unless configured otherwise. 1 means a successful WAL write on master is enough for commit. When replication_synchro_quorum is greater than 1, an instance has to wait for the specified number of replicas to reply with success. If enough replies aren't collected during replication_synchro_timeout, the instance rolls back the tx in question. Part of #4844 Part of #5073
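The quorum and timeout rule can be expressed as a short Python sketch. It is a simplified model, not the real implementation: the function sync_txn_outcome is hypothetical, and it assumes that the master's own WAL write counts as the first acknowledgement, which is how a quorum of 1 is described above.

```python
def sync_txn_outcome(ack_times, quorum, timeout):
    """Decide the fate of a synchronous transaction.

    ack_times: seconds after the transaction start at which each
    acknowledgement arrived. Assumption: the master's own WAL write
    is included in this list.
    """
    acks_in_time = sum(1 for t in ack_times if t <= timeout)
    return "commit" if acks_in_time >= quorum else "rollback"


# Quorum 1: the local WAL write alone is enough.
default_case = sync_txn_outcome([0.0], quorum=1, timeout=5.0)

# Quorum 3: the third acknowledgement arrives after the timeout.
late_case = sync_txn_outcome([0.0, 0.2, 7.0], quorum=3, timeout=5.0)
```

With these inputs the first transaction is committed and the second one is rolled back.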
-
Vladislav Shpilevoy authored
A synchronous space makes every transaction affecting its data wait until it is replicated on a quorum of replicas before it is committed. Part of #4844 Part of #5073
-
Nikita Pettik authored
xrow_upsert_execute() can fail and return NULL for various reasons. However, in vy_apply_upsert() the result of xrow_upsert_execute() is used unconditionally, which may lead to a crash. Let's fix it: in case xrow_upsert_execute() fails, return NULL from vy_apply_upsert(). Part of #4957
-