- Sep 11, 2020
-
-
Alexander V. Tikhonov authored
On heavily loaded hosts the following issue was found:

[035] --- replication/wal_off.result Fri Jul 3 04:29:56 2020
[035] +++ replication/wal_off.reject Mon Sep 7 15:32:46 2020
[035] @@ -47,6 +47,8 @@
[035] ...
[035] while box.info.replication[wal_off_id].upstream.message ~= check do fiber.sleep(0) end
[035] ---
[035] +- error: '[string "while box.info.replication[wal_off_id].upstre..."]:1: attempt to
[035] +  index field ''upstream'' (a nil value)'
[035] ...
[035] box.info.replication[wal_off_id].upstream ~= nil
[035] ---

It happened because the replication upstream status check occurred too early, when the upstream state was not set yet. To give the check a chance to reach the needed 'stopped' state, the test needs to wait for it using the test_run:wait_upstream() routine.

Closes #5278
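A minimal sketch of the waiting approach (assuming test-run's wait_upstream() helper accepts a table of expected upstream fields; the generic wait_cond() form is shown as well):

    -- wait until the upstream from the wal_off replica reaches the expected state
    test_run:wait_upstream(wal_off_id, {status = 'stopped'})
    -- equivalent fallback with the generic helper
    test_run:wait_cond(function()
        local upstream = box.info.replication[wal_off_id].upstream
        return upstream ~= nil and upstream.status == 'stopped'
    end)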
-
Alexander V. Tikhonov authored
On heavily loaded hosts the following 3 issues were found:

line 174:

[026] --- replication/status.result Thu Jun 11 12:07:39 2020
[026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
[026] @@ -174,15 +174,17 @@
[026] ...
[026] replica.downstream.status == 'follow'
[026] ---
[026] -- true
[026] +- false
[026] ...

It happened because the replication downstream status check occurred too early. To give the check a chance to reach the needed 'follow' state, the test needs to wait for it using the test_run:wait_downstream() routine.

line 178:

[024] --- replication/status.result Mon Sep 7 00:22:52 2020
[024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
[024] @@ -178,11 +178,13 @@
[024] ...
[024] replica.downstream.vclock[master_id] == box.info.vclock[master_id]
[024] ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
[024] +  index field ''vclock'' (a nil value)'
[024] ...
[024] replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
[024] ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
[024] +  index field ''vclock'' (a nil value)'
[024] ...
[024] --
[024] -- Replica

It happened because the replication vclock field did not exist yet at the moment of the check. To fix the issue, the test has to wait for the vclock field to become available using the test_run:wait_cond() routine, and the replication downstream data has to be read at the same moment.

line 224:

[014] --- replication/status.result Fri Jul 3 04:29:56 2020
[014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
[014] @@ -224,7 +224,7 @@
[014] ...
[014] master.upstream.status == "follow"
[014] ---
[014] -- true
[014] +- false
[014] ...
[014] master.upstream.lag < 1
[014] ---

It happened because the replication upstream status check occurred too early. To give the check a chance to reach the needed 'follow' state, the test needs to wait for it using the test_run:wait_upstream() routine.

The test was also removed from the test-run 'fragile' list to run it in parallel.

Closes #5110
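A hedged sketch of the waiting pattern for the downstream checks (helper and variable names follow the description above; the exact test code may differ):

    -- wait until the downstream to the replica reaches 'follow'
    test_run:wait_downstream(replica_id, {status = 'follow'})
    -- wait until the downstream vclock appears, reading the downstream data in the same step
    test_run:wait_cond(function()
        local replica = box.info.replication[replica_id]
        return replica.downstream ~= nil and replica.downstream.vclock ~= nil
    end)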
-
Alexander V. Tikhonov authored
On heavily loaded hosts the following issue was found:

[021] --- replication/gh-4606-admin-creds.result Wed Apr 15 15:47:41 2020
[021] +++ replication/gh-4606-admin-creds.reject Sun Sep 6 20:23:09 2020
[021] @@ -36,7 +36,42 @@
[021]  | ...
[021]  i.replication[i.id % 2 + 1].upstream.status == 'follow' or i
[021]  | ---
[021] - | - true
[021] + | - version: 2.6.0-52-g71a24b9f2
[021] + |   id: 2
[021] + |   ro: false
[021] + |   uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
[021] + |   package: Tarantool
[021] + |   cluster:
[021] + |     uuid: f27dfdfe-2802-486a-bc47-abc83b9097cf
[021] + |   listen: unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/replica_auth.socket-iproto
[021] + |   replication_anon:
[021] + |     count: 0
[021] + |   replication:
[021] + |     1:
[021] + |       id: 1
[021] + |       uuid: a07cad18-d27f-48c4-8d56-96b17026702e
[021] + |       lsn: 3
[021] + |       upstream:
[021] + |         peer: admin@unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/master.socket-iproto
[021] + |         lag: 0.0030207633972168
[021] + |         status: disconnected
[021] + |         idle: 0.44824500009418
[021] + |         message: timed out
[021] + |         system_message: Operation timed out
[021] + |     2:
[021] + |       id: 2
[021] + |       uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
[021] + |       lsn: 0
[021] + |   signature: 3
[021] + |   status: running
[021] + |   vclock: {1: 3}
[021] + |   uptime: 1
[021] + |   lsn: 0
[021] + |   sql: []
[021] + |   gc: []
[021] + |   vinyl: []
[021] + |   memory: []
[021] + |   pid: 40326
[021]  | ...
[021]  test_run:switch('default')
[021]  | ---

It happened because the replication upstream status check occurred too early, when the upstream was still in 'disconnected' state. To give the check a chance to reach the needed 'follow' state, the test needs to wait for it using the test_run:wait_upstream() routine.

Closes #5233
-
Alexander V. Tikhonov authored
On heavily loaded hosts the following issue was found:

[004] --- replication/gh-4402-info-errno.result Wed Jul 22 06:13:34 2020
[004] +++ replication/gh-4402-info-errno.reject Wed Jul 22 06:41:14 2020
[004] @@ -32,7 +32,39 @@
[004]  | ...
[004]  d ~= nil and d.status == 'follow' or i
[004]  | ---
[004] - | - true
[004] + | - version: 2.6.0-10-g8df49e4
[004] + |   id: 1
[004] + |   ro: false
[004] + |   uuid: 41c4e3bf-cc3b-443d-88c9-39a9a8fe2df9
[004] + |   package: Tarantool
[004] + |   cluster:
[004] + |     uuid: 6ec7bcce-68e7-41a4-b84b-dc9236621579
[004] + |   listen: unix/:(socket)
[004] + |   replication_anon:
[004] + |     count: 0
[004] + |   replication:
[004] + |     1:
[004] + |       id: 1
[004] + |       uuid: 41c4e3bf-cc3b-443d-88c9-39a9a8fe2df9
[004] + |       lsn: 52
[004] + |     2:
[004] + |       id: 2
[004] + |       uuid: 8a989231-177a-4eb8-8030-c148bc752b0e
[004] + |       lsn: 0
[004] + |       downstream:
[004] + |         status: stopped
[004] + |         message: timed out
[004] + |         system_message: Connection timed out
[004] + |   signature: 52
[004] + |   status: running
[004] + |   vclock: {1: 52}
[004] + |   uptime: 27
[004] + |   lsn: 52
[004] + |   sql: []
[004] + |   gc: []
[004] + |   vinyl: []
[004] + |   memory: []
[004] + |   pid: 99
[004]  | ...
[004]
[004]  test_run:cmd('stop server replica')

It happened because the replication downstream status check occurred too early, when the downstream was still in 'stopped' state. To give the check a chance to reach the needed 'follow' state, the test needs to wait for it using the test_run:wait_downstream() routine.

Closes #5235
-
Alexander V. Tikhonov authored
On heavily loaded hosts the following issue was found:

[089] --- replication/gh-4928-tx-boundaries.result Wed Jul 29 04:08:29 2020
[089] +++ replication/gh-4928-tx-boundaries.reject Wed Jul 29 04:24:02 2020
[089] @@ -94,7 +94,7 @@
[089]  | ...
[089]  box.info.replication[1].upstream.status
[089]  | ---
[089] - | - follow
[089] + | - disconnected
[089]  | ...
[089]
[089]  box.space.glob:select{}

It happened because the replication upstream status check occurred too early, when the upstream was still in 'disconnected' state. To give the check a chance to reach the needed 'follow' state, the test needs to wait for it using the test_run:wait_upstream() routine.

Closes #5234
-
- Sep 09, 2020
-
-
Alexander V. Tikhonov authored
Fixed a flaky status check:

[016] @@ -73,11 +73,11 @@
[016]  ...
[016]  box.info.status
[016]  ---
[016] -- running
[016] +- orphan
[016]  ...
[016]  box.info.ro
[016]  ---
[016] -- false
[016] +- true
[016]  ...
[016]  box.cfg{              \
[016]      replication = {}, \

The test was changed to use a wait condition for the status check, which should change from 'orphan' to 'running'. On heavily loaded hosts this may take some additional time; the wait condition routine fixes it.

Closes #5271
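A minimal sketch of the wait-condition form of the check (based on the description above, not the exact test code):

    -- instead of reading box.info.status once, wait until the instance leaves 'orphan'
    test_run:wait_cond(function() return box.info.status == 'running' end)
    assert(box.info.ro == false)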
-
Alexander V. Tikhonov authored
On heavily loaded hosts the following issue was found:

[036] --- replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.result Sun Sep 6 23:49:57 2020
[036] +++ replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.reject Mon Sep 7 04:07:06 2020
[036] @@ -63,7 +63,7 @@
[036]  ...
[036]  box.info.replication[1].upstream.status
[036]  ---
[036] -- follow
[036] +- disconnected
[036]  ...
[036]  test_run:cmd('switch default')
[036]  ---

It happened because the replication upstream status check occurred too early, when the upstream was still in 'disconnected' state. To give the check a chance to reach the needed 'follow' state, the test needs to wait for it using the test_run:wait_upstream() routine.

Closes #5276
-
Alexander V. Tikhonov authored
ASAN showed an issue in the msgpuck repository, in file test/msgpuck.c, which was the cause of the unit/msgpack test failure. The issue was fixed in the msgpuck repository, and the ASAN suppression for it was removed. Also removed the skip condition file which blocked the test when it failed. Part of #4360
-
Alexander V. Tikhonov authored
Removed LSAN suppressions for issues that were not reproduced. Part of #4360
-
- Sep 08, 2020
-
-
Ilya Kosarev authored
MsgPack extension types allow applications to define application-specific types. They consist of an 8-bit signed integer and a byte array, where the integer represents the kind of type and the byte array carries the data. Types from 0 to 127 are application-specific, and types from -128 to -1 are reserved for predefined types. However, extension types were printed as unsigned integers. Now this is fixed and extension types are printed correctly as signed integers. Also fixed the typo in the word "Unsupported". A corresponding test case is introduced. Closes #5016
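A small Lua illustration of the sign interpretation, using a hypothetical raw type byte:

    -- hypothetical raw extension type byte as read from a MsgPack buffer
    local byte = 0xff
    local as_unsigned = byte                                 -- 255: how it was printed before
    local as_signed = byte < 0x80 and byte or byte - 0x100   -- -1: the correct signed value
    print(as_unsigned, as_signed)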
-
Ilya Kosarev authored
rtree_search() has a return value, and it is ignored in some cases. Although this is totally fine, it seems reasonable to comment those cases, since such usage might look questionable. Closes #2052
-
Alexander V. Tikhonov authored
To fix flaky issues of replication/misc.test.lua the test had to be divided into smaller tests to be able to localize the flaky results:

  gh-2991-misc-asserts-on-update.test.lua
  gh-3111-misc-rebootstrap-from-ro-master.test.lua
  gh-3160-misc-heartbeats-on-master-changes.test.lua
  gh-3247-misc-iproto-sequence-value-not-replicated.test.lua
  gh-3510-misc-assert-replica-on-applier-disconnect.test.lua
  gh-3606-misc-crash-on-box-concurrent-update.test.lua
  gh-3610-misc-assert-connecting-master-twice.test.lua
  gh-3637-misc-error-on-replica-auth-fail.test.lua
  gh-3642-misc-no-socket-leak-on-replica-disconnect.test.lua
  gh-3704-misc-replica-checks-cluster-id.test.lua
  gh-3711-misc-no-restart-on-same-configuration.test.lua
  gh-3760-misc-return-on-quorum-0.test.lua
  gh-4399-misc-no-failure-on-error-reading-wal.test.lua
  gh-4424-misc-orphan-on-reconfiguration-error.test.lua

Needed for #4940
-
Kirill Yukhin authored
- test: correct buffer size to fix ASAN error
-
Sergey Bronnikov authored
The import of the `table.clear` module was removed in commit 3af79e70 ('Fix luacheck warnings in src/lua/') to fix a luacheck warning about an unused variable, and the `table.clear()` method became unavailable in Tarantool. This commit brings that import back, since some applications depend on it (the bug was found with a Cartridge application), and adds a regression test for table.clear(). Note: `table.clear` is not available until an explicit `require('table.clear')` call. Closes #5210
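Usage after the fix (the explicit require is still needed, as noted above):

    require('table.clear')             -- makes the method available
    local t = {1, 2, 3, key = 'value'}
    table.clear(t)                     -- removes all elements, keeps the table object
    assert(next(t) == nil)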
-
- Aug 31, 2020
-
-
Alexander V. Tikhonov authored
When the update_repo tool is run with the option to delete some RPMs, it needs to remove all files matching the given pattern. The loop that checks the metadata deletes files, but only those that are present in the metadata. However, it is possible that some broken update left orphan files: they are present in the storage, but are not mentioned in the metadata.
-
Ilya Kosarev authored
Concurrent tuple update could segfault on BITSET_ALL_NOT_SET iterator usage. Fixed in 850054b2. This patch introduces the corresponding test. Closes #1088
-
Alexander V. Tikhonov authored
Implemented openSUSE package builds with testing for the images opensuse-leap:15.[0-2]. Added %{sle_version} checks to the Tarantool spec file according to https://en.opensuse.org/openSUSE:Packaging_for_Leap#RPM_Distro_Version_Macros. Added opensuse-leap versions 15.1 and 15.2 to the GitLab-CI package building/deploying jobs. Closes #4562
-
Alexander V. Tikhonov authored
During implementation of the openSUSE build with testing, the box-tap/cfg.test.lua test failed. It turned out that when memtx_dir did not exist, vinyl_dir existed, and errno was set to ENOENT, box configuration succeeded, although it should not. The reason for this wrong behavior was that not all of the failure paths in xdir_scan() set errno, while the caller assumed they did. Debugging showed that after xdir_scan() there was an incorrect errno check when the function returned a negative value. xdir_scan() is not a system call, and a negative return value from it does not mean that errno was set. When errno was left over from commands executed before xdir_scan() and xdir_scan() itself returned a negative value, the check produced a wrong result. The intent of the failing check was to catch ENOENT, which xdir_scan() sets to handle the situation when vinyl_dir does not exist. It failed because, when checking ENOENT outside of xdir_scan(), we had to be sure that ENOENT indeed came from the xdir_scan() call and not from some earlier function. One possible fix would be to reset errno before the xdir_scan() call, since errno could have been set by any function called before it. But, as mentioned above, xdir_scan() is not a system call, can be changed in any way, and may return any value without setting errno, so an errno check outside of the function could break. To avoid that, we must not check errno after the call at all. The better solution is to pass a flag to xdir_scan() telling whether the directory must exist. So the errno check was removed and replaced with a check of vinyl_dir existence using the flag. Closes #4594 Needed for #4562 Co-authored-by:
Alexander Turenko <alexander.turenko@tarantool.org>
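A reproduction sketch under the assumptions in the description (directory names are hypothetical):

    local fio = require('fio')
    fio.mkdir('vinyl_dir_exists')          -- vinyl_dir is present
    -- memtx_dir does not exist, so configuration must fail,
    -- regardless of a stale ENOENT left in errno by earlier calls
    local ok = pcall(function()
        box.cfg{memtx_dir = 'no_such_dir', vinyl_dir = 'vinyl_dir_exists'}
    end)
    assert(ok == false)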
-
- Aug 25, 2020
-
-
Ilya Kosarev authored
Multikey index did not work properly with a nullable root field in tuple_raw_multikey_count(). Now it is fixed and the corresponding restrictions are dropped. This also means that we can drop the implicit nullability update for array/map fields and make all fields nullable by default, as it was until e1d3fe8a (tuple format: don't allow null where array/map is expected), since default non-nullability itself doesn't solve any real problems while providing confusing behavior (gh-5027). Follow-up #5027 Closes #5192
-
- Aug 24, 2020
-
-
Vladislav Shpilevoy authored
There was no way to change certain space parameters without recreating the space or manually updating the internal system space _space, even though some of them are legal to update: field_count, owner, the flag of being temporary, the is_sync flag. The patch introduces the function space:alter(), which accepts the subset of box.schema.space.create parameters that are mutable, plus a 'name' parameter. There already is a method space:rename(), but the parameter is added to space:alter() too, to be consistent with index:alter(), which also accepts a new name. Closes #5155 @TarantoolBot document Title: New function space:alter(options) Space objects in Lua (stored in the `box.space` table) now have a new method: `space:alter(options)`. The method accepts a table with the parameters `field_count`, `user`, `format`, `temporary`, `is_sync`, and `name`. All parameters have the same meaning as in `box.schema.space.create(name, options)`. Note, the `name` parameter in `box.schema.space.create` is separate from the `options` table. It is not so in `space:alter(options)` - here all parameters are specified in the `options` table. The function does not return anything in case of success, and throws an error when it fails. On the 'Synchronous replication' page, in the 'Limitations and known problems' section, it is necessary to delete the note about "no way to enable synchronous replication for existing spaces". Instead it is necessary to say that it can be enabled using `space:alter({is_sync = true})`, and can be disabled by setting `is_sync = false`. https://www.tarantool.io/en/doc/2.5/book/replication/repl_sync/#limitations-and-known-problems The function will appear in versions >= 2.5.2.
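A short usage sketch of the new method (space name and option values are illustrative):

    local s = box.schema.space.create('bank')
    s:create_index('pk')
    -- enable synchronous replication and rename without recreating the space
    s:alter({is_sync = true, name = 'bank_sync'})
    -- later it can be disabled again
    box.space.bank_sync:alter({is_sync = false})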
-
Cyrill Gorcunov authored
We no longer use it. Closes #5129 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Since we no longer use txn engine for synchro packets processing this code is never executed. Part-of #5129 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Transaction processing code is very heavy, simply because transactions carry various data and involve a number of other mechanisms to proceed. In turn, when we receive a confirm or rollback packet from another node in the cluster, we just need to inspect the limbo queue and write this packet into the WAL journal. So calling a bunch of txn engine helpers is simply a waste of cycles. Thus let's handle them in a special lightweight way:
- allocate a synchro_entry structure which carries the journal entry itself and the encoded message;
- process the limbo queue to mark confirmed/rolled back messages;
- finally, write this synchro_entry into the journal.
This is way simpler. Part-of #5129 Suggested-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Co-developed-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
When we need to write a CONFIRM or ROLLBACK message (which is a binary record in msgpack format) into the journal, we use txn code to allocate a new transaction, encode the message there, and make it walk the long txn path before it hits the journal. This is not only a waste of resources but also somewhat strange from an architectural point of view. Instead, let's encode the record on the stack and write it to the journal directly. Part-of #5129 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
These msgpack entries will be needed to write them down to the journal without involving the txn engine. At the same time we would like to be able to allocate them on the stack; for this sake the binary form is predefined. Part-of #5129 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
To create raw journal entries. We will use it to write confirm/rollback entries. Part-of #5129 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
In commit 77ba0e35 we redesigned WAL journal operations such that the asynchronous write completion is a single instance per journal. It turned out that this simplification is too tight and doesn't allow us to pass entries into the journal with custom completions. Thus let's bring this ability back. We will need it to be able to write "confirm" records into the WAL directly without touching the transaction code at all. Part-of #5129 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
- Aug 20, 2020
-
-
Alexander V. Tikhonov authored
Removed ASAN/LSAN suppressions for issues that were not reproduced. Removed skip condition files for tests that passed testing. Part of #4360
-
- Aug 17, 2020
-
-
Vladislav Shpilevoy authored
All requests saved to WAL and transmitted through the network have their own request structure with parameters:
- struct request for DML;
- struct call_request for CALL/EVAL;
- struct auth_request for AUTH;
- struct ballot for VOTE;
- struct sql_request for SQL;
- struct greeting for greeting.
It is done for a reason - to not pass all the request parameters into each function one by one, and to manage them all at once instead. For the synchronous requests IPROTO_CONFIRM and IPROTO_ROLLBACK it was not done, because so far it was not too hard to carry just 2 parameters from their body: lsn and replica_id. But it will be changed in #5129, because in fact these requests have more parameters, which were filled by the txn module, since synchro requests were saved to WAL via transactions (due to the lack of an alternative API to access WAL). After #5129 it will be necessary to save the LSN and replica_id of the request author. This patch introduces struct synchro_request to simplify extension of the synchro parameters. Closes #5151 Needed for #5129
-
- Aug 15, 2020
-
-
Vladislav Shpilevoy authored
Applier on_rollback and on_wal_write don't need any arguments - they either work with a global state, or with the signaled applier stored inside the trigger. However, the transaction object was passed into on_wal_write() and on_rollback(), unused. Even if it were used, it would have to be fixed, because soon these triggers will be fired not only for traditional 'txn' transactions. They will be used by the synchro request WAL writes too - and those don't have 'transactions'. Part of #5129
-
- Aug 13, 2020
-
-
Yaroslav Dynnikov authored
In the recent update of libcurl (2.5.0-278-g807c7fa58) its layout changed: the private function `Curl_version_init()`, which used to fill in the info structure, was eliminated. As a result, no symbols from `libcurl_la-version.o` remained in use, so it wasn't included in the tarantool binary, and the `curl_version` and `curl_version_info` symbols went missing. According to libcurl naming conventions all exported symbols are named `curl_*`. This patch lists them all explicitly in `exports.h` and adds a test. Close #5223
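A sketch of how the exported symbols can be checked from Lua (this mirrors the idea of the added test, not necessarily its exact code):

    local ffi = require('ffi')
    ffi.cdef([[ char *curl_version(void); ]])
    -- resolves only if the symbol is exported from the tarantool binary
    print(ffi.string(ffi.C.curl_version()))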
-
- Aug 12, 2020
-
-
Vladislav Shpilevoy authored
Tuple JSON field access crashed when '[*]' was used as the first part of the JSON path. The patch makes such access treated as 'field not found'. Follow-up #5224
-
Vladislav Shpilevoy authored
When a tuple had a format with multikey indexes in it, any attempt to get a multikey indexed field by a JSON path from Lua led to a crash. That was because of an incorrect interpretation of the offset slot value in the tuple's field map. The tuple field map is an array stored before the tuple's MessagePack data. Each element is a 4-byte offset to an indexed value, so that it can be fetched in O(1) time without MessagePack decoding of all the previous fields. At least it was so before multikeys. Now the tuple field map is not just an array. It is rather a 2-level array, somewhat similar to the ext4 FS. Some elements of the root array are positive numbers pointing at data. Other elements point at a second 'indirect' array, so called 'extra', whose size is individual for each tuple. These second arrays are used by multikey indexes to store offsets to each multikey indexed value in a tuple. It means that if there is an offset slot, it can't just be used as is. That is allowed only if the field is not multikey. Otherwise it is necessary to somehow get an index in the second 'indirect' array. This is what was happening - a multikey field was found, its offset slot was valid, but it was pointing at an 'indirect' array, not at the data. JSON tuple field access tried to use it as a data offset. The patch makes JSON field access degrade to a fullscan when a field is multikey, but no multikey array index is provided. Closes #5224
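A sketch of the previously crashing scenario (space, index, and field names are illustrative):

    local s = box.schema.space.create('withdata', {
        format = {{name = 'id', type = 'unsigned'}, {name = 'data', type = 'array'}},
    })
    s:create_index('pk')
    -- multikey index over every element of the 'data' array
    s:create_index('mk', {parts = {{field = 'data', type = 'unsigned', path = '[*].k'}}})
    local t = s:replace{1, {{k = 10}, {k = 20}}}
    -- JSON path access to a multikey-indexed field from Lua used to crash;
    -- after the patch it falls back to a full scan
    print(t['data[1].k'])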
-
- Aug 11, 2020
-
-
Vladislav Shpilevoy authored
Box.snapshot() could include rolled back data in case a synchronous transaction ROLLBACK arrived during the WAL rotation done in preparation for a checkpoint. More specifically, a snapshot consists of fixating the engines' content (creation of a read-view), doing WAL rotation, and writing the snapshot itself. All data changes after the content fixation won't go into the snap. So if ROLLBACK arrives during WAL rotation, the fixated content will have rolled back data, not present in the newest dataset. The patch makes the snapshot fail if anything was rolled back during WAL rotation. The bug sometimes appeared in an existing test about qsync snapshots, but with very poor reproducibility. In a new test file it is reproduced 100% without the patch. Closes #5167
-
- Jul 31, 2020
-
-
Vladislav Shpilevoy authored
A replica can send the same ACK multiple times. This is relatively easy to achieve. An ACK is a message from the replica containing its vclock. It is sent on each update of the replica's vclock. An update does not necessarily mean that the master's LSN was changed - the replica could write something locally, with its own instance_id. The vclock is changed and sent to the master, but from the limbo's point of view it looks like a duplicated ACK, because the received master's LSN didn't change. The patch makes the limbo ignore duplicated ACKs. Closes #5195 Part of #5219
-
Vladislav Shpilevoy authored
The limbo could try to CONFIRM an LSN whose ROLLBACK is in progress. This is how it could happen:
- a synchronous transaction is created and written to WAL;
- the fiber sleeps in the limbo waiting for CONFIRM or a timeout;
- the timeout happens; ROLLBACK for this and all next LSNs is sent to WAL;
- the replica receives the transaction and sends an ACK;
- the master receives the ACK and starts writing CONFIRM for the LSN whose ROLLBACK is in progress right now.
Another case is an attempt to lower the synchro quorum during a ROLLBACK write. It also could try to write CONFIRM. The patch skips CONFIRM if there is a ROLLBACK in progress. It is not even necessary to check LSNs, because ROLLBACK always reverts the entire limbo queue, so it will cancel all pending transactions with all LSNs, and new commits are rolled back even before they try to go to WAL. CONFIRM can't help here with anything anymore. Part of #5185
-
- Jul 30, 2020
-
-
Vladislav Shpilevoy authored
The limbo could try to ROLLBACK an LSN whose CONFIRM is in progress. This is how it could happen:
- a synchronous transaction is created and written to WAL;
- the fiber sleeps in the limbo waiting for CONFIRM or a timeout;
- the replica receives the transaction and sends an ACK;
- the master receives the ACK and starts writing CONFIRM;
- the first fiber times out and tries to write ROLLBACK for the LSN whose CONFIRM is in progress right now.
The patch adds more checks to the 'timed out' code path to see if it isn't too late to write ROLLBACK. If CONFIRM is in progress, the fiber will wait for it to finish. Part of #5185
-
Vladislav Shpilevoy authored
CONFIRM and ROLLBACK go to WAL. Their WAL write can fail just like any other WAL write. However it is not clear what to do in that case, especially in case of a ROLLBACK failure. The patch adds a panic() stub so as to at least terminate the instance. Before the patch it would continue working as if nothing had happened, with undefined behaviour. Closes #5159
-
- Jul 29, 2020
-
-
Vladislav Shpilevoy authored
The calls were added before and after each cond_wait() so that the fiber couldn't be woken up externally, for example, from Lua. But it is not necessary to flip the flag on each wait() call. It is enough to do it 2 times: forbid cancellation at the beginning of txn_limbo_wait_complete(), and restore the old value at the end.
-
Vladislav Shpilevoy authored
When an ACK was received for an already confirmed transaction whose CONFIRM WAL write was in progress, it produced a second CONFIRM in WAL with the same LSN. That was unnecessary work taking time and disk space for WAL records. Although it didn't lead to any bugs, it was very inefficient. This patch makes the confirmation LSN grow monotonically. In case more ACKs are received for an already confirmed LSN, its confirmation is not written a second time. Closes #5144
-