- Oct 13, 2020
-
-
Alexander V. Tikhonov authored
Added for tests with issues:
- app/socket.test.lua gh-4978
- box/access.test.lua gh-5411
- box/access_misc.test.lua gh-5401
- box/gh-5135-invalid-upsert.test.lua gh-5376
- box/hash_64bit_replace.test.lua test gh-5410
- box/hash_replace.test.lua gh-5400
- box/huge_field_map_long.test.lua gh-5375
- box/net.box_huge_data_gh-983.test.lua gh-5402
- replication/anon.test.lua gh-5381
- replication/autobootstrap.test.lua gh-4933
- replication/box_set_replication_stress.test.lua gh-4992
- replication/election_basic.test.lua gh-5368
- replication/election_qsync.test.lua test gh-5395
- replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
- replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
- replication/gh-5287-boot-anon.test.lua gh-5412
- replication/gh-5298-qsync-recovery-snap.test.lua gh-5379
- replication/show_error_on_disconnect.test.lua gh-5371
- replication/status.test.lua gh-5409
- swim/swim.test.lua gh-5403
- unit/swim.test gh-5399
- vinyl/gc.test.lua gh-5383
- vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
- vinyl/gh-4957-too-many-upserts.test.lua gh-5378
- vinyl/gh.test.lua gh-5141
- vinyl/quota.test.lua gh-5377
- vinyl/snapshot.test.lua gh-4984
- vinyl/stat.test.lua gh-4951
- vinyl/upsert.test.lua gh-5398
-
Alexander V. Tikhonov authored
Testing on FreeBSD 12 previously had some tests blocked to avoid flaky failures. Now test-run is able to handle such failures using checksums for fails with opened issues. So 7 tests are added back to testing on FreeBSD 12. Closes #4271
-
Alexander V. Tikhonov authored
Set error message to log output in test: vinyl/gc.test.lua
-
Alexander V. Tikhonov authored
Set error message to log output in test: vinyl/snapshot.test.lua
-
Alexander V. Tikhonov authored
Set error message to log output in test: replication/gh-4402-info-errno.test.lua
-
Alexander V. Tikhonov authored
Set error message to log output in test: replication/replica_rejoin.test.lua
-
Alexander V. Tikhonov authored
Set error message to log output in test: replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
-
- Oct 12, 2020
-
-
Vladislav Shpilevoy authored
The new option can take one of 3 values: 'off', 'candidate', 'voter'. It replaces 2 old options: election_is_enabled and election_is_candidate. These flags looked strange in that it was possible to set candidate to true but disable the election at the same time. They also would not look good if we ever decided to introduce another mode, e.g. a data-less sentinel node that exists just for voting. Anyway, the single-option approach looks easier to configure and to extend.
- 'off' means the election is disabled on the node. The same as election_is_enabled = false in the old config;
- 'voter' means the node can vote and is never writable. The same as election_is_enabled = true + election_is_candidate = false in the old config;
- 'candidate' means the node is a full-featured cluster member, which eventually may become a leader. The same as election_is_enabled = true + election_is_candidate = true in the old config.
Part of #1146
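A configuration sketch for illustration; the option name election_mode is an assumption here, since the message above describes only the three values:

    -- 'election_mode' is an assumed name, not confirmed by the message above.
    box.cfg{election_mode = 'candidate'} -- full member, may become a leader
    box.cfg{election_mode = 'voter'}     -- can vote, never writable
    box.cfg{election_mode = 'off'}       -- election disabled on this node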
-
- Oct 07, 2020
-
-
Aleksandr Lyapunov authored
space:fselect and index:fselect fetch data like an ordinary select, but format the result the way mysql does: with columns, column names, etc. fselect converts tuples to strings using json, padding with spaces and cutting the tail if necessary. It is designed for visual analysis of a select result and shouldn't be used in stored procedures.

    index:fselect(<key>, <opts>, <fselect_opts>)
    space:fselect(<key>, <opts>, <fselect_opts>)

There are some options that can be specified in different ways:
- among other common options (<opts>) with the 'fselect_' prefix (e.g. 'fselect_type=..');
- in a special <fselect_opts> map (with or without the prefix);
- in global variables with the 'fselect_' prefix.
The possible options are:
- type:
  - 'sql' - like a mysql result (default);
  - 'gh' (or 'github' or 'markdown') - markdown syntax, for copy-pasting to github;
  - 'jira' - jira table syntax (for copy-pasting to jira);
- widths: array with the desired widths of columns;
- max_width: limit on the entire length of a row string; the longest fields will be cut if necessary. Set to 0 (default) to detect and use the screen width. Set to -1 for no limit;
- print: (default - false) print each line instead of adding it to the result;
- use_nbsp: (default - true) add invisible spaces to improve readability in YAML output. Not applicable when print=true.
There is also a pair of shortcuts:
- index/space:gselect - same as fselect, but with type='gh';
- index/space:jselect - same as fselect, but with type='jira'.
See test/engine/select.test.lua for examples. Closes #5161
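A usage sketch; the space and index names here are hypothetical:

    -- Assuming a space 'test' with a primary index 'pk' exists:
    box.space.test.index.pk:fselect(nil, nil, {type = 'sql', max_width = 80})
    box.space.test:gselect()    -- markdown table, for copy-pasting to github
    box.space.test:jselect({1}) -- jira table syntax, filtered by key {1}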
-
- Oct 06, 2020
-
-
Igor Munkin authored
While running a GC hook (i.e. a __gc metamethod) the garbage collector engine is "stopped": the memory penalty threshold is set to LJ_MAX_MEM, so no incremental GC step is triggered. Ergo, yielding the execution within the finalizer body leaves the platform running with LuaJIT GC disabled. It is not re-enabled until the yielded fiber gets the execution back. This changeset extends the <cord_on_yield> routine with a check whether a GC hook is active. If the switch-over occurs in the scope of a __gc metamethod, the platform is forced to stop its execution with EXIT_FAILURE and calls the panic routine before the exit. Relates to #4518 Follows up #4727
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
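A minimal sketch of the forbidden pattern; after this patch the yield below makes the platform panic instead of silently running with GC disabled:

    local fiber = require('fiber')
    -- newproxy() is the Lua 5.1 way to get a userdata with a settable __gc.
    local obj = newproxy(true)
    getmetatable(obj).__gc = function()
        fiber.sleep(0) -- fiber switch-over inside a GC hook: forbidden
    end
    obj = nil
    collectgarbage('collect')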
-
Serge Petrenko authored
-
- Oct 02, 2020
-
-
Igor Munkin authored
Since Tarantool fibers don't respect the Lua coroutine switch mechanism, the JIT machinery stays unnotified when one lua_State substitutes another one. As a result, if trace recording hasn't been aborted prior to a fiber switch, the recording proceeds using the new lua_State and leads to a failure either at any further compiler phase or while the compiled trace is executed. This changeset extends the <cord_on_yield> routine to abort trace recording when the fiber switches to another one. If the switch-over occurs while mcode is being run, the platform finishes its execution with the EXIT_FAILURE code and calls the panic routine prior to the exit. Closes #1700 Fixes #4491
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
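A sketch of the scenario the abort covers: a loop hot enough to start trace recording which then yields mid-recording (the loop bounds are arbitrary):

    local fiber = require('fiber')
    fiber.create(function()
        local s = 0
        for i = 1, 1e6 do
            s = s + i
            -- A switch-over here used to break the recording compiler;
            -- now the recording is simply aborted and retried later.
            if i % 1e5 == 0 then fiber.yield() end
        end
    end)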
-
Igor Munkin authored
Tarantool integrates several complex environments together, and there are issues occurring at their junction that lead to platform failures. E.g. fiber switch-over is implemented outside the Lua world, so when one lua_State substitutes another one, the main LuaJIT engines, such as JIT and GC, are left unnotified, leading to further platform misbehaviour. To solve this severe integration drawback, the <cord_on_yield> function is introduced. This routine encloses the checks and actions to be done when the running fiber yields the execution. Unfortunately, the way the callback is implemented introduces a circular dependency. Considering the linker's symbol resolution rules for a static build, an auxiliary translation unit is added to the particular tests, mocking (i.e. exporting) the otherwise undefined <cord_on_yield> symbol. Part of #1700 Relates to #4491
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
- Oct 01, 2020
-
-
Cyrill Gorcunov authored
Convert to uint64_t explicitly.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Sep 29, 2020
-
-
Vladislav Shpilevoy authored
Part of #1146
-
Vladislav Shpilevoy authored
Box.info.election returns a table of the form:

    {
        state: <string>,
        term: <number>,
        vote: <instance ID>,
        leader: <instance ID>
    }

The fields correspond one to one to the same-named Raft concepts. This info dump is supposed to help with the tests, first of all, and with investigation of problems in a real cluster. The API doesn't mention 'Raft' on purpose, to keep it from depending specifically on Raft, and not to confuse users who don't know anything about Raft (even that it is about leader election and synchronous replication). Part of #1146
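An illustrative console dump with the field set described above (the values are made up):

    tarantool> box.info.election
    ---
    - state: follower
      term: 2
      vote: 1
      leader: 1
    ...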
-
Vladislav Shpilevoy authored
The new options are:
- election_is_enabled - enable/disable leader election (via Raft). When disabled, the node is supposed to work as if Raft did not exist, like earlier;
- election_is_candidate - a flag for whether the instance can try to become a leader. Note, it can vote for other nodes regardless of the value of this option;
- election_timeout - how long to wait until the election ends, in seconds.
The options don't do anything yet. They are added separately in order to keep such mundane changes out of the main Raft commit, to simplify its review. Option names don't mention 'Raft' on purpose, because:
- not all users know what Raft is, so they may not even know it is related to leader election;
- in the future the algorithm may change from Raft to something else, so better not to depend on it too much in the public API.
Part of #1146
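A configuration sketch using the option names above (remember, they are inert at this point in the series):

    box.cfg{
        election_is_enabled = true,    -- participate in leader election
        election_is_candidate = false, -- vote only, never become a leader
        election_timeout = 5,          -- seconds to wait until election end
    }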
-
- Sep 28, 2020
-
-
Roman Khabibov authored
Ban the ability to modify a view at the box level. Since a view is in fact a named select, not a table, altering a view is not a valid operation.
-
Alexander V. Tikhonov authored
Added for tests with issues:
- app/fiber.test.lua gh-5341
- app-tap/debug.test.lua gh-5346
- app-tap/http_client.test.lua gh-5346
- app-tap/inspector.test.lua gh-5346
- box/gh-2763-session-credentials-update.test.lua gh-5363
- box/hash_collation.test.lua gh-5247
- box/lua.test.lua gh-5351
- box/net.box_connect_triggers_gh-2858.test.lua gh-5247
- box/net.box_incompatible_index-gh-1729.test.lua gh-5360
- box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
- box/protocol.test.lua gh-5247
- box/update.test.lua gh-5247
- box-tap/net.box.test.lua gh-5346
- replication/autobootstrap.test.lua gh-4533
- replication/autobootstrap_guest.test.lua gh-4533
- replication/ddl.test.lua gh-5337
- replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
- replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5357
- replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
- replication/long_row_timeout.test.lua gh-4351
- replication/on_replace.test.lua gh-5344, gh-5349
- replication/prune.test.lua gh-5361
- replication/qsync_advanced.test.lua gh-5340
- replication/qsync_basic.test.lua gh-5355
- replication/replicaset_ro_mostly.test.lua gh-5342
- replication/wal_rw_stress.test.lua gh-5347
- replication-py/multi.test.py gh-5362
- sql/prepared.test.lua test gh-5359
- sql-tap/selectG.test.lua gh-5350
- vinyl/ddl.test.lua gh-5338
- vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
- vinyl/iterator.test.lua gh-5336
- vinyl/write_iterator_rand.test.lua gh-5356
- xlog/panic_on_wal_error.test.lua gh-5348
-
- Sep 25, 2020
-
-
Alexander V. Tikhonov authored
Removed a leftover line from a merge.
-
Alexander V. Tikhonov authored
test-run now implements the new format of the fragile lists, based on JSON, set as the 'fragile' option in the 'suite.ini' file of each suite:

    fragile = {
        "retries": 10,
        "tests": {
            "bitset.test.lua": {
                "issues": [ "gh-4095" ],
                "checksums": [ "050af3a99561a724013995668a4bc71c",
                               "f34be60193cfe9221d3fe50df657e9d3" ]
            }
        }
    }

Added the ability to check the result file checksum when a test fails and compare it with the checksums of the known issues mentioned in the fragile list. Also added the 'retries' option, which sets the number of accepted reruns for tests from the 'fragile' list that failed with known checksums. Closes #5050
-
Alexander V. Tikhonov authored
Found flaky issues when running the replication/anon.test.lua test multiple times on a single worker:

    [007] --- replication/anon.result    Fri Jun  5 09:02:25 2020
    [007] +++ replication/anon.reject    Mon Jun  8 01:19:37 2020
    [007] @@ -55,7 +55,7 @@
    [007]
    [007]  box.info.status
    [007]  | ---
    [007] -| - running
    [007] +| - orphan
    [007]  | ...
    [007]  box.info.id
    [007]  | ---

    [094] --- replication/anon.result    Sat Jun 20 06:02:43 2020
    [094] +++ replication/anon.reject    Tue Jun 23 19:35:28 2020
    [094] @@ -154,7 +154,7 @@
    [094]  -- Test box.info.replication_anon.
    [094]  box.info.replication_anon
    [094]  | ---
    [094] -| - count: 1
    [094] +| - count: 2
    [094]  | ...
    [094]  #box.info.replication_anon()
    [094]  | ---

It happened because replication connections may stay active from previous runs on the common tarantool instance of the test-run worker. To avoid this, a restart of the tarantool instance was added at the very start of the test. Closes #5058
-
- Sep 23, 2020
-
-
Aleksandr Lyapunov authored
Closes #4897
-
Aleksandr Lyapunov authored
txn_proxy is a special utility for transaction tests. Formerly it was used only for vinyl tests and thus was placed in the vinyl folder. Now the time has come to test memtx transactions, and the utility must be placed amongst the other utils - in box/lua. Needed for #4897
-
Aleksandr Lyapunov authored
Define the memtx TX manager. It will store data for the MVCC and conflict manager. Also define 'memtx_use_mvcc_engine' in the config, which enables that MVCC engine. Part of #4897
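A sketch of enabling the engine via the option named above:

    box.cfg{memtx_use_mvcc_engine = true} -- enables the memtx MVCC engine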
-
- Sep 18, 2020
-
-
Vladislav Shpilevoy authored
The test tried to start a replica whose box.cfg would hang, with replication_connect_quorum = 0 to make it return immediately. But the quorum parameter was added and then removed during the work on 44421317 ("replication: do not register outgoing connections"). Instead, to start the replica without blocking on box.cfg, it is necessary to pass 'wait=False' to the test_run:cmd('start server') command. Closes #5311
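A sketch of the non-blocking start, assuming test-run's 'with' clause syntax:

    test_run:cmd('start server replica with wait=False')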
-
- Sep 17, 2020
-
-
Vladislav Shpilevoy authored
The replication protocol's first stage for non-anonymous replicas is that the replica should be registered in _cluster to get a unique ID number. That happens when the replica connects to a writable node, which performs the registration. So registration always happens on the master node when an *incoming* request appears, explicitly asking for a registration. Only relay can do that. That wasn't the case for bootstrap: if box.cfg.replication wasn't empty on the master node doing the cluster bootstrap, it registered all the outgoing connections in _cluster. Note, the target node could even be anonymous, but was still registered. That breaks the protocol and sometimes leads to registration of anonymous replicas. The patch drops it. Another motivation here is the specifics of Raft cluster bootstrap. During Raft bootstrap it is going to be very important that non-joined replicas are not registered in _cluster. A replica can only register after its JOIN request has been accepted and its snapshot download has started. Closes #5287 Needed for #1146
-
Vladislav Shpilevoy authored
Previously XlogGapError was considered a critical error stopping the replication. That may be not as good as it looks. XlogGapError is a perfectly fine error, which should not kill the replication connection. It should be retried instead, because there are cases when the gap can be recovered on its own. Consider the example: node1 is a leader, it is booted with vclock {1: 3}. Node2 connects and fetches the snapshot of node1, it also gets vclock {1: 3}. Then node1 writes something and its vclock becomes {1: 4}. Now node3 boots from node1, and gets the same vclock. The vclocks now look like this:
- node1: {1: 4}, leader, has a {1: 3} snap;
- node2: {1: 3}, booted from node1, has only the snap;
- node3: {1: 4}, booted from node1, has only the snap.
If the cluster is a fullmesh, node2 will send subscribe requests with vclock {1: 3}. If node3 receives one, it will respond with a xlog gap error, because it only has a snap with {1: 4}, nothing else. In that case node2 should retry connecting to node3, and in the meantime try to get newer changes from node1. The example is totally valid. However, it is unreachable now, because the master registers all replicas in _cluster before allowing them to make a join. So they all bootstrap from a snapshot containing all their IDs. This is a bug, because such auto-registration leads to registration of anonymous replicas, if they are present during bootstrap. Also it blocks Raft, which can't work if there are registered, but not yet joined nodes. Once the registration problem is solved in a next commit, the XlogGapError will strike quite often during bootstrap. This patch won't allow that to happen. Needed for #5287
-
Vladislav Shpilevoy authored
XlogGapError object didn't have a code in ClientError code space. Because of that it was not possible to handle the gap error together with client errors in some switch-case statement. Now the gap error has a code. This is going to be used in applier code to handle XlogGapError among other errors using its code instead of RTTI. Needed for #5287
-
- Sep 15, 2020
-
-
Alexander V. Tikhonov authored
On heavily loaded hosts found the following issue:

    [037] --- replication/gh-3704-misc-replica-checks-cluster-id.result  Thu Sep 10 18:05:22 2020
    [037] +++ replication/gh-3704-misc-replica-checks-cluster-id.reject  Fri Sep 11 11:09:38 2020
    [037] @@ -25,7 +25,7 @@
    [037]  ...
    [037]  box.info.replication[2].downstream.status
    [037]  ---
    [037] -- follow
    [037] +- stopped
    [037]  ...
    [037]  -- change master's cluster uuid and check that replica doesn't connect.
    [037]  test_run:cmd("stop server replica")

It happened because the replication downstream status check occurred too early, when the downstream was still in the 'stopped' state. To give the status check a chance to reach the needed 'follow' state, it needs to be waited for using the test_run:wait_downstream() routine. Closes #5293
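A sketch of the applied fix, assuming test-run's wait_downstream() signature:

    -- Retries until the downstream of replica id 2 reaches 'follow'.
    test_run:wait_downstream(2, {status = 'follow'})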
-
- Sep 14, 2020
-
-
Vladislav Shpilevoy authored
Snapshot rows do not contain real LSNs. Instead, their LSNs are signatures, i.e. ordinal numbers: rows in the snap have LSNs from 1 to the number of rows. This is because LSNs are not stored with every tuple in the storages, and there is no way to store real LSNs in the snapshot. These artificial LSNs broke the synchronous replication limbo. After snap recovery was done, the limbo vclock was broken: it contained numbers not related to reality, affected by rows from local spaces. Recovery could also get stuck, because ACKs in the limbo stopped working after the first row: the vclock was set to the final signature right away. This patch makes all snapshot-recovered rows async, because they are confirmed by definition. So now the limbo is not involved in snapshot recovery. Closes #5298
-
- Sep 12, 2020
-
-
Vladislav Shpilevoy authored
During recovery WAL writes end immediately, without yields. Therefore the WAL write completion callback is executed in the currently active fiber. The txn limbo, on a CONFIRM WAL write, wakes up the waiting fiber, which during recovery appears to be the same as the active fiber. That breaks the fiber scheduler, because apparently it is not safe to wake the currently active fiber unless it is going to call fiber_yield() immediately afterwards. See the comment in the fiber_wakeup() implementation about that way of usage. The patch simply stops waking the waiting fiber if it is the currently active one. Closes #5288 Closes #5232
-
- Sep 11, 2020
-
-
Alexander V. Tikhonov authored
Found 2 issues on a Debug build:

    [009] --- replication/status.result  Fri Sep 11 10:04:53 2020
    [009] +++ replication/status.reject  Fri Sep 11 13:16:21 2020
    [009] @@ -174,7 +174,8 @@
    [009]  ...
    [009]  test_run:wait_downstream(replica_id, {status == 'follow'})
    [009]  ---
    [009] -- true
    [009] +- error: '[string "return test_run:wait_downstream(replica_id, {..."]:1: variable
    [009] +  ''status'' is not declared'
    [009]  ...
    [009]  -- wait for the replication vclock
    [009]  test_run:wait_cond(function() \
    [009] @@ -226,7 +227,8 @@
    [009]  ...
    [009]  test_run:wait_upstream(master_id, {status == 'follow'})
    [009]  ---
    [009] -- true
    [009] +- error: '[string "return test_run:wait_upstream(master_id, {sta..."]:1: variable
    [009] +  ''status'' is not declared'
    [009]  ...
    [009]  master.upstream.lag < 1
    [009]  ---

It happened because of the change introduced in commit [1], where wait_upstream()/wait_downstream() were mistakenly called as:

    test_run:wait_*stream(*_id, {status == 'follow'})

with status set using '==' instead of '='. We are unable to read the status variable when strict mode is enabled, and it is enabled by default on Debug builds. Follows up #5110 Closes #5297
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>

[1] - a08b4f3a ("test: flaky replication/status.test.lua status")
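The difference in one line, as a sketch:

    test_run:wait_downstream(replica_id, {status = 'follow'})  -- correct: an option
    test_run:wait_downstream(replica_id, {status == 'follow'}) -- wrong: reads the
    -- undeclared global 'status', which raises an error under strict mode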
-
Oleg Babin authored
This patch makes the log.cfg{log = ...} behaviour the same as in box.cfg{log = ...} and fixes a panic if "log" is incorrectly specified. For this purpose we export the "say_parse_logger_type" function and use it for logger type validation and logger type parsing. Closes #5130
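A sketch of the now-consistent behaviour (the file name is hypothetical):

    local log = require('log')
    -- The logger type prefix is parsed and validated the same way
    -- box.cfg{log = ...} does it; a malformed value now raises an error
    -- instead of panicking.
    log.cfg{log = 'file:tarantool.log'}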
-
Alexander V. Tikhonov authored
On heavily loaded hosts found the following issue:

    box.cfg{replication_synchro_quorum = 2}
     | ---
    + | - error: '[string "test_run:wait_cond(function() ..."]:1: attempt to
    + |   index field ''vclock'' (a nil value)'
     | ...

The issue output was not correct due to a wrong output list. The real command that caused the initial issue was the previous one:

    test_run:wait_cond(function()                       \
        local info = box.info.replication[replica_id]   \
        local lsn = info.downstream.vclock[replica_id]  \
        return lsn and lsn >= replica_lsn               \
    end)

It happened because the replication vclock field did not exist at the moment of its check. To fix the issue, the vclock field has to be waited for using the test_run:wait_cond() routine. Closes #5230
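A sketch of the guarded condition, using the same variables as the test above:

    test_run:wait_cond(function()
        local info = box.info.replication[replica_id]
        -- Don't index vclock until the downstream actually reports one.
        if info.downstream == nil or info.downstream.vclock == nil then
            return false
        end
        local lsn = info.downstream.vclock[replica_id]
        return lsn and lsn >= replica_lsn
    end)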
-
Alexander V. Tikhonov authored
On heavily loaded hosts found the following issue:

    [035] --- replication/wal_off.result  Fri Jul  3 04:29:56 2020
    [035] +++ replication/wal_off.reject  Mon Sep  7 15:32:46 2020
    [035] @@ -47,6 +47,8 @@
    [035]  ...
    [035]  while box.info.replication[wal_off_id].upstream.message ~= check do fiber.sleep(0) end
    [035]  ---
    [035] +- error: '[string "while box.info.replication[wal_off_id].upstre..."]:1: attempt to
    [035] +  index field ''upstream'' (a nil value)'
    [035]  ...
    [035]  box.info.replication[wal_off_id].upstream ~= nil
    [035]  ---

It happened because the replication upstream status check occurred too early, when the upstream state was not yet set. To give the status check a chance to reach the needed 'stopped' state, it needs to be waited for using the test_run:wait_upstream() routine. Closes #5278
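A sketch of the applied fix, assuming test-run's wait_upstream() signature:

    -- Retries until the upstream of the wal_off instance reaches 'stopped'.
    test_run:wait_upstream(wal_off_id, {status = 'stopped'})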
-
Alexander V. Tikhonov authored
On heavily loaded hosts found the following 3 issues.

Line 174:

    [026] --- replication/status.result  Thu Jun 11 12:07:39 2020
    [026] +++ replication/status.reject  Sun Jun 14 03:20:21 2020
    [026] @@ -174,15 +174,17 @@
    [026]  ...
    [026]  replica.downstream.status == 'follow'
    [026]  ---
    [026] -- true
    [026] +- false
    [026]  ...

It happened because the replication downstream status check occurred too early. To give the status check a chance to reach the needed 'follow' state, it needs to be waited for using the test_run:wait_downstream() routine.

Line 178:

    [024] --- replication/status.result  Mon Sep  7 00:22:52 2020
    [024] +++ replication/status.reject  Mon Sep  7 00:36:01 2020
    [024] @@ -178,11 +178,13 @@
    [024]  ...
    [024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
    [024]  ---
    [024] -- true
    [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
    [024] +  index field ''vclock'' (a nil value)'
    [024]  ...
    [024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
    [024]  ---
    [024] -- true
    [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
    [024] +  index field ''vclock'' (a nil value)'
    [024]  ...
    [024]  --
    [024]  -- Replica

It happened because the replication vclock field did not exist at the moment of its check. To fix the issue, the vclock field has to be waited for using the test_run:wait_cond() routine; the replication downstream data also has to be read at the same moment.

Line 224:

    [014] --- replication/status.result  Fri Jul  3 04:29:56 2020
    [014] +++ replication/status.reject  Mon Sep  7 00:17:30 2020
    [014] @@ -224,7 +224,7 @@
    [014]  ...
    [014]  master.upstream.status == "follow"
    [014]  ---
    [014] -- true
    [014] +- false
    [014]  ...
    [014]  master.upstream.lag < 1
    [014]  ---

It happened because the replication upstream status check occurred too early. To give the status check a chance to reach the needed 'follow' state, it needs to be waited for using the test_run:wait_upstream() routine.

Removed the test from the test_run tool's 'fragile' list to run it in parallel. Closes #5110
-
Alexander V. Tikhonov authored
On heavily loaded hosts found the following issue:

    [021] --- replication/gh-4606-admin-creds.result  Wed Apr 15 15:47:41 2020
    [021] +++ replication/gh-4606-admin-creds.reject  Sun Sep  6 20:23:09 2020
    [021] @@ -36,7 +36,42 @@
    [021]   | ...
    [021]  i.replication[i.id % 2 + 1].upstream.status == 'follow' or i
    [021]   | ---
    [021] - | - true
    [021] + | - version: 2.6.0-52-g71a24b9f2
    [021] + |   id: 2
    [021] + |   ro: false
    [021] + |   uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
    [021] + |   package: Tarantool
    [021] + |   cluster:
    [021] + |     uuid: f27dfdfe-2802-486a-bc47-abc83b9097cf
    [021] + |   listen: unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/replica_auth.socket-iproto
    [021] + |   replication_anon:
    [021] + |     count: 0
    [021] + |   replication:
    [021] + |     1:
    [021] + |       id: 1
    [021] + |       uuid: a07cad18-d27f-48c4-8d56-96b17026702e
    [021] + |       lsn: 3
    [021] + |       upstream:
    [021] + |         peer: admin@unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/master.socket-iproto
    [021] + |         lag: 0.0030207633972168
    [021] + |         status: disconnected
    [021] + |         idle: 0.44824500009418
    [021] + |         message: timed out
    [021] + |         system_message: Operation timed out
    [021] + |     2:
    [021] + |       id: 2
    [021] + |       uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
    [021] + |       lsn: 0
    [021] + |   signature: 3
    [021] + |   status: running
    [021] + |   vclock: {1: 3}
    [021] + |   uptime: 1
    [021] + |   lsn: 0
    [021] + |   sql: []
    [021] + |   gc: []
    [021] + |   vinyl: []
    [021] + |   memory: []
    [021] + |   pid: 40326
    [021]   | ...
    [021]  test_run:switch('default')
    [021]   | ---

It happened because the replication upstream status check occurred too early, when the upstream was still in the 'disconnected' state. To give the status check a chance to reach the needed 'follow' state, it needs to be waited for using the test_run:wait_upstream() routine. Closes #5233
-
Alexander V. Tikhonov authored
On heavily loaded hosts found the following issue:

    [004] --- replication/gh-4402-info-errno.result  Wed Jul 22 06:13:34 2020
    [004] +++ replication/gh-4402-info-errno.reject  Wed Jul 22 06:41:14 2020
    [004] @@ -32,7 +32,39 @@
    [004]   | ...
    [004]  d ~= nil and d.status == 'follow' or i
    [004]   | ---
    [004] - | - true
    [004] + | - version: 2.6.0-10-g8df49e4
    [004] + |   id: 1
    [004] + |   ro: false
    [004] + |   uuid: 41c4e3bf-cc3b-443d-88c9-39a9a8fe2df9
    [004] + |   package: Tarantool
    [004] + |   cluster:
    [004] + |     uuid: 6ec7bcce-68e7-41a4-b84b-dc9236621579
    [004] + |   listen: unix/:(socket)
    [004] + |   replication_anon:
    [004] + |     count: 0
    [004] + |   replication:
    [004] + |     1:
    [004] + |       id: 1
    [004] + |       uuid: 41c4e3bf-cc3b-443d-88c9-39a9a8fe2df9
    [004] + |       lsn: 52
    [004] + |     2:
    [004] + |       id: 2
    [004] + |       uuid: 8a989231-177a-4eb8-8030-c148bc752b0e
    [004] + |       lsn: 0
    [004] + |       downstream:
    [004] + |         status: stopped
    [004] + |         message: timed out
    [004] + |         system_message: Connection timed out
    [004] + |   signature: 52
    [004] + |   status: running
    [004] + |   vclock: {1: 52}
    [004] + |   uptime: 27
    [004] + |   lsn: 52
    [004] + |   sql: []
    [004] + |   gc: []
    [004] + |   vinyl: []
    [004] + |   memory: []
    [004] + |   pid: 99
    [004]   | ...
    [004]
    [004]  test_run:cmd('stop server replica')

It happened because the replication downstream status check occurred too early, when the downstream was still in the 'stopped' state. To give the status check a chance to reach the needed 'follow' state, it needs to be waited for using the test_run:wait_downstream() routine. Closes #5235
-
Alexander V. Tikhonov authored
On heavily loaded hosts found the following issue:

    [089] --- replication/gh-4928-tx-boundaries.result  Wed Jul 29 04:08:29 2020
    [089] +++ replication/gh-4928-tx-boundaries.reject  Wed Jul 29 04:24:02 2020
    [089] @@ -94,7 +94,7 @@
    [089]   | ...
    [089]  box.info.replication[1].upstream.status
    [089]   | ---
    [089] - | - follow
    [089] + | - disconnected
    [089]   | ...
    [089]
    [089]  box.space.glob:select{}

It happened because the replication upstream status check occurred too early, when the upstream was still in the 'disconnected' state. To give the status check a chance to reach the needed 'follow' state, it needs to be waited for using the test_run:wait_upstream() routine. Closes #5234
-