- Apr 23, 2019
-
-
Kirill Shcherbatov authored
Previously, Tarantool raised a confusing error message when the httpc module was used incorrectly. The message now follows the current module API. Closes #4136
-
Kirill Shcherbatov authored
tuple:update() used to work incorrectly on an empty tuple produced with box.tuple.new{}, because update_create_rope unconditionally initialized a new rope with the [tuple_data, mp_next(tuple_data)] field, which might not exist. Closes #4041
-
Roman Khabibov authored
According to the ANSI standard, ltrim, rtrim and trim should be merged into one unified TRIM() function. The kind of trimming (left, right or both) and the trimming characters are determined by the arguments of this function. Closes #3879
@TarantoolBot document
Title: TRIM() function
Modify the signature of the SQL function TRIM(). This function removes characters included in the <trim character> (binary) string from the <trim source> (binary) string until it encounters a character that doesn't belong to <trim character>. Removal occurs on the side specified by <trim specification>. The syntax is now as follows: TRIM([ [ <trim specification> ] [ <trim character> ] FROM ] <trim source>). <trim specification> can be one of the following keywords: LEADING, TRAILING and BOTH. <trim character> is the set of trimming characters. <trim source> is the string that will be trimmed. If FROM is specified, then:
1) Either <trim specification> or <trim character> or both shall be specified.
2) If <trim specification> is not specified, then BOTH is implicit.
3) If <trim character> is not specified, then ' ' is implicit.
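The trimming rules described in this message can be sketched in Python. This is only an illustration of the stated semantics for text strings, not Tarantool's implementation; the function name `sql_trim` and its parameter names are hypothetical.

```python
def sql_trim(source, spec="BOTH", chars=" "):
    """Remove characters found in `chars` from the chosen side(s) of `source`.

    `spec` mirrors <trim specification>: LEADING, TRAILING or BOTH.
    `chars` mirrors <trim character>: a set of characters, ' ' by default.
    """
    spec = spec.upper()
    if spec not in ("LEADING", "TRAILING", "BOTH"):
        raise ValueError("unknown trim specification: " + spec)
    start, end = 0, len(source)
    if spec in ("LEADING", "BOTH"):
        # Skip leading characters while they belong to the trim set.
        while start < end and source[start] in chars:
            start += 1
    if spec in ("TRAILING", "BOTH"):
        # Skip trailing characters while they belong to the trim set.
        while end > start and source[end - 1] in chars:
            end -= 1
    return source[start:end]
```

Under these assumptions, `sql_trim("xxabcxx", "LEADING", "x")` plays the role of `TRIM(LEADING 'x' FROM 'xxabcxx')`, and calling `sql_trim` with only the source string corresponds to the implicit BOTH and ' ' defaults.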
-
- Apr 19, 2019
-
-
Kirill Yukhin authored
The patch sets a format for spaces with the sysview engine. This is done for the following reasons: 1. Since an SQL view looks into the underlying space's format, set it for spaces with the sysview engine. Before the patch, spaces with the sysview engine didn't have their own tuples and hence didn't need a format. 2. To use the sysview engine to deal with SQL views. This will allow using the sysview machinery to query SQL views from Lua land. Closes #4111
-
- Apr 18, 2019
-
-
Vladislav Shpilevoy authored
Payload is arbitrary user data disseminated over the cluster along with other member attributes. Part of #3234
-
Vladislav Shpilevoy authored
TTL stands for time-to-live, which is slightly confusing when applied to a member's attribute. The name status_ttl suggests that once this value reaches 0, the status is deleted or is no longer valid. TTD, which expands to time-to-disseminate, is a more precise name for these counters.
-
Vladimir Davydov authored
- Add REQUESTS.current to report the number of requests currently in flight, because it's useful for understanding whether we need to increase box.cfg.net_msg_max.
- Add REQUESTS.{rps,total}, because knowing the number of requests processed per second can come in handy for performance analysis.
- Add CONNECTIONS.{rps,total} that show the number of connections opened per second and total. Those are not really necessary, but without them the output looks kinda lopsided.
Closes #4150
@TarantoolBot document
Title: Document new box.stat.net fields
Here's the list of the new fields:
- `CONNECTIONS.rps` - number of connections opened per second recently (for the last 5 seconds).
- `CONNECTIONS.total` - total number of connections opened so far.
- `REQUESTS.current` - number of requests in flight (this is what's limited by `box.cfg.net_msg_max`).
- `REQUESTS.rps` - number of requests processed per second recently (for the last 5 seconds).
- `REQUESTS.total` - total number of requests processed so far.
`CONNECTIONS.rps`, `CONNECTIONS.total`, `REQUESTS.rps`, `REQUESTS.total` are reset by `box.stat.reset()`.
Example of the new output:
```
---
- SENT:
    total: 5344924
    rps: 840212
  CONNECTIONS:
    current: 60
    rps: 148
    total: 949
  REQUESTS:
    current: 17
    rps: 1936
    total: 12139
  RECEIVED:
    total: 240882
    rps: 38428
...
```
-
- Apr 17, 2019
-
-
Konstantin Osipov authored
Increase the minimum range size to 128MB to reduce the number of open files per process in a typical installation.
-
d.sharonov authored
Fixes #4165
-
- Apr 16, 2019
-
-
Vladimir Davydov authored
In contrast to vinyl_iterator, vinyl_index_get doesn't take a reference to the LSM tree while reading from it. As a result, if the LSM tree is dropped in the meantime, vinyl_index_get will crash. Fix this issue by surrounding vy_get with vy_lsm_ref/unref. Closes #4109
-
Vladimir Davydov authored
To propagate changes applied to a space while a new index is being built, we install an on_replace trigger. In case the on_replace trigger callback fails, we abort the DDL operation. The problem is the trigger may yield, e.g. to check the unique constraint of the new index. This opens a time window for the DDL operation to complete and clear the trigger. If this happens, the trigger will try to access the outdated build context and crash:
| #0 0x558f29cdfbc7 in print_backtrace+9
| #1 0x558f29bd37db in _ZL12sig_fatal_cbiP9siginfo_tPv+1e7
| #2 0x7fe24e4ab0e0 in __restore_rt+0
| #3 0x558f29bfe036 in error_unref+1a
| #4 0x558f29bfe0d1 in diag_clear+27
| #5 0x558f29bfe133 in diag_move+1c
| #6 0x558f29c0a4e2 in vy_build_on_replace+236
| #7 0x558f29cf3554 in trigger_run+7a
| #8 0x558f29c7b494 in txn_commit_stmt+125
| #9 0x558f29c7e22c in box_process_rw+ec
| #10 0x558f29c81743 in box_process1+8b
| #11 0x558f29c81d5c in box_upsert+c4
| #12 0x558f29caf110 in lbox_upsert+131
| #13 0x558f29cfed97 in lj_BC_FUNCC+34
| #14 0x558f29d104a4 in lua_pcall+34
| #15 0x558f29cc7b09 in luaT_call+29
| #16 0x558f29cc1de5 in lua_fiber_run_f+74
| #17 0x558f29bd30d8 in _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_+1e
| #18 0x558f29cdca33 in fiber_loop+41
| #19 0x558f29e4e8cd in coro_init+4c
To fix this issue, let's recall that when a DDL operation completes, all pending transactions that affect the altered space are aborted by the space_invalidate callback. So to avoid the crash, we just need to bail out early from the on_replace trigger callback if we detect that the current transaction has been aborted. Closes #4152
-
Roman Khabibov authored
Added a check whether box.cfg() is called within an instance file. If box.cfg() is missing, tell the user the reason for the failure explicitly. Before this commit, the error looked like this:
/usr/bin/tarantoolctl:541: attempt to index a nil value
Closes #3953
-
- Apr 12, 2019
-
-
Vladislav Shpilevoy authored
The next patch on payloads needs to drop only packets containing certain sections, such as anti-entropy or dissemination. The new SWIM test transport filters make this easy to implement. Part of #3234
-
Vladislav Shpilevoy authored
At the moment, the SWIM test harness implements its own fake file descriptor table, which the real SWIM code uses without being aware of it. Each fake fd has send and recv queues and can delay and drop packets with a certain probability. But this is not going to be enough for new tests, which need to drop packets with specified content, coming from or going in a specified direction. For that, the patch implements a filtering mechanism. Each fake fd now has a list of filters, applied one by one to each packet. If at least one filter wants to drop a packet, it is dropped. The filters know the packet content and direction: outgoing or incoming. For now only one filter exists: drop rate. It existed even before the patch, but now it is ported to the new API. Part of #3234
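The filtering mechanism described here can be sketched roughly as follows. This is a simplified Python illustration, not the actual C test harness: `FakeFd`, `drop_rate_filter` and `drop_content_filter` are hypothetical names, and the content filter is an assumed example of what later tests might add (the message states that only the drop rate filter exists so far).

```python
import random


class FakeFd:
    """Hypothetical stand-in for a fake file descriptor of a test harness."""

    def __init__(self):
        # Each filter is a callable (packet: bytes, direction: str) -> bool,
        # returning True when it wants the packet to be dropped.
        self.filters = []

    def add_filter(self, drop_fn):
        self.filters.append(drop_fn)

    def should_drop(self, packet, direction):
        # Filters are checked in order; one positive answer is enough.
        return any(f(packet, direction) for f in self.filters)


def drop_rate_filter(rate, rng=random.random):
    """Drop a packet with probability `rate`, ignoring content and direction."""
    def drop(packet, direction):
        return rng() < rate
    return drop


def drop_content_filter(marker, direction_to_drop):
    """Assumed example: drop packets containing `marker` in one direction."""
    def drop(packet, direction):
        return direction == direction_to_drop and marker in packet
    return drop
```

Keeping the drop rate as just another filter means the same list handles both the probabilistic and the content-based cases.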
-
Alexander Turenko authored
Before this commit it always returned false. Fixes #4091.
-
Serge Petrenko authored
The test is flaky under high load (e.g. when it is run in parallel with a lot of workers). Make it less dependent on arbitrary timeouts to improve stability. Part of #4134
-
Serge Petrenko authored
This part of the test is flaky when tests are run in parallel. Besides, it is quite big on its own, so extract it into a separate file to add more flexibility in running tests and to make finding problems easier. Part of #4134
-
Nikita Pettik authored
Before this patch, SQL statements that create or drop FK constraints didn't increment rowcount:
box.execute("ALTER TABLE t ADD CONSTRAINT fk1 FOREIGN KEY (b) REFERENCES parent (a);")
---
- rowcount: 0
...
This patch fixes this misbehaviour: VDBE accidentally did not enable change counting during ALTER TABLE ADD/DROP constraint. Closes #4130
-
avtikhon authored
Disabled the wal_off/iterator_lt_gt.test.lua test, because this performance test needs to be reorganized into a separate mode on a standalone host. Currently the test doesn't reveal any issue, but it breaks testing from time to time with errors like:
[010] wal_off/iterator_lt_gt.test.lua [ fail ]
[010]
[010] Test failed! Result content mismatch:
[010] --- wal_off/iterator_lt_gt.result Fri Apr 12 10:30:43 2019
[010] +++ wal_off/iterator_lt_gt.reject Fri Apr 12 10:36:30 2019
[010] @@ -79,7 +79,9 @@
[010]  ...
[010]  too_longs
[010]  ---
[010] -- []
[010] +- - 'Some of the iterators takes too long to position: 0.074278'
[010] +  - 'Some of the iterators takes too long to position: 0.11786'
[010] +  - 'Some of the iterators takes too long to position: 0.053848'
[010]  ...
[010]  s:drop()
[010]  ---
[010]
[010] Last 15 lines of Tarantool Log file [Instance "wal"][/tarantool/test/var/010_wal_off/wal.log]:
See #2539
-
Vladimir Davydov authored
After we retrieve a statement from a secondary index, we always do a lookup in the primary index to get the full tuple corresponding to the found secondary key. It may turn out that the full tuple doesn't match the secondary key, which means the key was overwritten, but the DELETE statement hasn't been propagated yet (aka deferred DELETE). Currently, there's no way to figure out how often this happens as all tuples read from an LSM tree are accounted under 'get' counter. So this patch splits 'get' in two: 'get', which now accounts only tuples actually returned to the user, and 'skip', which accounts skipped tuples.
-
- Apr 11, 2019
-
-
Vladislav Shpilevoy authored
It turned out that it is not called. But it probably should be, in order to catch more errors.
-
Vladimir Davydov authored
Currently, latency accounting and warning live in vy_point_lookup and vy_read_iterator_next. As a result, they don't take into account the lookup of a full tuple by a partial one, which can take quite a while, especially if there are lots of deferred DELETE statements we have to skip. So this patch moves latency accounting to the upper level, namely to vy_get and vinyl_iterator_{primary,secondary}_next. Note that, as a side effect, we now always print full tuples to the log on a "too long" warning. Besides, we strip the LSN and statement type as those don't make much sense anymore.
-
Vladimir Davydov authored
box.stat.SELECT accounts index.get and index.select, but not index.pairs, which is confusing since pairs() may be used even more often than select() in a Lua application.
-
Konstantin Osipov authored
SQL still uses a legacy SQLite enum, not enum field_type from NoSQL, to identify types. This creates a mess with type identification, when the original column/literal type is lost during expression evaluation. Until we have proper type arithmetic and preserve field_type in expressions, coerce the string return value of the typeof() function, which queries the value type of an SQL expression, to the closest NoSQL type name. Rename:
real -> number
text -> string
blob -> scalar
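The rename can be pictured as a small lookup table. The Python sketch below only illustrates the mapping listed in this message; `TYPEOF_RENAMES` and `coerce_typeof` are hypothetical names, and passing unlisted names through unchanged is an assumption rather than something the message states.

```python
# Legacy SQL type names mapped to the closest NoSQL type names,
# exactly as listed in the commit message.
TYPEOF_RENAMES = {
    "real": "number",
    "text": "string",
    "blob": "scalar",
}


def coerce_typeof(name):
    """Return the renamed type name.

    Assumption: names that are not in the list pass through unchanged.
    """
    return TYPEOF_RENAMES.get(name, name)
```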
-
Vladislav Shpilevoy authored
Turning on a spell checker revealed lots of typos. The commit fixes them.
-
Vladislav Shpilevoy authored
During a merge it was accidentally set to too low a number. Follow-up for 8fe05fdd (swim: expose ping broadcast API)
-
Vladislav Shpilevoy authored
The previous commit introduced an API to broadcast SWIM packets. This commit harnesses it in order to allow a user to perform initial discovery in a cluster, when member tables are empty and UUIDs aren't at hand. Part of #3234
-
Vladislav Shpilevoy authored
When a cluster is just created, no one knows anyone. Broadcast helps to establish some initial relationships between members. This commit introduces only an interface to create broadcast tasks from SWIM code. The next commit uses this interface to implement ping broadcast. Part of #3234
-
Vladislav Shpilevoy authored
In the original SWIM paper, the incarnation is just a way of refuting old statuses, nothing more. It is not designed as a versioning system for a member and its non-status attributes. But Tarantool harnesses the incarnation for a wider range of tasks. In Tarantool's implementation, the incarnation (in theory) refutes old statuses, old payloads and old addresses. It turned out, however, that before the patch an address update did not touch the incarnation. Because of that, it was possible for a new address to be overwritten with the old one. The patch fixes it with a mere increment of the incarnation on each address update. The fix is simple because the current SWIM implementation always carries the tuple {incarnation, status, address} together, as one big attribute. This is not so for payloads, so an analogous fix for them will be much trickier. Follow-up for f510dc6f (swim: introduce failure detection component)
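Why the increment matters can be shown with a toy model. The Python sketch below is a deliberately simplified illustration, not the SWIM code: the `Member` class and its methods are hypothetical, and the rule "accept only strictly newer incarnations" is an assumption made to keep the example short.

```python
class Member:
    """Hypothetical, simplified record of one cluster member."""

    def __init__(self, address, incarnation=0):
        self.address = address
        self.incarnation = incarnation

    def set_local_address(self, new_address):
        # The owner bumps the incarnation on every address update, so the
        # new address refutes copies of the old one still circulating.
        self.address = new_address
        self.incarnation += 1

    def apply_remote(self, address, incarnation):
        # Assumed rule: a remote update is accepted only if strictly newer.
        if incarnation > self.incarnation:
            self.address = address
            self.incarnation = incarnation
            return True
        return False
```

Without the increment in `set_local_address`, the old and the new address would carry the same incarnation, and a peer would have no way to tell which of the two is stale.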
-
Vladimir Davydov authored
Before rebootstrapping a replica, the admin may delete it from the _cluster space on the master. If he doesn't make a checkpoint after that, rebootstrap will fail with
E> ER_LOCAL_INSTANCE_ID_IS_READ_ONLY: The local instance id 2 is read-only
This is sort of unexpected. Let's fix this issue by allowing replicas to change their id during join. A note about the replication/misc test: it used this error to check that a master instance doesn't crash in case a replica fails to bootstrap. However, we can simply set mismatching replicaset UUIDs to get the same effect. Closes #4107
-
Vladimir Davydov authored
Currently, the garbage collector works with vclock signatures and doesn't take into account vclock components. This works as long as the caller (i.e. relay) makes sure that it doesn't advance a consumer associated with a replica unless its acknowledged vclock is greater than or equal to the vclock of a WAL file fed to it. The bug is that it does not - it only compares vclock signatures. As a result, if a replica has some local changes or changes pulled from other members of the cluster, which render its signature greater, the master may remove files that are still needed by the replica, permanently breaking replication and requiring rebootstrap. I guess the proper fix would be teaching the garbage collector to operate on vclock components rather than signatures, but it's rather difficult to implement. This patch is a quick fix, which simply replaces vclock signature comparison in relay with vclock_compare. Closes #4106
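The difference between the two comparisons is easy to see on a small example. In the Python sketch below a vclock is modelled as a dict from replica id to LSN; `signature` and `covers` are hypothetical helpers written for illustration and do not reproduce the interface of the real vclock_compare.

```python
def signature(vclock):
    """Sum of all components of a vclock given as {replica_id: lsn}."""
    return sum(vclock.values())


def covers(acked, wal):
    """True if `acked` >= `wal` in every component (a missing one is 0)."""
    return all(acked.get(rid, 0) >= lsn for rid, lsn in wal.items())


# A WAL file on the master ends at LSN 100 of replica id 1.
wal_vclock = {1: 100}
# The replica has acknowledged only 50 of those rows, but it also has
# 80 rows of its own (replica id 2), which inflate its signature.
acked_vclock = {1: 50, 2: 80}

# Signature comparison wrongly suggests the file is no longer needed.
looks_safe_by_signature = signature(acked_vclock) >= signature(wal_vclock)
# Component-wise comparison shows the replica still needs the file.
safe_by_components = covers(acked_vclock, wal_vclock)
```

Here `looks_safe_by_signature` is true while `safe_by_components` is false, which is the situation in which the master could remove a file the replica still needs.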
-
Vladimir Davydov authored
L1 runs are usually the most frequently read and smallest runs at the same time so we gain nothing by compressing them. Closes #2389
-
- Apr 10, 2019
-
-
Vladimir Davydov authored
The patch fixes the following test failures:
| --- vinyl/errinj_ddl.result Tue Mar 19 17:52:48 2019
| +++ vinyl/errinj_ddl.reject Tue Mar 19 19:05:36 2019
| @@ -358,7 +358,7 @@
|  ...
|  s.index.sk:stat().memory.rows
|  ---
| -- 27
| +- 23
|  ...
|  test_run:cmd('restart server default')
|  fiber = require('fiber')
This happens, because creation of the test index can happen later than we expect. Fix it by adding an appropriate wait_cond.
| --- vinyl/errinj_ddl.result Tue Mar 19 17:52:48 2019
| +++ vinyl/errinj_ddl.reject Tue Mar 19 18:07:55 2019
| @@ -504,6 +504,7 @@
|  ...
|  _ = s1:create_index('sk', {parts = {2, 'unsigned'}})
|  ---
| +- error: Tuple field 2 required by space format is missing
|  ...
|  errinj.set("ERRINJ_VY_READ_PAGE_TIMEOUT", 0)
|  ---
This one is due to a test transaction completing before DDL starts so that the transaction isn't aborted by DDL, as we expect. Fix it by making sure the transaction won't commit before DDL starts, again with the aid of wait_cond.
| --- vinyl/errinj_ddl.result Wed Apr 10 18:59:57 2019
| +++ vinyl/errinj_ddl.reject Wed Apr 10 19:05:35 2019
| @@ -779,7 +779,7 @@
|  ...
|  ch1:get()
|  ---
| -- Transaction has been aborted by conflict
| +- Duplicate key exists in unique index 'i1' in space 'test'
|  ...
|  ch2:get()
|  ---
This test case fails, because we use a timeout to stall reading DML operations. This was initially a bad call, because under severe load (e.g. parallel test run), the timeout may fire before we get to execute the DDL request, which is supposed to abort the DML operations, in which case they won't be aborted. Fix this by replacing the timeout with a delay, as we should have done right from the start. Closes #4056 Closes #4057
-
Vladimir Davydov authored
Another failure this time:
| [024] --- vinyl/errinj_stat.result Wed Apr 10 14:21:34 2019
| [024] +++ vinyl/errinj_stat.reject Wed Apr 10 14:24:15 2019
| [024] @@ -220,7 +220,7 @@
| [024]  ...
| [024]  box.snapshot()
| [024]  ---
| [024] -- error: Error injection 'vinyl dump'
| [024] +- error: Snapshot is already in progress
| [024]  ...
| [024]  stat = box.stat.vinyl().scheduler
| [024]  ---
| [024] @@ -231,7 +231,7 @@
| [024]  ...
| [024]  stat.tasks_failed > 0
| [024]  ---
| [024] -- true
| [024] +- false
| [024]  ...
| [024]  errinj.set('ERRINJ_VY_RUN_WRITE', false)
| [024]  ---
Hope it's the last time we fix it. Follow-up commit f4b80bcf ("test: fix vinyl/errinj_stat failure").
-
Vladimir Davydov authored
The patch fixes the following test failures:
| [022] --- vinyl/errinj_stat.result Tue Mar 19 17:52:48 2019
| [022] +++ vinyl/errinj_stat.reject Wed Mar 20 08:08:41 2019
| [022] @@ -229,7 +229,7 @@
| [022]  ...
| [022]  stat.tasks_inprogress == 0
| [022]  ---
| [022] -- true
| [022] +- false
| [022]  ...
| [022]  stat.tasks_completed == 1
| [022]  ---
| [013] --- vinyl/errinj_stat.result Tue Mar 19 17:52:48 2019
| [013] +++ vinyl/errinj_stat.reject Wed Mar 20 08:11:15 2019
| [013] @@ -168,7 +168,7 @@
| [013]  ...
| [013]  stat.tasks_inprogress > 0
| [013]  ---
| [013] -- true
| [013] +- false
| [013]  ...
| [013]  stat.tasks_completed == 0
| [013]  ---
| [013] @@ -183,7 +183,7 @@
| [013]  ...
| [013]  box.stat.vinyl().scheduler.tasks_inprogress > 0
| [013]  ---
| [013] -- true
| [013] +- false
| [013]  ...
| [013]  errinj.set('ERRINJ_VY_RUN_WRITE_DELAY', false)
| [013]  ---
The problem occurred, because the test didn't make sure that an asynchronous dump/compaction task has actually started/completed. Even box.snapshot() doesn't guarantee that a dump task is complete, in fact. This patch adds wait_cond's to guarantee the test never fails like that anymore. Closes #4059 Closes #4060
-
Alexander Tikhonov authored
When a system is under heavy load (say, when tests are run in parallel), disk writes may stall for some time. This can cause a check performed by a test to fail, so now we retry such checks for up to 60 seconds until the condition is met. This change targets the replication test suite.
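The retry approach amounts to polling a condition until a deadline. The Python sketch below illustrates the idea only; it is not the test-run helper itself, and the function name, its parameters and the polling interval are assumptions.

```python
import time


def wait_cond(cond, timeout=60.0, delay=0.01):
    """Poll `cond()` until it returns a truthy value or `timeout` expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if cond():
            return True
        if time.monotonic() >= deadline:
            return False
        # Give the system under test some time before the next attempt.
        time.sleep(delay)
```

Compared with a fixed sleep followed by a single check, this returns as soon as the condition holds on a fast machine and tolerates stalls of up to `timeout` seconds on a loaded one.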
-
Alexander Turenko authored
Needed for running the test suite in parallel. Use the default replication_connect_timeout (30 seconds) instead of 0.5 seconds. This doesn't change the meaning of the test cases. Increase replication_timeout from 0.01 to 0.1. These changes allow running the test 100 times in 50 parallel jobs successfully.
-
Alexander Turenko authored
All changes are needed to eliminate sporadic failures when testing is run with, say, 30 parallel jobs. First, replication_connect_timeout is increased to 30 seconds. This parameter doesn't change the meaning of the test cases. Second, replication_timeout is increased from 0.01 to 0.03. We usually set it to 0.1 in tests, but the duration of the gh-3160 test case ('Send heartbeats if there are changes from a remote master only') is around 100 * replication_timeout seconds and we don't want to make this test much longer. Runs of the test case (w/o the other ones in replication/misc.test.lua) in 30 parallel jobs show that 0.03 is enough for the gh-3160 case to pass reliably and hopefully enough for the following test cases too.
-
Alexander Turenko authored
It allows running `./test-run.py -j 1 replication/misc <...> replication/misc`, which can be useful when debugging a flaky problem. This ability was broken after 7474c14e ('test: enable cleaning of a test environment'), because test-run started to clean package.loaded between runs, so each time the test is run it calls ffi.cdef() under require('rlimit'). This ffi.cdef() call defines a structure, so the second and following attempts to call ffi.cdef() give a Lua error. This commit does not change anything in regular testing, because each test runs once (unless stated otherwise in a configuration list).
-
Vladislav Shpilevoy authored
Before this patch, a UUID update was the same as introducing a new member and waiting until the 'old' one is dropped as 'dead' by the failure detection component. It could take 2.5 minutes with the default ack timeout. What is more, with GC turned off the old UUID would never be deleted. With this patch, a UUID update marks the old UUID as a 'left' member. In the best and most common case, this guarantees that the old UUID will be dropped no later than after 2 complete rounds, and marked as 'left' everywhere in log(cluster_size) round steps, even with GC turned off. Part of #3234
-