- Jul 09, 2019
-
-
Vladislav Shpilevoy authored
Before the patch it was split into two parts by the 1.5KB packet, and in the constructor the whole volume was nullified. Obviously, these were mistakes: the first breaks cache locality, the second flushes the cache.
-
Vladislav Shpilevoy authored
Before the patch each SWIM member had two preallocated task objects, 3KB in total. This was a waste of memory, because the network load per member in SWIM is ~2 messages per round step regardless of cluster size. This patch moves the tasks to a pool, where they can be reused - even by different SWIM instances running on the same node.
-
Serge Petrenko authored
The test regarding logging corrupted rows failed occasionally with

```
[016] test_run:grep_log('default', 'Got a corrupted row.*')
[016] ---
[016] -- 'Got a corrupted row:'
[016] +- null
[016] ...
```

The logs then had

```
[010] 2019-07-06 19:36:16.857 [13046] iproto sio.c:261 !> SystemError writev(1), called on fd 23, aka unix/:(socket), peer of unix/:(socket): Broken pipe
```

instead of the expected message. This happened because we closed the socket before tarantool could write a greeting to the client; the connection was then closed, and execution never got to processing the malformed request and thus to printing the desired message to the log. To fix this, actually read the greeting prior to writing new data and closing the socket. Follow-up #4273
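The fix in test terms, as a minimal sketch (the iproto greeting is 128 bytes; the request payload is hypothetical):

```lua
local socket = require('socket')

local s = socket.tcp_connect('localhost', 3301)
-- Read the 128-byte iproto greeting first, so the server has finished
-- its initial write and won't hit a broken pipe on our early close.
local greeting = s:read(128)
s:write(malformed_request) -- some malformed request body (hypothetical)
s:close()
```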
-
Oleg Babin authored
Closes #4323

@TarantoolBot document
Title: fio.utime

`fio.utime(filepath [, atime [, mtime]])`

Set access and modification times of a file. The first argument is the filename, the second argument (atime) is the access time, and the third argument (mtime) is the modification time. Both times are provided in seconds since the epoch. If the modification time is omitted, the access time provided is used; if both times are omitted, the current time is used.
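A usage sketch based on the semantics above (the path and timestamps are illustrative):

```lua
local fio = require('fio')

fio.utime('/tmp/example.txt', 1562630000, 1562640000) -- set atime and mtime
fio.utime('/tmp/example.txt', 1562630000)             -- mtime defaults to atime
fio.utime('/tmp/example.txt')                         -- both default to now
```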
-
Vladimir Davydov authored
When a memtx transaction is aborted on yield, it isn't enough to roll back individual statements - we must also run the on_rollback triggers, otherwise changes done to the schema by an aborted DDL transaction will remain visible to other fibers until an attempt to commit it is made.
-
Alexander V. Tikhonov authored
The test case has two problems that appear from time to time and lead to flaky failures. The failures look as shown below in the test-run output:

| Test failed! Result content mismatch:
| --- box/net.box.result	Mon Jun 24 17:23:49 2019
| +++ box/net.box.reject	Mon Jun 24 17:51:52 2019
| @@ -1404,7 +1404,7 @@
|  ...
|  test_run:grep_log('default', 'ER_INVALID_MSGPACK.*')
|  ---
| -- 'ER_INVALID_MSGPACK: Invalid MsgPack - packet body'
| +- 'ER_INVALID_MSGPACK: Invalid MsgPack - packet length'
|  ...
|  -- gh-983 selecting a lot of data crashes the server or hangs the
|  -- connection

The 'ER_INVALID_MSGPACK.*' regexp should match the 'ER_INVALID_MSGPACK: Invalid MsgPack - packet body' log message, but if that message is not yet in the log file at the time of the grep_log() call (it simply hasn't been flushed to the file yet), a message produced by another test case can be matched instead ('ER_INVALID_MSGPACK: Invalid MsgPack - packet length'). The fix here is to match the entire message and to check for it periodically during several seconds (use wait_log() instead of grep_log()).

Another problem is a race between writing a response to an iproto socket on the server side and closing the socket on the client end. If tarantool is unable to write a response, it does not produce the warning about the invalid msgpack, but shows a 'broken pipe' message instead. We need to first grep for the message in the logs and only then close the socket on the client. A similar problem (with another test case) is described in [1].

[1]: https://github.com/tarantool/tarantool/issues/4273#issuecomment-508939695

Closes: #4311
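A sketch of the first change, assuming test-run's wait_log helper (the timeout value is illustrative; '%-' escapes the dash in the Lua pattern):

```lua
-- Before: a one-shot grep that races with log flushing and can match
-- a message produced by another test case.
test_run:grep_log('default', 'ER_INVALID_MSGPACK.*')

-- After: match the entire message and poll the log for a while.
test_run:wait_log('default',
    'ER_INVALID_MSGPACK: Invalid MsgPack %- packet body', nil, 10)
```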
-
- Jul 08, 2019
-
-
Vladimir Davydov authored
Both commit and rollback triggers are currently added to the list head. As a result, both are run in reverse order. This is correct for rollback triggers, because it matches the order in which the statements that added the triggers are rolled back, but it is wrong for commit triggers. For example, suppose we create a space and then create an index for it in the same transaction. On success we expect to first run the trigger that commits the space and only then the trigger that commits the index, not vice versa. So reverse the order of commit triggers, as part of the preparations for transactional DDL.
-
Vladimir Davydov authored
Changes done to an altered space while a new index is being built or the format is being checked are propagated via an on_replace trigger. The problem is that there may be transactions that started before the alter request. Their working set can't be checked, so we simply abort them. We can't abort transactions that have reached WAL, so we also call wal_sync() to flush all pending WAL requests. This is a yielding operation, and we call it even if there are no transactions that need to be flushed. As a result, a vinyl space alter yields unconditionally, even if the space is empty and no pending transactions affect it. This prevents us from implementing transactional DDL. Let's call wal_sync() only if there is actually at least one pending transaction that affects the altered space and is waiting for WAL.
-
Serge Petrenko authored
Add a decimal library to Lua.

Part of #692

@TarantoolBot document
Title: Document decimal module in lua.

First of all, you have to require the package via `decimal = require('decimal')`. Now you can construct decimals via the `new` method. Decimals may be constructed from Lua numbers, strings, and unsigned and signed 64-bit integers. Decimal is a fixed-point type with a maximum of 38 digits of precision. All the calculations are exact, so be careful when constructing decimals from Lua numbers: they may hold only 15 decimal digits of precision. You are advised to construct decimals from strings, since strings represent decimals exactly, and vice versa.

```
a = decimal.new(123e-7)
b = decimal.new('123.456')
c = decimal.new('123.456e2')
d = decimal.new(123ULL)
e = decimal.new(2)
```

The allowed operations are addition, subtraction, division, multiplication and power. If at least one of the operands is decimal, decimal operations are performed. The other operand may be another decimal, a string containing a number representation, or a Lua number. Operations fail only on overflow, i.e. when the result exceeds 10^38 - 1; this includes division by zero. In these cases an `Operation failed` error is raised. Underflow is also possible, when the precision needed to store the exact result exceeds 38 digits. Underflow is not an error: when it happens, the result is rounded to 38 digits of precision.

```
tarantool> a + b
---
- '123.456012300000000'
...

tarantool> c - d
---
- '12222.6'
...

tarantool> c / b
---
- '100'
...

tarantool> d * d
---
- '15129'
...

tarantool> d ^ 2
---
- '15129'
...

tarantool> 2 ^ d
---
- '10633823966279326983230456482242756608'
...

tarantool> e ^ d
---
- '10633823966279326983230456482242756608'
...
```

The following math functions are also supported: log10, ln, exp, sqrt. When specified as `decimal.opname()`, operations may be performed on strings and Lua numbers.

```
f = decimal.new(100)

tarantool> decimal.log10(f)
---
- '2'
...

tarantool> decimal.sqrt(f)
---
- '10'
...

tarantool> e2 = decimal.exp(2)
---
...

tarantool> decimal.ln(e2)
---
- '2.0000000000000000000000000000000000000'
...
```

There are also `abs` and `tostring` methods, and a unary minus operator, which are pretty self-explanatory.

```
tarantool> a = decimal.new('-5')
---
...

tarantool> a
---
- '-5'
...

tarantool> decimal.abs(a)
---
- '5'
...

tarantool> -a
---
- '5'
...

tarantool> tostring(a)
---
- '-5'
...
```

`decimal.precision`, `decimal.scale` and `decimal.round`: the first two return the precision, i.e. the number of decimal digits in the number representation, and the scale, i.e. the number of decimal digits after the decimal point. `decimal.round` rounds the number to the given scale.

```
tarantool> a = decimal.new('123.456789')
---
...

tarantool> decimal.precision(a)
---
- 9
...

tarantool> decimal.scale(a)
---
- 6
...

tarantool> decimal.round(a, 4)
---
- '123.4568'
...
```

Comparisons: `>`, `<`, `>=`, `<=`, `==` are also legal and work as expected. You may compare decimals with Lua numbers or strings; in that case the comparison happens after the values are converted to the decimal type.
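For example, under the conversion rules just described (a sketch; the values are illustrative):

```
tarantool> decimal.new('10') > 5
---
- true
...

tarantool> decimal.new('1.0') == 1
---
- true
...
```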
-
Serge Petrenko authored
An ffi metatype has a CTypeID, which can be used to push cdata of that type onto the Lua stack, and an associated metatable, automatically applied to every created member of the type. This allows behavior similar to pushing userdata and assigning a metatable to it. Needed for #692
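At the Lua level the same mechanism is exposed as ffi.metatype; a minimal sketch (the struct and its methods are illustrative, not from the patch):

```lua
local ffi = require('ffi')

ffi.cdef('typedef struct { double x, y; } point_t;')

-- ffi.metatype binds a metatable to the ctype once and for all: every
-- cdata object of this type created afterwards automatically gets it,
-- much like userdata with an assigned metatable.
local point = ffi.metatype('point_t', {
    __add = function(a, b) return point(a.x + b.x, a.y + b.y) end,
    __tostring = function(p) return '(' .. p.x .. ', ' .. p.y .. ')' end,
})

print(tostring(point(1, 2) + point(3, 4))) -- prints (4, 6)
```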
-
Serge Petrenko authored
Use the printf "%g" conversion instead of "%f" to trim trailing zeros in cases like this: decimal_from_double(1) -> '1.000000000000000' -> decimal_from_string(). Now it is decimal_from_double(1) -> '1' -> decimal_from_string(). Follow-up 6d62c6c1
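The difference is easy to see from plain Lua string formatting, which follows the same printf conventions (a quick illustration, not the C code touched by the patch):

```lua
print(string.format('%f', 1))       -- 1.000000  (trailing zeros kept)
print(string.format('%g', 1))       -- 1         (trailing zeros trimmed)
print(string.format('%g', 123.456)) -- 123.456
```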
-
Serge Petrenko authored
While arithmetic operations do not return infinities or NaNs, it is possible to construct an invalid decimal value from strings like 'Infinity', 'NaN' and similar. Some decimal math functions may also produce an infinity, say, ln(0) yields '-Infinity'. So, add checks that the number is not a NaN or an infinity after each operation, so that the operation either returns an error or a valid finite decimal number. Follow-up 6d62c6c1
-
Serge Petrenko authored
It turns out decNumberLn hangs when the result is subnormal according to the current context settings. To fix this, reset the minimal allowed exponent to a smaller value for the duration of the ln operation and round the result afterwards. Follow-up 6d62c6c1
-
Vladimir Davydov authored
Under certain circumstances vy_slice_new() may create an empty slice, e.g. on range split:

```
|------------------ Slice ---------------|
        |---- Run -----|
                          + split key
|---- Slice 1 ----||------ Slice 2 ------|
                             ^^^^^^^ Empty
```

vy_range_update_compaction_priority() uses the size of the last slice in a range as a base for LSM tree level sizing. If the slice size happens to be 0, it will simply hang in an infinite loop. Fix this potential hang by using 1 if the last slice size is 0.
-
Konstantin Osipov authored
-
- Jul 06, 2019
-
-
Alexander V. Tikhonov authored
Current CI results include some flaky tests, which blocks the deploy stage, so the deploy stage is temporarily merged into the test stage until this is fixed. Follows up #4156
-
- Jul 05, 2019
-
-
Vladislav Shpilevoy authored
The original SWIM paper says that the dissemination time of an event is O(log(N)), where N is the size of the cluster. This is true when both ping and ack messages carry dissemination and anti-entropy sections. Before this patch that wasn't the case - only regular pings carried anything. After this patch the SWIM module has truly exponential dissemination speed. Closes #4253
-
Vladislav Shpilevoy authored
Another place consuming much of the tests' startup time is the useless dissemination of an empty payload, which can in fact be skipped. Consider a cluster of 300 nodes, all interconnected manually, where a test wants to wait for stabilization, i.e. for there to be no events left. On such a cluster it takes ~200 round steps until not a single event remains. This is not about big packets or log() TTD: there may be a few events or more, but when a test wants the cluster to be clean, it needs to wait for all the events to finish. This patch exploits the fact that empty payloads can be compared for free, without a single memcmp: if both the new and the old payload are empty, there is nothing to disseminate. It could help in a real cluster too, if initially there are no payloads. Needed for #4253
-
Vladislav Shpilevoy authored
With the following patches some of the tests will run much slower due to the significantly increased size of most packets. This commit tries to smooth that out by:

* turning off verbose logs in unit tests;
* using a much lighter version of the UUID comparator.

According to the profiler these changes speed things up severalfold, and at the same time they are simple. Needed for #4253
-
Vladislav Shpilevoy authored
There were tests relying on certain content of SWIM messages. After the next patches these conditions won't hold without explicit intervention via error injections. The patchset moves these tests to separate release-disabled files. Part of #4253
-
Vladislav Shpilevoy authored
SWIM sends basically the same message during a round. There was a micro-optimization so as not to reassemble the message on each step. Now it is getting harder to support that island of perfectionism, because:

* Soon all the messages will carry all the sections, including indirect messages. Their body is smaller, so it is not possible to maintain one cached message without reducing its maximal size;

* In big clusters, even without any changes, a cached message would need to be rebuilt. This is because the anti-entropy section won't help much unless it is changed frequently enough;

* In big clusters changes happen often enough to invalidate the cached message constantly, unless SWIM maintained which members are included in the cache and which are not. Then a change to a member not included in the message would not affect the cache. But that would complicate the code too much.

Part of #4253
-
Vladislav Shpilevoy authored
With a certain random seed, a member's status was sometimes checked at the wrong moment.
-
Vladislav Shpilevoy authored
The previous commit solves one important problem with overly long event dissemination: events could occupy the whole UDP packet for too long. Now they live for log() time, but 'dead' and 'left' members were bound to TTD - such members were deleted once TTD reached 0. Now they are deleted too early: cluster nodes forget about dead members too soon, and nodes not yet aware of their death can accidentally resurrect them via anti-entropy. Cluster nodes need to be suspicious when someone tells them to add a new, supposedly alive member. This patch makes SWIM add a new member in two cases only: manually, or if an ACK was received from it. A new member can't be added indirectly via events or anti-entropy anymore. Instead, a ping is sent to members reported as new and alive; if an ACK is received directly from them, they are added. The patch does not affect updates. They remain indirect, because if something was updated in an existing member, that member is definitely alive. Part of #4253
-
Vladislav Shpilevoy authored
Before the patch there was a problem of event and anti-entropy starvation, when a cluster generates so many events that they consume the whole UDP packet. A packet fits up to 26 events. If something important happens during an event storm, that event is likely to be lost and not disseminated until the storm is over. Sadly, there is no way to prevent a storm, but it can be made much shorter. For that the patch makes the TTD of events logarithmic in cluster size instead of linear. According to the SWIM paper and to experiments, the logarithm is really enough; linear TTD was redundant overkill. Shorter-lived events do not solve the starvation problem entirely - some events can still be lost during a storm - but they free some space for anti-entropy, which can finish the dissemination of the lost events. Experiments on a simulated cluster of 100 nodes showed that failure dissemination took ~110 steps during a storm - basically, no dissemination at all. After the patch it takes ~20 steps, so it is logarithmic as it should be, although with a bigger constant than without a storm. Part of #4253
-
Alexander V. Tikhonov authored
Homebrew now ships curl-7.65.1, which is affected by curl/curl#3995 (the problem leads to segfaults). The next version is not released yet, so this commit downgrades curl to version 7.65.0. Closes #4288
-
Serge Petrenko authored
This patch adds a stack cleanup after a trigger is run and its return values, if any, have been read. The problem was found in a case when an on_schema_init trigger set an on_replace trigger on a space, and that trigger ran during recovery. This led to Lua stack overflows for the aforementioned reasons. Closes #4275
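A sketch of the scenario that exposed the problem (box.ctl.on_schema_init and space:on_replace are the triggers named above; the trigger body is illustrative):

```lua
box.ctl.on_schema_init(function()
    box.space._space:on_replace(function(old_tuple, new_tuple)
        -- Runs on every _space replace during recovery; before the fix,
        -- each invocation left values on the Lua stack, eventually
        -- overflowing it.
    end)
end)
```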
-
Vladimir Davydov authored
Now that we don't need to take the schema lock for checkpointing, it is only used to synchronize concurrent space modifications (drop, truncate, alter). Actually, a global lock is far too heavy a means to achieve this goal: we only care about forbidding concurrent modifications of the same space, while concurrent modifications of different spaces should work just fine. So this patch replaces the global schema lock with per-space locking.

A space lock is held while alter_space_do() is in progress, so as to make sure that while AlterSpaceOp::prepare() is performing a potentially yielding operation, such as building a new index, the space struct doesn't get freed from under our feet. Note, the lock is released right after the index build is complete, before the transaction is committed to WAL, so if the transaction is non-yielding it can modify the space again in the next statement (this is impossible now, but will be done in the scope of the transactional DDL feature).

If alter_space_do() sees that the space is already locked, it bails out and throws an error. This should be fine, because long-lasting operations involving a schema change, such as building an index, are rare and only performed under the supervision of the user, so throwing an error rather than waiting seems adequate.

Removal of the schema lock allows us to remove the latch_steal() helper and the on_begin_stmt txn trigger altogether, as they were introduced solely to support locking. This is a prerequisite for transactional DDL, because it's unclear how to preserve the global schema lock while allowing several DDL statements to be combined in the same transaction.
-
Vladimir Davydov authored
Currently, we always log a vinyl index creation in the vylog file synchronously, i.e. wait for the write to complete successfully. This makes any index creation a yielding operation, even if the target space is empty. To implement transactional DDL for non-yielding statements, we need to eliminate yields in this case. We can do that by simply using vy_log_try_commit() instead of vy_log_commit() for logging index creation, because we can handle a missing VY_LOG_PREPARE_INDEX record during recovery - that code has been there since before commit dd0827ba ("vinyl: log new index before WAL write on DDL"), which split index creation into PREPARE and COMMIT stages - so all we need to do is slightly modify the test.

The reason I'm doing this now, in the series removing the schema lock, is that removing the schema lock without making space truncation non-yielding (remember, space truncation basically drops and recreates all indexes) may result in a failure when executing space.truncate() from concurrent fibers, which is rather unexpected. In particular, this is checked by engine/truncate.test.lua. So to prevent the test failure once the schema lock is removed (see the next patch), let's make empty index creation non-yielding right now.
-
Vladimir Davydov authored
Memtx checkpointing proceeds as follows: first we open iterators over the primary indexes of all spaces and save them to a list, then we start a thread that uses the iterators to dump space contents to a snap file. To avoid accessing a freed tuple, we put the small allocator into delayed free mode. However, this doesn't prevent an index from being dropped, so we also take the schema lock to lock out any DDL operation that could potentially destroy a space or an index. Note, vinyl doesn't need this lock, because it implements index reference counting under the hood.

Actually, we don't really need to take a lock - instead we can simply postpone index destruction until checkpointing is complete, similarly to how we postpone destruction of individual tuples. We even have all the infrastructure for this - delayed garbage collection. So this patch tweaks it a bit to delay the actual index destruction until after checkpointing is complete. This is a step towards removal of the schema lock, which stands in the way of transactional DDL.
-
- Jul 04, 2019
-
-
Alexander V. Tikhonov authored
Implemented a GitLab CI testing process in addition to the existing Travis CI. The new testing process is added to run tests faster. It requires controlling the load on the machines to avoid flaky failures caused by timeouts, and GitLab CI allows us to run testing on our own machines.

Created 2 stages, for testing and for deploying packages. The testing stage contains the following jobs that are run for all branches:

* Debian 9 (Stretch): release/debug gcc.
* Debian 10 (Buster): release clang8 + lto.
* OSX 14 (Mojave): release.
* FreeBSD 12: release gcc.

And the following jobs that are run only for long-term branches (release branches: for now these are 1.10, 2.1 and master):

* OSX 13 (Sierra): release clang.
* OSX 14 (Mojave): release clang + lto.

The deployment stage contains the same jobs as we have in Travis CI. However, they just build tarballs and packages: they don't push them to S3 and packagecloud.

In order to run full testing on a short-term branch, one can name it with the '-full-ci' suffix. Additional manual work is needed when dependencies are changed in the .travis.mk file ('deps_debian' or 'deps_buster_clang_8' goals):

| make GITLAB_USER=foo -f .gitlab.mk docker_bootstrap

This command pushes docker images into the GitLab Registry, and then they are used in testing. Pre-built images speed up testing.

Fixes #4156
-
Vladimir Davydov authored
The test checks that files left over after rebootstrap are removed by the garbage collector. It does that by printing file names to the result file. This is inherently unstable: should timing change, we can easily get an extra dump or compaction, resulting in a different set of files and hence a test failure. Let's rewrite the test so that it checks that files are actually removed, using fio.path.exists().
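A minimal sketch of such a check, assuming test-run's wait_cond helper (the file path is illustrative):

```lua
local fio = require('fio')

-- Poll until the garbage collector actually removes the stale run file,
-- instead of asserting on an exact file listing in the result file.
test_run:wait_cond(function()
    return not fio.path.exists('vinyl_test/512/0/00000000000000000002.run')
end)
```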
-
Vladimir Davydov authored
Timeout injections are unstable and difficult to use. Injecting a delay is much more convenient.
-
Vladimir Davydov authored
ERROR_INJECT_YIELD yields the current fiber by calling fiber_sleep(0.001) while the given error injection is set. ERROR_INJECT_SLEEP suspends the current thread by calling usleep(1000) while the given error injection is set.
-
Vladimir Davydov authored
When we implement transactional DDL, txn_last_stmt won't necessarily point to the right statement on commit or rollback, so we must avoid using it. Note, replicas are still registered/unregistered in the on_commit trigger, but that's okay, as we don't really need the _cluster space to be in sync with the replica set.
-
Vladimir Davydov authored
When we implement transactional DDL, txn_last_stmt won't necessarily point to the right statement on commit or rollback, so we must avoid using it. While we are at it, let's also make sure that changes are propagated to Lua on replace, not on commit, by moving the on_alter_space trigger invocation appropriately.
-
Vladimir Davydov authored
When we implement transactional DDL, txn_last_stmt won't necessarily point to the right statement on commit or rollback, so we must avoid using it.
-
Vladimir Davydov authored
When we implement transactional DDL, txn_last_stmt won't necessarily point to the right statement on commit or rollback, so we must avoid using it.
-
Vladimir Davydov authored
A sequence isn't supposed to roll back to the old value if the transaction it was used in is aborted for some reason. However, if a sequence is dropped, we do want to restore the original value on rollback so that we don't lose it on an unsuccessful attempt to drop the sequence.
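The intended semantics, sketched in Lua (the sequence name is illustrative):

```lua
local s = box.schema.sequence.create('seq')

box.begin()
s:next()       -- returns 1
box.rollback() -- aborting the transaction does NOT rewind the sequence
s:next()       -- returns 2, not 1

-- Dropping the sequence, however, is undone on rollback, so its current
-- value is not lost on an unsuccessful drop attempt.
```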
-
Vladimir Davydov authored
_space_sequence changes are not rolled back properly. Fix it, keeping in mind that our ultimate goal is to implement transactional DDL, which implies that all changes to the schema should be done synchronously, i.e. on_replace, not on_commit.
-
Vladimir Davydov authored
To implement transactional DDL, we must make sure that in-memory schema is updated synchronously with system space updates, i.e. on_replace, not on_commit. Note, to do this in the case of the sequence cache, we have to rework the way sequences are exported to Lua - make on_alter_sequence similar to how the on_alter_space and on_alter_func triggers are implemented.
-