- Mar 05, 2020
-
-
Serge Petrenko authored
libcurl has a built-in threaded resolver used for asynchronous DNS requests, however, when DNS server is slow to respond, the request still hangs tarantool until it is finished. The reason is that curl calls thread_join on the resolving thread internally upon timeout, making the calling thread hang until resolution has ended. Use c-ares as an asynchronous resolver instead to eliminate the problem. Closes #4591 (cherry picked from commit 23837076)
-
Maria authored
It was possible to leak user password through setting 'replication' configuration option in first box.cfg invocation. This happened due to unconditional logging in load_cfg function. The patch introduces conditional logging. Closes #4493 (cherry picked from commit 3ce08a3e9cf0386d93ecc694aa4c4f99056ae7ca)
-
- Mar 03, 2020
-
-
Serge Petrenko authored
When checking whether rejoin is needed, the replica loops through all the instances in box.cfg.replication, which makes it believe that there is a master holding files needed by it, since it accounts for itself just like all other instances. So make the replica skip itself when looking for an instance which holds files needed by it, and when determining whether rebootstrap is needed. We already have a working test for the issue, but it missed the issue due to replica.lua replication settings. Fix replica.lua to optionally include itself in box.cfg.replication, so that the corresponding test works correctly. Closes #4759 (cherry picked from commit dbcfaf70)
-
- Mar 02, 2020
-
-
sergepetrenko authored
We have a mechanism for restoring rows originating from an instance that suffered a sudden power loss: remote masters resend the instance's rows received before a certain point in time, defined by the remote master's vclock at the moment of subscribe. However, this is useful only on initial replication configuration, when an instance has just recovered, so that it can receive what it has relayed but hasn't synced to disk. In other cases, when an instance is operating normally and master-master replication is configured, the mechanism described above may lead to the instance re-applying its own rows, coming from a master it has just subscribed to. To fix the problem, do not relay rows coming from a remote instance if the instance has already recovered. Closes #4739 (cherry picked from commit ed2e1430)
-
Serge Petrenko authored
Add a filter for relay to skip rows coming from unwanted instances. A list of instance ids whose rows the replica doesn't want to fetch is encoded together with the SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER. Filtering rows is needed to prevent an instance from fetching its own rows from a remote master, which is useful on initial configuration and harmful on resubscribe. Prerequisite #4739, #3294
@TarantoolBot document
Title: document new binary protocol key and subscribe request changes
Add key `IPROTO_ID_FILTER = 0x51` to the internals reference. This is an optional key used in the SUBSCRIBE request, followed by an array of ids of instances whose rows won't be relayed to the replica. The SUBSCRIBE request is supplemented with an optional field of the following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRAY  |
|                    |
+====================+
```
The field is encoded only when the id list is not empty. (cherry picked from commit 45de9907)
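The optional field described in this message can be sketched as a tiny hand-rolled MessagePack encoder. This is only an illustration, not Tarantool's code: the function name `encode_id_filter` is hypothetical, and the sketch assumes at most 15 ids, each no larger than 127, so that a MessagePack fixarray header and positive fixints are sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key value taken from the commit message. */
enum { IPROTO_ID_FILTER = 0x51 };

/*
 * Hypothetical encoder (not Tarantool's API): appends the pair
 * ID_FILTER : [id, id, ...] to buf.
 * Returns the number of bytes written, or 0 when nothing is encoded
 * (empty list, list too long for this sketch, or buffer too small).
 */
static size_t
encode_id_filter(uint8_t *buf, size_t cap, const uint8_t *ids, size_t n)
{
	assert(buf != NULL);
	if (n == 0)
		return 0; /* the field is encoded only for a non-empty list */
	if (n > 15 || cap < n + 2)
		return 0; /* sketch limitation: fixarray only */
	size_t pos = 0;
	buf[pos++] = IPROTO_ID_FILTER;     /* positive fixint key, 0x51 */
	buf[pos++] = (uint8_t)(0x90 | n);  /* fixarray header with n items */
	for (size_t i = 0; i < n; i++) {
		if (ids[i] > 0x7f)
			return 0; /* sketch limitation: positive fixint only */
		buf[pos++] = ids[i];
	}
	return pos;
}
```

For the id list {1, 3} this produces the four bytes 0x51 0x92 0x01 0x03. A real encoder would also need the wider array and integer formats.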
-
Serge Petrenko authored
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't fire in release builds, of course. Let's at least warn the user on an attempt to write a record with a duplicate or otherwise broken lsn, and not follow such an lsn. Follow-up #4739 (cherry picked from commit e0750262)
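The idea of warning on a bad lsn instead of relying on an assertion can be sketched as follows. The types and names here (`struct demo_vclock`, `demo_vclock_follow_checked`) are hypothetical stand-ins, not the real vclock API, and the warning goes to stderr rather than to Tarantool's logger.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { DEMO_VCLOCK_MAX = 32 }; /* assumed size, for illustration only */

struct demo_vclock {
	int64_t lsn[DEMO_VCLOCK_MAX];
};

/*
 * Follow the lsn only if it is strictly greater than the previous one.
 * Returns 0 when the lsn was followed and -1 when it was rejected.
 */
static int
demo_vclock_follow_checked(struct demo_vclock *vc, uint32_t replica_id,
			   int64_t lsn)
{
	assert(replica_id < DEMO_VCLOCK_MAX);
	int64_t prev = vc->lsn[replica_id];
	if (lsn <= prev) {
		/* Duplicate or broken lsn: warn and do not follow it. */
		fprintf(stderr, "warning: ignoring lsn %lld for replica %u, "
			"previous lsn is %lld\n", (long long)lsn,
			(unsigned)replica_id, (long long)prev);
		return -1;
	}
	vc->lsn[replica_id] = lsn;
	return 0;
}
```

Unlike an assertion, this check stays active in release builds, which is the point of the change.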
-
Serge Petrenko authored
is_orphan status check is needed by applier in order to tell relay whether to send the instance's own rows back or not. Prerequisite #4739 (cherry picked from commit 7b83b73d)
-
- Feb 27, 2020
-
-
Alexander Turenko authored
After the #4736 regression fix (which in fact just reverts the new logic in small) it is again possible that a fiber's region holds memory for a while, but releases it eventually. When the used memory exceeds the 128 KiB threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and subtracts them from the region_used() metric. But until this point those slabs are accounted in region_used() and so in fiber.info() metrics. This commit fixes flakiness of test cases of the following kind:
| fiber.info()[fiber.self().id()].memory.used -- should be zero
| <...workload...>
| fiber.info()[fiber.self().id()].memory.used -- should be zero
The problem is that the first `<...>.memory.used` value may be non-zero. It depends on the previous tests that were executed on this tarantool instance. The obvious way to solve it would be to print differences between `<...>.memory.used` values before and after a workload instead of absolute values. This, however, does not work, because the first slab in a region can be almost full at the point where a test case starts, so the next slab will be acquired from slab_cache. This means that the previous slab becomes 'garbage' and will not be collected until the 128 KiB threshold is exceeded: the latter `<...>.memory.used` check will return a bigger value than the former one. However, if the threshold is reached during the workload, the latter check may show a lesser value than the former one. In short, the test case would be unstable after this change. It is resolved by restarting the tarantool instance before such test cases to ensure that there are no 'garbage' slabs in the current fiber's region. Note: This works only if a test case reserves only one slab at a time: otherwise some memory may be held after the case (and so a memory check after a workload will fail). However, it seems that our cases are small enough not to trigger this situation. A call of region_free() would be enough, but we have no Lua API for it. Fixes #4750. (cherry picked from commit d6cf327f)
-
- Feb 24, 2020
-
-
Vladislav Shpilevoy authored
The bug was in an attempt to update a record in _space_sequence in-place, to add field path and number. This was not properly supported by the system space's trigger, and was banned in the previous patch of this series. But delete + tuple update + insert work fine. The patch uses them. To test it, the old disabled and heavily outdated xlog/upgrade.test.lua was replaced with a smaller analogue, which is supposed to be created separately for each upgrade bug, according to the new policy of creating test files. The patch tries to make it easy to add new upgrade tests and snapshots. A new test should consist of a fill.lua script to populate spaces, a snapshot, the needed xlogs, and a .test.lua file. The fill script and binaries should be in a folder named after the test file, which is located in the version folder. Like this:
 xlog/
 |
 + <test_name>.test.lua
 |
 +- upgrade/
    |
    +- <version>/
    |   |
    |   +-<test_name>/
    |       |
    |       +- fill.lua
    |       +- *.snap
    |       +- *.xlog
The version folder name is supposed to say explicitly which version the files in it have. Closes #4771 (cherry picked from commit 6d45a41e)
-
Vladislav Shpilevoy authored
Anyway this does not work for generated sequences. A proper support of update would complicate the code and won't give anything useful. Part of #4771 (cherry picked from commit 1a84b80e)
-
Vladislav Shpilevoy authored
box.internal.bootstrap() before doing anything turns off system space triggers, because it is likely to do some hard changes violating existing rules. And eliminates data from all system spaces to fill it from the scratch. Each time when a new space is added, its erasure and turning off its triggers should have been called explicitly here. As a result it was not done sometimes, by accident. For example, triggers were not turned off for _sequence_data, _sequence, _space_sequence. Content removal wasn't done for _space_sequence. The patch makes a generic solution which does not require manual patching of trigger manipulation and truncation anymore. The bug was discovered while working on #4771, although it is not related. (cherry picked from commit e1c7d25f)
-
Cyrill Gorcunov authored
If we are unable to revert the guard page back to read|write we should never use such a slab again. Initially I thought of just putting a panic here and exiting, but that is too destructive. I think it is better to print an error and continue. If the node admin ignores this message, then at some moment in the future there won't be any slabs left for use and creating new fibers will be prohibited. In the future (hopefully the near one) we plan to drop guard pages to prevent VMA fracturing and use stack marks instead.
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
(cherry picked from commit 8d53fadc)
-
Cyrill Gorcunov authored
Both madvise and mprotect calls can fail for various reasons, mostly because of a lack of free memory in the system. We log such cases via say_x helpers, but this is not enough. In particular, tarantool/memcached relies on the diag error being set to detect an error condition:
| expire_fiber = fiber_new(name, memcached_expire_loop);
| const box_error_t *err = box_error_last();
| if (err) {
|     say_error("Can't start the expire fiber");
|     say_error("%s", box_error_message(err));
|     return -1;
| }
Thus let's use the diag_set() helper here and, instead of macros, use inline functions for better readability. Fixes #4722
Reported-by: Alexander Turenko <alexander.turenko@tarantool.org>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
(cherry picked from commit c6752297)
-
- Feb 23, 2020
-
-
Vladislav Shpilevoy authored
Before the patch there were the rules:
* float +/- double = double
* double +/- double = double
* float +/- float = float
The rules were applied regardless of values. That led to a problem when a float + float result exceeding the maximal float value could fit into double, but was stored as an infinity. The patch makes it so that if the result of a floating point arithmetic operation fits into float, it is stored as float. Otherwise it is stored as double, regardless of the initial types. As a side effect, this saves some memory in cases when doubles can be stored as floats, and therefore take 4 fewer bytes. These cases are rare, though, because any non-integer value stored in a double may have a long garbage tail in its fraction. Closes #4701 (cherry picked from commit fef4fdfc)
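The "store as float if the result fits" rule can be sketched as below. The commit message does not spell out what "fits" means. This sketch assumes it means that the value is within the float range and survives a round trip through float unchanged. The helper names `fits_in_float` and `stored_size` are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed meaning of "fits": the value is within float range and
 * converting it to float and back loses nothing.
 */
static bool
fits_in_float(double v)
{
	/* Range check first: an out-of-range conversion is undefined. */
	if (!(v >= -FLT_MAX && v <= FLT_MAX))
		return false; /* also rejects NaN */
	volatile float narrowed = (float)v; /* volatile blocks excess precision */
	return (double)narrowed == v;
}

/* Number of bytes the result would occupy under the rule above. */
static size_t
stored_size(double result)
{
	assert(sizeof(float) == 4 && sizeof(double) == 8);
	return fits_in_float(result) ? sizeof(float) : sizeof(double);
}
```

Under this reading, 1.5 + 2.25 is stored in 4 bytes, while FLT_MAX + FLT_MAX (computed in double) and 0.1 both need 8 bytes. The 0.1 case matches the remark about a long tail in the fraction.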
-
- Feb 21, 2020
-
-
Alexander V. Tikhonov authored
Our S3 based repositories now reflect packagecloud.io repositories structure. It will allow us to migrate from packagecloud.io w/o much complicating redirection rules on a web server serving download.tarantool.org. Deploy source packages (*.src.rpm) into separate 'SRPM' repository like packagecloud.io does. Changed repository signing key from its subkey to public and moved it to gitlab-ci environment. Follows up #3380 (cherry picked from commit 4dee6890)
-
Alexander V. Tikhonov authored
Enabled Tarantool performance testing on Gitlab-CI for release/master branches and "*-perf" named branches. For this purpose 'perf' and 'cleanup' stages were added to the Gitlab-CI pipeline. Performance testing supports the following benchmarks:
- cbench
- linkbench
- nosqlbench (hash and tree Tarantool run modes)
- sysbench
- tpcc
- ycsb (hash and tree Tarantool run modes)
Benchmarks use scripts from the repository: http://github.com/tarantool/bench-run
Performance testing uses docker images built with docker files from the bench-run repository:
- perf/ubuntu-bionic:perf_master -- parent image with benchmarks only
- perf_tmp/ubuntu-bionic:perf_<commit_SHA> -- child images used for testing Tarantool sources
@Totktonada: Harness and workloads are to be reviewed. (cherry picked from commit 87c68344)
-
- Feb 20, 2020
-
-
Alexander V. Tikhonov authored
Found that on 19.02.2020 APT repositories with packages for Ubuntu 18.10 Cosmic were removed from the Ubuntu archive:
E: The repository 'http://security.ubuntu.com/ubuntu cosmic-security Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu cosmic Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu cosmic-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu cosmic-backports Release' does not have a Release file.
Also found the half a year old message about Ubuntu 18.10 Cosmic EOL: https://fridge.ubuntu.com/2019/07/19/ubuntu-18-10-cosmic-cuttlefish-end-of-life-reached-on-july-18-2019/
Removed Ubuntu 18.10 Cosmic from gitlab-ci and travis-ci testing. (cherry picked from commit 961e8c5f)
-
Vladislav Shpilevoy authored
The server used to crash when any option argument was passed with a value concatenated to it, like this: '-lvalue', '-evalue' instead of '-l value' and '-e value'. However this is a valid way of writing values, and it should not have crashed regardless of its validity. The bug was in usage of 'optind' global variable from getopt() function family. It is not supposed to be used for getting an option's value. It points to a next argv to parse. Next argv != value of current argv, like it was with '-lvalue' and '-evalue'. For getting a current value there is a variable 'optarg'. Closes #4775 (cherry picked from commit 29cfd564)
-
- Feb 19, 2020
-
-
Vladislav Shpilevoy authored
os.setenv() and os.environ() are the Lua API for `extern char **environ;` and `int setenv();`, the access points for environment variables standardized by The Open Group. But the standard does not say a word about environ never changing, so programs can't rely on that. For example, addition of a new variable may cause a realloc of the whole environ array, and therefore a change of its pointer value. That was exactly the case in os.environ(): it was using the value of the environ array remembered when Tarantool started, and os.setenv() could realloc the array and turn the saved pointer into garbage. Closes #4733 (cherry picked from commit 954d4bdc)
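The safe pattern, re-reading the global `environ` on every access instead of caching its value, can be sketched as follows. The helper name `lookup_env` is hypothetical. In real code getenv() does the same job, and the manual walk is shown only to make the pointer handling visible.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv()/unsetenv() declarations */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/*
 * Hypothetical helper: finds NAME by walking the current environ array.
 * The global is read anew on every call and is never saved in a static,
 * because setenv() may reallocate the array and move it.
 */
static const char *
lookup_env(const char *name)
{
	assert(name != NULL);
	size_t len = strlen(name);
	for (char **e = environ; e != NULL && *e != NULL; e++) {
		if (strncmp(*e, name, len) == 0 && (*e)[len] == '=')
			return *e + len + 1; /* points at the value part */
	}
	return NULL;
}
```

A variable added with setenv() after startup is visible through this helper, whereas a pointer to environ saved at startup may by then refer to freed memory.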
-
Kirill Yukhin authored
Revert "build: introduce LUAJIT_ENABLE_PAIRSMM flag" Related to #4770 (cherry picked from commit 04dd6f43)
-
- Feb 18, 2020
-
-
Oleg Babin authored
After 7fd6c809 (buffer: port static allocator to Lua) uri started to use static_allocator, a cyclic buffer that is also used in several other modules. However, the situation when the uri.format output is a zero-length string was not handled properly: ffi.string could return data that was previously written to the static buffer, because without an explicit length it treats the first zero byte as the string terminator. To prevent such a situation, let's pass the result length explicitly. Closes #4779 (cherry picked from commit 57f6fc93)
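The same pitfall can be shown in plain C, as an analogy to the Lua ffi.string() case rather than the actual fix. When a formatter writes zero bytes into a reused buffer, only the explicit length says that the result is empty. The helper name `copy_result` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copies exactly len bytes of a formatter's result
 * out of a shared buffer and terminates the copy. The caller frees it.
 * The buffer content past len is never inspected.
 */
static char *
copy_result(const char *buf, size_t len)
{
	assert(buf != NULL);
	char *out = malloc(len + 1);
	if (out == NULL)
		return NULL;
	memcpy(out, buf, len);
	out[len] = '\0';
	return out;
}
```

If the shared buffer still holds "stale-data" from an earlier user and the formatter reports a length of 0, `copy_result(buf, 0)` returns an empty string, while a terminator-based read such as strlen(buf) would see all ten stale bytes.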
-
- Feb 15, 2020
-
-
Olga Arkhangelskaia authored
When json.decode is used with 2 arguments, 2nd argument seeps out to the json configuration of the instance. Moreover, due to current serializer.cfg implementation it remains invisible while checking settings using json.cfg table. This fixes commit 6508ddb7 ('json: fix stack-use-after-scope in json_decode()'). Closes #4761 (cherry picked from commit f54f4dc0)
-
Vladislav Shpilevoy authored
box_process_call/eval() in the end check if there is an active transaction. If there is, it is rolled back, and an error is set. But rollback is not needed anymore, because anyway in the end of the request the fiber is stopped, and its not finished transaction is rolled back. Just setting of the error is enough. Follow-up #4662 (cherry picked from commit f5d51448)
-
Vladislav Shpilevoy authored
Fiber.storage was not deleted when created in a fiber started from the thread pool used by IProto requests. The problem was that fiber.storage was created and deleted in Lua land only, assuming that only Lua-born fibers could have it. But in fact any fiber can create a Lua storage. Including the ones used to serve IProto requests. Not deletion of the storage led to a possibility of meeting a non-empty fiber.storage in the beginning of an iproto request, and to not deletion of the memory caught by the storage until its explicit nullification. Now the storage destructor works for any fiber, which managed to create the storage. The destructor unrefs and nullifies the storage. For destructor purposes the fiber.on_stop triggers were reworked. Now they can be called multiple times during fiber's lifetime. After every request done by that fiber. Closes #4662 Closes #3462 @TarantoolBot document Title: Clarify fiber.storage lifetime Fiber.storage is a Lua table created when it is first accessed. On the site it is said that it is deleted when fiber is canceled via fiber:cancel(). But it is not the full truth. Fiber.storage is destroyed when the fiber is finished. Regardless of how is it finished - via :cancel(), or the fiber's function did 'return', it does not matter. Moreover, from that moment the storage is cleaned up even for pooled fibers used to serve IProto requests. Pooled fibers never really die, but nonetheless their storage is cleaned up after each request. That makes possible to use fiber.storage as a full featured request-local storage. Fiber.storage may be created for a fiber no matter how the fiber itself was created - from C, from Lua. For example, a fiber could be created in C using fiber_new(), then it could insert into a space, which had Lua on_replace triggers, and one of the triggers could create fiber.storage. That storage will be deleted when the fiber is stopped. Another place where fiber.storage may be created - for replication applier fiber. 
Applier has a fiber from which it applies transactions from a remote instance. In case the applier fiber somehow creates a fiber.storage (for example, from a space trigger again), the storage won't be deleted until the applier fiber is stopped. (cherry picked from commit 7692e08f)
-
Vladislav Shpilevoy authored
Fiber.storage is a table, available from anywhere in the fiber. It is destroyed after the fiber function is finished. That provides a reliable fiber-local storage, similar to thread-local in C/C++. But there is a problem that the storage may be created via one struct lua_State, and destroyed via another. Here is an example:
    function test_storage()
        fiber.self().storage.key = 100
    end
    box.schema.func.create('test_storage')
    _ = fiber.create(function()
        box.func.test_storage:call()
    end)
There are 3 struct lua_State: tarantool_L - the global, always alive state; L1 - the Lua coroutine of the fiber, created by fiber.create(); L2 - the Lua coroutine created by that fiber to execute test_storage(). Fiber.storage is created on the stack of L2 and referenced by the global LUA_REGISTRYINDEX. Then it is unreferenced from L1 when the fiber is being destroyed. That is generally ok as long as the storage object is always in LUA_REGISTRYINDEX, which is shared by all Lua states. But soon, during destruction of the fiber.storage, there will be only tarantool_L and the original L2. The original L2 may already be deleted by the time the storage is being destroyed. So this patch makes unref of the storage go via the reliable tarantool_L. Needed for #4662 (cherry picked from commit 5b3e8a72)
-
- Feb 14, 2020
-
-
Cyrill Gorcunov authored
Every new error introduced into the error engine causes a massive update in the test, even if only one key is introduced. To minimize the diff output, it is better to print them in sorted order.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
(cherry picked from commit 95b9a48d)
-
- Feb 06, 2020
-
-
Chris Sosnin authored
We should first check that primary key is not NULL. Closes #4745 (cherry picked from commit e9aa3784)
-
- Feb 05, 2020
-
-
Leonid Vasiliev authored
LuaJIT records traces while interpreting Lua bytecode (provided it is hot enough) in order to compile the corresponding execution flow to machine code. A Lua/C call aborts trace recording, but an FFI call does not abort it per se. If code inside an FFI call yields to another fiber while a trace is being recorded, and the new current fiber interprets Lua bytecode too, then unrelated instructions will be recorded to the current trace. In short, we should not yield the current fiber inside an FFI call. There is another problem. Machine code of a compiled trace may sink a value from a Lua state down to a host register, change it and write it back only at trace exit. So the interpreter state may be outdated during the compiled trace execution. A Lua/C call aborts a trace, and so the code inside a callee always sees an actual interpreter state. An FFI call, however, can be turned into a single machine CALL instruction in the compiled code, and if the callee accesses a Lua state, then it may see an irrelevant value. In short, we should not access a Lua state directly or reenter the interpreter from an FFI call. The box.rollback_to_savepoint() function may yield, and another fiber will be scheduled for execution. If this fiber touches a Lua state, then it may see an inconsistent state and the behaviour will be undefined. Note that <struct txn>.id starts from 1: we lean on this fact to use luaL_toint64(), which does not distinguish an unexpected Lua type from cdata<int64_t> with zero value. It seems that this assumption already exists: the code that prepares arguments for 'on_commit' triggers uses luaL_toint64() too (see lbox_txn_pairs()). Fixes #4427
Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>
Reviewed-by: Igor Munkin <imun@tarantool.org>
(cherry picked from commit 34234427)
-
- Feb 04, 2020
-
-
Alexander V. Tikhonov authored
We're going to use S3 compatible storage for Deb and RPM repositories instead of the packagecloud.io service. The main reason is that packagecloud.io provides a limited amount of storage, which is not enough for keeping all packages (w/o regular pruning of old versions). Note: At the moment packages are still pushed to packagecloud.io from Travis-CI. Disabling this is out of scope of this patch. This patch implements saving of packages on an S3 compatible storage and regeneration of the repository metadata. The layout is a bit different from the one we have on packagecloud.io.
packagecloud.io:
| - 1.10
| - 2.1
| - 2.2
| - ...
S3 compatible storage:
| - live
|   - 1.10
|   - 2.1
|   - 2.2
|   - ...
| - release
|   - 1.10
|   - 2.1
|   - 2.2
|   - ...
Both 'live' and 'release' repositories track release branches (named as <major>.<minor>) and the master branch. The difference is that 'live' is updated on every push, but 'release' is only for tagged versions (<major>.<minor>.<patch>.0). Packages are also built on '*-full-ci' branches, but only for testing purposes: they are not pushed anywhere. The core logic is in the tools/update_repo.sh script, which implements the following flow:
- create metadata for new packages
- fetch relevant metadata from the S3 storage
- push new packages to the S3 storage
- merge and push the updated metadata to the S3 storage
The script uses 'createrepo' for RPM repositories and 'reprepro' for Deb repositories. Closes #3380 (cherry picked from commit 05d3ed4b)
-
- Jan 29, 2020
-
-
Mergen Imeev authored
This patch makes the INSTEAD OF DELETE trigger work for every row in VIEW. Prior to this patch, it worked only once for each group of non-unique rows. Also, this patch adds tests to check that the INSTEAD OF UPDATE trigger works for every row in VIEW. Closes #4740 (cherry picked from commit 6ddccda4)
-
Kirill Yukhin authored
Revert "Free all slabs on region reset" commit. Closes #4736 (cherry picked from commit fc8d42f50073f9b4f1510ce55ee514af14f672af)
-
- Jan 24, 2020
-
-
Serge Petrenko authored
Update decNumber library to silence the build warning produced on too long integer constant. (cherry picked from commit aab03a73)
-
Kirill Yukhin authored
Fix build on Mac with gcc and XCode 11 Part of https://github.com/tarantool/tarantool/issues/4580
-
- Jan 21, 2020
-
-
Cyrill Gorcunov authored
Test multireturn in lua output mode and lack of a parameter in '\set output <...>' command.
Co-developed-by: Alexander Turenko <alexander.turenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
(cherry picked from commit cdf502c6)
-
Cyrill Gorcunov authored
If the output format is not specified we should exit with a more readable error message. Fixes #4638
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
(cherry picked from commit 9cc2c9c5)
-
Cyrill Gorcunov authored
Currently we handle only the first member of a multireturn statement. Fix it by processing each element separately. n.b.: While at this file, add vim settings.
| tarantool> \set output lua
| true;
| tarantool> 1,2,3,4
| 1, 2, 3, 4;
Fixes #4604
Reported-by: Alexander Turenko <alexander.turenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
(cherry picked from commit d7cbd007)
-
Vladislav Shpilevoy authored
A transaction adds a redo log for each statement. The log is an xrow header. Some requests don't have a header (local requests), some do (from a remote client, from replication). When a request had a header, it was written as is to WAL. But requests from a remote client have an xrow header that is barely filled. Most of its fields are default values, usually 0, including the group id. Indeed, remote clients should not care about setting such deep system fields. That led to a problem when a space had a local group id (!= 0), but it was ignored because in a request header from a remote client the group id was default (== 0). In summary, it was possible to force Tarantool to replicate a replica-local space. Now the group id setting is server-authoritative. Box always sets it regardless of what is present in an xrow header received from a client. Thanks Kostja Osipov (@kostja) for the diagnostics and the solution. Closes #4729 (cherry picked from commit 3d0f12968a8f5349eedb1778d6e29950f04785d5)
-
- Jan 17, 2020
-
-
Chris Sosnin authored
'pragma collation_list' uses _collation space, although user may have no access to it. Thus, we replace it with the corresponding view. Closes #4713 (cherry picked from commit 28370f19)
-
- Jan 16, 2020
-
-
Oleg Babin authored
Usually functions return a pair `nil, err`, and callers expect err to be a string. Let's make the behaviour of the error object closer to a string and define the __concat metamethod. The case of the error "error_mt.__concat(): neither of args is an error" is not covered by tests because of #4723 Closes #4489 (cherry picked from commit 935db173)
-
- Jan 14, 2020
-
-
Maria authored
A struct of type tuple_format is passed as an argument to tuple_format_unref(), where it might be freed. In that case no further references to the format's fields should take place.
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Closes #4658 (cherry picked from commit c08b94ed)
-