- Aug 07, 2024
Sergey Bronnikov authored
The patch fixes a number of typos in datetime source code. NO_CHANGELOG=codehealth NO_DOC=codehealth NO_TEST=codehealth (cherry picked from commit c9c5b9f139ccbf5372d1568827fbb50bec7239bb)
-
- Aug 01, 2024
Georgiy Belyanin authored
Since tarantool/luajit@a16313f, strings holding doubles with a large exponent are not considered convertible to a number. This broke encoding Lua objects to YAML, because single quotes were no longer considered necessary for decoding. This commit wraps every string containing an infinite double value in single quotes. Closes #10164 NO_DOC=bug fix (cherry picked from commit 7c3f42590240525d2e543305b6c289ddb30054a2)
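Below is a minimal sketch of the extra check. It uses a hypothetical helper name, `is_infinite_double_str()`, and is not the actual encoder code. A string whose whole content parses as a double that overflows to infinity is treated as numeric, so the encoder keeps it in single quotes.

```c
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns true if `s` is a complete numeric literal
 * whose value overflows to +/-infinity, such as "1e999". Such strings
 * used to pass the generic "convertible to number" check; after the
 * LuaJIT change they need an explicit test to stay single-quoted.
 * Note: strtod() also accepts spellings like "inf" and "infinity", so
 * this sketch conservatively treats those as needing quotes too.
 */
static bool
is_infinite_double_str(const char *s)
{
	char *end;
	double d = strtod(s, &end);
	return end != s && *end == '\0' && isinf(d);
}
```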
-
- Jul 31, 2024
Maksim Tiushev authored
This patch encloses IPv6 addresses in brackets when calling uri.format (as per RFC 2732). Closes #9556 NO_DOC=bugfix (cherry picked from commit a49ec23b85edb684744eb525427465dfa4f660e1)
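For illustration only, the bracketing rule can be sketched in C. `format_host_port()` is a hypothetical helper, not the `uri` module's implementation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "host:port" into `buf`, enclosing the host
 * in brackets when it is an IPv6 literal (contains ':'), as RFC 2732
 * requires. Assumes `host` is stored without brackets.
 * Returns what snprintf() returns.
 */
static int
format_host_port(char *buf, size_t size, const char *host, const char *port)
{
	if (strchr(host, ':') != NULL)
		return snprintf(buf, size, "[%s]:%s", host, port);
	return snprintf(buf, size, "%s:%s", host, port);
}
```

Without brackets, a string like `::1:3301` is ambiguous, because `3301` could be read as the last group of the address rather than as the port.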
-
Maksim Tiushev authored
Before this patch, `uri.parse(<uri-string-with-ipv6-address>)` did not work correctly. In particular, it did not parse an IPv6 address if it contained `A-F`. It is a regression caused by commit 1376aad9 ("Refactor src/uri.rl to support RFC3986 and add Lua bindings"). This patch fixes the bug where the characters `A-F` were not supported in IPv6 addresses. Part of #9556 NO_DOC=bugfix (cherry picked from commit 2eefc56a3e08a60f0b71e33621be851794131546)
-
Alexander Turenko authored
The motivation is to exclude the file from diffs on GitHub's web interface [1] and to exclude it from checkpatch checks [2]. [1]: https://docs.github.com/en/repositories/working-with-files/managing-files/customizing-how-changed-files-appear-on-github [2]: https://github.com/tarantool/checkpatch/pull/75 NO_DOC=the patch is about the development process, it doesn't change anything in the shipped product NO_CHANGELOG=see NO_DOC NO_TEST=see NO_DOC (cherry picked from commit fbea9862b087e161a48dc0ac181a1da06cc0ce9d)
-
- Jul 24, 2024
Andrey Saranchin authored
When checkpoint fails, we abort it in all engines even if it wasn't started successfully. If it fails right from the start so that checkpoint in memtx wasn't started, assertion in `memtx_engine_abort_checkpoint` fails - memtx doesn't expect that checkpoint will be aborted if it failed to start. Let's do the same thing as vinyl does - no-op if there is no checkpoint in progress. Closes #10265 NO_CHANGELOG=reproducible only with error injection NO_DOC=bugfix (cherry picked from commit 6b484622259c01a2468b1f248dd6f1bcdc227021)
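The sketch below shows the "no-op if nothing is in progress" pattern. It uses a hypothetical `struct engine_sketch` instead of the real memtx or vinyl engine structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified engine state for illustration only. */
struct engine_sketch {
	bool checkpoint_in_progress;
	int aborted; /* number of checkpoints actually aborted */
};

static int
engine_begin_checkpoint(struct engine_sketch *e, bool fail)
{
	if (fail)
		return -1; /* the checkpoint never started */
	e->checkpoint_in_progress = true;
	return 0;
}

/* Safe to call even if begin failed: no checkpoint, nothing to abort. */
static void
engine_abort_checkpoint(struct engine_sketch *e)
{
	if (!e->checkpoint_in_progress)
		return;
	e->checkpoint_in_progress = false;
	e->aborted++;
}
```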
-
Vladimir Davydov authored
We've had a number of issues when Tarantool was permanently broken (unable to recover after restart) because of a bad vylog record. The `force_recovery` mode didn't help so the user would have no other choice but to rebootstrap. A funny thing is those bugs were usually caused by a race between the garbage collector and dump/compaction when a vylog record was written for a dropped index. The worst thing that could happen if we ignored such a bad record is an unused run file not deleted from disk. Apparently, this is better than a permanent recovery failure so let's support the `force_recovery` mode in vylog. The tricky part here is handling checkpoint after restart. The problem is that to create a vylog checkpoint, we load the previous vylog file so we have to ignore errors if it was loaded in the `force_recovery` mode. Closes #10292 NO_DOC=bug fix (cherry picked from commit c68e8a8e029d849d68c6018ed00b5a79cc769222)
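The control flow described above can be sketched as follows. Records are plain integers, and `apply_nonnegative()` is a hypothetical stand-in for decoding and processing a vylog record.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record handler: returns 0 on success, -1 if the record
 * is inconsistent (e.g. it refers to an already dropped index). */
typedef int (*apply_record_f)(int record);

/* Example handler for the sketch: negative records are "bad". */
static int
apply_nonnegative(int record)
{
	return record < 0 ? -1 : 0;
}

/*
 * Replay `count` records. Without force_recovery the first bad record
 * aborts recovery; with it, the record is logged and skipped.
 * Returns the number of applied records, or -1 on failure.
 */
static int
replay_records(const int *records, int count, apply_record_f apply,
	       bool force_recovery)
{
	int applied = 0;
	for (int i = 0; i < count; i++) {
		if (apply(records[i]) == 0) {
			applied++;
			continue;
		}
		if (!force_recovery)
			return -1;
		fprintf(stderr, "ignoring bad record #%d\n", i);
	}
	return applied;
}
```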
-
Vladimir Davydov authored
The vinyl metadata log processor allocates its internal objects either from malloc or region, neither of which should fail for small allocations. Let's switch to xalloc to simplify the code. A good thing about this change is that now we can ignore all errors raised by vy_log_record_decode() and vy_recovery_process_record() if the force_recovery flag is set (see the next commit). Needed for #10292 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit b6f015e98b7c30861dc33032ba7eca47de0cc198)
-
Vladimir Davydov authored
Temporary allocations from a region don't fail so let's use xalloc to simplify the code. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 28d51f8076b390c50d5b18f8e767c1eb540e5dcc)
-
Vladimir Davydov authored
Let's log the new value when an error injection is set in order to ease debugging in tests. NO_DOC=logging NO_TEST=logging NO_CHANGELOG=logging (cherry picked from commit 019bacbe)
-
Ilya Verbin authored
There is not much sense in testing it, but it is sensitive to source code changes, especially `ERRINJ_*_COUNTDOWN` injections, e.g. see commit 697123d0 ("box: use maximal space id instead of _schema.max_id"). Needed for tarantool/tarantool-ee#712 NO_DOC=test NO_CHANGELOG=test (cherry picked from commit dc0fd81c)
-
- Jul 23, 2024
Vladimir Davydov authored
An index can be dropped while a memory dump is in progress. If the vinyl garbage collector happens to delete the index from the vylog by the time the memory dump completes, the dump will log an entry for a deleted index, resulting in an error the next time we try to recover the vylog, like:
```
ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Run 2 committed after deletion
```
or
```
ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Deleted range 9 has run slices
```
We already fixed a similar issue with compaction in commit 29e2931c ("vinyl: fix race between compaction and gc of dropped LSM"). Let's fix this one in exactly the same way: discard the new run without logging it to the vylog on a memory dump completion if the index was dropped while the dump was in progress. Closes #10277 NO_DOC=bug fix (cherry picked from commit ae6a02eb)
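Here is a minimal sketch of the fix. It assumes a hypothetical `struct lsm_sketch` whose `is_dropped` flag is set when the index is dropped. The counters stand in for writing the run to the vylog or discarding it.

```c
#include <stdbool.h>

/* Hypothetical, simplified LSM tree state for illustration only. */
struct lsm_sketch {
	bool is_dropped;
	int logged_runs;
	int discarded_runs;
};

/* Called when a dump finishes writing a new run file. */
static void
dump_complete(struct lsm_sketch *lsm)
{
	if (lsm->is_dropped) {
		/* Dropped while the dump was running: GC may already have
		 * purged it from the vylog, so don't log the run. */
		lsm->discarded_runs++;
		return;
	}
	lsm->logged_runs++; /* stands in for writing the vylog record */
}
```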
-
- Jul 22, 2024
Vladimir Davydov authored
The tuple formats table may be accessed with `tuple_format_by_id()` from any thread, not just tx. For example, it's accessed by a vinyl writer thread when it deletes a tuple. If a thread happens to access the table while it's being reallocated by tx, see `tuple_format_register()`, the accessing thread may crash with a use-after-free or NULL pointer dereference bug, like the one below:
```
# 1 0x64bd45c09e22 in crash_signal_cb+162
# 2 0x76ce74e45320 in __sigaction+80
# 3 0x64bd45ab070c in vy_run_writer_append_stmt+700
# 4 0x64bd45ada32a in vy_task_write_run+234
# 5 0x64bd45ad84fe in vy_task_f+46
# 6 0x64bd45a4aba0 in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*)+16
# 7 0x64bd45c13e66 in fiber_loop+70
# 8 0x64bd45e83b9c in coro_init+76
```
To avoid that, let's make the tuple formats table statically allocated. This shouldn't increase actual memory usage because system memory is allocated lazily, on page fault. The max number of tuple formats (64K) isn't big enough to worry about the increase in virtual memory usage. Closes #10278 NO_DOC=bug fix NO_TEST=mt race (cherry picked from commit a2da1de7)
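The following sketch uses hypothetical names. Unlike a table grown with realloc(), this array never moves, so a lookup by id from another thread cannot read freed memory. It only illustrates the reallocation problem. It does not add the memory barriers that a real cross-thread publication scheme might need.

```c
#include <stddef.h>
#include <stdint.h>

#define FORMAT_ID_MAX UINT16_MAX /* assumption: ids fit in 16 bits (64K) */

struct tuple_format_sketch {
	uint16_t id;
};

/* Statically allocated: never reallocated, so never freed under a
 * concurrent reader. Untouched pages cost no physical memory. */
static struct tuple_format_sketch *formats[FORMAT_ID_MAX + 1];

static int
format_register(struct tuple_format_sketch *f, uint16_t id)
{
	if (formats[id] != NULL)
		return -1; /* id already taken */
	f->id = id;
	formats[id] = f;
	return 0;
}

static struct tuple_format_sketch *
format_by_id(uint16_t id)
{
	return formats[id];
}
```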
-
Vladislav Shpilevoy authored
Can use the regular applier_apply_tx(), they do the same. The latter is just more protective, but that doesn't matter much in this case if the code does a few latch locks. The patch also drops an old test about a double-received row panic during final join. The logic is that exactly the same situation could happen during subscribe, but it was always filtered out by checking replicaset.applier.vclock and skipping duplicate rows. There doesn't seem to be a reason why final join must be any different. It is, after all, the same subscribe logic, but the received rows go into the replica's initial snapshot instead of xlogs. Now it even uses the same txn processing function applier_apply_tx(). The patch also moves the `replication_skip_conflict` option setting after bootstrap is finished. In theory, final join could deliver a conflicting row and it must not be ignored. The problem is that it can't be reproduced without illegal error injection (which would corrupt something in an unrealistic way). But let's move it below bootstrap anyway for clarity. Follow-up #10113 NO_DOC=refactoring NO_CHANGELOG=refactoring (cherry picked from commit da158b9b)
-
Vladislav Shpilevoy authored
No code besides box.cc can now update instance's vclock explicitly. That is a protection against hacks like #9916. Closes #10113 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 19b2cc20)
-
Vladislav Shpilevoy authored
The goal is to make sure that no files except box.cc can change instance_vclock_storage directly. That leads to all sorts of hacks which in turn lead to bugs - #9916 is a good example. Now applier on final join only sends rows into the journal. The journal then is handled by box.cc where vclock is properly updated. Part of #10113 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit fe338ed4)
-
Vladislav Shpilevoy authored
The function writes a single xrow into the journal in a blocking way. It isn't trivial, so it makes sense to keep it as a function, especially given that it will be used more in the next commit. Part of #10113 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 7d10096c)
-
Vladislav Shpilevoy authored
Recovery journal uses the word "recovery" to say that it works with xlogs. For snapshot recovery there is bootstrap_journal. Let's use it during local snapshot recovery. The reasoning is that while right now there is no difference, in the next commits the recovery_journal will do more. Part of #10113 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 2620eb9e)
-
Vladislav Shpilevoy authored
Storing vclock of the instance in replicaset.vclock wasn't right. It wasn't vclock of the whole replicaset. It was local to this instance. There is no such thing as "replicaset vclock". The patch moves it to box.h/cc. Part of #10113 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit f1e8e4e1)
-
Vladislav Shpilevoy authored
During the registration waiting (for registering a new ID or a name), the applier could still be applying master txns received before the registration was started. They could still be inside WAL doing a disk write when the replica sends a register request. Before this commit, it could cause an assertion failure in debug and a double LSN error in release. The reason was that during the registration waiting the applier treated all incoming txns as "final join" txns, i.e. it wasn't checking whether those txns were already received but not committed yet. During the normal subscribe process, the appliers (potentially multiple) protect themselves from that by keeping track of the vclocks which are already applied and also being applied right now (replicaset.applier.vclock). Such protection ensures that receiving the same row from 2 appliers doesn't result in its double write. It also protects from the case when a txn was received and goes to WAL, but then the applier reconnects, resubscribes, and gets the same txn again: it shouldn't be applied. With this patch, the registration waiting after recovery works like subscribe. Registration during recovery would mean bootstrap via join, and outside of recovery it means the instance is already running. Closes #9916 NO_DOC=bugfix (cherry picked from commit 51751f87)
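Below is a simplified sketch of the duplicate filtering described above. It assumes a hypothetical fixed-size `struct vclock_sketch` with 32 components that tracks the highest LSN applied or in flight per replica.

```c
#include <stdbool.h>
#include <stdint.h>

#define VCLOCK_SKETCH_MAX 32 /* assumption: replica ids are in [0, 32) */

struct vclock_sketch {
	int64_t lsn[VCLOCK_SKETCH_MAX];
};

/*
 * Accept a row only if it is newer than everything already applied or
 * in flight from that replica, then advance the clock. Rows re-sent
 * after a reconnect or received from a second applier are skipped.
 */
static bool
applier_accept_row(struct vclock_sketch *in_flight, uint32_t replica_id,
		   int64_t lsn)
{
	if (replica_id >= VCLOCK_SKETCH_MAX)
		return false;
	if (lsn <= in_flight->lsn[replica_id])
		return false;
	in_flight->lsn[replica_id] = lsn;
	return true;
}
```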
-
- Jul 18, 2024
Vladimir Davydov authored
The function `vy_space_build_index`, which builds a new index on DDL, calls `vy_scheduler_dump` on completion. If there's a checkpoint in progress, the latter will wait on `vy_scheduler::dump_cond` until `vy_scheduler::checkpoint_in_progress` is cleared. The problem is `vy_scheduler_end_checkpoint` doesn't broadcast `dump_cond` when it clears the flag. Usually, everything works fine because the condition variable is broadcast on any dump completion, and vinyl checkpoint implies a dump, but under certain conditions this may lead to a fiber hang. Let's broadcast `dump_cond` in `vy_scheduler_end_checkpoint` to be on the safe side. While we are at it, let's also inject a dump delay to the original test to make it more robust. Closes #10267 Follow-up #10234 NO_DOC=bug fix (cherry picked from commit fc3196dc)
-
Nikita Zheleztsov authored
After receiving an async transaction from an old term, applier_apply_tx exits without unlocking the latch. If the same applier tries to subscribe for replication, it fails with an assertion, as the latch is already locked. Let's fix the function that raises the error so that it just sets diag and returns -1. Closes #10073 NO_DOC=bugfix NO_CHANGELOG=no crash on release version (cherry picked from commit 5ce010c5)
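The sketch below shows the corrected pattern with hypothetical stand-ins (`struct latch_sketch`, `check_term()`, `diag_msg`). The check reports the error and returns -1 instead of throwing, and a single exit path releases the latch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a latch and a diagnostics area. */
struct latch_sketch {
	bool locked;
};
static const char *diag_msg;

static void latch_sketch_lock(struct latch_sketch *l) { l->locked = true; }
static void latch_sketch_unlock(struct latch_sketch *l) { l->locked = false; }

/* Sets diag and returns -1 instead of throwing, so cleanup still runs. */
static int
check_term(long txn_term, long current_term)
{
	if (txn_term < current_term) {
		diag_msg = "transaction from an old term";
		return -1;
	}
	return 0;
}

static int
apply_tx_sketch(struct latch_sketch *latch, long txn_term, long current_term)
{
	int rc = -1;
	latch_sketch_lock(latch);
	if (check_term(txn_term, current_term) != 0)
		goto unlock;
	/* ... apply the transaction ... */
	rc = 0;
unlock:
	latch_sketch_unlock(latch); /* released on every path */
	return rc;
}
```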
-
- Jul 16, 2024
Lev Kats authored
Now the `sio_bind` function prints the address into the error message directly instead of relying on the `fd` used in the `bind` call that failed. Previously, `sio_bind` used `sio_socketname_to_buffer` for the error message, effectively trying to print the address bound to `fd`, while the actual error was the failure to bind that address to that socket in the first place. Fixes #5925 NO_DOC=bugfix NO_CHANGELOG=minor (cherry picked from commit a5214bfc)
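The following portable sketch formats the address from the `struct sockaddr` passed to bind() rather than querying the socket. `addr_to_str()` is a hypothetical helper, not the actual `sio` code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Format the address we *tried* to bind, taken from the sockaddr
 * itself. Querying the socket after a failed bind() reports whatever
 * the socket is currently bound to, not the address that failed.
 */
static void
addr_to_str(const struct sockaddr *addr, char *buf, size_t size)
{
	char ip[INET6_ADDRSTRLEN];
	if (addr->sa_family == AF_INET) {
		const struct sockaddr_in *in = (const struct sockaddr_in *)addr;
		inet_ntop(AF_INET, &in->sin_addr, ip, sizeof(ip));
		snprintf(buf, size, "%s:%u", ip, (unsigned)ntohs(in->sin_port));
	} else if (addr->sa_family == AF_INET6) {
		const struct sockaddr_in6 *in6 =
			(const struct sockaddr_in6 *)addr;
		inet_ntop(AF_INET6, &in6->sin6_addr, ip, sizeof(ip));
		snprintf(buf, size, "[%s]:%u", ip,
			 (unsigned)ntohs(in6->sin6_port));
	} else {
		snprintf(buf, size, "unknown address family %d",
			 addr->sa_family);
	}
}
```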
-
Nikita Zheleztsov authored
This test checks that when a PROMOTE from the previous term is encountered, we immediately notice the split-brain situation and break replication without corrupting data. Closes #9943 NO_DOC=test NO_CHANGELOG=test (cherry picked from commit 06b87e27)
-
Georgiy Lebedev authored
For symmetry with the update of the synchronous replication quorum on insertion into the `_cluster` space, let's reuse the `on_replace_cluster_update_quorum` on_commit trigger. Follows-up #10087 NO_CHANGELOG=<refactoring> NO_DOC=<refactoring> NO_TEST=<refactoring> (cherry picked from commit 9b63ced3)
-
Georgiy Lebedev authored
Currently, we update the synchronous replication quorum from the `on_replace` trigger of the `_cluster` space when registering a new replica. However, during the join process, the replica cannot ack its own insertion into the `_cluster` space. In the scope of #9723, we are going to enable synchronous replication for most of the system spaces, including the `_cluster` space. There are several problems with this:
1. Joining a replica to a 1-member cluster without manually changing the quorum won't work: it is impossible to commit the insertion into the `_cluster` space with only 1 node, since the quorum will be equal to 2 right after the insertion.
2. Joining a replica to a 3-member cluster may fail: the quorum will become equal to 3 right after the insertion, and the newly joined replica cannot ACK its own insertion into the `_cluster` space, so if one of the original 3 nodes fails, the reconfiguration will fail.
Generally speaking, it will be impossible to join a new replica to the cluster if a quorum that includes the newly added replica (which cannot ACK) cannot be gathered. To solve these problems, let's update the quorum in the `on_commit` trigger. This way we'll be able to insert a node regardless of the current configuration. This somewhat contradicts the Raft specification, which requires application of all configuration changes in the `on_replace` trigger (i.e., as soon as they are persisted in the WAL, without quorum confirmation), but still forbids several reconfigurations at the same time. Closes #10087 NO_DOC=<no special documentation page devoted to cluster reconfiguration> (cherry picked from commit 29d1c0fa)
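A small sketch can check the arithmetic behind both problems. It assumes the quorum is a plain majority, `N / 2 + 1` for `N` registered instances, and that only the pre-existing live members can ACK the insertion. The function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed quorum formula: a majority of registered instances. */
static int
quorum(int n)
{
	return n / 2 + 1;
}

/* Quorum recomputed on replace: the new member already counts toward
 * the quorum but cannot ack its own insertion. */
static bool
can_confirm_join_on_replace(int members, int alive)
{
	return alive >= quorum(members + 1);
}

/* Quorum recomputed on commit: the old quorum applies to the insertion. */
static bool
can_confirm_join_on_commit(int members, int alive)
{
	return alive >= quorum(members);
}
```

For example, with one member, recomputing on replace needs `2 / 2 + 1 = 2` acks but only 1 is available. With three members and one of them down, it needs 3 acks but only 2 are available. In both cases, keeping the old quorum until commit is enough.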
-
- Jul 15, 2024
Vladimir Davydov authored
There may be more than one fiber waiting on `vy_scheduler::dump_cond`:
```
box.snapshot
vinyl_engine_wait_checkpoint
vy_scheduler_wait_checkpoint

space.create_index
vinyl_space_build_index
vy_scheduler_dump
```
To avoid a hang, we should use `fiber_cond_broadcast`. Closes #10233 NO_DOC=bug fix (cherry picked from commit 30547157)
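The hang can be illustrated with POSIX threads standing in for fibers, with `pthread_cond_broadcast` playing the role of `fiber_cond_broadcast`. All names below are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t dump_cond = PTHREAD_COND_INITIALIZER;
static bool checkpoint_in_progress = true;
static int woken;

/* A waiter, e.g. a snapshot or an index build waiting on dump_cond. */
static void *
waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mtx);
	while (checkpoint_in_progress)
		pthread_cond_wait(&dump_cond, &mtx);
	woken++;
	pthread_mutex_unlock(&mtx);
	return NULL;
}

/* Clear the flag and wake every waiter. pthread_cond_signal() wakes
 * at least one waiter, so the others could stay blocked forever. */
static void
end_checkpoint(void)
{
	pthread_mutex_lock(&mtx);
	checkpoint_in_progress = false;
	pthread_cond_broadcast(&dump_cond);
	pthread_mutex_unlock(&mtx);
}

/* Start two waiters, end the checkpoint, and count who woke up. */
static int
run_demo(void)
{
	pthread_t t[2];
	for (int i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, waiter, NULL);
	end_checkpoint();
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	return woken;
}
```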
-
Lev Kats authored
This patch bumps small to a new version that does not trigger UBSan with *_entry* macros and should support the new oss-fuzz builder. New commits:
* rlist: make its methods accept const arguments
* lsregion: introduce lsregion_to_iovec method
* rlist: make foreach_enrty_* macros not to use UB
Fixes: #10143 NO_DOC=small submodule bump NO_TEST=small submodule bump NO_CHANGELOG=small submodule bump (cherry picked from commit 3e183044)
-
- Jul 08, 2024
Nikolay Shirokovskiy authored
In this case join will just hang. Instead let's raise an error in case of Lua API and panic in case of C API. Closes #10196 NO_DOC=minor (cherry picked from commit 1e1bf36d)
-
Magomed Kostoev authored
Prior to this patch, a bunch of illegal conditions were possible:
1. The joinability of a fiber could be changed while the fiber is being joined by someone. This could lead to double recycling: the first one happened on the fiber finish, and the second one in the fiber join.
2. The joinability of a dead joinable fiber could be altered, which led to the inability to join the dead fiber and free its resources.
3. A running fiber could be joined concurrently by two or more fibers, so the fiber could be recycled more than once (once per each concurrent join).
4. A dead recycled fiber could be made joinable and joined, leading to a double recycle.
Fixed these issues by adding a new FIBER_JOIN_BEEN_INVOKED flag: now the `fiber_set_joinable` and `fiber_join_timeout` functions detect the double join. Because of the API limitations, both of them panic when an invalid condition is met:
- The `fiber_set_joinable` was not designed to report errors.
- The `fiber_join_timeout` can't raise any error unless a timeout is met, because the `fiber_join` users don't expect to receive any error from this function at all (except the one generated by the joined fiber).
It's still possible that a fiber join is performed on a struct which has been recycled and, if the new fiber is joinable too, this can't be detected. The current fiber API does not allow fixing this, so it is the user's responsibility, and users should be warned that a double join of the same fiber is illegal. Closes #7562
@TarantoolBot document
Title: `fiber_join`, `fiber_join_timeout` and `fiber_set_joinable` behave differently now.
`fiber_join` and `fiber_join_timeout` now panic if a double join of the given fiber is detected. `fiber_set_joinable` now panics if the given fiber is dead or is joined already. This prevents a number of error conditions that could happen when using the API in an unexpected way, including:
- Making a dead joinable fiber non-joinable could lead to a memory leak: one can't join the fiber anymore.
- Making a dead joinable fiber joinable again is a sign of an attempt to join the fiber later. That means the fiber struct may be joined later, when it's been recycled and reused. This could lead to a very hard to debug double join.
- Making an alive joined fiber non-joinable would lead to a double free: once on the fiber function finish, and a second time in the active fiber join finish. The risks of making it joinable are described above.
- Making a dead and recycled fiber joinable allowed joining the fiber once again, leading to a double free.
Any `struct fiber` given by the API should only be joined once. If a fiber is joined after the first join on it has finished, the behavior is undefined: it can either be a panic or an incidental join to a totally foreign fiber. (cherry picked from commit 44401529)
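The sketch below shows the double-join detection. Apart from `FIBER_JOIN_BEEN_INVOKED`, which comes from the message above, the flag and function names are hypothetical. This is not the real `fiber.c` code: it shows only the checks, not waiting for or recycling the fiber.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag bits modeled on the description above. */
enum {
	FIBER_IS_JOINABLE = 1 << 0,
	FIBER_IS_DEAD = 1 << 1,
	FIBER_JOIN_BEEN_INVOKED = 1 << 2,
};

struct fiber_sketch {
	unsigned flags;
};

static void
panic_sketch(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

static void
fiber_set_joinable_sketch(struct fiber_sketch *f, int yes)
{
	if (f->flags & (FIBER_IS_DEAD | FIBER_JOIN_BEEN_INVOKED))
		panic_sketch("fiber is dead or already joined");
	if (yes)
		f->flags |= FIBER_IS_JOINABLE;
	else
		f->flags &= ~FIBER_IS_JOINABLE;
}

/* Marks the join; a second join on the same struct is detected. */
static void
fiber_join_sketch(struct fiber_sketch *f)
{
	if (f->flags & FIBER_JOIN_BEEN_INVOKED)
		panic_sketch("double join");
	f->flags |= FIBER_JOIN_BEEN_INVOKED;
	/* ... wait for the fiber to finish, then recycle it once ... */
}
```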
-
Sergey Kaplun authored
* Correct fix for stack check when recording BC_VARG.
* test: remove inline suppressions of _TARANTOOL
* FFI: Fix ffi.alignof() for reference types.
* FFI: Fix sizeof expression in C parser for reference types.
* FFI: Allow ffi.metatype() for typedefs with attributes.
* FFI: Fix ffi.metatype() for non-raw types.
* Maintain chain invariant in DCE.
* build: introduce option LUAJIT_ENABLE_TABLE_BUMP
* ci: add tablebump flavor for exotic builds
* test: allow `jit.parse` to return aborted traces
* Handle all types of errors during trace stitching.
* Use generic trace error for OOM during trace stitching.
* Check for IR_HREF vs. IR_HREFK aliasing in non-nil store check.
* cmake: set cmake_minimum_required only once
* cmake: fix warning about minimum required version
* ci: add a workflow for testing with AVX512 enabled
* test: introduce a helper read_file
* OSX/iOS/ARM64: Fix generation of Mach-O object files.
* OSX/iOS/ARM64: Fix bytecode embedding in Mach-O object file.
* build: introduce LUAJIT_USE_UBSAN option
* ci: enable UBSan for sanitizers testing workflow
* cmake: add the build directory to the .gitignore
* Prevent sanitizer warning in snap_restoredata().
* Avoid negation of signed integers in C that may hold INT*_MIN.
* Show name of NYI bytecode in -jv and -jdump.
Closes #9924 Closes #8473 NO_DOC=LuaJIT submodule bump NO_TEST=LuaJIT submodule bump
-
- Jul 04, 2024
Nikolay Shirokovskiy authored
When a fiber is accessed from Lua, we create a userdata object and keep a reference to it for future accesses. The reference is cleared when the fiber is stopped. But if the fiber is joinable, it can still be found with `fiber.find`. In this case we create the userdata object again. Unfortunately, as the fiber is already stopped, we fail to clear the reference. The trigger memory that clears the reference is also leaked, as is the fiber storage if it is accessed after the fiber is stopped. Let's add an `on_destroy` trigger to the fiber and clear the references there. Note that with the current set of LSAN suppressions, the trigger memory leak from the issue is not reported. Closes #10187 NO_DOC=bugfix (cherry picked from commit 7db4de75)
-
- Jun 26, 2024
Nikolay Shirokovskiy authored
Currently, we just don't free functional index keys when a functional index is dropped. Let's approach key deletion as in the case of a primary index drop, i.e. let's drop these keys in the background. We should set `use_hint` to `true` in the case of MEMTX_TREE_VTAB_DISABLED tree index methods because `memtx_tree_disabled_index_vtab` uses `memtx_tree_index_destroy<true>`. Otherwise we get a read outside of the index structure for a stub functional index on destroy for the introduced `is_func` field (which is reported by ASAN). Closes #10163 NO_DOC=bugfix (cherry picked from commit 319357d5)
-
- Jun 25, 2024
Sergey Bronnikov authored
The patch updates the curl module to version 8.8.0 [1] plus a number of commits in the range curl-8_8_0..30de937bda0f, because that range includes a fix for a regression [2] caught on the previous bump. The new version brings a number of functional fixes. The previous changelog entry has been removed because duplicate entries about bumps in the release changelog confuse end users. Closes #9612
1. https://curl.se/changes.html#8_8_0
2. https://github.com/curl/curl/issues/13740
NO_DOC=libcurl submodule bump NO_TEST=libcurl submodule bump (cherry picked from commit 7192bf66)
-
Sergey Bronnikov authored
The patch updates the curl module to version 8.7.1 [1][2], which brings a number of functional and security fixes, and updates the CMake module for building the curl library.
Security fixes:
- CVE-2024-2004: Usage of disabled protocol. (low)
- CVE-2024-2398: HTTP/2 push headers memory-leak. (medium)
- CVE-2024-2379: QUIC certificate check bypass with wolfSSL. (low)
- CVE-2024-2466: TLS certificate check bypass with mbedTLS. (medium)
Changes in CMake module:
- Option `USE_OPENSSL_QUIC` was added and disabled by default [3]
The previous changelog entry has been removed because duplicate entries about bumps in the release changelog confuse end users. The bump was blocked by a regression in libcurl [4][5].
1. https://curl.se/changes.html#8_7_1
2. https://github.com/curl/curl/compare/curl-8_6_0...curl-8_7_1
3. https://github.com/curl/curl/commit/8e741644a229c3791963b4f5cae1dcfccba842dd
4. https://curl.se/mail/lib-2024-03/0059.html
5. https://github.com/curl/curl/issues/13260
NO_DOC=libcurl submodule bump NO_TEST=libcurl submodule bump (cherry picked from commit 63cb2bf6)
-
Sergey Bronnikov authored
The patch updates the curl module to version 8.6.0 [1][2], which brings a number of functional fixes, and updates the CMake module for building the curl library.
Changes in CMake module:
- Option `ENABLE_CURL_MANUAL` was added and disabled by default [3]
- Option `BUILD_LIBCURL_DOCS` was added and disabled by default [3]
The previous changelog entry has been removed because duplicate entries about bumps in the release changelog confuse end users. This bump was blocked by a regression in libcurl [4].
1. https://curl.se/changes.html#8_6_0
2. https://github.com/curl/curl/compare/curl-8_5_0...curl-8_6_0
3. https://github.com/curl/curl/commit/a808aab06851d4364ab1773c664df3d906a497a9
4. https://github.com/curl/curl/commit/b8c003832d730bb2f4b9de4204675ca5d9f7a903
NO_DOC=libcurl submodule bump NO_TEST=libcurl submodule bump (cherry picked from commit 00cfc959)
-
- Jun 22, 2024
Vladislav Shpilevoy authored
listen() on Mac used to take SOMAXCONN as the backlog size. It is just 128, which is too small when connections come in too fast: they get rejected. Increasing the queue size wasn't possible, because the limit was hardcoded. But now sio takes the runtime limit from the kern.ipc.somaxconn sysctl setting. One weird thing is that when it is set too high, it seems to have no effect, as if nothing was changed. Specifically, values above 32767 don't do anything, even though they stay visible in kern.ipc.somaxconn. It seems listen() on Mac might internally use 'short' or int16_t to store the queue size, and it breaks when anything above INT16_MAX is used. The code truncates the queue size to this value if the given one is too high. Closes #8130 NO_DOC=bugfix NO_TEST=requires root privileges for testing (cherry picked from commit 7e9a872f)
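Below is a sketch of the truncation logic only. Reading the sysctl itself (for example with sysctlbyname() on macOS) is left out to keep the sketch portable. The fallback to `SOMAXCONN` for a non-positive value is an assumption of this sketch, and `clamp_listen_backlog()` is a hypothetical name.

```c
#include <stdint.h>
#include <sys/socket.h>

/*
 * Clamp a backlog value obtained at runtime (on macOS, from the
 * kern.ipc.somaxconn sysctl) before passing it to listen(). Values
 * above INT16_MAX appeared to have no effect, so they are truncated.
 */
static int
clamp_listen_backlog(long somaxconn)
{
	if (somaxconn <= 0)
		return SOMAXCONN; /* assumed fallback: compile-time limit */
	if (somaxconn > INT16_MAX)
		return INT16_MAX;
	return (int)somaxconn;
}
```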
-
- Jun 20, 2024
Nikolay Shirokovskiy authored
Tarantool has a hardcoded list of versions it can downgrade to. This list should consist of all the released versions lower than the current Tarantool version. This workflow helps to make sure we update the list before a release. It runs when a release tag is pushed to the repo, checks the list, and fails if it lacks some released version lower than the current one. In this case we are supposed to update the downgrade list (with the required downgrade code) and update the release tag. Closes #8319 NO_TEST=ci NO_CHANGELOG=ci NO_DOC=ci (cherry picked from commit 6d856347)
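The message does not say how the workflow is implemented. As an illustration only, the core check can be sketched in C with hypothetical version lists in `MAJOR.MINOR.PATCH` form.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse "MAJOR.MINOR.PATCH" into one comparable number, or -1. */
static long
version_id(const char *v)
{
	int major = 0, minor = 0, patch = 0;
	if (sscanf(v, "%d.%d.%d", &major, &minor, &patch) != 3)
		return -1;
	return (long)major * 1000000 + minor * 1000 + patch;
}

/*
 * Return true if every released version lower than `current` appears
 * in `downgrade_list`; otherwise report the first missing one.
 */
static bool
downgrade_list_complete(const char *current, const char *const *released,
			int n_released, const char *const *downgrade_list,
			int n_list)
{
	long cur = version_id(current);
	for (int i = 0; i < n_released; i++) {
		long r = version_id(released[i]);
		if (r < 0 || r >= cur)
			continue; /* unparsable, current, or newer: skip */
		bool found = false;
		for (int j = 0; j < n_list && !found; j++)
			found = version_id(downgrade_list[j]) == r;
		if (!found) {
			fprintf(stderr, "missing from downgrade list: %s\n",
				released[i]);
			return false;
		}
	}
	return true;
}
```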
-
- Jun 14, 2024
Nikolay Shirokovskiy authored
See #8890 NO_TEST=internal NO_CHANGELOG=internal NO_DOC=internal (cherry picked from commit c5b3e594)
-