- Dec 11, 2024
-
-
Introduced a new type of cbus pipe - lcpipe. The current pipe in the cbus, cpipe, has a number of limitations. First of all, a cpipe cannot be used from third-party threads; it only works as a channel between two cords. That's why lcpipe is needed: its main responsibility is to create a channel between any thread and a tarantool cord. Internally an lcpipe is a cpipe, but: - on-flush triggers are removed, because triggers use a thread-local mempool, which is not possible on a third-party thread - the producer event loop is removed, because there is no libev event loop in a third-party thread Also, the lcpipe interface is exported to the outside world. fix: use-after-free in `cbus_endpoint_delete` Calling the `TRASH` macro after calling the `free` function dereferences a pointer to already-freed memory. NO_DOC=picodata internal patch NO_CHANGELOG=picodata internal patch NO_TEST=picodata internal patch
-
NO_DOC=core feature NO_TEST=no Lua API NO_CHANGELOG=bugfix
-
Due to inconsistency of Tarantool type casting when using strict data types such as "double" or "unsigned", the "number" data type has to be used in a whole bunch of cases. However, "number" may contain a "decimal", which the built-in JSON module serializes into a string. This commit adds the "encode_decimal_as_number" parameter to json.cfg{}, which forces `decimal` values to be encoded as JSON numbers for type consistency in JSON output. Use with caution - most JSON parsers assume that a number is restricted to float64. NO_DOC=we do not host doc
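The effect of the flag can be sketched in Python (the `DecimalAsNumberEncoder` class and `encode` helper below are hypothetical illustrations, not Tarantool's actual implementation):

```python
import json
from decimal import Decimal

class DecimalAsNumberEncoder(json.JSONEncoder):
    # Hypothetical sketch of what encode_decimal_as_number enables:
    # emit a decimal value as a raw JSON number instead of a string.
    def default(self, o):
        if isinstance(o, Decimal):
            # Most JSON parsers assume numbers fit in float64, so
            # precision beyond ~17 significant digits is lost here.
            return float(o)
        return super().default(o)

def encode(value):
    return json.dumps(value, cls=DecimalAsNumberEncoder)
```

Without such a flag, an encoder would typically emit the decimal as a quoted string, breaking type consistency for consumers that expect a number.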
-
Actually, there is no reason to throw an error and make the user manually recreate a prepared statement when it expires. A much more user-friendly way is to recreate it under the hood when the statement's schema version differs from the box one. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
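The recreate-on-expiry logic amounts to a version check before execution. A minimal Python sketch (`PreparedStatement` and the `prepare` callback are hypothetical names, not Tarantool's API):

```python
class PreparedStatement:
    def __init__(self, sql, schema_version):
        self.sql = sql
        self.schema_version = schema_version

def reprepare_if_expired(stmt, current_schema_version, prepare):
    # Instead of raising "statement has expired", transparently
    # recreate the statement when the schema version has moved on.
    if stmt.schema_version != current_schema_version:
        return prepare(stmt.sql, current_schema_version)
    return stmt
```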
-
Problem description. When we prepare a statement with parameters in the result columns (for example box.prepare('select ?')) Tarantool has no information about the type of the output column and sets it to the default BOOLEAN. Then, at the execution phase, the type is recalculated during parameter binding. Tarantool expects that a parameter can appear in the result tuple only by being mentioned directly in the final projection. But that is incorrect - we can easily propagate a parameter from the inner part of a join, for example box.prepare([[select COLUMN_1 from t1 join (values (?)) as t2 on true]]). In this case column COLUMN_1 in the final projection is not a parameter but a "reference" to one, and its type depends on the parameter from the inner part of the join. But as Tarantool recalculates only bound parameters in the result projection, it doesn't change the default BOOLEAN metadata type of COLUMN_1, and the query fails on comparison with the actual type of the tuple. Solution. As we don't want to patch Vdbe to make COLUMN_1 refer to the inner parameter, it was decided to make a simple workaround: change the default column type from BOOLEAN to ANY for parameters. This fixes the comparison with the actual tuple type (we no longer fail), but in some cases we get an ANY column in the results where we would like an explicitly defined type. NULL parameters also get the ANY type, though Tarantool prefers BOOLEAN in that case. Closes https://github.com/tarantool/tarantool/issues/7283 NO_DOC=bug fix
-
It's similar to sql_execute_prepared, but doesn't have the `region` parameter. NO_DOC=minor NO_TEST=minor
-
NO_DOC=disable feedback NO_TEST=disable feedback
-
- add box_tuple_data_offset function (returns the offset of the msgpack-encoded data from the beginning of the tuple) - add more exported functions NO_DOC=build NO_TEST=build
-
If the write iterator sees that one DELETE statement follows another, which isn't discarded because it's referenced by a read view, it drops the newer DELETE, see commit a6f45d87 ("vinyl: discard tautological DELETEs on compaction"). This is incorrect if the older DELETE is a deferred DELETE statement (marked as SKIP READ) because such statements are dumped out of order, i.e. there may be a statement with the LSN lying between the two DELETEs in an older source not included into this compaction task. If we discarded the newer DELETE, we wouldn't overwrite this statement on major compaction, leaving garbage. Fix this issue by disabling this optimization for deferred DELETEs. Closes #10895 NO_DOC=bug fix (cherry picked from commit 2945a8c9fde6df9f6cbc714f9cf8677f0fded57a)
-
vy_lsm.c seems to be a more appropriate place for cache invalidation because (a) it's vy_lsm that owns the cache and (b) we invalidate the cache on rollback in vy_lsm_rollback_stmt(). While we are at it, let's inline vy_tx_write() and vy_tx_write_prepare() because they are trivial and used in just one place. NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 44c245ef0baa227a6bff75de2a91f20da5532dc1)
-
There's no point in doing so because if the committed tuple has been overwritten by the time it's committed, the statement that overwrote it must have already invalidated the cache, see `vy_tx_write()`. The code invalidating the cache on commit was added along with the cache implementation without any justification. NO_DOC=minor NO_TEST=minor NO_CHANGELOG=minor (cherry picked from commit 6ee49a5955893834fdaaf554d57d92d3f35992bc)
-
Once a statement is prepared to be committed to WAL, it becomes visible (in the 'read-committed' isolation level) so it can be added to the tuple cache. That's why if the statement is rolled back due to a WAL error, we have to invalidate the cache. The problem is that the function invalidating the cache (`vy_cache_on_write`) ignores the statement if it's a DELETE judging that "there was nothing and there is nothing now". This is apparently wrong for rollback. Fix it. Closes #10879 NO_DOC=bug fix (cherry picked from commit d64e29da2c323a4b4fcc7cf9fddb0300d5dd081f)
-
A multikey index stores a tuple once per each entry of the indexed array field, excluding duplicates. For example, if the array field equals {1, 3, 2, 3}, the tuple will be stored three times. Currently, when a tuple with duplicate multikey entries is inserted into a transaction write set, duplicates are overwritten as if they belonged to different statements. Actually, this is pointless: we could just as well skip them without trying to add to the write set. Besides, this may break the assumptions taken by various optimizations, resulting in anomalies. Consider the following example: ```lua local s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('primary') s:create_index('secondary', {parts = {{'[2][*]', 'unsigned'}}}) s:replace({1, {10, 10}}) s:update({1}, {{'=', 2, {10}}}) ``` It will insert the following entries to the transaction write set of the secondary index: 1. REPLACE {10, 1} [overwritten by no.2] 2. REPLACE {10, 1} [overwritten by no.3] 3. DELETE {10, 1} [turned into no-op as REPLACE + DELETE] 4. DELETE {10, 1} [overwritten by no.5] 5. REPLACE {10, 1} [turned into no-op as DELETE + REPLACE] (1-2 correspond to `replace()` and 3-5 to `delete()`) As a result, tuple {1, {10}} will be lost forever. Let's fix this issue by silently skipping duplicate multikey entries added to a transaction write set. After the fix, the example above will produce the following write set entries: 1. REPLACE{10, 1} [overwritten by no.2] 2. DELETE{10, 1} [turned into no-op as REPLACE + DELETE] 3. REPLACE{10, 1} [committed] (1 corresponds to `replace()` and 2-3 to `delete()`) Closes #10869 Closes #10870 NO_DOC=bug fix (cherry picked from commit 1869dce15d9a797391e45df75507078d91f1651e)
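The fix boils down to deduplicating multikey entries before they reach the write set; a minimal sketch under assumptions (the function name is illustrative, not from the Vinyl source):

```python
def multikey_entries(array_field):
    # Yield each distinct entry of the indexed array field once:
    # duplicates are silently skipped instead of being inserted into
    # the transaction write set as mutually overwriting statements.
    seen = set()
    for entry in array_field:
        if entry not in seen:
            seen.add(entry)
            yield entry
```

For the `{10, 10}` array from the example above, only a single `{10, 1}` write-set entry is produced per statement, which is exactly the post-fix behavior.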
-
A Vinyl read iterator scans all read sources (memory and disk levels) even if it's executed in a read view from which most of the sources are invisible. As a result, a long running scanning request may spend most of the time skipping invisible statements. The situation is exacerbated if the instance is experiencing a heavy write load because it would pile up old statement versions in memory and force the iterator to skip over them after each disk read. Since the replica join procedure in Vinyl uses a read view iterator under the hood, the issue is responsible for a severe performance degradation of the master instance and the overall join procedure slowdown when a new replica is joined to an instance running under a heavy write load. Let's fix this issue by making a read iterator skip read sources that aren't visible from its read view. Closes #10846 NO_DOC=bug fix (cherry picked from commit 6a214e42e707b502022622866d898123a6f177f1)
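The idea of the fix can be modeled as a filter over read sources. This is a hypothetical simplification: each source is represented only by the minimum LSN of the statements it contains, and `read_view_vlsn` stands for the read view's LSN:

```python
def visible_sources(sources, read_view_vlsn):
    # A source whose oldest statement is newer than the read view's
    # LSN cannot contain anything visible from that read view, so the
    # read iterator may skip it entirely instead of scanning it and
    # discarding every statement one by one.
    return [src for src in sources if src["min_lsn"] <= read_view_vlsn]
```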
-
Statements executed in a transaction are first inserted into the transaction write set and only when the transaction is committed, they are applied to the LSM trees that store indexed keys in memory. If the same key is updated more than once in the same transaction, the old version is marked as overwritten in the write set and not applied on commit. Initially, write sets of different indexes of the same space were independent: when a transaction was applied, we didn't have a special check to skip a secondary index statement if the corresponding primary index statement was overwritten because in this case the secondary index statement would have to be overwritten as well. This changed when deferred DELETEs were introduced in commit a6edd455 ("vinyl: eliminate disk read on REPLACE/DELETE"). Because of deferred DELETEs, a REPLACE or DELETE overwriting a REPLACE in the primary index write set wouldn't generate DELETEs that would overwrite the previous key version in write sets of the secondary indexes. If we applied such a statement to the secondary indexes, it'd stay there forever because, since there's no corresponding REPLACE in the primary index, a DELETE wouldn't be generated on primary index compaction. So we added a special instruction to skip a secondary index statement if the corresponding primary index was overwritten, see `vy_tx_prepare()`. Actually, this wasn't completely correct because we skipped not only secondary index REPLACE but also DELETE. Consider the following example: ```lua local s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('primary') s:create_index('secondary', {parts = {2, 'unsigned'}}) s:replace{1, 1} box.begin() s:update(1, {{'=', 2, 2}}) s:update(1, {{'=', 2, 3}}) box.commit() ``` UPDATEs don't defer DELETEs because, since they have to query the old value, they can generate DELETEs immediately so here's what we'd have in the transaction write set: 1. REPLACE {1, 2} in 'test.primary' [overwritten by no.4] 2. DELETE {1, 1} from 'test.secondary' 3. REPLACE {1, 2} in 'test.secondary' [overwritten by no.5] 4. REPLACE {1, 3} in 'test.primary' 5. DELETE {1, 2} from 'test.secondary' 6. REPLACE {1, 3} in 'test.secondary' Statement no.2 would be skipped and marked as overwritten because of the new check, resulting in {1, 1} never deleted from the secondary index. Note, the issue affects spaces both with and without enabled deferred DELETEs. This commit fixes this issue by updating the check to only skip REPLACE statements. It should be safe to apply DELETEs in any case. There's another closely related issue that affects only spaces with enabled deferred DELETEs. When we generate deferred DELETEs for a secondary index when a transaction is committed (we can do it if we find the previous version in memory), we assume that there can't be a DELETE in a secondary index write set. This isn't true: there can be a DELETE generated by UPDATE or UPSERT. If there's a DELETE, we have nothing to do unless the DELETE was optimized out (marked as no-op). Both issues were found by `vinyl-luatest/select_consistency_test.lua`. Closes #10820 Closes #10822 NO_DOC=bug fix (cherry picked from commit 6a87c45deeb49e4e17ae2cc0eeb105cc9ee0f413)
-
- Nov 21, 2024
-
-
Andrey Saranchin authored
Currently, we use the raw index for the count operation instead of `box_index_count`. As a result, we skip the check whether the current transaction can continue, and we don't begin a transaction in the engine if needed. So, if a count statement is the first in a transaction, it won't be tracked by MVCC since MVCC wasn't notified about the transaction. The commit fixes the mistake. Also, the commit adds a check that count was successful and covers it with a test. In order to backport the commit to 2.11, the space name was wrapped in quotes since it is in lower case and addressing such spaces in SQL without quotes is a Tarantool 3.0 feature. Another unsupported feature is the prohibition of data access in transactional triggers - it was used in a test case, so the case was rewritten. Closes #10825 NO_DOC=bugfix (cherry picked from commit 0656a9231149663a0f13c4be7466d4776ccb0e66)
-
- Nov 12, 2024
-
-
Vladimir Davydov authored
`vy_mem_insert()` and `vy_mem_insert_upsert()` increment the row count statistic of `vy_mem` only if no statement is replaced, which is correct, while `vy_lsm_commit()` increments the row count of `vy_lsm` unconditionally. As a result, `vy_lsm` may report a non-zero statement count (via `index.stat()` or `index.len()`) after a dump. This may happen only with a non-unique multikey index, when the statement has duplicates in the indexed array, and only if the `deferred_deletes` option is enabled, because otherwise we drop duplicates when we form the transaction write set, see `vy_tx_set()`. With `deferred_deletes`, we may create a `txv` for each multikey entry at the time when we prepare to commit the transaction, see `vy_tx_handle_deferred_delete()`. Another problem is that `vy_mem_rollback_stmt()` always decrements the row count, even if it didn't find the rolled back statement in the tree. As a result, if the transaction with duplicate multikey entries is rolled back on WAL error, we'll decrement the row count of `vy_mem` more times than necessary. To fix this issue, let's make the `vy_mem` methods update the in-memory statistic of `vy_lsm`. This way they should always stay in-sync. Also, we make `vy_mem_rollback_stmt()` skip updating the statistics in case the rolled back statement isn't present in the tree. This issue results in `vinyl-luatest/select_consistency_test.lua` flakiness when checking `index.len()` after compaction. Let's make the test more thorough and also check that `index.len()` equals `index.count()`. Closes #10751 Part of #10752 NO_DOC=bug fix (cherry picked from commit e8810c555d4e6ba56e6c798e04216aa11efb5304)
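The invariant the fix restores can be modeled as follows: insert bumps the row count only when nothing was replaced, and rollback touches the statistics only when the statement is actually in the tree. This is a Python model with illustrative shapes for the tree and stats, not the `vy_mem` code itself:

```python
def mem_insert(tree, key, stmt, stats):
    # Count a new row only if no existing statement was replaced.
    replaced = tree.get(key)
    tree[key] = stmt
    if replaced is None:
        stats["rows"] += 1

def mem_rollback(tree, key, stmt, stats):
    # Skip the statistics update if the rolled back statement is not
    # the one in the tree (e.g. a duplicate multikey entry).
    if tree.get(key) is not stmt:
        return
    del tree[key]
    stats["rows"] -= 1
```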
-
- Nov 07, 2024
-
-
Nikita Zheleztsov authored
This commit fixes some cases of upgrading the schema from 1.6.9: 1. Fix updating an empty password for users. In 1.6 credentials were an array in _user; in 1.7.5 they became a map. 2. Automatically update the format of user spaces. Formats of system spaces were properly fixed during the upgrade to 1.7.5. However, commit 519bc82e ("Parse and validate space formats") introduced strict checking of the format field in 1.7.6, so the format of user spaces should be fixed as well. Back in the 1.6 days, it was allowed to write anything in a space format. This commit only fixes valid uses of format: {name = 'a', type = 'number'} {'a', type = 'number'} {'a', 'num'} {'a'} Invalid uses of format (e.g. {{}} or {{5, 'number'}}) will cause an error anyway. The user has to fix the format on the old version and only after that start the new one. This commit also introduces a test which checks that we can properly upgrade from 1.6.9 to the latest versions, at least in basic cases. Closes #10180 NO_DOC=bugfix (cherry picked from commit f69e2ae488b3620e31f1a599d8fb78a66917dbfd)
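The accepted legacy spellings can be normalized as in this Python sketch (`normalize_field` and the alias table are hypothetical; the real upgrade code is Lua, and Lua tables are modeled here as Python dicts and tuples):

```python
LEGACY_TYPE_ALIASES = {"num": "number", "str": "string"}

def normalize_field(field):
    # Accepts the valid 1.6-era spellings:
    #   {"name": "a", "type": "number"}  -- already canonical
    #   {1: "a", "type": "number"}       -- Lua's {'a', type = 'number'}
    #   ("a", "num")                     -- Lua's {'a', 'num'} (legacy type)
    #   ("a",)                           -- Lua's {'a'}
    if isinstance(field, dict):
        name = field.get("name", field.get(1))
        ftype = field.get("type")
    else:
        name = field[0]
        ftype = field[1] if len(field) > 1 else None
    ftype = LEGACY_TYPE_ALIASES.get(ftype, ftype)
    out = {"name": name}
    if ftype is not None:
        out["type"] = ftype
    return out
```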
-
- Nov 01, 2024
-
-
Andrey Saranchin authored
When building an index in background, we create on_rollback triggers for tuples inserted concurrently. The problem here is that an on_rollback trigger has a lifetime independent of `index` and `memtx_ddl_state` - it can be called after the index was built (and `memtx_ddl_state` destroyed) and even after the index was altered. So, in order to avoid use-after-free in on_rollback triggers, let's drop all on_rollback triggers when the DDL is over. It's OK because all owners of the triggers are already prepared, hence in the WAL or replication queue (since we build indexes in background only without MVCC, so the transactions cannot yield), so if they are rolled back, the same will happen to the DDL. In order to delete the on_rollback triggers, we collect them into a list in `memtx_ddl_state`. On the other hand, when a DML statement is over (committed or rolled back), we should delete its trigger from the list to prevent use-after-free. That's why the commit adds an on_commit trigger to the background build process. Closes #10620 NO_DOC=bugfix (cherry picked from commit d8d82dba4c884c3a7ad825bd3452d35627c7dbf4)
-
- Oct 30, 2024
-
-
Sergey Bronnikov authored
The patch enables the UBSan signed-integer-overflow check, which was disabled globally in commit 5115d9f3 ("cmake: split UB sanitations into separate flags."), and disables it inline for several functions. See also #10703 See also #10704 Closes #10228 NO_CHANGELOG=codehealth NO_DOC=codehealth NO_TEST=codehealth (cherry picked from commit 60ba7fb4c0038d9d17387f7ce9755eb587ea1da4)
-
- Oct 28, 2024
-
-
Andrey Saranchin authored
Currently, we create a `memtx_tx_snapshot_cleaner` for each index in a read view. However, for some reason we clarify all tuples against the primary index in all cleaners. As a result, secondary indexes work incorrectly in a read view when MVCC is enabled: we may even request a tuple with one key but get a tuple with another key, because it is clarified against the primary index and respects its order - that's wrong because each index has its own order. Let's clarify tuples against the given index to fix this mistake. Community Edition is not affected at all since it uses read views only for making a snapshot - we use only primary indexes there. Part of tarantool/tarantool-ee#939 NO_TEST=in EE NO_CHANGELOG=in EE NO_DOC=bugfix (cherry picked from commit 835fadd)
-
- Oct 23, 2024
-
-
Nikolay Shirokovskiy authored
I got compile error for release build on gcc 14.2.1 20240910 version. ``` In function ‘char* mp_store_double(char*, double)’, inlined from ‘char* mp_encode_double(char*, double)’ at /home/shiny/dev/tarantool-ee/tarantool/src/lib/msgpuck/msgpuck.h:2409:24, inlined from ‘uint32_t tuple_hash_field(uint32_t*, uint32_t*, const char**, field_type, coll*)’ at /home/shiny/dev/tarantool-ee/tarantool/src/box/tuple_hash.cc:317:46: /home/shiny/dev/tarantool-ee/tarantool/src/lib/msgpuck/msgpuck.h:340:16: error: ‘value’ may be used uninitialized [-Werror=maybe-uninitialized] 340 | cast.d = val; | ~~~~~~~^~~~~ /home/shiny/dev/tarantool-ee/tarantool/src/box/tuple_hash.cc: In function ‘uint32_t tuple_hash_field(uint32_t*, uint32_t*, const char**, field_type, coll*)’: /home/shiny/dev/tarantool-ee/tarantool/src/box/tuple_hash.cc:311:24: note: ‘value’ was declared here 311 | double value; | ``` NO_TEST=build fix NO_CHANGELOG=build fix NO_DOC=build fix (cherry picked from commit 1129c758d0e3bd86eec89e5229eac3f99155d8ac)
-
- Oct 18, 2024
-
-
Andrey Saranchin authored
Since we often search for spaces, users, funcs and so on in internal caches that have the 'read-committed' isolation level (prepared tuples are visible), let's always allow reading prepared tuples of system spaces. Another advantage of this approach is that we never handle MVCC when working with system spaces, so after this commit they behave the same way - prepared tuples are visible. The only difference is that readers of prepared rows will be aborted if the row is rolled back. By the way, the inconsistency between internal caches and system spaces could lead to a crash in some sophisticated scenarios - the commit fixes this problem as well because system spaces and internal caches are now synchronized. Closes #10262 Closes tarantool/security#131 NO_DOC=bugfix (cherry picked from commit b33f17b25de6bcbe3ebc236250976e4a0250e75e)
-
Andrey Saranchin authored
Yielding DDL operations acquire the DDL lock so that the space cannot be modified under their feet. However, there is a case when it actually can: if a yielding DDL has started while another DDL is being committed, and that DDL gets rolled back due to a WAL error, the `struct space` created by the rolled back DDL is deleted - and it's the space being altered by the yielding DDL. In order to fix this problem, let's simply wait for all previous alters to be committed. We could use `wal_sync` to wait for all previous transactions to be committed, but it is more complicated - we need to use `wal_sync` for a single instance and `txn_limbo_wait_last_txn` when the limbo queue has an owner. Such an approach has more pitfalls and requires more tests to cover all cases. When relying on `struct alter_space` directly, all situations are handled with the same logic. Alternative solutions that we have tried: 1. Throw an error when the user tries to alter a space while there is another non-committed alter. Such an approach breaks applier since it applies rows asynchronously. Trying to make applier execute operations synchronously breaks it even harder. 2. Do not use the space in `build_index` and `check_format` methods. In this case, there is another problem: rollback order. We have to roll back previous alters first, and the in-progress one can be rolled back only after it's over. That breaks a fundamental memtx invariant: rollback order must be the reverse of replace order. We could try to use `before_replace` triggers for alter, but the patch would be bulky. Closes #10235 NO_DOC=bugfix (cherry picked from commit fee8c5dd6b16471739ed8512ba4137ff2e7274aa)
-
- Oct 16, 2024
-
-
Ilya Verbin authored
All structures with a non-default alignment (set by `alignas()`) must be allocated by `aligned_alloc()`, otherwise an access to such a structure member will crash, e.g. if compiled with AVX-512 support. See also commit a60ec82d4f07 ("box: fix SIGSEGV on unaligned access to a struct with extended alignment"). Closes #10699 NO_DOC=bugfix NO_CHANGELOG=minor NO_TEST=tested by debug_asan_clang workflow (cherry picked from commit bf091358806ed17bf44efd2cf382a43c0ba49fe0)
-
- Oct 15, 2024
-
-
Nikolay Shirokovskiy authored
If opts.identity is NULL and strdup fails, we dereference a NULL pointer when reporting the error. Let's just panic if strdup() fails. While at it, replace another strdup() with xstrdup() in this function. Our current approach is to panic on runtime OOM. Closes tarantool/security#128 NO_TEST=issue is not possible after the fix NO_CHANGELOG=not reproducible NO_DOC=bugfix (cherry picked from commit 47b72f44986797466b95b9431a381dbef7dd64fd)
-
- Oct 14, 2024
-
-
Ilya Verbin authored
The type cast is unnecessary and causes false-positive errors: NO_WRAP ``` ./src/box/field_map.c:110:10: runtime error: store to misaligned address 0x507000071082 for type 'uint32_t *' (aka 'unsigned int *'), which requires 4 byte alignment 0x507000071082: note: pointer points here 01 00 00 00 be be be be f0 ff ff ff 02 00 00 00 be be be be be be be be 00 00 00 00 00 00 00 00 ^ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ./src/box/field_map.c:110:10 ``` NO_WRAP Closes #10631 NO_DOC=bugfix NO_CHANGELOG=minor NO_TEST=tested by debug_asan_clang workflow (cherry picked from commit 5ddbd85cc377a29dc27d01ad06acdc6acc24cc5b)
-
Ilya Verbin authored
New commits: * mempool: fix UBSan errors regarding misaligned stores NO_DOC=submodule bump NO_TEST=submodule bump NO_CHANGELOG=submodule bump (cherry picked from commit 9dd56f49be85dc8a1fe874629711a828835f740c)
-
Ilya Verbin authored
All structures with a non-default alignment (set by `alignas()`) must be allocated by `aligned_alloc()`, otherwise an access to such a structure member will crash, e.g. if compiled with AVX-512 support. Closes #10215 Part of #10631 NO_DOC=bugfix NO_TEST=tested by debug_asan_clang workflow NO_CHANGELOG=fix is actually not user-visible, because tarantool still doesn't work with enabled AVX-512 (#10671) (cherry picked from commit a60ec82d4f07720148b0724e5feff31f76291b56)
-
Ilya Verbin authored
This reverts commit 3c25c667. `aligned_alloc()` is supported by macOS since 10.15. I believe that we do not support older versions now. NO_DOC=internal NO_TEST=internal NO_CHANGELOG=internal (cherry picked from commit 2f4594f748cff99d15f8f6d603797a308793de86)
-
- Oct 08, 2024
-
-
Vladimir Davydov authored
Currently, we just panic without providing any additional information if we fail to create the initial checkpoint on bootstrap. This complicates troubleshooting. Let's replace `panic()` with `say_error()` and raise the exception that caused the failure. The exception will be caught by `box_cfg()`, which will log it and then panic. NO_DOC=error logging NO_TEST=error logging NO_CHANGELOG=error logging (cherry picked from commit e1b5114d99ed2f224e9e9a17bf29882e50be3653)
-
- Oct 07, 2024
-
-
Nikita Zheleztsov authored
We decided to introduce a new schema version which does nothing, in order to distinguish which 2.11 schema we can safely allow persistent names on. Follow up #10549 NO_DOC=internal NO_CHANGELOG=internal NO_TEST=nothing to test
-
Vladislav Shpilevoy authored
The function replica_check_id() is called on any change in _cluster: insert, delete, update. It was supposed to check that the replica ID is valid - not nil, not out of range (VCLOCK_MAX). But it also raised an error when the ID matched this instance's ID unless the instance was joining. That happened even if a _cluster tuple was updated without changing the ID at all, for example, if one would just do _cluster:replace(_cluster:get(box.info.id)). It is better to do the check in the only place where the mutation can happen - on deletion. Since the replica ID is a primary key in _cluster, it can't be updated there - only inserted or deleted. This commit is backported to 2.11, since we want to allow using persistent names as early as we can in order to simplify the upgrade process. We also bump the schema version in the following commit in order to distinguish this version from other 2.11.X versions, where persistent names don't work. Closes #10549 NO_DOC=bugfix and refactoring NO_CHANGELOG=cannot happen without touching system spaces NO_TEST=too insignificant for an own test (cherry picked from commit cb8f4715)
-
Sergey Bronnikov authored
There is no check for NULL on the value returned by `ibuf_alloc`: NULL would be passed to `memcpy()` if that function returned NULL. The patch fixes that by replacing `ibuf_alloc` with the macro `xibuf_alloc`, which never returns NULL. Found by Svace. NO_CHANGELOG=codehealth NO_DOC=codehealth NO_TEST=codehealth (cherry picked from commit b4ee146fde6e418aed590ac6054cff75c2a59626)
-
Astronomax authored
This patch optimizes the process of collecting ACKs from replicas for synchronous transactions. Before this patch, collecting confirmations was slow in some cases. There was a possible situation where it was necessary to go through the entire limbo again every time the next ACK was received from the replica. This was especially noticeable in the case of a large number of parallel synchronous requests. For example, in the 1mops_write bench with parameters --fibers=6000 --ops=1000000 --transaction=1, performance increases by 13-18 times on small clusters of 2-4 nodes and 2 times on large clusters of 31 nodes. Closes #9917 NO_DOC=performance improvement NO_TEST=performance improvement (cherry picked from commit 4a866f64d64c610a3c8441835fee3d8dda5eca71)
-
Astronomax authored
Two new vclock methods have been added: `vclock_nth_element` and `vclock_count_ge`. * `vclock_nth_element` takes n and returns the element that would occur in the nth position if the vclock were sorted. This method is very useful for synchronous replication because it can be used to find the lsn of the last confirmed transaction - it's simply the result of calling this method with the argument {vclock_size - replication_synchro_quorum} (provided that vclock_size >= replication_synchro_quorum; otherwise it is obvious that no transaction has been confirmed yet). * `vclock_count_ge` takes lsn and returns the number of components whose value is greater than or equal to lsn. This can be useful to understand how many replicas have already received a transaction with a given lsn. Part of #9917 NO_CHANGELOG=Will be added in another commit NO_DOC=internal (cherry picked from commit 58f3c93b660499e85f08a4f63373040bcae28732)
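The semantics of the two methods can be sketched in Python (a model only; the actual C implementation may avoid sorting on every call, and `confirmed_lsn` is a hypothetical helper illustrating the quorum computation described above):

```python
def vclock_nth_element(vclock, n):
    # The element that would occupy position n (0-based) if the
    # vclock components were sorted in ascending order.
    return sorted(vclock)[n]

def vclock_count_ge(vclock, lsn):
    # How many components (replicas) are already at or past `lsn`.
    return sum(1 for c in vclock if c >= lsn)

def confirmed_lsn(vclock, quorum):
    # LSN of the last transaction acknowledged by `quorum` replicas.
    assert len(vclock) >= quorum
    return vclock_nth_element(vclock, len(vclock) - quorum)
```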
-
- Oct 04, 2024
-
-
Andrey Saranchin authored
According to the C standard, passing `NULL` to `memcpy` is UB, even if it copies nothing (number of bytes to copy is 0). The commit fixes such situation in memtx MVCC. Closes tarantool/security#129 NO_TEST=fix UB NO_CHANGELOG=fix UB NO_DOC=fix UB (cherry picked from commit 24d38cef5adff900bea2484235762678ac1c5234)
-
- Sep 25, 2024
-
-
Vladimir Davydov authored
Vinyl doesn't support altering the primary index of a non-empty space, but the check forbidding this isn't entirely reliable - the DDL function may yield to wait for pending WAL writes to finish after ensuring that the space doesn't contain any tuples. If a new tuple is inserted into the space in the meantime, the DDL operation will proceed to rebuild the primary index and trigger a crash because the code is written on the assumption that it's rebuilding a secondary index: ``` ./src/box/vinyl.c:1572: vy_check_is_unique_secondary_one: Assertion `lsm->index_id > 0' failed. ``` Let's fix this by moving the check after syncing on WAL. Closes #10603 NO_DOC=bug fix (cherry picked from commit 955537b57c2aade58b7ca42501a9bbe50dd91f26)
-
- Sep 23, 2024
-
-
Vladimir Davydov authored
`index.count()` may hang for too long in Vinyl if a substantial consecutive hunk of the space is stored in memory. Let's add a fiber slice check to it to prevent it from blocking the TX thread for too long. Closes #10553 NO_DOC=bug fix (cherry picked from commit e19bca5a74e83d2521fe770f2a93c3e3d3ad4801)
-
Vladimir Davydov authored
The tuple cache doesn't store historical data. It stores only the newest tuple versions, including prepared but not yet confirmed (committed but not written to WAL) tuples. This means that transactions sent to a read view shouldn't add any new chains to the cache because such a chain may bypass a tuple invisible from the read view. A transaction may be sent to a read view in two cases: 1. If some other transaction updates data read by it. 2. If the transaction is operating in the 'read-confirmed' isolation mode and skips an unconfirmed tuple while scanning the memory level. This was added in commit 588170a7 ("vinyl: implement transaction isolation levels"). The second point should be checked by the read iterator itself, and it indeed is for the standard case when we scan the memory level before reading the disk. However, there's a second case: if some other tuples are inserted into the memory level while the read iterator is waiting for a disk read to complete, it rescans the memory level and may skip a new unconfirmed tuple that wasn't there the first time it scanned the memory level. Currently, if this happens, it won't send itself to a read view and may corrupt the cache by inserting a chain that skips over the unconfirmed tuple. Fix this by adding the missing check. While we are at it, let's simplify the code a bit by moving the check inside `vy_read_iterator_scan_mem()`. It's okay because sending to a read view a transaction that's already in the read view is handled correctly by `vy_tx_send_to_read_view()`. Closes #10558 NO_DOC=bug fix (cherry picked from commit a3feee322e76a1e10ab874e63f17f97b6457b59d)
-