- Dec 25, 2020
-
-
Sergey Bronnikov authored
The serpent module was dropped in commit b53cb2ae ("console: drop unused serpent module"), but a comment that belonged to the module was left in the luacheck config.
-
Sergey Bronnikov authored
Closes #5454
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Igor Munkin <imun@tarantool.org>
Co-authored-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Co-authored-by: Igor Munkin <imun@tarantool.org>
-
Sergey Bronnikov authored
Closes #5453
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Igor Munkin <imun@tarantool.org>
Co-authored-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Co-authored-by: Igor Munkin <imun@tarantool.org>
-
Serge Petrenko authored
Follow-up #5435
-
Serge Petrenko authored
We designed the limbo so that it errors on receiving a CONFIRM or ROLLBACK for another instance's data. Actually, this error is pointless, and even harmful. Here's why: imagine you have 3 instances: 1, 2 and 3. First, 1 writes some synchronous transactions but dies before writing CONFIRM. Now 2 has to write CONFIRM instead of 1 to take limbo ownership. From now on 2 is the limbo owner, and under high enough load it constantly has some data in the limbo. Once 1 restarts, it first recovers its xlogs and fills its limbo with its own unconfirmed transactions from the previous run. Now replication between 1, 2 and 3 is started, and the first thing 1 sees is that 2 and 3 ack its old transactions. So 1 writes a CONFIRM for its own transactions even before the same CONFIRM written by 2 reaches it. Once the CONFIRM written by 1 is replicated to 2 and 3, they error and stop replication, since their limbo contains entries from 2, not from 1. Actually, there's no need to error, since it's just a really old CONFIRM which has already been processed by both 2 and 3. So, ignore CONFIRM/ROLLBACK when it references a wrong limbo owner. The issue was discovered with the test replication/election_qsync_stress. Follow-up #5435
-
Serge Petrenko authored
The test involves writing synchronous transactions on one node and making the other nodes confirm these transactions after its death. In order for the test to work properly, we need to make sure the old node replicates all its transactions to its peers before killing it. Otherwise, once the node is resurrected, it'll have newer data not present on the other nodes, which leads to their vclocks being incompatible, no one becoming the new leader, and the test hanging. Follow-up #5435
-
Serge Petrenko authored
It is possible that a new leader (elected either via raft, or manually, or via some user-written election algorithm) loses the data that the old leader has successfully committed and confirmed. Imagine such a situation: there are N nodes in a replicaset, and the old leader, denoted A, tries to apply some synchronous transaction. It is written on the leader itself and on N/2 other nodes, one of which is B. The transaction has thus gathered a quorum, N/2 + 1 acks. Now A writes CONFIRM and commits the transaction, but dies before the confirmation reaches any of its followers. B is elected the new leader, and it sees that A's last transaction is present on N/2 nodes, so it doesn't have a quorum (A was one of the N/2 + 1). The current `clear_synchro_queue()` implementation makes B roll the transaction back, leading to a rollback after commit, which is unacceptable. To fix the problem, make `clear_synchro_queue()` wait until all the rows from the previous leader gather `replication_synchro_quorum` acks. In case the quorum isn't achieved within replication_synchro_timeout, roll back nothing and wait for the user's intervention. Closes #5435
Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
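As a usage sketch of the reworked behavior on the new leader (assuming `box.ctl.clear_synchro_queue()` is the Lua-visible entry point, and with illustrative values for the quorum and timeout):

```
-- On the instance taking over leadership: tune how many acks the old
-- leader's pending rows must gather and how long to wait for them.
box.cfg{
    replication_synchro_quorum = 2,
    replication_synchro_timeout = 30,
}
-- Waits for the quorum; on timeout nothing is rolled back and the
-- queue is left for the user's intervention, as described above.
box.ctl.clear_synchro_queue()
```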
-
Serge Petrenko authored
It'll be useful for the box_clear_synchro_queue() rework. Prerequisite #5435
-
Vladislav Shpilevoy authored
The trigger is fired every time any of the relays notifies tx of a change in the replica's known vclock. The trigger will be used to collect a synchronous transaction quorum for the old leader's transactions. Part of #5435
-
Serge Petrenko authored
clear_synchro_queue() isn't meant to be called multiple times on a single instance. Multiple simultaneous invocations of clear_synchro_queue() shouldn't hurt now, since clear_synchro_queue() simply exits on an empty limbo, but they may be harmful in the future, when clear_synchro_queue() is reworked. Prohibit such misuse by introducing an execution guard and raising an error once a duplicate invocation is detected. Prerequisite #5435
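The guard itself lives in the C implementation; a Lua sketch of the same pattern, with hypothetical names, purely for illustration:

```
-- Hypothetical illustration of the execution guard described above.
local in_clear_synchro_queue = false

local function clear_synchro_queue()
    if in_clear_synchro_queue then
        error("clear_synchro_queue() is already running")
    end
    in_clear_synchro_queue = true
    -- ... drain the limbo here ...
    in_clear_synchro_queue = false
end
```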
-
Sergey Bronnikov authored
Closes #5538
-
Sergey Bronnikov authored
For Python 3, PEP 3106 changed the design of the dict builtin and the mapping API in general, replacing the separate list-based and iterator-based APIs of Python 2 with a merged, memory-efficient set- and multiset-view based API. This new style of dict iteration was also added to the Python 2.7 dict type as a new set of iteration methods. PEP 469 [1] recommends replacing d.iteritems() with iter(d.items()) to make code compatible with Python 3.
1. https://www.python.org/dev/peps/pep-0469/
Part of #5538
-
Sergey Bronnikov authored
The largest change in Python 3 is the handling of strings. In Python 2, the str type was used for two different kinds of values, text and bytes, whereas in Python 3 these are separate and incompatible types. The patch converts strings to byte strings where required to make the tests compatible with Python 3. Part of #5538
-
Sergey Bronnikov authored
In Python 2.x, calling items() makes a copy of the keys that you can iterate over while modifying the dict. This doesn't work in Python 3.x, because items() returns an iterator instead of a list, and Python 3 raises a "dictionary changed size during iteration" exception. To work around it, one can use list() to force a copy of the keys to be made. Part of #5538
-
Sergey Bronnikov authored
- convert the print statement to a function: in Python 3, 'print' becomes a function, see [1]; the patch makes 'print' in the regression tests compatible with Python 3.
- according to PEP 8, mixing double and single quotes in a project looks inconsistent; the patch makes the quoting of strings consistent.
- use "format()" instead of "%" everywhere.
1. https://docs.python.org/3/whatsnew/3.0.html#print-is-a-function
Part of #5538
-
Serge Petrenko authored
Report box.stat().*.total, box.stat.net().*.total and box.stat.net().*.current via the feedback daemon report. Accompany this data with the time when the report was generated, so that it is possible to calculate RPS from this data on the feedback server. `box.stat().OP_NAME.total` resides in `feedback.stats.box.OP_NAME.total`, while `box.stat.net().OP_NAME.total` resides in `feedback.stats.net.OP_NAME.total`. The time of report generation is located at `feedback.stats.time`. Closes #5589
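For instance, the server side could derive RPS from two consecutive decoded reports like this (a sketch, assuming `feedback.stats.time` is a numeric timestamp in seconds):

```
-- Given two decoded reports, compute requests per second for one
-- operation, e.g. rps(prev, cur, 'SELECT').
local function rps(prev, cur, op)
    local dt = cur.stats.time - prev.stats.time
    return (cur.stats.box[op].total - prev.stats.box[op].total) / dt
end
```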
-
- Dec 24, 2020
-
-
Cyrill Gorcunov authored
We have a feedback server which gathers information about a running instance. While general info is enough for now, we may lose precious information about crashes (such as the call backtrace which caused the issue, the build type, etc). In this commit we add support for sending this kind of information to the feedback server. Internally we gather the reason of the failure, pack it into base64 form and then run another Tarantool instance which sends it out. A typical report might look like

 | {
 |   "crashdump": {
 |     "version": "1",
 |     "data": {
 |       "uname": {
 |         "sysname": "Linux",
 |         "release": "5.9.14-100.fc32.x86_64",
 |         "version": "#1 SMP Fri Dec 11 14:30:38 UTC 2020",
 |         "machine": "x86_64"
 |       },
 |       "build": {
 |         "version": "2.7.0-115-g360565efb",
 |         "cmake_type": "Linux-x86_64-Debug"
 |       },
 |       "signal": {
 |         "signo": 11,
 |         "si_code": 0,
 |         "si_addr": "0x3e800004838",
 |         "backtrace": "#0 0x630724 in crash_collect+bf\n...",
 |         "timestamp": "2020-12-23 14:42:10 MSK"
 |       }
 |     }
 |   }
 | }

There is no simple way to test this, so I did it manually:
1) Run an instance with box.cfg{log_level = 8, feedback_host = "127.0.0.1:1500"}
2) Run a listener shell as: while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done
3) Send SIGSEGV: kill -11 `pidof tarantool`
Once SIGSEGV is delivered, the crashinfo data is generated and sent out. For debug purposes this data is also printed to the terminal on the debug log level.
Closes #5261
Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
@TarantoolBot document
Title: Configuration update, allow to disable sending crash information
For better analysis of program crashes, the information associated with the crash, such as:
- utsname (similar to `uname -a` output except the network name)
- build information
- reason for the crash
- call backtrace
is sent to the feedback server. To disable it, set `feedback_crashinfo` to `false`.
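Per the docbot request above, opting out is a single dynamic configuration change:

```
-- Keep the feedback daemon running but stop sending crash dumps.
box.cfg{feedback_crashinfo = false}
```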
-
Cyrill Gorcunov authored
When SIGSEGV or SIGFPE reaches tarantool, we try to gather all information related to the crash and print it out to the console (well, stderr actually). Still, there is a request to not just show this info locally but to send it out to the feedback server. Thus, to keep the gathering of crash-related information in one module, we move fatal signal handling into the separate crash.c file. This allows us to collect the data we need in one place and reuse it when we need to send reports to stderr (and to the feedback server, which will be implemented in the next patch). Part-of #5261
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
This will allow reusing this routine in crash reports. Part-of #5261
Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
It is very convenient to have this string extension. We will use it in crash handling.
Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Sergey Nikiforov authored
Added a corresponding test.
Fixes: #5307
-
Alexander V. Tikhonov authored
A Fedora 32 gitlab-ci packaging job was added in commit 507c47f7a829581cc53ba3c4bd6a5191d088cdf ("gitlab-ci: add packaging for Fedora 32"), but it also had to be enabled in the update_repo tool to be able to save packages in S3 buckets. Follows up #4966
-
Cyrill Gorcunov authored
Part-of #5446
Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
When we fetch the replication_synchro_quorum value (either as a plain integer or via formula evaluation), we trim the number down to an integer, which silently hides potential overflow errors. For example,

 | box.cfg{replication_synchro_quorum = '4294967297'}

is 1 in terms of machine words. Let's use 8-byte values and trigger an error instead. Part-of #5446
Reported-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
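A sketch of the behavior change (the exact error text is not specified here, so treat it as an assumption):

```
-- 4294967297 is 2^32 + 1: it used to be silently truncated to 1;
-- with 8-byte handling the out-of-range value makes box.cfg fail.
local ok, err = pcall(box.cfg, {replication_synchro_quorum = '4294967297'})
assert(not ok)
```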
-
Cyrill Gorcunov authored
When synchronous replication is used, we prefer the user to specify a quorum number, i.e. the number of replicas where data must be replicated before the master node continues accepting new transactions. This is not very convenient, since the user may not know initially how many replicas will be used. Moreover, the number of replicas may vary dynamically. For this sake, we allow specifying the quorum in a symbolic way. For example,

    box.cfg {
        replication_synchro_quorum = "N/2+1",
    }

where `N` is the number of registered replicas in a cluster. Once a new replica is attached or an old one is detached, the number is recalculated and propagated. Internally, on each replica_set_id() and replica_clear_id(), i.e. at the moment a replica gets registered or unregistered, we call the box_update_replication_synchro_quorum() helper, which finds out whether evaluation of replication_synchro_quorum is needed, and if so, calculates the new replication_synchro_quorum value based on the number of currently registered replicas. Then we notify the dependent systems, such as qsync and raft, to update their guts. Note: we do *not* change the default setting for this option; it remains 1 by default for now. Changing the default should be done in a separate commit once we make sure that everything is fine. Closes #5446
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
@TarantoolBot document
Title: Support dynamic evaluation of synchronous replication quorum
Setting the `replication_synchro_quorum` option to an explicit integer value was introduced mostly for simplicity's sake. For example, if the cluster's size is not constant and new replicas are connected dynamically, an administrator might need to increase the option by hand or with some other external tool. Instead, one can use dynamic evaluation of the quorum value via a formal representation, using the symbol `N` for the current number of registered replicas in a cluster. For example, the canonical definition of a quorum (i.e. a majority of members in a set) of `N` replicas is `N/2+1`. For such a configuration, define
```
box.cfg {replication_synchro_quorum = "N/2+1"}
```
The formal statement allows for flexible configuration, but keep in mind that only the canonical quorum (and bigger values, say `N` for all replicas) guarantees data reliability; various weird forms such as `N/3+1`, while allowed, may lead to unexpected results.
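The evaluation itself happens in the server's C code; the idea can be sketched in Lua (`eval_quorum` is a hypothetical helper, not the actual implementation):

```
-- Substitute the registered replica count for N, evaluate the
-- arithmetic expression, and truncate the result to an integer.
local function eval_quorum(formula, replica_count)
    local expr = formula:gsub('N', tostring(replica_count))
    local chunk = assert(load('return ' .. expr))
    return math.floor(chunk())
end

assert(eval_quorum('N/2+1', 5) == 3) -- canonical majority of 5 replicas
```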
-
Cyrill Gorcunov authored
Currently, the box_check_replication_synchro_quorum() helper tests that the "replication_synchro_quorum" value is valid and returns the value itself for later use in the code. This is fine for regular numbers, but since we are going to support formula evaluation, the real value to use will be dynamic, and returning a number "to use" won't be convenient. Thus, let's change the contract: make box_check_replication_synchro_quorum() return 0|-1 for success|failure, and when the real value is needed, fetch it explicitly via a cfg_geti() call. To make this more explicit, the real update of the appropriate variable is done via the box_update_replication_synchro_quorum() helper. Part-of #5446
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
We will need it to figure out whether a parameter is a numeric value when doing the configuration check. Part-of #5446
Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Mergen Imeev authored
Prior to this patch, the region on fiber was reset during select(), get(), count(), max(), or min(). This would result in an error if one of these operations was used in a user-defined function in SQL. After this patch, these functions truncate the region instead of resetting it. Closes #5427
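A hypothetical Lua-level repro of the affected scenario, i.e. a persistent function exported to SQL that calls one of the listed operations (GETTER and its body are illustrative, not from the patch):

```
-- GETTER calls get() from inside an SQL statement; before the fix this
-- reset the fiber region that the calling SQL engine was still using.
box.schema.func.create('GETTER', {
    language = 'LUA',
    returns = 'boolean',
    param_list = {},
    exports = {'LUA', 'SQL'},
    body = [[function() return box.space._schema:get{'version'} ~= nil end]],
})
box.execute([[SELECT "GETTER"();]])
```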
-
- Dec 23, 2020
-
-
Nikita Pettik authored
Accidentally, the built-in declaration list specified that ifnull() can return only integer values, whereas it should return SCALAR: ifnull() returns its first non-null argument, so the type of the return value depends on the types of the arguments. Let's fix this and set the return type of ifnull() to SCALAR.
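So, for example, both of these now type-check, since the result follows the arguments rather than being forced to INTEGER:

```
-- The first non-NULL argument wins, so the result may be a string,
-- a double, etc. - hence the SCALAR return type.
box.execute([[SELECT ifnull(NULL, 'text'), ifnull(NULL, 1.5);]])
```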
-
Mergen Imeev authored
After this patch, the persistent functions "box.schema.user.info" and "LUA" will have the same rights as the user who executes them. The problem was that setuid was unnecessarily set. Because of this, these functions had the same rights as the user who created them, whereas they must have the rights of the user who invokes them. Fixes tarantool/security#1
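One way to observe the fix (a sketch; `box.func` exposes the definitions of persistent functions):

```
-- After the patch the built-ins are not setuid anymore, so they run
-- with the caller's rights instead of the creator's.
assert(box.func['box.schema.user.info'].setuid == false)
assert(box.func['LUA'].setuid == false)
```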
-
Sergey Kaplun authored
A platform panic occurs when fiber.yield() is used within any active (i.e. currently being executed) hook. It is a regression caused by 96dbc49d ('lua: prohibit fiber yield when GC hook is active'). This patch fixes the false positive panic in cases when the VM is not running a GC hook. Relates to #4518 Closes #5649
Reported-by: Michael Filonenko <filonenko.mikhail@gmail.com>
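A minimal sketch of the case that used to panic and must not anymore: an ordinary debug hook (not a GC one) that yields the fiber.

```
local fiber = require('fiber')
local f = fiber.new(function()
    -- The line hook yields the fiber; before the fix this tripped the
    -- GC-hook panic even though no GC hook was running.
    debug.sethook(function() fiber.yield() end, 'l')
    local _ = 1 + 1 -- any line triggers the hook
    debug.sethook() -- drop the hook
end)
f:set_joinable(true)
f:join()
```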
-
Alexander V. Tikhonov authored
Added packaging jobs for Fedora 32. Closes #4966
-
Alexander V. Tikhonov authored
Found that the test replication/skip_conflict_row.test.lua fails with this output in the results file:

[035] @@ -139,7 +139,19 @@
[035]  -- applier is not in follow state
[035]  test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
[035]  ---
[035] -- true
[035] +- false
[035] +- id: 1
[035] +  uuid: f2084d3c-93f2-4267-925f-015df034d0a5
[035] +  lsn: 553
[035] +  upstream:
[035] +    status: follow
[035] +    idle: 0.0024020448327065
[035] +    peer: unix/:/builds/4BUsapPU/0/tarantool/tarantool/test/var/035_replication/master.socket-iproto
[035] +    lag: 0.0046234130859375
[035] +  downstream:
[035] +    status: follow
[035] +    idle: 0.086121961474419
[035] +    vclock: {2: 3, 1: 553}
[035]  ...
[035]  --
[035]  -- gh-3977: check that NOP is written instead of conflicting row.

The test could not be restarted with a checksum, because values like the UUID change on each failure. This happened because test-run uses an internal chain of functions, wait_upstream() -> gen_box_info_replication_cond(), which returns instance information on failure. To avoid this, the output was redirected to the log file instead of the results file.
-
Alexander V. Tikhonov authored
Since the current testing schema uses a separate pipeline for each testing job, the workflow names should be the same as the job names, to make them more visible on the github actions results page [1]. [1] - https://github.com/tarantool/tarantool/actions
-
- Dec 22, 2020
-
-
mechanik20051988 authored
There was an option, 'force_recovery', that makes tarantool ignore some problems during xlog recovery. This patch changes the option's behavior and makes tarantool also ignore some errors during snapshot recovery, just like during xlog recovery. Error types which can be ignored:
- the snapshot is somehow truncated, but after the necessary system spaces
- the snapshot has some garbage after its declared length
- a single tuple within the snapshot has a broken checksum and may be skipped without consequences (in this case the whole row with this tuple is ignored)
@TarantoolBot document
Title: Change 'force_recovery' option behavior
Change the 'force_recovery' option behavior to allow tarantool to load from a broken snapshot. Closes #5422
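The knob itself is unchanged; it now simply covers snapshot recovery as well:

```
-- Set in the instance file before recovery starts; with it, the
-- snapshot defects listed above are skipped instead of aborting startup.
box.cfg{force_recovery = true}
```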
-
Alexander V. Tikhonov authored
Found that jobs triggered by the push and pull_request filters run duplicates of each other [1][2]. To avoid this, an additional module was found [3]. Entire jobs are now skipped for duplicate runs, as well as for previously queued runs that were already superseded [4].
[1] - https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012
[2] - https://github.community/t/how-to-trigger-an-action-on-push-or-pull-request-but-not-both/16662
[3] - https://github.com/fkirc/skip-duplicate-actions#concurrent_skipping
[4] - https://github.com/fkirc/skip-duplicate-actions#option-1-skip-entire-jobs
-
Alexander V. Tikhonov authored
Added a standalone job with a Coverity check, as described at [1]. This job uploads its results to the coverity.com host, to the 'tarantool' project, when the COVERITY_TOKEN environment variable is set. The main Coverity functionality was added to the .travis.mk make file as standalone targets:
'test_coverity_debian_no_deps' - used in github-ci actions
'coverity_debian' - an additional target with a check for the needed tools
This job is configured with a cron schedule to run each Saturday at 04:00 am. Closes #5600
[1] - https://scan.coverity.com/download?tab=cxx
-
Alexander V. Tikhonov authored
Moved saving of coverage results to the coveralls.io repository from travis-ci to github-ci. Completely removed travis-ci from the commit criteria. Part of #5294
-
Alexander V. Tikhonov authored
Implemented github-ci actions workflow OSX jobs on commits:
- OSX 10.15
- OSX 11.0
Part of #5294
-
Alexander V. Tikhonov authored
Implemented a github-ci actions workflow on commits. Added a group of CI jobs:
1) on Debian 9 ("Stretch"):
- luacheck
- release
- debug_coverage
- release_clang
- release_lto
2) on Debian 10 ("Buster"):
- release_lto_clang11
- release_asan_clang11
Part of #5294
-