- Jun 23, 2021
-
-
Cyrill Gorcunov authored
We already have the `box.info.replication[n].upstream.lag` entry for monitoring purposes. At the same time, in synchronous replication timeouts are key properties of the quorum gathering procedure. Thus we would like to know how long it takes a transaction to traverse the `initiator WAL -> network -> remote applier -> initiator ACK reception` path. Typical output:

| tarantool> box.info.replication[2].downstream
| ---
| - status: follow
|   idle: 0.61753897101153
|   vclock: {1: 147}
|   lag: 0
| ...
| tarantool> box.space.sync:insert{69}
| ---
| - [69]
| ...
|
| tarantool> box.info.replication[2].downstream
| ---
| - status: follow
|   idle: 0.75324084801832
|   vclock: {1: 151}
|   lag: 0.0011014938354492
| ...

Closes #5447

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

@TarantoolBot document
Title: Add `box.info.replication[n].downstream.lag` entry
`replication[n].downstream.lag` represents the lag between the moment the main node writes a certain transaction to its own WAL and the moment it receives an ack for this transaction from a replica.
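Below is a minimal monitoring sketch of the new entry; the replica id `2`, the space name `sync`, and the logging call are assumptions carried over from the output above, not part of the patch:

```lua
-- Hypothetical health check: log the WAL-to-ACK lag of replica 2
-- after a synchronous write (names follow the example above).
local log = require('log')
box.space.sync:insert{69}
local ds = box.info.replication[2].downstream
if ds ~= nil and ds.lag ~= nil then
    log.info('replica 2 ack lag: %s seconds', ds.lag)
end
```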
-
- Jun 16, 2021
-
-
Vladislav Shpilevoy authored
When txn_commit/try_async() failed before going to the WAL thread, they installed the TXN_SIGNATURE_ABORT signature, meaning that the caller and the rollback triggers must look at the global diag. But they called txn_rollback() before returning and calling the triggers, which overrode the signature with TXN_SIGNATURE_ROLLBACK, leading to the loss of the original error. The patch makes TXN_SIGNATURE_ROLLBACK installed only when a real rollback happens (via box_txn_rollback()). This way original commit errors, such as a conflict in the transaction manager or OOM, are no longer lost. Besides, ERRINJ_TXN_COMMIT_ASYNC does not need its own diag_log() anymore, because since this commit the applier logs the correct error instead of ER_WAL_IO/ER_TXN_ROLLBACK. Closes #6027
-
- Jun 12, 2021
-
-
mechanik20051988 authored
All iproto threads listen on the same socket, and if the user changed the listen address, this socket was closed in each iproto thread. This patch fixes the error: now the socket is closed only in the main thread, while the other threads merely stop listening, without closing the socket. This patch also fixes an error where tarantool did not delete the unix socket path when finishing its work.
-
mechanik20051988 authored
At the moment, the user can set any number of iproto threads, which leads to incorrect behavior if the specified number of threads is less than or equal to zero, or is too large. Added a check for user input of the number of iproto threads: the value must be > 0 and less than or equal to 1000. Closes #6005

@TarantoolBot document
Title: Add check for user input of the number of iproto threads
Added a check for user input of the number of iproto threads: the value must be > 0 and less than or equal to 1000.
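A hedged sketch of the new boundary check as seen from Lua (the exact error messages are not taken from the source):

```lua
-- Values outside 1..1000 are now rejected (applies to the
-- initial box.cfg call, since iproto_threads is not dynamic).
local ok = pcall(box.cfg, {iproto_threads = 0})
assert(not ok)
ok = pcall(box.cfg, {iproto_threads = 1001})
assert(not ok)
box.cfg{iproto_threads = 4} -- a value in range is accepted
```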
-
- Jun 11, 2021
-
-
Vladislav Shpilevoy authored
There was a bug that a new replica joining an election-enabled cluster sometimes tried to register on a non-leader node, which couldn't write to _cluster, so the join failed with an ER_READONLY error. Now, in the scope of #5613, the algorithm of join-master selection is changed. A new node looks for writable members of the cluster to use as a join-master. It will not choose a follower if there is a leader. Closes #6127
-
- Jun 10, 2021
-
-
Vladislav Shpilevoy authored
The algorithm of looking for an instance to join the replicaset from didn't take into account that some of the instances might be not bootstrapped but still perfectly available. As a result, a ridiculous situation could happen: an instance could connect to a cluster with only read-only instances, while itself having box.cfg{read_only = false}. Then, instead of failing or waiting, it simply booted a brand new cluster. And after that the node just started complaining about the others having a different replicaset UUID. The patch makes a new instance always prefer a bootstrapped join-source to a non-bootstrapped one, including itself. In the situation above the new instance now terminates with an error. In the future it should hopefully start a retry loop instead. Closes #5613

@TarantoolBot document
Title: IPROTO_BALLOT rework and a new field
A couple of fields in `IPROTO_BALLOT 0x29` used to have values not matching their names. They are changed.

* `IPROTO_BALLOT_IS_RO 0x01` used to mean "the instance has `box.cfg{read_only = true}`". It was renamed in the source code to `IPROTO_BALLOT_IS_RO_CFG`. It has the same code `0x01`, and the value is the same. Only the name has changed, and in the doc it should change too.

* `IPROTO_BALLOT_IS_LOADING 0x04` used to mean "the instance has finished `box.cfg()` and it has `read_only = true`". The name was therefore wrong, because even if the instance finished loading, the flag was still false for `read_only = true` nodes. Also such a value is not very suitable for any sane usage. The name was changed to `IPROTO_BALLOT_IS_RO`, the code stayed the same, and the value now is "the instance is not writable". The reason for being not writable can be any: the node is an orphan; or it has `read_only = true`; or it is a Raft follower; or anything else.

And there is a new field. `IPROTO_BALLOT_IS_BOOTED 0x06` means the instance has finished its bootstrap or recovery.
-
- Jun 07, 2021
-
-
Vladislav Shpilevoy authored
If the Raft state machine sees that the current leader has explicitly resigned from its role, it starts a new election round right away. But the code starting a new round assumed that there is no volatile state. In fact, there was. The patch makes the election start code use the volatile state to bump the term. It should be safe, because the other nodes won't receive it anyway until the new term is persisted. There was an alternative: do not schedule a new election until the current WAL write ends. It wasn't done, because it would achieve the same (the term would be bumped and persisted) but with a bigger latency. Another reason is that if the leader appeared and resigned during a WAL write on another candidate, at the end of its WAL write the latter would see no leader and would think this term didn't have one yet. And it would try to elect itself now, in the current term. That makes little sense, because it won't win: the current term already had a leader, and the majority of votes is already taken. Closes #6129
-
Mergen Imeev authored
This patch introduces a new SQL built-in function UUID(). Closes #5886

@TarantoolBot document
Title: SQL built-in function UUID()
The SQL built-in function UUID() takes zero or one argument. If no argument is specified, a UUID v4 is generated. If the version of the UUID to generate is specified as an argument, the function returns a new UUID of the given version. Currently only version 4 of UUID is supported.
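A brief sketch of the two documented call forms, issued through `box.execute()`:

```lua
-- Generate a random UUID (v4 by default) and an explicitly
-- versioned one; only version 4 is currently supported.
box.execute([[SELECT UUID();]])
box.execute([[SELECT UUID(4);]])
```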
-
- Jun 03, 2021
-
-
Mergen Imeev authored
This patch allows VARBINARY to be returned for user-defined Lua functions. However, there are currently no values that can be interpreted as VARBINARY by the serializer, so the only way to get a VARBINARY result for user-defined Lua functions is to return a UUID or DECIMAL. Both types are not supported by SQL and are treated as VARBINARY. Closes #6024
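A hedged sketch of such a function; the name `get_id` and the exact creation options are illustrative, not from the source:

```lua
-- Hypothetical SQL-exported Lua function returning a UUID,
-- which the SQL layer treats as VARBINARY.
box.schema.func.create('get_id', {
    language = 'LUA',
    returns = 'varbinary',
    body = [[function() return require('uuid').new() end]],
    param_list = {},
    exports = {'LUA', 'SQL'},
})
box.execute([[SELECT TYPEOF("get_id"());]])
```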
-
Nikita Pettik authored
In 0e37af31 an optimization eliminating INSERT+DELETE and DELETE+INSERT statements by the same key in a write set was introduced. It is fine until it comes to secondary index build. While we are building a secondary index, we save the current lsn, set an on_replace trigger forwarding new requests to the secondary index, and copy tuples (to be more precise, keys) row-by-row to the secondary index while the lsn of a tuple is less than the one we preserved at the start. Now, if during the index build we execute a request replacing a key that hasn't yet been transferred to the secondary index, we will get a missing key in the secondary index since:

a) in the on_replace trigger the replace is split into DELETE+INSERT and eliminated by the mentioned optimization (the same concerns a simple pair of DELETE+INSERT requests made in a single transaction, so that they get into one write set);

b) it is skipped in the main loop transferring tuples from PK to SK, since the lsn of modified tuples is greater than the saved lsn.

In this respect, we may get missing tuples in the secondary index. The proposed solution is quite trivial: we are able to track that the index is still being created (see the previous commit), so we won't apply the INSERT+DELETE annihilation if the index build is not finished. Closes #6045
-
- Jun 02, 2021
-
-
Vladislav Shpilevoy authored
A remote node doing the subscribe might be from a different replicaset. Before this patch the subscribe would be retried infinitely, because the node couldn't be found in _cluster, and the master assumed it must have joined to another node and its ID should arrive shortly (ER_TOO_EARLY_SUBSCRIBE). The ID would never arrive, because the node belongs to another replicaset. The patch makes the master check whether the peer lives in the same replicaset. Since it is doing a subscribe, it must have joined already and should have a valid replicaset UUID, regardless of whether it is anonymous or not. The correct behaviour is to hard cut this peer off immediately, without retries. Closes #6094 Part of #5613
-
- Jun 01, 2021
-
-
Vladislav Shpilevoy authored
It is possible that a new async transaction is added to the limbo when there is an in-progress CONFIRM WAL write for all the pending sync transactions. Then, when the CONFIRM WAL write is done, it might see that the limbo now contains, in the first place, an async transaction not yet written to WAL. A suspicious situation: on one hand, the async transaction does not have any blocking sync txns before it and can be considered complete; on the other hand, its WAL write is not done and it is not complete. Before this patch it resulted in a crash: the limbo didn't consider the situation possible at all. Now, when a CONFIRM covers not-yet-written async transactions, they are removed from the limbo and turned into plain transactions. When their WAL write is done, they see they no longer have the TXN_WAIT_SYNC flag and don't even need to interact with the limbo. It is important to remove them from the limbo right when the CONFIRM is done, because otherwise their limbo entry might not be removed at all when this happens on a replica. On a replica the limbo entries are removed only by CONFIRM/ROLLBACK/PROMOTE. If there were an async transaction in the first position in the limbo queue, it wouldn't be deleted until the next sync transaction appears. This replica case is not possible now, though, because all synchro entries on the applier are written in a blocking way. Nonetheless, if it ever becomes non-blocking, the code should handle it fine. Closes #6057
-
Cyrill Gorcunov authored
Currently the `log` module accepts only numeric values of logging levels. In turn, the `box.cfg` interface supports symbolic names (such as 'fatal', 'crit', etc.). Thus we should support the same in the `log` module. Closes #5882

Reported-by: Alexander Turenko <alexander.turenko@tarantool.org>
Acked-by: Alexander Turenko <alexander.turenko@tarantool.org>
Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
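A minimal sketch of the now-symmetric interfaces, assuming the `log.cfg{}` entry point accepts the same level spellings as `box.cfg`:

```lua
-- Symbolic level names now work in the log module as well
-- (numeric values remain valid).
local log = require('log')
log.cfg{level = 'verbose'}      -- same effect as level = 6
box.cfg{log_level = 'verbose'}  -- already supported before
```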
-
- May 27, 2021
-
-
Iskander Sagitov authored
It is strange to create a new fiber and see that it has yielded 100 times when in fact it never actually did. The patch makes fiber->csw = 0 for each created fiber. Follow-up #5799
-
Iskander Sagitov authored
If you want to get information about, or the csw (Context SWitch count) of, some fiber, you need to call fiber.info(), but it creates a table with information about all the fibers. This patch introduces fiber_object:info() and fiber_object:csw() to solve this problem. Closes #5799

@TarantoolBot document
Title: introduce fiber_object:info() and fiber_object:csw()
```
-- fiber_object:info() is the same as fiber.info(), but shows
-- information only about one alive fiber.
-- fiber_object:csw() shows the csw (Context SWitch count) of an
-- alive fiber.
```
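A short usage sketch of the two new methods on the current fiber (the field access is illustrative):

```lua
-- Per-fiber introspection without scanning the whole
-- fiber.info() table.
local fiber = require('fiber')
local f = fiber.self()
print(f:csw())        -- context switches of this fiber only
print(f:info().name)  -- info table for this fiber only
```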
-
- May 25, 2021
-
-
Vladislav Shpilevoy authored
The Lua json module used to have a global buffer for all encodings, reused by each next encode(). This was not correct, because during encode() a GC step might happen, which might call encode() again and spoil the global buffer. The same problem was already fixed for the global static buffer in the scope of #5632. Similarly to that time, the patch makes the Lua json module use cord_ibuf to prevent "concurrent" usage of the buffer data. The global buffer is deleted. According to a few microbenchmarks it didn't affect the performance at all. The core part of the patch is the strbuf changes. Firstly, its destruction is now optional: cord_ibuf can free itself on a next yield. Secondly, its reallocation algorithm is kept intact: ibuf is used as an allocator, not as the buffer itself. This is done so as not to be too intrusive in the third-party module, which might need an upgrade to the upstream in the future. Closes #6050
-
- May 24, 2021
-
-
Mergen Imeev authored
Prior to this patch, UUID was not part of SCALAR. However, this had to be changed to comply with the RFC "Consistent Lua/SQL types". Closes #6042

@TarantoolBot document
Title: UUID is now part of SCALAR
The UUID field type is now part of the SCALAR field type. This means that values of type UUID can now be inserted into a SCALAR field, and these values can participate in the sorting of SCALAR fields. The order is as follows: boolean < number < string < varbinary < uuid.
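A hedged sketch of the new behavior; the space layout and names are illustrative:

```lua
-- UUID values can now be stored in a SCALAR field alongside
-- other scalar types and participate in SCALAR index ordering.
local uuid = require('uuid')
local s = box.schema.space.create('scalars', {
    format = {{'id', 'unsigned'}, {'val', 'scalar'}},
})
s:create_index('pk')
s:create_index('val', {parts = {{'val', 'scalar'}}})
s:insert{1, true}
s:insert{2, 100}
s:insert{3, 'a string'}
s:insert{4, uuid.new()}
-- the 'val' index orders rows as: boolean < number < string < uuid
s.index.val:select{}
```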
-
Cyrill Gorcunov authored
This allows the `xlog` Lua module to decode appropriate types into symbolic form. For example, with the patch we should see RAFT and PROMOTE types in the output:

| $ tarantoolctl cat 00000000000000000004.xlog
| ---
| HEADER:
|   lsn: 2
|   group_id: 1
|   type: RAFT
|   timestamp: 1621541912.4588
| BODY:
|   0: 3
|   1: 4
| ---
| HEADER:
|   lsn: 1
|   replica_id: 4
|   type: PROMOTE
|   timestamp: 1621541912.4592
| BODY:
|   2: 0
|   3: 0
|   83: 3

Fixes #6088

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- May 14, 2021
-
-
Vladislav Shpilevoy authored
An update of an absent field could crash when it had 2 operations in reversed order, both on unspecified fields. The problem was that the rope item was created incorrectly. A rope item is a range of array fields and consists of 2 parts: an xrow_update_field and a tail. The xrow field should describe exactly one array field (the first in the given range) and its update operation. When there is no operation, it is NOP. The tail includes the rest of the array fields not affected by any operations yet. But in the code the rope item was created with its xrow field covering multiple array fields, and with a zero tail. As a result, a split of that item didn't work, because an xrow field can't be split. The bug was in the nil-autofill for absent array fields. The patch fixes it so that the newly created item with nils has its xrow field containing one nil, and the other nils in the tail. This allows splitting the item correctly if necessary. Closes #6069
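A hedged reconstruction of the kind of statement that used to crash: two operations in reversed field order, both on fields absent from the tuple (the space name and field numbers are illustrative):

```lua
-- Both ops target fields past the end of {1}; the second op
-- uses a smaller field number than the first (reversed order),
-- relying on nil-autofill for the absent fields.
local s = box.schema.space.create('test')
s:create_index('pk')
s:insert{1}
s:update({1}, {{'=', 6, 6}, {'=', 5, 5}})
```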
-
Cyrill Gorcunov authored
Historically we used uint32_t for fiber IDs, and these IDs wrapped over time, especially if an instance had been running for a long period. Strictly speaking, this is not very convenient, because if some external tool is going to track fibers by their IDs, it might get confused by (or miss) ID wrapping. Let's rather switch to wide integers and fix up the outputs (such as snprintf callers, Lua's fid reports, etc.). This allows us to simplify the code a bit and forget about ID wrapping at all. At the same time, wrong output format specifiers resulted in weird malformed lines:

| main/-244760339/cartridge.failover.task I> Instance state changed

Thus changing the ID type forced us to review all printouts and fix the formatting so as not to confuse users. Closes #5846

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Serge Petrenko authored
relay_subscribe_f() remembered the old recovery pointer, which might be replaced by relay_restart_recovery() if a raft message is delivered during the cbus_process() loop in relay_send_is_raft_enabled(). Fix the issue by removing the alias altogether and referencing relay->r directly to prevent any further mistakes. Closes #6031
-
- May 13, 2021
-
-
Serge Petrenko authored
We had various places in box.cc and relay.cc which counted processed rows and yielded every now and then. These yields didn't cover cases when recovery has to position inside a long WAL file. For example, when tarantool exits without leaving an empty WAL file which would be used to recover the instance vclock on restart, the instance freezes while processing the last available WAL in order to recover the vclock. Another issue is with replication: if a replica connects and needs data from the end of a really long WAL, recovery will read up to the needed position without yields, making the relay disconnect by timeout. In order to fix the issue, make recovery decide when a yield should happen. Once recovery decides so, it calls an xstream callback, schedule_yield. Currently schedule_yield is fired once recovery processes (either skips or writes) WAL_ROWS_PER_YIELD rows. schedule_yield either yields right away, in case of relay, or saves the yield for later, in case of local recovery, because it might be in the middle of a transaction. Closes #5979
-
- May 04, 2021
-
-
Cyrill Gorcunov authored
If we call fiber_join() on a non-joinable fiber, we trigger an assert and crash execution (on a debug build). On a release build the asserts are zapped and won't cause problems, but there is another issue: the target fiber will cause double fiber_reset() calls, which in turn cause unregister_fid() with id = 0 (not causing a crash, but definitely out of intention), and we will drop stack protection which might not be ours anymore. Since we're not allowed to break the API on the C level, let's just panic early in case of such misuse; it is way better than continuing to operate with potentially screwed data in memory. Fixes #6046

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
- Apr 28, 2021
-
-
Vladislav Shpilevoy authored
They were never deleted. This worked fine for DDL and replication, for which they were introduced in the first place, because those triggers are on the region memory. But it didn't work when the triggers became available in the public API, because these are allocated on the heap. As a result, all the box.on_commit() and box.on_rollback() triggers leaked. The patch ensures all the on_commit/on_rollback triggers are destroyed. The statement triggers on_commit/on_rollback are left intact, since they are private and never need deletion, but the patch adds assertions in case they would ever need to be destroyed. Another option was to force all the commit and rollback triggers to clear themselves. For example, in case of commit all the on_commit triggers must clear themselves, and the rollback triggers are destroyed. Vice versa when a rollback happens. This would allow not destroying the on_commit triggers manually in case of commit. But it didn't work, because the Lua triggers all work via a common runner, lbox_trigger_run(), which can't destroy its argument in most of the cases (space:on_replace, :before_replace, ...). It could be patched, but that requires adding some work to the Lua triggers, affecting all of them, which in total might be not better. Closes #6025
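A small sketch of the usage pattern that used to leak (the space name `test` is illustrative):

```lua
-- Heap-allocated transaction triggers like these used to leak;
-- now they are destroyed once the transaction ends.
box.begin()
box.on_commit(function() print('committed') end)
box.on_rollback(function() print('rolled back') end)
box.space.test:insert{1}
box.commit()
```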
-
- Apr 27, 2021
-
-
Vladislav Shpilevoy authored
fiber.wakeup() in Lua and fiber_wakeup() in C could lead to a crash or undefined behaviour when called on the currently running fiber. In particular, if after the wakeup a new fiber was started in a blocking way (fiber.create() and fiber_start()), it would crash in a debug build and lead to unknown results in release. If after the wakeup a sleep with a non-zero timeout or an infinite yield (fiber_yield()) was made, the fiber woke up in the same event loop iteration regardless of any timeout or other wakeups. It was a spurious wakeup, which is not expected in most of the places internally. The patch makes the wakeup a nop on the current fiber, making it safe to use anywhere. Closes #5292 Closes #6043

@TarantoolBot document
Title: fiber.wakeup() in Lua and fiber_wakeup() in C are nop on self
In Lua, `fiber.wakeup()` being called on the current fiber does not do anything; it is safe to use. The same for `fiber_wakeup()` in C.
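A minimal sketch of the now-safe self-wakeup:

```lua
-- Waking the current fiber is now a nop: the sleep below honors
-- its full timeout instead of returning spuriously.
local fiber = require('fiber')
fiber.self():wakeup()
fiber.sleep(0.1) -- sleeps the whole 0.1 seconds
```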
-
- Apr 22, 2021
-
-
Oleg Babin authored
This patch introduces new hash types for the digest module: xxHash32 and xxHash64. Closes #2003

@TarantoolBot document
Title: digest module supports xxHash32/64
```lua
-- Examples below demonstrate xxHash32.
-- xxHash64 has exactly the same interface.

-- Calculate the 32-bit hash (default seed is 0).
digest.xxhash32(string[, seed])

-- Streaming
-- Start a new hash by initializing state with a seed.
-- If no value is provided, 0 is used as default.
xxhash = digest.xxhash32.new([seed])

-- It is also possible to specify the seed manually. If no value
-- is provided, the value initially passed to "new" is used.
-- Here and below "seed" is expected to be an unsigned number.
-- The function returns nothing.
xxhash:clear([seed])

-- Feed the hash state by calling "update" as many times as
-- necessary. The function returns nothing.
xxhash:update('string')

-- Produce a hash value.
xxhash:result()
```
-
- Apr 21, 2021
-
-
Kirill Yukhin authored
-
Serge Petrenko authored
The new function name is `box.ctl.promote()`. It's much shorter and closer to the function's now enriched functionality. The old name, `box.ctl.clear_synchro_queue()`, remains in Lua for the sake of backward compatibility. Follow-up #5445 Closes #3055

@TarantoolBot document
Title: deprecate `box.ctl.clear_synchro_queue()` in favor of `box.ctl.promote()`
Replace all the mentions of `box.ctl.clear_synchro_queue()` with `box.ctl.promote()` and add a note that `box.ctl.clear_synchro_queue()` is a deprecated alias to `box.ctl.promote()`.
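The rename in two lines; both calls are equivalent, the old name being a deprecated alias:

```lua
-- Same operation under the new and the old (deprecated) name.
box.ctl.promote()
box.ctl.clear_synchro_queue()
```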
-
Serge Petrenko authored
Start writing the actual leader term together with the PROMOTE request and process terms in PROMOTE requests on the receiver side. Make the applier apply synchronous transactions only from the instance which has the greatest term, as received in PROMOTE requests. Closes #5445

Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
-
- Apr 19, 2021
-
-
Igor Munkin authored
This patch introduces two scripts to ease collecting and loading crash artefacts for postmortem analysis:

* tarabrt.sh - the tool collecting a tarball with the crash artefacts the right way: the coredump with the binary, all loaded shared libs, the Tarantool version (this is a separate exercise to get it from a binary built with -O2). Besides, the tarball has a unified layout, so it can be easily processed with the second script:
  - /coredump - core dump file on the root level
  - /binary - tarantool executable on the root level
  - /version - plain text file on the root level with `tarantool --version` output
  - /checklist - plain text file on the root level with the list of the collected entities
  - /etc/os-release - plain text file containing operating system identification data
  - all shared libraries used by the crashed instance - their layout respects the one on the host machine, so they can be easily loaded with the following gdb command: set sysroot $(realpath .)

  The script can be easily used either manually or via the kernel.core_pattern variable.

* gdb.sh - the auxiliary script originally written by @Totktonada, but it needed to be adjusted to the crash artefacts layout every time. Since there is now a unified layout, the original script is enhanced a bit to automatically load the coredump via gdb the right way.

Closes #5569

Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
Reviewed-by: Sergey Bronnikov <sergeyb@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
- Apr 17, 2021
-
-
Mergen Imeev authored
-
Mergen Imeev authored
-
- Apr 15, 2021
-
-
Nikita Pettik authored
-
Roman Khabibov authored
Add the ability to set box.cfg options via environment variables. These variables should have names of the form `TT_<OPTION>`. When a Tarantool instance is started under the tarantoolctl utility, environment variables have higher priority than the tarantoolctl configuration file. Closes #5602

Co-authored-by: Leonid Vasiliev <lvasiliev@tarantool.org>
Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>

@TarantoolBot document
Title: Set box.cfg options via environment variables
Now it is possible to set box.cfg options via environment variables. The name of a variable should correspond to the following pattern: `TT_<NAME>`, where `<NAME>` is the uppercase box.cfg option name. For example: `TT_LISTEN`, `TT_READAHEAD`. Array values are separated by commas. Example:

```sh
export TT_REPLICATION=localhost:3301,localhost:3302
```

An empty variable is the same as an unset one.
-
- Apr 14, 2021
-
-
Roman Khabibov authored
Ship libcurl headers to the system path "${PREFIX}/include/tarantool" in the case when libcurl is included as a bundled library or in a static build. This is needed to use the SMTP client with tarantool's libcurl instead of the system one. See the related issue: https://github.com/tarantool/smtp/issues/24 Closes #4559
-
Roman Khabibov authored
Enable the smtp and smtps protocols in the bundled libcurl. This is needed to use the SMTP client with tarantool's libcurl instead of the system one. See the related issue: https://github.com/tarantool/smtp/issues/24 Part of #4559
-
Sergey Kaplun authored
The LuaJIT submodule is bumped to introduce the following changes:

* tools: introduce --leak-only memprof parser option

Within this changeset a new Lua module providing post-processing routines for parsed memory events is introduced:

* memprof/process.lua: post-process the collected events

The changes provide an option showing only the heap difference. One can launch the memory profile parser with the introduced option via the following command:

$ tarantool -e 'require("memprof")(arg)' - --leak-only filename.bin

Closes #5812

Reviewed-by: Igor Munkin <imun@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
-
Cyrill Gorcunov authored
Currently to run "C" function from some external module one have to register it first in "_func" system space. This is a problem if node is in read-only mode (replica). Still people would like to have a way to run such functions even in ro mode. For this sake we implement "box.lib" lua module. Unlike `box.schema.func` interface the `box.lib` does not defer module loading procedure until first call of a function. Instead a module is loaded immediately and if some error happens (say shared library is corrupted or not found) it pops up early. The need of use stored C procedures implies that application is running under serious loads most likely there is modular structure present on Lua level (ie same shared library is loaded in different sources) thus we cache the loaded library and reuse it on next load attempts. To verify that cached library is up to day the module_cache engine test for file attributes (device, inode, size, modification time) on every load attempt. Since both `box.schema.func` and `box.lib` are using caching to minimize module loading procedure the pass-through caching scheme is implemented: - box.lib relies on module_cache engine for caching; - box.schema.func does snoop into box.lib hash table when attempt to load a new module, if module is present in box.lib hash then it simply referenced from there and added into own hash table; in case if module is not present then it loaded from the scratch and put into both hashes; - the module_reload action in box.schema.func invalidates module_cache or fill it if entry is not present. Closes #4642 Co-developed-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: box.lib module Overview ======== `box.lib` module provides a way to create, delete and execute `C` procedures from shared libraries. Unlike `box.schema.func` methods the functions created with `box.lib` help are not persistent and live purely in memory. Once a node get turned off they are vanished. An initial purpose for them is to execute them on nodes which are running in read-only mode. Module functions ================ `box.lib.load(path) -> obj | error` ----------------------------------- Loads a module from `path` and return an object instance associate with the module, otherwise an error is thrown. The `path` should not end up with shared library extension (such as `.so`), only a file name shall be there. Possible errors: - IllegalParams: module path is either not supplied or not a string. - SystemError: unable to open a module due to a system error. - ClientError: a module does not exist. - OutOfMemory: unable to allocate a module. Example: ``` Lua -- Without error handling m = box.lib.load('path/to/library) -- With error handling m, err = pcall(box.lib.load, 'path/to/library') if err ~= nil then print(err) end ``` `module:unload() -> true | error` --------------------------------- Unloads a module. Returns `true` on success, otherwise an error is thrown. Once the module is unloaded one can't load new functions from this module instance. Possible errors: - IllegalParams: a module is not supplied. - IllegalParams: a module is already unloaded. Example: ``` Lua m = box.lib.load('path/to/library') -- -- do something with module -- m:unload() ``` If there are functions from this module referenced somewhere in other places of Lua code they still can be executed because the module continue sitting in memory until the last reference to it is closed. If the module become a target to the Lua's garbage collector then unload is called implicitly. `module:load(name) -> obj | error` ---------------------------------- Loads a new function with name `name` from the previously loaded `module` and return a callable object instance associated with the function. On failure an error is thrown. Possible errors: - IllegalParams: function name is either not supplied or not a string. - IllegalParams: attempt to load a function but module has been unloaded already. - ClientError: no such function in the module. - OutOfMemory: unable to allocate a function. Example: ``` Lua -- Load a module if not been loaded yet. m = box.lib.load('path/to/library') -- Load a function with the `foo` name from the module `m`. func = m:load('foo') ``` In case if there is no need for further loading of other functions from the same module then the module might be unloaded immediately. ``` Lua m = box.lib.load('path/to/library') func = m:load('foo') m:unload() ``` `function:unload() -> true | error` ----------------------------------- Unloads a function. Returns `true` on success, otherwise an error is thrown. Possible errors: - IllegalParams: function name is either not supplied or not a string. - IllegalParams: the function already unloaded. Example: ``` Lua m = box.lib.load('path/to/library') func = m:load('foo') -- -- do something with function and cleanup then -- func:unload() m:unload() ``` If the function become a target to the Lua's garbage collector then unload is called implicitly. Executing a loaded function =========================== Once function is loaded it can be executed as an ordinary Lua call. Lets consider the following example. 
We have a `C` function which takes two numbers and returns their sum. ``` C int cfunc_sum(box_function_ctx_t *ctx, const char *args, const char *args_end) { uint32_t arg_count = mp_decode_array(&args); if (arg_count != 2) { return box_error_set(__FILE__, __LINE__, ER_PROC_C, "%s", "invalid argument count"); } uint64_t a = mp_decode_uint(&args); uint64_t b = mp_decode_uint(&args); char res[16]; char *end = mp_encode_uint(res, a + b); box_return_mp(ctx, res, end); return 0; } ``` The name of the function is `cfunc_sum` and the function is built into `cfunc.so` shared library. First we should load it as ``` Lua m = box.lib.load('cfunc') cfunc_sum = m:load('cfunc_sum') ``` Once successfully loaded we can execute it. Lets call the `cfunc_sum` with wrong number of arguments ``` Lua cfunc_sum() | --- | - error: invalid argument count ``` We will see the `"invalid argument count"` message in output. The error message has been set by the `box_error_set` in `C` code above. On success the sum of arguments will be printed out. ``` Lua cfunc_sum(1, 2) | --- | - 3 ``` The functions may return multiple results. For example a trivial echo function which prints arguments passed in. ``` Lua cfunc_echo(1,2,3) | --- | - 1 | - 2 | - 3 ``` Module and function caches ========================== Loading a module is relatively slow procedure because operating system needs to read the library, resolve its symbols and etc. Thus to speedup this procedure if the module is loaded for a first time we put it into an internal cache. If module is sitting in the cache already and new request to load comes in -- we simply reuse a previous copy. In case if module is updated on a storage device then on new load attempt we detect that file attributes (such as device number, inode, size, modification time) get changed and reload module from the scratch. Note that newly loaded module does not intersect with previously loaded modules, the continue operating with code previously read from cache. Thus if there is a need to update a module then all module instances should be unloaded (together with functions) and loaded again. Similar caching technique applied to functions -- only first function allocation cause symbol resolving, next ones are simply obtained from a function cache.
-
Cyrill Gorcunov authored
In commit 96938faf (Add hot function reload for C procedures) the ability to hot-reload modules was introduced. When a module is reloaded, its functions are resolved to new symbols, but if something goes wrong, the old symbols from the old module are supposed to be restored. Actually, the current code restores only one function and may crash if there is a bunch of functions to restore. Let's fix it. Fixes #5968

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
-
Vladislav Shpilevoy authored
The applier used to process the synchronous rows CONFIRM and ROLLBACK right after receipt, before they were written to WAL. That led to a bug: the confirmed data became visible and might be accessed by user requests, then the node restarted before CONFIRM finished its WAL write, and the data was not visible again. That is just as if it had been rolled back, which is not acceptable. Another case: the CONFIRM WAL write could simply fail due to any reason (no disk space, OOM), but the transactions would remain confirmed anyway. Also, that produced some hacks in the limbo's code to support the confirmation and rollback of transactions not yet written to WAL. The patch makes the synchro rows processed only after they are written to WAL. Although the 'rollback' case above might still happen if the xlogs were in the kernel caches and the machine was powered off before they were flushed to disk. But that is not related to qsync specifically. To handle the synchro rows after the WAL write, the patch makes them go to WAL in a blocking way (journal_write() instead of journal_write_try_async()). Otherwise it could happen that a CONFIRM/ROLLBACK is being written to WAL and would clear the limbo afterwards, but a new transaction arrives with a different owner and conflicts with the current limbo owner. Closes #5213
-