diff --git a/doc/user/databases.xml b/doc/user/databases.xml
index a66cb121f530494013b9368b81e5a25c2d242057..79d852c4960ae8c3275f2183fc049d4297726aa0 100644
--- a/doc/user/databases.xml
+++ b/doc/user/databases.xml
@@ -2694,291 +2694,6 @@ tarantool> <userinput>box.stat().DELETE -- a selected item of the table</userinp
 </para>
 </section>
 
-<section xml:id="sp-shard">
- <title>Package <code>shard</code></title>
-
-<para>
-With <link xlink:href="https://en.wikipedia.org/wiki/Sharding">Sharding</link>,
-the tuples of a tuple set are distributed
-to multiple nodes, with a Tarantool database server on each node. With this arrangement,
-each server is handling only a subset of the total data, so larger loads can be
-handled by simply adding more computers to a network.
-</para>
-<para>
-The Tarantool shard package has facilities
-for creating or redistributing or cleaning up shards, as well as analogues for the
-data-manipulation functions of the box library (select, insert, replace, update, delete).
-The important new concept is that there must be a function which, given a key value,
-returns a Shard Identification Number so that the data-manipulation functions will know
-which location is relevant.
-In fact there must be two such functions, one for giving Shard Identification Number
-according to the current algorithm, and one for giving Shard Identification Number
-according to a previous algorithm. The former is called [curr], the latter is called [prev].
-When the algorithm has to be changed because new nodes are added or because the load
-must be balanced differently, the [curr] function becomes the [prev] function, a
-new [curr] function is introduced, and a function named <code>shard.copy()</code> redistributes
-all the tuples -- a process called "resharding".
-</para>
-<para>
-The shard package can be installed by
-putting a directive at the start of an init.lua file:<programlisting>
-require 'shard'</programlisting>which brings in all the required functionality
-from a program file named shard.lua.
-</para>
-<para>
-For the original package description in Russian, see
-<link xlink:href="https://github.com/tarantool/shard/blob/master/README.md">
-https://github.com/tarantool/shard/blob/master/README.md</link>.
-</para>
-<variablelist>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.schema.config (<replaceable>{ key = value, ... }</replaceable>)</code></emphasis></term>
- <listitem>
- <para>
- This function configures or reconfigures the shard-host or proxy-host.
- The parameters which are indicated by the "key = value" pairs are:
- <code>
- * shard list - a list of shards and their types (see "List of shards", below)
- * me - each node has to know "which shard in the list is me?" so this is the node's number in the list
- (or 0 if the shard is a proxy)
- * mode - mode for the current shard (either 'ro' or 'rw')
- * curr - the current function that returns, for a given space and key, the Shard Number.
- * prev - the previous function that returns, for a given space and key, the Shard Number.
-</code>
- The function reconfigures only those parameters that are transmitted during the call.
- Other parameters' values remain the same.
- </para>
- <para>
- The SHARD LIST is a Lua table, which is an array of arrays.
- Each entry in the table describes one shard, with the following fields :
-<code>
- 1. Shard operation mode. Either 'ro' = read only, or 'rw' = read / write.
- 2. Shard weight (a number between 0 and 1000 )
- 3. Shard connection parameters (the host and port of the node that handles the shard)
- ?. Implicitly, a Shard Number.
-</code>
- </para>
- <para>
- Notes:
- 1. If there are multiple descriptions of the same shard, only one of them may have shard operation mode = 'rw'.
- 2. Operations which modify data must always go to hosts with shard operation mode = 'rw'.
- Operations which do not modify data may go to either 'ro' or 'rw' shards, depending on the semantics of the call.
- </para>
- <para>
- The <code>shard.schema.config()</code> function must be performed on each node,
- before creating a sharded database, and whenever the configuration changes.
- Care must be taken that the nodes' configurations are consistent with each other.
- The administrator is responsible for determining the values that [curr] and [prev]
- functions should return -- a simple example would be "if key is less than a million
- then return Shard Identification Number = 1, else return Shard Identification Number = 2".
- </para>
- <para>
- Given this configuration information, subsequent <code>shard</code> functions will all
- know what to do. Simplifying slightly, a subroutine in the shard functions will:
-<programlisting>
-Get the space number and the primary-key value from the function's parameters.
-Pass the space number and the primary-key value to [curr] or [prev] to get a Shard Identification Number.
-If the number is equal to "me", perform the function directly because this is the responsible node.
-Otherwise, use the box.net package to pass the function to the host and port of
-the node that, according to the Shard List, is responsible for handling this tuple.
-</programlisting>
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.copy()</code></emphasis></term>
- <listitem>
- <para>
- This function reads all tuples, if necessary copying them to new shards.
- </para>
- <para>
- Start conditions:
- The 'me' shard, that is, the shard which this node is responsible for, must be marked as 'rw'.
- The 'me' shard must have both a [curr] and a [prev] function.
- </para>
- <para>
- For each space, for each tuple in the space: if the [prev] function says that the
- tuple belongs in the 'me' node, and the [curr] function says that the tuple belongs
- in a different node, then copy it by sending an "insert" request to the different node.
- This is actually an "insert nothrow", that is, an insert without an error message,
- because the insert is simply not performed if the different node already has a tuple
- with the same primary key. The copied tuple is not deleted on the 'me' node at this stage;
- that is a separate job which will be handled by a separate function, <code>shard.cleanup()</code>.
- </para>
- <para>
- The function returns a list of tuples. Each tuple contains two fields:
- 1. ID Number of the space which was worked on
- 2. Count of tuples which are copied into the space (may be 0)
- </para>
- <para>
- In order to perform resharding, the administrator must:
-<code>
-* For all hosts ,determine the functions [curr] and [prev]
- * Update the Shard List with shard.config
- * Run shard.copy() on all hosts -- this will find tuples
- on nodes according to [prev], and move them to new nodes
- according to [curr].
-* Delete the [prev] function on all shards.
-* Run shard.cleanup() on all hosts, to remove outdated data.
-</code>
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.cleanup()</code></emphasis></term>
- <listitem>
- <para>
- This function reads all tuples, if necessary deleting tuples which don't fit.
- </para>
- <para>
- Start conditions :
- current shard marked as 'rw' (read/write)
- shard specifies only one function - [curr], not [prev] (that is, resharding mode is disabled)
- </para>
- <para>
- This function goes through all the Tarantool spaces and,
- for each tuple in a space, invokes the [curr] function.
- If the number returned by [curr] is not equal to the actual number that the shard is in (known as [me]),
- the tuple is deleted.
- </para>
- <para>
- The function returns a list of tuples. Each tuple contains two fields :
- 1. ID Number of the space which was worked on,
- 2. Count of tuples which are deleted from the space.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.select (<replaceable>space, key [, subkey, ...]</replaceable>)</code></emphasis></term>
- <listitem>
- <para>
- This function performs a query using a key. A key may have multiple fields.
- </para>
- <para>
- If <code>shard.select()</code> is called while a resharding is taking place,
- and the tuple is not found on the node which the [curr] function identifies,
- then the request is forwarded to the node which the [prev] function identifies.
- </para>
- <para>
- Notes:
- if the key is multi-field key then all fields must be present -- sampling multiple nodes with partially-specified keys is not possible.
- </para>
- <para>
- The <code>shard.select()</code> function is the only function which can access hosts that are marked "read only".
- The host selection algorithm ([curr] or [prev]) may choose among multiple possible hosts that all contain
- copies of the specific tuple. Usually the algorithm will choose the shard whose 'weight' field in the shard
- list has the greatest value.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.eselect (<replaceable>mode, space, key [, subkey, ...]</replaceable>)</code></emphasis></term>
- <listitem>
- <para>
- This function is the same as <code>shard.select()</code> but has an additional parameter, "mode",
- to indicate which hosts it is preferable to sample.
- </para>
- <para>
- The mode parameter can have one of the following values:
- 'ro' (means that it is preferable to sample the hosts marked as "read only", but if there are none, then sampling an 'rw' node is okay),
- 'rw' (means that it is necessary to sample only the hosts marked as " read-write").
- Thus, "call <code>shard.eselect('ro', ...)</code>" is the same as "call <code>shard.select(...)</code>".
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.insert (<replaceable>space, ...</replaceable>)</code></emphasis></term>
- <listitem>
- <para>
- This functions inserts a specified tuple into a given space.
- </para>
- <para>
- The essential operation is:
-<code>
-* The local node -- call it Node#1 -- uses [curr] to determine the shard to go to,
- and sends the request to the different node, call it Node#2.
-* Node#1 returns a duplicate-key error if the tuple already exists in its space.
-* If there is a [prev] function on the different node, then the different node
- uses [prev] to determine another node to go to, and sends the request to the
- other node, call it Node#3.
-* If Node#3 finds the row in its space, then that is an error, which eventually
- gets returned to Node#1 and the user sees a duplicate-key error.
- </code>
- This mechanism is a bit complex, but all the user has to know is:
- if there's a duplicate, even if sharding or resharding is taking place when
- the insert request happens, no problem -- Tarantool will detect the duplicate.
-</para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code><replaceable>shard.replace (space, ...)</replaceable></code></emphasis></term>
- <listitem>
- <para>
- This function inserts, or possibly replaces, a tuple.
- </para>
- <para>
- The initial part of the operation is the same as for <code>shard.insert()</code>,
- but <code>shard.replace()</code> is simpler because it does not have to search for
- duplicates.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.delete (<replaceable>space, key [, subkey, ...])</replaceable></code></emphasis></term>
- <listitem>
- <para>
- This function deletes a tuple.
- </para>
- <para>
- Control is transferred to the node which is responsible according to [curr].
- That different node will delete the tuple from its own tuple set,
- and also -- if that different node has a [prev] function in its Shard List --
- will pass the delete request on to whatever node the [prev] function identifies.
- </para>
- <para>
- Returns: exists_from_curr or exists_from_prev.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.update (<replaceable>space, key, [subkey, ...] format, ...</replaceable>)</code></emphasis></term>
- <listitem>
- <para>
- This function updates a tuple.
- </para>
- <para>
- Control is transferred to the node which is responsible according to [curr].
- If the tuple is present on that different node, the tuple is updated
- and the new updated value is returned.
- Otherwise, if that different node has a [prev] function in its Shard List,
- then the request is passed on to whatever node the [prev[ function identifies,
- and that node actually performs an insert(), then returns what looks like
- an updated value.
- </para>
- <para>
- Like <code>shard.insert()</code>, the operations for <code>shard.update()</code> look a bit
- complex. But, again, the user only has to know that the <code>shard</code> package
- has routines which ensure that the result is an updated tuple, even though
- there are several remotely-possible scenarios which must be taken care of.
- </para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis role="lua"><code>shard.call (<replaceable>mode, procname, ...</replaceable>)</code></emphasis></term>
- <listitem>
- <para>
- This function calls a remote function.
- </para>
- <para>
- The host which receives <code>shard.call()</code> selects a random node from
- the Shard List -- preferably a read-only node if the mode parameter says 'ro'.
- The remote node will perform the function and return its result to the original
- node, which will return it to the user.
- </para>
- </listitem>
- </varlistentry>
-</variablelist>
-</section>
 
 <section xml:id="administrative-requests">
 <title>Administrative requests</title>
diff --git a/doc/user/server-administration.xml b/doc/user/server-administration.xml
index 9d663c6ee42e78e41f5382116d2990b043ca7591..01860432209e4cc8342da9eef8343d714b132686 100644
--- a/doc/user/server-administration.xml
+++ b/doc/user/server-administration.xml
@@ -207,54 +207,6 @@ Explanatory notes about what tarantool displayed in the above example:
 
 </section>
 
-<section xml:id="tarantool_deploy">
-<title>Utility <code>tarantool_deploy</code></title>
-<para>
-With tarantool_deploy one can set up so that, during system boot,
-one or more instances of the tarantool server will start.
-This utility is for use on Red Hat or CentOS where Tarantool
-was installed using <code>rpm --install</code>.
-</para>
-<para>
-Technically, tarantool_deploy will place instructions in <filename>/etc/init.d</filename>
-which will initiate tarantool with appropriate options and
-with settings that maximize resource usage.
-The root password is necessary. These options are available,
-as shown by <code>tarantool_deploy --help</code>:
-<programlisting>
-Tarantool deployment script: add more Tarantool instances.
-usage: tarantool_deploy.sh [options] <instance>
-
- --prefix <path> installation path (/usr)
- --prefix_etc <path> installation etc path (/etc)
- --prefix_var <path> installation var path (/var)
-
- --status display deployment status
- --dry don't create anything, show commands
-
- --debug show commands
- --yes don't prompt
- --help display this usage
-</programlisting>
-</para>
-<para>
-The default prefixes (<filename>/usr</filename> and <filename>/etc</filename> and <filename>/var</filename>) are appropriate
-if a Tarantool installation was done with default settings,
-for example tarantool should be in <filename>/usr/bin</filename>.
-The only necessary argument is the "instance", which is an
-arbitrary numeric identification formatted as digit.digit.
-The following is a sample run:
-<programlisting><prompt>$ </prompt><userinput>tarantool_deploy.sh 0.1</userinput>
-tarantool_deploy.sh: About to deploy Tarantool instance 0.1.
-tarantool_deploy.sh: Continue? [n/y]
-<userinput>y</userinput>
-tarantool_deploy.sh: >>> deploy instance 0.1
-tarantool_deploy.sh: >>> updating deployment config
-tarantool_deploy.sh: done
-</programlisting>
-</para>
-</section>
-
 <section xml:id="os-install-notes">
 <title>System-specific administration notes</title>
 <blockquote><para>