Skip to content
GitLab
Explore
Sign in
Register
Primary navigation
Search or go to…
Project
T
tarantool
Manage
Activity
Members
Labels
Plan
Issues
Issue boards
Milestones
Wiki
Code
Merge requests
Repository
Branches
Commits
Tags
Repository graph
Compare revisions
Snippets
Build
Pipelines
Jobs
Pipeline schedules
Artifacts
Deploy
Releases
Package Registry
Container Registry
Model registry
Operate
Environments
Terraform modules
Monitor
Incidents
Analyze
Value stream analytics
Contributor analytics
CI/CD analytics
Repository analytics
Model experiments
Help
Help
Support
GitLab documentation
Compare GitLab plans
Community forum
Contribute to GitLab
Provide feedback
Keyboard shortcuts
?
Snippets
Groups
Projects
Show more breadcrumbs
core
tarantool
Commits
b0eea2c7
Commit
b0eea2c7
authored
9 years ago
by
ocelot-inc
Browse files
Options
Downloads
Patches
Plain Diff
expanded shard.rst
parent
0368fbf0
No related branches found
No related tags found
No related merge requests found
Changes
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
doc/sphinx/reference/shard.rst
+267
-11
267 additions, 11 deletions
doc/sphinx/reference/shard.rst
with
267 additions
and
11 deletions
doc/sphinx/reference/shard.rst
+
267
−
11
View file @
b0eea2c7
...
...
@@ -4,20 +4,276 @@
.. module:: shard
With `sharding`_,
the tuples of a tuple set are distributed
to multiple nodes, with a Tarantool database server on each node. With this arrangement,
each server is handling only a subset of the total data, so larger loads can be
handled by simply adding more computers to a network.
With sharding, the tuples of a tuple set are distributed to multiple nodes,
with a Tarantool database server on each node. With this arrangement,
each server is handling only a subset of the total data,
so larger loads can be handled by simply adding more computers to a network.
The Tarantool shard package has facilities
for creating or redistributing or cleaning up shards, as well as analogues for the
data-manipulation functions of the box library
(select, insert, replace, update, delete).
The Tarantool shard package has facilities
for creating shards,
as well as analogues for the data-manipulation functions of the box library
(select, insert, replace, update, delete).
Some details are `On the shard section of github`_.
First some terminology: |br|
**Consistent Hash** ...
The shard package distributes according to a hash algorithm,
that is, it applies a hash function to a tuple's primary-key value
in order to decide which shard the tuple belongs to.
The hash function is `consistent`_
so that changing the number of tuples will not affect results for many keys.
The specific hash function that the shard package uses is
guava.digest in the :codeitalic:`digest` package. |br|
**Queue** ...
A temporary list of recent update requests. Sometimes called "batching".
Since updates to a sharded database can be slow, it may
speed up throughput to send requests to a queue rather
than wait for the update to finish on ever node.
The shard package has functions for adding requests to the queue,
which it will process without further intervention.
Queuing is optional. |br|
**Redundancy** ...
The number of replicas in each shard. |br|
**Replica** ...
A complete copy of the data.
The shard package handles both sharding and replication.
One shard can contain one or more replicas.
When a write occurs, the write is attempted on every replica in turn.
The shard package does not use the built-in replication feature. |br|
**Shard** ...
A subset of the tuples in the database partitioned according to the
value returned by the consistent hash function. Usually each shard
is on a separate node, or a separate set of nodes (for example if
redundancy = 3 then the shard will be on three nodes). |br|
**Zone** ...
A physical location where the nodes are closely connected, with
the same security and backup and access points. The simplest example
of a zone is a single computer with a single tarantool-server instance.
A shard's replicas should be in different zones.
The shard package is distributed separately from the main tarantool package.
To acquire it, do a separate install. For example on Ubuntu say |br|
sudo apt-get install tarantool-shard tarantool-pool |br|
Or, download from github tarantool/shard
and compile as described in the README. Then, before using the package, say
shard = require('shard')
.. _sharding: https://en.wikipedia.org/wiki/Sharding
.. _On the shard section of github: https://github.com/tarantool/shard
The most important function is |br|
:samp:`shard.init ({shard-configuration})` |br|
This must be called for every shard.
The shard-configuration is a table with these fields: |br|
* servers (a list of URIs of nodes and the zones the nodes are in) |br|
* login (the user name which applies for accessing via the shard package) |br|
* password (the password for the login) |br|
* redundancy (a number, minimum 1) |br|
* binary (a port number that this host is listening on, on the current host)
(distinguishable from the 'listen' port specified by box.cfg) |br|
Possible Errors: Redundancy should not be greater than the number of servers;
the servers must be alive; two replicas of the same shard should not be in the same zone.
| EXAMPLE: shard.init syntax for one shard
| :codenormal:`-- The number of replicas per shard (redundancy) is 3.`
| :codenormal:`-- The number of servers is 3.`
| :codenormal:`-- The shard package will conclude that there is only one shard.`
| :codenormal:`cfg = {`
| |nbsp| |nbsp| :codenormal:`servers = {`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33131', zone = '1' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33132', zone = '2' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33133', zone = '3' };`
| |nbsp| |nbsp| :codenormal:`};`
| |nbsp| |nbsp| :codenormal:`login = 'tester';`
| |nbsp| |nbsp| :codenormal:`password = 'pass';`
| |nbsp| |nbsp| :codenormal:`redundancy = 3;`
| |nbsp| |nbsp| :codenormal:`binary = 33131;`
| :codenormal:`}`
| :codenormal:`server.init(cfg)`
|
| :codenormal:`EXAMPLE: shard.init syntax for three shards`
| :codenormal:`-- This describes three shards.`
| :codenormal:`-- Each shard has two replicas.`
| :codenormal:`-- Since the number of servers is 7, and the number`
| :codenormal:`-- of replicas per server is 2, and dividing 7 / 2`
| :codenormal:`-- leaves a remainder of 1, one of the servers will`
| :codenormal:`-- not be used. This is not necessarily an error,`
| :codenormal:`-- because perhaps one of the servers in the list is`
| :codenormal:`-- not alive.`
| :codenormal:`cfg = {`
| |nbsp| |nbsp| :codenormal:`servers = {`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33131', zone = '1' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33132', zone = '2' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33133', zone = '1' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33134', zone = '2' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33135', zone = '1' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33136', zone = '2' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:33137', zone = '1' };`
| |nbsp| |nbsp| :codenormal:`};`
| |nbsp| |nbsp| :codenormal:`login = 'tester';`
| |nbsp| |nbsp| :codenormal:`password = 'pass';`
| |nbsp| |nbsp| :codenormal:`redundancy = 2;`
| |nbsp| |nbsp| :codenormal:`binary = 33131;`
| :codenormal:`}`
| :codenormal:`server.init(cfg)`
:samp:`shard.{space_name}.insert` :code:`{...}` etc. |br|
Every data-access function in the box package
has an analogue in the shard package, so (for
example) to insert in table T in a sharded database one
simply says "shard.T:insert{...}" instead of
"box.T:insert{...}".
A "shard.T:select{}" request without a primary key will search all shards.
:samp:`q_shard.{space_name}.insert` {:code:`{...}` etc. |br|
Every queued data-access function has an analogue in the shard package.
The user must add an operation_id. The details of queued
data-access functions, and of maintenance-related functions,
are on `the shard section of github`_.
| :codenormal:`EXAMPLE -- Shard, Minimal Configuration`
| :codenormal:`-- There is only one`
| :codenormal:`-- shard, and that shard contains only one replica.`
| :codenormal:`-- So this isn't illustrating the features of either`
| :codenormal:`-- replication or sharding, it's only illustrating``
| :codenormal:`-- what the syntax is,`
| :codenormal:`-- and what the messages look like,`
| :codenormal:`-- that anyone could duplicate in a minute or two`
| :codenormal:`-- with the magic of cut-and-paste.`
| :codenormal:`mkdir ~/tarantool_sandbox_1`
| :codenormal:`cd ~/tarantool_sandbox_1`
| :codenormal:`rm -r *.snap`
| :codenormal:`rm -r *.xlog`
| :codenormal:`~/tarantool-master/src/tarantool`
| :codenormal:`box.cfg{listen = 3301}`
| :codenormal:`box.schema.create_space('tester')`
| :codenormal:`box.space.tester:create_index('primary', {})`
| :codenormal:`box.schema.user.passwd('admin', 'password')`
| :codenormal:`console = require('console')`
| :codenormal:`console.delimiter('!')`
| :codenormal:`cfg = {`
| |nbsp| |nbsp| :codenormal:`servers = {`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:3301', zone = '1' };`
| |nbsp| |nbsp| :codenormal:`};`
| |nbsp| |nbsp| :codenormal:`login = 'admin';`
| |nbsp| |nbsp| :codenormal:`password = 'password';`
| |nbsp| |nbsp| :codenormal:`redundancy = 1;`
| |nbsp| |nbsp| :codenormal:`binary = 3301;`
| :codenormal:`}!`
| :codenormal:`shard = require('shard')!`
| :codenormal:`shard.init(cfg)!`
| :codenormal:`-- Now put something in ...!`
| :codenormal:`shard.tester:insert{1,'Tuple #1'}!`
If one cuts and pastes the above, then the result,
showing only the requests and responses for shard.init
and shard.tester, should look approximately like this:
| :codenormal:`tarantool>` :codebold:`shard.init(cfg)!`
| :codenormal:`2015-08-09 ... I> Sharding initialization started...`
| :codenormal:`2015-08-09 ... I> establishing connection to cluster servers...`
| :codenormal:`2015-08-09 ... I> - localhost:3301 - connecting...`
| :codenormal:`2015-08-09 ... I> - localhost:3301 - connected`
| :codenormal:`2015-08-09 ... I> connected to all servers`
| :codenormal:`2015-08-09 ... I> started`
| :codenormal:`2015-08-09 ... I> redundancy = 1`
| :codenormal:`2015-08-09 ... I> Zone len=1 THERE`
| :codenormal:`2015-08-09 ... I> Adding localhost:3301 to shard 1`
| :codenormal:`2015-08-09 ... I> Zone len=1 THERE`
| :codenormal:`2015-08-09 ... I> shards = 1`
| :codenormal:`2015-08-09 ... I> Done`
| :codenormal:`---`
| :codenormal:`- true`
| :codenormal:`...`
|
| :codenormal:`tarantool>` :codebold:`-- Now put something in ...!`
| :codenormal:`---`
| :codenormal:`...`
|
| :codenormal:`tarantool>` :codebold:`shard.tester:insert{1,'Tuple #1'}!`
| :codenormal:`---`
| :codenormal:`- - [1, 'Tuple #1']`
| :codenormal:`...`
|
|
| :codenormal:`EXAMPLE -- Shard, Scaling Out`
| :codenormal:`-- There are two shards, and each shard contains one replica.`
| :codenormal:`-- This requires two nodes. In real life the two nodes would`
| :codenormal:`-- be two computers, but for this illustration the requirement`
| :codenormal:`-- is merely: start two shells, which we'll call Terminal#1 and`
| :codenormal:`-- Terminal #2.`
|
| :codenormal:`-- On Terminal #1, say:`
| :codenormal:`mkdir ~/tarantool_sandbox_1`
| :codenormal:`cd ~/tarantool_sandbox_1`
| :codenormal:`rm -r *.snap`
| :codenormal:`rm -r *.xlog`
| :codenormal:`~/tarantool-master/src/tarantool`
| :codenormal:`box.cfg{listen = 3301}`
| :codenormal:`box.schema.create_space('tester')`
| :codenormal:`box.space.tester:create_index('primary', {})`
| :codenormal:`box.schema.user.passwd('admin', 'password')`
| :codenormal:`console = require('console')`
| :codenormal:`console.delimiter('!')`
| :codenormal:`cfg = {`
| |nbsp| |nbsp| :codenormal:`servers = {`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:3301', zone = '1' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:3302', zone = '2' };`
| |nbsp| |nbsp| :codenormal:`};`
| |nbsp| |nbsp| :codenormal:`login = 'admin';`
| |nbsp| |nbsp| :codenormal:`password = 'password';`
| |nbsp| |nbsp| :codenormal:`redundancy = 1;`
| |nbsp| |nbsp| :codenormal:`binary = 3301;`
| |nbsp| |nbsp| :codenormal:`}!`
| |nbsp| |nbsp| :codenormal:`shard = require('shard')!`
| |nbsp| |nbsp| :codenormal:`shard.init(cfg)!`
| |nbsp| |nbsp| :codenormal:`-- Now put something in ...!`
| |nbsp| |nbsp| :codenormal:`shard.tester:insert{1,'Tuple #1'}!`
|
| -- On Terminal #2, say:
| :codenormal:`mkdir ~/tarantool_sandbox_2`
| :codenormal:`cd ~/tarantool_sandbox_2`
| :codenormal:`rm -r *.snap`
| :codenormal:`rm -r *.xlog`
| :codenormal:`~/tarantool-master/src/tarantool`
| :codenormal:`box.cfg{listen = 3302}`
| :codenormal:`box.schema.create_space('tester')`
| :codenormal:`box.space.tester:create_index('primary', {})`
| :codenormal:`box.schema.user.passwd('admin', 'password')`
| :codenormal:`console = require('console')`
| :codenormal:`console.delimiter('!')`
| :codenormal:`cfg = {`
| |nbsp| |nbsp| :codenormal:`servers = {`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:3301', zone = '1' };`
| |nbsp| |nbsp| |nbsp| |nbsp| :codenormal:`{ uri = 'localhost:3302', zone = '2' };`
| |nbsp| |nbsp| :codenormal:`};`
| |nbsp| |nbsp| :codenormal:`login = 'admin';`
| |nbsp| |nbsp| :codenormal:`password = 'password';`
| |nbsp| |nbsp| :codenormal:`redundancy = 1;`
| |nbsp| |nbsp| :codenormal:`binary = 3302;`
| :codenormal:`}!`
| :codenormal:`shard = require('shard')!`
| :codenormal:`shard.init(cfg)!`
| :codenormal:`-- Now get something out ...!`
| :codenormal:`shard.tester:select{1}!`
What will appear on Terminal #1 is: a loop of
error messages saying "Connection refused" and
"server check failure". This is normal. It will
go on until Terminal #2 process starts.
What will appear on Terminal #2, at the end,
should look like this:
| :codenormal:`tarantool>shard.tester:select{1}!`
| :codenormal:`---`
| :codenormal:`- - - [1, 'Tuple #1']`
| :codenormal:`...`
This shows that what was inserted by Terminal #1
can be selected by Terminal #2, via the shard package.
Details are on `the shard section of github`_.
.. _consistent: https://en.wikipedia.org/wiki/Consistent_hashing
.. _the shard section of github: https://github.com/tarantool/shard
This diff is collapsed.
Click to expand it.
Preview
0%
Loading
Try again
or
attach a new file
.
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Save comment
Cancel
Please
register
or
sign in
to comment