  1. Jan 14, 2020
    • Chris Sosnin's avatar
      box: frommap() bug fix · f89b5ab0
      Chris Sosnin authored
      - If an optional argument is provided for
        space_object:frommap() (which is {table = true|false}),
        the type check for the first argument is skipped, which is
        incorrect. We should return the result only after making
        sure it is possible to build a tuple.
      
      - If there is a type mismatch, however, frommap() does not
        return nil, err as stated in the description, so we
        change it to behave this way.
      
      Closes #4262
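      A minimal sketch of the fixed behaviour (the space name and format below are hypothetical):

      ```
      s = box.schema.space.create('test', {format = {{'id', 'unsigned'}}})
      _ = s:create_index('pk')
      s:frommap({id = 1})                        -- ok: builds a tuple
      -- type mismatch: the first argument no longer slips through when
      -- {table = true} is passed, and nil, err is returned as documented
      t, err = s:frommap({id = 'x'}, {table = true})
      ```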
  2. Jan 13, 2020
    • HustonMmmavr's avatar
      fio: fix race condition in mktree · 21ae2899
      HustonMmmavr authored
      Despite the lack of documentation, fio.mktree() was designed to work
      similarly to mkdir -p: it creates the directory along with its parents
      and doesn't complain about existing ones.
      
      But this function was subject to a race if two different processes were
      trying to create the same directory at the same time. It was caused by
      the fact that directory existence check and its creation aren't atomic.
      
      This patch fixes the race by improving error handling: it's not an
      error if the directory exists, even if it was created by someone else
      and the mkdir call failed.
      
      Related to https://github.com/tarantool/doc/issues/1063
      Closes #4660
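      A sketch of the now race-free usage (the path is hypothetical):

      ```
      fio = require('fio')
      -- Both calls succeed even if another process creates the same
      -- directory between the existence check and the mkdir call:
      ok, err = fio.mktree('/tmp/some/nested/dir')
      ok, err = fio.mktree('/tmp/some/nested/dir')  -- existing dirs are not an error
      ```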
    • Alexander Turenko's avatar
      test: drop dead code from app-tap/msgpackffi test · ec324247
      Alexander Turenko authored
      It appeared due to improper conflict resolution after the following
      commits were pushed in reverse order:
      
      * 2b9ef8d1 lua: don't modify pointer type in msgpack.decode*
      * 84bcba52 lua: keeping the pointer type in msgpackffi.decode()
      
      Originally 84bcba52 (which should land first) fixes the msgpackffi
      module and introduces the test_decode_buffer() function locally for
      the msgpackffi test. Then 2b9ef8d1 fixes the msgpack module in the
      same way, and expands and moves the test_decode_buffer() function to
      serializer_test.lua (to be used in both the msgpack and msgpackffi
      tests).
      
      After the changes made to push the commits in reverse order, those
      commits do something odd around the tests. However, the resulting
      state differs from the correct one only in the dead function in
      msgpackffi.test.lua.
      
      Follows up #3926.
    • Chris Sosnin's avatar
      tuple: add argument length check for update() · b73fb421
      Chris Sosnin authored
      Currently tuple_object:update() does not check the length
      of the operation string and just takes the first character
      after decoding. This patch fixes the problem.
      
      Follow-up #3884
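      A sketch of the check (the operation string below is intentionally invalid):

      ```
      t = box.tuple.new({1, 2, 3})
      -- Previously the extra character was silently ignored and '+=' was
      -- read as '+'; now the operation string length is validated and
      -- this raises an error:
      t:update({{'+=', 2, 10}})
      ```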
    • Chris Sosnin's avatar
      tuple: fix non-informative update() error message · d4fcec0c
      Chris Sosnin authored
      Calling tuple_object:update() with an invalid number of arguments
      yields an 'Unknown UPDATE operation' error. We replace this
      error with an explicit "wrong argument number" message, mentioning
      which operation failed, or pointing out the invalid operation code.
      
      Fixes #3884
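      A sketch of the improved diagnostics (the exact message wording may differ):

      ```
      t = box.tuple.new({1, 2, 3})
      -- '+' needs three arguments: operation, field, value. Before the
      -- patch this yielded 'Unknown UPDATE operation'; now the error
      -- names the '+' operation and its wrong argument number:
      t:update({{'+', 2}})
      ```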
    • Mergen Imeev's avatar
      sql: fix typeof() for double values · 2bc4fe69
      Mergen Imeev authored
      This patch corrects the result of typeof() for double values.
      Previously, it gave the type "number" in the case of a
      floating-point number. Now it gives "double".
      
      Follow-up #3812
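      For example (a sketch; the expression is arbitrary):

      ```
      -- previously reported 'number' for a floating-point value:
      box.execute([[SELECT TYPEOF(1.5);]])  -- now reports 'double'
      ```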
  3. Jan 10, 2020
  4. Dec 31, 2019
    • Ilya Kosarev's avatar
      test: fix flaky socket test · 4137134c
      Ilya Kosarev authored
      socket.test had a number of flaky problems:
      - socket readiness expectation & read timeouts
      - race conditions on socket shutdown in emulation test cases
      - UDP datagram losses on macOS
      - excessive random port searches
      Now they are solved. 127.0.0.1 is now used instead of 0.0.0.0 or
      localhost to prevent wrong connections where appropriate. Socket test
      is not fragile anymore.
      
      Closes #4426
      Closes #4451
      Closes #4469
    • Nikita Pettik's avatar
      sql: add cache statistics to box.info · 5a1a220e
      Nikita Pettik authored
      To track the current memory occupied by prepared statements and their
      number, let's extend the box.info submodule with .sql statistics: it
      now contains the current total size of prepared statements and their
      count.
      
      @TarantoolBot document
      Title: Prepared statements in SQL
      
      Now it is possible to prepare (i.e. compile into byte-code and save
      to the cache) a statement and execute it several times. The mechanism
      is similar to the ones in other DBs. A prepared statement is
      identified by a numeric ID, which is returned alongside the prepared
      statement handle. Note that IDs are not sequential: each is the value
      of a hash function applied to the string containing the original SQL
      request.
      The prepared statement holder is shared among all sessions. However,
      a session has access only to statements which have been prepared in
      its scope. There's no eviction policy as in an ordinary cache; an
      explicit unprepare request is required to remove a statement from the
      holder. Alternatively, a session's disconnect also removes its
      statements from the holder.
      Several sessions can share one prepared statement, which will be
      destroyed when all related sessions are disconnected or have sent an
      unprepare request. The memory limit for prepared statements is
      adjusted via box.cfg{sql_cache_size} (it can be set dynamically).
      
      Any DDL operation leads to expiration of all prepared statements:
      they should be manually removed or re-prepared.
      Prepared statements are available in local mode (i.e. via the
      box.prepare() function) and are supported in the IProto protocol. In
      the latter case the following IProto keys are used to make up and
      receive requests/responses:
      IPROTO_PREPARE - a new IProto command; its key is 0x13. It can be
      sent with one of two mandatory keys: IPROTO_SQL_TEXT (0x40, assumes a
      string value) or IPROTO_STMT_ID (0x43, assumes an integer value).
      Depending on the body it means to prepare or unprepare an SQL
      statement: IPROTO_SQL_TEXT implies a prepare request, meanwhile
      IPROTO_STMT_ID - an unprepare one;
      IPROTO_BIND_METADATA (0x33, contains parameter metadata of type map)
      and IPROTO_BIND_COUNT (0x34, corresponds to the count of parameters
      to be bound) are response keys. They are mandatory members of the
      result of IPROTO_PREPARE execution.
      
      To track statistics of used memory and the number of currently
      prepared statements, box.info is extended with SQL statistics:
      
      box.info:sql().cache.stmt_count - number of prepared statements;
      box.info:sql().cache.size - size of the memory occupied by prepared
      statements.
      
      A typical workflow with prepared statements is as follows:
      
      s = box.prepare("SELECT * FROM t WHERE id = ?;")
      s:execute({1}) or box.execute(s.sql_str, {1})
      s:execute({2}) or box.execute(s.sql_str, {2})
      s:unprepare() or box.unprepare(s.query_id)
      
      The structure of the object is as follows (member : type):
      
      - stmt_id: integer
        execute: function
        params: map [name : string, type : integer]
        unprepare: function
        metadata: map [name : string, type : integer]
        param_count: integer
      ...
      
      In terms of remote connection:
      
      cn = netbox:connect(addr)
      s = cn:prepare("SELECT * FROM t WHERE id = ?;")
      cn:execute(s.sql_str, {1})
      cn:unprepare(s.query_id)
      
      Closes #2592
    • Nikita Pettik's avatar
      netbox: introduce prepared statements · 0e1b20c3
      Nikita Pettik authored
      This patch introduces support of prepared statements in the IProto
      protocol. To achieve this a new IProto command is added -
      IPROTO_PREPARE (its key is 0x13). It is sent with one of two
      mandatory keys: IPROTO_SQL_TEXT (0x40, assumes a string value) or
      IPROTO_STMT_ID (0x43, assumes an integer value). Depending on the
      body it means to prepare or unprepare an SQL statement:
      IPROTO_SQL_TEXT implies a prepare request, meanwhile IPROTO_STMT_ID -
      an unprepare one. Also, to reply to a PREPARE request a few response
      keys are added: IPROTO_BIND_METADATA (0x33, contains parameter
      metadata of type map) and IPROTO_BIND_COUNT (0x34, corresponds to the
      count of parameters to be bound).
      
      Part of #2592
    • Nikita Pettik's avatar
      box: introduce prepared statements · 7bea3d5b
      Nikita Pettik authored
      This patch introduces local prepared statements. Support of prepared
      statements in the IProto protocol and netbox is added in the next
      patch.
      
      A prepared statement is an opaque instance of the SQL Virtual
      Machine. It can be executed several times without recompiling the
      query. To achieve this one can use the box.prepare(...) function. It
      takes a string with the SQL query to be prepared and returns an
      extended set of meta-information including the statement's ID,
      parameter types and names, types and names of the columns of the
      resulting set, and the count of parameters to be bound. The Lua
      object representing the result of a :prepare() invocation also
      features two methods - :execute() and :unprepare(). They correspond
      to box.execute(stmt.stmt_id) and box.unprepare(stmt.stmt_id), i.e.
      they automatically substitute the prepared statement to be executed.
      Statements are held in the prepared statement cache - for details see
      the previous commit. After schema changes all prepared statements
      located in the cache are considered expired - they must be
      re-prepared by a separate :prepare() call (or be invalidated with
      :unprepare()).
      
      Two sessions can share one prepared statement. But in the current
      implementation, if a statement is being executed by one session,
      another one is not able to use it and will compile it from scratch
      and then execute it.
      
      The SQL cache memory limit is regulated by box.cfg{sql_cache_size},
      which can be set dynamically. However, it can't be set to a value
      which is less than the size of the space currently occupied in the
      cache (since otherwise some statements could disappear from the
      cache).
      
      Part of #2592
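      A sketch of the expiration behaviour described above (table t and the DDL statement are hypothetical):

      ```
      s = box.prepare([[SELECT * FROM t WHERE id = ?;]])
      s:execute({1})                                -- ok
      box.execute([[CREATE INDEX i1 ON t (id);]])   -- any DDL expires prepared statements
      -- s:execute({1}) would now fail; the statement must be re-prepared:
      s = box.prepare([[SELECT * FROM t WHERE id = ?;]])
      ```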
    • Nikita Pettik's avatar
      sql: introduce holder for prepared statements · 10ebc2d5
      Nikita Pettik authored
      This patch introduces a holder (as a data structure) to handle
      prepared statements, and a set of interface functions (insert,
      delete, find) to operate on it. Under the hood the holder is
      implemented as a global hash (keys are values of a hash function
      applied to the original string containing the SQL query; values are
      pointers to wrappers around compiled VDBE objects) and a GC queue.
      Each entry in the hash has a reference counter. When it reaches 0,
      the entry is moved to the GC queue. In case the prepared statement
      holder is out of memory, it launches the GC process: each entry in
      the GC queue is deleted and all its resources are released. Such an
      approach avoids workload spikes on a session's disconnect (since on
      such an event all its statements must be deallocated).
      Each session is extended with a local hash mapping the statement ids
      available to it. That is, a session is allowed to execute and
      deallocate only statements which were previously prepared in its
      scope.
      On the other hand, the global hash makes it possible to share the
      same prepared statement object among different sessions.
      The size of the cache is regulated via the box.cfg{sql_cache_size}
      parameter.
      
      Part of #2592
    • Ilya Kosarev's avatar
      test: fix and split flaky join_vclock test · adb0a01b
      Ilya Kosarev authored
      The join_vclock test is supposed to verify that changes are not being
      lost on the replica. Therefore the test is changed to explicitly
      check that all changes on the master are applied on the replica.
      Previously this test also indirectly verified that changes are
      being applied in the correct order. Now there is a separate test for
      this, called replica_apply_order.
      Since the changed join_vclock test might fail due to #4669, we now
      create the cluster out of fresh instances instead of using the
      default instance. Considering the mentioned fixes, it is not fragile
      anymore.
      
      Closes #4160
    • Ilya Kosarev's avatar
      relay: fix vclock obtainment on join · a0e61500
      Ilya Kosarev authored
      In case of high load the vclock used to join a replica could be ahead
      of the actual WAL. Therefore the replica could miss some tuples from
      the master. To fix this, wal_sync is updated so that it can now
      obtain an up-to-date vclock on the flushed state.
      
      Prerequisites #4160
    • Mergen Imeev's avatar
      sql: refactor PRAGMA-related code · d287c0e9
      Mergen Imeev authored
    • Mergen Imeev's avatar
      sql: remove control pragmas · eafadc13
      Mergen Imeev authored
      This patch removes control pragmas. They are not needed now, after
      the introduction of the _session_settings system space.
      
      Closes #4511
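      The replacement workflow, sketched (the former pragma call is shown for contrast):

      ```
      -- before: box.execute([[PRAGMA sql_default_engine = 'vinyl';]])
      -- now the same setting is changed through the system space:
      box.space._session_settings:update('sql_default_engine',
                                         {{'=', 'value', 'vinyl'}})
      ```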
    • Mergen Imeev's avatar
      box: add SQL settings to _session_settings · 4655447c
      Mergen Imeev authored
      Part of #4511
      
      @TarantoolBot document
      Title: _session_settings system space
      The _session_settings system space is used to view or change session
      settings.
      
      This space uses a new engine. This allows us to create tuples on
      the fly when the get() or select() methods are called. The
      engine does not support the insert(), replace(), and delete()
      methods. The only way to change a setting's value is update(),
      which can only be used with the "=" operation.
      
      Because the space creates tuples on the fly, it can return a
      tuple without storing it anywhere. But this means that every time
      we get a tuple from this system space, it is a new tuple, even if
      they look the same:
      
      tarantool> s = box.space._session_settings
      tarantool> name = 'sql_default_engine'
      tarantool> s:get({name}) == s:get({name})
      ---
      - false
      ...
      
      Currently, this space contains only SQL settings, since the only
      session settings are SQL settings.
      
      List of currently available session settings:
      
      sql_default_engine
      sql_defer_foreign_keys
      sql_full_column_names
      sql_full_metadata
      sql_parser_debug
      sql_recursive_triggers
      sql_reverse_unordered_selects
      sql_select_debug
      sql_vdbe_debug
      
      The default values of these settings cannot be changed by the
      user.
      
      Debug settings are disabled by default and can only be enabled in
      the debug build.
      
      Example of usage:
      tarantool> s = box.space._session_settings
      -- View session settings values.
      tarantool> s:get({'sql_default_engine'})
      ---
      - ['sql_default_engine', 'memtx']
      ...
      
      tarantool> s:select()
      ---
      - - ['sql_default_engine', 'memtx']
        - ['sql_defer_foreign_keys', false]
        - ['sql_full_column_names', false]
        - ['sql_full_metadata', false]
        - ['sql_parser_debug', false]
        - ['sql_recursive_triggers', true]
        - ['sql_reverse_unordered_selects', false]
        - ['sql_select_debug', false]
        - ['sql_vdbe_debug', false]
      ...
      
      tarantool> s:select('sql_g', {iterator='LE'})
      ---
      - - ['sql_full_metadata', false]
        - ['sql_full_column_names', false]
        - ['sql_defer_foreign_keys', false]
        - ['sql_default_engine', 'memtx']
      ...
      
      -- Change session setting value.
      tarantool> s:update('sql_default_engine', {{'=', 'value', 'vinyl'}})
      ---
      - ['sql_default_engine', 'vinyl']
      ...
    • Mergen Imeev's avatar
      box: introduce _session_settings system space · 10d79f7a
      Mergen Imeev authored
      This patch creates the _session_settings system space. The space is
      used to view and change session settings. It is one of the
      special spaces that have a "service" engine. The main idea of this
      space is that it does not store tuples; instead, when the
      get() or select() methods are called, it creates tuples on the fly.
      Because of this, even if the same setting is requested, the returned
      tuples will be different. In addition, this space allows you to
      change a setting's value using the update() method, in which case it
      directly changes the setting value.
      
      There are no settings at the moment; some will be added in the
      next patch.
      
      Part of #4511
    • Mergen Imeev's avatar
      box: introduce 'service' engine · 81bb565c
      Mergen Imeev authored
      This patch introduces a new engine called "service" that will be
      used to create a new system space. The main idea of this engine is
      that it will not have a predefined space_vtab. With this engine,
      we can create unusual spaces with their own vtab and behavior.
      
      Due to the nature of this engine, it can only be used to create
      system spaces.
      
      Part of #4511
    • Mergen Imeev's avatar
      sql: remove PRAGMA "vdbe_addoptrace" · 94c89c64
      Mergen Imeev authored
      The vdbe_addoptrace pragma provides a convenient way to track the
      insertion of opcodes into VDBE during VDBE creation. This patch
      makes it impossible to disable this feature in the debug build by
      removing the pragma. So now you do not need to enable the pragma in
      order to use this feature.
      
      Part of #4511
    • Mergen Imeev's avatar
      sql: remove PRAGMA "sql_compound_select_limit" · 3f61b8b7
      Mergen Imeev authored
      Pragma sql_compound_select_limit was added in commit b2afe208
      ("sql: decrease SELECT_COMPOUND_LIMIT threshold"). However, there
      is no need to make this parameter mutable. We also plan to rework
      SELECT (#3700), so this limit will be removed in the future.
      
      Part of #4511
    • Mergen Imeev's avatar
      sql: remove PRAGMA "short_column_names" · 57a514be
      Mergen Imeev authored
      The pragmas "short_column_names" and "full_column_names" allowed
      three ways to display the column name in metadata:
      1) If both are turned off, the column name is shown as it
      was written by the user.
      2) If "short_column_names" = OFF and "full_column_names" = ON,
      the column name is displayed as <table name>.<column name>.
      3) If "short_column_names" = ON, the column name is displayed
      as <column name>. This is the default option.
      
      But we need only two ways to show the column name:
      1) Show the column name as <column name>. This should be the
      default option.
      2) Show the column name as <table name>.<column name>.
      
      In this regard, we need only one of these pragmas.
      
      Part of #4511
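      The two remaining display modes, sketched via the corresponding session setting (space and column names are hypothetical):

      ```
      box.execute([[SELECT b FROM test;]])  -- metadata column name: B
      box.space._session_settings:update('sql_full_column_names',
                                         {{'=', 'value', true}})
      box.execute([[SELECT b FROM test;]])  -- metadata column name: TEST.B
      ```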
    • Mergen Imeev's avatar
      sql: remove PRAGMA "count_changes" · 9ad83926
      Mergen Imeev authored
      Pragma "count_changes" forces the INSERT, REPLACE, DELETE, and
      UPDATE statements to return the number of changed rows as a
      result set. This is not necessary, as these statements return the
      number of changed rows as metadata.
      
      Part of #4511
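      The row count remains available as metadata, e.g. (a sketch; the table name is hypothetical):

      ```
      res = box.execute([[UPDATE test SET b = b + 1;]])
      res.row_count  -- number of changed rows, as metadata, not a result set
      ```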
    • Alexander Turenko's avatar
      test: update test-run · 39657bf2
      Alexander Turenko authored
      * Support setting the default SQL engine using the _session_settings
        space in addition to the pragma sql_default_engine. This feature is
        only for *.test.sql tests. Needed for #4511.
      
      * Use exact IPv4/IPv6 address in test_run:cmd() in order to avoid rare
        failures due to using wrong address (PR #197).
  5. Dec 30, 2019
    • Roman Khabibov's avatar
      sql: make constraint names unique in scope of table · e13a74e4
      Roman Khabibov authored
      Put constraint names into the space's hash table and drop them on
      insert/delete in corresponding system spaces (_index,
      _fk_constraint, _ck_constraint).
      
      Closes #3503
      
      @TarantoolBot document
      Title: Constraint names are unique in scope of table
      
      SQL:
      According to ANSI SQL, a table constraint is one of the following
      entities: PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK. (There is also
      NOT NULL, but we don't consider it a constraint.) Every
      constraint has its own name, passed by the user or automatically
      generated, and these names must be unique within one table/space.
      
      For example:
      
          tarantool> box.execute([[CREATE TABLE test (
                                       a INTEGER PRIMARY KEY,
                                       b INTEGER,
                                       CONSTRAINT cnstr CHECK (a >= 0)
                                   );]])
          ---
          - row_count: 1
          ...
      
          tarantool> box.execute('CREATE UNIQUE INDEX cnstr ON test(b);')
          ---
          - null
          - Constraint CHECK 'CNSTR' already exists in space 'TEST'
          ...
      
      Unique index and CHECK are different constraint types, but they
      share namespace, and can't have clashing names. The same for all
      the other constraints.
    • Maria's avatar
      lua: keeping the pointer type in msgpackffi.decode() · 84bcba52
      Maria authored
      The decode_unchecked method returns two values - the one that has
      been decoded and a pointer to the new position within the buffer
      given as a parameter. The type of the returned pointer used to be
      cdata<unsigned char *> and it was not possible to assign the
      returned value to buf.rpos due to the following error:
      
      > cannot convert 'const unsigned char *' to 'char *'
      
      The patch fixes this by making the decode_unchecked method return
      either cdata<char *> or cdata<const char *> depending on the given
      parameter.
      
      Closes #3926
    • Alexander Turenko's avatar
      lua: don't modify pointer type in msgpack.decode* · 2b9ef8d1
      Alexander Turenko authored
      msgpackffi.decode_unchecked([const] char *) returns two values: a
      decoded result and a new pointer within the passed buffer. After
      #3926 the cdata type of the returned pointer follows the type of the
      passed buffer.
      
      This commit modifies the behaviour of the msgpack module in the same
      way. The following functions now return cdata<char *> or
      cdata<const char *> depending on their argument:
      
      * msgpack.decode(cdata<[const] char *>, number)
      * msgpack.decode_unchecked(cdata<[const] char *>)
      * msgpack.decode_array_header(cdata<[const] char *>, number)
      * msgpack.decode_map_header(cdata<[const] char *>, number)
      
      Follows up #3926.
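      A sketch of the now-working round trip with an ibuf:

      ```
      msgpack = require('msgpack')
      buffer = require('buffer')
      buf = buffer.ibuf()
      msgpack.encode({1, 2, 3}, buf)
      obj, rpos = msgpack.decode(buf.rpos, buf:size())
      buf.rpos = rpos  -- ok: the returned pointer keeps the input's cdata type
      ```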
    • Vladislav Shpilevoy's avatar
      tuple: JSON path update intersection at maps · 6e97d6a9
      Vladislav Shpilevoy authored
      Previous commits introduced isolated JSON updates, and then
      allowed intersection at arrays. This one completes the puzzle,
      adding intersection at maps, so now both of these samples work:
      
      Allowed in the previous commit:
      
          [1][2][3].a.b.c = 20
          [1][2][4].e.f.g = 30
                 ^
      
          First difference is [3] vs [4] - intersection by an array.
      
      Allowed in this commit:
      
          [1][2][3].a.b.c = 20
          [1][2][3].a.e.f = 30
                      ^
      
          First difference is 'b' vs 'e' - intersection by a map.
      
      Now JSON updates are fully available.
      
      Closes #1261
      
      @TarantoolBot document
      Title: JSON updates
      Tuple/space/index:update/upsert now support JSON paths. All the
      same paths as allowed in tuple["..."].
      
      Example:
      box.cfg{}
      format = {}
      format[1] = {'field1', 'unsigned'}
      format[2] = {'field2', 'map'}
      format[3] = {'field3', 'array'}
      format[4] = {'field4', 'string', is_nullable = true}
      s = box.schema.create_space('test', {format = format})
      _ = s:create_index('pk')
      t = {
          1,
          {
              key1 = 'value',
              key2 = 10
          },
          {
              2,
              3,
              {key3 = 20}
          }
      }
      t = s:replace(t)
      
      tarantool> t:update({{'=', 'field2.key1', 'new_value'}})
      ---
      - [1, {'key1': 'new_value', 'key2': 10}, [2, 3, {'key3': 20}]]
      ...
      
      tarantool> t:update({{'+', 'field3[2]', 1}})
      ---
      - [1, {'key1': 'value', 'key2': 10}, [2, 4, {'key3': 20}]]
      ...
      
      tarantool> s:update({1}, {{'!', 'field4', 'inserted value'}})
      ---
      - [1, {'key1': 'value', 'key2': 10}, [2, 3, {'key3': 20}], 'inserted value']
      ...
      
      tarantool> s:update({1}, {{'#', '[2].key2', 1}, {'=', '[3][3].key4', 'value4'}})
      ---
      - [1, {'key1': 'value'}, [2, 3, {'key3': 20, 'key4': 'value4'}], 'inserted value']
      ...
      
      tarantool> s:upsert({1, {k = 'v'}, {}}, {{'#', '[2].key1', 1}})
      ---
      ...
      
      tarantool> s:select{}
      ---
      - - [1, {}, [2, 3, {'key3': 20, 'key4': 'value4'}], 'inserted value']
      ...
      
      Note that the same rule as in tuple field access by JSON applies
      to field names looking like JSON paths. The rule is that
      first the whole path is interpreted as a field name. If such a
      name does not exist, then it is treated as a path. For example,
      if there is a field name 'field.name.like.json', then this update
      
          <obj>:update({..., 'field.name.like.json', ...})
      
      will update this field, instead of keys 'field' -> 'name' ->
      'like' -> 'json'. If such a name is needed as a part of a bigger
      path, then it should be wrapped in quotes and []:
      
          <obj>:update({..., '["field.name.like.json"].next.fields', ...})
      
      There are some new rules for JSON updates:
      
      - Operation '!' can't be used to create all intermediate nodes of
        a path. For example, {'!', 'field1[1].field3', ...} can't
        create fields 'field1' and '[1]', they should exist.
      
      - Operation '#', when applied to maps, can't delete more than one
        key at once. That is, its argument should be always 1 for maps.
      
          {'#', 'field1.field2', 1} - this is allowed;
          {'#', 'field1.field2', 10} - this is not.
      
        That limitation originates from the fact that keys in a map
        are not ordered in any way, and '#' with more than 1 key would
        lead to undefined behaviour.
      
      - Operation '!' on maps can't create a key, if it exists already.
      
      - If a map contains non-string keys (booleans, numbers, maps,
        arrays - anything), then these keys can't be updated via JSON
        paths. But it is still allowed to update string keys in such a
        map.
      
      Why are JSON updates good, and why should they be preferred when
      only a part of a tuple needs to be updated?
      
      - They consume less space in WAL, because for an update only its
        keys, operations, and arguments are stored. It is cheaper to
        store an update of one deep field than the whole tuple.
      
      - They are faster. Firstly, this is because they are implemented
        in C, and have no problems with Lua GC and dynamic typing.
        Secondly, some cases of JSON paths are highly optimized. For
        example, an update with a single JSON path costs O(1) memory
        regardless of how deep that path goes (not counting update
        arguments).
      
      - They are available from remote clients, as well as any other
        DML. Before JSON updates, to update one deep part of a tuple it
        was necessary to download that tuple, update it in memory, and
        send it back - 2 network hops. With JSON paths it can be 1 when
        the update can be described in paths.
    • Vladislav Shpilevoy's avatar
      tuple: JSON path update intersection at arrays · 8cad025a
      Vladislav Shpilevoy authored
      Before the patch only isolated JSON updates were supported, as the
      simplest and fastest to implement. This patch allows update
      operations with paths having the same prefix, but the difference
      of the paths should start from an array index.
      
      For example, this is allowed:
      
          [1][2][3].a.b.c = 20
          [1][2][4].e.f.g = 30
      
          First difference is [3] vs [4] - intersection by an array, ok.
      
      This is not allowed yet:
      
          [1][2][3].a.b.c = 20
          [1][2][3].a.e.f = 30
      
          First difference is 'b' vs 'e' - intersection by a map,
          not ok.
      
      For that a new update tree node type is added - XUPDATE_ROUTE.
      When several update operations have the same prefix, this prefix
      becomes an XUPDATE_ROUTE tree field. It stores the prefix and a
      subtree with these operations.
      
      Bar and route update nodes can branch and produce more bars and
      routes, when new operations come.
      
      Part of #1261
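      For instance, such an update is now applied in one call (a sketch; the tuple layout is hypothetical):

      ```
      t = box.tuple.new({1, {2, 3, {key3 = 20}}})
      -- Both paths share the prefix '[2]' and first differ at an array
      -- index ([1] vs [2]), so the intersection is allowed:
      t:update({{'=', '[2][1]', 10}, {'=', '[2][2]', 30}})
      ```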
    • Vladislav Shpilevoy's avatar
      tuple: make update operation tokens consumable · 35084f66
      Vladislav Shpilevoy authored
      There is a case: [1][2][3][4] = 100. It is not a problem when it
      is a single operation, not intersecting with anything. It is an
      isolated update then, and works ok. But the next patch allows
      several update operations to have the same prefix, and the path
      [1][2][3][4] can become a tree of updated arrays. For example, a
      trivial tree like this:
      
          root: [ [1] ]
                   |
                   [ [1] [2] ]
                          |
                          [ [1] [2] [3] ]
                                     |
                                     [ [1] [2] [3] [4] ]
                                                   =100
      
      When the update is applied to root, the JSON path [1][2][3][4]
      is decoded one part by one. And the operation goes down the tree
      until reaches the leaf, where [4] = 100 is applied. Each time when
      the update goes one level down, somebody should update
      xrow_update_op.field_no so as on the first level it would be 1,
      then 2, 3, 4.
      
      Does it mean that each level of the update [1][2][3][4] should
      prepare field_no for the next child? No, because then it would
      need to check the type of the child - whether it is an array or a
      map, or whatever expects a valid field_no/key in xrow_update_op -
      and ensure that a map-child gets a key and an array-child gets a
      field_no. That would complicate the code to a totally unreadable
      state, and would break encapsulation between
      xrow_update_array/map/bar... . Each array update operation would
      check a child for all existing types to ensure that the next token
      matches it. The same would happen to map updates.
      
      This patch goes another way: let each level of the update check
      whether its field_no/key is already prepared by the caller, and if
      not, extract the next token from the operation path. So a map
      update ensures that it has a valid key, and an array update ensures
      that it has a valid field no.
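      The "consume if not prepared" idea can be sketched roughly like
      this (hypothetical, simplified C; the real xrow_update code and
      struct names differ, array indexes only):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Hypothetical, simplified model of an update operation walking a
       * JSON path of array indexes like "[1][2][3][4]". */
      struct op {
          const char *path; /* remaining, unconsumed part of the path */
          int field_no;     /* -1 means "not prepared by the caller" */
      };

      /* Each level asks for its field_no: if the caller already prepared
       * it, consume it; otherwise extract the next token from the path. */
      static int
      op_consume_field_no(struct op *op)
      {
          if (op->field_no >= 0) {
              int no = op->field_no;
              op->field_no = -1; /* consumed: the next level extracts anew */
              return no;
          }
          assert(*op->path == '[');
          op->path++;
          int no = 0;
          while (*op->path >= '0' && *op->path <= '9')
              no = no * 10 + (*op->path++ - '0');
          assert(*op->path == ']');
          op->path++;
          return no;
      }

      int main(void)
      {
          /* The caller (root) prepared field_no = 1; the rest of the
           * path is extracted level by level. */
          struct op op = { .path = "[2][3][4]", .field_no = 1 };
          while (*op.path != '\0' || op.field_no >= 0)
              printf("%d ", op_consume_field_no(&op));
          printf("\n"); /* 1 2 3 4 */
          return 0;
      }
      ```

      This keeps the encapsulation intact: a parent never inspects the
      child's type, it only optionally pre-fills the next token.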
      
      Part of #1261
      35084f66
    • Serge Petrenko's avatar
      replication: introduce anonymous replica. · e17beed8
      Serge Petrenko authored
      This commit introduces anonymous replicas. Such replicas do not
      pollute the _cluster table (in return, they can only be read-only
      and have a zero id).
      An anonymous replica can be promoted to a normal one if needed.
      
      Closes #3186
      
      @TarantoolBot document
      Title: Document anonymous replica
      
      There is a new type of replica in tarantool, an anonymous one. An
      anonymous replica is read-only (but you can still write to temporary
      and replica-local spaces), and it isn't present in the _cluster
      table.
      
      Since an anonymous replica isn't registered in _cluster, there is no
      limit on the number of anonymous replicas in a replica set. You can
      have as many of them as you want.
      
      In order to make a replica anonymous, you have to pass an option
      `replication_anon=true` to `box.cfg`. You also have to set 'read_only'
      to true.
      
      Let's go through anonymous replica bootstrap.
      Suppose we have a master configured with
      ```
      box.cfg{listen=3301}
      ```
      and that we created a local space called "loc"
      ```
      box.schema.space.create('loc', {is_local=true})
      box.space.loc:create_index("pk")
      ```
      Now, to configure an anonymous replica, we have to issue `box.cfg`,
      as usual.
      ```
      box.cfg{replication_anon=true, read_only=true, replication=3301}
      ```
      As mentioned above, `replication_anon` may be set to true only
      together with `read_only`.
      The instance will fetch the master's snapshot and proceed to follow
      its changes. It will not receive an id, so its id will remain zero.
      ```
      tarantool> box.info.id
      ---
      - 0
      ...
      ```
      ```
      tarantool> box.info.replication
      ---
      - 1:
          id: 1
          uuid: 3c84f8d9-e34d-4651-969c-3d0ed214c60f
          lsn: 4
          upstream:
            status: follow
            idle: 0.6912029999985
            peer:
            lag: 0.00014615058898926
      ...
      ```
      Now we can use the replica.
      For example, we may do inserts into the local space:
      ```
      tarantool> for i = 1,10 do
               > box.space.loc:insert{i}
               > end
      ---
      ...
      ```
      Note that while the instance is anonymous, it will increase the 0th
      component of its vclock:
      ```
      tarantool> box.info.vclock
      ---
      - {0: 10, 1: 4}
      ...
      ```
      Let's now promote the replica to a normal one:
      ```
      tarantool> box.cfg{replication_anon=false}
      2019-12-13 20:34:37.423 [71329] main I> assigned id 2 to replica 6a9c2ed2-b9e1-4c57-a0e8-51a46def7661
      2019-12-13 20:34:37.424 [71329] main/102/interactive I> set 'replication_anon' configuration option to false
      ---
      ...
      
      tarantool> 2019-12-13 20:34:37.424 [71329] main/117/applier/ I> subscribed
      2019-12-13 20:34:37.424 [71329] main/117/applier/ I> remote vclock {1: 5} local vclock {0: 10, 1: 5}
      2019-12-13 20:34:37.425 [71329] main/118/applierw/ C> leaving orphan mode
      ```
      The replica just received id 2. We can make it read-write now.
      ```
      box.cfg{read_only=false}
      2019-12-13 20:35:46.392 [71329] main/102/interactive I> set 'read_only' configuration option to false
      ---
      ...
      
      tarantool> box.schema.space.create('test')
      ---
      - engine: memtx
        before_replace: 'function: 0x01109f9dc8'
        on_replace: 'function: 0x01109f9d90'
        ck_constraint: []
        field_count: 0
        temporary: false
        index: []
        is_local: false
        enabled: false
        name: test
        id: 513
      - created
      ...
      
      tarantool> box.info.vclock
      ---
      - {0: 10, 1: 5, 2: 2}
      ...
      ```
      Now the replica tracks its changes in the 2nd vclock component, as
      expected. It can also become a replication master from now on.
      
      Side notes:
        * You cannot replicate from an anonymous instance.
        * To promote an anonymous instance to a regular one,
          you first have to start it as anonymous, and only
          then issue `box.cfg{replication_anon=false}`
        * In order for the deanonymization to succeed, the
          instance must replicate from some read-write instance,
          otherwise no one will be able to add it to the _cluster table.
      e17beed8
    • sergepetrenko's avatar
      vclock: ignore 0th component in comparisons · 1a2037b1
      sergepetrenko authored
      The 0th vclock component will be used to count the replica-local
      rows of an anonymous replica. These rows aren't replicated, and
      different instances will have different values in vclock[0].
      
      Add a function vclock_compare_ignore0, which doesn't order vclocks
      by the 0th component, and use it where appropriate.
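      The idea can be sketched as follows (hypothetical, simplified C;
      the real struct vclock and comparison code are more involved):

      ```c
      #include <stdint.h>
      #include <stdio.h>

      #define VCLOCK_MAX 32

      /* Hypothetical, simplified vclock: an array of per-replica LSNs. */
      struct vclock {
          int64_t lsn[VCLOCK_MAX];
      };

      /*
       * Compare two vclocks, skipping component 0 (replica-local rows).
       * Returns -1 if a < b, 1 if a > b, 0 if equal, 2 if incomparable.
       */
      static int
      vclock_compare_ignore0(const struct vclock *a, const struct vclock *b)
      {
          int le = 1, ge = 1;
          for (int i = 1; i < VCLOCK_MAX; i++) { /* start at 1: skip 0th */
              if (a->lsn[i] < b->lsn[i])
                  ge = 0;
              else if (a->lsn[i] > b->lsn[i])
                  le = 0;
          }
          if (le && ge)
              return 0;
          if (le)
              return -1;
          if (ge)
              return 1;
          return 2; /* diverged */
      }

      int main(void)
      {
          struct vclock a = { .lsn = {10, 4} }; /* {0: 10, 1: 4} */
          struct vclock b = { .lsn = {0, 4} };  /* {0: 0, 1: 4}  */
          /* Equal once the replica-local 0th component is ignored. */
          printf("%d\n", vclock_compare_ignore0(&a, &b)); /* 0 */
          struct vclock c = { .lsn = {0, 5} };  /* {0: 0, 1: 5}  */
          printf("%d\n", vclock_compare_ignore0(&a, &c)); /* -1 */
          return 0;
      }
      ```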
      
      Part of #3186
      1a2037b1
    • Serge Petrenko's avatar
      applier: split join processing into two stages · 5962ddb0
      Serge Petrenko authored
      We already have 'initial join' and 'final join' stages in the
      applier logic. The first actually means fetching the master's
      snapshot, and the second one -- receiving the rows which should
      contain the replica's registration in _cluster.
      These stages will be used separately once anonymous replicas are
      implemented, so split them as a preparation.
      
      Prerequisite #3186
      5962ddb0
    • Serge Petrenko's avatar
      replication: do not decode replicaset uuid when processing a subscribe · 269295cc
      Serge Petrenko authored
      After moving the cluster id check to the replica (7f8cbde3),
      we no longer check it on the master side, so there is no need to
      decode it.
      
      Prerequisite #3186
      269295cc
    • Serge Petrenko's avatar
      box: update comment describing join protocol · d3031e47
      Serge Petrenko authored
      The comment states that relay sends the latest snapshot to replica
      during initial join, however, this was changed in commit
      6332aca6 (relay: join new replicas off
      read view).
      Now relay sends rows from the read view created at the moment of join.
      Update the comment to match.
      
      Follow-up #1271
      d3031e47
    • Nikita Pettik's avatar
      sql: move sql_stmt_busy() declaration to box/execute.h · dff6f5dd
      Nikita Pettik authored
      We are going to use it in box/execute.c and in the SQL prepared
      statement cache implementation. So, to avoid including the whole
      sqlInt.h, let's move it to the relatively small execute.h header.
      Let's also fix the codestyle of this function.
      
      Needed for #2592
      dff6f5dd
    • Nikita Pettik's avatar
      sql: introduce sql_stmt_query_str() method · a14d1df3
      Nikita Pettik authored
      It is a getter that fetches the string of an SQL query from a
      prepared statement.
      
      Needed for #2592
      a14d1df3
    • Nikita Pettik's avatar
      box: increment schema_version on ddl operations · 68094e8b
      Nikita Pettik authored
      Some DDL operations, such as SQL trigger alter and check/foreign
      key constraint alter, don't result in a schema version change. On
      the other hand, we are going to rely on the schema version to
      detect expired prepared statements: for instance, if an FK
      constraint has been created after DML statement preparation, the
      latter may ignore the FK constraint (instead of raising a proper
      "statement has expired" error). Let's fix it and bump the schema
      version on each DDL operation.
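      The expiration check this enables can be sketched like so
      (hypothetical, simplified C; the names are illustrative, not the
      real Tarantool ones):

      ```c
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Hypothetical global, bumped on every DDL operation. */
      static uint64_t schema_version = 1;

      struct prepared_stmt {
          uint64_t schema_version; /* version at prepare time */
      };

      static void
      ddl_commit(void)
      {
          ++schema_version; /* with this patch, every DDL bumps it */
      }

      /* A statement is expired if any DDL happened after it was prepared. */
      static bool
      stmt_is_expired(const struct prepared_stmt *stmt)
      {
          return stmt->schema_version != schema_version;
      }

      int main(void)
      {
          struct prepared_stmt stmt = { .schema_version = schema_version };
          printf("%d\n", stmt_is_expired(&stmt)); /* 0: still valid */
          ddl_commit(); /* e.g. a new FK constraint is created */
          printf("%d\n", stmt_is_expired(&stmt)); /* 1: expired */
          return 0;
      }
      ```

      Without the version bump on every DDL, the second check would
      wrongly report 0 and the stale plan would be executed.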
      
      Needed for #2592
      68094e8b
    • Nikita Pettik's avatar
      sql: introduce sql_stmt_est_size() function · fb745eb2
      Nikita Pettik authored
      To implement a memory quota for the prepared statement cache, we
      have to estimate the size of a prepared statement. This function
      attempts to do that.
      
      Needed for #2592
      fb745eb2