Commits · 17525ed81d4e9dcc20c91582dcd1755980282bbe · core / tarantool

May 15, 2018

sql: move SQL statistics to server · 17525ed8

Nikita Pettik authored 6 years ago

SQLite keeps statistics about the data held in each index and
space. The query planner uses them to build an optimized plan. These
statistics don't depend on the internal index implementation (e.g. B-tree
or LSM tree) and describe tuple distribution.
This patch moves the members responsible for statistics from the original
SQLite structs to Tarantool ones. More precisely, an index's opts now
contain the statistics from the _stat1 space (arrays of integers and their
logarithmic approximations) and the more sophisticated ones from _stat4
(an array of samples). They also contain a set of flags that turn
individual optimizations (such as skip-scan) on or off.

After the ANALYZE command is executed, statistics are saved into the
_sql_stat1 and _sql_stat4 spaces (in order to persist them). However, they
should also be loaded into the in-memory structs representing indexes.
This patch reworks the original SQLite routine to load them directly into
the newly introduced stat struct of the index.

It is worth mentioning that statistics are updated according to an
'everything or nothing' policy: first, the new statistics are allocated on
the region. Then, if there is enough memory for the stats of ALL indexes,
they are copied to the heap. Otherwise, the region is truncated and no
changes take place.
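
The 'everything or nothing' policy can be illustrated with a minimal Python sketch. It is not the Tarantool C code: the `staged` list merely stands in for the region, and `load_stats_all_or_nothing`, `build_stat` and the dict-based indexes are hypothetical names used for illustration only.

```python
def load_stats_all_or_nothing(indexes, build_stat):
    """Install new statistics on every index, or on none of them."""
    staged = []  # plays the role of the region: cheap to drop as a whole
    try:
        for index in indexes:
            staged.append(build_stat(index))
    except MemoryError:
        staged.clear()  # "truncate the region": nothing was published yet
        return False
    # Every allocation succeeded, so publish all results together.
    for index, stat in zip(indexes, staged):
        index["stat"] = stat
    return True
```

If building the stats for any single index fails, the indexes keep whatever statistics they had before the call.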

Closes #3253

17525ed8

sql: refactor usages of table's tuple count · 946d1ebc

Nikita Pettik authored 6 years ago

The number of tuples contained in a given space can be calculated as the
number of tuples in its primary index. However, if the table represents a
temporary object such as the result set of a SELECT or a VIEW, the tuple
count can't be determined precisely. In this case, a default approximation
is used: 1 million tuples.
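
A minimal Python sketch of this fallback follows. The function name and the convention of passing `None` for a temporary object are assumptions made for illustration; only the 1 million default comes from the commit message.

```python
DEFAULT_TUPLE_COUNT = 1_000_000  # approximation named in the commit message

def estimated_tuple_count(primary_index_len):
    """Return the tuple count the planner should assume.

    primary_index_len is None for temporary objects (hypothetical
    convention), where no precise count is available.
    """
    if primary_index_len is None:
        return DEFAULT_TUPLE_COUNT
    return primary_index_len
```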

Part of #3253

946d1ebc

sql: add average tuple size calculation · 23918324

Nikita Pettik authored 6 years ago

Average tuple size can be calculated by dividing the space size by the
index tuple count, so there is no need to hold this value as a separate
member. This patch therefore removes this statistic from struct Index and
struct Table. Note that there are currently no partial indexes, so all
indexes feature the same average tuple size. Also, unstable tests are
commented out.
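
The calculation amounts to one division, sketched here in Python. The function name, the integer rounding and the handling of an empty index are assumptions, not details taken from the patch.

```python
def average_tuple_size(space_size_bytes, index_tuple_count):
    """Derive the average tuple size on demand instead of storing it."""
    if index_tuple_count == 0:
        return 0  # assumption: an empty index reports size 0
    return space_size_bytes // index_tuple_count
```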

Part of #3253

23918324

sql: optimize compilation of SELECT COUNT(*) · 8378e4b9

Nikita Pettik authored 7 years ago

Originally, SQLite chooses the best index to perform the COUNT operation.
In Tarantool there is no considerable difference between indexes,
since this simple operation on a B+tree has O(1) complexity,
so let's not spend time on this routine and always choose the primary
index for the simple query 'SELECT COUNT(*) FROM <tab>'.
The patch also contains code style fixes.

8378e4b9

sql: initialize found flag for collation · 7b37db3e
Kirill Yukhin authored 6 years ago
```
Since the local variable wasn't initialized, the collation was
sporadically skipped.
```
7b37db3e

May 14, 2018

sql: replace KeyInfo with key_def · 501c6e28

Kirill Yukhin authored 6 years ago

KeyInfo is a legacy struct which was heavily used in the SQL
front-end. This patch replaces all its usages with Tarantool's
native key description structure, key_def.

This change is a part of the data dictionary integration effort:
Tarantool indexes aren't aware of KeyInfo, which is why it was
removed.

Legacy KeyInfo memory handling was based on reference counting and is
now replaced with memory duplication. This state of affairs should
be improved in the future.

Part of #3235

501c6e28

sql: introduce sort order to key_part/key_part_def · 8c508e45

Kirill Yukhin authored 6 years ago

Legacy SQL DD structs contained sort_order, defined per
index column. During integration, those structs are to be
removed. So, introduce a new field in Tarantool's part entity.
This field holds the sort order of the given part in the given index.
Tarantool ignores this field everywhere except for
some nested queries in SQL.

The patch also replaces usages of SQL's stored sort order with this new
field.

Part of #3235

8c508e45

sql: fix iproto response compatibility · 9801b346

Vladislav Shpilevoy authored 6 years ago

Before the async netbox merge, netbox.execute returned tuples instead
of Lua tables. The merge accidentally reverted this.
Return tuples again.

9801b346

May 11, 2018

Merge remote-tracking branch 'origin/1.10' into 2.0 · e64f1e39
Konstantin Osipov authored 6 years ago

e64f1e39
alter: fix broken compile · cf94e753
Konstantin Osipov authored 6 years ago

cf94e753
alter: follow up on the previous patch, dereference a deleted collation · 2043a266
Konstantin Osipov authored 6 years ago
```
Fix a regression introduced in the previous patch.
Dereference a deleted collation on commit, otherwise it leaks.
```
2043a266

collation: reference collation objects · a9069799

Vladislav Shpilevoy authored 6 years ago

A collation is completely defined by a signature consisting of
collation properties: locale, ICU strength, normalization mode
etc.

When two collations have the same signature, they work identically.
Create a second collation cache, in which collations are stored
by signature instead of id. To reference the same collation in the
signature cache multiple times from id/name cache, introduce
collation reference counting.
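
The two caches and the reference counting can be sketched in Python as follows. This is an illustration only, not the Tarantool implementation: the class, its attribute names and the tuple used as a signature are hypothetical, and re-adding an existing id is not handled.

```python
class CollationCache:
    """Toy model: an id cache on top of a signature cache with refcounts."""

    def __init__(self):
        self.by_id = {}         # collation id -> signature
        self.by_signature = {}  # signature -> [collation object, refcount]

    def add(self, coll_id, signature):
        entry = self.by_signature.get(signature)
        if entry is None:
            # First collation with this signature: create the shared object.
            entry = [{"signature": signature}, 0]
            self.by_signature[signature] = entry
        entry[1] += 1  # one more id refers to the shared object
        self.by_id[coll_id] = signature
        return entry[0]

    def remove(self, coll_id):
        signature = self.by_id.pop(coll_id)
        entry = self.by_signature[signature]
        entry[1] -= 1
        if entry[1] == 0:
            # Last reference is gone, so the shared object can be dropped.
            del self.by_signature[signature]
```

Two ids registered with the same signature receive the same object, and that object is dropped only after both ids are removed.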

a9069799

alter: fix an assert failure when altering a collation · f51ac0fa

Vladislav Shpilevoy authored 6 years ago

Forbid updates in the _collation system space. An update leads to an
assertion failure in the collation cache, since the cache has no support
for the replace operation.
Refactor collation deletion and insertion in alter.cc.

Besides, a collation update potentially changes the order of data
in the index, yet it did not trigger a rebuild of all indexes.

f51ac0fa

sql: fix SAVEPOINT RELEASE statement · f3ce5477

Nikita Pettik authored 6 years ago

Before this patch, the SAVEPOINT RELEASE statement always raised an error
due to obsolete SQLite code. That code has been removed, and
SAVEPOINT RELEASE works as desired.

Closes #3379

f3ce5477

sql: allow SAVEPOINT statement outside transaction · 0ab03594

Nikita Pettik authored 6 years ago

Before this patch, using the SAVEPOINT statement outside a transaction or
inside a transaction started in Lua led to an assertion failure.
Now the failing assert is replaced with checks of the transaction status.

Closes #3313

0ab03594

sql: allow transitive Lua <-> SQL transactions · 27aaba6a

Nikita Pettik authored 6 years ago

This patch makes it possible to start a transaction in Lua and continue
operations in SQL, and vice versa. Previously, such transactions
resulted in an assertion failure. To support them, deferred foreign key
constraints must be held as attributes of the transaction, not of a
particular VDBE. Thus, deferred foreign key counters have been completely
removed from VDBE and transferred to the sql_txn struct. In turn, if there
is at least one deferred foreign key violation, an error is raised along
with a rollback - that is what ANSI SQL says. Note that in SQLite the
rollback doesn't occur: the transaction remains open until an explicit
rollback or until all FK violations are resolved.
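
A minimal Python sketch of a per-transaction counter follows. It assumes the check happens at commit time; the class, the attribute `fk_deferred_count` and the helper `run_statement` are hypothetical and do not mirror the real sql_txn struct.

```python
class DeferredFkError(Exception):
    pass

class Txn:
    """Toy transaction that owns the deferred FK violation counter."""

    def __init__(self):
        self.fk_deferred_count = 0
        self.state = "open"

    def commit(self):
        if self.fk_deferred_count > 0:
            # Unresolved violations: roll back and report an error.
            self.state = "rolled back"
            raise DeferredFkError("unresolved deferred foreign key violations")
        self.state = "committed"

def run_statement(txn, new_violations=0, resolved=0):
    # Any statement, whatever started the transaction, updates the
    # counter stored on the transaction itself.
    txn.fk_deferred_count += new_violations - resolved
```

Because the counter lives on the transaction rather than on one statement's virtual machine, a violation introduced by one statement can be resolved by a later one before commit.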

Also, 'PRAGMA defer_foreign_keys' has been slightly changed: it is no
longer turned off automatically after the transaction's rollback or
commit. It can be turned off only by an explicit PRAGMA statement. The
reason is that a PRAGMA statement executes in auto-commit mode, so it
ends with a COMMIT. Hence, the flag would be turned off right after being
turned on (outside the transaction).

Closes #3237

27aaba6a

sql: remove OP_AutoCommit opcode · 325e2a1d

Nikita Pettik authored 6 years ago

In SQLite, the OP_AutoCommit opcode is used to perform the transaction
operations BEGIN, ROLLBACK and COMMIT by switching the auto-commit flag
in VDBE. For Tarantool this is confusing, since the auto-commit modes
differ: 'INSERT ... VALUES (1), (2), (3)' is one indivisible operation
for SQLite, and three operations in real auto-commit mode for Tarantool.
To simulate SQLite auto-commit mode, these three insertions are wrapped
into one SEPARATE transaction, which is, in fact, not real auto-commit
mode.
So, let's add separate explicit opcodes for BEGIN, ROLLBACK and COMMIT
as the user's transaction operations. Auto-commit mode is set once at
VDBE creation and can be changed only by the implicit opcode
OP_TTransaction, which is added to each DML statement, or by the 'BEGIN'
SQL statement.

325e2a1d

alter: fix sloppy code in before_replace trigger · c6ee993c

Konstantin Osipov authored 6 years ago

Lock the latch after allocating on_commit/on_rollback triggers
to avoid being stuck with a locked latch in case of OOM error.

Add a comment.

c6ee993c

sql: Generate rowid by counter · d02286c2

AKhatskevich authored 7 years ago

The function `OP_NextIdEphemeral` does not produce unique ids.
The new way to get a rowid is to create a sequential counter.
One of the registers is initialized with int64_t = 0 and incremented
after each insert. There are similar cases in the `select.c` file;
however, they are not as straightforward and will be fixed in future
commits.
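
The counter approach can be shown with a short Python sketch. The dict standing in for the ephemeral table and the function name are illustrative assumptions; the real change operates on VDBE registers.

```python
def fill_ephemeral(rows):
    """Store rows in a toy ephemeral table keyed by a sequential rowid."""
    ephemeral = {}
    rowid = 0  # the "register" starts at 0 ...
    for row in rows:
        ephemeral[rowid] = row
        rowid += 1  # ... and is incremented after each insert
    return ephemeral
```

Since the counter never repeats a value, identical rows still receive distinct rowids.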

On error, `tarantoolSqlite3EphemeralGetMaxId` created a
diag string and silently continued working. It was sheer luck that
the invalid output of the function did not lead to crashes.

After the fix, it was found that some opcodes in `select.c` were using
wrong registers. These were fixed too, but they still have bugs. For
more information see #3297.

Also, fixed a memory leak.

Related to #3297

d02286c2

sql: fixes op-codes' generation for skip-scan · 533ce8d7

SudoBobo authored 6 years ago

Currently the skip-scan optimization works
the way described here:
(https://sqlite.org/optoverview.html#skipscan)

To understand the problem being solved, consider an example
with a skip-scan query and a table created like
CREATE TABLE t(a, b, c, PRIMARY KEY(c, b, a));

Before the patch: the op-code implementation of skip-scan
relied on the following behaviour of the 'Column' op-code:
If P1 is the number of a cursor pointing to the table,
then P2 is the column number in 'native' order
(0 for a, 1 for b, etc.)
If P1 is the number of a cursor pointing to the index,
then P2 is the column number in 'index' order
(0 for c, 1 for b, etc.)
But currently our 'Column' op-code always considers
P2 to be the column number in 'native' order.

With the patch:
P2 is always set in 'Column' as a column number
in 'native' order.
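
For the example table, the difference between the two numberings can be sketched in Python. The lists and the helper `index_to_native` are illustrative names only and say nothing about how the patch computes P2 internally.

```python
# Columns of CREATE TABLE t(a, b, c, PRIMARY KEY(c, b, a)).
NATIVE_COLUMNS = ["a", "b", "c"]  # 'native' order: a=0, b=1, c=2
PRIMARY_KEY = ["c", "b", "a"]     # 'index' order:  c=0, b=1, a=2

def index_to_native(index_pos):
    """Translate a position in the index into a native column number."""
    return NATIVE_COLUMNS.index(PRIMARY_KEY[index_pos])
```

Position 0 of the index is column c, which has native number 2, so code that emits 'Column' for the first index part must pass 2 rather than 0 as P2.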

Closes #3350, #2859

533ce8d7

May 08, 2018

Merge branch '1.10' into 2.0 · 34237693
Vladislav Shpilevoy authored 6 years ago

34237693

test: fix box/net.box.test · 3ad7f331

Vladislav Shpilevoy authored 6 years ago

box/net.box.test creates a pair of long-poll requests which are
never finalized. If the next test happens to be
box/net_msg_max.test.lua, it hangs, since it expects that
nobody occupies the tx fiber pool.

3ad7f331

Merge branch '1.10' into 2.0 · df710479
Vladislav Shpilevoy authored 6 years ago

df710479

netbox: allow async request sending on 'fetch_schema' · 7e4e4687

Vladislav Shpilevoy authored 6 years ago

When a netbox state machine is in the 'fetch_schema' state, an async
request must not
* raise an error, because it is not the 'error' state;
* wait for the 'active' state, because an async request must not wait
  for anything.

Follow up #3107

7e4e4687

netbox: remove schema_version from requests · 796de67c

Vladislav Shpilevoy authored 6 years ago

Schema_version was used in netbox to update the local box-like
schema. The box-like schema makes it possible to access spaces and
indexes via a connection object.

It was updated each time a response was received from a server
with a schema version not equal to the local value.

But there was no reason for a schema version to be sent in a
request. It sometimes leads to an ER_WRONG_SCHEMA_VERSION error,
but on this error netbox just resends the same request. The
same behaviour can be achieved by simply not sending any schema
version to a server.

Remove schema_version from requests, and just track schema version
changes in responses.

Part of #3351
Part of #3333
Follow up #3107

796de67c

netbox: introduce fiber-async API · 0f686829

Vladislav Shpilevoy authored 6 years ago

Now any netbox call blocks the calling fiber until a result is read
from a socket or the timeout expires. To use it asynchronously it is
necessary to create a fiber per request. Sometimes this is
unwanted - for example, if RPS is very high (about
100k) and latency is about 1 second, or when it is necessary
to send multiple requests in parallel and then collect the responses
(map-reduce).

The patch introduces a new option for all netbox requests:
is_async. With this option any called netbox method returns
immediately (but still yields for a moment) a 'future' object.

With a future object a user can check whether the request is finalized,
get a result or an error, wait with a timeout, or discard the response.

Example of is_async usage:
future = conn:call(func, {params}, {..., is_async = true})
-- Do some work ...
if not future:is_ready() then
    result, err = future:wait_result(timeout)
end
-- Or:
result, err = future:result()

future:result() and future:wait_result() return either an error or
a response in the same format as the sync versions of the called
methods.

Part of #3107

0f686829

netbox: extend codec with 'decode' methods · 1766db6b

Vladislav Shpilevoy authored 6 years ago

Netbox has a table 'method_codec' that is used to encode a
request by a method name. But a response is decoded outside the codec.
This leads to
1) decoding into Lua tables before decoding into tuples where
needed - this is double decoding and produces a lot of garbage;
2) each method containing hacks like one_tuple(), or a single tuple
check.

These things can not be fixed without a real codec in place of
an encoder-only one.

Also, a global table with decoders is needed for #3107, where
a request can be sent asynchronously with no fiber blocking. An async
response, when received, no longer has a call context - it
has only the method name.
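
The idea of a codec table keyed by method name can be sketched in Python. Everything here is hypothetical: the `METHOD_CODEC` layout, the two methods and the list-of-rows response body are stand-ins chosen to show how a decoder is selected when only the method name is known.

```python
def decode_select(body):
    # Decode straight into tuples, without an intermediate table form.
    return [tuple(row) for row in body]

def decode_get(body):
    # A 'get' yields one tuple or nothing; no per-call hack is needed.
    return tuple(body[0]) if body else None

# One entry per method, holding both directions of the codec.
METHOD_CODEC = {
    "select": {
        "encode": lambda args: {"method": "select", "args": args},
        "decode": decode_select,
    },
    "get": {
        "encode": lambda args: {"method": "get", "args": args},
        "decode": decode_get,
    },
}

def on_response(method, body):
    """Decode an async response knowing nothing but the method name."""
    return METHOD_CODEC[method]["decode"](body)
```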

Needed for #3107

1766db6b

lua: allow to create an error object with no throw · f3ca94a1

Vladislav Shpilevoy authored 6 years ago

It is needed to return error via 'nil, error_object' notation,
and to store an error object to return it later.

Closes #3031

f3ca94a1

lua: remove box.error.raise · 5cab7e50

Vladislav Shpilevoy authored 6 years ago

It did not work because raise is implemented as an __index metatable
member, and error() is a __call metatable member. The second one
takes an additional implicit argument - self.

And it is not documented, so it can be removed.

5cab7e50

schema: expose space_mt and index_mt on box.schema table · 2187d418

Vladislav Shpilevoy authored 7 years ago

This commit allows userland to extend the space and index
metatables with their own functions or even metamethods. Reducing
barriers for this kind of experimentation is vital for user
contribution toward the improvement of Tarantool's API.

There are 4 metatables available for extending:
box.schema.space_mt - metatable of all spaces;
box.schema.index_mt - base metatable of all indexes - replicated
                      into the vinyl and memtx. See below how.
box.schema.vinyl_index_mt - metatable of all vinyl indexes;
box.schema.memtx_index_mt - metatable of all memtx indexes.

On the other hand, local space/index metatables can still be
extended individually to preserve compatibility with existing
modules. Normally a space/index metatable is just a proxy for a
global mt. When a user attempts to extend space or index
methods via a local space/index metatable instead of the
box.schema mt, the local metatable is transformed: its __index
metamethod first looks up in self, and only then in
the global mt.
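
The "self first, then global" lookup can be modelled in Python with `collections.ChainMap`. This is only an analogy for the Lua metatables: `space_mt` here is a plain dict, and `make_space` is a hypothetical helper.

```python
from collections import ChainMap

# Hypothetical stand-in for the global box.schema.space_mt.
space_mt = {"len": lambda space: len(space["tuples"])}

def make_space(tuples):
    # Lookups try the space's own (initially empty) table first and
    # then fall back to the shared one. Writes go to the local table.
    return {"tuples": tuples, "mt": ChainMap({}, space_mt)}

a = make_space([1, 2, 3])
b = make_space([4])

# Extending the global table affects every space.
space_mt["first"] = lambda space: space["tuples"][0]

# Extending one space's local table affects only that space.
a["mt"]["first"] = lambda space: "overridden"
```

After these assignments, `a` resolves `first` to its local override while `b` still uses the shared definition, and both keep seeing the shared `len`.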

Closes #3204

2187d418

schema: inherit vinyl/memtx_index_mt from base index mt · aea5d9d4

Vladislav Shpilevoy authored 7 years ago

Now space.bless() in Lua chooses the correct index
metatable depending on the space engine. Vinyl index methods
must not use FFI since a yield breaks it. Instead of choosing
the correct methods one by one in space.bless, create them only
once on start, and then just copy the needed table.

aea5d9d4

vinyl: remove vy_apply_upsert_ops · e650ae9d

Vladislav Shpilevoy authored 6 years ago

Function vy_apply_upsert_opts originally appeared in commit
5627e53b, where it was a
refactored version of a sophia upsertion. But since vy_stmt was
introduced, vinyl_apply_upsert_ops works just like an ordinary
tuple_upsert_execute. Remove this useless wrapper.

e650ae9d

schema: move space_mt and index_mt definition out of space bless · a57b740c

Vladislav Shpilevoy authored 7 years ago

Space_mt and index_mt are created individually for each space
inside a giant space.bless function. This makes it impossible to
implement #3204: exposing space and index metatables to a user,
because for this they must be global and shared between all
spaces and indexes.

Let's move their definition out of the space.bless function, and
duplicate them inside it.

Needed #3204

a57b740c

lua: update index Lua objects on alter, do not replace · 51b2badf

Vladislav Shpilevoy authored 6 years ago

Before the patch, on any space alter Tarantool recreated the space's
indexes, leaving the old index objects invalid. To protect a user
from errors when accessing fields of an outdated index object, let's
update the object in place instead of creating a new one.

Closes #3285

51b2badf

Merge branch '1.10' into 2.0 · 4bb213bf
Konstantin Osipov authored 6 years ago

4bb213bf

iproto: allow to configure IPROTO_MSG_MAX · b66d0de7

Vladislav Shpilevoy authored 6 years ago

IPROTO_MSG_MAX is a constant that restricts the number of requests in
flight. It prevents producing too many fibers in the TX thread,
which would lead to too much overhead from fiber switching and
stack storage.

But some users have powerful hardware for which the Tarantool
IPROTO_MSG_MAX constant is too low. The patch exposes it as
a runtime configuration parameter.

'net_msg_max' is its name. If a user sees that the IProto thread
is stuck due to too many requests, they can change 'net_msg_max'
at runtime, and the IProto thread immediately starts processing
pending requests.

'net_msg_max' can be decreased, but obviously it can not stop
requests that are already running, so if the request count in the
IProto thread is > the new 'net_msg_max' value, it takes some time
until enough requests finish.

`net_msg_max` automatically increases fiber pool size, when
needed.
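
The effect of changing the limit at runtime can be modelled with a small Python sketch. It is a toy counter model, not the IProto implementation: `RequestLimiter` and its methods are hypothetical, and fibers, threads and the fiber pool are left out entirely.

```python
class RequestLimiter:
    """Toy model of a runtime-adjustable in-flight request limit."""

    def __init__(self, msg_max):
        self.msg_max = msg_max
        self.in_flight = 0
        self.pending = 0

    def _admit(self):
        # Start pending requests while there is room under the limit.
        while self.pending > 0 and self.in_flight < self.msg_max:
            self.pending -= 1
            self.in_flight += 1

    def submit(self):
        self.pending += 1
        self._admit()

    def finish(self):
        self.in_flight -= 1
        self._admit()

    def set_msg_max(self, value):
        # Raising the limit admits pending requests at once. Lowering
        # it never stops requests that are already running.
        self.msg_max = value
        self._admit()
```

In this model, lowering the limit below the current in-flight count simply blocks new admissions until enough running requests have finished.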

Closes #3320

b66d0de7

iproto: fix error with unstoppable batching · 01bfa59b

Vladislav Shpilevoy authored 6 years ago

An IProto connection stops reading input once the request limit is
reached. But when multiple requests arrive in a batch, IProto does not
check the limit, so it can be violated.

Let's check the limit during batch parsing after each message too,
not only once before parsing.
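
Checking the limit per message rather than per batch can be sketched in Python. The function and its arguments are hypothetical; real batches are byte streams, not lists.

```python
def parse_batch(messages, in_flight, limit):
    """Accept messages from a batch until the request limit is reached.

    Returns (accepted, postponed). The limit is re-checked before every
    message, not only once before the batch.
    """
    accepted = []
    for i, msg in enumerate(messages):
        if in_flight + len(accepted) >= limit:
            return accepted, messages[i:]
        accepted.append(msg)
    return accepted, []
```

With one request already in flight and a limit of 3, a batch of three messages yields two accepted messages and one postponed, so the limit is never exceeded.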

01bfa59b

replication: Caching replication_skip_conflict in C · 0ecb8eba

Konstantin Belyavskiy authored 6 years ago

Improvement for #3270 (replication_skip_conflict).
Create a global variable that caches the Lua variable, to avoid checking
the configuration option each time, which could be a bottleneck in case
of massive conflicts.

For #3270

0ecb8eba

Merge branch '1.9' into 1.10 · 7393236f
Konstantin Osipov authored 6 years ago

7393236f

socket: Fix socket test · 2b973c05

Ilya Markov authored 6 years ago

When app-tap/console.test was launched sequentially, tests failed with
"User exists" and binding errors.

Make sockets path relative.
Add users cleanup.

Relates #3168

2b973c05