- Mar 29, 2018
-
-
Vladimir Davydov authored
The new method is called if index creation failed, either due to a WAL write error or a build error. It will be used by Vinyl to purge the prepared LSM tree from the vylog.
-
Vladislav Shpilevoy authored
-
Konstantin Osipov authored
-
Ilya Markov authored
The bug was that, when logging, we passed to the write function a number of bytes that could exceed the size of the buffer. This could happen because the log string is formatted with vsnprintf, which returns the number of bytes that would have been written to the buffer, not the number actually written. Fix this by limiting the number of bytes passed to the write function. Closes #3248
-
Konstantin Osipov authored
-
Vladimir Davydov authored
To facilitate performance analysis, let's report not only the 99th percentile, but also the 50th, 75th, 90th, and 95th. Also, let's add microsecond-granularity buckets to the latency histogram. Closes #3207
-
Ilya Markov authored
* Refactor tests.
* Add ev_async and fiber_cond for thread-safe log_rotate usage.
Follow-up #3015
-
Ilya Markov authored
Fix a race condition in the log_rotate test. The test opened a file that must be created by log_rotate and read from it. But since log_rotate is executed in a separate thread, the file may not be created yet, or the log line may not be written yet, by the time the test opens it. Fix this by waiting for the file to be created and the line to become readable.
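A minimal sketch of the waiting approach in Lua (not the actual test code; `wait_log_line` and `log_path` are illustrative names):

```lua
local fio = require('fio')
local fiber = require('fiber')

-- Poll until the rotated log file exists and contains a complete line,
-- since log_rotate runs in another thread.
local function wait_log_line(log_path, timeout)
    local deadline = fiber.clock() + (timeout or 5)
    while fiber.clock() < deadline do
        local f = fio.open(log_path, {'O_RDONLY'})
        if f ~= nil then
            local data = f:read(4096)
            f:close()
            -- Wait not only for the file, but for a full line in it.
            if data ~= nil and data:find('\n') then
                return data:match('([^\n]*)\n')
            end
        end
        fiber.sleep(0.01)
    end
    return nil
end
```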
-
Vladislav Shpilevoy authored
Print a warning about that. After a while, console support will be deleted from netbox.
-
Vladislav Shpilevoy authored
Netbox console support complicates both netbox and console. Let's use sockets directly for the text protocol. Part of #2677
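A minimal sketch of what "sockets directly" means here, assuming the standard 128-byte console greeting and YAML replies terminated by "\n...\n" (host, port, and the helper name are placeholders):

```lua
local socket = require('socket')

local function console_eval(host, port, expr)
    local sock = socket.tcp_connect(host, port, 5)
    if sock == nil then
        return nil, 'connection failed'
    end
    sock:read(128)                      -- skip the greeting
    sock:write(expr .. '\n')            -- one command per line
    local reply = sock:read('\n...\n')  -- YAML document terminator
    sock:close()
    return reply
end
```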
-
Vladislav Shpilevoy authored
It is needed to create a binary console connection when a socket has already been created and a greeting has already been read and decoded.
-
Vladimir Davydov authored
As was pointed out earlier, the bloom spectrum concept is rather dubious: its overhead for a reasonable false positive rate is about 10 bytes per record, while storing all hashes in an array takes only 4 bytes per record. So one can stash all hashes and count records first, then create the optimal bloom filter and add all hashes to it.
-
Vladimir Davydov authored
When we check if a multi-part key is hashed in a bloom filter, we check all its sub keys as well, so the resulting false positive rate equals the product of the false positive rates of the bloom filters created for each sub key. The false positive rate of a bloom filter is given by the formula

  f = (1 - exp(-k * n / m)) ^ k

where m is the number of bits in the bloom filter, k is the number of hash functions, and n is the number of elements hashed in the filter. By varying n, we can estimate the false positive rate of an existing bloom filter when it is used for a greater number of elements; in other words, we can estimate the false positive rate of a bloom filter created for checking sub keys when it is used for checking full keys. Knowing this, we can adjust the target false positive rate of a bloom filter used for checking keys of a particular length based on the false positive rates of the bloom filters used for checking its sub keys. This reduces the number of hash functions required to conform to the configured false positive rate and hence the bloom filter size. Follow-up #3177
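An illustration of the estimate in plain Lua (not the Vinyl C code; the sizes and counts below are hypothetical):

```lua
-- f = (1 - exp(-k * n / m)) ^ k for a single filter; a full-key lookup
-- that also checks a sub-key filter has the product of the two rates.
local function bloom_fpr(m, k, n)
    return (1 - math.exp(-k * n / m)) ^ k
end

-- A sub-key filter sized for 1e6 distinct sub keys, re-estimated for the
-- 4e6 full keys it will actually be probed with:
local sub_key_fpr  = bloom_fpr(9585059, 7, 4000000)
local full_key_fpr = bloom_fpr(9585059, 7, 1000000)
print(sub_key_fpr * full_key_fpr)  -- combined rate for a full-key lookup
```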
-
Vladimir Davydov authored
Currently, we store and use a bloom filter only for full-key lookups. However, there are use cases where we can also benefit from maintaining bloom filters for partial keys - see #3177 for an example. So this patch replaces the current full-key bloom filter with a multipart one, which is basically a set of bloom filters, one per partial key. Old bloom filters stored on disk are recovered as is, so users will see the benefit of this patch only after a major compaction takes place.

When a key or tuple is checked against a multipart bloom filter, we check all its partial keys to reduce the false positive rate. Nevertheless, there's no size optimization as of now. E.g. even if the cardinality of a partial key is the same as that of the full key, we will still store two full-sized bloom filters, although we could probably save some space in this case by assuming that checking against the bloom filter corresponding to a partial key reduces the false positive rate of full-key lookups. This is addressed later in the series.

Before this patch we used a bloom spectrum object to construct a bloom filter. A bloom spectrum is basically a set of bloom filters ranging in size. The point of using a spectrum is that we don't know what the run size will be while we are writing it, so we create 10 bloom filters and choose the best of them after we are done. With the default bloom fpr of 0.05 this is a 10 byte overhead per record, which seems to be OK. However, if we try to optimize other parameters as well, e.g. the number of hash functions, the cost of a spectrum becomes prohibitive. The funny thing is that a tuple hash is only 4 bytes long, which means that if we stored all hashes in an array and built a bloom filter after we'd written a run, we would reduce the memory footprint by more than half! And that would only slightly increase the run write time, as scanning a memory map of hashes and constructing a bloom filter is cheap in comparison to merging runs.

Putting it all together: in this patch we stop using the bloom spectrum; instead we stash all hashes in a new bloom builder object and use them to build a perfect bloom filter after the run has been written and we know the cardinality of each partial key. Closes #3177
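A conceptual sketch of the bloom builder idea in Lua (the real implementation is in C; names, structure, and the omitted bit manipulation are purely illustrative):

```lua
-- Stash hashes while the run is being written, then size the filter once
-- the record count is known, instead of keeping 10 differently-sized filters.
local builder = { hashes = {} }

function builder:add(hash)          -- one 4-byte hash per record
    table.insert(self.hashes, hash)
end

function builder:make(fpr)
    local n = #self.hashes
    -- Standard optimal sizing: m bits and k hash functions for a target fpr.
    local m = math.ceil(-n * math.log(fpr) / math.log(2) ^ 2)
    local k = math.max(1, math.floor(m / n * math.log(2) + 0.5))
    -- Hash insertion itself is omitted; the point is that m and k can be
    -- chosen optimally only after n is known.
    return { m = m, k = k, bits = {} }
end
```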
-
Vladimir Davydov authored
Suggested by @kostja
-
Vladimir Davydov authored
There's absolutely no point in using mmap() instead of malloc() for bitmap allocation - malloc() will fall back on mmap() anyway, provided the allocation is large enough. Note about the unit test: since we no longer round the bloom filter size up to a multiple of the page size, we have to use a more sophisticated hash function for the test to pass.
-
Vladimir Davydov authored
We filter out bloom filters because they depend on the ICU version and hence the test output may vary from one platform to another (see commit 0a37ccad "Filter out bloom_filter in vinyl/layout.test.lua"). However, using test_run for this is unreliable, because a bloom string can contain newline characters and hence be split across multiple lines in the console output, in which case the filter won't work. Fix this by filtering bloom_filter manually.
-
Vladislav Shpilevoy authored
-
Kirill Shcherbatov authored
Netbox does not need nullability or collation info, but some customers do. Let's fill index parts with these fields. Fixes #3256
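A hedged example of what the change exposes (field names are taken from the commit message; space, index, and address are placeholders):

```lua
local netbox = require('net.box')
local conn = netbox.connect('localhost:3301')

-- Before the patch a remote index part carried only the basics (type,
-- field number); now it should also carry nullability and collation info.
local part = conn.space.test.index.pk.parts[1]
print(part.type, part.fieldno, part.is_nullable, part.collation)
```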
-
- Mar 28, 2018
-
-
Kirill Shcherbatov authored
Now the tuple:tomap() method returns a map with both field names and field indexes mapped to the same field values. This is done to 1) allow the tomap() result to still be accessed like a tuple, by index; 2) allow access to unnamed fields. But it is not useful when the resulting map must be saved somewhere, for example as JSON, where all keys must be strings. So allow getting the names-only behaviour with tuple:tomap({names_only = true}). Fixes #3280
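For example (space and field names are illustrative):

```lua
local s = box.schema.create_space('test', {
    format = {{name = 'id', type = 'unsigned'},
              {name = 'value', type = 'string'}}
})
s:create_index('pk')
local t = s:replace{1, 'hello'}

t:tomap()
-- {1, 'hello', id = 1, value = 'hello'}  -- both indexes and names

t:tomap({names_only = true})
-- {id = 1, value = 'hello'}              -- keys safe to encode as JSON
```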
-
- Mar 27, 2018
-
-
Konstantin Osipov authored
-
Georgy Kirichenko authored
* session_run_on_disconnect_triggers is called only if there are corresponding triggers, so move session_storage_cleanup to session_destroy.
* Fix the session storage cleanup path: use "box.session.aggregate_storage[sid]" instead of "session.aggregate_storage[sid]" (which was wrong).
Fixes #3279
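Context in user-level terms (a sketch; the trigger bodies are hypothetical): per-session data lives in box.session.storage and must be cleaned up when the session ends, whether or not the user registered a disconnect trigger.

```lua
box.session.on_connect(function()
    box.session.storage.user_data = {}   -- hypothetical per-session state
end)

box.session.on_disconnect(function()
    -- This runs only if a disconnect trigger is registered, which is why
    -- the storage cleanup was moved to session_destroy in this patch.
    print('session ' .. box.session.id() .. ' disconnected')
end)
```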
-
Vladimir Davydov authored
Vinyl assigns a unique id to each index so that it can be identified in vylog, see vy_index->id, but outside Vinyl the index id means something completely different - it's the ordinal number of the index in a space. This creates a lot of confusion. To resolve this, let's rename vy_index to vy_lsm and refer to Vinyl indexes as LSM trees in comments, so that Vinyl's index_id turns into lsm_id. Also, rename vy_log_record's index_def_id and space_def_id back to the conventional index_id and space_id, as they don't conflict with the Vinyl index id anymore (which is now lsm_id). Thanks to @Gerold103 for suggesting the new name.
-
Vladimir Davydov authored
Currently, DDL isn't properly rolled back if a WAL write fails, leaving the space in an inconsistent state - see alter_space_do(), which calls space_commit_alter() right in the on_replace trigger. It isn't clear how to fix this properly right now. Let's disable the corresponding test case until we figure out how to resolve this problem, so as not to stall further development. See #3289
-
- Mar 26, 2018
-
-
Vladimir Davydov authored
Fixes commit 25fa5e21 ("vinyl: do not use index lsn to identify indexes in vylog").
-
- Mar 22, 2018
-
-
Vladimir Davydov authored
Truncation of a space is equivalent to recreation of all space indexes with the same definition. The reason why we use a special system space (_truncate) to trigger space truncation is that we don't have transactional DDL, while space truncation has to be done atomically. However, apart from the new system space, the implementation of truncation entailed a new vylog record (VY_LOG_TRUNCATE_INDEX) and quite a few lines of code to handle it. So why couldn't we just invoke ALTER to recreate all indexes?

To answer this question, one needs to recall that back then Vinyl used the LSN to identify indexes in vylog. As a result, we couldn't recreate more than one index in one operation - if we did, they would all have the same LSN and hence wouldn't be distinguishable in vylog. So we had to introduce a special vylog operation (VY_LOG_TRUNCATE_INDEX) that bumps the truncation counter of an index instead of just dropping and recreating it. We also had to introduce a pair of new virtual space methods, prepare_truncate and commit_truncate, so that we could write this new command to vylog in Vinyl. Putting it all together, it becomes obvious why we couldn't reuse the ALTER code for space truncation.

Fortunately, things have changed since then. Now vylog identifies indexes by space_id/index_id. That means we can simplify the space truncation implementation a great deal by
- reusing alter_space_do() for space truncation,
- dropping space_vtab::prepare_truncate and commit_truncate,
- removing truncate_count from space, index, and vylog.
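From the user's point of view truncation is still driven through the _truncate system space; this patch only changes the internals. A sketch, with a placeholder space name and an example counter value:

```lua
box.space.test:truncate()
-- is recorded as a bump of that space's truncation counter:
box.space._truncate:get{box.space.test.id}
-- e.g. [512, 1]  -- {space id, truncation count}
```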
-
Vladimir Davydov authored
vy_log_record::index_lsn serves two purposes. First, it is used as a unique object identifier in vylog (it is similar to range_id or slice_id in this regard). Second, it is the LSN of the WAL row that committed the index, and we use it to look up the appropriate index incarnation during WAL recovery. Mixing these two functions is a bad design choice, because as a result we can't create two vinyl indexes in one WAL row, which may happen on ALTER of a primary key. Besides, we can't create an index object before the WAL write, which is also needed for ALTER, because at that time there's no LSN assigned to the index yet. So we need to split this variable in two: index_id and commit_lsn. To be backward compatible, we rename index_lsn to index_id everywhere in vylog and add a new record field, commit_lsn; if commit_lsn is missing for a create_index record, then this must be a record left from an old vylog, so we initialize it with index_id (former index_lsn) - see vy_log_record_decode().
-
Vladimir Davydov authored
I'm planning to assign a unique identifier to each vinyl index so that it can be used instead of the LSN for identifying indexes in vylog. In order not to confuse it with the index ordinal number, let's rename vy_log_record::index_id to index_def_id and, for consistency, space_id to space_def_id.
-
Vladimir Davydov authored
Throughout Vinyl we use the name 'id' for members representing unique object identifiers: vy_slice::id, vy_run::id, vy_range::id. There's one exception though: vy_index::id is the ordinal number of the index in a space. This is confusing. Besides, I'm planning to assign a unique id to each vinyl index so that I can look them up in vylog, and I'd like to call the new member 'id' for consistency. So let's rename vy_index::id to index_id.
-
Konstantin Osipov authored
-
Vladimir Davydov authored
The vy_recovery structure was initially designed to be opaque to the outside world - to iterate over the objects stored in it, one is supposed to use vy_recovery_iterate(), which invokes the given callback for each recovered object, encoded as the vy_log_record that was used to create it. Such a design gets extremely difficult to use when we need to preserve some context between callback invocations - e.g. see how ugly the backup and garbage collection procedures look. And it is going to become even more obfuscated once we introduce the notion of incomplete indexes (indexes that are currently being built by ALTER). So let's refactor the vylog recovery procedure: make the vy_recovery structure transparent and allow iterating over the internal representations of recovered objects directly, without callbacks.
-
Roman Proskin authored
* The feedback daemon sends information about the instance to the specified host.
* Add new options to box.cfg:
  - feedback_enabled - switch the daemon on/off, default = true.
  - feedback_host - host to which feedback is sent, default = "https://feedback.tarantool.io".
  - feedback_interval - time interval in seconds between feedback reports, default = 3600.
* Add the ability to save the feedback report to a file in JSON format with box.feedback.save.
Closes #2762
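A usage sketch (option values are the defaults listed above; the file name passed to box.feedback.save is an assumed example):

```lua
box.cfg{
    feedback_enabled  = true,
    feedback_host     = 'https://feedback.tarantool.io',
    feedback_interval = 3600,
}
-- Dump the same report to a local file in JSON format:
box.feedback.save('feedback.json')
```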
-
Ilya authored
Introduce new fiber functions:
* fiber.new() - creates a fiber and schedules it into the ready queue, but doesn't run it and doesn't yield. The signature is the same as fiber.create()'s.
* fiber.join() - waits until the specified fiber finishes its execution and returns its result or error. Applicable only to joinable fibers.
* fiber.set_joinable() - sets the fiber's joinable flag.
Closes #1397
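Typical usage of the new API (the worker function is illustrative):

```lua
local fiber = require('fiber')

local function work(a, b)
    return a + b
end

-- fiber.new() schedules the fiber but does not run it immediately and
-- does not yield, unlike fiber.create().
local f = fiber.new(work, 1, 2)
f:set_joinable(true)          -- only joinable fibers can be joined

local ok, result = f:join()   -- waits for the fiber, returns its result
print(ok, result)             -- true  3
```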
-
Konstantin Osipov authored
-
Konstantin Osipov authored
-
Konstantin Osipov authored
-
Alec Larson authored
Empty strings should be ignored rather than throwing an error. Passing only empty strings (or nothing) to `pathjoin` returns '.', which means the current directory. Every path part passed to `pathjoin` is now converted to a string. The `gsub('/+', '/')` call already does what the removed code did, so avoid the unnecessary work. Simply check whether the result equals '/' before removing a trailing '/'; the previous check did extra work for no gain.
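The resulting behaviour, illustrated under the assumption that this is Tarantool's fio.pathjoin and based on the description above:

```lua
local fio = require('fio')

fio.pathjoin('a', '', 'b')   -- 'a/b'  : empty strings are ignored
fio.pathjoin('')             -- '.'    : only empty parts mean the cwd
fio.pathjoin('/a/', '/b/')   -- '/a/b' : duplicate and trailing '/' removed
fio.pathjoin('x', 1)         -- 'x/1'  : parts are converted to strings
```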
-
- Mar 21, 2018
-
-
Vladislav Shpilevoy authored
-
Vladislav Shpilevoy authored
-
- Mar 20, 2018
-
-
Vladislav Shpilevoy authored
It is possible to discard non-sent responses using a special sequence of requests and yields. In detail: if DML requests yield on commit for too long, there are fast read requests, and the network is saturated, then some non-sent DML responses are discarded. Closes #3255
-