- Apr 05, 2018
-
-
Vladimir Davydov authored
If the primary key is modified, we schedule rebuild of all non-unique (including nullable) secondary TREE indexes. This is valid for memtx, but is not quite right for vinyl. For vinyl we have to rebuild all secondary indexes, because they are all non-clustered (i.e. point to tuples via primary key parts). This doesn't result in any bugs for now, because rebuild of vinyl indexes is not supported, but hopefully this is going to change soon. So let's introduce a new virtual index method, index_vtab::depends_on_pk, which returns true iff the index needs to be updated if the primary key changes, and define this new method for vinyl and memtx TREE indexes.
-
Vladimir Davydov authored
The new method is called after successful update of index definition. It is passed the signature of the WAL record that committed the operation. It will be used by Vinyl to update key definition in vylog.
-
Konstantin Osipov authored
-
Ilya Markov authored
* Remove rewriting of the default logger format in case of the syslog option. * Add facility option parsing and use the parsed result to format messages according to RFC 3164. Possible values and the default value of the syslog facility are taken from nginx (https://nginx.ru/en/docs/syslog.html). * Move initialization of the logger type and format function before initialization of the descriptor in log_XXX_init, so that we can test the format function of the syslog logger. Closes gh-3244.
-
- Apr 04, 2018
-
-
Vladimir Davydov authored
The only difference between format of UPSERT statements and format of other DML statements of the same index is that the former reserves one byte for UPSERT counter, which is needed to schedule UPSERT squashing. Since we store UPSERT counter on lsregion now, we don't need a special format for UPSERTs anymore. Remove it.
-
Vladimir Davydov authored
Currently, we store upsert counter in tuple metadata (that's what upsert_format is for), but since it's only relevant for tuples of the memory level, we can store it on lsregion, right before tuple data. Let's do it now so that we can get rid of upsert_format.
-
Kirill Yukhin authored
-
Alexander Turenko authored
Filed gh-3311 to remove this export soon. Fixes #3310.
-
- Apr 03, 2018
-
-
Konstantin Osipov authored
-
Vladimir Davydov authored
If the size of a transaction is greater than the configured memory limit (box.cfg.vinyl_memory), the transaction will hang on commit for 60 seconds (box.cfg.vinyl_timeout) and then fail with the following error message: "Timed out waiting for Vinyl memory quota". This is confusing. Let's fail such transactions immediately with an OutOfMemory error. Closes #3291
-
- Apr 02, 2018
-
-
Arseny Antonov authored
-
Arseny Antonov authored
-
- Mar 30, 2018
-
-
Konstantin Belyavskiy authored
In case of a sudden power loss, if data was not written to the WAL but was already sent to a remote replica, the local instance can't recover properly and the datasets diverge. Fix it by using the remote replica's data and LSN comparison. Based on @GeorgyKirichenko's proposal and @locker's race-free check. Closes #3210
-
Konstantin Belyavskiy authored
Stay in orphan (read-only) mode while the local vclock is lower than the master's to make sure that datasets are the same across the replica set. Update the replication/catch test to reflect the change. Suggested by @kostja. Needed for #3210
-
Vladimir Davydov authored
Closes #3148
-
Vladislav Shpilevoy authored
The text console tried to detect SIGPIPE before it could be raised, by doing a read before each write. If a socket is readable but read() returns 0, then it is closed, and writing to it can raise SIGPIPE. But Tarantool ignores SIGPIPE, so the process will not be terminated; write() just returns -1. The original code checked for SIGPIPE because, when Tarantool is run under a debugger (gdb or lldb), the debugger by default sets its own signal handlers, and SIGPIPE terminates the process. But debugger settings can be changed to ignore SIGPIPE too, so let's remove this overengineering from the console code.
-
Vladislav Shpilevoy authored
If a remote host is unreachable on the first connection attempt, and reconnect_after is set, then netbox state machine enters error state, but it must enter error_reconnect. Do it. The bug was introduced by me in d2468dac.
-
Vladimir Davydov authored
EV_USE_REALTIME and EV_USE_MONOTONIC, which force libev to use clock_gettime, are enabled automatically on Linux, but not on OS X. We used to forcefully enable them for performance reasons, but this broke compilation on certain OS X versions and so was disabled by commit d36ba279 ("Fix gh-1777: clock_gettime detected but unavailable in macos"). Today we need these features enabled not just for performance, but also to avoid crashes when the time changes on the host - see issue #2527 and commit a6c87bf9 ("Use ev_monotonic_now/time instead of ev_now/time for timeouts"). Fortunately, we have the cmake-defined macro HAVE_CLOCKGETTIME_DECL, which is set if clock_gettime is available. Let's enable EV_USE_REALTIME and EV_USE_MONOTONIC if this macro is defined. Closes #3299
-
- Mar 29, 2018
-
-
Vladislav Shpilevoy authored
-
Vladimir Davydov authored
When a vylog transaction is rolled back, we always reset vy_log.tx_size. Generally speaking, this is incorrect, as rollback doesn't necessarily remove all pending records from the tx buffer - there may still be records committed with vy_log_tx_try_commit() that were left in the buffer due to write errors. We don't roll back such records, but we still reset tx_size, which leads to a discrepancy between vy_log.tx_size and the actual length of the vy_log.tx list, which further on results in an assertion failure: src/box/vy_log.c:698: vy_log_flush: Assertion `i < vy_log.tx_size' failed. We need vy_log.tx_size to allocate an xrow_header array of a proper size so that we can flush pending vylog records to disk. This isn't a hot path, because vylog operations are rare. Besides, we iterate over all records anyway to fill the xrow_header array. So let's remove vy_log.tx_size altogether and instead calculate the vy_log.tx list length right in place.
-
Vladimir Davydov authored
Currently, we use mh_foreach, but each object is on an rlist, which is better suited for iteration.
-
Vladimir Davydov authored
The new method is called if index creation failed, either due to WAL write error or build error. It will be used by Vinyl to purge prepared LSM tree from vylog.
-
Vladislav Shpilevoy authored
-
Konstantin Osipov authored
-
Ilya Markov authored
The bug was that, when logging, we passed to the write function a number of bytes that may be greater than the size of the buffer. This may happen because, when formatting the log string, we use vsnprintf, which returns the number of bytes that would have been written to the buffer, not the actual number. Fix this by limiting the number of bytes passed to the write function. Closes #3248
-
Konstantin Osipov authored
-
Vladimir Davydov authored
To facilitate performance analysis, let's report not only 99th percentile, but also 50th, 75th, 90th, and 95th. Also, let's add microsecond-granular buckets to the latency histogram. Closes #3207
-
Ilya Markov authored
* Refactor tests. * Add ev_async and fiber_cond for thread-safe log_rotate usage. Follow up #3015
-
Ilya Markov authored
Fix a race condition in the test on log_rotate. The test opened a file that must be created by log_rotate and read from it. But as log_rotate is executed in a separate thread, the file may not be created, or the log line may not be written yet, by the time the test opens it. Fix this by waiting for the file to be created and for the line to be read.
-
Vladislav Shpilevoy authored
Print a warning about that. After a while the console support will be deleted from netbox.
-
Vladislav Shpilevoy authored
Netbox console support complicates both netbox and console. Let's use sockets directly for the text protocol. Part of #2677
-
Vladislav Shpilevoy authored
It is needed to create a binary console connection, when a socket is already created and a greeting is read and decoded.
-
Vladimir Davydov authored
As was pointed out earlier, the bloom spectrum concept is rather dubious: its overhead for a reasonable false positive rate is about 10 bytes per record, while storing all hashes in an array takes only 4 bytes per record. So one can stash all hashes and count records first, then create the optimal bloom filter and add all hashes to it.
-
Vladimir Davydov authored
When we check if a multi-part key is hashed in a bloom filter, we check all its sub keys as well, so the resulting false positive rate will be equal to the product of the false positive rates of the bloom filters created for each sub key. The false positive rate of a bloom filter is given by the formula: f = (1 - exp(-kn/m)) ^ k where m is the number of bits in the bloom filter, k is the number of hash functions, and n is the number of elements hashed in the filter. By varying n, we can estimate the false positive rate of an existing bloom filter when used for a greater number of elements; in other words, we can estimate the false positive rate of a bloom filter created for checking sub keys when used for checking full keys. Knowing this, we can adjust the target false positive rate of a bloom filter used for checking keys of a particular length based on the false positive rates of the bloom filters used for checking its sub keys. This will reduce the number of hash functions required to conform to the configured false positive rate and hence the bloom filter size. Follow-up #3177
-
Vladimir Davydov authored
Currently, we store and use bloom only for full-key lookups. However, there are use cases when we can also benefit from maintaining bloom filters for partial keys as well - see #3177 for example. So this patch replaces the current full-key bloom filter with a multipart one, which is basically a set of bloom filters, one per each partial key. Old bloom filters stored on disk will be recovered as is, so users will see the benefit of this patch only after major compaction takes place. When a key or tuple is checked against a multipart bloom filter, we check all its partial keys to reduce the false positive rate. Nevertheless, there's no size optimization for now. E.g. even if the cardinality of a partial key is the same as of the full key, we will still store two full-sized bloom filters, although we could probably save some space in this case by assuming that checking against the bloom corresponding to a partial key would reduce the false positive rate of full-key lookups. This is addressed later in the series. Before this patch we used a bloom spectrum object to construct a bloom filter. A bloom spectrum is basically a set of bloom filters ranging in size. The point of using a spectrum is that we don't know what the run size will be while we are writing it, so we create 10 bloom filters and choose the best of them after we are done. With the default bloom fpr of 0.05 this is a 10-byte overhead per record, which seems to be OK. However, if we try to optimize other parameters as well, e.g. the number of hash functions, the cost of a spectrum will become prohibitive. The funny thing is that a tuple hash is only 4 bytes long, which means that if we stored all hashes in an array and built a bloom filter after we'd written a run, we would reduce the memory footprint by more than half! And that would only slightly increase the run write time, as scanning a memory map of hashes and constructing a bloom filter is cheap in comparison to merging runs. Putting it all together, we stop using bloom spectrum in this patch; instead, we stash all hashes in a new bloom builder object and use them to build a perfect bloom filter after the run has been written and we know the cardinality of each partial key. Closes #3177
-
Vladimir Davydov authored
Suggested by @kostja
-
Vladimir Davydov authored
There's absolutely no point in using mmap() instead of malloc() for bitmap allocation - malloc() will fall back on mmap() anyway, provided the allocation is large enough. Note about the unit test: since we don't round the bloom filter size up to a multiple of the page size anymore, we have to use a more sophisticated hash function for the test to pass.
-
Vladimir Davydov authored
We filter out bloom filters from the test output, because they depend on the ICU version and hence the output may vary from one platform to another (see commit 0a37ccad "Filter out bloom_filter in vinyl/layout.test.lua"). However, using test_run for this is unreliable, because a bloom string can contain newline characters and hence be split into multiple lines in console output, in which case the filter won't work. Fix this by filtering bloom_filter manually.
-
Vladislav Shpilevoy authored
-
Kirill Shcherbatov authored
Netbox does not need nullability or collation info, but some customers do. Let's fill index parts with these fields. Fixes #3256
-