Skip to content
Snippets Groups Projects
  1. Dec 11, 2024
    • Dmitry Ivanov's avatar
      feat: Add `auth_type` to box.schema.user.create() · cefa7109
      Dmitry Ivanov authored
      Now it's possible to specify the desired authentication method during
      user creation via `auth_type`, e.g.
      
      ```lua
      box.schema.user.create('mickey', { auth_type = 'chap-sha1',
                                         password = 'foobar' })
      ```
      
      Furthermore, authentication methods may now specify that they don't
      require password to create stored authentication info. This is used
      in LDAP authentication (`auth_type = 'ldap'`):
      
      ```lua
      box.schema.user.create('mickey', { auth_type = 'ldap' })
      ```
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      NO_TEST=picodata internal patch
      cefa7109
    • Dmitry Ivanov's avatar
      fix: Establish LDAP connections implicitly · 2d270bcf
      Dmitry Ivanov authored
      Unfortunately, Centos 7 provides only openssl 1.0.2 (at lest if we
      disregard epel), so we can't build the bundled libldap & libsasl2.
      "Okay", one might think, "we can link against the distro's libs".
      Well, turns out libldap 2.4, which is what we have to deal with in
      that case, doesn't have ldap_connect!
      
      Luckily, we don't have to connect explicitly. According to man pages:
      
      ```
      ldap_init() acts just like ldap_open(), but does not open a connection
      to the LDAP server.  The actual connection open will occur when the
      first operation is attempted.
      
      ldap_initialize()  acts  like ldap_init()...
      ```
      
      This is still true for libldap up to and including version 2.6.
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      NO_TEST=picodata internal patch
      2d270bcf
    • Dmitry Ivanov's avatar
      feat: Implement LDAP authentication · 1fcdd15f
      Dmitry Ivanov authored
      This authentication method doesn't store any secrets; instead,
      we delegate the whole auth to a pre-configured LDAP server. In
      the method's implementation, we connect to the LDAP server and
      perform a BIND operation which checks user's credentials.
      
      Usage example:
      
      ```lua
      -- Set the default auth method to LDAP and create a new user.
      -- NOTE that we still have to provide a dummy password; otherwise
      -- box.schema.user.create will setup an empty auth data.
      box.cfg({auth_type = 'ldap'})
      box.schema.user.create('demo', { password = '' })
      
      -- Configure LDAP server connection URL and DN format string.
      os = require('os')
      os.setenv('TT_LDAP_URL', 'ldap://localhost:1389')
      os.setenv('TT_LDAP_DN_FMT', 'cn=$USER,ou=users,dc=example,dc=org')
      
      -- Authenticate using the LDAP authentication method via net.box.
      conn = require('net.box').connect(uri, {
          user = 'demo',
          password = 'password',
          auth_type = 'ldap',
      })
      ```
      
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      1fcdd15f
    • Dmitry Ivanov's avatar
      feat: Pass user to auth_method::authenticator_check_request · 9f871c65
      Dmitry Ivanov authored
      This is required for LDAP authentication, because we need
      username to format the corresponding DN.
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      NO_TEST=picodata internal patch
      9f871c65
    • Maksim Kaitmazian's avatar
      fix: box.schema.user.passwd doesn't change the password · bfd298d8
      Maksim Kaitmazian authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      box.schema.user.passwd doesn't change the password for the current
      user because new password is passed instead of the user name.
      
      NO_CHANGELOG=fix an unreleased bug
      NO_DOC=fix an unreleased bug
      bfd298d8
    • Maksim Kaitmazian's avatar
      fix: allow empty password and username in md5 · 21f065b2
      Maksim Kaitmazian authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      It fixes the following assertion
      ```bash
      tarantool: ./src/lib/core/crypt.c:84: md5_encrypt:
      Assertion `password_len + salt_len > 0' failed.
      ```
      caused by the following code
      ```lua
      box.cfg{auth_type='md5'}
      box.schema.user.password("")
      ```
      
      NO_CHANGELOG=fix an unreleased feature
      NO_DOC=fix an unreleased feature
      21f065b2
    • Maksim Kaitmazian's avatar
      feat: make user name argument optional · 160dee2d
      Maksim Kaitmazian authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      part of picodata/tarantool#21
      
      NO_CHANGELOG=refactoring
      NO_DOC=refactoring
      160dee2d
    • Maksim Kaitmazian's avatar
      feat: implement md5 authentication · 2e10bef3
      Maksim Kaitmazian authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      It prevents password sniffing and avoids storing passwords on the
      server in plain text but provides no protection if an attacker
      manages to steal the password hash from the server.
      
      Usage example:
      ```lua
      -- Enable the md5 authentication method for all new users.
      box.cfg({auth_type = 'md5'})
      
      -- Reset existing user passwords to use the md5 authentication method.
      box.schema.user.passwd('alice', 'topsecret')
      
      -- Authenticate using the md5 authentication method via net.box.
      conn = require('net.box').connect(uri, {
          user = 'alice',
          password = 'topsecret',
          -- Specifying the authentication method isn't strictly necessary:
          -- by default the client will use the method set in the remote
      	-- server config (box.cfg.auth_type)
          auth_type = 'md5',
      })
      ```
      
      part of picodata/picodata/sbroad!377
      
      @TarantoolBot document
      Title: md5 authentication method
      
      See the commit message.
      2e10bef3
    • Maksim Kaitmazian's avatar
      feat: add user name argument to `auth_method` api · ef4f31ef
      Maksim Kaitmazian authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      User name is usually used as a salt for user password in order to
      avoid password repeating.
      For instance, postgres md5 authentication stores passwords as
      md5("password", "user"), so that the same passwords are represented by
      different hashes.
      
      part of picodata/picodata/sbroad!377
      
      @TarantoolBot document
      Title: Document updated `box.schema.user.password` declaration.
      
      Since auth methods can use user name for hashing, user name is
      added to argument list of `box.schema.user.password`.
      
      NO_TEST=there are no methods that use user name
      ef4f31ef
    • Georgy Moshkin's avatar
      fiber: introduce fiber_set_name_n function · 0ba5b9cb
      Georgy Moshkin authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      NO_TEST=picodata internal patch
      0ba5b9cb
    • Arseniy Volynets's avatar
      fix: compile errors for centos/altlinux · 047ec4f1
      Arseniy Volynets authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Previous commit caused CI `pack` job
      to fail on some linux distros. This
      commit fixes the warnings from compiler.
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      NO_TEST=picodata internal patch
      047ec4f1
    • Arseniy Volynets's avatar
      feat: add limit for max executed vdbe opcodes · a51dd994
      Arseniy Volynets authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      - Add a configurable non-negative
      session parameter "sql_vdbe_max_steps"
      -- max number of opcodes that Vdbe
      is allowed to execute for sql query.
      
      - Default value can be specified in box.cfg.
      If not set via box.cfg, default value
      is 45000. Value 0 means that no
      checks for number of executed Vdbe
      opcodes will be made.
      
      - Add the third argument to box.execute
      function, that allows to specify options
      for query execution. The only option
      supported: sql_vdbe_max_steps. Usage
      example:
      
      ```
      box.execute([[select * from t]], {}, {{sql_vdbe_max_steps = 1000}})
      ```
      
      part of picodata/picodata/sbroad!461
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      a51dd994
    • Gleb Kashkin's avatar
      console: fix :endswith() err in tntctl connection · 7136dcfa
      Gleb Kashkin authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      There used to be a rare error when failed to connect via tarantoolctl to
      listening cartridge console. It was caused by unclear
      console.local_print() contract. Starting from gh-7031 fix, the function
      assumed string-only arguments, while in some cases cdata error was
      passed.
      
      Now console.local_print() prints all non-string arguments as is, without
      modifying potential local_eos.
      
      Closes #8374
      
      NO_DOC=bugfix
      NO_TEST=very hard to test
      7136dcfa
    • Denis Smirnov's avatar
      feat: expose tuple hash calculation method · aeb4e83d
      Denis Smirnov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Picodata supports cluster-wide SQL and needs some predictable
      method to calculate tuple hashes for the bucket ids. Method
      should be available for Lua, C and Rust users. It was decided
      to expose a murmur3 hash calculation method of the key_def module.
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      
      fix: tuple hash calculation tests
      
      Tuple hash calculation tests for the C API were incorrect. Thanks
      to the full pipeline with DEBUG build we detected the problem and
      fixed it.
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      aeb4e83d
    • godzie44's avatar
      cbus: introduce lcpipe - light cpipe · fb00c8ca
      godzie44 authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Introduced a new type of cbus pipe - lcpipe. The current pipe in the
      cbus - cpipe, has a number of limitations, first of all - the cpipe
      cannot be used from the 3rd party threads, cpipe only works as a channel
      between two cords. That why lcpipe is needed. Its main responsibility -
      create channel between any thread and tarantool cord.
      
      Internally lcpipe is a cpipe, but:
      - on flush triggers removed, cause triggers use thread-local mem-pool,
      this is not possible on a third party thread
      - producer event loop removed, cause there is no libev event loop in
      third party thread
      
      Also, lcpipe interface is exported to the outside world.
      
      fix: use-after-free in `cbus_endpoint_delete`
      
      Calling a `TRASH` macro after calling the `free`
      function dereferences the pointer to the already
      freed memory.
      
      NO_DOC=picodata internal patch
      NO_CHANGELOG=picodata internal patch
      NO_TEST=picodata internal patch
      fb00c8ca
    • Дмитрий Кольцов's avatar
      fix(schema version): fix some types that were not updated to 64 bit · cab9848f
      Дмитрий Кольцов authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=core feature
      NO_TEST=no Lua API
      NO_CHANGELOG=bugfix
      cab9848f
    • Дмитрий Кольцов's avatar
      feat(json): add option to encode decimals as string · c39f1fee
      Дмитрий Кольцов authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Due to inconsistency of Tarantool type casting while using strict
      data types as "double" or "unsigned" it is needed
      to use "number" data type in a whole bunch of cases.
      However "number" may contain "decimal" that will be serialized into
      string by JSON builtin module.
      
      This commit adds "encode_decimal_as_number" parameter to json.cfg{}.
      That forces to encode `decimal` as JSON number to force type
      consistency in JSON output.
      Use with catious - most of JSON parsers assume that number is restricted
      to float64.
      
      NO_DOC=we do not host doc
      c39f1fee
    • Denis Smirnov's avatar
      sql: fix string dequoting · 7ab5f493
      Denis Smirnov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      
      Previously,
      
      select "t1"."a" from (select "a" from "t") as "t1";
      
      returned a result column name `t1` instead of `t1.a` because of
      incorrect work of a dequoting function. The reason was that
      previously sqlDequote() function finished its work when found the
      first closing quote.
      
      Old logic worked for simple selects where the column name doesn't
      contain an explicit scan name ("a" -> a).
      But for the sub-queries results sqlDequote() finished its work right
      after the scan name ("t1"."a" -> t1). Now the function continues its
      deqouting till it gets the null terminator at the end of the string.
      
      Closes #7063
      
      NO_DOC=don't change any public API, only a bug fix
      
      Co-authored-by: default avatarMergen Imeev <imeevma@gmail.com>
      7ab5f493
    • Denis Smirnov's avatar
      sql: recompile expired prepared statements · 5f6a5c3c
      Denis Smirnov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Actually there is no reason to throw an error and make a user
      manually recreate prepared statement when it expires. A much more
      user friendly way is to recreate it under hood when statement's
      schema version differs from the box one.
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      5f6a5c3c
    • Denis Smirnov's avatar
      fix: default result parameter type · 170e7d91
      Denis Smirnov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Problem description.
      
      When we prepare a statement with parameters in the result columns
      (for example box.prepare('select ?')) Tarantool has no information
      about the type of the output column and set it to default boolean.
      Then, on the execution phase, the type would be recalculated during
      the parameter binding.
      
      Tarantool expects that there is no way for parameter to appear in the
      result tuple other than exactly be mentioned in the final projection.
      But it is incorrect - we can easily propagate parameter from the inner
      part of the join. For example
      
      box.prepare([[select COLUMN_1 from t1 join (values (?)) as t2 on true]])
      
      In this case column COLUMN_1 in the final projection is not a
      parameter, but a "reference" to it and its type depends on the
      parameter from the inner part of the join. But as Tarantool
      recalculates only binded parameters in the result projection,
      it doesn't change the default boolean metadata type of the COLUMN_1
      and the query fails on comparison with the actual type of the tuple.
      
      Solution.
      As we don't want to patch Vdbe to make COLUMN_1 refer inner parameter,
      it was decided to make a simple workaround: change the default
      column type from BOOLEAN to ANY for parameters. It fixes the
      comparison with the actual tuple type (we do not fail), but in some
      cases get ANY column in the results where we would like to have
      explicitly defined type. Also NULL parameters would also have ANY
      type, though Tarantool prefers to have BOOLEAN in this case.
      
      Closes https://github.com/tarantool/tarantool/issues/7283
      
      NO_DOC=bug fix
      170e7d91
    • godzie44's avatar
      sql: add sql_execute_prepared_ext function · d406e749
      godzie44 authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      It's similar to sql_execute_prepared, but doesn't have the `region` parameter.
      
      NO_DOC=minor
      NO_TEST=minor
      d406e749
    • Дмитрий Кольцов's avatar
      cmake: disable feedback daemon by default · 32fa3a59
      Дмитрий Кольцов authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=disable feedback
      NO_TEST=disable feedback
      32fa3a59
    • godzie44's avatar
      feat: compatibility with tarantool-module · 5f9a3200
      godzie44 authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      - add box_tuple_data_offset function
        (return offset of the msgpack encoded data from the beginning of the tuple)
      - add more export functions
      
      NO_DOC=build
      NO_TEST=build
      5f9a3200
    • Yaroslav Dynnikov's avatar
      cmake: add option not to build doc or tests · f8e35a5b
      Yaroslav Dynnikov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      f8e35a5b
    • Дмитрий Кибирев's avatar
      ci(picodata): change rocks-repo to our mirror · 1a83af71
      Дмитрий Кибирев authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      1a83af71
    • Виталий Шунков's avatar
      fix: change PG repo in Dockerfile · 8518ade6
      Виталий Шунков authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      8518ade6
    • Дмитрий Кибирев's avatar
      ci(picodata): save old deb-packages · b54b9c8e
      Дмитрий Кибирев authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=ci change
      NO_TEST=ci change
      NO_CHANGELOG=ci change
      b54b9c8e
    • Виталий Шунков's avatar
      ci(picodata): add downstream trigger to run child pipeline · 8eddea56
      Виталий Шунков authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      8eddea56
    • Yaroslav Dynnikov's avatar
      ci(picodata): fix building release docker image · 94240c38
      Yaroslav Dynnikov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      1. Pass `docker build --build-arg TARANTOOL_VERSION=...` to ensure the
         resulting image contains the correct version.
      2. Pass it in the CI trigger variables.
      3. Fix CI jobs relationships: "docker" stage needs "deploy-packages".
      4. Minor fixes in job names.
      
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      94240c38
    • Alexey Protsenko's avatar
      ci(picodata): trigger tarantool-module testing on docker push · b32ee786
      Alexey Protsenko authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Add a trigger to run tarantool-module CI tests when a new docker image
      is released (on tag).
      
      NO_DOC=internal
      NO_TEST=internal
      NO_CHANGELOG=internal
      b32ee786
    • Alexey Protsenko's avatar
      ci(picodata): setup all CI · 03db9111
      Alexey Protsenko authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      - Add test_linux, test_debian_docker_luacheck to .gitlab.ci.yml.
      - Add coverage from .travis.mk.
      - Sign package on build.
      - Add checkpatch linter.
      - Add docker image build (tarantool/tarantool from Dockerhub)
      
      NO_DOC=ci change
      NO_TEST=ci change
      NO_CHANGELOG=ci change
      03db9111
    • Дмитрий Кольцов's avatar
      gitlab(picodata): add merge request template · 27d26dd9
      Дмитрий Кольцов authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      NO_DOC=chore
      NO_CHANGELOG=chore
      NO_TEST=chore
      27d26dd9
    • Vladimir Davydov's avatar
      vinyl: disable tautological DELETE optimization for deferred DELETEs · 050dcf4d
      Vladimir Davydov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      If the write iterator sees that one DELETE statement follows another,
      which isn't discarded because it's referenced by a read view, it drops
      the newer DELETE, see commit a6f45d87 ("vinyl: discard tautological
      DELETEs on compaction"). This is incorrect if the older DELETE is a
      deferred DELETE statement (marked as SKIP READ) because such statements
      are dumped out of order, i.e. there may be a statement with the LSN
      lying between the two DELETEs in an older source not included into this
      compaction task. If we discarded the newer DELETE, we wouldn't overwrite
      this statement on major compaction, leaving garbage. Fix this issue by
      disabling this optimization for deferred DELETEs.
      
      Closes #10895
      
      NO_DOC=bug fix
      
      (cherry picked from commit 2945a8c9fde6df9f6cbc714f9cf8677f0fded57a)
      050dcf4d
    • Vladimir Davydov's avatar
      vinyl: move cache invalidation from vy_tx_write to vy_lsm_set · 122d40af
      Vladimir Davydov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      vy_lsm.c seems to be a more appropriate place for cache invalidation
      because (a) it's vy_lsm that owns the cache and (b) we invalidate
      the cache on rollback in vy_lsm_rollback_stmt().
      
      While we are at it, let's inline vy_tx_write() and vy_tx_write_prepare()
      because they are trivial and used in just one place.
      
      NO_DOC=refactoring
      NO_TEST=refactoring
      NO_CHANGELOG=refactoring
      
      (cherry picked from commit 44c245ef0baa227a6bff75de2a91f20da5532dc1)
      122d40af
    • Vladimir Davydov's avatar
      vinyl: do not invalidate cache on commit after prepare · b2b41560
      Vladimir Davydov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      There's no point in doing so because if the committed tuple has been
      overwritten by the time it's committed, the statement that overwrote
      it must have already invalidated the cache, see `vy_tx_write()`.
      The code invalidating the cache on commit was added along with the
      cache implementation without any justification.
      
      NO_DOC=minor
      NO_TEST=minor
      NO_CHANGELOG=minor
      
      (cherry picked from commit 6ee49a5955893834fdaaf554d57d92d3f35992bc)
      b2b41560
    • Vladimir Davydov's avatar
      vinyl: fix cache invalidation on rollback of DELETE statement · 76345442
      Vladimir Davydov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Once a statement is prepared to be committed to WAL, it becomes visible
      (in the 'read-committed' isolation level) so it can be added to the
      tuple cache. That's why if the statement is rolled back due to a WAL
      error, we have to invalidate the cache. The problem is that the function
      invalidating the cache (`vy_cache_on_write`) ignores the statement if
      it's a DELETE judging that "there was nothing and there is nothing now".
      This is apparently wrong for rollback. Fix it.
      
      Closes #10879
      
      NO_DOC=bug fix
      
      (cherry picked from commit d64e29da2c323a4b4fcc7cf9fddb0300d5dd081f)
      76345442
    • Vladimir Davydov's avatar
      vinyl: fix handling of duplicate multikey entries in transaction write set · 3d584a69
      Vladimir Davydov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      A multikey index stores a tuple once per each entry of the indexed
      array field, excluding duplicates. For example, if the array field
      equals {1, 3, 2, 3}, the tuple will be stored three times. Currently,
      when a tuple with duplicate multikey entries is inserted into a
      transaction write set, duplicates are overwritten as if they belonged
      to different statements. Actually, this is pointless: we could just as
      well skip them without trying to add to the write set. Besides, this
      may break the assumptions taken by various optimizations, resulting in
      anomalies. Consider the following example:
      
      ```lua
      local s = box.schema.space.create('test', {engine = 'vinyl'})
      s:create_index('primary')
      s:create_index('secondary', {parts = {{'[2][*]', 'unsigned'}}})
      s:replace({1, {10, 10}})
      s:update({1}, {{'=', 2, {10}}})
      ```
      
      It will insert the following entries to the transaction write set
      of the secondary index:
      
       1. REPLACE {10, 1} [overwritten by no.2]
       2. REPLACE {10, 1} [overwritten by no.3]
       3. DELETE {10, 1} [turned into no-op as REPLACE + DELETE]
       4. DELETE {10, 1} [overwritten by no.5]
       5. REPLACE {10, 1} [turned into no-op as DELETE + REPLACE]
      
      (1-2 correspond to `replace()` and 3-5 to `delete()`)
      
      As a result, tuple {1, {10}} will be lost forever.
      
      Let's fix this issue by silently skipping duplicate multikey entries
      added to a transaction write set. After the fix, the example above
      will produce the following write set entries:
      
       1. REPLACE{10, 1} [overwritten by no.2]
       2. DELETE{10, 1} [turned into no-op as REPLACE + DELETE]
       3. REPLACE{10, 1} [committed]
      
      (1 corresponds to `replace()` and 2-3 to `delete()`)
      
      Closes #10869
      Closes #10870
      
      NO_DOC=bug fix
      
      (cherry picked from commit 1869dce15d9a797391e45df75507078d91f1651e)
      3d584a69
    • Vladimir Davydov's avatar
      vinyl: skip invisible read sources · 8f7bae8c
      Vladimir Davydov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      A Vinyl read iterator scans all read sources (memory and disk levels)
      even if it's executed in a read view from which most of the sources are
      invisible. As a result, a long running scanning request may spend most
      of the time skipping invisible statements. The situation is exacerbated
      if the instance is experiencing a heavy write load because it would pile
      up old statement versions in memory and force the iterator to skip over
      them after each disk read.
      
      Since the replica join procedure in Vinyl uses a read view iterator
      under the hood, the issue is responsible for a severe performance
      degradation of the master instance and the overall join procedure
      slowdown when a new replica is joined to an instance running under
      a heavy write load.
      
      Let's fix this issue by making a read iterator skip read sources that
      aren't visible from its read view.
      
      Closes #10846
      
      NO_DOC=bug fix
      
      (cherry picked from commit 6a214e42e707b502022622866d898123a6f177f1)
      8f7bae8c
    • Vladimir Davydov's avatar
      vinyl: fix handling of overwritten statements in transaction write set · 3344bffc
      Vladimir Davydov authored and Dmitry Ivanov's avatar Dmitry Ivanov committed
      Statements executed in a transaction are first inserted into the
      transaction write set and only when the transaction is committed, they
      are applied to the LSM trees that store indexed keys in memory. If the
      same key is updated more than once in the same transaction, the old
      version is marked as overwritten in the write set and not applied on
      commit.
      
      Initially, write sets of different indexes of the same space were
      independent: when a transaction was applied, we didn't have a special
      check to skip a secondary index statement if the corresponding primary
      index statement was overwritten because in this case the secondary
      index statement would have to be overwritten as well. This changed when
      deferred DELETEs were introduced in commit a6edd455 ("vinyl:
      eliminate disk read on REPLACE/DELETE"). Because of deferred DELETEs,
      a REPLACE or DELETE overwriting a REPLACE in the primary index write
      set wouldn't generate DELETEs that would overwrite the previous key
      version in write sets of the secondary indexes. If we applied such
      a statement to the secondary indexes, it'd stay there forever because,
      since there's no corresponding REPLACE in the primary index, a DELETE
      wouldn't be generated on primary index compaction. So we added a special
      instruction to skip a secondary index statement if the corresponding
      primary index was overwritten, see `vy_tx_prepare()`. Actually, this
      wasn't completely correct because we skipped not only secondary index
      REPLACE but also DELETE. Consider the following example:
      
      ```lua
      local s = box.schema.space.create('test', {engine = 'vinyl'})
      s:create_index('primary')
      s:create_index('secondary', {parts = {2, 'unsigned'}})
      
      s:replace{1, 1}
      
      box.begin()
      s:update(1, {{'=', 2, 2}})
      s:update(1, {{'=', 2, 3}})
      box.commit()
      ```
      
      UPDATEs don't defer DELETEs because, since they have to query the old
      value, they can generate DELETEs immediately so here's what we'd have
      in the transaction write set:
      
       1. REPLACE {1, 2} in 'test.primary' [overwritten by no.4]
       2. DELETE {1, 1} from 'test.secondary'
       3. REPLACE {1, 2} in 'test.secondary' [overwritten by no.5]
       4. REPLACE{1, 3} in 'test.primary'
       5. DELETE{1, 2} from 'test.secondary'
       6. REPLACE{1, 3} in 'test.secondary'
      
      Statement no.2 would be skipped and marked as overwritten because of
      the new check, resulting in {1, 1} never deleted from the secondary
      index. Note, the issue affects spaces both with and without enabled
      deferred DELETEs.
      
      This commit fixes this issue by updating the check to only skip REPLACE
      statements. It should be safe to apply DELETEs in any case.
      
      There's another closely related issue that affects only spaces with
      enabled deferred DELETEs. When we generate deferred DELETEs for
      secondary index when a transaction is committed (we can do it if we find
      the previous version in memory), we assume that there can't be a DELETE
      in a secondary index write set. This isn't true: there can be a DELETE
      generated by UPDATE or UPSERT. If there's a DELETE, we have nothing to
      do unless the DELETE was optimized out (marked as no-op).
      
      Both issues were found by `vinyl-luatest/select_consistency_test.lua`.
      
      Closes #10820
      Closes #10822
      
      NO_DOC=bug fix
      
      (cherry picked from commit 6a87c45deeb49e4e17ae2cc0eeb105cc9ee0f413)
      3344bffc
  2. Nov 22, 2024
Loading