Skip to content
Snippets Groups Projects
  1. Jun 16, 2020
    • Vladislav Shpilevoy's avatar
      sql: don't build sql as a separate library · 35473d5d
      Vladislav Shpilevoy authored
      SQL heavily depends on box, and box on SQL. So they can't be
      separate libraries. The build started failing with undefined box
      symbols in SQL, when code of the latter has slightly changed in
      one of the recent commits.
      
      The build failed only with UB sanitizer enabled, but
      'VERBOSE=1 make' showed that both with UB and without UB the build
      command was the same (not counting -fsanitize flags). So the
      sanitizer has nothing to do with it.
      
      The patch makes SQL sources being built as a part of box library.
      
      Closes #5067
      35473d5d
  2. Jun 15, 2020
    • Roman Khabibov's avatar
      sql: display collation in metadata for scalar · ed935572
      Roman Khabibov authored
      Fix bug with the display of collation for scalar fields in
      <SELECT> result, when sql_full_metadata is enabled.
      
      Closes #4755
      ed935572
    • Alexander V. Tikhonov's avatar
      test: fix flaky replication/status.test.lua · 8b72c893
      Alexander V. Tikhonov authored
      Found issue:
      
       [009] --- replication/status.result	Wed May  6 09:03:18 2020
       [009] +++ replication/status.reject	Tue May 12 15:55:09 2020
       [009] @@ -307,11 +307,12 @@
       [009]  ...
       [009]  r.upstream.status == "stopped"
       [009]  ---
       [009] -- true
       [009] +- false
       [009]  ...
       [009]  r.upstream.message:match('Duplicate') ~= nil
       [009]  ---
       [009] -- true
       [009] +- error: '[string "return r.upstream.message:match(''Duplicate'') ..."]:1: attempt
       [009] +    to index field ''message'' (a nil value)'
       [009]  ...
       [009]  test_run:cmd('switch default')
       [009]  ---
      
      To check the upstream status and it's message need to wait until an
      upstream appears. This prevents an attempt to index a nil value when
      one of those functions are called before a record about a peer appears
      in box.info.replication. It was observed on test:
        replication/show_error_on_disconnect
      after commit
        c6bea65f ('replication: recfg with 0
      quorum returns immediately').
      
      Closes #4969
      8b72c893
    • Alexander V. Tikhonov's avatar
      Correct cleanup gitlab-ci · 892a188b
      Alexander V. Tikhonov authored
      Found the issue on regular testing hosts: [1]
      
        Fetching changes...
        00:04
         Reinitialized existing Git repository in
           /home/gitlab-runner/builds/zzyC6hh5/0/tarantool/tarantool/.git/
         Checking out 8ff7f32c as ...
         warning: failed to remove CMakeFiles/Makefile.cmake
      
      Found the job that saved the directories with root permissions: [2]
      
      The issue appeared because the job that saved directories with root
      permissions used the 'shell' runner to run docker container inside.
      It caused the gitlab-runner to run the default workspace cleanup
      outside the docker container. In opposite to it, when 'docker' runner
      is used, the cleanup routine runs inside the docker container and no
      fails ever exist, because the root permissions are used in the docker
      container and this is the same root permissions for the host. As the
      result using 'shell' runner, cleanup routine failed to remove files
      created by root inside the docker container and which were shared to
      global host with the same permissions, because gitlab-runner runs the
      'shell' runner by regular 'gitlab-runner' user, but not by root.
      
      To fix the issue need to run docker containers using the gitlab-runner
      only in RO mode with Out-Of-Source builds in it. Either use the
      'docker' runners when docker containers are needed. Anyway 'shell'
      runner jobs with additional calls to docker containers can't be
      control for the branches of developers, to avoid of it need to make
      local cleanup routine instead of default to the working paths for each
      job with 'shell' runners use.
      
      Decided to setup gitlab-runner configuration as described in: [3]
      
      Also got the issue with left data from previous builds at the
      submodule pathes, like here: [4]
      
         Undefined symbols for architecture x86_64:
           "_u_isprint_66", referenced from:
               _yaml_emitter_is_printable in libyaml_static.a(emitter.c.o)
         ld: symbol(s) not found for architecture x86_64
         clang: error: linker command failed with exit code 1 (...)
      
      Also got issues with left files from previously tested branches,
      like here: [5]
      
       [087] small/rlist.test
       [087] TAP13 parse failed (Missing plan in the TAP source).
       [087]
       [087] No result file (small/rlist.result) found.
       [087] Run the test with --update-result option to write the new result file.
       [087] [ fail ]
      
       [087] small/static.test
       [087] TAP13 parse failed (Missing plan in the TAP source).
       [087]
       [087] No result file (small/static.result) found.
       [087] Run the test with --update-result option to write the new result file.
       [087] [ fail ]
      
      To fix it was added the command to clean all available git submodules:
        git submodule foreach git clean -ffdx
      
      Closes #5036
      
      1. https://gitlab.com/tarantool/tarantool/-/jobs/577884238#L7
      2. https://gitlab.com/tarantool/tarantool/-/jobs/577768553
      3. https://docs.gitlab.com/ce/ci/yaml/README.html#git-clean-flags
      4. https://gitlab.com/tarantool/tarantool/-/jobs/574199256#L3141
      5. https://gitlab.com/tarantool/tarantool/-/jobs/590573606#L3718
      892a188b
  3. Jun 14, 2020
    • Olga Arkhangelskaia's avatar
      cmake: set CMP0037 policy to NEW · a753f258
      Olga Arkhangelskaia authored
      To fix deprecation warning, CMP0037 policy was changed to NEW for cmake
      3.11 and above.
      
      CMP0037 old behavior (cmake 2.8.12) allowed target names such as test.
      In cmake 3.10 and below names test, help and etc. were reserved.
      
      Starting from cmake 3.11 these names are only reserved when the
      corresponding feature is enabled (e.g. by including the CTest or CPack
      modules). Tarantool does not use CTest so the name test can be used.
      
      Closes #3587
      Unverified
      a753f258
  4. Jun 11, 2020
    • Alexander V. Tikhonov's avatar
      Set full testing for all branches · b59a089c
      Alexander V. Tikhonov authored
      Set full testing with deploy builds and tests for all branches.
      b59a089c
    • Alexander V. Tikhonov's avatar
      Divide test box/net.box · d382184a
      Alexander V. Tikhonov authored
      box/net.box_bad_argument_gh-594.test.lua
      box/net.box_call_blocks_gh-946.test.lua
      box/net.box_collectgarbage_gh-3107.test.lua
      box/net.box_connect_timeout_gh-2054.test.lua
      box/net.box_connect_triggers.test.lua
      box/net.box_console_connections_gh-2677.test.lua
      box/net.box_count_inconsistent_gh-3262.test.lua
      box/net.box_discard_gh-3107.test.lua
      box/net.box_disconnect_gh-3859.test.lua
      box/net.box_fiber-async_gh-3107.test.lua
      box/net.box_field_names_gh-2978.test.lua
      box/net.box_get_connection_object.test.lua
      box/net.box_gibberish_gh-3900.test.lua
      box/net.box_huge_data_gh-983.test.lua
      box/net.box_incompatible_index-gh-1729.test.lua
      box/net.box_incorrect_iterator_gh-841.test.lua
      box/net.box_index_unique_flag_gh-4091.test.lua
      box/net.box_iproto_hangs_gh-3464.test.lua
      box/net.box_is_nullable_gh-3256.test.lua
      box/net.box_leaks_gh-3629.test.lua
      box/net.box_log_corrupted_rows_gh-4040.test.lua
      box/net.box_long-poll_input_gh-3400.test.lua
      box/net.box_methods_gh-3107.test.lua
      box/net.box_msgpack_gh-2195.test.lua
      box/net.box_on_schema_reload-gh-1904.test.lua
      box/net.box_password_gh-1545.test.lua
      box/net.box_permissions.test.lua
      box/net.box_pseudo_objects_gh-2401.test.lua
      box/net.box_raw_response_gh-3107.test.lua
      box/net.box_readahead_gh-3958.test.lua
      box/net.box_reconnect_after_gh-3164.test.lua
      box/net.box_reconnect_after.test.lua
      box/net.box_reload_schema_gh-636.test.lua
      box/net.box_remote_method_gh-544.test.lua
      box/net.box_roll_back_gh-822.test.lua
      box/net.box_schema_change_gh-2666.test.lua
      box/net.box_schema_change_gh-3107.test.lua
      box/net.box_session_type_gh-2642.test.lua
      box/net.box_space_format_gh-2402.test.lua
      box/net.box_stack_diag_gh-1148.test.lua
      box/net.box_timeout_gh-1533.test.lua
      box/net.box_timeout-gh-3107.test.lua
      box/net.box_upsert_gh-970.test.lua
      box/net.box_uri_first_arg_gh-398.test.lua
      box/net.box_wait_connected_gh-3856.test.lua
      
      Also removed the skip condition file:
        box/net.box.skipcond
      mentioned as temporary workaround for enabling testing
      on FreeBSD within issue #4271, because no really issues
      with the tests on it found.
      
      Closes #4880
      d382184a
  5. Jun 10, 2020
    • Alexander Turenko's avatar
      test: update test-run · ef86e3c9
      Alexander Turenko authored
      Explicitly notify about a missing newline at end of a .result file in
      GNU Diff like style:
      
       |  ---
       | -...
       | \ No newline
       | +...
      
      Closes https://github.com/tarantool/test-run/issues/212
      Unverified
      ef86e3c9
    • Nikita Pettik's avatar
      vinyl: bump dump_generation in case scheduler doesn't catch up with DDL · 38ac17f3
      Nikita Pettik authored
      It may turn out that dump_generation does not catch up with current
      generation and no other dump tasks are in progress. This may happen
      dump process is throttled due to errors. In this case generation is
      bumped but dump_generation is not (since dump is not completed). In
      turn, throttling opens a window for DDL operations. For instance, index
      dropping and creation of new one results in mentioned situation:
      
      box.snapshot() -- fails for some reason; next attempt at dumping will be
                     -- taken in one second.
      s:drop() -- drop index to be dumped
      s = box.schema.space.create('test', {engine = 'vinyl'})
      -- create new one (its mem generation is greater than scheduler's one)
      i = s:create_index('pk')
      
      Closes #4821
      38ac17f3
    • Nikita Pettik's avatar
      vinyl: rotate mem during index build on demand · c3561b13
      Nikita Pettik authored
      Meanwhile in 17f6af7d the similar problem has been fixed, still it may
      appear that in-memory level of secondary index being constructed has
      lagging memory generation (in other words, it's values is less than
      the value of current global generation). Such situation can be achieved
      if yield which is required to check uniqueness of value being inserted
      is too long. In this time gap other space may trigger dump process
      bumping global memory generation counter, but dump itself is still not
      yet scheduled.
      
      So to get rid of generations mismatch, let's rotate in-memory level
      after yield on demand.
      
      Note that test is not included into the patch due to its complexity
      and will be added as a follow-up.
      
      Closes #5042
      c3561b13
    • Olga Arkhangelskaia's avatar
      build: don't start example instance in postinstall · 080beba0
      Olga Arkhangelskaia authored
      
      After tarantool installation on Debian/Ubuntu from repo, example
      instance was automatically started on 3301 port. At the same time
      example instance on RHEL/CentOS is started manually. Patch does the same
      for Debian/Ubuntu.
      
      Closes #4507
      
      Reviewed-by: default avatarIgor Munkin <imun@tarantool.org>
      Reviewed-by: default avatarAlexander Turenko <alexander.turenko@tarantool.org>
      Unverified
      080beba0
    • Sergey Kaplun's avatar
      lua: remove excess Lua call from table encoding · 67ac8df8
      Sergey Kaplun authored
      For safe table encoding <lua_field_try_serialize> function is pushed
      to Lua stack along with auxiliary lightuserdata and table object to be
      encoded. Its further protected call catches Lua error if one is raised
      while encoding. It is only necessary when the object to be serialized
      has __serialize field in metatable and this field is a Lua function.
      
      This change reduces GC usage since a Lua function object is not
      created. Moreover the function serializing the given object is called
      without excess protected frame and auxiliary status struct.
      67ac8df8
    • HustonMmmavr's avatar
      static build: fix build on ubuntu · 2e8fa27a
      HustonMmmavr authored
      Fixed static build with '-DBUILD_STATIC=ON' option:
      
      * Added cmake option CMAKE_DL_LIBS to icu library for
        test/unit tests binaries builds at file:
          cmake/FindICU.cmake
        due to fail:
          [ 84%] Linking CXX executable vy_point_lookup.test
          /usr/local/lib/libicuuc.a(putil.ao): In function `uprv_dl_open_62':
          putil.cpp:(.text+0x1a22): undefined reference to `dlopen'
          /usr/local/lib/libicuuc.a(putil.ao): In function `uprv_dlsym_func_62':
          putil.cpp:(.text+0x1a7d): undefined reference to `dlsym'
          /usr/local/lib/libicuuc.a(putil.ao): In function `uprv_dl_close_62':
          putil.cpp:(.text+0x1a61): undefined reference to `dlclose'
          collect2: error: ld returned 1 exit status
      
      * Added cmake option CMAKE_DL_LIBS to gomp library for
        test/unit tests binaries builds at file:
          cmake/BuildMisc.cmake
        due to fail:
          [ 91%] Linking CXX executable bps_tree.test
          /usr/lib/gcc/x86_64-linux-gnu/7/libgomp.a(target.o): In function `gomp_target_init':
          (.text+0x8b): undefined reference to `dlopen'
          (.text+0xa2): undefined reference to `dlsym'
          (.text+0xd9): undefined reference to `dlclose'
          (.text+0x29b): undefined reference to `dlsym'
          (.text+0x2a8): undefined reference to `dlerror'
          (.text+0x2d0): undefined reference to `dlsym'
          (.text+0x2e9): undefined reference to `dlsym'
          (.text+0x300): undefined reference to `dlsym'
          (.text+0x317): undefined reference to `dlsym'
          (.text+0x330): undefined reference to `dlsym'
          /usr/lib/gcc/x86_64-linux-gnu/7/libgomp.a(target.o):(.text+0x34d): more undefined references to `dlsym' follow
          /usr/lib/gcc/x86_64-linux-gnu/7/libgomp.a(target.o): In function `gomp_target_init':
          (.text+0x9cc): undefined reference to `dlerror'
          collect2: error: ld returned 1 exit status
      
      Close #5024
      2e8fa27a
  6. Jun 09, 2020
    • Alexander Turenko's avatar
      test: update test-run · e3cf64a6
      Alexander Turenko authored
      Introduce the new test-run.py --update-result option, which allows to
      write a new result file (a test reference output) or re-write an
      existing one.
      
      Before this patch, test-run did not require the option to write a new
      result file (when there is no exising one). Now 'fail' status will be
      reported for such tests when the option is not passed.
      
      Closes https://github.com/tarantool/test-run/issues/194
      Unverified
      e3cf64a6
    • Dmitry Khominich's avatar
      error: use int64_t as reference counter · 2e3c81de
      Dmitry Khominich authored
      Before it was unsafe to use error.prev, box.error.last(), or any error
      method that increments error reference counter. Any of them could throw a
      Lua error in case of INT32_MAX overflow.
      
      This patch uses int64_t as error reference counter making it impossible
      to get counter overflow.
      
      Closes #4902
      2e3c81de
    • Cyrill Gorcunov's avatar
      box/constraint: make files non executable · 4450419c
      Cyrill Gorcunov authored
      
      For some reason these files are sitting with
      mask 0755. Set proper mode.
      
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@gmail.com>
      4450419c
    • Vladislav Shpilevoy's avatar
      sql: fix mem_apply_type double type truncation · a33108ad
      Vladislav Shpilevoy authored
      mem_apply_type(), when tried to cast a double value to an integer,
      used the expressions:
      
          int64_t i = (int64_t) d;
          uint64_t u = (uint64_t) d;
      
      To obtain integer versions of the double value, cast them back to
      double, and see if they are equal. Assuming that if they are, the
      double can be safely cast to one of them.
      
      But this is undefined behaviour. Double can't be cast to int64_t,
      if it is > INT64_MAX or < INT64_MIN. And can't be cast to
      uint64_t, if it is < 0 or > UINT64_MAX.
      
      The patch adds explicit checks for these borders before doing the
      cast.
      
      Part of #4609
      a33108ad
    • Vladislav Shpilevoy's avatar
      sql: fix usage of not initialized index_stat · 3328b841
      Vladislav Shpilevoy authored
      Query planner uses a temporary index definition object 'to
      represent the primary key index'. Whatever real purpose of this
      index_def is (query planner wasn't changed since SQLite merge,
      and may be broken), its opts.stat field pointed at a partially
      initialized index_stat structure. Which is supposed to be used by
      the planner to make decisions such as search by which index would
      be the optimal.
      
      The patch initializes the statistics with 0.
      
      Part of #4609
      3328b841
    • Aleksandr Lyapunov's avatar
      salad: fix UB pointer arithmetics in bps_tree · e73f7da6
      Aleksandr Lyapunov authored
      There is some pointer arithmetics in bps_tree that calculates
      intermediate pointers that points out of array bounds. Though they
      are never dereferenced and only used for further caclulation of
      correct pointers, it is still UB and must be fixed.
      
      Part of #4609
      e73f7da6
    • Vladislav Shpilevoy's avatar
      digest: eliminate UBs from guava() · 620de6d0
      Vladislav Shpilevoy authored
      Guava hash function follows the algorithm described in the paper
      "A Fast, Minimal Memory, Consistent Hash Algorithm", John Lamping,
      Eric Veach. But the implementation somewhy is ported from Java
      instead of taking a ready to use C function right from the paper.
      
      Java version, ported to C as is, leads to undefined behaviour:
      - signed integer overflow;
      - double value outside of integer range assigned to the integer.
      
      Here is the full old function:
      
          static const int64_t K = 2862933555777941757;
          static const double  D = 0x1.0p31;
      
          static inline double lcg(int64_t *state)
          {
              return (double)((int32_t)(((uint64_t)*state >> 33) + 1)) / D;
          }
      
          int32_t
          guava(int64_t state, int32_t buckets)
          {
              int32_t candidate = 0;
              int32_t next;
              while (1) {
                  state = K * state + 1;
                  next = (int32_t)((candidate + 1) / lcg(&state));
                  if (next >= 0 && next < buckets)
                      candidate = next;
                  else
                      return candidate;
              }
          }
      
      Signed integer overflow happened in this line:
      
          state = K * state + 1;
      
      This UB is fixed by changing state type to uint64_t. This is ok
      and does not change behaviour, because overflow for signed
      integers in reality behaves the same as for unsigned integers,
      in terms of binary representation. Change to uint64_t didn't lead
      to change of any single line of the result assembly code.
      
      Double -> int32_t truncation overflow happened in the next line:
      
          next = (int32_t)((candidate + 1) / lcg(&state));
      
      Right expression can become something out of int32_t range. This
      is UB, but in reality the double value on the right is truncated
      to either INT32_MIN, if it is < INT32_MIN, or INT32_MAX, if it is
      > INT32_MAX.
      
      This is fixed by making 'next' and 'candidate' variables int64_t.
      
      This fixed the UB, because the right expression can't become
      < INT64_MIN and > INT64_MAX. Indeed, max possible value of 'next'
      from the expression above is:
      
          (candidate + 1) / lcg(&state) < ((2^31 - 1) + 1) / 2^-31
              = 2^62 < INT64_MAX
      
      Min possible value of 'next' is:
      
          (-2^31 + 1) / 2^-31 > -2^62 > INT64_MAX
      
      'Candidate' is assumed to be inside int32_t range, because it is
      initialized from the previous 'next' value. If it would be out of
      int32_t, it would become either bigger than 'buckets', which is
      int32_t, or < 0, and the cycle would stop on the previous
      iteration.
      
      The patch fixes the UBs, but guava() function is still broken.
      Because it returns values different than the function in the
      original paper from John Lamping and Eric Veach. Here is the
      correct function:
      
          int32_t
          correct_guava(uint64_t key, int32_t num_buckets)
          {
              int64_t b = -1, j = 0;
              while (j < num_buckets) {
                  b=j;
                  key = key * 2862933555777941757ULL + 1;
                  j = (b + 1) * ((double)(1LL << 31) / (double)((key >> 33) + 1));
              }
              return b;
          }
      
      correct_guava(6356101026326471242, 813154869) = 362866355,
                                  guava(<the same>) = 2.
      
      correct_guava(112571758688054605, 1355446793) = 907865430,
                                  guava(<the same>) = 907865431.
      
      And there are more examples. It means, guava() returns something
      with an unknown distrubution, consistency, and not described in
      the paper. That is a subject for a separate patch to deprecate
      this function and add a correct one.
      
      Part of #4609
      620de6d0
    • Vladislav Shpilevoy's avatar
      test: fix signed integer overflow in vclock test · 1cb078d1
      Vladislav Shpilevoy authored
      Vclock unit test had a test where all vclock components were close
      to INT64_MAX. The sum was bigger than INT64_MAX, and it caused
      overflow in vclock's signature.
      
      The patch reduces order of the LSNs in this test so their sum does
      not overflow anymore.
      
      Part of #4609
      1cb078d1
    • Vladislav Shpilevoy's avatar
      swim: fix zero division · 0e02bf5e
      Vladislav Shpilevoy authored
      If swim_cfg() fails during first configuration after the self
      member is created, it is deleted before return. Deletion of the
      member causes member update event addition to the dissemination
      queue. The event TTL depends on log2(member count), where the
      count is zero in the described situation. This leads to zero
      division according to clang sanitizer.
      
      The patch makes member deletion from the member table happen a
      little later, so the member event registration can't ever see the
      table empty.
      
      Part of #4609
      0e02bf5e
    • Vladislav Shpilevoy's avatar
      xrow: don't cast double to float unconditionally · 76787282
      Vladislav Shpilevoy authored
      Xrow update functions use function xrow_update_arith_make() to
      execute arithmetic operation. In case any operand is a floating
      point value, both are cast to double. Afterwards the result value
      tries to shrink to float if possible, to save space.
      
      The check whether the value fits float was done in a dumb but
      simple way:
      
          (double)(float)value == value
      
      This was done to see if the fractional part is not truncated. The
      integral part was checked against FLT_MAX and -FLT_MAX. But the
      latter check was done after the first check. That led to undefined
      behaviour if the |value| is bigger than |FLT_MAX|. This patch
      makes xrow update firstly check the integral bounds and then the
      fractional part.
      
      Alongside there is fixed a bug, when FLT_MAX value was considered
      double.
      
      Part of #4609
      76787282
    • Vladislav Shpilevoy's avatar
      vinyl: fix 0 division in case of canceled dump · cc7da0e5
      Vladislav Shpilevoy authored
      It can happen, that vinyl index dump is scheduled, but the index
      is dropped before the dump starts. In this case the dump task is
      finished with 0 dump time. The time then is used in a log message
      to obtain dump rate as byte count / dump time. Since dump time is
      0, this is undefined behaviour. Moreover, it was not even a dump,
      so nothing to log here.
      
      Also dump time is used to update throttling parameters in
      vy_regulator. Here if the dump took 0 time, there was no a dump,
      so it is just ignored.
      
      Part of #4609
      cc7da0e5
    • Vladislav Shpilevoy's avatar
      test: avoid usleep() usage for error injections · 0f953ab3
      Vladislav Shpilevoy authored
      Some error injections use usleep() to put the current thread in
      sleep. For example, to simulate, that WAL thread is slow.
      
      A few of them allowed to pass custom number of microseconds to
      usleep, in form:
      
          usleep(injection->dvalue * 1000000);
      
      Assuming, that dvalue is a number of seconds. But usleep argument
      is uint32_t, at least on Mac (it is useconds_t, whose size is 4).
      It means, that too big dvalue easily overflows it.
      
      The patch makes it use nanosleep(), in a new wrapper:
      thread_sleep(). It takes a double value, and does not truncate it
      to 4 bytes.
      
      The overflow was the case for ERRINJ_VY_READ_PAGE_TIMEOUT = 9000
      in test/vinyl/errinj_vylog.test.lua. And
      ERRINJ_VY_RUN_WRITE_STMT_TIMEOUT = 9000 in
      test/vinyl/errinj.test.lua.
      
      Part of #4609
      0f953ab3
    • Vladislav Shpilevoy's avatar
      util: introduce double_compare_nint64() · 1a08e786
      Vladislav Shpilevoy authored
      Utility module (util.h and util.c) offers a function
      double_compare_uint64() to compare double and uint64_t in a
      reliable way, without undefined behaviour, without losses, even if
      values are about 2^53 - 2^64.
      
      There was no a similar function to compare double and int64_t,
      which is needed when the right value is negative. To workaround it
      the right value (int64_t) was multiplied by -1 when it was
      negative to be able to use it in double_compare_uint64(). This led
      to undefined behaviour, when right value was INT64_MIN, because
      expression (uint64_t)-value did not help. Firstly -value was
      calculated, and then it was cast to uint64_t. The first step is
      detected by the sanitizer as undefined behaviour.
      
      Not counting, that it was slower than straightforward comparison
      of negative int64_t and a double value. Because involved
      additional negation.
      
      Besides, there was an expression:
      
          assert((uint64_t)(double)rhs == rhs || rhs > (uint64_t)EXP2_53);
      
      Rhs is a uint64_t. And there was hidden an undefined behaviour,
      when rhs is UINT64_MAX. The problem is that UINT64_MAX is
      2^64 - 1. And it can't be stored in a double value without
      precision loss. It is rounded up to 2^64 = UINT64_MAX + 1. This
      is what happens in "(double)rhs" expression. And when it is
      cast back to uint64_t: "(uint64_t)(double)rhs", it explodes.
      
      The patch fixes it by simply changing check order. If rhs is
      bigger than 2^53 (below this border all the integers can be
      represented), then the cast is not done.
      
      Part of #4609
      1a08e786
    • Vladislav Shpilevoy's avatar
      cmake: enable misc types of UB detection in clang · 325b678f
      Vladislav Shpilevoy authored
      Option ENABLE_UB_SANITIZER enables clang undefined behaviour
      sanitizer. So far the only UB to detect was alignment violation.
      This was the biggest problem found by the sanitizer. Now when it
      is fixed, most of the other types of UB are also turned on to fix
      them as well.
      
      There is a few of exceptions - pointer type overflow, vptr check,
      and all types of integer overflow and truncation.
      
      Pointer type overflow detection is disabled because it is abused
      in the source code a lot, by stailq data structure.
      
      Vptr sanitation is a runtime check ensuring that a pointer at a
      non-POD type really points at an object of this type, using RTTI.
      The check false-positively fails in alter.cc when AlterSpaceOp
      class objects are stored in an rlist, and the list is iterated
      using rlist_foreach_entry(). In the cycle there is a condition:
      
          &item->member != head
      
      In the end the 'item' points not at an AlterSpaceOp, but at the
      rlist head - offsetof(typeof(item), member), at an rlist
      structure. Despite 'item' is never dereferenced, clang anyway
      generates vptr check here, which of course fails. Note,
      '&item->member' does not dereference item. It is
      item + offsetof(typeof(item), member). Just another address a few
      bytes after item.
      
      Integer overflow and truncation are disabled because SQL uses
      int64_t variables as a container of values of range [INT64_MIN,
      UINT64_MAX]. This works because there is a flag 'is_neg' near
      each such value which tells how to interpret it - as negative
      int64_t, or as positive uint64_t. As a result, some operations
      lead to a false-positive overflow. For example, consider
      expr_code_int() function. It essentially can do this:
      
          int64_t value;
          ((uint64_t *)&value) = 9223372036854775808;
          value = -value;
      
      9223372036854775808 is -INT64_MIN. It can't be stored in int64_t.
      But the thing is that (uint64_t)9223372036854775808 is stored
      exactly like (int64_t)INT64_MIN, in binary form. So the expression
      "value = -value" looks perfectly valid:
      "value = -9223372036854775808", But in fact it is interpreted as
      "value = -(-9223372036854775808)".
      
      These integer overflow/truncation problems are going to be fixed
      in a separate commit due to big amount of changes needed for that.
      
      Part of #4609
      325b678f
    • Cyrill Gorcunov's avatar
      test: app-tap/logger -- test symbolic loglevels · d64de911
      Cyrill Gorcunov authored
      
      Part-of #689
      
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@gmail.com>
      d64de911
    • Cyrill Gorcunov's avatar
      lua/cfg: drop redundant variable · ef8a449b
      Cyrill Gorcunov authored
      
      No need for variable v, we use module_cfg itself.
      
      Part-of #689
      
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@gmail.com>
      ef8a449b
    • Cyrill Gorcunov's avatar
      lua/cfg: drop unused log methods · 1a6ad79e
      Cyrill Gorcunov authored
      
      The dynamic configuration of logging level and
      format now goes via log module. These functions
      are no longed needed.
      
      Part-of #689
      
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@gmail.com>
      1a6ad79e
    • Vladislav Shpilevoy's avatar
      small: bump version after .result file update · 87bb27b0
      Vladislav Shpilevoy authored
      lsregion.result was broken, and now is fixed in small's master.
      87bb27b0
  7. Jun 08, 2020
    • Vladislav Shpilevoy's avatar
      xrow: use unaligned store operation in xrow_to_iovec() · fe6b7d3c
      Vladislav Shpilevoy authored
      xrow_to_iovec() tried to save a uint32_t value by a not aligned
      address. The patch makes it use a special operation for that
      instead of regular assignment.
      
      Part of #4609
      fe6b7d3c
    • Vladislav Shpilevoy's avatar
      port: make port_c_entry not PACKED · e3a68709
      Vladislav Shpilevoy authored
      PACKED structures don't have padding between their members and
      after the structure (needed to be able to store them in an array).
      
      Port_c_entry was PACKED, since it does not have padding between
      its members anyway, and the padding in the end was not needed,
      because these objects are never stored in an array. As a result,
      sizeof(port_c_entry) was not aligned too.
      
      Appeared, that mempool, used to allocate port_c_entry objects,
      can't work correctly, when object size is not aligned at least by
      8 bytes. Because mempool does not do any alignment internally, and
      uses the free objects as a temporary storage for some metadata,
      requiring 8 byte alignment.
      
      The patch removes PACKED attribute from port_c_entry, so now its
      size is aligned by 8 bytes, and mempool works fine.
      
      Part of #4609
      e3a68709
    • Vladislav Shpilevoy's avatar
      tuple: use unaligned store-load for field map · 94345e4d
      Vladislav Shpilevoy authored
      A tuple can have a field map. It is an array of uint32_t values,
      stored right after 'struct tuple' object in the same memory block.
      
      'struct tuple' is 10 byte size, and is aligned by 4 bytes (even
      though it is 'PACKED', so does not need an alignment). It means,
      that field map is stored by an address like this: 4*N + 10.
      
      This address is never aligned by 4, needed for uint32_t field map
      array. So the array members should be accessed using operations
      aware of unaligned nature of the addresses.
      
      Unfortunately, it is impossible to make the field map aligned,
      because that requires + 2 bytes for each tuple. It is unaffordable
      luxury, since tuple count can be millions and even billions. Every
      new byte may result in gigabytes of memory in a cluster.
      
      The patch makes field map accessed using unaligned store-load
      operations.
      
      Part of #4609
      94345e4d
    • Vladislav Shpilevoy's avatar
      vinyl: align statements and bps tree extents · bef66d48
      Vladislav Shpilevoy authored
      Vinyl tuples (vy_stmt) in 0 level of LSM tree are stored in
      lsregion. They were allocated using lsregion_alloc(), which does
      not align its results, and is good only for byte arrays.
      
      As a result, vy_stmt object addresses in 0 LSM level were not
      aligned. Unaligned memory access is slower, and may even crash on
      some platforms.
      
      Besides, even aligned allocations couldn't help upserts in 0 level
      of the LSM tree, because upsert vy_stmt objects had 1 byte prefix
      to count merged upserts stored in this statement. This 1 byte
      prefix ruined all the alignment. Now the upsert counter is also
      aligned, the same as vy_stmt. Note, it does not consume
      significantly more memory, since is used only for vinyl and only
      for upserts, stored in 0 level of the LSM tree.
      
      The same about BPS tree extents. LSM 0 level is a BPS tree, whose
      blocks are allocated on lsregion. The extents are used as pointer
      arrays inside the tree, so they need alignof(void *) alignment.
      
      The mentioned unaligned accesses were revealed by clang undefined
      behaviour sanitizer, and are fixed by this patch.
      
      Part of #4609
      bef66d48
    • Vladislav Shpilevoy's avatar
      region: use aligned allocations where necessary · ea6b814e
      Vladislav Shpilevoy authored
      Region is used for temporary allocations, usually bound to a
      transaction, but not always. Keyword - temporary. It is usually
      much faster to allocate something on the region than on the heap,
      and much much faster to free the region, since it frees data in
      whole slabs. Lots of objects at once.
      
      Region is used both for byte arrays (MessagePack, strings, blobs),
      and for objects like standard types, structs. Almost always for
      allocations was used region_alloc() function, which returns a not
      aligned address. It can be anything, even not multiple of 2.
      
      That led to alignment violation for standard types and structs.
      Usually it is harmless in terms of errors, but can be slower than
      aligned access, and on some platforms may even crash. Also the
      crash can be forced using clang undefined behaviour sanitizer,
      which revealed all the not aligned accesses fixed in this patch.
      
      Part of #4609
      ea6b814e
    • Vladislav Shpilevoy's avatar
      sql: make BtCursor's memory aligned · 294dc053
      Vladislav Shpilevoy authored
      Vdbe at runtime allocates VdbeCursor structure using
      allocateCursor() function. Inside there is a pointer at BtCursor
      structure. To make the allocation faster and improve cache
      locality, both cursors are allocated in one memory block + some
      extra memory for uint32_t array, where BtCursor followed
      VdbeCursor and the array without any padding:
      
         VdbeCursor + uint32_t * N + BtCursor
      
      The problem is that BtCursor needs 8 byte alignment. When it
      followed VdbeCursor (aligned by 8) + some uint32_t values, its
      actual alignment could become 4 bytes. That led to a crash when
      alignment sanitizer is enabled in clang.
      
      The patch makes BtCursor offset aligned by 8 bytes.
      
      Part of #4609
      294dc053
    • Vladislav Shpilevoy's avatar
      crc32: align memory access · abef6986
      Vladislav Shpilevoy authored
      There is some assembly working with a byte array like with an
      array of unsigned long values. That is incorrect, because the byte
      array may be not aligned by 'unsigned long'.
      
      The patch makes crc32 calculate the hash on the prefix
      byte-by-byte, then word-by-word on aligned addresses, and again
      byte-by-byte on a tail which is less than word.
      
      Assuming that the word-by-word part is the longest, this should
      reduce number of memory/cache loads in ~x2 times. Because in case
      of a not aligned word load it was necessary to load 2 words, and
      then merge them into one. When addresses are aligned, this is only
      1 load.
      
      Part of #4609
      abef6986
    • Vladislav Shpilevoy's avatar
      cmake: add option ENABLE_UB_SANITIZER · 366cb668
      Vladislav Shpilevoy authored
      Clang has a built-in sanitizer for undefined behaviour. Such as
      wrong memory alignment, array boundaries violation, 0 division,
      bool values with non standard content, etc.
      
      The sanitizer emits runtime checks which lead to either crash, or
      a trap, or a warning print, depending on what is chosen.
      
      The patch makes it possible to turn the sanitizer on and catch
      UBs. The only supported UB so far is alignment check. Other types
      can be added gradually, along with fixing bugs which they find.
      
      The UB sanitizer is activated for ASAN builds in CI.
      
      Part of #4609
      366cb668
    • Vladislav Shpilevoy's avatar
      cmake: ignore warnings on alignof() and offsetof() · 635b5ad9
      Vladislav Shpilevoy authored
      Warning about invalid offsetof() (used on non-POD types) was set
      for g++, but wasn't for clang++.
      
      Warning about invalid alignof() (when expression is passed to it
      instead of a type) wasn't ignored, but is going to be very
      useful in upcoming unaligned memory access patches. That allows
      to write something like:
      
          struct some_long_type *object = region_aligned_alloc(
                  region, size, alignof(*object));
      
      This will work even if type of 'object' will change in future,
      and so it is safer. And shorter.
      
      Part of #4609
      635b5ad9
Loading