    decimal: add the library into the module API · 5c1bc3da
    Alexander Turenko authored
    The main decision made in this patch is how large the public
    `box_decimal_t` type should be. Let's look at some calculations.
    
    We're interested in the following values.
    
    * How many decimal digits are stored?
    * Size of the internal decimal type (`sizeof(decimal_t)`), in bytes.
    * Size of a buffer that can hold the string representation of any valid
      `decimal_t` value, including the terminating `'\0'`.
    * Largest signed integer type fully represented in `decimal_t`, as a
      number of bits (the `int???_t` column).
    * Largest unsigned integer type fully represented in `decimal_t`, as a
      number of bits (the `uint???_t` column).
    
    Currently `decimal_t` is defined to store 38 decimal digits, which gives
    the following values:
    
    | digits | sizeof | string | int???_t | uint???_t |
    | ------ | ------ | ------ | -------- | --------- |
    | 38     | 36     | 52     | 127      | 126       |
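
    A worked check of the two integer columns, assuming only that a value
    with $d$ digits has a magnitude of at most $10^d - 1$. An unsigned
    $N$-bit integer fits when $2^N - 1 \le 10^d - 1$; a signed one fits when
    its minimum $-2^{N-1}$ does, which buys one extra bit. Since
    $d / \log_{10} 2$ is never an integer, no boundary case arises:

    ```math
    N_{\mathrm{uint}} = \left\lfloor \frac{d}{\log_{10} 2} \right\rfloor,
    \qquad
    N_{\mathrm{int}} = N_{\mathrm{uint}} + 1,
    \qquad
    d = 38:\ \frac{38}{0.30103} \approx 126.2
    \ \Rightarrow\ N_{\mathrm{uint}} = 126,\ N_{\mathrm{int}} = 127.
    ```

    The same formula gives 129/130 bits for 39 digits, 259/260 for 78 digits,
    and 514/515 for 155 digits.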
    
    In fact, decNumber (the library we currently use under the hood) allows
    varying the 'decimal digits per unit' parameter, which is 3 by default,
    so we can choose the density of the representation. For example, for
    the given 38 digits the sizeof is 36 bytes by default, but it may vary
    from 28 to 47 bytes:
    
    | digits | sizeof     | string | int???_t | uint???_t |
    | ------ | ---------- | ------ | -------- | --------- |
    | 38     | 36 (28-47) | 52     | 127      | 126       |
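
    The sizeof column follows the layout model used by the script at the end
    of this message. This is a sketch of that model, not a restatement of
    decNumber.h: 4 + 4 + 1 bytes of header fields, padding up to the unit
    size $u$, then $\lceil d / D \rceil$ units, where $D$ is the number of
    decimal digits per unit and $u$ is 1, 2 or 4 bytes for $D \le 2$,
    $D \le 4$ and $D \ge 5$ respectively:

    ```math
    \mathrm{sizeof}(d, D) = 9 + (u - 1) + u \left\lceil \frac{d}{D} \right\rceil,
    \qquad
    \mathrm{sizeof}(38, 3) = 9 + 1 + 2 \cdot 13 = 36,\quad
    \mathrm{sizeof}(38, 2) = 9 + 0 + 1 \cdot 19 = 28,\quad
    \mathrm{sizeof}(38, 1) = 9 + 0 + 1 \cdot 38 = 47.
    ```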
    
    If we want to store the `int128_t` and `uint128_t` ranges, we need 39
    digits:
    
    | digits | sizeof     | string | int???_t | uint???_t |
    | ------ | ---------- | ------ | -------- | --------- |
    | 39     | 36 (29-48) | 53     | 130      | 129       |
    
    If we want to store the `int256_t` and `uint256_t` ranges, we need 78
    digits:
    
    | digits | sizeof     | string | int???_t | uint???_t |
    | ------ | ---------- | ------ | -------- | --------- |
    | 78     | 62 (48-87) | 92     | 260      | 259       |
    
    If we want to store the `int512_t` and `uint512_t` ranges, we need 155
    digits:
    
    | digits | sizeof       | string | int???_t | uint???_t |
    | ------ | ------------ | ------ | -------- | --------- |
    | 155    | 114 (84-164) | 169    | 515      | 514       |
    
    The decision here is what we consider possible and what we consider
    unlikely. The patch freezes the maximum size of `decimal_t` at 64 bytes.
    So we will be able to store 256-bit integers, but will NOT be able to
    store 512-bit integers in the future (at least not without breaking the
    ABI).
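
    The 64-byte decision can be checked with a small Lua sketch. It assumes
    the default of 3 decimal digits per unit and the same layout model as the
    script at the end of this message; `SIZE_LIMIT`, `digits_for_bits()` and
    `fits()` are names made up for this illustration only.

    ```lua
    -- Illustration only: hypothetical helpers, assuming the default of
    -- 3 decimal digits per unit and the script's decimal_t layout model.

    local SIZE_LIMIT = 64 -- bytes frozen for the public type
    local DIGITS_PER_UNIT = 3 -- decNumber default

    -- Bytes per unit: 1, 2 or 4 depending on digits per unit.
    local function unit_size(dpun)
        if dpun <= 2 then
            return 1
        elseif dpun <= 4 then
            return 2
        end
        return 4
    end

    -- Header (4 + 4 + 1), padding up to the unit size, then the units.
    local function sizeof_decimal_t(digits, dpun)
        local us = unit_size(dpun)
        return 4 + 4 + 1 + (us - 1) + us * math.ceil(digits / dpun)
    end

    -- Digits needed for both intN_t and uintN_t: the unsigned maximum
    -- 2^N - 1 is the larger magnitude, so it decides.
    local function digits_for_bits(n)
        return math.ceil(n * math.log(2) / math.log(10))
    end

    -- Returns whether N-bit integers fit, plus the digits and bytes needed.
    local function fits(n)
        local digits = digits_for_bits(n)
        local size = sizeof_decimal_t(digits, DIGITS_PER_UNIT)
        return size <= SIZE_LIMIT, digits, size
    end

    for _, n in ipairs({128, 256, 512}) do
        local ok, digits, size = fits(n)
        print(('%d bits: %d digits, %d bytes, fits: %s'):format(
            n, digits, size, tostring(ok)))
    end
    ```

    Under these assumptions, 128 bits needs 39 digits (36 bytes), 256 bits
    needs 78 digits (62 bytes), and 512 bits needs 155 digits (114 bytes), so
    only the last one exceeds the 64-byte limit.
    
    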
    
    The script that helps calculate those tables is at the end of the
    commit message.
    
    Next, how else does the `box_decimal_*()` library differ from the
    internal `decimal_*()` one?
    
    * Added a structure that may hold any decimal value from any current or
      future tarantool version.
    * Added `box_decimal_copy()`.
    * Left `strtodec()` out of scope -- we can add it later.
    * Left `decimal_str()` out of scope -- it looks dangerous without at
      least a good explanation of when the data in its static buffer is
      invalidated. There is `box_decimal_to_string()`, which writes to an
      explicitly provided buffer.
    * Added `box_decimal_mp_*()` for encoding to/decoding from msgpack.
      Unlike `mp_decimal.h` functions, here we always have `box_decimal_t`
      as the first parameter.
    * Left `decimal_pack()` out of scope, because a user is unlikely to want
      to serialize a decimal value piece-by-piece.
    * Exposed `decimal_unpack()` as `box_decimal_mp_decode_data()` to keep a
      consistent terminology around msgpack encoding/decoding.
    * More detailed API description, grouped by functionality.
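
    The `decimal_str()` concern is the usual static-buffer aliasing problem.
    A minimal Lua sketch of it follows; `to_string_static()` and
    `to_string_into()` are hypothetical stand-ins for a function that returns
    a shared buffer and one that writes to a caller-provided buffer, not
    Tarantool APIs.

    ```lua
    -- Hypothetical helpers for illustration only; not part of any Tarantool API.

    -- A single buffer shared by all calls, like a static array in C.
    local static_buf = { text = '' }

    -- Returns the shared buffer: each call overwrites what earlier callers got.
    local function to_string_static(value)
        static_buf.text = tostring(value)
        return static_buf
    end

    -- Writes into a buffer owned by the caller: results stay independent.
    local function to_string_into(value, buf)
        buf.text = tostring(value)
        return buf
    end

    local a = to_string_static(1.5)
    local b = to_string_static(2.5)
    -- `a` and `b` are the same table, so the first result is lost.
    print(a.text, b.text, a == b)

    local buf1, buf2 = {}, {}
    to_string_into(1.5, buf1)
    to_string_into(2.5, buf2)
    -- Each caller keeps its own result.
    print(buf1.text, buf2.text)
    ```

    In C, a pointer into a static buffer behaves the same way, which is why
    an explicit output buffer is the safer shape for a public API.
    
    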
    
    The script that helps calculate sizes around `decimal_t`:
    
    ```lua
    -- See notes in decNumber.h.

    -- DECDPUN: DECimal Digits Per UNit.
    -- Returns the size of one unit in bytes for the given DECDPUN.
    local function unit_size(DECDPUN)
        assert(DECDPUN > 0 and DECDPUN < 10)
        if DECDPUN <= 2 then
            return 1
        elseif DECDPUN <= 4 then
            return 2
        end
        return 4
    end

    -- Size of decimal_t in bytes for the given number of digits.
    function sizeof_decimal_t(digits, DECDPUN)
        -- int32_t digits;
        -- int32_t exponent;
        -- uint8_t bits;
        -- <..padding..>
        -- <..units..>
        local us = unit_size(DECDPUN)
        local padding = us - 1
        local unit_count = math.ceil(digits / DECDPUN)
        return 4 + 4 + 1 + padding + us * unit_count
    end

    -- Worst-case string buffer: the digits plus a sign, a decimal point,
    -- 'E', an exponent sign, 9 exponent digits and the terminating '\0'.
    function string_buffer(digits)
        -- -9.{9...}E+999999999# (# is '\0')
        -- ^ ^      ^^^^^^^^^^^^
        return digits + 14
    end

    -- Largest N such that the whole intN_t range fits into `digits`
    -- decimal digits.
    function binary_signed(digits)
        local x = 1
        while math.log10(2 ^ (x - 1)) < digits do
            x = x + 1
        end
        return x - 1
    end

    -- Largest N such that the whole uintN_t range fits into `digits`
    -- decimal digits.
    function binary_unsigned(digits)
        local x = 1
        while math.log10(2 ^ x) < digits do
            x = x + 1
        end
        return x - 1
    end

    -- Decimal digits needed for the whole intN_t range (N = x).
    function digits_for_binary_signed(x)
        return math.ceil(math.log10(2 ^ (x - 1)))
    end

    -- Decimal digits needed for the whole uintN_t range (N = x).
    function digits_for_binary_unsigned(x)
        return math.ceil(math.log10(2 ^ x))
    end

    -- Print sizeof for each DECDPUN, its min/max, the string buffer size
    -- and the integer widths for the given number of digits.
    function summary(digits)
        print('digits', digits)
        local sizeof_min = math.huge
        local sizeof_max = 0
        local DECDPUN_sizeof_min
        local DECDPUN_sizeof_max
        for DECDPUN = 1, 9 do
            local sizeof = sizeof_decimal_t(digits, DECDPUN)
            print('sizeof', sizeof, 'DECDPUN', DECDPUN)
            if sizeof < sizeof_min then
                sizeof_min = sizeof
                DECDPUN_sizeof_min = DECDPUN
            end
            if sizeof > sizeof_max then
                sizeof_max = sizeof
                DECDPUN_sizeof_max = DECDPUN
            end
        end
        print('sizeof min', sizeof_min, 'DECDPUN', DECDPUN_sizeof_min)
        print('sizeof max', sizeof_max, 'DECDPUN', DECDPUN_sizeof_max)
        print('string', string_buffer(digits))
        print('int???_t', binary_signed(digits))
        print('uint???_t', binary_unsigned(digits))
    end

    -- Print the figures for all the tables above.
    for _, digits in ipairs({38, 39, 78, 155}) do
        summary(digits)
    end
    ```
    
    Part of #7228
    
    @TarantoolBot document
    Title: Module API for decimals
    
    See the declarations in `src/box/decimal.h` in tarantool sources.