src/box/decimal.c · e1d961708504cd6f9f84dc7c8bb9f108eb5152e5 · core / tarantool

2 years ago

decimal: add the library into the module API · 5c1bc3da

The main decision made in this patch is how large the public
`box_decimal_t` type should be. Let's look at some calculations.

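We're interested in the following values, which are the columns of the
tables below.

* `digits`: how many decimal digits are stored.
* `sizeof`: size of the internal decimal type (`sizeof(decimal_t)`), in
  bytes.
* `string`: size of a buffer (including the terminating `'\0'`) that can
  store the string representation of any valid `decimal_t` value.
* `int???_t`: bit width of the largest signed integer type whose whole
  range is representable in `decimal_t`.
* `uint???_t`: bit width of the largest unsigned integer type whose whole
  range is representable in `decimal_t`.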
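The `string` value relies on the same assumption as the script at the end of this message: the longest representation is a negative number in scientific notation with all digits present and a nine-digit exponent, `-9.99...9E+999999999`. A minimal C sketch that builds this worst-case string and measures it; `string_buffer_size()` is a hypothetical helper, not part of tarantool:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: build the assumed worst-case string for a decimal
 * with `digits` significant digits, "-9.99...9E+999999999", and return the
 * buffer size it needs, including the terminating '\0'. */
static size_t string_buffer_size(int digits)
{
	char buf[512];
	char *p = buf;

	assert(digits >= 1 && digits < 400); /* keeps the string inside buf */
	*p++ = '-';                           /* sign */
	*p++ = '9';                           /* first digit */
	*p++ = '.';                           /* decimal point */
	memset(p, '9', (size_t)(digits - 1)); /* remaining digits */
	p += digits - 1;
	strcpy(p, "E+999999999");             /* 'E', exponent sign, 9 digits */
	return strlen(buf) + 1;
}
```

For 38 digits the string has 51 characters, so the buffer is 52 bytes, which matches `digits + 14`.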

Currently `decimal_t` is defined to store 38 decimal digits, which gives
the following values:

| digits | sizeof | string | int???_t | uint???_t |
| ------ | ------ | ------ | -------- | --------- |
| 38     | 36     | 52     | 127      | 126       |
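Where the 36 bytes come from can be checked with a C struct that mirrors the field order described in decNumber.h (`digits`, `exponent`, `bits`, then the `lsu` array of units). This sketch assumes the default `DECDPUN` of 3, which means 16-bit units, and a common ABI such as x86-64 or AArch64; `struct decimal38_mirror` is a hypothetical name used only for illustration, not the real `decimal_t`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of a decNumber built with 38 digits and DECDPUN = 3.
 * Each unit stores 3 digits in a uint16_t, so 38 digits need 13 units. */
#define MIRROR_DIGITS 38
#define MIRROR_DPUN 3
#define MIRROR_UNITS ((MIRROR_DIGITS + MIRROR_DPUN - 1) / MIRROR_DPUN)

struct decimal38_mirror {
	int32_t digits;             /* number of significant digits */
	int32_t exponent;           /* unadjusted exponent */
	uint8_t bits;               /* sign and special-value flags */
	uint16_t lsu[MIRROR_UNITS]; /* coefficient units, least significant first */
};

/* 4 + 4 + 1 + 1 byte of padding + 13 * 2 = 36 bytes on common ABIs. */
static_assert(sizeof(struct decimal38_mirror) == 36,
	      "unexpected struct layout on this ABI");
```

Going from 38 to 39 digits would not change this size, because both need 13 three-digit units.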

In fact, decNumber (the library we currently use under the hood) allows
varying the 'decimal digits per unit' parameter (`DECDPUN`), which is 3
by default, so we can choose the density of the representation. For
example, for the given 38 digits the sizeof is 36 bytes by default, but
it may vary from 28 to 47 bytes:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 38     | 36 (28-47) | 52     | 127      | 126       |
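The 28 to 47 byte range can be reproduced with a C port of the size model used by the script at the end of this message: a 9-byte header, padding up to the unit alignment, then `ceil(digits / DECDPUN)` units of 1, 2 or 4 bytes. Like the script, this sketch ignores any trailing padding a compiler may add to round the whole struct up to a multiple of 4, so for some `DECDPUN` values the real `sizeof` can be a few bytes larger. The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per unit for a given DECDPUN: uint8_t for 1-2 digits,
 * uint16_t for 3-4 digits, uint32_t for 5-9 digits. */
static size_t unit_size(int dpun)
{
	assert(dpun > 0 && dpun < 10);
	if (dpun <= 2)
		return 1;
	if (dpun <= 4)
		return 2;
	return 4;
}

/* int32_t digits + int32_t exponent + uint8_t bits, padding to the unit
 * alignment, then ceil(digits / dpun) units. Trailing padding is ignored. */
static size_t sizeof_decimal(int digits, int dpun)
{
	size_t us = unit_size(dpun);
	size_t units = (size_t)(digits + dpun - 1) / (size_t)dpun;

	return 4 + 4 + 1 + (us - 1) + us * units;
}

/* Smallest and largest model size over DECDPUN = 1..9. */
static void sizeof_range(int digits, size_t *min, size_t *max)
{
	*min = (size_t)-1;
	*max = 0;
	for (int dpun = 1; dpun <= 9; dpun++) {
		size_t s = sizeof_decimal(digits, dpun);

		if (s < *min)
			*min = s;
		if (s > *max)
			*max = s;
	}
}
```

The minimum for 38 digits comes from `DECDPUN` = 2 (19 one-byte units) and the maximum from `DECDPUN` = 1 (38 one-byte units).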

If we want to store the whole `int128_t` and `uint128_t` ranges, we'll
need 39 digits:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 39     | 36 (29-48) | 53     | 130      | 129       |
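The 39-digit requirement can be double-checked with exact integer arithmetic instead of logarithms. This sketch assumes GCC or Clang, because `unsigned __int128` is a compiler extension rather than standard C; the helper names are hypothetical. The whole `uintN_t` range ends at 2^N - 1, and the whole `intN_t` range needs the magnitude 2^(N-1) of its minimum value:

```c
#include <assert.h>

/* Count the decimal digits of an unsigned 128-bit value.
 * unsigned __int128 is a GCC/Clang extension, not standard C. */
static int count_digits_u128(unsigned __int128 v)
{
	int n = 1;

	while (v >= 10) {
		v /= 10;
		n++;
	}
	return n;
}

/* Digits needed for the whole uintN_t range, whose maximum is 2^N - 1. */
static int digits_for_uint(int bits)
{
	assert(bits >= 1 && bits <= 128);
	/* Shifting all-ones right avoids an undefined 1 << 128. */
	unsigned __int128 max = ~(unsigned __int128)0 >> (128 - bits);
	return count_digits_u128(max);
}

/* Digits needed for the whole intN_t range: the largest magnitude is
 * 2^(N-1), the absolute value of the minimum. */
static int digits_for_int(int bits)
{
	assert(bits >= 1 && bits <= 128);
	return count_digits_u128((unsigned __int128)1 << (bits - 1));
}
```

Both `digits_for_uint(128)` and `digits_for_int(128)` give 39, while 2^126, the magnitude needed by `int127_t`, still fits into 38 digits.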

If we want to store the whole `int256_t` and `uint256_t` ranges:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 78     | 62 (48-87) | 92     | 260      | 259       |

If we want to store the whole `int512_t` and `uint512_t` ranges:

| digits | sizeof       | string | int???_t | uint???_t |
| ------ | ------------ | ------ | -------- | --------- |
| 155    | 114 (84-164) | 169    | 515      | 514       |
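C has no standard 256-bit or 512-bit integer type, so the digit counts in the last two tables can be verified with a tiny decimal big number: repeated doubling of an array of base-10 digits. The table rows use the larger of the signed and unsigned requirements, which is the unsigned one. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>

/* Number of decimal digits in 2^n, computed exactly by repeated doubling
 * of a little-endian array of base-10 digits (no floating point). */
static int pow2_decimal_digits(int n)
{
	unsigned char d[256] = {1}; /* d[0] = 1, the rest are 0: the value 1 */
	int len = 1;

	assert(n >= 0 && n < 840); /* 2^839 has 253 digits, so d is big enough */
	for (int i = 0; i < n; i++) {
		int carry = 0;

		for (int j = 0; j < len; j++) {
			int v = d[j] * 2 + carry;

			d[j] = (unsigned char)(v % 10);
			carry = v / 10;
		}
		if (carry)
			d[len++] = (unsigned char)carry;
	}
	return len;
}

/* 2^N is never a power of ten for N >= 1, so 2^N - 1 has as many digits
 * as 2^N. Hence uintN_t needs pow2_decimal_digits(N) digits, and intN_t,
 * whose largest magnitude is 2^(N-1), needs pow2_decimal_digits(N - 1). */
static int digits_for_uint_n(int bits)
{
	assert(bits >= 1);
	return pow2_decimal_digits(bits);
}

static int digits_for_int_n(int bits)
{
	assert(bits >= 1);
	return pow2_decimal_digits(bits - 1);
}
```

For 256 bits this gives 77 (signed) and 78 (unsigned) digits; for 512 bits, 154 and 155.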

The decision here is what we consider possible and what we consider
unlikely. The patch freezes the maximum size of `decimal_t` at 64 bytes.
So we'll be able to store 256-bit integers, but will NOT be able to
store 512-bit integers in the future (at least not without an ABI
breakage).
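The 64-byte freeze can be expressed as a compile-time check. In this C11 sketch `public_decimal_t` is a hypothetical stand-in for an opaque public type of that size (the real declarations are in `src/box/decimal.h`), and the mirror structs approximate decNumber layouts with the default `DECDPUN` of 3 on a common ABI such as x86-64:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical public type: an opaque 64-byte, 8-byte-aligned buffer. */
typedef struct {
	uint64_t internal[8];
} public_decimal_t;

/* Hypothetical decNumber-like layouts with DECDPUN = 3 (uint16_t units). */
struct decimal78_mirror {  /* 78 digits: whole int256_t/uint256_t ranges */
	int32_t digits;
	int32_t exponent;
	uint8_t bits;
	uint16_t lsu[26];
};

struct decimal155_mirror { /* 155 digits: whole int512_t/uint512_t ranges */
	int32_t digits;
	int32_t exponent;
	uint8_t bits;
	uint16_t lsu[52];
};

/* The frozen ABI: the public size never changes... */
static_assert(sizeof(public_decimal_t) == 64, "public size is frozen");
/* ...so a future internal layout is acceptable only if it fits. */
static_assert(sizeof(struct decimal78_mirror) <= sizeof(public_decimal_t),
	      "256-bit ranges fit without an ABI break");

/* Store an internal value in the public buffer, zeroing unused bytes. */
static void store_public(public_decimal_t *dst,
			 const struct decimal78_mirror *src)
{
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, sizeof(*src));
}
```

On such ABIs the 78-digit mirror is exactly 64 bytes (62 bytes of fields plus trailing padding), while the 155-digit one needs 116 bytes and would not fit.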

The script used to calculate these tables is at the end of the commit
message.

Next, how else does the `box_decimal_*()` library differ from the
internal `decimal_*()` one?

* Added a structure that may hold any decimal value from any current or
  future tarantool version.
* Added `box_decimal_copy()`.
* Left `strtodec()` out of scope -- we can add it later.
* Left `decimal_str()` out of scope -- it looks dangerous without at
  least a good explanation of when the data in its static buffer is
  invalidated. Use `box_decimal_to_string()`, which writes to an
  explicitly provided buffer, instead.
* Added `box_decimal_mp_*()` for encoding to/decoding from msgpack.
  Unlike `mp_decimal.h` functions, here we always have `box_decimal_t`
  as the first parameter.
* Left `decimal_pack()` out of scope, because a user is unlikely to want
  to serialize a decimal value piece-by-piece.
* Exposed `decimal_unpack()` as `box_decimal_mp_decode_data()` to keep a
  consistent terminology around msgpack encoding/decoding.
* Added a more detailed API description, grouped by functionality.
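The concern about `decimal_str()` is the usual one for functions that return a pointer into a static buffer: the next call overwrites the previous result. This sketch shows it with hypothetical helpers that format plain `long` values (they are not part of the decimal API); the second helper follows the explicit-buffer style described above for `box_decimal_to_string()`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter in the style the patch avoids: the result lives
 * in a static buffer that the next call overwrites. */
static const char *format_static(long v)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "%ld", v);
	return buf;
}

/* Hypothetical formatter in the style the patch prefers: the caller owns
 * the buffer and its lifetime. Returns the length snprintf() wanted to
 * write, so the caller can detect a too small buffer. */
static int format_into(char *buf, size_t size, long v)
{
	return snprintf(buf, size, "%ld", v);
}
```

With `format_static()`, keeping two results at once silently fails, since both pointers refer to the same storage; with `format_into()`, each result stays valid for as long as its caller-owned buffer does.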

The script that helps to calculate the sizes related to `decimal_t`:

```lua
-- See notes in decNumber.h.
-- Needs math.log10(), available in Lua 5.1 and LuaJIT (used by tarantool).

-- DECDPUN: DECimal Digits Per UNit
local function unit_size(DECDPUN)
    assert(DECDPUN > 0 and DECDPUN < 10)
    if DECDPUN <= 2 then
        return 1 -- uint8_t units
    elseif DECDPUN <= 4 then
        return 2 -- uint16_t units
    end
    return 4 -- uint32_t units
end

local function sizeof_decimal_t(digits, DECDPUN)
    -- int32_t digits;
    -- int32_t exponent;
    -- uint8_t bits;
    -- <..padding..>
    -- <..units..>
    -- (Trailing padding of the whole struct is not counted.)
    local us = unit_size(DECDPUN)
    local padding = us - 1
    local unit_count = math.ceil(digits / DECDPUN)
    return 4 + 4 + 1 + padding + us * unit_count
end

local function string_buffer(digits)
    -- -9.{9...}E+999999999# (# is '\0')
    -- ^ ^      ^^^^^^^^^^^^
    return digits + 14
end

-- Largest N such that the whole intN_t range fits into `digits` digits.
local function binary_signed(digits)
    local x = 1
    while math.log10(2 ^ (x - 1)) < digits do
        x = x + 1
    end
    return x - 1
end

-- Largest N such that the whole uintN_t range fits into `digits` digits.
local function binary_unsigned(digits)
    local x = 1
    while math.log10(2 ^ x) < digits do
        x = x + 1
    end
    return x - 1
end

-- Digits needed to store the whole intN_t range.
local function digits_for_binary_signed(x)
    return math.ceil(math.log10(2 ^ (x - 1)))
end

-- Digits needed to store the whole uintN_t range.
local function digits_for_binary_unsigned(x)
    return math.ceil(math.log10(2 ^ x))
end

local function summary(digits)
    print('digits', digits)
    local sizeof_min = math.huge
    local sizeof_max = 0
    local DECDPUN_sizeof_min
    local DECDPUN_sizeof_max
    for DECDPUN = 1, 9 do
        local sizeof = sizeof_decimal_t(digits, DECDPUN)
        print('sizeof', sizeof, 'DECDPUN', DECDPUN)
        if sizeof < sizeof_min then
            sizeof_min = sizeof
            DECDPUN_sizeof_min = DECDPUN
        end
        if sizeof > sizeof_max then
            sizeof_max = sizeof
            DECDPUN_sizeof_max = DECDPUN
        end
    end
    print('sizeof min', sizeof_min, 'DECDPUN', DECDPUN_sizeof_min)
    print('sizeof max', sizeof_max, 'DECDPUN', DECDPUN_sizeof_max)
    print('string', string_buffer(digits))
    print('int???_t', binary_signed(digits))
    print('uint???_t', binary_unsigned(digits))
end

-- Print the data for the tables above: the current 38 digits, then the
-- digits needed for the whole 128, 256 and 512 bit integer ranges.
summary(38)
for _, bits in ipairs({128, 256, 512}) do
    summary(math.max(digits_for_binary_signed(bits),
                     digits_for_binary_unsigned(bits)))
end
```

Part of #7228

@TarantoolBot document
Title: Module API for decimals

See the declarations in `src/box/decimal.h` in tarantool sources.

Alexander Turenko authored 2 years ago
