Alexander Turenko authored
The main decision made in this patch is how large the public `box_decimal_t` type should be. Let's look at some calculations.

We're interested in the following values.

* How many decimal digits are stored?
* Size of the internal decimal type (`sizeof(decimal_t)`).
* Size of a buffer that can hold the string representation of any valid `decimal_t` value.
* Largest signed integer type fully representable in `decimal_t` (number of bits).
* Largest unsigned integer type fully representable in `decimal_t` (number of bits).

Now `decimal_t` is defined to store 38 decimal digits. This gives the following values:

| digits | sizeof | string | int???_t | uint???_t |
| ------ | ------ | ------ | -------- | --------- |
| 38     | 36     | 52     | 127      | 126       |

In fact, decNumber (the library we currently use under the hood) allows varying the 'decimal digits per unit' parameter, which is 3 by default, so we can choose the density of the representation. For example, for the given 38 digits the sizeof is 36 by default, but it may vary from 28 to 47 bytes:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 38     | 36 (28-47) | 52     | 127      | 126       |

If we want to store the `int128_t` and `uint128_t` ranges, we need 39 digits:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 39     | 36 (29-48) | 53     | 130      | 129       |

If we want to store the `int256_t` and `uint256_t` ranges:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 78     | 62 (48-87) | 92     | 260      | 259       |

If we want to store the `int512_t` and `uint512_t` ranges:

| digits | sizeof       | string | int???_t | uint???_t |
| ------ | ------------ | ------ | -------- | --------- |
| 155    | 114 (84-164) | 169    | 515      | 514       |

The decision here is what we consider possible and what we consider unlikely. The patch freezes the maximum number of bytes in `decimal_t` at 64.
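
The `sizeof` column and the 64-byte limit can be cross-checked with a small C sketch. It mirrors the formula from the script at the end of this message. The names `unit_size()`, `sizeof_decimal()`, `fits_public_size()` and `PUBLIC_DECIMAL_SIZE` are illustrative only and are not part of tarantool or decNumber, and the field layout is the one assumed by that script rather than measured on a real build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name for the frozen public size, in bytes. */
enum { PUBLIC_DECIMAL_SIZE = 64 };

/* Bytes occupied by one unit for a given 'digits per unit' value. */
size_t
unit_size(int dpun)
{
	assert(dpun > 0 && dpun < 10);
	if (dpun <= 2)
		return 1;
	if (dpun <= 4)
		return 2;
	return 4;
}

/*
 * Estimated size of the internal type. Assumed layout:
 * int32_t digits, int32_t exponent, uint8_t bits,
 * padding up to the unit alignment, then the units.
 */
size_t
sizeof_decimal(int digits, int dpun)
{
	size_t us = unit_size(dpun);
	size_t padding = us - 1;
	/* Integer ceil(digits / dpun). */
	size_t unit_count = (size_t)((digits + dpun - 1) / dpun);
	return 4 + 4 + 1 + padding + us * unit_count;
}

/* Non-zero if the estimated size fits into the frozen public size. */
int
fits_public_size(int digits, int dpun)
{
	return sizeof_decimal(digits, dpun) <= PUBLIC_DECIMAL_SIZE;
}
```

With the default 3 digits per unit, `sizeof_decimal(78, 3)` gives 62, which fits into 64 bytes, while `sizeof_decimal(155, 3)` gives 114, which does not.
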
So we will be able to store 256-bit integers and will NOT be able to store 512-bit integers in the future (at least not without ABI breakage).

The script that helps to calculate those tables is at the end of the commit message.

Next, how else does the `box_decimal_*()` library differ from the internal `decimal_*()`?

* Added a structure that may hold any decimal value from any current or future tarantool version.
* Added `box_decimal_copy()`.
* Left `strtodec()` out of scope -- we can add it later.
* Left `decimal_str()` out of scope -- it looks dangerous without at least a good explanation of when the data in the static buffer are invalidated. There is `box_decimal_to_string()` that writes to an explicitly provided buffer.
* Added `box_decimal_mp_*()` for encoding to/decoding from msgpack. Unlike the `mp_decimal.h` functions, here we always have `box_decimal_t` as the first parameter.
* Left `decimal_pack()` out of scope, because a user is unlikely to want to serialize a decimal value piece by piece.
* Exposed `decimal_unpack()` as `box_decimal_mp_decode_data()` to keep the terminology around msgpack encoding/decoding consistent.
* More detailed API description, grouped by functionality.

The script that helps to calculate sizes around `decimal_t`:

```lua
-- See notes in decNumber.h.

-- DECDPUN: DECimal Digits Per UNit
-- Returns the size of one unit in bytes.
local function unit_size(DECDPUN)
    assert(DECDPUN > 0 and DECDPUN < 10)
    if DECDPUN <= 2 then
        return 1
    elseif DECDPUN <= 4 then
        return 2
    end
    return 4
end

function sizeof_decimal_t(digits, DECDPUN)
    -- int32_t digits;
    -- int32_t exponent;
    -- uint8_t bits;
    -- <..padding..>
    -- <..units..>
    local us = unit_size(DECDPUN)
    local padding = us - 1
    local unit_count = math.ceil(digits / DECDPUN)
    return 4 + 4 + 1 + padding + us * unit_count
end

function string_buffer(digits)
    -- -9.{9...}E+999999999# (# is '\0')
    -- ^ ^      ^^^^^^^^^^^^
    return digits + 14
end

-- Largest N such that the whole intN_t range fits into the given
-- number of decimal digits.
function binary_signed(digits)
    local x = 1
    while math.log10(2 ^ (x - 1)) < digits do
        x = x + 1
    end
    return x - 1
end

-- Largest N such that the whole uintN_t range fits into the given
-- number of decimal digits.
function binary_unsigned(digits)
    local x = 1
    while math.log10(2 ^ x) < digits do
        x = x + 1
    end
    return x - 1
end

-- Decimal digits needed to hold the whole intX_t range.
function digits_for_binary_signed(x)
    return math.ceil(math.log10(2 ^ (x - 1)))
end

-- Decimal digits needed to hold the whole uintX_t range.
function digits_for_binary_unsigned(x)
    return math.ceil(math.log10(2 ^ x))
end

-- Print one table row, plus sizeof for every DECDPUN from 1 to 9.
function summary(digits)
    print('digits', digits)
    local sizeof_min = math.huge
    local sizeof_max = 0
    local DECDPUN_sizeof_min
    local DECDPUN_sizeof_max
    for DECDPUN = 1, 9 do
        local sizeof = sizeof_decimal_t(digits, DECDPUN)
        print('sizeof', sizeof, 'DECDPUN', DECDPUN)
        if sizeof < sizeof_min then
            sizeof_min = sizeof
            DECDPUN_sizeof_min = DECDPUN
        end
        if sizeof > sizeof_max then
            sizeof_max = sizeof
            DECDPUN_sizeof_max = DECDPUN
        end
    end
    print('sizeof min', sizeof_min, 'DECDPUN', DECDPUN_sizeof_min)
    print('sizeof max', sizeof_max, 'DECDPUN', DECDPUN_sizeof_max)
    print('string', string_buffer(digits))
    print('int???_t', binary_signed(digits))
    print('uint???_t', binary_unsigned(digits))
end
```

Part of #7228

@TarantoolBot document
Title: Module API for decimals

See the declarations in `src/box/decimal.h` in tarantool sources.
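
For orientation, the idea of a public structure that is large enough for any current or future internal representation might look roughly like the sketch below. Everything in it is hypothetical: the `demo_` names, the internal layout (38 digits at 3 digits per 2-byte unit) and the helper functions are made up for illustration and are not the declarations from `src/box/decimal.h`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical public type: 64 opaque bytes with 8-byte alignment. */
typedef struct {
	uint64_t internal[8];
} demo_box_decimal_t;

/* Hypothetical internal layout, assumed for this sketch only. */
struct demo_decimal {
	int32_t digits;
	int32_t exponent;
	uint8_t bits;
	uint16_t units[13]; /* ceil(38 / 3) units */
};

/* The build fails if the internal type outgrows the public storage. */
_Static_assert(sizeof(struct demo_decimal) <= sizeof(demo_box_decimal_t),
	       "internal decimal must fit into the public storage");

/* Place an internal value into the public storage. */
void
demo_store(demo_box_decimal_t *dst, const struct demo_decimal *src)
{
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, sizeof(*src));
}

/* Read an internal value back from the public storage. */
void
demo_load(struct demo_decimal *dst, const demo_box_decimal_t *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Copy the whole public storage, whatever it currently holds. */
demo_box_decimal_t *
demo_box_decimal_copy(demo_box_decimal_t *dest, const demo_box_decimal_t *src)
{
	memcpy(dest, src, sizeof(*dest));
	return dest;
}
```

Because callers only ever see the fixed 64-byte type, the internal layout can grow up to that limit without changing the size that modules were compiled against.
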