Files · f4f5764568477edf4d71f02d23f529c1516735f6 · core / tarantool

sql: make built-in funcs treat '\0' as a usual symbol

Ivan Koptelov authored 6 years ago

If a UTF-8 string passed to a built-in function such as LIKE or LENGTH
contains a '\0' character, that character is treated as the end of the
string. This behavior is inappropriate. Let's fix it: treat '\0' as an
ordinary UTF-8 character and process strings containing it in full.
Consider the following examples:

LENGTH(CHAR(65,00,65)) == 3
LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False

The patch also changes how the length of a UTF-8 string is counted.
Previously, each byte of the string was processed. Now the following
algorithm is used. Starting from the first byte of the string, we
determine what kind of byte it is: the first byte of a 1-, 2-, 3- or
4-byte sequence. We then skip the corresponding number of bytes (e.g. 2
bytes for a 2-byte sequence) and increment the character count by one.
If the current byte is not a valid first byte of any sequence, we skip
that single byte and increment the character count.

Note that the new approach may improve the performance of LENGTH(),
INSTR() and TRIM().

Closes #3542

@TarantoolBot document
Title: null-term is now treated as a usual character in str funcs
The user-visible behavior of SQL functions dealing with strings
changes as described in the commit message.
