Ivan Koptelov authored
If a UTF-8 string containing a '\0' symbol is passed to a built-in
function such as LIKE or LENGTH, the '\0' is assumed to be the end of
the string. This is incorrect. Let's fix it: treat '\0' as an ordinary
UTF-8 symbol and process strings containing it in full.
Consider the examples:

LENGTH(CHAR(65,00,65)) == 3
LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False

The patch also changes the way we count the length of UTF-8 strings.
Before, we processed each byte of the string. Now we use the following
algorithm. Starting from the first byte of the string, we determine
what kind of byte it is: the first byte of a 1, 2, 3 or 4 byte sequence.
Then we skip the corresponding number of bytes (e.g. 2 bytes for a
2 byte sequence) and increment the symbol length. If the current byte is
not a valid first byte of any sequence, we skip it and increment the
symbol length.
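
The counting algorithm above can be sketched roughly as follows. This is
an illustration only, not the code from the patch: the names
utf8_seq_len() and utf8_symbol_count() are hypothetical, and the string
is assumed to arrive with an explicit byte length, so an embedded '\0'
is counted like any other symbol.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes to skip for a sequence that
 * starts with byte c. A byte that is not a valid first byte of any
 * sequence is skipped on its own.
 */
static size_t
utf8_seq_len(unsigned char c)
{
	if (c < 0x80)
		return 1;	/* 1 byte sequence, includes '\0' */
	if ((c & 0xE0) == 0xC0)
		return 2;	/* first byte of a 2 byte sequence */
	if ((c & 0xF0) == 0xE0)
		return 3;	/* first byte of a 3 byte sequence */
	if ((c & 0xF8) == 0xF0)
		return 4;	/* first byte of a 4 byte sequence */
	return 1;		/* not a valid first byte: skip it */
}

/*
 * Hypothetical helper: count symbols in the first byte_len bytes of
 * str. The loop is bounded by byte_len, not by a terminating '\0'.
 */
static size_t
utf8_symbol_count(const char *str, size_t byte_len)
{
	assert(str != NULL || byte_len == 0);
	size_t count = 0;
	size_t pos = 0;
	while (pos < byte_len) {
		pos += utf8_seq_len((unsigned char)str[pos]);
		count++;
	}
	return count;
}
```

With this sketch, utf8_symbol_count("A\0A", 3) yields 3, which matches
the LENGTH(CHAR(65,00,65)) example. Only first bytes are inspected, so
the loop makes one step per symbol rather than one step per byte.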

Note that the new approach might improve the performance of LENGTH(),
INSTR() and TRIM().

Closes #3542

@TarantoolBot document
Title: null-term is now treated as a usual character in str funcs
User-visible behavior of SQL functions dealing with strings
changes as described in the commit message.
f4f57645