Commit f4f57645 authored by Ivan Koptelov

sql: make built-in funcs treat '\0' as a usual symbol

If a UTF-8 string containing a '\0' symbol is passed to a built-in
function such as LIKE or LENGTH, the '\0' is assumed to be the end of
the string. This behavior is inappropriate. Let's fix it: treat '\0'
as an ordinary UTF-8 symbol and process strings containing it in full.
Consider these examples:

LENGTH(CHAR(65,00,65)) == 3
LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False

The patch also changes the way we count the length of UTF-8 strings.
Previously we processed each byte of the string. Now we use the
following algorithm. Starting from the first byte of the string, we
determine what kind of byte it is: the first byte of a 1-, 2-, 3- or
4-byte sequence. Then we skip the corresponding number of bytes (e.g. 2
bytes for a 2-byte sequence) and increment the symbol count by one. If
the current byte is not a valid first byte of any sequence, then we skip
that single byte and increment the symbol count by one.
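
The counting algorithm described above can be sketched roughly as
follows. This is only an illustration, not the code from the patch:
the function names utf8_seq_len and utf8_char_count are hypothetical,
and the sketch assumes the caller passes the byte length explicitly,
so that '\0' is not treated as a terminator.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): number of bytes implied
 * by a UTF-8 lead byte. Any byte that is not a valid lead byte is
 * treated as a 1-byte symbol of its own.
 */
static size_t
utf8_seq_len(unsigned char c)
{
	if (c < 0x80)
		return 1;	/* 0xxxxxxx: ASCII, including '\0' */
	if ((c & 0xE0) == 0xC0)
		return 2;	/* 110xxxxx */
	if ((c & 0xF0) == 0xE0)
		return 3;	/* 1110xxxx */
	if ((c & 0xF8) == 0xF0)
		return 4;	/* 11110xxx */
	return 1;		/* not a valid first byte: skip it alone */
}

/*
 * Hypothetical counter: walks byte_len bytes of str, skipping a whole
 * sequence per step and counting one symbol each time.
 */
static size_t
utf8_char_count(const char *str, size_t byte_len)
{
	size_t count = 0;
	size_t i = 0;
	while (i < byte_len) {
		i += utf8_seq_len((unsigned char)str[i]);
		count++;
	}
	return count;
}
```

Under these assumptions, the three bytes 65, 0, 65 count as three
symbols, which matches the LENGTH(CHAR(65,00,65)) example above. Only
lead bytes are inspected, so a multi-byte sequence costs one step
rather than one step per byte.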

Note that the new approach might improve the performance of LENGTH(),
INSTR() and TRIM().

Closes #3542

@TarantoolBot document
Title: null terminator is now treated as a usual character in string functions
User-visible behavior of SQL functions dealing with strings
changes as described in the commit message.
parent aa16acc2