-
Vladimir Davydov authored
Historically, we encode strings that contain invalid or non-printable utf-8 sequences in YAML as binary base64 blobs. We do that because of limitations/bugs of the YAML encoder, which refuses to encode invalid utf-8 strings. To work around this issue, we introduced the helper utf8_check_printable, which is basically a copy of yaml_check_utf8, and treat strings for which it fails as binary data (MP_BIN). This commit updates the YAML submodule to the version where all known issues with encoding invalid/unprintable utf-8 strings are fixed and removes special treatment of such strings (drops utf8_check_printable). Now unprintable or invalid utf-8 sequences are emitted as code points, e.g. '\xFF' or '\uFFFF'. This change is a pre-requisite for introducing the new varbinary type to Lua. Without it plain strings would be implicitly converted to varbinary after decoding/encoding them in YAML, which would be confusing. Closes #8756 NO_DOC=bug fix (cherry picked from commit 890a821c)
Vladimir Davydov authoredHistorically, we encode strings that contain invalid or non-printable utf-8 sequences in YAML as binary base64 blobs. We do that because of limitations/bugs of the YAML encoder, which refuses to encode invalid utf-8 strings. To work around this issue, we introduced the helper utf8_check_printable, which is basically a copy of yaml_check_utf8, and treat strings for which it fails as binary data (MP_BIN). This commit updates the YAML submodule to the version where all known issues with encoding invalid/unprintable utf-8 strings are fixed and removes special treatment of such strings (drops utf8_check_printable). Now unprintable or invalid utf-8 sequences are emitted as code points, e.g. '\xFF' or '\uFFFF'. This change is a pre-requisite for introducing the new varbinary type to Lua. Without it plain strings would be implicitly converted to varbinary after decoding/encoding them in YAML, which would be confusing. Closes #8756 NO_DOC=bug fix (cherry picked from commit 890a821c)
gh-8756-yaml-unprintable-utf8-encoding-fix.md 232 B
bugfix/core
- Eliminated implicit conversion of unprintable utf-8 strings to binary blobs
when encoded in YAML. Now unprintable characters are encoded as escaped utf-8
code points, for example,
\x80
or\u200B
(gh-8756).