AK: Add AllowSurrogates to UTF-8 validator

The [UTF-8](https://datatracker.ietf.org/doc/html/rfc3629#page-5)
standard says to reject strings containing upper or lower surrogates
(U+D800 through U+DFFF). However, many other standards, ECMAScript
included, allow unpaired surrogates in strings, so the UTF-8 encodings
of those strings can also contain surrogates. This commit therefore
extends the UTF-8 validation API with `AllowSurrogates`, an option that
controls whether upper and lower surrogate characters are accepted or
rejected.
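As a minimal sketch, assuming nothing about AK's actual implementation, a validator can carry such an option through its decoding loop and check each decoded code point against the surrogate range. The `AllowSurrogates` enum and `validate_utf8` below are hypothetical stand-ins written in standard C++, not AK's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for the option described above; not AK's declaration.
enum class AllowSurrogates { Yes, No };

// Hypothetical validator: returns true if `bytes` is well-formed UTF-8.
// With AllowSurrogates::No, the 3-byte encodings of U+D800..U+DFFF
// (ED A0 80 .. ED BF BF) are rejected, as RFC 3629 requires.
bool validate_utf8(std::string_view bytes, AllowSurrogates allow_surrogates)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        auto lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) { // ASCII
            ++i;
            continue;
        }
        std::size_t length = 0;
        std::uint32_t code_point = 0;
        if ((lead & 0xE0) == 0xC0) {
            length = 2;
            code_point = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;
            code_point = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;
            code_point = lead & 0x07;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (bytes.size() - i < length)
            return false; // truncated sequence
        for (std::size_t k = 1; k < length; ++k) {
            auto cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                return false; // not a continuation byte
            code_point = (code_point << 6) | (cont & 0x3F);
        }
        // Smallest code point that legitimately needs each sequence length.
        static constexpr std::uint32_t min_for_length[] = { 0, 0, 0x80, 0x800, 0x10000 };
        if (code_point < min_for_length[length] || code_point > 0x10FFFF)
            return false; // overlong encoding or beyond Unicode range
        if (allow_surrogates == AllowSurrogates::No && code_point >= 0xD800 && code_point <= 0xDFFF)
            return false; // upper or lower surrogate
        i += length;
    }
    return true;
}
```

With `AllowSurrogates::No`, the bytes `ED A0 80` (U+D800) are rejected; with `AllowSurrogates::Yes` they pass. Overlong forms such as `C0 80` are rejected either way, so relaxing the surrogate check does not relax the rest of the validation.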
Diego 2024-06-07 07:25:39 -07:00 committed by Ali Mohammad Pur
commit 7560b640f3
Notes: sideshowbarker 2024-07-17 03:35:24 +09:00
3 changed files with 21 additions and 8 deletions

```diff
@@ -105,11 +105,12 @@ ErrorOr<String> Utf16View::to_utf8(AllowInvalidCodeUnits allow_invalid_code_unit
             TRY(builder.try_append_code_point(static_cast<u32>(*ptr)));
         }
-    } else {
-        for (auto code_point : *this)
-            TRY(builder.try_append_code_point(code_point));
+        return builder.to_string_without_validation();
     }
+    for (auto code_point : *this)
+        TRY(builder.try_append_code_point(code_point));
     return builder.to_string();
 }
```
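The hunk above contains a `return builder.to_string_without_validation();` path next to the loop that appends raw code units (`static_cast<u32>(*ptr)`). One plausible reason, given the commit message, is that appending unpaired surrogates as code points yields UTF-8 bytes that a surrogate-rejecting validator would refuse. The following standalone C++ sketch (not AK; `append_utf8` and `utf16_to_utf8_allowing_lone_surrogates` are hypothetical names) shows such a conversion in the style of WTF-8: valid pairs become one 4-byte sequence, while a lone U+D800 becomes `ED A0 80`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: appends the UTF-8 encoding of `code_point` to `out`.
// Surrogate values (U+D800..U+DFFF) are encoded like any other BMP value.
void append_utf8(std::string& out, std::uint32_t code_point)
{
    if (code_point < 0x80) {
        out += static_cast<char>(code_point);
    } else if (code_point < 0x800) {
        out += static_cast<char>(0xC0 | (code_point >> 6));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    } else if (code_point < 0x10000) {
        out += static_cast<char>(0xE0 | (code_point >> 12));
        out += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (code_point >> 18));
        out += static_cast<char>(0x80 | ((code_point >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    }
}

// Hypothetical converter: combines valid surrogate pairs into one
// supplementary code point and passes unpaired surrogates through unchanged,
// so the result may fail a validator that rejects surrogates.
std::string utf16_to_utf8_allowing_lone_surrogates(std::vector<char16_t> const& units)
{
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::uint32_t unit = units[i];
        bool is_upper = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_upper && i + 1 < units.size()) {
            std::uint32_t next = units[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                append_utf8(out, 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i; // consumed the lower surrogate too
                continue;
            }
        }
        append_utf8(out, unit); // BMP character or unpaired surrogate
    }
    return out;
}
```

For example, the pair `D83D DE00` becomes `F0 9F 98 80` (U+1F600), which passes strict validation, while `{ 'A', D800 }` becomes `41 ED A0 80`, which only passes when surrogates are allowed.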