This is a strictly UTF-16 string with some optimizations for ASCII.
* If created from a short UTF-8 or UTF-16 string that is also ASCII,
then the string is stored in an inlined byte buffer.
* If created with a long UTF-8 or UTF-16 string that is also ASCII,
then the string is stored in an outlined char buffer.
* If created with a short or long UTF-8 or UTF-16 string that is not
ASCII, then the string is stored in an outlined char16 buffer.
We do not store short non-ASCII text in the inlined buffer to avoid
confusion with operations such as `length_in_code_units` and
`code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes
in short string form. But we still want `length_in_code_units` to
be 2, and `code_unit_at(0)` to be 0xD83D.
To prepare for an upcoming Utf16String, this migrates Utf16View to store
its data as a char16_t. Most function definitions are moved inline and
made constexpr.
This also adds a UDL to construct a Utf16View from a string literal:
auto string = u"hello"sv;
This let's us remove the NTTP Utf16View constructor, as we have found
that such constructors bloat binary size quite a bit.
When the cursor was positioned at the end of text,
attempting to move it left(using the left arrow key)
would fail because align_boundary() was rejecting
the end-of-text position as a valid boundary.
In the UTF-8 implementation, this prevents out-of-bounds access of the
underlying text data, as the ICU macro would essentially do something
akin to `text[text.length()]`.
The UTF-16 implementation already checks for out-of-bounds, but would
previously return 0. We now return an empty Optional in both impls. This
doesn't affect LibJS (the user of the UTF-16 impl), as it already does
bounds checking before invoking LibUnicode APIs.