AK: Add a UTF-16 string with optimized short- and ASCII-string storage

This is a strictly UTF-16 string with some optimizations for ASCII.

* If created from a short UTF-8 or UTF-16 string that is also ASCII,
  then the string is stored in an inlined byte buffer.

* If created from a long UTF-8 or UTF-16 string that is also ASCII,
  then the string is stored in an outlined char buffer.

* If created from a UTF-8 or UTF-16 string of any length that is not
  ASCII, then the string is stored in an outlined char16 buffer.
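
The three cases above can be sketched as a small decision function. This is
only an illustration, not the AK implementation: `StorageKind`,
`choose_storage`, and `assumed_inline_capacity` are hypothetical names, and
the inline capacity of 7 code units is an assumption, since the real limit is
not stated here.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names for illustration; not taken from AK.
enum class StorageKind {
    InlineAscii,   // short ASCII: inlined byte buffer
    OutlinedAscii, // long ASCII: outlined char buffer
    OutlinedUtf16, // any non-ASCII: outlined char16 buffer
};

// Assumed inline capacity; the actual limit is not given in the commit message.
constexpr std::size_t assumed_inline_capacity = 7;

// A UTF-16 string is ASCII if every code unit is at most 0x7F.
constexpr bool is_ascii(std::u16string_view code_units)
{
    for (char16_t code_unit : code_units) {
        if (code_unit > 0x7F)
            return false;
    }
    return true;
}

constexpr StorageKind choose_storage(std::u16string_view code_units)
{
    // Non-ASCII text never goes inline, regardless of length.
    if (!is_ascii(code_units))
        return StorageKind::OutlinedUtf16;
    if (code_units.size() <= assumed_inline_capacity)
        return StorageKind::InlineAscii;
    return StorageKind::OutlinedAscii;
}
```

Under these assumptions, `choose_storage(u"hello")` selects the inline form,
while a single emoji selects the outlined char16 form even though it is short.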

We do not store short non-ASCII text in the inlined buffer, because
that would complicate operations such as `length_in_code_units` and
`code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes
in short string form, but we still want `length_in_code_units` to
be 2 and `code_unit_at(0)` to be 0xD83D.
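
The numbers in that example can be checked with standard C++ alone. The
snippet below is a standalone sketch that does not use AK types;
`emoji_utf8` and `emoji_utf16` are illustrative names.

```cpp
#include <cstddef>
#include <string_view>

// U+1F600 ("😀") spelled out as explicit UTF-8 bytes and as a UTF-16 literal.
constexpr std::string_view emoji_utf8 = "\xF0\x9F\x98\x80";
constexpr std::u16string_view emoji_utf16 = u"\U0001F600";

// UTF-8 needs four bytes for this code point.
static_assert(emoji_utf8.size() == 4);

// UTF-16 needs a surrogate pair: two code units, 0xD83D followed by 0xDE00.
static_assert(emoji_utf16.size() == 2);
static_assert(emoji_utf16[0] == 0xD83D);
static_assert(emoji_utf16[1] == 0xDE00);
```

An inline byte buffer holding the UTF-8 form would therefore have to
re-encode on every code unit access to report these values.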
Timothy Flynn 2025-06-12 19:29:41 -04:00 committed by Tim Flynn
parent 8fbb80fffc
commit fe676585f5
Notes: github-actions[bot] 2025-07-18 16:47:31 +00:00
17 changed files with 1527 additions and 44 deletions

@@ -89,9 +89,9 @@ WebIDL::ExceptionOr<void> CharacterData::replace_data(size_t offset, size_t coun
 Utf16Data full_data;
 full_data.ensure_capacity(before_data.length_in_code_units() + inserted_data_result.data.size() + after_data.length_in_code_units());
-full_data.append(before_data.span().data(), before_data.length_in_code_units());
+full_data.append(before_data.utf16_span().data(), before_data.length_in_code_units());
 full_data.extend(inserted_data_result.data);
-full_data.append(after_data.span().data(), after_data.length_in_code_units());
+full_data.append(after_data.utf16_span().data(), after_data.length_in_code_units());
 Utf16View full_view { full_data };
 bool characters_are_the_same = utf16_view == full_view;