Commit graph

14 commits

Author SHA1 Message Date
Timothy Flynn
274f8ee462 AK: Make hashing of UTF-16 strings cheaper
There is no need to iterate over every byte of the string; we can
iterate over the code units instead.

We must also actually record that we have cached the hash :^)
2025-08-07 02:05:50 +02:00
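
To illustrate the two points in this commit (hashing by code unit and remembering that the hash was cached), here is a minimal sketch in standard C++. The class name `CachedHashString`, its members, and the use of FNV-1a are assumptions made for illustration; none of them are taken from AK.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a UTF-16 string that caches its hash.
class CachedHashString {
public:
    explicit CachedHashString(std::u16string data)
        : m_data(std::move(data))
    {
    }

    uint32_t hash() const
    {
        if (!m_has_hash) {
            m_hash = compute_hash();
            // Without this line the hash would be recomputed on every call.
            m_has_hash = true;
        }
        return m_hash;
    }

    // Exposed only so the sketch can show the cache being hit.
    int times_computed() const { return m_times_computed; }

private:
    // FNV-1a over 16-bit code units (an assumption; not AK's hash function).
    uint32_t compute_hash() const
    {
        ++m_times_computed;
        uint32_t hash = 2166136261u;
        for (char16_t code_unit : m_data) {
            hash ^= code_unit;
            hash *= 16777619u;
        }
        return hash;
    }

    std::u16string m_data;
    mutable uint32_t m_hash { 0 };
    mutable bool m_has_hash { false };
    mutable int m_times_computed { 0 };
};
```

Calling `hash()` twice on the same object should leave `times_computed()` at 1. If the `m_has_hash = true` assignment is dropped, it would climb with every call.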
Timothy Flynn
1bc80848fb AK+LibWeb: Add a UTF-16 starts/ends with wrapper for a single code unit 2025-08-07 02:05:50 +02:00
Timothy Flynn
9e0b1bdfca AK: Add a parameter to to_number methods to change the parsed base
This just forwards through to AK::parse_number.
2025-08-07 02:05:50 +02:00
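
As a rough sketch of what a base parameter on a string-to-number conversion looks like, the following uses `std::from_chars` from the C++ standard library in place of AK::parse_number. The free function `to_number` and its signature are hypothetical; the real AK methods are members and may behave differently on edge cases.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical free function; the name mirrors the commit, the signature does not.
std::optional<int> to_number(std::u16string_view input, int base = 10)
{
    // Digits in any base up to 36 are ASCII, so narrow and reject anything else.
    std::string ascii;
    ascii.reserve(input.size());
    for (char16_t code_unit : input) {
        if (code_unit > 0x7F)
            return std::nullopt;
        ascii.push_back(static_cast<char>(code_unit));
    }

    int value = 0;
    char const* begin = ascii.data();
    char const* end = begin + ascii.size();

    // The base is simply forwarded to the underlying parser.
    auto result = std::from_chars(begin, end, value, base);

    // Require that the whole input was consumed without error.
    if (result.ec != std::errc {} || result.ptr != end)
        return std::nullopt;
    return value;
}
```

With this sketch, `to_number(u"ff", 16)` yields 255, while `to_number(u"12x")` yields an empty optional because of the trailing non-digit.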
Timothy Flynn
782f8c381c AK: Implement the spaceship operator for UTF-16 strings 2025-08-05 07:07:15 -04:00
Timothy Flynn
5af99f4dd0 AK: Allow Utf16StringBase to hold null data
This is required by JS::PropertyKey. This will also be needed when we
implement an Optional<Utf16String> specialization.
2025-08-05 07:07:15 -04:00
Timothy Flynn
21d7d236e6 AK: Add a method to check if a UTF-16 string contains any code point 2025-07-28 18:30:50 +02:00
Timothy Flynn
baddac5155 AK: Implement a method to split a UTF-16 string 2025-07-28 12:25:11 +02:00
Timothy Flynn
48a3b2c28e AK: Implement a method to count instances of a needle in a UTF-16 string 2025-07-28 12:25:11 +02:00
Timothy Flynn
6c73dff120 AK: Implement a UTF-16 method to check if a string is ASCII whitespace 2025-07-24 19:00:20 +02:00
Jelle Raaijmakers
15178d5230 AK: Add ::ends_with() to Utf16View and Utf16StringBase
I noticed that we can significantly simplify ::starts_with(), and based
the new ::ends_with() on that.
2025-07-24 07:18:25 -04:00
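
The symmetry between the two methods can be sketched with `std::u16string_view` standing in for Utf16View. This is an illustration of the general idea only, not the code from the commit.

```cpp
#include <cassert>
#include <string_view>

// Compare the first needle.size() code units of the haystack.
bool starts_with(std::u16string_view haystack, std::u16string_view needle)
{
    if (needle.size() > haystack.size())
        return false;
    return haystack.substr(0, needle.size()) == needle;
}

// Same shape as starts_with, but compare the last needle.size() code units.
bool ends_with(std::u16string_view haystack, std::u16string_view needle)
{
    if (needle.size() > haystack.size())
        return false;
    return haystack.substr(haystack.size() - needle.size()) == needle;
}
```

In this sketch an empty needle matches any haystack, since the compared substring is then empty as well.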
Timothy Flynn
d40e3af697 AK: Implement UTF-16 string-to-number conversions 2025-07-18 12:45:38 -04:00
Timothy Flynn
6e0290ecaa AK: Define some UTF-16 helper methods
* contains
* escape_html_entities
* replace
* to_ascii_lowercase
* to_ascii_uppercase
* to_ascii_titlecase
* trim
* trim_whitespace
2025-07-18 12:45:38 -04:00
Timothy Flynn
7f069efbc4 AK: Implement a flyweight string for Utf16String
Utf16FlyString works much the same way as FlyString. It stores the raw
encoded data of the string instance: if the string is a short ASCII
string, Utf16FlyString holds the ShortString bytes; otherwise, it holds
a pointer to the Utf16StringData.
2025-07-18 12:45:38 -04:00
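
The flyweight idea for the pointer-holding case can be sketched in standard C++ as below. `FlyString16` and its interning table are hypothetical names; the sketch omits the inline short-ASCII case and never removes table entries, both of which a real implementation would have to handle.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical flyweight string: equal values share a single table entry.
class FlyString16 {
public:
    explicit FlyString16(std::u16string const& value)
        // insert() returns the existing entry if the value is already interned.
        : m_data(&*table().insert(value).first)
    {
    }

    std::u16string const& view() const { return *m_data; }

    // Equal strings point at the same entry, so equality is a pointer comparison.
    bool operator==(FlyString16 const& other) const { return m_data == other.m_data; }
    bool operator!=(FlyString16 const& other) const { return m_data != other.m_data; }

private:
    // Pointers to unordered_set elements stay valid across rehashing.
    static std::unordered_set<std::u16string>& table()
    {
        static std::unordered_set<std::u16string> s_table;
        return s_table;
    }

    std::u16string const* m_data;
};
```

Two `FlyString16` objects built from the same text compare equal without looking at the characters, which is the main attraction of a flyweight string for things like property names.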
Timothy Flynn
fe676585f5 AK: Add a UTF-16 string with optimized short- and ASCII-string storage
This is a strictly UTF-16 string with some optimizations for ASCII.

* If created from a short UTF-8 or UTF-16 string that is also ASCII,
  then the string is stored in an inlined byte buffer.

* If created with a long UTF-8 or UTF-16 string that is also ASCII,
  then the string is stored in an outlined char buffer.

* If created with a short or long UTF-8 or UTF-16 string that is not
  ASCII, then the string is stored in an outlined char16 buffer.

We do not store short non-ASCII text in the inlined buffer to avoid
confusion with operations such as `length_in_code_units` and
`code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes
in short string form. But we still want `length_in_code_units` to
be 2, and `code_unit_at(0)` to be 0xD83D.
2025-07-18 12:45:38 -04:00
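
The three storage cases above can be summarized in a small sketch. The `Storage` enum, `choose_storage`, and especially the inline capacity `max_inline_length` are assumptions made for illustration; the commit message does not state the real capacity or the real type names.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

enum class Storage {
    InlineAscii,   // short and ASCII: inlined byte buffer
    OutlinedAscii, // long and ASCII: outlined char buffer
    OutlinedUtf16, // any non-ASCII text: outlined char16 buffer
};

// Assumed inline capacity, for illustration only.
constexpr std::size_t max_inline_length = 7;

constexpr bool is_ascii(std::u16string_view text)
{
    for (char16_t code_unit : text) {
        if (code_unit > 0x7F)
            return false;
    }
    return true;
}

constexpr Storage choose_storage(std::u16string_view text)
{
    // Non-ASCII text is never inlined, regardless of length.
    if (!is_ascii(text))
        return Storage::OutlinedUtf16;
    return text.size() <= max_inline_length ? Storage::InlineAscii : Storage::OutlinedAscii;
}

// U+1F600 is 4 bytes in UTF-8 but 2 code units in UTF-16.
static_assert(sizeof("\xF0\x9F\x98\x80") - 1 == 4);
static_assert(std::u16string_view(u"\U0001F600").size() == 2);
static_assert(std::u16string_view(u"\U0001F600")[0] == 0xD83D);
```

The `static_assert` lines restate the emoji example from the commit message: keeping non-ASCII text in a char16 buffer means the length in code units and the first code unit can be read directly, without decoding UTF-8 bytes.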