Commit graph

18 commits

Author SHA1 Message Date
Timothy Flynn
d9502505c2 AK: Fix bounds assertions in Utf16View::iterator_offset 2025-07-28 18:30:50 +02:00
Timothy Flynn
67723ef83c AK: Add a method to peek ahead of a UTF-16 iterator 2025-07-28 18:30:50 +02:00
Timothy Flynn
21d7d236e6 AK: Add a method to check if a UTF-16 string contains any code point 2025-07-28 18:30:50 +02:00
Timothy Flynn
ed63a60247 AK: Return an empty optional when UTF-16 code unit lookup fails
Accidentally returned the wrong type here.
2025-07-28 12:25:11 +02:00
Timothy Flynn
baddac5155 AK: Implement a method to split a UTF-16 string 2025-07-28 12:25:11 +02:00
Timothy Flynn
48a3b2c28e AK: Implement a method to count instances of a needle in a UTF-16 string 2025-07-28 12:25:11 +02:00
Timothy Flynn
6c73dff120 AK: Implement a UTF-16 method to check if a string is ASCII whitespace 2025-07-24 19:00:20 +02:00
Jelle Raaijmakers
15178d5230 AK: Add ::ends_with() to Utf16View and Utf16StringBase
I noticed that we can significantly simplify ::starts_with(), and based
the new ::ends_with() on that.
2025-07-24 07:18:25 -04:00
Timothy Flynn
ad7ac679fd AK: Compute Utf16View::code_point_offset_of correctly
There were a couple of issues here, including the following computation
could actually overflow to NumericLimits<size_t>::max():

    code_unit_offset -= it.length_in_code_units();
2025-07-22 17:17:33 +02:00
Timothy Flynn
f595e47c1f AK: Add unit tests for Utf16View::code_unit_offset_of 2025-07-22 17:17:33 +02:00
Timothy Flynn
9582895759 AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String 2025-07-18 12:45:38 -04:00
Timothy Flynn
6e0290ecaa AK: Define some UTF-16 helper methods
* contains
* escape_html_entities
* replace
* to_ascii_lowercase
* to_ascii_uppercase
* to_ascii_titlecase
* trim
* trim_whitespace
2025-07-18 12:45:38 -04:00
Timothy Flynn
8fbb80fffc AK: Do not fall back to simdutf for UTF-16 ASCII validation
This was a mistake. Consider U+201C (LEFT DOUBLE QUOTATION MARK). This
code point is encoded as the bytes 0x1c 0x20 in UTF-16LE. Both of these
bytes are ASCII if interpreted as UTF-8. But the string itself is most
certainly not ASCII.
2025-07-18 12:45:38 -04:00
Timothy Flynn
9fc3e72db2 AK+Everywhere: Allow lonely UTF-16 surrogates by default
By definition, the web allows lonely surrogates by default. Let's have
our string APIs reflect this, so we don't have to pass an allow option
all over the place.
2025-07-03 09:51:56 -04:00
Timothy Flynn
86b1c78c1a AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string
To prepare for an upcoming Utf16String, this migrates Utf16View to store
its data as a char16_t. Most function definitions are moved inline and
made constexpr.

This also adds a UDL to construct a Utf16View from a string literal:

    auto string = u"hello"sv;

This let's us remove the NTTP Utf16View constructor, as we have found
that such constructors bloat binary size quite a bit.
2025-07-03 09:51:56 -04:00
Timothy Flynn
2abc955ca9 AK: Allow treating UTF-16 views with lonely surrogates as valid
Much of the web requires us to allow lonely surrogates in UTF-16 data.
The default behavior to disallow such code units has not been changed
here - that will be changed in an upcoming commit.
2025-07-03 09:51:56 -04:00
Timothy Flynn
d978a582a0 AK: Add a Utf16View ASCII validator 2025-07-03 09:51:56 -04:00
Timothy Flynn
35a1832d08 Tests/AK: Rename TestUtf16 / TestUtf8 to TestUtf16View / TestUtf8View
These are the files they actually test, so let's rename them to avoid
confusion with an upcoming Utf16String test.
2025-07-03 09:51:56 -04:00
Renamed from Tests/AK/TestUtf16.cpp (Browse further)