Commit graph

3950 commits

Author SHA1 Message Date
Glenn Skrzypczak
d25d62e74c AK/Time+LibWeb/HTML: Fix ISO8601 week conversions
This reimplements conversions between unix date times and ISO8601
weeks. The new algorithms also do not use loops, so they should be
faster.
2025-08-14 11:05:28 -04:00
Timothy Flynn
8472e469f4 AK+LibJS+LibWeb: Recognize that our UTF-16 string is actually WTF-16
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.

This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.
2025-08-13 09:56:13 -04:00
Timothy Flynn
36c7302178 AK: Optimize the UTF-16 StringBuilder for ASCII storage
When we build a UTF-16 string, we currently always switch to the UTF-16
storage mode inside StringBuilder. Then when it comes time to create the
string, we switch the storage to ASCII if possible (by shifting the
underlying bytes up).

Instead, let's start out with ASCII storage and then switch to UTF-16
storage once we see a non-ASCII code point. For most strings, this will
avoid allocating 2x the memory, and avoids many ASCII validation calls.
2025-08-13 09:56:13 -04:00
Timothy Flynn
99d7e08dff AK: Templatize GenericLexer for UTF-16 strings
We now define GenericLexer as a template to allow using it with UTF-16
strings. To keep existing users happy, the template is defined in the
Detail namespace. Then AK::GenericLexer is an alias for a char-based
view, and AK::Utf16GenericLexer is an alias for a char16-based view.
2025-08-13 09:56:13 -04:00
Timothy Flynn
28d9d3a2c7 AK+Libraries: Reduce API surface of GenericLexer a bit
* Remove completely unused methods.
* Deduplicate methods that were overloaded with both StringView and
  char const* parameters.

A future commit will templatize GenericLexer by char type. This patch
serves to make that a tiny bit easier.
2025-08-13 09:56:13 -04:00
Callum Law
861bcbd9ad AK: Format floats with precision in scientific notation where applicable 2025-08-11 17:10:04 +01:00
Callum Law
1474da31c7 AK: Reduce duplication between put_f32_or_f64 and put_f64_with_precision
We were handling the special cases of NaN and Infinity in basically the
same way across both functions - we can reduce code duplication by
moving this to before we branch.

This is also required as we will be moving the logic to encode in
scientific notation above the branch in a later commit and the
`convert_floating_point_to_decimal_exponential_form` method doesn't work
with non-finite values.
2025-08-11 17:10:04 +01:00
Timothy Flynn
f03c432b52 AK: Use simdutf for searching strings for a single code unit
In the following synthetic benchmark, the simdutf version is 4x faster:

    BENCHMARK_CASE(find)
    {
        auto string = u"😀Foo😀Bar"sv;

        for (size_t i = 0; i < 100'000'000; ++i)
            (void)string.find_code_unit_offset('a');
    }
2025-08-11 16:55:37 +02:00
Idan Horowitz
93692242b9 AK: Implement take_all_matching(predicate) API in HashMap 2025-08-08 13:09:58 -04:00
Idan Horowitz
5097e72174 AK: Implement take_all_matching(predicate) API in HashTable 2025-08-08 13:09:58 -04:00
Ali Mohammad Pur
e47fceed38 AK: Optionally keep track of the last slot in Vector
last() and take_last() are extremely common ops when the vector is used
like a stack.
2025-08-08 12:54:06 +02:00
Ali Mohammad Pur
2cd4b4e28d AK: Skip vcalls to Stream::read_value and read_until_filled in LEB128
...for the first byte.
This function only really needs to read a single byte at that point, so
read_until_filled() is useless and read_value<u8> is functionally
equivalent to just a read.

This showed up hot in a wasm parse benchmark.
2025-08-08 12:54:06 +02:00
Ali Mohammad Pur
bf4c436ef3 AK: Add some higher-level operations to DoublyLinkedList<T>
This also adds a node cache as allocation/deallocation was showing up in
my profiles; disabled by default to keep the old behaviour.
2025-08-08 12:54:06 +02:00
Ali Mohammad Pur
834fb0be36 AK: Make some Stream::read* functions available inline
These are quite bottlenecky in wasm, the next commit will try to make
use of this by calling them directly instead of doing a vcall, and
having them inlineable helps the compiler a bit.
2025-08-08 12:54:06 +02:00
Ali Mohammad Pur
0f13952f30 AK: Simplify some stream reading logic
These do the same thing in a less convoluted way. NFC.
2025-08-08 12:54:06 +02:00
Timothy Flynn
298ec6a12a AK: Ensure StringBuilder encodes U+10000 as 2 UTF-16 code units 2025-08-07 02:05:50 +02:00
Timothy Flynn
1b611fba67 AK: Ensure Utf16FlyString is hash-compatible with Utf16View/Utf16String 2025-08-07 02:05:50 +02:00
Timothy Flynn
274f8ee462 AK: Make hashing of UTF-16 strings cheaper
No need to iterate every byte of the string, we can iterate the code
units instead.

We must also actually record that we have cached the hash :^)
2025-08-07 02:05:50 +02:00
Timothy Flynn
73154defa8 AK: Allow implicitly constructing a Utf16View from a Utf16FlyString
The same already works for String and FlyString into StringView, and for
Utf16String into Utf16View.
2025-08-07 02:05:50 +02:00
Timothy Flynn
1bc80848fb AK+LibWeb: Add a UTF-16 starts/ends with wrapper for a single code unit 2025-08-07 02:05:50 +02:00
Timothy Flynn
7082cafdbc AK: Add a UTF-16 replacement wrapper to replace a single code unit
Just for convenience for interop with existing code.
2025-08-07 02:05:50 +02:00
Timothy Flynn
9e0b1bdfca AK: Add a parameter to to_number methods to change the parsed base
This just forwards through to AK::parse_number.
2025-08-07 02:05:50 +02:00
Timothy Flynn
bbda6d13f7 AK: Add a Utf16View method to retrieve an iterator at a code unit offset 2025-08-07 02:05:50 +02:00
Timothy Flynn
6d1f90c739 AK: Remove now-unused UTF-16 length from UTF-8 string helper 2025-08-05 15:13:36 +02:00
Timothy Flynn
2dc0a3b3ce AK: Add trim methods to Utf16String that skip allocation when not needed
If the string does not begin with any of the provided code units, we do
not need to create a new string.
2025-08-05 15:13:36 +02:00
Timothy Flynn
cd276235d7 AK: Add a couple of validation-skipping UTF-16 string factories
String and FlyString are known to be valid UTF-8, so we can skip
validation when constructing a UTF-16 string from them.
2025-08-05 07:07:15 -04:00
Timothy Flynn
782f8c381c AK: Implement the spaceship operator for UTF-16 strings 2025-08-05 07:07:15 -04:00
Timothy Flynn
5af99f4dd0 AK: Allow Utf16StringBase to hold null data
This is required by JS::PropertyKey. This will also be needed when we
implement an Optional<Utf16String> specialization.
2025-08-05 07:07:15 -04:00
Timothy Flynn
0bf565b97f AK: Allow comparing UTF-16 strings to UTF-8 strings
Before now, you could compare a Utf16View to a StringView, but it would
only be valid if the StringView were ASCII. When porting code to UTF-16,
it will be handy to have a code point-aware implementation for non-ASCII
StringViews.
2025-08-05 07:07:15 -04:00
Timothy Flynn
319e7aa03b AK: Do not replace lonely surragates with U+FFFD while iterating
Utf8View doesn't do this either. The wobbly format is expected by JS.
2025-08-05 07:07:15 -04:00
Timothy Flynn
13ed6aba71 AK+LibIPC: Implement an encoder/decoder for UTF-16 strings 2025-08-02 10:10:14 -07:00
Aliaksandr Kalenik
d47a22150d AK: Define operator== for HashMap 2025-07-30 11:06:05 +02:00
Grant Knowlton
9e1e4f3b15 AK: Validate compressed tags in IPv4-mapped IPv6 address
This disallows parsing IPv4 mapped IPv6 address strings with multiple
compression prefixes.  Tests are provided for the updated
functionality.
2025-07-30 00:53:10 +02:00
Timothy Flynn
d9502505c2 AK: Fix bounds assertions in Utf16View::iterator_offset 2025-07-28 18:30:50 +02:00
Timothy Flynn
67723ef83c AK: Add a method to peek ahead of a UTF-16 iterator 2025-07-28 18:30:50 +02:00
Timothy Flynn
21d7d236e6 AK: Add a method to check if a UTF-16 string contains any code point 2025-07-28 18:30:50 +02:00
Timothy Flynn
96e75a023b AK: Implement a UTF-16 UnixDateTime stringifier 2025-07-28 12:25:11 +02:00
Timothy Flynn
ed63a60247 AK: Return an empty optional when UTF-16 code unit lookup fails
Accidentally returned the wrong type here.
2025-07-28 12:25:11 +02:00
Timothy Flynn
baddac5155 AK: Implement a method to split a UTF-16 string 2025-07-28 12:25:11 +02:00
Timothy Flynn
48a3b2c28e AK: Implement a method to count instances of a needle in a UTF-16 string 2025-07-28 12:25:11 +02:00
Timothy Flynn
1375e6bf39 AK+LibJS+LibWeb: Use simdutf to create well-formed strings 2025-07-26 00:40:06 +02:00
Timothy Flynn
a740bfd8ff AK+LibUnicode: Implement Unicode-aware UTF-16 case transformations 2025-07-25 18:16:22 +02:00
Timothy Flynn
df77ae1920 AK: Implement creating a UTF-16 string from a repeated code point 2025-07-25 18:16:22 +02:00
Timothy Flynn
a46e9b2adb AK: Compute the correct capacity in StringBuilder::try_append_repeated
This was mistakenly broken in 2803d66d87.
2025-07-25 18:16:22 +02:00
Timothy Flynn
745f288796 AK: Implement a method to acquire a UTF-16 iterator's code unit offset
This is the same as Utf8View::iterator_offset().
2025-07-25 18:16:22 +02:00
Jelle Raaijmakers
0b96690f0c AK: Add HashMap::update()
This updates a HashMap by copying another HashMap's keys and values.
2025-07-25 16:22:06 +02:00
Timothy Flynn
6c73dff120 AK: Implement a UTF-16 method to check if a string is ASCII whitespace 2025-07-24 19:00:20 +02:00
Timothy Flynn
f53389bab1 AK: Add a couple of Utf16String factories
* Utf16String::from_utf8_with_replacement_character
* Utf16String::from_code_point
2025-07-24 19:00:20 +02:00
Jelle Raaijmakers
b1c3ce807b AK: Rename Utf16View::trim_whitespace() to ::trim_ascii_whitespace()
This reflects the naming of String::trim_ascii_whitespace() and better
indicates what exactly we're trimming.
2025-07-24 07:18:25 -04:00
Jelle Raaijmakers
9a03ee1c24 AK: Fix mention of renamed member in Utf16View 2025-07-24 07:18:25 -04:00