Commit graph

1668 commits

Author SHA1 Message Date
Timothy Flynn
695bbcb991 LibJS: Revert common error types to only hold a single string
Some checks are pending
CI / macOS, arm64, Sanitizer, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
Before porting to UTF-16, these instances held a String. The port to
UTF-16 changed them to hold the original string as a StringView, and
lazily allocated the UTF-16 message as needed. This somehow negatively
impacting the zlib.js benchmark in the Octane suite.
2025-08-18 01:43:45 +02:00
Tete17
08284e0ef6 LibJS: Implement console.dirxml
This function actually gets tested in WPT.
2025-08-17 07:28:56 -04:00
Timothy Flynn
e314ca5e9d LibJS: Introduce ParseMonthCode and CreateMonthCode Temporal AOs
This is an editorial change in the Temporal proposal. See:
28357ea
32f4b02
f860ac6
e6f565d
2025-08-14 11:35:28 -04:00
Timothy Flynn
0f4d5d3abc LibJS: Remove no-op call to ValidateTemporalUnitValue
This is an editorial change in the Temporal proposal. See:
07c924b
2025-08-14 11:35:28 -04:00
Timothy Flynn
70db474cf0 LibJS+LibWeb: Port interned bytecode strings to UTF-16
This was almost a no-op, except we intern JS exception messages. So the
bulk of this patch is porting exception messages to UTF-16.
2025-08-14 10:27:08 +02:00
Timothy Flynn
cf61171864 LibJS: Port remaining bytecode identifiers to UTF-16 2025-08-14 10:27:08 +02:00
Timothy Flynn
829fd25264 LibJS: Add a UTF-16 variant of Value::to_string_without_side_effects 2025-08-14 10:27:08 +02:00
Timothy Flynn
c87122eb32 LibJS: Add a method to stringify a BigInt to UTF-16
And remove the ByteString variant while we are here.
2025-08-14 10:27:08 +02:00
Timothy Flynn
62d85dd90a LibJS: Port RegExp flags and patterns to UTF-16 2025-08-13 09:56:13 -04:00
Timothy Flynn
b955c9b2a9 LibJS: Port the Identifier AST (and related) nodes to UTF-16
This eliminates quite a lot of UTF-8 / UTF-16 churn.
2025-08-13 09:56:13 -04:00
Timothy Flynn
00182a2405 LibJS: Port the JS lexer and parser to UTF-16
This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.

The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.

One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
2025-08-13 09:56:13 -04:00
Timothy Flynn
eb74781a2d LibJS: Keep the lookahead lexer alive after parsing its next token
Currently, the lexer holds a ByteString, which is always heap-allocated.
When we create a copy of the lexer for the lookahead token, that token
will outlive the lexer copy. The token holds a couple of string views
into the lexer's source string. This is fine for now, because the source
string will be kept alive by the original lexer.

But if the lexer were to hold a String or Utf16String, short strings
will be stored on the stack due to SSO. Thus the token will hold views
into released stack data. We need to keep the lookahead lexer alive to
prevent UAF on views into its source string.
2025-08-13 09:56:13 -04:00
Timothy Flynn
8472e469f4 AK+LibJS+LibWeb: Recognize that our UTF-16 string is actually WTF-16
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.

This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.
2025-08-13 09:56:13 -04:00
Timothy Flynn
99d7e08dff AK: Templatize GenericLexer for UTF-16 strings
We now define GenericLexer as a template to allow using it with UTF-16
strings. To keep existing users happy, the template is defined in the
Detail namespace. Then AK::GenericLexer is an alias for a char-based
view, and AK::Utf16GenericLexer is an alias for a char16-based view.
2025-08-13 09:56:13 -04:00
Timothy Flynn
28d9d3a2c7 AK+Libraries: Reduce API surface of GenericLexer a bit
* Remove completely unused methods.
* Deduplicate methods that were overloaded with both StringView and
  char const* parameters.

A future commit will templatize GenericLexer by char type. This patch
serves to make that a tiny bit easier.
2025-08-13 09:56:13 -04:00
Timothy Flynn
e2b245add1 LibJS: Handle out-of-range prefixed numbers in Token::double_value
Some checks are pending
CI / macOS, arm64, Sanitizer, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
This regressed in cd15b1a2c9.

If a prefixed number is out-of-range of a u64, stroul would previously
fall back to ULONG_MAX. This patch restores that behavior.
2025-08-10 13:35:37 +02:00
Aliaksandr Kalenik
4b3a87eb14 LibJS: Add fast path for Array.prototype.shift
Makes `MicroBench/array-prototype-shift.js` 100x faster on my machine.

Progress on https://github.com/LadybirdBrowser/ladybird/issues/5725
2025-08-08 18:10:14 +02:00
Ali Mohammad Pur
4e2845847b LibJS: Add a fast-path to <Int32>.to_uint8()
This was showing up in a profile as hot.
2025-08-08 12:54:06 +02:00
Timothy Flynn
cd15b1a2c9 LibJS: Use AK's number parsing over stroul in JS::Token
Some checks are pending
CI / macOS, arm64, Sanitizer, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
This gives us a drop-in replacement for UTF-16 strings.
2025-08-07 02:05:50 +02:00
Timothy Flynn
20995c620f LibJS: Remove MarkupGenerator
It is unused.
2025-08-07 02:05:50 +02:00
Timothy Flynn
3920194bca LibJS: Move regulating and balancing logic into ISODateSurpasses
This is an editorial change in the Temporal proposal. See:
eddb77f
2025-08-05 11:18:08 -04:00
Timothy Flynn
a95d3e2a5e LibJS: Split ISO and non-ISO Temporal calendar operations
This is an editorial change in the Temporal proposal. See:
47042f2
2025-08-05 11:18:08 -04:00
Timothy Flynn
0efa98a57a LibJS+LibWeb+WebContent: Port JS::PropertyKey to UTF-16
This has quite a lot of fall out. But the majority of it is just type or
UDL substitution, where the changes just fall through to other function
calls.

By changing property key storage to UTF-16, the main affected areas are:
* NativeFunction names must now be UTF-16
* Bytecode identifiers must now be UTF-16
* Module/binding names must now be UTF-16
2025-08-05 07:07:15 -04:00
Aliaksandr Kalenik
a3af7ca1a0 LibJS: Skip PrivateEnvironment allocation if possible
If class doesn't have any private fields, we could avoid allocating
PrivateEnvironment for it.

This allows us to skip thousands of unnecessary PrivateEnvironment
allocations on Discord.
2025-07-30 13:01:53 +02:00
Aliaksandr Kalenik
3c3f1f9fad LibWeb: Don't capture proxy as root in ProxyConstructor::revocable
`revoker_closure` is used to construct `NativeFunction` that visits
`raw_capture_range()`, so there is no need to use GC root for `proxy`.
2025-07-30 08:43:53 +02:00
Timothy Flynn
9d993143de LibJS: Implement a UTF-16 number-to-string converter 2025-07-28 12:25:11 +02:00
Timothy Flynn
1375e6bf39 AK+LibJS+LibWeb: Use simdutf to create well-formed strings 2025-07-26 00:40:06 +02:00
Timothy Flynn
173bb67004 LibJS+LibUnicode: Port Intl.RelativeTimeFormat to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
6fe0e13474 LibJS+LibUnicode: Port Intl.DurationFormat to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
e637e148d4 LibJS+LibUnicode: Port Intl.NumberFormat to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
db2148b44a LibJS+LibUnicode: Port Intl.ListFormat to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
7d80aabbdb LibJS+LibUnicode: Port Intl.DisplayNames to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
ee01f857d1 LibJS+LibUnicode: Port Intl.DateTimeFormat to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
b2f053e783 LibJS+LibUnicode: Port Intl.Collator to UTF-16 strings 2025-07-24 10:39:52 +02:00
Timothy Flynn
b3d52a8238 LibJS: Compute offsetBehaviour in ToTemporalZonedDateTime after parsing
This is an editorial change in the Temporal proposal. See:
4b83ba3
2025-07-23 22:05:15 +02:00
Timothy Flynn
e95f225362 LibJS: Correctly disallow certain calendar-based Temporal strings
This is an editorial change in the Temporal proposal. See:
d83241f
2025-07-23 22:05:15 +02:00
Timothy Flynn
3f75cf270a LibJS: Move ambiguous Temporal time string handling to a separate parser
This is an editorial change in the Temporal proposal. See:
fa3d0b9
2025-07-23 22:05:15 +02:00
Timothy Flynn
7ca3598221 LibJS: Add new spec comments to ISO 8601 early-error handlers
This is an editorial change in the Temporal proposal. See:
f8489fb
2025-07-23 22:05:15 +02:00
Timothy Flynn
3b3ff6f057 LibJS: Split up the GetTemporalUnitValuedOption AO
This is an editorial change in the Temporal proposal. See:
fcdb47e
ef04774
2070032
adf50d4
40eaeb7
aab7bee
a10d38d
06ce375
2025-07-23 22:05:15 +02:00
Timothy Flynn
1aee1a050e LibJS: Use consistent naming for this value in Temporal prototypes
This is an editorial change in the Temporal proposal. See:
ed49b0b
2025-07-23 22:05:15 +02:00
Timothy Flynn
0c5c6a9944 LibJS: Use consistent wording around NewTarget in Temporal constructors
This is an editorial change in the Temporal proposal. See:
e454171
2025-07-23 22:05:15 +02:00
Timothy Flynn
04fe0c6aec LibJS: Create match indices based on code unit length
In Unicode mode, we were mixing code units (start_index) with code point
length (end_index).
2025-07-22 23:11:19 +02:00
Timothy Flynn
772479b334 LibJS: Fix typo in GetMatchIndexPair definition 2025-07-22 23:11:19 +02:00
ayeteadoe
2e2484257d LibJS: Enable EXPLICIT_SYMBOL_EXPORT and annotate minimum symbol set 2025-07-22 11:51:29 -04:00
ayeteadoe
539a675802 LibJS: Revert Enable EXPLICIT_SYMBOL_EXPORT
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
2025-07-22 11:51:29 -04:00
Timothy Flynn
42b41431eb AK+LibJS: Enforce limits in Utf16View offset computations
RegExp was the only caller relying on being able to provide an offset
larger than the string length. So let's do a pre-check in RegExp and
then enforce that the offsets we receive in Utf16View are valid.
2025-07-22 17:17:33 +02:00
Jelle Raaijmakers
5d19aacce7 LibJS: Do not directly append RegExp pattern code points during parse
Some checks are pending
CI / macOS, arm64, Sanitizer, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
There apparently is a bit of a disconnect between the spec asking us to
construct the pattern using code points and LibRegex not being able to
swallow those. Whenever we had multi-byte code points in the pattern and
tried to match that in unicode mode, we would fail.

Change the parser to encode all non-ASCII code units. Fixes 2 test262
cases in `language/literals/regexp`.
2025-07-22 01:23:52 +02:00
Timothy Flynn
9582895759 AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String 2025-07-18 12:45:38 -04:00
Timothy Flynn
a43cb15e81 LibJS+LibWeb: Replace JS::Utf16String with AK::Utf16String 2025-07-18 12:45:38 -04:00
Timothy Flynn
2803d66d87 AK: Support UTF-16 string formatting
The underlying storage used during string formatting is StringBuilder.
To support UTF-16 strings, this patch allows callers to specify a mode
during StringBuilder construction. The default mode is UTF-8, for which
StringBuilder remains unchanged.

In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a
series of u16 code units. Appending a single character will append 2
bytes for that character (cast to a char16_t). Appending a StringView
will transcode the string to UTF-16.

Utf16String also gains the same memory optimization that we added for
String, where we hand-off the underlying buffer to Utf16String to avoid
having to re-allocate.

In the future, we may want to further optimize for ASCII strings. For
example, we could defer committing to the u16-esque storage until we
see a non-ASCII code point.
2025-07-18 12:45:38 -04:00