Commit graph

85 commits

Author SHA1 Message Date
Shannon Booth
e6235210ff LibURL: Convert to scalar string before URL parsing
URL parsing is expected to take place on well formed unicode
strings.
2025-07-07 06:50:57 -04:00
ayeteadoe
25f5936dee CMake: Rename serenity_* helper functions/macros to ladybird_* 2025-07-03 23:19:41 +02:00
Timothy Flynn
62d9a84b8d AK+Everywhere: Replace custom number parsers with fast_float
Some checks failed
CI / macOS, arm64, Sanitizer_CI, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers_CI, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer_CI, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer_CI, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
Build Dev Container Image / build (push) Has been cancelled
Our floating point number parser was based on the fast_float library:
https://github.com/fastfloat/fast_float

However, our implementation only supports 8-bit characters. To support
UTF-16, we will need to be able to convert char16_t-based strings to
numbers as well. This works out-of-the-box with fast_float.

We can also use fast_float for integer parsing.
2025-07-03 09:51:56 -04:00
Shannon Booth
bd67a5afaa LibURL: Differentiate cross site opaque origins
Previously if we had two opaque origins both URLs were
being treated as same site.
2025-06-30 08:06:37 +01:00
Shannon Booth
b49b1b35e4 LibURL: Correct logic for domains not matched by PSL in public_suffix
Some checks are pending
CI / macOS, arm64, Sanitizer_CI, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers_CI, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer_CI, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer_CI, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
For the AO defined in the URL specification, in the case the
domain does not match against the PSL, we should be returning
the TLD. This fixes a crash for a bunch of WPT tests using the
Document.domain setter when the test is being served by WPT
locally.

We should be doing similar logic in registrable_domain, but that
unfortunately runs into some other issues, so just leave a FIXME
for now.
2025-06-29 12:47:57 +01:00
Shannon Booth
a2b523eeb8 LibURL: Replace use of URL::get_public_suffix
It is confusing to have both URL::Host::public_suffix and
URL:get_public_suffix, both with slightly different semantics.

Instead, use PublicSuffixData for cases that just want a direct
match against the list, and URL::Host::public_suffix in LibWeb
land as the URL spec defined AO.
2025-06-29 12:47:57 +01:00
Shannon Booth
e6ecafea84 LibURL: Remove ErrorOr from get_public_suffix
The caller only expects ASCII and let's ignore any OOM.
2025-06-29 12:47:57 +01:00
Shannon Booth
c3618b891f Meta+LibURL: Always enable public suffix data
We should not encourage no public suffix data as a supported
configuration.
2025-06-29 12:47:57 +01:00
Shannon Booth
68b57daf84 LibURL: Remove uneeded FIXME for UTF-8 decode in URL parsing
I believe this is in the specification since the spec technically
requires passing through a valid unicode string. However, our
implementation already handles a non valid unicode string, and will
do the replacement character substitution.
2025-06-27 18:45:48 +12:00
Shannon Booth
1f4bbc2bfb LibURL: Publicly expose ability to parse a host
This is used by the HTML specification.
2025-06-27 18:45:48 +12:00
Shannon Booth
38765fd617 LibURL: Use a nonce to distinguish opaque origins
Opaque origins are meant to be unique in terms of equality from
one another. Since this uniqueness needs to be across processes,
use a nonce to implement the uniqueness check.
2025-06-25 16:47:09 +01:00
Shannon Booth
e0d7278820 LibURL+LibWeb: Make URL::Origin default constructor private
Instead, porting over all users to use the newly created
Origin::create_opaque factory function. This also requires porting
over some users of Origin to avoid default construction.
2025-06-17 20:54:03 +02:00
Shannon Booth
40d21e343f LibURL: Add FIXME's for testing equality of opaque origins
The spec seems to indicate in its wording that while opaque
origins only serialize to 'null', they can still be tested
for equality with one another. Probably we will need to
generate some unique ID which is unique across processes.
2025-05-24 09:51:44 -04:00
Timothy Flynn
7280ed6312 Meta: Enforce newlines around namespaces
This has come up several times during code review, so let's just enforce
it using a new clang-format 20 option.
2025-05-14 02:01:59 -06:00
Shannon Booth
8e37cd2f71 LibURL: Remove URL's valid state
Some checks are pending
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
No code now relies on using URL's valid state.

A URL can still be _technically_ invalid through use of the URL
constructor or by directly changing URL fields.

However, all URLs should be constructed through the URL parser,
and we should ideally be getting rid of the default constructor
at some stage.

Also, any code which is manually setting URL fields need to be
aware that this is full of pitfalls since there are many different
forms of canonicalization which is bypassed by not going through
the URL parser.
2025-04-19 07:18:43 -04:00
Shannon Booth
00bbb2105b LibURL: Port create_with_file_scheme to Optional
Removing one of the main remaining users of URL valid state.
2025-04-19 07:18:43 -04:00
Shannon Booth
2072eee83d LibURL: Implement create_with_file_scheme using URL Parser
Creating a URL should almost always go through the URLParser to
handle all of the small edge cases involved. This reduces the
need for URL valid state.
2025-04-19 07:18:43 -04:00
Ali Mohammad Pur
76f5dce3db LibRegex: Flatten capture group list in MatchState
This makes copying the capture group COWVector significantly cheaper,
as we no longer have to run any constructors for it - just memcpy.
2025-04-18 17:09:27 +02:00
stasoid
32ddeb82d6 LibURL+LibWeb: Remove leading slash when converting url to path
...on Windows
2025-04-10 19:04:21 -06:00
Timothy Flynn
f070264800 Everywhere: Remove sv suffix from format string literals
This prevents the compile-time checks that would catch errors in the
format invocation (which would usually lead to a runtime crash).
2025-04-08 20:00:18 -04:00
Timothy Flynn
0a256b0a9a AK+Everywhere: Change StringView case conversions to return String
There's a bit of a UTF-8 assumption with this change. But nearly every
caller of these methods were immediately creating a String from the
resulting ByteString anyways.
2025-04-07 17:44:38 +02:00
Shannon Booth
0a58497ab9 LibURL/Pattern: Fix PatternParser logic for prefix codepoint comparison
We were not properly handling the case that prefix code point was the
empty string (which we represent as an OptionalNone). While this
still resulted in the correct pattern string being generated, an
incorrect regular expression was being generated causing matching
to fail.
2025-04-07 10:29:09 -04:00
Shannon Booth
4b6f0ee24a LibURL: Do not trim whitespace parsing port in URL parser
This has no functional difference as far as I can tell, but for
clarity explicitly do not attempt to do this, which has the nice
side effect of not checking for whitespace known to not exist.
2025-04-07 10:29:09 -04:00
Shannon Booth
565ccc04a9 LibURL/Pattern: Do not trim whitespace interpreting port
It turns out that the problem here was simply that we were trimming
trailing whitespace when we did not need to, which was meaning that
the port number of '80 ' was being converted to the empty string
per URLPattern elision as the port matches the http scheme.
2025-04-07 10:29:09 -04:00
Timothy Flynn
ee6b2db009 AK+LibURL+LibWeb: Use simdutf to validate ASCII strings
simdutf provides a vectorized ASCII validator, so let's use that instead
of looping over strings manually.
2025-04-06 11:05:58 -04:00
Shannon Booth
212095e1c2 LibURL/Pattern: Ensure string passed through in process a URLPatternInit
Some checks are pending
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
Corresponds to: 696b402
2025-04-06 08:24:54 -04:00
Shannon Booth
bee3720b6f LibURL/Pattern: Make dummyURL from the URL parser with a special scheme
Corresponds to: 46c30fda8f

Along with a follow up bug fix that I made of:

5e1c93e2

This for example, fixes canonicalization of URL hosts containing
special characters that should have the unicode ToAscii algorithm
performed on them as the URLs were not being treated as special.
2025-04-06 08:24:54 -04:00
Shannon Booth
83a82a027f LibURL/Pattern: Do not return errors in some canonicalization steps
Corresponds to: 5c979a31
2025-04-06 08:24:54 -04:00
Shannon Booth
a9777a3300 LibURL: Make port state override return failure more for URLPattern
Corresponds to URL spec change:

cc8b776b

Note that the new test failure being introduced here is an unrelated
WPT test change bundled in the resources test file update that I am
not convinced is correct.
2025-04-06 08:24:54 -04:00
Shannon Booth
4e8f2e48c4 LibURL: Report all hostname state failures for URLPattern
Corresponds to URL spec change:

c23aec1
2025-04-06 08:24:54 -04:00
Shannon Booth
dcb7842f59 LibURL/Pattern: Use opaque pathname serialization in canonicalization
The URL spec represents its path as a:

Variant<String, Vector<String>>

A URL is defined has having an opaque path if it has a single String,
the URL path otherwise containing a list of path components.

We (like in an older version of the spec) track this through a boolean
and only use a Vector with a single component for opaque paths.

This means it was incorrect to simple assign the path to a list with
a single empty string without setting that URL as opaque, which
meant that the path serialization was producing incorrect results.

It may make sense changing the API so this situation is a little more
clear. But for now, we simply need to set the opaque path boolean
to true here.
2025-04-06 08:24:54 -04:00
Shannon Booth
e7ad9a9bad LibURL/Pattern: URL parse correct value in opaque path canonicalization 2025-04-06 08:24:54 -04:00
Shannon Booth
e54504ad93 LibURL/Pattern: Implement 'compute protocol matches a special scheme' 2025-04-06 08:24:54 -04:00
Shannon Booth
6b1fa3ecd0 LibURL/Pattern: Implement matching a URLPattern 2025-04-06 08:24:54 -04:00
Shannon Booth
2a44420e52 LibURL/Pattern: Implement generating a component match result 2025-04-06 08:24:54 -04:00
Shannon Booth
e35555f00e LibURL/Pattern: Complete the implementation of the constructor 2025-04-06 08:24:54 -04:00
Shannon Booth
c9e6ad562c LibURL/Pattern: Implement ability to compile a component
This provides the infrastructure for taking a part list from the
pattern parser and generating the actual regexp object which is
used for matching against URLs from the pattern.
2025-04-06 08:24:54 -04:00
Shannon Booth
934f1ec30d LibURL/Pattern: Implement the URLPattern Pattern Parser 2025-04-06 08:24:54 -04:00
Shannon Booth
45d852d14b LibURL: Add helper for getting array of the special schemes
This is useful for iterating over all of the special schemes, as
needed in the URLPattern implementation.
2025-04-06 08:24:54 -04:00
Shannon Booth
e3ef6d3aee LibURL/Pattern: Implement ability to generate a pattern string
Compiling a URLPattern component will generate a 'parts list' which
is used for generating the regular expression that is used for
matching against URLs.

This parts list is also used to generate (through this function) a
pattern string. The pattern string of a URL component is what is
exposed on the USVString getters of the URLPattern class itself.

As an example, the following:

```
let pattern = new URLPattern({ "pathname": "/foo/(.*)*" });
console.log(pattern.pathname);
```

Will log the pattern string of: '/foo/**'.
2025-04-06 08:24:54 -04:00
Shannon Booth
f3679184cb LibURL/Pattern: Add representation of a URL Pattern 'options' struct
These control how a pattern string is generated, which can vary for
different components and is also impacted by the 'ignoreCase' option
that can be provided in the URLPattern constructor.
2025-04-06 08:24:54 -04:00
Shannon Booth
88bea4a717 LibURL/Pattern: Add a URL Pattern 'Part' representation 2025-04-06 08:24:54 -04:00
Shannon Booth
8a33c57c1e LibWeb/LibURL: Use an IgnoreCase enum for URLPatternOptions
This is to save a future name conflict that will appear between
the options IDL dictionary and the options struct that are both
present in the spec.

It is also a nicer interface for now given there is only a single
option at the moment.
2025-04-06 08:24:54 -04:00
Shannon Booth
f80e7d6816 LibURL/Pattern: Implement processing a URL Pattern Init
This gets us to the point just before the point of parsing the
pattern strings for each URL component to produce a regular
expression.
2025-04-06 08:24:54 -04:00
Shannon Booth
3f73cd30a2 LibURL: Rename 'cannot have a base URL' to 'has an opaque path'
This follows a rename made in the URL specification.
2025-04-06 08:24:54 -04:00
Shannon Booth
6b85748f53 LibURL/Pattern: Implement helper for escaping a URL Pattern String 2025-04-06 08:24:54 -04:00
Shannon Booth
ec3c545426 LibURL+LibWeb: Ensure opaque paths always roundtrip
Corresponds to: 6c782003
2025-03-18 12:17:19 +00:00
Shannon Booth
a9e20cb6c3 LibURL/Pattern: Use ConstructorStringParser to construct URLPatternInit 2025-03-15 07:39:03 -04:00
Shannon Booth
e369756e9c LibURL/Pattern: Implement the constructor string parser
This is missing one small bit of functionality where the not-yet
impplemented component compilation is required.
2025-03-15 07:39:03 -04:00
Shannon Booth
e70272ddef LibURL/Pattern: Implement URL Pattern canonicalization
These are used to normalize URL components.
2025-03-15 07:39:03 -04:00