Commit graph

33 commits

Author SHA1 Message Date
Ali Mohammad Pur
76f5dce3db LibRegex: Flatten capture group list in MatchState
This makes copying the capture group COWVector significantly cheaper,
as we no longer have to run any constructors for it - just memcpy.
2025-04-18 17:09:27 +02:00
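Editorial sketch for 76f5dce3db, not taken from the commit itself: the general reason a flat list of plain structs is cheaper to copy than a list of lists. `GroupSpan`, `NestedGroups`, `FlatGroups` and `flatten` are hypothetical names, and the real LibRegex types and layout may differ.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for one capture group: an offset and a length.
struct GroupSpan {
    std::size_t start;
    std::size_t length;
};

// Elements like this can be duplicated byte-for-byte, without constructors.
static_assert(std::is_trivially_copyable_v<GroupSpan>);

// Nested layout: copying runs one inner-vector copy constructor per match.
using NestedGroups = std::vector<std::vector<GroupSpan>>;

// Flat layout: each match owns a fixed-size slice of a single buffer.
struct FlatGroups {
    std::size_t groups_per_match;
    std::vector<GroupSpan> spans;

    GroupSpan& at(std::size_t match_index, std::size_t group_index)
    {
        return spans[match_index * groups_per_match + group_index];
    }
};

// Convert the nested layout to the flat one, padding short matches
// with empty spans so that every slice has the same size.
FlatGroups flatten(NestedGroups const& nested, std::size_t groups_per_match)
{
    FlatGroups flat { groups_per_match, {} };
    flat.spans.reserve(nested.size() * groups_per_match);
    for (auto const& match : nested) {
        for (std::size_t i = 0; i < groups_per_match; ++i) {
            if (i < match.size())
                flat.spans.push_back(match[i]);
            else
                flat.spans.push_back(GroupSpan { 0, 0 });
        }
    }
    return flat;
}
```

In this sketch, copying `FlatGroups::spans` touches one contiguous buffer of trivially copyable elements, which is the property the commit message relies on when it says "just memcpy".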
Timothy Flynn
f070264800 Everywhere: Remove sv suffix from format string literals
The sv suffix prevents the compile-time checks that would catch errors in the
format invocation (which would usually lead to a runtime crash).
2025-04-08 20:00:18 -04:00
Shannon Booth
0a58497ab9 LibURL/Pattern: Fix PatternParser logic for prefix codepoint comparison
We were not properly handling the case that prefix code point was the
empty string (which we represent as an OptionalNone). While this
still resulted in the correct pattern string being generated, an
incorrect regular expression was being generated causing matching
to fail.
2025-04-07 10:29:09 -04:00
Shannon Booth
565ccc04a9 LibURL/Pattern: Do not trim whitespace interpreting port
It turns out that the problem here was simply that we were trimming
trailing whitespace when we did not need to, which meant that
the port number of '80 ' was being converted to the empty string
per URLPattern elision as the port matches the http scheme.
2025-04-07 10:29:09 -04:00
Timothy Flynn
ee6b2db009 AK+LibURL+LibWeb: Use simdutf to validate ASCII strings
simdutf provides a vectorized ASCII validator, so let's use that instead
of looping over strings manually.
2025-04-06 11:05:58 -04:00
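Editorial sketch for ee6b2db009: the kind of manual loop the commit message says is being replaced, next to a word-at-a-time variant that hints at why checking many bytes per step is faster. Both functions are hypothetical illustrations and are neither simdutf's nor AK's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Scalar reference: a string is ASCII if no byte has its top bit set.
bool is_ascii_scalar(std::string_view input)
{
    for (unsigned char byte : input) {
        if (byte >= 0x80)
            return false;
    }
    return true;
}

// Word-at-a-time variant: OR eight bytes at once into an accumulator,
// then test all the top bits with a single mask at the end.
bool is_ascii_wordwise(std::string_view input)
{
    std::uint64_t accumulated = 0;
    std::size_t offset = 0;
    for (; offset + 8 <= input.size(); offset += 8) {
        std::uint64_t chunk;
        std::memcpy(&chunk, input.data() + offset, sizeof(chunk));
        accumulated |= chunk;
    }
    // Fold in the remaining tail bytes one at a time.
    for (; offset < input.size(); ++offset)
        accumulated |= static_cast<unsigned char>(input[offset]);
    return (accumulated & 0x8080808080808080ULL) == 0;
}
```

A SIMD validator applies the same idea with wider registers. Whatever the width, it has to agree with the scalar loop on every input.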
Shannon Booth
212095e1c2 LibURL/Pattern: Ensure string passed through in process a URLPatternInit
Corresponds to: https://github.com/whatwg/urlpattern/commit/696b402
2025-04-06 08:24:54 -04:00
Shannon Booth
bee3720b6f LibURL/Pattern: Make dummyURL from the URL parser with a special scheme
Corresponds to: https://github.com/whatwg/urlpattern/commit/46c30fda8f

Along with a follow up bug fix that I made of:

https://github.com/whatwg/urlpattern/commit/5e1c93e2

This, for example, fixes canonicalization of URL hosts containing
special characters that should have the Unicode ToASCII algorithm
performed on them, as the URLs were not being treated as special.
2025-04-06 08:24:54 -04:00
Shannon Booth
83a82a027f LibURL/Pattern: Do not return errors in some canonicalization steps
Corresponds to: https://github.com/whatwg/urlpattern/commit/5c979a31
2025-04-06 08:24:54 -04:00
Shannon Booth
dcb7842f59 LibURL/Pattern: Use opaque pathname serialization in canonicalization
The URL spec represents its path as a:

Variant<String, Vector<String>>

A URL is defined as having an opaque path if it has a single String,
the URL path otherwise containing a list of path components.

We (like in an older version of the spec) track this through a boolean
and only use a Vector with a single component for opaque paths.

This means it was incorrect to simply assign the path to a list with
a single empty string without setting that URL as opaque, which
meant that the path serialization was producing incorrect results.

It may make sense to change the API so this situation is a little more
clear. But for now, we simply need to set the opaque path boolean
to true here.
2025-04-06 08:24:54 -04:00
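Editorial sketch for dcb7842f59: a simplified model of the two path representations the commit message contrasts, showing why a single empty segment serializes differently depending on the opaque flag. `SpecPath`, `TrackedPath`, `serialize` and `from_spec_path` are hypothetical names, and the serializer leaves out the special cases of the real URL path serializer.

```cpp
#include <string>
#include <variant>
#include <vector>

// Spec-style representation: the alternative held says whether the path is opaque.
using SpecPath = std::variant<std::string, std::vector<std::string>>;

// Boolean-plus-list representation described in the commit message.
// Field names are assumptions made for this sketch.
struct TrackedPath {
    std::vector<std::string> segments;
    bool has_opaque_path { false };
};

// Simplified path serialization.
std::string serialize(TrackedPath const& path)
{
    // An opaque path is stored as a single segment and emitted verbatim.
    if (path.has_opaque_path)
        return path.segments.empty() ? std::string {} : path.segments.front();

    // A list path is emitted as "/" followed by each segment.
    std::string output;
    for (auto const& segment : path.segments) {
        output += '/';
        output += segment;
    }
    return output;
}

// Converting from the spec shape has to set the flag for the String alternative.
TrackedPath from_spec_path(SpecPath const& path)
{
    if (auto const* opaque = std::get_if<std::string>(&path))
        return TrackedPath { { *opaque }, true };
    return TrackedPath { std::get<std::vector<std::string>>(path), false };
}
```

Under these assumptions, a list holding one empty string serializes as "/" when the flag is left false and as the empty string when it is true. That is the kind of mismatch the commit describes.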
Shannon Booth
e7ad9a9bad LibURL/Pattern: URL parse correct value in opaque path canonicalization
2025-04-06 08:24:54 -04:00
Shannon Booth
e54504ad93 LibURL/Pattern: Implement 'compute protocol matches a special scheme'
2025-04-06 08:24:54 -04:00
Shannon Booth
6b1fa3ecd0 LibURL/Pattern: Implement matching a URLPattern
2025-04-06 08:24:54 -04:00
Shannon Booth
2a44420e52 LibURL/Pattern: Implement generating a component match result
2025-04-06 08:24:54 -04:00
Shannon Booth
e35555f00e LibURL/Pattern: Complete the implementation of the constructor
2025-04-06 08:24:54 -04:00
Shannon Booth
c9e6ad562c LibURL/Pattern: Implement ability to compile a component
This provides the infrastructure for taking a part list from the
pattern parser and generating the actual regexp object which is
used for matching against URLs from the pattern.
2025-04-06 08:24:54 -04:00
Shannon Booth
934f1ec30d LibURL/Pattern: Implement the URLPattern Pattern Parser
2025-04-06 08:24:54 -04:00
Shannon Booth
e3ef6d3aee LibURL/Pattern: Implement ability to generate a pattern string
Compiling a URLPattern component will generate a 'parts list' which
is used for generating the regular expression that is used for
matching against URLs.

This parts list is also used to generate (through this function) a
pattern string. The pattern string of a URL component is what is
exposed on the USVString getters of the URLPattern class itself.

As an example, the following:

```
let pattern = new URLPattern({ "pathname": "/foo/(.*)*" });
console.log(pattern.pathname);
```

Will log the pattern string of: '/foo/**'.
2025-04-06 08:24:54 -04:00
Shannon Booth
f3679184cb LibURL/Pattern: Add representation of a URL Pattern 'options' struct
These control how a pattern string is generated, which can vary for
different components and is also impacted by the 'ignoreCase' option
that can be provided in the URLPattern constructor.
2025-04-06 08:24:54 -04:00
Shannon Booth
88bea4a717 LibURL/Pattern: Add a URL Pattern 'Part' representation
2025-04-06 08:24:54 -04:00
Shannon Booth
8a33c57c1e LibWeb/LibURL: Use an IgnoreCase enum for URLPatternOptions
This is to avoid a future name conflict that will appear between
the options IDL dictionary and the options struct that are both
present in the spec.

It is also a nicer interface for now given there is only a single
option at the moment.
2025-04-06 08:24:54 -04:00
Shannon Booth
f80e7d6816 LibURL/Pattern: Implement processing a URL Pattern Init
This gets us to the point just before parsing the pattern strings
for each URL component to produce a regular expression.
2025-04-06 08:24:54 -04:00
Shannon Booth
6b85748f53 LibURL/Pattern: Implement helper for escaping a URL Pattern String
2025-04-06 08:24:54 -04:00
Shannon Booth
a9e20cb6c3 LibURL/Pattern: Use ConstructorStringParser to construct URLPatternInit
2025-03-15 07:39:03 -04:00
Shannon Booth
e369756e9c LibURL/Pattern: Implement the constructor string parser
This is missing one small bit of functionality where the not-yet
implemented component compilation is required.
2025-03-15 07:39:03 -04:00
Shannon Booth
e70272ddef LibURL/Pattern: Implement URL Pattern canonicalization
These are used to normalize URL components.
2025-03-15 07:39:03 -04:00
Shannon Booth
f8f21319f9 LibURL/Pattern: Implement the URL Pattern Tokenizer
The tokenizer is used for both pattern string and constructor string
parsing of URL Patterns.
2025-03-15 07:39:03 -04:00
Shannon Booth
10b32a8dd8 LibURL/Pattern: Stub out URL::Pattern::match
This will allow us to complete the IDL interface, which will leave
remaining work to implement the URL pattern specification within
LibURL.
2025-03-04 16:32:09 -05:00
Shannon Booth
ff07cc1a6c LibURL/Pattern: Add some scaffolding for the URLPattern constructor
2025-03-04 16:32:09 -05:00
Shannon Booth
873f7e4b3d LibURL/Pattern: Add a representation of a URL Pattern error
As the comment in this file explains, callers of LibURL APIs are
meant to assume that any error they see is a TypeError, since
that is all the spec throws at the moment.

A custom error type exists here so that we can include more
information in the TypeErrors which are thrown.
2025-03-04 16:32:09 -05:00
Shannon Booth
f3662c6f88 LibURL/Pattern: Add a representation of a URL Pattern
This is the core object behind a URL pattern which when constructed
can be used for matching the pattern against URLs.

However, the implementation here is missing key functions such as
the constructor and the 'test'/'exec' functions as that relies on
a significant amount of supporting URLPattern infrastructure such
as two different parsers and a tokenizer.

However, this is enough for us to implement some more of the IDL
wrapper layer of this specification.
2025-02-17 19:10:39 -05:00
Shannon Booth
5521836929 LibURL/Pattern: Add a representation of a URL Pattern 'component'
A URL pattern consists of components such as the 'port', 'password',
'hostname', etc. A component is compiled from the input to the
URLPattern constructor and is what is used for matching against
URLs to produce a match result.

This is also where the regex dependency is introduced into LibURL
to support the URLPattern implementation.
2025-02-17 19:10:39 -05:00
Shannon Booth
dc2c62825b LibURL: Add a representation of a URL Pattern 'result'
This is the return value of a URLPattern after `exec` is called on it.
It conveys information about the named (or unnamed) regex groups
matched for each component of the URL. For example,

```
const pattern = new URLPattern({ hostname: "{:subdomain.}*example.com" });
const result = pattern.exec({ hostname: "foo.bar.example.com" });
console.log(result.hostname.groups.subdomain);
```

Will log 'foo.bar'.
2025-02-10 17:05:15 +00:00
Shannon Booth
46bfced9ad LibURL: Add representations of URLPattern{Init,Options,Input}
The URLPattern spec is intended to be implemented inside of LibURL, with
LibWeb only responsible for the IDL conversion layer, in a similar
manner to how URL is implemented.
2025-01-27 18:07:17 +00:00