Commit graph

39 commits

Author SHA1 Message Date
Shannon Booth
ec3c545426 LibURL+LibWeb: Ensure opaque paths always roundtrip
Corresponds to: https://github.com/whatwg/url/commit/6c782003
2025-03-18 12:17:19 +00:00
Shannon Booth
a9e20cb6c3 LibURL/Pattern: Use ConstructorStringParser to construct URLPatternInit 2025-03-15 07:39:03 -04:00
Shannon Booth
e369756e9c LibURL/Pattern: Implement the constructor string parser
This is missing one small bit of functionality where the not-yet
implemented component compilation is required.
2025-03-15 07:39:03 -04:00
Shannon Booth
e70272ddef LibURL/Pattern: Implement URL Pattern canonicalization
These are used to normalize URL components.
2025-03-15 07:39:03 -04:00
Shannon Booth
f775ee8a93 LibURL: Rename 'cannot be a base URL' state to 'opaque path' state
This follows a rename made in the URL specification.
2025-03-15 07:39:03 -04:00
Shannon Booth
f8f21319f9 LibURL/Pattern: Implement the URL Pattern Tokenizer
The tokenizer is used for both pattern string and constructor string
parsing of URL Patterns.
2025-03-15 07:39:03 -04:00
Timothy Flynn
a34f7a5bd1 LibURL: Correctly acquire the registrable domain for a URL
We were using the public suffix of the URL's host as its registrable
domain. But the registrable domain is actually the public suffix plus
one additional label.
2025-03-11 12:10:42 +01:00
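The distinction fixed in a34f7a5bd1 can be sketched as follows. This is a minimal illustration rather than LibURL's implementation: `registrable_domain_for` is a hypothetical helper, and it assumes the public suffix has already been looked up by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not LibURL's API): returns the public suffix plus the
// one label immediately to its left, or nullopt if the host has no such label.
std::optional<std::string> registrable_domain_for(std::string_view host, std::string_view public_suffix)
{
    assert(!public_suffix.empty());

    // The host needs room for at least "x." in front of the suffix.
    if (host.size() < public_suffix.size() + 2)
        return std::nullopt;

    std::size_t suffix_start = host.size() - public_suffix.size();
    if (host.substr(suffix_start) != public_suffix || host[suffix_start - 1] != '.')
        return std::nullopt;

    // Locate the start of the label that precedes the suffix.
    std::size_t label_end = suffix_start - 1; // index of the '.' before the suffix
    std::size_t previous_dot = host.rfind('.', label_end - 1);
    std::size_t label_start = previous_dot == std::string_view::npos ? 0 : previous_dot + 1;
    if (label_start == label_end)
        return std::nullopt; // empty label, e.g. "..com"

    return std::string(host.substr(label_start));
}
```

With this sketch, a host of "www.example.co.uk" and a public suffix of "co.uk" yield "example.co.uk", while a host that is itself only the public suffix yields no registrable domain.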
Vishal Biswas
90b303215e LibURL: Add U+005E to path percent encoding list
Passes wpt tests which were failing after
9bc33c39d4.

It also removes ^ from the Userinfo set, as it's included in the Path set now.
2025-03-10 11:19:36 +01:00
Shannon Booth
10b32a8dd8 LibURL/Pattern: Stub out URL::Pattern::match
This will allow us to complete the IDL interface, which will leave
remaining work to implement the URL pattern specification within
LibURL.
2025-03-04 16:32:09 -05:00
Shannon Booth
ff07cc1a6c LibURL/Pattern: Add some scaffolding for the URLPattern constructor 2025-03-04 16:32:09 -05:00
Shannon Booth
873f7e4b3d LibURL/Pattern: Add a representation of a URL Pattern error
As the comment in this file explains, callers of LibURL APIs are
meant to assume that any error they see is a TypeError, since
that is all the spec throws at the moment.

A custom error type exists here so that we can include more
information in the TypeErrors which are thrown.
2025-03-04 16:32:09 -05:00
Shannon Booth
de89f5af6d LibURL: Remove the implicit URL constructors
All URLs are now either constructed through the URL Parser or by
default constructing a URL, and setting each of the fields of that
URL manually. This makes it much more difficult to create invalid
URLs.
2025-03-04 16:24:19 -05:00
zoupingshi
b609d8481a LibURL+LibWeb+Tests: Remove redundant words 2025-02-27 10:35:39 +00:00
Shannon Booth
d62cf0a807 Everywhere: Remove some use of the URL constructors
These make it too easy to construct an invalid URL, which makes it
difficult to remove the valid state of URL - which this API relies
on.
2025-02-19 08:01:35 -05:00
Shannon Booth
f3662c6f88 LibURL/Pattern: Add a representation of a URL Pattern
This is the core object behind a URL pattern which when constructed
can be used for matching the pattern against URLs.

However, the implementation here is missing key functions such as
the constructor and the 'test'/'exec' functions as that relies on
a significant amount of supporting URLPattern infrastructure such
as two different parsers and a tokenizer.

However, this is enough for us to implement some more of the IDL
wrapper layer of this specification.
2025-02-17 19:10:39 -05:00
Shannon Booth
5521836929 LibURL/Pattern: Add a representation of a URL Pattern 'component'
A URL pattern consists of components such as the 'port', 'password',
'hostname', etc. A component is compiled from the input to the
URLPattern constructor and is what is used for matching against
URLs to produce a match result.

This is also where the regex dependency is introduced into LibURL
to support the URLPattern implementation.
2025-02-17 19:10:39 -05:00
Shannon Booth
07f054e067 LibURL: Add 'about:XXX' helper factory functions
Currently we create URLs such as 'about:blank' through the StringView
or ByteString constructor of URL. However, in order to eliminate the
use of URL::is_valid, we need to get rid of these constructors as it
makes it way too easy to create an invalid URL.

It is very cumbersome to construct an 'about:blank' URL when using
URL::Parser::basic_parse. So instead of doing that, create some
helper functions which will create the 'about:XXX' URLs with the
correct properties set.

Conveniently, this is also a much faster way of creating these URLs
as it means we do not need to parse the URL and can set all of the
members up front.
2025-02-15 17:05:55 +00:00
Shannon Booth
53826995f6 LibURL+LibWeb: Port URL::complete_url to Optional
Removing one more source of the URL::is_valid API.
2025-02-15 17:05:55 +00:00
Shannon Booth
dc2c62825b LibURL: Add a representation of a URL Pattern 'result'
This is the return value of a URLPattern after `exec` is called on it.
It conveys information about the named (or unnamed) regex groups
matched for each component of the URL. For example,

```
const pattern = new URLPattern({ hostname: "{:subdomain.}*example.com" });
const result = pattern.exec({ hostname: "foo.bar.example.com" });
console.log(result.hostname.groups.subdomain);
```

Will log 'foo.bar'.
2025-02-10 17:05:15 +00:00
Shannon Booth
46bfced9ad LibURL: Add representations of URLPattern{Init,Options,Input}
The URLPattern spec is intended to be implemented inside of LibURL, with
LibWeb only responsible for the IDL conversion layer, in a similar
manner to how URL is implemented.
2025-01-27 18:07:17 +00:00
Shannon Booth
ca3d9d9ee0 LibURL+LibWeb+LibIPC: Represent blob URL entry's object using structs
Instead of just putting in members directly, wrap them up in structs
which represent what a URL blob entry is meant to hold per the spec.
This makes it more obvious what this is meant to represent, such as the
ByteBuffer being used to represent the bytes behind a Blob.

This also allows us to use a stronger type for a function that needs
to return a Blob URL entry's object.
2025-01-21 19:22:07 +00:00
Sam Atkins
9a7ce901b7 LibURL: Gracefully handle a host having no public suffix
Specifically, after implementing some recent spec changes to navigables,
we end up calling `get_public_suffix("localhost")` here, which returns
OptionalNone. This would previously crash.

Our get_public_suffix() seems a little incorrect. From the spec:
> If no rules match, the prevailing rule is "*".
> https://github.com/publicsuffix/list/wiki/Format#algorithm

However, ours returns an empty Optional in that case. To avoid breaking
other users of it, this patch modifies Host's uses of it, rather than
the function itself.
2025-01-21 18:17:18 +01:00
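The default "*" rule quoted in 9a7ce901b7 means that, when no rule matches, the public suffix is the host's rightmost label. The following is a sketch of one way a caller could apply that fallback. `public_suffix_or_default` is a hypothetical name, and this is not necessarily how the commit itself handles the empty result.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: `matched_suffix` stands in for the result of a public
// suffix lookup that returns nothing when no rule matched.
std::string public_suffix_or_default(std::string_view host, std::optional<std::string> const& matched_suffix)
{
    if (matched_suffix.has_value())
        return *matched_suffix;

    // No rule matched: the prevailing rule is "*", i.e. the rightmost label.
    std::size_t last_dot = host.rfind('.');
    if (last_dot == std::string_view::npos)
        return std::string(host);
    return std::string(host.substr(last_dot + 1));
}
```

For "localhost" with no matching rule, this returns "localhost" instead of an empty value.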
Shannon Booth
5bed8f4055 LibURL+LibWeb: Make URL::basic_parse return an Optional<URL>
URL::basic_parse has a subtle bug where the resulting URL is not set
to valid when StateOverride is provided and the URL parser early returns
a valid URL.

This has not surfaced as a problem so far, as the only users of the
state override API provide an already valid URL buffer and also ignore
the result of basic parsing with a state override.

However, this bug surfaces implementing the URL pattern spec, which as
part of URL canonicalization:
 * Provides a dummy URL record
 * Basic URL parses that URL with state override
 * Checks the result of the URL parser to validate the URL

While we could set URL validity on every early return of the URL parser
during state override, it has been a long standing FIXME around the code
to try and remove the awkward validity state of the URL class. So this
commit makes the first stage of this change by migrating the basic
parser API to return Optional, which also happens to make this subtle
issue not a problem any more.
2025-01-11 10:08:29 -05:00
Shannon Booth
87c8ae31d3 LibURL: Set IDNA's IgnoreInvalidPunycode to false
See: https://github.com/whatwg/url/commit/a6e449 - which should have no
functional change.
2024-12-05 17:29:49 +01:00
Shannon Booth
5dfb825c5c LibURL: Set IDNA's CheckHyphens to the value of beStrict
See: https://github.com/whatwg/url/commit/cd8f1d
2024-12-05 17:29:49 +01:00
Shannon Booth
24267db6b2 LibURL: Implement "find the IPv6 address compressed piece index" helper
This was an editorial change in the spec to put a somewhat complex spec
step in its own AO.
2024-12-05 17:29:49 +01:00
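For reference, the helper named in 24267db6b2 finds the first longest run of two or more zero pieces in an IPv6 address, which the serializer then compresses to "::". The sketch below follows the steps of the URL specification's algorithm as I understand them. The function name and types are assumptions, not LibURL's actual signature.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed signature: an IPv6 address is modelled as eight 16-bit pieces.
std::optional<std::size_t> find_compressed_piece_index(std::array<std::uint16_t, 8> const& pieces)
{
    std::optional<std::size_t> longest_index;
    std::size_t longest_size = 1; // a run must span more than one piece to count
    std::optional<std::size_t> found_index;
    std::size_t found_size = 0;

    for (std::size_t i = 0; i < pieces.size(); ++i) {
        if (pieces[i] != 0) {
            // A run of zeros just ended; keep it only if strictly longer.
            if (found_size > longest_size) {
                longest_index = found_index;
                longest_size = found_size;
            }
            found_index.reset();
            found_size = 0;
        } else {
            if (!found_index.has_value())
                found_index = i;
            ++found_size;
        }
    }

    // Account for a run that extends to the final piece.
    if (found_size > longest_size)
        return found_index;
    return longest_index;
}
```

Because the comparison is strict, the earlier of two equally long runs is the one reported, and an address with only isolated zero pieces yields no index.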
Shannon Booth
0b4670fb7c LibURL: Percent decode over byte sequence
Instead of going over UTF-8 code points. This better follows the spec,
and is also more performant.
2024-12-05 17:29:49 +01:00
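A byte-oriented percent decoder, as described in 0b4670fb7c, can be sketched roughly as below. It walks raw bytes and never decodes UTF-8 code points, so decoded bytes are appended as-is. The names are illustrative and are not LibURL's.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns the numeric value of an ASCII hex digit, or -1 if it is not one.
static int hex_digit_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Illustrative name: decodes "%XX" sequences byte by byte.
std::string percent_decode_bytes(std::string_view input)
{
    std::string output;
    output.reserve(input.size());

    for (std::size_t i = 0; i < input.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(input[i]);
        if (byte == '%' && i + 2 < input.size()) {
            int high = hex_digit_value(static_cast<unsigned char>(input[i + 1]));
            int low = hex_digit_value(static_cast<unsigned char>(input[i + 2]));
            if (high >= 0 && low >= 0) {
                output.push_back(static_cast<char>(high * 16 + low));
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        // Not a valid escape: pass the byte through unchanged.
        output.push_back(static_cast<char>(byte));
    }
    return output;
}
```

A '%' that is not followed by two hex digits is copied through unchanged, and the result is a byte string that the caller may or may not interpret as UTF-8.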
Shannon Booth
0fa54c2327 LibURL+LibWeb: Make URL::serialize return a String
Simplifying a bunch of unneeded error handling around the place.
2024-12-04 16:34:13 +00:00
Jonne Ransijn
d7596a0a61 AK: Don't implicitly convert Optional<T&> to Optional<T>
C++ will jovially select the implicit conversion operator, even if it's
completely bogus, such as for unknown-size types or non-destructible
types. Therefore, all such conversions (which incur a copy) must
(unfortunately) be explicit so that non-copyable types continue to work.

NOTE: We make an exception for trivially copyable types, since they
are, well, trivially copyable.

Co-authored-by: kleines Filmröllchen <filmroellchen@serenityos.org>
2024-12-04 01:58:22 +01:00
Sam Atkins
900c131178 LibURL: Make URL::serialized_host() infallible
This can no longer fail, so update the return type to match.

This makes a few more methods now unable to return errors, but one thing
at a time. 😅
2024-11-30 12:07:39 +01:00
Sam Atkins
b83f015c70 LibURL: Implement Site concept 2024-11-30 12:07:39 +01:00
Sam Atkins
2e64e0b836 LibURL: Migrate Origin scheme from ByteString to String 2024-11-30 12:07:39 +01:00
Sam Atkins
7f7f6e490b LibURL: Implement Host::public_suffix() and registrable_domain()
These algorithms are used in following commits.
2024-11-30 12:07:39 +01:00
Sam Atkins
63688148b9 LibURL: Promote Host to a proper class
This lets us move a few Host-related functions (like serialization and
checks for what the Host is) into Host instead of having them dotted
around the codebase.

For now, the interface is still very Variant-like, to avoid having to
change quite so much in one go.
2024-11-30 12:07:39 +01:00
Sam Atkins
90e763de4c LibURL: Replace Host's Empty state with making Url's Host optional
A couple of reasons:
- Origin's Host (when in the tuple state) can't be null
- There's an "empty host" concept in the spec which is NOT the same as a
  null Host, and that was confusing me.
2024-11-30 12:07:39 +01:00
Sam Atkins
8b984c0c57 LibURL: Clarify whether an Origin is opaque
Origins are immutable and we know on construction whether an Origin is
opaque. This also removes an implicit reliance on Host's Empty state.
2024-11-30 12:07:39 +01:00
Sam Atkins
3124dca528 LibURL+LibWebView: Move public suffix data to LibURL 2024-11-30 12:07:39 +01:00
Shannon Booth
8f6fe1de83 LibURL+LibWeb: Make URL serialization return a String
This can only ever fail from OOM, and will never be a string containing
random byte sequences.
2024-11-23 16:43:55 +01:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00