We were using the public suffix of the URL's host as its registrable
domain. But the registrable domain is actually the public suffix plus
one additional label.
As the comment in this file explains the caller of LibURL APIs are
meant to assume if they see any error, that it is a TypeError since
that is all the spec throws at the moment.
A custom error type exists here so that we can include more
information in TypeError's which are thrown.
All URLs are now either constucted through the URL Parser or by
default constructing a URL, and setting each of the fields of that
URL manually. This makes it much more difficult to create invalid
URLs.
This is the core object behind a URL pattern which when constructed
can be used for matching the pattern against URLs.
However, the implementation here is missing key functions such as
the constructor and the 'test'/'exec' functions as that relies on
a significant amount of supporting URLPattern infrastructure such
as two different parsers and a tokenizer.
However, this is enough for us to implement some more of the IDL
wrapper layer of this specification.
A URL pattern consists of components such as the 'port', 'password'
'hostname', etc. A component is compiled from the input to the
URLPattern constructor and is what is used for matching against
URLs to produce a match result.
This is also where the regex dependency is introduced into LibURL
to support the URLPattern implementation.
Currently we create URLs such as 'about:blank' through the StringView
or ByteString constructor of URL. However, in order to elimate the
use of URL::is_valid, we need to get rid of these constructors as it
makes it way too easy to create an invalid URL.
It is very cumbersome to construct an 'about:blank' URL when using
URL::Parser::basic_parse. So instead of doing that, create some
helper functions which will create the 'about:XXX' URLs with the
correct properties set.
Conveniently, this is also a much faster way of creating these URLs
as it means we do not need to parse the URL and can set all of the
members up front.
This is the return value of a URLPattern after `exec` is called on it.
It conveys information about the named (or unammed) regex groups
matched for each component of the URL. For example,
```
let p = new URLPattern({ hostname: "{:subdomain.}*example.com" });
const result = pattern.exec({ hostname: "foo.bar.example.com" });
console.log(result.hostname.groups.subdomain);
```
Will log 'foo.bar'.
The URLPattern spec is intended to be implemented inside of LibURL, with
LibWeb only responsible for the IDL conversion layer, in a similar
manner to how URL is implemented.
Instead of just putting in members directly, wrap them up in structs
which represent what a URL blob entry is meant to hold per the spec.
This makes more obvious what this is meant to represent, such as the
ByteBuffer being used to represent the bytes behind a Blob.
This also allows us to use a stronger type for a function that needs
to return a Blob URL entry's object.
Specifically, after implementing some recent spec changes to navigables,
we end up calling `get_public_suffix("localhost")` here, which returns
OptionalNone. This would previously crash.
Our get_public_suffix() seems a little incorrect. From the spec:
> If no rules match, the prevailing rule is "*".
> https://github.com/publicsuffix/list/wiki/Format#algorithm
However, ours returns an empty Optional in that case. To avoid breaking
other users of it, this patch modifies Host's uses of it, rather than
the function itself.
URL::basic_parse has a subtle bug where the resulting URL is not set
to valid when StateOveride is provided and the URL parser early returns
a valid URL.
This has not surfaced as a problem so far, as the only users of the
state override API provide an already valid URL buffer and also ignore
the result of basic parsing with a state override.
However, this bug surfaces implementing the URL pattern spec, which as
part of URL canonicalization:
* Provides a dummy URL record
* Basic URL parses that URL with state override
* Checks the result of the URL parser to validate the URL
While we could set URL validity on every early return of the URL parser
during state override, it has been a long standing FIXME around the code
to try and remove the awkward validity state of the URL class. So this
commit makes the first stage of this change by migrating the basic
parser API to return Optional, which also happens to make this subtle
issue not a problem any more.
C++ will jovially select the implicit conversion operator, even if it's
complete bogus, such as for unknown-size types or non-destructible
types. Therefore, all such conversions (which incur a copy) must
(unfortunately) be explicit so that non-copyable types continue to work.
NOTE: We make an exception for trivially copyable types, since they
are, well, trivially copyable.
Co-authored-by: kleines Filmröllchen <filmroellchen@serenityos.org>
This lets us move a few Host-related functions (like serialization and
checks for what the Host is) into Host instead of having them dotted
around the codebase.
For now, the interface is still very Variant-like, to avoid having to
change quite so much in one go.
A couple of reasons:
- Origin's Host (when in the tuple state) can't be null
- There's an "empty host" concept in the spec which is NOT the same as a
null Host, and that was confusing me.