No code now relies on using URL's valid state.
A URL can still be _technically_ invalid through use of the URL
constructor or by directly changing URL fields.
However, all URLs should be constructed through the URL parser,
and we should ideally be getting rid of the default constructor
at some stage.
Also, any code which is manually setting URL fields need to be
aware that this is full of pitfalls since there are many different
forms of canonicalization which is bypassed by not going through
the URL parser.
Creating a URL should almost always go through the URLParser to
handle all of the small edge cases involved. This reduces the
need for URL valid state.
There's a bit of a UTF-8 assumption with this change. But nearly every
caller of these methods were immediately creating a String from the
resulting ByteString anyways.
We were not properly handling the case that prefix code point was the
empty string (which we represent as an OptionalNone). While this
still resulted in the correct pattern string being generated, an
incorrect regular expression was being generated causing matching
to fail.
This has no functional difference as far as I can tell, but for
clarity explicitly do not attempt to do this, which has the nice
side effect of not checking for whitespace known to not exist.
It turns out that the problem here was simply that we were trimming
trailing whitespace when we did not need to, which was meaning that
the port number of '80 ' was being converted to the empty string
per URLPattern elision as the port matches the http scheme.
Corresponds to URL spec change:
https://github.com/whatwg/url/commit/cc8b776b
Note that the new test failure being introduced here is an unrelated
WPT test change bundled in the resources test file update that I am
not convinced is correct.
The URL spec represents its path as a:
Variant<String, Vector<String>>
A URL is defined has having an opaque path if it has a single String,
the URL path otherwise containing a list of path components.
We (like in an older version of the spec) track this through a boolean
and only use a Vector with a single component for opaque paths.
This means it was incorrect to simple assign the path to a list with
a single empty string without setting that URL as opaque, which
meant that the path serialization was producing incorrect results.
It may make sense changing the API so this situation is a little more
clear. But for now, we simply need to set the opaque path boolean
to true here.
This provides the infrastructure for taking a part list from the
pattern parser and generating the actual regexp object which is
used for matching against URLs from the pattern.
Compiling a URLPattern component will generate a 'parts list' which
is used for generating the regular expression that is used for
matching against URLs.
This parts list is also used to generate (through this function) a
pattern string. The pattern string of a URL component is what is
exposed on the USVString getters of the URLPattern class itself.
As an example, the following:
```
let pattern = new URLPattern({ "pathname": "/foo/(.*)*" });
console.log(pattern.pathname);
```
Will log the pattern string of: '/foo/**'.
These control how a pattern string is generated, which can vary for
different components and is also impacted by the 'ignoreCase' option
that can be provided in the URLPattern constructor.
This is to save a future name conflict that will appear between
the options IDL dictionary and the options struct that are both
present in the spec.
It is also a nicer interface for now given there is only a single
option at the moment.
We were using the public suffix of the URL's host as its registrable
domain. But the registrable domain is actually the public suffix plus
one additional label.
As the comment in this file explains the caller of LibURL APIs are
meant to assume if they see any error, that it is a TypeError since
that is all the spec throws at the moment.
A custom error type exists here so that we can include more
information in TypeError's which are thrown.
All URLs are now either constucted through the URL Parser or by
default constructing a URL, and setting each of the fields of that
URL manually. This makes it much more difficult to create invalid
URLs.
This is the core object behind a URL pattern which when constructed
can be used for matching the pattern against URLs.
However, the implementation here is missing key functions such as
the constructor and the 'test'/'exec' functions as that relies on
a significant amount of supporting URLPattern infrastructure such
as two different parsers and a tokenizer.
However, this is enough for us to implement some more of the IDL
wrapper layer of this specification.
A URL pattern consists of components such as the 'port', 'password'
'hostname', etc. A component is compiled from the input to the
URLPattern constructor and is what is used for matching against
URLs to produce a match result.
This is also where the regex dependency is introduced into LibURL
to support the URLPattern implementation.
Currently we create URLs such as 'about:blank' through the StringView
or ByteString constructor of URL. However, in order to elimate the
use of URL::is_valid, we need to get rid of these constructors as it
makes it way too easy to create an invalid URL.
It is very cumbersome to construct an 'about:blank' URL when using
URL::Parser::basic_parse. So instead of doing that, create some
helper functions which will create the 'about:XXX' URLs with the
correct properties set.
Conveniently, this is also a much faster way of creating these URLs
as it means we do not need to parse the URL and can set all of the
members up front.