Commit graph

39 commits

Author SHA1 Message Date
Shannon Booth
088b659abd LibURL: Do not treat port of 0 as a null port in Origin
It is not treated as the same thing in the specification, or by us in
other places too. This fixes 8 more origin related URL tests on WPT.
2024-10-05 10:46:30 +02:00
Shannon Booth
501f92b54e LibWeb+LibURL: Consolidate Origin parsing and serialization into LibURL
Because of the previous awkward factoring of Origin we had two
implementations of Origin serializing and creation. Move the
implementation of DOMURL::url_origin into URL::origin, and
instead use the implemenation of URL::Origin::serialize for
serialization (replacing URL::serialize_origin).

This happens to fix 8 URL subtests as the two implemenations had
diverged, and URL::serialize_origin was previously missing the spec
changes of: whatwg/url@eee49fd and whatwg/url@fff33c3
2024-10-05 10:46:30 +02:00
Shannon Booth
12ea470417 LibWeb+LibURL: Get blob's environments origin from URL's blob entry
This closer matches specification - and removes any dependency on
LibWeb in the implementation of DOMURL::url_origin.

It is also one step closer to moving BlobURLRegistry to a singleton
process to match LibWeb's multiprocess Worker architecture.
2024-10-05 10:46:30 +02:00
Shannon Booth
43973f2d0a LibURL: Define Origin methods depending on URL::Parser out of line
This is also in an effort towards resolving a future circular dependency
between Origin.h and URL.h
2024-10-05 10:46:30 +02:00
Shannon Booth
53ad982bb7 LibURL: Factor Host into a standalone header file
This will help solve a future circular dependency between URL.h
and Origin.h
2024-10-05 10:46:30 +02:00
Shannon Booth
dc401f49ea LibWeb+LibURL: Move HTML::Origin to URL::Origin
While Origin is defined in the HTML spec - this leaves us with quite an
awkward relationship as the URL spec makes use of AO's from what is
defined in the HTML spec.

To simplify this factoring, relocate Origin into LibURL.
2024-10-05 10:46:30 +02:00
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere 2024-10-04 13:19:50 +02:00
Andreas Kling
af68771dda AK+LibURL: Move CopyOnWrite<T> from LibURL to AK 2024-09-10 13:51:28 +02:00
Shannon Booth
ff71d8f2c9 LibURL+LibWeb: Pass a mutable reference URL to URL parser
If given, the spec expects the input URL to be manipulated on the fly
as it is being parsed, and may ignore any errors thrown by the URL
parser.

Previously, we were not exactly following the specs assumption here
which resulted in us needed to make awkward copies of the URL in these
situations.

For most cases this is not an issue. But it does cause problems for
situations where URL parsing would result in a failure (which is
ignored by the caller), and the URL is _partially_ updated
while parsing.

Such a situation can occur when setting the host of an href alongside a
port number which is not valid. It is expected that this situation will
result in the host being updates - but not the port number.

Adjust the URL parser API so that it mutates the URL given (if any), and
adjust the callers accordingly.

Fixes two tests on https://wpt.live/url/url-setters-a-area.window.html
2024-08-13 14:14:34 +02:00
Shannon Booth
4bb211ba88 LibURL: Make percent_encode_after_encoding infallible 2024-08-12 23:01:29 +01:00
Shannon Booth
84a09476ba LibURL: Update spec comment for validation error in authority state
See: 3e8cd02bb
2024-08-10 10:39:43 +02:00
Shannon Booth
84a7fead0e LibURL: Make percent_encode return a String
This simplifies a bunch of places which were needing to error check and
convert from a ByteString to String.
2024-08-10 10:39:43 +02:00
BenJilks
0ca5675d59 LibTextCodec: Implement iso-2022-jp encoder
Implements the `iso-2022-jp` encoder, as specified by
https://encoding.spec.whatwg.org/#iso-2022-jp-encoder
2024-08-08 17:49:58 +01:00
BenJilks
c1958437f9 LibWeb: Use text encoding from DOM when parsing URLs
This passes the DOM encoding down to the URL parser, so the correct
encoder can be used.
2024-08-08 17:49:58 +01:00
BenJilks
72d0e3284b LibTextCodec+LibURL: Implement utf-8 and euc-jp encoders
Implements the corresponding encoders, selects the appropriate one when
encoding URL search params. If an encoder for the given encoding could
not be found, fallback to utf-8.
2024-08-08 17:49:58 +01:00
Shannon Booth
6cac2981fb LibURL: Fail parsing IPV4 URLs starting with 0x that overflow
Parsing last as an IPV4 number was not returning true in "ends with a
number" as the parsing of that part was overflowing. This means that the
URL is not considered to be an IPv4 address, and is treated as a valid
domain.

Helpfully, the spec also points out in a note that this step is
equivalent to simply checking that the last part ends with 0x followed
by only hex digits - which doesn't suffer from any overflow problem!

Arguably this is an editorial issue in the spec where this should be
clarified a little bit. But for now, fixing this fixes 3 sub tests in
WPT for:

https://wpt.live/url/url-constructor.any.html
2024-08-06 23:08:12 +01:00
Shannon Booth
db3f118046 LibURL: Fix heuristic for URL domain parsing IDNA fast path
Our heuristic was a bit too simplistic and would not run through the
ToASCII unicode algorithm which performs some extra validation. This
would cause invalid URLs that should fail to be parsed be mistakenly
accepted.

This fixes 8 tests in: https://wpt.live/url/url-constructor.any.html
2024-08-06 23:08:12 +01:00
Shannon Booth
1dc4959e91 LibURL: Don't return early parsing a URL with an empty input
We can't simply use the base URL as it may need to be modified in some
form. For example - for the included test, the fragment was previously
being included in the resulting URL.

This fixes 1 test on https://wpt.live/url/url-constructor.any.html
2024-08-06 23:08:12 +01:00
Shannon Booth
d161602b6d LibURL: Fix method name in debug logging 2024-08-06 23:08:12 +01:00
Shannon Booth
d9927d128c LibURL: Remove unused URL::create_with_help_scheme
This is unused since the SerenityOS split.
2024-08-06 23:08:12 +01:00
Shannon Booth
8723f72f0f LibURL: Remove unspecified steps in URL file slash parsing state
There were some extra steps in there which produced wrong results for
relative file URLs.

Fixes 7 test cases in: https://wpt.live/url/url-constructor.any.html

We also need to adjust the test results in TestURL. The behaviour tested
does not match how URL is specified to work as an absolute relative is
given.
2024-08-06 07:58:07 +01:00
Shannon Booth
d6af5bf5eb LibURL: Allow inputs containing only whitespace
The check for:

```
    if (start_index >= end_index)
        return {};
```

To prevent an out of bounds when trimming the start and end of the input
of whitespace was preventing valid URLs (only having whitespace in the
input) from being parsed.

Instead, prevent start_index from ever getting above end_index in the
first place, and don't treat empty inputs as an error.

Fixes one WPT test on:

https://wpt.live/url/url-constructor.any.html
2024-08-05 17:21:26 +01:00
Shannon Booth
4f5af3e90e LibURL: Remove FIXME for stripping c0 control or space
The FIXME was not correct - this was done correctly. But let's use a
helper to make the implementation slightly more readable.
2024-08-05 17:21:26 +01:00
Shannon Booth
670ce3ebb1 LibURL: Remove not particuarly useful NOTE
This doesn't seem like something we will neccessarily do in the URL
class anyway due to performance reasons - unless strictly needed (like
for the DOMURL implementation).
2024-08-05 17:21:26 +01:00
Shannon Booth
41cf9f6fe3 LibURL: Also remove carriage returns from URL input
The definition of an "ASCII tab or newline" also includes U+000D CR.

This fixes 3 subtests in:

https://wpt.live/url/url-constructor.any.html
2024-08-05 11:27:05 +02:00
Shannon Booth
cc55732332 LibURL+Everywhere: Only percent decode URL paths when actually needed
Web specs do not return through javascript percent decoded URL path
components - but we were doing this in a number of places due to the
default behaviour of URL::serialize_path.

Since percent encoded URL paths may not contain valid UTF-8 - this was
resulting in us crashing in these places.

For example - on an HTMLAnchorElement when retrieving the pathname for
the URL of:

http://ladybird.org/foo%C2%91%91

To fix this make the URL class only return the percent encoded
serialized path, matching the URL spec. When the decoded path is
required instead explicitly call URL::percent_decode.

This fixes a crash running WPT URL tests for the anchor element on:

https://wpt.live/url/a-element.html
2024-08-05 09:58:13 +02:00
Shannon Booth
ffe070d7f9 LibWeb+LibURL: Use URL paths directly for comparison
This matches the text of the spec a little more closely in many cases
and is also more efficient than serializing the URL path.
2024-08-05 09:58:13 +02:00
Shannon Booth
fdf4f1e887 LibURL: Validate for invalid _domain_ code points for non-opaque domains
We were previously not checking for C0 control, U+0025 (%), or U+007F
DELETE.

This makes another good set of URL tests in WPT pass :^)
2024-08-04 18:29:06 +01:00
Shannon Booth
f511c0b441 LibURL+LibWeb: Do not percent decode in password/username getters
Doing it is not part of the spec. Whenever needed, the spec will
explicitly percent decode the username and password.

This fixes some URL WPT tests.
2024-08-04 12:59:02 +01:00
Shannon Booth
a661daea71 LibURL: Don't consider file:// URL hosts as always opaque
Which was resulting in file URL hosts not being correctly percent
decoded.
2024-08-04 10:37:33 +02:00
Shannon Booth
dd7c720657 LibURL: Remove note about bare-boned URL host parsing
It's not so bare-boned any longer :^)
2024-08-04 10:37:33 +02:00
Andreas Kling
936b76f36e LibURL: Make URL a copy-on-write type
This patch moves the data members of URL to an internal URL::Data struct
that is also reference-counted. URL then uses a CopyOnWrite<T> template
to give itself copy-on-write behavior.

This means that URL itself is now 8 bytes per instance, and copying is
cheap as long as you don't mutate.

This shrinks many data structures over in LibWeb land. As an example,
CSS::ComputedValues goes from 3024 bytes to 2288 bytes per instance.
2024-08-02 20:37:40 +02:00
Tim Ledbetter
1a4b042664 LibURL: Convert ASCII only URLs to lowercase during parsing
This fixes an issue where entering EXAMPLE.COM into the URL bar in the
browser would fail to load as expected.
2024-06-10 20:34:57 -04:00
Andreas Kling
f2fd8fc928 Everywhere: Remove LibGemini
This hasn't been maintained (or worked at all) for a long time,
and it's not a widely supported protocol, so let's drop it.
2024-06-04 09:19:39 +02:00
Shannon Booth
27e1f4762c LibURL: Add BlobURLEntry to URL 2024-05-12 15:46:29 -06:00
Andreas Kling
d568b15acf LibURL: Avoid expensive IDNA::to_ascii() for all-ASCII domain strings
20% of CPU usage when loading https://utah.edu/ was spent doing these
ASCII conversions in URL parsing.
2024-04-05 16:01:10 -06:00
Timothy Flynn
576c2f4f4d LibURL+LibUnicode+LibWebView: Handle punycode directly in LibURL
We had defined punycode handling in LibUnicode when LibURL (AK at the
time) was unable to depend on LibUnicode. This is no longer the case.
2024-03-26 12:25:21 -04:00
Timothy Flynn
24ecf31ff5 LibURL+LibWeb: Move data URL processing to LibWeb's fetch infrastructure
This is a fetching AO and is only used by LibWeb in the context of fetch
tasks. Move it to LibWeb with other fetch methods.

The main reason for this is that it requires the use of other LibWeb AOs
such as the forgiving Base64 decoder and MIME sniffing. These AOs aren't
available within LibURL.
2024-03-25 08:13:27 +01:00
Shannon Booth
e800605ad3 AK+LibURL: Move AK::URL into a new URL library
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.

This change has two main benefits:
 * Moving AK back more towards being an agnostic library that can
   be used between the kernel and userspace. URL has never really fit
   that description - and is not used in the kernel.
 * URL _should_ depend on LibUnicode, as it needs punnycode support.
   However, it's not really possible to do this inside of AK as it can't
   depend on any external library. This change brings us a little closer
   to being able to do that, but unfortunately we aren't there quite
   yet, as the code generators depend on LibCore.
2024-03-18 14:06:28 -04:00