These were used to provide a layer of abstraction between ResourceLoader
and the networking backend. Now that we only have RequestServer, we can
remove these adapters to make the code a bit easier to follow.
The web server for WPT has a tendency to just disconnect after sending
us a resource. This makes curl think an error occurred, but it's
actually still recoverable and we have the data.
So instead of just bailing, do what we already do for other kinds of
resources and try to parse the data we got. If it works out, great!
It would be nice to solve this in the networking layer instead, but
I'll leave that as an exercise for our future selves.
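A minimal sketch of the idea, using plain libcurl rather than Ladybird's
actual networking code (the helper name and surrounding plumbing are
hypothetical):

```cpp
#include <curl/curl.h>
#include <string>

// Hypothetical helper: decide whether a curl "error" caused by an abrupt
// disconnect is recoverable because we already received a usable body.
static bool is_recoverable(CURL* handle, CURLcode result, std::string const& body)
{
    if (result == CURLE_OK)
        return true;
    long http_status = 0;
    curl_easy_getinfo(handle, CURLINFO_RESPONSE_CODE, &http_status);
    // The server hung up early, but the status line was fine and we have
    // data, so try to parse what we got instead of bailing.
    return (result == CURLE_RECV_ERROR || result == CURLE_PARTIAL_FILE)
        && http_status == 200
        && !body.empty();
}
```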
Web specs do not return percent-decoded URL path components through
JavaScript, but we were doing this in a number of places due to the
default behaviour of URL::serialize_path.
Since a percent-encoded URL path may not decode to valid UTF-8, this
was resulting in us crashing in these places.
For example, on an HTMLAnchorElement, when retrieving the pathname for
the URL:
http://ladybird.org/foo%C2%91%91
To fix this, make the URL class return only the percent-encoded
serialized path, matching the URL spec. When the decoded path is
required, explicitly call URL::percent_decode instead.
This fixes a crash running WPT URL tests for the anchor element on:
https://wpt.live/url/a-element.html
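A sketch of the intended call pattern after this change, based on the
method names above (the URL construction shown is illustrative):

```cpp
URL url("http://ladybird.org/foo%C2%91%91");

// Always safe: the serialized path stays percent-encoded, so it is plain ASCII.
auto pathname = url.serialize_path(); // "/foo%C2%91%91"

// Only decode when a caller explicitly needs the decoded form. The result may
// not be valid UTF-8 (here the trailing %91 decodes to a lone 0x91 byte), so
// it has to be handled as bytes rather than as a UTF-8 string.
auto decoded = URL::percent_decode(pathname);
```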
This will reduce log noise when visiting sites
that have a lot of filtered content.
Previously, red error text would be displayed in
the logs for each filtered URL.
Before we had HTTP::HeaderMap (which preserves multiple headers with the
same name), we collected multiple "Set-Cookie" headers and bundled them
together as a JSON array.
This was a huge hack, and we can stop doing it now that LibWeb gets
access to the full set of headers.
Instead of using a HashMap<ByteString, ByteString, CaseInsensitive...>
everywhere, we now encapsulate this in a class.
Even better, the new class also allows keeping track of multiple headers
with the same name! This will make it possible for HTTP responses to
actually retain all their headers on the perilous journey from
RequestServer to LibWeb.
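The core of such a class can be sketched in a few lines; this is a
simplified stand-in with hypothetical names, not the actual
HTTP::HeaderMap:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

struct Header {
    std::string name;
    std::string value;
};

class HeaderMap {
public:
    // Append a header; duplicates with the same name are kept, not merged.
    void append(std::string name, std::string value)
    {
        m_headers.push_back({ std::move(name), std::move(value) });
    }

    // Return every value for a given name, e.g. all "Set-Cookie" headers.
    std::vector<std::string> get_all(std::string const& name) const
    {
        std::vector<std::string> values;
        for (auto const& header : m_headers) {
            if (equals_case_insensitive(header.name, name))
                values.push_back(header.value);
        }
        return values;
    }

private:
    static bool equals_case_insensitive(std::string const& a, std::string const& b)
    {
        return a.size() == b.size()
            && std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
                   return std::tolower((unsigned char)x) == std::tolower((unsigned char)y);
               });
    }

    std::vector<Header> m_headers;
};
```

With this shape, every "Set-Cookie" value survives intact instead of
being flattened into one entry, which is exactly what the JSON-array
hack above had to work around.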
This adds an alternate API to ResourceLoader to load HTTP/HTTPS/Gemini
requests unbuffered. Most of the changes here are moving parts of the
existing ResourceLoader::load method to helper methods so they can be
re-used by the new ResourceLoader::load_unbuffered.
LibWeb will need to use unbuffered requests to support server-sent
events. Connections for such events remain open and the remote end sends
data as HTTP bodies at its leisure. The browser needs to be able to
handle this data as it arrives, as the request essentially never
finishes.
To support this, this makes Protocol::Request operate in one of two
modes: buffered or unbuffered. The existing mechanism for setting up a
buffered request was a bit awkward; you had to set specific callbacks,
but be sure not to set some others, and then set a flag. The new
mechanism is to set the mode and the callbacks that the mode needs in
one API.
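A sketch of what the two setup paths look like; the callback names and
signatures here are assumptions based on the description above, not the
exact LibProtocol API:

```cpp
// Buffered: one callback fires once, after the entire body has arrived.
request.set_buffered_request_finished_callback(
    [](bool success, HeaderMap const& headers, ReadonlyBytes whole_body) {
        // Parse the complete resource in one go.
    });

// Unbuffered: separate callbacks fire for the headers and for each chunk of
// body data, which is what server-sent events need since the request
// effectively never finishes.
request.set_unbuffered_request_callbacks(
    [](HeaderMap const& headers, Optional<u32> status_code) { /* headers in */ },
    [](ReadonlyBytes chunk) { /* handle data as it streams in */ },
    [](bool success) { /* connection closed */ });
```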
This is to avoid including any LibProtocol header in Objective-C source
files, which will cause a conflict between the Protocol namespace and a
@Protocol interface.
See Ladybird/AppKit/Application/ApplicationBridge.cpp for why this
conflict unfortunately cannot be worked around.
This is a fetching AO and is only used by LibWeb in the context of fetch
tasks. Move it to LibWeb with other fetch methods.
The main reason for this is that it requires the use of other LibWeb AOs
such as the forgiving Base64 decoder and MIME sniffing. These AOs aren't
available within LibURL.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
This was blocked because it can be used for cross-protocol attacks on
some network printers. However, it's also used by the web platform
tests. One can argue that getting WPT working is more important than
theoretical attacks on poorly configured printers.
I noticed while debugging a fully downloaded page that it was trying
to preconnect to a file:// host. That doesn't make any sense, so let's
add a tiny bit of logic to ignore preconnect requests for file: and
data: URLs.
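The check itself is tiny; a sketch with a hypothetical helper name:

```cpp
#include <string>

// There is no socket to warm up for local files or inline data, so bail
// before asking RequestServer to preconnect.
static bool should_skip_preconnect(std::string const& scheme)
{
    return scheme == "file" || scheme == "data";
}
```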
The resource:// scheme is used for Core::Resource files. Currently, any
users of resource:// URLs in Ladybird must manually create the Resource
and extract its data. This will allow for passing the resource:// URL
along for LibWeb to handle.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l '\bDeprecatedString\b|deprecated_string' AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pi -e 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep '\.cpp\|\.h')
$ gn format $(git ls-files '*.gn' '*.gni')
This is a hack on top of a hack because Workers don't *really* need to
have a Web::Page at all, but the ResourceLoader infra that should be
going away soon ™️ is not quite ready to axe that requirement for
cookies.
Before, navigator.platform would always report the platform as "Serenity
OS", regardless of whether or not that was true. It also did not include
the architecture, which Firefox and Chrome both do. Now, it can report
either "Linux x86_64" or "SerenityOS AArch64".
Parsing 'data:' URLs took its own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.
Because parsing 'data:' didn't use standard fields, running the
following JS code:
new URL('#a', 'data:text/plain,hello').toString()
not only cleared the path, as URLParser doesn't check for data from the
data_payload() function (making the result 'data:#a'), but also crashed
the program, because we forbid having an empty MIME type when we
serialize to string.
With this change, 'data:' URLs are parsed like every other URL.
To decode the contents of a 'data:' URL, call process_data_url() on it,
which returns a struct containing the MIME type and the already-decoded
data! :^)
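A sketch of the new flow (the struct's field names here are assumed from
the description above):

```cpp
URL url("data:text/plain;base64,aGVsbG8=");

// process_data_url() validates the URL and decodes the payload in one step.
auto data_url = TRY(url.process_data_url());
// data_url.mime_type -> "text/plain"
// data_url.body      -> the already-decoded bytes, "hello"
```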
This makes the loader more agnostic.
Additionally, this allows us to load a tab in Ladybird with a 'data:'
URL containing parameters, as a Resource will now call
`mime_type_from_content_type` to extract the MIME type from the content
type. :^)
This now defaults to serializing the path with percent-decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.
The name has been changed to serialize_path() to make it clearer that
this method generates a new string on each call (except in the
cannot_be_a_base_url() case). A few callers have been updated to avoid
calling this function repeatedly.