Commit graph

49 commits

Author SHA1 Message Date
Gingeh
8e342e3e23 LibWeb: Use Content-Type header to set document encoding
Co-authored-by: Shannon Booth <shannon@serenityos.org>
2024-10-23 11:30:59 -06:00
0x4261756D
c1a14f66ad HTMLEncodingDetection: Use mime type in encoding sniffing
Also added proper spec comments.
Fixes at least one WPT test that was failing previously:
https://wpt.live/encoding/single-byte-decoder.window.html?document
2024-10-12 16:14:38 +02:00
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere 2024-10-04 13:19:50 +02:00
Khaled Lakehal
2565757c7a LibWeb: Set document type to HTML for text and media documents
This update fixes an issue where the document type was incorrectly set
to XML instead of HTML when initializing text and media documents.
2024-08-30 08:28:16 -04:00
Andreas Kling
f2fd8fc928 Everywhere: Remove LibGemini
This hasn't been maintained (or worked at all) for a long time,
and it's not a widely supported protocol, so let's drop it.
2024-06-04 09:19:39 +02:00
Andreas Kling
e4cd91761d Everywhere: Remove LibMarkdown
This was used to convert markdown into HTML for display in the browser,
but no other browser behaves this way, so let's simplify things by
removing it.

(Yes, we could implement all kinds of "convert to HTML and display" for
every file format out there, but that's far outside the scope of a
browser engine.)
2024-06-04 09:19:39 +02:00
Timothy Flynn
b5ba60f1d1 LibWeb: Change Fetch's ProcessBodyError to accept a plain JS value
This callback is meant to be triggered by streams, which does not always
provide a WebIDL::DOMException. Pass a plain value instead. Of all the
users of this callback, only one actually uses the value, and already
converts the DOMException to a plain value.
2024-05-20 16:57:52 -04:00
Timothy Flynn
1ffda6a805 LibWeb: Propagate OOM in Body::fully_read() through its error callback
Fetched bodies can be on the order of gigabytes, so rather than crashing
when we hit OOM here, we can simply invoke the error callback with a DOM
exception. We use "UnknownError" here as the spec directly supports this
for OOM errors:

    UnknownError: The operation failed for an unknown transient reason
                  (e.g. out of memory).

This is still an ad-hoc implementation. We should be using streams, and
we do have the AOs available to do so. But they need to be massaged to
be compatible with callers of Body::fully_read. And once we do use
streams, this function will become infallible - so making it infallible
here is at least a step in the right direction.
2024-04-27 07:08:14 +02:00
Timothy Flynn
c79f46fe6f LibWeb: Remove OOM propagation from Fetch::Infrastructure::Headers 2024-04-27 07:08:14 +02:00
Andreas Kling
184368285c LibWeb: Fix GC leaks in Fetch::Infrastructure::Body::fully_read()
By making this function accept the success and error steps as
HeapFunction rather than SafeFunction, we break a bunch of strong
GC cycles.
2024-04-23 12:50:40 +02:00
Andreas Kling
5ef6df79ed LibWeb: Generate a simple error page when XML decode/parse fails
This fixes a regression on Acid3, since we are not expected to "best
effort" parse XML. The test specifically checks that you don't create an
incomplete, incorrect DOM.
2024-04-19 11:44:32 +02:00
Andreas Kling
fa43321938 LibWeb: Fire iframe load event for frames with badly encoded XML
When loading an XML resource into an iframe and the resource fails to
decode (e.g due to invalid UTF-8), we must still fire a load event.

This fixes the regression in subtest 69 of Acid3.
2024-04-18 15:40:16 +00:00
Aliaksandr Kalenik
768b1455d6 LibWeb: Run <object> fallback steps if data type is not supported
Progress on fixing regressed Acid2.
2024-04-16 13:47:38 +02:00
Aliaksandr Kalenik
ba633882bf LibWeb: Change load_html_document to run parsing from deferred_invoke()
...callback, otherwise Networking task source will be blocked until the
end of HTML parsing.

This is a preparation before forbidding to interleave HTML tasks with
the same source.
2024-04-10 07:36:42 +02:00
Timothy Flynn
48fb343230 LibWeb: Change HTMLParser's factory to accept the encoding as StringView
No need to force an allocation. This makes a future patch a bit simpler,
where we will have the encoding as a String. With this patch, we won't
have to convert it to a ByteString.
2024-04-04 11:23:21 +02:00
Aliaksandr Kalenik
ffd3639b17 LibWeb: Pass navigation params by const-ref to load_document() 2024-03-28 15:34:52 +01:00
ronak69
05e53b5fd3 LibWeb: Use file's URL as document's URL when loading markdown files
A markdown file gets loaded as an inline content document by
`create_document_for_inline_content()`, for which the default document
URL is "about:error". That breaks the fragment links.

Overriding "about:error" URL by passing the URL of the just loaded
markdown file as an argument to `HTMLParser::run()` ensures that the URL
of the document is as expected.
2024-02-08 07:58:07 -07:00
Bastiaan van der Plaat
e33ae9a58b LibWeb: Improve media document styling to center image 2024-01-07 10:16:24 +01:00
Sam Atkins
5e2fc52b25 LibWeb: Remove old ad-hoc document-loading code
This also removes the code for displaying `gemini://` documents. We
currently don't load documents from that protocol anyway - we hit
`attempt_to_create_a_non_fetch_scheme_document()` in `Navigable.cpp`
which is just a stub. It looks like we should be handling those
separately from regular "fetch" documents, so that's a task for a
future person.
2023-12-26 18:35:29 +01:00
Sam Atkins
9733524f8a LibWeb: Load markdown documents using the spec mechanism
This basically just means it now goes through the
`create_document_for_inline_content()` function.
2023-12-26 18:35:29 +01:00
Sam Atkins
c5223ae77f LibWeb: Adjust create_document_for_inline_content() for future use
(Apologies for bad commit title, it's hard to explain in such a short
space!)

We're going to need to call this for producing markdown and gemini
documents, both of which need a Document and Realm to fetch the entire
response body, so that they can then generate their HTML. So this
commit modifies `create_document_for_inline_content()` to take a lambda
instead of a fixed HTML string, to support these uses.

Also, we always return a nonnull pointer, so make that the return type.

This is a move and change in the same commit, (Sorry!) but all the
changes are to the function signature and step 6.
2023-12-26 18:35:29 +01:00
Sam Atkins
ae8e040287 LibWeb: Bring media-document loading closer to spec
There's an unfortunate hack here. We have to load the media file's data
before we call `HTML::HTMLParser::the_end()` with our generated
document, otherwise the media element (`<img>`/`<audio>`/`<video>`)
never loads and that blocks the document's load event. The previous code
path also did this, which is perhaps why the bug was never noticed.
2023-12-26 18:35:29 +01:00
Sam Atkins
91d82ae17a LibWeb: Bring text-document parsing to spec 2023-12-26 18:35:29 +01:00
Sam Atkins
4dbca3e14a LibWeb: Bring XML-document loading to spec 2023-12-26 18:35:29 +01:00
Sam Atkins
6c74069c1e LibWeb: Bring HTML-document loading to spec 2023-12-26 18:35:29 +01:00
Sam Atkins
933231ffd4 LibWeb: Spec-comment load_document()
This function is currently very ad-hoc. This commit adds comments which
are almost entirely FIXMEs, so that we can then start filling in the
details one step at a time.
2023-12-26 18:35:29 +01:00
Sam Atkins
8dc8d57418 LibWeb: Make load_document()'s NavigationParams non-optional
There's no mention in the spec of this being optional, all the places
that call it always pass a NavigationParams directly, and we're
VERIFYing that it's got a value too!
2023-12-26 18:35:29 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Shannon Booth
f976ec005c LibWeb: Port DOM::Document from DeprecatedString 2023-12-02 22:54:53 +01:00
Idan Horowitz
9677d8eeac LibWeb: Reject improperly encoded XML documents as not well-formed 2023-11-17 16:02:36 +01:00
Idan Horowitz
278e8afb42 LibWeb: Consider content-type charset when determining XML encoding 2023-11-17 16:02:36 +01:00
Idan Horowitz
07ea3ab306 LibWeb: Display error page when document parsing fails 2023-11-17 16:02:36 +01:00
Andreas Kling
3ff81dcb65 LibWeb: Make Web::Namespace::Foo strings be FlyString
This required dealing with a *lot* of fallout, but it's all basically
just switching from DeprecatedFlyString to either FlyString or
Optional<FlyString> in a hundred places to accommodate the change.
2023-11-04 21:28:30 +01:00
Andreas Kling
f052823f5f LibWeb: Use FlyString for create_element() namespace strings 2023-11-04 21:28:30 +01:00
Shannon Booth
e4f8c59210 LibWeb: Port AttributeNames to FlyString 2023-10-08 08:11:48 -04:00
Bastiaan van der Plaat
04ee15a5ad Ladybird+LibWeb: Use old error.html template for navigation errors again 2023-09-24 19:59:00 -06:00
Andrew Kaster
dc0f7c4c54 LibWeb: Align NavigationParams and the creation AOs to the spec
And remove assorted spec FIXMEs along the way. Also align
populate_session_history_entry_document to the spec, with a bonus spec
bug to be filed.

This involves creating a new NonFetchSchemeNavigationParams spec, and
having the associated AOs take a Variant rather than Optional to
accomodate the fact that this extra struct could be returned by the
algorithm. We don't actually *do* anything with these params, but the
scaffolding is there now, with less TODOs.
2023-09-22 19:45:11 -06:00
Aliaksandr Kalenik
4446858401 LibWeb: Do not crash if parsing failed in load_document()
If `load_document()` is called with a response that has a mime type we
can't use to build a document, we should return nullptr as the spec
says, instead of crashing. Also we should not crash if error happened
during parsing.
2023-09-16 16:53:32 +02:00
Shannon Booth
e74031a396 LibWeb: Port Document interface from DeprecatedString to String 2023-09-16 11:17:19 +02:00
Aliaksandr Kalenik
b7c93cae7f LibWeb: Set document content type in load_document()
Fixes regression in `load_document()` compared to FrameLoader.
2023-09-15 18:27:17 +02:00
Bastiaan van der Plaat
222cc29c5c LibWeb: Add XMLHttpRequest Document response type 2023-09-14 22:58:42 +02:00
Shannon Booth
bcb6851c07 LibWeb: Port Text interface from DeprecatedString to String 2023-09-06 11:44:45 -04:00
Aliaksandr Kalenik
bdd3a16b16 LibWeb: Make Fetch::Infrastructure::Body be GC allocated
Making the body GC-allocated allows us to avoid using `JS::Handle`
for `m_stream` in its members.
2023-08-19 15:12:00 +02:00
Andreas Kling
72c9f56c66 LibJS: Make Heap::allocate<T>() infallible
Stop worrying about tiny OOMs. Work towards #20449.

While going through these, I also changed the function signature in many
places where returning ThrowCompletionOr<T> is no longer necessary.
2023-08-13 15:38:42 +02:00
Luke Wilde
7ec7015750 LibWeb: Create an audio document for audio/ MIME types on navigation 2023-06-17 14:16:26 +02:00
Timothy Flynn
3a28be2a98 LibWeb: Parse SVG document types as XML documents
We began parsing SVG documents as HTML years ago in commit 05be648. This
was long before we had an XML parser, and actually violates the spec.
Since SVG documents have a MIME type of "image/svg+xml", the spec
mandates the document should be parsed as XML.

One impact here is that the SVG document is no longer "fixed" to include
<html>, <head>, and <body> tags. This will have prevented document.title
from detecting the document element is an SVG element.
2023-06-09 01:12:48 +02:00
Sam Atkins
9c2d496dbe LibWeb: Make processBodyError take an optional exception
Changed here:
018ac19838
2023-05-15 16:28:16 +02:00
Aliaksandr Kalenik
de2c016556 LibWeb: Implement "attempt to populate the history entry's document"
Implements:
https://html.spec.whatwg.org/multipage/browsing-the-web.html#attempt-to-populate-the-history-entry's-document

This is going to be a replacement for `FrameLoader::load()` after
switching to navigables.

Brief description of `populate_session_history_entry_document`:
- If navigation params have url with fetch scheme then DOM document
  will be populated by fetching url and parsing response. This
  is going to be a replacement for `FrameLoader::load(AK::URL&)`.
- If url in navigation params is abort:srcdoc then DOM document
  will be populated by parsing HTML text passed in document resource.
  This is going to be a replacement for `FrameLoader::load_html()`
2023-05-03 09:39:49 +02:00
Aliaksandr Kalenik
804af38a96 LibWeb: Move document loading into separate file
In upcoming navigables refactor new function that uses fetch
instead of ResourceLoader to load document content is going to be
introduced:
https://html.spec.whatwg.org/multipage/browsing-the-web.html#create-navigation-params-by-fetching

`parse_document()` need to be separated from FrameLoader to reuse
code responsible for parsing HTTP response into DOM document.
2023-05-03 09:39:49 +02:00