Commit graph

299 commits

Author SHA1 Message Date
Andreas Kling
64747c0397 LibWeb: Make HTML parser "current node" APIs return nullable pointer
It's actually possible for there to be no adjusted current node, when
the stack of open elements is empty. This was covered by one of the WPT
parsing tests.
2024-11-03 20:32:32 +01:00
Andreas Kling
ebce483278 LibWeb: Add spec comments to the "in head noscript" parser state 2024-11-03 20:32:32 +01:00
Andreas Kling
0a47d8cb08 LibWeb: Handle more MathML/HTML integration points in the parser 2024-11-03 20:32:32 +01:00
Andreas Kling
ae39c54e51 LibWeb: Fix SVG tag adjustment for feSpotLight, feTile, feTurbulence 2024-11-03 20:32:32 +01:00
Andreas Kling
b82e34ed00 LibWeb: Flesh out the "in frameset" parsing state and add spec comments
This also implements two FIXMEs, which were covered by WPT tests.
2024-11-03 17:51:44 +01:00
Andreas Kling
ecc9fdf377 LibWeb: Fix incomplete assertion in the "in body" parsing state
We were incorrectly tripping an assertion failure on some WPT tests.
2024-11-03 17:51:44 +01:00
Andreas Kling
40ac3cc2c8 LibWeb: Implement HTML frameset parsing in the "in body" state
Covered by many WPT parsing tests, which will be imported.
2024-11-03 17:51:44 +01:00
Andreas Kling
cfa820922b LibWeb: Bail nicely on EOF in HTML parser "in body" state
If we reach the end of the token stream when "ignoring and moving on
to the next token" in the "in body" state, we should just not move
on to the next token, since there isn't one.

Covered by various WPT HTML parsing tests that will be imported in
a subsequent commit.
2024-11-03 17:51:44 +01:00
Sam Atkins
f80fca2dee LibWeb/HTML: Update legacy color parsing steps to match spec 2024-10-31 19:23:23 +00:00
Shannon Booth
1c18b900e2 LibWeb: Port EventLoop::spin_XXX to HeapFunction 2024-10-30 20:55:45 +01:00
Gingeh
8e342e3e23 LibWeb: Use Content-Type header to set document encoding
Co-authored-by: Shannon Booth <shannon@serenityos.org>
2024-10-23 11:30:59 -06:00
Andrew Kaster
9d0ce4df0f LibWeb: Add support for parsing comments in the Swift HTML tokenizer 2024-10-16 08:31:42 +02:00
0x4261756D
c1a14f66ad HTMLEncodingDetection: Use mime type in encoding sniffing
Also added proper spec comments.
Fixes at least one WPT test that was failing previously:
https://wpt.live/encoding/single-byte-decoder.window.html?document
2024-10-12 16:14:38 +02:00
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere 2024-10-04 13:19:50 +02:00
Andrew Kaster
d96c7edfb6 LibWeb: Add more HTML tokenization states to Swift implementation
This patch adds support for start and end tags, as well as script tag
rules.
2024-10-02 09:44:38 +02:00
Andrew Kaster
7aa0165fe7 LibWeb: Deduplicate attributes when emitting start and end tags
The HTML tokenizer specification says that we're supposed to do this
when leaving the Attribute name or when emitting the token, as
appropriate.

Hopefully 'as appropriate' can mean only when emitting the token, as
that's the easiest place to insert this logic without complicating the
tokenizer any more.
2024-10-01 11:04:28 +02:00
Andreas Kling
a0ed12e839 LibWeb: Always flush character insertions before exiting HTML parser
This fixes an issue where document.write() with only text input would
leave all the character data as unflushed text in the parser.

This fixes many of the WPT tests for document.write().
2024-09-21 10:05:48 +02:00
Andrew Kaster
77718c0a66 LibWeb: Implement the Data state for the Swift tokenizer
And add tests! This implementation closely follows the current C++
implementation, replacing macros and gotos with a slightly more
complex state machine. It's very possible that an async version that
yields tokens on "emit" would be even simpler, but let's get this
one working first :).
2024-08-29 06:31:25 +02:00
Andrew Kaster
a3e6856b56 AK+Swift: Remove Foundation.Data footgun for AK.StringView
Also give the Swift.String init routines an explict label when
constructing from AK String types, as this caused issues in a later
commit to have them both with `_ data`.
2024-08-29 06:31:25 +02:00
Andreas Kling
b64df59cc6 LibWeb: Fix crash when setting innerHTML inside iframe srcdoc document
In particular, there was an assertion failure due to the temporary
parser document's "about base URL" being empty when trying to "parse a
URL" during parsing.

We fix this by copying the context element's document's about base URL
to the temporary parsing document while parsing a fragment.

This fixes a crash when loading search results on https://amazon.com/
2024-08-29 06:24:18 +02:00
Andrew Kaster
c5153cb398 Meta+Libraries+AK: Append Cxx to imported library module names in swift
At the same time, simplify CMakeLists magic for libraries that want to
include Swift code in the library. The Lib-less name of the library is
now always the module name for the library with any Swift additions,
extensions, etc. All vfs overlays now live in a common location to make
finding them easier from CMake functions. A new pattern is needed for
the Lib-less modules to re-export their Cxx counterparts.
2024-08-27 17:22:31 -06:00
Andrew Kaster
49733ed09b LibWeb: Add an HTML tokenizer re-implementation in swift
It doesn't do much yet, the fun part was the scaffolding
2024-08-24 19:14:09 -06:00
Andrew Kaster
33e50889f2 LibWeb: Add CustomStringConvertible extension for HTMLToken types 2024-08-23 19:17:20 -06:00
Andrew Kaster
fb074f9d0c LibWeb: Add start of HTML Tokenizer in Swift
Currently it's just a Token class.
2024-08-23 19:17:20 -06:00
Jamie Mansfield
b3fa8f0ce2 LibWeb/HTML: MathML's <ms> is a special tag
This is an omission I noticed while browsing some code :^)
2024-08-17 07:40:10 +02:00
Shannon Booth
07940a89ca LibWeb: Handle cases with <template> on the HTML parsing stack
This appears to have been a bug in the spec which was later corrected -
so to fix the crash we can simply remove this assertion.

Fixes: #868
2024-08-16 22:38:18 +01:00
Sam Atkins
0e3487b9ab LibWeb: Rename StyleValue -> CSSStyleValue
This matches the name in the CSS Typed OM spec.
https://drafts.css-houdini.org/css-typed-om-1/#cssstylevalue

No behaviour changes.
2024-08-15 13:58:38 +01:00
Timothy Flynn
c838ca78c8 LibWeb: Indicate documents are for fragment parsing during construction
This will allow testing if they are for fragment parsing during methods
invoked from Document::initialize.
2024-08-01 11:35:49 +02:00
Timothy Flynn
657bbd1542 LibWeb: Append attributes to the correct element
The spec indicates we should append attributes to the top element of the
stack of open elements. We were appending the attribute to the bottom.
2024-07-30 09:41:35 +02:00
Timothy Flynn
9fe35ddddf LibWeb: Use an infallible method to add attributes to nodes
In the HTML parser spec, there are 2 instances of the following text:

    add the attribute and its corresponding value to that element

The "add the attribute" text does not have a corresponding spec link to
actually specify what to do. We currently use `set_attribute`, which can
throw an exception if the attribute name contains an invalid character
(such as '<'). Instead, switch to `append_attribute`, which allows such
attribute names. This behavior matches Firefox.

Note we cannot yet make the unclosed-html-element.html test match the
expectations of the unclosed-body-element.html due to another bug that
would prevent checking if the expected element has the right attribute.
That will be fixed in an upcoming commit.
2024-07-30 09:41:35 +02:00
Andreas Kling
7dacd6be89 LibWeb: Use static_cast<HTMLTemplateElement> right after an is<> check
The double verify_cast here was just barely visible in a profile.
2024-07-20 15:35:30 +02:00
Andreas Kling
f9f11dc51d LibWeb: Stop creating transient throwaway JS::Handles in HTML parser
These were being immediately stored in JS::GCPtrs (and dutifully visited
by HTMLParser), so creating temporary handles for them was a complete
waste of time.
2024-07-20 15:35:30 +02:00
Andreas Kling
7892ee355d LibWeb: Use StringBuilder::append_code_point() over append(Utf32View)
When appending a single Unicode code point, we don't have to go through
the trouble of creating a Utf32View wrapper over it.
2024-07-20 15:35:30 +02:00
Andreas Kling
4e0edd42b9 LibWeb: Cap HTML dimension values at 17895700 (same as Firefox)
Instead of allowing arbitrarily large values (which could eventually
overflow an i32), let's just cap them at the same limit as Firefox does.

Found by Domato.
2024-07-20 06:41:25 +02:00
Andreas Kling
1c00e5688d LibWeb: Fix StringView OOB access when parsing 3-character legacy color
Found by Domato.
2024-07-20 06:41:25 +02:00
Luke Warlow
ce8d3d17c4 LibWeb: Implement unsafe HTML parsing methods
Both Element's and ShadowRoot's setHTMLUnsafe, and Document's static
parseHTMLUnsafe methods are implemented.
2024-06-26 06:13:29 +02:00
Andreas Kling
e62db9c118 LibWeb: Update HTML fragment serialization for declarative shadow DOM 2024-06-25 19:22:35 +02:00
Andreas Kling
9eb4b91168 LibWeb: Parse declarative shadow DOM template elements
We now honor the shadowrootmode attribute on template elements while
parsing, and instantiate a shadow tree as required by the spec.
2024-06-25 19:22:35 +02:00
Matthew Olsson
a5f4c9a632 AK+Userland: Remove NOESCAPE
See the next commit for an explanation
2024-05-22 21:55:34 -06:00
Tim Ledbetter
c57d395a48 LibWeb: Use IterationDecision in single level Node iteration methods
`Node::for_each_child()` and `Node::for_each_child_of_type()` callbacks
now return an `IterationDecision`, which allows us to break early if
required.
2024-05-07 16:45:28 -06:00
Timothy Flynn
ec492a1a08 Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-18 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")

There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
2024-04-24 16:50:01 -04:00
Andreas Kling
d94a6d8873 LibWeb: Avoid creating tons of temporary FlyStrings in HTMLParser 2024-04-21 19:32:49 +02:00
Andreas Kling
990f8e10a5 LibWeb: Avoid redundant UTF-8 validation in HTML tokenizer 2024-04-21 19:32:49 +02:00
Kenneth Myhra
a3661fd7f2 LibWeb: Let queue_global_task() take a JS::HeapFunction
Changes the signature of queue_global_task() from AK:Function to
JS::HeapFunction to be more clear to the user of the function that this
is what it uses internally.
2024-04-20 18:11:01 +02:00
Andreas Kling
53d0dd4a2e LibJS+LibWeb: Use new Cell::Visitor helpers to avoid manual iteration 2024-04-16 07:40:01 +02:00
Shannon Booth
51a52a867c LibWeb: Use "current high resolution time" AO where relevant
And updating some spec comments to latest spec where it is not relevant.
2024-04-12 09:08:46 +02:00
Andreas Kling
870a954e11 LibWeb: Implement Element.outerHTML
This piggybacks on the same fragment serialization code that innerHTML
uses, but instead of constructing an imaginary parent element like the
spec asks us to, we just add a separate serialization mode that includes
the context element in the serialized markup.

This makes the image carousel on https://utah.edu/ show up :^)
2024-04-09 18:17:14 -04:00
Andreas Kling
0412e17bac LibWeb: Factor out attribute serialization into a separate function 2024-04-09 18:17:14 -04:00
Matthew Olsson
ff00d21d58 Everywhere: Mark a bunch of function parameters as NOESCAPE
This fixes the relevant warnings when running LibJSGCVerifier. Note that
the analysis is only performed over LibJS-adjacent code, but could be
performed over the entire codebase. That will have to wait for a future
commit.
2024-04-09 09:10:44 +02:00
Timothy Flynn
48fb343230 LibWeb: Change HTMLParser's factory to accept the encoding as StringView
No need to force an allocation. This makes a future patch a bit simpler,
where we will have the encoding as a String. With this patch, we won't
have to convert it to a ByteString.
2024-04-04 11:23:21 +02:00