Commit graph

272 commits

Author SHA1 Message Date
Timothy Flynn
c838ca78c8 LibWeb: Indicate documents are for fragment parsing during construction
This will allow testing if they are for fragment parsing during methods
invoked from Document::initialize.
2024-08-01 11:35:49 +02:00
Timothy Flynn
657bbd1542 LibWeb: Append attributes to the correct element
The spec indicates we should append attributes to the top element of the
stack of open elements. We were appending the attribute to the bottom.
2024-07-30 09:41:35 +02:00
Timothy Flynn
9fe35ddddf LibWeb: Use an infallible method to add attributes to nodes
In the HTML parser spec, there are 2 instances of the following text:

    add the attribute and its corresponding value to that element

The "add the attribute" text does not have a corresponding spec link to
actually specify what to do. We currently use `set_attribute`, which can
throw an exception if the attribute name contains an invalid character
(such as '<'). Instead, switch to `append_attribute`, which allows such
attribute names. This behavior matches Firefox.

Note we cannot yet make the unclosed-html-element.html test match the
expectations of the unclosed-body-element.html due to another bug that
would prevent checking if the expected element has the right attribute.
That will be fixed in an upcoming commit.
2024-07-30 09:41:35 +02:00
Andreas Kling
7dacd6be89 LibWeb: Use static_cast<HTMLTemplateElement> right after an is<> check
The double verify_cast here was just barely visible in a profile.
2024-07-20 15:35:30 +02:00
Andreas Kling
f9f11dc51d LibWeb: Stop creating transient throwaway JS::Handles in HTML parser
These were being immediately stored in JS::GCPtrs (and dutifully visited
by HTMLParser), so creating temporary handles for them was a complete
waste of time.
2024-07-20 15:35:30 +02:00
Andreas Kling
7892ee355d LibWeb: Use StringBuilder::append_code_point() over append(Utf32View)
When appending a single Unicode code point, we don't have to go through
the trouble of creating a Utf32View wrapper over it.
2024-07-20 15:35:30 +02:00
Andreas Kling
4e0edd42b9 LibWeb: Cap HTML dimension values at 17895700 (same as Firefox)
Instead of allowing arbitrarily large values (which could eventually
overflow an i32), let's just cap them at the same limit as Firefox does.

Found by Domato.
2024-07-20 06:41:25 +02:00
Andreas Kling
1c00e5688d LibWeb: Fix StringView OOB access when parsing 3-character legacy color
Found by Domato.
2024-07-20 06:41:25 +02:00
Luke Warlow
ce8d3d17c4 LibWeb: Implement unsafe HTML parsing methods
Both Element's and ShadowRoot's setHTMLUnsafe, and Document's static
parseHTMLUnsafe methods are implemented.
2024-06-26 06:13:29 +02:00
Andreas Kling
e62db9c118 LibWeb: Update HTML fragment serialization for declarative shadow DOM 2024-06-25 19:22:35 +02:00
Andreas Kling
9eb4b91168 LibWeb: Parse declarative shadow DOM template elements
We now honor the shadowrootmode attribute on template elements while
parsing, and instantiate a shadow tree as required by the spec.
2024-06-25 19:22:35 +02:00
Matthew Olsson
a5f4c9a632 AK+Userland: Remove NOESCAPE
See the next commit for an explanation
2024-05-22 21:55:34 -06:00
Tim Ledbetter
c57d395a48 LibWeb: Use IterationDecision in single level Node iteration methods
`Node::for_each_child()` and `Node::for_each_child_of_type()` callbacks
now return an `IterationDecision`, which allows us to break early if
required.
2024-05-07 16:45:28 -06:00
Timothy Flynn
ec492a1a08 Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-18 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")

There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
2024-04-24 16:50:01 -04:00
Andreas Kling
d94a6d8873 LibWeb: Avoid creating tons of temporary FlyStrings in HTMLParser 2024-04-21 19:32:49 +02:00
Andreas Kling
990f8e10a5 LibWeb: Avoid redundant UTF-8 validation in HTML tokenizer 2024-04-21 19:32:49 +02:00
Kenneth Myhra
a3661fd7f2 LibWeb: Let queue_global_task() take a JS::HeapFunction
Changes the signature of queue_global_task() from AK:Function to
JS::HeapFunction to be more clear to the user of the function that this
is what it uses internally.
2024-04-20 18:11:01 +02:00
Andreas Kling
53d0dd4a2e LibJS+LibWeb: Use new Cell::Visitor helpers to avoid manual iteration 2024-04-16 07:40:01 +02:00
Shannon Booth
51a52a867c LibWeb: Use "current high resolution time" AO where relevant
And updating some spec comments to latest spec where it is not relevant.
2024-04-12 09:08:46 +02:00
Andreas Kling
870a954e11 LibWeb: Implement Element.outerHTML
This piggybacks on the same fragment serialization code that innerHTML
uses, but instead of constructing an imaginary parent element like the
spec asks us to, we just add a separate serialization mode that includes
the context element in the serialized markup.

This makes the image carousel on https://utah.edu/ show up :^)
2024-04-09 18:17:14 -04:00
Andreas Kling
0412e17bac LibWeb: Factor out attribute serialization into a separate function 2024-04-09 18:17:14 -04:00
Matthew Olsson
ff00d21d58 Everywhere: Mark a bunch of function parameters as NOESCAPE
This fixes the relevant warnings when running LibJSGCVerifier. Note that
the analysis is only performed over LibJS-adjacent code, but could be
performed over the entire codebase. That will have to wait for a future
commit.
2024-04-09 09:10:44 +02:00
Timothy Flynn
48fb343230 LibWeb: Change HTMLParser's factory to accept the encoding as StringView
No need to force an allocation. This makes a future patch a bit simpler,
where we will have the encoding as a String. With this patch, we won't
have to convert it to a ByteString.
2024-04-04 11:23:21 +02:00
Timothy Flynn
feddecde5b LibWeb: Emit the current token before EOF on invalid comments
The spec for each of these state:

    -> EOF:
    This is an eof-in-comment parse error. Emit the current comment
    token. Emit an end-of-file token.

We were neglecting to emit the current comment token before emitting an
EOF token. Note the existing EMIT_CURRENT_TOKEN macro was unused.
2024-03-23 20:58:31 +01:00
Shannon Booth
e800605ad3 AK+LibURL: Move AK::URL into a new URL library
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.

This change has two main benefits:
 * Moving AK back more towards being an agnostic library that can
   be used between the kernel and userspace. URL has never really fit
   that description - and is not used in the kernel.
 * URL _should_ depend on LibUnicode, as it needs punnycode support.
   However, it's not really possible to do this inside of AK as it can't
   depend on any external library. This change brings us a little closer
   to being able to do that, but unfortunately we aren't there quite
   yet, as the code generators depend on LibCore.
2024-03-18 14:06:28 -04:00
Andreas Kling
b98a2be96b LibWeb: Ignore window-forwarded document.body.onfoo in detached DOM
Normally, assigning to e.g document.body.onload will forward to
window.onload. However, in a detached DOM tree, there is no associated
window, so we have nowhere to forward to, making this a no-op.

The bulk of this change is making Document::window() return a nullable
pointer, as documents created by DOMParser or DOMImplementation do not
have an associated window object, and so must be able to return null
from here.
2024-03-11 18:29:10 +01:00
Shannon Booth
9ce8189f21 Everywhere: Use unqualified AK::URL
Now possible in LibWeb now that there is no longer a Web::URL.
2024-02-25 08:54:31 +01:00
Timothy Flynn
af57bd5cca LibWeb: Stop parsing after document.write at the insertion point
If a call to `document.write` inserts an incomplete HTML tag, e.g.:

    document.write("<p");

we would previously continue parsing the document until we reached a
closing angle bracket. However, the spec states we should stop once we
reach the new insertion point.
2024-02-20 17:04:36 +01:00
Timothy Flynn
64dcd3f1f4 LibWeb: Restore the previous tokenizer iterator after inserting input
Otherwise, m_prev_utf8_iterator still points at the old source.
2024-02-20 17:04:36 +01:00
Timothy Flynn
fcf83a8ed0 LibWeb: Allocate fewer strings during document.write 2024-02-20 17:04:36 +01:00
Bastiaan van der Plaat
a681429dff LibWeb: Remove DOM element deprecated_get_attribute() 2024-01-19 13:12:54 -07:00
Shannon Booth
4135c3885c LibWeb: Only wait for document to be ready for scripts if executing one
HTML fragments are parsed with a temporary HTML document that never has
its flag set to say that it is ready to have scripts executed. For these
fragments, in the HTMLParser, these scripts are prepared, but
execute_script is never called on them.

This results in the HTMLParser waiting forever on the document to be
ready to have scripts executed.

To fix this, only wait for the document to be ready if we are definitely
going to execute a script.

This fixes a hang processing the HTML in the attached test, as seen on:
https://github.com/SerenityOS/serenity

Fixes: #22735
2024-01-14 11:27:58 +00:00
MacDue
fc41c282ec LibWeb: Fix utf16-be check in HTMLEncodingDetection
The utf-16be check mistakenly skipped index 3, so was not checking the
correct bytes. This meant UTF16-BE files could fail to decode.
2024-01-08 23:35:09 +01:00
MacDue
5e973fca0b LibWeb: Prevent OOB access in HTMLEncodingDetection for input of '</'
Previously, this never checked if `position + 2` was valid. This
slightly reorders the loop so all indices are checked.

Fixes #22163
2024-01-08 23:35:09 +01:00
Aliaksandr Kalenik
07928129dd LibWeb: Wait until new document becomes active before running scripts
Fixes https://github.com/SerenityOS/serenity/issues/22485

With this change WebContent does not crash when `location.reload()` is
invoked but `Navigable::reload()` still not working because of spec
issue (https://github.com/whatwg/html/issues/9869) so we can't add a
test yet.
2023-12-30 19:32:31 +01:00
Andreas Kling
9ce267944c LibWeb: Fix crash in HTML encoding detection when handling non-ASCII
The fix here was to stop using StringBuilder::append(char) when told to
append a code point, and switch to StringBuilder::append_code_point(u32)

There's probably a bunch more issues like this, and we should stop using
append(char) in general since it allows building of garbage strings.
2023-12-30 13:49:50 +01:00
Andreas Kling
83f43310fa LibWeb: Add spec comments and fixups to "get an attribute" prescan algo
In particular, make some minor adjustments so it flows a little more
like the spec.
2023-12-30 13:49:50 +01:00
Sam Atkins
6ffda5f271 LibWeb: Make HTMLParser::the_end() callable from outside
This is a little awkward: The spec requires when loading media documents
or ones that don't have a DOM, that we "act as if the user agent had
stopped parsing document" which means following this algorithm. Only a
few steps require an HTMLParser, but those that do, involve reaching
into its internals. The simplest solution I could think of (other than
duplicating this fairly hefty function) is making it static and taking
a Document and optional HTMLParser as parameters.
2023-12-26 18:35:29 +01:00
Shannon Booth
e2e7c4d574 Everywhere: Use to_number<T> instead of to_{int,uint,float,double}
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:

```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```

For us.

The main goal here however is to have a single generic number conversion
API between all of the String classes.
2023-12-23 20:41:07 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Bastiaan van der Plaat
b439431488 LibWeb: Allow hr elements in select and optgroup elements 2023-12-09 22:06:20 +01:00
Shannon Booth
f976ec005c LibWeb: Port DOM::Document from DeprecatedString 2023-12-02 22:54:53 +01:00
Sam Atkins
6c5450f9ce LibWeb: Report if anything is delaying load event, not the count
Some elements that delay the load event are more complicated than a
simple count will allow for. We'll implement those in a bit!
2023-12-01 10:28:02 +01:00
Andreas Kling
bfd354492e LibWeb: Put most LibWeb GC objects in type-specific heap blocks
With this change, we now have ~1200 CellAllocators across both LibJS and
LibWeb in a normal WebContent instance.

This gives us a minimum heap size of 4.7 MiB in the scenario where we
only have one cell allocated per type. Of course, in practice there will
be many more of each type, so the effective overhead is quite a bit
smaller than that in practice.

I left a few types unconverted to this mechanism because I got tired of
doing this. :^)
2023-11-19 22:00:48 +01:00
Shannon Booth
87a4a5b302 LibWeb: Remove FIXMe's for HTML attribute serialization steps
As far as I can tell all of these steps are just equivalent to using the
qualified name. Add some tests which cover some of these cases, and
remove the FIXME's.
2023-11-11 08:50:25 +01:00
Shannon Booth
96fc1741b5 LibWeb: Return an Optional<String> from HTMLToken::attribute
Move away from using a nullable StringView.
2023-11-11 08:50:25 +01:00
Shannon Booth
72bb928dd8 LibWeb: Add spec comments to HTMLParser::handle_in_body
I have been going down into a bit of a rabbit hole trying to figure out
why the namespace is not getting set up properly on certain attributes.
At one stage, I thought the issue might have been around here where
attributes were being adjusted (it is not). I started adding spec
comments to understand what was happening, and by the time I realised it
wasn't in this place, I was already in too deep!

Add a whole bunch of spec comments, and leave one or two minor FIXME's
where the spec seems to have changed since this was originally
implemented.
2023-11-11 08:50:25 +01:00
Shannon Booth
a8fd4fab00 LibWeb: Port HTMLParser::serialize_html_fragment from DeprecatedString 2023-11-11 08:50:25 +01:00
Shannon Booth
326b34c7c7 LibWeb: Port all callers of Element::namespace to Element::namespace_uri
Removing some more use of DeprecatedFlyString
2023-11-06 11:37:08 +01:00
Shannon Booth
c8a4fc6c1a LibWeb: Port HTML parser quirk public IDs to StringView
These were DeprecatedFlyStrings, but had no reason to be. We were not
making use of the O(1) lookup, so instead of porting it over to a
FlyString, just make it a StringView.
2023-11-06 11:37:08 +01:00