The HTML tokenizer specification says that we're supposed to do this
when leaving the Attribute name or when emitting the token, as
appropriate.
Hopefully 'as appropriate' can mean only when emitting the token, as
that's the easiest place to insert this logic without complicating the
tokenizer any more.
This fixes an issue where document.write() with only text input would
leave all the character data as unflushed text in the parser.
This fixes many of the WPT tests for document.write().
And add tests! This implementation closely follows the current C++
implementation, replacing macros and gotos with a slightly more
complex state machine. It's very possible that an async version that
yields tokens on "emit" would be even simpler, but let's get this
one working first :).
Also give the Swift.String init routines an explict label when
constructing from AK String types, as this caused issues in a later
commit to have them both with `_ data`.
In particular, there was an assertion failure due to the temporary
parser document's "about base URL" being empty when trying to "parse a
URL" during parsing.
We fix this by copying the context element's document's about base URL
to the temporary parsing document while parsing a fragment.
This fixes a crash when loading search results on https://amazon.com/
At the same time, simplify CMakeLists magic for libraries that want to
include Swift code in the library. The Lib-less name of the library is
now always the module name for the library with any Swift additions,
extensions, etc. All vfs overlays now live in a common location to make
finding them easier from CMake functions. A new pattern is needed for
the Lib-less modules to re-export their Cxx counterparts.
In the HTML parser spec, there are 2 instances of the following text:
add the attribute and its corresponding value to that element
The "add the attribute" text does not have a corresponding spec link to
actually specify what to do. We currently use `set_attribute`, which can
throw an exception if the attribute name contains an invalid character
(such as '<'). Instead, switch to `append_attribute`, which allows such
attribute names. This behavior matches Firefox.
Note we cannot yet make the unclosed-html-element.html test match the
expectations of the unclosed-body-element.html due to another bug that
would prevent checking if the expected element has the right attribute.
That will be fixed in an upcoming commit.
These were being immediately stored in JS::GCPtrs (and dutifully visited
by HTMLParser), so creating temporary handles for them was a complete
waste of time.
Instead of allowing arbitrarily large values (which could eventually
overflow an i32), let's just cap them at the same limit as Firefox does.
Found by Domato.
The following command was used to clang-format these files:
clang-format-18 -i $(find . \
-not \( -path "./\.*" -prune \) \
-not \( -path "./Base/*" -prune \) \
-not \( -path "./Build/*" -prune \) \
-not \( -path "./Toolchain/*" -prune \) \
-not \( -path "./Ports/*" -prune \) \
-type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")
There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
Changes the signature of queue_global_task() from AK:Function to
JS::HeapFunction to be more clear to the user of the function that this
is what it uses internally.
This piggybacks on the same fragment serialization code that innerHTML
uses, but instead of constructing an imaginary parent element like the
spec asks us to, we just add a separate serialization mode that includes
the context element in the serialized markup.
This makes the image carousel on https://utah.edu/ show up :^)
This fixes the relevant warnings when running LibJSGCVerifier. Note that
the analysis is only performed over LibJS-adjacent code, but could be
performed over the entire codebase. That will have to wait for a future
commit.
No need to force an allocation. This makes a future patch a bit simpler,
where we will have the encoding as a String. With this patch, we won't
have to convert it to a ByteString.
The spec for each of these state:
-> EOF:
This is an eof-in-comment parse error. Emit the current comment
token. Emit an end-of-file token.
We were neglecting to emit the current comment token before emitting an
EOF token. Note the existing EMIT_CURRENT_TOKEN macro was unused.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
Normally, assigning to e.g document.body.onload will forward to
window.onload. However, in a detached DOM tree, there is no associated
window, so we have nowhere to forward to, making this a no-op.
The bulk of this change is making Document::window() return a nullable
pointer, as documents created by DOMParser or DOMImplementation do not
have an associated window object, and so must be able to return null
from here.
If a call to `document.write` inserts an incomplete HTML tag, e.g.:
document.write("<p");
we would previously continue parsing the document until we reached a
closing angle bracket. However, the spec states we should stop once we
reach the new insertion point.
HTML fragments are parsed with a temporary HTML document that never has
its flag set to say that it is ready to have scripts executed. For these
fragments, in the HTMLParser, these scripts are prepared, but
execute_script is never called on them.
This results in the HTMLParser waiting forever on the document to be
ready to have scripts executed.
To fix this, only wait for the document to be ready if we are definitely
going to execute a script.
This fixes a hang processing the HTML in the attached test, as seen on:
https://github.com/SerenityOS/serenityFixes: #22735
The fix here was to stop using StringBuilder::append(char) when told to
append a code point, and switch to StringBuilder::append_code_point(u32)
There's probably a bunch more issues like this, and we should stop using
append(char) in general since it allows building of garbage strings.