This patch implements the HTML specification's "encoding sniffing
algorithm", which is used when no encoding can be obtained from the
Content-Type header (either because it doesn't contain a charset=...
value, or because the file has not been opened via HTTP, as with local
files).
It also modifies the code that creates the HTMLDocumentParser to use the
new HTMLDocumentParser::create_with_uncertain_encoding static method,
which runs the encoding sniffing algorithm before instantiating the
parser.
This now allows us to load local HTML pages (or remote pages without a
charset specified in the 'Content-Type' header) with a non-UTF-8
encoding such as 'windows-1252'. This would previously crash the
browser. :^)
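To make the first and last steps of that algorithm concrete, here is a
small standalone C++ sketch. It is not the code from this patch:
sniff_bom and sniff_encoding are hypothetical names, the intermediate
steps (such as the <meta charset> prescan) are left out, and
windows-1252 is assumed as the fallback default.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: BOM sniffing, the first step of the sniffing algorithm.
std::optional<std::string> sniff_bom(std::vector<std::uint8_t> const& bytes)
{
    if (bytes.size() >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return "UTF-8";
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return "UTF-16BE";
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return "UTF-16LE";
    return std::nullopt;
}

// Hypothetical, heavily simplified sniffer: a BOM wins; the <meta> prescan
// and the other intermediate steps are omitted, so we go straight to the
// (assumed) default encoding.
std::string sniff_encoding(std::vector<std::uint8_t> const& bytes,
                           std::string const& fallback = "windows-1252")
{
    if (auto encoding = sniff_bom(bytes))
        return *encoding;
    return fallback;
}
```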
This patch changes get_standardized_encoding to use an Optional<String>
return type instead of just returning the null string when unable to
match the provided encoding to one of the canonical encoding names.
This is part of an effort to move away from using null strings towards
explicitly using Optional<String> to indicate that the String may not
have a value.
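A minimal sketch of the Optional-returning shape, using
std::optional<std::string> in place of AK's Optional<String>. The
signature is an assumption; the lookup follows the Encoding Standard's
"get an encoding" steps (trim ASCII whitespace, lowercase, look up the
label), but only a handful of labels from its table are included.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>

// Sketch: std::optional<std::string> stands in for AK's Optional<String>,
// and only a few labels from the Encoding Standard's table are included.
std::optional<std::string> get_standardized_encoding(std::string const& label)
{
    // Strip leading/trailing ASCII whitespace, then lowercase the label.
    auto const whitespace = " \t\n\f\r";
    auto first = label.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::nullopt;
    auto last = label.find_last_not_of(whitespace);
    std::string key = label.substr(first, last - first + 1);
    std::transform(key.begin(), key.end(), key.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    static std::pair<char const*, char const*> const labels[] = {
        { "unicode-1-1-utf-8", "UTF-8" },
        { "utf-8", "UTF-8" },
        { "utf8", "UTF-8" },
        { "iso-8859-1", "windows-1252" },
        { "latin1", "windows-1252" },
        { "us-ascii", "windows-1252" },
        { "windows-1252", "windows-1252" },
    };
    for (auto const& entry : labels) {
        if (key == entry.first)
            return std::string(entry.second);
    }
    // No match: return an empty optional rather than a null string.
    return std::nullopt;
}
```

Callers then have to handle the "no match" case explicitly, for example
by checking has_value() before using the result.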
The Acid1 test has a somewhat unusual background: the html and body
elements have different background colors. Our painting order of the
DOM was
such that the body background was painted first, then all other elements
were painted in-phase according to Appendix E of CSS 2.1. So the html
element's background color was painted over the body background.
This removes the special handling of the body background from
InitialContainingBlockBox and now all boxes are painted in-phase. Doing
this also exposed that we weren't handling Section 2.11.2 of the CSS
Backgrounds and Borders spec: when the html background is unset, the
body's background should be propagated to the html element.
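The propagation rule can be sketched as follows. The Background struct
is a hypothetical simplification (an empty color means transparent, an
empty image means none), not LibWeb's style representation. Once the
body's background has been propagated, the body itself paints its
initial (transparent) background.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified model of a computed background.
struct Background {
    std::optional<std::uint32_t> color; // empty means transparent
    std::optional<std::string> image;   // empty means none

    bool is_transparent() const { return !color && !image; }
};

// If the html element's background is unset (transparent, no image),
// the body's background is propagated to it and the body paints none.
void propagate_body_background(Background& html, Background& body)
{
    if (!html.is_transparent())
        return;
    html = body;
    body = Background {};
}
```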
* tBodies - returns an HTMLCollection of all tbody elements
* createTBody - If necessary, creates a new tbody element
  and adds it to the table after the last tbody element
* tFoot - Getter for the tfoot element
  The setter is not currently implemented
* createTFoot - If necessary, creates a new tfoot element
  and adds it to the table after any tbody elements
* deleteTFoot - If a tfoot element exists in the table, deletes it
* tHead - Getter for the thead element
  The setter is not currently implemented
* createTHead - If necessary, creates a new thead element
  and adds it to the table after any caption or colgroup elements,
  but before anything else
* deleteTHead - If a thead element exists in the table, deletes it
* caption - Getter and setter for the caption element
* createCaption - If necessary, creates a new caption element
  and adds it to the table
* deleteCaption - If a caption element exists in the table, deletes it
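The createTHead placement rule above can be sketched by modelling a
table as the list of its child tag names. This model and the
create_thead name are hypothetical; the real methods operate on DOM
nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a table is just the list of its child element tag names.
struct Table {
    std::vector<std::string> children;

    // Returns the index of the (possibly newly created) thead child.
    std::size_t create_thead()
    {
        // If a thead already exists, it is returned instead of creating another.
        for (std::size_t i = 0; i < children.size(); ++i) {
            if (children[i] == "thead")
                return i;
        }
        // Otherwise insert before the first child that is neither a caption
        // nor a colgroup (or at the end if there is no such child).
        std::size_t index = 0;
        while (index < children.size() && (children[index] == "caption" || children[index] == "colgroup"))
            ++index;
        children.insert(children.begin() + static_cast<std::ptrdiff_t>(index), "thead");
        return index;
    }
};
```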
rows returns an HTMLCollection of all the tr elements contained within
the table.
We leave the [SameObject] extended attribute off this attribute in the
IDL, as we cannot currently return the same HTMLCollection every time
(see the FIXME on DOM::Document::applets).
The WrapperGenerator currently does not correctly handle the default
value (-1) of insertRow's optional long index argument. As a result,
calling insertRow() without an index currently inserts the row at index
0 instead of appending it after the last row.
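For reference, the spec declares the argument as
`optional long index = -1`: -1 (or the current row count) appends, and
anything below -1 or above the row count throws an IndexSizeError. A
hypothetical helper mapping the argument to an insertion position, with
an empty optional standing in for the exception, might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Sketch: maps insertRow()'s index argument to an insertion position in a
// table with `row_count` rows. An empty optional stands for IndexSizeError.
std::optional<std::size_t> insert_row_position(long index, std::size_t row_count)
{
    if (index < -1 || index > static_cast<long>(row_count))
        return std::nullopt;
    if (index == -1)
        return row_count; // the IDL default: append after the last row
    return static_cast<std::size_t>(index);
}
```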
A Frame now knows about its nesting level.
The FrameLoader checks whether the recursion level of the current
frame allows it to be displayed, and if not, it doesn't even load the
requested resource.
The nesting check is done on a per-URL basis, so there can be many
nested Frames as long as they have different URLs.
However, if Frames with the same URL are nested inside each other, we
only allow this to happen 3 times.
This mitigates the problem of infinitely recursing <iframe>s in an HTML
document crashing the browser with an OOM.
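One way to express the per-URL check is to walk up the parent chain and
count the frames that already show the requested URL. The Frame struct,
can_load_nested_url, and the exact counting rule are assumptions for
illustration, not FrameLoader's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical, minimal frame model: each frame knows its URL and parent.
struct Frame {
    std::string url;
    Frame const* parent = nullptr;
};

// Hypothetical limit mirroring "we only allow this to happen 3 times".
constexpr int max_same_url_nesting = 3;

// Counts how many frames in the chain from `frame` up to the top-level
// frame already display `url`, and refuses to load once the limit is hit.
bool can_load_nested_url(Frame const& frame, std::string const& url)
{
    int count = 0;
    for (Frame const* f = &frame; f; f = f->parent) {
        if (f->url == url)
            ++count;
    }
    return count < max_same_url_nesting;
}
```

Because only frames with the same URL are counted, a chain of frames
with distinct URLs is never cut off by this check.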
This commit unifies methods and method/param names between the above
classes, and adds [[nodiscard]] and ALWAYS_INLINE where appropriate. It
also renames the various move_by methods to translate_by, as that more
closely matches the transformation terminology.
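As an illustration of the naming, here is a hypothetical point type with
a mutating translate_by() and a [[nodiscard]] translated() that returns
a copy, so ignoring its result produces a compiler warning. The
translated() method is an assumption, and ALWAYS_INLINE (a project
macro) is omitted to keep the sketch self-contained.

```cpp
#include <cassert>

// Hypothetical point type: translate_by() mutates in place, while
// translated() returns a modified copy and is marked [[nodiscard]].
template<typename T>
class Point {
public:
    Point(T x, T y)
        : m_x(x)
        , m_y(y)
    {
    }

    T x() const { return m_x; }
    T y() const { return m_y; }

    void translate_by(T dx, T dy)
    {
        m_x += dx;
        m_y += dy;
    }

    [[nodiscard]] Point translated(T dx, T dy) const
    {
        Point copy = *this;
        copy.translate_by(dx, dy);
        return copy;
    }

private:
    T m_x;
    T m_y;
};
```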
We had some inconsistencies before:
- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."
I picked the most common one (lowercase "the", trailing ".") and applied
it to all copyright headers.
By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
The WebSocket bindings match the original specification from the
WHATWG living standard, but do not match the later update of the
standard that involves Fetch. The Fetch update will be handled later
since the changes would also affect XMLHttpRequest.
SPDX License Identifiers are a more compact and standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
HTMLInputElement now inherits from FormAssociatedElement, which will
be a common base for the handful of elements that need to track their
owner form (and register with it for the form.elements collection).
At the moment, the owner form is assigned during DOM insertion/removal
of an HTMLInputElement. I didn't implement any of the legacy behaviors
defined by the HTML parsing spec yet.
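A rough sketch of the insertion/removal bookkeeping, assuming the owner
is simply the nearest ancestor <form>. The struct layout and the
inserted()/removed() hooks are hypothetical, and the form content
attribute is ignored, as are the parser's legacy behaviors.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct FormAssociatedElement;

// Hypothetical, minimal DOM-like node.
struct Node {
    Node* parent = nullptr;
    virtual ~Node() = default;
};

struct HTMLFormElement : Node {
    std::vector<FormAssociatedElement*> elements; // backs form.elements
};

struct FormAssociatedElement : Node {
    HTMLFormElement* owner = nullptr;

    // On insertion, the owner becomes the nearest ancestor <form>, if any.
    void inserted()
    {
        for (Node* ancestor = parent; ancestor; ancestor = ancestor->parent) {
            if (auto* form = dynamic_cast<HTMLFormElement*>(ancestor)) {
                owner = form;
                form->elements.push_back(this);
                return;
            }
        }
    }

    // On removal, unregister from the previous owner form.
    void removed()
    {
        if (!owner)
            return;
        auto& list = owner->elements;
        list.erase(std::remove(list.begin(), list.end(), this), list.end());
        owner = nullptr;
    }
};
```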
Completing an empty URL string from the document base URL will just
return the document URL, so any document that had a bare "<iframe>"
(with no src) would endlessly load itself in recursive iframes.
This warning reports float-to-double conversions. The best solution
seems to be to do math *either* in 32-bit *or* in 64-bit, and only to
cross over when absolutely necessary.
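A small illustration of the problem: a double literal silently promotes
float operands to double. GCC and Clang can warn about this via
-Wdouble-promotion; whether that is the exact warning enabled here is
an assumption, since this text does not name it.

```cpp
#include <cassert>
#include <type_traits>

// The double literal 0.5 promotes x to double; the result is then
// narrowed back explicitly. This is the kind of mixing to avoid.
inline float half_promoted(float x) { return static_cast<float>(x * 0.5); }

// Keeping every operand a float keeps the whole computation in 32-bit.
inline float half_float(float x) { return x * 0.5f; }

static_assert(std::is_same<decltype(1.0f * 0.5), double>::value, "float * double is double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value, "float * float stays float");
```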
This required changing the load_sync API to take a LoadRequest instead
of just a URL. Since HTMLScriptElement was the only (non-test) user of
this API, it didn't seem useful to instead add an overload of load_sync
for this.
As defined by the specification (and used by the website I am testing):
    interface mixin CanvasDrawPath {
        undefined fill(optional CanvasFillRule fillRule = "nonzero");
    };
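The fillRule argument selects how overlapping subpaths are filled. The
sketch below uses a winding-number point-in-path test over polygonal
subpaths to show the difference: "nonzero" fills where the winding
number is non-zero, "evenodd" where it is odd. The types and function
names are hypothetical and not LibWeb's rasterizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
    double x;
    double y;
};

enum class FillRule { NonZero, EvenOdd };

using Subpath = std::vector<Point>; // implicitly closed polygon
using Path = std::vector<Subpath>;

// > 0 if p is left of the directed edge a->b, < 0 if right, 0 if collinear.
static double is_left(Point a, Point b, Point p)
{
    return (b.x - a.x) * (p.y - a.y) - (p.x - a.x) * (b.y - a.y);
}

// Sum of the winding numbers of all subpaths around p.
static int winding_number(Path const& path, Point p)
{
    int winding = 0;
    for (auto const& subpath : path) {
        std::size_t n = subpath.size();
        for (std::size_t i = 0; i < n; ++i) {
            Point a = subpath[i];
            Point b = subpath[(i + 1) % n];
            if (a.y <= p.y) {
                if (b.y > p.y && is_left(a, b, p) > 0)
                    ++winding; // upward crossing with p on the left
            } else if (b.y <= p.y && is_left(a, b, p) < 0) {
                --winding; // downward crossing with p on the right
            }
        }
    }
    return winding;
}

bool is_inside(Path const& path, Point p, FillRule rule = FillRule::NonZero)
{
    int winding = winding_number(path, p);
    return rule == FillRule::NonZero ? winding != 0 : winding % 2 != 0;
}
```

With two nested squares drawn in the same direction, the center has a
winding number of 2: "nonzero" fills it, while "evenodd" leaves a hole.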
Previously we didn't check if we could insert the element in the
adjusted insertion location's parent.
This also changes the return type to NonnullRefPtr, as that's what
element is.
This is because it includes the initial node that the function was
called on, which makes it "inclusive" according to the spec.
This is important as there are non-inclusive variants, particularly
used in the node mutation algorithms.
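The distinction can be sketched with a minimal hypothetical node type:
ancestors() starts at the parent, while inclusive_ancestors() also
includes the node itself, matching the DOM standard's definition of an
inclusive ancestor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, minimal tree node.
struct Node {
    Node* parent = nullptr;
};

// Ancestors: the parent, the parent's parent, and so on (not the node itself).
std::vector<Node const*> ancestors(Node const& node)
{
    std::vector<Node const*> result;
    for (Node const* n = node.parent; n; n = n->parent)
        result.push_back(n);
    return result;
}

// Inclusive ancestors: the node itself followed by its ancestors.
std::vector<Node const*> inclusive_ancestors(Node const& node)
{
    std::vector<Node const*> result { &node };
    auto rest = ancestors(node);
    result.insert(result.end(), rest.begin(), rest.end());
    return result;
}
```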
Also updates the "inserted_into" function as per the previous commit.
Changes the FIXME, since according to the spec there is no notification
system to be notified of things such as the node becoming connected.
Instead, an element "becomes connected" when its insertion steps are
run and it is now connected, whereas it previously wasn't.
https://html.spec.whatwg.org/multipage/infrastructure.html#becomes-connected
This is done in this PR because the insertion steps are run when the
start tag is inserted, which made it try to prepare inline scripts too
early.
The order of operations in the HTML document parser ensures that
the parser document is set before the insertion steps are run.
This particularly affects the insertion steps and the removed steps.
The insertion steps no longer take the parent that the node was
inserted into, as per the spec. Due to this, I have renamed the
function from "inserted_into" to simply "inserted". None of the
users of the insertion steps were using it anyway.
The removed steps now take a pointer to the old parent instead of
a reference. This is because it is optional according to the spec:
the old parent is null when running the removal steps for the
descendants of a node that just got removed.
This commit does not affect HTMLScriptElement as there is a bit
more to that, which is better suited for a separate commit.
Also adds in the adopted steps as they will be used later.
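The nullable old parent can be sketched as follows: only the removed
node itself receives its old parent, while each of its descendants gets
null. Node, removed_from(), remove_child(), and the trace vector are
hypothetical stand-ins for the real DOM code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal node whose "removed" steps take an optional old parent.
struct Node {
    std::string name;
    Node* parent = nullptr;
    std::vector<Node*> children;
    std::vector<std::string>* trace = nullptr; // records which steps ran

    Node(std::string node_name, std::vector<std::string>* steps)
        : name(std::move(node_name))
        , trace(steps)
    {
    }

    // old_parent is null when the steps run for a descendant of the removed node.
    void removed_from(Node* old_parent)
    {
        trace->push_back(name + (old_parent ? " from " + old_parent->name : std::string(" (descendant)")));
    }
};

void append_child(Node& parent, Node& child)
{
    child.parent = &parent;
    parent.children.push_back(&child);
}

// Runs the removed steps for every descendant in tree order, with no old parent.
static void run_removed_steps_for_descendants(Node& node)
{
    for (Node* child : node.children) {
        child->removed_from(nullptr);
        run_removed_steps_for_descendants(*child);
    }
}

void remove_child(Node& child)
{
    Node* old_parent = child.parent;
    auto& siblings = old_parent->children;
    siblings.erase(std::find(siblings.begin(), siblings.end(), &child));
    child.parent = nullptr;
    child.removed_from(old_parent); // only the removed node sees its old parent
    run_removed_steps_for_descendants(child);
}
```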
The HTML <label> element is special in that it may be associated with
some other <input> element. When the label element is clicked, the input
element should be activated.
To achieve this, a LabelableNode base class is introduced to provide an
interface for "labelable" elements to handle mouse events on their
associated labels. This not only allows clicking the label to activate
the input, but also means that dragging the mouse from the label to the
input (and vice versa) while the mouse button is held down will activate
the label.
As of this commit, this infrastructure is not hooked up to any elements.
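The press-and-release behavior described above can be sketched as a
tiny state machine. LabelableNode here is a hypothetical stand-in that
only counts activations, and the Target enum is an assumption used to
describe where a mouse event landed.

```cpp
#include <cassert>

// Hypothetical stand-in for an element that can be labeled (e.g. <input>).
struct LabelableNode {
    int activation_count = 0;
    void activate() { ++activation_count; }
};

// Where a mouse event landed, relative to one label/control pair.
enum class Target { Label, Control, Elsewhere };

// A press that starts on the label or its control and is released on
// either of them activates the control; releasing elsewhere cancels it.
class LabelActivationTracker {
public:
    explicit LabelActivationTracker(LabelableNode& control)
        : m_control(control)
    {
    }

    void mouse_down(Target target) { m_pressed = (target != Target::Elsewhere); }

    void mouse_up(Target target)
    {
        if (m_pressed && target != Target::Elsewhere)
            m_control.activate();
        m_pressed = false;
    }

private:
    LabelableNode& m_control;
    bool m_pressed = false;
};
```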
A FrameHostElement is an HTML element (<frame> or <iframe>) that may
have a content frame that participates in the frame tree.
This basically just moves code from <iframe> to a separate base class
so we can share it with <frame> once we implement <frame>.
The previous names (RGBA32 and RGB32) were misleading since that's not
the actual byte order in memory. The new names reflect exactly how the
color values get laid out in bitmap data.
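To see why the old names were misleading, consider a color packed as a
32-bit 0xAARRGGBB integer (an assumption about the representation) and
stored on a little-endian machine: the bytes in memory come out as
B, G, R, A. The helpers below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack a color as a 32-bit 0xAARRGGBB integer.
constexpr std::uint32_t pack_argb(std::uint8_t a, std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16) | (std::uint32_t(g) << 8) | std::uint32_t(b);
}

// Copy the integer into memory so the individual bytes can be inspected.
inline void to_bytes(std::uint32_t value, std::uint8_t out[4])
{
    std::memcpy(out, &value, sizeof(value));
}

// True if the lowest-order byte of a multi-byte integer is stored first.
inline bool is_little_endian()
{
    std::uint16_t one = 1;
    std::uint8_t first_byte;
    std::memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}
```

On a little-endian host, to_bytes(pack_argb(0xAA, 0x11, 0x22, 0x33), ...)
yields the bytes 0x33, 0x22, 0x11, 0xAA, that is, blue first and alpha
last.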
There's a bit more nuance to how this should really work, but let's at
least make sure we execute <script> elements if you insert them into
the document.