And use them to highlight javascript in HTML source.
This commit also changes how TextDocumentSpan::data is interpreted,
as it used to be an opaque pointer, but everyone stuffed an enum value
inside it, which made the values not unique to each highlighter;
that field is now a u64 serial id.
The syntax highlighters don't need to change their ways of stuffing
token types into that field, but a highlighter that calls another
nested highlighter needs to register the nested types for use with
token pairs.
This patch aims to fix wrong highlighting for some cases in HTML's
syntax highlighter. The values were somewhat experimentally determined
are are subject to change. Regardless, it should be more correct with
this patch than without it. :^)
This changes the HTML SyntaxHighlighter to conform to the now-fixed
rendering of syntax highlighting spans in GUI::TextEditor. It also
avoids emitting tokens if they have a zero or negative length.
This fixes a bug where single-character tokens were not highlighted
properly.
This patch changes HTMLTokenizer::nth_last_position to not fail if the
requested position is not available. Rather, it will just return (0-0).
While this is not the correct solution, it prevents the tokenizer from
crashing just because it cannot find a source position. This should only
affect SyntaxHighlighter.
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
We currently only support application/x-www-form-urlencoded for
form submissions, which uses a special percent encode set when
percent encoding the body/query. However, we were not using this
percent encode set.
With the new URL implementation, we can now specify the percent encode
set to be used, allowing us to use this special percent encode set.
This is one of the fixes needed to make the Google cookie consent work.
Our "frame" concept very closely matches what the web specs call a
"browsing context", so let's rename it to that. :^)
The "main frame" becomes the "top-level browsing context",
and "sub-frames" are now "nested browsing contexts".
Instead of being its own separate unrelated class.
This automatically makes typed array properties available to it,
as well as making it available to the runtime.
skip() is supposed to end up keeping the previous iterator only one
index behind the current one, and restore_to() should actually do the
restore instead of just removing the now-useless source positions.
Fixes#7331.
This patch implements the HTML specification's "encoding sniffing
algorithm", which is used when no encoding can be obtained from the
Content-Type header (either because it doesn't contain a charset=...)
value or the file has not been opened via HTTP (as with local files).
It also modifies the creator of the HTMLDocumentParser to use the new
HTMLDocumentParser::create_with_uncertain_encoding static method, which
runs the encoding sniffing algorithm before instantiating the parser.
This now allows us to load local HTML pages (or remote pages without a
charset specified in the 'Content-Type' header) with a non-UTF-8
encoding such as 'windows-1252'. This would previously crash the
browser. :^)
This patch changes get_standardized_encoding to use an Optional<String>
return type instead of just returning the null string when unable to
match the provided encoding to one of the canonical encoding names.
This is part of an effort to move away from using null strings towards
explicitly using Optional<String> to indicate that the String may not
have a value.
The Acid1 test has a bit of an unusual background - the html and body
tags have different background colors. Our painting order of the DOM was
such that the body background was painted first, then all other elements
were painted in-phase according to Appendix E of CSS 2.1. So the html
element's background color was painted over the body background.
This removes the special handling of the body background from
InitialContainingBlockBox and now all boxes are painted in-phase. Doing
this also exposed that we weren't handling Section 2.11.2 of the spec;
when the html background is unset, the body's background should be
propagated to the html element.
* tBodies - returns a HTMLCollection of all tbody elements
* createTBody - If necessary, creates a new tbody element
and add it to the table after the last tbody element
* tFoot - Getter for the tfoot element
The setter is not currently implemented
* createTFoot - If necessary, creates a new tfoot element
and add it to the table after any tbody elements
* deleteTFoot - If a tfoot element exists in the table, delete it
* tHead - Getter for the thead element
The setter is not currently implemented
* createTHead - If necessary, creates a new thead element
and add it to the table after any caption or colgroup elements,
but before anything else
* deleteTHead - If a thead element exists in the table, delete it
* caption - Getter and setter for the caption element
* createCaption - If necessary, creates a new caption element
and add it to the table
* deleteCaption - If a caption element exists in the table, delete it
rows returns a HTMLCollection of all the tr elements contained within
the table.
We leave the SameObject attribute off the attribute in the IDL as we
cannot currently return the same HTMLCollection every time (see the
FIXME on DOM::Document::applets)
The WrapperGenerator currently does not correctly handle the default
value for the type long on insertRow. Currently not specifying the
index will insert a row at index 0.
A Frame now knows about its nesting-level.
The FrameLoader checks whether the recursion level of the current
frame allows it to be displayed and if not doesn't even load the
requested resource.
The nesting-check is done on a per-URL-basis, so there can be many many
nested Frames as long as they have different URLs.
If there are however Frames with the same URL nested inside each other
we only allow this to happen 3 times.
This mitigates infinetely recursing <iframe>s in an HTML-document
crashing the browser with an OOM.
This commit unifies methods and method/param names between the above
classes, as well as adds [[nodiscard]] and ALWAYS_INLINE where
appropriate. It also renamed the various move_by methods to
translate_by, as that more closely matches the transformation
terminology.
We had some inconsistencies before:
- Sometimes "The", sometimes "the"
- Sometimes trailing ".", sometimes no trailing "."
I picked the most common one (lowecase "the", trailing ".") and applied
it to all copyright headers.
By using the exact same string everywhere we can ensure nothing gets
missed during a global search (and replace), and that these
inconsistencies are not spread any further (as copyright headers are
commonly copied to new files).
The WebSocket bindings match the original specification from the
WHATWG living standard, but do not match the later update of the
standard that involves FETCH. The FETCH update will be handled later
since the changes would also affect XMLHttpRequest.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
HTMLInputElement now inherits from FormAssociatedElement, which will
be a common base for the handful of elements that need to track their
owner form (and register with it for the form.elements collection.)
At the moment, the owner form is assigned during DOM insertion/removal
of an HTMLInputElement. I didn't implement any of the legacy behaviors
defined by the HTML parsing spec yet.
Completing an empty URL string from the document base URL will just
return the document URL, so any document that had an "<iframe>"
would endlessly load itself in recursive iframes.
This warning informs of float-to-double conversions. The best solution
seems to be to do math *either* in 32-bit *or* in 64-bit, and only to
cross over when absolutely necessary.