Commit graph

1180 commits

Author SHA1 Message Date
Max Wipfli
15d8635afc LibWeb: User getter+setter for HTMLToken tag name and self-closing flag 2021-07-17 16:24:57 +04:30
Max Wipfli
1aeafcc58b LibWeb: Use getter and setter for Character type HTMLTokens
While storing the code point in a UTF-8 encoded String in horrendously
inefficient, this problem will be addressed at a later stage.
2021-07-17 16:24:57 +04:30
Max Wipfli
e8e9426b4f LibWeb: User getter and setter for Comment type HTMLTokens 2021-07-17 16:24:57 +04:30
Max Wipfli
f886aa15b8 LibWeb: Rename HTMLToken::AttributeBuilder struct to Attribute
This does not contain StringBuilders anymore, so it can do with a
simpler name: Attribute.
2021-07-17 16:24:57 +04:30
Max Wipfli
d82f3eb085 LibWeb: Make HTMLToken::{Position,AttributeBuilder} structs public
There was and is no reason for those to be private. Making them public
also allows us to explicitly specify the return type of some getters.
2021-07-17 16:24:57 +04:30
Max Wipfli
e22a34badb LibWeb: Fix assertion failures in HTMLTokenizer
The *TagName states are all very similar, so it seems to be correct to
apply the fix from #8761 to all of those states.

This fixes #8788.
2021-07-16 11:55:55 +02:00
Max Wipfli
2404ad6897 LibWeb: Fix assertion failure when tokenizing JS regex literals
This fixes parsing the following regular expression: /</g;

It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.
2021-07-15 01:47:22 +02:00
Max Wipfli
bb2aed7d76 LibWeb: Correct behavior of Comment* states in HTMLTokenizer
Previously, this would lead to assertion failures when parsing HTML
comments. This fixes #8757.
2021-07-15 00:48:45 +02:00
Max Wipfli
af0b483123 LibWeb: VERIFY an empty builder when emitting tokens in HTMLTokenizer 2021-07-15 00:48:45 +02:00
Max Wipfli
045a6a566b LibWeb: Remove unused HTMLTokenizer::m_input member variable 2021-07-14 23:03:36 +02:00
Max Wipfli
35f32ac170 LibWeb: Change HTMLToken.h to east const style 2021-07-14 23:03:36 +02:00
Max Wipfli
125982943a LibWeb: Change HTMLTokenizer.{cpp,h} to east const style 2021-07-14 23:03:36 +02:00
Gunnar Beutner
300823c314 LibWeb: Use move() when enqueuing tokens in HTMLTokenizer
We're not using the current token anymore once it's enqueued so let's
use move() when enqueuing the tokens.
2021-07-14 23:03:36 +02:00
Gunnar Beutner
c3ad8e9a52 LibWeb: Remove StringBuilder from HTMLToken::m_comment_or_character 2021-07-14 23:03:36 +02:00
Gunnar Beutner
3aa202c432 LibWeb: Remove StringBuilder from HTMLToken::m_tag 2021-07-14 23:03:36 +02:00
Gunnar Beutner
901d71148b LibWeb: Remove StringBuilders from HTMLToken::AttributeBuilder 2021-07-14 23:03:36 +02:00
Gunnar Beutner
992964aa7d LibWeb: Remove StringBuilders from HTMLToken::m_doctype 2021-07-14 23:03:36 +02:00
Gunnar Beutner
2150609590 LibWeb: Remove more unused StringBuilders in HTMLToken
These fields aren't read anywhere but I didn't feel like removing
them outright.
2021-07-14 23:03:36 +02:00
Gunnar Beutner
d9e52997e2 LibWeb: Use an Optional<String> to track the last HTML start tag
Using an HTMLToken object here is unnecessary because the only
attribute we're interested in is the tag_name.
2021-07-14 23:03:36 +02:00
Luke
e9eae9d880 LibWeb: Add extracting character encoding from a meta content attribute
Some Gmail emails contain this.
2021-07-13 20:23:44 +02:00
Adam Hodgen
3e46e8fea8 LibWeb: Fix HTMLTable Element attributes
`Element::tag_name` return an uppercase version of the tag name. However
the `Web::HTML::TagNames` values are all lowercase.

This change fixes that using `Element::local_name`, which returns a
lowercase value.
2021-07-11 14:14:01 +02:00
Luke
a826df773e LibWeb: Make WrapperGenerator generate nullable wrapper types
Previously it was not doing so, and some code relied on this not being
the case.

In particular, set_caption, set_t_head and set_t_foot in
HTMLTableElement relied on this. This commit is not here to fix this,
so I added an assertion to make it equivalent to a reference for now.
2021-07-05 12:39:46 +02:00
Luke
62c015dc96 LibWeb: Implement the adoption steps for <template> elements
While I'm here with the cloning steps, let's implement this too.
2021-07-05 12:39:46 +02:00
Luke
a7fa757dd1 LibWeb: Implement the cloning steps for <template> elements 2021-07-05 12:39:46 +02:00
Luke
f7ad8c0f94 LibWeb: Add DOMParser
This allows you to invoke the HTML document parser and retrieve a
document as though it was loaded as a web page, minus any scripting
ability.

This does not currently support XML parsing.

This is used by YouTube (or more accurately, Web Components Polyfills)
to polyfill templates.
2021-07-05 12:39:46 +02:00
Luke
0ea50d44bf LibWeb: Check if scripting is disabled before running script
This is not a full check, it's just enough to prevent script execution
in DOMParser.
2021-07-05 12:39:46 +02:00
Andreas Kling
c8270dbe2e LibJS: Rename ScriptFunction => OrdinaryFunctionObject
These are basically what the spec calls "ordinary function objects",
so let's have the name reflect that. :^)
2021-06-27 22:36:04 +02:00
Andreas Kling
ba9d5c4d54 LibJS: Rename Function => FunctionObject 2021-06-27 22:36:04 +02:00
Andreas Kling
ee3a73ddbb AK: Rename downcast<T> => verify_cast<T>
This makes it much clearer what this cast actually does: it will
VERIFY that the thing we're casting is a T (using is<T>()).
2021-06-24 19:57:01 +02:00
Andreas Kling
dc65f54c06 AK: Rename Vector::append(Vector) => Vector::extend(Vector)
Let's make it a bit more clear when we're appending the elements from
one vector to the end of another vector.
2021-06-12 13:24:45 +02:00
Ali Mohammad Pur
8b3f8879c1 LibJS: Use an enum class instead of 'bool is_generator'
This avoid confusion in the order of the multiple boolean parameters
that exist.
2021-06-11 19:42:58 +04:30
Ali Mohammad Pur
3234697eca LibJS: Implement generator functions (only in bytecode mode) 2021-06-11 00:30:09 +02:00
Ali Mohammad Pur
71b4433b0d LibWeb+LibSyntax: Implement nested syntax highlighters
And use them to highlight javascript in HTML source.
This commit also changes how TextDocumentSpan::data is interpreted,
as it used to be an opaque pointer, but everyone stuffed an enum value
inside it, which made the values not unique to each highlighter;
that field is now a u64 serial id.
The syntax highlighters don't need to change their ways of stuffing
token types into that field, but a highlighter that calls another
nested highlighter needs to register the nested types for use with
token pairs.
2021-06-07 14:45:49 +04:30
Max Wipfli
282a623853 LibWeb: Change a few source end positions in HTMLTokenizer
This patch aims to fix wrong highlighting for some cases in HTML's
syntax highlighter. The values were somewhat experimentally determined
are are subject to change. Regardless, it should be more correct with
this patch than without it. :^)
2021-06-05 00:32:28 +04:30
Max Wipfli
44c438d0ca LibWeb: Fix off-by-one error in SyntaxHighlighter
This changes the HTML SyntaxHighlighter to conform to the now-fixed
rendering of syntax highlighting spans in GUI::TextEditor. It also
avoids emitting tokens if they have a zero or negative length.

This fixes a bug where single-character tokens were not highlighted
properly.
2021-06-05 00:32:28 +04:30
Max Wipfli
932161e581 LibWeb: Be more forgiving when adding source positions in HTMLTokenizer
This patch changes HTMLTokenizer::nth_last_position to not fail if the
requested position is not available. Rather, it will just return (0-0).

While this is not the correct solution, it prevents the tokenizer from
crashing just because it cannot find a source position. This should only
affect SyntaxHighlighter.
2021-06-05 00:32:28 +04:30
Max Wipfli
93d830b5cc LibWeb: Add debugging statements in SyntaxHighlighter
This also changes SyntaxHighlighter.{h,cpp} to use east const style.
2021-06-05 00:32:28 +04:30
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Luke
70a575d75f LibWeb: Use correct percent encode set for form submissions
We currently only support application/x-www-form-urlencoded for
form submissions, which uses a special percent encode set when
percent encoding the body/query. However, we were not using this
percent encode set.

With the new URL implementation, we can now specify the percent encode
set to be used, allowing us to use this special percent encode set.

This is one of the fixes needed to make the Google cookie consent work.
2021-06-01 23:26:03 +04:30
Andreas Kling
407d6cd9e4 AK: Rename Utf8CodepointIterator => Utf8CodePointIterator 2021-06-01 09:45:52 +02:00
Luke
59cfc4a8db LibWeb: Rename "FrameHostElement" to "BrowsingContextContainer"
With the renaming of "Frame" to "BrowsingContext", this changes
"FrameHostElement" to "BrowsingContextContainer" to further
match the spec.

https://html.spec.whatwg.org/#browsing-context-container
2021-05-31 16:25:13 +02:00
Andreas Kling
4190fd2199 LibWeb: Rename Web::Frame to Web::BrowsingContext
Our "frame" concept very closely matches what the web specs call a
"browsing context", so let's rename it to that. :^)

The "main frame" becomes the "top-level browsing context",
and "sub-frames" are now "nested browsing contexts".
2021-05-30 12:39:53 +02:00
Ali Mohammad Pur
6af596d9e8 LibJS+LibWeb: Make Uint8ClampedArray use TypedArray
Instead of being its own separate unrelated class.
This automatically makes typed array properties available to it,
as well as making it available to the runtime.
2021-05-26 15:34:13 +04:30
Tobias Christiansen
9a1ac662f1 LibWeb: Don't try to load anything if src is empty in <img>
Previously we tried to load "" if the src was present but empty and
essentially only waited for RequestServer to time out.
2021-05-24 17:45:09 +01:00
Andreas Kling
81641ee469 LibWeb: Make tag names bold in syntax-highlighted HTML :^) 2021-05-21 15:32:53 +02:00
Ali Mohammad Pur
1822d6b8ac LibWeb: Fix invalid behaviour of HTMLTokenizer::skip() and restore_to()
skip() is supposed to end up keeping the previous iterator only one
index behind the current one, and restore_to() should actually do the
restore instead of just removing the now-useless source positions.
Fixes #7331.
2021-05-21 09:22:35 +02:00
Ali Mohammad Pur
97a230e4ef LibWeb: Add a super basic HTML syntax highlighter
This can currently highlight tag names and attribute names/values.
2021-05-20 22:06:45 +02:00
Ali Mohammad Pur
aa7939bc6c LibWeb: Add position tracking information to HTML tokens 2021-05-20 22:06:45 +02:00
Max Wipfli
f808279769 LibWeb: Implement encoding sniffing algorithm
This patch implements the HTML specification's "encoding sniffing
algorithm", which is used when no encoding can be obtained from the
Content-Type header (either because it doesn't contain a charset=...)
value or the file has not been opened via HTTP (as with local files).

It also modifies the creator of the HTMLDocumentParser to use the new
HTMLDocumentParser::create_with_uncertain_encoding static method, which
runs the encoding sniffing algorithm before instantiating the parser.

This now allows us to load local HTML pages (or remote pages without a
charset specified in the 'Content-Type' header) with a non-UTF-8
encoding such as 'windows-1252'. This would previously crash the
browser. :^)
2021-05-18 21:02:07 +02:00
Max Wipfli
d325403cb5 LibTextCodec: Use Optional<String> for get_standardized_encoding
This patch changes get_standardized_encoding to use an Optional<String>
return type instead of just returning the null string when unable to
match the provided encoding to one of the canonical encoding names.

This is part of an effort to move away from using null strings towards
explicitly using Optional<String> to indicate that the String may not
have a value.
2021-05-18 21:02:07 +02:00