Commit graph

3218 commits

Author SHA1 Message Date
Andreas Kling
b6288163f1 LibWeb: Make the new HTML parser parse input as UTF-8
We already convert the input to UTF-8 before starting the tokenizer,
so all this patch had to do was switch the tokenizer to use a Utf8View
for its input (and to emit 32-bit codepoints).
2020-06-04 21:12:17 +02:00
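Reading the input through a UTF-8 view means decoding variable-length byte sequences into 32-bit code points. The following is a rough, standard-library-only sketch of that decoding step. It is not the Utf8View API, and the function name `decode_utf8` is hypothetical. Malformed bytes become U+FFFD; to keep the sketch short, it does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: decodes UTF-8 bytes into 32-bit code points.
// Malformed or truncated sequences are replaced with U+FFFD.
// Simplification: overlong encodings and surrogates are not rejected.
std::vector<char32_t> decode_utf8(std::string_view input)
{
    std::vector<char32_t> code_points;
    std::size_t i = 0;
    while (i < input.size()) {
        auto lead = static_cast<unsigned char>(input[i]);
        std::size_t length;
        char32_t code_point;
        if (lead < 0x80) {
            length = 1;
            code_point = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2;
            code_point = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;
            code_point = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;
            code_point = lead & 0x07;
        } else {
            // Stray continuation byte or invalid lead byte.
            code_points.push_back(0xFFFD);
            ++i;
            continue;
        }
        // Every continuation byte must look like 10xxxxxx.
        bool valid = i + length <= input.size();
        for (std::size_t j = 1; valid && j < length; ++j) {
            auto byte = static_cast<unsigned char>(input[i + j]);
            if ((byte & 0xC0) != 0x80)
                valid = false;
            else
                code_point = (code_point << 6) | (byte & 0x3F);
        }
        if (!valid) {
            code_points.push_back(0xFFFD);
            ++i; // Resynchronize at the next byte.
            continue;
        }
        code_points.push_back(code_point);
        i += length;
    }
    return code_points;
}
```

For example, the two bytes `C3 A9` decode to the single code point U+00E9 (é), which the tokenizer can then treat as one input character.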
Andreas Kling
19190267a6 LibWeb: Fix incorrectly consumed characters after reference tokens
The NumericCharacterReferenceEnd tokenizer state should not advance
the input stream.
2020-06-04 16:49:21 +02:00
Andreas Kling
959de19418 LibWeb: Process style sheets in document order
Until now we would simply apply stylesheets in the order they finished
loading. This patch adds a StyleSheetList object that hangs off of each
Document and contains all the style sheets in document order.

There's still a lot of work to do for a proper cascade, but at least
this makes us consistently wrong every time. :^)
2020-06-04 16:06:32 +02:00
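One way to get document order regardless of which sheet finishes loading first is to keep the list sorted by each sheet's position in the document. This is a sketch only: the `document_position` field is a hypothetical stand-in for however the owning element's tree position is determined, and this is not LibWeb's StyleSheetList.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: each sheet records where its owning
// <style>/<link> element sits in the document (a tree-order index).
struct StyleSheet {
    std::string name;
    int document_position;
};

class StyleSheetList {
public:
    // Sheets may arrive in any order; insert each one at the spot
    // dictated by its document position, not by arrival time.
    void add_sheet(StyleSheet sheet)
    {
        auto it = std::upper_bound(m_sheets.begin(), m_sheets.end(), sheet.document_position,
            [](int position, const StyleSheet& existing) { return position < existing.document_position; });
        m_sheets.insert(it, std::move(sheet));
    }

    const std::vector<StyleSheet>& sheets() const { return m_sheets; }

private:
    std::vector<StyleSheet> m_sheets;
};
```

Using `std::upper_bound` keeps sheets that share a position in arrival order. A real cascade also has to account for origin and specificity, which this sketch ignores.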
Andreas Kling
ec1891837f LibWeb: Fix <body> and <img> elements not parsing their class attribute
Subclasses that override Element::parse_attribute() must always call the
base class implementation, since otherwise we might forget to parse some
attributes.

This makes class selectors work on <body> and <img> elements. :^)
2020-06-04 16:04:52 +02:00
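A minimal sketch of the pattern this commit enforces. It assumes simplified, hypothetical `Element` and `HTMLImageElement` classes with a string-based `parse_attribute()`; these are not LibWeb's actual declarations. The override forwards to the base class before handling its own attributes, so generic attributes such as `class` are still parsed.

```cpp
#include <string>

// Hypothetical, simplified element hierarchy.
class Element {
public:
    virtual ~Element() = default;

    // Called for every attribute the parser sets on the element.
    virtual void parse_attribute(const std::string& name, const std::string& value)
    {
        if (name == "class")
            m_class_name = value;
    }

    const std::string& class_name() const { return m_class_name; }

private:
    std::string m_class_name;
};

class HTMLImageElement : public Element {
public:
    void parse_attribute(const std::string& name, const std::string& value) override
    {
        // Let the base class handle generic attributes like "class" first;
        // leaving this call out would silently drop them.
        Element::parse_attribute(name, value);
        if (name == "src")
            m_src = value;
    }

    const std::string& src() const { return m_src; }

private:
    std::string m_src;
};
```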
Andreas Kling
ca33bc7895 LibWeb: Fix tokenization of tags with empty attributes
We were neglecting to emit start tags for tags where the last attribute
had no value.

Also fix a parse error TODO that I hit while looking at this.
2020-06-04 12:00:09 +02:00
Kyle McLean
b9549078cc LibWeb: Handle "html" end tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
a3bf3a5d68 LibWeb: Handle "xmp" start tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
c70bd0ba58 LibWeb: Handle "nobr" start tag during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
22521e57fd LibWeb: Handle "form" end tag during "in body" if stack of open elements does not contain "template"
2020-06-04 09:09:33 +02:00
Kyle McLean
4edd0643a6 LibWeb: Handle NULL character during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
5e3972a946 LibWeb: Parse "body" end tags during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
1ad81e4833 LibWeb: Parse "br" end tags during "in body"
2020-06-04 09:09:33 +02:00
Kyle McLean
9fca4b56d3 LibWeb: Parse end tags for "applet", "marquee", and "object" during "in body"
2020-06-04 09:09:33 +02:00
Andreas Kling
3c2fbc825c LibWeb: Call children_changed() on text nodes when flushing characters
Now that we flush characters in a single place, we can call the Text's
children_changed() from there instead of having a goofy targeted hack
for <style> elements. :^)
2020-06-03 22:13:29 +02:00
Andreas Kling
21957745f7 LibWeb: Special-case initialization of HTML::AttributeNames::class_
Just do it after all the others instead of trying to be clever.
2020-06-03 22:06:52 +02:00
Andreas Kling
c40de9275a LibWeb: Buffer text node character insertions in the new parser
Instead of appending character-at-a-time, we now buffer character
insertions in a StringBuilder, and flush them to the relevant node
whenever we start inserting into a new node (and when parsing ends.)
2020-06-03 21:53:08 +02:00
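The buffering described here can be sketched with `std::string` standing in for StringBuilder and hypothetical `TextNode` and `CharacterInserter` types (not LibWeb's classes). Characters accumulate until the insertion target changes or parsing ends, and each flush notifies the node once. Calling children_changed() from this single flush point is what commit 3c2fbc825c later does.

```cpp
#include <string>

// Hypothetical stand-in for a DOM Text node.
struct TextNode {
    std::string data;
    int change_notifications = 0;

    // Counts how often the node was told its contents changed.
    void children_changed() { ++change_notifications; }
};

class CharacterInserter {
public:
    void insert_character(TextNode& node, char ch)
    {
        // Switching to a different node flushes what was buffered for the old one.
        if (m_target != &node) {
            flush();
            m_target = &node;
        }
        m_buffer += ch;
    }

    // Also call this when parsing ends so the final run is not lost.
    void flush()
    {
        if (!m_target || m_buffer.empty())
            return;
        m_target->data += m_buffer;
        m_target->children_changed();
        m_buffer.clear();
    }

private:
    TextNode* m_target = nullptr;
    std::string m_buffer;
};
```

Appending "hello" one character at a time thus costs one append and one notification on the node instead of five.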
Andreas Kling
2149820260 LibWeb: Use HTML::AttributeNames::foo instead of FlyString("foo")
To avoid the costly instantiation of FlyStrings whenever we're looking
up attributes, use the premade HTML::AttributeNames globals. :^)
2020-06-03 21:53:00 +02:00
Andreas Kling
b750843797 LibWeb: Remove assertion in HTMLImageElement::resource_did_load()
We might end up here with a non-null decoder if the Resource fires load
callbacks again for the resource. It's harmless since we'll just get
the same decoder again.
2020-06-02 22:05:29 +02:00
Andreas Kling
d4ddb0013c LibWeb: Share decoded images at the Resource level :^)
This patch adds ImageResource as a subclass of Resource. This new class
also keeps a Gfx::ImageDecoder so that we can share decoded bitmaps
between all clients of an image resource inside LibWeb.

With this, we now share both encoded and decoded data for images. :^)

I had to change how the purgeable-volatile flag is updated to keep the
volatile-images-outside-the-visible-viewport optimization working.
HTMLImageElement now inherits from ImageResourceClient (a subclass of
ResourceClient with additional image-specific stuff) and informs its
ImageResource about whether it's inside the viewport or outside.

This is pretty awesome! :^)
2020-06-02 20:32:38 +02:00
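The viewport bookkeeping can be sketched as a per-resource set of visible clients: the shared decoded bitmap is only a candidate for being made volatile when no client reports being inside the viewport. All names here (`set_visible_in_viewport()`, `should_be_volatile()`) are hypothetical, and the sketch leaves out decoding and the actual purgeable-memory calls.

```cpp
#include <set>

class ImageResource;

// Hypothetical client, standing in for an <img> element.
class ImageResourceClient {
public:
    explicit ImageResourceClient(ImageResource& resource);
    ~ImageResourceClient();
    void set_visible_in_viewport(bool visible);

private:
    ImageResource& m_resource;
};

class ImageResource {
public:
    // The shared decoded bitmap may be purged only if no client can see it.
    bool should_be_volatile() const { return m_visible_clients.empty(); }

private:
    friend class ImageResourceClient;
    std::set<const ImageResourceClient*> m_visible_clients;
};

ImageResourceClient::ImageResourceClient(ImageResource& resource)
    : m_resource(resource)
{
}

ImageResourceClient::~ImageResourceClient()
{
    // A destroyed client no longer keeps the bitmap alive.
    m_resource.m_visible_clients.erase(this);
}

void ImageResourceClient::set_visible_in_viewport(bool visible)
{
    if (visible)
        m_resource.m_visible_clients.insert(this);
    else
        m_resource.m_visible_clients.erase(this);
}
```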
Andreas Kling
a3936f10eb LibWeb: Fix tokenizing scripts with '<' in them
The EMIT_CHARACTER_AND_RECONSUME_IN macro was emitting the current token
instead of the specified codepoint.
2020-06-02 14:27:53 +02:00
Andreas Kling
f3799b501e LibWeb: Port ImageStyleValue to the ResourceClient interface
2020-06-02 14:26:10 +02:00
Andreas Kling
ca8398bc19 LibWeb: Avoid an unnecessary temporary variable in HTMLImageElement
2020-06-02 13:51:57 +02:00
Andreas Kling
7197adbd55 LibWeb: Port HTMLLinkElement to the ResourceClient interface
2020-06-02 13:51:57 +02:00
Andreas Kling
410fa5abe0 LibWeb: Parse barebones document without doctype, <html>, etc.
Last night I tried making a little test page that had a bunch of <img>
elements and nothing else. It didn't work.

Fix this by correctly adding a synthesized <html> element to the
document if we get something else in the "before html" insertion mode.
2020-06-02 08:50:33 +02:00
Andreas Kling
9170edf541 LibWeb: Protect ourselves during ResourceClient iteration
Notifying a Resource's clients may lead to arbitrary JS execution,
so we can't rely on the ResourceClient pointers remaining valid.
Use WeakPtr to avoid this problem.
2020-06-01 22:09:38 +02:00
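A sketch of the same protection using `std::weak_ptr` in place of the WeakPtr this commit uses. The `Resource`, `ResourceClient`, and `resource_did_load()` shapes are simplified assumptions, not LibWeb's declarations. Each client is locked before its callback runs, so a client destroyed by an earlier callback is skipped instead of dereferenced, and the list is copied so callbacks that register new clients cannot invalidate the loop.

```cpp
#include <memory>
#include <vector>

// Hypothetical observer interface.
class ResourceClient {
public:
    virtual ~ResourceClient() = default;
    virtual void resource_did_load() = 0;
};

class Resource {
public:
    void register_client(const std::shared_ptr<ResourceClient>& client)
    {
        m_clients.push_back(client);
    }

    void notify_loaded()
    {
        // Iterate over a copy: callbacks may run arbitrary code.
        auto clients = m_clients;
        for (auto& weak_client : clients) {
            // lock() yields null if the client was destroyed meanwhile,
            // and keeps a live client alive for the duration of its callback.
            if (auto client = weak_client.lock())
                client->resource_did_load();
        }
    }

private:
    std::vector<std::weak_ptr<ResourceClient>> m_clients;
};
```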
Andreas Kling
e5ddb76a67 LibWeb: Support "td" and "th" start tags during "in table body"
This makes it possible to load Google Image Search results. You can't
see the images yet, but it's still something. :^)
2020-06-01 22:09:09 +02:00
Andreas Kling
7af337764e LibWeb: Add a naive Resource cache
This patch introduces a caching mechanism in ResourceLoader. It's keyed
on a LoadRequest object which is what you provide to load_resource()
when you want to load a resource.

We currently never prune the cache, so resources will stay in there
forever. This is obviously not gonna stay that way, but we're just
getting started here. :^)

This should drastically reduce the number of requests when loading
some sites (like Twitter) that reuse the same images over and over.
2020-06-01 21:58:29 +02:00
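A naive cache of this kind amounts to a map from request to shared resource. In this sketch, `LoadRequest` is reduced to a URL, `std::shared_ptr` models sharing between clients, and the `requests_started()` counter exists only to make the effect visible; all of these are assumptions for illustration, not LibWeb's code. As in the commit, nothing is ever evicted.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified request key; a real one might also
// include the method or headers.
struct LoadRequest {
    std::string url;

    bool operator<(const LoadRequest& other) const { return url < other.url; }
};

struct Resource {
    explicit Resource(LoadRequest load_request)
        : request(std::move(load_request))
    {
    }
    LoadRequest request;
};

class ResourceLoader {
public:
    // Returns the cached Resource for an identical request, or creates one.
    // Entries are never pruned.
    std::shared_ptr<Resource> load_resource(const LoadRequest& request)
    {
        auto it = m_cache.find(request);
        if (it != m_cache.end())
            return it->second;
        auto resource = std::make_shared<Resource>(request);
        m_cache.emplace(request, resource);
        ++m_requests_started; // A real loader would start the network request here.
        return resource;
    }

    int requests_started() const { return m_requests_started; }

private:
    std::map<LoadRequest, std::shared_ptr<Resource>> m_cache;
    int m_requests_started = 0;
};
```

Two <img> elements with the same src would then receive the same Resource, and only the first triggers a load.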
Andreas Kling
5ed66cb8d9 LibWeb: Start building a new Resource class to share more resources
A Resource represents a resource that we're loading, have loaded or
will soon load. Basically, it's a downloadable resource that can be
shared by multiple clients.

A typical use case is multiple <img> elements with the same src.
In a future patch, we will try to make sure that those <img> elements
get the same Resource if possible. This will reduce network usage,
memory usage, and CPU usage. :^)

For now, this first patch simply introduces the mechanism.

You get a Resource by calling ResourceLoader::load_resource().
To get notified about changes to a Resource's load status, you inherit
from ResourceClient and implement the callbacks you're interested in.

This patch turns HTMLImageElement into a ResourceClient.
2020-06-01 21:36:43 +02:00
Andreas Kling
6ed11f1d1c LibWeb: Move ResourceLoader into a new Loader/ directory
2020-06-01 20:42:50 +02:00
Andreas Kling
77a3710e9d LibWeb: Tokenize "anything else" in CommentLessThanSignBangDashDash
2020-06-01 20:14:23 +02:00
Andreas Kling
e58e315e0f LibWeb: Make input widgets (buttons, text boxes, etc.) scroll with page
We now relayout all LayoutWidgets when the view is scrolled. This will
cause them to follow along with the rest of the page content.
2020-06-01 19:52:38 +02:00
Andreas Kling
8766e49a7c LibWeb+Browser: Use the new HTML parser by default
You can still run the old parser with "br -O", but the new one is good
enough to be the default parser now. We'll fix issues as we go and
eventually remove the old one completely. :^)
2020-06-01 19:08:31 +02:00
Andreas Kling
db93db8100 LibWeb: Put whining about tokenizer errors behind an #ifdef
Real web content has *tons* of tokenizer errors and we don't need to
complain every time as that makes the debug log unbearable.
2020-06-01 18:46:11 +02:00
Andreas Kling
5944abf31c LibWeb: More parser cases in the "in body" and "after after body" modes
2020-06-01 18:46:11 +02:00
Andreas Kling
a775c2c717 LibWeb: Handle more cases in the SelfClosingStartTag tokenizer state
2020-06-01 18:46:11 +02:00
Andreas Kling
8429551368 LibWeb: Implement more of the "after head" insertion mode
2020-06-01 18:46:11 +02:00
Andreas Kling
f3b09ddd8e LibWeb: Implement more of the ScriptDataEndTagName tokenizer state
Some of this is extremely repetitive. We'll need to rethink how we
do queue/emit to improve this.
2020-05-30 23:00:35 +02:00
Andreas Kling
d058addd74 LibWeb: Handle "dd" and "dt" end tags during "in body"
2020-05-30 23:00:35 +02:00
Andreas Kling
ca6fbefbc9 LibWeb: Support parsing "select" elements (outside of tables)
2020-05-30 19:58:52 +02:00
Andreas Kling
60352c7b9b LibWeb: Hack the parser to dodge <template> elements in <head> for now
2020-05-30 19:23:04 +02:00
Andreas Kling
1212485348 LibWeb: Fix typo in StackOfOpenElements::topmost_special_node_below()
Backwards iteration works better if we actually go backwards! :^)
2020-05-30 18:49:48 +02:00
Andreas Kling
ca23db10ef LibWeb: Don't crash when encountering <svg> or <math> elements
Just treat them like unknown elements for now. :^)
2020-05-30 18:46:39 +02:00
Andreas Kling
756829555a LibWeb: Parse "textarea" tags during the "in body" insertion mode
Had to handle some more cases in the tokenizer to support this.
2020-05-30 18:40:23 +02:00
Andreas Kling
f4778d1ba0 LibWeb: Add missing special tag case in the "in body" insertion mode
2020-05-30 18:26:44 +02:00
Andreas Kling
e5ec05bd3a LibWeb: Correctly determine whether a block has only inline children
There's more to life than inline-vs-block, so we have to take all the
non-block non-inline display types into account when computing whether
a block should say children_are_inline() == true.
2020-05-30 18:26:44 +02:00
Andreas Kling
5818ef2c80 LibWeb: Implement more table-related insertion modes
2020-05-30 18:26:44 +02:00
Andreas Kling
8c96b8174b LibWeb: Handle AAA situation where there's no formatting element found
In this case, we're supposed to return from the adoption agency algorithm (AAA) and then jump to a
different behavior in the "in body" insertion mode. So now we do that.
2020-05-30 17:47:50 +02:00
Andreas Kling
c9dd459822 LibWeb: Implement some more RAWTEXT stuff in the tokenizer
2020-05-30 17:47:50 +02:00
TheDumpap
d92c9d3772 LibWeb: Implement more of the tokenizer states
Slowly implementing more of the previously unimplemented tokenizer state cases.
2020-05-30 17:47:50 +02:00
Andreas Kling
f662b1ea37 LibWeb: Implement enough parsing to parse the HTML spec front page :^)
We can now actually open http://html.spec.whatwg.org/ in Browser.
2020-05-30 13:07:47 +02:00