Previously, the Object class had many different types of functions for
each action. For example: get_by_index, get(PropertyName),
get(FlyString). This is a bit verbose, so these methods have been
shortened to simply use the PropertyName structure. The methods then
internally call _by_index if necessary. Note that the _by_index
have been made private to enforce this change.
Secondly, a clear distinction has been made between "putting" and
"defining" an object property. "Putting" should mean modifying a
(potentially) already existing property. This is akin to doing "a.b =
'foo'".
This implies two things about put operations:
- They will search the prototype chain for setters and call them, if
necessary.
- If no property exists with a particular key, the put operation
should create a new property with the default attributes
(configurable, writable, and enumerable).
In contrast, "defining" a property should completely overwrite any
existing value without calling setters (if that property is
configurable, of course).
Thus, all of the many JS objects have had any "put" calls changed to
"define_property" calls. Additionally, "put_native_function" and
"put_native_property" have had their "put" replaced with "define".
Finally, "put_own_property" has been made private, as all necessary
functionality should be exposed with the put and define_property
methods.
Instead of creating extremely common FlyStrings like "id" and "class"
on demand every time they are needed, we now have AttributeNames.h,
which provides Web::HTML::AttributeNames::{id,class_}
This avoids a bunch of string allocations during selector matching.
Instead of string splitting every time you call Element::has_class(),
we now split the "class" attribute value when it changes, and cache
the individual classes as FlyStrings in Element::m_classes.
This makes has_class() significantly faster and moves the pain point
of selector matching somewhere else.
Sometimes people put a '}' where it doesn't belong, or various other
things go wrong. 99% of the time, it's our fault, but either way,
this patch makes us not crash or infinite-loop in some common cases.
The real solution here is to write a proper CSS lexer-parser according
to the language spec, this is just a hack fix to make more sites load
at all.
We now implement the somewhat fuzzy shrink-to-fit algorithm when laying
out inline-block elements with both block and inline children.
Shrink-to-fit works by doing two speculative layouts of the entire
subtree inside the current block, to compute two things:
1. Preferred minimum width: If we made a line break at every chance we
had, how wide would the widest line be?
2. Preferred width: We break only when explicitly told to (e.g "<br>")
How wide would the widest line be?
We then shrink the width of the inline-block element to an appropriate
value based on the above, taking the available width in the containing
block into consideration (sans all the box model fluff.)
To make the speculative layouts possible, plumb a LayoutMode enum
throughout the layout system since it needs to be respected in various
places.
Note that this is quite hackish and I'm sure there are smarter ways to
do a lot of this. But it does kinda work! :^)
In step 4 of the "renstruct the active formatting elements" algorithm it
says:
Rewind: If there are no entries before entry in the list of active
formatting elements, then jump to the step labeled create.
Prior to this patch, the implementation accorded to the spec only for
the first loop iteration.
This was causing very tall lines on many websites. We can now see the
section header thingy on google.com (although it's broken into lines
where it should not be..) :^)
And move canonicalized_path() to a static method on LexicalPath.
This is to make it clear that FileSystemPath/canonicalized_path() only
perform *lexical* canonicalization.
Unless otherwise stated, we shouldn't stop parsing just because there's
a parse error, so let's allow ourselves to continue.
With this change, we can now tokenize and parse the ACID1 test. :^)
Now that we've gotten rid of the misguided character buffering in the
tokenizer, it actually spits out character tokens that we have to deal
with in the parser.
This patch implements enough to bring us back to speed with simple.html
When watching the video of the new HTML parser I noticed a small copy
and paste error. In one of the cases in `handle_after_head` the code
was checking for end tags when it should check for start tags.
I haven't tested this change, just looking at the spec.
Add reasonable support for all values of white-space CSS property.
Values of the property are translated into a 3-tuple of rules:
do_collapse: whether whitespace is to be collapsed
do_wrap_lines: whether to wrap on word boundaries when
lines get too long
do_wrap_breaks: whether to wrap on linebreaks
The previously separate handling of per-line splitting and per-word
splitting have been unified. The Word structure is now a more
general Chunk, which represents different amounts of text depending
on splitting rules.
The "audible bell" character ('\a' U+0007) was treated as whitespace
while the "line feed" character ('\n' U+000a) was not.
'\a' is no longer considered whitespace.
'\n' is now considered whitespace.
We can now parse a little DOM like this:
<!DOCTYPE html>
<html>
<head></head>
<body>
<div></div>
</body>
</html>
This is pretty slow work, but the incremental progress is satisfying!
This patch adds a new HTMLDocumentParser class. It keeps a tokenizer
object internally and feeds itself with one token at a time from it.
The names and idioms in this class are expressed as closely to the
actual HTML parsing spec as possible, to make development as easy
and bug free as possible. :^)
This is going to become pretty large, but it's pretty cool!