LibWeb: Use an ancestor filter to quickly reject many CSS selectors

Given a selector like `.foo .bar #baz`, we know that elements with
the class names `foo` and `bar` must be present in the ancestor chain of
the candidate element, or the selector cannot match.
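
A minimal sketch of how the required ancestor names could be derived from a selector, assuming a simplified model in which each compound selector records the combinator to its left. The types `Combinator` and `Compound` and the function `required_ancestor_names` are hypothetical stand-ins for illustration, not LibWeb's API, and the sketch collects plain strings rather than hashes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a parsed selector (not LibWeb's types).
enum class Combinator { None, Descendant, ImmediateChild, NextSibling };

struct Compound {
    Combinator combinator;          // relation to the compound on its left
    std::vector<std::string> names; // class/id/tag/attribute names it mentions
};

// Walk right to left, skipping the rightmost compound (it describes the
// candidate element itself), and collect names that must appear on ancestors.
std::vector<std::string> required_ancestor_names(std::vector<Compound> const& selector)
{
    assert(!selector.empty()); // a selector has at least one compound
    std::vector<std::string> required;
    auto last_combinator = selector.back().combinator;
    for (size_t i = selector.size() - 1; i-- > 0;) {
        // Only a descendant step guarantees that this compound is an ancestor.
        if (last_combinator == Combinator::Descendant) {
            for (auto const& name : selector[i].names)
                required.push_back(name);
        }
        last_combinator = selector[i].combinator;
    }
    return required;
}
```

For `.foo .bar #baz` this yields `bar` and `foo`. For `.foo > .bar #baz` it yields only `bar`, because the sketch (like the diff further down) only follows descendant combinators and conservatively skips everything else.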

By keeping track of the current ancestor chain during style computation,
and which strings are used in its tag names, IDs, class names and
attribute names, we can do a quick check before evaluating the selector
itself, to see if all the required ancestors are present.

The way this works:

1. CSS::Selector now has a cache of up to 8 strings that must be present
   in the ancestor chain of a matching element. Note that we actually
   store string *hashes*, not the strings themselves.

2. When Document performs a recursive style update, we now push elements
   onto the ancestor chain stack as they are entered, and pop them as
   they are exited.

3. When entering/exiting an ancestor, StyleComputer collects all the
   relevant string hashes from that ancestor element and updates a
   counting bloom filter.

4. Before evaluating a selector, we first check if any of the hashes
   required by the selector are definitely missing from the ancestor
   filter. If so, it cannot be a match, and we reject it immediately.

5. Otherwise, we carry on and evaluate the selector as usual.
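
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: `AncestorFilter`, `Element`, `hash_of`, `push_ancestor`, `pop_ancestor` and `can_reject` are hypothetical names, the filter size is arbitrary, and `std::hash` stands in for whatever string hash the engine uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical counting filter: one small counter per hash bucket.
class AncestorFilter {
public:
    void increment(uint32_t hash)
    {
        auto& counter = m_counters[hash % m_counters.size()];
        assert(counter < UINT8_MAX); // this sketch does not handle saturation
        ++counter;
    }
    void decrement(uint32_t hash)
    {
        auto& counter = m_counters[hash % m_counters.size()];
        assert(counter > 0); // every pop must pair with an earlier push
        --counter;
    }
    bool may_contain(uint32_t hash) const { return m_counters[hash % m_counters.size()] != 0; }

private:
    std::array<uint8_t, 8192> m_counters {}; // size chosen for illustration
};

// Hypothetical element model: just the strings a selector could ask about.
struct Element {
    std::string tag;
    std::string id;
    std::vector<std::string> classes;
};

uint32_t hash_of(std::string const& string)
{
    return static_cast<uint32_t>(std::hash<std::string> {}(string));
}

template<typename Callback>
void for_each_hash(Element const& element, Callback callback)
{
    callback(hash_of(element.tag));
    if (!element.id.empty())
        callback(hash_of(element.id));
    for (auto const& class_name : element.classes)
        callback(hash_of(class_name));
}

// Steps 2 and 3: update the filter when entering and exiting an ancestor.
void push_ancestor(AncestorFilter& filter, Element const& element)
{
    for_each_hash(element, [&](uint32_t hash) { filter.increment(hash); });
}

void pop_ancestor(AncestorFilter& filter, Element const& element)
{
    for_each_hash(element, [&](uint32_t hash) { filter.decrement(hash); });
}

// Step 4: reject if any required hash is definitely absent.
// Step 5 (full selector evaluation) runs only when this returns false.
bool can_reject(AncestorFilter const& filter, std::vector<uint32_t> const& required_hashes)
{
    for (auto hash : required_hashes) {
        if (!filter.may_contain(hash))
            return true;
    }
    return false;
}
```

A caller would invoke `push_ancestor` before recursing into an element's children and `pop_ancestor` afterwards, so the filter always reflects exactly the current ancestor chain.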

I originally tried doing this with a HashMap, but we ended up losing
a huge chunk of the time saved to HashMap operations instead. As it
turns out, a simple counting bloom filter is way better at handling this.
The cost is a flat 8KB per StyleComputer, and since it's a bloom filter,
false positives are possible: a selector may pass the filter and still
fail full evaluation, but a selector that could match is never rejected.
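
To make the false-positive behavior concrete, here is a small sketch. The commit message only states the 8KB total, so the layout below (8192 one-byte counters, one counter per hash) is an assumption chosen to match that figure, and `CountingFilter` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: 8192 one-byte counters, indexed by hash modulo size.
struct CountingFilter {
    static constexpr size_t counter_count = 8192;
    static constexpr size_t footprint_bytes = counter_count * sizeof(uint8_t);

    std::array<uint8_t, counter_count> counters {};

    static size_t index_for(uint32_t hash) { return hash % counter_count; }

    void add(uint32_t hash)
    {
        assert(counters[index_for(hash)] < UINT8_MAX); // no saturation handling here
        ++counters[index_for(hash)];
    }
    // Counters (rather than single bits) are what make removal possible.
    void remove(uint32_t hash)
    {
        assert(counters[index_for(hash)] > 0);
        --counters[index_for(hash)];
    }
    bool may_contain(uint32_t hash) const { return counters[index_for(hash)] != 0; }
};

// The cost is fixed up front and does not grow with the depth of the tree.
static_assert(CountingFilter::footprint_bytes == 8 * 1024, "flat 8KB under the assumed layout");
```

In this layout the hashes `1` and `1 + 8192` share a counter, so after `add(1)` the filter also answers "maybe" for `1 + 8192`. That only costs a full selector evaluation. A "definitely not" answer is always correct, which is what makes the early rejection safe.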

This is extremely efficient, and allows us to quickly reject the
majority of selectors on many huge websites.

Some example rejection rates:
- https://amazon.com: 77%
- https://github.com/SerenityOS/serenity: 61%
- https://nytimes.com: 57%
- https://store.steampowered.com: 55%
- https://en.wikipedia.org: 45%
- https://youtube.com: 32%
- https://shopify.com: 25%

This also yields a chunky 37% speedup on StyleBench. :^)
Andreas Kling 2024-03-22 13:50:33 +01:00
commit afe6abfc09
Notes: sideshowbarker 2024-07-16 23:05:02 +09:00
6 changed files with 181 additions and 5 deletions

@@ -23,6 +23,53 @@ Selector::Selector(Vector<CompoundSelector>&& compound_selectors)
            }
        }
    }
    collect_ancestor_hashes();
}

void Selector::collect_ancestor_hashes()
{
    size_t next_hash_index = 0;
    auto append_unique_hash = [&](u32 hash) -> bool {
        if (next_hash_index >= m_ancestor_hashes.size())
            return true;
        for (size_t i = 0; i < next_hash_index; ++i) {
            if (m_ancestor_hashes[i] == hash)
                return false;
        }
        m_ancestor_hashes[next_hash_index++] = hash;
        return false;
    };

    auto last_combinator = m_compound_selectors.last().combinator;
    for (ssize_t compound_selector_index = static_cast<ssize_t>(m_compound_selectors.size()) - 2; compound_selector_index >= 0; --compound_selector_index) {
        auto const& compound_selector = m_compound_selectors[compound_selector_index];
        if (last_combinator == Combinator::Descendant) {
            for (auto const& simple_selector : compound_selector.simple_selectors) {
                switch (simple_selector.type) {
                case SimpleSelector::Type::Id:
                case SimpleSelector::Type::Class:
                    if (append_unique_hash(simple_selector.name().hash()))
                        return;
                    break;
                case SimpleSelector::Type::TagName:
                    if (append_unique_hash(simple_selector.qualified_name().name.name.hash()))
                        return;
                    break;
                case SimpleSelector::Type::Attribute:
                    if (append_unique_hash(simple_selector.attribute().qualified_name.name.name.hash()))
                        return;
                    break;
                default:
                    break;
                }
            }
        }
        last_combinator = compound_selector.combinator;
    }

    for (size_t i = next_hash_index; i < m_ancestor_hashes.size(); ++i)
        m_ancestor_hashes[i] = 0;
}

// https://www.w3.org/TR/selectors-4/#specificity-rules