When loading Infiltrating the Airship in Ruffle, the copying of these
hash maps/tables were at least 10% of the runtime. This disappears when
returning them by const reference.
CSSUnitValue is a typed-om type which we will implement separately in
the future. However, it still seems useful to give our dimension values
a base class. (Maybe they could be templated in the future?) So instead
of deleting it entirely, rename it to DimensionStyleValue and make its
API match our style better.
This reverts 0e3487b9ab.
Back when I made that change, I thought we could make our StyleValue
classes match the typed-om definitions directly. However, they have
different requirements. Typed-om types need to be mutable and GCed,
whereas StyleValues are immutable and ideally wouldn't require a JS VM.
While I was already making such a cataclysmic change, I've moved it into
the StyleValues directory, because it *not* being there has bothered me
for a long time. 😅
Largely combinations of i32.const and local.get.
This shaves off at most single-digit% number of instructions from
dispatch, which translates to at most ~10% reduced dispatch time.
Across most benchmarks, this gains around ~5% perf increase.
This commit adds a register allocator, with 8 available "register"
slots.
In testing with various random blobs, this moves anywhere from 30% to
74% of value accesses into predefined slots, and is about a ~20% perf
increase end-to-end.
To actually make this usable, a few structural changes were also made:
- we no longer do one instruction per interpret call
- trapping is an (unlikely) exit condition
- the label and frame stacks are replaced with linked lists with a huge
node cache size, as we only need to touch the last element and
push/pop is very frequent.
This is largely unused (only in wasm.cpp)
A future reimplementation can bring it back as a separate interpreter
class that embeds the current bytecode interpreter.
...for the first byte.
This function only really needs to read a single byte at that point, so
read_until_filled() is useless and read_value<u8> is functionally
equivalent to just a read.
This showed up hot in a wasm parse benchmark.
The average wasm function rarely goes over these bounds for the labels
(32 nested control structures), and 8 frames is just enough to clear
most initialization code/start section without allocating anything.
These are quite bottlenecky in wasm, the next commit will try to make
use of this by calling them directly instead of doing a vcall, and
having them inlineable helps the compiler a bit.
If the address is already aligned properly, just read a T from it;
otherwise copy it to a local aligned array. This was a bottleneck on
memory-heavy benchmarks.