We now fuse sequences like [LessThan, JumpIf] to JumpLessThan.
This is only allowed for temporaries (i.e VM registers) with no other
references to them.
This patch adds a register freelist to Bytecode::Generator and switches
all operands inside the generator to a new ScopedOperand type that is
ref-counted and automatically frees the register when nothing uses it.
This dramatically reduces the size of bytecode executable register
windows, which were often in the several thousands of registers for
large functions. Most functions now use less than 100 registers.
Once executed, this instruction will always produce the same result
in subsequent executions, so it's okay to cache it.
Unfortunately it may throw, so we can't just hoist it to the top of
every executable, since that would break observable execution order.
Instead of storing a BasicBlock* and forcing the size of Label to be
sizeof(BasicBlock*), we now store the basic block index as a u32.
This means the final version of the bytecode is able to keep labels
at sizeof(u32), shrinking the size of many instructions. :^)
Instead of storing source offsets with each instruction, we now keep
them in a side table in Executable.
This shrinks each instruction by 8 bytes, further improving locality.
Instead of keeping bytecode as a set of disjoint basic blocks on the
malloc heap, bytecode is now a contiguous sequence of bytes(!)
The transformation happens at the end of Bytecode::Generator::generate()
and the only really hairy part is rerouting jump labels.
This required solving a few problems:
- The interpreter execution loop had to change quite a bit, since we
were storing BasicBlock pointers all over the place, and control
transfer was done by redirecting the interpreter's current block.
- Exception handlers & finalizers are now stored per-bytecode-range
in a side table in Executable.
- The interpreter now has a plain program counter instead of a stream
iterator. This actually makes error stack generation a bit nicer
since we just have to deal with a number instead of reaching into
the iterator.
This yields a 25% performance improvement on this microbenchmark:
for (let i = 0; i < 1_000_000; ++i) { }
But basically everything gets faster. :^)
This patch adds a new "Peephole" pass for performing small, local
optimizations to bytecode.
We also introduce the first such optimization, fusing a sequence of
some comparison instruction FooCompare followed by a JumpIf into a
new set of JumpFooCompare instructions.
This gives a ~50% speed-up on the following microbenchmark:
for (let i = 0; i < 10_000_000; ++i) {
}
But more traditional benchmarks see a pretty sizable speed-up as well,
for example 15% on Kraken/ai-astar.js and 16% on Kraken/audio-dft.js :^)
These are then restored upon `ContinuePendingUnwind`.
This stops us from forgetting where we needed to jump when we do extra
try-catches in finally blocks.
Co-Authored-By: Jesús "gsus" Lapastora <cyber.gsuscode@gmail.com>
This is currently only used in the bytecode dump to annotate to where
unwinds lead per block, but will be hooked up to the virtual machine in
the next commit.
This reduces the minimum size of a basic block from 4 KiB to 0 bytes.
With this change, memory usage at the end of Speedometer is 1.2 GiB,
down from 1.8 GiB.
If the exception from the `try` block has already been caught by
`catch`, we need to clear the saved exception before entering `finally`
so that ContinuePendingUnwind will not re-throw it.
9 new passes on test262 :^)
The var environments will unwind as needed with the ExecutionContext
and there's no need to include it in the unwind info.
We still need to do this for lexical environments though, since they
can have short local lifetimes inside a function.
Since the relationship between VM and Bytecode::Interpreter is now
clear, we can have VM ask the Interpreter for roots in the GC marking
pass. This avoids having to register and unregister handles and
MarkedVectors over and over.
Since GeneratorObject can also own a RegisterWindow, we share the code
in a RegisterWindow::visit_edges() helper.
~4% speed-up on Kraken/stanford-crypto-ccm.js :^)
Unwind contexts now remember the lexical and variable environments in
effect when they were created. If an exception is caught, we revert
to those environments in the running execution context.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
To support situations like this:
function foo() { throw 1; }
try {
foo();
} catch (e) {
}
Each unwind context now keeps track of its origin executable.
When an exception is thrown, we return from run() immediately if the
nearest unwind context isn't in the current executable.
This causes a natural unwind to the point where we find the
catch/finally block(s) to jump into.
EnterUnwindContext pushes an unwind context (exception handler and/or
finalizer) onto a stack.
LeaveUnwindContext pops the unwind context from that stack.
Upon return to the interpreter loop we check whether the VM has an
exception pending. If no unwind context is available we return from the
loop. If an exception handler is available we clear the VM's exception,
put the exception value into the accumulator register, clear the unwind
context's handler and jump to the handler. If no handler is available
but a finalizer is available we save the exception value + metadata (for
later use by ContinuePendingUnwind), clear the VM's exception, pop the
unwind context and jump to the finalizer.
ContinuePendingUnwind checks whether a saved exception is available. If
no saved exception is available it jumps to the resume label. Otherwise
it stores the exception into the VM.
The Jump after LeaveUnwindContext could be integrated into the
LeaveUnwindContext instruction. I've kept them separate for now to make
the bytecode more readable.
> try { 1; throw "x" } catch (e) { 2 } finally { 3 }; 4
1:
[ 0] EnterScope
[ 10] EnterUnwindContext handler:@4 finalizer:@3
[ 38] EnterScope
[ 48] LoadImmediate 1
[ 60] NewString 1 ("x")
[ 70] Throw
<for non-terminated blocks: insert LeaveUnwindContext + Jump @3 here>
2:
[ 0] LoadImmediate 4
3:
[ 0] EnterScope
[ 10] LoadImmediate 3
[ 28] ContinuePendingUnwind resume:@2
4:
[ 0] SetVariable 0 (e)
[ 10] EnterScope
[ 20] LoadImmediate 2
[ 38] LeaveUnwindContext
[ 3c] Jump @3
String Table:
0: e
1: x
Instead of using Strings in the bytecode ops this adds a global string
table to the Executable struct which individual operations can refer
to using indices. This brings bytecode ops one step closer to being
pointer free.
This limits the size of each block (currently set to 1K), and gets us
closer to a canonical, more easily analysable bytecode format.
As a result of this, "Labels" are now simply entries to basic blocks.
Since there is no more 'conditional' jump (as all jumps are always
taken), JumpIf{True,False} are unified to JumpConditional, and
JumpIfNullish is renamed to JumpNullish.
Also fixes#7914 as a result of reimplementing the loop logic.
2021-06-09 09:07:29 +02:00
Renamed from Userland/Libraries/LibJS/Bytecode/Block.h (Browse further)