Instead of SetVariable having 2x2 modes for variable/lexical and
initialize/set, those 4 modes are now separate instructions, which
makes each instruction much less branchy.
The last completion value in a function is not exposed to the language,
since functions always either return something, or undefined.
Given this, we can avoid emitting code that propagates the completion
value from various statements, as long as we know we're generating code
for a context where the completion value is not accessible. In practical
terms, this means that function code gets to do less completion
shuffling, while global and eval code has to keep doing it.
These were out-of-line because we had some ideas about marking
instruction streams PROT_READ only, but that seems pretty arbitrary and
there's a lot of performance to be gained by putting these inline.
Instead, generate bytecode to execute their AST nodes and save the
resulting operands inside the NewClass instruction.
Moving property expression evaluation to happen before NewClass
execution also moves along creation of new private environment and
its population with private members (private members should be visible
during property evaluation).
Before:
- NewClass
After:
- CreatePrivateEnvironment
- AddPrivateName
- ...
- AddPrivateName
- NewClass
- LeavePrivateEnvironment
This patch stops emitting the BlockDeclarationInstantiation instruction
when there are no locals, and no function declarations in the scope.
We were spending 20% of CPU time on https://ventrella.com/Clusters/ just
creating empty environments for no reason.
If the callee is already a temporary register, we don't need to copy it
to *another* temporary before evaluating arguments. None of the
arguments will clobber the existing temporary anyway.
We only need to make copies of locals here, in case the locals are
modified by something like increment/decrement expressions.
Registers and constants can slip right through, without being Mov'ed
into a temporary first.
We know that `undefined` in the global scope is always the proper
undefined value. This commit takes advantage of that by simply emitting
a constant undefined value instead.
Unfortunately we can't be so sure in other scopes.
This helps some of the Cloudflare Turnstile stuff run faster, since they
are deliberately screwing with JS engines by asking us to do a bunch of
bitwise operations on e.g 65535.56
By rounding such values in bytecode generation, the interpreter can stay
on the happy path while executing, and finish quite a bit faster.
We now fuse sequences like [LessThan, JumpIf] to JumpLessThan.
This is only allowed for temporaries (i.e VM registers) with no other
references to them.
Before this change, switch codegen would interleave bytecode like this:
(test for case 1)
(code for case 1)
(test for case 2)
(code for case 2)
This meant that we often had to make many large jumps while looking for
the matching case, since code for each case can be huge.
It now looks like this instead:
(test for case 1)
(test for case 2)
(code for case 1)
(code for case 2)
This way, we can just fall through the tests until we hit one that fits,
without having to make any large jumps.
This removes a layer of indirection in the bytecode where we had to make
sure all the initializer elements were laid out in sequential registers.
Array expressions no longer clobber registers permanently, and they can
be reused immediately afterwards.
This patch adds a register freelist to Bytecode::Generator and switches
all operands inside the generator to a new ScopedOperand type that is
ref-counted and automatically frees the register when nothing uses it.
This dramatically reduces the size of bytecode executable register
windows, which were often in the several thousands of registers for
large functions. Most functions now use less than 100 registers.
Once executed, this instruction will always produce the same result
in subsequent executions, so it's okay to cache it.
Unfortunately it may throw, so we can't just hoist it to the top of
every executable, since that would break observable execution order.
This does two things:
* Clear exceptions when transferring control out of a finalizer
Otherwise they would resurface at the end of the next finalizer
(see test the new test case), or at the end of a function
* Pop one scheduled jump when transferring control out of a finalizer
This removes one old FIXME
For this case to work correctly in the current bytecode world:
func(a, a++)
We have to put the function arguments in temporaries instead of allowing
the postfix increment to modify `a` in place.
This fixes a problem where jQuery.each() would skip over items.
When a PutById / PutByValue bytecode operation results in accessing a
nullish object, we now include the name of the property and the object
being accessed in the exception message (if available). This should make
it easier to debug live websites.
For example, the following errors would all previously produce a generic
error message of "ToObject on null or undefined":
> foo = null
> foo.bar = 1
Uncaught exception:
[TypeError] Cannot access property "bar" on null object "foo"
at <unknown>
> foo = { bar: undefined }
> foo.bar.baz = 1
Uncaught exception:
[TypeError] Cannot access property "baz" on undefined object "foo.bar"
at <unknown>
Note we certainly don't capture all possible nullish property write
accesses here. This just covers cases I've seen most on live websites;
we can cover more cases as they arise.
When a GetById / GetByValue bytecode operation results in accessing a
nullish object, we now include the name of the property and the object
being accessed in the exception message (if available). This should make
it easier to debug live websites.
For example, the following errors would all previously produce a generic
error message of "ToObject on null or undefined":
> foo = null
> foo.bar
Uncaught exception:
[TypeError] Cannot access property "bar" on null object "foo"
at <unknown>
> foo = { bar: undefined }
> foo.bar.baz
Uncaught exception:
[TypeError] Cannot access property "baz" on undefined object "foo.bar"
at <unknown>
Note we certainly don't capture all possible nullish property read
accesses here. This just covers cases I've seen most on live websites;
we can cover more cases as they arise.
Instead of emitting a NewBigInt instruction to construct a primitive
bigint from a parsed literal, we now instantiate the BigInt on the heap
during codegen.
Instead of emitting a NewString instruction to construct a primitive
string from a parsed literal, we now instantiate the PrimitiveString on
the heap during codegen.
`var` declarations can have duplicates, but duplicate `let` or `const`
bindings are a syntax error.
Because of this, we can sink `let` and `const` directly into the
preferred_dst if available. This is not safe for `var` since the
preferred_dst may be used in the initializer.
This patch fixes the issue by simply skipping the preferred_dst
optimization for `var` declarations.
When compiling code like this:
x = { foo: x }
We don't want to put a new JS::Object in `x` until *after* we've
evaluated `x` for the `foo` field.
This fixes an issue when loading https://puter.com/ :^)
Instead of having Call refer to a range of VM registers, it now has
a trailing list of argument operands as part of the instruction.
This means we no longer have to shuffle every argument value into
a register before making a call, making bytecode smaller & faster. :^)
Instead of splitting the postfix variants into ToNumeric + Inc/Dec,
we now have dedicated PostfixIncrement and PostfixDecrement instructions
that handle both outputs in one go.
This patch moves us away from the accumulator-based bytecode format to
one with explicit source and destination registers.
The new format has multiple benefits:
- ~25% faster on the Kraken and Octane benchmarks :^)
- Fewer instructions to accomplish the same thing
- Much easier for humans to read(!)
Because this change requires a fundamental shift in how bytecode is
generated, it is quite comprehensive.
Main implementation mechanism: generate_bytecode() virtual function now
takes an optional "preferred dst" operand, which allows callers to
communicate when they have an operand that would be optimal for the
result to go into. It also returns an optional "actual dst" operand,
which is where the completion value (if any) of the AST node is stored
after the node has "executed".
One thing of note that's new: because instructions can now take locals
as operands, this means we got rid of the GetLocal instruction.
A side-effect of that is we have to think about the temporal deadzone
(TDZ) a bit differently for locals (GetLocal would previously check
for empty values and interpret that as a TDZ access and throw).
We now insert special ThrowIfTDZ instructions in places where a local
access may be in the TDZ, to maintain the correct behavior.
There are a number of progressions and regressions from this test:
A number of async generator tests have been accidentally fixed while
converting the implementation to the new bytecode format. It didn't
seem useful to preserve bugs in the original code when converting it.
Some "does eval() return the correct completion value" tests have
regressed, in particular ones related to propagating the appropriate
completion after control flow statements like continue and break.
These are all fairly obscure issues, and I believe we can continue
working on them separately.
The net test262 result is a progression though. :^)
This is pure prep work for refactoring the bytecode to use more operands
instead of only registers.
generate_bytecode() virtuals now return an Optional<Operand>, and the
idea is to return an Operand referring to the value produced by this
AST node.
They also take an Optional<Operand> "preferred_dst" input. This is
intended to communicate the caller's preference for an output operand,
if any. This will be used to elide temporaries when we can store the
result directly in a local, for example.
Previously, constructing a `UnsignedBigInteger::from_base()` could
produce an incorrect result if the input string contained a valid
Base36 digit that was out of range of the given base. The same method
would also crash if the input string contained an invalid Base36 digit.
An error is now returned in both these cases.
Constructing a BigFraction from string is now also fallible, so that we
can handle the case where we are given an input string with invalid
digits.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
When iterating over an iterable, we get back a JS object with the fields
"value" and "done".
Before this change, we've had two dedicated instructions for retrieving
the two fields: IteratorResultValue and IteratorResultDone. These had no
fast path whatsoever and just did a generic [[Get]] access to fetch the
corresponding property values.
By replacing the instructions with GetById("value") and GetById("done"),
they instantly get caching and JIT fast paths for free, making iterating
over iterables much faster. :^)
26% speed-up on this microbenchmark:
function go(a) {
for (const p of a) {
}
}
const a = [];
a.length = 1_000_000;
go(a);
This patch makes IteratorRecord an Object. Although it's not exposed to
author code, this does allow us to store it in a VM register.
Now that we can store it in a VM register, we don't need to convert it
back and forth between IteratorRecord and Object when accessing it from
bytecode.
The big win here is avoiding 3 [[Get]] accesses on every iteration step
of for..of loops. There are also a bunch of smaller efficiencies gained.
20% speed-up on this microbenchmark:
function go(a) {
for (const p of a) {
}
}
const a = [];
a.length = 1_000_000;
go(a);
When all the variables in a for..in/of block's lexical scope have been
turned into locals, we don't need to create and immediately abandon an
empty environment for them.
This avoid environment allocation in cases like this:
function foo(a) {
for (const x of a) {
}
}
This will not meaningfully affect short array literals, but it does
give us a bit of extra perf when evaluating huge array expressions like
in Kraken/imaging-darkroom.js
Until now, the unwind context stack has not been maintained by jitted
code, which meant we were unable to support the `with` statement.
This is a first step towards supporting that by making jitted code
call out to C++ to update the unwind context stack when entering/leaving
unwind contexts.
We also introduce a new "Catch" bytecode instruction that moves the
current exception into the accumulator. It's always emitted at the start
of a "catch" block.