Even though this code was already optimized to re-use a single result
object, returning { value, done } directly in output parameters still
provides a substantial speedup.
1.21x speedup on MicroBench/for-in-indexed-properties.js
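Roughly, the shape of the change as a minimal stand-alone sketch (hypothetical names and types, not the actual LibJS signatures): the iteration helper writes into caller-provided output parameters instead of producing a result object per step.
```cpp
#include <cstddef>
#include <vector>

using Value = double; // stand-in for JS::Value

// Instead of allocating (or even reusing) a { value, done } result object,
// the caller hands us the storage and we write straight into it.
static void iterator_step(std::vector<Value> const& items, size_t& index,
                          Value& out_value, bool& out_done)
{
    out_done = index >= items.size();
    out_value = out_done ? Value {} : items[index++];
}

static Value sum_all(std::vector<Value> const& items)
{
    size_t index = 0;
    Value value {};
    bool done = false;
    Value sum = 0;
    for (;;) {
        iterator_step(items, index, value, done);
        if (done)
            break;
        sum += value;
    }
    return sum;
}
```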
Apply a little ensure_capacity() to avoid excessive rehashing of the
property key table when enumerating a large number of properties.
1.23x speedup on MicroBench/for-in-indexed-properties.js
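The same idea in stand-alone form, with std::unordered_set standing in for the interpreter's property key table (names here are illustrative): reserving the expected number of entries up front means the table grows once instead of rehashing repeatedly.
```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Collect property keys for enumeration. `property_count` is assumed to be
// known (or cheaply estimated) before the loop starts.
static std::unordered_set<std::string> collect_keys(size_t property_count)
{
    std::unordered_set<std::string> keys;
    keys.reserve(property_count); // one up-front reservation, no incremental rehashing
    for (size_t i = 0; i < property_count; ++i)
        keys.insert(std::to_string(i));
    return keys;
}
```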
...by avoiding `{ value, done }` iterator result value allocation. This
change applies the same optimization 81b6a11 added for `for..in` and
`for..of`.
Makes the following micro-benchmark go 22% faster on my computer:
```js
function f() {
    const arr = [];
    for (let i = 0; i < 10_000_000; i++) {
        arr.push([i]);
    }
    let sum = 0;
    for (let [i] of arr) {
        sum += i;
    }
}
f();
```
Introduce a special instruction for `for..of` and `for..in` loops that
skips the `{ value, done }` result object allocation if the iterator is a
builtin (array, map, set, string). This reduces GC pressure significantly
and avoids having to extract the `value` and `done` properties.
This change makes the following micro-benchmark 48% faster on my computer:
```js
const arr = new Array(10_000_000);
let counter = 0;
for (let _ of arr) {
    counter++;
}
```
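A rough sketch of the fast path, with hypothetical types standing in for the real iterator record and instruction handler: when the iterator is a known builtin, the handler steps it in place and never materializes a `{ value, done }` object; only the generic path goes through `next()` and property extraction.
```cpp
#include <cstddef>
#include <vector>

using Value = double; // stand-in for JS::Value

enum class IteratorKind { Array, Generic /* map, set, string, ... */ };

struct IteratorRecord {
    IteratorKind kind = IteratorKind::Generic;
    std::vector<Value> const* array_backing = nullptr; // only used for Array
    size_t index = 0;
};

// Fast path: builtin array iterator stepped in place. Returns true when done.
static bool array_iterator_step(IteratorRecord& record, Value& out_value)
{
    if (record.index >= record.array_backing->size())
        return true;
    out_value = (*record.array_backing)[record.index++];
    return false;
}

static bool iterator_step(IteratorRecord& record, Value& out_value)
{
    if (record.kind == IteratorKind::Array)
        return array_iterator_step(record, out_value);
    // Generic path (not shown): call next(), then read `value` and `done`.
    return true;
}
```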
This reverts commit 36bb2824a6.
Although this was faster on my M3 MacBook Pro, other Apple machines
disagree, including our benchmark runner. So let's revert it.
This is a simple trick to generate better native code for access to
registers, locals, and constants. Before this change, each access had
to first dereference the member pointer in Interpreter, and then get to
the values. Now we always have a pointer directly to the values on hand.
Here's how it looks:
```cpp
class StackFrame {
public:
    Value get(Operand) const;
    void set(Operand, Value);

private:
    Value m_values[];
};
```
And we just place one of these as a window on top of the execution
context's array of values (registers, locals, and constants).
Getting the running_execution_context() already verifies that the
execution context stack is non-empty, so we don't need to check that
separately here as well.
The old accumulator register is really only used to pass the end
completion to the caller of run_bytecode() nowadays. As such, we don't
need to cache a pointer to it for fast access. One less thing to do
on run_bytecode() entry.
This way it's always automatically correct, and we don't have to
manually flush it in push_execution_context().
~7% speedup on the MicroBench/call* tests :^)
Instead of letting every [[Call]] implementation allocate an
ExecutionContext, we now make that a responsibility of the caller.
The main point of this exercise is to allow the Call instruction
to write function arguments directly into the callee ExecutionContext
instead of copying them later.
This makes function calls significantly faster:
- 10-20% faster on micro-benchmarks (depending on argument count)
- 4% speedup on Kraken
- 2% speedup on Octane
- 5% speedup on JetStream
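In sketch form, with heavily simplified, hypothetical types: the caller allocates the ExecutionContext, writes the arguments straight into it, and only then invokes the callee, so the callee never has to copy an argument list of its own.
```cpp
#include <utility>
#include <vector>

using Value = double; // stand-in for JS::Value

struct ExecutionContext {
    std::vector<Value> arguments;
};

struct FunctionObject {
    // [[Call]] now receives a ready-made context instead of building one.
    Value (*call)(ExecutionContext&) = nullptr;
};

static Value call_with_arguments(FunctionObject& callee, std::vector<Value> arguments)
{
    ExecutionContext context;
    context.arguments = std::move(arguments); // arguments land directly in the callee's context
    return callee.call(context);
}

// Example callee: sums whatever arguments it was given.
static Value sum_arguments(ExecutionContext& context)
{
    Value sum = 0;
    for (auto value : context.arguments)
        sum += value;
    return sum;
}

static Value example()
{
    FunctionObject callee { &sum_arguments };
    return call_with_arguments(callee, { 1, 2, 3 }); // 6
}
```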
This allows us to get rid of the instructions that move arguments into
locals, and to allocate a smaller JS::Value vector in ExecutionContext by
reusing the slots that were already allocated for arguments.
With this change, for the following function:
```js
function f(x, y) {
    return x + y;
}
```
we now produce the following bytecode:
```
[ 0] 0: Add dst:reg6, lhs:arg0, rhs:arg1
[ 10] Return value:reg6
```
instead of:
```
[ 0] 0: GetArgument 0, dst:x~1
[ 10] GetArgument 1, dst:y~0
[ 20] Add dst:reg6, lhs:x~1, rhs:y~0
[ 30] Return value:reg6
```
By doing that we avoid a separate allocation for each such vector, which
was really expensive on JS-heavy websites. For example, this change helps
bring EC allocation down from ~17% to ~2% on Google Maps. It comes at the
cost of extra complexity in the custom execution context allocator,
because an EC no longer has a fixed size and we need to maintain a list
of buckets.
This is better because:
- Better data locality
- The vector for registers+constants+locals+arguments is allocated in one
  go instead of as two separate vectors
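A simplified sketch of that combined layout (field names are illustrative): one contiguous Value buffer with fixed offsets for each group, instead of one heap allocation per group.
```cpp
#include <cstddef>
#include <vector>

using Value = double; // stand-in for JS::Value

// All per-call value storage lives in a single contiguous allocation.
struct FrameLayout {
    size_t register_count = 0;
    size_t constant_count = 0;
    size_t local_count = 0;
    size_t argument_count = 0;

    size_t registers_offset() const { return 0; }
    size_t constants_offset() const { return register_count; }
    size_t locals_offset() const { return register_count + constant_count; }
    size_t arguments_offset() const { return register_count + constant_count + local_count; }
    size_t total() const { return arguments_offset() + argument_count; }
};

static std::vector<Value> allocate_frame(FrameLayout const& layout)
{
    return std::vector<Value>(layout.total()); // one allocation instead of several
}
```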
By doing that we consistently use the Identifier node for identifiers,
and also enable the mechanism that registers identifiers in the
corresponding ScopePusher for catch parameters, which is necessary for
the upcoming changes.
This allows us to remove the BoundFunction::m_name field, which we
were initializing with a formatted FlyString on every function binding,
despite never using it for anything.
We were doing way too much computation every time an ESFO was
instantiated. This was particularly sad, since the results of these
computations were identical every time!
This patch adds a new SharedFunctionInstanceData object that gets
shared between all instances of an ESFO instantiated from some kind of
AST FunctionNode.
~5% speedup on Speedometer 2.1 :^)
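A sketch of the shape of this sharing (illustrative names, not the actual LibJS fields): the per-AST-node data is computed on first instantiation, and every subsequent instance just takes a reference to it.
```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Data that is identical for every instance created from the same AST node.
struct SharedFunctionInstanceData {
    std::string name;
    size_t parameter_count = 0;
    std::vector<std::string> local_variable_names;
};

struct FunctionNode {
    std::shared_ptr<SharedFunctionInstanceData> shared_data; // filled in lazily
};

struct FunctionInstance {
    std::shared_ptr<SharedFunctionInstanceData> shared_data;
};

static FunctionInstance instantiate(FunctionNode& node)
{
    if (!node.shared_data) {
        // Expensive analysis of the AST happens exactly once, here.
        node.shared_data = std::make_shared<SharedFunctionInstanceData>();
    }
    return FunctionInstance { node.shared_data };
}
```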
We cached the length identifier for GetLength, but not
GetLengthWithThis. This caused a `has_value()` verification failure
when accessing super.length. Found by Fuzzilli.
This is a slightly mystifying attempt to recover from the performance
regression seen on our JS benchmark runner after 3c2a2bb39f
and c7bba505ea.
With this change, c7bba505ea is effectively reverted from the
perspective of x86_64.
It seems both aarch64 and x86_64 are extremely sensitive to the use of
bitfields here. Unfortunately, aarch64 gains a huge speedup from them
while x86_64 sees a very noticeable slowdown.
Since we're talking about 5%+ swings in both directions here, let's go
for the best of both worlds and use ifdefs in the Operand memory layout.
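In effect, the layout becomes something like this (simplified; the real Operand carries a type tag and an index, and the exact widths here are illustrative):
```cpp
#include <cstdint>

class Operand {
public:
    enum class Type : uint8_t {
        Register,
        Local,
        Constant,
        Argument,
    };

private:
#if defined(__aarch64__)
    // aarch64: bitfields pack the operand tighter and measurably help.
    Type m_type : 4;
    uint32_t m_index : 28;
#else
    // x86_64 (and everything else): plain members avoid the bitfield
    // extraction cost that shows up as a slowdown there.
    Type m_type;
    uint32_t m_index;
#endif
};
```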
Before this change, we were inlining this function after every
handler for instructions that could throw.
Forcing it out-of-line shrinks the main bytecode interpreter by 15%
and yields a decent 2.5% speedup on JetStream/gcc-loops.cpp.js
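The pattern, roughly, with a GCC/Clang attribute standing in for whatever macro the project uses to force the call out of line (names here are hypothetical): the cold exception path lives in one out-of-line function, so the hot instruction handlers stay small.
```cpp
// Cold path: forced out of line so its body is not duplicated into every
// instruction handler that can throw.
[[gnu::noinline]] static void handle_exception()
{
    // ...unwind, find the nearest handler, and so on...
}

static void some_instruction_handler(bool threw)
{
    if (threw) {
        handle_exception(); // a plain call; the body stays out of the handler
        return;
    }
    // ...hot path continues...
}
```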
This packs the bytecode much better and gives us a decent performance
boost on throughput-focused benchmarks.
Measured on my M3 MacBook Pro:
- 4.7% speedup on Kraken
- 2.3% speedup on Octane
- 2.7% speedup on JetStream1
We can use the index's invalid state to signal an empty optional.
This makes Optional<StringTableIndex> 4 bytes instead of 8,
shrinking every bytecode instruction that uses these.
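The trick in stand-alone form (hypothetical names, not the real types): the index reserves one bit pattern as "invalid", and the optional wrapper reuses that state instead of carrying a separate bool alongside the index.
```cpp
#include <cstdint>
#include <limits>

struct StringTableIndex {
    static constexpr uint32_t invalid_value = std::numeric_limits<uint32_t>::max();

    uint32_t value { invalid_value };

    bool is_valid() const { return value != invalid_value; }
};

// 4 bytes total: the "empty" state is the index's own invalid bit pattern,
// so no separate bool (plus padding) is needed.
class OptionalStringTableIndex {
public:
    OptionalStringTableIndex() = default;
    OptionalStringTableIndex(StringTableIndex index) : m_index(index) {}

    bool has_value() const { return m_index.is_valid(); }
    StringTableIndex value() const { return m_index; }

private:
    StringTableIndex m_index {};
};

static_assert(sizeof(OptionalStringTableIndex) == 4);
```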
The special empty value (that we use for array holes, Optional<Value>
when empty, and a few other placeholder/sentinel tasks) still
exists, but you now create one via JS::js_special_empty_value() and
check for it with Value::is_special_empty_value().
The main idea here is to make it very unlikely to accidentally create an
unexpected special empty value.