The underlying storage used during string formatting is StringBuilder.
To support UTF-16 strings, this patch allows callers to specify a mode
during StringBuilder construction. The default mode is UTF-8, for which
StringBuilder remains unchanged.
In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a
series of u16 code units. Appending a single character will append 2
bytes for that character (cast to a char16_t). Appending a StringView
will transcode the string to UTF-16.
Utf16String also gains the same memory optimization that we added for
String, where we hand-off the underlying buffer to Utf16String to avoid
having to re-allocate.
In the future, we may want to further optimize for ASCII strings. For
example, we could defer committing to the u16-esque storage until we
see a non-ASCII code point.
This is a strictly UTF-16 string with some optimizations for ASCII.
* If created from a short UTF-8 or UTF-16 string that is also ASCII,
then the string is stored in an inlined byte buffer.
* If created with a long UTF-8 or UTF-16 string that is also ASCII,
then the string is stored in an outlined char buffer.
* If created with a short or long UTF-8 or UTF-16 string that is not
ASCII, then the string is stored in an outlined char16 buffer.
We do not store short non-ASCII text in the inlined buffer to avoid
confusion with operations such as `length_in_code_units` and
`code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes
in short string form. But we still want `length_in_code_units` to
be 2, and `code_unit_at(0)` to be 0xD83D.
This will allow structured deserialization in LibWeb to create shared
buffers without having to go through CreateSharedByteDataBlock. That
will let us pass in the underlying ByteBuffer, rather than having to
allocate a second buffer via CreateSharedByteDataBlock and memcpy the
data into it.
This is an active proposal at stage 3 of the TC39 proposal process.
See: https://tc39.es/proposal-dynamic-code-brand-checks/
See: https://github.com/tc39/proposal-dynamic-code-brand-checks
This proposal essentially adds support for the TrustedScript type from
the Trusted Types specification to eval and Function. This in turn
pipes support for the type into the CSP hook to check if the CSP allows
dynamic code compilation.
However, it currently doesn't support ShadowRealms, so the
implementation here is a close approximation, using PerformEval as the
basis.
See: https://github.com/tc39/proposal-dynamic-code-brand-checks/issues/19
This is required to support the new function signature for the CSP
hook, and will allow us to slot in Trusted Types support in the future.
Our floating point number parser was based on the fast_float library:
https://github.com/fastfloat/fast_float
However, our implementation only supports 8-bit characters. To support
UTF-16, we will need to be able to convert char16_t-based strings to
numbers as well. This works out-of-the-box with fast_float.
We can also use fast_float for integer parsing.
By definition, the web allows lonely surrogates by default. Let's have
our string APIs reflect this, so we don't have to pass an allow option
all over the place.
To prepare for an upcoming Utf16String, this migrates Utf16View to store
its data as a char16_t. Most function definitions are moved inline and
made constexpr.
This also adds a UDL to construct a Utf16View from a string literal:
auto string = u"hello"sv;
This let's us remove the NTTP Utf16View constructor, as we have found
that such constructors bloat binary size quite a bit.
`TypedArray` objects need to know their own constructor objects to allow
copying. This was implemented by storing a function pointer to the
`Intrinsic` object's method which returns the constructor object.
The problem is that function pointers aren't polymorphic, we can't
legally just cast e.g. a `Derived* (*ptr)(void)` to `Base*
(*ptr)(void)` (this is why the code needed a `bit_cast` to even
compile). But this wasn't actually a problem in practice because their
ABIs were the same. But with pointer authentication (Apple's `arm64e`
ABI) this signature mismatch becomes a hard failure and crashes the
process.
Fix this by adding a virtual function that returns the intrinsic
constructor (actually, a `NativeFunction`, as typed arrays constructors
don't inherit from the base `TypedArray` constructor) instead of the
function pointer.
With this, test-js passes and Ladybird launches correctly when built
(with a lot of vcpkg hacks) for arm64e.
...of Array. If array has simple storage, which implies that attributes
of all indexed properties are default, and newly added property also
has default attribute, we can do a fast path and skip lots of checks
that happen in `Object::internal_define_own_property()`.
If array has packed index property storage without holes, we could check
if indexed property is present simple by checking if it's less than
array's length.
Makes the following program go 1.1x faster:
```js
function f() {
let array = [];
for (let i = 0; i < 3_000; i++) {
array.push(i);
}
for (let i = 0; i < 10_000; i++) {
array.map(x => x * 2);
}
}
f();
```
This one is required to cover the case when new empty elements are
introduced by assigning to element with index > length, like:
```js
var x = [];
x[0] = 1;
x[2] = 2;
```
...by avoiding `CreateListFromArrayLike` in cases when we could directly
use elements of underlying object's indexed properties storage.
Makes this program go 2.1x faster:
```js
function target(a, b, c) {
return a + b + c;
}
const args = [1, 2, 3];
let result = 0;
(function() {
for (let i = 0; i < 10_000_000; i++) {
result += target.apply(null, args);
}
})();
```
Before this change each built-in iterator object has a boolean
`m_next_method_was_redefined`. If user code later changed the iterator’s
prototype (e.g. `Object.setPrototypeOf()`), we still believed the
built-in fast-path was safe and skipped the user supplied override,
producing wrong results.
With this change
`BuiltinIterator::as_builtin_iterator_if_next_is_not_redefined()` looks
up the current `next` property and verifies that it is still the
built-in native function.
This commit adds the minimal export macros needed to run js.exe on
windows. A followup commit is planned to move to explicit export
entirely.
A static_assert for the size of a struct is also ifdef'ed out as the
semantics around object layout and inheritance are different on MSVC abi
and the struct IteratorRecord ends up being 40 bytes not 32.
...when Array.prototype and Object.prototype are intact.
If `internal_set()` is called on an array exotic object with a numeric
PropertyKey, and:
- the prototype chain has not been modified (i.e., there are no getters
or setters for indexed properties), and
- the array is not the target of a Proxy object,
then we can directly store the value in the receiver's indexed
properties, without checking whether it already exists somewhere in the
prototype chain.
1.7x improvement on the following program:
```js
function f() {
let a = [];
let i = 0;
while (i < 10_000_000) {
a.push(i);
i++;
}
}
f();
```
Replace the implementation of maths in `UnsignedBigInteger`
and `SignedBigInteger` with LibTomMath. This gives benefits in terms of
less code to maintain, correctness and speed.
These changes also remove now-unsued methods and improve the error
propagation for functions allocating lots of memory. Additionally, the
new implementation is always trimmed and won't have dangling zeros when
exporting it.
Having it as a method instead of a free function is necessary for the
next commits and generally allows for optimizations that require deeper
access into the `UnsignedBigInteger` / `SignedBigInteger`.
Also restrict the exponent to 32 bits to avoid huge memory allocations.
This is functionaly the same since caller_context is the topmost
execution context on the stack, but makes it more clear that
we are directly inheriting the strict mode from the caller context
when pushing the next context on to the stack.
- Avoids unnecessary conversions between StringOrSymbol and PropertyKey
on the hot path of property access.
- Simplifies the code by removing StringOrSymbol and using PropertyKey
directly. There was no reason to have a separate StringOrSymbol type
representing the same data as PropertyKey, just with the index key
stored as a string.
PropertyKey has been updated to use a tagged pointer instead of a
Variant, so it still occupies 8 bytes, same as StringOrSymbol.
12% improvement on JetStream/gcc-loops.cpp.js
12% improvement on MicroBench/object-assign.js
7% improvement on MicroBench/object-keys.js