54 commits

Author SHA1 Message Date
Andreas Kling
206479b2b5 LibJS: Cache UTF-16 strings on the VM
We were already caching UTF-8 and byte strings, so let's add a cache
for UTF-16 strings as well. This is particularly profitable whenever we
run regular expressions, since the output of regex execution is a set of
UTF-16 strings.

Note that this is a weak cache like the other JS string caches, meaning
that strings are removed from the cache as they are garbage collected.

This avoids billions of PrimitiveString allocations across a run of WPT,
significantly reducing GC activity.
2024-10-24 19:00:00 -04:00
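
The weak-cache behaviour described in 206479b2b5 can be sketched roughly as follows. This is a minimal illustration and not the LibJS implementation: `StringCache`, `CachedString` and `get_or_create()` are hypothetical names, and reference counting with `std::weak_ptr` stands in for the garbage collector, so that "collected" here means "the last owner went away".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

class StringCache;

// Hypothetical stand-in for a cached, heap-allocated string.
class CachedString {
public:
    CachedString(StringCache& cache, std::u16string text)
        : m_cache(cache)
        , m_text(std::move(text))
    {
    }
    ~CachedString();

    std::u16string const& text() const { return m_text; }

private:
    StringCache& m_cache;
    std::u16string m_text;
};

// Weak cache: it never keeps a string alive by itself.
// Assumption: the cache outlives every string created through it.
class StringCache {
public:
    std::shared_ptr<CachedString> get_or_create(std::u16string const& text)
    {
        auto it = m_entries.find(text);
        if (it != m_entries.end()) {
            // Reuse the live string instead of allocating a new one.
            if (auto existing = it->second.lock())
                return existing;
        }
        auto created = std::make_shared<CachedString>(*this, text);
        m_entries[text] = created;
        return created;
    }

    std::size_t size() const { return m_entries.size(); }

private:
    friend class CachedString;
    std::unordered_map<std::u16string, std::weak_ptr<CachedString>> m_entries;
};

// When a string dies, it removes its own entry from the cache.
CachedString::~CachedString()
{
    m_cache.m_entries.erase(m_text);
}
```

Two lookups of the same text return the same object while it is alive, and the entry disappears once the last owner releases it.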
Andreas Kling
cc4b3cbacc Meta: Update my e-mail address everywhere 2024-10-04 13:19:50 +02:00
Timothy Flynn
e8f4ae487d LibJS: Pre-allocate the resolved rope string's underlying buffer
For performance, rather than slowly incrementing the capacity of the
rope string's buffer, compute an approximate length for that buffer to
be reserved up front.
2024-07-20 06:45:49 +02:00
Dan Klishch
a53911717f LibJS: Remove DeprecatedFlyString::impl use in PrimitiveString 2024-02-24 15:06:52 -07:00
Andreas Kling
0ad4be3d78 LibJS: Skip redundant UTF-8 validation in rope string resolution
When resolving a rope, we've already taken care to resolve it to
a UTF-8 byte stream. There's no need to do a separate pass just for
validating the data again.

This was noticeable in some profiles. I made a simple microbenchmark
that gets a 30% speed-up:

    ("x" + "y".repeat(100_000_000)).trimStart()
2023-12-30 13:49:50 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Andreas Kling
eda2a6d9f7 LibJS: Don't die when making PrimitiveString from "" DeprecatedFlyString 2023-11-29 09:48:18 +01:00
Andreas Kling
3c74dc9f4d LibJS: Segregate GC-allocated objects by type
This patch adds two macros to declare per-type allocators:

- JS_DECLARE_ALLOCATOR(TypeName)
- JS_DEFINE_ALLOCATOR(TypeName)

When used, they add a type-specific CellAllocator that the Heap will
delegate allocation requests to.

The result of this is that GC objects of the same type always end up
within the same HeapBlock, drastically reducing the ability to perform
type confusion attacks.

It also improves HeapBlock utilization, since each block now has cells
sized exactly to the type used within that block. (Previously we only
had a handful of block sizes available, and most GC allocations ended
up with a large amount of slack in their tails.)

There is a small performance hit from this, but I'm sure we can make
up for it elsewhere.

Note that the old size-based allocators still exist, and we fall back
to them for any type that doesn't have its own CellAllocator.
2023-11-19 12:10:31 +01:00
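
As a rough illustration of the per-type allocation idea in 3c74dc9f4d, the sketch below gives each type its own allocator, so cells of different types never share a block. It is a simplified assumption-laden model, not the LibJS code: the `SKETCH_*` macros, the `CellAllocator` layout, `allocate()` and the two cell types are hypothetical, there is no garbage collection, and the size-based fallback mentioned in the commit is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical allocator: hands out fixed-size cells from blocks that only
// ever contain cells of a single size (and therefore a single type).
class CellAllocator {
public:
    static constexpr std::size_t block_size = 16 * 1024;

    explicit CellAllocator(std::size_t cell_size)
        : m_cell_size(cell_size)
    {
        assert(cell_size > 0 && cell_size <= block_size);
    }

    void* allocate_cell()
    {
        std::size_t cells_per_block = block_size / m_cell_size;
        if (m_blocks.empty() || m_used_in_last_block == cells_per_block) {
            m_blocks.emplace_back(new unsigned char[block_size]);
            m_used_in_last_block = 0;
        }
        return m_blocks.back().get() + (m_used_in_last_block++ * m_cell_size);
    }

    // True if the pointer lies inside one of this allocator's blocks.
    bool owns(void const* pointer) const
    {
        auto address = reinterpret_cast<std::uintptr_t>(pointer);
        for (auto const& block : m_blocks) {
            auto start = reinterpret_cast<std::uintptr_t>(block.get());
            if (address >= start && address < start + block_size)
                return true;
        }
        return false;
    }

private:
    std::size_t m_cell_size;
    std::size_t m_used_in_last_block { 0 };
    std::vector<std::unique_ptr<unsigned char[]>> m_blocks;
};

// Hypothetical macros, loosely modelled on the declare/define pair above.
#define SKETCH_DECLARE_ALLOCATOR(TypeName) static CellAllocator cell_allocator
#define SKETCH_DEFINE_ALLOCATOR(TypeName) \
    CellAllocator TypeName::cell_allocator { sizeof(TypeName) }

struct StringCell {
    SKETCH_DECLARE_ALLOCATOR(StringCell);
    char data[24];
};

struct ObjectCell {
    SKETCH_DECLARE_ALLOCATOR(ObjectCell);
    void* slots[8];
};

SKETCH_DEFINE_ALLOCATOR(StringCell);
SKETCH_DEFINE_ALLOCATOR(ObjectCell);

// The "heap" delegates to the allocator that belongs to T.
// Cells are never destroyed here; the sketch only uses trivial types.
template<typename T, typename... Args>
T* allocate(Args&&... args)
{
    return new (T::cell_allocator.allocate_cell()) T(std::forward<Args>(args)...);
}
```

Because every block is sized for exactly one type, consecutive `StringCell` allocations sit back to back with no slack, and an `ObjectCell` can never land in a block that holds `StringCell`s.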
Ali Mohammad Pur
aeee98b3a1 AK+Everywhere: Remove the null state of DeprecatedString
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>

Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
2023-10-13 18:33:21 +03:30
Timothy Flynn
573cbb5ca0 LibJS+LibWeb+WebContent: Stop using ThrowableStringBuilder 2023-09-09 13:03:25 -04:00
Andreas Kling
b8f78c0adc LibJS: Make JS::number_to_string() infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Andreas Kling
09547ec975 LibJS: Make PrimitiveString::deprecated_string() infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Andreas Kling
c084269e5f LibJS: Make PrimitiveString::utf8_string() infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Andreas Kling
7849950383 LibJS: Make Utf16String & related APIs infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Andreas Kling
9708b86d65 LibJS: Make PrimitiveString::resolve_rope_if_needed() infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Andreas Kling
1a27c525d5 LibJS: Make PrimitiveString::create() infallible
Work towards #20449.
2023-08-09 17:09:16 +02:00
Andreas Kling
a3e4535f34 LibJS: Resolve rope strings directly to UTF-16 when preferable
When someone calls PrimitiveString::utf16_string() on a rope string,
we know for sure that the client wants a UTF-16 string and may not
be interested in a UTF-8 version at all.

To avoid round-tripping through UTF-8 in this scenario, callers can
now inform resolve_rope_if_needed() about their preferred encoding,
should rope resolution take place. The UTF-16 case is actually a lot
simpler than the UTF-8 case, since we can simply ask for UTF-16 data
for each fiber of the rope, and then concatenate all the fibers.

Since LibJS always uses UTF-16 for regular expression matching, this
avoids round-tripping through UTF-8 whenever the input to a regex test
is already UTF-16. :^)
2023-07-13 20:53:54 +02:00
Hendiadyoin1
9300b9a364 LibJS: Don't lie about m_deprecated_string being a StringView 2023-06-13 01:49:02 +02:00
Matthew Olsson
82eeee2008 LibJS+LibWeb: Normalize calls to Base::visit_edges in GC objects 2023-04-30 06:04:33 +02:00
Timothy Flynn
0d0b87fd46 LibJS: Add a PrimitiveString::create overload for FlyString
This is to disambiguate this type from the StringView overload.
2023-03-18 19:50:45 +01:00
Timothy Flynn
36d72a7f4c LibJS: Convert CanonicalNumericIndexString to use NumberToString 2023-02-16 14:32:22 +01:00
Timothy Flynn
c3abb1396c LibJS+LibWeb: Convert string view PrimitiveString instances to String
First, this adds an overload of PrimitiveString::create for StringView.
This overload will throw an OOM completion if creating a String fails.
This is not only a bit more convenient, but it also ensures at compile
time that all PrimitiveString::create(string_view) invocations will be
handled as String and OOM-aware.

Next, this wraps all invocations to PrimitiveString::create(string_view)
with MUST_OR_THROW_OOM.

A small PrimitiveString::create(DeprecatedFlyString) overload also had
to be added to disambiguate between the StringView and DeprecatedString
overloads.
2023-02-09 17:13:33 +00:00
Timothy Flynn
4235c59397 LibJS: Add a convenience StringView accessor to PrimitiveString 2023-01-16 10:12:37 +00:00
Timothy Flynn
46dd8c1c0b LibJS: Resolve all UTF-8 rope strings as a String 2023-01-15 01:00:20 +00:00
Timothy Flynn
8f5bdce8e7 LibJS: Add initial support for creating PrimitiveStrings with AK::String
This will temporarily bloat the size of PrimitiveString as LibJS is
transitioned to use String throughout, but will make doing so piecemeal
much easier.
2023-01-15 01:00:20 +00:00
Timothy Flynn
4eb5eb2080 LibJS: Rename Utf16String::to_utf8 to to_deprecated_string 2023-01-15 01:00:20 +00:00
Timothy Flynn
ca655f5e7d LibJS: Rename VM::string_cache to deprecated_string_cache
And rename the member variable from m_string_cache to
m_deprecated_string_cache to match.
2023-01-15 01:00:20 +00:00
Timothy Flynn
3a004e8f1a LibJS: Rename PrimitiveString::has_utf8_string to has_deprecated_string
And rename the member variable from m_utf8_string to m_deprecated_string
to match.
2023-01-15 01:00:20 +00:00
Timothy Flynn
a59ebdac2d LibJS+Everywhere: Return strings by value from PrimitiveString
It turns out returning a ThrowCompletionOr<T const&> is flawed, as the GCC
expansion trick used with TRY will always make a copy. PrimitiveString
is luckily the only such use case.
2023-01-13 18:50:47 -05:00
Timothy Flynn
6e1a239a62 LibJS: Use fallible methods to handle OOM when resolving rope strings 2023-01-08 12:13:15 +01:00
Timothy Flynn
115baa7e32 LibJS+Everywhere: Make PrimitiveString and Utf16String fallible
This makes construction of Utf16String fallible in OOM conditions. The
immediate impact is that PrimitiveString must then be fallible as well,
as it may either transcode UTF-8 to UTF-16, or create a UTF-16 string
from ropes.

There are a couple of places where it is very non-trivial to propagate
the error further. A FIXME has been added to those locations.
2023-01-08 12:13:15 +01:00
Timothy Flynn
d793262beb AK+Everywhere: Make UTF-16 to UTF-8 converter fallible
This could fail to allocate the underlying storage needed to store the
UTF-8 data. Propagate this error.
2023-01-08 12:13:15 +01:00
Timothy Flynn
425c168ded AK+LibJS+LibRegex: Define an alias for UTF-16 string data storage
Instead of writing out "Vector<u16, 1>" everywhere, let's have a name
for it.
2023-01-08 12:13:15 +01:00
Linus Groh
22089436ed LibJS: Convert Heap::allocate{,_without_realm}() to NonnullGCPtr 2022-12-15 06:56:37 -05:00
Linus Groh
525f22d018 LibJS: Replace standalone js_string() with PrimitiveString::create()
Note that js_rope_string() has been folded into this, the old name was
misleading - it would not always create a rope string, only if both
sides are not empty strings. Use a three-argument create() overload
instead.
2022-12-07 16:43:06 +00:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Andreas Kling
71067cbc6c LibJS+LibWeb: Make Runtime/AbstractOperations.h not include AST.h
This led to considerable fallout and many files had to be patched with
now-missing include statements.
2022-11-23 16:05:59 +00:00
Linus Groh
56b2ae5ac0 LibJS: Replace GlobalObject with VM in remaining AOs [Part 19/19] 2022-08-23 13:58:30 +01:00
Linus Groh
e992a9f469 LibJS+LibWeb: Replace GlobalObject with Realm in Heap::allocate<T>()
This is a continuation of the previous three commits.

Now that create() receives the allocating realm, we can simply forward
that to allocate(), which accounts for the majority of these changes.
Additionally, we can get rid of the realm_from_global_object() in one
place, with one more remaining in VM::throw_completion().
2022-08-23 13:58:30 +01:00
Linus Groh
12edbb51bc LibJS: Rename PrimitiveString::m_{left,right} to m_{lhs,rhs}
The LHS/RHS naming is already widely used as parameter names and local
variables with the same meaning, so let's also use them for the members.
2022-08-06 12:02:48 +02:00
Andreas Kling
64b29eb459 LibJS: Implement string concatenation using ropes
Instead of concatenating string data every time you add two strings
together in JavaScript, we now create a new PrimitiveString that points
to the two concatenated strings instead.

This turns concatenated strings into a tree structure that doesn't have
to be serialized until someone wants the characters in the string.

This *dramatically* reduces the peak memory footprint when running
the SunSpider benchmark (from ~6G to ~1G on my machine). It's also
significantly faster (1.39x) :^)
2022-08-06 00:29:15 +02:00
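
The rope approach from 64b29eb459 can be sketched as below. This is a minimal model under stated assumptions, not the LibJS PrimitiveString: `RopeString`, `create()`, `concatenate()` and `string()` are hypothetical names, `std::shared_ptr` replaces GC-managed pointers, and plain `std::string` bytes replace the UTF-8/UTF-16 handling. Concatenation only records the two sides; the characters are copied once, when someone asks for them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class RopeString {
public:
    using Ptr = std::shared_ptr<RopeString>;

    static Ptr create(std::string text)
    {
        Ptr leaf(new RopeString);
        leaf->m_length = text.size();
        leaf->m_flat = std::move(text);
        return leaf;
    }

    // O(1): no character data is copied here.
    static Ptr concatenate(Ptr lhs, Ptr rhs)
    {
        // A rope is only needed when both sides are non-empty.
        if (lhs->m_length == 0)
            return rhs;
        if (rhs->m_length == 0)
            return lhs;
        Ptr rope(new RopeString);
        rope->m_is_rope = true;
        rope->m_length = lhs->m_length + rhs->m_length;
        rope->m_lhs = std::move(lhs);
        rope->m_rhs = std::move(rhs);
        return rope;
    }

    bool is_rope() const { return m_is_rope; }
    std::size_t length() const { return m_length; }

    // Flattens the tree the first time the characters are requested.
    std::string const& string()
    {
        resolve_rope_if_needed();
        return m_flat;
    }

private:
    RopeString() = default;

    void resolve_rope_if_needed()
    {
        if (!m_is_rope)
            return;
        std::string resolved;
        resolved.reserve(m_length); // Reserve the whole buffer up front.
        // Iterative left-to-right walk, so deep ropes do not recurse.
        std::vector<RopeString const*> pending { m_rhs.get(), m_lhs.get() };
        while (!pending.empty()) {
            RopeString const* piece = pending.back();
            pending.pop_back();
            if (piece->m_is_rope) {
                pending.push_back(piece->m_rhs.get());
                pending.push_back(piece->m_lhs.get());
            } else {
                resolved += piece->m_flat;
            }
        }
        assert(resolved.size() == m_length);
        m_flat = std::move(resolved);
        m_is_rope = false;
        m_lhs.reset();
        m_rhs.reset();
    }

    bool m_is_rope { false };
    std::size_t m_length { 0 };
    std::string m_flat;
    Ptr m_lhs;
    Ptr m_rhs;
};
```

After the first call to `string()`, the node behaves like an ordinary flat string and drops its references to the two halves.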
Andreas Kling
f4c68eb0a4 LibJS: Add PrimitiveString::is_empty() and use it
If we're only interested in whether the string is empty, we can skip the
conversion from UTF-16 to UTF-8.
2022-07-19 12:45:50 +02:00
davidot
da374a82bc LibJS: Correct an include in PrimitiveString 2022-02-15 00:51:25 +00:00
Anonymous
745b998774 LibJS: Get rid of unnecessary work from canonical_numeric_index_string
The spec version of canonical_numeric_index_string is absurdly complex,
and ends up converting from a string to a number, and then back again
which is both slow and also requires a few allocations and a string
compare.

Instead, this patch moves away from using Values to represent a
canonical index. In most cases all we need to know is whether a
PropertyKey is an integer between 0 and 2^32-2, which we already
compute when we construct a PropertyKey, so the existing is_number()
check is sufficient.

The more expensive case is handling strings containing numbers that
don't roundtrip through string conversion. In most cases these turn
into regular string properties, but for TypedArray access these
property names are not treated as normal named properties.
TypedArrays treat these numeric properties as magic indexes that are
ignored on read and are not stored (but are evaluated) on assignment.

For that reason there's now a mode flag on canonical_numeric_index_string
so that only TypedArrays take the cost of the ToString round trip test.
In order to improve the performance of this path this patch includes
some early returns to avoid conversion in cases where we can quickly
know whether a property can round trip.
2022-02-14 21:06:49 +00:00
Andreas Kling
4b412e8fee Revert "LibJS: Get rid of unnecessary work from canonical_numeric_index_string"
This reverts commit 3a184f7841.

This broke a number of test262 tests under "TypedArrayConstructors".
The issue is that the CanonicalNumericIndexString AO should not fail
for inputs like "1.1", despite them not being integral indices.
2022-02-13 16:01:32 +01:00
Anonymous
d1cc67bbe1 LibJS: Avoid unnecessary ToObject conversion when resolving references
When performing GetValue on a primitive type we do not need to perform
the ToObject conversion as it will resolve to a property on the
prototype object.

To avoid this we skip the initial ToObject conversion on the base value
as it only serves to get the primitive's boxed prototype. We further
specialize on PrimitiveString in order to get efficient behaviour
for the direct properties.

Depending on the test, this is a speed-up of anywhere from 20 to 60%,
measured with significant loop overhead.
2022-02-13 14:44:36 +01:00
Andreas Kling
f290c59dd8 LibJS: Keep track of PrimitiveStrings and share them
VM now has a string cache which tracks all live PrimitiveStrings and
reuses an existing one if possible. This drastically reduces the number
of GC-allocated strings in many real-world situations.
2021-10-02 16:39:28 +02:00
Timothy Flynn
c1e99fca1a LibJS: Replace Vector<u16> usage in PrimitiveString with Utf16String
This commit does not go out of its way to reduce copying of the string
data yet, but is a minimum set of changes to compile LibJS after making
PrimitiveString hold a Utf16String.
2021-08-10 23:07:50 +02:00
Timothy Flynn
b6ff7f4fcc LibJS: Allow PrimitiveString to be created with a UTF-16 string
PrimitiveString may currently only be created with a UTF-8 string, and
it transcodes on the fly when a UTF-16 string is needed. Allow creating
a PrimitiveString from a UTF-16 string to avoid unnecessary transcoding
when the caller only wants UTF-16.
2021-08-04 11:18:24 +02:00