LibLocale was split off from LibUnicode a couple years ago to reduce the
number of applications on SerenityOS that depend on CLDR data. Now that
we use ICU, both LibUnicode and LibLocale are actually linking in this
data. And since vcpkg gives us static libraries, both libraries are over
30MB in size.
This patch reverts the separation and merges LibLocale into LibUnicode
again. We now have just one library that includes the ICU data.
Further, this will let LibUnicode share the locale cache that previously
would only exist in LibLocale.
Note: We keep locale parsing and syntactic validation as-is. ECMA-402
places additional restrictions on locales above what is required by the
Unicode spec. ICU doesn't provide methods that let us easily check those
restrictions, whereas LibLocale does. Other browsers also implement
their own validators here.
This introduces a locale cache to re-use parsed locale data and various
related structures (not doing so has a non-negligible performance impact
on Intl tests).
The existing APIs for canonicalization and display names are pretty
intertwined, so they must both be adapted at once here. The results of
canonicalization are slightly different on some edge cases. But the
changed results are actually now aligned with Chrome and Safari.
This patch adds two macros to declare per-type allocators:
- JS_DECLARE_ALLOCATOR(TypeName)
- JS_DEFINE_ALLOCATOR(TypeName)
When used, they add a type-specific CellAllocator that the Heap will
delegate allocation requests to.
The result of this is that GC objects of the same type always end up
within the same HeapBlock, drastically reducing the ability to perform
type confusion attacks.
It also improves HeapBlock utilization, since each block now has cells
sized exactly to the type used within that block. (Previously we only
had a handful of block sizes available, and most GC allocations ended
up with a large amount of slack in their tails.)
There is a small performance hit from this, but I'm sure we can make
up for it elsewhere.
Note that the old size-based allocators still exist, and we fall back
to them for any type that doesn't have its own CellAllocator.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Instead of passing a GlobalObject everywhere, we will simply pass a VM,
from which we can get everything we need: common names, the current
realm, symbols, arguments, the heap, and a few other things.
In some places we already don't actually need a global object and just
do it for consistency - no more `auto& vm = global_object.vm();`!
This will eventually automatically fix the "wrong realm" issue we have
in some places where we (incorrectly) use the global object from the
allocating object, e.g. in call() / construct() implementations. When
only ever a VM is passed around, this issue can't happen :^)
I've decided to split this change into a series of patches that should
keep each commit down do a somewhat manageable size.
Intl.DisplayNames v2 adds "calendar" and "dateTimeField" types, as well
as a "languageDisplay" option for the "language" type. This just adds
these options to the constructor.
Intl.DisplayNames was the first Intl object implemented, and at that
point all AOs were just put into the main Intl AO header. But AOs that
belong to specific objects belong in that object's header. So this moves
CanonicalCodeForDisplayNames to the Intl.DisplayNames header.