529 commits

Author SHA1 Message Date
Timothy Flynn
73ca9516a9 LibJS: Remove the InitializeNumberFormat AO
The Initialize* AOs for Intl formatters were removed some time ago, and
the formatter construction steps are now inlined in the constructors
themselves. InitializeNumberFormat was the one remaining initializer we
still had lying around.
2024-08-15 17:21:00 -04:00
Timothy Flynn
a1a368bb61 LibJS: Fix editorial rebasing errors in the ECMA-402 spec
This is the remainder of the editorial rebasing errors that were fixed
in:
3f029b0
2024-08-15 17:21:00 -04:00
Timothy Flynn
c7dd4afd9c LibJS+LibUnicode: Update the Intl.DateTimeFormat constructor spec steps
This constructor has undergone a handful of editorial changes that we
fell behind on. But we weren't able to take the updates until now due to
a spec bug in those updates. See:
3f029b0

The result is that we can remove the inheritance of Intl::DateTimeFormat
from Unicode::DateTimeFormat; the former now contains the latter as an
internal slot.
2024-08-15 17:21:00 -04:00
Timothy Flynn
ca1257c6f9 LibJS+LibUnicode: Make the collation sensitivity default locale-aware
Note this happens to be 'variant' for every locale currently.
2024-08-15 13:44:32 +02:00
Timothy Flynn
78625c746d LibJS+LibUnicode: Make the collation punctuation default locale-aware 2024-08-15 13:44:32 +02:00
Timothy Flynn
eb8e516ed3 LibJS: Update Intl.Collator spec steps to the latest
The steps have been updated to indicate a few options that should be
defaulted based on locale preferences. Those steps have not yet been
implemented in this patch.
2024-08-15 13:44:32 +02:00
Timothy Flynn
eb7e3583c9 LibJS+LibUnicode: Fully implement Intl.Collator with ICU
We were never able to implement anything other than a basic, locale-
unaware collator with the JSON export of the CLDR as it did not have
collation data. We can now use ICU to implement collation.
2024-08-15 13:44:32 +02:00
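As a rough illustration of what locale-aware collation (eb7e3583c9) means for script authors, the sketch below sorts the same strings under two locales with the standard `Intl.Collator` API. The locales and sample strings are chosen for illustration and do not come from the commit, and the result assumes an engine that ships full collation data.

```javascript
// Illustrative sample data; not taken from the commit or its tests.
const collatorSample = ["z", "a", "ä"];

// German collation sorts "ä" alongside "a"; Swedish sorts it after "z".
const germanOrder = [...collatorSample].sort(new Intl.Collator("de").compare);
const swedishOrder = [...collatorSample].sort(new Intl.Collator("sv").compare);

console.log(germanOrder.join(" "));  // a ä z
console.log(swedishOrder.join(" ")); // a z ä
```

A collator without locale data can only compare code points, which would yield the same order for both locales.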
Timothy Flynn
50dfaf8581 LibJS: Disallow grouping separators in formatted duration fields
This is a normative change in the Intl.DurationFormat proposal. See:
68b00f3
2024-08-14 11:48:08 +02:00
Timothy Flynn
72f61396cd LibJS: Correctly display a negative sign on negative durations
This is a normative change in the Intl.DurationFormat proposal. See:
adfc4a1
2024-08-14 11:48:08 +02:00
Timothy Flynn
1eced20521 LibJS: Change Intl.Locale.prototype.firstDayOfWeek to be a string
This is a normative change in the Intl Locale Info proposal. See:

5cb45fd
6d80e69
04039b8
2024-08-01 11:40:37 +02:00
Timothy Flynn
1b2d47e6bb LibJS+LibUnicode: Port retrieving available regional time zones to ICU 2024-06-26 10:14:02 +02:00
Timothy Flynn
4fc0fba646 LibCore+LibJS+LibUnicode: Port retrieving available time zones to ICU
This required updating some LibJS spec steps to their latest versions,
as the data expected by the old steps does not quite match the APIs that
are available with the ICU. The new spec steps are much more aligned.
2024-06-26 10:14:02 +02:00
Timothy Flynn
d3e809bcd4 LibJS+LibUnicode: Port retrieving the system time zone to ICU 2024-06-26 10:14:02 +02:00
Timothy Flynn
89aa9a3af0 LibJS: Update Intl AO spec numbers
Otherwise, upcoming AO implementations will conflict with these.
2024-06-26 10:14:02 +02:00
Timothy Flynn
ebdb92eef6 LibUnicode+Everywhere: Merge LibLocale back into LibUnicode
LibLocale was split off from LibUnicode a couple years ago to reduce the
number of applications on SerenityOS that depend on CLDR data. Now that
we use ICU, both LibUnicode and LibLocale are actually linking in this
data. And since vcpkg gives us static libraries, both libraries are over
30MB in size.

This patch reverts the separation and merges LibLocale into LibUnicode
again. We now have just one library that includes the ICU data.

Further, this will let LibUnicode share the locale cache that previously
would only exist in LibLocale.
2024-06-23 19:52:45 +02:00
Timothy Flynn
14071c52f9 LibJS: Port Intl.Segmenter to the ICU text segmenter
This also lets us fully implement detecting if a segment is word-like,
although that is not tested by test262.
2024-06-20 13:46:54 +02:00
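The word-like detection mentioned in 14071c52f9 is exposed to scripts as the `isWordLike` property on word-granularity segments. A minimal sketch, with an input string picked purely for illustration:

```javascript
// Segment a sample string by word and keep only the word-like segments.
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });
const allSegments = [...wordSegmenter.segment("Hello, world!")];

// Punctuation and whitespace segments report isWordLike === false.
const wordLikeSegments = allSegments
    .filter((entry) => entry.isWordLike)
    .map((entry) => entry.segment);

console.log(wordLikeSegments); // [ 'Hello', 'world' ]
```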
Timothy Flynn
9c3a775395 LibJS: Update AOs involved in locale resolution to the latest ECMA-402
There have been a number of changes to the locale resolution AOs that
we've fallen behind on. Mostly editorial, but includes one normative
change to canonicalize Unicode extension keywords in the Intl.Locale
constructor.
2024-06-18 21:06:50 +02:00
Timothy Flynn
2c311448c7 LibLocale+LibJS: Make a locale canonicalization API a bit more ergonomic
Instead of taking an out-parameter, return the canonicalization result.
This allows the API to be used where specs want to store the result and
the original values in separate variables.
2024-06-18 21:06:50 +02:00
Timothy Flynn
de99dd2c89 LibJS+LibLocale: Change ListFormat to be created once per Intl object
ListFormat was the first formatter I ported to ICU. This patch makes it
match the style of subsequently ported formatters, where we create the
formatter once per Intl object, rather than once per prototype
invocation.
2024-06-17 18:46:22 -04:00
Timothy Flynn
1c51ac4763 LibJS: Remove unused PartitionPattern AO and related types
And move some headers around that are no longer needed in the AO header.
2024-06-17 18:46:22 -04:00
Timothy Flynn
638a6c8c00 LibJS: Support non-Gregorian calendars for Intl.DateTimeFormat
This almost worked out of the box, but we need to be sure we pass the
full locale (e.g. en-u-ca-hebrew) and not just the data locale (en) to
ICU.
2024-06-17 21:59:59 +02:00
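The difference between the full locale and the data locale in 638a6c8c00 can be seen from script by inspecting the resolved options. This is a sketch using only the standard API; the date and the `dateStyle` option are arbitrary choices for illustration, and the formatted text depends on the engine's locale data.

```javascript
// The "ca" Unicode extension keyword requests the Hebrew calendar.
const hebrewFormat = new Intl.DateTimeFormat("en-u-ca-hebrew", {
    dateStyle: "long",
    timeZone: "UTC",
});
const hebrewResolved = hebrewFormat.resolvedOptions();

// The calendar only takes effect if the keyword survives locale resolution.
console.log(hebrewResolved.locale, hebrewResolved.calendar);
console.log(hebrewFormat.format(new Date(Date.UTC(2024, 5, 17))));
```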
Timothy Flynn
4598a505b1 LibJS: Differentiate between undefined and null locale keys
We were previously treating undefined and null as the same (an empty
Optional). However, there are edge cases in ECMA-402 where we must treat
them differently. Namely, the hour cycle (hc) keyword. An undefined hc
value has no effect on the resolved locale, whereas a null hc value can
actively override any hc specified in the locale string. For example:

    new Intl.DateTimeFormat("en-u-hc-h11", { hour12: false });

In that object, the hour12 option does not match the u-hc-h11 value. So
the spec dictates we remove the hc value by setting it to null.
2024-06-17 21:59:59 +02:00
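The effect described in 4598a505b1 is observable through `resolvedOptions()`. The sketch below adds an `hour` option (an assumption made here so that the hour cycle is reported at all) and prints what the engine resolved. The exact 24-hour cycle reported can vary between engine versions, so no specific value is claimed.

```javascript
// hour12: false conflicts with the u-hc-h11 keyword in the locale string.
const conflictingFormat = new Intl.DateTimeFormat("en-u-hc-h11", {
    hour: "numeric",
    hour12: false,
});
const conflictingResolved = conflictingFormat.resolvedOptions();

// The hour12 option wins, so the resolved hour cycle is not the requested h11.
console.log(conflictingResolved.locale);
console.log(conflictingResolved.hourCycle, conflictingResolved.hour12);
```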
Timothy Flynn
1bcc29d0d1 LibJS+LibLocale: Replace Unicode keyword lookups with ICU
Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Firefox.
2024-06-16 06:57:08 +02:00
Timothy Flynn
a1464342e1 LibJS+LibLocale: Remove unused parameter from keyword canonicalization 2024-06-16 06:57:08 +02:00
Timothy Flynn
5e2ee4447e LibJS+LibLocale: Replace plural rules selection with ICU
This uses ICU for all of the Intl.PluralRules prototypes, which lets us
remove all data from our plural rules generator.

Plural rules depend directly on internal data from the number formatter,
so rather than creating a separate Locale::PluralRules class (which will
make accessing that data awkward), this adds plural rules APIs to the
existing Locale::NumberFormat.
2024-06-15 06:57:16 +02:00
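The dependency of plural rules on number formatting noted in 5e2ee4447e shows up in the public API as well: the selected category depends on how the number would be formatted, not only on its value. A small sketch using English, chosen for illustration:

```javascript
// With default options, 1 is formatted as "1" and selects "one".
const integerRules = new Intl.PluralRules("en");

// With one mandatory fraction digit, 1 is formatted as "1.0", and English
// only uses "one" when there are no visible fraction digits.
const fractionRules = new Intl.PluralRules("en", { minimumFractionDigits: 1 });

const integerCategory = integerRules.select(1);
const fractionCategory = fractionRules.select(1);

console.log(integerCategory);  // one
console.log(fractionCategory); // other
```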
Timothy Flynn
7f9ccd39f5 LibJS+LibLocale: Replace relative time formatting with ICU
This uses ICU for all of the Intl.RelativeTimeFormat prototypes, which
lets us remove all data from our relative-time format generator.
2024-06-15 06:57:16 +02:00
Timothy Flynn
d634039c10 LibJS: Implement the latest Intl.DurationFormat proposal
The proposal has undergone quite a few normative changes since we last
synced with it. There was a time when it could not be implemented as it
was written, which is no longer the case. The resulting proposal has had
so many changes compared to our implementation, that it wouldn't make
sense to implement them commit-by-commit as we normally do. So instead,
this just implements the HEAD revision of the spec in one pass.
2024-06-14 07:59:42 +02:00
Timothy Flynn
4b3e26c583 LibJS+LibLocale: Replace calendar weekday information with ICU 2024-06-13 07:42:09 +02:00
Timothy Flynn
9cb1857dc6 LibJS+LibLocale: Replace preferred hour cycle lookups with ICU 2024-06-13 07:42:09 +02:00
Timothy Flynn
273694d8de LibJS+LibLocale: Replace date-time formatting with ICU
This uses ICU for the Intl.DateTimeFormat `format`, `formatToParts`,
`formatRange`, and `formatRangeToParts`.

This lets us remove most data from our date-time format generator. All
that remains are time zone data and locale week info, which are relied
upon still for other interfaces. So they will be removed in a future
patch.

Note: All of the changes to the test files in this patch are now aligned
with other browsers. This includes:

* Some very incorrect formatting of Japanese symbols. (Looking at the
  old results now, it's very obvious they were wrong.)
* Old FIXMEs regarding range formatting not including the start/end date
  when only time fields were requested, but the dates differ.
* Day period inconsistencies.
2024-06-13 07:42:09 +02:00
Timothy Flynn
3b68bb6e73 LibJS: Store Intl mathematical values as strings when appropriate
The IntlMV is meant to be arbitrarily precise. If the user provides a
string value to be formatted, we lose precision by converting extremely
large values to a double. We were never able to address this, as support
for arbitrary precision was a big FIXME. But ICU can handle it by just
passing the raw string on through.
2024-06-10 13:51:51 +02:00
Timothy Flynn
f6bee0f5a8 LibJS+LibLocale: Replace number range formatting with ICU
This uses ICU for the Intl.NumberFormat `formatRange` and
`formatRangeToParts` prototypes.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
Timothy Flynn
67f3de2320 LibJS+LibLocale: Begin replacing number formatting with ICU
This uses ICU for the Intl.NumberFormat `format` and `formatToParts`
prototypes. It does not yet port the range formatter prototypes.

Most of the new code in LibLocale/NumberFormat is simply mapping from
ECMA-402 types to ICU types. Beyond that, the only algorithmic change is
that we have to mutate the output from ICU for `formatToParts` to match
what is expected by ECMA-402. This is explained in NumberFormat.cpp in
`flatten_partitions`.

This lets us remove most data from our number format generator. All that
remains are numbering system digits and symbols, which are relied upon
still for other interfaces (e.g. Intl.DateTimeFormat). So they will be
removed in a future patch.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
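For reference, the flat list of parts that ECMA-402 expects from `formatToParts` (the shape that 67f3de2320 maps ICU's output onto) looks like this from script. The sketch only uses the public API and says nothing about how `flatten_partitions` is written; the value and locale are arbitrary.

```javascript
const partsFormat = new Intl.NumberFormat("en-US");
const numberParts = partsFormat.formatToParts(1234.5);

// Each part is a { type, value } pair; the list is flat, with no nesting.
const partTypes = numberParts.map((part) => part.type);
console.log(partTypes.join(",")); // integer,group,integer,decimal,fraction

// Concatenating the values reproduces the output of format().
const rejoined = numberParts.map((part) => part.value).join("");
console.log(rejoined === partsFormat.format(1234.5)); // true
```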
Timothy Flynn
5f7251fd91 LibJS+LibLocale: Replace list formatting with ICU
This also largely eliminates the need for some ECMA-402 AOs, as it is
all handled internally by ICU (which the spec is basically based on).
2024-06-09 10:47:28 +02:00
Timothy Flynn
d17d131224 LibJS+LibLocale: Replace locale character ordering with ICU 2024-06-09 10:47:28 +02:00
Timothy Flynn
e487f91388 LibJS+LibLocale: Replace locale maximization and minimization with ICU 2024-06-09 10:47:28 +02:00
Timothy Flynn
9724a25daf LibJS+LibLocale: Replace canonical locales and display names with ICU
Note: We keep locale parsing and syntactic validation as-is. ECMA-402
places additional restrictions on locales above what is required by the
Unicode spec. ICU doesn't provide methods that let us easily check those
restrictions, whereas LibLocale does. Other browsers also implement
their own validators here.

This introduces a locale cache to re-use parsed locale data and various
related structures (not doing so has a non-negligible performance impact
on Intl tests).

The existing APIs for canonicalization and display names are pretty
intertwined, so they must both be adapted at once here. The results of
canonicalization are slightly different on some edge cases. But the
changed results are actually now aligned with Chrome and Safari.
2024-06-09 10:47:28 +02:00
Timothy Flynn
ec492a1a08 Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-18 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")

There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
2024-04-24 16:50:01 -04:00
Timothy Flynn
0d3072bdac LibJS: Use IteratorStepValue in ECMA-402
This is an editorial change in the ECMA-402 spec. See:
e295500
2024-02-03 14:07:26 -05:00
Shannon Booth
986abe7047 LibJS: Rename IntlNumberIsNaNOrInfinity to NumberIsNaNOrInfinity
While only currently used in Intl in LibJS, this is a pretty generic
error and is useful elsewhere. Rename it to something more generic.
2024-01-02 10:01:26 +01:00
Andreas Kling
f4fa37afd2 LibJS+LibWeb: Add missing JS_DEFINE_ALLOCATOR() for a bunch of classes 2023-12-23 23:02:10 +01:00
Shannon Booth
e2e7c4d574 Everywhere: Use to_number<T> instead of to_{int,uint,float,double}
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:

```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```

for us.

The main goal here however is to have a single generic number conversion
API between all of the String classes.
2023-12-23 20:41:07 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Andreas Kling
3c74dc9f4d LibJS: Segregate GC-allocated objects by type
This patch adds two macros to declare per-type allocators:

- JS_DECLARE_ALLOCATOR(TypeName)
- JS_DEFINE_ALLOCATOR(TypeName)

When used, they add a type-specific CellAllocator that the Heap will
delegate allocation requests to.

The result of this is that GC objects of the same type always end up
within the same HeapBlock, drastically reducing the ability to perform
type confusion attacks.

It also improves HeapBlock utilization, since each block now has cells
sized exactly to the type used within that block. (Previously we only
had a handful of block sizes available, and most GC allocations ended
up with a large amount of slack in their tails.)

There is a small performance hit from this, but I'm sure we can make
up for it elsewhere.

Note that the old size-based allocators still exist, and we fall back
to them for any type that doesn't have its own CellAllocator.
2023-11-19 12:10:31 +01:00
Timothy Flynn
1d76738dde LibJS: Change Intl.Locale info APIs from property getters to methods
This is a normative change in the Intl Locale Info spec. See:
e550152
2023-11-13 20:10:58 +01:00
Timothy Flynn
a357874c77 LibJS: Implement Intl.Locale.prototype.firstDayOfWeek
This is a normative change in the Intl Locale Info spec. See:
f03a814
2023-11-13 20:10:58 +01:00
Daniel Bertalan
6f972c190b Everywhere: Work around Clang trunk bug with templated lambda + Variant
Since 2023-09-08, Clang trunk has had a bug which causes a segfault when
evaluating certain `requires` expressions inside templated lambdas.
There isn't an imminent fix on the horizon, so let's work around the
issue by specifying the type of the offending lambda arguments
explicitly.

See https://github.com/llvm/llvm-project/issues/67260
2023-11-05 13:41:13 -07:00
Andreas Kling
65717e3b75 LibJS: Inline fast case for Value::to_{boolean,number,numeric,primitive}
These functions all have a very common case that can be dealt with by a
very simple inline check, often avoiding the need to call an out-of-line
function. This patch moves the common case to inline functions in a new
ValueInlines.h header (necessary due to header dependency issues).

8% speed-up on the entire Kraken benchmark :^)
2023-10-07 07:13:52 +02:00
Timothy Flynn
03be26317f LibJS: Alphabetize handling some Intl.NumberFormat/PluralRules options
This is a normative change in the ECMA-402 spec. See:
5a43090
2023-10-05 17:01:02 +02:00
Timothy Flynn
05e080c4ba LibJS: Correctly resolve locale hour cycles in Intl.DateTimeFormat
This is a normative change in the ECMA-402 spec. See:
2f002b2
2023-10-05 17:01:02 +02:00