Commit graph

184 commits

Author SHA1 Message Date
Timothy Flynn
ca1257c6f9 LibJS+LibUnicode: Make the collation sensitivity default locale-aware
Note this happens to be 'variant' for every locale currently.
2024-08-15 13:44:32 +02:00
Timothy Flynn
78625c746d LibJS+LibUnicode: Make the collation punctation default locale-aware 2024-08-15 13:44:32 +02:00
Timothy Flynn
eb7e3583c9 LibJS+LibUnicode: Fully implement Intl.Collator with ICU
We were never able to implement anything other than a basic, locale-
unaware collator with the JSON export of the CLDR as it did not have
collation data. We can now use ICU to implement collation.
2024-08-15 13:44:32 +02:00
Timothy Flynn
50dfaf8581 LibJS: Disallow grouping separators in formatted duration fields
This is a normative change in the Intl.DurationFormat proposal. See:
68b00f3
2024-08-14 11:48:08 +02:00
Timothy Flynn
72f61396cd LibJS: Correctly display a negative sign on negative durations
This is a normative change in the Intl.DurationFormat proposal. See:
adfc4a1
2024-08-14 11:48:08 +02:00
Timothy Flynn
ee00730225 LibUnicode+LibJS: Normalize spaces in formatted date-time strings
ICU 72 began using non-ASCII spaces in some formatted date-time strings.
Every major browser has found that this introduced major breakage in web
compatibility, as many sites and tools expect ASCII spaces. This patch
removes these non-ASCII spaces in the same manner as the major engines.
Such behavior is also tested by WPT.
2024-08-02 08:05:52 +02:00
Timothy Flynn
1eced20521 LibJS: Change Intl.Locale.prototype.firstDayOfWeek to be a string
This is a normative change in the Intl Locale Info proposal. See:

5cb45fd
6d80e69
04039b8
2024-08-01 11:40:37 +02:00
Timothy Flynn
4fc0fba646 LibCore+LibJS+LibUnicode: Port retrieving available time zones to ICU
This required updating some LibJS spec steps to their latest versions,
as the data expected by the old steps does not quite match the APIs that
are available with the ICU. The new spec steps are much more aligned.
2024-06-26 10:14:02 +02:00
Timothy Flynn
14071c52f9 LibJS: Port Intl.Segmenter to the ICU text segmenter
This also lets us fully implement detecting if a segment is word-like,
although that is not tested by test262.
2024-06-20 13:46:54 +02:00
Timothy Flynn
9c3a775395 LibJS: Update AOs involved in locale resolution to the latest ECMA-402
There have been a number of changes to the locale resolution AOs that
we've fallen behind on. Mostly editorial, but includes one normative
change to canonicalize Unicode extension keywords in the Intl.Locale
constructor.
2024-06-18 21:06:50 +02:00
Timothy Flynn
638a6c8c00 LibJS: Support non-Gregorian calendars for Intl.DateTimeFormat
This almost worked out of the box, but we need to be sure we pass the
full locale (e.g. en-u-ca-hebrew) and not just the data locale (en) to
ICU.
2024-06-17 21:59:59 +02:00
Timothy Flynn
1bcc29d0d1 LibJS+LibLocale: Replace Unicode keyword lookups with ICU
Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Firefox.
2024-06-16 06:57:08 +02:00
Timothy Flynn
5e2ee4447e LibJS+LibLocale: Replace plural rules selection with ICU
This uses ICU for all of the Intl.PluralRules prototypes, which lets us
remove all data from our plural rules generator.

Plural rules depend directly on internal data from the number formatter,
so rather than creating a separate Locale::PluralRules class (which will
make accessing that data awkward), this adds plural rules APIs to the
existing Locale::NumberFormat.
2024-06-15 06:57:16 +02:00
Timothy Flynn
d634039c10 LibJS: Implement the latest Intl.DurationFormat proposal
The proposal has undergone quite a few normative changes since we last
synced with it. There was a time when it could not be implemented as it
was written, which is no longer the case. The resulting proposal has had
so many changes compared to our implementation, that it wouldn't make
sense to implement them commit-by-commit as we normally do. So instead,
this just implements the HEAD revision of the spec in one pass.
2024-06-14 07:59:42 +02:00
Timothy Flynn
273694d8de LibJS+LibLocale: Replace date-time formatting with ICU
This uses ICU for the Intl.DateTimeFormat `format` `formatToParts`,
`formatRange`, and `formatRangeToParts`.

This lets us remove most data from our date-time format generator. All
that remains are time zone data and locale week info, which are relied
upon still for other interfaces. So they will be removed in a future
patch.

Note: All of the changes to the test files in this patch are now aligned
with other browsers. This includes:

* Some very incorrect formatting of Japanese symbols. (Looking at the
  old results now, it's very obvious they were wrong.)
* Old FIXMEs regarding range formatting not including the start/end date
  when only time fields were requested, but the dates differ.
* Day period inconsistencies.
2024-06-13 07:42:09 +02:00
Timothy Flynn
3b68bb6e73 LibJS: Store Intl mathematical values as strings when appropriate
The IntlMV is meant to be arbitrarily precise. If the user provides a
string value to be formatted, we lose precision by converting extremely
large values to a double. We were never able to address this, as support
for arbitrary precision was a big FIXME. But ICU can handle it by just
passing the raw string on through.
2024-06-10 13:51:51 +02:00
Timothy Flynn
f6bee0f5a8 LibJS+LibLocale: Replace number range formatting with ICU
This uses ICU for the Intl.NumberFormat `formatRange` and
`formatRangeToParts` prototypes.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
Timothy Flynn
67f3de2320 LibJS+LibLocale: Begin replacing number formatting with ICU
This uses ICU for the Intl.NumberFormat `format` and `formatToParts`
prototypes. It does not yet port the range formatter prototypes.

Most of the new code in LibLocale/NumberFormat is simply mapping from
ECMA-402 types to ICU types. Beyond that, the only algorithmic change is
that we have to mutate the output from ICU for `formatToParts` to match
what is expected by ECMA-402. This is explained in NumberFormat.cpp in
`flatten_partitions`.

This lets us remove most data from our number format generator. All that
remains are numbering system digits and symbols, which are relied upon
still for other interfaces (e.g. Intl.DateTimeFormat). So they will be
removed in a future patch.

Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
2024-06-10 13:51:51 +02:00
Timothy Flynn
9724a25daf LibJS+LibLocale: Replace canonical locales and display names with ICU
Note: We keep locale parsing and syntactic validation as-is. ECMA-402
places additional restrictions on locales above what is required by the
Unicode spec. ICU doesn't provide methods that let us easily check those
restrictions, whereas LibLocale does. Other browsers also implement
their own validators here.

This introduces a locale cache to re-use parsed locale data and various
related structures (not doing so has a non-negligible performance impact
on Intl tests).

The existing APIs for canonicalization and display names are pretty
intertwined, so they must both be adapted at once here. The results of
canonicalization are slightly different on some edge cases. But the
changed results are actually now aligned with Chrome and Safari.
2024-06-09 10:47:28 +02:00
Timothy Flynn
1d76738dde LibJS: Change Intl.Locale info APIs from property getters to methods
This is a normative change in the Intl Locale Info spec. See:
e550152
2023-11-13 20:10:58 +01:00
Timothy Flynn
a357874c77 LibJS: Implement Intl.Locale.prototype.firstDayOfWeek
This is a normative change in the Intl Locale Info spec. See:
f03a814
2023-11-13 20:10:58 +01:00
Timothy Flynn
38dd284915 LibLocale: Update to CLDR version 44.0.1
https://cldr.unicode.org/index/downloads/cldr-44

Notable changes that affect us include:

* The Islamic Calendar is now localized as the Hijri Calender (in en-US)
  but has not been updated for all locales. So this patch updates tests
  where possible and removes a few test cases that currently cannot be
  localized.

* The und locale has received more likely subtag data (the und locale is
  basically a pseudo-locale meaning "undetermined").

* The exponential symbol in the Arabic number system was changed from
  U+0627 to U+0623.
2023-11-06 08:31:56 -05:00
Timothy Flynn
eeb16f03bb LibLocale: Parse day-period hour cycle preferences
For example, the locale "fr-FR" will have the preferred hour cycle list
of "H hB", meaning h23 and h12-with-day-periods. Whether date-times are
actually formatted with day-periods is up to the user, but we need to
parse the hour cycle as h12 to know that the FR region supports h12.

This bug was revealed by LibJS no longer blindly falling back to h12 (if
the `hour12` option is true) or h24 (if the `hour12` option is false).
2023-10-05 17:01:02 +02:00
Timothy Flynn
05e080c4ba LibJS: Correctly resolve locale hour cycles in Intl.DateTimeFormat
This is a normative change in the ECMA-402 spec. See:
2f002b2
2023-10-05 17:01:02 +02:00
Timothy Flynn
39be5cb73a LibJS: Allow formatting UTC-offset time zones with Intl.DateTimeFormat
These are normative changes in the ECMA-402 spec. See:
896ffcc
af4ec46
e25c455

(This combines the above commits into one patch as they each do not work
on their own).
2023-10-05 17:01:02 +02:00
Timothy Flynn
ca0d926036 LibJS: Use decimal compact patterns for currency style sub-patterns
When formatting a currency style pattern with compact notation, we were
(trying to) doubly insert the currency symbol into the formatted string.
We would first look up the currency pattern in GetNumberFormatPattern
(for the en locale, this is "¤#,##0.00", which our generator transforms
to "{currency}{number}").

When we hit the "{number}" field, NumberFormat will do a second lookup
for the compact pattern to use for the number being formatted. By using
the currency compact patterns, we receive a second pattern that also has
the currency symbol (for the en locale, if formatting the number 1000,
this is "¤0K", which our generator transforms to
"{currency}{number}{compactIdentifier:0}". This second lookup is not
supposed to have currency symbols (or any other symbols), thus we hit a
VERIFY_NOT_REACHED().

Instead, we are meant to use the decimal compact pattern, and allow the
currency symbol to be handled by only the outer currency pattern.
2023-09-04 18:22:28 +02:00
Timothy Flynn
ea774111e8 LibJS: Raise the upper minimum/maximum fraction digit limit to 100
This is a normative change in the ECMA-402 spec. See:
f6d2945
2023-07-22 10:18:55 +02:00
Timothy Flynn
5cbf054651 LibUnicode: Fix typos causing text segmentation on mid-word punctuation
For example the words "can't" and "32.3" should not have boundaries
detected on the "'" and "." code points, respectively.

The String test cases fixed here are because "b'ar" is now considered
one word.
2023-02-15 12:36:47 +01:00
Timothy Flynn
bb4fda3b97 LibJS: Format the era of ISO year 0 as BC
This is a normative change in the ECMA-402 spec. See:
2034315
2023-02-02 12:12:26 +00:00
Timothy Flynn
e74e8381d5 LibJS: Allow "approximately" results to differ in plural form
This is a normative change in the Intl.NumberFormat V3 spec. See:
08f599b

Note that this didn't seem to actually affect our implementation. The
Unicode spec states:

https://www.unicode.org/reports/tr35/tr35-53/tr35-numbers.html#Plural_Ranges
"If there is no value for a <start,end> pair, the default result is end"

Therefore, our implementation did not have the behavior noted by the
issue this normative change addressed:

    const pr = new Intl.PluralRules("en-US");
    pr.selectRange(1, 1); // Is "other", should be "one"

Our implementation already returned "one" here because there is no such
<start=one, end=one> value in the CLDR for en-US. Thus, we already
returned the end value of "one".
2023-01-30 14:10:07 -05:00
Timothy Flynn
5b3b14be0a LibJS: Move resolution of some Intl.NumberFormat options to a common AO
This is a normative change in the Intl.NumberFormat V3 spec. See:
29acfc6

This is to allow Intl.PluralRules to use these options, as they were in-
effect required by later AOs anyways.
2023-01-30 12:19:14 -05:00
Timothy Flynn
d1881da2be LibJS: Set approximate number range format result's "source" to "shared"
This is a normative change in the Intl.NumberFormat v3 spec. See:
7510e7f
2023-01-14 19:12:48 +00:00
Timothy Flynn
a2cf026b30 LibJS: Throw a RangeError when when formatting strings in DurationFormat
This is a normative change in the Intl.DurationFormat proposal. See:
2546080
2022-12-15 09:40:09 +00:00
Timothy Flynn
675e5bfdce LibJS: Allow specifying only roundingIncrement in NumberFormat options
This is a normative change in the Intl.NumberFormat v3 spec. See:
a260aa3
2022-11-29 10:24:44 +01:00
Timothy Flynn
d56205f991 LibJS: Use more accurate number-to-string method in Intl.NumberFormat
Intl.NumberFormat only ever wants literal number-to-digits here, without
extra exponential formatting.
2022-11-04 21:12:10 +00:00
Timothy Flynn
a5bf32018f LibJS+LibUnicode: Add "microsecond" and "nanosecond" as sanctioned units
This is a normative change in the ECMA-402 spec. See:
f627573
2022-11-03 18:37:48 +00:00
Timothy Flynn
4686989582 LibJS: Map DurationFormat's list style to "short" when it is "digital"
This is a normative change in the Intl.DurationFormat proposal. See:
7495e32
2022-11-01 14:33:07 +00:00
Timothy Flynn
b077fccd3d LibLocale+LibJS: Update to CLDR version 42.0.0
There were some notable changes to the CLDR JSON format and data in this
release.

The patterns for a date at a specific time, i.e. "{date} at {time}", now
appear under the "atTime" attribute of the "dateTimeFormats" object.

Locale specific changes that affected test-js:

All locales:

* In many patterns, the code points U+00A0 (NO-BREAK SPACE) and U+202F
  (NARROW NO-BREAK SPACE) are now used in place of an ASCII space. For
  example, before the "dayPeriod" fields AM and PM.

* Separators such as U+2013 (EN DASH) are now surrounded by U+2009 (THIN
  SPACE) in place of an ASCII space character.

Locale "en":

* Narrow localizations of time formats are even more narrow. For
  example, the abbreviation "wk." for "week" is now just "wk".

Locale "ar":

* The code point U+060C (ARABIC COMMA) is now used in place of an ASCII
  comma.

* The code point U+200F (RIGHT-TO-LEFT MARK) now appears at the
  beginning of many localizations.

* When the "latn" numbering system is used for currency formatting, the
  currency symbol more consistently is placed at the end of the pattern.

Locale "he":

* The "many" plural rules category has been removed.

Locales "zh" and "es-419":

* Several display-name localizations were changed.
2022-10-25 10:10:39 +01:00
Timothy Flynn
82e730eba1 LibJS: Change default time display options to "always" for digital style
This is a normative change in the Intl.DurationFormat proposal. See:
d28076b
2022-09-22 14:39:24 +01:00
Timothy Flynn
60a6bae53d LibJS: Change digital default style from "narrow" to "short"
This is a normative change in the Intl.DurationFormat proposal. See:
4c24876
2022-09-21 16:09:38 +01:00
Timothy Flynn
887dac0929 LibJS: Handle NumberFormat's [[UseGrouping]] option for "true" / "false"
This is a normative change to the Intl NumberFormat V3 spec. See:
4751da5
2022-09-18 09:45:40 -04:00
Brian Gianforcaro
d0a1775369 Everywhere: Fix a variety of typos
Spelling fixes found by `codespell`.
2022-09-14 04:46:49 +00:00
Timothy Flynn
c477425b9b LibJS: Create DurationFormat's ListFormat object with type and style
This is a normative change in the Intl.DurationFormat spec. See:
1304e4b
2022-08-30 14:26:11 -04:00
Timothy Flynn
127b28c940 LibJS: Use numeric style if the previous style was numeric or 2-digit
This is a normative change in the Intl.DurationFormat proposal. See:
3a46ee3
2022-08-30 14:26:11 -04:00
Timothy Flynn
d57b92da09 LibJS: Default to "short" for DurationFormat's style option
This is a normative change in the Intl.DurationFormat proposal. See:
b289494
2022-08-30 14:26:11 -04:00
Timothy Flynn
765d016670 LibJS: Default to 0 for DurationFormat's fractionalDigits option
This is a normative change in the Intl.DurationFormat proposal. See:
ac7e184
2022-08-30 14:26:11 -04:00
Timothy Flynn
417a385db1 LibJS: Allow out-of-order plural ranges to be formatted
This is a normative change to the Intl NumberFormat V3 spec:
0c3d849
2022-07-26 10:46:08 -07:00
Timothy Flynn
fd7d97fba5 LibJS: Allow out-of-order number ranges to be formatted
This is a normative change to the Intl NumberFormat V3 spec:
0c3d849
2022-07-26 10:46:08 -07:00
Timothy Flynn
415742ab98 LibJS: Allow out-of-order date ranges to be formatted
This is a normative change to the Intl spec:
769df4b
2022-07-26 10:46:08 -07:00
Timothy Flynn
ae2acc8cdf LibJS+LibUnicode: Generate a set of default DateTimeFormat patterns
This isn't called out in TR-35, but before ICU even looks at CLDR data,
it adds a hard-coded set of default patterns to each locale's calendar.
It has done this since 2006 when its DateTimeFormat feature was first
created. Several test262 tests depend on this, which under ECMA-402,
falls into "implementation defined" behavior. For compatibility, we
can do the same in LibUnicode.
2022-07-22 23:51:56 +01:00