Commit graph

281 commits

Author SHA1 Message Date
Timothy Flynn
70ede2825e LibUnicode: Use BCP 47 data to filter valid calendar names 2022-02-16 07:23:07 -05:00
Timothy Flynn
71d86261c3 LibUnicode: Use BCP 47 data to filter valid numbering system names
There isn't too much of an effective difference here other than that the
BCP 47 data contains some aliases we would otherwise not handle.
2022-02-16 07:23:07 -05:00
Timothy Flynn
63c3437274 LibUnicode: Use BCP 47 data to generate available calendars and numbers
BCP 47 will be the single source of truth for known calendar and number
system keywords, and their aliases (e.g. "gregory" is an alias for
"gregorian"). Move the generation of available keywords to where we
parse the BCP 47 data, so that hard-coded aliases may be removed from
other generators.
2022-02-16 07:23:07 -05:00
Timothy Flynn
89ead8c00a LibJS+LibUnicode: Parse Unicode keywords from the BCP 47 CLDR package
We have a fair amount of hard-coded keywords / aliases that can now be
replaced with real data from BCP 47. As a result, the also changes the
awkward way we were previously generating keys. Before, we were more or
less generating keywords as a CSV list of keys, e.g. for the "nu" key,
we'd generate "latn,arab,grek" (ordered by locale preference). Then at
runtime, we'd split on the comma. We now just generate spans of keywords
directly.
2022-02-16 07:23:07 -05:00
Timothy Flynn
d0fc61e79b LibUnicode: Extract the BCP 47 package from the CLDR
This package was originally meant to be included in CLDR version 40, but
was missed in their release scripts. This has been resolved:
https://unicode-org.atlassian.net/browse/CLDR-15158

Unfortunately, the CLDR was re-released with the same version number. So
to bust the build's CLDR cache, change the "version" used to detect that
we need to redownload the CLDR.
2022-02-16 07:23:07 -05:00
thankyouverycool
0505e031f1 Meta+LibUnicode: Download and parse Unicode block properties
This parses Blocks.txt for CharacterType properties and creates
a global display array for use in apps.
2022-02-15 10:13:19 -05:00
Timothy Flynn
b52e592eac LibUnicode: Port the CLDR time format generator to the stream API 2022-02-14 11:39:46 -05:00
Timothy Flynn
ca3bcf201f LibUnicode: Port the CLDR date format generator to the stream API 2022-02-14 11:39:46 -05:00
Timothy Flynn
f39540876b LibUnicode: Port the CLDR number format generator to the stream API 2022-02-14 11:39:46 -05:00
Timothy Flynn
a338e9403b LibUnicode: Port the CLDR locale generator to the stream API
This adds a generator utility to read an entire file and parse it as a
JSON value. This is heavily used by the CLDR generators. The idea here
is to put the file reading details in the utility so that when we have a
good story for generically reading an entire stream in LibCore, we can
update the generators to use that by only touching this helper.
2022-02-14 11:39:46 -05:00
Timothy Flynn
a64a7940e4 LibUnicode: Port the UCD generator to the stream API 2022-02-14 11:39:46 -05:00
Timothy Flynn
9327c2233f LibTimeZone: Port the TZDB generator to the stream API
This also moves the open_file helper to the utility file. It's currently
a lambda redefined in each TZDB/Unicode generator. It used to display
the missing command line flag and other info local to each generator.
After switching to LibMain, it just returns a generic error message, and
is duplicated several times.
2022-02-14 11:39:46 -05:00
Idan Horowitz
2d50c08f34 LibUnicode: Download and parse {Grapheme,Word,Sentence} break props 2022-01-31 21:05:04 +02:00
Timothy Flynn
6efbafa6e0 Everywhere: Update copyrights with my new serenityos.org e-mail :^) 2022-01-31 18:23:22 +00:00
Timothy Flynn
bb0f548614 LibUnicode: Generate a list of available currencies 2022-01-31 00:32:41 +00:00
Timothy Flynn
481ced53d8 LibUnicode: Generate a list of available numbering systems 2022-01-31 00:32:41 +00:00
Timothy Flynn
ebd33e580b LibUnicode: Generate a list of available calendars 2022-01-31 00:32:41 +00:00
Timothy Flynn
4d43aeae30 LibUnicode: Fill in case-first and numeric BCP47 keywords
Unlike other BCP47 keywords that we are parsing, these only appear in
the BCP47 XML file itself within the CLDR. The values are very simple
though, so just hard code them until the Unicode org re-releases the
CLDR with BCP47: https://unicode-org.atlassian.net/browse/CLDR-15158
2022-01-29 20:27:24 +00:00
Timothy Flynn
789f093b2e LibUnicode: Parse and generate relative-time format patterns
Relative-time format patterns are of one of two forms:

    * Tensed - refer to the past or the future, e.g. "N years ago" or
      "in N years".
    * Numbered - refer to a specific numeric value, e.g. "in 1 year"
      becomes "next year" and "in 0 years" becomes "this year".

In ECMA-402, tensed and numbered refer to the numeric formatting options
of "always" and "auto", respectively.
2022-01-27 21:16:44 +00:00
Timothy Flynn
27eda77c97 LibUnicode: Create a nearly empty generator for relative-time formatting
This sets up the generator plumbing to create the relative-time data
files. This data could probably be included in the date-time generator,
but that generator is large enough that I'd rather put this tangentially
related data in its own file.
2022-01-27 21:16:44 +00:00
Timothy Flynn
589e7354fb LibUnicode: Remove extraneous semi-colons at end of generator functions 2022-01-27 21:16:44 +00:00
Timothy Flynn
2d2f713426 LibUnicode: Generate per-locale minimum grouping digit values
Previously, we were breaking up digits into groups without regard for
the locale's minimumGroupingDigits value in the CLDR. This value is 1 in
most locales, but is 2 in locales such as pl-PL. What this means is that
in those locales, the group separator should only be inserted if the
thousands group has at least 2 digits. So 1000 is formatted as "1,000"
in en-US, but "1000" in pl-PL. And 10000 is "10,000" in en-US and
"10 000" in pl-PL.
2022-01-27 20:30:52 +00:00
Timothy Flynn
bced4e9324 LibJS+LibUnicode: Convert Intl.ListFormat to use Unicode::Style
Remove ListFormat's own definition of the Style enum, which was further
duplicated by a generated ListPatternStyle enum with the same values.
2022-01-25 19:02:59 +00:00
Timothy Flynn
4400150cd2 LibJS+LibUnicode: Return the appropriate time zone name depending on DST 2022-01-19 21:20:41 +00:00
Timothy Flynn
bf677eb485 LibUnicode: Generate both standard and daylight time zone names
While LibTimeZone didn't support DST, we only generated one of them,
preferring the standard name. Now that DST can be tested, generate both
names.
2022-01-19 21:20:41 +00:00
Timothy Flynn
701b7810ba LibUnicode: Generate code point abbreviations 2022-01-18 15:13:25 +00:00
Idan Horowitz
877ae85017 LibJS+LibUnicode: Make static const Utf8View variables constexpr 2022-01-17 14:46:07 +00:00
Timothy Flynn
c86f7a675d LibUnicode: Do not limit language display names to known locales
Currently, the UnicodeLocale generator collects a list of known locales
from the CLDR before processing language display names. For each locale,
the identifier is broken into language, script, and region subtags, and
we create a list of seen languages. When processing display names, we
skip languages we hadn't seen in that first step.

This is insufficient for language display names like "en-GB", which do
not have an locale entry in the CLDR, and thus are skipped. So instead,
create the list of known languages by actually reading through the list
of languages which have a display name.
2022-01-13 23:05:31 +01:00
Timothy Flynn
91acc2e9c5 LibUnicode: Parse and generate locale display patterns
These patterns indicate how to display locale strings when that locale
contains multiple subtags. For example, "en-US" would be displayed as
"English (United States)".
2022-01-13 23:05:31 +01:00
Timothy Flynn
0d75949827 LibUnicode: Parse and generate locale display names for date fields 2022-01-13 13:43:57 +01:00
Timothy Flynn
7f162c471d LibUnicode: Parse and generate locale display names for calendars
Note there's a bit of an unfortunate duplication in the calendar enum
generated by UnicodeLocale and the existing enum generated by
UnicodeDateTimeFormat. The former contains every calendar known to the
CLDR, whereas the latter contains the calendars we've actually parsed
for DateTimeFormat (currently only Gregorian). The new enum generated
here can be removed once DateTimeFormat knows about all calendars.
2022-01-13 13:43:57 +01:00
Timothy Flynn
bdf02c21e1 LibUnicode: Swap the preferred order of standard time zone display names
Our generator is currently preferring the DST variant of the time zone
display names over the non-DST variant. LibTimeZone currently does not
have DST support, and operates in a mode that basically assumes DST does
not exist. Swap the display names for now just to be consistent until we
have DST support.

Note we will need to generate both of these variants and select the
appropriate one at runtime once we have DST support.
2022-01-12 15:43:12 +01:00
Timothy Flynn
0d8120eeb2 LibUnicode: Perform number system lookups by enumeration value
Now that number systems are generated as an enum, we can generated the
number system data in the order of that enum. This lets us perform
lookups of that data by index instead of a loop of string comparisons.
2022-01-12 10:49:07 +01:00
Timothy Flynn
c5138f0f2b LibUnicode: Parse number system digits from the CLDR
We had a hard-coded table of number system digits copied from ECMA-402.
Turns out these digits are in the CLDR, so let's parse the digits from
there instead of hard-coding them.
2022-01-12 10:49:07 +01:00
Timothy Flynn
e2dfbe8f67 LibUnicode: Parse and generate long and short generic time zone names
This implements the CalendarPatternStyle::{Long,Short}Generic styles of
time zone name formatting.
2022-01-11 23:56:35 +01:00
Timothy Flynn
8d35563f28 LibUnicode: Implement TR-35's localized GMT offset formatting
This adds an API to use LibTimeZone to convert a time zone such as
"America/New_York" to a GMT offset string like "GMT-5" (short form) or
"GMT-05:00" (long form).
2022-01-11 23:56:35 +01:00
Timothy Flynn
1c2c98ac5d LibTimeZone: Add method to convert a time zone to a string 2022-01-11 00:36:45 +01:00
Timothy Flynn
b543c3e490 Meta: Don't assume how each generator wants to generate keyed map names
The generate_mapping helper generates a series of structs like:

    Array<SomeType, 1> s_mapping_key_0 {};
    Array<SomeType, 2> s_mapping_key_1 {};
    Array<SomeType, 3> s_mapping_key_2 {};
    Array<Span<SomeType const>> s_mapping { {
        s_mapping_key_0.span(),
        s_mapping_key_1.span(),
        s_mapping_key_2.span(),
    } };

Where the names of the struct were generated by the format_mapping_name
lambda inside the helper. Rather than this lambda making assumptions on
how each generator wants to name its structs, add a parameter for the
caller to provide a naming formatter.

This is because the TimeZoneData generator will want pretty specific
identifier formatting rules.
2022-01-11 00:36:45 +01:00
Timothy Flynn
6da1bfeeea Meta: Support generating case-insensitive value-from-string methods
This also extracts the default parameters for generate_value_from_string
to a structure. This is just to make it cleaner to add new options.
2022-01-11 00:36:45 +01:00
Timothy Flynn
498b741434 LibUnicode: Use LibTimeZone's list of time zone names
LibUnicode no longer needs to generate a list of time zone names that it
parsed from metaZones.json. We can defer to the TZDB for a golden list
of time zones.
2022-01-08 12:45:34 +01:00
Timothy Flynn
ca9123f66f LibUnicode: Rename DateTimeFormat's generator's TimeZone struct
Before using LibTimeZone within LibUnicode, rename this structure to
avoid naming conflicts with the TimeZone namespace.
2022-01-08 12:45:34 +01:00
mjz19910
10ec98dd38 Everywhere: Fix spelling mistakes 2022-01-07 15:44:42 +01:00
Timothy Flynn
6d7d9dd324 LibUnicode: Do not assume time zones & meta zones have a 1-to-1 mapping
The generator parses metaZones.json to form a mapping of meta zones to
time zones (AKA "golden zone" in TR-35). This parser errantly assumed
this was a 1-to-1 mapping.
2022-01-06 22:28:01 +01:00
Timothy Flynn
62d8d1fdfd LibUnicode: Move UTC verification to the scope that requires it
In Unicode::get_time_zone_name(), we don't need to require that the time
zone is UTC for long- and short-style name lookups. This is required for
other styles, because they will depend on TZDB data - so move the VERIFY
to that scope.
2022-01-06 22:28:01 +01:00
Timothy Flynn
ec7d5351ed LibJS+LibUnicode: Handle flexible day periods that roll over midnight
When searching for the locale-specific flexible day period for a given
hour, we were neglecting to handle cases where the period crosses 00:00.
For example, the en locale defines a day period range of [21:00, 06:00).
When given the hour of 05:00, we were checking if (21 <= 5 && 5 < 6),
thus not recognizing that the hour falls in that period.
2022-01-05 16:22:55 +01:00
Timothy Flynn
dd88ff70ac LibUnicode: Remove now unused value-from-string generator overload 2022-01-04 22:49:43 +00:00
Timothy Flynn
437b9fe204 LibUnicode: Convert UnicodeData to link with weak symbols 2022-01-04 22:49:43 +00:00
Timothy Flynn
f576142fe8 LibJS+LibUnicode: Convert UnicodeLocale to link with weak symbols 2022-01-04 22:49:43 +00:00
Timothy Flynn
cf8e11a562 LibUnicode: Add temporary overload of value-from-string generator
This is a temporary mechanism while LibUnicode is in an in-between state
where some symbols are weakly linked and others are dynamically loaded.
The latter require an asm() label to be loaded.
2022-01-04 22:49:43 +00:00
Timothy Flynn
ba4cdf34f8 LibUnicode: Convert UnicodeDateTimeFormat to link with weak symbols 2022-01-04 22:49:43 +00:00