This includes:
* The minimum number of days in a week for that week to count as the
first week of a new year.
* The day to be shown as the first day of the week in a calendar.
* The start/end days of the weekend.
Like the existing hour cycle data, week data is presented per-region in
the CLDR, rather than per-locale. The method to add likely subtags to a
locale to perform region lookups is the same.
The list of regions in the CLDR for hour cycle, minimum days, first day,
and weekend days are quite different. So rather than changing the
existing HourCycleRegion enum to a generic Region enum, we generate
separate enums for each of the week data fields. This allows each lookup
into these fields to remain simple array-based index access, without any
"jumps" for regions that don't have CLDR data for a field.
This commit has no behavior changes.
In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
BCP 47 will be the single source of truth for known calendar and number
system keywords, and their aliases (e.g. "gregory" is an alias for
"gregorian"). Move the generation of available keywords to where we
parse the BCP 47 data, so that hard-coded aliases may be removed from
other generators.
The following table in TR-35 includes a web of fall back rules when the
requested time zone style is unavailable:
https://unicode.org/reports/tr35/tr35-dates.html#dfst-zone
Conveniently, the subset of styles supported by ECMA-402 (and therefore
LibUnicode) all either fall back to GMT offset or to a style that is
unsupported but itself falls back to GMT offset.
This adds an API to use LibTimeZone to convert a time zone such as
"America/New_York" to a GMT offset string like "GMT-5" (short form) or
"GMT-05:00" (long form).
LibUnicode no longer needs to generate a list of time zone names that it
parsed from metaZones.json. We can defer to the TZDB for a golden list
of time zones.
ECMA-402 now supports short-offset, long-offset, short-generic, and
long-generic time zone name formatting. For example, in the en-US locale
the America/Eastern time zone would be formatted as:
short-offset: GMT-5
long-offset: GMT-05:00
short-generic: ET
long-generic: Eastern Time
We currently only support the UTC time zone, however. Therefore, this
very minimal implementation does not consider GMT offset or generic
display names. Instead, the CLDR defines specific strings for UTC.
The parsing in parse_calendar_symbols() might be a bit more verbose than
it really needs to be, but it is to ensure the symbols are generated in
a known order that we can control with enumerations.
Unlike most data in the CLDR, hour cycles are not stored on a per-locale
basis. Instead, they are keyed by a string that is usually a region, but
sometimes is a locale. Therefore, given a locale, to determine the hour
cycles for that locale, we:
1. Check if the locale itself is assigned hour cycles.
2. If the locale has a region, check if that region is assigned hour
cycles.
3. Otherwise, maximize that locale, and if the maximized locale has
a region, check if that region is assigned hour cycles.
4. If the above all fail, fallback to the "001" region.
Further, each locale's default hour cycle is the first assigned hour
cycle.