It's expensive to determine the system time zone from disk each time it
is requested. This makes LibUnicode cache the result, and provides an
API to clear that cache. This will let us set up a monitor to watch for
system time zone changes in platform-dependent ways.
This constructor has undergone a handful of editorial changes that we
fell behind on. But we weren't able to take the updates until now due to
a spec bug in those updates. See:
3f029b0
The result is that we can remove the inheritance of Intl::DateTimeFormat
from Unicode::DateTimeFormat; the former now contains the latter as an
internal slot.
We were never able to implement anything other than a basic, locale-
unaware collator with the JSON export of the CLDR as it did not have
collation data. We can now use ICU to implement collation.
ICU 72 began using non-ASCII spaces in some formatted date-time strings.
Every major browser has found that this introduced major breakage in web
compatibility, as many sites and tools expect ASCII spaces. This patch
removes these non-ASCII spaces in the same manner as the major engines.
Such behavior is also tested by WPT.
The changes to tests are due to LibTimeZone incorrectly interpreting
time stamps in the TZDB. The TZDB will list zone transitions in either
UTC or the zone's local time (which is then subject to DST offsets).
LibTimeZone did not handle the latter at all.
For example:
The following rule is in effect until November 18, 6PM UTC.
America/Chicago -5:50:36 - LMT 1883 Nov 18 18:00u
The following rule is in effect until March 1, 2AM in Chicago time. But
at that time, a DST transition occurs, so the local time is actually
3AM.
America/Chicago -6:00 Chicago C%sT 1936 Mar 1 2:00
This required updating some LibJS spec steps to their latest versions,
as the data expected by the old steps does not quite match the APIs that
are available with the ICU. The new spec steps are much more aligned.
LibLocale was split off from LibUnicode a couple years ago to reduce the
number of applications on SerenityOS that depend on CLDR data. Now that
we use ICU, both LibUnicode and LibLocale are actually linking in this
data. And since vcpkg gives us static libraries, both libraries are over
30MB in size.
This patch reverts the separation and merges LibLocale into LibUnicode
again. We now have just one library that includes the ICU data.
Further, this will let LibUnicode share the locale cache that previously
would only exist in LibLocale.
For SerenityOS, we parse emoji metadata from the UCD to learn emoji
groups, subgroups, names, etc. We used this information only in the
emoji picker dialog. It is entirely unused within Ladybird.
This removes our dependence on the UCD emoji file, as we no longer
need any of its information. All we need to know is the file path to
our custom emoji, which we get from Meta/emoji-file-list.txt.
There are a couple of differences here due to using ICU:
1. Titlecasing behaves slightly differently. We previously transformed
"123dollars" to "123Dollars", as we would use word segmentation to
split a string into words, then transform the first cased character
to titlecase. ICU doesn't go quite that far, and leaves the string
as "123dollars". While this is a behavior change, the only user of
this API is the `text-transform: capitalize;` CSS rule, and we now
match the behavior of other browsers.
2. There isn't an API to compare strings with case insensitivity without
allocating case-folded strings for both the left- and right-hand-side
strings. Our implementation was previously allocation-free; however,
in a benchmark, ICU is still ~1.4x faster.
These allow us to binary search the code point compositions based on
the first code point being combined, which makes the search close to
O(log N) instead of O(N).
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')