315 commits

Author SHA1 Message Date
Timothy Flynn
448754d95d LibWeb: Only fire keypress events if the key press produced a character
For example, pressing just the shift key should not produce a keypress
event.
2024-10-09 19:10:02 +02:00
Timothy Flynn
5eda629326 LibUnicode: Remove unused emoji support methods 2024-09-06 15:42:33 -04:00
Aliaksandr Kalenik
4e9d6a543a Everywhere: Remove bitmap emojis inherited from SerenityOS
These are no longer used since we switched to using the system emoji
font.
2024-09-06 08:30:30 -04:00
Timothy Flynn
3e116769fb LibUnicode: Add code point GC queries for Punctuation and Separator 2024-09-06 07:42:59 +02:00
Timothy Flynn
d392c38a73 LibUnicode: Cache all created icu::TimeZone objects
This cache works exactly the same as the existing icu::Locale cache.
2024-09-03 19:26:04 +02:00
Timothy Flynn
fa4b324a12 LibUnicode: Remove unused time zone cache option
Added this during development while testing some callers, but forgot to
remove it before opening a PR.
2024-08-26 11:37:02 -04:00
Timothy Flynn
b31c11bca5 LibUnicode: Cache the system time zone
It's expensive to determine the system time zone from disk each time it
is requested. This makes LibUnicode cache the result, and provides an
API to clear that cache. This will let us set up a monitor to watch for
system time zone changes in platform-dependent ways.
2024-08-25 09:47:42 +02:00
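The caching pattern described in b31c11bca5 can be sketched roughly as follows. This is only an illustration in Python, not LibUnicode's C++ code: the class name `SystemTimeZoneCache` and the `loader` callback are hypothetical, standing in for whatever expensive lookup reads the zone from disk.

```python
from typing import Callable, Optional


class SystemTimeZoneCache:
    """Hypothetical cache: load the zone name once, reuse it until cleared."""

    def __init__(self, loader: Callable[[], str]) -> None:
        # `loader` is an assumed stand-in for the expensive on-disk lookup.
        self._loader = loader
        self._cached: Optional[str] = None

    def current(self) -> str:
        # Only invoke the loader when nothing is cached yet.
        if self._cached is None:
            self._cached = self._loader()
        return self._cached

    def clear(self) -> None:
        # A platform-specific monitor could call this when the zone changes.
        self._cached = None
```

A monitor that notices a system time zone change would call `clear()`, so the next `current()` call reloads the value.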
Timothy Flynn
c7dd4afd9c LibJS+LibUnicode: Update the Intl.DateTimeFormat constructor spec steps
This constructor has undergone a handful of editorial changes that we
fell behind on. But we weren't able to take the updates until now due to
a spec bug in those updates. See:
3f029b0

The result is that we can remove the inheritance of Intl::DateTimeFormat
from Unicode::DateTimeFormat; the former now contains the latter as an
internal slot.
2024-08-15 17:21:00 -04:00
Timothy Flynn
ca1257c6f9 LibJS+LibUnicode: Make the collation sensitivity default locale-aware
Note this happens to be 'variant' for every locale currently.
2024-08-15 13:44:32 +02:00
Timothy Flynn
78625c746d LibJS+LibUnicode: Make the collation punctuation default locale-aware 2024-08-15 13:44:32 +02:00
Timothy Flynn
eb7e3583c9 LibJS+LibUnicode: Fully implement Intl.Collator with ICU
We were never able to implement anything other than a basic, locale-
unaware collator with the JSON export of the CLDR as it did not have
collation data. We can now use ICU to implement collation.
2024-08-15 13:44:32 +02:00
Timothy Flynn
ee00730225 LibUnicode+LibJS: Normalize spaces in formatted date-time strings
ICU 72 began using non-ASCII spaces in some formatted date-time strings.
Every major browser has found that this introduced major breakage in web
compatibility, as many sites and tools expect ASCII spaces. This patch
removes these non-ASCII spaces in the same manner as the major engines.
Such behavior is also tested by WPT.
2024-08-02 08:05:52 +02:00
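A minimal sketch of the space normalization described in ee00730225, written in Python for illustration. The exact set of characters the patch replaces is not stated in the commit message; U+202F (narrow no-break space) and U+2009 (thin space) are assumed here, and the function name is hypothetical.

```python
# Assumed set of non-ASCII spaces; the real patch may handle a different set.
NON_ASCII_SPACES = {"\u202f": " ", "\u2009": " "}

_TRANSLATION = str.maketrans(NON_ASCII_SPACES)


def normalize_date_time_spaces(formatted: str) -> str:
    """Replace the assumed non-ASCII spaces with a plain ASCII space."""
    return formatted.translate(_TRANSLATION)
```

For instance, `normalize_date_time_spaces("3:45\u202fPM")` yields a string with an ordinary space before `PM`, which is what sites parsing formatted times with ASCII-only patterns tend to expect.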
Andrew Kaster
45301e8169 Everywhere: Remove AK_DONT_REPLACE_STD macro
Let's just always include `<utility>`. Placing our own STL-incompatible
declarations of these functions in AK was always fishy to begin with.
2024-07-30 18:38:02 -06:00
Andrew Kaster
2fa9ec20bd LibUnicode: Prefix AK::Duration with AK Namespace 2024-07-18 09:43:38 +01:00
Andrew Kaster
bd97442771 Meta: Add vulkan and vulkan-headers to vcpkg dependencies
Also require a specific ICU version to not run into unexpected problems.
2024-07-06 01:44:58 +02:00
Timothy Flynn
672a555f98 LibCore+LibJS+LibUnicode: Port retrieving time zone offsets to ICU
The changes to tests are due to LibTimeZone incorrectly interpreting
time stamps in the TZDB. The TZDB will list zone transitions in either
UTC or the zone's local time (which is then subject to DST offsets).
LibTimeZone did not handle the latter at all.

For example:

The following rule is in effect until November 18, 6PM UTC.

    America/Chicago -5:50:36 - LMT 1883 Nov 18 18:00u

The following rule is in effect until March 1, 2AM in Chicago time. But
at that time, a DST transition occurs, so the local time is actually
3AM.

    America/Chicago -6:00 Chicago C%sT 1936 Mar 1 2:00
2024-06-26 10:14:02 +02:00
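The distinction drawn in 672a555f98 between UTC and local-time transition stamps can be sketched as below. This is an illustrative Python helper, not LibTimeZone or ICU code: `until_to_utc` is a hypothetical name, and the offsets passed in are assumed to be the ones in effect just before the transition. The suffix letters follow the TZDB convention (`u`, `g`, `z` for UTC, `s` for local standard time, `w` or no suffix for local wall-clock time).

```python
from datetime import datetime, timedelta


def until_to_utc(until: datetime, suffix: str,
                 std_offset: timedelta, dst_save: timedelta) -> datetime:
    """Convert a TZDB UNTIL stamp to UTC (hypothetical helper)."""
    if suffix in ("u", "g", "z"):
        # Already expressed in UTC.
        return until
    if suffix == "s":
        # Local standard time: undo only the standard offset.
        return until - std_offset
    if suffix in ("w", ""):
        # Local wall-clock time: undo the standard offset and any DST saving.
        return until - std_offset - dst_save
    raise ValueError(f"unknown time suffix: {suffix!r}")


# "1883 Nov 18 18:00u" is a UTC stamp and is used unchanged.
lmt_until = until_to_utc(datetime(1883, 11, 18, 18, 0), "u",
                         timedelta(hours=-5, minutes=-50, seconds=-36),
                         timedelta(0))

# "1936 Mar 1 2:00" has no suffix, so it is wall-clock time at UTC-6.
cst_until = until_to_utc(datetime(1936, 3, 1, 2, 0), "",
                         timedelta(hours=-6), timedelta(0))
```

Treating the second stamp as if it were UTC would place the transition six hours early, which is the kind of error the commit message attributes to LibTimeZone.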
Timothy Flynn
1b2d47e6bb LibJS+LibUnicode: Port retrieving available regional time zones to ICU 2024-06-26 10:14:02 +02:00
Timothy Flynn
4fc0fba646 LibCore+LibJS+LibUnicode: Port retrieving available time zones to ICU
This required updating some LibJS spec steps to their latest versions,
as the data expected by the old steps does not quite match the APIs that
are available with the ICU. The new spec steps are much more aligned.
2024-06-26 10:14:02 +02:00
Timothy Flynn
d3e809bcd4 LibJS+LibUnicode: Port retrieving the system time zone to ICU 2024-06-26 10:14:02 +02:00
Timothy Flynn
c379b35798 LibUnicode: Move helper to convert StringEnumeration to a list to ICU.h
This will be needed outside of UnicodeKeywords.cpp.
2024-06-26 10:14:02 +02:00
Andrew Kaster
a587eafbf4 CMake: Consistently use imported targets for third party dependencies 2024-06-25 17:15:42 -04:00
Timothy Flynn
ebdb92eef6 LibUnicode+Everywhere: Merge LibLocale back into LibUnicode
LibLocale was split off from LibUnicode a couple years ago to reduce the
number of applications on SerenityOS that depend on CLDR data. Now that
we use ICU, both LibUnicode and LibLocale are actually linking in this
data. And since vcpkg gives us static libraries, both libraries are over
30MB in size.

This patch reverts the separation and merges LibLocale into LibUnicode
again. We now have just one library that includes the ICU data.

Further, this will let LibUnicode share the locale cache that previously
would only exist in LibLocale.
2024-06-23 19:52:45 +02:00
Timothy Flynn
9220a89d2f CI+LibUnicode: Remove the UCD from the system 2024-06-22 14:56:39 +02:00
Timothy Flynn
2ba7b4c529 LibUnicode: Remove now-unused code generator facilities 2024-06-22 14:56:39 +02:00
Timothy Flynn
069bed5d47 LibUnicode+LibGfx: Remove superfluous emoji metadata
For SerenityOS, we parse emoji metadata from the UCD to learn emoji
groups, subgroups, names, etc. We used this information only in the
emoji picker dialog. It is entirely unused within Ladybird.

This removes our dependence on the UCD emoji file, as we no longer
need any of its information. All we need to know is the file path to
our custom emoji, which we get from Meta/emoji-file-list.txt.
2024-06-22 14:56:39 +02:00
Timothy Flynn
aa3a30870b LibUnicode: Replace code point bidirectional classes with ICU 2024-06-22 14:56:39 +02:00
Timothy Flynn
e77dafc987 LibUnicode: Replace code point scripts and script extensions with ICU 2024-06-22 14:56:39 +02:00
Timothy Flynn
986ff984cc LibUnicode: Replace code point general categories with ICU 2024-06-22 14:56:39 +02:00
Timothy Flynn
c804bda5fd LibUnicode: Replace code point properties with ICU 2024-06-22 14:56:39 +02:00
Timothy Flynn
ab56b8c8dc LibUnicode: Remove the locale-unaware text segmentation implementation 2024-06-20 13:46:54 +02:00
Timothy Flynn
5cf818e305 LibUnicode: Replace case transformations and comparison with ICU's
There are a couple of differences here due to using ICU:

1. Titlecasing behaves slightly differently. We previously transformed
   "123dollars" to "123Dollars", as we would use word segmentation to
   split a string into words, then transform the first cased character
   to titlecase. ICU doesn't go quite that far, and leaves the string
   as "123dollars". While this is a behavior change, the only user of
   this API is the `text-transform: capitalize;` CSS rule, and we now
   match the behavior of other browsers.

2. There isn't an API to compare strings with case insensitivity without
   allocating case-folded strings for both the left- and right-hand-side
   strings. Our implementation was previously allocation-free; however,
   in a benchmark, ICU is still ~1.4x faster.
2024-06-20 10:59:55 +02:00
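Point 2 of 5cf818e305 describes comparing strings by allocating a case-folded copy of each side. A rough Python analogue of that approach, using the standard `str.casefold` method rather than ICU, might look like this; the function name is hypothetical.

```python
def equals_ignoring_case(lhs: str, rhs: str) -> bool:
    """Compare two strings by case-folding both sides first.

    Each call allocates two folded copies, which mirrors the trade-off
    described in the commit message.
    """
    return lhs.casefold() == rhs.casefold()
```

Case folding is more than lowercasing: `equals_ignoring_case("Straße", "STRASSE")` is true because the sharp s folds to `ss`.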
Timothy Flynn
8d7216f4e0 LibUnicode: Replace IDNA ASCII conversion with ICU 2024-06-18 21:07:56 +02:00
Timothy Flynn
83475c5380 LibUnicode: Replace Unicode string normalization with ICU
In a benchmark, ICU's implementation was over 3x faster than ours.
2024-06-18 21:07:56 +02:00
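For readers unfamiliar with what string normalization does, the following Python snippet shows the canonical forms on a single character. It uses the standard `unicodedata` module purely as an illustration and says nothing about how ICU or the previous LibUnicode implementation work internally.

```python
import unicodedata

# NFD splits a precomposed character into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", "\u00e9")

# NFC recombines the sequence into the precomposed character.
composed = unicodedata.normalize("NFC", "e\u0301")
```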
Timothy Flynn
1feef17bf7 LibUnicode: Remove completely unused code point name & block name data
These were used for e.g. the Character Map on Serenity, but are not used
at all for Ladybird.
2024-06-18 21:07:56 +02:00
Timothy Flynn
fe3fde2411 AK+LibUnicode: Implement a case-insensitive variant of find_byte_offset
The existing String::find_byte_offset is case-sensitive. This variant
allows performing searches using Unicode-aware case folding.
2024-06-01 07:37:54 +02:00
Andreas Kling
df547bb321 LibUnicode: Avoid redundant UTF-8 validation in AK::String helpers 2024-04-21 19:32:49 +02:00
Idan Horowitz
945c58c7c1 LibUnicode: Generate and use code point composition mappings
These allow us to binary search the code point compositions based on
the first code point being combined, which makes the search close to
O(log N) instead of O(N).
2024-04-06 14:21:04 -04:00
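The lookup strategy in 945c58c7c1 can be sketched as follows: keep the composition mappings sorted by first code point, binary search for that code point, then scan the short run of entries that share it. The table below is a tiny hand-written sample and the names are hypothetical; the real generated data and its layout are not shown in the commit message.

```python
from bisect import bisect_left
from typing import Optional

# Sample (first, second, composed) entries, sorted by first code point.
COMPOSITIONS = sorted([
    (0x0041, 0x0300, 0x00C0),  # A + combining grave -> À
    (0x0041, 0x0301, 0x00C1),  # A + combining acute -> Á
    (0x0065, 0x0301, 0x00E9),  # e + combining acute -> é
])
FIRST_CODE_POINTS = [entry[0] for entry in COMPOSITIONS]


def compose(first: int, second: int) -> Optional[int]:
    """Return the composed code point, or None if the pair has no mapping."""
    # Binary search for the first entry whose first code point matches.
    index = bisect_left(FIRST_CODE_POINTS, first)
    # Scan only the entries that share that first code point.
    while index < len(COMPOSITIONS) and COMPOSITIONS[index][0] == first:
        if COMPOSITIONS[index][1] == second:
            return COMPOSITIONS[index][2]
        index += 1
    return None
```

Because each first code point has only a handful of possible second code points, the scan after the binary search stays short, which is why the overall cost is close to O(log N).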
Idan Horowitz
e227bf0f71 LibUnicode: Optimize the canonical composition algorithm implementation
It now takes O(N) time instead of O(N^2) time. Additionally some always
false conditions are removed.
2024-04-06 14:21:04 -04:00
Timothy Flynn
576c2f4f4d LibURL+LibUnicode+LibWebView: Handle punycode directly in LibURL
We had defined punycode handling in LibUnicode when LibURL (AK at the
time) was unable to depend on LibUnicode. This is no longer the case.
2024-03-26 12:25:21 -04:00
Shannon Booth
e800605ad3 AK+LibURL: Move AK::URL into a new URL library
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.

This change has two main benefits:
 * Moving AK back more towards being an agnostic library that can
   be used between the kernel and userspace. URL has never really fit
   that description - and is not used in the kernel.
 * URL _should_ depend on LibUnicode, as it needs punycode support.
   However, it's not really possible to do this inside of AK as it can't
   depend on any external library. This change brings us a little closer
   to being able to do that, but unfortunately we aren't there quite
   yet, as the code generators depend on LibCore.
2024-03-18 14:06:28 -04:00
Timothy Flynn
aa0a6d58b2 Userland: Remove LibCore dependency from libraries that do not use it 2024-01-22 08:48:34 -05:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Timothy Flynn
43e9dc0500 LibUnicode: Use weak symbols to provide default IDNA definitions
Rather than using #ifdef blocks, update the fallback IDNA definitions to
use weak symbols to match the rest of LibUnicode / LibLocale.
2023-12-10 10:19:14 -05:00
Timothy Flynn
1f0e24bc3b LibUnicode: Fix compilation when ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF 2023-12-10 10:19:14 -05:00
Simon Wanner
58f08107b0 AK+LibUnicode: Add Unicode::create_unicode_url
This is a workaround for the fact that AK::URLParser can't call into
LibUnicode directly.
2023-12-10 08:04:58 -05:00
Simon Wanner
5bcb019106 LibUnicode: Add IDNA::to_ascii
This implements the ToASCII operation of Unicode Technical Standard 46
2023-12-10 08:04:58 -05:00
Simon Wanner
7d9fe44039 LibUnicode: Download and parse IDNA data 2023-12-10 08:04:58 -05:00
Simon Wanner
cfd0a60863 LibUnicode: Add Punycode::encode 2023-12-10 08:04:58 -05:00
Simon Wanner
299d35aadc LibUnicode: Add Punycode::decode 2023-12-10 08:04:58 -05:00
Shannon Booth
d777b279e3 LibUnicode+Tests: Remove now unused to_unicode_*_full methods
Relocating all of the tests for these in LibUnicode over to the AK
String testsuite.
2023-11-28 17:15:27 -05:00