ladybird

Mirror of https://github.com/LadybirdBrowser/ladybird.git, synced 2025-07-12 20:11:51 +00:00

Author	SHA1	Message	Date
Timothy Flynn	537fcaf59e	AK+LibUnicode: Provide Unicode-aware caseless String matching The Unicode spec defines much more complicated caseless matching algorithms in its Collation spec. This implements the "basic" case folding comparison.	2023-01-18 14:43:40 +00:00
Timothy Flynn	8f2589b3b0	LibUnicode: Parse and generate case folding code point data Case folding rules have a similar mapping style to special casing rules, where one code point may map to zero or more case folding rules. These will be used for case-insensitive string comparisons. To see how case folding can differ from other casing rules, consider "ß" (U+00DF) in Python: "ß".lower() returns 'ß', "ß".upper() returns 'SS', "ß".title() returns 'Ss', and "ß".casefold() returns 'ss'.	2023-01-18 14:43:40 +00:00
Timothy Flynn	8d9fb898d7	LibUnicode: Update out-of-date spec links And remove links that aren't adding much value but will often get out of date (i.e. links to UCD files, which are already all listed in unicode_data.cmake).	2023-01-18 14:43:40 +00:00
Timothy Flynn	d6ddca0c0f	AK+LibUnicode: Provide Unicode-aware String titlecase transformation	2023-01-16 18:33:44 -05:00
Timothy Flynn	bc51017a03	LibUnicode: Support full case folding for titlecasing a string Unicode declares that to titlecase a string, the first cased code point after each word boundary should be transformed to its titlecase mapping. All other codepoints are transformed to their lowercase mapping.	2023-01-16 18:33:44 -05:00
Timothy Flynn	b562348d31	LibUnicode: Generate simple case folding mappings for titlecase Note we already generate the special case foldings for titlecase.	2023-01-16 18:33:44 -05:00
Timothy Flynn	6d710eeb43	LibUnicode: Add an overload of word segmentation for UTF-8 strings	2023-01-16 18:33:44 -05:00
Timothy Flynn	58bc831750	LibUnicode: Return a String from Unicode normalization	2023-01-15 01:00:20 +00:00
Timothy Flynn	6fcc1c7426	AK+LibUnicode: Provide Unicode-aware String case transformations Since AK can't refer to LibUnicode directly, the strategy here is that if you need case transformations, you can link LibUnicode and receive them. If you try to use either of these methods without linking it, then you'll of course get a linker error (note we don't do any fallbacks to e.g. ASCII case transformations). If you don't need these methods, you don't have to link LibUnicode.	2023-01-09 19:23:46 -07:00
Timothy Flynn	12f6793223	LibUnicode: Move Unicode-aware case transformations to a helper file These will be needed by AK::String as well, so move them to a helper file where they can be re-used.	2023-01-09 19:23:46 -07:00
Timothy Flynn	3d22efccca	LibUnicode+LibJS: Propagate OOM from Unicode normalization	2023-01-09 22:48:15 +00:00
Timothy Flynn	1ff29afc45	LibUnicode+LibJS+LibWeb: Propagate OOM from Unicode case transformations	2023-01-09 22:48:15 +00:00
Timothy Flynn	d382e77d38	LibUnicode: Fix compilation when the UCD download is disabled	2022-12-14 15:24:48 +00:00
Linus Groh	57dc179b1f	Everywhere: Rename to_{string => deprecated_string}() where applicable This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.	2022-12-06 08:54:33 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Gunnar Beutner	2d3567ee92	Meta+LibUnicode: Avoid relocations for static unicode data Previously the s_decomposition_mappings variable would refer to other data in s_decomposition_mappings_data. This would cause thousands of avoidable relocations at load time. This saves about 128kB RAM for each process which uses LibUnicode.	2022-11-06 17:34:06 +01:00
Tim Schumacher	ce2f1b845f	Everywhere: Mark dependencies of most targets as PRIVATE Otherwise, we end up propagating those dependencies into targets that link against that library, which creates unnecessary link-time dependencies. Also included are changes to re-add now-missing dependencies to tools that actually need them.	2022-11-01 14:49:09 +00:00
Andrew Kaster	b8e51425e9	Lagom+CMake: Propagate dependencies for generated custom targets We have logic for serenity_generated_sources which works well for source files that are specified in GENERATED_SOURCES prior to calling serenity_lib or serenity_bin. However, code generated with invoke_generator, and the LibWeb generators do not always follow the pattern of the IDL and GML files. For the LibWeb generators, we can just add_dependencies to LibWeb at the time we declare the generate_Foo custom target. However for LibLocale, LibTimeZone, and LibUnicode, we don't have the name of the target available, so export the name in a variable to set into GENERATED_SOURCES. To make this work for Lagom, we need to make sure that lagom_lib and serenity_bin in Lagom/CMakeLists.txt call serenity_generated_sources on the target. This enables the Xcode generator on macOS hosts, at least for Lagom.	2022-10-17 15:55:55 +02:00
matcool	104b51b912	LibUnicode: Fix Hangul syllable composition for specific cases This fixes `combine_hangul_code_points` which would try to combine a LVT syllable with a trailing consonant, resulting in a wrong character. Also added a test for this specific case.	2022-10-07 07:53:27 -04:00
Timothy Flynn	19b758ce8b	LibUnicode: Add to-and-from string converters for NormalizationForm	2022-10-06 22:14:44 +01:00
matcool	70d0c1616f	LibUnicode: Add decomposition mappings and Unicode normalization The mappings are exposed via `Unicode::code_point_decomposition(u32)` and `Unicode::code_point_decompositions()`, the latter being useful for reverse searching a code point from its decomposition. The normalization code does not make use of `Quick_Check` props (https://www.unicode.org/reports/tr44/#Decompositions_and_Normalization), meaning no quick check optimizations.	2022-10-06 08:24:39 -04:00
Timothy Flynn	b7ef36aa36	LibUnicode: Parse and generate custom emoji added for SerenityOS Parse emoji from emoji-serenity.txt to allow displaying their names and grouping them together in the EmojiInputDialog. This also adds an "Unknown" value to the EmojiGroup enum. This will be useful for emoji that aren't found in the UCD, or for when UCD downloads are disabled.	2022-09-11 20:33:57 +01:00
Timothy Flynn	b61eca0a1e	LibUnicode: Parse and generate emoji code point data According to TR #51, the "best definition of the full set [of emojis] is in the emoji-test.txt file". This defines not only the emoji themselves, but the order in which they should be displayed, and what "group" of emojis they belong to.	2022-09-08 23:12:31 +01:00
Timothy Flynn	9e860d973e	LibLocale: Move locale source files to the LibLocale library Everything is now set up to create the LibLocale library and link it where needed.	2022-09-05 14:37:16 -04:00
Timothy Flynn	f082b6ae48	LibUnicode: Generate a separate Locale enumeration for special casing The UCD only cares about a few locales for special casing rules (az, lt, and tr). Unfortunately, LibUnicode cannot use LibLocale once the libraries are separate because LibLocale will need to use LibUnicode for many more things; thus there would be a circular dependency. Instead, just generate the small enum needed for this one use case.	2022-09-05 14:37:16 -04:00
Timothy Flynn	43a3471298	LibLocale: Move locale source files to the LibLocale folder These are still included in LibUnicode, but this updates their location and the include paths of other files which include them.	2022-09-05 14:37:16 -04:00
Timothy Flynn	ff48220dca	Userland: Move files destined for LibLocale to the Locale namespace	2022-09-05 14:37:16 -04:00
Timothy Flynn	6c7b05a0ff	LibUnicode+LibJS: Move Unicode::get_available_currencies() to Locale.h This is generated by GenerateLocaleData, which will soon be in the Locale namespace. Move it out of CurrencyCode.h, as that will continue to live in the Unicode namespace.	2022-09-05 14:37:16 -04:00
Timothy Flynn	1e0276f541	LibLocale+LibUnicode: Move generated CLDR data files to LibLocale folder They are still included into LibUnicode, but this moves their generated location to be under LibLocale.	2022-09-05 14:37:16 -04:00
Timothy Flynn	fc8bf7ac3e	LibUnicode+Userland: Migrate generated CLDR data to LibLocaleData Currently, LibUnicodeData contains the generated UCD and CLDR data. Move the UCD data to the main LibUnicode library, and rename LibUnicodeData to LibLocaleData. This is another preparatory change to migrate to LibLocale.	2022-09-05 14:37:16 -04:00
Timothy Flynn	89d1813b5d	LibUnicode: Move CLDR data generators to a LibLocale subfolder To prepare for placing all CLDR generated data in a new library, LibLocale, this moves the code generators for the CLDR data to the LibLocale subfolder.	2022-09-05 14:37:16 -04:00
Timothy Flynn	e3e0602833	LibUnicode: Fully qualify use of AK::Variant in Locale.h The generated locale data contains an enum also named Variant, as variants are part of locale strings. This hasn't been an issue, but as includes are reordered, the order in which the enum and AK::Variant are included may cause an ambiguity error.	2022-09-05 14:37:16 -04:00
Timothy Flynn	6af9bf1a1e	LibUnicode: Fix compilation when ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF	2022-08-25 16:20:22 +01:00
Timothy Flynn	32c07bc6c3	LibUnicode: Generate per-locale data for the "noon" fixed day period Note that not all locales have this day period.	2022-07-21 20:36:03 +01:00
Timothy Flynn	0a6363d3e9	LibUnicode: Implement the range pattern processing algorithm This algorithm injects spacing around the range separator under certain conditions. For example, in en-US, the range [3, 5] should be formatted as "3–5" if unitless, but as "$3 – $5" for currency.	2022-07-20 22:30:16 +01:00
Timothy Flynn	b2709f161e	LibUnicode: Generate per-locale approximately & range separator symbols	2022-07-20 22:30:16 +01:00
Timothy Flynn	998f62936b	LibUnicode: Remove obsolete Unicode::get_default_number_system This has been superseded by get_preferred_keyword_value_for_locale, which doesn't require allocating a Vector just to return its first element.	2022-07-15 12:31:43 +02:00
Timothy Flynn	f8f7015419	LibUnicode: Generate a method to lookup locale-preferred keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	80568d5776	LibUnicode: Generate a method to lookup available keyword values	2022-07-15 12:31:43 +02:00
Timothy Flynn	c2e5b20eb6	LibUnicode: Generate available values for the keywords co, kf, kn, hc This also ensures we only include values we actually support in the generated list of available values.	2022-07-15 12:31:43 +02:00
Timothy Flynn	a337b059dd	LibUnicode: Parse and generate per-locale plural ranges	2022-07-12 00:43:34 +01:00
Timothy Flynn	f672b4c151	LibUnicode: Remove now-unused Unicode::select_pattern_with_plurality	2022-07-08 20:33:52 +02:00
Timothy Flynn	232df4196b	LibUnicode: Replace NumberFormat::Plurality with Unicode::PluralCategory To prepare for using plural rules within number & duration format, this removes the NumberFormat::Plurality enumeration. This also adds PluralCategory::ExactlyZero & PluralCategory::ExactlyOne. These are used in locales like French, where PluralCategory::One really means any value from 0.00 to 1.99. PluralCategory::ExactlyOne means only the value 1, as the name implies. These exact rules are not known by the general plural rules, they are explicitly for number / currency format.	2022-07-08 20:33:52 +02:00
Timothy Flynn	cc5c707649	LibJS+LibUnicode: Do not generate the PluralCategory enum The PluralCategory enum is currently generated for plural rules. Instead of generating it, this moves the enum to the public LibUnicode header. While it was nice to auto-discover these values, they are well defined by TR-35, and we will need their values from within the number format code generator (which can't rely on the plural rules generator having run yet). Further, number format will require additional values in the enum that plural rules doesn't know about.	2022-07-08 20:33:52 +02:00
Timothy Flynn	bf85bf2a9e	LibJS: Use Intl.PluralRules within Intl.RelativeFormat The Polish test cases added here cover previous failures from test262, due to the way that 0 is specified to be "many" in Polish.	2022-07-08 11:51:54 +02:00
Timothy Flynn	8aeacccd82	LibUnicode: Generate a list of available plural categories per locale Separate lists are generated for cardinal and ordinal form.	2022-07-08 11:51:54 +02:00
Timothy Flynn	ea78bac36d	LibUnicode: Parse and generate per-locale plural rules from the CLDR Plural rules in the CLDR are of the form: "cs": { "pluralRule-count-one": "i = 1 and v = 0 @integer 1", "pluralRule-count-few": "i = 2..4 and v = 0 @integer 2~4", "pluralRule-count-many": "v != 0 @decimal 0.0~1.5, 10.0, 100.0 ...", "pluralRule-count-other": "@integer 0, 5~19, 100, 1000, 10000 ..." } The syntax is described here: https://unicode.org/reports/tr35/tr35-numbers.html#Plural_rules_syntax There are up to 2 sets of rules for each locale, a cardinal set and an ordinal set. The approach here is to generate a C++ function for each set of rules. Each condition in the rules (e.g. "i = 1 and v = 0") is transpiled to a C++ if-statement within its function. Then lookup tables are generated to match locales to their generated functions. NOTE: -Wno-parentheses-equality is added to the LibUnicodeData compile flags because the generated plural rules have lots of extra parentheses (because e.g. we need to selectively negate and combine rules). The code to generate only exactly the right number of parentheses is quite hairy, so this just tells the compiler to ignore the extras.	2022-07-08 11:51:54 +02:00
Timothy Flynn	12e7c0808a	LibUnicode: Generate per-region week data This includes: * The minimum number of days in a week for that week to count as the first week of a new year. * The day to be shown as the first day of the week in a calendar. * The start/end days of the weekend. Like the existing hour cycle data, week data is presented per-region in the CLDR, rather than per-locale. The method to add likely subtags to a locale to perform region lookups is the same. The list of regions in the CLDR for hour cycle, minimum days, first day, and weekend days are quite different. So rather than changing the existing HourCycleRegion enum to a generic Region enum, we generate separate enums for each of the week data fields. This allows each lookup into these fields to remain simple array-based index access, without any "jumps" for regions that don't have CLDR data for a field.	2022-07-06 16:56:42 +02:00
Timothy Flynn	4868b888be	LibUnicode: Generate per-locale text layout information Currently contains just each locale's character order, but is set up to easily add other text layout fields from the CLDR if ECMA-402 eventually requires them.	2022-07-06 16:56:42 +02:00
DexesTTP	7ceeb74535	AK: Use an enum instead of a bool for String::replace(all_occurences) This commit has no behavior changes. In particular, this does not fix any of the wrong uses of the previous default parameter (which used to be 'false', meaning "only replace the first occurrence in the string"). It simply replaces the default uses with String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.	2022-07-06 11:12:45 +02:00


242 commits