ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-09 09:39:13 +00:00

Author	SHA1	Message	Date
Timothy Flynn	ada4bab405	LibUnicode: Remove GeneralCategory::Symbol string lookup When I originally wrote this method, I had it in LibJS, where we can't refer to the GeneralCategory enumeration directly. This is a big TODO: anyone outside of LibUnicode can't assume the generated enumerations exist and must get these values by string lookup. But this function ended up living in LibUnicode, which can reference the enumeration.	2021-11-13 19:01:25 +00:00
Timothy Flynn	a701ed52fc	LibJS+LibUnicode: Fully implement currency number formatting Currencies are a bit strange; the layout of currency data in the CLDR is not particularly compatible with what ECMA-402 expects. For example, the currency formats in the "en" and "ar" locales for the Latin script are: en: "¤#,##0.00" ar: "¤\u00A0#,##0.00" Note how the "ar" locale has a non-breaking space after the currency symbol (¤), but "en" does not. This does not mean that this space will appear in the "ar"-formatted string, nor does it mean that a space won't appear in the "en"-formatted string. This is a runtime decision based on the currency display chosen by the user ("$" vs. "USD" vs. "US dollar") and other rules in the Unicode TR-35 spec. ECMA-402 shies away from the nuances here with "implementation-defined" steps. LibUnicode will store the data parsed from the CLDR however it is presented; making decisions about spacing, etc. will occur at runtime based on user input.	2021-11-13 11:52:45 +00:00
Timothy Flynn	9421d5c0cf	LibUnicode: Generate currency unit-pattern number formats These are used when formatting a number as currency with a display option of "name" (e.g. for USD, the name is "US Dollars" in en-US). These patterns appear in the CLDR in a different manner than other pluralized number formats. They are of the form "{0} {1}" and therefore do not undergo subpattern replacements.	2021-11-13 11:52:45 +00:00
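Because these unit patterns skip subpattern replacement, filling one in reduces to placeholder substitution. A minimal sketch, assuming "{0}" stands for the formatted number and "{1}" for the currency's display name; `apply_unit_pattern` is a hypothetical helper, not LibUnicode's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: replaces every "{0}" with the formatted number and every
// "{1}" with the currency display name. No other pattern syntax is interpreted.
std::string apply_unit_pattern(std::string_view pattern, std::string_view number, std::string_view unit)
{
    std::string result;
    for (std::size_t i = 0; i < pattern.size();) {
        if (pattern.substr(i, 3) == "{0}") {
            result += number;
            i += 3;
        } else if (pattern.substr(i, 3) == "{1}") {
            result += unit;
            i += 3;
        } else {
            result += pattern[i++];
        }
    }
    return result;
}
```

For example, `apply_unit_pattern("{0} {1}", "1.00", "US dollars")` yields `"1.00 US dollars"`.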
Timothy Flynn	39e031c4dd	LibJS+LibUnicode: Generate all styles of currency localizations Currently, LibUnicode is only parsing and generating the "long" style of currency display names. However, the CLDR contains "short" and "narrow" forms as well that need to be handled. Parse these, and update LibJS to actually respect the "style" option provided by the user for displaying currencies with Intl.DisplayNames. Note: There are some discrepancies between the engines in how style is handled. In particular, running: new Intl.DisplayNames('en', {type:'currency', style:'narrow'}).of('usd') gives: SpiderMonkey: "USD" V8: "US Dollar" LibJS: "$" And running: new Intl.DisplayNames('en', {type:'currency', style:'short'}).of('usd') gives: SpiderMonkey: "$" V8: "US Dollar" LibJS: "$" My best guess is that V8 isn't handling style and just returns the long form (which is what LibJS did before this commit), while SpiderMonkey handles some styles but, when it has no value for the requested style, falls back to the canonicalized code passed into of().	2021-11-13 11:52:45 +00:00
Timothy Flynn	1f2ac0ab41	LibUnicode: Move number formatting code generator to UnicodeNumberFormat	2021-11-12 20:46:38 +00:00
Timothy Flynn	be69eae651	LibUnicode: Precompute the compact scale of each number formatting rule This will be needed for the ComputeExponentForMagnitude AO for compact formatting, namely step 5b: Let exponent be an implementation- and locale-dependent (ILD) integer by which to scale a number of the given magnitude in compact notation for the current locale.	2021-11-12 09:17:08 +00:00
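One plausible way to precompute that scale, sketched under stated assumptions: each compact rule pairs a power-of-ten magnitude key (e.g. "10000") with a pattern (e.g. "00K"), and every '0' in the pattern is an integer digit that stays visible. The helper name `compact_scale` is hypothetical, and treating a bare "0" pattern as "no compact form" is an assumption about the CLDR data:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: `magnitude_key` is a CLDR compact-format key such as
// "10000" and `pattern` is its pattern such as "00K".
int compact_scale(std::string_view magnitude_key, std::string_view pattern)
{
    // Assumption: a pattern of exactly "0" means the rule has no compact form.
    if (pattern == "0")
        return 0;

    // The key is a power of ten, so its base-10 exponent is its digit count minus one.
    int magnitude = static_cast<int>(magnitude_key.size()) - 1;

    // Each '0' in the pattern is one integer digit that remains visible.
    int zero_count = 0;
    for (char ch : pattern) {
        if (ch == '0')
            ++zero_count;
    }

    // E.g. "10000" with "00K": 4 - (2 - 1) = 3, so 12345 is scaled by 10^3 and shown as 12K.
    return magnitude - (zero_count - 1);
}
```

Precomputing this per rule means the exponent in step 5b becomes a table lookup instead of pattern parsing at runtime.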
Timothy Flynn	230b133ee3	LibUnicode: Parse number formats into zero/positive/negative patterns A number formatting pattern in the CLDR contains one or two entries, delimited by a semicolon. Previously, LibUnicode was just storing the entire pattern as one string. This changes the generator to split the pattern on that delimiter and generate the three distinct patterns expected by ECMA-402. The rules for generating the 3 patterns are as follows: * If the pattern contains 1 entry, it is the zero pattern. The positive pattern is the zero pattern prepended with {plusSign}. The negative pattern is the zero pattern prepended with {minusSign}. * If the pattern contains 2 entries, the first is the zero pattern, and the second is the negative pattern. The positive pattern is the zero pattern prepended with {plusSign}.	2021-11-12 09:17:08 +00:00
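The splitting rules above can be sketched as follows. This is an illustration under assumptions, not the generator's code: `NumberPatterns` and `split_number_pattern` are hypothetical names, and the sketch only prepends the placeholders as described, without translating other CLDR pattern symbols:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical container for the three patterns ECMA-402 expects.
struct NumberPatterns {
    std::string zero;
    std::string positive;
    std::string negative;
};

NumberPatterns split_number_pattern(std::string_view pattern)
{
    NumberPatterns result;
    auto semicolon = pattern.find(';');

    // The first entry is always the zero pattern (the whole string if there is no ';').
    result.zero = std::string(pattern.substr(0, semicolon));

    // The positive pattern is always derived from the zero pattern.
    result.positive = "{plusSign}" + result.zero;

    // One entry: derive the negative pattern. Two entries: use the second verbatim.
    if (semicolon == std::string_view::npos)
        result.negative = "{minusSign}" + result.zero;
    else
        result.negative = std::string(pattern.substr(semicolon + 1));
    return result;
}
```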
Timothy Flynn	1244ebcd4f	LibUnicode: Parse and generate standard accounting formatting rules Also known as "currency-accounting" in some CLDR documentation.	2021-11-12 09:17:08 +00:00
Timothy Flynn	967afc1b84	LibUnicode: Parse and generate standard currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	bffd73e0d4	LibUnicode: Parse and generate standard decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	feb8c22a62	LibUnicode: Parse and generate standard percentage formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	4317a1b552	LibUnicode: Parse and generate compact currency formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	604a596c90	LibUnicode: Parse and generate compact decimal formatting rules	2021-11-12 09:17:08 +00:00
Timothy Flynn	12b468a588	LibUnicode: Begin parsing and generating locale number systems The number system data in the CLDR contains information on how to format numbers in a locale-dependent manner. Start parsing this data, beginning with numeric symbol strings. For example, the symbol NaN maps to "NaN" in the en-US locale, and "非數值" in the zh-Hant locale.	2021-11-12 09:17:08 +00:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Timothy Flynn	d83b262e64	LibUnicode: Generate standalone compile-time array for combining class	2021-10-10 13:49:37 +02:00
Timothy Flynn	9f83774913	LibUnicode: Generate standalone compile-time array for special casing There are only 112 code points with special casing rules, so this array is quite small (compared to the 34,626-entry UnicodeData hash map that is also storing this data). Removing all casing rules from UnicodeData will happen in a subsequent commit.	2021-10-10 13:49:37 +02:00
Timothy Flynn	da4b8897a7	LibUnicode: Generate standalone compile-time arrays for simple casing Currently, all casing information (simple and special) is stored in a compile-time array of 34,626 entries, then statically copied to a hash map at runtime. In an effort to reduce the resulting memory usage, store the simple casing rules in standalone compile-time arrays. The uppercase map has 1,450 entries and the lowercase map has 1,433. Any code point not in a map will implicitly have an identity mapping.	2021-10-10 13:49:37 +02:00
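A sorted compile-time array plus a binary search is one natural way to get the implicit identity mapping described above. A minimal sketch, assuming the generated table is sorted by code point; the four entries and the names `CaseMapping` and `to_simple_uppercase` are hypothetical stand-ins for the real generated data:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

struct CaseMapping {
    std::uint32_t code_point;
    std::uint32_t mapping;
};

// Hypothetical excerpt of a generated table; it must be sorted by code point.
constexpr std::array<CaseMapping, 4> s_uppercase_mappings { {
    { 0x0061, 0x0041 }, // LATIN SMALL LETTER A -> LATIN CAPITAL LETTER A
    { 0x0062, 0x0042 }, // LATIN SMALL LETTER B -> LATIN CAPITAL LETTER B
    { 0x00E9, 0x00C9 }, // LATIN SMALL LETTER E WITH ACUTE -> CAPITAL E WITH ACUTE
    { 0x03B1, 0x0391 }, // GREEK SMALL LETTER ALPHA -> GREEK CAPITAL LETTER ALPHA
} };

std::uint32_t to_simple_uppercase(std::uint32_t code_point)
{
    // Binary search over the sorted table.
    auto it = std::lower_bound(s_uppercase_mappings.begin(), s_uppercase_mappings.end(), code_point,
        [](CaseMapping const& entry, std::uint32_t value) { return entry.code_point < value; });
    if (it != s_uppercase_mappings.end() && it->code_point == code_point)
        return it->mapping;

    // Any code point not in the table maps to itself.
    return code_point;
}
```

Storing only the code points that actually change keeps the table small, at the cost of an O(log n) lookup instead of a hash-map probe.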
Timothy Flynn	e6334cb856	LibUnicode: Add some data related to currency codes This data is published under ISO-4217 as an XML file. Since we can't parse XML files yet, and the data isn't very large, it was translated to C++ manually here.	2021-09-11 11:05:50 +01:00
Timothy Flynn	3ae4ff109f	LibUnicode: Extract canonicalization of Unicode extension values LibJS will need to canonicalize Unicode extension values, so extract the lambda that was doing this work to its own function. This also changes the helpers it invokes to take the provided key as a StringView because we don't need (and won't always have) full String objects here.	2021-09-11 11:05:50 +01:00
Timothy Flynn	b1d4bcf364	LibUnicode: Generate numeric keyword values for each locale This is needed for Intl.NumberFormat's usage of the ResolveLocale AO, where the [[RelevantExtensionKeys]] internal slot will be "nu".	2021-09-11 11:05:50 +01:00
Timothy Flynn	4f2bcebe74	LibUnicode+LibJS: Store locale keyword values as a single string Previously, LibUnicode would store the values of a keyword as a Vector. For example, the locale "en-u-ca-abc-def" would have its keyword "ca" stored as {"abc", "def"}. Then, canonicalization would occur on each of the elements in that Vector. This is incorrect because, for example, the keyword value "true" should only be dropped if that is the entire value. That is, the canonical form of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change for canonicalization. However, we would canonicalize that locale as "en-u-kb-abc".	2021-09-08 21:08:48 +01:00
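Storing the value as one string makes the "true" rule a whole-value comparison rather than a per-element one. A minimal sketch of that rule only; `canonicalize_keyword` is a hypothetical name and it ignores every other canonicalization step:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical sketch: serialize one "-u-" keyword. The value "true" is dropped
// only when it is the ENTIRE value, so "abc-true" is kept untouched.
std::string canonicalize_keyword(std::string_view key, std::string_view value)
{
    std::string result(key);
    if (value.empty() || value == "true")
        return result;
    result += '-';
    result += value;
    return result;
}
```

With a per-element Vector, the "true" in "abc-true" would have been compared (and dropped) on its own, which is exactly the bug described above.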
Timothy Flynn	75657b79c6	LibUnicode: Update comment with link to related upstream issue LibUnicode has to hard-code some aliases because the related data is not available in the JSON export of CLDR. Turns out there is a ticket to add this data in an upcoming CLDR release. Add a link to that ticket for reference.	2021-09-08 21:08:48 +01:00
Timothy Flynn	3f64a14e06	LibUnicode: Parse and generate the Unicode locale list patterns dataset This data informs consumers how to join lists of values. For example, in en-US, the list ["a", "b", "c"] formatted to a string should become "a, b, and c".	2021-09-06 23:49:56 +01:00
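As a sketch of how such patterns can be consumed, assuming the UTS #35 shape of a "start", "middle", "end", and "2" (pair) pattern, each with "{0}" and "{1}" placeholders. The struct, function names, and the en-US pattern strings used in the test are illustrative assumptions, not LibUnicode's generated API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct ListPatterns {
    std::string_view start;  // joins the first item to the rest
    std::string_view middle; // joins an interior item to the rest
    std::string_view end;    // joins the last two items
    std::string_view pair;   // used when there are exactly two items
};

// Replace "{0}" and "{1}" in a two-placeholder pattern.
std::string substitute(std::string_view pattern, std::string_view first, std::string_view second)
{
    std::string result;
    for (std::size_t i = 0; i < pattern.size();) {
        if (pattern.substr(i, 3) == "{0}") {
            result += first;
            i += 3;
        } else if (pattern.substr(i, 3) == "{1}") {
            result += second;
            i += 3;
        } else {
            result += pattern[i++];
        }
    }
    return result;
}

std::string format_list(ListPatterns const& patterns, std::vector<std::string> const& items)
{
    if (items.empty())
        return {};
    if (items.size() == 1)
        return items[0];
    if (items.size() == 2)
        return substitute(patterns.pair, items[0], items[1]);

    // Build from the right: the end pattern, then middle patterns, then the start pattern.
    std::size_t n = items.size();
    std::string result = substitute(patterns.end, items[n - 2], items[n - 1]);
    for (std::size_t i = n - 2; i-- > 1;)
        result = substitute(patterns.middle, items[i], result);
    return substitute(patterns.start, items[0], result);
}
```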
Timothy Flynn	40ea659282	LibUnicode+LibJS: Return removed extensions from remove_extension_type Some callers will need to hold onto the removed extensions.	2021-09-06 15:24:27 +01:00
Timothy Flynn	50158abaf1	LibUnicode: Implement locale-aware BEFORE_DOT special casing Note that the algorithm in the Unicode spec is for checking that a code point precedes U+0307, but the special casing condition NotBeforeDot is interested in the inverse of this rule.	2021-09-06 15:24:27 +01:00
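A sketch of the check, based on the Unicode casing-context definition: a character is Before_Dot when it is followed by U+0307 COMBINING DOT ABOVE, with only characters whose combining class is neither 0 nor 230 in between. The `combining_class` stub below is a hypothetical stand-in for real UnicodeData lookups and lists only three code points:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stub: real code would read Canonical_Combining_Class from UnicodeData.
std::uint8_t combining_class(char32_t code_point)
{
    switch (code_point) {
    case 0x0301: // COMBINING ACUTE ACCENT
    case 0x0307: // COMBINING DOT ABOVE
        return 230;
    case 0x0323: // COMBINING DOT BELOW
        return 220;
    default:
        return 0;
    }
}

// Before_Dot: the character at `index` is followed by U+0307, skipping only
// characters whose combining class is neither 0 nor 230.
bool is_before_dot(std::u32string_view text, std::size_t index)
{
    for (std::size_t i = index + 1; i < text.size(); ++i) {
        if (text[i] == 0x0307)
            return true;
        auto combining = combining_class(text[i]);
        if (combining == 0 || combining == 230)
            return false;
    }
    return false;
}

// The special casing condition NotBeforeDot is the inverse.
bool is_not_before_dot(std::u32string_view text, std::size_t index)
{
    return !is_before_dot(text, index);
}
```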
Timothy Flynn	436faf9fd9	LibUnicode: Implement locale-aware MORE_ABOVE special casing	2021-09-06 15:24:27 +01:00
Timothy Flynn	1427ebc622	LibUnicode: Implement locale-aware AFTER_SOFT_DOTTED special casing	2021-09-06 15:24:27 +01:00
Timothy Flynn	0053d48c41	LibUnicode: Implement locale-aware AFTER_I special casing	2021-09-06 15:24:27 +01:00
Timothy Flynn	68b2680040	LibUnicode: Ensure case conversion methods increment the current index There was one branch in these methods (the branch where a special casing was found) that neglected to update the current index.	2021-09-06 15:24:27 +01:00
Timothy Flynn	12ae0a44d7	LibUnicode: Add public wrapper for the generated locale_from_string	2021-09-06 15:24:27 +01:00
Timothy Flynn	a77f323dfb	LibUnicode: Implement the Remove Likely Subtags method Unlike Add Likely Subtags, this method doesn't require generated data. Instead, it is defined in terms of Add Likely Subtags.	2021-09-04 13:51:40 +01:00
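Defining Remove Likely Subtags in terms of Add Likely Subtags can be sketched like this, following our reading of UTS #35: maximize the input, then return the first of {language}, {language-region}, {language-script} that maximizes back to the same result. The two-entry likely-subtags table and all names are hypothetical simplifications, and variants are omitted:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

struct LanguageID {
    std::string language;
    std::string script; // empty when absent
    std::string region; // empty when absent

    bool operator==(LanguageID const& other) const
    {
        return language == other.language && script == other.script && region == other.region;
    }
};

// Hypothetical, drastically simplified Add Likely Subtags: looks up only the
// language subtag and fills in whichever of script/region is missing.
std::optional<LanguageID> add_likely_subtags(LanguageID id)
{
    static std::map<std::string, LanguageID> const likely {
        { "en", { "en", "Latn", "US" } },
        { "sr", { "sr", "Cyrl", "RS" } },
    };
    auto it = likely.find(id.language);
    if (it == likely.end())
        return std::nullopt;
    if (id.script.empty())
        id.script = it->second.script;
    if (id.region.empty())
        id.region = it->second.region;
    return id;
}

LanguageID remove_likely_subtags(LanguageID const& id)
{
    auto maximized = add_likely_subtags(id);
    if (!maximized)
        return id;

    // Try the shortest candidates first.
    LanguageID const trials[] = {
        { maximized->language, "", "" },
        { maximized->language, "", maximized->region },
        { maximized->language, maximized->script, "" },
    };
    for (auto const& trial : trials) {
        if (add_likely_subtags(trial) == maximized)
            return trial;
    }
    return *maximized;
}
```

For instance, "sr-Latn-RS" minimizes to "sr-Latn" because "sr" alone would maximize to the Cyrillic script.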
Timothy Flynn	e6a2ab1202	LibUnicode: Generate an implementation of the Add Likely Subtags method	2021-09-04 13:51:40 +01:00
Timothy Flynn	ca90231794	LibUnicode: Define is_unicode_*_subtag helpers inline in their header The UnicodeLocale generator will need to parse canonicalized locale strings, and will require using these methods. However, the generator cannot depend on LibUnicode because Locale.cpp within LibUnicode already depends on the generated file. Instead, defining the methods that the generator needs inline allows the generator to use them without linking against LibUnicode.	2021-09-04 13:51:40 +01:00
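Helpers like these are small enough to live entirely in a header. A sketch of what header-only versions could look like, written from the UTS #35 grammar productions quoted in the comments; the function bodies are illustrative, not LibUnicode's actual code:

```cpp
#include <cassert>
#include <string_view>

constexpr bool is_ascii_alpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
constexpr bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_ascii_alphanumeric(char c) { return is_ascii_alpha(c) || is_ascii_digit(c); }

// True if every character in `text` satisfies `predicate`.
constexpr bool all_chars(std::string_view text, bool (*predicate)(char))
{
    for (char c : text) {
        if (!predicate(c))
            return false;
    }
    return true;
}

// unicode_language_subtag = alpha{2,3} | alpha{5,8}
constexpr bool is_unicode_language_subtag(std::string_view subtag)
{
    auto size = subtag.size();
    return ((size >= 2 && size <= 3) || (size >= 5 && size <= 8)) && all_chars(subtag, is_ascii_alpha);
}

// unicode_script_subtag = alpha{4}
constexpr bool is_unicode_script_subtag(std::string_view subtag)
{
    return subtag.size() == 4 && all_chars(subtag, is_ascii_alpha);
}

// unicode_region_subtag = alpha{2} | digit{3}
constexpr bool is_unicode_region_subtag(std::string_view subtag)
{
    return (subtag.size() == 2 && all_chars(subtag, is_ascii_alpha))
        || (subtag.size() == 3 && all_chars(subtag, is_ascii_digit));
}

// unicode_variant_subtag = alphanum{5,8} | digit alphanum{3}
constexpr bool is_unicode_variant_subtag(std::string_view subtag)
{
    if (subtag.size() >= 5 && subtag.size() <= 8)
        return all_chars(subtag, is_ascii_alphanumeric);
    return subtag.size() == 4 && is_ascii_digit(subtag[0]) && all_chars(subtag.substr(1), is_ascii_alphanumeric);
}
```

Because these helpers have no dependencies beyond the standard library, a generator can include the header without linking against the library that consumes its output.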
Timothy Flynn	21c4922ac0	LibUnicode: Add helper methods to LocaleID and LanguageID for LibJS Add a method to remove an extension type from the locale's extension set and methods to convert a locale and language to a string without canonicalization. Each of these will be used by LibJS.	2021-09-02 17:56:42 +01:00
Timothy Flynn	a05419db55	LibUnicode: Add lexer to test if a string matches the "type" production	2021-09-02 17:56:42 +01:00
Timothy Flynn	113bf4a9dd	LibUnicode: Add missing structures to forwarding header	2021-09-02 17:56:42 +01:00
Timothy Flynn	fd0011989a	LibUnicode: Resolve the most likely territory alias when there are many	2021-09-01 14:14:47 +01:00
Timothy Flynn	1fbc5dba08	LibUnicode: Generate Unicode locale likely subtag data CLDR contains a set of likely subtag data where, given a locale, you can resolve the most likely language, script, or territory of that locale. This data is needed for resolving territory aliases. These aliases might contain multiple territories, and we need to resolve which of those territories is most likely correct for a locale. Note that the likely subtag data is quite large (a few thousand entries). As an optimization encouraged by the spec, we only generate the smallest subset of this data that we actually need (about 150 entries).	2021-09-01 14:14:47 +01:00
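Resolving a territory alias with several replacements can be sketched as follows, following our reading of UTS #35: if the language's most likely territory is among the replacements, use it; otherwise use the first replacement. The two-entry table and the three-territory replacement list in the test are hypothetical excerpts, not the generated data:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical excerpt of likely-subtag data: most likely territory per language.
std::string const* likely_territory(std::string const& language)
{
    static std::map<std::string, std::string> const table {
        { "hy", "AM" }, // Armenian -> Armenia
        { "ru", "RU" }, // Russian -> Russia
    };
    auto it = table.find(language);
    return it == table.end() ? nullptr : &it->second;
}

std::string resolve_territory_alias(std::string const& language, std::vector<std::string> const& replacements)
{
    assert(!replacements.empty());
    if (auto const* likely = likely_territory(language)) {
        if (std::find(replacements.begin(), replacements.end(), *likely) != replacements.end())
            return *likely;
    }
    // No likely territory, or it is not a candidate: fall back to the first replacement.
    return replacements.front();
}
```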
Timothy Flynn	72f49e42b4	LibUnicode: Perform complex Unicode locale alias substitution	2021-09-01 14:14:47 +01:00
Timothy Flynn	9ae7ac4c87	LibUnicode: Generate complex Unicode locale alias matching Most alias substitutions are "simple", meaning that alias matching is done by examining a single locale subtag. However, there are a handful of "complex" aliases where matching is done by examining multiple subtags. For example, the variant subtag "lojban" causes the locale "art-lojban" to be canonicalized to "jbo", but only when the language subtag is "art" (i.e. this should not occur for the locale "en-lojban"). This generates a method to perform complex alias matching.	2021-09-01 14:14:47 +01:00
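To illustrate the "art-lojban" example, here is a hand-written sketch of one complex alias that checks both the language and the variant before substituting. `LocaleID` and `apply_lojban_alias` are hypothetical; the generated code would cover every complex alias from the CLDR data, not just this one:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct LocaleID {
    std::string language;
    std::vector<std::string> variants;
};

// Hypothetical sketch of one complex alias: "art-lojban" -> "jbo". The match
// requires BOTH the language subtag "art" and the variant "lojban".
bool apply_lojban_alias(LocaleID& locale)
{
    if (locale.language != "art")
        return false;
    auto variant = std::find(locale.variants.begin(), locale.variants.end(), "lojban");
    if (variant == locale.variants.end())
        return false;

    // Replace the language and drop the matched variant.
    locale.language = "jbo";
    locale.variants.erase(variant);
    return true;
}
```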
Timothy Flynn	da89cf9afb	LibUnicode: Canonicalize calendar subtags Calendar subtags are a bit of an odd one out in that we must match the variants "ethiopic-amete-alem" in that order, without any other variant in the locale. So a separate method is needed for this, and we now defer sorting the variant list until after other canonicalization is done.	2021-09-01 14:14:47 +01:00
Timothy Flynn	8458f477a4	LibUnicode: Canonicalize timezone subtags	2021-09-01 14:14:47 +01:00
Timothy Flynn	335f985b31	LibUnicode: Canonicalize the subtag "imperial" to "uksystem"	2021-09-01 14:14:47 +01:00
Timothy Flynn	2d90144888	LibUnicode: Canonicalize the subtag "primary" and "tertiary" to "levelN"	2021-09-01 14:14:47 +01:00
Timothy Flynn	409f39b336	LibUnicode: Canonicalize the subtag "names" to "prprname"	2021-09-01 14:14:47 +01:00
Timothy Flynn	f907a7dc38	LibUnicode: Canonicalize the subtag "yes" to "true"	2021-09-01 14:14:47 +01:00
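The value substitutions in the commits above can be sketched as a table keyed by (keyword key, deprecated value). The key assignments here ("ks" for collation strength, "ms" for measurement system) are our understanding of CLDR's BCP 47 data. Treating "yes" as "true" for every key is a simplification, and the "names" alias is left out because its key is not stated above. All names are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical alias table keyed by (keyword key, deprecated value).
std::string canonicalize_keyword_value(std::string const& key, std::string const& value)
{
    static std::map<std::pair<std::string, std::string>, std::string> const aliases {
        { { "ks", "primary" }, "level1" },
        { { "ks", "tertiary" }, "level3" },
        { { "ms", "imperial" }, "uksystem" },
    };
    if (auto it = aliases.find({ key, value }); it != aliases.end())
        return it->second;

    // Simplification (assumption): "yes" is treated as "true" for every key.
    if (value == "yes")
        return "true";
    return value;
}
```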
Timothy Flynn	556374a904	LibUnicode: Substitute Unicode locale aliases during canonicalization Unicode TR35 defines how locale subtag aliases should be applied when converting a locale to canonical form. For most subtags, it is a simple substitution. Language subtags depend on context; for example, the language "sh" should become "sr-Latn", but if the original locale already has a script subtag ("sh-Cyrl"), then only the language subtag of the alias should be taken (yielding "sr-Cyrl"). To facilitate this, we now make two passes when canonicalizing a locale. In the first pass, we convert the LocaleID structure to canonical syntax (where the conversions all happen in-place). In the second pass, we form the canonical string based on the canonical syntax.	2021-09-01 14:14:47 +01:00
Timothy Flynn	9b118f1f06	LibUnicode: Generate Unicode locale alias data CLDR contains a set of aliases for languages, territories, etc. that are no longer meant to be used (e.g. due to deprecation). For example, the language "aam" is deprecated and should be canonicalized as "aas".	2021-09-01 14:14:47 +01:00
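The context-dependent language substitution ("sh" becomes "sr-Latn", but "sh-Cyrl" keeps its script) can be sketched like this: the alias always replaces the language, while its script and region only fill fields the original locale leaves empty. The one-entry table and the names below are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>

struct LanguageID {
    std::string language;
    std::string script; // empty when absent
    std::string region; // empty when absent
};

std::string language_id_to_string(LanguageID const& id)
{
    std::string result = id.language;
    if (!id.script.empty())
        result += "-" + id.script;
    if (!id.region.empty())
        result += "-" + id.region;
    return result;
}

// Hypothetical one-entry alias table; the real data is generated from CLDR.
void substitute_language_alias(LanguageID& id)
{
    static std::map<std::string, LanguageID> const aliases {
        { "sh", { "sr", "Latn", "" } },
    };
    auto it = aliases.find(id.language);
    if (it == aliases.end())
        return;

    LanguageID const& replacement = it->second;
    id.language = replacement.language;
    // Keep any script or region the original locale already has.
    if (id.script.empty())
        id.script = replacement.script;
    if (id.region.empty())
        id.region = replacement.region;
}
```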
Timothy Flynn	d13142f015	LibJS+LibUnicode: Store parsed Unicode locale data as full strings Originally, it was convenient to store the parsed Unicode locale data as views into the original string being parsed. But to implement locale aliases will require mutating the data that was parsed. To prepare for that, store the parsed data as proper strings.	2021-09-01 14:14:47 +01:00


108 commits