ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-27 18:40:00 +00:00

Author	SHA1	Message	Date
Timothy Flynn	f1809db994	LibUnicode: Add public methods to compare and lookup Unicode properties Adds methods to retrieve a Unicode property from a string and to check if a code point matches a Unicode property. Also adds a <LibUnicode/Forward.h> header.	2021-07-30 21:26:31 +01:00
Timothy Flynn	3f80791ed5	LibUnicode: Manually assign special code point properties The Unicode standard defines a few extra properties that are not defined in any UCD file, so we must assign them manually.	2021-07-30 21:26:31 +01:00
Timothy Flynn	bba3152104	LibUnicode: Parse and generate PropertyAliases These are all used for Unicode property escapes.	2021-07-30 21:26:31 +01:00
Timothy Flynn	761c16d873	LibUnicode: Parse and utilize DerivedCoreProperties DerivedCoreProperties are pseudo-properties that are the union of other categories and properties. For example, the derived property Math is the union of the general category Sm and the property Other_Math. Parsing these is necessary for implementing Unicode property escapes. But it also has the added benefit that LibUnicode now does not need to derive some of these properties at runtime.	2021-07-30 21:26:31 +01:00
Timothy Flynn	4eb4b06688	LibUnicode: Do not replace underscores in property names Originally, this was done to make the generated enums look more like the rest of Serenity's enums. But for Unicode property escapes, LibUnicode will need to compare property names from a RegExp.prototype object to these parsed property names, which will be easier without this modification.	2021-07-30 21:26:31 +01:00
Timothy Flynn	5d09a00189	LibUnicode: Generate PropList enumeration as a bitmask Rather than generating the PropList as a list of enums, generate it as a bitmask. Not only will this be better for runtime property searching, this will allow parsing of the DerivedCoreProperties list more easily.	2021-07-30 21:26:31 +01:00
Andrew Kaster	38707f4a20	LibUnicode: Make unicode data generation logic more relocatable The previous logic had several checks for Lagom directories and subdirectories. All we really want to do for these header checks is make sure that the files end up in an included folder prefixed with LibUnicode. We also don't need to hard code the path to the generator, the $<TARGET_FILES> generator expression can create the path for us.	2021-07-29 21:46:25 +01:00
Timothy Flynn	c4bfda7f7f	LibUnicode: Handle code points that are both cased and case-ignorable Apparently, some code points fit both categories, for example U+0345 (COMBINING GREEK YPOGEGRAMMENI). Handle this fact when determining if a code point is a final code point in a string.	2021-07-28 23:42:29 +02:00
Timothy Flynn	dff156b7c6	LibUnicode: Reduce Unicode data generator boilerplate There's a fair amount of boilerplate when e.g. adding a new UCD file to parse or a new enumeration to generate. Reduce the overhead by adding helper lambdas. Also adds a couple missing spec links with UCD field information.	2021-07-28 23:42:29 +02:00
Timothy Flynn	7827aede6f	LibUnicode: Check word break when deciding on case-ignorable code points	2021-07-28 23:42:29 +02:00
Timothy Flynn	12fb3ae033	LibUnicode: Download and parse the word break property list UCD file Note that unlike the main property list, each code point has only one word break property. Code points that do not have a word break property are to be assigned the property "Other".	2021-07-28 23:42:29 +02:00
Timothy Flynn	c45a014645	LibUnicode: Check property list when deciding if a code point is cased	2021-07-28 23:42:29 +02:00
Timothy Flynn	38adfd8874	LibUnicode: Download and parse the property list UCD file	2021-07-28 23:42:29 +02:00
Timothy Flynn	39f971e42b	LibUnicode: Begin implementing special Unicode case folding This implements unconditional special case folding, and conditional folding for non-locale cases. Worth noting that the only conditional, non-locale special case is for converting an uppercase sigma to lowercase.	2021-07-27 21:04:36 +01:00
Timothy Flynn	5b110034dd	LibUnicode: Produce each code point's general category This will be needed for the Unicode Standard's Default Case Algorithm. Generate the field as an enumeration rather than a string for easier comparison.	2021-07-27 21:04:36 +01:00
Timothy Flynn	32ea461385	LibUnicode: Download and parse the special casing UCD file This adds a SpecialCasing structure to the generated UnicodeData.h/cpp files. This structure contains casing rules for code points which have non-1-to-1 upper-to-lower case code point mappings. Further, these rules may be limited to specific locales or other context.	2021-07-27 21:04:36 +01:00
Timothy Flynn	98d8274040	Meta: Add LibUnicode (and its tests) to Lagom This is primarily to allow using LibUnicode within LibJS and its REPL. Note: this seems to be the first time that a Lagom dependency requires generated source files. For this to work, some of Lagom's CMakeLists.txt commands needed to be re-organized to include the CMake files that fetch and parse UnicodeData.txt. The paths required to invoke the generator also differ depending on what is currently building (SerenityOS vs. Lagom as part of the Serenity build vs. a standalone Lagom build).	2021-07-26 17:03:55 +01:00
Timothy Flynn	4dda3edc9e	LibUnicode: Introduce a Unicode library for interacting with UCD files The Unicode standard publishes the Unicode Character Database (UCD) with information about every code point, such as each code point's upper case mapping. LibUnicode exists to download and parse UCD files at build time and to provide accessors to that data. As a start, LibUnicode includes upper- and lower-case code point converters.	2021-07-26 17:03:55 +01:00
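
The property commits above (5d09a00189 and 761c16d873) describe storing PropList entries as a bitmask and treating the derived property Math as the union of the general category Sm and the property Other_Math. A minimal sketch of that idea follows. Every name in it (Property, GeneralCategory, CodePointData, has_property, is_math) is hypothetical and chosen for illustration; it is not LibUnicode's generated code.

```cpp
#include <cstdint>

// Hypothetical bitmask of properties: one bit per property, so a code point
// can carry any combination of them in a single integer.
enum class Property : std::uint64_t {
    None = 0,
    White_Space = 1ull << 0,
    Other_Math = 1ull << 1,
    Other_Alphabetic = 1ull << 2,
};

// Hypothetical, heavily truncated general category enumeration.
enum class GeneralCategory : std::uint8_t {
    Lu,
    Ll,
    Sm,
    Other,
};

constexpr Property operator|(Property a, Property b)
{
    return static_cast<Property>(static_cast<std::uint64_t>(a) | static_cast<std::uint64_t>(b));
}

// A single AND answers "does this set contain the property?".
constexpr bool has_property(Property set, Property property)
{
    return (static_cast<std::uint64_t>(set) & static_cast<std::uint64_t>(property)) != 0;
}

// Hypothetical per-code-point record.
struct CodePointData {
    char32_t code_point;
    GeneralCategory category;
    Property properties;
};

// Math = general category Sm, or property Other_Math (the union named in the
// DerivedCoreProperties commit message).
constexpr bool is_math(CodePointData const& data)
{
    return data.category == GeneralCategory::Sm || has_property(data.properties, Property::Other_Math);
}
```

Computing the union at lookup time, as is_math does here, is the runtime derivation that parsing DerivedCoreProperties makes unnecessary; with the derived list parsed, Math could simply be one more bit in the mask.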

18 commits
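
The casing commits (39f971e42b and c4bfda7f7f) mention the uppercase sigma special case and code points that are both cased and case-ignorable, such as U+0345. The sketch below illustrates one way to apply the Unicode Standard's Final_Sigma condition while tolerating that overlap. The predicates is_cased and is_case_ignorable are toy stand-ins covering only a handful of code points, and all function names are assumptions for illustration, not LibUnicode's API.

```cpp
#include <cstddef>
#include <string>

// Toy predicate (assumption): ASCII letters, basic Greek letters, and U+0345.
bool is_cased(char32_t c)
{
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')
        || (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2)
        || (c >= 0x03B1 && c <= 0x03C9)
        || c == 0x0345;
}

// Toy predicate (assumption): apostrophe, full stop, and U+0345.
// Note that U+0345 is deliberately in both sets.
bool is_case_ignorable(char32_t c)
{
    return c == U'\'' || c == U'.' || c == 0x0345;
}

// Final_Sigma: the sigma is preceded by a cased letter (optionally followed by
// case-ignorable code points) and is not followed by case-ignorable code
// points and then a cased letter.
bool is_final_sigma_position(std::u32string const& text, std::size_t index)
{
    bool preceded_by_cased = false;
    for (std::size_t i = index; i-- > 0;) {
        // Test "cased" first so a code point in both sets still counts.
        if (is_cased(text[i])) {
            preceded_by_cased = true;
            break;
        }
        if (!is_case_ignorable(text[i]))
            break;
    }
    if (!preceded_by_cased)
        return false;

    for (std::size_t i = index + 1; i < text.size(); ++i) {
        if (is_cased(text[i]))
            return false;
        if (!is_case_ignorable(text[i]))
            break;
    }
    return true;
}

// Lowercase U+03A3 to U+03C2 (final form) or U+03C3 (regular form).
char32_t lowercase_sigma(std::u32string const& text, std::size_t index)
{
    return is_final_sigma_position(text, index) ? 0x03C2 : 0x03C3;
}
```

Checking is_cased before is_case_ignorable at each step is the design choice that matters here: skipping every case-ignorable code point first would step over U+0345 even when it is the cased letter the condition is looking for.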