ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-07-05 16:41:52 +00:00

Author	SHA1	Message	Date
Timothy Flynn	fa96811a22	LibUnicode: Skip over emoji sequences in grapheme boundary segmentation Emoji sequences in the grapheme segmentation spec are a bit tricky: \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic} Our current strategy of tracking a boolean to indicate if we are in an emoji sequence was causing us to break up emoji made of multiple sub-sequences. For example, in the "family: man, woman, girl, boy" sequence: U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466 We would break at indices 0 (correctly) and 6 (incorrectly). Instead of tracking a boolean, it's quite a bit simpler to reason about emoji sequences by just skipping past them entirely. Note that in cases like the above emoji, we skip one sub-sequence at a time.	2023-02-25 22:23:39 +01:00
Timothy Flynn	5cbf054651	LibUnicode: Fix typos causing text segmentation on mid-word punctuation For example the words "can't" and "32.3" should not have boundaries detected on the "'" and "." code points, respectively. The String test cases fixed here are because "b'ar" is now considered one word.	2023-02-15 12:36:47 +01:00
Timothy Flynn	abe7786a81	LibUnicode: Allow iterating over text segmentation boundaries This will be useful for e.g. finding the next boundary after a specific index - we can just stop iterating once a condition is satisfied.	2023-02-15 12:36:47 +01:00
Timothy Flynn	dd4c47456e	LibUnicode: Implement text segmentation algorithms for all UTF encodings Similar to commit `6d710eeb43`. Rather than picking and choosing what to support, let's just support all encodings now, as it is trivial. For example, LibGUI will want the UTF-32 overloads.	2023-02-15 12:36:47 +01:00
Timothy Flynn	2d487e4e4c	LibUnicode+LibJS: Move text segmentation algorithms to their own files These algorithms are quite chonky, and more APIs around them are to be added, so let's move them to their own files for a bit of organization.	2023-02-15 12:36:47 +01:00

5 commits
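
The emoji rule quoted in commit fa96811a22 (\p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}) can be illustrated with a small standalone sketch. This is not the LibUnicode implementation: the property checks below are crude hypothetical stand-ins for real Unicode data lookups, every function name is invented for illustration, and all other grapheme rules are ignored apart from absorbing trailing Extend and ZWJ code points. It only shows the idea of skipping one emoji sub-sequence at a time instead of tracking a boolean.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified property checks (assumptions for
// illustration only; real code would consult the Unicode data tables).
static bool is_extend(char32_t cp)
{
    // Variation selector 16 and the emoji skin tone modifiers.
    return cp == 0xFE0F || (cp >= 0x1F3FB && cp <= 0x1F3FF);
}

static bool is_extended_pictographic(char32_t cp)
{
    return cp >= 0x1F300 && cp <= 0x1FAFF && !is_extend(cp);
}

static bool is_zwj(char32_t cp) { return cp == 0x200D; }

// If `index` begins "ExtPict Extend* ZWJ" and is followed by another ExtPict,
// return the index of that following ExtPict. Otherwise return `index` unchanged.
static size_t skip_emoji_subsequence(std::vector<char32_t> const& cps, size_t index)
{
    if (index >= cps.size() || !is_extended_pictographic(cps[index]))
        return index;

    size_t i = index + 1;
    while (i < cps.size() && is_extend(cps[i]))
        ++i;

    if (i + 1 < cps.size() && is_zwj(cps[i]) && is_extended_pictographic(cps[i + 1]))
        return i + 1;
    return index;
}

// Simplified boundary finder: every code point starts a new cluster unless it
// is joined by the emoji rule above or is a trailing Extend/ZWJ.
static std::vector<size_t> simplified_boundaries(std::vector<char32_t> const& cps)
{
    std::vector<size_t> boundaries;
    if (cps.empty())
        return boundaries;

    boundaries.push_back(0);
    size_t i = 0;
    while (i < cps.size()) {
        // Skip emoji sub-sequences one at a time until none is left.
        for (size_t next = skip_emoji_subsequence(cps, i); next != i;
             next = skip_emoji_subsequence(cps, i))
            i = next;

        ++i; // Consume the final (or only) code point of the cluster.
        while (i < cps.size() && (is_extend(cps[i]) || is_zwj(cps[i])))
            ++i;
        boundaries.push_back(i);
    }
    return boundaries;
}
```

Under these assumptions, the seven code points of the "family: man, woman, girl, boy" sequence yield boundaries only at index 0 and at the end of the sequence (index 7), rather than an extra break at index 6.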