ladybird


Author	SHA1	Message	Date
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: auto string = u"hello"sv; This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
mikiubo	8ec72d6906	LibUnicode: Avoid rejecting end-of-text position as a valid boundary When the cursor was positioned at the end of text, attempting to move it left (using the left arrow key) would fail because align_boundary() was rejecting the end-of-text position as a valid boundary.	2025-04-11 15:30:17 -04:00
Timothy Flynn	e6b7c8cde2	LibUnicode: Consistently reject out-of-bounds segmenter indices In the UTF-8 implementation, this prevents out-of-bounds access of the underlying text data, as the ICU macro would essentially do something akin to `text[text.length()]`. The UTF-16 implementation already checks for out-of-bounds, but would previously return 0. We now return an empty Optional in both impls. This doesn't affect LibJS (the user of the UTF-16 impl), as it already does bounds checking before invoking LibUnicode APIs.	2025-01-16 23:22:48 +01:00
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00
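
Commit fe676585f5 picks one of three storage forms based on length and ASCII-ness. The sketch below models only that decision, and only for UTF-16 input. The names `StorageKind`, `choose_storage`, and `inline_capacity` are hypothetical, and the inline capacity of 16 is an assumed value, since the commit message does not state the real threshold. The trailing checks encode the message's "😀" example: U+1F600 occupies two UTF-16 code units, the first being 0xD83D, which is why non-ASCII text is never placed in the inline byte buffer.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical labels for the three storage forms described in the commit.
enum class StorageKind {
    InlineAscii,  // short ASCII text in an inline byte buffer
    OutlineAscii, // long ASCII text in an outlined char buffer
    OutlineUtf16, // any non-ASCII text in an outlined char16_t buffer
};

// Assumed threshold; the real inline capacity is not given in the commit message.
inline constexpr std::size_t inline_capacity = 16;

constexpr bool is_ascii(std::u16string_view text) {
    for (char16_t code_unit : text) {
        if (code_unit > 0x7F)
            return false;
    }
    return true;
}

constexpr StorageKind choose_storage(std::u16string_view text) {
    // Non-ASCII text is never inlined, so code-unit queries stay in UTF-16 terms.
    if (!is_ascii(text))
        return StorageKind::OutlineUtf16;
    if (text.size() <= inline_capacity)
        return StorageKind::InlineAscii;
    return StorageKind::OutlineAscii;
}

// The commit's example: U+1F600 is two UTF-16 code units, starting with 0xD83D.
static_assert(std::u16string_view(u"\U0001F600").size() == 2);
static_assert(std::u16string_view(u"\U0001F600")[0] == 0xD83D);
static_assert(choose_storage(u"\U0001F600") == StorageKind::OutlineUtf16);
```

Everything is constexpr, so the checks run at compile time (C++17 or later).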
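
Commit 86b1c78c1a adds a user-defined literal so that `u"hello"sv` produces a Utf16View. The following is a simplified, hypothetical stand-in: `Utf16ViewSketch` and the `_u16sv` suffix are illustrative names, not AK's API (user code uses a leading underscore because suffixes without one are reserved for the standard library). It shows that a literal operator receives the literal's pointer and its length in code units, excluding the terminating null, so an ordinary constexpr constructor is enough.

```cpp
#include <cstddef>

// Hypothetical, simplified view over char16_t data with constexpr accessors.
class Utf16ViewSketch {
public:
    constexpr Utf16ViewSketch(const char16_t* data, std::size_t length)
        : m_data(data)
        , m_length(length)
    {
    }

    constexpr std::size_t length_in_code_units() const { return m_length; }
    constexpr char16_t code_unit_at(std::size_t index) const { return m_data[index]; }

private:
    const char16_t* m_data { nullptr };
    std::size_t m_length { 0 };
};

// The compiler passes the literal's data pointer and its length (no null terminator),
// so one non-template function handles literals of every length.
constexpr Utf16ViewSketch operator""_u16sv(const char16_t* data, std::size_t length)
{
    return Utf16ViewSketch(data, length);
}
```

By contrast, a constructor templated on the array length is instantiated separately for each distinct length, which is one plausible source of the binary-size growth the commit mentions. For comparison, standard C++17 already yields a `std::u16string_view` from `u"hello"sv` after `using namespace std::literals;`.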
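
The fix in commit 8ec72d6906 rests on the fact that a text of N code units has N + 1 cursor positions, with position N being the end of text. A minimal sketch, assuming code point boundaries rather than the grapheme boundaries LibUnicode's segmenter actually computes. `align_boundary_sketch` and `move_left_sketch` are hypothetical names, not LibUnicode functions. Note that the end-of-text case must be handled before `text[index]` is read, since that is the one valid position with no code unit behind it.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

constexpr bool is_low_surrogate(char16_t code_unit)
{
    return code_unit >= 0xDC00 && code_unit <= 0xDFFF;
}

// Hypothetical: snap `index` back to the nearest code point boundary.
// Valid positions are 0 through text.size() inclusive.
constexpr std::optional<std::size_t> align_boundary_sketch(std::u16string_view text, std::size_t index)
{
    if (index > text.size())
        return std::nullopt; // genuinely out of bounds
    if (index == text.size())
        return index; // end of text is a valid boundary; do not reject it
    // Inside a surrogate pair (well-formed text assumed): move to the pair's start.
    if (index > 0 && is_low_surrogate(text[index]))
        return index - 1;
    return index;
}

// Hypothetical left-arrow handling: align, step back one code unit, align again.
constexpr std::optional<std::size_t> move_left_sketch(std::u16string_view text, std::size_t index)
{
    auto aligned = align_boundary_sketch(text, index);
    if (!aligned || *aligned == 0)
        return std::nullopt;
    return align_boundary_sketch(text, *aligned - 1);
}
```

With `u"a\U0001F600"` (three code units), moving left from position 3 lands on 1, skipping the whole emoji. If position 3 were rejected, the move would fail, which matches the bug the commit describes.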
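
Commit e6b7c8cde2 makes the UTF-8 and UTF-16 segmenter paths agree on one contract: an out-of-bounds index yields an empty Optional. The sketch below mirrors that contract with `std::optional` standing in for AK's Optional. The `next_boundary` overloads are hypothetical and find the next code point boundary, not the ICU-backed segmentation LibUnicode performs. In both overloads the bounds check runs before `text[index]` is read, so an index equal to the text length never reaches the `text[text.length()]` access the commit describes.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// UTF-8: advance past the lead byte and any continuation bytes (10xxxxxx).
constexpr std::optional<std::size_t> next_boundary(std::string_view text, std::size_t index)
{
    if (index >= text.size())
        return std::nullopt; // reading text[index] here would be out of bounds
    ++index;
    while (index < text.size() && (static_cast<unsigned char>(text[index]) & 0xC0) == 0x80)
        ++index;
    return index;
}

// UTF-16: advance one code unit, or two for a well-formed surrogate pair.
constexpr std::optional<std::size_t> next_boundary(std::u16string_view text, std::size_t index)
{
    if (index >= text.size())
        return std::nullopt; // same result as the UTF-8 path, rather than 0
    bool is_high = text[index] >= 0xD800 && text[index] <= 0xDBFF;
    if (is_high && index + 1 < text.size() && text[index + 1] >= 0xDC00 && text[index + 1] <= 0xDFFF)
        return index + 2;
    return index + 1;
}
```

Returning an empty optional instead of a sentinel such as 0 keeps "no boundary" distinguishable from the valid boundary at position 0.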

Renamed from Userland/Libraries/LibUnicode/Segmenter.cpp

5 commits