ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-21 23:50:06 +00:00

Author	SHA1	Message	Date
Timothy Flynn	298ec6a12a	AK: Ensure StringBuilder encodes U+10000 as 2 UTF-16 code units	2025-08-07 02:05:50 +02:00
Timothy Flynn	1b611fba67	AK: Ensure Utf16FlyString is hash-compatible with Utf16View/Utf16String	2025-08-07 02:05:50 +02:00
Timothy Flynn	2dc0a3b3ce	AK: Add trim methods to Utf16String that skip allocation when not needed. If the string does not begin with any of the provided code units, we do not need to create a new string.	2025-08-05 15:13:36 +02:00
Timothy Flynn	0bf565b97f	AK: Allow comparing UTF-16 strings to UTF-8 strings. Before now, you could compare a Utf16View to a StringView, but it would only be valid if the StringView were ASCII. When porting code to UTF-16, it will be handy to have a code point-aware implementation for non-ASCII StringViews.	2025-08-05 07:07:15 -04:00
Timothy Flynn	13ed6aba71	AK+LibIPC: Implement an encoder/decoder for UTF-16 strings	2025-08-02 10:10:14 -07:00
Timothy Flynn	a740bfd8ff	AK+LibUnicode: Implement Unicode-aware UTF-16 case transformations	2025-07-25 18:16:22 +02:00
Timothy Flynn	df77ae1920	AK: Implement creating a UTF-16 string from a repeated code point	2025-07-25 18:16:22 +02:00
Timothy Flynn	f53389bab1	AK: Add a couple of Utf16String factories: Utf16String::from_utf8_with_replacement_character and Utf16String::from_code_point	2025-07-24 19:00:20 +02:00
Timothy Flynn	2803d66d87	AK: Support UTF-16 string formatting. The underlying storage used during string formatting is StringBuilder. To support UTF-16 strings, this patch allows callers to specify a mode during StringBuilder construction. The default mode is UTF-8, for which StringBuilder remains unchanged. In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a series of u16 code units. Appending a single character will append 2 bytes for that character (cast to a char16_t). Appending a StringView will transcode the string to UTF-16. Utf16String also gains the same memory optimization that we added for String, where we hand-off the underlying buffer to Utf16String to avoid having to re-allocate. In the future, we may want to further optimize for ASCII strings. For example, we could defer committing to the u16-esque storage until we see a non-ASCII code point.	2025-07-18 12:45:38 -04:00
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage. This is a strictly UTF-16 string with some optimizations for ASCII: (1) If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. (2) If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. (3) If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
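The code unit counts quoted in these commit messages (U+10000 taking 2 UTF-16 code units, and "😀" having `code_unit_at(0)` equal to 0xD83D) follow from the standard surrogate pair encoding. The sketch below uses only the C++ standard library; `encode_utf16` is a hypothetical helper written for illustration and is not AK's implementation. It assumes the input is a valid Unicode scalar value and does no validation.

```cpp
#include <vector>

// Hypothetical helper, not part of AK: encode one code point as UTF-16.
// Assumes `code_point` is a valid Unicode scalar value (no validation here).
std::vector<char16_t> encode_utf16(char32_t code_point)
{
    // Code points below U+10000 fit in a single 16-bit code unit.
    if (code_point < 0x10000)
        return { static_cast<char16_t>(code_point) };

    // Otherwise, split the 20-bit offset from U+10000 into two 10-bit halves:
    // a high surrogate (0xD800..0xDBFF) and a low surrogate (0xDC00..0xDFFF).
    code_point -= 0x10000;
    return {
        static_cast<char16_t>(0xD800 + (code_point >> 10)),
        static_cast<char16_t>(0xDC00 + (code_point & 0x3FF)),
    };
}
```

With this helper, `encode_utf16(0x10000)` yields the two code units 0xD800 and 0xDC00, and `encode_utf16(0x1F600)` (the "😀" code point) yields 0xD83D and 0xDE00.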

10 commits
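
Commit 0bf565b97f describes a code point-aware comparison between UTF-16 and UTF-8 text. One way such a comparison could work is to decode both sides one code point at a time and compare the decoded values. The sketch below is an assumption about the general technique, not the code from that commit; `next_utf8`, `next_utf16`, and `equals` are hypothetical names, and both decoders assume well-formed input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration only; these are not AK's APIs.
// Both decoders assume well-formed input and perform no validation.

// Decode the UTF-8 code point starting at index `i` and advance `i` past it.
char32_t next_utf8(std::string_view s, std::size_t& i)
{
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    unsigned char lead = byte(i);
    // The lead byte determines how many continuation bytes follow.
    std::size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    // Keep only the payload bits of the lead byte (5, 4, or 3 bits).
    char32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
    for (std::size_t k = 1; k <= extra; ++k)
        cp = (cp << 6) | (byte(i + k) & 0x3F);
    i += extra + 1;
    return cp;
}

// Decode the UTF-16 code point starting at index `i` and advance `i` past it.
char32_t next_utf16(std::u16string_view s, std::size_t& i)
{
    char32_t unit = s[i++];
    // A high surrogate followed by another unit forms one code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i < s.size()) {
        char32_t low = s[i++];
        return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
    }
    return unit;
}

// Compare the two strings code point by code point.
bool equals(std::u16string_view utf16, std::string_view utf8)
{
    std::size_t i = 0, j = 0;
    while (i < utf16.size() && j < utf8.size()) {
        if (next_utf16(utf16, i) != next_utf8(utf8, j))
            return false;
    }
    // Equal only if both inputs were fully consumed.
    return i == utf16.size() && j == utf8.size();
}
```

Decoding on the fly avoids allocating a transcoded copy of either string, at the cost of doing the decoding work on every comparison.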