ladybird

70446 commits 1 branch 0 tags 478 MiB

Author	SHA1	Message	Date
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
Timothy Flynn	9fc3e72db2	AK+Everywhere: Allow lonely UTF-16 surrogates by default By definition, the web allows lonely surrogates by default. Let's have our string APIs reflect this, so we don't have to pass an allow option all over the place.	2025-07-03 09:51:56 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as a char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: `auto string = u"hello"sv;` This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	efa9737cf7	AK+LibJS: Do not set UTF-16 code point length to its code unit length	2025-06-25 22:20:47 +02:00
Ali Mohammad Pur	eea81738cd	AK+Everywhere: Recognise that surrogates in utf16 aren't all that common For the slight cost of counting code points when converting between encodings and a teeny bit of memory, this commit adds a fast path for all-happy utf-16 substrings and code point operations. This seems to be a significant chunk of time spent in many regex benchmarks.	2025-04-23 07:56:02 -06:00
Andreas Kling	4dc63ddf49	LibJS: Make Optional<Utf16String> use less space By specializing the template, we can shrink it from 16 to 8 bytes. This makes PrimitiveString a measly 32 bytes. :^)	2025-03-30 07:16:40 +01:00
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00
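
The storage rules described in commit fe676585f5 can be sketched roughly as follows. This is a minimal illustration, not Ladybird's code: `Storage`, `choose_storage`, `code_unit_at`, and `assumed_inline_capacity` are hypothetical names, and the inline threshold of 7 code units is an assumption, since the commit message does not state the real limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical names throughout; this is not the AK API.
enum class Storage {
    InlineAscii,   // short ASCII text kept in an inlined byte buffer
    OutlinedAscii, // long ASCII text kept in an outlined char buffer
    OutlinedUtf16, // any non-ASCII text kept in an outlined char16 buffer
};

// Assumed threshold for "short"; the commit message does not give the real value.
constexpr std::size_t assumed_inline_capacity = 7;

constexpr bool is_ascii(std::u16string_view text)
{
    for (char16_t unit : text) {
        if (unit > 0x7F)
            return false;
    }
    return true;
}

// Non-ASCII text is never inlined, regardless of its length.
constexpr Storage choose_storage(std::u16string_view text)
{
    if (!is_ascii(text))
        return Storage::OutlinedUtf16;
    return text.size() <= assumed_inline_capacity ? Storage::InlineAscii : Storage::OutlinedAscii;
}

constexpr char16_t code_unit_at(std::u16string_view text, std::size_t index)
{
    assert(index < text.size());
    return text[index];
}

// U+1F600 is 4 bytes in UTF-8 but a 2-unit surrogate pair in UTF-16.
constexpr std::string_view grinning_face_utf8 = "\xF0\x9F\x98\x80";
constexpr std::u16string_view grinning_face_utf16 = u"\U0001F600";

static_assert(grinning_face_utf8.size() == 4);
static_assert(grinning_face_utf16.size() == 2);
static_assert(code_unit_at(grinning_face_utf16, 0) == 0xD83D);
static_assert(choose_storage(grinning_face_utf16) == Storage::OutlinedUtf16);
```

The final assertions mirror the example in the commit message: had the emoji been inlined as its 4 UTF-8 bytes, a byte-oriented length would report 4 rather than the 2 code units callers expect.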

Renamed from Userland/Libraries/LibJS/Runtime/Utf16String.cpp

7 commits
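
The fast path mentioned in commit eea81738cd can be illustrated with a small sketch. It assumes the idea is to count code points once, cache that count, and skip surrogate decoding whenever the count equals the code unit length. `CodePointIndex` and its members are hypothetical names, not the actual AK implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

constexpr bool is_high_surrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

// True if a well-formed surrogate pair starts at `offset`.
constexpr bool is_pair_at(std::u16string_view text, std::size_t offset)
{
    return offset + 1 < text.size()
        && is_high_surrogate(text[offset])
        && is_low_surrogate(text[offset + 1]);
}

// Lonely surrogates count as one code point each, matching the web's behavior.
constexpr std::size_t count_code_points(std::u16string_view text)
{
    std::size_t count = 0;
    for (std::size_t offset = 0; offset < text.size(); ++count)
        offset += is_pair_at(text, offset) ? 2 : 1;
    return count;
}

// Hypothetical wrapper that pays for one count up front and caches it.
class CodePointIndex {
public:
    constexpr explicit CodePointIndex(std::u16string_view text)
        : m_text(text)
        , m_length_in_code_points(count_code_points(text))
    {
    }

    constexpr std::size_t length_in_code_points() const { return m_length_in_code_points; }

    // When no pairs exist, code point indices and code unit offsets coincide.
    constexpr bool has_one_unit_per_code_point() const { return m_length_in_code_points == m_text.size(); }

    constexpr std::size_t code_unit_offset_of(std::size_t code_point_index) const
    {
        assert(code_point_index <= m_length_in_code_points);
        if (has_one_unit_per_code_point())
            return code_point_index; // fast path: no decoding needed

        // Slow path: walk the text, stepping over surrogate pairs.
        std::size_t offset = 0;
        for (std::size_t seen = 0; seen < code_point_index; ++seen)
            offset += is_pair_at(m_text, offset) ? 2 : 1;
        return offset;
    }

private:
    std::u16string_view m_text;
    std::size_t m_length_in_code_points;
};

static_assert(CodePointIndex(u"hello").has_one_unit_per_code_point());
static_assert(CodePointIndex(u"a\U0001F600b").code_unit_offset_of(2) == 3);
```

The cached count is the "teeny bit of memory" trade-off: one extra integer per string buys constant-time indexing for text without surrogate pairs.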