AK+Everywhere: Recognise that surrogates in utf16 aren't all that common

For the slight cost of counting code points when converting between encodings, plus a small amount of extra memory, this commit adds a fast path for substrings and code point operations on UTF-16 strings that contain no surrogates. These operations appear to account for a significant chunk of the time spent in many regex benchmarks.
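The idea behind the fast path can be illustrated with a minimal C++ sketch. It rests on stated assumptions: `Utf16ConversionResultSketch`, `utf8_to_utf16_sketch` and `Utf16ViewSketch` are hypothetical names, not AK's API, and only the standard library is used. The switch to `utf16_conversion.data` in the diff suggests that `utf8_to_utf16` now returns more than the bare code units, so the sketch assumes a result that carries a code point count next to the data. If that count equals the number of code units, every code point occupies exactly one unit. No surrogate pairs are present, so a code point offset is also a code unit offset and no scanning is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical result type: UTF-16 code units plus the number of code points
// counted during conversion. Not necessarily the layout AK uses.
struct Utf16ConversionResultSketch {
    std::vector<char16_t> data;
    std::size_t code_point_count { 0 };
};

// UTF-8 -> UTF-16 conversion that counts code points as a side effect.
// Rejects truncated sequences, bad continuation bytes, encoded surrogates and
// values above U+10FFFF; overlong forms are not checked in this sketch.
std::optional<Utf16ConversionResultSketch> utf8_to_utf16_sketch(std::string_view utf8)
{
    Utf16ConversionResultSketch result;
    std::size_t i = 0;
    while (i < utf8.size()) {
        auto lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t code_point = 0;
        std::size_t length = 0;
        if (lead < 0x80) { code_point = lead; length = 1; }
        else if ((lead & 0xE0) == 0xC0) { code_point = lead & 0x1F; length = 2; }
        else if ((lead & 0xF0) == 0xE0) { code_point = lead & 0x0F; length = 3; }
        else if ((lead & 0xF8) == 0xF0) { code_point = lead & 0x07; length = 4; }
        else return std::nullopt;
        if (i + length > utf8.size())
            return std::nullopt;
        for (std::size_t k = 1; k < length; ++k) {
            auto continuation = static_cast<unsigned char>(utf8[i + k]);
            if ((continuation & 0xC0) != 0x80)
                return std::nullopt;
            code_point = (code_point << 6) | (continuation & 0x3F);
        }
        if ((code_point >= 0xD800 && code_point <= 0xDFFF) || code_point > 0x10FFFF)
            return std::nullopt;
        i += length;
        if (code_point >= 0x10000) {
            // Astral code point: two code units (a surrogate pair), one code point.
            code_point -= 0x10000;
            result.data.push_back(static_cast<char16_t>(0xD800 + (code_point >> 10)));
            result.data.push_back(static_cast<char16_t>(0xDC00 + (code_point & 0x3FF)));
        } else {
            result.data.push_back(static_cast<char16_t>(code_point));
        }
        ++result.code_point_count;
    }
    return result;
}

// Hypothetical view that uses the cached count to skip surrogate scanning.
class Utf16ViewSketch {
public:
    Utf16ViewSketch(std::u16string_view units, std::size_t code_point_count)
        : m_units(units)
        , m_code_point_count(code_point_count)
    {
    }

    // One code unit per code point means there are no surrogate pairs.
    bool is_surrogate_free() const { return m_code_point_count == m_units.size(); }

    // Translate a code point offset into a code unit offset (clamped to the end).
    std::size_t code_unit_offset_of(std::size_t code_point_offset) const
    {
        if (is_surrogate_free())
            return std::min(code_point_offset, m_units.size()); // Fast path: O(1).
        // Slow path: walk the units, stepping over each valid surrogate pair.
        std::size_t unit = 0;
        for (std::size_t cp = 0; cp < code_point_offset && unit < m_units.size(); ++cp) {
            bool is_high = m_units[unit] >= 0xD800 && m_units[unit] <= 0xDBFF;
            bool next_is_low = unit + 1 < m_units.size()
                && m_units[unit + 1] >= 0xDC00 && m_units[unit + 1] <= 0xDFFF;
            unit += (is_high && next_is_low) ? 2 : 1;
        }
        return unit;
    }

    // Substring measured in code points, returned as a view of code units.
    std::u16string_view substring_by_code_points(std::size_t start, std::size_t length) const
    {
        std::size_t begin = code_unit_offset_of(start);
        std::size_t end = code_unit_offset_of(start + length);
        return m_units.substr(begin, end - begin);
    }

private:
    std::u16string_view m_units;
    std::size_t m_code_point_count;
};
```

The trade-off mirrors the one described in the commit message: one extra counter per conversion and one `size_t` of storage per string buy constant-time offset translation in the common surrogate-free case. Strings that do contain astral characters, such as emoji, still take the linear path.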
Author: https://github.com/alimpfard
Commit: eea81738cd
Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/4196
Reviewed-by: https://github.com/ADKaster
2025-07-24 18:02:20 +00:00 · 2025-04-02 17:56:49 +02:00 · 2025-04-02 17:56:49 +02:00 · eea81738cd · 2025-04-23 13:57:06 +00:00
commit eea81738cd
parent 86c756a589
11 changed files with 74 additions and 37 deletions
--- a/Libraries/LibJS/Runtime/GlobalObject.cpp
+++ b/Libraries/LibJS/Runtime/GlobalObject.cpp
@@ -572,7 +572,8 @@ JS_DEFINE_NATIVE_FUNCTION(GlobalObject::escape)
    // 2. Let length be the length of string.
    // 5. Let k be 0.
    // 6. Repeat, while k < length,
-    for (auto code_point : TRY_OR_THROW_OOM(vm, utf8_to_utf16(string))) {
+    auto utf16_conversion = TRY_OR_THROW_OOM(vm, utf8_to_utf16(string));
+    for (auto code_point : utf16_conversion.data) {
        // a. Let char be the code unit at index k within string.

        // b. If unescapedSet contains char, then