AK+Everywhere: Recognise that surrogates in utf16 aren't all that common

For the slight cost of counting code points when converting between encodings, plus a teeny bit of memory, this commit adds a fast path for substring and code point operations on UTF-16 strings that contain no surrogates. Those operations seem to account for a significant chunk of the time spent in many regex benchmarks.
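The idea behind the fast path can be sketched in standard C++, assuming well-formed input. The names `ConvertedUtf16`, `utf8_to_utf16_counting` and `Utf16ViewSketch` are hypothetical illustrations, not Ladybird's AK API. Conversion counts code points as it goes. A view that knows both its code unit length and its code point length can then treat a code point index as a code unit index whenever the two lengths are equal, because only surrogate pairs make them diverge.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical conversion result: UTF-16 code units plus the number of code
// points counted during conversion (an illustration, not AK's actual type).
struct ConvertedUtf16 {
    std::u16string code_units;
    std::size_t code_point_count { 0 };
};

// Converts UTF-8 to UTF-16 and counts code points as a side effect.
// Assumes well-formed UTF-8; validation is omitted for brevity.
ConvertedUtf16 utf8_to_utf16_counting(std::string_view utf8)
{
    static constexpr unsigned char lead_masks[] = { 0x7F, 0x1F, 0x0F, 0x07 };
    ConvertedUtf16 result;
    for (std::size_t i = 0; i < utf8.size();) {
        auto lead = static_cast<unsigned char>(utf8[i]);
        std::size_t length = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t code_point = lead & lead_masks[length - 1];
        for (std::size_t k = 1; k < length; ++k)
            code_point = (code_point << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        i += length;

        if (code_point >= 0x10000) {
            // Supplementary code point: emit a high/low surrogate pair.
            code_point -= 0x10000;
            result.code_units.push_back(static_cast<char16_t>(0xD800 + (code_point >> 10)));
            result.code_units.push_back(static_cast<char16_t>(0xDC00 + (code_point & 0x3FF)));
        } else {
            result.code_units.push_back(static_cast<char16_t>(code_point));
        }
        ++result.code_point_count; // The extra counting work done during conversion.
    }
    return result;
}

// Hypothetical view that caches its code point count next to the code units.
class Utf16ViewSketch {
public:
    Utf16ViewSketch(std::u16string_view code_units, std::size_t code_point_count)
        : m_code_units(code_units)
        , m_code_point_count(code_point_count)
    {
    }

    // The view borrows converted.code_units, so `converted` must outlive it.
    explicit Utf16ViewSketch(ConvertedUtf16 const& converted)
        : Utf16ViewSketch(converted.code_units, converted.code_point_count)
    {
    }

    // Equal lengths mean no surrogate pairs (for well-formed UTF-16).
    bool has_no_surrogates() const { return m_code_point_count == m_code_units.size(); }

    std::size_t length_in_code_points() const { return m_code_point_count; }

    // Maps a code point index to a code unit offset.
    std::size_t code_unit_offset_of(std::size_t code_point_index) const
    {
        if (has_no_surrogates())
            return code_point_index; // Fast path: the indices coincide, O(1).
        std::size_t offset = 0;      // Slow path: walk the code units, O(n).
        for (std::size_t seen = 0; seen < code_point_index; ++seen)
            offset += is_high_surrogate(m_code_units[offset]) ? 2 : 1;
        return offset;
    }

    char32_t code_point_at(std::size_t code_point_index) const
    {
        std::size_t offset = code_unit_offset_of(code_point_index);
        assert(offset < m_code_units.size());
        char16_t unit = m_code_units[offset];
        if (!is_high_surrogate(unit))
            return unit;
        assert(offset + 1 < m_code_units.size());
        char16_t low = m_code_units[offset + 1];
        return 0x10000 + ((char32_t(unit) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
    }

    Utf16ViewSketch substring_view(std::size_t code_point_start, std::size_t code_point_length) const
    {
        std::size_t begin = code_unit_offset_of(code_point_start);
        std::size_t end = code_unit_offset_of(code_point_start + code_point_length);
        return { m_code_units.substr(begin, end - begin), code_point_length };
    }

private:
    static bool is_high_surrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }

    std::u16string_view m_code_units;
    std::size_t m_code_point_count;
};
```

For example, a view over `utf8_to_utf16_counting("hello")` answers `code_point_at(4)` by indexing directly, while a string containing U+1F600 (stored as a surrogate pair) falls back to the linear walk. Strings that do contain surrogates pay only one extra comparison per operation, which is the trade-off the title relies on.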
Author: https://github.com/alimpfard
Commit: eea81738cd
Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/4196
Reviewed-by: https://github.com/ADKaster
2025-08-04 15:19:42 +00:00 · 2025-04-02 17:56:49 +02:00 · 2025-04-02 17:56:49 +02:00 · eea81738cd · 2025-04-23 13:57:06 +00:00
commit eea81738cd
parent 86c756a589
11 changed files with 74 additions and 37 deletions
--- a/Libraries/LibJS/Runtime/RegExpObject.cpp
+++ b/Libraries/LibJS/Runtime/RegExpObject.cpp
@@ -97,8 +97,8 @@ ErrorOr<String, ParseRegexPatternError> parse_regex_pattern(StringView pattern,
    if (utf16_pattern_result.is_error())
        return ParseRegexPatternError { "Out of memory"_string };

-    auto utf16_pattern = utf16_pattern_result.release_value();
-    Utf16View utf16_pattern_view { utf16_pattern };
+    auto utf16_result = utf16_pattern_result.release_value();
+    Utf16View utf16_pattern_view { utf16_result };
    StringBuilder builder;

    // If the Unicode flag is set, append each code point to the pattern. Otherwise, append each