AK+Everywhere: Recognise that surrogates in utf16 aren't all that common

For the slight cost of counting code points when converting between encodings, plus a small amount of extra memory, this commit adds a fast path for UTF-16 substring and code point operations on strings that contain no surrogates. These operations seem to account for a significant chunk of the time spent in many regex benchmarks.
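The idea can be illustrated with a minimal sketch. It is not the code from this commit: `Utf16Data`, `make_utf16_data`, and `code_unit_offset_of` are hypothetical names chosen for illustration, and the standard library is used in place of AK types. The sketch assumes the code point count is stored next to the UTF-16 data, so that a count equal to the number of code units signals that no surrogate pairs are present.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical result type: UTF-16 code units plus the number of code points they encode.
struct Utf16Data {
    std::u16string data;
    std::size_t code_point_count { 0 };
};

static bool is_high_surrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
static bool is_low_surrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

// Hypothetical helper: counts code points once, when the string is created.
// A valid surrogate pair is two code units but one code point; an unpaired
// surrogate is counted as a single code point.
inline Utf16Data make_utf16_data(std::u16string units)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        if (is_high_surrogate(units[i]) && i + 1 < units.size() && is_low_surrogate(units[i + 1]))
            ++i; // Skip the low half of the pair.
        ++count;
    }
    return { std::move(units), count };
}

// Maps a code point index to a code unit offset.
inline std::size_t code_unit_offset_of(Utf16Data const& string, std::size_t code_point_index)
{
    // Fast path: with no surrogate pairs, code points and code units line up one-to-one.
    if (string.code_point_count == string.data.size())
        return code_point_index;

    // Slow path: walk the string, stepping over surrogate pairs.
    std::size_t offset = 0;
    for (std::size_t seen = 0; offset < string.data.size() && seen < code_point_index; ++seen) {
        bool is_pair = is_high_surrogate(string.data[offset])
            && offset + 1 < string.data.size()
            && is_low_surrogate(string.data[offset + 1]);
        offset += is_pair ? 2 : 1;
    }
    return offset;
}
```

In this sketch the count costs one extra pass at conversion time and one extra `size_t` per string, and in exchange every index lookup on surrogate-free text becomes constant-time.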
Author: https://github.com/alimpfard Commit: eea81738cd Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/4196 Reviewed-by: https://github.com/ADKaster ✅
2025-08-04 23:30:20 +00:00 · 2025-04-02 17:56:49 +02:00 · 2025-04-23 13:57:06 +00:00
commit eea81738cd
parent 86c756a589
11 changed files with 74 additions and 37 deletions
--- a/Libraries/LibWeb/DOMURL/URLSearchParams.cpp
+++ b/Libraries/LibWeb/DOMURL/URLSearchParams.cpp
@@ -327,8 +327,8 @@ void URLSearchParams::sort()
    // 1. Sort all name-value pairs, if any, by their names. Sorting must be done by comparison of code units. The relative order between name-value pairs with equal names must be preserved.
    insertion_sort(m_list, [](auto& a, auto& b) {
        // FIXME: There should be a way to do this without converting to utf16
-        auto a_utf16 = MUST(utf8_to_utf16(a.name));
-        auto b_utf16 = MUST(utf8_to_utf16(b.name));
+        auto a_utf16 = MUST(utf8_to_utf16(a.name)).data;
+        auto b_utf16 = MUST(utf8_to_utf16(b.name)).data;

        auto common_length = min(a_utf16.size(), b_utf16.size());