AK+Everywhere: Recognise that surrogates in utf16 aren't all that common

For the slight cost of counting code points when converting between
encodings and a teeny bit of memory, this commit adds a fast path for
substring and code point operations on UTF-16 strings that contain no
surrogate pairs.

These operations seem to account for a significant chunk of the time
spent in many regex benchmarks.
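
The idea can be sketched in standalone C++ as follows. This is only an illustration under assumptions: `Utf16Text`, `make_utf16_text`, and `code_unit_offset_of` are hypothetical names and do not mirror the actual AK types or functions. The sketch counts code points once, and if that count equals the number of code units, the string has no surrogate pairs and a code point index is already a code unit offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a conversion result that carries the code point
// count alongside the code units; this is not the actual AK type.
struct Utf16Text {
    std::u16string data;
    std::size_t code_point_count;

    // True when every code point occupies exactly one code unit.
    bool has_no_surrogate_pairs() const { return code_point_count == data.size(); }
};

static bool is_high_surrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
static bool is_low_surrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

// Counts code points once, up front. A well-formed surrogate pair counts as
// one code point; a lone surrogate also counts as one.
Utf16Text make_utf16_text(std::u16string units)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        if (is_high_surrogate(units[i]) && i + 1 < units.size() && is_low_surrogate(units[i + 1]))
            ++i; // skip the low half of the pair
        ++count;
    }
    return Utf16Text { std::move(units), count };
}

// Maps a code point index to a code unit offset.
std::size_t code_unit_offset_of(Utf16Text const& text, std::size_t code_point_index)
{
    assert(code_point_index <= text.code_point_count);

    // Fast path: without surrogate pairs the two indices are identical.
    if (text.has_no_surrogate_pairs())
        return code_point_index;

    // Slow path: walk the string, stepping over pairs two units at a time.
    std::size_t offset = 0;
    for (std::size_t seen = 0; seen < code_point_index && offset < text.data.size(); ++seen) {
        bool is_pair = is_high_surrogate(text.data[offset])
            && offset + 1 < text.data.size()
            && is_low_surrogate(text.data[offset + 1]);
        offset += is_pair ? 2 : 1;
    }
    return offset;
}
```

The trade-off in this sketch matches the one described above: one extra pass (or a counter maintained during conversion) plus one stored integer per string, in exchange for constant-time index mapping in the common surrogate-free case.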
Ali Mohammad Pur 2025-04-02 17:56:49 +02:00 committed by Andrew Kaster
parent 86c756a589
commit eea81738cd
Notes: github-actions[bot] 2025-04-23 13:57:06 +00:00
11 changed files with 74 additions and 37 deletions

@@ -572,7 +572,8 @@ JS_DEFINE_NATIVE_FUNCTION(GlobalObject::escape)
     // 2. Let length be the length of string.
     // 5. Let k be 0.
     // 6. Repeat, while k < length,
-    for (auto code_point : TRY_OR_THROW_OOM(vm, utf8_to_utf16(string))) {
+    auto utf16_conversion = TRY_OR_THROW_OOM(vm, utf8_to_utf16(string));
+    for (auto code_point : utf16_conversion.data) {
         // a. Let char be the code unit at index k within string.
         // b. If unescapedSet contains char, then