Ali Mohammad Pur
5b45223d5f
LibRegex: Account for uppercase characters in insensitive patterns
2025-07-12 11:26:23 +02:00
Shannon Booth
bd6581fe22
LibRegex: Correctly use ClassSetReservedPunctuator in ClassSetCharacter
...
CI / macOS, arm64, Sanitizer, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
We had typo'd using ClassSetReservedDoublePunctuator which was
resulting in a parse error for the regex:
([^\\:]+?)
With the 'v' flag set.
Co-Authored-By: Ali Mohammad Pur <mpfard@serenityos.org>
2025-07-10 11:41:02 +02:00
ayeteadoe
25f5936dee
CMake: Rename serenity_* helper functions/macros to ladybird_*
2025-07-03 23:19:41 +02:00
Timothy Flynn
62d9a84b8d
AK+Everywhere: Replace custom number parsers with fast_float
...
CI / macOS, arm64, Sanitizer_CI, Clang (push) Waiting to run
CI / Linux, x86_64, Fuzzers_CI, Clang (push) Waiting to run
CI / Linux, x86_64, Sanitizer_CI, GNU (push) Waiting to run
CI / Linux, x86_64, Sanitizer_CI, Clang (push) Waiting to run
Package the js repl as a binary artifact / Linux, arm64 (push) Waiting to run
Package the js repl as a binary artifact / macOS, arm64 (push) Waiting to run
Package the js repl as a binary artifact / Linux, x86_64 (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
Build Dev Container Image / build (push) Has been cancelled
Our floating point number parser was based on the fast_float library:
https://github.com/fastfloat/fast_float
However, our implementation only supports 8-bit characters. To support
UTF-16, we will need to be able to convert char16_t-based strings to
numbers as well. This works out-of-the-box with fast_float.
We can also use fast_float for integer parsing.
2025-07-03 09:51:56 -04:00
Timothy Flynn
9fc3e72db2
AK+Everywhere: Allow lonely UTF-16 surrogates by default
...
By definition, the web allows lonely surrogates by default. Let's have
our string APIs reflect this, so we don't have to pass an allow option
all over the place.
2025-07-03 09:51:56 -04:00
Timothy Flynn
86b1c78c1a
AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string
...
To prepare for an upcoming Utf16String, this migrates Utf16View to store
its data as a char16_t. Most function definitions are moved inline and
made constexpr.
This also adds a UDL to construct a Utf16View from a string literal:
auto string = u"hello"sv;
This let's us remove the NTTP Utf16View constructor, as we have found
that such constructors bloat binary size quite a bit.
2025-07-03 09:51:56 -04:00
Ali Mohammad Pur
b0e471228d
LibRegex: Avoid use-after-return of MatchState in 'is_an_eligible_jump'
...
The opcode may have last been accessed by
block_satisfies_atomic_rewrite_precondition, which would set it to a
state that no longer exists.
Set the state to the correct one unconditionally to ensure we're looking
at the right value.
Fixes #5145 .
2025-06-24 18:43:01 +02:00
Ali Mohammad Pur
2947ae7d6e
LibRegex: Move required bytecode.flatten() outside optimization function
...
Not running the optimization passes should not leave the bytecode in a
broken state. Fixes #5146 .
2025-06-24 18:43:01 +02:00
aplefull
486602e796
LibRegex: Fix handling of + quantifier with zero-width matches
...
CI / Lagom (arm64, Sanitizer_CI, false, macOS, macos-15, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, Linux, blacksmith-16vcpu-ubuntu-2404, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, Linux, blacksmith-16vcpu-ubuntu-2404, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, Linux, blacksmith-16vcpu-ubuntu-2404, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macOS, macOS-arm64, macos-15) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, Linux, Linux-x86_64, blacksmith-8vcpu-ubuntu-2404) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
Small change that allows quantifiers using Fork* forms (e.g., +) to
succeed after one match, even if that match has zero width.
2025-06-02 15:52:26 +02:00
Ali Mohammad Pur
cfc241f61d
LibRegex: Make the trie rewrite optimisation maintain the alt order
...
This is required by the spec.
2025-05-21 14:28:45 +02:00
Ali Mohammad Pur
2eccd68ba5
LibRegex: Document the append_alternative optimisation a bit
2025-05-21 14:28:45 +02:00
Timothy Flynn
7280ed6312
Meta: Enforce newlines around namespaces
...
This has come up several times during code review, so let's just enforce
it using a new clang-format 20 option.
2025-05-14 02:01:59 -06:00
ayeteadoe
a3754a7bf1
LibRegex: Annotate classes with export macro for hidden visibility
...
This fix demos the gradual opt-in migration process libraries can
take to switch to explicit symbol exports via the FOO_API macros.
2025-05-12 03:22:23 -06:00
Andrew Kaster
3dd2fbd041
LibRegex: Move StringTable ctor/dtor out of line
...
This also moves the next_serial class static into a file scope static.
The public class static was causing visibility issues with certain Linux
builds when hidden visibility was enabled. However, the current design
makes more sense anyway :^).
2025-05-12 03:22:23 -06:00
ayeteadoe
0253342c1a
LibRegex: Use NO_UNIQUE_ADDRESS
in RegexMatch for Windows support
...
Clang's `x86_64-pc-windows-msvc` target requires
`[[msvc::no_unique_address]]`, which is properly set in the
`NO_UNIQUE_ADDRESS` macro in `AK/Platform.h`. Without this, building
on Windows fails due to `-Wunknown-attributes`.
2025-05-12 03:22:23 -06:00
Ali Mohammad Pur
022cd1adca
LibRegex: Use the right offset when patching jumps through fork-trees
...
Fixes #4474 .
2025-04-27 12:16:15 +02:00
Ali Mohammad Pur
fca1d33fec
LibRegex: Correctly calculate the target for Repeat in table alts
...
Fixes a bunch of websites breaking because we now verify jump offsets by
trying to remove 0-offset jumps.
This has been broken for a good while, it was just rare to see Repeat
inside alternatives that lended themselves well to tree alts.
2025-04-24 01:17:27 -06:00
Ali Mohammad Pur
4b9abdb963
LibRegex: Remove useless jumps (Jump* +0) before running opts
...
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
This leads to some more significant performance increases on the simple
/<script|<style|<link/ regex in speedometer (~2x)
2025-04-23 22:57:49 +02:00
Ali Mohammad Pur
ec0836c9ea
LibRegex: Don't blindly treat multi-target tree jumps as a single jump
...
The tree generation was broken, we just didn't notice it because it was
very rarely being picked for more complex bytecodes.
2025-04-23 22:57:49 +02:00
Ali Mohammad Pur
09eb28ee1d
LibRegex: Better estimate the cost of laying out alts as a chain
...
Previously we were counting the total number of *nodes* in the tree for
the chain cost, which greatly underestimated its cost when large
bytecode entries were present,
This commit switches to estimating it using the total bytecode *size*,
which is a closer value to the true cost than the tree node count.
This corresponds to a ~4x perf improvement on /<script|<style|<link/ in
speedometer.
2025-04-23 22:57:49 +02:00
Ali Mohammad Pur
eea81738cd
AK+Everywhere: Recognise that surrogates in utf16 aren't all that common
...
For the slight cost of counting code points when converting between
encodings and a teeny bit of memory, this commit adds a fast path for
all-happy utf-16 substrings and code point operations.
This seems to be a significant chunk of time spent in many regex
benchmarks.
2025-04-23 07:56:02 -06:00
Ali Mohammad Pur
3b4a184f1a
LibRegex: Avoid hashing the state hashes again
...
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
We already had a really nice hash that had a single issue, this commit
fixes that and makes it *the* hash for the hash table, so we avoid
double-hashing and making a long chain.
This is an easy 10% perf gain.
2025-04-18 17:09:27 +02:00
Ali Mohammad Pur
446a453719
LibRegex: Pull out the first compare to avoid unnecessary execution
...
This adds a fast-path to drop view indices we know will not match
immediately without going through the regex VM.
2025-04-18 17:09:27 +02:00
Ali Mohammad Pur
76f5dce3db
LibRegex: Flatten capture group list in MatchState
...
This makes copying the capture group COWVector significantly cheaper,
as we no longer have to run any constructors for it - just memcpy.
2025-04-18 17:09:27 +02:00
Andreas Kling
ca2f0141f6
LibRegex: Remove unused "simple substring search" optimization
...
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
This is not relevant for LibJS since it only works when the input is
UTF-8, and LibJS always provides UTF-16.
2025-04-16 10:04:50 +02:00
Andreas Kling
96f1f15ad6
LibRegex: Remove unused Utf8View/Utf32View support in RegexStringView
2025-04-16 10:04:50 +02:00
Andreas Kling
87ec5b32b0
LibRegex: Use ReadonlySpan to peek into OpCode_Compare LUTs
...
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
By the time we're executing bytecode, we know the the bytecode will be
flattened. This means we can use ReadonlySpan to look into it instead of
DisjointChunks::spans(), which allocates.
2025-04-14 17:40:13 +02:00
Andreas Kling
c1c3b01a6c
LibRegex: Allow Vector<Match> to use trivial memcpy
...
Now that Match has no more members that need destruction, we can allow
Vector to memcpy them around.
2025-04-14 17:40:13 +02:00
Andreas Kling
5308d77600
LibRegex: Don't use Optional<T> inside regex::Match
...
This prevented Match from being trivially copyable, which we want it to
be for fast Vector copying.
2025-04-14 17:40:13 +02:00
Andreas Kling
54edf29f1b
LibRegex: Make Match::capture_group_name an index into the string table
...
This removes another Match member that required destruction. The "API"
for accessing the strings is definitely a bit awkward. We'll think of
something nicer eventually.
2025-04-14 17:40:13 +02:00
Andreas Kling
9d47cc54f8
LibRegex: Remove unused regex::Match::string and unused constructor
...
This shrinks regex::Match by 8 bytes and removes a member that needs
destruction.
2025-04-14 17:40:13 +02:00
Ali Mohammad Pur
69050da929
LibRegex: Merge inverse string table mappings separately
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
2025-04-06 20:21:16 +02:00
Ali Mohammad Pur
299b9ca572
LibRegex: Check backreference index before looking it up
...
If a backref happens after it's cleared, the slot may be cleared
already.
2025-04-06 20:21:16 +02:00
Jess
83e46b3728
LibRegex: Fix crash when parse result exceeds max cache size
...
Before, If the cache was empty we would try and evict non-existant
entries and crash. So the fix is to make sure that we don't saturate
the cache with a single parse result.
2025-04-04 16:10:25 +02:00
Ali Mohammad Pur
4136d8d13e
LibRegex: Use an interned string table for capture group names
...
This avoids messing around with unsafe string pointers and removes the
only non-FlyString-able user of DeprecatedFlyString.
2025-04-02 11:43:13 +02:00
Andreas Kling
e5db913b0d
Revert "LibRegex: Port remaining DeprecatedFlyString to ByteString"
...
This reverts commit aab3fbe254
.
Greatly regressed JavaScript benchmark performance.
2025-04-01 15:40:38 +02:00
Andreas Kling
7c32d1e8a5
Revert "Everywhere: Remove DeprecatedFlyString + any remaining references to it"
...
This reverts commit 3131e6369f
.
Greatly regressed JavaScript benchmark performance.
2025-04-01 15:40:27 +02:00
Kenneth Myhra
3131e6369f
Everywhere: Remove DeprecatedFlyString + any remaining references to it
2025-04-01 12:50:00 +02:00
Kenneth Myhra
aab3fbe254
LibRegex: Port remaining DeprecatedFlyString to ByteString
2025-04-01 12:50:00 +02:00
Andreas Kling
6b6d3b32a4
LibRegex: Remove the StringCopyMatches mode
...
This mode made a lot of incorrect assumptions about string lifetimes,
and instead of fixing it, let's just remove it and tweak the few unit
tests that used it.
2025-03-24 22:27:17 +00:00
Andreas Kling
46a5710238
LibJS: Use FlyString in PropertyKey instead of DeprecatedFlyString
...
This required dealing with *substantial* fallout.
2025-03-24 22:27:17 +00:00
mikiubo
c85df78c4c
LibRegex: Remove orphaned save points in nested LookAhead
2025-03-17 16:11:02 +01:00
Tim Ledbetter
b9ac99d2eb
Revert "LibRegex: Remove orphaned save points in nested LookAhead"
...
This reverts commit f2678bfcb8
.
2025-03-14 19:57:33 +00:00
mikiubo
f2678bfcb8
LibRegex: Remove orphaned save points in nested LookAhead
2025-03-14 09:41:41 +01:00
Ali Mohammad Pur
5355710481
LibRegex: Don't treat single-jump blocks as noop in the optimizer
2025-03-09 14:37:57 +01:00
aplefull
389a63d6bf
LibRegex: Allow duplicate named capture groups in separate alternatives
2025-03-05 14:36:09 +01:00
aplefull
61744322ad
LibRegex: Ensure nullable quantifiers backtrack when input remains
...
Makes patterns like `/(a?b??)*/` correctly match the string
2025-03-02 15:19:04 +01:00
Ali Mohammad Pur
ea3b7efd91
LibRegex: Treat the UnicodeSets flag as Unicode
...
Fixes /.../v not being interpreted as a unicode pattern.
2025-02-28 14:31:45 -05:00
mikiubo
8a6f7b787e
LibRegex: Use depth-first search in regex optimizer
...
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (macos-14, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
use depth-first search in optimizer code bacause using breadth-first
search generate a bug. Add test example in test lib.
2025-02-25 00:09:20 +01:00
Ali Mohammad Pur
08ebfaff17
LibRegex: Take trailing inversion state into account in block comparison
...
Fixes #3421 .
2025-02-01 11:30:02 +01:00