Callum Law
8ada4b7fdc
LibRegex: Account for opcode size when calculating incoming jump edges
...
Not accounting for opcode size when calculating incoming jump edges
meant that we were merging nodes where we otherwise shouldn't have been,
for example /.*a|.*b/.
2025-07-28 17:06:58 +02:00
Ali Mohammad Pur
5b45223d5f
LibRegex: Account for uppercase characters in insensitive patterns
2025-07-12 11:26:23 +02:00
Ali Mohammad Pur
b0e471228d
LibRegex: Avoid use-after-return of MatchState in 'is_an_eligible_jump'
...
The opcode may have last been accessed by
block_satisfies_atomic_rewrite_precondition, which would set it to a
state that no longer exists.
Set the state to the correct one unconditionally to ensure we're looking
at the right value.
Fixes #5145 .
2025-06-24 18:43:01 +02:00
Ali Mohammad Pur
2947ae7d6e
LibRegex: Move required bytecode.flatten() outside optimization function
...
Not running the optimization passes should not leave the bytecode in a
broken state. Fixes #5146 .
2025-06-24 18:43:01 +02:00
Ali Mohammad Pur
cfc241f61d
LibRegex: Make the trie rewrite optimisation maintain the alt order
...
This is required by the spec.
2025-05-21 14:28:45 +02:00
Ali Mohammad Pur
2eccd68ba5
LibRegex: Document the append_alternative optimisation a bit
2025-05-21 14:28:45 +02:00
Timothy Flynn
7280ed6312
Meta: Enforce newlines around namespaces
...
This has come up several times during code review, so let's just enforce
it using a new clang-format 20 option.
2025-05-14 02:01:59 -06:00
Ali Mohammad Pur
022cd1adca
LibRegex: Use the right offset when patching jumps through fork-trees
...
Fixes #4474 .
2025-04-27 12:16:15 +02:00
Ali Mohammad Pur
fca1d33fec
LibRegex: Correctly calculate the target for Repeat in table alts
...
Fixes a bunch of websites breaking because we now verify jump offsets by
trying to remove 0-offset jumps.
This has been broken for a good while, it was just rare to see Repeat
inside alternatives that lended themselves well to tree alts.
2025-04-24 01:17:27 -06:00
Ali Mohammad Pur
4b9abdb963
LibRegex: Remove useless jumps (Jump* +0) before running opts
...
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
This leads to some more significant performance increases on the simple
/<script|<style|<link/ regex in speedometer (~2x)
2025-04-23 22:57:49 +02:00
Ali Mohammad Pur
ec0836c9ea
LibRegex: Don't blindly treat multi-target tree jumps as a single jump
...
The tree generation was broken, we just didn't notice it because it was
very rarely being picked for more complex bytecodes.
2025-04-23 22:57:49 +02:00
Ali Mohammad Pur
09eb28ee1d
LibRegex: Better estimate the cost of laying out alts as a chain
...
Previously we were counting the total number of *nodes* in the tree for
the chain cost, which greatly underestimated its cost when large
bytecode entries were present,
This commit switches to estimating it using the total bytecode *size*,
which is a closer value to the true cost than the tree node count.
This corresponds to a ~4x perf improvement on /<script|<style|<link/ in
speedometer.
2025-04-23 22:57:49 +02:00
Ali Mohammad Pur
446a453719
LibRegex: Pull out the first compare to avoid unnecessary execution
...
This adds a fast-path to drop view indices we know will not match
immediately without going through the regex VM.
2025-04-18 17:09:27 +02:00
Ali Mohammad Pur
76f5dce3db
LibRegex: Flatten capture group list in MatchState
...
This makes copying the capture group COWVector significantly cheaper,
as we no longer have to run any constructors for it - just memcpy.
2025-04-18 17:09:27 +02:00
Ali Mohammad Pur
69050da929
LibRegex: Merge inverse string table mappings separately
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (arm64, macos-15, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (x86_64, ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
2025-04-06 20:21:16 +02:00
Ali Mohammad Pur
4136d8d13e
LibRegex: Use an interned string table for capture group names
...
This avoids messing around with unsafe string pointers and removes the
only non-FlyString-able user of DeprecatedFlyString.
2025-04-02 11:43:13 +02:00
Ali Mohammad Pur
5355710481
LibRegex: Don't treat single-jump blocks as noop in the optimizer
2025-03-09 14:37:57 +01:00
Ali Mohammad Pur
ea3b7efd91
LibRegex: Treat the UnicodeSets flag as Unicode
...
Fixes /.../v not being interpreted as a unicode pattern.
2025-02-28 14:31:45 -05:00
mikiubo
8a6f7b787e
LibRegex: Use depth-first search in regex optimizer
...
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (macos-14, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
use depth-first search in optimizer code bacause using breadth-first
search generate a bug. Add test example in test lib.
2025-02-25 00:09:20 +01:00
Ali Mohammad Pur
08ebfaff17
LibRegex: Take trailing inversion state into account in block comparison
...
Fixes #3421 .
2025-02-01 11:30:02 +01:00
Ali Mohammad Pur
50733c564c
LibRegex: Use the *actually* correct repeat start offset for Repeat
...
Fixes #2931 and various frequent crashes.
2024-12-23 13:13:52 +01:00
Ali Mohammad Pur
358378c1c0
LibRegex: Pick the right target for OpCode_Repeat
...
Repeat's 'offset' field is a bit odd in that it is treated as a negative
offset, causing a backwards jump when positive; the optimizer didn't
correctly model this behaviour, which caused crashes and misopts when
dealing with Repeats.
This commit fixes that behaviour.
2024-12-13 10:00:16 +01:00
Ali Mohammad Pur
4a8d3e35a3
LibRegex: Add some more debugging info to bytecode block ranges
...
These were getting difficult to differentiate, now they each get a
comment on where they came from to aid with future debugging.
2024-12-13 10:00:16 +01:00
Ali Mohammad Pur
5a4d657a4e
LibRegex: Avoid generating ForkJumps when jumping to the next alt block
...
Fixes #2398 .
2024-11-17 20:12:39 +01:00
Ali Mohammad Pur
00bc22c332
LibRegex: Don't immediately ignore TempInverse in optimizer
...
fe46b2c141
added the reset-temp-inverse flag, but set it up so all
tempinverse ops were negated at the start of the next op; this commit
makes it so these flags actually persist for one op and not zero.
Fixes #2296 .
2024-11-17 09:03:29 -05:00
Ali Mohammad Pur
dabd60180f
LibRegex: Don't ignore references that weren't bound in checked blocks
...
Fixes #2281 .
2024-11-12 10:37:57 +01:00
Timothy Flynn
93712b24bf
Everywhere: Hoist the Libraries folder to the top-level
2024-11-10 12:50:45 +01:00