Commit graph

8 commits

Author SHA1 Message Date
Ali Mohammad Pur
d2e51fafa9 LibRegex: Merge alternations based on blocks and not instructions
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.

Also adds the minimal test case(s) from that issue.
2021-12-15 19:36:45 +03:30
Ali Mohammad Pur
387df06385 LibRegex: Avoid rewriting a+ as a* as part of atomic rewriting
The initial `ForkStay` is only needed if the looping block has a
following block, if there's no following block or the following block
does not attempt to match anything, we should not insert the ForkStay,
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
2021-11-18 09:09:22 +01:00
Ali Mohammad Pur
ac856cb965 LibRegex: Don't ignore empty alternatives in append_alternation()
Doing so would cause patterns like `(a|)` to not match the empty string.
2021-10-29 15:57:59 +02:00
Ali Mohammad Pur
8f722302d9 LibRegex: Use a match table for character classes
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
2021-10-03 19:16:36 +02:00
Andreas Kling
2758d99bbc LibRegex: Flatten bytecode before performing optimizations
This avoids doing DisjointChunks traversal for every bytecode access,
significantly reducing startup time for large regular expressions.
2021-09-29 18:45:26 +02:00
Ali Mohammad Pur
741886a4c4 LibRegex: Make the optimizer understand references and capture groups
Otherwise the fork in patterns like `(1+)\1` would be (incorrectly)
optimized away.
2021-09-15 15:52:28 +04:30
Ali Mohammad Pur
bf0315ff8f LibRegex: Avoid excessive Vector copy when compiling regexps
Previously we would've copied the bytecode instead of moving the chunks
around, use the fancy new DisjointChunks<T> abstraction to make that
happen automagically.
This decreases vector copies and uses of memmove() by nearly 10x :^)
2021-09-14 21:33:15 +04:30
Ali Mohammad Pur
246ab432ff LibRegex: Add a basic optimization pass
This currently tries to convert forking loops to atomic groups, and
unify the left side of alternations.
2021-09-13 14:38:53 +04:30