ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-19 14:40:18 +00:00

Author	SHA1	Message	Date
Eli Youngs	87a961534f	LibRegex: Prevent patterns from matching the empty string twice Previously, if a pattern matched the empty string (e.g. ".*"), it would match the string twice instead of once. Among other issues, this caused a Regex replacement to duplicate its expected output, since it would replace "both" empty matches.	2023-01-06 13:52:21 -07:00
Linus Groh	57dc179b1f	Everywhere: Rename to_{string => deprecated_string}() where applicable This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.	2022-12-06 08:54:33 +01:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Ali Mohammad Pur	f1851346d3	LibRegex: Use a copy-on-write vector for fork state	2022-11-17 20:13:04 +03:30
Ali Mohammad Pur	cfcd6e770c	LibRegex: Don't copy forked results twice	2022-11-17 20:13:04 +03:30
Tim Schumacher	8763dbcccc	Everywhere: Remove a bunch of dead write-only variables LLVM 15 now warns (and thus errors) about this, and there is really no point in keeping them.	2022-09-16 05:39:28 +00:00
sin-ack	3f3f45580a	Everywhere: Add sv suffix to strings relying on StringView(char const*) Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.	2022-07-12 23:11:35 +02:00
Timothy Flynn	3729fd06fa	LibRegex: Do not return an Optional from Regex::Matcher::execute The code path that could return an optional no longer exists as of commit: `a962ee020a`	2022-02-05 19:06:50 +03:30
Timothy Flynn	27d3de1f17	LibRegex: Do not continue searching input when the sticky bit is set This partially reverts commit `a962ee020a`. When the sticky bit is set, the global bit should basically be ignored except by external callers who want their own special behavior. For example, RegExp.prototype[@@match] will use the global flag to accumulate consecutive matches. But on the first failure, the regex loop should break.	2022-02-05 19:06:50 +03:30
Ali Mohammad Pur	a962ee020a	LibJS+LibRegex: Don't repeat regex match in regexp_exec() LibRegex already implements this loop in a more performant way, so all LibJS has to do here is to return things in the right shape, and not loop over the input string. Previously this was a quadratic operation on string length, which led to crazy execution times on failing regexps - now it's nice and fast :^) Note that a Regex test has to be updated to remove the stateful flag as it repeats matching on multiple strings.	2022-02-05 00:09:32 +01:00
Ali Mohammad Pur	2b028f6faa	LibRegex+LibJS: Avoid searching for more than one match in JS RegExps All of JS's regular expression APIs only want a single match, so avoid trying to produce more (which will be discarded anyway).	2022-02-05 00:09:32 +01:00
Ali Mohammad Pur	5fac41f733	LibRegex: Implement ECMA262 multiline matching without splitting lines As ECMA262 regex allows `[^]` and literal newlines to match newlines in the input string, we shouldn't split the input string into lines, rather simply make boundaries and catchall patterns capable of checking for these conditions specifically.	2022-01-26 00:53:09 +03:30
Ali Mohammad Pur	cd83325c7c	LibRegex: Preserve capture groups and matches across ForkReplace This makes the (flawed) ForkStay inserted as a loop header unnecessary, and finally fixes LibRegex rewriting weird loops in weird ways.	2022-01-22 00:35:49 +00:00
Ali Mohammad Pur	704e0654b3	Revert "LibRegex: Implement an ECMA262 Regex quirk with negative loo..." This partially reverts commit `c11be92e23`. That commit fixes one thing and breaks many more, a next commit will implement this quirk in a more sane way.	2022-01-22 00:35:49 +00:00
Ali Mohammad Pur	9eccd4c56e	LibRegex: Allow the pattern to match the zero-length end of the string ...only if Multiline is not enabled. Fixes #11940.	2022-01-21 18:14:08 +03:30
Ali Mohammad Pur	c11be92e23	LibRegex: Implement an ECMA262 Regex quirk with negative lookarounds This implements the quirk defined by "Note 3" in section "Canonicalize" (https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch). Crosses off another quirk from #6042.	2022-01-21 18:14:08 +03:30
Hendiadyoin1	a2563496f5	LibRegex: Remove some else-after-returns	2021-12-21 18:17:28 -08:00
Hendiadyoin1	b674de6957	LibRegex: Add some implied auto qualifiers	2021-12-21 18:17:28 -08:00
Andreas Kling	8b1108e485	Everywhere: Pass AK::StringView by value	2021-11-11 01:27:46 +01:00
Ben Wiederhake	50698a0db4	AK: Prevent accidental misuse of BumpAllocator In particular, we implicitly required that the caller initializes the returned instances themselves (solved by making UniformBumpAllocator::allocate call the constructor), and BumpAllocator itself cannot handle classes that are not trivially destructible (solved by deleting the method). Co-authored-by: Ali Mohammad Pur <ali.mpfard@gmail.com>	2021-10-23 19:02:54 +01:00
Andreas Kling	bb6634b024	LibRegex: Don't emit signpost events for every regular expression The time we were spending on these signposts was adding up to way too much, so let's not do it automatically.	2021-10-02 16:53:03 +02:00
Ali Mohammad Pur	e4b1c0b8b1	LibRegex: Set a signpost on every executed regular expression	2021-09-13 14:38:53 +04:30
Ali Mohammad Pur	246ab432ff	LibRegex: Add a basic optimization pass This currently tries to convert forking loops to atomic groups, and unify the left side of alternations.	2021-09-13 14:38:53 +04:30
Timothy Flynn	9509433e25	LibRegex: Implement and use a REPEAT operation for bytecode repetition Currently, when we need to repeat an instruction N times, we simply add that instruction N times in a for-loop. This doesn't scale well with extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1. Instead, add a new REPEAT bytecode operation to defer this loop from the parser to the runtime executor. This allows the parser to complete sans any loops (for this instruction), and allows the executor to bail early if the repeated bytecode fails. Note: The templated ByteCode methods are to allow the Posix parsers to continue using u32 because they are limited to N = 2^20.	2021-08-15 11:43:45 +01:00
Timothy Flynn	a0b72f5ad3	LibRegex: Remove (mostly) unused regex::MatchOutput This struct holds a counter for the number of executed operations, and vectors for matches, capture groups, and named capture groups. Each of the vectors is unused. Remove the struct and just keep a separate counter for the executed operations.	2021-08-15 11:43:45 +01:00
Timothy Flynn	f1ce998d73	LibRegex+LibJS: Combine named and unnamed capture groups in MatchState Combining these into one list helps reduce the size of MatchState, and as a result, reduces the amount of memory consumed during execution of very large regex matches. Doing this also allows us to remove a few regex byte code instructions: ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference. Named groups now behave the same as unnamed groups for these operations. Note that SaveRightNamedCaptureGroup still exists to cache the matched group name. This also removes the recursion level from the MatchState, as it can exist as a local variable in Matcher::execute instead.	2021-08-15 11:43:45 +01:00
Timothy Flynn	fea181bde3	LibRegex: Reduce RegexMatcher's BumpAllocator chunk size Before the BumpAllocator OOB access issue was understood and fixed, the chunk size was increased to 8MiB as a workaround in commit: `27d555bab0`. The issue is now resolved by: `0f1425c895`. We can reduce the chunk size to 2MiB, which has the added benefit of reducing runtime of the RegExp.prototype.exec test.	2021-08-15 11:43:45 +01:00
Timothy Flynn	27d555bab0	LibRegex: Track string position in both code units and code points In non-Unicode mode, the existing MatchState::string_position is tracked in code units; in Unicode mode, it is tracked in code points. In order for some RegexStringView operations to be performant, it is useful for the MatchState to have a field to always track the position in code units. This will allow RegexStringView methods (e.g. operator[]) to perform lookups based on code unit offsets, rather than needing to iterate over the entire string to find a code point offset.	2021-08-04 11:18:24 +02:00
Ali Mohammad Pur	d5984d296f	LibRegex: Make Matcher<>::match(Vector<>) take a reference to the vector It was previously copying the entire vector every time, which is not a nice thing to do. :^)	2021-08-02 17:22:50 +04:30
Ali Mohammad Pur	a7653e6a05	LibRegex: Use a bump-allocated linked list for fork save states This makes it avoid the excessively high malloc() traffic.	2021-08-02 17:22:50 +04:30
Ali Mohammad Pur	5f342e4fa9	LibRegex: Make Fork{Jump,Stay} non-recursive This makes very fork-heavy expressions (like `(aa)*`) not run out of stack space when matching very long strings.	2021-08-02 17:22:50 +04:30
Brian Gianforcaro	18d6f9ed5c	Libraries: Remove unused header includes	2021-08-01 08:10:16 +02:00
Timothy Flynn	1400e3cf58	LibRegex: Allow separately parsing patterns and creating Regex objects Adds a static method to parse a regex pattern and return the result, and a constructor to accept a parse result. This is to allow LibJS to parse the pattern string of a RegExpLiteral once and hand off regex objects any number of times thereafter.	2021-07-30 21:26:31 +01:00
Timothy Flynn	b162517065	LibRegex: Take ownership of pattern string and fix move operations The Regex object created a copy of the pattern string anyways, so tweak the constructor to allow callers to move() pattern strings into the regex. The Regex move constructor and assignment operator currently result in memory corruption. The Regex object stores a Matcher object, which holds a reference to the Regex object. So when the Regex object is moved, that reference is no longer valid. To fix this, the reference stored in the Matcher must be updated when the Regex is moved.	2021-07-30 21:26:31 +01:00
Timothy Flynn	47f6bb38a1	LibRegex: Support UTF-16 RegexStringView and improve Unicode matching When the Unicode option is not set, regular expressions should match based on code units; when it is set, they should match based on code points. To do so, the regex parser must combine surrogate pairs when the Unicode option is set. Further, RegexStringView needs to know if the flag is set in order to return code point vs. code unit based string lengths and substrings.	2021-07-23 23:06:57 +01:00
Ali Mohammad Pur	36bfc912fc	LibRegex: Switch to east-const style	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	f364fcec5d	LibRegex+Everywhere: Make LibRegex more unicode-aware This commit makes LibRegex (mostly) capable of operating on any of the three main string views: - StringView for raw strings - Utf8View for utf-8 encoded strings - Utf32View for raw unicode strings As a result, regexps with unicode strings should be able to properly handle utf-8 and not stop in the middle of a code point. A future commit will update LibJS to use the correct type of string depending on the flags.	2021-07-18 21:10:55 +04:30
Ali Mohammad Pur	54d89609de	LibRegex: Add support for the Basic POSIX regular expressions This implements the internal regex stuff for #8506.	2021-07-10 13:33:08 +02:00
Timothy Flynn	0f0ac37b56	LibRegex: Break from execution loop when the sticky flag is set If the sticky flag is set, the regex execution loop should break immediately even if the execution was a failure. The specification for several RegExp.prototype methods (e.g. exec and @@split) rely on this behavior.	2021-07-09 19:45:55 +01:00
Gunnar Beutner	794dc368f1	LibRegex: Avoid prepending items to vectors	2021-06-14 16:09:58 +04:30
Gunnar Beutner	281f39073d	LibRegex: Make get_opcode() return a reference Previously this would return a pointer which could be null if the requested opcode was invalid. This should never be the case though so let's VERIFY() that instead.	2021-06-14 16:09:58 +04:30
Linus Groh	dac0554fa0	LibRegex: Replace fprintf()/printf() with warnln()/outln()/dbgln()	2021-05-31 17:43:54 +01:00
Andreas Kling	79ff1902aa	LibRegex: Convert StringBuilder::appendf() => AK::Format	2021-05-07 21:12:09 +02:00
Linus Groh	a4c1860bfc	LibRegex: Put two dbgln()s behind REGEX_DEBUG	2021-04-23 20:52:12 +02:00
Ali Mohammad Pur	bf9c04a3da	LibRegex: Implement multiline stateful matches	2021-04-23 10:05:04 +02:00
Ali Mohammad Pur	bb40d4d5ff	LibRegex: Do not attempt to find more matches when one match is needed	2021-04-23 10:05:04 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
AnotherTest	ade97d4094	LibRegex: Make sure there are as many group matches as actual matches Fixes #6131.	2021-04-05 09:02:06 +02:00
AnotherTest	76f63c2980	LibRegex: Allocate entries for all capture groups in RegexResult Not just the seen ones. Fixes #6108.	2021-04-04 16:04:06 +02:00
Andreas Kling	5d180d1f99	Everywhere: Rename ASSERT => VERIFY (...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED) Since all of these checks are done in release builds as well, let's rename them to VERIFY to prevent confusion, as everyone is used to assertions being compiled out in release. We can introduce a new ASSERT macro that is specifically for debug checks, but I'm doing this wholesale conversion first since we've accumulated thousands of these already, and it's not immediately obvious which ones are suitable for ASSERT.	2021-02-23 20:56:54 +01:00

55 commits