ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-07-12 12:01:52 +00:00

Author	SHA1	Message	Date
Timothy Flynn	27d555bab0	LibRegex: Track string position in both code units and code points In non-Unicode mode, the existing MatchState::string_position is tracked in code units; in Unicode mode, it is tracked in code points. In order for some RegexStringView operations to be performant, it is useful for the MatchState to have a field to always track the position in code units. This will allow RegexStringView methods (e.g. operator[]) to perform lookups based on code unit offsets, rather than needing to iterate over the entire string to find a code point offset.	2021-08-04 11:18:24 +02:00
Timothy Flynn	510bbcd8e0	AK+LibRegex: Add Utf16View::code_point_at and use it in RegexStringView The current method of iterating through the string to access a code point hurts performance quite badly for very large strings. The test262 test "RegExp/property-escapes/generated/Any.js" previously took 3 hours to complete; this one change brings it down to under 10 seconds.	2021-08-04 11:18:24 +02:00
Ali Mohammad Pur	5f342e4fa9	LibRegex: Make Fork{Jump,Stay} non-recursive This makes very fork-heavy expressions (like `(aa)*`) not run out of stack space when matching very long strings.	2021-08-02 17:22:50 +04:30
Ali Mohammad Pur	1dd1378159	LibRegex: Preserve the type of the match when clearing capture groups Even though the contents are supposed to be reset, the type should stay unchanged, as that's an assumption the engine is making.	2021-07-24 20:52:43 +04:30
Timothy Flynn	0e6375558d	AK+LibRegex: Partially implement case insensitive UTF-16 comparison This will work for ASCII code points. Unicode case folding will be needed for non-ASCII.	2021-07-23 23:06:57 +01:00
Timothy Flynn	47f6bb38a1	LibRegex: Support UTF-16 RegexStringView and improve Unicode matching When the Unicode option is not set, regular expressions should match based on code units; when it is set, they should match based on code points. To do so, the regex parser must combine surrogate pairs when the Unicode option is set. Further, RegexStringView needs to know if the flag is set in order to return code point vs. code unit based string lengths and substrings.	2021-07-23 23:06:57 +01:00
Ali Mohammad Pur	36bfc912fc	LibRegex: Switch to east-const style	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	f364fcec5d	LibRegex+Everywhere: Make LibRegex more unicode-aware This commit makes LibRegex (mostly) capable of operating on any of the three main string views: - StringView for raw strings - Utf8View for utf-8 encoded strings - Utf32View for raw unicode strings As a result, regexps with unicode strings should be able to properly handle utf-8 and not stop in the middle of a code point. A future commit will update LibJS to use the correct type of string depending on the flags.	2021-07-18 21:10:55 +04:30
Ali Mohammad Pur	2961982277	LibRegex: Use <...> includes in RegexMatch.h	2021-07-18 21:10:55 +04:30
Ali Mohammad Pur	da1fda73a7	LibRegex: Implement line splitting for Utf32View Co-authored-by: Timothy Flynn <trflynn89@pm.me>	2021-07-18 21:10:55 +04:30
sin-ack	74d76528d6	LibRegex: Display correct position for Compare in REGEX_DEBUG When REGEX_DEBUG is enabled, LibRegex dumps a table of information regarding the state of the regex bytecode execution. The Compare opcode manipulates state.string_position directly, so the string_position value cannot be used to display where the comparison started; therefore, this patch introduces a new variable to keep track of where we were before the comparison happened.	2021-06-16 16:30:12 +04:30
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
AnotherTest	f05e518cbc	LibRegex: Implement section B.1.4. of the ECMA262 spec This allows the parser to deal with crazy patterns like the one in #5517.	2021-02-27 07:31:01 +01:00
Andreas Kling	5d180d1f99	Everywhere: Rename ASSERT => VERIFY (...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED) Since all of these checks are done in release builds as well, let's rename them to VERIFY to prevent confusion, as everyone is used to assertions being compiled out in release. We can introduce a new ASSERT macro that is specifically for debug checks, but I'm doing this wholesale conversion first since we've accumulated thousands of these already, and it's not immediately obvious which ones are suitable for ASSERT.	2021-02-23 20:56:54 +01:00
asynts	5c5665c1e7	Everywhere: Replace a bundle of dbg with dbgln. These changes are arbitrarily divided into multiple commits to make it easier to find potentially introduced bugs with git bisect.	2021-01-22 22:14:30 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00

16 commits