ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-07-07 01:21:57 +00:00

Author	SHA1	Message	Date
Timothy Flynn	484ccfadc3	LibRegex: Support property escapes of Unicode script extensions	2021-08-04 13:50:32 +01:00
Timothy Flynn	06088df729	LibRegex: Support property escapes of the Unicode script property Note that unlike binary properties and general categories, scripts must be specified in the non-binary (Script=Value) form.	2021-08-04 13:50:32 +01:00
Timothy Flynn	27d555bab0	LibRegex: Track string position in both code units and code points In non-Unicode mode, the existing MatchState::string_position is tracked in code units; in Unicode mode, it is tracked in code points. In order for some RegexStringView operations to be performant, it is useful for the MatchState to have a field to always track the position in code units. This will allow RegexStringView methods (e.g. operator[]) to perform lookups based on code unit offsets, rather than needing to iterate over the entire string to find a code point offset.	2021-08-04 11:18:24 +02:00
Timothy Flynn	1e10d6d7ce	LibRegex: Support property escapes of Unicode General Categories This changes LibRegex to parse the property escape as a Variant of Unicode Property & General Category values. A byte code instruction is added to perform matching based on General Category values.	2021-08-02 21:02:09 +04:30
Timothy Flynn	d485cf29d7	LibRegex+LibUnicode: Begin implementing Unicode property escapes This supports some binary property matching. It does not support any properties not yet parsed by LibUnicode, nor does it support value matching (such as Script_Extensions=Latin).	2021-07-30 21:26:31 +01:00
Ali Mohammad Pur	1dd1378159	LibRegex: Preserve the type of the match when clearing capture groups Even though the contents are supposed to be reset, the type should stay unchanged, as that's an assumption the engine is making.	2021-07-24 20:52:43 +04:30
Timothy Flynn	47f6bb38a1	LibRegex: Support UTF-16 RegexStringView and improve Unicode matching When the Unicode option is not set, regular expressions should match based on code units; when it is set, they should match based on code points. To do so, the regex parser must combine surrogate pairs when the Unicode option is set. Further, RegexStringView needs to know if the flag is set in order to return code point vs. code unit based string lengths and substrings.	2021-07-23 23:06:57 +01:00
Ali Mohammad Pur	36bfc912fc	LibRegex: Switch to east-const style	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	c8b2199251	LibRegex: Clear previous capture group contents in ECMA262 mode ECMA262 requires that the capture groups only contain the values from the last iteration, e.g. `((c)(a)?(b))` should _not_ contain 'a' in the second capture group when matching "cabcb".	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	f364fcec5d	LibRegex+Everywhere: Make LibRegex more unicode-aware This commit makes LibRegex (mostly) capable of operating on any of the three main string views: - StringView for raw strings - Utf8View for utf-8 encoded strings - Utf32View for raw unicode strings As a result, regexps with unicode strings should be able to properly handle utf-8 and not stop in the middle of a code point. A future commit will update LibJS to use the correct type of string depending on the flags.	2021-07-18 21:10:55 +04:30
Ali Mohammad Pur	052004f92d	LibRegex: Partially implement string compare for Utf32View	2021-07-18 21:10:55 +04:30
sin-ack	74d76528d6	LibRegex: Display correct position for Compare in REGEX_DEBUG When REGEX_DEBUG is enabled, LibRegex dumps a table of information regarding the state of the regex bytecode execution. The Compare opcode manipulates state.string_position directly, so the string_position value cannot be used to display where the comparison started; therefore, this patch introduces a new variable to keep track of where we were before the comparison happened.	2021-06-16 16:30:12 +04:30
sin-ack	6b2e264093	LibRegex: Fix incorrect case-sensitive comparisons A tiny typo was introduced in `bc8d16ad` which caused all case insensitive comparisons to fail.	2021-06-16 16:30:12 +04:30
Gunnar Beutner	d3c2a3caea	LibRegex: Avoid initialization checks in get_opcode_by_id()	2021-06-14 16:09:58 +04:30
Gunnar Beutner	214410b397	LibRegex: Avoid making unnecessary string copies	2021-06-14 16:09:58 +04:30
Gunnar Beutner	281f39073d	LibRegex: Make get_opcode() return a reference Previously this would return a pointer which could be null if the requested opcode was invalid. This should never be the case though so let's VERIFY() that instead.	2021-06-14 16:09:58 +04:30
Gunnar Beutner	cd49fb0229	LibRegex: Remove return value for setters	2021-06-14 16:09:58 +04:30
Gunnar Beutner	1fb4471506	LibRegex: Use a plain array to store opcodes Using a hash map is unnecessary because the number of opcodes and their IDs never change.	2021-06-14 16:09:58 +04:30
Max Wipfli	bc8d16ad28	Everywhere: Replace ctype.h to avoid narrowing conversions This replaces ctype.h with CharacterType.h everywhere I could find issues with narrowing conversions. While using it will probably make sense almost everywhere in the future, the most critical places should have been addressed.	2021-06-03 13:31:46 +02:00
Linus Groh	dac0554fa0	LibRegex: Replace fprintf()/printf() with warnln()/outln()/dbgln()	2021-05-31 17:43:54 +01:00
Linus Groh	dbe72fd962	Everywhere: Remove empty line after function body opening curly brace	2021-04-25 20:20:00 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *	2021-04-22 11:22:27 +02:00
Andreas Kling	c68dcf45b6	LibRegex: Convert String::format() => String::formatted()	2021-04-21 23:49:02 +02:00
AnotherTest	6bbb26fdaf	LibRegex: Allow references to capture groups that aren't parsed yet This only applies to the ECMA262 parser. This behaviour is an ECMA262-specific quirk, such references always generate zero-length matches (even on subsequent passes). Also adds a test in LibJS's test suite. Fixes #6039.	2021-04-01 21:55:47 +02:00
AnotherTest	f05e518cbc	LibRegex: Implement section B.1.4. of the ECMA262 spec This allows the parser to deal with crazy patterns like the one in #5517.	2021-02-27 07:31:01 +01:00
Andreas Kling	5d180d1f99	Everywhere: Rename ASSERT => VERIFY (...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED) Since all of these checks are done in release builds as well, let's rename them to VERIFY to prevent confusion, as everyone is used to assertions being compiled out in release. We can introduce a new ASSERT macro that is specifically for debug checks, but I'm doing this wholesale conversion first since we've accumulated thousands of these already, and it's not immediately obvious which ones are suitable for ASSERT.	2021-02-23 20:56:54 +01:00
asynts	8465683dcf	Everywhere: Debug macros instead of constexpr. This was done with the following script: find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/dbgln<debug_([a-z_]+)>/dbgln<\U\1_DEBUG>/' {} \; find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/if constexpr \(debug_([a-z0-9_]+)/if constexpr \(\U\1_DEBUG/' {} \;	2021-01-25 09:47:36 +01:00
asynts	5c5665c1e7	Everywhere: Replace a bundle of dbg with dbgln. These changes are arbitrarily divided into multiple commits to make it easier to find potentially introduced bugs with git bisect.	2021-01-22 22:14:30 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00

29 commits
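
Commits 27d555bab0 and 47f6bb38a1 describe tracking the match position in both code points and UTF-16 code units, so that lookups can use a code unit offset without rescanning the string. The following is a minimal Python sketch of that idea only. It is not LibRegex code: `Position`, `utf16_units`, and `scan` are hypothetical names chosen for illustration, and the sketch assumes the input is a Python `str` whose UTF-16 encoding is what the engine would index.

```python
from dataclasses import dataclass


def utf16_units(code_point: int) -> int:
    """Number of UTF-16 code units needed to encode one code point."""
    # Code points above the Basic Multilingual Plane need a surrogate pair.
    return 2 if code_point > 0xFFFF else 1


@dataclass
class Position:
    """Hypothetical stand-in for a match state that tracks both offsets."""
    code_points: int = 0  # position counted in code points (Unicode mode)
    code_units: int = 0   # position counted in UTF-16 code units

    def advance(self, ch: str) -> None:
        # Advancing by one character moves both counters at once, so the
        # code unit offset never has to be recomputed by rescanning.
        self.code_points += 1
        self.code_units += utf16_units(ord(ch))


def scan(text: str):
    """Return the (code point, code unit) offset at which each character starts,
    together with the final position after the last character."""
    pos = Position()
    starts = []
    for ch in text:
        starts.append((pos.code_points, pos.code_units))
        pos.advance(ch)
    return starts, pos


if __name__ == "__main__":
    starts, end = scan("a\U0001F600b")
    for offsets in starts:
        print(offsets)
    print(end)
```

With the input `"a\U0001F600b"`, the character after the emoji starts at code point offset 2 but code unit offset 3, which is the gap the second counter is meant to bridge.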