Commit graph

58 commits

Author SHA1 Message Date
sin-ack
74d76528d6 LibRegex: Display correct position for Compare in REGEX_DEBUG
When REGEX_DEBUG is enabled, LibRegex dumps a table of information
regarding the state of the regex bytecode execution. The Compare opcode
manipulates state.string_position directly, so the string_position value
cannot be used to display where the comparison started; therefore, this
patch introduces a new variable to keep track of where we were before
the comparison happened.
2021-06-16 16:30:12 +04:30
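The idea in the entry above can be illustrated with a minimal sketch. It is not the LibRegex code: MatchState, compare_literal, TraceRow and traced_compare are hypothetical names chosen for this example. The only point shown is that the start position must be saved before an operation that advances string_position itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical matcher state; only the field relevant to the example.
struct MatchState {
    std::size_t string_position { 0 };
};

// Hypothetical stand-in for a Compare opcode: on success it advances
// state.string_position past the matched text.
bool compare_literal(MatchState& state, std::string_view input, std::string_view literal)
{
    if (state.string_position > input.size())
        return false;
    if (input.substr(state.string_position, literal.size()) != literal)
        return false;
    state.string_position += literal.size();
    return true;
}

// One row of a debug table: where the comparison started and ended.
struct TraceRow {
    std::size_t started_at;
    std::size_t ended_at;
    bool matched;
};

TraceRow traced_compare(MatchState& state, std::string_view input, std::string_view literal)
{
    // Remember the position first; after the call it has already moved.
    auto const position_before = state.string_position;
    bool const matched = compare_literal(state, input, literal);
    return { position_before, state.string_position, matched };
}
```

Reading state.string_position after the call would report the end of the comparison, which is the display problem the commit message describes.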
sin-ack
6b2e264093 LibRegex: Fix incorrect case-sensitive comparisons
A tiny typo was introduced in bc8d16ad which caused all case-insensitive
comparisons to fail.
2021-06-16 16:30:12 +04:30
Gunnar Beutner
5bfe601152 LibRegex: Remove unused code 2021-06-14 16:09:58 +04:30
Gunnar Beutner
a167941852 LibRegex: Use a plain pointer for OpCode::m_state 2021-06-14 16:09:58 +04:30
Gunnar Beutner
d3c2a3caea LibRegex: Avoid initialization checks in get_opcode_by_id() 2021-06-14 16:09:58 +04:30
Gunnar Beutner
794dc368f1 LibRegex: Avoid prepending items to vectors 2021-06-14 16:09:58 +04:30
Gunnar Beutner
214410b397 LibRegex: Avoid making unnecessary string copies 2021-06-14 16:09:58 +04:30
Gunnar Beutner
281f39073d LibRegex: Make get_opcode() return a reference
Previously this would return a pointer which could be null if the
requested opcode was invalid. This should never be the case though
so let's VERIFY() that instead.
2021-06-14 16:09:58 +04:30
Gunnar Beutner
cd49fb0229 LibRegex: Remove return value for setters 2021-06-14 16:09:58 +04:30
Gunnar Beutner
1fb4471506 LibRegex: Use a plain array to store opcodes
Using a hash map is unnecessary because the number of opcodes and their
IDs never change.
2021-06-14 16:09:58 +04:30
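A rough sketch of the array-based lookup described above, under the assumption that opcode IDs are small, dense and fixed at compile time. The names OpCodeId, OpCodeInfo, opcode_table and lookup_opcode are invented for this example and are not taken from LibRegex.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical opcode IDs; Count is a sentinel giving the table size.
enum class OpCodeId : std::size_t { Exit, Jump, Compare, Count };

struct OpCodeInfo {
    std::string_view name;
};

// One slot per ID, indexed directly; no hashing or allocation involved.
inline constexpr std::array<OpCodeInfo, static_cast<std::size_t>(OpCodeId::Count)> opcode_table { {
    { "Exit" },
    { "Jump" },
    { "Compare" },
} };

// Returns a reference; an out-of-range ID is treated as a programming
// error (assert stands in for a VERIFY()-style check here).
const OpCodeInfo& lookup_opcode(OpCodeId id)
{
    auto const index = static_cast<std::size_t>(id);
    assert(index < opcode_table.size());
    return opcode_table[index];
}
```

Because the set of IDs never changes, the lookup reduces to a bounds check and an index.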
Gunnar Beutner
d476144565 Userland: Allow building SerenityOS with -funsigned-char
Some of the code assumed that chars were always signed while that is
not the case on ARM hosts.

Also, some of the code tried to use EOF (-1) in a way similar to what
fgetc() does; however, instead of storing the characters in an int
variable, a char was used.

While this seemed to work, it also meant that character 0xFF would be
incorrectly seen as an end-of-file.

Careful reading of fgetc() reveals that fgetc() stores character
data in an int where valid characters are in the range of 0-255 and
the EOF value is explicitly outside of that range (usually -1).
2021-06-13 18:52:58 +02:00
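The 0xFF problem described above can be reproduced in isolation. The following is a small sketch, not code from the patch; both helper names are made up for illustration, and the first helper assumes a two's-complement platform where narrowing 0xFF to signed char yields -1.

```cpp
#include <cassert>
#include <cstdio>

// Buggy pattern: the byte is narrowed to a signed char before the
// comparison, so 0xFF becomes -1 and collides with EOF when EOF is -1.
bool mistaken_for_eof_in_signed_char(unsigned char byte)
{
    signed char narrowed = static_cast<signed char>(byte);
    return narrowed == EOF;
}

// fgetc()-style pattern: the byte stays in an int, where valid values
// are 0..255 and the negative EOF value lies outside that range.
bool mistaken_for_eof_in_int(unsigned char byte)
{
    int widened = byte;
    return widened == EOF;
}
```

With an unsigned char variable the opposite failure occurs: the stored value is never negative, so a comparison against EOF can never succeed. That is why the result of fgetc() is normally kept in an int until the EOF check has been done.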
Andreas Kling
dc65f54c06 AK: Rename Vector::append(Vector) => Vector::extend(Vector)
Let's make it a bit more clear when we're appending the elements from
one vector to the end of another vector.
2021-06-12 13:24:45 +02:00
Linus Groh
939da41fa1 LibRegex: Fix compilation errors on my host machine
I have no idea *why*, but this stopped working suddenly:

    return { { .code_point = '-', .is_character_class = false } };

Fails with:

    error: could not convert ‘{{'-', false}}’ from
    ‘<brace-enclosed initializer list>’ to
    ‘AK::Optional<regex::CharClassRangeElement>

Might be related to 66f15c2 somehow, going one past that commit makes
the build work again, however reverting the commit doesn't. Not sure
what's up with that.

Consider this patch a band-aid until we can find the reason and an
actual fix...

Compiler version:
gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3)
2021-06-06 09:26:07 +01:00
Max Wipfli
bc8d16ad28 Everywhere: Replace ctype.h to avoid narrowing conversions
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
2021-06-03 13:31:46 +02:00
Linus Groh
a5903ac4b6 LibRegex: Hide stray dbgln() behind REGEX_DEBUG 2021-06-02 18:31:43 +01:00
Andreas Kling
12a42edd13 Everywhere: codepoint => code point 2021-06-01 10:01:11 +02:00
Linus Groh
dac0554fa0 LibRegex: Replace fprintf()/printf() with warnln()/outln()/dbgln() 2021-05-31 17:43:54 +01:00
Linus Groh
d60ebbbba6 Revert "Userland: static vs non-static constexpr variables"
This reverts commit 800ea8ea96.

Booting the system no longer worked after these changes.
2021-05-21 10:30:52 +01:00
Lenny Maiorani
800ea8ea96 Userland: static vs non-static constexpr variables
Problem:
- `static` variables consume memory and sometimes are less
  optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
  every time the function is run.

Solution:
- If a global `static` variable is only used in a single function then
  move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
  `constexpr`.
2021-05-21 10:07:06 +01:00
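As a rough illustration of the last bullet above, here is the kind of rewrite the commit message describes, applied to a made-up lookup function. Both function names and the table contents are hypothetical. Note that the entry directly above this one records that the change was reverted because the system no longer booted, so this shows the intent of the transformation, not a recommendation.

```cpp
#include <cassert>
#include <cstddef>

// Before: a function-local static const table.
int power_of_two_before(std::size_t index)
{
    static const int table[] = { 1, 2, 4, 8 };
    return table[index];
}

// After: the same table as a non-static constexpr local.
int power_of_two_after(std::size_t index)
{
    constexpr int table[] = { 1, 2, 4, 8 };
    return table[index];
}
```

Both versions return the same values for indices 0 to 3; they differ only in the storage duration of the table.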
Andreas Kling
79ff1902aa LibRegex: Convert StringBuilder::appendf() => AK::Format 2021-05-07 21:12:09 +02:00
Brian Gianforcaro
6e918e4e02 Tests: Move LibRegex tests to Tests/LibRegex 2021-05-06 17:54:28 +02:00
Gunnar Beutner
6cf59b6ae9 Everywhere: Turn #if *_DEBUG into dbgln_if/if constexpr 2021-05-01 21:25:06 +02:00
Brian Gianforcaro
cf0640c870 Build: Remove unused ${REGEX_SOURCES} from the tests CMakeLists.txt 2021-04-29 10:37:26 +02:00
Linus Groh
dbe72fd962 Everywhere: Remove empty line after function body opening curly brace 2021-04-25 20:20:00 +02:00
Andrew Kaster
35c0a6c54d AK+Userland: Move AK/TestSuite.h into LibTest and rework Tests' CMake
As many macros as possible are moved to Macros.h, while the
macros to create a test case are moved to TestCase.h. TestCase is now
the only user-facing header for creating a test case. TestSuite and its
helpers have moved into a .cpp file. Instead of requiring a TEST_MAIN
macro to be instantiated into the test file, a TestMain.cpp file is
provided instead that will be linked against each test. This has the
side effect that, if we wanted to have test cases split across multiple
files, it's as simple as adding them all to the same executable.

The test main should be portable to kernel mode as well, so if
there's a set of tests that should be run in self-test mode in kernel
space, we can accommodate that.

A new serenity_test CMake function streamlines adding a new test with
arguments for the test source file, subdirectory under /usr/Tests to
install the test application and an optional list of libraries to link
against the test application. To accommodate future tests where the
provided TestMain.cpp is not suitable (e.g. test-js), a CUSTOM_MAIN
parameter can be passed to the function to not link against the
boilerplate main function.
2021-04-25 09:36:49 +02:00
Linus Groh
a4c1860bfc LibRegex: Put two dbgln()s behind REGEX_DEBUG 2021-04-23 20:52:12 +02:00
Ali Mohammad Pur
bf9c04a3da LibRegex: Implement multiline stateful matches 2021-04-23 10:05:04 +02:00
Ali Mohammad Pur
bb40d4d5ff LibRegex: Do not attempt to find more matches when one match is needed 2021-04-23 10:05:04 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling
de7062af9c LibRegex: Unbreak the ALL_DEBUG build 2021-04-22 09:23:28 +02:00
Andreas Kling
c68dcf45b6 LibRegex: Convert String::format() => String::formatted() 2021-04-21 23:49:02 +02:00
AnotherTest
5a14f7ea2f LibRegex: Generate a 'Compare' op for empty character classes
Otherwise it would match zero-length strings.
Fixes #6256.
2021-04-12 08:54:58 +02:00
AnotherTest
c128b3fd91 LibRegex: Remove 'ReadDigitFollowPolicy' as it's no longer needed
Thanks to @GMTA: 1b071455b1 (r49343474)
2021-04-10 12:10:45 +02:00
AnotherTest
1b071455b1 LibRegex: Treat brace quantifiers with invalid contents as literals
Fixes #6208.
2021-04-10 09:16:03 +02:00
AnotherTest
25d336bc27 LibRegex: Take the regex as a const reference in print_bytecode() 2021-04-10 09:16:03 +02:00
AnotherTest
e9279d1790 LibRegex: Allow a '?' suffix for brace quantifiers
This fixes another compat point in #6042.
2021-04-10 09:16:03 +02:00
AnotherTest
8d7bcc2476 LibRegex: Give ByteCode a copy ctor and a move assignment operator
Previously all move assignments were actually copies. Oops.
2021-04-10 09:16:03 +02:00
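The commit message does not say why move assignments used to copy. One general C++ rule that produces this symptom is that a class with a user-declared copy assignment operator gets no implicit move assignment operator, so assigning from std::move(x) silently selects the copy. The sketch below demonstrates that rule with two hypothetical types; it is not the ByteCode class and makes no claim about the actual cause in LibRegex.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Only a copy assignment operator is declared, which suppresses the
// implicit move assignment operator.
struct CopyAssignOnly {
    std::vector<int> data;
    bool last_assignment_copied { false };

    CopyAssignOnly() = default;
    CopyAssignOnly& operator=(const CopyAssignOnly& other)
    {
        data = other.data;
        last_assignment_copied = true;
        return *this;
    }
};

// Same type with an explicit move assignment operator added.
struct WithMoveAssign {
    std::vector<int> data;
    bool last_assignment_copied { false };

    WithMoveAssign() = default;
    WithMoveAssign(const WithMoveAssign&) = default;
    WithMoveAssign& operator=(const WithMoveAssign& other)
    {
        data = other.data;
        last_assignment_copied = true;
        return *this;
    }
    WithMoveAssign& operator=(WithMoveAssign&& other) noexcept
    {
        data = std::move(other.data);
        last_assignment_copied = false;
        return *this;
    }
};

// Reports whether `target = std::move(source)` ended up copying.
template<typename T>
bool move_assign_copies()
{
    T source;
    source.data = { 1, 2, 3 };
    T target;
    target = std::move(source);
    return target.last_assignment_copied;
}
```

For the first type the helper reports a copy; for the second it reports a move.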
Jelle Raaijmakers
db321db5f4 LibRegex: Parse \0 as a zero-byte instead of 0x30 ("0")
This was causing some regexes to trip up. Fixes #6202.
2021-04-09 21:53:14 +02:00
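The distinction in the entry above is between the NUL byte (0x00) and the ASCII digit "0" (0x30). A tiny sketch of the intended mapping follows; decode_escape is a hypothetical helper, it handles only this one case, and it passes every other escaped character through unchanged as a simplifying assumption.

```cpp
#include <cassert>

// Hypothetical decoder for the character that follows a backslash.
// Only the \0 case from the commit message is modelled here.
char decode_escape(char escaped)
{
    if (escaped == '0')
        return '\0'; // the zero byte, not the digit character 0x30
    return escaped;  // assumption: everything else stands for itself
}
```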
AnotherTest
ade97d4094 LibRegex: Make sure there are as many group matches as actual matches
Fixes #6131.
2021-04-05 09:02:06 +02:00
AnotherTest
1bdc1cf77e LibRegex: Consider named capture groups as normal capture groups too 2021-04-05 09:02:06 +02:00
AnotherTest
be0182d049 LibRegex: Reset capture group indices when resetting parser state 2021-04-05 09:02:06 +02:00
AnotherTest
76f63c2980 LibRegex: Allocate entries for all capture groups in RegexResult
Not just the seen ones.
Fixes #6108.
2021-04-04 16:04:06 +02:00
AnotherTest
0f468a5013 LibRegex: Test alternatives in the expected order
That is, first try to match the left side of the alternation, and then
the right side.
Fixes part of #6042.
2021-04-01 21:55:47 +02:00
AnotherTest
6bbb26fdaf LibRegex: Allow references to capture groups that aren't parsed yet
This only applies to the ECMA262 parser.
This behaviour is an ECMA262-specific quirk, such references always
generate zero-length matches (even on subsequent passes).
Also adds a test in LibJS's test suite.

Fixes #6039.
2021-04-01 21:55:47 +02:00
Andreas Kling
ef1e5db1d0 Everywhere: Remove klog(), dbg() and purge all LogStream usage :^)
Good-bye LogStream. Long live AK::Format!
2021-03-12 17:29:37 +01:00
Andrew Kaster
dc6485cfcb LibRegex: VERIFY that string builder in print_header is not null.
I don't know why g++ thinks this is the case with
ENABLE_ALL_DEBUG_MACROS when building for serenity. Adding an assert to
placate it seems reasonable.
2021-02-28 18:19:37 +01:00
Andrew Kaster
e787738c24 Meta: Build AK and LibRegex tests in Lagom and for Serenity
These tests were never built for the serenity target. Move their Lagom
build steps to the Lagom CMakeLists.txt, and add serenity build steps
for them. Also, fix the build errors when building them with the
serenity cross-compiler :^)
2021-02-28 18:19:37 +01:00
AnotherTest
e0ac85288e LibRegex: Allow missing high bound in {x,y} quantifiers
Fixes #5518.
2021-02-27 07:31:01 +01:00
AnotherTest
91bf3dc7fe LibRegex: Match the escaped part of escaped syntax characters
Previously, `\^` would've matched `\`, not `^`.
2021-02-27 07:31:01 +01:00
AnotherTest
f05e518cbc LibRegex: Implement section B.1.4. of the ECMA262 spec
This allows the parser to deal with crazy patterns like the one
in #5517.
2021-02-27 07:31:01 +01:00