ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-20 23:19:44 +00:00

Author	SHA1	Message	Date
Aliaksandr Kalenik	d47a22150d	AK: Define `operator==` for HashMap	2025-07-30 11:06:05 +02:00
Grant Knowlton	9e1e4f3b15	AK: Validate compressed tags in IPv4-mapped IPv6 address This disallows parsing IPv4 mapped IPv6 address strings with multiple compression prefixes. Tests are provided for the updated functionality.	2025-07-30 00:53:10 +02:00
Timothy Flynn	d9502505c2	AK: Fix bounds assertions in Utf16View::iterator_offset	2025-07-28 18:30:50 +02:00
Timothy Flynn	67723ef83c	AK: Add a method to peek ahead of a UTF-16 iterator	2025-07-28 18:30:50 +02:00
Timothy Flynn	21d7d236e6	AK: Add a method to check if a UTF-16 string contains any code point	2025-07-28 18:30:50 +02:00
Timothy Flynn	ed63a60247	AK: Return an empty optional when UTF-16 code unit lookup fails Accidentally returned the wrong type here.	2025-07-28 12:25:11 +02:00
Timothy Flynn	baddac5155	AK: Implement a method to split a UTF-16 string	2025-07-28 12:25:11 +02:00
Timothy Flynn	48a3b2c28e	AK: Implement a method to count instances of a needle in a UTF-16 string	2025-07-28 12:25:11 +02:00
Andrew Kaster	7d669b8b0c	AK: Update Swift test for Utf16String changes	2025-07-26 23:33:58 +02:00
Timothy Flynn	a740bfd8ff	AK+LibUnicode: Implement Unicode-aware UTF-16 case transformations	2025-07-25 18:16:22 +02:00
Timothy Flynn	df77ae1920	AK: Implement creating a UTF-16 string from a repeated code point	2025-07-25 18:16:22 +02:00
Jelle Raaijmakers	0b96690f0c	AK: Add HashMap::update() This updates a HashMap by copying another HashMap's keys and values.	2025-07-25 16:22:06 +02:00
Timothy Flynn	6c73dff120	AK: Implement a UTF-16 method to check if a string is ASCII whitespace	2025-07-24 19:00:20 +02:00
Timothy Flynn	f53389bab1	AK: Add a couple of Utf16String factories * Utf16String::from_utf8_with_replacement_character * Utf16String::from_code_point	2025-07-24 19:00:20 +02:00
Jelle Raaijmakers	15178d5230	AK: Add ::ends_with() to Utf16View and Utf16StringBase I noticed that we can significantly simplify ::starts_with(), and based the new ::ends_with() on that.	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	54dd45d3f6	AK: Add Span::ends_with() Originally I added this to use it in Utf16View::ends_with(), but the final implementation ended up a lot simpler. I chose to keep this anyway since it mirrors Span::starts_with().	2025-07-24 07:18:25 -04:00
Timothy Flynn	ad7ac679fd	AK: Compute Utf16View::code_point_offset_of correctly There were a couple of issues here, including that the following computation could actually overflow to NumericLimits<size_t>::max(): `code_unit_offset -= it.length_in_code_units();`	2025-07-22 17:17:33 +02:00
Timothy Flynn	f595e47c1f	AK: Add unit tests for Utf16View::code_unit_offset_of	2025-07-22 17:17:33 +02:00
Jelle Raaijmakers	265e278275	AK: Allow indexing at length in Utf8View::byte_offset_of() And do the same for Utf8View::code_point_offset_of(). Some of these `VERIFY`s of the view's length were introduced recently, but they caused the parsing of named capture groups in RegexParser to crash in some situations. Instead, allow indexing at the view's length: the byte offset of code point `length()` is known, even though that code point does not exist in the view. Similarly, we know the code point offset at byte offset `byte_length()`. Beyond those offsets, we still crash. Fixes 13 failures in test262's `language/literals/regexp/named-groups`.	2025-07-22 09:10:32 -04:00
Timothy Flynn	9582895759	AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String	2025-07-18 12:45:38 -04:00
Timothy Flynn	d40e3af697	AK: Implement UTF-16 string-to-number conversions	2025-07-18 12:45:38 -04:00
Timothy Flynn	6e0290ecaa	AK: Define some UTF-16 helper methods * contains * escape_html_entities * replace * to_ascii_lowercase * to_ascii_uppercase * to_ascii_titlecase * trim * trim_whitespace	2025-07-18 12:45:38 -04:00
Timothy Flynn	7f069efbc4	AK: Implement a flyweight string for Utf16String Utf16FlyString more or less works exactly the same as FlyString. It will store the raw encoded data of the string instance. If the string is a short ASCII string, Utf16FlyString holds the ShortString bytes; else, Utf16FlyString holds a pointer to the Utf16StringData.	2025-07-18 12:45:38 -04:00
Timothy Flynn	2803d66d87	AK: Support UTF-16 string formatting The underlying storage used during string formatting is StringBuilder. To support UTF-16 strings, this patch allows callers to specify a mode during StringBuilder construction. The default mode is UTF-8, for which StringBuilder remains unchanged. In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a series of u16 code units. Appending a single character will append 2 bytes for that character (cast to a char16_t). Appending a StringView will transcode the string to UTF-16. Utf16String also gains the same memory optimization that we added for String, where we hand-off the underlying buffer to Utf16String to avoid having to re-allocate. In the future, we may want to further optimize for ASCII strings. For example, we could defer committing to the u16-esque storage until we see a non-ASCII code point.	2025-07-18 12:45:38 -04:00
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
Timothy Flynn	8fbb80fffc	AK: Do not fall back to simdutf for UTF-16 ASCII validation This was a mistake. Consider U+201C (LEFT DOUBLE QUOTATION MARK). This code point is encoded as the bytes 0x1c 0x20 in UTF-16LE. Both of these bytes are ASCII if interpreted as UTF-8. But the string itself is most certainly not ASCII.	2025-07-18 12:45:38 -04:00
Timothy Flynn	01ebf1eb07	AK: Replace surrogates in String::from_utf8_with_replacement_character We are expected to replace lonely surrogates with U+FFFD when decoding UTF-8 text.	2025-07-06 04:30:17 +12:00
ayeteadoe	25f5936dee	CMake: Rename serenity_* helper functions/macros to ladybird_*	2025-07-03 23:19:41 +02:00
Timothy Flynn	62d9a84b8d	AK+Everywhere: Replace custom number parsers with fast_float Our floating point number parser was based on the fast_float library: https://github.com/fastfloat/fast_float However, our implementation only supports 8-bit characters. To support UTF-16, we will need to be able to convert char16_t-based strings to numbers as well. This works out-of-the-box with fast_float. We can also use fast_float for integer parsing.	2025-07-03 09:51:56 -04:00
Timothy Flynn	9fc3e72db2	AK+Everywhere: Allow lonely UTF-16 surrogates by default By definition, the web allows lonely surrogates by default. Let's have our string APIs reflect this, so we don't have to pass an allow option all over the place.	2025-07-03 09:51:56 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as a char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: auto string = u"hello"sv; This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	2abc955ca9	AK: Allow treating UTF-16 views with lonely surrogates as valid Much of the web requires us to allow lonely surrogates in UTF-16 data. The default behavior to disallow such code units has not been changed here - that will be changed in an upcoming commit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	d978a582a0	AK: Add a Utf16View ASCII validator	2025-07-03 09:51:56 -04:00
Timothy Flynn	35a1832d08	Tests/AK: Rename TestUtf16 / TestUtf8 to TestUtf16View / TestUtf8View These are the files they actually test, so let's rename them to avoid confusion with an upcoming Utf16String test.	2025-07-03 09:51:56 -04:00
Luke Wilde	31a8004ddb	AK: Add the ability to consume specifically by a predicate This will be used by Content Security Policy to consume the next character, if it matches a whole range of characters, such as is_ascii_alpha.	2025-07-01 10:24:24 +12:00
Tomasz Strejczek	8f8e51b1fc	AK: Implement AK::UnixDateTime::to_string() Copy implementation of LibCore::DateTime::to_string() to AK. Rename TestDuration.cpp to TestTime.cpp and add there tests for to_string().	2025-06-19 18:42:45 -06:00
Tomasz Strejczek	e03c558a0a	AK: Implement demangle() for MSVC ABI This implements demangle() using Windows API. Also some rudimentary test is provided.	2025-06-17 18:39:18 -06:00
Sam Atkins	26105b8b11	AK: Add a Formatter for Checked This goes in Format.h instead of Checked.h, to avoid an include cycle.	2025-06-17 20:44:01 +02:00
Jelle Raaijmakers	6f926e6977	AK: Add `Utf8View::code_point_offset_of()`	2025-06-13 15:08:26 +02:00
Jelle Raaijmakers	cc0a28ee7d	AK: Add `Utf16View::find_code_unit_offset(_ignoring_case)`	2025-06-13 15:08:26 +02:00
Jelle Raaijmakers	7d7f6fa494	AK: Remove superfluous check from `Utf16View::equals_ignoring_case()` Returning true if both lengths are 0 is already handled by the default case.	2025-06-13 15:08:26 +02:00
Jelle Raaijmakers	b558b4dba6	AK: Add `Span<T>::index_of(ReadonlySpan)` This will be used for case-sensitive substring index matches in a later commit.	2025-06-13 15:08:26 +02:00
ayeteadoe	8cf01a25c2	AK: Add initial support for AK testsuite on Windows We now explicitly enable support for the minimum libraries needed to build and run the AK testsuite. 81/82 tests are running and passing. The exception is LexicalPath, as some path behaviour on Windows is different than Unix, so the current tests will have lots of platform specific failures. The implementer of LexicalPathWindows recommended windows-specific tests here, so I will do that in a follow up.	2025-05-20 10:58:43 -06:00
Ashton	5f5ae6bf8b	AK: Replace wchar_t formatting with char32_t This makes TestFormat fully cross-platform as we no longer have to work around the 16 vs 32-bit wide strings	2025-05-18 19:18:13 -06:00
Ashton	4b3a3b0856	AK: Remove redundant TestPrint test This test was only useful when AK/PrintfImplementation.h existed. But that was removed 11 months ago, so since then this has just been testing std library functions not implemented by us.	2025-05-18 19:18:13 -06:00
Andreas Kling	734bc2a0ea	AK: Strip trailing zero decimals in default formatting of float numbers This gives us a more human-looking serialization of numbers by default, and in case a fixed number of decimal digits is actually wanted, we still have the 'f' specifier.	2025-05-18 17:23:34 +02:00
ayeteadoe	744fd91d0b	LibTest: Support death tests without child process cloning A challenge for getting LibTest working on Windows has always been CrashTest. It implements death tests similar to Google Test where a child process is cloned to invoke the expression that should abort/terminate the program. Then the exit code of the child is used by the parent test process to verify if the application correctly aborted/terminated due to invoking the expression. The problem was that finding an equivalent way to port Crash::run() to Windows was not looking very likely as publicly exposed Win32/ Native APIs have no equivalent to fork(); however, Windows actually does have native support for process cloning via undocumented NT APIs that clever people reverse engineered and published, see `NtCreateUserProcess()`. All that being said, this `EXPECT_DEATH()` implementation avoids needing to use a child process in general, allowing us to remove CrashTest in favour of a single cross-platform solution for death tests.	2025-05-16 13:23:32 -06:00
Andreas Kling	cf6e2531d9	AK: Make String::number() much faster for integer types Instead of going through String::formatted(), we now have a specialized code path for base-10 serialization directly to UTF-8. This is roughly 5-10x faster than the previous implementation, depending on how many digits we end up outputting. 1.07x speedup on MicroBench/for-in-indexed-properties.js	2025-05-02 19:13:03 +02:00
Tim Ledbetter	31dea89fe0	AK: Add lowest common multiple and greatest common divisor functions	2025-04-23 09:13:45 +01:00
Jonne Ransijn	bb20a0d8f8	AK: Allow the `Optional<T>` move assignment operator to be trivial This will change behaviour for moved-from `Optional<T>`s, since they will now no longer clear their value if `T` is trivial. However, a moved-from value should be considered to be in an unspecified state. Use `Optional<T>::clear` or `Optional<T>::release_value` instead.	2025-04-22 21:19:31 -06:00
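
Two of the UTF-16 commit messages above (8fbb80fffc and fe676585f5) rest on small encoding facts that are easy to check in isolation. The following is a minimal sketch in standard C++17, not AK code: `is_ascii_utf16` and `all_bytes_look_ascii` are hypothetical helpers written for this illustration, and the byte-wise check assumes a UTF-16LE layout as described in the commit message. It shows that U+201C passes a byte-wise ASCII check while failing a code-unit check, and that U+1F600 occupies two code units with 0xD83D first.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not AK's API): a UTF-16 string is ASCII only if
// every 16-bit code unit is below 0x80.
constexpr bool is_ascii_utf16(std::u16string_view text)
{
    for (char16_t code_unit : text) {
        if (code_unit >= 0x80)
            return false;
    }
    return true;
}

// Hypothetical helper: the flawed approach, which inspects the two bytes of
// each code unit separately, as a byte-oriented (UTF-8) validator would.
constexpr bool all_bytes_look_ascii(std::u16string_view text)
{
    for (char16_t code_unit : text) {
        auto low_byte = static_cast<std::uint8_t>(code_unit & 0xFF);
        auto high_byte = static_cast<std::uint8_t>(code_unit >> 8);
        if (low_byte >= 0x80 || high_byte >= 0x80)
            return false;
    }
    return true;
}

// U+201C is the bytes 0x1C 0x20 in UTF-16LE: both bytes look like ASCII,
// but the code unit 0x201C is not.
static_assert(all_bytes_look_ascii(u"\u201C"));
static_assert(!is_ascii_utf16(u"\u201C"));

// U+1F600 is a surrogate pair in UTF-16: two code units, high surrogate first.
static_assert(std::u16string_view(u"\U0001F600").size() == 2);
static_assert(std::u16string_view(u"\U0001F600")[0] == 0xD83D);

// Runtime version of the same checks, for use from a test or main().
inline void check_utf16_examples()
{
    assert(is_ascii_utf16(u"hello"));
    assert(all_bytes_look_ascii(u"hello"));
    assert(!is_ascii_utf16(u"\u201C"));
}
```

The checks are written as `static_assert`s so that a successful compile already confirms them; `check_utf16_examples()` repeats a subset at runtime.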

571 commits