ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-21 23:50:06 +00:00

Author	SHA1	Message	Date
Aliaksandr Kalenik	d47a22150d	AK: Define `operator==` for HashMap	2025-07-30 11:06:05 +02:00
Grant Knowlton	9e1e4f3b15	AK: Validate compressed tags in IPv4-mapped IPv6 address This disallows parsing IPv4 mapped IPv6 address strings with multiple compression prefixes. Tests are provided for the updated functionality.	2025-07-30 00:53:10 +02:00
Timothy Flynn	d9502505c2	AK: Fix bounds assertions in Utf16View::iterator_offset	2025-07-28 18:30:50 +02:00
Timothy Flynn	67723ef83c	AK: Add a method to peek ahead of a UTF-16 iterator	2025-07-28 18:30:50 +02:00
Timothy Flynn	21d7d236e6	AK: Add a method to check if a UTF-16 string contains any code point	2025-07-28 18:30:50 +02:00
Timothy Flynn	96e75a023b	AK: Implement a UTF-16 UnixDateTime stringifier	2025-07-28 12:25:11 +02:00
Timothy Flynn	ed63a60247	AK: Return an empty optional when UTF-16 code unit lookup fails Accidentally returned the wrong type here.	2025-07-28 12:25:11 +02:00
Timothy Flynn	baddac5155	AK: Implement a method to split a UTF-16 string	2025-07-28 12:25:11 +02:00
Timothy Flynn	48a3b2c28e	AK: Implement a method to count instances of a needle in a UTF-16 string	2025-07-28 12:25:11 +02:00
Timothy Flynn	1375e6bf39	AK+LibJS+LibWeb: Use simdutf to create well-formed strings	2025-07-26 00:40:06 +02:00
Timothy Flynn	a740bfd8ff	AK+LibUnicode: Implement Unicode-aware UTF-16 case transformations	2025-07-25 18:16:22 +02:00
Timothy Flynn	df77ae1920	AK: Implement creating a UTF-16 string from a repeated code point	2025-07-25 18:16:22 +02:00
Timothy Flynn	a46e9b2adb	AK: Compute the correct capacity in StringBuilder::try_append_repeated This was mistakenly broken in `2803d66d87`.	2025-07-25 18:16:22 +02:00
Timothy Flynn	745f288796	AK: Implement a method to acquire a UTF-16 iterator's code unit offset This is the same as Utf8View::iterator_offset().	2025-07-25 18:16:22 +02:00
Jelle Raaijmakers	0b96690f0c	AK: Add HashMap::update() This updates a HashMap by copying another HashMap's keys and values.	2025-07-25 16:22:06 +02:00
Timothy Flynn	6c73dff120	AK: Implement a UTF-16 method to check if a string is ASCII whitespace	2025-07-24 19:00:20 +02:00
Timothy Flynn	f53389bab1	AK: Add a couple of Utf16String factories * Utf16String::from_utf8_with_replacement_character * Utf16String::from_code_point	2025-07-24 19:00:20 +02:00
Jelle Raaijmakers	b1c3ce807b	AK: Rename Utf16View::trim_whitespace() to ::trim_ascii_whitespace() This reflects the naming of String::trim_ascii_whitespace() and better indicates what exactly we're trimming.	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	9a03ee1c24	AK: Fix mention of renamed member in Utf16View	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	15178d5230	AK: Add ::ends_with() to Utf16View and Utf16StringBase I noticed that we can significantly simplify ::starts_with(), and based the new ::ends_with() on that.	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	7f8468b0e6	AK: Compare pointers in TypedTransfer<T>::compare() We can return `true` quickly if the two pointers are identical.	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	54dd45d3f6	AK: Add Span::ends_with() Originally I added this to use it in Utf16View::ends_with(), but the final implementation ended up a lot simpler. I chose to keep this anyway since it mirrors Span::starts_with().	2025-07-24 07:18:25 -04:00
Timothy Flynn	6ddbb70051	AK: Remove constexpr specifier from Utf16View::bytes() The Span constructor used here uses reinterpret_cast under the hood, so it and Utf16View::bytes() cannot be constexpr.	2025-07-22 13:33:51 -04:00
Timothy Flynn	42b41431eb	AK+LibJS: Enforce limits in Utf16View offset computations RegExp was the only caller relying on being able to provide an offset larger than the string length. So let's do a pre-check in RegExp and then enforce that the offsets we receive in Utf16View are valid.	2025-07-22 17:17:33 +02:00
Timothy Flynn	ad7ac679fd	AK: Compute Utf16View::code_point_offset_of correctly There were a couple of issues here, including the following computation could actually overflow to NumericLimits<size_t>::max(): code_unit_offset -= it.length_in_code_units();	2025-07-22 17:17:33 +02:00
Timothy Flynn	0bbb725bcd	AK: Mark a couple of methods in Utf16View.h as constexpr	2025-07-22 17:17:33 +02:00
Jelle Raaijmakers	265e278275	AK: Allow indexing at length in Utf8View::byte_offset_of() And do the same for Utf8View::code_point_offset_of(). Some of these `VERIFY`s of the view's length were introduced recently, but they caused the parsing of named capture groups in RegexParser to crash in some situations. Instead, allow indexing at the view's length: the byte offset of code point `length()` is known, even though that code point does not exist in the view. Similarly, we know the code point offset at byte offset `byte_length()`. Beyond those offsets, we still crash. Fixes 13 failures in test262's `language/literals/regexp/named-groups`.	2025-07-22 09:10:32 -04:00
dmaivel	52a23dc02e	AK+LibWeb/CSS: Add `lower-greek` counter style	2025-07-21 15:18:17 +01:00
Jelle Raaijmakers	86dc3ce001	AK: Add `dbgln_dump()` macro This turns: dbgln_dump(some_expression() + 1); Into: dbgln("some_expression() + 1: {}", (some_expression() + 1));	2025-07-18 14:40:00 -04:00
Timothy Flynn	9582895759	AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String	2025-07-18 12:45:38 -04:00
Timothy Flynn	d40e3af697	AK: Implement UTF-16 string-to-number conversions	2025-07-18 12:45:38 -04:00
Timothy Flynn	6e0290ecaa	AK: Define some UTF-16 helper methods * contains * escape_html_entities * replace * to_ascii_lowercase * to_ascii_uppercase * to_ascii_titlecase * trim * trim_whitespace	2025-07-18 12:45:38 -04:00
Timothy Flynn	7f069efbc4	AK: Implement a flyweight string for Utf16String Utf16FlyString more or less works exactly the same as FlyString. It will store the raw encoded data of the string instance. If the string is a short ASCII string, Utf16FlyString holds the ShortString bytes; else, Utf16FlyString holds a pointer to the Utf16StringData.	2025-07-18 12:45:38 -04:00
Timothy Flynn	2803d66d87	AK: Support UTF-16 string formatting The underlying storage used during string formatting is StringBuilder. To support UTF-16 strings, this patch allows callers to specify a mode during StringBuilder construction. The default mode is UTF-8, for which StringBuilder remains unchanged. In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a series of u16 code units. Appending a single character will append 2 bytes for that character (cast to a char16_t). Appending a StringView will transcode the string to UTF-16. Utf16String also gains the same memory optimization that we added for String, where we hand off the underlying buffer to Utf16String to avoid having to re-allocate. In the future, we may want to further optimize for ASCII strings. For example, we could defer committing to the u16-esque storage until we see a non-ASCII code point.	2025-07-18 12:45:38 -04:00
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
Timothy Flynn	8fbb80fffc	AK: Do not fall back to simdutf for UTF-16 ASCII validation This was a mistake. Consider U+201C (LEFT DOUBLE QUOTATION MARK). This code point is encoded as the bytes 0x1c 0x20 in UTF-16LE. Both of these bytes are ASCII if interpreted as UTF-8. But the string itself is most certainly not ASCII.	2025-07-18 12:45:38 -04:00
Aliaksandr Kalenik	6be559f639	AK: Define ConstIterator for SegmentedVector	2025-07-13 19:15:05 +02:00
Shannon Booth	8bd43f2cb9	AK: Add hash traits for Optional<T> To enable storing it in a hashmap. 13 is a somewhat arbitrary value, something like 0 is not appropriate since a lot of types return 0 as a hash for an invalid / empty state.	2025-07-07 20:24:47 +01:00
Andrew Kaster	ad1938086d	AK: Add missing find_package command for fast_float This was missed in `62d9a84b8d`	2025-07-07 06:47:06 -04:00
Timothy Flynn	01ebf1eb07	AK: Replace surrogates in String::from_utf8_with_replacement_character We are expected to replace lonely surrogates with U+FFFD when decoding UTF-8 text.	2025-07-06 04:30:17 +12:00
Timothy Flynn	51afbf5280	AK: Simplify BOM parsing in String::from_utf8_with_replacement_character	2025-07-06 04:30:17 +12:00
Gingeh	f098bd029c	LibTextCodec: Replace unmatched utf16 surrogates	2025-07-05 09:58:57 -04:00
Timothy Flynn	418409aa6f	AK: Fix usage of constexpr within Utf16View and related utilities * Error and ErrorOr are not themselves constexpr, so a function returning these types cannot be constexpr. * The UDL was trying to call Utf16View::validate, which is not constexpr itself. The compiler will actually already raise an error if a UTF-16 literal is invalid, so let's just avoid the call altogether.	2025-07-05 01:25:22 +12:00
ayeteadoe	25f5936dee	CMake: Rename serenity_* helper functions/macros to ladybird_*	2025-07-03 23:19:41 +02:00
Timothy Flynn	69074a3841	AK: Avoid double allocations when converting UTF-16 LE/BE to UTF-8 We can form the UTF-8 string in-place.	2025-07-03 11:45:23 -04:00
Timothy Flynn	62d9a84b8d	AK+Everywhere: Replace custom number parsers with fast_float Our floating point number parser was based on the fast_float library: https://github.com/fastfloat/fast_float However, our implementation only supports 8-bit characters. To support UTF-16, we will need to be able to convert char16_t-based strings to numbers as well. This works out-of-the-box with fast_float. We can also use fast_float for integer parsing.	2025-07-03 09:51:56 -04:00
Timothy Flynn	9fc3e72db2	AK+Everywhere: Allow lonely UTF-16 surrogates by default By definition, the web allows lonely surrogates by default. Let's have our string APIs reflect this, so we don't have to pass an allow option all over the place.	2025-07-03 09:51:56 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as a char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: auto string = u"hello"sv; This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	c17b067e1d	AK: Completely remove endianness from Utf16View APIs These were mostly removed in `7628ddfaf7`. This removes the few remaining cases, as no callers are providing any non-host endianness. This is just to prevent weird API dissymmetry between Utf16View and an upcoming Utf16String.	2025-07-03 09:51:56 -04:00
Timothy Flynn	a0eb47e2fc	AK: Add hash traits for Utf16View This is based on the hash in JS::Utf16StringImpl::compute_hash.	2025-07-03 09:51:56 -04:00
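
Commit 9e1e4f3b15 rejects IPv4-mapped IPv6 strings that contain more than one compression marker. The sketch below shows only that rule. It is not Ladybird's parser, and the helper names `count_compression_markers` and `has_valid_compression` are hypothetical. Overlapping matches are counted on purpose, so ":::" is rejected too.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts every position where a "::" marker starts.
// Overlapping matches are counted, so ":::" yields 2.
std::size_t count_compression_markers(std::string_view address)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < address.size(); ++i) {
        if (address[i] == ':' && address[i + 1] == ':')
            ++count;
    }
    return count;
}

// An IPv6 address, including the IPv4-mapped form "::ffff:a.b.c.d",
// may contain at most one "::" compression.
bool has_valid_compression(std::string_view address)
{
    return count_compression_markers(address) <= 1;
}
```

A real parser also has to validate group counts and the embedded dotted-quad. This check only covers the "one compression" rule.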
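
Commit 7f8468b0e6 adds a fast path to TypedTransfer<T>::compare(): when both pointers are identical, the ranges are equal without inspecting any elements. The sketch below applies the same idea to `std::memcmp` rather than AK's TypedTransfer. The function name `ranges_equal` is hypothetical, and it assumes `T` can be compared byte-wise, meaning it has no padding and no custom equality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Byte-wise range comparison with an identity shortcut.
// Assumes T is trivially comparable, because memcmp looks at raw bytes.
template<typename T>
bool ranges_equal(T const* a, T const* b, std::size_t count)
{
    // Same storage (or nothing to compare): equal by definition.
    if (count == 0 || a == b)
        return true;
    return std::memcmp(a, b, count * sizeof(T)) == 0;
}
```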
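
Commit ad7ac679fd points out that `code_unit_offset -= it.length_in_code_units();` could wrap to NumericLimits<size_t>::max(). The cause is unsigned arithmetic: subtracting a larger `size_t` from a smaller one wraps modulo 2^N. For example, this happens if 1 unit of offset remains while the current code point is a 2-unit surrogate pair. The snippet below demonstrates the wraparound with `std::numeric_limits` and shows one possible guard. The saturating variant is a hypothetical illustration and not necessarily the fix that was committed.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Plain unsigned subtraction: wraps around when length > offset.
std::size_t unchecked_subtract(std::size_t offset, std::size_t length)
{
    return offset - length;
}

// Hypothetical guard: clamp at zero instead of wrapping.
std::size_t saturating_subtract(std::size_t offset, std::size_t length)
{
    return length > offset ? 0 : offset - length;
}
```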
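
Commit 265e278275 allows asking for the byte offset of code point `length()`. That offset is well defined, because it equals the view's byte length, even though no code point exists there. Any index beyond that is still an error. The sketch below encodes the rule over a plain `std::string_view` and assumes the input is valid UTF-8. It is not Utf8View: the name `byte_offset_of` is reused only for readability, and it returns `std::nullopt` where Ladybird would crash on a VERIFY.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Returns the byte offset where code point `index` starts.
// index == number of code points maps to utf8.size(); larger indices fail.
std::optional<std::size_t> byte_offset_of(std::string_view utf8, std::size_t index)
{
    std::size_t code_points_seen = 0;
    for (std::size_t byte = 0; byte < utf8.size(); ++byte) {
        // Continuation bytes are 0b10xxxxxx; every other byte starts a code point.
        auto unit = static_cast<unsigned char>(utf8[byte]);
        if ((unit & 0xC0) == 0x80)
            continue;
        if (code_points_seen == index)
            return byte;
        ++code_points_seen;
    }
    if (code_points_seen == index)
        return utf8.size(); // one past the last code point
    return std::nullopt;
}
```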
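
Commit 86dc3ce001 describes dbgln_dump() as expanding to a dbgln() call that prints the expression's source text followed by its value. The preprocessor's stringizing operator (`#`) is the standard way to capture that text. The macro below is a hypothetical stand-in called `DUMP_TO`. It writes to a `std::ostream` rather than calling dbgln(), and it is not Ladybird's definition.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// #expr turns the macro argument's tokens into a string literal,
// so the output shows both the expression text and its value.
#define DUMP_TO(stream, expr) ((stream) << #expr << ": " << (expr) << '\n')

int some_expression() { return 41; }
```

Usage: `DUMP_TO(std::cerr, some_expression() + 1);` prints `some_expression() + 1: 42`.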
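
Commit fe676585f5 explains why "😀" must report `length_in_code_units` as 2 and `code_unit_at(0)` as 0xD83D. U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 encodes it as a surrogate pair. The function below implements the standard surrogate-pair formula from the Unicode Standard. The name `to_surrogate_pair` is hypothetical and not part of AK.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Encodes a supplementary-plane code point (U+10000..U+10FFFF) as UTF-16.
std::array<char16_t, 2> to_surrogate_pair(char32_t code_point)
{
    assert(code_point >= 0x10000 && code_point <= 0x10FFFF);
    char32_t offset = code_point - 0x10000; // 20-bit value
    auto high = static_cast<char16_t>(0xD800 + (offset >> 10));   // top 10 bits
    auto low = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
    return { high, low };
}
```

For U+1F600, this yields 0xD83D 0xDE00. The same character takes 4 bytes in UTF-8 (F0 9F 98 80), which is why storing it in the short UTF-8 form would give misleading code-unit answers.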
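
Commit 8fbb80fffc warns that ASCII validation must not treat UTF-16 data as bytes. U+201C is stored as the bytes 0x1C 0x20 (little-endian), and each byte is below 0x80 on its own. The two functions below contrast the flawed byte-level check with a code-unit check. Both names are hypothetical. The flawed check is fooled regardless of byte order, because both bytes are below 0x80.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Flawed: reinterprets the UTF-16 data as raw bytes.
bool bytes_look_ascii(std::u16string_view text)
{
    auto const* bytes = reinterpret_cast<unsigned char const*>(text.data());
    for (std::size_t i = 0; i < text.size() * sizeof(char16_t); ++i) {
        if (bytes[i] >= 0x80)
            return false;
    }
    return true;
}

// Correct: every whole code unit must be below 0x80.
bool code_units_are_ascii(std::u16string_view text)
{
    for (char16_t unit : text) {
        if (unit >= 0x80)
            return false;
    }
    return true;
}
```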
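
Commit 8bd43f2cb9 hashes an empty Optional<T> to 13 rather than 0, because many types already hash their own empty or invalid state to 0. The sketch below shows that sentinel choice with the standard library and is not AK's Traits. The hasher name `OptionalHash` is hypothetical. Note that the standard library already provides `std::hash<std::optional<T>>`, so this version exists only to illustrate the sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <unordered_map>

// Empty optionals hash to a fixed, deliberately non-zero sentinel;
// engaged optionals hash like their contained value.
template<typename T>
struct OptionalHash {
    std::size_t operator()(std::optional<T> const& value) const
    {
        if (!value.has_value())
            return 13;
        return std::hash<T> {}(*value);
    }
};
```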
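
Commit f098bd029c makes LibTextCodec replace unmatched UTF-16 surrogates. The function below sketches that technique and is not LibTextCodec's decoder. A high surrogate immediately followed by a low surrogate is kept as a pair. Any other surrogate is replaced with U+FFFD (REPLACEMENT CHARACTER). The name `replace_lone_surrogates` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

std::u16string replace_lone_surrogates(std::u16string_view input)
{
    auto is_high = [](char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; };
    auto is_low = [](char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; };

    std::u16string output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        char16_t unit = input[i];
        if (is_high(unit) && i + 1 < input.size() && is_low(input[i + 1])) {
            // Well-formed pair: copy both code units and skip the low one.
            output.push_back(unit);
            output.push_back(input[++i]);
        } else if (is_high(unit) || is_low(unit)) {
            // Unpaired surrogate.
            output.push_back(u'\uFFFD');
        } else {
            output.push_back(unit);
        }
    }
    return output;
}
```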

3919 commits