ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-20 06:59:47 +00:00

Author	SHA1	Message	Date
Timothy Flynn	d9502505c2	AK: Fix bounds assertions in Utf16View::iterator_offset	2025-07-28 18:30:50 +02:00
Timothy Flynn	67723ef83c	AK: Add a method to peek ahead of a UTF-16 iterator	2025-07-28 18:30:50 +02:00
Timothy Flynn	21d7d236e6	AK: Add a method to check if a UTF-16 string contains any code point	2025-07-28 18:30:50 +02:00
Timothy Flynn	ed63a60247	AK: Return an empty optional when UTF-16 code unit lookup fails Accidentally returned the wrong type here.	2025-07-28 12:25:11 +02:00
Timothy Flynn	baddac5155	AK: Implement a method to split a UTF-16 string	2025-07-28 12:25:11 +02:00
Timothy Flynn	48a3b2c28e	AK: Implement a method to count instances of a needle in a UTF-16 string	2025-07-28 12:25:11 +02:00
Timothy Flynn	745f288796	AK: Implement a method to acquire a UTF-16 iterator's code unit offset This is the same as Utf8View::iterator_offset().	2025-07-25 18:16:22 +02:00
Timothy Flynn	6c73dff120	AK: Implement a UTF-16 method to check if a string is ASCII whitespace	2025-07-24 19:00:20 +02:00
Jelle Raaijmakers	b1c3ce807b	AK: Rename Utf16View::trim_whitespace() to ::trim_ascii_whitespace() This reflects the naming of String::trim_ascii_whitespace() and better indicates what exactly we're trimming.	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	9a03ee1c24	AK: Fix mention of renamed member in Utf16View	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	15178d5230	AK: Add ::ends_with() to Utf16View and Utf16StringBase I noticed that we can significantly simplify ::starts_with(), and based the new ::ends_with() on that.	2025-07-24 07:18:25 -04:00
Jelle Raaijmakers	7f8468b0e6	AK: Compare pointers in TypedTransfer<T>::compare() We can return `true` quickly if the two pointers are identical.	2025-07-24 07:18:25 -04:00
Timothy Flynn	6ddbb70051	AK: Remove constexpr specifier from Utf16View::bytes() The Span constructor used here uses reinterpret_cast under the hood, so it and Utf16View::bytes() cannot be constexpr.	2025-07-22 13:33:51 -04:00
Timothy Flynn	ad7ac679fd	AK: Compute Utf16View::code_point_offset_of correctly There were a couple of issues here, including the following computation could actually overflow to NumericLimits<size_t>::max(): code_unit_offset -= it.length_in_code_units();	2025-07-22 17:17:33 +02:00
Timothy Flynn	0bbb725bcd	AK: Mark a couple of methods in Utf16View.h as constexpr	2025-07-22 17:17:33 +02:00
Timothy Flynn	9582895759	AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String	2025-07-18 12:45:38 -04:00
Timothy Flynn	d40e3af697	AK: Implement UTF-16 string-to-number conversions	2025-07-18 12:45:38 -04:00
Timothy Flynn	6e0290ecaa	AK: Define some UTF-16 helper methods * contains * escape_html_entities * replace * to_ascii_lowercase * to_ascii_uppercase * to_ascii_titlecase * trim * trim_whitespace	2025-07-18 12:45:38 -04:00
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
Timothy Flynn	418409aa6f	AK: Fix usage of constexpr within Utf16View and related utilities * Error and ErrorOr are not themselves constexpr, so a function returning these types cannot be constexpr. * The UDL was trying to call Utf16View::validate, which is not constexpr itself. The compiler will actually already raise an error if a UTF-16 literal is invalid, so let's just avoid the call altogether.	2025-07-05 01:25:22 +12:00
Timothy Flynn	9fc3e72db2	AK+Everywhere: Allow lonely UTF-16 surrogates by default By definition, the web allows lonely surrogates by default. Let's have our string APIs reflect this, so we don't have to pass an allow option all over the place.	2025-07-03 09:51:56 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as a char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: auto string = u"hello"sv; This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	c17b067e1d	AK: Completely remove endianness from Utf16View APIs These were mostly removed in `7628ddfaf7`. This removes the few remaining cases, as no callers are providing any non-host endianness. This is just to prevent weird API dissymmetry between Utf16View and an upcoming Utf16String.	2025-07-03 09:51:56 -04:00
Timothy Flynn	a0eb47e2fc	AK: Add hash traits for Utf16View This is based on the hash in JS::Utf16StringImpl::compute_hash.	2025-07-03 09:51:56 -04:00
Timothy Flynn	2abc955ca9	AK: Allow treating UTF-16 views with lonely surrogates as valid Much of the web requires us to allow lonely surrogates in UTF-16 data. The default behavior to disallow such code units has not been changed here - that will be changed in an upcoming commit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	d978a582a0	AK: Add a Utf16View ASCII validator	2025-07-03 09:51:56 -04:00
Timothy Flynn	66006d3812	AK+LibJS: Extract some UTF-16 helpers for use in an outside class An upcoming Utf16String will need access to these helpers. Let's make them publicly available.	2025-07-03 09:51:56 -04:00
Timothy Flynn	efa9737cf7	AK+LibJS: Do not set UTF-16 code point length to its code unit length	2025-06-25 22:20:47 +02:00
Jelle Raaijmakers	cc0a28ee7d	AK: Add `Utf16View::find_code_unit_offset(_ignoring_case)`	2025-06-13 15:08:26 +02:00
Shannon Booth	5cf87dcfdc	AK: Add a Utf16View::is_code_unit_less_than helper This seems like the natural place to put this since it is specific to UTF-16.	2025-05-17 08:00:59 -04:00
Ali Mohammad Pur	eea81738cd	AK+Everywhere: Recognise that surrogates in utf16 aren't all that common For the slight cost of counting code points when converting between encodings and a teeny bit of memory, this commit adds a fast path for all-happy utf-16 substrings and code point operations. This seems to be a significant chunk of time spent in many regex benchmarks.	2025-04-23 07:56:02 -06:00
Andreas Kling	0c93a07fb1	AK: Shrink Utf16View Use a sentinel value instead of Optional for the cached length in code points, shrinking Utf16View from 32 to 24 bytes.	2025-04-16 10:04:50 +02:00
Andreas Kling	7628ddfaf7	AK: Remove endianness override from Utf16View Utf16View is now always in "host" endian mode. This makes it smaller and less branchy for everyone!	2025-04-16 10:04:50 +02:00
Andreas Kling	0e9480b944	AK+LibTextCodec: Stop using Utf16View endianness override This is preparation for removing the endianness override, since it was only used by a single client: LibTextCodec. While here, add helpers and make use of simdutf for fast conversion.	2025-04-16 10:04:50 +02:00
Andrew Kaster	5e7e6475c6	AK: Annotate [[no_unique_address]] members with NO_UNIQUE_ADDRESS macro	2025-04-15 02:19:06 -06:00
Andreas Kling	b2779ad9f7	AK: Shrink Utf16View from 40 bytes to 32 bytes This ends up making RegexStringView smaller, which means less stuff to copy when forking in the regex engine. Thanks to Leon for suggesting the [[no_unique_address]] trick!	2025-04-09 07:22:01 +02:00
Jonne Ransijn	04920d06f0	AK: Use `simdutf` when appending UTF-16 to StringBuilder Adds a fast path for valid UTF-16 using `simdutf`, and fall back to the slow path for unmatched surrogates.	2024-10-30 10:28:24 +01:00
Timothy Flynn	7a17c654d2	AK: Add a method to compute UTF-16 length from a UTF-8 string	2024-07-31 05:55:34 -04:00
Timothy Flynn	71c29504af	AK: Support non-native endianness in Utf16View Utf16View currently assumes host endianness. Add support for specifying either big or little endianness (which we mostly just pipe through to simdutf). This will allow using simdutf facilities with LibTextCodec.	2024-07-18 19:43:57 +02:00
Timothy Flynn	32ffe9bbfc	AK: Replace UTF-16 validation and length computation with simdutf	2024-07-18 14:46:25 +02:00
Timothy Flynn	ec492a1a08	Everywhere: Run clang-format The following command was used to clang-format these files: clang-format-18 -i $(find . \ -not $ -path "./\." -prune $ \ -not $ -path "./Base/" -prune $ \ -not $ -path "./Build/" -prune $ \ -not $ -path "./Toolchain/" -prune $ \ -not $ -path "./Ports/" -prune $ \ -type f -name ".cpp" -o -name ".mm" -o -name ".h") There are a couple of weird cases where clang-format now thinks that a pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a lambda return statement, and it puts spaces around the `->`.	2024-04-24 16:50:01 -04:00
Timothy Flynn	1b4a23095c	AK: Add a Utf16View::starts_with method Based heavily on Utf8View::starts_with.	2024-01-04 12:43:10 +01:00
Timothy Flynn	c46ba7e68d	AK: Allow constructing a UTF-16 view from a UTF-16 string literal UTF-16 string literals are a language-level feature. It is convenient to be able to construct a Utf16View from these strings.	2024-01-04 12:43:10 +01:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only \| grep \.cpp\\|\.h) $ gn format $(git ls-files '.gn' '.gni')	2023-12-17 18:25:10 +03:30
Timothy Flynn	370ea9441c	AK: Define an alias for Utf16View's iterator type Utf8View and Utf32View do so already. This allows using these views more readily in generic code.	2023-11-08 12:54:26 -05:00
MacDue	63b11030f0	Everywhere: Use ReadonlySpan<T> instead of Span<T const>	2023-02-08 19:15:45 +00:00
Timothy Flynn	2eacc7aec1	AK: Add Utf16View::to_utf8 to convert the view to a UTF-8 AK::String	2023-01-09 23:00:24 +00:00
Timothy Flynn	d0403ec14f	AK+Everywhere: Rename Utf16View::to_utf8 to to_deprecated_string A subsequent commit will add to_utf8 back to create an AK::String.	2023-01-09 23:00:24 +00:00
Timothy Flynn	d793262beb	AK+Everywhere: Make UTF-16 to UTF-8 converter fallible This could fail to allocate the underlying storage needed to store the UTF-8 data. Propagate this error.	2023-01-08 12:13:15 +01:00
Timothy Flynn	1edb96376b	AK+Everywhere: Make UTF-8 and UTF-32 to UTF-16 converters fallible These could fail to allocate the underlying storage needed to store the UTF-16 data. Propagate these errors.	2023-01-08 12:13:15 +01:00

66 commits