ladybird

Mirror of https://github.com/LadybirdBrowser/ladybird.git, synced 2025-07-24 18:02:20 +00:00

Author	SHA1	Message	Date
Timothy Flynn	bd73dd316d	LibTimeZone: Remove LibTimeZone and TZDB data All users have been ported to the ICU implementation.	2024-06-26 10:14:02 +02:00
Timothy Flynn	666979fb90	LibUnicode: Remove unused headers from GenerateEmojiData Mostly just to make it clear we do not depend on LibUnicode in this generator.	2024-06-23 19:52:45 +02:00
Timothy Flynn	2ba7b4c529	LibUnicode: Remove now-unused code generator facilities	2024-06-22 14:56:39 +02:00
Timothy Flynn	069bed5d47	LibUnicode+LibGfx: Remove superfluous emoji metadata For SerenityOS, we parse emoji metadata from the UCD to learn emoji groups, subgroups, names, etc. We used this information only in the emoji picker dialog. It is entirely unused within Ladybird. This removes our dependence on the UCD emoji file, as we no longer need any of its information. All we need to know is the file path to our custom emoji, which we get from Meta/emoji-file-list.txt.	2024-06-22 14:56:39 +02:00
Timothy Flynn	aa3a30870b	LibUnicode: Replace code point bidirectional classes with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	e77dafc987	LibUnicode: Replace code point scripts and script extensions with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	986ff984cc	LibUnicode: Replace code point general categories with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	c804bda5fd	LibUnicode: Replace code point properties with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	ab56b8c8dc	LibUnicode: Remove the locale-unaware text segmentation implementation	2024-06-20 13:46:54 +02:00
Timothy Flynn	5cf818e305	LibUnicode: Replace case transformations and comparison with ICUs There are a couple of differences here due to using ICU: 1. Titlecasing behaves slightly differently. We previously transformed "123dollars" to "123Dollars", as we would use word segmentation to split a string into words, then transform the first cased character to titlecase. ICU doesn't go quite that far, and leaves the string as "123dollars". While this is a behavior change, the only user of this API is the `text-transform: capitalize;` CSS rule, and we now match the behavior of other browsers. 2. There isn't an API to compare strings with case insensitivity without allocating case-folded strings for both the left- and right-hand-side strings. Our implementation was previously allocation-free; however, in a benchmark, ICU is still ~1.4x faster.	2024-06-20 10:59:55 +02:00
Timothy Flynn	8d7216f4e0	LibUnicode: Replace IDNA ASCII conversion with ICU	2024-06-18 21:07:56 +02:00
Timothy Flynn	83475c5380	LibUnicode: Replace Unicode string normalization with ICU In a benchmark, ICU's implementation was over 3x faster than ours.	2024-06-18 21:07:56 +02:00
Timothy Flynn	1feef17bf7	LibUnicode: Remove completely unused code point name & block name data These were used for e.g. the Character Map on Serenity, but are not used at all for Ladybird.	2024-06-18 21:07:56 +02:00
Timothy Flynn	4de8adabac	LibLocale: Replace available locale lookups with ICU	2024-06-16 06:57:08 +02:00
Dan Klishch	b8c3e75573	Meta+Userland: Fix more instances of bad lambda-Variant interaction These don't cause compilation to fail but they still crash crashd.	2024-04-18 13:14:33 -06:00
Idan Horowitz	945c58c7c1	LibUnicode: Generate and use code point composition mappings These allow us to binary search the code point compositions based on the first code point being combined, which makes the search close to O(log N) instead of O(N).	2024-04-06 14:21:04 -04:00
Timothy Flynn	683c08744a	Userland: Avoid some conversions from rvalue strings to StringView These are all actually fine, there is no UAF here. But once e.g. `ByteString::view() &&` is deleted, these instances won't compile.	2024-04-04 11:23:21 +02:00
Timothy Flynn	91cd43a7ac	Meta: Add a file containing a list of all emoji file names And add a verification step to the emoji data generator to ensure all emoji are listed in this file. This file will be used as a sources list in both the CMake and GN build systems. It is probably possible to generate this list. But in a first attempt, the CMake code to set the file as a dependency of a pseudo target, which would then parse the file and install the listed emoji was getting quite verbose and complicated. So for now, let's just maintain this list.	2024-03-23 17:26:31 -04:00
Nico Weber	24a469f521	Everywhere: Prefer {:#x} over 0x{:x} in format strings The former automatically adapts the prefix to binary and octal output, and is what we already use in the majority of cases. Patch generated by: rg -l '0x\{' | xargs sed -i '' -e 's/0x{:/{:#/' I ran it 4 times (until it stopped changing things) since each invocation only converted one instance per line. No behavior change.	2024-02-21 17:54:38 +01:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only | grep \.cpp\|\.h) $ gn format $(git ls-files '*.gn' '*.gni')	2023-12-17 18:25:10 +03:30
Timothy Flynn	43e9dc0500	LibUnicode: Use weak symbols to provide default IDNA definitions Rather than using #ifdef blocks, update the fallback IDNA definitions to use weak symbols to match the rest of LibUnicode / LibLocale.	2023-12-10 10:19:14 -05:00
Simon Wanner	7d9fe44039	LibUnicode: Download and parse IDNA data	2023-12-10 08:04:58 -05:00
Tim Schumacher	a2f60911fe	AK: Rename GenericTraits to DefaultTraits This feels like a more fitting name for something that provides the default values for Traits.	2023-11-09 10:05:51 -05:00
Timothy Flynn	139c575cc9	LibUnicode: Update to Unicode version 15.1.0 https://unicode.org/versions/Unicode15.1.0/ This update includes a new set of code point properties, Indic Conjunct Break. These may have the values Consonant, Linker, or Extend. These are used in text segmentation to prevent breaking on some extended grapheme cluster sequences.	2023-09-15 18:30:26 +02:00
Andreas Kling	8b936b5912	AK: Make SourceGenerator::set() infallible	2023-08-22 13:08:24 +02:00
Sam Atkins	0d021a63c7	LibUnicode: Generate data for bidirectional character types This will let us examine code points to determine the rtl/ltr direction of a piece of text.	2023-08-20 16:21:35 -04:00
Lucas CHOLLET	3f35ffb648	Userland: Prefer `_string` over `_short_string` As `_string` can't fail anymore (since `3434412`), there are no real benefits to use the short variant in most cases.	2023-08-08 07:37:21 +02:00
Timothy Flynn	b91af3c6a0	LibUnicode: Remove a few generator tracking fields that are now unused These were used to generate specialized tables. Now that those tables have been migrated to general 2-stage lookup tables, these fields are all unused.	2023-07-28 05:28:50 +02:00
Timothy Flynn	456211932f	LibUnicode: Perform code point case conversion lookups in constant time Similar to commit `0652cc4`, we now generate 2-stage lookup tables for case conversion information. Only about 1500 code points are actually cased. This means that case information is rather highly compressible, as the blocks we break the code points into will generally all have no casing information at all. In total, this change: * Does not change the size of libunicode.so (which is nice because, generally, the 2-stage lookup tables are expected to trade a bit of size for performance). * Reduces the runtime of the new benchmark test case added here from 1.383s to 1.127s (about an 18.5% improvement).	2023-07-28 05:28:50 +02:00
Timothy Flynn	0ee133af90	LibUnicode: Separate code point case information into its own structure There is no functional change here. This information will compose the upcoming multistage casing tables in an upcoming patch. Extract it to its own struct to prepare for that.	2023-07-28 05:28:50 +02:00
Timothy Flynn	a332a8ad19	LibUnicode: Prepare Unicode data generator for multistage casing tables There is no functional change here. This just adjusts the changes made in commit `0652cc4` to be a bit more generic for code point casing tables. We currently only generate property tables, which boil down to a vector of booleans. Casing tables will be a struct of varying types, so this generalizes some of the generator to prepare for that ahead of time, to make the upcoming casing patch smaller / easier to grok.	2023-07-28 05:28:50 +02:00
Timothy Flynn	3fae92eea2	LibUnicode: Search code point properties sequentially at compile time When generating code point property tables, we currently binary search the code point range lists for each property to decide if a code point has that property. However, we are both iterating over the code points and through the sorted properties in order. This means we do not need to search code point ranges that are below the current code point at all. We can even remove the code point ranges that fall below the current code point, as we will not see a code point in those ranges again. On my machine, this reduces the run time of GenerateUnicodeData from 3.4 seconds to 1.2 seconds.	2023-07-28 05:28:50 +02:00
Timothy Flynn	0652cc48c0	LibUnicode: Perform code point property lookups in constant time We currently produce a single table for all categories of code point properties (GeneralCategory, Script, etc.). Each row contains a field indicating the range of code points to which that property applies. At runtime, we then do a binary search through that table to decide if a code point has a property. This changes our approach to generate a 2-stage lookup table for each of those categories. There is an in-depth explanation of these tables above the new `create_code_point_tables` method. The end effect is that code point property lookup is reduced from a binary search to constant-time array lookups. In total, this change: * Increases the size of libunicode.so from 2.7 MB to 2.9 MB. * Reduces the runtime of the new benchmark test case added here from 3.576s to 1.020s (a 3.5x speedup). * In a profile of resizing a TextEditor window with a 3MB file open, the runtime of checking if a code point has a word break property reduces from ~81% to ~56%.	2023-07-26 08:36:20 +02:00
Timothy Flynn	8f1d73abde	LibUnicode: Use the public CodePointRange in the code generator The next commit will need a type from LibUnicode/CharacterTypes.h. To avoid conflicts between that header's CodePointRange and the one that is defined in the code generator, just use the public definition.	2023-07-26 08:36:20 +02:00
Timothy Flynn	cb128dcf75	LibUnicode: Move the CodePointRangeComparator struct to a public header Move it out of the generated code so that it may be used by the code generator itself.	2023-07-26 08:36:20 +02:00
Timothy Flynn	c950f88611	LibUnicode: Stop generating Block property data We started generating this data in commit `0505e03`, but it was unused. It's still not used, so let's remove it, rather than bloating the size of libunicode.so with unused data. If we need it in the future, it's trivial to add back. Note we have always used the block name data from that commit, and that is still present here.	2023-07-26 08:36:20 +02:00
Ben Wiederhake	5cfa883b9f	LibUnicode: Explicitly mark HashMap copy	2023-05-19 22:33:57 +02:00
Lucas CHOLLET	8c34959b53	AK: Add the `Input` word to input-only buffered streams This concerns both `BufferedSeekable` and `BufferedFile`.	2023-05-09 11:18:46 +02:00
Cameron Youell	1d24f394c6	Everywhere: Use `LibFileSystem` where trivial	2023-03-21 19:03:21 +00:00
Sam Atkins	b18c1c7291	LibUnicode: Remove now-unused dir-iterator helper functions	2023-03-15 12:49:33 -04:00
Sam Atkins	8a8ad81aa1	LibUnicode: Migrate GenerateEmojiData to Directory::for_each_entry()	2023-03-15 12:49:33 -04:00
Sam Atkins	8672b380f6	LibUnicode: Read emoji file title from LexicalPath directly ... rather than taking the whole file name, and then manually trimming the extension off.	2023-03-15 12:49:33 -04:00
gustrb	5141c86587	AK: Rename CaseInsensitiveStringViewTraits to reflect intent Now it is called `CaseInsensitiveASCIIStringViewTraits`, so we can be more specific about what data structure does it operate onto. ;)	2023-03-14 21:34:32 +00:00
Tim Schumacher	8032724574	CodeGenerators: Ensure that we always print the entire generated output	2023-03-13 15:16:20 +00:00
Tim Schumacher	d5871f5717	AK: Rename Stream::{read,write} to Stream::{read_some,write_some} Similar to POSIX read, the basic read and write functions of AK::Stream do not have a lower limit of how much data they read or write (apart from "none at all"). Rename the functions to "read some [data]" and "write some [data]" (with "data" being omitted, since everything here is reading and writing data) to make them sufficiently distinct from the functions that ensure to use the entire buffer (which should be the go-to function for most usages). No functional changes, just a lot of new FIXMEs.	2023-03-13 15:16:20 +00:00
Sam Atkins	774f328783	LibCore+Everywhere: Return an Error from DirIterator::error() This also removes DirIterator::error_string(), since the same strerror() string will be included when you print the Error itself. Except in `ls` which is still using fprintf() for now.	2023-03-05 20:23:42 +01:00
Timothy Flynn	ca2b030336	LibUnicode: Use binary search for lookups into the generated emoji data This sorts the array of generated emoji data by code point (first by code point length, then by code point value). This lets us use a binary search to find emoji data, rather than the current linear search. In a profile of scrolling around /home/anon/Documents/emoji.txt, this reduces the runtime of Gfx::Emoji::emoji_for_code_points from 69.03% to 28.42%. Within that, Unicode::find_emoji_for_code_points reduces from 28.42% to just 1.95%.	2023-03-05 16:44:20 +01:00
Timothy Flynn	03f32bdf86	LibUnicode: Validate that all emoji images in /res/emoji actually exist This will raise a compile error if an emoji image was neglected to be added to e.g. emoji-serenity.txt, or if the code points are not correct.	2023-03-03 17:09:58 +00:00
Timothy Flynn	fd1fbad1d2	LibGfx+LibUnicode: Support specifying the path to search for emoji Similar to the FontDatabase, this will be needed for Ladybird to find emoji images. We now generate just the file name of emoji image in LibUnicode, and look for that file in the specified path (defaulting to /res/emoji) at runtime.	2023-03-01 14:54:16 +00:00
MacDue	01fa3bb788	LibUnicode: Propagate try_append() errors when building emoji data	2023-02-24 22:18:25 +01:00
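Commits 0652cc48c0 and 456211932f replace a binary search over property ranges with 2-stage lookup tables. The generator's actual layout is documented above `create_code_point_tables` and is not reproduced here. The C++ sketch below only illustrates the general technique for a single boolean property. It assumes a block size of 256 code points and input given as sorted, non-overlapping inclusive ranges. `TwoStageTable`, `build_two_stage_table`, and `kBlockSize` are hypothetical names, not LibUnicode's. Identical blocks are stored only once, which is why sparse data compresses well. For example, 456211932f notes that only about 1500 code points are cased. While building, the sketch walks the sorted ranges forward instead of searching them for every code point, in the spirit of commit 3fae92eea2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch of a 2-stage lookup table for one boolean property.
// Stage 1 maps a block index (code_point / kBlockSize) to a deduplicated block;
// stage 2 stores the per-code-point values of every unique block, concatenated.
constexpr uint32_t kMaxCodePoint = 0x10FFFF;
constexpr uint32_t kBlockSize = 256; // assumed block size

struct TwoStageTable {
    std::vector<uint16_t> stage1; // block index -> unique block number
    std::vector<bool> stage2;     // unique blocks, kBlockSize entries each

    // Constant time: two array indexings, independent of the number of ranges.
    bool lookup(uint32_t code_point) const
    {
        assert(code_point <= kMaxCodePoint);
        uint32_t block = stage1[code_point / kBlockSize];
        return stage2[block * kBlockSize + code_point % kBlockSize];
    }
};

// `ranges` holds inclusive [first, last] pairs, assumed sorted and non-overlapping.
TwoStageTable build_two_stage_table(std::vector<std::pair<uint32_t, uint32_t>> const& ranges)
{
    TwoStageTable table;
    std::map<std::vector<bool>, uint16_t> unique_blocks;
    std::size_t range_index = 0;

    for (uint32_t start = 0; start <= kMaxCodePoint; start += kBlockSize) {
        std::vector<bool> block(kBlockSize, false);
        for (uint32_t offset = 0; offset < kBlockSize; ++offset) {
            uint32_t code_point = start + offset;
            // Ranges ending below the current code point can never match again.
            while (range_index < ranges.size() && ranges[range_index].second < code_point)
                ++range_index;
            block[offset] = range_index < ranges.size() && ranges[range_index].first <= code_point;
        }
        // Reuse an identical block if one was already emitted into stage 2.
        auto [it, inserted] = unique_blocks.try_emplace(block, static_cast<uint16_t>(unique_blocks.size()));
        if (inserted)
            table.stage2.insert(table.stage2.end(), block.begin(), block.end());
        table.stage1.push_back(it->second);
    }
    return table;
}
```

The trade-off matches what the commits report: lookups no longer depend on the number of ranges, at the cost of storing stage 1 plus every distinct block.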
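Commit 945c58c7c1 makes code point compositions searchable by the first code point being combined. The following is a minimal C++ sketch of that idea. It assumes the mappings are sorted by first code point and then by second. `std::equal_range` finds the run sharing the first code point in O(log N), and only that short run is scanned. `CompositionMapping`, `ByFirst`, and `find_composition` are hypothetical names. The table holds just four real canonical compositions for illustration, not LibUnicode's generated data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record: `first` followed by `second` composes to `composite`.
struct CompositionMapping {
    uint32_t first;
    uint32_t second;
    uint32_t composite;
};

// Tiny illustrative subset, sorted by `first`, then by `second`.
std::vector<CompositionMapping> const compositions {
    { 0x0041, 0x0300, 0x00C0 }, // A + COMBINING GRAVE ACCENT -> U+00C0
    { 0x0041, 0x0301, 0x00C1 }, // A + COMBINING ACUTE ACCENT -> U+00C1
    { 0x0043, 0x0327, 0x00C7 }, // C + COMBINING CEDILLA -> U+00C7
    { 0x0065, 0x0301, 0x00E9 }, // e + COMBINING ACUTE ACCENT -> U+00E9
};

// Heterogeneous comparator so std::equal_range can compare records with a bare code point.
struct ByFirst {
    bool operator()(CompositionMapping const& m, uint32_t cp) const { return m.first < cp; }
    bool operator()(uint32_t cp, CompositionMapping const& m) const { return cp < m.first; }
};

std::optional<uint32_t> find_composition(uint32_t first, uint32_t second)
{
    // Debug-only check of the precondition the binary search relies on.
    assert(std::is_sorted(compositions.begin(), compositions.end(),
        [](CompositionMapping const& a, CompositionMapping const& b) { return a.first < b.first; }));

    // O(log N): locate all mappings whose first code point matches.
    auto [begin, end] = std::equal_range(compositions.begin(), compositions.end(), first, ByFirst {});

    // The run is short, so a linear scan over `second` is cheap.
    auto it = std::find_if(begin, end, [second](CompositionMapping const& m) { return m.second == second; });
    if (it == end)
        return std::nullopt;
    return it->composite;
}
```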
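Commit ca2b030336 sorts the generated emoji data by code point sequence, first by sequence length and then by code point value, so that lookups can use a binary search. The C++ sketch below shows that ordering with `std::lower_bound`. `EmojiEntry`, `emoji_less`, `make_sorted_emoji_table`, and `find_emoji` are hypothetical names. The four entries are illustrative rather than the generated table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical entry: one emoji code point sequence and a display name.
struct EmojiEntry {
    std::vector<uint32_t> code_points;
    std::string name;
};

// Order by sequence length first, then lexicographically by code point value.
bool emoji_less(std::vector<uint32_t> const& a, std::vector<uint32_t> const& b)
{
    if (a.size() != b.size())
        return a.size() < b.size();
    return a < b;
}

std::vector<EmojiEntry> make_sorted_emoji_table()
{
    std::vector<EmojiEntry> table {
        { { 0x1F1FA, 0x1F1F8 }, "flag: United States" },
        { { 0x1F600 }, "grinning face" },
        { { 0x2764, 0xFE0F }, "red heart" },
        { { 0x1F44D }, "thumbs up" },
    };
    std::sort(table.begin(), table.end(), [](EmojiEntry const& a, EmojiEntry const& b) {
        return emoji_less(a.code_points, b.code_points);
    });
    return table;
}

// Binary search instead of a linear scan; returns nullptr if the sequence is absent.
EmojiEntry const* find_emoji(std::vector<EmojiEntry> const& table, std::vector<uint32_t> const& code_points)
{
    assert(!code_points.empty());
    auto it = std::lower_bound(table.begin(), table.end(), code_points,
        [](EmojiEntry const& entry, std::vector<uint32_t> const& key) { return emoji_less(entry.code_points, key); });
    if (it == table.end() || it->code_points != code_points)
        return nullptr;
    return &*it;
}
```

Comparing lengths first still yields a strict weak ordering, and `std::vector`'s own `operator<` supplies the lexicographic tie-break within each length.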


304 commits