
153 commits

Author SHA1 Message Date
Lucas CHOLLET
3fdf5072ec LibCompress/Brotli: Remove CanonicalCode::clear()
This function was used in a single place and doesn't provide a huge
benefit over simply recreating the object.
2023-07-22 07:10:47 +02:00
Lucas CHOLLET
bb834ed765 LibCompress: Add a constructor to Brotli::CanonicalCode
This constructor will be used by the JPEG-XL decoder to support a
non-standard special case. Other users should only use other
constructors.
2023-07-21 10:47:34 -06:00
Lucas CHOLLET
96eace8390 LibCompress: Move CanonicalCode in the Brotli namespace
The class was an inner class of `BrotliDecompressionStream`; let's move
it outside the `Stream` object to ease access for users interested only
in this part.
2023-07-21 10:47:34 -06:00
Lucas CHOLLET
9248fd7f33 LibCompress: Move CanonicalCode's initializers inside CanonicalCode
These routines:
 - read_prefix_code
 - read_simple_prefix_code
 - read_complex_prefix_code

 were methods of `BrotliDecompressionStream` taking a `CanonicalCode` as
 an out parameter. This patch puts them in `CanonicalCode` as static
 methods.
2023-07-21 10:47:34 -06:00
Lucas CHOLLET
d2dd4142d1 LibCompress: Make CanonicalCode::read_symbol const 2023-07-21 10:47:34 -06:00
Timothy Flynn
c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Tim Schumacher
60ac254df6 AK: Use hashing to accelerate searching a CircularBuffer 2023-07-06 15:06:20 +01:00
Tim Schumacher
42d01b21d8 AK: Rewrite the hint-based CircularBuffer::find_copy_in_seekback
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
2023-07-06 15:06:20 +01:00
Tim Schumacher
046a9faeb3 AK: Split up CircularBuffer::find_copy_in_seekback
The "operation modes" of this function have very different focuses, and
trying to combine both so that they share as much code as possible
probably results in the worst performance.

Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
2023-07-06 15:06:20 +01:00
Tim Schumacher
9e82ad758e AK: Move parts for searching CircularBuffer into a new class
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
2023-07-06 15:06:20 +01:00
tgsm
c30775522e LibCompress/Gzip: Replace usage of DeprecatedString 2023-06-17 06:44:16 +02:00
Tim Schumacher
d4b0e64825 LibCompress: Move two shared LZMA magic numbers into a common place 2023-05-19 23:40:33 +02:00
Tim Schumacher
a01968ee6d LibCompress: Handle arbitrarily long FF-chains in the LZMA encoder 2023-05-19 23:40:33 +02:00
Tim Schumacher
cb93186350 LibCompress: Add debug logging for handling LZMA direct bits 2023-05-19 23:40:33 +02:00
Tim Schumacher
df071d8a76 LibCompress: Add a lot of debug logging to LZMA 2023-05-17 09:08:53 +02:00
Tim Schumacher
85a54cc796 LibCompress: Add an LZMA encoder 2023-05-17 09:08:53 +02:00
Tim Schumacher
9ab3646bc7 LibCompress: Use the variable for LZMA "normalized to real distance"
The variable already existed, but I forgot to use it earlier.
2023-05-17 09:08:53 +02:00
Tim Schumacher
42514c6961 LibCompress: Decode the LZMA match type in a separate function
This should keep the `read_some` function a bit flatter and shorter, and
make it easier to match the match type decoding process with the
specification.
2023-05-17 09:08:53 +02:00
Tim Schumacher
4a37bac374 LibCompress: Make LzmaHeader a POD-like type
This allows us to initialize the struct using an aggregate initializer.
2023-05-17 09:08:53 +02:00
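As a rough illustration of what aggregate initialization buys for a POD-like type (commit 4a37bac374), consider the sketch below. The struct name, its fields, and the values are assumptions made for this example; the log does not show the real `LzmaHeader` layout.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical header type: no user-declared constructors, no private
// members, no virtual functions, so it is an aggregate.
struct ExampleHeader {
    uint8_t encoded_properties;
    uint32_t dictionary_size;
    uint64_t uncompressed_size;
};

// The properties that make the type "POD-like".
static_assert(std::is_trivially_copyable<ExampleHeader>::value, "must be trivially copyable");
static_assert(std::is_standard_layout<ExampleHeader>::value, "must have standard layout");

// Aggregate initializer: members are filled in declaration order.
// The values are arbitrary placeholders.
constexpr ExampleHeader example_header { 0x5D, 1u << 23, 1024 };
```

Because no constructor is involved, the whole value can be built in a constant expression and checked at compile time.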
Tim Schumacher
440d8f908f LibCompress: Extract the LZMA state to a separate class
We will also need this in the compressor, as it needs to do the exact
same calculations in reverse.
2023-05-17 09:08:53 +02:00
Lucas CHOLLET
8c34959b53 AK: Add the Input word to input-only buffered streams
This concerns both `BufferedSeekable` and `BufferedFile`.
2023-05-09 11:18:46 +02:00
Tim Schumacher
dffef6bb71 LibCompress: Remove special casing for looping DEFLATE seekbacks
The `copy_from_seekback` method already handles this exactly as DEFLATE
expects, but it is slightly more optimized.
2023-05-04 20:01:16 +02:00
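The "looping" seekback mentioned in dffef6bb71 is a back-reference whose length exceeds its distance, so the copy reads bytes that the same copy has just written. A minimal sketch of why a plain sequential copy needs no special case follows. It uses a `std::vector` as the history and a hypothetical `copy_from_history` helper, not the AK `CircularBuffer` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `length` bytes to `output`, starting `distance` bytes before its
// current end. Returns false if the distance points outside the history.
bool copy_from_history(std::vector<uint8_t>& output, size_t distance, size_t length)
{
    if (distance == 0 || distance > output.size())
        return false;

    size_t const old_size = output.size();
    size_t const source = old_size - distance;
    output.reserve(old_size + length);

    // Copying one byte at a time, in order, lets the source run into bytes
    // appended by this very loop, which repeats the pattern as DEFLATE expects.
    for (size_t i = 0; i < length; ++i) {
        uint8_t const byte = output[source + i];
        output.push_back(byte);
    }

    assert(output.size() == old_size + length);
    return true;
}
```

With a history of `ab`, a copy of distance 2 and length 5 yields `abababa`: the two-byte pattern repeats until the requested length is reached.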
Tim Schumacher
4098335600 LibCompress: Error on truncated uncompressed DEFLATE blocks 2023-04-12 14:02:13 -04:00
Tim Schumacher
e11e7309dd LibCompress: Replace usages of the Endian bytes accessor 2023-04-12 07:33:15 -04:00
Tim Schumacher
381da77ffb LibCompress: Mark some XZ-related variables and functions as const 2023-04-08 15:18:59 -07:00
Tim Schumacher
e9789e9f36 LibCompress: Move loading XZ blocks into its own function 2023-04-08 15:18:59 -07:00
Tim Schumacher
e6b1e1bb33 LibCompress: Move finishing the current XZ stream into its own function 2023-04-08 15:18:59 -07:00
Tim Schumacher
68984abc43 LibCompress: Move finishing the current XZ block into its own function 2023-04-08 15:18:59 -07:00
Tim Schumacher
0e11e7012d LibCompress: Move loading XZ stream headers into its own function 2023-04-08 15:18:59 -07:00
Nico Weber
6d38824985 LibCompress: Tolerate more than 288 entries in CanonicalCode
Webp lossless can have up to 2328 symbols. This code assumed the deflate
max of 288, leading to crashes for webp lossless files using more than
288 symbols (such as Tests/LibGfx/test-inputs/simple-vp8l.webp).

Nothing writes webp files at this point, so the m_bit_codes and
m_bit_code_lengths arrays aren't ever used in practice with more than
288 entries.
2023-04-07 20:49:39 +02:00
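For context on 6d38824985, canonical codes can be assigned from a list of code lengths with the procedure given in RFC 1951, and nothing in that procedure depends on a fixed symbol count. The sketch below sizes its table from the input instead of assuming 288 entries. The function name is hypothetical, and this is not the `CanonicalCode` implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the canonical code for each symbol, given its code length in bits.
// A length of 0 means the symbol is unused; its code is left as 0.
std::vector<uint16_t> assign_canonical_codes(std::vector<uint8_t> const& lengths)
{
    constexpr unsigned max_bits = 15;

    // Step 1: count how many symbols use each code length.
    std::array<uint16_t, max_bits + 1> count_per_length {};
    for (uint8_t length : lengths) {
        assert(length <= max_bits);
        ++count_per_length[length];
    }
    count_per_length[0] = 0;

    // Step 2: compute the smallest code for each length.
    std::array<uint16_t, max_bits + 1> next_code {};
    uint16_t code = 0;
    for (unsigned bits = 1; bits <= max_bits; ++bits) {
        code = static_cast<uint16_t>((code + count_per_length[bits - 1]) << 1);
        next_code[bits] = code;
    }

    // Step 3: hand out consecutive codes in symbol order.
    // The table is sized by the input, so any symbol count works.
    std::vector<uint16_t> codes(lengths.size(), 0);
    for (size_t symbol = 0; symbol < lengths.size(); ++symbol) {
        if (lengths[symbol] != 0)
            codes[symbol] = next_code[lengths[symbol]]++;
    }
    return codes;
}
```

For the lengths (3, 3, 3, 3, 3, 2, 4, 4) used as the worked example in RFC 1951, this produces the codes 010, 011, 100, 101, 110, 00, 1110, 1111.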
Tim Schumacher
7000ccf89f LibCompress: Copy LZMA repetitions from the buffer in sequence
This improves the decompression time of `clang-15.0.7.src.tar.xz` from
5.2 seconds down to about 2.7 seconds.
2023-04-05 07:30:38 -04:00
Tim Schumacher
b88c58b94c AK+LibCompress: Break when seekback copying to a full CircularBuffer
Otherwise, we just end up infinitely looping while waiting for more
space in the destination.
2023-04-05 07:30:38 -04:00
Nico Weber
c84968dafd LibGfx: Add some support for decoding lossless webp files
Missing:
* Transform support (used by virtually all lossless webp files)
* Meta prefix / entropy image support

Working:
* Decoding of regular image streams
* Color cache

This happens to be enough to be able to decode
Tests/LibGfx/test-inputs/extended-lossless.webp

The canonical prefix code is very similar to deflate's, enough so that
this can use Compress::CanonicalCode (and take advantage of all the
recent performance improvements there).
2023-04-05 13:24:00 +02:00
Nico Weber
26230f2ffd LibCompress: Order branches in Deflate's decode_codes() numerically
deflate_special_code_length_copy has value 16, so it should be
before the two zero-filling branches for codes 17 and 18.

The initial if also refers to deflate_special_code_length_copy, so if
it's repeated right in the next else, one has to keep it on the mental
stack for a shorter time when reading this code.

No behavior change.
2023-04-04 19:16:06 +02:00
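The three special code-length symbols that 26230f2ffd reorders are defined in RFC 1951: 16 copies the previous length 3 to 6 times, 17 emits 3 to 10 zeros, and 18 emits 11 to 138 zeros. The sketch below expands already-decoded symbols with the branches in numeric order. The `CodeLengthSymbol` type and the function name are assumptions for this example; the real `decode_codes()` reads its input from a bit stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical pre-decoded input: a code-length symbol (0..18) plus the raw
// value of the extra bits that followed it (0 if the symbol has none).
struct CodeLengthSymbol {
    uint8_t symbol;
    uint8_t extra;
};

std::optional<std::vector<uint8_t>> expand_code_lengths(std::vector<CodeLengthSymbol> const& symbols)
{
    std::vector<uint8_t> lengths;
    for (auto const& entry : symbols) {
        if (entry.symbol < 16) {
            // Literal code length.
            lengths.push_back(entry.symbol);
        } else if (entry.symbol == 16) {
            // Copy the previous length 3..6 times (2 extra bits).
            assert(entry.extra <= 3);
            if (lengths.empty())
                return std::nullopt;
            uint8_t const previous = lengths.back();
            lengths.insert(lengths.end(), static_cast<size_t>(3 + entry.extra), previous);
        } else if (entry.symbol == 17) {
            // Repeat zero 3..10 times (3 extra bits).
            assert(entry.extra <= 7);
            lengths.insert(lengths.end(), static_cast<size_t>(3 + entry.extra), uint8_t { 0 });
        } else if (entry.symbol == 18) {
            // Repeat zero 11..138 times (7 extra bits).
            assert(entry.extra <= 127);
            lengths.insert(lengths.end(), static_cast<size_t>(11 + entry.extra), uint8_t { 0 });
        } else {
            return std::nullopt;
        }
    }
    return lengths;
}
```

Keeping the branches in the order 16, 17, 18 lets the code be read side by side with the specification's table.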
Nico Weber
72d6a30e08 LibCompress: Remove a few no-op continue statements in Deflate
Alternatively, we could remove the else after the continue, but
all branches here should be equally prominent, so this seems a bit
nicer.

No behavior change.
2023-04-04 19:16:06 +02:00
Timothy Flynn
eed956b473 AK: Increase LittleEndianOutputBitStream's buffer size and remove loops
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.

We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:

    13.62s to 10.9s on Serenity (cold)
    13.62s to 9.22s on Serenity (warm)
    2.93s to 2.32s on Linux

One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
2023-04-02 10:54:37 +02:00
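The buffering idea in eed956b473 can be sketched roughly as follows: bits are merged into a 64-bit buffer with one shift and one OR instead of a per-bit loop, and leftover bits must be flushed explicitly. `SketchBitWriter` is a hypothetical stand-in that writes to a `std::vector`. It drains whole bytes eagerly for simplicity, which may differ from how the AK class manages its buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for illustration; not the actual AK class.
class SketchBitWriter {
public:
    // Appends the low `count` bits of `value`, least significant bit first.
    void write_bits(uint32_t value, unsigned count)
    {
        assert(count <= 32);
        if (count == 0)
            return;

        uint64_t const mask = (uint64_t { 1 } << count) - 1;

        // One shift and one OR place all bits at once; no per-bit loop.
        m_buffer |= (static_cast<uint64_t>(value) & mask) << m_bit_count;
        m_bit_count += count;

        // Move completed bytes to the output. At most 7 bits stay buffered,
        // so 7 + 32 bits always fit in the 64-bit buffer.
        while (m_bit_count >= 8) {
            m_output.push_back(static_cast<uint8_t>(m_buffer & 0xFF));
            m_buffer >>= 8;
            m_bit_count -= 8;
        }
    }

    // Writes any leftover bits, zero-padded to a full byte.
    void flush()
    {
        if (m_bit_count == 0)
            return;
        m_output.push_back(static_cast<uint8_t>(m_buffer & 0xFF));
        m_buffer = 0;
        m_bit_count = 0;
    }

    std::vector<uint8_t> const& output() const { return m_output; }

private:
    uint64_t m_buffer { 0 };
    unsigned m_bit_count { 0 };
    std::vector<uint8_t> m_output;
};
```

A caller that writes, say, 9 bits and never calls `flush()` would lose the final bit, which is the caveat the commit message describes.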
Nico Weber
85d0637058 LibCompress: Make CanonicalCode::from_bytes() return ErrorOr<>
No intended behavior change.
2023-04-02 06:19:46 +02:00
Tim Schumacher
ad31265e60 LibCompress: Implement block size validation for XZ streams 2023-04-01 13:57:54 +02:00
Tim Schumacher
20f1a29202 LibCompress: Factor out the list of XZ check sizes 2023-04-01 13:57:54 +02:00
Nico Weber
bc70d7bb77 LibCompress: Reduce indentation in CompressedBlock::try_read_more()
...by removing `else` after `return`.

No behavior change.
2023-04-01 13:57:39 +02:00
Timothy Flynn
7ec91dfde7 LibCompress: Add a utility to GZIP compress an entire file
This is copy-pasted from the gzip utility, along with its existing TODO.
This is currently only needed by that utility, but this gives us API
symmetry with GzipDecompressor, and helps ensure we won't end up in a
situation where only one utility receives optimizations that should be
received by all interested parties.
2023-04-01 08:15:49 +02:00
Timothy Flynn
857f559a06 gunzip+LibCompress: Move utility to decompress files to GzipDecompressor
This is to allow re-using this method (and any optimization it receives)
by other utilities, like gzip.
2023-04-01 08:15:49 +02:00
Nico Weber
c3b8b3124c LibCompress: Remove two needless heap allocations 2023-03-31 08:44:30 -06:00
Timothy Flynn
8b56d82865 AK+LibCompress: Remove the Deflate back-reference intermediate buffer
Instead of reading bytes from the output stream into a buffer, just to
immediately write them back out, we can skip the middle-man and copy the
bytes directly into the output buffer.
2023-03-31 06:56:11 +02:00
Timothy Flynn
9f238793e0 gunzip+LibCompress: Increase buffer sizes used by Deflate and gunzip
Co-authored-by: Andreas Kling <kling@serenityos.org>
2023-03-31 06:56:11 +02:00
Tim Schumacher
fe761a4e9b LibCompress: Use LZMA context from preexisting dictionary 2023-03-30 14:39:31 +02:00
Tim Schumacher
c020ee8bfa LibCompress: Avoid overflowing the size of uncompressed LZMA2 chunks 2023-03-30 14:39:31 +02:00
Tim Schumacher
023c64011c LibCompress: Use the correct LZMA repetition offset in all cases 2023-03-30 14:39:31 +02:00
Tim Schumacher
9ccb0fc1d8 LibCompress: Only require new LZMA2 properties after dictionary reset 2023-03-30 14:39:31 +02:00
Tim Schumacher
d9627503a9 LibCompress: Reduce repeated code in the LZMA decompressor 2023-03-30 14:39:31 +02:00