The class was an inner class of `BrotliDecompressionStream`, let's move
it outside the `Stream` object in order to ease the access to user only
interested in this part.
These routines:
- read_prefix_code
- read_simple_prefix_code
- read_complex_prefix_code
were methods of `BrotliDecompressionStream` taking a `CanonicalCode` as
an out parameter. This patch puts them in `CanonicalCode` as static
methods.
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
The "operation modes" of this function have very different focuses, and
trying to combine both in a way where we share the most amount of code
probably results in the worst performance.
Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
This should keep the `read_some` function a bit flatter and shorter, and
make it easier to match the match type decoding process with the
specification.
Webp lossless can have up to 2328 symbols. This code assumed the deflate
max of 288, leading to crashes for webp lossless files using more than
288 symbols (such as Tests/LibGfx/test-inputs/simple-vp8l.webp).
Nothing writes webp files at this point, so the m_bit_codes and
m_bit_code_lengths arrays aren't ever used in practice with more than
288 entries.
Missing:
* Transform support (used by virtually all lossless webp files)
* Meta prefix / entropy image support
Working:
* Decoding of regular image streams
* Color cache
This happens to be enough to be able to decode
Tests/LibGfx/test-inputs/extended-lossless.webp
The canonical prefix code is very similar to deflate's, enough so that
this can use Compress::CanonicalCode (and take advantage of all the
recent performance improvements there).
deflate_special_code_length_copy has value 16, so it should be
before the two zero-filling branches for codes 17 and 18.
Also, the initial if also refers to deflate_special_code_length_copy
as well, so if it's repeated right in the next else, one has to keep
it on the mental stack for shorter when reading this code.
No behavior change.
Alternatively, we could remove the else after the continue, but
all branches here should be equally prominent, so this seems a bit
nicer.
No behavior change.
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.
We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.
This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:
13.62s to 10.9s on Serenity (cold)
13.62s to 9.22s on Serenity (warm)
2.93s to 2.32s on Linux
One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
This is copy-pasted from the gzip utility, along with its existing TODO.
This is currently only needed by that utility, but this gives us API
symmetry with GzipDecompressor, and helps ensure we won't end up in a
situation where only one utility receives optimizations that should be
received by all interested parties.
Instead of reading bytes from the output stream into a buffer, just to
immediately write them back out, we can skip the middle-man and copy the
bytes directly into the output buffer.