Deflate and WebP can store at most 15 bits per symbol, meaning their
huffman trees can be at most 15 levels deep.
During construction, when we hit this level, we used to try again
with an ever lower frequency cap per symbol. This had the effect
of giving the symbols with the highest frequency lower frequencies
first, causing the most-frequent symbols to be merged. For example,
maybe the most-frequent symbol had 1 bit, and the 2nd-frequent
two bits (and everything else at least 3). With the cap, the two
most frequent symbols might both have 2 symbols, freeing up bits
for the lower levels of the tree.
This has the effect of making the most-frequent symbols longer at
first, which isn't great for file size.
Instead of using a frequency cap, ignore ever more of the low
bits of the frequency. This sacrifices resolution where it hurts
the lower levels of the tree first, and those are stored less
frequently.
For deflate, the 64 kiB block size means this doesn't have a big
effect, but for WebP it can have a big effect:
sunset-retro.png (876K): 2.02M -> 1.73M -- now (very slightly) smaller
than twice the input size! Maybe we'll be competitive one day.
(For wow.webp and 7z7c.webp, it has no effect, since we don't hit
the "tree too deep" case there, since those have relatively few
colors.)
No behavior change other than smaller file size. (No performance
cost either, and it's less code too.)
For our deflate, block size is limited to less than 64 kiB, so the sum
of all byte frequencies always fits in a u16 by construction.
But while I haven't hit this in practice, but it can conceivably happen
when writing WebP files, which currently use a single huffman tree
(per channel) for a while image -- which is often much larger than
64 kiB.
No dramatic behavior change in practice, just feels more correct.
To be used in WebPWriter.
JPEGWriter currently hardcodes huffman tables; maybe it can use this
to build data-dependent huffman tables in the future as well.
Pure code move (except for removing the `DeflateCompressor::` prefix
on the function's name, and putting the default argument for the 4th
argument in the function's definition), no behavior change.