For a 1024x1024 image, this saves about a quarter MiB of memory while
decoding (compared to the 4 MiB needed for the decompressed image data
itself). Not a huge win, but also very easy to do, so we might as well.
No behavior change, no measurable performance impact.
I could not find any evidence that this is actually faster than the non-SSE
version. In addition, for these relatively simple structures, the
compiler is often sufficiently smart to generate SSE code itself.
For a synthetic font benchmark I wrote, this results in a nice 11%
decrease in runtime.
If the flex container is being sized under a max-content main size
constraint, there is effectively infinite space available for flex
items. Thus, flex lines should be allowed to be infinitely long.
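Roughly, the idea in code (an illustrative sketch with stand-in types,
not the actual LibWeb API): the line-collection step just sees an
infinite available main size, so the "does this item still fit?" check
never wraps a line.

    #include <limits>
    #include <vector>

    struct FlexItem { float outer_main_size { 0 }; };

    // Hypothetical sketch: under a max-content constraint there is no
    // meaningful limit, so infinity as the available main size means
    // every item lands on the same (single) flex line.
    std::vector<std::vector<FlexItem>> collect_flex_lines(
        std::vector<FlexItem> const& items,
        bool is_max_content_constraint,
        float available_main_size)
    {
        if (is_max_content_constraint)
            available_main_size = std::numeric_limits<float>::infinity();

        std::vector<std::vector<FlexItem>> lines(1);
        float used = 0;
        for (auto const& item : items) {
            if (!lines.back().empty() && used + item.outer_main_size > available_main_size) {
                lines.emplace_back();
                used = 0;
            }
            lines.back().push_back(item);
            used += item.outer_main_size;
        }
        return lines;
    }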
This is a little awkward, because the spec doesn't give specifics
about how to resolve flexible lengths during intrinsic sizing.
I've marked the spec deviations with big "AD-HOC" comments.
Instead of just measuring the layout viewport, we now measure overflow
in every box that is a scroll container.
This has the side effect of no longer creating paintables for layout
boxes that didn't participate in layout. (For example, empty/anonymous
boxes that were ignored by flex itemization.)
Such boxes are now marked as "(not painted)" in the layout tree dumps,
as they have no paintable to dump geometry from.
For some reason, we were decoding the source color twice for every pixel
in the inner-most loop of `blit_filtered`. This makes sure we only
decode the source color once, and rearranges the code to improve
readability.
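The shape of the change, as a rough sketch (plain stand-in types here;
the real code is `Gfx::Painter`'s `blit_filtered`):

    #include <cstdint>

    struct Color { uint8_t red, green, blue, alpha; };

    // Illustrative sketch: decode the source pixel once per iteration
    // and reuse the result, instead of decoding it separately for the
    // alpha check and again for the blend.
    void blit_row(uint32_t const* src, uint32_t* dst, int width)
    {
        for (int x = 0; x < width; ++x) {
            Color const color {
                uint8_t(src[x] >> 16), uint8_t(src[x] >> 8),
                uint8_t(src[x]), uint8_t(src[x] >> 24)
            };
            if (color.alpha == 0)
                continue; // Fully transparent; nothing to blit.
            dst[x] = src[x]; // (Actual filtering/blending elided for brevity.)
        }
    }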
For my synthetic font rendering benchmark, this improves glyph rendering
performance by ~9%.
Now that we allocate an oversized backing store during resizing of the
viewport, we need to constrain the source rect used when drawing a
scaled version of the content in the special mode used by Presenter.
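Conceptually (an illustrative sketch, not the actual Gfx API), the
source rect gets intersected with the content rect before scaling:

    #include <algorithm>

    struct IntRect { int x, y, width, height; };

    // Sketch: clip the source rect against the content rect, since the
    // backing store may now be larger than the visible content.
    IntRect constrain_source_rect(IntRect source, IntRect content)
    {
        int const x = std::max(source.x, content.x);
        int const y = std::max(source.y, content.y);
        int const right = std::min(source.x + source.width, content.x + content.width);
        int const bottom = std::min(source.y + source.height, content.y + content.height);
        return { x, y, std::max(0, right - x), std::max(0, bottom - y) };
    }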
Regressed with 85c542ab00.
This ensures that the RAM does not fill up with already processed
coredumps when many tests crash (as is the case on AArch64). We only
do this in self-test mode so as to avoid racing CrashDaemon.
This is not a beautiful program, but it does allow you to regenerate
the baseline expectation for a given layout or text test with a single
command. :^)
We already clamp these values to zero, so it's actually pretty harmless
when this happens. If someone wants to investigate these issues deeper
and see if they can be fixed earlier in the layout pipeline, they can
enable the spam locally.
for_each_line_segment_on_elliptical_arc() flips the start/end points
for negative theta deltas. When doing this we have to make sure the
line segments emitted swap the start/end points back, so that the
(correct) winding order can be calculated from them.
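Roughly, with illustrative types:

    #include <functional>

    struct FloatPoint { float x { 0 }, y { 0 }; };
    struct FloatLine { FloatPoint from, to; };

    // Sketch: when theta_delta was negative, start/end were swapped to
    // walk the arc forwards, so each emitted segment is swapped back to
    // preserve the path's original winding order.
    void emit_segment(FloatPoint from, FloatPoint to, bool walked_in_reverse,
        std::function<void(FloatLine)> const& callback)
    {
        if (walked_in_reverse)
            callback({ to, from });
        else
            callback({ from, to });
    }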
This makes nonzero fills not totally broken for a lot of SVGs.
This is an implementation of the scanline edge-flag algorithm for
antialiased path filling described here:
https://mlab.taik.fi/~kkallio/antialiasing/EdgeFlagAA.pdf
The initial implementation does not try to implement every possible
optimization in favour of keeping things simple. However, it does
support:
- Both evenodd and nonzero fill rules
- Applying paint styles/gradients
- A range of samples per pixel (8, 16, 32)
- Very nice antialiasing :^)
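For a rough idea of the core technique (heavily simplified sketch:
even-odd only, one sample per pixel instead of the 8/16/32 sample
grids, no paint styles): each edge toggles a flag at the pixel it
crosses on every scanline, and a XOR-accumulating sweep along the
scanline turns those flags into inside/outside runs.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <utility>
    #include <vector>

    struct Point { float x { 0 }, y { 0 }; };

    // Simplified scanline edge-flag fill: toggle a flag at each pixel an
    // edge crosses (sampling at scanline centers), then XOR-accumulate
    // along each scanline; odd parity means "inside" (even-odd rule).
    std::vector<uint8_t> fill_even_odd(std::vector<Point> const& polygon, int width, int height)
    {
        std::vector<uint8_t> flags(size_t(width) * height, 0);
        for (size_t i = 0; i < polygon.size(); ++i) {
            Point a = polygon[i];
            Point b = polygon[(i + 1) % polygon.size()];
            if (a.y == b.y)
                continue; // Horizontal edges never cross a scanline center.
            if (a.y > b.y)
                std::swap(a, b);
            for (int y = std::max(0, int(std::ceil(a.y - 0.5f))); y < height && y + 0.5f < b.y; ++y) {
                float const t = (y + 0.5f - a.y) / (b.y - a.y);
                int const x = std::clamp(int(a.x + t * (b.x - a.x)), 0, width - 1);
                flags[size_t(y) * width + x] ^= 1; // Toggle the edge flag.
            }
        }
        for (int y = 0; y < height; ++y) {
            uint8_t inside = 0;
            for (int x = 0; x < width; ++x) {
                inside ^= flags[size_t(y) * width + x];
                flags[size_t(y) * width + x] = inside ? 255 : 0;
            }
        }
        return flags;
    }

The real rasterizer layers the subpixel sample grid (for the
antialiased coverage) and a winding counter (for nonzero fills) on
top of this idea.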
This replaces the previous path filling code, which only really applied
antialiasing in the x-axis.
There are some very nice improvements around the web with this change,
especially for small icons. Strokes are still a bit wonky, as they don't
yet use this rasterizer, but I think it should be possible to convert
them to do so.
The glyph bitmap is a grayscale image that is multiplied with the
requested color passed to `Gfx::Painter::draw_glyph()` to produce the
final bitmap that gets blitted.
Using `Gfx::Color::multiply()` is unnecessary however: by simply taking
the destination color and copying over the glyph bitmap's alpha value,
we can prevent four multiplications and divisions per pixel.
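As a sketch (plain types here, not the actual Gfx::Color API):

    #include <cstdint>

    struct Color { uint8_t red, green, blue, alpha; };

    // Before (conceptually): Color::multiply(), i.e. four per-channel
    // multiplications and divisions per pixel.
    // After: keep the requested color's RGB and take only the alpha
    // from the grayscale glyph bitmap pixel.
    Color tint_glyph_pixel(Color requested, uint8_t glyph_alpha)
    {
        return { requested.red, requested.green, requested.blue, glyph_alpha };
    }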
In an artificial benchmark I wrote, this improved glyph rendering
performance by ~9%.
This is safe because:
* prediction only computes averages, or explicitly clamps for
TM_PRED / B_TM_PRED. Since the inputs are in [0, 255], so are the
outputs.
* Addition of IDCT and prediction buffer is immediately clamped back
to [0, 255]
No behavior change, and matches what both libwebp and the reference
implementation in rfc6386 do.
https://datatracker.ietf.org/doc/html/rfc6386#section-14.5 says:
"""
The summing procedure is fairly straightforward, having only a couple
of details. The prediction and residue buffers are both arrays of
16-bit signed integers. Each individual (Y, U, and V pixel) result
is calculated first as a 32-bit sum of the prediction and residue,
and is then saturated to 8-bit unsigned range (using, say, the
clamp255 function defined above) before being stored as an 8-bit
unsigned pixel value.
"""
It's IMHO not 100% clear if the clamping is supposed to happen
immediately (so that it affects prediction inputs for the next
macroblock) or later.
But vp8_dixie_idct_add() on page 173 in
https://datatracker.ietf.org/doc/html/rfc6386#section-20.8 does:
recon[0] = CLAMP_255(predict[0] + ((a1 + d1 + 4) >> 3));
So it does look like it should happen immediately.
(I'm a bit confused why the spec then says "The prediction and residue
buffers are both arrays of 16-bit signed integers", since the
prediction buffer can just be a u8 buffer now, without changing
behavior.)
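A sketch of reconstruction with the immediate clamp (clamp255 as in
the RFC; the surrounding names are illustrative):

    #include <cstdint>

    static uint8_t clamp255(int value)
    {
        return uint8_t(value < 0 ? 0 : (value > 255 ? 255 : value));
    }

    // Since each reconstructed pixel is saturated to [0, 255] right
    // away, the prediction buffer feeding the next macroblock only
    // ever holds values a u8 can represent.
    void idct_add_row(uint8_t* recon, uint8_t const* predict, int16_t const* residue, int count)
    {
        for (int i = 0; i < count; ++i)
            recon[i] = clamp255(int(predict[i]) + int(residue[i]));
    }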
This is enforced by the hardware and an exception is generated when the
stack pointer is not properly aligned. This brings us closer to booting
the aarch64 Kernel on baremetal.
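For reference, AArch64 requires the stack pointer to be 16-byte
aligned whenever it is used as a base address; a minimal sketch of
rounding an address down to that boundary:

    #include <cstdint>

    // AArch64 stack-alignment checking faults SP-based accesses unless
    // SP is 16-byte aligned; round a prospective stack top down to it.
    constexpr uint64_t align_stack_pointer(uint64_t sp)
    {
        return sp & ~uint64_t(15);
    }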
With this, the WAV loader is a completely modern LibAudio loader:
- Own type header for RIFF data structures
- Custom stream read functions for the types
- Final removal of legacy I/O error checking
- Clearer error messages
- Clean handling of header chunks
The latter will allow proper handling of other chunks (before "data") in
the future, such as metadata :^)
The spec has that clamp at the end of
https://datatracker.ietf.org/doc/html/rfc6386#section-12.2:
The exact algorithm is as follows:
[...]
b[r][c] = clamp255(L[r]+ A[c] - P);
For the test images I'm looking at, the clamp doesn't seem to make a
dramatic difference here, but omitting it in `B_TM_PRED` did, so add
it. (Also, the spec demands it.)
Before this commit, notifications would appear in the top left of the
screen when created, then move to the top right once hovered by the
mouse. This happened because the first notification would use its own
default-constructed position of 0,0 as a point of reference.
Each secondary partition has an independent BooleanDecoder.
Their bitstreams interleave per macroblock row, that is the first
macroblock row is read from the first decoder, the second from the
second, ..., until it wraps around again.
All partitions share a single prediction state though: The second
macroblock row (which reads coefficients off the second decoder) is
predicted using the result of decoding the first macroblock row (which
reads coefficients off the first decoder).
So if I understand things right, in theory the coefficient reading could
be parallelized, but prediction can't be. (IDCT can also be
parallelized, but that's true with just a single partition too.)
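Conceptually (illustrative names, not the actual decoder structure),
selecting the decoder for a row is a simple round-robin:

    #include <cstddef>
    #include <vector>

    struct BooleanDecoder { /* per-partition arithmetic decoder state */ };

    // Macroblock rows round-robin over the coefficient partitions:
    // row 0 reads from decoder 0, row 1 from decoder 1, ..., wrapping
    // around. Prediction state stays shared, so rows decode in order.
    BooleanDecoder& decoder_for_macroblock_row(std::vector<BooleanDecoder>& decoders, size_t mb_row)
    {
        return decoders[mb_row % decoders.size()];
    }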
I created the test image by running
examples/cwebp -low_memory -partitions 3 -o foo.webp \
~/src/serenity/Tests/LibGfx/test-inputs/4.webp
using a cwebp hacked up as described in #19149. Since creating
multi-partition lossy webps requires hacking up `cwebp`, they're likely
very rare in practice. (But maybe other programs using the libwebp API
create them.)
Fixes #19149.
With this, webp lossy support is complete (*) :^)
And with that, webp support is complete: Lossless, lossy, lossy with
alpha, animated lossless, animated lossy, animated lossy with alpha all
work.
(*: Loop filtering isn't implemented yet, which has a minor visual
effect on the output. But it's only visible when carefully comparing
a webp decoded without loop filtering to the same decoded with it.
But it's technically a part of the spec that's still missing.
The upsampling of UV in the YUV->RGB code is also low-quality. This
produces somewhat visible banding in practice in some images (e.g.
in the fire breather's face in 5.webp), so we should probably improve
that at some point. Our JPG decoder has the same issue.)
Currently, we only look at the relative path `./{helper}/{helper}`,
which fails if the working directory is not the same as the directory
where the ladybird binary lives.
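A sketch of a sturdier lookup (standard C++; names illustrative),
resolving the helper relative to the ladybird binary instead of the
working directory:

    #include <filesystem>
    #include <string>

    // Resolve "{helper}/{helper}" next to the running binary rather
    // than relative to the current working directory.
    // 'executable_path' would come from e.g. /proc/self/exe on Linux.
    std::filesystem::path helper_path(std::filesystem::path const& executable_path, std::string const& helper)
    {
        return executable_path.parent_path() / helper / helper;
    }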
Previously, we would unwrap the duration value even when it contained
an error, causing an assertion failure.
Later, we should change it to return the last known sample timestamp
in the media data, allowing for example live-streamed video to have
a semi-useful duration. In general, though, this is not used by
players in the wild, so we can leave it for now.
These are only used during layout, and always within formatting context
code, so we might as well put them in FormattingContext and avoid having
to pass the LayoutState around all the time.
At one point in the past, we had some functions that were called across
different formatting context types, which necessitated making them
static and taking the LayoutState as a parameter.
In all cases, those functions were used to do incorrect hacks, all of
which we've replaced with more correct solutions. :^)