Previously we had a race condition in the page fault handling: We were
relying on the affected Region staying alive while handling the page
fault, but this was not actually guaranteed, as an munmap from another
thread could result in the region being removed concurrently.
This commit closes that hole by extending the lifetime of the region
affected by the page fault until the handling of the page fault is
complete. This is achieved by maintaining a pseudo-reference count on
the region which counts the number of in-progress page faults being
handled on it, and extending the lifetime of the region while this
counter is non-zero.
Since both the increment of the counter by the page fault handler and
the spin loop waiting for it to reach 0 during Region destruction are
serialized using the appropriate AddressSpace spinlock, eventual
progress is guaranteed: As soon as the region is removed from the tree
no more page faults on the region can start.
Similarly, correctness is ensured: The counter is incremented under
the same lock, so any page faults that are being handled will have
already incremented the counter before the region is deallocated.
This replaces the previous owning address space pointer. This commit
should not change any of the existing functionality, but it lays down
the groundwork needed to let us properly access the region table under
the address space spinlock during page fault handling.
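For illustration, here is a minimal sketch of the counting scheme
described above, using std::mutex and std::atomic as stand-ins for the
kernel's Spinlock primitives (all names are hypothetical):

```cpp
#include <atomic>
#include <mutex>

struct Region {
    std::atomic<unsigned> in_progress_page_faults { 0 };
};

std::mutex address_space_lock; // stands in for the AddressSpace spinlock

// Page fault path: while the lock is held, the region is still in the
// tree and cannot have been deallocated, so it is safe to pin it by
// bumping the counter before releasing the lock.
void begin_page_fault(Region& region)
{
    std::lock_guard guard(address_space_lock);
    region.in_progress_page_faults.fetch_add(1);
}

void end_page_fault(Region& region)
{
    region.in_progress_page_faults.fetch_sub(1);
}

// Region teardown: remove the region from the tree under the same lock,
// then spin until all in-progress page faults have drained.
void destroy_region(Region& region)
{
    {
        std::lock_guard guard(address_space_lock);
        // remove_from_region_tree(region); -- after this, no new page
        // fault can find the region, so the counter can only decrease.
    }
    while (region.in_progress_page_faults.load() != 0) {
        // Busy-wait; the real kernel would relax the CPU here.
    }
    // Safe to deallocate the region now.
}
```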
Instead of setting up the new address space on its own and only
swapping to it at the end, we now immediately swap to the new
address space (while still keeping the old one alive) and only revert
back to the old one if we fail at any point.
This is done to ensure that the process' active address space (aka the
contents of m_space) always matches the address space actually in use
by it.
That should allow us to eventually make the page fault handler process-
aware, which will let us properly lock the process address space lock.
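A rough sketch of the new ordering, with hypothetical names standing
in for the actual Process/AddressSpace API:

```cpp
#include <memory>

struct AddressSpace { /* page tables, region tree, ... */ };

struct Process {
    std::shared_ptr<AddressSpace> m_space;

    // Placeholder for the setup steps that can fail along the way.
    bool finish_setup(AddressSpace&) { return true; }

    bool replace_address_space(std::shared_ptr<AddressSpace> new_space)
    {
        // Keep the old space alive so we can roll back on failure.
        auto old_space = m_space;

        // Swap immediately, so m_space always matches the address
        // space the process is actually using.
        m_space = new_space;

        if (!finish_setup(*new_space)) {
            // Failure at any point: revert to the old address space.
            m_space = old_space;
            return false;
        }
        return true;
    }
};
```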
When calculating the intrinsic width of a block-level box, we now ignore
the preferred width entirely, and not just when the preferred width
should be treated as auto.
The condition for this was both confused and wrong, as it looked at the
available width around the box, but didn't check for a width constraint
on the box itself.
Just because the available width has an intrinsic sizing constraint
doesn't mean that the box is undergoing intrinsic sizing. It could also
be the box's containing block!
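To make the confusion concrete, here is a hedged sketch (all names
hypothetical) contrasting the old condition with the new behavior:

```cpp
enum class SizeConstraint { None, MinContent, MaxContent };

struct AvailableSize {
    SizeConstraint constraint { SizeConstraint::None };
    bool is_intrinsic_sizing_constraint() const
    {
        return constraint != SizeConstraint::None;
    }
};

// Old, wrong condition: keyed off the box's surroundings. The available
// width being an intrinsic sizing constraint may only mean that the
// containing block is undergoing intrinsic sizing.
bool old_treat_preferred_width_as_auto(AvailableSize const& available_width)
{
    return available_width.is_intrinsic_sizing_constraint();
}

// New behavior: while calculating the intrinsic width of the box
// itself, the preferred width is skipped unconditionally, so no check
// against the available width is needed at all.
```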
No need to store them in `i32`: the JPEG specification states that
they are no wider than 16 bits for extended JPEGs. So, in this patch,
we replace `i32` with `i16`. It almost halves memory usage :^)
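A tiny sketch of the change, with a hypothetical coefficient buffer
for one 8x8 block:

```cpp
#include <array>
#include <cstdint>

// Before: std::array<int32_t, 64> -- 256 bytes per block.
// After: coefficients fit in 16 bits, so 128 bytes per block suffice.
using BlockCoefficients = std::array<int16_t, 64>;
```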
When creating a copy of the font containing only the glyphs that are in
use, we previously looped over all possible code points, instead of the
range of code points that are actually in use (and allocated) in the
font. This is a problem, since we index into the array of widths to find
out if a given glyph is used. This array is only as long as the number
of glyphs the font was created with, causing an out-of-bounds read
when that number is less than our maximum.
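A simplified sketch of the fix, with a hypothetical widths array:

```cpp
#include <cstddef>
#include <vector>

// glyph_widths only has as many entries as the font has glyphs, so the
// loop bound must come from the font itself, not from the global
// maximum code point (0x10FFFF).
void copy_used_glyphs(std::vector<unsigned char> const& glyph_widths)
{
    // Before: for (size_t cp = 0; cp <= 0x10FFFF; ++cp) -- reads past
    // the end of glyph_widths whenever the font has fewer glyphs.
    for (size_t code_point = 0; code_point < glyph_widths.size(); ++code_point) {
        if (glyph_widths[code_point] > 0) {
            // A non-zero width means the glyph is in use; copy it
            // into the new font here.
        }
    }
}
```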
Loads of changes that are tightly connected... :/
* Change lambdas to static functions
* Add spec docs to those functions
* Keep the current scope around as a parameter
* Add wrapping classes for some Certificate members
* Parse ec and ecdsa data from certificates
The spec is at best misleading here, suggesting that max_symbol should
be set to "num_code_lengths" if it's not explicitly stored.
But that num_code_lengths isn't the num_code_lengths mentioned a few
lines further up in the spec; it actually means alphabet_size!
(I had to cheat and look at libwebp instead of the spec for this: See
vp8l_dec.c, ReadHuffmanCode() which passes alphabet_size to
ReadHuffmanCodeLengths() as num_symbols, and ReadHuffmanCodeLengths()
then sets max_symbol to that.)
I haven't yet found a file that uses max_symbol, so this isn't actually
tested. But it's close to what's in libwebp, so maybe it works!
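For clarity, a sketch of the corrected reading; BitReader is a
stand-in type, and the explicit-storage branch mirrors (from memory)
the bit layout in libwebp's ReadHuffmanCodeLengths():

```cpp
struct BitReader {
    unsigned read_bits(int) { return 0; } // placeholder
};

int read_max_symbol(BitReader& br, int alphabet_size)
{
    if (br.read_bits(1) == 0) {
        // max_symbol is not explicitly stored. Despite the spec's
        // wording ("num_code_lengths"), it must default to the
        // alphabet size, matching libwebp.
        return alphabet_size;
    }
    // Explicitly stored: decode it from the bitstream.
    int length_nbits = 2 + 2 * static_cast<int>(br.read_bits(3));
    return 2 + static_cast<int>(br.read_bits(length_nbits));
}
```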
This improves the decompression time of `clang-15.0.7.src.tar.xz` from
41 seconds down to about 5 seconds.
The reason for this very significant improvement is that LZMA, the
compression method underlying XZ, fills its range decoder one byte at
a time, causing a lot of overhead at the syscall barrier.
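A plain-C++ sketch of the idea; the real fix would use the library's
buffered stream wrapper, and the 64 KiB chunk size here is an
arbitrary choice:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Instead of issuing one read syscall per byte consumed by the range
// decoder, pull a whole chunk into a user-space buffer and hand out
// bytes from there.
class BufferedByteSource {
public:
    explicit BufferedByteSource(FILE* file)
        : m_file(file)
    {
    }

    // One syscall per ~64 KiB instead of one per byte.
    int read_byte()
    {
        if (m_position == m_filled) {
            m_filled = fread(m_buffer.data(), 1, m_buffer.size(), m_file);
            m_position = 0;
            if (m_filled == 0)
                return -1; // EOF or error
        }
        return m_buffer[m_position++];
    }

private:
    FILE* m_file { nullptr };
    std::vector<unsigned char> m_buffer = std::vector<unsigned char>(64 * 1024);
    size_t m_position { 0 };
    size_t m_filled { 0 };
};
```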
I was originally thinking in the wrong direction when adding this
limit: we can read from the buffer at most until we reach the current
write head. Since that write head is the reference point for the
distance, we need to limit ourselves to that instead of the seekback
limit (which is the maximum of how far back the distance may reach).
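In sketch form (hypothetical names), the corrected limit computation:

```cpp
#include <algorithm>
#include <cstddef>

// When copying a back-reference out of the history buffer, the
// readable span ends at the current write head, not at the full
// seekback capacity: reading starts `distance` bytes behind the write
// head, so at most `distance` bytes exist before we would run into
// data that has not been written yet. (Overlapping copies then re-read
// freshly written bytes in a later pass.)
size_t readable_span_for_copy(size_t distance, size_t requested_length)
{
    return std::min(requested_length, distance);
}
```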