The T.800 spec says there should only be one 'colr' box, but the
extended jpx file format spec in T.801 annex M allows having multiple.
Method 2 is a basic ICC profile, while method 3 (jpx-only) allows full
ICC profiles. Support that.
For the test, I opened buggie.png in Photoshop, converted it to
grayscale, and saved it as a JPEG2000, with "JP2 Compatible" checked
and "Include Transparency" unchecked. I also unchecked "Include
Metadata", and "Lossless". I left "Fast Mode" checked and the quality
at the default 50.
I found this interesting, and it also explains e.g. why some
of the step numbers in 6.4 Text Region Decoding Procedure are off --
they added step 3) for COLEXTFLAG and forgot to update step references
to later steps.
.jpf (JPEG2000) files written by Photoshop contain a whole bunch of
these boxes.
fileformats.archiveteam.org/wiki/Boxes/atoms_format lists a few
UUID types. Of those 3, these are in Photoshop-written .jpf files:
* 0537cdab-9d0c-4431-a72a-fa561f2a113e Exif
* 2c4c0100-8504-40b9-a03e-562148d6dfeb Photoshop Image Resource
* be7acfcb-97a9-42e8-9c71-999491e3afac XMP
JPEG2000 is the last image format used in PDF filters that we
don't have a loader for. Let's change that.
This adds all the scaffolding, but no actual implementation yet.
This doesn't have to be a virtual method: it's called from
various create_from_stream() methods that have a static type
that's created. There's no point in the virtual call here,
and it makes it harder to add additional parameters to
read_from_stream() in some subclasses.
...but only as long as REFAGGNINST == 1. That's enough for 0000337.pdf.
Except that it also needs GRTEMPLATE=1 support in the generic
refinement region decoding procedure, so no behaivor change yet.
...instead of a lambda that checks the template on every call.
Doesn't make a performance difference locally, but seems maybe nicer?
No behavior change.
Template 2 is needed by some symbols in 0000372.pdf page 11 and
0000857.pdf pages 1-4. Implement the others too while here. (The
mentioned pages in those two PDFs also use the "end of stripe" segment,
so they still don't render yet.
We still don't support EXTTEMPLATE.
...and make text_region_decoding_procedure() call it.
generic_refinement_region_decoding_procedure() still just returns
"unimplemented", so no behavior change yet.
Text segments using refinement are still rejected later, by
text_region_decoding_procedure(). But we deserialize the input data now,
and the error when this feature is used is now slightly different.
It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.
With this, we can finally decode the test input from #23659.
See f391c7822d for a similar change for generic regions and
lossless generic regions.
Text segments conceptually store (x,y,id) triples. (x,y) are a
coordinate, and id refers to an id from a symbol segment.
A text segment has the effect of drawing some of the bitmaps stored
in a symbol segment to the output bitmap.
For example, the symbol segment might contain a small bitmap that
happens to look like the letter 'A', and the text segment might
draw that everywhere a scanned page has an 'A'. (The JBIG2 format
only treats it as an abstract bitmap. It doesn't know that this
small bitmap is an 'A'.)
This is missing support for many things:
* Huffman-coded input (not used in practice)
* Symbol refinement
* Transposed symbols
* Colors (not used in practice)
Still, we now have basic symbol/text segment support. This is enough
to decode the downloadable PDF here:
https://www.google.com/books/edition/Paradise_Lost/6qdbAAAAQAAJ
It doesn't lead to any progression on my 1000 file test PDF set.
The 7 files in there that use JBIG2 with symbol and text segments
now fail to load for other reasons (4 need symbol refinement for
text segments, one needs end-of-stripe segment support, one needs
support for symbol segments referring to other segments).
(And possibly, many other PDFs from Google Books, but that's the
only one I've tried so far.)
This extracts the bitbuffer combining code we had into a new function
composite_bitbuffer() and adds the following features:
* Real support for combination operators (which also lets us allow black
as background color again, even if that's never used in practice)
* Clipping support (not used here yet, but will be needed elsewhere
soon)
We're going to need this for text segment handling.
No behavior change.