Commit graph

644 commits

Author SHA1 Message Date
Nico Weber
ab7da32d25 LibGfx/JPEG2000: Support jpx extended 'colr' boxes
The T.800 spec says there should only be one 'colr' box, but the
extended jpx file format spec in T.801 annex M allows having multiple.

Method 2 is a basic ICC profile, while method 3 (jpx-only) allows full
ICC profiles. Support that.

For the test, I opened buggie.png in Photoshop, converted it to
grayscale, and saved it as a JPEG2000, with "JP2 Compatible" checked
and "Include Transparency" unchecked. I also unchecked "Include
Metadata" and "Lossless". I left "Fast Mode" checked and the quality
at the default 50.
2024-03-30 10:01:07 +01:00
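Illustration (not part of the commit): a minimal sketch of how a 'colr' box payload could be split by method. It assumes the payload starts with three one-byte fields (method, precedence, approximation), followed by a 32-bit big-endian enumerated color space for method 1, or by ICC profile bytes for methods 2 and 3. The names `ColorSpecification` and `parse_colr_payload()` are hypothetical and do not come from LibGfx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch.
struct ColorSpecification {
    uint8_t method;
    int8_t precedence;
    uint8_t approximation;
    uint32_t enumerated_color_space; // only meaningful for method 1
    std::vector<uint8_t> icc_data;   // only filled for methods 2 and 3
};

// Returns false if the payload is too short or uses a method this sketch does not handle.
static bool parse_colr_payload(std::vector<uint8_t> const& payload, ColorSpecification& out)
{
    if (payload.size() < 3)
        return false;

    out = {};
    out.method = payload[0];
    out.precedence = int8_t(payload[1]);
    out.approximation = payload[2];

    if (out.method == 1) {
        // Assumed layout: a 32-bit big-endian enumerated color space value.
        if (payload.size() < 7)
            return false;
        out.enumerated_color_space = (uint32_t(payload[3]) << 24) | (uint32_t(payload[4]) << 16)
            | (uint32_t(payload[5]) << 8) | uint32_t(payload[6]);
        return true;
    }

    if (out.method == 2 || out.method == 3) {
        // Method 2 (basic ICC) and method 3 (full ICC, jpx-only) both carry profile bytes.
        out.icc_data.assign(payload.begin() + 3, payload.end());
        return true;
    }

    return false;
}
```

A file with several 'colr' boxes would call this once per box; which box to prefer afterwards is left out of the sketch.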
Nico Weber
578f301017 LibGfx/ISOBMFF: Print box type if a box fails to consume all its data 2024-03-30 10:01:07 +01:00
Nico Weber
5a43fc83bd LibGfx/JBIG2Loader: Add a short comment with spec history
I found this interesting, and it also explains e.g. why some
of the step numbers in 6.4 Text Region Decoding Procedure are off --
they added step 3) for COLEXTFLAG and forgot to update step references
to later steps.
2024-03-27 11:55:09 -04:00
Nico Weber
2e6626ae3b LibGfx/JBIG2: Tweak spec comment quote characters 2024-03-26 17:21:47 -04:00
Nico Weber
6842299959 LibGfx/JBIG2: Fix a comment typo 2024-03-26 17:21:47 -04:00
Nico Weber
6ff446fd30 LibGfx/JBIG2: Fix a comment typo 2024-03-26 17:21:47 -04:00
Nico Weber
a2a5fc76aa LibGfx/JBIG2: Don't assert on unexpected OOB values in the bitstream
This should only happen on either invalid inputs or if our code has
a bug (gasp!). Printing an error instead of asserting seems nicer.
2024-03-26 17:21:47 -04:00
Nico Weber
b3c423e4ca LibGfx/ISOBMFF: Implement JPEG2000DefaultDisplayResolutionBox
Found e.g. in http://opf-labs.org/format-corpus/jp2k-formats/balloon.jpf
2024-03-26 17:19:03 -04:00
Nico Weber
a9ef2fac01 LibGfx/ISOBMFF: Introduce JPEG2000ResolutionSubboxBase
No behavior change.
2024-03-26 17:19:03 -04:00
Nico Weber
a971625c49 LibGfx/ISOBMFF: Implement UserExtensionBox
.jpf (JPEG2000) files written by Photoshop contain a whole bunch of
these boxes.

fileformats.archiveteam.org/wiki/Boxes/atoms_format lists a few
UUID types. Of those, these 3 are in Photoshop-written .jpf files:

* 0537cdab-9d0c-4431-a72a-fa561f2a113e Exif
* 2c4c0100-8504-40b9-a03e-562148d6dfeb Photoshop Image Resource
* be7acfcb-97a9-42e8-9c71-999491e3afac XMP
2024-03-26 17:19:03 -04:00
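Illustration (not part of the commit): a small sketch of mapping the three UUIDs listed above to readable names, for example when dumping boxes. It assumes the 16 UUID bytes are compared in the order they are written in the textual form. `KnownUUID` and `name_for_uuid()` are hypothetical names, not LibGfx API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

using UUID = std::array<uint8_t, 16>;

// Hypothetical lookup table holding the three UUIDs from the commit message.
struct KnownUUID {
    UUID uuid;
    char const* name;
};

static KnownUUID const known_uuids[] = {
    { { 0x05, 0x37, 0xcd, 0xab, 0x9d, 0x0c, 0x44, 0x31, 0xa7, 0x2a, 0xfa, 0x56, 0x1f, 0x2a, 0x11, 0x3e }, "Exif" },
    { { 0x2c, 0x4c, 0x01, 0x00, 0x85, 0x04, 0x40, 0xb9, 0xa0, 0x3e, 0x56, 0x21, 0x48, 0xd6, 0xdf, 0xeb }, "Photoshop Image Resource" },
    { { 0xbe, 0x7a, 0xcf, 0xcb, 0x97, 0xa9, 0x42, 0xe8, 0x9c, 0x71, 0x99, 0x94, 0x91, 0xe3, 0xaf, 0xac }, "XMP" },
};

// Returns a readable name for a UUID, or "unknown" if it is not in the table.
static std::string name_for_uuid(UUID const& uuid)
{
    for (auto const& known : known_uuids) {
        if (known.uuid == uuid)
            return known.name;
    }
    return "unknown";
}
```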
Nico Weber
a92d887ee3 LibGfx/JPEG2000: Read file structure
This is enough for `file` to print the dimensions of .jp2 / .jpx files,
and for `icc` to print color profile information embedded in the
'colr' box.
2024-03-25 20:35:00 +01:00
Nico Weber
1ab28276f6 LibGfx: Add the start of a JPEG2000 loader
JPEG2000 is the last image format used in PDF filters that we
don't have a loader for. Let's change that.

This adds all the scaffolding, but no actual implementation yet.
2024-03-25 20:35:00 +01:00
Nico Weber
1e95c08db5 LibGfx/ISOBMFF: Add JPEG2000ChannelDefinitionBox 2024-03-25 20:35:00 +01:00
Nico Weber
f080836127 LibGfx/ISOBMFF: Add JPEG2000URLBox 2024-03-25 20:35:00 +01:00
Nico Weber
c58996f4fc LibGfx/ISOBMFF: Add JPEG2000ContiguousCodestreamBox 2024-03-25 20:35:00 +01:00
Nico Weber
f372a9b346 LibGfx/ISOBMFF: Add JPEG2000UUIDListBox 2024-03-25 20:35:00 +01:00
Nico Weber
4a95e55fb3 LibGfx/ISOBMFF: Add JPEG2000CaptureResolutionBox 2024-03-25 20:35:00 +01:00
Nico Weber
b386d5bb14 LibGfx/ISOBMFF: Add JPEG2000ResolutionBox 2024-03-25 20:35:00 +01:00
Nico Weber
7d137dc480 LibGfx/ISOBMFF: Add JPEG2000UUIDInfoBox 2024-03-25 20:35:00 +01:00
Nico Weber
214ff799ce LibGfx/ISOBMFF: Add JPEG2000ColorSpecificationBox 2024-03-25 20:35:00 +01:00
Nico Weber
59bd378db8 LibGfx/ISOBMFF: Add JPEG2000ImageHeaderBox 2024-03-25 20:35:00 +01:00
Nico Weber
78deac3dca LibGfx/ISOBMFF: Give Reader::read_entire_file() a factory callback
This will allow creating different child boxes in different containers.
2024-03-25 20:35:00 +01:00
Nico Weber
b7a120c47e LibGfx/ISOBMFF: Remove Box::read_from_stream()
This doesn't have to be a virtual method: it's called from
various create_from_stream() methods that have a static type
that's created. There's no point in the virtual call here,
and it makes it harder to add additional parameters to
read_from_stream() in some subclasses.
2024-03-25 20:35:00 +01:00
Nico Weber
c84487ed2d LibGfx/ISOBMFF: Give JPEG2000HeaderBox its own type
...and make SuperBox a pure superclass that's not usable by itself.
2024-03-25 20:35:00 +01:00
Nico Weber
65bd090815 LibGfx/ISOBMFF: Start creating JPEG2000 box types
`isobmff` can now dump the id in a JPEG2000SignatureBox.
Creates JPEG2000Boxes.{h,cpp} to house JPEG2000 box types.
2024-03-25 20:35:00 +01:00
Nico Weber
a073b2d047 LibGfx/ISOBMFF: Read JPEG2000HeaderBox 2024-03-25 20:35:00 +01:00
Nico Weber
15ba0a7e18 LibGfx/ISOBMFF: Make BoxStream MaybeOwn its stream
...and make Reader always have a BoxStream.
2024-03-25 20:35:00 +01:00
Nico Weber
a72770cdf6 LibGfx/ISOBMFF: Add JPEG2000 box types
I prefixed the types that are labeled as "JPEG2000" on
https://mp4ra.org/registered-types/boxes with "JPEG2000".
2024-03-25 20:35:00 +01:00
Nico Weber
cdbdc334de LibGfx/ISOBMFF: Alphabetize box type ENUMERATE_ONE() lines 2024-03-25 20:35:00 +01:00
Nico Weber
e81009b338 LibGfx/ISOBMFF: Put string literals in box type ENUMERATE_ONE()
This allows types that have spaces in their FourCC.
2024-03-25 20:35:00 +01:00
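Illustration (not part of the commit): a sketch of an X-macro list whose entries carry the FourCC as a string literal. A literal such as "url " can contain trailing spaces, which a bare identifier token cannot. Only the macro name `ENUMERATE_ONE` comes from the commit message; the list contents, `fourcc()`, and `box_type_name()` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical list of box types. Each entry pairs a C++ name with its FourCC literal.
#define ENUMERATE_BOX_TYPES                     \
    ENUMERATE_ONE(FileTypeBox, "ftyp")          \
    ENUMERATE_ONE(JPEG2000SignatureBox, "jP  ") \
    ENUMERATE_ONE(JPEG2000URLBox, "url ")

// Packs a four-character string literal into a big-endian 32-bit value.
constexpr uint32_t fourcc(char const (&s)[5])
{
    return (uint32_t(uint8_t(s[0])) << 24) | (uint32_t(uint8_t(s[1])) << 16)
        | (uint32_t(uint8_t(s[2])) << 8) | uint32_t(uint8_t(s[3]));
}

enum class BoxType : uint32_t {
#define ENUMERATE_ONE(name, literal) name = fourcc(literal),
    ENUMERATE_BOX_TYPES
#undef ENUMERATE_ONE
};

// Maps a box type back to its C++ name, e.g. for debug dumps.
static std::string box_type_name(BoxType type)
{
    switch (type) {
#define ENUMERATE_ONE(name, literal) \
    case BoxType::name:              \
        return #name;
        ENUMERATE_BOX_TYPES
#undef ENUMERATE_ONE
    }
    return "Unknown";
}
```

The array reference parameter `char const (&s)[5]` rejects literals that are not exactly four characters long at compile time.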
Nico Weber
bdb4f6bd49 LibGfx/ISOBMFF: Remove prototypes for nonexistent methods 2024-03-25 20:35:00 +01:00
Nico Weber
270d3303ce LibGfx/ISOBMFF: FileTypeBox is not a FullBox 2024-03-25 20:35:00 +01:00
Nico Weber
7dd5457b8f LibGfx/JBIG2: Add support for refinement coding template 1
This is used when refining a symbol in 0000337.pdf.
2024-03-25 13:16:02 -04:00
Nico Weber
ef9bfce0e7 LibGfx/JBIG2: Add support for SDREFAGG=1 symbol segments
...but only as long as REFAGGNINST == 1. That's enough for 0000337.pdf.
Except that it also needs GRTEMPLATE=1 support in the generic
refinement region decoding procedure, so no behavior change yet.
2024-03-25 13:16:02 -04:00
Nico Weber
3fa2ecdd65 LibGfx/JBIG2: Extract read_id() into a class
We'll need this for refinement/aggregate coding of symbols.
2024-03-25 13:16:02 -04:00
Nico Weber
68d47cb84a LibGfx/JBIG2: Implement support for symbol segments with input symbols
Needed for 0000337.pdf. It now fails complaining about missing SDREFAGG
support.
2024-03-25 13:16:02 -04:00
Nico Weber
59e6a10f30 LibGfx/JBIG2: Initialize POD members of refinement region input struct
I missed putting this in #23696 while juggling local branches.

No behavior change.
2024-03-25 12:07:18 -04:00
Nico Weber
8e9157d6ce LibGfx/JBIG2: Implement decode_end_of_stripe() a bit
This is enough to be able to decode 0000857.pdf p1-4 and
0000372.pdf p11.
2024-03-25 14:08:40 +01:00
Nico Weber
c4a45bb521 LibGfx/JBIG2: Make compute_context() a function pointer
...instead of a lambda that checks the template on every call.

Doesn't make a performance difference locally, but seems maybe nicer?

No behavior change.
2024-03-25 14:08:40 +01:00
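Illustration (not part of the commit): a sketch of choosing a context function once, as a function pointer, instead of branching on the template inside every per-pixel call. The two neighborhoods below are toy examples and NOT the real JBIG2 template layouts; `Bitmap`, `select_compute_context()`, and `contexts_for_row()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bitmap {
    int width;
    int height;
    std::vector<uint8_t> pixels; // one byte per pixel, 0 or 1
};

// Pixels outside the bitmap read as 0.
static bool get_pixel(Bitmap const& bitmap, int x, int y)
{
    if (x < 0 || x >= bitmap.width || y < 0 || y >= bitmap.height)
        return false;
    return bitmap.pixels[size_t(y) * bitmap.width + x] != 0;
}

// Toy neighborhood: the two pixels to the left of (x, y).
static uint16_t context_two_left(Bitmap const& bitmap, int x, int y)
{
    return uint16_t((get_pixel(bitmap, x - 2, y) << 1) | get_pixel(bitmap, x - 1, y));
}

// Toy neighborhood: the pixel above and the pixel to the left of (x, y).
static uint16_t context_above_and_left(Bitmap const& bitmap, int x, int y)
{
    return uint16_t((get_pixel(bitmap, x, y - 1) << 1) | get_pixel(bitmap, x - 1, y));
}

using ComputeContext = uint16_t (*)(Bitmap const&, int x, int y);

// The template check happens here, once, and not inside the pixel loop.
static ComputeContext select_compute_context(int template_index)
{
    return template_index == 0 ? context_two_left : context_above_and_left;
}

static std::vector<uint16_t> contexts_for_row(Bitmap const& bitmap, int y, int template_index)
{
    assert(y >= 0 && y < bitmap.height);
    ComputeContext compute_context = select_compute_context(template_index);
    std::vector<uint16_t> contexts;
    for (int x = 0; x < bitmap.width; ++x)
        contexts.push_back(compute_context(bitmap, x, y));
    return contexts;
}
```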
Nico Weber
828c640087 LibGfx/JBIG2: Make get_pixel static constexpr
...so it doesn't need to be captured.
2024-03-25 14:08:40 +01:00
Nico Weber
b45a4508c7 LibGfx/JBIG2: Implement support for context templates 1, 2, and 3
Template 2 is needed by some symbols in 0000372.pdf page 11 and
0000857.pdf pages 1-4. Implement the others too while here. (The
mentioned pages in those two PDFs also use the "end of stripe" segment,
so they still don't render yet.)

We still don't support EXTTEMPLATE.
2024-03-25 14:08:40 +01:00
Nico Weber
7035c2a2ff LibGfx/JBIG2: Add some debug logging to decode_page_information() 2024-03-25 14:08:40 +01:00
Nico Weber
d2998c1f5e LibGfx/JBIG2: Implement generic_refinement_region_decoding_procedure()
With this, we can decode all pages of 0000425.pdf, 0000215.pdf,
0000882.pdf, and 0000057.pdf.
2024-03-25 08:15:36 +01:00
Nico Weber
0d2e91b4ea LibGfx/JBIG2: Reject things in refinement decoding
These aren't hit for my 1000 page PDF test set.
2024-03-25 08:15:36 +01:00
Nico Weber
562d8ed619 LibGfx/JBIG2: Stub out generic_refinement_region_decoding_procedure()
...and make text_region_decoding_procedure() call it.

generic_refinement_region_decoding_procedure() still just returns
"unimplemented", so no behavior change yet.
2024-03-25 08:15:36 +01:00
Nico Weber
c4c48c1d5f LibGfx/JBIG2: Sketch out text segment refinement coding a bit 2024-03-25 08:15:36 +01:00
Nico Weber
9f327833c0 LibGfx/JBIG2: Read refinement adaptive template pixels for text segments
Text segments using refinement are still rejected later, by
text_region_decoding_procedure(). But we deserialize the input data now,
and the error when this feature is used is now slightly different.
2024-03-25 08:15:36 +01:00
Nico Weber
ced21d8419 LibGfx/JBIG2: Call decode_immediate_text_region for lossless text region
It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.

With this, we can finally decode the test input from #23659.

See f391c7822d for a similar change for generic regions and
lossless generic regions.
2024-03-23 17:30:15 -04:00
Nico Weber
b15e1d2b2a LibGfx/JBIG2: Implement initial support for text segments
Text segments conceptually store (x,y,id) triples. (x,y) are a
coordinate, and id refers to an id from a symbol segment.
A text segment has the effect of drawing some of the bitmaps stored
in a symbol segment to the output bitmap.

For example, the symbol segment might contain a small bitmap that
happens to look like the letter 'A', and the text segment might
draw that everywhere a scanned page has an 'A'. (The JBIG2 format
only treats it as an abstract bitmap. It doesn't know that this
small bitmap is an 'A'.)

This is missing support for many things:

* Huffman-coded input (not used in practice)
* Symbol refinement
* Transposed symbols
* Colors (not used in practice)

Still, we now have basic symbol/text segment support. This is enough
to decode the downloadable PDF here:
https://www.google.com/books/edition/Paradise_Lost/6qdbAAAAQAAJ

It doesn't lead to any progression on my 1000 file test PDF set.
The 7 files in there that use JBIG2 with symbol and text segments
now fail to load for other reasons (4 need symbol refinement for
text segments, one needs end-of-stripe segment support, one needs
support for symbol segments referring to other segments).

(And possibly, many other PDFs from Google Books, but that's the
only one I've tried so far.)
2024-03-23 17:30:15 -04:00
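Illustration (not part of the commit): a sketch of the (x,y,id) idea described above. Each instance names a symbol bitmap by id and a position, and drawing the text region means stamping that bitmap onto the page. The sketch assumes (x,y) is the top-left corner of the symbol and that pixels are combined with OR; `Bitmap`, `SymbolInstance`, and `draw_text_region()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bitmap {
    int width;
    int height;
    std::vector<uint8_t> pixels; // one byte per pixel, 0 or 1
};

// One (x, y, id) triple: where to draw, and which symbol to draw there.
struct SymbolInstance {
    int x;
    int y;
    size_t id;
};

static Bitmap draw_text_region(int width, int height, std::vector<Bitmap> const& symbols,
    std::vector<SymbolInstance> const& instances)
{
    Bitmap page { width, height, std::vector<uint8_t>(size_t(width) * size_t(height), 0) };
    for (auto const& instance : instances) {
        assert(instance.id < symbols.size());
        Bitmap const& symbol = symbols[instance.id];
        for (int sy = 0; sy < symbol.height; ++sy) {
            for (int sx = 0; sx < symbol.width; ++sx) {
                int page_x = instance.x + sx;
                int page_y = instance.y + sy;
                // Skip pixels that would land outside the page.
                if (page_x < 0 || page_x >= width || page_y < 0 || page_y >= height)
                    continue;
                page.pixels[size_t(page_y) * width + page_x] |= symbol.pixels[size_t(sy) * symbol.width + sx];
            }
        }
    }
    return page;
}
```

The same symbol can be referenced by many instances, which is why storing it once and placing it repeatedly saves space on scanned text.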
Nico Weber
3454970903 LibGfx/JBIG2: Extract composite_bitbuffer() and add some features
This extracts the bitbuffer combining code we had into a new function
composite_bitbuffer() and adds the following features:

* Real support for combination operators (which also lets us allow black
  as background color again, even if that's never used in practice)
* Clipping support (not used here yet, but will be needed elsewhere
  soon)

We're going to need this for text segment handling.

No behavior change.
2024-03-23 17:30:15 -04:00
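Illustration (not part of the commit): a sketch of compositing one bitmap onto another with a combination operator and clipping. It uses the five JBIG2 combination operators (OR, AND, XOR, XNOR, REPLACE) on one-byte-per-pixel buffers. The types and the function `composite_bitmap()` are hypothetical and simpler than the real composite_bitbuffer().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bitmap {
    int width;
    int height;
    std::vector<uint8_t> pixels; // one byte per pixel, 0 or 1
};

enum class CombinationOperator {
    Or,
    And,
    Xor,
    Xnor,
    Replace,
};

// Combines `in` into `out` at (x_offset, y_offset). Pixels of `in` that fall
// outside `out` are clipped away.
static void composite_bitmap(Bitmap& out, Bitmap const& in, int x_offset, int y_offset, CombinationOperator op)
{
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            int out_x = x + x_offset;
            int out_y = y + y_offset;
            if (out_x < 0 || out_x >= out.width || out_y < 0 || out_y >= out.height)
                continue;

            uint8_t src = in.pixels[size_t(y) * in.width + x];
            uint8_t& dst = out.pixels[size_t(out_y) * out.width + out_x];
            switch (op) {
            case CombinationOperator::Or:
                dst = uint8_t(dst | src);
                break;
            case CombinationOperator::And:
                dst = uint8_t(dst & src);
                break;
            case CombinationOperator::Xor:
                dst = uint8_t(dst ^ src);
                break;
            case CombinationOperator::Xnor:
                dst = uint8_t((dst ^ src) ^ 1);
                break;
            case CombinationOperator::Replace:
                dst = src;
                break;
            }
        }
    }
}
```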