Commit graph

71 commits

Author SHA1 Message Date
Nico Weber
d27722ee00 LibGfx/JBIG2: Implement decode_pattern_dictionary()
It calls pattern_dictionary_decoding_procedure(), which is stubbed out.

No real behavior change yet.
2024-04-08 06:27:51 +02:00
Nico Weber
825b4d4e94 LibGfx/JBIG2: Tweak decode_immediate_generic_region()
Set context only for non-MMR.

No behavior change.
2024-04-08 06:27:51 +02:00
Nico Weber
2d4964b945 LibGfx/JBIG2: Support custom adaptive template pixels in refinement
The implementation is very similar to #23831.

I created the test exactly like in #23713, except that I replaced the
last four lines in the ini file with:

```
-txt -Param -rATX1 10
-txt -Param -rATY1 -1
-txt -Param -rATX2 4
-txt -Param -rATY2 15
```
2024-04-05 21:32:18 +02:00
Nico Weber
154d0bb458 LibGfx/JBIG2: Extract check_valid_adaptive_template_pixel()
No behavior change.
2024-04-05 21:32:18 +02:00
Nico Weber
a0a14296f9 LibGfx/JBIG2: Implement support for custom adaptive template pixels
...in the generic region decoding procedure (not yet in the generic
refinement region procedure). Not yet for EXTTEMPLATE though.

I haven't seen these being used in the wild, but:
* I want to optimize this code some, and it's probably good if it
  is feature complete (and well-tested) before being optimized
* Other PDF engines implement support for this
* The Pattern/Halftone feature (which we don't yet implement either,
  but which I'd like to implement because see previous two bullets)
  calls the generic region decoding procedure with custom adaptive
  template pixels
2024-04-04 11:44:50 -04:00
Nico Weber
7b5852bf91 LibGfx/JBIG2: Fix rendering of transposed text
The current code was incorrect for non-TopRight reference corners.
Fixes rendering of ghostpdl/tests/jbig2/042_19.jb2.
2024-04-03 11:40:25 -04:00
Nico Weber
b31e3fe573 LibGfx/JBIG2: Add some debug logging for text segments
2024-04-03 11:40:25 -04:00
Nico Weber
b130e36330 LibGfx/JBIG2: Add spaces to some spec comments
2024-04-03 11:40:25 -04:00
Nico Weber
b04569c1da LibGfx/JBIG2: Implement support for transposed text regions
Only the coordinates get transposed -- the bitmaps apparently don't.
And all the prose amounts to "if the transposed bit is set, swap
instance s and t coordinates before painting", as far as I can tell.
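
A minimal sketch of that reading, assuming hypothetical names (`InstancePosition`, `PagePosition`, `to_page_coordinates`) that are not taken from LibGfx. It only shows the coordinate swap and ignores reference corners:

```cpp
#include <utility>

// Hypothetical types for illustration; not the LibGfx API.
struct InstancePosition {
    int s; // coordinate along a strip
    int t; // coordinate across strips
};

struct PagePosition {
    int x;
    int y;
};

// Maps an instance's (s, t) to page (x, y). If the transposed bit is set,
// s and t trade places before painting; the symbol bitmap itself is untouched.
PagePosition to_page_coordinates(InstancePosition instance, bool transposed)
{
    if (transposed)
        std::swap(instance.s, instance.t);
    return { instance.s, instance.t };
}
```

With these assumed types, `to_page_coordinates({ 3, 7 }, true)` yields x = 7 and y = 3, while the non-transposed call leaves the pair as is.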

Makes pages 3/4 and 7/8 in 0001346.pdf render. (But here the feature
isn't used to render transposed text -- it just has stripes that keep s
roughly constant, which would normally produce vertical runs but here
produces regular horizontal runs. It's not clear to me why this feature
is used for these pages!)
2024-04-01 14:41:17 +02:00
Nico Weber
ca6ebedf58 LibGfx/JBIG2: Simplify non-transposed text region coordinate math
If the origin is on the right, we need to subtract width - 1 from s,
if it's on the bottom, we need to subtract height - 1 from t.
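
The rule above could be sketched roughly like this for the non-transposed case. The enum and function names are hypothetical and chosen for illustration only; the actual code may be structured differently:

```cpp
// Hypothetical names for illustration; not the LibGfx API.
enum class ReferenceCorner {
    BottomLeft,
    TopLeft,
    BottomRight,
    TopRight,
};

struct TopLeftPosition {
    int x;
    int y;
};

// Converts (s, t), measured at the reference corner, to the top-left pixel
// of a symbol bitmap of the given size (non-transposed case only).
TopLeftPosition symbol_top_left(int s, int t, int width, int height, ReferenceCorner corner)
{
    bool origin_on_right = corner == ReferenceCorner::BottomRight || corner == ReferenceCorner::TopRight;
    bool origin_on_bottom = corner == ReferenceCorner::BottomLeft || corner == ReferenceCorner::BottomRight;

    if (origin_on_right)
        s -= width - 1;
    if (origin_on_bottom)
        t -= height - 1;
    return { s, t };
}
```

For a 4x6 symbol at (s, t) = (10, 20), a bottom-right reference corner gives a top-left position of (7, 15) under these assumptions.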

No behavior change.
2024-04-01 14:41:17 +02:00
Nico Weber
43752a1ff8 LibGfx/JBIG2: Remove a now-unneeded void cast
We've been reading `segment_page_association_size_is_32_bits` a bit
further down for a while now.

No behavior change.
2024-04-01 08:27:10 +01:00
Nico Weber
5a43fc83bd LibGfx/JBIG2Loader: Add a short comment with spec history
I found this interesting, and it also explains e.g. why some
of the step numbers in 6.4 Text Region Decoding Procedure are off --
they added step 3) for COLEXTFLAG and forgot to update step references
to later steps.
2024-03-27 11:55:09 -04:00
Nico Weber
2e6626ae3b LibGfx/JBIG2: Tweak spec comment quote characters
2024-03-26 17:21:47 -04:00
Nico Weber
6842299959 LibGfx/JBIG2: Fix a comment typo
2024-03-26 17:21:47 -04:00
Nico Weber
6ff446fd30 LibGfx/JBIG2: Fix a comment typo
2024-03-26 17:21:47 -04:00
Nico Weber
a2a5fc76aa LibGfx/JBIG2: Don't assert on unexpected OOB values in the bitstream
This should only happen on either invalid inputs or if our code has
a bug (gasp!). Printing an error instead of asserting seems nicer.
2024-03-26 17:21:47 -04:00
Nico Weber
7dd5457b8f LibGfx/JBIG2: Add support for refinement coding template 1
This is used when refining a symbol in 0000337.pdf.
2024-03-25 13:16:02 -04:00
Nico Weber
ef9bfce0e7 LibGfx/JBIG2: Add support for SDREFAGG=1 symbol segments
...but only as long as REFAGGNINST == 1. That's enough for 0000337.pdf.
Except that it also needs GRTEMPLATE=1 support in the generic
refinement region decoding procedure, so no behavior change yet.
2024-03-25 13:16:02 -04:00
Nico Weber
3fa2ecdd65 LibGfx/JBIG2: Extract read_id() into a class
We'll need this for refinement/aggregate coding of symbols.
2024-03-25 13:16:02 -04:00
Nico Weber
68d47cb84a LibGfx/JBIG2: Implement support for symbol segments with input symbols
Needed for 0000337.pdf. It now fails complaining about missing SDREFAGG
support.
2024-03-25 13:16:02 -04:00
Nico Weber
59e6a10f30 LibGfx/JBIG2: Initialize POD members of refinement region input struct
I missed putting this in #23696 while juggling local branches.

No behavior change.
2024-03-25 12:07:18 -04:00
Nico Weber
8e9157d6ce LibGfx/JBIG2: Implement decode_end_of_stripe() a bit
This is enough to be able to decode 0000857.pdf p1-4 and
0000372.pdf p11.
2024-03-25 14:08:40 +01:00
Nico Weber
c4a45bb521 LibGfx/JBIG2: Make compute_context() a function pointer
...instead of a lambda that checks the template on every call.
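
A rough sketch of the idea, not the actual LibGfx code: the template is inspected once, and the chosen function is then called per pixel without re-checking it. The `compute_context_N` bodies below are placeholders (real ones would read neighboring pixels), and all names are assumptions:

```cpp
#include <cstdint>

// Function pointer type for a per-pixel context computation (hypothetical signature).
using ComputeContext = std::uint16_t (*)(int x, int y);

// Placeholder bodies only; they stand in for the per-template pixel gathering.
std::uint16_t compute_context_0(int x, int y) { return static_cast<std::uint16_t>(x + y); }
std::uint16_t compute_context_1(int x, int y) { return static_cast<std::uint16_t>(2 * x + y); }
std::uint16_t compute_context_2(int x, int y) { return static_cast<std::uint16_t>(x + 2 * y); }
std::uint16_t compute_context_3(int x, int y) { return static_cast<std::uint16_t>(2 * x + 2 * y); }

// Picks the function once per region instead of branching on the template
// for every pixel. Returns nullptr for an unknown template number.
ComputeContext select_compute_context(int gbtemplate)
{
    switch (gbtemplate) {
    case 0:
        return compute_context_0;
    case 1:
        return compute_context_1;
    case 2:
        return compute_context_2;
    case 3:
        return compute_context_3;
    default:
        return nullptr;
    }
}
```

A decoding loop would call `select_compute_context()` once before iterating and then invoke the returned pointer for each pixel.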

Doesn't make a performance difference locally, but seems maybe nicer?

No behavior change.
2024-03-25 14:08:40 +01:00
Nico Weber
828c640087 LibGfx/JBIG2: Make get_pixel static constexpr
...so it doesn't need to be captured.
2024-03-25 14:08:40 +01:00
Nico Weber
b45a4508c7 LibGfx/JBIG2: Implement support for context templates 1, 2, and 3
Template 2 is needed by some symbols in 0000372.pdf page 11 and
0000857.pdf pages 1-4. Implement the others too while here. (The
mentioned pages in those two PDFs also use the "end of stripe" segment,
so they still don't render yet.)

We still don't support EXTTEMPLATE.
2024-03-25 14:08:40 +01:00
Nico Weber
7035c2a2ff LibGfx/JBIG2: Add some debug logging to decode_page_information()
2024-03-25 14:08:40 +01:00
Nico Weber
d2998c1f5e LibGfx/JBIG2: Implement generic_refinement_region_decoding_procedure()
With this, we can decode all pages of 0000425.pdf, 0000215.pdf,
0000882.pdf, and 0000057.pdf.
2024-03-25 08:15:36 +01:00
Nico Weber
0d2e91b4ea LibGfx/JBIG2: Reject things in refinement decoding
These aren't hit for my 1000 page PDF test set.
2024-03-25 08:15:36 +01:00
Nico Weber
562d8ed619 LibGfx/JBIG2: Stub out generic_refinement_region_decoding_procedure()
...and make text_region_decoding_procedure() call it.

generic_refinement_region_decoding_procedure() still just returns
"unimplemented", so no behavior change yet.
2024-03-25 08:15:36 +01:00
Nico Weber
c4c48c1d5f LibGfx/JBIG2: Sketch out text segment refinement coding a bit
2024-03-25 08:15:36 +01:00
Nico Weber
9f327833c0 LibGfx/JBIG2: Read refinement adaptive template pixels for text segments
Text segments using refinement are still rejected later, by
text_region_decoding_procedure(). But we deserialize the input data now,
and the error when this feature is used is now slightly different.
2024-03-25 08:15:36 +01:00
Nico Weber
ced21d8419 LibGfx/JBIG2: Call decode_immediate_text_region for lossless text region
It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.

With this, we can finally decode the test input from #23659.

See f391c7822d for a similar change for generic regions and
lossless generic regions.
2024-03-23 17:30:15 -04:00
Nico Weber
b15e1d2b2a LibGfx/JBIG2: Implement initial support for text segments
Text segments conceptually store (x,y,id) triples. (x,y) are a
coordinate, and id refers to an id from a symbol segment.
A text segment has the effect of drawing some of the bitmaps stored
in a symbol segment to the output bitmap.

For example, the symbol segment might contain a small bitmap that
happens to look like the letter 'A', and the text segment might
draw that everywhere a scanned page has an 'A'. (The JBIG2 format
only treats it as an abstract bitmap. It doesn't know that this
small bitmap is an 'A'.)
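
As a conceptual sketch of the description above (not the decoder itself, which reads these triples from an arithmetic-coded bitstream), drawing a text segment amounts to looking up each id and painting its bitmap at (x,y). All type and function names here are hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical 1-bit bitmap for illustration; not the LibGfx BitBuffer.
struct Bitmap {
    int width { 0 };
    int height { 0 };
    std::vector<bool> pixels; // row-major, true = black

    Bitmap(int w, int h)
        : width(w)
        , height(h)
        , pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), false)
    {
    }

    bool get(int x, int y) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
    void set(int x, int y, bool value) { pixels[static_cast<std::size_t>(y) * width + x] = value; }
};

// One (x, y, id) triple as described above.
struct SymbolInstance {
    int x;
    int y;
    unsigned id;
};

// Draws every referenced symbol onto the page using OR.
// Unknown ids and pixels outside the page are skipped in this sketch.
void draw_text_segment(Bitmap& page, std::map<unsigned, Bitmap> const& symbols, std::vector<SymbolInstance> const& instances)
{
    for (auto const& instance : instances) {
        auto it = symbols.find(instance.id);
        if (it == symbols.end())
            continue;
        Bitmap const& symbol = it->second;
        for (int sy = 0; sy < symbol.height; ++sy) {
            for (int sx = 0; sx < symbol.width; ++sx) {
                int px = instance.x + sx;
                int py = instance.y + sy;
                if (px < 0 || py < 0 || px >= page.width || py >= page.height)
                    continue;
                if (symbol.get(sx, sy))
                    page.set(px, py, true);
            }
        }
    }
}
```

The same small bitmap can be referenced by many instances, which is where the format gets its compression for scanned text.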

This is missing support for many things:

* Huffman-coded input (not used in practice)
* Symbol refinement
* Transposed symbols
* Colors (not used in practice)

Still, we now have basic symbol/text segment support. This is enough
to decode the downloadable PDF here:
https://www.google.com/books/edition/Paradise_Lost/6qdbAAAAQAAJ

It doesn't lead to any progression on my 1000 file test PDF set.
The 7 files in there that use JBIG2 with symbol and text segments
now fail to load for other reasons (4 need symbol refinement for
text segments, one needs end-of-stripe segment support, one needs
support for symbol segments referring to other segments).

(And possibly, many other PDFs from Google Books, but that's the
only one I've tried so far.)
2024-03-23 17:30:15 -04:00
Nico Weber
3454970903 LibGfx/JBIG2: Extract composite_bitbuffer() and add some features
This extracts the bitbuffer combining code we had into a new function
composite_bitbuffer() and adds the following features:

* Real support for combination operators (which also lets us allow black
  as background color again, even if that's never used in practice)
* Clipping support (not used here yet, but will be needed elsewhere
  soon)
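
The two features listed above might look roughly like the following sketch. It assumes the five JBIG2 combination operators (OR, AND, XOR, XNOR, REPLACE) and uses a hypothetical `BitBuffer` stand-in and `composite()` signature, not the real composite_bitbuffer() interface:

```cpp
#include <cstddef>
#include <vector>

// Enumerator names are assumptions for this sketch.
enum class CombinationOperator {
    Or,
    And,
    Xor,
    XNor,
    Replace,
};

// Hypothetical 1-bit buffer for illustration; not the LibGfx BitBuffer.
struct BitBuffer {
    int width { 0 };
    int height { 0 };
    std::vector<bool> bits; // row-major, true = black

    BitBuffer(int w, int h)
        : width(w)
        , height(h)
        , bits(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), false)
    {
    }

    bool get(int x, int y) const { return bits[static_cast<std::size_t>(y) * width + x]; }
    void set(int x, int y, bool value) { bits[static_cast<std::size_t>(y) * width + x] = value; }
};

// Combines one destination bit with one source bit.
bool combine(bool dst, bool src, CombinationOperator op)
{
    switch (op) {
    case CombinationOperator::Or:
        return dst || src;
    case CombinationOperator::And:
        return dst && src;
    case CombinationOperator::Xor:
        return dst != src;
    case CombinationOperator::XNor:
        return dst == src;
    case CombinationOperator::Replace:
        return src;
    }
    return src;
}

// Composites `in` onto `out` with its top-left corner at (left, top).
// Source pixels that fall outside `out` are clipped.
void composite(BitBuffer& out, BitBuffer const& in, int left, int top, CombinationOperator op)
{
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            int px = left + x;
            int py = top + y;
            if (px < 0 || py < 0 || px >= out.width || py >= out.height)
                continue;
            out.set(px, py, combine(out.get(px, py), in.get(x, y), op));
        }
    }
}
```

Because the destination bit takes part in the operation, a black background combined with OR stays black, which a plain overwrite would not reproduce.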

We're going to need this for text segment handling.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber
754e1b46fc LibGfx/JBIG2: Implement basic symbol segment processing
A symbol segment defines a bunch of small bitmaps and associates them
with numeric IDs.

This only implements reading symbols encoded with the arithmetic coder.
It does not support huffman coding. (In practice, everything seems to
use arithmetic coding.)

Support for refinement or aggregate coding isn't implemented yet.
Support for retaining bitmap coding contexts isn't implemented yet.
Support for symbol segments referring to other symbol segments isn't
implemented yet.
But all produce diagnostics if encountered, so we won't forget about
them. (I haven't seen any of them being used in the wild.)

No visible behavior change yet, but with JBIG2_DEBUG turned on,
it produces all kinds of debug output.
2024-03-23 17:30:15 -04:00
Nico Weber
93fcb529cf LibGfx/JBIG2: Move SegmentData down a bit
Symbol segments will store decoded symbols, and for that SegmentData
needs to come after BitBuffer.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber
2099ca48a1 LibGfx/JBIG2: Pass in decoder and contexts to generic region decoder
The symbol segment decoding procedure will read generic regions
that aren't at a byte boundary, and that share contexts across
several regions.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber
376b1a2309 LibGfx/JBIG2: Have just one CombinationOperator enum class
We already had two, and we would need another one for text segments.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber
c06110da87 LibGfx/JBIG2: Make AdaptiveTemplatePixel toplevel
We're going to need it for symbol segment decoding too.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber
8e82c2b932 LibGfx/JBIG2: Add arithmetic integer decoder
The existing ArithmeticDecoder (from Annex E) reads one bit at a
time.

ArithmeticIntegerDecoder (from Annex A) builds on top of that to
read integer values.

This will be used by both the symbol segment and the text segment
readers.

(This does not yet implement the IAID decoding procedure in A.3.
We only need that one in the text segment decoder at the moment,
and it's pretty small, so I'll put it inline there for now.)

Not used yet, so no behavior change yet.
2024-03-23 17:30:15 -04:00
Nico Weber
c99506da7d LibGfx/JBIG2: Initialize POD members
And use Array<> instead of C-style arrays.
2024-03-23 17:30:15 -04:00
Nico Weber
7650e657aa LibGfx/JBIG2: Implement support for TPGDON
2024-03-17 17:38:30 +01:00
Nico Weber
f391c7822d LibGfx/JBIG2: Call decode_immediate_generic_region for lossless regions
It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.

With this, we can finally decode
Tests/LibGfx/test-inputs/jbig2/bitmap.jbig2 and add a test for
decoding simple arithmetic-coded images.
2024-03-16 09:21:42 -04:00
Nico Weber
6788a82ec5 LibGfx/JBIG2: Implement generic_region_decoding_procedure() happy path
This errors out on many special cases. None of those seem to be hit
in practice (with the exception of TPGDON, which is used in a handful
of PDFs. I have an implementation of that locally, but I'll put that
in a separate PR. The code for it is straightforward, but adding a
test for it is a bit involved.)

With this, we can decode about half of the JBIG2 images in my PDF
test dataset.
2024-03-16 09:21:42 -04:00
Nico Weber
b0c73d1652 LibGfx/JBIG2: Reject unimplemented combination operators
In practice, everything uses white backgrounds and operators `or`
or `xor` to turn them black, at least for the simple images we're
about to be able to decode.

To make sure we don't forget implementing this for real once needed,
reject other ops, and also reject black backgrounds (because 1 | 0
is 1, not 0 like our overwrite implementation will produce).

This means we have to remove a test, but since this scenario doesn't
seem to happen in practice, that seems ok.
2024-03-16 09:21:42 -04:00
Nico Weber
5dc9ead1c5 LibGfx/JBIG2: Expand a comment
2024-03-16 09:21:42 -04:00
Nico Weber
21c54839e6 LibGfx/JBIG2: Add two dbgln_if()s
2024-03-16 09:21:42 -04:00
Nico Weber
b8f80501ec LibGfx/JBIG2: Pass Context to get_next_bit() instead of to initialize()
The context can vary for every bit we read.

This does not affect the one use in the test which reuses the same
context for all bits, but it is necessary for future changes.
2024-03-16 09:21:42 -04:00
Nico Weber
df9dd8ec69 LibGfx/JBIG2: Add arithmetic coding decoder
I think the context normally changes for every bit. But this here
is enough to correctly decode the test bitstream in Annex H.2 in
the spec, which seems like a good checkpoint.

The internals of the decoder use spec naming, to make the code
look virtually identical to what's in the spec. (Even so, I managed
to put in several typos that took a while to track down.)
2024-03-14 18:18:15 -06:00
Nico Weber
98729c97f4 LibGfx/JBIG2: Simplify and restrict adaptive template pixel reading
EXTTEMPLATE=1 was added later and doesn't seem to be used much in
practice -- it doesn't appear in any simple generic regions in any PDF
I tested so far at least. Since the spec contradicts itself on what
to do with these as far as I can tell, error out on them for now and
then add support once we find actual files using this, so that we can
check our implementation actually works.

Deduplicate the data reading for the different cases, and
zero-initialize all adaptive template pixels to make that
possible.

Other than prohibiting EXTTEMPLATE=1, no behavior change.
2024-03-14 10:57:57 -04:00