It's perfectly possible for JavaScript to call unobserve() on an element
that hasn't been observed. Let's stop asserting if that happens. :^)
Fixes#22020
Before this change, there was some confusion possible where an IO would
try to find its way back to the document where we registered it.
This led to an assertion failure in the test I'm adding in the next
commit, so let's fix this first.
IOs now (weakly) remember the document where they are registered, and
only unregister from there.
Per 1.7 spec 3.8.1, there are multiple logical text string types:
* text strings
* ASCII strings
* byte strings
Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0)
UTF-8.
But byte strings shouldn't be converted but treated as binary
data.
This makes us no longer convert strings used for drawing page text.
TABLE 5.6 "Text-showing operators" lists the operands for text-showing
operators as just "string", not "text string" (even though these strings
confusingly are called "text strings" in the body text), so not doing
this there is correct (and matches other viewers).
We also no longer incorrectly convert strings used for cypto data
(such as passwords), if they start with an UTF-16BE or UTF-8 marker.
No behavior change for outlines and info dict entries.
https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of
this.
(ASCII strings only contain ASCII characters and behave the same
anyways.)
Manually added an Outlines dict with three items, one each for
every text string encoding in its title.
(Preview.app apparently can't handle UTF-8 in outlines either.)
For now, this uses UTF-16BE and UTF-8 marked strings in page body text.
These markings should be ignored in body text.
Hand-written, with `set fenc=latin1` and `set binary` in vim, and
xref etc fixed up by running
mutool clean Tests/LibPDF/encoding.pdf Tests/LibPDF/encoding.pdf
as usual.
Else, outline items that have newlines in them end up with a weird
vertical offset.
(This does affect the outline item's tooltip, which shows the whole
title. But not having a newline there seems alright, arguably
preferable.)
The title of an OutlineItem is already in UTF-8.
This is currently done in LibPDF's Parser::parse_string(). I think
that's not quite the right place (it shouldn't be done for all strings)
and not done quite right (text strings should convert from
PDFDocEncoding to UTF-8 unless prefixed by an UTF-8 BOM), but even if
that changes, I think we'll keep OutlineItem.title in UTF-8.
Implemented by adding the extra 3-value syntax as its own case and only
running it when parsing background-position. I'm sure it could be
implemented in a smarter way but this is still a bunch less code than
before. :^)
This means `object-position` will no longer incorrectly accept the
3-value background-position syntax.
Remove the now-ambiguous and unused `position` enum while we're at it.
(This enum only existed as a hack.)
And dealing with the fallout of doing so. I am not 100% sure that it is
safe for us to be treating Strings in the value sanitization algorithm
in all cases as if they are ASCII, but this commit does not change any
existing behaviour there.