Commit graph

11 commits

Author SHA1 Message Date
Ben Wiederhake
f866c80222 LibPDF: Avoid unnecessary HashMap copy, mark other copies 2023-05-19 22:33:57 +02:00
Rodrigo Tobar
64bbe431b5 LibPDF: Add char_code -> name mapping function
We already keep both mappings internally; now it's time to actually use
them.
2023-02-08 19:47:15 +01:00
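The sketch below shows what a char_code -> name lookup does for a simple font: each byte of a shown string selects a glyph name. It assumes a plain `std::unordered_map` in place of LibPDF's Encoding internals, and `get_name` / `bytes_to_glyph_names` are illustrative names, not the actual LibPDF API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an Encoding: simple fonts use single-byte
// character codes, each of which may name a glyph.
using CharCodeToName = std::unordered_map<std::uint8_t, std::string>;

// Look up the glyph name for one character code, if the encoding defines it.
std::optional<std::string> get_name(CharCodeToName const& map, std::uint8_t char_code)
{
    auto it = map.find(char_code);
    if (it == map.end())
        return std::nullopt;
    return it->second;
}

// Translate the bytes of a shown string into glyph names,
// skipping codes the encoding does not define.
std::vector<std::string> bytes_to_glyph_names(CharCodeToName const& map, std::string const& bytes)
{
    std::vector<std::string> names;
    for (unsigned char byte : bytes) {
        if (auto name = get_name(map, byte))
            names.push_back(*name);
    }
    return names;
}
```

With `{ {0x41, "A"}, {0x42, "B"} }`, the bytes `"AB?"` yield the names `A` and `B`; `?` is skipped because it has no entry.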
Rodrigo Tobar
286e3e6872 LibPDF: Simplify Encoding to align with simple font requirements
All "Simple Fonts" in PDF (all but Type0 fonts) have the property that
glyphs are selected with single-byte character codes. This means that
the Encoding objects should use u8 for representing these character
codes. Moreover, and as mentioned in a previous commit, there is no need
to store the Unicode code point associated with a character (which was
in turn wrongly associated with a glyph).

This commit greatly simplifies the Encoding class. Namely it:

 * Removes the unnecessary CharDescriptor class.
 * Changes the internal maps to be u8 -> FlyString and vice-versa,
   effectively providing two-way lookups.
 * Adds a new method to set a two-way u8 -> FlyString mapping and uses
   it in all possible places.
 * Simplifies the creation of Encoding objects.
 * Changes how the WinAnsi special treatment for bullet points is
   implemented.
2023-02-02 14:50:38 +01:00
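A two-way u8 <-> glyph-name mapping with a single setter might look roughly like the sketch below. It is an illustration under assumptions: `std::string` stands in for SerenityOS's `FlyString`, and `SimpleEncoding`, `set`, `get_name` and `get_char_code` are hypothetical names rather than LibPDF's actual class.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical two-way encoding for simple fonts (single-byte codes).
class SimpleEncoding {
public:
    // Record both directions at once so the two maps cannot drift apart.
    void set(std::uint8_t char_code, std::string const& glyph_name)
    {
        // If this code previously named another glyph, drop the stale
        // reverse entry, but only if it still points back at this code.
        if (auto it = m_code_to_name.find(char_code); it != m_code_to_name.end()) {
            auto reverse = m_name_to_code.find(it->second);
            if (reverse != m_name_to_code.end() && reverse->second == char_code)
                m_name_to_code.erase(reverse);
        }
        m_code_to_name[char_code] = glyph_name;
        // If the name is already mapped from another code, the latest set() wins.
        m_name_to_code[glyph_name] = char_code;
    }

    std::optional<std::string> get_name(std::uint8_t char_code) const
    {
        auto it = m_code_to_name.find(char_code);
        if (it == m_code_to_name.end())
            return std::nullopt;
        return it->second;
    }

    std::optional<std::uint8_t> get_char_code(std::string const& glyph_name) const
    {
        auto it = m_name_to_code.find(glyph_name);
        if (it == m_name_to_code.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::uint8_t, std::string> m_code_to_name;
    std::unordered_map<std::string, std::uint8_t> m_name_to_code;
};
```

Routing every insertion through one `set()` keeps both lookup directions consistent, which is the point of the "two-way mapping" bullet above; how duplicate names are resolved here is a choice made for the sketch.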
Rodrigo Tobar
2f773b3c5c LibPDF: Stop storing unicode code points in Encoding
In PDF fonts, Encoding objects are used to translate bytes into the
font's glyphs. The fonts we currently support organise their glyphs in
such a way that they are accessed by name, and thus an encoding
translates between a byte sequence and a glyph name.

Note that at no point does this translation involve a Unicode character,
and therefore assigning a character to a glyph in the Encoding object is
the wrong thing to do. Moreover, using the code point for this character
during the byte-sequence-to-glyph translation is doubly wrong.

This commit removes the characters associated with each translation in
the built-in Encoding objects. In order to keep commits short and sweet,
I'm currently simply removing the character from the enumeration,
leaving intact the old structure that held this information. Instead,
I'm filling the "code_point" member with a zero, and filling both
mappings (which will be changed later on too) with the glyph name and
the associated char code.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
1ec4ad5eb6 LibPDF: Add name -> char code conversion in Encoding
This is an operation that was already being done (sub-optimally) in
PS1FontProgram, so we are replacing that. We will use this during CFF
parsing too.
2023-01-25 15:40:11 +01:00
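As a rough illustration of a name -> char code conversion, the sketch below inverts a 256-entry code -> name table once, so that each later lookup is a hash probe rather than a scan over every slot. It assumes a plain `std::array` table; `EncodingTable`, `build_name_to_code` and `name_to_char_code` are hypothetical names, and the commit does not say exactly how PS1FontProgram did this before.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical built-in encoding table: index = char code, value = glyph
// name, with an empty name meaning the code is unassigned.
using EncodingTable = std::array<std::string_view, 256>;
using NameToCode = std::unordered_map<std::string, std::uint8_t>;

// Build the inverse mapping once; later lookups are average O(1).
NameToCode build_name_to_code(EncodingTable const& table)
{
    NameToCode result;
    for (std::size_t code = 0; code < table.size(); ++code) {
        if (table[code].empty())
            continue;
        // emplace() keeps the first (lowest) code if a name appears twice.
        result.emplace(std::string(table[code]), static_cast<std::uint8_t>(code));
    }
    return result;
}

std::optional<std::uint8_t> name_to_char_code(NameToCode const& inverse, std::string const& name)
{
    auto it = inverse.find(name);
    if (it == inverse.end())
        return std::nullopt;
    return it->second;
}
```

A name that occurs at several codes (for example a space glyph at both 32 and 160) resolves to the lowest code in this sketch. Real encodings may need a different policy.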
Andreas Kling
d6a3be1615 LibPDF: Add missing character quirk for WinAnsiEncoding fonts
Fonts with the encoding name "WinAnsiEncoding" should render missing
characters above character code 040 (octal) as a "bullet" character.

This patch adds Encoding::should_map_to_bullet(char_code) which is then
called by char_code_to_code_point() to check if the given char code
should be displayed as a bullet instead.

I didn't have a good way to test this, so I've only verified that it
works by manually overriding inputs to the function during the rendering
stage.

This takes care of a FIXME referring to Annex D of the PDF specification.
2022-12-08 09:54:20 +01:00
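The WinAnsiEncoding quirk described above can be sketched as a small predicate plus a resolver that falls back to the "bullet" glyph. This is an assumption-laden illustration: the encoding is modelled as a plain map and the "is WinAnsi" decision as a boolean flag. `should_map_to_bullet` mirrors the name in the commit message, but its signature here is hypothetical, and `resolve_glyph_name` is invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using CodeToName = std::unordered_map<std::uint8_t, std::string>;

// Only WinAnsiEncoding has this quirk, and only for unassigned codes
// strictly above 040 (octal), i.e. above 32.
bool should_map_to_bullet(bool is_win_ansi, CodeToName const& code_to_name, std::uint8_t char_code)
{
    return is_win_ansi && char_code > 040 && code_to_name.count(char_code) == 0;
}

// Resolve a char code to a glyph name, substituting "bullet" when the
// quirk applies; returns an empty string for other unmapped codes.
std::string resolve_glyph_name(bool is_win_ansi, CodeToName const& code_to_name, std::uint8_t char_code)
{
    if (should_map_to_bullet(is_win_ansi, code_to_name, char_code))
        return "bullet";
    auto it = code_to_name.find(char_code);
    return it == code_to_name.end() ? std::string() : it->second;
}
```

Note that code 040 itself is not affected; the substitution only kicks in strictly above it.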
Julian Offenhäuser
b14f0950a5 LibPDF: Add very basic support for Adobe Type 1 font rendering
Previously we would draw all text, no matter what font type, as
Liberation Serif, which results in things like ugly character spacing.

We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
04cb00dc9a LibPDF: Fix handling of differences array in custom encodings
When looking up differences in the specified encoding, we previously
didn't recognize a lot of characters, namely those that are referred to
by a string in the PDF itself, like "/germandbls".

We now create a mapping of those characters to the code points they are
referring to, and correctly look them up when needed.
2022-09-17 10:07:14 +01:00
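In a PDF encoding dictionary, a Differences array interleaves integers and names. Each integer sets the char code for the first name after it, and each following name takes the next consecutive code. The sketch below applies such an array on top of a base mapping. It assumes the array has already been parsed into integers and names with the leading "/" stripped (so "/germandbls" becomes `germandbls`); `DifferencesEntry` and `apply_differences` are hypothetical names, not LibPDF's.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical parsed Differences element: an integer (next char code)
// or a glyph name.
using DifferencesEntry = std::variant<int, std::string>;
using CodeToName = std::unordered_map<std::uint8_t, std::string>;

// Overlay a Differences array onto a base encoding's code -> name map.
void apply_differences(CodeToName& code_to_name, std::vector<DifferencesEntry> const& differences)
{
    int current_code = -1; // no starting code seen yet
    for (auto const& entry : differences) {
        if (auto const* code = std::get_if<int>(&entry)) {
            current_code = *code;
            continue;
        }
        if (current_code < 0 || current_code > 255)
            throw std::runtime_error("Differences: name without a valid char code");
        code_to_name[static_cast<std::uint8_t>(current_code)] = std::get<std::string>(entry);
        ++current_code; // consecutive names fill consecutive codes
    }
}
```

For example, `[65 /germandbls /Adieresis]` replaces the base glyphs at codes 65 and 66 and leaves all other codes untouched.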
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
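SerenityOS's AK::StringView is not the standard library type, but `std::string_view` shows the same distinction and serves here as a stand-in. Constructing from `char const*` has to find the length by scanning for the terminating NUL, while the `""sv` literal operator receives the length directly from the literal. `from_pointer` is a hypothetical helper for the illustration.

```cpp
#include <cassert>
#include <string_view>

using namespace std::string_view_literals;

// char const* overload: length is determined by searching for the NUL
// terminator, so it stops at the first embedded '\0'.
std::string_view from_pointer(char const* s)
{
    return std::string_view(s);
}

// ""sv literal: the compiler passes the literal's full length, so no
// scan is needed and embedded NULs are preserved.
constexpr auto literal = "WinAnsiEncoding"sv;
static_assert(literal.size() == 15);
```

One visible consequence is that `from_pointer("a\0b")` has size 1, whereas `"a\0b"sv` has size 3.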
Matthew Olsson
49cb040c27 LibPDF: Fix some base-encoding-related crashes 2022-03-31 18:10:45 +02:00
Matthew Olsson
8441fa2bc4 LibPDF: Add support for builtin and custom Encodings 2022-03-29 02:52:57 +02:00