ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-04-27 06:48:49 +00:00

Author	SHA1	Message	Date
Tim Schumacher	b1bfeb391e	LibPDF: Use `Core::Stream` to parse the page offset hint table	2023-01-21 00:45:33 +00:00
Julian Offenhäuser	d1bc89e30b	LibPDF: Try to repair XRef tables with broken indices. An XRef table usually starts with an object number of zero. While it could technically start at any other number, a nonzero start is a tell-tale sign of a broken table. For the "broken" documents I encountered, this always meant that some objects must have been removed from the start of the table, without updating the following indices. When this is the case, the document cannot be read normally. However, most other PDF parsers seem to know of this quirk and fix the XRef table automatically. Likewise, we now check for this exact case, and if it matches what we expect, we update the XRef table so that all object numbers again match the actual objects found in the file.	2022-11-25 22:44:47 +01:00
Julian Offenhäuser	563d91b6c4	LibPDF: Implement loading compressed objects from object streams. Now, whenever the xref table points to a compressed object, parse_object_with_index will look it up in the corresponding object stream as if it were a regular object. With this, our parser gains the bare minimum support for xref streams.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	f9beff7b5e	LibPDF: Initial work on parsing xref streams. Since PDF version 1.5, a document may omit the xref table in favor of a new kind of xref stream object. This is used to reference so-called "compressed" objects that are part of an object stream. With this patch we are able to parse this new kind of xref object, but we'll have to implement object streams to use them correctly.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	4887aacec7	LibPDF: Move document-specific parsing functionality into its own class. The Parser class is now a generic PDF object parser, from which the new DocumentParser class derives. DocumentParser now takes over all functions relating to linearization, pages, xref and trailer handling. This allows the use of multiple parsers in the same document's context, which will be needed in order to handle PDF object streams.	2022-09-17 10:07:14 +01:00

5 commits