PDF is a structured binary format: a header, a sequence of indirect objects, a cross-reference (xref) table that records the byte offset of every object, and a trailer. When any of those parts is damaged (by a failed download, an interrupted save, bit rot on archival media, or a corrupted USB transfer), most viewers refuse to open the file. PDF Repair attempts to rescue whatever is recoverable using a three-stage pipeline that runs entirely in your browser.
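To make the file layout concrete, here is a minimal sketch that checks whether the usual landmarks (the "%PDF-" header, the "startxref" keyword, and the "%%EOF" marker) are still present. The helper names and the 1024-byte search window are assumptions for illustration, not part of PDF Repair itself.

```typescript
// Hypothetical helper: reports which structural landmarks of a PDF survive.
interface PdfLandmarks {
  hasHeader: boolean;    // "%PDF-" near the start of the file
  hasStartXref: boolean; // "startxref" keyword near the end
  hasEofMarker: boolean; // "%%EOF" marker near the end
}

// Map each byte to one character so the text can be searched as ASCII.
function bytesToBinaryString(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

function findLandmarks(bytes: Uint8Array): PdfLandmarks {
  // Assumption: only the first and last 1024 bytes are inspected.
  const windowSize = 1024;
  const head = bytesToBinaryString(bytes.subarray(0, windowSize));
  const tail = bytesToBinaryString(
    bytes.subarray(Math.max(0, bytes.length - windowSize))
  );
  return {
    hasHeader: head.indexOf("%PDF-") !== -1,
    hasStartXref: tail.indexOf("startxref") !== -1,
    hasEofMarker: tail.indexOf("%%EOF") !== -1,
  };
}
```

A file cut off mid-download would typically keep its header but lose "startxref" and "%%EOF", which is a quick hint that the tail of the file is missing.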
Stage 1 is strict parsing with pdf-lib. The library reads the file as the spec describes and produces a clean rewrite if the object structure is valid. This succeeds on PDFs with minor formatting issues that strict viewers reject (non-compliant headers, missing whitespace, slightly malformed xref entries) and produces standards-compliant output whose content is identical to the original.
Stage 2 is lenient parsing, also via pdf-lib but with the throwOnInvalidObject option disabled. The parser skips broken objects, reconstructs the xref table by scanning for object headers, and rebuilds the trailer. Text, fonts, hyperlinks, and vector graphics survive when their underlying objects are intact. Pages whose objects are unreadable are dropped from the output.
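The idea of rebuilding offsets by scanning for object headers can be sketched as follows. This is an illustration of the general technique, not pdf-lib's internal code; the function name is hypothetical, and a real scanner would also have to ignore matches that occur inside binary stream data.

```typescript
// Hypothetical helper: maps "objectNumber generation" to the byte offset
// where that object's header ("N G obj") begins.
function scanObjectOffsets(bytes: Uint8Array): Map<string, number> {
  // One character per byte, so string indexes equal byte offsets.
  let text = "";
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }

  const offsets = new Map<string, number>();
  // An indirect object starts with "<number> <generation> obj".
  const header = /(\d+)\s+(\d+)\s+obj\b/g;
  let match: RegExpExecArray | null;
  while ((match = header.exec(text)) !== null) {
    // A later definition of the same object replaces an earlier one,
    // which mirrors how incremental updates override older objects.
    offsets.set(`${match[1]} ${match[2]}`, match.index);
  }
  return offsets;
}
```

The resulting map holds the same information an xref table would (object identity plus byte offset), which is why a scan like this can stand in for a damaged or missing table.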
Stage 3 is rasterization through pdfjs-dist. PDF.js is far more tolerant of corruption than pdf-lib because it was designed to render anything browsers might encounter. Each page that PDF.js can render is captured to a canvas, JPEG-encoded, and stitched into a fresh PDF using pdf-lib. This stage produces a viewable file even from heavily corrupted inputs but loses text selection, search, and copy-paste because every page becomes an image.
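One detail of the rasterization step is how large each canvas should be. PDF page sizes are measured in points (1/72 inch by default), so a target resolution translates into a scale factor of the kind passed to PDF.js via page.getViewport({ scale }). The sketch below shows that arithmetic; the function name and the DPI values are assumptions, since the resolution PDF Repair actually uses is not stated here.

```typescript
// Default PDF user space: 1 point = 1/72 inch.
const POINTS_PER_INCH = 72;

interface RasterSize {
  scale: number;    // factor to apply when rendering the page
  widthPx: number;  // canvas width in pixels
  heightPx: number; // canvas height in pixels
}

// Hypothetical helper: canvas dimensions for a page at a given DPI.
function rasterSize(widthPt: number, heightPt: number, dpi: number): RasterSize {
  const scale = dpi / POINTS_PER_INCH;
  return {
    scale,
    // Round up so the canvas never clips the page edge.
    widthPx: Math.ceil(widthPt * scale),
    heightPx: Math.ceil(heightPt * scale),
  };
}

// US Letter (612 x 792 pt) at an assumed 144 DPI:
const letter = rasterSize(612, 792, 144);
// letter is { scale: 2, widthPx: 1224, heightPx: 1584 }
```

Pixel count grows with the square of the scale, which is one reason rasterized output tends to be much larger than the original text and vector content.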
The pipeline runs the stages in order and stops at the first one that succeeds. If Stage 1 produces a usable file, Stages 2 and 3 are skipped. If all three fail, the engine reports why each stage failed, usually because the file was truncated below a recoverable threshold or because the entire content stream was overwritten with zeros.
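The stop-at-first-success behavior can be sketched as a small runner. All type and function names here are hypothetical; in a real build, each stage's run function would wrap the pdf-lib or pdfjs-dist calls described above, and a stage signals failure by throwing.

```typescript
// A stage takes the damaged bytes and resolves to repaired bytes, or throws.
type Stage = {
  name: string;
  run: (input: Uint8Array) => Promise<Uint8Array>;
};

type StageFailure = { stage: string; reason: string };

type RepairResult =
  | { ok: true; stage: string; output: Uint8Array }
  | { ok: false; failures: StageFailure[] };

async function runPipeline(
  stages: Stage[],
  input: Uint8Array
): Promise<RepairResult> {
  const failures: StageFailure[] = [];
  for (const stage of stages) {
    try {
      const output = await stage.run(input);
      // First success wins; later stages are never invoked.
      return { ok: true, stage: stage.name, output };
    } catch (err) {
      // Record why this stage failed, then fall through to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push({ stage: stage.name, reason });
    }
  }
  // Every stage failed: return the collected reasons for reporting.
  return { ok: false, failures };
}
```

Ordering the stages from strictest to most lossy means the output keeps as much of the original structure as the damage allows, with rasterization reached only as a last resort.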
Realistic limits: PDFs that have been partially overwritten with garbage, encrypted with a key that is no longer available, or truncated to fewer than about 200 bytes are unrecoverable. Files with bit flips in only a few bytes usually recover at Stage 1 or Stage 2. Files left behind by a save that crashed partway through often recover at Stage 2 with the last few pages missing.
All three stages run in your browser: pdf-lib handles parsing and rebuilding, and pdfjs-dist (with its Web Worker) handles rendering. No file leaves your device. If Stage 3 was used, run the result through PDF Compressor afterward, since rasterized pages are much larger than the original text and vector content.