Live · Python 3.12 · pikepdf · fonttools · pdfminer.six · pytest · mypy strict

pdf-edit-engine

Change the words in a PDF without touching the layout — operator-level content-stream surgery that preserves fonts, kerning, and pixel-exact positioning.

~ $ pdf-edit-engine
$ pip install pdf-edit-engine
Installed pdf-edit-engine 0.1.0
>>> from pdf_edit_engine import replace
>>> r = replace("invoice.pdf",
... "$12,500", "$13,250",
... "invoice-v2.pdf")
>>> r.fidelity_report.font_preserved
True

Install

pip
pip install pdf-edit-engine

The Challenge

Editing text in an existing PDF is a constant need — names, dates, typos, labels — but PDF was designed as a display format, not an editing format. Text is stored as positioned glyph indices, not editable strings, which is why every mainstream tool falls back to one of two approaches: redact the area and stamp new text over it with a substitute font, or extract to another format and re-render. Both silently destroy the original fonts, kerning, and exact pixel positioning. I hit this wall while building pdf-toolkit-mcp and realised there was no production-grade library that could change the words in a PDF while keeping everything else identical.

The Approach

Instead of treating a PDF as a document, pdf-edit-engine treats it as an instruction stream. It interprets the content-stream operators inside BT/ET blocks, tracks graphics state (transformation matrix, active font, colour), and modifies the operators themselves. Where PyMuPDF, the mainstream Python tool, covers the original text with a white rectangle and stamps replacement text with a substitute font, pdf-edit-engine keeps the original glyphs and changes only the operators that position them. A two-tier font system extends embedded subsets on demand: a CMap-only fast path when the needed glyphs already exist in the font binary, and a full re-embed using the fonttools subsetter's --retain-gids option when they don't. Replacement text has its kerning redistributed across glyphs so the output preserves the original string width exactly, with no visible spacing gaps.

Every edit returns a structured FidelityReport (font_preserved, overflow_detected, reflow_applied, glyphs_missing) so automated pipelines and AI agents can verify quality programmatically without visual review. Every function also supports dry_run=True to preview the report before touching disk.
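The width-preserving kerning step can be made concrete with a little arithmetic. The sketch below is not the engine's code: it assumes the simplest possible policy (spread the width difference evenly over every gap) and ignores font size, character spacing, and horizontal scaling. The function names and the glyph widths are hypothetical. It relies on one fact from the PDF specification: numbers in a TJ array are in thousandths of a text-space unit and are subtracted from the pen position, so a positive value tightens the gap.

```python
from __future__ import annotations


def redistribute_kerning(new_widths: list[float], original_width: float) -> list[float]:
    """Return one TJ adjustment per gap so the new run spans original_width.

    Widths and adjustments are in thousandths of a text-space unit.
    A positive adjustment moves the next glyph left (tightens the gap).
    """
    gaps = len(new_widths) - 1
    if gaps < 1:
        return []  # a single glyph has no gap that could absorb the difference
    surplus = sum(new_widths) - original_width
    return [surplus / gaps] * gaps  # assumed policy: even split across gaps


def build_tj_array(glyphs: list[str], adjustments: list[float]) -> list[str | float]:
    """Interleave glyphs and adjustments: [g0, adj0, g1, adj1, ..., gN]."""
    out: list[str | float] = []
    for i, glyph in enumerate(glyphs):
        out.append(glyph)
        if i < len(adjustments):
            out.append(adjustments[i])
    return out


def rendered_width(widths: list[float], adjustments: list[float]) -> float:
    """Total advance of a TJ run: glyph widths minus the TJ adjustments."""
    return sum(widths) - sum(adjustments)


# Hypothetical metrics: the replacement glyphs are 200 units wider in total
# than the 3000-unit string they replace.
new_widths = [500.0, 600.0, 700.0, 600.0, 800.0]
adjustments = redistribute_kerning(new_widths, original_width=3000.0)
tj = build_tj_array(list("13250"), adjustments)
```

Here each of the four gaps absorbs 50 units, so the run ends exactly where the original did. A real implementation would likely weight the gaps rather than split evenly, and would need a fallback for single-glyph replacements.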

The Impact

Shipped to PyPI as pdf-edit-engine v0.1.0 with 628 tests at 85% coverage under mypy strict. The CI matrix validates the engine against seven PDF generators (Chrome, Google Docs, four reportlab variants, pikepdf synthetic) with 100% character agreement across all of them. Benchmarks on a 100-page PDF: 0.3 s to index 900 matches, 0.03 s to replace on a single page, 0.1 s for a 50-edit batch, under 500 MB of memory. The engine powers pdf-edit-mcp — a 38-tool MCP server that brings format-preserving editing to AI agents.

Tech Stack

Python 3.12 + pikepdf

pikepdf gives byte-level access to content streams and can unparse modified operators back into a valid PDF — the foundation of the in-place edit approach.
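To illustrate what an operator-level edit looks like, here is a minimal sketch that walks an already-parsed instruction list, tracks the active font, and rewrites a Tj operand inside a BT/ET block. It assumes the stream has been parsed into (operands, operator) pairs, the general shape a parser such as pikepdf's parse_content_stream produces, and it simplifies heavily: operands are treated as decoded strings, whereas real streams carry encoded glyph codes. The function name is hypothetical and this is not the engine's implementation.

```python
from __future__ import annotations

Instruction = tuple[list[object], str]  # (operands, operator)


def replace_in_text_blocks(
    instructions: list[Instruction], old: str, new: str
) -> tuple[list[Instruction], list[object]]:
    """Replace `old` with `new` in Tj operands that sit inside BT/ET.

    Returns the edited instruction list and the font active at each edit.
    The input list is left unmodified.
    """
    in_text = False
    font: object = None  # set by Tf; persists across BT/ET like other text state
    edited: list[Instruction] = []
    fonts_used: list[object] = []
    for operands, op in instructions:
        if op == "BT":
            in_text = True
        elif op == "ET":
            in_text = False
        elif op == "Tf":
            font = operands[0]  # Tf operands are [font_name, size]
        elif op == "Tj" and in_text:
            text = operands[0]
            if isinstance(text, str) and old in text:
                operands = [text.replace(old, new)]  # new list, original untouched
                fonts_used.append(font)
        edited.append((operands, op))
    return edited, fonts_used


stream: list[Instruction] = [
    ([], "BT"),
    (["/F1", 12], "Tf"),
    ([72, 720], "Td"),
    (["Total: $12,500"], "Tj"),
    ([], "ET"),
]
edited, fonts = replace_in_text_blocks(stream, "$12,500", "$13,250")
```

Every instruction other than the edited Tj passes through unchanged, which is why positioning and font selection survive the edit.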

fonttools

Font introspection, CMap parsing, and glyph metrics drive both the two-tier subset-extension algorithm and the kerning redistribution that preserves original text widths.
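The two-tier decision can be sketched as plain set logic. This is an illustration under stated assumptions, not the engine's algorithm: the function and enum names are hypothetical, characters stand in for glyphs, and the "no extension needed" case is an addition for completeness. In practice the set of characters present in the font binary could come from something like fontTools' TTFont.getBestCmap(), and the mapped set from the PDF's own CMap.

```python
from __future__ import annotations

from enum import Enum


class Tier(Enum):
    NONE = "no extension needed"
    CMAP_ONLY = "cmap-only fast path"
    FULL_REEMBED = "full re-embed"


def choose_tier(
    text: str, mapped_in_pdf: set[str], present_in_font: set[str]
) -> tuple[Tier, set[str]]:
    """Pick the cheapest tier that can render `text`.

    mapped_in_pdf:   characters the embedded subset already maps (assumed input)
    present_in_font: characters whose glyphs exist in the font binary (assumed input)
    Returns the tier and the characters that triggered it.
    """
    unmapped = set(text) - mapped_in_pdf
    if not unmapped:
        return Tier.NONE, set()
    absent = unmapped - present_in_font
    if not absent:
        # Glyphs are in the binary already; only the mapping must grow.
        return Tier.CMAP_ONLY, unmapped
    # At least one glyph is not in the binary, so the font must be re-embedded.
    return Tier.FULL_REEMBED, absent


mapped = set("$0125,")
present = set("$0123456789,.")
same = choose_tier("$12,500", mapped, present)
fast = choose_tier("$13,250", mapped, present)
slow = choose_tier("€13,250", mapped, present)
```

In this example the digit 3 is in the binary but unmapped, so the fast path suffices, while the euro sign forces the full re-embed.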

pdfminer.six

Position-aware text extraction. The engine correlates pdfminer's layout with pikepdf's content streams so operator-level edits target the correct glyphs.

pytest + mypy strict

628 tests across seven PDF generators catch encoding and font edge cases; mypy strict keeps the public API type-safe so downstream tools like pdf-edit-mcp get reliable typings.
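As an illustration of how a typed downstream tool might consume the report, the sketch below gates on the four FidelityReport field names listed earlier. The dataclass is a stand-in, not the library's class: the field types (in particular glyphs_missing as a list of strings) and the blocking_issues helper are assumptions, and whether reflow_applied should block a pipeline is left as a policy choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FidelityReport:
    """Stand-in with the documented field names; the types are assumed."""

    font_preserved: bool
    overflow_detected: bool
    reflow_applied: bool
    glyphs_missing: list[str] = field(default_factory=list)


def blocking_issues(report: FidelityReport) -> list[str]:
    """Return human-readable reasons to reject an edit; empty means accept."""
    issues: list[str] = []
    if not report.font_preserved:
        issues.append("font not preserved")
    if report.overflow_detected:
        issues.append("overflow detected")
    if report.glyphs_missing:
        issues.append("glyphs missing: " + ", ".join(report.glyphs_missing))
    # reflow_applied is not treated as blocking here; that is a policy choice.
    return issues


clean = FidelityReport(font_preserved=True, overflow_detected=False, reflow_applied=False)
risky = FidelityReport(
    font_preserved=True,
    overflow_detected=True,
    reflow_applied=False,
    glyphs_missing=["€"],
)
```

A pipeline could run an edit with dry_run=True, pass the resulting report through a check like this, and only write to disk when the list comes back empty.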