codex-pdf
Structured PDF extraction API that turns complex files into consistent JSON.
CompilePDF assembles and rewrites print-ready PDFs over a simple API, as a discrete build step you call from your own automation. Output is byte-deterministic and every operation is verifiable, with SHA-256 lineage recorded on each one. AGPL-3.0-licensed, no vendor lock-in.
Fifteen object-tree mutations across structural, hygiene, and lifecycle categories. OCG flips, page boxes, metadata patches, color-space swaps — every op round-trips deterministically.
Twelve mark types across production, proofing, and universal categories — register, crop, color-bar, fold, slug-text — plus PDF/PNG external template ingestion.
Sheet-level step-and-repeat with work-and-turn / tumble. Codex solves the layout via tile_grid; Compile drops cells via pikepdf. Cell-extract round-trip verifier in CI.
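As a rough illustration of step-and-repeat, the sketch below computes the lower-left origin of every cell that fits on a sheet, centred, with an optional gutter. The function name `tile_origins`, its parameters, and the centring rule are hypothetical; this is not the Codex `tile_grid` API, and it ignores work-and-turn / tumble.

```python
from typing import List, Tuple


def tile_origins(sheet_w: float, sheet_h: float,
                 cell_w: float, cell_h: float,
                 gutter: float = 0.0) -> List[Tuple[float, float]]:
    """Return lower-left origins of a centred grid of cells (hypothetical helper)."""
    if cell_w <= 0 or cell_h <= 0:
        raise ValueError("cell dimensions must be positive")
    # n cells need n * cell + (n - 1) * gutter, so add one gutter before dividing.
    cols = int((sheet_w + gutter) // (cell_w + gutter))
    rows = int((sheet_h + gutter) // (cell_h + gutter))
    if cols == 0 or rows == 0:
        return []
    used_w = cols * cell_w + (cols - 1) * gutter
    used_h = rows * cell_h + (rows - 1) * gutter
    x0 = (sheet_w - used_w) / 2
    y0 = (sheet_h - used_h) / 2
    # Row-major order, bottom row first; rounding keeps coordinates stable.
    return [(round(x0 + c * (cell_w + gutter), 4),
             round(y0 + r * (cell_h + gutter), 4))
            for r in range(rows) for c in range(cols)]
```

A round-trip check in the spirit of the cell-extract verifier could re-read each placed cell at these origins and compare it with the source cell.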
Ink-pair spread / choke with three engine slots (pure_python by default, ghostscript, external). Codex polygon_offset + delta_e_2000 verification. A trap-diff artifact is built in.
Compile Job Definition (CJD) envelopes, in JSON or XML, batch several operations (rewrite, marks, impose, trap) into one API call. Steps run in a canonical order; strict_order rejects envelopes whose steps arrive out of that order.
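To make the ordering rule concrete, here is a minimal sketch of how an envelope's steps might be checked. The envelope shape (`steps`, `op`, a top-level `strict_order` flag) is hypothetical, and the canonical order is assumed to match the order the operations are listed in above; the real CJD schema may differ on both points.

```python
import json
from typing import List

# ASSUMPTION: canonical order mirrors the order the operations are listed in the text.
CANONICAL_ORDER = ["rewrite", "marks", "impose", "trap"]


def order_steps(envelope_json: str) -> List[str]:
    """Return operation names in canonical order, or raise if strict_order is violated."""
    envelope = json.loads(envelope_json)
    ops = [step["op"] for step in envelope["steps"]]
    unknown = [op for op in ops if op not in CANONICAL_ORDER]
    if unknown:
        raise ValueError(f"unknown operations: {unknown}")
    # sorted() is stable, so repeated operations keep their relative order.
    ranked = sorted(ops, key=CANONICAL_ORDER.index)
    if envelope.get("strict_order", False) and ranked != ops:
        raise ValueError(f"steps out of canonical order: {ops}")
    return ranked
```

With `strict_order` off, an envelope listing impose before rewrite is simply reordered; with it on, the same envelope is rejected.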
One lineage record per producer step, threading input/output SHA-256 and cache_key through the chain. The memory backend is the default; S3 and Redis backends are available for durable storage.
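A lineage chain of this kind can be sketched with a small record type and a check that each step's input hash equals the previous step's output hash. The `LineageRecord` fields and the `chain_is_intact` helper are hypothetical names for illustration, not the service's actual record schema.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    # Hypothetical record shape: one entry per producer step.
    producer: str
    input_sha256: str
    output_sha256: str
    cache_key: str


def chain_is_intact(records: List[LineageRecord]) -> bool:
    """True when every step consumed exactly what the previous step produced."""
    return all(prev.output_sha256 == nxt.input_sha256
               for prev, nxt in zip(records, records[1:]))
```

A broken link means some file in the chain was altered outside a recorded step.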
Every producer ships post-condition gates: schema, determinism, nothing-else-touched, plus a producer-specific layer (marks-hash, cell-extract, delta_e tolerance).
Pdf.save(deterministic_id=True), no wall-clock time inside engines, fixed-decimal numeric formatting. Same input + same plan → same SHA-256 across machines.
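The same idea in miniature: the sketch below renders a plan with sorted keys and fixed-decimal numbers, with no timestamps, then hashes the bytes. It uses only the standard library and stands in for a real PDF engine; `fixed`, `render`, and `digest` are hypothetical helpers, and the four-decimal precision is an assumption.

```python
import hashlib
from decimal import Decimal, ROUND_HALF_EVEN


def fixed(value: float, places: int = 4) -> str:
    """Format a number with a fixed count of decimals, independent of locale."""
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal('0.0001') for places=4
    return str(Decimal(repr(value)).quantize(quantum, rounding=ROUND_HALF_EVEN))


def render(plan: dict) -> bytes:
    """Serialize a plan deterministically: sorted keys, fixed decimals, no clock."""
    lines = [f"{key} {fixed(plan[key])}" for key in sorted(plan)]
    return "\n".join(lines).encode("ascii")


def digest(plan: dict) -> str:
    """SHA-256 of the rendered plan; equal plans give equal digests."""
    return hashlib.sha256(render(plan)).hexdigest()
```

Because key order and float formatting are pinned, two machines given the same plan produce the same digest.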
Cache keys are composed from the codex_pdf wheel version, the color/geom/document schema versions, the producer, the plan SHA, and the input SHA. Bumping any of those versions changes the key, so stale entries invalidate automatically.
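One way such a key could be composed is to hash a canonical JSON encoding of its components. The function signature, field names, and the use of JSON here are assumptions for illustration; only the list of components comes from the description above.

```python
import hashlib
import json
from typing import Dict


def cache_key(wheel_version: str, schema_versions: Dict[str, str],
              producer: str, plan_sha: str, input_sha: str) -> str:
    """Hypothetical cache key: SHA-256 over a canonical JSON payload."""
    payload = json.dumps(
        {
            "wheel": wheel_version,
            "schemas": schema_versions,  # e.g. color / geom / document versions
            "producer": producer,
            "plan": plan_sha,
            "input": input_sha,
        },
        sort_keys=True,           # dict ordering must not affect the key
        separators=(",", ":"),    # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Changing any single component, such as one schema version, yields a different key, so older cache entries are never looked up again.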
trap-extract walks PDF content streams, finds spot-ink rectangles, and emits suggested trap_zones for every adjacency, so auto-trap pipelines no longer need hand-drawn zones.
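The adjacency step can be sketched as follows, assuming the rectangles have already been pulled out of the content stream. Here two rectangles count as adjacent when they carry different inks and share an edge segment of non-zero length; that definition, the `InkRect` type, and the zone dictionary shape are all assumptions, not the trap-extract output format.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Segment = Tuple[float, float, float, float]  # x0, y0, x1, y1


@dataclass(frozen=True)
class InkRect:
    ink: str
    x0: float
    y0: float
    x1: float
    y1: float


def shared_edge(a: InkRect, b: InkRect, tol: float = 1e-6) -> Optional[Segment]:
    """Return the edge segment two axis-aligned rectangles share, if any."""
    for first, second in ((a, b), (b, a)):
        # Vertical edge: right side of `first` touches left side of `second`.
        if abs(first.x1 - second.x0) <= tol:
            lo, hi = max(a.y0, b.y0), min(a.y1, b.y1)
            if hi - lo > tol:
                return (first.x1, lo, first.x1, hi)
        # Horizontal edge: top of `first` touches bottom of `second`.
        if abs(first.y1 - second.y0) <= tol:
            lo, hi = max(a.x0, b.x0), min(a.x1, b.x1)
            if hi - lo > tol:
                return (lo, first.y1, hi, first.y1)
    return None


def suggest_zones(rects: List[InkRect]) -> List[Dict]:
    """Emit one suggested zone per pair of touching rectangles with different inks."""
    zones = []
    for a, b in combinations(rects, 2):
        if a.ink == b.ink:
            continue
        edge = shared_edge(a, b)
        if edge is not None:
            zones.append({"inks": (a.ink, b.ink), "edge": edge})
    return zones
```

Rectangles that only meet at a corner, or that share an ink, produce no zone in this sketch.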
COMPILE_AUTH_MODE gates producer + CJD + lineage endpoints (bearer / api-key / internal / basic). Healthz / contract / version / metrics stay open by design.
Five Celery task wrappers (one per producer, plus one for CJD). Workers run via `celery -A compile_pdf.tasks worker`; the celery_workers field of /v1/healthz surfaces the live worker count.
Open source · managed hosting
A toolkit of focused, standalone PDF utilities: extraction, preflight, viewing, assembly, imposition planning, and an asset store. Each one plugs into the prepress workflow you already run. Run the open-source code yourself, or let us host any single tool for you on work.withsynergy.io.
Structured PDF extraction API that turns complex files into consistent JSON.
Programmatic PDF assembly — a deterministic API build step for rewriting and generating print-ready PDFs.
Detection-only PDF preflight engine — 500+ checks plus the PDF/X-4 conformance suite.
Embeddable PDF viewer with separations, TAC, layers, and annotation overlays.
PDF assay and metadata reporting — surface what's actually inside the file.
WYSIWYG canvas editor for label and packaging artwork — PDF/X-4 output, flexo support, and a full create-to-RIP workflow.
Stateless imposition-planning solver — step-and-repeat, gang, and true-shape nesting.
Content-addressed digital-asset plane — versioned blobs, a presigned data plane, and on-prem agent recall.