Document AILive

Digitizer

If you can read it, it will be digitized. Even when you can't.

Drop in a PDF or image (printed, scanned, faxed, rotated, multilingual, even handwritten) and get back clean, structured JSON. A hybrid pipeline combines classical tooling (Tesseract OSD, bounding-box geometry) with a vision-language model (Qwen3-VL-235B), so each step uses the cheapest tool that solves it correctly. Every field comes back with a bounding box, so the UI can highlight exactly what the model saw. API-key authentication makes it agent-ready out of the box.
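To make the highlighting claim concrete, here is a minimal sketch of how a front end might turn a stored box into an on-screen rectangle. It assumes boxes are stored as `(x0, y0, x1, y1)` fractions of the upright page's width and height. That convention and the names `PixelRect` and `to_pixel_rect` are illustrative, not Digitizer's documented schema.

```python
from typing import NamedTuple


class PixelRect(NamedTuple):
    """A highlight rectangle in rendered-page pixels (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def to_pixel_rect(bbox: tuple[float, float, float, float],
                  page_width_px: int, page_height_px: int) -> PixelRect:
    """Map a normalized (x0, y0, x1, y1) box onto a page rendered at the given size.

    Assumes coordinates are fractions of the page (0.0 to 1.0), measured from
    the top-left corner of the upright page.
    """
    # Clamp stray values slightly outside the page back onto it.
    x0, y0, x1, y1 = (min(max(v, 0.0), 1.0) for v in bbox)
    # Round the edges, not the width/height, so adjacent boxes share pixel edges.
    left, right = round(x0 * page_width_px), round(x1 * page_width_px)
    top, bottom = round(y0 * page_height_px), round(y1 * page_height_px)
    return PixelRect(left, top, max(right - left, 0), max(bottom - top, 0))
```

For a page rendered at 1000 x 800 pixels, `to_pixel_rect((0.1, 0.2, 0.5, 0.25), 1000, 800)` yields `PixelRect(left=100, top=160, width=400, height=40)`. Storing fractions rather than pixels keeps the same box valid at any zoom level.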

Try it with your PDF

How it works

Handles anything you point at it

Handwritten cursive, sideways scans, smudged receipts, faxes, multilingual invoices, free-form letters. No template, no schema, no labels required.

Two-tier rotation, never sideways

Tesseract OSD (orientation and script detection) handles the 80% case for pennies. When OSD's confidence drops, a VLM fallback judges the orientation instead. Pages always come back upright.
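The two-tier rule can be sketched as follows. The sketch parses the plain-text report Tesseract prints in OSD-only mode (`--psm 0`, which is also what `pytesseract.image_to_osd` returns by default), trusts its `Rotate` value when `Orientation confidence` clears a cutoff, and otherwise defers to a VLM. The 2.0 cutoff and the `vlm_judge` callable are assumptions for illustration; the product's actual threshold and fallback prompt are not documented here.

```python
from typing import Callable

# Hypothetical cutoff: below this OSD "Orientation confidence", ask the VLM instead.
OSD_MIN_CONFIDENCE = 2.0


def parse_osd(osd_text: str) -> tuple[int, float]:
    """Extract (rotate_degrees, orientation_confidence) from Tesseract OSD text."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Rotate"]), float(fields["Orientation confidence"])


def choose_rotation(
    osd_text: str,
    vlm_judge: Callable[[], int],
    min_conf: float = OSD_MIN_CONFIDENCE,
) -> tuple[int, str]:
    """Tier 1: trust OSD when confident. Tier 2: ask the (hypothetical) VLM judge.

    Returns the correction angle in degrees and which tier produced it.
    """
    try:
        rotate, conf = parse_osd(osd_text)
    except (KeyError, ValueError):
        # OSD produced no usable report (e.g. too little text): force the fallback.
        rotate, conf = 0, 0.0
    if conf >= min_conf:
        return rotate % 360, "osd"
    return vlm_judge() % 360, "vlm"
```

Keeping the VLM behind a confidence gate is what makes the common case cheap: the model is only called for pages where OSD is unsure or fails outright.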

Tight bounding boxes via a second-pass locator

The VLM's bounding boxes can drift by 5-10%. After extraction, a focused locator call snaps each box to its cell, so highlights line up with the source.
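One way to implement such a second pass, sketched under assumptions: pad the rough box by a margin that covers the expected drift, crop that window, ask a locator for the field's box relative to the crop, and map the answer back to page coordinates. The `locate` callable stands in for the focused VLM call and is hypothetical, as is the 10% default margin, which simply mirrors the upper end of the drift figure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; coordinates are fractions of the enclosing frame."""
    x0: float
    y0: float
    x1: float
    y1: float


def pad(box: Box, margin: float) -> Box:
    """Grow a page-space box by `margin` (fraction of the page) per side, clamped to the page."""
    return Box(max(box.x0 - margin, 0.0), max(box.y0 - margin, 0.0),
               min(box.x1 + margin, 1.0), min(box.y1 + margin, 1.0))


def crop_to_page(local: Box, window: Box) -> Box:
    """Convert a box expressed relative to `window` back into page coordinates."""
    w, h = window.x1 - window.x0, window.y1 - window.y0
    return Box(window.x0 + local.x0 * w, window.y0 + local.y0 * h,
               window.x0 + local.x1 * w, window.y0 + local.y1 * h)


def snap(rough: Box, locate: Callable[[Box], Optional[Box]], margin: float = 0.10) -> Box:
    """Second-pass refinement; keeps the rough box if the locator finds nothing."""
    window = pad(rough, margin)
    local = locate(window)  # hypothetical VLM call on the cropped window
    return rough if local is None else crop_to_page(local, window)
```

Asking the locator about a small crop, rather than the whole page, is the point of the second pass: the model only has to find one field in a tight region, and the coordinate transform back to the page is exact arithmetic.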

Structured JSON, not OCR text

Every field is named, typed, and tied to coordinates. Line items become tables, lists become numbered lists, and prose stays prose. Render it, ingest it, or hand it to an agent.
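A hypothetical shape for that output, written as Python dataclasses: each extracted field carries a name, a type tag, a value, and a normalized bounding box, and repeated rows are grouped into a table whose cells are themselves located fields. The class names, keys, and the made-up invoice values below are illustrative only; the service's real schema is the one its API spec describes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Literal, Union

# Hypothetical convention: normalized (x0, y0, x1, y1) on the upright page.
BBox = tuple[float, float, float, float]


@dataclass
class ExtractedField:
    name: str                                   # e.g. "invoice_number"
    type: Literal["string", "number", "date"]   # tag telling consumers how to read `value`
    value: Union[str, float]
    bbox: BBox                                  # where the value sits on the page


@dataclass
class Table:
    name: str
    columns: list[str]
    rows: list[list[ExtractedField]]            # one located field per cell
    bbox: BBox


@dataclass
class Page:
    number: int
    fields: list[ExtractedField] = field(default_factory=list)
    tables: list[Table] = field(default_factory=list)


def to_json(page: Page) -> str:
    """Serialize via asdict (recurses into nested dataclasses); tuples become JSON arrays."""
    return json.dumps(asdict(page), indent=2)


# Made-up example values, for illustration only.
page = Page(
    number=1,
    fields=[ExtractedField("invoice_number", "string", "INV-0042", (0.62, 0.08, 0.81, 0.11))],
    tables=[Table(
        name="line_items",
        columns=["description", "quantity"],
        rows=[[ExtractedField("description", "string", "Widget", (0.10, 0.40, 0.45, 0.43)),
               ExtractedField("quantity", "number", 2.0, (0.50, 0.40, 0.55, 0.43))]],
        bbox=(0.08, 0.37, 0.92, 0.60),
    )],
)
```

Because every cell is a full field with its own box, a UI can highlight a single table cell as easily as a top-level value.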

Agent-ready

API-key login means an LLM agent can authenticate, upload a document, and consume the JSON without a browser. The Swagger UI is at /docs, and the raw OpenAPI spec is at /openapi.json.
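A minimal sketch of an agent's first two calls, using only the standard library and building the requests without sending them. The host, the `/api/documents` upload path, the `file` form field, and the `Authorization: Bearer` scheme are all assumptions; an agent should read the real routes and auth scheme from /openapi.json.

```python
import urllib.request
import uuid

BASE_URL = "https://digitizer.example.com"  # hypothetical host
UPLOAD_PATH = "/api/documents"              # hypothetical route; confirm in /openapi.json


def spec_request(base_url: str) -> urllib.request.Request:
    """GET the OpenAPI spec so the agent can discover endpoints and schemas."""
    return urllib.request.Request(f"{base_url}/openapi.json", method="GET")


def upload_request(base_url: str, api_key: str, filename: str,
                   pdf_bytes: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one PDF (field name is assumed)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        base_url + UPLOAD_PATH,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

An agent would pass each request to `urllib.request.urlopen` and read the JSON body with `json.load`. Fetching the spec first lets it discover the actual upload route instead of hard-coding one.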

Tested at every layer

Rotation detection, bbox geometry, the vision locator, network retry, and frontend rendering each have unit and integration tests. Not a notebook.
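Network retry is the kind of layer that is easy to test in isolation. Here is a sketch of a retry helper with exponential backoff and full jitter, plus a unit test that injects a fake `sleep` so it runs instantly. The helper's name, its defaults, and which exceptions count as transient are assumptions, not Digitizer's actual retry policy.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    call: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    transient: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time in [0, base_delay * 2**attempt].
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


def test_retries_then_succeeds() -> None:
    outcomes = [ConnectionError(), TimeoutError(), "ok"]
    delays: list[float] = []

    def flaky() -> str:
        result = outcomes.pop(0)
        if isinstance(result, BaseException):
            raise result
        return result

    # Injecting `delays.append` as the sleeper records waits without blocking.
    assert with_retry(flaky, sleep=delays.append) == "ok"
    assert len(delays) == 2  # one wait per failed attempt
    assert 0 <= delays[0] <= 0.5 and 0 <= delays[1] <= 1.0
```

Making `sleep` a parameter is the design choice that keeps the test fast and deterministic about the number of waits, even though the wait lengths themselves are random.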

Ready to try Digitizer?
