Document AI that reads contracts and invoices the way finance needs
A pipeline that pulls structured, finance-ready data out of commercial contracts and vendor invoices — text or scanned.
Context
Two of the most paper-bound jobs in finance are reading commercial contracts (to know what to bill, when, and how to recognise it) and processing vendor invoices (to book them correctly). Both are usually done by eye, line by line — slow, inconsistent, and exactly where revenue-recognition and booking errors creep in.
Challenge
Extract the data finance actually needs: not just "what does this document say," but milestones, payment schedules, revenue-recognition triggers and tax fields. Do it reliably across both clean digital PDFs and messy scanned documents, and land it in a structure the team can work from.
What I built
A document-AI pipeline with the right tool for each document type:
- Commercial / licensing contracts → a language-model extraction pass that pulls milestones, scope activities, payment schedules and revenue-recognition flags into a structured master tracker.
- Vendor invoices and import documents → a layout-aware extraction pass (digital text where available, a vision model as fallback for scans) that captures the booking-relevant fields and supports reconciliation against the ERP.
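The routing described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the `Document` class, the `route` and `extract_invoice_fields` functions, the `"contract"` / `"invoice"` labels and the `MIN_TEXT_CHARS` threshold are all hypothetical names chosen for the example, and the three extractors are passed in as plain callables so that no particular model or library is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical threshold: with fewer usable characters than this in the
# text layer, the document is treated as a scan.
MIN_TEXT_CHARS = 40


@dataclass
class Document:
    doc_type: str             # assumed labels: "contract" or "invoice"
    embedded_text: str        # digital text layer, "" for a pure scan
    image_bytes: bytes = b""  # page image, used only by the vision fallback


def extract_invoice_fields(
    doc: Document,
    from_text: Callable[[str], Dict],
    from_image: Callable[[bytes], Dict],
) -> Dict:
    """Prefer the digital text layer; fall back to the vision pass for scans."""
    if len(doc.embedded_text.strip()) >= MIN_TEXT_CHARS:
        return {**from_text(doc.embedded_text), "source": "text"}
    return {**from_image(doc.image_bytes), "source": "vision"}


def route(
    doc: Document,
    contract_extractor: Callable[[str], Dict],
    text_extractor: Callable[[str], Dict],
    vision_extractor: Callable[[bytes], Dict],
) -> Dict:
    """Send each document type to its own extraction pass."""
    if doc.doc_type == "contract":
        return contract_extractor(doc.embedded_text)
    if doc.doc_type == "invoice":
        return extract_invoice_fields(doc, text_extractor, vision_extractor)
    raise ValueError(f"unsupported document type: {doc.doc_type!r}")
```

Recording which path produced each record (the `source` key) is a design choice made for this sketch: it lets a reviewer spot-check vision-derived fields more heavily than fields read from a clean text layer.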
The output is structured data, not a summary — it feeds the billing checklist, the revenue register, and the booking process directly. Crucially, the extraction is shaped by the *accounting* questions, so it surfaces the things that change a number, not just the things that are easy to read.
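To make "structured data, not a summary" concrete, here is one possible shape for an extracted milestone and two views derived from it. The `Milestone` fields and the `billing_checklist` and `revenue_register` helpers are assumptions made for illustration; the actual tracker columns are not specified in this write-up.

```python
from dataclasses import dataclass, asdict
from datetime import date
from decimal import Decimal
from typing import Dict, List


@dataclass(frozen=True)
class Milestone:
    # Hypothetical schema: one row of the master tracker.
    contract_id: str
    description: str
    due_date: date
    amount: Decimal          # Decimal avoids float rounding on money
    currency: str
    triggers_revenue: bool   # revenue-recognition flag


def billing_checklist(milestones: List[Milestone], as_of: date) -> List[Milestone]:
    """Milestones due on or before `as_of`, earliest first."""
    due = [m for m in milestones if m.due_date <= as_of]
    return sorted(due, key=lambda m: m.due_date)


def revenue_register(milestones: List[Milestone]) -> List[Dict]:
    """Rows for milestones flagged as revenue-recognition triggers."""
    return [asdict(m) for m in milestones if m.triggers_revenue]
```

Because both views are computed from the same records, a milestone that the extraction pass captures once shows up consistently in billing and in revenue recognition, rather than being re-keyed into each.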
Outcome
Contract and invoice data is captured in minutes and in a consistent shape, instead of being read by hand and re-keyed. The errors that come from a missed milestone or a misread field move from "found at audit" to "caught at intake."