What we walked into.
Taxation documents are confidential, so hosted third-party LLMs (OpenAI's models, Anthropic's Claude, xAI's Grok) were off the table: everything had to run on open-source models deployed on private servers. The documents contain free-form handwritten text with no consistent format, as well as information that had to be retrieved from government monograms.
The solution.
A dual vision-language model pipeline running entirely on the client's private infrastructure.
Extraction accuracy above 98% using only a 2B-parameter model, small enough to run economically on-premises.
What changed.
Previously, four people worked full-time on review and data entry, processing about 480 documents a day between them. Each document now takes under 5 seconds to process.
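At under 5 seconds per document, a pipeline running around the clock can process more than 17,000 documents a day, against roughly 120 per reviewer before: over 140 times one person's daily output, an increase of more than 14,000% in processing speed.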
Manual workload for this task eliminated entirely, freeing staff for higher-value work.
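The throughput figures above come from simple arithmetic. The sketch below reproduces them. It assumes the pipeline runs 24 hours a day at exactly 5 seconds per document and handles one document at a time. Because the source says "under 5 seconds", real throughput would be somewhat higher. The constant names are illustrative only.

```python
# Back-of-envelope check of the throughput figures quoted above.
# Assumptions (not from the source): 24-hour operation, exactly 5 s per
# document (the stated upper bound), one document processed at a time.

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
SECONDS_PER_DOC = 5              # stated upper bound per document

MANUAL_DOCS_PER_DAY = 480        # four reviewers combined
REVIEWERS = 4

# Documents the pipeline can finish in one day at 5 s each.
pipeline_docs_per_day = SECONDS_PER_DAY // SECONDS_PER_DOC   # 17,280

# Average daily output of a single reviewer under the manual process.
manual_docs_per_person = MANUAL_DOCS_PER_DAY // REVIEWERS    # 120

# Ratio against one reviewer, and the same ratio as a percentage increase.
speedup = pipeline_docs_per_day / manual_docs_per_person     # 144.0
increase_pct = (speedup - 1) * 100                           # 14,300.0

print(f"Pipeline: {pipeline_docs_per_day:,} docs/day")
print(f"Manual:   {manual_docs_per_person} docs/person/day")
print(f"Speed-up: {speedup:.0f}x ({increase_pct:,.0f}% increase)")
```

This gives 17,280 documents a day, 144 times one reviewer's output (a 14,300% increase). Measured against the whole team's combined 480 documents a day, the same pipeline figure is a 36x increase.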