The Challenge
AyuHealth's business model hinged on processing medical bills and surgery insurance cases faster than anyone else. But insurance document processing is a nightmare — hospitals submit poorly-scanned paperwork across 10+ document categories, each requiring different compliance checks and data extraction.
The manual process took 5 hours per case. At that speed, the unit economics didn't work. Without automation, AyuHealth couldn't scale — and without scale, the Series A pitch had no teeth.
They needed:
- Automated classification across 10+ document categories with near-human accuracy
- Compliance checking and data extraction that could handle the reality of messy hospital paperwork
- Processing speed that would make hospital partners choose AyuHealth over competitors
- Cost-efficient AI — burning $5k/month on LLM calls wasn't sustainable at startup scale
The Solution
Plenvo built the entire document AI pipeline in a founding-engineer role, from image preprocessing through classification, extraction, and compliance checking.
Three-Stage AI Pipeline
The key architectural decision: don't run expensive Document AI on every page. Medical submissions often contain 50+ pages, but only 10-15 are relevant for processing.
- Stage 1: Document Classification — classify each incoming page into its document category (discharge summary, bill, prescription, lab report, insurance form, etc.) to understand what you're dealing with before any extraction begins
- Stage 2: Relevance Triage — ChatGPT identifies which classified pages are actually relevant for processing based on content signals — fast, cheap filtering
- Stage 3: Deep Extraction — Google Document AI runs extraction only on the relevant, pre-classified pages that matter
This three-stage approach cut LLM costs from $5k to $2k/month (60% savings) without sacrificing accuracy. By classifying first, the system knew exactly what type of extraction to run on each page — and by triaging second, it avoided running expensive extraction on irrelevant pages entirely.
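The control flow of the three stages can be sketched as follows. This is a minimal illustration, not AyuHealth's production code: the `Page` type and the functions `classify_page`, `is_relevant`, and `extract_fields` are hypothetical stand-ins, and simple keyword rules take the place of the real classifier, the ChatGPT triage call, and the Google Document AI extraction call.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    text: str                 # rough page text; how it is obtained is an assumption
    category: str = "unknown"
    relevant: bool = False


def classify_page(page: Page) -> str:
    """Stage 1 stand-in: keyword rules instead of a real classification model."""
    keywords = {
        "discharge": "discharge_summary",
        "invoice": "bill",
        "prescription": "prescription",
    }
    lowered = page.text.lower()
    for word, category in keywords.items():
        if word in lowered:
            return category
    return "other"


def is_relevant(page: Page) -> bool:
    """Stage 2 stand-in: a cheap filter instead of an LLM triage call."""
    return page.category != "other"


def extract_fields(page: Page) -> dict:
    """Stage 3 stand-in: placeholder for the expensive extraction call."""
    return {"page": page.number, "category": page.category}


def run_pipeline(pages: list[Page]) -> list[dict]:
    """Classify every page, triage, and extract only from relevant pages."""
    results = []
    for page in pages:
        page.category = classify_page(page)
        page.relevant = is_relevant(page)
        if not page.relevant:
            continue  # skip the expensive stage for irrelevant pages
        results.append(extract_fields(page))
    return results


if __name__ == "__main__":
    submission = [
        Page(1, "Discharge summary for patient"),
        Page(2, "Hospital cafeteria menu"),
        Page(3, "Invoice total: 1200"),
    ]
    for row in run_pipeline(submission):
        print(row)
```

The point of the ordering is that the expensive call sits behind two cheaper gates, so its cost scales with the number of relevant pages rather than with the size of the submission.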
Document Classification & Extraction
- Google Document AI + Vision API + Gemini for multi-model classification across 10+ categories — discharge summaries, bills, prescriptions, lab reports, insurance forms
- Custom OCR pipeline for edge cases where standard models struggled with handwritten notes or degraded scans
- >95% classification accuracy — high enough that human reviewers only needed to check edge cases
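One common way to limit human review to edge cases is to route predictions by model confidence. The sketch below assumes each classification comes with a confidence score and uses an illustrative threshold of 0.90; the `Prediction` type, the `route` function, and the threshold value are assumptions for this example, not details from the AyuHealth system.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    page: int
    category: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


REVIEW_THRESHOLD = 0.90  # illustrative value, not a figure from the project


def route(predictions, threshold=REVIEW_THRESHOLD):
    """Split predictions into auto-accepted and needs-human-review lists."""
    auto, review = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            auto.append(prediction)
        else:
            review.append(prediction)
    return auto, review


if __name__ == "__main__":
    batch = [
        Prediction(1, "bill", 0.99),
        Prediction(2, "lab_report", 0.62),
        Prediction(3, "prescription", 0.93),
    ]
    auto, review = route(batch)
    print("auto-accepted pages:", [p.page for p in auto])
    print("pages for review:", [p.page for p in review])
```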
Image Preprocessing
Hospital paperwork arrives in every condition imaginable — skewed scans, low contrast, partial pages, phone photos of documents.
- OpenCV-based preprocessing pipeline — deskewing, contrast enhancement, noise reduction, bounding box detection
- Document readability improved to >90% before hitting the classification models
- Visual annotation with bounding boxes on source documents so reviewers could see exactly what the AI extracted and from where
The Impact
- Processing time: 5 hours → 1 hour per case (an 80% reduction); hospital partners reported the fastest processing they had ever seen
- 30,000+ documents processed daily at production scale
- >95% classification accuracy across 10+ document categories
- LLM costs cut 60% ($5k → $2k/month) through intelligent three-stage processing
- Core revenue engine — this system's speed became AyuHealth's primary competitive advantage and a key part of every investor pitch through $100M+ in funding
Testimonial
"Plenvo didn't just build our tech — they were our tech team. The speed and quality of what they delivered was instrumental to every fundraise we did."
— Founding Team, AyuHealth