From Spreadsheet Hell to Building an AI-Powered Receipt Tracker SaaS
- ai
- saas
- nextjs
- convex
- inngest
The Problem That Started Everything
I was tired of shoebox receipts and spreadsheet hell. Every month, the same ritual: dig through paper receipts, manually type numbers into Excel, pray I didn't miss any for tax season.
So I did what any developer would do: I built an AI-powered solution. But here's the twist: I didn't just build a demo. I built a real SaaS product that handles auth, async processing, and usage limits, and scales to thousands of users.
This post covers the evolution of a basic receipt parser into something much larger.
What I Built
Receipt Tracker is a "Drop & Forget" scanner that turns messy receipt photos into structured data.
The experience:
- Drop a receipt image (or PDF)
- See "Processing..." for ~2 seconds
- Get back: merchant name, date, total, and every single line item
- Export to CSV, track spending, done.
But the real magic happens behind the scenes:
- Users never wait for AI (async workflows)
- Failed uploads automatically retry (no "try again" buttons)
- Free users get 10 scans/month without breaking my bank
- Type-safe from browser to database to AI
- Real-time dashboard updates (no refresh needed)
Let me show you how I built each piece.
Architecture: The Big Picture
Here's the complete data flow:
📱 User drops receipt
↓
☁️ Upload to Convex edge storage (direct, bypassing my server)
↓
💾 Save metadata: { status: "pending", fileId, userId }
↓
🔔 Trigger Inngest event
↓
🤖 AI Agent Network extracts data (Gemini 2.0)
↓
✅ Validate with Zod → Save to database
↓
📊 UI updates automatically (real-time subscription)
The key decision: I separated upload from processing. Users get instant feedback, the AI works in the background, and the UI updates on its own.
The Stack (And Why I Chose It)
Next.js 15 + Convex = Backend Without the Backend
I didn't want to manage PostgreSQL, write migrations, or deal with ORMs.
Convex gave me:
- Type-safe database in TypeScript (no SQL)
- Real-time subscriptions out of the box
- Built-in file storage with global CDN
- Zero DevOps
Example: Define schema, get instant TypeScript types:
```typescript
recipts: defineTable({
  status: v.string(), // 'pending' | 'proceed' | 'error'
  merchantName: v.optional(v.string()),
  items: v.array(v.object({
    name: v.string(),
    quantity: v.number(),
    totalPrice: v.number()
  }))
})
```
No migrations. No connection pools. Just data shapes.
Gemini 2.0 Flash Lite = Speed + Cost
Why not GPT-4 Vision?
Performance comparison (1000 receipts/month):
- Gemini 2.0: ~2 seconds, ~$8
- GPT-4 Vision: ~15 seconds, ~$120
Users won't wait 15 seconds. They'll close the tab and never come back.
Inngest = Reliability That Just Works
My first version used API routes. It was a disaster:
- Timeouts on large files
- No retries on failures
- Can't track progress
Inngest fixed everything:
- Automatic retries with exponential backoff
- Background processing (users never wait)
- Built-in observability
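The post never shows the background job itself, so here's a minimal sketch of its shape, with each step injected so it composes like Inngest's `step.run` calls. All names are illustrative, and Inngest's retry handling is collapsed into a single catch for brevity:

```typescript
// Hypothetical shape of the background pipeline an Inngest function runs.
type Steps = {
  download: (fileId: string) => Promise<Uint8Array>;
  extract: (bytes: Uint8Array) => Promise<unknown>;      // Gemini call
  validateAndSave: (data: unknown) => Promise<void>;     // Zod + Convex
  markError: (msg: string) => Promise<void>;
};

async function processReceipt(
  fileId: string,
  steps: Steps
): Promise<"proceed" | "error"> {
  try {
    const bytes = await steps.download(fileId);
    const data = await steps.extract(bytes);
    await steps.validateAndSave(data);
    return "proceed"; // status the UI subscribes to
  } catch (e) {
    await steps.markError(String(e));
    return "error";
  }
}
```

In the real function, a retryable failure would be rethrown so Inngest can back off and retry, instead of being caught here.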
The Schema: State Management is Everything
The most important field in my database:
```typescript
status: v.string() // 'pending' | 'proceed' | 'error'
```
This single field drives my entire UX:
- pending → Show spinner
- proceed → Display extracted data
- error → Show retry button
Pro tip: Use strings, not booleans. You'll add more states later (I added "retrying" and "expired" after launch).
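Here's a sketch of how that one field can drive the UI as a string union (the extra `retrying`/`expired` states are the ones mentioned above; the mapping is illustrative):

```typescript
type ReceiptStatus = "pending" | "proceed" | "error" | "retrying" | "expired";

// One field, one switch: every screen derives its state from `status`.
function statusToUi(status: ReceiptStatus): "spinner" | "data" | "retry-button" {
  switch (status) {
    case "pending":
    case "retrying":
      return "spinner";      // still working, keep the user informed
    case "proceed":
      return "data";         // extraction done, show the receipt
    case "error":
    case "expired":
      return "retry-button"; // let the user kick it off again
  }
}
```

Adding a new state later is a one-line type change plus a new `case`, which is exactly why strings beat booleans here.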
Structured Data > JSON Blobs
I store items as typed arrays:
```typescript
items: v.array(v.object({
  name: v.string(),
  quantity: v.number(),
  unitPrice: v.number()
}))
```
Why this matters:
- Query: "Show all Starbucks receipts over $20"
- Aggregate: "What's my average spending per item?"
- Export: Generate CSV without parsing JSON
Structure data when it enters your system, not when you need it.
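Here's the kind of query typed arrays make trivial, as a plain-TypeScript sketch over in-memory data (the receipt shape is simplified; the real queries run inside Convex):

```typescript
type Item = { name: string; quantity: number; unitPrice: number };
type Receipt = { merchantName: string; items: Item[] };

const receiptTotal = (r: Receipt): number =>
  r.items.reduce((sum, i) => sum + i.quantity * i.unitPrice, 0);

// "Show all Starbucks receipts over $20" — a filter, not a JSON-parsing chore.
function merchantOver(
  receipts: Receipt[],
  merchant: string,
  threshold: number
): Receipt[] {
  return receipts.filter(
    (r) => r.merchantName === merchant && receiptTotal(r) > threshold
  );
}
```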
The Brain: Why I Use Agent Networks
Most tutorials show one function that does everything. That's not production.
I built three specialized agents:
1. Supervisor Agent (The coordinator)
- Routes tasks between agents
- Checks: "Are we done? Did it save?"
- Terminates workflow on success
2. Scanner Agent (The AI whisperer)
- Converts PDFs/images to structured JSON
- Handles OCR errors, multi-page receipts
- Calls Gemini 2.0 with schema validation
3. Database Agent (The validator)
- Validates data with Zod
- Saves to Convex
- Tracks usage in Schematic (billing)
The win: When extraction fails, I know it's the Scanner Agent. When saving fails, I know it's the Database Agent. Debugging is 10x easier.
The Prompt: Forcing Structured Output
Amateur approach: "Extract data from this receipt"
My approach: Force the exact schema in the prompt:
Extract and return THIS EXACT JSON:
```json
{
  "merchant": { "name": "...", "address": "..." },
  "transaction": { "date": "YYYY-MM-DD" },
  "items": [{ "name": "...", "quantity": 2, "totalPrice": 10.50 }]
}
```
Then wrap with Zod validation:
```typescript
z.object({
  merchantName: z.string(),
  transactionAmount: z.string(),
  items: z.array(z.object({ ... }))
})
```
What happens on failure?
- Zod catches bad data before database
- Inngest retries automatically
- I get logs with exact validation error
The lesson: Trust, but verify. Even the best AI models hallucinate.
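In the app this check is a Zod schema's `safeParse`; here's the same trust-but-verify decision as a dependency-free sketch (field names are illustrative):

```typescript
type Extracted = {
  merchantName: string;
  items: { name: string; totalPrice: number }[];
};

// Reject bad AI output before it touches the database. Zod does this
// declaratively; the decision it produces looks like this:
function verify(
  raw: unknown
): { ok: true; data: Extracted } | { ok: false; error: string } {
  const r = raw as Partial<Extracted> | null;
  if (typeof r?.merchantName !== "string")
    return { ok: false, error: "merchantName missing" };
  if (!Array.isArray(r.items))
    return { ok: false, error: "items is not an array" };
  for (const i of r.items) {
    if (typeof i?.name !== "string" || typeof i?.totalPrice !== "number")
      return { ok: false, error: "bad line item" };
  }
  return { ok: true, data: r as Extracted };
}
```

On an `ok: false` result the job can fail fast (retrying bad data is pointless); on network or model exceptions, rethrowing lets Inngest retry.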
The Upload Flow: Security + Performance
What I DON'T do:
User → My API route → Storage
Problems: bottleneck, timeouts, double bandwidth costs.
What I DO:
User → Signed URL → Convex edge storage (direct)
The flow:
- Call `generateUploadUrl()` (returns a temporary signed URL)
- Browser uploads directly to Convex edge
- Save metadata to database
- Trigger background job
Why signed URLs?
- Expire in 60 seconds (secure)
- Scoped to one upload (can't abuse)
- Global CDN (fast worldwide)
- My server never touches the 10MB file
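The whole flow fits in one small function. Here's a sketch with the Convex pieces injected as dependencies so the shape is clear; in the app, `getUploadUrl` would call the `generateUploadUrl` mutation and `upload` would be a browser `fetch` to the signed URL. Names are illustrative:

```typescript
type Deps = {
  getUploadUrl: () => Promise<string>; // signed, short-lived
  upload: (url: string, file: Uint8Array) => Promise<{ storageId: string }>;
  saveMetadata: (m: { storageId: string; status: "pending" }) => Promise<void>;
};

// The server never sees the file bytes — only the metadata row
// whose "pending" status triggers the background job.
async function uploadReceipt(file: Uint8Array, deps: Deps): Promise<string> {
  const url = await deps.getUploadUrl();
  const { storageId } = await deps.upload(url, file);
  await deps.saveMetadata({ storageId, status: "pending" });
  return storageId;
}
```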
Real-Time UI: No Polling Needed
Old way (polling):
```typescript
setInterval(() => {
  fetch('/api/status')
}, 2000)
```
Problems: Wastes API calls, laggy updates, drains batteries.
My way (Convex subscriptions):
```typescript
const receipt = useQuery(api.recipts.getReceiptById, { id })
```
What happens:
- User uploads → status: "pending"
- UI shows spinner (automatic)
- Background job processes → status: "proceed"
- UI updates instantly (no refresh!)
Users think it's magic. It's just reactive data.
Monetization: Don't Let Free Users Bankrupt You
The trap: I launched with "10 free scans/month"
What happened:
- One user created 5 accounts with temp emails
- 50 free scans = $4 in API costs
- Multiply by 100 users...
My Two-Layer Defense
Frontend Gatekeeper:
```typescript
const { data: usage } = useSchematicEntitlement("scans")
if (usage?.exceeded) return showUpgradeModal()
```
Backend Accountant:
```typescript
// Only count SUCCESSFUL scans
await client.track({
  event: "scan",
  user: { id: userId }
})
```
Why track on success?
- Failed scans don't count
- Users don't pay for my bugs
- Builds trust
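The two layers reduce to a small amount of logic. A dependency-free sketch (Schematic's hook and `track` call replace these plain objects in the app; numbers are illustrative):

```typescript
type Usage = { used: number; limit: number };

// Layer 1: the UI gate — pure UX, never trusted as enforcement.
const canScan = (u: Usage): boolean => u.used < u.limit;

// Layer 2: the server-side count — incremented only after a successful
// scan, so failed scans (my bugs) never cost the user quota.
const recordSuccess = (u: Usage): Usage => ({ ...u, used: u.used + 1 });
```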
Error Handling: The Unsexy Critical Part
What Can Go Wrong
- Upload: file too large, unsupported format, network timeout
- AI: blurry image, foreign language, not a receipt
- Database: validation error, race condition, quota exceeded
My Retry Strategy
Inngest automatically retries:
- Attempt 1: Immediate
- Attempt 2: 1 second
- Attempt 3: 2 seconds
- Attempt 4: 4 seconds
- ... up to 10 attempts
But I decide what to retry:
- ✅ Retry: Network errors, rate limits, AI timeouts
- ❌ Fail fast: Validation errors, corrupted files, permissions
The pattern:
```typescript
if (error.isRetryable) {
  throw error // Let Inngest retry
} else {
  await updateStatus("error", error.message)
  return // Don't waste retries
}
```
The Dashboard: Making Data Useful
My dashboard answers:
- "How much did I spend this month?" → $1,847.23
- "Which merchant do I visit most?" → Starbucks (12 times)
- "What am I buying repeatedly?" → Oat milk lattes
The magic: Real-time updates without polling.
When a receipt finishes processing:
- Background job updates status
- Convex pushes change to all connected clients
- Dashboard updates automatically
- Badge changes yellow → green
No refresh button. Just works.
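The dashboard questions above are simple folds over the structured receipts. A plain-TypeScript sketch over in-memory data (the real queries run inside Convex; shapes simplified):

```typescript
type Receipt = { merchantName: string; date: string; total: number };

// "How much did I spend this month?" — `date` is ISO "YYYY-MM-DD".
function monthlySpend(receipts: Receipt[], yearMonth: string): number {
  return receipts
    .filter((r) => r.date.startsWith(yearMonth))
    .reduce((sum, r) => sum + r.total, 0);
}

// "Which merchant do I visit most?"
function topMerchant(receipts: Receipt[]): string | null {
  const visits = new Map<string, number>();
  for (const r of receipts) {
    visits.set(r.merchantName, (visits.get(r.merchantName) ?? 0) + 1);
  }
  let best: string | null = null;
  for (const [name, count] of visits) {
    if (best === null || count > (visits.get(best) ?? 0)) best = name;
  }
  return best;
}
```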
Production Checklist (What I Learned)
Before launching, I verified:
✅ Security:
- Clerk authentication
- Authorization (users only see their receipts)
- Signed upload URLs
- Zod validation everywhere
✅ Performance:
- Direct edge uploads
- Async processing
- Real-time subscriptions
- Optimistic updates
✅ Reliability:
- Automatic retries
- Error boundaries
- Status tracking
- Logging + monitoring
✅ Business:
- Usage metering (Schematic)
- Upgrade flows
- Analytics
- Cost alerts
Key Lessons: Demo vs Production
1. Separate Upload from Processing
Users get instant feedback. AI works in background. Game changer.
2. Use Agent Networks, Not One Function
When something fails, I know exactly where. Debugging is 10x easier.
3. Type Safety Everywhere
Zod validates AI outputs, form inputs, environment variables. Catches bugs before production.
4. Real-Time > Polling
Users expect instant updates. Convex makes this trivial.
5. Meter Usage From Day One
Retrofitting is exponentially harder than building it in from the start.
6. Fail Gracefully
Every operation can fail. Automatic retries + clear errors = professional app.
The Results
After 3 months:
- 2,847 users
- 10,349 receipts processed
- $127 in AI costs
- 99.2% extraction accuracy
- 4.8/5 star rating
What users love:
- "Feels instant even though it's AI"
- "Never seen my receipts this organized"
- "Just works. No weird errors."
What surprised me:
- Real-time updates create "wow" moments
- Users trust the system because failed scans don't count
- Agent architecture makes debugging 10x easier
What's Next
Features I'm adding:
- CSV export for QuickBooks/Xero
- Team accounts (share with accountant)
- Categorization agent (auto-tag meals, travel, supplies)
- Budget alerts
- Mobile app (same Convex backend)
The bigger picture: This architecture works for any AI-powered SaaS:
- Document analysis (contracts, invoices)
- Image processing (ID verification, damage assessment)
- Content generation (reports, summaries)
The pattern is universal:
- User submits input
- Store metadata immediately
- Process async with AI
- Validate & persist
- Update UI in real-time
- Meter usage
Conclusion: What Makes It Production-Ready
The difference between a demo and a product isn't the AI model.
It's:
- Reliability of background processing
- Security of file uploads
- Clarity of error messages
- Fairness of usage limits
- Speed of real-time updates
I didn't just build a receipt scanner. I built a system that:
- Handles 10,000+ receipts without breaking
- Costs $0.01 per scan (sustainable)
- Users trust (failed scans don't count)
- I can debug (agent separation)
- Scales without rewrites (agent architecture)
The real lesson: Production AI isn't about the model. It's about the system around it.
Want to build your own AI SaaS? The architecture I shared scales to any document/image processing app. Start with this foundation, swap in your own AI logic, and you're 80% there.
GitHub: https://github.com/realsudarshan/extracter
Live Demo: https://reciept-extracter.vercel.app/
Questions? @realsudarshan
Happy building! 🚀
P.S. - The biggest mistake I almost made? Trying to handle everything in API routes. Switching to Inngest + agent networks was the best architectural decision I made. Your future self will thank you.