From Spreadsheet Hell to Building an AI-Powered Receipt Tracker SaaS
- ai
- saas
- nextjs
- convex
- inngest
The Problem That Started Everything
I was tired of shoebox receipts and spreadsheet hell. Every month, the same ritual: dig through paper receipts, manually type numbers into Excel, pray I didn't miss any for tax season.
So I did what any developer would do: I built an AI-powered solution. But here's the twist: I didn't just build a demo. I built a real SaaS product that handles auth, async processing, and usage limits, and scales to thousands of users.
This post covers the evolution of a basic receipt parser into something much larger.
What I Built
Receipt Tracker is a "Drop & Forget" scanner that turns messy receipt photos into structured data.
The experience:
- Drop a receipt image (or PDF)
- See "Processing..." for ~2 seconds
- Get back: merchant name, date, total, and every single line item
- Export to CSV, track spending, done.
But the real magic happens behind the scenes:
- Users never wait for AI (async workflows)
- Failed uploads automatically retry (no "try again" buttons)
- Free users get 10 scans/month without breaking my bank
- Type-safe from browser to database to AI
- Real-time dashboard updates (no refresh needed)
Let me show you how I built each piece.
Architecture: The Big Picture
Here's the complete data flow:
📱 User drops receipt
↓
☁️ Upload to Convex edge storage (direct, bypassing my server)
↓
💾 Save metadata: { status: "pending", fileId, userId }
↓
🔔 Trigger Inngest event
↓
🤖 AI Agent Network extracts data (Gemini 2.0)
↓
✅ Validate with Zod → Save to database
↓
📊 UI updates automatically (real-time subscription)
The key decision: I separated upload from processing. Users get instant feedback, the AI works in the background, and the UI updates on its own.
The Stack (And Why I Chose It)
Next.js 15 + Convex = Backend Without the Backend
I didn't want to manage PostgreSQL, write migrations, or deal with ORMs.
Convex gave me:
- Type-safe database in TypeScript (no SQL)
- Real-time subscriptions out of the box
- Built-in file storage with global CDN
- Zero DevOps
Example: Define schema, get instant TypeScript types:
```typescript
recipts: defineTable({
  status: v.string(), // 'pending' | 'proceed' | 'error'
  merchantName: v.optional(v.string()),
  items: v.array(v.object({
    name: v.string(),
    quantity: v.number(),
    totalPrice: v.number()
  }))
})
```
No migrations. No connection pools. Just data shapes.
Gemini 2.0 Flash Lite = Speed + Cost
Why not GPT-4 Vision?
Performance comparison (1000 receipts/month):
- Gemini 2.0: ~2 seconds, ~$8
- GPT-4 Vision: ~15 seconds, ~$120
Users won't wait 15 seconds. They'll close the tab and never come back.
Inngest = Reliability That Just Works
My first version used API routes. It was a disaster:
- Timeouts on large files
- No retries on failures
- Can't track progress
Inngest fixed everything:
- Automatic retries with exponential backoff
- Background processing (users never wait)
- Built-in observability
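The post never shows the background job itself, so here's a minimal sketch of its shape, with each step injected so it composes like Inngest's `step.run` calls. All names are illustrative, and Inngest's retry handling is collapsed into a single catch for brevity:

```typescript
// Hypothetical shape of the background pipeline an Inngest function runs.
type Steps = {
  download: (fileId: string) => Promise<Uint8Array>;
  extract: (bytes: Uint8Array) => Promise<unknown>;      // Gemini call
  validateAndSave: (data: unknown) => Promise<void>;     // Zod + Convex
  markError: (msg: string) => Promise<void>;
};

async function processReceipt(
  fileId: string,
  steps: Steps
): Promise<"proceed" | "error"> {
  try {
    const bytes = await steps.download(fileId);
    const data = await steps.extract(bytes);
    await steps.validateAndSave(data);
    return "proceed"; // status the UI subscribes to
  } catch (e) {
    await steps.markError(String(e));
    return "error";
  }
}
```

In the real function, a retryable failure would be rethrown so Inngest can back off and retry, instead of being caught here.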
The Schema: State Management is Everything
The most important field in my database:
```typescript
status: v.string() // 'pending' | 'proceed' | 'error'
```
This single field drives my entire UX:
- pending → Show spinner
- proceed → Display extracted data
- error → Show retry button
Pro tip: Use strings, not booleans. You'll add more states later (I added "retrying" and "expired" after launch).
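Here's a sketch of how that one field can drive the UI as a string union (the extra `retrying`/`expired` states are the ones mentioned above; the mapping is illustrative):

```typescript
type ReceiptStatus = "pending" | "proceed" | "error" | "retrying" | "expired";

// One field, one switch: every screen derives its state from `status`.
function statusToUi(status: ReceiptStatus): "spinner" | "data" | "retry-button" {
  switch (status) {
    case "pending":
    case "retrying":
      return "spinner";      // still working, keep the user informed
    case "proceed":
      return "data";         // extraction done, show the receipt
    case "error":
    case "expired":
      return "retry-button"; // let the user kick it off again
  }
}
```

Adding a new state later is a one-line type change plus a new `case`, which is exactly why strings beat booleans here.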
Structured Data > JSON Blobs
I store items as typed arrays:
```typescript
items: v.array(v.object({
  name: v.string(),
  quantity: v.number(),
  unitPrice: v.number()
}))
```
Why this matters:
- Query: "Show all Starbucks receipts over $20"
- Aggregate: "What's my average spending per item?"
- Export: Generate CSV without parsing JSON
Structure data when it enters your system, not when you need it.
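Here's the kind of query typed arrays make trivial, as a plain-TypeScript sketch over in-memory data (the receipt shape is simplified; the real queries run inside Convex):

```typescript
type Item = { name: string; quantity: number; unitPrice: number };
type Receipt = { merchantName: string; items: Item[] };

const receiptTotal = (r: Receipt): number =>
  r.items.reduce((sum, i) => sum + i.quantity * i.unitPrice, 0);

// "Show all Starbucks receipts over $20" — a filter, not a JSON-parsing chore.
function merchantOver(
  receipts: Receipt[],
  merchant: string,
  threshold: number
): Receipt[] {
  return receipts.filter(
    (r) => r.merchantName === merchant && receiptTotal(r) > threshold
  );
}
```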
The Brain: Why I Use Agent Networks
Most tutorials show one function that does everything. That's not production.
I built three specialized agents:
1. Supervisor Agent (The coordinator)
- Routes tasks between agents
- Checks: "Are we done? Did it save?"
- Terminates workflow on success
2. Scanner Agent (The AI whisperer)
- Converts PDFs/images to structured JSON
- Handles OCR errors, multi-page receipts
- Calls Gemini 2.0 with schema validation
3. Database Agent (The validator)
- Validates data with Zod
- Saves to Convex
- Tracks usage in Schematic (billing)
The win: When extraction fails, I know it's the Scanner Agent. When saving fails, I know it's the Database Agent. Debugging is 10x easier.
The Prompt: Forcing Structured Output
Amateur approach: "Extract data from this receipt"
My approach: Force the exact schema in the prompt:
Extract and return THIS EXACT JSON:
```json
{
  "merchant": { "name": "...", "address": "..." },
  "transaction": { "date": "YYYY-MM-DD" },
  "items": [{ "name": "...", "quantity": 2, "totalPrice": 10.50 }]
}
```
Then wrap with Zod validation:
```typescript
z.object({
  merchantName: z.string(),
  transactionAmount: z.string(),
  items: z.array(z.object({ ... }))
})
```
What happens on failure?
- Zod catches bad data before database
- Inngest retries automatically
- I get logs with exact validation error
The lesson: Trust, but verify. Even the best AI models hallucinate.
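In the app this check is a Zod schema's `safeParse`; here's the same trust-but-verify decision as a dependency-free sketch (field names are illustrative):

```typescript
type Extracted = {
  merchantName: string;
  items: { name: string; totalPrice: number }[];
};

// Reject bad AI output before it touches the database. Zod does this
// declaratively; the decision it produces looks like this:
function verify(
  raw: unknown
): { ok: true; data: Extracted } | { ok: false; error: string } {
  const r = raw as Partial<Extracted> | null;
  if (typeof r?.merchantName !== "string")
    return { ok: false, error: "merchantName missing" };
  if (!Array.isArray(r.items))
    return { ok: false, error: "items is not an array" };
  for (const i of r.items) {
    if (typeof i?.name !== "string" || typeof i?.totalPrice !== "number")
      return { ok: false, error: "bad line item" };
  }
  return { ok: true, data: r as Extracted };
}
```

On an `ok: false` result the job can fail fast (retrying bad data is pointless); on network or model exceptions, rethrowing lets Inngest retry.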
The Upload Flow: Security + Performance
What I DON'T do:
User → My API route → Storage
Problems: bottleneck, timeouts, double bandwidth costs.
What I DO:
User → Signed URL → Convex edge storage (direct)
The flow:
- Call `generateUploadUrl()` (returns a temporary signed URL)
- Browser uploads directly to Convex edge
- Save metadata to database
- Trigger background job
Why signed URLs?
- Expire in 60 seconds (secure)
- Scoped to one upload (can't abuse)
- Global CDN (fast worldwide)
- My server never touches the 10MB file
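The whole flow fits in one small function. Here's a sketch with the Convex pieces injected as dependencies so the shape is clear; in the app, `getUploadUrl` would call the `generateUploadUrl` mutation and `upload` would be a browser `fetch` to the signed URL. Names are illustrative:

```typescript
type Deps = {
  getUploadUrl: () => Promise<string>; // signed, short-lived
  upload: (url: string, file: Uint8Array) => Promise<{ storageId: string }>;
  saveMetadata: (m: { storageId: string; status: "pending" }) => Promise<void>;
};

// The server never sees the file bytes — only the metadata row
// whose "pending" status triggers the background job.
async function uploadReceipt(file: Uint8Array, deps: Deps): Promise<string> {
  const url = await deps.getUploadUrl();
  const { storageId } = await deps.upload(url, file);
  await deps.saveMetadata({ storageId, status: "pending" });
  return storageId;
}
```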
Real-Time UI: No Polling Needed
Old way (polling):
```typescript
setInterval(() => {
  fetch('/api/status')
}, 2000)
```
Problems: Wastes API calls, laggy updates, drains batteries.
My way (Convex subscriptions):
```typescript
const receipt = useQuery(api.recipts.getReceiptById, { id })
```
What happens:
- User uploads → status: "pending"
- UI shows spinner (automatic)
- Background job processes → status: "proceed"
- UI updates instantly (no refresh!)
Users think it's magic. It's just reactive data.
Monetization: Don't Let Free Users Bankrupt You
The trap: I launched with "10 free scans/month"
What happened:
- One user created 5 accounts with temp emails
- 50 free scans = $4 in API costs
- Multiply by 100 users...
My Two-Layer Defense
Frontend Gatekeeper:
```typescript
const { data: usage } = useSchematicEntitlement("scans")
if (usage?.exceeded) return showUpgradeModal()
```
Backend Accountant:
```typescript
// Only count SUCCESSFUL scans
await client.track({
  event: "scan",
  user: { id: userId }
})
```
Why track on success?
- Failed scans don't count
- Users don't pay for my bugs
- Builds trust
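The two layers reduce to a small amount of logic. A dependency-free sketch (Schematic's hook and `track` call replace these plain objects in the app; numbers are illustrative):

```typescript
type Usage = { used: number; limit: number };

// Layer 1: the UI gate — pure UX, never trusted as enforcement.
const canScan = (u: Usage): boolean => u.used < u.limit;

// Layer 2: the server-side count — incremented only after a successful
// scan, so failed scans (my bugs) never cost the user quota.
const recordSuccess = (u: Usage): Usage => ({ ...u, used: u.used + 1 });
```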
Error Handling: The Unsexy Critical Part
What Can Go Wrong
- Upload: file too large, unsupported format, network timeout
- AI: blurry image, foreign language, not a receipt
- Database: validation error, race condition, quota exceeded
My Retry Strategy
Inngest automatically retries:
- Attempt 1: Immediate
- Attempt 2: 1 second
- Attempt 3: 2 seconds
- Attempt 4: 4 seconds
- ... up to 10 attempts
But I decide what to retry:
- ✅ Retry: Network errors, rate limits, AI timeouts
- ❌ Fail fast: Validation errors, corrupted files, permissions
The pattern:
```typescript
if (error.isRetryable) {
  throw error // Let Inngest retry
} else {
  await updateStatus("error", error.message)
  return // Don't waste retries
}
```
The Dashboard: Making Data Useful
My dashboard answers:
- "How much did I spend this month?" → $1,847.23
- "Which merchant do I visit most?" → Starbucks (12 times)
- "What am I buying repeatedly?" → Oat milk lattes
The magic: Real-time updates without polling.
When a receipt finishes processing:
- Background job updates status
- Convex pushes change to all connected clients
- Dashboard updates automatically
- Badge changes yellow → green
No refresh button. Just works.
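The dashboard questions above are simple folds over the structured receipts. A plain-TypeScript sketch over in-memory data (the real queries run inside Convex; shapes simplified):

```typescript
type Receipt = { merchantName: string; date: string; total: number };

// "How much did I spend this month?" — `date` is ISO "YYYY-MM-DD".
function monthlySpend(receipts: Receipt[], yearMonth: string): number {
  return receipts
    .filter((r) => r.date.startsWith(yearMonth))
    .reduce((sum, r) => sum + r.total, 0);
}

// "Which merchant do I visit most?"
function topMerchant(receipts: Receipt[]): string | null {
  const visits = new Map<string, number>();
  for (const r of receipts) {
    visits.set(r.merchantName, (visits.get(r.merchantName) ?? 0) + 1);
  }
  let best: string | null = null;
  for (const [name, count] of visits) {
    if (best === null || count > (visits.get(best) ?? 0)) best = name;
  }
  return best;
}
```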
Production Checklist (What I Learned)
Before launching, I verified:
✅ Security:
- Clerk authentication
- Authorization (users only see their receipts)
- Signed upload URLs
- Zod validation everywhere
✅ Performance:
- Direct edge uploads
- Async processing
- Real-time subscriptions
- Optimistic updates
✅ Reliability:
- Automatic retries
- Error boundaries
- Status tracking
- Logging + monitoring
✅ Business:
- Usage metering (Schematic)
- Upgrade flows
- Analytics
- Cost alerts
Key Lessons: Demo vs Production
1. Separate Upload from Processing
Users get instant feedback. AI works in background. Game changer.
2. Use Agent Networks, Not One Function
When something fails, I know exactly where. Debugging is 10x easier.
3. Type Safety Everywhere
Zod validates AI outputs, form inputs, environment variables. Catches bugs before production.
4. Real-Time > Polling
Users expect instant updates. Convex makes this trivial.
5. Meter Usage From Day One
Retrofitting is exponentially harder than building it in from the start.
6. Fail Gracefully
Every operation can fail. Automatic retries + clear errors = professional app.
The Results
After 3 months:
- 2,847 users
- 10,349 receipts processed
- $127 in AI costs
- 99.2% extraction accuracy
- 4.8/5 star rating
What users love:
- "Feels instant even though it's AI"
- "Never seen my receipts this organized"
- "Just works. No weird errors."
What surprised me:
- Real-time updates create "wow" moments
- Users trust the system because failed scans don't count
- Agent architecture makes debugging 10x easier
What's Next
Features I'm adding:
- CSV export for QuickBooks/Xero
- Team accounts (share with accountant)
- Categorization agent (auto-tag meals, travel, supplies)
- Budget alerts
- Mobile app (same Convex backend)
The bigger picture: This architecture works for any AI-powered SaaS:
- Document analysis (contracts, invoices)
- Image processing (ID verification, damage assessment)
- Content generation (reports, summaries)
The pattern is universal:
- User submits input
- Store metadata immediately
- Process async with AI
- Validate & persist
- Update UI in real-time
- Meter usage
Conclusion: What Makes It Production-Ready
The difference between a demo and a product isn't the AI model.
It's:
- Reliability of background processing
- Security of file uploads
- Clarity of error messages
- Fairness of usage limits
- Speed of real-time updates
I didn't just build a receipt scanner. I built a system that:
- Handles 10,000+ receipts without breaking
- Costs $0.01 per scan (sustainable)
- Users trust (failed scans don't count)
- I can debug (agent separation)
- Scales without rewrites (agent architecture)
The real lesson: Production AI isn't about the model. It's about the system around it.
Want to build your own AI SaaS? The architecture I shared scales to any document/image processing app. Start with this foundation, swap in your own AI logic, and you're 80% there.
GitHub: https://github.com/realsudarshan/extracter
Live Demo: https://reciept-extracter.vercel.app/
Questions? @realsudarshan
Happy building! 🚀
P.S. - The biggest mistake I almost made? Trying to handle everything in API routes. Switching to Inngest + agent networks was the best architectural decision I made. Your future self will thank you.