AI Financial Agent Case Study | Multi-Agent RAG System for Fintech MVP

The Challenge

A pre-seed, pre-PMF fintech startup wanted to build an AI-powered financial assistant that could connect to a user's Gmail, extract and classify financial data (transactions, subscriptions, bills, investment statements), and provide personalized financial insights and recommendations — all through a conversational interface.

The core challenges:

Financial data is scattered across hundreds of emails in different formats from different institutions
Users need accurate calculations, not LLM-hallucinated math — "How much did I spend on food last month?" requires real computation
The AI needs to understand context while working with constantly changing data
The system had to run cheaply enough for a pre-seed startup with no revenue

They needed a founding engineer who could architect the entire system and ship a production-ready MVP fast enough to raise funding.

The Solution

A full-stack and AI engineer joined as a founding member for a 5-month engagement (April–August 2025). A production-ready MVP shipped in the first 2 months, followed by 3 months of continued iteration.

Data Ingestion Pipeline

Connected to users' Gmail accounts via OAuth and built n8n workflows to:

Pull financial emails — bank statements, transaction alerts, subscription confirmations, investment updates
Preprocess and normalize data into a consistent schema across multiple email formats
Run automated classification and preprocessing with error handling and retry logic
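The normalization step can be illustrated with a minimal Python sketch. It assumes each email has already been parsed into key/value fields. The FinancialRecord schema, its field names, the default currency, and the date format are hypothetical and chosen only for illustration; they are not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class FinancialRecord:
    """Hypothetical normalized schema; field names are illustrative only."""
    source: str
    occurred_on: date
    amount: Decimal
    currency: str
    description: str


def parse_amount(raw: str) -> Decimal:
    """Turn strings such as '$1,234.50' or '1234.5 USD' into a Decimal."""
    # Keep digits, the decimal point, and a minus sign; drop symbols and commas.
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch in ".-")
    return Decimal(cleaned)


def normalize(source: str, fields: dict[str, str]) -> FinancialRecord:
    """Map one parsed email (already reduced to key/value fields) to the schema."""
    return FinancialRecord(
        source=source,
        # Assumption: dates arrive as ISO 'YYYY-MM-DD' strings.
        occurred_on=datetime.strptime(fields["date"], "%Y-%m-%d").date(),
        amount=parse_amount(fields["amount"]),
        # Assumption: USD when the email does not state a currency.
        currency=fields.get("currency", "USD"),
        description=fields.get("description", "").strip(),
    )


record = normalize(
    "bank_alert",
    {"date": "2025-03-14", "amount": "$1,234.50", "description": " Grocery store "},
)
print(record)
```

Using Decimal rather than float for amounts avoids binary rounding errors in later sums.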

2-Level Classification Pipeline

Built a classification system to categorize financial data with >95% accuracy:

Level 1: Broad categorization — income, expense, investment, subscription, bill
Level 2: Granular sub-categories — food, transport, SaaS, utilities, etc.

Accurate classification is what lets the AI answer specific questions like "How much am I spending on subscriptions?" without pulling irrelevant context.
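One way to keep the two levels consistent is to check every model-produced label against a fixed taxonomy. The sketch below is illustrative: the Level 1 names come from the list above, while the exact Level 2 sets, the "other" fallback, and the validate_label helper are assumptions.

```python
# Level 1 categories mirror the list above; the Level 2 sets are illustrative.
TAXONOMY: dict[str, set[str]] = {
    "income": {"salary", "refund", "other"},
    "expense": {"food", "transport", "other"},
    "investment": {"dividend", "contribution", "other"},
    "subscription": {"saas", "streaming", "other"},
    "bill": {"utilities", "rent", "other"},
}


def validate_label(level1: str, level2: str) -> tuple[str, str]:
    """Check a (level 1, level 2) label pair returned by a classifier.

    An unknown level 1 is rejected outright. An unknown level 2 falls back
    to 'other', so the record is still usable for broad queries.
    """
    l1 = level1.strip().lower()
    l2 = level2.strip().lower()
    if l1 not in TAXONOMY:
        raise ValueError(f"unknown level 1 category: {level1!r}")
    if l2 not in TAXONOMY[l1]:
        l2 = "other"
    return l1, l2


print(validate_label(" Expense", "Food"))
print(validate_label("bill", "gym"))
```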

RAG Architecture

Context-aware retrieval system using Qdrant vector database with Voyage embeddings:

Financial data embedded and stored with metadata (date, category, amount, source)
Semantic search pulling only relevant financial records for each query
Metadata filtering to scope queries by time period, category, or source
Reduced context window usage by 10x compared to a naive "load everything" approach
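Metadata filtering of this kind can be expressed as a Qdrant filter object combining `must`, `match`, and `range` conditions. The sketch below only builds that object as a plain dictionary. The payload field names `category` and `timestamp`, and the choice to store dates as Unix timestamps, are assumptions for illustration rather than the project's actual payload layout.

```python
from datetime import datetime, timezone


def month_bounds(year: int, month: int) -> tuple[float, float]:
    """Return [start, end) of a calendar month as UTC Unix timestamps."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return start.timestamp(), end.timestamp()


def build_filter(category: str, year: int, month: int) -> dict:
    """Build a Qdrant-style filter scoping a search to one category and month.

    Assumption: each point's payload has a 'category' keyword field and a
    numeric 'timestamp' field.
    """
    start, end = month_bounds(year, month)
    return {
        "must": [
            {"key": "category", "match": {"value": category}},
            {"key": "timestamp", "range": {"gte": start, "lt": end}},
        ]
    }


food_in_march = build_filter("food", 2025, 3)
print(food_in_march)
```

Passing a filter like this alongside the query vector means only records from the requested category and period are candidates for semantic ranking.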

Tool Calling System

LLMs don't do math — tools do. Built dedicated calculation tools that the AI agent calls:

Sum/average/trend calculations across filtered datasets
Budget comparison tools (planned vs. actual)
Subscription tracking and cost projection
Investment return calculations

The agent decides when to use tools vs. when to respond conversationally, based on the query type.
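A minimal sketch of what such calculation tools might look like, assuming amounts are passed as decimal strings. The tool names, the TOOLS registry, and the call_tool dispatcher are hypothetical and not tied to any specific agent framework.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable

CENT = Decimal("0.01")


def total(amounts: list[Decimal]) -> Decimal:
    """Exact sum, rounded to cents."""
    return sum(amounts, Decimal("0")).quantize(CENT, rounding=ROUND_HALF_UP)


def average(amounts: list[Decimal]) -> Decimal:
    """Exact mean, rounded to cents. An empty list is an error, not zero."""
    if not amounts:
        raise ValueError("cannot average an empty list")
    mean = sum(amounts, Decimal("0")) / len(amounts)
    return mean.quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical registry: the agent names a tool, and code does the arithmetic.
TOOLS: dict[str, Callable[[list[Decimal]], Decimal]] = {
    "sum": total,
    "average": average,
}


def call_tool(name: str, amounts: list[str]) -> str:
    """Dispatch a tool call and return the result as a string for the LLM."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    values = [Decimal(a) for a in amounts]
    return str(TOOLS[name](values))


print(call_tool("sum", ["12.50", "7.25", "0.10"]))
```

The model only chooses the tool and its arguments; the number in the final answer comes from deterministic code.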

Multi-Agent System (Google ADK)

Primary Agent: Handles user conversation, decides what data to fetch, formulates responses
Validation Agent: Independently verifies calculations before they reach the user — catches hallucinated numbers
Agent orchestration runs the validation step behind the scenes, without slowing down the UX
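The source does not specify how the Validation Agent checks a figure. One plausible approach is to recompute the value from the underlying records and compare it with the number the primary agent is about to report. The sketch below is plain Python, not Google ADK code; validate_total and its tolerance parameter are hypothetical.

```python
from decimal import Decimal


def validate_total(claimed: str, amounts: list[str], tolerance: str = "0.01") -> bool:
    """Recompute a total from raw records and compare it with the claimed figure.

    Returns True only when the claimed number matches the recomputed sum
    within the tolerance. A False result should block the response.
    """
    recomputed = sum((Decimal(a) for a in amounts), Decimal("0"))
    return abs(Decimal(claimed) - recomputed) <= Decimal(tolerance)


records = ["12.50", "7.25", "0.10"]
print(validate_total("19.85", records))  # matches the recomputed sum
print(validate_total("21.00", records))  # a fabricated figure is rejected
```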

n8n Workflows

n8n as the orchestration layer for data pipelines:

Scheduled pulls from Gmail API, parsing email content, extracting financial data
Routing extracted data through the 2-level classification pipeline
Triggering agent responses and tool calls
Connecting Gmail, Supabase, Qdrant, and LLM APIs
Retry logic, dead letter queues, and failure alerting
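The retry and dead-letter pattern in the last item can be sketched in Python as follows. This illustrates the pattern only and is not n8n configuration; process_with_retry, its parameters, and the shape of the dead-letter entries are assumptions.

```python
import time
from typing import Callable


def process_with_retry(
    item: dict,
    handler: Callable[[dict], None],
    dead_letters: list[dict],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Run handler(item), retrying with exponential backoff on failure.

    After max_attempts failures, the item is parked in dead_letters with
    the last error so it can be inspected or replayed later.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            handler(item)
            return True
        except Exception as exc:  # broad on purpose: any failure is retried
            last_error = exc
            if attempt < max_attempts:
                # Delay doubles each time: base, 2x base, 4x base, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    dead_letters.append(
        {"item": item, "error": repr(last_error), "attempts": max_attempts}
    )
    return False
```

A failure alert can then be triggered whenever the dead-letter list grows, rather than on every transient error.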

Tech Stack

Frontend: React with clean, intuitive UI for non-technical users
Backend: Node.js, FastAPI
AI/LLM: GPT-4 (complex reasoning), Gemini 2.0 Flash (classification/preprocessing)
Agent Framework: Google ADK
Vector DB: Qdrant with Voyage embeddings
Automation: n8n (self-hosted workflows)
Auth: Supabase (Gmail OAuth)

Production Challenges Solved

Context Rot

Initially, too much financial data was loaded into the LLM context, and the AI would miss important details buried in a wall of transactions. The 2-level classification pipeline solved this: classify the query first, then retrieve only the relevant records via RAG. The result was a 10x context reduction along with improved answer accuracy.

Hallucinated Calculations

When asked "How much did I spend on food in March?", the AI would sometimes fabricate numbers. Dedicated calculation tools with tool calling fixed this — the AI identifies that a calculation is needed, calls the appropriate tool, and a validation agent independently checks the math before the response is sent.

Stale Memory

The AI reused cached responses for dynamic queries, so a user asking about current spending could get data from days ago. Prompt engineering now forces fresh tool calls for time-sensitive queries, bypassing cached responses and triggering live data retrieval. Time-sensitive queries return fresh data more than 90% of the time.

Cost Control

Multi-agent systems with RAG, tool calling, and validation get expensive fast. Strategic model selection kept costs down: Gemini 2.0 Flash for cheaper classification and preprocessing, with GPT-4 reserved for complex reasoning and user-facing responses. Embedding generation and vector search were also optimized. Result: $100/month total running cost.
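The model-selection idea can be sketched as a simple routing table. The task names, the pick_model helper, and the model identifier strings are placeholders for illustration; the project's actual routing logic is not described in the source.

```python
# Hypothetical routing table: cheap model for high-volume tasks,
# stronger model only where reasoning quality matters.
MODEL_BY_TASK: dict[str, str] = {
    "classification": "gemini-2.0-flash",
    "preprocessing": "gemini-2.0-flash",
    "reasoning": "gpt-4",
    "user_response": "gpt-4",
}


def pick_model(task: str) -> str:
    """Return the model assigned to a task type.

    Unknown tasks raise instead of silently defaulting to the expensive model.
    """
    try:
        return MODEL_BY_TASK[task]
    except KeyError:
        raise ValueError(f"no model configured for task: {task!r}") from None


print(pick_model("classification"))
print(pick_model("reasoning"))
```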

The Impact

Production-ready MVP in 2 months — not a demo, a working product with real users
$100k in pre-seed funding raised — investors could see a working product, not just a pitch deck
$100/month running cost: showed the product could run affordably for a pre-revenue startup
>95% classification accuracy on the 2-level financial data pipeline
>90% fresh data retrieval success rate on time-sensitive queries
10x context reduction via classification + RAG, improving both accuracy and cost
5-month engagement: continued refining the product through August 2025

Testimonial

"Yatharth played a pivotal role in helping us put up our first alpha product. He took ownership of the entire backend development end-to-end from architecting the system and setting up the infrastructure to integrating with multiple external vendors and ensured that everything worked seamlessly together, with little oversight. What stood out to me most was his ability to translate high-level product goals into a technical foundation, even linking to cost effectiveness early on, and speed of iteration to get the product to a quality baseline, pushing product/growth stakeholders to gain traction before over-building systems. This enabled us to prototype and validate a range of AI-driven use cases in financial services with users and double down for beta/open market access. His technical depth, ownership mindset, and collaborative approach made him an indispensable part of an early team building in a complex space."

— Lizann Fernandez, Founder at Amalgamic
