AI Reporting Automation Case Study | Week-to-20-Minutes Client QBRs with Claude Skills + Custom MCP

The Challenge

A B2B demand-generation marketing agency produced recurring quarterly business reviews (QBRs) for its SaaS clients entirely by hand. Each report was a multi-step slog that only the founder could reliably get right:

Pull deal data out of a CRM and ad spend out of several advertising platforms
Reconcile everything in a spreadsheet of ~100 derived formulas — pipeline-per-dollar, win rate, ACV, sales-cycle length, funnel conversion, channel attribution, goals-vs-actuals
Write the narrative insights from scratch
Hand-assemble a branded, chart-laden deck

The result was a process that took roughly a week of skilled work per report, was inconsistent in quality, and didn't scale across a growing client roster. Worse, it was a single point of failure: report quality was effectively gated on the founder's involvement.

The goal: collapse all of it into a guided, mostly-automated workflow an operator could trigger in plain English — "run the Q1 report for client X" — while keeping a human in the loop for review.

The Solution

Plenvo designed and built the system end to end: a production Claude Skill that orchestrates the full pipeline, backed by a custom Model Context Protocol (MCP) server that handles the data integrations the platform couldn't do out of the box. The first working iteration shipped in two weeks, followed by roughly a month of refinement to reach production.

A Claude Skill that orchestrates the pipeline

A single packaged skill — instructions plus a suite of ~25 single-responsibility Python scripts, templates, and reference specs — running inside a Claude.ai Project. The operator triggers natural-language "verbs": onboard a client, validate inputs, run the analysis, generate the report, re-detect schema, resume.

Key decision — deterministic code, not prompt-held logic. The pipeline is a chain of small deterministic scripts (parse → validate → fetch → process → compute → generate insights → build charts → assemble → upload). Each reads JSON on stdin and writes JSON with explicit exit codes. Why: business math and reconciliation live in tested Python where they're independently verifiable and cheap to run; the model is reserved for orchestration and prose. This keeps results reproducible, minimizes token spend, and means a sandbox reset doesn't force a full re-run — steps are idempotent and resumable.
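To make the JSON-in/JSON-out contract concrete, here is a minimal sketch of what one such step might look like. The function names, the exit-code values, and the win-rate payload fields are illustrative assumptions, not the actual scripts from the project.

```python
import io
import json
import sys

# Hypothetical exit codes; the real pipeline's values are not documented here.
EXIT_OK = 0
EXIT_BAD_INPUT = 2


def compute_win_rate(payload: dict) -> dict:
    """Pure function: the same input always yields the same output."""
    won = payload["deals_won"]
    closed = payload["deals_closed"]
    if closed <= 0:
        raise ValueError("deals_closed must be positive")
    return {"win_rate": round(won / closed, 4)}


def run_step(stdin, stdout) -> int:
    """Read one JSON document, write one JSON document, return an exit code."""
    try:
        payload = json.load(stdin)
        result = compute_win_rate(payload)
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        # Failures are reported as JSON too, so the orchestrator can read them.
        json.dump({"ok": False, "error": str(exc)}, stdout)
        return EXIT_BAD_INPUT
    json.dump({"ok": True, "result": result}, stdout)
    return EXIT_OK
```

A script built this way would end with `sys.exit(run_step(sys.stdin, sys.stdout))`. Because the step holds no hidden state, re-running it on the same input after a sandbox reset produces the same output, which is what makes resuming safe.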

A custom MCP server for the write path

The built-in cloud-storage connector was read-only — you can't save a generated binary deck back to a client folder through it. So Plenvo built and deployed a purpose-built MCP server that:

Is drop-in compatible with the built-in connector's tools (same names, parameter shapes, response envelopes) — so the skill needed zero changes to swap connectors
Adds the ~11 operational tools the built-in lacked: folder creation, metadata updates, server-side copy, permissions, shareable links, document import, and chunked resumable uploads for large binaries
Handles binary uploads correctly — a MIME-aware encoder with an explicit text-vs-binary branch, avoiding the classic bug where a naïve UTF-8 path corrupts office documents
Speaks streamable-HTTP over OAuth 2.1 with least-privilege scopes — only what the workflow uses, no mail or calendar access — and is Dockerized for cloud deployment with a local stdio mode for desktop clients
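The text-vs-binary branch mentioned above can be sketched roughly as follows. This is an assumption-laden illustration rather than the server's actual encoder: `encode_for_upload`, the envelope keys, and the set of MIME types treated as text are all hypothetical.

```python
import base64
import mimetypes

# Hypothetical allow-list of non-"text/*" types that are still safe to treat as text.
TEXT_MIME_TYPES = {"application/json", "application/xml"}


def is_text_mime(mime_type: str) -> bool:
    return mime_type.startswith("text/") or mime_type in TEXT_MIME_TYPES


def encode_for_upload(filename: str, data: bytes) -> dict:
    """Pick an encoding from the MIME type instead of assuming UTF-8."""
    mime_type, _ = mimetypes.guess_type(filename)
    mime_type = mime_type or "application/octet-stream"  # unknown: treat as binary
    if is_text_mime(mime_type):
        return {
            "mime_type": mime_type,
            "encoding": "utf-8",
            "content": data.decode("utf-8"),
        }
    # Binary branch: base64 round-trips every byte, so office files survive intact.
    return {
        "mime_type": mime_type,
        "encoding": "base64",
        "content": base64.b64encode(data).decode("ascii"),
    }
```

Office documents are ZIP containers whose bytes are generally not valid UTF-8, so forcing them through a text path either raises a decode error or silently replaces bytes. Defaulting unknown types to the binary branch errs on the side of preserving data.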

Quality-gated insights, not vague slop

Key decision — codify what "good" means. Generated insights follow a structured Observation → Implication → Recommendation framework against a hard rubric: every insight must name a metric and number, be time-anchored, be comparative (vs. prior period / benchmark / target), and be actionable. A two-pass quality validator auto-fails banned filler and missing numbers; a deterministic fallback tags content [REVIEW NEEDED] after repeated failures. Why: the system degrades safely instead of shipping a vague report to a paying client.
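A simplified sketch of how such a rubric check and fallback might be expressed in code is shown below. The regular expressions, the banned-filler phrases, and the names `rubric_failures` and `select_or_flag` are assumptions made for illustration; the project's real validator and its two passes are not reproduced here, and the "actionable" criterion is reduced to requiring a non-empty recommendation.

```python
import re
from dataclasses import dataclass

# Hypothetical rubric patterns; the real validator's rules are not published.
BANNED_FILLER = ("going forward", "moving the needle", "it is important to note")
HAS_NUMBER = re.compile(r"\d")
TIME_ANCHOR = re.compile(r"\b(?:Q[1-4]|20\d{2})\b", re.IGNORECASE)
COMPARATIVE = re.compile(
    r"\b(?:vs|versus|compared (?:to|with)|up from|down from)\b", re.IGNORECASE
)


@dataclass
class Insight:
    observation: str
    implication: str
    recommendation: str

    def render(self) -> str:
        return " ".join((self.observation, self.implication, self.recommendation))


def rubric_failures(insight: Insight) -> list[str]:
    """Return every rubric violation; an empty list means the insight passes."""
    failures = []
    # Structural check: all three parts of the framework must be present.
    for field in ("observation", "implication", "recommendation"):
        if not getattr(insight, field).strip():
            failures.append(f"missing {field}")
    # Content checks: number, time anchor, comparison, no banned filler.
    text = insight.render()
    if not HAS_NUMBER.search(insight.observation):
        failures.append("observation names no number")
    if not TIME_ANCHOR.search(text):
        failures.append("not time-anchored")
    if not COMPARATIVE.search(text):
        failures.append("not comparative")
    lowered = text.lower()
    failures.extend(f"banned filler: {p}" for p in BANNED_FILLER if p in lowered)
    return failures


def select_or_flag(attempts: list[Insight], max_attempts: int = 3) -> str:
    """Return the first passing attempt, else tag the last one for human review."""
    considered = attempts[:max_attempts]
    if not considered:
        raise ValueError("at least one attempt is required")
    for insight in considered:
        if not rubric_failures(insight):
            return insight.render()
    return "[REVIEW NEEDED] " + considered[-1].render()
```

Keeping the checks deterministic means a failed insight fails for a nameable reason, which can be fed back into the next generation attempt or surfaced to the reviewer.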

Native, data-linked charts

The final decks embed native charts linked to their source tables, not static images. The skill builds chart-definition payloads sandbox-side as pure functions; the MCP issues the authenticated API calls, so the OAuth token never leaves the server. A retry layer rides out the eventual-consistency window where a just-created source briefly 404s.
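The retry layer could be approximated with a small helper like the one below. It is a sketch under stated assumptions: `NotFoundYet` is a hypothetical exception standing in for whatever the real client raises on a 404, and the attempt count and exponential backoff schedule are illustrative defaults, not the project's settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFoundYet(Exception):
    """Hypothetical error: the API returned 404 for a just-created resource."""


def retry_on_not_found(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` with exponential backoff while the resource is not visible."""
    for attempt in range(attempts):
        try:
            return call()
        except NotFoundYet:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Only the "not found yet" case is retried; any other error propagates immediately, so genuine failures such as a permissions problem are not masked by the backoff loop. Injecting `sleep` keeps the helper testable without real delays.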

Engineering quality

Regression eval suite (pytest) built from real connector responses and a real CRM export captured as fixtures — which caught real bugs (a script written against an imagined response shape; a fresh template falsely reporting itself as "confirmed"). PII fixtures live outside the shipped artifact.
Single-writer, create-newest-wins state model — mutable state modeled as immutable versioned files, a deliberate consistency choice given the storage layer's constraints. Schema drift in the CRM export is auto-detected and versioned for an audit trail instead of silent breakage.
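One way to picture the create-newest-wins model is the sketch below, which uses a local directory as a stand-in for the storage layer. The file-naming scheme and the helper names are assumptions for illustration; the actual system writes through the MCP server rather than the local filesystem.

```python
import json
from pathlib import Path
from typing import Optional


def _version_of(path: Path) -> int:
    # "client.v000002.json" -> stem "client.v000002" -> 2
    return int(path.stem.rsplit(".v", 1)[1])


def list_versions(state_dir: Path, name: str) -> list[Path]:
    """All versions of one state document, oldest first."""
    return sorted(state_dir.glob(f"{name}.v*.json"), key=_version_of)


def read_latest(state_dir: Path, name: str) -> Optional[dict]:
    """Newest wins: readers only ever look at the highest version."""
    versions = list_versions(state_dir, name)
    if not versions:
        return None
    return json.loads(versions[-1].read_text(encoding="utf-8"))


def write_new_version(state_dir: Path, name: str, state: dict) -> Path:
    """Never mutate in place; each update creates the next version file."""
    versions = list_versions(state_dir, name)
    next_version = _version_of(versions[-1]) + 1 if versions else 1
    path = state_dir / f"{name}.v{next_version:06d}.json"
    # Mode "x" refuses to overwrite, so an existing version can never be clobbered.
    with path.open("x", encoding="utf-8") as handle:
        json.dump(state, handle, sort_keys=True)
    return path
```

Because old versions are never rewritten, the history doubles as an audit trail, and a single writer avoids having to arbitrate between concurrent updates.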

Tech Stack

AI / Orchestration: Claude API, Claude Skills (Claude.ai Projects), structured prompt rubric
Integration: Custom MCP server, Model Context Protocol, OAuth 2.1, streamable-HTTP
Backend: Python, deterministic JSON-in/JSON-out scripts, pandas-style tabular processing
Infra & Testing: Docker, cloud PaaS deployment, pytest regression evals
Data sources: CRM export, multi-platform ad-data aggregation API, cloud-storage / spreadsheet / presentation REST APIs

The Impact

Report turnaround cut from ~1 week to ~20 minutes — the week-long, hands-on reconciliation-and-assembly loop became a plain-English trigger with a human review step at the end.
Consistent output, every run — because the math, rubric, and deck assembly are codified in tested code, the same inputs produce the same quality of report every time, eliminating the run-to-run variability of the manual process.
Founder-quality reports, team-wide — quality was previously gated on the founder. Now anyone on the team can generate the same caliber of report, because the logic lives in the system rather than in one person's head.
Scales across a growing client roster — onboarding a new client provisions its folder layout and goal targets; idempotent, resumable steps make repeat runs cheap and safe.
Safe by construction: the two-pass quality gate blocks vague output, least-privilege OAuth scopes limit the system to what the workflow uses, access to all ad platforms is read-only, and no PII is sent to the model.
First iteration in two weeks, full production hardening within roughly a month.

Testimonial

Testimonial pending — available on request from the client.
