Marketing & Advertising·Boutique B2B demand-generation agencyConfidential

Automating Client Reports with a Claude Skill + Custom MCP Server

Collapsed a week-long, hand-built quarterly reporting process into a Claude-powered pipeline an operator triggers in plain English — ingesting CRM and multi-platform ad data, computing a full metrics model, and assembling branded, chart-linked decks. What took a week of skilled work now runs in about 20 minutes, with consistent output anyone on the team can produce.

Claude SkillsClaude APIModel Context ProtocolPythonOAuth 2.1Docker
1 Week → ~20 Min
Report Turnaround
Founder-level, team-wide
Report Quality
11+
Custom MCP Tools
2 Weeks
First Iteration Shipped

Client details are confidential under NDA. Architecture and anonymized results are shared with permission; no client names, data, or contract terms are included.

The Challenge

A B2B demand-generation marketing agency produced recurring quarterly business reviews (QBRs) for its SaaS clients entirely by hand. Each report was a multi-step slog that only the founder could reliably get right:

  • Pull deal data out of a CRM and ad spend out of several advertising platforms
  • Reconcile everything in a spreadsheet of ~100 derived formulas — pipeline-per-dollar, win rate, ACV, sales-cycle length, funnel conversion, channel attribution, goals-vs-actuals
  • Write the narrative insights from scratch
  • Hand-assemble a branded, chart-laden deck

The result was a process that took roughly a week of skilled work per report, was inconsistent in quality, and didn't scale across a growing client roster. Worse, it was a single point of failure: report quality was effectively gated on the founder's involvement.

The goal: collapse all of it into a guided, mostly-automated workflow an operator could trigger in plain English — "run the Q1 report for client X" — while keeping a human in the loop for review.

The Solution

Plenvo designed and built the system end to end: a production Claude Skill that orchestrates the full pipeline, backed by a custom Model Context Protocol (MCP) server built and deployed to handle the data integrations the platform couldn't do out of the box. First working iteration shipped in two weeks, with roughly a month of refinement and back-and-forth to production.

A Claude Skill that orchestrates the pipeline

A single packaged skill — instructions plus a suite of ~25 single-responsibility Python scripts, templates, and reference specs — running inside a Claude.ai Project. The operator triggers natural-language "verbs": onboard a client, validate inputs, run the analysis, generate the report, re-detect schema, resume.

Key decision — deterministic code, not prompt-held logic. The pipeline is a chain of small deterministic scripts (parse → validate → fetch → process → compute → generate insights → build charts → assemble → upload). Each reads JSON on stdin and writes JSON with explicit exit codes. Why: business math and reconciliation live in tested Python where they're independently verifiable and cheap to run; the model is reserved for orchestration and prose. This keeps results reproducible, minimizes token spend, and means a sandbox reset doesn't force a full re-run — steps are idempotent and resumable.

A custom MCP server for the write path

The built-in cloud-storage connector was read-only — you can't save a generated binary deck back to a client folder through it. So Plenvo built and deployed a purpose-built MCP server that:

  • Is drop-in compatible with the built-in connector's tools (same names, parameter shapes, response envelopes) — so the skill needed zero changes to swap connectors
  • Adds the ~11 operational tools the built-in lacked: folder creation, metadata updates, server-side copy, permissions, shareable links, document import, and chunked resumable uploads for large binaries
  • Handles binary uploads correctly — a MIME-aware encoder with an explicit text-vs-binary branch, avoiding the classic bug where a naïve UTF-8 path corrupts office documents
  • Speaks streamable-HTTP over OAuth 2.1 with least-privilege scopes — only what the workflow uses, no mail or calendar access — and is Dockerized for cloud deployment with a local stdio mode for desktop clients

Quality-gated insights, not vague slop

Key decision — codify what "good" means. Generated insights follow a structured Observation → Implication → Recommendation framework against a hard rubric: every insight must name a metric and number, be time-anchored, be comparative (vs. prior period / benchmark / target), and be actionable. A two-pass quality validator auto-fails banned filler and missing numbers; a deterministic fallback tags content [REVIEW NEEDED] after repeated failures. Why: the system degrades safely instead of shipping a vague report to a paying client.

Native, data-linked charts

The final decks embed native charts linked to their source tables, not static images. The skill builds chart-definition payloads sandbox-side as pure functions; the MCP issues the authenticated API calls, so the OAuth token never leaves the server. A retry layer rides out the eventual-consistency window where a just-created source briefly 404s.

Engineering quality

  • Regression eval suite (pytest) built from real connector responses and a real CRM export captured as fixtures — which caught real bugs (a script written against an imagined response shape; a fresh template falsely reporting itself as "confirmed"). PII fixtures live outside the shipped artifact.
  • Single-writer, create-newest-wins state model — mutable state modeled as immutable versioned files, a deliberate consistency choice given the storage layer's constraints. Schema drift in the CRM export is auto-detected and versioned for an audit trail instead of silent breakage.

Tech Stack

  • AI / Orchestration: Claude API, Claude Skills (Claude.ai Projects), structured prompt rubric
  • Integration: Custom MCP server, Model Context Protocol, OAuth 2.1, streamable-HTTP
  • Backend: Python, deterministic JSON-in/JSON-out scripts, pandas-style tabular processing
  • Infra & Testing: Docker, cloud PaaS deployment, pytest regression evals
  • Data sources: CRM export, multi-platform ad-data aggregation API, cloud-storage / spreadsheet / presentation REST APIs

The Impact

  • Report turnaround cut from ~1 week to ~20 minutes — the week-long, hands-on reconciliation-and-assembly loop became a plain-English trigger with a human review step at the end.
  • Consistent output, every run — because the math, rubric, and deck assembly are codified in tested code, the same inputs produce the same quality of report every time, eliminating the run-to-run variability of the manual process.
  • Founder-quality reports, team-wide — quality was previously gated on the founder. Now anyone on the team can generate the same caliber of report, because the logic lives in the system rather than in one person's head.
  • Scales across a growing client roster — onboarding a new client provisions its folder layout and goal targets; idempotent, resumable steps make repeat runs cheap and safe.
  • Consistent and safe by construction — the two-pass quality gate blocks vague output, and least-privilege OAuth keeps the system read-only on all ad platforms with no PII ever sent to the model.
  • First iteration in two weeks, full production hardening within roughly a month.

Testimonial

Testimonial pending — available on request from the client.

<!-- ### Gaps to Fill - [ ] No client testimonial quote yet — consider requesting one (template: "Plenvo turned a week-long, founder-only reporting process into something anyone on my team can run in ~20 minutes — with consistent quality every time.") - [ ] Optional harder metrics — turnaround (1 week → ~20 min) and consistency are now captured. If the agency shares more (number of clients/reports served, error-rate reduction), add them to the metrics block for extra proof. - [ ] thumbnail asset not yet created: /assets/case-studies/ai-reporting-automation-thumb.jpg -->

Have a similar challenge?

Book a free discovery call.

Book a Call
Book a Call