Marketing-native AI agent
01 · Agent: Performance Marketing

Lower CAC by fixing the learning loop.

Claude Code does not know which leads close. The Performance Marketing agent connects campaigns, creative, and CRM into one operating layer that optimizes for pipeline quality, not platform proxies.

Surface: Search · social · retargeting
Output: Experiments · budget memos · hygiene
Optimizes for: Pipeline quality · CAC payback
SEO Optimizer

Run the weekly search-term review on metaflow.life: surface rising queries, find new themes, draft RSA copy.

I'll break this into agent steps and execute with your guardrails.

  • Search Term Puller · Sonnet 4.6 · working
  • Opportunity Scorer · Sonnet 4.6 · queued
  • RSA Drafter · Sonnet 4.6 · queued

02 · The harness gap

Claude Code can answer the question. It cannot close the loop.

Generic agents do not know which leads close. They cannot weigh marginal CAC, flag incrementality risk, or design an experiment with a decision rule. Platform automation gives you cheap conversions. Pipeline quality requires a harness.

You don't ship pipeline from a fresh context window.
Generic agents
claude.ai · new chat
fresh context
You

What should I do with my Google Ads spend this week?

last-quarter-report.pdf · 38 pages · pasted
Context window 91% full
Claude

Based on what you have shared, here are some general suggestions:

  • Consider pausing keywords with high CPL and low conversions.
  • You could test new ad copy or landing page variants.
  • Review your audience targeting and exclude irrelevant segments.
No rubric. No source policy. No refresh hook. No memory write.
vs.
Metaflow
performance-marketing.run
CRM synced · 1,287 leads matched
Pipeline quality scored — 14 campaigns weighted by closed-won
Experiment hypothesis

Pain-led primary text outperforms feature-led for ICP segment B.

belief · control · guardrail · decision rule
CAC payback: 4.1 mo
Pipeline quality: 0.91
LP match: 0.69
Budget memo · marginal CAC · awaiting operator
03 · Old way vs. agentic way

A four-tab paid stack vs. one operating layer.

Operators jump between Claude, the ads UI, Looker, and a budget sheet. The CRM never reaches back. Lifetime CAC averages decide budget. Nothing compounds.

Tabs, dashboards, sheets
Claude · paid ads question

"What should I do with my Google Ads spend?"

No CRM access · no incrementality view

Google Ads · campaigns

Smart Bidding · Maximize Conversions

Conv = form fill (no closed-won)

Looker · dashboard

CTR ↑ CPL ↓ Conv ↑

Pipeline? /not-connected/

Q4-budget.xlsx

lifetime CAC averages

last touched 2 weeks ago

vs.
One operating layer
| Dimension | Tabs, dashboards, sheets | One operating layer |
|---|---|---|
| Optimizes for | Cost per lead, platform conversions. | Pipeline quality and CAC payback. |
| Value signal | Form fills count. Closed-won never feeds back. | CRM closed-won feeds the platform. |
| Experiments | Variants without a hypothesis. | Six required fields, including belief, control, guardrail, and decision rule. No fields, no test. |
| Memory | Lessons live in Slack threads. | Winners persist. Losers retire with reasoning. |
04 · The encoded playbook

What replaces dashboard tweaking.

Four operating principles. Each one carries a method anchor and a piece of product evidence.

Audience first, channel second.

Performance improves when audience quality, suppression, and CRM feedback are strong before budget moves. Channel comes second.

Method anchor — Audience-first paid — Emily Kramer

| Segment | Fit | CAC payback |
|---|---|---|
| ICP A · enterprise | 0.92 | 4.1 mo |
| ICP B · mid-market | 0.84 | 5.6 mo |
| ICP C · SMB long-tail | 0.41 | 14 mo |
05 · The operating loop

From ad account ingest to memory update.

The agent reads the account, scores spend leakage and pipeline quality, designs experiments with decision rules, audits landing pages (LPs), and writes outcomes to memory.

performance-marketing.run
Continuous loop
Input: Ad accounts · CRM · LPs
Output: Pipeline-quality optimized paid system

Each run writes outcomes to memory. The next run starts with the prior decision graph and review boundary already loaded.

Performance Marketing — reliability stack

Production-grade agents need more than a clever prompt. Each layer below is required for governed autonomy.

01 · Instructions: A skill.md file scopes mission, inputs, principles, and output contract.
02 · Tools: Domain APIs, search, scrapers, CRMs, and platform connectors.
03 · Memory: Workflow memory carries context, brand voice, and prior decisions.
04 · Evaluations: Quality gates score every output against domain-specific rubrics.
05 · Execution trace: Every tool call, decision, and rubric pass is inspectable.
06 · Human review: Approval thresholds route risky outputs to operators for sign-off.
07 · Feedback loop: Outcomes write back to memory so the next run starts smarter.
06 · The skill file

What the agent reads before every run.

A versioned, editable operating procedure, written the way a senior performance lead would document their own playbook.
What is a skill file?

Every Metaflow agent is grounded in a domain-specific skill file — a structured operating procedure that defines inputs, workflows, evaluation criteria, anti-patterns, and output contracts.

The skill file is editable, versioned, and inspectable. It is not a hidden prompt.

performance-marketing.skill.md
v1.4.0 · last edited 4d ago
# Mission

Optimize paid growth against pipeline quality and CAC payback —
not platform-reported conversions.

Audience first. Value signal honest. Experiments disciplined.
Budget moved on marginal CAC. Lessons preserved.

## Optimizes for
- pipeline quality, weighted by closed-won
- CAC payback within target window
- incremental contribution, not raw conversion volume

## Does not promise
- guaranteed CAC reduction
- "fully autonomous paid growth"
- replacing the operator's judgment on offer or strategy
UTF-8 · markdown · 6 sections · governed by run.evals.json
07 · The quality gate

Recommendations are scored before they reach an operator.

The agent does not act on the ad account directly. It scores its own recommendations and routes anything below threshold for review.

paid_quality_rubric — performance-marketing
run.evals.json

Refusal conditions

Where the agent stops or hands back instead of guessing.

  • CRM signal stale — no closed-won data in 30 days.
  • Audience overlap > 0.6 without operator override.
  • Experiment without a decision rule.
  • Budget shift exceeds 20% without explicit approval.
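The refusal conditions are concrete enough to express as checks. The sketch below is a minimal illustration, not Metaflow's implementation: the thresholds (30 days, 0.6 overlap, 20% budget shift) come from the list above, while `RunContext`, its field names, and `refusal_reasons` are hypothetical. Overlap is taken as a given number because the page does not say how it is measured.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Thresholds from the refusal conditions above.
CRM_STALE_AFTER = timedelta(days=30)
MAX_AUDIENCE_OVERLAP = 0.6
MAX_BUDGET_SHIFT = 0.20


@dataclass
class RunContext:
    # Hypothetical fields; the page does not describe the agent's data model.
    last_closed_won: Optional[date]  # most recent closed-won date seen in the CRM
    audience_overlap: float          # 0.0 to 1.0, however overlap is measured
    overlap_override: bool           # operator explicitly accepted the overlap
    decision_rule: Optional[str]     # the experiment card's decision rule, if any
    budget_shift: float              # proposed shift as a fraction, e.g. -0.22
    budget_approved: bool            # explicit approval for that shift


def refusal_reasons(ctx: RunContext, today: date) -> list[str]:
    """Return every reason to stop and hand back; an empty list means proceed."""
    reasons = []
    if ctx.last_closed_won is None or today - ctx.last_closed_won > CRM_STALE_AFTER:
        reasons.append("CRM signal stale")
    if ctx.audience_overlap > MAX_AUDIENCE_OVERLAP and not ctx.overlap_override:
        reasons.append("audience overlap above 0.6")
    if not ctx.decision_rule or not ctx.decision_rule.strip():
        reasons.append("experiment has no decision rule")
    # A cut counts as a shift too, so compare the magnitude.
    if abs(ctx.budget_shift) > MAX_BUDGET_SHIFT and not ctx.budget_approved:
        reasons.append("budget shift above 20% without approval")
    return reasons


# Illustrative run: a -22% cut with no approval, everything else healthy.
ctx = RunContext(
    last_closed_won=date(2025, 1, 10),
    audience_overlap=0.45,
    overlap_override=False,
    decision_rule="Ship variant B only if CAC payback improves and the guardrail holds",
    budget_shift=-0.22,
    budget_approved=False,
)
print(refusal_reasons(ctx, today=date(2025, 1, 20)))
# ['budget shift above 20% without approval']
```

Collecting every reason instead of stopping at the first one means the operator sees the full picture in a single hand-back.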
Autonomous
  • Score campaign quality
  • Mine search terms
  • Detect audience overlap
  • Assemble experiment cards
Recommend
  • Budget reallocation memo
  • New audience expansion
  • Creative variant queue
  • Landing page edits
Approve
  • Spend changes above threshold
  • New campaign launches
  • Negative keyword additions
  • Account structure changes
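The three tiers read naturally as a lookup with a fail-safe default. In this hypothetical sketch the action identifiers are invented to mirror the lists above, and the spend-change threshold reuses the 20% budget-shift limit from the refusal conditions; the page does not say the two thresholds are the same, so treat that as an assumption.

```python
from enum import Enum


class Tier(Enum):
    AUTONOMOUS = "autonomous"  # agent runs it without sign-off
    RECOMMEND = "recommend"    # agent drafts it; the operator decides
    APPROVE = "approve"        # nothing happens until an operator signs off


# Hypothetical action identifiers mirroring the three lists above.
TIERS = {
    "score_campaign_quality": Tier.AUTONOMOUS,
    "mine_search_terms": Tier.AUTONOMOUS,
    "detect_audience_overlap": Tier.AUTONOMOUS,
    "assemble_experiment_cards": Tier.AUTONOMOUS,
    "budget_reallocation_memo": Tier.RECOMMEND,
    "new_audience_expansion": Tier.RECOMMEND,
    "creative_variant_queue": Tier.RECOMMEND,
    "landing_page_edits": Tier.RECOMMEND,
    "new_campaign_launch": Tier.APPROVE,
    "negative_keyword_additions": Tier.APPROVE,
    "account_structure_change": Tier.APPROVE,
}

# Assumption: "above threshold" means the same 20% limit as the refusal conditions.
SPEND_CHANGE_THRESHOLD = 0.20


def route(action: str, spend_change: float = 0.0) -> Tier:
    """Return the review tier for an action; unknown actions default to APPROVE."""
    if action == "spend_change":
        if abs(spend_change) > SPEND_CHANGE_THRESHOLD:
            return Tier.APPROVE
        return Tier.RECOMMEND
    return TIERS.get(action, Tier.APPROVE)
```

Defaulting unknown actions to `APPROVE` keeps anything the lists do not cover in front of an operator rather than letting it run silently.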
08 · What it produces

System evidence, not feature cards.

Five artifacts. Each one is something the agent generates, scores, or maintains.

Wasted spend

Search-term mining — last 7 days.

Continuous waste detection. Negative keywords queued for approval, never silently applied.

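| Term | Spend | Pipeline | Action |
|---|---|---|---|
| free crm software | $214 | None | Negate |
| jobs in marketing | $184 | None | Negate |
| metaflow tutorial free | $162 | Low | Negate |
| ai agents pricing | $86 | High | Keep |
| ai marketing platform comparison | $71 | High | Keep |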
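The table implies a simple rule: terms that spent money without producing meaningful pipeline become negative-keyword candidates, and nothing is applied without approval. A minimal sketch, assuming the pipeline label is already computed upstream (the page does not say how) and that `none` and `low` both mean negate, as in the five rows shown. `SearchTerm` and `triage` are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: pipeline labels are precomputed upstream; "none" and "low" mean negate.
NEGATE_LEVELS = {"none", "low"}


@dataclass
class SearchTerm:
    term: str
    spend: float   # dollars spent in the review window (7 days here)
    pipeline: str  # "none", "low", or "high"


def triage(terms: list[SearchTerm]) -> tuple[list[str], list[str], float]:
    """Return (negate candidates, terms to keep, wasted spend), highest spend first.

    Negate candidates are only queued for operator approval; nothing is applied here.
    """
    ranked = sorted(terms, key=lambda t: t.spend, reverse=True)
    negate = [t for t in ranked if t.pipeline in NEGATE_LEVELS]
    keep = [t for t in ranked if t.pipeline not in NEGATE_LEVELS]
    wasted = sum(t.spend for t in negate)
    return [t.term for t in negate], [t.term for t in keep], wasted


week = [
    SearchTerm("free crm software", 214, "none"),
    SearchTerm("jobs in marketing", 184, "none"),
    SearchTerm("metaflow tutorial free", 162, "low"),
    SearchTerm("ai agents pricing", 86, "high"),
    SearchTerm("ai marketing platform comparison", 71, "high"),
]
queued, kept, wasted_spend = triage(week)
# queued: the three Negate rows; kept: the two Keep rows; wasted_spend: 560
```

A production rule would likely weigh more than a single label, for example spend thresholds or how long a term has been observed; this sketch only reproduces the decisions visible in the table.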
Methodology anchors

Google value-based bidding

Paid systems perform better when they optimize against meaningful business value, not raw conversion volume.

CRM closed-won outcomes flow back to the platform.

Google experiment frameworks

Tests need clean hypotheses, controlled variables, sufficient duration, and decision rules.

Experiment cards require all six fields. The agent will not start without them.

Demand Curve creative testing

Creative testing should produce learning, not endless variants.

Each variant tests a named belief: pain, promise, proof, offer, audience, or objection.

Audience-first paid — Emily Kramer

Performance improves when the audience foundation is strong before budget moves.

Audience quality and CRM feedback are scored before scale.

CAC payback discipline

Not every reported conversion is incremental. Not every cheap lead is good.

Pipeline quality and CAC payback are the two scores that ship.
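The page does not spell out how it computes CAC payback or marginal CAC. Assuming the common definitions (CAC payback = CAC ÷ (monthly revenue per customer × gross margin); marginal CAC = change in spend ÷ change in customers), a few lines of arithmetic show why budgeting on lifetime averages misleads. All figures and function names are illustrative, not Metaflow data or APIs.

```python
def cac(spend: float, customers: int) -> float:
    """Average CAC: total spend divided by customers acquired."""
    return spend / customers


def marginal_cac(spend_before: float, spend_after: float,
                 customers_before: int, customers_after: int) -> float:
    """CAC of only the customers added by the last budget increment."""
    return (spend_after - spend_before) / (customers_after - customers_before)


def cac_payback_months(cac_value: float, monthly_revenue: float, gross_margin: float) -> float:
    """Months of gross-margin-adjusted revenue per customer needed to recover CAC."""
    return cac_value / (monthly_revenue * gross_margin)


# Illustrative: raising spend from $10k to $15k grows customers from 20 to 25.
avg = cac(15_000, 25)                            # 600.0 per customer
marginal = marginal_cac(10_000, 15_000, 20, 25)  # 1000.0 per extra customer

# Assumed $250/month revenue per customer at 80% gross margin.
avg_payback = cac_payback_months(avg, 250, 0.8)            # 3.0 months
marginal_payback = cac_payback_months(marginal, 250, 0.8)  # 5.0 months
```

The blended number says the budget pays back in three months; the last $5,000 actually takes five. That gap is what a budget sheet built on lifetime CAC averages hides.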

Reforge-style growth loops

Growth compounds when execution creates learning that improves the next cycle.

Winners persist as memory. Next quarter does not start from zero.

09 · Against the field

Not a bid optimizer. Not an ad generator. An operating layer.

Four contenders.

| Dimension | Claude Code, Cursor | Platform automation | Performance agency | Metaflow agent |
|---|---|---|---|---|
| What it is | Generic agents in chat windows. | Smart bidding, broad targeting. | Outsourced humans with platform access. | Agentic operating layer with memory and evals. |
| Optimizes for | Whatever the prompt encodes. | Platform-reported conversions. | Whatever the strategist negotiates. | Pipeline quality and CAC payback. CRM closed-won fed back. |
| Experiment discipline | Variants without a hypothesis. | Auto-experiments with hidden logic. | Sometimes structured, often not. | Six fields required. No fields, no test. |
| Memory | Resets every chat. | Lives in the platform's ML. | Lives in Slack and the strategist. | Winners persist. Losers retire with reasoning. |
| Incrementality | Not addressed. | Limited and proprietary. | Available if requested. | Surfaced before reallocation, not after. |
| Compounding | Each chat starts at zero. | Compounds inside the platform's walls. | Compounds with the strategist who stays. | Outcomes update memory across runs and platforms. |
10 · Where it runs

Repeatable plays the agent runs end-to-end.

Each play has the same shape: ingest, audit, score, hypothesize, design, audit LP, recommend, approve, learn.

01 · Weekly account hygiene

Search-term mining, fatigue detection, audience overlap, stale ad sets.

Outcome: a reasoned waste report and negative keywords queued for approval.

Wasted spend · 7d
  • free crm software · $214
  • jobs in marketing · $184
  • metaflow tutorial free · $162
11 · The first session

Audit your paid growth system in one focused session.

A 30-minute working session with a Metaflow operator. We pull a sample week of spend, score campaigns against pipeline quality, and surface waste and incrementality risk.

  • 01 Pipeline-quality scorecard for your top 5 active campaigns.
  • 02 Wasted spend report from a 7-day search term sample.
  • 03 LP message-match audit on your highest-spend ad set.
  • 04 A written 1-page memo with the next 3 plays we would run.
What you leave with
Pipeline-quality scorecard
  • brand-search · 0.91
  • icp-a-bofu · 0.88
  • broad-discovery · 0.41
Budget memo
  • brand-search +18% (+$580)
  • icp-a-bofu +12% (+$340)
  • broad-discovery -22% (-$960)
Marginal CAC · incrementality flagged

A focused diagnostic. No slides. Walk away with a written assessment whether or not we work together.