Marketing-native AI agent

01 Agent — Performance Marketing

Lower CAC by fixing the learning loop.

Claude Code does not know which leads close. The Performance Marketing agent connects campaigns, creative, and CRM into one operating layer that optimizes for pipeline quality, not platform proxies.

Audit your paid growth system · See the performance workflow

Surface: Search · social · retargeting

Output: Experiments · budget memos · hygiene

Optimizes for: Pipeline quality · CAC payback

02 The harness gap

Claude Code can answer the question. It cannot close the loop.

Generic agents do not know which leads close. They cannot weigh marginal CAC, flag incrementality risk, or design an experiment with a decision rule. Platform automation gives you cheap conversions. Pipeline quality requires a harness.

“You don't ship pipeline from a fresh context window.”

Generic agents

claude.ai · new chat

fresh context

You

What should I do with my Google Ads spend this week?

last-quarter-report.pdf · 38 pages · pasted

Context window 91% full

Claude

Based on what you have shared, here are some general suggestions:

Consider pausing keywords with high CPL and low conversions.
You could test new ad copy or landing page variants.
Review your audience targeting and exclude irrelevant segments.

No rubric. No source policy. No refresh hook. No memory write.

vs.

Metaflow

performance-marketing.run

CRM synced · 1,287 leads matched

Pipeline quality scored — 14 campaigns weighted by closed-won

Experiment hypothesis

Pain-led primary text outperforms feature-led for ICP segment B.

belief · control · guardrail · decision rule

CAC payback: 4.1mo

Pipeline quality: 0.91

LP match: 0.69

Budget memo · marginal CAC · awaiting operator

03 Old way vs. agentic way

A four-tab paid stack vs. one operating layer.

Operators jump between Claude, the ads UI, Looker, and a budget sheet. The CRM never reaches back. Lifetime CAC averages decide budget. Nothing compounds.
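To make the problem with lifetime averages concrete, here is a minimal sketch comparing average CAC with marginal CAC between two spend levels. The function names and all figures are hypothetical and do not come from any real account.

```python
def average_cac(spend: float, customers: int) -> float:
    """Total spend divided by total customers won."""
    return spend / customers


def marginal_cac(spend_before: float, customers_before: int,
                 spend_after: float, customers_after: int) -> float:
    """Cost of each additional customer won by the extra spend."""
    extra_customers = customers_after - customers_before
    if extra_customers <= 0:
        raise ValueError("extra spend produced no additional customers")
    return (spend_after - spend_before) / extra_customers


# Hypothetical campaign: spend rises from $10,000 to $15,000,
# customers rise from 20 to 25.
avg = average_cac(15_000, 25)                 # 600.0
marg = marginal_cac(10_000, 20, 15_000, 25)   # 1000.0
```

In this illustration the blended average still looks healthy at $600, while the last $5,000 bought customers at $1,000 each. That gap is the signal a budget decision based on averages would miss.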

Tabs, dashboards, sheets

Claude · paid ads question

"What should I do with my Google Ads spend?"

No CRM access · no incrementality view

Google Ads · campaigns

Smart Bidding · Maximize Conversions

Conv = form fill (no closed-won)

Looker · dashboard

CTR ↑ CPL ↓ Conv ↑

Pipeline? /not-connected/

Q4-budget.xlsx

lifetime CAC averages

last touched 2 weeks ago

vs.

One operating layer


Dimension	Tabs, dashboards, sheets	One operating layer
Optimizes for	Cost per lead, platform conversions.	Pipeline quality and CAC payback.
Value signal	Form fills count. Closed-won never feeds back.	CRM closed-won feeds the platform.
Experiments	Variants without a hypothesis.	Belief, control, treatment, primary, guardrail, decision rule. Six fields or no test.
Memory	Lessons live in Slack threads.	Winners persist. Losers retire with reasoning.

04 The encoded playbook

What replaces dashboard tweaking.

Four operating principles. Each one carries a method anchor and a piece of product evidence.

Audience first, channel second.

Performance improves when audience quality, suppression, and CRM feedback are strong before budget moves. Channel comes second.

Method anchor — Audience-first paid — Emily Kramer

Segment	Fit	CAC payback
ICP A · enterprise	0.92	4.1mo
ICP B · mid-market	0.84	5.6mo
ICP C · SMB long-tail	0.41	14mo
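For readers who want the arithmetic behind a CAC payback figure, a minimal sketch follows. The formula (CAC divided by monthly gross profit per customer) is a common convention and an assumption here. The page does not state how the agent computes the column, and the input figures are hypothetical.

```python
def cac_payback_months(cac: float, monthly_revenue: float,
                       gross_margin: float) -> float:
    """Months of gross profit needed to recover the acquisition cost."""
    monthly_gross_profit = monthly_revenue * gross_margin
    if monthly_gross_profit <= 0:
        raise ValueError("monthly gross profit must be positive")
    return cac / monthly_gross_profit


# Hypothetical inputs: $4,100 to acquire a customer who pays
# $1,250 per month at an 80% gross margin.
months = cac_payback_months(cac=4_100, monthly_revenue=1_250, gross_margin=0.8)
print(f"{months:.1f} months")
```

Under these assumed inputs the payback lands at roughly four months. A segment with the same CAC but a third of the monthly revenue would take three times as long.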

05 The operating loop

From ad account ingest to memory update.

The agent reads the account, scores spend leakage and pipeline quality, designs experiments with decision rules, audits LPs, and writes outcomes to memory.

performance-marketing.run

Continuous loop

Input: Ad accounts · CRM · LPs

Output: Pipeline-quality optimized paid system

Each run writes outcomes to memory. The next run starts with the prior decision graph and review boundary already loaded.
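The write-back step can be pictured with a minimal sketch. It assumes memory is a JSON file holding a list of outcome records. The file layout, function names, and record fields are illustrative assumptions, not Metaflow's actual storage format.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_memory(path: Path) -> List[Dict]:
    """Return prior outcomes, or an empty list on the first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def write_outcome(path: Path, outcome: Dict) -> List[Dict]:
    """Append one run's outcome and persist the full history."""
    memory = load_memory(path)
    memory.append(outcome)
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return memory
```

A run would call `load_memory` before planning and `write_outcome` after a decision lands, so the next run starts from the recorded history rather than from an empty context.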

Performance Marketing — reliability stack

Production-grade agents need more than a clever prompt. Each layer below is required for governed autonomy.

Instructions

A skill.md file scopes mission, inputs, principles, and output contract.

Tools

Domain APIs, search, scrapers, CRMs, and platform connectors.

Memory

Workflow memory carries context, brand voice, and prior decisions.

Evaluations

Quality gates score every output against domain-specific rubrics.

Execution trace

Every tool call, decision, and rubric pass is inspectable.

Human review

Approval thresholds route risky outputs to operators for sign-off.

Feedback loop

Outcomes write back to memory so the next run starts smarter.

06 The skill file

What the agent reads before every run.

A versioned, editable operating procedure. The way a senior performance lead would document their own playbook.

What is a skill file?

Every Metaflow agent is grounded in a domain-specific skill file — a structured operating procedure that defines inputs, workflows, evaluation criteria, anti-patterns, and output contracts.

The skill file is editable, versioned, and inspectable. It is not a hidden prompt.

performance-marketing.skill.md

v1.4.0 · last edited 4d ago

# Mission

Optimize paid growth against pipeline quality and CAC payback —
not platform-reported conversions.

Audience first. Value signal honest. Experiments disciplined.
Budget moved on marginal CAC. Lessons preserved.

## Optimizes for
- pipeline quality, weighted by closed-won
- CAC payback within target window
- incremental contribution, not raw conversion volume

## Does not promise
- guaranteed CAC reduction
- "fully autonomous paid growth"
- replacing the operator's judgment on offer or strategy

UTF-8 · markdown · 6 sections · governed by run.evals.json
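Because the skill file is plain markdown, its sections can be inspected with ordinary tooling. The sketch below splits a markdown string into sections keyed by heading. It is an illustration of that idea under simple assumptions (ATX-style `#` headings, unique titles), not the loader the agent uses.

```python
import re
from typing import Dict


def split_sections(markdown: str) -> Dict[str, str]:
    """Map each heading title to the text beneath it."""
    sections: Dict[str, str] = {}
    title = None
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            title = match.group(1).strip()
            sections[title] = ""
        elif title is not None:
            sections[title] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


# Shortened excerpt of the skill file shown above.
excerpt = """# Mission
Audience first. Value signal honest. Experiments disciplined.

## Does not promise
- guaranteed CAC reduction
"""
sections = split_sections(excerpt)
print(list(sections))
```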

07 The quality gate

Recommendations are scored before they reach an operator.

The agent does not act on the ad account directly. It scores its own recommendations and routes anything below threshold for review.

paid_quality_rubric — performance-marketing

run.evals.json

Refusal conditions

Where the agent stops or hands back instead of guessing.

CRM signal stale — no closed-won data in 30 days.
Audience overlap > 0.6 without operator override.
Experiment without a decision rule.
Budget shift exceeds 20% without explicit approval.
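The four refusal conditions above can be sketched as a single check. The `RunState` fields and the function name are hypothetical. The thresholds (30 days, 0.6 overlap, 20% shift) come from the list, and treating "no closed-won data in 30 days" as "more than 30 days since the last closed-won record" is an interpretation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunState:
    """Hypothetical snapshot of the inputs the refusal checks need."""
    days_since_closed_won: int
    audience_overlap: float
    overlap_override: bool
    experiment_has_decision_rule: bool
    budget_shift_pct: float        # signed, e.g. -22.0 for a 22% cut
    budget_shift_approved: bool


def refusal_reasons(state: RunState) -> List[str]:
    """Return every refusal condition the state triggers (empty means proceed)."""
    reasons = []
    if state.days_since_closed_won > 30:
        reasons.append("CRM signal stale")
    if state.audience_overlap > 0.6 and not state.overlap_override:
        reasons.append("audience overlap above 0.6")
    if not state.experiment_has_decision_rule:
        reasons.append("experiment without a decision rule")
    if abs(state.budget_shift_pct) > 20 and not state.budget_shift_approved:
        reasons.append("budget shift above 20% without approval")
    return reasons
```

Returning every triggered reason, rather than stopping at the first, gives the operator the full picture in one hand-back.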

Autonomous

Score campaign quality
Mine search terms
Detect audience overlap
Assemble experiment cards

Recommend

Budget reallocation memo
New audience expansion
Creative variant queue
Landing page edits

Approve

Spend changes above threshold
New campaign launches
Negative keyword additions
Account structure changes
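One way to encode the three tiers above is a lookup table. The action identifiers below are hypothetical names derived from the lists, and defaulting unknown actions to the strictest tier is a design assumption of this sketch, not documented behavior.

```python
from enum import Enum


class Tier(Enum):
    AUTONOMOUS = "autonomous"   # agent runs it without asking
    RECOMMEND = "recommend"     # agent drafts, operator decides
    APPROVE = "approve"         # requires explicit sign-off


ACTION_TIERS = {
    "score_campaign_quality": Tier.AUTONOMOUS,
    "mine_search_terms": Tier.AUTONOMOUS,
    "detect_audience_overlap": Tier.AUTONOMOUS,
    "assemble_experiment_cards": Tier.AUTONOMOUS,
    "budget_reallocation_memo": Tier.RECOMMEND,
    "new_audience_expansion": Tier.RECOMMEND,
    "creative_variant_queue": Tier.RECOMMEND,
    "landing_page_edits": Tier.RECOMMEND,
    "spend_changes_above_threshold": Tier.APPROVE,
    "new_campaign_launches": Tier.APPROVE,
    "negative_keyword_additions": Tier.APPROVE,
    "account_structure_changes": Tier.APPROVE,
}


def tier_for(action: str) -> Tier:
    """Look up an action's tier; unknown actions fall back to APPROVE."""
    return ACTION_TIERS.get(action, Tier.APPROVE)
```

Note that mining search terms is autonomous while adding the resulting negative keywords requires approval: analysis runs freely, changes to the account do not.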

08 What it produces

System evidence, not feature cards.

Five artifacts. Each one is something the agent generates, scores, or maintains.

Wasted spend

Search-term mining — last 7 days.

Continuous waste detection. Negative keywords queued for approval, never silently applied.

Term	Spend	Pipe	Action
free crm software	$214	None	Negate
jobs in marketing	$184	None	Negate
metaflow tutorial free	$162	Low	Negate
ai agents pricing	$86	High	Keep
ai marketing platform comparison	$71	High	Keep
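The table above can be reproduced with a small sketch. The rule used here (negate any term whose pipeline signal is not High) is inferred from these five rows only and is an assumption. The agent's real scoring is not described on this page.

```python
from typing import FrozenSet, List, NamedTuple


class SearchTerm(NamedTuple):
    term: str
    spend: float     # dollars over the 7-day window
    pipeline: str    # "None", "Low", or "High"


def queue_negatives(terms: List[SearchTerm],
                    keep_levels: FrozenSet[str] = frozenset({"High"})) -> List[str]:
    """Return terms to queue as negative keywords, pending approval."""
    return [t.term for t in terms if t.pipeline not in keep_levels]


rows = [
    SearchTerm("free crm software", 214, "None"),
    SearchTerm("jobs in marketing", 184, "None"),
    SearchTerm("metaflow tutorial free", 162, "Low"),
    SearchTerm("ai agents pricing", 86, "High"),
    SearchTerm("ai marketing platform comparison", 71, "High"),
]

queued = queue_negatives(rows)
wasted = sum(t.spend for t in rows if t.term in queued)   # 214 + 184 + 162 = 560
```

The function only builds a queue. Applying the negatives stays a separate, approved step, in line with "never silently applied."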

Methodology anchors

Google value-based bidding

Paid systems perform better when they optimize against meaningful business value, not raw conversion volume.

CRM closed-won outcomes flow back to the platform.

Google experiment frameworks

Tests need clean hypotheses, controlled variables, sufficient duration, and decision rules.

Experiment cards require all six fields. The agent will not start without them.

Demand Curve creative testing

Creative testing should produce learning, not endless variants.

Each variant tests a named belief: pain, promise, proof, offer, audience, or objection.

Audience-first paid — Emily Kramer

Performance improves when the audience foundation is strong before budget moves.

Audience quality and CRM feedback are scored before scale.

CAC payback discipline

Not every reported conversion is incremental. Not every cheap lead is good.

Pipeline quality and CAC payback are the two scores that ship.

Reforge-style growth loops

Growth compounds when execution creates learning that improves the next cycle.

Winners persist as memory. Next quarter does not start from zero.
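The six-field experiment card named under the Google experiment frameworks anchor can be sketched as a dataclass with a completeness check. The field names follow the six labels used elsewhere on this page (belief, control, treatment, primary, guardrail, rule). Reading "primary" as the primary metric is an assumption, and the control, treatment, and metric values below are illustrative.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ExperimentCard:
    belief: str = ""      # the named belief under test
    control: str = ""     # what the baseline arm runs
    treatment: str = ""   # what the test arm runs
    primary: str = ""     # assumed to mean the primary metric
    guardrail: str = ""   # metric that must not degrade
    rule: str = ""        # decision rule fixed before launch


def missing_fields(card: ExperimentCard) -> List[str]:
    """Names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]


card = ExperimentCard(
    belief="Pain-led primary text outperforms feature-led for ICP segment B.",
    control="feature-led primary text",
    treatment="pain-led primary text",
    primary="pipeline quality",
    guardrail="CAC payback",
)
print(missing_fields(card))   # ['rule']
```

A card with any missing field would be handed back rather than launched, which matches the refusal condition "experiment without a decision rule."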

09 Against the field

Not a bid optimizer. Not an ad generator. An operating layer.

Four contenders. The Metaflow column lists the product evidence for each dimension.

Dimension	Claude Code, Cursor (generic agents in chat windows)	Platform automation (smart bidding, broad targeting)	Performance agency (outsourced humans with platform access)	Metaflow agent (agentic operating layer with memory and evals)
Optimizes for	Whatever the prompt encodes.	Platform-reported conversions.	Whatever the strategist negotiates.	Pipeline quality and CAC payback. CRM closed-won fed back. Evidence (crm.match): 1,287 leads matched · 14 campaigns scored by closed-won.
Experiment discipline	Variants without a hypothesis.	Auto-experiments with hidden logic.	Sometimes structured, often not.	Six fields required. No fields, no test. Evidence (experiment.compose): belief · control · treatment · primary · guardrail · rule.
Memory	Resets every chat.	Lives in the platform's ML.	Lives in Slack and the strategist.	Winners persist. Losers retire with reasoning. Evidence (memory.write): “Pain-led primary text wins for ICP B” · persisted v1.4.0.
Incrementality	Not addressed.	Limited and proprietary.	Available if requested.	Surfaced before reallocation, not after. Evidence (budget.compose): audience overlap 0.42 flagged · awaiting operator.
Compounding	Each chat starts at zero.	Compounds inside the platform's walls.	Compounds with the strategist who stays.	Outcomes update memory across runs and platforms. Evidence (cross-platform memory): belief won on Meta · re-tested on Google · pending data.

10 Where it runs

Repeatable plays the agent runs end-to-end.

Each play has the same shape: ingest, audit, score, hypothesize, design, audit LP, recommend, approve, learn.

Weekly account hygiene

Search-term mining, fatigue detection, audience overlap, stale ad sets.

Outcome

A reasoned waste report and negative keywords queued for approval.

Wasted spend · 7d

free crm software	$214
jobs in marketing	$184
metaflow tutorial free	$162


11 The first session

Audit your paid growth system in one focused session.

A 30-minute working session with a Metaflow operator. We pull a sample week of spend, score campaigns against pipeline quality, and surface waste and incrementality risk.

Book the diagnostic · Read the methodology

01 Pipeline-quality scorecard for your top 5 active campaigns.
02 Wasted spend report from a 7-day search term sample.
03 LP message-match audit on your highest-spend ad set.
04 A written 1-page memo with the next 3 plays we would run.

What you leave with

Pipeline-quality scorecard

Campaign	Score
brand-search	0.91
icp-a-bofu	0.88
broad-discovery	0.41

Budget memo

brand-search	+18%	+$580
icp-a-bofu	+12%	+$340
broad-discovery	-22%	-$960

Marginal CAC · incrementality flagged
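A short sketch of how the sample memo's three lines could be checked before they reach an operator. It assumes each percentage is relative to that campaign's current budget, which the page does not state, and it reuses the 20% approval threshold from the refusal conditions.

```python
from typing import List, Tuple

# (campaign, percent change, dollar change) as shown in the sample memo
memo: List[Tuple[str, float, float]] = [
    ("brand-search", 18, 580),
    ("icp-a-bofu", 12, 340),
    ("broad-discovery", -22, -960),
]


def needs_approval(pct: float, threshold: float = 20) -> bool:
    """True when the shift is larger than the threshold in either direction."""
    return abs(pct) > threshold


net_change = sum(delta for _, _, delta in memo)                    # -40
flagged = [name for name, pct, _ in memo if needs_approval(pct)]   # ['broad-discovery']

# Budget each line implies before the change, under the stated assumption.
implied_base = {name: round(delta * 100 / pct) for name, pct, delta in memo}
```

The memo is close to budget-neutral (a net change of -$40), and only the broad-discovery cut crosses the 20% line, so that is the line that would wait for explicit approval.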

A focused diagnostic. No slides. Walk away with a written assessment whether or not we work together.


paid_quality_rubric · run.evals.json

Criterion	What is scored	Score	Result
ICP alignment	Campaign-level ICP weighting based on closed-won fit.	0.92	Pass
Value signal freshness	CRM sync recent and closed-won data available.	0.97	Pass
Experiment completeness	All six required fields present on every test card.	1.00	Pass
Guardrail defined	A named guardrail metric exists and is monitorable.	0.74	Route for review
LP message match	Semantic match between ad belief and LP hero copy.	0.69	Route for review
Marginal CAC basis	Reallocation grounded in marginal CAC curves.	0.86	Pass
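The routing shown in the rubric rows above can be sketched as a threshold check. The page does not publish the threshold. Since 0.74 is routed for review and 0.86 passes, the cut must lie between them, and the 0.80 used here is an assumption chosen only for illustration.

```python
from typing import Dict

REVIEW_THRESHOLD = 0.80   # assumed; the real value is not stated

scores: Dict[str, float] = {
    "ICP alignment": 0.92,
    "Value signal freshness": 0.97,
    "Experiment completeness": 1.00,
    "Guardrail defined": 0.74,
    "LP message match": 0.69,
    "Marginal CAC basis": 0.86,
}


def route(score: float, threshold: float = REVIEW_THRESHOLD) -> str:
    """Pass scores at or above the threshold; route the rest to an operator."""
    return "Pass" if score >= threshold else "Route for review"


routed = {name: route(score) for name, score in scores.items()}
for_review = [name for name, result in routed.items() if result == "Route for review"]
print(for_review)   # ['Guardrail defined', 'LP message match']
```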
