When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions 'A/B test,' 'split test,' 'experiment,' 'test this change,' 'variant copy,' 'multivariate test,' 'hypothesis,' 'should I test this,' 'which version is better,' 'test two versions,' 'statistic
This skill guides you through designing and executing A/B tests and experiments that yield statistically valid and actionable insights. It emphasizes forming a clear hypothesis tied to a specific business outcome, choosing one variable to test at a time, and rigorously defining sample sizes and metrics before launching. The skill also covers variant design, traffic allocation strategies, and how to interpret test results in the context of primary, secondary, and guardrail metrics to ensure decisions align with growth goals.
This skill is ideal for performance marketers planning conversion rate optimization (CRO) tests on landing pages or product features, growth leads who need to validate hypotheses before scaling changes, and agency strategists responsible for advising clients on data-driven experimentation. It suits scenarios where understanding the impact of a single change on user behavior or revenue metrics is critical, especially when balancing technical constraints, traffic volume, and timeline pressures.
Practitioners start by assessing the test context, including baseline conversion rates, traffic availability, and any technical or timing constraints. Next, they formulate a strong hypothesis using data or observations, specifying the expected outcome and target audience. Then, they select a single variable to change—such as copy, design, or CTA—and determine the appropriate traffic split and sample size to detect meaningful lifts, often referencing established calculators. Finally, they launch the test with tracking and QA in place, monitor for technical issues without peeking at interim results, and analyze outcomes against statistical significance and guardrail metrics before making implementation decisions.
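The final analysis step can be illustrated with a pooled two-proportion z-test, one common way to check whether a difference in conversion rates is statistically significant. This is a minimal sketch rather than the skill's own method: the function name `two_proportion_z_test` and the visitor and conversion counts are hypothetical, and it assumes a fixed sample size decided before launch, a two-sided test, and samples large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    conv_a, conv_b: number of conversions in control and variant.
    n_a, n_b: number of visitors in control and variant.
    Returns (z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 3.0% control vs 3.5% variant, 24,000 visitors each.
z, p_value = two_proportion_z_test(720, 24_000, 840, 24_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below the threshold chosen in advance (commonly 0.05) supports the variant on the primary metric. The same function can be run on each guardrail metric before deciding whether to ship.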
How do I decide which metric to use as the primary outcome? Choose the single metric most directly tied to your hypothesis and business objective; a single primary metric avoids ambiguous results.
What sample size do I need to detect a 15% lift from a 3% baseline conversion? A 15% relative lift moves the rate from 3% to 3.45%. At 95% confidence and 80% power (two-sided), expect roughly 24,000 visitors per variant; use a standard calculator for a precise estimate under your own settings.
Can I test multiple changes at once? It’s best to test one variable at a time to isolate cause and effect. Multivariate tests split traffic across many more combinations, so they need significantly more traffic and add complexity to setup and analysis.
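The sample-size question above can be worked through with the standard normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions, not the method of any particular calculator: the function name `sample_size_per_variant` is hypothetical, and the defaults (two-sided alpha of 0.05, power of 0.80, equal traffic split) are common conventions rather than values given in the text.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift.

    baseline: control conversion rate, e.g. 0.03 for 3%.
    relative_lift: minimum detectable effect relative to baseline, e.g. 0.15.
    alpha: two-sided significance level (assumed default 0.05).
    power: probability of detecting a true effect (assumed default 0.80).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the two binomial variances (unpooled form of the formula).
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


# The FAQ scenario: 3% baseline, 15% relative lift.
n = sample_size_per_variant(0.03, 0.15)
print(f"Visitors per variant: {n:,}")
```

Under these assumptions the formula yields a little over 24,000 visitors per variant for the 3% baseline and 15% relative lift. Calculators that use other defaults (one-sided tests, different power, pooled variance) will give somewhat different figures, and smaller lifts raise the requirement sharply because the sample size scales with the inverse square of the absolute difference.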
Attach this skill to Metaflow agent tasks when you need structured guidance on planning and running A/B tests or experiments. The agent will prompt you for key inputs like hypothesis, baseline metrics, and constraints before helping design variants and traffic splits. Expect support through step-by-step workflows that enforce statistical rigor and proper documentation. This skill integrates smoothly with other analytics and growth tools to streamline your experimentation process and...
For broader context, see our roundup of Claude skills for marketing, and read Claude Code workflows for marketing agencies for related setup guidance.