Use this skill when the user wants to plan, design, or implement an A/B test or experiment, or when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.
This skill guides marketers through planning, designing, and implementing A/B tests to generate statistically valid and actionable results. It covers formulating a strong hypothesis, selecting appropriate test types, calculating sample sizes, defining metrics, designing variants, and managing traffic allocation. The skill also addresses common execution pitfalls such as early stopping and improper analysis, ensuring tests produce reliable insights that inform growth decisions.
This skill is ideal for performance marketers managing conversion rate optimization on landing pages or signup flows, growth leads overseeing experimentation roadmaps, and SEO or PPC operators validating messaging or design changes before scaling. Agencies running multivariate or split URL tests for clients will also benefit from its structured approach to hypothesis framing, sample sizing, and result interpretation.
First, practitioners gather context, including baseline conversion rates, traffic volumes, and technical constraints, to frame the test objective. Next, they develop a clear hypothesis using a structured template that links an observation to an expected outcome and a success metric. Then they select the test type (A/B, A/B/n, multivariate, or split URL) and calculate the sample size from the baseline conversion rate and the minimum detectable effect. After that, they design variants around a single meaningful change aligned with the hypothesis and choose a traffic split that gives each visitor a consistent experience and each variant balanced exposure. Finally, they verify tracking and run QA before launch, monitor the test without acting on early results, and analyze the outcome using confidence intervals and guardrail metrics before making a decision.
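One common way to keep exposure consistent and balanced is to derive the variant from a hash of the visitor ID, so a returning visitor always lands in the same bucket. The sketch below is illustrative only: the function name `assign_variant`, the experiment key, and the visitor ID format are assumptions, not part of any particular testing tool.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a visitor to a variant (hypothetical helper).

    Hashing the experiment name together with the visitor ID keeps the
    assignment stable across visits and independent between experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 8 hex characters as an integer and spread it evenly
    # over the available variants.
    bucket = int(digest[:8], 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # The same visitor always gets the same variant for a given experiment.
    print(assign_variant("visitor-123", "homepage-headline"))
    print(assign_variant("visitor-123", "homepage-headline"))
```

Because the assignment depends only on the inputs, it needs no stored state, and an even split across variants follows from the uniformity of the hash rather than from any bookkeeping.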
How large should my sample size be? Sample size depends on your baseline conversion rate and the lift you want to detect. For example, at 95% confidence and 80% power, detecting a 10% relative lift on a 5% baseline requires roughly 31,000 visitors per variant, while a 20% relative lift requires roughly 8,000.
What if I see early positive results? Stopping a test early risks false positives; wait until the pre-calculated sample size is reached before drawing conclusions.
How do I decide which metrics to track? Choose a single primary metric tied directly to your hypothesis, supported by secondary metrics that show broader impact and guardrail metrics that catch negative side effects.
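The sample-size question above can be sketched with the standard normal-approximation formula for comparing two proportions. This is a minimal sketch, not a replacement for a testing tool's calculator: the function name `sample_size_per_variant` is hypothetical, and the defaults (two-sided 5% significance, 80% power, equal traffic split) are assumptions that should be adjusted to your own setup.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (hypothetical helper).

    Assumes a two-sided two-proportion z-test with an equal split.
    baseline:      control conversion rate, e.g. 0.05 for 5%
    relative_lift: minimum detectable effect, e.g. 0.10 for a 10% lift
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power=0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # 5% baseline, 10% relative lift: on the order of 31,000 per variant.
    print(sample_size_per_variant(0.05, 0.10))
    # 5% baseline, 20% relative lift: on the order of 8,000 per variant.
    print(sample_size_per_variant(0.05, 0.20))
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum detectable effect is worth settling before launch.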
Attach this skill to any Metaflow agent task involving planning or executing an A/B test or experiment by referencing relevant test context or hypotheses. Expect guided prompts to define clear test objectives, calculate sample sizes, and design variants aligned with your growth goals. The skill helps ensure your test setup includes necessary tracking and monitoring steps for valid, actionable results. You can then integrate this setup with analytics and implementation workflows to complete your experiment lifecycle.
For broader context, see our roundup of Claude skills for marketing, and read Claude Code workflows for marketing agencies for related setup guidance.