When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "should I test this," "which version is better," "test two versions," "statistical significance," or "how long should I run this test." Use this whenever someone is comparing two approaches and wants to measure which performs better. For tracking implementation, see analytics-tra
This skill guides the design, setup, and execution of A/B tests and experiments to generate statistically valid and actionable insights. It helps define clear hypotheses based on user data, select primary and guardrail metrics, determine sample size and test duration, and choose appropriate variant types and traffic splits. The focus is on aligning tests with business goals, ensuring robust implementation, and analyzing results with proper statistical rigor.
This skill is ideal for growth leads planning iterative optimizations of landing pages or funnels, performance marketers running PPC campaigns who want to validate messaging or creative changes, and agency strategists managing multivariate experiments across client websites. It supports scenarios where practitioners need to decide whether to test a specific change, how to structure variants, or how long to run a test to reach reliable conclusions.
Start by assessing the test context: identify the conversion goal, baseline metrics such as the current conversion rate and traffic volume, and any constraints such as timeline or tooling. Next, write a strong hypothesis grounded in data or observations, specifying the expected impact and the target audience. Then design variants around a single meaningful change, such as headline copy, CTA design, or content order, and calculate the required sample size from the baseline rate, the minimum lift you want to detect, the significance level, and the statistical power. Finally, implement the test with a consistent traffic allocation, monitor for technical issues while it runs, and analyze the results for statistical significance, effect size, and impact on guardrail metrics before making a decision.
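The sample-size and analysis steps can be sketched in Python using only the standard library. This is a minimal, non-authoritative sketch, not the skill's own implementation. It assumes a conversion-rate metric, two variants, a two-sided test, and the common normal-approximation formulas for comparing two proportions. The function names are hypothetical. `alpha=0.05` corresponds to the 95% confidence level mentioned below, and 80% power is a conventional default rather than a rule. Guardrail metrics would need their own checks.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: visitors needed in EACH variant to detect a
    relative lift over the baseline rate with a two-sided two-proportion z-test
    (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must differ and lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Hypothetical helper: compare variant B against control A.
    Returns (absolute lift, relative lift, z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, (p_b - p_a) / p_a, z, p_value
```

For example, `sample_size_per_variant(0.03, 0.10)` asks how many visitors per variant are needed to detect a 10% relative lift on a 3% baseline. The result is on the order of tens of thousands. `two_proportion_z_test(...)` is meant to be run once, on the data collected at that planned sample size. Effect size (the lift) is returned alongside the p-value so the decision does not rest on significance alone.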
How long should I run this test? Run it until you reach the sample size calculated for your expected lift at 95% confidence. Do not stop early because results look significant: checking repeatedly and stopping at the first significant reading inflates the false-positive rate.
What if my traffic is low? Test bolder changes that can produce larger lifts, since a larger effect needs a much smaller sample. You can also run the test longer, or focus on high-impact pages that get enough traffic to meet the sample size requirement.
How do I pick the primary metric? Choose the single conversion metric most directly tied to your hypothesis; it alone determines whether the test succeeds or fails.
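The duration and low-traffic answers above can be made concrete with a rough estimate: required sample size multiplied by the number of variants, divided by daily eligible traffic. This is a sketch under stated assumptions, not a prescribed method. It assumes an even traffic split, that every daily visitor enters the test, and the same normal-approximation sample-size formula as a standard two-proportion test. It also assumes, as a common practice rather than anything stated in this skill, that the result is rounded up to whole weeks to cover weekly traffic cycles. All names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: per-variant sample size for a two-sided
    two-proportion z-test (normal approximation)."""
    p1, p2 = baseline_rate, baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def estimated_test_days(baseline_rate: float, relative_lift: float,
                        daily_visitors: int, num_variants: int = 2,
                        round_to_full_weeks: bool = True) -> int:
    """Hypothetical helper: days needed to reach the planned sample size,
    assuming traffic is split evenly across all variants."""
    total_needed = sample_size_per_variant(baseline_rate, relative_lift) * num_variants
    days = math.ceil(total_needed / daily_visitors)
    if round_to_full_weeks:
        # Assumption: cover whole weeks so weekday/weekend behavior is balanced.
        days = 7 * math.ceil(days / 7)
    return days
```

With 2,000 visitors a day and a 3% baseline, detecting a 10% relative lift takes roughly two months. A change bold enough to produce a 30% lift can be evaluated in about a week. This is why testing larger, higher-impact changes is the usual answer for low-traffic sites. Because the test is sized for a fixed horizon, its significance threshold only holds when results are read once, at the planned sample size.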
Attach this skill to a Metaflow agent task whenever you want to plan or evaluate an A/B test, from hypothesis formation through result analysis. The agent will guide you through defining test parameters, sample sizing, variant design, and interpretation of statistical significance. Expect clear recommendations on test setup and decision criteria based on your inputs and context. This skill integrates smoothly with other analytics or tracking skills to support implementation and monitoring workflows.
For broader context, see our roundup of marketing skills for Claude, and read Claude Code workflows for marketing agencies for related setup guidance.