If you’re shipping digital products and want to know what actually moves the needle—A/B testing is your friend. But running a test is only half the battle; analyzing it the right way matters just as much. This guide is for anyone who wants to use Amplitude’s experiment features to cut through the noise and get real answers from their A/B tests. Maybe you’ve used Amplitude for analytics already, or you’re just eyeing their experiments product. Either way, you’ll learn how to set up a clean test, avoid the usual mistakes, and pull out insights you can trust.
Let’s get into it.
Step 1: Get Your (Data) House in Order
Before you even think about running experiments in Amplitude, you need a solid data foundation. If your product events are a mess, your A/B test results will be, too.
Here’s what you need:
- Clear event tracking: Make sure the actions you’ll measure (sign-ups, clicks, purchases, etc.) are tracked as events in Amplitude. No guesswork: double-check everything fires correctly. (A quick sanity check is sketched after this list.)
- Consistent user IDs: Each user should have a stable, unique identifier across sessions and devices. If you’re mixing anonymous and logged-in IDs, fix that first.
- Baseline metrics: Know your current conversion rates, usage patterns, and sample sizes. You need these numbers to design a decent experiment.
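If you want that check to be more than a vibe, here’s a minimal Python sketch. It assumes you’ve pulled a small sample of raw events somehow (export, API, whatever you have); the `event_type` and `user_id` field names, the `REQUIRED_EVENTS` list, and the `audit_events` helper are all illustrative, not Amplitude’s exact schema.

```python
# Illustrative sanity check on a sample of exported events.
# Field names ("event_type", "user_id") and REQUIRED_EVENTS are assumptions,
# not Amplitude's actual export schema; adapt to whatever your export looks like.

REQUIRED_EVENTS = {"Sign Up", "Checkout Completed"}  # the events your test will measure

def audit_events(events: list[dict]) -> None:
    seen = {e.get("event_type") for e in events}
    missing = REQUIRED_EVENTS - seen
    if missing:
        print(f"WARNING: never saw these events in the sample: {missing}")

    anonymous = [e for e in events if not e.get("user_id")]
    if anonymous:
        print(f"WARNING: {len(anonymous)} of {len(events)} events have no stable user_id")

sample = [
    {"event_type": "Sign Up", "user_id": "u_123"},
    {"event_type": "Button Clicked", "user_id": None},  # anonymous event: fix ID merging first
]
audit_events(sample)
```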
Pro tip: Don’t trust your data blindly. Walk through your product yourself and watch the resulting event stream in Amplitude, or pick a random user and inspect theirs. If something looks weird, fix it now, not after the experiment.
Step 2: Define a Real Hypothesis (Not Just “Let’s Try This”)
A/B testing is about answering a question—not just rolling the dice and seeing what happens. Amplitude will track whatever you throw at it, but if you don’t have a sharp hypothesis, you’ll waste time.
Ask yourself:
- What exactly am I changing? (Button color, new onboarding flow, different copy, etc.)
- What do I expect to happen? (“This variant will increase onboarding completion by 10%.”)
- Which metric will prove it? (Pick one primary metric. Secondary metrics are fine, but don’t let them muddy the water.)
Write it down. If you can’t state your hypothesis and expected outcome in one sentence, you’re not ready to test.
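One way to force that one-sentence discipline is to write the hypothesis as a small structured record before you touch Amplitude. A minimal sketch; the fields are just a suggested template, not anything Amplitude asks for:

```python
from dataclasses import dataclass

# A suggested template for pinning a hypothesis down before the test starts.
# Nothing here is an Amplitude construct; it's just a forcing function.
@dataclass
class Hypothesis:
    change: str               # what exactly you're changing
    expected_outcome: str     # what you expect to happen, with a number attached
    primary_metric: str       # the one metric that settles the question
    min_relative_lift: float  # smallest lift worth acting on (0.10 = +10%)

h = Hypothesis(
    change="Replace the 5-step onboarding with a 3-step flow",
    expected_outcome="Onboarding completion increases by at least 10%",
    primary_metric="Onboarding Completed",
    min_relative_lift=0.10,
)
print(h)
```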
Step 3: Set Up Your Experiment in Amplitude
Now for the nuts and bolts. Amplitude Experiment lets you create and roll out feature flags and A/B tests. Here’s how to set it up without making a mess:
- Create a new experiment: In Amplitude, go to the “Experiments” section and start a new experiment.
- Define variants: Usually “A” (control) and “B” (variant), but you can add more if you’re feeling ambitious. Keep it simple—multi-variant tests get messy fast.
- Choose your targeting: Decide which users see which variant. You can target by user properties, cohorts, or randomly assign. Don’t overthink segmentation unless you have a good reason.
- Set exposure percentage: Start with a 50/50 split unless you have a reason to favor the control (e.g., risky changes). Don’t bias your test.
- Attach the right events: Tell Amplitude which conversion events to track for this experiment. Double-check you’re using the right event names.
Reality check: Feature flags and experiments in Amplitude require a decent engineering setup. If toggling a flag takes weeks, fix your deployment pipeline first.
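For what it’s worth, the app-side integration usually boils down to one branch on the assigned variant. Here’s a rough Python sketch; `get_variant()` is a hypothetical stand-in for your Experiment SDK’s variant lookup, and the flag key is made up:

```python
import zlib

# Hypothetical stand-in for the Experiment SDK's variant lookup; here it just
# fakes a stable 50/50 bucket so the sketch runs on its own.
def get_variant(user_id: str, flag_key: str) -> str:
    bucket = zlib.crc32(f"{flag_key}:{user_id}".encode()) % 2
    return "treatment" if bucket else "control"

def render_onboarding(user_id: str) -> str:
    variant = get_variant(user_id, "onboarding-redesign")
    if variant == "treatment":
        return "new 3-step onboarding flow"   # variant B
    return "existing 5-step onboarding flow"  # control (A), also the safe fallback

print(render_onboarding("u_123"))
```

The detail that matters: if the flag can’t be resolved for some reason, users should fall back to control, not to a half-rendered variant.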
Step 4: Monitor Early, But Don’t Peek at Results
Once your experiment is live, it’s tempting to check results every hour. Don’t. Early data is noisy and can easily mislead you.
What you should do:
- Monitor for technical issues: Make sure both variants are getting traffic, events are firing, and users aren’t getting stuck. This is about catching bugs, not interpreting results. (A simple sample-ratio check is sketched after this list.)
- Don’t stop early: Amplitude will show you fancy charts, but ignore the p-values and “statistical significance” messages until you hit your planned sample size and duration. Peeking early leads to false positives.
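One check worth automating is a sample-ratio check on the assignment counts: if you intended a 50/50 split and the observed split is wildly off, something in assignment or exposure logging is broken. A standard-library sketch, with made-up counts:

```python
from statistics import NormalDist

# Sample-ratio check: is the observed split consistent with the intended one?
# Catches assignment/logging bugs without peeking at conversion results.
def sample_ratio_p_value(n_control: int, n_treatment: int,
                         expected_treatment_share: float = 0.5) -> float:
    n = n_control + n_treatment
    observed = n_treatment / n
    se = (expected_treatment_share * (1 - expected_treatment_share) / n) ** 0.5
    z = (observed - expected_treatment_share) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

p = sample_ratio_p_value(n_control=5210, n_treatment=4790)  # made-up counts
if p < 0.001:
    print(f"Possible sample ratio mismatch (p={p:.2g}): check assignment and exposure logging")
else:
    print(f"Split looks consistent with the intended ratio (p={p:.3f})")
```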
How long should you run it?
- At least 1–2 weeks (to cover day-of-week effects), but base it on your traffic and conversion rate. Amplitude has a “Sample Size Calculator”: use it, or sanity-check its answer against the rough formula sketched below.
- Don’t end tests based on impatience or because “the variant looks better so far.” That’s how you fool yourself.
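If you want a ballpark before you open the in-product calculator, the standard two-proportion sample-size formula gets you close enough for planning. A sketch using only Python’s standard library; it’s the generic approximation, not Amplitude’s exact math:

```python
import math
from statistics import NormalDist

# Rough per-variant sample size for detecting a relative lift over a baseline
# conversion rate. Generic two-proportion formula, not Amplitude's calculator.
def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# e.g. 5% baseline conversion, hoping for a +10% relative lift
n = sample_size_per_variant(baseline=0.05, relative_lift=0.10)
print(f"~{n} users per variant; divide by daily eligible traffic to get a duration")
```

Small baselines and small lifts blow the required number up fast, which is exactly why “the variant looks better after two days” means nothing.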
Step 5: Analyze Results Like a Sane Person
When your experiment is done, it’s time to look at the numbers. Amplitude’s experiment reports are slick, but don’t let the interface do your thinking for you.
What to actually look for:
- Primary metric difference: Did the variant beat the control on your main outcome? Ignore all the secondary metrics at first.
- Statistical significance: Amplitude will show confidence intervals and p-values. Treat them as guides, not gospel. If your result is barely “significant,” don’t bet the business on it. (A quick way to cross-check the raw numbers is sketched after this list.)
- Direction and magnitude: How big is the effect? Is it big enough to matter, or is it a rounding error?
- Segment analysis: Only slice by segments (like device type, region, etc.) if you had a hypothesis about them before starting. Otherwise, you’re just data dredging.
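If you want to gut-check what the report shows, the core of it is just two conversion rates, their difference, and an interval around that difference. A rough sketch with made-up numbers; it’s a plain normal approximation, not a reimplementation of Amplitude’s stats engine:

```python
from statistics import NormalDist

# Difference in conversion rate with a confidence interval (normal approximation).
# A sanity check on the report, not a replacement for it.
def compare(conversions_a: int, n_a: int, conversions_b: int, n_b: int,
            confidence: float = 0.95):
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    diff = p_b - p_a
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, (diff - z * se, diff + z * se)

# Made-up numbers: control converts at 5.0%, variant at 5.6%
diff, (low, high) = compare(conversions_a=500, n_a=10_000, conversions_b=560, n_b=10_000)
print(f"lift: {diff:+.2%}, 95% CI: [{low:+.2%}, {high:+.2%}]")
# If that interval still includes 0, or its low end sits below the lift you
# actually care about, don't bet the business on it.
```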
What not to do:
- Don’t declare victory because one chart looks bigger than the other. Look at the actual numbers and intervals.
- Don’t ignore implementation bugs (“oh, the variant didn’t actually show up for half the users, but let’s ship it anyway”). If you find issues, run the test again.
Step 6: Make a Decision (and Document It)
When the test ends, make a clear call:
- Ship the variant if it clearly wins, by a margin that matters for your business (a blunt version of this rule is sketched after this list).
- Stick with control if the result is flat or negative.
- If it’s a tie, don’t be afraid to say “no change.” Not every test will be a blockbuster, and that’s fine.
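If it helps, here’s that call expressed as a blunt little function. The interval comes from whatever analysis you did, and `min_effect` is the absolute lift you decided up front was worth acting on; all the numbers are illustrative:

```python
# A blunt decision rule: ship only when the whole confidence interval clears the
# minimum lift you care about. Thresholds and inputs here are illustrative.
def decide(ci_low: float, ci_high: float, min_effect: float) -> str:
    if ci_low >= min_effect:
        return "ship the variant"
    if ci_high <= 0:
        return "stick with control"
    return "no change: flat or inconclusive"

# Using the interval from the earlier sketch and a 0.5-point minimum lift
print(decide(ci_low=-0.0002, ci_high=0.0122, min_effect=0.005))  # -> no change
```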
Write down:
- What you tested
- Your hypothesis
- The results (both the numbers and your read on them)
- The decision, and why you’re making it
This avoids endless “wait, didn’t we test that already?” debates down the line.
Step 7: Learn and Iterate (Don’t Chase Unicorns)
The dirty secret: Most A/B tests won’t move your main metric. That’s normal. The real power is in learning what doesn’t work, so you stop spinning your wheels.
A few things to remember:
- Don’t run endless tests on trivial stuff (like button color) and expect miracles.
- Focus your efforts on high-impact changes: big copy shifts, new flows, radical simplifications.
- Use what you learn to plan smarter, not just to rack up “significant” results.
Ignore the hype: Amplitude has plenty of features—automated test suggestions, advanced targeting, statistical jargon. Most teams don’t need half of it. Get good at the basics: clean events, sharp hypotheses, honest analysis.
Wrapping Up
A/B testing with Amplitude Experiment isn’t magic, but it’s the most reliable way to measure what’s actually working in your product. Keep your setup tight, your hypotheses sharp, and your analysis honest. Don’t get lost in dashboards or chase “stat sig” for its own sake. Keep things simple, ship often, and let the real users tell you what works. That’s how you get better, one test at a time.