ESSAY № 004 · 3 MINUTES · NOVEMBER 2025

5 A/B Testing Mistakes That Cost You Revenue

Statistical significance isn't enough. These five common A/B testing errors lead to false conclusions and missed revenue opportunities.

By AJ Magnuson

Founder · Growth & AI

A/B testing is the backbone of growth experimentation, but the gap between running tests and running them well is enormous. Here are five mistakes I see repeatedly — each one capable of leading you to confidently wrong conclusions.

1. Peeking at Results Too Early

This is the most common and most dangerous mistake. Checking your results daily and stopping as soon as you see significance dramatically inflates your false positive rate. A test designed for 95% confidence (a 5% false positive rate) that's checked daily for a week has an effective false positive rate of roughly 15-20%, about three times the nominal rate, and checking more often pushes it past 30%.
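You can see the inflation for yourself with a small simulation. This is a minimal Monte Carlo sketch of an A/A test (both arms identical, so every "win" is a false positive) in which the analyst runs a two-sided z-test after each batch of traffic and stops at the first significant result. It assumes equal-sized batches and models the cumulative z-statistic as a random walk, the usual large-sample approximation for equally spaced looks; the function name and its parameters are illustrative, not from any library.

```python
import math
import random


def peeking_false_positive_rate(looks: int, sims: int = 20_000,
                                z_crit: float = 1.96, seed: int = 0) -> float:
    """Estimate how often an A/A test (no real difference between arms) is
    declared significant when the analyst runs a two-sided z-test after each
    of `looks` equal-sized batches and stops at the first significant result.

    Modelling assumption: each batch adds an independent N(0, 1) increment to
    a running sum S, so the z-score after k batches is S_k / sqrt(k).
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(sims):
        running_sum = 0.0
        for k in range(1, looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            if abs(running_sum / math.sqrt(k)) > z_crit:
                false_positives += 1
                break  # the analyst stops early and "ships the winner"
    return false_positives / sims


if __name__ == "__main__":
    for looks in (1, 7, 30):
        rate = peeking_false_positive_rate(looks)
        print(f"{looks:>2} looks -> {rate:.1%} false positive rate")
```

With a single look the estimate should sit near the nominal 5%; with seven daily looks it comes out roughly three times higher, and it keeps rising as the number of looks grows.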

Fix: Pre-register your sample size and runtime. Don't peek. If you must monitor, use sequential testing methods that account for multiple looks.
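Pre-registering a sample size means doing the power calculation before launch. Here's a minimal sketch using the standard normal-approximation formula for a two-sided, two-proportion z-test with equal allocation. The helper names are hypothetical, and the 5% baseline, 10% relative lift and 5,000 daily visitors in the example are placeholders you'd replace with your own numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH arm to detect `relative_lift` over a
    `baseline` conversion rate with a two-sided two-proportion z-test
    (normal approximation, equal allocation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile that gives the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_needed(n_per_arm: int, arms: int, daily_visitors: int) -> int:
    """Runtime in days if `daily_visitors` are split evenly across `arms`."""
    return math.ceil(n_per_arm * arms / daily_visitors)


if __name__ == "__main__":
    n = sample_size_per_arm(baseline=0.05, relative_lift=0.10)
    print(n, "visitors per arm;", days_needed(n, arms=2, daily_visitors=5_000), "days")
```

With those placeholder inputs it asks for roughly 31,000 visitors per arm, about two weeks at 5,000 visitors a day. Write both numbers down before launch and read the result only once you reach them.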

2. Ignoring Segment Effects

Your test might show a flat overall result while hiding a +20% lift in one segment and a -15% drop in another. Offsetting effects like these are more common than you'd think. Worse, when traffic composition shifts during the test, the aggregate can point the opposite way from every individual segment: Simpson's Paradox.

Fix: Always break down results by key segments (device, geography, user tenure, plan type). Pre-register the segments you'll analyze.
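A per-segment breakdown needs nothing more than grouped tallies. The sketch below uses invented numbers (not results from a real test), deliberately chosen so that allocation between arms differs by segment, as can happen when the traffic mix shifts mid-test while a ramp changes the split. The dictionary layout and function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical tallies, invented for illustration: (segment, arm) -> (visitors, conversions)
RESULTS = {
    ("desktop", "A"): (800, 80),
    ("desktop", "B"): (200, 22),
    ("mobile", "A"): (200, 4),
    ("mobile", "B"): (800, 20),
}


def rates_by_segment(results):
    """Return per-segment conversion rates and the pooled (overall) rates per arm."""
    segments = defaultdict(dict)
    pooled = defaultdict(lambda: [0, 0])  # arm -> [visitors, conversions]
    for (segment, arm), (visitors, conversions) in results.items():
        segments[segment][arm] = conversions / visitors
        pooled[arm][0] += visitors
        pooled[arm][1] += conversions
    overall = {arm: conv / vis for arm, (vis, conv) in pooled.items()}
    return dict(segments), overall


def relative_lift(rates):
    """Relative lift of arm B over arm A."""
    return rates["B"] / rates["A"] - 1


if __name__ == "__main__":
    segments, overall = rates_by_segment(RESULTS)
    for name, rates in segments.items():
        print(f"{name:<8} lift {relative_lift(rates):+.0%}")
    print(f"{'overall':<8} lift {relative_lift(overall):+.0%}")
```

With these numbers B leads by 10% on desktop and 25% on mobile, yet the pooled rates show it 50% behind, because most of B's traffic came from the low-converting segment.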

3. Testing Too Many Variants

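Running 6 variants might feel like you're moving faster, but you're actually diluting statistical power. Each arm gets a third or less of the traffic it would get in an A/B test, and comparing several variants against the control requires a multiple-comparison correction, which raises the sample each arm needs. With the same traffic, a 6-variant test needs roughly 5x the runtime of an A/B test to reach the same confidence level.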
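The runtime penalty is two factors multiplied together: the traffic split and the stricter per-comparison significance level. Here's a rough sketch, assuming equal allocation, each variant compared only against the control, and a Bonferroni correction (the simplest and most conservative choice); the function names are hypothetical.

```python
from statistics import NormalDist


def _n_per_arm_factor(alpha: float, power: float) -> float:
    """(z_{1-alpha/2} + z_{power})^2: the per-arm sample size scales with this."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def runtime_multiplier(arms: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Traffic (and so runtime) an `arms`-arm test needs relative to a two-arm
    A/B test, assuming equal allocation, every variant compared against the
    control, and a Bonferroni correction over the arms - 1 comparisons."""
    comparisons = arms - 1
    per_arm_ratio = _n_per_arm_factor(alpha / comparisons, power) / _n_per_arm_factor(alpha, power)
    return (arms / 2) * per_arm_ratio


if __name__ == "__main__":
    for arms in (2, 3, 6, 7):
        print(f"{arms} arms -> {runtime_multiplier(arms):.1f}x the runtime of an A/B test")
```

Under these assumptions six arms land at about 4.5x and seven arms (six variants plus a control) at about 5.4x, consistent with the rough 5x figure. Less conservative corrections shrink the penalty somewhat, but the traffic split alone already costs 3x or more.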

Fix: Limit to 2-3 variants maximum. Use the extra power to detect smaller effects, which is usually more valuable.

4. Wrong Success Metric

Optimizing for clicks when you should optimize for purchases. Optimizing for trial starts when you should optimize for trial-to-paid conversion. The proxy metric you choose can lead you to a local maximum that's far from the global one.
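A toy comparison makes the trap concrete: the same two arms can produce opposite "winners" depending on which metric you score them on. The numbers below are invented, the `ArmStats` class and `winner` helper are hypothetical, and the sketch compares point estimates only, with no significance testing.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    visitors: int
    clicks: int
    purchases: int
    revenue: float

    @property
    def click_rate(self) -> float:
        return self.clicks / self.visitors

    @property
    def revenue_per_visitor(self) -> float:
        return self.revenue / self.visitors


def winner(arms: dict, metric: str) -> str:
    """Name of the arm with the highest value of `metric` (point estimates only)."""
    return max(arms, key=lambda name: getattr(arms[name], metric))


# Hypothetical results: variant B draws more clicks but fewer purchases.
ARMS = {
    "A": ArmStats(visitors=10_000, clicks=1_200, purchases=300, revenue=15_000.0),
    "B": ArmStats(visitors=10_000, clicks=1_500, purchases=270, revenue=13_500.0),
}

if __name__ == "__main__":
    for metric in ("click_rate", "revenue_per_visitor"):
        print(f"{metric}: {winner(ARMS, metric)} wins")
```

Scored on click rate, B wins; scored on revenue per visitor, A does. The proxy would have you ship the variant that loses money.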

Fix: Use your north star metric or the closest reliable proxy. Accept longer test durations for better decisions.

5. Ignoring Novelty and Primacy Effects

A redesigned feature often sees an initial spike (novelty effect) or initial drop (primacy/change aversion) that normalizes over 2-4 weeks. Ending your test before these effects wash out gives you misleading results.

Fix: Run tests for at least 2-3 full business cycles (typically 2-4 weeks minimum). Compare later-period results against early-period results to detect novelty effects.
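Comparing early-period and later-period results can be as simple as computing the lift per week and checking whether it drifts. The sketch below assumes weekly conversion rates per arm; the function names, the one-week "early" window and the 50% drift threshold are arbitrary illustrative choices. In practice each period's lift is noisy, so you'd want a confidence interval on each before reading too much into the drift.

```python
from statistics import mean


def period_lifts(control_rates, treatment_rates):
    """Relative lift of treatment over control for each period (e.g. each week)."""
    return [t / c - 1 for c, t in zip(control_rates, treatment_rates)]


def effect_drift(lifts, early_periods=1, tolerance=0.5):
    """Compare the mean lift in the first `early_periods` periods with the mean
    lift afterwards. Flags drift when they differ by more than `tolerance`
    times the size of the early lift (an arbitrary, illustrative threshold).
    Catches both a fading spike (novelty) and a recovering dip (primacy)."""
    early = mean(lifts[:early_periods])
    late = mean(lifts[early_periods:])
    drifted = abs(late - early) > tolerance * abs(early)
    return early, late, drifted


if __name__ == "__main__":
    # Hypothetical weekly conversion rates for a redesign with a novelty spike.
    control = [0.050, 0.050, 0.051, 0.050]
    treatment = [0.060, 0.054, 0.052, 0.051]
    early, late, drifted = effect_drift(period_lifts(control, treatment))
    print(f"week 1 lift {early:+.1%}, weeks 2-4 lift {late:+.1%}, drift flagged: {drifted}")
```

In the hypothetical data the week-one lift of about 20% fades to a few percent afterwards, which is the pattern a test stopped after one week would have missed.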

The meta-lesson: slow down. A single well-run test that gives you a reliable answer is worth ten fast tests that give you noise dressed up as signal.


Cite as · Magnuson 2025 · Omega Point Writing № 004

Experimentation · A/B Testing · Analytics