ab-testing · cro · conversion-rate · ecommerce

Why Most A/B Tests Don't Show a Winner (and the Fix)

Your A/B test isn't showing a winner because it's underpowered. Here's the sample size math, traffic thresholds, and how long to actually wait.

April 28, 2026

You've been running an A/B test for three weeks. Variant B is up 4%, but the tool says "not significant." You want to ship it. Should you?

Probably not — and the reason your A/B test isn't showing a winner usually has nothing to do with the variant itself. It's the math. Most e-commerce A/B tests on small and mid-size stores never have a chance of finishing. They're underpowered from the moment they start, and no realistic amount of waiting will fix them.

The sample size problem in plain language

Statistical significance is a function of three things: how big a difference you're trying to detect, how many people see each variant, and how noisy the metric is. If you want to detect a small lift on a small store, you need a long time and a lot of patience. If the math says you need 80,000 sessions per variant and you have 2,000, no result is coming.

Here's the rough rule. To reliably detect a 10% lift on a metric that converts at 3% (typical Salla product page add-to-cart), you need roughly 50,000 sessions per variant at 95% confidence and 80% power. To detect a 5% lift, you need over 200,000. Most stores running tests never check this number before they start.
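If you'd rather not trust a rough rule, run the calculation for your own numbers. Here's a minimal sketch in Python using statsmodels, assuming a two-sided test at 95% confidence and 80% power; swap in your own baseline rate and the smallest lift you'd actually act on.

```python
# Sample size per variant for a two-proportion test,
# using statsmodels' standard power machinery.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.03          # current add-to-cart rate on the page you want to test
relative_lift = 0.10     # smallest lift you care about detecting (10% relative)
variant = baseline * (1 + relative_lift)

# Cohen's h: standardized difference between the two proportions
effect = proportion_effectsize(variant, baseline)

n_per_variant = NormalIndPower().solve_power(
    effect_size=effect,
    alpha=0.05,   # 95% confidence, two-sided
    power=0.80,   # 80% chance of detecting a real lift of this size
)
print(f"{n_per_variant:,.0f} sessions per variant")  # ≈ 53,000 for these inputs
```

If the number it prints is larger than the traffic you can realistically send to the page, fix the test design before you start, not after.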

Why "it looks like B is winning" lies to you

Conversion data is noisy. With small samples, the lead bounces around dramatically. Variant B might be up 12% on Tuesday, down 3% on Friday, up 6% on Monday — and none of those numbers mean anything yet. They're just early readings of a coin you haven't flipped enough times.

Stopping a test early because B "looks like the winner" is the most expensive mistake in CRO. You don't see the cost — you just ship losers and wonder why revenue stays flat.

This is called peeking, and it's the reason teams convince themselves their CRO program is working when it isn't. Every time you check a test and let the result influence your decision, you increase the chance of a false positive. Decide your end date upfront and don't move it.
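If you want to convince yourself (or a stakeholder) how costly peeking is, simulate it. The sketch below runs A/A tests where both variants are identical, so any "winner" is a false positive; the traffic and conversion numbers are made up and the exact rates will vary, but checking daily reliably produces several times more false winners than a single pre-committed look.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
DAYS, SESSIONS_PER_DAY, RATE = 28, 500, 0.03   # both variants identical (A/A)
N_EXPERIMENTS = 2000

peek_winners = end_winners = 0
for _ in range(N_EXPERIMENTS):
    # cumulative conversions per variant, day by day
    a = rng.binomial(SESSIONS_PER_DAY, RATE, DAYS).cumsum()
    b = rng.binomial(SESSIONS_PER_DAY, RATE, DAYS).cumsum()
    sessions = SESSIONS_PER_DAY * np.arange(1, DAYS + 1)

    # "Peeking": test every day, stop at the first p < 0.05
    peeked = False
    for day in range(DAYS):
        _, p = proportions_ztest([a[day], b[day]], [sessions[day]] * 2)
        if p < 0.05:
            peeked = True
            break
    peek_winners += peeked

    # Disciplined: one test, on the pre-committed end date only
    _, p = proportions_ztest([a[-1], b[-1]], [sessions[-1]] * 2)
    end_winners += p < 0.05

print(f"False 'winners' with daily peeking: {peek_winners / N_EXPERIMENTS:.0%}")
print(f"False 'winners' testing once at the end: {end_winners / N_EXPERIMENTS:.0%}")
```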

Three honest options when traffic is low

If you're a small store, the standard CRO playbook doesn't quite fit. You have three real choices, and pretending you have more is how time gets wasted.

  1. Test bigger changes. A 30% redesign of the product page might lift add-to-cart by 15%, which is detectable on small samples. A button color change won't be. Aim higher.
  2. Test on the highest-traffic page only. Most stores have one page (often the homepage or top category) that gets 60%+ of all sessions. Run every test there until you grow.
  3. Roll up the goal. Instead of measuring "completed orders" — which only a tiny fraction of visitors do — measure "add-to-cart" or "reached checkout." These convert 5–15× higher and finish 5–15× faster.

None of these are cheating. They're matching the test to the store you actually have, not the store in the textbook examples.
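Option 3 is easy to sanity-check with the same power calculation as before. The baseline rates below are hypothetical, but the pattern holds: for a fixed relative lift, the required sample shrinks roughly in proportion to how common the goal is.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

LIFT = 0.10  # same 10% relative lift for every goal

# Hypothetical baseline rates for goals at different funnel depths
goals = {"completed orders": 0.015, "reached checkout": 0.05, "add-to-cart": 0.09}

for goal, rate in goals.items():
    effect = proportion_effectsize(rate * (1 + LIFT), rate)
    n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80)
    print(f"{goal:>18}: ~{n:,.0f} sessions per variant")
```

The deeper the goal sits in the funnel, the rarer it is, and the more traffic the same test needs to finish.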

There's a fourth option people often overlook: bundling small tests into one larger one. Instead of testing a headline change alone, then a button change alone, then a shipping line change — try a fully redesigned product page that combines all three against the current version. The total effect is bigger, so it shows up faster. The downside: you won't know which specific change drove the result. But for a store starving for traffic, knowing "this bundle wins" is more useful than "I don't know because no test ever finished."

How long to actually wait

The honest waiting period is whichever of these is longest:

  • Two full business cycles. For most MENA stores, that's two weeks — covering both weekends and at least one salary week.
  • The pre-calculated sample size. Use any free sample size calculator before you start. If it says you need six weeks, plan for six weeks.
  • One full inventory or campaign cycle. If you ran a sale during the test, the test is contaminated. Wait until shopping behavior normalizes again.

If after all that the test still says "no winner," believe it. "No winner" is a result. It means the change you tested didn't matter as much as you thought, and you should test something bolder next.

When the metric you measured is wrong

Sometimes the test really did finish — the math is fine, the time was right — but you measured the wrong thing. We see this constantly with shipping and pricing tests. The variant lifts add-to-cart by 8% but tanks completed orders by 5%, because shoppers added more, then bailed at checkout when they saw the real total.

Always look at the metric one step closer to revenue. If you tested something on the product page, check both add-to-cart and orders. If you tested in the cart, check both checkout starts and orders. A win at the top that becomes a loss at the bottom is a loss.
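In practice this means running the same comparison on two metrics instead of one. A minimal sketch, assuming you can export per-variant session, add-to-cart, and order counts from your analytics; the counts below are invented to mirror the example above.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts exported from analytics for one finished test
sessions = {"A": 41_800, "B": 41_950}
add_to_cart = {"A": 3_030, "B": 3_310}   # variant B looks better here...
orders = {"A": 1_254, "B": 1_190}        # ...but worse one step closer to revenue

for name, counts in [("add-to-cart", add_to_cart), ("completed orders", orders)]:
    _, p = proportions_ztest(
        [counts["B"], counts["A"]], [sessions["B"], sessions["A"]]
    )
    rate_a = counts["A"] / sessions["A"]
    rate_b = counts["B"] / sessions["B"]
    lift = rate_b / rate_a - 1
    print(f"{name}: B vs A lift {lift:+.1%}, p = {p:.3f}")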

Look for post-purchase impact too. A variant that lifts conversion but attracts a different kind of buyer might look like a winner in week one, then reveal itself through higher returns or lower repeat purchase rate. You don't need to wait two months before declaring a test done, but before rolling the winner to 100% of traffic, check return rates and support tickets in the two weeks after the test ends.

Before you even start a test, sometimes the smarter move is to figure out where the problem is, not what to change. Watching real shoppers is often faster than guessing — see heatmaps vs session recordings vs A/B tests for which tool answers which question.

Next steps

Before your next test, run two numbers. First, your weekly sessions on the page you want to test. Second, the sample size needed for the lift you'd realistically expect. If those don't match, change the test — bigger change, busier page, or a goal closer to the funnel top. Don't run tests that can't finish.
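Those two numbers, and the comparison between them, fit in a few lines. The weekly traffic figure below is a placeholder; everything else follows the same assumptions as the earlier calculations.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

weekly_sessions = 4_000   # sessions/week on the page you want to test (hypothetical)
baseline = 0.03           # current conversion rate for the chosen goal
lift = 0.10               # smallest relative lift you'd act on

effect = proportion_effectsize(baseline * (1 + lift), baseline)
n_per_variant = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80)

weeks = 2 * n_per_variant / weekly_sessions   # both variants share the page's traffic
print(f"Needed: {n_per_variant:,.0f} sessions per variant (~{weeks:.0f} weeks)")
if weeks > 8:
    print("This test can't finish in a reasonable time: test a bigger change, "
          "a busier page, or a goal higher in the funnel.")
```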