
What "statistical significance" actually means

Most teams stop tests too early because they misread the number on the dashboard. Here's the mental model that fixes it.

April 28, 2026

"95% confidence" sounds like "we are 95% sure variant B is better." It is not. It is a much narrower claim, and confusing the two is why most teams ship false winners.

The actual definition

A 95% confidence result means: if there were truly no difference between the variants, we would see a result at least this lopsided less than 5% of the time by random chance alone.

That is not the same as "B is 95% likely to be better." It's a statement about how likely your data would be in a world where there's no real effect.
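
To make that concrete, here's a minimal simulation of that null world. All the numbers are made up for illustration: both variants truly convert at 10%, and we pretend one experiment showed a 0.8-point gap.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical numbers: both variants truly convert at 10%,
# and one experiment "observed" a 0.8-point gap.
true_rate = 0.10
n_per_variant = 5_000
observed_diff = 0.008

# Replay the experiment 100,000 times in a world with NO real
# difference, and count how often chance alone produces a gap
# at least as lopsided as the observed one. That share is the
# p-value the dashboard is reporting.
n_sims = 100_000
a = rng.binomial(n_per_variant, true_rate, n_sims) / n_per_variant
b = rng.binomial(n_per_variant, true_rate, n_sims) / n_per_variant
p_value = np.mean(np.abs(b - a) >= observed_diff)

print(f"Share of null-world runs this lopsided: {p_value:.3f}")
```

Nothing in that number says how likely B is to beat A. It only describes how surprising your data would be if nothing were going on.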

Why this matters in practice

Two failure modes follow from misreading this:

  1. Stopping early. If you peek at results every day and stop the moment significance crosses 95%, you will declare winners that aren't real, again and again (the simulation after this list shows how badly). The math assumes you commit to a sample size up front.
  2. Confusing significance with impact. A variant that converts 0.05 percentage points better can become "significant" with enough traffic. Significant ≠ meaningful.
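
The first failure mode is easy to demonstrate. Here's a sketch under made-up conditions: an A/A test where neither variant is better, 1,000 visitors per variant per day, checked every day for 20 days with a two-proportion z-test.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical A/A test: NO real difference between variants.
rate, daily_n, days, n_sims = 0.10, 1_000, 20, 2_000

def peeking_declares_winner() -> bool:
    conv_a = conv_b = n = 0
    for _ in range(days):
        conv_a += rng.binomial(daily_n, rate)
        conv_b += rng.binomial(daily_n, rate)
        n += daily_n
        # Two-proportion z-test on the data accumulated so far.
        p_pool = (conv_a + conv_b) / (2 * n)
        se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
        z = (conv_b - conv_a) / n / se
        if 2 * norm.sf(abs(z)) < 0.05:  # crossed "95% significance"
            return True  # we stop the test and ship a phantom winner
    return False

false_rate = sum(peeking_declares_winner() for _ in range(n_sims)) / n_sims
print(f"A/A tests that 'found' a winner: {false_rate:.1%}")  # well above 5%
```

With a single pre-committed check, the false-winner rate would sit near 5%. Daily peeking multiplies the chances to get fooled.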

What to do instead

  • Decide your minimum sample size before launching (a sketch of the calculation follows this list). Stick to it.
  • Look at the effect size first. A 5% lift that's not yet significant is more useful than a 0.1% lift that is.
  • Run the test for at least one full business cycle (typically a week). Weekday/weekend traffic patterns will lie to you otherwise.
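
For the first bullet, the up-front number comes from a standard power calculation. Here's a sketch for a two-sided two-proportion z-test; the baseline rate and the smallest lift you care about are assumptions you supply, not outputs.

```python
import math
from scipy.stats import norm

def sample_size_per_variant(base_rate: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect `relative_lift` over
    `base_rate` with a two-sided two-proportion z-test."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for 95% confidence
    z_power = norm.ppf(power)          # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical example: 10% baseline, smallest lift we care about is
# 5% relative (10.0% -> 10.5%).
print(sample_size_per_variant(0.10, 0.05))  # roughly 58,000 per variant
```

Note how the requirement scales with the inverse square of the lift: detecting a 0.5% relative lift at the same baseline needs about 100x the traffic. That is failure mode 2 from the other direction: with enough traffic, even trivial lifts go significant.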