SuperTry

What sample size do you need for a reliable product test?

How many testers does a product test really need to be statistically reliable? Methodology, formulas and category benchmarks.

SuperTry Team · 3 min read

"How many testers do I need?" is the first question DTC brands ask when launching a product test. The honest answer: it depends on what you're trying to measure. Here's how to size your sample without overspending or undersizing.

The rule of 30, and its limits

The commonly cited rule of thumb, "at least 30 testers", comes from the Central Limit Theorem: with roughly 30 or more observations, the sampling distribution of the mean is close to normal, even when the underlying population isn't.

But 30 testers are only enough if you're measuring an average score (satisfaction, propensity to buy) with moderate variance.

To detect a rare defect (say, 1 tester in 20 has an allergic reaction), 30 testers are far too few: with a 5% incidence, there is roughly a 1-in-5 chance that none of them reacts at all. You need at least 100, often 200.
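The arithmetic behind this is the probability of observing at least one case among n testers, 1 − (1 − rate)^n. The sketch below assumes testers react independently with a fixed incidence; the function names are illustrative and not part of any SuperTry tool.

```python
from math import ceil, log


def prob_at_least_one(rate: float, n: int) -> float:
    """Chance that at least one of n testers shows an event of incidence `rate`,
    assuming testers react independently of each other."""
    return 1 - (1 - rate) ** n


def min_testers_to_observe(rate: float, confidence: float = 0.95) -> int:
    """Smallest n giving at least `confidence` probability of seeing one case."""
    return ceil(log(1 - confidence) / log(1 - rate))


for n in (30, 100, 200):
    print(n, round(prob_at_least_one(0.05, n), 3))
print(min_testers_to_observe(0.05))
```

With a 5% incidence, 30 testers see at least one case only about 78.5% of the time, while 100 testers push that above 99%, and 59 testers are the minimum for a 95% chance. Spotting a single case is only the floor, though; estimating how often the reaction occurs with useful precision takes considerably more testers.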

The simple formula to know

For a proportion (% of satisfied testers, % who would repurchase), the required sample size is:

n = (Z² × p × (1−p)) / e²
  • Z = 1.96 for 95% confidence
  • p = expected proportion (e.g. 0.7 for 70% satisfaction)
  • e = acceptable margin of error (e.g. 0.1 for ±10 points)

Concrete example: to estimate a satisfaction rate expected around 70% to within ±10 points, you need:

n = (1.96² × 0.7 × 0.3) / 0.1² ≈ 80.7, rounded up to 81 testers

Halving the margin to ±5 points roughly quadruples the count: 323 testers.
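A minimal sketch of the formula in Python, assuming a two-sided confidence level and using the standard library's `statistics.NormalDist` to derive Z (1.95996 for 95%, which gives the same results as the rounded 1.96 here). The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Testers needed to estimate a proportion p to within +/- margin."""
    # Two-sided critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = Z^2 * p * (1 - p) / e^2, rounded up because testers come in whole units.
    return ceil(z**2 * p * (1 - p) / margin**2)


print(sample_size_proportion(0.7, 0.10))  # 81
print(sample_size_proportion(0.7, 0.05))  # 323
print(sample_size_proportion(0.5, 0.10))  # 97
```

When you have no prior estimate of p, p = 0.5 is the conservative choice: p × (1 − p) peaks there, so it yields the largest sample (97 testers for ±10 points).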

Benchmarks by category

Based on SuperTry data across 2,000 campaigns in 2024-2025:

| Category | Recommended size (testers) | Why |
| --- | --- | --- |
| Packaging test (strong signals) | 30-50 | Low variance, homogeneous feedback |
| Cosmetic product test | 50-100 | Variable skin sensitivities |
| Food test | 80-150 | Highly subjective tastes |
| Health claim test | 200+ | High variance + regulatory risk |
| Children / baby test | 100+ | Safety-critical |

The 4 parameters that change everything

1. Type of measure

  • Continuous measure (1-10 score) → a smaller sample usually suffices
  • Binary measure (yes/no, bought/not bought) → a larger sample is required
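For a continuous score, the usual analogue of the proportion formula swaps p × (1 − p) for the variance σ² of the score, giving n = (Z × σ / e)². The sketch assumes you have a rough σ from a pilot or a past campaign; σ = 1.5 and σ = 2 on a 1-10 scale, and the ±0.5-point target, are hypothetical values.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Testers needed to estimate an average score to within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


# Hypothetical spreads for a 1-10 satisfaction score, target precision +/- 0.5 points.
print(sample_size_mean(1.5, 0.5))  # 35
print(sample_size_mean(2.0, 0.5))  # 62
```

Because σ enters squared, a panel whose scores spread twice as widely needs four times as many testers, which is why heterogeneous profiles push the sample size up.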

2. Expected variance

The more heterogeneous the profiles (ages, regions, sensitivities), the larger the sample must be.

3. Effect size sought

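Detecting a 30-point gap between two versions is easy. Detecting a 5-point gap takes about 36 times more testers, because the required sample grows with the inverse square of the gap: (30 / 5)² = 36.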
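To compare two versions on a yes/no measure, a standard textbook formula for the testers needed per version is n = (Z_α/2 + Z_β)² × (p₁(1 − p₁) + p₂(1 − p₂)) / (p₁ − p₂)². The sketch assumes 5% two-sided significance and 80% power, which are common defaults rather than values the article specifies; the baseline proportions are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def per_version_n(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Testers per version to detect p1 vs p2 with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


print(per_version_n(0.50, 0.80))  # 30-point gap: 36 per version
print(per_version_n(0.50, 0.55))  # 5-point gap: 1562 per version
```

The ratio here is about 43 rather than exactly 36 because p × (1 − p) also changes with the proportions; 36× is the effect of the gap shrinking sixfold on its own.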

4. Sub-segments

If you want to analyze women 25-34 separately from men 45-54, each sub-segment, not just the overall sample, must reach the minimum size.
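A sketch of how sub-segments inflate the total, assuming the 81-tester minimum from the worked example applies to each segment and that testers are recruited at random, so each segment shows up in proportion to its share of the panel. The 30% and 15% shares are hypothetical; direct quota recruitment is shown for contrast.

```python
def total_for_segments(min_per_segment: int, share_pct: dict[str, int]) -> int:
    """Total testers so every segment reaches min_per_segment, if segments
    turn up in proportion to their share of the panel (integer percentages)."""
    # Integer ceiling division keeps the result exact.
    return max(-(-min_per_segment * 100 // pct) for pct in share_pct.values())


def total_with_quotas(min_per_segment: int, n_segments: int) -> int:
    """Total testers if each segment is recruited directly to its quota."""
    return min_per_segment * n_segments


shares = {"women 25-34": 30, "men 45-54": 15}  # hypothetical panel shares
print(total_for_segments(81, shares))      # 540: the smallest segment drives the total
print(total_with_quotas(81, len(shares)))  # 162
```

With random recruitment, the rarest segment sets the total. Recruiting each segment to a quota keeps the total close to the sum of the per-segment minimums.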

The SuperTry method in 3 steps

  1. Define the hypothesis: "70% of testers will prefer version A" is a testable hypothesis.
  2. Compute the minimum size with the formula above (or our built-in calculator).
  3. Run in two waves: recruit the first 50% of the sample. If the results are clear, stop; otherwise, complete the sample with the second wave.
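The article does not define what counts as "clear" after the first wave. The sketch below assumes one possible rule: compute an approximate confidence interval for the preference rate and stop only if the whole interval sits above or below the target from step 1. Reusing 1.96 at each look inflates the false-positive rate, so the sketch uses 2.178, the Pocock critical value for two equally spaced looks at an overall 5% two-sided level. The wave size of 81 and the counts are hypothetical.

```python
from math import sqrt


def first_wave_decision(prefer_a: int, n: int, target: float, z: float = 2.178) -> str:
    """Stop after wave 1 only if the interval for the preference rate excludes target."""
    p_hat = prefer_a / n
    # Normal-approximation (Wald) interval; z widened to account for two looks.
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - half_width > target:
        return "stop: clearly above target"
    if p_hat + half_width < target:
        return "stop: clearly below target"
    return "continue: run the second wave"


print(first_wave_decision(70, 81, 0.70))  # 86% prefer A -> stop: clearly above target
print(first_wave_decision(58, 81, 0.70))  # 72% prefer A -> continue: run the second wave
print(first_wave_decision(40, 81, 0.70))  # 49% prefer A -> stop: clearly below target
```

The Wald interval keeps the example short; a Wilson interval or an exact binomial test behaves better with small waves or rates near 0% or 100%.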

This approach cuts the average budget by a factor of 1.5 (roughly a third), with no loss of reliability.

Bottom line

A good sample size is never "30 by default" or "as many as possible". It's the result of a calculation tied to what you want to prove. A few minutes of upfront methodology beat weeks of re-testing caused by a poorly calibrated sample.
