"How many testers do I need?" is the first question DTC brands ask when launching a product test. The honest answer: it depends on what you're trying to measure. Here's the method to neither overspend nor undersize.
The rule of 30, and its limits
The widely circulated rule of thumb, "at least 30 testers," comes from the Central Limit Theorem: with roughly 30 or more observations, the distribution of sample means is approximately normal, even when the underlying population isn't.
But 30 testers are only enough if you're measuring an average score (satisfaction, purchase intent) with moderate variance.
To detect a rare defect (say, 1 tester in 20 has an allergic reaction), 30 testers are clearly insufficient: you would expect only one or two occurrences. You need at least 100, often 200.
The simple formula to know
For a proportion (% of satisfied testers, % who would repurchase), required sample size is:
n = (Z² × p × (1−p)) / e²
- Z = 1.96 for 95% confidence
- p = expected proportion (e.g. 0.7 for 70% satisfaction)
- e = acceptable margin of error (e.g. 0.1 for ±10 points)
Concrete example: to estimate a satisfaction rate expected to be around 70% to within ±10 points, you need:
n = (1.96² × 0.7 × 0.3) / 0.1² ≈ 81 testers
To go down to ±5 points, the count quadruples: 323 testers.
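The formula above is easy to script. The sketch below is one possible implementation, with a hypothetical function name; it rounds up because you cannot recruit a fraction of a tester.

```python
import math


def sample_size_for_proportion(p: float, margin: float, z: float = 1.96) -> int:
    """Testers needed to estimate a proportion p to within +/- margin.

    Implements n = (Z^2 * p * (1 - p)) / e^2, rounded up.
    z defaults to 1.96, i.e. 95% confidence.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if margin <= 0.0:
        raise ValueError("margin must be positive")
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


if __name__ == "__main__":
    print(sample_size_for_proportion(0.7, 0.10))  # 81
    print(sample_size_for_proportion(0.7, 0.05))  # 323
    print(sample_size_for_proportion(0.5, 0.10))  # 97
```

If you have no prior estimate of p, using p = 0.5 is the conservative choice: it maximizes p × (1−p) and therefore gives the largest sample size for a given margin.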
Benchmarks by category
Based on SuperTry data across 2,000 campaigns in 2024-2025:
| Category | Recommended size | Why |
|---|---|---|
| Packaging test (strong signals) | 30-50 | Low variance, homogeneous feedback |
| Cosmetic product test | 50-100 | Variable skin sensitivities |
| Food test | 80-150 | Highly subjective tastes |
| Health claim test | 200+ | High variance + regulatory risk |
| Children / baby test | 100+ | Safety-critical |
The 4 parameters that change everything
1. Type of measure
- Continuous measure (1-10 score) → smaller sample suffices
- Binary measure (yes/no, bought/not bought) → larger sample required
2. Expected variance
The more heterogeneous the profiles (ages, regions, sensitivities), the larger the sample must be.
3. Effect size sought
Detecting a 30% gap between two versions is easy. Detecting a 5% gap takes 36× more testers, because the required sample size grows with the inverse square of the gap: (30/5)² = 36.
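The 36× figure follows from the inverse-square relationship between sample size and the gap you want to detect. A minimal sketch, assuming everything else (variance, confidence level) stays the same; the function name is illustrative:

```python
def relative_sample_size(reference_gap: float, target_gap: float) -> float:
    """How many times more testers are needed to detect target_gap
    than reference_gap, all else being equal.

    Sample size scales with 1 / gap^2, so the ratio is
    (reference_gap / target_gap) squared.
    """
    if reference_gap <= 0 or target_gap <= 0:
        raise ValueError("gaps must be positive")
    return (reference_gap / target_gap) ** 2


if __name__ == "__main__":
    # Gaps expressed in percentage points.
    print(relative_sample_size(30, 5))   # 36.0
    print(relative_sample_size(30, 10))  # 9.0
```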
4. Sub-segments
If you want to analyze women 25-34 separately from men 45-54, each sub-segment must reach the minimum size on its own. Reaching it for the overall sample is not enough.
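To see how sub-segments inflate the total, the sketch below estimates the overall panel needed so that the smallest segment is expected to reach the minimum size. The segment shares are hypothetical values chosen for illustration, and the result is an expected count only: with random recruitment, actual segment sizes will vary.

```python
import math


def total_for_segments(min_per_segment: int, segment_shares: dict[str, float]) -> int:
    """Overall panel size at which every listed segment is expected to
    contain at least min_per_segment testers.

    segment_shares maps a segment name to its share of the recruited
    panel (between 0 and 1). The smallest share drives the total.
    """
    return max(math.ceil(min_per_segment / share) for share in segment_shares.values())


if __name__ == "__main__":
    # Hypothetical shares, for illustration only.
    shares = {"women 25-34": 0.25, "men 45-54": 0.10}
    print(total_for_segments(81, shares))  # 810
```

Recruiting with quotas per segment, rather than relying on the natural mix of the panel, avoids paying for testers in segments that have already reached the minimum.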
The SuperTry method in 3 steps
- Define the hypothesis: "70% of testers will prefer version A" is a testable hypothesis.
- Compute the minimum size with the formula above (or our built-in calculator).
- Run in two waves: start with the first 50% of the sample. If the results are clear, stop; otherwise, complete the second wave.
This approach divides the average budget by 1.5, with no reliability trade-off.
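As a rough way to read the budget figure, the sketch below computes the expected share of the full sample actually used in a two-wave design. The stop rate is an assumption introduced here for illustration, not a SuperTry figure, and the model ignores fixed costs per wave.

```python
def expected_budget_fraction(stop_rate: float, first_wave_share: float = 0.5) -> float:
    """Expected fraction of the full sample used across many campaigns.

    stop_rate: assumed share of campaigns that stop after the first wave.
    first_wave_share: share of the sample tested in the first wave.
    Campaigns that continue use the full sample.
    """
    if not 0.0 <= stop_rate <= 1.0:
        raise ValueError("stop_rate must be between 0 and 1")
    return first_wave_share + (1.0 - stop_rate) * (1.0 - first_wave_share)


if __name__ == "__main__":
    fraction = expected_budget_fraction(stop_rate=2 / 3)
    print(round(fraction, 3))      # 0.667
    print(round(1 / fraction, 2))  # 1.5
```

Under this simple model, dividing the budget by 1.5 corresponds to roughly two campaigns in three stopping after the first wave.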
Bottom line
A good sample size is neither "30 by default" nor "as many as possible". It's the result of a calculation tied to what you want to prove. A few minutes of upfront methodology beat weeks of re-testing caused by a poorly calibrated sample.
