A/B Testing Cold Email: How to Test 10 Variants (Not Just 2 Subject Lines)
You are probably already A/B testing your cold emails. And you are probably learning almost nothing from it.
Here is how it typically goes: you write one email, duplicate it, swap the subject line, and split your list in half. Version A gets "Quick question about your marketing stack" and Version B gets "Idea for [Company]." You send both, check open rates a week later, declare a winner, and call it optimization.
That is not A/B testing. That is coin-flipping with extra steps.
The problem is not that you are testing — it is that you are testing the smallest possible variable while holding everything else constant. Subject lines matter, but they are just the lid on the jar. What is inside the jar matters far more.
The Three Levels of Cold Email Testing
Not all tests teach you the same things. Think of A/B testing as a spectrum, from surface-level tweaks to fundamental strategic experiments.
Level 1: Component Testing
This is where most people start and stop. You change one element — a subject line, a CTA, a greeting — and compare performance. The core message, structure, and angle remain identical.
Example: Testing "Quick question" vs. "Idea for [Company]" as subject lines on the same email body.
Level 2: Structural Testing
Here you vary the format and structure while keeping the same general approach. You are testing how you deliver the message, not what the message says.
Example: A three-sentence email vs. a longer email with a case study. Same pain point, same CTA, different packaging.
Level 3: Approach Testing
This is where the real learning happens. You test fundamentally different strategies — different angles, different opening hooks, different value propositions. Each variant represents a distinct hypothesis about what will resonate with your audience.
Example: A pain-point lead ("Hiring 15 warehouse roles this quarter means onboarding is eating your ops team's time") vs. a compliment lead ("Your recent post on retention strategy was sharper than most of what I see in e-commerce") vs. a direct-offer lead ("We cut onboarding time from 8 hours to 2 for logistics companies your size").
What Each Level Teaches You
| Level | What You Test | What You Learn | Speed of Learning |
|---|---|---|---|
| Level 1: Component | Subject lines, CTAs, greetings | Which words trigger opens or clicks | Slow — small incremental gains |
| Level 2: Structural | Email length, formatting, proof placement | How your audience prefers to consume information | Moderate — informs your templates |
| Level 3: Approach | Pain points, angles, value propositions | What your audience actually cares about | Fast — reshapes your entire strategy |
Most teams spend all their time at Level 1 because it is easy. You only change one line. But Level 1 testing has a ceiling. Once you have a decent subject line, further tweaks yield diminishing returns. You optimize the lid while ignoring the jar.
Level 3 is where outbound strategy gets built. When you discover that question-based openings outperform compliment-based openings by 3x for CFOs at mid-market SaaS companies, that insight does not just improve one email — it informs every email you send to that segment.
The Math Problem With 2 Variants
Traditional A/B testing gives you two variants. That means you are testing one hypothesis at a time: "Is A better than B?"
With two variants, here is what a single test cycle looks like. You form a hypothesis, write two versions, send to your list, wait for results, pick a winner, and form a new hypothesis. Each cycle takes one to two weeks. Run 10 cycles and you have tested 10 ideas over two and a half to five months.
Now consider what happens with 10 variants running simultaneously. In a single cycle, you are testing 10 different hypotheses at once. In two weeks, you gain the insight that would have taken months of sequential A/B pairs.
This is not just ten times faster. It is qualitatively different. With 10 variants, you are not inching toward a slightly better email. You are rapidly mapping the landscape of what works: which angles, which structures, which proof points, which CTAs resonate with your specific audience.
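A back-of-the-envelope sketch of that timeline, with both inputs as illustrative assumptions rather than benchmarks:

```python
IDEAS = 10           # distinct hypotheses you want answered
WEEKS_PER_CYCLE = 2  # assumed time to collect enough results in one cycle

sequential_weeks = IDEAS * WEEKS_PER_CYCLE  # pairs: one idea resolved per cycle
parallel_weeks = WEEKS_PER_CYCLE            # all ten resolved in a single cycle

print(f"Sequential A/B pairs:  {sequential_weeks} weeks to test {IDEAS} ideas")
print(f"Ten parallel variants: {parallel_weeks} weeks to test {IDEAS} ideas")
```

The trade-off is volume: a parallel cycle needs enough sends for all ten variants at once, which is exactly the sample-size question addressed in the next section.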
Making Multi-Variant Testing Actually Work
Running 10 variants is not as simple as writing 10 emails and blasting them out. There are three things you need to get right.
Sample Size and Statistical Confidence
The most common mistake in email testing is declaring a winner too early. If Variant A has a 15% reply rate and Variant B has a 10% reply rate after 20 sends each, you do not have a winner. You have noise.
A useful rule of thumb for cold email: you need roughly 100 to 200 sends per variant before reply rate differences become meaningful. For open rates, you can work with smaller samples (50 to 100 per variant) since the event happens more frequently.
With 10 variants, that means a test cycle needs 1,000 to 2,000 total sends. That sounds like a lot, but for most B2B outreach campaigns, it is a few weeks of normal sending volume.
You do not need to be a statistician. Just resist the urge to call winners before the numbers stabilize.
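If you want a sanity check rather than a gut feeling, a standard two-proportion z-test is enough. Here is a minimal sketch in Python (the function name and example numbers are ours; 1.96 is the usual threshold for two-sided 95% confidence):

```python
import math

def significant_difference(replies_a, sends_a, replies_b, sends_b, z_crit=1.96):
    """Two-proportion z-test: is the reply-rate gap more than noise?"""
    p_a = replies_a / sends_a
    p_b = replies_b / sends_b
    pooled = (replies_a + replies_b) / (sends_a + sends_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    return abs((p_a - p_b) / se) > z_crit

# The scenario above: 15% vs. 10% after only 20 sends each.
print(significant_difference(3, 20, 2, 20))      # False -- still noise
# A bigger gap at a realistic sample size: 15% vs. 6% at 200 sends each.
print(significant_difference(30, 200, 12, 200))  # True -- a real winner
```

For very small reply counts an exact test (Fisher's, for example) is safer than this normal approximation, but at the volumes discussed above the z-test is a reasonable gut check.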
Vary the Right Things
If all 10 variants are slight rewrites of the same idea, you have 10 versions of Level 1 testing. You learn that "regarding" outperforms "about" in subject lines. Congratulations.
For multi-variant testing to deliver real insight, your variants need to represent genuinely different approaches:
- Pain-point lead — Open with a specific problem the prospect likely faces
- Compliment lead — Open with something genuinely impressive about their business
- Question lead — Open with a question that surfaces a known challenge
- Case-study lead — Open with a concrete result you achieved for a similar company
- Direct-offer lead — Skip the warm-up and state exactly what you can do for them
- Trigger-event lead — Reference something recent (new hire, funding round, product launch)
Mix in structural variations too. Some short, some long. Some with bullet points, some without. Some with a soft CTA ("worth a look?"), some with a specific ask ("15 minutes on Thursday?").
The goal is controlled diversity — enough variation to learn something meaningful, enough structure to know why one variant outperformed another.
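One lightweight way to keep that structure is to tag every variant with the dimensions it tests, so results roll up by angle or format instead of being eyeballed email by email. A minimal sketch, with made-up tags and numbers:

```python
from collections import defaultdict

# Hypothetical tagging scheme: each variant records the angle, length, and
# CTA style it tests, so a win can be traced back to a dimension.
variants = {
    "v1": {"angle": "pain-point", "length": "short", "cta": "soft"},
    "v2": {"angle": "pain-point", "length": "long",  "cta": "specific"},
    "v3": {"angle": "question",   "length": "short", "cta": "specific"},
    "v4": {"angle": "case-study", "length": "long",  "cta": "soft"},
}
results = {"v1": (18, 150), "v2": (9, 150), "v3": (21, 150), "v4": (12, 150)}

by_angle = defaultdict(lambda: [0, 0])  # angle -> [replies, sends]
for vid, tags in variants.items():
    replies, sends = results[vid]
    by_angle[tags["angle"]][0] += replies
    by_angle[tags["angle"]][1] += sends

for angle, (replies, sends) in by_angle.items():
    print(f"{angle:>10}: {replies / sends:.1%} reply rate")
```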
Iterate Based on What the Data Tells You
This is where most testing programs die. You run a test, find a winner, and then... keep sending the winner forever.
Effective testing is a cycle, not a one-time event. Here is how it should work:
- Launch — Send 10 variants simultaneously
- Measure — Track opens, replies, and positive responses (not just volume — a "not interested" reply is not a win)
- Retire the losers — Stop sending variants that underperform
- Analyze the winners — What do the top performers have in common? Is it the angle, the structure, the CTA, or some combination?
- Generate new challengers — Create new variants informed by what worked, but testing new hypotheses
- Repeat — The best variant this month should face new challengers next month
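Here is a minimal sketch of the retire-and-replace mechanics, assuming per-variant results are already tracked. Drafting the new challengers from what the survivors share is the judgment step the code deliberately leaves out:

```python
def reply_rate(result):
    replies, sends = result
    return replies / sends if sends else 0.0

def next_cycle(results, keep_top=3, pool_size=10):
    """One cycle step. `results` maps variant id -> (replies, sends)."""
    ranked = sorted(results, key=lambda v: reply_rate(results[v]), reverse=True)
    survivors = ranked[:keep_top]      # incumbents that keep sending
    retired = ranked[keep_top:]        # underperformers to stop sending
    open_slots = pool_size - keep_top  # challengers to write for next cycle
    return survivors, retired, open_slots

# Made-up month-one results.
results = {"v1": (21, 150), "v2": (18, 150), "v3": (12, 150),
           "v4": (9, 150), "v5": (6, 150)}
survivors, retired, slots = next_cycle(results, keep_top=2)
print(survivors, retired, slots)  # ['v1', 'v2'] ['v3', 'v4', 'v5'] 8
```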
This creates a compounding effect. Each cycle starts from a higher baseline than the last. Your worst-performing variant in month three would have been your best performer in month one.
When to Stop Testing
You do not test forever. Eventually, you reach a point of diminishing returns where new variants perform within a narrow band around your current best. When three consecutive test cycles fail to produce a variant that meaningfully outperforms the incumbent, you have found your ceiling for that audience segment.
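That stopping rule is easy to mechanize. A minimal sketch, with illustrative thresholds you should tune to your own volume:

```python
def hit_ceiling(best_challenger_rates, incumbent_rate, min_lift=0.10, window=3):
    """True when the last `window` cycles' best challengers all failed to beat
    the incumbent by at least `min_lift` (relative lift)."""
    recent = best_challenger_rates[-window:]
    return len(recent) == window and all(
        rate < incumbent_rate * (1 + min_lift) for rate in recent
    )

# Incumbent replies at 12%; best challengers from the last three cycles:
print(hit_ceiling([0.125, 0.120, 0.128], incumbent_rate=0.12))  # True -> move on
```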
At that point, shift your testing energy to a different segment, a different offer, or a different stage of the funnel. The approach that wins for Series A startup founders may fail completely for enterprise procurement teams. Testing never truly ends — it just moves to where the biggest gaps remain.
The Compounding Advantage
The teams that win at cold outreach are not the ones with the best copywriter. They are the ones with the best testing system. Within a few months, a mediocre first email fed into a disciplined testing process will outperform a brilliant email sent on autopilot.
Two variants teach you slowly. Ten variants teach you fast. And a system that automatically retires underperformers and replaces them with new challengers never stops improving.
BongoBot tests 10 email variants per campaign simultaneously, automatically retiring low performers and generating new challengers from what is working. See how it works.