Audiences Testing & Experimentation Guide

How to test and measure Audiences the right way

This guide helps you get accurate results when testing Freshpaint Audiences and avoid the most common testing mistakes that lead to misleading conclusions.


First: what Freshpaint Audiences actually does

Freshpaint Audiences isn’t meant to replace Meta or Google’s native targeting.

Think of it like this:

  • Freshpaint helps you identify the right people to target using your first-party data and pass those identifiers to Meta and Google in a compliant manner

  • Meta & Google handle ad delivery and optimization using their in-platform machine learning

Freshpaint gives those platforms better, cleaner, compliant inputs. It’s not designed to out-optimize their delivery engines.

Because of that, direct 50/50 A/B tests between audiences in Freshpaint and platform-native audiences can be misleading.


The right way to frame success

Instead of asking:

“Did this Freshpaint audience beat our Meta or Google audience?”

Look at:

  • Are we now able to run compliant retargeting we couldn’t safely run before?

  • Are we reducing wasted spend by excluding existing members or patients?

  • Does overall campaign ROI improve when Freshpaint audiences are layered into your strategy?

  • Are your lookalike seeds higher quality?

Audiences works best as part of your overall acquisition strategy, not as a standalone ad set.


Why audiences in Freshpaint can look “smaller”

Freshpaint audiences are often:

  • More precise

  • Higher intent

  • Built from real first-party behavioral or CRM/EHR data

Smaller, higher-intent audiences naturally:

  • Have higher frequency

  • Can look more expensive in isolation

  • Convert better per person

This is expected and often a sign of quality, not a problem.
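
As a rough illustration of why this pattern shows up, here is a back-of-the-envelope sketch comparing a broad native audience with a smaller, higher-intent first-party audience. Every number in it is a hypothetical placeholder, not a benchmark.

```python
# Hypothetical comparison of a broad native audience vs. a smaller,
# higher-intent first-party audience. All figures are placeholders.

def cpa(spend, reach, conversion_rate):
    """Cost per acquisition: spend / (people reached * per-person conversion rate)."""
    return spend / (reach * conversion_rate)

budget = 10_000        # same budget for both audiences
impressions = 100_000  # assume similar delivery volume, for illustration only

broad_cpa = cpa(spend=budget, reach=200_000, conversion_rate=0.002)   # ~$25 per conversion
precise_cpa = cpa(spend=budget, reach=20_000, conversion_rate=0.015)  # ~$33 per conversion

print(f"Frequency - broad: {impressions / 200_000:.1f}, precise: {impressions / 20_000:.1f}")
print(f"CPA       - broad: ${broad_cpa:.2f}, precise: ${precise_cpa:.2f}")
# The precise audience shows higher frequency and a higher per-conversion cost
# in isolation, even though each person in it is far more likely to convert.
```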


What to test instead (better experiments)

Rather than testing:

Freshpaint vs Meta or Google

Focus on:

  • Total campaign CPA before and after exclusions

  • Overall ROI when Freshpaint audiences are layered in

  • Changes in call center or lead handling volume

  • Lookalike quality when seeded with audiences in Freshpaint

You’re measuring system impact, not ad-set performance in a vacuum.
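
To make "total campaign CPA before and after exclusions" concrete, here is a minimal sketch with made-up numbers; the `blended_cpa` helper and every figure in it are illustrative assumptions, not Freshpaint output.

```python
# Illustrative blended-CPA comparison for one campaign, before and after
# excluding existing members/patients. Every number here is made up.

def blended_cpa(total_spend, new_conversions):
    """Overall cost per net-new acquisition across the whole campaign."""
    return total_spend / new_conversions

# Before: part of the budget reaches people who are already customers,
# so fewer of the resulting conversions are actually new.
before = blended_cpa(total_spend=50_000, new_conversions=500)  # $100.00

# After: the same budget is redirected away from existing members,
# so more of the spend produces net-new conversions.
after = blended_cpa(total_spend=50_000, new_conversions=625)   # $80.00

print(f"Blended CPA before exclusions: ${before:,.2f}")
print(f"Blended CPA after exclusions:  ${after:,.2f}")
print(f"Improvement: {(before - after) / before:.0%}")         # 20%
```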


How to Run a Valid Audiences Experiment (Overview)

  1. Pick one clear problem to fix

     • Wasted spend → Exclusions

     • Scaling efficiently → Lookalikes

     • Lost demand → Retargeting

  2. Layer, don’t replace

     • Keep your existing native campaigns running

     • Add Freshpaint audiences into the strategy

  3. Measure blended performance

     • Look at overall CPA, ROI, and operational impact

     • Not just which ad set “won”

  4. Let campaigns normalize

     • You’re changing who you reach

     • Platform learning needs time to adjust

How to Run a Valid Audiences Experiment (step-by-step)

This section walks through how to design, run, and evaluate a Freshpaint Audiences test so you get a real, accurate read on impact and avoid false negatives.


Step 1: Define Your Control and Your Test

Before you touch any campaigns, decide:

Control (your baseline)

Your existing campaign setup that does not use any audiences built in Freshpaint. This should reflect how you’re running acquisition today.

Examples:

  • Native Meta / Google audiences

  • Any current targeting and exclusions

  • Your normal budget and bidding strategy

This is your “business as usual” benchmark.


Test (your Freshpaint layer)

Your Test should use the exact same campaign structure as your Control, with one change:

Audiences built in Freshpaint are layered into the targeting.

What stays the same:

  • Budget

  • Creative

  • Bidding strategy

  • Geo

  • Conversion goals

What changes:

  • Add Freshpaint exclusions, retargeting, or lookalikes based on the use case you’re testing

This isolates Freshpaint’s impact.
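
To make the Control/Test contrast explicit, here is a tiny sketch that represents each arm as a config and diffs them; the field names and values are hypothetical, not a platform API.

```python
# Represent each arm as a plain config and diff them: only the audience
# layering should change. Field names and values are hypothetical, not a
# platform API.

control = {
    "budget_daily": 500,
    "creative_set": "evergreen_v3",
    "bidding": "lowest_cost",
    "geo": ["US-CO", "US-OR"],
    "conversion_goal": "appointment_booked",
    "audience_layers": ["native_interest_targeting"],
}

# Test arm: identical config, with a Freshpaint exclusion layered in.
test = dict(control, audience_layers=control["audience_layers"] + ["freshpaint_member_exclusion"])

changed = {key for key in control if control[key] != test[key]}
print("Fields that differ between arms:", changed)  # {'audience_layers'}
```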


Step 2: Split Traffic Cleanly

You must ensure the Control and Test are not competing with each other.

Use one of the following:

  • Platform experiments (recommended)

  • Geo splits (two similar markets)

  • Time-based splits (if geo splitting isn’t possible)

Goal: Control and Test should receive equivalent traffic and opportunity.
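
If you use geo splits, the underlying idea is matched-pair assignment: pair markets that look alike, then randomly assign one market from each pair to Control and the other to Test so both arms get equivalent opportunity. A minimal sketch, assuming hypothetical market names:

```python
# Matched-pair geo split: pair similar markets, then randomly assign one
# market from each pair to Control and the other to Test.
# The market pairs are placeholders -- use markets with similar size,
# seasonality, and baseline performance.
import random

matched_pairs = [
    ("Denver", "Portland"),
    ("Austin", "Nashville"),
    ("Tampa", "Charlotte"),
]

random.seed(7)  # fixed seed so the assignment is reproducible and auditable

control_markets, test_markets = [], []
for pair in matched_pairs:
    first, second = random.sample(pair, k=2)
    control_markets.append(first)
    test_markets.append(second)

print("Control markets:", control_markets)
print("Test markets:   ", test_markets)
```

Because each pair is similar by construction, randomizing within pairs keeps the two arms comparable without requiring identical markets.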


Step 3: Run the Test Long Enough

Audiences changes who you’re reaching, which means platform learning needs time to stabilize.

Minimum guidance:

  • Run at least 4–6 weeks

  • Or until you’ve seen a meaningful volume of conversions in both arms

Avoid drawing conclusions in the first 1–2 weeks.
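
One way to sanity-check "long enough" is to estimate how many weeks each arm needs to reach a minimum conversion count before you read results. The sketch below uses the 4-week floor from above plus a 100-conversions-per-arm threshold; that threshold is an assumption for illustration, not official guidance.

```python
# Rough check of how long each arm needs to run before reading results.
# The 100-conversions-per-arm threshold and the weekly volumes below are
# assumptions for illustration, not official guidance.
import math

MIN_CONVERSIONS_PER_ARM = 100
MIN_WEEKS = 4  # floor from the guidance above, regardless of volume

def weeks_needed(weekly_conversions_per_arm):
    by_volume = math.ceil(MIN_CONVERSIONS_PER_ARM / weekly_conversions_per_arm)
    return max(MIN_WEEKS, by_volume)

print(weeks_needed(40))  # 4 -> volume is fine, the 4-week floor applies
print(weeks_needed(12))  # 9 -> wait for conversion volume, not the calendar
```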


Step 4: What Metrics to Use (and what to ignore)

Do not evaluate based on:

  • Which ad set “won”

  • Single-ad-set CPA in isolation

  • Early learning-phase metrics

Instead, track:

  • Overall CPA (blended)

  • Total conversions

  • Conversion quality

  • Waste reduction (suppressed traffic, reduced internal calls, etc.)

  • Lookalike expansion efficiency (if applicable)

You’re evaluating system lift, not ad-set competition.
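
As a sketch of what evaluating system lift looks like in practice, the snippet below compares blended, arm-level metrics for Control and Test; the field names and all figures are hypothetical, and revenue-based ROAS stands in for ROI.

```python
# Compare blended (whole-arm) performance rather than individual ad sets.
# All field names and figures are hypothetical; substitute your own exports.

control = {"spend": 40_000, "conversions": 400, "revenue": 90_000}
test    = {"spend": 40_000, "conversions": 430, "revenue": 105_000}

def summarize(arm):
    return {
        "blended_cpa": arm["spend"] / arm["conversions"],
        "total_conversions": arm["conversions"],
        "roas": arm["revenue"] / arm["spend"],  # return on ad spend, a rough ROI proxy
    }

c, t = summarize(control), summarize(test)
print(f"Blended CPA: control ${c['blended_cpa']:.2f} vs test ${t['blended_cpa']:.2f}")    # $100.00 vs $93.02
print(f"Conversions: control {c['total_conversions']} vs test {t['total_conversions']}")  # 400 vs 430
print(f"ROAS:        control {c['roas']:.2f} vs test {t['roas']:.2f}")                    # 2.25 vs 2.62
# Read the arm-level deltas. One ad set inside the Test arm can look expensive
# while the arm as a whole still wins on blended CPA and ROAS.
```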


Step 5: How to Read Results

Ask:

  • Did overall CPA improve?

  • Did total ROI improve?

  • Did we reduce wasted or redundant spend?

  • Did conversion quality improve?

  • Did we unlock compliant strategies we couldn’t run before?

If yes → Audiences is working, even if one individual ad set looks “more expensive” in isolation.


Step 6: Expand Beyond a Single Use Case

The strongest results don’t come from one audience.

After your first test:

  1. Start with exclusions

  2. Add lookalikes

  3. Then layer retargeting

  4. Build a system over time

This is how teams see consistent ROI, and it’s why single-use-case tests often understate value.


Step 7: When to Iterate

Only change one variable at a time:

  • Add a new Freshpaint audience

  • Or expand into a new use case

Then repeat the same structure: Control vs Test → Run → Measure blended impact → Expand


Why testing only one use case often understates value

Teams that only run:

  • One audience

  • One campaign

  • One short test

Often miss the bigger value of Audiences.

Teams that see the strongest results usually follow the same progression:

  • Start with exclusions

  • Add lookalikes

  • Then layer retargeting

  • Build a system over time


Common testing questions (and what they usually mean)

What you might notice → what’s happening → how to think about it:

  • CPAs look higher → Audiences are more precise → Look at blended ROI

  • Audiences are smaller → They’re higher intent → Pair with native reach

  • A/B tests look worse → Audiences were treated as replacements → Evaluate blended performance


What good testing looks like

Good tests answer:

  • Did wasted spend go down?

  • Did overall ROI improve?

  • Did conversion quality improve?

  • Did Audiences unlock strategies you couldn’t safely run before?

Not:

  • Which ad set “won.”
