    How CUPED Makes Your A/B Tests Smarter (Without Needing More Users)

    Have you ever run an A/B test and found… nothing?

    No clear difference. Just noisy results that leave you wondering, “Did this even do anything?”

    That’s where CUPED comes in. The name is short for Controlled-experiment Using Pre-Experiment Data. It sounds complicated, but the idea is surprisingly simple: use what you already know about users to get clearer results, faster.

    So what is CUPED, in plain English?

    CUPED is a way to reduce the random noise in your experiment results by using data you already have from before the experiment started.

    Let’s say you’re testing a new homepage design to see if more users sign up.
    You know how often each user visited your site last week, before they saw the new page.

    That number (past visits) often affects how likely they are to sign up this week. CUPED uses that pattern to adjust your results so you can see the real impact of your new design, not just luck.

    A simple example

    You split users into two groups:

    • Control: sees the old homepage
    • Treatment: sees the new homepage

    But what if your treatment group just happened to include more people who visited last week? They might naturally sign up more — even if the new page did nothing.

    CUPED solves that by adjusting for past visits. You run a simple regression:

    signups_this_week ~ visits_last_week

    This gives you a slope coefficient (let’s call it θ, theta), fit on users from both groups together. It tells you how much sign-ups tend to rise with each additional past visit.
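
    As a rough sketch of that regression step: with a single covariate, θ is just the slope cov(X, Y) / var(X). The helper name estimate_theta and the toy arrays below are made up for illustration; they do not come from the original post or the paper.

```python
import numpy as np

def estimate_theta(pre_metric, outcome):
    """Slope of the regression outcome ~ pre_metric, i.e. cov(X, Y) / var(X)."""
    x = np.asarray(pre_metric, dtype=float)
    y = np.asarray(outcome, dtype=float)
    # Sample covariance divided by sample variance (both use ddof=1).
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Hypothetical toy data: one entry per user, pooled across both groups.
visits_last_week = np.array([0, 1, 2, 3, 4, 5])
signups_this_week = np.array([0, 0, 1, 1, 2, 2])

theta = estimate_theta(visits_last_week, signups_this_week)
print(f"theta = {theta:.4f}")
```

    The same number comes out of any ordinary least-squares fit with an intercept, so a regression library would work equally well here.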

    Then, for each user, you subtract the part of their sign-ups that past visits explain:

    adjusted_signups = signups_this_week - θ * (visits_last_week - average_visits)

    Here average_visits is the mean of visits_last_week across all users in both groups, so the adjustment shifts individual users without changing the overall average. Now run your A/B test on these adjusted sign-up numbers instead of the raw ones.
    They have less random noise, so differences between control and treatment are easier to detect.
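
    To make the adjustment concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data is synthetic, the outcome is a continuous stand-in for sign-ups, and the 0.05 lift and noise levels are invented.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

# Synthetic users (all numbers are made up for illustration).
visits_last_week = rng.poisson(lam=3.0, size=n)
is_treatment = rng.integers(0, 2, size=n).astype(bool)

# Outcome depends on past visits plus noise, plus a small true lift in treatment.
true_lift = 0.05
signups = (0.2 * visits_last_week
           + rng.normal(0.0, 0.5, size=n)
           + true_lift * is_treatment)

# theta from the pooled data: cov(X, Y) / var(X).
theta = np.cov(visits_last_week, signups, ddof=1)[0, 1] / np.var(visits_last_week, ddof=1)

# CUPED adjustment: remove the part of the outcome explained by past visits.
adjusted = signups - theta * (visits_last_week - visits_last_week.mean())

def diff_in_means(y, treated):
    """Treatment mean minus control mean."""
    return y[treated].mean() - y[~treated].mean()

raw_effect = diff_in_means(signups, is_treatment)
cuped_effect = diff_in_means(adjusted, is_treatment)
raw_var = signups.var(ddof=1)
cuped_var = adjusted.var(ddof=1)

print(f"raw effect estimate:   {raw_effect:.4f}")
print(f"CUPED effect estimate: {cuped_effect:.4f}")
print(f"variance ratio (CUPED / raw): {cuped_var / raw_var:.3f}")
```

    Both estimates target the same lift. The point of the sketch is that the adjusted metric has lower variance, which is what tightens the comparison between groups.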

    Why use CUPED?

    1. More statistical power without needing more users
    2. Shorter experiments, because you reach the same precision with less data
    3. Tighter confidence intervals, even when the metric is noisy

    It’s especially useful when your metric is noisy or your sample size is small.
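
    The size of the gain can be written down directly. As a sketch, let Y be the outcome, X the pre-experiment metric, and ρ their correlation (this notation is an assumption here, not taken from the post):

```latex
\begin{aligned}
Y_{\text{cuped}} &= Y - \theta\,\bigl(X - \mathbb{E}[X]\bigr) \\
\operatorname{Var}(Y_{\text{cuped}}) &= \operatorname{Var}(Y) - 2\theta\,\operatorname{Cov}(X, Y) + \theta^{2}\,\operatorname{Var}(X) \\
\theta^{*} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \\
\operatorname{Var}(Y_{\text{cuped}})\Big|_{\theta = \theta^{*}} &= \operatorname{Var}(Y)\,\bigl(1 - \rho^{2}\bigr)
\end{aligned}
```

    For example, ρ = 0.5 gives 1 - 0.25 = 0.75, so the variance drops by 25%. Since the sample size needed for a given power scales with variance, that is roughly 25% fewer users for the same sensitivity.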

    When NOT to use CUPED?

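    CUPED works best when your pre-experiment metric is strongly correlated with your outcome.
    If it isn’t, CUPED won’t help much: the adjustment removes very little variance, and with a small sample a poorly estimated θ could even add a bit of noise. The metric also has to be measured before users see the change, so the treatment cannot have affected it.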

    So always test that correlation first!
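
    One quick way to run that check is to compute the Pearson correlation ρ between the candidate pre-experiment metric and the outcome. With the best-fitting θ, ρ² is the share of variance CUPED can remove. The function name and toy data below are hypothetical; in practice you would run this on your own historical data.

```python
import numpy as np

def correlation_check(pre_metric, outcome):
    """Return (rho, rho**2) for a candidate pre-experiment covariate."""
    rho = float(np.corrcoef(pre_metric, outcome)[0, 1])
    return rho, rho ** 2

# Hypothetical toy data: one entry per user.
visits_last_week = [0, 1, 2, 3, 4, 5]
signups_this_week = [0, 0, 1, 1, 2, 2]

rho, removable_share = correlation_check(visits_last_week, signups_this_week)
print(f"correlation = {rho:.3f}, removable variance share = {removable_share:.3f}")
```

    If the removable share is close to zero, the covariate is not worth the extra step, and it makes sense to look for a better predictor.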

    Want to go deeper?

    This post was inspired in part by Lyft’s blog on experimentation and Microsoft’s paper on CUPED: Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data.

    Official Source:

    Title: Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data
    Authors: Alex Deng, Ya Xu, Ron Kohavi, Toby Walker
    Link: https://www.researchgate.net/publication/237838291_Improving_the_Sensitivity_of_Online_Controlled_Experiments_by_Utilizing_Pre-Experiment_Data

    CUPED is like giving your experiment a smarter starting point.
    By subtracting the influence of past behavior, you isolate the true effect of your test more cleanly, so you can get faster answers and make better decisions without recruiting more users.

    If you found this helpful or want to dig deeper into data science insights, check out the Technology Blog section on NotesfromShivani, a space where I break down complex ideas in a clear, simple way. And if you’d like to connect, chat, or share your thoughts, I’m always up for a good data conversation over on LinkedIn. Come say hi!