Calculating Probability of Success

Understanding our approach to testing incremental experiences with Bayesian Confidence.

Written by Esther Vermeil
Updated over a week ago

Introduction

We use a Bayesian model to evaluate a Yieldify test’s Probability of Success. Probability of Success measures the likelihood that the variant generates a positive uplift on the performance metric compared to the control group.

We apply our formula of Probability of Success to Revenue for websites that report revenue and to Conversion Rate for websites that only report sales.

With Bayesian, probability of success is tightly linked to uplift. If a variant displays a high probability of success, it means that we have high certainty that the treatment group (variant) will perform better than the control group.
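As an illustration of the idea (not Yieldify's actual model, which isn't published here), probability of success for a conversion rate metric can be estimated by placing a Beta posterior on each group's conversion rate and counting how often a posterior draw for the variant beats one for the control. The function name and the sample numbers below are hypothetical:

```python
import random

def probability_of_success(control_conv, control_sessions,
                           variant_conv, variant_sessions,
                           draws=100_000, seed=42):
    """Estimate P(variant rate > control rate) by sampling from
    Beta(conversions + 1, non-conversions + 1) posteriors
    (a uniform prior) for each group."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        c = rng.betavariate(control_conv + 1,
                            control_sessions - control_conv + 1)
        v = rng.betavariate(variant_conv + 1,
                            variant_sessions - variant_conv + 1)
        if v > c:
            wins += 1
    return wins / draws

# Illustrative data: variant converts 220/2000 sessions vs control 200/2000
p = probability_of_success(200, 2000, 220, 2000)
print(round(p, 2))
```

With these numbers the variant shows a positive uplift but the probability of success lands well below a 95% confidence threshold, so the test would need more data before a winner could be called.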

If you’ve selected 95% as your default test confidence, there is only a 1-in-20 chance that the variant does not actually deliver the revenue or conversion rate uplift shown. For certain websites, a lower confidence level may be appropriate for your tests. Read this article to see how you can change a website’s test confidence level.

Explaining Probability of Success

You can find the Probability of Success of any variant testing against a control group by clicking on the performance tab of an incremental campaign in the YCP platform.

A variant is Underperforming when it has a negative impact on website revenue/conversions compared to the control group. In these cases, its performance metric uplift is less than or equal to 0% and its probability of success is less than 50%. You should try optimising the experience to improve its performance.

A variant has a Low impact when it has a negligible impact on website revenue/conversions compared to the control group. In these cases, its performance metric uplift is close to 0% and its probability of success is between 50% and your website default test confidence level (80%, 85%, 90% or 95%). You should try optimising the experience to improve its performance if the uplift does not increase over time.

A variant is Unstable when it has a positive impact on website revenue/conversions compared to the control group, but the results are highly variable. In these cases, its performance metric uplift is positive and its probability of success is more than your website default test confidence level. You should wait until results stabilise before exposing the variant to a wider audience.

A variant is the Current Winner of the test when it has a stable and positive impact on website revenue/conversions compared to the control group. In these cases, it generates the highest positive performance metric uplift and its probability of success is more than the website default test confidence level. The test is complete and you should expose the variant to a wider audience.

Explaining Stability

A test’s probability of success is highly impacted by a website's daily conversions and revenue, which experience a degree of seasonality. Yieldify uses a stability index to ensure that the revenue or conversion rate uplift of a variant is stable over time and likely to produce reliable results.

When a variant has a high probability of success that is unstable, the test should continue until the results stabilise. Stable results ensure that the predicted uplift is repeatable across every future session. When a variant has a high probability of success that is stable, the test can be ended and that variant set to 100% exposure.

The following stability checks are made for every variant of an incremental test:

  • Daily Change in Revenue per Purchase Cycle or Conversion Rate is below 1%

  • Probability of Success:

    • The 7-day rolling average of its probability of success is more than the website-level test confidence selection (80%, 85%, 90% or 95%)

    • Its probability of success has been more than the website-level test confidence selection (80%, 85%, 90% or 95%) for more than 5 days in the last 7 days

  • Expected Loss*:

    • The 7-day rolling average of its expected loss is less than the required threshold. Thresholds:

      • Less than 1% when revenue uplift is the performance metric

      • Less than 0.02% when conversion rate is the performance metric

    • Its expected loss has been less than the required threshold for more than 5 days in the last 7 days

In the case of multiple variants meeting all criteria, the ‘winning’ variant will be determined by highest Revenue or Conversion Rate uplift.
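The stability checklist above can be sketched as a single boolean check over the last 7 days of a variant's daily metrics. The function and its thresholds mirror the rules listed, but the name and input shapes are assumptions for illustration:

```python
def is_stable(daily_metric_change, daily_pos, daily_expected_loss,
              confidence=0.95, loss_threshold=0.01):
    """Illustrative stability check for one variant.
    Each argument is a list of daily values; only the last
    7 days are considered."""
    last7 = lambda xs: xs[-7:]

    # 1. Daily change in the performance metric is below 1%
    metric_ok = all(abs(ch) < 0.01 for ch in last7(daily_metric_change))

    # 2a. 7-day rolling average of probability of success
    #     exceeds the website confidence level
    pos = last7(daily_pos)
    pos_avg_ok = sum(pos) / len(pos) > confidence
    # 2b. ...and it exceeded that level on more than 5 of the 7 days
    pos_days_ok = sum(p > confidence for p in pos) > 5

    # 3a. 7-day rolling average of expected loss is under the threshold
    #     (1% for revenue uplift, 0.02% for conversion rate)
    loss = last7(daily_expected_loss)
    loss_avg_ok = sum(loss) / len(loss) < loss_threshold
    # 3b. ...and it was under the threshold on more than 5 of the 7 days
    loss_days_ok = sum(x < loss_threshold for x in loss) > 5

    return all([metric_ok, pos_avg_ok, pos_days_ok,
                loss_avg_ok, loss_days_ok])

# A variant passing every check on all 7 days is stable:
print(is_stable([0.004] * 7, [0.96] * 7, [0.005] * 7))  # True
```

A variant failing any one check (for example, a single day with a metric swing above 1%) would keep the test running.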

* Expected loss measures the average negative revenue uplift a variant would generate in the 5% chance it does not perform as expected (given a 95% probability of success). This check ensures that a variant’s revenue uplift wouldn’t drop more than the threshold, even if the variant is unsuccessful.
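The expected-loss idea can be sketched as follows: average the variant's relative shortfall versus the control over posterior draws, counting zero whenever the variant wins. This is a minimal illustration of the concept, not Yieldify's exact formula, and the input numbers are hypothetical:

```python
def expected_loss(control_draws, variant_draws):
    """Average relative shortfall of the variant vs the control
    across paired posterior draws of the performance metric.
    Draws where the variant wins contribute zero loss."""
    losses = [max(c - v, 0) / c
              for c, v in zip(control_draws, variant_draws)]
    return sum(losses) / len(losses)

# Two posterior draws of conversion rate (illustrative numbers):
# the variant wins the first draw and loses the second by 10%,
# giving an expected loss of about 5%.
result = expected_loss([0.10, 0.10], [0.11, 0.09])
print(result)
```

A small expected loss means that even in the unlikely scenario where the variant underperforms, the downside is capped near the threshold.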

To understand why a variant is unstable, please reach out to your Account Manager who has access to a more detailed breakdown of every test.
