2024-11-25
Praveenkumar Revankar

How to use ANOVA to make data-driven decisions

As a product manager, one of your most important tasks involves launching new products. Imagine you spend months conducting market research, determining your MVP, and troubleshooting bugs. Finally, it's time to get the word out about your product to your target audience. With your product marketing team, you decide to use four different promotional channels: email, WhatsApp, TV, and social media.

Afterward, with your launch in the rearview, your stakeholders ask whether there was a significant difference in the sales generated by these campaigns. They want to know if certain campaigns performed better or worse than others to determine a best practice for the future.

How would you go about doing this?

One of the best ways to tackle the request would be to use ANOVA (analysis of variance). Keep reading to learn about what ANOVA is, the different types you can run, and how to perform it.

What is ANOVA?

ANOVA, which stands for analysis of variance, is a statistical method researchers use to compare multiple groups simultaneously to determine whether there are any statistically significant differences between them.

ANOVA’s history dates back to the early 1900s, when it was primarily developed by the renowned statistician Ronald Fisher. The core principle of ANOVA lies in partitioning the total variability in the data into variability between groups and variability within groups. ANOVA compares these two sources of variability to test the null hypothesis that all group means are equal.
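As a sketch of that partition in symbols (the notation is an assumption, not from the original article): with k groups, n_i observations in group i, observation x_ij, group mean x̄_i, and overall mean x̄, the total sum of squares splits into a between-group part and a within-group part:

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2}_{SS_{\text{total}}}
=
\underbrace{\sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{x}\right)^2}_{SS_{\text{between}}}
+
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}_{SS_{\text{within}}}
```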

When to use ANOVA

You can use ANOVA when you need to compare more than two groups and see if there are any significant differences in their performances.

Overall, ANOVA is best suited when:

  • You have a continuous dependent variable
  • You have one or more categorical independent variables (factors)
  • You want to compare means across multiple groups
  • The sample groups meet all the assumptions to conduct ANOVA

ANOVA is widely applicable where comparing outcomes across groups is essential for decision-making. Some examples include:

  • Comparing teaching methods or assessing the impact of different learning environments
  • Comparing treatment effectiveness or evaluating the impact of different therapies
  • Testing different fertilizers or evaluating the impact of different irrigation techniques
  • Comparing sales performance or assessing the impact of different advertising campaigns
  • Assessing the impact of different interventions on behavior

ANOVA terminology

As a statistical method, ANOVA comes with its own set of terms that are important to understand before you attempt to implement it within your product team. These include:

  • Factors are the independent variables in an ANOVA analysis. They represent the groups or categories being compared. In an experiment comparing teaching methods on test scores, the teaching method is the factor
  • Levels are the different categories or values within a factor. Each factor has at least two levels. For the factor “teaching method,” the levels could be “classroom,” “online,” and “hybrid”
  • The dependent variable is the outcome or response that you measure and compare across the different levels of a factor. For the factor teaching method, test scores are the dependent variable
  • The independent variable, or factor, is the variable that categorizes groups or conditions in the study. ANOVA evaluates the effect of this variable on the dependent variable
  • The null hypothesis (H0) posits that all group means are equal
  • The alternative hypothesis (H1) asserts that at least one group mean differs from the others
  • The F-statistic is the ratio of variation between groups to variation within groups. You use it to determine whether the observed differences between groups are statistically significant. A higher F-value indicates larger differences between group means relative to within-group variance
  • The p-value is the probability of observing differences among group means at least as large as those in your data if the null hypothesis were true. A low p-value (conventionally below 0.05) leads to rejecting the null hypothesis
  • Degrees of freedom are the number of independent values that are free to vary in a calculation. You need them to calculate the F-statistic, and they depend on the number of groups and the sample sizes
  • Post-hoc tests are follow-up tests conducted if ANOVA indicates significant differences. They identify which specific groups differ significantly
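To tie the F-statistic and degrees of freedom together, here is the ratio written out for a one-way design. The symbols are assumptions introduced for illustration: k is the number of groups, N the total number of observations, and SS and MS stand for sum of squares and mean square.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```

The numerator uses k - 1 degrees of freedom and the denominator uses N - k, which is why the p-value is looked up for F(k - 1, N - k).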

Types of ANOVA

ANOVA comes in several types, depending on the number of independent and dependent variables and on whether the same subjects are measured more than once. The most important types include:

One-way ANOVA

As the name suggests, one-way ANOVA tests the difference in groups with one independent variable.

For example, say you want to test the effect of different diets on weight loss. You divide participants into three groups based on their diet type (e.g., diet A, diet B, and diet C) and measure weight loss in kilograms. In this case, there is only one independent variable (diet type) with multiple (three) levels, and the dependent variable is weight loss.
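A minimal sketch of this diet example in Python, using only the standard library. The weight loss figures are hypothetical and chosen purely for illustration. It groups the records by level of the single factor (diet type) and computes the F-statistic:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (not from the article): (diet type, weight loss in kg).
records = [
    ("diet A", 2.0), ("diet A", 3.0), ("diet A", 4.0),
    ("diet B", 3.0), ("diet B", 4.0), ("diet B", 5.0),
    ("diet C", 5.0), ("diet C", 6.0), ("diet C", 7.0),
]

# Group the dependent variable (weight loss) by level of the factor (diet type).
groups = defaultdict(list)
for diet, kg in records:
    groups[diet].append(kg)

grand_mean = mean(kg for _, kg in records)

# Between-group and within-group sums of squares.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1             # number of levels - 1
df_within = len(records) - len(groups)   # observations - number of levels

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

In practice you would more likely hand the three lists to a statistics library, such as SciPy's `scipy.stats.f_oneway`, which also returns the p-value.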

Two-way ANOVA

Two-way ANOVA analyzes the effect of two independent variables, and the interaction between them, on a single dependent variable.

In this case, you might run a study on the effect of exercise type (cardio, strength, no exercise) and diet type (diet A, diet B, diet C) on weight loss. Here, there are two independent variables: exercise type and diet type, and the dependent variable is weight loss.
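The sketch below shows how the variation could be split for this design, assuming a balanced layout (the same number of participants in every exercise/diet cell). The data and variable names are hypothetical. It yields one F-statistic per main effect and one for the interaction:

```python
from itertools import product
from statistics import mean

exercises = ["cardio", "strength", "none"]
diets = ["diet A", "diet B", "diet C"]
r = 2  # participants per cell (balanced design assumed)

# Hypothetical weight loss (kg) for each exercise/diet combination.
loss = {
    ("cardio", "diet A"): [4.0, 5.0], ("cardio", "diet B"): [5.0, 6.0],
    ("cardio", "diet C"): [6.5, 7.5], ("strength", "diet A"): [3.0, 4.0],
    ("strength", "diet B"): [4.5, 5.0], ("strength", "diet C"): [5.0, 6.5],
    ("none", "diet A"): [1.0, 2.0], ("none", "diet B"): [2.0, 2.5],
    ("none", "diet C"): [2.5, 4.0],
}

all_values = [x for cell in loss.values() for x in cell]
grand = mean(all_values)
ex_mean = {e: mean(x for d in diets for x in loss[(e, d)]) for e in exercises}
diet_mean = {d: mean(x for e in exercises for x in loss[(e, d)]) for d in diets}
cell_mean = {key: mean(vals) for key, vals in loss.items()}

# Main effects, interaction, and error sums of squares.
ss_exercise = len(diets) * r * sum((m - grand) ** 2 for m in ex_mean.values())
ss_diet = len(exercises) * r * sum((m - grand) ** 2 for m in diet_mean.values())
ss_interaction = r * sum(
    (cell_mean[(e, d)] - ex_mean[e] - diet_mean[d] + grand) ** 2
    for e, d in product(exercises, diets)
)
ss_error = sum((x - cell_mean[key]) ** 2 for key, vals in loss.items() for x in vals)
ss_total = sum((x - grand) ** 2 for x in all_values)

df_exercise = len(exercises) - 1
df_diet = len(diets) - 1
df_interaction = df_exercise * df_diet
df_error = len(exercises) * len(diets) * (r - 1)

ms_error = ss_error / df_error
f_exercise = (ss_exercise / df_exercise) / ms_error
f_diet = (ss_diet / df_diet) / ms_error
f_interaction = (ss_interaction / df_interaction) / ms_error

print(f"exercise: F({df_exercise}, {df_error}) = {f_exercise:.2f}")
print(f"diet: F({df_diet}, {df_error}) = {f_diet:.2f}")
print(f"interaction: F({df_interaction}, {df_error}) = {f_interaction:.2f}")
```

For unbalanced designs the sums of squares no longer add up this neatly, so a statistics package is the safer route there.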

Repeated measures ANOVA

Repeated measures ANOVA is used when you measure the same participants under different conditions or at different times. It assesses whether the dependent variable differs significantly across those repeated measurements.

For example, say you measure participants’ blood pressure before, during, and after a treatment. Since each participant’s blood pressure is recorded three times, the measurements aren’t independent of one another, and the design calls for repeated measures ANOVA.

Repeated measures can be applied to either one-way or two-way designs, depending on how many factors you vary across the repeated conditions or time points.

Multivariate ANOVA (MANOVA)

If you have multiple dependent variables, you might want to use multivariate ANOVA (MANOVA). It allows you to assess if the mean vectors of multiple dependent variables differ across the levels of one or more independent variables.

For example, say you examine the effect of different teaching methods (in-person, online, hybrid) on students’ performance across several subjects (math, science, and language). Here, you have one independent variable (teaching method) with three levels, and multiple dependent variables (scores in math, science, and language).

The table below summarizes the different types of ANOVA and when to use each:

| ANOVA type | Independent variables | Dependent variables | Repeated measures? | When to use |
| --- | --- | --- | --- | --- |
| One-way | One | One | No | Comparing one factor with three or more groups |
| Two-way | Two | One | No | Comparing main effects and their interaction effects |
| One-way repeated | One | One | Yes | Same subjects under different conditions or at multiple time points |
| Two-way repeated | Two | One | Yes | Two factors with repeated measurements on the same subjects |
| MANOVA | One or more | Two or more | No | Multiple outcomes (mean vectors) measured |

Performing ANOVA

To help you understand how to run ANOVA, consider the following example and the steps required:

Problem — A product marketing team runs three campaigns in different regions for a newly launched product: Social media ads, email marketing, and TV commercials

1. Define the objective

The marketing team wants to know if the average sales generated by different marketing campaigns are significantly different.

2. Formulate hypotheses

  • Null hypothesis (H₀) — There is no significant difference in sales performance across the marketing campaigns
  • Alternative hypothesis (H₁) — At least one marketing campaign has a significantly different sales performance

3. Collect data

Gather the sales performance data for each marketing campaign. For example, say each campaign runs in three regions and generates the following sales for each region:

  • Social media ads — $50,000, $55,000, $60,000
  • Email marketing — $45,000, $50,000, $55,000
  • TV commercials — $40,000, $50,000, $52,000

4. Check assumptions

  • Independence — Ensure that the sales performances are independent
  • Normality — Check if the sales performance data are approximately normally distributed
  • Homogeneity of variances — Check if the variances are equal across groups
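As a rough illustration of the variance check, the sketch below compares the sample variances of the three campaigns from step 3. The threshold of 4 is an assumed, informal rule of thumb (sources vary), not a formal test. With only three observations per group, neither normality nor equal variances can be verified reliably, so treat this as a sanity check only:

```python
from statistics import variance

# Sales per region for each campaign, taken from step 3.
sales = {
    "Social media ads": [50_000, 55_000, 60_000],
    "Email marketing": [45_000, 50_000, 55_000],
    "TV commercials": [40_000, 50_000, 52_000],
}

# Sample variance (n - 1 denominator) of each campaign's sales.
variances = {name: variance(values) for name, values in sales.items()}

# Assumed rule of thumb: largest variance under ~4x the smallest.
MAX_RATIO = 4.0
ratio = max(variances.values()) / min(variances.values())
roughly_equal_variances = ratio < MAX_RATIO

for name, var in variances.items():
    print(f"{name}: variance = {var:,.2f}")
print(f"Largest / smallest variance = {ratio:.2f}")
```

With larger samples, formal tests such as Levene's test (`scipy.stats.levene`) for equal variances and the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality are the more usual choices.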

5. Conduct ANOVA

Now, for ANOVA, you would:

  1. Calculate group means and overall mean
    • Mean of social media ads = 55,000
    • Mean of email marketing = 50,000
    • Mean of TV commercials = 47,333.33
    • Overall mean (sum of all sales / 9) = 457,000 / 9 = 50,777.78
  2. Calculate the sum of squares
    • Sum of squares between groups (SSBG) = 90,888,888.89
    • Sum of squares within groups (SSWG) = 182,666,666.67
  3. Calculate total sum of squares (SST)
    • SST = SSBG + SSWG = 90,888,888.89 + 182,666,666.67 = 273,555,555.56
  4. Calculate degrees of freedom (df)
    • df between groups (dfBG) = (Number of groups – 1) = 3 – 1 = 2
    • df within groups (dfWG) = (Total no. of observations – no. of groups) = 9 – 3 = 6
    • Total degrees of freedom (dfT) = (Total number of observations) – 1 = 9 – 1 = 8
  5. Calculate mean squares (MS)
    • Mean square between groups (MSBG) = SSBG / dfBG = 90,888,888.89 / 2 = 45,444,444.44
    • Mean square within groups (MSWG) = SSWG / dfWG = 182,666,666.67 / 6 = 30,444,444.44
  6. Calculate the F-statistic
    • F-statistic = (MSBG / MSWG) = (45,444,444.44 / 30,444,444.44) ≈ 1.49
  7. Determine p-value
    • Using statistical software or F-distribution tables, look up the p-value for F(dfBG, dfWG)
    • The p-value for F(2, 6) = 1.49 is approximately 0.30

6. Interpret the results

With a p-value of approximately 0.30 (well above 0.05), you fail to reject the null hypothesis. This means there isn’t enough evidence to suggest that the sales performance varies significantly across the marketing campaigns.
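The calculation in step 5 can be reproduced with a short Python sketch that uses only the standard library. The function names are hypothetical. The p-value helper relies on a closed form of the F-distribution that holds only when there are exactly three groups (two between-group degrees of freedom); for other designs you would use statistical software:

```python
from statistics import mean

# Sales per region for each campaign, taken from step 3.
sales = {
    "Social media ads": [50_000, 55_000, 60_000],
    "Email marketing": [45_000, 50_000, 55_000],
    "TV commercials": [40_000, 50_000, 52_000],
}


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a dict of group -> observations."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Between groups: distance of each group mean from the overall mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within groups: distance of each observation from its own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_stat": ms_between / ms_within,
    }


def p_value_three_groups(f_stat, df_within):
    """Upper-tail probability of F(2, df_within); valid only for df_between == 2."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


table = one_way_anova(sales)
p_value = p_value_three_groups(table["f_stat"], table["df_within"])

print(f"Overall mean = {table['grand_mean']:,.2f}")
print(f"SS between = {table['ss_between']:,.2f}")
print(f"SS within = {table['ss_within']:,.2f}")
print(f"F({table['df_between']}, {table['df_within']}) = {table['f_stat']:.2f}")
print(f"p-value = {p_value:.2f}")
```

A library call such as SciPy's `scipy.stats.f_oneway`, given the three lists, is the more common way to get the F-statistic and p-value in one step.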

7. Post-hoc tests (if applicable)

Since you didn’t reject the null hypothesis, post-hoc tests aren’t needed in this case. However, if you had rejected the null hypothesis in favor of the alternative, you would run a post-hoc test such as Tukey’s HSD, which compares every pair of group means to identify which specific groups differ.
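For illustration only, here is a sketch of the statistic behind Tukey's HSD for the campaign data, assuming equal group sizes. Each pairwise difference in means is divided by the standard error sqrt(MSWG / n) to give a studentized range statistic q. The sketch stops short of a verdict: the critical value has to come from a studentized range table or from statistical software.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Sales per region for each campaign, taken from step 3.
sales = {
    "Social media ads": [50_000, 55_000, 60_000],
    "Email marketing": [45_000, 50_000, 55_000],
    "TV commercials": [40_000, 50_000, 52_000],
}

n_per_group = 3                                       # equal group sizes assumed
df_within = sum(len(v) for v in sales.values()) - len(sales)

# Mean square within groups, the same quantity used in the ANOVA itself.
ms_within = sum((x - mean(v)) ** 2 for v in sales.values() for x in v) / df_within
standard_error = sqrt(ms_within / n_per_group)

# Studentized range statistic q for every pair of campaigns.
q_stats = {
    (a, b): abs(mean(sales[a]) - mean(sales[b])) / standard_error
    for a, b in combinations(sales, 2)
}

for (a, b), q in q_stats.items():
    print(f"{a} vs {b}: q = {q:.2f}")
```

Each q is then compared against the critical value for three groups and six within-group degrees of freedom. Library routines such as `scipy.stats.tukey_hsd` or statsmodels' `pairwise_tukeyhsd` handle that lookup and report adjusted p-values directly.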

Final thoughts

ANOVA is a statistical method used to compare the means of three or more groups to determine if at least one group’s mean is significantly different from the others. It helps you identify variations among group means and assess the impact of different factors on a dependent variable. Understanding ANOVA enables you to make data-driven decisions, understand relationships and variability within and between key drivers, and run post-hoc analysis.

Featured image source: IconScout
