## 7B. ANOVA

In earlier lessons you learned how to test to see whether or not two groups differed - an independent samples t-test. But what do you do if you have more than two groups?

The first case we will examine is when you have three or more independent groups and you want to see whether or not there are differences between them - the test that accomplishes this is an Analysis of Variance - a between subjects test to determine if there is a difference between three or more groups.

Analysis of Variance (ANOVA) is a complex business, at this point you need to find a textbook and read the chapter(s) on ANOVA. For a quick summary you can go HERE, but that is not really going to be enough.

Load this DATA into a new table called "

Lets define group as a factor. Remember how to do that?

Of course you can use graphics to get a feel for what is happening. Let's try a boxplot:

The first case we will examine is when you have three or more independent groups and you want to see whether or not there are differences between them - the test that accomplishes this is an Analysis of Variance - a between subjects test to determine if there is a difference between three or more groups.

Analysis of Variance (ANOVA) is a complex business, at this point you need to find a textbook and read the chapter(s) on ANOVA. For a quick summary you can go HERE, but that is not really going to be enough.

Load this DATA into a new table called "

**data**" in R Studio. You will note the first column indicates subject numbers from 1 to 150, the second column groups codes for 3 groups, and the third column actual data. Rename the columns "**subject**", "**group**", and "**rt**".Lets define group as a factor. Remember how to do that?

**data$group = factor(data$group)**Of course you can use graphics to get a feel for what is happening. Let's try a boxplot:

**boxplot(data$rt~data$group)**Box plots show differences between factors (each group) but they also provide information about the data within each group. The boxes represent the data that falls between the 25th and 75th percentile, the bottom and top of the error bars show the range from the 1st to 100th percentile, the line in the middle of the box reflects the median, and any circles that appear are suggested outliers.

However, the point of an ANOVA is to statistically assess whether or not there are differences between the three groups. As it turns out, this is very easy to do:

This command tells R to run an analysis of variance (aov) on data$rt with data$group as a factor. To see the actual results of the ANOVA, you need to type:

**analysis = aov(data$rt~data$group)**This command tells R to run an analysis of variance (aov) on data$rt with data$group as a factor. To see the actual results of the ANOVA, you need to type:

**summary(analysis)**. What you will see is the ANOVA summary table.The summary table gives you a lot of key information. Perhaps the most important are the F statistic and the p-value - in this case the p value is below 0.05 so the ANOVA suggests that there are differences between the groups.

Note, typically you would report this ANOVA as follows. Our analysis revealed that there was a difference between groups, F(2,147) = 17.58, p < 0.001.

There is something important to note here - an ANOVA simply tells you whether or not there is a difference between groups, it does not tell you where the difference is. See below and Lesson 7C.

The box plot above suggested this, another way to show this would be to plot the means with error bars. To do this, we need to add the library "

If you recall, this will install the gplots library and then the library command will load it. Now, to plot the group means with 95% confidence intervals as the error bars you can simply use:

If you wish to statistically test whether or not there are differences between specific groups you could run a series of independent samples t-tests - Group 1 versus Group 2, Group 1 versus Group 3, and Group 2 versus Group 3. There are reasons not to do this - see Field Chapter 10. For now, we will simply use a command that runs a series of comparisons for us:

This command will provide output that shows the results of specific tests between each group. As you will see, all of the groups are different from each other. Note, R offers many different types of post-hoc tests. You could do pairwise t-tests with a Bonferonni correction by using:

Another way to statistically pull apart an ANOVA is through contrast analysis. Essentially, contrast analysis repartitions the variance so that you can determine group effects. Try the following:

This shows you the contrasts of groups 2 and 3 relative to group 1 - the default contrast.

To use a different contrast, you can define it through the contrast command them re-execute the model.

The above code would conduct a contrast comparing the first two groups to the third.

The short version is that ANOVA is a special case of multiple regression - it is just a form of a linear model.

Try the following:

Look at the F statistic and p-value.

Look at the t statistics and p-value (recall F = t^2!!!). They are the same but the data is represented in a regression format because it was defined as a linear model and not a special case. To recover the ANOVA table.

The data HERE reflects data from 4 groups of people, with each group reflecting data from a different city. Run the ANOVA. What is the story here? What cities are the same and what cities differ? Ensure you report the results of your ANOVA and support your claims of group differences and similarities with a plot and with the results of a TukeyHSD test. Ensure that you make a statement and test all of the assumptions of ANOVA (see Lesson 7A).

Note, typically you would report this ANOVA as follows. Our analysis revealed that there was a difference between groups, F(2,147) = 17.58, p < 0.001.

There is something important to note here - an ANOVA simply tells you whether or not there is a difference between groups, it does not tell you where the difference is. See below and Lesson 7C.

The box plot above suggested this, another way to show this would be to plot the means with error bars. To do this, we need to add the library "

**gplots**". See Lesson 1A but you can do the following:**install.packages("gplots")**

library("gplots")library("gplots")

If you recall, this will install the gplots library and then the library command will load it. Now, to plot the group means with 95% confidence intervals as the error bars you can simply use:

**plotmeans(data$rt~data$group)****Post Hoc Analysis of ANOVA**If you wish to statistically test whether or not there are differences between specific groups you could run a series of independent samples t-tests - Group 1 versus Group 2, Group 1 versus Group 3, and Group 2 versus Group 3. There are reasons not to do this - see Field Chapter 10. For now, we will simply use a command that runs a series of comparisons for us:

**TukeyHSD(analysis)**This command will provide output that shows the results of specific tests between each group. As you will see, all of the groups are different from each other. Note, R offers many different types of post-hoc tests. You could do pairwise t-tests with a Bonferonni correction by using:

**pairwise.t.test(data$rt,data$group,p.adjust="bonf")****Contrast Analysis of ANOVA**Another way to statistically pull apart an ANOVA is through contrast analysis. Essentially, contrast analysis repartitions the variance so that you can determine group effects. Try the following:

**summary.lm(analysis)**This shows you the contrasts of groups 2 and 3 relative to group 1 - the default contrast.

To use a different contrast, you can define it through the contrast command them re-execute the model.

******contrasts(data$group) = contr.treatment(3, base = 3)****modelnew = aov(data$rt~data$group)****summary.lm(modelnew)**The above code would conduct a contrast comparing the first two groups to the third.

**contrasts(data$group) = contr.poly(3)**

modelnew = aov(data$rt~data$group)

summary.lm(modelnew)

The above code would conduct a contrast examining the polynomial trends between the groups (linear, quadratic).modelnew = aov(data$rt~data$group)

summary.lm(modelnew)

**For more understanding...**The short version is that ANOVA is a special case of multiple regression - it is just a form of a linear model.

Try the following:

**model1 = aov(data$rt~data$group)**

summary(model1)summary(model1)

Look at the F statistic and p-value.

**model2 = lm(data$rt~data$group)**

summary(model2)summary(model2)

Look at the t statistics and p-value (recall F = t^2!!!). They are the same but the data is represented in a regression format because it was defined as a linear model and not a special case. To recover the ANOVA table.

**anova(model2)**__Assignment Question__The data HERE reflects data from 4 groups of people, with each group reflecting data from a different city. Run the ANOVA. What is the story here? What cities are the same and what cities differ? Ensure you report the results of your ANOVA and support your claims of group differences and similarities with a plot and with the results of a TukeyHSD test. Ensure that you make a statement and test all of the assumptions of ANOVA (see Lesson 7A).