ANOVA ASSUMPTIONS

In earlier lessons you learned how to test whether two groups differ using an independent samples t-test. But what do you do if you have more than two groups?

The first case we will examine is when you have three or more independent groups and you want to know whether there are differences between them. The test that accomplishes this is a between-subjects Analysis of Variance (ANOVA), which determines whether there is a difference between three or more groups.

ANOVA is a complex business, so at this point you should find a textbook and read the chapter(s) on ANOVA. For a quick summary you can go 
HERE, but that alone will not be enough.

Load this DATA into a new table called "data" in RStudio. You will note that the first column contains subject numbers from 1 to 150, the second column contains codes for three groups, and the third column contains the actual data. Rename the columns "subject", "group", and "rt".
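
If you would rather load the file with code than through the RStudio import dialog, a minimal sketch like the one below would work, assuming the file was saved as a CSV with a header row (the file name here is just a placeholder, use whatever you saved the file as):

# assumption: the downloaded file is a CSV with a header row;
# "anova_data.csv" is a placeholder name
data = read.csv("anova_data.csv")

# rename the columns
colnames(data) = c("subject", "group", "rt")

# check the first few rows
head(data)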

Let's define group as a factor. Remember how to do that? 

data$group = factor(data$group)
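
To double-check that the conversion worked, you can inspect the table; group should now show up as a factor with three levels:

str(data)
levels(data$group)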

Before we actually run an ANOVA we need to check its assumptions. What are assumptions? In short, each statistical test can only be run if the data meet certain criteria. For ANOVA, there are four assumptions that you need to meet.

Assumption One: Between Group Independence. The groups must be independent; that is, they cannot be related. If you are studying age, for example, this is easy: a "young" group is naturally independent of "middle aged" and "elderly" groups. This is not always so clear, and you need to be sure. Note that there is no statistical test for this; you have to use logic and common sense. Two groups can have identical means and variances and still be truly independent. Consider this: you are interested in how many calories a day people eat. You recruit one group in Berlin, Germany and another in Auckland, New Zealand and find that they eat the same number of calories per day, as reflected by the means and variances. These groups are almost certainly independent even though they share common statistical properties.

Assumption Two: Within Group Sampling and Independence. The members of each group are sampled randomly and are independent of each other. This relates to research methods and is beyond the scope of this tutorial, but typically we do the best we can here.

Assumption Three: Normality. The data for each group is drawn from a normally distributed population. Strictly speaking, that is not quite the assumption. The actual assumption is that the sampling distribution of the mean of the population from which the data is drawn is normally distributed. There is a lot of theory here, but in short this is almost always true, and most people misunderstand this assumption. Most people think it means that the data itself must be normally distributed, which is not true at all. The confusion stems from the fact that most textbooks state something like "if the data is normally distributed then the assumption is met". That statement is true: if your data is normally distributed, then it is extremely likely that the sampling distribution of the mean of the population the data comes from is also normally distributed. However, if the data is not normally distributed, it is still quite possible that the sampling distribution of the mean is normally distributed. You should discuss this one with your statistics professor if you are still confused!
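
If the distinction between the distribution of the data and the sampling distribution of the mean still seems abstract, a quick simulation can make it concrete. The sketch below uses made-up exponential data (clearly not normal) and shows that the means of repeated samples still pile up in a roughly normal shape:

# simulated, clearly non-normal (exponential) data
set.seed(1)
rawData = rexp(1000, rate = 1)

# means of 1000 samples of size 50 drawn from the same distribution
sampleMeans = replicate(1000, mean(rexp(50, rate = 1)))

# the raw data is skewed, but the sample means look roughly normal
par(mfcol = c(1, 2))
hist(rawData)
hist(sampleMeans)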


In spite of this, most people like to test whether the data is normally distributed. Here are four ways to do this:

a. Plot histograms for each group and assess normality through visual inspection. For the above data:

# put the three histograms side by side in one plotting window
par(mfcol=c(1, 3))
hist(data$rt[data$group==1])
hist(data$rt[data$group==2])
hist(data$rt[data$group==3])

b. Examine the skewness and kurtosis of the data for each group. You can read more about skewness and kurtosis HERE and HERE, but essentially skewness assesses how far the distribution is pulled to the left or right (0 is perfectly normal) and kurtosis assesses how flat or peaked the distribution is (3 is perfectly normal).

To test the skewness and kurtosis for each group you would use the following commands:
install.packages("moments")
library("moments")

# skewness and kurtosis for group 1
skewness(data$rt[data$group==1])
kurtosis(data$rt[data$group==1])

# skewness and kurtosis for group 2
skewness(data$rt[data$group==2])
kurtosis(data$rt[data$group==2])

# skewness and kurtosis for group 3
skewness(data$rt[data$group==3])
kurtosis(data$rt[data$group==3])


c. Use QQ plots to assess normality. On a QQ plot the data is normal if the points all fall on the line. Try the following code:
qqnorm(data$rt[data$group==1])
qqline(data$rt[data$group==1])

qqnorm(data$rt[data$group==2])
qqline(data$rt[data$group==2])

qqnorm(data$rt[data$group==3])
qqline(data$rt[data$group==3])

The data here is all pretty close to the line, which suggests that the data is normal.

d. Use a formal statistical test of normality, such as the Shapiro-Wilk test.
shapiro.test(data$rt[data$group==1])
shapiro.test(data$rt[data$group==2])
shapiro.test(data$rt[data$group==3])

If the p value of this test is not significant (i.e., p > 0.05), the test finds no evidence against normality, which suggests that the data is normally distributed.
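
If you just want the p values rather than the full test output, shapiro.test() returns a list and the p value can be pulled out directly. For example, something like this reports all three groups at once (assuming the columns are named as above):

# Shapiro-Wilk p value for each group in one call
sapply(levels(data$group), function(g) shapiro.test(data$rt[data$group == g])$p.value)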

Assumption Four: Homogeneity of Variance. ANOVA assumes that the variances of all groups are equivalent. There are three ways to test this.

a. Plot histograms (making sure the number of bins and the axes are the same across groups) and visually check that the spreads are roughly equivalent. See above.

b. Compute the variances for each group:
var(data$rt[data$group==1])
var(data$rt[data$group==2])
var(data$rt[data$group==3])
A general rule of thumb is that the assumption is met if the largest variance is no more than four times the smallest variance; in other words, varmax < 4 x varmin.
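
You can also check this rule of thumb directly rather than eyeballing the three numbers, for example:

# variance of each group and the ratio of largest to smallest
groupVars = tapply(data$rt, data$group, var)
max(groupVars) / min(groupVars)   # by the rule of thumb this should be less than 4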

c. Use a statistical test of the assumption. Bartlett's test assesses the assumption directly and returns a p value. If the p value is significant (p < 0.05), then the assumption is not met. Dealing with a violation of the assumption is beyond this assignment, but you do need to be aware of whether any of the assumptions are violated.
bartlett.test(data$rt~data$group)
Alternatively, you could use Levene's test, which is in the car package:
install.packages("car")
library("car")
leveneTest(data$rt~data$group)
