Independent Samples T-Tests
T-tests are inferential statistical tests that allow you to draw conclusions about data. There are three types of t-test (single sample, paired samples, and independent samples), and each is used in a specific situation.
Independent samples t-tests are used when you want to compare two independent groups, for instance, fitness test scores between people from Victoria and people from Vancouver. In an independent design, if you are in one group you cannot, by definition, be in the other group. In this example, if you are from Victoria you cannot be from Vancouver, and vice versa.
For example, as a researcher you want to compare the average number of calories eaten per day between people from Victoria and people from Vancouver. The sample data is HERE.
Load the data into R in a new variable called caloriedata. Note that the data has three columns: the first codes the subject number, the second codes the group (1 = Victoria, 2 = Vancouver), and the third holds the average caloric intake. Let's give the columns in the table names:
names(caloriedata) = c('subject','group','calories')
The group column is a factor, that is, the independent variable. R does not automatically treat numeric codes such as 1 and 2 as categories, so you need to convert the column to a factor explicitly. Let's do this:
caloriedata$group = factor(caloriedata$group)
Also look at the subject column. You will note that the subject numbers range from 1 to 100. No subject number occurs twice, and the subject numbers in groups 1 and 2 are different, which makes this clearly an independent design.
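If you would rather check the design in code than by eye, something like the following may help. This is only a sketch: the real data file is not reproduced here, so the block builds a simulated caloriedata with made-up values, and the names n_duplicates, in_both, and group_sizes are introduced purely for illustration.

```r
# Simulated stand-in for the tutorial's data file (all values are made up)
set.seed(1)
caloriedata <- data.frame(
  subject  = 1:100,
  group    = factor(rep(c(1, 2), each = 50)),
  calories = c(rnorm(50, mean = 2500, sd = 300),
               rnorm(50, mean = 2600, sd = 300))
)

# Index of the first repeated subject number; 0 means there are no repeats
n_duplicates <- anyDuplicated(caloriedata$subject)

# Subject numbers that appear in both groups; this should be empty
in_both <- intersect(caloriedata$subject[caloriedata$group == "1"],
                     caloriedata$subject[caloriedata$group == "2"])

# Number of subjects in each group
group_sizes <- table(caloriedata$group)

print(n_duplicates)
print(in_both)
print(group_sizes)
```

With your own data, skip the simulation step and run the three checks on the caloriedata you loaded.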
Now, let's run an independent samples t-test to compare the average number of calories consumed between the two groups.
analysis = t.test(caloriedata$calories ~ caloriedata$group)
Note that nothing follows the formula here: the default t-test in R is an independent samples test (specifically Welch's t-test, which does not assume equal variances), so specifying mu (single sample) or paired = TRUE (dependent samples) is not necessary.
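To see which version of the test R actually ran, you can look at the method element of the result. The sketch below uses simulated data (made-up values standing in for the real file) and the data = form of the formula interface, which is equivalent to writing caloriedata$calories ~ caloriedata$group. The names welch and student are illustrative only.

```r
# Simulated stand-in for the tutorial's data file (all values are made up)
set.seed(1)
caloriedata <- data.frame(
  subject  = 1:100,
  group    = factor(rep(c(1, 2), each = 50)),
  calories = c(rnorm(50, mean = 2500, sd = 300),
               rnorm(50, mean = 2600, sd = 300))
)

# Default call: Welch's t-test (var.equal = FALSE)
welch <- t.test(calories ~ group, data = caloriedata)

# Classic Student's t-test, which pools the two group variances
student <- t.test(calories ~ group, data = caloriedata, var.equal = TRUE)

print(welch$method)
print(student$method)
```

Setting var.equal = TRUE only makes sense if you are willing to assume the two groups have equal variances.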
To see the results of the t-test, simply type:
print(analysis)
You should find that the result of the t-test is p = 0.08754, which is not significant at the conventional 0.05 level.
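Make sure you understand what degrees of freedom are. For an independent samples t-test that assumes equal variances, the degrees of freedom are: df = n1 - 1 + n2 - 1, that is, n1 + n2 - 2. Note that R's default is Welch's t-test, which adjusts the degrees of freedom for unequal variances, so the df in the printed output is usually not a whole number and can be smaller than this value.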
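The following sketch compares the hand-computed degrees of freedom with what t.test reports. It assumes simulated data with 50 subjects per group (made-up values, not the real file); df_manual, pooled, and welch are names chosen for illustration.

```r
# Simulated stand-in for the tutorial's data file (all values are made up)
set.seed(1)
caloriedata <- data.frame(
  subject  = 1:100,
  group    = factor(rep(c(1, 2), each = 50)),
  calories = c(rnorm(50, mean = 2500, sd = 300),
               rnorm(50, mean = 2600, sd = 300))
)

# Degrees of freedom by hand: (n1 - 1) + (n2 - 1)
n <- table(caloriedata$group)
df_manual <- (n[[1]] - 1) + (n[[2]] - 1)

# Degrees of freedom reported by the pooled-variance test
pooled <- t.test(calories ~ group, data = caloriedata, var.equal = TRUE)

# Degrees of freedom reported by the default Welch test
welch <- t.test(calories ~ group, data = caloriedata)

print(df_manual)
print(unname(pooled$parameter))
print(unname(welch$parameter))
```

The hand-computed value matches the pooled test, while the Welch value is adjusted and never exceeds n1 + n2 - 2.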
A final note: the correct way to report a t-test is: "The results of our analysis revealed that group A scored better on the standardized test than group B, t(49) = 4.53, p < 0.05." In this sentence the 49 reflects the appropriate degrees of freedom, the 4.53 is the actual t statistic, and the p value is either stated exactly or as less than a certain value.
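The numbers for such a sentence can be pulled straight out of the object that t.test returns. Here is a minimal sketch, assuming simulated data (made-up values) and a pooled-variance test so that the degrees of freedom are a whole number; report is a hypothetical variable name.

```r
# Simulated stand-in for the tutorial's data file (all values are made up)
set.seed(1)
caloriedata <- data.frame(
  subject  = 1:100,
  group    = factor(rep(c(1, 2), each = 50)),
  calories = c(rnorm(50, mean = 2500, sd = 300),
               rnorm(50, mean = 2600, sd = 300))
)

# Pooled-variance test, so the degrees of freedom are n1 + n2 - 2
analysis <- t.test(calories ~ group, data = caloriedata, var.equal = TRUE)

# parameter holds the df, statistic the t value, p.value the p value
report <- sprintf("t(%d) = %.2f, p = %.3f",
                  as.integer(analysis$parameter),
                  analysis$statistic,
                  analysis$p.value)
print(report)
```

If you report the default Welch test instead, the degrees of freedom are generally not a whole number, so a format such as %.2f is a better fit for them than %d.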
Testing the Assumption of Homogeneity of Variance
As you know, a statistical test is not valid unless its assumptions are met. For an independent samples t-test, the assumption of homogeneity of variance needs to be tested. The easiest way to do this is to examine the variance ratio using Hartley's rule of thumb.
vars = aggregate(caloriedata$calories,list(caloriedata$group),var)
vars
Essentially, you are checking that the larger of the two group variances is no more than four times the smaller one. In this case, you can clearly see the assumption is met.
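Rather than comparing the two variances by eye, you could compute the ratio directly. The sketch below assumes simulated data (made-up values, not the real file) and uses tapply instead of aggregate simply because it returns a plain named vector; ratio and rule_met are illustrative names.

```r
# Simulated stand-in for the tutorial's data file (all values are made up)
set.seed(1)
caloriedata <- data.frame(
  subject  = 1:100,
  group    = factor(rep(c(1, 2), each = 50)),
  calories = c(rnorm(50, mean = 2500, sd = 300),
               rnorm(50, mean = 2600, sd = 300))
)

# Variance of calories within each group
vars <- tapply(caloriedata$calories, caloriedata$group, var)

# Larger variance divided by smaller variance, so the ratio is always >= 1
ratio <- max(vars) / min(vars)

# Rule of thumb from the text: the ratio should be no more than 4
rule_met <- ratio <= 4

print(vars)
print(ratio)
print(rule_met)
```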
A more formal way to test the homogeneity of variance is to use Levene's Test.
install.packages("car")
library(car)
leveneTest(caloriedata$calories~caloriedata$group)
The test is not significant, so the assumption of homogeneity of variance is considered met. Alternatively, you could use Bartlett's test.
bartlett.test(caloriedata$calories~caloriedata$group)
Again, the test is not significant, so the assumption of homogeneity of variance is considered met. Although you have to install the car package to use Levene's test, it is generally considered to be more reliable than Bartlett's test, especially for smaller sample sizes, because Bartlett's test is sensitive to departures from normality.
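To see what Levene's test is doing under the hood, you can reproduce the idea in base R: take each score's absolute deviation from its group's center and run a one-way ANOVA on those deviations. The sketch below centers on the group median, which is the default for leveneTest in car. It assumes simulated data (made-up values), and z, levene_manual, p_levene, and p_bartlett are illustrative names.

```r
# Simulated stand-in for the tutorial's data file (all values are made up)
set.seed(1)
caloriedata <- data.frame(
  subject  = 1:100,
  group    = factor(rep(c(1, 2), each = 50)),
  calories = c(rnorm(50, mean = 2500, sd = 300),
               rnorm(50, mean = 2600, sd = 300))
)

# Absolute deviation of each score from its own group's median
group_median <- ave(caloriedata$calories, caloriedata$group, FUN = median)
z <- abs(caloriedata$calories - group_median)

# One-way ANOVA on the deviations: the group row is the Levene-type test
levene_manual <- anova(lm(z ~ caloriedata$group))
p_levene <- levene_manual[["Pr(>F)"]][1]

# Bartlett's test from base R, for comparison
p_bartlett <- bartlett.test(calories ~ group, data = caloriedata)$p.value

print(levene_manual)
print(p_levene)
print(p_bartlett)
```

In practice you would still call leveneTest directly; the point of the sketch is only to show that the test compares how spread out the scores are within each group.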
Assignment Questions
1. Conduct a single sample t-test on the data HERE. Does the sample data differ from a population mean of 2900 (report the t statistic and the p value)? What is the sample mean and standard deviation? What is the 95% confidence interval of the sample data? Generate a bar plot of the sample data with a 95% confidence interval as the error bar.
2. Conduct a paired samples t-test on the data HERE. Do the two time points differ (report the t statistic and the p value)? What are the means and standard deviations of the two time points AND the difference scores of the data points? What are the 95% confidence intervals of the two time points AND the difference scores? Generate a bar plot of the two time points and the difference scores with the 95% confidence intervals as the error bars.
3. Conduct an independent samples t-test on the data HERE. Do the two groups differ (report the t statistic and the p value)? What are the means and standard deviations of the two groups? What are the 95% confidence intervals of the two groups? Generate a bar plot of the two groups with the 95% confidence intervals as the error bars.
Challenge Question
4. For Question 3, create a plot of the difference of the two group means with the appropriate 95% confidence interval. You can find some information on how to do this HERE.