Testing Distributions
1. In the previous assignment we generated a sampling distribution of the mean, a key concept in statistics. In practice, however, we often simply want to examine the distribution of the data we have. This is easily done with histograms, which you should be familiar with by this point.
Create a sample data set: data = rnorm(1000,500,100)
Make a histogram of the data set: hist(data)
The histogram is one simple way to assess the normality of your data. However, there are other ways to do this.
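The histogram step above can be sketched as follows. The fixed seed, the number of bins, and the overlaid normal curve are illustrative additions rather than part of the assignment; they only make the visual check reproducible and a little easier to read.

```r
# Assumption: a fixed seed so that the example gives the same sample every run.
set.seed(1)

# 1000 values from a normal distribution with mean 500 and SD 100.
data <- rnorm(1000, mean = 500, sd = 100)

# Draw the histogram on a density scale so a normal curve can be overlaid.
# breaks = 30 is only a suggestion to hist(); R may pick a nearby number of bins.
h <- hist(data, breaks = 30, freq = FALSE,
          main = "Sample of 1000 values", xlab = "Value")

# Overlay a normal density using the sample mean and SD for comparison.
curve(dnorm(x, mean = mean(data), sd = sd(data)), add = TRUE, lwd = 2)
```

If the bars follow the curve reasonably closely, the histogram gives no visual reason to doubt normality.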
2. One way to assess normality is to use a Shapiro-Wilk test. The test takes normality as its null hypothesis and returns a test statistic (W) together with a p value, a quantity that we will talk a lot more about later. If the p value is less than 0.05, the test rejects the null hypothesis and the data are treated as not normally distributed. If the p value is greater than 0.05, the test finds no evidence against normality; this is consistent with the data being normally distributed but does not prove it. Try a Shapiro-Wilk test on your data by doing this: shapiro.test(data)
Assuming that you used rnorm, your data should pass this test most of the time. By chance alone, roughly 1 in 20 truly normal samples will still give a p value below 0.05.
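As a rough sketch of how the Shapiro-Wilk result can be read inside a script, the example below stores the test output and pulls out the p value. The fixed seed, the variable names result and skewed, and the exponential comparison sample are assumptions added for illustration.

```r
# Assumption: a fixed seed so that the example is reproducible.
set.seed(1)
data <- rnorm(1000, mean = 500, sd = 100)

# shapiro.test() returns a list; the W statistic and p value can be extracted.
result <- shapiro.test(data)
print(result)

# Apply the 0.05 threshold described in the text.
if (result$p.value < 0.05) {
  cat("p < 0.05: evidence against normality\n")
} else {
  cat("p >= 0.05: no evidence against normality\n")
}

# For contrast, a clearly skewed sample (exponential, illustrative only).
skewed <- rexp(1000, rate = 1 / 500)
skewed_result <- shapiro.test(skewed)
print(skewed_result$p.value)  # expected to be far below 0.05
```

Note that shapiro.test in R accepts between 3 and 5000 values, which matters if a data set is very large.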
3. Another way to test normality is via a Kolmogorov-Smirnov (KS) test. The KS test compares your data against a distribution of your choice (e.g., normal). With this test you must specify the mean and the standard deviation of the reference distribution, so you use the values from your data to approximate them. Be aware that estimating these parameters from the same data makes the test conservative, so treat its p value as approximate.
For example: ks.test(data,pnorm,mean(data),sd(data))
Try this now. This tests your data against a normal distribution whose mean and standard deviation equal those of your data set.
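The KS call above can be sketched with named arguments, which makes it clearer which values are passed on to pnorm. The seed, the variable names, and the skewed comparison sample are assumptions for illustration only.

```r
# Assumption: a fixed seed so that the example is reproducible.
set.seed(1)
data <- rnorm(1000, mean = 500, sd = 100)

# Extra arguments to ks.test() are passed on to the distribution function,
# here pnorm(), so mean and sd describe the reference normal distribution.
ks_result <- ks.test(data, "pnorm", mean = mean(data), sd = sd(data))
print(ks_result)

# The D statistic is the largest gap between the empirical and reference CDFs.
cat("D =", ks_result$statistic, " p =", ks_result$p.value, "\n")

# For contrast, a skewed sample tested against a normal with its own mean and SD.
skewed <- rexp(1000, rate = 1 / 500)
ks_skewed <- ks.test(skewed, "pnorm", mean = mean(skewed), sd = sd(skewed))
print(ks_skewed$p.value)  # expected to be far below 0.05
```

Passing "pnorm" as a string and passing the function pnorm itself are both accepted by ks.test.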
Assignment Questions
The attached ZIP file contains the data for 10 experimental participants. For each participant:
1. Determine the mean, standard deviation, and range of the participant's data.
2. Plot a histogram of the participant's data.
3. Test the normality of the participant's data using a Shapiro-Wilk test.
4. Test the normality of the participant's data using a KS test.
Ensure the above 4 components all run from a single R script that you can send me. In other words, I can run the one script and see the results from all 10 of the participants.
5. What is the Assumption of Normality? Clearly explain this as a comment at the bottom of your R script.
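One possible way to organise a single script that handles every participant is sketched below. It is not the required solution: the participant data are simulated here because the file names and layout inside the ZIP are not given, and the names participants and summarise_participant are hypothetical. In a real script the simulated list would be replaced by the data read from the participant files.

```r
# Assumption: simulated stand-ins for the 10 participant data sets.
set.seed(1)
participants <- lapply(1:10, function(i) rnorm(200, mean = 500, sd = 100))
names(participants) <- paste0("participant_", 1:10)

# Hypothetical helper: runs all four steps for one participant
# and returns the numeric results as a one-row data frame.
summarise_participant <- function(x, label) {
  hist(x, main = label, xlab = "Value")                  # step 2
  sw <- shapiro.test(x)                                  # step 3
  ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))  # step 4
  data.frame(participant = label,
             mean = mean(x), sd = sd(x),                 # step 1
             min = min(x), max = max(x),                 # range as min and max
             shapiro_p = sw$p.value,
             ks_p = ks$p.value)
}

# Apply the helper to every participant and stack the rows into one table.
results <- do.call(rbind, Map(summarise_participant,
                              participants, names(participants)))
print(results)
```

Collecting the results in one table makes it easy to see all 10 participants at a glance after running the script once.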