KRIGOLSON TEACHING

REGRESSION

Linear regression is closely related to correlation. Recall that in correlation we sought to evaluate the relationship between two variables, which we will call X and Y for simplicity. As a rough rule of thumb, a Pearson r value less than -0.1 or greater than 0.1 indicates that a relationship is present, and a Pearson r value between -0.1 and 0.1 indicates that no relationship is present.

In regression, we seek to determine whether X can predict Y. For instance, do GRE scores predict success at graduate school? Do MCAT scores predict success at medical school? Does income predict happiness?

The general form of regression is Y = B0 + B1X. Hopefully you remember that this is essentially the equation of a line: the formula you learned in high school would have been Y = MX + B, which can be rewritten as Y = B + MX. In regression models, B0 is a constant (the intercept) and B1 is the coefficient for X (the slope). Think of it this way: income may range from $0 to $1,000,000 in our data and our happiness score might only range from 1 to 5. Thus, the regression model needs to rescale the income scores by multiplying them by B1 and then adding B0 to predict a score between 1 and 5.
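To make the equation concrete, here is a minimal sketch using simulated data. The income and happiness values, and the "true" B0 and B1 used to generate them, are made up for illustration and are not the course data. It fits a model and then checks that plugging a value of X into Y = B0 + B1X by hand gives the same answer as R's predict() function.

```r
# Simulated (made-up) data: income in dollars, happiness on a 1 to 5 scale
set.seed(1)
income <- runif(100, min = 0, max = 1000000)
# Hypothetical relationship used to generate the data: B0 = 1.5, B1 = 0.000003
happiness <- 1.5 + 0.000003 * income + rnorm(100, sd = 0.2)

# Fit the regression of happiness (Y) on income (X)
fit <- lm(happiness ~ income)
b0 <- coef(fit)[["(Intercept)"]]   # estimated B0
b1 <- coef(fit)[["income"]]        # estimated B1

# Predict happiness for an income of $500,000 by hand: Y = B0 + B1 * X
by_hand <- b0 + b1 * 500000

# The same prediction using predict()
by_predict <- predict(fit, newdata = data.frame(income = 500000))

print(c(b0 = b0, b1 = b1))
print(by_hand)
print(by_predict)
```

Notice how small B1 is: because income is in the hundreds of thousands, a tiny coefficient is enough to bring the prediction onto the 1 to 5 happiness scale.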

Load the data HERE into a table in R called data.

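Running a regression is simple. All you need to do is use the following commands:
# Fit a linear model with V1 as Y (the outcome) and V2 as X (the predictor)
model = lm(data$V1~data$V2)
# Print the coefficients, R-squared, and significance tests
summary(model)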

You should see output that looks like this:

Call:
lm(formula = data$V1 ~ data$V2)

Residuals:
   Min     1Q Median     3Q    Max 
-36783  -9544  -1284   7467  74017 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  38504.2    11889.9   3.238  0.00141 ** 
data$V2       1260.2      234.9   5.364 2.29e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16040 on 195 degrees of freedom
Multiple R-squared:  0.1286,    Adjusted R-squared:  0.1241 
F-statistic: 28.77 on 1 and 195 DF,  p-value: 2.294e-07

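Essentially, what R is telling us is that the model fits: the p-value for the F-statistic (2.294e-07) is less than 0.05, so X is a statistically significant predictor of Y. You will notice that it returns a multiple R-squared value, which for a regression with a single predictor is the square of the correlation coefficient r. Here R-squared is 0.1286, so X accounts for about 13% of the variance in Y. It also returns B0 and B1, which in this case are 38504.2 and 1260.2, respectively. Thus, the regression equation would be Y = 38504.2 + 1260.2X for these data.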
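If you want to work with these numbers rather than just read them off the screen, the sketch below shows one way to pull them out of a fitted model. It uses simulated data (the x and y values are made up, not the course data set) so that it runs on its own. The last line plugs a hypothetical X value of 50 into the regression equation reported above.

```r
# Made-up data for illustration only (not the course data set)
set.seed(42)
x <- rnorm(200, mean = 50, sd = 5)
y <- 40000 + 1200 * x + rnorm(200, sd = 16000)

model <- lm(y ~ x)
s <- summary(model)

# With a single predictor, Multiple R-squared is the square of Pearson r
r <- cor(x, y)
r_squared <- s$r.squared
print(all.equal(r^2, r_squared))

# B0 and B1 are the first and second coefficients
b <- coef(model)
print(b)

# p-value for the overall F-statistic (the last line of the summary output)
f <- s$fstatistic
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(p_value)

# Using the equation reported above with a hypothetical X of 50
y_hat <- 38504.2 + 1260.2 * 50
print(y_hat)
```

With only one predictor, the p-value for the F-statistic is the same as the p-value for the slope in the Coefficients table, which is why both appear as 2.29e-07 in the output above.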

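Note that a linear model might not always fit the data well, so it is worth looking at it. You can see the data and the fitted regression line by using the following commands:
# Scatterplot with V1 (Y) on the vertical axis and V2 (X) on the horizontal axis, matching the model above
plot(data$V1~data$V2)
# Add the fitted regression line to the plot
abline(lm(data$V1~data$V2))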

Assignment
Using the data HERE, construct regression models for all of the variables in columns 2 to 6 against column 1. Thus, you will be running 5 separate linear regressions. Hand in the results of each model and a plot of each model.