This tutorial is designed to outline the steps in MATLAB for a simple ERP analysis of a single subject.
First, you need to download and unzip the sample data and the channel locations file that are HERE. You will also need to put my GitHub repos for MATLAB-EEG-preProcessing, MATLAB-EEG-postProcessing, MATLAB-EEG-icaTools and MATLAB-EEG-fileIO into your MATLAB path. Finally, you will need to install EEGLAB, which is available HERE.
This tutorial assumes that you have all of the aforementioned bits and pieces installed and they are all in the MATLAB path. Put the tutorial data in a folder and use MATLAB to navigate to that folder.
IMPORTANT. This tutorial is not designed to teach you what these steps mean, it is simply designed to teach you how to process and analyze EEG data.
1. Loading Data
The tutorial data consists of EEG recordings from 10 participants who completed an oddball ERP task. Because the data are in Brain Vision format, there are three files for each person. The .vhdr file is a header file that contains information about the recording; you can open it with a text editor to see the contents if you wish. The .vmrk file is the marker file that specifies when the events of interest (e.g., fixation, oddball stimulus, control stimulus) occurred. You can also open this file with a text editor to see the contents. Finally, there is the .eeg file, which contains the EEG data in binary format. Occasionally, the EEG file will have a .dat extension, but it will still work.
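For orientation, a stimulus marker entry in the .vmrk file looks something like the line below (the marker number and sample position here are made up for illustration; the general format is Mk&lt;n&gt;=&lt;type&gt;,&lt;description&gt;,&lt;position in data points&gt;,&lt;size in data points&gt;,&lt;channel&gt;):

```
Mk3=Stimulus,S202,15234,1,0
```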
To load the first EEG data file, type the following commands into the MATLAB command window or put them in a script. Make sure you change the pathName to match the path to your own data.
pathName = '/Users/krigolson/Desktop/tutorialData';
fileName = 'Oddball1.vhdr';
EEG = doLoadBVData(pathName,fileName);
This will load the EEG data into MATLAB into a variable called EEG. If this does not happen, something is wrong with your path installs / path configurations.
2. Filtering Data
Now, we will filter the data. Before we filter the data, we need to define filter parameters. We will set a low cutoff of 0.5 Hz and a high cutoff of 30 Hz for a bandpass filter. We will also set a notch filter at 60 Hz.
filterParameters.low = 0.5;
filterParameters.high = 30;
filterParameters.notch = 60;
To filter the data, we simply use the following command:
EEG = doFilter(EEG,filterParameters);
Your screen display in MATLAB should look something like this:
The display shows the filters that were implemented. If you want to learn more about digital filtering, I strongly recommend you read Chapter 7 and watch this video HERE just for starters.
If you want a demo on what filtering actually does to EEG data, run this code HERE.
The script I have used so far is HERE.
3. Data Cleaning
The next step involves some fairly sophisticated data cleaning using the EEGLAB clean_line and clean_rawdata functions. Again, a discussion of these tools is beyond the scope of this tutorial. We have hidden these functions in our doCleanData function. The default settings were recommended by Arnaud Delorme; I would consider changing them only with significant thought and discussion. You would have to edit the doCleanData function to modify these parameters.
clean_rawdata will be installed automatically when you open EEGLAB; clean_line will not. Download clean_line HERE and put it in your EEGLAB plugins directory. Add it to your MATLAB path, or simply run EEGLAB and it will be installed. Note, we have modified the clean_line function to avoid pop-up windows, so the version on GitHub will not run as smoothly.
To run doCleanData simply use the following command:
EEG = doCleanData(EEG);
You will see a bunch of output, but we have created a summary which is in the variable EEG.quality.cleanraw.
This output tells you the percent of data removed, how many channels were removed, and how many Boundary Events were created - these are markers for the discontinuities in the data created by clean_rawdata.
The script I have used this far is HERE.
4. Reference Data
Before running ICA, it is recommended to re-reference the EEG data to an average reference. This is easy to do with the following command:
EEG = doICAReference(EEG);
The script I have used to get this far is HERE.
5. Independent Component Analysis
Next, we use independent component analysis (more HERE) to remove ocular artifacts (blinks, saccades) from the EEG data. ICA is quite complex, but the simple version is if you examined the data before and after ICA, you should see that the ocular components are removed.
Make sure you have the Github toolbox MATLAB-EEG-icaTools installed and synced.
Running ICA is simple and requires just two commands:
EEG = doICA(EEG);
EEG = doRemoveOcularICAComponents(EEG);
Afterwards, you will see a whole bunch of messages in the display window. It is important to inspect the EEG.quality.ica variable afterwards. You want to see that some ocular components were removed - in this case, 2. You also want to check that the average brain percentage is above 30%.
The script I have used this far is HERE.
6. Interpolate Channels
The data cleaning step typically results in the removal of channels. We need to put them back in the data set, so we have our full channel array. However, we do not want to do this before ICA because interpolation increases the correlation between channels (which is bad for ICA).
Now, we actually have to go back to the start because we need to keep our original channel list. So, right after we load the data (step 1) we would add a line to store the original channels:
originalChannels = EEG.chanlocs;
And then after we run ICA, we would interpolate the missing channels by using this command:
EEG = doInterpolate(EEG,originalChannels);
I will not post a screenshot here, but with this data set you should see that you have 61 channels before interpolation and are back to 63 channels after interpolation.
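If you want to verify the channel counts yourself, the standard EEGLAB field EEG.nbchan holds the current number of channels, so a quick check against the stored montage looks like this:

```matlab
% compare the current channel count against the stored original montage
fprintf('channels now: %d, original montage: %d\n', EEG.nbchan, length(originalChannels));
```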
The script I have used to get to this step is HERE.
7. Epoch Data
The next step in analysis is to epoch the data - extract segments of data around the event markers for the oddball task. In the oddball task, we had two primary events - the appearance of the control and oddball circles. In this data, the oddball marker is S202 and the control marker is S203. To extract the epochs of data, we first define the markers and a time window - we will use 200 ms before the event to 600 ms after, and then call the doEpoch function.
epochMarkers = {'S202','S203'};
epochTimes = [-0.2 0.6];
EEG = doEpochData(EEG,epochMarkers,epochTimes);
You will not see anything exciting happen, but if you look in the EEG variable you will see that we now have three dimensions for EEG.data: channels x time x events. You should see 63 x 400 x 106. Note, we have fewer events than the expected total due to artifacts in the data.
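The 400 time points are just the epoch window length multiplied by the sampling rate (which implies this data was recorded at 500 Hz; check EEG.srate for your own data):

```matlab
% expected samples per epoch = window duration (s) * sampling rate (Hz)
samplesPerEpoch = (epochTimes(2) - epochTimes(1)) * EEG.srate;   % 0.8 * 500 = 400
```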
The script I used is HERE.
8. Baseline Correction
The next step in analysis is the baseline correction. Again, I am not discussing the theory here, but essentially we take the average voltage before the onset of the stimulus and subtract it from the waveform for every channel and trial to "zero" the data to a common baseline.
baselineWindow = [-0.2 0];
EEG = doBaseline(EEG,baselineWindow);
As before, there is not much to see here but if you plotted individual trials before and after the baseline correction you would see that the mean of the baseline post correction is zero.
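If you are curious what the correction amounts to, here is a minimal sketch of the idea (an illustration only, not the actual doBaseline implementation):

```matlab
% sample indices of the baseline window (EEGLAB stores EEG.times in ms)
baselineIdx = EEG.times >= -200 & EEG.times <= 0;
% mean baseline voltage, computed separately for every channel and trial
baselineMean = mean(EEG.data(:,baselineIdx,:),2);
% subtract the baseline mean from every time point
EEG.data = EEG.data - repmat(baselineMean,[1 size(EEG.data,2) 1]);
```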
The script I used to do this is HERE.
9. Make Event-Related Potentials
Well, finally the "end". Now, we just average the trials together for the oddball and control circles.
% make ERPS
ERP = doERP(EEG,epochMarkers);
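Conceptually, all that happens here is that the epoched trials are averaged together within each condition; for a single condition the operation is just a mean across the third (trial) dimension (a sketch of the idea, not the actual doERP code):

```matlab
% average channels x time x trials down to a channels x time ERP waveform
conditionERP = mean(EEG.data,3);
```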
It is probably worth plotting the result so you can see the ERP. To do this, we need to identify a channel where the P300 is maximal - it is 52, Pz. I know this from experience but there is another way to figure this out which I will show you in the next steps. For now, let's plot the ERP.
% let's plot the P300, we will use channel Pz
erpChannel = 52; % Pz
plot(ERP.times,ERP.data(erpChannel,:,1));
hold on;
plot(ERP.times,ERP.data(erpChannel,:,2));
You will note that the individual ERP is quite noisy; this is to be expected. But the P300 is clearly visible.
The next thing we should do is save the ERP variable that is created for when we want to do group analysis.
save('Subject 1','ERP')
Now, you should repeat these steps and process all 10 subjects in the tutorial data set. Note, you will want to change the name of the file you save each time.
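Rather than re-running every step by hand for each subject, you can wrap the whole pipeline in a loop. The sketch below assumes the files are named Oddball1.vhdr through Oddball10.vhdr (only the first file name is confirmed above, so adjust the pattern to match your data):

```matlab
for s = 1:10
    fileName = ['Oddball' num2str(s) '.vhdr'];
    EEG = doLoadBVData(pathName,fileName);
    originalChannels = EEG.chanlocs;           % keep the full montage for interpolation
    EEG = doFilter(EEG,filterParameters);
    EEG = doCleanData(EEG);
    EEG = doICAReference(EEG);
    EEG = doICA(EEG);
    EEG = doRemoveOcularICAComponents(EEG);
    EEG = doInterpolate(EEG,originalChannels);
    EEG = doEpochData(EEG,epochMarkers,epochTimes);
    EEG = doBaseline(EEG,baselineWindow);
    ERP = doERP(EEG,epochMarkers);
    save(['Subject ' num2str(s)],'ERP');       % Subject 1.mat, Subject 2.mat, ...
end
```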
The script I used is HERE.
10. Group Level Analysis
At this point in time, you should have 10 files, 1 for each subject, containing the variable ERP. If you have not done so, do so now. You will note that the individual ERPs were quite variable. That is to be expected and is normal.
Now, we need to use a loop to load the 10 subject files and put the data into a variable called erpData.
for i = 1:10
    fileName = ['Subject ' num2str(i) '.mat'];
    load(fileName);
    erpData(:,:,:,i) = ERP.data;
end
When this is done, check the size of erpData. You should see that it is 63 (channels) x 400 (time) x 2 (conditions: oddball, control) x 10 (subjects).
size(erpData)
Now, we want to average across participants to get a "grand average", we will plot this as well.
grandERP = mean(erpData,4);
erpChannel = 52; % Pz
subplot(1,3,1);
plot(ERP.times,grandERP(erpChannel,:,1));
hold on;
plot(ERP.times,grandERP(erpChannel,:,2));
Next, we will create a difference waveform for each person. We do this for statistical analysis. We will also plot the difference waveform.
differenceERPs = squeeze(erpData(:,:,1,:) - erpData(:,:,2,:));
Note the squeeze command just removes the singleton dimension. Let's average across participants and plot the difference wave.
grandDifference = mean(differenceERPs,3);
subplot(1,3,2);
plot(ERP.times,grandDifference(erpChannel,:));
Now, for statistical analysis we need to know the peak time. This is simply the maximal value of the grand average difference waveform.
[peakValue, peakPoint] = max(grandDifference(erpChannel,:))
Note that peakPoint is not a time, but the index position of the maximal value.
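If you want the peak latency in milliseconds rather than an index, just use peakPoint to index into the ERP time vector:

```matlab
peakTime = ERP.times(peakPoint);   % latency of the P300 peak in ms
```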
For statistical analysis, we will simply take the mean voltage +/- 20 data points around peakPoint for each person. We will average these data into a single value for each person.
peakData = squeeze(differenceERPs(erpChannel,peakPoint-20:peakPoint+20,:));
peakData = mean(peakData,1);
We can statistically test the existence of the P300 by using a single-sample t-test against zero. The logic is simple: because we created difference waveforms, if there were no difference between the oddball and control P300 peaks, the t-test would not be significant.
[h,p,ci,stats] = ttest(peakData);
You will note that p is less than 0.05, so there is a statistically reliable P300 present.
Finally, we can plot the topography of the peak to provide further justification that this is a P300.
subplot(1,3,3);
doPlot2DTopo(grandDifference(:,peakPoint),ERP.chanlocs);
For a paper, you could write this up as follows.
An analysis of the ERP data revealed a P300 ERP component (t(9) = 4.69, p = 0.0011). In the left panel of Figure 1, we see the grand average conditional ERP waveforms for channel Pz. In the middle panel, we see the grand average difference waveform. In the right panel of Figure 1, we see the P300 topography, which confirms the existence of the component and the channel used for statistical quantification of the P300.
The script I used to do the group analysis is HERE.