Wavelet Analysis

This tutorial is designed to outline the steps in MATLAB to run a Wavelet analysis.

First, you need to download and unzip the sample data and the channel locations file that is HERE . You will also need to put my GitHub repos for MATLAB-EEG-preProcessing, MATLAB-EEG-postProcessing, MATLAB-EEG-icaTools, MATLAB-EEG-timeFrequencyAnalysis and MATLAB-EEG-fileIO into your MATLAB path. Finally, you will need to install EEGLAB which is available HERE.

This tutorial assumes that you have all of the aforementioned bits and pieces installed and they are all in the MATLAB path. Put the tutorial data in a folder and use MATLAB to navigate to that folder.

IMPORTANT. This tutorial is not designed to teach you what these steps mean, it is simply designed to teach you how to process and analyze EEG data using a Wavelet analysis. It is also worth noting that most of the steps here are redundant with the ERP analysis tutorial. To help highlight the differences, I have highlighted the title in BOLD where there are differences.

1. Loading Data

The tutorial data consists of 10 files of EEG data from an oddball ERP task. Because the data is in Brain Vision Format, there are three files for each person. The .vhdr files is a header file that contains information about the recording. You can open it with a text editor to see the contents if you wish. The .vmrk file is the marker file that specifies when the events of interest (e.g., fixation, oddball stimulus, control stimulus, etc, occurred). You can also open this file with a text editor to see the contents. Finally, there is the .eeg file which has the EEG data in binary format. Occasionally, the EEG file will have a .dat ending but it will still work.

To load the first EEG data file, type the following commands into the EEG command window or put them in a script. Make sure you change the pathName to match the path to your own data.

pathName = '/Users/krigolson/Desktop/tutorialData';
fileName = 'Oddball1.vhdr';
EEG = doLoadBVData(pathName,fileName);

This will load the EEG data into MATLAB into a variable called EEG. If this does not happen, something is wrong with your path installs / path configurations.

The EEG variable contains a bunch of fields about the data. For instance, we can see that this data has 63 channels and 108775 data points.

More on EEGLAB data structures HERE.
The MATLAB script I used to do this is HERE.

2. Filtering Data

Now, we will filter the data. Before we filter the data, we need to define filter parameters. We will set a low cutoff of 0.5 Hz and a high cutoff of 30 Hz for a bandpass filter. We will also set a notch filter at 60 Hz. Note, that in a Wavelet analysis you may want to raise the high pass cutoff if you are interested in gamma power, oscillations between 31 and 100 Hz.

filterParameters.low = 0.5;
filterParameters.high = 30;
filterParameters.notch = 60;

To filter the data, we simply use the following command:

EEG = doFilter(EEG,filterParameters);

Your screen display in MATLAB should look something like this:

The display shows the filters that were implemented. If you want to learn more about digital filtering I strongly recommend you read Chapter 7 and watch this video HERE just for starters.

If you want a demo on what filtering actually does to EEG data, run this code HERE.

The script I have used to far is HERE.

3. Data Cleaning

The next step involves some fairly sophisticated data cleaning using the EEGLAB clean_line and clean_rawdata functions. Again, a discussion of these tools is beyond the scope of this tutorial. We have hidden these functions in our doCleanData function. The default settings were recommended by Arnaud Delorme, I would consider changing these only with significant thought and discussion. You would have to edit the doCleanData function to modify these parameters.

clean_rawdata will be installed automatically when you open EEGLAB. clean_line will not. Download clean_line HERE and put it in your EEGLAB plugins directory. Add it to your MATLAB path or just simply run EEGLAB and it will be installed. Note, we have modified the clean_line function to avoid pop up windows, so, the one on GitHub will not run as smoothly.

To run doCleanData simply use the following command:

EEG = doCleanData(EEG);

You will see a bunch of output, but we have created a summary which is in the variable EEG.quality.cleanraw.

This output tells you the percent of data removed, how many channels were removed, and how many Boundary Events were created - these are markers for the discontinuities in the data created by clean_rawdata.

The script I have used this far is HERE.

4. Reference Data

Before running ICA, it is recommended to re-reference the EEG data to an average reference. This is easy to do with the following command:

EEG = doICAReference(EEG);

The script I have used to get this far is HERE.

5. Independent Component Analysis

Next, we use independent component analysis (more HERE) to remove ocular artifacts (blinks, saccades) from the EEG data. ICA is quite complex, but the simple version is if you examined the data before and after ICA, you should see that the ocular components are removed.

Make sure you have the Github toolbox MATLAB-EEG-icaTools installed and synced.

You also need to install the Picard extension for EEGLAB. To do this, run EEGLAB by typing eeglab at the MATLAB command line. Then, using the GUI go File --> Maanage EEGLAB Extensions. In the popup window, type "picard" in the search box and install this extension. You should only have to do this one time.

To run ICA is simple, and it requires two commands:

EEG = doICA(EEG);
EEG = doRemoveOcularICAComponents(EEG);

Afterwards, you will see a whole bunch of messages in the display window. It is important to inspect the EEG.quality.ica variable afterwards. You want to see that some ocular components were removed - in this case, 2. You also want to check that the average brain percentage is above 30%.

The script I have used this far is HERE.

6. Interpolate Channels

The data cleaning step typically results in the removal of channels. We need to put them back in the data set, so we have our full channel array. However, we do not want to do this before ICA because interpolation increases the correlation between channels (which is bad for ICA).

Now, we actually have to go back to the start because we need to keep our original channel list. So, right after we load the data (step 1) we would add a line to store the original channels:

originalChannels = EEG.chanlocs;

And then after we run ICA, we would interpolate the missing channels by using this command:

EEG = doInterpolate(EEG,originalChannels);

I will not post a screen shot here, but with this data set you should see that you have 61 channels before interpolation and you should be back to 63 channels after interpolation.

The script I have used to get to this step is HERE.

7. Epoch Data

The next step in analysis is to epoch the data - extract segments of data around the event markers for the oddball task. In the oddball task, we had two primary events - the appearance of the control and oddball circles. In this data, the oddball marker is S202 and the control marker is S203. To extract the epochs of data, we first define the markers and a time window - we will use 200 ms before the event to 600 ms after, and then call the doEpoch function.

epochMarkers = {'S202','S203'};
epochTimes = [-200 600];
EEG = doEpochData(EEG,epochMarkers,epochTimes);

You will not see anything exciting happen, but if you look in the EEG variable you will see that we now have three dimensions for EEG.data, channels x time x events. You should see 63 x 400 x 106. Note, we have less events than the expected total due to artifacts in the data.

The script I used is HERE.

8. Baseline Correction

The next step in analysis is the baseline correction. Again, I am not discussing the theory here, but essentially we take the average voltage before the onset of the stimulus and subtract it from the waveform for every channel and trial to "zero" the data to a common baseline.

baselineWindow = [-200 0];
EEG = doBaseline(EEG,baselineWindow);

As before, there is not much to see here but if you plotted individual trials before and after the baseline correction you would see that the mean of the baseline post correction is zero.

The script I used to do this is HERE.

9. Run the Wavelet Analysis

Now, we can run the Wavelet analysis on the data. The input is more complicated and is explained in the comment.

% run the Wavelet analysis. The time specified is the baseline window in
% ms. The next three numbers are the minimum frequency, the maximum
% frequency, and the number of frequency steps. I used 30 so the resolution
% will be 1 Hz. The last number if the mortlet parameter which shapes the
% wavelet. the recommended number is 6.
WAV = doWAV(EEG,epochMarkers,[-200 0],1,30,30,6);

This will generate an output variable called WAV. The wavelet output data is in WAV.data. This is the average power for all trials in both of our experimental conditions. The size of the matrix is 63 x 30 x 500 x 2. 63 is the number of channels of course. 30 represents 30 frequency bins. 500 represents the number of timepoints (500 as this data is sampled at 500 Hz). Finally, the 2 is for conditions, we have two - oddball and control.

% plot the Wavelets for both conditions side by size for channel 52
condition1WaveletData = squeeze(WAV.data(52,:,:,1));
condition2WaveletData = squeeze(WAV.data(52,:,:,2));
% now the plotting is more complex as we are working with an image with 3D
% data
subplot(1,2,1);
imagesc(condition1WaveletData);
title('Channel Pz: Condition One');
xlabel('Samples');
ylabel('Frequency (Hz)');
set(gca,'YDir','normal');
subplot(1,2,2);
imagesc(condition2WaveletData);
title('Channel Pz: Condition Two');
xlabel('Samples');
set(gca,'YDir','normal');
ylabel('Frequency (Hz)');

The next thing we should do is save the WAV variable that is created for when we want to do group analysis.

save('Subject Wavelet 1','WAV');

Now, you should repeat these steps and process all 10 subjects in the tutorial data set. Note, you will want to change the name of the file you save each time.

The script I used is HERE.

10. Group Level Analysis

At this point in time, you should have 10 files, 1 for each subject, containing the variable WAV. If you have not done so, do so now. You will note that the individual WAVs were quite variable. That is to be expected and is normal.

Now, we need to use a loop to load the 10 subject files and put the data into a variable called wavData.

% clear the memory
clear;
% close all open files and windows
close all;
% clear the command console
clc;
% loop through and load the participants
for i = 1:10
% specify a file name
fileName = ['Subject Wavelet ' num2str(i) '.mat'];
% load the file
load(fileName);
% put the data into a variable called wavData
wavData(:,:,:,:,i) = WAV.data;
end

When this is done, check the size of wavData. You should see that it is 63 (channels) x 30 (frequency bins) x 500 (data points) x 2 (conditions: oddball, control) x 10 (subjects).

size(wavData)

Wavelets can have edge artifacts where the wavelet itself doesn't handle the beginning and end of the data very well. So, we trim this data away. It is important when doing a wavelet analysis to remember to analyze more data than you need so you have data to trim.

% now with wavelets to avoid edge artifacts we are going to trim the first
% 100 ms and the last 100 ms
wavData(:,:,1:50,:,:) = [];
wavData(:,:,end-49:end,:,:) = [];

Now, we want to average across participants to get a "grand average", we will plot this as well.

grandWAV = mean(wavData,5);

Now, let's plot the data.

% plot the Wavelets for both conditions side by size for channel 52
condition1WaveletData = squeeze(grandWAV(52,:,:,1));
condition2WaveletData = squeeze(grandWAV(52,:,:,2));
% now the plotting is more complex as we are working with an image with 3D
% data
subplot(1,2,1);
imagesc(condition1WaveletData);
title('Channel Pz: Condition One');
xlabel('Samples');
ylabel('Frequency (Hz)');
set(gca,'YDir','normal');
subplot(1,2,2);
imagesc(condition2WaveletData);
title('Channel Pz: Condition Two');
xlabel('Samples');
set(gca,'YDir','normal');
ylabel('Frequency (Hz)');

You will see clearly that there is a difference between the two conditions in when you compare the two wavelet images (see below).

But how do we quantify this? Analyzing wavelet data statistically is quite hard to do. A simple method is to t-test every data point, apply a Bonferroni correction, and only zero any not significant voxels.

We will do this on the difference scores and use a specific function we have built for this.

So first, let's compute the difference scores for Pz.

% compute the difference scores
differenceData = squeeze(wavData(52,:,:,1,:) - wavData(52,:,:,2,:));

Then get a statistical map of the results.

% and generate a statistical map, you can just alpha using the Bonferroni
% rules.
alpha = 0.05;
statMap = doWAVStats(differenceData,alpha);

And finally plot the results.

% and lets plot the result
figure;
imagesc(statMap);
title('Channel Pz: Difference');
xlabel('Samples');
ylabel('Frequency (Hz)');
set(gca,'YDir','normal');

You will note there are regions where the data is statistically different, some where Condition One is > Condition Two, and some where Condition Two is greater than Condition One. The statistical map is used by most as the result and replaces writing a table for all 30 x 400 cells! It is worth noting that topography maps are challenging with wavelets but can be done at a specific time point if desired, typically at the maximum of a "hot spot".

For a paper, you could write this up as follows.

An wavelet analysis of the EEG data revealed revealed differences between 2 and 7 Hz between 0 and 500 ms. See Figure One to interpret the results.

The script I used to do the group analysis is HEREwavgroupanalysis.m.

FFT power spectra. Note the difference in delta, theta, and alpha.

The statistical map of the results.