EEG Signals: A Basic Signal Analysis Walkthrough

EEG Signals: A Basic Signal Analysis Walkthrough

A signal is a physical quantity that varies with time, space, or other independent variables and our brain is a source of numerous electrical signals. Inside our brain neurons communicate with each other via electrical impulses. These signals are a fundamental aspect of brain function, underlying everything from basic motor control to complex cognitive process. To visualize these electrical impulses Electroencephalogram (EEG) signals are one of the methods which captures voltage fluctuations resulting from ionic current flows within the neurons of the brain. In short, EEG is a non-invasive electrical signals recorded by electrodes from the scalp of a human and it is analyzed for brain activity.

Among various EEG signal datasets, one of the most famous dataset is the DEAP dataset. We will be going through a simple signal analysis of EEG signals from the first subject of the data. For better understanding what is the DEAP data please go through https://www.eecs.qmul.ac.uk/mmv/datasets/deap/readme.html

We will be using Python language for the signal processing. Specifically the open source python package called 'mne', which is specially made for brain signals exploring. Also 'scipy' is used for applying fundamental algorithms for scientific computing in Python To know more about mne please go through the nice documentation along with tutorials: https://mne.tools/stable/index.html

To start with the analysis we first need to import the dependencies from python library. Here we are using Google Colab for coding.

#install the mne
!pip install mne
import os
import numpy as np
import mne
import matplotlib.pyplot as plt

#import libraries for signal analysis
from scipy import signal
from scipy.signal import welch
from scipy.fft import fft, fftfreq        

The dataset is in a new file format which is 'bdf' format. EEG signals are normally recorded in '.bdf' or '.gdf' or '.mat' format. The MNE library can read all of these formats. We need to put the subject data in a directory then we need to read the data by 'read_raw_bdf' method.

signal_dir = "s01.bdf"
raw = mne.io.read_raw_bdf(signal_dir, preload=True)        

To get the general information of the EEG data raw.info is used.

raw.info        

We get the output as:


General info of the subject1 data

So what are the channels here? EEG is recorded by electrodes placed on your scalp and those individual electrodes refer to the individual channel names. There are few standards to measure the EEG with channels, one of them is the '10-20 system'. Electrodes are placed at specific points identified by?letters which indicate underlying brain regions (F = Frontal, T = Temporal, P = Parietal, O = Occipital) and numbers (odd for the left hemisphere, even for the right, z for midline). For example, Fp1 represents a sensor placed over the left frontal pole.


The 10-20 system
Now we will be doing the following procedures first.

  1. To plot the raw EEG signal
  2. To see the channel names
  3. To define our channel of interests
  4. To plot the defined channel within a timeframe to better understand the signal
  5. Then follow a paper for further processing.

Let us do the work upto point 4 now.

#Plotting the EEG data
raw.plot(title='Raw EEG Data')        


The raw EEG data with all 47 channels
#See the channel names
print(raw.info['ch_names'])        
['Fp1', 'AF3', 'F7', 'F3', 'FC1', 'FC5', 'T7', 'C3', 'CP1', 'CP5', 'P7', 'P3', 'Pz', 'PO3', 'O1', 'Oz', 'O2', 'PO4', 'P4', 'P8', 'CP6', 'CP2', 'C4', 'T8', 'FC6', 'FC2', 'F4', 'F8', 'AF4', 'Fp2', 'Fz', 'Cz', 'EXG1', 'EXG2', 'EXG3', 'EXG4', 'EXG5', 'EXG6', 'EXG7', 'EXG8', 'GSR1', 'GSR2', 'Erg1', 'Erg2', 'Resp', 'Plet', 'Temp', 'Status']        

Now let us see some of the channels that I want to further analyze.

# Define the channels you want to keep
channels_of_interest = ['Fp1', 'AF3', 'F7', 'F3']

# Pick only the channels of interest
raw.pick_channels(channels_of_interest)

raw.plot(title='Selected Frontal Channels')        


Raw EEG signal of selected channels

Here we can see we can not purely visualize any of the channel because of there redundancy and noises. So now we will start analyzing only one channel, say the F7 channel. I followed the review paper titled 'On the Intersection of Signal Processing and Machine Learning: A Use Case-Driven Analysis Approach' by Sulaiman Aburakhia, Member, IEEE, Abdallah Shami, Senior Member, IEEE and George K. Karagiannidis, Fellow, IEEE.

paper: https://arxiv.org/abs/2403.17181

The followings are to be done now:

Select the F7 channel

  • Calculate the duration in seconds
  • Plot the data in a specific time frame
  • Smoothening of the signal
  • Applying Low Pass Filter
  • Interpolate the data (optional)
  • Plotting the Power Spectral Density(PSD) with FFT
  • Plotting the Spectrogram after STFT

# Select only the 'F7' channel
raw_f7 = raw.copy().pick_channels(['F7'])

# Verify the selected channel in the new Raw object
print(raw_f7.info['ch_names'])

# Plot the raw data of the selected channel
#raw_f7.plot(title='Channel F7')

start_time = 0  # Start time in seconds
end_time = 10   # End time in seconds

# Plot the data for the selected time window
raw_f7.plot(start=start_time, duration=end_time - start_time, title='F7 Channel')        


F7 Channel (10 seconds)
Signal Smoothing

The signal is suffering from too much abruptions. There could be abrupt amplitude variations in the collected signal between samples. The performance of an application at the application level may suffer from such erratic amplitude variations. Additionally, the signal is vulnerable to outlier samples during measurements due to outside variables such instrument malfunction. Smoothing serves as an approximation filter that operates at a given rate on N successive samples duration and results N averaged samples. Typical smoothing techniques make use of the widely recognized moving average (MA) filter along with the Savitzky-Golay (SG) filter.

Moving Average Filter: Moving Average filter offers a simple method to smooth the signal by averaging neighboring data points while preserving the signal’s shape. It operates by averaging a set of signal data points (window ) and sliding this window across the signal to smooth out short term fluctuations. To know more please follow: https://codemonk.io/blog/moving-average-filter/

start_time = 1  # Start time in seconds
end_time = 60   # End time in seconds

# Extract the data for the time window
data, times = raw_f7[:, int(start_time * raw_f7.info['sfreq']):int(end_time * raw_f7.info['sfreq'])]

# Define the moving average filter function
def moving_average(data, window_size):
    cumsum = np.cumsum(data)
    cumsum[window_size:] = cumsum[window_size:] - cumsum[:-window_size]
    return cumsum[window_size - 1:] / window_size

# Apply the moving average filter
window_size = 50  # Adjust window size as needed
smoothed_data = moving_average(data[0], window_size)

# Adjust the times for the smoothed data
smoothed_times = times[window_size - 1:]  # Align time array with smoothed data

# Plot the original and smoothed signals
plt.figure(figsize=(12, 6))

# Plot original data
plt.subplot(2, 1, 1)
plt.plot(times, data[0], label='Original Signal', color='blue')
plt.title('Original F7 Signal (1 to 60 seconds)')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.legend()

# Plot smoothed data
plt.subplot(2, 1, 2)
plt.plot(smoothed_times, smoothed_data, label='Smoothed Signal', color='red')
plt.title('Smoothed F7 Signal (1 to 60 seconds)')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.legend()

plt.tight_layout()
plt.show()        


From the above image you can clearly depict that the original signal which was filled with noise and artifacts have been filtered and smoothened after applying the MA filter, which is basically a Finite Impulse Response (FIR) filter.

Signal Frequency Filtering

As a vital instrument for signal manipulation or analysis, frequency filters attenuate certain frequency components while permitting others to flow through. These filters are necessary for many different uses. The particular criteria of the application, such as the required cut-off frequencies, the transition band's steepness, and the allowable amounts of ripple in the passband and stopband, serve as guidelines for the design and implementation of frequency filters. Main categories of frequency filters include:

Low-Pass Filters: These filters allow signals with a frequency lower than a certain cutoff frequency to pass through while attenuating signals with frequencies higher than the cutoff. Low-pass filters are widely used in signal smoothing and in eliminating high-frequency noise.

High-Pass Filters: High-pass filters perform the opposite function of low-pass filters. They allow signals with frequencies higher than a certain cutoff frequency to pass and attenuate signals with lower frequencies. HPFs are commonly used in applications requiring the removal of low-frequency noise or drift from a signal.

Band-Pass Filters: Band-pass filters allow signals within a specific frequency range (band) to pass through while attenuating frequencies outside this range. These filters are used in applications where a particular frequency band is of interest, such as in certain types of signal analysis and telecommunications.

Band-Stop Filters or Notch Filters: Band-stop filters, also known as notch filters or band-reject filters, attenuate signals within a specific frequency range while allowing frequencies outside this range to pass through. These filters are useful in rejecting interference or noise at specific frequencies without affecting the rest of the signal spectrum.

Let us design a 4th order Butterworth LPF for the F7 channel for the whole timeframe along with the plotting of PSD and Frequency Spectrums.

Power spectral density (PSD) describes how the power of a signal is distributed across different frequencies, providing a measure of the power per unit frequency. This is achieved through the normalization of the power spectrum by frequency resolution (frequency bin width), converting the power spectrum into a power spectral density that shows the power distribution per unit frequency.
The fast Fourier transform (FFT) and its inverse, IFFT, are computational algorithms that efficiently implement the DFT and its inverse, IDFT. The key advantage of FFT and IFFT is their high computational efficiency. NumPy and SciPy Python libraries offer reliable functions for FFT and IFFT computation. When computing the FFT of a signal, two critical parameters need to be considered: the sampling rate, fs, and the FFT length, n. The values of these parameters determine the frequency range and frequency resolution of the resulting spectrum.
data, times = raw_f7[:]  # Get all the data

# Define the moving average filter function
def moving_average(data, window_size):
    cumsum = np.cumsum(data)
    cumsum[window_size:] = cumsum[window_size:] - cumsum[:-window_size]
    return cumsum[window_size - 1:] / window_size

# Apply the moving average filter
window_size = 50
smoothed_data = moving_average(data[0], window_size)

# Apply a 4th-order Butterworth low-pass filter
fs = raw.info['sfreq']  # Sampling frequency
cutoff = 50  # Cutoff frequency in Hz
N = 4  # Filter order
b, a = butter(N, cutoff, fs=fs, btype='low', analog=False)
filtered_data = filtfilt(b, a, smoothed_data)

# Function to compute FFT
def apply_fft(x, fs):
    N = len(x)
    T = 1.0 / fs
    fft_coef = np.fft.fft(x - np.mean(x))
    xf = np.fft.fftfreq(N, T)[:N // 2]
    fft_positive = 2.0 / N * np.abs(fft_coef[:N // 2])
    return xf, fft_positive

# Function to compute PSD
def compute_psd(signal, fs):
    seg_length = len(signal) // 10
    overlap = seg_length // 20
    nfft_length = max(2**14, seg_length)
    frequencies, psd = welch(signal, fs=fs, window='hann', nperseg=seg_length, noverlap=overlap, nfft=nfft_length)
    return frequencies, psd

# Compute PSD and FFT for raw and filtered signals
f_psd_raw, psd_raw = compute_psd(smoothed_data, fs)
f_psd_filtered, psd_filtered = compute_psd(filtered_data, fs)
f_fft_raw, fft_raw = apply_fft(smoothed_data, fs)
f_fft_filtered, fft_filtered = apply_fft(filtered_data, fs)

# Plotting
plt.figure(figsize=(20, 15))

# Time-domain waveform
plt.subplot(3, 1, 1)
plt.plot(times[:len(smoothed_data)], smoothed_data, label='Smoothed Signal')
plt.plot(times[:len(filtered_data)], filtered_data, label='Filtered Signal', linestyle='--')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Time-Domain Waveform')
plt.legend()
plt.grid(True)

# PSD
plt.subplot(3, 1, 2)
plt.semilogy(f_psd_raw, psd_raw, label='Smoothed Signal')
plt.semilogy(f_psd_filtered, psd_filtered, label='Filtered Signal')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Power Spectral Density (dB/Hz)')
plt.title('Power Spectral Density')
plt.legend()
plt.grid(True)

# Frequency spectrum
plt.subplot(3, 1, 3)
plt.plot(f_fft_raw, fft_raw, label='Smoothed Signal')
plt.plot(f_fft_filtered, fft_filtered, label='Filtered Signal')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.title('Frequency Spectrum')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()        


From the PSD plot it is evident that during the time range of 390 to 410 seconds there is a high peak amplitude also a peak in downwards is available after say 630 seconds. We need to remove this peak keeping a stable signal, which can be done by interpolation. You will be experiencing the difference after providing the interpolation.

Interpolation is the process of reconstructing a CT signal x(t)x(t) from its samples x[n]=x(nTs)x[n]=x(nTs).
from scipy.interpolate import interp1d

# Define the time ranges to be removed (in seconds)
remove_intervals = [(390, 410), (630, 670)]

# Convert these times to sample indices
remove_indices = []
for start, end in remove_intervals:
    start_index = int(start * fs)
    end_index = int(end * fs)
    remove_indices.append((start_index, end_index))

# Mask the intervals to be removed
masked_data = smoothed_data.copy()
for start_index, end_index in remove_indices:
    masked_data[start_index:end_index] = np.nan

# Interpolate over the NaN values
non_nan_indices = ~np.isnan(masked_data)
nan_indices = np.isnan(masked_data)

interpolator = interp1d(np.where(non_nan_indices)[0], masked_data[non_nan_indices], kind='linear', fill_value='extrapolate')
interpolated_data = interpolator(np.arange(len(masked_data)))

# Plot the data
plt.figure(figsize=(20, 10))

# Plot original smoothed signal
plt.plot(times[:len(smoothed_data)], smoothed_data, label='Original Smoothed Signal', alpha=0.5)

# Plot interpolated signal
plt.plot(times[:len(interpolated_data)], interpolated_data, label='Interpolated Signal', linestyle='--')

plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Signal with Interpolated Sections')
plt.legend()
plt.grid(True)
plt.show()        


I want to show you the filtering performance for the first 100 seconds of the F7 channel along with the before and after part. You will be clearly visualize that the signal got filtered with a 4th order Butterworth filter. Point to note, the more order of n the more filtered data you should get.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, filtfilt, welch

# Parameters
segment_duration = 100  # Duration of the segment in seconds
fs = raw.info['sfreq']  # Sampling frequency

# Define the number of samples for 100 seconds
segment_samples = int(segment_duration * fs)

# Extract the first 100 seconds of data from F7 channel
data, times = raw_f7[:]
data_segment = data[0, :segment_samples]
times_segment = times[:segment_samples]

# Apply a 4th-order Butterworth low-pass filter
cutoff = 50  # Cutoff frequency in Hz
N = 4  # Filter order
b, a = butter(N, cutoff, fs=fs, btype='low', analog=False)
filtered_segment = filtfilt(b, a, data_segment)

# Function to compute FFT
def apply_fft(x, fs):
    N = len(x)
    T = 1.0 / fs
    fft_coef = np.fft.fft(x - np.mean(x))
    xf = np.fft.fftfreq(N, T)[:N // 2]
    fft_positive = 2.0 / N * np.abs(fft_coef[:N // 2])
    return xf, fft_positive

# Function to compute PSD
def compute_psd(signal, fs):
    seg_length = len(signal) // 10
    nfft_length = max(2**14, seg_length)  # Ensure nfft is at least as large as seg_length
    frequencies, psd = welch(signal, fs=fs, window='hann', nperseg=seg_length, nfft=nfft_length)
    return frequencies, psd

# Compute PSD and FFT for raw and filtered segments
f_psd_raw, psd_raw = compute_psd(data_segment, fs)
f_psd_filtered, psd_filtered = compute_psd(filtered_segment, fs)
f_fft_raw, fft_raw = apply_fft(data_segment, fs)
f_fft_filtered, fft_filtered = apply_fft(filtered_segment, fs)

# Plotting
plt.figure(figsize=(20, 15))

# Time-domain waveform
plt.subplot(3, 1, 1)
plt.plot(times_segment, data_segment, label='Raw Signal')
plt.plot(times_segment, filtered_segment, label='Filtered Signal', linestyle='--')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Time-Domain Waveform (First 100 Seconds)')
plt.legend()
plt.grid(True)

# PSD
plt.subplot(3, 1, 2)
plt.semilogy(f_psd_raw, psd_raw, label='Raw Signal')
plt.semilogy(f_psd_filtered, psd_filtered, label='Filtered Signal')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Power Spectral Density (dB/Hz)')
plt.title('Power Spectral Density (First 100 Seconds)')
plt.legend()
plt.grid(True)

# Frequency spectrum
plt.subplot(3, 1, 3)
plt.plot(f_fft_raw, fft_raw, label='Raw Signal')
plt.plot(f_fft_filtered, fft_filtered, label='Filtered Signal')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.title('Frequency Spectrum (First 100 Seconds)')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()        


Spectrograms using STFT

The short-time Fourier transform (STFT) bridges the gap between time-domain and frequency-domain analysis, making it particularly useful for analyzing nonstationary signals whose spectral characteristics evolve over time. Unlike FT, which spans the signal’s entire duration for frequency analysis, STFT analyzes the signal over finite, overlapping time windows, thereby preserving temporal information.

The computation of the STFT involves dividing the time-domain signal into segments of equal time length, which is achieved by multiplying the signal with a sliding window function. The FFT is then computed for each segment, providing frequency bins and corresponding Fourier coefficients for each segment.

So what can we find from a spectrogram?

The spectrogram displays how the frequency content of the signal evolves over time. ach point on the spectrogram represents the amplitude (or power) of a specific frequency at a specific time. By observing the spectrogram, you can identify which frequencies are dominant at various times.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, filtfilt, stft

# Parameters
fs = raw.info['sfreq']  # Sampling frequency of the EEG data

# Extract the F7 channel data
f7_data = raw.copy().pick_channels(['F7']).get_data()[0]

# Smoothing Parameters
N = 4  # Filter order for smoothing
lowcut = 1.0  # Low cut-off frequency in Hz for smoothing
highcut = 50.0  # High cut-off frequency in Hz for smoothing

# Design the Butterworth bandpass filter for smoothing
b, a = butter(N, [lowcut, highcut], fs=fs, btype='band')
smoothed_f7 = filtfilt(b, a, f7_data)

# Low-Pass Butterworth Filter Parameters
lpf_order = 4  # Filter order
lpf_cutoff = 30.0  # Cut-off frequency in Hz for low-pass filter

# Design the Butterworth low-pass filter
b_lpf, a_lpf = butter(lpf_order, lpf_cutoff, fs=fs, btype='low')
filtered_f7 = filtfilt(b_lpf, a_lpf, smoothed_f7)

# Define segment duration and extract the first 100 seconds
segment_duration = 100  # Duration in seconds
segment_samples = int(segment_duration * fs)
data_segment = filtered_f7[:segment_samples]

# STFT Parameters
seg_len = len(data_segment) // 10  # Segment length
overlap = 0.5 * seg_len  # 50% overlap
fft_length = max(seg_len, 2048)
win = 'hamming'  # Window function

# Compute STFT
f_stft, times, Zxx = stft(data_segment, fs=fs, window=win, nperseg=seg_len, noverlap=overlap, nfft=fft_length)

# Plotting the STFT Spectrogram
plt.figure(figsize=(20, 15))
plt.pcolormesh(times, f_stft, np.abs(Zxx), shading='gouraud')
plt.title('STFT Spectrogram of Filtered F7 Signal', fontsize=40)
plt.ylabel('Frequency (Hz)', fontsize=40)
plt.xlabel('Time (s)', fontsize=40)
cb = plt.colorbar()
cb.set_label(label='Coefficient magnitude', size=40)
cb.ax.tick_params(labelsize=40)
plt.tick_params(axis='both', which='major', labelsize=40)
plt.tick_params(axis='both', which='minor', labelsize=40)
plt.show()        


Therefore, we have came to the end of a basic signal processing using the DEAP dataset that is the EEG signals. Many of the terms, such as, FFT, STFT, PSD, Filters etc. should be kept in the readers mind before applying the signal processing techniques.

good informative.

Faahimah Bint Fakhrul

--Undergraduate Student|Electrical and Electronic Engineering Major and Aspiring Matlab Performer: Transforming Ideas into solutions.

6 个月

Very Informative..

Opy Das

Research Assistant at UiA | Master's in Renewable Energy | Deep Learning Collaboration in Battery Management System and Renewable Energy | Power System Optimization

6 个月

Great Work. Keep it up.

Taisirul Muktadi

BSc. in Biomedical Engineering | CUET | Entrepreneur | Data & ML Enthusiast

6 个月

Great advice

要查看或添加评论,请登录

Tanzeem Tahmeed Reza的更多文章

社区洞察

其他会员也浏览了