PEPPER in a nutshell

Contents

  1. Introduction
  2. Pipeline Overview
    1. UML Diagrams
    2. Features
    3. Roadmap
  3. Development Guidelines
    1. Identify issues or enhancements
    2. Contribute to the code
  4. Containers

Introduction

Welcome to the PEPPER-Pipeline project! To start collaborating right away, see the development guidelines. We welcome contributors from all backgrounds!

The development of the PEPPER-Pipeline is focused on optimizing an automated, flexible, and easy-to-use pipeline dedicated to EEG preprocessing.

Following the optimization of import and preprocessing tools, development will focus on building out a common core of EEG processing tools to handle ERP, time-frequency, and source-based analyses. To find out more, see the pipeline overview.

Pipeline Overview

The following section details the planning, motivation, and features behind the PEPPER pipeline.

UML Diagrams

UML Diagram for Outer-loop

UML diagram for run, which references run:preprocess

UML Diagram for Preprocess

UML diagram for run:preprocess

The UML diagrams above detail the discrete pipeline steps defined by the default input_config.json file:

  1. load:data (pipeline input)

    A subset of the raw data, as described in load_data, is extracted.

  2. run:preprocess

    The main script calls a series of functions, each one executing a step of the pipeline. All follow the same standard format: each feature receives an EEG object and unpacked variables from the params dictionary in the main script.

    Additionally, each pipeline step returns an EEG object and a dictionary describing the changes made to that EEG object (a minimal sketch of this format follows the list below).

    Motivation behind each pipeline step is described in features.

  3. write:output

    At the very last step of the pipeline, each step's output is passed to a write module that merges the accumulated outputs into a comprehensive file.
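
A minimal sketch of this standard step format, assuming MNE-Python objects (the function name, parameters, and output fields here are hypothetical, not the pipeline's actual API):

    import mne

    # Minimal sketch: each step takes an EEG object plus unpacked params and
    # returns the EEG object together with a record of what changed.
    def filter_step(eeg_obj: mne.io.Raw, l_freq: float, h_freq: float):
        eeg_obj = eeg_obj.copy().filter(l_freq=l_freq, h_freq=h_freq)
        output = {"filter": {"highpass_cutoff": l_freq, "lowpass_cutoff": h_freq}}
        return eeg_obj, output

    # The main script would call each step roughly like this:
    # eeg_obj, step_output = filter_step(eeg_obj, **params["filter"])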

Together, the contents of input_config.json and output_preproc_sub-A_ses-B_task-C_run-D_eeg.json define all of the details necessary to describe the manipulations and outputs of the pre-processing pipeline (for example, in the methods and results sections of a journal publication).
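
As a rough illustration only (the field names below are hypothetical placeholders, not the actual schema), input_config.json might contain entries along these lines:

    {
      "load_data": { "subjects": ["sub-A"], "tasks": ["task-C"] },
      "preprocess": {
        "filter": { "high pass": 1.0, "low pass": 30.0 },
        "segment": { "tmin": -0.2, "tmax": 0.5 }
      },
      "output_data": { "root": "derivatives/" }
    }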

A long-term goal is to automate the writing of these journal article sections via a script that takes these two files as inputs.

Features

1-Filter

  • High pass filter the data using MNE functions
  • Read the “high pass” and “low pass” fields from the user_params.json file to define the filter parameters (see the sketch below)
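
A hedged sketch of this step with MNE-Python; the file path and cutoff values are placeholders:

    import mne

    # Placeholder path and cutoffs; in the pipeline these come from the
    # "high pass" / "low pass" fields of user_params.json.
    raw = mne.io.read_raw_fif("sub-A_task-C_eeg.fif", preload=True)
    raw.filter(l_freq=1.0, h_freq=30.0)  # 1 Hz high pass, 30 Hz low pass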

2-Reject Bad Channels

  • Auto-detect and remove bad channels (those that are “noisy” for a majority of the recording)
  • Write to output file (field “globalBad_chans”) to indicate which channels were detected as bad
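
One way to illustrate this step (a simple variance-based outlier test, not necessarily the detection method the pipeline will use):

    import numpy as np

    # Illustration only, assuming an EEG-only recording and the `raw` object
    # from the filter sketch above; actual detection criteria may differ.
    data = raw.get_data()                  # shape: (n_channels, n_times)
    variances = data.var(axis=1)
    z = (variances - variances.mean()) / variances.std()
    bads = [raw.ch_names[i] for i in np.where(np.abs(z) > 3)[0]]
    raw.info["bads"] = bads                # MNE's bad-channel bookkeeping
    output = {"globalBad_chans": bads}     # field name from the bullet above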

3-Independent Component Analysis

Overview: ICA requires reasonably stationary data, a requirement that raw EEG often violates. One way around this is to first make a copy of the EEG data, use automated methods to detect noisy portions of that copy, and then remove those sections. ICA is then run on the cleaned copy, and the ICA weights it produces are copied back into the original recording. In this way, we do not have to “throw out” sections of noisy data, but we are still able to derive an improved ICA decomposition.

  1. Prepica
    • Make a copy of the EEG recording
    • For the copied data: high pass filter at 1 Hz
    • For the copied data: segment by epoch to “cut” the continuous EEG recording into arbitrary 1-second epochs
    • For the copied data: use automated methods (voltage outlier detection and spectral outlier detection) to detect epochs that are excessively “noisy” for any channel
    • For the copied data: reject (remove) the noisy periods of data
    • Write to the output file which segments were rejected and based on what metrics
  2. ICA
    • Run ICA on the copied data
    • Copy the ICA weights from the copied data back to the pre-copy data
  3. Rejica
    • Use automated methods (TBD) to identify ICA components that reflect artifacts
    • Remove the data corresponding to the identified artifacts
    • Write to the output file (field “icArtifacts”) which ICA components were identified as artifacts
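
A sketch of this copy-clean-fit-transfer workflow in MNE-Python; the thresholds, component count, and excluded component are placeholders, and a simple peak-to-peak rejection stands in for the automated methods described above:

    import mne
    from mne.preprocessing import ICA

    # Prepica: copy, high pass at 1 Hz, cut into 1-second epochs, drop noisy ones.
    raw_copy = raw.copy().filter(l_freq=1.0, h_freq=None)
    epochs = mne.make_fixed_length_epochs(raw_copy, duration=1.0, preload=True)
    epochs.drop_bad(reject=dict(eeg=200e-6))      # placeholder voltage threshold

    # ICA: fit on the cleaned copy, then apply the weights to the original data.
    ica = ICA(n_components=20, random_state=42)   # placeholder component count
    ica.fit(epochs)

    # Rejica: mark artifact components (detection method TBD) and remove them.
    ica.exclude = [0]                             # hypothetical artifact component
    ica.apply(raw)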

4-Segment

  • Segment by epoch to “cut” the continuous data into epochs of data such that the zero point for each epoch is a given marker of interest
  • Write to output file (field “XXX”) which markers were used for epoching, how many epochs of each type were created, and how many milliseconds were retained before/after the markers of interest
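
A minimal MNE-Python sketch, assuming the markers of interest are stored as annotations on `raw`; the epoch window is a placeholder:

    import mne

    # Placeholder window: 200 ms before to 500 ms after each marker of interest.
    events, event_id = mne.events_from_annotations(raw)
    epochs = mne.Epochs(raw, events, event_id=event_id,
                        tmin=-0.2, tmax=0.5, preload=True)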

5-Final Reject Epochs

  • Loop through each channel. For a given channel, loop over all of its epochs and identify those in which the channel exceeds either the voltage threshold or the spectral threshold. If either threshold is exceeded, reject the data for that channel/epoch intersection
  • Write to the output file (field “XXX”) which channel/epoch intersections were rejected
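
A simplified sketch of this channel-by-epoch loop, assuming the `epochs` object from the segmentation sketch above; the voltage threshold is a placeholder and the spectral test is omitted for brevity:

    import numpy as np

    data = epochs.get_data()        # shape: (n_epochs, n_channels, n_times)
    volt_thresh = 100e-6            # placeholder peak-to-peak limit (100 µV)
    rejected = []                   # (epoch, channel) intersections to reject
    for ch in range(data.shape[1]):
        for ep in range(data.shape[0]):
            if data[ep, ch].max() - data[ep, ch].min() > volt_thresh:
                rejected.append((ep, ch))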

6-Interpolate

  • Interpolate missing channels, at the epoch level, using a spherical spline interpolation, as implemented in MNE
  • Interpolate missing channels, at the global level, using a spherical spline interpolation, as implemented in MNE
  • Write to output file (field “XXX”) which channels were interpolated and using what method
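
A sketch using MNE; it assumes a montage with electrode positions has been set and that bad channels are already marked in info["bads"]:

    # Spherical spline interpolation of channels marked bad, as implemented in MNE.
    epochs.interpolate_bads(reset_bads=True)   # epoch-level
    raw.interpolate_bads(reset_bads=True)      # global (continuous data)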

7-Re-reference

  • Re-reference the data to the average of all electrodes (“average reference”) using the MNE function
  • Write to output file (field “XXX”) which data were re-referenced to average
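
In MNE-Python this step is a one-line call, sketched here:

    # Re-reference to the average of all EEG electrodes.
    raw.set_eeg_reference(ref_channels="average")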

Roadmap

Pre-release

  • Minimal, yet complete, pipeline implemented
  • Performs all standard preprocessing steps
  • Validated, but not optimized
  • Containers are stable
  • Testing suite is stable
  • Standards for community-driven contributions established
  • Contributor documentation
  • Initial governance structure and credit assignment standards established

Release 0.1

  • Modules to allow running in parallel on local or remote (HPC) systems, controlled by the same parameters in the input_params.json file (auto-generating Slurm scripts for HPC)
  • Preprint posted to bioRxiv
  • Initial data quality assessment suite is stable
  • Pipeline meets/exceeds at least one commonly used and published pipeline on data quality metrics
  • Initial set of standard input parameters for child, adolescent, and young adult data
  • Updated functions:
    • Filter
    • ICA
  • Standalone import-feature-io template
  • User-end documentation
  • Updated/refined governance structure and credit assignment standards

Release 1.0

  • Optimized for infant, child, adolescent, and young adult data
  • Standard input parameters available
  • Revised data output formats to bring them in line with emerging BIDS-EEG derivatives
  • Expanded data quality assessment suite
  • Functions relying on electrode locations updated to use 3-dimensional coordinates based on an age-appropriate head model
  • Integrated with DataLad
  • Verbose .log output files

Release 2.0

  • Data quality optimization module
  • Module for downloading remote datasets
  • Integrated with DataLad
  • GUI for generating input_params.json file
  • Web-based GUI and automated connection to computational resources
  • Expanded set of norms for standard EEG features

Development Guidelines

Identify Issues or Enhancements

If you believe a new issue needs to be added to the list of open issues, feel free to create one and select the issue template that suits the change.

Once an issue has been created, its original author can immediately assign themselves and start coding or documenting, as described in Contribute to the Code.

Contribute to the Code

To get started on coding, follow the steps below. Note that you must have a GitHub account to collaborate on this project. All quoted commands are executed in your shell.

  1. Fork the repo to your GitHub account by clicking on the “Fork” button on the top right corner of the PEPPER repository.

  2. Clone the repository to your local machine with git clone in your terminal. Be sure to replace [user] below with your own GitHub username.
      git clone https://github.com/[user]/pepper-pipeline.git
      cd pepper-pipeline
    
  3. Build and activate a container using the OS-relevant files (see containers/README.md).

  4. Switch to the branch that you plan to contribute to.
  • If work on this issue has already begun, then fetch and check out the active branch, then create a sub-branch.
    cd pepper-pipeline
    git fetch origin
    git checkout dev-feature-issue
    git checkout -b dev-feature-issue-name
    
  • If development on this issue has not begun, then create a new branch and then create a sub-branch.

    cd pepper-pipeline
    git checkout -b dev-feature-issue
    git checkout -b dev-feature-issue-name 
    
  5. Implement changes (commit often!).

     git add file1 file2
     git commit -m "Attached flux capacitor" 
    
  6. After you complete all your intended commits, push your changes to the branch.

     git push origin dev-feature-issue-name 
    
  7. Create a pull request using the GitHub GUI.

Containers

Please use the Dockerfile and Singularity recipe located in containers/. Directions on installation and usage are located in containers/README.md.