Processing and Analysis of Multichannel Extracellular Neuronal Signals: State-of-the-Art and Challenges

In recent years, multichannel neuronal signal acquisition systems have allowed scientists to address research questions that were otherwise impossible to tackle. They act as a powerful means to study brain (dys)function in in vivo and in vitro animal models. Typically, each session of electrophysiological experiments with a multichannel data acquisition system generates a large amount of raw data. For example, a 128 channel signal acquisition system with 16 bit A/D conversion and a 20 kHz sampling rate will generate approximately 17 GB of data per hour (uncompressed). This poses an important and challenging problem: inferring conclusions from such large amounts of acquired data. Thus, automated signal processing and analysis tools are becoming a key component in neuroscience research, facilitating the extraction of relevant information from neuronal recordings in a reasonable time. The purpose of this review is to introduce the reader to the current state-of-the-art of open-source packages for (semi)automated processing and analysis of multichannel extracellular neuronal signals (i.e., neuronal spikes, local field potentials, electroencephalogram, etc.), and to the existing Neuroinformatics infrastructure for tool and data sharing. The review concludes by pinpointing some major challenges the community is facing, which include the development of novel benchmarking techniques, cloud-based distributed processing and analysis tools, as well as novel means to share and standardize data.


INTRODUCTION
The open question of the structure-function relationship has attracted considerable interest in Systems Neuroscience. Recent works on anatomical substructures of the brain (Briggman and Denk, 2006; Mikula et al., 2012) promise to improve our understanding of neuronal network physiology and to drive the development of novel applications of neurotechnology by interpreting the activities of large neuronal ensembles via extracellular methods (Buzsaki, 2004; Nicolelis and Lebedev, 2009).
On the other hand, neuronal signals recorded by means of neuronal probes require rigorous (pre)processing and analysis. In terms of technological advancement, the extracellular interfacing of neurons with artificial chip-based devices has taken a considerable leap forward, even in comparison with the very popular patch-clamp, EEG, and fMRI techniques (Vassanelli, 2011; Spira and Hai, 2013). In the last two decades, such advances have allowed neuroscientists to record neural activity simultaneously from many neurons, with up to thousands of recording sites in a single neuronal probe and at sampling rates from a few up to hundreds of kilohertz (kHz) (Buzsaki, 2004; Schröder et al., 2015).
The wide variety of electrode dimensions allows different types of neuronal signals to be recorded from the extracellular space. Single-unit activities (action potentials) from single neurons can be sensed by small electrodes in their close proximity (Buzsaki et al., 2012). Such electrodes also pick up multi-unit activities from several simultaneously active neurons near the electrode (Einevoll et al., 2012). With increasing electrode dimensions, local field potentials (LFPs) are sensed from neighboring neuronal populations as the synchronous net activity of several hundred to thousands of neurons (Tsytsarev et al., 2006; Vassanelli, 2011, 2014; Vassanelli et al., 2012; Khodagholy et al., 2015). Therefore, the neurophysiological signals from different brain structures can be measured using a wide range of techniques based on the dimensions of the electrodes (see Figure 1; Sejnowski et al., 2014).
Also, the massive growth in the field of brain imaging techniques has allowed scientists to image brain activity at very different scales, from single ion channels to the whole brain (for a review, see Freeman, 2015).
Recently developed neural probes have allowed neuroscientists to investigate neural processing by monitoring groups of neurons and their activation patterns at unprecedented resolution (Brown et al., 2004; Giocomo, 2015), thus also contributing to bridging the gap between neuronal network activity and behavior (Berenyi et al., 2014). In addition, they have provided deep insights into the pathological basis of brain disorders (Friston et al., 2015). As a drawback, investigation of brain function and pathology can require massive data mining. For example, in an hour, a 128 channel signal acquisition system with 16 bit A/D conversion and a 20 kHz sampling rate will generate approximately 17 GB of uncompressed data (Mahmud et al., 2014). Inferring meaningful conclusions from this massive amount of data is pivotal to the neuroscience and neuroengineering community (Mahmud et al., 2010a, 2012a), and tools for analysis of such multichannel extracellular recordings that support rapid and accurate data interpretation are still missing (Stevenson and Kording, 2011). Though computing power has increased and costs have decreased, processing and analysis of signals remain labor-intensive. This poses a huge challenge to computational neuroscientists: to develop tools for analyzing such complex data that are optimized for both memory management and processing time (Stevenson and Kording, 2011).

FIGURE 1 | Spatiotemporal range of neurophysiological signal acquisition techniques. Spatiotemporal range of the main techniques to measure neurophysiological signals from the brain. EEG, electroencephalography; MEG, magnetoencephalography.
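The 17 GB per hour figure follows from simple arithmetic, as the short script below checks (the exact number depends on whether one counts decimal gigabytes or binary gibibytes; here the binary convention is used):

```python
# Raw data rate of a multichannel acquisition system:
# channels x samples/s x bytes per sample x seconds per hour
channels = 128
fs = 20_000           # 20 kHz sampling rate per channel
bytes_per_sample = 2  # 16 bit A/D conversion
seconds_per_hour = 3600

total_bytes = channels * fs * bytes_per_sample * seconds_per_hour
gib_per_hour = total_bytes / 1024**3  # binary gigabytes (GiB)
print(round(gib_per_hour, 1))  # -> 17.2
```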
Over the years, to make data handling and analysis fast, interactive, and user friendly, several software tools have been developed by individual laboratories, e.g., Mahmud et al. (2012a), but only a negligible number of them have been released to the community. In practice, a large number of analysis scripts are kept private, leading to a situation where analysis transparency is reduced and the reproducibility of analysis results is hampered (Schofield et al., 2009).
It has also been argued that the acquired data, despite being in digitized form, have only minimally been made publicly available for other scientists to explore and validate (Van Horn and Ball, 2008). To overcome this, in recent years the community has seen a growing need for standardized and publicly available tools (Gardner et al., 2008; Akil et al., 2011) as well as experimental data repositories (Ascoli, 2006a; De Schutter, 2010). To this aim, a paradigm shift has been initiated by a set of laboratories that share their analysis tools through open-source licenses, fostering standardization (Ince et al., 2010). Given the circumstances, distributed and cloud-based computing solutions have become an obvious and valuable option (Mahmud et al., 2014). This review will introduce the reader to the major available open-source academic toolboxes for processing and analysis of neurophysiological signals acquired by means of multichannel probes, and to the available infrastructure for sharing such tools and the experimental data. Also, some of the challenges and bottlenecks the community is currently facing will be identified and highlighted, and development perspectives which, in our opinion, will facilitate result reproducibility, flexibility, and standardization will be provided.

STATE-OF-THE-ART
The state-of-the-art for processing and analysis of neurophysiological signals can be categorized based on signal types, i.e., electroencephalography or magnetoencephalography, (local) field potentials, and spikes. Though the majority of the toolboxes specialize in processing and analyzing one specific type of signal, there exist a few which provide rather comprehensive methods covering two or more signal types. Therefore, based on the signal types we categorized the toolboxes into three broad categories:
• Toolboxes for Electroencephalography (EEG) analysis;
• Toolboxes for spike trains and field potentials analysis;
• Toolboxes for spike sorting.
Most of the tools were developed mainly in MATLAB (Mathworks Inc., Natick, USA; www.mathworks.com) and Python (www.python.org) due to the widespread use of these programming languages in the neuroscience community. Other programming languages such as C, C++, R, Delphi 7, and Java were also used for parts of some packages.

Toolboxes for Electroencephalography (EEG) Analysis
In the last decade, various techniques have been developed and applied to EEG data analysis, and focused reviews on specific techniques have been reported (Pascual-Marqui et al., 2002; Stam, 2005; Hallez et al., 2007; Grech et al., 2008; Lenkov et al., 2013; de Cheveigné and Parra, 2014). Table 1 summarizes some of the popular open-source EEG analysis tools with the representative features listed below.

EEGLAB
"EEGLAB" is a MATLAB based EEG signal processing environment with time-frequency and ICA methods (Delorme and Makeig, 2004). It allows the user to: plot channel spectra and maps, remove artifacts, extract signal epochs, average data, select and compare multiple data, plot event related potential (ERP) images, decompose data using ICA and time/frequency methods, and estimate source locations. In addition, it also allows handling data from multiple subjects and perform statistical

ERPWAVELAB
"ERPWAVELAB" is another MATLAB based EEG processing toolbox (Morup et al., 2007) which depends on EEGLAB for certain functionalities. It is capable of multi-channel timefrequency analysis of ERP of EEG and MEG data. Provides data decomposition using multiway (tensor) factorization. The features include: various visualizations and maps, artifact rejection in the time-frequency domain, clustering dendrogram, statistical analysis across different groups and subjects, cross coherence analysis, etc. It can be obtained from www.erpwavelab.org.

pyMVPA
"pyMVPA" is a multivariate pattern analysis package developed in Python and aims to facilitate statistical learning analyses of large datasets (Hanke et al., 2009). It offers data handling and an extensible framework for multivariate statistical analyses such as, classification, regression, and feature selection. It can be downloaded from www.pymvpa.org/.

eConnectome
"eConnectome" is a MATLAB based software with interactive graphical interfaces for EEG/ECoG/MEG preprocessing, source estimation, connectivity analysis and visualization where the connectivity from EEG/ECoG/MEG can be mapped over sensor and source domains (He et al., 2011). It can be obtained from http://econnectome.umn.edu/.

FieldTrip
"FieldTrip" is a MATLAB based toolbox developed for the analysis of MEG, EEG, and other noninvasively recorded electrophysiological data (Oostenveld et al., 2011). Capable of handling data directly from many proprietary formats (e.g., BrainProducts/BrainVision, NeuroScan, Electrical Geodesics Inc., BCI2000, Micromed, Nexstim, European data format, Generic standard formats, etc.), it provides the user to perform time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. It can be obtained from www. fieldtriptoolbox.org.

EEGVIS
"EEGVIS" is a MATLAB based toolbox that allows users to explore multichannel EEG and other large array-based data sets using multiscale drill-down techniques (Robbins, 2012). Available at http://visual.cs.utsa.edu/research/projects/eegvis, and useable as a plugin to "EEGLAB."

SCoT
"SCoT" is a toolbox written in Python for connectivity analysis on EEG/MEG sources. It performs blind source separation, connectivity estimation, resampling statistics, and visualization (Billinger et al., 2014). It works with both multi-trial and single trial data. The source code can be downloaded from https:// github.com/SCoT-dev/SCoT.

PREP
"PREP" is for early-stage EEG processing which is a MATLAB based preprocessing pipeline that aims in cleaning (e.g., line noise removal, fixing drifting problem, interpolating corrupt channels, etc.) the EEG signals (Bigdely-Shamlo et al., 2015). The library is available at http://eegstudy.org/prepcode.

Toolboxes for Spike Trains and Field Potentials Analysis
With the increasing capability to record simultaneously from a growing number of neurons, computational neuroscientists have developed automated toolboxes addressing the required processing and analyses. We touch upon a few of the publicly available ones below; Table 2 summarizes these packages with their representative features.

MeaBench
"MeaBench" is a toolbox written mainly in C++ with certain parts written in Perl 1 and MATLAB. It is intended for data acquisition and online analysis of commercial multielectrode array recordings from Multichannel Systems GmbH (Reutlingen, Germany) (Wagenaar et al., 2005). It allows real-time data visualization, line and stimulus artifact suppression, spike and burst detection and validation. Available at www.danielwagenaar. net/res/software/meabench/.

Brain-System for Multivariate AutoRegressive Time Series (BSMART)
"BSMART" toolbox is written in MATLAB/C for spectral analysis of neurophysiological signals (Cui et al., 2008). It provides (multi-)bi-variate AutoRegressive modeling, spectral analysis through coherence and Granger causality, and network analysis. Available at http://www.brain-smart. org/.

Finding Information in Neural Data (FIND)
"FIND" is a platform-independent framework for the analysis of neuronal data based on MATLAB (Meier et al., 2008). It provides a unified data import function from various proprietary formats simplifying standardized interfacing with analysis tools and allows analysis of discrete series of spike events, continuous time series, and imaging data. Also, allows simulating multielectrode activity using point-process based stochastic model. Available at http://find.bccn.uni-freiburg.de/.

Spike Train Analysis Toolkit (STAToolkit)
"STAToolkit" is a MATLAB/C-hybrid toolbox implementing information theoretic methods to quantify how well the stimuli can be distinguished based on the timing of neuronal firing patterns in a spike train (Goldberg et al., 2009). Available at http:// neuroanalysis.org.

PANDORA
"PANDORA" is a MATLAB-based toolbox that extracts userdefined characteristics from spike train signals and create numerical database tables from them (Gunay et al., 2009). Further analyses (e.g., drug and parameter effects, spike shape characterization, histogramming and comparison of distributions, cross-correlation, etc.) can then be performed on these tables. Spike detection and feature extraction can also be performed. It is available at http://software.incf.org/software/ pandora.

Chronux
"Chronux" toolbox is developed in MATLAB for the analysis of both point process and continuous data (Bokil et al., 2010). It provides spike sorting, and local regression and multitaper spectral analysis of neural signals. Available at http://chronux. org/.

SPKTool
"SPKTool" is coded in MATLAB for the detection and analysis of neural spiking activity (Liu et al., 2011). It performs spike detection, feature extraction, manual and semi-automatic clustering of spike trains. Available at http://spktool.sourceforge. net/.

SigMate
"SigMate" is a MATLAB-based comprehensive framework that allows preprocessing and analysis of EEG, LFPs, and spike signals (Mahmud et al., 2012a). It's main contribution is in the analysis of LFPs which includes data display, file operations, baseline correction, artifact removal, noise characterization, current source density (CSD) analysis, latency estimation from LFPs and CSDs, determination of cortical layer activation order using LFPs and CSDs, and single LFP clustering. The EEG and spike analysis are provided through EEGLAB (see Section 2.1.1) and Wave_Clus (see Section 2.3.1) toolboxes. It can be obtained from https://sites.google.com/site/muftimahmud/codes.

Multivariate Granger Causality Toolbox (MVGC)
"MVGC" is a toolbox written in MATLAB that implements WienerGranger causality (G-causality) on multiple equivalent representations of a vector autoregressive model in both time and frequency domains (Barnett and Seth, 2014). It can be applied to neuroelectric, neuromagnetic, and fMRI signals and can be obtained from http://www.sussex.ac.uk/sackler/mvgc/.

QSpike Tools
"QSpike Tools" is a Linux/Unix-based cloud-computing framework, modeled using client-server architecture and developed in MATLAB / Bash scripts 2 , for processing and analysis of extracellular spike trains (Mahmud et al., 2014). It performs batch preprocessing of CPU-intensive operations for each channel (e.g., filtering, multi-unit activity detection, spike-sorting, etc.), in parallel, by delegating them to a multi-core computer or to a computers cluster. It can be obtained from https://sites.google.com/site/qspiketool/.

Toolboxes for Spike Sorting
As seen in the literature, the majority of efforts have been devoted to developing tools for spike sorting and analysis. A recent review by Rey et al. outlines the basic concepts of spike sorting, applicability requirements, and the shortcomings of currently available algorithms (Rey et al., 2015). Detailing all spike-sorting packages and their functionalities would require a complete review of its own; therefore, here we restrict our discussion to some of the popular open-source toolboxes.

Wave_Clus
"Wave_Clus" is the most popular spike sorting package to date. Developed in MATLAB, it uses wavelet transformation based feature selection method and superparamagnetic clustering (Blatt et al., 1996) method to sort the spikes into different classes (Quian Quiroga et al., 2004). It is available at https://vis.caltech.edu/r odri/Wave_clus/Wave_clus_home.htm.

UltraMegaSort2000
"UltraMegaSort2000" is a MATLAB based toolbox for spike detection and clustering which implements a hierarchical clustering scheme using similarities of spike shape and spike timing statistics, and provides false-positive and false-negative errors as quality evaluation metrics (Fee et al., 1996;Hill et al., 2011). Available at http://physics.ucsd.edu/neurophysics/ software.php.

EToS
"EToS" is a spike sorting toolbox written in C++ implementing multimodality-weighted PCA and variational Bayes for student's t mixture model (Takekawa et al., 2012). The spike sorting code is parallelized through OpenMP (www.openmp.org) and available at http://etos.sourceforge.net/.

MClust
"MClust" is a spike sorting toolbox developed in MATLAB. It supports both manual and automated clustering with possibility to manual feature selection (Redish, 2014). It can be obtained from http://redishlab.neuroscience.umn.edu/MClust/ MClust.html.

NEV2lkit
"NEV2lKit" is a package written in C++ with routines for analysis, visualization and classification of spikes (Bongard et al., 2014). Its results are accurate, efficient and consistency across experiments. Available at http://nev2lkit.sourceforge.net/.

WIToolbox
"WIToolbox" implements a combination of wavelet transform and information theory using MATLAB for better classification of spikes on the occasions of spike time-jitter, background noise, and sample size problem (Lopes-dos Santos et al., 2015). Available at www.le.ac.uk/csn/WI.

SHARING OF ANALYSIS TOOLS AND EXPERIMENTAL DATA
Making analysis toolboxes for easy and efficient handling of massive neuronal data available to the community is just a part of the solution. The other part is the availability of infrastructures which allow these tools and the experimental data to be shared. Computational neuroscientists are putting constant and significant effort into building and refining "Neuroinformatics" infrastructures, as outlined below, for making data, tools, and resources electronically accessible over the web (Ascoli, 2006b), which is believed to help and facilitate standardization and benchmarking, and to foster collaborative research (Mahmud et al., 2012b). As Prof. Jan G. Bjaalie put it, "Neuroinformatics applies the methods and approaches required for large scale data integration and thereby paves the way toward understanding the brain."

Neuroshare
The Neuroshare project (www.neuroshare.org) provides a common application programming interface and a set of open libraries for reading neurophysiology data stored in the proprietary file formats of different acquisition systems.

CARMEN
The Code Analysis, Repository and Modeling for e-Neuroscience (CARMEN) project was one of a kind in developing a virtual neuroscience laboratory, especially for electrophysiology data, facilitating e-Neuroscience by creating a unique infrastructure for data and tool sharing and services (Watson et al., 2010). These secure services allow a user to curate data and analysis code in defined storage areas, document experimental protocols, and execute data analyses (Fletcher et al., 2008).
Data cannot be curated into the CARMEN databases without a proper metadata description. This description is essential for retrieving the correct data out of the thousands of available datasets and for interpreting them using the appropriate analysis codes (Jessop et al., 2010). The CARMEN framework currently supports analysis codes written in MATLAB, Python, C/C++, and R. Users may upload their codes in the form of non-interactive standalone command-line applications, wrapping them with a Service Builder tool to create a suitable service format to be executed on the platform (Weeks et al., 2013).
Recently, a document demonstrated the usage of a curated repository of multielectrode array recordings of spontaneous activity from mouse and ferret retina. The dataset was in HDF5 format (a format for hierarchical data organization), and the document outlined the guidelines to be followed for efficient usage of the CARMEN software workflow. Moreover, the dataset structure was reported along with examples of reproducible research using those data files (Eglen et al., 2014).

Neurodata without Borders: Neurophysiology (NWB:N)
To facilitate research reproducibility and to make it possible to explore someone else's data, data standardization is a must. Neurodata Without Borders: Neurophysiology (NWB:N, http://www.nwb.org/) is an initiative aiming at promoting data standardization and sharing. Since its inception, NWB:N has been keen on producing a common data format for recordings and metadata of cellular electrophysiology, which has recently been released along with a sample dataset (Teeters et al., 2015).

CHALLENGES AND FUTURE PERSPECTIVES
Secure infrastructures are vital for the success of large-scale, multi-institutional Neuroinformatics research. It is foreseeable that Neuroinformatics research facilities will be capable of integrating data seamlessly from different sources for data sharing, but they should also be secure enough to address challenging issues such as:
• research collaboration with the option to protect proprietary data;
• user friendliness, allowing users with minimal information technology skills to explore, navigate, and use the scientific data and services provided by the environment.
In recent years, the emergence and popularity of distributed computing has created an opportunity to share resources that would otherwise require more effort. In particular, cloud computing and service-oriented architectures open novel avenues to foster collaborative neuronal signal analysis through distributed infrastructure. These approaches allow a better representation of the responsibilities taken by different users according to their granted privileges. In our opinion, development is expected toward:
• design and implementation of secure and protected systems;
• advances in cloud based web applications;
• easy deployment of data;
• reusability and sharing of tools, with adaptability to changing requirements;
• empowering researchers to share the functionalities that they want to publish.
Based on the current state-of-the-art, we identified a few challenges that require the immediate attention of the community:
1. Over the last few years, neuroscientists have put together quite a few useful neuroimaging repositories and analysis tools (Eickhoff et al., 2016), but neurophysiology is lagging behind. Though there exist a few individual databases (e.g., http://brainliner.jp/, http://www.g-node.org/, https://www.ieeg.org/, etc.), they are very poor in comparison to their imaging counterparts (Tripathy et al., 2014).
2. As acquisition systems and the required data formats change, interoperability and data conversion remain a nightmare due to the lack of widely adopted standards. In addition, when data are curated in a database, the metadata descriptions are again incompatible among different labs/curators, which also hampers conducting meaningful analyses using data from another lab. This unnecessarily increases the time and effort required for data discovery and analysis.
3. Due to the practical need for rapid and customized analyses, most labs develop their own analysis scripts and perform their required analyses with them. This approach has severe drawbacks on the global scale: interoperability, compatibility, and sharing of tools with other laboratories are highly restricted. Thus, the problems of creating a common set of analyses and of making benchmark analysis tools available are yet to be addressed.
4. Though the price of computing power has dropped significantly over the years, the power required to demystify large neuronal ensembles is still alarmingly high. From a Neuroinformatics perspective, the availability of powerful international computing facilities would greatly facilitate remote, automated, and standardized multichannel neuronal signal processing and analysis.
5. Cloud computing's popularity is rapidly growing. Exploiting the benefits of distributed computing, a Competitor-to-Collaborator concept would be very interesting, whereby small clusters of laboratories working on similar research questions share their resources and tools through a unified cloud-based framework, to be used by other laboratories as web services.

AUTHOR CONTRIBUTIONS
MM performed the reported study. MM wrote and SV edited the paper. Both authors have seen and approved the final manuscript.

ACKNOWLEDGMENTS
Financial support by the 7th Framework Programme of the European Commission through "RAMP" project (www.rampproject.eu) with contract no. 612058 is kindly acknowledged.