Deep Learning provides exceptional accuracy to ECoG-based Functional Language Mapping for epilepsy surgery

The success of surgical resection in epilepsy patients depends on preserving functionally critical brain regions, while removing pathological tissues. Being the gold standard, electro-cortical stimulation mapping (ESM) helps surgeons in localizing the function of eloquent cortex through electrical stimulation of electrodes placed directly on the cortical brain surface. Due to the potential hazards of ESM, including increased risk of provoked seizures, electrocorticography based functional mapping (ECOG-FM) was introduced as a safer alternative approach. However, ECoG-FM has a low success rate when compared to the ESM. In this study, we address this critical limitation by developing a new algorithm based on deep learning for ECoG-FM and thereby we achieve an accuracy comparable to ESM in identifying eloquent language cortex. In our experiments, with 11 epilepsy patients who underwent presurgical evaluation (through deep learning-based signal analysis on 637 electrodes), our proposed algorithm made an exceptional 23% improvement with respect to the conventional ECoG-FM analysis (∼60%). We obtained the state-of-the-art accuracy of 83.05% in identifying language regions, which has never been achieved before. Our findings have demonstrated, for the first time, that deep learning powered ECoG-FM can serve as a stand-alone modality and avoid likely hazards of the ESM in epilepsy surgery. Hence, reducing the potential for developing post-surgical morbidity in the language function.


INTRODUCTION
Epilepsy is a chronic neurological disorder characterized by recurrent, unpredictable seizures with over 65 million reported cases around the world (Patricia, 2018). Approximately 20% of these patients are diagnosed with drug-resistant epilepsy. The only possible treatment in a majority of these cases is surgical intervention. During epilepsy surgery the pathological brain tissue, which is associated with seizures, might be surgically removed. While epilepsy surgery is a curative option for drug-resistant epilepsy, neurosurgeons need to avoid removing tissues associated with language, sensory, and motor functions. This calls for an accurate identification and localization of these functionally significant brain regions. The surgical procedure can be performed more accurately with a precise localization for preventing any corresponding post-surgical neurological/functional deficits. Toward this end, this study is aimed toward developing an innovative approach for the mapping of language cortex. This would ensure an improved and sustainable post-surgical quality of life for patients presented for brain surgery.
Electro-cortical stimulation mapping (ESM) has been considered as the gold standard for functional cortex localization in epilepsy surgery. ESM is an invasive procedure that uses electrodes placed on the surface of the brain (grid electrodes) or within the brain (depth electrodes). It is considered vital for reducing the risk of language deficits post-surgery and therefore, expanding surgical options. ESM has a long history of serving as the main modality for pre-surgical functional mapping of epilepsy patients. Acute electrical cortical stimulation was successfully performed in 1950 during epilepsy surgery by Penfield and colleagues (Penfield and Rasmussen, 1950;Penfield and Roberts, 1959). During ESM, pairs of electrodes covering the region of interest (in our case -eloquent cortex) are stimulated by delivering a brief electric pulse. The stimulation temporarily disables/inhibits the cortical area of interest (creates a temporary functional lesion). Behavioral changes such as unusual sensation, involuntary movements, or language impairments (i.e., speech paucity), observed during stimulation indicates that the tested area is essential to that task performance, and its resection might lead to functional deficits. Ojemann et al. (1989) studied language localization using ESM, on a large dataset of 117 patients. The study found that there was sufficiently large individual variability in the exact location of language function and concluded that there was a need for an improved language localization model. Much later, more standard and effective tasks for expressive language localization, such as verb generation (Ojemann et al., 2002) and picture naming (Edwards et al., 2010), were tested with the increased use of ESM.
However, one major drawback of ESM is its potential to induce after-discharges (Pouratian et al., 2004), which could result in seizures. Since stimulation provoked seizures can occur rather frequently during ESM procedures (Bank et al., 2014), ESM tests often need to be repeated, leading to extended time and effort from medical professionals (neuropsychologists and/or neurologists). In some cases, the ESM procedure cannot be completed due to repeated seizure activity and/or its consequences.
The current limitations of the ESM have created a strong need for establishing other independent functional mapping modalities to identify eloquent cortex. Unfortunately, as of now, none of the existing neuro-imaging modalities are flexible enough to provide functional mapping results in real time in the operating room. Therefore, the search for a stand-alone methodology for functional eloquent cortex localization has been continuing and resulted in attempts to use electrocorticography (ECoG) as a viable alternative. ECoG is the invasive version of electroencephalography (EEG) and sometimes also referred to as intracranial encephalography, demonstrating excellent temporal resolution like EEG. Importantly, ECoG equipment is portable and can be utilized both at the patients' bed-side and intraoperatively. Unlike EEG, it overcomes the problem of poor spatial resolution, since the activity of interest is recorded directly from the cortical brain surface. It also avoids the problem of electrical signal attenuation in EEG caused by the signal propagation through tissues surrounding the brain. To record ECoG signals, a craniotomy (removal of the skull section: bone flap) is performed and the dura is opened to access the brain tissue. The arrays of grid of electrodes ( Figure 1A, left) are then placed on the exposed cerebral cortex. Following this, ECoG-based functional mapping (ECoG-FM) is performed, while task-based responses from grid electrodes are recorded. Since there is no external electrical stimulation during this process, ECoG-FM is considered as a safer alternative to ESM. When performed in real time (Schalk et al., 2008), ECoG-FM procedure can be referred to as realtime functional mapping (Leuthardt et al., 2007;Miller et al., 2007;Kapeller et al., 2015;Prueckl et al., 2017b). Figure 1A demonstrates the general setup for ECoG-FM recordings.
Although ECoG-FM has its advantages when compared with ESM, the detection accuracy has been low and limits its use in clinical settings for epilepsy patients. It must be noted, that the implanted electrodes in ECoG-FM have the potential to provide signals at a high sampling rate. A data driven approach which utilizes this rich source of data present within the recorded signals could benefit the mapping of functional areas. Our proposed method aims to achieve this goal by utilizing time and frequency domain features extracted from signals recorded using subdural electrodes for ECoG-FM. We propose to utilize deep learningbased methods and report significant performance in classifying electrodes as positive and negative response channels in response to a language mapping task for epilepsy patients.

Related Works and Existing Critical Challenges
ECoG-based approaches have been used successfully for motor cortex localization (Towle et al., 1998;Leuthardt et al., 2007;Miller et al., 2007;Qian et al., 2013;Vansteensel et al., 2013). In comparison, the localization of functional language cortex appears far more complex and challenging (Alkawadri et al., 2018). Current localization approaches are based on detecting positive response channels (called active channels or active electrodes) among the set of all channels. A baseline recording of each channel at resting-state is used to determine signal characteristics at specific frequency ranges. Most often, power of the ECoG signal lies within the alpha (α), beta (β), and (primarily) high gamma (high-γ) (70-170 Hz) frequency bands (Schalk et al., 2008;Prueckl et al., 2013). These values are compared with the signal power measured during the execution of language task. The results of this approach for language mapping have not achieved desirable accuracy. For example, Arya et al. (2015) studied high-γ response from ECoG recording of 7 patients during spontaneous conversation. The results showed low specificity and accuracy. In a follow up FIGURE 1 | Overview of (A) the language localization framework with ECoG-based functional mapping (ECoG-FM) approach. ECoG signal recording, data transfer, storage, research and clinical paths, and tasks are illustrated. ECoG signals are obtained in response to task-related changes (e.g., picture naming) from grid electrodes implanted on the cortical surface in the subdural space, (B) the proposed ECoG signal classification approach for each channel on the cortical surface in the subdural space. PRC, Positive Response Channel; NRC, Negative Response Channel. Each individual step is considered as a "module" in the overall system design. RNN, Recurrent Neural Network. study, Arya et al. (2018b) demonstrated high-gamma modulation for the story-listening task and achieved high specificity, but sensitivity remained low. Korostenskaja et al. (2015) showed that similar to the results for motor cortex, ECoG-FM can be used for eloquent language cortex localization as a complimentary technique with ESM, but not as a stand-alone modality. It has also been demonstrated that ECoG-FM can be used as a guiding tool for ESM, thereby reducing the time of ESM procedure and decreasing the risk of provoked seizures (Prueckl et al., 2017a,b).
Despite their potential, the current ECoG-FM approaches are not found capable enough to be used as a stand-alone methodology for accurate language mapping. To address these challenges and provide ECoG-FM more independence in eloquent language cortex localization, we fill in the following currently existing methodological gaps: i Available approaches compare a channel's signal with its resting-state (baseline) recording and do not compare the channels' characteristics to other recorded channels. ii The signal characteristics in the frequency range beyond high-γ band have not been explored yet to the best of our knowledge.
There has been limited work on validation of ECoG-FM for language cortex localization; hence, there are more evidences needed for utilization of ECoG-FM in the clinics.
In our previous works RaviPrakash et al., 2017), we showed the feasibility of utilizing conventional machine learning methods for channel response classification by using the whole signal spectrum (not limited to α, β & high γ) and without using the baseline recording. This was one of the first machine learning based approaches in this field with strong results and demonstrated the potential of ECoG-FM signal to be analyzed more accurately compared to conventional signal processing based descriptive methods.

Summary of Our Contributions
We propose an innovative deep learning algorithm for accurately classifying the channel response of eloquent cortex, alleviating the current challenges of ECoG-FM. We also show the effectiveness of using deep learning methods in problems with limited number of patients but sufficient samples to achieve desired performance. Our main contributions are as follows: 1. We have used the complete ECoG signal frequency spectrum for the first time to the best of our knowledge to identify signal characteristics for mapping language cortex. 2. Our innovative deep learning models achieve state-of-theart performance in language cortex mapping using ECoG-FM. We have achieved 82% sensitivity in classifying both positive and negative response channels. 3. We have shown that deep learning models can be successfully used in studies even with a limited number of subjects and low dimensional (1D) data. The requirement for sufficient data to train our deep learning model is satisfied as we show that the number of data points (due to a dense grid placement and high-resolution signal) are adequate and our experimental results add credence to these findings.
Our results have indicated that with the proposed deep learning-based classification model, ECoG-FM can be reliably used as a standalone technique for functional language mapping.

MATERIALS AND METHODS
An overview of the proposed system is illustrated in Figure 1B. We pre-processed the ECoG signals using temporal filtering such that the selected samples were synchronized with the start and end points of the task resulting in equal length blocks. We then divided the equal length ECoG signal blocks into overlapping sub-blocks of data. Our aim was to learn discriminative signal patterns and eventually reduce the computational load (Step 1). We learned different sets of signal features independently: frequency domain (i.e., auto-regression) and time domain (deep learning-based features) in Step 2 and Step 3, respectively. After we combined the learned features, we trained a recurrent neural network (RNN), a class of deep learning algorithm suited for analyzing sequential data, to classify sub-blocks of signals (Step 4). Finally, we used the majority voting technique to combine these sub-block labels and determine an overall Positive Response Channel (PRC) or Negative Response Channel (NRC) label (Step 5). A detailed block diagram showing the steps involved in the proposed methodology is shown in Figure 2. Our proposed prediction models were trained and tested using Keras with TensorFlow backend on servers equipped with Nvidia Titan X with 12 Gb graphics memory, 2.7 GHz CPU, 64 GB RAM. In the following sub-sections, we describe each module of our proposed system in detail.

Dataset
The study was approved by the Institutional Review Board (IRB) at AdventHealth Orlando, Orlando, United States. We recruited eleven patients with drug-resistant epilepsy, who underwent presurgical evaluation with ECoG grid implantation. All patients provided their written informed consent to participate in this study. The patients were teenagers and adults with an average age of 23.18 ± 11.61 years. Varying number of electrodes were tested for the patients for a total of 637 electrodes across all patients (see Supplementary Table 1 for summary of the patient demographics). Patients were recruited under IRB approved protocol #276487. Patients diagnosed with Epilepsy (13 years or older) underwent ESM as part of a standard of care pre-surgical evaluation were included in this study. Exclusion criteria were the following: Younger than 13 years old, having audiogenic reflex epilepsy, uncorrected vision, uncorrected hearing, and English language as not a primary (dominant) language (as determined through neuropsychological evaluation).

Pre-processing ECoG Signal (Step 1)
The signals recorded from the patients used implanted electrodes and hence were not affected significantly by artifacts arising from head movement. To ensure that sufficient data was recorded for training the deep learning models, the subjects were made to listen to five different stories during a single trial. To prepare the recorded data for deep learning-based analysis, nontask/control time points in the signal were eliminated using temporal filtering. Hence, the spontaneous activity recordings before the start and trailing signals at the end of the experiment were discarded from the blocks. These synchronized and uniform length blocks were fed to the deep networks for functional

Time Domain RNN (Step 2)
Our goal was to find discriminative signal patterns from ECoG signals, which are time varying and non-stationary 1D sequences. They are non-stationary because task based ECoG recordings can have signal statistics which depend on the time relative to the events. Inspired by the effectiveness of recurrent neural networks in sequence classification tasks in different domains (Graves and Schmidhuber, 2005;Baccouche et al., 2012;Wang et al., 2016), we have developed RNN based deep neural network algorithms to extract discriminative features from time-domain ECoG data. We hypothesized that limitations of the conventional spectral (frequency-based) or time-based signal analysis methods can be overcome with RNN based methods. In RNNs outputs from previous time steps are taken as input for the current time step, thereby forming a directed cyclic graph. RNNs thus learn the relationships in sequential data thereby retaining higher contextual information.
We first used popular EEG features (Hjorth, 1970;Pollock et al., 1990;Inuso et al., 2007) to learn signal characteristics from the ECoG signals. A sliding window approach was applied to extract features, which were then concatenated into a single feature vector to represent the control and activetask blocks in the signal. We have 10 blocks of ECoG data (active + control) for each electrode. This amounts to a recording of 360,000 samples per channel (10 × 30 s × 1200 samples/s = 360,000). Following basic data pre-processing steps, a sliding window of width 600 samples (i.e., 0.5 s) and a stride of 100 samples were used on each data block (active/control). The extracted features included mean, skew, kurtosis, peak to peak value, and Hjorth values (details in Supplementary Material).
The time domain features were fed to the learning module illustrated in Figure 3. The complete ECoG signal contained both control and active task signals, thus; the sub-blocks of control signals were ignored and the input to this time domain module was sub-blocks of active task signals. Recently, 1D convolutional networks have been shown to perform well in time series forecasting and classification tasks (Goodfellow et al., 2018;Zabihi et al., 2019). We designed the module to have two paths comprising of 1D convolutional layers and long-short term memory (LSTM) blocks. LSTM, introduced in 1997 ( Hochreiter and Schmidhuber, 1997), is a type of RNN that has the ability to learn long-term dependencies of data. In literature, LSTM and its variants have primarily been used in 1D sequence classification tasks Nowak et al., 2017) and prediction tasks (Flunkert et al., 2017;Gammulle et al., 2017). In our experiments, we used multiple LSTM layers in different exploratory configurations to learn a more complex feature representation of the input signal.

Frequency Domain Features (Step 3)
One of the objectives of our study was to analyze multi-domain (time and frequency independently) and hybrid-domain (time and frequency combined) signal characteristics. This kind of thorough comparison has never done before to the best of our knowledge for ECoG-FM. In this step (Step 3), we focused on the spectral characterization of signals. Conventional ECoG signal classification approaches are based on frequency-domain, where spectral analysis of the signal is performed to identify the channel response. Traditionally, spectral estimation of the signals is performed by fitting a parametric time domain model to the ECoG signals. One of the most commonly employed models/approaches in this category is the autoregressive model. An AR model for a discrete signal x[n] is represented as, x where a p k are the AR coefficients, p is the order of the AR model, w[n] is a zero mean white noise process with a variance ρ. Once the model in Eq. 1 is solved, the resulting AR parameters were used for characterization of the ECoG signal from frequency-domain perspective.
Methods to solve for the AR parameters are diverse, we used the reflection coefficient estimation-based methods (see Supplementary Material for details of the parameter selection procedure).

Fusion for Hybrid Domain (Step 4)
Using LSTMs in Step 2, we learned a different set of features (i.e., time domain) than the AR features that were generated in Step 3. In domain fusion step (Step 4), these two (largely) complimentary features were combined to obtain a hybrid signal representation model with a new deep network setting, Domain Fusion Network (DFN) (See Figure 4). Although the fusion of features from the time and frequency domain can be done in multiple ways (including stacking, element-wise multiplication, and concatenation), we used a concatenation approach to get full benefit of each domain (time vs frequency). In stacking, the feature vectors can be stacked together to create a feature matrix. Alternatively, a dot product of the feature vectors can be computed to create a single feature vector. This can be interpreted as weighting each feature in one domain by the other domain. With the stacking and element-wise multiplication approaches, the size of the feature vectors from both domains need to be the same. Since time and frequency domain features used in our proposed methodology had different dimensions, employing the concatenation approach enabled us to efficiently utilize these features. In a different perspective, such approaches have been shown effective in modern deep learning architectures such as DenseNet (Huang et al., 2017).
In concatenation, we assumed independence of features; hence, we did not use element-wise multiplication or other approaches for data merging. Since convolution helps identify local patterns and reduce redundant information in the data, the complete feature vector (after concatenation) was then passed through multiple layers of 1D convolutions with an activation function, to weight each feature based on its contribution to the classification problem (PRC vs NRC). Following the 1D convolution layers, the output feature maps were spatially averaged using Global Average Pooling (Lin et al., 2013), making the DFN more robust to spatial translations of the input data and introducing structural regularization to the feature maps. Finally, we inserted a single fully connected layer into the DFN and used a sigmoid activation to perform the final classification.

Majority Voting (Step 5)
The output of the domain fusion model was a label for the input signal, which was a sub-block. Signal from each channel/electrode was made up of hundreds of sub-blocks of the signals with reasonable overlapping. Therefore, for classifying a channel as either PRC or NRC, we hypothesize that the output that is observed more commonly is assigned as the final label. For this purpose, we apply majority voting on the output for each sub-block. For instance, if a channel included 354 sub-blocks and more than 50% of sub-blocks indicated a positive response, that channel was labeled as a PRC. As a rule, whenever the number of negative and positive responses are equal, the channel will not be assigned any label. Although, we did not observe any such channel in our experiments.

Task Paradigm
ECoG signals from the implanted subdural grids are split into two streams: one for continuous clinical seizure monitoring and the other for ECoG-FM ( Figure 1A). The tool used to record the incoming ECoG signal was BCI2000 (Schalk et al., 2008). A baseline recording of the cortical activity was first acquired to capture the "resting-state" neuronal activity. Following this baseline recording step, paradigms similar to those employed in ESM or functional magnetic resonance imaging were used to record the task-related ECoG signal for functional mapping (Korostenskaja et al., 2014a). Figure 5 shows one such paradigm, mimicking the exact details of the experimental setup we have used for the language comprehension task. Alternate 30 second blocks of ECoG data during "control" and "active" conditions were recorded continuously at a fixed sampling rate of 1,200 Hz. For the language comprehension task, the active condition implies listening to a story, while the control task involves listening to broadband noise (Korostenskaja et al., 2014b). For the active condition (i.e., listening to a story), a different story was selected for each block in order to keep the patient attentive and responsive. Both control and active sequences would activate sense of hearing, but the story listening task will particularly activate the language function. We hypothesize that this would suffice in eliciting the desired response for mapping eloquent cortex related to language function and our results have verified this hypothesis. For this purpose, the system recorded information from 128 ECoG channels (128 electrodes in Figure 5) by using g.USBamp bio-signal amplifiers (g.tec Medical Engineering GmbH, Austria) with subdural ground and reference electrodes.

Training Paradigms
Our overall goal was to successfully (and automatically) identify positive response channels and negative response channels in ECoG-FM data using new machine learning models, specifically based on deep neural networks. The ground truth (i.e., reference standard) was inferred from the gold standard ESM results. Owing to the large imbalance in the number of PRCs and NRCs (NRCs outnumbering PRCs by 3:1), we randomly selected equal number of NRCs to balance the data and avoid potential data imbalance problem when training deep learning models.
Each channel's signal comprised of blocks of active task data and control data, where each active task block was from a different story. The discriminative power of these stories in the classification task was unknown. There is a possibility that features from one story could play a more significant role than others. Additionally, the discriminative power of any particular feature is unknown. To ascertain the role of these, we divided our experimental evaluation approaches into three main categories for data classification. Each approach depended upon the way active task data was included, and the features used in the training process. This structured experimental procedure helped us in determining the usefulness of each component of the signal and provided insights into the response of brain regions (through channel responses) to different signals. We performed experiments with different features and architectures.

Our Proposed Deep Network Architectures
In task-based experiments, a response is generally expected only during the active task period and not in the control or rest period. We used fully convolutional network and long shortterm memory architectures in the time domain module, since these have shown success in various time-series classification problems (Karim et al., 2017(Karim et al., , 2019. We built our network by first analyzing the effect of using time domain features during the active task (represented as Active Time-AT). We tested our proposed time domain module by varying the network. We used a fully convolutional network (represented as AT 1 ) and then added LSTM module to the network (represented as AT 2 ). For frequency domain analysis, we added the auto regressive (AR) features to the frequency domain module by passing it through a fully connected layer (represented as AT-AR 1 ). Figure 6A shows the architectures (the superscripts indicate the variation within an architecture) including AT 1 , AT 2 and AT-AR 1 . In the domain fusion module, we tested different combinations of 1D convolutions (represented as AT-AR 2 ) and fully connected layers (represented as AT-AR 3 ). We also varied the depth of the frequency domain module by adding an additional fully connected layer in the network (represented as AT-AR 4 ). The network structures including AT-AR 2 , AT-AR 3 , and AT-AR 4 are shown in Figure 6B. We empirically determined the number of epochs required to train the network such that to avoid overfitting. Our experimental paradigms used time and frequency domain features individually and also in a hybrid manner (combined). We also analyzed the effect of active and control task data.

Model Validation
In supervised machine learning approaches, where a model is trained using ground-truth labels, the goal is to maximize predictive accuracy. However, therein lies the risk of memorizing the data rather than learning the optimal features. This problem of memorizing the data or learning the structure of the data to be the noise in the data is often referred to as overfitting (Dietterich, 1995). It is important for a classification model to be able to generalize to unseen data and avoid the problem of overfitting. The method of testing how the analysis/model generalizes to an independent test dataset is known as crossvalidation. When a completely independent dataset is not available, as is generally the case, the available data is split into training data and validation/test data. There are different types of cross-validation approaches such as leave-one-out crossvalidation, hold-out method, k-fold cross-validation, to name a few (James et al., 2013).

Shuffle-Split Cross Validation
Previously, due to the time-consuming nature of training a deep learning model, we applied the hold-out method to validate our proposed models (Arlot and Celisse, 2010). In this method, the model is trained on a part of the available data, while the remaining data is held for testing/validating the model. For effectively testing the generalization and robustness of our proposed models, we validated them using the shuffle-split cross-validation approach. In the shuffle-split cross-validation method, the data is randomly sampled and split into training and testing splits iteratively, similar to the hold-out method. The results are averaged across the number of iterations. This can be seen as repeating the hold-out method k times, such  that the data for training and validation is randomly sampled each time. The use of shuffle-split method allows sampling different data combinations rather than a single sampling as in k-fold cross-validation. Since the blocks were randomly assigned to the training and test folds, we ensured that no data from an electrode (channel) was represented in both training and testing folds simultaneously. Hence, if a block of data was assigned to a particular set (training/test), then all blocks belonging to that channel were assigned to the same (training/test) set. This ensured a fair evaluation with better generalization accuracy and helped in avoiding overfitting. For 30-fold cross validation, we repeated the experiments 30 times and used each of these distinct and non-overlapping training-testing sets to evaluate our model accuracies. Prediction accuracy was then calculated by averaging the results of these 30 experiments. It took an average of 30 h to train the models for 30-fold cross validation. Although it should be noted that testing a channel would be in real time, once a trained model is available.

Training on Individual Features on Active Task Data (Training Paradigm-I)
In this approach, we assumed that the channel response was similar for different stimuli (story) used in this study. Our experimental paradigm consisted of five different stories and thus, in this approach, no distinction was made with regards to the story. All of the active task data (i.e., five different story tasks) from a channel were used together for training the network. Among the time domain features (See section "Materials and Methods"), we found using random forest method that activity feature gave the best results. Therefore, all our proposed deep learning architecture (Figure 6) were first tested using the activity feature ( Table 1).
The addition of LSTM improved the performance of the time domain module. This was further improved by the addition of the frequency domain features using the domain fusion module. We found that increasing the depth of the frequency domain module did not have any obvious benefit in classification performance. We also tested our hypothesis that the story listening task (active task) was more discriminative in identifying the eloquent cortex. To compare information present in the active and control task data, we replicated our best performing models (AT-AR 2 and AT-AR 3 ) and fed it with control task data (represented as -CT-AR 2 and CT-AR 3 , where CT represents control time). We observed that sensitivity of the control data model was lower than that of active data model, indicating a lower discriminative power ( Table 1) and confirmed our hypothesis. To identify the best features for the channel classification task, we fed the best performing model (AT-AR 3 ), with different hand-crafted features and performed crossvalidation. The performance with different features was found to be similar ( Table 2). The mobility feature showed the best performance with high sensitivity and accuracy compared to the other features.

Training With Multiple Features on Active Task Data -Feature Fusion (Training Paradigm-II)
We hypothesized in this experiment that different features can provide complementary information and can be combined to enhance the model performance. The top performing features from individual feature training (Table 2) -mobility, skew, mean, peak-to-peak (P2P), were used to test the hypothesis. The other three features were not used on the basis that they had a marginally lower specificity. Different approaches to feature fusion were tested in the form of early fusion and late fusion. In the early fusion approach, different features were used as input channels to the best performing network architecture (AT-AR 3 ). In late fusion, we tested two different approaches: first, separate time domain models were retrained for each hand-crafted feature, and a single frequency domain module was trained. The domain fusion module was used to combine these time and frequency domain modules (represented as AT-AR 3 -LF 1 ). Secondly, we experimented by combining the frequency domain module prior to the feature fusion layer (represented as AT-AR 3 -LF 2 ). The performance of these models is presented in Table 3.  In previous experiments so far, we assumed that the channel responds in a similar manner to different stimuli (stories). However, it is plausible that the channel may respond differently to different stimuli. In this training approach, we now assumed that the channel responds differently to different stimuli (stories). The task paradigm consisted of five different stories corresponding to five different task blocks. We hypothesized that PRCs respond differently to the NRCs for each of these stories. We separated the signals based on the story and train the network in a similar manner as in section "Training With Multiple Features on Active Task Data -Feature Fusion (Training Paradigm-II)." Each story was trained with its own time domain and frequency domain modules using the AT-AR 3 network. These different networks were then combined and fed through a fully connected layer. Deep networks with different features as inputs were trained and the performance is compared in Table 4, where the features column shows the particular value (mobility, mean, and activity) computed for each story.

DISCUSSION AND CONCLUDING REMARKS
In this paper, we proposed novel deep learning architectures to classify the channel response of ECoG signals. The results showed the state-of-the-art classification accuracy of 83.05% with high specificity and sensitivity of 80.3 and 85.8%, respectively, in determining whether the channel was positive (has a response) or negative (has no response) in relation to the task stimulus. The different features and fusion approaches have given us the flexibility in maximizing different metrics, where as an example we can improve the specificity to 83.9% (with a 1% drop in accuracy). In general, with AT-AR 3 -LF 2 is our best performing model with values >80% for all performance metrics including accuracy, sensitivity and specificity. In a feasibility study toward using machine learning for ECoG-FM, a random forest classifier was used in detecting positive and negative response channels with an accuracy of 78% . Traditionally, the accuracy of ECoG-FM is high for mapping sensory and motor function, but relatively low for language modality. On an average, ECoG functional language mapping had a lower sensitivity (62%) and higher specificity (75%) to detect language-specific regions [for a comprehensive review, see Korostenskaja et al. (2014b)]. This is in contrast to the results for hand motor (100% sensitivity and 79.7% specificity) and hand sensory (100% sensitivity and 73.87% specificity) ECoG-based mapping (Kapeller et al., 2015).
The results of our current study demonstrate that the accuracy for mapping eloquent cortex using ECoG-FM can now be comparable to both sensory and motor ECoG-FM accuracies. The language ECoG-FM accuracy values we have achieved are the highest among those reported so far (Arya et al., 2018a). Although a number of studies have demonstrated successful utilization of the ECoG-FM as a complimentary tool for ESM (Prueckl et al., 2017a,b), there was not enough evidence to support the use of ECoG-FM as a stand-alone methodology for functional language mapping due to its relatively low accuracy compared with ESM (Korostenskaja et al., 2014b;Kapeller et al., 2015). The outcome of our research has indicated the potential of ECoG-FM, to be considered as a stand-alone modality for eloquent language cortex localization. Our experimental results show performance comparable to what is achieved with ESM for eloquent cortex mapping. Based on this we believe that with our trained models the proposed scheme can be used independent of ESM in surgery planning. To establish the method as ready to be used in clinical practice, a patient/subject-wise analysis followed by blind test evaluation on the models will be performed as we continue to collect more clinical data. It is possible that some features of the ECoG signal, reflecting the complex nature of language processing, were omitted from consideration when restricting the language ECoG-FM analysis to the gamma frequency band only. Expanding analysis to the whole spectrum of frequencies in our study, therefore, has exceeded the results gained from prior analysis approaches. It contributed to the improved classification accuracy and confirmed the results of previous studies, pointing toward the complex nature of language processing that needs to be considered during the analysis of neurophysiological data.
The results of our current study would have a wide-ranging applicability in clinical practice. In particular, our proposed approach can be utilized to prevent functional morbidity postsurgery in patients with pharmacoresistant epilepsy. In addition, this can also be used to increase the accuracy of the eloquent cortex mapping in patients undergoing resection of brain tumors  and arteriovenous malformations .
Deep learning-based ECoG-FM approaches can also be successfully applied in various fields of adaptive neurotechnologies (e.g., neuromodulation), where ECoG-based mapping is performed to determine the best area for responsive stimulation. For example, for defining the neural correlates of tics, the involuntary movements and/or sounds, to be used for responsive stimulation in patients with Tourette's syndrome (Shute et al., 2016;Molina et al., 2018) and for bidirectional neurostimulation via fully implantable neural interfaces in Parkinson's disease (Swann et al., 2018). The applicability of our proposed approach extends as well toward the fields of developmental disorders (e.g., autism) (Sturm et al., 2013;Sinha et al., 2015), psychiatry (e.g., major depression (Coenen et al., 2018) and obsessive-compulsive disorder (Vázquez-Bourgon et al., 2017), addiction (Wang et al., 2018), eating disorders and obesity (Val-Laillet et al., 2015;Oterdoom et al., 2018), where neurostimulation can be potentially utilized to provide treatment and improve patients' quality of life.

Deep Learning When the Data Is Limited
It is traditionally argued that deep learning models are data intensive and only fit to problems where adequate training data is available. The inherently low dimensional nature of physiological data (such as ECoG and EEG) both in terms of samples and subjects has historically restricted this field from taking advantage of the recent advances in machine learning (led by novel deep learning architectures). This trend is arguable for multiple reasons. The number of subjects (which in most of these studies is low) is not a good parameter to decide whether we can use deep learning-based methods. Our results clearly support this argument, where the number of subjects (11 for training the models) could seem small. But the overall data (from 128 electrodes at 1,200 samples per second) having millions of samples was highly sufficient to efficiently train our deep learning models. Our classification accuracy (83.05%) and sensitivity (85.8%) bodes well for selecting deep learning methods in our proposed models. This trend can also be seen in other recent studies (Gal, 2016;Tabar and Halici, 2017;Acharya et al., 2018;Angrick et al., 2019) and we are observing a paradigm shift in classification tasks using 1D physiological data. At the same time, deep learning models are developed that can work with small data. In particular, Bayesian deep learning models are shown to have good performance even when the labeled training data is scarce (Gal, 2016). We conclude that even with limited subjects, the data could be sufficient for successfully using deep learning methods in 1D data classification tasks such as ECoG-FM.

Other Limitations and Future Perspectives
Despite the state-of-the-art results in ECoG-FM predictions, there are some limitations of our work to be noted. First, our experimental paradigm (Figure 5) involves five different stories being played to the subject. The responses to these stories, have some inherent similarities, but overall are different. Therefore, training the deep learning model with a single label for the whole channel could add noise to the model. Though we have tested the effect of training the network, while treating each story individually, this reduces the overall data available to train the model. We believe that these results can be improved by including additional data and then training the system individually for each story in the paradigm. Second, the subjects used in this initial validation study were a mix of teenagers and adults. Smith (2010) found that the effect of epilepsy and seizures on children and adults was different i.e., the rules learned about the behavior of the brain in adults is different for children. Hence, a more comprehensive study with focus on children/teenagers is needed. This is one of our future aims to test the proposed machine learning based approach in different patient populations; however, patient recruitment is difficult due to the involvement of surgery, and disease prevalence.
Finally, in the proposed approach, due to the exploratory and research nature of our study, the classification was not performed in real-time. We have used retrospective data for validating the innovations and are currently working on the real-time clinical implementation of the algorithm. We intend to extend this study for mapping functional language cortex in prospective subjects. We believe that implementing such a reliable technology will increase current presurgical and intra-operative functional mapping accuracy, expand surgical treatment opportunities, prevent post-surgical language morbidity, and improve patient outcomes.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by AdventHealth Orlando. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
MK, UB, and EC: idea development, current study conception. HR, CS, SA, and UB: deep learning design, computational experiments. MK, CS, EC, and KL: patient recruitment, IRB process, ECoG tasks, ESM experiments. All authors: Manuscript writing, editing, and significant revision.