Single-Trial Mechanisms Underlying Changes in Averaged P300 ERP Amplitude and Latency in Military Service Members After Combat Deployment

Attenuation in P300 amplitude has been characterized in a wide range of neurological and psychiatric disorders such as dementia, schizophrenia, and posttraumatic stress disorder (PTSD). However, it is unclear whether the attenuation observed in the averaged event-related potential (ERP) is due to the reduction of neural resources available for cognitive processing, the decreased consistency of cognitive resource allocation, or the increased instability of cognitive processing speed. In this study, we investigated this problem by estimating single-trial P300 amplitude and latency using a modified Woody filter and examined the relation between amplitudes and latencies from the single-trial level to the averaged ERP level. ERPs were recorded from 30 military service members returning from combat deployment at two time points separated by 6 or 12 months. A conventional visual oddball task was used to elicit P300. We observed that the extent of changes in the within-subject average P300 amplitude over time was significantly correlated with the amount of change in three single-trial measures: (1) the latency variance of the single-trial P300 (r = −0.440, p = 0.0102); (2) the percentage of P300-absent trials (r = −0.488, p = 0.005); and (3) the consistent variation of the single-trial amplitude (r = 0.571, p = 0.0022). These findings suggest that there are multiple underlying mechanisms on the single-trial level that contribute to the changes in amplitudes seen at the averaged ERP level. The changes between the first and second assessments were quantified with the intraclass correlation coefficient, the standard error of measurement and the minimal detectable difference. The unique population, the small sample size and the large fraction of participants lost to follow-up preclude generalization of these measures of change to other populations.


INTRODUCTION
The within-subject average P300 event-related potential (ERP) has demonstrated significant promise as an objective physiological measure of cognitive processing (Polich and Herbst, 2000). Variation in its amplitude (Pfefferbaum et al., 1989; Fabiani et al., 1990; Noldy et al., 1990; Polich, 1997) and latency has been well characterized in normal populations (McCarthy and Donchin, 1981; Verleger, 1997, 2010; Leuthold and Sommer, 1998; Doucet and Stelmack, 1999). Attenuation in P300 amplitude and slowing in P300 latency have also been associated with a wide range of neurological and psychiatric disorders such as dementia (Hedges et al., 2016), schizophrenia (Ford et al., 1994; Mathalon et al., 2000; Oribe et al., 2015), traumatic brain injury (Gaetz and Bernstein, 2001), and posttraumatic stress disorder (PTSD; McFarlane et al., 1993; Metzger et al., 1997; Kimble et al., 2000; Felmingham et al., 2002; Wang et al., 2017, 2018). The article by Ford et al. (1994) is particularly relevant to this contribution. Ford et al. (1994) examined P300s obtained from schizophrenics and from healthy comparison participants. They reported three observations. First, schizophrenics had fewer trials passing the P300 screen; that is, schizophrenics had fewer trials that elicited a response that satisfied the criterion for the presence of a P300. Second, the amplitude of P300s that passed criterion was smaller in schizophrenics, and, third, the single-trial latency was greater in schizophrenics. Demonstrating further potential for clinical use as a correlate to cognitive function, P300 has been shown to strengthen in response to various treatments (Kouri et al., 1996; Werber et al., 2003; Tilki et al., 2004; Chang et al., 2014; Khedr et al., 2014; Vaitkevičius et al., 2015). However, the mechanisms underlying the variations in within-subject average P300 remain unclear.
P300 is typically measured from an average of many single trials. Understanding the characteristics of the single trials may help to explain the differences in P300 seen in past studies and reveal insight into the underlying variation of cognitive processing. Changes in the amplitude of grand-averaged P300 ERPs may be due to a number of single-trial factors (Figure 1). First, the single-trial amplitudes may be smaller or larger across the trials that elicit a P300 ERP. Second, there may be a change in the proportion of trials that elicit a P300 ERP, and the proportion will be reflected in the within-subject average. Finally, an increase (decrease) in latency jitter could lead to smaller (larger) P300 ERP amplitude. Another single-trial effect that would not impact the change in within-subject average P300 ERP amplitude but would impact the change in ERP latency is a uniform shift in the single-trial latency mean.
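The three amplitude mechanisms listed above can be illustrated with a small numerical toy. The sketch below is not the analysis used in this study: it assumes that each single-trial P300 is a noise-free Gaussian bump, and the function name `averaged_peak` and all parameter values (10 µV amplitude, 350 ms latency, 50 ms width) are hypothetical choices made only for illustration.

```python
import numpy as np

def averaged_peak(n_trials, jitter_sd_ms, p_absent=0.0, amp=10.0,
                  latency_ms=350.0, width_ms=50.0, seed=0):
    """Peak amplitude of the average of simulated, noise-free single trials.

    Assumption: each elicited P300 is a Gaussian bump of height `amp`;
    absent trials contribute a flat line.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, 800.0, 1.0)                        # time axis in ms
    lat = latency_ms + rng.normal(0.0, jitter_sd_ms, n_trials)
    present = rng.random(n_trials) >= p_absent            # trials that elicit a P300
    trials = amp * np.exp(-0.5 * ((t[None, :] - lat[:, None]) / width_ms) ** 2)
    trials *= present[:, None]                            # zero out P300-absent trials
    return trials.mean(axis=0).max()

baseline = averaged_peak(200, jitter_sd_ms=0.0)                  # no jitter, all trials present
jittered = averaged_peak(200, jitter_sd_ms=60.0)                 # same amplitudes, more jitter
sparse = averaged_peak(200, jitter_sd_ms=0.0, p_absent=0.5)      # half of the trials absent
```

In this toy model the averaged peak falls both when latency jitter grows and when the fraction of P300-absent trials grows, even though the amplitude of every elicited single-trial P300 is unchanged, which is why the averaged amplitude alone cannot distinguish the mechanisms.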
Each of the above proposed alternative mechanisms has a different cognitive implication. Intermittently occurring P300s would be consistent with clinical reports of attentional fluctuations and distractibility in veterans with PTSD (Davis et al., 1996). Overall reduction of all P300-elicited trials would suggest the allocation of fewer resources for processing the stimulus, either due to depletion of available resources and/or to the misallocation of existing resources (Langer and Eickhoff, 2013). Single-trial latency variability could suggest fluctuations in processing strategy during data collection, or a change in the instability or efficiency of cognitive processing, which would be consistent with studies relating single-trial latency variation to intra-individual reaction time characteristics (Saville et al., 2011) and with the association between aging and P300 attenuation (Daffner et al., 2006). The ability to extract meaningful information about alterations in neural resources available for cognitive processing at a single-trial level of granularity could provide an objective metric of cognitive capacity. Disruption of neurocognitive function is a critical post-deployment health concern. It is influenced by factors that can be disrupted by combat exposure such as insomnia, stress and pain, making this of central importance when assessing deployment-related sequelae.
In this study, we aim to decouple these disparate underlying mechanisms by estimating the trial-by-trial P300 amplitude and latency using a modified Woody filter and examining the relations of amplitudes and latencies from the single-trial level to the averaged ERP level. Utilizing a longitudinal study design to monitor P300 ERP within-subject changes in the aftermath of combat trauma, we recruited military service members recently returned from a combat deployment in either Iraq or Afghanistan to undergo a baseline EEG assessment, with subsequent follow-up assessment at 6 or 12 months. The P300 ERPs were measured using a conventional visual oddball task paradigm.

Participants
Thirty military service members (age 30.4 ± 7.2 years, 27 men and three women) who had returned from a deployment of at least 3 months in either Iraq or Afghanistan were included in this study. Participants were not compensated for their participation. They completed a baseline ERP assessment within 2 months of their return, and a subsequent follow-up assessment at 6 or 12 months. Eighty-five candidate participants were screened. Of these, 80 had a baseline EEG/ERP assessment. Of the 80, 10 had a follow-up EEG/ERP at 6 months and 28 had a follow-up at 12 months. The low retest rate is typical in studies of service personnel who recently returned from overseas duty. In many cases they are reassigned to other duty stations or separate from service. As will be addressed in a subsequent section of this article, the low retest rate has significant implications for the interpretation of quantitative measures of retest reliability. At the time of baseline assessment, participants did not meet the diagnostic criteria for PTSD, major depressive disorder or post-concussion syndrome. Exclusion criteria were the following: a current Glasgow Coma Scale score less than 13; a history of head injury resulting in loss of consciousness for 60 min or more; visual acuity lower than 20/100 after correction; psychosis; active suicidal or homicidal ideation; pregnancy; a PTSD Checklist-Military Version score (Forbes et al., 2001) greater than or equal to 50, or a diagnosis of PTSD made by an experienced psychologist using the Clinician-Administered PTSD Scale (Weathers et al., 2001) based on the DSM-IV criteria; a diagnosis of post-concussion syndrome according to the International Classification of Diseases, 10th Clinical Modification; and a Patient Health Questionnaire-9 score (Spitzer et al., 1999 and the Patient Health Questionnaire Primary Care Study Group) greater than or equal to 10.
All participants provided written informed consent in accordance with the protocol approved by institutional review boards at Uniformed Services University, Walter Reed National Military Medical Center, and the National Institutes of Health.

Task Paradigm
Scalp EEG was recorded at both baseline and follow-up assessments while participants performed a visual oddball task. Visual stimuli were presented by a digital tachistoscope of our own design and construction. The tachistoscope was a 5 × 5 square array of yellow, light-emitting diodes. Each diode was 1 cm in diameter. Given spacing between LEDs, the array was 6 × 6 cm. The standard visual stimulus was a vertical stimulus which consists of the five vertical center line LEDs illuminated simultaneously for 40 ms. The target visual stimulus was a horizontal stimulus which was composed of the five horizontal center line LEDs illuminated simultaneously for 40 ms. Each subject received 125 stimuli in total, of which about 21% (26 ± 1 trials) were target and 79% (99 ± 1 trials) were standard stimuli. The subjects were instructed to maintain a silent count of the number of target stimulus presentations and to report their count at the end. The inter-stimulus onset time was varied randomly between 1.4 and 1.8 s. The experiment lasted about 3.5 min.
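The stimulus schedule described above can be sanity-checked with a short simulation. This is a sketch only: the original stimuli were delivered by custom hardware, the function name `oddball_sequence` is hypothetical, and drawing the onset intervals from a uniform distribution is an assumption (the text says only that the interval was "varied randomly" between 1.4 and 1.8 s).

```python
import numpy as np

def oddball_sequence(n_stimuli=125, n_targets=26, soa_range=(1.4, 1.8), seed=0):
    """Return a boolean target mask and stimulus onset times in seconds."""
    rng = np.random.default_rng(seed)
    is_target = np.zeros(n_stimuli, dtype=bool)
    # Place the targets at random positions in the sequence.
    is_target[rng.choice(n_stimuli, size=n_targets, replace=False)] = True
    # Assumed: onset-to-onset intervals are uniform on [1.4, 1.8) s.
    soa = rng.uniform(*soa_range, size=n_stimuli - 1)
    onsets = np.concatenate([[0.0], np.cumsum(soa)])
    return is_target, onsets

is_target, onsets = oddball_sequence()
run_minutes = onsets[-1] / 60.0   # duration from first to last stimulus onset
```

With a mean interval of 1.6 s, 125 stimuli span roughly 200 s between the first and last onset, which is in line with the reported run length of about 3.5 min.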
Recordings were obtained in a steel-enclosed electromagnetically shielded chamber that was lined with sound absorbent material. Gold electrodes were used and the impedance of each channel was less than 5 kΩ. Low level ambient light was on throughout the procedure. Prior to initiation of the task, participants were instructed from a standardized script. The task was described and the participant was asked to respond "as quickly and as accurately as possible." The recording was preceded by practice trials to ensure that the participant understood the task.

EEG Recording
The scalp EEG was recorded using an EPA6 amplifier (Sensorium Inc.) and Grass electrodes (Natus Neurology Inc.) at Fz, Cz, Pz, Oz, C3, and C4 according to the standard 10-20 electrode system, with linked earlobes as reference and a forehead ground. Electrode impedances were maintained under 5 kΩ. EOG was recorded from two electrodes placed above and below the right eye. The sampling rate was 2,048 Hz, and the analog filter band-pass was 0.02-500 Hz.

EEG Data Processing
EEG data were analyzed offline using custom scripts written in MATLAB (www.mathworks.com). The continuous EEG signals from each participant were first visually inspected. Channels with poor signal quality were removed from further analysis. EOG artifacts were corrected by using a regression approach (Croft and Barry, 1998). After EOG correction the data were high-pass filtered at 1 Hz and low-pass filtered at 50 Hz. Continuous EEG data were then segmented into epochs from −500 to 1,000 ms with respect to the stimulus onset. Trials with activity exceeding ±75 µV were excluded from analysis. The overall trial rejection rate was 4.2%. The rejection rates for target and standard stimuli were 4.6% and 4.1%, respectively. The ERP waveforms for target and standard stimuli were extracted by averaging those preprocessed epochs. The electrode location of maximum P300 activation, Pz, was used for all further analysis. For each subject, the averaged P300 ERP amplitude and latency were measured as the voltage of the largest positive peak of the target ERP within 250-500 ms and the time from stimulus onset to the maximum positive amplitude within 250-500 ms, respectively.
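The epoching, amplitude-based rejection and peak measurement steps described above can be sketched as follows. The original analysis used custom MATLAB scripts that are not reproduced here; this Python sketch assumes a single already-filtered channel, and the function name `epoch_and_measure` and its arguments are hypothetical. It takes the maximum voltage in the 250-500 ms window as the "largest positive peak", which is a simplification.

```python
import numpy as np

def epoch_and_measure(eeg_uv, onsets, fs, tmin=-0.5, tmax=1.0,
                      reject_uv=75.0, win=(0.25, 0.5)):
    """eeg_uv: 1-D array for one channel (µV); onsets: stimulus sample indices."""
    n_pre, n_post = int(round(-tmin * fs)), int(round(tmax * fs))
    times = np.arange(-n_pre, n_post) / fs              # seconds relative to onset
    # Cut one epoch per stimulus, skipping epochs that run off the recording.
    epochs = np.array([eeg_uv[o - n_pre:o + n_post] for o in onsets
                       if o - n_pre >= 0 and o + n_post <= len(eeg_uv)])
    # Reject epochs whose absolute voltage exceeds the threshold anywhere.
    keep = np.abs(epochs).max(axis=1) <= reject_uv
    erp = epochs[keep].mean(axis=0)                     # averaged ERP
    mask = (times >= win[0]) & (times <= win[1])
    i_peak = int(np.argmax(erp[mask]))                  # maximum within the window
    return erp[mask][i_peak], times[mask][i_peak], keep
```

The function returns the averaged peak amplitude, its latency in seconds from stimulus onset, and a boolean mask of retained epochs, so the rejection rate can be computed as `1 - keep.mean()`.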

Single-Trial P300 Analysis
Analysis was limited to responses to target stimuli. Single-trial latencies and amplitudes were determined by calculating the correlation between a single trial and a template that was determined using the procedure presented in Thornton et al. (2007) and Thornton (2008). An iterated procedure is used to produce the template.
1. The average of all single trials, T, is computed.
2. Single trials are divided into three subgroups, A, B and C, corresponding to the first, second and third thirds of the trials in recorded order.
3. The average of each subgroup is calculated.
4. The lag between T and the average of Subgroup A is determined and denoted by L_A. Similarly, the lag between T and the average of Subgroup B is L_B and the lag between T and the average of Subgroup C is L_C.
5. A new template is formed by averaging Subgroup A single trials shifted by L_A with Subgroup B single trials shifted by L_B and Subgroup C single trials shifted by L_C.
6. The process is re-entered at Step 4 using the new template.
7. The process continues to iterate until the difference between successive templates is less than a prespecified difference. Thornton et al. (2007) use the phrase "until no further changes" result.
The number of subgroups is then increased by three and the process continues with trials divided across six subgroups until, as before, the iterated template is stable. The increase in subgroup number continues until the number of subgroups is equal to or just below one half the total number of single trials. By using this procedure all shift latencies used to calculate the template are determined from correlations determined between average signals (the current template correlated with the average of a subgroup). This prevents the possibility, present in latencies determined between a template and a single trial, of a maximum correlation lag obtained with a signal component that is due to noise.
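The iterative template construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of Thornton et al. (2007) or of this study: the function names `best_lag` and `subgroup_template`, the lag search range `max_lag`, the tolerance `tol` and the iteration cap are hypothetical, and shifts are applied circularly with `np.roll` for brevity. Because a circular shift preserves the mean and norm of a signal, the lag that maximizes the dot product is also the lag that maximizes the Pearson correlation.

```python
import numpy as np

def best_lag(template, signal, max_lag):
    """Shift (in samples) of `signal` that best aligns it with `template`."""
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(template, np.roll(signal, lag)) for lag in lags]
    return int(lags[int(np.argmax(scores))])

def subgroup_template(trials, max_lag=50, tol=1e-6, max_iter=20):
    """trials: float array (n_trials, n_samples) in recorded order."""
    n_trials = trials.shape[0]
    template = trials.mean(axis=0)                       # Step 1: plain average
    n_groups = 3
    while n_groups <= n_trials // 2:                     # stop at about half the trials
        groups = np.array_split(np.arange(n_trials), n_groups)   # Step 2
        for _ in range(max_iter):
            shifted = np.empty_like(trials)
            for idx in groups:
                # Steps 3-4: lag between template and the subgroup average.
                lag = best_lag(template, trials[idx].mean(axis=0), max_lag)
                # Step 5: shift every trial of the subgroup by that lag.
                shifted[idx] = np.roll(trials[idx], lag, axis=1)
            new_template = shifted.mean(axis=0)
            change = np.max(np.abs(new_template - template))
            template = new_template
            if change < tol:                             # Step 7: template is stable
                break
        n_groups += 3                                    # then refine with more subgroups
    return template
```

Note that every lag is estimated between two averages (the current template and a subgroup average), never between the template and a single noisy trial, which is the property the procedure is designed to guarantee.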
The three resulting outputs determined by the maximum correlation with the template were: (1) an estimated peak P300 latency for each trial; (2) the corresponding P300 amplitude at that peak latency for each trial; and (3) a correlation coefficient per trial, which indicated how closely each trial matched the averaged P300 ERP template. Subsequently, trials were defined as having an elicited ERP if they had a correlation coefficient greater than 0.3. The legitimacy of this criterion was investigated by reviewing a very large number of single trials visually to determine if ERPs would be inappropriately lost when this criterion was used. We did not observe instances where this occurred with the 0.3 correlation criterion. Of note, this threshold was varied from 0.1 to 0.4, with no change in the significance of the results. Trials without an ERP were removed, and their proportion was tracked as a single-trial-level measure in its own right, the "% P300-absent trials." The remaining trials were then used to calculate the mean and standard deviation (SD) of the single-trial amplitudes and single-trial latencies.
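The per-trial estimation and the 0.3 correlation criterion described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' code: the function name `single_trial_measures` is hypothetical, the template is shifted circularly with `np.roll`, latencies are reported in seconds from the first sample of the epoch (the caller would subtract the pre-stimulus interval), and `max_lag` is assumed small enough that the shifted peak stays inside the epoch.

```python
import numpy as np

def single_trial_measures(trials, template, fs, max_lag, r_min=0.3):
    """Estimate latency, amplitude and template correlation for each trial."""
    lags = np.arange(-max_lag, max_lag + 1)
    i_tpl = int(np.argmax(template))                 # template peak sample
    lat, amp, r = [], [], []
    for x in trials:
        # Pearson correlation between the trial and the template at each lag.
        rs = np.array([np.corrcoef(np.roll(template, lag), x)[0, 1] for lag in lags])
        best = int(np.argmax(rs))
        i_peak = i_tpl + lags[best]                  # estimated peak sample
        lat.append(i_peak / fs)
        amp.append(x[i_peak])
        r.append(rs[best])
    lat, amp, r = map(np.asarray, (lat, amp, r))
    present = r > r_min                              # trials counted as eliciting a P300
    return {
        "pct_absent": 100.0 * np.mean(~present),
        "amp_mean": amp[present].mean(), "amp_sd": amp[present].std(ddof=1),
        "lat_mean": lat[present].mean(), "lat_sd": lat[present].std(ddof=1),
    }
```

Rerunning the function with `r_min` between 0.1 and 0.4 mirrors the sensitivity check on the threshold reported in the text.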
The following values were obtained from each participant using within-subject average ERPs:
1. Baseline amplitudes of the within-subject average ERP
2. Follow-up amplitudes of the within-subject average ERP
3. The difference of these amplitudes (Follow-up − Baseline)
4. Baseline latencies of the within-subject average ERP
5. Follow-up latencies of the within-subject average ERP
6. The difference of these latencies (Follow-up − Baseline)
The following values were obtained from each participant using the distributions determined from the participant's set of single trials:
1. Baseline percentage of trials with no ERP
2. Baseline distribution of single-trial amplitudes (mean and standard deviation)
3. Baseline distribution of single-trial latencies (mean and standard deviation)
4. Follow-up percentage of trials with no ERP
5. Follow-up distribution of single-trial amplitudes (mean and standard deviation)
6. Follow-up distribution of single-trial latencies (mean and standard deviation)
7. Change in single-trial mean amplitudes (Follow-up − Baseline)
8. Change in single-trial latencies (Follow-up − Baseline)
9. Change in the percentage of trials with no ERP (Follow-up − Baseline)
Pearson correlations were calculated between changes in statistics characterizing the average ERP and changes in statistics that characterize single-trial distributions (Table 1). The p-values and confidence intervals for the correlation coefficients were determined using the percentile bootstrap with 10,000 bootstrap samples per measure. Confidence intervals were adjusted using the Bonferroni correction to have an overall coverage probability of 95% within each table. Each table lists 10 comparisons, so this corresponds to a 99.5% coverage probability for each individual confidence interval. The p-values reported are not adjusted; readers who wish to compare them to the common 0.05 and 0.01 significance levels should use the adjusted significance levels of 0.005 and 0.001, respectively.
Note that using a Bonferroni correction will generally lead to a conservative (lower than prescribed) family-wise error rate when test statistics are correlated, as should be expected for the correlations amongst the pre-post measures. This results in reduced power to detect true effects. However, as with any hypothesis test, there is always a tradeoff between the ability to detect true effects while avoiding the detection of spurious effects, and we have chosen to err on the side of avoiding the detection of spurious effects.
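The percentile bootstrap with Bonferroni-adjusted intervals described above can be sketched as follows. The function name `bootstrap_corr` is hypothetical. The interval follows the description in the text (10,000 resamples, 10 comparisons, 99.5% coverage per interval); the way the p-value is derived from the bootstrap distribution is not specified in the text, so the two-sided tail proportion used here is an assumption.

```python
import numpy as np

def bootstrap_corr(x, y, n_boot=10_000, n_comparisons=10, family_alpha=0.05, seed=0):
    """Pearson r with a Bonferroni-adjusted percentile bootstrap interval."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    n = len(x)
    r_obs = np.corrcoef(x, y)[0, 1]
    # Resample participants (pairs) with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    xb, yb = x[idx], y[idx]
    xb = xb - xb.mean(axis=1, keepdims=True)
    yb = yb - yb.mean(axis=1, keepdims=True)
    r_boot = (xb * yb).sum(axis=1) / np.sqrt((xb ** 2).sum(axis=1) * (yb ** 2).sum(axis=1))
    # Bonferroni: per-interval alpha = 0.05 / 10, i.e. 99.5% coverage each.
    alpha = family_alpha / n_comparisons
    lo, hi = np.quantile(r_boot, [alpha / 2, 1 - alpha / 2])
    # Assumed p-value: twice the smaller tail proportion on either side of zero.
    p = 2 * min(np.mean(r_boot <= 0), np.mean(r_boot >= 0))
    return r_obs, (lo, hi), min(p, 1.0)
```

An unadjusted p-value from this function would be compared against 0.005 rather than 0.05 to respect the ten comparisons in each table.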

RESULTS
The P300 amplitude and latency were measured from each participant's ERP at electrode Pz. Figure 2 shows that the participants overall did not show any differences in their averaged P300 from baseline to follow-up assessment at the group level. This is expected since the cohort was studied longitudinally with no clinical diagnoses for PTSD, major depressive disorder, or post-concussion syndrome at baseline and no specified treatment between baseline and follow-up. We examined the within-subject correlations between the changes (calculated as follow-up minus baseline) in P300 measures on the grand-averaged level and single-trial level (Table 1).
As shown in Figure 3, the changes in the average P300 ERP amplitude were significantly correlated with two of the hypothesized underlying single-trial measures illustrated in Figure 1 (columns 2-4). First, the change in P300 amplitude was negatively correlated with the change in the percentage of P300-absent trials out of the total number of trials (r = −0.488, p = 0.005). Second, the change in P300 amplitude was positively correlated with the change in amplitude mean (r = 0.571, p = 0.0022). The change in P300 amplitude was not significantly correlated with the change in latency SD (r = −0.44, p = 0.0102) after correction for multiple comparisons. However, its confidence interval (−0.795, 0.047) is suggestive of anything from a moderately negative correlation to a very weak positive correlation. The observed correlations between the change in averaged P300 ERP amplitude and the remaining single-trial measures, the change in single-trial P300 latency mean and the change in amplitude SD, did not achieve statistical significance at the 0.05 significance level after adjusting for multiple comparisons (Table 1).
The associations between the change in P300 ERP latency and the change in each of the single-trial measures are shown in Figure 4. We found that the changes in P300 average ERP latency were positively correlated with the changes in the P300 latency mean on a single-trial level (r = 0.622, p = 0.004). This result is consistent with the expected electrophysiological effect of the P300 latency mean on the classic averaged P300 latency (Figure 1, last column). No other observed correlations achieved the 0.05 significance level after adjustment for multiple comparisons (Table 1).

Quantifying T1 to T2 Differences
This is not a clinical population. As indicated in the description of the inclusion/exclusion criteria, participants were excluded based on traumatic brain injury history, suicidal ideation, PTSD symptoms, psychological symptoms identified by the Patient Health Questionnaire and the presence of persistent post-concussion symptoms. The measures quantifying single-trial ERPs reported in this article cannot, therefore, be correlated with major clinical pathologies, as was done in, for example, Ford et al. (1994). These measures indicate, admittedly provisionally given the small sample size, the test-retest reliability of these measures as quantified by the intraclass correlation coefficient, the ICC. There are several variants of the ICC. Shrout and Fleiss (1979) published six and McGraw and Wong (1996) published 10. Guidelines for selecting the appropriate version are given in Müller and Büttner (1994). Following that guidance, we report here ICC (2,1) for five measures (Table 2). The implications of these results for clinical practice must be considered with caution. Because the intraclass correlation coefficient can be used to calculate the clinically important Standard Error of Measurement and the Minimal Detectable Difference (Portney and Watkins, 2015, also reported for these measures in the table), it might be thought possible to use these measures longitudinally to assess treatment response or disease progression. This requires interpretation of intraclass correlation coefficients.
Table note: The estimated correlation (confidence interval) for each pair is reported, as well as the bootstrapped p-value. Confidence intervals have been adjusted for 10 multiple comparisons using the Bonferroni correction to have a simultaneous confidence level of 95%.
To a limited degree, a comparative sense of the intraclass correlation coefficients obtained here can be obtained by comparing them to ICCs obtained with psychophysiological measures known to be stable in a healthy population, for example, measures of heart rate variability computed from RR interval sequences (the sequence of time intervals separating peaks of successive QRS complexes in the ECG). Killian et al. (2015) report the following from healthy adult controls at rest: mean RR interval (ICC (2,1) = 0.791), SD of RR intervals (ICC (2,1) = 0.831), root mean square of successive RR intervals (ICC (2,1) = 0.814) and ratio of low frequency to high frequency bands of the RR spectrum (ICC (2,1) = 0.886). ICCs obtained with the ERP data analyzed here are discernibly lower. Portney and Watkins provide the following general guidance (Portney and Watkins, 2015, pp. 594-595): "As a general guideline, we suggest that values above 0.75 are indicative of good reliability, and those below 0.75 poor to moderate reliability. For many clinical measurements, reliability should exceed 0.90 to ensure reasonable validity. These are only guidelines, however, and should not be used as absolute standards. Researchers and clinicians must defend their judgments within the context of the specific scores being assessed and the degree of acceptable precision in the measurement" (emphasis in the original text).
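For readers who wish to reproduce reliability statistics of this kind, the sketch below computes ICC (2,1) from the two-way ANOVA mean squares of Shrout and Fleiss (1979), together with the standard error of measurement and the minimal detectable difference. The text does not spell out the formulas used, so the commonly cited forms SEM = SD × sqrt(1 − ICC) and MDD = 1.96 × sqrt(2) × SEM are assumptions here, as is the use of the SD of all scores pooled across sessions; the function names are hypothetical.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: array of shape (n_subjects, k_sessions).
    """
    y = np.asarray(scores, float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()     # between subjects
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()     # between sessions
    ss_err = ((y - grand) ** 2).sum() - ss_rows - ss_cols   # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

def sem_and_mdd(scores, z=1.96):
    """Return (ICC, SEM, MDD) using the assumed formulas described above."""
    y = np.asarray(scores, float)
    icc = icc_2_1(y)
    sd = y.std(ddof=1)                      # assumed: SD pooled over all scores
    sem = sd * np.sqrt(1.0 - icc)
    return icc, sem, z * np.sqrt(2.0) * sem
```

For the two-session design used here, `scores` would have one row per participant and two columns (baseline and follow-up) for a given measure.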
Additional considerations further temper the possibility of clinical utility. In addition to the limitations of the sample size in this study, it must be noted that reliability coefficients are population-dependent. A change that may be clinically significant in an age-matched civilian population, which typically displays highest reliability, may well be at noise level in a population of returned service personnel who have experienced combat exposure and some of whom may on entry into the study be below clinical threshold but prodromal for significant psychopathology. These intraclass correlation coefficients do not, therefore, generalize to other populations. The high loss to follow-up (the failure to obtain a second assessment) is an additional cause of concern. As previously noted, this low retest rate is typical in studies of service personnel recently returned from overseas duty (which is the population of interest to this program). This is a problem in reliability studies because it cannot be assumed that the measures under investigation are uniformly distributed across the groups that did and did not receive a second assessment. The population available for a second assessment may be significantly different from the population lost to follow-up.
In aggregate, these considerations argue that, at least in this population, the measures of ERPs examined here will have limited longitudinal clinical utility when used in isolation as single measures. It remains possible that a continuation study with a larger population, possibly using additional measures of ERP dynamics, may be more successful. Also, greater utility might be obtained when these measures are combined with other data in a multivariate assessment of change.
FIGURE 1 | Illustration of the single-trial measures and their subsequent effects on the grand-averaged event-related potentials (ERPs). First column (Baseline, green): example of baseline trials with P300 elicited in four trials and absent in one trial. Second column, % Trls no ERP: each individual P300 remains unchanged, however the proportion of P300-absent trials increases. This is measured as a percent change in the number of P300-absent trials to total number of trials and results in a reduced classic grand-averaged P300 amplitude (bottom row). Third column, Amplitude Mean: each individual P300 is smaller in amplitude, number of P300-absent trials remains the same as baseline. Amplitude Mean is calculated as the mean of the estimated P300 amplitudes using only the four trials with an elicited P300. The middle P300-absent trial is not included in the calculation. The change in Amplitude Mean also results in a reduced classic grand-averaged P300 amplitude (bottom row). Fourth column, Latency Standard Deviation (SD): number of elicited P300s and their amplitudes are unchanged. Only variation is the latency of each P300 peak such that the latency variance (jitter) is increased (both slower and faster), but the average P300 latency remains constant, resulting in a smaller (and broader) averaged P300 amplitude (bottom row). Fifth column, Latency Mean: the latency of each P300 peak is consistently slower, resulting in a slower averaged ERP latency and no change to the ERP amplitude (bottom row). Green arrows used to help visually clarify shifts from baseline P300s (first column, green).

DISCUSSION
We utilized a longitudinal, repeated-measures study design to investigate the underlying mechanisms for the variation seen in grand-averaged ERPs. Within 2 months after their return from combat deployment, 30 military service members were asked to perform a visual oddball P300 assessment as a baseline measure, then again after 6 or 12 months. Since classically averaged ERPs are computed as an average over multiple trials with any trial-to-trial variation averaged out, we examined the P300 measures on a single-trial level in order to understand better the variations seen in the averaged P300 measures. We observed that the variation in P300 amplitude was significantly associated with changes in single-trial amplitude mean and the proportion of P300-absent trials. Similarly, P300 latency was significantly associated with the changes in single trial latency mean. These results altogether were consistent with the hypothesized single-trial measures contributing to the changes in the averaged ERP shown in Figure 1 and provide evidence of multiple electrophysiological mechanisms underlying the variation in averaged ERP amplitude.
P300 is thought to be a non-specific measure of cognitive health, reflective of fundamental cognitive processes including attention allocation of cortical resources, memory storage, and processing efficiency (Polich, 2007). The P300 component has been widely studied in normal populations (Polich and Herbst, 2000; Verleger et al., 2005) and implicated in a wide range of neurological disorders ranging from cognitive decline with aging, to depression, PTSD, autism, schizophrenia, TBI, and Alzheimer's disease (Oken, 1997; Pan et al., 1999; Reinvang, 1999; Cycowicz, 2000; Jeon and Polich, 2003; Verleger, 2003; Polich, 2004; DeBoer et al., 2004). As such, many studies have decomposed the P300 component into single-trial measures. However, the methodological approach to single-trial analyses has varied. Many studies treated some of these single-trial measures such as latency jitter as task-irrelevant variation to be corrected in order to calculate true P300 amplitude (Roth et al., 1980; Walhovd et al., 2008). Other studies focused solely on intra-subject variation, both to prove its neurophysiological significance and its relation to normal and pathological measures (Ritter et al., 1972; Kutas et al., 1977; Blankertz et al., 2011; Biscaldi et al., 2016; Ouyang et al., 2017). Here we propose that the seemingly conflicting results or partial results all fit into a unified approach considering the multiplicity of P300 mechanisms behind the variations in the classic grand-averaged P300, and that examining the single-trial measures in addition to the averaged P300 would be a more informative step toward decoupling the different possible mechanisms.
FIGURE 2 | Grand-average visually-evoked P300 ERPs from central parietal channel (Pz) at baseline visit (top, red) and follow-up visit (bottom, blue) across all participants (N = 30) for both attended (target) and ignored (standard) conditions. Follow-up assessment was taken at either 6 months or 12 months after baseline assessment. Shaded areas indicate the standard error of the mean. P300 ERPs showed no significant difference between baseline and follow-up visits at the group level.
FIGURE 3 | Scatter plots showing the correlation between the changes of grand-averaged P300 ERP amplitude and (A) the change in the proportion of P300-absent trials (% trials with no ERP, not the number of trials), (B) the change in single-trial P300 amplitude mean, and (C) the change in latency SD. All correlations are statistically significant, suggesting that there are multiple mechanisms underlying the changes seen on the averaged ERP level. Dotted green lines indicate a 95% confidence band for the regression curve.
Abnormalities in P300 have been implicated in a wide range of neurological disorders, including delayed-onset PTSD, depression, and neuropsychological and cognitive deficits due to mTBI (Kimble et al., 2000; Karl et al., 2006; Javanbakht et al., 2011; Johnson et al., 2013; Proudfit et al., 2015), all of which are risk factors for our specific cohort of combat-exposed yet currently clinically healthy veterans. Again attention is directed to the essential work of Ford et al. (1994), who found that schizophrenics present increased latency jitter, an increased fraction of trials that do not elicit a P300 and smaller single-trial ERP amplitudes. Many studies have tried to link these deficits and disorders with potential underlying causes, such as structural or vascular changes. Glushakova et al. (2014) have shown evidence of evolving white matter degeneration following TBI, associated with microvascular abnormalities leading to blood-brain barrier damage and progressive inflammatory responses (Araki et al., 2005; Glushakova et al., 2014; Taib et al., 2017). Each of these injuries would have different electrophysiological implications. Reduction in P300 amplitude may be associated with a multitude of abnormalities, including reduced volume of the anterior cingulate cortex gray matter (Araki et al., 2005) and reduced white matter integrity (Fjell et al., 2011; Tamnes et al., 2012), all of which could contribute to increasing P300 latency delay and variability. Loss of dopamine D1 receptors in caudate and DLPFC (MacDonald et al., 2012) reported in a longitudinal study in Parkinson patients found shortened P300 latency significantly related to dopaminergic systems. Future studies are needed to determine the relations between the individual single-trial measures and their possible structural and physiological causes.
Several limitations of this study should be considered. First, this was a group that began the study with no clinically diagnosed pathologies. Moreover, the P300 latency and amplitude can be influenced by several internal and external factors such as exercise (Yagi et al., 1999), fatigue (Haubert et al., 2018), age and gender (Polich and Herbst, 2000; Ribeiro and Castelo-Bianco, 2019). Taking into consideration within-subject ERP variability between visits, especially months apart, in addition to the baseline normalcy of our cohort, the participants ideally should be more accurately separated into three groups (deteriorated, stable, and improved) instead of two groups (deteriorated and improved). Last, our linear correlation results may ignore nonlinearity from an individual's cognitive capacity to compensate for injury (Wang et al., 2018). Future studies with greater sample size are needed to properly explore the contributions of each single-trial measure to the strengthening or weakening of the average P300 ERP.
In conclusion, we propose that single-trial analysis may, therefore, serve as a valuable approach to assess cognitive processing and mental health. We demonstrated evidence of multiple electrophysiological mechanisms underlying the variation in averaged ERP amplitude. Here, we propose a unified approach of multiple P300 mechanisms influencing the variations in the classic grand-averaged P300, and that examining the single-trial measures in addition to the averaged P300 could be a step towards decoupling the different possible mechanisms.

DATA AVAILABILITY STATEMENT
The datasets for this manuscript are held by the United States Department of Defense and are not publicly available. Requests to access the datasets should be directed to Paul Rapp (paul.rapp@usuhs.edu).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Institutional Review Board, Uniformed Services University of the Health Sciences in accordance with all applicable Federal regulations governing the protection of research participants. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
AT acquired the data, conducted the primary analysis and wrote the first draft of the article. PR designed the study and participated in the analysis of the data and revision of the article. CW participated in the acquisition of the data and its analysis. MC established inclusion/exclusion of the participants and conducted the psychological evaluation of participants. DD conducted the statistical analysis and participated in the design of the study. DN participated in data acquisition. MR participated in the design of the study. CC designed and maintained the data acquisition system. DK directed laboratory operations, participated in data acquisition and in the drafting of the manuscript.

FUNDING
We would like to acknowledge support from the Uniformed Services University, the Defense Medical Research and Development Program and the Center for Neuroscience and Regenerative Medicine. Funding was provided by the Center for Neuroscience and Regenerative Medicine Project 351030 and by the Defense Medical Research and Development Program Project D10_1_AR_J5_605.