Automatic Recognition of Auditory Brainstem Response Characteristic Waveform based on BiLSTM

Background: The auditory brainstem response (ABR) test is widely used in newborn hearing screening and in the diagnosis of hearing disorders. Identifying and marking ABR characteristic waveforms is a challenging and repetitive task because the waveform rules are complex and the recordings are disturbed by background noise. Methods: This study proposes an automatic method for recognizing ABR characteristic waveforms. First, a binary label is created for each of the 1024 sampling points. The characteristic area selected from the ABR data is 0-8 ms, and the marked area is enlarged to expand the feature information and reduce marking error. Second, a bidirectional long short-term memory (BiLSTM) network is established to strengthen the relevance between sampling points, and an ABR sampling-point classifier is obtained by training. Finally, mark points are obtained through thresholding. Results: The network structure, related parameters, recognition performance, and noise resistance were explored on 614 sets of clinical ABR data, and the recognition accuracy for waves I, III, and V reached 92.91%. Discussion: The proposed method can reduce the repetitive work of doctors while meeting accuracy requirements, and therefore has clinical potential.


1. Introduction
The ABR is the electrical activity of nerve impulses in the brainstem auditory conduction pathway evoked by acoustic stimulation. It reflects the functional status of the auditory nerve and lower auditory center, as well as the conduction ability of the brainstem auditory pathway [1,2]. Because a patient's hearing impairment can be diagnosed without the patient's active cooperation, ABR has become a routine method for newborn hearing screening and adult hearing disease diagnosis [3,4,5]. The ABR waveform usually appears within a short latency of 10 ms, and the potential is recorded in microvolts. An ABR recording usually contains seven positive-phase waves, which are denoted in clinical medicine by the Roman numerals I, II, III, IV, V, VI, and VII. Among them, waves I, III, and V are often used as the basis for clinical diagnosis because of their prominent features [6,7]. Figure 1 shows annotated ABR waveforms, in which waves I, III, and V are the ones mainly identified clinically. The other characteristic waves are usually not displayed clearly because of their small amplitude, the fusion of adjacent waves, and noise interference, so they are rarely used as a basis for diagnosis.

Fig. 1 Annotated ABR waveforms
In clinical diagnosis, the minimum short-sound stimulation intensity at which wave V can be elicited is usually taken as the ABR threshold. When wave III is larger than wave V, the ABR threshold is sometimes judged by the stimulation intensity of wave III [8]. The location of a lesion can be judged from the latencies of waves I, III, and V, the interwave latencies, and the interaural latency differences [9]. Furthermore, the type of deafness can be judged by observing how the ABR latency changes, and whether the waveform takes a special shape, in the same patient under different stimulation levels. Thus, identifying the positions of the ABR characteristic waves yields the ABR threshold and the latencies of waves I, III, and V, which are of great significance in clinical applications. The acquired auditory brainstem evoked potential is usually very weak. In clinical testing, multiple tests must therefore be performed and the responses superimposed and averaged to obtain a relatively stable waveform. This process is susceptible to interference from spontaneous signals. In addition, electrodes placed on the top of the skull or on the mastoid are disturbed by the external environment, which results in indistinct peaks, overlapping peaks, and false peaks in the evoked potential waveform. Patients usually have to be tested several times and the results compared, which not only takes a lot of time but is also prone to subjective judgment errors. Thus, identifying the waveform characteristics of ABR while avoiding the interference caused by unclear differentiation, fuzzy characteristics, and abnormal waveforms is an important problem that urgently needs to be solved in clinical ABR interpretation.
Applying computer technology to assist medical diagnosis can effectively reduce the errors caused by repetitive work and complex waveform characteristics, and it has long been an important research direction for ABR interpretation [10]. For example, Wilson [11] discussed the relationship between the ABR and waveforms reconstructed by the discrete wavelet transform, indicating that the discrete wavelet transform of the ABR can serve as an effective time-frequency representation of the normal ABR, but with certain limitations. In particular, the reconstructed discrete wavelet transform waves of the ABR are missing in some cases because the discrete wavelet transform is shift-variant. Bradly and Wilson [12] further studied the use of derivative wavelet estimation to analyze the ABR automatically, which improved the accuracy of main wave identification to a high level. However, they also noted that the recognition performance on abnormal subjects needs further research, and that abnormal waveforms still require manual judgment under clinical conditions. Zhang et al. [13] proposed an ABR classification method that combines the wavelet transform and a Bayesian network to reduce the number of stimulus repetitions and avoid nerve fatigue of the examinee. Important features were extracted through image thresholding and wavelet transform, and were then used as variables for classification with Bayesian networks. The experimental results show that ABR data obtained with only 128 repeated stimulations can achieve an accuracy of 84.17%. Compared with the clinical test, which usually requires 2000 repetitions, the detection efficiency of ABR is improved greatly, but the latency is prolonged by 0.1 ms. Moreover, the I-III and III-V wave intervals tend to become equal, which is unfavorable when the wave interval is used as a diagnostic index.
As a heuristic method, the neural network can compensate for the insufficient feature extraction of traditional methods and thus improve recognition accuracy, which has made it a research direction in recent years [14]. For example, Gholami-Boroujeny et al. [15] proposed a nonlinear adaptive noise cancellation algorithm based on a multilayer perceptron neural network and compared it with a linear adaptive noise cancellation algorithm based on least mean square adaptive filtering. The results show that their method requires less recording time and performs better at low signal-to-noise ratios. Molina et al. [16] proposed a method to classify ABR through symbolic pattern discovery. First, the numerical time series is converted into a symbolic time series. Then, symbolic pattern discovery is applied to the resulting symbol sequence. Finally, a classification technique based on the discovered patterns is used to classify new individuals. The researchers stated that medical personnel accept this method easily because the system outputs diagnostic results in medical terminology. Fallatah et al. [17] proposed a new algorithm for detecting speech ABR. By using a constructed spectral feature vector as the input of the neural network, the detection time is shortened without reducing the accuracy. Compared with four methods based on wavelet transform and approximate entropy artificial neural networks, this method has higher recognition accuracy and requires only a small amount of running time. Although neural networks have been explored in the field of ABR interpretation, existing work focuses mainly on preprocessing or qualitative judgment and cannot quantify the location of the characteristic waveform. In addition, the established models cannot fully extract the ABR waveform characteristics, so meeting clinical accuracy requirements is difficult. Therefore, automatically identifying ABR waveforms with artificial neural networks remains an important challenge.
In summary, automatic recognition of ABR waveforms through computer-assisted methods can provide diagnostic evidence and effectively assist clinicians and audiologists in ABR interpretation. It also reduces the errors caused by subjective factors, the interference of complex waveforms, and the burden that a large number of repetitive tasks places on medical staff. Neural networks have long-term research value in the recognition of ABR characteristic waveforms. This study proposes a method that uses a BiLSTM network to identify waves I, III, and V in the ABR waveform, and offers a new idea for the recognition of ABR characteristic waveforms by neural networks. The study is organized as follows: Chapter 2 presents the experimental data and a detailed description of the proposed method. Chapter 3 presents the experimental design and the corresponding results. Finally, Chapter 4 discusses this work.

Data Source
The data were provided by the Department of Otolaryngology Head and Neck Surgery, Chinese PLA General Hospital. The SmartEP evoked potential test system developed by the American Smart Listening Company was used for measurement and acquisition. Three clinical audiologists marked the characteristic waves I, III, and V in all data, and the marks were cross-validated. Finally, the data were randomly divided into a training set and a test set. A total of 491 recordings were used to train the network model, and 123 recordings were used for the final recognition accuracy test.
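The random division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_dataset`, the fixed seed, and the use of integer indices in place of actual recordings are hypothetical and are not taken from the study.

```python
import random
from typing import List, Tuple


def split_dataset(n_total: int = 614, n_test: int = 123,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split recording indices into training and test indices."""
    indices = list(range(n_total))
    # A fixed seed (an assumption) makes the split reproducible.
    random.Random(seed).shuffle(indices)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return train_idx, test_idx


train_idx, test_idx = split_dataset()
print(len(train_idx), "training recordings,", len(test_idx), "test recordings")
```

With the counts reported in the text, the 614 recordings split into 491 for training and 123 for testing.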

Data Processing
To quantify the waveform and the label points, two 1024×1 matrices were generated as the classification training data and the labels of the 1024 sampling points, respectively. The original training data are denoted by A. In practice, because the ratio of labeled to unlabeled values among the 321 sample points is only 3:318, the loss function can easily reach a low value without the network learning sufficient information. The manually labeled information may also introduce certain errors. Thus, this study widened the position of each identification point in the training label. The four points (0.1 ms) before and after each original marking point were also marked as the characteristic area, which expands the marking range of the characteristic waveform.
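The label expansion described above can be illustrated with a minimal sketch. The function names, the example mark positions, and the sampling interval of 0.025 ms (inferred here from "four points (0.1 ms)") are assumptions made for illustration, not values confirmed by the study.

```python
from typing import List

N_POINTS = 321             # sample points in the selected area (from the text)
HALF_WIDTH = 4             # four points (0.1 ms) on each side of a mark
DT_MS = 0.1 / HALF_WIDTH   # assumed sampling interval: 0.025 ms per point


def expand_labels(mark_indices: List[int], n_points: int = N_POINTS,
                  half_width: int = HALF_WIDTH) -> List[int]:
    """Return a 0/1 label vector with each mark widened by half_width points."""
    labels = [0] * n_points
    for idx in mark_indices:
        lo = max(0, idx - half_width)
        hi = min(n_points - 1, idx + half_width)
        for i in range(lo, hi + 1):
            labels[i] = 1
    return labels


def latency_ms(index: int, dt_ms: float = DT_MS) -> float:
    """Convert a sample index into a latency in milliseconds."""
    return index * dt_ms


# Hypothetical mark positions for waves I, III, and V (illustrative only).
marks = [60, 150, 230]
labels = expand_labels(marks)
print(sum(labels), "of", len(labels), "points are labeled 1")
```

For three marks that are well separated and away from the edges, each mark grows from 1 to 9 labeled points, so the labeled-to-unlabeled ratio changes from 3:318 to 27:294.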

Network structure
A BiLSTM is adopted as the network structure so that the points of the input sequence are related to one another [18]. Figure 4 shows that another LSTM layer that ... U and b are the weights and biases. Finally, the memory cell C_t that is passed to the next time step is calculated by using Eq. (7), where ⊙ is the Hadamard product, which multiplies the corresponding positions of the matrices. The right side refers to the output gate, whose output o_t is calculated by using Eq. (8). The predicted output weight V and bias c are applied to the output value to obtain the predicted value ŷ_t, as shown in Eq. (10).
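For reference, the widely used textbook formulation of a single LSTM cell is sketched below. It is given as an assumption about the general form of the equations referred to above, not as a reproduction of them: the gate subscripts, the input weights W, and the softmax output are notational choices made here, and only U, b, C_t, o_t, V, c, and ŷ_t appear in the text.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_a x_t + U_a h_{t-1} + b_a\right) && \text{(candidate state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(memory cell)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)} \\
\hat{y}_t &= \mathrm{softmax}\left(V h_t + c\right) && \text{(prediction)}
\end{aligned}
```

In a bidirectional network, one such cell processes the sequence forward and another processes it backward, and the two hidden states at each time point are typically concatenated before the output layer.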

Wavelet Transform
Among traditional methods, the wavelet transform is commonly used in ABR extraction and recognition research [19]. In ABR extraction, the wavelet transform removes noise by reconstructing the signal from the detail components of selected frequency bands, which makes the ABR waveform smoother. It can also yield relatively clear waveforms with fewer repeated stimulations.
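The idea of reconstructing a signal from selected wavelet components can be sketched in plain Python. The sketch below makes several assumptions that are not taken from the study: it uses the Haar wavelet (the wavelet family is not named in the text), pads the signal by repeating its last value, and uses the hypothetical function name `wavelet_smooth`. The default settings keep the approximation at level 6 and the details at levels 4, 5, and 6, matching the decomposition described in the experiments.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (length must be even)."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_inverse_step(approx: Sequence[float],
                      detail: Sequence[float]) -> List[float]:
    """Invert one Haar level, interleaving the reconstructed samples."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def wavelet_smooth(signal: Sequence[float], levels: int = 6,
                   keep_details: Sequence[int] = (4, 5, 6)) -> List[float]:
    """Decompose, zero the detail levels not listed, and reconstruct."""
    n = len(signal)
    block = 2 ** levels
    padded_len = -(-n // block) * block        # round up to a multiple
    x = list(signal) + [signal[-1]] * (padded_len - n)
    details: List[List[float]] = []
    for level in range(1, levels + 1):
        x, d = haar_step(x)
        if level not in keep_details:
            d = [0.0] * len(d)                 # discard this frequency band
        details.append(d)
    for d in reversed(details):                # coarsest level first
        x = haar_inverse_step(x, d)
    return x[:n]


# Synthetic example: a slow oscillation plus fast alternating "noise".
noisy = [math.sin(i / 20.0) + 0.2 * (-1) ** i for i in range(321)]
smooth = wavelet_smooth(noisy)
print(len(smooth))
```

Zeroing the level 1 to 3 details removes the highest-frequency bands, which is why the reconstructed curve looks smoother. A smoother wavelet family would normally be preferred in practice; the Haar wavelet is used here only because it keeps the sketch short.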

Results
In this study, three sets of experiments were designed: (1) a comparison between various network structures, (2) a comparison experiment on the wavelet transform, and (3) a comparison experiment on different numbers of hidden layer nodes. Figure 6 shows the experimental flowchart. The sequence input layer received the potential values of the 321 sampling points, and the data were passed to several LSTM or BiLSTM layers, followed by a fully connected layer. The classification probability of each time point was calculated using the softmax function, and a classification layer was connected last. The cross-entropy function [21] was used as the loss function.
In the comparative experiment on the wavelet transform, noise was added to all data as interference. Seven different network structures were used for testing. The training data preprocessed by the wavelet transform were used as the experimental group, and the original training data were used as the control group. In this experiment, the ABR data were decomposed into six levels, and the approximation component of the 6th level and the detail components of the 4th, 5th, and 6th levels were retained to reconstruct the waveform. The parameter configuration was kept consistent. The network was trained with 5 K-fold cross-validation (K=9), and the test results were averaged.
Four recognition results of ABR data were randomly selected and are presented in Figure 8. After thresholding, the output vectors of the model were converted into feature points, whose positions are close to those of the manual labels. These results indicate that the proposed method is feasible and has clinical potential. To verify the recognition accuracy further, this work carries out a quantitative comparison across network structures, wavelet transform processing, and the number of hidden neurons.
Fig. 8 Recognition results of four data, where a, b, c, and d are manual labels, and e, f, g, and h are the outputs of the model.
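The thresholding step that converts a per-point output vector into mark points can be sketched as follows. The threshold of 0.5, the rule of taking the centre of each contiguous run, and the function name `probabilities_to_marks` are assumptions made for illustration; the text does not specify them.

```python
from typing import List, Sequence


def probabilities_to_marks(probs: Sequence[float],
                           threshold: float = 0.5) -> List[int]:
    """Return the centre index of every contiguous run with prob >= threshold."""
    marks: List[int] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                           # a run begins
        elif p < threshold and start is not None:
            marks.append((start + i - 1) // 2)  # run ended at i - 1
            start = None
    if start is not None:                       # run reaches the last sample
        marks.append((start + len(probs) - 1) // 2)
    return marks


# Synthetic output vector: one high-probability run around index 60.
probs = [0.9 if 56 <= i <= 64 else 0.0 for i in range(321)]
print(probabilities_to_marks(probs))
```

Because the training labels were widened to nine points around each mark, taking the centre of a detected run is one natural way to map the classifier output back to a single mark position.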

Comparison between multiple network structures
Generally, an error scale of 0.2 ms is applied as the tolerance for clinically marked points. A maximum allowable error ME was therefore set. All identification points were traversed, and a prediction was counted as correct if the distance between the latency of the prediction point and that of the identification point was within ME. From the number of correct prediction points r_p and the total number of marked points p_n, the accuracy rate is calculated as ACC = r_p / p_n. The results in Table 1 indicate that ACC increases with the number of stacked layers. Once the BiLSTM network reaches three layers, ACC no longer increases significantly, while the network gradually approaches over-fitting and the computational load grows because of the excessive number of parameters. Thus, the three-layer BiLSTM network is the better choice.
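The accuracy criterion above can be sketched in a few lines. The matching rule used here (a marked point counts as correct if any predicted point lies within ME of it), the function name `accuracy`, and the example latencies are assumptions for illustration, not details confirmed by the study.

```python
from typing import Sequence


def accuracy(predicted_ms: Sequence[float], labeled_ms: Sequence[float],
             max_error_ms: float = 0.2) -> float:
    """Fraction of labeled points with a prediction within max_error_ms."""
    if not labeled_ms:
        raise ValueError("at least one labeled point is required")
    correct = 0
    for true_latency in labeled_ms:
        # Correct if any predicted latency falls inside the tolerance window.
        if any(abs(pred - true_latency) <= max_error_ms
               for pred in predicted_ms):
            correct += 1
    return correct / len(labeled_ms)


# Hypothetical latencies (ms) for waves I, III, and V.
labeled = [1.5, 3.75, 5.75]
predicted = [1.6, 3.7, 6.1]
acc = accuracy(predicted, labeled, max_error_ms=0.2)
print(f"ACC = {acc:.4f}")
```

In this example two of the three marked points are matched within 0.2 ms, so the accuracy is 2/3. Tightening ME to 0.1 ms, the second error scale used in the experiments, can only lower or preserve the accuracy.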

Wavelet transform experiment
To test the ACC with the wavelet transform, the ABR data were decomposed into six levels, and the approximation component of the 6th level and the detail components of the 4th, 5th, and 6th levels were retained to reconstruct the waveform. Figure 10 shows an example of the result filtered by the wavelet transform; the processed curve is smoother. The unprocessed ABR data served as the control. Detection and comparison were carried out at two error scales, 0.1 and 0.2 ms (Table 2). The recognition ACC results are shown in Figure 11.

Comparative experiments of different hidden layer nodes
Based on the above results, the three-layer BiLSTM network is the better choice. The ACC obtained with different numbers of hidden nodes was then examined (Table 3). Figure 9b shows the ACC results for 64, 128, 256, and 512 hidden layer nodes.
Recognition ACC increases with the number of hidden nodes, because a sufficient number of parameters allows the network to fit the data accurately. The ACC at the 0.2 ms error scale increases only slowly from 256 to 512 nodes and is essentially saturated. Considering the accuracy standard in practical applications and the training time cost that further hidden nodes would bring, a network with 512 hidden nodes is the better choice.
Furthermore, this work mainly discusses the characteristic wave recognition of short-sound ABR with a 96 dB stimulus, and only parameters such as latency and wave interval can be obtained. In clinical applications, many other indicators can still serve as a diagnostic basis, such as the relationship between the potential values at different stimulus intensities, the presence or disappearance of wave V, and the change in the latency of each characteristic wave. This also provides a new idea for subsequent computer-assisted ABR diagnosis and treatment.

4. Discussion
This work proposes an automatic recognition method for ABR characteristic waveforms using a BiLSTM network. The main purpose is to identify the positions of characteristic waves I, III, and V, which helps medical staff obtain relevant clinical test parameters such as latency and wave interval. A data quantification process is designed to analyze the characteristic waveform of ABR, including the selection of the potential signal area and the expansion of the label positions. The optimal network structure is obtained through multiple sets of comparative experiments. In experiments on 614 sets of clinically collected ABR waveforms, the network's overall recognition of the characteristic waves reached an ACC of 92.91%.
The experimental results show that the method offers a new idea for the identification of ABR characteristic waveforms and helps professionals obtain the latency parameters of ABR waveforms. The automatic identification method can obtain deeper information, effectively avoid the subjective judgment errors of medical staff during manual identification, reduce the number of repeated stimulations during the test, and thus avoid fatigue of the tested person. Because the proposed network model is robust to noise, it can effectively reduce the repeated testing of patients. In large-scale identification, the method takes only approximately 0.05 s per recording on average, which is much faster than manual identification. Thus, it has great advantages in repetitive work.