Underwater Acoustic Source Localization via Kernel Extreme Learning Machine

Fiber-optic hydrophones have attracted extensive research interest because of their advantages in underwater target detection in the ocean. Here, the kernel extreme learning machine (K-ELM) is applied to source localization in an ocean waveguide. As a data-driven machine learning method, K-ELM does not need a priori environmental information, unlike the conventional method of matched field processing. Acoustic source localization is formulated as a supervised classification problem, and the normalized sample covariance matrix (SCM) formed over a number of snapshots is used as the input. Using simulated data, the K-ELM is trained to classify SCMs into different range and depth classes, so that the source position can be estimated directly from the normalized SCMs. The results show that the K-ELM method achieves high accuracy in both range and depth localization. The proposed K-ELM method provides an alternative approach to underwater source localization in the ocean, especially when little a priori environmental information is available.


INTRODUCTION
Underwater source localization in ocean waveguides is a vital task in military and civilian fields and has become a research focus in applied ocean acoustics [1]. Fiber-optic hydrophone (FOHP) technology is a promising approach to acoustic wave measurement and is considered a viable alternative to conventional piezoelectric needle and membrane hydrophones [2]. FOHPs overcome several limitations of conventional hydrophones, such as inaccurate reproduction of high negative pressures, a tendency toward cavitation, low bandwidth, poor electromagnetic shielding, and ageing. The classical methods for extracting the source location from measured acoustic signals are model-based, and the most widely used of them is matched field processing (MFP) [3][4][5][6][7][8][9][10][11]. MFP requires a priori environmental information, for example, the sound speed profile (SSP) and the acoustic properties of the seafloor. The measured sound pressure field is matched with replica fields computed from a given propagation model and these environmental parameters, and the location at which the measured field best matches the replica field is taken as the estimated source location. Accurate a priori environmental information and an appropriate propagation model are therefore essential for good localization results. Unfortunately, the environmental parameters vary and are hard to acquire accurately, which makes MFP difficult to apply in practice.
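To illustrate the matching step, the sketch below implements the conventional Bartlett processor, one common MFP objective that scores each candidate location by the power w^H C w of its unit-norm replica w against the measured covariance matrix C. This is an assumption for illustration only: the text does not say which MFP processor is used, and the random replica vectors are hypothetical stand-ins for fields that a propagation model would compute from the environmental parameters.

```python
import numpy as np


def bartlett_mfp(C, replicas):
    """Bartlett matched-field processor.

    C        : (M, M) Hermitian sample covariance matrix of the measured field.
    replicas : (K, M) complex replica fields, one row per candidate (range, depth).
    Returns a (K,) array with the Bartlett power w^H C w for each unit-norm replica w.
    """
    W = replicas / np.linalg.norm(replicas, axis=1, keepdims=True)
    # sum_ij conj(w_i) C_ij w_j for every replica row
    return np.einsum("ki,ij,kj->k", W.conj(), C, W).real


# Toy example: hypothetical random replicas instead of a propagation model.
rng = np.random.default_rng(0)
M, K, true_idx = 21, 50, 17
replicas = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
noise = 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
p = replicas[true_idx] + noise             # "measured" field at the true location
C = np.outer(p, p.conj())                  # single-snapshot covariance matrix
power = bartlett_mfp(C, replicas)
est_idx = int(np.argmax(power))            # best-matching candidate location
```

If the replicas were computed with wrong environmental parameters, the peak of `power` could move to the wrong candidate, which is the sensitivity described above.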
Recently, the rapid development of machine learning methods and their successful application in many conventional fields have increased interest in data-driven techniques, such as deep neural networks, support vector machines, and random forests [12,13]. These algorithms perform well in signal processing [14,15], computer vision [16], natural language processing [17], and medicine [18]. They learn the latent relationship between input and output from large amounts of data and, compared with conventional model-based methods, can be more accurate and robust. Many machine learning algorithms have been used to process underwater acoustic signals for ship classification, direction-of-arrival estimation, target tracking, and acoustic source localization [19][20][21][22][23]. These studies aim to improve localization accuracy and reduce the dependence on environmental information. However, previously proposed algorithms commonly depend heavily on the initial parameter selection, and some feed-forward neural network-based algorithms suffer from overfitting. Considerable experience and effort are usually needed to design the architecture and to select parameters such as the number of hidden layers and neurons, the loss function, and the activation function. New methods are therefore desirable to improve the accuracy and robustness of underwater acoustic source localization.
In this article, the kernel-based extreme learning machine (K-ELM) is introduced to localize underwater acoustic sources in the ocean. K-ELM is an improved form of ELM, a single-hidden-layer neural network that is trained without the back-propagation algorithm [24][25][26]. ELM therefore offers approximation capability comparable to that of traditional feed-forward neural networks with less training time. However, the randomly initialized parameters of ELM make its performance unstable. K-ELM replaces the randomly initialized hidden layer with a kernel function, which has the same approximation capability as the conventional hidden layer but performs more stably. Moreover, K-ELM does not require the number of hidden neurons, an important parameter in most machine learning algorithms, to be designed. Here, source localization is solved by K-ELM as a classification task. The sound pressure signal measured by a vertical linear array (VLA) is preprocessed and converted into a sample covariance matrix (SCM). K-ELM is trained with pairs of simulated SCMs and the corresponding range or depth labels. Note that two K-ELM models with different parameters (output weight matrices) are needed to predict the range and the depth. The performance of K-ELM is then evaluated in simulation, where the K-ELM method achieves high localization performance in terms of both accuracy and processing time.
The remainder of this article is organized as follows. The principle of ELM and K-ELM is introduced in the Theory and Model section. The sound propagation model is described, and the application of K-ELM for source localization is given in the Ocean Underwater Source Localization Using K-ELM section. The simulation results and discussion are given in the Results and Discussion section. The conclusions of this paper are presented in the Conclusion section.

THEORY AND MODEL
A typical single-hidden-layer neural network consists of three layers (input, hidden, and output), and its structure is depicted in Figure 1 [11].
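For N input samples, let $\mathbf{x}_i=[x_{i1},x_{i2},\ldots,x_{in}]^T\in\mathbb{R}^n$ be the $i$th input. The predicted output $\mathbf{y}_i$ is expressed as follows [24]:

$$\mathbf{y}_i=\sum_{j=1}^{p}\boldsymbol{\beta}_j\,h_j(\mathbf{x}_i)=\sum_{j=1}^{p}\boldsymbol{\beta}_j\,\sigma(\mathbf{w}_j\cdot\mathbf{x}_i+b_j),\qquad i=1,\ldots,N, \tag{1}$$

where $p$ is the number of hidden nodes, $\boldsymbol{\beta}_j=[\beta_{j1},\beta_{j2},\ldots,\beta_{jm}]^T\in\mathbb{R}^m$ is the weight vector connecting the $j$th hidden node to the output nodes, $h_j(\mathbf{x})$ is the output function of the $j$th hidden node, $\mathbf{w}_j=[w_{j1},w_{j2},\ldots,w_{jn}]^T\in\mathbb{R}^n$ is the weight vector connecting the input nodes to the $j$th hidden node, $b_j$ is the bias of the $j$th hidden node, and $\sigma(\cdot)$ is the activation (excitation) function. If $\sigma(\cdot)$ is infinitely differentiable and there are enough hidden nodes, the predictions can approximate the ideal outputs with zero error, $\|\mathbf{y}_i-\mathbf{t}_i\|=0$, where $\mathbf{t}_i=[t_{i1},t_{i2},\ldots,t_{im}]^T\in\mathbb{R}^m$ is the label of $\mathbf{x}_i$. For simplicity, the above N equations can be rewritten in matrix form as

$$\mathbf{H}\mathbf{B}=\mathbf{T}, \tag{2}$$

where $\mathbf{H}$ is the $N\times p$ hidden layer output matrix with entries $H_{ij}=h_j(\mathbf{x}_i)$, $\mathbf{B}=[\boldsymbol{\beta}_1,\boldsymbol{\beta}_2,\ldots,\boldsymbol{\beta}_p]^T$ is the output weight matrix, and $\mathbf{T}=[\mathbf{t}_1,\ldots,\mathbf{t}_N]^T$ is the target matrix. Thus, $\mathbf{B}$ can be obtained as follows [24]:

$$\mathbf{B}=\mathbf{H}^{\dagger}\mathbf{T}, \tag{3}$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$. To avoid singularity when inverting $\mathbf{H}\mathbf{H}^T$, the regularization term $\mathbf{I}/e$ is added to its main diagonal, so that the output weights become [24]

$$\mathbf{B}=\mathbf{H}^T\left(\frac{\mathbf{I}}{e}+\mathbf{H}\mathbf{H}^T\right)^{-1}\mathbf{T}, \tag{4}$$

where $\mathbf{I}$ is the identity matrix and $e$ is the penalty coefficient. The predicted output is then

$$\mathbf{Y}=\mathbf{H}\mathbf{B}=\mathbf{H}\mathbf{H}^T\left(\frac{\mathbf{I}}{e}+\mathbf{H}\mathbf{H}^T\right)^{-1}\mathbf{T}. \tag{5}$$

Frontiers in Physics | www.frontiersin.org April 2021 | Volume 9 | Article 653875

Here, we introduce the kernel function to enhance the stability of feature extraction. The kernel matrix is defined as follows [26]:

$$\boldsymbol{\Omega}_{\mathrm{ELM}}=\mathbf{H}\mathbf{H}^T,\qquad \Omega_{\mathrm{ELM}}(i,j)=h(\mathbf{x}_i)\cdot h(\mathbf{x}_j)=K(\mathbf{x}_i,\mathbf{x}_j), \tag{6}$$

where $h(\mathbf{x})=[h_1(\mathbf{x}),\ldots,h_p(\mathbf{x})]$ is the hidden layer output row vector. For low-dimensional data that are not linearly separable, the kernel function implicitly maps the data into a high-dimensional space in which they become separable. Many kernel functions exist; the one used in this work is the Gaussian kernel $K(\mathbf{u},\mathbf{v})=\exp(-c\|\mathbf{u}-\mathbf{v}\|^2)$, where $c=1/(2\sigma^2)$ and $\sigma$ is the standard deviation (kernel width), not the activation function of Eq. (1).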
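The regularized ELM solution can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the sigmoid activation, the standard-normal initialization of the random input weights and biases, and the toy sin(x) regression target are all assumptions chosen for demonstration. The output weights follow B = H^T(I/e + HH^T)^{-1}T and are computed with a linear solve rather than an explicit inverse.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def elm_fit(X, T, p, e, rng):
    """Regularized ELM: random hidden layer, closed-form output weights.

    X: (N, n) inputs, T: (N, m) targets, p: number of hidden nodes,
    e: penalty coefficient. Returns the random hidden parameters and B.
    """
    N, n = X.shape
    W = rng.standard_normal((n, p))          # input weights w_j (never trained)
    b = rng.standard_normal(p)               # hidden biases b_j (never trained)
    H = sigmoid(X @ W + b)                   # (N, p) hidden layer output matrix
    # B = H^T (I/e + H H^T)^{-1} T, computed with a linear solve instead of an inverse
    B = H.T @ np.linalg.solve(np.eye(N) / e + H @ H.T, T)
    return W, b, B


def elm_predict(X, W, b, B):
    """Y = H B for (new) inputs X."""
    return sigmoid(X @ W + b) @ B


# Toy regression: fit sin(x) on [-3, 3].
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 200)[:, None]
T = np.sin(X)
W, b, B = elm_fit(X, T, p=100, e=1e6, rng=rng)
mse = float(np.mean((elm_predict(X, W, b, B) - T) ** 2))
```

Only `B` is learned; rerunning with a different random seed changes `W` and `b`, which is the source of the instability that motivates the kernel version.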
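Then, the output function of the K-ELM classifier can be written as follows [26]:

$$f(\mathbf{x})=\left[K(\mathbf{x},\mathbf{x}_1),\ldots,K(\mathbf{x},\mathbf{x}_N)\right]\left(\frac{\mathbf{I}}{e}+\boldsymbol{\Omega}_{\mathrm{ELM}}\right)^{-1}\mathbf{T}. \tag{7}$$

For classification, the input is assigned to the class whose output node has the largest value. Because the hidden layer enters only through the kernel, the number of hidden neurons does not need to be specified. This simplicity gives K-ELM great potential for underwater acoustic source localization.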
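A minimal K-ELM classifier with the Gaussian kernel K(u, v) = exp(−c||u − v||²) might look as follows. The one-hot target matrix, the argmax decision rule, and the values of c and e in the usage example are assumptions for illustration, since the source does not report its hyperparameters. Training reduces to a single linear solve for (I/e + Ω)^{-1}T, and no hidden-layer size appears anywhere.

```python
import numpy as np


def gaussian_kernel(U, V, c):
    """K(u, v) = exp(-c ||u - v||^2) for all row pairs of U (N1, n) and V (N2, n)."""
    sq = np.sum(U ** 2, axis=1)[:, None] + np.sum(V ** 2, axis=1)[None, :] - 2.0 * U @ V.T
    return np.exp(-c * np.maximum(sq, 0.0))   # clip tiny negatives from round-off


class KELMClassifier:
    def __init__(self, c=1.0, e=1e3):
        self.c, self.e = c, e   # kernel parameter c = 1/(2 sigma^2), penalty coefficient e

    def fit(self, X, labels):
        self.X = X
        self.classes, idx = np.unique(labels, return_inverse=True)
        T = np.eye(len(self.classes))[idx]           # one-hot targets, (N, m)
        Omega = gaussian_kernel(X, X, self.c)         # kernel matrix replacing H H^T
        # alpha = (I/e + Omega)^{-1} T, so f(x) = [K(x, x_1), ..., K(x, x_N)] alpha
        self.alpha = np.linalg.solve(np.eye(len(X)) / self.e + Omega, T)
        return self

    def predict(self, Xq):
        scores = gaussian_kernel(Xq, self.X, self.c) @ self.alpha
        return self.classes[np.argmax(scores, axis=1)]   # largest output node wins


# Toy usage: three well-separated 2D clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 30)
X = centers[labels] + 0.5 * rng.standard_normal((90, 2))
model = KELMClassifier(c=0.5, e=100.0).fit(X, labels)
X_new = centers[labels] + 0.5 * rng.standard_normal((90, 2))
acc = float(np.mean(model.predict(X_new) == labels))
```

Because Ω is N × N, the cost of training grows with the number of training samples rather than with a chosen hidden-layer size; for a few thousand SCMs a dense solve remains inexpensive.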

OCEAN UNDERWATER SOURCE LOCALIZATION USING K-ELM

Physical Signal Model
Consider a single narrowband sound source whose signal impinges on a vertical linear array of M FOHP sensors in a far-field scenario. The measured sound pressure field is generated by a source of frequency ω located at range y and depth z.
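Considering the presence of noise during propagation, the sound pressure field in the frequency domain can be expressed as follows [4]:

$$\mathbf{x}(\omega)=\left(1+\alpha(\omega)\right)s(\omega)\,\mathbf{g}(\omega,\theta,y,z)+\mathbf{n}(\omega), \tag{8}$$

where $\mathbf{x}(\omega)\in\mathbb{C}^M$ is the measured pressure field at frequency ω, obtained by taking the discrete Fourier transform of the received pressure signal over an observation period; $\alpha(\omega)$ and $\mathbf{n}(\omega)$ are the complex multiplicative noise and the additive noise; $s(\omega)$ is the source spectrum at frequency ω; and $\mathbf{g}(\omega,\theta,y,z)$ is the transfer function, which depends on the frequency ω, the environmental parameters θ, the source range y, and the source depth z. Furthermore, for a far-field broadband source at position (y, z) with frequencies $\omega_q$, $q=1,\ldots,Q$, the pressure field can be written as follows [6]:

$$\mathbf{x}=\mathbf{G}(\theta,y,z)\,\mathbf{s}+\mathbf{n}, \tag{9}$$

where Q is the number of discrete frequency bins; $\mathbf{x}=[\mathbf{x}^T(\omega_1),\ldots,\mathbf{x}^T(\omega_Q)]^T$ is an extended vector covering both the narrowband and broadband cases; $\mathbf{n}=[\mathbf{n}^T(\omega_1),\ldots,\mathbf{n}^T(\omega_Q)]^T$ is defined similarly to $\mathbf{x}$; $\mathbf{s}=[(1+\alpha(\omega_1))s(\omega_1),\ldots,(1+\alpha(\omega_Q))s(\omega_Q)]^T$ is a Q-dimensional vector; and $\mathbf{G}(\theta,y,z)$ is the QM × Q block-diagonal matrix [23]

$$\mathbf{G}(\theta,y,z)=\begin{bmatrix}\mathbf{g}(\omega_1,\theta,y,z) & & \mathbf{0}\\ & \ddots & \\ \mathbf{0} & & \mathbf{g}(\omega_Q,\theta,y,z)\end{bmatrix}. \tag{10}$$

The multiplicative noise is modeled as a complex random perturbation factor $\alpha=|\alpha|\exp(j\phi)$, which is commonly assumed to follow a complex Gaussian distribution. The perturbation factor level is defined as $10\log_{10}|\alpha(\omega)|^2$ (dB). The additive noise is modeled as complex Gaussian with zero mean and variance $\delta^2$. Because the signal level decreases with increasing range, the signal-to-noise ratio (SNR) is defined at the most distant range bin $y_{\max}$ as follows [18]:

$$\mathrm{SNR}=10\log_{10}\frac{\|s(\omega)\,\mathbf{g}(\omega,\theta,y_{\max},z)\|^2}{M\delta^2}. \tag{11}$$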
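The narrowband signal model x(ω) = (1 + α(ω))s(ω)g(ω, θ, y, z) + n(ω) can be sketched as below. No propagation code is included, so the transfer function g is a hypothetical placeholder (in this work it comes from KRAKEN), and two definitions are assumptions of this sketch: the perturbation level is taken as 10 log10 E|α|², and the SNR as 10 log10(||s g||²/(Mδ²)), from which the additive-noise variance δ² follows.

```python
import numpy as np


def complex_gaussian(size, var, rng):
    """Circularly symmetric complex Gaussian samples with E|z|^2 = var."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))


def simulate_snapshot(g, s, snr_db, perturb_db, rng):
    """One narrowband snapshot x = (1 + alpha) s g + n.

    g          : (M,) complex transfer function (a hypothetical stand-in for a
                 KRAKEN-computed field)
    s          : complex source amplitude at this frequency
    snr_db     : SNR, assumed here to be 10 log10(||s g||^2 / (M delta^2))
    perturb_db : perturbation level, assumed here to be 10 log10(E|alpha|^2)
    Returns the noisy snapshot and the additive-noise variance delta^2.
    """
    M = g.shape[0]
    alpha = complex_gaussian(1, 10.0 ** (perturb_db / 10.0), rng)[0]
    clean = s * g
    delta2 = np.sum(np.abs(clean) ** 2) / (M * 10.0 ** (snr_db / 10.0))
    n = complex_gaussian(M, delta2, rng)
    return (1.0 + alpha) * clean + n, delta2


rng = np.random.default_rng(0)
g = np.ones(21, dtype=complex)            # hypothetical flat transfer function
x, delta2 = simulate_snapshot(g, s=1.0, snr_db=0.0, perturb_db=-10.0, rng=rng)
```

For the broadband model, one could call the function once per frequency bin ω_q and stack the resulting snapshots into x = [x^T(ω_1), ..., x^T(ω_Q)]^T.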

K-ELM Method for Source Localization
Here, K-ELM is employed for source localization through the following steps:
1) Simulate the acoustic pressure signals for different ranges and depths.
2) Add noise with different SNRs to the ideal simulated signals to generate noisy signals for testing.
3) Preprocess the input data: transform the measured sound pressure signal into the frequency domain with the discrete Fourier transform, then compute the SCMs and vectorize them as the input data of the K-ELM model.
4) Use the SCMs of the ideal signals as the training data and the SCMs of the noisy signals as the testing data.
5) Train two K-ELM models, one for range and one for depth prediction, on the training set.
6) Predict the source range and depth on the testing set with the trained K-ELM models.
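The six steps above can be strung together in a toy end-to-end sketch. Everything physical is replaced by assumptions: the "transfer functions" are random complex vectors (hypothetical stand-ins for KRAKEN fields), there are only 40 candidate positions, the SNR follows the assumed definition ||g||²/(Mδ²), and the kernel parameter c and penalty e are arbitrary. The point is the data flow: ideal SCMs train the model, and noisy SCMs test it.

```python
import numpy as np

rng = np.random.default_rng(1)
M, Ns, n_classes = 21, 2, 40   # sensors, snapshots per SCM, candidate positions (toy sizes)

# Step 1 (hypothetical): random complex transfer functions, one per candidate
# position, standing in for fields computed by a propagation model.
G = rng.standard_normal((n_classes, M)) + 1j * rng.standard_normal((n_classes, M))


def snapshots(g, snr_db=None):
    """Ns snapshots s_k * g with a random source phase; optional white noise (Step 2)."""
    s = np.exp(2j * np.pi * rng.random(Ns))
    X = s[:, None] * g[None, :]
    if snr_db is not None:
        delta2 = np.mean(np.abs(g) ** 2) / 10.0 ** (snr_db / 10.0)
        X = X + np.sqrt(delta2 / 2.0) * (rng.standard_normal(X.shape)
                                         + 1j * rng.standard_normal(X.shape))
    return X


def scm_features(X):
    """Step 3: normalize snapshots, average outer products, stack Re/Im of upper triangle."""
    P = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = np.einsum("si,sj->ij", P, P.conj()) / P.shape[0]
    iu = np.triu_indices(C.shape[0])
    return np.concatenate([C[iu].real, C[iu].imag])


def kernel(U, V, c):
    """Gaussian kernel exp(-c ||u - v||^2) between all rows of U and V."""
    sq = np.sum(U ** 2, 1)[:, None] + np.sum(V ** 2, 1)[None, :] - 2.0 * U @ V.T
    return np.exp(-c * np.maximum(sq, 0.0))


# Step 4: SCMs of ideal signals for training, of noisy signals for testing.
y_train = np.arange(n_classes)
X_train = np.array([scm_features(snapshots(g)) for g in G])
X_test = np.array([scm_features(snapshots(g, snr_db=5.0)) for g in G])

# Step 5: K-ELM training is one linear solve for the output weights.
c, e = 1.0, 1e3
T = np.eye(n_classes)[y_train]
alpha = np.linalg.solve(np.eye(n_classes) / e + kernel(X_train, X_train, c), T)

# Step 6: classify each noisy test SCM by its largest K-ELM output.
y_pred = np.argmax(kernel(X_test, X_train, c) @ alpha, axis=1)
accuracy = float(np.mean(y_pred == y_train))
```

Random vectors are far easier to tell apart than real modal fields at neighboring ranges, so the accuracy of this toy says nothing about the results reported in the Results and Discussion section.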
The training samples are simulated at different ranges and depths. The source range varies from 1 km to 8 km in steps of 5 m, and the source depth varies from 10 m to 200 m in steps of 1 m. Therefore, the range and depth training sets contain 1,400 and 190 samples, respectively. The testing samples are generated by adding noise to the training samples, from which 561 samples are selected to build the test set.
The structure of K-ELM is self-adaptive, so the numbers of input and hidden neurons do not need to be designed by hand. The only parameter to be trained is the output weight matrix. We apply K-ELM to source localization in both the narrowband case (Q = 1, single frequency) and the broadband case (Q ≥ 2, multiple frequencies). Note that signals with different frequency compositions are divided into groups, and training and evaluation are conducted with data from the same group.
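Before the K-ELM is applied for source localization, the data are preprocessed to extract features and reduce redundancy. First, to reduce the effect of the source spectrum s(ω), the complex sound pressure $\mathbf{p}(\omega)$ at frequency ω is normalized as follows [23]:

$$\tilde{\mathbf{p}}(\omega)=\frac{\mathbf{p}(\omega)}{\|\mathbf{p}(\omega)\|}=\frac{\mathbf{p}(\omega)}{\sqrt{\mathbf{p}^H(\omega)\,\mathbf{p}(\omega)}}, \tag{12}$$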
where ||·|| denotes the ℓ2 norm and (·)^H denotes the complex conjugate transpose. To obtain an accurate localization result, the normalized SCM C(ω) is formed from the normalized sound pressure $\tilde{\mathbf{p}}_s(\omega)$ at the sth snapshot and averaged over N_s snapshots:

$$\mathbf{C}(\omega)=\frac{1}{N_s}\sum_{s=1}^{N_s}\tilde{\mathbf{p}}_s(\omega)\,\tilde{\mathbf{p}}_s^H(\omega), \tag{13}$$
where N_s is the number of snapshots. Since C(ω) is Hermitian (conjugate symmetric), its upper triangular entries, including the diagonal, carry all of its information. The real and imaginary parts of these M(M + 1)/2 entries are separated and stacked into a real 1D vector c(ω) with M × (M + 1) elements in the single-frequency case (Q = 1). For the broadband case (Q ≥ 2), the input vector is formed by concatenating the single-frequency vectors as $[\mathbf{c}^T(\omega_1),\mathbf{c}^T(\omega_2),\ldots,\mathbf{c}^T(\omega_Q)]^T$, where each $\mathbf{c}(\omega_q)$ is computed in the same way as in the single-frequency case.
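The preprocessing chain (per-snapshot normalization, SCM averaging, upper-triangle vectorization, and broadband concatenation) can be sketched as follows. The feature ordering used here (all real parts first, then all imaginary parts, row by row over the upper triangle) is an assumption; any fixed ordering works as long as training and testing use the same one.

```python
import numpy as np


def normalized_scm(P):
    """P: (Ns, M) complex snapshots at one frequency.
    Each snapshot is divided by its 2-norm, then the outer products are averaged."""
    P_tilde = P / np.linalg.norm(P, axis=1, keepdims=True)       # p / ||p||
    return np.einsum("si,sj->ij", P_tilde, P_tilde.conj()) / P.shape[0]


def scm_to_vector(C):
    """Keep the upper triangle (diagonal included) of the Hermitian C and
    stack real and imaginary parts: M(M+1)/2 complex entries -> M(M+1) reals."""
    iu = np.triu_indices(C.shape[0])
    return np.concatenate([C[iu].real, C[iu].imag])


def broadband_input(P_by_freq):
    """Concatenate per-frequency vectors [c(w_1); ...; c(w_Q)] for the broadband case.
    P_by_freq: list of (Ns, M) snapshot arrays, one per frequency bin."""
    return np.concatenate([scm_to_vector(normalized_scm(P)) for P in P_by_freq])


rng = np.random.default_rng(0)
M, Ns, Q = 21, 2, 3
P = rng.standard_normal((Ns, M)) + 1j * rng.standard_normal((Ns, M))
C = normalized_scm(P)
c_vec = scm_to_vector(C)                    # 21 * 22 = 462 real features
x_bb = broadband_input([rng.standard_normal((Ns, M)) + 1j * rng.standard_normal((Ns, M))
                        for _ in range(Q)])  # 462 * Q features
```

Scaling every snapshot by a complex factor, which mimics an unknown source spectrum s(ω), leaves C(ω) unchanged; this invariance is the purpose of the normalization step.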
With range and depth intervals of 5 m and 1 m, 1,400 and 190 groups of training data are simulated for range and depth, respectively. The normalized SCM at each range is formed over two 1-s snapshots, according to Eq. (13). In this work, the VLA has 21 elements. Thus, each input vector has 21 × (21 + 1) = 462 elements in the narrowband case and 21 × (21 + 1) × Q = 462Q elements in the broadband case.

RESULTS AND DISCUSSION
Simulations are conducted to evaluate the performance of the proposed method. In this section, we use KRAKEN [27] to simulate the acoustic data in a shallow-water waveguide similar to that of the SWellEx-96 experiment [28], as illustrated in Figure 2. The environment parameters are considered range-independent and depth-independent. Four layers are considered: the water layer, the sediment layer, the mudstone layer, and the seafloor half-space. The SSP of the water layer is shown in Figure 2, and the parameters of the environment are given in Table 1.
The source range varies from 1 km to 8 km in steps of 5 m, and the depth varies from 10 m to 200 m in steps of 1 m. The source signal is assumed to contain a series of tones ({49, 94, 148, 235, 388} Hz), as in the SWellEx-96 experiment. The VLA consists of 21 hydrophones spanning a …

We adopt two measures to quantify the performance of K-ELM in the source localization task: the mean absolute percentage error (MAPE) and the 10% error interval. They are defined as

$$\mathrm{MAPE}=\frac{100\%}{L}\sum_{i=1}^{L}\frac{|R_{g,i}-R_{t,i}|}{R_{t,i}}, \tag{14}$$

$$R_u=1.1\,R_t,\qquad R_l=0.9\,R_t, \tag{15}$$

where L is the number of samples in the test set, and R_g and R_t are the range predicted by K-ELM and the true range, respectively. The 10% error interval defines the upper bound R_u and the lower bound R_l with respect to the true range R_t.
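Both metrics are straightforward to compute. The sketch below assumes that the 10% interval is [0.9 R_t, 1.1 R_t] and reports MAPE in percent, matching how the results below quote it; the three sample ranges are made-up illustration values, not data from this study.

```python
import numpy as np


def mape(r_pred, r_true):
    """Mean absolute percentage error in percent: 100/L * sum |R_g - R_t| / R_t."""
    r_pred, r_true = np.asarray(r_pred, float), np.asarray(r_true, float)
    return 100.0 * np.mean(np.abs(r_pred - r_true) / r_true)


def within_10_percent(r_pred, r_true):
    """Boolean mask: is each prediction inside [R_l, R_u] = [0.9 R_t, 1.1 R_t]?"""
    r_pred, r_true = np.asarray(r_pred, float), np.asarray(r_true, float)
    return (r_pred >= 0.9 * r_true) & (r_pred <= 1.1 * r_true)


r_true = [1000.0, 2000.0, 4000.0]   # illustrative true ranges (m)
r_pred = [1000.0, 2100.0, 5000.0]   # illustrative predictions (m)
err = mape(r_pred, r_true)          # mean of 0%, 5%, 25%
mask = within_10_percent(r_pred, r_true)
```

Here the third prediction is 25% off, so it falls outside the shaded interval even though it contributes only one term to the average.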
The K-ELM model is evaluated under three SNRs: 5 dB, 0 dB, and −5 dB. For each SNR, both narrowband and broadband sources are considered, and the perturbation factor level is set to −10 dB in all cases. Figure 3 shows the range localization results of K-ELM under different SNRs in the narrowband and broadband cases. For simplicity, only the results for 235 Hz and {94, 148, 388} Hz are presented in Figure 3. The predicted ranges are plotted as red circles, the true ranges as a blue line, and the 10% error interval as a gray shaded area. As Figure 3 shows, the range localization accuracy degrades as the SNR decreases. In the narrowband case, the predictions are similar at 5 dB and 0 dB but degrade markedly at −5 dB, especially for ranges near both ends of the interval; even so, most predicted ranges remain within the 10% error interval. The MAPE in this case (narrowband, −5 dB) is 8.27%. The broadband case performs better than the narrowband case, especially at the low SNR of −5 dB, and its predictions fluctuate and degrade only slightly as the SNR decreases. At −5 dB, the MAPE of K-ELM with the {94, 148, 388} Hz signal is 0.11%, about 75 times smaller than in the narrowband case. By comparison, the MAPE of the MFP method with the {94, 148, 388} Hz signal is 33.5%, much larger than that of K-ELM.
Next, detailed MAPEs for different frequencies and SNRs are given in Table 2. K-ELM performs much better in the broadband case than in the narrowband case. For example, the broadband signal composed of the two low frequencies {49, 94} Hz outperforms the single-frequency 49 Hz and 94 Hz cases, and even the best narrowband case, 388 Hz. The advantage of the broadband case is most pronounced at low SNRs (<0 dB), where the minimum MAPEs for the narrowband and broadband cases are 5.05% and 0.07%, respectively, a 72-fold reduction. Some consistent trends can also be observed. In the narrowband case, the accuracy increases with signal frequency. In the broadband case, the accuracy depends on two factors: the number of frequency bins and the highest frequency of the broadband signal. The MAPE decreases gradually as the number of frequency bins increases, and among broadband signals with the same number of frequency bins, the one with higher frequencies achieves better accuracy; for example, the MAPEs of the {94, 148, 388} Hz and {49, 94, 235} Hz cases at −5 dB are 0.11% and 0.16%, respectively. In short, multi-frequency and higher-frequency source signals carry more information, which leads to better performance. Finally, K-ELM takes about 0.05 s to process 560 signals; this processing speed is desirable for real-time applications in practice.
The performance of K-ELM for source depth localization is tested next. Depth prediction is also treated as a classification task. The source depth is simulated from 10 m to 200 m in steps of 1 m. The prediction results for three SNRs in the narrowband and broadband cases are illustrated in Figure 4. For simplicity, only the results for 235 Hz and {94, 148, 388} Hz are presented in Figure 4.
According to Figure 4, depth localization shows trends similar to those of range localization. The accuracy degrades as the SNR decreases in both the narrowband and broadband cases, and the broadband signal again greatly improves the performance. At low SNR (−5 dB), the MAPE is 7.53% in the narrowband case and 0.09% in the broadband case, an 87-fold reduction. Under the same conditions, the MAPE of … Detailed MAPEs of depth localization are listed in Table 3, as for range localization. The MAPE distribution in Table 3 follows trends similar to those in Table 2, and the MAPEs of depth prediction are generally smaller than those of range prediction. The minimum MAPEs at low SNR for the narrowband and broadband cases are 4.82% and 0.04%, respectively, an 80-fold reduction.
The K-ELM algorithm yields reasonable predictions for acoustic source localization. Because its output weights are computed in closed form, it reaches the global optimum of its regularized least-squares training objective without iterative tuning, which is an advantage for signal processing applications.

CONCLUSION
In summary, a machine learning method, K-ELM, is proposed to localize a single underwater acoustic source in an ocean waveguide. Simulated sound pressure signals received by fiber-optic hydrophones at different frequencies and SNRs are used to investigate the performance of K-ELM for source localization. The acoustic pressure signal measured by the VLA is transformed into the frequency domain and preprocessed into normalized SCMs as the input of the K-ELM, which classifies these SCMs into different ranges or depths. The results show that K-ELM performs well in both range and depth localization under various frequencies and SNRs. In particular, at an SNR of −5 dB, the lowest MAPEs of range localization for the narrowband and broadband cases are 0.08 and 0, and those of depth localization are 0.06 and 0.00, respectively. Moreover, combining narrowband signals into a broadband signal greatly improves the prediction accuracy; the maximum reductions in MAPE for range and depth localization are 136-fold and 146-fold, respectively. The processing time of the K-ELM method for 560 samples is only 0.05 s, a high processing speed that makes real-time underwater acoustic source localization possible. Thus, K-ELM provides an accurate and efficient approach to ocean underwater source localization.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
ZH and JH conceived the study, collected the literature, and wrote the manuscript; PX and MN were involved in writing the paper; KL and GL were involved in reviewing and editing the paper. All authors contributed to the discussion of the results.