Monitoring of a Nearshore Small Dolphin Species Using Passive Acoustic Platforms and Supervised Machine Learning Techniques

Passive acoustic monitoring (PAM) is increasingly being adopted as a non-invasive method for the assessment of ocean ecological dynamics. PAM is an important sampling approach for acquiring critical information about marine mammals, especially in areas where data are lacking and where evaluations of threats for vulnerable populations are required. The Indo-Pacific humpback dolphin (IPHD, Sousa chinensis) is a coastal species which inhabits tropical and warm-temperate waters from the eastern Indian Ocean throughout Southeast Asia to central China. A new population of this species was recently discovered in waters southwest of Hainan Island, China. An array of passive acoustic platforms was deployed at depths of 10–20 m (the preferred habitat of humpback dolphins), across sites covering more than 100 km of coastline. In this study, we explored whether the acoustic data recorded by the array could be used to classify IPHD echolocation clicks, with the aim of investigating the spatiotemporal patterns of distribution and acoustic behavior of this species. A number of supervised machine learning algorithms were trained to automatically classify echolocation clicks from the different types of short-broadband pulses recorded. The best performance was reported by a cubic support vector machine (Cubic SVM), which was applied to 19,215 5-min recordings (∼4.2 TB), collected over a period of 75 days at six locations. Subsequently, using spectrogram visualization and audio listening, human operators confirmed the presence of clicks within the selected files. Additionally, other dolphin vocalizations (including whistles, buzzes, and burst pulses) and different sound sources (soniferous fishes, snapping shrimps, human activities) were also reported. The detection range of IPHD clicks was estimated using a transmission loss (TL) model and the performance of the trained classifier was compared with data synchronously collected by an acoustic data logger (A-tag). 
This study demonstrates that the distribution and habitat use of a coastal and resident dolphin species can be monitored over a large spatiotemporal scale, using an array of passive acoustic platforms and a data analysis protocol that includes both machine learning techniques and spectrogram inspection.

INTRODUCTION
Cetaceans evolved from terrestrial ancestors, and their morphology and physiology have adapted to enable them to live as air-breathers in the vast ocean (Perrin et al., 2009). In conjunction with this, their methods of interacting with conspecifics and the surrounding environment have also evolved, and sound is their primary resource for communication (Au, 1993; Tyack and Clark, 2000). Species from the Delphinidae family have exceptional acoustic capabilities and rely on sounds for many vital life functions (e.g., navigation, foraging, feeding, intraspecies communication, coordination, and predator detection). However, determining the distribution of dolphins and their habitat use presents a significant challenge (Tyack, 2008). In comparison to visual and photo-identification surveys, which only provide information on surface activities over limited time periods, passive acoustic monitoring (PAM) can be applied continuously over extended periods of time, even in rough weather conditions (Mellinger et al., 2007). PAM is a non-invasive and reliable method for surveying mobile and phonating marine organisms, which can provide information regarding species distribution and activity at high spatiotemporal resolutions (Wang et al., 2016; Hildebrand et al., 2019; Monczak et al., 2019). It is especially useful for marine mammals that are otherwise difficult to visually monitor in the natural environment (Cato and McCauley, 2001).
The use of multiple synchronized acoustic recorders can improve our understanding of cetacean movements and vocalizations across large sampling areas and can be useful for studying changes in acoustic behavior over a range of spatial and temporal scales. Large networks of acoustic and seismic sensors have been used to study the calling behavior of baleen whales (Tervo et al., 2012; Guazzo et al., 2017; Fournet et al., 2018; Leroy et al., 2018) using low-frequency data from seafloor seismic surveys (Wilcock, 2012; Caruso et al., 2016). Arrays of recorders have also been utilized to investigate the occurrence and distribution of cetacean species with smaller home ranges, such as resident coastal dolphins (Munger et al., 2016; Temple et al., 2016; Giorli and Au, 2017; Lammers et al., 2017). In recent years, the identification of dolphin vocalizations using large acoustic datasets has improved our ability to study these animals in the wild (Zimmer, 2011). Moreover, the correlation between dolphin acoustic behavior and ambient noise (natural, anthropogenic, and other biological sources) can provide additional insights into their ecology and habitats (Lammers et al., 2017; Marley et al., 2017). In particular, the monitoring of coastal dolphins is essential to identify critical areas of habitat use and mitigate the impacts of anthropogenic activities (Ingram and Rogan, 2002).
Our improved ability to acquire acoustic data continuously from multiple recording stations over extended periods of time has increased the use of automatic signal identification techniques (Zimmer, 2011) and machine learning approaches (Bianco et al., 2019) for detecting marine mammal vocalizations (Shiu et al., 2020). Several detection and classification algorithms have been developed to identify cetacean sounds, particularly the echolocation clicks of toothed whales (Soldevilla et al., 2008; Roch et al., 2011, 2015; Giorli et al., 2016; Caruso et al., 2017; Giorli and Goetz, 2019; Hildebrand et al., 2019). Compared to manual analysis methods (spectrogram analysis), such algorithms have the advantage of being able to quickly and reliably analyze large amounts of data and produce standardized measures of sound characteristics that can be used for statistical analysis (Giorli et al., 2016; Caruso et al., 2017). Among automatic detection systems for small cetaceans (porpoises and dolphins), one approach is the use of data loggers that identify echolocation clicks and provide rapid information regarding animal distribution and biosonar activity (Akamatsu et al., 2005, 2011; Bailey et al., 2010). These instruments save information from detection events and a number of other signal parameters without keeping audio data (Zein et al., 2019). This allows the user to collect data over extended periods of time without the need to recharge the device's batteries and without requiring large amounts of memory. This method is limited, however, in that it is unable to provide a full acoustic description of the behavior and surrounding habitat of the species recorded (e.g., vocal repertoire, other sound sources, effect of noise, soundscape trends).
The Indo-Pacific humpback dolphin (IPHD, Sousa chinensis) is a species of coastal Delphinidae which inhabits tropical and warm-temperate waters from eastern India throughout Southeast Asia to central China (Jefferson and Rosenbaum, 2014). At least seven or eight populations are found in Chinese waters (Jefferson and Hung, 2004; Jefferson et al., 2017). The IPHD is classified as "Vulnerable" on the International Union for Conservation of Nature and Natural Resources (IUCN) Red List, is listed as a Grade I National Key Protected Animal by China's Wild Animal Protection Law (issued in 1988), and its protection is of global interest. However, the distribution and abundance of the IPHD are yet to be comprehensively evaluated (Chen et al., 2009; Jefferson et al., 2017). Generally, it is known that the IPHD prefers enclosed inshore habitats (e.g., estuarine areas, bays, rocky reefs, coastal lagoons, and mangrove swamps), which are shallower than 20-30 m and within a few km of the coast (Ross et al., 1994; Jefferson and Hung, 2004; Parra and Ross, 2009). Furthermore, humpback dolphins are extremely vocal and their acoustic behavior plays a fundamental role in both their recognition of the environment and in mediating social interactions (Van Parijs and Corkeron, 2001; Li et al., 2013).
According to local ecological knowledge and line transect boat-based surveys conducted since 2013, a new IPHD population was recorded in the waters southwest of Hainan Island (northern South China Sea, China) for the first time in 2014 (Li et al., 2016). This finding extended the known distribution of the IPHD more than 300 km southward in Chinese waters (Li et al., 2016). Subsequently, extensive boat-based surveys conducted over 4 years in the waters southwest of Hainan Island have reported over 30 sightings of IPHDs. This suggests that this population of IPHDs occupies a larger home range than other populations previously described in Chinese waters and in other regions of the world (Karczmarski and Cockcroft, 1998; Keith et al., 2002; Cagnazzi et al., 2011; Xu et al., 2015; Jefferson et al., 2017; Wu et al., 2017). However, little is known regarding this new population of IPHD (Li et al., 2016). Research is required to determine its spatiotemporal distribution and behavior, in order to identify its core habitat and develop conservation management plans, which may include the designation of marine protected areas (MPAs).
Several passive acoustic projects have been carried out on the world's largest population of IPHD in the Pearl River Estuary (China), which has the highest economic development and most severe impact from human coastal activities in China (He et al., 2014). These studies revealed important information regarding the species' biosonar activity, temporal occurrence, relationship with fish choruses (prey availability), and the potential impact of anthropogenic noise (Wang et al., 2014, 2019; Pine et al., 2017a,b). Long-term PAM of the IPHD has also been conducted at multiple sampling sites north of Lantau Island (Hong Kong), as a part of the environmental impact assessments for several coastal development projects (Munger et al., 2016). However, PAM has never been proposed to investigate a recently discovered population. Designing a sampling protocol to collect accurate information from large areas of shallow water is challenging. Nevertheless, passive acoustics, in the form of an array of sensors installed along the coastline, can be used to collect data from zones which are currently data-deficient for marine mammals and, consequently, from less-studied IPHD populations. Moreover, as specific acoustic behavior is identified across multiple PAM sites, ecological data from cetacean hotspots are acquired simultaneously. Thus, through the analysis of other sound sources recorded concurrently, the detection of dolphin vocalizations may provide additional information on their fine-scale activity (e.g., habitat use) within their supposed area of residence.
In this study, we demonstrate the use of PAM for investigating the IPHD population reported in the shallow waters of Hainan Island. A sampling plan and data analysis protocol were developed and tested. We aimed to apply this novel method to study the habitat of a vulnerable dolphin species, to assess the potential impacts associated with coastal construction activities, and to support the development of a Habitat Conservation Plan.

Study Area and Passive Acoustic Platforms
The study was carried out in the southwestern waters of Hainan Island, in the coastal area spanning from Sanya to Dongfang (Figure 1). Selection was based on preliminary results from previous periodical boat-based visual surveys and local ecological knowledge. During visual surveys, only IPHD and Indo-Pacific finless porpoise (IPFP, Neophocaena phocaenoides) were observed in the study area. An array of seven PAM platforms (named P#0 to P#6) was installed on the ocean floor in very shallow waters (10-20 m in depth, Figure 1A). Each platform consisted of a frame supporting a concrete weight and stainless-steel tubes that protected the hydrophone inside the structure (Figure 1B). PAM platform locations were selected based on information from previous visual surveys and the available literature regarding movements and migration patterns of global humpback dolphin populations, which reported ranges of approximately 100-150 km (Karczmarski and Cockcroft, 1998; Keith et al., 2002; Cagnazzi et al., 2011; Jefferson et al., 2017). In our study area, groups of dolphins were usually sighted traveling at less than 20 km/h. Therefore, a distance of approximately 15-20 km between each platform was chosen. The PAM sampling sites were not progressively named numerically from south to north because platform #0 was previously operated by another PAM study. The PAM platform deployments were carried out during a 5-day cruise between 25 February and 1 March 2018. The area around each monitoring site was characterized by a sandy-muddy seabed.
An acoustic stationary digital recorder (SoundTrap HF, Ocean Instruments Ltd., New Zealand) was mounted on a stainless-steel bar at the center of the PAM frame, at approximately 1.5 m above the seafloor (Figure 1B). The SoundTrap recorder had a linear frequency range of 20 Hz-150 kHz ± 3 dB, a self-noise of less than sea-state 0 in the bandwidth 100 Hz-2 kHz, and a sensitivity of -203 dB re 1 V/µPa (high gain setting with a maximum sound pressure level of 172 dB re 1 µPa peak to peak before clipping). A recording duty cycle of 5 min every 30 min (16.6%) was established, with a sampling frequency of 288 kHz and 16-bit quantization. The acoustic recording spanned 75 days, between 25 February and 10 May 2018, and consisted of 19,215 5-min recordings (∼4.2 TB, Figure 1C). Platform #0 was also equipped with an A-tag acoustic data logger (Marine Micro Technology, Japan) for the first 31 days of the study, between 26 February and 29 March 2018. One platform (P#2) was lost, most likely as a result of being removed or displaced by trawl fishing activities, and could not be recovered.

Data Analysis Protocol
Data analysis was based on the complementary use of machine learning techniques, in the form of signal processing and classification algorithms, and the manual analysis method based on spectrogram inspection.
Humpback dolphins are known to produce three main types of vocalization (Sims et al., 2012): short-broadband ultrasonic (echolocation) clicks, tonal frequency-modulated whistles, and rapid click-series of pulsed sounds (buzzes and burst pulses). Echolocation clicks are believed to be used primarily to acquire sensory information on the surrounding environment (i.e., to navigate and locate prey or other objects like reefs) (Au, 1993). Whistles, in contrast, may be commonly used for intra-species social communication and individual recognition (Sayigh et al., 1999; Wang et al., 2013). The functional significance of pulsed sounds (buzzes and burst pulses) is poorly understood in most dolphin species (Lammers et al., 2003), including the IPHD. However, buzzes are generally produced in the terminal phase of the biosonar-based foraging process, while burst pulses are typically isolated from echolocation clicks and are often produced during social interactions and long-range detection (Blomqvist and Amundin, 2004; Lammers et al., 2006; Finneran, 2013; Arranz et al., 2016).
Indo-Pacific humpback dolphins seem to emit echolocation clicks more frequently than other types of vocalizations, and when other vocalizations are produced, clicks are usually recorded alongside them (Munger et al., 2016). Echolocation clicks were previously considered the most reliable sound with which to detect the presence of IPHD using PAM (Munger et al., 2016). In general, the automatic detection of tonal signals from dolphins is more complex (Zimmer, 2011). Therefore, the presence of IPHD in our data was assessed by detecting and classifying echolocation clicks using custom algorithms, and subsequently, using manual analysis carried out by human operators (spectrogram analysis and audio listening).

Supervised Machine Learning Algorithm
Within machine learning, supervised learning is the process of learning patterns in data from information previously labeled by humans (Bianco et al., 2019). Labeled data constitute training data that are used to create a predictive classification tool. Thus, the users provide a known set of input data (observations) and a known response to the data (classes), and the algorithm iteratively makes predictions from the input to the desired output (Bianco et al., 2019). Machine learning models were trained to classify echolocation clicks recorded by the passive acoustic platforms to study the acoustic presence of IPHD.

Data-labeling
Acoustic recordings were collected using a SoundTrap during boat-based surveys. In the event of an IPHD or IPFP sighting, the SoundTrap was lowered into the water to record echolocation clicks. The same procedure was applied during different recording sessions of known conditions such as passing vessels or adverse weather conditions (e.g., heavy rain, high sea state). The SoundTrap used the same recording specifications as described in Section "Study Area and Passive Acoustic Platforms." An experienced researcher analyzed the waveform and spectrogram of the acquired signals using the software Raven Lite (Bioacoustics Research Program, Cornell Lab of Ornithology) and MATLAB (MathWorks, Natick, MA, United States). Short-broadband pulses were identified from acoustic sources of different nature (geophony, biophony, and anthropophony). The pulses (observations) were labeled and five classes of short-broadband pulses were defined considering previous knowledge as follows:
• IPFP: Clicks from Indo-Pacific finless porpoise.
• IPHD: Clicks from Indo-Pacific humpback dolphin.
• SNAP: Pulses produced by snapping shrimps.
• SHIP: Pulses produced by ships.
• RAIN: Pulses produced by rain.

Measuring signal features
Once the data had been labeled, a custom routine (Caruso et al., 2017), developed in MATLAB, was used to extract each pulse recorded during the boat-based surveys and to measure its time-frequency characteristics. The routine first applied a 3 kHz high-pass filter (Butterworth, fourth-order, zero-phase digital filtering). An energy detector based on the Teager-Kaiser Operator was then used to detect the short-broadband pulses (Kaiser, 1990; Klinck and Mellinger, 2011) that exceeded an "adaptive" threshold (Caruso et al., 2017). Each detected pulse was then extracted using a time window of 512 samples (1.8 ms) centered at the peak of the signal. Subsequently, the extracted pulses were oversampled (by a factor of two) and their acoustic features were measured. Parameters related to sound intensity were not considered because they were mainly influenced by an acoustic source of unknown position. The following acoustic features were measured:
• Pulse duration (d) [µs]: the duration of each pulse was determined from the peak value of the envelope of the Teager-Kaiser Operator. The onset and termination of the click signal were defined as the points at which 10% of the peak value was reached.
• Peak frequency (f_p) [kHz]: the frequency corresponding to the highest amplitude of the power spectral density (PSD) of the pulse.
• Centroid frequency (f_c) [kHz]: the frequency value that divides the pulse spectrum in halves of equal energy.
• Bandwidth [kHz]: the bandwidth of the pulse was parameterized by the -3 dB bandwidth (BW_-3dB) and the root-mean-square bandwidth (BW_RMS). The BW_-3dB is the frequency range in which the PSD is above half of its maximum. The BW_RMS is a measure of the spectral standard deviation around the centroid frequency of the spectrum (Madsen and Wahlberg, 2007). It can be used as a proxy for the frequency window over which the animal integrates both signal energy and noise (Moehl et al., 2003; Madsen and Wahlberg, 2007).
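The feature definitions above can be illustrated with a minimal Python sketch. It is not the MATLAB routine of Caruso et al. (2017): the high-pass filter, adaptive threshold, and oversampling steps are omitted, the Teager-Kaiser energy is used directly in place of a smoothed envelope, and the synthetic pulse (sampling rate, carrier frequency, Gaussian width) is an assumption made purely for illustration.

```python
import cmath
import math


def teager_kaiser(x):
    """Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1] (edges left at 0)."""
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def pulse_duration(x, fs):
    """Time between the first and last samples whose TK energy is >= 10% of its peak."""
    psi = teager_kaiser(x)
    threshold = 0.1 * max(psi)
    above = [n for n, v in enumerate(psi) if v >= threshold]
    return (above[-1] - above[0]) / fs


def power_spectrum(x, fs):
    """One-sided power spectrum from a naive DFT (adequate for a short pulse)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(s) ** 2)
    return freqs, power


def spectral_features(x, fs):
    freqs, power = power_spectrum(x, fs)
    total, p_max = sum(power), max(power)
    peak_freq = freqs[power.index(p_max)]
    # Centroid as defined in the text: frequency splitting the spectrum
    # into two halves of equal energy.
    running, centroid = 0.0, freqs[-1]
    for f, p in zip(freqs, power):
        running += p
        if running >= total / 2:
            centroid = f
            break
    # -3 dB bandwidth: span of the bins at or above half the maximum
    # (assumes a single dominant spectral lobe).
    strong = [f for f, p in zip(freqs, power) if p >= p_max / 2]
    bw_3db = strong[-1] - strong[0]
    # RMS bandwidth: spectral standard deviation around the centroid.
    bw_rms = math.sqrt(
        sum(p * (f - centroid) ** 2 for f, p in zip(freqs, power)) / total
    )
    return {"peak_hz": peak_freq, "centroid_hz": centroid,
            "bw_3db_hz": bw_3db, "bw_rms_hz": bw_rms}


# Hypothetical test pulse: Gaussian-windowed tone, 256 samples at 576 kHz.
FS = 576_000
F0 = 101_250
pulse = [
    math.exp(-((n - 128) ** 2) / (2 * 20 ** 2)) * math.sin(2 * math.pi * F0 * n / FS)
    for n in range(256)
]
features = spectral_features(pulse, FS)
duration_s = pulse_duration(pulse, FS)
```

The naive DFT keeps the sketch free of external dependencies; for real recordings an FFT library would be the practical choice.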

Creating training data
The input for the machine learning model was composed of the known set of input data (signal features) and the known responses to the data (labels of the five classes identified, Figure 2). Before the training phase, the values of each parameter within a single class were tested for a standard normal distribution using a one-sample Kolmogorov-Smirnov test. The test rejected the null hypothesis at the 5% significance level, indicating that the data were not normally distributed, so outliers were removed. Specifically, elements more than 1.5 interquartile ranges above the upper quartile or below the lower quartile were deleted from the training data.
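As an illustration of the 1.5-interquartile-range rule described above, the following Python sketch filters one feature of one class. The quartile interpolation method is an assumption (MATLAB and other tools may compute quartiles slightly differently), and the sample values are hypothetical.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Drop values more than k IQRs above the upper or below the lower quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical pulse durations (µs) for a single class; 100 is an outlier.
durations_us = [10, 11, 12, 13, 14, 15, 100]
cleaned = remove_outliers_iqr(durations_us)
```

In practice the filter would be applied separately to each feature within each class, as the text describes.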
In total, 51,238 short-broadband pulses were identified as training data. As far as possible, a similar number of pulses for each class was included to balance the input of the machine learning model. The number of pulses for each class was distributed as follows:
• IPFP: 24% of the total.

Training data exploration
Data were standardized in order to facilitate the comparison of parameters with different units. The distance of each data point from the mean in terms of the standard deviation was measured. Therefore, the standardized data set had a mean of 0 and a standard deviation of 1, and retained the shape properties of the original dataset (same skewness and kurtosis). Figure 3 shows a series of scatter plots for the parameters measured with the training data, grouped by the defined classes. These formed the training data necessary for the development of the supervised learning algorithm.
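The standardization step corresponds to a z-score transform applied to each feature. A minimal Python sketch follows; the use of the sample standard deviation (n − 1 denominator) and the example values are assumptions.

```python
import statistics


def standardize(values):
    """Return z-scores: distance of each value from the mean in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(v - mean) / sd for v in values]


# Hypothetical peak frequencies (kHz) for a handful of pulses.
peak_khz = [98.0, 101.0, 104.0, 110.0, 95.0, 120.0]
z = standardize(peak_khz)
```

Because the transform is linear, the ordering of the observations and the shape of their distribution are unchanged; only location and scale differ.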

Trained classifiers
Different types of classifiers (decision trees, discriminant analysis, support vector machines, and nearest neighbor classifiers) were trained in MATLAB considering the following criteria:
• Observations: 51,238.
FIGURE 2 | Signal features. Histograms showing the acoustic features measured for the five classes of short-broadband pulses. In blue, echolocation clicks emitted by Indo-Pacific finless porpoise (IPFP). In green, echolocation clicks emitted by Indo-Pacific humpback dolphin (IPHD). In red, pulses emitted by snapping shrimps (SNAP). In cyan, pulses generated by rain (RAIN). In magenta, pulses produced by ship (SHIP).
The validation process assessed the performance of each model using new data, compared to the training data. In this study, the k-fold cross-validation method was used to examine the predictive accuracy of the fitted models. This process evaluates the data across the entire training set, dividing it randomly into k groups and then training the model k times, each time leaving a different group out and using it instead as a validation set. Therefore, the cross-validation is repeated until each group has been used as the test set and the average test error is calculated over all groups. The k-fold strategy was used to prevent overfitting and for the selection of the best model.
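The k-fold procedure can be sketched in a few lines of Python. The classifier used here is a simple nearest-centroid rule standing in for the MATLAB models (it is not a support vector machine), and the value k = 5, the random seed, and the toy data are all assumptions for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(rows, labels):
    """Stand-in classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(model, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))


def cross_validate(rows, labels, k=5, seed=0):
    """Mean validation accuracy over k folds (1 minus the average test error)."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


# Toy data: two well-separated classes in a two-feature space.
X = [[0.1 * i, 0.0] for i in range(10)] + [[10 + 0.1 * i, 0.0] for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
accuracy = cross_validate(X, y, k=5)
```

Each observation is used for validation exactly once, so the averaged score reflects performance on data the model did not see during fitting.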
A classification confusion matrix can be used to visualize the performance of each classifier (Fawcett, 2006). Given a classifier (columns, predicted class) and a set of instances (rows, true class), a two-by-two confusion matrix can be constructed to represent the dispositions of a set of instances (Fawcett, 2006). Therefore, there are four possible outcomes: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). The confusion matrix forms the basis for many common performance metrics [TP rate = TP/P; FP rate = FP/N; Precision = TP/(TP + FP); Accuracy = (TP + TN)/TOT], where P represents the number of real positive cases in the data (P = TP + FN), N is the number of real negative cases in the data (N = FP + TN), and TOT is the total number of instances.
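The metrics listed above follow directly from the four counts. The Python sketch below also shows how a multi-class matrix can be collapsed to a two-by-two table for one class of interest; the dictionary layout and all counts are hypothetical.

```python
def one_vs_rest(matrix, positive):
    """Collapse matrix[true][predicted] counts to (TP, FP, TN, FN) for one class."""
    tp = matrix[positive][positive]
    fn = sum(c for pred, c in matrix[positive].items() if pred != positive)
    fp = sum(row[positive] for true, row in matrix.items() if true != positive)
    tn = sum(
        c
        for true, row in matrix.items() if true != positive
        for pred, c in row.items() if pred != positive
    )
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """TP rate, FP rate, precision, and accuracy as defined by Fawcett (2006)."""
    p, n = tp + fn, fp + tn
    return {
        "tp_rate": tp / p,
        "fp_rate": fp / n,
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (p + n),
    }


# Hypothetical counts for two classes; rows are true labels, columns predictions.
counts = {
    "IPHD": {"IPHD": 45, "SNAP": 5},
    "SNAP": {"IPHD": 10, "SNAP": 40},
}
metrics = binary_metrics(*one_vs_rest(counts, "IPHD"))
```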
The performance of each trained model (supervised machine learning algorithm) was checked considering the overall accuracy of the validation process (the score represented the accuracy of all observations). The classifier with the best performance was used to predict the new data acquired by the array of passive acoustic platforms.

Spectrogram Visualization and Listening
In the first stages of the development of automatic detection or classification algorithms for a specific and complex underwater habitat, it is imperative to test and confirm their reliability using manual spectrogram analysis (Caruso et al., 2017; Hildebrand et al., 2019). After applying the trained classifier to the data from the six PAM platforms, each file with pulses classified as "IPHD" was manually checked using the program Raven Lite. PAM expert operators analyzed the selected files, and also reported the presence of other signals of interest which were recorded concurrently with the IPHD vocalizations. Information was logged regarding:
• IPHD vocalizations (whistles and click-series of pulsed sounds).
• Human-related activities (ship noise, explosions, others; the category others included sonar, low frequency pulses, and all acoustic signals considered as man-made).
All these sounds have stereotypical and identifiable spectral components.

Detection Range
The classification process provided information on the time and signal features of the identified IPHD clicks. However, it was not possible to acquire information regarding the distance of detections or the correlated spatial abundance of the animals recorded (DeRuiter et al., 2010). To estimate the probability of click detection as a function of range in the study area, we estimated a transmission loss (TL) model for clicks emitted by IPHDs (Figure 4). Acoustic recordings were collected during IPHD sightings using a six-arm star type hydrophone array with 13 elements, and specific criteria were applied to identify on-axis clicks recorded close to the central hydrophone of the array (Jensen et al., 2009; Gong et al., 2019). The source spectral density level (SSD or SL, dB re 1 µPa/√Hz @ 1 m) of the most intense on-axis click was estimated and a TL model was applied for all frequency components to study the variation of PSD (dB re 1 µPa/√Hz) during signal propagation. We considered attenuation due to geometric spreading and absorption processes related to seawater properties (Francois and Garrison, 1982a,b). The geometric spreading model was calculated as spherical spreading in approximation for short and high-frequency echolocation clicks (Madsen and Wahlberg, 2007; Villadsgaard et al., 2007; DeRuiter et al., 2010). Propagation loss was simulated assuming the source (dolphin) at different ranges from the platforms (R = 100 m; R = 200 m; R = 500 m; R = 1000 m). The TL model applied the standard formulas for spherical spreading with absorption (R in m, α in dB/m):
\[ TL(R, f) = 20 \log_{10}(R) + \alpha(f)\, R \]
\[ RL(R, f) = SL(f) - TL(R, f) \]
The frequency-dependent absorption coefficient α was associated with temperature, hydrostatic pressure, pH, and salinity (Ainslie and McColm, 1998). These measurements were collected using a portable YSI Professional Plus device (Ohio, United States) and a Hondex PS-7 portable Depth Sounder (Honda Electronics, Japan).
The detection range of the platforms for IPHD echolocation clicks was estimated in relation to the average PSDs calculated at each PAM location (from the entire dataset). The received levels (RLs) considered the sound pressure received by the recording hydrophone with source at different distances (Figure 4).
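A spreading-plus-absorption model of this kind can be sketched in Python for a single frequency component. The source level, absorption coefficient, and noise level below are hypothetical placeholders rather than values from this study, and the absorption coefficient is passed in directly instead of being computed from temperature, depth, pH, and salinity.

```python
import math


def transmission_loss(range_m, alpha_db_per_m):
    """Spherical spreading (20 log10 R) plus linear absorption (alpha * R), in dB."""
    return 20 * math.log10(range_m) + alpha_db_per_m * range_m


def received_level(source_level_db, range_m, alpha_db_per_m):
    """Level at the hydrophone for a source at the given range."""
    return source_level_db - transmission_loss(range_m, alpha_db_per_m)


def max_detection_range(source_level_db, noise_level_db, alpha_db_per_m, ranges_m):
    """Largest simulated range at which the received level still exceeds the noise."""
    audible = [
        r for r in ranges_m
        if received_level(source_level_db, r, alpha_db_per_m) > noise_level_db
    ]
    return max(audible) if audible else None


# Hypothetical values for one frequency bin.
SIM_RANGES_M = [100, 200, 500, 1000]   # ranges simulated in the text
SOURCE_LEVEL_DB = 170.0                # assumed source spectral density level
ALPHA_DB_PER_M = 0.03                  # assumed absorption near 100 kHz
NOISE_LEVEL_DB = 100.0                 # assumed average ambient PSD

limit_m = max_detection_range(SOURCE_LEVEL_DB, NOISE_LEVEL_DB, ALPHA_DB_PER_M, SIM_RANGES_M)
```

Repeating the calculation across all frequency bins, with α evaluated at each frequency, would give received spectra that can be compared with the average PSD at each site.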

Spatiotemporal Distribution and Acoustic Behavior
The IPHD spatial distribution was evaluated across the PAM platforms, based on the number of files with click detections (confirmed by manual analysis). The number of clicks logged per file was also considered as an index of dolphin acoustic activity within the detection range of each platform.
The biosonar activity of the species was also analyzed in relation to the presence of possible temporal patterns. In particular, the diel cycle was explored in relation to the acoustic presence and echolocation activity of IPHDs in the study area and at each deployment location. The probability of detection was investigated based on the time of the day, which was divided into clusters of 4 h (00:00-04:00; 04:00-08:00; 08:00-12:00; 12:00-16:00; 16:00-20:00; 20:00-24:00). Generalized additive mixed models (GAMMs; Wood, 2017) were also used to evaluate the variation of dolphin click occurrence (presence/absence) and number of clicks detected at the daily scale. Binomial-based GAMMs with a logit link function were fitted using R (RStudio Inc., Boston, MA, United States). The hour of the day was used as a fixed effect and dolphin occurrence was transformed to a binary scale (1 = present and 0 = absent). The models were fitted using the R package "mgcv" (R Core Team, 2019; Wood, 2019), and the splines library was used to build cyclic cubic regression splines (Wood, 2017). Knots were capped at four smooths following the method described by Dinh et al. (2018). The number of clicks recorded during the entire day and night phase was also inspected, based on the mean time of sunrise and sunset in the study area (25 February-10 May; sunrise: 07:02, sunset: 18:26; acquired from the Time and Date AS website¹). The spatial occurrence of the other sounds identified during the manual analysis (IPHD vocalizations, other phonating animals, and anthropogenic sources) was also analyzed to provide insights regarding habitat use and ambient noise characteristics across the study area.
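The 4-h clustering of the diel cycle can be sketched in Python as follows. The sketch treats the probability of detection as the fraction of recordings in each time bin with confirmed clicks, which is an assumption about how the quantity was computed; the records are hypothetical, and the GAMM fitting performed in R is not reproduced here.

```python
from collections import defaultdict
from datetime import datetime

BIN_HOURS = 4


def diel_bin(timestamp):
    """Label of the 4-h cluster containing the timestamp, e.g. '20:00-24:00'."""
    start = (timestamp.hour // BIN_HOURS) * BIN_HOURS
    return f"{start:02d}:00-{start + BIN_HOURS:02d}:00"


def detection_probability(records):
    """Fraction of recordings per bin with clicks present.

    `records` is an iterable of (file start time, presence flag) pairs.
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for timestamp, present in records:
        label = diel_bin(timestamp)
        totals[label] += 1
        positives[label] += int(present)
    return {label: positives[label] / totals[label] for label in totals}


# Hypothetical recordings: start time and manually confirmed presence.
records = [
    (datetime(2018, 3, 1, 1, 0), True),
    (datetime(2018, 3, 1, 1, 30), False),
    (datetime(2018, 3, 1, 21, 0), True),
]
probabilities = detection_probability(records)
```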

A-tag Acoustic Data Logger
The A-tag is a submersible acoustic data logger (Marine Micro Technology, Saitama, Japan) which logs the occurrence of possible click events but does not record an audio file (Akamatsu et al., 2005; Li et al., 2010). The A-tag data logger consisted of a waterproof cylindrical aluminum case, two hydrophones (sensitivity of -201 dB re 1 V/µPa, frequency response ± 5 dB in the range of 100-160 kHz), a band-pass filter (-3 dB with a range of 55-235 kHz), an analog-to-digital converter, a CPU (PIC18F6620; Microchip Technology Inc., Chandler, AZ, United States) for system control and data processing, a 128 MB flash memory module for data storage, and two UM-1 batteries (Akamatsu et al., 2005).
¹ www.timeanddate.com
Dolphin sonar pulse events were extracted from the A-tag data logger using custom software written in Igor Pro 5.01 (WaveMetrics Inc., Lake Oswego, OR, United States). Following previous work in the study area, the following discriminating parameters were used to identify dolphin sonar pulse events: the minimum number of pulses (five); the maximum inter-click interval (ICI) in a click train (200 ms); and the pattern of change in the ICIs between adjacent pulses (smoothly changing patterns, i.e., each ICI greater than half and less than twice the duration of the previous one). After the click trains were extracted, a manual examination was also performed to eliminate false detections such as ship noise and pulses made by snapping shrimps, which were characterized by randomly changing SPLs and/or ICIs (Akamatsu et al., 2011; Dong et al., 2017). A confusion matrix was constructed in which the output of the acoustic classification process was taken as the true condition (columns) and the output of the A-tag as the predicted class (rows).
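The three click-train criteria listed above can be expressed as a short check. The following Python sketch is illustrative only: the original extraction used custom Igor Pro software, and the function name, the input format (pulse arrival times in milliseconds, in ascending order), and the assumption that the candidate train has already been segmented are all hypothetical.

```python
def is_click_train(pulse_times_ms, min_pulses=5, max_ici_ms=200.0):
    """Return True if a candidate pulse sequence meets the three criteria.

    pulse_times_ms: pulse arrival times in milliseconds, ascending
    (hypothetical input format, assumed for this sketch).
    """
    # Criterion 1: at least `min_pulses` pulses in the train.
    if len(pulse_times_ms) < min_pulses:
        return False

    # Inter-click intervals (ICIs) between adjacent pulses.
    icis = [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]

    # Criterion 2: no ICI longer than `max_ici_ms` (and times must ascend).
    if any(ici <= 0 or ici > max_ici_ms for ici in icis):
        return False

    # Criterion 3: smoothly changing ICIs, i.e. each ICI greater than half
    # and less than twice the previous one.
    return all(0.5 * prev < cur < 2.0 * prev for prev, cur in zip(icis, icis[1:]))
```

For example, pulses at 0, 50, 110, 180, and 260 ms (ICIs of 50, 60, 70, and 80 ms) pass all three criteria, whereas a sequence whose last ICI jumps from 20 to 100 ms fails the smoothness check.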

Identification of IPHD Clicks
The entire acoustic dataset (collected over 75 days, from six platforms, and totaling 19,215 files, ∼4.2 TB) was inspected to search for IPHD clicks. The number of files acquired did not differ significantly across the six PAM sites (one-sample Wilcoxon signed rank test, alpha = 0.01, p > 0.01) and the median number of recordings collected was equal to 3190.5 files. Echolocation clicks from IPHDs (Figure 5A) were identified through the following processing stages (Figure 5B).

1. Pulse detection: short-broadband pulses were detected in the band 3-144 kHz using the methodology described to measure the signal features of the training data (section "Measuring Signal Features").

2. Trained classifier: the cubic support vector machine (Cubic SVM) had the best performance, with an accuracy of 89.9%. The TP rate for the class "IPHD" was 96%. Figure 6 shows the confusion matrix plot for the selected trained classifier. A post-classification filter was applied to the data, and files which were classified as IPHD, with at least 30 pulses with a duration of less than 100 µs, were selected. The list of all the files analyzed, with information on the presence/absence of IPHDs and number of clicks per file, was reported in table "Events" (Figure 5B).

3. Manual analysis: expert PAM operators used spectrogram analysis to check the selected files to confirm the presence of dolphin clicks. All the files where click presence was manually confirmed were reported in table "Presence" (Figure 5B), alongside the number of clicks detected.

4. Signal features: the parameters measured for all short-broadband pulses were stored in table "Features" for IPHD clicks and in table "Other Pulses" for the other classes (Figure 5B).
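The post-classification filter in stage 2 can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the `Pulse` record, its field names, and the reading that the 30-pulse threshold applies to pulses labeled "IPHD" with a duration under 100 µs are assumptions. Only the two thresholds come from the text.

```python
from dataclasses import dataclass

MIN_PULSES = 30          # minimum number of qualifying pulses per file
MAX_DURATION_US = 100.0  # pulse duration threshold in microseconds


@dataclass
class Pulse:
    """One detected short-broadband pulse (hypothetical record layout)."""
    label: str          # class assigned by the trained classifier
    duration_us: float  # measured pulse duration in microseconds


def select_file(pulses):
    """Return (selected, n_qualifying) for the pulses of one recording.

    A pulse qualifies if it is labeled "IPHD" and is shorter than
    MAX_DURATION_US; the file is selected for manual inspection when at
    least MIN_PULSES pulses qualify (assumed interpretation of the filter).
    """
    n_qualifying = sum(
        1 for p in pulses
        if p.label == "IPHD" and p.duration_us < MAX_DURATION_US
    )
    return n_qualifying >= MIN_PULSES, n_qualifying
```

Returning the count alongside the decision mirrors the "Events" table, which records both presence/absence and the number of clicks per file.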

Acoustic Occurrence and Spatiotemporal Patterns
According to the TL model of the IPHD clicks and the average PSDs, the acoustic recorders were capable of detecting dolphin clicks at a range of at least approximately 500 m, depending on ambient noise levels (average PSDs) at each PAM site (Figure 4). In total, 346 recordings were selected based on the presence of pulses classified as "IPHD" (Figure 7A). Within these, the manual analysis reported visible echolocation clicks in 276 files (79.77%; 1.44% of the entire dataset) (Figure 7B). The algorithm performed differently across the six PAM locations (χ² test, p < 0.001), relative to the spectrogram inspection. The Cubic SVM performed best at site P#6, with 87.61% confirmed presence (Figure 7A). The lowest level of confirmed detections was for site P#1, with 57.89%. The number of files recorded which contained IPHD clicks was highest at site P#6 (99 files) and lowest at site P#1 (22 files); 51, 25, 45, and 34 files with clicks were recorded at sites P#0, P#3, P#4, and P#5, respectively (Figure 7B). The number of files containing clicks did not differ significantly between the 2 months which had the same acquisition effort (March and April) across the six PAM sites (Mann-Whitney U-test, p > 0.05) (Figure 8). However, the number of click events selected per file differed significantly between the PAM sites (Kruskal-Wallis nonparametric test; H [5, N = 6] = 44.9, p < 0.01). The files with the highest number of clicks detected were recorded at sites P#3 (median: 152; 25th percentile: 58.5; 75th percentile: 358) and P#5 (median: 116.5; 25th percentile: 51; 75th percentile: 223). A multiple pairwise comparison of mean ranks (input from the Kruskal-Wallis test) was performed to test for differences among all the locations (Figure 7C). The two sites with the highest mean number of clicks recorded per file (P#3 and P#5) were significantly different from all other PAM locations (p < 0.05).
In Figure 8, the spatiotemporal distribution of dolphin presence is shown in terms of the number of clicks recorded per day (48 5-min recordings).
The probability of detection was explored at the daily level, across all the platforms within the 4-h clusters selected (Figure 9), and revealed there was no significant difference in the number of files containing clicks (Kruskal-Wallis nonparametric test; H [5, N = 6] = 9.8, p > 0.05; Figure 9A). Moreover, no significant differences were found between the different platforms across each cluster (all: p > 0.05; Figure 9B). In general, the highest detection probability was identified within the cluster from 04:00 to 08:00 (median: 0.22, 25th percentile: 0.18, 75th percentile: 0.29). Additionally, GAMMs demonstrated a high variability between the different platforms for dolphin acoustic presence and activity, when hour of the day was used as a fixed effect (Figure 10). The smoothing estimators at site P#4 for both dolphin presence and number of clicks (p < 0.001), and at site P#6 for dolphin presence (p < 0.05), were significantly different. In addition, the number of clicks detected at site P#6 differed significantly between the day and night phases (Mann-Whitney U-test, p < 0.05), with a higher rate of clicks recorded during the night. There were no significant differences in the number of clicks recorded at the other PAM sites (Mann-Whitney U-test, p > 0.05). Therefore, hour of the day could not be considered a relevant predictor of dolphin click occurrence within the study area, during the 75 days of monitoring.
[Figure caption, partial] In each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. Whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually using the "+" symbol (in red). On the right, a multiple pairwise comparison of mean ranks was performed to test significant differences between the PAM sites.
Results from the manual analysis of the files selected for the presence of IPHD clicks (276 files) also revealed the spatiotemporal distribution of other IPHD vocalizations and diverse sound sources. The occurrence of whistles and click-series of pulsed sounds differed across the six PAM locations (both: χ² test, p < 0.01). In conjunction with the number of echolocation clicks detected per file, sites P#3 and P#5 recorded the highest percentage of files containing whistles (52.00 and 44.12%, respectively), and burst pulses and buzzes (P#3: 76.00%; P#5: 47.07%). Sounds from snapping shrimps were identified in all of the files inspected. Calls emitted by fishes (Hawkins, 1986) were identified at significantly different rates across the PAM sites (χ² test, p < 0.01). Platforms P#6, P#5, and P#3 recorded the highest number of files containing fish calls (80.81%, 76.47%, and 68.00%, respectively). Furthermore, sounds related to human activities within the defined groups (ship noise, explosions, others) were identified in all of the files manually inspected (346 files). Results showed that the distribution of these anthropogenic noise sources differed around the PAM locations (all: χ² test, p < 0.01). The presence of ship noise was higher at sites P#1 and P#0 (100.00 and 76.19% of the files analyzed contained these noises, respectively), whereas site P#5 was the quietest location with regard to the presence of ship noise (20.51% of files contained these noises). Explosions were identified in three files (one file each in P#1, P#3, and P#4).
The occurrence of other artificial sounds (sonar, low frequency pulses, etc.) was higher at sites P#3 (27.27%), P#4 (23.23%), and P#1 (15.79%) compared to the other locations.

Comparison With A-Tag Data Logger
The output of the classification process was also compared with detections logged by an A-tag data logger during the first 31 days of monitoring at site P#0 (1474 files). Both sets of detections, from the Cubic SVM and the A-tag, were validated using manual checks. Within this sub-dataset, 17 files were selected for the presence of dolphin clicks (Figure 11). Results of the confusion matrix test (Figure 11A) showed that in five files, both methods confirmed the presence of dolphin clicks (TP rate = 1). No FP detections occurred, meaning there were no cases where the A-tag detection was triggered without the trained classifier detecting candidate clicks (FP rate = 0). In 98.84% of the dataset (1457 files), neither method identified the presence of dolphin clicks (TN), and in 12 files the A-tag was not triggered despite the presence of clicks (FN). In general, a higher number of clicks was identified in the files classified as TP (mean ± SD = 717 ± 830) than in the others (mean ± SD = 39 ± 12) (Figure 11B). Nevertheless, the A-tag was highly reliable in terms of the detection of IPHD clicks (accuracy = 0.99).
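As a worked check of the figures above, the following Python sketch computes summary rates from the four reported cell counts (TP = 5, FP = 0, FN = 12, TN = 1457). The function name is hypothetical and the formulas are the conventional textbook definitions, which are an assumption here: the rate definitions used in the original analysis may differ, so only the accuracy and FP rate are directly comparable with the values quoted in the text.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Summary rates from the four cells of a 2x2 confusion matrix.

    Conventional definitions are used; they are an assumption of this
    sketch and may not match the rates as defined in the original analysis.
    """
    total = tp + fp + fn + tn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "fp_rate": fp / (fp + tn) if (fp + tn) else nan,
    }


# Cell counts reported for the 1474 files recorded at site P#0,
# with the classifier as the true condition and the A-tag as the prediction.
metrics = confusion_metrics(tp=5, fp=0, fn=12, tn=1457)
```

With these counts, the accuracy is 1462/1474, which rounds to 0.99, and the FP rate is 0. The 12 FN files lower the conventional recall of the A-tag relative to the classifier to 5/17.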

DISCUSSION
The distribution and habitat characteristics of the second largest population of IPHD off the east coast of Zhanjiang city (China) were recently studied as part of a 7-year boat-based visual survey effort. Research has also been carried out on IPHD in the Northern Beibu Gulf (China), using data acquired through a 3-year visual survey study (Wu et al., 2017). Traditional visual surveys of dolphins can cover large spatial areas, and are fundamental for specific research utilizing animal sightings (behavioral ecology, genetics, ecotoxicology, etc.) and for specimen identification (photo identification). The data acquired during boat-based visual surveys are essential for investigating group size and composition, and to fully understand population ecology and inform wildlife management plans. However, visual surveys are limited by: (1) their high cost and the availability of research vessels and personnel and (2) the difficulties associated with collecting data at night, in rough weather conditions and from animals when they are underwater.
Passive acoustic monitoring surveys are increasingly being adopted as a key sampling approach to acquire novel information about ocean ecosystems, including their biodiversity (Ricci et al., 2016), habitat degradation (Gordon et al., 2018), the distribution and abundance of elusive species (Hildebrand et al., 2019), benthic communities (Freeman and Freeman, 2016), coral reef health (Bertucci et al., 2016), macroalgae production (Freeman et al., 2018), the status of MPAs (Buscaino et al., 2016; Gabriele et al., 2018), and the effect of human activities (Lammers et al., 2017). In this study, we described the use of multiple PAM sites to investigate spatiotemporal patterns in the distribution and acoustic behavior of an IPHD population.
Recently, Munger et al. (2016) conducted a long-term PAM project to investigate IPHD occurrence north of Lantau Island, Hong Kong, as a part of environmental impact assessments for a number of coastal development plans. The survey was designed to monitor potential areas of impact for IPHD, in relation to planned coastal reclamation and seabed modification. Over 2 years of monitoring, Ecological Acoustic Recorders (EARs) were deployed across several PAM sites (13) during different time periods. The EARs were placed in small subgroups, separated by a distance of a few kilometers. The results reported by Munger et al. (2016) were consistent with the known spatial patterns of IPHD in Hong Kong waters, and indicated the potential use of PAM to study the distribution of the species in shallow waters. However, the acoustic data were acquired at a limited sampling frequency (64 kHz) for the analysis of full bandwidth echolocation clicks, and the signal identification was conducted manually (spectrogram visualization). Munger et al. (2016) also reported that a high level of ambient noise generated by human activities and the constant presence of snapping shrimps may compromise the reliability of an automatic detection process.
In this study, a PAM system using multiple recorders (SoundTrap) set to record at a high sampling frequency (288 kHz) was tested, together with a trained classifier (support vector machine) and a specific data analysis protocol, to study the acoustic occurrence of IPHD. The described classifier was able to detect echolocation clicks from IPHD within a large dataset and to separate them from the other signals recorded, in particular the short-broadband pulses emitted by snapping shrimps and finless porpoises. The Cubic SVM model was highly reliable in the classification process, and the manual analysis confirmed the presence of dolphin clicks in 79.77% of the files automatically selected. As the distribution of the population is not yet clear, data were acquired from a large area. Performance differed across the PAM sites (Figure 7), and the lowest level, reported at site P#1, seemed to be related to the strong influence of anthropogenic noise at this location. However, the main goal of the automatic selection was an effective reduction of the large dataset. Only 346 files (1.80% of the total number of files acquired) were manually inspected.
The methodology applied in this study represents an effective approach to a big data challenge, which included the need to examine recordings from multiple platforms and different acoustic scenarios. Spectrogram resolution could be considered a limiting factor during the human-operated checks of the data, especially when low intensity clicks were recorded (when the animals were more than 500 m from the platform or traveling and echolocating in other directions) or in states of high background and broadband noise generated by ship traffic and/or adverse weather conditions. However, the manual check of the automatic results was imperative as a validation technique (Hildebrand et al., 2019). In addition, the spectrogram analysis was an important complementary monitoring technique which produced information on soundscape composition. The analysis of the selected data revealed significant insights regarding the fine-scale differences between the PAM sites, and possible correlations with dolphin presence and acoustic behavior. For example, IPHD social and feeding sounds were recorded more frequently at sites P#3 and P#5, in conjunction with the automatic detection of the highest number of clicks. This result confirmed the potential applications of PAM using both automatic and manual methods of data analysis.
The acoustic complexity of marine ecosystems in shallow waters can vary within a few kilometers, especially where the transition between sandy, muddy, rocky, and coral seabed creates micro-benthic communities with different acoustic scenarios (Radford et al., 2010; Ceraulo et al., 2018; Lillis et al., 2018), which are also influenced by broadband ship noise and other human activities (Liu et al., 2017; Marley et al., 2017). As such, the differences in the detectability of dolphin clicks across the PAM sites could be a factor of the environment (shallow waters) and the acoustic signal (dolphin click) being investigated. In the future, the use of a specific classification process (trained classifier) for each location will be considered, in order to decrease any potential bias in the analysis (DesJardins and Gordon, 1995). Further investigations are also ongoing to define the spatiotemporal distribution of the other biological sources identified in the study area (finless porpoises, snapping shrimps, soniferous fishes).
The northernmost site (P#6) revealed a higher, consistent acoustic presence of IPHD (the highest number of files with clicks and days with presence; Figures 7, 8). At sites P#3 and P#5, the highest number of clicks in association with the general acoustic activity of IPHD were recorded. These three sites also logged the highest percentage of files containing fish calls which were concurrently recorded with dolphins. These results suggested that the abundance of soniferous fishes could be considered as a possible primary habitat distribution factor and an index of prey availability for IPHD (Munger et al., 2016; Würsig et al., 2016; Pine et al., 2017a,b). The abundance of pulse sounds from snapping shrimps confirmed that this was the most abundant biological sound source in tropical coastal waters (Au and Banks, 1998; Lillis et al., 2018). In addition, the southern sites in the study area (P#1, P#0, P#3, and P#4) appeared to be more influenced by anthropogenic noises, compared to the northern sites (P#5 and P#6). Further analyses are ongoing to describe the variation in the ambient noise levels in the study area in relation to IPHD presence and acoustic behavior, and to describe the long-term acoustic trends and patterns in these sound sources.
The probability of click detection was highest in the 4-h cluster spanning dawn and sunrise (04:00-08:00; Figure 9). A variety of odontocete species around the world, including IPHD, have been reported as being more active acoustically during the night and close to sunset and sunrise, with specific daily patterns (Benoit-Bird et al., 2009; Soldevilla et al., 2010; Klinck et al., 2012; Au et al., 2013; Baumann-Pickering et al., 2015; Wang et al., 2015; Giorli et al., 2016; Temple et al., 2016; Munger et al., 2016; Caruso et al., 2017). In our study area, the daily patterns of acoustic presence and the number of clicks recorded were very different among the PAM sites (Figure 10). GAMMs revealed that hour of the day was not a relevant predictor of dolphin click occurrence or rate of clicks. Considering only the day and night phases, only site P#6 showed a statistical difference in the number of clicks recorded. This was not found for the other five locations and could be a result of the low number of detections logged by the other platforms. Similar results were also reported by Munger et al. (2016) in Hong Kong waters, where one of the PAM sites recorded an insufficient number of detections to determine a diel pattern, despite the long-term monitoring carried out. Overall, Munger et al. (2016) reported a high detection rate of IPHD during nighttime, which reinforces the importance of conducting PAM surveys, and not only visual surveys, which are mostly limited to daylight hours.
The comparison between the classifier developed in this study and the A-tag data logger demonstrated the high reliability of both methods for the detection of IPHD clicks (Figure 11). No recordings were marked as FP and the two approaches had similar results for the selection of files without click events (TN). The classifier identified clicks in 12 files in which the A-tag was not triggered (FN). Within the FN class, all the recordings contained a lower number of clicks than the TP class (Figure 11). Therefore, the A-tag failed to identify events mainly in cases where few clicks were recorded. The acoustic data logger was highly reliable. However, there were both advantages and limitations associated with using this device to analyze dolphin occurrence. Using the A-tag, it was possible to acquire data with a higher duty cycle, reduce the data storage requirement (as no audio files were saved), and accelerate the data processing and availability of results. Despite this, the absence of audio files for post-processing analyses (for example, the measurement of additional click parameters, the detection of other sounds, and the assessment of environmental noise levels and temporal patterns of the soundscape) was a disadvantage. Additionally, the detection range of the SoundTrap was estimated as larger than previous estimates for the A-tag in the study area. Dong et al. (2017) estimated a detection range of 200 m for IPHD clicks, whereas in this study we estimated a detection range for the acoustic platforms of at least 500 m (Figure 4). However, we strongly suggest carrying out a manual check of the results when using either method.

CONCLUSION
The IPHD is a nationally protected species of high conservation value in China. However, it is predicted that in the near future, the recently discovered Hainan population will be subject to a series of environmental upheavals. The methodology described in this study demonstrates the potential of PAM for the investigation of the ecology of nearshore dolphin species. The study involved the deployment of underwater acoustic platforms in shallow waters within the supposed habitat of the IPHD, and the acquisition of data simultaneously at six locations distributed along more than 100 km of coastline. Preliminary information regarding habitat use has been acquired and the hotspot areas characterized by higher acoustic activity were identified (P#6, P#5, and P#3), in relation to IPHD feeding and social sounds. As the highest number of detections was recorded at the northernmost site (P#6), the survey area has been extended, and new platforms have been installed north of this location. During future PAM projects, the occurrence of typical inter-click intervals will be considered as a significant parameter in the automatic classification of clicks from other cetacean species (Frasier et al., 2017). The results of this study also revealed preliminary information regarding the distribution of other phonating animals and noise-producing human activities. The use of PAM has expanded our survey ability in the study area. Furthermore, the information regarding the IPHD acoustic occurrence will be crucial in advancing this field of investigation. Recent boat-based surveys have focused on the hotspot areas defined in this study, and the rate of animal encounters is increasing considerably. Therefore, our demonstration that PAM is a powerful technique that can be applied to monitor IPHD habitat and to improve the efficiency of the direct monitoring of this recently discovered population is very timely.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The data collection and use protocols were approved by the Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, with the ethics approval code IDSSESYLL-MMMBL-01.

AUTHOR CONTRIBUTIONS
FC, LD, and SL conceived the study and coordinated the PAM project. LD, FC, WX, ZG, MLL, and MML carried out the platform deployments and data collection. FC and GA designed the data analysis protocol and the classification process. FC, DL, WX, and ZG analyzed the acoustic data. FC, LD, SL, MLL, MML, WX, and ZG provided ideas, materials, and analysis tools. All authors reviewed the manuscript.

ACKNOWLEDGMENTS
We would like to thank all the staff and students at the Institute of Deep-sea Science and Engineering (Chinese Academy of Sciences) for their help during this study. We would also like to express our gratitude to the local fishermen who provided assistance. We also wish to thank Giacomo Giorli for his assistance with the text review, Rosario Grammauta for his valuable advice regarding the TL model, and Nicolò Caruso for designing Figure 1B. FC would like to thank the President's International Fellowship Initiative (PIFI) of the Chinese Academy of Sciences. We would like to thank Editage (www.editage.com) for English language editing.