Quantitative Soundscape Analysis to Understand Multidimensional Features

A methodology for the analysis of soundscapes was developed in an attempt to facilitate efficient and accurate soundscape comparisons across time and space. The methodology consists of a collection of traditional soundscape metrics, statistical measures, and acoustic indices that were selected to quantify several salient properties of marine soundscapes: amplitude, impulsiveness, periodicity, and uniformity. The metrics were calculated over approximately 30 h of semi-continuous passive acoustic data gathered in seven unique acoustic environments. The resultant metric values were compared to a priori descriptions and cross-examined statistically to determine which combination most effectively captured the characteristics of the representative soundscapes. The best measures of amplitude, impulsiveness, periodicity, and uniformity were determined to be SPLrms and SPLpk for amplitude, kurtosis for impulsiveness, an autocorrelation based metric for periodicity, and the Dissimilarity index for uniformity. The metrics were combined to form the proposed “Soundscape Code,” which allows for rapid multidimensional and direct comparisons of salient soundscape properties across time and space. This initial characterization will aid in directing further analyses and guiding subsequent assessments to understand soundscape dynamics.


INTRODUCTION
Ocean sound conveys a wealth of information due to the highly efficient manner in which acoustic energy travels through the water. Studying ambient ocean sound provides information on vocalizing marine life, ocean dynamics, and human use of the ocean (Hildebrand, 2009;Pijanowski et al., 2011;Howe et al., 2019). In recognition of its inherent value, ocean sound has been recently accepted as an Essential Ocean Variable (EOV) by the Global Ocean Observing System (GOOS) Biology and Ecosystem Panel (Ocean Sound EOV, 2018). EOVs are approved based on three considerations: (1) relevance in helping solve scientific questions and addressing societal needs, (2) contributions to improving marine resource management, and (3) feasibility for global observation regarding cost effectiveness, technology, and human capabilities 1 . In the context of ocean sound, inclusion in the GOOS framework provides a formal structure for recording ocean sound. Implementation of the ocean sound EOV will help to guide scientific data collection to ensure consistency and appropriate comparisons in soundscape analysis and ocean sound studies.
A soundscape is an acoustic environment tied to the function of a given landscape or marine habitat, and it is the sum of all sounds present; ISO 18405 defines soundscape as the characterization of the ambient sound in terms of its spatial, temporal, frequency attributes, and the types of sources contributing to the sound field. Defining and characterizing the soundscape is an important step in the task of assessing, monitoring, and comparing global acoustic environments. By utilizing soundscape analysis and ocean sound, researchers can better understand ocean dynamics (Radford et al., 2010;McWilliam and Hawkins, 2013;Miksis-Olds et al., 2013;Staaterman et al., 2014), biodiversity and ecosystem health (Parks et al., 2014;Staaterman et al., 2014), and the risk of anthropogenic impacts on marine life (Weilgart, 2007;Carroll et al., 2017).
Traditionally, sound is analyzed by measuring the sound pressure level (Sound Pressure Level; SPL), and other sourceand amplitude-related parameters such as the number of sources detected, source classification, localization of detectable sources, or sound exposure level (SEL) . Recently, researchers have developed and applied metrics mathematically summarizing acoustic properties and comparing them with independent ecological data to understand the types of sources present in a soundscape. For example, the Acoustic Complexity Index (ACI) was proposed as a proxy for biodiversity Sueur et al. (2008), and (Pieretti et al., 2011) demonstrated the efficacy of the Entropy Index (H-index) and the Dissimilarity Index (D-index) at highlighting biodiversity of a terrestrial environment. Application of acoustic biodiversity indices in a marine environment have yielded mixed results (Parks et al., 2014;Staaterman et al., 2017;Bohnenstiehl et al., 2018;Bolgan et al., 2018). Further investigation into the utility of acoustic indices in marine applications is needed to assess their efficacy.
Even though ocean ambient sound and soundscape research has been conducted for decades, the ocean community has still not reached a consensus on the optimal way to accurately report and compare important aspects of ocean sound. Ocean sound studies are not trivial endeavors, and the complexity of ocean sound dynamics, combined with a lack of formal standards, guidelines, and consistent methods, make soundscape analyses and meaningful comparisons difficult. The methodologies utilized by researchers are often tailored to a specific study, which focuses on answering the question at hand, but contributes little to the understanding of soundscape dynamics on a large regional or global scale if the results cannot be easily interpreted or compared to data from other areas. Studies often fail to clearly report metric input parameters critical to the determination of the final metric value; ambiguities in reporting can make replicating study methodologies difficult, and can lead to erroneous comparisons. For example, different methods of 1 https://goosocean.org/index averaging have yielded differences in final metric results of over 10 dB in previous works (Merchant et al., 2012;Hawkins et al., 2014). Some methodology descriptions are so vague it is nearly impossible to determine averaging times, integration windows, and exactly which metric is being calculated (McKenna et al., 2016). To accurately report important soundscape information, efforts must be made to standardize the way in which researchers acquire, process, analyze, and report acoustic metrics.
Many analysis methods produce graphical outputs, which are assessed visually but can become cumbersome when many comparisons need to be quantitative across time or space. Graphical information, supplemented with standardized quantitative analysis of the multidimensional soundscape within an accepted framework would produce thorough, accurate, and easily comparable results for acoustic recordings. A method that resembles this type of standardized quantitative analysis was adopted by The World Meteorology Organization (WMO) for comparing and reporting the state of sea ice that encompasses multidimensional information: ice coverage, stage of development, age, thickness and form. The WMO system for reporting sea ice is commonly referred to as the "egg code" (JCOMM Expert Team on Sea Ice, 2014). This egg code presents standard ice characteristics in a clear and succinct manner, and the multidimensional nature of the egg code reports a variety of relevant ice properties; "one size fits all" measures are rarely adequate in describing dynamic environments (Figure 1). The idea of a "measure" like the egg code that captures and reports salient information about an environment is the inspiration for the proposed soundscape code. While the egg code reports multiple dimensions of the environmental "feature" ice, the proposed soundscape code reports multiple dimensions of the environmental "feature, " sound pressure.
Amplitude, periodicity, impulsiveness, and uniformity are physical soundscape properties that are important to understanding soundscapes and the distribution of sound energy across time, space, and frequency ( Table 1). The objective of this study was to identify the optimal suite of metrics across the general soundscape properties (amplitude, impulsiveness, periodicity, and uniformity) to create a soundscape code infrastructure for comparing soundscapes. Multiple metrics within each soundscape property ( Table 2) were selected and applied to a diverse set of soundscapes to identify the metric that best captured the salient aspects of the acoustic recordings. Comparing the acoustic properties of soundscapes is not meant to be an exhaustive assessment, but rather an initial analysis to understand some of the dynamics of acoustic environments and guide subsequent analysis for more targeted assessments. The resulting product forms the proposed soundscape code, which provides a framework for comparing soundscape properties across space and time utilizing metrics that capture spectral and temporal properties of acoustic environments; characterizing acoustic environments in terms of spatial, spectral, and temporal acoustic properties directly relates to the ISO 18405 definition of a soundscape.
Sound level statistics and measures of the amplitude of acoustic power and energy are used frequently in ocean sound studies. The root-mean-square (rms) SPL captures the average sound pressure amplitude of the corresponding environment (SPL rms ). Though susceptible to upward bias from loud, intermittent sounds, SPL rms is the most ubiquitous acoustic metric (Merchant et al., 2015). SPL rms does not capture all the important amplitude information of a soundscape such as maximum sound pressure levels (SPL pk ), the sound floor (quietest periods in a soundscape), or sound exposure level. Reporting both the SPL rms and SPL pk provides detail on average sound amplitude as well as information on the range of sound amplitude.
Impulsive sounds are defined qualitatively as sounds that are of short duration, have rapid rise times, and high sound levels (NIOSH, 1998;NMFS, 2018). A wide variety of sound sources produce pulsed acoustic signatures which means that it is possible for the impulsiveness of a soundscape to be used as an indication of source presence/absence. Impulsive sounds can potentially have physiological impacts on fish (Halvorsen et al., 2012a,b;Casper et al., 2013a,b), and marine mammals (Lucke et al., 2009;Kastelein et al., 2015;Southall et al., 2019), so it is also a valuable property to consider from a regulatory perspective as well as a physical characteristic. Regulations lack quantitative definitions regarding the difference between impulsive and non-impulsive sounds, but several metrics for quantifying impulsiveness have been suggested including kurtosis, crest factor and Harris impulse factor (Erdreich, 1986;Starck and Pekkarinen, 1987;Kastelein et al., 2017). All three were initially considered candidate metrics to represent impulsiveness in the soundscape code, but Harris impulse factor was removed from consideration due to constraint in the range of the metric and the resulting implications for future use in comparative analysis.
The term periodicity is inherently general; periodicity can refer to a pattern that repeats over the course of a year, month, day, hour, or second. Seismic airgun signals (Greene and Richardson, 1988), echolocation clicks (Clarke et al., 2019), pulsed fish or whale vocalizations (Watkins et al., 1987;Lobel, 1992), and the rhythmic rasping of the California spiny lobster (Patek et al., 2009) are examples of real world ocean signals that are periodic. The proposed soundscape code focuses on periodicities that (1) impose physical characteristics to a soundscape over short time periods, (2) occur on time scales of less than a minute, and (3) can be captured by metrics calculated over a single minute of acoustic data. A metric for capturing larger scale periodicity related to diel, season, or annual cycles was not explored in this project but could be assessed using a time series of the individual soundscape code parameters. To our knowledge no metric designed specifically for quantifying the content of periodic signals in an acoustic environment exists, so metrics from other fields were repurposed as candidates to represent the periodicity property in the soundscape code. Cepstrum was first proposed as a tool for analyzing periodic seismological data (Bogert et al., 1963), where the arrival of various waves and phases could be considered as distorted echoes. Cepstrum is not widely used in marine soundscape studies, but has been used with efficacy in a variety of mechanical analyses, and it is considered underutilized by those that use it (Randall, 2017). Time lagged autocorrelation has been used to characterize soundscapes in terms of the dominant source types , and was repurposed in this study to quantify the content of periodic signals detected in a soundscape.
Soundscape uniformity is the degree to which the signals change over time in terms of temporal and frequency attributes of the soundscape. It answers the question "to what degree are the sounds similar or different?" and describes the dynamic nature of a given soundscape. The inclusion of the uniformity property in the soundscape code was motivated by the widespread interest in biodiversity, and the use of passive acoustic monitoring techniques to study biodiversity remotely (Peet, 1974;Pimm and Lawton, 1998;Sueur et al., 2014). A suite of quantitative indices has been developed and geared toward quantifying different properties of acoustic environments: Acoustic Complexity Index (ACI), H-index, D-index, and Acoustic Richness (AR). These indices have been widely used in terrestrial acoustic studies to measure biodiversity and species richness (Sueur et al., 2008(Sueur et al., , 2014Pieretti et al., 2011Pieretti et al., , 2017. For the D-index, Sueur et al. (2008) utilized a measure that estimated the compositional dissimilarity between two communities. Within this work, the D-index is applied to two consecutive acoustic recordings in an effort to capture the acoustic differences and measure the acoustic uniformity. The Entropy Index (H) has been used as a proxy for biodiversity in the marine environment with mixed results (Parks et al., 2014;Harris et al., 2016). Harris et al. (2016) found that H values exhibited a dependence on the size of the fast Fourier transform (FFT) window, and at a window length of 512 showed little correlation to typical diversity measures, but correlation increased with spectral resolution. Parks et al. (2014) had to remove noise from a seismic survey before finding a significant connection between the H index and sampled biodiversity. Because H-index was designed to increase with signal diversity in time and frequency, it was repurposed in this study to represent acoustic uniformity, which Amplitude Can be conceptualized as the "loudness" of an environment. Describes the effective sound level across time.

SPLrms, SPLpk
Impulsiveness Impulses are characterized as being broadband with rapid rise times, short durations, and high peak sound pressures. Impulsiveness of a soundscape would describe the presence and magnitude of signals that can be characterized as impulsive.

Kurtosis, Crest Factor
Periodicity Describes the repetitive nature of sounds in the soundscape. The timescale of the periodic activity is an important factor here; pulsed signals with short inter-pulse-intervals like seismic surveys, pile driving, and pulsed minke whale vocalizations are periodic; repeating acoustic events like dawn or evening chorus are also periodic, but on much larger time scales.

Time lagged autocorrelation, Cepstrum
Uniformity Describes the diversity of a system. In an acoustic context: to what degree are all the sounds similar or different across time?
shares similarities with the principle of acoustic diversity that the metric was built on.

MATERIALS AND METHODS
The datasets used to assess the performance of the candidate metrics for use in the soundscape code were selected from a pool of passive acoustic data that had already been analyzed, and in some cases, used in publications (Martin et al., 2017(Martin et al., , 2020Martin and Barclay, 2019). Soundscape code datasets were picked based on previous knowledge of activity in the soundscape region. Digital passive acoustic data were converted to pressure data, and then metrics were calculated over each pressure time series. The metrics were analyzed to determine the optimal combination for capturing salient quantitative aspects of a soundscape. Each dataset was collected using Autonomous Multi-channel Acoustic Recorders (AMAR, JASCO Applied Sciences) that sampled at a variety of sample rates and durations ( Table 3). Recorders were deployed intermittently between 2012 and 2016 at the seven different locations. While the sites may not all be unique in their location, the acoustic content of their recordings was unique; GB5 is actually about 70 km from GB4v35 and GB4v0, and while the latter two share the same site location designation, the datasets were recorded weeks apart. The seven data sets contain a variety of human-generated, natural biologic, and natural abiotic sounds including sounds from a seismic survey, impact pile driving, vessel passages, ice calving and icebergs, fin (Balaenoptera physalus) and humpback whale (Megaptera novaeangliae) vocalizations, northern bottlenose whale (Hyperoodon ampullatus) and common dolphin (Delphinus delphis) whistles and echolocation, and shallow-water reef sounds including foraging urchins, snapping shrimp, and fish grunts (Figure 2). The biological sounds present in the data sets are representative of the diversity of marine life and sounds produced ocean wide.

Data Processing
Five frequency bands were selected for soundscape code analysis: (1) 10-100 Hz (Low), (2) 100-1,000 Hz (Mid), (3) 1-10 kHz (High), (4) 10 kHz and above (Ultra-High), and (5) 10 Hz and above (broadband; BB). These frequency bands were chosen because the dominant frequencies of many signals can be isolated into a single soundscape code frequency band. Data from Biogully East (BGE) was low pass filtered with a passband out to 32 kHz to provide a uniform analysis in the Ultra-High band across Melville Bay (MB), Great Barrier Reef (GBR), Orsted (OR), and BGE. Sample rate restrictions precluded analysis of the Ultra-High band at the Grand Banks sites (GB4v0, GB4v35, and GB5). The high band at GB5 was included, even though the data could only be resolved up to 8 kHz due to the sample rate at this site (16 kHz) ( Table 2). The metrics assessed for the soundscape code were calculated over one-min time windows. The one-min time window is a standard time length in soundscape analysis and corresponds with the human auditory experience (Ainslie et al., 2018). All FFTs performed in calculating soundscape code metrics used 1-second time windows. The median and 95% confidence interval of each metric was reported for each site and analyzed. Acorr2, acorr3, SPL rms , SPL pk , kurtosis, crest factor, D-index, and H-index were calculated using custom code written in MATLAB (2019); The Mathworks Inc., Natick, MA, United States.
Sound pressure level (SPL), reported in logarithmic decibel (dB) units relative to a reference pressure of 1 µPa, is the most common amplitude metric reported in ocean sound studies (Equation 1) where P ref is reference pressure, p(t) is the instantaneous pressure at time (t), and T is the analysis window duration (Madsen, 2005;Thompson et al., 2013;Merchant et al., 2015). The SPL pk has added value as an amplitude metric, as it is also a relevant measure in determining the risk of physical damage in auditory systems (Coles et al., 1968) (Equation 2).
Because the SPL pk and SPL rms metrics were identified from previously published work as well-established and effective All hydrophones individually calibrated with pistonphone calibrator before and after deployment. measures of the amplitude of sound pressure, they were selected for use in the soundscape code without further analysis (Madsen, 2005;Thompson et al., 2013;Merchant et al., 2015). The crest factor is defined as the difference, in dB, between the SPL pk and the time averaged sound pressure level. It describes the ratio of the SPL pk relative to the effective pressure level (Equation 3): A crest factor of 1 indicates no peak, while large-valued crest factors indicate the presence of large peaks. This metric has been used in predicting auditory injury in industrial workers by utilizing A-weighted sound levels where a crest factor value of 15 dB or greater indicated dangerous impulse noises (Starck and Pekkarinen, 1987). Kurtosis describes the shape of a probability distribution and is a measure of the "tailedness" of the probability distribution of a real valued random variable. Kurtosis is defined below for the pressure time series p(t) as (Equations 4-6): signals are considered impulsive. Gaussian-distributed random noise produces kurtosis values of 3.0. Time series with strong sinusoidal signals have a kurtosis in the range of 0.0 to 3.0, and time series with transients produce kurtosis values above 3.0 (Martin et al., 2020). Cepstrum treats the log spectrum of a time series as a waveform, and the spectrum of this log spectrum produces peaks when the original waveform contains echoes, or periodic components (Oppenheim and Schafer, 2004). Cepstrum is calculated by taking the real part of the inverse discrete Fourier transform (DFT) of the logarithm of the magnitude of the DFT of the signal (Equation 7): where p ts is the pressure time series. Cepstrum was calculated over averaged pressure time series using a built-in MATLAB function rceps. However, the graphical output of cepstrum needed to be further quantified for use in the soundscape code. To do this, a threshold set at c(n) = 0.1 was chosen, and any peaks above this threshold in the cepstrum output were used as proxies for periodicities with the number of peaks-per-minute (ppm) counted and reported in the soundscape code. Inspired by , time lagged autocorrelation used to highlight periodicities in acoustic data was considered as a periodicity metric candidate within the present study. Using an averaged pressure time series, the peaks above a selected threshold in autocorrelation plots can be counted and used as proxies for periodicity in a soundscape. Two averaging windows were assessed within this study to determine the best fit for the soundscape code: 1.0 s mean square (MS) sound pressure averages, and 0.1 s MS sound pressure averages. These nuanced autocorrelation metrics are referred to as "acorr2" (1.0 s average), and "acorr3" (0.1 s average). The threshold for periodicities using autocorrelation, a minimum peak prominence of ρ yy (t, t + τ) = 0.5, was set using the MATLAB function findpeaks, and any autocorrelation coefficient peaks in the 1min time window above this threshold were counted (ppm). For acorr2, 45 (75%) lags were considered. For acorr3, 420 lags (70%) were considered.
The H and D indices are calculated using the amplitude envelope which is given by the absolute value of the analytic signal ζ(t), defined as (Equation 8): where: i = √ −1, and p h (t) is the Hilbert transform of the real valued signal p(t). Probability mass functions (PMF) give the probability that a discrete, random variable is exactly equal to some value, and the PMF of the amplitude envelope A(t) and PMF of the mean spectrum S(f) is given by (Equations 9 and 10): and is used to quantify envelope dissimilarity where s(f ) is the mean spectrum. Envelope dissimilarity is estimated between two signals by computing the difference between their PMFs (Equations 11 and 12): where A(t) is the PMF of the amplitude envelope and S(f) is PMF of the mean spectrum. D-index (Equation 13) is the product of the temporal dissimilarity (D t ) and spectral dissimilarity(D f ): The D index is a between-group (β) index originally developed to measure differences between communities. In the context of this study, the D index is used to quantify differences in the soundscape across time by calculating it over consecutive acoustic recordings. H-index (Equation 16) is the product of the spectral (H f ) and temporal (H t ) entropies (Equations 14 and 15): where A(t) is the PMF of the amplitude envelope, and S(f) is the PMF of the mean spectrum. H is 0 for a single pure tone, increases with frequency bands and amplitude modulations, and approaches 1 for random noise.

Metric Performance Analysis
A qualitative analysis was done to determine the optimal representative metric for each of the three soundscape properties in the soundscape code. Visual analysis of spectrograms and waveforms, coupled with knowledge of the sound sources present at each site, helped to form a priori expectations for the candidate soundscape metrics (Figure 3). Metric statistics were compared against a priori expectations, identifying which metrics produced the strongest agreement across soundscape code properties. The SPL pk and SPL rms metrics have been well studied as quantitative metrics of amplitude (Madsen, 2005;Thompson et al., 2013;Merchant et al., 2015) and further comparison was not deemed necessary. A series of qualitative comparisons (Table 4) were used to inform the determination of which metric was optimal for each property. The qualitative comparisons shown in Table 4 do not represent an exhaustive review of the analysis completed using the soundscape code datasets, but rather represent the comparisons that produced definitive results in the analysis. Because amplitude metrics were already chosen, they are not featured among the list of comparisons. The qualitative comparisons and time series analysis of the candidate metrics informed most of the decision on which combination was optimal for use in the soundscape code. However, to explore how metric values could be used to distinguish or draw comparisons between sites, some statistical analysis of the metric values and distributions respective to each site and frequency band was desired. Because the ANOVA is an analysis specified for normally distributed Gaussian data, and distributions of kurtosis, D-index, and acorr3 violated this assumption of normality, a non-parametric multiple comparisons for all site pairs using the Dunn method for joint ranking was utilized (Dunn, 1964). Multiple comparisons tests were carried out using JMP Pro TM 14.0.0 software and were repeated for the Broadband, Low, Mid, and High frequency bands; for each frequency band analyzed, 21 site pairs were assessed. For each metric, groupings in respective frequency bands were formed by sites whose metric distributions were determined by the multiple comparisons tests to not be significantly different. An assessment of how the candidate metrics "grouped" the soundscapes highlighted how the different metrics would compare or contrast the soundscapes of similar and different acoustic environments. Connected letters reports were created for the multiple comparisons test results to better visualize these groupings. Sites sharing common letters in the tables (within but not across frequency bands) have metric distributions that are not significantly different (according to the Dunn method for joint ranking).

RESULTS
Results from a series of comparisons that led to the final choice of metrics are presented on a property-by-property basis. Results from several of the qualitative comparisons outlined in Table 4 are presented to highlight the responses that guided the metric selection. Metric comparisons were conducted for impulsiveness, periodicity, and uniformity properties. As stated previously, metrics for amplitude were already identified from previously published work as well-established and effective measures of the amplitude of sound pressure, so they were selected for the soundscape code without further analysis (Madsen, 2005;Thompson et al., 2013;Merchant et al., 2015). Calculated metric time series were compared to spectrograms, pressure waveforms, and a priori expectations to guide final metric selection.

Impulsiveness
Both kurtosis and crest factor were generally found to accurately report the presence of impulsive signals. However, kurtosis values more closely aligned with a priori expectations in characterizations of the impulsiveness of the soundscape code datasets. The superiority of kurtosis in indicating the presence of impulsive signals was suggested in qualitative comparison I1, which featured sound from only one dominant sound source: ice. Spectrograms showed that ice cracks, groans, and rumbling acoustic activity dominated the lower frequencies of the soundscape, but several instances of more broadband ice cracks exist in the dataset (Figure 2A). Impulsive metrics were expected to reflect the presence of impulsive signals in mostly BB, Low, or Mid soundscape code bands. Kurtosis reported many values exceeding the impulsive threshold in the BB, Low, and Mid soundscape code frequency bands indicating considerable impulsive acoustic activity in the expected frequency bands (Figure 4).
Based on spectrogram analysis and an analysis of the sound pressure levels at MB, it was understood that while potentially impulsive events occurred frequently throughout the recording, a handful of high intensity events dominated the soundscape. It was expected that the impulse metrics would reflect the sporadic and intermittent nature of the ice cracks in boxplots of impulsiveness metric values through greater variability (Figure 4). Kurtosis performed as expected by indicating a wide range of kurtosis values that accurately captured the sporadic nature of the ice sounds. While crest factor reflected the presence of impulsive signals, it reported very little distinction between the soundscape code frequency bands, and the crest factor values varied much less than the kurtosis values so that it was not as possible to detect that a handful of high amplitude events characterized the impulsive nature of the soundscape at MB.
At GB4v35, where the 20 Hz pulsed vocalizations formed the basis for qualitative comparison I2, kurtosis values indicated the presence of impulsive signals the in Low band for minutes 1, 2, 3, 5, 6, 7, 9, 10, 11 which corresponded closely to the minutes containing pulsed fin whale vocalizations. Mid and High band kurtosis values maintained values of 3 for the duration of the qualitative comparison I2. Crest factor peaks also aligned with the pulse trains, but unlike kurtosis, crest factor impulse detections were identified in all soundscape code frequency bands, and for every minute but the 8th. The crest factor values in the Mid (100-1000 Hz), High (1-10 kHz), and Ultra-High (>10 kHz) soundscape code bands did not align with content visualized in the spectrograms or a priori expectations made based on the knowledge that the dominant sound source at this site was fin whales. However, 3-10 dB fluctuations in the 1-second SPL pk in the Ultra-high band were detected, which could indicate the presence of an impulsive sound and justify the higher than expected crest factor values (see Supplementary Figure 1 related to qualitative comparison I2). Ten-min boxplots were used to explore how the metric values changed over time at GB4v35 (Figure 5). Crest factor (Figure 5 Right) remained high during the period of ship noise (box 10-11), so it was difficult to deduce from the crest factor values that a ship had contributed significantly to the soundscape. In contrast, kurtosis values (Figure 5 Left) dropped quickly after the introduction of vessel noise to the soundscape (box 10-11), and values only increased after the vessel noise had subsided and the soundscape returned to being dominated by the pulsed signals of the fin whales (boxes 14-21). Kurtosis also only showed a slightly elevated response to a different fin whale chorus that correspond roughly to boxes 22-32 and a different recording period. Crest factor indicated little difference between the impulsiveness of the two different fin whale choruses. Metrics should indicate the GBR is more consistent. Indication of presence of common dolphin vocalizations at OR, and fish grunts at GBR are a secondary expectation.
Both metrics performed similarly but the range measure of the D-index provided an accurate assessment of the different sites. Difference in responses was nuanced but favored D-index.
Frontiers in Marine Science | www.frontiersin.org  Qualitative comparison I3 which contained signals from a seismic survey ( Figure 2B) yielded similar results in terms of the performance of the two impulsiveness metrics. Ultimately, both metrics adequately reported the nature of the impulsive seismic survey signals, but kurtosis again aligned more with the salient signals in the proper frequency bands (see Supplementary Figure 2 related to qualitative comparison I3). 10-min boxplots of both crest factor and kurtosis values adequately reflected the nature of the impulsive signals in the GB4v0 soundscape. However, kurtosis boxplots at GB4v0 highlighted the difference in seismic survey signals as the survey vessel approached, passed over the hydrophone, and departed. Crest factor, on the other hand, indicated little difference between the phases of the survey.

Periodicity
Periodicity metrics all reflected aspects of the periodic nature of each of the soundscapes and differences in metric responses were typically nuanced (Figure 6). Acorr3 was found to be more closely linked to the periodic nature of the soundscapes, and was more robust to mischaracterizations of the soundscapes that were observed with acorr2 and cepstrum.
In comparison P1, subsets of the GB4v35 dataset contained unequal numbers of fin whale pulsed vocalizations, and this disparity was used to compare the responses of the periodicity measures to determine if the metrics would report more peaks in time window 1, which contained far more of the 20 Hz periodic fin whale vocalizations (Figure 7). Cepstrum reported 27 fewer peaks across frequency bands in time window 2, while acorr3  reported 11 fewer peaks. In a deviation from expectations, acorr2 reported six more peaks for the second time window.
Time series analysis using 10-min boxplots over the entirety of the GB4v35 dataset similar to the analysis presented in Figure 5 showed two main differences: (1) Acorr2 reported more peaks per minute than acorr3 in the High band for 69% of the minutes analyzed (n = 353). (2) Both acorr2 and acorr3 were highly consistent during the second period of fin whale vocalizations while cepstrum varied more (see Supplementary Figure 3 related to time series analysis of the periodicity metrics at GB4v35).
Qualitative comparisons P2 and P3 yielded results that were less conclusive than P1. Comparison P2 utilized sounds from a seismic survey ( Figure 2B) and metrics were expected to report an increase in peaks-per-minute from time window 1 to time window 2. Time window 1 captured distant seismic survey signals, while time window 2 captured close proximity signals that were louder and had more consistent repetition. All metrics reported more peaks-per-minute across Soundscape Code frequency bands for the second time window of the GB4v0 dataset. Comparison P3 utilized the sounds from an impact pile driving operation ( Figure 2D) and metrics were expected to report a decrease in peaks-per-minute from time window 1 to time window 2. Time window 1 featured intense pile driving sounds and time window 2 did not. The periodicity metrics in P3 reported a substantial decrease in peaks-per-minute across the two time windows. 10-min boxplots of periodicity metrics plotted over the duration of the datasets used in qualitative comparisons P2 and P3 did not indicate conclusive differences and all metrics responded appropriately to the different acoustic activity featured in the two datasets.
Uniformity D-index values aligned with a priori expectations and outperformed the H-index in every analysis conducted using the soundscape code datasets. D-index values accurately captured the acoustic uniformity at all soundscape code datasets by indicating consistently high values at GB4v0 and OR, and the presence of high values in sites where dramatic changes in the acoustic environment occurred (Figure 8).
Comparisons of uniformity metric results drew on ice sounds from MB (Figure 2A), pile driving and boat noise from OR (Figure 2D), reef sounds from GBR (Figure 2G), and sporadic echolocation and whistling activity from BGE ( Figure 2E) to determine which metric would represent soundscape uniformity in the soundscape code (Table 4 rows 7 and 8). At MB, the D-index values in the low and mid bands reflected the sporadic and random ice noise (Figure 9). Compared to BGE, D-index values accurately characterized MB as more variable in these bands. In the High and Ultra-High bands, BGE D-index values were greater than MB, which again was an accurate representation of the acoustic activity of northern bottlenose. Disruption of acoustic uniformity from the northern bottlenose whales at BGE was reflected clearly in time series analysis of D-index values. H-index also reflected the decreased uniformity at MB, but the dependence of this metric on bandwidth made interpretation and comparison difficult, as H-index values increased from the Low to Ultra-High soundscape code band regardless of acoustic uniformity. D-index soundscape code values in Figure 9 reflect the substantial disparity in acoustic uniformity between the two sites in both magnitude and variability of the index. In contrast, the slightly larger range of the H-index values corresponding to the MB Low band indicated only a slight disparity in acoustic uniformity between the two sites, and the magnitude of the index was not representative of the recording content.
Similar analysis carried out on data from the OR and GBR sites yielded slightly different results. In qualitative comparison (U2), comparisons of respective uniformity metrics across the sites highlights differences in acoustic uniformity. D-index values more clearly capture the disparity in acoustic uniformity between OR and GBR especially in the increased size of the boxplots of values at OR in the High and Ultra-High bands. H-index values used to compare the acoustically distinct OR and GBR sites failed to reflect the acoustic disparity by producing almost identical soundscape code medians, with only slightly more variability of the 1-min H-index values reported at OR. Similar to the H-index, the magnitudes of the D-index values at both OR and GBR were only slightly different. The variability measure of the D-index, however, did reflect the disparity in acoustic uniformity across OR and GBR (see Supplementary Figure 4    qualitative comparison U2). When metric values were analyzed using 10-min boxplots over the full OR recording, the D-index more effectively captured the dynamic nature of the soundscape, while the H-index values hardly indicated any changes in acoustic activity (see Supplementary Figures related time series analysis of the uniformity indices at OR and GBR). The intuitive nature of the D-index and much closer alignment to salient acoustic activity in the soundscapes of the soundscape code datasets than H-index suggested D-index was the optimal metric to represent acoustic uniformity in the soundscape code.

Statistical Groupings of Metric Values
The non-parametric multiple comparisons tests were used to form the statistical groupings of sites based on the medians of the metric values (Figure 10). Respective to each site, metric values that are not significantly different are "connected" by identical letters in the connected letter tables that report the multiple comparisons results. In key frequency bands, the uniquely impulsive soundscapes of MB, GBR, OR, and GB4v0 were all found to have kurtosis values that were significantly different than the sites where impulsive signals were either rare or only faint (BGE, GB4v35, and GB5) ( Figure 10A). Kurtosis was observed to outperform crest factor in terms of robustness, sensitivity, and more informative soundscape grouping, which led to the selection of kurtosis to represent impulsiveness in the soundscape code. Periodicity metrics failed to produce intuitive groupings of the sites in terms of periodic content, and acorr3 was the only metric that produced significantly different values between the highly periodic site and the moderate-low periodic sites (MB, BGE, GB5, and GBR) ( Figure 10B). In spite of less than ideal groupings for acorr3, optimal performance in qualitative comparisons and other analyses made it the only viable choice and acorr3 was selected as the metric to represent soundscape periodicity. Multiple comparisons results for the D-index were both adequate and less than ideal, depending on which frequency band was being considered ( Figure 10C). However, considering the far more intuitive nature of the index and consistently better performance relative to the H-index, D-index was chosen to represent acoustic uniformity in the soundscape code.

DISCUSSION
A collection of metrics was applied to a series of unique soundscapes to identify the optimal suite of metrics for capturing the salient soundscape characteristics, which ultimately enables quick and simple quantitative comparisons of soundscapes. The final determination considered both the metric efficacy in quantifying the corresponding soundscape property, and how well the metric fit into the infrastructure of the soundscape code. SPL rms and SPL pk (amplitude), kurtosis (impulsiveness), D-index (uniformity), and acorr3 (periodicity) were determined to be the best metrics out of the candidate metrics for comparing soundscapes. Soundscape codes comprised of the optimal metrics indicated dominant signal frequencies and salient differences in acoustic environments (Figure 11). Figure 11 represents what an initial soundscape assessment using the soundscape code methodology might look like; tabulated soundscape information across frequency bands and metrics offers an initial "glimpse" into a marine acoustic environment and highlights areas of interest for further targeted analysis. The soundscape code is proposed here as a first step in the direction of a standardized soundscape analysis methodology that will ultimately facilitate quantitative comparison and assessment of soundscapes, and guide subsequent analysis.
Traditionally, underwater soundscape studies focus mostly on quantifying fluctuations, central tendencies, or minimum/maximum observed levels of amplitude typically represented by sound pressure, intensity, or acoustic energy ( Table 1). If metrics that quantify aspects of other soundscape properties are included in soundscape analysis, a more thorough assessment of soundscapes is possible. The soundscape properties outlined in Table 2 were quantified by the selected metrics, which allowed comparisons of the soundscape code datasets to be made in terms of sound amplitude, impulsiveness and transient events, content of repetitive signals, and spectral and temporal variability. For example, in a comparison of the impulsiveness of the soundscape code datasets GB5, GB4v35, and BGE, impulsiveness metric values indicate they are the least impulsive sites of the seven (Figure 11). This observation was made quickly, and demonstrates the ease with which one can compare and contrast different soundscapes when identical metrics are being compared. This assessment of across site impulsiveness can be taken a step further: The elevated 95% CI value in the Ultra-High (relative to BB, Low, Mid, and High) band at BGE indicates the presence of acoustically active northern bottlenose whales. At the same time, the median and 95% CI in the Low bands at GB4v35 and GB5, respectively, indicate the presence of chorusing fin whales. Martin et al. (2020) showed that 1-min kurtosis values increased as the amplitude of simulated impulses increased, so the slightly elevated impulsiveness metric values at GB4v35 relative to GB5 could be a manifestation of the higher amplitude of the fin whale chorus at GB4v35, and this coincides with increased SPL pk values at this site. This example highlights how a combination of multidimensional metrics can be used congruently to understand a soundscape and how nuanced differences in the metrics can indicate significant differences in soundscape composition. The collection of metrics captures temporal and frequency characteristics of acoustic environments and depending on application can be used to assess spatial temporal and variation in soundscapes directly corresponding to the soundscape components defined in ISO (2017) 18405.
The proposed soundscape code provides a valuable framework to simply covey complex ocean characteristics and is a "first step" in the direction of a standardized soundscape analysis and reporting structure. We recognize that the future use and potential improvement of the soundscape code will benefit from more thorough assessment of duty cycling, bandwidth definitions, and dataset durations, as only data sets of multiple hours and a majority of continuous sampling regimes were used to select the proposed soundscape code metrics. Further work assessing the impact and performance of different analysis windows (larger time scales), datasets with unique acoustic features not captured in this work, datasets with significant overlapping of source signals, and threshold selections is required to ensure the development of an effective, rapid, and robust quantitative soundscape framework.
Duty cycle was found to have impacts on the D-index, as the D-index measures the difference between consecutive recordings, and consecutive recordings will have more in common than recordings spaced apart by longer periods of time. The selected frequency bandwidths worked for the purposes of this project, but other frequency banding should be explored to better represent evolving regulations and knowledge of marine life hearing. Similar to duty cycle concerns, dataset duration being represented in the soundscape code should be explored to understand how a comparison of soundscape code results from a small duration dataset (minutes to hours) compares to results from a large duration datasets (days to months). A final aspect to  Columns indicate the frequency band, and for each band, the median (med) and 95% confidence intervals (CI95) are reported. (H) The minimum and maximum soundscape code median values observed across all sites in corresponding frequency bands. Metrics represented in each row of the soundscape codes are from top to bottom: SPL rms , SPL pk , kurtosis, D-index Index, acorr3. The total range of the soundscape code medians and 95% CIs was divided into quartiles (respectively), and the cell colors correspond to which quartile the value falls into from low to high: blue, green, yellow, red.
consider in future soundscape code performance and application is the duration of the integration and averaging windows for each of the property metrics. All soundscape code metrics were based on 1-min time windowing protocol to better align with what few standard soundscape analysis methods there are (Ainslie et al., 2018). Averaging of sound pressure for the periodicity metrics was done on 0.1-second and 1.0 s windows. Other window sizes should be explored to assess performance and use of the soundscape code. Exhaustive analysis of the impact that different analysis parameters have on the soundscape code metrics would have added to the value of this research, but it did not fit into the scope of the project. Thorough metric analysis did, however, identify several aspects of the selected soundscape code metrics that should be explored.
The selection of acorr3 as the periodicity metric is a prime candidate for additional assessment and development within the soundscape code structure. It was noticed that acorr3 produced "false positives" due to noise in the autocorrelation outputs. This was found with all of the candidate periodicity metrics, but in acorr3 it occurred at a much reduced and more manageable manner. In spite of the potential to falsely indicate the presence of periodicities, acorr3 accurately characterized the soundscape datasets in terms of periodicity, with the exception of MB where the repeated cracking of ice led to a mischaracterization of this site being more periodic than expected.
The candidate metric for uniformity, the H-index, exhibited a strong dependence on the bandwidth of the signal being analyzed, which made within site comparisons of the H-index across soundscape code frequency bands futile and would severely limit the utility of the uniformity metric in the soundscape code. Furthermore, the observed behavior of the H-index in response to anthropogenic activity is similar to findings in Parks et al. (2014): anthropogenic sounds confounded the metric. At OR and GB4v0, the sounds of a seismic survey and pile driving drove the H-index down, while the opposite was observed for the D-index. Ship noise at GB4v35 and GB5 had little effect on the H-index but drove D-index values down as biological signals from fin whales were masked. Elevated ambient sound levels from hurricane Nicole had a similar masking effect on the H and D indices, where the H-index was unaffected but the D-index dropped significantly. The D-index was found to more closely align with the real world signals in the soundscape code datasets and consistently reflected the acoustic uniformity of known sound sources in proper frequency bands. D-index demonstrated a sensitivity that allowed it to highlight nuanced differences in soundscape composition, and ultimately it was chosen as the metric to represent acoustic uniformity. However, extremely small values of the D-index make interpretation more difficult than the metrics that report the other soundscape properties and further scrutiny should be given to ensure metric efficacy.
Both impulsiveness metrics were closely tied to the content of impulsive signals in the soundscape but kurtosis outperformed crest factor in meeting a priori expectations and produced values that made assessments of impulsiveness easier and quicker. The constrained range of possible crest factor values means the variability it produced when characterizing sites in terms of impulsiveness can be narrow and hard to interpret. The larger range of possible kurtosis values meant it could more dramatically reflect differences in transient or impulsive acoustic activity between sites, which makes rapid assessments more feasible. Kurtosis is already well established in signal analysis and acoustics, so compared to the metrics representing periodicity and uniformity, it does not need to be as thoroughly assessed in terms of efficacy. Analysis of kurtosis time series to explain soundscape code metric values across properties led to a realization that time series analysis of the soundscape code metrics is also an informative method for exploring and assessing acoustic environments with implications for future applications.
Targeted analysis of large acoustic datasets could be made easier by analyzing time series data of the soundscape code metrics. D-index time series consistently indicated time periods of dynamic acoustic activity. Peaks in acorr3 metric time series regularly highlighted the presence of echolocation signals and transient periodic acoustic signatures. Time series analysis of kurtosis values demonstrated an impressive utility in the assessment of a variety of aspects of underwater sound by indicating the presence of transient acoustic activity and shifts in acoustic activity in general. Time series analysis of kurtosis suggests the metric could be used in a variety of applications beyond the scope of soundscape comparison using the soundscape code. The relationship between kurtosis and impulsive sounds, and resultant relevance in impact studies indicates it could be used in assessments of noise impacts and mitigation. For example, bubble curtain efficacy could be assessed using the soundscape code or time series analysis of soundscape code metrics as the change in signal across a bubble curtain would assuredly be captured by impulsiveness and amplitude metrics, if not uniformity and periodicity metrics as well. Noise studies sometimes analyze sound at different ranges from a sound source, and the soundscape code metrics could easily be applied to this type of assessment and would quickly and clearly highlight salient differences in the impacted soundscape.
The soundscape code methodology provides a structure for quick and easy quantitative comparisons meant to capture salient soundscape characteristics for directed assessments of sources, patterns, and trends. The utility of the soundscape code methodology lies in succinct, consistent, and transparent reporting of acoustic soundscape properties. If the combination of metrics is calculated and reported in a uniform manner, then direct comparisons are easily made. Ambiguity in reporting of metric calculation parameters makes interpretation of results time consuming and can result in erroneous conclusions; the uniform integration times and frequency bands of the soundscape code allows for accurate direct comparisons with immediate understanding of exactly what is being calculated. Furthermore, the multidimensional nature of the soundscape code helps to highlight similarities and differences in soundscapes that are sometimes overlooked in traditional soundscape analyses.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: Some of the soundscape code data sets are proprietary within the companies that collected them. Data access and availability will be considered upon request.
Requests to access these datasets should be directed to DW, dcw1017@wildcats.unh.edu.

AUTHOR CONTRIBUTIONS
JM-O, SM, DH, and AL contributed to the conceptual framework of the study. SM and MS both assisted in the design and practical implementation of the framework. KL guided statistical analysis and interpretation of statistical results. All authors contributed to manuscript revisions, read, and approved the submitted version.