<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Ecol. Evol.</journal-id>
<journal-title>Frontiers in Ecology and Evolution</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Ecol. Evol.</abbrev-journal-title>
<issn pub-type="epub">2296-701X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fevo.2023.1071640</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Ecology and Evolution</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A workflow for the automated detection and classification of female gibbon calls from long-term acoustic recordings</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Clink</surname>
<given-names>Dena J.</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/966486/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kier</surname>
<given-names>Isabel</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1506218/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ahmad</surname>
<given-names>Abdul Hamid</given-names>
</name>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Klinck</surname>
<given-names>Holger</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/549603/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University</institution>, <addr-line>Ithaca, NY</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Institute for Tropical Biology and Conservation, Universiti Malaysia Sabah</institution>, <addr-line>Kota Kinabalu, Sabah</addr-line>, <country>Malaysia</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by"><p>Edited by: Marco Gamba, University of Turin, Italy</p></fn>
<fn id="fn0002" fn-type="edited-by"><p>Reviewed by: Lydia Light, University of North Carolina at Charlotte, United States; Tim Sainburg, University of California, San Diego, United States</p></fn>
<corresp id="c001">&#x002A;Correspondence: Dena J. Clink, <email>dena.clink@cornell.edu</email></corresp>
<fn id="fn0003" fn-type="other"><p>This article was submitted to Behavioral and Evolutionary Ecology, a section of the journal Frontiers in Ecology and Evolution</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>11</volume>
<elocation-id>1071640</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>10</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>01</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Clink, Kier, Ahmad and Klinck.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder> Clink, Kier, Ahmad and Klinck</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Passive acoustic monitoring (PAM) allows for the study of vocal animals on temporal and spatial scales difficult to achieve using only human observers. Recent improvements in recording technology, data storage, and battery capacity have led to increased use of PAM. One of the main obstacles in implementing wide-scale PAM programs is the lack of open-source programs that efficiently process terabytes of sound recordings and do not require large amounts of training data. Here we describe a workflow for detecting, classifying, and visualizing female Northern grey gibbon calls in Sabah, Malaysia. Our approach detects sound events using band-limited energy summation and performs binary classification of these events (gibbon female or not) using machine learning algorithms (support vector machine and random forest). We then applied an unsupervised approach (affinity propagation clustering) to test whether we could further differentiate between true and false positives, or estimate the number of gibbon females in our dataset. We used this workflow to address three questions: (1) does this automated approach provide reliable estimates of temporal patterns of gibbon calling activity; (2) can unsupervised approaches be applied as a post-processing step to improve the performance of the system; and (3) can unsupervised approaches be used to estimate how many female individuals (or clusters) there are in our study area? We found that performance plateaued with &#x003E;160 clips of training data for each of our two classes. Using optimized settings, our automated approach achieved a satisfactory performance (F1 score&#x2009;~&#x2009;80%). The unsupervised approach did not effectively differentiate between true and false positives or return clusters that appeared to correspond to the number of females in our study area. 
Our results indicate that more work needs to be done before unsupervised approaches can be reliably used to estimate the number of individual animals occupying an area from PAM data. Future work applying these methods across sites and different gibbon species and comparisons to deep learning approaches will be crucial for future gibbon conservation initiatives across Southeast Asia.</p>
</abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd><italic>Hylobates</italic></kwd>
<kwd>R programming language</kwd>
<kwd>signal processing</kwd>
<kwd>bioacoustics</kwd>
<kwd>Southeast Asia</kwd>
</kwd-group>
<counts>
<fig-count count="9"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="130"/>
<page-count count="19"/>
<word-count count="14839"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<title>Introduction</title>
<sec id="sec2">
<title>Passive acoustic monitoring</title>
<p>Researchers worldwide are increasingly interested in passive acoustic monitoring (PAM), which relies on autonomous recording units to monitor vocal animals and their habitats. Increased availability of low-cost recording units (<xref ref-type="bibr" rid="ref52">Hill et al., 2018</xref>; <xref ref-type="bibr" rid="ref101">Sethi et al., 2018</xref>; <xref ref-type="bibr" rid="ref115">Sugai et al., 2019</xref>), along with advances in data storage capabilities, makes the use of PAM an attractive option for monitoring vocal species in inaccessible areas where the animals are difficult to monitor visually (such as dense rainforests) or when the animals exhibit cryptic behavior (<xref ref-type="bibr" rid="ref30">Deichmann et al., 2018</xref>). Even in cases where other methods such as visual surveys are feasible, PAM may be superior: it can detect animals continuously for extended periods of time and at greater range than visual methods, operates under any light conditions, and is more amenable to automated data collection than visual or trapping techniques (<xref ref-type="bibr" rid="ref79">Marques et al., 2013</xref>). In addition, PAM provides an objective, non-invasive method that limits observer bias in detection of target signals.</p>
<p>One of the most widely recognized benefits of using acoustic monitoring, apart from the potential to reduce the amount of time needed for human observers, is that there is a permanent record of the monitored soundscape (<xref ref-type="bibr" rid="ref129">Zwart et al., 2014</xref>; <xref ref-type="bibr" rid="ref114">Sugai and Llusia, 2019</xref>). In addition, the use of archived acoustic data allows for multiple analysts at different times to review and validate detections/classifications, as opposed to point-counts where one or multiple observers, often with varying degrees of experience, collect the data <italic>in-situ</italic>. It is, therefore, not surprising that, in many cases, analysis of recordings taken by autonomous recorders can be more effective than using trained human observers in the field. For example, a comparison of PAM and human observers to detect European nightjars (<italic>Caprimulgus europaeus</italic>) showed that PAM detected nightjars during 19 of 22 survey periods, while surveyors detected nightjars on only six of these occasions (<xref ref-type="bibr" rid="ref129">Zwart et al., 2014</xref>). An analysis of 21 bird studies that compared detections by human observers and detections from acoustic data collected using autonomous recorders found that for 15 of the studies, manual analysis of PAM acoustic data led to results that were equal to or better than results from point counts done using human observers (<xref ref-type="bibr" rid="ref105">Shonfield and Bayne, 2017</xref>). Despite the rapidly expanding advances in PAM technology, the use of PAM is limited by a lack of widely applicable analytical methods and the limited availability of open-source audio processing tools, particularly for the tropics, where soundscapes are very complex (<xref ref-type="bibr" rid="ref41">Gibb et al., 2018</xref>).</p>
<p>Interest in the use of PAM to monitor nonhuman primates has increased in recent years, with one of the foundational papers using PAM to estimate occupancy of three signal types: chimpanzee buttress drumming (<italic>Pan troglodytes</italic>) and the loud calls of the Diana monkey (<italic>Cercopithecus diana</italic>) and king colobus monkey (<italic>Colobus polykomos</italic>) in Ta&#x00EF; National Park, C&#x00F4;te d&#x2019;Ivoire (<xref ref-type="bibr" rid="ref58">Kalan et al., 2015</xref>). The authors found that occurrence data from PAM combined with automated processing methods was comparable to that collected by human observers. Since then, PAM has been used to investigate chimpanzee group ranging and territory use (<xref ref-type="bibr" rid="ref59">Kalan et al., 2016</xref>), vocal calling patterns of gibbons (<italic>Hylobates funereus;</italic> <xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>) and howler monkeys (<italic>Alouatta caraya;</italic> <xref ref-type="bibr" rid="ref89">P&#x00E9;rez-Granados and Schuchmann, 2021</xref>), occupancy modeling of gibbons (<italic>Nomascus gabriellae;</italic> <xref ref-type="bibr" rid="ref119">Vu and Tran, 2019</xref>) and density estimation of pale fork-marked lemurs (<italic>Phaner pallescens</italic>) based on calling bout rates (<xref ref-type="bibr" rid="ref78">Markolf et al., 2022</xref>).</p>
</sec>
<sec id="sec3">
<title>Acoustic analysis of long-term datasets</title>
<p>Traditional approaches for finding signals of interest include hand-browsing spectrograms using programs such as Raven Pro (K. Lisa Yang Center for Conservation Bioacoustics, Ithaca, NY, USA). This approach can reduce processing time relative to listening to the recordings but requires trained analysts and substantial human investment. Another approach is hand-browsing of long-term spectral averages (LTSAs), which still requires a significant time investment, but allows analysts to process data at a faster rate than hand-browsing of regular spectrograms, as LTSAs provide a visual representation of the soundscape over a larger time period [days to weeks to years (<xref ref-type="bibr" rid="ref125">Wiggins, 2003</xref>; <xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>)]. However, particularly with the advances in data storage capabilities and deployment of arrays of recorders collecting data continuously, the amount of time necessary for hand-browsing or listening to recordings for signals of interest is prohibitive and is not consistent with conservation goals that require rapid assessment. This necessitates reliable, automated approaches to efficiently process large amounts of acoustic data.</p>
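<p>The LTSA idea described above can be sketched in a few lines: per-frequency-band power is averaged over successive blocks of spectrogram frames, so hours of recordings compress into one column per block. The following is a minimal, illustrative Python sketch with toy values, not code from any of the cited tools.</p>

```python
# Minimal sketch of a long-term spectral average (LTSA): per-band power
# is averaged over successive blocks of spectrogram frames, compressing
# long recordings into one column per block. Frame powers below are
# illustrative stand-in values, not real spectrogram data.

def ltsa(frames, frames_per_block):
    """frames: list of per-frequency-band power lists (one per time frame).
    Returns one averaged column per block of `frames_per_block` frames."""
    columns = []
    for start in range(0, len(frames), frames_per_block):
        block = frames[start:start + frames_per_block]
        n_bands = len(block[0])
        # average each frequency band across the frames in this block
        col = [sum(f[b] for f in block) / len(block) for b in range(n_bands)]
        columns.append(col)
    return columns

# Four frames, two frequency bands, averaged in blocks of two frames:
frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(ltsa(frames, 2))  # [[2.0, 0.0], [0.0, 3.0]]
```

<p>Each output column summarizes a block of frames, which is what lets analysts browse days of audio as a single image.</p>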
</sec>
<sec id="sec4">
<title>Automated detection and classification</title>
<p>Machine listening, a fast-growing field in computer science, is a form of artificial intelligence that &#x201C;learns&#x201D; from training data to perform particular tasks, such as detecting and classifying acoustic signals (<xref ref-type="bibr" rid="ref121">W&#x00E4;ldchen and M&#x00E4;der, 2018</xref>). Artificial neural networks (<xref ref-type="bibr" rid="ref83">Mielke and Zuberb&#x00FC;hler, 2013</xref>), Gaussian mixture models (<xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>), and Support Vector Machines (<xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>; <xref ref-type="bibr" rid="ref62">Keen et al., 2017</xref>) &#x2013; some of the more commonly used algorithms for early applications of human speech recognition (<xref ref-type="bibr" rid="ref86">Muda et al., 2010</xref>; <xref ref-type="bibr" rid="ref25">Dahake and Shaw, 2016</xref>) &#x2013; can be used for the automated detection of terrestrial animal signals from long-term recordings. Many different automated detection approaches for terrestrial animals using these early machine-learning models have been developed (<xref ref-type="bibr" rid="ref58">Kalan et al., 2015</xref>; <xref ref-type="bibr" rid="ref127">Zeppelzauer et al., 2015</xref>; <xref ref-type="bibr" rid="ref62">Keen et al., 2017</xref>). Given the diversity of signal types and acoustic environments, no single detection algorithm performs well across all signal types and recording environments.</p>
</sec>
<sec id="sec5">
<title>A summary of existing automated detection/classification approaches</title>
<p>Python and R are the two most popular open-source programming languages for scientific research (<xref ref-type="bibr" rid="ref100">Scavetta and Angelov, 2021</xref>). Although Python has surpassed R in overall popularity, R remains an important and complementary language, especially in the life sciences (<xref ref-type="bibr" rid="ref69">Lawlor et al., 2022</xref>). An analysis of 30 ecology journals indicated that in 2017 over 58% of ecological studies utilized the R programming environment (<xref ref-type="bibr" rid="ref67">Lai et al., 2019</xref>). Although we could not find a more recent assessment, we are certain that R remains an important tool for ecologists and conservationists. Therefore, automated detection/classification workflows in R may be more accessible to ecologists already familiar with the R programming environment. Already, many existing R packages can be used for importing, visualizing, and manipulating sound files. For example, &#x201C;seewave&#x201D; (<xref ref-type="bibr" rid="ref113">Sueur et al., 2008</xref>) and &#x201C;tuneR&#x201D; (<xref ref-type="bibr" rid="ref71">Ligges et al., 2016</xref>) are some of the more commonly used packages for reading in sound files, visualizing spectrograms and extracting features.</p>
<p>An early workflow and R package &#x201C;flightcallr&#x201D; used random forest models to classify bird calls, but the detection of candidate signals using band-limited energy summation was done using an external program, Raven Pro (<xref ref-type="bibr" rid="ref95">Ross and Allen, 2014</xref>). One of the first R packages to provide a complete automated detection/classification workflow for acoustic signals in R was &#x201C;monitoR,&#x201D; which provides functions for detection using spectrogram cross-correlation and bin template matching (<xref ref-type="bibr" rid="ref61">Katz et al., 2016b</xref>). In spectrogram cross-correlation, the detection and classification steps are combined. The R package &#x201C;warbleR&#x201D; has functions for visualization and detection of acoustic signals using band-limited energy summation, all done in R (<xref ref-type="bibr" rid="ref2">Araya-Salas and Smith-Vidaurre, 2017</xref>).</p>
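<p>The core of band-limited energy summation can be sketched as follows: sum spectrogram energy within the target frequency band for each time frame, then merge consecutive above-threshold frames into candidate sound events. The function and values below are an illustrative Python sketch under these assumptions, not the implementation in any of the packages cited above.</p>

```python
# Minimal sketch of band-limited energy detection: frames whose summed
# energy inside a target frequency band exceeds a threshold are merged
# into candidate sound events. Function name, threshold, and values are
# illustrative, not taken from any specific package.

def detect_events(band_energy, threshold, min_frames=2):
    """band_energy: per-frame energy summed over the band of interest.
    Returns (start, end) frame indices (end exclusive) of runs above
    threshold that last at least `min_frames` frames."""
    events, start = [], None
    for i, e in enumerate(band_energy):
        if e >= threshold and start is None:
            start = i                      # event onset
        elif e < threshold and start is not None:
            if i - start >= min_frames:    # keep sufficiently long runs
                events.append((start, i))
            start = None
    if start is not None and len(band_energy) - start >= min_frames:
        events.append((start, len(band_energy)))  # event runs to the end
    return events

energy = [0.1, 0.2, 3.0, 4.0, 3.5, 0.2, 5.0, 0.1]
print(detect_events(energy, 1.0))  # [(2, 5)] -- the lone loud frame at 6 is too short
```

<p>Each candidate event would then be passed to a classifier (e.g., support vector machine or random forest) for the binary decision described above.</p>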
<p>There has been an increase in the use of deep learning&#x2014;a subfield of machine listening that utilizes neural network architecture&#x2014;for the combined automated detection/classification of acoustic signals. Target species include North Atlantic right whales (<italic>Eubalaena glacialis,</italic> <xref ref-type="bibr" rid="ref104">Shiu et al., 2020</xref>), fin whales (<italic>Balaenoptera physalus</italic>, <xref ref-type="bibr" rid="ref75">Madhusudhana et al., 2021</xref>), North American and European bird species (<xref ref-type="bibr" rid="ref57">Kahl et al., 2021</xref>), multiple forest birds and mammals in the Pacific Northwest (<xref ref-type="bibr" rid="ref97">Ruff et al., 2021</xref>), chimpanzees (<italic>Pan troglodytes</italic>, <xref ref-type="bibr" rid="ref1">Anders et al., 2021</xref>), high frequency and ultrasonic mouse lemur (<italic>Microcebus murinus</italic>) calls (<xref ref-type="bibr" rid="ref93">Romero-Mujalli et al., 2021</xref>) and Hainan gibbon (<italic>Nomascus hainanus</italic>) vocalizations (<xref ref-type="bibr" rid="ref35">Dufourq et al., 2021</xref>). See <xref rid="tab1" ref-type="table">Table 1</xref> for a summary of existing approaches that use R or Python for the automated detection of acoustic signals from terrestrial PAM data. Note that the only applications for gibbons are on a single species, the Hainan gibbon.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption>
<p>Summary of existing approaches that use R or Python for the automated detection/classification of acoustic signals from terrestrial PAM data.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="middle">Signal type</th>
<th align="left" valign="middle">Training data recording location</th>
<th align="left" valign="middle">Detection/classification approach</th>
<th align="left" valign="middle">R?</th>
<th align="left" valign="middle">Python?</th>
<th align="left" valign="middle">Open source?</th>
<th align="left" valign="middle">Citation</th>
<th align="left" valign="middle">Repository?</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Nocturnal flight calls of multiple avian species</td>
<td align="left" valign="top">Six locations in New York State, USA</td>
<td align="left" valign="top">BLED detector in external program + RF</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref95">Ross and Allen (2014)</xref></td>
<td align="left" valign="top">Package on R forge (<xref ref-type="bibr" rid="ref94">Ross, 2013</xref>)</td>
</tr>
<tr>
<td align="left" valign="top">Four primate species</td>
<td align="left" valign="top">Ta&#x00EF; National Park, C&#x00F4;te d&#x2019;Ivoire</td>
<td align="left" valign="top">Speaker segmentation + SVM or Gaussian Mixture Models</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref51">Heinicke et al. (2015)</xref></td>
<td align="left" valign="top">Code availability not indicated in publication</td>
</tr>
<tr>
<td align="left" valign="top">Two northeastern songbird species</td>
<td align="left" valign="top">10 sites in Vermont and New York, USA</td>
<td align="left" valign="top">Binary point matching or spectrogram cross-correlation</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref60">Katz et al. (2016a</xref>,<xref ref-type="bibr" rid="ref61">b)</xref></td>
<td align="left" valign="top">Package on CRAN (<xref ref-type="bibr" rid="ref44">Hafner and Katz, 2018</xref>)</td>
</tr>
<tr>
<td align="left" valign="top">Forest elephants</td>
<td align="left" valign="top">Three sites in Gabon and one in the Central African Republic</td>
<td align="left" valign="top">CNNs</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref8">Bjorck et al. (2019)</xref></td>
<td align="left" valign="top">Code availability not indicated in publication</td>
</tr>
<tr>
<td align="left" valign="top">Two frog species</td>
<td align="left" valign="top">Temperate N. America and Panama</td>
<td align="left" valign="top">Measure the presence of periodic structure based on the power spectral density</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref68">Lapp et al. (2021)</xref></td>
<td align="left" valign="top">Python and R implementations on GitHub</td>
</tr>
<tr>
<td align="left" valign="top">No signals specified</td>
<td align="left" valign="top">~</td>
<td align="left" valign="top">Binary point matching or spectrogram cross-correlation + SVM, RF, others</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref5">Balantic and Donovan (2020)</xref></td>
<td align="left" valign="top">Package on Gitlab</td>
</tr>
<tr>
<td align="left" valign="top">Chimpanzees</td>
<td align="left" valign="top">Ta&#x00EF; National Park, C&#x00F4;te d&#x2019;Ivoire</td>
<td align="left" valign="top">Convolutional recurrent neural networks</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref1">Anders et al. (2021)</xref></td>
<td align="left" valign="top">Package on GitHub</td>
</tr>
<tr>
<td align="left" valign="top">984 bird species</td>
<td align="left" valign="top">North America and Europe</td>
<td align="left" valign="top">Deep artificial neural networks</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref57">Kahl et al. (2021)</xref></td>
<td align="left" valign="top">Source code on GitHub</td>
</tr>
<tr>
<td align="left" valign="top">12 bird species and 2 small mammal species</td>
<td align="left" valign="top">Forested landscapes of Oregon and Washington, USA</td>
<td align="left" valign="top">CNNs</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref97">Ruff et al. (2021)</xref></td>
<td align="left" valign="top">Code and data on Zenodo (<xref ref-type="bibr" rid="ref96">Ruff et al., 2020</xref>)</td>
</tr>
<tr>
<td align="left" valign="top">Hainan gibbons</td>
<td align="left" valign="top">Hainan, China</td>
<td align="left" valign="top">CNNs</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref35">Dufourq et al. (2021)</xref></td>
<td align="left" valign="top">Code available on GitHub; training data on Zenodo (<xref ref-type="bibr" rid="ref36">Dufourq et al., 2020</xref>)</td>
</tr>
<tr>
<td align="left" valign="top">Bat echolocation calls and two owl species</td>
<td align="left" valign="top">Europe</td>
<td align="left" valign="top">CNNs</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref107">Silva et al. (2022)</xref></td>
<td align="left" valign="top">Package on CRAN (<xref ref-type="bibr" rid="ref106">Silva, 2022</xref>)</td>
</tr>
<tr>
<td align="left" valign="top">Hainan gibbons, black-and-white ruffed lemurs and two bird species</td>
<td align="left" valign="top">Hainan, China; Ranomafana National Park, Madagascar; Mount Mulanje Biosphere Reserve, Malawi and Intaka Island Nature Reserve in Cape Town, South Africa</td>
<td align="left" valign="top">Pretrained CNNs (e.g., transfer learning)</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref34">Dufourq et al. (2022)</xref></td>
<td align="left" valign="top">Code available on GitHub</td>
</tr>
<tr>
<td align="left" valign="top">60 species of katydids</td>
<td align="left" valign="top">Barro Colorado Island, Panama</td>
<td align="left" valign="top">CNNs</td>
<td align="left" valign="top">N</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top">Y</td>
<td align="left" valign="top"><xref ref-type="bibr" rid="ref76">Madhusudhana et al. (2019)</xref></td>
<td align="left" valign="top">Code available on Zenodo (<xref ref-type="bibr" rid="ref74">Madhusudhana, 2021</xref>)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Repositories are linked if they have an associated digital object identifier (DOI) or are available <italic>via</italic> package development web sites such as the Comprehensive R Archive Network (CRAN). Otherwise, availability as indicated in associated publications is shown.</p>
</table-wrap-foot>
</table-wrap>
<p>Recently, a workflow was developed that provided a graphical interface through a Shiny application and RStudio for the automated detection of acoustic signals, with the automated detection and classification done using a deep convolutional neural network (CNN) implemented in Python (<xref ref-type="bibr" rid="ref97">Ruff et al., 2021</xref>). Another R package utilizes deep learning for the automated detection of bat echolocation calls; this package also relies on CNNs implemented in Python (<xref ref-type="bibr" rid="ref107">Silva et al., 2022</xref>). Deep learning approaches are promising, but they often require large amounts of training data, which can be challenging to obtain, particularly for rare animals or signals (<xref ref-type="bibr" rid="ref1">Anders et al., 2021</xref>). In addition, training deep learning models may require extensive computational power and specialized hardware (<xref ref-type="bibr" rid="ref34">Dufourq et al., 2022</xref>); effective training of deep learning models also generally requires a high level of domain knowledge (<xref ref-type="bibr" rid="ref53">Hodnett et al., 2019</xref>).</p>
</sec>
<sec id="sec6">
<title>Feature extraction</title>
<p>An often necessary step for classification of acoustic signals (unless using deep learning or spectrogram cross-correlation) is feature extraction, wherein the digital waveform is reduced to a meaningful number of informative acoustic features. Traditional approaches relied on manual feature extraction from the spectrogram, but this method requires substantial effort from human observers, which means it is not optimal for automated approaches. Early automated approaches utilized feature sets such as Mel-frequency cepstral coefficients (MFCCs; <xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>), a feature extraction method originally designed for human speech applications (<xref ref-type="bibr" rid="ref48">Han et al., 2006</xref>; <xref ref-type="bibr" rid="ref86">Muda et al., 2010</xref>). Despite their relative simplicity, MFCCs can be used to effectively distinguish between female Northern grey gibbon individuals (<xref ref-type="bibr" rid="ref16">Clink et al., 2018a</xref>), terrestrial and underwater soundscapes (<xref ref-type="bibr" rid="ref32">Dias et al., 2021</xref>), urban soundscapes (<xref ref-type="bibr" rid="ref87">Noviyanti et al., 2019</xref>), and even the presence or absence of queen bees in a bee hive (<xref ref-type="bibr" rid="ref109">Soares et al., 2022</xref>). Although the use of MFCCs as features for distinguishing between individuals in other gibbon species has been limited, the many documented cases of vocal individuality across gibbon species (<xref ref-type="bibr" rid="ref45">Haimoff and Gittins, 1985</xref>; <xref ref-type="bibr" rid="ref46">Haimoff and Tilson, 1985</xref>; <xref ref-type="bibr" rid="ref116">Sun et al., 2011</xref>; <xref ref-type="bibr" rid="ref123">Wanelik et al., 2012</xref>; <xref ref-type="bibr" rid="ref38">Feng et al., 2014</xref>) indicate that MFCCs will most likely be effective features for discriminating individuals of other gibbon species. 
There are numerous other options for feature extraction, including automated generation of spectro-temporal features for sound events (<xref ref-type="bibr" rid="ref113">Sueur et al., 2008</xref>; <xref ref-type="bibr" rid="ref95">Ross and Allen, 2014</xref>) and calculating a set of acoustic indices (<xref ref-type="bibr" rid="ref54">Huancapaza Hilasaca et al., 2021</xref>).</p>
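<p>At the heart of MFCC extraction is the mel-frequency warping, which spaces filterbank channels evenly on a perceptual scale rather than in Hz. The Python sketch below uses the common HTK-style mel constants; the frequency band and filter count chosen are illustrative only, not the settings used in this study.</p>

```python
import math

# Minimal sketch of the mel-frequency mapping used by MFCC extraction:
# linear frequency (Hz) is warped onto the perceptual mel scale before
# triangular filterbank energies are taken. The 2595/700 constants are
# the common HTK-style convention; the band below is illustrative.

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(f_low, f_high, n_filters):
    """Center frequencies (Hz) of triangular mel filters spaced evenly
    on the mel scale between f_low and f_high."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]

# Filters over an illustrative 500-1,800 Hz band sit closer together at
# low frequencies, mirroring the perceptual warping of the mel scale:
centers = mel_filter_centers(500.0, 1800.0, 8)
print([round(c) for c in centers])
```

<p>Full MFCC extraction would additionally apply the filterbank to each spectrogram frame, take log energies, and decorrelate them with a discrete cosine transform.</p>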
<p>Other approaches rely on spectrogram images and treat sound classification as an image classification problem (<xref ref-type="bibr" rid="ref73">Lucio et al., 2015</xref>; <xref ref-type="bibr" rid="ref121">W&#x00E4;ldchen and M&#x00E4;der, 2018</xref>; <xref ref-type="bibr" rid="ref128">Zottesso et al., 2018</xref>). For many of the current deep learning approaches, the input for the classification is the spectrogram, which can be on the linear or Mel-frequency scale (reviewed in <xref ref-type="bibr" rid="ref112">Stowell, 2022</xref>). An approach that has gained traction in recent years is the use of embeddings, wherein a pre-trained convolutional neural network (CNN), for example, using &#x2018;Google&#x2019;s AudioSet&#x2019; dataset (<xref ref-type="bibr" rid="ref40">Gemmeke et al., 2017</xref>), is used to create a set of informative, representative features. A common way to do this is to remove the final classification layer from the pre-trained network, which leaves a high-dimensional feature representation of the acoustic data (<xref ref-type="bibr" rid="ref112">Stowell, 2022</xref>). This approach has been used successfully in numerous ecoacoustic applications (<xref ref-type="bibr" rid="ref103">Sethi et al., 2020</xref>, <xref ref-type="bibr" rid="ref102">2022</xref>; <xref ref-type="bibr" rid="ref50">Heath et al., 2021</xref>).</p>
</sec>
<sec id="sec7">
<title>Training, validation, and test datasets</title>
<p>When doing automated detection of animal calls, the number and diversity of training data samples must be taken into consideration to minimize false positives (where the system falsely classifies a non-target sound as the signal of interest) and false negatives (i.e., missed detections), where the system fails to detect the signal of interest. To avoid overfitting &#x2014; a phenomenon that occurs when model performance is not generalizable to data that was not included in the training dataset &#x2014; it is essential to separate data into training, validation, and test sets (<xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>; <xref ref-type="bibr" rid="ref81">Mellinger et al., 2016</xref>). The training dataset is the sample of data used to fit the model, the validation set is used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters, and the test dataset is the sample of data used to provide an unbiased evaluation of the final model fit. Some commonly used metrics include precision (the proportion of detections that are true detections) and recall (the proportion of actual calls that are successfully detected; <xref ref-type="bibr" rid="ref81">Mellinger et al., 2016</xref>). Often, these metrics are converted to false alarm rates, such as the rate of false positives per hour, which can help guide decisions about the detection threshold. In addition, when doing automated detection and classification, it is common to use a threshold (such as the probability assigned to a classification by a machine learning algorithm) to make decisions about rejecting or accepting a detection (<xref ref-type="bibr" rid="ref81">Mellinger et al., 2016</xref>). Varying these thresholds will result in changes to the false-positive rate and the proportion of missed calls. 
These can be plotted with receiver operating curves (ROC; <xref ref-type="bibr" rid="ref117">Swets, 1964</xref>) or detection error tradeoff curves (DET; <xref ref-type="bibr" rid="ref80">Martin et al., 1997</xref>).</p>
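As a concrete illustration of this threshold trade-off, the sketch below (in Python rather than the R tools used later in this paper; scores and labels are invented) computes precision, recall, and the F1 score at two probability thresholds: raising the threshold typically increases precision at the cost of recall.

```python
# Minimal sketch of precision, recall, and F1 at a given probability
# threshold. Scores and labels below are illustrative, not real data.

def detection_metrics(scores, labels, threshold):
    """labels: 1 = true call, 0 = noise; scores: classifier probabilities."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]
print(detection_metrics(scores, labels, 0.5))  # lenient threshold
print(detection_metrics(scores, labels, 0.7))  # stricter threshold
```

Sweeping the threshold over a grid of values and plotting the resulting pairs yields the ROC or DET curves described above.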
</sec>
<sec id="sec8">
<title>PAM of gibbons</title>
<p>Gibbons are pair-living, territorial small apes that regularly emit species- and sex-specific long-distance vocalizations that can be heard &#x003E;1&#x2009;km in a dense forest (<xref ref-type="bibr" rid="ref84">Mitani, 1984</xref>, <xref ref-type="bibr" rid="ref85">1985</xref>; <xref ref-type="bibr" rid="ref39">Geissmann, 2002</xref>; <xref ref-type="bibr" rid="ref14">Clarke et al., 2006</xref>). All but one of the approximately 20 gibbon species are classified as Endangered or Critically Endangered, making them an important target for conservation efforts (<xref ref-type="bibr" rid="ref56">IUCN, 2022</xref>). Gibbons are often difficult to observe visually in the forest canopy but relatively easy to detect acoustically (<xref ref-type="bibr" rid="ref85">Mitani, 1985</xref>), which makes them ideal candidates for PAM. Indeed, many early studies relied on human observers listening to calling gibbons to estimate group density using fixed-point counts (<xref ref-type="bibr" rid="ref11">Brockelman and Srikosamatara, 1993</xref>; <xref ref-type="bibr" rid="ref47">Hamard et al., 2010</xref>; <xref ref-type="bibr" rid="ref90">Phoonjampa et al., 2011</xref>; <xref ref-type="bibr" rid="ref64">Kidney et al., 2016</xref>). To date, relatively few gibbon species have been monitored using PAM, including the Hainan gibbon in China (<xref ref-type="bibr" rid="ref35">Dufourq et al., 2021</xref>), yellow-cheeked gibbons in Vietnam (<xref ref-type="bibr" rid="ref119">Vu and Tran, 2019</xref>, <xref ref-type="bibr" rid="ref120">2020</xref>), and Northern grey gibbons (<italic>Hylobates funereus</italic>) on Malaysian Borneo (<xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>). However, this will undoubtedly change over the next few years with increased interest and accessibility of equipment and analytical tools needed for effective PAM of gibbon species across Southeast Asia.</p>
<p>Most gibbon species have two types of long-distance vocalizations: male solos, which are emitted by males vocalizing alone, and duets, which are coordinated vocal exchanges between the adult male and female of a pair (<xref ref-type="bibr" rid="ref23">Cowlishaw, 1992</xref>, <xref ref-type="bibr" rid="ref24">1996</xref>). Gibbons generally call in the early morning, with male solos starting earlier than duets (<xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>). In the current paper, we focused our analysis on a call type in the female contribution to the duet, known as the great call, for two reasons. First, the great call is highly stereotyped, individually distinct (<xref ref-type="bibr" rid="ref118">Terleph et al., 2015</xref>; <xref ref-type="bibr" rid="ref15">Clink et al., 2017</xref>), and of longer duration than other gibbon vocalizations, and males tend to be silent during the female great call, all of which facilitate automated detection. Second, most acoustic density estimation techniques focus on duets, as females rarely sing if they are not in a mated pair (<xref ref-type="bibr" rid="ref84">Mitani, 1984</xref>). In contrast, males will solo whether they are in a mated pair or unpaired (<xref ref-type="bibr" rid="ref11">Brockelman and Srikosamatara, 1993</xref>), which means that automated detection of the female call is more relevant for density estimation using PAM (<xref ref-type="bibr" rid="ref64">Kidney et al., 2016</xref>). Northern grey gibbon females emit individually distinct calls (<xref ref-type="bibr" rid="ref15">Clink et al., 2017</xref>, <xref ref-type="bibr" rid="ref16">2018a</xref>), and these calls can be discriminated well using both supervised and unsupervised methods (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>).</p>
</sec>
<sec id="sec9">
<title>Individual vocal signatures and PAM</title>
<p>A major hurdle in the implementation of many PAM applications is the fact that individual identity is unknown, as data are collected in the absence of a human observer. In particular, density estimation using PAM data would greatly benefit from the ability to infer the number of individuals in the survey area from acoustic data (<xref ref-type="bibr" rid="ref111">Stevenson et al., 2015</xref>). Individual identity can sometimes be inferred from the location of the calling animal, but precise acoustic localization, which relies on the time difference of arrival (TDOA) of a signal at multiple autonomous recording units, can be logistically and analytically challenging (<xref ref-type="bibr" rid="ref500">Wijers et al., 2021</xref>). Individual identity can also be inferred from acoustic data through individually distinct vocal signatures, which have been identified across a diverse range of taxonomic groups (<xref ref-type="bibr" rid="ref26">Darden et al., 2003</xref>; <xref ref-type="bibr" rid="ref42">Gillam and Chaverri, 2012</xref>; <xref ref-type="bibr" rid="ref63">Kershenbaum et al., 2013</xref>; <xref ref-type="bibr" rid="ref37">Favaro et al., 2016</xref>). Most studies investigating individual signatures use supervised methods, wherein the identity of the calling individual is known, but see <xref ref-type="bibr" rid="ref99">Sainburg et al. (2020)</xref> for unsupervised applications to individual vocal signatures. Identifying the number of individuals based on acoustic differences in PAM data remains a challenge, as unsupervised approaches must be used: the data are, by definition, collected in the absence of human observers (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>; <xref ref-type="bibr" rid="ref98">Sadhukhan et al., 2021</xref>).</p>
</sec>
<sec id="sec10">
<title>Overview of the automated detection/classification workflow</title>
<p>This workflow complements existing R packages for acoustic analysis, such as tuneR (<xref ref-type="bibr" rid="ref71">Ligges et al., 2016</xref>), seewave (<xref ref-type="bibr" rid="ref113">Sueur et al., 2008</xref>), warbleR (<xref ref-type="bibr" rid="ref2">Araya-Salas and Smith-Vidaurre, 2017</xref>), and monitoR (<xref ref-type="bibr" rid="ref61">Katz et al., 2016b</xref>), and contributes functionality for automated detection and classification using support vector machine (SVM; <xref ref-type="bibr" rid="ref82">Meyer et al., 2017</xref>) and random forest (RF; <xref ref-type="bibr" rid="ref70">Liaw and Wiener, 2002</xref>) algorithms. Automated detection of signals in this workflow follows nine main steps: (1) create labeled training, validation, and test datasets; (2) identify potential sound events using a band-limited energy detector; (3) perform data reduction and feature extraction on the sound events using Mel-frequency cepstral coefficients (MFCCs; <xref ref-type="bibr" rid="ref48">Han et al., 2006</xref>; <xref ref-type="bibr" rid="ref86">Muda et al., 2010</xref>); (4) train machine learning algorithms on the training dataset; (5) classify the sound events in the validation dataset using the trained machine learning algorithms and calculate performance metrics on the validation dataset to find optimal settings; (6) use a manually labeled test dataset to evaluate model performance; (7) run the detector/classifier over the entire dataset (once the optimal settings have been identified); (8) verify all detections and remove false positives; and (9) use the validated output from the detector/classifier for inference (<xref rid="fig1" ref-type="fig">Figure 1</xref>).</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Schematic of automated detection/classification workflow presented in the current study. See the text for details about each step.</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g001.tif"/>
</fig>
<p>When training the system, it is important to use data that will not be used in the subsequent testing phase, as this may artificially inflate accuracy estimates (<xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>). Creating labeled datasets and subsequent validation of detections to remove false positives requires substantial input and investment by trained analysts; this is the case for all automated detection approaches, even those that utilize sophisticated deep learning approaches. In addition, automated approaches generally require substantial investment in modifying and tuning parameters to identify optimal settings. Therefore, although automated approaches substantially reduce processing time relative to manual review, they still require high levels of human investment throughout the process.</p>
</sec>
<sec id="sec11">
<title>Objectives</title>
<p>We have three main objectives with this manuscript. Although more sophisticated methods of automated detection that utilize deep learning exist (e.g., <xref ref-type="bibr" rid="ref35">Dufourq et al., 2021</xref>, <xref ref-type="bibr" rid="ref34">2022</xref>; <xref ref-type="bibr" rid="ref124">Wang et al., 2022</xref>), these methods generally require substantial training datasets and are not readily available for users of the R programming environment (<xref ref-type="bibr" rid="ref92">R Core Team, 2022</xref>); however, see <xref ref-type="bibr" rid="ref107">Silva et al. (2022)</xref> for a comprehensive deep-learning R package that relies heavily on Python. We aim to provide an open-source, step-by-step workflow for the automated detection and classification of Northern grey gibbon (<italic>H. funereus</italic>; hereafter gibbons) female calls using readily available machine learning algorithms in the R programming environment. The results of our study will provide an important benchmark for automated detection/classification applications for gibbon female great calls. We also test whether a post-processing step that utilizes unsupervised clustering can improve the performance of our system, namely whether this approach can further differentiate between true and false positives. Lastly, as there have been relatively few studies of gibbons that utilize automated detection methods to address a well-defined research question (but see <xref ref-type="bibr" rid="ref35">Dufourq et al., 2021</xref> for an example on Hainan gibbons), we aim to show how PAM can be used to address two different research questions. Specifically, we ask: (1) can we use unsupervised approaches to estimate how many female individuals (or clusters) there are in our study area, and (2) can this approach be used to investigate temporal patterns of gibbon calling activity?
We utilized affinity propagation clustering to estimate the number of females (or clusters) in our dataset (<xref ref-type="bibr" rid="ref33">Dueck, 2009</xref>). This unsupervised clustering algorithm has been shown to be useful for identifying the number of gibbon females in a labeled dataset (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>). To investigate temporal patterns of calling activity, we compared estimates derived from our automated system to those obtained from manual annotations of long-term spectral averages (LTSAs) by a human observer (<xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>).</p>
</sec>
</sec>
<sec id="sec12" sec-type="materials|methods">
<title>Materials and methods</title>
<sec id="sec13">
<title>Data collection</title>
<p>Acoustic data were collected using first-generation Swift recorders (<xref ref-type="bibr" rid="ref65">Koch et al., 2016</xref>) developed by the K. Lisa Yang Center for Conservation Bioacoustics at the Cornell Lab of Ornithology. The sensitivity of the microphones was &#x2212;44 (+/&#x2212;3) dB re 1&#x2009;V/Pa. The microphone&#x2019;s frequency response was not measured but is assumed to be flat (+/&#x2212; 2&#x2009;dB) in the frequency range 100&#x2009;Hz to 7.5&#x2009;kHz. The analog signal was amplified by 40&#x2009;dB and digitized (16-bit resolution) using an analog-to-digital converter (ADC) with a clipping level of &#x2212;/+ 0.9&#x2009;V. Recordings were saved as consecutive two-hour Waveform Audio File Format (WAV) files, each approximately 230&#x2009;MB in size. We recorded using a sampling rate of 16&#x2009;kHz, giving a Nyquist frequency of 8&#x2009;kHz, which is well above the range of the fundamental frequency of Northern grey gibbon calls (0.5 to 1.6&#x2009;kHz). We deployed eleven Swift autonomous recording units spaced on a 750-m grid encompassing an area of approximately 3&#x2009;km<sup>2</sup> in the Danum Valley Conservation Area, Sabah, Malaysia (4&#x00B0;57&#x2032;53.00&#x2033;N, 117&#x00B0;48&#x2032;18.38&#x2033;E) from February 13&#x2013;April 21, 2018. We attached recorders to trees at approximately 2-m height and recorded continuously (24&#x2009;h/day).</p>
<p>Source height (<xref ref-type="bibr" rid="ref28">Darras et al., 2016</xref>) and presumably recorder height can influence the detection range of the target signal, along with the frequency range of the signal, levels of ambient noise in the frequency range of interest, topography, and source level of the calling animal (<xref ref-type="bibr" rid="ref27">Darras et al., 2018</xref>). Given the monetary and logistical constraints for placing recorders in the canopy, we opted to place the recorders at a lower height. Our estimated detection range is approximately 500 meters using the settings described below (<xref ref-type="bibr" rid="ref21">Clink and Klinck, 2019</xref>), and future work investigating the effect of recorder height on detection range will be informative. Danum Valley Conservation Area encompasses approximately 440&#x2009;km<sup>2</sup> of lowland dipterocarp forest and is considered &#x2018;aseasonal&#x2019; as it does not have distinct wet and dry seasons like many tropical forest regions (<xref ref-type="bibr" rid="ref122">Walsh and Newbery, 1999</xref>). Gibbons are less likely to vocalize if there was rain the night before, although rain appears to have a stronger influence on male solos than coordinated duets (<xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>). The reported group density of gibbons in the Danum Valley Conservation Area is ~4.7 per km<sup>2</sup> (<xref ref-type="bibr" rid="ref49">Hanya and Bernard, 2021</xref>), and the home range size of two groups was reported as 0.33 and 0.34&#x2009;km<sup>2</sup> (33 and 34&#x2009;ha; <xref ref-type="bibr" rid="ref55">Inoue et al., 2016</xref>).</p>
<p>We limited our analysis to recordings taken between 06:00 and 11:00 local time, as gibbons tend to restrict their calling to the early morning hours (<xref ref-type="bibr" rid="ref85">Mitani, 1985</xref>; <xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>), which resulted in a total of over 4,500&#x2009;h of recordings for the automated detection. See <xref ref-type="bibr" rid="ref20">Clink et al. (2020b)</xref> for a detailed description of the study design and <xref rid="fig2" ref-type="fig">Figure 2</xref> for a study area map. On average, the gibbon duets at this site are 15.1&#x2009;min long (range&#x2009;=&#x2009;1.6&#x2013;55.4&#x2009;min) (<xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>). The duets are composed of combinations of notes emitted by both the male and female, often with silent gaps of varying duration between the different components of the duet. The variability of note types and silent intervals in the duet would make training an automated detector/classifier to identify any component of the duet challenging, especially with limited training data. In addition, focusing on a particular call type within the longer vocalization is the established approach for automated detection/classification of gibbon vocalizations (<xref ref-type="bibr" rid="ref35">Dufourq et al., 2021</xref>). Therefore, our automated detection/classification approach focused on the female great call. See <xref rid="fig3" ref-type="fig">Figure 3</xref> for a representative spectrogram of a Northern grey gibbon duet and the female great calls within the duet.</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Map of recording locations of Swift autonomous recording units in Danum Valley Conservation Area, Sabah, Malaysia.</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g002.tif"/>
</fig>
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Representative spectrogram of a Northern grey gibbon duet recorded in Danum Valley Conservation Area, Sabah, Malaysia. The white bracket indicates a portion of the gibbon duet (also known as a bout), and the red boxes indicate unique great calls emitted by the gibbon female. The spectrogram was created using the Matlab-based program Triton (<xref ref-type="bibr" rid="ref125">Wiggins, 2003</xref>).</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g003.tif"/>
</fig>
</sec>
<sec id="sec14">
<title>Creating a labeled training dataset</title>
<p>It is necessary to validate automated detection and classification systems using different training and test datasets (<xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>). We randomly chose approximately 500&#x2009;h of recordings for our training dataset and used a band-limited energy detector (settings described below) to identify potential sounds of interest in the gibbon frequency range, which resulted in 1,439 unique sound events. These sound events were then annotated by a single observer (DJC), using a custom-written function in R to visualize the spectrograms, into the following categories: great argus pheasant (<italic>Argusianus argus</italic>) long and short calls (<xref ref-type="bibr" rid="ref18">Clink et al., 2021</xref>), helmeted hornbill (<italic>Rhinoplax vigil</italic>), rhinoceros hornbill (<italic>Buceros rhinoceros</italic>), female gibbon, and a catch-all &#x201C;noise&#x201D; category. For simplicity of training the machine learning algorithms, we converted our training data into two categories, &#x201C;female gibbon&#x201D; and &#x201C;noise,&#x201D; and subsequently trained binary classifiers, although the classifiers can also handle multi-class classification problems. The binary noise class contained all signals that were not female gibbon great calls, including great argus pheasants and hornbills. To investigate how the number of training data samples influences our system&#x2019;s performance, we randomly subset our training data into batches of 10, 20, 40, 80, 160, 320, and 400 samples of each category (female gibbon and noise), with 10 iterations for each batch size. We were also interested in how the addition of high-quality focal recordings influenced performance, so we added 60 female gibbon calls collected during focal recordings from previous field seasons (<xref ref-type="bibr" rid="ref17">Clink et al., 2018b</xref>) to one set of training data. We compared the performance of the detection/classification system using these random subsets to its performance using the training dataset containing all training data samples (<italic>n</italic>&#x2009;=&#x2009;1,439) and the dataset with the extra female calls added.</p>
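The random subsetting scheme above can be sketched as follows (a Python illustration rather than the R implementation; `events` stands in for the pool of labeled sound events of one class, and the seed is arbitrary):

```python
import random

# Sketch of the subsetting scheme described above: draw batches of
# 10-400 labeled events per class, with 10 random iterations per
# batch size.
def training_subsets(events, sizes=(10, 20, 40, 80, 160, 320, 400),
                     n_iter=10, seed=1):
    rng = random.Random(seed)  # fixed seed for reproducibility
    for size in sizes:
        for _ in range(n_iter):
            yield size, rng.sample(events, size)  # sample without replacement

subsets = list(training_subsets(list(range(433))))  # e.g., 433 female calls
print(len(subsets))  # 7 batch sizes x 10 iterations = 70 training subsets
```

Each subset would then be paired with an equally sized "noise" subset before training.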
</sec>
<sec id="sec15">
<title>Sound event detection</title>
<p>Detectors are commonly used to isolate potential sound events of interest from background noise (<xref ref-type="bibr" rid="ref31">Delacourt and Wellekens, 2000</xref>; <xref ref-type="bibr" rid="ref29">Davy and Godsill, 2002</xref>; <xref ref-type="bibr" rid="ref72">Lu et al., 2003</xref>). In this workflow, we identified potential sound events based on band-limited energy summation (<xref ref-type="bibr" rid="ref81">Mellinger et al., 2016</xref>). For the band-limited energy detector (BLED), we first converted the 2-h recordings to a spectrogram (made with a 1,600-point (100&#x2009;ms) Hamming window (3&#x2009;dB bandwidth&#x2009;=&#x2009;13&#x2009;Hz), with 0% overlap and a 2,048-point DFT) using the package &#x201C;seewave&#x201D; (<xref ref-type="bibr" rid="ref113">Sueur et al., 2008</xref>). We filtered the spectrogram to the frequency range of interest (in the case of Northern grey gibbons, 0.5&#x2013;1.6&#x2009;kHz). For each non-overlapping time window, we calculated the sum of the energy across frequency bins, which resulted in a single value for each 100&#x2009;ms time window. We then used the &#x201C;quantile&#x201D; function in base R to calculate the threshold value separating signal from noise. In early experiments with different quantile values, we found that the 15th quantile gave the best recall for our signal of interest. We then considered any events that lasted 5&#x2009;s or longer to be detections. Note that the settings for the band-limited energy detector, the MFCCs, and the machine learning algorithms can all be modified; we tuned the detector and MFCC settings as independent steps. Because modifying the quantile value and the minimum duration of detections influenced the performance of our system, we suggest that practitioners adopting this method experiment with these settings for their own study systems.</p>
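The detector logic described above can be sketched compactly (a Python/NumPy illustration, not the R implementation used in this study; the energy values are invented): sum the in-band energy per 100-ms window, threshold at a chosen quantile of those sums, and keep contiguous runs lasting at least 5 s.

```python
import numpy as np

# Sketch of a band-limited energy detector: `band_energy` holds the
# summed 0.5-1.6 kHz spectrogram energy for each 100-ms time window.
# The 0.15 quantile default follows the setting reported in the text.
def bled(band_energy, quantile=0.15, win_s=0.1, min_dur_s=5.0):
    """Return (start_window, end_window) index pairs of detected events."""
    threshold = np.quantile(band_energy, quantile)
    above = band_energy > threshold
    events, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                      # event onset
        elif not flag and start is not None:
            if (i - start) * win_s >= min_dur_s:
                events.append((start, i))  # long enough: keep it
            start = None
    if start is not None and (len(above) - start) * win_s >= min_dur_s:
        events.append((start, len(above)))
    return events

energy = np.array([0.1] * 50 + [5.0] * 60 + [0.1] * 50)  # one 6-s burst
print(bled(energy))  # [(50, 110)]
```

Multiplying the window indices by the window duration converts the detections back to seconds within the recording.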
</sec>
<sec id="sec16">
<title>Supervised classification</title>
<p>We were interested in testing the performance of two secondary classifiers &#x2014; support vector machine (SVM) and random forest (RF) &#x2014; for classifying our detected sound events. To train the classifiers, we used the training datasets outlined above and calculated Mel-frequency cepstral coefficients (MFCCs) for each of the labeled sound events using the R package &#x201C;tuneR&#x201D; (<xref ref-type="bibr" rid="ref71">Ligges et al., 2016</xref>). We calculated MFCCs over the fundamental frequency range of female gibbon calls (0.5&#x2013;1.6&#x2009;kHz); we focused on the fundamental frequency range because harmonics are generally not visible in the recordings unless the animals were very close to the recording units. As the duration of sound events is variable, and machine learning classification approaches require feature vectors of equal length, we averaged MFCCs over time windows. First, we divided each sound event into 8 evenly spaced time windows (with the actual length of each window varying based on the duration of the event) and calculated 12 MFCCs along with the delta coefficients for each time window (<xref ref-type="bibr" rid="ref71">Ligges et al., 2016</xref>). We appended the duration of the event onto the MFCC vector, resulting in a vector of length 177 for each sound event. We then used the &#x201C;e1071&#x201D; package (<xref ref-type="bibr" rid="ref82">Meyer et al., 2017</xref>) to train an SVM and the &#x201C;randomForest&#x201D; package (<xref ref-type="bibr" rid="ref70">Liaw and Wiener, 2002</xref>) to train an RF. Each algorithm assigned each sound event to a class (&#x201C;female gibbon&#x201D; or &#x201C;noise&#x201D;) and returned an associated probability. For the SVM, we set &#x201C;cross&#x2009;=&#x2009;25,&#x201D; meaning that we used 25-fold cross-validation, set the kernel to &#x201C;radial,&#x201D; and used the &#x201C;tune&#x201D; function to find optimal settings for the cost and gamma parameters. For the random forest algorithm, we used the default settings apart from setting the number of trees&#x2009;=&#x2009;10,000.</p>
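The fixed-length feature construction can be sketched as follows (a Python illustration, not the tuneR call; the per-frame coefficients are placeholders for real MFCC and delta values, and the toy frames use only two coefficients per frame):

```python
# Sketch: collapse a variable number of per-frame coefficient vectors
# into n_windows evenly spaced window means, then append the event
# duration, so every sound event yields a feature vector of equal length.

def fixed_length_features(frames, duration_s, n_windows=8):
    """frames: list of per-frame coefficient lists (variable frame count)."""
    n = len(frames)
    features = []
    for w in range(n_windows):
        lo = w * n // n_windows                     # window start frame
        hi = max((w + 1) * n // n_windows, lo + 1)  # at least one frame
        window = frames[lo:hi]
        for c in range(len(window[0])):
            # mean of coefficient c across this window's frames
            features.append(sum(f[c] for f in window) / len(window))
    features.append(duration_s)  # append event duration as final feature
    return features

# 20 frames of 2 toy coefficients -> 8 windows x 2 means + duration = 17
vec = fixed_length_features([[1.0, 2.0]] * 20, 11.0)
print(len(vec))  # 17
```

With real per-window MFCC and delta coefficients, the same construction yields the equal-length vectors passed to the SVM and RF.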
</sec>
<sec id="sec17">
<title>Validation and test datasets</title>
<p>We annotated our validation and test datasets using a slightly different approach than we used for the training data. We did this because our system utilizes a band-limited energy detector: if we simply labeled the resulting clips (as we did with the training data), our performance metrics would not account for detections missed by the detector in the first place. Therefore, to create our test and validation datasets, one observer (DJC) manually annotated 48 randomly chosen hours of recordings taken from different recorders and times across our study site using spectrograms created in Raven Pro 1.6. Twenty-four hours were used for validation, and the remaining 24&#x2009;h were used as a test dataset to report the final performance metrics of the system. For each sound file, we identified the start and end times of all female gibbon vocalizations. We also labeled calls as high quality (wherein the full structure of the call was visible in the spectrogram and there were no overlapping bird calls or other background noises) or low quality (wherein the call was visible in the spectrogram but its full structure was not, or it overlapped with another calling animal or background noise). As the detector isolates sound events based on energy in a certain frequency band, the start time of a detection does not always align exactly with the annotated start time of the call; therefore, when calculating performance metrics, we considered sound events that started up to 4&#x2009;s before or 2&#x2009;s after an annotated start time to be a match.</p>
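The matching rule can be written out directly (a Python illustration; the start times in seconds are invented):

```python
# Sketch of the matching rule above: a detected event matches an
# annotated call if the detection starts no more than 4 s before and
# no more than 2 s after the annotated start time.

def is_match(detection_start, annotation_start, before_s=4.0, after_s=2.0):
    return (annotation_start - before_s
            <= detection_start
            <= annotation_start + after_s)

def matched_annotations(detection_starts, annotation_starts):
    """Indices of annotated calls matched by at least one detection."""
    return {j for j, ann in enumerate(annotation_starts)
            for det in detection_starts if is_match(det, ann)}

print(matched_annotations([116.5, 300.0], [120.0, 45.0]))  # {0}
```

Recall then follows as the number of matched annotations divided by the total number of annotated calls.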
<p>We evaluated our system using five metrics, using the R package &#x2018;ROCR&#x2019; (<xref ref-type="bibr" rid="ref108">Sing et al., 2005</xref>) to calculate precision, recall, and false alarm rate. Because we were interested in how the performance of our classifiers varied as we changed the probability threshold, we also calculated the area under the precision-recall curve, which summarizes the trade-off between precision and recall at different probability thresholds, and the area under the receiver operating characteristic curve (AUC) for each machine learning algorithm and training dataset configuration. Lastly, we calculated the F1 score, as it integrates both precision and recall into a single metric.</p>
<p>We used a model selection approach to test for the effects of training data and machine learning algorithm on our performance metrics (AUC), so we created a series of two linear models using the R package &#x201C;lme4&#x201D; (<xref ref-type="bibr" rid="ref7">Bates et al., 2017</xref>). The first model we considered, the null model, had only AUC as the outcome, with no predictor variables. The second model, which we considered the full model, contained the machine learning algorithm (SVM or RF) and training data category as predictors. We compared the fit of the two models to our data using the Akaike information criterion implemented in the &#x201C;bbmle&#x201D; package (AICctab, which corrects for small sample sizes; <xref ref-type="bibr" rid="ref10">Bolker, 2014</xref>). We chose the settings that maximized AUC and the F1 score for the subsequent analysis of the full dataset (described below).</p>
</sec>
<sec id="sec18">
<title>Verification workflow</title>
<p>The optimal detector/classifier settings for our two main objectives were slightly different. For our first objective, wherein we wanted to compare patterns of vocal activity based on the output of our automated detector to patterns identified using human-annotated datasets (<xref ref-type="bibr" rid="ref20">Clink et al., 2020b</xref>), we aimed to maximize recall while also maintaining an acceptable number of false positives. In early tests, we found that using a smaller quantile threshold (0.15) for the BLED improved recall. One observer (IK) manually verified all detections using a custom function in R that allows observers to quickly view spectrograms and verify detections. Although duet bouts contain many great calls, we considered any hour in which at least one great call was detected to contain a duet. We then compared our results to those identified by a human observer and calculated the percentage of annotated duets that the automated system detected. To compare the two distributions, we used a Kolmogorov&#x2013;Smirnov test implemented with the &#x2018;ks.test&#x2019; function in R version 4.2.1 (<xref ref-type="bibr" rid="ref92">R Core Team, 2022</xref>). We first converted the times to &#x201C;Unix time&#x201D; (the number of seconds since 1970-01-01 00:00:00 UTC; <xref ref-type="bibr" rid="ref43">Grolemund and Wickham, 2011</xref>) so that we had continuous values for comparison. We used a non-parametric test because we did not assume a normal distribution of our data.</p>
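The comparison of calling-time distributions can be sketched as follows (in Python rather than R's `ks.test`; only the KS statistic is computed, not its p-value, and the timestamps are invented):

```python
from datetime import datetime, timezone

# Convert a detection time to Unix time (seconds since
# 1970-01-01 00:00:00 UTC) so call times become continuous values.
def to_unix(year, month, day, hour, minute):
    return datetime(year, month, day, hour, minute,
                    tzinfo=timezone.utc).timestamp()

# Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
# difference between the two empirical CDFs, evaluated over the
# pooled sample points.
def ks_statistic(a, b):
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

automated = [to_unix(2018, 2, 13, 6, m) for m in (5, 20, 40)]
manual = [to_unix(2018, 2, 13, 6, m) for m in (5, 20, 40)]
print(ks_statistic(automated, manual))  # 0.0 for identical samples
```

A statistic near 0 indicates that the automated and human-annotated calling-time distributions are similar; values near 1 indicate they barely overlap.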
<p>For the objective wherein we used unsupervised clustering to quantify the number of females (clusters) in our dataset, we needed higher quality calls in terms of signal-to-noise ratio (SNR) and overall structure. This is because the use of MFCCs as features for discriminating among individuals is highly dependent on SNR (<xref ref-type="bibr" rid="ref110">Spillmann et al., 2017</xref>). For this objective, we manually omitted all detections that did not follow the species-specific structure with longer introductory notes that transition into rapidly repeating trill notes and only used detections with a probability &#x003E;0.99 as assigned by the SVM (<xref ref-type="bibr" rid="ref15">Clink et al., 2017</xref>).</p>
</sec>
<sec id="sec19">
<title>Unsupervised clustering</title>
<p>We used unsupervised clustering to investigate clustering tendencies in: (A) the verified detections containing true and false positives after running the detector/classifier over our entire dataset; and (B) female calls that follow the species-specific structure of the great call, wherein different clusters may reflect different individuals. We used affinity propagation clustering, a state-of-the-art unsupervised approach (<xref ref-type="bibr" rid="ref33">Dueck, 2009</xref>) that has been used successfully in a few bioacoustics applications, including anomaly detection in a forest environment (<xref ref-type="bibr" rid="ref103">Sethi et al., 2020</xref>) and clustering of female gibbon calls of known identity (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>). Our previous work showed that, of three unsupervised algorithms compared, affinity propagation clustering returned the number of clusters that most closely matched the number of known female individuals in our dataset (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>). The input preference of the affinity propagation clustering algorithm controls the number of clusters returned. We initially used an adaptive approach wherein we varied the input preference from 0.1 to 1 in increments of 0.1 (indicated by &#x201C;q&#x201D; in the &#x201C;APCluster&#x201D; R package; <xref ref-type="bibr" rid="ref9">Bodenhofer et al., 2011</xref>) and calculated silhouette coefficients using the &#x201C;cluster&#x201D; package (<xref ref-type="bibr" rid="ref77">Maechler et al., 2019</xref>). We found that the optimal q identified in this manner led to an unreasonably high number of clusters for the true/false positives, so we set q&#x2009;=&#x2009;0.1, resulting in fewer clusters.</p>
<p>We input an MFCC vector for each sound event into the affinity propagation clustering algorithm. For the true/false positives, we calculated the MFCCs slightly differently than outlined above, as fewer features resulted in better clustering. Instead of creating a standardized number of time windows for each event, we calculated MFCCs for each sound event using the default settings (wintime&#x2009;=&#x2009;0.025, hoptime&#x2009;=&#x2009;0.01, and numcep&#x2009;=&#x2009;12). We then took the mean and standard deviation for each Mel-frequency band and the delta coefficients, resulting in 48 unique values for each sound event. We also included the duration of the signal. For the true and false positive detections, we used normalized mutual information (NMI) as an external validation measure implemented in the &#x2018;aricode&#x2019; package (<xref ref-type="bibr" rid="ref13">Chiquet and Rigaill, 2019</xref>). NMI provides a value between 0 and 1, with 1 indicating a perfect match between two sets of labels (<xref ref-type="bibr" rid="ref126">Xuan et al., 2010</xref>). For clustering of the high-quality female calls, we used the adaptive approach to find the optimal value of q. We used the standard number of MFCC windows approach as outlined above.</p>
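The external validation measure can be sketched from first principles (a Python illustration rather than the R 'aricode' package; the label vectors are invented):

```python
import math
from collections import Counter

# Sketch of normalized mutual information (NMI) between two label
# vectors: mutual information scaled by the geometric mean of the two
# label entropies, so 1 = identical partitions and 0 = independent ones.
def nmi(labels_a, labels_b):
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in joint.items())
    h_a = -sum((c / n) * math.log(c / n) for c in pa.values())
    h_b = -sum((c / n) * math.log(c / n) for c in pb.values())
    denom = math.sqrt(h_a * h_b)
    return mi / denom if denom else 1.0

clusters = [0, 0, 1, 1]           # e.g., affinity propagation output
truth = ["TP", "TP", "FP", "FP"]  # verified true/false-positive labels
print(round(nmi(clusters, truth), 6))  # 1.0: clusters match the labels
```

An NMI near 1 would indicate that the unsupervised clusters separate true from false positives almost perfectly, while a value near 0 would indicate the clusters carry no information about detection validity.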
<p>To visualize clustering in our dataset, we used uniform manifold approximation and projection (UMAP) implemented in the R package &#x2018;umap&#x2019; (<xref ref-type="bibr" rid="ref66">Konopka, 2020</xref>). UMAP is a dimensionality reduction and visualization approach that has been used to visualize differences in forest soundscapes (<xref ref-type="bibr" rid="ref103">Sethi et al., 2020</xref>), taxonomic groups of neotropical birds (<xref ref-type="bibr" rid="ref88">Parra-Hern&#x00E1;ndez et al., 2020</xref>), and female gibbon great calls (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>).</p>
</sec>
</sec>
<sec id="sec20">
<title>Data availability</title>
<p>A tutorial, annotated code, and all data needed to recreate figures presented in the manuscript are available on GitHub.<xref rid="fn0004" ref-type="fn"><sup>1</sup></xref> Access to raw sound files used for training and testing can be granted by request to the corresponding author.</p>
</sec>
<sec id="sec21" sec-type="results">
<title>Results</title>
<sec id="sec22">
<title>Training data and algorithm influence performance</title>
<p>The classification accuracy on the training dataset containing all samples was 98.82% for SVM and 97.85% for RF. We found that the number of training data samples and the choice of machine learning algorithm substantially influenced the performance of our detector/classifier on the validation dataset (<xref rid="tab2" ref-type="table">Table 2</xref>). Using an AIC model selection approach, we found that the model with AUC as the outcome and machine learning algorithm and training data category as predictors performed much better than the null model (&#x0394;AICc&#x2009;=&#x2009;11.2; 100% of model weight). With AUC as the metric, SVM performed slightly better than RF, and performance stabilized once the number of training samples exceeded <italic>n</italic>&#x2009;=&#x2009;160 (<xref rid="fig4" ref-type="fig">Figure 4</xref>). The model with F1 score as the outcome and machine learning algorithm and training data category as predictors also performed much better than the null model (&#x0394;AICc&#x2009;=&#x2009;34,730.6; 100% of model weight; <xref rid="fig4" ref-type="fig">Figure 4</xref>). Again, SVM outperformed RF, but in this case the training dataset that contained all samples (<italic>n</italic>&#x2009;=&#x2009;433 female calls and <italic>n</italic>&#x2009;=&#x2009;1,006 noise events), or all samples plus extra female calls, performed best (<xref rid="fig4" ref-type="fig">Figure 4</xref>). The two algorithms also differed noticeably in F1 score across probability thresholds (<xref rid="fig5" ref-type="fig">Figure 5</xref>): SVM performed better at higher probability thresholds, whereas RF reached its highest F1 value at a probability threshold of 0.60. We therefore used the SVM algorithm with all training samples for our full analysis. We used the 24-h test dataset to calculate the final performance metrics of our system: the highest F1 score (0.78) occurred at a probability threshold of 0.90, where precision was 0.88 and recall was 0.71.</p>
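<p>The threshold sweep described above can be sketched as follows (an illustrative Python version of the evaluation logic; the paper's metrics were computed in R with the &#x2018;ROCR&#x2019; package, and the function names here are ours):</p>

```python
def precision_recall_f1(scores, labels, threshold):
    """Precision, recall, and F1 when events scoring >= threshold are called positive."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_threshold(scores, labels, thresholds):
    """Return (threshold, precision, recall, f1) for the threshold with the highest F1."""
    return max(
        ((t, *precision_recall_f1(scores, labels, t)) for t in thresholds),
        key=lambda row: row[3],
    )
```

<p>Applied to each algorithm's detection scores, this identifies the probability threshold with the highest F1 score, mirroring the comparison in Figure 5.</p>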
<table-wrap position="float" id="tab2">
<label>Table 2</label>
<caption>
<p>Summary of precision, recall, F1, and area under the curve (AUC) calculated using the validation dataset for random subsets of training data compared to the full training dataset and the full dataset augmented with female great calls.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="middle"><sans-serif>Training data</sans-serif></th>
<th align="left" valign="middle"><sans-serif>Algorithm</sans-serif></th>
<th align="center" valign="middle"><sans-serif>Precision (mean&#x2009;&#x00B1;&#x2009;sd)</sans-serif></th>
<th align="center" valign="middle"><sans-serif>Recall (mean&#x2009;&#x00B1;&#x2009;sd)</sans-serif></th>
<th align="center" valign="middle"><sans-serif>F1 (mean&#x2009;&#x00B1;&#x2009;sd)</sans-serif></th>
<th align="center" valign="middle"><sans-serif>AUC (mean&#x2009;&#x00B1;&#x2009;sd)</sans-serif></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="2"><italic><sans-serif>n</sans-serif></italic> <sans-serif>=&#x2009;10</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.96&#x2009;&#x00B1;&#x2009;0.07</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.3&#x2009;&#x00B1;&#x2009;0.18</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.45&#x2009;&#x00B1;&#x2009;0.2</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.76&#x2009;&#x00B1;&#x2009;0.01</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.95&#x2009;&#x00B1;&#x2009;0.08</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.41&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.58&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.79&#x2009;&#x00B1;&#x2009;0.01</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><italic><sans-serif>n</sans-serif></italic> <sans-serif>=&#x2009;20</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.97&#x2009;&#x00B1;&#x2009;0.04</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.32&#x2009;&#x00B1;&#x2009;0.18</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.48&#x2009;&#x00B1;&#x2009;0.21</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.74&#x2009;&#x00B1;&#x2009;0.05</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.96&#x2009;&#x00B1;&#x2009;0.08</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.43&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.59&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.73&#x2009;&#x00B1;&#x2009;0.31</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><italic><sans-serif>n</sans-serif></italic> <sans-serif>=&#x2009;40</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>1&#x2009;&#x00B1;&#x2009;0.03</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.35&#x2009;&#x00B1;&#x2009;0.18</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.52&#x2009;&#x00B1;&#x2009;0.21</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.72&#x2009;&#x00B1;&#x2009;0.03</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.96&#x2009;&#x00B1;&#x2009;0.05</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.52&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.68&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.73&#x2009;&#x00B1;&#x2009;0.05</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><italic><sans-serif>n</sans-serif></italic> <sans-serif>=&#x2009;80</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>1&#x2009;&#x00B1;&#x2009;0.03</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.37&#x2009;&#x00B1;&#x2009;0.18</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.53&#x2009;&#x00B1;&#x2009;0.21</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.73&#x2009;&#x00B1;&#x2009;0.03</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.95&#x2009;&#x00B1;&#x2009;0.05</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.59&#x2009;&#x00B1;&#x2009;0.18</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.73&#x2009;&#x00B1;&#x2009;0.17</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.77&#x2009;&#x00B1;&#x2009;0.03</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><italic><sans-serif>n</sans-serif></italic> <sans-serif>=&#x2009;160</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>1&#x2009;&#x00B1;&#x2009;0.04</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.41&#x2009;&#x00B1;&#x2009;0.2</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.58&#x2009;&#x00B1;&#x2009;0.21</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.74&#x2009;&#x00B1;&#x2009;0.02</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.94&#x2009;&#x00B1;&#x2009;0.06</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.68&#x2009;&#x00B1;&#x2009;0.14</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.77&#x2009;&#x00B1;&#x2009;0.11</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.77&#x2009;&#x00B1;&#x2009;0.01</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><italic><sans-serif>n</sans-serif></italic> <sans-serif>=&#x2009;320</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>1&#x2009;&#x00B1;&#x2009;0.04</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.46&#x2009;&#x00B1;&#x2009;0.2</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.63&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.76&#x2009;&#x00B1;&#x2009;0.01</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.93&#x2009;&#x00B1;&#x2009;0.06</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.71&#x2009;&#x00B1;&#x2009;0.11</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.79&#x2009;&#x00B1;&#x2009;0.08</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.81&#x2009;&#x00B1;&#x2009;0.01</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><italic><sans-serif>n</sans-serif></italic> <sans-serif>=&#x2009;400</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>1&#x2009;&#x00B1;&#x2009;0.04</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.48&#x2009;&#x00B1;&#x2009;0.2</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.65&#x2009;&#x00B1;&#x2009;0.21</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.76&#x2009;&#x00B1;&#x2009;0</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.92&#x2009;&#x00B1;&#x2009;0.06</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.71&#x2009;&#x00B1;&#x2009;0.13</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.79&#x2009;&#x00B1;&#x2009;0.11</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.82&#x2009;&#x00B1;&#x2009;0.01</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><sans-serif>All</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>1&#x2009;&#x00B1;&#x2009;0.02</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.34&#x2009;&#x00B1;&#x2009;0.19</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.51&#x2009;&#x00B1;&#x2009;0.22</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.76&#x2009;&#x00B1;&#x2009;NA</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><bold><sans-serif>SVM</sans-serif></bold></td>
<td align="center" valign="top"><bold><sans-serif>0.94&#x2009;&#x00B1;&#x2009;0.05</sans-serif></bold></td>
<td align="center" valign="top"><bold><sans-serif>0.71&#x2009;&#x00B1;&#x2009;0.12</sans-serif></bold></td>
<td align="center" valign="top"><bold><sans-serif>0.8&#x2009;&#x00B1;&#x2009;0.09</sans-serif></bold></td>
<td align="center" valign="top"><bold><sans-serif>0.83&#x2009;&#x00B1;&#x2009;NA</sans-serif></bold></td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><sans-serif>All + F</sans-serif></td>
<td align="left" valign="top"><sans-serif>RF</sans-serif></td>
<td align="center" valign="top"><sans-serif>1&#x2009;&#x00B1;&#x2009;0.02</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.39&#x2009;&#x00B1;&#x2009;0.19</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.56&#x2009;&#x00B1;&#x2009;0.21</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.76&#x2009;&#x00B1;&#x2009;NA</sans-serif></td>
</tr>
<tr>
<td align="left" valign="top"><sans-serif>SVM</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.94&#x2009;&#x00B1;&#x2009;0.05</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.72&#x2009;&#x00B1;&#x2009;0.17</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.8&#x2009;&#x00B1;&#x2009;0.16</sans-serif></td>
<td align="center" valign="top"><sans-serif>0.83&#x2009;&#x00B1;&#x2009;NA</sans-serif></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Precision, recall, and F1 values reported are for probability thresholds &#x003E;0.50. Performance metrics were calculated using the &#x2018;ROCR&#x2019; package (<xref ref-type="bibr" rid="ref108">Sing et al., 2005</xref>). These metrics were used to determine which settings resulted in the best performance of the system. Bold text indicates the best-performing settings, which were used for subsequent analysis of our entire dataset.</p>
</table-wrap-foot>
</table-wrap>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Coefficient plots from the linear model with AUC (left) or F1 score (right) as the outcome and training data category and machine learning algorithm as predictors. Using AIC, we found that both models performed substantially better than the null model. For both coefficient plots, the reference training data category is <italic>n</italic>&#x2009;=&#x2009;160. We considered predictors to be reliable if their confidence intervals did not overlap zero. For AUC (left), training datasets smaller than <italic>n</italic>&#x2009;=&#x2009;160 had a slightly negative effect on AUC, whereas larger training datasets had a slightly positive effect. Note that these confidence intervals overlap zero, so the effects can be interpreted only as trends. The use of the SVM algorithm had a slightly positive effect on AUC. For the F1 score (right), the number of training samples had a reliable effect: with fewer than <italic>n</italic>&#x2009;=&#x2009;160 samples, the F1 score was lower, and with more samples, it was higher. SVM also had a reliably positive effect on the F1 score.</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g004.tif"/>
</fig>
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>F1 score for each machine learning algorithm (RF or SVM), probability threshold category, and training data category. Both algorithms had comparable performance in terms of F1 score, although the probability threshold with the highest F1 score differed. The dashed line indicates the highest F1 score (0.80) for both algorithms on the validation dataset.</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g005.tif"/>
</fig>
</sec>
<sec id="sec23">
<title>Comparison of an automated system to human annotations</title>
<p>We ran the SVM algorithm trained on all training data samples over our full dataset, resulting in 4,771 detections, of which 3,662 were true positives and 1,109 were false positives (precision&#x2009;=&#x2009;0.77). A histogram showing the distributions of automatically detected calls and manually annotated calls is shown in <xref rid="fig6" ref-type="fig">Figure 6</xref>. A Kolmogorov&#x2013;Smirnov test indicated that the two distributions were not significantly different (<italic>D</italic>&#x2009;=&#x2009;0.07, <italic>p</italic>&#x2009;&#x003E;&#x2009;0.05).</p>
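<p>The Kolmogorov&#x2013;Smirnov statistic used here is the largest absolute difference between the two empirical cumulative distribution functions. A minimal sketch (illustrative Python; in practice the test, including the <italic>p</italic>-value, would come from R's ks.test or an equivalent routine):</p>

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute difference
    between the empirical cumulative distribution functions of samples a and b."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))
```

<p>Identical samples give D&#x2009;=&#x2009;0 and fully separated samples give D&#x2009;=&#x2009;1, so a small D, as observed here, indicates closely matching call-time distributions.</p>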
<fig position="float" id="fig6">
<label>Figure 6</label>
<caption>
<p>Histogram showing the number of calls detected by time using the automated system (left) and manually annotated by a human observer (right). Note that the axes differ because the detections from the automated system are at the call level, whereas the annotations are at the bout level (and bouts comprise multiple calls). There was no statistically significant difference between the two distributions (see text for details).</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g006.tif"/>
</fig>
</sec>
<sec id="sec24">
<title>Unsupervised clustering</title>
<p>We used unsupervised clustering to investigate the tendency to cluster in true/false positives and high-quality female calls. For our first aim, we used affinity propagation clustering to differentiate between true and false positives after running our detection/classification system. Affinity propagation clustering did not effectively separate true from false positives, as the NMI score was close to zero (NMI&#x2009;=&#x2009;0.03). Although there were only two classes in our dataset (true and false positives), the clustering returned 53 distinct clusters. Supervised classification accuracy using SVM for true and false positives was ~95%. UMAP projections of the true and false positive detections are shown in <xref rid="fig7" ref-type="fig">Figure 7</xref>. For our second aim, we used affinity propagation clustering to investigate the tendency to cluster in the high-quality female calls detected by our system (<italic>n</italic>&#x2009;=&#x2009;194). Using adaptive affinity propagation clustering, we found that setting <italic>q</italic>&#x2009;=&#x2009;0.2 resulted in the highest silhouette coefficient (0.18) and returned ten unique clusters. UMAP projections of female calls are shown in <xref rid="fig8" ref-type="fig">Figure 8</xref>. Histograms indicating the number of calls from each recorder assigned to each cluster by the affinity propagation algorithm are shown in <xref rid="fig9" ref-type="fig">Figure 9</xref>.</p>
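<p>Affinity propagation represents each cluster by an exemplar, so validated calls can be described by their nearest exemplar and then tallied by recording unit, as in the Figure 9 histograms. A minimal sketch of that bookkeeping (illustrative Python with hypothetical helper names; the actual cluster assignments come from the &#x201C;APCluster&#x201D; output):</p>

```python
from collections import Counter

def assign_to_exemplars(features, exemplars):
    """Index of the nearest exemplar (squared Euclidean distance) for each feature vector.
    Affinity propagation represents each cluster by one exemplar call."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(exemplars)), key=lambda k: sqdist(f, exemplars[k]))
            for f in features]

def calls_per_cluster_and_recorder(assignments, recorders):
    """Counts of calls per (cluster, recorder) pair, as tallied in the histograms."""
    return Counter(zip(assignments, recorders))
```

<p>Tallying clusters against recorders in this way is what reveals whether a cluster is spatially localized (appearing on one or two adjacent recorders) or spread across the array.</p>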
<fig position="float" id="fig7">
<label>Figure 7</label>
<caption>
<p>UMAP projections indicating validated detections (left) and cluster assignment by affinity propagation clustering (right). Each point represents a detection, and the colors in the plot on the left indicate whether the detection was a true (T; indicated by the blue triangles) or false (F; indicated by the orange circles) positive. The colors in the plot on the right indicate which of the 53 clusters returned by the affinity propagation clustering algorithm the detection was assigned to.</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g007.tif"/>
</fig>
<fig position="float" id="fig8">
<label>Figure 8</label>
<caption>
<p>UMAP projections of Northern grey gibbon female calls. The location of each spectrogram indicates the UMAP projection of the call, and the border color indicates cluster assignment by affinity propagation clustering.</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g008.tif"/>
</fig>
<fig position="float" id="fig9">
<label>Figure 9</label>
<caption>
<p>Histograms showing the number of calls assigned to each cluster by the affinity propagation algorithm. Each panel indicates one of the clusters as assigned by affinity propagation clustering, the <italic>x</italic>-axis indicates the associated recording unit where the call was detected, and the <italic>y</italic>-axis indicates the number of calls for each cluster and recorder. The spectrograms shown exemplify each cluster assigned by the affinity propagation clustering algorithm.</p>
</caption>
<graphic xlink:href="fevo-11-1071640-g009.tif"/>
</fig>
</sec>
</sec>
<sec id="sec25" sec-type="discussions">
<title>Discussion</title>
<p>We show that, using open-source R packages, a detector and classifier can be developed with acceptable performance that exceeds that of previously published automated detectors/classifiers for primate calls [e.g., Diana monkey F1 based on reported precision and recall&#x2009;=&#x2009;65.62 (<xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>)]. However, the performance of this system (maximum F1 score&#x2009;=&#x2009;0.78) was below some reported deep learning approaches [e.g., F1 score&#x2009;=&#x2009;90.55 for Hainan gibbons (<xref ref-type="bibr" rid="ref35">Dufourq et al., 2021</xref>), F1 score&#x2009;=&#x2009;~87.5 for owl species (<xref ref-type="bibr" rid="ref97">Ruff et al., 2021</xref>), and F1 score&#x2009;=&#x2009;87.0 for bats (<xref ref-type="bibr" rid="ref107">Silva et al., 2022</xref>)]. In addition, we found that temporal patterns of calling based on our automated system matched those of the human annotation approach. We also tested whether an unsupervised approach (affinity propagation clustering) could help further distinguish true and false positive detections but found that the clustering results (<italic>n&#x2009;=</italic>&#x2009;53 clusters) did not differentiate true and false positives in any meaningful way. Visual inspection of the false positives indicated that many of them overlapped with calls of great argus pheasants or were other parts of the gibbon duet or solo. The majority of the false positives were male solo phrases, which contain rapidly repeating notes in the same frequency range as the female gibbon call. Lastly, we applied unsupervised clustering to a reduced dataset of validated detections of female calls that followed the species-specific call structure and found evidence for ten unique clusters. Inspection of the spatial distribution of the clusters indicates that they do not correlate with female identity.</p>
<sec id="sec26">
<title>Calls versus bouts</title>
<p>Our analysis focused on one call type within the gibbon duet: the female great call. We did this for practical reasons, as female calls tend to be stereotyped and follow a species- and sex-specific pattern. In addition, females rarely call alone, which means the presence of the female call can be used to infer the presence of a pair of gibbons. Also, most acoustic survey methods focus on the duet for the reasons described above, and generally, only data on the presence or absence of a duet bout at a particular time and location are needed (<xref ref-type="bibr" rid="ref11">Brockelman and Srikosamatara, 1993</xref>; <xref ref-type="bibr" rid="ref64">Kidney et al., 2016</xref>). When calculating the performance of our automated detection/classification system, we focused on the level of the call, as this is a common way to evaluate the performance of automated systems (<xref ref-type="bibr" rid="ref36">Dufourq et al., 2020</xref>). Finally, when comparing temporal patterns of calling behavior, we compared to an existing dataset of annotations at the level of the duet bout. We did this because annotating duet bouts using LTSAs is much more efficient than annotating each individual call for the entire dataset. However, for certain applications such as individual vocal signatures, the analysis necessarily focuses on individual calls within a bout.</p>
</sec>
<sec id="sec27">
<title>Comparison of ML algorithms</title>
<p>We found that SVM performed slightly better than RF in most metrics reported (except precision). However, RF had classification accuracy comparable to SVM on the training dataset (SVM accuracy&#x2009;=&#x2009;98.82% and RF accuracy&#x2009;=&#x2009;97.85% for all training data samples). The reduced validation performance of RF can be attributed to its substantially lower recall relative to SVM, despite RF often having higher precision (data summarized in <xref rid="tab2" ref-type="table">Table 2</xref>). The precision of SVM decreased slightly as we increased the number of training samples, which may be due to increased variability in the training data. The precision of RF did not decrease with more training samples, but RF recall remained low regardless of the amount of training data. These patterns are reflected in the differences in F1 scores across probability thresholds for the two algorithms.</p>
<p>The tolerable number of false positives, or the minimum tolerable recall of the system, will depend heavily on the research question. For example, when modeling occupancy, it may be important that no calls are missed, and hence a higher recall would be desirable. In contrast, for studies focused on the behavioral ecology of the calling animals (<xref ref-type="bibr" rid="ref19">Clink et al., 2020a</xref>,<xref ref-type="bibr" rid="ref20">b</xref>), it may be important for the detector to identify calls with few false positives, even if it misses many low signal-to-noise calls. Therefore, in cases where high precision is desired but recall is less important, RF may be a better choice. It is also possible that tuning the RF (as we did with SVM) may result in better performance. However, we did not do this, as it is generally agreed that RF works well using default hyperparameter values (<xref ref-type="bibr" rid="ref91">Probst et al., 2019</xref>).</p>
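<p>Choosing an operating point as described above can be made explicit: from a precision-recall curve, keep the threshold that maximizes recall subject to a minimum acceptable precision. A sketch with an invented helper name, not part of our published workflow:</p>

```python
def pick_operating_point(curve, min_precision):
    """From (threshold, precision, recall) rows, keep the row with the highest recall
    among those meeting a minimum precision; None if no row qualifies."""
    feasible = [row for row in curve if row[1] >= min_precision]
    return max(feasible, key=lambda row: row[2]) if feasible else None
```

<p>An occupancy study might set a low precision floor to maximize recall, whereas a behavioral study would raise the floor and accept the accompanying drop in recall.</p>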
</sec>
<sec id="sec28">
<title>Influence of training data</title>
<p>We found that the AUC and F1 metrics stabilized when using 160 or more training samples for each of the two classes (gibbon female and noise). However, using all training data, or data augmented with female calls, resulted in better F1 scores. The training datasets that contained all samples and added females were unbalanced and contained many more noise samples than female calls; including more diverse noise samples led to better performance in this system, and both RF and SVM handled the unbalanced datasets effectively. It is important to note that although performance stabilized when training with 160 calls or more, this number does not account for the additional calls needed for validation and testing. Therefore, the total number of calls or observations needed to effectively train and subsequently evaluate the performance of the system will be &#x003E;160. We realize that compiling a dataset of 160 or more calls for rare sound events from elusive species may be unrealistic. In our system, including as few as 40 calls allowed for acceptable performance (F1 score for SVM&#x2009;=&#x2009;0.70), so the approach could potentially be used successfully with a much smaller training dataset.</p>
<p>In addition, our training, validation, and test datasets came from different recording units, times of day, and multiple territories of different gibbon groups. Including 40 calls from the same recorder and the same individual would presumably not be as effective as including calls from different individuals and recording locations. A full discussion of the effective preparation of datasets for machine learning is outside the scope of the present paper, but readers are urged to think carefully about the preparation of acoustic datasets for automated detection and to include samples from a diverse set of recording locations, individuals, and times of day. Transfer learning, which applies convolutional neural networks pre-trained on one classification problem to a different problem, provides another alternative for small datasets, achieving F1 scores of up to 82% with small datasets (<xref ref-type="bibr" rid="ref34">Dufourq et al., 2022</xref>). Future work that compares the approach presented here with transfer learning will be highly informative.</p>
</sec>
<sec id="sec29">
<title>Unsupervised clustering to distinguish true/false positives</title>
<p>We did not find that affinity propagation clustering helped further differentiate true and false positives in our dataset, despite our being able to differentiate between the two classes using supervised methods with ~95% accuracy. As noted above, many of the false positives were phrases from male solos, and these phrases are highly variable in note order and note sequence (<xref ref-type="bibr" rid="ref19">Clink et al., 2020a</xref>), which may have led to the high number of clusters observed. The NMI score was close to zero, indicating a lack of accordance between the unsupervised cluster assignments and the true labels. These types of unsupervised approaches have been fruitful in distinguishing among many different types of acoustic signals, including soundscapes (<xref ref-type="bibr" rid="ref103">Sethi et al., 2020</xref>), bird species (<xref ref-type="bibr" rid="ref88">Parra-Hern&#x00E1;ndez et al., 2020</xref>), and gibbon individuals (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>). We extracted MFCCs for all sound events focusing on the frequency range relevant to female gibbon great calls; as detections were based on band-limited energy summation in this range, extracting MFCCs over it was a logical choice. In early experiments, we summarized the extracted MFCCs in different ways and slightly modified the frequency range, but these experiments did not lead to better separation of true and false positives. We therefore conclude that the combination of MFCCs and affinity propagation clustering is not an effective way to differentiate between true and false positives in our dataset. Different features may have led to different results, and embeddings from convolutional neural networks (e.g., <xref ref-type="bibr" rid="ref103">Sethi et al., 2020</xref>) or low-dimensional latent space projections learned from the spectrograms (<xref ref-type="bibr" rid="ref99">Sainburg et al., 2020</xref>) are promising future directions.</p>
</sec>
<sec id="sec30">
<title>Unsupervised clustering of validated gibbon female calls</title>
<p>The ability to distinguish between individuals based on their vocalizations is important for many PAM applications, and for population density estimation in particular (<xref ref-type="bibr" rid="ref3">Augustine et al., 2018</xref>, <xref ref-type="bibr" rid="ref4">2019</xref>). The home range size of two gibbon pairs in our population was previously reported to be about 0.34&#x2009;km<sup>2</sup> (34&#x2009;ha; <xref ref-type="bibr" rid="ref55">Inoue et al., 2016</xref>), but within gibbon populations home range size can vary substantially (<xref ref-type="bibr" rid="ref12">Cheyne et al., 2019</xref>), making it difficult to know exactly how many pairs were included in our study area. In another study, gibbon group density was reported as 4.7 groups per km<sup>2</sup>; the discrepancy between this value and the home range estimates of <xref ref-type="bibr" rid="ref55">Inoue et al. (2016)</xref> is presumably because the studies measured different parameters (density vs. home range) and because home ranges can overlap, even in territorial species. Therefore, based on conservative estimates of gibbon density and home range size, up to 12 pairs may occur in our 3&#x2009;km<sup>2</sup> study area.</p>
<p>Our unsupervised approach using affinity propagation clustering on high-quality female calls returned ten unique clusters. We previously showed that affinity propagation clustering consistently returned a number of clusters similar to the actual number of individuals in a different dataset (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>). However, inspection of the histograms in <xref rid="fig9" ref-type="fig">Figure 9</xref> shows that some clusters have strong spatial patterns (e.g., appearing on only a few recorders in close spatial proximity), whereas others appear on many recorders. In some cases, the same clusters appear on recorders that are &#x003E;1.5&#x2009;km apart, presumably a larger distance than the width of a gibbon home range, so it seems unlikely that these clusters are associated with female identity. When using unsupervised approaches, it is common practice to assign each cluster to the class that contains the highest number of observations, and we previously showed that affinity propagation clustering reliably returned a number of clusters matching the number of individuals in the dataset but often &#x2018;misclassified&#x2019; calls to the wrong cluster/individual (<xref ref-type="bibr" rid="ref22">Clink and Klinck, 2021</xref>). Importantly, that previous work was done on high-quality focal recordings with a substantial amount of preprocessing to ensure the calls were comparable (e.g., did not contain shorter introductory notes or overlap with the male). In the present study, we manually screened calls to ensure they followed the species-specific structure and were relatively high quality, but the limitations of PAM data (collected using omnidirectional, relatively inexpensive microphones at variable distances from the calling animals) may preclude effective unsupervised clustering of individuals.</p>
<p>We conclude that more work needs to be done before unsupervised methods can be used reliably to estimate the number of individuals in a study area. Our ability to use these approaches to recover the number of individuals is presently limited, particularly because little is known about the stability of individual vocal signatures over time (but see <xref ref-type="bibr" rid="ref38">Feng et al., 2014</xref>). Future work that uses labeled training datasets collected with PAM to train classifiers that can subsequently predict new individuals (e.g., an approach similar to that presented in <xref ref-type="bibr" rid="ref98">Sadhukhan et al., 2021</xref>) will further our ability to identify unknown individuals from PAM data.</p>
</sec>
<sec id="sec31">
<title>Generalizability of the system</title>
<p>Gibbon female calls are well suited for automated detection and classification: they are loud and highly stereotyped, and gibbon females tend to call often, emitting multiple calls during a single calling bout and thereby providing ample training data. Although gibbon female calls are individually distinct (<xref ref-type="bibr" rid="ref15">Clink et al., 2017</xref>, <xref ref-type="bibr" rid="ref16">2018a</xref>), the differences between individuals were not sufficient to preclude detection and classification using our system. Importantly, because gibbon female calls tend to be of longer duration (&#x003E;6&#x2009;s) than many other signals in the same frequency range, signal duration could be used as an effective metric to reject non-target signals. The generalizability of our methods to other systems/datasets will depend on a variety of conditions, in particular the signal-to-noise ratio of the call(s) of interest, the type and variability of background noise, the degree of stereotypy in the calls of interest, and the amount of training data that can be obtained. Future applications that apply this approach to other gibbon species, or that compare it with deep learning techniques, will be important next steps in determining the utility of automated detection approaches for other taxa.</p>
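The duration-based rejection step can be illustrated with a short R sketch; the detections table and its column names are hypothetical, not the ones used in the actual workflow, and only the 6-s cutoff comes from the text:

```r
# Hypothetical detection table: onset/offset times (s) of candidate events
detections <- data.frame(
  start = c(10.0, 52.3, 130.7, 301.2),
  end   = c(17.5, 54.1, 139.9, 303.0)
)
detections$duration <- detections$end - detections$start

# Reject signals shorter than the 6-s minimum typical of a female great call
min_duration <- 6
candidates <- detections[detections$duration > min_duration, ]
nrow(candidates)  # 2 of the 4 candidate events survive the duration filter
```

In practice such a filter is applied after the initial energy-based detection stage, before any feature extraction or classification.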
</sec>
<sec id="sec32">
<title>Future directions</title>
<p>Due to the three-step design of our automated detection, classification, and unsupervised clustering approach, modifying the system at various stages should be relatively straightforward. In particular, using MFCCs as features was a logical choice given how well MFCCs distinguish among gibbon calls [this paper and <xref ref-type="bibr" rid="ref16">Clink et al. (2018a)</xref>]. However, different feature sets may yield even better performance of the automated system; as mentioned above, embeddings from pre-trained convolutional neural networks are one possibility. In addition, the supervised classification algorithms included in our approach were not optimized; the RF algorithm, in particular, was implemented using the default values set by the algorithm developers, so further tuning and optimization may also improve performance. Lastly, this approach was developed using training, validation, and test data from a single site (Danum Valley Conservation Area); future work investigating the performance of this system at other locations with (presumably) different types of ambient noise will be informative.</p>
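As a concrete example of the tuning mentioned above, the &#x201C;randomForest&#x201D; R package (on which this workflow relies) defaults to mtry = floor(sqrt(p)) for classification; a minimal sketch of one possible tuning step, using the built-in iris data purely as a stand-in for the real MFCC training set:

```r
library(randomForest)

set.seed(42)
# Default fit, analogous to the untuned RF classifier used in this study
rf_default <- randomForest(Species ~ ., data = iris)

# One simple tuning step: search over mtry with the package's tuneRF helper,
# which compares out-of-bag error across candidate mtry values
tuned <- tuneRF(x = iris[, -5], y = iris$Species,
                ntreeTry = 500, stepFactor = 2, improve = 0.01,
                trace = FALSE, plot = FALSE)
tuned  # matrix of candidate mtry values and their out-of-bag error rates
```

More extensive tuning (e.g., a grid search over mtry and ntree with cross-validation) is equally possible, but even this one-parameter search illustrates how default settings can be revisited without changing the rest of the pipeline.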
</sec>
</sec>
<sec id="sec33" sec-type="conclusions">
<title>Conclusion</title>
<p>Here we highlight how the open-source R-programming environment can be used to process and visualize acoustic data collected using autonomous recorders that are often programmed to record continuously for long periods of time. Even the most sophisticated machine learning algorithms are never 100% accurate or precise and will return false positives or negatives (<xref ref-type="bibr" rid="ref6">Bardeli et al., 2010</xref>; <xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>; <xref ref-type="bibr" rid="ref62">Keen et al., 2017</xref>), which is also the case with human observers, but this is rarely quantified statistically (<xref ref-type="bibr" rid="ref51">Heinicke et al., 2015</xref>). We hope this relatively simple automated detection/classification approach will serve as a useful foundation for practitioners interested in automated acoustic analysis methods. We also show that unsupervised approaches need further work and refinement before they can be reliably used to distinguish between different data classes recorded using autonomous recording units. Given the importance of being able to distinguish among individuals for numerous types of PAM applications, this should be a high-priority area for future research.</p>
</sec>
<sec id="sec34" sec-type="data-availability">
<title>Data availability statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="sec35">
<title>Ethics statement</title>
<p>Institutional approval was provided by Cornell University (IACUC 2017-0098).</p>
</sec>
<sec id="sec36">
<title>Author contributions</title>
<p>DC, AA, and HK conceived the ideas and designed the methodology. DC and IK annotated and validated data. DC and IK led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.</p>
</sec>
<sec id="sec37" sec-type="funding-information">
<title>Funding</title>
<p>DC acknowledges the Fulbright ASEAN Research Award for U.S. Scholars for providing funding for the field research.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack>
<p>The authors thank the two reviewers who provided valuable feedback that greatly improved the manuscript. We thank the makers of the packages &#x201C;randomForest,&#x201D; &#x201C;e1071,&#x201D; &#x201C;seewave,&#x201D; &#x201C;signal,&#x201D; and &#x201C;tuneR,&#x201D; on which this workflow relies extensively. We thank Yoel Majikil for his assistance with data collection for this project.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anders</surname> <given-names>F.</given-names></name> <name><surname>Kalan</surname> <given-names>A. K.</given-names></name> <name><surname>K&#x00FC;hl</surname> <given-names>H. S.</given-names></name> <name><surname>Fuchs</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Compensating class imbalance for acoustic chimpanzee detection with convolutional recurrent neural networks</article-title>. <source>Eco. Inform.</source> <volume>65</volume>:<fpage>101423</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecoinf.2021.101423</pub-id></citation></ref>
<ref id="ref2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Araya-Salas</surname> <given-names>M.</given-names></name> <name><surname>Smith-Vidaurre</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>warbleR: an R package to streamline analysis of animal acoustic signals</article-title>. <source>Methods Ecol. Evol.</source> <volume>8</volume>, <fpage>184</fpage>&#x2013;<lpage>191</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.12624</pub-id></citation></ref>
<ref id="ref3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Augustine</surname> <given-names>B. C.</given-names></name> <name><surname>Royle</surname> <given-names>J. A.</given-names></name> <name><surname>Kelly</surname> <given-names>M. J.</given-names></name> <name><surname>Satter</surname> <given-names>C. B.</given-names></name> <name><surname>Alonso</surname> <given-names>R. S.</given-names></name> <name><surname>Boydston</surname> <given-names>E. E.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Spatial capture&#x2013;recapture with partial identity: an application to camera traps</article-title>. <source>Ann. Appl. Stat.</source> <volume>12</volume>, <fpage>67</fpage>&#x2013;<lpage>95</lpage>. doi: <pub-id pub-id-type="doi">10.1214/17-AOAS1091</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Augustine</surname> <given-names>B. C.</given-names></name> <name><surname>Royle</surname> <given-names>J. A.</given-names></name> <name><surname>Murphy</surname> <given-names>S. M.</given-names></name> <name><surname>Chandler</surname> <given-names>R. B.</given-names></name> <name><surname>Cox</surname> <given-names>J. J.</given-names></name> <name><surname>Kelly</surname> <given-names>M. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Spatial capture&#x2013;recapture for categorically marked populations with an application to genetic capture&#x2013;recapture</article-title>. <source>Ecosphere</source> <volume>10</volume>:<fpage>e02627</fpage>. doi: <pub-id pub-id-type="doi">10.1002/ecs2.2627</pub-id></citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balantic</surname> <given-names>C.</given-names></name> <name><surname>Donovan</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>AMMonitor: remote monitoring of biodiversity in an adaptive framework with r</article-title>. <source>Methods Ecol. Evol.</source> <volume>11</volume>, <fpage>869</fpage>&#x2013;<lpage>877</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13397</pub-id></citation></ref>
<ref id="ref6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bardeli</surname> <given-names>R.</given-names></name> <name><surname>Wolff</surname> <given-names>D.</given-names></name> <name><surname>Kurth</surname> <given-names>F.</given-names></name> <name><surname>Koch</surname> <given-names>M.</given-names></name> <name><surname>Tauchert</surname> <given-names>K. H.</given-names></name> <name><surname>Frommolt</surname> <given-names>K. H.</given-names></name></person-group> (<year>2010</year>). <article-title>Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring</article-title>. <source>Pattern Recogn. Lett.</source> <volume>31</volume>, <fpage>1524</fpage>&#x2013;<lpage>1534</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.patrec.2009.09.014</pub-id></citation></ref>
<ref id="ref7"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Bates</surname> <given-names>D.</given-names></name> <name><surname>Maechler</surname> <given-names>M.</given-names></name> <name><surname>Bolker</surname> <given-names>B. M.</given-names></name> <name><surname>Walker</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). lme4: Linear Mixed-Effects Models Using &#x201C;Eigen&#x201D; and S4. R package version 1.1-13. Available at: <ext-link xlink:href="http://keziamanlove.com/wp-content/uploads/2015/04/StatsInRTutorial.pdf" ext-link-type="uri">http://keziamanlove.com/wp-content/uploads/2015/04/StatsInRTutorial.pdf</ext-link> <ext-link xlink:href="https://cran.r-project.org/web/packages/lme4/lme4.pdf" ext-link-type="uri">https://cran.r-project.org/web/packages/lme4/lme4.pdf</ext-link> (Accessed January 23, 2023).</citation></ref>
<ref id="ref8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bjorck</surname> <given-names>J.</given-names></name> <name><surname>Rappazzo</surname> <given-names>B. H.</given-names></name> <name><surname>Chen</surname> <given-names>D.</given-names></name> <name><surname>Bernstein</surname> <given-names>R.</given-names></name> <name><surname>Wrege</surname> <given-names>P. H.</given-names></name> <name><surname>Gomes</surname> <given-names>C. P.</given-names></name></person-group> (<year>2019</year>). <article-title>Automatic detection and compression for passive acoustic monitoring of the african forest elephant</article-title>. <source>Proc. AAAI Conf. Artific. Intellig.</source> <volume>33</volume>, <fpage>476</fpage>&#x2013;<lpage>484</lpage>. doi: <pub-id pub-id-type="doi">10.1609/aaai.v33i01.3301476</pub-id></citation></ref>
<ref id="ref9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bodenhofer</surname> <given-names>U.</given-names></name> <name><surname>Kothmeier</surname> <given-names>A.</given-names></name> <name><surname>Hochreiter</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>APCluster: an R package for affinity propagation clustering</article-title>. <source>Bioinformatics</source> <volume>27</volume>, <fpage>2463</fpage>&#x2013;<lpage>2464</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btr406</pub-id>, PMID: <pub-id pub-id-type="pmid">21737437</pub-id></citation></ref>
<ref id="ref10"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Bolker</surname> <given-names>B. M.</given-names></name></person-group> (<year>2014</year>). bbmle: tools for general maximum likelihood estimation. Available at: <ext-link xlink:href="http://cran.stat.sfu.ca/web/packages/bbmle/" ext-link-type="uri">http://cran.stat.sfu.ca/web/packages/bbmle/</ext-link> (Accessed January 23, 2023).</citation></ref>
<ref id="ref11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brockelman</surname> <given-names>W. Y.</given-names></name> <name><surname>Srikosamatara</surname> <given-names>S.</given-names></name></person-group> (<year>1993</year>). <article-title>Estimation of density of gibbon groups by use of loud songs</article-title>. <source>Am. J. Primatol.</source> <volume>29</volume>, <fpage>93</fpage>&#x2013;<lpage>108</lpage>. doi: <pub-id pub-id-type="doi">10.1002/ajp.1350290203</pub-id>, PMID: <pub-id pub-id-type="pmid">31941194</pub-id></citation></ref>
<ref id="ref12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheyne</surname> <given-names>S. M.</given-names></name> <name><surname>Capilla</surname> <given-names>B. R.</given-names></name> <name><surname>K</surname> <given-names>A.</given-names></name> <name><surname>Supiansyah</surname></name> <name><surname>Adul</surname></name> <name><surname>Cahyaningrum</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Home range variation and site fidelity of Bornean southern gibbons [<italic>Hylobates albibarbis</italic>] from 2010&#x2013;2018</article-title>. <source>PLoS One</source> <volume>14</volume>, <fpage>e0217784</fpage>&#x2013;<lpage>e0217713</lpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0217784</pub-id>, PMID: <pub-id pub-id-type="pmid">31365525</pub-id></citation></ref>
<ref id="ref13"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Chiquet</surname> <given-names>J.</given-names></name> <name><surname>Rigaill</surname> <given-names>G.</given-names></name></person-group> (<year>2019</year>). Aricode: efficient computations of standard clustering comparison measures. Available at: <ext-link xlink:href="https://cran.r-project.org/package=aricode" ext-link-type="uri">https://cran.r-project.org/package=aricode</ext-link> (Accessed January 23, 2023).</citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clarke</surname> <given-names>E.</given-names></name> <name><surname>Reichard</surname> <given-names>U. H.</given-names></name> <name><surname>Zuberb&#x00FC;hler</surname> <given-names>K.</given-names></name></person-group> (<year>2006</year>). <article-title>The syntax and meaning of wild gibbon songs</article-title>. <source>PLoS One</source> <volume>1</volume>:<fpage>e73</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0000073</pub-id>, PMID: <pub-id pub-id-type="pmid">17183705</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Bernard</surname> <given-names>H.</given-names></name> <name><surname>Crofoot</surname> <given-names>M. C.</given-names></name> <name><surname>Marshall</surname> <given-names>A. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Investigating individual vocal signatures and small-scale patterns of geographic variation in female bornean gibbon (<italic>Hylobates muelleri</italic>) great calls</article-title>. <source>Int. J. Primatol.</source> <volume>38</volume>, <fpage>656</fpage>&#x2013;<lpage>671</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10764-017-9972-y</pub-id></citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Crofoot</surname> <given-names>M. C.</given-names></name> <name><surname>Marshall</surname> <given-names>A. J.</given-names></name></person-group> (<year>2018a</year>). <article-title>Application of a semi-automated vocal fingerprinting approach to monitor Bornean gibbon females in an experimentally fragmented landscape in Sabah, Malaysia</article-title>. <source>Bioacoustics</source> <volume>28</volume>, <fpage>193</fpage>&#x2013;<lpage>209</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09524622.2018.1426042</pub-id></citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Grote</surname> <given-names>M. N.</given-names></name> <name><surname>Crofoot</surname> <given-names>M. C.</given-names></name> <name><surname>Marshall</surname> <given-names>A. J.</given-names></name></person-group> (<year>2018b</year>). <article-title>Understanding sources of variance and correlation among features of Bornean gibbon (<italic>Hylobates muelleri</italic>) female calls</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>144</volume>, <fpage>698</fpage>&#x2013;<lpage>708</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.5049578</pub-id>, PMID: <pub-id pub-id-type="pmid">30180677</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Groves</surname> <given-names>T.</given-names></name> <name><surname>Ahmad</surname> <given-names>A. H.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>Not by the light of the moon: investigating circadian rhythms and environmental predictors of calling in Bornean great argus</article-title>. <source>PLoS One</source> <volume>16</volume>:<fpage>e0246564</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0246564</pub-id>, PMID: <pub-id pub-id-type="pmid">33592004</pub-id></citation></ref>
<ref id="ref19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Hamid Ahmad</surname> <given-names>A.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2020a</year>). <article-title>Brevity is not a universal in animal communication: evidence for compression depends on the unit of analysis in small ape vocalizations</article-title>. <source>R. Soc. Open Sci.</source> <volume>7</volume>. doi: <pub-id pub-id-type="doi">10.1098/rsos.200151</pub-id></citation></ref>
<ref id="ref20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Hamid Ahmad</surname> <given-names>A.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2020b</year>). <article-title>Gibbons aren&#x2019;t singing in the rain: presence and amount of rainfall influences ape calling behavior in Sabah, Malaysia</article-title>. <source>Sci. Rep.</source> <volume>10</volume>:<fpage>1282</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-020-57976-x</pub-id>, PMID: <pub-id pub-id-type="pmid">31992788</pub-id></citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>A case study on Bornean gibbons highlights the challenges for incorporating individual identity into passive acoustic monitoring surveys</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>146</volume>:<fpage>2855</fpage>. doi: <pub-id pub-id-type="doi">10.1121/1.5136908</pub-id></citation></ref>
<ref id="ref22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>Unsupervised acoustic classification of individual gibbon females and the implications for passive acoustic monitoring</article-title>. <source>Methods Ecol. Evol.</source> <volume>12</volume>, <fpage>328</fpage>&#x2013;<lpage>341</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13520</pub-id></citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cowlishaw</surname> <given-names>G.</given-names></name></person-group> (<year>1992</year>). <article-title>Song function in gibbons</article-title>. <source>Behaviour</source> <volume>121</volume>, <fpage>131</fpage>&#x2013;<lpage>153</lpage>. doi: <pub-id pub-id-type="doi">10.1163/156853992X00471</pub-id>, PMID: <pub-id pub-id-type="pmid">36195571</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cowlishaw</surname> <given-names>G.</given-names></name></person-group> (<year>1996</year>). <article-title>Sexual selection and information content in gibbon song bouts</article-title>. <source>Ethology</source> <volume>102</volume>, <fpage>272</fpage>&#x2013;<lpage>284</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1439-0310.1996.tb01125.x</pub-id></citation></ref>
<ref id="ref25"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Dahake</surname> <given-names>P. P.</given-names></name> <name><surname>Shaw</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>Speaker dependent speech emotion recognition using MFCC and support vector machine</article-title>. in <conf-name>International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT)</conf-name>, <fpage>1080</fpage>&#x2013;<lpage>1084</lpage>.</citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Darden</surname> <given-names>S. K.</given-names></name> <name><surname>Dabelsteen</surname> <given-names>T.</given-names></name> <name><surname>Pedersen</surname> <given-names>S. B.</given-names></name></person-group> (<year>2003</year>). <article-title>A potential tool for swift fox (<italic>Vulpes velox</italic>) conservation: individuality of long-range barking sequences</article-title>. <source>J. Mammal.</source> <volume>84</volume>, <fpage>1417</fpage>&#x2013;<lpage>1427</lpage>. doi: <pub-id pub-id-type="doi">10.1644/BEM-031</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Darras</surname> <given-names>K.</given-names></name> <name><surname>Furnas</surname> <given-names>B.</given-names></name> <name><surname>Fitriawan</surname> <given-names>I.</given-names></name> <name><surname>Mulyani</surname> <given-names>Y.</given-names></name> <name><surname>Tscharntke</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Estimating bird detection distances in sound recordings for standardizing detection ranges and distance sampling</article-title>. <source>Methods Ecol. Evol.</source> <volume>9</volume>, <fpage>1928</fpage>&#x2013;<lpage>1938</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13031</pub-id></citation></ref>
<ref id="ref28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Darras</surname> <given-names>K.</given-names></name> <name><surname>P&#x00FC;tz</surname> <given-names>P.</given-names></name> <name><surname>Rembold</surname> <given-names>K.</given-names></name> <name><surname>Tscharntke</surname> <given-names>T.</given-names></name></person-group> (<year>2016</year>). <article-title>Measuring sound detection spaces for acoustic animal sampling and monitoring</article-title>. <source>Biol. Conserv.</source> <volume>201</volume>, <fpage>29</fpage>&#x2013;<lpage>37</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.biocon.2016.06.021</pub-id>, PMID: <pub-id pub-id-type="pmid">25551564</pub-id></citation></ref>
<ref id="ref29"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Davy</surname> <given-names>M.</given-names></name> <name><surname>Godsill</surname> <given-names>S.</given-names></name></person-group> (<year>2002</year>). <article-title>Detection of abrupt spectral changes using support vector machines an application to audio signal segmentation</article-title>. <conf-name>2002 IEEE International Conference on Acoustics, Speech, and Signal Processing</conf-name> <publisher-loc>Orlando, FL</publisher-loc>: <publisher-name>IEEE</publisher-name>. <fpage>1313</fpage>&#x2013;<lpage>1316</lpage>.</citation></ref>
<ref id="ref30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deichmann</surname> <given-names>J. L.</given-names></name> <name><surname>Acevedo-Charry</surname> <given-names>O.</given-names></name> <name><surname>Barclay</surname> <given-names>L.</given-names></name> <name><surname>Burivalova</surname> <given-names>Z.</given-names></name> <name><surname>Campos-Cerqueira</surname> <given-names>M.</given-names></name> <name><surname>d&#x2019;Horta</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>It&#x2019;s time to listen: there is much to be learned from the sounds of tropical ecosystems</article-title>. <source>Biotropica</source> <volume>50</volume>, <fpage>713</fpage>&#x2013;<lpage>718</lpage>. doi: <pub-id pub-id-type="doi">10.1111/btp.12593</pub-id></citation></ref>
<ref id="ref31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Delacourt</surname> <given-names>P.</given-names></name> <name><surname>Wellekens</surname> <given-names>C. J.</given-names></name></person-group> (<year>2000</year>). <article-title>DISTBIC: a speaker-based segmentation for audio data indexing</article-title>. <source>Speech Comm.</source> <volume>32</volume>, <fpage>111</fpage>&#x2013;<lpage>126</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0167-6393(00)00027-3</pub-id></citation></ref>
<ref id="ref32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dias</surname> <given-names>F. F.</given-names></name> <name><surname>Pedrini</surname> <given-names>H.</given-names></name> <name><surname>Minghim</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Soundscape segregation based on visual analysis and discriminating features</article-title>. <source>Eco. Inform.</source> <volume>61</volume>:<fpage>101184</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecoinf.2020.101184</pub-id></citation></ref>
<ref id="ref33"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Dueck</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). Affinity propagation: Clustering data by passing messages. Toronto, ON, Canada: University of Toronto, 144.</citation></ref>
<ref id="ref34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dufourq</surname> <given-names>E.</given-names></name> <name><surname>Batist</surname> <given-names>C.</given-names></name> <name><surname>Foquet</surname> <given-names>R.</given-names></name> <name><surname>Durbach</surname> <given-names>I.</given-names></name></person-group> (<year>2022</year>). <article-title>Passive acoustic monitoring of animal populations with transfer learning</article-title>. <source>Eco. Inform.</source> <volume>70</volume>:<fpage>101688</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecoinf.2022.101688</pub-id>, PMID: <pub-id pub-id-type="pmid">35341043</pub-id></citation></ref>
<ref id="ref35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dufourq</surname> <given-names>E.</given-names></name> <name><surname>Durbach</surname> <given-names>I.</given-names></name> <name><surname>Hansford</surname> <given-names>J. P.</given-names></name> <name><surname>Hoepfner</surname> <given-names>A.</given-names></name> <name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Bryant</surname> <given-names>J. V.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Automated detection of Hainan gibbon calls for passive acoustic monitoring</article-title>. <source>Remote Sens. Ecol. Conserv.</source> <volume>7</volume>, <fpage>475</fpage>&#x2013;<lpage>487</lpage>. doi: <pub-id pub-id-type="doi">10.1002/rse2.201</pub-id></citation></ref>
<ref id="ref36"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Dufourq</surname> <given-names>E.</given-names></name> <name><surname>Durbach</surname> <given-names>I.</given-names></name> <name><surname>Hansford</surname> <given-names>J. P.</given-names></name> <name><surname>Hoepfner</surname> <given-names>A.</given-names></name> <name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Bryant</surname> <given-names>J. V.</given-names></name> <etal/></person-group>. (<year>2020</year>). Automated detection of Hainan gibbon calls for passive acoustic monitoring. doi: <pub-id pub-id-type="doi">10.5281/zenodo.3991714</pub-id>.</citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Favaro</surname> <given-names>L.</given-names></name> <name><surname>Gili</surname> <given-names>C.</given-names></name> <name><surname>Da Rugna</surname> <given-names>C.</given-names></name> <name><surname>Gnone</surname> <given-names>G.</given-names></name> <name><surname>Fissore</surname> <given-names>C.</given-names></name> <name><surname>Sanchez</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Vocal individuality and species divergence in the contact calls of banded penguins</article-title>. <source>Behav. Process.</source> <volume>128</volume>, <fpage>83</fpage>&#x2013;<lpage>88</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.beproc.2016.04.010</pub-id>, PMID: <pub-id pub-id-type="pmid">27102762</pub-id></citation></ref>
<ref id="ref38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feng</surname> <given-names>J.-J.</given-names></name> <name><surname>Cui</surname> <given-names>L.-W.</given-names></name> <name><surname>Ma</surname> <given-names>C.-Y.</given-names></name> <name><surname>Fei</surname> <given-names>H.-L.</given-names></name> <name><surname>Fan</surname> <given-names>P.-F.</given-names></name></person-group> (<year>2014</year>). <article-title>Individuality and stability in male songs of cao vit gibbons (<italic>Nomascus nasutus</italic>) with potential to monitor population dynamics</article-title>. <source>PLoS One</source> <volume>9</volume>:<fpage>e96317</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0096317</pub-id>, PMID: <pub-id pub-id-type="pmid">24788306</pub-id></citation></ref>
<ref id="ref39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Geissmann</surname> <given-names>T.</given-names></name></person-group> (<year>2002</year>). <article-title>Duet-splitting and the evolution of gibbon songs</article-title>. <source>Biol. Rev.</source> <volume>77</volume>, <fpage>57</fpage>&#x2013;<lpage>76</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S1464793101005826</pub-id>, PMID: <pub-id pub-id-type="pmid">11911374</pub-id></citation></ref>
<ref id="ref40"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Gemmeke</surname> <given-names>J. F.</given-names></name> <name><surname>Ellis</surname> <given-names>D. P.</given-names></name> <name><surname>Freedman</surname> <given-names>D.</given-names></name> <name><surname>Jansen</surname> <given-names>A.</given-names></name> <name><surname>Lawrence</surname> <given-names>W.</given-names></name> <name><surname>Moore</surname> <given-names>R. C.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Audio set: an ontology and human-labeled dataset for audio events</article-title>. in <conf-name>2017 IEEE international conference on acoustics, speech and signal processing (ICASSP)</conf-name>, <publisher-loc>Piscataway, NJ</publisher-loc> <publisher-name>IEEE</publisher-name> <fpage>776</fpage>&#x2013;<lpage>780</lpage>.</citation></ref>
<ref id="ref41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gibb</surname> <given-names>R.</given-names></name> <name><surname>Browning</surname> <given-names>E.</given-names></name> <name><surname>Glover-Kapfer</surname> <given-names>P.</given-names></name> <name><surname>Jones</surname> <given-names>K. E.</given-names></name></person-group> (<year>2018</year>). <article-title>Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring</article-title>. <source>Methods Ecol. Evol.</source> <volume>10</volume>, <fpage>169</fpage>&#x2013;<lpage>185</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13101</pub-id></citation></ref>
<ref id="ref42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gillam</surname> <given-names>E. H.</given-names></name> <name><surname>Chaverri</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>Strong individual signatures and weaker group signatures in contact calls of Spix&#x2019;s disc-winged bat, <italic>Thyroptera tricolor</italic></article-title>. <source>Anim. Behav.</source> <volume>83</volume>, <fpage>269</fpage>&#x2013;<lpage>276</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.anbehav.2011.11.002</pub-id></citation></ref>
<ref id="ref43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grolemund</surname> <given-names>G.</given-names></name> <name><surname>Wickham</surname> <given-names>H.</given-names></name></person-group> (<year>2011</year>). <article-title>Dates and times made easy with lubridate</article-title>. <source>J. Stat. Softw.</source> <volume>40</volume>, <fpage>1</fpage>&#x2013;<lpage>25</lpage>. doi: <pub-id pub-id-type="doi">10.18637/jss.v040.i03</pub-id></citation></ref>
<ref id="ref44"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Hafner</surname> <given-names>S. D.</given-names></name> <name><surname>Katz</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). monitoR: acoustic template detection in R. Available at: <ext-link xlink:href="http://www.uvm.edu/rsenr/vtcfwru/R/?Page=monitoR/monitoR.htm" ext-link-type="uri">http://www.uvm.edu/rsenr/vtcfwru/R/?Page=monitoR/monitoR.htm</ext-link> (Accessed January 23, 2023).</citation></ref>
<ref id="ref45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haimoff</surname> <given-names>E.</given-names></name> <name><surname>Gittins</surname> <given-names>S.</given-names></name></person-group> (<year>1985</year>). <article-title>Individuality in the songs of wild agile gibbons (<italic>Hylobates agilis</italic>) of Peninsular Malaysia</article-title>. <source>Am. J. Primatol.</source> <volume>8</volume>, <fpage>239</fpage>&#x2013;<lpage>247</lpage>. doi: <pub-id pub-id-type="doi">10.1002/ajp.1350080306</pub-id>, PMID: <pub-id pub-id-type="pmid">31986812</pub-id></citation></ref>
<ref id="ref46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haimoff</surname> <given-names>E.</given-names></name> <name><surname>Tilson</surname> <given-names>R.</given-names></name></person-group> (<year>1985</year>). <article-title>Individuality in the female songs of wild kloss&#x2019; gibbons (<italic>Hylobates klossii</italic>) on Siberut Island, Indonesia</article-title>. <source>Folia Primatol.</source> <volume>44</volume>, <fpage>129</fpage>&#x2013;<lpage>137</lpage>. doi: <pub-id pub-id-type="doi">10.1159/000156207</pub-id></citation></ref>
<ref id="ref47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hamard</surname> <given-names>M.</given-names></name> <name><surname>Cheyne</surname> <given-names>S. M.</given-names></name> <name><surname>Nijman</surname> <given-names>V.</given-names></name></person-group> (<year>2010</year>). <article-title>Vegetation correlates of gibbon density in the peat-swamp forest of the Sabangau catchment, Central Kalimantan, Indonesia</article-title>. <source>Am. J. Primatol.</source> <volume>72</volume>, <fpage>607</fpage>&#x2013;<lpage>616</lpage>. doi: <pub-id pub-id-type="doi">10.1002/ajp.20815</pub-id>, PMID: <pub-id pub-id-type="pmid">20186760</pub-id></citation></ref>
<ref id="ref48"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>W.</given-names></name> <name><surname>Chan</surname> <given-names>C.-F.</given-names></name> <name><surname>Choy</surname> <given-names>C.-S.</given-names></name> <name><surname>Pun</surname> <given-names>K.-P.</given-names></name></person-group> (<year>2006</year>). <article-title>An efficient MFCC extraction method in speech recognition</article-title>. in <conf-name>2006 IEEE International Symposium on Circuits and Systems</conf-name>, <publisher-loc>Piscataway, NJ</publisher-loc> <publisher-name>IEEE</publisher-name> <volume>4</volume>.</citation></ref>
<ref id="ref49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanya</surname> <given-names>G.</given-names></name> <name><surname>Bernard</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>Interspecific encounters among diurnal primates in Danum Valley, Borneo</article-title>. <source>Int. J. Primatol.</source> <volume>42</volume>, <fpage>442</fpage>&#x2013;<lpage>462</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10764-021-00211-9</pub-id></citation></ref>
<ref id="ref50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heath</surname> <given-names>B. E.</given-names></name> <name><surname>Sethi</surname> <given-names>S. S.</given-names></name> <name><surname>Orme</surname> <given-names>C. D. L.</given-names></name> <name><surname>Ewers</surname> <given-names>R. M.</given-names></name> <name><surname>Picinali</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). <article-title>How index selection, compression, and recording schedule impact the description of ecological soundscapes</article-title>. <source>Ecol. Evol.</source> <volume>11</volume>, <fpage>13206</fpage>&#x2013;<lpage>13217</lpage>. doi: <pub-id pub-id-type="doi">10.1002/ece3.8042</pub-id>, PMID: <pub-id pub-id-type="pmid">34646463</pub-id></citation></ref>
<ref id="ref51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heinicke</surname> <given-names>S.</given-names></name> <name><surname>Kalan</surname> <given-names>A. K.</given-names></name> <name><surname>Wagner</surname> <given-names>O. J. J.</given-names></name> <name><surname>Mundry</surname> <given-names>R.</given-names></name> <name><surname>Lukashevich</surname> <given-names>H.</given-names></name> <name><surname>K&#x00FC;hl</surname> <given-names>H. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Assessing the performance of a semi-automated acoustic monitoring system for primates</article-title>. <source>Methods Ecol. Evol.</source> <volume>6</volume>, <fpage>753</fpage>&#x2013;<lpage>763</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.12384</pub-id></citation></ref>
<ref id="ref52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hill</surname> <given-names>A. P.</given-names></name> <name><surname>Prince</surname> <given-names>P.</given-names></name> <name><surname>Pi&#x00F1;a Covarrubias</surname> <given-names>E.</given-names></name> <name><surname>Doncaster</surname> <given-names>C. P.</given-names></name> <name><surname>Snaddon</surname> <given-names>J. L.</given-names></name> <name><surname>Rogers</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>AudioMoth: evaluation of a smart open acoustic device for monitoring biodiversity and the environment</article-title>. <source>Methods Ecol. Evol.</source> <volume>9</volume>, <fpage>1199</fpage>&#x2013;<lpage>1211</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.12955</pub-id></citation></ref>
<ref id="ref53"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hodnett</surname> <given-names>M.</given-names></name> <name><surname>Wiley</surname> <given-names>J. F.</given-names></name> <name><surname>Liu</surname> <given-names>Y. H.</given-names></name> <name><surname>Maldonado</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <source>Deep Learning with R for Beginners: Design Neural Network Models in R 3.5 Using TensorFlow, Keras, and MXNet</source>. <publisher-loc>Birmingham</publisher-loc> <publisher-name>Packt Publishing Ltd.</publisher-name></citation></ref>
<ref id="ref54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huancapaza Hilasaca</surname> <given-names>L. M.</given-names></name> <name><surname>Gaspar</surname> <given-names>L. P.</given-names></name> <name><surname>Ribeiro</surname> <given-names>M. C.</given-names></name> <name><surname>Minghim</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Visualization and categorization of ecological acoustic events based on discriminant features</article-title>. <source>Ecol. Indic.</source> <volume>126</volume>:<fpage>107316</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecolind.2020.107316</pub-id></citation></ref>
<ref id="ref55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Inoue</surname> <given-names>Y.</given-names></name> <name><surname>Sinun</surname> <given-names>W.</given-names></name> <name><surname>Okanoya</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>Activity budget, travel distance, sleeping time, height of activity and travel order of wild east Bornean Grey gibbons (<italic>Hylobates funereus</italic>) in Danum Valley conservation area</article-title>. <source>Raff. Bull. Zool.</source> <volume>64</volume>, <fpage>127</fpage>&#x2013;<lpage>138</lpage>.</citation></ref>
<ref id="ref56"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll1">IUCN</collab></person-group> (<year>2022</year>). <source>The IUCN Red List of Threatened Species</source>. <comment>Available at: <ext-link xlink:href="https://www.iucnredlist.org" ext-link-type="uri">https://www.iucnredlist.org</ext-link></comment>.</citation></ref>
<ref id="ref57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kahl</surname> <given-names>S.</given-names></name> <name><surname>Wood</surname> <given-names>C. M.</given-names></name> <name><surname>Eibl</surname> <given-names>M.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2021</year>). <article-title>BirdNET: a deep learning solution for avian diversity monitoring</article-title>. <source>Eco. Inform.</source> <volume>61</volume>:<fpage>101236</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecoinf.2021.101236</pub-id></citation></ref>
<ref id="ref58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kalan</surname> <given-names>A. K.</given-names></name> <name><surname>Mundry</surname> <given-names>R.</given-names></name> <name><surname>Wagner</surname> <given-names>O. J. J.</given-names></name> <name><surname>Heinicke</surname> <given-names>S.</given-names></name> <name><surname>Boesch</surname> <given-names>C.</given-names></name> <name><surname>K&#x00FC;hl</surname> <given-names>H. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Towards the automated detection and occupancy estimation of primates using passive acoustic monitoring</article-title>. <source>Ecol. Indic.</source> <volume>54</volume>, <fpage>217</fpage>&#x2013;<lpage>226</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecolind.2015.02.023</pub-id></citation></ref>
<ref id="ref59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kalan</surname> <given-names>A. K.</given-names></name> <name><surname>Piel</surname> <given-names>A. K.</given-names></name> <name><surname>Mundry</surname> <given-names>R.</given-names></name> <name><surname>Wittig</surname> <given-names>R. M.</given-names></name> <name><surname>Boesch</surname> <given-names>C.</given-names></name> <name><surname>K&#x00FC;hl</surname> <given-names>H. S.</given-names></name></person-group> (<year>2016</year>). <article-title>Passive acoustic monitoring reveals group ranging and territory use: a case study of wild chimpanzees (<italic>Pan troglodytes</italic>)</article-title>. <source>Front. Zool.</source> <volume>13</volume>:<fpage>34</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12983-016-0167-8</pub-id>, PMID: <pub-id pub-id-type="pmid">27507999</pub-id></citation></ref>
<ref id="ref60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katz</surname> <given-names>J.</given-names></name> <name><surname>Hafner</surname> <given-names>S. D.</given-names></name> <name><surname>Donovan</surname> <given-names>T.</given-names></name></person-group> (<year>2016a</year>). <article-title>Assessment of error rates in acoustic monitoring with the R package monitoR</article-title>. <source>Bioacoustics</source> <volume>25</volume>, <fpage>177</fpage>&#x2013;<lpage>196</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09524622.2015.1133320</pub-id></citation></ref>
<ref id="ref61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katz</surname> <given-names>J.</given-names></name> <name><surname>Hafner</surname> <given-names>S. D.</given-names></name> <name><surname>Donovan</surname> <given-names>T.</given-names></name></person-group> (<year>2016b</year>). <article-title>Tools for automated acoustic monitoring within the R package monitoR</article-title>. <source>Bioacoustics</source> <volume>25</volume>, <fpage>197</fpage>&#x2013;<lpage>210</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09524622.2016.1138415</pub-id></citation></ref>
<ref id="ref62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keen</surname> <given-names>S. C.</given-names></name> <name><surname>Shiu</surname> <given-names>Y.</given-names></name> <name><surname>Wrege</surname> <given-names>P. H.</given-names></name> <name><surname>Rowland</surname> <given-names>E. D.</given-names></name></person-group> (<year>2017</year>). <article-title>Automated detection of low-frequency rumbles of forest elephants: a critical tool for their conservation</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>141</volume>, <fpage>2715</fpage>&#x2013;<lpage>2726</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.4979476</pub-id>, PMID: <pub-id pub-id-type="pmid">28464628</pub-id></citation></ref>
<ref id="ref63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kershenbaum</surname> <given-names>A.</given-names></name> <name><surname>Sayigh</surname> <given-names>L. S.</given-names></name> <name><surname>Janik</surname> <given-names>V. M.</given-names></name></person-group> (<year>2013</year>). <article-title>The encoding of individual identity in dolphin signature whistles: how much information is needed?</article-title> <source>PLoS One</source> <volume>8</volume>:<fpage>e77671</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0077671</pub-id>, PMID: <pub-id pub-id-type="pmid">24194893</pub-id></citation></ref>
<ref id="ref64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kidney</surname> <given-names>D.</given-names></name> <name><surname>Rawson</surname> <given-names>B. M.</given-names></name> <name><surname>Borchers</surname> <given-names>D. L.</given-names></name> <name><surname>Stevenson</surname> <given-names>B. C.</given-names></name> <name><surname>Marques</surname> <given-names>T. A.</given-names></name> <name><surname>Thomas</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>An efficient acoustic density estimation method with human detectors applied to gibbons in Cambodia</article-title>. <source>PLoS One</source> <volume>11</volume>:<fpage>e0155066</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0155066</pub-id>, PMID: <pub-id pub-id-type="pmid">27195799</pub-id></citation></ref>
<ref id="ref65"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Koch</surname> <given-names>R.</given-names></name> <name><surname>Raymond</surname> <given-names>M.</given-names></name> <name><surname>Wrege</surname> <given-names>P.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2016</year>). <article-title>SWIFT: a small, low-cost acoustic recorder for terrestrial wildlife monitoring applications</article-title>. in <conf-name>North American Ornithological Conference</conf-name> <conf-loc>Washington, DC</conf-loc>, <comment>619</comment>.</citation></ref>
<ref id="ref66"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Konopka</surname> <given-names>T</given-names></name></person-group>. (<year>2020</year>). umap: uniform manifold approximation and projection. Available at: <ext-link xlink:href="https://cran.r-project.org/package=umap" ext-link-type="uri">https://cran.r-project.org/package=umap</ext-link> (Accessed January 23, 2023).</citation></ref>
<ref id="ref67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lai</surname> <given-names>J.</given-names></name> <name><surname>Lortie</surname> <given-names>C. J.</given-names></name> <name><surname>Muenchen</surname> <given-names>R. A.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Ma</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>Evaluating the popularity of R in ecology</article-title>. <source>Ecosphere</source> <volume>10</volume>:<fpage>e02567</fpage>. doi: <pub-id pub-id-type="doi">10.1002/ecs2.2567</pub-id></citation></ref>
<ref id="ref68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lapp</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>T.</given-names></name> <name><surname>Richards-Zawacki</surname> <given-names>C.</given-names></name> <name><surname>Voyles</surname> <given-names>J.</given-names></name> <name><surname>Rodriguez</surname> <given-names>K. M.</given-names></name> <name><surname>Shamon</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Automated detection of frog calls and choruses by pulse repetition rate</article-title>. <source>Conserv. Biol.</source> <volume>35</volume>, <fpage>1659</fpage>&#x2013;<lpage>1668</lpage>. doi: <pub-id pub-id-type="doi">10.1111/cobi.13718</pub-id>, PMID: <pub-id pub-id-type="pmid">33586273</pub-id></citation></ref>
<ref id="ref69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lawlor</surname> <given-names>J.</given-names></name> <name><surname>Banville</surname> <given-names>F.</given-names></name> <name><surname>Forero-Mu&#x00F1;oz</surname> <given-names>N.-R.</given-names></name> <name><surname>H&#x00E9;bert</surname> <given-names>K.</given-names></name> <name><surname>Mart&#x00ED;nez-Lanfranco</surname> <given-names>J. A.</given-names></name> <name><surname>Rogy</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Ten simple rules for teaching yourself R</article-title>. <source>PLoS Comput. Biol.</source> <volume>18</volume>:<fpage>e1010372</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1010372</pub-id>, PMID: <pub-id pub-id-type="pmid">36048770</pub-id></citation></ref>
<ref id="ref70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liaw</surname> <given-names>A.</given-names></name> <name><surname>Wiener</surname> <given-names>M.</given-names></name></person-group> (<year>2002</year>). <article-title>Classification and regression by randomForest</article-title>. <source>R News</source> <volume>2</volume>, <fpage>18</fpage>&#x2013;<lpage>22</lpage>.</citation></ref>
<ref id="ref71"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ligges</surname> <given-names>U.</given-names></name> <name><surname>Krey</surname> <given-names>S.</given-names></name> <name><surname>Mersmann</surname> <given-names>O.</given-names></name> <name><surname>Schnackenberg</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). tuneR: analysis of music. Available at: <ext-link xlink:href="https://r-forge.r-project.org/projects/tuner/" ext-link-type="uri">https://r-forge.r-project.org/projects/tuner/</ext-link></citation></ref>
<ref id="ref72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>H.-J.</given-names></name> <name><surname>Li</surname> <given-names>S. Z.</given-names></name></person-group> (<year>2003</year>). <article-title>Content-based audio classification and segmentation by using support vector machines</article-title>. <source>Multimed. Syst.</source> <volume>8</volume>, <fpage>482</fpage>&#x2013;<lpage>492</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00530-002-0065-0</pub-id></citation></ref>
<ref id="ref73"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Lucio</surname> <given-names>D. R.</given-names></name> <name><surname>Maldonado</surname> <given-names>Y.</given-names></name> <name><surname>da Costa</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>Bird species classification using spectrograms</article-title>. in <conf-name>2015 Latin American Computing Conference (CLEI)</conf-name>, <conf-loc>Arequipa, Peru</conf-loc> <fpage>1</fpage>&#x2013;<lpage>11</lpage>.</citation></ref>
<ref id="ref74"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Madhusudhana</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). shyamblast/Koogu. Zenodo. doi: <pub-id pub-id-type="doi">10.5281/zenodo.5781423</pub-id>.</citation></ref>
<ref id="ref75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madhusudhana</surname> <given-names>S.</given-names></name> <name><surname>Shiu</surname> <given-names>Y.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name> <name><surname>Fleishman</surname> <given-names>E.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Nosal</surname> <given-names>E.-M.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Improve automatic detection of animal call sequences with temporal context</article-title>. <source>J. R. Soc. Interface</source> <volume>18</volume>:<fpage>20210297</fpage>. doi: <pub-id pub-id-type="doi">10.1098/rsif.2021.0297</pub-id>, PMID: <pub-id pub-id-type="pmid">34283944</pub-id></citation></ref>
<ref id="ref76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madhusudhana</surname> <given-names>S. K.</given-names></name> <name><surname>Symes</surname> <given-names>L. B.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>A deep convolutional neural network based classifier for passive acoustic monitoring of neotropical katydids</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>146</volume>:<fpage>2982</fpage>. doi: <pub-id pub-id-type="doi">10.1121/1.5137323</pub-id></citation></ref>
<ref id="ref77"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Maechler</surname> <given-names>M.</given-names></name> <name><surname>Rousseeuw</surname> <given-names>P.</given-names></name> <name><surname>Struyf</surname> <given-names>A.</given-names></name> <name><surname>Hubert</surname> <given-names>M.</given-names></name> <name><surname>Hornik</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>Cluster: cluster analysis basics and extensions</article-title>.</citation></ref>
<ref id="ref78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Markolf</surname> <given-names>M.</given-names></name> <name><surname>Zinowsky</surname> <given-names>M.</given-names></name> <name><surname>Keller</surname> <given-names>J. K.</given-names></name> <name><surname>Borys</surname> <given-names>J.</given-names></name> <name><surname>Cillov</surname> <given-names>A.</given-names></name> <name><surname>Sch&#x00FC;lke</surname> <given-names>O.</given-names></name></person-group> (<year>2022</year>). <article-title>Toward passive acoustic monitoring of lemurs: using an affordable open-source system to monitor Phaner vocal activity and density</article-title>. <source>Int. J. Primatol.</source> <volume>43</volume>, <fpage>409</fpage>&#x2013;<lpage>433</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10764-022-00285-z</pub-id></citation></ref>
<ref id="ref79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marques</surname> <given-names>T. A.</given-names></name> <name><surname>Thomas</surname> <given-names>L.</given-names></name> <name><surname>Martin</surname> <given-names>S. W.</given-names></name> <name><surname>Mellinger</surname> <given-names>D. K.</given-names></name> <name><surname>Ward</surname> <given-names>J. A.</given-names></name> <name><surname>Moretti</surname> <given-names>D. J.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Estimating animal population density using passive acoustics</article-title>. <source>Biol. Rev.</source> <volume>88</volume>, <fpage>287</fpage>&#x2013;<lpage>309</lpage>. doi: <pub-id pub-id-type="doi">10.1111/brv.12001</pub-id>, PMID: <pub-id pub-id-type="pmid">23190144</pub-id></citation></ref>
<ref id="ref80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martin</surname> <given-names>A.</given-names></name> <name><surname>Doddington</surname> <given-names>G.</given-names></name> <name><surname>Kamm</surname> <given-names>T.</given-names></name> <name><surname>Ordowski</surname> <given-names>M.</given-names></name> <name><surname>Przybocki</surname> <given-names>M.</given-names></name></person-group> (<year>1997</year>). <article-title>The DET curve in assessment of detection task performance</article-title>. <source>Proc. Eurospeech</source> <volume>4</volume>, <fpage>1895</fpage>&#x2013;<lpage>1898</lpage>.</citation></ref>
<ref id="ref81"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Mellinger</surname> <given-names>D. K.</given-names></name> <name><surname>Roch</surname> <given-names>M. A.</given-names></name> <name><surname>Nosal</surname> <given-names>E.-M.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Signal processing</article-title>&#x201D; in <source>Listening in the Ocean</source>. eds. <person-group person-group-type="editor"><name><surname>Au</surname> <given-names>W. W.</given-names></name> <name><surname>Lammers</surname> <given-names>M. O.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>359</fpage>&#x2013;<lpage>409</lpage>.</citation></ref>
<ref id="ref82"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Meyer</surname> <given-names>D.</given-names></name> <name><surname>Dimitriadou</surname> <given-names>E.</given-names></name> <name><surname>Hornik</surname> <given-names>K.</given-names></name> <name><surname>Weingessel</surname> <given-names>A.</given-names></name> <name><surname>Leisch</surname> <given-names>F.</given-names></name></person-group> (<year>2017</year>). e1071: Misc functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien.</citation></ref>
<ref id="ref83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mielke</surname> <given-names>A.</given-names></name> <name><surname>Zuberb&#x00FC;hler</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <article-title>A method for automated individual, species and call type recognition in free-ranging animals</article-title>. <source>Anim. Behav.</source> <volume>86</volume>, <fpage>475</fpage>&#x2013;<lpage>482</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.anbehav.2013.04.017</pub-id></citation></ref>
<ref id="ref84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mitani</surname> <given-names>J. C.</given-names></name></person-group> (<year>1984</year>). <article-title>The behavioral regulation of monogamy in gibbons (<italic>Hylobates muelleri</italic>)</article-title>. <source>Behav. Ecol. Sociobiol.</source> <volume>15</volume>, <fpage>225</fpage>&#x2013;<lpage>229</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF00292979</pub-id></citation></ref>
<ref id="ref85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mitani</surname> <given-names>J. C.</given-names></name></person-group> (<year>1985</year>). <article-title>Gibbon song duets and intergroup spacing</article-title>. <source>Behaviour</source> <volume>92</volume>, <fpage>59</fpage>&#x2013;<lpage>96</lpage>. doi: <pub-id pub-id-type="doi">10.1163/156853985X00389</pub-id></citation></ref>
<ref id="ref86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Muda</surname> <given-names>L.</given-names></name> <name><surname>Begam</surname> <given-names>M.</given-names></name> <name><surname>Elamvazuthi</surname> <given-names>I.</given-names></name></person-group> (<year>2010</year>). <article-title>Voice recognition algorithms using Mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques</article-title>. <source>J. Comput.</source> <volume>2</volume>, <fpage>138</fpage>&#x2013;<lpage>143</lpage>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1003.4083</pub-id></citation></ref>
<ref id="ref87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Noviyanti</surname> <given-names>A.</given-names></name> <name><surname>Sudarsono</surname> <given-names>A. S.</given-names></name> <name><surname>Kusumaningrum</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Urban soundscape prediction based on acoustic ecology and MFCC parameters</article-title>. <source>AIP Conf. Proc.</source> <volume>2187</volume>:<fpage>050005</fpage>. doi: <pub-id pub-id-type="doi">10.1063/1.5138335</pub-id></citation></ref>
<ref id="ref88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parra-Hern&#x00E1;ndez</surname> <given-names>R. M.</given-names></name> <name><surname>Posada-Quintero</surname> <given-names>J. I.</given-names></name> <name><surname>Acevedo-Charry</surname> <given-names>O.</given-names></name> <name><surname>Posada-Quintero</surname> <given-names>H. F.</given-names></name></person-group> (<year>2020</year>). <article-title>Uniform manifold approximation and projection for clustering taxa through vocalizations in a neotropical passerine (rough-legged tyrannulet, <italic>Phyllomyias burmeisteri</italic>)</article-title>. <source>Animals</source> <volume>10</volume>:<fpage>1406</fpage>. doi: <pub-id pub-id-type="doi">10.3390/ani10081406</pub-id>, PMID: <pub-id pub-id-type="pmid">32806680</pub-id></citation></ref>
<ref id="ref89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>P&#x00E9;rez-Granados</surname> <given-names>C.</given-names></name> <name><surname>Schuchmann</surname> <given-names>K.-L.</given-names></name></person-group> (<year>2021</year>). <article-title>Passive acoustic monitoring of the diel and annual vocal behavior of the Black and Gold Howler Monkey</article-title>. <source>Am. J. Primatol.</source> <volume>83</volume>:<fpage>e23241</fpage>. doi: <pub-id pub-id-type="doi">10.1002/ajp.23241</pub-id>, PMID: <pub-id pub-id-type="pmid">33539555</pub-id></citation></ref>
<ref id="ref90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Phoonjampa</surname> <given-names>R.</given-names></name> <name><surname>Koenig</surname> <given-names>A.</given-names></name> <name><surname>Brockelman</surname> <given-names>W. Y.</given-names></name> <name><surname>Borries</surname> <given-names>C.</given-names></name> <name><surname>Gale</surname> <given-names>G. A.</given-names></name> <name><surname>Carroll</surname> <given-names>J. P.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Pileated gibbon density in relation to habitat characteristics and post-logging forest recovery</article-title>. <source>Biotropica</source> <volume>43</volume>, <fpage>619</fpage>&#x2013;<lpage>627</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1744-7429.2010.00743.x</pub-id></citation></ref>
<ref id="ref91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Probst</surname> <given-names>P.</given-names></name> <name><surname>Wright</surname> <given-names>M. N.</given-names></name> <name><surname>Boulesteix</surname> <given-names>A.-L.</given-names></name></person-group> (<year>2019</year>). <article-title>Hyperparameters and tuning strategies for random forest</article-title>. <source>Wiley Interdiscip. Rev. Data Min. Knowl. Discov.</source> <volume>9</volume>:<fpage>e1301</fpage>. doi: <pub-id pub-id-type="doi">10.1002/widm.1301</pub-id></citation></ref>
<ref id="ref92"><citation citation-type="book"><person-group person-group-type="author"><collab id="coll2">R Core Team</collab></person-group> (<year>2022</year>). <source>R: A Language and Environment for Statistical Computing</source>. <publisher-loc>Vienna, Austria</publisher-loc>: <publisher-name>R Foundation for Statistical Computing</publisher-name>. <comment>Available at: <ext-link xlink:href="https://www.R-project.org/" ext-link-type="uri">https://www.R-project.org/</ext-link></comment> (Accessed January 23, 2023).</citation></ref>
<ref id="ref93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Romero-Mujalli</surname> <given-names>D.</given-names></name> <name><surname>Bergmann</surname> <given-names>T.</given-names></name> <name><surname>Zimmermann</surname> <given-names>A.</given-names></name> <name><surname>Scheumann</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Utilizing DeepSqueak for automatic detection and classification of mammalian vocalizations: a case study on primate vocalizations</article-title>. <source>Sci. Rep.</source> <volume>11</volume>:<fpage>24463</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-021-03941-1</pub-id>, PMID: <pub-id pub-id-type="pmid">34961788</pub-id></citation></ref>
<ref id="ref94"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ross</surname> <given-names>J. C.</given-names></name></person-group> (<year>2013</year>). Flightcallr: Classify Night Flight Calls Based on Acoustic Measurements. Available at: <ext-link xlink:href="https://R-Forge.R-project.org/projects/flightcallr/" ext-link-type="uri">https://R-Forge.R-project.org/projects/flightcallr/</ext-link> (Accessed January 23, 2023).</citation></ref>
<ref id="ref95"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ross</surname> <given-names>J. C.</given-names></name> <name><surname>Allen</surname> <given-names>P. E.</given-names></name></person-group> (<year>2014</year>). <article-title>Random Forest for improved analysis efficiency in passive acoustic monitoring</article-title>. <source>Ecol. Inform.</source> <volume>21</volume>, <fpage>34</fpage>&#x2013;<lpage>39</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecoinf.2013.12.002</pub-id></citation></ref>
<ref id="ref96"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ruff</surname> <given-names>Z. J.</given-names></name> <name><surname>Lesmeister</surname> <given-names>D. B.</given-names></name> <name><surname>Appel</surname> <given-names>C. L.</given-names></name> <name><surname>Sullivan</surname> <given-names>C. M.</given-names></name></person-group> (<year>2020</year>). Convolutional neural network and R-Shiny app for identifying vocalizations in Pacific Northwest forests. doi: <pub-id pub-id-type="doi">10.5281/zenodo.4092393</pub-id>.</citation></ref>
<ref id="ref97"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ruff</surname> <given-names>Z. J.</given-names></name> <name><surname>Lesmeister</surname> <given-names>D. B.</given-names></name> <name><surname>Appel</surname> <given-names>C. L.</given-names></name> <name><surname>Sullivan</surname> <given-names>C. M.</given-names></name></person-group> (<year>2021</year>). <article-title>Workflow and convolutional neural network for automated identification of animal sounds</article-title>. <source>Ecol. Indic.</source> <volume>124</volume>:<fpage>107419</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecolind.2021.107419</pub-id></citation></ref>
<ref id="ref98"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sadhukhan</surname> <given-names>S.</given-names></name> <name><surname>Root-Gutteridge</surname> <given-names>H.</given-names></name> <name><surname>Habib</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <article-title>Identifying unknown Indian wolves by their distinctive howls: its potential as a non-invasive survey method</article-title>. <source>Sci. Rep.</source> <volume>11</volume>:<fpage>7309</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-021-86718-w</pub-id>, PMID: <pub-id pub-id-type="pmid">33790346</pub-id></citation></ref>
<ref id="ref99"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sainburg</surname> <given-names>T.</given-names></name> <name><surname>Thielk</surname> <given-names>M.</given-names></name> <name><surname>Gentner</surname> <given-names>T. Q.</given-names></name></person-group> (<year>2020</year>). <article-title>Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires</article-title>. <source>PLoS Comput. Biol.</source> <volume>16</volume>:<fpage>e1008228</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1008228</pub-id>, PMID: <pub-id pub-id-type="pmid">33057332</pub-id></citation></ref>
<ref id="ref100"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Scavetta</surname> <given-names>R. J.</given-names></name> <name><surname>Angelov</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <source>Python and R for the Modern Data Scientist</source>. <publisher-loc>California, USA</publisher-loc>: <publisher-name>O&#x2019;Reilly Media, Inc</publisher-name>.</citation></ref>
<ref id="ref101"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sethi</surname> <given-names>S. S.</given-names></name> <name><surname>Ewers</surname> <given-names>R. M.</given-names></name> <name><surname>Jones</surname> <given-names>N. S.</given-names></name> <name><surname>Orme</surname> <given-names>C. D. L.</given-names></name> <name><surname>Picinali</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>Robust, real-time and autonomous monitoring of ecosystems with an open, low-cost, networked device</article-title>. <source>Methods Ecol. Evol.</source> <volume>9</volume>, <fpage>2383</fpage>&#x2013;<lpage>2387</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13089</pub-id></citation></ref>
<ref id="ref102"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sethi</surname> <given-names>S. S.</given-names></name> <name><surname>Ewers</surname> <given-names>R. M.</given-names></name> <name><surname>Jones</surname> <given-names>N. S.</given-names></name> <name><surname>Sleutel</surname> <given-names>J.</given-names></name> <name><surname>Shabrani</surname> <given-names>A.</given-names></name> <name><surname>Zulkifli</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Soundscapes predict species occurrence in tropical forests</article-title>. <source>Oikos</source> <volume>2022</volume>:<fpage>e08525</fpage>. doi: <pub-id pub-id-type="doi">10.1111/oik.08525</pub-id></citation></ref>
<ref id="ref103"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sethi</surname> <given-names>S. S.</given-names></name> <name><surname>Jones</surname> <given-names>N. S.</given-names></name> <name><surname>Fulcher</surname> <given-names>B. D.</given-names></name> <name><surname>Picinali</surname> <given-names>L.</given-names></name> <name><surname>Clink</surname> <given-names>D. J.</given-names></name> <name><surname>Klinck</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Characterizing soundscapes across diverse ecosystems using a universal acoustic feature set</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>117</volume>, <fpage>17049</fpage>&#x2013;<lpage>17055</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.2004702117</pub-id>, PMID: <pub-id pub-id-type="pmid">32636258</pub-id></citation></ref>
<ref id="ref104"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shiu</surname> <given-names>Y.</given-names></name> <name><surname>Palmer</surname> <given-names>K. J.</given-names></name> <name><surname>Roch</surname> <given-names>M. A.</given-names></name> <name><surname>Fleishman</surname> <given-names>E.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Nosal</surname> <given-names>E.-M.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Deep neural networks for automated detection of marine mammal species</article-title>. <source>Sci. Rep.</source> <volume>10</volume>, <fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-020-57549-y</pub-id></citation></ref>
<ref id="ref105"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shonfield</surname> <given-names>J.</given-names></name> <name><surname>Bayne</surname> <given-names>E. M.</given-names></name></person-group> (<year>2017</year>). <article-title>Autonomous recording units in avian ecological research: current use and future applications</article-title>. <source>Avian Conserv. Ecol.</source> <volume>12</volume>:<fpage>art14</fpage>. doi: <pub-id pub-id-type="doi">10.5751/ACE-00974-120114</pub-id></citation></ref>
<ref id="ref106"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>B.</given-names></name></person-group> (<year>2022</year>). soundClass: Sound Classification Using Convolutional Neural Networks. Available at: <ext-link xlink:href="https://CRAN.R-project.org/package=soundClass" ext-link-type="uri">https://CRAN.R-project.org/package=soundClass</ext-link> (Accessed January 23, 2023).</citation></ref>
<ref id="ref107"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>B.</given-names></name> <name><surname>Mestre</surname> <given-names>F.</given-names></name> <name><surname>Barreiro</surname> <given-names>S.</given-names></name> <name><surname>Alves</surname> <given-names>P. J.</given-names></name> <name><surname>Herrera</surname> <given-names>J. M.</given-names></name></person-group> (<year>2022</year>). <article-title>soundClass: an automatic sound classification tool for biodiversity monitoring using machine learning</article-title>. <source>Methods Ecol. Evol.</source> <volume>13</volume>, <fpage>2356</fpage>&#x2013;<lpage>2362</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13964</pub-id></citation></ref>
<ref id="ref108"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sing</surname> <given-names>T.</given-names></name> <name><surname>Sander</surname> <given-names>O.</given-names></name> <name><surname>Beerenwinkel</surname> <given-names>N.</given-names></name> <name><surname>Lengauer</surname> <given-names>T.</given-names></name></person-group> (<year>2005</year>). <article-title>ROCR: visualizing classifier performance in R</article-title>. <source>Bioinformatics</source> <volume>21</volume>, <fpage>3940</fpage>&#x2013;<lpage>3941</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/bti623</pub-id>, PMID: <pub-id pub-id-type="pmid">16096348</pub-id></citation></ref>
<ref id="ref109"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soares</surname> <given-names>B. S.</given-names></name> <name><surname>Luz</surname> <given-names>J. S.</given-names></name> <name><surname>de Mac&#x00EA;do</surname> <given-names>V. F.</given-names></name> <name><surname>Silva</surname> <given-names>R. R. V. E.</given-names></name> <name><surname>De Ara&#x00FA;jo</surname> <given-names>F. H. D.</given-names></name> <name><surname>Magalh&#x00E3;es</surname> <given-names>D. M. V.</given-names></name></person-group> (<year>2022</year>). <article-title>MFCC-based descriptor for bee queen presence detection</article-title>. <source>Expert Syst. Appl.</source> <volume>201</volume>:<fpage>117104</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.eswa.2022.117104</pub-id></citation></ref>
<ref id="ref110"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spillmann</surname> <given-names>B.</given-names></name> <name><surname>van Schaik</surname> <given-names>C. P.</given-names></name> <name><surname>Setia</surname> <given-names>T. M.</given-names></name> <name><surname>Sadjadi</surname> <given-names>S. O.</given-names></name></person-group> (<year>2017</year>). <article-title>Who shall I say is calling? Validation of a caller recognition procedure in Bornean flanged male orangutan (<italic>Pongo pygmaeus</italic>) long calls</article-title>. <source>Bioacoustics</source> <volume>26</volume>, <fpage>109</fpage>&#x2013;<lpage>120</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09524622.2016.1216802</pub-id></citation></ref>
<ref id="ref111"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevenson</surname> <given-names>B. C.</given-names></name> <name><surname>Borchers</surname> <given-names>D. L.</given-names></name> <name><surname>Altwegg</surname> <given-names>R.</given-names></name> <name><surname>Swift</surname> <given-names>R. J.</given-names></name> <name><surname>Gillespie</surname> <given-names>D. M.</given-names></name> <name><surname>Measey</surname> <given-names>G. J.</given-names></name></person-group> (<year>2015</year>). <article-title>A general framework for animal density estimation from acoustic detections across a fixed microphone array</article-title>. <source>Methods Ecol. Evol.</source> <volume>6</volume>, <fpage>38</fpage>&#x2013;<lpage>48</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.12291</pub-id></citation></ref>
<ref id="ref112"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stowell</surname> <given-names>D.</given-names></name></person-group> (<year>2022</year>). <article-title>Computational bioacoustics with deep learning: a review and roadmap</article-title>. <source>PeerJ</source> <volume>10</volume>:<fpage>e13152</fpage>. doi: <pub-id pub-id-type="doi">10.7717/peerj.13152</pub-id>, PMID: <pub-id pub-id-type="pmid">35341043</pub-id></citation></ref>
<ref id="ref113"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sueur</surname> <given-names>J.</given-names></name> <name><surname>Aubin</surname> <given-names>T.</given-names></name> <name><surname>Simonis</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>Seewave: a free modular tool for sound analysis and synthesis</article-title>. <source>Bioacoustics</source> <volume>18</volume>, <fpage>213</fpage>&#x2013;<lpage>226</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09524622.2008.9753600</pub-id></citation></ref>
<ref id="ref114"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sugai</surname> <given-names>L. S. M.</given-names></name> <name><surname>Llusia</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Bioacoustic time capsules: using acoustic monitoring to document biodiversity</article-title>. <source>Ecol. Indic.</source> <volume>99</volume>, <fpage>149</fpage>&#x2013;<lpage>152</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecolind.2018.12.021</pub-id></citation></ref>
<ref id="ref115"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sugai</surname> <given-names>L. S. M.</given-names></name> <name><surname>Silva</surname> <given-names>T. S. F.</given-names></name> <name><surname>Ribeiro</surname> <given-names>J. W.</given-names></name> <name><surname>Llusia</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Terrestrial passive acoustic monitoring: review and perspectives</article-title>. <source>Bioscience</source> <volume>69</volume>, <fpage>15</fpage>&#x2013;<lpage>25</lpage>. doi: <pub-id pub-id-type="doi">10.1093/biosci/biy147</pub-id></citation></ref>
<ref id="ref116"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>G.-Z.</given-names></name> <name><surname>Huang</surname> <given-names>B.</given-names></name> <name><surname>Guan</surname> <given-names>Z.-H.</given-names></name> <name><surname>Geissmann</surname> <given-names>T.</given-names></name> <name><surname>Jiang</surname> <given-names>X.-L.</given-names></name></person-group> (<year>2011</year>). <article-title>Individuality in male songs of wild black crested gibbons (<italic>Nomascus concolor</italic>)</article-title>. <source>Am. J. Primatol.</source> <volume>73</volume>, <fpage>431</fpage>&#x2013;<lpage>438</lpage>. doi: <pub-id pub-id-type="doi">10.1002/ajp.20917</pub-id>, PMID: <pub-id pub-id-type="pmid">21432872</pub-id></citation></ref>
<ref id="ref117"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Swets</surname> <given-names>J. A.</given-names></name></person-group> (<year>1964</year>). <source>Signal Detection and Recognition by Human Observers: Contemporary Readings</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Wiley</publisher-name>.</citation></ref>
<ref id="ref118"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terleph</surname> <given-names>T. A.</given-names></name> <name><surname>Malaivijitnond</surname> <given-names>S.</given-names></name> <name><surname>Reichard</surname> <given-names>U. H.</given-names></name></person-group> (<year>2015</year>). <article-title>Lar gibbon (<italic>Hylobates lar</italic>) great call reveals individual caller identity</article-title>. <source>Am. J. Primatol.</source> <volume>77</volume>, <fpage>811</fpage>&#x2013;<lpage>821</lpage>. doi: <pub-id pub-id-type="doi">10.1002/ajp.22406</pub-id>, PMID: <pub-id pub-id-type="pmid">25800578</pub-id></citation></ref>
<ref id="ref119"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vu</surname> <given-names>T. T.</given-names></name> <name><surname>Tran</surname> <given-names>L. M.</given-names></name></person-group> (<year>2019</year>). <article-title>An application of autonomous recorders for gibbon monitoring</article-title>. <source>Int. J. Primatol.</source> <volume>40</volume>, <fpage>169</fpage>&#x2013;<lpage>186</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10764-018-0073-3</pub-id></citation></ref>
<ref id="ref120"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vu</surname> <given-names>T. T.</given-names></name> <name><surname>Tran</surname> <given-names>D. V.</given-names></name></person-group> (<year>2020</year>). <article-title>Using autonomous recorders and bioacoustics to monitor the globally endangered wildlife in the Annamite mountain landscape: a case study with crested argus in Song Thanh Nature Reserve</article-title>. <source>J. Nat. Conserv.</source> <volume>56</volume>:<fpage>125843</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jnc.2020.125843</pub-id></citation></ref>
<ref id="ref121"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>W&#x00E4;ldchen</surname> <given-names>J.</given-names></name> <name><surname>M&#x00E4;der</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Machine learning for image based species identification</article-title>. <source>Methods Ecol. Evol.</source> <volume>9</volume>, <fpage>2216</fpage>&#x2013;<lpage>2225</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13075</pub-id>, PMID: <pub-id pub-id-type="pmid">36533914</pub-id></citation></ref>
<ref id="ref122"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walsh</surname> <given-names>R. P.</given-names></name> <name><surname>Newbery</surname> <given-names>D. M.</given-names></name></person-group> (<year>1999</year>). <article-title>The ecoclimatology of Danum, Sabah, in the context of the world&#x2019;s rainforest regions, with particular reference to dry periods and their impact</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci.</source> <volume>354</volume>, <fpage>1869</fpage>&#x2013;<lpage>1883</lpage>. doi: <pub-id pub-id-type="doi">10.1098/rstb.1999.0528</pub-id>, PMID: <pub-id pub-id-type="pmid">11605629</pub-id></citation></ref>
<ref id="ref123"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wanelik</surname> <given-names>K. M.</given-names></name> <name><surname>Azis</surname> <given-names>A.</given-names></name> <name><surname>Cheyne</surname> <given-names>S. M.</given-names></name></person-group> (<year>2012</year>). <article-title>Note-, phrase- and song-specific acoustic variables contributing to the individuality of male duet song in the Bornean southern gibbon (<italic>Hylobates albibarbis</italic>)</article-title>. <source>Primates</source> <volume>54</volume>, <fpage>159</fpage>&#x2013;<lpage>170</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10329-012-0338-y</pub-id></citation></ref>
<ref id="ref124"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Ye</surname> <given-names>J.</given-names></name> <name><surname>Borchers</surname> <given-names>D. L.</given-names></name></person-group> (<year>2022</year>). <article-title>Automated call detection for acoustic surveys with structured calls of varying length</article-title>. <source>Methods Ecol. Evol.</source> <volume>13</volume>, <fpage>1552</fpage>&#x2013;<lpage>1567</lpage>. doi: <pub-id pub-id-type="doi">10.1111/2041-210X.13873</pub-id></citation></ref>
<ref id="ref125"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wiggins</surname> <given-names>S.</given-names></name></person-group> (<year>2003</year>). <article-title>Autonomous acoustic recording packages (ARPs) for long-term monitoring of whale sounds</article-title>. <source>Mar. Technol. Soc. J.</source> <volume>37</volume>, <fpage>13</fpage>&#x2013;<lpage>22</lpage>. doi: <pub-id pub-id-type="doi">10.4031/002533203787537375</pub-id></citation></ref>
<ref id="ref500"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wijers</surname> <given-names>M.</given-names></name> <name><surname>Loveridge</surname> <given-names>A.</given-names></name> <name><surname>Macdonald</surname> <given-names>D. W.</given-names></name> <name><surname>Markham</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>CARACAL: A versatile passive acoustic monitoring tool for wildlife research and conservation</article-title>. <source>Bioacoustics</source> <volume>30</volume>, <fpage>41</fpage>&#x2013;<lpage>57</lpage>.</citation></ref>
<ref id="ref126"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vinh</surname> <given-names>N. X.</given-names></name> <name><surname>Epps</surname> <given-names>J.</given-names></name> <name><surname>Bailey</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance</article-title>. <source>J. Mach. Learn. Res.</source> <volume>11</volume>, <fpage>2837</fpage>&#x2013;<lpage>2854</lpage>.</citation></ref>
<ref id="ref127"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeppelzauer</surname> <given-names>M.</given-names></name> <name><surname>Hensman</surname> <given-names>S.</given-names></name> <name><surname>Stoeger</surname> <given-names>A. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Towards an automated acoustic detection system for free-ranging elephants</article-title>. <source>Bioacoustics</source> <volume>24</volume>, <fpage>13</fpage>&#x2013;<lpage>29</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09524622.2014.906321</pub-id>, PMID: <pub-id pub-id-type="pmid">25983398</pub-id></citation></ref>
<ref id="ref128"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zottesso</surname> <given-names>R. H. D.</given-names></name> <name><surname>Costa</surname> <given-names>Y. M. G.</given-names></name> <name><surname>Bertolini</surname> <given-names>D.</given-names></name> <name><surname>Oliveira</surname> <given-names>L. E. S.</given-names></name></person-group> (<year>2018</year>). <article-title>Bird species identification using spectrogram and dissimilarity approach</article-title>. <source>Ecol. Inform.</source> <volume>48</volume>, <fpage>187</fpage>&#x2013;<lpage>197</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ecoinf.2018.08.007</pub-id></citation></ref>
<ref id="ref129"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zwart</surname> <given-names>M. C.</given-names></name> <name><surname>Baker</surname> <given-names>A.</given-names></name> <name><surname>McGowan</surname> <given-names>P. J. K.</given-names></name> <name><surname>Whittingham</surname> <given-names>M. J.</given-names></name></person-group> (<year>2014</year>). <article-title>The use of automated bioacoustic recorders to replace human wildlife surveys: an example using nightjars</article-title>. <source>PLoS One</source> <volume>9</volume>:<fpage>e102770</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0102770</pub-id>, PMID: <pub-id pub-id-type="pmid">25029035</pub-id></citation></ref></ref-list><fn-group>
<fn id="fn0004"><p><sup>1</sup><ext-link xlink:href="https://github.com/DenaJGibbon/Workflow-for-automated-detection-and-classification-gibbon-calls" ext-link-type="uri">https://github.com/DenaJGibbon/Workflow-for-automated-detection-and-classification-gibbon-calls</ext-link></p></fn>
</fn-group>
</back>
</article>