AUTHOR=Hammoud Mohammed, Getahun Melaku N., Baldycheva Anna, Somov Andrey

TITLE=Machine learning-based infant crying interpretation

JOURNAL=Frontiers in Artificial Intelligence

VOLUME=Volume 7 - 2024

YEAR=2024

URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1337356

DOI=10.3389/frai.2024.1337356

ISSN=2624-8212

ABSTRACT=Crying is an inevitable part of infant development, and the people around an infant, such as parents, may or may not understand what it means. A cry can be treated as an audio signal that carries a message about the infant's state, such as discomfort, hunger, or sickness. Traditionally, interpreting these states falls to the infant's primary caregiver, and misinterpreting them can cause severe problems. Various methods attempt to solve this problem; however, proper audio feature representations and classifiers are necessary for better results. This research uses time-, frequency-, and time-frequency-domain feature representations to gain in-depth information about the data. The time-domain features include zero-crossing rate (ZCR) and root mean square (RMS); the frequency-domain feature is the Mel-spectrogram; and the time-frequency-domain feature is the Mel-frequency cepstral coefficients (MFCC). Moreover, time-series imaging algorithms are applied to transform the 20 MFCC features into images: Gramian Angular Difference Fields, Gramian Angular Summation Fields, Markov Transition Fields, recurrence plots, and RGB GAF. These features are then provided to different machine learning classifiers, such as decision tree, random forest, K-nearest neighbors, and bagging. Using MFCC, ZCR, and RMS as features achieved high performance, outperforming the state of the art (SOTA). Optimal parameters are found via grid search with 10-fold cross-validation. Our approach, an MFCC-based random forest (RF) classifier, achieved an accuracy of 96.39%, outperforming the SOTA scalogram-based ShuffleNet classifier, which achieved an accuracy of 95.17%.
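
The two time-domain features named in the abstract, ZCR and RMS, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame and hop lengths, and the choice to count a crossing whenever adjacent samples differ in sign are all assumptions made for illustration, and only the Python standard library is used.

```python
import math


def frame_signal(samples, frame_length, hop_length):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_length
    return [
        samples[start:start + frame_length]
        for start in range(0, last_start + 1, hop_length)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (assumed definition)."""
    if len(frame) < 2:
        return 0.0  # no adjacent pairs to compare
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Square root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def time_domain_features(samples, frame_length=4, hop_length=2):
    """Return one (ZCR, RMS) pair per frame; default sizes are illustrative only."""
    return [
        (zero_crossing_rate(frame), root_mean_square(frame))
        for frame in frame_signal(samples, frame_length, hop_length)
    ]


if __name__ == "__main__":
    # Toy signal that alternates in sign at every sample.
    toy_signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    for zcr, rms in time_domain_features(toy_signal):
        print(f"ZCR={zcr:.2f}  RMS={rms:.2f}")
```

In practice, per-frame values like these are usually summarized (for example, averaged over the recording) or concatenated with the MFCC vector before being passed to a classifier; how the paper combines them is not stated in the abstract.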