Associations Between Neonatal Cry Acoustics and Visual Attention During the First Year

It has been suggested that early cry parameters are connected to later cognitive abilities. The present study is the first to investigate whether the acoustic features of infant cry are associated with cognitive development already during the first year, as measured by oculomotor orienting and attention disengagement. Cry sounds for acoustic analyses (fundamental frequency; F0) were recorded in two neonatal cohorts at the age of 0–8 days (Tampere, Finland) or at 6 weeks (Cape Town, South Africa). Eye tracking was used to measure oculomotor orienting to peripheral visual stimuli and attention disengagement from central stimuli at 8 months (Tampere) or at 6 months (Cape Town) of age. Only a marginal positive correlation between fundamental frequency of cry (F0) and visual attention disengagement was observed in the Tampere cohort, but not in the Cape Town cohort. This correlation indicated that infants from the Tampere cohort with a higher neonatal F0 were marginally slower to shift their gaze away from the central stimulus to the peripheral stimulus. No associations between F0 and oculomotor orienting were observed in either cohort. We discuss possible factors influencing the current pattern of results suggesting a lack of replicable associations between neonatal cry and visual attention and suggest directions for future research investigating the potential of early cry analysis in predicting later cognitive development.


INTRODUCTION
Monitoring children's cognitive development is an important part of pediatric practice. The assessment of cognitive development during the neonatal period and early infancy is faced with multiple challenges, such as age-related changes in how different cognitive functions can be measured. For example, motor abilities are emphasized in most early measures, but their relation to cognitive functioning may vary at different ages, which may weaken the predictive validity of early assessments of cognitive development (McCall, 1979;Colombo, 1993). These challenges make research on the early markers of later cognitive functioning important.
The current study is focused on the potential connections between the acoustic characteristics of infant cry and visual attention across the first year of life. Cry is a vital, reflex-like means of communication present from the very beginning of life (Newman, 2007) and it indicates the neurological status of the neonate (Lester, 1987;LaGasse et al., 2005). Variations in visual attention during the first year, on the other hand, have been shown to be associated with later cognitive development (Dougherty and Haith, 1997;Colombo, 2002;Rose et al., 2003Rose et al., , 2012, thus providing an outcome measure that emerges early in development and is meaningfully associated with more complex and later-emerging cognitive functions. As both the acoustic characteristics of cry and visual attention can be measured reliably at an early age, examining their interrelations may help to determine the potential of neonatal cry analysis as an accessible, complementary marker of later development. A limited number of small-scale studies are available to suggest that early cry parameters are associated with later cognitive abilities. Lester (1987) observed that preterm and term infants having high average fundamental frequency of cry (F0; heard as the pitch of the cry) and larger variability in F0 within 2 weeks of term postconceptional age were more likely located in the group with lower scores on the Bayley Scales of Infant Development at 18 months of age compared to infants with lower average F0 and less variable F0. In addition, high variation in neonatal F0 predicted lower scores in the McCarthy Scales of Children's Abilities at 5 years of age. Similarly, studying infants prenatally exposed to methadone, Huntington et al. (1990) found that higher and more variable F0 in the first days of life were associated with lower mental development scores measured by the Bayley scales at 8, 12, and 18 months, and marginally at 24 months. Both Lester (1987) and Huntington et al. (1990) observed that the relation between the F0 variables (mean and variation of F0 across cry segments) and the cognitive development outcomes were not affected by risk factors including prematurity and prenatal exposure to drugs. Other studies have observed that the acoustic characteristics of cry in children with neurodevelopmental disorders such as CNS abnormalities and developmental delays (for reviews, see Michelsson and Michelsson, 1999;LaGasse et al., 2005) and in children with autism spectrum disorders (Sheinkopf et al., 2012;Esposito et al., 2014) are associated with an atypical and more unstable F0. The similarities in the findings of studies investigating at-risk infants, as well as the scarce findings among typically developing infants emphasize the importance of further research on the connections between the acoustic features of neonatal cry and cognitive functioning during the first year of life among typically developing infants.
Regarding the neural basis of variation in neonatal cry, it has been suggested that imperfect functioning of the vagus nerve and inhibitory parasympathetic nervous system activity are reflected in infant cry as higher and more variable F0 (Lester, 1976;Porter et al., 1988). Neonatal cry requires integration of the respiratory and larynx area organs, which are suggested to be controlled in early, reflex-like crying exclusively by the brainstem. No areas higher than the midbrain are required for producing cry during the first months of life (Newman, 2007). Consequently, the integrity of the brainstem is suggested to influence the acoustic features of cry (Lester, 1987;Vohr et al., 1989) and the vagus nerve of the parasympathetic nervous system is considered as a crucial pathway through which the brainstem regulates neonatal and infant cry (Lester and Dreher, 1989;LaGasse et al., 2005;Newman, 2007). The vagus nerve originates from the medulla oblongata in the brainstem and is densely connected with the vocal cords and respiratory muscles, on which it has an inhibitory function. Interestingly, neonatal F0 has been shown to vary as a function of cardiac vagal tone (Shinya et al., 2016). Thus, imperfect inhibitory parasympathetic control of the cry organs is likely reflected in higher and more variable infant F0. These findings highlight the role of the brainstem and vagal system in affecting neonatal cry acoustics.
In addition to influencing infant cry production, integrity of the brainstem may have a pivotal impact on visual attention and the processing of visual information (Colombo, 2001;Hunnius and Geuze, 2004;Geva and Feldman, 2008;Geva et al., 2014). Attention is typically modulated by arousal state and it has been observed that infants with brainstem dysfunction show atypical modulation of attention by arousal. Healthy neonates preferred familiar visual stimuli during increased arousal and novel stimuli during decreased arousal, whereas this pattern was less pronounced in neonates with brainstem dysfunction (Gardner et al., 2003). Karmel et al. (1996) observed that a similar atypical arousal-modulated attention pattern persists at least until 4 months of age in infants with brainstem dysfunction. In addition, sensitivity to social and non-social content of visual stimuli is associated with brainstem integrity. Geva et al. (2017) found that children with discrete neonatal brainstem dysfunction preferred an active nonsocial content (a flock of birds flying) or a neutral social content (a person listening) over an active social content (a person expressing affective narratives). These findings were opposite to the pattern of attention in typically developing children and the difference remained even in a follow up at 8 years of age.
Thus, it appears that neonatal cry and infant visual attention may share partially overlapping neural mechanisms, with particularly the brainstem implicated in both. In the current study, we therefore investigated the associations between infant cry acoustics (F0 and its variation) during the neonatal period (0-6 weeks) and visual attention at 6-8 months of age. As the indicator of infant visual attention, we measured oculomotor orienting and attention disengagement, both operationalized as saccadic reaction times in separate eye tracking paradigms.
The ability to produce rapid shifts of gaze to stimuli appearing in a new spatial location, typically measured as saccadic reaction time, matures in early infancy and its variations correlate with later cognitive development (Dougherty and Haith, 1997). For example, infants with longer latency of saccades to peripheral target stimuli were shown to score lower on standardized IQ measures at 4 years of age (Dougherty and Haith, 1997). In addition, saccadic reaction times have been shown to be affected by neurodevelopmental risk factors such as prenatal alcohol exposure (Green et al., 2013), preterm birth (Hunnius et al., 2008) and familial risk for autism spectrum disorder (Elsabbagh et al., 2013). Hence, studying infant visual attention with paradigms measuring saccadic reaction times may provide applicable markers for examining neurocognitive development in a variety of populations.
We used two infant attention paradigms for measuring saccadic reaction times. First, oculomotor orienting was measured as saccadic reaction time to a new peripheral stimulus Frontiers in Psychology | www.frontiersin.org following the offset of a stimulus on the center of the screen. Second, we measured disengagement of attention as the duration of gaze shift from centrally presented face and non-face stimuli to laterally presented competing stimuli. Particularly oculomotor orienting has been associated with later cognitive development, with faster saccadic reaction times in infancy predicting higher childhood IQ (Dougherty and Haith, 1997;Rose et al., 2012). Attention disengagement paradigms have been especially used for investigating infants' attention to social stimuli such as faces (Peltola et al., 2008) with recent findings indicating that variations in infants' attention to faces in this paradigm are associated with later social development . Furthermore, given that oculomotor orienting and attention disengagement appear to mature at different speed during the first year (with attention disengagement maturing later) (Hood and Atkinson, 1993;Matsuzawa and Shimojo, 1997;Colombo, 2001) and they rely on different neural mechanisms (Rafal and Robertson, 1995;Colombo, 2001) we considered it important to include both measures of infant attention in the current study.
Taken together, as infant cry and visual attention may have a partially overlapping neural basis and as the basic functions of infant visual attention may be important in understanding later-emerging more complex cognitive and social abilities, infant visual attention is an ideal outcome measure for attempts to determine whether the acoustic features of newborn cry are associated with normative developmental variations in basic cognitive functions.
Due to a lack of previous studies investigating associations between cry acoustics and visual attention in infancy, the nature of our study was explorative and we refrained from stating strong directional hypotheses. Nevertheless, given that both F0 (Lester, 1987;Huntington et al., 1990) and saccadic reaction times (Dougherty and Haith, 1997;Rose et al., 2012) have been associated with childhood cognitive development, exploring their potential associations during early development is valuable. To assess whether the results generalize across different populations, we implemented the analysis across two independent samples of infants from Tampere, Finland and Cape Town, South Africa.

Ethics Statement
The study adhered to ethical guidelines and the study protocol was approved by the Ethical Committee of Tampere University Hospital and, on behalf of Cape Town participants, by the Institutional Review Board of Stellenbosch University. Infants were included in the study after a written consent from their parent was obtained.

Participants
Data for the current analyses were obtained from two cohorts, one in Tampere, Finland, and the other in Cape Town, South Africa. Our interest was to study the associations between acoustic variables of cry during the neonatal period (Tampere: 0-8 days after birth; Cape Town: 6 weeks of age) and infant visual attention measured by eye tracking methods (Tampere: 8 months; Cape Town: 6 months of age).

Tampere Cohort
Participants in the Tampere cohort were enrolled in the study at birth (0-8 days; mean = 2.33 days, SD = 1.32 days, 41.6% females) and subsequently invited for a follow-up visit at 8 months of age (mean = 253.58 days, SD = 12.34 days). A total of 104 neonates were recruited from the Tampere University Hospital Maternity Ward Units and the Neonatal Ward Unit. Infants were considered eligible to be enrolled in the study if the infant did not have any kind of abnormalities in the neck area, nor a history of recent airway suctioning, which may influence the voice, and if one of the parents understood Finnish or English sufficiently to be able to be adequately informed about the study.
From the original pool of 104 infants, six infants did not cry when the researcher was present, hence cry recording was captured from 98 infants. Out of these, 73 infants attended the eye tracking follow-up visit at 8 months. Eye tracking data were not obtained from five infants because of technical problems and from four infants because of fussiness. Two infants were further excluded because of not being full-term (≥37 weeks), three because of low birth weight (≥2500 g), two because of perinatal asphyxia, and one because of an abnormal brain MRI finding.
From this sample of 56 infants, two infants needed to be excluded because of the inclusion criteria of the cry analysis. At least three expiratory cry utterances from the beginning of the recording had to be present for analyzing the F0. For the analyses of oculomotor orienting, two infants were excluded for having less than four trials with good gaze data for calculating oculomotor orienting. For the analyses of attention disengagement, eight infants were excluded for having less than three scorable trials in one of the stimulus conditions (i.e., a non-face control stimulus and happy and fearful face stimuli).
Consequently, 52 infants (42.3% females) and 46 infants (41.3% females) were included in the oculomotor orienting and attention disengagement analyses, respectively. The included infants did not have known health or developmental disadvantages nor known prenatal mother-related risk factors, such as drug abuse, untreated diabetes or high blood pressure. Data about health, risk factors, and medical therapies were collected from the medical records of the hospital.
In addition to the primary analyses, we ran additional analyses including only those infants whose cry was captured from the beginning of a cry bout (see below for details). This was done to enable more direct comparisons with the Cape Town cohort in which the cry recordings were always captured from the beginning of the cry bout. Thus, this analysis controlled the possible influence of the location of the cry sample in the recording. For these analyses, 29 and 28 infants were included in the oculomotor orienting and attention disengagement analyses, respectively.

Cape Town Cohort
The data for the Cape Town cohort was obtained from a larger research project in collaboration with the Department of Psychiatry at the Stellenbosch University. Participants were enrolled in this study at the age of 6 weeks (mean = 44.73 days, SD = 3.34 days, 32% females) and they participated in the followup registration at 6 months (mean = 187.85 days, SD = 13.84 days) of age. Similarly to the Tampere cohort, the infants did not have any kind of abnormalities in the neck area, nor a history of recent airway suctioning, which may influence the voice. Likewise, the other exclusion criteria were concordant with the Tampere cohort.
Cry recordings were obtained from a total of 100 infants. The follow-up assessment was completed by 73 infants. Of these, five infants needed to be excluded because of the age criteria at cry recording and four because there were less than three expiratory cry utterances in the recording. Three infants were excluded for not being full-term (≥37 weeks). Birthweight data was not available for this cohort, and none of the infants had perinatal asphyxia or a brain MRI finding. The infants did not have other developmental or health problems. Considering the inclusion criteria of mother-related risk factors, we excluded 6 infants due to prenatal alcohol use. For the eye tracking data, identical inclusion criteria were used as in the Tampere cohort. Consequently, 47 infants (38.3% females) and 53 infants (35.8% females) were included in the oculomotor orienting and disengagement analyses, respectively.
Within the cohort in the primary analysis, there were 14 infants exposed to prenatal selective serotonin reuptake inhibitor (SSRI) medication. Therefore, to control the possible influence of SSRI medication, additional analysis were conducted after excluding these participants, resulting in 37 and 42 infants in the oculomotor orienting and attention disengagement analyses, respectively. In both analyses a proportion of mothers had psychiatric problems such as depression and anxiety disorders. This was due to the interests of the larger research project from which the data were obtained.

Cry Recordings in Tampere
We recorded one cry sample from each infant in normal clinical settings at the age of 0 to 8 days. During the recording, infants were awake and laying on supine position. The location of the captured sample in the cry bout varied since we recorded spontaneous cries which were not launched by a distinct painful trigger, i.e., some cry sounds were captured only after the cry had already started. Soothing was performed by the parent whenever needed.
The duration of the samples in the whole data ranged from 14.4 to 430.5 s (mean = 114.2 s; SD = 81.7). The recording microphone was a Tascam DR-100MK Linear PCM Recorder, Røde M3 cardioid microphone. The samples were stored with a 48-kHz sampling rate, in a 24-bit Waveform Audio file (WAV) format. The microphone was at a stand at 30-cm distance directly in front of the infant's mouth.

Cry Recordings in Cape Town
Recording of the cry was done as a part of a standardized protocol at a routine 6-week baby clinic visit to the nurse. The room was always the same and the cry was elicited by vaccination or measuring the infant's weight on a scale. During the vaccination, the infant was awake and laying on the mother's lap. Soothing was performed in the way the mother preferred. We captured one cry sample at the beginning of the cry bout from every infant.
The duration of the samples in the whole data ranged from 16.9 to 150 s (mean = 80.7 s; SD = 18.8 s). The audio recorder used was a Zoom H4n recorder with built-in condenser microphones. Recordings were stored with a 48-kHz sampling rate, twochannel audio in a 24-bit Waveform Audio file (WAV) format. The distance between the infant's mouth and the microphone was 1.3 m for infants being vaccinated and 70 cm for infants being weighed. The microphone was fixed on a wall stand.

Acoustic Variables
As the cry samples were captured in clinical settings containing varying amounts of extraneous sounds, the first task was to separate the cry vocalizations from these extraneous sounds which are not useful for the purpose of acoustic analysis. We used an in-house hidden Markov model (HMM) (Schuster-Böckler and Bateman, 2007) based audio segmentation system (Naithani et al., 2018) for cry segment extraction, and the acoustic analysis was performed in MATLAB (Mathworks, Natick, MA, United States). The cry data was divided to three acoustic regions; expiratory phases, inspiratory phases, and residual, where residual included all the vocalizations that are not expiratory or inspiratory cries. We included only expiratory phases in the analysis, based on the frequent praxis in earlier studies as well as the results of our preliminary study showing that expiratory cries were detected more reliably than inspiratory cries by the analysis tool (Naithani et al., 2018).
In the analyses, we used the first five expiratory phases from the beginning of the captured cry sample. A small amount (5.1%) of recordings had less than five consecutive expiratory cry utterances due to the infant producing less than five utterances or the audio segmentation algorithm failing to correctly capture very short expiratory phases. In cases with less than five but a minimum of three expiratory phases, the mean of the available scorable expiratory phases was used to calculate the acoustic variables. Only phases with duration longer than 500 ms were included, in line with previous studies (Reggiannini et al., 2013;Fuamenya et al., 2015). The identified expiratory phases were then divided into overlapping frames of 25-ms duration and with a 12.5-ms hoplength, which means that there is a 50 % overlap between consecutive frames. The YIN algorithm (De Cheveigné and Kawahara, 2002), an autocorrelation based method, was used to extract a fundamental frequency estimate, F0 fr , and, a measure of aperiodicity, Ap fr , for each of these frames. The latter can be used to remove unreliable F0 fr estimates using an appropriate threshold. The freely available MATLAB implementation available at http://audition.ens.fr/adc/sw/yin.zip was used for the analysis. In this study we discarded values of F0 fr for which is Ap fr was greater than 0.3. The other YIN parameters used were: F0 fr min , the minimum expected fundamental frequency (here 200 Hz); F0 fr max , the maximum expected fundamental frequency (here 700 Hz); and F0 fr thresh , the proportion of aperiodic power tolerated within each frame (here 0.3). These parameters need to be specified in order to avoid too many octave errors in the estimation. The output of fundamental frequency estimation is a sequence of F0 fr values for each expiratory segment. Cry phases with more than 70% inharmonic frames (selected via Ap fr > 0.3, as described above) were excluded (for more information, see Fuamenya et al., 2015). Finally, we computed the mean and standard deviation of fundamental frequency estimates within each expiratory segment. In line with earlier studies (Michelsson and Michelsson, 1999;Rothgänger, 2003), the mean of the means of first five expiratory segments was defined as F0 and the mean of standard deviations of first five expiratory segments was defined as F0var. F0 and F0var were used as acoustic features for further analysis.

Eye Tracking Assessments in Tampere
In the 8-month follow-up visit, the infants participated in an eye tracking assessment of saccadic reaction times measured during separate oculomotor orienting and attention disengagement paradigms. During the assessment, the infant sat on the parent's lap in a dimly lit room watching the test stimuli on a 23-in monitor located at 60 ± 5 cm distance from the infant's eyes. The parent was asked to close her/his eyes during the recording to avoid false registrations (Gredebäck et al., 2010). The session was paused or terminated in case the infant expressed inconvenience or was not willing to co-operate. Eye tracking data were collected by using a Tobii TX300 remote eye tracking camera (Tobii Technology, Stockholm, Sweden) with a 300-Hz sampling rate.
Before recording the data, the eye tracker was calibrated by using a 5-point calibration script and Tobii SDK calibration algorithms. We repeated the calibration two times if one or more calibration points were missing or inadequate. We used the final calibration in case none of the calibrations produced an adequate result.
The tasks (i.e., oculomotor orienting and attention disengagement) were both divided in two blocks that were presented in alternate order starting with the first block of the oculomotor orienting task, followed by the first disengagement block, and ending with the second oculomotor orienting and disengagement blocks. Both oculomotor orienting blocks included 16 trials and the disengagement blocks included 24 trials (i.e., a total of 32 and 48 trials in the oculomotor orienting and disengagement tasks, respectively). Cheerful music with smiling children was played between the blocks. Stimulus presentation was controlled by an open-source software written in Python programming language 1 , together with PsychoPy (release 1.82.01) functions and a Tobii SDK plug-in. For both tasks, all data processing from the raw x-y gaze position coordinates to the parameters reflecting oculomotor orienting and disengagement were completed automatically using gazeAnalysisLib 1_05, a library of MATLAB routines for offline analysis of raw gaze data (Leppänen et al., 2015).

Oculomotor Orienting
Each trial in the oculomotor orienting task started with a 1000ms blank screen followed by an attention-grabbing animation presented on the center of the screen. The animation was a checkerboard pattern measuring 3 • ×3 • of visual angle and it rotated into its mirror image after each 250 ms. The pattern remained visible at least 250 ms until the infant looked at it. 1 https://github.com/infant-cognition-tampere/drop Using a gaze-contingent stimulus presentation, when the infant looked at the central cue, the cue disappeared and the target (a checkerboard pattern) appeared randomly in one of the four corners of the monitor, with a distance of 19 • from the central stimulus. The target remained visible 750 ms after the infant's point of gaze was first recorded in the target area (i.e., a raw xy coordinate was detected inside a 10.9 • ×9.5 • area surrounding the target stimulus). After that, a picture of a cartoon image or a toy combined by a joyful audio stimulus was presented on the center of the screen as a rewarding stimulus. The eye tracking data was analyzed with the gazeAnalysisLib (Leppänen et al., 2015) to determine the latency of infants' orienting response (saccadic reaction time) from the central to the target stimulus. Gaze shifts away from the screen and reaction times longer than 1000 ms were excluded.
Infants with 4 or more scorable oculomotor orienting trials were included. The criteria for included trials were: adequate fixation on the central stimulus (i.e., >70% of the time) during the time preceding a gaze shift, sufficient amount of non-missing data samples (i.e., no gaps longer than 200 ms), and valid information about the time of the gaze transition from the central to the peripheral stimulus (i.e., the transition did not occur during a period of missing data). Trials with gaze shifts within a period that started 150 ms after the onset of the peripheral stimulus and ended 1000 ms after the lateral stimulus onset were used to calculate mean oculomotor orienting latency for each participant (Leppänen et al., 2015).

Attention Disengagement
Attention disengagement was measured with a gaze-contingent Overlap paradigm, which has been commonly used for measuring infant attention disengagement from various stimuli (e.g., Peltola et al., 2008Peltola et al., , 2018. First, a fixation point was presented on the center of the screen until the infant looked at it for 500 ms. After that, a happy or a fearful face, or a non-face stimulus (a phase-scrambled rectangle containing the amplitude and color spectra of the original face stimuli) appeared on the center of the screen for 1000 ms. A distractor stimulus (a vertical checkerboard pattern) was then presented on the left or right side of the screen parallel to the central stimulus for 500 ms. At 250 ms the distractor stimulus changed to its mirror image. After 500 ms, the distractor stimulus disappeared and the central stimulus remained on the screen for 1000 ms, followed by a 1000-ms blank screen before the next trial. The central stimulus conditions (happy or fearful face, or the non-face stimulus) were presented in random order with the restriction that the same stimulus condition was not repeated more than three times in a row.
The corneal-reflection eye tracker measured the reaction times for gaze shifts between the central stimulus and the distractor stimulus. A dwell time index reflecting attention disengagement was calculated if there were at least three scorable trials in each stimulus condition. We calculated the dwell time index using the trials with a gaze shift and trials without a gaze shift, excluding non-scorable trials, and the time interval from the shortest to the longest acceptable saccadic reaction time. Trials with insufficient fixation on the central stimulus (i.e., >70% of the time) during the time preceding a gaze shift or the end of the analysis period, >200 ms gaps of valid gaze data, or invalid information about the gaze shift (i.e., with the eye movement occurring during a period of missing gaze data) were excluded from the analyses. For the included trials, attention dwell time on the central stimulus was determined for the period starting 150 ms from the onset of the distractor stimulus and ending 1000 ms after distractor onset. The duration was then converted to a normalized dwell time index score by using the following formula: where x is the time point of the saccadic eye movement (i.e., last gaze point in the area of the central stimulus preceding a saccade toward the distractor stimulus) and n is the number of scorable trials in a given stimulus condition. In this index, the shortest acceptable saccadic reaction time (150 ms) results in 0, and the longest possible saccadic reaction time (or lack of saccade, which is equal to the last measured time point of the central stimulus at 1000 ms) results in a score of 1 (Leppänen et al., 2015). The dwell time index was calculated as the mean of the stimulus conditions (happy or fearful face, or the non-face stimulus) from the accepted trials, and used as the dependent variable in the analyses of attention disengagement. The inclusion criterion was a minimum of three acceptable trials in each stimulus condition.

Eye Tracking Assessments in Cape Town
Similar to the assessments in Tampere, infants were assessed in a dimly lit room at a private health clinic or a state hospital. A custom MATLAB script (Mathworks, Natick, MA, United States), running on OS X (Apple Inc., Cupertino, CA, United States), and interfacing via Psychtoolbox and Tobii SDK plug-in with the eye tracking system was used for stimulus presentation and data collection (see Pyykkö et al., 2019 for full description of the tests). Eye tracking data were collected by using a Tobii X2-60 or TX60 screen-based eye tracking system (Tobii Technology, Stockholm, Sweden). The procedure for eye tracker calibration was similar to that used in the study in Tampere.
Oculomotor orienting was measured with a visual search task (adapted from Kaldy et al., 2011, anddescribed in detail in Pyykkö et al., 2019). In this task, an image of a red apple (5 • visual angle) was presented on the center of the screen, accompanied by an oh sound. After the infant had looked at the stimulus and 2000 ms had elapsed (or a maximum wait period of 4000 ms had elapsed), the infant saw a blank screen for 500 ms, followed by the re-appearance of the apple in a randomly chosen location. The apple was presented alone (one-object condition), among four or eight identical distractors (multiple-objects condition), or among four to eight different types of distractors (conjunction condition). There was a total of eight trials per condition. To obtain a similar measure for oculomotor orienting as in the Tampere cohort, we extracted orienting latencies from the data collected in the oneobject condition, and averaged these latencies for each child to obtain a measure of oculomotor orienting.
In the disengagement task, infants saw two stimuli with a 1000 ms onset asynchrony. The first was a picture of a non-face pattern (as above) or a face of a South African female expressing happiness or fear. The ethnicity of the model matched that of the child. The second stimulus was a geometric shape (black and white circles or a checkerboard pattern) laterally on the left or right side of the screen, which was superimposed by a still picture of the first frame of a child-friendly cartoon animation. When the infant shifted gaze to the lateral image or 2000 ms had elapsed, the still picture turned into a dynamic cartoon for 4000 ms. Infants saw a total of 16 trials with non-faces and 16 with faces. As above, the dwell time indices were calculated separately for each of the stimulus conditions by using an automated script.

Statistical Analyses
We performed the statistical analyses with SPSS version 25 (IBM Corp., Armonk, NY, United States). The alpha level for statistical significance was set at 0.05. All the variables were normally distributed according to Kolmogorov-Smirnov tests. Potential differences between the cohorts in the main study variables were explored with independent sample t-tests. We studied the associations between infant cry and later visual attention with Pearson's two-tailed correlation coefficients. Due to the exploratory nature of the study, no correction for multiple comparisons was applied to the significance thresholds. In the Tampere cohort, we conducted additional analyses to control for the location of the cry sample in the cry bout by including only those infants whose cry sequence was captured from the beginning of a cry bout. In the Cape Town cohort, we conducted an additional analysis to control for the possible influence of prenatal SSRI medication by including only those infants without prenatal SSRI exposure. In the Tampere cohort there were no infants with SSRI exposure and in the Cape Town cohort all the cry samples were captured from the beginning of a cry bout.

RESULTS
Descriptive data on the study variables are presented in Table 1. Preliminary analyses showed that the while the cohorts did not differ in the mean F0 values, t(111) = −1.14, p = 0.26, d = 0.22, the F0var values were somewhat larger in the Tampere cohort, t(111) = 2.35, p = 0.02, d = 0.45. Moreover, the infants in the Tampere cohort showed longer dwell times in the attention disengagement paradigm, t(97) = 5.41, p < 0.001, d = 1.10, and longer oculomotor orienting latencies, t(97) = 6.21, p < 0.001, d = 1.26. As can be observed from Table 2, the analysis within the Tampere cohort showed a marginal positive correlation between F0 and attention disengagement. When F0 was higher, the dwell times were longer, i.e., the infants were slower to shift gaze away from the central stimulus to the distractor stimulus. A similar but a larger correlation was observed in the additional analysis including only those infants whose cry sequence was captured from the beginning of a cry bout. However, no evidence of a similar correlation between F0 and attention disengagement was found in the Cape Town cohort, neither in the primary analyses nor in the analysis where infants with exposure to prenatal SSRI medication were excluded. There were no statistically significant correlations between F0 and oculomotor orienting in either cohort. Finally, there were no statistically significant correlations between F0 variability and either of the attentional outcome variables. The correlations are presented in Table 2. The values present the mean and standard deviation (in parentheses). a Mean of the F0means of the first five expiratory cry phases. b Mean of the F0 standard deviations of the first five expiratory cry phases. c Mean of dwell time indexes for the control stimulus, happy face, and fearful face. d Oculomotor orienting latency (in milliseconds).
TABLE 2 | Correlations (Pearson's r) between the F0 parameters and attention outcomes in the two cohorts.

Oculomotor orienting Attention disengagement
Tampere

DISCUSSION
The present study is the first to explore the relations between infant cry acoustics and visual attention during the first year. It integrates two previously separate research lines of infant cry analysis and eye tracking based measurement of infant visual attention. Variations in saccadic reaction times during infancy have been suggested to indicate the development of more advanced cognitive functions (Dougherty and Haith, 1997). Motivated by previous research indicating that the acoustic features of neonatal cry predict cognitive development, we explored whether the fundamental frequency (F0) of neonatal cry would be associated with visual attention, measured with eye tracking in later infancy.
For the most part, the results did not indicate that the F0 of neonatal cry is associated with visual attention at 6 to 8 months of age. The only exception was a marginal association between higher neonatal F0 and slower attention disengagement from central stimuli to peripheral distractor stimuli at 8 months of age in the Tampere cohort. This result held when we controlled for the location of the analyzed cry sample within the cry bout, with a larger correlation observed when the cry was captured from the beginning of the cry bout. A similar association was not, however, observed in the Cape Town cohort where the cry recording was conducted at 6 weeks and attention disengagement was measured at 6 months. Furthermore, oculomotor orienting turned out to have no correlation with F0 in either sample. Finally, the variation of F0 did not correlate with the eye tracking outcomes in any of the analyses.
The lack of correlation between neonate cry and oculomotor orienting in the current cohorts could be due to the early maturation of oculomotor orienting, which could result in rather low variability in saccadic reaction times in infants from typically developed cohorts at 6 and 8 months. Both F0 and oculomotor orienting have been indicated to reflect later cognitive functions, and the brainstem and vagal nerve to have important roles in their regulation. A distinctive difference between the two attention measurements, i.e., oculomotor orienting and disengagement, is that in the disengagement task, attention needs to be disengaged from a stimulus at fixation, whereas in the oculomotor orienting task, shifts of attention indicate a reaction to a new stimulus appearing on a blank screen. This difference may render the disengagement task cognitively more demanding. Matsuzawa and Shimojo (1997) observed significant differences in the maturation of these two components of attention. Disengagement times decreased markedly from 2 1 /2 months to 6 months of age, and changed only little from 6 months up to 1 year. In contrast, oculomotor orienting seemed to have matured before the first measurement at 2 1 /2 months and showed only minor changes within the studied age period. Additionally, saccadic reaction times in the disengagement task at 2 1 /2 months were approximately twice as long compared to saccadic reaction times in the oculomotor orienting task, and approached the speed of oculomotor orienting at 6 months. Thus, differences in the maturational timing of the two components of attention may have influenced our results, considering that the eye tracking data were registered at 6 or 8 months. Future studies may benefit from measuring oculomotor orienting at an earlier age.
Nevertheless, the difference between the maturation of oculomotor orienting and disengagement does not explain why the correlations between F0 and attention disengagement were seemingly larger in the Tampere cohort than in the Cape Town cohort. Although caution should be warranted in comparing the correlations across cohorts as there was no correction for multiple comparisons due to the exploratory nature of the study, we consider potential reasons for the lack of correlation between F0 and attention disengagement in the Cape Town cohort. We will first pay attention to the age differences at the time of cry recording between the cohorts. Secondly, we discuss the potential influence of the infant's early environment on the observed pattern of results.
There is a consensus that early cry reflects the development of the central nervous system (Lester, 1987;LaGasse et al., 2005). The regulation of cry goes through prominent changes within the first months of life. Neonatal cry is reflex-like, regulated by the brainstem and not dependent on higher cortical areas, the influence of which increase with age (Newman, 2007). Due to the intense early development of cry regulation, it is thus possible that the variation in cry acoustics at 6 weeks of age does not reflect activity of areas important for visual attention (e.g., brainstem and vagal nerve) to the same extent as cry captured at the neonatal age. F0 decreases in the first days and weeks, likely due to the growth of vocal cords, followed by an increase of F0, which is suggested to relate to progressive control of vocalizations due to neurological maturation. However, the exact time course of this process is not clear (Gilbert and Robb, 1996;Baeck and de Souza, 2007). Consequently, the prominent changes in the regulation of cry as well as normal age-related changes in F0 during the first weeks and months of life highlight the importance for standardizing the age of recording data for cry analysis. For future studies, we suggest recording cry samples at the neonatal period, as this will also make the data comparable to the earlier studies showing associations between neonatal F0 and later cognitive development (Huntington et al., 1990).
Likewise, interaction with the social and physical environment has the potential to influence infants' cognitive development and cry acoustics. Studies with neonates have shown that the amount and the acoustic qualities of cry may vary as a function of caregiver-infant interaction (Belsky et al., 1984;Lotem and Winkler, 2004), with tactile stimulation and reinforcement learning playing a significant role in the variation of early cry (Cecchini et al., 2007). Consequently, infants learn to adjust their cry to the most effective level according to situational factors. In the current study, the age difference between the cry recordings in Tampere (0-8 days) and Cape Town (6 weeks) may be considered a significant timespan in early postnatal development. Additionally, a proportion of Cape Town mothers had psychiatric problems such as depression and anxiety disorders, which are known to influence mother-child interaction and the development of the child (Field, 2010), and may have further influenced the results. Thus, in future studies, in addition to standardizing the age of cry recordings, measures of early interaction quality are likely important.
The other cry feature that has been associated with later development in previous studies, i.e., variation of fundamental frequency (F0var), was not related to the eye tracking outcomes. Studies have observed more varying F0 among preterm and/or low birthweight infants, infants with prenatal risk factors, as well as infants with severe medical or developmental abnormalities (Lester, 1987;Huntington et al., 1990;Michelsson and Michelsson, 1999;LaGasse et al., 2005). While it is possible that the absence of associations between F0 variation and the outcomes was due to the infants in our study being full term, with normal birth weight, and without any known health or developmental problems, it should be noted that two earlier studies (Lester, 1987;Huntington et al., 1990) observed that more variable F0 was related to lower scores in cognitive outcome measures at several ages even after controlling for risk factors including prematurity and prenatal exposure to drugs. As there is currently little methodological consensus about the determination of F0var, an important line for future studies is to examine the predictive significance of different possibilities in determining F0var, such as index variables or micro-variation measures, e.g., fluctuation of F0 across time by analyzing time series data (for further discussion of the methodological variability in infant cry research, see Gabrieli et al., 2019).
The influence of the trigger of the cry and the location of the analyzed cry sample in the cry bout need to be considered further. Our additional analysis in the Tampere cohort suggests that the correlation between F0 and attention disengagement is stronger when the cry is captured from the beginning of the cry bout, although it should be noted that the statistical power of this analysis was reduced due to the smaller number of infants included in this analysis. In a cry triggered by a sudden pain, the F0 is high in the beginning and decreases by time. In contrast, in an inconvenience cry (e.g., hunger, cold, or loneliness) the F0 is lower in the beginning and increases by time if the response to the cry is delayed (Zeskind et al., 1985;Green et al., 1998). In our study the trigger in the Cape Town cohort was either a vaccination or a weight measurement on a scale. In the Tampere cohort, the triggers ranged from venipuncture to unknown inconvenience. However, the venipuncture did not trigger a cry in all cases, and there were no other triggers in the Tampere cohort which could be considered as painful. In general, attempts to make a taxonomy of different cry triggers are challenging because evaluating the trigger is more or less subjective except in case of strong pain caused by an external operator. Future studies could benefit from measuring infants' arousal levels during cry paradigms (e.g., with heart rate or skin conductance measurements) as autonomic arousal is known to influence F0 (Porter et al., 1988;Shinya et al., 2016) and correlate with the experience of pain and inconvenience (Stevens et al., 2007). Thus, it would be important to study whether cry samples during increased arousal (such as sudden pain or prolonged cry) are more informative regarding the correlation between cry and later cognitive variables. The representativeness of the used cry samples is important for increasing the accuracy of F0 variables. It is possible to use several recordings or multiple samples within a recording. However, also in this approach the influence of arousal and the location of the used sample in the whole cry sequence should be taken into consideration.
The reliability of the correlation estimates in the current study was limited by the number of infants included in the analyses, and it will be important to replicate the findings with a larger sample. It will also be important to obtain more standardized cry data by controlling for various factors, such as infant age, the trigger of the cry, and other recording parameters, such as whether the infants are supine vs. on the caregivers lap during the recording (see Goberman et al., 2008), which also differed between the cohorts in this study. Additionally, the tasks for measuring oculomotor orienting and attention disengagement were not completely identical in Tampere and Cape Town, which was reflected in differences between the cohorts in the mean levels of the eye tracking variables. Notwithstanding these limitations, a strength of the study is the integration of two previously separate research lines of infant cry analysis and eye tracking based measurement of infant visual attention. Both methods enable data collection at a very early age which is a prominent advantage in evaluating early cognitive development. Methodological development (e.g., Naithani et al., 2018) has also enabled reliable segmentation and acoustic analyses of cry sounds captured in various settings, broadening the possibilities of neonatal cry analysis.
Monitoring early cognitive development is an important, yet challenging part of pediatric practice, which increases the need for applicable methods that can be administered at an earlier age than more elaborate psychological assessments. Research on the potential of using infant cry to predict cognitive development is still scarce and often conducted with small samples. In the present study with typically developing infants, the results suggested only a small association between neonatal cry F0 and later visual attention disengagement, which was not evident in the other cohort. Whether variations in infant cry characteristics are sufficiently distinct in typically developing populations to be used as a supporting tool for monitoring cognitive skills requires further research. Nevertheless, detecting normative variation in infant cognition is vitally important especially for identifying children with milder forms of atypical cognitive development.

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethical Committee of Tampere University Hospital and the Institutional Review Board of Stellenbosch University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
AK, GN, OT, TV, DN, JL, and MP designed the study. AK, EK, and AL collected the data. AK, GN, EK, MA, JL, and MP analyzed the data. AK drafted the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The preparation of the manuscript was supported by grants from the Academy of Finland (#258708 to TV and #307657 to MP) and the National Research Foundation of South Africa (to DN).