Moderated Online Data-Collection for Developmental Research: Methods and Replications

Chuey, Aaron; Asaba, Mika; Bridgers, Sophie; Carrillo, Brandon; Dietz, Griffin; Garcia, Teresa; Leonard, Julia A.; Liu, Shari; Merrick, Megan; Radwan, Samaher; Stegall, Jessa; Velez, Natalia; Woo, Brandon; Wu, Yang; Zhou, Xi J.; Frank, Michael C.; Gweon, Hyowon

doi:10.3389/fpsyg.2021.734398

METHODS article

Front. Psychol., 03 November 2021

Sec. Human Developmental Psychology

Volume 12 - 2021 | https://doi.org/10.3389/fpsyg.2021.734398

Aaron Chuey ¹*
Mika Asaba ¹
Sophie Bridgers ²
Brandon Carrillo ¹
Griffin Dietz ¹
Teresa Garcia ¹
Julia A. Leonard ³
Shari Liu ²
Megan Merrick ⁴
Samaher Radwan ¹
Jessa Stegall ¹
Natalia Velez ⁵
Brandon Woo ⁵
Yang Wu ¹
Xi J. Zhou ¹
Michael C. Frank ¹
Hyowon Gweon ¹*

1. Department of Psychology, Stanford University, Palo Alto, CA, United States
2. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, United States
3. Department of Psychology, Yale University, New Haven, CT, United States
4. Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, United States
5. Department of Psychology, Harvard University, Cambridge, MA, United States

Abstract

Online data collection methods are making developmental research easier and more accessible for researchers and participants alike. While their popularity among developmental scientists has soared during the COVID-19 pandemic, their potential goes beyond just a means for safe, socially distanced data collection. In particular, advances in video conferencing software have enabled researchers to engage in face-to-face interactions with participants from nearly any location at any time. Due to the novelty of these methods, however, many researchers remain uncertain about the differences in available approaches as well as the validity of online methods more broadly. In this article, we aim to address both issues with a focus on moderated (synchronous) data collected using video-conferencing software (e.g., Zoom). First, we review existing approaches for designing and executing moderated online studies with young children. We also present concrete examples of studies that implemented choice and verbal measures (Studies 1 and 2) and looking time (Studies 3 and 4) across both in-person and online moderated data collection methods. Direct comparison of the two methods within each study as well as a meta-analysis of all studies suggest that the results from the two methods are comparable, providing empirical support for the validity of moderated online data collection. Finally, we discuss current limitations of online data collection and possible solutions, as well as its potential to increase the accessibility, diversity, and replicability of developmental science.

Introduction

Over the past decade, online data collection has transformed the field of psychological science. Commercial crowdsourcing platforms such as Amazon Mechanical Turk have allowed participants to perform experimental tasks remotely from their own computers, making it easier, faster, and cheaper for researchers to collect large samples. The advantages of online methods led to a rapid increase in their popularity; for example, the percentage of online studies published in three prominent social psychology journals rose from around 3% in 2005 to around 50% in 2015 (Anderson et al., 2019).

Although online methods have been mostly constrained to studies with adults, some recent efforts have pioneered ways to conduct developmental research online (e.g., Lookit, Scott and Schulz, 2017; TheChildLab.com, Sheskin and Keil, 2018; Panda, Rhodes et al., 2020). As the COVID-19 pandemic spurred many developmental researchers to consider safer alternatives to in-person interactions, these methods have quickly gained traction as an innovative way to enable large-scale data collection from children and maximize access and impact in developmental science (Sheskin et al., 2020). Due to the novelty of these methods, however, there is little shared information available about recommended practices for designing, implementing, and executing online experiments with children. Furthermore, researchers may feel hesitant to replicate or build on prior work using online methods because of uncertainties about how developmental data collected online would compare to data collected in person.

The current paper aims to serve as a guide for developmental researchers seeking information about online data collection, with a focus on using video-chat software for moderated (synchronous) data collection. We begin by explaining how moderated methods differ from unmoderated (asynchronous) methods, including their relative advantages and disadvantages. Next, we describe recommended practices and approaches for designing online developmental studies conducted via moderated sessions. In particular, we provide guidelines for implementing two broad classes of measures: forced choice for young children and looking time for infants. To examine the validity of moderated online methods, we present four sets of studies conducted both in person and online that utilize these measures as well as a meta-analysis that compares results from both data collection methods across the four sets of studies. Finally, we discuss the limitations and potential of moderated online data collection as a viable research method that will continue to shape developmental psychology.

Moderated Online Studies: What It Is and Recommended Practices

Online data collection methods can be categorized as moderated (synchronous) or unmoderated (asynchronous). Unlike unmoderated data collection, which functions like Amazon Mechanical Turk, moderated data collection functions more like in-person testing: participants engage in real-time interactions with researchers on a web-enabled device using video-conferencing software, such as Zoom, Adobe Connect, or Skype.

An advantage of unmoderated data collection is that it is less labor-intensive than moderated data collection. Participants complete a preprogrammed module without directly interacting with researchers; once the study is programmed, there is little effort involved in the actual data collection process on the researchers’ end. Some pioneering efforts have led to innovative platforms for implementing these modules (Lookit, see Scott and Schulz, 2017; see also Panda, Rhodes et al., 2020), and adaptations of three well-established studies on Lookit have found comparable results to their original in-person implementations (Scott et al., 2017). Its advantages, however, come with trade-offs: due to the lack of researcher supervision, unmoderated data collection is limited to behavioral paradigms where real-time monitoring is not necessary. Thus, this method may not be as well suited for studies where live social interactions and joint-attention are central to the hypothesis and experimental design. Furthermore, adapting an in-person study to an unmoderated module usually involves significant alterations in study procedure and format (Scott et al., 2017), creating additional challenges to directly replicating existing findings in some circumstances.

Moderated data collection, by contrast, is comparable to in-person methods in terms of cost. It requires recruiting and scheduling participants for an appointment, and at least one researcher must be available to host the session and guide participants throughout the study procedure. Yet, this allows moderated sessions to retain the interactive nature of in-person studies that is often critical for developmental research. Experimenters can have face-to-face interactions with parents and children to provide instructions, present stimuli, actively guide children’s attention, ask questions, and record a number of behavioral measures. Although certain paradigms or measures are difficult to implement even with moderated methods (e.g., playing with a physical toy), many existing in-person studies can be translated into an online version with relatively minor changes in procedures.

Early efforts to apply moderated online data-collection to studies with children have produced promising results, albeit with some caveats. For instance, Sheskin and Keil (2018) collected verbal responses from 5- to 12-year-old children in the United States on several basic tasks via video-conferencing software (Adobe Connect). While children showed ceiling-level performance on questions that assessed their understanding of basic physical principles (e.g., gravity) and fair distribution of resources, their performance on false belief scenarios (i.e., the Sally-Anne task adapted from Baron-Cohen et al., 1985) was significantly delayed compared to results from prior work conducted in person. It is possible that younger children found it more difficult to keep track of multiple characters and locations in a completely virtual interaction; the task also relied primarily on verbal prompts without additional support to guide children’s attention (e.g., pointing). However, because this study did not directly compare the results from online and in-person versions of the same task, it is difficult to draw strong conclusions about the cause of the discrepancies or the validity of moderated methods more generally.

More recently, Smith-Flores et al. (2021) reported replications of prior looking time studies with infants (violation of expectation and preferential looking) via a moderated online format. The findings from data collected online were generally comparable to existing results; for instance, infants looked longer at events where an object violated the principle of gravity than events that did not (e.g., Spelke et al., 1992) and were more likely to learn about object properties following such surprising events (Stahl and Feigenson, 2015)¹. Contrary to classic work on infants’ understanding of physics, however, infants in this study did not show a sensitivity to violations of object solidity. Although infants in these experiments viewed recorded video clips of events very similar to those used in prior in-person studies, the authors note the experience of viewing such videos on screens is quite different from viewing the event in person, and that differences in the visual properties of test stimuli (e.g., limited aspect ratio of participants’ screens) could have contributed to the discrepancy in results. These concerns might apply to any study using online data collection (both moderated and unmoderated) that involves viewing visual stimuli on a screen as opposed to live events.

In sum, existing data suggest that moderated online studies are indeed feasible, but they also highlight two challenges. First, due to the relative novelty of moderated methods, researchers may be unsure about how to implement a study online and what can be done to minimize potential discrepancies between in-person and online versions. Second, the field still lacks a true apples-to-apples comparison between studies conducted online and in-person using stimuli and procedures matched as closely as possible. In particular, given the variety of dependent measures and procedures used in developmental research, it is important to have a number of such comparisons that span across different experimental designs and methods.

The following sections address these challenges by reviewing current approaches to moderated online study design and providing empirical data that replicate in-person findings with moderated online methods. We begin by outlining key considerations for implementing moderated studies, followed by presentation methods and design considerations that promote participant attention and engagement. Then, we provide concrete examples of implementing dependent measures that are frequently used in developmental research: choice and verbal measures (more suitable for children aged 2 and up) and looking time measures (suitable for infants). We also compare results from experiments that were conducted in-person and adapted for online data collection using these suggestions.

Moderated Online Studies: Implementation and Recommended Practices

Moderated online studies have been implemented using a variety of video-conferencing software, including Zoom, Adobe Connect, and Skype, among others. Each video conferencing software has benefits and drawbacks that make it better suited for certain research endeavors and styles. There are several particularly important dimensions to consider, including accessibility, functionality, and robustness to technical issues (see Table 1).

TABLE 1

Accessibility	Software should ideally be easy to obtain and use, especially for participants. In addition to monetary concerns or internet access (Lourenco and Tasimi, 2020), the need for technical skills, time (e.g., for downloading and installing new software), or specific hardware (e.g., Facetime requires Apple OS) can create barriers to participation. Intuitive software also makes online research easier for both experimenters and participants by reducing time spent setting up and troubleshooting sessions. Using software that many people already have and know how to use can alleviate this issue. Note, however, that accessibility is always relative to a particular population at a particular time; software that is suitable for one population may not necessarily be so for others. For example, Zoom became a more accessible option for conducting developmental research in the United States following the COVID-19 pandemic as more families downloaded and used Zoom in their day-to-day lives for work and remote schooling. As trends in software usage change over time for a given population, researchers should continue to adapt their methodologies accordingly.
Functionality	A software’s user interface, customizability, and security features determine how studies are conducted and the extent to which researchers can customize participants’ online experience. Importantly, security standards regarding recording and storage of online sessions vary across institutions and countries; researchers should keep these in mind when assessing the level of security a given software provides. Additionally, while basic video- and screen-sharing as well as text-chat functionalities are common in most software, the details vary in a number of ways, including how users customize what they can view on screen and how recording is implemented (e.g., local vs. cloud storage). More broadly, intuitive design and real-time flexibility often trade off with precise structure and customization options. Some software (e.g., Adobe Connect) allows experimenters to predetermine the layout of participants’ screens before sessions, and others (e.g., Zoom) automatically generate participants’ layouts and allow participants to modify their layout in real time (following instructions from experimenters). While the former type is ideal for experiments that require precise control over what participants view on screen, the latter type of software is more suitable for sessions involving rapid transitions between multiple experiments with different visual layouts.
Robustness	Recurring lag, audio or video problems, and even login errors can slow down or derail an online session. Although technical issues can also occur in person, issues can be more difficult to resolve in remote interactions where experimenters have limited means to understand participants’ issues. Therefore, it is important to test the frequency and duration of technical issues on both experimenters’ and participants’ ends before committing to a particular video-conferencing software. Depending on the software, screen-sharing or streaming large video or audio files can contribute to unwanted lag or delays. Further, their severity can vary depending on connection speed or devices used by both experimenters and participants. For experiments that rely on precise timing of presented stimuli, researchers might consider presentation methods that do not rely on screen-sharing (e.g., hosting video stimuli on servers or other platforms where participants can access directly, such as online video-hosting or slide-presentation services). If there are consistent participant-end issues that impact the fidelity of a study, researchers can also set explicit criteria for participation (e.g., must use a laptop or cannot use a phone signal-based internet connection).

Factors to consider when choosing software for moderated online data collection.

One common way to implement moderated online studies with young children utilizes locally installed slideshow applications on experimenters’ computers (e.g., Microsoft PowerPoint, Keynote). These applications allow researchers to present a wide variety of stimuli, including images, animations, videos, audio, and written language. Implementing studies using these applications creates a linear structure that naturally segments study procedures into manageable components, making it easy for researchers to manipulate the order of presentation and access notes. Alternatively, studies involving videos, such as many infant studies, have been implemented on video-sharing websites such as YouTube, or with slides hosted on cloud services.

One key challenge in designing developmental experiments is ensuring that children stay engaged and attentive throughout the task. On the one hand, an advantage of online data collection is that children participate from their familiar home environments, which could improve their comfort and engagement. On the other hand, home environments can be more distracting than lab settings, and researchers have little control over them. For studies that require relatively well-controlled environments, researchers could consider sending parents instructions prior to the testing session to help them create ideal testing environments at home. For example, parents could be instructed to keep siblings out of the room during the session. Here we discuss a few additional strategies to maximize children’s engagement during online data collection and to direct their attention to specific stimuli on screen.

Elicit Regular Responses From Participants

Because online studies can suffer from technical problems as well as distractions in a child’s home environment, researchers should design them to be robust to frequent interruptions. Eliciting regular feedback from children, either casually or by implementing comprehension questions throughout the task, is one useful strategy. While this is also used in person, frequent questions are particularly useful for identifying long periods of lag or technical issues that can otherwise go unnoticed online. Playing a short video at the start of a session and asking participants to report any lag or audio problems is another quick and easy way to assess participant-end technical issues that might not be readily apparent from an experimenter’s perspective. Finally, it is often useful to make parts of a study easy to repeat in case they are compromised by connectivity issues, audio/video problems, or other unexpected difficulties.

Use Social Cues

In-person studies often utilize social cues from the experimenter (e.g., gaze, pointing) to direct children’s attention. While these are more difficult to use online, some video conferencing software (e.g., Zoom) allows researchers to flexibly adjust the size and location of experimenter’s video feed on participants’ screen, such that the experimenter’s gaze and pointing can be “directed” to specific parts of the stimuli (see Figure 1). These features can be useful for providing the experience of a “shared reality” with the experimenter and can be particularly effective in studies that require joint attention. Additionally, audio and visual attention-getters (e.g., sounds, animations, or markers like bounded boxes that highlight a particular event, character, or object on the screen) can be used instead of experimenters’ gaze or pointing gestures to focus children’s attention on specific stimuli.

FIGURE 1

Keep It Short and Simple

Because interacting with others online can tax children’s (and adults’) cognitive resources more than in-person interactions (e.g., Bailenson, 2021), it is important to keep online studies as short and simple as possible. For studies that require relatively longer sessions, presenting them as a series of multiple, distinct activities can help maintain children’s attention and enthusiasm throughout. In cases where concerns about cross-study contamination are minimal, researchers can also run more than one experiment per session. Of course, different studies have different attentional demands and require varying levels of continuous attention. Thus, researchers should consider what counts as a consequential lapse of attention and devise their exclusion criteria accordingly during the pre-registration process.

As we emphasized earlier, one key advantage of moderated methods is the relative ease of adapting in-person studies to an online format without significant changes to the procedure. This means that many of the strategies used to promote attention and engagement in person also apply to online studies. For instance, color-coding and animating stimuli, using engaging stories and characters, and talking in simple, plain language can also help children stay engaged. Overall, relatively minor changes to the way that stimuli are presented can have a large impact on children’s attention and engagement throughout an online session.

In what follows, we provide more specific guidelines for implementing two kinds of dependent measures (choice and looking time) with concrete example studies for each type of measure. Importantly, these studies address different theoretical questions and have not been fully published at the time of writing this article; the key reason for reporting these datasets is to examine the validity of moderated online data collection. As such, we describe the hypotheses and methods of these studies only to the degree necessary to contextualize our analyses: comparing the main effect of interest from data collected in-person versus online. In addition to a direct comparison of their results, we present a meta-analysis of all four sets of experiments that provides further evidence that moderated online and in-person testing yielded similar results across the current studies.

Examples and Replications I: Choice and Verbal Measures in Moderated Online Studies

To elicit explicit choices from children who are old enough to understand verbal instructions, in-person studies often use pointing or reaching as dependent measures. These behaviors, however, can be difficult to assess in online studies; webcam placement can vary across participants, and participants may move outside the field of view during the critical response period. One useful approach for implementing choice tasks for children in this age range is to replace pointing or reaching with verbal responses, and associate each choice with overt visual cues such as color. For example, a binary choice question can be presented as a choice between one character wearing orange and another character wearing purple (color assignment counterbalanced), with children only needing to choose “orange or purple” (see Figure 2). In these choice paradigms, it is important to keep the on-screen location of key choices or stimuli as consistent as possible throughout the study such that transitioning between slides is less disruptive and easier to follow.
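The counterbalancing described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name `counterbalanced_conditions`, the record fields, and the decision to also counterbalance on-screen side (the text only mentions color assignment) are all assumptions made for the example.

```python
from itertools import cycle, product

COLORS = ("orange", "purple")   # response labels children say aloud
SIDES = ("left", "right")       # hypothetical: on-screen side of the target agent

def counterbalanced_conditions(n_participants):
    """Rotate participants through every color x side cell.

    Cycling through the full set of cells keeps the design as balanced
    as the sample size allows (perfectly balanced when n is a multiple
    of the number of cells).
    """
    cells = cycle(product(COLORS, SIDES))
    return [
        {"participant": i + 1, "target_color": color, "target_side": side}
        for i, (color, side) in zip(range(n_participants), cells)
    ]

schedule = counterbalanced_conditions(20)
```

With 20 participants and four cells, each color and side combination is assigned to five children. In practice the rotation order would likely be shuffled or tied to sign-up order so that condition is not confounded with testing date.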

FIGURE 2

In addition to forced-choice measures, experimenters can elicit free-form verbal responses or actions as dependent measures, or ask the parent to type out the child’s responses via text chat. Researchers can also implement other creative dependent measures, such as prompting children to make a drawing and share it with the experimenter via video. As long as a behavior can be consistently prompted and recorded, it can likely be used as a measure in a moderated online study. Here, we present two sets of studies conducted online and in person that measured children’s explicit choice between two agents. One study examined 4- to 5-year-olds (Study 1) and another examined 6- to 9-year-olds (Study 2).

Study 1

Research Question

Can 4- to 5-year-old children use information about task difficulty to infer relative competence when agents’ efforts are matched? To investigate this question, children viewed two agents who used 10 wooden blocks to build different structures; one placed the blocks on top of each other to create a vertical tower while the other placed them next to each other to form a horizontal line. Children were then asked which agent was better at building blocks. Prior work has established that children understand that the vertical structure is “harder” (i.e., takes longer) to build compared to the horizontal structure (Gweon et al., 2017). Thus, the hypothesis was that even though both agents moved and placed 10 blocks, if they took equally long to finish building, children would judge the agent who built the vertical (and therefore harder) tower as more competent than the agent who built the horizontal line.

Participants

In-person

Twenty 4- and 5-year-old children participated in-person at the Boston Children’s Museum (10 females, mean: 62.25 months, range: 49–71); 10 additional children were tested but excluded due to failing the practice question (n = 3) or inclusion criteria question (n = 7).

Online

Twenty 4- and 5-year-old children participated online (nine females, mean: 59.23 months, range: 49–71). Participants were recruited via local and online advertising. Seven additional children were tested but excluded due to failing the inclusion criteria question (n = 3), technical issues (n = 1), declining video (n = 1), not wanting to answer the inclusion question (n = 1) or dropping out (n = 1).

Methods

Both in-person and online

An experimenter first asked children “Who is better at writing letters — you or your parents?” and then “Who is better at playing on the playground — you or your parents?” If children chose themselves for writing or their parents for playing, they were corrected. In the test video, children watched two agents build block structures. Below one agent was a picture of a 10-block vertical tower and below the other agent was a picture of a 10-block horizontal tower. We chose these structures based on findings from Gweon et al. (2017) showing that 4-year olds readily judge the 10-block vertical structure as harder to build than the 10-block horizontal structure based on static pictures of the initial states (i.e., scattered blocks) and final states (finished towers), without seeing the building process. The agents first said they wanted to build a pictured tower. One agent pointed to the picture below her and said, “I’m going to make this,” then the other agent repeated the same action. Next, the agents began to build at the same time. A screen blocked visual access to the agents’ building actions. Both agents indicated they were finished building at the same time. The screen then lifted, revealing what each agent made. Children were then asked the test question followed by an additional question used as a part of the inclusion criteria. Those who answered the inclusion question inaccurately were excluded from analyses.

In-person

Before the test trial, children watched a practice video where two agents drew shapes, finishing at different times. While the agents drew, a screen blocked them. One of the agents indicated she was done drawing, followed by the other agent a few seconds later. Then the screen lifted to reveal what they made. Children were asked which agent finished first and whether the agents had made the same or different pictures. If they answered incorrectly, they were excluded from analysis. Afterward, children viewed the test trial, and were subsequently asked the critical test question: “Who is better at building blocks?” and were encouraged to point, followed by the inclusion question “Which tower is better?”

Online

The online study was the same as the in-person study except for the following modifications. To make the study amenable to online testing, children’s attention toward desired locations in the presentation was cued using animation and sound. Instead of asking children to point to which agent was better at the end, they were instructed to make their choice based on the color of squares surrounding each agent. To reduce study time, the practice trial was also removed (more than 93% of children passed the practice trial in in-person versions of three similar prior studies). Finally, we changed the inclusion question to “Which tower is harder to make?”

Results

In-person

Children’s performance on the test question was significantly above chance (90%, CI = [80%, 100%], p < 0.001). This result held even after including the seven children who failed to answer the inclusion question accurately (74%, CI = [60%, 93%], p = 0.02).

Online

Consistent with in-person findings, children’s performance on the test question was significantly above chance (85%, CI = [70%, 100%], p = 0.003). See Figure 4 (1) for a summary of results.
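The comparisons against chance reported above can be illustrated with an exact binomial test. This section does not state which test produced the reported p-values, so the following is a minimal sketch under assumptions: a two-sided exact binomial test against a chance level of 50%, with counts inferred from the reported percentages and sample sizes (18 of 20 in person, 17 of 20 online, and 20 of 27 when the seven excluded in-person children are added back). The function name `binomial_two_sided_p` is hypothetical.

```python
from math import comb

def binomial_two_sided_p(successes, n, chance=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the chance-level null hypothesis.
    """
    pmf = [comb(n, k) * chance**k * (1 - chance) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small tolerance so outcomes tied with the observed one are included.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

# Counts below are inferred from the reported percentages (assumption).
p_in_person = binomial_two_sided_p(18, 20)      # 90% of 20
p_online = binomial_two_sided_p(17, 20)         # 85% of 20
p_in_person_all = binomial_two_sided_p(20, 27)  # 74% of 27
```

Under these assumptions the three p-values come out at roughly 0.0004, 0.003, and 0.02, which agrees with the values reported in the text after rounding.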

FIGURE 3

FIGURE 4