The National Children's Study Archive Model: A 3-Tier Framework for Dissemination of Data and Specimens for General Use and Secondary Analysis

The National Children's Study (NCS) Archive was created as a repository of samples, data, and information from the NCS Vanguard Study—a longitudinal pregnancy and birth cohort evaluating approaches to study influence of environmental exposures on child health and development—to provide qualified researchers with access to NCS materials for use in secondary research. The National Children's Study Archive (NCSA) model is a 3-tiered access model designed to make the wealth of information and materials gathered during the NCS Vanguard Study available at a user appropriate level. The NCSA model was developed as a 3-tier framework, for users of varying access levels, providing intuitive data exploration and visualization tools, an end-to-end data and sample request management system, and a restricted portal for participant-level data access with a team of experts available to assist users. This platform provides a model to accelerate transformation of information and materials from existing studies into new scientific discoveries. Trial Registration: ClinicalTrials.gov Identifier: NCT00852904 (first posted February 27, 2009).


INTRODUCTION AND BACKGROUND
The National Children's Study (NCS) Vanguard Study was a pilot for a nationally representative cohort of 100,000 U.S. children to be followed beginning before birth to adulthood (along with their parents) in a national longitudinal study of environmental (including physical, chemical, biological, and psychosocial) influences on child health and development, to be conducted by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) in collaboration with the U.S. Centers for Disease Control and Prevention and the U.S. Environmental Protection Agency (1).
The Vanguard Study tested different recruitment strategies between 2009 and 2014. The Initial Vanguard Study (IVS) opened to accrual in 2009 in seven locations across the U.S. The IVS focused on recruitment through household enumeration, and used comprehensive data and sample collection procedures that were designed for the Main NCS (2). In November 2010, the IVS was succeeded by the Alternate Recruitment Substudy (ARS). The ARS was designed to evaluate three recruitment strategies: Direct Outreach, Enhanced Household Enumeration, and Provider-Based Recruitment (PBR). Each strategy was conducted in 10 study locations (3)(4)(5). Recruitment under the three ARS strategies ended in February 2012. The ARS was followed in June 2012 by another substudy, the Provider-Based Sampling (PBS) Substudy, which was conducted in three study locations (6). Overall, the PBS strategy proved most effective and cost efficient (7). In December of 2014, the Vanguard Study was discontinued by the National Institutes of Health (NIH) Director following the recommendations of a National Children's Study Working Group (8,9).
To recruit Vanguard Study participants, over 74,000 women were screened to determine eligibility (Figure 1). At the close of the NCS Vanguard Study, more than 5,400 birth families had been enrolled and followed (Figure 1) in 43 counties within 31 states across the country (Figure 2). Over the course of the study, 5,608 children and their primary caregivers participated in at least one protocol-specified study visit, from pre-conception through pregnancy and birth up to 42 months of age (Figure 1).
Data on a wide variety of health-related topics were collected by interview, self-administered questionnaires, and data collector-generated case report forms. In total, 161 data collection forms, excluding form versioning (i.e., v1, v2, etc.), were utilized by the study for the IVS and ARS phases. These forms and the data they collected included pre-, peri-, and post-partum interviews, biological sample collection, environmental sample collection, physical examination, and neuropsychosocial and cognitive assessments. During active data collection, study centers submitted up to 700 datasets biweekly, totaling 4.1 terabytes of unique data by study end.
Nearly 19,000 unique biological and 4,000 unique environmental samples were collected from participants (women, fathers, and children) and their households across all study visit stages. From these a sample repository of more than 250,000 items was derived. Biological material collected includes blood, urine, saliva, breast milk, mucosal swabs, meconium, hair, nails, and placental tissue. Various air, dust, and water collections comprise the environmental samples. Serial sets of biospecimens were collected from women over several timepoints in pregnancy. Various combined sets of biological samples from mother/child dyads and some mother/child/father triads are represented.
The Vanguard Study, in addition to collecting data and samples, produced thousands of documents to support and facilitate study implementation. To effectively coordinate 40 locations across the country, each site worked from centrally created and maintained procedures and guidance documents. These documents were a combined effort of different teams of subject matter experts (SMEs) and defined the functions and implementation of the NCS. Each study task, such as data collection, had corresponding standard operating procedures (SOP), in-home study-visit and site workflows, training documentation, equipment maintenance guides, and adverse outcome documentation requirements. This documentation supporting the implementation of the NCS captured the many goals and challenges of executing any large, multicenter study.
With the Vanguard study's closure in 2014, the wealth of potential future research use represented by the existing data, samples, documentation, and knowledge of the NCS was recognized. The potential for data sharing was specifically noted in a 12/11/2014 NIH Director's statement indicating that "Data from the Vanguard Study should be archived and available upon request by investigators for secondary analyses" (10). Studies have shown that collaboration among researchers and the sharing of data not only promotes more robust research but also leads to faster translation of findings into clinical practice (11). The NIH therefore deemed that the NCS should continue as a scientific resource and sought to archive and make available the data and samples as quickly and completely as possible in accordance with the February 26, 2003 Final NIH Statement on Sharing Research Data (12).
The National Children's Study Archive (NCSA) core mission was to curate and maintain the information, data, and sample materials generated by the NCS Vanguard Study, and to make those products available for continued de novo analysis (9). The NCSA supported this mission from November 2015 to April 2020, after which the NCS data, following removal of HIPAA identifiers (13), is made available through the NICHD Data and Specimen Hub (DASH) (13) for continued research use beginning in early or mid-2020.
To support the NCSA mission, the NCS team developed the NCSA model to share the wide array of study protocols, procedures, instruments, and other products from the NCS, and to make available for scientific research the data and samples collected from women, fathers, and children for the NCS Vanguard Study. The objective of this report is to:
1. Describe the 3-tier NCSA model the NCS Archive developed to promote user access at a user-appropriate level.
2. Describe the user-centric tools and resources developed for the NCSA model to facilitate the understanding of complex longitudinal study design and the resulting products.
3. Describe the evaluation framework the NCS Archive used to support secondary analysis within the NCSA model. Some of the criteria evaluated included data specifications, data documentation and discovery tools, remote analysis functionality, and privacy protection measures.
Because of the complex and unique nature of the NCS Vanguard Study's design and conduct, and to meet the need for further data curation, the NCS Archive was created as a study-specific resource rather than a component of an existing data repository. As a separate resource, the NCS Archive could support carry-over research from the NCS Vanguard Study, release study data and information on a rolling basis, provide immediate support for secondary research projects, and share experience upon which other national cohort studies like the Environmental influences on Child Health Outcomes (ECHO) (16) and All of Us studies (17) could build.

MATERIALS AND METHODS
While NCSA planning, development, and implementation predated establishment and publication of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship (18), these principles were anticipated in the functional requirements that were established for the system. Data security and privacy requirements were developed to ensure establishment of adequate controls to protect the privacy of the participants and ensure the security of the data as required (19).
System design requirements were established to achieve specific desirable characteristics, including:
• Flexible: quickly and efficiently accommodate evolving requirements
• Comprehensive: support an extensive range of data analysis functions
• Integrated: provide interoperability across data sources
• Reliable: assure function under all conditions
• Secure: employ stringent security controls to prevent inappropriate information disclosure and possible data loss, while ensuring that the right information is provided to the right people
• Private: ensure data privacy and integrity despite technological evolution
• Maximum access to data by researchers while maintaining FISMA standards
• User-focused and user-friendly: tailored to a variety of data users
• Reduction of long-term risks and lifecycle costs
Definition of functional requirements incorporated input, review, and comment from data users as part of the process. Key functional requirements of the NCSA are summarized below for various elements of the Archive.
Data curation, preparation, and documentation for easy reuse:
• Create analytic data files and all documentation including user guides, data dictionary, and code books, to be made available for the general research community and in accordance with Section 508 of the Rehabilitation Act of 1973 (20)
• Remove sensitive variables to comply with the NIH data privacy protection requirements
• Perform other data manipulation techniques to reduce disclosure risks
• Develop and implement specifications to minimize participant disclosure risk
• Produce data documentation needed by data users, including analytic limitations imposed by removal of personal identifiers
Support tools development:
• Create new data discovery and cohort discovery tools based on existing applications such as the NCS Workbench and NCS Knowledgebase
• Provide links to questionnaires and data collection instruments and materials
• Develop and construct data item level crosswalks to facilitate data use
• Incorporate search capabilities into these data user tools
• Provide web-based analysis tools for end data users
• Produce data tables and charts that summarize participant characteristics or key findings from analytic files
Archive user support services:
• Support requests for data access (microdata, specimens)
• Produce other custom data files
• Development and maintenance of common linking identifiers to integrate NCS data with external sources while maintaining participant confidentiality
• Support linkages to laboratory data
• Disclosure review of researcher produced analysis outputs and tables
• Support for publication submission
Requirements for multiple levels of data access:
• Provide a limited set of carefully de-identified data files containing key variables which can be downloaded by users to be analyzed on their own equipment, as an inexpensive method to access key variables from the NCS.
• Provide a secure enclave wherein users can access more sensitive, partially de-identified data and conduct analyses within the enclave, with no ability to download any microdata (individual-level data).

RESULTS
The NCSA model shares information using a 3-tier approach built on the "principle of least privilege" and augmented with user-driven content tools (Figure 3). Each tier is intended to allow users to identify the Study information and the available data at their level of need. Need-based access has three goals: first, to streamline the process of data absorption; second, to allow the user to drive meaningful discovery; and third, to allow targeted use of limited analytic and support resources.
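The least-privilege idea behind the three tiers can be illustrated with a short sketch. This is not the Archive's implementation; the resource names and the `can_access` helper are hypothetical, chosen only to mirror the tiers described in this section. Each resource declares the minimum tier it requires, and anything not listed is denied by default.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Access tiers of the NCSA model, ordered from least to most privileged."""
    PUBLIC = 1
    REGISTERED = 2
    RESTRICTED = 3


# Hypothetical resource names (illustrative only) mapped to the minimum tier needed.
REQUIRED_TIER = {
    "study_description": Tier.PUBLIC,
    "data_briefs": Tier.PUBLIC,
    "variable_locator": Tier.REGISTERED,
    "proposal_system": Tier.REGISTERED,
    "participant_level_data": Tier.RESTRICTED,
}


def can_access(user_tier: Tier, resource: str) -> bool:
    """Return True only if the resource is known and the user's tier is high enough.

    Unknown resources are denied by default, which is the least-privilege stance.
    """
    required = REQUIRED_TIER.get(resource)
    return required is not None and user_tier >= required


if __name__ == "__main__":
    for tier in Tier:
        allowed = [name for name in REQUIRED_TIER if can_access(tier, name)]
        print(tier.name, "->", allowed)
```

Because the tiers are ordered, a higher tier inherits everything available to the tiers below it, which matches the statement that registered users retain access to all public material.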
Key to the NCSA model is the leveraging of user self-service opportunities. Through either documentation or system tools, information access is democratized. Users drive access and understanding of a study on their own, thereby reducing the resource requirements of Archive support staff and SMEs.

Tier 1: Public Access
The Public Access tier is intended for the general population to allow persons interested in the research to learn about the content without unnecessary technical detail. For the NCS, public access meant access to the public portion of the NCS Archive website and a NICHD hosted web page. Users had access to a study description (21), a history of the study, descriptions of the data and samples available, and information about the ongoing archive activities. Users were provided a mechanism to gain registered access and a mechanism to contact the NCS Archive with questions or inquiries.
Public access for the NCS additionally meant highlighting a selection of scientific works for both informational and aspirational purposes. Publications based on NCS data presented in journals and at conferences were made available together with a series of data briefs, which are concise summaries of selected data from the NCS Archive, with comparisons to other widely recognized sources of similar data (Table 1).
This allowed sharing of previous work completed using NCS data and highlighted opportunities for new or continued research. As a federally funded program, it was important for NCS to make tangible results available to the general population. As a value-added component, the NCS Archive created and freely provided teaching databases comprised of child-level and health topic-related modules (22). The teaching databases were derived from real data collected during the NCS Vanguard Study. They were created for academic use only and not intended to be suitable for use for publication. The teaching databases offer a ready-made database system with examples for various statistical models, including repeated measures analyses, to supplement classroom instruction and for use in student projects.

Tier 2: Registered Access
The registered access tier of the NCSA model provides the next level of data access privilege to users interested in a study. Registered access makes available resources to promote and cultivate new ideas for further exploration of existing data or samples, as well as providing detailed documentation of the study and the study's experiences. In exchange for the access to new information, a user must register and attest to responsible stewardship of the information provided.
For the NCS Archive, registered access to the site required completion of a demographic registration form (Table 2), consent to adhere to the NCS Vanguard data user agreement (Table 3), and registering a login service account. Registered users gained access to all publicly available Study information and data files, the site's protocol and data self-service tools, and the proposal management system for requests to access restricted study data and other stored materials in Tier 3. The NCS Archive developed several self-service tools to explore the study operational information and sample size data for participants, biospecimens, and environmental samples (Table 4), to facilitate meaningful evaluation and consumption of the large amount of available NCS data. These tools transfer the ability to identify and confirm potential research opportunities from the Archive support staff to the interested user, preemptively answering potential inquiries.
One challenge to understanding the NCS Vanguard is a study design that involves multiple sequential study protocols: the 2009-2010 IVS Protocol and the 2011-2014 ARS Protocol. Each protocol included participant visits across multiple life stages (pre-conception, prenatal, perinatal, postnatal). The Protocol Browser tool was developed to allow users to assess easily the content and timing of study evaluations undertaken by NCS participants in either protocol. The tool is a visual representation of study events experienced by participants within each protocol, displayed by life stage, study visit, and data collection instruments administered (Figure 4). The tool can display one or both protocols simultaneously to allow comparisons. With numerous Study visits for each protocol across multiple life stages, IVS with 14 visits and ARS with 16, the NCS ended up fielding 161 different instruments (IVS 97, ARS 64). To manage the dissemination of these instruments, the Instrument and Dataset Inventories tool (Figure 5) was developed to display a list of available instruments by protocol and subject domain. Instruments are made available in PDF format, unless copy restrictions apply, and interested users can view the in-line variables notation. Additionally, the number of variables and records in each corresponding dataset is identified.
The primary uses of this information are to document, track, and monitor and evaluate the use of the NCS Archive, as well as to notify interested recipients of updates, corrections or other changes to the NCS data. The Federal Privacy Act protects the confidentiality of some NIH records. The NIH and any users that are provided access to the NCS Archive will have access to the information collected by the NIH from the Recipient, as part of the data use agreement or data request forms for the purposes described above. In addition, the Act allows the release of some information without the Recipient's permission; for example, if it is requested by members of Congress or other authorized individuals. The information collection requested is voluntary, but necessary for obtaining access to data and samples in the NCS Archive.
From these instrument datasets and other constructed analytic files, the NCS Archive generated nearly 14,000 variables. To quickly evaluate what variables are available with how many data points or responses for a given area, the Variable Locator tool (Figure 6) was created. This tool provides for free text variable search of all data in the NCS Archive. A user who might enter the term "sleep" into the Variable Locator would be provided a list of all sleep-related variables and questions, together with frequency counts of respondents, valid responses, legitimate skip responses, and records in the dataset. The Variable Locator allows users to filter results further by using Boolean search terms ("AND," "NOT," and "OR"). Researchers can use the Variable Locator tool to identify readily what potential the NCS Archive data may hold for their intended research.
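The kind of Boolean filtering described for the Variable Locator can be sketched as follows. This is an illustration only, not the Archive's code: the `Variable` record, the catalog entries, the response counts, and the `locate` function are all hypothetical, and matching here is a simple case-insensitive substring test on the variable label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str             # hypothetical variable name
    label: str            # question or variable description text
    valid_responses: int  # made-up count, for illustration only


# A tiny invented catalog; the real Archive held nearly 14,000 variables.
CATALOG = [
    Variable("SLEEP_HRS", "Hours of sleep per night (child)", 1200),
    Variable("SLEEP_POS", "Infant sleep position", 950),
    Variable("NAP_FREQ", "Number of daytime naps", 800),
    Variable("MOM_SLEEP", "Maternal sleep quality during pregnancy", 1500),
]


def locate(catalog, all_of=(), any_of=(), none_of=()):
    """Filter variables by label using AND (all_of), OR (any_of), and NOT (none_of)."""
    def has(term, var):
        return term.lower() in var.label.lower()

    hits = []
    for var in catalog:
        if not all(has(t, var) for t in all_of):
            continue  # AND: every term must match
        if any_of and not any(has(t, var) for t in any_of):
            continue  # OR: at least one term must match, when any are given
        if any(has(t, var) for t in none_of):
            continue  # NOT: no excluded term may match
        hits.append(var)
    return hits


if __name__ == "__main__":
    # "sleep" NOT "maternal"
    for var in locate(CATALOG, all_of=["sleep"], none_of=["maternal"]):
        print(var.name, var.valid_responses)
```

A production search tool would more likely use an indexed full-text engine; the sketch only shows how the three operators combine.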
Once researchers have identified operational components of interest (datasets, instruments, or variables), knowing the potential sample size is crucial to evaluate the feasibility of their planned research. The Participant Explorer tool allows users to interrogate metadata-level NCS participant and study participation information. Users of the Participant Explorer tool can determine participant counts by various categories separately or in combination, such as type of participant (woman, child, father), data collection timepoint (prenatal, perinatal, postnatal), demographic variables (race, ethnicity, education level, marital status), etc. Data for fathers, for example, can be categorized by age, education, race/ethnicity, other demographics, and study activity (Figure 7). Each variable in the Participant Explorer allows a researcher to narrow the target population to reflect a researcher's specific needs and interest.
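Counting participants by categories "separately or in combination" amounts to filtering and then cross-tabulating metadata records. The sketch below shows one way to do that with the standard library. The records, field names, and the `count_by` helper are invented for illustration and do not reflect the Archive's data or software.

```python
from collections import Counter

# Invented metadata records; real Participant Explorer data are not reproduced here.
PARTICIPANTS = [
    {"type": "father", "timepoint": "postnatal", "education": "college"},
    {"type": "father", "timepoint": "postnatal", "education": "high school"},
    {"type": "father", "timepoint": "prenatal", "education": "college"},
    {"type": "woman", "timepoint": "prenatal", "education": "college"},
    {"type": "child", "timepoint": "postnatal", "education": None},
]


def count_by(records, *fields, **filters):
    """Count records per combination of `fields`, after applying equality `filters`."""
    selected = (
        r for r in records
        if all(r.get(key) == value for key, value in filters.items())
    )
    return Counter(tuple(r[f] for f in fields) for r in selected)


if __name__ == "__main__":
    # Fathers broken down by education level
    print(count_by(PARTICIPANTS, "education", type="father"))
    # All participants cross-tabulated by type and timepoint
    print(count_by(PARTICIPANTS, "type", "timepoint"))
```

Each added filter narrows the target population, and each added field refines the breakdown, which is the interaction pattern the paragraph above describes.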
In similar fashion, the Sample Explorer allows users to develop sample size estimates based on the inventory of biological and environmental samples collected from women, children, and fathers who participated in the NCS (Table 5). The Archive initially held over 250,000 individual biospecimens and 4,600 individual environmental samples. This extensive sample collection made the Archive a significant resource of materials for laboratory research, while the Sample Explorer provided a method to explore those samples in context and detail. Like the Participant Explorer, the Sample Explorer allows researchers to determine the number and type of primary or derivative biological and environmental sample material available by various demographic categories, type of participant, study visit, etc. Researchers can use the Sample Explorer to determine counts of samples available from the specific type of participant (e.g., infants, pregnant women, etc.) of most interest to them. Once the available sample types, sample demographics, and sample size information are known, researchers can develop specific proposals to request those data and samples for their research.
Within the NCSA model, the user driven content framework and tools enable researchers to target specific populations, topic areas, and types of biological or environmental samples to determine quickly the feasibility of a research proposal. The research tools empower researchers to explore study information and to develop and refine their research ideas independent of SMEs. The tools help to decrease the knowledge gap between Archive staff and general users and allow Archive staff to provide expert knowledge and experience on narrowly targeted questions about potential research.
Once researchers determined the feasibility of using NCS resources, the Archive leveraged a user-driven electronic proposal submission system to submit, process, and review researcher requests for data alone or for combined data and biological and/or environmental samples to pursue their scientific objectives. Electronic forms collect information such as title, requesting investigator, institution, other users for  (23). Additionally, the sample request form collects data about shipping accounts, laboratory contact information, specific sample information, and testing parameters such as target analytes and assay platforms. Along with the completed request form, potential researchers can submit documentation of institutional review board (IRB) determination or review and approval, documentation of funding availability, and letters of institutional support. Other documents such as curriculum vitae (CV) or biosketch are encouraged but not required. When a research request is submitted, the researcher receives notification via email of receipt and processing. The NCSA model framework thus allows user-driven discovery, ideation, and data request submission.
Submitted requests next pass through a stringent multistep review. Requests are first screened by Archive staff for completeness and clarity. Analysts then assess the practical feasibility of request fulfillment. Preliminary data frequencies are run and availability of suitable samples in sufficient quantities and amounts is reviewed, according to information submitted in the request form. In parallel, a review is conducted to assure compliance with regulatory and ethical requirements, evaluate appropriateness for the intended purpose of the specific data and any samples requested, and determine whether the research plan is consistent with the NCS informed consent. Discussions between the researcher, Archive staff, and NICHD staff may be needed to determine how best to fulfill a request. The potential impact on the repository inventory also is taken into consideration when samples potentially suitable for a request are identified, and an assessment is made whether the selected samples (including quantity, volume, concentration, etc.) and proposed assay methods are suitable for the planned research. If the requested data or samples are unavailable, or research plans are deemed unacceptable, immediate notification of investigators is provided, with further processing of the request suspended.
After the initial review, the request and the preliminary feasibility analysis enter a queue for formal review by an NCS Data Access Committee (DAC) composed of NICHD scientists and ad-hoc subject matter experts. Requests can be reviewed by individual DAC members immediately and are not dependent on a formal meeting. Members are given an initial 10-day review window and have access to review all submitted request materials within the electronic request system. Members use standard evaluation criteria for review (Table 6) and can share thoughts amongst themselves in a restricted discussion board.
If the DAC has questions, needs clarification, or needs additional information, the Archive staff work with the researcher to provide the requested information. Once the review is completed, the requester is notified of the review outcome by Archive staff. Accepted requests continue along the data access process, while unapproved research requests are returned to the requester with explanation. The next step in the data access process is completion of a Research Materials Distribution Agreement (RMDA). The RMDA is a contract between the Principal Investigator of the proposed research, their institution, and the NICHD. It ensures that all parties are signatories and aware of the agreed terms for accessing participant-level NCS data, and it confirms their commitment to the policies and procedures put in place to protect the rights of NCS participants and institutions. Once the RMDA is executed, researchers move to the third tier of access.
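The request path described above (submission, staff screening, feasibility and compliance review, DAC review, RMDA, then Tier 3 access) can be modeled as a small state machine. The sketch below is a simplification under stated assumptions: the stage names are hypothetical, the parallel feasibility and regulatory reviews are collapsed into one stage, and clarification loops between the DAC and the researcher are omitted.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names for a data or sample request."""
    SUBMITTED = auto()
    STAFF_SCREEN = auto()   # completeness and clarity check by Archive staff
    FEASIBILITY = auto()    # feasibility plus regulatory/ethical review (collapsed)
    DAC_REVIEW = auto()     # Data Access Committee review
    RMDA = auto()           # Research Materials Distribution Agreement
    TIER3_ACCESS = auto()   # Researcher Portal access granted
    SUSPENDED = auto()      # data/samples unavailable or plan unacceptable
    RETURNED = auto()       # not approved by the DAC, returned with explanation


# Allowed forward transitions; terminal stages have no entry here.
ALLOWED = {
    Stage.SUBMITTED: {Stage.STAFF_SCREEN},
    Stage.STAFF_SCREEN: {Stage.FEASIBILITY},
    Stage.FEASIBILITY: {Stage.DAC_REVIEW, Stage.SUSPENDED},
    Stage.DAC_REVIEW: {Stage.RMDA, Stage.RETURNED},
    Stage.RMDA: {Stage.TIER3_ACCESS},
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move a request to `target`, rejecting any transition that skips a step."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


if __name__ == "__main__":
    stage = Stage.SUBMITTED
    for nxt in (Stage.STAFF_SCREEN, Stage.FEASIBILITY, Stage.DAC_REVIEW,
                Stage.RMDA, Stage.TIER3_ACCESS):
        stage = advance(stage, nxt)
        print(stage.name)
```

Encoding the transitions explicitly makes it impossible for a request to reach Tier 3 access without passing DAC review and an executed RMDA, which is the guarantee the process is meant to provide.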

Tier 3: Restricted Access
The restricted access tier of the NCSA model is the most highly restricted level. It provides access and interaction with participant-level data collected during the NCS Vanguard Study and is the tier at which Archive staff interact with identifiable data. For the NCSA, investigators enter the restricted access tier through a secure virtual environment, known as the Researcher Portal.
For the Researcher Portal, Archive staff create a personalized virtual machine with statistical software that the researcher accesses through security authentication requirements (two-factor token, username, and password). Staff remain in communication with the researcher to ensure successful access to the Portal. The Portal allows for a spoke-hub distribution model whereby Archive staff can centrally coordinate the flow of information between geographically disparate researchers.
Archive staff populate the researcher's folder with the requested data and coordinate all aspects of sample distribution, including sample selection based on proposal-defined characteristics. As an incentive to return laboratory test result data to the Archive and thereby expand available Archive data resources, participant-sample linkages are withheld until results are returned. With the NCSA model, the limited technical support on tier 1 allows staff to provide concierge support on tier 3 for researchers. Staff work closely with the researcher to ensure that the Archive can provide for successful completion of a project. This includes assisting with any questions about how the data were processed, providing additional data analysis, transferring output, programs, and summary files in and out of the Portal, linking supplemental files such as geographic characteristics or other data to NCS data, and assisting with writing and reviewing articles for publication, including adherence to NCSA data publication guidelines (Table 7). Prior to release of information once the research project is complete, Archive staff perform a disclosure review of any information to be transferred out of the Researcher Portal to the researcher, to assure that the information meets disclosure standards.
Potential or enrolled participant demographics or health status are not described at the individual level.
From its opening in November of 2015 through December 2019, there were over 20,000 visits to the NCS Archive (Table 8) by users from more than 50 different countries. Those visitors spent on average 6 min and 27 s on the site browsing the available resources. This highlights the strength of the NCSA model, where the resources for user self-service are both available and utilized. Of those visits, 476 moved from Tier 1 access to Tier 2 by registering with the site. Using the exploratory tools and accessing study documents, those users submitted 59 research requests to gain access to NCS data and specimens, 44 of which started or completed research within the NCSA Tier 3 framework. These researchers represent 43 academic, non-profit, commercial, and government organizations/institutions. The researchers, 60 percent of whom have more than 10 years of scientific experience, range from undergraduate and graduate students to research scientists, professors and medical doctors. Over fifty peer reviewed publications have resulted from use of NCS Vanguard data and specimens (1-7, 24-69) as of January 2021.
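The usage figures above imply a simple funnel across the tiers, worked through in the short calculation below. The counts are those reported in the text; the variable names and the `pct` helper are illustrative. Because the text gives visits only as "over 20,000" and visits are not unique users, the first rate is an upper bound per visit rather than a registration rate per person.

```python
# Counts as reported in the text
VISITS_LOWER_BOUND = 20_000  # "over 20,000 visits", so the true total is higher
REGISTRATIONS = 476          # visits that moved from Tier 1 to Tier 2
REQUESTS = 59                # research requests submitted by registered users
TIER3_PROJECTS = 44          # requests that started or completed Tier 3 research


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


registration_rate_max = pct(REGISTRATIONS, VISITS_LOWER_BOUND)  # upper bound
request_rate = pct(REQUESTS, REGISTRATIONS)
tier3_rate = pct(TIER3_PROJECTS, REQUESTS)

if __name__ == "__main__":
    print(f"Registrations per visit: under {registration_rate_max}%")
    print(f"Requests per registration: {request_rate}%")
    print(f"Requests reaching Tier 3: {tier3_rate}%")
```

Read this way, roughly one in eight registrations led to a research request, and about three quarters of requests proceeded to Tier 3 work.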

DISCUSSION
Data sharing is becoming an imperative for U.S. scientific research funding agencies. Recent calls for mandated data sharing (70) were followed by the October 29, 2020 announcement of the Final NIH Policy for Data Management and Sharing (71). The new policy establishes the requirements of submission of Data Management and Sharing Plans, applies to research funded or conducted by NIH that results in the generation of scientific data, and links compliance to funding actions. Related notices provide considerations for selecting a data repository, desirable characteristics for all data repositories, and additional considerations for repositories storing human data (72). NIH promotes the use of established data repositories because deposit in a quality data repository generally improves the FAIRness (Findable, Accessible, Interoperable, and Re-usable) of the data. The National Library of Medicine provides a useful listing of NIH Data Repositories at https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html.
The widely acknowledged benefits of data sharing must be balanced against the paramount need to protect privacy and maintain confidentiality of research participants (73). The human scientific research enterprise relies absolutely upon trust between The three-tier NCSA framework developed by the NCS Archive provides a model that allows researchers to evaluate information and data at a level appropriate for them, protects the rights of research subjects, and provides interactive support to researchers for specific research goals and outcomes.
The NCSA example represents a new model for adding value to large cohort studies after they close or even during their conduct and can serve as a model for effective and interactive data platform development, with tools and processes that empower users, protect research participants, and reduce the distance between data and knowledge.

PEER REVIEW
External peer review of the National Children's Study was carried out by the National Academy of Sciences in 2008.

DATA AVAILABILITY STATEMENT
The datasets and other materials supporting the conclusions of this article are available in the National Children's Study Archive (https://ncsarchive.s-3.net) through the spring of 2020. In 2020, the NCS Archive data files will be available in the NICHD Data and Specimen Hub (DASH) (https://dash.nichd.nih.gov/study/228954) after removal of HIPAA identifiers.

ETHICS STATEMENT
Written informed consent was obtained from all participants and the study protocol was approved by the Institutional Review Boards (IRBs) of NICHD and each Vanguard Study institution. All procedures involving human subjects were approved by the Institutional Review Board of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (Office for Human Research Protections Database IRB00000008 -National Insts Hlth -NICHD -IRB #8, FWA00005897).

AUTHOR CONTRIBUTIONS
PG and JM prepared the final manuscript. All authors participated in the design and conduct of the project, analysis and interpretation of the data, manuscript preparation and review, and read and approved the final manuscript.

FUNDING
The National Children's Study was created by Congressional mandate through the Children's Health Act of 2000 (Public Law 106-310 Sec. 1004). It was funded as a direct line item in the Congressional appropriation for the National Institutes of Health in FY2010, FY2011, FY2012, FY2013, FY2014, FY2015, FY2016, and FY2017. Funding for development and operation of the NCS Data and Sample Archive and Access System was provided through NICHD Contract HHSN275201500050U.