Common Data Elements to Facilitate Sharing and Re-use of Participant-Level Data: Assessment of Psychiatric Comorbidity Across Brain Disorders

The Ontario Brain Institute's "Brain-CODE" is a large-scale informatics platform designed to support the collection, storage, and integration of diverse types of data across several brain disorders as a means to understand the underlying causes of brain dysfunction and to develop novel approaches to treatment. By providing access to aggregated datasets on participants with and without different brain disorders, Brain-CODE will facilitate analyses both within and across diseases, spanning multiple brain disorders and a wide array of data, including clinical, neuroimaging, and molecular data. To help achieve these goals, consensus methodology was used to identify a set of core demographic and clinical variables that should be routinely collected across all participating programs. Establishing Common Data Elements within Brain-CODE is critical to ensure a high degree of consistency in data collection across studies and thus to optimize the ability of investigators to analyze pooled participant-level data within and across brain disorders. Results are also presented using selected common data elements pooled across three studies to better understand psychiatric comorbidity in neurological disease (Alzheimer's disease/amnesic mild cognitive impairment, amyotrophic lateral sclerosis, cerebrovascular disease, frontotemporal dementia, and Parkinson's disease).

INTRODUCTION
The Ontario Brain Institute's (OBI) "Brain-CODE" informatics platform (www.braincode.ca) (1) was designed to support the collection, integration, sharing, and analysis of diverse types of patient-level data across several brain disorders, including neurodevelopmental disorders (www.pond-network.ca), cerebral palsy (www.cpnet.canchild.ca), epilepsy (www.eplink.ca), major depressive disorder (MDD, www.canbind.ca), concussion (www.connectontario.ca), and neurodegenerative/neurovascular cognitive disorders (www.ondri.ca). These programs provide an opportunity to facilitate collaboration across disorders, as pooling data across programs expands the utility of the individual datasets to support cross-disease comparisons and generalizability of findings. However, combining participant-level data from multiple studies and patient populations can be challenging, as different measures are often used to assess the same constructs (2). These data must be sufficiently comparable to allow meaningful integration, and in the absence of common demographic variables and outcome measures it is difficult to pool data from different initiatives (2). Establishing, at the outset, a minimum set of unambiguously defined and standardized assessments across initiatives will facilitate data sharing and integration and enable meaningful cross-initiative analyses (3)(4)(5)(6).
To help achieve this level of collaboration and data interoperability, a set of common demographic and clinical outcome measures was identified and adopted across all of the aforementioned research programs. The adoption of common data elements (CDEs) within these studies reduces variability in data collection and ultimately supports secondary use of Brain-CODE data by facilitating the pooling of participant-level data across datasets (1)(3)(4)(5)(6). The primary objective of the present study was to use the CDEs to assess psychiatric comorbidity across various neurological diseases. We present here a summary of how we determined and used the Brain-CODE CDEs, including an example using CDEs pooled across three programs to assess symptoms of depression and anxiety across neurological diseases (Alzheimer's disease/amnesic mild cognitive impairment, amyotrophic lateral sclerosis, cerebrovascular disease, frontotemporal dementia, and Parkinson's disease), as well as MDD.

Delphi Consensus Process
A modified Delphi survey process (7,8) was used to identify core demographic and clinical variables to be collected across all participating programs. Researchers from the five programs were invited to complete an online survey hosted through the Brain-CODE portal. Participants were asked to comment on and respond to statements using a 5-point Likert scale, with anchors that depended on the question: Not Important to Very Important (example: How important is the collection of date of birth to achieving Brain-CODE goals?), Do Not Recommend to Highly Recommend (example: Please provide your recommendation for the GAD-7 to assess anxiety in adults across all programs), or Disagree to Strongly Agree (example: QIDS-SR is appropriate to assess depression across all programs in adults). A Do Not Know option was included for all questions. Open-ended questions also allowed participants to comment on their answers, offer additional opinions about Brain-CODE CDEs beyond the specified variables, and suggest whether additional CDEs should be considered. The results were reviewed, and anonymized aggregated ratings and comments were presented back to participants in a follow-up survey to obtain additional opinion and clarification, as required. Participants were directed to consider the results of the previous survey in their responses.
Prior to sending out the Delphi surveys, we identified the demographic and clinical domains to be brought forward for consensus. Based on a review of research project data dictionaries and study protocols, and on interactions with program researchers, we identified the following demographic domains of relevance across programs to consider for consensus: Sex, Date of Birth, Handedness, Ethnicity, Race, Education Level, Marital Status, Primary Language, Place of Birth, Geographic Region, and Height/Weight.
For the clinical CDEs, a preliminary online survey was sent to the programs to gather opinions on the symptom domains that may be relevant to their program and across a broad range of patient populations. The results were presented at a follow-up workshop. Following group discussion, there was agreement that psychiatric and medical comorbidity, depression, anxiety, sleep, quality of life, and activities of daily living should be assessed across all programs. These domains were considered relevant across programs, as comorbid psychiatric symptoms are often reported across a broad range of patient populations and can impact health (9)(10)(11)(12). There was also agreement that, when possible, the measures should be patient-reported, brief and easy to administer, widely used and validated, and available in the public domain. Based on the literature and expert opinion, candidate rating scales were then identified for each symptom domain to consider for consensus. A summary of the scales was presented to participants in the Delphi survey with the aim of achieving consensus on a common measure for each symptom domain.
Although the threshold for consensus is arbitrary, recommended criteria for Delphi consensus generally range from 70 to 80% agreement within two categories (8). In the present surveys, these were ratings of Important/Very Important, Recommend/Highly Recommend, or Agree/Strongly Agree. A consensus level of >70% was applied, with other factors, including harmonization with existing relevant databases, also weighed. When consensus was not achieved, representatives of the relevant programs were asked to discuss internally and provide their recommendations. Thirty-six researchers participated in the demographic surveys and 44 in the clinical surveys. There was broad representation of opinion, including Principal Investigators (27%), Co-Principal Investigators (36%), Project Managers (27%), and Research Coordinators and Investigators (9%). The respondents indicated that the populations studied in their programs included adult (59%), child and adolescent (46%), infant (18%), and geriatric (18%).
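The consensus rule above (agreement counted in the top two categories of each 5-point scale, with consensus at >70%) can be sketched as follows. This is a minimal illustration, not the procedure used by the authors: the response labels for middle categories are hypothetical, and excluding Do Not Know responses from the denominator is an assumption, since the text does not say how they were handled.

```python
# Minimal sketch of the Delphi consensus rule described in the text.
# ASSUMPTION: "Do Not Know" responses are excluded from the denominator.

TOP_TWO = {
    "Important", "Very Important",
    "Recommend", "Highly Recommend",
    "Agree", "Strongly Agree",
}
DO_NOT_KNOW = "Do Not Know"


def percent_agreement(responses: list[str], exclude_dont_know: bool = True) -> float:
    """Percentage of responses falling in the top two Likert categories."""
    counted = [r for r in responses if not (exclude_dont_know and r == DO_NOT_KNOW)]
    if not counted:
        raise ValueError("no countable responses")
    agree = sum(1 for r in counted if r in TOP_TWO)
    return 100.0 * agree / len(counted)


def reaches_consensus(responses: list[str], threshold: float = 70.0) -> bool:
    """Consensus when agreement strictly exceeds the threshold (>70%)."""
    return percent_agreement(responses) > threshold
```

For example, 33 top-two ratings out of 35 countable responses give about 94% agreement and meet the criterion, whereas exactly 70% does not, because the rule is a strict inequality.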

Demographic and Clinical CDEs Selected
The demographic and clinical CDEs that were agreed upon are shown in Table 1 and summarized in the Supplementary Material. To promote uptake and standardization of the CDEs, data dictionaries for the CDEs were made available on the Brain-CODE portal, and electronic case report forms were independently validated prior to data collection to ensure adherence to naming standards. These core demographic and clinical CDEs have been successfully implemented across participating programs (see www.braincode.ca for the most recent numbers of CDEs collected within each program).
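A check of an electronic case report form against a CDE data dictionary might look like the sketch below. The dictionary entries, variable names, codes, and the upper-case naming pattern are all hypothetical placeholders; the actual Brain-CODE naming standards are not described here.

```python
import re
from typing import Optional

# HYPOTHETICAL CDE data dictionary: variable name -> allowed codes
# (None means a free-form or numeric field with no code list).
CDE_DICTIONARY: dict[str, Optional[set[str]]] = {
    "SEX": {"1", "2"},
    "DOB": None,
    "HANDEDNESS": {"1", "2", "3"},
}

# HYPOTHETICAL naming convention: upper-case letters, digits, underscores.
NAME_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]*$")


def validate_ecrf(fields: dict[str, Optional[set[str]]]) -> list[str]:
    """Return a list of problems found in an eCRF field specification."""
    problems = []
    # Every CDE must be present in the form.
    for name in sorted(CDE_DICTIONARY):
        if name not in fields:
            problems.append(f"missing CDE field: {name}")
    # Every form field must follow the naming convention, and CDE fields
    # must use exactly the coded values defined in the dictionary.
    for name in sorted(fields):
        codes = fields[name]
        if not NAME_PATTERN.match(name):
            problems.append(f"non-conforming name: {name}")
        expected = CDE_DICTIONARY.get(name)
        if expected is not None and codes != expected:
            problems.append(
                f"code mismatch in {name}: {sorted(codes or [])} != {sorted(expected)}"
            )
    return problems
```

Running such a check before data collection, rather than after, means that naming problems are fixed in the form instead of being patched in the deposited data.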

ASSESSMENT OF PSYCHIATRIC COMORBIDITY ACROSS NEUROLOGICAL DISEASES AND CONTROLS
Comorbid psychiatric symptoms, which are often reported across a broad range of patient populations and can impact health (9)(10)(11)(12), are an integral part of neurological disorders (13)(14)(15). To better understand the expression of depression and anxiety in neurological disorders, we analyzed Brain-CODE CDEs (Demographics, QIDS-SR, and GAD-7) pooled across the five neurological diseases (Alzheimer's disease/amnesic mild cognitive impairment, amyotrophic lateral sclerosis, cerebrovascular disease, frontotemporal dementia, and Parkinson's disease) and MDD.

Study Population and Datasets
Ontario Neurodegenerative Disease Research Initiative
The Ontario Neurodegenerative Disease Research Initiative (ONDRI, NCT04104373) is a multi-site prospective cohort study developed to characterize and track progression of neurodegenerative and cerebrovascular disorders (16). Cohorts include: Alzheimer's disease (AD) or amnestic single or multidomain mild cognitive impairment (MCI), amyotrophic lateral sclerosis (ALS), cerebrovascular disease (CVD), frontotemporal dementia (FTD), and Parkinson's disease (PD). These groups were included in the present study to assess the impact of depression and anxiety across neurological disorders. Please see Farhan et al. (16) and Sunderland et al. (17) for protocol details, inclusion, and exclusion criteria.

Canadian Biomarker Integration Network for Depression Study
The Canadian Biomarker Integration Network in Depression Study (CAN-BIND-1, NCT01655706) is a prospective multisite study developed to identify biomarkers of antidepressant response in MDD (18,19). The study collects imaging, clinical, and 'omics data that will be used to build predictive models of treatment response. Cohorts include people with MDD and healthy comparison participants, both of which are included in the present analysis. Please see Lam et al. (19) for protocol details, inclusion, and exclusion criteria.

Analysis Dataset
All data from ONDRI, CAN-BIND, and BEAM were collected and deposited into the Brain-CODE database (1). All studies were carried out in accordance with the Declaration of Helsinki and the guidelines of the International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use (ICH); the study designs and procedures were reviewed by the appropriate ethics committees, and informed consent was obtained from participants after full explanation of the nature of the procedures.

Symptoms of Depression
The QIDS-SR is a 16-item self-report measure that assesses the severity of depressive symptoms based on DSM-5 criteria for a major depressive episode, with items scored on a 4-point scale from 0 to 3 (20). Scoring of the QIDS-SR converts the 16 items into 9 DSM domains (sad mood, concentration, self-criticism, suicidal ideation, interest, energy/fatigue, sleep disturbance, changes in appetite/weight, and psychomotor agitation/retardation), with the total score ranging from 0 to 27. It is important to note that, because of ethics concerns (challenges in providing immediate follow-up for those expressing suicidal ideation), item #12 assessing suicidality was omitted from the QIDS-SR in both ONDRI and BEAM and was therefore removed from all analyses. This was not expected to impact the scale's ability to discriminate MDD, given that removal of the suicidality item in other scales [for example, PHQ-9 (with suicide item) vs. PHQ-8 (without suicide item)] does not impact the scale's psychometric properties (21).
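The domain-based scoring described above can be sketched as follows, assuming the standard QIDS-SR16 item order (items 1-4 sleep, 5 sad mood, 6-9 appetite/weight, 10 concentration, 11 self-criticism, 12 suicidal ideation, 13 interest, 14 energy/fatigue, 15-16 psychomotor). Multi-item domains take the highest item score. Dropping item 12, as was done in these analyses, lowers the maximum possible total from 27 to 24. The function and domain names are illustrative.

```python
from typing import Mapping

# ASSUMPTION: standard QIDS-SR16 item numbering (1-16). Multi-item domains
# are scored as the maximum of their items; single-item domains as the item.
QIDS_DOMAINS = {
    "sleep": [1, 2, 3, 4],
    "sad_mood": [5],
    "appetite_weight": [6, 7, 8, 9],
    "concentration": [10],
    "self_criticism": [11],
    "suicidal_ideation": [12],
    "interest": [13],
    "energy_fatigue": [14],
    "psychomotor": [15, 16],
}


def score_qids_sr(items: Mapping[int, int], omit_suicide_item: bool = True) -> int:
    """Total QIDS-SR score: 0-27 with all 9 domains, 0-24 without item 12."""
    total = 0
    for domain, item_numbers in QIDS_DOMAINS.items():
        # Item 12 was not collected in ONDRI or BEAM, so skip it when requested.
        if omit_suicide_item and domain == "suicidal_ideation":
            continue
        scores = [items[i] for i in item_numbers]
        if any(s not in (0, 1, 2, 3) for s in scores):
            raise ValueError(f"item scores must be 0-3 in domain {domain}")
        total += max(scores)
    return total
```

Taking the maximum within the sleep, appetite/weight, and psychomotor domains keeps each DSM symptom to a weight of at most 3, so a participant reporting both decreased and increased appetite is not double-counted.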

Symptoms of Anxiety
The GAD-7 is a 7-item self-report measure that is used to assess the severity of generalized anxiety symptoms (22). Items rate the severity of the 7 symptoms over the past 2 weeks on a 4-point scale (0 = not at all, 1 = several days, 2 = more than half the days, and 3 = nearly every day), with the total score ranging from 0 to 21.
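The GAD-7 total is a simple sum of the seven item codes. The short sketch below maps the response labels given above to their 0-3 codes and sums them. The function names are illustrative, and accepting either labels or integer codes is a convenience assumption.

```python
# Response labels and codes as listed in the text (0-3 per item).
GAD7_CODES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}


def item_code(response) -> int:
    """Convert a label (case-insensitive) or an integer code to a 0-3 score."""
    if isinstance(response, int):
        if response not in (0, 1, 2, 3):
            raise ValueError(f"invalid item code: {response}")
        return response
    try:
        return GAD7_CODES[response.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown response label: {response!r}") from None


def score_gad7(responses) -> int:
    """Sum of the 7 GAD-7 items; total ranges from 0 to 21."""
    if len(responses) != 7:
        raise ValueError("GAD-7 requires exactly 7 item responses")
    return sum(item_code(r) for r in responses)
```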

Analyses
Demographic and clinical characteristics were calculated and compared across all cohorts; ANOVA was used to compare continuous variables and chi-squared tests to compare categorical variables. Analyses were performed using SPSS V26. A level of p < 0.05 was regarded as statistically significant. Furthermore, to assess the suitability of the CDEs for measuring depression and anxiety across the different cohorts, Cronbach's alpha and item-total correlations were calculated for the QIDS-SR and GAD-7 within each cohort as measures of each scale's internal consistency.
Table 2 shows Cronbach's alpha and item-total correlations for the QIDS-SR and GAD-7 for the ONDRI neurological disease (ND) and CAN-BIND MDD cohorts. For the QIDS-SR, alphas ranged from 0.68 (MDD) to 0.74 (FTD, PD), suggesting that while some items assess the same construct, others may not. Notably, low item-total correlations (r < 0.3) were observed for the "Sleep" domain in AD/MCI, CVD, FTD, and PD, and this domain was also questionable in the ALS cohort (r = 0.…). Borderline or low item-total correlations were also noted for the "psychomotor" item in ALS (r = 0.30) and MDD (r = 0.22). For the GAD-7, alpha was good across all cohorts, ranging from 0.83 (FTD) to 0.92 (PD). Item-total correlations were also acceptable (r > 0.3), with the exception of the GAD-7 "afraid" item in FTD (r = 0.21; see Table 2).
Table 3 summarizes demographic and clinical characteristics of the ND cohorts (AD/MCI, ALS, CVD, FTD, and PD), the MDD cohort, and the MDD- and ND-matched healthy controls. ANOVA revealed significant age differences across cohorts [F(7,888) = 394.77, p < 0.01]. No age differences were found between the ONDRI ND cohorts and BEAM healthy controls or between the CAN-BIND MDD cohort and healthy controls. As expected, the ND cohorts were older than the MDD cohorts (see Table 3). Sex differences were noted across cohorts, with the proportion of female participants in the CAN-BIND cohorts representative of the sex distribution of MDD (10,23,24).
The ONDRI ND cohorts were predominantly male [ranging from 22.1% (PD) to 45.2% (AD/MCI) female], whereas the BEAM healthy control cohort was predominantly female (75%). These sex differences between the ONDRI ND cohorts and BEAM healthy controls may reflect differences in recruitment protocols, as the ONDRI protocol required participants to be accompanied during assessment by a partner who knew them well, which may have influenced sample demographics (16,25). The participation of a knowledgeable study partner is critical to dementia-related research, both to help manage study logistics (e.g., scheduling study visits and transportation) and to serve as an informant (26). In the ONDRI study, the majority of participants had a spousal study partner, and participants with spousal study partners were more often male (25). Indeed, as females generally outlive males, this requirement may have biased recruitment toward males with surviving female partners within the ONDRI study.
Table 3 also shows the QIDS-SR and GAD-7 scores across cohorts. ANOVA revealed a significant difference in total QIDS-SR score across cohorts [F(7,887) = 261.99, p < 0.01]. As expected, the MDD cohort had significantly higher QIDS-SR scores (14.92 ± 3.76) than the CAN-BIND healthy controls (2.27 ± 1.94) and all ONDRI ND cohorts (3.93 ± 3.12 to 6.28 ± 3.70) (all p < 0.05). QIDS-SR scores were significantly higher than those of BEAM healthy controls (3.04 ± 2.11) for the ALS (6.28 ± 3.70), FTD (5.36 ± 4.07), and PD (5.61 ± 3.28) cohorts (all p < 0.05), but not for the AD/MCI (3.93 ± 3.12, p = 0.77) and CVD (4.27 ± 3.11, p = 0.33) cohorts. Figure 1 shows the percentage of participants reporting different levels of depressive severity across cohorts. In the ALS, FTD, and PD cohorts, at least mild symptoms of depression (total score > 5) were reported by 55, 35.7, and 43.5% of participants, and at least moderate symptoms of depression by 15, 11.3, and 8.6% of participants, respectively.
By contrast, at least mild and at least moderate symptoms of depression were reported by 2.2 and 0% of BEAM healthy control participants, respectively (see Figure 1).
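The internal-consistency statistics reported above can be sketched with NumPy using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). Two points are assumptions rather than details given in the text: that "item-total correlation" means the corrected version (each item against the sum of the remaining items), and that for the QIDS-SR the "items" are the domain scores, which the reference to a "Sleep" domain suggests. Missing-data handling is not shown.

```python
import numpy as np


def cronbach_alpha(x) -> float:
    """Cronbach's alpha for an (n_participants, k_items) array with no missing values."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1)          # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)     # sample variance of the total score
    return k / (k - 1) * (1.0 - item_var.sum() / total_var)


def corrected_item_total(x) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items.

    ASSUMPTION: the corrected form is used; an uncorrected version would
    correlate each item with the full total instead.
    """
    x = np.asarray(x, dtype=float)
    total = x.sum(axis=1)
    return np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1] for j in range(x.shape[1])
    ])
```

Under this reading, an item with a corrected item-total correlation below 0.3 (the cutoff used in the text) shares comparatively little variance with the rest of the scale, which is how the "Sleep" and "psychomotor" findings would be flagged.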

DISCUSSION
The Ontario Brain Institute supports multidisciplinary collaborative Research Networks from across the province of Ontario and beyond, focusing on various brain conditions. These programs have generated large volumes of data that are integrated within the Brain-CODE platform to support scientific inquiry and analytics across multiple brain diseases and modalities, including clinical, imaging, and molecular data (1). By providing access to participant-level data across different disorders, Brain-CODE will help generate new hypotheses about brain disease and its underlying causes and ultimately promote new discoveries to improve patient care. To help achieve these goals, the establishment of demographic and clinical CDEs within Brain-CODE is a critical step toward enabling consistency in data collection and optimizing the ability of investigators to analyze pooled participant-level data across brain disorders. Furthermore, the Brain-CODE CDEs provide a framework to facilitate collaborations across disciplines and increase our understanding of symptom expression and comorbidity across diseases.
Adopting a set of standardized assessments across different disease areas to facilitate sharing of participant-level data is a challenging endeavor that must consider the needs and goals of individual research programs, since each program must select the disease-specific measures that are most relevant. As a matter of good scientific practice, the measures selected should be scientifically valid and justified to support the specified aims. The selection of CDEs should therefore consider the feasibility, utility, and acceptability of outcome measures, which will benefit from buy-in and cooperation from stakeholders (6), including engagement of representative researchers through participation in workshops and agreement on a set of common assessments and standards. In developing the present set of core CDEs, Delphi consensus-based methodology was used to inform and engage stakeholders, gain their input and opinion, and arrive at an agreement. Because open-ended questions and comments were included, suggestions and opinions were not restricted to the predefined variables, which allowed opinion to be broadened and program-specific needs and challenges to be considered.
FIGURE 1 | Percentage of participants reporting no (total score ≤ 5), mild (total score = 6-10), moderate (total score = 11-15), severe (total score = 16-20), and very severe (total score ≥ 21) symptoms of depression in the QIDS-SR across the cohorts.
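The severity bands in the Figure 1 caption, and the "at least mild" and "at least moderate" percentages reported in the Results, can be reproduced from individual QIDS-SR totals with a sketch like the one below. The cutoffs are taken from the caption; the function names are illustrative. Because item 12 was omitted, totals in these cohorts cannot exceed 24.

```python
SEVERITY_ORDER = ["none", "mild", "moderate", "severe", "very severe"]


def qids_severity(total: int) -> str:
    """Map a QIDS-SR total to the severity bands used in Figure 1."""
    if total < 0:
        raise ValueError("total score cannot be negative")
    if total <= 5:
        return "none"
    if total <= 10:
        return "mild"
    if total <= 15:
        return "moderate"
    if total <= 20:
        return "severe"
    return "very severe"


def percent_at_least(totals: list[int], band: str) -> float:
    """Percentage of participants whose severity is at or above `band`."""
    if not totals:
        raise ValueError("no scores provided")
    cutoff = SEVERITY_ORDER.index(band)
    hits = sum(1 for t in totals if SEVERITY_ORDER.index(qids_severity(t)) >= cutoff)
    return 100.0 * hits / len(totals)
```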
A key consideration was harmonizing the Brain-CODE CDEs with those of other large, centralized data repositories and CDE initiatives, as this would facilitate sharing of data among the larger research community and allow the results of one study to be compared with those of another. One example is the United States National Institute of Neurological Disorders and Stroke (NINDS) CDE initiative, which is developing CDEs for NINDS-supported clinical neuroscience research (27). This CDE initiative includes many disease areas in common with Brain-CODE, including epilepsy (28), traumatic brain injury (29), and neurodegenerative diseases (27). The Clinical Data Interchange Standards Consortium (CDISC) also provides data standards that should be considered for CDEs, including alignment with relevant Clinical Data Acquisition Standards Harmonization (CDASH)-recommended standards and guidelines for recording demographic information (30). Indeed, alignment with existing standards should be an important consideration when designing a study to support collaborative research and data sharing, as described in the Guidelines for Data Acquisition, Quality and Curation for Observational Research Designs (DAQCORD) (31). However, it is important to recognize that although prospective standardization with existing, relevant databases is ideal to facilitate data synthesis, implementing common data collection standards with all relevant initiatives, particularly established studies and legacy data, is clearly not possible. Thus, to make established studies or existing data sufficiently comparable, retrospective harmonization will also be required to define a set of core variables, including establishing conventions and equivalencies of data dictionary terms (32).
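Retrospective harmonization of legacy data typically relies on a crosswalk that records which legacy variable corresponds to which CDE and how its codes translate. The sketch below illustrates the idea only: the legacy variable names, the CDE names, and all code values are hypothetical, and a real crosswalk would also handle units, free-text fields, and transformations that are not simple one-to-one recodes.

```python
# HYPOTHETICAL crosswalk: legacy variable -> (CDE variable, {legacy code: CDE code}).
CROSSWALK: dict[str, tuple[str, dict[str, str]]] = {
    "gender": ("SEX", {"M": "1", "F": "2"}),
    "hand_dom": ("HANDEDNESS", {"R": "1", "L": "2", "A": "3"}),
}


def harmonize(record: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Recode one legacy record to CDE terms.

    Returns the harmonized record and a list of variables (or variable=value
    pairs) that had no equivalence in the crosswalk, for manual review.
    """
    harmonized: dict[str, str] = {}
    unmapped: list[str] = []
    for legacy_name, value in record.items():
        if legacy_name not in CROSSWALK:
            unmapped.append(legacy_name)
            continue
        cde_name, code_map = CROSSWALK[legacy_name]
        if value not in code_map:
            unmapped.append(f"{legacy_name}={value}")
            continue
        harmonized[cde_name] = code_map[value]
    return harmonized, unmapped
```

Returning the unmapped items instead of silently dropping them keeps gaps in the equivalence table visible, so they can be resolved by agreeing on new conventions.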
Neuropsychiatric symptoms are prevalent in neurological disorders, with rates varying anywhere from 0 to 90%, depending on the assessments used, patient samples and type of disease (33)(34)(35). Although prevalence rates across neurological disorders are difficult to ascertain as different outcome measures and criteria are often used, studies generally show prevalence rates of depression and anxiety that are higher than in the general population and negatively impact well-being and progression of disease (33)(34)(35). In the present analysis of participant-level data pooled across different studies and using standardized outcome measures, depression scores (QIDS-SR) were found to be higher in ALS, FTD, and PD cohorts, but not AD/MCI and CVD, as compared to age-matched controls (see Table 3). The mean QIDS-SR total scores for the ALS, FTD, and PD cohorts indicated overall "mild" levels of depression, as compared to the "moderate" levels observed in the MDD cohort (see Figure 1). No differences, however, were noted in GAD-7 scores in any of the ONDRI ND cohorts as compared to age-matched controls.
As part of the ONDRI protocol, individuals with "unstable psychiatric illness defined as psychosis (hallucinations or delusions), lifelong history of major depression, or untreated late-onset major depression within 90 days of the screening visit" were excluded from the study (16). As a result, individuals with more severe psychiatric symptoms or a clinical diagnosis of MDD were excluded from the ONDRI study. However, although the relatively low levels of depressive symptoms and the absence of elevated anxiety symptoms may in part be explained by ONDRI study selection bias, other factors may also be involved, including adaptive psychosocial adjustment and coping strategies (36,37). This is consistent with the "disability paradox," in which persons with neurological disorders (and other chronic illnesses) adapt to their disabilities and report better well-being and quality of life than the general population would expect (36). Studies are currently underway using the Brain-CODE CDEs to assess psychosocial determinants of quality of life across disorders.
One of the key objectives of the CAN-BIND and ONDRI programs is the discovery of biomarkers (imaging and 'omics) to track disease and treatment response (16,19). The present study demonstrates that some symptoms are shared across disorders and that, within disorders, individual symptom profiles and phenotypes vary. Indeed, there are approximately half a million symptom profiles that meet criteria for a diagnosis of MDD (38). This heterogeneity presents a challenge to traditional biomarker and drug discovery research based on binary classifications of disease, such as DSM diagnoses; given the wide range of phenotypes within diseases, this "top-down" approach to biomarker discovery is imprecise at best. There may therefore be advantages to also incorporating a "bottom-up" strategy to identify relevant clinical phenotypes and associated biomarkers that will refine our understanding of the pathophysiology underlying disease and ultimately lead to personalized treatment approaches (38,39).
It is important that CDEs demonstrate good construct validity across the diverse populations supported by Brain-CODE. Although the clinical CDEs were selected in part because they are widely used, including in some of the disease areas currently supported in Brain-CODE, the validity of these outcome assessments across these diverse populations cannot be fully appreciated until appropriate validation is performed. In the present study, the GAD-7 showed good internal consistency across all cohorts, with Cronbach's alpha of 0.83 or greater and generally acceptable item-total correlations (see Table 2). By contrast, Cronbach's alphas for the QIDS-SR were generally lower (0.68-0.74), with item-total correlations identifying some items with poor discrimination (see Table 2). As psychiatric symptoms often overlap with symptoms observed in neurological disease, some of the symptoms assessed by the QIDS-SR and GAD-7 may be confounded with neurological disease-related symptoms (e.g., sleep, concentration, and psychomotor changes). As validation relates not only to the instrument itself but also to how it is used, it will be important to evaluate the measurement properties of these scales across disease groups. These studies are currently underway, including the application of item response theory and Rasch measurement theory to evaluate scale performance (40).

DATA AVAILABILITY STATEMENT
The data analyzed in this study are subject to the following licenses/restrictions: participants' data used in this study are currently stored in the Brain-CODE Neuroinformatics Platform (https://www.braincode.ca/) managed by the Ontario Brain Institute. Requests to access these datasets should be directed to the Ontario Brain Institute at info@braininstitute.ca.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by all recruitment sites in accordance with the Governance Policy of the Ontario Brain Institute as well as institutional policies. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
AV: manuscript preparation, study concept and design, statistical analysis and interpretation of data, and ONDRI data management and curation. KE: study concept and design, statistical analysis, and interpretation of data. DB: statistical analysis and interpretation of data, ONDRI data management, and curation. SGE, KH, MJ, FP, and SR: CAN-BIND data management and curation. SGE, MJ, BT, BL, SL, and AS: ONDRI and BEAM data management and curation. BF, SK, RL, DPM, RM, VT, and RU: CAN-BIND recruiting site leads. SB, EF, MF, AL, MM, RS, MT, and LZ: ONDRI recruiting site leads. All authors provided critical revision of the manuscript for important intellectual content, read, and approved the final manuscript.

FUNDING
This research was conducted with the support of the Ontario Brain Institute, an independent non-profit corporation, funded partially by the Ontario government. BEAM received funding from Brain Canada and the Edwards Foundation and in-kind support from GE Healthcare; funding was also received from Linda C. Campbell toward the BEAM study. CAN-BIND and ONDRI are Integrated Discovery Programs with support from the Ontario Brain Institute, with funding and/or in-kind support also provided by the investigators' universities and academic institutions. Additional funding for CAN-BIND was provided by CIHR, Lundbeck, Bristol-Myers Squibb, Pfizer, and Servier. The funders were not involved in the study design, collection, analysis, or interpretation of data, the writing of this article, or the decision to submit it for publication.

ACKNOWLEDGMENTS
We thank the members of the research programs who participated in the surveys and workshops. Author Disclaimer: The opinions, results, and conclusions are those of the authors and no endorsement by the Ontario Brain Institute is intended or should be inferred.