Edited by: Ignazio Piras, Translational Genomics Research Institute, United States
Reviewed by: Mauro Pala, National Research Council (CNR), Italy; Claudia Pisanu, University of Cagliari, Italy
This article was submitted to Genetics of Aging, a section of the journal Frontiers in Genetics
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Cognitive aging is a major health problem worldwide, particularly in aging populations. This study aimed to perform global gene expression profiling of cognitive function to identify associated genes and pathways, and a novel transcriptional regulatory network analysis to identify important regulons. We performed single-transcript analysis on 400 monozygotic twins using an assumption-free generalized correlation coefficient (GCC), a linear mixed-effect model (LME), and a kinship model, and identified six probes (one significant at the standard FDR < 0.05; the others suggestive, with 0.18 ≤ FDR ≤ 0.28). We combined the GCC and linear model results to cover diverse patterns of relationships, and meaningful and novel genes like
Cognitive impairment is a global challenge that imposes social and economic costs on society, particularly in older populations. Although some effort has been made to understand the genes and biological pathways involved in cognitive functioning through gene expression analysis, knowledge remains limited. Harries et al. (
The statistical models commonly used to analyze gene expression data are linear models, which rest on multiple assumptions, including normality of the phenotype and a linear relation between expression level and phenotype. When the data include twins, linear mixed-effect models are appropriate for handling the correlation structure. The assumptions imposed by linear models might explain why gene expression analyses detect relatively few important markers. Recently, however, a couple of studies have shown the strength of the generalized correlation coefficient (GCC), a non-parametric method that can identify different patterns of association and handle correlated twin samples and non-normal phenotypes without imposing strict assumptions (Reshef et al.,
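The limitation of a purely linear measure can be illustrated with a small sketch. It is written in Python for illustration only (the analyses in this study were run in R), it does not implement GCC, and all values are hypothetical. It shows that the Pearson correlation of a perfectly U-shaped relation is zero, whereas a linear relation yields a correlation of one.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical centered expression values (not data from this study).
expression = [-3, -2, -1, 0, 1, 2, 3]

# Hypothetical phenotype that depends on expression through a U-shaped curve.
phenotype_u = [e ** 2 for e in expression]

# Hypothetical phenotype that depends on expression linearly.
phenotype_lin = [2 * e + 1 for e in expression]

r_u = pearson(expression, phenotype_u)      # strong dependence, no linear trend
r_lin = pearson(expression, phenotype_lin)  # purely linear dependence

print(f"U-shaped relation: r = {r_u:.3f}")
print(f"Linear relation:   r = {r_lin:.3f}")
```

A dependence measure that is not restricted to linear trends would be expected to flag the first relation as well, which is the motivation for complementing linear models with a non-parametric method.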
Transcription factors (TFs) are proteins that bind specific DNA sequences and affect gene expression by promoting or repressing their target genes. Mutations in TFs and TF binding sites underlie many human diseases. The group of genes controlled by a TF is called a regulon (Lambert et al.,
This study aimed to perform two analyses: (1) a global gene expression analysis of cognitive function measured in monozygotic (MZ) twins, applying the assumption-free GCC and linear models to identify significant genes and pathways associated with the phenotype, and (2) a gene regulatory network analysis to investigate the significance of previously reported cognitive function-related TFs.
We used 400 MZ twins (220 males and 180 females) (
Whole blood was collected in PAXgene Blood RNA Tubes (PreAnalytiX GmbH, Hombrechtikon, Switzerland) and total RNA was extracted using the PAXgene Blood miRNA kit (QIAGEN) according to the manufacturer's protocol. The extracted RNA concentration was determined using a NanoDrop spectrophotometer ND-8000 (NanoDrop Technologies), and the quality was assessed by the Agilent 2100 Bioanalyzer. Gene expression profiling was performed using the Agilent SurePrint G3 Human GE v2 8×60K Microarray (Agilent Technologies). This array contains 62,976 60-mer probes. The array hybridization and sample labeling were done according to the “Two-Color Microarray-Based Gene Expression Analysis—Low Input Quick Amp Labeling” protocol. Samples were labeled Cy5 and the reference consisting of a pool of 16 samples was labeled Cy3. Hybridization, washing, scanning, and quantification were performed according to the array manufacturer's recommendations (Nygaard et al.,
The R package limma was used for quality control (QC) of the data (Ritchie et al.,
First, we adjusted the gene expression data for the covariates age, sex, and cell composition. Next, we applied GCC, kinship, and LME models to investigate the association between mRNA expression level and cognitive function. In the linear models, both LME from the
The kinship model calculates a kinship matrix and integrates it in the covariance matrix of the expression data. For GCC analysis, the
The adjustment for multiple testing was performed by the Benjamini & Hochberg false discovery rate (FDR) correction method (Benjamini and Hochberg,
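As a minimal sketch of the Benjamini & Hochberg step-up adjustment, the following Python function converts raw p-values into FDR-adjusted values. It is illustrative only: the study's analyses were run in R, the function name is our own, and the p-values below are hypothetical rather than taken from the results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five probes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw_p)

for p, q in zip(raw_p, fdr):
    print(f"p = {p:<6} FDR = {q:.5f}")
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone, so a probe is declared significant at a given FDR threshold whenever its adjusted value falls below that threshold.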
A total number of 1,968 genes (
We used the R package
The QQ plot and Manhattan plot are shown in
QQ plot for single mRNA probe analysis from GCC, kinship, and LME models.
Manhattan plot for single mRNA probe analysis from GCC, kinship, and LME models along each chromosome.
List of top 20 mRNA probes from the single mRNA probe analysis.
A_23_P143713 | 0.018 | 22 | 39477481 | 1.569e-6 | 0.632 | 0.633 | 1.569e-6 | 0.04 | |
A_33_P3303742 | 0.080 | 5 | 131142893 | 2.889e-5 | 0.635 | 0.636 | 2.889e-5 | 0.18 | |
A_24_P626850 | 0.005 | 1 | 9330351 | 2.928e-5 | 0.140 | 0.141 | 2.928e-5 | 0.18 | |
A_32_P223059 | 0.072 | 1 | 8390869 | 0.003 | 3.027e-5 | 3.714e-5 | 3.027e-5 | 0.18 | |
A_33_P3394213 | 0.033 | 19 | 1009648 | 0.204 | 3.197e-5 | 3.971e-5 | 3.197e-5 | 0.18 | |
A_33_P3389649 | 0.093 | 5 | 59064133 | 3.877e-5 | 0.912 | 0.912 | 3.877e-5 | 0.18 | |
A_21_P0014060 | 0.046 | 1 | 172362943 | 0.120 | 0.0001 | 0.0001 | 0.0001 | 0.28 | |
A_23_P354175 | 0.008 | 4 | 1717772 | 0.0002 | 0.255 | 0.256 | 0.0002 | 0.28 | |
A_33_P3411025 | 0.135 | 10 | 99019229 | 0.0002 | 0.872 | 0.872 | 0.0002 | 0.28 | |
A_23_P329375 | 0.063 | 12 | 51583372 | 0.013 | 0.0002 | 0.0002 | 0.0002 | 0.28 | |
A_23_P216476 | 0.069 | 9 | 37438478 | 0.001 | 0.0002 | 0.0002 | 0.0002 | 0.28 | |
A_23_P325080 | 0.036 | 19 | 50358243 | 0.0002 | 0.742 | 0.742 | 0.0002 | 0.28 | |
A_33_P3278560 | 0.037 | 19 | 58102575 | 0.081 | 0.0002 | 0.0002 | 0.0002 | 0.28 | |
A_33_P3282241 | 0.013 | 11 | 55606873 | 0.180 | 0.0003 | 0.0003 | 0.0003 | 0.28 | |
A_23_P205074 | 0.114 | 13 | 29278193 | 0.0003 | 0.624 | 0.624 | 0.0003 | 0.28 | |
A_23_P130027 | 0.080 | 17 | 48620056 | 0.0003 | 0.215 | 0.216 | 0.0003 | 0.28 | |
A_33_P3266078 | 0.025 | 11 | 6806742 | 0.096 | 0.0003 | 0.0004 | 0.0003 | 0.28 | |
A_21_P0000024 | 0.113 | 13 | 28979986 | 0.0003 | 0.008 | 0.009 | 0.0003 | 0.28 | |
A_23_P144896 | 0 | 5 | 176910887 | 0.0003 | 0.297 | 0.249 | 0.0003 | 0.28 | |
A_23_P34888 | 0.059 | 1 | 111863116 | 0.0004 | 0.992 | 0.992 | 0.0004 | 0.28 |
The list of mRNA probes with
Top 20 significant KEGG and Reactome biological pathways from GSEA.
KEGG_RIBOSOME | Ribosome | 26 | 6.15 e−15 | 1.14 e−12 |
KEGG_FOCAL_ADHESION | Focal adhesion | 21 | 2.36 e−4 | 2.19 e−2 |
KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION | Cytokine-cytokine receptor interaction | 25 | 3.66 e−4 | 2.27 e−2 |
KEGG_CYSTEINE_AND_METHIONINE_METABOLISM | Cysteine and methionine metabolism | 7 | 6.4 e−4 | 2.97 e−2 |
KEGG_P53_SIGNALING_PATHWAY | p53 signaling pathway | 10 | 8.22 e−4 | 3.06 e−2 |
KEGG_HUNTINGTONS_DISEASE | Huntington's disease | 18 | 1.19 e−3 | 3.37 e−2 |
KEGG_ENDOCYTOSIS | Endocytosis | 18 | 1.27 e−3 | 3.37 e−2 |
REACTOME_EUKARYOTIC_TRANSLATION_ELONGATION | Eukaryotic translation elongation | 27 | 4.06 e−15 | 5.95 e−12 |
REACTOME_METABOLISM_OF_AMINO_ACIDS_AND_DERIVATIVES | Metabolism of amino acids and derivatives | 55 | 7.77 e−15 | 5.95 e−12 |
REACTOME_SELENOAMINO_ACID_METABOLISM | Selenoamino acid metabolism | 29 | 3.52 e−14 | 1.8 e−11 |
REACTOME_METABOLISM_OF_RNA | Metabolism of RNA | 76 | 1.26 e−13 | 4.83 e−11 |
REACTOME_RESPONSE_OF_EIF2AK4_GCN2_TO_AMINO_ACID_DEFICIENCY | Response of EIF2AK4 (GCN2) to amino acid deficiency | 26 | 2.91 e−13 | 8.31 e−11 |
REACTOME_RRNA_PROCESSING | rRNA processing | 37 | 3.68 e−13 | 8.31 e−11 |
REACTOME_INFLUENZA_INFECTION | Influenza infection | 32 | 3.8 e−13 | 8.31 e−11 |
REACTOME_NERVOUS_SYSTEM_DEVELOPMENT | Nervous system development | 68 | 4.61 e−13 | 8.83 e−11 |
REACTOME_SIGNALING_BY_ROBO_RECEPTORS | Signaling by ROBO receptors | 37 | 2.54 e−12 | 3.94 e−10 |
REACTOME_EUKARYOTIC_TRANSLATION_INITIATION | Eukaryotic translation initiation | 27 | 2.57 e−12 | 3.94 e−10 |
We used 17 TFs as input. After filtering and bootstrapping, seven regulons remained, of which five were identified as significant in the GSEA-2T analysis with FDR < 1.3e-4. Of these significant regulons, two are positively associated with their target genes and three are negatively associated.
List of seven regulons identified from GSEA-2T among which five significant regulons were identified with FDR < 1.3e-4.
CTCF | 2,164 | 0.89 | 0.0000999 | 0.00013999 |
REST | 2,559 | 0.67 | 0.0000999 | 0.00013999 |
SP3 | 7,490 | −1.24 | 0.0000999 | 0.00013999 |
SRF | 2,585 | −0.87 | 0.0000999 | 0.00013999 |
XBP1 | 7,367 | −0.94 | 0.0000999 | 0.00013999 |
TCF4 | 1,540 | 0.08 | 0.13929 | 0.1625 |
FOXP2 | 56 | 0.21 | 0.36526 | 0.36526 |
Two-tailed GSEA illustrating five significant identified regulons from regulatory network analysis. Regulons are split into positive and negative targets, and differential enrichment score (dES) is shown for positive (red line) and negative (blue line) targets.
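The idea behind the two-tailed enrichment can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the implementation used in the study: it uses an unweighted running-sum statistic, omits the permutation step needed for p-values, assumes that dES is the enrichment score of the positive targets minus that of the negative targets, and uses hypothetical gene names and function names.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style running-sum statistic.

    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a target gene, step down on any other gene.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def differential_es(ranked_genes, positive_targets, negative_targets):
    """Return (dES, ES of positive targets, ES of negative targets)."""
    es_pos = enrichment_score(ranked_genes, positive_targets)
    es_neg = enrichment_score(ranked_genes, negative_targets)
    return es_pos - es_neg, es_pos, es_neg

# Hypothetical genes ranked from the most positive to the most negative
# association with the phenotype.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
# Hypothetical regulon split into positively and negatively regulated targets.
pos_targets = {"g1", "g2"}
neg_targets = {"g7", "g8"}

d_es, es_pos, es_neg = differential_es(ranked, pos_targets, neg_targets)
print(f"ES(positive) = {es_pos:.2f}, ES(negative) = {es_neg:.2f}, dES = {d_es:.2f}")
```

In this toy ranking the positive targets cluster at the top and the negative targets at the bottom, so the two enrichment curves diverge and the differential score is large and positive; a regulon whose positive and negative targets are scattered along the ranking would give a score near zero.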
Through applying both the assumption-free GCC and linear models, this exploratory study was able to capture diverse patterns of relations not limited to those from linear models. We were able to identify interesting genes and pathways implicated in cognitive function. Also, the novel transcription regulatory analysis paved the way for the detection of significant regulons associated with cognitive function.
The
The other gene is
The
We found interesting and important pathways which might be implicated in cognitive impairment (
A total of five significant regulons were enriched in the GSEA-2T analysis of transcriptional regulation, in which
Overall, through applying GCC as a complementary method along with the linear models, this exploratory study was able to detect more important and meaningful differentially expressed genes and biological pathways implicated in cognitive function. Additionally, applying transcriptional analysis could reveal the link between significant regulons and cognition which further confirms that previously noted TFs are associated with cognitive function.
According to Danish and EU legislation, the transfer and sharing of individual-level data require prior approval from the Danish Data Protection Agency, and data sharing requests must be handled on a case-by-case basis. However, we welcome any enquiries regarding collaboration and individual requests for data sharing. Requests can be directed to JH,
The studies involving human participants were reviewed and approved by The Regional Scientific Ethical Committees for Southern Denmark (S-VF-19980072). The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.
JH and QT contributed to the conception and design. AM performed the data analysis and wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: