The Promoter Regions of Intellectual Disability-Associated Genes Are Uniquely Enriched in LTR Sequences of the MER41 Primate-Specific Endogenous Retrovirus: An Evolutionary Connection Between Immunity and Cognition.

Social behavior and neuronal connectivity in rodents have been shown to be shaped by the prototypical T lymphocyte-derived pro-inflammatory cytokine Interferon-gamma (IFNγ). It has also been demonstrated that STAT1 (Signal Transducer And Activator Of Transcription 1), a transcription factor (TF) crucially involved in the IFNγ pathway, binds consensus sequences that, in humans, are located with a high frequency in the LTRs (Long Terminal Repeats) of the MER41 family of primate-specific HERVs (Human Endogenous Retroviruses). However, the putative role of an IFNγ/STAT1/MER41 pathway in human cognition and/or behavior is still poorly documented. Here, we present evidence that the promoter regions of intellectual disability (ID)-associated genes are uniquely enriched in LTR sequences of the MER41 HERVs. This observation is specific to MER41 among more than 130 HERVs examined. Moreover, we have not found such a significant enrichment in the promoter regions of genes that associate with autism spectrum disorder (ASD) or schizophrenia. Interestingly, ID-associated genes exhibit promoter-localized MER41 LTRs that harbor TF binding sites (TFBSs) for not only STAT1 but also other immune TFs such as, in particular, NFKB1 (Nuclear Factor Kappa B Subunit 1) and STAT3 (Signal Transducer And Activator Of Transcription 3). Moreover, IL-6 (Interleukin 6), rather than IFNγ, is identified as the main candidate cytokine regulating such an immune/MER41/cognition pathway. Of note, differences between humans and chimpanzees are observed regarding the insertion sites of MER41 LTRs in the promoter regions of ID-associated genes. Finally, a survey of the human proteome has allowed us to map a protein-protein network which links the identified immune/MER41/cognition pathway to FOXP2 (Forkhead Box P2), a key TF involved in the emergence of human speech.
Our work suggests that together with the evolution of immune genes, the stepped self-domestication of MER41 in the genomes of primates could have contributed to cognitive evolution. We further propose that non-inherited forms of ID might result from the untimely or quantitatively inappropriate expression of immune signals, notably IL-6, that putatively regulate cognition-associated genes via promoter-localized MER41 LTRs.


INTRODUCTION
Interferon gamma (IFNγ), the prototypical T Helper 1 (TH1) cytokine, is a T cell-derived pro-inflammatory molecule that exerts effects on innate immune cells as well as on non-immune cells, including neurons (Litteljohn et al., 2014). Specifically, IFNγ was recently shown to be a social behavior regulator and to shape neuronal connectivity in rodents (Filiano et al., 2016). Irrespective of cell type, binding of IFNγ to its receptor induces the transcriptional regulation of target genes via the recognition of promoter consensus sequences by the transcription factor STAT1 (Signal Transducer And Activator Of Transcription 1; Ramana et al., 2002; Green et al., 2017). In humans, an important share of the IFNγ/STAT1 pro-inflammatory pathway is mediated by the binding of STAT1 to consensus sequences localized in the Long Terminal Repeats (LTRs) of the MER41 family of Human Endogenous Retroviruses (HERVs; Chuong et al., 2016). Thus, in primates only, MER41 sequences located in the promoter regions of immune genes serve as IFNγ-inducible enhancers that are indispensable for a full IFNγ-mediated immune response (Chuong et al., 2016). MER41 integrated into the genome of a primate ancestor 45-60 million years ago and a total of 7,190 LTR elements belonging to six subfamilies (MER41A-E and MER41G) are detectable in the modern human genome (Chuong et al., 2016). That being said, the hypothesis of an IFNγ/STAT1/MER41 pathway shaping social behavior and/or cognition in humans (Filiano et al., 2016) is not yet supported by experimental data. Addressing this issue is of importance as it could provide additional evidence on whether and how the immune system may translate environmental cues (including, possibly, cultural cues) into genomic regulatory pathways shaping behavior and/or cognition. Multiple research fields are concerned, from cognitive evolution to psychiatric disorders.
As a first step, any human candidate gene(s) regulated by such a pathway need(s) to be identified. Additionally, it is important to determine whether STAT1 is the sole immune TF that potentially regulates the transcription of behavior- and/or cognition-associated genes via MER41 LTRs in humans. Finally, it is worth considering whether the stepped integration of HERVs into the human genome, and more generally any evolutionary change dictated by infectious events, could be related to key cognitive specificities experienced by hominins.
Indeed, it was recently hypothesized that the horizontal transfer of genetic material by viral and non-viral vectors might have prompted the emergence of language in the human species (Benítez-Burraco and Uriagereka, 2016).
To address these issues, we followed a bioinformatics workflow relying on the use of two recently generated web tools allowing a survey of HERV sequences and their associated transcription factor binding sites (TFBSs) in the entire human genome. Using this approach, we found that the promoter regions of genes causatively linked to genetically determined intellectual disability (ID) are highly significantly enriched in LTR sequences of the MER41 family. Such an enrichment was unique both to MER41, as compared to more than 130 explored HERVs, and to ID-associated genes, as compared to lists of genes associated with autism spectrum disorder (ASD) or schizophrenia. The MER41 LTRs that localize in the promoter regions of ID-associated genes harbor binding sites recognized by canonical immune TFs including STAT1, STAT3, and NFKB1. From these data, we performed phylogenetic comparisons between humans and chimpanzees regarding: (i) MER41 LTR insertion sites in the promoter regions of candidate ID-associated genes, (ii) protein sequences of the immune-related TFs binding MER41 LTRs in such promoter regions. The aim was to infer putative differences regarding the immune/MER41/cognition pathway, which might ultimately account for some of the cognitive differences between humans and chimpanzees. Finally, since FOXP2 is currently known as a relevant TF regulating aspects of brain development and functions which are important for the execution of speech-related motor programs (Vernes et al., 2007, 2011; Konopka et al., 2009; Oswald et al., 2017), we searched genomics and proteomics databases to map a putative functional interactome linking FOXP2 and the immune TFs binding MER41 LTRs. This way, we were able to unravel a HERV-driven, evolutionarily determined connection between cognition and immunity with a potential impact on language evolution and the pathophysiology of ID.

MATERIALS AND METHODS
A scheme summarizing the workflow followed in the present work is shown in Figure 1.
FIGURE 1 | Workflow of the study. Rectangles (yellow or red) frame the main results obtained following each of the analytical steps briefly described in ellipse shapes (green or blue). Terms in italics correspond to the name of the bioinformatics tools used for each analytical step. LTR: long terminal repeat; ID: intellectual disability; TFs: transcription factors.
Frontiers in Genetics | www.frontiersin.org 3 April 2019 | Volume 10 | Article 321
All the bioinformatics analyses were performed at least three times between December 2017 and February 2019. Bioinformatic tools and corresponding tasks performed in this study are described below.

1. The EnHERV database and web tool (Tongyoo et al., 2017): identifying human genes harboring MER41 LTR sequence(s) in the promoter region located 2 kb upstream of the TSS; only solo LTRs oriented in the sense direction relative to the gene orientation were taken into account.
2. The db-HERV-REs database and web tool (Ito et al., 2017): identifying experimentally demonstrated TFBSs in HERV LTRs. The db-HERV-REs database has been generated by the re-analysis of 519 ChIP-Seq datasets provided by the ENCODE (ENCODE Project Consortium, 2004; Davis et al., 2018) and Roadmap (Roadmap Epigenomics Consortium et al., 2015) consortia.
3. The enrichment web platform Enrichr (Kuleshov et al., 2016): performing enrichment analyses on queried lists of genes. The Enrichr website allows surveying simultaneously 132 libraries gathering 245,575 terms and their associated lists of genes or proteins. Enrichment analysis tools provided by the Enrichr bioinformatics platform provide adjusted P-values computed from the Fisher's exact test, Z-scores assessing deviation from an expected randomly obtained rank of P-values, and combined scores computed from the Z-scores and the adjusted P-values. We essentially focused our analysis on the well-recognized "GO term biological process" library (Ashburner et al., 2000; The Gene Ontology Consortium, 2019) and on three ontology libraries based exclusively on text-mining: (i) the "Jensen TISSUES" library (Santos et al., 2015), to determine whether a list of genes is significantly associated with a specific tissue or cell type, (ii) the "Jensen COMPARTMENTS" library (Binder et al., 2014), to determine whether a list of genes is significantly associated with a specific cellular compartment or macromolecular complex, and (iii) the "Jensen DISEASES" library, to determine whether a list of genes is significantly associated with a specific disease.
4. The UCSC genome browser (Rosenbloom et al., 2015): retrieving the sequences of MER41 LTRs and their precise localization in the promoter region of ID-associated genes in the human genome (Human genome assembly GRCh38/hg38) and in the Pan troglodytes genome (Chimpanzee genome assembly CSAC 2.1.4/panTro4).
5. The Swiss Institute of Bioinformatics (SIB) sequence alignment web tool LALIGN (SIB Swiss Institute of Bioinformatics Members, 2016): performing sequence comparisons between human and chimpanzee MER41 LTRs located in the promoter region of ID-associated genes. For each of these genes we checked the presence, nature and precise localization of MER41 LTR sequences in the promoter region.
6. The UniProt database of protein sequence and functional information (The UniProt Consortium, 2018): performing protein sequence alignments between Homo sapiens and Pan troglodytes for immune TFs binding MER41 LTRs in the promoter regions of ID-associated genes.
7. The Brain RNA-Seq database (Zhang et al., 2016): exploring mRNA expression profiles obtained by RNA-Seq analyses in primary cultures of human neurons, astrocytes or macrophages/microglia.
8. The TISSUES database (Palasca et al., 2018): determining, for a given gene, which tissues harbor the highest levels of expression across a large range of normal human tissues. This database compiles results from four large expression atlases generated by pan-genomic and/or pan-proteomic analyses of normal human tissues (Su et al., 2004; Clark et al., 2007; Krupp et al., 2012; Fagerberg et al., 2014).
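The promoter criterion used with EnHERV (a sense-oriented LTR within the 2 kb upstream of the TSS) can be illustrated with a minimal Python sketch. This is not the EnHERV implementation: the class names, the coordinates and the choice to count any overlap with the 2 kb window (rather than full containment) are assumptions made for illustration only.

```python
from dataclasses import dataclass

PROMOTER_WINDOW = 2000  # bp upstream of the TSS, as stated in the text


@dataclass
class Gene:
    name: str     # hypothetical label
    tss: int      # TSS coordinate on the chromosome
    strand: str   # "+" or "-"


@dataclass
class Ltr:
    family: str   # e.g. "MER41B"
    start: int    # start <= end, chromosome coordinates
    end: int
    strand: str   # "+" or "-"


def in_promoter(gene: Gene, ltr: Ltr, window: int = PROMOTER_WINDOW) -> bool:
    """Return True if the LTR is sense-oriented and overlaps the upstream window.

    Assumption: any overlap with the window counts as promoter-localized.
    """
    if ltr.strand != gene.strand:
        return False  # antisense LTRs are ignored
    if gene.strand == "+":
        lo, hi = gene.tss - window, gene.tss - 1   # upstream = lower coordinates
    else:
        lo, hi = gene.tss + 1, gene.tss + window   # upstream = higher coordinates
    return ltr.start <= hi and ltr.end >= lo


# Made-up coordinates, for illustration only
gene_plus = Gene("GENE_X", tss=10_000, strand="+")
print(in_promoter(gene_plus, Ltr("MER41B", 9_000, 9_500, "+")))
```

On the minus strand the upstream window lies at higher coordinates than the TSS, which is why the two strands are handled separately.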

The Promoter Regions of ID-Associated Genes Are Uniquely Enriched in MER41 LTR Sequences
We queried the EnHERV database and web tool (Tongyoo et al., 2017) to determine whether candidate lists of cognition/behavior-related genes were enriched in genes harboring promoter-localized HERV LTRs (more precisely: sense-oriented solo HERV LTR sequence(s) localized in the promoter region located 2 kb upstream of the TSS). We performed such an analysis successively for the 133 families of HERV that can be mined on the EnHERV website. Three lists of cognition/behavior-related genes were assessed (Supplementary Table 1): (i) a list of high confidence ASD susceptibility genes established by the SFARI consortium (Abrahams et al., 2013) and based on expert-operated manual curation of the literature, (ii) a recently established list of putative schizophrenia-causing genes inferred from the integrative analyses of genome wide association studies (Ma et al., 2018), and (iii) a list of genes for which mutations or deletions are considered as causative of intellectual disability based on a manual curation of the literature (Kochinke et al., 2016). As indicated in the original paper describing the EnHERV web tool (Tongyoo et al., 2017), results were considered as statistically significant when both of the following criteria were fulfilled: a Fisher exact test P-value <0.001 and an odds ratio >1. Using this approach, we found that the promoter regions of ID-associated genes were highly significantly enriched in MER41 LTRs (P-value = 0.0004; odds ratio = 4.28). Results were not significant for any of the other 132 HERV families that can be mined on the EnHERV website nor for the promoter regions of ASD- or schizophrenia-associated genes. Further supporting the specificity of our findings, when analyzing the 22 lists of non-CNS related genes provided as training lists by the EnHERV server, we did not find any significant enrichment in genes with promoter-localized MER41 LTR sequences.
To confirm our findings, we retrieved from the EnHERV website the whole list of coding genes which, in humans, harbor a sense-oriented promoter-localized MER41 LTR sequence. On this list of 79 genes (Supplementary Table 2), we then performed enrichment analyses using the Enrichr website as described in the Materials and Methods section. We found no statistically significant enrichments with regard to "biological process" GO terms, tissue-specific expression or sub-cellular localization of gene products. However, text-mining enrichment analysis unraveled a significant enrichment in genes associated with the term "Intellectual disability" (Table 1).
Since enrichment analysis based on text mining may be biased by the identification of a non-causative link between a given gene and the term "Intellectual disability", we took into account only genes that had been identified as causative of ID (Kochinke et al., 2016). On this basis, out of 79 human genes harboring a MER41 LTR sequence in their promoter region, nine had an established causative link with ID. The genes and associated genetic conditions are summarized in Table 2. The "biological process" GO terms that annotate those nine genes are shown in Supplementary Table 3. Overall, these data point to a yet unrecognized potential link between promoter-localized MER41 LTRs and cognition.
Based on the analysis of the "Jensen DISEASES" library, the highest statistical scores were obtained with the term "Intellectual disability." The five most significant enrichments are shown.
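The two significance criteria (Fisher exact test P-value <0.001 and odds ratio >1) can be sketched as follows in standard-library Python. This is an illustrative sketch rather than the EnHERV code: the one-sided (enrichment) form of the test and the layout of the 2x2 table are assumptions, and the counts in the example call are placeholders, not the study's data.

```python
from math import comb


def fisher_enrichment(a: int, b: int, c: int, d: int):
    """One-sided Fisher exact test for enrichment on a 2x2 table.

    a: listed genes with a promoter LTR      b: listed genes without
    c: background genes with a promoter LTR  d: background genes without
    Returns (p_value, odds_ratio).
    """
    n = a + b + c + d
    listed = a + b          # size of the queried gene list
    with_ltr = a + c        # all genes with a promoter LTR
    total = comb(n, with_ltr)
    # Hypergeometric upper tail: probability of observing >= a listed genes with an LTR
    p_value = sum(
        comb(listed, k) * comb(n - listed, with_ltr - k)
        for k in range(a, min(listed, with_ltr) + 1)
    ) / total
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return p_value, odds_ratio


def is_significant(p_value: float, odds_ratio: float) -> bool:
    """Apply both thresholds quoted in the text."""
    return p_value < 0.001 and odds_ratio > 1


# Placeholder counts (hypothetical, for illustration only)
p, odds = fisher_enrichment(3, 1, 1, 3)
print(p, odds, is_significant(p, odds))
```

Requiring an odds ratio above 1 in addition to a small P-value restricts the call to over-representation, so that a significant depletion is not reported as an enrichment.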

LTRs From Distinct Members of the MER41 Family of HERVs Are Inserted in the Promoter Regions of ID-Associated Genes
TFBSs in LTRs from the MER41 family (MER41A-E and MER41G) have been shown to vary depending on the MER41 member considered (Chuong et al., 2016). Using the EnHERV database and web tool, we have identified several MER41 members for which LTRs can be demonstrated in the promoter regions of ID-associated genes. As shown in Table 3, only three ID-associated genes harbored a MER41B LTR in their promoter region: CEP290, DDHD2, and GCSH. This indicates a potential transcriptional regulation of these three genes by the IFNγ/STAT1 pathway.
Interestingly, MER41 LTRs located in the promoter regions of ID-associated genes also include MER41A LTRs, which lack STAT1 binding sites (Chuong et al., 2016). This observation urged us to determine whether other immune pathways (non-IFNγ/STAT1-mediated) may regulate the transcription of ID-associated genes via MER41 LTRs. To this aim we used the HERV database and web tool "db-HERV-REs" (Ito et al., 2017), which allows the identification of experimentally demonstrated TFBSs in HERV LTRs.
Gene symbols (left column) and the corresponding MER41 family or families (right column) are shown.

YY1 Is the Sole Transcription Factor Harboring TFBSs in All the MER41 LTRs Inserted in the Promoter Regions of ID-Associated Genes
Using the approach described above, we have identified 32 TFs that bind MER41 LTRs in the promoter regions of ID-associated genes (Table 4).
As expected, MER41B LTR comprises a STAT1 consensus sequence while MER41A LTR does not. Interestingly, a YY1 consensus sequence is present in the LTRs of all the MER41A-E members. It is worth noting that mutations/deletions in YY1 are responsible for the Gabriele-De Vries syndrome, an autosomal dominant neurodevelopmental disorder characterized by intellectual disability, delayed psychomotor development and frequent autistic symptoms (Gabriele et al., 2017). Interestingly also, mutations in CTCF, another gene encoding a MER41 LTR-binding TF, are causally linked to "Mental retardation, autosomal dominant 21," a developmental disorder characterized by significantly below-average general intellectual functioning associated with impairments in adaptive behavior (Gregor et al., 2013). Other inherited disorders associated with the above identified TF genes are summarized in Table 5 and notably include three groups of immune-related diseases induced by genetic alterations of the canonical immune TFs STAT1, STAT3, and NFKB1, respectively.
To summarize, besides STAT1, we have identified two canonical immune TFs, STAT3 and NFKB1, which bind specific MER41 LTRs in the promoter regions of ID-associated genes. Another TF, YY1, binds all MER41 LTRs in the promoter regions of ID-associated genes.

YY1 Interacts With a Unique Network of Immune TFs That Bind MER41 LTRs in the Promoter Regions of ID-Associated Genes
We then explored the BioGRID database of human protein interactions (Chatr-aryamontri et al., 2015) to determine whether STAT1, STAT3, and/or NFKB1 were reported to physically interact with each other and/or with YY1 and other TFs binding MER41 LTRs in the promoter regions of ID-associated genes. Interestingly, in the retrieved interaction network (Figure 2), we observed that YY1, via its interaction with NFKB1, is connected to a specific set of MER41 LTR-binding TFs that interact with STAT1, STAT3, and/or NFKB1. An analysis of the GO terms "Biological process" annotating each of these TFs indicates that besides STAT1, STAT3, and NFKB1, other members of this unique set of TFs exert immune functions (Supplementary Table 4). This is notably the case for YY1. Moreover, some of such immune functions are linked to specific cytokines among which IL-1 and IL-6 are the most commonly shared in the retrieved GO terms (Figure 3). This result indicates that the prototypical proinflammatory molecules IL-1 and IL-6 are possibly involved in the transcriptional regulation of ID-associated genes displaying promoter-localized MER41 LTRs.
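The step of finding which cytokines are most commonly shared across the GO annotations of the TFs amounts to counting, for each cytokine, the number of distinct TFs functionally linked to it. A minimal Python sketch follows. The function name is hypothetical, the TF and cytokine labels are placeholders rather than retrieved annotations, and the default threshold of three links is taken from the Figure 3 legend.

```python
from collections import Counter


def cytokine_hubs(tf_to_cytokines: dict, min_links: int = 3) -> dict:
    """Count distinct TFs linked to each cytokine and keep those with >= min_links."""
    counts = Counter(
        cytokine
        for cytokines in tf_to_cytokines.values()
        for cytokine in set(cytokines)   # count each TF-cytokine link once
    )
    return {c: n for c, n in counts.items() if n >= min_links}


# Placeholder annotations (hypothetical, not data from the study)
example = {
    "TF_A": ["CYT_1", "CYT_2"],
    "TF_B": ["CYT_1"],
    "TF_C": ["CYT_1", "CYT_2"],
}
print(cytokine_hubs(example))
```

In practice the input mapping would be built from the "Biological process" GO terms annotating each TF; how cytokine names are extracted from those terms is left out of this sketch.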

Chimpanzees vs. Homo sapiens Comparisons of MER41A-E LTR Sequences and Insertion Sites in the Promoter Regions of Cognition Related (ID-Associated) Genes
As mentioned, MER41 HERVs integrated into the genome of a primate ancestor 45-60 million years ago. The process of so-called "ERV domestication" (Dewannieux and Heidmann, 2013) relies on mechanisms that are not only species-specific, but may have partly shaped speciation (Johnson, 2015). Accordingly, in primates, the species-specific domestication of MER41 HERVs translates into the existence of species-specific differences regarding the insertion sites and/or sequences of integrated (fixed) LTRs. On this basis, we investigated whether the promoters of ID-associated genes harbored the same MER41A-E LTRs in humans and chimpanzees (Supplementary Table 5). Out of the nine candidate genes examined we found that five exhibited, in both species, MER41 LTR sequences belonging to the same family and displaying 95-100% homology (Supplementary Table 5). That being said, in two ID-associated genes MER41 LTR sequences were found at distances larger than 2 kb from the TSS in chimps and, for two other genes (CDH15 and GCSH), MER41 LTR sequences were absent, at least up to 10 kb from the TSS in chimps. These results are indicative of differences that may prove functionally relevant with regard to the MER41 LTR-mediated transcriptional regulation of specific ID-associated genes. This remains to be experimentally explored. It is of note that, according to the classification provided by the Gene Ontology (GO) consortium (The Gene Ontology Consortium, 2017, 2019), three of the genes displaying such promoter-localized differences are annotated with "Biological process" GO terms that may possibly account for distinctive features between humans and chimpanzees (Supplementary Table 3). These include the terms "visual learning" and "locomotor behavior" for DDHD2, "hindbrain development" for CEP290 and "glycine catabolic process" for GCSH (glycine being a major inhibitory neurotransmitter).
To complement these investigations, we also assessed whether key immune TFs putatively involved in the immune/MER41/cognition pathway exhibited humans vs. chimps differences regarding their amino acid sequences (Supplementary Table 6). Using the UniProt web tool "Align", comparisons retrieved 100% homology between humans and chimps in the amino acid sequences of NFKB1, STAT1, STAT3, and CEBPB. A 99.7% homology was retrieved for YY1 and no functionally relevant amino-acid substitution in the compared YY1 sequences could be predicted according to the UniProt Align webtool.
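The percentage figures reported for the human vs. chimpanzee comparisons can be read as the share of identical positions in a pairwise alignment. The sketch below shows one way to compute such a value in Python from two already aligned sequences. It is not the LALIGN or UniProt Align implementation: the function name is hypothetical, and the convention used here (denominator = all alignment columns, with gapped columns counted as mismatches) is an assumption, since tools differ on this point.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where both sequences carry a gap are ignored;
    columns with a gap in only one sequence count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (made up, for illustration only)
print(percent_identity("ACGT", "ACGA"))
```

A value of 100% under this convention means that every aligned residue is identical and that the alignment contains no gaps.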

YY1 Links FOXP2 to Immune TFs Binding MER41 LTRs in the Promoter Regions of Cognition-Related (ID-Associated) Genes
FOXP2, a TF abundantly expressed in cortical neurons, is involved in the emergence of human speech (Vernes et al., 2007, 2011; Fisher and Scharff, 2009; Scharff and Petri, 2011; Xu et al., 2018). Until recently, the transcriptional activity of FOXP2 was thought to rely on the recognition of specific TFBSs by FOXP2/FOXP2 homodimers or by FOXP1/FOXP2 or FOXP4/FOXP2 heterodimers (Wang et al., 2003; Sin et al., 2015).
FIGURE 3 | Cytokines functionally linked to TFs that bind MER41 LTRs in the promoter regions of ID-associated genes. An analysis was performed of the GO terms "biological process" that annotate each TF that binds MER41 LTRs in the promoter regions of ID-associated genes. This allowed determining which cytokines are functionally linked with these TFs and potentially mediate an immune regulation of the MER41/cognition pathway. Such functional links are depicted as black lines. TFs are highlighted in yellow. Cytokines are highlighted in gray or, alternatively, in red for cytokines harboring three or more functional links with TFs. IFNG: interferon-gamma; IFN: interferon.
However, a recent work established a short list of TFs that bind FOXP2 and are likely to form heterodimers that regulate FOXP2 availability and/or DNA binding properties in neurons (Estruch et al., 2018). Interestingly, YY1 was identified as one of the seven newly identified FOXP2-interacting TFs. We thus sought to determine whether YY1 could potentially represent a molecular link between FOXP2 and the immune/MER41/cognition pathway we identified. To this aim, we explored data obtained from a recent work attempting to identify a set of FOXP2 targets that are specific to human FOXP2 in neurons (Oswald et al., 2017). More precisely, data from this study were obtained by: (i) a meta-analysis of previous works reporting on FOXP2 neuronal targets (based notably on ChIP-Seq analyses of the neuronal cell line SH-SY5Y; Spiteri et al., 2007; Vernes et al., 2007, 2011; Enard et al., 2009; Konopka et al., 2009; Hilliard et al., 2012) and (ii) a comparison of neuronal genes that are targeted by human FOXP2 vs. non-human primate orthologs of FOXP2 in the neuronal cell line SH-SY5Y (Oswald et al., 2017). A set of 40 candidate proteins encoded by FOXP2-targeted genes was identified. Protein interactors of these candidate targets were added in order to establish a final list of 80 proteins that are putatively regulated by FOXP2 in neurons in a human-specific manner. Interestingly, when performing enrichment analyses of the list of genes encoding these 80 proteins (Supplementary Table 7), we found a highly significant enrichment in genes that are either associated with immune-related terms, as identified by text mining ("immune System", "NFKB complex", "arthritis" and others), or linked to immune biological processes according to the GO term classification ("cellular response to IL-21," "cellular response to IL-2" and others). Of note, the list of genes putatively regulated by FOXP2 in a human-specific manner is significantly enriched in genes involved in "interleukin-6-mediated signaling pathway" (adjusted p-value: 0.0008), pointing again to IL-6 as a possible important player in the immune/MER41/cognition pathway. Finally, such a list of FOXP2 targets comprised STAT3 and several protein partners of STAT1, STAT3 and/or NFKB1.
FIGURE 4 | Mapping of the protein and functional network linking FOXP2 to immune TFs that bind MER41 LTRs in the promoter regions of cognition-related (ID-associated) genes. A survey of the human proteome was performed on the BioGRID database (Chatr-aryamontri et al., 2015) in order to map protein-protein interactions between FOXP2, STAT1, STAT3, NFKB1, YY1 and the molecules identified as being human-specifically regulated by FOXP2 in the neuronal cell line SH-SY5Y (Oswald et al., 2017). Ellipse shapes highlighted in yellow represent FOXP2, its protein partner YY1 and the three immune TFs that bind MER41 LTRs in the promoter regions of cognition-related (ID-associated) genes (namely STAT1, STAT3 and NFKB1). Ellipse shapes highlighted in gray represent molecules that interact with these TFs and are putatively regulated by FOXP2 in a human-specific manner. Rectangles represent MER41 families displaying LTRs in the promoter regions of cognition-related (ID-associated) genes. Black lines indicate protein-protein interactions. Dashed green lines indicate protein-DNA interactions between TFs and MER41 LTRs located in the promoter regions of cognition-related (ID-associated) genes. Red lines indicate molecules that are targeted by FOXP2 in a human-specific manner.
Integrating these data with the demonstrated interaction of FOXP2 with YY1 allows us to map a network of immune TFs that link FOXP2 to specific cognition-related (ID-associated) genes exhibiting promoter-localized MER41 LTRs (Figure 4).

Human Neural Cells Express Key Immune Genes Involved in the Immune/MER41/Cognition Pathway
To further assess the relevance of our findings, we surveyed two independent databases which allow determining the neural expression of key genes putatively involved in the immune/MER41/cognition pathway. These databases comprise: (i) the "TISSUES" database (Palasca et al., 2018) which compiles manually curated expression results obtained in four distinct expression atlases (Su et al., 2004; Clark et al., 2007; Krupp et al., 2012; Fagerberg et al., 2014) covering a large range of normal human tissues and (ii) the recently launched "Brain RNA-Seq" database (Zhang et al., 2016) which allows exploring expression profiles observed in primary cultures of human neurons, astrocytes or macrophages/microglia. In our survey, the lymphocyte-specific gene markers CD3G and ZAP70 were used as negative controls. Data retrieved from the "Brain RNA-Seq" database showed that in cultured human neurons, CD3G and ZAP70 are expressed at levels considered as below the detection threshold (Supplementary Table 8). Of potential interest also, retrieved data showed that IL-6 is constitutively expressed by cultured macrophages/microglia derived from the brains of humans but not mice (data not shown). Regarding the expression pattern of candidate genes in normal human tissues, data retrieved from the "TISSUES" database showed that, as expected, CD3G and ZAP70 exhibited their highest levels of expression in lymphoid tissues (e.g., thymus, tonsils or lymph nodes; Supplementary Table 8). However, surprisingly, the prototypical immune-related genes STAT1, STAT3, NFKB1, IL6R and IL6ST were reported to display their highest (or second highest) levels of expression in the human brain (Supplementary

DISCUSSION
We have found that, in the human genome, the promoter regions of ID-associated genes are uniquely enriched in MER41 LTRs. More specifically, nine ID-associated genes that are putatively important in cognitive evolution exhibit MER41 LTRs in their promoter regions. As more than 100 families of HERV are integrated into our genome, it was important to determine whether our findings are specific to MER41 and to ID-associated genes, and if so to what extent. Among the 133 families of HERV explored here, MER41 is the only family whose LTRs were found at a statistically significant high frequency in the promoter regions of ID-associated genes. It must be emphasized that, while many HERV families are inherited from ancestors common to all mammals, the MER41 family is detected exclusively in the genome of primates. Interestingly, we have observed substantial differences between humans and chimpanzees regarding the localization of MER41 LTRs in the promoter regions of ID-associated genes. These results suggest that the MER41 family of HERVs could have been involved in cognitive changes after our split from chimps. In this scheme, infection and horizontal transmission of the exogenous virus from which MER41 HERVs derive would have occurred in a community of primate ancestors and would have led to germline infection, followed by vertical transmission and, in fine, endogenization. If so, genomic evolution from these primate ancestors would have been, at least in part, affected by the processes of HERV endogenization and domestication, which are themselves mainly dictated by the host's immune system (Dewannieux and Heidmann, 2013). Accordingly, differences regarding the insertion sites of MER41 LTRs in the promoter region of a large range of genes, including cognition-related (ID-associated) genes, might have played roles in cognitive speciation. It is worth noting that MER41 LTRs are not enriched in the promoter regions of ASD- or schizophrenia-related genes.
This finding suggests that selected aspects of cognitive evolution in the primate lineage are linked to MER41. Our work also indicates that, in humans, immune regulation of the MER41/cognition pathway is not limited to IFNγ and its main downstream signaling molecule, STAT1. Indeed, the MER41 LTRs located in the promoter regions of cognition-related (ID-associated) genes harbor TFBSs for a group of five interacting immune-related TFs (STAT1, STAT3, NFKB1, YY1, and CEBPB) which are themselves functionally linked to multiple cytokines including IFNγ. Moreover, in this functional network, the prototypical pro-inflammatory cytokine IL-6 rather than IFNγ appears to be the main hub. Thus, cognitive evolution after our split from chimps might have been influenced by the process of endogenization and domestication of MER41 HERVs and by the parallel genomic evolution of immune genes. In this view, it is worth noting that, overall, immune genes harbor the highest levels of purifying selection in the human genome, which reflects the key functions of immunity in the defense against life-threatening infectious agents (Daub et al., 2013; Deschamps et al., 2016; Delgobo et al., 2019). This is notably the case for STAT1 (Deschamps et al., 2016) and for genes involved in the IL-6 pathway such as, in particular, IL-6, IL6ST, STAT3, and CEBPB (Daub et al., 2013; Delgobo et al., 2019).
Owing to its putatively important role in the immune/MER41/cognition pathway, YY1 deserves particular attention. Indeed, YY1 binding sites are observed in the LTRs from all MER41 subtypes (MER41 A to E) and YY1 is a direct protein partner of both NFKB1 and FOXP2, two TFs exerting major roles in immunity and language, respectively. Moreover, YY1 is not only recognized as being crucially involved in CNS development (as notably shown in the inherited brain disorder "Gabriele-de Vries syndrome"), but also as exerting major functions in the immune system. In particular, YY1 was demonstrated to inhibit differentiation and function of regulatory T cells by blocking Foxp3 expression (Hwang et al., 2016) and to regulate effector cytokine gene expression and T(H)2 immune responses (Guo et al., 2008). We previously proposed that the nervous and immune systems have somehow co-evolved to the benefit of both systems, particularly regarding cognitive evolution, including our language-readiness (Benítez-Burraco and Uriagereka, 2016; Nataf, 2017a,b). In this context, YY1 may represent a new molecular connection between immunity and cognition and, even more specifically, between immunity and speech or language more generally.
NFKB1 may also be of specific interest since its neuronal expression was reported to be essential to behavior and cognition in both invertebrates and mammals (Meffert and Baltimore, 2005; Mattson and Meffert, 2006; Kaltschmidt and Kaltschmidt, 2009; Dresselhaus et al., 2018). In the central nervous system of rodents, components of the NFKB complex are detectable in neuronal processes and in synapses under physiological conditions (Salles et al., 2014; Dresselhaus et al., 2018). Moreover, synaptic transmission as well as exposure to neurotrophins activate the NFKB pathway in neurons (Meffert and Baltimore, 2005; Mattson and Meffert, 2006; Kaltschmidt and Kaltschmidt, 2009). In turn, NFKB activation in neurons triggers the transcription of multiple neuronal genes that may favor cognition and shape behavior. This is notably the case for neuropeptide Y and BDNF (Snow and Albensi, 2016).
The immune-mediated retrotranscription of specific HERVs has been reported to potentially worsen the outcome of CNS disorders (Douville et al., 2011; Kremer et al., 2013; Douville and Nath, 2017; Küry et al., 2018). While our work unravels the putative evolutionarily determined advantage conferred by the immune/MER41/cognition pathway in humans, it also points to the potential weaknesses that are inherent to such a pathway. Indeed, we propose that alterations of the immune/MER41/cognition pathway might contribute to the development of non-inherited forms of ID. Such a dysfunction might be induced by the untimely or quantitatively inappropriate exposure of neurons to specific cytokines, notably IL-6 or IFNγ, which physiologically shape neurotransmission (Chourbaji et al., 2006; Baier et al., 2009; Victório et al., 2010; Litteljohn et al., 2014; Gruol, 2015) and are possibly involved in the immune/MER41/cognition pathway. However, experiments are needed to demonstrate that the immune/MER41/cognition pathway actually operates in the human brain. In particular, in vitro experiments performed on human neural cells could determine whether IL-6 and/or IFNγ regulate the expression of ID-associated genes harboring promoter-localized MER41 LTRs and, if so, whether such a process occurs via the binding of STAT1, STAT3, NFKB1, YY1 and/or CEBPB to MER41 LTRs.
In any case, our work reinforces the notion of neuroimmune co-evolution that we previously put forward (Benítez-Burraco and Uriagereka, 2016; Nataf, 2017a,b). In this general frame, we would like to propose that, besides the potential role of endogenous immune cues (Nataf, 2017a,b), immune signals triggered by infectious agents might have been important to cognitive evolution. In particular, depending on their pathogenicity, such infectious agents could have exerted a neuroimmune selection pressure over millions of years (e.g., via the self-domestication of HERVs) or during short periods of time (e.g., via the occurrence of life-threatening epidemics of viral or bacterial infections). In this view, our findings provide general support to the hypothesis previously enunciated by Piattelli-Palmarini and Uriagereka (Piattelli-Palmarini and Uriagereka, 2004) and updated by Benítez-Burraco and Uriagereka (Benítez-Burraco and Uriagereka, 2016), which states that the recent emergence of linguistic skills would have been triggered by a fast-propagating virus.

AUTHOR CONTRIBUTIONS
SN performed the bioinformatics analyses and wrote the manuscript. AB-B and JU wrote the manuscript.

FUNDING
This work benefited from private funds attributed to SN for bioinformatics analyses that are unrelated to the content of this paper.