Effects of HAR1 on cognitive function in mice and the regulatory network of HAR1 determined by RNA sequencing and applied bioinformatics analysis

Background: HAR1 is a 118-bp segment that lies in a pair of novel non-coding RNA genes. It shows a dramatic accelerated change with an estimated 18 substitutions in the human lineage since the human–chimpanzee ancestor, compared with the expected 0.27 substitutions based on the slow rate of change in this region in other amniotes. Mutations of HAR1 lead to a different HAR1 secondary structure in humans compared to that in chimpanzees. Methods: We cloned HAR1 into the EF-1α promoter vector to generate transgenic mice. Morris water maze tests and step-down passive avoidance tests were conducted to observe the changes in memory and cognitive abilities of mice. RNA-seq analysis was performed to identify differentially expressed genes (DEGs) between the experimental and control groups. Systematic bioinformatics analysis was used to confirm the pathways and functions that the DEGs were involved in. Results: Memory and cognitive abilities of the transgenic mice were significantly improved. The results of Gene Ontology (GO) analysis showed that Neuron differentiation, Dentate gyrus development, Nervous system development, Cerebral cortex neuron differentiation, Cerebral cortex development, Cerebral cortex development and Neurogenesis are all significant GO terms related to brain development. The DEGs enriched in these terms included Lhx2, Emx2, Foxg1, Nr2e1 and Emx1. All these genes play an important role in regulating the functioning of Cajal–Retzius cells (CRs). The DEGs were also enriched in glutamatergic synapses, synapses, memory, and the positive regulation of long-term synaptic potentiation. In addition, “cellular response to calcium ions” exhibited the second highest rich factor in the GO analysis. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of the DEGs showed that the neuroactive ligand–receptor interaction pathway was the most significantly enriched pathway, and DEGs also notably enriched in neuroactive ligand–receptor interaction, axon guidance, and cholinergic synapses. Conclusion: HAR1 overexpression led to improvements in memory and cognitive abilities of the transgenic mice. The possible mechanism for this was that the long non-coding RNA (lncRNA) HAR1A affected brain development by regulating the function of CRs. Moreover, HAR1A may be involved in ligand–receptor interaction, axon guidance, and synapse formation, all of which are important in brain development and evolution. Furthermore, cellular response to calcium may play an important role in those processes.

exhibited the second highest rich factor in the GO analysis. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of the DEGs showed that the neuroactive ligand-receptor interaction pathway was the most significantly enriched pathway, and DEGs also notably enriched in neuroactive ligand-receptor interaction, axon guidance, and cholinergic synapses.
Conclusion: HAR1 overexpression led to improvements in memory and cognitive abilities of the transgenic mice. The possible mechanism for this was that the long non-coding RNA (lncRNA) HAR1A affected brain development by regulating the function of CRs. Moreover, HAR1A may be involved in ligand-receptor interaction, axon guidance, and synapse formation, all of which are important in brain development and evolution. Furthermore, cellular response to calcium may play an important role in those processes. KEYWORDS HAR1, lncRNA, brain development, transgenic mouse, RNA-seq

Introduction
Human accelerated regions (HARs) consist of 49 segments of the human genome that have been conserved through vertebrate evolution, although they are strikingly different in humans (Hubisz and Pollard, 2014). HARs may be responsible for the unique characteristics of our species, since they are related to the evolution of size, structure, and complexity of the human brain. HAR1 is a human genome region located in the long arm of chromosome 20. This 118-bp region contains 18 mutations between humans and chimpanzees. These mutations cause different secondary structures of lncRNA HAR1A between humans and chimpanzees (Pollard et al., 2006a). HAR1A, in particular, is one of the most distinctively different HARs between humans and chimpanzees (Pollard et al., 2006b;Kostka et al., 2012). Furthermore, dysregulation of HAR1A has been associated with many central nervous system diseases.
LncRNA HAR1A is explicitly expressed in Cajal-Retzius cells (CRs), which are well-known for controlling nerve cell migration during brain development, suggesting that HAR1A is important in brain development (Cao et al., 2021;Lee et al., 2021). In addition, the expression of lncRNA HAR1A affects the occurrence and development of many brain diseases, including Huntington's disease and glioma Wei et al., 2021). However, genes regulated by HAR1A in the developing brain have not been systematically studied yet. Therefore, its association with the development and evolution of the central nervous system remains relatively unclear.
HAR1A belongs to lncRNAs, which are non-coding RNAs longer than 200 nucleotides. Accumulating evidence has demonstrated that lncRNAs exert epigenetic effects by regulating transcriptional and post-transcriptional processes. Moreover, lncRNAs have been reported to regulate human brain development, neural stem cell regulation, and neuronal axon elongation Wei et al., 2021). The involvement of lncRNAs in regulating neurological growth and development allows the nervous system to grow and differentiate in the regular order of time and space (Wei et al., 2021).
In this study, we explored the effects of HAR1 overexpression on the cognition and memory of mice and analyzed the related functions and enriched pathways of differentially expressed genes (DEGs). This study aimed to identify the role of HAR1 in brain development and deepen the understanding of the molecular regulatory mechanism of lncRNAs in neurological development.

Mouse models
The C57BL/6 mouse strain was used in this study. Six transgenic mice (experimental group) were provided by Cyagen Bioscience Inc. (Guangzhou, China). Eleven wild-type mice were provided by the Guangdong Medical Laboratory Animal Center (Guangzhou, China) and analyzed in parallel with the transgenic mice as a control group. All mice were placed in a specific pathogen-free cage and subjected to a 12 h light-dark cycle in an approved facility.
The transgenic mice were produced by cloning the human HAR1 gene into an EF-1α promoter vector. The pRP (Exp)-EF1A vector (Cyagen Biosciences, Guangzhou, China) was used to construct the transgenic mice. The sequence of the target HAR1 gene was: ATGAAACGGAGGAGACGTTACAGCAACGTGTCA GCTGAAATGATGGGCGTAGACGCACGTCAGCGGCGGAAA TGGTTTCTATCAAAATGAAAGTGTTTAGAGATTTTCCTC AAGTTTCA. The combined sequence of the HAR1 gene and EF-1α promoter vector was named H119, with a length of 252 bp, and was randomly inserted into genomes.
All animal experiments were carried out according to the Experimental Animal Center of the Guangzhou Medical University's guidelines and approved by the Guangzhou Medical University's Ethics Committee (approval number: 2020-121).

Morris water maze (MWM) test
The experimental and control groups were trained in a round open pool (diameter: 1.7 m; depth: 0.3 m) at 23°C ± 1°C (Vorhees and Williams, 2006;Yu et al., 2018). The maze was divided into four equal quadrants (Pollard et al., 2006a;Pollard et al., 2006b;Kostka et al., 2012;Hubisz and Pollard, 2014) by specifying two orthogonal axes, the ends of which were marked as four base points: north (N), south (S), east (E), and west (W). A camera Frontiers in Genetics frontiersin.org 02 (FDR-AX700, Sony, Japan) was suspended from the middle of the ceiling to record the movements of the rats. The space acquisition task was tested four times a day, with an interval of 15 s, for five consecutive days.
The escape platform (diameter: 0.1 m) was approximately 1 cm below the water surface, located at the center of quadrant 2 (target quadrant). The mice were gently released from various starting points into the water and were allowed to seek the hidden platform for 60 s. Mice were manually guided to the target platform if they failed to find it within 60 s. After a 6-day delay from the last space acquisition training day, exploratory trials were conducted to assess the long-term memory of the mice (Yu et al., 2018). Tracking software (EthoVision XT 10, Noldus Information Technology, Leesburg, VA, United States) was used to analyze the results, including the escape latency in the spatial acquisition days and the latency to the target platform (probe time) during the probe trial.

Step-down test
The one-trial step-down test was performed to assess inhibitory avoidance and long-term memory, including 5-min training and a 5-min test after 24 h. The size of the chamber was approximately 0.18 (height) × 0.12 (width) × 0.12 (depth) m. The floorboard was an electrified grid composed of 1-mm copper bars in parallel, with a spacing of 0.05 m, and there was also a high-rising rubber platform (diameter: 0.24 m) in one corner of the chamber. Mice were placed on the platform with their noses directed toward the bottom corner. In the training session, when the mice stepped on the grid, they would receive an immediate shock (36V, AC), and as a result would instinctively jump up to the high-rising platform to avoid the electric shock. The time taken for the mice to jump from the high-rising platform to the grid bottom (step-down latency) and the number of times the mice jumped from the high-rising platform during the training phase (error counts) were recorded. In the experimental phase, the same procedure was conducted. The equipment was carefully cleaned after each test session to reduce the possibility of odor interference. After the step-down test, mice were euthanized by cervical dislocation, and their brain samples were subsequently collected (Yu et al., 2017).

RNA extraction
The TRIzol method (Invitrogen) was used to extract total RNA from the brain tissue of the experimental group mice and the control group mice. The Agilent 2100 Bioanalyzer was used to assess the concentration and quality of RNA.
The products were separated using 2% agarose gel electrophoresis. The gel was left to run for 90 min with an 80 V/ 70 A current. A UV transilluminator was used to visualize the products on the gel after red staining.

Differential gene expression analysis of the RNA-seq data
After obtaining the raw data, we performed quality control (QC) to remove adapters, N-terminus, and low-quality reads. FastQC (v 0.11.5) was used for QC with the parameters -a = given adapter sequence, --output = output path, and -m = 40 and others using default parameters. Trimmed reads sample data were aligned to reference sequences using Bowtie2 software (v2.2.6) with the following parameters: --read-edit-dist 3, -r 70, --library-type frunstranded, and others using default parameters. HTSeq-Count (0.6.1p1) was used for the expression quantification with the following parameters: --stranded = no, --format = bam, --mode = intersection-nonempty, and others using default parameters. The DEGs between the experimental and control groups were screened using the DESeq2 package (1.12.4), with the following parameters: log2 (foldchange) >1, Padj <0.05. Here, Padj means the p-adjustment corrected by the Benjamini and Hochberg method after multiple test adjustments.

GO and KEGG pathway enrichment analyses
The DEGs were analyzed by cluster analysis, pathway significant enrichment analysis (Padj <0.05), and Gene Ontology (GO) functional enrichment analysis (Padj <0.05). GO enrichment analyses were based on the Wallenius non-central hypergeometric distribution. GO and KEGG enrichment analyses were conducted in Frontiers in Genetics frontiersin.org 03

FIGURE 1
Escape latencies of the experimental and control groups over five consecutive training days (A). Escape latencies of the experimental and control groups in the test session (B).
Step-down latency of the experimental and control groups during training (C) and test sessions (E). Error counts in the experimental and control groups during the training (D) and test sessions (F).

Validation of differentially expressed genes
Ten DEGs involved in brain development were selected to validate the RNA-Seq results by quantitative real-time PCR (qRT-PCR). Total RNAs were first separated from the experimental and control group samples. The total RNAs were treated with DNase and were used to validate the RNA-seq results. After DNase treatment, 1.5 μg of DNasetreated RNA was reverse-transcribed to cDNA with oligo (dT) 15 using M-MLV reverse transcriptase (Life Technologies) in a 20 μL total reaction volume. Following the reverse transcription reaction, 1.5 μL of the reaction mixture was used to run a qPCR program of 46 cycles consisting of melting (30 s at 95°C), annealing (30 s at 60°C), and extension (30 s at 74°C). The 25 μL reaction mixture contained 1.5 PCR buffer (Mg 2+ Plus), 1 μm of forward primer, 1 μm of reverse primer (Supplementary Table S1), 200 μm of each dNTP, and Roche SYBR-Green master mix in a LightCycler 480 Real-Time PCR System (Roche Applied Science). Data were analyzed by using the 2 -ΔΔCt method. Glyceraldehyde 3-phosphate dehydrogenase (GADPH) was used as the reference gene. All other results were shown as the fold change relative to the GADPH control.

MWM and step-down passive avoidance tests
Our study used the MWM test to estimate spatial learning and long-term memory. During the space acquisition days, the time the mice took to find the hidden target platform (the escape latency) was measured. The escape latency was reduced during 5 days of training for the experimental group; the fifth day (36.0 ± 6.1 s) had about 61% of the value of the first day (59.0 ± 0.7 s). During the fifth day, the escape latency of the control group (43.1 ± 5.6 s) still accounted for about 80% of the value of the first day (54.0 ± 3.6 s) ( Figure 1A). After a 6-day delay from the training course, a probe test was conducted to assess the long-term memory of the mice. In terms of the probe test, it took an average of 28.0 ± 5.4 s for the experimental group to reach the platform position for the first time, compared to 72.1 ± 19.7 s for the control group (p < 0.001; Figure 1B). Therefore, the duration of the probe test for the experimental group was significantly reduced compared to the control group.
This study measured the step-down latency and error count to assess memory retention (Figures 1C-F). During the training period, the latencies of the experimental and control groups were 24.0 ± 4.7 s and 22.0 ± 5.1 s respectively, this difference between the groups was not statistically significant ( Figure 1C). During the test stage, the step-down latency of the experimental group increased by 79.2 ± 18.9 s, significantly more than that of the control group, which increased by 65.0 ± 30.0 s (p < 0.01; Figure 1E). The mice in the experimental group showed an overall smarter performance, with fewer errors in the training period (5.2 ± 0.3 vs 8.2 ± 1.3, p < 0.01; Figure 1D). During the test period, the error counts of the experimental group (2.9 ± 0.9) were significantly lower than those of the control group (6.2 ± 1.6, p < 0.01; Figure 1F).
The results of the MWM and step-down passive avoidance tests showed that HAR1A overexpression significantly improved the performance of both short-and long-term memory, spatial learning, and cognitive ability of mice.

Validation of transgenic mice
The quality of transgenic mice was screened by PCR assay, which resulted in six strong positive results, 1-4 and 8-9 ( Figure 2). This meant that the six mice successfully overexpressed HAR1. Their brain samples were then selected for follow-up experiments.

Review of the RNA-seq datasets
The QC results showed that the base composition was satisfied in all 17 samples. After excluding the adapter sequences and the lowquality reads, the lowest clean reading obtained from the 17 groups was 72,540,330. The number of mapped reads for each sample was more than 65 million, sufficient to provide valid data for further analysis. The map rates (mapped reads/raw reads) were all above 88% (Supplementary Table S2), suggesting that all the constructed libraries have excellent quality.

Analysis of DEGs between the experimental and control groups
Differential gene expression analysis was performed to identify DEGs between the experimental and control groups. A total of 17,897 DEGs (fold change >1 and p < 0.05) were detected between the experimental and control groups, of which 7,933 genes were upregulated and 9,964 genes were downregulated (Figure 3). This Frontiers in Genetics frontiersin.org 05

FIGURE 3
Volcano plots of DEGs. The gray points represent genes with relatively unchanged expression; the green points represent the downregulated genes; and the red points represent upregulated genes. P-value was adjusted using q-value and q-value < 0.005, and log2 (fold change) > 1 was set as the threshold for significant differential expression.

GO functional enrichment analysis
GO enrichment analysis was performed to identify the primary functions of the DEGs. Ultimately, 1,361 important GO terms were obtained, and subsequently, the top 20 GO terms for functional enrichment were selected to configure a bubble chart ( Figure 4A). "Positive regulation of long-term synaptic potentiation" and "Cellular response to calcium ions" exhibited the top two highest rich factor. Furthermore, DEGs were enriched in glutamatergic synapses, synapses, memory, and neuron differentiation, which are associated with neural development. Table 1 presents the GO results related to brain development. Neuron differentiation, Dentate gyrus development, Nervous system development, Cerebral cortex neuron differentiation, Cerebral cortex development, Cerebral cortex development and Neurogenesis were all significant. They are mainly related to the directional differentiation of neural stem cells and the formation of spatial brain structures.
Considering that using multiple databases provides more reliable results, we performed a series of analyses on DEGs using GO, REAC (https://reactome.org/), and HPA databases (https:// www.atlasantibodies.com/). As shown in Table 2, the p-values for the top 13 enrichment pathways were all below 4.7E-43, indicating a high significance. The neuron part pathway, synaptic signaling pathway, neuronal system pathway, and pre-synapse pathway were all significantly enriched.

KEGG pathway analysis
KEGG pathway analysis was performed to investigate the pathways in which DEGs were significantly enriched. Apart from GO terms, 124 KEGG pathways were found to be related to the DEGs. The top 20 most significantly enriched pathways are shown in Figure 4B. The neuroactive ligand-receptor interaction pathway showed the greatest significance. Furthermore, DEGs were notably enriched in terms of axon guidance, calcium signaling, and cholinergic synapse, all related to neural development (Table 3) (Sanes and Zipursky, 2020).

Validation of the DEGs using qRT-PCR
Ten DEGs involved in brain development were chosen for qRT-PCR confirmation. Consistent with the results of RNA-seq, Lhx2,

FIGURE 4
Top 20 functions enriched in GO analysis (A) and top 20 pathways enriched in KEGG analysis (B). The Rich factor represents the enrichment degree of the terms, and the y-axis shows the names of the enriched terms. The area of the node represents the number of genes. The p-value is represented by a color scale. The statistical significance increased from green (relatively lower significance) to red (relatively higher significance).

Discussion
Previous studies have confirmed that HARs are related to the development and evolution of the human brain (Tolosa et al., 2008;Mosti and Silver, 2021). HAR1 is a section of HAR1A that has been conserved through vertebrate evolution, although a difference of 18 mutations in its 118-bp sequence between humans and chimpanzees is present (Pollard et al., 2006a;Pollard et al., 2006b). HAR1 is part of an overlapping cis-antisense pair of structured lncRNAs, HAR1A and HAR1B. The expression of lncRNA HAR1B in the human brain is lower than that of HAR1A. The expression pattern of HAR1B suggests that it may be expressed in later stages of brain development to downregulate HAR1A by antisense inhibition (Pollard et al., 2006a). The human mutations in the HAR1 region result in the corresponding secondary structural changes in lncRNA HAR1A; however, research on the Frontiers in Genetics frontiersin.org 08 impact of this change is still limited. This study confirmed that HAR1 overexpression led to a significant improvement in the cognition and memory of mice. The possible pathways and mechanisms were then explored through bioinformatics analysis.

Potential pathways of HAR1A affecting cortical development through CRs
LncRNA HAR1A is specifically expressed in CRs, which exist in the developing human neocortex during the gestational period of weeks 7-19 (Waters et al., 2021). CRs are crucial in the specification and migration of cortical neurons, suggesting that HAR1A plays an important role in neurogenesis (Causeret et al., 2021). The results of the GO analysis showed that Neuron differentiation, Dentate gyrus development, Nervous system development, Cerebral cortex neuron differentiation, Cerebral cortex development, Cerebral cortex development and Neurogenesis are all significant GO terms related to brain development. Five DEGs repeatedly appear in these GO items process: Lhx2, Emx2, Foxg1, Nr2e1, and Emx1.
Foxg1 encodes a transcription repression factor essential in the regional subdivision for telencephalon formation and brain development (Hettige and Ernst, 2019). When Foxg1 is expressed in

FIGURE 5
Validation of RNA-seq data using RT-qPCR. Fold changes in gene expression between the experimental and control groups detected by RNA-seq were compared with those measured by qRT-PCR. The mRNA levels were normalized to GADPH expression, and the expression fold changes (compared to the control sample) were calculated by using the 2 -ΔΔCt method. The x-and y-axes correspond to the genes and log2 (ratio of H/L) measured by RNA-seq and qPCR.
Frontiers in Genetics frontiersin.org cortical progenitor cells, this stops the production of CRs by directly inhibiting a default transcriptional network (Hanashima et al., 2004). Foxg1-Lhx2 interactions instruct the termination of CR production (Godbole et al., 2018). CRs originate from the cortical hem (Griveau et al., 2010). Neurons in the ventral pallium and cortical hem in the medial pallium become subplate and CRs, respectively, after developing and migrating toward the neuroepithelium (Griveau et al., 2010;García-Moreno and Molnár, 2020). Emx1 and Emx2 are required to establish the Wnt-rich cortical hem domain (Dixit et al., 2011;von Frowein et al., 2006). Both EMX1 and EXM2 encode a homeobox-containing transcription factor and cooperate to promote the generation of CRs. Nr2e1 participates in a feedback loop with the brain-specific microRNA-9 (miR-9), while mouse miR-9 targets Foxg1 to control the generation of CRs in the medial pallium properly (Davila et al., 2014). CRs generated early in the marginal zone of the cortex synthesize the glycoprotein reelin and secrete it into the peripheral intercellular stroma to regulate the morphology and biochemical maturation of radial glia cells (RGCs) (Lehrman et al., 2018;Junqueira Alves et al., 2021). RGC fibers act as scaffolding when newborn neurons migrate to their final destinations. Reelin is a stop signal for neuron migration, which decides the neurons' orientation and positioning within their layers. However, abnormal function of CRs can lead to various neurological diseases, including Alzheimer's disease, schizophrenia, lissencephaly, and temporal lobe epilepsy (Chen et al., 2017).
LncRNA HAR1A may affect the function of CRs by regulating the expression of Lhx2, Emx2, Foxg1, Nr2e1, and Emx1. HAR1A may affect cerebral cortex development in this way, although the specific mechanism of this requires further study.

LncRNA HAR1A may affect synaptic function and formation
The KEGG pathway analysis results showed that the neuroactive ligand-receptor interaction pathway was the most significant, and DEGs were notably enriched in axon guidance, and cholinergic synapses. The results of GO analysis showed that DEGs were enriched in glutamatergic synapses, synapses, memory, and the positive regulation of long-term synaptic potentiation. Multiple database analyses (GO, REAC, and HPA) of DEGs showed that the synaptic signaling pathway and pre-synapse pathway were highly significant.
Based on these results, it can be concluded that HAR1A holds an important role in neuroactive ligand-receptor interactions and synaptic and axon guidance, all of which are critical processes in the early development of the brain . Neurons migrate to their correct locations to make connections by emitting axons during the development of the brain (Ulloa et al., 2018). Multiple environmental signals control the migration of these neurons and axon growth. After axons reach their appropriate position, synapses will be formed to link neurons together (Gangatharan et al., 2018). Axonal branches are over-formed during the early development of the brain; then, specific neuronal connections are formed by removing the redundant synapses (Lehrman et al., 2018;Luck et al., 2019). Several ligand-receptor pairs are involved in these cellular events (Junqueira Alves et al., 2021).
Existing evidence has shown that HARs regulate neurodevelopmental processes that have diverged between humans and chimpanzees, such as synapse development (Won et al., 2019). The human prefrontal cortex (PFC) is enlarged compared to other portions of the brain and is related to the increasing neocortical volume in primates. Changes in the synaptic distribution in the primate PFC may be caused by differential expression and transcription patterns of specific genes (Pattabiraman et al., 2020). Based on our experimental results and the conclusions of existing studies, it can be speculated that HAR1A may affect the evolution of the human brain by regulating the expression of genes related to synaptic generation.

HAR1A regulates the crucial second signal calcium ion during cortical development
The GO analysis results showed that "cellular response to calcium ion" of the "biological process" exhibited the second highest rich factor value.
Calcium plays a crucial role in neurogenesis as an essential second messenger (Callens et al., 2021). Neural stem cells stay in the neocortex's ventricular zone to produce progenitor cells, glial cells, and subsequently, neurons, which compose the entire adult brain (Sokpor et al., 2018). Multiple signals strictly regulate proliferation, migration, and differentiation during the establishment of the structure of the cerebral cortex, including cytosolic calcium ions. The regulation of Ca 2+ is essential in the delicate process of cortical development.

Conclusion
Our study confirmed that HAR1 overexpression improved the memory and cognitive abilities of transgenic C57BL/6 mice. We identified the HAR1 regulatory network in the mice by RNA-Seq. The results showed that HAR1 may affect brain development through CRs. Also, it is related to ligand-receptor interaction, axon guidance, and synapse formation, which are important processes in brain development and evolution. HAR1 also played an important role in the regulation of the second signal calcium ion. These findings provide valuable insights into the molecular mechanisms of how HAR1 affects brain development and the potential role of HARs in the evolution of the brain.

Data availability statement
The datasets presented in this study can be found in online repositories (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE197554).

Ethics statement
The animal study was reviewed and approved by the Ethics Committee of Guangzhou Medical University.
Frontiers in Genetics frontiersin.org