Whole genome sequencing of Aoluguya reindeer (Rangifer tarandus) in China

Introduction
Reindeer (Rangifer tarandus), a member of the order Artiodactyla and the family Cervidae, ranges widely across the Arctic and sub-Arctic regions, mainly in Norway, Finland, Sweden, Russia, the United States, and Canada (Ju et al., 2019). Hunting and herding reindeer are of utmost importance to the people residing in the Arctic (Kvie et al., 2016). For centuries, reindeer have been a crucial source of subsistence items for Arctic residents, such as food and fur, and have also played a key role in transportation (Kofinas et al., 2000). Additionally, reindeer have played a pivotal role in the origin and development of numerous indigenous cultures in the North (Baskin, 2000). By studying reindeer, researchers can gain insights into the history of many Arctic cultures. The reindeer was added to the Red List of Threatened Species of the International Union for Conservation of Nature (IUCN) in 2015 and assessed as Vulnerable under criterion A2a (http://www.iucnredlist.org/); fortunately, it has since been monitored so that its status can be improved.
Aoluguya reindeer (Figure 1A), kept in the Greater Khingan Mountain region of Inner Mongolia in China (50°20′-52°30′N, 120°12′-122°55′E) and domesticated by the Ewenki people, are a population currently bred in the southernmost part of the species' habitat and are a valuable research target owing to their long-term isolation. The migration of Aoluguya reindeer (AgD) is limited by mountains and traffic, which hinders genetic exchange. In recent years, as a result of this geographical isolation, habitat degradation, long-term inbreeding, population degradation, and weakened disease resistance, the population has declined dramatically (Ju et al., 2019). Records indicate that the number of reindeer has remained around 1,000, and the outlook is not optimistic (Zhai et al., 2017). The loss of reindeer will have severe negative consequences for the environment and for the indigenous cultures of the North, such as the Ewenki.
Because analyses of the genomic characteristics and genetic diversity of AgD are limited, the genetic basis of these traits remains unclear, which in turn leads to a lack of relevant measures to preserve this population. Sequencing AgD will therefore provide a valuable resource for researchers focused on its conservation and for those interested in further reindeer genomic studies. With this in mind, we report and release whole-genome sequencing data from 10 AgD individuals.

Sample collection, DNA extraction, and sequencing
The Genhe Hongrun Green Breeding Professional Cooperative is licensed to farm AgD in the Greater Khingan Mountain region of Inner Mongolia. Ten unrelated adult individuals (all males) were caught by experienced staff, and about 2 mL of blood was collected from the tail vein of each; all animals remained healthy after blood collection. The EasyPure Blood Genomic DNA Kit (TransGen Biotech) was used to extract genomic DNA from the blood samples. Libraries were constructed for samples yielding >0.5 μg of genomic DNA. Finally, the qualified libraries were sequenced in 2 × 150 bp paired-end mode on the DNBSEQ-T7 platform at Novogene Bioinformatics Institute (Beijing, China).

Data processing and variant calling
The FASTP v0.23.2 software (Chen et al., 2018) was used to remove adapters and low-quality sequences from the raw data. The clean data were then mapped to the reindeer genome ULRtarCaribou_2v2 (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_019903745.2/) using BWA-MEM v0.7.17 (Li and Durbin, 2009) with default options. Single nucleotide polymorphisms (SNPs) were detected with the Genome Analysis Toolkit (GATK v4.1.4) variant calling pipeline (McKenna et al., 2010). To reduce false positives, only variants with a calling quality greater than 20 were retained. PLINK v1.9 (Purcell et al., 2007) was used to remove SNPs with a missing rate >0.10 or a minor allele frequency (MAF) <0.15, which ensures that at least 3 copies of the minor allele were found in the sample set. Finally, only autosomal SNPs with exactly two alleles were retained.
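The effect of the missing-rate and MAF thresholds can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' pipeline: the function names and the genotype encoding (alternate-allele counts 0/1/2, with None for a missing call) are assumptions made for this example. It shows that, with 10 diploid individuals and at most 10% missing calls, MAF ≥ 0.15 requires at least 3 copies of the minor allele.

```python
from fractions import Fraction
from math import ceil

# Thresholds taken from the text; exact fractions avoid floating-point edge cases.
MAX_MISSING = Fraction(10, 100)
MIN_MAF = Fraction(15, 100)


def min_minor_allele_count(n_called, min_maf=MIN_MAF):
    """Smallest minor-allele count with MAF >= min_maf among n_called diploid genotypes."""
    return ceil(min_maf * 2 * n_called)


def passes_filters(genotypes, max_missing=MAX_MISSING, min_maf=MIN_MAF):
    """Return True if one bi-allelic SNP passes the missing-rate and MAF filters.

    genotypes: one entry per individual, the alternate-allele count (0, 1, 2)
    or None for a missing call (hypothetical encoding for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = Fraction(len(genotypes) - len(called), len(genotypes))
    if not called or missing_rate > max_missing:
        return False
    alt = sum(called)
    total = 2 * len(called)
    maf = Fraction(min(alt, total - alt), total)
    return maf >= min_maf


# 10 individuals = 20 alleles; 0.15 * 20 = 3 copies of the minor allele.
print(min_minor_allele_count(10))
# With one missing individual (the most allowed at 10% missingness), still 3.
print(min_minor_allele_count(9))
```

In PLINK v1.9, thresholds of this kind are usually expressed with the `--geno` and `--maf` options; the exact command line used in the study is not given in the text.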

Data description
The FASTQ sequence data are available in the NCBI Sequence Read Archive under BioProject accession number PRJNA983536. A total of 467.23 Gb of clean data was retained. All individuals were deeply sequenced, with about 34.9-54.1 Gb of data obtained per individual. The average mapping rate was ~99.25% and the average depth was ~16.8× for each sample (Supplementary Table S1). After SNP detection and filtering, 8,151,569 high-quality autosomal bi-allelic SNPs (MAF ≥0.15) were identified. Of these SNPs, a total of 5,487,298 [...] (Supplementary Figure S1A). The high-quality SNPs were used to measure linkage disequilibrium (LD; r²) using PopLDdecay. The LD decay pattern for distances up to 300 kb is shown in Figure 1B. LD decreased rapidly with increasing distance between pairwise SNPs, reaching r² = 0.33 at a distance of about 6,500 bp in AgD. LD levels at different distances are shown in Supplementary Table S2.
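To make the LD decay measurement concrete, the following sketch computes r² between SNP pairs and averages it within distance bins. It is not PopLDdecay's implementation: as a stand-in, r² is taken as the squared Pearson correlation of genotype dosages (0/1/2), a common approximation for unphased data. The function names, the bin size, and the toy data are assumptions for illustration.

```python
from collections import defaultdict


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    Returns None when either SNP is monomorphic (zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return (cov * cov) / (var_x * var_y)


def mean_r2_by_distance(positions, genotypes, bin_size=1000, max_dist=300_000):
    """Average r2 of SNP pairs, grouped by physical distance.

    positions: sorted base-pair positions on one chromosome.
    genotypes: genotypes[i] is the dosage vector of the SNP at positions[i].
    Returns {bin_start_in_bp: mean r2}, pairs farther than max_dist are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break  # positions are sorted, so later pairs are farther still
            r2 = genotype_r2(genotypes[i], genotypes[j])
            if r2 is None:
                continue
            sums[dist // bin_size] += r2
            counts[dist // bin_size] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(counts)}


# Toy example: two perfectly correlated nearby SNPs and one uncorrelated distant SNP.
toy_positions = [100, 600, 5100]
toy_genotypes = [[0, 2, 0, 2], [0, 2, 0, 2], [0, 0, 2, 2]]
print(mean_r2_by_distance(toy_positions, toy_genotypes))
```

Plotting the binned means against distance gives a decay curve of the kind shown in Figure 1B; the distance at which the curve crosses a chosen r² value can then be read off.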
To gain insight into the genetic differences between AgD and populations abroad, we collected resequencing data from three other populations for comparison: Fennoscandian domestic reindeer (FeD), Fennoscandian wild reindeer (FeW), and Russian wild reindeer (RuW) (Weldenegodguad et al., 2020) (Supplementary Table S3). Principal component analysis (PCA) was performed (Figures 1C, D) using Genome-wide Complex Trait Analysis (GCTA v1.92.3) (Yang et al., 2011). Both the PC1-PC2 and PC1-PC3 plots showed a clear genetic differentiation between AgD and the three other populations. Moreover, the PCA pattern suggested broader genetic differentiation within the FeW and RuW populations than within AgD.
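The idea behind a PCA of this kind can be sketched in a few lines. The example below is a simplified stand-in, not GCTA's code: it builds a genetic relationship matrix (GRM) from standardized genotypes, applying the same formula to the diagonal and off-diagonal entries for simplicity, and extracts the leading principal component by power iteration. All function names and the toy genotypes are assumptions for illustration.

```python
import math
import random


def grm(genotypes):
    """Genetic relationship matrix from dosages.

    genotypes[j][i] is the alternate-allele count (0/1/2) of individual j at SNP i.
    Each SNP is centered by 2p and scaled by sqrt(2p(1-p)); monomorphic SNPs are skipped.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    z = [[] for _ in range(n_ind)]
    for i in range(n_snp):
        p = sum(g[i] for g in genotypes) / (2 * n_ind)
        if p <= 0.0 or p >= 1.0:
            continue
        sd = math.sqrt(2 * p * (1 - p))
        for j in range(n_ind):
            z[j].append((genotypes[j][i] - 2 * p) / sd)
    used = len(z[0])
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[sum(a * b for a, b in zip(z[j], z[k])) / used for k in range(n_ind)]
            for j in range(n_ind)]


def leading_pc(matrix, iterations=500, seed=0):
    """Leading eigenvector of a symmetric positive semi-definite matrix (power iteration)."""
    rng = random.Random(seed)
    n = len(matrix)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("matrix has no non-zero eigenvalue")
        v = [x / norm for x in w]
    return v


# Toy data: individuals 0-2 and 3-5 come from two clearly differentiated groups.
toy = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 0, 1],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 1, 2],
]
pc1 = leading_pc(grm(toy))
print([round(x, 3) for x in pc1])
```

In this toy case the two groups receive PC1 scores of opposite sign, which is the kind of separation between populations that a PC1-PC2 plot displays.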
To better understand the genetic relationship between AgD and the three other populations, an unrooted maximum likelihood (ML) phylogenetic tree was constructed using RAxML (Stamatakis, 2014) with 100 bootstraps (Figure 1E). After visualization using iTOL v6 (https://itol.embl.de/), we found that AgD is more closely related to RuW. A neighbor-joining (NJ) tree based on the genetic distances estimated by PLINK v1.9 (Purcell et al., 2007) supported the same genetic relationship (Supplementary Figure S1B). In addition, genetic diversity indices including nucleotide diversity (pi), observed heterozygosity (H_O), and expected heterozygosity (H_E) were estimated. PLINK v1.9 (Purcell et al., 2007) with the option "--hardy" was used to estimate H_O and H_E, and VCFtools v0.1.16 (Danecek et al., 2011) was used to calculate pi. The average H_O and H_E in AgD were higher than in the three other populations, and the average pi value showed a similar trend, indicating that AgD has a higher genetic diversity (Table 1). This is also consistent with Supplementary Figure S1C, in which AgD shows the lowest LD decay curve among the four populations. Finally, the F_GRM method was used to calculate the inbreeding coefficient of each population (Table 1). F_GRM was computed using GCTA v1.92.3 (Yang et al., 2011) with the option "--ibc". The results showed that the AgD population has a low inbreeding level.
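As a small illustration of the heterozygosity indices, the sketch below computes per-SNP observed heterozygosity (the fraction of heterozygous genotypes) and expected heterozygosity under Hardy-Weinberg proportions, 2p(1-p), and averages them over SNPs. It assumes the usual textbook definitions, complete genotype calls, and a 0/1/2 dosage encoding; the function name and toy data are hypothetical, and the exact output conventions of PLINK's "--hardy" report are not reproduced here.

```python
def mean_heterozygosity(snps):
    """Return (mean H_O, mean H_E) over a list of SNPs.

    snps: one list per SNP holding the alternate-allele count (0, 1, 2)
    of every individual; missing calls are not handled in this sketch.
    """
    ho_total = 0.0
    he_total = 0.0
    for dosages in snps:
        n = len(dosages)
        p = sum(dosages) / (2 * n)                    # alternate-allele frequency
        ho_total += sum(1 for g in dosages if g == 1) / n
        he_total += 2 * p * (1 - p)                   # Hardy-Weinberg expectation
    return ho_total / len(snps), he_total / len(snps)


# Toy example with four individuals and two SNPs, both with p = 0.5.
toy_snps = [
    [0, 1, 1, 2],  # two heterozygotes: H_O = 0.5, H_E = 0.5
    [0, 0, 2, 2],  # no heterozygotes:  H_O = 0.0, H_E = 0.5
]
h_o, h_e = mean_heterozygosity(toy_snps)
print(h_o, h_e)
```

A population whose mean H_O falls well below its mean H_E has fewer heterozygotes than expected under random mating, which is one reason the two values are reported side by side.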
In summary, this study provides whole-genome resequencing data from ten AgD individuals in China, offering reference data and a theoretical basis for the future conservation of AgD as a valuable genetic resource and for research on species of the genus Rangifer.

Data availability statement
The original contributions presented in the study are publicly available. The data can be found here: https://www.ncbi.nlm.nih.gov/bioproject, accession number PRJNA983536.

Ethics statement
The animal study was reviewed and approved by the Institutional Animal Care and Use Committee of Jilin University.

Author contributions
LS and SY designed the research. ZS, MW, and SY collected the samples. MH, XQ, LQ, ZZ, HS, and LT conducted experiments. LS and ZS wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding
This study was supported by the Changchun Scientific and Technological Plan Project (No. 21ZGN01).

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.