Genetic diversity assessment of Hopea hainanensis in Hainan Island

Hopea hainanensis (Dipterocarpaceae) is an endangered tree species restricted to Hainan Island, China, and a small part of northern Vietnam. On Hainan Island, it is an important indicator species for tropical forests. Its wood is highly valued because it is compact in structure, hard in texture, not easily deformed after drying, durable, and resistant to sunlight and water. Because of this quality, the species has been felled without restraint, resulting in severe habitat fragmentation and a sharp decline in population size. Its conservation biology therefore needs urgent study. Current research focuses on ecological factors in the habitat and on seed germination to explain the endangered status of the species. The literature contains no systematic analysis of the endangerment mechanism of H. hainanensis in terms of genetic diversity. This study therefore focuses on the genetic diversity of H. hainanensis in fragmented habitats. Using genotyping-by-sequencing (GBS) to identify single nucleotide polymorphisms (SNPs), 42 samples from seven populations were genotyped. The average heterozygosity of the seven populations was 19.77%, which indicates that the genetic diversity of H. hainanensis is low. Genetic diversity research is essential for the protection of rare and endangered plants. Analyzing genetic differences and relationships among populations provides a scientific basis for protecting H. hainanensis.


Introduction
The Earth's biodiversity is decreasing rapidly due to agricultural expansion, over-exploitation, deforestation, pollution, and climate change (Wang et al., 2007). Around 40% of plant species are on the verge of extinction (Ly et al., 2018). Conservation genetics, a new science that applies population genetics principles and methods to biological conservation, aims to save endangered species from extinction (Hogarth et al., 1997; Sarath Padmanabhan and Hemaprabha, 2018). Endangered species are frequently characterized by small, scattered populations and limited gene flow among populations (Mehmood et al., 2018). In small, isolated populations, mating happens more commonly among relatives, and selfing may be observed in hermaphroditic plants. Inbreeding increases homozygosity for harmful recessive alleles and, as a result, produces less fit offspring, a condition known as inbreeding depression (Kardos et al., 2016).
Furthermore, genetic drift is stronger in small populations, contributing to the fixation of harmful mutations and the loss of genetic variation, which weakens a population's adaptive ability and raises its extinction risk (Kardos et al., 2021). Conservation genetics, which investigates the genetic diversity, population differentiation, mating system, and historical demography of endangered species, provides valuable insights for preserving biodiversity in practice (Brown et al., 1991). Research on Hopea hainanensis, however, has primarily been concerned with the impacts of various environmental conditions in the habitat on seed germination and seedling development, ex situ conservation, and improved seed selection and cultivation techniques for artificial propagation.
Dipterocarpaceae comprise 20-50% of the basal area and more than 50% of the canopy trees in tropical Asian forests (Ghazoul, 2016). Because many Dipterocarpaceae species are valuable timber resources, they have been widely exploited in tropical Asian nations. As a result of widespread timber harvesting and forest clearance for agriculture, many dipterocarps are now designated as endangered or critically endangered. Dipterocarp forests, however, are considerably more than a supply of timber. They are vital components of Asian tropical rainforest ecosystems and act as the foundation of these varied ecosystems. Indeed, Southeast Asia is home to four of the world's 25 "biodiversity hotspots" (Wang et al., 2020). Furthermore, dipterocarp forests provide various ecosystem services and play a significant role in balancing ecological processes at the regional and global levels (Agoramoorthy, 2002). H. hainanensis is a representative and endemic species of the tropical ravine rainforest in Hainan.
The natural populations of H. hainanensis in Hainan survive mainly in forest patches dominated by fragmented secondary rainforest in and around Limushan, Bawangling, Jianfengling, Diaoluoshan, Yinggeling, Wuzhishan, and other forest areas on Hainan Island (Song et al., 2020). H. hainanensis is listed as a class I protected plant in the List of China's National Key Protected Wild Plants (Jiang, 2019). It was identified as an endangered species in the Red Book of Chinese Plants and is ranked as "Endangered" by the IUCN (Ly et al., 2018). Currently, studies on H. hainanensis mainly focus on the effects of various environmental factors in the habitat on seed germination and seedling development, ex situ protection, and improved seed selection and cultivation techniques in artificial propagation and cultivation. There is a lack of systematic analysis of the endangerment mechanism in terms of genetic diversity, and there is even less research on the genetic diversity of H. hainanensis across different fragmented habitats.
However, the role of genetic diversity in the endangerment of H. hainanensis has not been systematically studied, and only a few studies have addressed the genetic diversity of the species in fragmented habitats. Wang et al. (2020) isolated and characterized 12 polymorphic microsatellite markers for the endangered H. hainanensis. The genetic diversity and population structure of 10 H. hainanensis populations on Hainan Island were analyzed using 12 SSR markers. The results suggested that population bottlenecks may have caused the loss of genetic diversity and population differentiation, and long-term protection strategies for the species in Hainan were proposed.
As next-generation high-throughput sequencing technology has developed, genotyping by sequencing (GBS), a simplified genome sequencing technology, has become increasingly popular in many fields (Mehmood et al., 2022). These include the construction and analysis of genetic maps, genome-wide association studies, the study of genetic diversity, and the identification of plant and animal germplasm. Therefore, in this study, GBS technology was used to systematically identify genome-wide SNPs in 42 H. hainanensis samples. Based on the identified SNPs, a phylogenetic tree of these 42 samples was constructed, and their genetic diversity was analyzed. The results provide practical guidance for protecting H. hainanensis resources and improving its ecological environment, and are of great significance for the conservation of H. hainanensis biodiversity.

Study area
Hainan Island (E108°37′-111°03′, N18°10′-20°10′) is located in southern China (Zhang et al., 2021), and it is the largest island city in China (Figure 1). Hainan Island has a mild climate, with an annual average temperature of 22-27°C, and is rich in forest resources. The island is low and flat around its edges and high in the middle, with Wuzhishan and Yinggeling as the uplifted core and the terrain descending step by step to the periphery (https://www.hainan.gov.cn). Hainan Island is hailed by biologists as the largest "natural museum", and 102 rare animals on Hainan Island have been included in the list of national first- and second-class key protected animals. There are also many rare and endangered wild plants. At the same time, the northern and coastal regions have relatively low biodiversity and fewer rare and endangered species because of more intense human disturbance (Nizamani et al., 2021).

Construction and sequencing of genomic GBS library
After the extracted DNA passed quality and concentration checks, it was sent to Hangzhou Lianchuan Biotechnology Co., Ltd. for GBS library construction and sequencing. Genomic DNA was digested with RsaI and HaeIII. The high-throughput sequencing library was prepared through end repair, A-tailing, sequencing adaptor ligation, purification, PCR amplification, and related steps. The library was then purified by gel electrophoresis and size-selected by gel excision according to the preset scheme. The excised range was 450-500 bp, so that only library fragments with insert lengths in the target interval were kept for subsequent sequencing. Only libraries that passed this fragment-length screening were sequenced. The sequencing platform was the Illumina NovaSeq 6000, and the sequencing mode was PE150.

FIGURE 1
A distribution and sample collection information map of Hopea hainanensis in Hainan Island.

Chen et al. 10.3389/fpls.2022.1075102 Frontiers in Plant Science frontiersin.org

SNP mining in H. hainanensis genome
After sequencing, the raw reads were quality-controlled: low-quality sequences and adapter sequences were removed to obtain clean reads. The reads of all samples were then clustered, with similar reads grouped together as a tag, and a consensus sequence was generated for each tag. The clean reads of each individual were then aligned to the consensus sequences. GATK and SAMtools were used for SNP detection, and the detected variant sites were quality-filtered. The evolutionary analysis was based on these SNP data. Before the population analyses, SNPs were filtered to retain those with minor allele frequency (MAF) > 0.05 and data integrity > 80% (i.e., no more than 20% of individual genotypes were missing).
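The two filtering criteria can be illustrated with a minimal sketch. It assumes a hypothetical genotype encoding (alternate-allele counts 0, 1, 2, with -1 for a missing call) and treats "no more than 20% missing" as a call rate of at least 80%; the function name and the encoding are illustrative and are not taken from the pipeline used in this study.

```python
import numpy as np

def filter_snps(genotypes, min_maf=0.05, min_call_rate=0.8):
    """Return a boolean mask of SNPs that pass the MAF and call-rate filters.

    genotypes: (n_samples, n_snps) array of alternate-allele counts
    (0, 1, 2), with -1 marking a missing genotype (assumed encoding).
    """
    g = np.asarray(genotypes)
    called = g >= 0                       # True where a genotype was called
    n_called = called.sum(axis=0)         # called samples per SNP
    call_rate = n_called / g.shape[0]
    alt_count = np.where(called, g, 0).sum(axis=0)
    # Diploid samples carry two alleles each; guard against all-missing SNPs.
    alt_freq = alt_count / np.maximum(2 * n_called, 1)
    maf = np.minimum(alt_freq, 1 - alt_freq)
    return (maf > min_maf) & (call_rate >= min_call_rate)
```

Applying the mask as `genotypes[:, filter_snps(genotypes)]` keeps only the SNPs that satisfy both thresholds.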

Phylogenetic tree analysis of H. hainanensis
A phylogenetic tree is a diagram that describes the genetic differentiation among species or samples. Organisms of different origins and types are placed on a branching diagram according to their evolutionary relationships, which succinctly summarizes the evolutionary process and the relationships among them. Based on the SNPs, a neighbor-joining tree of all samples was constructed in MEGA using the p-distance model with 1000 bootstrap replicates (Dieckmann et al., 2016).
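The p-distance underlying the tree is the proportion of sites at which two samples differ. The sketch below shows one common convention for diploid SNP genotypes, in which each site contributes half the absolute difference in alternate-allele count. The encoding (0, 1, 2, with -1 for missing) and the handling of heterozygous sites are assumptions for illustration; MEGA's internal treatment may differ.

```python
import numpy as np

def p_distance_matrix(genotypes):
    """Pairwise p-distance between samples from diploid SNP genotypes.

    genotypes: (n_samples, n_snps) alternate-allele counts (0, 1, 2),
    with -1 for missing calls (assumed encoding). Sites missing in either
    sample of a pair are ignored for that pair.
    """
    g = np.array(genotypes, dtype=float)   # copy so the input is not modified
    g[g < 0] = np.nan
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(g[i]) & ~np.isnan(g[j])
            if both.any():
                # Fraction of differing alleles over the shared sites.
                d = np.abs(g[i, both] - g[j, both]).sum() / (2 * both.sum())
            else:
                d = np.nan                 # no shared sites to compare
            dist[i, j] = dist[j, i] = d
    return dist
```

The resulting symmetric matrix is the kind of input a neighbor-joining algorithm takes.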

Principal component analysis
Principal component analysis (PCA) was performed on the SNP data to cluster all samples along the principal components. PCA shows which samples are separated by short genetic distances and are therefore closely related, and which are separated by long genetic distances and are distantly related, which helps the analysis of population genetic evolution.
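As a minimal sketch of this step, PCA can be computed from a centered genotype matrix by singular value decomposition. The example assumes a complete matrix of alternate-allele counts with no missing calls; the function name is hypothetical and this is not necessarily the software used in this study.

```python
import numpy as np

def pca_genotypes(genotypes, n_components=3):
    """Project samples onto principal components of a genotype matrix.

    genotypes: (n_samples, n_snps) alternate-allele counts without missing
    values (assumption). Returns sample coordinates and the fraction of
    variance explained by each retained component.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)                 # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # contribution rate per PC
    coords = u[:, :n_components] * s[:n_components]
    return coords, explained[:n_components]
```

The `explained` values correspond to the contribution rates reported for each principal component; when the first few are small, the two- or three-dimensional plot captures only part of the genetic variation.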

Analysis of population genetic structure
Population genetic structure analysis provides information on the origin and composition of individual lineages. Based on the SNPs, the population structure of all samples was analyzed with ADMIXTURE, with the number of clusters (K) set to each value from 1 to 10 in turn. Each K value gives the distribution of ancestral genetic material across populations under the assumption that there are K ancestral groups.
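When ADMIXTURE is run with cross-validation enabled, its log reports one "CV error (K=...)" line per run, and the K with the lowest error is usually taken as the best-supported model. The sketch below collects those lines and picks that K. The helper name and the idea of concatenating the logs into one string are illustrative assumptions.

```python
import re

# Matches log lines of the form "CV error (K=2): 0.52306".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def best_k(log_text):
    """Return (K with the lowest CV error, {K: CV error}) from log text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get), errors
```

For example, passing the concatenated logs of the runs for K = 1 to 10 returns the K at which the cross-validation error is smallest, together with all parsed errors for plotting.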

Analysis of the genetic relationship
Based on the SNPs, the pairwise genetic distance between all samples was calculated. These distances give the relative relatedness of the samples and support the evolutionary analysis. In the heat map, the redder the color, the closer the relationship between the two individuals on the horizontal and vertical axes; a large red area spanning multiple individuals indicates that they may constitute a closely related family group. Conversely, the bluer the color, the more distant the relationship.

The quality of sequencing
After GBS sequencing, 28.09 Gb of raw read data were obtained from the 42 H. hainanensis samples. After removing low-quality sequences, sequences containing more than 5% N (N represents an undetermined base), and adapter sequences, 27.85 Gb of high-quality sequencing data (clean data) were obtained, an average of 0.66 Gb per sample. On average, 96.66% of bases had an error rate below 1% (Q20) and 91.34% had an error rate below 0.1% (Q30), indicating high sequencing quality. The average GC content, i.e., the proportion of guanine (G) and cytosine (C) among the four bases, was 47.84%, indicating a reasonable base distribution. The data overview of each sample is shown in Table 2.
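The Q20 and Q30 thresholds follow from the Phred scale, in which a quality score Q corresponds to an error probability of 10^(-Q/10). A short sketch of this relationship, with hypothetical helper names:

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a base-call error."""
    return 10 ** (-q / 10)

def fraction_at_least(quals, threshold):
    """Fraction of base qualities at or above a threshold (e.g., 20 or 30)."""
    return sum(q >= threshold for q in quals) / len(quals)
```

Q20 therefore corresponds to an error probability of 1% and Q30 to 0.1%, which is why the two proportions are reported as summaries of base-call accuracy.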

SNP site mining
After aligning the data to the consensus sequence, GATK and SAMtools were used for variant detection (Wright et al., 2019). SNPs called consistently by both programs were retained as reliable loci. The SNPs were then filtered according to the criteria of MAF > 0.05 and data integrity > 0.8 to retain polymorphic SNPs. After filtering, 430376 high-quality SNPs were obtained for subsequent analysis. Table 3 shows that the heterozygosity of the Fanyang population (FY) is relatively high. This may be because the Fanyang population is represented by only one sample, which cannot capture the breadth of genetic variation in the population, so there are not enough data for a comparative analysis of its genetic diversity. The heterozygosity of the other six populations ranged from 19.26% to 20.34%, with an average of 19.77%, indicating a low level of genetic diversity.
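One way to obtain a per-population heterozygosity is to take, for each sample, the fraction of called SNPs that are heterozygous and then average over the samples of each population. The sketch below illustrates that calculation. It is an assumption that heterozygosity was defined this way in this study, and the genotype encoding (0, 1, 2, with -1 for missing) is hypothetical.

```python
import numpy as np

def heterozygosity_by_population(genotypes, populations):
    """Mean observed heterozygosity per population.

    genotypes: (n_samples, n_snps) alternate-allele counts (0, 1, 2),
    with -1 for missing calls (assumed encoding).
    populations: one population label per sample, in row order.
    """
    g = np.asarray(genotypes)
    called = g >= 0
    # Heterozygous calls are coded 1; divide by the number of called sites.
    per_sample = (g == 1).sum(axis=1) / np.maximum(called.sum(axis=1), 1)
    result = {}
    for pop in sorted(set(populations)):
        idx = [i for i, p in enumerate(populations) if p == pop]
        result[pop] = float(per_sample[idx].mean())
    return result
```

With a single sample, as for the Fanyang population, the population value is simply that individual's heterozygosity, which is why it is not directly comparable with averages over several individuals.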

Phylogenetic evolutionary tree
The identified high-quality SNPs were used for phylogenetic analysis of the 42 H. hainanensis samples. A neighbor-joining tree of all samples was constructed in MEGA using the p-distance model with 1000 replicates (Figure 2). Samples from the same sampling site were relatively close to each other. However, the relative distances between samples from different sites suggest that samples from different sampling sites in the seven population areas may share a common ancestor. The 42 samples could be divided into two large groups, each of which could be further divided into small subgroups. In general, samples of the same geographical origin were relatively aggregated within the two large groups, but they were mixed among the small subgroups and were not completely merged into the same subgroup. Group I mainly includes the samples from Diaoluoshan, Fanyang, and Yinggeling, and the samples from Wuzhishan cluster in Group II. The samples from Limushan, Bawangling, and Jianfengling are distributed across both groups without a clear pattern. Group I was divided into three small independent subgroups, indicating large differences in genetic distance within this group, whereas the samples in Group II clustered relatively uniformly. Therefore, although there is a certain degree of geographical isolation among the H. hainanensis populations, the clustering based on genetic distance does not correspond directly to geographical origin.

Analysis of population genetic structure
To further verify the evolutionary relationships among the samples and to infer how many ancestral groups the H. hainanensis populations likely derive from, the genetic structure of the variants in each sequenced sample was studied. Based on the SNP data, ADMIXTURE was used to analyze the population structure of all samples, with the number of clusters (K) set to each value from 1 to 10. Each K value gives the distribution of ancestral genetic material across populations under the assumption that there are K ancestral groups. Since K=1 cannot represent the distribution of ancestral genetic material of different populations, it is not shown in the figure. As shown in Figure 3, when K=2 the samples are divided into two subgroups: the samples of group 1 are almost entirely dark blue, and the samples of group 2 are almost entirely light purple. The samples from Fanyang (FY), Diaoluoshan (DLS), Jianfengling (JFL), and Limushan (LMS) were clustered into group 2, and the remaining samples were clustered into group 1. In the cross-validation (CV) error plot (Figure 4), the CV error reaches its minimum at K=2, so two ancestral groups is the best-supported model; this indicates that the genetic differences between the samples of the two groups are relatively large and their genetic relationship is distant. Therefore, it can be preliminarily concluded that the seven H. hainanensis populations on Hainan Island derive from two different ancestors, with little gene exchange between them. The genetic differentiation coefficients among populations are given in Table 4. The Fst values among the seven H. hainanensis populations ranged from -0.05258 to 0.29542. Genetic differentiation was very large (Fst > 0.25) between FY, WZS, and DLS populations. Genetic differentiation was large (0.15 < Fst < 0.25) between DLS and BWL, WZS populations, and FY and BWL populations. There was moderate genetic differentiation (0.05 < Fst < 0.15) between BWL and JFL, YGL, FY and LMS, JFL and DLS, and the other three populations. The genetic differentiation among the remaining populations was low (Fst < 0.05) and can be regarded as negligible.
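The Fst thresholds quoted above (0.05, 0.15, and 0.25) can be written as a small helper for reading a table of pairwise values. The category labels and the treatment of values that fall exactly on a boundary are our own assumptions; negative estimates, which can arise from sampling error, are treated as no differentiation.

```python
def classify_fst(fst):
    """Map a pairwise Fst value to a qualitative level of differentiation."""
    if fst < 0.05:
        return "little"        # includes negative estimates
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "large"
    return "very large"
```

For instance, the smallest and largest values reported here, -0.05258 and 0.29542, fall in the "little" and "very large" categories, respectively.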

Principal component analysis
Principal component analysis (PCA) was carried out on the H. hainanensis samples from Hainan Island to further determine the evolutionary relationships among the populations. When groups are geographically close, PCA can better reflect the relationships among them. Samples with similar genetic backgrounds gather into a cluster in the plot, and the farther apart two samples are, the greater the difference in their genetic backgrounds. As shown in Figure 5, the 42 H. hainanensis samples formed three independent clusters. Eight samples with similar genetic backgrounds from the Jianfengling (JFL_2, JFL_5-JFL_8), Limushan (LMS_3, LMS_8), and Yinggeling (YGL_1) populations formed cluster 1. The 31 samples from six populations, Fanyang (FY_1), Wuzhishan (WZS_1-WZS_2), Bawangling (BWL_1-BWL_13), Yinggeling (YGL_2, YGL_3), Jianfengling (JFL_1, JFL_3-JFL_4, JFL_9-JFL_10), and Limushan (LMS_1, LMS_2, LMS_4-LMS_7, LMS_9, LMS_10), had similar genetic backgrounds and formed cluster 2. The three Diaoluoshan samples (DLS_1-DLS_3) were far from the other two clusters, showing a long genetic distance, and formed a cluster of their own.

Analysis of the genetic relationship
In the relatedness heatmap (Figure 6), the relatedness coefficients among the three Diaoluoshan samples (DLS_1, DLS_2, and DLS_3) were greater than 0.4, showing that these samples are very closely related. LMS_8 and LMS_1 were also very close in the relatedness heatmap. It can be concluded that samples within the same population are more closely related, and that the more distant the geographical locations, the more difficult gene exchange becomes and the more distant the genetic relationship. The six samples YGL_1, JFL_2, and JFL_5-JFL_8 are closely related. The relatedness coefficients between the three samples from Diaoluoshan (DLS_1-DLS_3) and Limushan (LMS_1) are between 0.2 and 0.3. This indicates that some genetic exchange still occurs between clusters despite geographical isolation.

FIGURE 2
The neighbor-joining clustering of Hopea hainanensis in different populations.

FIGURE 3
The population structure analysis on Hopea hainanensis.

Genetic diversity in Hopea hainanensis
SNP variation is the most important and widespread type of sequence variation in plant genomes and can be easily identified by sequence alignment (Fang et al., 2014). In this study, 48795 high-quality SNPs were obtained by screening and filtering. In the wild, H. hainanensis has a wide ecological range. The seeds are winged nuts with a high germination rate, but they have demanding requirements for germination conditions in the natural environment, which restricts the development of the population (Trang and Triest, 2019). Even when seeds germinate and grow into seedlings in natural populations, the seedlings are easily eliminated because of their weak competitiveness, so few adult plants remain and the natural regeneration ability of the population in the field is weak (Kenta et al., 2004; Mehmood et al., 2021). Population density is very low, which leads to weak reproductive ability and stress resistance and to slow natural recovery and development. Genetic diversity is lost when the effective population size shrinks and mating shifts from outcrossing to selfing (Ellegren and Galtier, 2016; Cai et al., 2021). A severe demographic bottleneck is most likely responsible for the low genetic diversity of H. hainanensis populations. Over the past 300 years, this species has lost about 70% of its population (Ly et al., 2018). In the 20th century, deforestation on Hainan Island increased rapidly: about 80% to 95% of the primary forests have been destroyed by large-scale logging, conversion to rubber and Eucalyptus plantations, and urban growth (Lin et al., 2017; Chen et al., 2018; Sun et al., 2020). Because of the high quality of its wood, the number of H. hainanensis trees would have declined proportionally, or perhaps even more.
Genetic diversity analyses of the endangerment mechanism of H. hainanensis are lacking, especially analyses of its genetic diversity in different fragmented habitats.
Based on SNPs, simplified genome sequencing analysis was performed on 42 H. hainanensis samples using GBS technology. The resulting data were used for genetic evolution and population analyses, including phylogenetic tree clustering, population genetic structure analysis, principal component analysis, and genetic relationship analysis. In the principal component analysis, the contribution rates of the first (PC1), second (PC2), and third (PC3) principal components were 28.78%, 11.2%, and 6.29%, respectively. The contribution rates of these three principal components were all low, and their total was less than 50%. Therefore, the PCA clusters may deviate from the results of the other analyses. In the principal component analysis, the Diaoluoshan population was genetically distant from the other populations and formed a single cluster. Apart from the principal component analysis, the population structure of all samples, the K value selection for population structure, and the phylogenetic tree all gave the same cluster division and supported dividing the seven populations into two groups. Therefore, it is more reasonable to divide the 42 H. hainanensis samples from seven populations into two groups: Group 1 (Diaoluoshan, Limushan, Yinggeling, Jianfengling) and Group 2 (Wuzhishan, Fanyang, Bawangling). In this study, high-throughput GBS sequencing was used to identify SNPs, and the analysis results may be limited by the lack of a reference genome for genome-wide SNP calling.

FIGURE 4
K selection of population structure.

Population genetic structure and differentiation in Hopea hainanensis
Genetic structure is influenced by many factors, such as breeding system, genetic drift, population size, seed dispersal, gene flow, evolutionary history, and natural selection (Konuma et al., 2000; Mehmood et al., 2020). Hainan Island is low and flat around its edges and high in the middle. The terrain takes Wuzhishan and Yinggeling as the uplifted core and descends progressively to the periphery, so that mountains, hills, platforms, and plains form a ring-layered landform with an obvious stepped structure. The samples collected in this study were taken from Wuzhishan, Yinggeling, and adjacent forest reserves. Geographically, the Jianfengling, Diaoluoshan, and Limushan populations are far apart (> 100 km), yet the genetic distance between the two groups is small (Fst = 0.09211). The existing gene flow between these populations may reflect genetic exchange inherited from a common ancestor, or transfer from one group into another by human activity, animals, or other factors such as geological events.

Conclusion
To improve genetic diversity among H. hainanensis populations, the remaining populations of this endangered plant should be effectively protected and developed. Because H. hainanensis seedlings are weak competitors and are easily inhibited by mother trees, they may be protected ex situ. Conserving seedlings ex situ can reduce competition within the population and increase competition between H. hainanensis populations. Genetic drift can also be reduced by increasing gene flow among small populations. Additionally, cross-introduction and breeding among the seven populations can improve genetic diversity.

Implications for conservation
Because the loss of genetic variation is a major threat to endangered species, preserving and restoring genetic variation is an important conservation action (Jiang et al., 2018; Cai et al., 2021). We found that genetic variation in the BWL, WZS, and FY populations was low. Because these populations are more vulnerable to biotic and abiotic stresses, their conservation is critical. In contrast, the DLS, YGL, JFL, and LMS populations had higher levels of genetic diversity and contained more than one genetic subgroup. These populations could be used as seed sources for propagating seedlings and saplings to restore Hainan Island's previously logged lowland rainforests. It is difficult to regenerate native H. hainanensis populations because seedlings and saplings grow slowly and are frequently unable to establish themselves in heavily shaded conditions. To help restore endangered H. hainanensis populations on Hainan Island, populations with high genetic diversity should be selected (e.g., as sources of seedlings).

FIGURE 5
Principal component analysis diagram of H. hainanensis.

FIGURE 6
Ties of consanguinity.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.