Genetic Diversity of a Natural Population of Akebia trifoliata (Thunb.) Koidz and Extraction of a Core Collection Using Simple Sequence Repeat Markers

Understanding the genetic diversity and genetic structure of germplasm is a prerequisite for germplasm conservation and utilization, and a core collection can reduce the cost and difficulty of germplasm conservation. Akebia trifoliata (Thunb.) Koidz is an important medicinal, fruit, and oil crop, particularly in China. In this study, 28 simple sequence repeat (SSR) markers were used to assess the genetic diversity and genetic structure of 955 A. trifoliata germplasms, determine their molecular identity, and extract a core collection. The 955 germplasms showed moderate polymorphism. The average number of alleles (Na), observed heterozygosity (HO), expected heterozygosity (HE), Shannon’s information index (I∗), and polymorphic information content (PIC) were 3.71, 0.24, 0.46, 0.81, and 0.41, respectively. Four subpopulations were identified, indicating a weak genetic structure. The 955 germplasms could be completely distinguished by the characters of s28, s25, s74, s89, s68, s30, s13, s100, s72, s77, and s3, so each germplasm’s molecular identity was made up of eleven characters. The core collection was composed of 164 germplasms (17.2% of the 955 germplasms in the population), and its diversity parameters differed significantly from those of a random core collection. These results have implications for germplasm conservation and will allow the 955 germplasms to be better used and managed.


Abbreviations: SSR, simple sequence repeat; Na, number of alleles; Ne, effective number of alleles; HO, observed heterozygosity; HE, expected heterozygosity; Nei's, genetic distance; I*, Shannon's information index; PIC, polymorphic information content.

INTRODUCTION
Akebia trifoliata (Thunb.) Koidz belongs to the family Lardizabalaceae and the genus Akebia Decne and is mainly distributed in China, Japan, North Korea, and Russia. In China, this species is mainly distributed in the Yangtze and Yellow River Basins and the Shaanxi-Sichuan area (Xie et al., 2006). A. trifoliata has a long history of practical use in traditional Chinese medicine (Xu et al., 2016), and in recent years it has been developed as a fruit and oil crop (Zhao et al., 2014; Fang et al., 2021). Although A. trifoliata has a wide range of uses, research on its biology, especially its domestication and genetics, has progressed slowly. At present, there is little work on the evolution, origin, and domestication of this species, which greatly limits its development. Analyses of its genome and of the genetic mechanisms underlying its main phenotypic traits are expected to change this situation (Niu et al., 2020; Huang et al., 2021).
Akebia trifoliata has various phenotypic traits that need to be improved for different purposes. For example, as a fruit, the number of seeds needs to be reduced; as a medicinal material, the content of medicinal ingredients needs to be increased; and as an oil crop, the content of unsaturated fatty acids needs to be increased (Zhou et al., 2021). Knowledge of genetic diversity and population structure is important for breeding A. trifoliata. Genetic diversity and genetic structure can be assessed in many ways, including morphological, physiological, and ecological characteristics, chromosome structure, and biochemical and DNA markers (Rana et al., 2015). DNA markers are not influenced by growth and environmental conditions, so they are widely used in plant science (Agarwal et al., 2008). Among DNA markers, simple sequence repeats (SSRs) are widely used owing to their high polymorphism, reliability, rapid and simple detection, low cost, and ease of operation. However, to the best of our knowledge, there have been few reports on A. trifoliata genetic diversity based on DNA markers. In previous studies, the number of germplasms was small and they came from only part of China, so the results may not be representative (Niu et al., 2019; Zhang et al., 2021).
Germplasm conservation is essential to biodiversity and plant breeding, but collecting and conserving as much germplasm as possible takes a lot of time and energy. Moreover, redundant genetic resources present a challenge for the effective conservation, management, evaluation, and utilization of germplasm. To resolve this issue, it is necessary to construct a core collection to effectively preserve and utilize these germplasms (Holbrook and Anderson, 1995). The core collection, first proposed by Frankel, provides preliminary information on the diversity in a large collection (Frankel, 1984). Most core collections include only 5-20% of the total germplasms but preserve most of the genetic diversity, thereby reducing the cost and increasing the speed of the work. Core collections have been established for many crops, and several have been established for some important crops, such as rice (Abadie et al., 2005; Yan et al., 2007), wheat (Balfourier et al., 2007; Kobayashi et al., 2016), and maize (Yu et al., 2005; Yan et al., 2009). However, thus far, only one core collection has been developed for A. trifoliata, and its germplasms were collected from just one province of China; therefore, the core collection had limitations.
In the present study, 28 pairs of SSR markers were used to analyze the variation of 955 A. trifoliata germplasms collected from China. The aims were (1) to evaluate the overall genetic diversity and genetic structure, (2) to establish molecular identities to distinguish the 955 germplasms, and (3) to construct a core collection to facilitate later management. The genetic diversity and genetic structure will be valuable for guiding the collection and application of A. trifoliata in China, and the molecular identity and core collection will benefit A. trifoliata germplasm management and breeding.

Plant Materials and DNA Isolation
The 955 A. trifoliata germplasms, for which the collection location was unclear, were cultivated in the Taojiang experimental field of the Institute of Bast Fiber Crops, Chinese Academy of Agricultural Sciences in 2012. Fresh tender leaves from each germplasm were placed in a liquid nitrogen tank, transported to the laboratory, and frozen at −80°C until genomic DNA extraction. Genomic DNA was extracted using a Rapid DNA Extraction Kit (Tiangen Biotech, Beijing, China). The purity and quality of extracted DNA were evaluated by 1% agarose gel electrophoresis and determined using a NanoDrop 2000 spectrophotometer.

SSR Analysis
From the SSR primers developed by Niu et al. (2019), 28 pairs of primers with high polymorphism were selected (Supplementary Table 1). SSR-primed polymerase chain reactions (PCRs) were carried out in a 10-µL reaction volume with 1 × PCR buffer, 0.2 mmol/L dNTP, 1 U of Taq DNA polymerase (Tiangen), 0.5 µL of forward primer (10 nmol/L), 0.5 µL of reverse primer (10 nmol/L), and 0.5 µL of DNA from each accession. PCR was performed under the following conditions: 94°C for 5 min, followed by 33 cycles each of 30 s at 95°C, 30 s at the primer-specific annealing temperature, 30 s at 72°C, and a final extension of 10 min at 72°C. The PCR products were separated on 8% polyacrylamide gels, and silver dyeing was conducted according to the methods of Zhang et al. (2000). Based on Ni et al. (2018), molecular weights were estimated using a DNA marker. The allele with the maximal molecular weight was recorded as "A," followed by B, C, D. If only one band was obtained for a set of primers, the accession was recorded as homozygous.
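The band-scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: band sizes are taken as already read from the gel in base pairs, and the function name `score_genotype` and the example allele sizes are hypothetical, not taken from the study.

```python
from string import ascii_uppercase


def score_genotype(bands, locus_alleles):
    """Convert the band sizes of one accession at one locus to a genotype.

    bands: band sizes (bp) observed for this accession (hypothetical input).
    locus_alleles: all allele sizes observed at this locus across accessions.
    The largest allele is labelled "A", the next "B", and so on.
    """
    # Rank alleles from largest to smallest molecular weight.
    ranked = sorted(set(locus_alleles), reverse=True)
    label = {size: ascii_uppercase[i] for i, size in enumerate(ranked)}

    observed = sorted(set(bands), reverse=True)
    if not observed:
        return "0"                      # no band amplified
    if len(observed) == 1:
        a = label[observed[0]]
        return f"{a}/{a}"               # single band is scored as homozygous
    if len(observed) == 2:
        return "/".join(label[s] for s in observed)
    raise ValueError("more than two bands is outside this scoring scheme")


locus = [200, 180, 160, 150]            # made-up allele sizes (bp)
calls = [score_genotype(b, locus) for b in ([180], [150, 200], [])]
print(calls)
```

With these made-up sizes, a single 180-bp band is scored B/B, bands at 200 and 150 bp are scored A/D, and an empty lane is scored 0.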

Establishment of Molecular Identity
The characters 1, 2, 3, and 4 were used in place of the homozygous genotypes A/A, B/B, C/C, and D/D, and the characters A, B, C, D, E, and F in place of the heterozygous genotypes A/B, A/C, A/D, B/C, B/D, and C/D, respectively; 0 represented no band. The character of the SSR marker with the highest PIC was used first to distinguish the 955 germplasms. Characters of the markers with the second-highest, third-highest, and subsequent PIC values were then added in turn until all germplasms were completely distinguished. Finally, the molecular identity of each germplasm consisted of the characters corresponding to these SSR markers.
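As a minimal sketch of this procedure, the following Python code encodes genotypes with the character table given above and adds markers in order of decreasing PIC until every accession has a unique string. The function name `build_identities`, the accession names, and the toy genotypes are hypothetical; only the character coding comes from the text.

```python
# Character coding as described in the text; "0" means no band.
CODE = {
    "A/A": "1", "B/B": "2", "C/C": "3", "D/D": "4",
    "A/B": "A", "A/C": "B", "A/D": "C",
    "B/C": "D", "B/D": "E", "C/D": "F",
    "0": "0",
}


def build_identities(genotypes, markers_by_pic):
    """genotypes: {accession: {marker: genotype string}}.
    markers_by_pic: marker names sorted by decreasing PIC.
    Returns (markers used, {accession: identity string})."""
    used = []
    ids = {acc: "" for acc in genotypes}
    for marker in markers_by_pic:
        used.append(marker)
        for acc, calls in genotypes.items():
            ids[acc] += CODE[calls[marker]]
        # Stop as soon as every accession has a distinct identity.
        if len(set(ids.values())) == len(ids):
            return used, ids
    raise ValueError("the markers cannot separate all accessions")


# Toy data (hypothetical accessions and markers, for illustration only).
toy = {
    "X1": {"m1": "A/A", "m2": "A/B", "m3": "0"},
    "X2": {"m1": "A/A", "m2": "B/B", "m3": "C/D"},
    "X3": {"m1": "B/C", "m2": "A/B", "m3": "A/A"},
}
used, ids = build_identities(toy, ["m1", "m2", "m3"])
print(used, ids)
```

In this toy case the first marker cannot separate X1 from X2, so the second marker is added and the procedure stops with two-character identities.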

Extraction of a Core Collection
The stepwise clustering method can effectively preserve the genetic diversity of the original germplasm (Wang et al., 2007). Accordingly, in this study, stepwise clustering was used to extract a core collection based on SSR markers. First, genetic distances were calculated for the original collection, and a cluster analysis was then performed according to the genetic distances. Next, a tree diagram was obtained. According to the principle of clustering, the differences within groups were the smallest at the lowest level. Therefore, one of the two genetic materials in each group was randomly selected to enter the next round of the cluster analysis. If only one genetic material was available, then it was used in the next round of the cluster analysis. All retained genetic materials were re-entered into the next round of the cluster analysis. The method was repeated until the material met the set requirements to obtain the core collection.
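The stepwise procedure can be sketched as follows. This is an illustrative reading, not the software used in the study: it assumes average-linkage (UPGMA) clustering, treats a "lowest-level group" as two accessions joined directly to each other in the tree, and stops once a target size is reached (the stopping rule and the names `upgma_cherries` and `stepwise_core` are assumptions). The plain-Python clustering is slow and suited only to small examples.

```python
import random
from itertools import combinations


def upgma_cherries(names, dist):
    """Return pairs of accessions that UPGMA joins directly to each other."""
    clusters = {n: [n] for n in names}          # cluster id -> members
    cherries = []
    while len(clusters) > 1:
        def avg(a, b):
            # Average pairwise distance between two clusters.
            total = sum(dist[frozenset((x, y))]
                        for x in clusters[a] for y in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))
        a, b = min(combinations(clusters, 2), key=lambda p: avg(*p))
        if len(clusters[a]) == 1 and len(clusters[b]) == 1:
            cherries.append((clusters[a][0], clusters[b][0]))
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return cherries


def stepwise_core(names, dist, target_size, seed=0):
    """Repeatedly drop one random member of each lowest-level pair."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    rng = random.Random(seed)
    kept = list(names)
    while len(kept) > target_size:
        dropped = set()
        for pair in upgma_cherries(kept, dist):
            if len(kept) - len(dropped) <= target_size:
                break
            dropped.add(rng.choice(pair))       # the other member is retained
        kept = [n for n in kept if n not in dropped]
    return kept


# Toy example: six accessions forming three tight pairs (made-up distances).
pos = {"a": 0, "b": 1, "c": 5, "d": 6, "e": 20, "f": 21}
names = list(pos)
dist = {frozenset((x, y)): abs(pos[x] - pos[y])
        for x, y in combinations(names, 2)}
core = stepwise_core(names, dist, target_size=3, seed=1)
print(core)
```

Accessions that are not part of any lowest-level pair are carried into the next round unchanged, which matches the rule for single genetic materials described above.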

Data Analysis
PopGene version 1.3.2 was used to analyze the effective number of alleles (Ne), Shannon-Weaver diversity index (I*), genetic distance (Nei's genetic distance), observed heterozygosity (HO), and expected heterozygosity (HE) (Yeh and Boyle, 1997); PowerMarker version 3.2.5 was used to estimate the polymorphic information content (PIC) and number of alleles (Na) (Liu and Muse, 2005). Based on Nei's genetic distances and the unweighted pair group method with arithmetic mean (UPGMA), a clustering tree was constructed using PowerMarker and visualized using MEGA version 7.0 and iTOL (Tamura et al., 2013). Population genetic structure was assessed using the mixed and correlated allele frequency models in STRUCTURE version 2.3.4 and Structure Harvester version 6.0 (Pritchard et al., 2000; Earl and Vonholdt, 2012). The variance analysis was implemented in SAS version 9.0 (Park, 2002).
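For readers who want to see how these per-locus statistics relate to allele frequencies, the sketch below uses the common textbook definitions: Ne = 1/Σp², HE = 1 − Σp², I = −Σp ln p, and PIC = 1 − Σp² − ΣΣ 2pᵢ²pⱼ². It is an assumption that these match the programs' outputs exactly; PopGene and PowerMarker may apply sample-size corrections, so values can differ slightly. The function name `locus_stats` and the example genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def locus_stats(genotypes):
    """genotypes: list of (allele, allele) tuples for one locus."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    hom = sum(p * p for p in freqs)              # expected homozygosity
    return {
        "Na": len(freqs),                        # number of alleles
        "Ne": 1 / hom,                           # effective number of alleles
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": 1 - hom,                           # uncorrected expected heterozygosity
        "I": -sum(p * log(p) for p in freqs),    # Shannon's information index
        "PIC": 1 - hom - sum(2 * p * p * q * q
                             for p, q in combinations(freqs, 2)),
    }


# Made-up genotypes at a two-allele locus with equal allele frequencies.
stats = locus_stats([("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")])
print(stats)
```

For this balanced two-allele example, Ne is 2, both heterozygosities are 0.5, I equals ln 2, and PIC is 0.375, which is the maximum PIC attainable with two alleles.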

Genetic Diversity of 955 Germplasms
A total of 104 alleles were detected using 28 SSR markers. As summarized in Table 1, Na per locus ranged from 2 to 5 (mean, 3.7143). Seventeen primer pairs amplified four alleles, five primer pairs amplified two alleles, and only one amplified five alleles. Ne ranged from 1.2018 to 2.8556 (mean, 1.9873), HO from 0 to 0.83 (mean, 0.2382), HE from 0.1681 to 0.6521 (mean, 0.4604), Nei's distance from 0.1679 to 0.6535 (mean, 0.4600), I* from 0.3083 to 1.1866 (mean, 0.8086), and PIC from 0.1538 to 0.5936 (mean, 0.4085). The mean PIC indicated that the 28 SSR markers were moderately polymorphic (0.25 < PIC < 0.5). The PIC of the most polymorphic SSR marker was 3.86 times that of the least polymorphic marker. Seven microsatellites exhibited high polymorphism (PIC > 0.5), and four microsatellites exhibited low polymorphism (PIC < 0.25). The heterozygosity of A. trifoliata was relatively low based on HO and HE (i.e., 0.238 and 0.460, on average, respectively).

Genetic Structure of 955 Germplasms
A cluster analysis was performed to analyze the genetic relationships among the 955 A. trifoliata accessions, and a dendrogram based on genetic distances and the unweighted pair group method with arithmetic mean is shown in Figure 1. The cluster analysis divided the 955 germplasms into four main groups, accounting for 25.03, 18.32, 17.07, and 39.58% of the natural population, respectively.
We evaluated K-values (number of populations) from 1 to 10 in the STRUCTURE analysis. The most significant change in likelihood occurred when the K-value increased from 2 to 4, and the highest K value was observed between K = 2, 4, and 6 (Figure 2A). The average value of LnP(K) increased as the K-value increased from 1 to 10 (Figure 2B), but its growth rate decreased at K = 4. Therefore, the optimal K-value in the present study was 4, dividing the natural population into four subgroups (Figure 2C). Both the dendrogram and the STRUCTURE analysis clustered the germplasms into four main groups, suggesting that the four-clade model sufficiently explained the genetic structure of the 955 germplasms.
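One common way to pick K from replicate STRUCTURE runs is the ΔK statistic of the Evanno method, which Structure Harvester reports. The sketch below assumes that this approach underlies the K selection here and uses the mean-based form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)); the function name `delta_k` and all LnP(K) values are invented for illustration and are not the study's data.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """lnp: {K: [LnP(K) of each replicate run]}. Returns {K: delta K}.

    Delta K is defined only for K values that have both neighbours,
    and it needs at least two replicates per K with non-zero spread.
    """
    ks = sorted(lnp)
    out = {}
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        if k - prev != 1 or nxt - k != 1:
            continue                              # skip gaps in the K series
        second_diff = abs(mean(lnp[nxt]) - 2 * mean(lnp[k]) + mean(lnp[prev]))
        out[k] = second_diff / stdev(lnp[k])
    return out


# Invented replicate log-likelihoods for K = 1..5 (two runs each).
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-850.0, -852.0],
    4: [-700.0, -702.0],
    5: [-690.0, -692.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

In this invented series the likelihood gain flattens after K = 4, so ΔK peaks there, mirroring the reasoning about the slowing growth of LnP(K) given above.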

Establishment of Molecular Identity
According to the PIC values, the character of s28 was used first to distinguish the 955 germplasms, followed by s25, s74, and so on. The results showed that 11 markers (s28, s25, s74, s89, s68, s30, s13, s100, s72, s77, and s3) could distinguish the 955 germplasms. The molecular identity of each germplasm consisted of 11 characters (Supplementary Table 2); for example, the molecular identities of TJ1 and TJ2 were D0002D21100 and D1002322100, respectively.

Extraction of a Core Collection
Stepwise clustering was used to extract a core collection, using Na, Nei's distance, HO, and PIC over the 28 SSR markers as indicators. The core collection consisted of 164 genetic individuals (Table 2), representing only 17.2% of the    Tables 3-6). As shown in Table 3, all four indicators were significantly different from those in the truly established core collection. These results support the validity of the method for extracting the core collection and further suggest that the core collection effectively represents the entire genetic population.

DISCUSSION
Progress in A. trifoliata breeding has been slow, in part because the plants grow slowly and new plants do not bear fruit for 4 years (Zhang et al., 2013). Therefore, cultivating new A. trifoliata varieties is time-consuming. Furthermore, little is known about the biological characteristics of A. trifoliata, making it difficult to select good parents for breeding. Molecular genetic markers are widely used in plant breeding, and genetic diversity must be considered when identifying trait populations and selecting parent material to ensure breeding success. The results obtained in this study can deepen our understanding of the genetic diversity of A. trifoliata germplasm and facilitate its rational utilization. In this study, an SSR analysis of 955 A. trifoliata germplasms was performed to evaluate genetic diversity. The 28 SSR markers were selected from 49 SSR markers for their high polymorphism and stability (Niu et al., 2019). In a previous study, 49 pairs of SSR markers were used to analyze 88 A. trifoliata germplasms, which were collected from 16 regions in eight provinces of China (Niu et al.,
Frontiers in Genetics | www.frontiersin.org
been used to evaluate polymorphisms in 242 individuals from 11 natural populations, resulting in genetic diversity of Na = 1.99 and I* = 0.47 for ISSR and Na = 1.99 and I* = 0.50 for SRAP. These values were lower than the corresponding values in the present study, probably because the germplasms came from only one province of China. As for genetic structure, Niu et al. (2019) divided germplasms from eight provinces into four groups, which was similar to the results of the present study. Zhang et al. (2021) divided germplasms from 11 populations into three groups. These studies indicated that the genetic structure of A. trifoliata was weak.
Although crop breeding is based on abundant crop germplasm resources, redundant germplasms have various limitations. For example, it is difficult to precisely and rapidly identify useful resources for plant breeding, and the management and preservation of germplasm resources are expensive and time-consuming. A core collection can effectively resolve these issues (Frankel and Brown, 1984). Although a core collection has been reported for A. trifoliata (56/242), the germplasm was only from the Qinba mountain area of China, and the results had certain limitations. The core germplasm in the present study represented 17.2% of all accessions (164/955), which is higher than the range of 5-10% recommended by Brown (1989) as well as the values reported for other plants, e.g., sesame (S. indicum) (28/277) (Park et al., 2014), maize (Z. mays) (951/13521) (Yu et al., 2005), and soybean (G. max) (Oliveira et al., 2010). In contrast, it is slightly lower than the values for the rubber tree (Hevea brasiliensis) (128/505) (Adifaiz et al., 2020), ramie (Boehmeria nivea L.) (22/105) (Luan et al., 2014), and Gympie messmate (Eucalyptus cloeziana F. Muell., family Myrtaceae Juss.) (247/707) (Lv et al., 2020). However, when one additional filter was applied, Na and HO were reduced to 82.1 and 84.2%, respectively, of those for the full population, and the core collection was reduced to eight genetic individuals. The maintenance of the vast majority of germplasm diversity should be a priority in determining an optimal fraction; accordingly, we did not aim for a low rate of germplasm retention. In comparison with the total natural population, the final core collection retained 94.2, 98.7, 116.4, and 116.73% of Na, HO, Nei's distance, and PIC, respectively. These results indicated that the core collection could represent the genetic diversity of the 955 germplasms.
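The sampling fractions compared in this paragraph can be recomputed from the core and total sizes quoted in the text. The short Python sketch below does only that arithmetic; the dictionary labels are shorthand chosen for this example, and soybean is omitted because no counts are given for it.

```python
# (core size, total size) pairs as quoted in the text.
collections = {
    "A. trifoliata (this study)": (164, 955),
    "A. trifoliata (Qinba)": (56, 242),
    "sesame": (28, 277),
    "maize": (951, 13521),
    "rubber tree": (128, 505),
    "ramie": (22, 105),
    "Gympie messmate": (247, 707),
}

# Sampling fraction in percent, rounded to one decimal place.
fractions = {name: round(100 * core / total, 1)
             for name, (core, total) in collections.items()}

# Print the collections from the smallest to the largest fraction.
for name, pct in sorted(fractions.items(), key=lambda item: item[1]):
    print(f"{name}: {pct}%")
```

The fractions come out at about 7.0% for maize and 10.1% for sesame, 17.2% for the present study, and 21.0 to 34.9% for ramie, the earlier A. trifoliata collection, rubber tree, and Gympie messmate.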
Abundant crop germplasm resources are the basis of crop breeding, but germplasm collection is time-consuming and laborious. Collection of the 955 germplasm resources from China took about 10 years, and the germplasms were cultivated in the Taojiang experimental field of the Institute of Bast Fiber Crops, Chinese Academy of Agricultural Sciences. However, owing to early management practices and staff turnover, the geographic origin of each germplasm was unclear, which limits our understanding and utilization of the germplasm resources. Although the geographic origin of germplasms is important for genetic diversity research, the 955 germplasms in our collection are still important for the breeding of A. trifoliata, even with some geographic information missing. Modern technology offers a way to distinguish these germplasms. SSR molecular marker technology is not affected by geographical origin and provides high polymorphism, stable results, and good repeatability (Ni et al., 2018). Therefore, SSR markers were used in the present study. Based on our results, we could reclassify these germplasms into individuals and populations. Thus, the molecular identity and core collection will be useful for management strategies, and the genetic diversity and genetic structure will be beneficial for breeding.

CONCLUSION
To the best of our knowledge, this study used the largest number of A. trifoliata germplasms to date. The results showed moderate genetic diversity and weak genetic structure in the natural population of A. trifoliata, based on 28 SSR markers. Eleven pairs of primers could distinguish the 955 germplasms, and the molecular identity of each germplasm consisted of 11 characters. A core germplasm collection consisting of 164 germplasms was generated, accounting for 17.2% of the original germplasm. Further, the estimates of genetic diversity and genetic structure could provide a foundation for future A. trifoliata breeding. The core collection and molecular identity could reduce the management cost and improve the protection of germplasm resources.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
ML and JaC conceived and designed the project. YZ, YW, ZS, JN, YS, KH, and JnC collected the plant materials. YZ, YW, ZS, JN, YS, and JnC performed molecular labwork and scored the markers. YZ and YW analyzed the data and wrote the manuscript with assistance from all other authors. All authors read and approved the final manuscript. The funding bodies had no role in the design, collection, analysis, interpretation of data, or writing the manuscript.