Morphological and Comparative Transcriptome Analysis of Three Species of Five-Needle Pines: Insights Into Phenotypic Evolution and Phylogeny

Pinus koraiensis, Pinus sibirica, and Pinus pumila are the major five-needle pines in northeast China, with substantial economic and ecological values. The phenotypic variation, environmental adaptability and evolutionary relationships of these three five-needle pines remain largely undecided. It is therefore important to study their genetic differentiation and evolutionary history. To obtain more genetic information, the needle transcriptomes of the three five-needle pines were sequenced and assembled. To explore the relationship of sequence information and adaptation to a high mountain environment, data on needle morphological traits [needle length (NL), needle width (NW), needle thickness (NT), and fascicle width (FW)] and 19 climatic variables describing the patterns and intensity of temperature and precipitation at six natural populations were recorded. Geographic coordinates of altitude, latitude, and longitude were also obtained. The needle morphological data was combined with transcriptome information, location, and climate data, for a comparative analysis of the three five-needle pines. We found significant differences for needle traits among the populations of the three five-needle pine species. Transcriptome analysis showed that the phenotypic variation and environmental adaptation of the needles of P. koraiensis, P. sibirica, and P. pumila were related to photosynthesis, respiration, and metabolites. Analysis of orthologs from 11 Pinus species indicated a closer genetic relationship between P. koraiensis and P. sibirica compared to P. pumila. Our study lays a foundation for genetic improvement of these five-needle pines and provides insights into the adaptation and evolution of Pinus species.


INTRODUCTION
The five-needle pines are important commercial tree species, belonging to the genus Pinus in the Pinaceae family. They are key components of coniferous and broad-leaved mixed forest (Liu and Biondi, 2020;Goeking and Windmuller-Campione, 2021). There are 28 five-needle pine species, mainly distributed in Eurasia and North America, of which 12 five-needle pines are found in China. Only three pine species inhabit northeast China, which are Pinus koraiensis, Pinus sibirica, and Pinus pumila (Yang, 2014). Pinus koraiensis is distributed in Changbai, Zhangguangcailing, Laoyeling, Wanda, and Xiaoxinganling Mountain areas. Its northernmost boundary is Shengshan Forest Farm in Aihui County, Heihe City (Wu et al., 2021a,b). Pinus sibirica (Siberian pine) is only distributed in Altai, Xinjiang, and Daxinganling, Inner Mongolia. Pinus pumila (Siberian dwarf pine) only found in high-altitude mountain regions such as Daxinganling, Xiaoxinganling, Zhangguangcailing, and Changbai Mountains, scattered in small communities . These species show zonal distribution with variation in heat, cold, altitude, latitude, and longitude. They have commonalities and specificity in morphology and distribution. For instance, the external morphology and wood properties of P. koraiensis and P. sibirica are very similar, and it is difficult to distinguish them before seed production. Compared with P. koraiensis, P. sibirica had stronger cold resistance , while the natural distributions overlap. Pinus pumila is a perennial evergreen shrub with no vertical types observed. Its natural distribution overlaps with P. koraiensis and P. sibirica, but the altitude extends higher than that of P. koraiensis and P. sibirica. Furthermore, there is no reproductive isolation between the three five-needle pines species, and hybrid progeny have been obtained through interspecific hybridization (Goroshkevich et al., 2013). Although there are obvious differences between the cones and pine nuts of the three five-needle pine species, some scholars believe that P. koraiensis, P. sibirica, and P. pumila are just closely-related species (Dong et al., 2016).
Compared with other Pinaceae species, five-needle pines only contain one vascular bundle in the cross section of needles, and they are considered to be a relatively primitive and ancient species, which is important for studies of the origin and evolution of the Pinaceae (Liu et al., 2005). In addition, Pinus species are characterized by long life spans, and high levels of genetic diversity (Alicandri et al., 2020;Zeb et al., 2020). There are few reports on the genetic relationships of Pinus species, especially the five-needle pines (Niu et al., 2013;Baker et al., 2018;Zhao et al., 2018). The genetic relationships of P. pumila (Pall) Regel f. hingganensis, P. pumila and P. sibirica in the Daxinganling Mountains were examined based on peroxidase isoenzyme variation, and significant differences among the three five-needle pines was observed, and the genetic relationship between P. pumila and P. sibirica is relatively close (Chen et al., 1997). Using the same molecular makers, another study concluded that P. koraiensis, P. pumila, and P. sibirica were three distinctly different species (Shao et al., 2000). Following the development and application of DNA molecular markers and genome wide surveys, some studies explored the genetic relationship between P. koraiensis P. pumila, and P. sibirica using simple sequence repeats (SSR), and other markers of the nuclear, chloroplast, and mitochondrial genomes (Liu et al., 2005;Syring et al., 2005;Ayijiamali et al., 2013). There was obvious genetic differentiation among the three fiveneedle pines, but the genetic relationships among the three five-needle pines in Northern China was not clarified. Due to different genetic markers and research purposes, previous studies focused on the identification of Pinus species according to specific type of molecular marker, which cannot systematically identify the evolutionary relationships of species. Comparative transcriptome and ortholog analysis can play a crucial role in studies of plant evolution and can be used to explore genetic relationships of related species (Yang et al., 2017;Hu et al., 2018). At the moment, doubt remains about the genetic relationships among the three five-needle pines species as there is no genomic evidence to support this proposal. Clarifying the genetic relationships and taxonomy among these three five-needle pines would help us to understand the floral composition, origin and evolution of the northeast forest region of China, and would aid in the protection of these germplasm resources.
In this study, we compared the needles variation and transcriptome pattern of three endemic five-needle pines from different regions of northern China. In general, leaf morphological such as shape and size are considered as an important index for analyzing phenotypic genetic variation, which can reflect the adaptability of plants to different habitats and conditions (Guo et al., 2018). Expressed sequence tag-simple sequence repeat markers (EST-SSRs) were also identified and compared in these three five-needle pines. We also investigated the geneenvironment correlation of the three species. A total of 11 pine species, including two, three, and five-needle pines were compared to infer evolutionary relationships using transcriptome sequences. These results will contribute to greater understanding of the genetic differentiation and relationships of the three five-needle pines in the Pinus phylogeny.

Plant Materials
A total of 172 tree samples from six natural populations of three five-needle pines [P. pumila (Pall.) Regel, P. sibirica (Du Tour), and P. koraiensis (Siebold and Zuccarini)] were collected at the mature stage of needles development and used to document phenotypic variation ( Table 1). The collected samples were divided into three groups according to the species, and each species has two representative populations including P. pumila (ALS and LJ), P. sibirica (MG and IKT), and P. koraiensis (LS and MES). The altitude of the samples ranged from 231 to 2,052 m. Upon collection, the needles were quickly chilled and held at 4°C in a refrigerator for phenotypic measurement. For each individual, three needles were considered as one mixed sample. Four needle traits were measured, needle length (NL), needle width (NW), needle thickness (NT), and fascicle width (FW). The measurement methods were described Frontiers in Plant Science | www.frontiersin.org in a previous study (Liu et al., 2021). For each species, fresh needle samples were quickly frozen in liquid nitrogen and stored at −80°C refrigerator for RNA extraction and transcriptome sequencing.

Collection of Climate Data
To better understand the relationship of species distribution and climate factors, the climate data was downloaded from the WorldClim database. 1 The data package was at a resolution of 2.5 arc minutes (about 5 km 2 ), and the climate data was observed from 1950 to 2000. The climate data was imported into the DIVA-GIS software (v.7.5). 2 According to the geographic coordinates of the natural populations of the three five-needle pines, 19 bioclimatic variables (bio1 to bio19) including temperature and precipitation, for each population, were obtained. The bioclimatic variables, from bio1 to bio19, were annual mean temperature, diurnal temperature difference, isothermality, seasonal variance of temperature, extreme high temperature, extreme low temperature, temperature year difference, mean temperature in the wettest season, mean temperature in the most dry season, mean temperature in the warmest season, mean temperature in the coldest season, annual precipitation, precipitation in the wettest month, precipitation in the most dry month, variance coefficient of seasonal precipitation, precipitation in the wettest season, precipitation in the most dry season, precipitation in the warmest season, precipitation in the coldest season, respectively. The heatmap and cluster analysis of needle traits and climate variables were performed using TBtools (Chen et al., 2020). The correlation analysis of the needle traits and climate factors of different populations of the three five-needle pines was performed with the Corrplot (v. 0.84) package in R.

Identification of SSRs
The assembled transcriptome data from P. koraiensis, P. sibirica, and P. pumila in this study was employed to identify EST-SSR markers. A total of 58,526, 64,961, and 57,855 unigenes were used to detect the EST-SSRs of P. koraiensis, P. pumila, and P. sibirica using the Simple Sequence Repeat Molecular Marker Developer (SSRMMD), 3 respectively ( Table 2). The parameters for identifying SSR motifs were set with the minimum repeats of 10 for mononucleotides, seven for di-nucleotides, six for tri-, five for tetra-, four for penta-, and four for hexa-nucleotide motifs. Because of the artifactual biosynthesis of mononucleotide repeats during PCR amplification, the mononucleotide class was removed from the SSR distribution plot.

Transcriptome Sequencing and Data Processing
Total RNA was extracted from the needle samples using the RNAprep Pure Plant Kit (DP441, Tiangen). The integrated cDNA libraries were constructed using the methods described by Li et al. (2021). The high-quality cDNA libraries were paired-end sequenced using a 2 × 150 bp run on an Illumina HiSeq 2500. For quality control, the raw sequencing reads were processed and checked by fastp (version 0.12; Chen et al., 2018). 4 Because of the lack of a reference genome, the clean data were used for de novo transcript assemblies using Trinity (v2.11.0) for each species (Grabherr et al., 2011). To remove redundant transcripts CD-HIT (v4.8.1) was applied (Li and Godzik, 2006;Fu et al., 2012). The TransDecoder tool 5 was then employed to predict coding sequences (CDS) from the assembled transcripts. The longest transcript of the isoforms was considered as a unigene for comparative transcriptomic and phylogenetic analysis. The functional annotation of unigenes from each species was based on the uniport 6 and pfam 7 databases. Based on the common protein ID, differential expression analysis of the three five-needle pine species was performed using the DEseq2 R package (v1.26.0; Love et al., 2014). Gene Ontology (GO) enrichment of differentially expressed genes (DEGs) was carried out using Omicsmart tools. 8

Orthologs and Phylogeny of Eleven Pinus Species
Orthologs are set of homologous genes that have the same common ancestral gene after speciation (Koonin, 2005). The transcriptome data of the other eight pine species was obtained from the National Center for Biotechnology Information (NCBI).  (SRR13823418), and Pinus massoniana (SRR13823530). All of the Illumina sequence data for each species was independently assembled using Trinity. The assembled sequences were screened to remove redundancies and then clustered using CD-HIT (v4.8.1). The CDS were predicted by Transdecoder. To identify the evolutionary relationships of the Pinus species, the orthologs of proteins from 11 species were compared using OrthoFinder (Emms and Kelly, 2015) based on sequence similarity. The parameters were: "-inflation 1.5., " the phylogenetic tree construction was done with FastTree, and multiple sequences were aligned using DIAMOND (Buchfink et al., 2015).

Phenotypic Differentiation of Needle Traits of Three Five-Needle Pines
The three five-needle pines possessed specific genetic backgrounds and natural distributions, which resulted in their phenotypic differentiation and climatic adaptation. Analysis of phenotypic variation was carried out to quantify the variation of needle traits. Around 172 needle samples from six natural populations (MG, LJ, ALS, IKT, LS and MES; Table 1; Figure 1A) were collected in present study, which showed clear differences ( Figure 1B). The needles length of P. koraiensis was the longest with a mean of 110 mm, followed by P. sibirica (94.7 mm), and P. pumila, which was the shortest (77.7 mm), showing significant genetic variation (p < 0.05). In addition to NL/NT and FW/NT, there are significant differences between other needle traits among populations of the three five-needle pine species (p < 0.05; Figures 1C-H; Supplementary Table S1).

Correlation of Needle Traits and Climate Factors
To explore the correlations between needle traits and climatic factors, heatmap, cluster, and correlation analyses were performed. The climate data for the six populations of the three fiveneedle pines was obtained from the WorldClim database using DIVA-GIS as shown in Figure 2A and Supplementary Table S2.
The heatmap suggested that the P. sibirica populations (IKT and MG) are mainly distributed in the high latitudes, where there is relatively low annual precipitation (bio12), precipitation in the wettest month (bio13), precipitation in the wettest season (bio16), and precipitation in the warmest season (bio18). For P. koraiensis, the climate factors, including annual mean temperature (bio1), extreme high temperature (bio5), mean temperature in the wettest season (bio8), and mean temperature in the warmest season (bio10) in its natural distribution, were higher than that for P. sibirica and P. pumila. Its average NL, NW, NT, and FW were also the largest compared with the other two five-needle pines. More importantly, these climate factors and needle traits were significantly clustered into the same subgroup in the cladogram, indicating a clear correlation. P. pumila, was naturally distributed in the mountains (high altitude) compared with P. koraiensis and P. sibirica ( Table 1).
The average values of extreme high temperature (bio5), mean temperature in the wettest season (bio8) and mean temperature in the warmest season (bio10) were all significantly lower than that for P. koraiensis and P. sibirica. Interestingly, The ALS population from P. pumila exhibited the lowest annual mean temperature (bio1), extreme low temperature (bio6), mean temperature in the driest season (bio9), and mean temperature in the coldest season (bio11), which indicated that P. pumila has strong cold resistance, and that it can survive extremely low temperatures down to minus −39°C. Four needle traits and eight temperature factors were grouped into the same subcluster.
To further understand the correlation between the needle traits of five-needle pines and climate factors, the Corrplot R package was used for correlation analysis ( Figure 2B). Needle length was significantly correlated with isothermality (p < 0.05), extreme high temperature (p < 0.01), mean temperature in the wettest season (p < 0.05), and mean temperature in the most dry season (p < 0.05). There were significant correlations between the needle width and annual mean temperature (p < 0.05) and extreme high temperature (p < 0.01), mean temperature in the wettest season (p < 0.01), and mean temperature in the driest season (p < 0.01). No correlation was found between the four needle traits and precipitation as well as longitude and latitude.

Transcriptome Sequencing and de novo Assembly
The expression of genes in needles of the three five-needle pine species was different. Using the DEseq2 software, we investigated the DEGs of three pairs of species (PK_vs_PP, PP_vs_PS, and PK_vs_PS). From the Venn diagram, we could see that 292, 130, and 62 DEGs were identified from the PK_vs_PP, PP_vs_PS, and PK_vs_PS ( Figure 3A). Nineteen genes were common to the three pairs. GO function annotation reveled that all of the DEGs were divided into 44 level-2 functional classifications, including 18 terms of biological processes (BPs), 14 terms of cellular components, and 12 terms of molecular functions (MFs). The top terms were cellular processes, metabolism, cell components (CCs), and catalytic activity ( Figure 3B). To investigate more precisely the functions of the identified DEGs, GO enrichment analysis of different pairs of species was performed. For PK vs. PP, the identified DEGs were enriched in genes for photosynthesis (GO:0015979),  Figure 3E).

Ortholog Identification and Functional Characterization of the Three Five-Needle Pine Species
To further investigate the genetic variation of the three fiveneedle pine species, we identified unique and shared gene families and inquired about their biological functions using OrthoFinder. The Venn diagram indicates that 13,134, 15,864, (B) Correlation analysis of 25 characteristic related to needle traits and climate factors. The analysis was performed using R software. One asterisk (*) indicates the p < 0.05; two asterisks (**) means p < 0.01; and three asterisks (***) means p < 0.001. The circle from small to large indicates the corresponding correlation coefficient from low to high. The color scale of red and blue refers to negative correlations and positive correlations, respectively. NL, needle length; NW, needle width; NT, needle thickness; FW, fascicle width; E, longitude; N, latitude; bio1, annual mean temperature; bio2, diurnal temperature difference; bio3, isothermality; bio4, seasonal variance of temperature; bio5, extreme high temperature; bio6, extreme low temperature; bio7, temperature year difference; bio8, mean temperature in the wettest season; bio9, mean temperature in the most dry season; bio10, mean temperature in the warmest season; bio11, mean temperature in the coldest season; bio12, annual precipitation; bio13, precipitation in the wettest month; bio14, precipitation in the most dry month; bio15, variance coefficient of seasonal precipitation; bio16, precipitation in the wettest season; bio17, precipitation in the most dry season; bio18, precipitation in the warmest season; and bio19, precipitation in the coldest season.
Frontiers in Plant Science | www.frontiersin.org and 15,419 gene families were identified among P. koraiensis, P. pumila, and P. sibirica, respectively ( Figure 5A). Of these putative gene families, the largest number of unique gene families were found in P. pumila with 1,604, followed by P. sibirica (1,280), while P. koraiensis (818)  Stomatal complex formation (GO:0010376) could also be noted, suggesting a specific function during the evolution of this species (Figure 5D).

Phylogenetic Analyses
To identify the gene families and their evolution of P. koraiensis (18,221 proteins), P. sibirica (19,337 proteins), and P. pumila (21,854 proteins) and other eight Pinus species, a comparative analysis of the orthologous sets was carried out ( Table 3). The public transcriptome data of the other eight confer species was obtained from the NCBI database. The number of gene families in P. massoniana (17,367) was the highest, while P. parviflora (13,284) contains the lowest number of gene families between the 11 pine species. Additionally, the largest number of single copy genes was found in P. morrisonicola (7,082), followed by P. massoniana (4,673), P. koraiensis (3,808), P. pumila (3,720), and P. yunnanensis (2,955). To clarify the evolutionary status of the five-needle pines, phylogenetic analysis was conducted using orthologous genes. The results were in good agreement with classical taxonomy of Pinus species. In a phylogenetic tree, the candidate 11 Pinus species were grouped into two clear subsections, which consists of a two/three-needle pine group I, including P. yunnanensis, P. elliottii, P. tabuliformis, and P. massoniana and a fiveneedle pine group II including P. koraiensis, P. pumila, P. sibirica, P. armandii, P. squamata, P. parviflora, and P. morrisonicola according to the orthologous genes (Figure 6). Of these species, the three five-needle pines (P. koraiensis, P. sibirica, and P. pumila) of our interest were clustered, showing a close relationship. P. koraiensis and P. sibirica possess a high level of genetic relatedness, both showing a tall tree form, while they specifically separate from P. pumila (showing a shrub like form) at the species level. Around 10,153 gene families were conserved across P. koraiensis, P. pumila, and P. sibirica. GO enrichment analysis of the specific gene families in P. koraiensis (B), P. sibirica (C), and P. pumila (D). The x-axis indicates GO terms belonging to BPs and MFs, and the y-axis represents the percent of genes.

Phenotypic Variation and Environmental Adaptation of Three Five-Needle Pines
Plant leaf/needle traits have phenotypic plasticity due to climate change (Asao et al., 2020). Previous studies found phenotypic traits of P. koraiensis differed significantly between and within different populations (Tong et al., 2019). Also, the leaf fresh weight, dry weight, and moisture content of trees in different populations of P. pumila show significant (p < 0.01) differences (Wei et al., 2018). However, the relationship between genetic variation and environmental adaptation for needle traits among P. koraiensis, P. sibirica, and P. pumila was still unclear. In the present study, there are significant differences in needle length and needle width among these three five-needle pines. The needle length of P. koraiensis was the longest, followed by P. sibirica and P. pumila, which may be related to their specific geographic distribution during recent evolution. In the natural range of P. koraiensis, high precipitation and high temperature occur in the summer, forming suitable climatic conditions, for its needle growth and development. A high annual mean temperature and extreme high temperature were also found. In contrast to P. koraiensis, P. sibirica, and P. pumila are mainly distributed in alpine or high latitude areas, characterized by relatively low temperatures, which may be limiting factors for the species distribution. Correlation analysis also confirmed these results and indicated that the genetic variation of needle traits significantly correlated with temperature. Although P. sibirica and P. pumila can grow in the regions with relatively low temperature, P. pumila was mainly distributed in the alpine areas with high precipitation and high average latitude of 1,689 m. Thus, climate factors, especially temperature and altitude, may appear to be the main factors controlling the growth and development of needles and thereby determine the species distribution of P. koraiensis, P. sibirica, and P. pumila.

The Mechanisms of Adapting to Habitat Conditions
Although plants grow in specific geographical environments, their physiology and metabolism need to take place under appropriate temperature, light, water, and rhizosphere environments. To further understand the regulatory mechanisms controlling the variation of needles in these three five-needle pines, transcriptome analysis was performed.
In the present study, the DEGs of PK_vs_PP specifically were assigned to photosynthetic and homeostatic terms in GO enrichment analysis, suggesting that the DEGs involved in needle morphogenesis of P. koraiensis and P. pumila participate in complex metabolic processes. Previous studies have confirmed that light intensity, photoperiod and light composition affect many characteristics of plant development Jan et al., 2021). The GO analysis indicates a role for genes acting in isoprenoid metabolism, which regulates primary and secondary metabolism in plants (Lange et al., 2020;Mahmoud et al., 2021). The correlation may be the result of plant adaptation to the environment during evolution (Pulido et al., 2012), possibly to plant-environment interactions (Tadjouate et al., 2021). The upregulated DEGs in the needles of P. pumila, may therefore be related to its adaptation to an alpine habitat, determining needle growth and development ( Figure 7A; Supplementary Table S9).
Pinus koraiensis and P. sibirica are five-needle pines (subgroup Strobus) of the single vascular bundles with similar morphology. In this study, the GO enrichment analysis of DEGs between P. koraiensis and P. sibirica showed that these genes are mainly related with primary metabolism through the tricarboxylic acid cycle, citrate metabolism, and porphyrin catabolism and involved in respiration and chlorophyll formation. The expression of one gene (SGR_CAPAN) in P. koraiensis was higher than that in P. sibirica, and it mainly triggers chlorophyll degradation during leaf senescence (Figure 7B). The distribution pattern of chlorophyll content in needles and response to water stress of P. koraiensis and P. sibirica display significant differences, supporting the results reported here (Ayijiamali et al., 2013). In addition, The GO terms "protein phosphorylation" were also identified in enrichment analysis of PK vs. PS. Protein phosphorylation plays an important role in plant signal transduction, epigenetic regulation, and growth and development (Li et al., 2020). The heatmap and correlation analysis from needle traits and climate data indicated that there are relatively high temperatures and high precipitation in natural populations of P. koraiensis compared with that of P. sibirica, perhaps related to leaf development.
Pinus sibirica and P. pumila are important conifer trees in Daxinganling Mountains of China. Presumably, owing to the difference in altitude, the morphology and chemical composition of the needles show clear differentiation. Although the needle length of P. pumila was the lowest in the present study, P. pumila possesses strong low temperature tolerance compared with P. sibirica, which may be related to biosynthesis of cryoprotective compounds, similar to results previously reported (Jin et al., 1998). Carbohydrate biosynthesis was a major GO term in PP vs. PS, presumably due to the involvement of starch and sugar in leaf/needle development. Similar results were also found in other plants such as Vitis vinifera (Gao et al., 2021), Hevea brasiliensis (Gersony et al., 2020), and Pinus thunbergia . Additionally, regulation of morphogenesis, organization of external encapsulating structures and cell wall biosynthesis GO terms were identified, which may be related to the difference in stem morphology (apical dominance) between P. sibirica and P. pumila. Pinus sibirica is a perennial tall tree, while P. pumila is a creeping shrub, and to date, there is no erect type for this species. Therefore, we presume that the DEGs are probably involved in plant morphogenesis including stem and leaf development, especially the formation of cell wall organization (such as BC10_ORYSJ and MNS3_ARATH) as they show high expression on the heatmap ( Figure 7C). Similar results were also found in the GO enrichment of gene families of P. pumila, especially for cellulose catabolism. One gene (BRH1_ ARATH), involved in the brassinosteroid (BRs) signaling pathway was identified, which specifically regulates the growth and development of rosette leaves and can be a candidate target gene for needle development (Figure 7D).

Identification of Evolutionary Status
Owing to ecological and economical values, P. koraiensis, P. sibirica, and P. pumila were widely employed for reforestation and industrial uses. Investigating the genetic relationship and taxonomic status among these three species is significant for elucidating their evolutionary relationships and accelerating genetic improvement. Due to the limited amount of genomic information, the identification and extent of functional gene relationships of the present five-needle pines remains confusing. Transcriptome sequence has become a powerful tool for functional gene mining, estimation of genetic diversity, and identification of gene specific relationships (Estelle et al., 2021;Yu et al., 2021). Using RNA-seq technology, analysis of orthology for phylogenetic relationships and evolution between different species, have been carried out (Peng et al., 2019). In the present study, pairwise orthologs from different transcriptomes from 11 Pinus species were identified and used for estimation of evolutionary distance. Phylogenetic analysis showed that P. koraiensis, P. sibirica, and P. pumila in the present study belong to the same Pinus subsection (Strobus), displaying a close genetic relationship, consistent with the existing taxonomy. We observed that P. koraiensis and P. sibirica maintain relatively close genetic relationship compared with P. pumila. Similar results were reported in previous study of spatiotemporal evolution of global pines (Jin et al., 2021). Furthermore, in subgroup II, P. pumila and P. parviflora were divided into the same group of fiveneedle pines, which may correspond to the traits of short needles among them. The genetic relationships based on the EST-SSRs are highly consistent with the results of a neighborjoining (NJ) tree inferred from chloroplast and mitochondrial DNA fragments in previous studies (Yang, 2014). As a whole, there is a close genetic relationship between P. koraiensis and P. sibirica, but their biological characteristics changed to adapt to variation in temperature, precipitation, and location. Currently, there is limited genetic information available for the study of phenotypic variation, environmental adaptation, and evolution of these three pine species. In this study, we have systematically investigated the phenotypic and genetic differentiation of these three pine species combining morphological, climatic, and transcriptome data. These results provide new insights into gene evolution and contribute to the future genetic improvement of these three pine species.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://ngdc.cncb.ac.cn/ search/?dbId=&q=PRJCA006786, PRJCA006786.

AUTHOR CONTRIBUTIONS
XZ, RS, and WM conceived and designed the research. XL and KC performed the experiments. QZ, HL, and XW analyzed the data. XL wrote the manuscript. MT and RS revised the manuscript. XZ provided the funding. All authors contributed to the article and approved the submitted version.