The structure of the tetraploid sour cherry ‘Schattenmorelle’ (Prunus cerasus L.) genome reveals insights into its segmental allopolyploid nature

Wöhner, Thomas W.; Emeriewen, Ofere F.; Wittenberg, Alexander H. J.; Nijbroek, Koen; Wang, Rui Peng; Blom, Evert-Jan; Schneiders, Harrie; Keilwagen, Jens; Berner, Thomas; Hoff, Katharina J.; Gabriel, Lars; Thierfeldt, Hannah; Almolla, Omar; Barchi, Lorenzo; Schuster, Mirko; Lempe, Janne; Peil, Andreas; Flachowsky, Henryk

doi:10.3389/fpls.2023.1284478

ORIGINAL RESEARCH article

Front. Plant Sci., 01 December 2023

Sec. Functional and Applied Plant Genomics

Volume 14 - 2023 | https://doi.org/10.3389/fpls.2023.1284478

Thomas W. Wöhner¹*

Ofere F. Emeriewen¹

Alexander H. J. Wittenberg²

Koen Nijbroek²

Rui Peng Wang²

Evert-Jan Blom²

Harrie Schneiders²

Jens Keilwagen³

Thomas Berner³

Katharina J. Hoff⁴

Lars Gabriel⁴

Hannah Thierfeldt⁴

Omar Almolla⁵

¹Institute for Breeding Research on Fruit Crops, Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Dresden, Saxony, Germany
²KeyGene N.V., Wageningen, Netherlands
³Institute for Biosafety in Plant Biotechnology, Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Quedlinburg, Saxony-Anhalt, Germany
⁴Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Mecklenburg-Western Pomerania, Germany
⁵Dipartimento di Scienze Agrarie, Forestali e Alimentari (DISAFA) – Plant Genetics, University of Turin, Grugliasco, Italy

Sour cherry (Prunus cerasus L.) is an important allotetraploid cherry species that evolved in the Caspian Sea and Black Sea regions from a hybridization of the tetraploid ground cherry (Prunus fruticosa Pall.) and unreduced pollen of the diploid sweet cherry (P. avium L.) ancestor. When and where this species evolved, and how hybridization affected its genome structure, remain unclear. To gain insight, the genome of the sour cherry cultivar ‘Schattenmorelle’ was sequenced using Illumina NovaSeq™ and Oxford Nanopore long-read technologies, resulting in a ~629-Mbp pseudomolecule reference genome. The genome could be separated into two subgenomes, with subgenome Pce_S_a originating from P. avium and subgenome Pce_S_f originating from P. fruticosa. The genome also showed size reduction compared to the ancestral species and traces of homoeologous sequence exchanges throughout. Comparative analysis confirmed that the genome of sour cherry is segmental allotetraploid and evolved very recently.

1 Introduction

Cherries include several species of the genus Prunus, which belong to the sub-family Spiraeoideae in the plant family Rosaceae (Potter et al., 2007). The two economically most important cherry species worldwide are the sweet cherry (Prunus avium L.) and the sour cherry (Prunus cerasus L.). Both species are thought to have originated in the Caspian Sea and Black Sea region (Quero-García et al., 2019). Sour cherry commercial cultivation is concentrated in Eastern and Central Europe, North America, and Central and Western Asia, covering 217,960 ha. In 2021, the global production reached 1.51 million tons of fruit, with a production value of $1.2 billion in 2020 (https://www.fao.org/faostat/en/#data). Primarily grown for the production of jams, juices, and preserved or dried whole fruits, they also find use in dairy products and baked goods. Sour cherries display significant variation in morphological traits, including fruit characteristics and tree growth. This diversity is found within ecotypes and includes traits like cold tolerance and growth habits, which have been selectively bred across Europe over time (Dirlewanger et al., 2007; Hancock, 2008; Schuster et al., 2017). However, just a small number of cultivars actually dominate the cultivation of sour cherry. ‘Schattenmorelle’ is the dominant cultivar (cv) in Middle Europe (Figure 1), whereas sour cherry production in the United States is still based on Montmorency (Quero-García et al., 2019). ‘Schattenmorelle’ was first described in France and today it is known in many countries with different names. In Poland, for example, it is called Łutovka, and in France, it is called Griotte du Nord or Griotte Noir Tardive. The sour cherry is an allotetraploid with 2n=4x=32 chromosomes. It originated as a hybrid of an unreduced 2n pollen grain of P. avium (2n=2x=16) and a 1n egg cell of the tetraploid ground cherry P. fruticosa (2n=4x=32) (Kobel, 1927; Oldén and Nybom, 1968). 
Evidence of hybridization events between sweet and ground cherries has been found several times in areas where both species co-occur (Macková et al., 2018; Hrotkó et al., 2020). The resulting hybrids were usually triploid and were assigned to the secondary species P. ×mohacsyana Kárpáti. Natural occurrences of tetraploid sour cherries can be found in Eastern Turkey and the Caucasus region. There, they grow in forests and are used as wild forms for fruit production. The actual area of origin remains unknown. Although P. cerasus can also be found in the wild in Europe, it is rather unlikely that those sour cherries are spontaneous hybrids. Since sour cherries are cultivated in many areas of the Northern Hemisphere, such trees are more likely allochthonous individuals. The origin of the sour cherry thus seems to be based on a few hybridization events. The results obtained by Oldén and Nybom (1968) in experiments on the resynthesis of the species P. cerasus confirmed this hypothesis. The progeny from crosses between sweet and ground cherry showed the characteristic phenotype of the sour cherry. Studies based on chloroplast DNA markers strongly suggest that hybridization between P. avium and P. fruticosa led to the emergence of P. cerasus at least twice (Dirlewanger et al., 2007). Furthermore, the hypothesis was also confirmed by genomic in situ hybridization (Schuster and Schreibner, 2000) and transcriptome sequencing (Bird et al., 2022).

Figure 1 Morphology of P. cerasus L. ‘Schattenmorelle’. (A) Mature tree habitus, (B) leaves, (C) inflorescence, (D) fruits, (E) genome sequencing and assembly strategy (created with BioRender.com).

The sour cherry genome is estimated at 599 Mbp (Dirlewanger et al., 2007) with two subgenomes, each featuring eight chromosomes in the haploid set. One subgenome originates from sweet cherry (Pce_a), and the other from ground cherry (Pce_f). Nonetheless, the genome is not entirely allopolyploid, as there is long-standing suspicion that segments of the sour cherry genome are of different origin (Raptopoulos, 1941; Oldén and Nybom, 1968; Beaver and Iezzoni, 1993; Schuster and Wolfram, 2008). The origins and stability of hybridization and polyploidization between sweet and ground cherry remain unexplored. Cai et al. (2018) suggest that a combination of multi- and bivalent pairing may have led to chromosome segregation imbalance in sour cherry, indicating an ongoing genome stabilization process (Mason and Wendel, 2020). Recent genome sequencing advances, including studies by Zhang et al. (2021), Edger et al. (2019), Bertioli et al. (2019), Wu et al. (2021), and Wang et al. (2019), provide insight into the intricate structure and evolution of polyploid genomes. We present a high-quality pseudo-chromosome-level genome assembly of tetraploid sour cherry ‘Schattenmorelle’ (referred to as Pce_S), created using a combination of Illumina NovaSeq short-read and Oxford Nanopore long-read sequencing. Hi-C technology was employed to scaffold the sequences into chromosomes. Additionally, we generated a full-length transcriptome of ‘Schattenmorelle’ with PacBio Sequel II SMRT cell long-read technology. Comparative sequence and amino acid analyses were conducted across datasets of Prunus avium cv Tieton (Pa_T) and Prunus fruticosa ecotype Hármashatárhegy (Pf_eH), representing the two ancestral species, alongside the two subgenomes of ‘Schattenmorelle’ (Pce_S_a and Pce_S_f). These analyses shed light on the evolution of sour cherry and reveal homoeologous exchanges (HE) within the sub-genomic structure, explaining the segmental allopolyploidy in the sour cherry genome.

2 Materials and methods

2.1 Plant material, DNA and RNA extraction, sequencing and Iso-Seq analysis

Snap-frozen young leaf material of Prunus cerasus L. ‘Schattenmorelle’ (accession KIZC99-2, Figure 1, supplements 1.1) was sent to KeyGene N.V. (Wageningen, The Netherlands). Extracted high-molecular-weight DNA (Wöhner et al., 2021) was used to generate 1D ligation (SQK-LSK109) libraries that were subsequently sequenced on two Oxford Nanopore Technologies (ONT) R9.4.1 PromethION flow cells. The same material was used to generate an Illumina PCR-free paired-end library (insert size of ~550 bp) that was sequenced on an Illumina NovaSeq™ platform using 150-bp and 125-bp paired-end sequencing.

Snap-frozen tissues from buds, flowers, leaves, and fruits were collected in the field, and total RNA was extracted with the Maxwell® RSC Plant RNA Kit (Promega). Two pools were generated and used for PacBio Iso-Seq library preparation (Procedure & Checklist – Iso-Seq™ Express Template Preparation for Sequel® and Sequel II Systems, PN 101-763-800 Version 02). Each library pool was sequenced on a single 8M ZMW PacBio Sequel II SMRT cell (supplements 1.1). Full-length reads containing the 5′-end primer, the 3′-end primer, and the poly-A tail were retained, and these sequences were trimmed off. Transcripts containing (artificial) concatemers were completely discarded. Isoforms (consensus sequences) generated by full-length read clustering (based on sequence similarity) were finally polished with non-full-length reads using Arrow (SMRT Link v7.0.0, https://www.pacb.com/wp-content/uploads/SMRT_Tools_Reference_Guide_v600.pdf).

2.2 De novo assembly and scaffolding

The aligner Minimap2 (v2.16-r922, Li, 2018) and the assembler Miniasm (v0.2-r137-dirty, Li, 2016) were used to generate the raw assembly. Racon (v1.4.10, Vaser et al., 2017) and Pilon (v1.22, Walker et al., 2014) were used for base-quality improvement with raw ONT and Illumina read data. Chromosome-scale scaffolding was performed by Phase Genomics (Seattle, Washington, USA) with Proximo Hi-C (supplements 1.2). The resulting assembly was designated as 20-WGS-PCE_<Avium|Fruticosa>.2.0_<Contig|Scaffold> (Figure 1E).

2.3 Correctness, completeness, and contiguity of the Prunus cerasus genome sequence

The BUSCO (Benchmark Universal Single-Copy Orthologs – Galaxy Version 4.1.4) software was used for quantitative and quality assessment of the genome assemblies based on near-universal single-copy orthologs. The long terminal repeat (LTR) assembly Index (LAI) (Ou et al., 2018) was calculated with LTR_retriever 2.9.0 (https://github.com/oushujun/LTR_retriever) to evaluate the assembly continuity between the final genome sequence of P. cerasus ‘Schattenmorelle’ and Prunus fruticosa ecotype Hármashatárhegy (Wöhner et al., 2021, Pf_1.0), P. avium Tieton (Wang et al., 2020), and P. persica Lovell (Verde et al., 2017), respectively. LTR_harvest (genometools 1.6.1 implementation) was used to obtain LTR-RT candidates. The genome size was also estimated by k-mer analysis (supplements 1.3) using Illumina short read data. The merged datasets were subsequently used to generate a histogram dataset representing the k-mers from all datasets. GenomeScope (Galaxy Version 2.0, Ranallo-Benavidez et al., 2020) was used to generate a histogram plot of k-mer frequency of different coverage depths using the tetraploid ploidy level (k-mer length 19). Marker sequences and genetic positions from five available genetic sour cherry maps (M172x25-F1, US-F1, 25x25-F1, Montx25-F1, and RE-F1) and 14,644 SNP markers (9 + 6k array) were downloaded from the Genome Database for Rosaceae (GDR, https://www.rosaceae.org/). The marker sequences were mapped on the chromosome sequences using the mapping software bowtie2 (Galaxy Version 2.5.0+galaxy0, Langmead and Salzberg, 2012) implementation on the Galaxy server (https://usegalaxy.org) with standard settings.
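The k-mer histogram that serves as input to GenomeScope can be illustrated with a minimal sketch. This is not the Meryl implementation used in the study; the function names and the toy reads are hypothetical, and counting canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) is an assumption about how strand is handled.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=19):
    """Count canonical k-mers, then tabulate how many distinct k-mers occur at each depth."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    # Histogram: depth -> number of distinct k-mers observed at that depth
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    # Toy example with k=3 for readability; the study used k=19
    print(kmer_histogram(["ACGTA", "ACGTA"], k=3))
```

In a tetraploid, peaks in such a histogram at multiples of the base depth correspond to k-mers present in one, two, three, or four haplotype copies.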

2.4 Structural and functional annotation

For an interspecies repeat comparison, a species-specific repeat library was generated with RepeatModeler open-1.0.11, and the genome was subsequently masked with RepeatMasker open-4.0.7. For structural genome annotation, another species-specific repeat library for PCE_1.0 was generated with RepeatModeler2 (Flynn et al., 2020) version 2.0.2, and the genome was subsequently masked with RepeatMasker 4.1.2. Further details on the repeat masking software configuration are available in supplements 1.4.1.

To generate extrinsic evidence for structural annotation of protein coding genes, short-read RNA-Seq library SRR2290965 (Jo et al., 2015) was aligned to the genome using HiSat2 version 2.1.0 (Kim et al., 2019). The output SAM file was converted to BAM format using SAMtools (Li et al., 2009). The resulting alignment file was further used by both BRAKER1 (Hoff et al., 2016; Hoff et al., 2019) and GeMoMa (Keilwagen et al., 2016; Keilwagen et al., 2019).

Furthermore, a custom protocol was used for integrating long-read RNA-Seq data into genome annotation (supplements 1.4.2). In short, protein coding genes were called in Cupcake transcripts using GeneMarkS-T (Tang et al., 2015), and these predictions were converted to hints for BRAKER1. In addition, intron coverage information from long-read to genome-spliced alignment with Minimap2 (Li, 2018) was provided to BRAKER1.

A combination of BRAKER1 (Hoff et al., 2016; Hoff et al., 2019), BRAKER2 (Brůna et al., 2021), TSEBRA (Gabriel et al., 2021), and GeMoMa (Keilwagen et al., 2016) was used for the final annotation of protein coding genes. BRAKER pipelines use a combination of evidence-supported self-training GeneMark-ET/EP (Lomsadze et al., 2014; Brůna et al., 2020) (here version 4.68) to generate a training gene set for the gene prediction tool AUGUSTUS (Stanke et al., 2008; here version 3.3.2). BRAKER1 version 2.1.6 was provided with BAM files from short- and long-read RNA-Seq to genome alignments, and with gene structure information derived from Cupcake transcripts using GeneMarkS-T. This generated a gene set that consists of ab initio and evidence-supported predictions. A separate gene set was generated with BRAKER2, which uses protein evidence to generate a gene set. We used the OrthoDB version 10 (Kriventseva et al., 2019) partition of plants in combination with the full protein sets of Prunus fruticosa (Wöhner et al., 2021), Prunus armeniaca (GCA_903112645.1), Prunus avium (GCF_002207925.1), Prunus dulcis (GCF_902201215.1), Prunus mume (GCF_000346735.1), and Prunus persica (GCF_000346465.2) as input for BRAKER2. Both the BRAKER1 and BRAKER2 AUGUSTUS gene sets were combined with a GeneMarkS-T derived gene set using TSEBRA (Gabriel et al., 2021) from the long_reads branch on GitHub with a custom configuration file (supplements 1.4.3) incorporating evidence from BRAKER1 and BRAKER2.

GeMoMa was run on the genome assembly of ‘Schattenmorelle’ using 14 reference species and experimental transcript evidence (supplements 1.4.4). GeMoMa gene predictions of each reference species were combined with TSEBRA predictions using the GeMoMa module GAF, and subsequently, UTRs were predicted in a two-step process based on mapped Iso-Seq and RNA-seq data using the GeMoMa module AnnotationFinalizer (supplements 1.4.5). First, UTRs were predicted based on Iso-Seq data. Second, UTRs were predicted based on RNA-seq data for gene predictions without UTR prediction from the first step. An assembly hub for visualization of the Prunus cerasus genome with structural annotation was generated using MakeHub (Hoff, 2019; supplements 2). The functional annotation was performed with the Galaxy Europe implementation of InterProScan (Galaxy Version 5.59-91.0+galaxy3, Zdobnov and Apweiler, 2001; Quevillon et al., 2005; Hunter et al., 2009; Cock et al., 2013; Jones et al., 2014). The chloroplast and mitochondria sequences were annotated with GeSeq (Tillich et al., 2017, supplements 1.4.5).

2.5 Identification of syntenic regions

Structural comparison of orthologous loci between the subgenomes Pce_S_a and Pce_S_f of Prunus cerasus and the two genotypes Pa_T and Pf_eH, as representatives of the two genome donor species P. avium and P. fruticosa, was calculated with the final annotations using SynMap2 (Haug-Baltzell et al., 2017) available at the CoGe platform (https://genomevolution.org/coge/). Analysis of triplication events was performed in SynMap2 with standard settings, with Last selected as the Blast algorithm and a coverage depth ratio of 3:3 (Haug-Baltzell et al., 2017).

2.6 Identification of homoeologous exchange regions

Homoeologous exchanges were identified on the amino acid, transcript, and genomic level.

2.6.1 Calculation of amino acid identity

The identity of amino acids (IAA) between each reference annotation and the corresponding homology-based gene predictions was calculated by GeMoMa using the default parameters. Subsequently, the Pce_S genome was divided into 250k windows, and for each window the percentage of proteins showing a higher IAA to Pf_eH (Wöhner et al., 2021) or to Pa_T (Wang et al., 2020) was determined for the respective subgenome (Pce_S_a and Pce_S_f). The percentage of proteins in each window that were more similar to Pa_T was finally subtracted from the percentage of proteins that were more similar to Pf_eH. A higher proportion of transcripts with intraspecific amino acid identity (between Pa_T and Pce_S_a or Pf_eH and Pce_S_f) is expected compared to the proportion of transcripts with an interspecific amino acid identity match (Pf_eH and Pce_S_a or Pa_T and Pce_S_f). Opposite cases indicate potential translocations between the two subgenomes Pce_S_a and Pce_S_f and were plotted into a circos plot (Figure S1).
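The window-wise subtraction described above can be sketched as follows. This is an illustrative reconstruction, not the GeMoMa-based workflow itself; the input layout (a gene start coordinate plus one IAA value per donor genome) and the function name are assumptions made for the example.

```python
from collections import defaultdict

WINDOW = 250_000  # window size stated in the text


def window_scores(proteins, window=WINDOW):
    """Return {window_index: % closer to Pf_eH minus % closer to Pa_T}.

    proteins: iterable of (start, iaa_to_pf_eh, iaa_to_pa_t) for one chromosome.
    Proteins with identical IAA to both donors count only towards the total.
    """
    tallies = defaultdict(lambda: [0, 0, 0])  # [closer to Pf_eH, closer to Pa_T, total]
    for start, iaa_pf, iaa_pa in proteins:
        tally = tallies[start // window]
        tally[2] += 1
        if iaa_pf > iaa_pa:
            tally[0] += 1
        elif iaa_pa > iaa_pf:
            tally[1] += 1
    return {
        index: 100.0 * (closer_pf - closer_pa) / total
        for index, (closer_pf, closer_pa, total) in sorted(tallies.items())
    }


if __name__ == "__main__":
    toy = [
        (10, 0.90, 0.80),
        (20, 0.70, 0.95),
        (30, 0.60, 0.99),
        (40, 0.50, 0.50),
        (300_000, 0.99, 0.50),
    ]
    print(window_scores(toy))
```

Under this sign convention, a window on Pce_S_a would be expected to score below zero and a window on Pce_S_f above zero; a window with the opposite sign is a candidate for the "opposite cases" mentioned in the text.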

2.6.2 Read mapping and coverage analysis

RNAseq raw data published by Bird et al. (2022) were obtained from the NCBI Sequence Read Archive (SRA) for the following species: P. cerasus (SRX14816146, SRX14816142, and SRX14816138), P. fruticosa (SRX14816141), P. avium (SRX14816143), P. canescens (SRX14816137), P. serrulata (SRX14816136), P. mahaleb (SRX14816140), P. pensylvanica (SRX14816144), P. maackii (SRX14816139), and P. subhirtella (SRX14816145). Reads were adapter and quality trimmed using the software Trim Galore (version 0.6.3, parameters --quality 30 --length 50). Trimmed reads were mapped against the P. cerasus subgenomes Pce_S_a and Pce_S_f using STAR (version 2.7.8a, parameter --twopassMode Basic). The subsequent analysis was performed in accordance with Keilwagen et al. (2022). The Pce_S genome was divided into 250k windows. The percentage of covered bases using RNAseq data of P. cerasus (SRX14816146, SRX14816142, and SRX14816138) was estimated at a depth of 1 for each window. The same was done with all other RNAseq datasets. The percentage of covered bases from P. avium (SRX14816143) was subtracted from the percentage of covered bases from P. cerasus (SRX14816146, SRX14816142, and SRX14816138). The same was done using the reads of P. fruticosa (SRX14816141). For subgenome Pce_S_a, it is expected that the intraspecific difference for transcripts of dataset P. avium (SRX14816143) is lower (close to 0) than the interspecific difference for transcripts of dataset P. fruticosa (SRX14816141) and vice versa. Opposite cases indicate potential homoeologous exchanges between the two subgenomes Pce_S_a and Pce_S_f and were plotted into a circos plot (Figure S1).
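As a minimal sketch of the coverage comparison, the percentage of covered bases per window and its difference between two datasets could be computed as below. The per-base depth lists, the function names, and the tiny window size in the example are hypothetical; the study followed Keilwagen et al. (2022), whose implementation is not reproduced here.

```python
def pct_covered(depths, window=250_000, min_depth=1):
    """Percentage of bases with depth >= min_depth in consecutive windows."""
    percentages = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        covered = sum(1 for depth in chunk if depth >= min_depth)
        percentages.append(100.0 * covered / len(chunk))
    return percentages


def coverage_difference(cerasus_depths, donor_depths, window=250_000):
    """Per window: % covered by P. cerasus reads minus % covered by donor-species reads."""
    cerasus = pct_covered(cerasus_depths, window)
    donor = pct_covered(donor_depths, window)
    return [c - d for c, d in zip(cerasus, donor)]


if __name__ == "__main__":
    # Toy depth vectors with a window of 4 bases for readability
    cerasus = [1, 1, 1, 1, 1, 1, 0, 0]
    avium = [1, 1, 1, 0, 0, 0, 0, 0]
    print(coverage_difference(cerasus, avium, window=4))
```

A difference close to 0 means the donor reads cover the window about as well as the P. cerasus reads do, which is the expectation for P. avium reads on Pce_S_a and for P. fruticosa reads on Pce_S_f.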

The nucleotide short reads from Pce_S were mapped against the genomes of the two ancestral species Pa_T and Pf_eH. Subsequently, the mapped reads were filtered using samtools to retain reads mapped in a proper pair (-f 3) and to exclude secondary and supplementary alignments (-F 2304). Those reads were divided into four groups according to the following criteria: (1) unique match to Pa_T, (2) unique match to Pf_eH, (3) match to Pa_T and Pf_eH, and (4) no match to Pa_T and Pf_eH (unique to Pce_S). The first two separated read sets were then re-mapped against the subgenomes Pce_S_a and Pce_S_f. The percentage of covered bases was calculated for 100k windows. For the subgenomes of Pce_S, the percentage of intraspecific covered bases (Pce_S_a to Pa_T, Pce_S_f to Pf_eH) should be higher compared to the percentage of interspecific covered bases (Pce_S_a to Pf_eH, Pce_S_f to Pa_T). Opposite cases indicate possible translocations and were plotted into a circos plot (Figure S1). Additionally, regions of the ‘Schattenmorelle’ genome assembly were determined that are uniquely covered by Pa_T and Pf_eH filtered read sets.
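The two samtools flag values can be decoded from the SAM FLAG bit definitions, and the four read groups follow from simple set operations. The sketch below is illustrative only: the function names are hypothetical, and treating a "unique match" as a read name that passes the filter against exactly one donor genome is an assumption about the authors' procedure.

```python
# SAM FLAG bits as defined in the SAM format specification
PAIRED = 0x1
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

REQUIRE = PAIRED | PROPER_PAIR        # 3, the value given to -f
EXCLUDE = SECONDARY | SUPPLEMENTARY   # 2304, the value given to -F


def passes_filter(flag):
    """True if all REQUIRE bits are set and no EXCLUDE bit is set."""
    return (flag & REQUIRE) == REQUIRE and (flag & EXCLUDE) == 0


def group_reads(all_reads, hits_pa, hits_pf):
    """Split read names into the four groups described in the text.

    hits_pa / hits_pf: sets of read names that passed the filter
    against Pa_T and Pf_eH, respectively.
    """
    all_reads = set(all_reads)
    return {
        "Pa_T only": hits_pa - hits_pf,
        "Pf_eH only": hits_pf - hits_pa,
        "both": hits_pa & hits_pf,
        "neither": all_reads - hits_pa - hits_pf,
    }


if __name__ == "__main__":
    print(REQUIRE, EXCLUDE)
    print(group_reads(["r1", "r2", "r3", "r4"], {"r1", "r3"}, {"r2", "r3"}))
```

Only the first two groups carry donor-specific signal, which is why the text re-maps just those read sets against Pce_S_a and Pce_S_f.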

2.7 LTR insertion estimation

The identity between the left and right LTR of each element was calculated using the script EDTA_raw.pl from the software EDTA version 1.9 (https://github.com/oushujun/EDTA, Ou et al., 2019). As input files, we used the genome sequences of P. cerasus (Pce_S_a and Pce_S_f), Pa_T (NCBI BioProject acc. no. PRJNA596862), Pf_eH (NCBI BioProject acc. no. PRJNA727075), and a curated library of representative transposable elements from Viridiplantae (https://www.girinst.org/repbase/). Because trees are not annual plants, the identity obtained from the resulting .pass.list file was used to estimate the time since LTR insertion in generations, using the formula T = K/(2µ), where K is the divergence of the LTRs (K = 1 − identity), and assuming a Prunus-specific mutation frequency of µ = 7.7 × 10⁻⁹ per generation (Xie et al., 2016).
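The insertion-time formula can be worked through with a short sketch. The function name is hypothetical, the example identity of 0.99 is an invented value and not a result from this study, and the identity is assumed to be given as a fraction between 0 and 1.

```python
MU = 7.7e-9  # mutation frequency per generation (Xie et al., 2016), as stated in the text


def ltr_insertion_generations(identity, mu=MU):
    """Generations since insertion, T = K / (2 * mu), with K = 1 - identity."""
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    divergence = 1.0 - identity
    return divergence / (2.0 * mu)


if __name__ == "__main__":
    # Invented example: two LTRs that are 99% identical
    print(round(ltr_insertion_generations(0.99)))
```

The factor of 2 reflects that both LTRs of an element accumulate substitutions independently after insertion, so the observed divergence is shared between two copies.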

2.8 Protein clustering, multiple sequence alignment, and divergence of time estimation

The protein datasets from Pce_S_a and Pce_S_f, Pa_T, Pf_eH, Pp (Prunus persica Whole Genome Assembly v2.0, v2.0.a1), Pm (Prunus mume Tortuosa Genome v1.0), Py (Prunus yedoensis var. nudiflora Genome v1.0), Md (Malus x domestica HFTH1 Whole Genome v1.0), and At (TAIR10.1, RefSeq GCF_000001735.4) from the annotation step were uploaded to the Galaxy Europe server as .fasta files. Proteinortho (Galaxy Version 6.0.32+galaxy0) was used to find orthologous proteins within the datasets. MAFFT (Galaxy Version 7.505+galaxy0) was used to align the obtained single-copy orthogroups. The final alignments were merged with the Merge.files function (Galaxy Version 1.39.5.0). Finally, the alignments were concatenated into a super protein and the final sequences were aligned with MAFFT. A phylogenetic tree was reconstructed with RAxML (maximum likelihood based inference of large phylogenetic trees, Galaxy Version 8.2.4+galaxy3) and the obtained .nhx file was reformatted as a .nwk file for further processing using CLC Main Workbench (21.0.1, QIAGEN Aarhus A/S). Evolutionary analyses were conducted in MEGA X (Kumar et al., 2018). Estimation of pairwise divergence time was performed according to Shirasawa et al. (2019) using the divergence time between the reference species peach and apple (34–67 Mya, www.timetree.org). Specific parameters for the calculation are listed in the supplements.

3 Results

3.1 De novo assembly and scaffolding

A total of 68 Gb of paired-end Illumina sequencing data were obtained, corresponding to ~114× coverage of the estimated genome size of 599 Mbp. Using two PromethION flow cells, a total of 178 Gb was produced (~300× coverage). The longest ONT reads that together resulted in a 20× coverage were selected for assembly, having a minimum read length of 64,214 bp. Table S1 summarizes the properties of the 20-WGS-PCE.1.0 assembly after polishing. The Prunus avium and Prunus fruticosa contigs were then separated successfully by read mapping and by selecting contigs that fit the hypothesis of one or more clear coverage peaks from the 20-WGS-PCE.1.0 assembly. The resulting two datasets, representing the subgenomes Pce_S_a and Pce_S_f, were purged and used for Hi-C scaffolding. After manual curation of both datasets, the final consensus genome assembly was scaffolded from 935 and 865 contigs of the Pce_S_a and Pce_S_f subgenomes, respectively. Eight clusters ideally representing the eight chromosomes were obtained for each subgenome (Figure S2). The final genome sequence is 628.5 Mbp long and consists of eight chromosomes for each subgenome (Figure 2). A total of 269 Mbp were assigned to subgenome Pce_S_a (N50 of 31.5 Mbp) and 299.5 Mbp (N50 of 39.4 Mbp) to Pce_S_f. Eighty-six contigs of Pce_S_a (22.7 Mbp) and 134 contigs of Pce_S_f (37.3 Mbp) remained unassigned to chromosomes. The longest scaffold is 52.8 Mbp in subgenome Pce_S_a and 53.5 Mbp in subgenome Pce_S_f (Table S2). Except for chromosome five, all scaffolds obtained from subgenome Pce_S_f are longer than the corresponding chromosome of subgenome Pce_S_a. The chloroplast sequence obtained was 158,178 bp and the mitochondrial sequence was 343,516 bp long (Figure S3).
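The coverage figures and the N50 statistic quoted above follow from simple arithmetic, sketched here for orientation. The function names are hypothetical, 1 Gb is assumed to mean 10⁹ bases, and the scaffold lengths in the N50 example are invented rather than taken from Table S2.

```python
def fold_coverage(sequenced_bp, genome_bp):
    """Sequencing depth as total sequenced bases divided by genome size."""
    return sequenced_bp / genome_bp


def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    genome = 599e6  # estimated genome size in bp
    print(round(fold_coverage(68e9, genome)))   # Illumina data
    print(round(fold_coverage(178e9, genome)))  # ONT data
    print(n50([10, 8, 5, 3, 2]))                # invented lengths
```

With these inputs, the Illumina data give about 114× and the ONT data about 297×, consistent with the rounded ~300× stated in the text.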

Figure 2 The genome of P. cerasus ‘Schattenmorelle’. Circos plot of 16 pseudomolecules of the subgenomes of Pce_S_a and Pce_S_f. (A) Chromosome length (Mb). (B) Gene density in blocks of 250k. (C) Distribution of repetitive sequences in blocks of 250k. (D) Gypsy elements in blocks of 250k. (E) Copia elements in blocks of 250k. (F) GC content in blocks of 1 Mb. (G) The inner ring shows markers from the 6 + 9k SNP array located on both subgenomes.

3.2 Transcriptome sequencing, Iso-Seq analysis, structural and functional annotation

The total repeat content of the entire Pce_S genome sequence was 49.7%. The total repeat content of subgenome Pce_S_a was 48.3% and that of subgenome Pce_S_f was 50.9% (Table 1). Class I Gypsy elements comprised the largest fraction of repetitive elements in the Pce_S genome sequence. A quantitative reduction between elements of this family was also detected in the Pce_S_f subgenome with a difference of 10.7%. Several elements could only be detected in one genotype of the two ancestral species. The TAD1 class I element only occurred in Pf_eH, while the class II elements of the order TIR (IS3EU, P, and Sola-3) were specifically detected in the genome of Pa_T. No element was found that was present in only one of the two subgenomes of Pce_S. Several elements occurred in both subgenomes (class I, LINE – R1-LOA, RTE-X, SINE – tRNA-DEU-L2, class II, TIR – TcMar-Mariner, and DADA elements) but were not detected in Pf_eH and Pa_T. The class I elements of the order LTR (ERV1, Pao) and Academ/-2 were only detected in one of the two genomes representing the ancestral species and in Pce_S_a and Pce_S_f. Iso-Seq results are summarized in Table S3. In total, 248,218 high-quality isoforms were identified. Both the high- and the low-quality isoforms were used for genome annotation, where each gene might be represented by multiple isoforms. A total of 107,508 transcripts (Pce_S_a: 53,497; Pce_S_f: 54,011) were predicted from the 60,123 gene models (Pce_S_a: 29,069; Pce_S_f: 31,054) obtained by structural annotation procedures (Table S4). InterProScan analysis detected 1,381,841 functional annotations (Pce_S_a: 649,310; Pce_S_f: 687,531) using 16 databases. Two-thirds (71,870) of the transcripts were assigned GO terms and 9,114 were found to be involved in annotated pathways.

Table 1 Characterization of repetitive sequences of P. fruticosa ecotype Hármashatárhegy (Pf_eH) compared to P. avium Tieton (Pa_T), P. persica Lovell, and the two subgenomes of P. cerasus ‘Schattenmorelle’ Pce_S_a and Pce_S_f.

3.3 Completeness and quality of the genome and transcriptome

BUSCO completeness of the Pce_S genome was 99.0% (S: 16.7%, D: 82.3%, F: 0.4%, M: 0.6%, n: 1,614) and comparable with P. persica Lovell (99.3%) and P. avium Tieton (98.3%, Figure S4). Completeness of subgenome Pce_S_a was higher (C: 89.4%, S: 84.8%, D: 4.6%, F: 1.5%, M: 9.1%, n: 1,614) compared to subgenome Pce_S_f (C: 87.1%, S: 80.9%, D: 6.2%, F: 1.2%, M: 11.7%, n: 1,614). The calculated LAI index was 6.3 and low in comparison to other genomes (Pp_L: 17.6, Pa_T: 10.3, Pf_eH: 13.1). The LAI index for subgenome Pce_S_a was 7.1. The LAI index for subgenome Pce_S_f was 5.6 (Figure S5). The nucleotide heterozygosity rates were 94.9% for aaaa, 2.39% for aaab, 2.4% for aabb, 0.001% for aabc, and 0.308% for abcd (Figure 3). The comparison of genetic position and physical position of up to 1,856 markers of the five genetic sour cherry maps (Table S5A) showed a good co-linearity to the genome sequence (Figure S6). BUSCO evaluation of the completeness of the annotated proteins resulted in 99.2% [C: 99.2% (S: 8.4%, D: 90.8%), F: 0.4%, M: 0.4%, n: 1,614]. The chloroplast sequence obtained contained 427 genes, 21 rRNAs, and 136 tRNAs, whereas the mitochondrial sequence contained 188 genes, 3 rRNAs, and 152 tRNAs (Figure S3). An ab initio and homology-based gene prediction with 14 reference species was performed (IAA). Based on the homology prediction, 34% of the proteins showed the highest IAA towards Prunus fruticosa and 17.9% towards P. avium. Only 5.2% of the proteins showed no IAA to any of the reference datasets used, which was due to ab initio prediction. The data are summarized in Figure S7.

Figure 3 GenomeScope (Galaxy Version 2.0) estimation of the P. cerasus genome size by k-mer counts obtained from the software Meryl (Galaxy Version 1.3+galaxy2). Both programs are integrated on the Galaxy Europe server. The k-mer peaks indicate that k-mers with a length of 19 bp occur in heterozygous (100× depth, 200× depth, 300× depth) and homozygous (400× depth) constitution within the genome. Coverage depth of individual k-mers is assigned as coverage.

Table 2A shows the general BUSCO statistics of transcriptomic data. A comparison of transcripts of Pce_S and the annotation datasets of Pf_eH and Pa_T enabled a quantitative comparison of shared transcripts within the datasets (Table 2B). A total number of 26,532 shared transcripts were found between the two subgenomes Pce_S_a and Pce_S_f and the genomes of Pf_eH and Pa_T. Thirty-eight percent of the P. cerasus proteins had a greater IAA to Pf_eH, whereas 54% showed a greater IAA to Pa_T. Eight percent showed an identical IAA to both ancestral species. A larger number of transcripts of both sour cherry subgenomes (Table 2B) were assigned to the annotation dataset of Pf_eH. A total of 13,425 transcripts from the Pce_S_a subgenome and 13,107 from the Pce_S_f subgenome were found in the genome sequences of Pf_eH and Pa_T. Seventy-five percent of the pool from the Pce_S_a subgenome showed a higher IAA to Pa_T and 17% to Pf_eH, while 59% from the pool originating from the Pce_S_f subgenome showed a higher IAA to Pf_eH and 32% to Pa_T.

Table 2A BUSCO statistics of the transcriptomic data generated in this study (n:1614).

Table 2B Comparison between the number of transcripts and %-IAA obtained from P. fruticosa ecotype Hármashatárhegy (Pf_eH) and P. avium cv ‘Tieton’ (Pa_T) representing the two ancestral species of P. cerasus.

3.4 Identification of syntenic regions and inversions

The sequences of the two subgenomes Pce_S_a and Pce_S_f and the genotypes Pa_T and Pf_eH of the two ancestral species P. avium and P. fruticosa were screened for duplicated regions using DAGchainer as previously published for peach (International Peach Genome Initiative et al., 2013). The seven major triplicated regions were found nearly one to one in P. avium but not in P. fruticosa, which lacked regions 4 and 7 as defined by the International Peach Genome Initiative et al. (2013). P. avium and P. fruticosa seem to derive from the same paleohexaploid event as peach, but with a loss of the fourth and seventh paleoset of paralogs in P. fruticosa. The graphical analysis is summarized in Figure S8.

Thirteen inversions were detected through positional co-linearity comparison between the two subgenomes using the molecular markers from the 9 + 6k SNP array (Figure S9). Five inversions were found between subgenome Pce_S_a and the genome sequence of Pa_T. Eleven inversions were found between subgenome Pce_S_f and Pf_eH (Table S6). By comparing the position of amino acid sequences of orthologous proteins (synteny), we found 21 inversions when comparing Pce_S_f with Pf_eH. Only 7 were found between Pce_S_a and Pa_T and 16 were found between both subgenomes Pce_S_a and Pce_S_f (Figure S10).

3.5 Detection of de novo homoeologous exchanges

For the detection of de novo homoeologous exchanges, we used three approaches that compare inter- and intraspecific %-covered bases (genomic and transcriptomic) and %-IAA between the proteins of Pce_S and those of Pa_T and Pf_eH (Figure 4; Figure S11). In the first approach, Pce_S short reads were mapped against Pa_T and Pf_eH, and only species-specific reads (Pa_T or Pf_eH) were retained as read subsets. These read subsets were re-mapped against Pce_S_a and Pce_S_f, and base coverage was calculated. A total of 1,024 regions (100k window) were discovered in which the intraspecific %-covered bases from mapped reads (Pce_S_a to Pa_T, Pce_S_f to Pf_eH) were lower than the interspecific %-covered bases from mapped reads (Pce_S_a to Pf_eH, Pce_S_f to Pa_T). In the second approach, translocations between the two subgenomes were localized by short-read mapping analyses. Short reads (RNAseq) from P. cerasus, P. avium, and P. fruticosa obtained from Bird et al. (2022) were mapped on Pce_S. A total of 148 regions in which the intraspecific difference of %-covered bases from RNAseq reads (Pa and Pce_S_a, Pf and Pce_S_f) was greater than the interspecific difference of %-covered bases from RNAseq reads (Pf and Pce_S_a, Pa and Pce_S_f) indicated homoeologous exchanges between the two subgenomes. In the third approach, 367 regions were identified in which the proportion of transcripts with intraspecific amino acid identity (Pa_T and Pce_S_a, Pf_eH and Pce_S_f) was lower than the proportion of transcripts with interspecific amino acid identity (Pf_eH and Pce_S_a, Pa_T and Pce_S_f) (Figure 4). Several regions were confirmed by calculating the 70% quantile of the IAA value within 1-Mbp windows (Note S1). This confirms that there are transcripts in the Pce_S genome whose IAA to the homoeologous representative genome (Pf_eH) is greater than that to the homologous (Pa_T) representative.
A total of 21 regions in Pce_S_a and 29 regions in Pce_S_f spanning 250k windows were finally identified that match all three criteria, indicating de novo homoeologous exchanges within the subgenomes (Figure 4; Figure S11). No evidence for an introgression of other Prunus species was found (Note S1, Note S2). Using 14 reference species, 60,123 gene models were annotated. Almost the same number was assigned to the two P. cerasus subgenomes (Table 2B). No evidence was found for large introgressions from any of the reference species (Note S1, S3). By comparing the amino acid identity of the proteins of Pa_T and Pf_eH with the respective sour cherry subgenome, the translocations identified via read mapping could be confirmed. The majority of the transcripts (51%) could be assigned to the genotypes Pa_T and Pf_eH of the two ancestral species P. avium and P. fruticosa (Figure S7); 5.2% of the transcripts could not be assigned to any of the reference species. Less than 1% of the transcripts could not be assigned to one of the ancestral species. They showed equivalent matches to both species and are probably a product of ab initio prediction. Of the 49,698 proteins in subgenome Pce_S_a and the 48,576 proteins in Pce_S_f, only 13,425 and 13,107, respectively, were shared with Pa_T and Pf_eH. A total of 75% of the shared proteins of subgenome Pce_S_a matched better to Pa_T than to Pf_eH, whereas only 59% of those of Pce_S_f matched better to Pf_eH than to Pa_T (Table 2B).
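The intersection of the three criteria described above can be sketched as a simple per-window filter. This is a minimal illustration, not the authors' pipeline: the `Window` record, its field names, and the example values are hypothetical, and the per-window metrics are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Per-window metrics for one subgenome (all field names are hypothetical)."""
    chrom: str
    start: int
    intra_cov: float        # %-covered bases, reads specific to the matching ancestor
    inter_cov: float        # %-covered bases, reads specific to the other ancestor
    intra_rna_diff: float   # intraspecific difference of %-covered bases (RNAseq)
    inter_rna_diff: float   # interspecific difference of %-covered bases (RNAseq)
    intra_iaa_share: float  # proportion of transcripts with higher intraspecific IAA
    inter_iaa_share: float  # proportion of transcripts with higher interspecific IAA


def is_exchange_candidate(w: Window) -> bool:
    """True if the window meets all three criteria described in the text."""
    genomic = w.intra_cov < w.inter_cov
    transcriptomic = w.intra_rna_diff > w.inter_rna_diff
    protein = w.intra_iaa_share < w.inter_iaa_share
    return genomic and transcriptomic and protein


# Invented example values, for illustration only.
windows = [
    Window("chr1_a", 0, 40.0, 75.0, 12.0, 3.0, 0.2, 0.7),
    Window("chr1_a", 250_000, 90.0, 20.0, 2.0, 10.0, 0.8, 0.1),
]
candidates = [w for w in windows if is_exchange_candidate(w)]
print([(w.chrom, w.start) for w in candidates])
```

Requiring all three signals at once is what reduces the 1,024, 148, and 367 single-criterion regions to the much smaller final set.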

Figure 4 Detected regions of homoeologous exchanges in the genome of P. cerasus ‘Schattenmorelle’. Circos plot of 16 pseudomolecules of the subgenomes Pce_S_a and Pce_S_f. (A) Chromosome length (Mb); (B) 16 regions in Pce_S_a and 12 in Pce_S_f that match all three of the following analysis methods: (C) 1,024 regions (100k window) where intraspecific %-covered bases from mapped reads (Pce_S_a to Pa_T, Pce_S_f to Pf_eH) were less than interspecific %-covered bases from mapped reads (Pce_S_a to Pf_eH, Pce_S_f to Pa_T); (D) 148 regions where the intraspecific difference of %-covered bases from RNAseq reads (Pa and Pce_S_a, Pf and Pce_S_f) was greater than the interspecific difference of %-covered bases from RNAseq reads (Pf and Pce_S_a, Pa and Pce_S_f); (E) 367 regions where the proportion of transcripts with intraspecific amino acid identity (Pa_T and Pce_S_a, Pf_eH and Pce_S_f) was less than the proportion of transcripts with interspecific amino acid identity (Pf_eH and Pce_S_a, Pa_T and Pce_S_f).

3.6 LTR dating and divergence time estimation

The identity between the left and the right LTR was analyzed for a subset of 2,385 (Pce_S_a), 3,028 (Pce_S_f), 3,130 (Pa_T), and 3,992 (Pf_eH) LTRs. The homologous genomes shared 200 (Pce_S_a versus Pa_T) and 100 (Pce_S_f versus Pf_eH) LTRs, whereas 12 LTRs were shared by Pce_S_a versus Pf_eH and Pce_S_f versus Pa_T. Only five common LTRs were found between Pce_S_a and Pce_S_f and 13 between Pa_T and Pf_eH. A summary of the LTR insertion times is shown in Figure 5A. The youngest shared LTRs between Pce_S_a and Pa_T and between Pce_S_f and Pf_eH were dated at 103,896.1 and 97,402.6 generations, respectively. When comparing the homoeologous chromosomes, the youngest shared LTRs between Pa_T and Pf_eH and between Pce_S_a and Pce_S_f were dated at 116,883.1 and 194,805.2 generations, respectively. LTRs of Pa_T were also found in subgenome Pce_S_f and dated at 207,792.2 generations. LTRs of Pf_eH were detected in subgenome Pce_S_a and dated at 149,350.6 generations. This indicates an exchange of LTRs between the two subgenomes. A total of 834 single-copy orthogroups among nine genomes were found and used for single protein alignments. The single alignments were concatenated, and a final alignment of nine amino acid sequences, one per species, with 419,586 amino acid positions was used for phylogenetic tree construction. Using the RelTime method, the estimated divergence time between the genera Malus and Prunus was 50.4 Mya. The species group P. persica and P. mume diverged from the P. yedonensis/P. avium/P. fruticosa group 11.6 Mya. Based on this model, the divergence of the two subgenomes of P. cerasus from the genome sequences of Pa_T and Pf_eH was estimated at 2.93 Mya and 5.5 Mya, respectively (Figure 5B).

Figure 5 Investigation of the evolution of the genome of P. cerasus ‘Schattenmorelle’. (A) Determination of insertion time from shared long terminal repeats (LTRs) in P. cerasus subgenome avium (Pce_S_a) and P. cerasus subgenome fruticosa (Pce_S_f) compared to P. avium Tieton (Pa_T) and P. fruticosa ecotype Hármashatárhegy (Pf_eH). (B) Estimation of divergence time (Mya) of P. cerasus subgenomes Pce_S_a and Pce_S_f compared to the donor species P. avium (Pa) and P. fruticosa (Pf). Prunus yedonensis (Pyn); Prunus avium (Pa); Prunus persica (Pp); Prunus mume (Pm); Malus domestica (Md); Paleocene (PAL); Eocene (EOC); Oligocene (OLI); Miocene (MIO); Pliocene (PLI); Pleistocene (PLEI).

4 Discussion

The genome of ‘Schattenmorelle’, the economically most important sour cherry cultivar in Europe, was sequenced using a combination of Oxford Nanopore R9.4.1 PromethION long-read technology and Illumina NovaSeq™ short-read technology. After assignment of the long reads to the two subgenomes and Hi-C analysis, the final assembly was 629 Mbp and showed an overall acceptable contiguity of the P. avium subgenome relative to a recently published genome of P. cerasus Montmorency (Pce_M; Goeckeritz et al., 2023). Larger differences were found between the haplotypes of P. fruticosa, which was expected (Note S4). This sequence was used to study structural changes present in the allotetraploid sour cherry genome after its emergence. To this end, the sour cherry genome sequence was compared to the published genome sequences of Prunus avium Tieton (Pa_T, Wang et al., 2020) and Prunus fruticosa ecotype Hármashatárhegy (Pf_eH, Wöhner et al., 2021), representing genotypes of the two ancestral species. The size of subgenome Pce_S_a, which originates from P. avium, was 269 Mbp. A similar genome size (271 Mbp) has been described for the P. avium cultivars Big Star (Pinosio et al., 2020) and Sato Nishiki (Shirasawa et al., 2017). Larger differences were found for Regina with 279 Mbp and Tieton with 344 Mbp (Le Dantec et al., 2018; Wang et al., 2020). Differences were also found between the size of subgenome Pce_S_f (299 Mbp) and the genome of the ground cherry genotype P. fruticosa ecotype Hármashatárhegy (366 Mbp, Wöhner et al., 2021). These differences indicate a reduction of subgenome Pce_S_a by 0.49%–21%, whereas for subgenome Pce_S_f a reduction of 18.29% was found. This was further supported by the reduced size of the subgenome sequences of Pce_M (Table 3) recently published by Goeckeritz et al. (2023). 
A reduction in genome size of allotetraploid species relative to their ancestral genomes has been reported for Nicotiana tabacum (1.9%–14.3%) and Gossypium species (Leitch et al., 2008; Hawkins et al., 2009; Renny-Byfield et al., 2011), with an overall downsizing rate for angiosperms calculated as 0%–30% (Zenil-Ferguson et al., 2016). Genome downsizing in response to a genome hybridization event can be explained by the long-term selective advantage that smaller genomes give these species (Knight et al., 2005; Zenil-Ferguson et al., 2016). Although downsizing of the P. cerasus subgenomes is the most probable explanation, enlargement of the genomes of the ancestral species during evolution is another possibility. However, an increase in genome size during the evolution of a species has only rarely been documented (Jakob et al., 2004; Leitch et al., 2008; Kim et al., 2014). BUSCO analysis provides additional evidence for the reduction of genome size. Although the number of genes does not correlate with genome size in eukaryotes (Pierce, 2012), differences between the ancestral genomes and sour cherry could be observed when looking at BUSCO completeness (Table 3). Considering both subgenomes and the genomes of Pa_T and Pf_eH, a completeness of >96.4% was obtained. However, the completeness of the single subgenomes was only 89.4% for Pce_S_a and 87.1% for Pce_S_f. This could also be observed for Pce_M (Table 3). Comparisons (Note S5) showed a loss of between 2.4% and 6.4% of BUSCOs within the subgenomes of Pce_S and Pce_M (Note S5, A), between 0.6% and 0.7% when comparing the subgenomes of Pce_S with those of Pce_M (Note S5, B), and between 3.3% and 6.9% between the subgenomes of Pce_S and Pce_M and the representative ancestral genomes Pa_T and Pf_eH (Note S5, C). Structural differences between the P. cerasus subgenomes and the genomes of Pa_T and Pf_eH were also found by comparing the number of repetitive elements. 
While the content of repetitive elements differs by only 0.86% between Pf_eH and Pce_S_f, it differs by 17.8% between Pa_T and Pce_S_a. Whether this is a consequence of hybridization remains speculative and deserves further study. An increase of the class I Gypsy elements from 6% in the Pa_T genome to 7.3% in subgenome Pce_S_a indicates either an expansion of this class following the formation of the sour cherry genome or a reduction of non-repetitive sequences in the corresponding subgenome, resulting in a smaller genome size.
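The size reductions discussed in this section are simple relative differences between a subgenome and a reference genome. The sketch below uses only the rounded Mbp values quoted in the text, so its output approximates rather than reproduces the reported 0.49%–21% and 18.29%, which were presumably derived from unrounded assembly sizes. The function name is illustrative.

```python
def reduction_percent(subgenome_mbp: float, reference_mbp: float) -> float:
    """Relative size reduction of a subgenome versus a reference genome, in percent."""
    return 100 * (reference_mbp - subgenome_mbp) / reference_mbp


PCE_S_A_MBP = 269  # subgenome derived from P. avium
PCE_S_F_MBP = 299  # subgenome derived from P. fruticosa

# Rounded P. avium genome sizes as quoted in the text (Mbp).
avium_references = {"Big Star / Sato Nishiki": 271, "Regina": 279, "Tieton": 344}

for name, size in avium_references.items():
    print(f"Pce_S_a vs {name}: {reduction_percent(PCE_S_A_MBP, size):.1f}%")

# P. fruticosa ecotype Hármashatárhegy (Pf_eH), 366 Mbp.
print(f"Pce_S_f vs Pf_eH: {reduction_percent(PCE_S_F_MBP, 366):.1f}%")
```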

Table 3 BUSCO and assembly statistics of the genomic data generated in this study and comparative datasets (n:1614).

A comparison of syntenic regions showed a high degree of collinearity between the Pce_S, Pa_T, and Pf_eH genomes (Figure 2; Figures S8, S9), with single inversions between the respective chromosome pairs. Using the genome of P. persica, seven triplicated regions were detected in Pce_S_a, confirming that these genomes descend from a paleohexaploid ancestor. However, the triplicated regions 4 and 7 in Pce_S_f were only detected in highly fragmented form or have been lost. A similar finding was described by Hao et al. (2022), who reported a rapid loss of homoeologs immediately after polyploidy events. Based on Ranallo-Benavidez et al. (2020), the results from the k-mer analysis confirm that the genome of sour cherry can be considered highly heterozygous and segmental allotetraploid. Furthermore, genomes of segmental allopolyploids may possess a mix of auto- and allopolyploid segments through duplication–deletion events as a result of homoeologous exchanges leading to either reciprocal translocations or homoeologous non-reciprocal translocations (Mason and Wendel, 2020). Whereas autotetraploids have an aaab > aabb rate, allotetraploids are considered to have aaab < aabb. The near-identical rates of aaab and aabb in Pce_S provide strong evidence that the sour cherry is a segmental mix of auto- and allotetraploidy.

Given this assumption of segmental allotetraploidy, recombination between homoeologous chromosomes is very likely. This is confirmed by the coverage and amino acid identity analyses. Homoeologous exchange events between the chromosomes of subgenomes Pce_S_a and Pce_S_f were detected (Figure 4; Figure S11). These exchanges are not balanced but are probably the product of a duplication/deletion event as described by Mason and Wendel (2020), generating the proposed mosaic of genomic regions representing one or the other subgenome. In this study, we found 14 homoeologous exchanges in Pce_S_a and 3 in Pce_S_f that lie within one assembled contig of the primary assembly 20-WGS-PCE.1.0, and 33 that span multiple contigs (Note S6). Whether these regions result from incorrect assembly was not investigated and could be part of future studies.

We found evidence that subgenome Pce_S_a is closer to Pa_T than Pce_S_f is to Pf_eH. This was confirmed by an evolutionary approach that dated the separation of subgenome Pce_S_f from P. fruticosa to 5.5 Mya, while subgenome Pce_S_a separated from P. avium 2.93 Mya (Figure 5B).

To validate these results, the insertion events of long terminal repeats between P. avium, P. fruticosa, and the subgenomes were calculated. Assuming a Prunus-specific rate of 7.7 × 10⁻⁹ mutations per generation (Xie et al., 2016), LTRs of the same type with the same insertion time were identified in the same positional order in the different (sub)genomes. The most recent co-occurring LTRs between the genomes of Pa_T and Pf_eH could be dated at 116,883.1 generations. Exact data on the generation time of Prunus species in natural habitats do not exist. Although the juvenile phase of many Prunus species is usually completed after 5 years (Besford et al., 1996), it can be assumed that generation times are considerably longer. Many fruit species are hardly or not at all able to rejuvenate by seeds under natural conditions (Coart et al., 2003), or they rejuvenate mainly by root suckers (Li et al., 2022). Other studies on Prunus therefore assume 10 years per generation (Wang et al., 2021), although even this seems rather too short.
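The generation counts reported for the shared LTRs are consistent with the commonly used LTR dating relation T = K / (2r), where K is the divergence between the two LTRs of an element and r is the mutation rate. The text does not state the formula explicitly, so the sketch below is an assumption: the divergence values are hypothetical inputs chosen because they reproduce the reported generation counts, the rate is treated as per site per generation, and the function names are illustrative.

```python
# Rate quoted in the text (Xie et al., 2016). Treating it as substitutions
# per site per generation is an assumption of this sketch.
MUTATION_RATE = 7.7e-9


def insertion_generations(divergence: float, rate: float = MUTATION_RATE) -> float:
    """Generations since insertion for an LTR pair divergence K, using T = K / (2r)."""
    return divergence / (2 * rate)


def generations_to_mya(generations: float, years_per_generation: float) -> float:
    """Convert a generation count to million years for an assumed generation time."""
    return generations * years_per_generation / 1e6


# Hypothetical divergence values that match the generation counts in the text.
for k in (0.0015, 0.0016, 0.0018):
    t = insertion_generations(k)
    mya = generations_to_mya(t, years_per_generation=10)
    print(f"K = {k}: {t:,.1f} generations, {mya:.2f} Mya at 10 years per generation")
```

Because the conversion to years is linear in the assumed generation time, any uncertainty in that value carries over directly to the dating.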

Assuming that a generation change is to be expected after 10 to 60 years (Besford et al., 1996), this would correspond to a time period of ~1 to 6 Mya. The youngest co-occurring LTR could be estimated at 1.9 Mya. This suggests that P. fruticosa and P. avium probably shared a gene pool between ~1 Mya and 2 Mya. It should be noted that this estimate can vary greatly depending on the number of years per generation used in the calculation (Figure 5A). Based on the results of the protein dating, a generation time of 30 years is more likely for P. avium. For P. fruticosa, which occurs less frequently in natural habitats and reproduces mainly via root suckers, the generation time seems to be somewhat longer at 55 to 60 years. Some LTRs present in Pa_T but absent in Pf_eH were found in subgenome Pce_S_f only, and vice versa. Other class I elements (LTR - ERV1, Pao) and Academ/-2 were specifically detected in one of the two genotypes Pa_T and Pf_eH, representing the two ancestral species of sour cherry, and in both Pce_S_a and Pce_S_f, which indicates a transfer of these elements between the two subgenomes following the formation of the allotetraploid P. cerasus genome. This is a further indication of segmental exchange between the two sour cherry subgenomes. Mason and Wendel (2020) proposed that unidirectional homoeologous exchange occurs in recent or synthetic allopolyploids. Our results support this hypothesis, providing evidence that sour cherry is a recent allopolyploid with autopolyploid segments derived from unidirectional homoeologous exchanges.

5 Conclusion

Sequencing of the genome of the European sour cherry cv. ‘Schattenmorelle’ has provided strong evidence that it is indeed a segmental allotetraploid consisting of two subgenomes, one derived from the sweet cherry P. avium and one from the ground cherry P. fruticosa. DNA sequences have been repeatedly exchanged between the two subgenomes. Our findings differ slightly from those for the recently sequenced genome of Montmorency, the sour cherry cultivar predominant in the US (Goeckeritz et al., 2023). Although Montmorency was shown to possess two subgenomes inherited from P. avium and P. fruticosa, it inherited two copies of the same subgenome from the former and two distinct subgenomes from the latter, making it trigenomic (Goeckeritz et al., 2023). We could not show that ‘Schattenmorelle’ is trigenomic. This discrepancy between the two studies could be attributed to the sequencing technologies used: whereas we used Illumina short reads and Oxford Nanopore long reads, Goeckeritz et al. (2023) used PacBio Sequel II. At the same time, a reduction in genome size has taken place. Other Prunus species have not contributed to the evolution of this species. No evidence was found for introgressions in the sour cherry genome derived from Prunus species other than P. avium and P. fruticosa. [...] ‘Schattenmorelle’ approximately 1 Mya at the earliest.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions

TW: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Project administration, Software, Supervision, Writing – original draft, Writing – review & editing. OE: Conceptualization, Validation, Writing – original draft, Writing – review & editing. AW: Conceptualization, Data curation, Investigation, Methodology, Project administration, Software, Validation, Writing – original draft. KN: Data curation, Investigation, Methodology, Software, Writing – original draft. RW: Data curation, Investigation, Methodology, Software, Writing – original draft. E-JB: Data curation, Investigation, Methodology, Software, Writing – original draft. HS: Data curation, Investigation, Methodology, Software, Writing – original draft. JK: Conceptualization, Data curation, Investigation, Methodology, Software, Writing – original draft. TB: Data curation, Investigation, Writing – review & editing. KH: Conceptualization, Investigation, Methodology, Software, Writing – original draft. LG: Data curation, Investigation, Methodology, Software, Writing – original draft. HT: Investigation, Methodology, Software, Writing – original draft. OA: Conceptualization, Methodology, Software, Writing – review & editing. LB: Conceptualization, Validation, Writing – review & editing. MS: Conceptualization, Validation, Writing – review & editing. JL: Formal Analysis, Software, Writing – original draft. AP: Conceptualization, Validation, Writing – original draft, Writing – review & editing. HF: Conceptualization, Supervision, Validation, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Acknowledgments

We would like to acknowledge the Galaxy Europe and Galaxy USA server administrations for support and provision of resources. Special thanks to Eric Lyon for support with Synmap2.

Conflict of interest

Authors AW, KN, RW, E-JB and HS were employed by company KeyGene N.V.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2023.1284478/full#supplementary-material

References

Beaver, J. A., Iezzoni, A. F. (1993). Allozyme Inheritance in Tetraploid Sour Cherry (Prunus cerasus L.), jashs. Available at: https://journals.ashs.org/jashs/view/journals/jashs/118/6/article-p873.xml.

Bertioli, D. J., Jenkins, J., Clevenger, J., Dudchenko, O., Gao, D., Seijo, G., et al. (2019). The genome sequence of segmental allotetraploid peanut Arachis hypogaea. Nat. Genet. 51, 877–884. doi: 10.1038/s41588-019-0405-z

Besford, R. T., Hand, P., Peppitt, S. D., Richardson, C. M., Thomas, B. (1996). Phase change in Prunus avium: differences between juvenile and mature shoots identified by 2-dimensional protein separation and in vitro translation of mRNA. J. Plant Physiol. 147 (5), 534–538.

Bird, K. A., Jacobs, M., Sebolt, A., Rhoades, K., Alger, E. I., Colle, M., et al. (2022). Parental origins of the cultivated tetraploid sour cherry (Prunus cerasus L.). Plants People Planet 4, 444–450. doi: 10.1002/ppp3.10267