Construction of a High-Density Genetic Map of Acca sellowiana (Berg.) Burret, an Outcrossing Species, Based on Two Connected Mapping Populations

Acca sellowiana, known as feijoa or pineapple guava, is a diploid (2n = 2x = 22) outcrossing fruit tree species native to Uruguay and Brazil. The species stands out for its highly aromatic fruits, which have nutraceutical and therapeutic value. Despite its promising agronomical value, genetic studies on this species are limited. Genetic linkage maps are valuable tools for genetic and genomic studies and are essential in breeding programs to support the development of molecular breeding strategies. A high-density composite genetic linkage map of A. sellowiana was constructed using two genetically connected populations: H5 (TCO × BR, N = 160) and H6 (TCO × DP, N = 184). A genotyping-by-sequencing (GBS) approach was successfully applied to develop single nucleotide polymorphism (SNP) markers. A total of 4,921 SNP markers were identified using the reference genome of the closely related species Eucalyptus grandis, whereas another 4,656 SNPs were discovered using a de novo pipeline. The individual H5 and H6 maps comprised 1,236 and 1,302 markers, respectively, distributed over the expected 11 linkage groups. These two maps spanned 1,593 and 1,572 cM, with average inter-marker distances of 1.29 and 1.21 cM, respectively. A large proportion of markers were common to both maps and showed a high degree of collinearity. The composite map consisted of 1,897 SNP markers with a total map length of 1,314 cM and an average inter-marker distance of 0.69 cM. A novel approach for the construction of composite maps is described, in which the meiosis information of individuals from two connected populations is captured in a single estimator. This high-density, accurate composite map, based on a consensus ordering of markers, provides a valuable contribution to future genetic research and breeding efforts in A. sellowiana.
The novel mapping approach described here, based on the estimation of multipopulation recombination fractions, may be applied to the construction of dense composite genetic maps for any other outcrossing diploid species.


Before following this tutorial
We expect that you already know how to build a linkage map for outcrossing populations with the onemap software. If not, please follow its tutorial, available at http://augusto-garcia.github.io/onemap/vignettes_highres/Outcrossing_Populations.html.

Built-in data
In this tutorial, we will use a built-in data set of the onemap package called onemap2pop. It contains simulated data for two full-sib populations that share one parent. We used the software PedigreeSim (Voorrips and Maliepaard, 2012) to simulate them and onemap to build the individual linkage maps. To load this data set:

The function rf_2pops estimates recombination fractions based on two mapping populations. It uses a multipoint approach implemented with Hidden Markov Models (HMM) and the Expectation-Maximization (EM) algorithm, as explained in the supplementary material of Quezada et al. (2021).
To use it, the user must have already built the individual maps for each population and assigned the markers to their corresponding linkage groups. After building the maps for each population, the user must provide an initial order of the markers shared between both populations, i.e., every marker in this order must be present in both populations. Let's assume that we built the following two linkage maps for a given linkage group (hereafter LG1) based on the information derived from two populations (POP1 and POP2).
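A minimal sketch of how such a shared initial order could be assembled in base R. The marker names and the vectors `order_pop1` and `order_pop2` are hypothetical and are not part of onemap2pop; the sketch simply keeps the markers present in both maps, in the order of the first map.

```r
# Hypothetical marker orders from two individual maps (illustrative names only)
order_pop1 <- c("M1", "M2", "M3", "M5", "M4", "M7")
order_pop2 <- c("M1", "M2", "M4", "M5", "M6", "M7")

# Keep only the markers mapped in both populations, following the POP1 order
shared_order <- order_pop1[order_pop1 %in% order_pop2]
shared_order
```

Starting from the order of the other map instead is an equally valid choice; either way the result is only a starting point for the later refinement.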
LG1_POP1_final

We have in this example two different orders for the same markers, one for each population:

LG1_POP1_final$seq.num
## [1] 1 2 3 4 5 6 7 8 9 11 12 13 10 14 15 16 17 18 19 20 21

LG1_POP2_final$seq.num
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 19 21

The first step is to obtain the multipoint recombination fractions for the two previous orders based on the information of both populations. Parent 1 is the common parent of the two populations and therefore has the same linkage phase configuration in both. Parent 2 differs between the populations, so its linkage phase configuration is free to differ as well. The recombination fractions on the maps are those estimated using the information of both populations, based on the HMM-EM approach of Quezada et al. (2021). The log-likelihood of each map is computed using the same recombination fractions for POP1 alone, for POP2 alone, and for POP1 and POP2 simultaneously.
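To give some intuition for an estimator that pools the meioses of both populations, here is a deliberately simplified two-point sketch in base R. It is not the HMM-EM procedure of Quezada et al. (2021): it assumes fully informative meioses, known linkage phases, and hypothetical recombinant counts. Under this binomial model, the joint log-likelihood is the sum of the per-population log-likelihoods evaluated at one shared recombination fraction.

```r
# Hypothetical counts for one marker interval (not taken from onemap2pop)
n_rec <- c(POP1 = 8, POP2 = 7)      # recombinant gametes observed
n_tot <- c(POP1 = 100, POP2 = 120)  # informative meioses per population

# Binomial log-likelihood (up to a constant) of a recombination fraction r
loglik <- function(r, rec, tot) rec * log(r) + (tot - rec) * log(1 - r)

# Joint log-likelihood: both populations share the same r
joint_loglik <- function(r) sum(loglik(r, n_rec, n_tot))

# Closed-form pooled estimate and a numerical check of the maximum
r_pooled <- sum(n_rec) / sum(n_tot)
r_numeric <- optimize(joint_loglik, interval = c(1e-6, 0.5), maximum = TRUE)$maximum

# Log-likelihood at the shared estimate for POP1, POP2, and both together
ll <- c(POP1 = loglik(r_pooled, n_rec[["POP1"]], n_tot[["POP1"]]),
        POP2 = loglik(r_pooled, n_rec[["POP2"]], n_tot[["POP2"]]),
        joint = joint_loglik(r_pooled))
r_pooled
ll
```

In the multipoint setting the same principle applies across all intervals of a linkage group at once, with the HMM handling partially informative markers and unknown phases.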
We will use the RIPPLE algorithm to evaluate alternative marker orders. The function is currently not optimized and may take overnight to run for each linkage group. To avoid such a wait in this tutorial, the object ripple_result_LG1 is already available, and the user does not need to run the ripple_2pops call below.
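For readers unfamiliar with RIPPLE, the idea can be sketched in a few lines of base R: slide a window along the current order, try every permutation of the markers inside it, and keep the candidate with the best score. The code below is only an illustration and does not show how ripple_2pops is implemented. The helpers `permutations`, `ripple_candidates`, and `toy_score` are hypothetical, and a real analysis would score each candidate with the multipopulation log-likelihood rather than the toy score used here.

```r
# All permutations of a vector (simple recursive helper)
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Candidate orders obtained by permuting markers inside a sliding window
ripple_candidates <- function(ord, window = 4) {
  stopifnot(window <= length(ord))
  cands <- list()
  for (start in seq_len(length(ord) - window + 1)) {
    idx <- start:(start + window - 1)
    for (p in permutations(ord[idx])) {
      new_ord <- ord
      new_ord[idx] <- p
      cands[[length(cands) + 1]] <- new_ord
    }
  }
  unique(cands)  # drop orders generated more than once
}

# Toy score standing in for a log-likelihood: penalize departures from 1..n
toy_score <- function(ord) -sum(abs(ord - seq_along(ord)))

start_order <- c(1, 2, 4, 3, 5, 6)
cands <- ripple_candidates(start_order, window = 4)
scores <- vapply(cands, toy_score, numeric(1))
best_order <- cands[[which.max(scores)]]
best_order
```

The number of candidates grows with the factorial of the window size, which is one reason why an exhaustive window search with a costly likelihood evaluation can be slow.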
## It may take overnight to run...
ripple_result_LG1 <- ripple_2pops(markers_names = order_LG1POP2,
                                  data_P1 = POP1_geno,
                                  data_P2 = POP2_geno,
                                  twopts_POP1 = twopts_POP1,
                                  twopts_POP2 = twopts_POP2,
                                  LOD = 3,
                                  max.rf = 0.5,
                                  log10.mintol = -2,
                                  max_it = 60,
                                  window = 4)

Now we find the order that maximizes the log-likelihood of the map. Based on the RIPPLE results, order 386 has the highest log-likelihood, which is also higher than that of the initial order from the POP2 map. Therefore, we will use it as our final linkage group order. It is worth noting that this order matches the one we simulated.

Building and printing our final order of LG1: