Harnessing genome-wide genetic diversity, population structure and linkage disequilibrium in Ethiopian durum wheat gene pool

Yanyang Liu, Henan Academy of Agricultural Sciences (HNAAS), China
Landraces are an important genetic source of the valuable novel genes and alleles needed to enhance genetic variation. Information on the genetic diversity and population structure of the gene pool is therefore essential for the conservation and sustainable use of durum wheat genetic resources. The aim of this study was to assess genetic diversity, population structure, and linkage disequilibrium, and to identify genomic regions with selection signatures. Five hundred (500) individuals representing 46 landraces, together with 28 cultivars, were genotyped using the Illumina Infinium 25K wheat SNP array, yielding 8,178 SNPs for further analysis. Gene diversity (GD) and polymorphic information content (PIC) ranged from 0.13 to 0.50 and from 0.12 to 0.38, with means of 0.34 and 0.27, respectively. Of the 353,600 SNP pairs in linkage disequilibrium (LD), 107,471 were significant (r² ≥ 0.20, P < 0.01), and the average r² of marker pairs was 0.21. Per-chromosome nucleotide diversity (π) and Tajima's D (TD) ranged from 0.29 to 0.36 and from 3.46 to 5.06, respectively, with genome-wide means of 0.33 for π and 4.43 for TD. A genome scan using the Fst outlier test revealed 85 loci with selection signatures, with 65 loci under balancing selection and 17 under directional selection. Putative candidate genes co-localized with regions showing strong selection signatures were associated with grain yield, plant height, host plant resistance to pathogens, heading date, grain quality, and phenolic content. The Bayesian model-based (STRUCTURE) and distance-based (principal coordinate analysis, PCoA, and unweighted pair group method with arithmetic mean, UPGMA) methods grouped the genotypes into five subpopulations, in which landraces from geographically non-adjoining environments were grouped in the same cluster.
This research provides further insight into the population structure and genetic relationships of a diverse set of durum wheat germplasm that could be used in wheat breeding programs to address production challenges sustainably.

Durum wheat was domesticated in the Fertile Crescent in the ninth millennium BC (Fayaz et al., 2019), and the Levantine region is considered its center of origin and diversity (Kabbaj et al., 2017). Some reports consider Ethiopia the third country in which durum wheat was domesticated, leading to the development of T. aethiopicum, and regard it as a center of origin and diversity of tetraploid wheat, including T. durum (Mengistu et al., 2016; Kabbaj et al., 2017). Harlan (1969), Simmonds (1993), and Savage et al. (1994) also reported Ethiopia as a center of remarkable diversity of tetraploid wheat species, as evidenced by the presence of crop wild relatives and diverse forms of these species in the country. Research has demonstrated the usefulness of the Ethiopian durum wheat collection as a source of alleles for improving traits including grain yield, nutritional quality, host plant resistance to pathogens, and drought tolerance (Mengistu et al., 2016; Kabbaj et al., 2017; Mengistu et al., 2018; Kidane et al., 2019; Alemu et al., 2020; Negisho et al., 2021; Mulugeta et al., 2023). For example, Mengistu et al. (2016) discovered new genes associated with days to booting, flowering, and maturity. Kidane et al. (2019) found 177 unique protein-coding genes in Ethiopian durum wheat using a large nested association mapping population developed for breeding and quantitative trait locus mapping. Mulugeta et al. (2023) also identified major novel loci associated with grain yield and related traits in diverse sets of Ethiopian durum wheat landraces and cultivars. Nevertheless, this valuable germplasm remains largely underutilized in breeding programs aimed at improving these traits.
Analysis of crop genetic diversity is essential to determine the extent and pattern of diversity, the domestication history, and the genetic relationships among different domesticated forms, such as landraces and cultivars (Soriano et al., 2016; Soriano et al., 2018; Rufo et al., 2019; Mazzucotelli et al., 2020). A comprehensive analysis of crop genetic diversity is also necessary to enhance cultivar resilience to a changing climate. The genetic diversity of cultivated wheat has been declining because of several bottlenecks during domestication and the adoption of post-Mendelian breeding, as well as the impacts of climate change and a growing human population (Louwaars, 2018; Pont et al., 2019; Kumar et al., 2020; Mazzucotelli et al., 2020; Sansaloni et al., 2020; Sthapit et al., 2020). To overcome these challenges, beneficial alleles can be transferred from genetically diverse crop wild relatives and landraces to broaden the diversity of modern cultivars (Johansson et al., 2020; Kilian et al., 2020; Adhikari et al., 2022; Badaeva et al.). However, working with these materials is challenging because undesirable traits can be introduced through linkage drag, which requires careful selection to make the materials agronomically valuable for cultivar development programs (Mondal et al., 2016; Kilian et al., 2020; Sansaloni et al., 2020). Although linkage drag complicates crossbreeding, crop wild relatives and landraces remain the primary sources of novel beneficial alleles and diversity for future wheat improvement (Maccaferri et al., 2019; Sansaloni et al., 2020; Yadav et al., 2022). Determining the extent and pattern of genetic diversity in the durum wheat gene pool is therefore critical for future conservation and breeding efforts (Negisho et al., 2021).
Information on the population structure and linkage disequilibrium (LD) of the genetic materials of interest is also essential to understand the domestication and selection history, determine the genetic profiles of population subgroups (Jin et al., 2010;Tascioglu et al., 2016;Siol et al., 2017), and understand the evolutionary history of genomic regions (Maccaferri et al., 2010). These are crucial for providing a better understanding of genetic diversity in crop germplasm (Roncallo et al., 2021), and serve as the entry point for analyzing the genetic information of complex traits (Fiedler et al., 2017;Wang et al., 2019).
The extent and pattern of LD vary across populations and genomic regions and with the physical proximity between pairs of loci (Fayaz et al., 2019). LD between two loci decays progressively depending on the recombination rate between them and the number of generations that have elapsed (Fayaz et al., 2019; Maccaferri et al., 2019). LD decay in plant species also depends on the mutation rate, population size, number of founding chromosomes in the population, and number of generations for which the population has existed (Devlin and Risch, 1995; Flint-Garcia et al., 2003; Roncallo et al., 2018). Research investigating the extent and pattern of population structure and LD in durum wheat germplasm has so far been very limited (Maccaferri et al., 2005; Fayaz et al., 2019; Liu et al., 2019; Alemu et al., 2020; Negisho et al., 2021; Roncallo et al., 2021). However, advances in genomic tools have played a pivotal role in estimating the extent and pattern of genetic variation, understanding the broader genetic implications of evolution, and dissecting the long-term effects of selection and breeding in durum wheat (Maccaferri et al., 2019; Sansaloni et al., 2020).
The genetic diversity of Ethiopian durum wheat has previously been investigated using phenotypic traits, which revealed high diversity and distinct morphological characteristics (Eticha et al., 2006; Mengistu et al., 2015a; Dejene and Mario, 2016). More recently, the genetic diversity of Ethiopian durum wheat has been characterized using advanced genomic tools (Mengistu et al., 2016; Kabbaj et al., 2017; Mengistu et al., 2018; Asmamaw et al., 2019; Kidane et al., 2019; Kidane et al., 2019; Alemu et al., 2020). However, the germplasm used represents a tiny fraction of the durum wheat accessions held in the Ethiopian Biodiversity Institute (EBI) gene bank (https://ebi.gov.et/resources/). In addition, only limited research has analyzed the within-population genetic variation of Ethiopian durum wheat using recent genomic tools (Mengistu et al., 2016; Alemu et al., 2020; Negisho et al., 2021). Most ex situ conserved Ethiopian durum wheat accessions have not been characterized using genome-wide DNA markers. Hence, molecular characterization of a large subset of the collection using such markers will facilitate the identification of valuable genes and germplasm that can be exploited in crop improvement programs.
The present study aimed to evaluate the extent of genetic diversity in diverse Ethiopian durum wheat landraces and cultivars. It also aimed to describe the genetic population structure and linkage disequilibrium in a set of durum wheat gene pools from Ethiopia, detect admixture within populations, identify regions under selection, and provide deeper insight into the level of genetic diversity and structure across eco-geographic regions. This study highlights the ample genetic diversity and untapped potential of Ethiopian durum wheat germplasm, which can be used to discover novel genes for broadening the gene pool and developing climate-resilient cultivars.

Plant materials
This study examined 46 phenotypically diverse durum wheat landraces collected from various geographical regions of Ethiopia and 28 improved cultivars registered by the Ethiopian Ministry of Agriculture (MoA) after confirmation of their DUS (distinctness, uniformity, and stability) (Figure 1, Supplementary Table 1). Seeds of the landraces were initially obtained from EBI for phenotypic characterization. The landraces used in the present study were selected based on our previous phenotypic characterization (Mulugeta et al., 2022), which revealed high within-landrace diversity in each landrace. Hence, phenotypically different landraces were selected to describe within-landrace variation at the molecular level. Each landrace was represented by 8 to 16 plants. Five hundred individuals representing the 46 landraces were individually sampled during field characterization, along with the 28 cultivars. Based on the information obtained in our previous study (Mulugeta et al., 2022), each landrace was treated as a separate population. For simplicity, individual landrace plants and modern cultivars are referred to as genotypes. All 28 cultivars were treated as one separate population to assess the level of genetic diversity among them.
Figure 1. Map of Ethiopia indicating the geographical distribution of the collection sites of the 46 durum wheat landrace populations (shaded green). NB: All boundaries are approximate and have no relation to political borders. The map was constructed using the ArcGIS software suite v. 10.7.1.

Planting, leaf sample harvesting, and genomic DNA extraction
For each genotype (i.e., 528 samples representing 47 populations), five healthy seeds were randomly selected, sown in a square pot (10 cm × 10 cm × 11 cm), and grown for two weeks in the greenhouse of the Swedish University of Agricultural Sciences (SLU), Alnarp, southern Sweden. Ten young-leaf discs pooled from the five plants of each genotype were collected in 96-well deep-well plates and freeze-dried using a CoolSafe ScanVAC freeze dryer following the instructions of TraitGenetics. The freeze-dried leaf samples were sent to TraitGenetics GmbH (Gatersleben, Germany) for genomic DNA extraction and subsequent genotyping. Genomic DNA was extracted from the leaf samples using a standard cetyltrimethylammonium bromide (CTAB) method following TraitGenetics' laboratory protocol.

SNP selection, genotyping, and filtering of SNP markers
The samples were genotyped by TraitGenetics GmbH (Gatersleben, Germany) using the high-density Illumina Infinium 25K wheat single nucleotide polymorphism (SNP) array. This array contains most SNPs from the earlier 90K Infinium array, the 35K Wheat Breeders' array, and the 135K Axiom wheat array, as well as SNPs within genes of specific importance in durum wheat breeding. SNPs accurately matching the A and B genomes were selected based on a cluster file of hexaploid wheat; details of these SNPs can be found at https://sgs-institut-fresenius.de/en/gesundheit-undernaehrung/traitgenetics/genotyping. SNP loci with more than 5% missing data or a minor allele frequency (MAF) below 5% were removed using TASSEL v 5.2.67 (Bradbury et al., 2007). This filtering yielded 8,178 SNPs for further analysis.
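The two filtering criteria can be illustrated with a minimal NumPy sketch. It assumes a hypothetical samples × SNPs matrix of allele dosages (0, 1, 2) with `np.nan` marking missing calls; it is not TASSEL's implementation, only the same two rules (a SNP is dropped if more than 5% of calls are missing or its MAF is below 5%).

```python
import numpy as np


def snp_filter_mask(geno, max_missing=0.05, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    geno: samples x SNPs array of allele dosages (0, 1, 2); np.nan marks a missing call.
    A SNP is removed if its missing rate exceeds max_missing or its MAF is below min_maf.
    """
    geno = np.asarray(geno, dtype=float)
    is_missing = np.isnan(geno)
    missing_rate = is_missing.mean(axis=0)
    # Allele frequency among called genotypes (two alleles per call).
    n_called = np.maximum((~is_missing).sum(axis=0), 1)
    alt_freq = np.nansum(geno, axis=0) / (2.0 * n_called)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return (missing_rate <= max_missing) & (maf >= min_maf)


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples x 3 SNPs
    # (column 1 is monomorphic, column 2 is balanced, column 3 has 25% missing calls).
    toy = np.array([
        [0, 2, 0],
        [0, 2, 2],
        [0, 0, np.nan],
        [0, 0, 0],
    ])
    print(snp_filter_mask(toy))
```

Only the middle SNP in the toy matrix passes: the first is monomorphic (MAF = 0) and the third exceeds the missing-data limit.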

Data analysis
Patterns of genomic nucleotide variation
The nucleotide diversity (π) (Nei, 1987) and Tajima's D (Tajima, 1989) of each population were computed using the PopGenome package (Pfeifer et al., 2014) in R (R Development Core Team, 2021) to characterize genome-wide genetic variation. A sliding-window approach with a window size of 1,000 kbp and a jump size of 100 kbp was applied as previously described. The site frequency spectrum of each population was analyzed using DnaSP version 6 (Rozas et al., 2017). The number of alleles (Na), the mean number of effective alleles (Ne), Shannon's information index (I), and the Hardy-Weinberg equilibrium (HWE) test were computed using GenAlEx v.6.5 (Peakall and Smouse, 2012). The polymorphism information content (PIC) (Serrote et al., 2020) and gene diversity were computed using PowerMarker v3.25 (Liu and Muse, 2005). The observed heterozygosity (Ho), expected heterozygosity (He; Nei, 1973), and percentage of polymorphic loci (PPL) were computed using Arlequin v.3.5.2.2 (Excoffier and Lischer, 2010).
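For readers unfamiliar with the two statistics, the sketch below computes nucleotide diversity from the mean number of pairwise differences (k) and Tajima's D from k and the number of segregating sites S, using the standard constants of Tajima (1989). It works on toy aligned sequences and only illustrates the formulas; it is not PopGenome's window-based implementation for SNP-array data.

```python
from itertools import combinations
import math


def mean_pairwise_differences(seqs):
    """k: average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity (pi) = k / alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs):
    """S: number of alignment columns carrying more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D; None if fewer than 4 sequences (variance undefined) or S == 0."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    k = mean_pairwise_differences(seqs)
    # Difference between two estimators of theta, scaled by its standard deviation.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    intermediate = ["AAAA", "AAAA", "TTTT", "TTTT"]  # variants at intermediate frequency
    rare = ["AAAA", "AAAA", "AAAA", "TTTT"]          # variants carried by one sequence
    print(tajimas_d(intermediate), tajimas_d(rare))
```

The first toy sample gives a positive D (an excess of intermediate-frequency variants) and the second a negative D (an excess of rare variants).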
Loci under selection were identified from genome scans in Arlequin v.3.5.2.2, assuming a null distribution under the hierarchical island model with 100,000 simulations and 100 demes simulated per population, as described in Excoffier and Lischer (2010). To determine potential genes controlling important traits at loci under selection, the results were compared with previously published reports using Triticum databases including GrainGenes, T3/Wheat, and Wheat URGI (Alaux et al., 2018). To identify genes related to selection signatures, lists of putative candidate genes and their functions were downloaded from the NCBI database (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/231/445/GCA_900231445.1_Svevo.v1/). Candidate genes were searched within a region extending up to 8.56 Mbp upstream and downstream of each SNP position, as previously reported for wheat (Breseghello and Sorrells, 2006). The genes associated with regions under selection signatures were obtained from the durum wheat (Triticum turgidum ssp. durum cv. Svevo, Svevo.v1) reference genome (Maccaferri et al., 2019).
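The candidate-gene search is essentially an interval-overlap query around each outlier SNP. The sketch below assumes a hypothetical list of gene records (ID, chromosome, start, and end in bp) parsed from a Svevo.v1 annotation. The flank defaults to the 8.56 Mbp genome-wide LD decay distance reported in the Results, and the gene IDs in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # start position (bp)
    end: int    # end position (bp), inclusive


def genes_near_snp(chrom, pos, genes, flank=8_560_000):
    """Return IDs of genes overlapping [pos - flank, pos + flank] on the same chromosome."""
    lo, hi = pos - flank, pos + flank
    return [g.gene_id for g in genes if g.chrom == chrom and g.end >= lo and g.start <= hi]


if __name__ == "__main__":
    annotation = [  # hypothetical records
        Gene("geneA", "5B", 2_000_000, 2_003_000),
        Gene("geneB", "5B", 19_000_000, 19_004_000),
        Gene("geneC", "5A", 10_000_000, 10_002_000),
    ]
    print(genes_near_snp("5B", 10_000_000, annotation))
```

A linear scan is enough for a few dozen outlier loci; for genome-wide queries an interval tree or sorted-coordinate search would be the usual choice.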

Linkage disequilibrium (LD) analysis
Knowing the LD among pairs of SNP markers provides valuable information on the correlation structure of different loci based on their allelic variation (Siol et al., 2017). Pairwise LD, measured as r², was calculated for SNP pairs as described in Weir (1997) using TASSEL version 5.2.8 (Bradbury et al., 2007) with a sliding LD window of 50 sites. The decay rate was estimated from significant SNP marker pairs (r² ≥ 0.20, p < 0.01) for the A and B genomes separately as well as for the whole genome. The relationship between genome-wide LD and physical distance was plotted by fitting a locally weighted regression (loess) curve using the R function 'loess' (R Development Core Team, 2021). The physical distance at which r² dropped to half of its average maximum value was taken as the LD decay distance (Huang et al., 2010).
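For two biallelic SNPs, r² is the squared allele correlation D² / (pA(1 − pA)·pB(1 − pB)), where D = pAB − pA·pB. The sketch below assumes fully homozygous (inbred) lines scored 0/1 for the two alleles, so that each line contributes one haplotype, and skips missing calls (None). It illustrates the definition only and is not TASSEL's implementation.

```python
def ld_r2(snp_a, snp_b):
    """Squared correlation r^2 between two biallelic SNPs in homozygous lines.

    snp_a, snp_b: sequences of 0/1 allele codes (None = missing), one entry per line.
    Returns None if fewer than two lines are jointly scored or either SNP is monomorphic.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    p_a = sum(a for a, _ in pairs) / n                        # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in pairs) / n                        # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n  # frequency of haplotype 1-1
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b                                      # coefficient of linkage disequilibrium
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([1, 1, 0, 0], [1, 1, 0, 0]))  # complete LD
    print(ld_r2([1, 1, 0, 0], [1, 0, 1, 0]))  # linkage equilibrium
```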

Genetic population structure analysis
Principal coordinate analysis (PCoA) based on Nei's standard genetic distance was performed using GenAlEx v.6.5 (Peakall and Smouse, 2012) to further investigate the relationships between populations. The Bayesian model-based clustering algorithm implemented in STRUCTURE version 2.3.4 (Pritchard et al., 2000) was used to infer population genetic structure. The admixture model with correlated allele frequencies was assumed to estimate the fraction of each landrace's ancestry attributable to each subgroup. Independent runs were performed for numbers of subgroups (K) from 1 to 10, with a burn-in period of 50,000 and 100,000 Markov chain Monte Carlo (MCMC) iterations. STRUCTURE Harvester (Earl and von Holdt, 2012) was used to visualize the results. The best K for the germplasm analyzed was determined using the ΔK method described by Evanno et al. (2005), and the bar plot for the optimum K was drawn using the CLUMPAK online server (Kopelman et al., 2015). Genotypes with an arbitrary membership threshold of Q > 75% were regarded as pure, whereas those with membership probabilities Q < 75% were considered admixed (Carovic-Stanko et al., 2017). UPGMA cluster analysis based on Nei's standard genetic distance (Nei, 1973) was performed using PowerMarker v.3.25 (Liu and Muse, 2005) to further determine the relationships between populations, and MEGA X (Kumar et al., 2018) was used to visualize the UPGMA tree. Analysis of molecular variance (AMOVA) was performed using Arlequin v.3.5.2.2 (Excoffier and Lischer, 2010) to partition the total genetic variation into variation within individuals, among individuals within populations, and among populations and groups (Weir and Cockerham, 1984; Peakall and Huff, 1995).
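The ΔK statistic of Evanno et al. (2005) divides the absolute second-order change in the log-likelihood across K by the standard deviation of the log-likelihood among replicate runs at that K. The minimal sketch below assumes a hypothetical dictionary that maps each K to the ln P(D) values of its replicate STRUCTURE runs (at least two per K). It takes the second difference on the mean ln P(D) per K, and ΔK is defined only for K values whose two neighbours were also tested.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for each interior K.

    lnp_by_k: dict mapping K -> list of ln P(D) values from replicate runs (>= 2 per K).
    Uses |L(K+1) - 2 L(K) + L(K-1)| on mean ln P(D) values, divided by the sd of L(K).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K tested
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


if __name__ == "__main__":
    runs = {  # hypothetical replicate ln P(D) values
        1: [-1000.0, -1002.0],
        2: [-600.0, -602.0],
        3: [-580.0, -584.0],
        4: [-570.0, -574.0],
    }
    dk = evanno_delta_k(runs)
    print(max(dk, key=dk.get), dk)
```

The K with the largest ΔK is the candidate optimum. As in the study, a secondary peak can still be chosen when it agrees better with independent analyses.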
Arlequin was also used to estimate pairwise genetic variation within populations and differentiation among populations. The joint distribution of population differentiation (FST) and heterozygosity was analyzed as described in Excoffier and Lischer (2010).

SNP markers' quality, distribution, density, and levels of polymorphism
From a total of 24,145 SNP markers, 8,178 polymorphic, high-quality SNP loci distributed across all 14 durum wheat chromosomes were retained for further genetic analysis after removing markers with more than 5% missing data or MAF below 5%. Of these 8,178 SNP markers, 3,471 (42.4%) and 3,658 (44.7%) have known map positions on the A and B genomes, respectively (Table 1), while the map positions of 1,049 (12.83%) SNPs on the durum wheat genome have not been precisely determined. Chromosomes 5B and 4B carried the highest (659) and lowest (290) numbers of SNPs, respectively (Figure 2; Table 1). The average marker density was 0.72, 0.73, and 0.72 markers per Mbp for the A genome, the B genome, and the whole genome, respectively. In total, the SNP markers covered 9.85 Gbp of the durum wheat genome, with chromosomes 1A (582.20 Mbp) and 2B (788.36 Mbp) having the smallest and largest covered regions, respectively (Figure 2; Table 1).
The MAF of the 8,178 SNP loci ranged from 0.07 to 0.50, with a mean of 0.24. Polymorphism measured as gene diversity (GD) ranged from 0.13 (at 505 loci) to 0.50 (at 2,345 loci), with a mean of 0.34. At the chromosome level, GD ranged from 0.32 to 0.36, with a mean of 0.33 across the genome (Table 1). The PIC, an indicator of marker informativeness, ranged from 0.12 (for 208 loci) to 0.38 (for 456 loci), with a mean of 0.27. At the chromosome level, the PIC varied from 0.24 on chromosome 3B to 0.29 on chromosomes 1B and 3A (Table 1). The expected heterozygosity (He) across all loci ranged from 0.02 to 0.18.
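The limits reported above follow from the formulas for a biallelic SNP. Gene diversity, GD = 1 − Σpᵢ², peaks at 0.50 when both alleles have frequency 0.5. PIC = GD − Σᵢ<ⱼ 2pᵢ²pⱼ² peaks at 0.375 (≈ 0.38). At the minimum observed MAF of 0.07, the same formulas give GD ≈ 0.13 and PIC ≈ 0.12. The sketch below uses these textbook formulas, not PowerMarker's or GenAlEx's code. It also includes the effective number of alleles (Ne = 1/Σpᵢ²) and Shannon's index (I = −Σpᵢ ln pᵢ) reported for the landraces.

```python
import math
from itertools import combinations


def gene_diversity(freqs):
    """Nei's gene diversity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content: GD minus sum of 2 * pi^2 * pj^2 over allele pairs."""
    return gene_diversity(freqs) - sum(2.0 * (pi * pi) * (pj * pj) for pi, pj in combinations(freqs, 2))


def effective_alleles(freqs):
    """Effective number of alleles, Ne = 1 / sum(p^2)."""
    return 1.0 / sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon's information index, I = -sum(p * ln p), ignoring absent alleles."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


if __name__ == "__main__":
    for maf in (0.07, 0.24, 0.5):
        f = (maf, 1.0 - maf)
        print(maf, round(gene_diversity(f), 3), round(pic(f), 3),
              round(effective_alleles(f), 3), round(shannon_index(f), 3))
```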

Magnitude and pattern of allelic diversity in the populations
Several molecular diversity indices were determined to evaluate the magnitude and pattern of within-landrace genetic variation in the 46 landraces. The observed number of alleles (Na) and the effective number of alleles (Ne) per locus varied from 1.00 (EH2) to 1.75 (WSH8) and from 1.00 (EH1) to 1.31 (WSH8), respectively, with means of 1.30 and 1.10. The highest percentage of polymorphic loci (%P, 85.65%) was found in the cultivar population, followed by landraces WSH7 (74.5%), WSH3 (71.51%), and WSH8 (64.67%) (Supplementary Table 2). In contrast, landraces EH2, NSH4, NSH8, and WSH2 had no or almost no polymorphic loci, with %P of 0.00%, 0.01%, 0.02%, and 0.02%, respectively. The mean %P across all landraces was 31.5%. The Shannon information index (I) ranged from 0 (EH2) to 0.33 (WSH8), with a mean of 0.11. Observed heterozygosity (Ho) ranged from 0 (EH2) to 0.07 (BL1), with a mean of 0.011. Expected heterozygosity (He) varied from 0 (EH2) to 0.21 (NSH8), with a mean of 0.07. Gene diversity ranged from 0 for nine of the 46 landraces to 0.22 for WSH8, with a mean of 0.07 (Supplementary Table 2).
Molecular diversity also varied widely among the SNP loci. The Shannon information index (I) ranged from 0.02 to 0.26, with a mean of 0.12 (Supplementary Table 3). Ho across loci varied from 0.00 to 0.28, with a mean of 0.01. He and uHe across loci ranged from 0.01 to 0.18, both with a mean of 0.07. Heterozygote deficiency (He > Ho) was recorded for 99.9% of the loci and heterozygote excess (Ho > He) for 0.1%. The fixation indices also varied widely among SNP loci: the minimum, maximum, and mean were -0.66, 1.00, and 0.84 for FIS; 0.04, 1.00, and 0.96 for FIT; and 0.39, 0.96, and 0.76 for FST, respectively (Supplementary Table 3).
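Wright's F-statistics are linked by (1 − FIT) = (1 − FIS)(1 − FST). The sketch below computes them for a single locus from observed heterozygosity (Ho), mean within-population expected heterozygosity (Hs), and total expected heterozygosity (Ht), so the identity holds exactly. As a rough consistency check, the reported mean FIS (0.84) and FST (0.76) give 1 − 0.16 × 0.24 ≈ 0.96, the reported mean FIT. Means of per-locus values need not satisfy the identity exactly. The heterozygosity values in the example are hypothetical.

```python
def f_statistics(ho, hs, ht):
    """Return (FIS, FST, FIT) from Ho, mean within-population He (Hs), and total He (Ht)."""
    fis = 1.0 - ho / hs  # heterozygote deficit within populations
    fst = 1.0 - hs / ht  # differentiation among populations
    fit = 1.0 - ho / ht  # overall deficit relative to the pooled population
    return fis, fst, fit


if __name__ == "__main__":
    fis, fst, fit = f_statistics(ho=0.01, hs=0.07, ht=0.30)  # hypothetical values
    print(round(fis, 3), round(fst, 3), round(fit, 3))
    print(abs((1 - fit) - (1 - fis) * (1 - fst)) < 1e-12)
```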
The Hardy-Weinberg equilibrium (HWE) test was carried out for all SNP loci within each landrace (population) as well as across all landraces. Almost all SNP loci (99.9%) deviated significantly from HWE across landraces (p < 0.01), showing heterozygote deficiency, which agrees with the inbreeding (self-pollinating) reproductive system of durum wheat. Only 0.1% of loci (8 loci) showed excess heterozygosity (Supplementary Table 3). Based on HWE proportions, the landraces were categorized into two subgroups. The first group contains 26 landraces whose genotypic proportions deviated significantly from HWE at most SNP loci. The second group comprises 18 landraces in which more than half of the SNP markers conformed to HWE. For example, landraces NSH6, WGM2, NO, NSH2, AR1, and BL1 conformed to HWE at 5,074, 2,217, 2,165, 1,951, 1,760, and 1,299 of their polymorphic SNP loci, respectively; for NSH6, 98.8% of loci conformed to HWE. Landraces WSH2, WSH5, WSH6, and ESH3 had only 2 to 3 polymorphic loci out of the 8,178 SNP markers. Interestingly, these loci exhibited excess heterozygosity with a significant deviation from HWE (p < 0.05).
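Heterozygote deficiency at a biallelic locus can be tested with the classic one-degree-of-freedom chi-square goodness-of-fit test against Hardy-Weinberg proportions. The standard-library sketch below takes genotype counts as input and uses the fact that the chi-square survival function with 1 df equals erfc(√(χ²/2)). It also returns the fixation index F = 1 − Ho/He, which is positive under heterozygote deficiency. It is not GenAlEx's code, and exact tests are generally preferred when expected genotype counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square HWE test for one biallelic locus.

    Returns (chi2, p_value, F), or None for a monomorphic locus.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return None
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    he = 2 * p * q
    ho = n_ab / n
    return chi2, p_value, 1.0 - ho / he


if __name__ == "__main__":
    # Hypothetical fully inbred sample: 10 AA, 0 AB, 10 BB
    print(hwe_chi_square(10, 0, 10))
```

For the inbred toy sample, χ² = 20 and F = 1, the complete heterozygote deficiency expected in a strictly self-pollinating population.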

Pattern and extent of linkage disequilibrium (LD)
The extent of LD (r²), measured as the squared correlation of alleles at two loci, was estimated from 7,129 SNPs in the durum wheat genotypes, since 1,049 SNPs do not have known positions on the durum wheat chromosomes. Considering the whole genome, 353,600 SNP pairs were in LD, of which 107,471 (30.4%) were significant at p < 0.01 (r² ≥ 0.2; Table 2). The number of significant marker pairs ranged from 5,236 (18.7%) on chromosome 7A to 11,554 (39.3%) on chromosome 3B. The average r² for marker pairs in LD varied from 0.14 on chromosome 7A to 0.26 on chromosome 3B (Table 2), with a mean of 0.21 for the whole genome. On each chromosome, the mean r² of SNP pairs declined rapidly as the physical distance between markers increased. LD decayed (at the cut-off r² = 0.2) within distances ranging from 3.65 Mbp on chromosome 4A to 22.90 Mbp on chromosome 3B, with a mean of 8.56 Mbp across the genome (Figure 3).
Figure 2. The density and distribution of the SNP markers used for genotyping in the present study on each durum wheat chromosome. The heatmap scales show the density of markers per Mbp.

Genomic pattern of nucleotide variation
Genome-wide variation and selection signatures in Ethiopian durum wheat were examined using nucleotide diversity and Tajima's D. The mean nucleotide diversity (π) per chromosome varied from 0.29 on chromosome 3B to 0.36 on chromosome 1B, with an average of 0.33 (Table 2). Nucleotide diversity was markedly reduced in the pericentromeric regions of most chromosomes, except chromosomes 1A, 1B, 6A, and 6B, which showed wide variation along their length. In contrast, the distal regions of each chromosome had high nucleotide diversity (Figure 4), suggesting the presence of balancing selection in these regions. The A genome exhibited higher mean nucleotide diversity than the B genome (Table 2). At the population level, π varied from 1 × 10⁻⁵ (population EH2) to 22 × 10⁻² (population AR4 and the cultivars), and the overall π across populations was 34 × 10⁻², indicating wide genetic variation among the populations (Supplementary Table 2).
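The genome-wide profiles were produced with 1,000 kbp windows moved in 100 kbp steps. A minimal sketch of that windowing follows. It assumes a hypothetical per-SNP statistic with physical positions on one chromosome and simply averages the statistic within each window, so it illustrates the window layout only and is not PopGenome's implementation.

```python
def sliding_window_means(positions, values, window=1_000_000, step=100_000):
    """Mean of a per-SNP statistic in overlapping windows along one chromosome.

    Windows are half-open intervals [start, start + window); empty windows are skipped.
    Returns a list of (window_start, window_end, mean_value).
    """
    if not positions:
        return []
    results = []
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        inside = [v for p, v in zip(positions, values) if start <= p < end]
        if inside:
            results.append((start, end, sum(inside) / len(inside)))
        start += step
    return results


if __name__ == "__main__":
    pos = [50_000, 150_000, 1_050_000]  # hypothetical SNP positions (bp)
    stat = [0.2, 0.4, 0.6]              # hypothetical per-SNP values
    for row in sliding_window_means(pos, stat, window=1_000_000, step=500_000):
        print(row)
```

Because consecutive windows overlap by 90% with the study's settings, neighbouring values are strongly correlated, which smooths the profile along each chromosome.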
The highest (5.06) and lowest (3.5) mean Tajima's D values were recorded for chromosomes 1B and 3B, respectively, with a genome-wide mean of 4.4 (Table 2). The pattern and extent of variation in Tajima Table 2). Based on nucleotide diversity (π) and Tajima's D, these results revealed stronger signatures of genetic divergence associated with domestication and breeding on chromosomes 1A, 1B, 6A, and 6B than on the other chromosomes of the A and B genomes.
The number of segregating variants at different allele frequencies in each population was estimated based on the site frequency spectrum (Supplementary Figures 1A-C). A coalescent analysis revealed a mismatch between the joint distributions of expected and observed allele frequencies for most populations, except P4 (WSH3), P19 (JM), P44 (WSH8), and P47 (cultivars), which showed moderate agreement between observed and expected allele-frequency distributions (Supplementary Figure 1A-C). Haplotype diversity ranged from 0.10 for NSH4 to 1.00 for WSH3, WSH8, TG1, and the cultivars, and the overall haplotype diversity across populations was 1.00.
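A haplotype diversity of 1.00 means that every sampled haplotype is distinct. The sketch below uses Nei's unbiased estimator, H = n/(n − 1)·(1 − Σxᵢ²), where xᵢ is the frequency of haplotype i among n sampled haplotypes. It assumes haplotypes are given as hashable strings or tuples and is not DnaSP's code.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        return None
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


if __name__ == "__main__":
    print(haplotype_diversity(["AC", "AG", "TC", "TG"]))  # all haplotypes distinct
    print(haplotype_diversity(["AC", "AC", "AC", "TG"]))  # one common haplotype
```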

Selection signatures and identified putative regions
Among the 8,178 informative SNPs used to scan for loci under selection, 85 loci at the 1% quantiles (significant at p < 0.01), covering all 14 chromosomes of the durum wheat genome, were regarded as loci under selection (Supplementary Table 4; Figure 5A). Of these, 65 loci were outliers with low Fst values ranging from 0.36 to 0.58 and were regarded as candidate loci putatively subject to balancing selection. In contrast, 16 loci had high Fst values ranging from 0.89 to 0.95 and were regarded as putative candidate loci under local directional selection. The putative loci under balancing selection span all 14 chromosomes, whereas those under directional selection are located on chromosomes 2A, 3A, 5B, 6B, and 7B. More loci under selection were recorded on B genome chromosomes than on A genome chromosomes. Candidate genes located near the selection signatures were identified by searching the genomic regions of loci under selection against the Svevo durum wheat reference genome (Maccaferri et al., 2019) using an interval of ± 8.6 Mbp, which corresponds to the average LD decay distance of the whole genome. Some of the identified candidate genes co-localized with loci under selection are TRITD2Bv1G218450 (heavy metal-associated protein), TRITD2Bv1G029100 (heat shock transcription factor), TRITD3Av1G181000 (E3 ubiquitin-protein ligase SDIR1 G), TRITD5Bv1G162250 (sugar transporter ERD6), TRITD5Bv1G162180 (disease resistance protein (TIR-NBS-LRR class) family), TRITD3Bv1G028390 (30S ribosomal protein S7), TRITD5Bv1G155770 (60S ribosomal protein L32), TRITD5Bv1G198940 (photosystem II protein), TRITD5Bv1G236030 (high-affinity nitrate transporter), TRITD7Bv1G197270 (MADS-box transcription factor G), TRITD6Bv1G138770 (MYB transcription factor 1), and TRITD7Bv1G165520 (CCCH zinc finger protein) (Supplementary Table 5).

Population structure and genetic relationship between populations
Principal coordinate analysis (PCoA), UPGMA, and model-based Bayesian inference were used to determine the population structure and genetic relationships between the landraces. The first three principal coordinates of the PCoA explained 67.3% of the total variation, with the first two (PCo1 = 37.79%, PCo2 = 21.50%) capturing 59.29%. The PCoA grouped the landraces into five major clusters (Figure 6A). There was no association between the geographical origin of the landraces and their grouping within the first four PCoA clusters. The fifth cluster contained almost all modern cultivars.
Figure 3. Scatter plot of genome-wide LD decay against physical distance (bp) based on the r² values of marker pairs. The horizontal red line represents the half-decay r² value of the genome (r² = 0.2). The yellow curve is the smoothing spline regression model fitted to LD decay. The vertical light green line (8,564,743 bp) indicates the intersection between the half-decay threshold and the LD decay curve.
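PCoA starts from a pairwise genetic distance matrix. Below is a minimal NumPy sketch of classical (Gower) principal coordinate analysis. The squared distances are double-centred, the resulting matrix is eigen-decomposed, and each positive eigenvalue's share of the total gives the percentage of variation per axis, comparable to the PCo1 and PCo2 percentages above. GenAlEx may standardize the matrix differently, so this illustrates the technique rather than reproducing that software, and the example distances are hypothetical.

```python
import numpy as np


def pcoa(dist):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns (coordinates, percent_explained) for axes with positive eigenvalues,
    ordered from largest to smallest eigenvalue.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)         # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-9 * abs(eigvals[0])      # drop zero and negative eigenvalues
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    percent = 100.0 * eigvals[keep] / eigvals[keep].sum()
    return coords, percent


if __name__ == "__main__":
    # Hypothetical distances among three genotypes lying on a line at 0, 1 and 3.
    coords, percent = pcoa([[0, 1, 3], [1, 0, 2], [3, 2, 0]])
    print(coords.round(3), percent.round(1))
```

Negative eigenvalues, which can arise with non-Euclidean measures such as Nei's distance, are dropped here. The percentages are therefore relative to the sum of the positive eigenvalues only.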
The UPGMA tree, built with the average linkage algorithm, agreed with the grouping pattern of the PCoA and grouped the genotypes into five distinct clusters (Figure 6B). Cluster 1 includes all 28 modern cultivars and 25 genotypes from the Arsi, East Shewa, and West Shewa populations. Cluster 2 was the second largest, containing 170 genotypes (31.91%) from populations from Arsi, Bale, East Shewa, East Gojam, Sidama, North Gonder, East Hararge, West Shewa, North Shewa, North Omo, and North, West, and South Wollo. Cluster 3 was the only cluster … Table 1). Bayesian model-based population structure analysis revealed the highest ΔK value at K = 2, followed by K = 5, indicating that two or five subgroups best describe the data. A five-cluster solution (K = 5) (Figure 6C) was considered optimal because it agreed with the number of clusters obtained from the PCoA and cluster analyses. For K = 5, Cluster 1 (Cl-I) comprised the 28 cultivars and 33 genotypes from the Arsi, West Shewa, North Wollo, and North Gonder populations. Cluster 2 (Cl-II) included 137 genotypes (25.65%) from populations from Arsi, Bale, East Hararge, East Gojam, North Gonder, North Wollo, West Gojam, West Shewa, West Wollo, Sidama, Tigray, and South Wollo. Cluster 3 (Cl-III) comprised 67 genotypes (12.54%) from populations from Arsi, East Shewa, North Shewa, West Shewa, West Hararge, and North Omo. Cluster 4 (Cl-IV) was the largest, comprising 218 genotypes (40.82%) from populations from Arsi, Bale, East Gojam, East Hararge, Jimma, North Gonder, North Shewa, South Wollo, Tigray, West Hararge, and West Shewa.
Figure 4. Genome-wide pattern of nucleotide diversity (ND) and Tajima's D (TD) across all populations of the 46 durum wheat landraces, based on a sliding-window approach with a window size of 1,000 kbp and a jump size of 100 kbp.
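UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the original distances. The pure-Python sketch below returns the tree topology as a Newick-style string without branch lengths. It illustrates the algorithm rather than PowerMarker's implementation, and the four-taxon distance matrix in the test is hypothetical.

```python
def upgma(dist, labels):
    """UPGMA clustering of a symmetric distance matrix; returns a Newick-style topology."""
    # Active clusters: id -> (subtree string, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i][j]) for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), _ = min(d.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # Average linkage: weight each merged cluster by its number of leaves
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (f"({tree_i},{tree_j})", size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

Each merge scans all remaining pairs, so this simple version is roughly cubic in the number of genotypes. That is fine for an illustration but slower than optimized libraries on hundreds of samples.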
Cluster 5 (Cl-V) comprised 51 genotypes (11.42%) from the West Hararge, West Gojam, North Omo, North Shewa, and Tigray populations. Compared with PCoA and UPGMA, the Bayesian population structure analysis grouped the genotypes slightly better according to their geographical regions of origin. Classification of genotypes as pure or admixed based on their Q values (Q < 0.75, admixed; Q > 0.75, pure) revealed that 177 genotypes (149 landrace genotypes and the 28 cultivars) were admixed (Figure 6C).
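The pure/admixed rule amounts to a check on each genotype's largest membership coefficient. The sketch below assumes a hypothetical list of per-genotype Q vectors (one value per cluster, as in a STRUCTURE or CLUMPAK output table) and labels a genotype pure when its largest Q exceeds 0.75. The text does not specify how a value of exactly 0.75 is treated; here it counts as admixed.

```python
def classify_membership(q_matrix, threshold=0.75):
    """Assign each genotype to its main cluster and flag it as 'pure' or 'admixed'.

    q_matrix: iterable of membership vectors (one coefficient per cluster, summing to ~1).
    Returns a list of (cluster_number, status), with clusters numbered from 1.
    """
    results = []
    for q in q_matrix:
        main = max(range(len(q)), key=lambda c: q[c])
        status = "pure" if q[main] > threshold else "admixed"
        results.append((main + 1, status))
    return results


if __name__ == "__main__":
    q_values = [  # hypothetical K = 5 membership coefficients
        [0.90, 0.05, 0.05, 0.00, 0.00],
        [0.50, 0.40, 0.10, 0.00, 0.00],
    ]
    print(classify_membership(q_values))
```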

The net nucleotide (allele-frequency) divergence among the subgroups inferred by STRUCTURE was highest (0.47) between clusters 1 and 3 and lowest (0.24) between clusters 4 and 5. The average genetic distance between genotypes within the same cluster ranged from 0.01 (Cluster 5) to 0.26 (Cluster 2). The mean expected heterozygosity between genotypes within clusters 1, 3, and 4 was 0.19, 0.10, and 0.13, respectively. The mean Fst values of the subgroups ranged from 0.53 (Cluster 2) to 0.99 (Cluster 5), with values of 0.68, 0.83, and 0.79 for clusters 1, 3, and 4, respectively.
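The allele-frequency summaries above can be illustrated with a short sketch. It assumes biallelic SNPs, with each cluster summarized by the frequency p of one allele at each locus, and uses Nei's formulation of net divergence (between-cluster distance minus the mean within-cluster expected heterozygosity). STRUCTURE computes these quantities internally from its own allele-frequency estimates, so this illustrates the idea rather than reimplementing the program; the function names are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Mean expected heterozygosity 1 - (p^2 + q^2) over the biallelic loci of one cluster."""
    return sum(1.0 - (p * p + (1.0 - p) * (1.0 - p)) for p in freqs) / len(freqs)


def between_cluster_distance(fx: Sequence[float], fy: Sequence[float]) -> float:
    """Mean probability that two alleles, one drawn from each cluster, differ."""
    total = sum(1.0 - (px * py + (1.0 - px) * (1.0 - py)) for px, py in zip(fx, fy))
    return total / len(fx)


def net_divergence(fx: Sequence[float], fy: Sequence[float]) -> float:
    """Between-cluster distance minus the average within-cluster heterozygosity."""
    within = (expected_heterozygosity(fx) + expected_heterozygosity(fy)) / 2.0
    return between_cluster_distance(fx, fy) - within


if __name__ == "__main__":
    cluster_a = [0.9, 0.8, 0.1]  # hypothetical allele frequencies at three SNPs
    cluster_b = [0.2, 0.3, 0.9]
    print(round(net_divergence(cluster_a, cluster_b), 3))
```

Two clusters fixed for opposite alleles give a net divergence of 1, and two clusters with identical frequencies give 0; real values such as 0.24 to 0.47 fall in between.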

Genetic differentiation of the hierarchical populations and gene flow
Analysis of molecular variance (AMOVA) was used to infer hierarchical genetic differentiation and to estimate genetic variation within individuals, within populations, and among populations. The analysis revealed highly significant genetic differentiation among populations (Fst = 0.77, p < 0.001), which accounted for 76.68% of the total genetic variation. Genetic variation among individuals within populations accounted for 20.18% of the total. When populations were grouped by their Regional State of origin, differentiation among Regional States accounted for only 1.18% of the total genetic variation (Fct = 0.012, p = 0.341), whereas 76.66% occurred among populations within Regional States (Fsc = 0.77, p < 0.001) and 22.17% among individuals within populations (Fst = 0.78, p < 0.001). This indicates high genetic variation among populations and among individuals within Regional States, but no genetic differentiation among the Regional State-based groups. Similarly, when populations were grouped by their geographical region of origin, 75.42% of the total variation was found among populations within regions (Fsc = 0.77, p < 0.001), 23.04% among individuals within populations (Fst = 0.76, p < 0.001), and 1.54% among geographical regions (Fct = 0.02, p = 0.312).

Figure 6. (A) Principal coordinate analysis (PCoA) based on Nei's unbiased genetic distance, showing the relationships among the genotypes; (B) unweighted pair group method with arithmetic mean (UPGMA) tree showing the genetic relationships; and (C) population genetic structure of the genotypes at K = 5. The five colors represent the five clusters, and the proportion of each color in each landrace represents the average proportion of its alleles assigned to each of the five clusters.
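The fixation indices of a three-level AMOVA follow directly from its variance components (among groups, among populations within groups, and within populations). The sketch below uses these standard ratios. It is illustrative and is not the software used in the study. Feeding it the reported percentages instead of raw variance components is an assumption that works only because every index is a ratio.

```python
from typing import Tuple


def hierarchical_f_statistics(var_among_groups: float,
                              var_among_pops_in_groups: float,
                              var_within_pops: float) -> Tuple[float, float, float]:
    """Return (Fct, Fsc, Fst) from the three AMOVA variance components.

    Fct: share of the total variance found among groups (e.g., Regional States)
    Fsc: share of the within-group variance found among populations
    Fst: share of the total variance found among populations overall
    """
    a, b, c = var_among_groups, var_among_pops_in_groups, var_within_pops
    total = a + b + c
    f_ct = a / total
    f_sc = b / (b + c)
    f_st = (a + b) / total
    return f_ct, f_sc, f_st


if __name__ == "__main__":
    # Percentages of variation reported for the Regional-State grouping.
    f_ct, f_sc, f_st = hierarchical_f_statistics(1.18, 76.66, 22.17)
    print(f"Fct={f_ct:.3f} Fsc={f_sc:.3f} Fst={f_st:.3f}")
```

With the Regional-State percentages this gives values close to the reported Fct, Fsc, and Fst. Small differences are expected because the published percentages and indices are rounded.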
According to the AMOVA for the five STRUCTURE-based subpopulations, 44.50% of the total genetic variation was found among the five subpopulations and 52.62% among individuals within subpopulations (Figure 5B; Supplementary Table 6). The historical rates of gene flow (Nm) between pairs of populations varied from 0 to 534.2, with a mean of 0.85 (Supplementary Table 7). Of the populations considered in this study, WG, AR4, NSH8, SW3, NSH1, and WSH1 were the most distinct (Figure 5B; Supplementary Table 6), whereas NSH5, NG3, and WSH7 were the least differentiated populations across all pairs (Fst = 0.47). Nei's mean number of pairwise differences between populations (πXY) varied widely and was significant for all population pairs except NSH1 vs. BL2, WH vs. NO, NSH7 vs. AR3, and WH vs. NSH7 (Figure 5C, green, above the diagonal; Supplementary Table 8). Nei's mean number of pairwise differences within populations (π) varied from 0 (WH2) to 1861.99 (modern cultivars), suggesting large differences among the populations in within-population genetic variation (Figure 5B, diagonal; Supplementary Table 8).
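Nei's mean numbers of pairwise differences within (π) and between (πXY) populations can be sketched as follows. This assumes each genotype is coded as a string of homozygous SNP calls, which is a reasonable simplification for largely selfing landraces, and it ignores heterozygous and missing calls. The function names are hypothetical. The source does not state which software produced the reported values, and this sketch does not reproduce it.

```python
from itertools import combinations
from typing import Sequence


def n_differences(a: str, b: str) -> int:
    """Number of SNP positions at which two genotype strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pi_within(pop: Sequence[str]) -> float:
    """Mean number of pairwise differences among the genotypes of one population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        return 0.0  # a population represented by one genotype has no pairs
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def pi_between(pop_x: Sequence[str], pop_y: Sequence[str]) -> float:
    """Mean number of pairwise differences between genotypes of two populations."""
    total = sum(n_differences(a, b) for a in pop_x for b in pop_y)
    return total / (len(pop_x) * len(pop_y))


def pi_net(pop_x: Sequence[str], pop_y: Sequence[str]) -> float:
    """Between-population differences corrected for within-population diversity."""
    return pi_between(pop_x, pop_y) - (pi_within(pop_x) + pi_within(pop_y)) / 2.0


if __name__ == "__main__":
    pop_a = ["AAAA", "AAAT"]  # hypothetical four-SNP genotypes
    pop_b = ["TTTT"]
    print(pi_within(pop_a), pi_between(pop_a, pop_b), pi_net(pop_a, pop_b))
```

A within-population value of 0, as reported for one landrace, arises when all sampled genotypes are identical at every SNP.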

Levels of SNP polymorphism
Durum wheat landraces have been grown for thousands of years and have been subjected to natural and human selection, resulting in their adaptation to various environmental conditions (Mengistu et al., 2016;Baloch et al., 2017). However, locally adapted germplasm has been lost sporadically as it has been replaced by new cultivars developed through modern breeding for specific traits (Mengistu et al., 2016;Pont et al., 2019;Mazzucotelli et al., 2020;Sansaloni et al., 2020;Sthapit et al., 2020). This situation demands revisiting the crop's wild relatives and landraces. These are the primary genetic sources of valuable alleles for boosting genetic variation in cultivars and for coping with unpredictable challenges arising from climate change (Kabbaj et al., 2017;Kilian et al., 2020;Adhikari et al., 2022). This study provides deeper insight into the population structure and genetic relationships of durum wheat gene pools collected from different eco-geographic regions of Ethiopia.
In this study, more of the selected SNPs were located on the B genome than on the A genome. Previous genetic diversity studies of durum wheat also found more SNPs on the B genome than on the A genome (Alipour et al., 2017;Baloch et al., 2017;Kabbaj et al., 2017;Rufo et al., 2019;Alemu et al., 2020;Negisho et al., 2021). However, gene diversity and PIC did not differ significantly between the A and B genomes, even though the Ethiopian durum wheat collection showed a high overall level of genetic variation. This result suggests that the average mutation rates of the A and B genomes in Ethiopian durum wheat landraces are comparable. The data support previous findings on Ethiopian durum wheat landraces and cultivars (Mengistu et al., 2016;Alemu et al., 2020).
Compared with some previous studies, the present study found high mean gene diversity (0.34) and PIC (0.27). This indicates high genetic variation in Ethiopian durum wheat, which might have arisen from evolutionary forces such as mutation rate, natural selection, linked selection, population history, and demographic history. Previous research (Harlan, 1969;Pecetti et al., 1992;Mengistu et al., 2015;Kabbaj et al., 2017) reported that Ethiopian durum wheat landraces are unique and highly diverse compared with germplasm from other sites. This could be attributed to the long-term separation of Ethiopian durum wheat landraces from the primary centers of origin and from other germplasm sources. For instance, Alemu et al. (2020) reported mean gene diversity and PIC of 0.25 and 0.20, respectively, for 192 Ethiopian durum wheat genotypes (167 landraces and 25 modern cultivars) genotyped with 15,338 SNP markers. Likewise, Ren et al. (2013) reported mean gene diversity and PIC of 0.22 and 0.18 for 150 durum wheat landraces from around the world genotyped with 1,536 SNP markers. Other studies of durum wheat germplasm diversity also reported lower gene diversity and PIC than those obtained in the present study (Baloch et al., 2017;Kabbaj et al., 2017;Rufo et al., 2019;Mahboubi et al., 2020;Mazzucotelli et al., 2020).
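Gene diversity and PIC are simple functions of allele frequencies at a locus. The sketch below uses Nei's gene diversity and the commonly used PIC formula. It is a per-locus illustration with hypothetical function names, not the software used in the study. Genome-level means such as those above would be averages of these per-locus values.

```python
from itertools import combinations
from typing import Sequence


def gene_diversity(freqs: Sequence[float]) -> float:
    """Nei's gene diversity for one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """Polymorphic information content for one locus.

    PIC = 1 - sum(p_i^2) - sum over allele pairs i < j of 2 * p_i^2 * p_j^2
    """
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * (p_i * p_i) * (p_j * p_j) for p_i, p_j in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        freqs = [p, 1.0 - p]  # a biallelic SNP
        print(p, round(gene_diversity(freqs), 3), round(pic(freqs), 3))
```

For a biallelic SNP both indices peak at p = 0.5, where GD = 0.5 and PIC = 0.375. This ceiling matches the ranges given for this SNP panel (GD up to 0.50, PIC up to 0.38).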
The Ethiopian durum wheat gene pool exhibited high mean gene diversity and PIC values at the A-subgenome, B-subgenome, and whole-genome levels. These results are in line with previous research showing high genetic diversity in Ethiopian durum wheat germplasm (Mengistu et al., 2018;Alemu et al., 2020;Negisho et al., 2021). It is also widely accepted that broad adaptation of germplasm to different agroecologies, diverse farmers' agricultural practices, and natural cross-pollination facilitated by farmers' practice of planting mixed genotypes could have resulted in high genetic diversity (Peterson et al., 2014;Mengistu et al., 2015;Alemu et al., 2020).

Magnitude and pattern of within populations allelic diversity
The mean values of the genetic diversity parameters GD (0.10), I (0.11), %P (30.00%), and He (0.07) indicated low variation within the durum wheat landraces. These values are far below those reported previously (Mengistu et al., 2016;Alemu et al., 2020;Negisho et al., 2021). The differences could be attributed to differences in sample size and in genetic background between the landraces used in this study and those used in previous studies. The low diversity within most landrace accessions is primarily because their alleles were fixed at most loci. Hence, a single genotype could potentially provide sufficient genetic information for such accessions. However, 15 of the landraces included in this study showed high genetic variation within accessions. Genetic information generated from a single plant cannot adequately capture the genetic makeup of such landraces. Each of them should therefore be represented by multiple individuals in genomic research to draw sound conclusions. The low mean gene flow (0.08) and wide variation in fixation indices (Fis, Fit, and Fst) suggest a high degree of genetic differentiation and limited gene exchange among the landraces, as reported previously (Rufo et al., 2019;Mourad et al., 2020;Negisho et al., 2021). Low within-landrace genetic variation and wide variation in fixation indices were also reported in sorghum landraces from Ethiopia (Enyew et al., 2022).
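The source does not state how gene flow was estimated. One common indirect estimator is Wright's island-model relation Nm ≈ (1 − Fst)/(4Fst), shown here purely as an assumed illustration. It relies on strong assumptions: equilibrium, equal population sizes, and symmetric migration. Under this relation, a mean Nm of 0.08 corresponds to Fst ≈ 0.76, which is of the same order as the among-population Fst from the AMOVA.

```python
def nm_from_fst(fst: float) -> float:
    """Island-model estimate of migrants per generation: Nm = (1 - Fst) / (4 * Fst)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("Fst must lie in (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


def fst_from_nm(nm: float) -> float:
    """Inverse island-model relation: Fst = 1 / (1 + 4 * Nm)."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (1.0 + 4.0 * nm)


if __name__ == "__main__":
    print(round(fst_from_nm(0.08), 3))  # Fst implied by the mean gene flow above
    print(round(nm_from_fst(0.77), 3))  # Nm implied by the AMOVA among-population Fst
```

Because the island model rarely holds exactly, values derived this way are best read as rough indicators of connectivity, not literal migrant counts.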
The Hardy-Weinberg equilibrium (HWE) test is widely used to check whether the observed genotype frequencies in a population match those expected from its allele frequencies. It thereby provides crucial information about reproductive mechanisms and about the evolutionary forces shaping the population's genetic makeup. The HWE test for individual landraces revealed that the vast majority of loci were not in HWE. This is not surprising, as durum wheat reproduces primarily through self-fertilization (Hucl and Matus-Cadiz, 2001). Several evolutionary factors could also influence this result, including gene flow, natural and artificial selection, mutation, population size, and different degrees of outcrossing. However, in some landrace populations, including NSH6, WGM2, NO, NSH2, AR1, and BL1, more than half of the polymorphic loci did not deviate significantly from HWE. This indicates the need for further research on the diversity of reproductive mechanisms in durum wheat. Reported outcrossing rates of durum wheat range from 0 to 6.7% (Hucl and Matus-Cadiz, 2001).
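A per-locus HWE test can be sketched as a chi-square goodness-of-fit test on the three genotype classes of a biallelic SNP. This is a minimal illustration with a hypothetical function name. The source does not specify the test or software used. For small samples, exact HWE tests are usually preferred over the chi-square approximation.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg proportions at one biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2.0 * n * p * q, n * q * q)
    # Skip classes with zero expectation (a monomorphic locus contributes nothing)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # genotype counts exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit, as under selfing
```

A fully selfed sample with 50 AA, 0 AB, and 50 BB plants gives χ² = 100 and a p-value far below 0.001. This heterozygote deficit is why most loci in a self-pollinating crop depart from HWE.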

Pattern and extent of linkage disequilibrium (LD)
Determining the extent, pattern, and distribution of LD across the durum wheat genome provides crucial information for defining inherited genomic regions (Sajjad et al., 2012;Roncallo et al., 2021). Furthermore, the extent and pattern of LD in a germplasm set determine the achievable mapping resolution for targeted genomic regions. They also guide the choice between coarse mapping, which uses less diverse germplasm and fewer SNP markers, and fine mapping, which uses more markers and genetically diverse germplasm (Gaut and Long, 2003;Sajjad et al., 2012). LD has been estimated using several types of DNA markers in durum wheat (Maccaferri et al., 2005;Laidò et al., 2014;Taranto et al., 2020). In this study, 30.39% of SNP pairs across the durum wheat genome were in significant LD (r² ≥ 0.2, p < 0.01). This percentage is higher than the 13.4% (p < 0.01) reported by Roncallo et al. (2021), the 27.6% (p < 0.01) reported by Mekonnen et al. (2021), and the 19.8% (p < 0.01) reported by Mulugeta et al. (2023).
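The r² statistic for a pair of SNPs can be sketched as the squared correlation between genotype dosages (0, 1, or 2 copies of an allele). This approximation is commonly used for unphased data. The source does not state how r² or its p-value was computed, so both functions below, and the χ² approximation (n·r² with 1 df) used for significance, are assumptions for illustration.

```python
import math
from typing import Sequence


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two SNPs' genotype dosages (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # r^2 is undefined when either SNP is monomorphic
    return (sxy * sxy) / (sxx * syy)


def r2_p_value(r2: float, n: int) -> float:
    """Approximate p-value treating n * r^2 as chi-square with 1 df."""
    return math.erfc(math.sqrt(n * r2 / 2.0))


if __name__ == "__main__":
    snp1 = [0, 0, 2, 2, 2, 0, 2, 0]  # hypothetical dosages for eight inbred lines
    snp2 = [0, 0, 2, 2, 0, 0, 2, 0]
    r2 = genotype_r2(snp1, snp2)
    print(round(r2, 3), r2_p_value(r2, len(snp1)) < 0.01, r2 >= 0.2)
```

Applying both checks to every SNP pair, and counting the pairs that pass r² ≥ 0.2 and p < 0.01, gives the proportion of significant pairs discussed above.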
Compared with previous research (Alemu et al., 2020;Mekonnen et al., 2021), a high genome-wide mean r² of 0.21 (all linked SNP pairs in LD, p < 0.01) was estimated for the entire durum wheat set used in this study, including both landraces and cultivars. These results likely reflect both LD due to physical linkage and residual LD arising from factors such as selection, recombination rate, and evolutionary history (Fayaz et al., 2019;Roncallo et al., 2021). In agreement with previous research on the pattern and extent of LD in durum wheat (Maccaferri et al., 2019;Alemu et al., 2020;Taranto et al., 2020;Roncallo et al., 2021), this study also revealed distinct variation in LD patterns and decay distances across chromosomes and genomic regions.
LD decayed below r² = 0.2 within distances ranging from 3.65 Mbp (chromosome 4A) to 22.90 Mbp (chromosome 3B), with a genome-wide mean of 8.56 Mbp. This is comparable with previous reports of 11.8 Mbp (Roncallo et al., 2021), 9.6 Mbp (Wang et al., 2019), and 9.96 Mbp (Taranto et al., 2020). However, it is far below the 69.1 Mbp reported by Alemu et al. (2020) for Ethiopian durum wheat landraces and the 51.3 Mbp reported by Bassi et al. (2019) for three different sets of durum wheat germplasm. The differences could arise from the type and density of markers covering the genomic regions and from the evolutionary forces acting on the germplasm.
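An LD decay distance can be approximated by binning SNP pairs by physical distance and finding the first bin whose mean r² drops below the cut-off. This binned version is a simplified sketch. The bin size is a hypothetical parameter, and published studies often fit a nonlinear decay curve instead. The source does not state which approach produced the values above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def ld_decay_distance(pairs: Iterable[Tuple[float, float]],
                      bin_size_mbp: float = 1.0,
                      threshold: float = 0.2) -> Optional[float]:
    """Distance (Mbp) at which mean r^2 first drops below the threshold.

    pairs: (physical distance in Mbp, r^2) for SNP pairs on one chromosome.
    Returns the lower edge of the first distance bin whose mean r^2 is below
    the threshold, or None if r^2 never falls below it.
    """
    bins: Dict[int, List[float]] = defaultdict(list)
    for distance, r2 in pairs:
        bins[int(distance // bin_size_mbp)].append(r2)
    for b in sorted(bins):  # walk outward from short to long distances
        if mean(bins[b]) < threshold:
            return b * bin_size_mbp
    return None


if __name__ == "__main__":
    chromosome_pairs = [(0.5, 0.6), (0.7, 0.4), (1.5, 0.3), (2.5, 0.1), (3.2, 0.05)]
    print(ld_decay_distance(chromosome_pairs))
```

Running this per chromosome and averaging the results mirrors the per-chromosome and genome-wide summaries reported above. Smaller bins give finer resolution but noisier means.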

Pattern of nucleotide variation across the genome
The high nucleotide diversity (π) and Tajima's D values revealed in this study suggest substantial genetic variation in Ethiopian durum wheat populations. The genome-wide mean π (0.33) and Tajima's D (4.43) are high compared with several previous reports (Akhunov et al., 2010;Cavanagh et al., 2013;Liu et al., 2019). Genetic diversity was reduced in the pericentromeric regions of most chromosomes, except chromosomes 1A, 1B, 6A, and 6B. This is similar to the genome-wide diversity scans of durum germplasm reported by Akhunov et al. (2010), Maccaferri et al. (2019), and Liu et al. (2019). In contrast, chromosomes 1A, 1B, 6A, and 6B showed widespread variation across their genomic regions, suggesting that intense selection and domestication pressures had minimal influence on these chromosomes. The distal regions of all chromosomes showed higher genomic variation than the proximal regions, which may indicate balancing selection in these regions, in agreement with previous research in wheat (Zhou et al., 2018;Liu et al., 2019;Maccaferri et al., 2019;Gaire et al., 2020;Mazzucotelli et al., 2020). Zhou et al. (2018) indicated that gene content and meiotic recombination are nearly zero near or within the centromeric regions of cereal chromosomes, resulting in low genetic variation in these regions.
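The window statistics above can be sketched with Tajima's original formula applied to SNP allele counts in sliding windows (1,000 kbp windows, 100 kbp steps, as in the figure caption). This assumes sequence-like data, with n sampled chromosomes per site. π is computed as the sum over segregating sites in a window, and dividing it by the number of sites gives a per-site value. The study's exact normalization and software are not stated, and all function names here are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def pairwise_diff_per_site(count: int, n: int) -> float:
    """Proportion of sequence pairs that differ at a biallelic site (count of either allele)."""
    return 2.0 * count * (n - count) / (n * (n - 1))


def tajimas_d(n: int, allele_counts: Sequence[int]) -> float:
    """Tajima's D for one window of n sequences, from per-SNP allele counts."""
    seg = [k for k in allele_counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    if s == 0:
        return float("nan")
    pi = sum(pairwise_diff_per_site(k, n) for k in seg)  # mean pairwise differences
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1, e2 = c1 / a1, c2 / (a1 * a1 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def sliding_windows(positions: Sequence[int], size: int = 1_000_000,
                    step: int = 100_000) -> Iterator[Tuple[int, List[int]]]:
    """Yield (window start, indices of SNPs with start <= position < start + size)."""
    if not positions:
        return
    last = max(positions)
    start = 0
    while start <= last:
        idx = [i for i, pos in enumerate(positions) if start <= pos < start + size]
        yield start, idx
        start += step


if __name__ == "__main__":
    print(round(tajimas_d(10, [5] * 20), 3))  # intermediate-frequency alleles: D > 0
    print(round(tajimas_d(10, [1] * 20), 3))  # mostly rare alleles: D < 0
```

Positive D arises when intermediate-frequency alleles dominate, which is consistent with balancing selection or population subdivision. An excess of rare alleles gives negative D.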

Selection signatures and associated putative genes
Previous research indicated that selection scans based on the genetic differentiation (Fst) outlier test are suitable for identifying genomic regions under selection because they are not strongly influenced by ascertainment bias (Foll and Gaggiotti, 2008;Cavanagh et al., 2013). The Fst outlier test identified 85 selection signatures spread across all chromosomes. However, this number is far below the number of signals reported in previous investigations in wheat. This suggests that selection during or after domestication by farmers and breeders has had less influence on Ethiopian durum wheat landraces than on germplasm from other parts of the world. For instance, Liu et al. (2019), using 687 Chinese and Pakistani landraces and cultivars genotyped with a 90K SNP array, found 268, 318, and 109 genomic regions under selection in germplasm from China, Pakistan, and both countries, respectively. Zhou et al. (2018) also identified 148 loci associated with grain yield and host plant tolerance to pathogens using 717 Chinese wheat landraces genotyped with 27,933 DArT and 312,831 SNP markers. Additionally, Cavanagh et al. (2013) observed 308 loci associated with yield potential, vernalization, and plant height based on 2,994 wheat accessions genotyped with 6,305 SNPs.
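The logic of an Fst outlier scan can be illustrated with a simplified empirical-tail version. This sketch computes per-locus Fst (Nei's Gst from population allele frequencies) and flags loci in the upper tail as candidates for directional selection and loci in the lower tail as candidates for balancing selection. It is not the Bayesian method of Foll and Gaggiotti (2008). The quantile cut-offs are hypothetical parameters, and an empirical tail always flags some loci even under neutrality, which is one reason model-based outlier tests are preferred.

```python
import math
from typing import List, Sequence


def locus_fst(pop_freqs: Sequence[float]) -> float:
    """Nei's Gst for one biallelic SNP from per-population allele frequencies (equal weights)."""
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # total expected heterozygosity
    h_s = sum(2.0 * p * (1.0 - p) for p in pop_freqs) / len(pop_freqs)  # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def classify_loci(fst_values: Sequence[float],
                  lower_q: float = 0.05, upper_q: float = 0.95) -> List[str]:
    """Label loci in the empirical tails of the Fst distribution.

    High-Fst tail -> candidate directional selection; low-Fst tail -> candidate
    balancing selection; everything else -> neutral.
    """
    ranked = sorted(fst_values)
    last = len(ranked) - 1
    low_cut = ranked[math.floor(lower_q * last)]
    high_cut = ranked[math.ceil(upper_q * last)]
    if low_cut == high_cut:
        return ["neutral"] * len(fst_values)  # no spread, so no tails to flag
    labels = []
    for f in fst_values:
        if f >= high_cut:
            labels.append("directional")
        elif f <= low_cut:
            labels.append("balancing")
        else:
            labels.append("neutral")
    return labels


if __name__ == "__main__":
    freqs_per_locus = [[0.1, 0.9, 0.2], [0.5, 0.5, 0.5], [0.3, 0.4, 0.35]]  # hypothetical
    values = [locus_fst(f) for f in freqs_per_locus]
    print([round(v, 3) for v in values], classify_loci(values, 0.1, 0.9))
```

In practice the tails would be computed over thousands of SNPs, and the flagged loci would then be checked against annotated genes and known QTL.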
Consistent with previous research (Zhou et al., 2018;Liu et al., 2019), more selection signatures were identified on the B genome than on the A genome in this study. This may indicate that the B genome carries more genes related to adaptation, agronomic, and domestication traits than the A genome. It may also indicate that selection by farmers and breeders during or after domestication acted more strongly on the B genome than on the A genome. The putative candidate genes identified near or within the regions under selection were associated with several desirable traits in wheat. Several known quantitative trait loci (QTL) co-localized with the genomic regions under selection identified in this study. These included QTL for grain yield (Roncallo et al., 2018), plant height (Roncallo et al., 2017), leaf rust resistance (Aoun et al., 2016), yellow rust resistance, stem rust resistance (Letta et al., 2014), primary root length and heading date (Maccaferri et al., 2008;Maccaferri et al., 2016;Giunta et al., 2018), grain protein content (Suprayogi et al., 2009), test weight (Canè et al., 2014), grain β-glucan content (Marcotuli et al., 2017), and phenolic acid contents (Nigro et al., 2017).

Genetic population structure and relationship
A fundamental component of harnessing genetic diversity is understanding genetic population structure. This provides crucial information about available genetic resources and thereby contributes to future conservation strategies and to broadening the genetic base of crops (Eltaher et al., 2018;Tehseen et al., 2022). Model-based clustering using STRUCTURE revealed the highest ΔK at K = 2, followed by K = 5, suggesting two candidate numbers of subpopulations. As previously reported, an optimum at K = 2 in STRUCTURE analyses may reflect the inability of the STRUCTURE algorithm to estimate population structure appropriately (Janes et al., 2017;Tehseen et al., 2022). Hence, we chose K = 5 as the optimal number of subpopulations for the 528 genotypes, which showed up to 80% concordance with the PCoA- and UPGMA-based analyses.
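The ΔK statistic used to compare K values can be sketched from replicate STRUCTURE log-likelihoods, following Evanno's method. This minimal version computes L'(K) and |L''(K)| from run-averaged likelihoods, a common simplification. It assumes consecutive K values with at least two replicate runs each. The function name and the example likelihoods are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for each interior K, given replicate LnP(D) values per K.

    Assumes consecutive K values and at least two replicate runs per K.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:  # Delta K is undefined at the smallest and largest K
        first_diff = means[k] - means[k - 1]             # L'(K)
        next_first_diff = means[k + 1] - means[k]        # L'(K + 1)
        second_diff = abs(next_first_diff - first_diff)  # |L''(K)|
        sd = stdev(lnp_by_k[k])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


if __name__ == "__main__":
    # Hypothetical run-averaged likelihoods with a sharp elbow at K = 2
    averages = {1: -5000, 2: -4000, 3: -3900, 4: -3850, 5: -3820, 6: -3800}
    runs = {k: [m - 10, m, m + 10] for k, m in averages.items()}
    delta = evanno_delta_k(runs)
    print(max(delta, key=delta.get), delta)
```

Because ΔK rewards the sharpest change in likelihood, it often peaks at the uppermost level of hierarchy (frequently K = 2). This is why secondary peaks, such as K = 5 here, are worth examining alongside other clustering methods.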
The grouping of the diverse landraces into five distinct clusters by PCoA, UPGMA, and STRUCTURE suggests that they evolved from different gene pools or resulted from independent events shaped by different evolutionary forces (genetic drift, mutation, migration, selection, and influx/outflux of genes through germplasm exchange). In the UPGMA tree, cluster 1 (Cl-I) comprised 25 landraces grouped together with all modern cultivars. This could have been caused by farmers' practice of planting mixed genotypes, which allows cross-pollination between cultivars and landraces. Alternatively, some cultivars might have been mistakenly classified as landraces during the germplasm collection mission, or these genotypes may be admixed germplasm. Negisho et al. (2021) obtained similar results using 285 durum wheat landraces. The admixture level in this cluster was high. This may reflect the fact that almost all breeding programs in Ethiopia have used germplasm from the Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT, Mexico) and the International Center for Agricultural Research in the Dry Areas (ICARDA, Syria) as sources of desirable genotypes in the variety development pipeline, to broaden the genetic base of national breeding programs.
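UPGMA clustering, one of the three grouping methods compared above, repeatedly merges the two closest clusters. It then recomputes distances to the merged cluster as size-weighted averages. The sketch below is a naive, dependency-free illustration that returns the tree as nested tuples of labels. The function name and the toy distance matrix are hypothetical. For hundreds of genotypes, an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the practical choice.

```python
from typing import Dict, FrozenSet, Sequence, Union

Tree = Union[str, tuple]


def upgma(dist: Sequence[Sequence[float]], labels: Sequence[str]) -> Tree:
    """Build a UPGMA (average-linkage) tree as nested tuples of labels."""
    n = len(labels)
    trees: Dict[int, Tree] = {i: labels[i] for i in range(n)}
    sizes: Dict[int, int] = {i: 1 for i in range(n)}
    d: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): dist[i][j] for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(trees) > 1:
        i, j = sorted(min(d, key=d.get))  # closest pair of current clusters
        new = next_id
        next_id += 1
        # Average linkage: size-weighted mean of the merged clusters' distances
        for k in trees:
            if k not in (i, j):
                d[frozenset((new, k))] = (
                    sizes[i] * d[frozenset((i, k))] + sizes[j] * d[frozenset((j, k))]
                ) / (sizes[i] + sizes[j])
        # Drop every distance that involves the two clusters just merged
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
    return next(iter(trees.values()))


if __name__ == "__main__":
    toy = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 2], [10, 10, 2, 0]]
    print(upgma(toy, ["A", "B", "C", "D"]))
```

Cutting such a tree at the height that leaves five branches yields five clusters, which can then be compared with the STRUCTURE assignments.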

Genetic differentiation of the hierarchical populations
AMOVA indicated significant genetic differences among landraces, with more genetic variation among populations than within populations. The genetic variation observed among individuals within landraces might have arisen during domestication or through seed exchange among farmers and local traders from adjoining and non-adjacent regions. Alemu et al. (2020) likewise found higher genetic variation between the two groups (61.02%) than among individuals within groups (38.98%) using 167 landraces and 25 cultivars from Ethiopia. Similarly, Kabbaj et al. (2017) and Roncallo et al. (2021) reported higher genetic variation between subpopulations than among individuals within subpopulations in different durum wheat populations.

The implication of this study for durum wheat breeding
Genetic characterization of this diverse set of durum wheat germplasm provided sound insight into the population structure and genetic diversity of the Ethiopian durum wheat gene pool, as well as into linkage between SNP markers along its chromosomes. The information provided here facilitates the identification of beneficial loci and useful alleles. These will aid the development of more resilient durum wheat cultivars that can cope with climate change and sustain durum wheat's role in food security. These accumulated beneficial genetic variants of Ethiopian durum wheat could also help breeders exploit available genetic variation more efficiently, optimize future yield potential in more sustainable production systems, and drive further discovery and deployment of beneficial alleles. The genetic analyses based on LD, GD, ND, Tajima's D, and loci under selection revealed key genomic information, including clear differences among the landraces. This provides a basis for future conservation of the crop's genetic resources and for breeding efforts to improve the crop.

Conclusion
The Illumina Infinium 25K wheat SNP array was used to genotype 528 Ethiopian durum wheat genotypes to assess genetic diversity and population structure, determine LD, and uncover selection signatures related to domestication and breeding. Nucleotide diversity and Tajima's D were higher in the distal regions than in the pericentromeric regions (nearly zero diversity) of the chromosomes. The exceptions were chromosomes 1A, 1B, 6A, and 6B, which showed high diversity along their entire length. This pattern points to the influence of selection by farmers during domestication and by breeders for specific traits. Loci under balancing selection spanned all 14 durum wheat chromosomes, whereas those under directional selection were distributed across chromosomes 2A, 3A, 5B, 6B, and 7B. Interestingly, genomic regions previously reported to affect grain yield, days to heading, grain quality, and disease resistance were also detected in this study. Hence, our results show the high genetic diversity and untapped potential of Ethiopian durum wheat germplasm, which can be explored to discover novel genes for broadening the gene pool and developing climate-resilient cultivars. We recommend that durum wheat breeders use these genetic materials to develop improved cultivars, including through fine mapping of genetically complex traits such as grain yield and end-use quality. This would help maintain yield stability, genetic gain, and adaptation to specific biotic and abiotic factors.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Funding
The study was funded by the Swedish International Development Cooperation Agency (Sida) grant awarded to Addis Ababa University and the Swedish University of Agricultural Sciences for a bilateral capacity-building program in biotechnology. The funding information is available at https://sida.aau.edu.et/index.php/biotechnology-phdprogram/ (accessed on May 21, 2022). The funders played no role in the design of the study, data collection, analysis, decision to publish, or preparation of the manuscript.