A Comparison of Microsatellites in Phytopathogenic Aspergillus Species in Order to Develop Markers for the Assessment of Genetic Diversity among Its Isolates

Microsatellites (SSRs) occur in most fungal genomes; however, their abundance varies across species. In the present study, we analyzed the frequency of SSRs in the whole genomes and transcripts of two phytopathogenic species (Aspergillus niger and Aspergillus terreus) and compared them with two non-pathogenic species (Aspergillus nidulans and Aspergillus oryzae). Higher relative abundance and relative density of SSRs were observed in the whole genome and transcript sequences of the pathogenic Aspergillus species than in the non-pathogenic ones. The relative abundance and density of SSRs were positively correlated with the G+C content of transcripts. Among the different classes of SSRs, tetra-nucleotide SSRs accounted for the highest percentage in A. niger (36.7%) and A. oryzae (35.9%), whereas tri-nucleotide SSRs predominated in A. nidulans and A. terreus (38.2 and 42.1%) in whole genome sequences. In transcripts, tri-nucleotide SSRs were the most abundant whereas di-nucleotide SSRs were the least favored. A motif conservation study among the transcripts revealed that only 27% of motifs were conserved within Aspergillus species. Furthermore, a similar relationship among the Ascomycetes was obtained on the basis of motif conservation and conserved genes (rDNA). To analyze the diversity present within the Indian isolates of Aspergillus, primers were successfully designed for 692 motifs in A. niger and A. terreus, of which 20 were selected for diversity analysis. Of the 12 markers that amplified, 10 (83.3%) were polymorphic, whereas the remaining two (16.6%) were monomorphic. The ten polymorphic markers obtained in this investigation demonstrate the utility of the newly developed SSR markers for assessing genetic diversity among isolates of Aspergillus.


INTRODUCTION
Species of the filamentous fungal genus Aspergillus display wide variation and are of high significance to people (Gibbons and Rokas, 2013). Among them, Aspergillus niger is a soil saprobe widely used in the fermentation industry for the production of enzymes and organic acids (Pel et al., 2007). It is also responsible for the degradation of various organic substances including fruits, vegetables, beans, and cereals (Baker, 2006). In India, A. niger has been reported to cause necrotic leaf spot disease in Zingiber officinale (Pawar et al., 2008). Aspergillus terreus is widely known for the production of lovastatin, a polyketide derivative used for lowering cholesterol. This fungus is also reputed to be the third most common cause of invasive aspergillosis in humans. The mycotoxins it produces cause food spoilage in several cereals and nuts grown in tropical and subtropical climates (Kuck et al., 2014). A recent report suggests its involvement in causing foliar blight of potato in India (Louis et al., 2013).
The genomes of this genus are widely studied, curated, and annotated in the Aspergillus Genome Database (AspGD), where the whole genome sequences of several members of the genus are publicly available (Arnaud et al., 2012). With the availability of genome sequences from various species, along with the development of many bioinformatics tools, researchers can now use the sequence information for various purposes. These tools make it easier to develop high-throughput in silico methods for better understanding and characterizing Aspergillus populations and for developing markers for their identification.
Molecular marker technologies such as random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), and inter simple sequence repeats (ISSR) have been used in Aspergillus; however, they tend to exhibit reproducibility issues and are insufficient for evaluating intraspecies variation (Semighini et al., 2001; Baddley et al., 2003; Schmidt et al., 2004; Kermani et al., 2016).
Microsatellites or Simple Sequence Repeats (SSRs) are regions of the genome consisting of tandem repeats of a short nucleotide motif one to six bases in length. The number of repeats is highly unstable because repeat units are frequently added or deleted, which produces variation in the number of motifs (Gur-Arie et al., 2000; Lim et al., 2004; Olango et al., 2015). Microsatellites can be found in the protein-coding (Garnica et al., 2006; Lawson and Zhang, 2006; Mahfooz et al., 2012a) and non-coding regions of the genome (Kim et al., 2008; Araujo et al., 2012). Microsatellite loci show extensive length polymorphism, and hence they are widely used for DNA fingerprinting and diversity studies in bacteria (Mrazek et al., 2007; Guo and Mrazek, 2008), fungi (Kim et al., 2008; Araujo et al., 2012; Mahfooz et al., 2012a,b), plants (Datta et al., 2010; Yu et al., 2017), and humans (Subramanian et al., 2003; Shin et al., 2017). The utility of microsatellites as molecular markers is well-known; however, their presence or absence in a particular species is also of great functional and evolutionary significance (Gibbons and Rokas, 2009; Mahfooz et al., 2015, 2016). Although the genome sequences of Aspergillus species are freely available, SSRs have been analyzed only in intergenic sequences (Gibbons and Rokas, 2009), leaving the remaining portion of the genome unexplored. In the present study we wanted to address (1) whether the frequency and distribution of SSRs differ between phytopathogenic and non-pathogenic Aspergillus species, (2) the strength of the phylogenetic signal at the species level, (3) the level of motif conservation among Aspergillus species, and (4) the level of motif conservation among Ascomycota. We observed that the frequency of SSRs was higher in the phytopathogenic species than in their non-pathogenic relatives.
Primers were designed and validated for their ability to reveal polymorphism in Indian isolates of Aspergillus obtained from different hosts.

SSR Mining
The entire genome sequences of A. niger, A. terreus, A. nidulans, and A. oryzae were downloaded from the Aspergillus Genome Database (AspGD; http://www.aspgd.org; Arnaud et al., 2012). Microsatellites were scanned using the online tool WebSat (Martins et al., 2009). Repeats of more than 12 bp were considered SSRs; that is, only more than six occurrences of a di-nucleotide repeat, four occurrences of a tri-nucleotide repeat, and three occurrences of a tetra-, penta-, or hexa-nucleotide repeat were counted as SSRs. All SSRs were analyzed for their frequency of occurrence, density, and relative abundance. Density was calculated by dividing the number of base pairs contributed by each SSR by the total length analyzed (Mb). Relative abundance was calculated as the number of SSRs per Mb of sequence. While scanning di- to hexa-nucleotide SSRs, combinations involving runs of the same nucleotide were considered. In the current analysis, each SSR was considered unique.
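The mining criteria and the two per-Mb measures described above can be illustrated with a minimal Python sketch. It is not the WebSat implementation. The function names, the reading of the repeat thresholds as inclusive minimums (6, 4, 3, 3, 3), and the decision to drop motifs that reduce to a shorter unit (including single-nucleotide runs) are all assumptions made for illustration.

```python
import re

# Minimum repeat counts per motif length (di- to hexa-nucleotide).
# ASSUMPTION: the thresholds in the text are read as inclusive minimums.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'gaga', 'aaa')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.lower()
    hits = []
    for k in sorted(min_repeats):
        n = min_repeats[k]
        # ([acgt]{k}) captures one motif copy; \1{n-1,} demands n-1 or more further copies.
        pattern = re.compile(r"([acgt]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # ASSUMPTION: reducible motifs are skipped so a (ga)n tract is not
            # counted again as (gaga)n; the original tool may behave differently.
            if is_reducible(motif):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


def abundance_and_density(hits, seq_len_bp):
    """Relative abundance (SSRs per Mb) and density (SSR bp per Mb)."""
    mb = seq_len_bp / 1e6
    ssr_bp = sum(len(motif) * count for _, motif, count in hits)
    return len(hits) / mb, ssr_bp / mb
```

Normalizing by Mb in a separate function keeps the comparison between genomes of different sizes apart from the detection step, so the thresholds can be changed without touching the arithmetic.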
For a better comparison of the evolutionary relationship among the Aspergillus species, sharing of repeats was analyzed within the transcribed sequences. As previously reported (Mahfooz et al., 2015), motif sharing within the transcripts of Aspergillus species was investigated manually in a Microsoft Excel 2007 workbook. Every motif obtained in the transcripts of each species was placed in a Microsoft Excel sheet and searched for its counterpart in the transcripts of the remaining species. If a motif was present in the transcripts of all four species, it was deemed common. Similarly, the motifs shared between two and three transcript sequences were also analyzed. A motif that did not have any match was considered novel. The PRIMER 3 online software program (frodo.wi.mit.edu/) was used to design primers complementary to the flanking regions of microsatellites. Primers were required to be 18-24 bp in length, with an annealing temperature in the range of 54-62 °C and product lengths between 150 and 400 bp. A total of 2,169 primers were designed from the four Aspergillus species (Supplementary Table 1). The online GC content calculator (http://www.sciencebuddies.org/science-fair-ojects/project_ideas/Genom_GC_Calculator.shtml) was used to calculate the G+C content of the genomes and transcripts. The Pearson correlation coefficient was calculated using the SPSS package (SPSS V16.0, SPSS Inc., Chicago, IL, USA).
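The motif-sharing comparison, which was carried out manually in a spreadsheet, can be expressed as a short set-based sketch in Python. The function name and the toy motif lists are hypothetical and are not data from this study. The sketch also assumes that motifs have already been written in a consistent form (for example, one orientation of each complementary pair).

```python
from collections import Counter


def classify_motifs(motifs_by_species):
    """Bin motifs by the number of species whose transcripts contain them.

    Returns (common, shared, novel):
      common - present in every species
      shared - present in more than one species but not all
      novel  - present in exactly one species
    """
    counts = Counter()
    for motifs in motifs_by_species.values():
        counts.update(set(motifs))  # count each motif once per species
    n_species = len(motifs_by_species)
    common = {m for m, c in counts.items() if c == n_species}
    novel = {m for m, c in counts.items() if c == 1}
    shared = {m for m, c in counts.items() if 1 < c < n_species}
    return common, shared, novel


# Hypothetical toy input, for illustration only.
toy = {
    "species_1": ["aag", "cag", "ga"],
    "species_2": ["aag", "cag"],
    "species_3": ["aag", "ccg"],
    "species_4": ["aag", "cag", "tttc"],
}
common, shared, novel = classify_motifs(toy)
```

With the toy input, `aag` falls in the common set, `cag` in the shared set, and the remaining motifs in the novel set.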

Fungal Isolates
A total of 23 Aspergillus isolates, comprising 11 each of A. niger and A. terreus along with one of A. nidulans, were obtained from the Department of Plant Pathology, Indian Agricultural Research Institute, Pusa, New Delhi, India (Supplementary Table 2). These isolates represent the diverse agro-climatic zones of India.

DNA Isolation and SSR Amplification
Total genomic DNA from the 23 Aspergillus isolates was extracted using the HiPurA™ Fungal DNA Purification Kit (HIMEDIA, India). PCR was performed in a 10.0 µl reaction volume containing PCR buffer (10 mM Tris-HCl pH 9.0, 1.5 mM MgCl2, 50 mM KCl, 0.01% gelatine), 200 µM of each dNTP (Merck), 0.2 U of Taq DNA polymerase (Merck), 10 pM each of forward and reverse primers, and 10 ng of genomic DNA as template. The PCR program was as follows: after initial denaturation at 95 °C for 3 min, five touch-down PCR cycles comprising 94 °C for 20 s, 60/55 °C for 20 s, and 72 °C for 30 s were performed. These were followed by 40 cycles of denaturation at 94 °C for 20 s, annealing at a constant temperature of 54-60 °C (depending on the primer) for 20 s, and extension at 72 °C for 20 s, with a final extension at 72 °C for 20 min. All PCR amplicons were resolved by electrophoresis on a 2% agarose gel to identify the informative SSR loci across all the isolates.

Statistical Analysis
The amplification data produced by the SSR primers were analyzed using SIMQUAL (Nei and Li, 1979) to compute Jaccard's similarity coefficients in NTSYS-PC, version 2.1. These similarity coefficients were used to construct a dendrogram depicting the genetic relationship among the Aspergillus isolates by the Unweighted Pair Group Method with Arithmetic Averages (UPGMA). The allelic diversity or polymorphism information content (PIC) was measured as described by Botstein et al. (1980). PIC is defined as the probability that two randomly chosen copies of a gene will represent different alleles within a population. The PIC value was calculated with the following equation: $PIC_i = 1 - \sum_{j=1}^{n} P_{ij}^{2}$, where $P_{ij}$ represents the frequency of the j-th pattern for marker i, and the summation extends over n patterns. A phylogenetic tree of the 18S rRNA gene was also constructed using the Clustal W program in the MEGA 5.2 software (Treangen and Messeguer, 2006) using the neighbor-joining algorithm with bootstrap analysis for 1,000 replicates.
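The two quantities used in this analysis can be sketched in a few lines of Python. This is an illustration rather than the NTSYS-PC routine. It assumes band profiles scored as 1 (present) or 0 (absent) and the commonly used form PIC = 1 − Σ p_j², which matches the definition given in the text. The function names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two binary band profiles (1 = band present)."""
    both = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    # The coefficient is undefined when neither isolate shows any band;
    # returning 0.0 in that case is a convention chosen here (assumption).
    return both / either if either else 0.0


def pic(pattern_frequencies):
    """Polymorphism information content as 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in pattern_frequencies)
```

For example, a monomorphic marker (a single pattern with frequency 1.0) gives a PIC of 0, while two equally frequent patterns give 0.5, so higher values indicate a more informative marker.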

RESULTS
Relative Abundance and Density of SSRs in the Genomic Sequences of Aspergillus spp.
The maximum frequency of SSRs in whole genome sequences was identified in A. niger (8,896), which was much higher than in A. oryzae (5,226), A. terreus (4,823), and A. nidulans (2,919). The data suggest that A. oryzae, which has the largest genome, contains the second highest frequency of SSRs, whereas A. niger, which has the second largest genome, harbors the most SSRs. Because genome size may affect the frequency of SSRs, we also estimated SSRs per 1 Mb of each set of sequences analyzed. In this way, total relative abundance and total relative density were calculated. While A. niger (256.4 and 1059.6) retained its position with the maximum frequency of SSRs, this normalization moved A. terreus (161.3 and 636.8) to second place ahead of A. oryzae (140.9 and 568.4; Table 1). We further analyzed the percentage of different classes of repeats in their respective genomes. In A. nidulans and A. terreus, tri-nucleotide repeats constituted the maximum percentage of SSRs (38.25 and 42.15%), followed by tetra-nucleotide repeats (32.0 and 30.7%), while di-nucleotide repeats were the least frequent (6.3 and 8.1%; Table 2). In A. niger and A. oryzae, tetra-nucleotide repeats (36.7 and 35.9%) were the most abundant, closely followed by tri-nucleotide repeats (35.7 and 30.1%), whereas hexa-nucleotide repeats were the least frequent (7.5 and 10.0%). Comparing the most abundant motifs, we observed that the tri-nucleotide motif aag/ctt was the most favored motif in the genome of A. nidulans. Similarly, the A. terreus genome showed a preference for another tri-nucleotide repeat, cgc/gcg (2.52%). The di-nucleotide repeat motif ga/tc (2.92%) was the most abundant motif in the A. niger genome, whereas A. oryzae preferred the tetra-nucleotide repeat motif gaaa/tttc (2.64%; Table 3).

Relative Abundance and Density of SSRs in the Transcripts of Aspergillus spp.
In genic sequences, the maximum frequency of SSRs was observed in A. niger transcripts (935), followed by A. terreus (742) and A. oryzae (550). The relative abundance and relative density of SSRs followed the same pattern: the highest values were observed in A. niger (55.6 and 811.0) and the lowest in A. oryzae (33.3 and 495.1; Table 1). Comparing the different classes of SSRs, we observed that the percentage of tri-nucleotide motifs was clearly the highest in all the transcripts. Hexa-nucleotide repeats were the second most frequent class in A. nidulans, whereas in A. niger, A. oryzae, and A. terreus it was tetra-nucleotide repeats. Di-nucleotide motifs were the least abundant in the transcripts (Supplementary Table 3). Overall, the frequency of SSRs in the transcripts is much lower than in other Ascomycetes (Mahfooz et al., 2015, 2016). Analysis of the most common repeats reveals a preference for specific tri-nucleotide motifs in the transcripts of Aspergillus species. The motif aag/ctt was the most preferred motif in A. nidulans and A. oryzae; however, its percentage was higher in A. oryzae (6.60%) than in A. nidulans (5.22%). Of the remaining species, A. niger preferred cag/ctg motifs (5.30%) whereas A. terreus preferred ccg/cgg motifs (6.32%; Supplementary Table 4).

Conservation of Motifs among Aspergillus spp.
To analyze the evolutionary relationship among the Aspergillus species and to recognize unique motifs, every motif was examined for its counterpart in the transcripts of the remaining species. The greatest number of motifs shared between all four transcripts were tri-nucleotide repeats (168, 84.8%), followed by di-nucleotide repeats (8, 33.3%) and tetra-nucleotide repeats (76, 20.1%; Figure 1A). Interestingly, none of the penta-nucleotide motifs were shared among the transcripts of the four species studied. Among the unique motifs, the largest number were penta-nucleotide repeats in A. nidulans (Supplementary Figure 1B). Conservation of motifs was further analyzed at the genus level among members of Ascomycota. We incorporated the most common motifs of Fusarium and Trichoderma from our previous analysis along with the common motifs identified in this study. As expected, tri-nucleotide repeats were the most common class of repeats shared between the three genera (Figure 2A), followed by tetra-nucleotide repeats. Interestingly, di-, penta-, and hexa-nucleotide repeats did not exhibit any conservation among the three genera. Further analysis of motif sharing among all classes revealed 20.3% conservation. Maximum conservation of motifs was found between Trichoderma and Fusarium (18.7%), followed by Aspergillus and Trichoderma (7.2%). It is noteworthy that not a single motif was conserved between Aspergillus and Fusarium (Figure 2B). The maximum number of unique motifs was identified in Trichoderma while Aspergillus showed the least. In addition, we wanted to compare the relationship obtained through the conservation of motifs with the phylogeny based on a conserved region (18S). To our surprise, both methodologies resulted in an almost identical relationship within the Ascomycetes (Figure 2C).

Codon Usage
Tri-nucleotide repeats in the transcripts are the most likely to be translated into protein. We analyzed all the tri-nucleotide repeats to gain insight into the amino acids they encode. In A. nidulans and A. terreus, arginine-coding motifs were the most frequent, whereas in A. niger, glutamine-coding motifs were the most abundant. A. oryzae showed a preference for serine-coding motifs. On the basis of the amino acids encoded by the different tri-nucleotide motifs, we sought to deduce a relationship among the four species. For this, we performed principal component analysis (PCA), a mathematical algorithm that reduces the dimensionality of the data while retaining most of the variation in the data set. The first component explained 44.8% of the variance and the second 16.5% (Figure 3). The PCA plot showed close clustering of A. niger, A. nidulans, and A. oryzae. The amino acid preference is different in A. terreus, which clustered separately in the PCA plot.
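The mapping from tri-nucleotide motifs to amino acids can be sketched in Python using the standard genetic code. This is an illustration under stated assumptions, not the procedure used in the study: the function name is hypothetical, the standard (nuclear) code is assumed, and the sketch lists all three rotations of a motif because the encoded amino acid depends on the reading frame in which the repeat lies.

```python
# Standard genetic code, codons ordered t, c, a, g at each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acids_by_frame(motif):
    """Map each rotation of a tri-nucleotide motif to its one-letter amino acid."""
    motif = motif.lower()
    if len(motif) != 3 or any(ch not in BASES for ch in motif):
        raise ValueError("expected a tri-nucleotide motif of a, c, g, t")
    rotations = [motif[i:] + motif[:i] for i in range(3)]
    return {codon: CODON_TABLE[codon] for codon in rotations}
```

For instance, `amino_acids_by_frame("cag")` reports glutamine (Q) for the motif as written, and serine (S) and alanine (A) for the two shifted frames, which is why the frame has to be fixed before motifs are tallied by amino acid.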

Diversity Assessment among Different Isolates of Aspergillus
Out of 21, a total of 12 SSR markers (six each from A. niger and A. terreus) produced easily scorable amplicons ranging from 70 to 400 bp in all the isolates. Ten tri-nucleotide repeats and two di-nucleotide repeats were successfully amplified.
Percentage polymorphism, number of alleles per locus, and PIC value were used to describe the level of SSR polymorphism. Among all the amplified markers, 10 markers (83.3%) were polymorphic, while the remaining two markers (16.6%) were monomorphic. A total of 22 alleles were amplified by the 12 markers. We identified 1-4 alleles per microsatellite locus, with an average of 1.83 alleles per marker. A. niger primers amplified 12 alleles with 2.0 alleles per locus, while A. terreus primers amplified 10 alleles with 1.6 alleles per locus. The highest number of alleles (4) was amplified by primer At539, while a single allele was amplified with five markers, viz. An868, At193, At257, At648, and At660 (Table 4).
The similarity coefficients between isolates ranged from 0.28 to 1.0, with a mean of 0.62 across the 276 pairwise isolate combinations used in the present investigation. For microsatellite markers obtained from A. niger, the similarity coefficient between isolates ranged from 0.54 to 1.00, with an average genetic diversity of 33.1%. Similarly, with A. terreus SSR markers, the similarity coefficients between isolates ranged from 0.69 to 1.00, with 34.5% genetic diversity (Table 5).
The highest similarity coefficient (1.0) was seen between the A. terreus isolate pairs At2167-At2457, At6369-At6514, At6544-At6369, and At6514-At6544, whereas the most diverse isolates (similarity coefficient 0.29) were An423 and At5564. The dendrogram constructed from the similarity index resolved two main clusters, A and B. Most of the isolates of A. niger, A. nidulans, and A. terreus grouped together in cluster A, whereas cluster B contained only A. niger isolates. Cluster A was further subdivided into subgroups 1A and 2A. Subgroup 1A consisted exclusively of A. niger isolates, whereas 2A contained a majority of A. terreus isolates (Figures 4A,B).

DISCUSSION
Members of the genus Aspergillus are reputed to be highly diverse. It has been reported that even the most closely related species are as divergent as human and mice (Machida et al., 2005; Fedorova et al., 2008). This divergence is also evident in the large variation in microsatellite frequency obtained in our study. The significantly higher frequency of SSRs in A. niger was surprising. Earlier studies reported that the frequency of SSRs is positively correlated with the G+C content of the genome (Tian et al., 2011; Mahfooz et al., 2016); however, this does not hold for Aspergillus, where no such correlation was observed. A similarly uneven distribution of SSR frequency has been observed among species of Drosophila (Ross et al., 2003). The most probable reason for the higher SSR frequency in A. niger is the presence of a large number of tetra- and di-nucleotide repeats in the whole genome; however, this alone falls short of explaining it. We further analyzed the frequency of SSRs in the transcripts of Aspergillus species, where it was found to be positively correlated with the G+C content of the transcripts. Although the correlation was weak (r² = 0.247), it might explain the difference in SSR frequency among the transcripts. We further noticed a significantly lower frequency of SSRs in the transcripts of A. oryzae compared with the other species. This was interesting, as A. oryzae has been reported to display a two-fold higher rate of insertion, consistent with its having the largest genome (Machida et al., 2005). The most plausible explanation for the lower frequency of SSRs in the A. oryzae transcripts is the acquisition of lineage-specific sequences: since we estimate the frequency of SSRs per Mb of transcripts, the presence of extra sequences might have diluted the frequency of SSRs.
It is evident from the results that the pathogenic species of Aspergillus contained more repeats than the non-pathogenic ones, both in whole genomes and in transcripts. It has been reported that in pathogens, SSRs can enhance antigenic variation of the pathogen population in a process that counters the host immune response (Mrazek et al., 2007).
Tri-nucleotide SSRs were consistently the most abundant class of SSRs in the transcripts of Aspergillus species. The higher abundance of tri-nucleotide SSRs in the transcripts is expected, as expansion or contraction within these repeats does not disturb the reading frame; hence these repeats are well-tolerated in coding regions (Katti et al., 2001; Garnica et al., 2006). The higher occurrence of aag/ctt repeats in A. nidulans and A. oryzae was expected, as it has been reported that, owing to positive selection, aag repeats are predominant in the 5′ flanks of genes whose products are preferentially involved in transcription (Zhang et al., 2004). Our previous analysis in Fusarium also revealed the predominance of these repeats among its three species (Mahfooz et al., 2015). Since cag codes for glutamine, its abundance in the transcripts of A. niger might be attributed to the reputation of glutamine repeats as a polar zipper protein-protein interaction domain (Michelitsch and Weissman, 2000).
We further analyzed the conservation of motifs among Aspergillus species and found low conservation (27.8%) compared with other Ascomycetes (Mahfooz et al., 2015, 2016), which again reflects the diverse genome architecture of the genus (Rokas et al., 2007; Fedorova et al., 2008). Among the three-species combinations, maximum conservation was obtained in the trio A. niger-A. nidulans-A. oryzae (5.3%), which may be explained by sequence conservation within 5,000 non-coding regions, with an abundance of repeats actively conserved within these species (Galagan et al., 2005). It has been reported that of 8,695 genes in A. niger, 78% showed conservation of neighboring orthologs in at least one species (Pel et al., 2007). This might be the reason why A. niger and A. nidulans showed maximum motif sharing between themselves. Our previous analysis of motif conservation in Fusarium and Trichoderma prompted us to analyze it at the genus level. The three genera shared only 20.3% common motifs despite the fact that 80% of genes in Aspergillus have homologs in other lineages of fungi (de Vries et al., 2017). Higher conservation of motifs between Fusarium and Trichoderma (18.7%) was anticipated, as Trichoderma and Fusarium belong to Sordariomycetes whereas Aspergillus falls under Eurotiomycetes (Grigoriev et al., 2014). The least number of unique motifs obtained in Aspergillus suggests a low level of genetic heterogeneity in Aspergillus compared with other Ascomycetes. It was thought-provoking to find a similar relationship on the basis of hyper-variable and conserved regions. A possible explanation is that within genes, apart from long stretches of nucleotides, short stretches are also conserved, with a possibility of change in the number of repeats.
Owing to positive selection, changes in amino acids have been observed in domesticated fungi, probably because of the strong selection pressure exerted by humans. The genetic code itself can also provide unexpected adaptive amino acid changes. In Candida albicans, serine residues were found to be incorporated at sites where leucine was previously placed, and this replacement was well-tolerated in the genome (Miranda et al., 2013). This might be one of the reasons why A. terreus clustered distantly in the PCA plot. Apart from this, the higher abundance of arginine-, alanine-, and proline-coding repeats might also be responsible for the distant clustering of A. terreus in the PCA plot. The acquisition of additional repeats in proteins of A. terreus may help to fine-tune their function and/or modify some of their properties (Mularoni et al., 2010).
The primers designed in the present study were further validated for their ability to detect polymorphism. To date, only six polymorphic microsatellites had been developed for A. niger (Esteban et al., 2005), which were insufficient for estimating genetic diversity. Earlier, RAPD markers were widely used for genetic characterization of Aspergillus isolates. The diversity and phylogenetic relationship of 12 Aspergillus species isolated from Tehran were studied using 11 RAPD markers (Kermani et al., 2016). The authors obtained similarity coefficients ranging from 0.02 to 0.40, indicating wide diversity within Aspergillus isolates. Higher genetic diversity was also obtained in A. terreus isolates collected from Houston, Texas, and Innsbruck using RAPD markers (Lass-Florl et al., 2007). The higher range of similarity coefficients obtained in our study with SSR markers might be attributed to the fact that only three species were analyzed. Together with the previously published markers for A. niger, the newly developed markers for A. niger and A. terreus appear sufficient to distinguish a broad range of A. niger and A. terreus strains and to analyze intraspecies variation among them. The available markers can address issues such as pathogenicity, ecology, and species differentiation within the genus Aspergillus. In addition, the unique motifs obtained in this study may be utilized for the development of species-specific markers.

ACKNOWLEDGMENTS
This work was supported by the financial assistance from Science and Engineering Board of Department of Science and Technology, Government of India (GAP-3349).

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2017.01774/full#supplementary-material