A Chromosome-Scale Assembly of the Wheat Leaf Rust Pathogen Puccinia triticina Provides Insights Into Structural Variations and Genetic Relationships With Haplotype Resolution

Despite the global economic importance of the wheat leaf rust pathogen Puccinia triticina (Pt), genomic resources for Pt are limited and chromosome-level assemblies of Pt are lacking. Here, we present a complete haplotype-resolved genome assembly at a chromosome-scale for Pt using the Australian pathotype 64-(6),(7),(10),11 (Pt64; North American race LBBQB) built upon the newly developed technologies of PacBio and Hi-C sequencing. PacBio reads with ∼200-fold coverage (29.8 Gb data) were assembled by Falcon and Falcon-unzip and subsequently scaffolded with Hi-C data using Falcon-phase and Proximo. This approach allowed us to construct 18 chromosome pseudomolecules ranging from 3.5 to 12.3 Mb in size for each haplotype of the dikaryotic genome of Pt64. Each haplotype had a total length of ∼147 Mb, scaffold N50 of ∼9.4 Mb, and was ∼93% complete for BUSCOs. Each haplotype had ∼29,800 predicted genes, of which ∼2,000 were predicted as secreted proteins (SPs). The investigation of structural variants (SVs) between haplotypes A and B revealed that 10% of the total genome was spanned by SVs, highlighting variations previously undetected by short-read based assemblies. For the first time, the mating type (MAT) genes on each haplotype of Pt64 were identified, which showed that MAT loci a and b are located on two chromosomes (chromosomes 7 and 14), representing a tetrapolar type. Furthermore, the Pt64 assembly enabled haplotype-based evolutionary analyses for 21 Australian Pt isolates, which highlighted the importance of a haplotype resolved reference when inferring genetic relationships using whole genome SNPs. This Pt64 assembly at chromosome-scale with full phase information provides an invaluable resource for genomic and evolutionary research, which will accelerate the understanding of molecular mechanisms underlying Pt-wheat interactions and facilitate the development of durable resistance to leaf rust in wheat and sustainable control of rust disease.


INTRODUCTION
The rust fungi are a large group of plant pathogens in the order Pucciniales that are damaging in agriculture and in forestry. The rust species Puccinia triticina (Pt) causes leaf rust on cereals and grasses and is the most commonly occurring cereal rust disease worldwide (Huerta-Espino et al., 2011). Pt has also been rated as the most damaging pathogen of wheat, causing global losses of approximately 3.25% as documented by a recent survey of the global burden of pathogens on major food crops (Savary et al., 2019). To prevent the disease, planting wheat with resistance genes (R genes) is both effective and environmentally friendly. However, genetic mutations give rise to new pathotypes that can overcome plant resistance, as exemplified by the observation of Pt populations highly diverse for virulence with many different pathotypes detected annually (Park et al., 2000;Aoun et al., 2019). The molecular mechanisms underlying host-pathogen co-evolution include pathogen evasion of host recognition by modifying genes encoding avirulence proteins (Avrs) that are recognized by host proteins encoded by R genes. This specific recognition between host and pathogen was first described by Flor as the gene-for-gene model (Flor, 1971), and this type of host immune response was defined as effector-triggered immunity in differentiation with pathogen-associated molecular patterntriggered immunity (Jones and Dangl, 2006;Chen et al., 2017).
Pt is an obligate biotroph with a complex life cycle consisting of both asexual and sexual stages (Bolton et al., 2008;Cuomo et al., 2017). While the dikaryotic urediniospores (n + n) are most common and can be repeatedly produced through vegetative polycycle on the primary host wheat, the haploid basidiospores (n) generated through meiosis in mature teliospores (from n + n to 2n) have segregated mating-type (MAT) loci (+ and −) and can infect alternate hosts (e.g., Thalictrum speciosissimum), where sexual reproduction occurs (Bolton et al., 2008;Cuomo et al., 2017). The subsequently developed pycniospores and receptive hyphae with compatible MATs (opposite MATs) can fuse to restore the dikaryotic state (n + n). This leads to the production of aeciospores capable of infecting the primary host wheat to once again generate urediniospores. For sexual reproduction in Pt, two pairs of MAT genes are required. One pair encodes premating lipopeptide pheromones and their cognate receptors (designated as a locus in Ustilago maydis), and the other encodes two classes of homeodomain (HD) transcription factors (HD1 and HD2; b locus) (Nieuwenhuis et al., 2013). In addition to sexual reproduction, it has been proposed that two distinct compatible MATs are essential for the formation and maintenance of a stable dikaryotic state (Park and Wellings, 2012). It has also been suggested that the maintenance of MAT genes in dikaryotic fungi could have profound implications for their lifestyle, as exemplified by the adaption of Fusarium using pheromone receptors to detect the presence of potential plant hosts (Wallen and Perlin, 2018). Despite their crucial functions, the MAT genes of the wheat rust pathogens have not been investigated thoroughly and much remains to be explored .
For the genomic research on rust fungi, until recently, most published rust genome assemblies are based largely on short-read sequencing, such as the initial assembly of the three wheat rust fungi, Puccinia graminis f. sp. tritici (Pgt), Puccinia striiformis f. sp. tritici (Pst), and Pt . Due to the technical limitations of short-read sequencing and the high level of heterozygosity and repetitive sequences in rust genomes, these assemblies are highly fragmented, and haplotype phasing can hardly be achieved (Aime et al., 2017). With the advent of longread sequencing (LRS), which routinely generates reads longer than 10 kb, more accurate and complete genome assemblies with haplotype phasing have become available (Amarasinghe et al., 2020). For example, several LRS-based genome assemblies have been released for rust fungi, such as those for Pst isolates 104E 137 A-and 11-281 Li et al., 2019b), the Australian Pt isolate Pt104 , and two isolates of the oat crown rust pathogen Puccinia coronata f. sp. avenae . Recently, the LRSbased Pgt21-0 assembly was combined with Hi-C scaffolding data to yield the first chromosome-scale assembly for Pgt (Li et al., 2019a). Characterized by significantly improved contiguity and haplotype phase information, these LRS-based assemblies have provided references with better resolution for comparative studies on Avr genes, structure variations (SVs), and genetic relationships (Li et al., 2020b;Schwessinger et al., 2020;Wu et al., 2020).
In the present study, we used LRS and Hi-C sequencing data to generate a haplotype-resolved assembly at chromosomescale for Pt at the dikaryotic stage using the Australian isolate Pt64 initially detected in 1990 (Park et al., 1999). Comprising 18 chromosome pseudomolecules in each haplotype, this new assembly provided insights into the genetic diversity at SV level between haplotypes of the dikaryotic genome of Pt, revealed the complex mating system of Pt, and enabled an investigation of evolutionary relationships of 21 Australian Pt isolates at the haplotype level based on resequencing data. This study not only provides the complete chromosome-level assembly of Pt with haplotype resolution, but also demonstrates the great potential of haplotype-resolved genomic analyses for better understanding of Pt biology and evolution.

Haplotype-Phased Assembly of Pt at Chromosome-Scale
We generated a complete chromosome-scale assembly of Pt using LRS and Hi-C data for the Australian isolate Pt64. A total of 29.8 Gb LRS data with an average read length of 11.3 kb and ∼200-fold coverage were obtained from four SMRT cells using the PacBio Sequel System. Following the integrated Falcon and Falcon-unzip pipeline (Chin et al., 2016), the data were de novo assembled which produced a contig assembly consisting of 218 primary contigs (N 50 of 1.9 Mb; total length of 144.9 Mb) with associated haplotigs totaling up to 140.3 Mb (Table 1 and Figure 1). The contig assembly was then integrated with Hi-C data (∼160-fold coverage) for further phasing and scaffolding, which yielded a fully-phased assembly at the chromosome-scale for the dikaryotic genome of Pt (referred  (Table 1). Within each haplotype, 18 chromosome pseudomolecules were constructed, which covered 99% of the total sequence length and ranged from 3.5 to 12.3 Mb in size (Figure 1 and Supplementary File 1). The completeness of the Pt64 assembly was assessed using BUSCO analysis for both haplotypes and revealed similar statistics for each, implicating that each haplotype represented a full haploid genome. Using the A haplotype as the representative (hereafter the A haplotype is used as representative when both haplotypes show similar characteristics), 92.2% of the BUSCO genes were present as complete sequences and 85.8% were in single copy status ( Table 1). The fragmented and missing BUSCO genes were 4.3 and 3.5%, respectively. Of the 18 chromosomes in each haplotype of Pt64, 11 telomeres on nine chromosomes of the Pt64 assembly were identified, of which two contained telomere sequences at each end (chromosomes 7 and 11).

Structural Variations Between Pt64 Haplotypes
To assess similarity of the two haplotypes, whole-genome alignments were performed between A and B haplotypes using minimap2 and visualized by dotPlotly (Figure 2) (Li, 2018). The dot plot depicted the collinearity of A and B chromosomes and displayed the average identity of alignment blocks within each pair of homologous chromosomes ranging from 78 to 87% (Figure 2). When the proportion of the total base aligned versus the chromosome length was taken into account, the overall identity between A and B haplotypes was 70-81%.
Using the highly contiguous haplotype assembly, we inspected the SVs between the two haplotypes in Pt64 using Assemblytics as previously described (Nattestad and Schatz, 2016;Schwessinger et al., 2018). Three categories of SVs (50-100 kb) including insertions/deletions, tandem expansions/contractions, and repeat expansions/contractions were identified and 1,343, 363, and 1,193 events were detected for each category, respectively (Figure 2 and Supplementary File 2). While most insertion/deletion and tandem expansion/contraction events were populated in the two bins (50-500 bp and 500-10,000 bp), the repeat expansion/contraction was mainly concentrated in the bin of 500-10,000 bp (Figure 2). When the calculation was based on the length of the affected base pairs, the largest portion of the insertion/deletion and tandem contraction was in the bin of 500-1,000 bp, and the largest portion of repeat expansion/contraction was in the bin of 10-50 kb. Large-scale SVs (50-100 kb) including six insertions/deletions, 13 repeat expansions/contractions, and 11 tandem expansions/contractions were also detected (Figure 2 and Supplementary File 2).

Genome Annotation and Secretome Prediction
Repeat content including interspersed repeats and non-element repeats was identified using de novo predicted repeats and fungal elements from RepBase, which covered 58-59% of the whole genome in each haplotype of Pt64 ( Table 2). Despite unclassified repeats, the most abundant repetitive elements were long terminal repeats (16-18%), followed by DNA elements (∼5%).
The current transcript-based annotation predicted ∼29,000 genes for each haplotype, of which ∼28,000 were proteinencoding genes (Table 3, Figure 3, and Supplementary File 3). The predicted genes were then subjected to functional annotation using curated databases such as CAZymes (carbohydrate active enzymes) and MEROPS (peptidase database), which led to the identification of ∼430 CAZymes in the Pt64 genome with glycoside hydrolase as the most populated subclass and ∼300 proteases with serine peptidase being the most populated family ( Table 3).
For secretome prediction, proteins possessing a signal peptide with no transmembrane segment and a predicted localization of "secreted" or "unknown" were predicted as SPs. In total, 2,175 and 1,899 genes were predicted to encode SPs in the A and B haplotypes of the Pt64 assembly (Table 3, Figure 3, and Supplementary File 4), which comprised about 7.6 and 6.6% of the total proteins of each haplotype, respectively. About 77% of the secretome proteins show no homology to proteins with known functional domains (Supplementary File 4). We examined the expression of these predicted genes in previous Pt transcriptomes Duan et al., 2021) and found that over 20% of these genes encoding the secretome proteins were differentially expressed in infected wheat leaves as compared to urediniospores (Supplementary File 5).

The Identification of the MAT Genes Located on Two Chromosomes
Given the crucial function and lack of understanding of the MAT genes in Pt, we investigated them using the annotated haplotypes of Pt64. The Pt64 assembly at chromosome-level clearly showed that the MAT loci a (STE3/mfa) and b (HD complexes) were independently located on two chromosomes (chromosomes 7 and 14) ( Table 4 and Figure 4). For locus a on chromosome 7, we observed a pair of genes encoding pheromone (mfa2) and pheromone receptor (STE3.2) with a distance of 874 bp ( Table 4 and Figure 4). The variations in the a locus were limited to one amino acid difference between STE3.2 genes in the two haplotypes. For locus b, we detected a pair of divergently transcribed genes encoding transcription factors bW-HD1 and bE-HD2 on chromosome 14. Compared with the previously described alleles b1 and b2 in Pt Race 1 , two novel alleles b3 and b4 were found in Pt64. For haplotype A, bW3-HD1 (P0_021298) and bE3-HD2 (P0_021297) were 664 bp apart, FIGURE 2 | Whole genome alignment and structure variation between haplotypes A and B of the Pt64 assembly. (A) Dot plot of sequence alignment of Pt64 chromosome pseudomolecules of haplotypes A and B. (B) Summary of structure variation between haplotypes A and B. The counts of structure variation within each size bin are shown by bars with the left y-axis, whereas the total bases (kbp) of the structure variation are shown with the right y-axis. Each blue and red bar indicates the total counts and the number of bases that are covered in each bin by the specific variation category, respectively.

Comparison of Gene Content and Prioritization of AvrLr20 Candidates
To inspect our annotation results, ortholog analyses were carried out for the two Pt64 haplotypes and the Pt Race1 genome, which identified 20,825 orthogroups showing corresponding orthologs between these assemblies (Supplementary File 6). This included 9,716 orthogroups that are specific to the A and B haplotypes of isolate Pt64, and 11,109 orthogroups with at least one ortholog from both isolates. Of the total orthogroups, 19,113 were singlecopy orthologs either between the Pt64 haplotypes or across both isolates. Within each genome, genes with at least one ortholog consisted more than 72% of the total protein-encoding genes, reflecting a good consistency in gene annotation across assemblies. For the predicted secretome of haplotypes A and B of Pt64, more than 82% of the SPs had at least one ortholog (Supplementary File 7).
To further examine the assembly, we used 20 candidates of AvrLr20 previously detected based on the Race 1 genome (Wu et al., 2017) in the context of the phased orthologs in Pt64. Based on our phenotype studies and previous genetic analyses (Park et al., 1999), we postulated that Pt64 most likely possesses two heterozygous alleles with one copy avirulent and the other virulent to Lr20. If this assumption holds, the ortholog status could be transformed into an additional criterion that at least one copy of the AvrLr20 candidate ortholog shall be present in the Pt64 assembly as Pt64 is avirulent to Lr20 and the two orthologs of the AvrLr20 candidate in Pt64 shall be heterozygous. Of the 20 candidates showing no known functional domains (Supplementary File 4), 13 had at least one ortholog with Pt64, and nine FIGURE 3 | Genomic landscape of predicted gene and secreted protein in the haplotype A of Pt64 assembly and genetic variations of the 21 Pt isolates. Tracks from outside to inside are: (1) chromosome name; (2) -(5) gene density, secreted protein density, SNP (single-nucleotide polymorphism), and Indel (insertion or deletion) density in non-overlapping 100 kb windows. Each major tick on the contig track is for 1 Mb length. were heterozygous ( Table 5). Most of these heterozygous candidates were not differentially regulated in planta as compared to that in urediniospores based on the previous Pt transcriptomes Duan et al., 2021) except for two, GN64P176_026812/GN64H176_026857 and GN64P176_007373 (Table 5). Detailed functions of these nine candidates will be further prioritized and validated in future studies. Genes are labeled with their locus tag and represented by blue rectangle arrows Vertical gray shading illustrates the blastn identity between sequences on both haplotypes, according to the scale shown in the right bottom corner next to the sequence scale bar.

Whole-Genome Sequencing Analysis Based on Pt64 Assembly
To evaluate the Pt64 assembly, we used it to investigate genetic variants and relationships within a set of Australian Pt isolates including the Illumina sequencing data of Pt64 and 20 Pt isolates that are publicly available (Wu et al., 2017). For haplotype A, the Pt64 reference genome was covered by between 97.4 and 99.2% of the total genome bases (Supplementary File 8). The average aligned read depth was 28.4 and the average mapping rate of these isolates was 90.9%, which was a substantial improvement over the previous mapping rates of 74-81% using the Pt Race 1 reference genome   (Supplementary  File 8). For each isolate, genome-wide polymorphisms were detected using GATK HaplotypeCaller based on reads mapped to the Pt64 genome (Figure 3). The average number of total variants identified was 548,032 and the average number of single nucleotide polymorphism (SNP) and insertion-deletion mutation (InDel) variants were 476,509 and 58,222, respectively (Supplementary File 9). The average rates of heterozygous variants (SNP and InDel) and SNPs were 2.9 variants/kb and 2.7 SNPs/kb, respectively. The mapping and variant statistics derived from haplotype B were similar to those from haplotype A (Supplementary Files 8, 9) and both were also in line with reports from previous whole-genome sequencing studies on Pt Wu et al., 2017Wu et al., , 2020.

Phylogenetic Analysis of 21 Australian Pt Isolates
Based on the SNPs identified, phylogenetic analysis was carried out using the full dikaryotic genome of Pt64 and the individual haplotype genome of Pt64 separately. Using whole-genome SNPs called against the full dikaryotic genome of Pt64, the phylogenetic tree derived from maximum likelihood method showed similar overall topology to those derived from filtered SNPs called against A or B haplotype ( Figure 5). All three phylogenies had three major branches, which linked to Pt64 (S473), the isolates in subclade 4 (SC4), and the remaining 17 isolates. All three isolates linked to SC4 ( Figure 5) were collected around 1990 and identified as International Race 104; for the remaining 17 isolates linked to the third major branch, the two isolates from pair 7 forming a sister group showed larger distance than the others. It was noted that the third major branch inferred from the full dikaryotic genome (Figure 5C) was longer than those inferred from Pt64 haplotypes (Figures 5A,B), indicating that the estimated genetic distance for this subclade was larger based on the full genome of Pt64 than that based on a single haplotype. In addition, some pronounced differences were observed for the phylogenetic tree using filtered SNPs called against haplotype A versus that against haplotype B. For example, in the phylogeny inferred from A genome, pair 2 isolates (670028 and 790197; SC3) formed a sister group (Figure 5A), whereas in the other two phylogenies, the two isolates were separated (Figures 5B,C). Similarly, we also observed the relocation of the pair 1 isolate 760285 from the position close to SC3 to SC5 and the relocation of the pair 14 isolate (890155) from SC2a to SC5 when the B genome was used as the reference (Figures 5B,C).

DISCUSSION
Despite the economic importance of Pt worldwide, genomic resources for this pathogen are limited. Currently, no chromosome-level phased assemblies of Pt are available, which has greatly hampered detailed investigations of genetic variation, evolutionary relationships with haplotype resolution, and Avr gene cloning. Built upon the recent technology breakthroughs of LRS and Hi-C sequencing, here we present a phased chromosome-scale genome assembly of Pt using the Australian Pt pathotype, Pt64. Sharing the same number of chromosomes observed in several closely related rust fungi such as Pgt and Melampsora lini (M. lini) (Boehm and Bushnell, 1992;Li et al., 2019a), each haplotype of the dikaryotic genome of Pt64 has 18 chromosome pseudomolecules that range in size from 3.5 to 12.3 Mb (Table 1 and Figure 1). The Pt64 assembly at chromosome-scale with two highly contiguous haplotypes provides an unprecedented opportunity to investigate haplotype diversity at the SV level, revealing variations previously undetected by short-read based assemblies. Using this chromosome-scale assembly, we explored haplotype diversity at SV level and the complex mating system of Pt. This assembly has also enabled us to analyze the Illumina sequencing data of 21 Australian Pt isolates to explore the genetic relationships based on individual haplotype genomes, which has never been attempted for Pt before.
With the fully-phased Pt64 assembly at chromosome-level providing a valuable platform for SV investigation, 10% of the total genome was found to be represented by SVs between haplotypes A and B of Pt64 (Figure 2 and Supplementary File 2), highlighting a hidden layer of inter-haplotype variation previously undetected by short-read based assemblies. The 10% SV in Pt is slightly higher than the 8.6% SV reported for Pgt (Li et al., 2019a), which is in line with the observation of genome expansion by the integration of repetitive elements in Pt as compared to Pgt suggested by a previous study by Cuomo et al. (2017). In the Pst isolate Pst-104E, SVs comprised 6.4% of the total size of the primary assembly and the actual level of SVs was postulated to be higher . Further to these findings that SVs do represent a considerable portion of the wheat rust genomes, the importance of SVs in the pathogenicity of rust fungi was also highlighted by our recent identification of AvrSr50, which unveiled a ∼200 bp insertion in the AvrSr50 gene leading to the development of virulence (Chen et al., 2017). It is thus anticipated that with Pt64 assembly facilitating more accurate and in-depth detection of SVs along with the development of new approaches integrating SV analysis in rust comparative genomics , our understanding of the impact of SVs on rust pathogenicity will be substantially accelerated.
The annotation of the newly built Pt64 assembly (Tables 2, 3 and Figure 3) enabled the detection and characterization of the MAT genes, showing that the two MAT loci a (P/R genes) and b (HD genes) occur on two separate chromosomes (chromosomes 7 and 14) and implicating a tetrapolar type of mating system in Pt (Table 4 and Figure 4). For many species in the phylum Basidiomycota, the two sets of MAT genes that control different stages of the sexual cycle govern sexual reproduction, which not only promotes genetic variation essential for adaptation and long-term survival, but also plays a central role in pathogenic development (Maia et al., 2015). While early studies reported the tetrapolar type of mating system in the rust species P. coronata, M. lini, and Cronartium quercuum (Kües et al., 2011), the most recent genome sequencing of Pgt revealed that the MAT system of Pgt consists of single independent di-allelic (a) and multiallelic (b) loci (Li et al., 2019a). In the current Pt64 assembly, the STE3.2 allele in haplotype B was identical to that reported in Pt Race 1 , indicating that the a locus of Pt could be di-allelic. A similar configuration was also observed in most of the known tetrapolar yeast species such as U. maydis and Cryptococcus amylolentus (Bölker et al., 1992;Kämper et al., 1995;Findley et al., 2012). Previously, it was proposed that the bipolar states in basidiomycetes have most likely arisen from a tetrapolar configuration. Given that the maintenance of MAT genes may have more profound meaning for the adapted lifestyle of fungi as suggested by a recent study (Wallen and Perlin, 2018), it is plausible to postulate that the emergence of bipolar states could be a strategy to balance between the requirement for maintaining MAT genes and the cost of keeping such a complex system.
We attempted an ortholog analysis of the two haplotypes of Pt64 to prioritize previously identified candidates for AvrLr20 by inspecting the heterozygous status of the candidate orthologs in Pt64 (Table 5). In our previous study, we integrated genomewide association with comparative analysis to obtain a panel of AvrLr20 candidates based on the Pt Race 1 reference (Wu et al., 2017). The haplotype-resolved Pt64 assembly has provided a new opportunity to further prioritize these candidates by inspecting the heterozygous status of each candidate gene in Pt64, which further prioritized nine candidates. While additional criteria such as in planta expression and similarity to haustorial proteins have been used to overcome the difficulty in prioritizing candidate Avr genes by previous studies (Prasad et al., 2019), we have explored a new scenario integrating heterozygous status postulation from phenotype studies (Park and Wellings, 2012) with ortholog and haplotype information of phased Pt assembly to prioritize Avr candidate genes. In aid of the high quality of Pt64 assembly, this approach has demonstrated the potential value of specific phenotypic information from rust survey analyses when applied in conjunction with genomic resources of haplotype and ortholog information for functional genomics of rust fungi.
The phased genome of Pt64 allowed us to perform haplotypebased phylogenetic analyses (Figure 5), which demonstrated some pronounced differences between phylogenies of 21 Pt isolates derived from the three reference genomes, namely haplotype A, haplotype B, and the full genome (A and B), despite the overall similarity. This contrast suggests that the filtered SNPs derived from genome A may place more emphasis on the similarity between pair 2 isolates, whereas the SNPs based on genome B or the entire genome of A and B may place more emphasis on the differences between these two isolates. Despite the similar overall topology, the aforementioned incongruities between phylogenetic trees reveal the potential complexity hidden behind inferred genetic relationships based on whole-genome SNP data without haplotype resolution and the potential limitations of using a single haplotype as the reference to fully reveal genetic relationships. Consistent with our observation, a previous Pgt study also reported that Ug99 and five international isolates were relocated to different subclades when different haplotypes of the reference genomes were used (Li et al., 2019a). Taken together, both studies clearly demonstrate the hidden complexity in deriving genetic relationships using whole genome SNPs called against a reference without haplotype resolution. Similarly, the use of different reference genomes may also impact other outcomes of comparative analyses based on sequence alignment and SNP calling such as the identification of Avr genes. Most sequence data analyses have used the conventional comparative approach, which starts by aligning sequence reads to a linear reference genome. Because this does not take into account the multiple genomes at haplotype level, a new approach effectively using these haplotype genomes is needed. Recently, a new strategy known as reference flow has been proposed, which can simultaneously use the information from multiple reference genomes to improve alignment accuracy and reduce reference bias . In addition to the new alignment strategy, building pan-genomes is also a well-documented strategy (Tettelin et al., 2005;Golicz et al., 2020;Liu et al., 2020). While a single linear reference genome cannot fully represent the genetic diversity of a species, a pan-genome consisting of both core genes/sequences (present in all individuals) and variable genes/sequences (present in some individuals) can (Hurgobin and Edwards, 2017;Bayer et al., 2020). With this full representation, a pan-genome can provide more accurate and comprehensive information critical for comparative genomics such as analyses for phylogeny and identification of Avr genes.
With the advent of LRS technologies, more and more higher quality assemblies with significantly improved contiguity have become available, as exemplified by the Pt64 assembly presented here and recently available assemblies such as Pt104, Pgt21-0, and Pst104E Li et al., 2019a;Wu et al., 2020). For these high-quality assemblies, various new methods have been proposed for pan-genome construction. For example, a graph-based data model is suggested for the construction of a pan-genome graph, which by preserving links and relationships between pan-genome sequences, can compactly encode tens of thousands of SVs previously missing from the traditional lineage reference Li et al., 2020a). Currently, our lab is building high-quality Pt assemblies for representative isolates from different lineages collected in Australia, and the construction of a pan-genome capturing the entire gene/sequence space of Pt would be the ultimate goal.
In summary, we presented a phased chromosome-scale genome assembly using an Australian Pt isolate Pt64, which represents an unparalleled resource for understanding Pt diversity, pathogenicity, and evolution. This new assembly has enabled us to compare SVs between haplotypes, to unveil the complex mating system in Pt, and to investigate the evolutionary relationships with haplotype resolution for a set of 21 Australian Pt isolates. This Pt64 assembly at chromosome-scale with full phase information will undoubtedly accelerate the dissection of the genetic basis and understanding of molecular mechanisms underlying Pt-wheat interactions, which will facilitate future efforts to achieve sustainable control of the rust diseases of plants.

Pt Isolates and Plant Inoculation
We used Pt isolates identified in nationwide race surveys of Australian cereal rust control program for this study. These isolates are curated in the Plant Breeding Institute Rust Collection, The University of Sydney, Australia. To ensure the purity of each isolate for sequencing, a single pustule was selected from a region of low-density infection and propagated on the wheat genotype Morocco prior to DNA preparation. The identity and purity of each isolate were checked by pathogenicity tests with a set of host differentials at each cycle of inoculum increase and also using urediniospores subsampled from those used for DNA extraction. For Pt infection, plants were grown at high density (∼25 seeds per 12 cm pot with compost as growth media) to the one leaf stage (∼7 days) in a greenhouse microclimate set at 18-25 • C temperature and with natural day light. Plants were inoculated as previously described (Wu et al., 2017) and mature spores were collected, dried and stored at −80 • C for DNA isolation. Pathotype 64-(6),(7),(10),11 (North American race LBBQB; Kolmer and Hughes, 2016) was chosen to develop a high quality assembly based on its avirulence for many of the cataloged leaf rust resistance genes in wheat [viz. avirulence/virulence (partial virulence): Lr2a,Lr2b,Lr2c,Lr3a,Lr3bg,Lr3ka,Lr9,Lr11,Lr14a,Lr15,Lr17b,Lr18,Lr19,Lr20,Lr21,Lr23,Lr24,Lr25,Lr26,Lr28,Lr29,Lr30,Lr32,Lr37/Lr1,Lr10,(Lr13), Lr16, (Lr17a), (Lr27+31); Park et al., 1999].

DNA Extraction and Sequencing
For PacBio sequencing, DNA was extracted from urediniospores as previously described (Schwessinger and Rathjen, 2017) and sequencing was performed at the Australian Genome Research Facility Ltd (Adelaide, Australia). For library preparation, the SMRT cell Template Prep Kit 1.0-SPv3 with BluePippin sizeselection with 15-20 kb cutoff (PacBio) was used and DNA libraries were sequenced on a PacBio Sequel System with Sequel Sequencing chemistry 2.1. Four SMRT cells were used for Pt64 and each SMRT cell had a 5-10 Gb capacity. Hi-C library preparation and sequencing was carried out by Phase Genomics (Seattle, WA, United States). The Hi-C library of Pt64 was constructed with the ProxiMeta Hi-C kit from Phase Genomics following the standard protocol using the enzyme Sau3A for digestion. For resequencing, TruSeq library of DNA samples for Pt64 was constructed and sequenced on the Illumina HiSeqX instrument at Novogene with as 150 bp paired-end reads (Hong Kong, China).

Genome Assembly and Scaffolding
The integrated pipeline of FALCON and FALCON-Unzip (v4.1.0) was used for genome assembly (Chin et al., 2016). Read length cutoffs were computed by FALCON based on the seed coverage and expected genome size. After assembly by Falcon, FALCON-Unzip was used to phase haplotypes and to generate consensus sequences for primary contigs and the associated haplotigs. The generated assembly was subjected to error correction using the final consensus-calling algorithm Quiver implemented in SMRT (v4.0.0), an algorithm for calling highly accurate consensus from PacBio reads using a hidden Markov model exploiting both the base calls and QV metrics to infer the true underlying DNA sequence (Chin et al., 2013). Blastn searches against the NCBI nucleotide reference database were used to check potential noneukaryotic contamination and none of the contigs were found to have predominant noneukaryotic sequences as best BLAST hits at any given position.
A Hi-C-based phasing was processed by FALCON-Phase to correct likely phase switching errors in the primary contigs and alternate haplotigs generated from FALCON-Unzip, which created one complete set of contigs for each phase (Kronenberg et al., 2019). This assembly was further processed by the Proximo Hi-C genome scaffolding platform by Phase Genomics (Seattle, WA, United States) to build chromosome-scale scaffolds using the same single-phase scaffolding procedure as previously described (Bickhart et al., 2017). Approximately 40,000 separate Proximo runs were performed to optimize the number of scaffolds and scaffold construction in order to make the scaffolds as concordant with the observed Hi-C data as possible. Juicebox was then used to correct scaffolding errors and FALCON-Phase was run a second time to detect and correct phase switching errors, which produced the final Pt64 assembly, representing a dikaryotic genome with fully phased information for each set of chromosome-scale scaffolds (Durand et al., 2016;Kronenberg et al., 2019). To evaluate the completeness of the final assembly, the software BUSCO (v3.0) (Simao et al., 2015) was used for comparison with the fungal lineage set of orthologs (basidiomycota_odb9), which consisted of 1,335 conserved orthologs of basidiomycete fungi. To identify telomeric sequences in each haplotype of Pt64, 5 kb sequences from the termini of each chromosome were first extracted for repeat identification by tandem repeats finder (TRF) (Benson, 1999), which were then screened for telomeric sequences as described previously (Li et al., 2019a).

RNA Sequencing, Genome Annotation, and Secretome Prediction
Infected leaves were collected at 3, 5, and 7 days after inoculation with Pt64 and immediately frozen in liquid nitrogen. RNA samples were prepared as previously described . For library preparation, around 10 µg of total RNA was processed with the mRNA-Seq Sample Preparation kit (Illumina), which was then sequenced on the Illumina HiSeq2500 platform (125 bp paired-end reads).
After trimming, RNA-seq reads were first aligned to the Pt64 genome by using the CLC module large gap read mapping (default parameters). Both de novo transcriptome assembly and genome-guided transcriptome assembly using Trinity (v2.1.1) were built (Haas et al., 2013). The transcript models and previously reported EST sequences (Xu et al., 2011) were combined as transcript evidence; Pt Race 1 protein sequences were used as protein evidence. This evidence was fed into the Funannotate pipeline (v0.7.2) for a comprehensive annotation of the Pt64 assembly. Funannotate has integrated multiple tools tailored for fungal genome annotation, including repeat identification, alignment of protein and transcript evidence, ab initio gene prediction, tRNAs prediction, evidence weighting and combining, and final clean of gene models (Slater and Birney, 2005;Wu and Watanabe, 2005;Haas et al., 2008;Hoff et al., 2016;Lowe and Chan, 2016;Wu et al., 2020). After genome annotation, orthologs between Pt64 and Pt Race 1 genomes were identified by Proteinortho v5.16 (synteny mode) (Lechner et al., 2011). Functional annotation to the protein-coding genes was carried out using curated databases including UniProt (Apweiler et al., 2004), Pfam domains (Finn et al., 2014), CAZymes (Yin et al., 2012), and MEROPS (Rawlings et al., 2016). The mating type genes were identified by BLAST searches with the pheromone peptide encoding genes (mfa2 and mfa3) and pheromone receptors (STE3.2 and STE3.3) from the a locus, and HD1 and HD2 genes from the b locus previously reported in Pt Race 1 .

Comparative Genomic and Phylogenetic Analyses
The 20 Pt isolates cover a range of pathotypes and comprise 10 pairs with the members in each pair contrasting in virulence profile to Lr20 (Park et al., 1999;Wu et al., 2017). Full virulence/avirulence attributes of these isolates were provided by Wu et al. (2017) and are reproduced here in Supplementary File 8. After trimming, the sequence data as paired-end reads were mapped to the two haplotypes of Pt64 individually. Paired-end Illumina reads of the Pt isolates were independently mapped to the reference genome using BWA mem v0.7.17 (Li and Durbin, 2009). High quality alignments (with the mapping quality cutoff of 30) were selected using the SAMTools v1.6 view command and the generated BAM files were used as input to call SNPs and InDels using GATK v4.1.4.1 HaplotypeCaller (McKenna et al., 2010). Based on the genome-wide SNPs identified, the evolutionary relationships of the isolates were inferred using SNPhylo with the performance of 1,000 bootstrap replicates and visualized by the Bioconductor package ggtree v2.0.4 (Lee et al., 2014;Yu et al., 2018). SNPs called against the A or B haplotype were filtered from the total SNP sets using the Bioconductor package GenomicRanges v1.40 (Lawrence et al., 2013). The identified SNPs and InDels were visualized by the R package circlize v0.4.8 (Krzywinski et al., 2009). To obtain a summary of sequence alignments between haplotypes A and B, whole genome alignments were performed with minimap2 (Li, 2018) and visualized with dotPlotly 1 . For the identification of SVs between the two haplotypes of Pt64, mummer3 (Kurtz et al., 2004) was used to align the two haplotypes and the output delta file was fed into Assemblytics (Nattestad and Schatz, 2016) for SV detection.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/search/all/?term=PRJNA728188, PRJNA728188.

AUTHOR CONTRIBUTIONS
JW analyzed the data and wrote the manuscript. CD, MH, and YD performed the experiments. LS and YD contributed to data analysis and prepared the figures. RP identified all pathotypes used. CD, MH, LS, YD, and RP contributed to the manuscript. RP and JW designed the experiments. RP supervised the work. All authors read and approved the final manuscript.

FUNDING
This work was part funded by the Australian Grains Research and Development Corporation (GRDC; 9175448) and USDA CSREES (2008-35600-04693). This research was undertaken as part of a long running program on national cereal rust surveillance. The program has been hosted by the University of Sydney since 1921, and since 1990 co-funded by the Australian Grains Research and Development Corporation, a statutory corporation founded in 1990 under the Primary Industries Research and Development Act 1989 and principally funded by a grower levy and Australian Government contributions. LS is the receipt of Christian Rowe Thornett Stipend Scholarship. Judith and David Coffey and family are also gratefully acknowledged for financial support. None of these sources had any further role in study design, data collection and analysis, decision to publish, or preparation of the article.

ACKNOWLEDGMENTS
We would like to acknowledge the Sydney Informatics Hub at the University of Sydney for providing access to the CLC Genomics Workbench (Core Research Facilities User Access Scheme) and the High Performance Computing cluster, Artemis. Particularly, we would like to thank Bernard Kirby, Leigh Burchat, Stephen Kolmann, Rosemarie Sadsad, Cali Willet, and Tracy Chew for technical support.