Comparative Genomics of Three Colletotrichum scovillei Strains and Genetic Analysis Revealed Genes Involved in Fungal Growth and Virulence on Chili Pepper

Colletotrichum scovillei causes anthracnose of chili pepper in many countries. Three strains of this pathogen, Coll-524, Coll-153, and Coll-365, show varied virulence on chili pepper. Among the three strains, Coll-365 showed significant defects in growth and virulence. To decipher the genetic variations among these strains and identify genes contributing to growth and virulence, comparative genomic analysis and gene transformation to show gene function were applied in this study. Compared to Coll-524, Coll-153, and Coll-365 had numerous gene losses including 32 candidate effector genes that are mainly exist in acutatum species complex. A cluster of 14 genes in a 34-kb genomic fragment was lost in Coll-365. Through gene transformation, three genes in the 34-kb fragment were identified to have functions in growth and/or virulence of C. scovillei. CsPLAA encoding a phospholipase A2-activating protein enhanced the growth of Coll-365. A combination of CsPLAA with one transcription factor CsBZTF and one C6 zinc finger domain-containing protein CsCZCP was found to enhance the pathogenicity of Coll-365. Introduction of CsGIP, which encodes a hypothetical protein, into Coll-365 caused a reduction in the germination rate of Coll-365. In conclusion, the highest virulent strain Coll-524 had more genes and encoded more pathogenicity related proteins and transposable elements than the other two strains, which may contribute to the high virulence of Coll-524. In addition, the absence of the 34-kb fragment plays a critical role in the defects of growth and virulence of strain Coll-365.


INTRODUCTION
Chili pepper (Capsicum spp.) is a globally significant spice crop. The cultivation of chili pepper is frequently threatened by the attacks of various pathogens, especially the anthracnose pathogens of the Colletotrichum species. Colletotrichum contains a large number of species members that have been classified into several species complexes (Cannon et al., 2012;Talhinhas and Baroncelli, 2021). Colletotrichum pathogens associated with Capsicum plants come from diverse species complexes including C. acutatum, C. boninense, C. gloeosporioides, C. orchidearum, C. magnum, C. spaethianum, and C. truncatum complexes (Damm et al., 2012a(Damm et al., ,b, 2019Weir et al., 2012;De Silva et al., 2021;Talhinhas and Baroncelli, 2021). At least 28 species have been reported and most of them are from the acutatum, gloeosporioides, and truncatum complexes (Mongkolporn and Taylor, 2018;de Silva et al., 2019).
Colletotrichum scovillei is the most dominant species causing anthracnose of chili in Asia, and is found widely distributed in Indonesia, Malaysia, Thailand and Taiwan (De Silva et al., 2017de Silva et al., 2019). It can infect many different species of Capsicum, especially species that are mainly grown for human consumption such as C. annuum, C. frutescens, C. chinense, and C. baccatum (Liao et al., 2012a;de Silva et al., 2019;De Silva et al., 2021). Two pathotypes of C. scovillei, CA1 and CA2, have been identified in Taiwan by the AVRDC-World Vegetable Center (Kumar et al., 2018;Sheu et al., 2019). CA2 is more virulent than CA1 in most tested cultivars of Capsicum species (Liao et al., 2012a;Kumar et al., 2018;Sheu et al., 2019). CA2 pathotype breaks down the resistance of Capsicum chinense PBC932-dervied lines, which are resistant to CA1 pathotype (Liao et al., 2012a;Sheu et al., 2019). CA2 distribution was limited before the year 2000, but since then has become dominant and replaced CA1 in most of the chili pepper planting locations in Taiwan (Kumar et al., 2018;Sheu et al., 2019). According to amplified fragment length polymorphism (AFLP) analysis, CA2 members are homogenous and mostly clonal, whereas CA1 are genetically diverse (Sheu et al., 2019). Strains Coll-153 and Coll-365 are grouped in CA1 pathotype and strain Coll-524 is a member of the CA2 pathotype. Understanding the variation among the three strains may provide insight about the high virulence of the CA2 pathotype.
Whole genome sequencing has had a significant impact on understanding the biology, ecology, genetics, and evolution of various organisms. In fungi, comparative genomic study of genetic variations has revealed mobile pathogenicity chromosomes in Fusarium (Ma et al., 2010), evolutionary adaptations from a plant pathogenic to an animal pathogenic lifestyle in Sporothrix (Teixeira et al., 2014), an influx of transposable elements creating a genetically flexible landscape to respond to environmental changes in Pyrenophora tritici-repentis (Manning et al., 2013), potential orthologs in adaptation to specific hosts or ecological niches in Botrytis, Colletotrichum, Fusarium, Parastagonospora, and Verticillium (Klosterman et al., 2011;Gan et al., 2016;Buiate et al., 2017;Syme et al., 2018;van Dam et al., 2018;Valero-Jimenez et al., 2019), the evolution and diversity of putative pathogenicity genes in Colletotrichum tanaceti and other Colletotrichum species (Lelwala et al., 2019;Liu et al., 2020), core and specific genetic requirements for fungal endoparasitism of nematodes in Drechmeria coniospora (Lebrigand et al., 2016), host-specific and symptom-related virulence factors horizontally transferred from Fusarium to Verticillium Zhang et al., 2019). Genomic comparisons with multiple races within a species have also been performed recently in Fusarium and Verticillium and revealed that secondary metabolites are crucial factors in F. fujikuroi for stunting symptom development and in V. dahliae for defoliation symptom development (Niehaus et al., 2017;Zhang et al., 2019). In addition, a study on 18 isolates of V. dahliae identified highly variable regions with race specific signatures (Ingram et al., 2020). However, most of these studies were in silico analyses. Only few included genetic studies to demonstrate the functions of the potential factors selected by in silico analysis, and these studies were on Fusarium and Verticillium (Klosterman et al., 2011;Niehaus et al., 2017;Chen et al., 2018;Zhang et al., 2019).
In our previous study, we investigated the plant-pathogen interactions between the chili pepper fruit and the three C. scovillei strains  and revealed the infection process and several potential virulence factors determining the infection and colonization of host tissues (Liao et al., 2012a). On the original chili pepper host, Coll-524 shows highest virulence, while Coll-365 displays the lowest virulence. Coll-365 shows significantly slower growth on agar medium and has the weakest virulence on Capsicum spp. Coll-365 produces less spores in axenic culture and in planta. Coll-365 was found to accumulate less turgor pressure in appressorium but produces higher levels of cutinase activity than the other two strains. Examination of the infection process showed that the three strains can form primary hyphae in the epidermal cells at 72 h post-inoculation (hpi), indicating no delay in the penetration step for Coll-365 compared to Coll-524 and Coll-153. The three strains can grow into the cuticle layer of chili pepper fruit at 24 hpi and form a highly branched structure (HBPS) within the cuticle layer and penetrate epidermal cells at 72 hpi (Liao et al., 2012b). After infecting the epidermal cells of chili pepper fruits, Coll-365 appeared to have less ability to colonize host tissues, formed limited lesions on infected tissue and produced much fewer spores on the infected tissue. In addition, Coll-524 was more resistant to host defense compound capsaicin than Coll-153 and Coll-365. In this previous work, the genetic variations and genes contributing to the defects of growth and virulence remained undeciphered.
In this study, to decipher the genetic variations of the three C. scovillei strains, we sequenced the genomes of the three strains, investigated the differences in their genome compositions, identified potential genes involved in the phenotypes, and verified genes involved in the defects of growth and virulence of Coll-365. We setup a simple mathematical method to identify open reading frame (ORF) variations and successfully identified DNA fragments involved in the defects in growth and virulence of Coll-365. Moreover, by gene transformation we have demonstrated four genes that display function in germination, growth and/or virulence in the chili pepper pathogen C. scovillei strain Coll-365.

Fungal Strains and Culture Conditions
Three Colletotrichum strains  were obtained from the AVRDC-The World Vegetable Center (Tainan, Taiwan) as mentioned in a previous study (Liao et al., 2012a). For sporulation, fungal strains were cultured on MS agar medium (0.1% yeast extract, 0.1% peptone, 1% sucrose, 0.25% MgSO 4 •7H 2 O, 0.27% KH 2 PO 4 , and 1.5% agar) for 6 days at 25 • C with a 12 h light/dark cycle. Spores were collected and used for general cultivation, assays of germination and appressorium formation, morphological examination, growth, and pathogenicity assay.

Genomic DNA Extraction and Library Preparation for Sequencing
Spores were inoculated into a 500 mL flask containing 100 mL MS liquid medium (1 × 10 5 spores/ml) at 25 • C with shaking at 200 rpm for 1 day. The young hyphae were collected by filtrating with a layer of miracloth. The DNA was extracted from the young hyphae using the CTAB extraction method (Chao et al., 2018). DNA purity and concentration were determined by Nanodrop 2000 (Thermo Fisher Scientific Inc., Waltham, MA, United States) and Qubit Fluorometric Quantification (Thermo Fisher Scientific Inc., Waltham, MA, United States) measurements. Libraries were generated from 2 to 5 µg of genomic DNA using the TruSeq 2 library preparation kit (Illumina, Inc., San Diego, CA, United States). Two types of libraries were prepared, shortfragment library (0.3 kb) for paired-end sequencing and longfragment library (3 and 8 kb) for mate-pair sequencing. DNA fragmentation was made by shearing using an M220 Focused-Ultrasonicator (Covaris, Inc., Woburn, MA, United States) and the fragments with 296-300 bp, 3 kb or 8 kb were selected by gelcutting. Without the 120 bp adaptor, the remaining 176-180 bp DNA fragments may have 20% overlap with 100 bp read length for paired-end sequencing. Long-fragment libraries, 3 and 8 kb, were prepared for Coll-524 only.

Genome Sequencing and Assembly
The Illumina HiSeq 2500 platform was used to sequence the genome of Coll-524 with 100 bp read length for paired-end and 3 kb mate pair libraries, and 125 bp read length for 8 kb mate pair library. The genomes of Coll-153 and Coll-365 were sequenced by Illumina Genome Analyzer with 150 bp read length. The raw reads were trimmed by Trimmomatic 0.39 to remove linkers and adaptors. Those refined reads of Coll-524 were further assembled by the ALLPATHS-LG de novo assembly program (Butler et al., 2008). The genome sequences of Coll-524 have been deposited in NCBI with accession numbers JAESDM010000001-JAESDM010000054.
Genome assembly of Coll-153 and Coll-365 was achieved by using the "Map Reads to contigs" function in CLC Genomic Workbench v.9.5.1. All trimmed reads were used to map to the Coll-524 genome under default settings. The consensus mapped sequences were extracted and joined to form a scaffold by the "Extract Consensus Sequence" function. Unmapped reads were collected and assembled by using the "De Novo Assembly" function in CLC software. The assembled contigs with sizes larger than 5 kb were collected and counted as the genome sequences of Coll-153 or Coll-365. The genome sequences of Coll-153 and Coll-365 have been deposited in NCBI with accession numbers JAESDN010000001-JAESDN010000059 for Coll-153 and JAESDO010000001-JAESDO010000059 for Coll-365. The genome completeness of the three strains were assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO v.3.0) and the sordariomyceta_odb10 dataset (Simao et al., 2015;Waterhouse et al., 2018).

Multi-Locus Phylogenetic Analysis
The DNA sequences of five marker genes (ACT1, CHS1, GAPDH, ITS, and TUB2) were used for phylogenetic analyses (Cannon et al., 2012). The sequences of the five genes from the three strains, 170 other Colletotrichum strains and an outgroup species, Monilochaetes infuscans, were obtained and used in the assay. Among them, the five marker genes of 108 Colletotrichum species and M. infuscans were obtained from Q-bank database 1 according to the GenBank accession numbers listed in a publication of Cannon et al. (2012), while sequences from 62 Colletotrichum strains were obtained by downloading whole genome sequences from NCBI FTP and then identifying the five gene sequences by using CLC Genomic Workbench v.9.5.1. The information for all the genes sequences is provided in Supplementary Table 1. All collected sequences were aligned by MAFFT v.7 online version (Katoh et al., 2019) with default settings and then trimmed by trimAl 1.4.1 (Capella-Gutierrez et al., 2009) with automated1 setting. Five marker genes were concatenated in each strain. Bayesian inference of all the concatenated sequences was analyzed by MrBayes 3.2.7 (Ronquist et al., 2012). The Markov Chain Monte Carlo (MCMC) chains were set for 5,000,000 generations. Sample frequency was set for every 100 generations. Phylogenetic tree of the 174 strains were generated by FigTree v.1.4.3 2 .

Genome Annotation
Gene annotation was conducted using MAKER (v.2.31.10) (Holt and Yandell, 2011) pipeline with default settings (Augustus, PJ7, RNAseq reads). The genome of Coll-524 was used for gene annotation. The assembled contigs from Coll-524 RNAseq data was provided for EST evidence. The RNAseq data of Coll-524 included 10 sets of RNAseq data, which were infected purified cuticle layer, 18 h hyphal of axenic culture, and 8 sets of infected chili pepper fruits at different infection time points (unpublished). The CDS and protein sequence of Colletotrichum fioriniae PJ7 (PRJNA244481) was provided to MAKER for the "EST evidence" and "Protein Homology Evidence" functions respectively. The gene annotation of Coll-153 and Coll-365 was conducted using the same method as that for Coll-524. The CDS region of each gene was extracted from the genome sequence according to the GFF file which was originated from MAKER by GffRead (Pertea and Pertea, 2020). Gene annotation results of Coll-524, Coll-153, and Coll-365 have been deposited in NCBI.

Gene Functional Analysis, Clustering and Orthology Analysis
The genes of Coll-524, Coll-153 and Coll-365 were searched with nr database by using BLASTx (BLAST 2.2.29+) to obtain the information about gene function. To find effector candidates, protein sequences of the three strains were analyzed by EffectorP 2.0 (Sperschneider et al., 2018). Carbohydrate active enzymes (CAZymes) candidates were searched by dbCAN2 meta server . Candidate gene selection followed the recommendation setting provided by dbCAN2 meta server. Genes related to secondary metabolism were predicted using the Secondary Metabolite Unique Regions Finder (SMURF) database (Khaldi et al., 2010). The Pathogen-Host Interactions database (PHI-base) v.4.10 was downloaded from PHI-base website. All genes of the three strains were used to blast against PHI-base using CLC Genomic Workbench (Urban et al., 2020). Genes with E-values lower than 10 were selected according to the default setting of the PHIB-BLAST web server. The protein sequences of all annotated genes of Coll-153, Coll-365, and Coll-524 were used for ortholog analysis with the default settings of OrthoFinder (Emms and Kelly, 2019). After the Orthofinder analysis, singlecopy orthogroups were generated directly. Other analyses of orthologs, including multi-copy orthogroups with the same gene numbers, multi-copy orthogroups with different gene numbers, orthogroups in every two strains, strain-specific orthologs, and unassigned genes, were further classified by EXCEL. These unassigned genes were considered as "strain specific genes" together with genes in strain specific orthogroup. Genes of Coll-524 with extra copy number in comparison with the other two strains in the orthogroup, and genes analyzed to be Coll-524 strain specific genes were further considered as "Coll-524 extra genes."

Identification of Microsatellites, Repeats and Transposon Elements
MISA 2.0 was used for microsatellite analysis (Beier et al., 2017) and transposon elements were analyzed by Transposon PSI 2.2.26 3 . The transposon element sequences were acquired according to the GFF files provided by Transposon PSI. These sequences were further extracted to create a Transposon element database (TED). RepeatModeler was used for de novo genomewide repeat family analysis (Flynn et al., 2020). The sequences of TcMar-Fot1 were further blasted with BLASTn to the NCBI database and the functional domains were analyzed using InterPro (Blum et al., 2021) to determine if DDE was present (Braga et al., 2014).

Identification of Genome Sequence Variations and Open Reading Frame Variations
During the mapping of the Coll-153 or Coll-365 genome to the Coll-524 genome, data containing the variations between Coll-524 and the other two strains were generated with the "Extract Consensus Sequence" function. The data included four datasets, Misc. difference (miscellaneous differences), Insertion, Deletion and Removed, which are specified by CLC with default setting. Briefly, "Deletion" is for positions where a gap is called while the reference has a non-gap; "Insertion" is for positions where a non-gap is called while the reference has a gap; "Removal" is for positions where no reads are mapped to the reference; "Misc. difference" is for every position where the consensus is different from the reference. The Misc. difference dataset consisted of single nucleotide polymorphism (SNP) and polymorphism with more than one nucleotide, which was named as multiple nucleotide polymorphism (MNP) in this study. The four datasets were combined to generate the genome sequence variations and used for genome comparisons to identify ORF variations between Coll-524 and the other two strains as described in Figure 1. To simplify the analysis of ORF variations, the Removed datasets of Coll-153 and Coll-365 were combined. Briefly, all the sequences which existed in Coll-524 but removed from Coll-153 or Coll-365 were extracted from Coll-524 genome to create a "removed sequence database." Open reading frame variation analysis was based on the calculations of physical position overlapping between each ORF and the four datasets of variations (misc. difference, Insertion, Deletion, and Removed). The GFF file generated by MAKER could provide the physical positions of mRNA, exon, 5 UTR, CDS, 3 UTR for each gene. A gene usually contains multiple CDSs in eukaryotes and all CDSs together within the gene is the complete coding sequence to translate a protein encoded by the gene. To analyze ORF variation of Coll-153 and Coll-365, the ORF position of each Coll-524 gene was generated by setting the first nucleotide of the first CDS as the beginning position and the last nucleotide of the last CDS as the end position.
To analyze the variation of each ORF between Coll-524 and the other two strains, firstly, all scaffolds of Coll-524 were added together one by one to form a single sequence. The physical positions of each ORF were modified to fit into the single sequence. The four datasets of variations (misc. difference, Insertion, Deletion, and Removed) generated by CLC were tagged with physical positions in each scaffold of Coll-524. Therefore, the second step was to edit the physical positions of each variation according to the single genome sequence of Coll-524. The position differences of each variation to the ORF were compared and calculated as indicated in Figures 1C-G. The physical position overlapping situation of each ORF and each variation was identified by the calculation formula listed in Figure 1G. An ORF overlapped with a variation indicated that this ORF varied between Coll-524 and the other strain carrying this variation. ORFs overlapped with variation were further called "ORF-V."

Screening of Genes Potentially Involved in the Variations of the Three Strains for Functional Verification in Coll-365
To select genes for genetic analysis to verify their functions on the defects of growth and virulence of Coll-365, four selection criteria were set for gene selection. First were ORFs predicted to be ORF-V. Second were "Coll-524 extra genes" according to the result of OrthoFinder analysis. Third were genes analyzed to belonging to candidate effector, CAZyme, TF, SMURF, enzyme or PHI (E-value < 10e-50). Finally, were ORFs with <50% coverage to Coll-153 and/or 365 according to the blastn results. Genes with the four criteria were analyzed by Venn Diagram 4 . ORF regions of Coll-524 were compared to variation regions to find the overlapping. An ORF with the overlapped region indicated that this ORF is varied between the two strains. (C-F) Calculation methods used to identified all potential position relations between an ORF and variation regions. (G) Calculation for position difference to identify ORF variation types, removed or partial differences. VR, variation region; OS, ORF start site; OE, ORF end site; VS, variation start site; VE, variation end site.
Pathogen-host interaction related genes were selected with an E-value less than 10e-50. Genes with an E-value less than 10e-50 but greater than 10e-100 were considered to have almost identical sequences to the database according to the manual for CLC Genomics Workbench 9.5.

Plasmid Construction
Plasmids BsHR and BsBR were constructed and used in this research to clone the selected target gene(s) for transformation into Coll-365. pBsHR was constructed by cloning hygromycin resistance cassette (HygR) of pBHt2 (Mullins et al., 2001) into pBluescript SK(+) (Short et al., 1988) with PCR amplification, restriction enzyme digestion (EcoRV and SwaI, designed in the primers), and DNA ligation. Plasmid BsBR was modified from pBsHR by replacing (HygR) with bleomycin resistance gene (BleoR) of pAN8-1 (Mullaney et al., 1985). The BleoR cassette was PCR amplified with specific primers containing EcoRV and SwaI restriction sites and then cloned into EcoRV-and SwaI-digested pBsHR. To clone the target genes from Coll-524, a target gene containing an approximate 1-kb promoter and 0.5-kb terminator was amplified with a high-fidelity DNA polymerase KODplus-Neo (TOYOBO, Osaka, Japan) and ligated to EcoRV or SwaI digested and Shrimp Alkaline Phosphatase (rSAP) treated pBsHR. If double gene-transformation was needed, a second target gene was cloned in to pBsBR for protoplast transformation and phleomycin was used as the selection antibiotic (Pilgeram and Henson, 1990). For gene-transformation VII, the two genes were closely linked and were amplified together with one primer set by PCR and transferred to Coll-365 by single transformation. Transformations IX and X were completed with two steps of transformations by transferring additional gene(s) into the transgenic strain VII-1. Total 10 transformations (I-X) were conducted as listed in

Protoplast Transformation
Polyethylene glycol (PEG)/Ca +2 -mediated protoplast transformation was used to transfer the target gene to Coll-365. Spores collected from a 6-day MS agar medium were inoculated into a 500-ml flask containing 400 ml MS liquid medium and a stirrer bar, and cultured for 16 h under 25 • C at low speed to prevent the attachment of spores to the flask surface. The young hyphae were collected by filtering through a layer of miracloth and washed with 100 ml sterilized water and 100 ml wash buffer (1 M NaCl and 10 mM CaCl 2 ) under suction. The washed hyphae (250 mg) were then resuspended in a 50-ml flask containing 10 ml osmotic buffer (10 mM Na 2 HPO 4 , pH 5.8, 20 mM CaCl 2 , and 1.2 M NaCl) with 90 mg lysing enzymes (Sigma, L1412), and incubated in an orbital shaker with 150 rpm at 30 • C for 6 h. The undigested hyphae were removed by filtering through miracloth and the protoplasts were collected by centrifugation at 1500 g, 4 • C for 10 min. The protoplasts were resuspended by adding 100 µl mixture of four parts of STC buffer (1.2 M sorbitol, 10 mM Tris-HCl, pH 7.5, and 10 mM CaCl 2 ) and one part of PEG [50% (w/v) polyethylene glycol 3350, 10 mM Tris-HCl, pH 7.5, and 10 mM CaCl 2 ]. For transformation, 20 µl of 20 µg plasmid DNA was added to a tube containing 100 µl protoplast suspension with gentle mixing and the mixture was placed in ice for 20 min. PEG was added into the DNA-protoplast solution four times with different volumes. The adding steps were performed by following a previous description (Liu and Friesen, 2012) with slight modifications. Briefly, 20, 80, 300, and 600 µl PEG were added into protoplast suspension step by step. After each addition, the mixture was gently mixed and left to stand for 3 min, and then left to stand a further 20 min after the last addition at room temperature. After the 20 min incubation, 3 ml regenerate buffer [4 mM Ca(NO 3 ) 2 ·4H 2 0, 1.5 mM KH 2 PO 4 , 1 mM MgSO 4 ·7H 2 O, 2.5 mM NaCl, 60 mM glucose, and 1 M sucrose] was added into the PEG-DNA-protoplast mixture and cultured at 25 • C for 16 h with 100 rpm shaking. The protoplasts were collected with centrifugation at 25 • C, 1800 g for 10 min. A total of 300 µl regenerated protoplasts were obtained and then evenly spread on three regeneration agar medium plates. After incubation for 6-8 days at 25 • C, the transformants were isolated, single spore purification and PCR verification for the target gene insertion.

PCR Analysis for Gene Loss in Coll-153 and Coll-365 and Verification of Gene Transformation in Coll-365 Transformants
Eight genes were selected for functional assay in Coll-365 as listed in Table 1. PCR assays were used to confirm the loss of the eight genes in Coll-365 using specific primer sets as listed in Supplementary Table 3. Genomic DNA was extracted as described above and tubulin gene (Accession number: MW073123) was used as the control for PCR assay. To verify gene transformation of transgenic Coll-365 strains, regular PCR and RT-PCR assays were performed. Regular PCR using genomic DNA as template to amplify the transgenic gene was used for all transgenic strains. For transgenic strains carrying more than one transgene, RT-PCR assays were conducted in addition to the regular PCR. Primers used in theses assays are listed in Supplementary Table 3. For RNA extraction, spores were inoculated into a 50 mL flask containing 20 mL MS liquid medium (1 × 10 5 spores/ml) at 25 • C with shaking at 150 rpm for 2 days. The hyphae were collected by filtrating with a layer of miracloth. The RNA was extracted with Trizol reagent (Invitrogen). RNA purity and concentration were determined and cDNA was synthesized using 5 µg RNA and the MMLV reverse transcription kit (Invitrogen). PCR was performed using 20 µl reaction volume contained 1 × PCR Buffer, 0.2 mM dNTP, 0.4 µM primers, 1 U Blend Taq plus (TOYOBO), and 1 µl of template DNA. PCR reaction was performed as follows: 94 • C for 2 min and 25 cycles of 94 • C for 30 s, 60 • C for 30 s, and 72 • C for 3 min. The PCR products were analyzed by agarose gel electrophoresis.
Fungal Growth, Spore Germination, and Appressorium Formation Assays Fungal growth was assayed by inoculating spore suspension on the center of three different agar media. Spore germination and appressorium formation were detected in 96-well plates. Briefly, spores were collected from MS agar medium and the concentration was adjusted to 2 × 10 5 spores/ml by sterilized water. For growth assay, a 5-µl, spore suspension containing 500 spores was dropped on the center of MS agar medium plate and incubated for 6 days at 25 • C with a 12 h light/dark light cycle. Colony sizes were measured by using ImageJ 1.53a (Schneider et al., 2012). To detect germination and appressorium formation, 80 µl of spore suspension was added into each well of a 96-well plate and incubated at 25 • C with 12 h light/dark light cycle for 8 h. To count the numbers of germinated spores  and appressorium, the 96-well plate was placed upside down and examined using a light microscope. All the experiments mentioned above were conducted at least two times with three replicates for each strain at each time. The statistical significance of the data was determined by One-way ANOVA at P < 0.05 using Statistical Package for the Social Sciences software, version 20 (IBM SPSS software, IBM Corp., Armonk, NY, United States).

Pathogenicity Assay
Pepper fruits were used for pathogenicity assay, including Capsicum annuum cv. Hero, Fushimi amanaga and Groupzest. The fresh harvested fruits were washed with water and surface sterilized with 0.5% bleach and left to dry overnight. Fungal spores were collected from MS agar medium and the concentration was adjusted to 2 × 10 5 spores/ml. Spore suspension was dropped (5 µl/drop) on the fruit surface with dual inoculation, in which the wild type strain was inoculated on one side of the fruit and a transgenic strain was inoculated on the other site of the same fruit.  (Supplementary Figure 1). Further examination of the original hosts of members in acutatum species complex revealed that the three strains formed a small clade with other strains and most of the members were isolated from Capsicum plants (Supplementary Figure 2). However, the phylogenetic tree generated from the 173 Colletotrichum strains could not distinguish the phylogenetic relationship of the three strains with  than C. guajavae IMI 350839 (Figure 2A). In addition, Coll-524 was closer to C. scovillei TJNH1 isolated from pepper in China and two C. fioriniae strains (HC89 and HC91) isolated from apple in the United States than Coll-365 and Coll-153 (Figure 2A). These results indicated that strains Coll-524, 153 and 365 all belong to the C. scovillei species. In addition, the three strains showed significant differences in growth and virulence but had a similar preference for temperature range (Figures 2B-D; Liao et al., 2012a).

Genomic Comparison Revealed DNA Deletion Patterns Among Different Scaffolds in the Three Strains
The genomic sequences of Coll-524 were generated from pairedend and mate-pair libraries and assembled and annotated using the ALLPATHS-LG (Butler et al., 2008) and MAKER (Holt and Yandell, 2011) pipelines. Genomic sequences of Coll-153 and Coll-365 were generated from paired-end libraries and mapped to the assembled Coll-524 genome (for details, see section "Materials and Methods"). The results of assembly and annotation are summarized in Coll-153 and Coll-365 had their own special genome fragments compared to Coll-524, and there were six scaffolds for Coll-153 and seven scaffolds for Coll-365. One (0.9-kb) and two (0.9-kb and 5.1-kb) of the 54 scaffolds in Coll-524 did not exist in Coll-153 and Coll-365, respectively, and no genes were encoded in the two scaffolds. A total of 29 and 30 genes were annotated from Coll-153 and Coll-365 specific genome sequences, respectively, and 23 of them are homologs in both Coll-153 and Coll-365. None of these genes had traits related to pathogenicity when blasting to the PHI database.
Genome mapping of Coll-153 and Coll-365 to Coll-524 was analyzed with CLC Genomic Workbench v.9.5.1 and provided the specified information of DNA insertion, deletion, removal, and Misc. difference in Coll-153 and Coll-365 (Supplementary Table 4). The results showed that Coll-153 and Coll-365 had similar patterns within the four types of variations, but Coll-365 had a slightly higher number of total variations than Coll-153. The removed sequence database showed that 991,585 and 1,179,624 bp were absent in Coll-153 and Coll-365, respectively, compared to Coll-524. The largest removed fragments were 28,776 bp in Coll-365 and 10,125 bp in Coll-153. Scaffolds 19, 20 and 22 in Coll-153 and Coll-365 showed high numbers of DNA sequence removal compared to the three scaffolds in Coll-524 (Figure 3). The total number of sites of DNA sequence removal in scaffolds 19, 20, and 22 were 377, 386, and 256 in Coll-153, and 167, 114, and 76 in Coll-365, respectively. A notable difference in DNA sequence removal patterns between Coll-153 and Coll-365 was that DNA sequence removal occurred significantly in scaffolds 17 in Coll-365 but not in Coll-153 (Figure 3). Large DNA fragment removals occurred in Coll-365 more frequently than Coll-153 (Supplementary Table 5). However, most of them were located at the short scaffolds 19, 20, and 22. Coll-365 had four removals with DNA fragments larger than 20 kb, but none of the large fragments was found in Coll-153. Regarding the removal of DNA fragments larger than 5 kb, a total of 66 removals appeared in Coll-365 but only 8 in Coll-153. Small DNA fragment removals (<1 kb) were found most often in Coll-153 (Supplementary Table 5).

Open Reading Frame Variation Analysis Identified Open Reading Frame Losses in Strains Coll-153 and Coll-365
DNA insertion, deletion, removal, and misc. differences (SNP and MNP) occurring in Coll-153 or Coll-365 may affect the open reading frames (ORFs) of their genes. To understand the influence on ORFs, we used a simple mathematical method to detect ORF variation (ORF-V) for all ORFs in Coll-153 and Coll-365 as illustrated in Figure 1 and described in section "Materials and Methods." The results showed that most of the ORF-Vs occurred with SNP in Coll-153 and 365, and Coll-365 appeared to have significantly greater numbers of full ORF losses than Coll-153 ( Table 3). The total numbers of ORF-V of Coll-153 and Coll-365 compared to Coll-524 were 708 and 725, respectively, and 799 of them were singular. A total of 91 and 74 ORF-Vs were solely found in Coll-365 and Coll-153, respectively, but most of these ORF-Vs were SNP in Coll-153 or Coll-365, indicating that the two strains had similar genes with ORF-V to Coll-524. The major difference in the two strains with regard to ORF-V was found in scaffold 17 and was caused by ORF deletion as described below. The 799 singular ORFs were used for further analysis.
Open reading frames affected by DNA removal (ORF-Vrem) in each scaffold were further analyzed and the results were displayed in Figure 4. ORF-Vrem mostly occurred in scaffolds 19, 20 and 22 in Coll-153 and 365. There were a total of 260 and 274 ORF-Vrem events in Coll-153 and Coll-365, respectively. All ORFs were ORF-Vrem in scaffolds 19, 20, 22, and 44; however, scaffold 44 was a short scaffold with only one ORF (Figure 4). The distribution of ORF-Vrem in Coll-153 and Coll-365 was very similar except that 14 ORFs clustered in a 34-kb genomic fragment were totally or partially removed in Coll-365 compared to 1 ORF with SNP only in Coll-153 in scaffold 17 (Supplementary Figure 3). The 14 ORFs encode 4 transcription regulation-related proteins, 1 GPIanchored protein, 4 enzymes and enzyme-related proteins, and 5 hypothetical proteins. In addition, when compared with 62 genomes of other Colletotrichum strains that are available in the NCBI database, the 14 genes appeared in almost all members of the acutatum species complex, except Coll-365 ( Figure 5). Moreover, CsWDCP and CsPLAA existed in nearly all assayed strains, indicating that the two genes are core genes of the Colletotrichum species. Five of these genes existed in almost all the strains in the acutatum species complex, suggesting the possibility of them being acutatum species complex-specific genes (Figure 5).

Gene Ortholog Analysis Showed That the Three Strains Carry Slightly Different Ortholog Compositions
To understand the difference in orthology among the three strains, proteins were analyzed using OrthoFinder (Emms and Kelly, 2019). Ortholog analysis showed a high similarity in gene compositions among the three strains as shown in Supplementary Table 6. More than 88% genes in the three strains belonged to single-copy orthogroups, indicating that the three strains all carried the 13,779 single-copy genes (Supplementary Table 6). A total of 502 multiple-copy orthogroups with the same gene numbers containing 1,167 genes were all identified in the three strains. A combination of single-copy orthogroups and multi-copy orthogroups with the same gene copy number showed that there were a total of 14,946 genes in the three strains and they were 95.7, 96.9, and 97.1% of the total genes in Coll-524, Coll-153, and Coll-365, respectively. This suggests that the three strains had high similarity in gene compositions (Supplementary Table 6). The ortholog differences within the three strains analyzed using the dataset of multi-copy orthogroups with different gene number showed Coll-524 carried considerably greater numbers of genes than the other two strains. The three strains all carried the 128 orthogroups but they had different gene numbers in each of the orthogroups. Among the 128 orthogroups, Coll-524 had 97 and 101 more genes than Coll-153 and Coll-365, respectively, and there were a total of 103 singular genes found from the 97 and 101 genes. Regarding the differences in orthogroups that only existed in two strains, 92, 33, and 94 orthogroups were identified in Coll-524 and Coll-153, Coll-524, and Coll-365, and Coll-153 and Coll-365, respectively (Supplementary Table 6). Further analysis of the differences between pairs of two strains showed that some orthogroups were only found in two strains, and Coll-524 and Coll-153 shared more orthogroups (92 orthogroups, 94 genes) than Coll-524 and Coll-365 (33 orthogroups, 33 genes). Based on the data mentioned above, the combinations of the 103 genes, the 94 and 33 genes, the 85 of strain-specific genes and the 196 unassigned genes of Coll-524 strain, 511 genes in total, were used together with other criteria for further gene selections used in gene functional analysis.

Comparison Analysis Revealed Variations in Pathogenicity-Related Categories in the Three Strains
To understand the variations in pathogenicity-related genes among the three strains, analyses of pathogenicity-related functional categories were performed. Six categories were used, candidate effectors, carbohydrate active enzymes (CAZymes), secondary metabolism clusters, transcription factor (TF), and enzymes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, and pathogen-host interaction (PHI) related genes. The results showed the three strains had very similar numbers of genes in various categories (Supplementary Table 7). When combining these data and the ortholog data in a further analysis, notable differences were observed (   in Supplementary Table 9. Coll-153 and Coll-365 had remarkably fewer genes than Coll-524 in PHI (114 genes) and candidate effector (33 genes) categories (Supplementary Table 9). The ORFs of the 33 candidate effector genes were further analyzed using nucleotide BLAST (blastn) against 62 genomes of Colletotrichum strains with genomes available in the NCBI database. As shown in Figure 6, two strains of C. acutatum species complex (Coll-524 and C. scovillei TJNH1) carried all the 33 candidate effector genes, while one strain of this complex, C. acutatum 1, had 32 of the candidate effectors. However, two strains with phylogenetics closely related to Coll-524, C. fioriniae HC89 and HC91, only carried two of the 33 candidate effector genes. Among the 62 Colletotrichum strains, 27 did not carry any of the 33 candidate effectors and they were mainly in the species complexes graminicola, spaethianum, and gloeosporioides. Interestingly, none of the members of the graminicola species complex carried any of the 33 candidate effectors. Moreover, a total of 24 of the 33 candidate effectors were only found in the acutatum complex, and not in the other complexes.
The variations of PHI genes among the three strains in scaffolds 17, 19, 20, and 22 were found to mainly occur in scaffolds 19, 20, and 22, especially genes required for fungal full virulence in the PHI database. A total of 79 genes with functions related to reducing virulence were found to be partially or fully removed in Coll-365 and 71 of them were not in Coll-153 (Supplementary Table 9).
The variations of the three strains in KEGG pathways on scaffolds 17, 19, 20, and 22 were found to mainly occur on scaffold 20, in which 100% of the KEGG genes were lost in Coll-153 and Coll-365. The investigation of KEGG combined with ortholog analysis showed that gene number variations within the three strains in scaffolds 17, 19, 20, and 22 were found in five metabolic pathways. These pathways were metabolisms of drugs, purine, thiamine, phenylpropanoid, and N-glycan biosynthesis. Coll-524 carried more genes than Coll-153 and/or Coll-365 in the five pathways (Supplementary Figures 4-8).
FIGURE 5 | The distribution of the 14 genes of scaffold 17 in strains Coll-524, 153, and 365, and the 62 Colletotrichum strains with whole genome sequences deposited in NCBI. The 14 genes were analyzed using nucleotide BLAST against the 62 genomes of Colletotrichum strains. Blue squares indicate the presence of genes by the blasting with coverage > 90% and identity > 60%. The gene was ordered from left to right by distribution frequency in the 62 genomes.
FIGURE 6 | The distribution of the 33 candidate effector genes in strains Coll-524, 153, and 365, and the 62 Colletotrichum strains with whole genome sequences deposited in NCBI. The 33 candidate effector genes were analyzed using nucleotide BLAST against the 62 genomes of Colletotrichum strains. Blue squares indicate the presence of genes by the blasting with coverage > 90% and identity > 60%. The gene was ordered from left to right by distribution frequency in the 62 genomes.

Fot1 (TcMar-Fot1)-Like Transposon Significantly Decreased in Strain Coll-365
Repeat sequences were analyzed with MIcroSAtellite (MISA) (Beier et al., 2017), TransposonPSI and RepeatModeler (Flynn et al., 2020). Repeat sequence analysis with MISA showed the microsatellite compositions were very similar in the three strains (Supplementary Table 10). TransposonPSI analysis revealed that 11 transposon families were found in this species but the three strains carried different amounts of the transposon families. Coll-524 contained all 11 families, but Coll-153 and Coll-365 only carried 9 families. In addition, Coll-365 had significantly less DDE_1 domain-containing transposon than Coll-153 and Coll-524 (Figure 7). Repeat sequence analysis with RepeatModeler showed that Coll-524, Coll-153, and Coll-365 had 53, 67, and 44 repeat families, respectively. The DDE_1 domain identified by TransposonPSI is from the TcMar-Fot1 family in the RepeatModeler database. The complete Coll-Fot1 sequence was identified by combining sequences provided by the TransposonPSI and RepeatModeler (Supplementary Figure 9).

Eight Genes Were Selected for Functional Analysis
Orthogroup and ORF-V analysis revealed two sets of variations in gene composition of the three strains. To obtain comprehensive analysis of variation, additional analysis was conducted by blastn using Coll-524 genes against the genome sequences of Coll-153 and 365. A total of 219 genes from Coll-153 and/or 365 with less than 50% coverage to Coll-524 were identified and used with genes selected from the other three groups, orthogroup analysis (511 genes), ORF-V analysis (799 genes) and pathogenicity-related categories analysis, for further analysis (Supplementary Figure 10).
The results are shown in Supplementary Figure 10A. A total of 59 genes were found from the four groups. Among the 59 genes, 47 genes were lost in both Coll-153 and Coll-365, and 12 genes were only absent in Coll-365, including 8 candidate effectors, one CAZyme, one TF, one KEGG pathway, and one PHI (Supplementary Figure 10B). Most of the 47 genes were located at scaffolds 19 and 20, while most of the 12 genes were located at the scaffolds 17 and 20 (Supplementary   Figure 10A). These 137 genes contained five cytochrome P450 genes, five FAD binding domain-containing protein genes and large numbers of unknown and hypothetical protein genes. Eight genes, five from the 59 gene group, and three from the 137 gene group, which were highly expressed during infection, based on RNAseq data, were then selected for functional analyses. Five genes from the 59 gene group are CsEF1, CsBGN, CsLAP, CsBZTF, and CsPLAA (Supplementary Figure 10B and Table 1), and the three genes from the 137 gene group are CsWDCP, CsCZCP and a hypothetical protein gene (CsGIP). CsGIP was shown to inhibit the germination of Coll-365 in this study (see below). Thus, it is named as germination inhibited protein (GsGIP). Genes encoding cytochrome P450 or FAD binding domain-containing protein were not selected because they belong to large gene families in Coll-524. The eight genes were analyzed by PCR to confirm their absence in the genome of Coll-153 and/or Coll-365 (Supplementary Figure 11A).

The Genes Lost in a 34-kb Fragment Have a Large Effect on Strain Coll-365 Morphology and Pathogenicity
The eight selected genes were driven by their native promoters and transferred into Coll-365. Five genes were located within a 34-kb fragment at scaffold 17 and two sets of two genes were closely linked (CsCZCP and CsBZTF; CsWDCP and CsPLAA). Therefore, for Coll-365 transgenic strains carrying two closely linked genes, three or four of the four genes were generated ( Table 1). All transgenic strains were selected with PCR assays for the insertion fragment and/or further confirmed with RT-PCR for their expression in Coll-365 (Supplementary Figures 11B,C). The single spore-purified transgenic strains were used for assays on spore germination, appressorium formation, growth, and pathogenicity assay on chili pepper fruits. Two independent transgenic strains of every gene transformation were used in all assays. Coll-524, Coll-153, and Coll-365 showed similar ability with regard to spore germination and appressorium formation. Two independent transgenic strains of transformation IV (CsGIP ; Table 1) showed a significantly lower germination rate than the wild-type Coll-365 in two independent experiments (Figure 8). There were no significant differences in appressorium formation between the gene transformation strains and wildtype strain Coll-365, indicating that the penetration ability of the transformants might not be affected. In the growth assay, Coll-365 had significantly slower growth than Coll-153 and Coll-524. Among the transgenic strains, two independent transgenic strains of transformation VIII (CsPLAA), IX (CsBZTF, CsCZCP, and CsPLAA), and X (CsBZTF, CsCZCP, CsPLAA, and CsWDCP) showed significantly enhanced growth on MS agar medium in comparison with their wild-type strain Coll-365, but still showed slightly slower growth than Coll-524 (Figure 9 and Supplementary Figure 12). In addition, transgenic strains VIII, IX, and X displayed puffy colony with abundant aerial hyphae and unclear orange rings that usually indicate sporulation  ( Figure 9B). However, no significant difference on sporulation was observed between these transgenic strains and their wildtype strain Coll-365. For pathogenicity, Coll-365 was compared with transgenic strains carrying gene CsEF1, CsLAP, CsBGN or CsGIP, and Coll-153 or Coll-524 on chili pepper Capsicum annuum cv. Hero. Lesion size calculation showed that Coll-365 had no notable difference from the tested transgenic strains, and significantly lower virulence than Coll-153 (P-value = 0.023) and Coll-524 (P-value = 0.004). Inoculation assays conducted on Capsicum annuum cv. Fushimi-amanaga revealed that strains generated from transformation VII -X produced larger lesion sizes than wild-type Coll-365 (Figure 10 and Table 5). The mean lesion sizes were increased between 49 and 400%. The lesion size increase level was similar for each transgenic strain in two independent experiments ( Table 5). Transgenic strains of transformation VI and X were further inoculated on Capsicum annuum cv. Groupzest, and transformation X caused significant lesion size increase (Supplementary Table 12).

DISCUSSION
Colletotrichum species can cause great economic loss to various crops. Among the Colletotrichum, more than 30 species have been documented to cause chili anthracnose disease, constituting a major limitation to chili pepper production in tropical and subtropical regions (Saxena et al., 2016). In Taiwan and other Asia countries, C. scovillei attacks chili fruits (De Silva et al., 2021), but its interactions with hosts at the molecular genetic level remain to be examined. In this study, we focused on genomic comparisons of the three C. scovillei strains and combined these with genetic approaches to identify genes involved in fungal growth and virulence. We have provided the genome sequences of the three strains with gene functional annotation for C. scovillei. We have setup a simple mathematical method to search for ORF variations between Coll-153 or Coll-365 and Coll-524 and successfully identified DNA fragments containing genes involved in the defects in growth and virulence of Coll-365. Moreover, by genetic assay we have demonstrated four genes that have functions in germination, growth and/or virulence of C. scovillei. Our data suggested that the three strains all belong to the acutatum complex and are members of C. scovillei because they were grouped in the same clade of C. scovillei CBS 126529, the holotype strain of C. scovillei (Damm et al., 2012a). The data were consistent with a previous study that showed that Coll-524 and Coll-153 are in the C. scovillei clade of acutatum species complex (De Silva et al., 2021).
For closely related species, de novo assembly by referenceguided or mapping through a well-sequenced species is efficient, and can often improve the completeness of the genome sequence (Lischer and Shimizu, 2017). Our data revealed the FIGURE 10 | Anthracnose lesions caused by Coll-365 and its transgenic strain X-1 on Capsicum annuum cv. Fushimi-amanaga 5 days after paired inoculation. The photos were used to show the inoculation, symptom development and lesion size calculation in this study. The upper photograph shows the original lesions and the lower photograph shows the marked lesion sizes for Spot image calculation. The inoculation was performed by dropping 5 µl spore suspensions of Coll-365 on the left side (marked with blue color) and a transgenic strain on the right side (marked with yellow color) of each fruit.
major variation of Coll-153 and Coll-365 was the result of sequence removal. Genetic variations can be caused by three different events, local nucleotide sequence changes, intragenomic rearrangement of DNA segments and the acquisition of a foreign DNA segment by horizontal gene transfer (Arber, 2010). The transposable element is one of the major factors leading to variations in fungi. In C. higginsianum, two closely related strains carry large-scale rearrangements and strainspecific regions that are frequently associated with transposable elements (Tsushima et al., 2019). In the two C. higginsianum strains, the gene-sparse regions are transposable element-dense regions that have more candidate effector genes, while genedense regions are transposable element-sparse regions harboring conserved genes (Tsushima et al., 2019). C. higginsianum and other eukaryotic plant pathogens, such as Phytophthora infestans and Leptosphaeria maculans have been referred to as "twospeed genomes" as these genomes have a compartmentalized genome structure to protect housekeeping genes from the deleterious effects of transposable elements and to provide rapid evolution of effector genes (Grandaubert et al., 2014;Faino et al., 2016). In the genomes of Coll-524, Coll-153, and Coll-365, we did not find a relationship between the density of transposable elements and candidate effector genes or housekeeping genes. Coll-524 and C. higginsianum IMI 349063 have similar genome sizes but C. higginsianum IMI 349063 carries nearly three times the number of transposable element genes to Coll-524 (Supplementary Table 13). Moreover, compared with C. higginsianum IMI 349063, Coll-524 was found to be unlikely to have sequence structures like mini chromosomes which harbor high density transposable elements with over 40% sequences encoding transposable elements (Plaumann and Koch, 2020).
We designed a simple mathematical method to identify ORFvariations. Applying this method, a 34-kb fragment containing 14 genes that exist in the Coll-524 genome but are almost completely lost in Coll-365 were identified. The loss of the 14 genes in Coll-365 is likely caused by DNA deletion that might have resulted from DNA rearrangements. DNA rearrangement frequently occurs in fungi via the parasexual cycle that leads to recombination and chromosome gain or loss (Forche et al., 2008). It is possible that Coll-524 gained part or all of these genes by DNA rearrangement during the parasexual cycle in Coll-153 or Coll-365. Coll-365 and Coll-153 were collected 4 years earlier than Coll-524 from the fields by the ACRDC-the World Vegetable Center (Liao et al., 2012a). Coll-153 and Coll-365 are grouped in the CA1 pathotype and Coll-524 is a member of the CA2 pathotype. CA2 has higher virulence than CA1 and has replaced CA1 to become dominant in Taiwan (Kumar et al., 2018;Sheu et al., 2019). Thus, Coll-524 might have arisen evolutionarily from the CA1 pathotype members such as Coll-153 and Coll-365 by gaining genes horizontally or via gene arrangement. They might be evolutionarily developed from different branches of C. scovillei.
Gene family expansions and contractions are related to the changing of host range and virulence of plant pathogens (Baroncelli et al., 2016;Gan et al., 2016). A set of 33 candidate effectors was completely lost in Coll-365, but existed in two closely related chili pepper pathogens C. acutatum strain 1 and C. scovillei strain TJNH1. This suggests that the 33 candidate effectors might play a role in the virulence of Coll-524 to chili pepper. However, when one of the candidate effectors was transferred to Coll-365, it did not promote the virulence of the transgenic strains on chili pepper fruit, suggesting that multiple effectors may work together to affect fungal virulence (Arroyo-Velez et al., 2020).
CAZyme genes are potentially involved in fungal growth and colonization of host tissues. Coll-524, Coll-153 and Coll-365 have 661-663 CAZyme genes. The gene numbers of CAZyme and proteases vary in different Colletotrichum species complexes. Particularly, CAZymes and proteases were found to be associated with Colletotrichum spp. that have broad host range (Baroncelli et al., 2016). Research on Magnaporthaceae specific clusters showed CAZyme gene families may contribute to speciation (Okagaki et al., 2016). Transgenic strains carrying endo-beta-1,3-glucanase or laccase did not result in a gain of function of growth and virulence of chili pepper. Endo-beta-1,3-glucanase is expected to play a key role in cell wall softening in ascomycetes. The endo β-1,3-glucanase in Schizosaccharomyces pombe, Candia albicans, and Saccharomyces cerevisiae has been showed to be required for dissolution of the primary septum to allow cell separation (Baladron et al., 2002;Martin-Cuadrado et al., 2003;Esteban et al., 2005). Fungal laccase can detoxify phenolic pollutants and has a role in detoxifying host defense phenolic compounds, such as capsaicin in chili pepper (Strong and Claus, 2011). Coll-524 has five copies of laccase-related genes and two endo-beta-1,3-glucanase and one endo-beta-1,3(4)-glucanase. A single laccase or endo-beta-1,3-glucase gene may not be able to contribute the function of growth or virulence of Coll-365. Eight genes were selected for gene transformation in Coll-365. Three genes CsEF1, CsLAP and CsBGN did not show a role in germination, growth and virulence of Coll-365 after transforming into Coll-365. The candidate effector gene CsEF1, which encodes a hypothetical protein, was expressed strongly and specifically at the early infection stage, while laccase gene CsLAP was expressed highly at all infection stages but not in axenic cultures. Endobeta-1,3-glucanase gene CsBGN was highly expressed in axenic cultures and at the late infection stage. CsGIP was expressed at extremely high levels in axenic culture and at all infection stages. The transformation of CsGIP to Coll-365 showed no significant influence on fungal growth and virulence but inhibited spore germination ability. CsGIP has a non-specific hit (2.06e-05) in the Conserved Domains Database (CCD) of NCBI to COG1196. COG1196 is classified as a model that may span more than one domain and is the only member of the superfamily cl34174 in the CDD. The best hit of gene CsGIP in CDD is condensin subunit SMC4 (NP_013187.1) of Saccharomyces cerevisiae S288C.
Condensin is a subunit of SMC (Structural Maintenance of Chromosomes) protein complexes that exist in all eukaryotes and play important roles in chromosome organization and dynamics (Losada and Hirano, 2005;D'Ambrosio et al., 2008). Condensin was shown as a major determinant that changes the chromatin landscape as cells prepare their genomes for cell division in fission yeast (Kakui et al., 2017). It is possible that the transgenic Coll-365 carrying CsGIP might in some way lead to the changes of chromosome organization and dynamics and then reduce spore germination.
Four genes (CsBZTF, CsCZCP, CsPLAA, and CsWDCP) located within the 34-kb fragment at scaffold 17 were transferred into Coll-365 as individual genes or gene sets since they are closely linked. CsBZTF had a high expression level in axenic cultures and all infection stages. The other three genes were expressed in axenic cultures and the late infection stage, but genes CsPLAA and CsCZCP also had relatively high expression in the cuticle infection stage. These four genes did not influence the germination and appressorium formation of Coll-365. However, transgenic strains carrying one CsPLAA, three genes CsBZTF, CsCZCP, and CsPLAA, or four genes CsBZTF, CsCZCP, CsPLAA, and CsWDCP have similar colony sizes but are larger than the wild-type Coll-365, indicating that CsPLAA contributes to the growth enhancement of Coll-365. CsBZTF is a bZIP transcription factor and might be highly related to pathogenicity (Yu et al., 2017). The opposite direction CsCZCP, 1,313 bp away from CsBZTF encodes a C6 zinc finger domain-containing protein.
The C6 zinc finger proteins are strictly fungal proteins and involved in fungal-host interactions (MacPherson et al., 2006). CsBZTF or CsCZCP alone did not affect the virulence of Coll-524. However, transgenic strains carrying the two genes can enhance the virulence of Coll-365, indicating that the coexistence of the two genes together might contribute to the virulence enhancement of Coll-365.
CsPLAA encodes a protein related to phospholipase A2activation, while gene CsWDCP encodes a WD domaincontaining protein. Transgenic strains carrying one CsPLAA, three genes CsBZTF, CsCZCP, and CsPLAA, or four genes CsBZTF, CsCZCP, CsPLAA, and CsWDCP enhance the virulence of Coll-365, indicating CsPLAA might be the major contributor to the virulence enhancement of Coll-365. Moreover, transgenic strains carrying genes CsBZTF, CsCZCP, and CsPLAA, or genes CsBZTF, CsCZCP, CsPLAA, and CsWDCP have a smaller P value than transgenic strains carrying CsPLAA only ( Table 5), which is consistent with the virulence contribution by CsBZTF and CsCZCP as mentioned above. CsPLAA encodes a protein related to phospholipase A2-activation. Phospholipase A2activating protein has been relatively intensively studied in humans and is involved in apoptosis and tumor regression; however, there is only a single study related to its functional characterization in fungi (Zhang et al., 2009;Liu et al., 2011). The Magnaporthe royzae Moplaa gene encodes a phospholipase A2-activating protein and the deletion of this gene results in the reduction of fungal growth and pathogenicity (Liu et al., 2011).

CONCLUSION
In this study, four genes were identified which function in germination, growth and/or virulence of the chili pepper pathogen C. scovillei. CsGIP reduces the germination of Coll-365. Genes CsCZCP and CsBZTF together enhance the virulence of Coll-365. CsPLAA enhances the growth and virulence of Coll-365. In addition, the 34-kb fragment in scaffold 17 contributes a lot to the defects in growth and virulence in Coll-365 because three genes CsPLAA, CsBZTF, and CsCZCP are located on this fragment. Interestingly, we also identified 33 candidate effectors lacking in Coll-153 and Coll-365 which may be involved in the full virulence of Coll-524 on chili pepper. Future study will focus on the 33 candidate effectors and other candidate effectors that may have led to Coll-524 becoming a dominant and virulent pathogen.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: NCBI BioProject -PRJNA692809.