Comparative genomics profiling revealed multi-stress responsive roles of the CC-NBS-LRR genes in three mango cultivars

The nucleotide-binding site-leucine-rich repeat (NBS-LRR) gene family is the largest group of disease resistance (R) genes in plants. It is active in response to viruses, bacteria, and fungi and is usually involved in effector-triggered immunity (ETI). Pangenome-wide studies allow researchers to analyze the genetic diversity of multiple species or their members simultaneously, providing a comprehensive understanding of the evolutionary relationships and diversity present among them. The draft pan-genome of three Mangifera indica cultivars (Alphonso, Hong Xiang Ya, and Tommy Atkins) was constructed and presence/absence variants (PAVs) were filtered through the ppsPCP pipeline. As a result, 2823 genes and 5907 PAVs from H. Xiang Ya, and 1266 genes and 2092 PAVs from T. atkins were added to the reference genome. Using this draft pan-genome, 47, 27, and 36 CC-NBS-LRR (CNL) genes were identified in Alphonso, H. Xiang Ya, and T. atkins, respectively. The phylogenetic analysis divided MiCNL proteins into four distinct subgroups. All MiCNL genes are unevenly distributed on chromosomes. Both tandem and segmental duplication events played a significant role in the expansion of the CNL gene family. These genes contain cis-elements related to light, stress, hormone, and development. The analysis of protein-protein interactions (PPI) revealed that MiCNL proteins interacted with other defense-responsive proteins. Gene Ontology (GO) analysis indicated that MiCNL genes play a role in defense mechanisms within the organism. The expression level of the identified genes in fruit peel was observed under disease and cold stress, which showed that Mi_A_CNL13 and 14 were up-regulated while Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated in disease stress. On the other hand, Mi_A_CNL2, 14, 41, and 45 were up-regulated and Mi_A_CNL47 was down-regulated in cold stress. Subsequently, the Random Forest (RF) classifier was used to assess the multi-stress response of MiCNLs. It was found that Mi_A_CNL14 responds to multiple stress conditions. The CNLs have similar protein structures, which suggests that they are involved in the same function. The above findings provide a foundation for a deeper understanding of the functional characteristics of the mango CNL gene family.


Introduction
Plants have evolved various mechanisms to protect themselves from both biotic and abiotic stresses (Haak et al., 2017). When they are attacked by pathogens, such as bacteria, viruses, fungi, nematodes, and insects, plants activate their pathogen response mechanisms to prevent further harm (Baker et al., 2010). One key component of this defense system is the plant disease resistance (R) genes. These genes play a role in defense against pathogens and are triggered by pathogen signaling (Belkhadir et al., 2004). They can target specific pathogens and typically encode nucleotide-binding site-leucine-rich repeat (NBS-LRR) proteins. The NBS domain of this protein contains three key nucleotide-binding motifs: the P-loop, kinase-2, and kinase-3a (Tameling et al., 2002). The LRR domain, which typically contains 20-30 amino acid residues, is made up of two segments: a highly conserved segment (HCS) and a variable segment (VS) (Matsushima and Miyashita, 2012). The NBS-LRR gene family is the largest class of R genes and plays multiple roles in host-pathogen recognition and downstream signal transduction (Wan et al., 2012).
NBS-LRR proteins are a class of plant resistance (R) proteins that play a crucial role in protecting plants against pathogens. These proteins are divided into two types based on their conserved functional domains: TIR-domain-containing (TNL) and non-TIR-domain-containing. The non-TIR-domain-containing type, also known as CC-NBS-LRR (CNL), is characterized by the presence of a coiled-coil domain at the N-terminal instead of a TIR domain (Sukarta et al., 2016). Additionally, other domains such as zinc fingers or RPW8 domains may also be present in the N-terminal of CNL genes. CNL genes are found in both monocotyledons and dicotyledons and are widely present in plants (Tarr et al., 2009).
Furthermore, a large proportion of R genes (approx. 80%) encode the NBS-LRR domain, and more than 50 NBS genes have been shown to play a role in disease resistance (Song et al., 2015). Examples of NBS-LRR proteins include the Pi-ta gene in rice, which directly interacts with the Magnaporthe grisea effector AVR-Pita, and the RRS1 protein in Arabidopsis thaliana, which directly interacts with the bacterial wilt pathogen protein PopP2 (Jia et al., 2000; Deslandes et al., 2003). Additionally, RPS2 and RPM1 resistance genes in Arabidopsis respond to Pseudomonas syringae through indirect interaction with AvrRpm1 and AvrB (DeYoung and Innes, 2006; Gururani et al., 2012). Furthermore, the ectopic overexpression of the Arabidopsis RPW8 gene has been shown to enhance resistance to powdery mildew in grapevine (Hu et al., 2018).
Mangifera indica (mango) belongs to the Anacardiaceae family, which comprises 73 genera and almost 850 species. This fruit grows in tropical and subtropical regions of the world. Mangoes are renowned for being a natural source of dietary fiber, vitamins, proteins, carbohydrates, and essential minerals. They also have a unique flavor and are very nutritious. Therefore, the mango is called the "King of Tropical Fruits". Green, yellow, dark red, and orange are the skin colors of ripe mango fruits (Quintana et al., 2021). The mango genome was sequenced in 2020, opening up greater resources for molecular studies on this fruit (Wang et al., 2020). The pan-genome of a species encompasses a collection of genes that can be divided into three categories: core genes that are found in all members of the species, accessory genes that are present in some members but not all, and unique genes that are specific to certain individuals within the species. This concept refers to the genetic diversity within a species, rather than an individual genome.
Since CNLs are involved in the defense mechanism of plants against various pathogens including viruses, bacteria, and fungi, the identification of mango CNLs is necessary to understand their interaction mechanisms and to develop defense-resilient cultivars. Additionally, mangoes are traded internationally, and the presence of diseases can restrict exports due to phytosanitary regulations. Disease-resistant mango varieties can open up new markets and enhance international trade opportunities.
In this study, only mango cultivars with both genome and annotation files available were chosen. Using a draft pan-genome, the CNL gene family members were identified in three mango cultivars: Alphonso, Hong Xiang Ya, and Tommy Atkins. The structural and functional characteristics, gene structure and motifs, chromosomal distribution, gene duplication, cis-regulatory elements, protein-protein interaction (PPI), and the expression pattern of Mi_A_CNLs under various conditions were analyzed. Furthermore, machine learning techniques were used to identify the multi-stress responsive genes. These results provide useful clues for further analyzing the biological functions of MiCNLs in various other biotic and abiotic stresses.

Construction of mango draft pan-genome
The published genomes of three Mangifera indica cultivars named Alphonso, H. Xiang Ya, and T. atkins were downloaded from the MangoBase database (https://mangobase.org/easy_gdb/index.php) (Gómez-Ollé et al., 2023) and a draft pan-genome was constructed based on presence-absence variations (PAVs) using ppsPCP: a plant presence/absence variants scanner and pangenome construction pipeline (http://cbi.hzau.edu.cn/ppsPCP/) (Ul Qamar et al., 2019). PAVs are a type of structural variation (SV) that is either present or absent in different organisms/genomes. Usually, plants have a PAV length of 100 bp. The query genomes were iteratively mapped against the reference genome using MUMmer and PAVs were harvested. Next, the harvested PAVs were validated with a BLASTn search between the query and reference genomes. Finally, the boundaries of filtered PAVs were corrected and a draft pan-genome was established.
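The PAV harvesting step can be illustrated with a minimal Python sketch. It is not the ppsPCP implementation: it only assumes that the aligner yields 1-based, inclusive intervals of a query sequence that map to the reference, and it reports the unaligned stretches of at least 100 bp as candidate PAVs. The function names, the `min_len` default, and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent aligned intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def candidate_pavs(seq_len: int, aligned: List[Interval],
                   min_len: int = 100) -> List[Interval]:
    """Return query regions with no alignment that are at least min_len bp long."""
    pavs: List[Interval] = []
    cursor = 1  # first position not yet covered by an alignment
    for start, end in merge_intervals(aligned):
        if start - cursor >= min_len:
            pavs.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if seq_len - cursor + 1 >= min_len:
        pavs.append((cursor, seq_len))
    return pavs


# Hypothetical query sequence of 1000 bp with three aligned blocks.
print(candidate_pavs(1000, [(1, 300), (250, 400), (550, 900)]))
```

In the pipeline described above, such candidates would still need the BLASTn validation and boundary correction steps before being added to the pan-genome.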

Identification and physiochemical characterization of mango CNLs
The 51 A. thaliana CNL protein sequences were retrieved from the Ensembl Plants database (https://plants.ensembl.org/index.html) and a tBLASTn search was performed against the draft pan-genome. From the coordinates of each BLAST hit, the protein IDs were obtained using the draft pan-genome GFF file, and protein sequences were retrieved from the proteome of each cultivar. The identified proteins were further searched to confirm the presence of the NB-ARC and LRR domains in the Pfam (http://pfam-legacy.xfam.org/) (Bateman et al., 2004), InterPro (https://www.ebi.ac.uk/interpro/) (Hunter et al., 2009), Conserved Domains Database (CDD; https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) (Marchler-Bauer et al., 2015), and HMMER (https://www.ebi.ac.uk/Tools/hmmer/) (Finn et al., 2011) databases. In addition, the coiled-coil structure was confirmed on the Paircoil2 website (https://cb.csail.mit.edu/cb/paircoil2/paircoil2.html), with the P-value parameter set to 0.025 (McDonnell et al., 2006). Proteins lacking the characteristic conserved domains were excluded from further analysis.
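The step of converting BLAST hit coordinates into annotated IDs can be sketched as a simple interval overlap against the GFF file. The sketch below is an illustration under assumptions, not the authors' script: it reads only `gene` features, takes the `ID` attribute as the identifier, and uses made-up feature names (`geneA`, `geneB`) and coordinates.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int, str]  # (seqid, start, end, gene_id), 1-based inclusive


def parse_gff_genes(gff_text: str) -> List[Gene]:
    """Collect gene features (seqid, start, end, ID) from GFF3 text."""
    genes: List[Gene] = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        genes.append((cols[0], int(cols[3]), int(cols[4]), attrs.get("ID", "")))
    return genes


def genes_overlapping_hit(genes: List[Gene], seqid: str,
                          hit_start: int, hit_end: int) -> List[str]:
    """Return IDs of genes on seqid that overlap the hit interval."""
    # Hits on the reverse strand may be reported with start > end.
    lo, hi = sorted((hit_start, hit_end))
    return [gid for chrom, start, end, gid in genes
            if chrom == seqid and start <= hi and end >= lo]


# Hypothetical two-gene annotation.
GFF_EXAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "src", "gene", "1000", "4000", ".", "+", ".", "ID=geneA"]),
    "\t".join(["chr1", "src", "gene", "6000", "9000", ".", "-", ".", "ID=geneB;Name=x"]),
])
genes = parse_gff_genes(GFF_EXAMPLE)
print(genes_overlapping_hit(genes, "chr1", 8500, 7000))
```

The resulting IDs would then be used to pull the matching protein sequences for domain confirmation.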
To find common motifs within each mango cultivar, the Multiple Expectation Maximization for Motif Elicitation tool (MEME, https://meme-suite.org/meme/) (Bailey et al., 2015) was applied using protein sequences. Except for setting the motif number to 20, the remaining parameters were kept at their defaults. TBtools was used to visualize the identified motifs. The GFF file of each mango cultivar was used to analyze the intron and exon pattern of MiCNL genes and the structures were displayed using TBtools (Chen et al., 2018).

Chromosomal localization, Ka/Ks, and gene duplication analysis
The chromosomal position of each MiCNL gene was acquired from the GFF file of the respective cultivar and mapped using the gene location visualization tool of TBtools software (Chen et al., 2018). MiCNL gene duplication events were determined based on whether the aligned region covered at least 70% of the longer gene and whether the similarity of the two aligned genes was equal to or greater than 70% (Tsai et al., 2012). Tandem and segmental duplications are reported to be the two main mechanisms underlying gene family expansion. Genes located on the same chromosome fragment were considered to be tandem duplicated genes. Genes found to be co-paralogs located on duplicated chromosomal blocks were considered to be segmentally duplicated genes (Flagel and Wendel, 2009). Ka/Ks values can be used to predict the selection pressure on duplicated genes. DnaSP v.6 software (Rozas et al., 2017) was used to calculate the nonsynonymous (Ka) and synonymous (Ks) nucleotide substitution parameters. A Ka/Ks ratio greater than, equal to, or less than one indicated positive, neutral, or purifying selection, respectively (Zia et al., 2022). Moreover, the time of divergence for these gene pairs, in million years, was calculated using the formula $t = \frac{K_s}{2\lambda} \times 10^{-6}$, with $\lambda = 1.5 \times 10^{-8}$ for dicots (Zameer et al., 2022; Sadaqat et al., 2023).
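The duplication criteria, the Ka/Ks interpretation, and the divergence-time formula can be written out as a short Python sketch. The function names and example values are illustrative, and the coverage test reflects one reading of the 70% rule (aligned length relative to the longer gene).

```python
def is_duplicated_pair(len_a: int, len_b: int, aligned_len: int,
                       identity_pct: float, threshold: float = 70.0) -> bool:
    """Apply the >=70% coverage (of the longer gene) and >=70% similarity rule."""
    coverage_pct = 100.0 * aligned_len / max(len_a, len_b)
    return coverage_pct >= threshold and identity_pct >= threshold


def selection_type(ka: float, ks: float) -> str:
    """Classify selection pressure from the Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


def divergence_time_mya(ks: float, lam: float = 1.5e-8) -> float:
    """t = Ks / (2 * lambda) * 1e-6, in million years."""
    return ks / (2 * lam) * 1e-6


# Hypothetical gene pair: 3000 bp and 2800 bp genes, 2400 bp aligned at 85% identity.
print(is_duplicated_pair(3000, 2800, 2400, 85.0))
print(selection_type(0.12, 0.30))
print(round(divergence_time_mya(0.30), 2))
```

With the dicot rate used here, a Ks of 0.30 corresponds to roughly 10 million years.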

Cis-regulatory elements, protein-protein interaction, and gene ontology enrichment analysis
As in earlier studies, the 2,000 bp sequences upstream of the MiCNL genes were retrieved from the genome file using the "samtools faidx" tool in Ubuntu (Li et al., 2009; Hu et al., 2022; Xia et al., 2022; Zhu et al., 2022), and the types, numbers, and functions of the cis-acting elements in these sequences were analyzed using the PlantCARE database (https://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Rombauts et al., 1999). Cis-elements were visualized using TBtools software.
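A minimal sketch of how the upstream window could be computed before calling "samtools faidx" is given below. It assumes 1-based, inclusive GFF coordinates and builds the usual `seqid:start-end` region string; the function name and example coordinates are hypothetical, and the strand handling is an assumption since the source does not describe it.

```python
from typing import Optional


def upstream_region(seqid: str, start: int, end: int, strand: str,
                    length: int = 2000,
                    seq_len: Optional[int] = None) -> Optional[str]:
    """Build a 'seqid:start-end' region for the promoter window of a gene.

    For '+' genes the window lies before the gene start; for '-' genes it
    lies after the gene end on the forward-strand coordinate system.
    """
    if strand == "+":
        up_start, up_end = start - length, start - 1
    elif strand == "-":
        up_start, up_end = end + 1, end + length
    else:
        raise ValueError("strand must be '+' or '-'")
    up_start = max(1, up_start)          # clip at the chromosome start
    if seq_len is not None:
        up_end = min(seq_len, up_end)    # clip at the chromosome end
    if up_start > up_end:
        return None                      # no upstream sequence available
    return f"{seqid}:{up_start}-{up_end}"


# Hypothetical gene at chr1:5001-8000 on each strand.
print(upstream_region("chr1", 5001, 8000, "+"))
print(upstream_region("chr1", 5001, 8000, "-"))
```

For minus-strand genes, the extracted sequence would also need to be reverse-complemented so that the promoter reads in the gene's 5' to 3' orientation.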
Protein sequences of MiCNLs were used as input to the STRING database (https://string-db.org/) (Mering et al., 2003) for PPI analysis. For PPI, the connection level was set to ten and other parameters were kept at their defaults. The PPI network was visualized and edited using Cytoscape software (Shannon et al., 2003). GO enrichment analysis was done using the DAVID database (https://david.ncifcrf.gov/home.jsp) (Dennis et al., 2003), and the components considered were biological processes (BP), cellular components (CC), molecular functions (MF), and KEGG pathways.

Tissue specific analysis and 3D structure prediction of Mi_A_CNLs
The expression levels of all Mi_A_CNL genes under disease and cold stress were evaluated using transcriptome datasets available at the NCBI Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra) (Kodama et al., 2012) under BioProjects PRJNA855362 and PRJNA304093, respectively. The genome and annotation files (GFF) were downloaded from the MangoBase database (https://mangobase.org/easy_gdb/index.php) (Gómez-Ollé et al., 2023). Read quality was checked with the FastQC tool (Brown et al., 2017). Indexes of the M. indica (Alphonso) genome sequences were built using Bowtie2 (Langdon, 2015) and high-quality paired-end reads were mapped to the genome. The HTSeq-count program (Anders et al., 2015) was used for abundance estimation of annotated genes. Finally, count values of individual genes were used to generate the heatmap, which was drawn using TBtools software.

Prediction of multi-stress responsive genes using machine learning
DESeq2 was used to analyze both disease and cold stress samples and identify genes with significant expression changes (Anders and Huber, 2012). The identified genes were screened by statistical significance (p-value < 0.05) and log2 fold change (log2FC ≥ 0.5 for upregulation and log2FC ≤ -0.5 for downregulation). CNL genes common to both datasets were used for testing. To verify the validity of these genes, the random forest (RF) classification algorithm was applied within the R programming environment (Qi, 2012). Model performance assessment usually involves comparing the model's predictions with the known values of the dependent variable within a specific dataset. Count values of the disease datasets were used to train the model and the common genes were used for testing. Performance metrics such as accuracy, area under the receiver operating characteristic curve (AUC), specificity, and sensitivity were used to evaluate the effectiveness of the RF classifier, specifically on the dataset containing the common multi-stress responsive genes.
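The screening thresholds and the selection of genes common to both datasets can be sketched as follows. The study ran DESeq2 and the RF classifier in R; this Python fragment only illustrates the filtering logic, and the gene names and values are invented for the example.

```python
from typing import Dict, Set, Tuple

Result = Tuple[float, float]  # (log2 fold change, p-value)


def classify_deg(log2fc: float, pvalue: float,
                 lfc_cutoff: float = 0.5, p_cutoff: float = 0.05) -> str:
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if pvalue >= p_cutoff:
        return "ns"
    if log2fc >= lfc_cutoff:
        return "up"
    if log2fc <= -lfc_cutoff:
        return "down"
    return "ns"


def significant_genes(results: Dict[str, Result]) -> Set[str]:
    """Return the genes that pass both the p-value and log2FC thresholds."""
    return {gene for gene, (lfc, p) in results.items()
            if classify_deg(lfc, p) != "ns"}


# Invented differential-expression results for two stress datasets.
disease = {"gene_a": (1.2, 0.01), "gene_b": (-0.8, 0.03), "gene_c": (0.2, 0.001)}
cold = {"gene_a": (0.7, 0.04), "gene_b": (-0.6, 0.20), "gene_d": (-1.5, 0.002)}

common = significant_genes(disease) & significant_genes(cold)
print(sorted(common))
```

Only genes passing the thresholds in both datasets remain as candidates for the classifier step.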

Draft pan-genome of three mango cultivars
The genomes of three mango cultivars (Alphonso, H. Xiang Ya, and T. atkins) were used to construct a draft pan-genome through ppsPCP. The Alphonso genome was selected as the reference based on its quality and completeness, while H. Xiang Ya and T. atkins were mapped iteratively against the selected reference genome. In the first iteration, the H. Xiang Ya genome contributed 5907 PAVs and 2823 new genes to the reference genome. In the second iteration, T. atkins contributed 2092 PAVs and 1266 new genes to the developing draft pan-genome (Table S1). In total, 7999 novel PAVs and 4089 new genes were added to the reference genome and a draft pan-genome assembly was established (Figure 1). The total draft pan-genome assembly size was 470 MB, with a total of 39843 genes in its annotation file. The draft pan-genome assembly fasta (.fa) and annotation (.gff3) files are given in the Supplementary Material.

Identification and physiochemical characteristics of CNL genes in Mangifera indica cultivars
A total of 47, 27, and 36 CNL genes were identified from the genomes of Alphonso (Mi_A_CNLs), H. Xiang Ya (Mi_H_CNLs), and T. atkins (Mi_T_CNLs), respectively. All of the identified MiCNLs were also confirmed for the presence of coiled-coil, NB-ARC, and LRR domains (Table S2). The number of CNLs in Mangifera indica cultivars was lower than in A. thaliana, Oryza sativa, Medicago truncatula, Helianthus annuus L., and Dioscorea rotundata but higher than in C. sinensis, Brassica rapa, Cucumis sativus, and Raphanus sativus (Figure 2).
The proteins of each cultivar were named from CNL1 onward according to their position on chromosomes, from Chr1 to Chr20 (Table 1).
The physical and chemical properties of all MiCNL proteins were analyzed (Table S3). There were no significant differences in amino acid residue number, molecular weight, isoelectric point, instability index, aliphatic index, and GRAVY among the three cultivars. In all cultivars, most of the proteins have an isoelectric point (pI) less than 7, indicating that these proteins are acidic. The instability index (II) values of most proteins indicated that they are unstable in vitro. Most of the proteins have an aliphatic index (AI) greater than 70, which indicates that these proteins are thermally stable, and negative GRAVY values indicate that these proteins are hydrophilic (Figure 3). Subcellular localization prediction shows that most of the proteins were present in the cytoplasm and nucleus. A few proteins were present in the chloroplast and endoplasmic reticulum (Table 1).

Phylogenetic relationships of CNL family members from three M. indica cultivars
To analyze the possible evolutionary relationship of the CNL gene family in M. indica cultivars, a phylogenetic tree was constructed using 204 amino acid sequences from six species. All [...] 4). Overall, 20 motifs were chosen to analyze the pattern of conserved motifs among the MiCNLs. These motifs were identified through annotation from the Pfam database. The NBS domain consists of 8 motifs. Specifically, motif 1 was identified as the P-loop (Kinase a), motif 3 as GLPL, motif 4 as RNBS-D, motif 6 as MHD, motif 7 as Kinase-2, motif 8 as RNBS-C, motif 10 as RNBS-A, and motif 13 as LRR. Out of 20, a total of 12 motifs (1, 3, 4, 6, 7, 8, 9, 10, 11, 13, 14, and 19) were conserved in all proteins of Alphonso. Motifs 2 and 12 were only conserved in the members of group C. Motif 5 was conserved in all proteins except the proteins of group A (Figure 5A). In H. Xiang Ya, 8 motifs (1, 2, 3, 4, 5, 6, 7, and 8) were conserved in all proteins except 2 proteins (Mi_H_CNL21 and Mi_H_CNL23). Motif 18 was only conserved in group B (Figure S1A). In T. atkins, 6 motifs (1, 2, 3, 8, 9, and 12) were conserved among all members. Motifs 5 and 16 were only conserved in the members of group C (Figure S2A).
In Alphonso, gene structure varies from one group to another. In group A, all the members have 5 exons and 4 introns. Group B has 1-4 exons and 0-3 introns, while members of group C have 1-2 exons and 0-1 introns. Most of the members in group C have only 1 exon and no intron (Figure 5B). In H. Xiang Ya, group A exons ranged from 3-13 and introns ranged from 2-12. Group B has 2-5 exons and 1-4 introns. Group C has 1-4 exons and 0-3 introns, while all members of group D have only 2 exons and 1 intron (Figure S1B). In T. atkins, group A exons ranged from 5-13 and introns ranged from 4-12. Most of the members of group B had 1 exon, but a few members had 1-3 exons and 0-2 introns. Members of group C have 1-15 exons and 0-14 introns (Figure S2B).
Gene duplication events were also analyzed among Mi_A_CNL, Mi_H_CNL, and Mi_T_CNL genes, and a total of 37, 8, and 5 duplicated gene pairs were found, respectively. Most of the members were tandemly duplicated, whereas a few members resulted from segmental duplication. Thus, in line with previous studies, these findings indicated that tandem as well as segmental duplications were the main factors driving the expansion of the CNL gene family in M. indica cultivars (Table 2).
To analyze the evolutionary constraints on the duplicated MiCNL genes, the Ka, Ks, and Ka/Ks ratios of all paralogous gene pairs were also calculated. The Mi_A_CNL, Mi_H_CNL, and Mi_T_CNL gene pairs had Ka/Ks values ranging from 0.51 to 1.27, 0.59 to 1.44, and 0.63 to 11.89, respectively. The estimated time of divergence of all 50 duplicated Mi_CNL gene pairs ranged from 0.3 to 88.4 million years ago (MYA).

Prediction of cis-regulatory elements in the promoter of MiCNL genes
The cis-regulatory elements were analyzed to further predict the involvement of MiCNL genes in the regulation of abiotic stresses. In all M. indica cultivars, several cis-elements were found, which were further classified into light-related, hormone-related, stress-related, and development-related elements (Table S4). Among these, the cis-elements Box 4, G-box, GT1-motif, and GATA-motif were found to be involved in light responsiveness. Five cis-elements were involved in hormone responsiveness: ABRE, CGTCA-motif, TGA-element, P-box, and TCA-element. Further, four cis-elements were found to be involved in stress responsiveness: GC-motif, LTR, TC-rich repeats, and MBS. Five elements, including CAT-box, MBSI, circadian, HD-Zip 1, and O2-site, were involved in developmental processes. In Mi_A_CNLs, light- and stress-related cis-elements were the most abundant (Figure 7), while in Mi_H_CNLs hormone-related cis-elements were the most abundant (Figure S5). Mi_T_CNLs mostly have cis-elements related to hormones, stress, and development (Figure S6).

PPI and gene ontology enrichment analysis
MiCNL proteins were evaluated to identify interactions among them and to understand their functional relationships. Interacting proteins might be involved in a pathway, thus affecting the roles of other proteins and giving an overall response. Some MiCNL proteins were found to interact with other CNLs as well as with other homologous proteins. Among MiCNLs, Mi_T_CNLs showed the highest number of interactions. Mi_T_CNL18 and Mi_T_CNL12 were among the most highly interacting proteins. Further, Mi_T_CNL9, Mi_T_CNL26, and Mi_H_CNL2 also showed many interactions with other defense-responsive proteins (Figure 8A). Gene Ontology (GO) enrichment analysis was performed on the MiCNL genes. According to this analysis, the genes were involved in a KEGG pathway, Plant-pathogen interaction (KEGG: ath04626), and in molecular functions including ADP binding (GO:0043531), Adenyl ribonucleotide binding (GO:0032559), ATP binding (GO:0005524), and Anion binding (GO:0043168). Moreover, these proteins were found to be located in the plasma membrane (GO:0005886). These proteins also participate in a variety of biological processes including Defense response (GO:0006952), Plant-type hypersensitive response (GO:0009626), Defense response to other organisms (GO:0098542), Cellular response to stress (GO:0033554), Regulation of immune system process (GO:0002682), and Defense response to the bacterium (GO:0042742) (Table S5; Figure 8B).
FIGURE 7 Cis-regulatory elements in the promoter region of Mi_A_CNL genes. (A) Represents the cis-elements and their location on the upstream region of genes. (B) Represents the heatmap with colors showing the number of elements related to various stresses.
Tahir ul Qamar et al. 10.3389/fpls.2023.1285547 Frontiers in Plant Science frontiersin.org

Expression analysis of Mi_A_CNL genes
To further investigate the roles of these genes, their expression patterns were observed under disease and cold stress. Under disease stress, a few genes showed altered expression in fruit peel: Mi_A_CNL13 and Mi_A_CNL14 were up-regulated, while Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated (Figure 9A). Under cold stress, Mi_A_CNL2, 14, 41, and 45 were up-regulated and Mi_A_CNL47 was down-regulated (Figure 9B). Overall, the expression level of the remaining genes was similar across stresses and conditions.

Structure prediction of Mi_A_CNL proteins
To obtain more structural and ultimately functional insights, the 3D structures of four Mi_A_CNL proteins (Mi_A_CNL13, 14, 25, and 30) were modeled. All four models shared a highly similar arrangement of loops, helices, and turns, and all contained a large number of helices. The basic architecture was similar; for example, the turns on the left side of the structures (leucine-rich repeats) are visible in every modeled protein. Moreover, the number of helices in each protein is also the same (Figure 10).

Performance evaluation of multi-stress responsive genes
A total of 15 genes were found to be present in both disease and cold datasets. A machine learning classifier, the random forest algorithm, was employed to assess their performance (Table 3). Using the count data of disease stress as the training dataset, the analysis indicated that only one gene (Mi_A_CNL14) showed multi-stress responsiveness. The classification model's sensitivity, specificity, and overall accuracy were evaluated using the Receiver Operating Characteristic (ROC) plot. Mi_A_CNL14 had a ROC value of 0.8333, indicating acceptable performance as a potential multi-stress responsive gene. Figure S7 shows the ROC plot for Mi_A_CNL14.
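To make the ROC value easier to interpret, the sketch below computes the area under the ROC curve as the fraction of (positive, negative) sample pairs that the classifier ranks correctly, counting ties as half. This is a generic illustration in Python rather than the R workflow used in the study, and the labels and scores are toy values chosen so that 5 of 6 pairs are ranked correctly; they are not the study's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 2 stressed (1) and 3 control (0) samples with classifier scores.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(labels, scores), 4))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.83 means that most, but not all, stressed samples receive higher scores than controls.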

Discussion
Disease resistance (R) genes in plants are essential for effector-triggered immunity (ETI) because they have mechanisms for identifying pathogens and protecting the plants directly or indirectly (Yang and Wang, 2015). The NBS-LRR class of these R genes, having most of the NBS and LRR domains at the C-terminal, encodes the largest family of all the five classes of these proteins (Meyers et al., 2003). Two major subfamilies of the NBS-LRR protein family are usually found: toll/interleukin-1 receptor-NBS-LRR (TNL) and coiled-coil-NBS-LRR (CNL) (Shao et al., 2014).

FIGURE 9 Heatmap of the expression pattern of Mi_A_CNL genes in fruit peel under different conditions, constructed using count values. (A) Disease stress. (B) Cold stress at 2, 7, and 12 days. The red color represents higher or up-regulated expression and the blue color represents lower or down-regulated expression.
FIGURE 10 Predicted 3D structures of four Mi_A_CNLs using AlphaFold2 and visualized using PyMOL. Red color represents the helices, cyan color represents the sheets, and pink color represents the loops.
Pan-genome wide analysis provides a comprehensive overview of diversity at the genomic level involving multiple species, which may lead to the identification of unique genes that are present in specific species instead of being present in all genomes under study (Tahir ul Qamar et al., 2020). Similarly, in this study, three unique genes (Mi_H_CNL3, 12, and 13) were identified only in the H. Xiang Ya cultivar. The phylogenetic analysis categorized CNL genes into four groups (A, B, C, and D) using A. thaliana as a reference. The clade of Group C was the largest and Group D was the smallest. All of these genes belonging to the same subgroup were clustered together. The conservation of motifs and gene structures was similar to that observed in previous studies of C. sinensis and B. rapa, in which very few exons or introns, such as one, were found. Similarly, the conservation of motifs among groups was also the same (Kohler et al., 2008). The observed differences in the number of exons and introns among mango cultivars and other species imply evolutionary changes in gene structures over time, potentially impacting their functional conservation. This suggests diversification of CNL genes. Despite this variation in gene structure, most genes share a similar number of conserved motifs, indicating the preservation of their functions throughout evolution.
Chromosomal mapping indicates that the CNL genes in all three cultivars are distributed unevenly, and in all cultivars most of the genes are present in clusters. The same trend was observed in A. thaliana (Meyers et al., 2003), C. sinensis (Yin et al., 2023), R. sativus L. (Ma et al., 2021), B. rapa (Liu et al., 2021), and O. sativa (Zhou et al., 2004). Most of these genes in the three cultivars were found to have undergone tandem duplication. A similar pattern of duplication was observed in B. rapa (Ma et al., 2021) and C. sinensis (Yin et al., 2023). Selection pressure on the genes was evaluated using the Ka/Ks ratio, which represents the ratio of non-synonymous (Ka) to synonymous (Ks) mutations. A Ka/Ks ratio greater than 1 indicates positive selection, while a ratio less than 1 signifies purifying selection. The analysis of mango cultivars revealed evidence of both positive and purifying selection acting on the studied genes.
The promoter regions of these genes contained several stress-related elements, which further supports the involvement of these genes in different abiotic and biotic stresses. Other plants have also been shown to have these elements, which confer resistance to various environmental stresses. Black rot (BR) is a bacterial disease caused by Xanthomonas campestris pv. campestris Dowson, which infects many Brassica species, such as cabbage (Brassica oleracea var), Chinese cabbage (Brassica pekinensis), and oil seed rape (Brassica campestris) (Zeilmaker et al., 2015; Zhang et al., 2018). All these findings help us understand the involvement of these genes in various stresses.
Protein-protein interaction studies showed that a few proteins from mango cultivars interacted with other defense-responsive proteins including TIR, RIN1, RIN4, PBS1, PBS2, and RPP5. GO analysis revealed that most CNL proteins are located in the plasma membrane and involved in defense responses, ADP binding, ATP binding, anion binding, and adenyl ribonucleotide binding. In Vitis vinifera, NBS-LRR genes are also involved in defense responses, ADP binding, and ATP binding (Goyal et al., 2020).
The expression profiling of these genes showed varied expression under disease and cold stress. The expression was analyzed in fruit peel. In response to disease stress, Mi_A_CNL13 and Mi_A_CNL14 were up-regulated, whereas Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated. Conversely, under cold stress, Mi_A_CNL2, 14, 41, and 45 were up-regulated, while Mi_A_CNL47 was down-regulated. In B. rapa most of the CNL genes show the same trend, but in C. sativus L. most of the CNL genes were up-regulated under salt and chilling (cold) stress (Liu et al., 2021; Zhang et al., 2022). Based on expression values, the 3D structures of four proteins were also predicted to help understand their structural as well as functional conservation, and all four proteins have almost the same number of alpha-helices and beta sheets.
Furthermore, random forest, a machine learning approach, was used to evaluate the genes that showed responses to both disease and cold stress. A total of 15 genes were common to both datasets, but only one gene (Mi_A_CNL14) was significantly involved in the multi-stress response. Some other studies also used the same methods to evaluate genes involved in multi-stress response (Fatima et al., 2023). Therefore, it can be concluded that CNL genes can significantly benefit mango genetic improvement through breeding or genetic manipulation, by conferring disease resistance and enhancing tolerance to abiotic stresses. Their role in multi-stress responsiveness, as suggested by our analysis, makes them valuable candidates for further breeding programs seeking mango varieties with robust adaptability to diverse environmental conditions. Breeding for MiCNL gene related traits could lead to healthier mango plants, reduced pesticide dependency, and improved sustainability in mango cultivation.

Conclusion
In this study, a draft pan-genome was constructed and PAVs were scanned through the ppsPCP pipeline using three mango cultivars, in which 47 CNL genes in Alphonso, 27 in H. Xiang Ya, and 36 in T. atkins were identified. These were classified into four groups: A, B, C, and D. All the members from the same group shared greater conservation in motif and gene structure. A few segmentally and many tandemly duplicated pairs were found. A large number of cis-regulatory elements related to light, hormone, stress, and development responsiveness were found in the promoter regions of mango CNLs. PPI analysis showed that CNL proteins interact with other CNLs and other defense-responsive proteins, and GO enrichment analysis revealed their involvement in pathways as well as processes related to defense response. Structure prediction showed high similarity among members of the same groups. Expression profiling of mango fruit peel under disease stress revealed that Mi_A_CNL13 and 14 were up-regulated while Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated. On the other hand, under cold stress Mi_A_CNL2, 14, 41, and 45 were up-regulated and Mi_A_CNL47 was down-regulated. Machine learning approaches indicate that out of 15 common genes, only one gene (Mi_A_CNL14) can be a multi-stress responsive gene (Super gene). Our results provide a solid foundation to further investigate the function of CNLs in regulating various abiotic and environmental stress responses, and more accessions should be sequenced to improve the quality of the reference genome.

FIGURE 2 Identified CNL gene family members from Mangifera indica cultivars and other plant species.
FIGURE 3 Violin plot of physiochemical properties of A. thaliana and three M. indica cultivars. (A) Protein length, (B) Molecular weight, (C) Isoelectric point, (D) Instability index, (E) Aliphatic Index, and (F) Grand average of hydropathicity (GRAVY).
FIGURE 8 (A) Interactions among Mi_A_CNL and other homologous proteins. The teal color represents the Mi_A_CNL proteins and the cyan color represents the other interacting proteins from different species. (B) Predicted KEGG pathways, Molecular functions, Cellular components, and Biological Processes associated with Mi_A_CNL proteins.

TABLE 1
Details of identified CNLs in M. indica cultivars.

TABLE 2
Duplication data of three M. indica cultivars genes, rate of synonymous and non-synonymous mutations, duplication time (MYA), and type of duplication between genes.

TABLE 3
Gs identified in disease and cold stress.
Members of the same subgroup shared the same homology, including members from the other species. However, none of the CNL genes from Alphonso and T. atkins were found in group D. Similarly, in the case of C. sativus L., no gene was present in group D (Zhang et al., 2022). H. Xiang Ya is the only cultivar that has three members in group D, named Mi_H_CNL3, 12, and 13.