ORIGINAL RESEARCH article

Front. Plant Sci., 30 October 2023

Sec. Functional and Applied Plant Genomics

Volume 14 - 2023 | https://doi.org/10.3389/fpls.2023.1285547

Comparative genomics profiling revealed multi-stress responsive roles of the CC-NBS-LRR genes in three mango cultivars

  • 1. State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning, Guangxi, China

  • 2. Department of Bioinformatics and Biotechnology, Government College University Faisalabad (GCUF), Faisalabad, Pakistan

  • 3. Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan, China

  • 4. Environment and Plant Protection Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, China

  • 5. Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia

Article metrics

View details

13

Citations

3,2k

Views

1,3k

Downloads

Abstract

The nucleotide-binding site-leucine-rich repeat (NBS–LRR) gene family is the largest group of disease resistance (R) genes in plants and is active in response to viruses, bacteria, and fungi usually involved in effector-triggered immunity (ETI). Pangenome-wide studies allow researchers to analyze the genetic diversity of multiple species or their members simultaneously, providing a comprehensive understanding of the evolutionary relationships and diversity present among them. The draft pan-genome of three Mangifera indica cultivars (Alphonso, Hong Xiang Ya, and Tommy atkins) was constructed and Presence/absence variants (PAVs) were filtered through the ppsPCP pipeline. As a result, 2823 genes and 5907 PAVs from H. Xiang Ya, and 1266 genes and 2098 PAVs from T. atkins were added to the reference genome. For the identification of CC-NBS-LRR (CNL) genes in these mango cultivars, this draft pan-genome study has successfully identified 47, 27, and 36 members in Alphonso, H. Xiang Ya, and T. atkins respectively. The phylogenetic analysis divided MiCNL proteins into four distinct subgroups. All MiCNL genes are unevenly distributed on chromosomes. Both tandem and segmental duplication events played a significant role in the expansion of the CNL gene family. These genes contain cis-elements related to light, stress, hormone, and development. The analysis of protein-protein interactions (PPI) revealed that MiCNL proteins interacted with other defense-responsive proteins. Gene Ontology (GO) analysis indicated that MiCNL genes play a role in defense mechanisms within the organism. The expression level of the identified genes in fruit peel was observed under disease and cold stress which showed that Mi_A_CNL13 and 14 were up-regulated while Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated in disease stress. On the other hand, Mi_A_CNL2, 14, 41, and 45 were up-regulated and Mi_A_CNL47 is down-regulated in cold stress. Subsequently, the Random Forest (RF) classifier was used to assess the multi-stress response of MiCNLs. It was found that Mi_A_CNL14 is a gene that responds to multiple stress conditions. The CNLs have similar protein structures which show that they are involved in the same function. The above findings provide a foundation for a deeper understanding of the functional characteristics of the mango CNL gene family.

1 Introduction

Plants have evolved various mechanisms to protect themselves from both biotic and abiotic stresses (Haak et al., 2017). When they are attacked by pathogens, such as bacteria, viruses, fungi, nematodes, and insects, plants activate their pathogen response mechanisms to prevent further harm (Baker et al., 2010). One key component of this defense system is the plant disease resistance (R) genes. These genes play a role in defense against pathogens and are triggered by pathogen signaling (Belkhadir et al., 2004). They can target specific pathogens and are typically encoded by a type of protein called a nucleotide-binding site-leucine-rich repeat (NBS-LRR) protein. The NBS domain of this protein contains three key motifs: the P-loop, kinase-2, and kinase-3a-binding nucleotide (Tameling et al., 2002). The LRR domain, which typically contains 20-30 amino acid residues, is made up of two segments: a highly conserved segment (HCS) and a variable segment (VS) (Matsushima and Miyashita, 2012). The NBS-LRR gene family is the largest class of R genes and plays multiple roles in host-pathogen recognition and downstream signaling transduction (Wan et al., 2012).

NBS-LRR proteins are a class of plant resistance (R) genes that play a crucial role in protecting plants against pathogens. These proteins are divided into two types based on their conserved functional domains: TIR-domain-containing (TNL) and non-TIR-domain-containing. The non-TIR-domain-containing type, also known as CC-NBS-LRR (CNLs), is characterized by the presence of a coiled-coil domain at the N-terminal instead of a TIR domain (Sukarta et al., 2016). Additionally, other domains such as zinc fingers or RPW8 domains may also be present in the N-terminal of CNL genes. CNL genes are found in both monocotyledons and dicotyledons and are widely present in plants (Tarr et al., 2009).

Furthermore, a large proportion of R genes (approx. 80%) encode the NBS-LRR domain, and more than 50 NBS genes have been shown to play a role in disease resistance (Song et al., 2015). Examples of NBS-LRR proteins include the Pi-ta gene in rice, which directly interacts with the Magnaporthe grisea effector AVR-Pita, and the RRS1 protein in Arabidopsis thaliana, which directly interacts with the bacterial wilt pathogen protein PopP2 (Jia et al., 2000; Deslandes et al., 2003). Additionally, RPS2 and RPM1 resistance genes in Arabidopsis respond to Pseudomonas syringae through indirect interaction with AvrRpm1 and AvrB (DeYoung and Innes, 2006; Gururani et al., 2012). Furthermore, the ectopic overexpression of the Arabidopsis RPW8 gene has been shown to enhance resistance to powdery mildew in grapevine (Hu et al., 2018).

Mangifera indica (Mango) belongs to the Anacardiaceae family, which comprises 73 genera and almost 850 species. This fruit grows in tropical and subtropical regions of the world. Mangoes are renowned for being a natural source of dietary fiber, vitamins, proteins, carbohydrates, and essential minerals. They also have a unique flavor and are very nutritious. Therefore, it is called as “King of Tropical Fruits”. Green, yellow, dark red, and orange are the skin colors of ripe mango fruits (Quintana et al., 2021). The mango’s genome was sequenced in 2020, opening up greater resources for molecular studies on this fruit (Wang et al., 2020). The pan-genome of a species encompasses a collection of genes that can be divided into three categories: core genes that are found in all members of the species, accessory genes that are present in some members but not all, and unique genes that are specific to certain individuals within the species. This concept refers to the genetic diversity within a species, rather than an individual genome.

Since CNLs are involved in the defense mechanism of plants against various pathogens including viruses, bacteria, and fungi, the identification of mango CNLs is necessary to understand their interaction mechanisms and to develop defense-resilient cultivars. Additionally, mangoes are traded internationally, and the presence of diseases can restrict exports due to phytosanitary regulations. Disease resistant mango varieties can open up new markets and enhance international trade opportunities.

In this study, only those mango cultivars were chosen that have both the genome and annotation files available. Using a draft pan-genome, the CNL gene family members were identified in three mango cultivars: Alphonso, Hong Xiang Ya, and Tommy atkins. The structural and functional characteristics, gene structure and motifs, chromosomal distribution, gene duplication, cis-regulatory elements, protein-protein interaction (PPI), and the expression pattern of Mi_A_CNLs at various conditions were analyzed. Furthermore, machine learning techniques were used to identify the multi-stress responsive genes. These results provide worthy clues for further analyzing the biological functions of MiCNLs in various other biotic and abiotic stresses.

2 Materials and methods

2.1 Construction of mango draft pan-genome

The published genomes of three Mangifera indica cultivars named Alphonso, H. Xiang Ya, and T. atkins were downloaded from the MangoBase database (https://mangobase.org/easy_gdb/index.php) (Gómez-Ollé et al., 2023) and a draft pan-genome was constructed based on presence-absence variations (PAVs) using ppsPCP: a plant presence/absence variants scanner and pan-genome construction pipeline (http://cbi.hzau.edu.cn/ppsPCP/) (Ul Qamar et al., 2019). PAVs are the types of Structural Variations (SVs) that are either present or absent in different organisms/genomes. Usually, plants have a PAV length of 100bp. The query genomes were iteratively mapped against reference genome using MUMmer and PAVs were harvested. Next, the harvested PAVs were validated with BLASTn search between the query and reference genomes. Finally, the boundaries of filtered PAVs were corrected and a draft pan-genome was established.

2.2 Identification and physiochemical characterization of mango CNLs

The 51 A. thaliana CNL protein sequences were retrieved from the Ensembl Plants database (https://plants.ensembl.org/index.html) and a tBLASTn search was performed against the draft pan-genome. From the coordinates of each blast hit, using a draft pan-genome GFF file the protein IDs were obtained and protein sequences were retrieved from the proteome of each cultivar. The identified proteins were further searched for the confirmation of the presence of the NB-ARC and LRR domains in Pfam (http://pfam-legacy.xfam.org/) (Bateman et al., 2004), InterPro (https://www.ebi.ac.uk/interpro/) (Hunter et al., 2009), Conserved Domains Database (CDD; https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) (Marchler-Bauer et al., 2015), and HMMER (https://www.ebi.ac.uk/Tools/hmmer/) (Finn et al., 2011) databases. In addition, the coiled-coils structure was confirmed on the Paircoil2 website (https://cb.csail.mit.edu/cb/paircoil2/paircoil2.html), and the P-value parameter was set as 0.025 (McDonnell et al., 2006). The proteins having no characteristic conserved domains were excluded from further analysis.

Physicochemical properties including the length of protein sequence (aa), molecular weight (MW), isoelectric point (pI), aliphatic index (AI), Instability index (II), and grand average of hydropathicity (GRAVY) values were predicted using ProtParam tool of Expasy server (https://web.expasy.org/protparam/) (Gasteiger et al., 2005). Additionally, subcellular localization of mango CNL proteins was predicted using an online WoLF PSORT tool (https://wolfpsort.hgc.jp/) (Horton et al., 2007).

2.3 Multiple sequence alignment, phylogenetic analysis, conserved motifs, and gene structure analysis of MiCNLs

To further evaluate the evolutionary link of CNL proteins, a multiple sequence alignment of 51 A. thaliana (AtCNLs), 33 Cucumis sativus L. (CsaCNLs), 10 Citrus sinensis (ScCNLs), 47 Alphonso (Mi_A_CNLs), 27 H. Xiang Ya (Mi_H_CNLs), and 36 T. atkins (Mi_T_CNLs) protein were completed using ClustalW program (Tamura et al., 2021), and a phylogenetic tree was constructed using IQTREE Web Server (http://iqtree.cibiv.univie.ac.at/) (Zameer et al., 2021). The reliability of the constructed tree was verified using 1000 bootstrapping replicates using the maximum likelihood (ML) method. The tree was further edited using the iTOL: Interactive Tree of Life (https://itol.embl.de/) (Letunic and Bork, 2021).

To find common motifs among each mango cultivar, the Multiple Expectation Maximization for Motif Elicitation tool (MEME, https://meme-suite.org/meme/) (Bailey et al., 2015) was applied using protein sequences. Except for setting the motif number to 20, the rest of the parameters were retained by default. TBtools was used to visualize the identified motifs. The GFF file of each mango cultivar was used to analyze the intron and exon pattern of MiCNL genes and the structures were displayed using TBtools (Chen et al., 2018).

2.4 Chromosomal localization, Ka/Ks, and gene duplication analysis

The chromosomal position of each MiCNL gene was acquired from the GFF file of the relative cultivar and mapped using the gene location visualization tool of TBtools software (Chen et al., 2018). MiCNL gene duplication events were determined based on whether the length of the shorter gene covered was equal to or greater than 70% of the longer gene and if the similarity of the two aligned genes was equal to or greater than 70% (Tsai et al., 2012). Tandem and segmental duplications are reported to be the two main mechanisms underlying gene family expansion. Genes located on the same chromosome fragment were considered to be tandem duplicated genes. Genes found to be co-paralogs located on duplicated chromosomal blocks were considered to be segmentally duplicated genes (Flagel and Wendel, 2009). Ka/Ks values can be used to predict selection pressure for replicating genes. DnaSP v.6 software (Rozas et al., 2017) was used to calculate the nonsynonymous (Ka) and synonymous (Ks) nucleotide substitution parameters. If the ratio of Ka/Ks was greater than, equal to, or less than one, this indicated positive, neutral, and purifying selection, respectively (Zia et al., 2022). Moreover, the time of divergence for these gene pairs was calculated using the formula “t = Ks/2λ×10-6”, with λ value of 1.5× 10−8 for dicots to calculate the duplication time in million years (Zameer et al., 2022; Sadaqat et al., 2023).

2.5 Cis-regulatory elements, protein-protein interaction, and gene ontology enrichment analysis

As in the earlier studies, the cis-acting elements in the 2,000 bp upstream sequences in the genomic region of MiCNL genes were retrieved from the genome file using the “samtools faidx” tool in Ubuntu (Li et al., 2009; Hu et al., 2022; Xia et al., 2022; Zhu et al., 2022), and the types, numbers, and functions of these elements were analyzed using PlantCARE database (https://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Rombauts et al., 1999). Cis-elements were visualized using TBtools software.

Protein sequences of MiCNL were used as input in the STRING database (https://string-db.org/) (Mering et al., 2003) for analyzing PPI. For PPI the level of connection used was tenth and other parameters were kept by default. The PPI network was visualized and edited using Cytoscape software (Shannon et al., 2003). GO enrichment analysis was done using the DAVID database (https://david.ncifcrf.gov/home.jsp) (Dennis et al., 2003) and the components considered were biological processes (BP), cellular components (CC), and molecular function, and KEGG pathways.

2.6 Tissue specific analysis and 3D structure prediction of Mi_A_CNLs

The expression levels of all Mi_A_CNL genes under disease and cold stress were evaluated using transcriptome datasets available at the NCBI Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra) (Kodama et al., 2012) under BioProject: PRJNA855362 and PRJNA304093 respectively. The genome and annotation files (GFF) were downloaded from the MangoBase database (https://mangobase.org/easy_gdb/index.php) (Gómez-Ollé et al., 2023). The reads quality was checked through the FastQC tool (Brown et al., 2017). Indexes of M. indica (Alphonso) genome sequences were built using Bowtie2 (Langdon, 2015) and high-quality paired-end reads were mapped to the genome. The Htseq-count (Anders et al., 2015) program used abundance estimation of annotated genes. Finally, count values of individual genes were used to generate the heatmap which was illustrated using TBtools software.

To function properly, proteins are needed to be folded into a proper three-dimensional structure. Based on expression patterns, four Mi_A_CNL proteins were selected to predict their structures. Alphafold2 (https://rb.gy/dlamz) was used for this purpose (Jumper et al., 2021). Further, the predicted structures were validated using SAVES (https://saves.mbi.ucla.edu/) (Sawal et al., 2023) and MolProbity (http://molprobity.biochem.duke.edu/) (Davis et al., 2007). PyMOL was used to visualize these structures (Alexander et al., 2011).

2.7 Prediction of multi-stress responsive genes using machine learning

DESeq2 was utilized to investigate both disease and cold stress samples to identify genes with significant expression changes (Anders and Huber, 2012). Based on statistical significance, the identified genes were screened based on their p-value < 0.05 and log2 fold change values (a log2FC value ≥ 0.5 for upregulation, and log2FC ≤ -0.5 for downregulation). Common CNL genes from both datasets were used for testing. To verify the validity of these genes, the random forest (RF) classification algorithm was applied within the R programming environment (Qi, 2012). Model performance assessment usually involves a comparison of the model’s predictions with the known values of the dependent variable within a specific dataset. Count values of disease datasets were taken to train the model and common genes were used for testing. Performance metrics such as accuracy, area under the receiver operating characteristic curve (AUC), specificity, and sensitivity were used to evaluate the effectiveness of the RF classifier, specifically on the dataset containing common multi-stress responsive gene.

3 Results

3.1 Draft pan-genome of three mango cultivars

Three mango genomes of cultivars: Alphonso, H. Xiang Ya, and T. atkins were used to construct a draft pan-genome through ppsPCP. The Alphonso genome was selected as a reference based on its quality and completeness, while H. Xiang Ya and T. atkins were mapped iteratively against the selected reference genome. In the first iteration, the H. Xiang Ya genome contributed 5907 PAVs and 2823 new genes to the reference genome. While, in the second iteration, T. atkins contributed 2092 PAVs and 1266 new genes to the developing draft pan-genome (Table S1). In total, 7999 novel PAVs and 4089 new genes were added to the reference genome and a draft pan-genome assembly was established (Figure 1). The total draft pan-genome assembly size was 470 MB, with a total of 39843 genes in its annotation file. The draft pan-genome assembly fasta (.fa) and annotation (.gff3) files are given in Supplementary Material.

Figure 1

3.2 Identification and physiochemical characteristics of CNL genes in Mangifera indica cultivars

A total of 47, 27, and 36 CNL genes were identified from the genomes of Alphonso (Mi_A_CNLs), H. Xiang Ya (Mi_H_CNLs), and T. atkins (Mi_T_CNLs), respectively. All of the identified MiCNLs were also confirmed for the presence of coil-coil, NB-ARC, and LRR domains (Table S2). The CNLs in Mangifera indica cultivars were relatively less than A. thaliana, Oryza sativa, Medicago truncatula, Helianthus annuus L., and Dioscorea rotundata but higher than C. sinensis, Brassica rapa, Cucumis sativus, and Raphanus sativus (Figure 2).

Figure 2

The protein names of each cultivar were named from CNL1 onward according to their position on chromosomes, from Chr1 to Chr20 (Table 1).

Table 1

Gene NameGene IDGroupChromosomeStartEndStrandSubcellular localization
M. indica (Alphonso)
Mi_A_CNL1LOC123220599AChr157115895715357Cytoplasm
Mi_A_CNL2LOC123209381AChr157306425734541Nucleus
Mi_A_CNL3LOC123218214CChr191964969199407+Cytoplasm
Mi_A_CNL4LOC123215178CChr193132449316694+Chloroplast
Mi_A_CNL5LOC123219339BChr12727012827273740+Nucleus
Mi_A_CNL6LOC123213767CChr12891567228920915Nucleus
Mi_A_CNL7LOC123209617CChr21418795214192623+Nucleus
Mi_A_CNL8LOC123204063CChr21436829914371227Nucleus
Mi_A_CNL9LOC123208576CChr21437587814378458Chloroplast
Mi_A_CNL10LOC123204072CChr21438255914385212+Chloroplast
Mi_A_CNL11LOC123208577CChr21438715014389735+Chloroplast
Mi_A_CNL12LOC123208579CChr21441018314414603+Cytoplasm
Mi_A_CNL13LOC123201503BChr22259422422599836Chloroplast
Mi_A_CNL14LOC123209058BChr22260233622607922Nucleus
Mi_A_CNL15LOC123210654BChr360695506077011Cytoplasm
Mi_A_CNL16LOC123210656BChr360884606091345Cytoplasm
Mi_A_CNL17LOC123211055BChr366463746650817Chloroplast
Mi_A_CNL18LOC123211057BChr366538586657623Nucleus
Mi_A_CNL19LOC123211058BChr366610076664607Nucleus
Mi_A_CNL20LOC123211060BChr366705246674217Nucleus
Mi_A_CNL21LOC123211212BChr370212347024534+Chloroplast
Mi_A_CNL22LOC123211877BChr31850253218518194Cytoplasm
Mi_A_CNL23LOC123211635BChr32174985621752705+Nucleus
Mi_A_CNL24LOC123214379CChr42065819720661620Cytoplasm
Mi_A_CNL25LOC123214535CChr42089087520896330+Nucleus
Mi_A_CNL26LOC123216372CChr51288350612887432+Nucleus
Mi_A_CNL27LOC123219383CChr670183007021626+Cytoplasm
Mi_A_CNL28LOC123221139CChr728853842889217Cytoplasm
Mi_A_CNL29LOC123228245AChr10398100401366+Chloroplast
Mi_A_CNL30LOC123229776AChr1125925972596313Nucleus
Mi_A_CNL31LOC123229777AChr1125997302603343Nucleus
Mi_A_CNL32LOC123192330CChr1267666376772521Nucleus
Mi_A_CNL33LOC123196029CChr1424504422453601Endoplasmic reticulum
Mi_A_CNL34LOC123195951CChr1424724072475919+Chloroplast
Mi_A_CNL35LOC123199462CChr1613219231325404Nucleus
Mi_A_CNL36LOC123199183AChr1622626522271819+Nucleus
Mi_A_CNL37LOC123199184AChr1622790912283292+Nucleus
Mi_A_CNL38LOC123198840CChr161312770313134495+Nucleus
Mi_A_CNL39LOC123199400CChr161314638513150951+Nucleus
Mi_A_CNL40LOC123199894CChr1736835513709400Cytoplasm
Mi_A_CNL41LOC123199895CChr1737293363757669Cytoplasm
Mi_A_CNL42LOC123200365CChr171090661010909877+Chloroplast
Mi_A_CNL43LOC123200233CChr171095822410960998+Cytoplasm
Mi_A_CNL44LOC123199961CChr171098538510988251+Cytoplasm
Mi_A_CNL45LOC123202131BChr181298221412985341Nucleus
Mi_A_CNL46LOC123203903CChr201095428010957614Chloroplast
Mi_A_CNL47LOC123203859CChr201099692711000057Endoplasmic reticulum
M. indica (Hong Xiang Ya)
Mi_H_CNL1GWHGABLA018645CChr221815462184782Cytoplasm
Mi_H_CNL2GWHGABLA018663BChr224881142492313+Chloroplast
Mi_H_CNL3GWHGABLA018787DChr241287184131587+Chloroplast
Mi_H_CNL4GWHGABLA024667BChr465554786561794Nucleus
Mi_H_CNL5GWHGABLA024671BChr465752086577890Nucleus
Mi_H_CNL6GWHGABLA024672BChr465817736585107Nucleus
Mi_H_CNL7GWHGABLA027774CChr612559051258921Cytoplasm
Mi_H_CNL8GWHGABLA027891AChr621104352117173+Nucleus
Mi_H_CNL9GWHGABLA027892AChr621240122143734+Nucleus
Mi_H_CNL10GWHGABLA029891CChr71463104014634411+Endoplasmic reticulum
Mi_H_CNL11GWHGABLA002040CChr1083241068326913Endoplasmic reticulum
Mi_H_CNL12GWHGABLA002331DChr101117325111175860Nucleus
Mi_H_CNL13GWHGABLA002335DChr101120285211205713Cytoplasm
Mi_H_CNL14GWHGABLA002469BChr101221403212216701Cytoplasm
Mi_H_CNL15GWHGABLA002470BChr101223147012234153Nucleus
Mi_H_CNL16GWHGABLA005206CChr1225274972530340Cytoplasm
Mi_H_CNL17GWHGABLA005208CChr1225501482552999Endoplasmic reticulum
Mi_H_CNL18GWHGABLA006200BChr121304272313048093+Nucleus
Mi_H_CNL19GWHGABLA006205BChr121315611213161109+Nucleus
Mi_H_CNL20GWHGABLA006208BChr121322414513229537+Nucleus
Mi_H_CNL21GWHGABLA009646AChr1526047022606388Cytoplasm
Mi_H_CNL22GWHGABLA009647AChr1526111242613816Nucleus
Mi_H_CNL23GWHGABLA011611CChr161191547711918558+Chloroplast
Mi_H_CNL24GWHGABLA015785CChr182141899021421374Chloroplast
Mi_H_CNL25GWHGABLA016656AChr1954333995436916Nucleus
Mi_H_CNL26GWHGABLA018144BChr192572192025724934+Chloroplast
Mi_H_CNL27GWHGABLA018380CChr192753206927534275Nucleus
M. indica (Tommy Atkins)
Mi_T_CNL1Manin02g000840CChr212291731232869Endoplasmic reticulum
Mi_T_CNL2Manin03g005110CChr398965199899590Nucleus
Mi_T_CNL3Manin03g005120CChr399043929906972Chloroplast
Mi_T_CNL4Manin03g005130CChr399115919919202+Endoplasmic reticulum
Mi_T_CNL5Manin03g005150CChr399396179944041+Cytoplasm
Mi_T_CNL6Manin04g007540BChr457544145759283Cytoplasm
Mi_T_CNL7Manin04g008210BChr462562156268375Nucleus
Mi_T_CNL8Manin04g016160BChr41802295518025714Cytoplasm
Mi_T_CNL9Manin06g001600CChr612536201261613Nucleus
Mi_T_CNL10Manin07g007570CChr71053492110537638+Cytoplasm
Mi_T_CNL11Manin07g007580CChr71056014410562849+Cytoplasm
Mi_T_CNL12Manin07g007800CChr71087067210875128+Nucleus
Mi_T_CNL13Manin10g008560BChr101077543410778355Nucleus
Mi_T_CNL14Manin10g008590BChr101081817410820843Nucleus
Mi_T_CNL15Manin10g008600BChr101083593910839756Nucleus
Mi_T_CNL16Manin12g002700CChr1222309622233805Chloroplast
Mi_T_CNL17Manin12g002730CChr1222536052256448Chloroplast
Mi_T_CNL18Manin12g002740CChr1222657252269856Endoplasmic reticulum
Mi_T_CNL19Manin12g002750CChr1222829822292228+Nucleus
Mi_T_CNL20Manin13g010790CChr131229871612301259Nucleus
Mi_T_CNL21Manin15g003400AChr1526384862655784Chloroplast
Mi_T_CNL22Manin15g003410AChr1526559992657909Nucleus
Mi_T_CNL23Manin16g007090CChr161328430513299820+Nucleus
Mi_T_CNL24Manin17g007260CChr171067851310681296Chloroplast
Mi_T_CNL25Manin18g001880CChr1812296711250876+Chloroplast
Mi_T_CNL26Manin18g010170CChr1898707159873417+Nucleus
Mi_T_CNL27Manin19g006820AChr1954299935433782Nucleus
Mi_T_CNL28Manin19g009550CChr1988442548846864+Cytoplasm
Mi_T_CNL29Manin19g009570CChr1988915368894145+Nucleus
Mi_T_CNL30Manin19g009590CChr1989290568936542+Nucleus
Mi_T_CNL31Manin19g009600CChr1989448858947491+Nucleus
Mi_T_CNL32Manin19g014760BChr192032112520323863+Chloroplast
Mi_T_CNL33Manin00g008100C100000012391392723917394Cytoplasm
Mi_T_CNL34Manin00g008730C100000012532099625326389Nucleus
Mi_T_CNL35Manin00g008930B100000012558373225586779Nucleus
Mi_T_CNL36Manin00g017170C100000014595893945968686+Endoplasmic reticulum

Details of identified CNLs in M. indica cultivars.

The physical and chemical properties of all MiCNL proteins were analyzed (Table S3). There were no significant differences in amino acid residue number, molecular weights, isoelectric point instability index, aliphatic index, and GRAVY among the three cultivars. In all cultivars, most of the proteins have an isoelectric point (pI) less than 7 indicating that these proteins have acidic behavior. The instability index (II) values of most proteins indicated that these are unstable in the test tube. Most of the proteins have an aliphatic index (AI) greater than 70 which indicates that these proteins are thermally stable, and negative GRAVY values indicate that these proteins are hydrophilic (Figure 3). The protein’s subcellular localization shows that most of the proteins were present in the cytoplasm and nucleus. Few proteins were present in the chloroplast and endoplasmic reticulum (Table 1).

Figure 3

3.3 Phylogenetic relationships of CNL family members from three M. indica cultivars

To analyze the possible evolutionary relationship of the CNL gene family in M. indica cultivars, a phylogenetic tree was constructed using 204 amino acid sequences from six species. All CNL proteins were clustered into four groups. In comparison, group C contained the most CNL gene family members including 27 Mi_A_CNLs, 9 Mi_H_CNLs, 25 Mi_T_CNLs, 7 AtCNLs, 5 CsCNLs, and 23 CsaCNLs followed by group B which contain 13 Mi_A_CNLs, 10 Mi_H_CNLs, 8 Mi_T_CNLs, 23 AtCNLs, 5 CsCNLs, and 7 CsaCNLs. Group A contains 7 Mi_A_CNLs, 5 Mi_H_CNLs, 3 Mi_T_CNLs, 5 AtCNLs, and 3 CsaCNLs. Group D had only 3 members of Mi_H_CNLs and 16 members of AtCNLs. No member of Mi_A_CNLs, Mi_T_CNLs, CsCNLs and CsaCNLs was present in group D (Figure 4).

Figure 4

3.4 Conserved motifs, and gene structure analysis of MiCNLs

Overall, 20 motifs were chosen to analyze the pattern of conserved motifs among the MiCNLs. These motifs were identified through annotation from the Pfam database. The NBS domain consists of 8 motifs. Specifically, motif 1 was identified as the P-loop (Kinase a), motif 3 as GLPL, motif 4 as RNBS-D, motif 6 as MHD, motif 7 as Kinase-2, motif 8 as RNBS-C, motif 10 as RNBS-A, motif 13 as LRR. Out of 20, a total of 12 motifs (1, 3, 4, 6, 7, 8, 9, 10, 11, 13, 14, and 19) were conserved in all proteins of Alphonso. Motifs 2 and 12 were only conserved in the members of group C. Motif 5 was conserved in all proteins except the proteins of group A (Figure 5A). In H. Xiang Ya 8 motifs (1,2,3,4,5,6,7, and 8) were conserved in all proteins expects 2 proteins (Mi_H_CNL21 and Mi_H_CNL23). Motif 18 was only conserved in group B (Figure S1A). In T. atkins 6 motifs (1, 2, 3, 8, 9, and 12) were conserved among all members. Motifs 5 and 16 were only conserved in the members of group C (Figure S2A).

Figure 5

In Alphonso, gene structure varies from one group to another group. In group A, all the members have 5 exons and 4 introns. Group B has 1-4 exons and 0-3 introns, while members of Group C have 1-2 exons and 0-1 introns. Most of the members in group C have only 1 exon and no intron (Figure 5B). In H. Xiang Ya group A exons ranged from 3-13 and introns ranged from 2-12. Group B has 2-5 exons and 1-4 introns. Group C has 1-4 exons and 0-3 introns, while all members of Group D have only 2 exons and 1 intron (Figure S1B). In group A, T. atkins exons had a range from 5-13 and introns had a range from 4-12. Most of the members of group B had 1 exon but few members had a range of 1-3 exons and 0-2 introns. Members of group C have 1-15 exons and 0-14 introns (Figure S2B).

3.5 Chromosomal mapping and gene duplication analysis

In Alphonso, 47 CNL genes were distributed unevenly on 15 out of 20 chromosomes. It had maximum genes (9) at Chr3, and minimum genes (1) at Chr5, 6, 7, 10, 12, and 18. There was no gene on Chr 8, 9, 13, 15, and 19 (Figure 6). In H. Xiang Ya 27 genes were mapped unevenly at 10 out of 20 chromosomes. In this maximum gene (5) were present at chromosome 10 and 12, and minimum gene (1) was present at chromosome 7, 16, and 18. No gene was present at chromosomes 1, 3, 5, 8, 9, 11, 13, 14, and 17 (Figure S3). In T. Atkins 36 genes were present on 13 out of 20 chromosomes and on the scaffold. Chr19 has the maximum number of genes (5 genes) and Chr2, 6, 13, 16, and 17 have minimum numbers of genes (1 gene) on each chromosome. Four genes Mi_T_CNL33-36 were present on the scaffold (10000001). No gene was present on Chr1, 5, 8, 9, 11, and 14 (Figure S4).

Figure 6

Gene duplication events were also analyzed among Mi_A_CNL, Mi_H_CNL, and Mi_T_CNL genes and a total of 37, 8, and 5 duplicated pairs of genes were found among all the members respectively. Most of the members were tandemly duplicated. On the other hand, a few members resulted from segmental duplication. Thus, in line with previous studies, these findings indicated that tandem, as well as segmental duplications, were the main factor causing the increase of the CNL gene family in M. indica cultivars (Table 2).

Table 2

Gene 1Gene 2KaKsKa/KsTime (MYA)Duplication Type
Mi_A_CNL1Mi_A_CNL20.00660.01030.6407766990.343333333Tandem
Mi_A_CNL1Mi_A_CNL352.12052.65240.79946463688.41333333Segmental
Mi_A_CNL1Mi_A_CNL361.55611.61410.96406666353.80333333Segmental
Mi_A_CNL2Mi_A_CNL361.53261.67050.91744986555.68333333Segmental
Mi_A_CNL2Mi_A_CNL371.49161.77820.83882577959.27333333Segmental
Mi_A_CNL3Mi_A_CNL40.34610.45320.76368049415.10666667Tandem
Mi_A_CNL3Mi_A_CNL380.38940.55960.69585418218.65333333Segmental
Mi_A_CNL6Mi_A_CNL400.10190.14850.6861952864.95Segmental
Mi_A_CNL6Mi_A_CNL410.09380.13280.7063253014.426666667Segmental
Mi_A_CNL8Mi_A_CNL90.31420.49310.63719326716.43666667Tandem
Mi_A_CNL8Mi_A_CNL470.46250.57370.80617047219.12333333Segmental
Mi_A_CNL9Mi_A_CNL102.52022.6790.94072415189.3Tandem
Mi_A_CNL9Mi_A_CNL112.42723.03480.799789113101.16Tandem
Mi_A_CNL10Mi_A_CNL110.06270.10430.6011505273.476666667Tandem
Mi_A_CNL10Mi_A_CNL120.08490.07891.0760456272.63Tandem
Mi_A_CNL11Mi_A_CNL120.06830.09330.732047163.11Tandem
Mi_A_CNL13Mi_A_CNL140.05550.07960.6972361812.653333333Tandem
Mi_A_CNL15Mi_A_CNL160.11560.1440.8027777784.8Tandem
Mi_A_CNL15Mi_A_CNL220.13460.12191.1041837574.063333333Tandem
Mi_A_CNL16Mi_A_CNL220.18880.15771.1972098925.256666667Tandem
Mi_A_CNL17Mi_A_CNL180.07780.12940.6012364764.313333333Tandem
Mi_A_CNL17Mi_A_CNL190.080.13230.6046863194.41Tandem
Mi_A_CNL17Mi_A_CNL200.09490.15550.6102893895.183333333Tandem
Mi_A_CNL18Mi_A_CNL190.0350.06810.5139500732.27Tandem
Mi_A_CNL18Mi_A_CNL200.06930.12880.5380434784.293333333Tandem
Mi_A_CNL19Mi_A_CNL200.05920.11120.5323741013.706666667Tandem
Mi_A_CNL30Mi_A_CNL310.07850.14550.53951894.85Tandem
Mi_A_CNL34Mi_A_CNL420.14280.18140.7872105846.046666667Segmental
Mi_A_CNL34Mi_A_CNL430.13680.21350.6407494157.116666667Segmental
Mi_A_CNL34Mi_A_CNL440.13320.18850.70663136.283333333Segmental
Mi_A_CNL36Mi_A_CNL370.02980.05480.543795621.826666667Tandem
Mi_A_CNL38Mi_A_CNL390.05470.05141.0642023351.713333333Tandem
Mi_A_CNL40Mi_A_CNL410.07930.12040.6586378744.013333333Tandem
Mi_A_CNL42Mi_A_CNL430.04390.04420.993212671.473333333Tandem
Mi_A_CNL42Mi_A_CNL440.03910.03071.2736156351.023333333Tandem
Mi_A_CNL43Mi_A_CNL440.03350.03640.920329671.213333333Tandem
Mi_A_CNL46Mi_A_CNL470.14820.18350.8076294286.116666667Tandem
Mi_H_CNL4Mi_H_CNL61.47821.33320.90190772649.27333333Tandem
Mi_H_CNL5Mi_H_CNL61.13881.25721.1039690937.96Tandem
Mi_H_CNL8Mi_H_CNL91.49031.26440.84841978149.67666667Tandem
Mi_H_CNL10Mi_H_CNL161.18661.26781.06843081139.55333333Segmental
Mi_H_CNL18Mi_H_CNL190.08870.05260.5930101472.956666667Tandem
Mi_H_CNL18Mi_H_CNL201.0440.87110.83438697334.8Tandem
Mi_H_CNL19Mi_H_CNL201.18650.88860.74892541139.55Tandem
Mi_H_CNL21Mi_H_CNL221.1931.72341.44459346239.76666667Tandem
Mi_T_CNL2Mi_T_CNL230.7731.0540.73339658435.13333333Segmental
Mi_T_CNL10Mi_T_CNL173.2530.273511.893967099.116666667Segmental
Mi_T_CNL11Mi_T_CNL161.00820.57161.76382085419.05333333Segmental
Mi_T_CNL11Mi_T_CNL171.09410.62841.74108847920.94666667Segmental
Mi_T_CNL29Mi_T_CNL310.90261.42120.63509710147.37333333Tandem

Duplication data of three M. indica cultivars genes, rate of synonymous and non-synonymous mutations, duplication time (MYA), and type of duplication between genes.

To analyze the evolutionary constraints of the repeated MiCNL genes, the Ka, Ks, and Ka/Ks ratios of all para-homologous gene pairs were also calculated. In Mi_A_CNLs, Mi_H_CNLs, and Mi_T_CNLs gene pairs had Ka/Ks values ranging from 0.51 to 1.27, 0.59 to 1.44, and 0.63 to 11.89 respectively. Resultantly, the time of divergence of all 50 duplicated gene pairs of Mi_CNLs was between 0.3 to 88.4 million years (MYA).

3.6 Prediction of cis-regulatory elements in the promoter of MiCNL genes

The cis-regulatory elements were analyzed to further predict the involvement of MiCNL genes in the regulation of abiotic stresses. In all M. indica cultivars several cis-elements were found which were further classified into light-related, hormone-related, stress-related, and development-related elements (Table S4). Regarding these elements, for cis-elements Box 4, G-box, GT1-motif, and GATA-motif were found to be involved in light-stress regulation. Five cis-elements were involved with hormone responsiveness: ABRE, CGTCA-motif, TGA-element, P-box, and TCA-element. Further, four cis-elements were found to be involved with stress responsiveness: GC-motif, LTR, TC-rich repeats, and MBS. Five elements including CAT-box, MBSI, circadian, HD-Zip 1, and o2-site were involved in developmental processes. In Mi_A_CNLs light and stress-related cis-elements were mostly present (Figure 7) while in Mi_H_CNLs hormones-related cis-elements were mostly present (Figure S5). Mi_T_CNLs have mostly cis-elements related to hormones, stress, and development (Figure S6).

Figure 7

3.7 PPI and gene ontology enrichment analysis

MiCNL proteins were evaluated to identify interactions among them to understand their functional interactions. Interacting proteins might be involved in a pathway, thus affecting the roles of other proteins and giving an overall response. Some MiCNL proteins were found to interact with the other CNL as well as the other homologous proteins. Among MiCNLs, Mi_T_CNLs showed the highest interactions. Mi_T_CNL18 and Mi_T_CNL12 were among the highly interacting proteins. Further, Mi_T_CNL9, Mi_T_CNL26, and Mi_H_CNL2 also showed great interactions with other defense-responsive proteins (Figure 8A).

Figure 8

Gene Ontology (GO) enrichment analysis was performed on the MiCNL genes. According to GO analysis, these genes were involved in a KEGG pathway: Plant pathogen interactions (GO: ath04626), Molecular functions including ADP binding (GO:0043531), Adenyl ribonucleotide binding (GO:0032559), ATP binding (GO:0005524) and Anion binding (GO:0043168). Moreover, these proteins were found to be in the plasma membrane (GO:0005886). These proteins also participate in a variety of biological processes including Defense response (GO:0006952), Plant-type hypersensitive response (GO:0009626), Defense response to other organisms (GO:0098542), Cellular response to stress (GO:0033554), Regulation of immune system process (GO:0002682), and Defense response to the bacterium (GO:0042742) (Table S5; Figure 8B).

3.8 Expression analysis of Mi_A_CNL genes

To further investigate the roles of these genes, their expression patterns were observed in disease and cold stress. In the disease stage, few genes showed fluctuated expression as Mi_A_CNL13 and Mi_A_CNL14 were up-regulated in fruit peel and Mi_A_CNL15, 25, 30, 31, 40 were down-regulated (Figure 9A). In cold stress Mi_A_CNL2, 14, 41, 45 were up-regulated and Mi_A_CNL47 is down-regulated (Figure 9B). Overall, the expression level of the remaining genes was found to be similar in each stress and condition.

Figure 9

3.9 Structure prediction of Mi_A_CNL proteins

To obtain more structural and ultimately functional insights, the 3D structures of four Mi_A_CNLs proteins (Mi_A_CNL13, 14, 25, and 30) were modeled. All these structures shared almost similar structures of loops, helices, and turns. All these structures contained a great number of helices. The basic structure was similar such as turns on the left side of structures (leucine-rich repeats) are visible in every modeled protein. Moreover, the number of helices in each protein is also the same (Figure 10).

Figure 10

3.10 Performance evaluation of multi-stress responsive genes

A total of 15 genes were found to be present in both disease and cold datasets. A machine learning classifier, a random forest algorithm, was employed to assess their performance (Table 3). Using the count’s data of disease stress as the training dataset, it was analyzed that only one gene (Mi_A_CNL14) was rigorously tested for its multi-stress responsiveness. The classification model’s sensitivity, specificity, and overall accuracy were evaluated using the Receiver Operating Characteristic (ROC) plot. Impressively, Mi_A_CNL14 demonstrated a ROC value of 0.8333, indicating its acceptable performance as a potential multi-stress responsive gene. Figure S7 visually represents the ROC plot for Mi_A_CNL14, providing supporting evidence of its classification efficacy.

Table 3

Gene SymbolDiseaseCold
log2FC ( ± 0.5)p-value (<0.05)log2FC ( ± 0.5)p-value (<0.05)
Mi_A_CNL13-4.70267.13E-1762.5526671.32E-23
Mi_A_CNL14-3.916024.2E-1150.9322222.63E-08
Mi_A_CNL150.6386740.001414-2.49642.6E-08
Mi_A_CNL181.3326360.000346-1.929573.33E-11
Mi_A_CNL191.7161020.027093-2.799162.25E-09
Mi_A_CNL273.4342710.000113-1.952960.01815
Mi_A_CNL28-1.558537.4E-131.43284.57E-09
Mi_A_CNL311.6118881.22E-16-0.681610.000286
Mi_A_CNL32-2.812241.91E-271.026660.00068
Mi_A_CNL331.3059530.0032251.1049970.000268
Mi_A_CNL34-1.077791.38E-071.8559671.68E-14
Mi_A_CNL360.8185570.002012-2.520272.35E-20
Mi_A_CNL390.731570.0019351.014125.69E-05
Mi_A_CNL402.6797191.49E-43-1.312049.77E-20
Mi_A_CNL45-1.246959.26E-141.1179632.01E-11

Summary of common DEGs identified in disease and cold stress.

4 Discussion

Disease resistance R genes in plants are essential for effector-triggered immunity (ETI) because they have mechanisms for identifying pathogens in plants and protecting the plants directly or indirectly (Yang and Wang, 2015). The NBS-LRR class of these R genes, having most of the NBS and LRR domains at the C-terminal, encodes the largest family of all the five classes of these proteins (Meyers et al., 2003). Two major subfamilies of the NBS-LRR protein family are usually found: toll/interleukin-1 receptor-NBS-LRR (TNL) and coil-coil-NBS-LRR (CNL) (Shao et al., 2014).

In this study, three mango cultivars were analyzed to identify CNL genes in their genome by constructing a draft pan-genome. The CNL gene family is widespread in a variety of plant species such as C. sinensis (Yin et al., 2023), Carica papaya (Porter et al., 2009), C. sativus L. (Zhang et al., 2022), H. annuus L. (Neupane et al., 2018), O. sativa (Zhou et al., 2004), Populus trichocarpa (Kohler et al., 2008), Solanum lycopersicum (Andolfo et al., 2013), Solanum tuberosum (Jupe et al., 2012), and B. rapa (Liu et al., 2021). In this study 47, 27, and 36 CNL genes were found in Alphonso, H. Xiang Ya, and T. atkins respectively. The varying number of CC-NBS-LRR genes, specifically MiCNL genes, among three mango cultivars (ranging from 27 to 47) suggests intraspecific genomic diversity. This phenomenon may be attributed to factors such as genetic drift, environmental pressures, historical hybridization events, gene duplications, and transposon-mediated processes. Similarly, these members showed greater variation in numbers among different plants such as B. rapa which has 40 members [61]. The radish genome had 19 members [55]. Furthermore, A. thaliana has 51 identified members (Meyers et al., 2003).

Pan-genome wide analysis provides a comprehensive overview of diversity at the genomic level involving multiple species, which may lead to the identification of unique genes that are present in specific species instead of being present in all genomes under study (Tahir ul Qamar et al., 2020). Similarly, in this study, three unique genes were identified only in the H. Xiang Ya cultivar including Mi_H_CNL3, 12, and 13. The phylogenetic analysis categorized CNL genes into four groups (A, B, C, and D) using A. thaliana as a reference. The clade of Group C was the largest and Group D was the smallest. All of these genes belonging to the same subgroup were clustered together and shared the same homology, even members from the other species as well. However, none of the CNL genes from Alphonso and T. atkins were found in group D. Similarly, in the case of C. sativus L. no gene was present in group D (Zhang et al., 2022). H. Xiang Ya is the only cultivar that has three members in group D named Mi_H_CNL3, 12, and 13.

The conservation of motifs and gene structures was similar to the ones observed in previous studies like C. sinensis and B. rapa, in which very few such as one exon or intron were found. Similarly, the conservation of motifs among groups was also the same (Kohler et al., 2008). The observed differences in the number of exons and introns among mango cultivars and other species imply the evolutionary changes in gene structures over time, potentially impacting their functional conservation. This suggests diversification of CNL genes. Despite this variation in gene structure, most genes share a similar number of conserved motifs, indicating the preservation of their functions throughout evolution.

Chromosomal mapping indicates that the CNL genes in all three cultivars are distributed unevenly but among all cultivars, most of the genes are present in the form of clusters. The same trend was observed in A. thaliana (Meyers et al., 2003), C. sinensis (Yin et al., 2023), R. sativus L. (Ma et al., 2021), B. rapa (Liu et al., 2021), and O. sativa (Zhou et al., 2004). Most of these genes in three cultivars were found to have undergone tandem duplication. A similar pattern of duplication was observed in B. rapa (Ma et al., 2021) and C. sinensis (Yin et al., 2023). The evaluation of selection pressure on genes involved the use of the Ka/Ks ratio, which represents the ratio of non-synonymous (Ka) to synonymous (Ks) mutations. A Ka/Ks ratio greater than 1 indicates positive selection, while a ratio less than 1 signifies purifying selection. The analysis of mango cultivars revealed evidence of both positive and purifying selection acting on the studied genes.

The promoter region of these genes showed several stress-related elements that further confirm the involvement of these genes in different abiotic and disease-resistant stresses. Other plants have also been shown to have these elements which confer resistance to various environmental stresses. Black rot (BR) is a bacterial disease caused by Xanthomonas campestris pv. campestris dowson, which infects many Brassica species, such as cabbage (Brassica oleracea var), Chinese cabbage (Brassica pekinensis), and oil seed rape (Brassica campestris) (Zeilmaker et al., 2015; Zhang et al., 2018). All these findings help us understand the involvement of these genes in various stresses.

Protein-Protein interaction studies showed that few proteins from mango cultivars interacted with other defense-responsive proteins including TIR, RIN1, RIN4, PBS1, PBS2, and RPP5. GO analysis revealed that most CNL genes are located in the plasma membrane and involved in defense responses, ADP binding, ATP binding, anion binding, and adenyl ribonucleotide binding. In Vitis vinifera, NBS-LRR genes are also involved in defense responses, ADP binding, and ATP binding (Goyal et al., 2020).

The expression profiling of these genes showed their varied expression in disease and cold stress. The expression was analyzed in fruit peel. In response to disease stress, Mi_A_CNL13 and Mi_A_CNL14 were up-regulated, whereas Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated. Conversely, under cold stress, Mi_A_CNL2, 14, 41, and 45 were up-regulated, while Mi_A_CNL47 was down-regulated. In B. rapa most of the CNL genes have the same trend but in C. sativus L. most of the CNL genes were up-regulated in salt and chilling (cold) stress (Liu et al., 2021; Zhang et al., 2022). Based on expression values the 3D structures of four proteins were also predicted to help understand their structural as well as functional conservations and all four proteins have almost the same number of alpha-helices and beta sheets.

Furthermore, random forest, the machine learning approach was utilized to evaluate the genes that were showing multi-stress responses in both disease and cold stress. A total of 15 genes were common in both datasets but only one gene (Mi_A_CNL14) was significantly involved in multi-stress response. Some other studies also utilized the same methods to evaluate genes involved in multi-stress response (Fatima et al., 2023). Therefore, it can be concluded that CNL genes can significantly benefit mango genetic improvement through breeding or genetic manipulation, by conferring disease resistance and enhancing tolerance to abiotic stresses. Their role in multi-stress responsiveness, as suggested by our analysis, makes them valuable candidates for further breeding programs seeking mango varieties with robust adaptability to diverse environmental conditions. Breeding for MiCNL gene related traits could lead to healthier mango plants, reduced pesticide dependency, and improved sustainability in mango cultivation.

5 Conclusion

In this study, a draft pan-genome was constructed and PAVs were scanned through the ppsPCP pipeline using three mango cultivars in which 47 genes in Alphonso, 27 in H. Xiang Ya, and 36 in T. atkins have been identified. These were classified into four groups: A, B, C, and D. All the members from the same group shared greater conservation in motif and gene structure. Few segmental and most tandemly duplicated pairs were found. A large number of cis-regulatory elements related to light, hormones, stress, and development responsive were found in promoter regions of mango CNLs. PPI showed CNL proteins interact with CNL and other defense-responsive proteins. and GO enrichment analysis revealed their interaction and involvement in pathways as well as processes related to defense response. Structure prediction showed high similarity among members of the same groups. Expression profiling of mango fruit peel under disease stress revealed that Mi_A_CNL13 and 14 were up-regulated while Mi_A_CNL15, 25, 30, 31, and 40 were down-regulated. On the other hand, in cold stress Mi_A_CNL2, 14, 41, 45 were up-regulated and Mi_A_CNL47 is down-regulated. Machine learning approaches indicate that out of 15 common genes, only one gene (Mi_A_CNL14) can be a multi-stress responsive gene (Super gene). Our results provide a solid foundation to further investigate the function of CNLs in regulating various abiotic and environmental stress responses and more accessions should be sequenced to improve the quality of the reference genome.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Author contributions

MT: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft. MS: Data curation, Investigation, Methodology, Visualization, Writing – original draft. X-TZ: Data curation, Methodology, Software, Validation, Writing – review & editing. HL: Data curation, Investigation, Methodology, Validation, Writing – review & editing. XH: Conceptualization, Data curation, Methodology, Validation, Visualization, Writing – review & editing. KF: Data curation, Investigation, Methodology, Validation, Writing – review & editing. MA: Data curation, Formal analysis, Funding acquisition, Investigation, Resources, Validation, Writing – review & editing. L-LC: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the Starting Research Grant for High-level Talents from Guangxi University and Postdoctoral research platform grant of Guangxi University.

Acknowledgments

The researchers supporting project number (RSP2023R494), King Saud University, Riyadh, Saudi Arabia.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2023.1285547/full#supplementary-material

References

  • 1

    AlexanderN.WoetzelN.MeilerJ. (2011). “Bcl::Cluster: A method for clustering biological molecules coupled with visualization in the pymol molecular graphics system,” in 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences, ICCABS. (Orlando, FL, USA: IEEE), 1318. doi: 10.1109/ICCABS.2011.5729867

  • 2

    AndersS.HuberW. (2012). Differential Expression of RNA-Seq Data at the Gene Level–the DESeq Package Vol. 10 (Heidelberg, Germany: European Molecular Biology Laboratory (EMBL), f1000research.

  • 3

    AndersS.PylP. T.HuberW. (2015). HTSeq—a python framework to work with high-throughput sequencing data. Bioinformatics31 (2), 166169. doi: 10.1093/bioinformatics/btu638

  • 4

    AndolfoG.SanseverinoW.RombautsS.de PeerY. V.BradeenJ. M.CarputoD.et al. (2013). Overview of tomato (Solanum lycopersicum) candidate pathogen recognition genes reveals important solanum R locus dynamics. New Phytol.197 (1), 223237. doi: 10.1111/j.1469-8137.2012.04380.x

  • 5

    BaileyT. L.JohnsonJ.GrantC. E.NobleW. S. (2015). The MEME suite. Nucleic Acids Res.43 (W1), W39W49. doi: 10.1093/nar/gkv416

  • 6

    BakerC. M.ChitrakarR.ObulareddyN.PanchalS.WilliamsP.MelottoM. (2010). Molecular battles between plant and pathogenic bacteria in the phyllosphere. Braz. J. Med. Biol. Res.43 (8), 698704. doi: 10.1590/S0100-879X2010007500060

  • 7

    BatemanA.CoinL.DurbinR.FinnR. D.HollichV.Griffiths-JonesS.et al. (2004). The pfam protein families database. Nucleic Acids Res.32 (suppl_1), D138D141. doi: 10.1093/nar/gkh121

  • 8

    BelkhadirY.SubramaniamR.DanglJ. L. (2004). Plant disease resistance protein signaling: NBS-LRR proteins and their partners. Curr. Opin. Plant Biol.7 (4), 391399. doi: 10.1016/j.pbi.2004.05.009

  • 9

    BrownJ.PirrungM.McCueL. A. (2017). FQC dashboard: integrates fastQC results into a web-based, interactive, and extensible FASTQ quality control tool. Bioinformatics33 (19), 31373139. doi: 10.1093/bioinformatics/btx373

  • 10

    ChenC.XiaR.ChenH.HeY. (2018). TBtools, a toolkit for biologists integrating various HTS-data handling tools with a user-friendly interface. TBtools Toolkit Biologists Integrating Various HTS-Data Handling Tools User-Friendly Interface, 289660. doi: 10.1101/289660

  • 11

    DavisI. W.Leaver-FayA.ChenV. B.BlockJ. N.KapralG. J.WangX.et al. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res.35 (SUPPL.2), 375383. doi: 10.1093/nar/gkm216

  • 12

    DennisG.ShermanB. T.HosackD. A.YangJ.GaoW.LaneH.C.et al. (2003). DAVID: database for annotation, visualization, and integrated discovery. Genome Biol.4 (9), 111. doi: 10.1186/gb-2003-4-9-r60

  • 13

    DeslandesL.OlivierJ.PeetersN.FengD. X.KhounlothamM.BoucherC.et al. (2003). Physical interaction between RRS1-R, a protein conferring resistance to bacterial wilt, and popP2, a type III effector targeted to the plant nucleus. Proc. Natl. Acad. Sci. U.S.A.100 (13), 80248029. doi: 10.1073/pnas.1230660100

  • 14

    DeYoungB. J.InnesR. W. (2006). Plant NBS-LRR proteins in pathogen sensing and host defense. Nat. Immunol.7 (12), 12431249. doi: 10.1038/ni1410

  • 15

    FatimaK.SadaqatM.AzeemF.RaoM. J.AlbekairiN. A.AlshammariA.et al. (2023). Integrated omics and machine learning-assisted profiling of cysteine-rich-receptor-like kinases from three peanut spp. Revealed their role in multiple stresses. Front. Genet.14. doi: 10.3389/fgene.2023.1252020

  • 16

    FinnR. D.ClementsJ.EddyS. R. (2011). HMMER web server : interactive sequence similarity searching. Nucleic Acid Res.39, May, 2937. doi: 10.1093/nar/gkr367

  • 17

    FlagelL. E.WendelJ. F. (2009). Gene duplication and evolutionary novelty in plants. New Phytol.183 (3), 557564. doi: 10.1111/j.1469-8137.2009.02923.x

  • 18

    GasteigerE.HooglandC.GattikerA.WilkinsM. R.AppelR. D.BairochA. (2005). Protein identification and analysis tools on the exPASy server. Proteomics Protoc. Handb., 571607. doi: 10.1385/1-59259-890-0:571

  • 19

    Gómez-OlléA.BullonesA.HormazaJ. I.MuellerL. A.Fernandez-PozoN. (2023). MangoBase: A genomics portal and gene expression atlas for mangifera indica. Plants12 (6), 1273. doi: 10.3390/plants12061273

  • 20

    GoyalN.BhatiaG.SharmaS.GarewalN.UpadhyayA.UpadhyayS. K.et al. (2020). Genome-wide characterization revealed role of NBS-LRR genes during powdery mildew infection in vitis vinifera. Genomics112 (1), 312322. doi: 10.1016/j.ygeno.2019.02.011

  • 21

    GururaniM. A.VenkateshJ.UpadhyayaC. P.NookarajuA.PandeyS. K.ParkSe W. (2012). Plant disease resistance genes: current status and future directions. Physiol. Mol. Plant Pathol.78, 5165. doi: 10.1016/j.pmpp.2012.01.002

  • 22

    HaakD. C.FukaoT.GreneR.HuaZ.IvanovR.PerrellaG.et al. (2017). Multilevel regulation of abiotic stress responses in plants. Front. Plant Sci.8 (September). doi: 10.3389/fpls.2017.01564

  • 23

    HortonP.ParkK.-J.ObayashiT.FujitaN.HaradaH.Adams-CollierC. J.et al. (2007). WoLF PSORT: protein localization predictor. Nucleic Acids Res.35 (suppl_2), W585W587. doi: 10.1093/nar/gkm259

  • 24

    HuY.LiY.HouF.WanD.ChengY.HanY.et al. (2018). Ectopic expression of arabidopsis broad-spectrum resistance gene RPW8.2 improves the resistance to powdery mildew in grapevine (Vitis vinifera). Plant Sci.267 (August 2017), 2031. doi: 10.1016/j.plantsci.2017.11.005

  • 25

    HuH.ShiB.ZhuW.ZhengB.ZhouK.QianM.et al. (2022). Genome-wide identification, characterization and expression analysis of mango (Mangifera indica l.) chalcone synthase (CHS) genes in response to light. Horticulturae8 (10), 968. doi: 10.3390/horticulturae8100968

  • 26

    HunterS.ApweilerR.AttwoodT. K.BairochA.BatemanA.BinnsD.et al. (2009). InterPro: the integrative protein signature database. Nucleic Acids Res.37 (suppl_1), D211D215. doi: 10.1093/nar/gkn785

  • 27

    JiaY.McAdamsS. A.BryanG. T.HersheyH. P.ValentB. (2000). Direct interaction of resistance gene and avirulence gene products confers rice blast resistance. EMBO J.19 (15), 40044014. doi: 10.1093/emboj/19.15.4004

  • 28

    JumperJ.EvansR.PritzelA.GreenT.FigurnovM.RonnebergerO.et al. (2021). Highly accurate protein structure prediction with alphaFold. Nature596 (7873), 583589. doi: 10.1038/s41586-021-03819-2

  • 29

    JupeF.PritchardL.EtheringtonG. J.MacKenzieK.CockP. J.A.WrightF.et al. (2012). Identification and localization of the NB-LRR gene family within the potato genome. BMC Genomics13 (1), 114. doi: 10.1186/1471-2164-13-75

  • 30

    KodamaY.ShumwayM.LeinonenR. (2012). The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res.40 (D1), 20112013. doi: 10.1093/nar/gkr854

  • 31

    KohlerA.RinaldiC.DuplessisS.BaucherM.GeelenD.DuchaussoyFrédéricet al. (2008). Genome-wide identification of NBS resistance genes in populus trichocarpa. Plant Mol. Biol.66 (6), 619636. doi: 10.1007/s11103-008-9293-9

  • 32

    LangdonW. B. (2015). “Performance of genetic programming optimized bowtie2 on genome comparison and analytic testing (GCAT) benchmarks.“. BioData Min.8 (1), 17. doi: 10.1186/s13040-014-0034-0

  • 33

    LetunicI.BorkP. (2021). Interactive tree of life (ITOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res.49 (W1), W293W296. doi: 10.1093/nar/gkab301

  • 34

    LiH.HandsakerB.WysokerA.FennellT.RuanJ.HomerN.et al. (2009). The sequence alignment/map format and SAMtools. Bioinformatics25 (16), 20782079. doi: 10.1093/bioinformatics/btp352

  • 35

    LiuY.LiD.YangN.ZhuX.HanK.GuR.et al. (2021). Genome-wide identification and analysis of cc-nbs-lrr family in response to downy mildew and black rot in chinese cabbage. Int. J. Mol. Sci.22 (8). doi: 10.3390/ijms22084266

  • 36

    MaY.ChhapekarS. S.LuL.OhS.SinghS.KimC. S.et al. (2021). Genome-wide identification and characterization of NBS-encoding genes in raphanus sativus L. and their roles related to fusarium oxysporum resistance. BMC Plant Biol.21 (1), 117. doi: 10.1186/s12870-020-02803-8

  • 37

    Marchler-BauerA.DerbyshireM. K.GonzalesN. R.LuS.ChitsazF.GeerL. Y.et al. (2015). CDD: NCBI’s conserved domain database. Nucleic Acids Res.43 (D1), D222D226. doi: 10.1093/nar/gku1221

  • 38

    MatsushimaN.MiyashitaH. (2012). Leucine-rich repeat (LRR) domains containing intervening motifs in plants. Biomolecules2 (2), 288311. doi: 10.3390/biom2020288

  • 39

    McDonnellA. V.JiangT.KeatingA. E.BergerB. (2006). Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics22 (3), 356358. doi: 10.1093/bioinformatics/bti797

  • 40

    MeringC. v.HuynenM.JaeggiD.SchmidtS.BorkP.SnelB. (2003). STRING: A database of predicted functional associations between proteins. Nucleic Acids Res.31 (1), 258261. doi: 10.1093/nar/gkg034

  • 41

    MeyersB. C.KozikA.GriegoA.KuangH.MichelmoreR. W. (2003). Genome-wide analysis of NBS-LRR-encoding genes in arabidopsis. Plant Cell15 (4), 809834. doi: 10.1105/tpc.009308

  • 42

    NeupaneS.AndersenE. J.NeupaneA.NepalM. P. (2018). Genome-wide identification of NBS-encoding resistance genes in sunflower (Helianthus annuus L.). Genes9 (8), 384. doi: 10.3390/genes9080384

  • 43

    PorterB. W.PaidiM.MingR.AlamM.NishijimaW. T.ZhuY. J. (2009). Genome-wide analysis of carica papaya reveals a small NBS resistance gene family. Mol. Genet. Genomics281, 609626. doi: 10.1007/s00438-009-0434-x

  • 44

    QiY. (2012). “Random forest for bioinformatics.“. Ensemble Mach. Learning: Methods Appl.307–323. doi: 10.1007/978-1-4419-9326-7_11

  • 45

    QuintanaS. E.SalasS.García-ZapateiroL. A. (2021). Bioactive compounds of mango (Mangifera indica): A review of extraction technologies and chemical constituents. J. Sci. Food Agric.101 (15), 61866192. doi: 10.1002/jsfa.11455

  • 46

    RombautsS.DéhaisP.MontaguM. V.RouzéP. (1999). PlantCARE, a plant cis-acting regulatory element database. Nucleic Acids Res.27 (1), 295296. doi: 10.1093/nar/27.1.295

  • 47

    RozasJ.Ferrer-MataA.Sanchez-DelBarrioJ. C.Guirao-RicoS.LibradoP.Ramos-OnsinsS. E.et al. (2017). DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol.34 (12), 32993302. doi: 10.1093/molbev/msx248

  • 48

    SadaqatM.UmerB.AttiaK. A.AbdelkhalikA. F.AzeemF.JavedM. R.et al. (2023). Genome-wide identification and expression profiling of two-component system (TCS) genes in brassica oleracea in response to shade stress. Front. Genet.14 (May). doi: 10.3389/fgene.2023.1142544

  • 49

    SawalH. A.NighatS.SafdarT.AneesL. (2023). Comparative in silico analysis and functional characterization of TANK-binding kinase 1–binding protein 1. Bioinf. Biol. Insights17, 11779322231164828. doi: 10.1177/11779322231164828

  • 50

    ShannonP.MarkielA.OzierO.BaligaN. S.WangJ. T.RamageD.et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res.13 (11), 24982504. doi: 10.1101/gr.1239303

  • 51

    ShaoZ. Q.ZhangY. M.HangY. YuXueJ. YuZhouG. C.WuP.et al. (2014). Long-term evolution of nucleotide-binding site-leucine-rich repeat genes: understanding gained from and beyond the legume family. Plant Physiol.166 (1), 217234. doi: 10.1104/pp.114.243626

  • 52

    SongW.WangB.LiX.WeiJ.ChenL.ZhangD.et al. (2015). Identification of immune related LRR-containing genes in maize (Zea mays L.) by genome-wide sequence analysis. Int. J. Genomics. 2015, 11. doi: 10.1155/2015/231358

  • 53

    SukartaO. C.A.SlootwegE. J.GoverseA. (2016). Structure-informed insights for NLR functioning in plant immunity. Semin. Cell Dev. Biol.56, 134149. doi: 10.1016/j.semcdb.2016.05.012

  • 54

    Tahir ul QamarM.ZhuX.KhanM. S.XingF.ChenL.-L. (2020). Pan-genome: A promising resource for noncoding RNA discovery in plants. Plant Genome13 (3), e20046. doi: 10.1002/tpg2.20046

  • 55

    TamelingW. I.L.ElzingaS. D.J.DarminP. S.VossenJ. H.TakkenF. L.W.HaringM. A.et al. (2002). The tomato R gene products I-2 and mi-1 are functional ATP binding proteins with ATPase activity. Plant Cell14 (11), 29292939. doi: 10.1105/tpc.005793

  • 56

    TamuraK.StecherG.KumarS. (2021). MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol.38 (7), 30223027. doi: 10.1093/molbev/msab120

  • 57

    TarrD.EllenK.AlexanderH. M. (2009). TIR-NBS-LRR genes are rare in monocots: evidence from diverse monocot orders. BMC Res. Notes2, 110. doi: 10.1186/1756-0500-2-197

  • 58

    TsaiYu C.WeirN. R.HillK.ZhangW.KimH. J.ShiuS. H.et al. (2012). Characterization of genes involved in cytokinin signaling and metabolism from rice. Plant Physiol.158 (4), 16661684. doi: 10.1104/pp.111.192765

  • 59

    Ul QamarM. T.ZhuX.XingF.ChenL. (2019). PpsPCP: A plant presence/absence variants scanner and pan-genome construction pipeline. Bioinformatics35 (20), 41564158. doi: 10.1093/bioinformatics/btz168

  • 60

    WanH.YuanW.YeQ.WangR.RuanM.LiZ.et al. (2012). Analysis of TIR- and non-TIR-NBS-LRR disease resistance gene analogous in pepper: characterization, genetic variation, functional divergence and expression patterns. BMC Genomics13 (1), 1–15. doi: 10.1186/1471-2164-13-502

  • 61

    WangP.LuoY.HuangJ.GaoS.ZhuG.DangZ.et al. (2020). The genome evolution and domestication of tropical fruit mango. Genome Biol.21 (1), 117. doi: 10.1186/s13059-020-01959-8

  • 62

    XiaL.HeX.HuangX.YuH.LuT.XieX.et al. (2022). Genome-wide identification and expression analysis of the 14-3-3 gene family in mango (Mangifera indica L.). Int. J. Mol. Sci.23 (3), 1593. doi: 10.3390/ijms23031593

  • 63

    YangX.WangJ. (2015). Genome-wide analysis of NBS-LRR genes in sorghum genome revealed several events contributing to NBS-LRR gene evolution in grass species. Evol. Bioinf.12, 921. doi: 10.4137/EBO.S36433

  • 64

    YinT.HanP.XiD.YuW.ZhuL.DuC.et al. (2023). Genome-wide identification, characterization, and expression profile of NBS-LRR gene family in sweet orange (Citrus sinensis). Gene854 (August 2022). doi: 10.1016/j.gene.2022.147117

  • 65

    ZameerR.FatimaK.AzeemF.ALgwaizH. I.M.SadaqatM.RasheedA.et al. (2022). Genome-wide characterization of superoxide dismutase (SOD) genes in daucus carota: novel insights into structure, expression, and binding interaction with hydrogen peroxide (H2O2) under abiotic stress condition. Front. Plant Sci.13 (June). doi: 10.3389/fpls.2022.870241

  • 66

    ZameerR.SadaqatM.FatimaK.FiazS.RasulS.ZafarH.et al. (2021). Two-component system genes in sorghum bicolor: genome-wide identification and expression profiling in response to environmental stresses. Front. Genet.12 (November). doi: 10.3389/fgene.2021.794305

  • 67

    ZeilmakerT.LudwigN. R.ElberseJ.SeidlM. F.BerkeL.DoornA. V.et al. (2015). DOWNY MILDEW RESISTANT 6 and DMR 6-LIKE OXYGENASE 1 are partially redundant but distinct suppressors of immunity in arabidopsis. Plant J.81 (2), 210222. doi: 10.1111/tpj.12719

  • 68

    ZhangB.LiP.SuT.LiP.XinX.WangW.et al. (2018). BrRLP48, encoding a receptor-like protein, involved in downy mildew resistance in brassica rapa. Front. Plant Sci.9, 1708. doi: 10.3389/fpls.2018.01708

  • 69

    ZhangW.YuanQ.WuY.ZhangJ.NieJ. (2022). Genome-wide identification and characterization of the CC-NBS-LRR gene family in cucumber (Cucumis sativus L.). Int. J. Mol. Sci.23 (9), 121. doi: 10.3390/ijms23095048

  • 70

    ZhouT.WangY.ChenJ. Q.ArakiH.JingZ.JiangK.et al. (2004). “Genome-wide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes.“. Mol. Genet. Genomics271, 402415. doi: 10.1007/s00438-004-0990-z

  • 71

    ZhuJ.-w.HeX.-h.LiY.-z.ZhangY.-l.YuH.-x.XiaL.-m.et al. (2022). Genome-wide analysis of the mango SPL family and overexpression of miSPL13 confers early flowering and stress tolerance in transgenic arabidopsis. Scientia Hortic.305, 111363. doi: 10.1016/j.scienta.2022.111363

  • 72

    ZiaK.RaoM. J.SadaqatM.AzeemF.FatimaK.QamarM. T. ulet al. (2022). Pangenome-wide analysis of cyclic nucleotide-gated channel (CNGC) gene family in citrus spp. Revealed their intraspecies diversity and potential roles in abiotic stress tolerance. Front. Genet.13. doi: 10.3389/fgene.2022.1034921

Summary

Keywords

draft pan-genome, mango, NBS-LRR, multi omics, machine learning

Citation

Tahir ul Qamar M, Sadaqat M, Zhu X-T, Li H, Huang X, Fatima K, Almutairi MM and Chen L-L (2023) Comparative genomics profiling revealed multi-stress responsive roles of the CC-NBS-LRR genes in three mango cultivars. Front. Plant Sci. 14:1285547. doi: 10.3389/fpls.2023.1285547

Received

30 August 2023

Accepted

17 October 2023

Published

30 October 2023

Volume

14 - 2023

Edited by

Shaojun Dai, Shanghai Normal University, China

Reviewed by

Yang Zhiquan, Guangzhou University, China; Xin Wei, Shanghai Normal University, China

Updates

Copyright

*Correspondence: Muhammad Tahir ul Qamar, ; Ling-Ling Chen,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics