High-quality Momordica balsamina genome elucidates its potential use in improving stress resilience and therapeutic properties of bitter gourd

Introduction Momordica balsamina is the closest wild species that can be crossed with Momordica charantia (bitter gourd), an important fruit vegetable crop. It has immense medicinal value and is placed in subclass II of the primary gene pool of bitter gourd. M. balsamina is tolerant to major biotic and abiotic stresses. Genome characterization of M. balsamina as a wild relative of bitter gourd will contribute to knowledge of the gene pool available for bitter gourd improvement. There is potential to transfer genes related to biotic resistance and medicinal importance from M. balsamina to M. charantia to produce high-quality, better-yielding, and stress-tolerant bitter gourd genotypes. Methods The present study provides the first high-quality chromosome-level genome assembly of M. balsamina, with a size of 384.90 Mb and an N50 of 30.96 Mb, using sequence data from 10x Genomics, Nanopore, and Hi-C platforms. Results A total of 632,098 transposable elements; 215,379 simple sequence repeats; 567,483 transcription factor binding sites; 3,376 noncoding RNA genes; and 41,652 protein-coding genes were identified, and 4,347 disease resistance, 67 heat stress-related, 5 carotenoid-related, 15 salt stress-related, 229 cucurbitacin-related, 19 terpene-related, 37 antioxidant activity-related, and 6 sex determination-related genes were characterized. Conclusion Genome sequencing of M. balsamina will facilitate interspecific introgression of desirable traits. This information is cataloged in a web genomic resource available at http://webtom.cabgrid.res.in/mbger/. Our comparative genome analysis will be useful for gaining insight into the patterns and processes associated with genome evolution and for uncovering functional regions of cucurbit genomes.


Introduction
Momordica balsamina (2n = 2x = 22), commonly referred to as Balsam apple, Southern Balsam pear, or African pumpkin, is a wild plant belonging to the genus Momordica within the family Cucurbitaceae (Bharathi and John, 2013). It is native to tropical regions of Africa, Asia, and Australia (Jeffrey, 1967; Mishra et al., 1986). M. balsamina has an annual to perennial life cycle and grows as a trailing herb (John, 2005; Behera et al., 2010). It grows better in hot, humid climates and prefers acidic soil (pH 5.0-6.5) (Mishra et al., 1986). The ellipsoid immature fruits of M. balsamina are rich in essential vitamins (A and C) and vital minerals (iron and calcium) (Wehner et al., 2020). Additionally, its leaves are abundant in carotenoids (Mashiane et al., 2022). These nutritionally and medicinally enriched fruits and leaves are consumed in rural areas of Africa and Asia (Flyman and Afolayan, 2007; Bharathi and John, 2013). It is one of the four Momordica species cultivated in India, primarily in the dry regions of the Northwest plains, Eastern Ghats, and Western Ghats (Peter and Abraham, 2007).
Balsam apple has the reputation of a "gifted plant" owing to its richness in bioactive compounds, which offer diverse therapeutic benefits. These compounds exhibit a wide spectrum of medicinal properties, including anti-septic, anti-microbial, anti-bacterial, anti-viral (including anti-HIV), anti-inflammatory, anti-plasmodial, antioxidant, and analgesic properties (Hassan and Umar, 2006; Thakur et al., 2009). The extensive range of medicinal properties exhibited by M. balsamina can be attributed to its diverse array of terpenoid compounds, such as balsaminol, balsaminoside, balsaminagenins, karavilagenin, cucurbalsaminol, and balsaminapentaol (Ramalhete et al., 2009; Ramalhete et al., 2010; Ramalhete et al., 2011a; Ramalhete et al., 2011b). Numerous studies have been conducted on these compounds to highlight their potential medical uses. Additionally, cucurbitacins derived from M. balsamina were found to have selective antiproliferative activity against multidrug-resistant cancer cells (Ramalhete et al., 2022). Furthermore, Balsam apple contains ribosome-inactivating proteins (RIPs) such as Momordin II and Balsamin, which possess remarkable antiviral, anticancer, and antibacterial properties. These RIPs have found practical applications in the development of commercial drug preparations (Khare, 2007; Kaur et al., 2012; Ajji, 2016; Ajji et al., 2017). The findings of these studies demonstrate the immense potential of M. balsamina for the pharmaceutical industry, making it a subject of intense scientific research among cucurbitaceous vegetable crops.
Momordica charantia, commonly known as bitter gourd, is the most widely cultivated vegetable within the genus Momordica, renowned for its distinctive bitter taste, which is attributed to cucurbitane-type triterpenoids (Chen et al., 2005). The fruits of bitter gourd are abundant in vitamin C and iron and exhibit high antioxidant activity (Behera et al., 2010). Beyond its culinary use, it finds extensive application in traditional medicine for alleviating stomach pain, anemia, malaria, coughs, and fever, and it is a renowned source of anti-diabetic drugs in the pharmaceutical industry (Tan et al., 2008; Krawinkel et al., 2018). Despite its biological and economic significance, crop improvement and varietal development programs in bitter gourd have been hindered by the limited genetic diversity found in natural populations (Dhillon et al., 2016). Furthermore, bitter gourd, being a crop of the tropics and subtropics, is affected by various biotic and abiotic stresses. To overcome these obstacles, there is a critical need for diverse and valuable genetic resources to facilitate the development of elite high-yielding and resilient bitter gourd varieties (Cui et al., 2020).
Among the seven Momordica species found in India, M. charantia and M. balsamina are the only two species with monoecious sex expression. These two species share the same basic chromosome number of x = 11 and exhibit similar frequencies and ranges of bivalent and chiasmata formation. This high karyomorphological similarity indicates a close ancestral relationship between the two species (Trivedi and Roy, 1972; Singh, 1990; Bharathi et al., 2011). M. balsamina, in particular, is considered the closest wild relative that can be crossed with bitter gourd, falling under subclass II of the primary gene pool of bitter gourd (Bharathi et al., 2012). M. balsamina also possesses a high level of tolerance to pests such as ladybird beetle (Epilachna septima), pumpkin caterpillar (Margaronia indica), red pumpkin beetle (Aulacophora foveicollis), gall fly (Lasioptera falcata), and root-knot nematode (Meloidogyne incognita), and to diseases such as yellow mosaic and little leaf disease, making it an invaluable genetic resource for the improvement of M. charantia (Rathod et al., 2021). Hence, in addition to its medicinal attributes, M. balsamina can serve as a potent genetic source of biotic stress resistance.
Interspecific hybridization has proven to be a successful method for harnessing natural genetic variation and transferring desirable genes from wild relatives to cultivated crops (Bowley and Taylor, 1987; Dempewolf et al., 2017). In the Cucurbitaceae family, successful interspecific hybrids have been developed within and between wild and cultivated taxa (Weeden and Robinson, 1986; Singh, 1991; Robinson and Decker-Walters, 1997). Likewise, there is great potential for the transfer of beneficial genes from M. balsamina to M. charantia for the genetic improvement of bitter gourd. Previous studies have reported partial cross-compatibility between M. charantia and M. balsamina, resulting in progenies exhibiting normal meiosis (Singh, 1990; Bharathi et al., 2012). Recently, a detailed study on crossability involving 116 diverse bitter gourd genotypes demonstrated success in six cross-combinations (Rathod et al., 2021). The study also confirmed the partial introgression of chromosome segments from M. balsamina into the bitter gourd genome through morpho-cytological and molecular analysis of interspecific hybrids between M. charantia cv. Pusa Aushadhi × M. balsamina and their advanced generations (F2 and backcross generations). These findings suggest the possibility of transferring genes or traits related to biotic resistance and medicinal properties from M. balsamina to M. charantia to produce high-quality and resistant bitter gourd varieties.
The era of genomics-assisted vegetable breeding commenced with the completion of the cucumber whole-genome assembly in 2009 (Huang et al., 2009). In 2016, the first draft genome of bitter gourd was published (Urasaki et al., 2017), followed by subsequent high-quality, chromosome-level assemblies (Cui et al., 2020; Matsumura and Urasaki, 2020). With advancements in sequencing technologies and bioinformatics tools, genomic data for flowering plants have been expanding rapidly (Chen et al., 2018), and genome assemblies for most cultivated cucurbits are now available in the public domain. Presently, there is a focus on genome characterization of closely related cross-compatible crop wild relatives (CWRs).
CWRs serve as a dynamic gene pool for accessing the genetic diversity needed for crop improvement. Earlier, molecular techniques were used to characterize CWRs (Dillon et al., 2007a; Sotowa et al., 2013). Now, advanced next-generation sequencing (NGS) platforms can be utilized for genome characterization of CWRs to study phylogeny and discover useful genes in support of agriculture and food security (Brozynska et al., 2016). Several wild relatives of tomato (Sato et al., 2012), brinjal (Gramazio et al., 2019), potato (Aversano et al., 2015), and sweet potato (Wu et al., 2018) have already been sequenced. In the current study, we present the first high-quality genome assembly of M. balsamina, a close relative of bitter gourd that can be a vital genetic resource for improving medicinal value and stress resistance in bitter gourd.

Sample collection and DNA extraction
Young leaf samples of M. balsamina (IC-467683) weighing around 10 g were collected for DNA isolation from 30-day-old seedlings at the active vegetative stage during the early morning hours. The collected leaf samples were immediately packed in aluminium foil, frozen in liquid nitrogen, and stored at −80°C. Total DNA was isolated using the modified cetyl trimethyl ammonium bromide (CTAB) method (Saghai-Maroof et al., 1984). The genomic DNA samples were adjusted to 50 ng DNA/µL and stored at 4°C until used for sequencing. The quality and quantity of the extracted DNA were estimated with an Eppendorf Biospectrometer and confirmed by electrophoresis on a 0.8% (w/v) agarose gel.

10x genomics sequencing and library preparation
High-molecular-weight DNA (1.25 ng) was loaded onto a Chromium Controller chip, along with 10x Chromium reagents and gel beads, following the manufacturer's recommended protocols. Initial library construction occurred within droplets, Gel Beads-in-Emulsion (GEMs), each containing beads with a unique barcode. The library construction incorporated the unique barcode adjacent to read one, so that all molecules within a GEM were tagged with the same barcode. Because of the limiting dilution of the genome (roughly 300 haploid genome equivalents), the probability that two molecules from the same region of the genome were partitioned into the same GEM was minimal. Thus, the barcodes were used to statistically associate short reads with their source long molecule. The resulting library was sequenced on an Illumina HiSeq X Ten sequencer (San Diego, CA, USA) as per the manufacturer's protocol to produce 2 × 150 bp paired-end sequences. The entire process was performed on four replicates; thus, four paired-end libraries were prepared.

NanoPore sequencing and library preparation
First, 5 µg of genomic DNA was sheared to approximately 15,000 bp by centrifugation at 5,200 rpm in a gTUBE. The DNA was repaired with damage repair reagent and end-repaired using end-repair mix before ligation to the nanopore blunt-end adapter. Unligated material was digested with Exo III and Exo VII. Then, 12-25 kb library fragments were purified via two consecutive AMPure cleanups, and size selection was done on a Blue Pippin (Sage Science, Beverly, MA, USA) with a 0.75% agarose cassette. An aliquot of 20 pmol of the final library was loaded onto the flow cell and sequenced on a MinION device (Oxford Nanopore Technologies, Oxford Science Park, United Kingdom) using the Oxford Nanopore sequencing kit 2.0 and the improved instrument workflow (Instrument Control Software 4.0).

Hi-C sequencing and library preparation
Fresh, young leaf samples were collected, cross-linked for 10 min with fresh formaldehyde at a final concentration of 1%, and quenched for 5 min with glycine at a final concentration of 0.2 M. The cross-linked cells were subsequently lysed in lysis buffer. The extracted nuclei were re-suspended in 150 µL of 0.1% sodium dodecyl sulfate (SDS) and incubated at 65°C for 10 min. The SDS was then quenched by adding 120 µL of water and 30 µL of 10% Triton X-100, followed by incubation at 37°C for 15 min. The DNA in the nuclei was digested by adding 30 µL of 10x NEB buffer 2.1 and 150 U of MboI and incubating at 37°C for 12 h. This was followed by inactivation of the MboI enzyme at 65°C for 20 min and filling of the cohesive ends by adding 1 µL each of 10 mM deoxythymidine triphosphate (dTTP), deoxyadenosine triphosphate (dATP), and deoxyguanosine triphosphate (dGTP), 2 µL of 5 mM biotin-14-deoxycytidine triphosphate (dCTP), and 4 µL (40 U) of Klenow, followed by incubation at 37°C for 2 h. To start proximity ligation, 120 µL of 10x blunt-end ligation buffer, 100 µL of 10% Triton X-100, and 20 U of T4 DNA ligase were added and held at 16°C for 4 h. The cross-linking was then reversed with 200 µg/mL proteinase K (Thermo Fisher Scientific) at 65°C for 12 h. Further chromatin DNA manipulations were performed using the method described by Belaghzal et al. (2017), followed by DNA purification using QIAamp DNA Mini Kits (Qiagen) and shearing of the purified DNA to a length of 400 bp. Dynabeads MyOne Streptavidin C1 (Thermo Fisher Scientific) was used to pull down the ligation junctions. The NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) was used to prepare the Hi-C library for Illumina sequencing. The final library was sequenced on the Illumina HiSeq X Ten platform (San Diego, CA, USA) as per the manufacturer's protocol in 2 × 150 bp paired-end mode.

Data pre-processing and genome assembly
All the raw reads of the 10x Genomics, Nanopore, and Hi-C libraries used in the present study have been submitted to the National Center for Biotechnology Information (NCBI) under SRA IDs SRR21495983, SRR21495982, and SRR21495981, respectively. Figure 1 shows the workflow followed in the present study. Prior to assembly, reads of these libraries were cleaned using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc; Andrews, 2010) by removing low-quality reads (Phred score < 20), followed by adapter trimming using TrimGalore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). De novo genome assembly was performed on the 10x Genomics libraries of all four replicates using Supernova v2.1.1 (Weisenfeld et al., 2017). Nanopore reads were then mapped onto the de novo assembly for further scaffolding using npScarf (Cao et al., 2017). Next, Hi-C reads were mapped onto the improved assembly using Juicer v1.5 (Durand et al., 2016) to obtain the de-duplicated alignment file. Scaffolding, editing, and polishing of the assembly were performed using 3D-DNA v180419 (Dudchenko et al., 2017). Finally, identification of chromosomes and correction of misassemblies were performed using Juicebox v1.11.08 (Robinson et al., 2018) to construct contact maps for the chromosomes. Genome polishing was performed on the final assembly using Pilon (Walker et al., 2014).

Validation of chromosome level assembly
To assess the quality of the assembled genome, assembly statistics were calculated using QUAST (Gurevich et al., 2013). Furthermore, the assembly was validated using BUSCO (Simao et al., 2015) to assess completeness and contamination within the genome assembly. A comparative study of the M. balsamina genome assembly with those of related species, namely Momordica charantia, Citrullus lanatus, Cucumis sativus, and Cucumis melo, was also performed.

FIGURE 1
Overview of the pipeline adopted in the study.
Vinay et al. 10.3389/fpls.2023.1258042 Frontiers in Plant Science frontiersin.org

Genome annotation
For genome annotation, a series of bioinformatics tools was employed. First, repeat regions of the assembled genome were masked using RepeatMasker v4.1.0 (http://www.repeatmasker.org/RMDownload.html). This was followed by the identification of transposable elements (TEs) using RepeatModeler (http://www.repeatmasker.org/RepeatModeler/) to find LINEs, SINEs, simple repeats, LTR elements, DNA elements, and so forth. The ncRNA-encoding genes were also identified from the assembled genome. tRNAs were identified using tRNAscan-SE v1.3.1 (Chan and Lowe, 2019) with < 1 false positive per 15 gigabases. Other ncRNAs, such as microRNAs, snRNAs, rRNAs, and spliceosomal RNAs, were identified using INFERNAL v1.1.4 (Nawrocki and Eddy, 2013) with default parameters. Protein-encoding genes were predicted using SEQing v0.1.45 (Lewinski et al., 2020), an automated pipeline that uses self-trained hidden Markov models (HMMs) and transcriptomic data for gene prediction by GlimmerHMM, SNAP, and AUGUSTUS and combines their results with MAKER2, using transcriptomic evidence from Momordica charantia. Finally, the predicted genes were passed through Cluster Database at High Identity with Tolerance (CD-HIT) (Limin et al., 2012), clustering at 90% sequence similarity to extract non-redundant genes. Simple sequence repeat (SSR) markers were extracted using MIcroSAtellite (MISA) (Beier et al., 2017), requiring at least 10 repeats for mononucleotide motifs, six for dinucleotide motifs, and five for tri-, tetra-, penta-, and hexanucleotide motifs (Thiel et al., 2003). As in previous reports, compound microsatellites were defined as those in which the interval between two repeat motifs was ≤100 nucleotides (Zhao et al., 2017). Primers were designed for each SSR marker using Primer3 (Untergasser et al., 2012) with the following parameters: primer length of 18-27 bp, melting temperature of 57°C-63°C, GC content of 30%-70%, and product size of 100-300 bp. Transcription factor (TF) binding sites were extracted using PlantRegMap (Jin et al., 2017).
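The SSR thresholds described above can be illustrated with a minimal sketch. This is not MISA's own code: it is a simplified regular-expression scan for perfect repeats only, the function name `find_ssrs` and the toy sequence are hypothetical, and compound microsatellites (repeats separated by ≤100 nucleotides) are not handled.

```python
import re

# Minimum number of repeats per motif length, as stated in the text:
# mono >= 10, di >= 6, tri- to hexanucleotide >= 5.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplified illustration only; it does not reproduce MISA's behaviour
    in every edge case and ignores compound SSRs.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT");
            # those are reported under the shorter motif length instead.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical toy sequence: A x12, AT x7, and AAG x5 separated by short flanks.
toy = "GGC" + "A" * 12 + "CCG" + "AT" * 7 + "GCC" + "AAG" * 5 + "TT"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Each reported tuple gives the 0-based start position, the repeat unit, and the number of copies; a production run would use MISA itself, which also reports compound repeats.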

Functional annotation of protein-coding genes
The predicted protein-coding genes were mapped against the NR database (updated May 2020) and the plant TF database (version 5.0) using NCBI blast (version 2.2.29+) (Lipman and Pearson, 1985) for functional annotation.Furthermore, gene ontology (GO) analysis was performed on predicted genes using Blast2GO (Conesa et al., 2005).Pathway analysis was performed using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Erxleben and Grüning, 2020).

Disease resistance, defence, stress, and sex expression-related genes
Disease resistance genes were identified by mapping proteins against the PRGDB database v4.0 (García et al., 2021) with an e-value cutoff of 1e-10 using BLAST (NCBI 2.2.29+) (Lipman and Pearson, 1985). Along with resistance genes, genes related to salt stress, heat stress, and sex expression were also extracted.
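As a hedged illustration of the e-value cutoff, the sketch below filters BLAST tabular output (the standard 12-column `-outfmt 6` layout, in which column 11 is the e-value and column 12 the bit score). Keeping only the best-scoring hit per query is an assumption made here for illustration and is not stated in the text; the function name and the gene and subject identifiers are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-10):
    """Keep, for each query, the best-scoring hit with e-value <= max_evalue.

    Expects BLAST tabular output with the 12 standard columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # fails the e-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits: gene1 has two passing hits, gene2 has none.
demo = "\n".join([
    "gene1\tR_A\t85.0\t200\t30\t0\t1\t200\t1\t200\t2e-50\t180",
    "gene1\tR_B\t60.0\t150\t60\t2\t1\t150\t5\t154\t1e-12\t70",
    "gene2\tR_C\t40.0\t80\t48\t3\t1\t80\t1\t80\t1e-03\t35",
])
print(filter_hits(demo))
```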

Orthologous genes, phylogenetic, and synteny analysis
M. balsamina genes orthologous with those of M. charantia, Citrullus lanatus, Cucumis sativus, and Cucumis melo were predicted using OrthoMCL (Chen et al., 2006), which applies a Markov Cluster algorithm to group putative orthologs on the basis of all-against-all BLAST (Lipman and Pearson, 1985) comparisons among the protein sequences of the species considered. Synteny between the M. balsamina genome assembly and the genome assemblies of the four abovementioned species was detected using SyMAP v4.2 (Soderlund et al., 2011). Synteny blocks were visualized in circular plots using Circos (Krzywinski et al., 2009), in which they are shown as colored ribbons between chromosomes arranged in a circle. Micro-synteny, that is, synteny within small genomic regions, was identified between the M. balsamina and M. charantia genomes using MCScan (Python version) (Tang and Krishnakumar, 2015). A phylogenetic study was also performed among the genome assemblies of M. balsamina, M. charantia, Cucumis melo, Citrullus lanatus, and Cucumis sativus. First, a multiple sequence alignment (MSA) of the genome assemblies was performed using Multiple Alignment using Fast Fourier Transform (MAFFT) (Katoh et al., 2002). Next, a distance matrix among the assemblies was calculated from the MSA using the Poisson correction method, with >70% site coverage and <30% alignment gaps, missing data, and ambiguous bases, in ClustalW2 (Thompson et al., 1994). Finally, a phylogenetic tree was constructed using the Neighbor-Joining method in ClustalW2.
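For readers unfamiliar with the Poisson correction, it is commonly defined as d = −ln(1 − p), where p is the proportion of differing sites between two aligned sequences. The following sketch shows that calculation with pairwise removal of gapped or ambiguous positions; the function name, the set of gap characters, and the toy sequences are assumptions for illustration and do not reproduce the ClustalW2 implementation.

```python
import math


def poisson_distance(seq_a, seq_b, gap_chars="-X?"):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned sequences.

    Positions where either sequence carries a gap or ambiguity character
    are ignored (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    if p >= 1.0:
        raise ValueError("distance undefined when all compared sites differ")
    return -math.log(1.0 - p)


# Hypothetical aligned toy sequences: one difference across ten sites.
print(round(poisson_distance("MKTAYIAKQR", "MKTAYIAKQL"), 4))
```

A matrix of such pairwise distances is the input that a Neighbor-Joining procedure then turns into a tree.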

Development of M. balsamina genomic resource
A web genomic resource for M. balsamina, named MbGeR, was developed using all the results obtained from the genomic data analyses performed in the present study. MbGeR catalogs information on molecular markers such as SSRs, along with transposable elements (TEs), TF binding sites, ncRNAs, and genes. It is based on a three-tier architecture, namely, a client tier, a middle tier, and a database tier, developed using PHP, MySQL, HTML, and Apache. In the client tier, web pages developed using PHP and HTML allow users to browse MbGeR and submit queries. In the database tier, all the information regarding transcripts, differentially expressed genes (DEGs), markers, and so forth is stored in separate tables in a MySQL database. In the middle tier, the client query pages are scripted in PHP and HTML to execute queries and fetch results. The resource is hosted on an Apache server. The M. balsamina web resource is available at http://webtom.cabgrid.res.in/mbger/.

Data pre-processing, genome assembly, and comparative analysis
In the present study, the whole genome of M. balsamina was assembled using reads obtained from three different platforms: Oxford Nanopore, 10x Genomics, and Hi-C. A combination of multiple technologies is reported to improve the quality and completeness of genome assembly (Wang et al., 2023). After pre-processing and quality checks, an average of 27,767,526; 2,331,456; and 168,098,715 reads were obtained from the 10x Genomics, Nanopore, and Hi-C libraries, respectively. Supplementary Table S1 shows the detailed read statistics for the different replicates and their average length in all three libraries. The GC content was 39% for the 10x Genomics and Hi-C read libraries, while Nanopore reads had 35% GC content.
A de novo genome assembly was generated from the 10x Genomics libraries, followed by mapping of the Nanopore libraries onto this assembly for further scaffolding. The Nanopore raw read length ranged from 1,000 bp to 222,917 bp, with an N50 (the length such that sequences of at least this length account for half of the total length) of 26.08 kb for raw reads and 15.29 Mb for scaffolds. Then, reads from the Hi-C libraries were used for chromosome-level scaffolding, which is considered the best choice for capturing the longest-range DNA connectedness (Wang et al., 2023).
The genome assembly of M. balsamina comprised 3,710 scaffolds with a total length of 384,902,967 bp and an N50 of 30,984,295 bp (Table 1). BUSCO analysis, which uses universal single-copy orthologs, provides a high-resolution quantification of genome completeness that facilitates informative comparisons and suggests improvements to assemblies or annotations (Simao et al., 2015). Assessment of the generated assembly showed 2,266 (97.4%) of 2,326 BUSCOs to be complete and single copy (Table 1). Comparison of the M. balsamina assembly with assemblies of related species showed that its size was comparable with the others, while its N50 value (30.96 Mb) was considerably higher than those of the other assemblies (Table 2).
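The two summary statistics used above can be reproduced with a short sketch. N50 is computed here under its usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length), and the BUSCO figure is a simple percentage. The function names and the toy scaffold lengths are hypothetical; QUAST and BUSCO report these values directly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def completeness_percent(complete, total):
    """Percentage of benchmark orthologs recovered, rounded to one decimal."""
    return round(100 * complete / total, 1)


# Hypothetical toy scaffold lengths (total 300, so half is 150).
print(n50([80, 70, 50, 40, 30, 20, 10]))   # 80 + 70 reaches 150, so N50 is 70
# BUSCO counts reported in the text.
print(completeness_percent(2266, 2326))    # 97.4
```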

Annotation of genome assembly
Genome annotation is crucial for the utilization of assembled genomes in genetic studies. In the current study, homology-based inference, in silico prediction techniques, and transcriptomic data (from Momordica charantia) were merged into a single concordant annotation (Yandell and Ence, 2012). Genome annotation was performed to identify TEs, ncRNA-encoding genes (including tRNAs), SSR markers, TF binding sites, and protein-encoding genes in the assembled genome.
Of the total 384,902,967 bp length of the 3,710 scaffolds of the assembled genome, 218,862,155 (56.73%) bases were masked. Frequencies of the various classes of predicted TEs in the genome assembly are given in Table 3. A significant proportion of TEs belonged to LTR elements, while 22.29% were unclassified. The frequency of SINEs was the lowest (0.05%), while that of LINEs was 3.02%. A total of 567,483 TF binding sites were predicted in the M. balsamina genome; Figure 2A shows their chromosome-wide distribution. The maximum number of TF binding sites was observed on chromosome 2 (~12%), followed by chromosome 1 (~9%) and chromosome 11 (~9%). About 12% of TF binding sites were associated with the remaining unknown scaffolds (Figure 2A). A total of 215,379 SSR markers were mined from the assembled genome. The highest number of SSRs had mononucleotide motifs (~69%), followed by di- (~13%) and trinucleotide (~6%) motifs. A total of 29,618 (~9%) SSRs were of the compound type (Figure 2B). A total of 3,376 different non-coding RNA genes were predicted in the M. balsamina assembly, comprising 1,823 tRNA, 270 rRNA, 150 microRNA, 961 snoRNA, 27 SRP RNA, and 129 spliceosomal RNA genes (Table 1). Of the 1,823 predicted tRNA genes in the M. balsamina assembly, the highest number was found on chromosome 1, followed by chromosomes 3 and 2. The lowest number of tRNA genes was observed on chromosome 4 (Figure 2C). Apart from the chromosomes, a large number of tRNA genes were localized on unknown scaffolds. Figure 2D shows the frequencies of protein-coding genes distributed over the various chromosomes, along with the 74 pseudogenes predicted in the M. balsamina assembly. The highest numbers of protein-coding genes were found on chromosome 1 (4,592), followed by chromosomes 2 (4,410) and 3 (3,909).

Functional annotation of protein-coding genes
Functional annotation of protein-coding genes yielded a total of 33,450 genes annotated against the NR database. GO analysis of these annotated genes showed 52 GO terms to be associated with 20,525 genes, of which 16, 12, and 25 were from the cellular component, molecular function, and biological process classes, respectively. Figure 3A shows the GO terms associated with more than five protein-coding genes predicted in the M. balsamina assembly. In the molecular function class, binding activity (11,892) was associated with the most genes, followed by catalytic activity (9,604) and transporter activity (889). In the biological process class, cellular process (8,650) was the most frequent GO term, followed by metabolic process (8,458) and biological regulation (1,252). Cell (5,026), cell part (5,026), and membrane (4,850) were the most frequent GO terms in the cellular component class (Figure 3A). Figure 3B shows the top 10 KEGG pathways associated with 3,414 annotated genes in the M. balsamina assembly. Metabolic pathways (>1,500 genes involved) were the most abundant, followed by biosynthesis of secondary metabolites (~700 genes involved) and microbial metabolism in diverse environments (~250 genes involved).

Genes related to plant defence, medicinal properties, and sex expression
M. balsamina is well known for its biotic and abiotic stress tolerance and medicinal properties. In the M. balsamina assembly, a total of 4,347 disease resistance genes (R genes) were identified, of which 1,174 encoded nucleotide-binding site-leucine-rich repeat (NBS-LRR) domains, 858 encoded RLPs, and 273 encoded RLKs, all of which are well known for their role in the resistance response in plants. We identified 67 heat stress-related genes, including a total of 34 heat stress factor genes (HSFs), which contribute to thermo-tolerance through the regulation of heat shock proteins (HSPs). In addition, 29 HSP genes, predominantly encoding HSP70, and 17 small heat shock protein (HSP20) genes were identified in the M. balsamina assembly. Similarly, 15 genes encoding proteins related to salinity tolerance were identified in the M. balsamina assembly, including alkaline ceramidase (ACER), S-acyltransferase, salt stress root protein RS1-like, and protein RICE SALT SENSITIVE 3 isoforms. Cucurbit crops are considered models for deciphering the mechanism of sex determination in monoecious plant species, and ethylene is considered to be the core regulator. To shed more light on this, six genes related to ethylene biosynthesis were extracted in the current study. M. balsamina contains a diverse array of cucurbitacin terpenoid compounds exhibiting anti-septic, anti-microbial, anti-bacterial, anti-viral (including anti-HIV), anti-inflammatory, anti-plasmodial, antioxidant, and analgesic properties (Thakur et al., 2009; Ramalhete et al., 2022). Genes related to terpenoid biosynthesis were searched for in the genome to elucidate the mechanism behind the medicinal properties exhibited by this species. Thirty-seven antioxidant activity-related genes and 229 genes related to the biosynthesis of cucurbitacin, the key factors behind the medicinal attributes of M. balsamina, were detected. Table 4 shows the frequencies of the extracted genes together with their functions. GO terms of pathogenesis-related genes, heat tolerance genes, salt tolerance-related genes, sex determination-related genes, triterpenoid-related genes, cucurbitacin-related genes, nutrition-related genes, and phloem-related genes are graphically represented in Supplementary Figure S1.

Orthologous genes, phylogenetic, and synteny analysis
Comparative genetic parameters such as orthology, synteny, and phylogeny were utilized in the study to understand the genome composition, evolution and relatedness among the members of a

A B
(A) GO terms associated with predicted protein coding genes and (B) top 10 KEGG pathways associated with annotated protein coding genes in M. balsamina assembly.
family or clade at the nucleotide/molecular level. A total of 1,542 genes of M. balsamina were found to be orthologous with genes of the other related species considered in the present study. Frequencies of these genes are provided in Table 5, along with the species in which the orthologs were found. The unique and overlapping M. balsamina genes found orthologous in other related species are delineated in Figure 4A. We observed that 165, 159, 953, and 136 M. balsamina genes were orthologous exclusively in Cucumis melo, Citrullus lanatus, M. charantia, and Cucumis sativus, respectively, and the rest of the genes were orthologous in more than one species. Syntenic relationships of M. balsamina with the other species were then analyzed. In the synteny analysis, the sequences of related species were aligned, conserved genes between the two genomes were identified as anchors, and regions with more than seven anchors connecting two species were considered synteny blocks. Frequencies of orthologous genes and syntenic blocks of M. balsamina with the related species M. charantia, Citrullus lanatus, Cucumis sativus, and Cucumis melo were found to be (8,845, 306), (8,308, 264), (8,265, 245), and (8,092, 282), respectively (Table 5). The syntenic blocks are also represented diagrammatically as Circos figures for M. balsamina and Cucumis sativus, M. balsamina and Cucumis melo, M. balsamina and Citrullus lanatus, M. balsamina and M. charantia (all scaffolds), and M. balsamina and M. charantia (scaffolds >100Mb), respectively (Supplementary Figures S2A-E). A general absence of a one-to-one relationship between the chromosomes of Momordica balsamina and those of the other cucurbit genomes was observed. However, syntenic loci of one Momordica balsamina chromosome exhibited a syntenic relationship with one or two chromosomes of the studied species. Supplementary Figures S3A-K show homologous genes on chromosomes 1-11 of M. balsamina with syntenic relationships to the corresponding scaffolds of M. charantia. The rooted phylogenetic tree was constructed to represent the phylogenetic relationship of M. balsamina with other related species, namely, M. charantia, Cucumis melo, Cucumis sativus, and Citrullus lanatus (Figure 4B). M. balsamina was observed to be most closely related to M. charantia.
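The synteny block criterion described above (regions with more than seven anchors connecting two species) can be illustrated with a minimal Python sketch. The input layout, a list of (block_id, gene_a, gene_b) tuples, the function name, and the toy gene identifiers are assumptions made for illustration; they do not reproduce the output format of the synteny tool used in the study.

```python
from collections import Counter

# Threshold taken from the text: a region counts as a synteny block
# only if it has MORE than seven anchors.
MIN_ANCHORS = 7


def synteny_blocks(anchors, min_anchors=MIN_ANCHORS):
    """Count anchors per candidate region and keep regions above the threshold.

    anchors: iterable of (block_id, gene_a, gene_b) tuples (hypothetical layout),
             where gene_a and gene_b are a conserved gene pair between two genomes.
    Returns a dict mapping block_id to its anchor count.
    """
    counts = Counter(block_id for block_id, _, _ in anchors)
    return {block: n for block, n in counts.items() if n > min_anchors}


# Toy data: region "blk1" has 8 anchors, region "blk2" has exactly 7.
toy_anchors = (
    [("blk1", f"Mb{i}", f"Mc{i}") for i in range(8)]
    + [("blk2", f"Mb{i}", f"Cs{i}") for i in range(7)]
)
kept = synteny_blocks(toy_anchors)
print(kept)
```

In this toy input only the region with eight anchors is retained, because the criterion is strictly "more than seven".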

Development of M. balsamina webgenomic resource
A web genomic resource for M. balsamina, named MbGeR, was developed from the output of the genomic data analyses performed in the present study. Its web interface includes a home page that introduces MbGeR and provides horizontal and vertical tabs (Statistics, SSRs, TEs, TF sites, ncRNAs, Genes, and Team), each linked to its respective page (Figure 5). The Statistics page summarizes the data in the resource as histograms. On the SSRs page, users can select SSR data for any of the 11 chromosomes of M. balsamina along with the desired motifs. On the TEs page, users can choose TEs by type and chromosome number. The TF sites page provides options to choose TF binding sites on the desired chromosome. On the ncRNAs page, users can select noncoding RNAs among the various types. The Genes page has two options: (i) selection of chromosomes for all genes extracted from the genome and (ii) selection of extracted genes associated with a certain function. Once the desired options are submitted on any of these pages, the output is displayed in tabular form for the chosen combination of options. The Team page provides information on, and hyperlinked profiles of, the team members involved in the study. The web resource MbGeR is available to the research community for non-commercial use at http://webtom.cabgrid.res.in/mbger/.
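The kind of selection offered on the SSRs page (choosing chromosomes and motifs, then viewing the matching rows as a table) can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields, the function name, and the toy values are hypothetical and do not describe the actual MbGeR schema or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSR:
    """One SSR record (hypothetical fields, not the MbGeR schema)."""
    chromosome: str
    motif: str
    start: int
    end: int


def select_ssrs(records, chromosomes=None, motifs=None):
    """Return records matching the chosen chromosomes and motifs.

    A filter set to None means "no restriction" for that field.
    """
    return [
        r for r in records
        if (chromosomes is None or r.chromosome in chromosomes)
        and (motifs is None or r.motif in motifs)
    ]


# Toy records with made-up coordinates.
toy_ssrs = [
    SSR("Chr1", "AT", 100, 120),
    SSR("Chr1", "AAG", 500, 530),
    SSR("Chr2", "AT", 40, 58),
]
for row in select_ssrs(toy_ssrs, chromosomes={"Chr1"}, motifs={"AT"}):
    print(row.chromosome, row.motif, row.start, row.end)
```

Treating an unset filter as "no restriction" mirrors the idea of flexible option combinations described for the resource pages.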

Discussion
CWRs are the primary source of diversity for crop improvement. In crops with a narrow genetic base in particular, the lack of diversity becomes a major bottleneck in breeding programs. To address this issue, close wild relatives that are inter-fertile with the cultivated crop species can be used as an extended gene pool in crop improvement (Brozynska et al., 2016). CWRs evolve continuously in the natural environment and hence serve as a dynamic resource of desirable genes for overcoming the challenges posed to agriculture by an increasing human population and climate change. Several workers have documented the wide-scale use of CWRs to enhance agricultural production (Maxted et al., 2012; Fitzgerald, 2013; Dempewolf et al., 2014; Kell et al., 2015). It was estimated that about 30% of the increase in crop yields in the late 20th century can be attributed to the use of CWRs in plant breeding programs (Pimentel et al., 1997). Hence, there is an increased need to conserve and characterize wild germplasm for use in crop improvement programs. Molecular tools [e.g., simple sequence repeat (SSR) markers or microsatellites] were used in the past to characterize CWRs and to establish the relationship between wild and domesticated species (Dillon et al., 2007a; Sotowa et al., 2013). Recent advances in DNA sequencing technology have increased opportunities to understand species at the whole-genome level (Edwards and Henry, 2011). Hence, genomic tools serve as the best strategy to characterize CWRs and elucidate phylogenetic relationships between species, so that wild genetic diversity can be used in crop improvement (Kasem et al., 2010). M. balsamina (Balsam apple) is the closest wild species that is cross-compatible with M. charantia; it exhibits greater tolerance to biotic stress and possesses medicinal qualities (Rathode et al., 2021). Therefore, it is an ideal candidate species for harnessing natural variation within the primary gene pool and transferring desirable genes to cultivated M. charantia. Hence, genome characterization of this species is vital for its use in future breeding programs. In this study, we present the first high-quality chromosome-level genome assembly of M. balsamina, with a genome size estimate of 384.90 Mb and an N50 of 30.96 Mb. This study used reads from multiple platforms (Oxford Nanopore, 10x Genomics, and Hi-C), which facilitated chromosome-level scaffolding with improved base accuracy. This assembly will facilitate targeted gene introgression between M. balsamina and M. charantia to enhance stress tolerance and medicinal properties. Furthermore, this assembly, built by combining multiple technologies, can be used to further improve the quality and completeness of the genome assemblies of related species (Wang et al., 2023).
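Because the N50 value is used here as the main contiguity statistic, a short Python sketch of how N50 is conventionally computed may be helpful: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions and are not the scaffold lengths of the M. balsamina assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum with the total to avoid fractions.
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy example (arbitrary units): total is 150, so half is 75.
# Longest first: 50 (running 50), then 40 (running 90 >= 75).
toy_lengths = [10, 50, 20, 40, 30]
print(n50(toy_lengths))
```

A larger N50 indicates that more of the assembly sits in long sequences, which is why it is commonly used when comparing assemblies of related species.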
Approximately 89.44% (345 Mb) of the assembly was anchored on 11 chromosomes, while the remaining scaffolds remained unlocalized. The quality of this assembly, based on the N50 value, surpasses that of previously published assemblies for other members of the Cucurbitaceae family, such as cucumber (Huang et al., 2009), melon (Garcia-Mas et al., 2012), watermelon (Guo et al., 2013), and bitter gourd (Cui et al., 2020; Matsumura and Urasaki, 2020). Additionally, the BUSCO analysis revealed that the M. balsamina assembly contains 97.4% of the conserved core genes, a higher percentage than the M. charantia assemblies [Dali-11 (95.9%), TR (95.5%), and OHB3-1 (82.20%)] and the assemblies of related species: C. lanatus (86.50%), C. melo (86.9%), Cucurbita pepo (92.8%), C. sativus (94.8%), and Lagenaria siceraria (88.2%) (Cui et al., 2020). Gene space completeness, as measured by single-copy standards including universal single-copy orthologs (BUSCOs) and core gene families (CoreGFs), is widely used to evaluate the completeness and quality of genome assemblies and annotations (Vaattovaara et al., 2019). Using estimates of gene content from hundreds of species and guided by evolution, BUSCO assessments provide comprehensible metrics of genome completeness and hence are considered high-resolution quantifications of genomes (Simao et al., 2015). Therefore, with a high BUSCO score (97.4%), this assembly provides a comprehensive representation of the M. balsamina genome and serves as a valuable reference for studying the genome architecture and evolution of related cucurbits, including its closest cultivated species, M. charantia. The assembled genome of M. balsamina will aid in the identification of a greater number of genome-wide markers, allowing for the specific and accurate tracing of introgressed segments, which is crucial in interspecific introgression breeding, as reported by Qin et al. (2021). The assembly also revealed the presence of 632,098 TEs; 215,379 SSRs; 3,376 noncoding RNAs (ncRNAs); 567,483 TF binding sites; and 41,652 protein-coding genes. Many of these genes are associated with disease resistance (4,347), heat stress tolerance (67), salt stress tolerance (15), carotenoid biosynthesis (05), cucurbitacin biosynthesis (229), terpene biosynthesis (19), antioxidant activity (37), and sex determination (06). Identifying these genes provides insights into the defence mechanisms, nutritional properties, and stress responses of M. balsamina.
TEs are well recognized for their role in genome evolution and regulation, providing alternative promoters, novel exons, neofunctionalization, and extensive rearrangements (Hoen and Bureau, 2015). A comparison of our assembly with the recent M. charantia assemblies of Cui et al. (2020) and Matsumura and Urasaki (2020) revealed that the M. balsamina assembly is larger by approximately 95 Mb and 84 Mb, respectively. This difference could be attributed to a higher repeat content in the M. balsamina genome than in M. charantia. Our findings supported this hypothesis, as we observed that 56.73% (218.86 Mb) of the M. balsamina assembly was masked as TEs, which was higher than the percentages reported for the M. charantia (52.52%), cucumber (20.8%), watermelon (39.8%), and muskmelon (35.4%) assemblies. LTR repeats (26.82%) were the most abundant class in the M. balsamina genome. A high LTR repeat content is a common feature of cucurbit genomes, as is evident from the genomes of cucumber, melon, and watermelon (Huang et al., 2009; Garcia-Mas et al., 2012; Guo et al., 2013). In addition, the findings of the current study support the results of the previous genome characterization studies of bitter gourd by Urasaki et al. (2017) and Cui et al. (2020), which reported a higher accumulation of repeat content, particularly LTR repeats, in the Momordica genus than in Cucumis and Citrullus. However, the LTR repeat content in the M. balsamina genome was lower than that in watermelon (30.5%) and bottle gourd (39.8%). Earlier studies also speculated that a differential rate of LTR retrotransposon accumulation is the reason behind the differences in genome size among cucurbits. For instance, a higher accumulation of LTR retrotransposons is found in the sponge gourd (Wu et al., 2020) and watermelon (Guo et al., 2013) genomes than in cucumber (Huang et al., 2009). Hence, in the absence of whole-genome duplication (WGD) in cucurbits, TEs might be playing a vital role in genome expansion (Wu et al., 2020).
In our study, 3,376 noncoding RNA genes were annotated in the M. balsamina assembly. The extracted miRNAs, tRNAs, rRNAs, and other noncoding genes can be important resources for further studies. Additionally, we predicted 41,652 protein-coding genes in the M. balsamina assembly, a number comparable to that of the M. charantia OHB3-1 assembly (45,859) by Urasaki et al. (2017) and considerably higher than those of the assemblies of M. charantia Dali-11 (26,427) by Cui et al. (2020), cucumber (26,682) by Huang et al. (2009), melon (27,427) by Garcia-Mas et al. (2012), and watermelon (23,440) by Guo et al. (2013). The variation in gene numbers could be attributed to the use of different transcript information during the annotation of genome assemblies or to the loss of genetic diversity due to the domestication of cucurbits. Functional annotation of the protein-coding genes in our study revealed the presence of essential genes associated with detoxification, antioxidant activity, toxin activity, response to stimuli, immune system processes, defence, nutrient reservoir activity, and nutritional properties. These genes were also associated with pathways such as biosynthesis of secondary metabolites, plant hormone signal transduction, and protein processing in the endoplasmic reticulum.
M. balsamina is resistant to major pests and diseases affecting cucurbits (Rathod et al., 2021). To understand the molecular basis of pest and pathogen resistance, three major classes of resistance (R) genes were searched for in the genome. In the M. balsamina assembly, we identified 4,347 disease resistance genes (R genes), of which 1,174 encoded NBS-LRR domains. These genes were grouped into two subfamilies based on the presence of either the toll/interleukin-1 receptor (TIR) domain or the coiled-coil (CC) domain at the N-terminal region, as described by Tameling et al. (2002). Additionally, we identified 858 RLP- and 273 RLK-encoding genes involved in conferring a resistance response. Genes of these classes, exemplified by the Cf family in tomato, which confers resistance against the fungus Cladosporium fulvum (Jones et al., 1994; Thomas et al., 1997), and HcrVf2 in apple, which confers apple scab resistance (Belfanti et al., 2004), were found in lower numbers than in melon and cucumber. The number of R genes identified in M. balsamina was much higher than that reported in bottle gourd, watermelon, cucumber, and melon. However, cucurbits generally have fewer NBS-LRR-encoding genes than Arabidopsis (Baumgarten et al., 2003) and rice (Goff et al., 2002). Only 61 NBS-containing resistance genes were found in the cucumber genome (Huang et al., 2009). Likewise, of the 411 genes associated with disease resistance in melon, only 81 encoded the NBS, LRR, and TIR domains (Garcia-Mas et al., 2012). Similarly, only 44 NBS-LRR genes were found in the watermelon genome (Guo et al., 2013). So, in general, Cucurbitaceae genomes possess a comparatively small number of R genes encoding NBS-LRR proteins (Lin et al., 2013). Hence, other mechanisms might be involved in the stress response. For instance, in cucumber, expansion of the LOX gene family is speculated to be a possible complementary mechanism to cope with pathogen invasion (Huang et al., 2009). However, in M. balsamina, the defence mechanism seems to work through the involvement of R genes, as in the majority of crop plants. The variation in the number of R genes in cucurbits suggests that they are not conserved, and the differential expansion of NBS-encoding families could be attributed to segmental and whole-genome duplications during the evolution of plant species, as suggested by Wang et al. (2009). The higher number of R genes in the M. balsamina assembly suggests their potential use in improving resistance to a wide variety of prevalent biotic stresses in its closest relative, M. charantia.
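The grouping of NBS-LRR genes into two subfamilies according to the N-terminal domain (TIR or CC) can be sketched in Python as below. This is a minimal illustration, assuming each gene is described by an ordered list of domain labels with the N-terminus first; the labels, the function name, the toy gene identifiers, and the conventional abbreviations TNL and CNL are assumptions and are not taken from the annotation pipeline of the study.

```python
def classify_nbs_lrr(domains):
    """Classify one gene by its ordered domain labels (N-terminus first).

    Returns "TNL" for TIR-NBS-LRR, "CNL" for CC-NBS-LRR, "other" for
    NBS-LRR genes with a different N-terminal domain, and None when the
    gene does not carry both NBS and LRR domains.
    """
    if "NBS" not in domains or "LRR" not in domains:
        return None
    n_terminal = domains[0]
    if n_terminal == "TIR":
        return "TNL"
    if n_terminal == "CC":
        return "CNL"
    return "other"


# Toy annotations with made-up gene identifiers.
toy_genes = {
    "geneA": ["TIR", "NBS", "LRR"],
    "geneB": ["CC", "NBS", "LRR"],
    "geneC": ["NBS", "LRR"],
    "geneD": ["LRR", "kinase"],
}
for gene, doms in toy_genes.items():
    print(gene, classify_nbs_lrr(doms))
```

Returning None for genes without both NBS and LRR domains keeps other R gene classes, such as RLPs and RLKs, out of the two-subfamily grouping.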
In our study, we identified a total of 34 HSFs in the M. balsamina assembly, which was higher than the numbers reported for rice (25) by Chauhan et al. (2011), Arabidopsis (21) by Nover et al. (2001), and cucumber (23) by Chen et al. (2021). Among these genes, the primary heat stress factors identified were HSFB1 (01), HSFA2 (03), HSFA4 (04), HSFB4 (04), and HSFA6 (04), which contribute to thermo-tolerance by regulating HSPs, as described by Ohama et al. (2017). Additionally, we identified 29 HSP genes, predominantly encoding HSP70, and 17 small heat shock proteins (HSP20) in the M. balsamina assembly. HSPs play an essential role in the regulation of HSFs and, subsequently, in the expression of heat-responsive genes associated with heat tolerance. HSP20 has been reported to contribute to heat stress tolerance in melon (Zheng et al., 2021), watermelon (He et al., 2019), cucumber (Chen et al., 2021), and pumpkin (Hu et al., 2021). Over-expression of HSP70 has also been reported to significantly increase heat tolerance in watermelon, cabbage, and chilli (Park et al., 2013; Guo et al., 2015; Usman et al., 2015; Zhao et al., 2018; He et al., 2019). Therefore, the thermo-tolerance capacity of M. balsamina can be attributed to the identified HSPs, which can be functionally validated for future use. Similarly, we identified 15 genes encoding proteins related to salinity tolerance in the M. balsamina assembly, including ACER, S-acyltransferase, salt stress root protein RS1-like, and protein RICE SALT SENSITIVE 3 isoforms. These proteins have previously been reported to play a role in salinity tolerance in Arabidopsis by Wu et al. (2015) and in wheat by Kang et al. (2012). However, their role in salt tolerance in cucurbits has yet to be well documented. These genes with a possible role in salt tolerance can be studied further to understand the detailed physiological and molecular network associated with salt tolerance and to improve the salt tolerance of related species through interspecific introgression. Additionally, we found 37 glutathione S-transferase (GST) family genes in M. balsamina. GSTs are vital antioxidant enzymes that reduce the damage caused by reactive oxygen species during abiotic stress (salt, drought, and cold) (Venkateswarlu et al., 2012; Chan and Lam, 2014; Islam et al., 2019; Song et al., 2021). GSTs are also involved in detoxification processes and protection against damage from various environmental factors (Dixon et al., 1998; Esmaeili et al., 2009). The large number of GST family genes identified in M. balsamina suggests a high tolerance to abiotic stress, which can be harnessed to improve abiotic stress tolerance in M. charantia.
In the M. balsamina assembly, we identified five genes related to carotenoid biosynthesis, including chloroplast-specific lycopene beta-cyclase, phytoene desaturase/phytoene dehydrogenase, prolycopene isomerase, zeta-carotene desaturase, and lycopene epsilon cyclase. The overexpression of one or more carotenoid biosynthesis genes to produce carotene-rich varieties has been successfully employed in advanced vegetable improvement programs for crops such as tomato (Fraser et al., 2001), carrot (Fraser and Bramley, 2004), and potato (Diretto et al., 2007). Carotenoids contribute to color, serve as precursors of vitamin A, and have various health benefits, including reducing the risk of cancers and cardiovascular diseases (Paine et al., 2005; Aluru et al., 2008). Therefore, the transfer of these carotenoid biosynthesis genes from M. balsamina to M. charantia could be used to improve the nutritional value of bitter gourd. Furthermore, we identified 229 genes related to cucurbitacin biosynthesis in the M. balsamina assembly. Cucurbitacins are signature bioactive compounds of the Cucurbitaceae family and confer a bitter taste to cucurbits (Chen et al., 2005). The identified genes encode enzymes such as oxidosqualene cyclase, cytochromes P450, and acyltransferases, which are essential for cucurbitacin biosynthesis. Similar pathways and mechanisms are involved in the production of terpenoids across the genera of the Cucurbitaceae family (Huang et al., 2009; Shang et al., 2014). Moreover, we identified 19 genes related to the biosynthesis of other triterpenoids in the M. balsamina assembly. These triterpenoids have diverse medicinal properties, namely, anticancer, antidiabetic, anti-HIV, antimalarial, anti-inflammatory, and antimicrobial activities (Ramalhete et al., 2022). Many of these triterpenoids, such as balsaminol, balsaminoside, balsaminagenin, karavilagenin, cucurbalsaminol, and balsaminapentaol (Ramalhete et al., 2009a; Ramalhete et al., 2009; Ramalhete et al., 2010; Ramalhete et al., 2011a; Ramalhete et al., 2011b), have previously been isolated from M. balsamina, highlighting its potential as a source of bioactive compounds. These results confirm the value of M. balsamina in terms of its nutritional and therapeutic properties.
M. balsamina is a monoecious plant with separate male and female flowers on the same plant. Sex determination and expression in cucurbits have been extensively studied, and various phytohormones and their cross talk have been identified as key regulators (Chen et al., 2016; Wang et al., 2019). Ethylene, in particular, is considered a core regulator of sex expression in cucurbits (Yin and Quinn, 1995; Boualem et al., 2015; Chen et al., 2016). In the M. balsamina assembly, we identified six genes related to ethylene biosynthesis, including the 1-aminocyclopropane-1-carboxylate synthase (ACS) genes ACS-7, ACS-CMA101, and ACS-CMW-33. These genes are involved in the production of ethylene, which regulates sex expression in cucurbits. In Cucumis sativus, ACS-1 is encoded by the F locus and is known to promote female sex expression by suppressing stamen development in bisexual flower primordia (Trebitsh et al., 1997; Mibus and Tatlioglu, 2004). Likewise, ACS-7 is encoded by the A locus (orthologue of the cucumber M gene) and is known to promote femaleness in monoecious melon lines, and a missense mutation in CmACS-7 led to andromonoecy, the predominant sex type of commercial melon (Boualem et al., 2008; Boualem et al., 2009). Similarly, two genes (MOMC46_189, MOMC518_1) encoding a CmACS-7-like protein and a gene (MOMC3_649) encoding a CmACS-11-like protein were identified in M. charantia (Urasaki et al., 2017). The ACS-encoding genes for sex determination in M. balsamina and M. charantia were also found to be orthologous by synteny analysis. This suggests that sex expression in the Momordica genus is possibly regulated by ethylene, as in other cucurbits. The orthologous relationship of these ACS genes with those identified in M. charantia and other cucurbits suggests that sex-regulating genes are highly conserved across the Cucurbitaceae family. Additionally, our study revealed a high number of conserved genes (approximately 8,500) between M. balsamina and M. charantia, Cucumis sativus, Cucumis melo, and Citrullus lanatus, indicating a substantial level of genetic similarity and potential for comparative genomics studies among cucurbits.
Comparative plant genomics investigates the distinctiveness and differences among plant genomes. By comparing the genomes of closely and distantly related species, researchers can gain insights into the patterns and processes associated with plant genome evolution and identify functional regions within genomes (Caicedo and Purugganan, 2005). In this study, we conducted a genome comparison of Momordica balsamina with other related cucurbit species, namely, Momordica charantia (bitter gourd), Cucumis sativus (cucumber), Cucumis melo (muskmelon), and Citrullus lanatus (watermelon), in order to identify syntenic and phylogenetic relationships. Our analysis revealed that Momordica balsamina shared the highest number of orthologous pairs (8,845) with Momordica charantia, followed by 8,265 orthologous pairs between Momordica balsamina and Cucumis sativus. Previous research by Garcia-Mas et al. (2012) identified 19,377 one-to-one ortholog pairs between Cucumis melo and Cucumis sativus.
Furthermore, we detected paralogous and orthologous relationships between the five studied Cucurbitaceae genomes, which can serve as a guide for translational research and facilitate the study of conserved economic traits. Using conserved BUSCO genes (orthologous genes), we inferred the evolutionary relationships among Momordica balsamina, Momordica charantia, Citrullus lanatus, Cucumis sativus, and Cucumis melo. Phylogenetic analysis with Vitis vinifera as an outgroup placed Momordica balsamina and Momordica charantia in the same clade, indicating a close genetic relationship between these two species, with a speciation/separation event estimated to have occurred 23 million years ago. Additionally, Momordica was found to be closer to Citrullus (watermelon) than to Cucumis, suggesting a divergence around 53 million years ago. Previous studies by Jobst et al. (1998), Schaefer et al. (2009), and Urasaki et al. (2017) also reported a closer genetic association of bitter gourd with watermelon than with cucumber or melon in phylogenetic and genetic analyses.
We performed synteny analysis to elucidate variations at the nucleotide level arising from mutations, duplications, chromosomal rearrangements, and gene family expansion or loss (Alkan et al., 2011). Synteny blocks, which are chromosomal regions shared between genomes that retain a common order of homologous genes inherited from a common ancestor, were identified to shed light on evolutionary relationships between species (Vergara and Chen, 2010). Previous synteny analysis in members of the Cucurbitaceae family helped to clarify the reason behind differences in basic chromosome number between Cucumis sativus and C. melo (Huang et al., 2009; Li et al., 2011). In our current study, we found the highest number of syntenic blocks between Momordica balsamina and Momordica charantia (306), followed by 282 syntenic blocks between Momordica balsamina and Citrullus lanatus, indicating a high level of synteny between M. balsamina and M. charantia, followed by watermelon (Citrullus lanatus). Previous synteny analyses also reported a high level of colinearity between Momordica and Citrullus genomes (Urasaki et al., 2017; Cui et al., 2020). Our findings revealed a general absence of one-to-one relationships between the chromosomes of Momordica balsamina and those of other cucurbit genomes. This observation aligns with most of the synteny analyses conducted in cucurbits (Matsumura and Urasaki, 2020; Wu et al., 2020), except for the study by Wu et al. (2017), which identified chromosome-level synteny between the bottle gourd genome and the melon (C. melo) and watermelon (Citrullus lanatus) genomes. The findings of our study, along with the synteny analysis by Matsumura and Urasaki (2020), support the view that most Cucurbitaceae genomes belong to a different clade than the genus Momordica (Renner and Schaefer, 2016). Therefore, the absence of one-to-one chromosome synteny between Momordica (balsamina and charantia) and other cucurbits may be attributed to greater structural rearrangement of chromosomes after speciation.
In addition to the genome comparison and synteny analysis, we identified 215,379 SSRs and 567,483 TF binding sites (TFBSs). These data were incorporated into a genomic web resource called MbGeR, developed to provide access to the data extracted during this study. Characterizing the M. balsamina genome contributes to our understanding of the available gene pool that can be used to improve M. charantia through advanced plant breeding techniques. Given the significant therapeutic value, resilience to biotic and abiotic stress, and nutritional value of M. balsamina, this study offers valuable insights and a high-quality assembly and annotation of its genome, thereby assisting in the development of high-yielding and resistant varieties of this promising vegetable crop.

Conclusion
M. balsamina is the closest wild relative of M. charantia, with higher resilience to biotic and abiotic stresses and greater medicinal and nutritional qualities. The present study provides the first high-quality chromosome-level genome assembly of M. balsamina, with a size of 384.90 Mb and an N50 of 30.96 Mb, using sequence data from the 10x Genomics, Nanopore, and Hi-C platforms. Annotation of the assembly identified 215,379 SSRs; 632,098 TEs; 567,483 TF binding sites; 3,376 noncoding RNA genes (tRNA, miRNA, snoRNA, and so forth); and 41,652 protein-coding genes. In total, 4,347 disease resistance, 67 heat stress-related, 15 salt stress-related, 229 cucurbitacin-related, 19 terpene-related, 37 antioxidant activity, 05 carotenoid-related, and 06 sex determination-related genes were identified in the M. balsamina assembly. Because of its stress tolerance and better therapeutic value, M. balsamina will serve as a potential genomic resource, and the assembly will help to boost targeted gene introgression between M. balsamina and M. charantia in developing high-yielding, climate-smart, and stress-resilient crop varieties. In addition, this high-quality genome assembly, built using reads from multiple sequencing platforms, can be used to further improve the quality and completeness of the genome assemblies of related species. The SSR markers obtained in this study would assist in linkage mapping, QTL and gene discovery, population genetics, evolutionary studies, and studies of gene regulation. The assembly will also help in identifying a higher number of genome-wide markers with greater specificity and accuracy to trace introgressed segments during advanced breeding programs that aim to restore, in high-yielding M. charantia varieties, the resistance and medicinal value that were largely lost during the domestication of bitter gourd. Furthermore, the findings of the comparative genome analysis (phylogeny and synteny) will be helpful for gaining insights into the patterns and processes associated with genome evolution and for uncovering functional regions of cucurbit genomes.
FIGURE 2 (A) Distribution of TF binding sites on different chromosomes; (B) frequency of SSRs of different nucleotide repeat motifs; (C) frequency of predicted tRNA genes and (D) protein coding genes distributed over chromosomes along with pseudogenes predicted in M. balsamina assembly.
FIGURE 4 (A) Unique and overlapping M. balsamina genes found orthologous in other related species (Cucumis sativus, Cucumis melo, Citrullus lanatus, and M. charantia); (B) rooted phylogenetic tree represented in terms of divergence time (MYA: million years ago) based on whole genome assemblies of M. balsamina and other related species (Cucumis sativus, Cucumis melo, Citrullus lanatus, and M. charantia).

TABLE 2
Comparative statistics of M. balsamina genome assembly with genome assemblies of related species.

TABLE 3
Frequencies and proportion of various classes of TEs predicted in M. balsamina assembly.

TABLE 4
Frequencies of genes associated with disease resistance, defence, salt stress, heat stress, sex determination, and secondary metabolite synthesis identified in M. balsamina assembly.

Momordica balsamina Chr11 was syntenic to Chr6 and Chr7 of Cucumis sativus, and Chr5 was syntenic to Chr3 and Chr4 of C. sativus. Similarly, Chr8 of Momordica balsamina was syntenic to Chr11 of C. melo. Furthermore, Chr7 was colinear with Chr2 and Chr12 of melon. Synteny between M. balsamina Chr7 and Chr2 of watermelon was observed. Furthermore, Chr5 was syntenic to Chr5 and Chr7 of watermelon. The maximum numbers of genes on each chromosome of M. balsamina found homologous with genes on the corresponding scaffolds of M. charantia are shown in Supplementary Table 2.

TABLE 5
Frequencies of M. balsamina orthologous genes and syntenic blocks found in other related species.