Genome-Based Drug Target Identification in Human Pathogen Streptococcus gallolyticus

Streptococcus gallolyticus (Sg) is an opportunistic Gram-positive, non-motile bacterium that causes infective endocarditis, an inflammation of the inner lining of the heart. Because Sg has acquired resistance to the available antibiotics, there is a dire need to find new therapeutic targets and potent drugs to prevent and treat this disease. In the current study, an in silico approach is used to link genomic data of Sg species with its proteome to identify putative therapeutic targets. A total of 1,138 core proteins were identified using a pan genomic approach. Further, using subtractive proteomic analysis, a set of 18 proteins, essential for the bacterium and non-homologous to the host (human), was identified. Out of these 18 proteins, 12 cytoplasmic proteins were selected as potential drug targets. These selected proteins were subjected to molecular docking against drug-like compounds retrieved from the ZINC database, and the top docked compounds with the lowest binding energies were identified. In this work, we have identified novel drug and vaccine targets against Sg, some of which have already been reported and validated in other species. Owing to this experimental validation, we believe our methodology and results are a significant contribution to drug/vaccine target identification against Sg-caused infective endocarditis.


INTRODUCTION
Streptococcus gallolyticus (Sg) is a Gram-positive, non-motile bacterium previously referred to as Streptococcus bovis. It is a phenotypically diverse bacterium belonging to the Lancefield Group D streptococci (Pasquereau-Kotula et al., 2018; Arjun et al., 2020). This bacterium grows in chains or pairs and is non-γ-hemolytic or slightly γ-hemolytic but sometimes shows α-hemolytic activity on ovine blood agar plates (Rusniok et al., 2010; Hensler, 2011). Although it is commonly present in the microflora, colonizing the gastrointestinal tract of approximately 2.5-15% of healthy individuals (Hinse et al., 2011), it can become an opportunistic pathogen causing various diseases, including infective endocarditis, colon cancer, meningitis, and septicemia.
This opportunistic pathogenesis of Sg depends on genes involved in the production of polysaccharides, of glucan mucopolysaccharide (a putative component of the biofilm produced by this species), and of three types of pili and a collagen-binding protein (Takamura et al., 2014). These genes provide protection from the host immune system and help in adherence to the epithelial lining of the heart (Rusniok et al., 2010), causing infection and resulting in endocarditis (Millar and Moore, 2004).
Over the last two decades, a significant rise in the incidence of infective endocarditis has been observed worldwide (Tripodi et al., 2005; Marmolin et al., 2016; Shahid et al., 2018; Arregle et al., 2019; Chamat-Hedemand et al., 2020). Between 2.6 and 7 cases of endocarditis per 100,000 population are reported per year, a significant proportion of which is attributable to streptococcal infections, with an incidence of 17% in North America, 31% in other European countries, 39% in South America, and 32% in the rest of the world (Holland et al., 2016). This disease mostly occurs in elderly patients (Firstenberg, 2016), and the median age of patients is ≥58 (Vilcant and Hai, 2018). The risk of developing Sg endocarditis rises with the consumption of uncooked meat or fresh dairy products, a weakened immune system, a history of hepatic disease, and comorbidities such as diabetes mellitus and rheumatic disorders (Cãruntu et al., 2014).
In the presence of a primary infection, a metabolic disorder, or an immune-compromised state, Sg can cause endocardial injury. This injury then triggers thrombus formation through the deposition of fibrin and platelets. After thrombus formation, the bacterium enters the bloodstream through the thrombus. Owing to its virulence properties, Sg can enter the bloodstream in a paracellular manner without inducing a major immune response and adheres to the damaged collagen-rich surface of the cardiac valve (endocardium). Once attached to the endocardium, the bacterium proliferates and forms a biofilm, which causes inflammation in the lining of the heart and leads to endocarditis (McDonald, 2009; Hensler, 2011).
Antibacterial drugs such as penicillin G together with gentamicin and streptomycin are the preferred medical treatments against infective endocarditis. Other options include ceftriaxone combined with gentamicin, and vancomycin in patients allergic to penicillin (Satué-Bartolomé and Alonso-Sanz, 2009). For patients with persistent fever and resistance to medical therapy, an expensive surgical intervention may be needed (Grubitzsch et al., 2016). Sg is resistant to penicillin, and one of the strains of Sg has also been found to be resistant to tetracycline (Hinse et al., 2011). Therefore, an efficient treatment strategy against endocarditis, novel therapeutic targets, and potent drugs are urgently required.
For the rapid identification of such targets, many computational methods have been established, such as core genome and subtractive genomic approaches, which allow us to identify core essential genes that do not share any homology with the human genome (Caputo et al., 2019). These approaches have been used in a number of human pathogens such as Corynebacterium diphtheriae, Corynebacterium pseudotuberculosis (Tiwari et al., 2014), and Treponema pallidum (Jaiswal et al., 2017). This study is designed to exploit in silico approaches to link Sg species genomic data with its proteome and to identify putative therapeutic targets. The approach can be used to identify potent inhibitors that may contribute to the discovery of compounds that inhibit pathogen development. The proteomes from the seven genomes of Sg were compared using a pan genome approach, from which only those genes were selected that were present in all the strains of Sg (Hinse et al., 2011). The predicted core genome was then filtered on the basis of essentiality for the bacterium; only 18 proteins were found to be essential, and all of these proteins were non-homologous to the host (human). Out of these 18 proteins, 12 cytoplasmic proteins were identified as drug targets. These essential and non-host homologous protein targets were subjected to virtual screening using a library of 11,993 compounds. The identified putative targets might be used to design peptide vaccines and to suggest novel druggable lead compounds that could bind to the proposed target proteins (Barh et al., 2011; Jamal et al., 2017; Uddin et al., 2019).

MATERIALS AND METHODS

Genome Selection
In the current study, all strains of Sg with an available complete genome were considered for the pan genome analysis. A total of seven strains of Sg were selected, and their gene and protein sequences were retrieved from NCBI.

Identification of Core Genomes
The core genome of Sg was identified from pan genome analysis using EDGAR software (Blom et al., 2016). Only those genes that were common to all the strains of Sg were selected. The selection procedure in EDGAR was as follows: one strain was selected as the reference strain, all other strains were compared with it, and the genes common to all strains were selected as the core genome. The underlying algorithm was protein Basic Local Alignment Search Tool (BLASTp) with the standard scoring matrix BLOSUM62 and an E-value cutoff of 1 × 10⁻⁵ (Blom et al., 2016).
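The core/pan distinction described above can be illustrated with plain set operations. The following is a minimal sketch, assuming that ortholog assignment has already been done and that each strain is represented by a set of gene-family identifiers; the function name, strain labels, and gene lists are hypothetical, and EDGAR's own orthology criteria are not reproduced here.

```python
from typing import Dict, Set, Tuple


def core_and_pan(gene_families: Dict[str, Set[str]],
                 reference: str) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene-family sets.

    core: families found in the reference strain and in every other strain.
    pan:  families found in at least one strain.
    """
    if reference not in gene_families:
        raise KeyError(f"unknown reference strain: {reference}")
    core = set(gene_families[reference])
    pan: Set[str] = set()
    for families in gene_families.values():
        core &= families   # keep only families shared with this strain
        pan |= families    # accumulate every family seen so far
    return core, pan


# Toy input (hypothetical strains and gene families, for illustration only).
strains = {
    "ref": {"dnaA", "murF", "leuA", "pilB"},
    "strain_2": {"dnaA", "murF", "leuA"},
    "strain_3": {"dnaA", "murF", "tetM"},
}
core, pan = core_and_pan(strains, reference="ref")
print(sorted(core), len(pan))
```

Because the core set is an intersection, the choice of reference strain does not change the result once orthologs are assigned; it matters only for how the pairwise comparisons are organized.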

Identification of Non-host Homologous Proteins
The identified core genome of Sg was then subjected to BLASTp against the human proteome to find the proteins non-homologous to the human host, using the following parameters: e-value = 0.0001, bit score ≥ 100, scoring matrix BLOSUM62, and identity ≥ 25%. Only those proteins that showed no hit against the human proteome database were selected.
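As an illustration of this filtering step, the sketch below parses BLAST tabular output (the standard 12-column `-outfmt 6` layout) and keeps the query proteins that have no hit passing the thresholds quoted above. It is only a sketch under stated assumptions: the function names and example rows are hypothetical, and the authors' actual parsing script is not described in the text.

```python
from typing import Iterable, Set


def queries_with_hits(blast_lines: Iterable[str], max_evalue: float,
                      min_identity: float, min_bitscore: float = 0.0) -> Set[str]:
    """Return query IDs with at least one hit passing all thresholds.

    Expects BLAST tabular rows: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        qseqid, pident = cols[0], float(cols[2])
        evalue, bitscore = float(cols[10]), float(cols[11])
        if (evalue <= max_evalue and pident >= min_identity
                and bitscore >= min_bitscore):
            hits.add(qseqid)
    return hits


def non_homologous(all_queries: Iterable[str], blast_lines: Iterable[str]) -> Set[str]:
    """Queries with no qualifying hit against the host proteome."""
    return set(all_queries) - queries_with_hits(
        blast_lines, max_evalue=1e-4, min_identity=25.0, min_bitscore=100.0)


# Hypothetical rows: protA has a strong human hit, protB only a weak one,
# protC has no hit at all.
rows = [
    "protA\thuman1\t45.0\t300\t150\t5\t1\t300\t1\t300\t2e-50\t180.0",
    "protB\thuman2\t22.0\t80\t60\t2\t1\t80\t5\t85\t0.5\t30.1",
]
kept = non_homologous(["protA", "protB", "protC"], rows)
print(sorted(kept))
```

The same helper, called with different thresholds and with the logic inverted (keep queries that do have a hit), would cover any similar BLASTp-based selection.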

Identification of Essential Genes
The non-host homologous proteins were subjected to BLASTp against the Database of Essential Genes (DEG) with the standard scoring matrix BLOSUM62, e-value = 0.001, and identity ≥25% to find essential proteins that are indispensable for the survival of the pathogen. The DEG consists of experimentally validated data from eukaryotes, archaea, and prokaryotes, and it covers a large number of essential genes for 31 bacteria, containing more than 12,000 bacterial essential genes (Luo et al., 2014).

Drug Target Prioritization
To determine potential therapeutic targets, several factors were used, such as molecular weight, molecular function, cellular localization, pathway analysis, and virulence (Agüero et al., 2008). Molecular weight (MW) was determined with the ProtParam tool. Targets whose MW is <100 kDa are considered the best therapeutic targets (Mondal et al., 2015). Molecular functions and biological processes of the target proteins were determined with UniProt. Subcellular localization of the pathogen proteins was predicted with CELLO. The cellular localization determines the environment in which proteins operate. It affects the function of a protein by controlling the accessibility and availability of all types of molecular interaction partners. Knowledge of protein localization often plays an important role in characterizing the cellular function of hypothetical and newly discovered proteins (Scott et al., 2005). For pathway analysis, the Kyoto Encyclopedia of Genes and Genomes (KEGG) web tool was used to determine the role of the protein targets in different cellular and metabolic pathways (Kanehisa and Sato, 2020). To assess the virulence of the protein targets, the Virulence Factor Database (VFDB) was used.
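The molecular weight criterion can be made concrete with a short calculation. The sketch below sums standard average amino acid residue masses and adds one water molecule for the free termini, which is the usual way to estimate the MW of an unmodified peptide chain. It is an assumption-laden illustration: ProtParam's own implementation is not reproduced, the function names are hypothetical, and the example sequences are toy inputs.

```python
# Standard average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-termini


def molecular_weight_da(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None


def passes_mw_filter(sequence: str, cutoff_kda: float = 100.0) -> bool:
    """True if the protein is lighter than the cutoff (100 kDa in the text)."""
    return molecular_weight_da(sequence) / 1000.0 < cutoff_kda


print(round(molecular_weight_da("GA"), 2), passes_mw_filter("MKV"))
```

Ambiguous residue codes (for example X or B) raise an error here rather than being guessed, since their mass is not defined.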

Catalytic Pocket Detection
The shortlisted potential druggable proteins were further screened to detect possible binding pockets by calculating the druggability score using DoGSiteScorer (Volkamer et al., 2012), an automated pocket detection tool used to calculate the druggability of protein cavities. This tool requires the protein of interest as a 3D structure; therefore, the SwissModel web tool was used to predict the 3D structures of the protein targets (Nielsen et al., 2010). After obtaining the 3D structures, the druggability evaluation was performed with DoGSiteScorer. This tool returns the pocket residues and a druggability score, which ranges from 0 to 1. A score closer to 1 indicates a highly druggable protein cavity.

Retrieval of Ligands
A total of 11,993 druggable molecules with a Tanimoto cutoff level of 60% were retrieved from the ZINC database (Sterling and Irwin, 2015). Partial charges were then calculated, and the energies of these compounds were minimized using an energy minimization algorithm with default parameters. All minimized structures were saved in an .mdb file. These prepared ligands were then used as the input file for molecular docking (Wadood et al., 2014).
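For readers unfamiliar with the Tanimoto measure mentioned above, it compares two molecular fingerprints A and B as T = |A ∩ B| / |A ∪ B|, so that identical fingerprints score 1 and fingerprints with no shared features score 0. The sketch below is a minimal illustration assuming each fingerprint is given as a set of "on" bit positions; the function names and toy fingerprints are hypothetical, and how the ZINC database applies its cutoff internally is not described in the text.

```python
from typing import Iterable, List, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def diverse_subset(fingerprints: Iterable[Set[int]], cutoff: float = 0.6) -> List[Set[int]]:
    """Greedy pick: keep a molecule only if it is less similar than the
    cutoff to every molecule already kept."""
    kept: List[Set[int]] = []
    for fp in fingerprints:
        if all(tanimoto(fp, other) < cutoff for other in kept):
            kept.append(fp)
    return kept


# Toy fingerprints (hypothetical bit positions).
a, b, c = {1, 2, 3, 4}, {3, 4, 5}, {1, 2, 3, 4, 5}
print(tanimoto(a, b), len(diverse_subset([a, b, c])))
```

In this toy case the third fingerprint is dropped because it shares 80% of its features with the first, while the first two are kept because their similarity is 0.4.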

Validation of 3D Structures
The quality of all the 3D structures was further validated using the RAMPAGE and ERRAT tools. RAMPAGE performs Ramachandran plot analysis and provides a validity score for the 3D structure of the target proteins. Scores ≥80 were considered good (Batut and Gingeras, 2013). For further validation, ERRAT, an online tool that identifies poorly modeled regions of a protein structure, was used. A quality factor ≥37% for the 3D structure was considered good (Saddala and Adi, 2018).

Preparation of Protein for Docking
The predicted 3D structures were further prepared for docking using the Molecular Operating Environment (MOE) tool. MOE is a robust docking tool that not only predicts the top-ranking poses but also reports the root mean-square deviation (RMSD) along with the calculated energies of the docked molecules (Pagadala et al., 2017). The 3D structures were protonated and energy-minimized (Vilar et al., 2008), and the minimized structures were then used as templates for molecular docking.

Molecular Docking of Drug Targets
FIGURE 1 | Complete workflow of drug target identification in Sg using in silico approaches.
The prepared minimized structures of the target proteins and ligands were further subjected to molecular docking carried out in MOE using MOE Dock (Figure 1). It predicted the favorable binding poses of the selected ligands in the active sites of the drug targets. Default parameters were selected for molecular docking. After docking, we analyzed the best poses for hydrogen bonding/π-π interactions, and the RMSD was then calculated in MOE (Wadood et al., 2014). The orientation of the best docked molecules was further analyzed in Chimera.
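The RMSD used to compare docked poses is the square root of the mean squared distance between matched atoms, RMSD = sqrt((1/N) Σ |a_i − b_i|²). The following is a minimal sketch, assuming both poses list the same atoms in the same order and are already in the same coordinate frame (no superposition is performed); the function name and coordinates are hypothetical, and MOE's internal calculation is not reproduced.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean-square deviation between two matched coordinate lists (Å)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: the second pose is the first shifted by 1 Å along x.
pose_ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose_dock = [(x + 1.0, y, z) for x, y, z in pose_ref]
print(rmsd(pose_ref, pose_dock))
```

Skipping superposition is a deliberate choice for redocking checks, where the receptor frame is fixed and a displaced ligand should count as a deviation.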

RESULTS

Genome Selection
The seven strains of Sg were retrieved from the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/). The selection was based on the availability of their complete genomes, to ensure the accuracy of our results. The details of the selected strains are summarized in Table 1.

Identification of Core Genomes
The core genome was identified to find drug targets that are common to all strains. Only core genes, defined as genes persistently present in all populations of an organism, were extracted (Uddin and Jamil, 2018). The core genome was identified from pan genome analysis using EDGAR software. The UCN34 strain was selected as the reference genome, and the rest of the strains were compared to it. The pan genome comprised 3,242 genes in total, of which 1,138 were core genes.

Identification of Non-host Homologous Proteins
The file generated by NCBI-BLASTp of the Sg core genome against the human proteome was parsed. Of the 1,138 core proteins, 1,115 showed no hit and were hence selected as non-homologous to the human proteome, to avoid adverse effects in the host.

Identification of Essential Genes
The 1,115 core non-host homologous proteins were subjected to BLASTp against the essential proteins present in DEG (Luo et al., 2014). A total of 176 non-homologous proteins were found to be essential for the survival of the pathogen. Among these, 18 proteins with a percent identity >25 were selected as potential drug targets, as shown in Table 2. Out of these 18 proteins, 12 cytoplasmic proteins were selected as potential drug targets. The selection of the final set of drug targets was kept strict with respect to percentage identity to the host, essentiality, and cutoff values.
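The successive reductions reported above can be summarized as a simple filtering funnel. The sketch below only restates the counts given in the text and computes the fraction retained at each step; the stage labels and function name are hypothetical.

```python
from typing import List, Tuple

# Counts as reported in the text, in the order the filters were applied.
stages: List[Tuple[str, int]] = [
    ("pan genome genes", 3242),
    ("core genes", 1138),
    ("non-host homologous", 1115),
    ("essential (DEG hit)", 176),
    ("selected (identity > 25%)", 18),
    ("cytoplasmic targets", 12),
]


def retention(funnel: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Fraction of entries kept at each step relative to the previous one."""
    return [(name, count / previous)
            for (_, previous), (name, count) in zip(funnel, funnel[1:])]


for name, fraction in retention(stages):
    print(f"{name}: {fraction:.1%} of previous stage")
```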

Drug Target Prioritization
To determine the potential therapeutic targets, various factors were considered. The molecular weight of all 12 proteins, calculated with ProtParam (ExPASy, 2020; ProtParam documentation), was <100 kDa; therefore, these molecules qualified as "druggable" (Hughes et al., 2011). All these druggable molecules were analyzed using BLASTp against the virulence factor database VFDB (Chen et al., 2005), which predicted all therapeutic targets to be virulent. Subcellular localization is a key factor in determining the function of a protein. CELLO (Yu et al., 2006) was used to predict the subcellular localization of the 12 query proteins. These query proteins were further subjected to pathway analysis using the KEGG database (Kanehisa et al., 2017). Most of the proteins appear to be involved in metabolic pathways, including enzymes, glycosyltransferases, peptidoglycan biosynthesis, degradation of proteins, and lipid biosynthesis proteins. A few of them are involved in cell signaling and cellular processes such as the secretion system and the two-component system, and very few are involved in genetic information processing and resistance pathways such as transcription factors, ribosomes, DNA replication proteins, mitochondrial biogenesis, the β-lactam pathway, and the vancomycin resistance pathway. The details of the drug target prioritization parameters and the functional annotation of the 12 essential non-host homologous proteins are shown in Table 3.
The quality of the 3D structures of the druggable proteins was further validated with RAMPAGE and ERRAT. The scores predicted by the two tools were ≥80 and ≥37%, respectively, as shown in Table 4. These scores indicate that our protein 3D structures are of good quality and could be prepared for docking.

Docking
Docking was performed for the 12 drug targets against 11,993 druggable ZINC compounds using the MOE tool. The top 100 molecules were redocked into the binding pocket of the target proteins, and finally, a set of the top 10 molecules was selected. The orientation of the best docked molecule was analyzed in Chimera.
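The two-stage shortlisting described above (top 100, then top 10) amounts to ranking compounds by docking score and keeping the best few. The sketch below is a minimal illustration, assuming that a lower (more negative) score means a better pose; the function name, compound labels, and scores are hypothetical and are not taken from the study's tables.

```python
import heapq
from typing import Dict, List, Tuple


def top_hits(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n compounds with the lowest (best) docking scores,
    ordered from best to worst."""
    return heapq.nsmallest(n, scores.items(), key=lambda item: item[1])


# Hypothetical docking scores for one target (arbitrary units).
first_pass = {
    "cmpd_A": -9.8, "cmpd_B": -7.1, "cmpd_C": -11.2,
    "cmpd_D": -6.4, "cmpd_E": -10.5,
}
shortlist = top_hits(first_pass, 3)   # stands in for the top-100 stage
best = top_hits(dict(shortlist), 1)   # stands in for the final selection
print(shortlist, best)
```

In practice the redocked scores of the shortlisted compounds would replace the first-pass scores before the final cut; that step is omitted here because its details are not given in the text.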

Validation of Docking
To validate the MOE Dock program, the cocrystallized ligand was removed from the active site and redocked within the inhibitor binding cavity of penicillin-binding protein (PDB ID: 3VSL). The RMSD value (Figure 2) was found to be 1.0968 Å, showing that our docking method is valid for the studied druggable molecules and that the MOE Dock method is therefore reliable for docking these compounds. The analysis and biological significance of each of the predicted protein-ligand interactions are described as follows.
16S rRNA methyltransferase B (BTR42_02745) is a protein that plays an important role in the methylation of cytosine at position 967 of 16S rRNA. The active site of this protein contains two conserved cysteine residues, which are located near the activated methyl group of the cofactor. One of the cysteine residues acts as a catalytic nucleophile, and the other plays an important role in the methyltransferase mechanism (Foster et al., 2003; Shen et al., 2020). The top 10 conformations are shown in Table 5 along with their ZINC ID, number of interactions, interacting residues, dock score, and minimized energy. The residues Lys285, Lys339, and Cys330 were found to interact with the active ligand (ZINC01532584). The interaction of 16S rRNA methyltransferase B with ZINC01532584 is shown in Figure 3.
Chromosomal replication initiator protein DnaA (dnaA) is a protein that plays a significant role in the initiation and regulation of chromosomal replication. In DNA replication, the initiation process is the key event in the cell cycle of all organisms. The initiation of replication starts at the origin, which is recognized and processed by the initiator protein. The structure of this protein consists of a nucleotide-binding fold connected by a long helical connector to an all-helical DNA-binding domain. The conserved motifs of this protein provide information about the two most important steps in origin processing, which are DNA binding and homo-oligomerization (Erzberger et al., 2002). Table 6 presents the top 10 protein-ligand interactions with ZINC ID, minimized energy, number of interactions, dock score, and interacting residues. ZINC71782058 was predicted as the most active lead compound against chromosomal replication initiator protein DnaA (dnaA). The protein-ligand interaction is shown in Figure 4.
Transcriptional regulator CtsR (ctsR) is an important repressor that regulates the transcription of class III stress genes in Gram-positive bacteria. CtsR controls the expression of genes encoding chaperones and proteases. These genes play an important role in the protein quality control system of bacteria. The structure of this protein consists of an N-terminal DNA-binding domain and a C-terminal dimerization domain. The N-terminal DNA-binding domain consists of a helix-turn-helix (HTH) fold, and the C-terminal dimerization domain consists of α-helices organized in a four-helix bundle. This protein also plays an important role in pathogenicity, as it benefits the bacterium under stress conditions and improves its chances of survival (Fuhrmann et al., 2009). The top 10 lead molecules against this protein are shown in Table 7 with their ZINC ID, minimized energy, dock score, number of interactions, and interacting residues. The best interaction was shown by ZINC79090716, as shown in Figure 5.
Phosphotransferase system (PTS) fructose transporter subunit IIA (DW662_04200) is a protein that is involved in the phosphoenolpyruvate-dependent sugar PTS, a major carbohydrate transport system in bacteria. The PTS catalyzes the translocation, with concomitant phosphorylation, of sugars and hexitols, and it also regulates metabolism in response to the availability of carbohydrates. It includes two cytoplasmic proteins, enzyme I and HPr. First, enzyme I transfers phosphoryl groups from phosphoenolpyruvate to the phosphoryl carrier protein HPr. HPr then transfers the phosphoryl group to different transport complexes. PTS fructose transporter subunit IIA belongs to the fructose-mannitol family. This is a large and complex family that consists of several sequenced fructose- and mannitol-specific permeases and putative permeases of unknown specificity. This family has three domains, IIA, IIB, and IIC, of which IIA is the most specific domain for the fructose PTS transporters (Siebold et al., 2001). The top 10 protein-ligand interactions are shown in Table 8, and the best interaction, with ZINC01638334, is shown in Figure 6.
Penicillin-binding protein 2A (pbp2A) is a transpeptidase that catalyzes cell wall crosslinking, which is essential for the growth and survival of bacteria. The activation of this protein is regulated by the active site at which the crosslinking takes place (Fishovitz et al., 2014). Pathway analysis shows that it is involved in the β-lactam resistance pathway. β-Lactam antibiotics are the most widely used group of antibiotics; they exert their effect by interfering with the structural crosslinking of peptidoglycan in the bacterial cell wall. This protein has already been reported to be associated with β-lactam resistance. This antibiotic resistance is due to the inactivation of the enzymes, changes in the β-lactam targets of pbp, changes in porins, and the use of efflux pumps (Kocaoglu and Carlson, 2015). The top-ranked lead compounds are given in Table 9, where compound ZINC16942644 was predicted as the best on the basis of minimized energy, dock score, and number of interactions made (Figure 7).
UDP-N-acetylmuramoyl-tripeptide-D-alanyl-D-alanine ligase (murF) is a protein involved in the biosynthesis of peptidoglycan. Peptidoglycan is an important component of the bacterial cell wall, and the enzymes involved in its synthesis could represent potential drug targets. MurF catalyzes the final step in the biosynthesis of the peptidoglycan, in which it adds D-Ala-D-Ala to the nucleotide precursor UDP-MurNAc-L-Ala-γ-D-Glu-meso-DAP (Hrast et al., 2013). The protein-ligand interaction of the top 10 molecules is shown in Table 10, and among these molecules, the best interaction was with ZINC14681317, as shown in Figure 8.
TABLE 10 | UDP-N-acetylmuramoyl-tripeptide-D-alanyl-D-alanine ligase and its interaction profile with docked compounds, their ZINC ID, minimized energy, number of interactions, dock score, and interactive residues.

AraC family transcriptional regulator (melR) protein belongs to the AraC/XylS family, a family of transcription regulators that is widely distributed in bacteria. This protein regulates the transcription of several genes and operons that are involved in arabinose catabolism and transport. It coregulates with another transcription regulator that is also involved in the degradation of L-arabinose. By binding together, these regulators activate the transcription of five operons that are involved in the transport, catabolism, and autoregulation of L-arabinose. Its structure is composed of a C-terminal DNA-binding domain and an N-terminal domain. The C-terminal DNA-binding domain consists of two HTHs connected by an α-helix, and the N-terminal domain is responsible for dimerization and binding of L-arabinose. The structure reveals that the N-terminal domain of this protein plays an important role in the regulation of arabinose (Rodgers and Schleif, 2009; Fernandez-López et al., 2015; Malaga et al., 2016). Table 11 presents the best results against AraC family transcriptional regulator (melR), where ZINC71781167 was predicted as the top lead compound, as shown in Figure 9.
DNA polymerase III subunit alpha (dnaE) is responsible for the replication of the bacterial genome. This protein functions as a tripartite assembly that includes two core polymerases. In Escherichia coli, the core polymerases contain the catalytic α-subunit, also known as PolIIIα, the 3′-5′ exonuclease ε-subunit, and the θ subunit, whose function is essentially unknown (Wing et al., 2008). According to the function and pathway analysis, this protein is involved in DNA replication, the mismatch repair pathway, and homologous recombination. It is located in the cytoplasm, which means it could act as a drug target. The top 10 interactions of this protein with ligands are shown in Table 12 along with their ZINC ID, minimized energy, number of interactions, dock score, and interacting residues. The binding pocket residues Arg955, Lys553, Gln556, and Arg554 were predicted to contribute to the interaction with lead molecule ZINC38653615, as shown in Figure 10.
FIGURE 13 | Interaction of ribosome-binding factor A with ZINC01235906 (colored in red). The interacting residues (green) are shown making bonding (dotted lines) with the ligand.
50S ribosomal protein L28 (rpmB) plays an important role in the assembly of the ribosome. This protein is encoded by the rpmB operon. It could act as a potential drug target because of its role in ribosome assembly and functioning (Aseev et al., 2016). The functional analysis also showed its role in translation and as a structural constituent of ribosomes, which makes it a good drug target. The top 10 results for 50S ribosomal protein L28 are shown in Table 13 along with their ZINC ID, minimized energy, dock score, number of interactions, and interacting residues, and the best interaction, observed with ZINC03872713, is shown in Figure 11.
2-Isopropylmalate synthase (leuA) catalyzes the formation of 2-isopropylmalate by condensation of the acetyl group of acetyl-CoA with 2-oxoisovalerate. It is also involved in the biosynthesis of leucine, synthesizing L-leucine from 3-methyl-2-oxobutanoate (De Carvalho and Blanchard, 2006). In Mycobacterium tuberculosis, the biosynthesis of leucine is essential for the growth of the bacterium, and so this protein could act as a potential drug target. The structure of this protein consists of two domains, N-terminal and C-terminal. The N-terminal domain is a triosephosphate isomerase (TIM) barrel catalytic domain, and the C-terminal domain is a regulatory domain (Koon et al., 2004). The top 10 ligands against 2-isopropylmalate synthase (leuA) are shown in Table 14 along with ZINC ID, minimized energy, number of interactions, dock score, and interacting residues, and the best interacting protein-ligand conformation is shown in Figure 12.
Ribosome-binding factor A (rbfA) is a cold shock adaptation protein that helps bacteria to grow at low temperature (10-20 °C). This protein associates with the 30S ribosomal subunit but does not associate with 70S ribosomes or polysomes. It also interacts with the 5′-terminal helix of 16S rRNA. During cold shock adaptation, several cold shock proteins are synthesized, which allow the efficient translation of messenger RNAs (mRNAs) and facilitate the ribosome assembly that is required for the growth of bacteria (Huang et al., 2003). This protein was found to be virulent and essential for the bacterium, so it could act as a potential drug target. The best interacting lead molecules are shown in Table 15 along with ZINC ID, minimized energy, dock score, number of interactions, and interacting residues. ZINC01235906 was predicted as the top-ranked molecule, interacting with binding site residues Lys24 and Arg77 (Figure 13).
DNA-binding response regulator (DW662_02135) is a protein that mediates changes in the cell in response to the environment. This protein is part of a two-component regulatory system (TCS). Bacteria adapt to their environment through different levels of regulation, including gene expression, expression of multiple operons, stress response, sporulation, cellular motility, cell aggregation, and biofilm formation. All these levels are controlled by TCSs, primarily through transcriptional, translational, and post-translational regulation of genes, and also through different types of protein-protein interactions, including those related to virulence. A TCS consists of a histidine kinase, which senses the environmental signal, and a response regulator. The response regulator is phosphorylated by its cognate histidine kinase, and it sometimes also functions as a transcription regulator to regulate the expression of genes (Wang et al., 2007; Galperin, 2010). As this protein is non-homologous to human proteins and was also found to be essential and virulent, it could be a potential drug target against Sg. Table 16 presents the best interacting lead molecules along with their ZINC ID, interacting residues, number of interactions, dock score, and minimized energy. Binding site residues His74, Ser114, Arg117, Lys156, and Lys153 were predicted to interact with ZINC38140720, as shown in Figure 14.
For each target protein, we were able to shortlist 10 lead molecules, of which one was ranked on top. The next appropriate step would be to carry these in silico findings into in vitro and finally in vivo studies, so that the computational findings can be validated experimentally.

CONCLUSION
In the current study, we have used an in silico approach in which 1,138 core proteins of seven strains of Sg were determined from pan genome analysis. Subtractive genomics and the identification of essential genes further reduced the number of selected targets to 18. Exploiting 3D structural information and drug target prioritization criteria allowed us to prioritize 12 putative drug targets. All of the identified drug targets play an essential role in bacterial growth, survival, and virulence and could therefore act as potential therapeutic targets. Furthermore, molecular docking analysis allowed us to shortlist 10 active molecules per target, from which the best active molecule was selected on the basis of dock score, number of interactions, and binding free energy. Thus, this study provides a significant starting point for designing new and potent compounds against Sg. For future work, experimental validation of these targets is suggested to confirm their role in the survival and virulence of Sg.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
NQ, RU, and SJ conceived the idea and performed the experimental work. NQ, MF, MSh, AB, HM, MSo, and SJ performed all the analysis. NQ and RM drafted the manuscript. SJ, SB, and RU critically reviewed the manuscript and provided intellectual support. All authors contributed to the article and approved the submitted version.

FUNDING
This study was funded by the Deanship of Scientific Research at King Saud University through research group no. RG-1440-100, King Saud University, Riyadh, Saudi Arabia.