AUTHOR=Saxena Priya , Rauniyar Shailabh , Thakur Payal , Singh Ram Nageena , Bomgni Alain , Alaba Mathew O. , Tripathi Abhilash Kumar , Gnimpieba Etienne Z. , Lushbough Carol , Sani Rajesh Kumar
TITLE=Integration of text mining and biological network analysis: Identification of essential genes in sulfate-reducing bacteria
JOURNAL=Frontiers in Microbiology
VOLUME=Volume 14 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2023.1086021
DOI=10.3389/fmicb.2023.1086021
ISSN=1664-302X
ABSTRACT=The growth and survival of an organism in a particular environment depend heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that rely on sulfate reduction to meet their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA-G20) as a model SRB to categorize essential genes based on their key metabolic pathways. Herein, we report a feedback loop framework for gene-of-interest discovery, from bio-problem to gene set of interest, that combines expert annotation with computational prediction. The defined bio-problem was used to retrieve SRB genes from literature databases (PubMed and PubMed Central), and the retrieved genes were annotated to the OA-G20 genome. The resulting gene list was then used for protein-protein interaction enrichment and corroborated by pangenome analysis to classify the enriched gene sets and their respective pathways as essential or non-essential. Interestingly, the sat gene (dde_2265) from sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis.
Furthermore, pangenome analysis revealed how the genes were distributed: 69.83% of the 116 enriched genes were classified as “persistent”, suggesting that these genes are essential. Another 21.55% of the enriched genes, notably including the formate dehydrogenases and metallic hydrogenases, fell under “shell”. Our methodology suggests that semi-automated text mining and network analysis can play a crucial role in uncovering previously unexplored genes and key mechanisms, providing a baseline before any experimental studies are performed.
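The pangenome percentages above can be converted back into gene counts as a quick consistency check. The sketch below assumes only the figures stated in the abstract (116 enriched genes, 69.83% “persistent”, 21.55% “shell”). The integer counts it derives (81 and 25) and the 10 genes left over are our own arithmetic, not numbers reported by the authors, and the abstract does not say which category those remaining genes belong to.

```python
from typing import Dict

# Figures taken from the abstract; everything derived below is back-calculated.
TOTAL_ENRICHED = 116
REPORTED_PCT: Dict[str, float] = {"persistent": 69.83, "shell": 21.55}


def back_calculate_counts(pct: Dict[str, float], total: int) -> Dict[str, int]:
    """Estimate the integer gene count behind each reported percentage."""
    return {name: round(p / 100 * total) for name, p in pct.items()}


def to_percentages(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Express each count as a percentage of `total`, rounded to 2 decimals."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


counts = back_calculate_counts(REPORTED_PCT, TOTAL_ENRICHED)
recomputed = to_percentages(counts, TOTAL_ENRICHED)

# Genes not covered by the two reported categories (category not stated).
remaining = TOTAL_ENRICHED - sum(counts.values())

print(counts)      # inferred counts per category
print(recomputed)  # should reproduce the reported percentages
print(remaining, round(100 * remaining / TOTAL_ENRICHED, 2))
```

Because neighbouring integers (80 or 82, 24 or 26) give visibly different percentages for a total of 116, the reported figures pin the counts down uniquely under this assumption.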