AUTHOR=Nissan Nour , Hooker Julia , Arezza Eric , Dick Kevin , Golshani Ashkan , Mimee Benjamin , Cober Elroy , Green James , Samanfar Bahram TITLE=Large-scale data mining pipeline for identifying novel soybean genes involved in resistance against the soybean cyst nematode JOURNAL=Frontiers in Bioinformatics VOLUME=Volume 3 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2023.1199675 DOI=10.3389/fbinf.2023.1199675 ISSN=2673-7647 ABSTRACT=Soybean cyst nematode (SCN) [Heterodera glycines Ichinohe] is a devastating pathogen of soybean [Glycine max (L.) Merr.] that is rapidly becoming an economic issue at a global scale. Two loci conferring SCN resistance, Rhg1 and Rhg4, have been identified in soybean; however, they offer declining protection. Therefore, it is imperative that we identify additional mechanisms for SCN resistance. In this paper, we develop a bioinformatics pipeline to identify protein-protein interactions (PPI) related to SCN resistance by data mining massive-scale datasets. The pipeline combines two leading sequence-based PPI predictors, PIPE4, a version of the Protein-protein Interaction Prediction Engine (PIPE), and SPRINT (Scoring PRotein INTeractions), to predict high-confidence interactomes. First, we predicted the top soybean protein interaction partners of the rhg1 and Rhg4 proteins. The PIPE4 and SPRINT predictions overlap in 58 soybean interacting partners, 19 of which have GO terms related to defense. Beginning with the top predicted interactors of rhg1 and Rhg4, we implement a “guilt by association” in-silico proteome-wide approach to identify novel soybean genes that may be involved in SCN resistance. This pipeline identified 1082 candidate genes whose local interactomes overlap significantly with the rhg1 and Rhg4 interactomes.
Using GO enrichment tools, we highlighted several important genes, including five with GO terms related to response to nematode (GO:0009624): Glyma.18G029000, Glyma.11G228300, Glyma.08G120500, Glyma.17G152300, and Glyma.08G265700. This study is the first of its kind to predict interacting partners of the known resistance proteins rhg1 and Rhg4, forming an analysis pipeline that enables researchers to focus their search on high-confidence targets to identify novel SCN resistance genes in soybean.
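
The two steps described in the abstract, taking the consensus of two predictors and then ranking candidates by how strongly their local interactomes overlap a reference interactome, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which statistic defines a "significant" overlap, so the hypergeometric upper tail used here is an assumption, as are the function names, the `alpha` cutoff, and the toy protein identifiers.

```python
from math import comb


def consensus_partners(pipe4_hits, sprint_hits):
    """Return partners predicted by both tools (plain set intersection)."""
    return set(pipe4_hits) & set(sprint_hits)


def overlap_pvalue(candidate_partners, reference_partners, proteome_size):
    """Hypergeometric upper tail P(X >= k) for the observed overlap.

    Assumed stand-in for the significance test; the source abstract does
    not specify the statistic actually used.
    """
    n = len(candidate_partners)          # size of the candidate's interactome
    big_k = len(reference_partners)      # size of the reference interactome
    k = len(set(candidate_partners) & set(reference_partners))
    total = comb(proteome_size, n)
    tail = sum(
        comb(big_k, i) * comb(proteome_size - big_k, n - i)
        for i in range(k, min(n, big_k) + 1)
    )
    return tail / total


def rank_candidates(interactomes, reference_partners, proteome_size, alpha=0.05):
    """Keep candidates whose overlap p-value is <= alpha, most significant first."""
    ref = set(reference_partners)
    scored = []
    for gene, partners in interactomes.items():
        p = overlap_pvalue(set(partners), ref, proteome_size)
        if p <= alpha:
            scored.append((gene, len(set(partners) & ref), p))
    return sorted(scored, key=lambda row: row[2])


# Toy data with hypothetical identifiers (not real soybean genes).
reference = consensus_partners({"A", "B", "C", "D"}, {"B", "C", "D", "E"})
candidates = {"g1": {"B", "C", "X"}, "g2": {"X", "Y", "Z"}}
for gene, shared, p in rank_candidates(candidates, reference, proteome_size=20):
    print(gene, shared, p)
```

In a proteome-wide screen, a multiple-testing correction would normally be applied to the per-gene p-values before thresholding; that step is omitted here for brevity.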