
ORIGINAL RESEARCH article

Front. Bioinform.

Sec. Integrative Bioinformatics

Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1651623

This article is part of the Research Topic AI in Integrative Bioinformatics.

Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks

Provisionally accepted
  • 1 Texas Woman's University, Denton, United States
  • 2 The University of Texas Rio Grande Valley, Brownsville, United States

The final, formatted version of the article will be published soon.

Accurate prediction of protein-protein interactions (PPIs) is crucial to understanding cellular functions and advancing drug discovery. While some in-silico methods leverage sequence embeddings taken directly from Protein Language Models (PLMs), others apply Graph Neural Networks (GNNs) to topological features derived from PPI networks or 3D protein structures. Here, we introduce ProtGram, a novel approach that models protein primary structure through a hierarchy of globally inferred n-gram graphs. At each n-gram level, residue transition probabilities aggregated from a large sequence corpus define the edge weights of an underlying directed graph of paired residues. We propose a custom directed graph convolutional neural network, DirectGCN, whose convolutional layer processes information through separate path-specific (incoming, outgoing, undirected) and shared transformations. These components are then combined via a learnable gating mechanism that scores each aggregated path. We establish the efficacy of DirectGCN on standard node classification benchmarks, where it performs on par with established methods on general benchmark datasets while also highlighting its specialization for complex, directed, limited, and dense heterophilic graph structures. Subsequently, DirectGCN is applied to the ProtGram hierarchy of n-gram graphs to learn residue-level embeddings, which are then pooled via an attention mechanism into protein-level embeddings. We name this two-stage graph representation learning framework ProtGram-DirectGCN. The main focus of this study is to investigate less computationally intensive alternatives to PLMs for the downstream task of PPI prediction, framed as link prediction. Our method achieves robust predictive performance, suggesting that a globally inferred, directed graph-based representation of sequence transitions offers a potent and computationally distinct alternative to resource-intensive PLMs for PPI prediction. Future work includes testing ProtGram-DirectGCN on a wide range of bioinformatics tasks.
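The abstract describes the pipeline only at a high level. The following minimal, pure-Python sketch illustrates one possible reading of its three stages, under explicit assumptions that are not taken from the article: nodes are n-grams and an edge u -> v links each n-gram to the n-gram starting one residue later; edge weights are corpus-wide transition probabilities P(v | u); the undirected path uses the average of the outgoing and incoming adjacencies; the "learnable gating" is reduced to softmax-normalized per-path scalar logits; and the attention query is a random vector rather than a learned one. All function names (`ngram_transition_graph`, `directed_gcn_layer`, `attention_pool`) and the toy corpus are hypothetical, weights are random and untrained, and this is not the authors' DirectGCN implementation.

```python
import math
import random
from collections import defaultdict


def ngram_transition_graph(sequences, n):
    """Directed graph over n-grams: u -> v when v starts one residue after u.

    Weights are transition probabilities P(v | u): counts pooled over the
    whole corpus, then normalized over each source node's outgoing edges.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        for u, v in zip(grams, grams[1:]):
            counts[u][v] += 1
    return {u: {v: c / sum(nbrs.values()) for v, c in nbrs.items()}
            for u, nbrs in counts.items()}


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b, scale=1.0):
    """Element-wise a + scale * b for equally shaped matrices."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def directed_gcn_layer(adj_out, x, path_w, shared_w, gate_logits):
    """Hypothetical DirectGCN-style layer; a simplification, not the authors' code.

    Each path (incoming, outgoing, undirected) aggregates neighbour features with
    its own adjacency, then applies a path-specific plus a shared weight matrix.
    Softmax-normalized gate logits score the paths before they are summed.
    """
    adj_in = transpose(adj_out)
    adj_und = [[(a + b) / 2 for a, b in zip(ro, ri)] for ro, ri in zip(adj_out, adj_in)]
    paths = {"in": adj_in, "out": adj_out, "und": adj_und}
    exp = {p: math.exp(gate_logits[p]) for p in paths}
    norm = sum(exp.values())
    out = [[0.0] * len(shared_w[0]) for _ in x]
    for p, adj in paths.items():
        agg = matmul(adj, x)                                      # neighbour aggregation
        h = add(matmul(agg, path_w[p]), matmul(agg, shared_w))    # path-specific + shared
        out = add(out, h, scale=exp[p] / norm)                    # gated accumulation
    return [[max(0.0, v) for v in row] for row in out]            # ReLU


def attention_pool(h, query):
    """Softmax attention over residue embeddings -> one protein-level vector."""
    scores = [sum(a * b for a, b in zip(row, query)) for row in h]
    top = max(scores)
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    return [sum(wi * row[k] for wi, row in zip(w, h)) / total for k in range(len(h[0]))]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy usage: 1-gram (single-residue) graph from a hypothetical three-sequence corpus.
random.seed(0)
corpus = ["MKTAYIAK", "MKVLAAGI", "MKTAYLAG"]
graph = ngram_transition_graph(corpus, n=1)
nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
index = {g: i for i, g in enumerate(nodes)}
adj = [[graph.get(u, {}).get(v, 0.0) for v in nodes] for u in nodes]

d_in, d_out = 4, 3
x = rand_matrix(len(nodes), d_in)                          # initial node features
path_w = {p: rand_matrix(d_in, d_out) for p in ("in", "out", "und")}
h = directed_gcn_layer(adj, x, path_w, rand_matrix(d_in, d_out),
                       {"in": 0.0, "out": 0.0, "und": 0.0})
residues = [h[index[r]] for r in "MKTAY"]                  # residue-level embeddings
protein_vec = attention_pool(residues, rand_matrix(1, d_out)[0])
```

In this toy corpus every sequence starts with "MK", so the edge M -> K receives probability 1.0, while K splits between T (2/3) and V (1/3). With trained rather than random weights, two such protein vectors could then be scored against each other (for example with a dot product) to predict a PPI link; that scoring choice is likewise an assumption of this sketch.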

Keywords: UniProt, BioGRID, russellab, graph theory, graph representation learning, graph neural networks, graph convolutional networks, link prediction

Received: 22 Jun 2025; Accepted: 22 Sep 2025.

Copyright: © 2025 Ebeid, Tang and Gu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Islam Akef Ebeid, iebeid@twu.edu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.