AUTHOR=Law Simon R., Kellgren Therese G., Björk Rafael, Ryden Patrik, Keech Olivier

TITLE=Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study

JOURNAL=Frontiers in Plant Science

VOLUME=Volume 11 - 2020

YEAR=2020

URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2020.00524

DOI=10.3389/fpls.2020.00524

ISSN=1664-462X

ABSTRACT=Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks aim to identify genes with similar expression dynamics, but are prone to introducing false-positive and false-negative relationships, especially in the case of large and heterogeneous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralisation within sub-experiments (CSE).
Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its sub-complexes, than GCNs not using CSE. This demonstrates that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the 'core gene network'. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune the prediction of the function of uncharacterised genes, while combining them with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined.
Therefore, CSE appears to be an efficient alternative to conventional batch correction approaches, particularly when dealing with large and heterogeneous datasets. The method is easy to implement in a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network.
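
The centering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so this assumes that CSE subtracts, for each gene, that gene's mean expression within each sub-experiment before correlations are computed. The function names, the toy gene names, and the sub-experiment labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def centralize_within_subexperiments(expr, sub_experiment):
    """Center each gene within each sub-experiment (assumed reading of CSE).

    expr: dict mapping gene name -> list of expression values, one per sample.
    sub_experiment: list of sub-experiment labels, one per sample.
    """
    # Group sample indices by their sub-experiment label.
    groups = defaultdict(list)
    for idx, label in enumerate(sub_experiment):
        groups[label].append(idx)

    centered = {}
    for gene, values in expr.items():
        out = list(values)
        for indices in groups.values():
            # Mean of this gene inside this sub-experiment only.
            group_mean = mean(values[i] for i in indices)
            for i in indices:
                out[i] = values[i] - group_mean
        centered[gene] = out
    return centered


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Toy data (hypothetical): both genes rise within each sub-experiment,
# but their baselines shift in opposite directions between sub-experiments.
expr = {
    "geneA": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
    "geneB": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
}
labels = ["exp1"] * 3 + ["exp2"] * 3

centered = centralize_within_subexperiments(expr, labels)
raw_r = pearson(expr["geneA"], expr["geneB"])
cse_r = pearson(centered["geneA"], centered["geneB"])
print(f"raw r = {raw_r:.3f}, centered r = {cse_r:.3f}")
```

In this toy case the raw correlation is negative because the between-experiment baseline shift dominates, whereas the centered data recover the shared within-experiment trend. This is only meant to convey the intuition behind centering per gene and per sub-experiment, not to reproduce the authors' pipeline.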