AUTHOR=Zhu Zhongxu, Gregg Keqin, Zhou Wenli TITLE=iRGvalid: A Robust in silico Method for Optimal Reference Gene Validation JOURNAL=Frontiers in Genetics VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.716653 DOI=10.3389/fgene.2021.716653 ISSN=1664-8021 ABSTRACT=Background: Appropriate reference genes are critical to accurately quantifying relative gene expression in research and clinical applications. Numerous efforts have been made to select the most stable reference gene(s), but a consensus has yet to be achieved. In this report, we propose an in silico reference gene validation method, iRGvalid, that can be used as a universal tool to validate the reference genes recommended by different resources and to identify the best ones without the need for any wet-lab validation tests. Methods: iRGvalid takes advantage of high-throughput gene expression data and is built on a double-normalization strategy. First, the expression level of each individual gene is normalized against the total gene expression of each sample; the target gene(s) are then normalized to the candidate reference gene(s). Linear regression analysis is then performed between the pre- and post-normalized target gene(s) across the whole sample set to evaluate the stability of the reference gene(s), which is positively associated with the Pearson correlation coefficient, Rt: the higher the Rt value, the more stable, and therefore the better, the reference gene. We applied iRGvalid to 14 candidate reference genes to validate and identify the most stable reference genes in four cancer types: lung adenocarcinoma, breast cancer, colon adenocarcinoma, and nasopharyngeal cancer. The stability of the reference genes was evaluated both individually and in groups of all possible combinations. Results: Highly stable reference genes resulted in high Rt values regardless of the target gene used.
The highest stability was achieved with a specific combination of 3 to 6 reference genes. A few genes were among the best reference genes across the cancer types studied here. Conclusions: iRGvalid provides an easy and robust method to validate and identify the most stable reference gene or genes from a pool of candidate reference genes. The inclusivity of large expression data sets and the direct comparison of candidate reference genes make it possible to identify reference genes of universal quality. This method can be used in any other gene expression study for which large cohorts of expression data are available.
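
The double-normalization and Rt steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`pearson`, `rt_score`), the gene labels, and the toy values are hypothetical. Two choices are assumptions that the abstract does not specify: a group of reference genes is summarized by its arithmetic mean, and the target is normalized as a plain ratio with no log transformation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rt_score(expr, target, refs):
    """Rt for one target gene and a list of candidate reference genes.

    expr maps gene name -> list of raw expression values, one per sample.
    """
    n_samples = len(next(iter(expr.values())))

    # Step 1: normalize every gene against the total expression of its sample.
    totals = [sum(v[s] for v in expr.values()) for s in range(n_samples)]
    norm = {g: [v[s] / totals[s] for s in range(n_samples)]
            for g, v in expr.items()}

    # Step 2: normalize the target to the candidate reference gene(s).
    # ASSUMPTION: arithmetic mean of the references, ratio on a linear scale.
    ref_level = [sum(norm[r][s] for r in refs) / len(refs)
                 for s in range(n_samples)]
    pre = norm[target]
    post = [pre[s] / ref_level[s] for s in range(n_samples)]

    # Step 3: correlate pre- and post-normalized target values across samples.
    return pearson(pre, post)


# Hypothetical toy data: four samples with different total expression.
expr = {
    "TARGET":     [5, 30, 20, 20],
    "REF_STABLE": [10, 20, 40, 10],   # constant share of each sample's total
    "REF_COVARY": [5, 40, 20, 20],    # share rises and falls with the target
    "OTHER":      [80, 110, 320, 50],
}

rt_stable = rt_score(expr, "TARGET", ["REF_STABLE"])
rt_covary = rt_score(expr, "TARGET", ["REF_COVARY"])
print(f"REF_STABLE: Rt = {rt_stable:.3f}")
print(f"REF_COVARY: Rt = {rt_covary:.3f}")
```

In this toy set, `REF_STABLE` has a constant share of every sample's total, so dividing by it only rescales the target and Rt is essentially 1. `REF_COVARY` tracks the target, so dividing by it distorts the target profile and Rt drops sharply. Passing several names in `refs` scores a group of reference genes under the averaging assumption stated above.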