AUTHOR=Shankar Rama , Paithankar Shreya , Gupta Suchir , Chen Bin TITLE=Detection of viral contamination in cell lines using ViralCellDetector JOURNAL=Frontiers in Microbiology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2025.1595180 DOI=10.3389/fmicb.2025.1595180 ISSN=1664-302X ABSTRACT=Background and aimsCell lines are widely used in biomedical research to investigate various biological processes, including gene expression, cancer progression, and drug responses. However, cross-contamination with bacteria, mycoplasma, and viruses remains a persistent challenge. While the detection of bacterial and mycoplasma contamination is relatively straightforward, identifying viral contamination is more difficult. To address this issue, we developed ViralCellDetector, a tool designed to detect viral contamination by mapping RNA-seq data to a comprehensive viral genome library.MethodsViralCellDetector processes RNA-seq data from any host species by first aligning reads to the host reference genome, followed by mapping the unmapped reads to the NCBI viral genome database. Viral presence is determined using stringent criteria based on the number of mapped reads and viral genome coverage. To further enable the detection of viral contamination from unknown sources, we identified host genes that are differentially expressed during viral infection and used these markers to train a machine learning model for classification.ResultsUsing ViralCellDetector, we found that approximately 10% (110 samples) of RNA-seq datasets involving MCF7 cells were likely contaminated with viruses. The tool demonstrated high sensitivity in detecting viral sequences. Furthermore, the machine learning model effectively distinguished infected from non-infected samples based on human gene expression profiles, achieving an AUC of 0.91 and an accuracy of 0.93.ConclusionOur mapping-based approach enables robust detection of viral contamination in RNA-seq data from any host organism, while the marker-based approach accurately identifies viral infections specifically in human cell lines. This capability can help researchers detect and avoid the use of contaminated cell lines, thereby improving the reliability of experimental outcomes.