AUTHOR=Wang Yue, Huang Xulong, Xian Bin, Jiang Huajuan, Zhou Tao, Chen Siyu, Wen Feiyan, Pei Jin TITLE=Machine learning and bioinformatics-based insights into the potential targets of saponins in Paris polyphylla smith against non-small cell lung cancer JOURNAL=Frontiers in Genetics VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.1005896 DOI=10.3389/fgene.2022.1005896 ISSN=1664-8021 ABSTRACT=Background: Lung cancer has the highest mortality rate among cancers worldwide, and non-small cell lung cancer (NSCLC) is the major contributor to this mortality. Saponins in Paris polyphylla Smith exhibit antitumor activity against NSCLC, but their targets are not fully understood. Methods: In this study, we used differential gene expression analysis, LASSO regression analysis and support vector machine recursive feature elimination (SVM-RFE) to screen potential key genes for NSCLC, using relevant datasets from the GEO database. The accuracy of the signature genes was verified using ROC curves and gene expression values. Potential active ingredients for the treatment of NSCLC were then screened by molecular docking of the reported saponins of Paris polyphylla Smith against the screened signature genes. The activity of the screened components and their effects on key gene expression were further validated by CCK-8 assay, flow cytometry (apoptosis and cell cycle) and qPCR. Results: A total of 204 differentially expressed genes and two key genes (RHEBL1, RNPC3) were identified in the bioinformatics analysis. Overall survival (OS), first-progression survival (FP) and post-progression survival (PPS) analyses revealed that low expression of RHEBL1 and high expression of RNPC3 indicated a good prognosis. In addition, Polyphyllin VI (PPVI) and Protodioscin (Prot) effectively inhibited the proliferation of the NSCLC cell line A549, with IC50 values of 4.46 ± 0.69 μM and 8.09 ± 0.67 μM, respectively.
The number of apoptotic cells increased significantly with increasing concentrations of PPVI and Prot. PPVI induced G0/G1-phase cell cycle arrest, whereas Prot induced G2/M-phase cell cycle arrest. After A549 cells were treated with PPVI and Prot for 48 h, the expression of RHEBL1 and RNPC3 was consistent with the results of the bioinformatics analysis. Conclusion: The results suggest that PPVI and Prot may act on RHEBL1 and RNPC3 to treat NSCLC; in addition, the two key genes can be used as candidate diagnostic genes to determine the prognosis of NSCLC patients. Keywords: bioinformatics; key genes; Paris polyphylla Smith; non-small cell lung cancer
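
The ROC-based check mentioned in the Methods can be illustrated with a minimal sketch. It assumes that a single gene's expression value is used directly as the classification score, and it computes the area under the ROC curve (AUC) as the fraction of tumor/normal sample pairs in which the tumor sample has the higher value. The function name `roc_auc` and the expression values are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up expression values for one gene (NOT data from the study).
expression = [2.1, 2.4, 3.0, 3.3, 1.2, 1.5, 2.2, 1.1]
is_tumor = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = tumor, 0 = normal

auc = roc_auc(expression, is_tumor)
print(f"AUC = {auc:.4f}")
```

An AUC near 1 means higher expression separates tumor from normal samples well. For a gene that is down-regulated in tumors the same function returns a value below 0.5, and 1 minus that value gives the discriminative power in the opposite direction.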