AUTHOR=Das Jutan, Kumar Sanjeev, Mishra Dwijesh Chandra, Chaturvedi Krishna Kumar, Paul Ranjit Kumar, Kairi Amit
TITLE=Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system
JOURNAL=Frontiers in Genetics
VOLUME=Volume 13 - 2022
YEAR=2023
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.1085332
DOI=10.3389/fgene.2022.1085332
ISSN=1664-8021
ABSTRACT=The CRISPR-Cas9 system is one of the most widely used genome editing techniques today. It can modify specific target genes and genomic regions that are complementary to the designed guide RNA (sgRNA), but it still suffers from off-target effects. Machine learning techniques for this problem have already been developed for humans and animals, but only a few exist for plant species. In this study, models based on three machine learning techniques (Artificial Neural Network, Support Vector Machine, and Random Forest) were developed to estimate the CRISPR-Cas9 cleavage sites cleaved by a given sgRNA. All of these models were developed exclusively on plant data. They were trained on 70% of the collected on-target and off-target data from different plant species. Their performance was evaluated on the remaining 30% using specificity, sensitivity, accuracy, precision, F1 score, F2 score, and area under the ROC curve (AUC). In total, eleven models were trained using these techniques. A comparative evaluation shows that the Random Forest model performs best, with an accuracy of 96.27% and an AUC of 99.21%.
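The evaluation measures listed above all derive from a binary confusion matrix, except AUC, which needs ranked scores. The following is a minimal pure-Python sketch, not the authors' code. It assumes label 1 marks a cleaved site and 0 a non-cleaved site. The function names are hypothetical. F2 is the standard F-beta score with beta = 2, and AUC is computed in its rank (Mann-Whitney) form.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = cleaved site, 0 = not cleaved)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta=1 gives F1, beta=2 weights recall more heavily (F2)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_pred, scores) -> Dict[str, float]:
    """Compute the seven measures named in the abstract from labels, predictions, and scores."""
    c = confusion_counts(y_true, y_pred)
    tp, tn, fp, fn = c["tp"], c["tn"], c["fp"], c["fn"]
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall / true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "f1": f_beta(precision, sensitivity, 1.0),
        "f2": f_beta(precision, sensitivity, 2.0),
        "auc": auc_rank(y_true, scores),
    }


if __name__ == "__main__":
    # Toy example: 2 positives, 2 negatives; one positive is misclassified.
    print(evaluate([1, 1, 0, 0], [1, 0, 0, 0], [0.9, 0.4, 0.3, 0.1]))
```

On this toy input, sensitivity is 0.5 and specificity is 1.0. F2 (5/9) falls below F1 (2/3) because F2 penalizes the missed positive more heavily.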
In total, six ANN-based models (ANN1-Logistic, ANN1-Tanh, ANN1-ReLU, ANN2-Logistic, ANN2-Tanh, and ANN2-ReLU) and four SVM models (SVM-Linear, SVM-Polynomial, SVM-Gaussian, and SVM-Sigmoid) were trained. ANN1-ReLU and SVM-Linear performed best among the ANN and SVM models, respectively, but Random Forest performed best overall.
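The eleven model variants and the 70/30 evaluation described above can be sketched with scikit-learn. This is an illustrative reconstruction, not the authors' code. The abstract gives no hyperparameters, so several choices here are assumptions:

- "ANN1" and "ANN2" are taken to mean one and two hidden layers.
- The layer widths, `max_iter`, and `n_estimators` values are hypothetical.
- The helper names `build_models` and `score_models` are invented for this example.
- SVM-Gaussian is mapped to scikit-learn's `rbf` kernel.
- Synthetic data from `make_classification` stands in for the encoded sgRNA/target-site features, which are not described here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_models(seed: int = 0) -> dict:
    """Return 11 hypothetical model variants: 6 ANN, 4 SVM, 1 Random Forest."""
    models = {}
    # Assumption: ANN1 = one hidden layer, ANN2 = two hidden layers; widths are illustrative.
    for tag, layers in (("ANN1", (32,)), ("ANN2", (32, 16))):
        for act, label in (("logistic", "Logistic"), ("tanh", "Tanh"), ("relu", "ReLU")):
            models[f"{tag}-{label}"] = MLPClassifier(
                hidden_layer_sizes=layers, activation=act, max_iter=2000, random_state=seed
            )
    # scikit-learn names the Gaussian kernel "rbf".
    for kernel, label in (("linear", "Linear"), ("poly", "Polynomial"),
                          ("rbf", "Gaussian"), ("sigmoid", "Sigmoid")):
        models[f"SVM-{label}"] = SVC(kernel=kernel, random_state=seed)
    models["RandomForest"] = RandomForestClassifier(n_estimators=200, random_state=seed)
    return models


def score_models(X, y, seed: int = 0) -> dict:
    """Stratified 70/30 split, fit every model, report test accuracy and AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        # SVC without probability=True has no predict_proba, so rank by decision_function.
        if isinstance(model, SVC):
            scores = model.decision_function(X_te)
        else:
            scores = model.predict_proba(X_te)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_te, model.predict(X_te)),
            "auc": roc_auc_score(y_te, scores),
        }
    return results


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    for name, r in score_models(X, y).items():
        print(f"{name:15s} acc={r['accuracy']:.3f} auc={r['auc']:.3f}")
```

On real sequence-derived features, MLP and SVM models are sensitive to feature scale. Wrapping each estimator in a pipeline with `StandardScaler` would therefore be a sensible addition. The numbers this sketch prints reflect only the synthetic data, not the results reported in the abstract.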