AUTHOR=Chien Ching-Hsuan, Huang Lan-Ying, Lo Shuen-Fang, Chen Liang-Jwu, Liao Chi-Chou, Chen Jia-Jyun, Chu Yen-Wei
TITLE=Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants
JOURNAL=Frontiers in Genetics
VOLUME=Volume 12 - 2021
YEAR=2021
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.798107
DOI=10.3389/fgene.2021.798107
ISSN=1664-8021
ABSTRACT=Inserting T-DNA into the genome to change the expression of flanking genes is a common approach in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. To improve the efficiency of screening for activated genes, we therefore established a model that predicts gene expression in T-DNA mutants using machine learning methods. We gathered experimental datasets consisting of gene expression data in T-DNA mutants and captured the PROMOTER and MIDDLE sequences for encoding. In the first-layer models, SVM models were constructed with nine features consisting of information about biological function and local and global sequences. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second-layer models integrated 16 first-layer models using mRMR feature selection and the LADTree algorithm, which were selected from nine feature selection methods and 65 classification methods, respectively. The accuracy of the final two-layer machine learning model, referred to as TIMgo, was 99.3% based on five-fold cross-validation and 85.6% based on independent testing. We found that information within the local sequence contributed more to classification than information within the global sequence. TIMgo had good predictive ability for target genes within 20 kb of the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the mechanism of activation by the 35S enhancer.
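As a rough illustration of what "encoding" a PROMOTER or MIDDLE sequence could look like, the Python sketch below turns a DNA string into a numeric vector of k-mer frequencies plus a count of the canonical G-box core motif (CACGTG). This is not the TIMgo feature set: the abstract does not specify the nine features, so the choice of k-mer frequencies, the default `k=2`, and the function names `kmer_frequencies`, `count_motif`, and `encode` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Canonical G-box core motif. It is palindromic (its reverse complement is
# also CACGTG), so scanning one strand is enough.
G_BOX = "CACGTG"


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies over the A/C/G/T alphabet.

    Hypothetical local-sequence feature; not taken from the paper.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows that contain ambiguous bases such as N.
    valid = [w for w in windows if set(w) <= set("ACGT")]
    counts = Counter(valid)
    total = len(valid)
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def count_motif(seq, motif=G_BOX):
    """Count (possibly overlapping) occurrences of motif in seq."""
    seq = seq.upper()
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)


def encode(seq, k=2):
    """Concatenate k-mer frequencies (sorted by k-mer) with the G-box count."""
    freqs = kmer_frequencies(seq, k)
    return [freqs[kmer] for kmer in sorted(freqs)] + [float(count_motif(seq))]


if __name__ == "__main__":
    promoter = "ttacCACGTGgcaNacgt"  # made-up sequence, for illustration only
    vector = encode(promoter)
    print(len(vector), vector[-1])
```

A vector of this kind could be passed to any classifier as one feature group. In a two-layer design like the one described in the abstract, each feature group would feed its own first-layer model, and the outputs of those models would become the inputs of the second layer.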