AUTHOR=Zhang Jiahui , Du Wenjie , Yang Xiaoting , Wu Di , Li Jiahe , Wang Kun , Wang Yang TITLE=SMG-BERT: integrating stereoscopic information and chemical representation for molecular property prediction JOURNAL=Frontiers in Molecular Biosciences VOLUME=Volume 10 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2023.1216765 DOI=10.3389/fmolb.2023.1216765 ISSN=2296-889X ABSTRACT=Molecular property prediction is a crucial task across various fields and has recently garnered significant attention. To achieve accurate and rapid prediction of molecular properties, machine learning (ML) models have been widely employed because of their superior performance compared with traditional trial-and-error methods. However, most existing ML models do not incorporate 3D molecular information and can still be improved, since they are mostly poor at differentiating certain types of stereoisomers, particularly chiral ones. Moreover, routine featurization methods, which employ only incomplete features such as molecular fingerprints or topological graphs, struggle to yield interpretable molecular representations. In this paper, we propose the Stereo Molecular Graph BERT (SMG-BERT), which integrates 3D spatial geometric parameters, 2D topological information, and 1D SMILES strings within a self-attention-based BERT model. In addition, nuclear magnetic resonance (NMR) spectroscopy results and bond dissociation energies (BDE) are integrated as extra atomic and bond features to improve model performance and support interpretability analysis. The comprehensive integration of 1D, 2D, and 3D information can establish a unified and unambiguous molecular characterization system that distinguishes conformations, such as those of chiral molecules. The intuitively integrated chemical information gives the model interpretability that is consistent with chemical logic. 
Experimental results on 12 benchmark molecular datasets show that SMG-BERT consistently outperforms existing methods. The series of experiments also verifies that SMG-BERT is generalizable and reliable.
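
The abstract's point that models without 3D information struggle with chiral stereoisomers can be illustrated with a small sketch. This is not the SMG-BERT featurization; the toy molecule, its coordinates, and the function names (`mirror`, `distance_matrix`, `signed_volume`) are assumptions made purely for illustration. The sketch shows that a molecule and its mirror image share the same bond graph and even the same interatomic distances, whereas a signed 3D quantity changes sign between them.

```python
import math

# Hypothetical toy molecule: a carbon centre with four different substituents.
# Atom labels and coordinates are illustrative only, not taken from the paper.
ATOMS = ["C", "F", "Cl", "Br", "H"]
BONDS = {(0, 1), (0, 2), (0, 3), (0, 4)}  # 2D topology: centre bonded to each substituent

COORDS = [
    (0.0, 0.0, 0.0),     # C
    (1.0, 1.0, 1.0),     # F
    (1.0, -1.0, -1.0),   # Cl
    (-1.0, 1.0, -1.0),   # Br
    (-1.0, -1.0, 1.0),   # H
]


def mirror(coords):
    """Reflect every atom through the xy-plane to get the mirror-image geometry."""
    return [(x, y, -z) for x, y, z in coords]


def distance_matrix(coords):
    """All pairwise interatomic distances (unchanged by reflection)."""
    return [[round(math.dist(a, b), 6) for b in coords] for a in coords]


def signed_volume(coords, i, j, k, l):
    """Scalar triple product of the edges from atom i to atoms j, k, l.

    Its sign encodes handedness: it flips when the geometry is mirrored.
    """
    a = [coords[j][d] - coords[i][d] for d in range(3)]
    b = [coords[k][d] - coords[i][d] for d in range(3)]
    c = [coords[l][d] - coords[i][d] for d in range(3)]
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]


mirrored = mirror(COORDS)

# ATOMS and BONDS describe both forms equally well: the 2D graph cannot tell them apart.
same_distances = distance_matrix(COORDS) == distance_matrix(mirrored)
vol_original = signed_volume(COORDS, 1, 2, 3, 4)
vol_mirrored = signed_volume(mirrored, 1, 2, 3, 4)

print("identical distance matrices:", same_distances)
print("signed volumes:", vol_original, vol_mirrored)
```

In this toy setting only the signed quantity separates the two mirror-image forms, which is one way to see why orientation-aware 3D descriptors are needed in addition to the bond graph.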