AUTHOR=Villalobos-Alva Jalil, Ochoa-Toledo Luis, Villalobos-Alva Mario Javier, Aliseda Atocha, Pérez-Escamirosa Fernando, Altamirano-Bustamante Nelly F., Ochoa-Fernández Francine, Zamora-Solís Ricardo, Villalobos-Alva Sebastián, Revilla-Monsalve Cristina, Kemper-Valverde Nicolás, Altamirano-Bustamante Myriam M.

TITLE=Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field

JOURNAL=Frontiers in Bioengineering and Biotechnology

VOLUME=Volume 10 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2022.788300

DOI=10.3389/fbioe.2022.788300

ISSN=2296-4185

ABSTRACT=The implementation of machine learning/AI in protein science opens up a world of knowledge about proteins, the workhorses of the cell, and about proteome homeostasis, both of which are essential for making life possible. The benefits are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment and fitness requires a data space on the order of gigabytes of life data.
We undertake our task by exploring the state of the art in AI and machine learning (ML) applications to protein science in the scientific literature, in order to address some critical research questions in this domain. We are a cross-functional group of scientists from several academic disciplines, and we conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO), resulting in 144 research papers. After three rounds of quality screening, 93 articles were selected for further analysis. Our findings can be summarized as follows. AI is applied mainly in four areas: i) genomics, ii) protein structure and function, iii) protein design and evolution, and iv) drug design. Among the ML algorithms used, supervised learning was the most common approach (85%). Among the databases used for the ML models, PDB and UniProtKB/Swiss-Prot were the most common (21% and 8%, respectively). Moreover, we identified that approximately 63% of the papers organized their results into three steps, which we labelled pre-process, process, and post-process.
Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-protein science (AI-PS) binomial. All research efforts to collect and integrate multidimensional data features, and then to analyze and validate them, are so far uncoordinated and scattered throughout the scientific literature, without a clear epistemic goal or connection between the studies. Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in PS,