ORIGINAL RESEARCH article

Front. Genet.
Sec. Human and Medical Genomics
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1362469

Host genetics and COVID-19 severity: increasing the accuracy of latest severity scores by Boolean quantum features

Provisionally accepted
  • 1 Medical Genetics Unit, Azienda Ospedaliera Universitaria Senese, Siena, Tuscany, Italy
  • 2 Osservatorio Astrofisico di Arcetri (INAF), Florence, Tuscany, Italy
  • 3 Med Biotech Hub and Competence Center, Department of Medical Biotechnology, University of Siena, Siena, Tuscany, Italy
  • 4 National Research Council (CNR), Roma, Italy
  • 5 Department of Electrical, Electronic and Information Engineering, School of Engineering, University of Bologna, Bologna, Emilia-Romagna, Italy

The final, formatted version of the article will be published soon.

    The impact of common and rare variants in COVID-19 host genetics has been widely studied. In particular, in Fallerini et al. (2022), common and rare variants were used to define an interpretable machine learning model for predicting COVID-19 severity. First, variants were converted into sets of Boolean features, depending on the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the most informative Boolean features with respect to the genetic bases of severity. The Boolean features selected by these logistic models were subsequently combined into an Integrated PolyGenic Score, the so-called IPGS, which offers a very simple description of the contribution of host genetics to COVID-19 severity. IPGS yields an accuracy of 55-60% on different cohorts and, after a logistic regression taking both IPGS and age as input, an accuracy of 75%. The goal of this paper is to improve on the previous results by using not only the most informative Boolean features with respect to the genetic bases of severity, but also information on the host organs involved in the disease. Here we generalize the IPGS by adding a statistical weight for each organ, through the transformation of Boolean features into "Boolean quantum features", inspired by quantum mechanics. The organ coefficients were set by applying the genetic algorithm PyGAD, after which we defined two new Integrated PolyGenic Scores (IPGS_1^ph and IPGS_2^ph). By applying a logistic regression with IPGS, IPGS_2^ph (or, equivalently, IPGS_1^ph) and age as input, we reach an accuracy of 84-86%, improving on the 75% accuracy previously reported in Fallerini et al. (2022) by roughly 10 percentage points.
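
The pipeline summarized in the abstract (per-gene Boolean features, a score built from selected features, then a logistic model that also takes age) can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names, the signed-count score, and all coefficients below are hypothetical placeholders chosen for illustration. The actual IPGS definition and the fitted weights are given in Fallerini et al. (2022) and in the full article.

```python
import math

def boolean_features(variants_by_gene, genes):
    """Map each gene to 1 if the sample carries at least one variant in it, else 0."""
    return {g: int(len(variants_by_gene.get(g, [])) > 0) for g in genes}

def signed_score(features, severity_genes, protective_genes):
    """Toy polygenic score (an assumption, not the published IPGS formula):
    number of severity-associated features present minus protective ones."""
    plus = sum(features[g] for g in severity_genes)
    minus = sum(features[g] for g in protective_genes)
    return plus - minus

def severity_probability(score, age, w_score, w_age, bias):
    """Logistic model combining the score and age; weights are hypothetical."""
    z = w_score * score + w_age * age + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sample: one variant in GENE_A, none in GENE_B or GENE_C.
genes = ["GENE_A", "GENE_B", "GENE_C"]
sample = {"GENE_A": ["variant_1"], "GENE_C": []}

features = boolean_features(sample, genes)
score = signed_score(features, severity_genes=["GENE_A", "GENE_B"],
                     protective_genes=["GENE_C"])
prob = severity_probability(score, age=60, w_score=0.8, w_age=0.05, bias=-3.5)
print(features, score, round(prob, 3))
```

In the workflow described above, the selection of informative features would come from the ensemble of LASSO logistic regressions and the organ weights from the genetic algorithm. Both steps are omitted here to keep the sketch self-contained.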

    Keywords: COVID19, host genetics, Integrated PolyGenic Score, Genetic Algorithm, Logistic regression, Genetic science modelling

    Received: 28 Dec 2023; Accepted: 09 Apr 2024.

    Copyright: © 2024 Martelloni, Turchi, Fallerini, Degl'innocenti, Baldassarri, Multicenter Study, Olmi, Furini and Renieri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Simona Olmi, National Research Council (CNR), Roma, Italy

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.