AUTHOR=Haas Oliver , Maier Andreas , Rothgang Eva TITLE=Machine Learning-Based HIV Risk Estimation Using Incidence Rate Ratios JOURNAL=Frontiers in Reproductive Health VOLUME=Volume 3 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/reproductive-health/articles/10.3389/frph.2021.756405 DOI=10.3389/frph.2021.756405 ISSN=2673-3153 ABSTRACT=HIV/AIDS is an ongoing global pandemic, with an estimated 39 million people infected worldwide. Early detection is anticipated to help improve outcomes and prevent further infections. Point-of-care diagnostics make HIV/AIDS diagnoses available both earlier and to a broader population. Widespread and automated HIV risk estimation can offer objective guidance. This supports providers in making an informed decision when considering patients with high HIV risk for HIV testing or pre-exposure prophylaxis (PrEP). We propose a novel machine learning method that allows providers to use the data from a patient's previous stays at the clinic to estimate their HIV risk. All variables available in the clinical data are considered, making the set of variables objective and independent of expert opinions. The proposed method builds on association rules that are derived from the data. The incidence rate ratio (IRR) is determined for each rule. Given a new patient, the average IRR of all applicable rules is used to estimate their HIV risk. The method was tested and validated on the publicly available clinical database MIMIC-IV, which consists of around 525,000 hospital stays that included a stay at the intensive care unit or emergency department. We evaluated the method using the area under the receiver operating characteristic curve (AUC). The best performance, an AUC of 0.88, was achieved with a model consisting of 78 rules. A threshold value of 1.0, i.e., an IRR that denotes no association, leads to a sensitivity of 98% and a specificity of 51%. The rules were grouped into social factors (e.g. homelessness, violence), drug abuse, psychological illnesses (e.g. depression, PTSD), previously known associations (e.g. pulmonary, neurological diseases), and new associations (e.g. diabetes, insulin uptake). In conclusion, we propose a novel HIV risk estimation method that builds on existing clinical data. It incorporates a wide range of variables, leading to a model that is independent of expert opinions. It supports providers in making informed decisions in the point-of-care diagnostics process by estimating a patient's HIV risk.
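The scoring step summarized in the abstract (one IRR per association rule, averaged over all rules that apply to a new patient, then compared against a threshold of 1.0) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: rule mining is omitted and each rule is reduced to its antecedent item set; the hypothetical `Patient` fields, the use of a per-patient `person_time` as the IRR denominator, the fallback score of 1.0 when no rule applies, and the strict `>` comparison in the hypothetical `flag_for_testing` helper are all illustrative choices. The toy data is invented and unrelated to MIMIC-IV.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Patient:
    # Hypothetical record: clinical items (diagnoses, medications, social
    # factors, ...) collected from a patient's previous stays.
    items: FrozenSet[str]
    hiv_positive: bool
    # Hypothetical person-time at risk, used as the IRR denominator.
    person_time: float


# Assumption: a rule is represented only by its antecedent item set.
Rule = FrozenSet[str]


def incidence_rate_ratio(rule: Rule, patients: List[Patient]) -> float:
    """HIV incidence rate among patients matching the rule divided by the
    rate among patients who do not match it."""
    cases_in = time_in = cases_out = time_out = 0.0
    for p in patients:
        if rule <= p.items:  # rule applies: all antecedent items are present
            cases_in += int(p.hiv_positive)
            time_in += p.person_time
        else:
            cases_out += int(p.hiv_positive)
            time_out += p.person_time
    if time_in == 0 or time_out == 0 or cases_out == 0:
        raise ValueError("IRR is undefined for this rule on this data")
    return (cases_in / time_in) / (cases_out / time_out)


def estimate_risk(items: FrozenSet[str], rules: List[Tuple[Rule, float]]) -> float:
    """Average IRR of all rules whose antecedent is contained in `items`."""
    applicable = [irr for rule, irr in rules if rule <= items]
    if not applicable:
        return 1.0  # assumption: no applicable rule means "no association"
    return sum(applicable) / len(applicable)


def flag_for_testing(score: float, threshold: float = 1.0) -> bool:
    # Hypothetical helper; assumption: scores strictly above the threshold are flagged.
    return score > threshold


# Toy data, invented for illustration only.
patients = [
    Patient(frozenset({"homelessness", "depression"}), True, 1.0),
    Patient(frozenset({"homelessness"}), True, 1.0),
    Patient(frozenset({"homelessness"}), False, 1.0),
    Patient(frozenset({"diabetes"}), True, 1.0),
    Patient(frozenset({"diabetes"}), False, 1.0),
    Patient(frozenset({"depression"}), False, 1.0),
    Patient(frozenset(), False, 1.0),
    Patient(frozenset(), False, 1.0),
]

candidate_rules = [frozenset({"homelessness"}), frozenset({"diabetes"})]
rules = [(r, incidence_rate_ratio(r, patients)) for r in candidate_rules]

score = estimate_risk(frozenset({"homelessness", "diabetes"}), rules)
print(round(score, 3), flag_for_testing(score))
```

On this toy data the two rules receive IRRs of 10/3 and 1.5, so a patient matching both gets their mean (about 2.42) and is flagged. Averaging, rather than multiplying, the IRRs keeps the score on the IRR scale, which is what lets a threshold of 1.0 retain its "no association" meaning.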