Development and Validation of a Personalized, Sex-Specific Prediction Algorithm of Severe Atheromatosis in Middle-Aged Asymptomatic Individuals: The ILERVAS Study

Background Although European guidelines recommend vascular ultrasound for the assessment of cardiovascular risk in low-to-moderate risk individuals, no algorithm properly identifies patients who could benefit from it. The aim of this study is to develop a sex-specific algorithm to identify those patients, especially women who are usually underdiagnosed. Methods Clinical, anthropometrical, and biochemical data were combined with a 12-territory vascular ultrasound to predict severe atheromatosis (SA: ≥ 3 territories with plaque). A Personalized Algorithm for Severe Atheromatosis Prediction (PASAP-ILERVAS) was obtained by machine learning. Models were trained in the ILERVAS cohort (n = 8,330; 51% women) and validated in the control subpopulation of the NEFRONA cohort (n = 559; 47% women). Performance was compared to the Systematic COronary Risk Evaluation (SCORE) model. Results The PASAP-ILERVAS is a sex-specific, easy-to-interpret predictive model that stratifies individuals according to their risk of SA as low, intermediate, or high risk. New clinical predictors beyond traditional factors were uncovered. In low- and high-risk (L&H-risk) men, the net reclassification index (NRI) was 0.044 (95% CI: 0.020–0.068), and the integrated discrimination index (IDI) was 0.038 (95% CI: 0.029–0.048) compared to the SCORE. In L&H-risk women, PASAP-ILERVAS showed a significant increase in the area under the curve (AUC, 0.074 (95% CI: 0.062–0.087), p-value: < 0.001), an NRI of 0.193 (95% CI: 0.162–0.224), and an IDI of 0.119 (95% CI: 0.109–0.129). Conclusion The PASAP-ILERVAS improves SA prediction, especially in women. Thus, it could reduce the number of unnecessary complementary explorations by selecting patients for a further imaging study within the intermediate risk group, increasing cost-effectiveness and optimizing health resources. Clinical Trial Registration [www.ClinicalTrials.gov], identifier [NCT03228459].


INTRODUCTION
Atherosclerotic cardiovascular disease (ASCVD) is still the main preventable cause of cardiovascular mortality and disability. ASCVD is a latent and progressive condition in which atheroma plaques develop in the artery wall, and it is usually widely extended by the time symptoms, typically a cardiovascular event, occur. Current European guidelines on cardiovascular prevention recommend initial assessment and stratification of the risk of ASCVD based on a probabilistic tool that includes traditional risk factors, followed by therapeutic intervention when necessary (1). However, these traditional risk scores underestimate individual risk in several circumstances, such as women, youths, and individuals with a low-to-moderate ASCVD risk (2,3). Indeed, the number of cardiovascular events in individuals with a low-to-moderate risk is unacceptably high (4) and, although ASCVD is often thought to be a disease with a higher prevalence in men, the annual cardiovascular mortality rate has remained greater in women (5). Despite these numbers, women have been underrepresented in CVD-related clinical trials, and data are not often stratified by sex, limiting their interpretation. Indeed, better individualized prediction algorithms are urgently required, especially in individuals with moderate risk and in women.
One of the available tools to improve prediction algorithms is the use of non-invasive imaging techniques. Those techniques can detect the presence, estimate the extent, and evaluate the clinical consequences of ASCVD. Detection of subclinical atheromatosis in the arteries by vascular ultrasound is a reliable method to predict future coronary events (6), and it improves cardiovascular risk assessment in asymptomatic individuals (7). The most common method of estimating ASCVD burden has been carotid ultrasound. Carotid artery atheromatosis has a well-demonstrated role in the incidence of cerebrovascular and cardiovascular events (8). Atherosclerotic lower extremity peripheral artery disease is increasingly recognized as an important cause of cardiovascular morbidity and mortality (9). Nevertheless, femoral artery atheromatosis assessment has long been underexplored, although femoral atheroma plaques show the strongest association with cardiovascular risk factors, and a higher sensitivity to predict calcified coronary disease (10). Furthermore, extensive vascular territory assessment has been shown to increase its predictive value in cardiovascular mortality (11). However, even though the 2019 European guideline recommends vascular ultrasound in individuals with a low-to-moderate ASCVD risk (1), its implementation is still limited, possibly because it is not easy to select candidates who could benefit from a comprehensive imaging study. Thus, a new algorithm that helps clinicians to identify those patients is needed, as it would be cost-effective and would optimize the allocation of health resources.
Machine learning (ML) technologies can improve the diagnosis of diseases by developing risk prediction algorithms to guide clinical decisions (12). Although several ML algorithms have been described to predict cardiovascular events (13–17), few predict the presence and extent of subclinical atheromatosis (18), and sex-specific algorithms are not available. Therefore, we developed and validated a personalized model to predict the probability of severe atheromatosis (SA; the PASAP-ILERVAS) based on classification trees (19), integrating clinical data, anthropometrical parameters, and routine, easy-to-obtain, affordable biochemical parameters in an asymptomatic, middle-aged population with a low-to-moderate cardiovascular risk. The new algorithm could help clinicians to select patients who could benefit from further vascular ultrasound examination and reduce the number of unnecessary complementary explorations.

Discovery Cohort: The ILERVAS Study
The ILERVAS study is an ongoing randomized, interventional, longitudinal clinical trial in a low-to-moderate cardiovascular risk population of the North-East region of Spain (ClinicalTrials.gov identifier: NCT03228459). The study was originally designed with two arms: (i) intervention group, hereafter called Mobile Unit follow-up group, and (ii) no intervention group, hereafter called Electronic Medical Record follow-up group. The intervention was the generation of a report sent to the primary care physician, including vascular ultrasound examination in carotid and femoral arteries assessing 12 territories, combined with clinical, anthropometric, lifestyle, and biochemical parameters.
The study population was randomly allocated to the groups by stratified sampling from the electronic clinical history database of primary care. The no intervention group will be used to compare the impact of the intervention on cardiovascular morbidity and mortality. Thus, this group was not used for the present work. The intervention group was formed by 8,330 participants who were enrolled from January 2015 to December 2018 from 32 basic health areas of the province of Lleida (Catalonia, Spain). The study was designed to (i) assess the prevalence, vascular distribution, severity, and progression of subclinical atheromatosis in a middle-aged population with a low-to-moderate cardiovascular risk; (ii) uncover potential new factors predicting severe atheromatosis; and (iii) assess the impact of subclinical atheromatosis detection on the incidence of cardiovascular events during a 10-year follow-up period. The inclusion criteria were: 50-70-year-old women and 45-65-year-old men with at least one of the following CVD risk factors: hypertension, dyslipidemia, obesity (defined as body mass index, BMI ≥ 30), smoking, and/or first-degree relative who developed premature CVD (with a threshold at age 55 years for men or 65 years for women). The exclusion criteria were clinical history of diabetes, chronic kidney disease, cardiovascular pathology (angina, myocardial infarction, cerebrovascular accident, peripheral arterial disease, intestinal or other ischemia), history of arterial surgery, active neoplasia, life expectancy of less than 18 months, long-term home care, and/or institutionalized population. The Ethics Committee of the Hospital Arnau de Vilanova (Lleida, Spain) approved the protocol (CEIC-1410). All patients signed informed consent. The study was conducted according to the principles of the Declaration of Helsinki. A more detailed explanation of the study has been previously published (3).

Validation Cohort: The NEFRONA Study
The NEFRONA study is an observational, multicenter, prospective study designed to evaluate the prevalence and natural history of subclinical atheromatosis in chronic kidney disease patients, and the contribution of vascular ultrasound for a more precise cardiovascular risk assessment. The NEFRONA cohort included a control group of individuals with normal kidney function, which is the subgroup used for the validation. For the present study, 559 healthy asymptomatic individuals with normal kidney function were selected from this cohort. Subjects were enrolled in Primary Care centers from October 2010 to June 2012. Inclusion criteria were asymptomatic individuals between 18 and 74 years of age with a glomerular filtration rate (GFR) over 60 ml/min/1.73 m². Exclusion criteria were active infections, pregnancy, active neoplasia, life expectancy shorter than 12 months, previous cardiovascular event, carotid surgery, or any organ transplantation. The Ethics Committee of the Hospital Arnau de Vilanova (Lleida, Spain) approved the protocol. All patients signed informed consent. The study was conducted according to the principles of the Declaration of Helsinki. A more detailed explanation of the study has been previously published (20).

Source of Information and Data Collection
Sociodemographic variables and clinical history of cardiovascular risk factors were collected from clinical records. In contrast, anthropometric data, smoking habit, and blood samples were collected at the moment of vascular examination.

Anthropometric Data
The same protocols were used in both cohorts. Weight and height were measured according to guidelines to obtain BMI. Waist perimeter was measured with a non-stretchable tape with a precision of 0.1 cm to assess abdominal adiposity, which was defined as a waist perimeter ≥ 88 cm in women and ≥ 102 cm in men. Blood pressure was determined in triplicate, after 5 min rest using an automated device (Omron M6 Comfort, Omron Healthcare, Japan) at 2-min intervals. The mean of the three recordings was calculated.
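The derived anthropometric variables described above can be illustrated with a minimal Python sketch. The thresholds (88 cm and 102 cm) and the averaging of three readings come from the text; the function names, argument names, and the "female"/"male" coding are assumptions made only for illustration.

```python
from statistics import mean

# Waist thresholds stated in the text; the sex coding is an illustrative assumption.
WAIST_THRESHOLD_CM = {"female": 88.0, "male": 102.0}


def body_mass_index(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def waist_to_height_ratio(waist_cm, height_m):
    """Waist perimeter divided by height, both expressed in cm."""
    return waist_cm / (height_m * 100.0)


def has_abdominal_adiposity(waist_cm, sex):
    """True when the waist perimeter reaches the sex-specific threshold."""
    return waist_cm >= WAIST_THRESHOLD_CM[sex]


def mean_blood_pressure(readings_mmhg):
    """Mean of the three recordings taken at 2-min intervals."""
    if len(readings_mmhg) != 3:
        raise ValueError("expected exactly three readings")
    return mean(readings_mmhg)
```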

Biochemical Parameters
In the ILERVAS cohort, a fasting dried blood spot sample was obtained by a fingertip puncture according to standard protocols. Creatinine, uric acid, and total cholesterol levels were assessed with the Reflotron® Plus system (Roche Diagnostics, Germany). It is a validated clinical chemistry system with highly correlated results to well-standardized laboratory methods (21–23). The glycosylated hemoglobin test was performed using a point-of-care instrument (Cobas B101®, Roche Diagnostics, Germany) that meets the generally accepted performance criteria for its measurement (24). In the NEFRONA cohort, biochemical parameters were obtained from a routine fasting blood test taken no more than 3 months apart from vascular examination. GFR was estimated according to international guidelines using the CKD-EPI equation in both cohorts (25).

Atheromatous Plaque Assessment by Vascular Ultrasound
Vascular ultrasound was performed by nurses specialized in vascular imaging. Standardized scanning and reading protocols were followed to decrease interoperator variability and type 2 errors. Intra-observer reliability assessment showed a κ coefficient of 1 (two repeated measurements; 1,007 observations). Overall inter-rater reliability for all operators showed a κ coefficient of 0.915 (95% CI: 0.892-0.944; 959 observations). Readers were unaware of the patients' clinical histories. The VIVID i BT09 ultrasound system (GE Healthcare), equipped with a 12L-RS linear probe (6–13 MHz), was used, and pulsed Doppler ultrasound was used to assess hemodynamic repercussions. In the ILERVAS cohort, arterial ultrasound was performed in 12 territories, covering both carotid (common, bifurcation, internal, and external) and femoral (common and superficial) arteries. In the NEFRONA study, 10 territories were explored, covering both carotid (common, bifurcation, and internal) and femoral (common and superficial) arteries (3,20). Subclinical atheromatosis was defined as the presence of any plaque in the explored areas. According to the Mannheim consensus, an atheroma plaque was defined as a focal encroachment into the lumen of the artery ≥ 1.5 mm (26). SA was defined as ≥ 3 territories with atheroma plaque.
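The outcome definitions above reduce to simple counting rules, sketched below in Python. The 1.5 mm plaque threshold and the cut-off of three territories are taken from the text; the territory labels and the input format (one focal encroachment measurement in mm per territory) are hypothetical choices made for this example.

```python
PLAQUE_MIN_MM = 1.5         # Mannheim consensus threshold quoted in the text
SA_MIN_TERRITORIES = 3      # severe atheromatosis: >= 3 territories with plaque

# Hypothetical labels for the 12 ILERVAS territories (two sides x six sites).
SITES = ("common_carotid", "carotid_bifurcation", "internal_carotid",
         "external_carotid", "common_femoral", "superficial_femoral")
TERRITORIES = [f"{side}_{site}" for side in ("left", "right") for site in SITES]


def plaque_territories(encroachment_mm):
    """Territories whose focal encroachment reaches the plaque threshold."""
    return [t for t, mm in encroachment_mm.items() if mm >= PLAQUE_MIN_MM]


def classify_atheromatosis(encroachment_mm):
    """Summarise one participant: plaque count, any plaque, and SA status."""
    n = len(plaque_territories(encroachment_mm))
    return {
        "n_territories": n,
        "subclinical": n >= 1,
        "severe": n >= SA_MIN_TERRITORIES,
    }


# Example participant: plaque in three of the twelve territories.
example = {t: 0.0 for t in TERRITORIES}
example.update({"left_common_femoral": 2.1,
                "right_common_femoral": 1.5,
                "left_carotid_bifurcation": 1.8})
result = classify_atheromatosis(example)
```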

Variable Selection
Parameters that require a high degree of specialization or technical resources were ruled out. Thus, a set of easy-to-obtain and affordable variables was selected. The outcome was SA, defined as ≥ 3 territories with atheroma plaque out of 12. The explanatory variables were age, sex, clinical data (history of hypertension and dyslipidemia and smoking habit), anthropometrical data [systolic blood pressure (SBP), diastolic blood pressure (DBP), body mass index (BMI), abdominal adiposity, and waist-to-height ratio], and biochemical parameters (creatinine, GFR, uric acid, and total cholesterol).

Descriptive Analysis of the Cohorts
The clinical characteristics of the ILERVAS cohort (discovery cohort) and NEFRONA cohort (external validation cohort) were described as frequencies for categorical variables, and means and standard deviation (SD) for quantitative variables. Differences between SA and non-SA were assessed by several univariate logistic regressions for each clinical predictor adjusted by age and stratified by sex. The p-value of the likelihood-ratio test and odds ratio (OR) values with 95% confidence intervals (95% CI) were represented with the forestplot package (27). For each numerical predictor, adjusted OR per 1-SD higher parameter measure was estimated.
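As an illustration of the "OR per 1-SD higher" scale, the following sketch converts a logistic regression coefficient expressed per unit of a predictor into an odds ratio per standard deviation. It assumes a Wald-type 95% CI (z = 1.96), which is an assumption of this example and not necessarily how the reported intervals were obtained; the numeric inputs are invented.

```python
import math


def odds_ratio_per_sd(beta, se, sd, z=1.96):
    """Odds ratio (with Wald CI) for a 1-SD increase in a predictor.

    beta: log-odds coefficient per unit of the predictor
    se:   standard error of beta
    sd:   standard deviation of the predictor in the sample
    """
    point = math.exp(beta * sd)
    lower = math.exp((beta - z * se) * sd)
    upper = math.exp((beta + z * se) * sd)
    return point, lower, upper


# Invented example: beta = 0.02 per mmHg of SBP, SE = 0.005, SD = 15 mmHg.
or_sd, ci_low, ci_high = odds_ratio_per_sd(0.02, 0.005, 15.0)
```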

Sample Balancing
The descriptive analysis revealed that the prevalence of SA was lower than that of non-SA in the discovery cohort. Thus, the Random Over-Sampling Examples (ROSE) function (28) was used to increase the sample size of individuals with SA separately for men and women. Individuals who presented missing data (Supplementary Table 1) were excluded prior to sample balancing. This new balanced data set was used to train ML models in both sexes.
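The balancing step can be pictured with the sketch below. It is deliberately simpler than ROSE: it duplicates minority-class individuals at random (plain random over-sampling with replacement) instead of generating new synthetic examples, so it should be read only as an illustration of the idea. The record format, the `sa` key, and the function names are hypothetical.

```python
import random


def drop_incomplete(rows):
    """Exclude individuals with any missing value before balancing."""
    return [r for r in rows if all(v is not None for v in r.values())]


def oversample_minority(rows, label_key="sa", seed=0):
    """Duplicate minority-class rows at random until both classes match in size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    minority, majority = sorted((positives, negatives), key=len)
    if not minority:
        raise ValueError("both classes must be present")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = list(rows) + extra
    rng.shuffle(balanced)
    return balanced


# Toy data for one sex: 8 individuals without SA, 2 with SA, 1 incomplete record.
toy = [{"age": 50 + i, "sa": False} for i in range(8)]
toy += [{"age": 63, "sa": True}, {"age": 66, "sa": True}, {"age": None, "sa": True}]
balanced = oversample_minority(drop_incomplete(toy))
```

In the study this step was applied separately to men and women, which here would correspond to calling the function once per sex-specific subset.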

Machine Learning
A Recursive PARTitioning (rpart) classification tree (29) approach was used and plotted with rpart.plot (30). Trees were constructed in the balanced sample stratified by sex as follows: first, the algorithm recursively selected the clinical predictors that provided an optimal split in each node based on the generalized Gini index to obtain the maximal impurity reduction. The growth of the tree was controlled by pre-pruning techniques. A minimum of 200 individuals in each node was required to try a split and at least 100 in the terminal nodes. The maximum depth of the final tree was fixed at 6. Second, the resultant tree was trimmed back to minimize overfitting by post-pruning techniques, which select the complexity parameter (CP) that produced the minimum internal three-fold cross-validation error to prune the full tree. The probability of SA and the 95% CI were represented in the terminal nodes with ggplot2 (31).
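To make the splitting criterion concrete, the following is a minimal sketch of how a single node could be split on one numeric predictor by maximising the decrease in Gini impurity while respecting a minimum terminal-node size. It is not the rpart implementation: surrogate splits, depth limits, and cost-complexity pruning are omitted, and the function names and the toy data are assumptions of this example.

```python
def gini(labels):
    """Gini impurity of a binary (0/1) label list: 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels, min_leaf=100):
    """Best threshold on one numeric predictor.

    Returns (threshold, impurity_decrease); threshold is None when no
    split leaves at least min_leaf individuals on each side.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # pre-pruning: terminal nodes must be large enough
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, gain)
    return best


# Toy node: age separates the two classes perfectly at 58 years.
ages = [50, 52, 54, 56, 60, 62, 64, 66]
sa = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, gain = best_split(ages, sa, min_leaf=2)
```

With the settings described in the text, `min_leaf` would be 100 and a node would additionally need at least 200 individuals before a split is attempted.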
The obtained model was called Personalized Algorithm for Severe Atheromatosis Prediction (PASAP-ILERVAS).

Variable Importance Calculation
A variable importance score was calculated using the improvement measure attributable to each predictor in its role as splitter, plus the goodness for all splits in which it was a surrogate. The importance values were scaled to sum to 100%.
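The aggregation and scaling described above can be sketched as follows. The input format, a list of (predictor, contribution) records pooling primary-split improvements and surrogate-split goodness values, is a hypothetical simplification, and the numbers are invented.

```python
from collections import defaultdict


def scaled_importance(contributions):
    """Sum contributions per predictor and rescale so they total 100%."""
    totals = defaultdict(float)
    for predictor, value in contributions:
        totals[predictor] += value
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("contributions must sum to a positive value")
    return {name: 100.0 * value / grand_total for name, value in totals.items()}


# Invented contributions: age appears in two splits, smoking in one.
importance = scaled_importance([("age", 30.0), ("smoking", 10.0), ("age", 10.0)])
```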

Probability Calibration
In order to group terminal nodes with similar risk, the probabilities of SA were calibrated using a histogram-transformed method with the CalibratR package (32). First, the original probabilities with a maximum of 10 bins were plotted. Then, the optimal number of partitions to maximize the sensitivity was tested. The calibration process evidenced three risk groups, classified as low-, intermediate-, and high-risk nodes.
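The general idea of histogram-based calibration is sketched below using generic equal-width histogram binning: predicted probabilities are grouped into bins, and each bin is assigned the observed fraction of SA cases it contains. This is not CalibratR's exact procedure, and the search for the optimal number of partitions is not reproduced; the function names and the toy data are assumptions.

```python
def histogram_binning(probs, labels, n_bins=10):
    """Observed SA fraction per equal-width probability bin (None if empty)."""
    counts = [0] * n_bins
    positives = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[b] += 1
        positives[b] += y
    return [positives[b] / counts[b] if counts[b] else None
            for b in range(n_bins)]


def calibrate(p, bin_rates):
    """Calibrated probability for p: the observed rate of its bin."""
    n_bins = len(bin_rates)
    return bin_rates[min(int(p * n_bins), n_bins - 1)]


# Toy data: three clusters of predicted probabilities.
toy_probs = [0.05, 0.05, 0.55, 0.55, 0.95, 0.95]
toy_labels = [0, 0, 1, 0, 1, 1]
rates = histogram_binning(toy_probs, toy_labels, n_bins=10)
```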

Prediction of Severe Atheromatosis and Non-severe Atheromatosis With Personalized Algorithm for Severe Atheromatosis Prediction
The Youden index (33) was computed to identify the cut-off probability to distinguish between non-SA and SA in the terminal nodes. Terminal nodes with a lower or equal probability to the cut-off were considered as non-SA, whereas terminal nodes with a higher probability were classified as SA.
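A minimal sketch of the cut-off selection follows. It evaluates every observed probability as a candidate cut-off, applies the rule stated above (probability above the cut-off means SA, otherwise non-SA), and keeps the cut-off maximising J = sensitivity + specificity - 1. The function name and the toy data are illustrative assumptions; ties are resolved in favour of the lowest cut-off.

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, J) maximising the Youden index over observed probabilities."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p > cut and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p <= cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy terminal-node probabilities with their true SA status.
node_probs = [0.2, 0.2, 0.3, 0.3, 0.6, 0.6, 0.8, 0.8]
true_sa = [0, 0, 0, 0, 0, 1, 1, 1]
cutoff, j_index = youden_cutoff(node_probs, true_sa)
```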

Performance Metrics
The area under the curve (AUC) and the recommended metrics derived from the confusion matrix were used to evaluate model performance (34). A report from the European Society of Cardiology Prevention of CVD Programme is recommended to familiarize the reader with the risk prediction tools in cardiovascular disease prevention and performance metrics (35).
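For reference, the confusion-matrix metrics used throughout the Results, together with a rank-based AUC, can be computed as in the sketch below. The function names and the toy data are assumptions of this example; the AUC is computed as the proportion of SA/non-SA pairs in which the SA individual receives the higher probability (ties count one half).

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and accuracy for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(y_true)),
    }


def auc(probs, labels):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.7, 0.6]
metrics = confusion_metrics(y_true, y_pred)
area = auc(y_prob, y_true)
```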

External Validation
The PASAP-ILERVAS model was externally validated in a subpopulation of the NEFRONA study and performance metrics were also calculated.

Systematic COronary Risk Evaluation Model
A binary logistic regression model based on the Systematic COronary Risk Evaluation (SCORE) model was fitted in the balanced discovery sample. The Youden index was computed to identify the cut-off probability to distinguish between non-SA and SA, and performance metrics were calculated in L&H-risk individuals.

Improvement of Risk Prediction Between the Personalized Algorithm for Severe Atheromatosis Prediction and the Systematic COronary Risk Evaluation Model
The increment in AUC, the Net Reclassification Index (NRI), and the Integrated Discrimination Index (IDI) were evaluated to quantify the differences between both models in L&H-risk individuals (36). The SCORE model was considered as reference, and the PASAP-ILERVAS as the new model whose improvement was evaluated (37,38). The CIs for all metrics were computed by bootstrapping (2,000 bootstraps). The comparisons between models were performed by the DeLong test at 95% CI (39). To test the differences between performance metrics [sensitivity, specificity, accuracy, and the positive predictive value (PPV)], a two-sample test for equality of proportions with continuity correction was performed (40).
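The reclassification metrics can be sketched as follows. The example assumes a two-category NRI computed from the binary SA/non-SA predictions of both models and a percentile bootstrap for the CI; whether the reported values used exactly these variants is not stated in the text, so this is an illustration under those assumptions. Function names and toy data are hypothetical.

```python
import random
from statistics import mean


def nri(old_pred, new_pred, labels):
    """Two-category NRI: net correct reclassification in events plus non-events."""
    events = [(o, n) for o, n, y in zip(old_pred, new_pred, labels) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(old_pred, new_pred, labels) if y == 0]
    up_e = sum(n > o for o, n in events) / len(events)
    down_e = sum(n < o for o, n in events) / len(events)
    up_n = sum(n > o for o, n in nonevents) / len(nonevents)
    down_n = sum(n < o for o, n in nonevents) / len(nonevents)
    return (up_e - down_e) + (down_n - up_n)


def idi(old_prob, new_prob, labels):
    """IDI: gain in discrimination slope of the new model over the old one."""
    def slope(probs):
        return (mean(p for p, y in zip(probs, labels) if y == 1)
                - mean(p for p, y in zip(probs, labels) if y == 0))
    return slope(new_prob) - slope(old_prob)


def bootstrap_ci(stat, old, new, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; resamples lacking one class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            values.append(stat([old[i] for i in idx], [new[i] for i in idx], ys))
    values.sort()
    return values[int(alpha / 2 * n_boot)], values[int((1 - alpha / 2) * n_boot) - 1]


labels = [1, 1, 1, 1, 0, 0, 0, 0]
score_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # reference model (toy values)
pasap_pred = [1, 1, 0, 1, 0, 0, 1, 0]   # new model (toy values)
score_prob = [0.6, 0.4, 0.4, 0.6, 0.6, 0.4, 0.6, 0.4]
pasap_prob = [0.8, 0.7, 0.4, 0.8, 0.3, 0.2, 0.6, 0.2]
toy_nri = nri(score_pred, pasap_pred, labels)
toy_idi = idi(score_prob, pasap_prob, labels)
ci_low, ci_high = bootstrap_ci(nri, score_pred, pasap_pred, labels, n_boot=200)
```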

Clinical Characteristics of the Discovery and Validation Cohorts
The workflow of the whole analysis is shown in Figure 1. The clinical characteristics of the ILERVAS cohort stratified by sex are shown in Table 1. The prevalence of SA, defined as ≥ 3 territories with atheroma plaque out of the 12 studied, was 42.6 and 24.4% in men and women, respectively. These patients showed a higher prevalence of clinical history of hypertension and dyslipidemia, and there were fewer non-smokers and more current smokers among them. Mean age was higher in participants with SA, as were the levels of systolic blood pressure, glycosylated hemoglobin, uric acid, and total cholesterol. However, the association of these clinical predictors with SA showed some differences between sexes (Supplementary Figure 1). Age, systolic and diastolic blood pressure, total cholesterol, and being underweight showed a stronger positive association with SA in men than in women.
TABLE 1 | Notes. Values are shown as means and standard deviations for quantitative variables. Clinical history data were obtained from electronic medical records and refer to patients who had a prior clinical diagnosis of hypertension or dyslipidemia. Underweight was defined as a BMI < 18.5 kg/m², normal weight as 18.5-24.9, overweight as 25-29.9, and obesity as ≥ 30. Abdominal adiposity was defined as an abdominal perimeter ≥ 88 cm in women or ≥ 102 cm in men. DBP, diastolic blood pressure; GFR, glomerular filtration rate; SBP, systolic blood pressure.

A Personalized Algorithm for Severe Atheromatosis Prediction
Pruning the trees to 13 splits in men and 17 splits in women showed the lowest internal cross-validation error, beyond which tree complexity entailed no additional improvement (Supplementary Figure 2). The structure of the classification tree in men and women in the balanced ILERVAS cohort is shown in Figures 2, 3, respectively. In both sexes, patients were first classified by age, followed by the smoking habit. The combination of these two parameters conditioned the clinical threshold of the other predictors, such as systolic blood pressure, total cholesterol, uric acid, GFR, BMI, waist perimeter, and history of dyslipidemia and hypertension. A total of 14 terminal nodes were obtained in men and 18 in women. The probability of severe atheromatosis and its 95% CI are shown in each node to improve visualization. In order to group individual terminal nodes with similar risk, the probability of severe atheromatosis in each node was calibrated. Supplementary Figure 3 shows the absolute frequency of patients with or without severe atheromatosis and their probability of having the disease before and after calibration. Three risk profiles were identified: low-risk nodes with a mean prevalence of severe atheromatosis of 28.8% in men and 28.6% in women, intermediate-risk nodes with 54.8% in men and 54.9% in women, and high-risk nodes with 71.8% in men and 71.2% in women. These profiles are shown in Figures 2, 3 as green, yellow, and red nodes, respectively. Thus, clinicians could easily identify the risk of severe atheromatosis according to the characteristics of the patients.
Variable importance assessment revealed that in both sexes, age, smoking habit, SBP, and total cholesterol were the most important clinical predictors of severe atheromatosis, accounting for 98.4% of the predictor importance in men and 82.3% in women. Additionally, a GFR ≥ 107 ml/min/m² increased the risk in men. In women, other predictors, such as history of dyslipidemia, BMI, history of hypertension, uric acid, and waist perimeter were identified (Figure 4). Table 2 shows the internal validation metrics. Specificity, PPV, accuracy, and AUC were higher in L&H-risk individuals compared to the total sample (p-value: < 0.001), which indicates a higher feasibility of the PASAP-ILERVAS in those patients. On the contrary, the negative predictive value (NPV) showed no differences (p-value: > 0.1), and sensitivity was lower in L&H-risk individuals than in the total sample (p-value: < 0.001).
FIGURE 2 | Personalized Algorithm for Severe Atheromatosis Prediction in men. The structure of the classification tree in men is represented. The probability of severe atheromatosis, which corresponds to the proportion of affected patients, is shown inside circles in the final nodes. The barplot offers a clearer visualization of this prevalence with its 95% confidence interval. The colors green, yellow, and red indicate the level of risk (low, intermediate, or high) identified after calibration. The units were as follows: age, year; GFR, ml/min/m²; SBP, mmHg; total cholesterol, mg/dl. GFR, glomerular filtration rate; SBP, systolic blood pressure.

Internal Validation of the Personalized Algorithm for Severe Atheromatosis Prediction Algorithm
In L&H-risk patients, all parameters were similar in both sexes. The NPV, the PPV, and accuracy were 0.71. Thus, the PASAP-ILERVAS yielded 71% correct predictions (including both true positives and true negatives). Finally, the AUC was 0.734 in men and 0.730 in women.

External Validation of the Personalized Algorithm for Severe Atheromatosis Prediction Algorithm
The external validation was performed in an asymptomatic, kidney disease-free, subpopulation of the NEFRONA cohort.
Although the prevalence of SA was lower in the NEFRONA cohort compared to the ILERVAS cohort, patients showed similar characteristics (Supplementary Table 3). Table 2 shows the external validation metrics. In L&H-risk individuals, the specificity was similar in both sexes (men: 0.795; women: 0.774, p-value = 0.783). The NPV was extremely high (men: 0.898; women: 0.941). Thus, PASAP-ILERVAS showed a high ability in identifying truly healthy individuals.
The PPV was higher in men than in women (0.561 vs. 0.178, p-value: < 0.001). It is noteworthy that performance metrics were limited by the low prevalence of SA in the external validation cohort (men: 31.2%; women: 10.7%). Sensitivity did not differ significantly between sexes (0.744 vs. 0.500, p-value = 0.141). Accuracy was high in both, indicating 78.2% and 75.0% correct predictions in men and women, respectively.

Comparison of the Personalized Algorithm for Severe Atheromatosis Prediction Algorithm With the Systematic COronary Risk Evaluation
The performance of the PASAP-ILERVAS to predict SA in L&H-risk individuals was compared to the traditional SCORE model (Figure 5 and Table 3).
In men, the PASAP-ILERVAS showed a higher sensitivity than the traditional score (0.661 vs. 0.620, p-value: 0.019). Thus, the new algorithm had a higher ability to detect L&H-risk men who actually were affected by the disease. In contrast, other metrics [...] The new algorithm showed a significant increase in the AUC of 0.074 (95% CI: 0.062-0.087, p-value: < 0.001). Thus, PASAP-ILERVAS showed a higher percentage of correct predictions in L&H-risk women than the traditional SCORE, especially those [...]
FIGURE 4 | Clinical predictor importance in the PASAP-ILERVAS in both sexes. A variable importance score was calculated using the improvement measure attributable to each predictor in its role as splitter, plus the goodness for all splits in which it was a surrogate. The importance values were scaled to sum to 100%. SBP, systolic blood pressure; BMI, body mass index; GFR, glomerular filtration rate.

DISCUSSION
In this study, we presented a novel, sex-specific, machine learning-based algorithm for SA prediction in asymptomatic middle-aged individuals. The PASAP-ILERVAS integrates clinical history data, anthropometrical measurements, and affordable biochemical parameters to obtain a hierarchical, flexible, easyto-interpret, predictive model. This model was validated in an external cohort, showing excellent performance in the L&H-risk individuals. Thus, the algorithm could be useful to select individuals who can benefit from a vascular ultrasound, namely those classified as intermediate risk. Non-invasive imaging techniques can detect the presence of atheromatosis and estimate the burden, which has been proven to improve cardiovascular risk assessment (7), and it is recommended when the coronary artery calcium score is unavailable or not feasible (41). Furthermore, it is a cheap and accessible in most medical offices. However, the implementation of vascular ultrasound, which could help increase accuracy in those patients has not been accomplished in the routine clinical practice due to several reasons. Among them, the high-time burden added to each visit is likely one. Therefore, identifying patients in whom a vascular ultrasound could add prognostic value is of paramount importance. The novel PASAP-ILERVAS stratifies individuals according to their risk of SA as low, intermediate, and high. First, patients were classified by age, followed by smoking habit. The combination of these two parameters conditioned the clinical threshold of the other predictors. Thus, the ML approach evidenced the additive effect of cardiovascular risk factors. In addition, the algorithm revealed that biological thresholds which conditioned ASCVD risk, must be considered individually.
Age, smoking habit, SBP, and total cholesterol accounts 98.4% of the predictor importance in men and 82.3% in women. Additionally, a GFR ≥ 107 ml/min/m 2 increased the risk in men. Although it is amply proven that kidney failure increases ASCVD risk as kidney function decreases, few studies linked an increased glomerular filtration with ASCVD risk. Glomerular hyperfiltration is an initial step of kidney damage in diabetes and obesity, but its threshold remains elusive. However, there are data showing that glomerular hyperfiltration of 107-115 ml/min/m 2 is independently associated with increased cardiovascular risk in middle-aged healthy individuals (42). In women, other predictors related to obesity were identified. Obesity is associated with an increased cardiovascular risk since the pro-inflammatory cytokines produced by the adipose tissue itself induce atherosclerotic plaque formation (43), and a larger waist perimeter was independently associated with recurrent atherosclerotic cardiovascular disease (44). The PESA study, which is another large Spanish cohort similar to the ILERVAS cohort with asymptomatic middle-aged individuals without established CVD, revealed that in metabolically healthy individuals the presence of subclinical atherosclerosis increased across BMI categories, whereas fewer differences were observed for metabolically unhealthy individuals. Thus, the presence of subclinical atherosclerosis observed in patients with an abnormal BMI was mainly attributed to the coexistence of other cardiovascular risk factors, such as hypertension, diabetes, and dyslipidemia (45). A recent study in a subpopulation of the PESA study revealed that individuals with metabolic syndrome or its individual components (central obesity, hypertension, low HDL-C, triglycerides, and altered glucose metabolism) showed bone marrow activation, even in the absence of systemic inflammation, which is associated with early atherosclerosis (46). 
Similarly, uric acid levels have been positively correlated with cardiovascular diseases, including hypertension and atherosclerosis, through oxidative stress and an inflammatory response (47). Women are often older than men when they suffer from their first cardiovascular event (48). This difference is attributed to the protective role of circulating estrogens on the vascular endothelium, maintaining oxide nitric release leading to vasodilation (49), regulating prostaglandin production and inhibiting smooth muscle cell proliferation (50). On the contrary, at menopause, women show endothelial dysfunction and lipid deposition in the artery walls, which can promote atherosclerosis development (51). In women under 55 years of age, smoking is the most important preventable cause of CVD, increasing their risk 7-fold (52). Large cohort studies have demonstrated that high SBP is an important risk factor for cardiovascular disease and a 5-mm Hg reduction of SBP reduced the risk of major cardiovascular events by about 10% (53). Strikingly, hypertension is more strongly associated with CVD in women compared to men (54).
The PASAP-ILERVAS can be used as a prescreening model to determine patients who will further benefit from a vascular ultrasound examination. It identified three risk profiles (low, intermediate, and high). The internal validation showed that specificity, PPV, accuracy, and AUC were higher in L&H-risk individuals compared to the total sample (p-value: < 0.001), which indicate a higher feasibility of the PASAP-ILERVAS in those patients. In L&H-risk individuals, all parameters were similar in both sexes. The NPV and the PPV were higher than 71%, which indicated that PASAP-ILERVAS algorithm correctly identified patients with SA (true positives) or without SA (true negatives). The external validation results were limited by the low prevalence of SA in that cohort. However, the accuracy was high in both sexes (men: 0.782; women: 0.750), indicating a high percentage of correct predictions. Thus, patients with an estimated low-or high-risk are very accurately classified by the algorithm and are not candidates for vascular ultrasound examination. On the contrary, patients in whom the prediction is not so accurate (intermediate-risk) are strong candidates to undergo a vascular ultrasound to truly determine whether they have SA in order to adjust treatments and/or follow-up visits. The NPV was extremely high (men: 0.898; women: 0.941). Thus, PASAP-ILERVAS showed a high ability in identifying truly healthy individuals.
The SCORE is a cardiovascular risk assessment tool that estimates the 10-year risk of fatal CVD (55). At the moment, there is no internationally accepted tool to predict subclinical atheromatosis, so new prediction methods have previously been compared to the SCORE in similar middle-aged asymptomatic cohorts (18,56,57). The comparison of the PASAP-ILERVAS with the SCORE model revealed a better performance in both sexes, showing a higher sensitivity in men, and a higher PPV, specificity, accuracy, and AUC in women. Importantly, the PASAP-ILERVAS improved patient reclassification in both sexes, although the improvement was much greater in women [NRI in men: 0.044 (95% CI: 0.020-0.068); women: 0.193 (95% CI: 0.162-0.224)]. Thus, the new algorithm showed a higher percentage of correct predictions in L&H-risk women than the traditional SCORE, especially in healthy women (specificity: 0.753 vs. 0.457, p-value: < 0.001) and in those who screened positive and truly had severe atheromatosis (PPV: 0.712 vs. 0.565, p-value: < 0.001). Indeed, our algorithm showed a significant increase in the AUC of 0.074 (95% CI: 0.062-0.087, p-value: < 0.001) in women.
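For readers less familiar with reclassification measures, the sketch below computes a categorical NRI and an IDI for a new model against a reference model. It assumes the commonly used definitions (NRI as net upward reclassification among events plus net downward reclassification among non-events; IDI as the difference in discrimination slopes). The exact variant used in this study is not restated here, and the function names and toy inputs are hypothetical.

```python
from statistics import mean


def nri(y_true, old_class, new_class):
    """Categorical net reclassification index.

    y_true: 1 if the outcome (e.g., SA) is present, else 0.
    old_class / new_class: ordinal risk categories from the reference
    and new model (a higher number means higher predicted risk).
    """
    events = [(o, n) for t, o, n in zip(y_true, old_class, new_class) if t == 1]
    nonevents = [(o, n) for t, o, n in zip(y_true, old_class, new_class) if t == 0]
    up_e = sum(n > o for o, n in events) / len(events)
    down_e = sum(n < o for o, n in events) / len(events)
    up_ne = sum(n > o for o, n in nonevents) / len(nonevents)
    down_ne = sum(n < o for o, n in nonevents) / len(nonevents)
    # Moving up is correct for events; moving down is correct for non-events.
    return (up_e - down_e) + (down_ne - up_ne)


def idi(y_true, old_prob, new_prob):
    """Integrated discrimination index: change in discrimination slope."""
    ev_new = mean(p for t, p in zip(y_true, new_prob) if t == 1)
    ev_old = mean(p for t, p in zip(y_true, old_prob) if t == 1)
    ne_new = mean(p for t, p in zip(y_true, new_prob) if t == 0)
    ne_old = mean(p for t, p in zip(y_true, old_prob) if t == 0)
    return (ev_new - ev_old) - (ne_new - ne_old)


# Toy data, invented for illustration only.
y = [1, 1, 1, 1, 0, 0, 0, 0]
old_cls = [1, 0, 0, 1, 0, 1, 1, 0]
new_cls = [1, 1, 0, 1, 0, 0, 1, 0]
old_p = [0.6, 0.4, 0.3, 0.7, 0.2, 0.6, 0.5, 0.1]
new_p = [0.8, 0.6, 0.4, 0.7, 0.1, 0.4, 0.5, 0.1]
```

In this toy example, one of four events moves up and one of four non-events moves down, giving an NRI of 0.5, while the IDI is 0.2. Positive values of either index favor the new model over the reference.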
Several limitations should be considered when interpreting this study. First, results are based on a cross-sectional analysis from the ILERVAS study cohort. However, the inclusion of history factors, such as hypertension and dyslipidemia, partially overcomes this limitation. Second, the ILERVAS cohort consists of middle-aged asymptomatic participants with relatively homogeneous socioeconomic, lifestyle, and ethnic characteristics. However, having a homogeneous population increases internal validity, can unveil hidden clinical associations, and is not uncommon (45,58). Even though the PASAP-ILERVAS was externally validated in another cohort, further analyses are needed to support the extrapolation of these data. Third, the external validation cohort differs from the derivation cohort not only in size (< 10% of individuals) and time period, but also in the prevalence of subclinical atheromatosis and even in the baseline characteristics of the patients. Fourth, the levels of glycosylated hemoglobin (HbA1c) can be used to identify asymptomatic individuals at higher risk of subclinical atherosclerosis on top of traditional cardiovascular risk factors (58). Unfortunately, we could not study the contribution of HbA1c in the PASAP-ILERVAS due to a high percentage of missing values in the external validation cohort. Finally, recursive partitioning may produce overly complex trees that overfit the data. This issue was addressed by a post-pruning technique based on the complexity parameter (CP), which captures the trade-off between tree complexity and how well the tree fits the data.
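The trade-off controlled by the CP can be illustrated with a small sketch of the cost-complexity criterion. It assumes the scaled form described in the rpart documentation, in which a subtree with risk R(T) and a given number of splits is scored as R(T) + CP x splits x R(root). The candidate subtrees and their risks are invented, the function names are hypothetical, and the sketch does not reproduce how the CP value was chosen in this study.

```python
def penalized_risk(risk: float, n_splits: int, root_risk: float, cp: float) -> float:
    """Cost-complexity score: misclassification risk plus a penalty per split.

    The penalty is scaled by the risk of the root (no-split) tree, so that
    cp is unitless (assumed to match rpart's parameterization).
    """
    return risk + cp * n_splits * root_risk


def best_subtree(candidates, cp: float):
    """Pick the (n_splits, risk) candidate with the lowest penalized risk.

    candidates must include the root tree as the entry with n_splits == 0.
    """
    root_risk = next(risk for n_splits, risk in candidates if n_splits == 0)
    return min(
        candidates,
        key=lambda c: penalized_risk(c[1], c[0], root_risk, cp),
    )


# Invented nested subtrees: (number of splits, resubstitution risk).
subtrees = [(0, 0.40), (1, 0.30), (3, 0.24), (7, 0.22)]
```

With these invented values, a small CP (0.01) retains the largest tree with 7 splits, a moderate CP (0.05) selects the 3-split tree, and a large CP (0.30) collapses the model to the root. Larger CP values therefore yield simpler trees at the cost of a poorer fit to the training data.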
In contrast, our study has several strengths. First, the study population was randomized, and stratified sampling was performed from primary care records to reduce selection bias and to obtain a cohort representative of the entire province. Second, unlike recent atheromatosis prediction algorithms, a sex-stratified analysis was performed to unveil clinical differences between men and women. Third, rpart is a non-parametric ML method that can handle highly skewed data, does not require data categorization, and generates an easy-to-interpret graphical representation, which is very convenient in daily clinical practice. Finally, the algorithm combines clinical history data, anthropometrical measurements, and affordable biochemical parameters that can be easily obtained in non-hospital settings, such as primary care centers and even pharmacies. Blood biochemical parameters were obtained by dried blood spot tests, which are well-validated methods.

CONCLUSION
We developed a personalized, sex-specific, easy-to-interpret model for severe atheromatosis prediction in asymptomatic middle-aged individuals. The PASAP-ILERVAS predicted SA accurately in L&H-risk patients. However, in intermediate-risk individuals, a vascular imaging exploration would be recommended. Thus, the present algorithm could reduce the number of unnecessary complementary explorations by selecting candidates for a further imaging study, increasing cost-effectiveness and optimizing health resources.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by CEIC Hospital Universitario Arnau de Vilanova. The patients/participants provided their written informed consent to participate in this study.