Developing a Hybrid Risk Assessment Tool for Familial Hypercholesterolemia: A Machine Learning Study of Chinese Arteriosclerotic Cardiovascular Disease Patients

Background Familial hypercholesterolemia (FH) is an autosomal-dominant genetic disorder with a high risk of premature arteriosclerotic cardiovascular disease (ASCVD). Many risk assessment tools exist, for example, the Dutch Lipid Clinic Network (DLCN) criteria, although their sensitivity and specificity vary among specific populations. We aimed to assess the risk discovery performance of a hybrid model consisting of existing FH risk assessment tools and machine learning (ML) methods in Chinese patients with ASCVD. Materials and Methods In total, 5,597 primary patients with ASCVD were assessed for FH risk using 11 tools. The three best performing tools were hybridized through a voting strategy. ML models were then fitted to the hybrid results to create a hybrid FH risk assessment tool (HFHRAT). Partial dependence plots (PDP) and individual conditional expectation (ICE) were adopted to interpret black box features. Results After hybridizing the mDLCN, Taiwan criteria, and DLCN, the HFHRAT was built as a stacking ensemble (AUC_class[94.85 ± 0.47], AUC_prob[98.66 ± 0.27]). The interpretation of the HFHRAT suggests that patients aged <75 years with LDL-c >4 mmol/L were more likely to be at risk of FH. Conclusion The HFHRAT provides a median of the three tools, which could reduce the false-negative rate associated with existing tools and help prevent the development of atherosclerosis. The hybrid tool could satisfy the need for a risk assessment tool for specific populations.


INTRODUCTION
Familial hypercholesterolemia (FH) (1) is an autosomal-dominant genetic disorder with a high risk of premature arteriosclerotic cardiovascular disease (ASCVD). Underdiagnosis and indeed undertreatment create problems for patients with FH around the world (2). Unfortunately, the global rate of FH diagnosis is only approximately 1% (3), despite an estimated prevalence of 1:313 (4) in the general population. There is, however, increasing interest in FH due to growing concerns about the rising levels of cholesterol in diets (5). We know that early lipid-lowering therapies hinder the development of ASCVD (6); however, most cases are identified only after an ASCVD event. Underdiagnosis and undertreatment of FH are partially due to the fact that we do not have an effective gold standard to identify high-risk patients at an early stage.
At present, genetic testing is the gold standard for FH (7). However, the high costs of genetic testing and counseling have not yet been covered by social medical insurance, which has limited their application for FH diagnosis in clinical practice (8).
To address these issues, researchers have developed phenotypic tools that integrate an assessment of clinical features, family history, and genetic test results. The Dutch Lipid Clinic Network (DLCN) (9), Simon Broome Register (SBR) (10), and "Make Early Diagnosis to Prevent Early Deaths" (MEDPED) are three commonly used tools in clinical practice (11). These three tools are advocated for clinical use in many different countries (12)(13)(14), although subtle differences within (and between) populations have been acknowledged.
The FH diagnosis rate is <0.1% in China (3). When the phenotypic tools were applied in mainland Chinese populations, the DLCN had low specificity and the SBR had low sensitivity (15). MEDPED is also complicated by difficulties in collecting family histories (16). Recently, researchers have devised novel tools for Chinese patients, mainly by modifying the DLCN model. Although these modifications account for the comparatively lower levels of LDL cholesterol in mainland Chinese populations or aim to overcome difficulties in collecting family histories, the relatively new modified DLCN tools have not performed as expected, for either domestic Chinese populations or international populations (17,18). For various heterozygous FH phenotypes, detection rates that use clinical measures and genetics cannot be significantly improved by simply raising the threshold of a single variable.
At present, prediction models for FH risk assessment based on datasets with many variables and large sample sizes have been established in both the United States and Europe. For example, the FAMCAT in the United Kingdom (19) and the SEARCH (20) and FindFH (21) algorithms in the United States rely on logistic regression models and have areas under the receiver operating characteristic curve (AUC) greater than 0.8. To further enhance model performance, machine learning (ML) models have been applied. The FindFH model was established using random forest analysis and was found to have an improved AUC of 0.89 (21). Further ML models based on gradient boosting machines, neural networks, and ensemble learning have reached AUCs of greater than 0.89, a substantial improvement over the standard logistic model (AUC = 0.81) (22). Even though ML algorithms have enhanced model performance, their "black-box" nature leaves the correlations and distribution of each variable unknown. Therefore, researchers in this field have generally preferred logistic regression-based models to ML models. However, as interpretable ML methods emerge, this limitation is being addressed. Indeed, interpretable ML methods have revealed non-linear correlations in diverse applications, including biomethane production (23) and gut microbiome features in type 2 diabetes (24).
Additionally, hybridizing existing FH risk assessment tools could achieve higher clinical and genetic detection rates than unadjusted, single tools (25). This will also help establish tools for specific populations. Therefore, we aimed to assess the risk discovery performance of a hybrid model, which integrated existing FH risk assessment tools with ML methods, based on a Chinese ASCVD patient sample.

Study Population
Data from 6,208 patients with an ASCVD diagnosis at first discharge were collected from January 2012 to June 2020 at Peking Union Medical College Hospital (PUMCH). Data were then deidentified for anonymity. Cases with the following characteristics were excluded: (a) those without an LDL-c reading; (b) those with a history of ASCVD; (c) those without a total cholesterol (TC) record; and (d) those younger than 18 years, given the different criteria for children.
In total, 5,597 cases made up the final dataset for criteria assessment and model development. Figure 1 gives further details of the eligibility process. The included patients, all with first-ever ASCVD, had an average age of 63.02 ± 11.44 years. Of these, 71.34% (n = 3,993) were men, and the average BMI was 25.46 ± 3.32 kg/m². Only 0.11% (n = 6) had tendon xanthomata. Overall, 24.71% (n = 1,383) had been diagnosed with premature coronary heart disease. We divided the patients into four groups according to FH likelihood as assessed by the DLCN. The average age of the groups decreased with rising FH likelihood; for example, patients with definite FH were 42.41 ± 9.19 years old on average. The proportion of patients diagnosed with premature coronary atherosclerotic heart disease rose with FH likelihood, reaching 100% among patients with definite FH. Further characteristics of the 5,597 participants are provided in Table 1.
Peking Union Medical College Hospital approved the study protocol, and all participants were provided with information regarding the objectives of the study and provided formal consent to participate.

Hybrid Risk Assessment Tools
To establish an FH tool for the mainland Chinese population, we selected several tools to hybridize, using the following three steps: (I) We selected the most frequently used tools in both Chinese and international populations. These included 11 risk assessment tools that incorporated varied items and cutoff values. The items fall under five headings, namely, lipid levels, physical examination, family history, clinical history, and genetic test. We have provided each of the factors included in the 11 tools in Table 2.
[Table 2, partial column headings recovered from the layout: MEDPED [3], JFHMC [4], LDL-C/TC [5], AHA [6], SCCFH [7], Lp(a)+DLCN [8], mDLCN [9], TW [10], CHC [11]; row groups include lipid levels and physical examination.]
Details for specific items have also been provided in Supplementary Material Table 1. Lipid levels and physical examination are the two most common factors. Genetic testing was included in only five of the 11 tools but was not considered essential in any. (II) We then selected candidate tools for the hybrid model.
To assess tool performance, the 11 tools were assessed using the DLCN as the reference. As a four-level criterion, the DLCN assigns "probable" and "definite" patients to a high-risk group; the remainder were considered low risk. Two of the 10 remaining risk assessment tools were selected for high sensitivity and specificity. Three variables had to be defined before the assessment: untreated LDL-c, BMI, and the age cutoff for premature disease. Data on lipid-lowering treatments and the duration of interventions were recorded. Lipid-lowering treatments mainly included statins, ezetimibe, niacin, and fibrates. Based on dosage, the treatments could be divided into three levels: high, medium, and low potency. Evidence of lipid-lowering interventions allows LDL-c to be adjusted (26), which is necessary because most tools were assessed in patients whose LDL-c was previously untreated. Weight and height were used to calculate body mass index (BMI) where necessary. Medical history was reviewed to identify premature cases under two definitions: the DLCN definition (men < 55 years; women < 60 years), giving premature coronary heart disease (pCHD), and the Taiwan criteria (TW) (27) definition (men < 45 years; women < 55 years), giving pCHDTW. Similarly, a family history of premature coronary heart disease was defined using either the DLCN cutoffs (men < 55 years; women < 60 years; pCHD_fh) or the TW cutoffs (men < 45 years; women < 55 years; pCHD_fhTW). Both definitions were included to determine which correlated significantly with our results. (III) We hybridized the selected tools. We combined the two aforementioned tools with the DLCN using a voting strategy (28). To discover high-risk patients, the hybrid result (HYR) grouped "possible," "probable," and "definite" FH patients as high risk, giving a two-category outcome.
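The derived variables described above (BMI from weight and height, and the two sets of "premature" age cutoffs) can be illustrated with a minimal Python sketch. The function and dictionary names are hypothetical and are not taken from the study; only the cutoff ages come from the text, and the standard BMI formula (weight in kg divided by height in m squared) is assumed.

```python
# Age cutoffs (years) for "premature" disease, as quoted in the text.
# Onset must be strictly younger than the cutoff.
PREMATURE_CUTOFFS = {
    "DLCN": {"male": 55, "female": 60},
    "TW": {"male": 45, "female": 55},
}


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2 (standard formula, assumed here)."""
    return weight_kg / height_m ** 2


def is_premature(age_at_onset: float, sex: str, criteria: str) -> bool:
    """True if onset age falls below the cutoff of the chosen criteria."""
    return age_at_onset < PREMATURE_CUTOFFS[criteria][sex]


# A man with onset at 50 is premature under the DLCN but not under TW.
pchd = is_premature(50, "male", "DLCN")
pchd_tw = is_premature(50, "male", "TW")
```

The same helper would serve for the family-history variants (pCHD_fh, pCHD_fhTW) by passing the relative's onset age and sex.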

Establish the Hybrid Familial Hypercholesterolemia Risk Assessment Tool With Machine Learning Algorithms
To establish a predictive model from a dataset with many variables and a large sample size, ML models were built both to serve as an FH risk predictor and to avoid personal biases when multiple tools are applied. Establishing the predictive model involved three steps.

Variable Selection
The first step in this process was variable selection. For this, 33 variables were collected directly through the electronic medical record (EMR) system. Variables were then divided into four categories, namely demographics, patient histories, laboratory examinations, and family histories. Patient and family history variables were based on the items from the 11 frequently used risk assessment tools. Demographics comprised age and gender, and the laboratory indexes included TC, LDL-c, HDL-c, and Lp(a). After the dataset was stratified by HYR into two subgroups, variables were screened by univariate analysis and by four variable selection methods: the Lasso (29), elastic net (30), random forest (RF) (31), and logistic regression. Variables used for the ML model setting were those selected by more than two of the four aforementioned methods.
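The consensus rule described above (keep a variable when more than two of the four methods select it) can be sketched as follows. This is an illustrative sketch only: the function name and the example selections are hypothetical, and how each method decides that a variable is "selected" (for example, a non-zero Lasso coefficient) is not specified here.

```python
from collections import Counter


def consensus_variables(selected_by_method, min_methods=3):
    """Return variables chosen by at least `min_methods` selection methods.

    `selected_by_method` maps a method name to the variables it selected.
    "More than two of four" corresponds to min_methods=3.
    """
    counts = Counter(
        var for chosen in selected_by_method.values() for var in set(chosen)
    )
    return sorted(var for var, n in counts.items() if n >= min_methods)


# Hypothetical selections, for illustration only.
example = {
    "lasso": ["ldl_c", "age", "tc"],
    "elastic_net": ["ldl_c", "age", "lp_a"],
    "random_forest": ["ldl_c", "tc", "age"],
    "logistic": ["ldl_c", "lp_a"],
}
kept = consensus_variables(example)
```

With these made-up inputs, `ldl_c` (four votes) and `age` (three votes) are kept, while `tc` and `lp_a` (two votes each) are dropped.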

Machine Learning Model for Familial Hypercholesterolemia Risk Prediction
The second step in this process was the FH risk predictor step, which included establishing the Hybrid FH Risk Assessment Tool (HFHRAT). HYR was set as the outcome, with four ML methods taken as the main structure: extreme gradient boosting (XGBoost) (32), RF (33), support vector machine (SVM) (34), and back-propagation artificial neural network (BPANN). Their performance was then compared with that of logistic models. We added Adaboost (35) and stacking (36) ensemble learning methods to handle imbalanced data and improve model performance. Adaboost is a homogeneous ensemble learning method that repeatedly resamples from the dataset to construct each new model based on the performance of the previous one; the results of the several models, all built with the same ML method, are then integrated. Stacking is a heterogeneous method that links the results of base-level models set by different methods and feeds them into the meta-level model as independent variables. In this study, the base-level models included RF, SVM, and BPANN, with the logistic model as the meta-level model. In total, six indexes assessed the performance of classifiers, including accuracy, sensitivity, precision-recall F measure (F), AUC, root-mean-squared error (RMSE), and G-mean value.
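Several of the classifier indexes listed above follow directly from a confusion matrix. The sketch below uses common textbook definitions, which are assumed here because the text does not give formulas: F as the harmonic mean of precision and recall, and G-mean as the geometric mean of sensitivity and specificity. The function name and the counts are hypothetical.

```python
import math


def classifier_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, sensitivity, specificity, F measure, and G-mean."""
    sensitivity = tp / (tp + fn)          # recall on the high-risk class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Made-up counts, for illustration only.
metrics = classifier_metrics(tp=80, fn=20, fp=10, tn=90)
```

G-mean is often reported alongside accuracy on imbalanced data because it drops sharply when either class is poorly recognized.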
To simplify the use and explanation for participants, the probability model was also established using five indexes for performance assessment, consisting of AUC, RMSE, mean of calibration error (CAL mean) (37), and Brier scores (BS) (38). The best performing model in both classification and probability predictions was chosen as the final identifier. Because ML classifiers do not generate well-calibrated probabilities during the classification process, isotonic regression (39) was added to recalibrate the probabilities.
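To make the recalibration step concrete, the following is a minimal pure-Python sketch of isotonic regression via the pool-adjacent-violators algorithm, together with the Brier score. It is not the study's implementation, which is not described in the text; the function names and toy data are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def isotonic_fit(scores, labels):
    """Pool-adjacent-violators fit of labels on scores.

    Returns the scores in ascending order and the fitted, non-decreasing
    calibrated values in the same order.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, number of points]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block mean exceeds this one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


# Toy scores and outcomes, for illustration only.
raw = [0.1, 0.2, 0.3, 0.4]
outcome = [0, 1, 0, 1]
sorted_scores, calibrated = isotonic_fit(raw, outcome)
```

Here the Brier score is evaluated on the same points used for fitting, so the improvement is optimistic; in practice calibration would be assessed on held-out data.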

Correlations in the Hybrid Familial Hypercholesterolemia Risk Assessment Tool
During the final step, we interpreted the HFHRAT. ML models are not best suited to describing relations between independent and dependent variables. Therefore, we implemented interpretable machine learning methods, namely individual conditional expectation (ICE) and partial dependence plot (PDP), which are two model-agnostic methods (40). The ICE and PDP were applied to provide explanations that were not bound to model settings (41). The ICE displays the predictions for each sample as a single line, while the PDP shows the marginal effect of one independent variable on the predicted value of the ML model, obtained by fixing the chosen variable at each value and averaging over the other variables. The PDP line is the average of the ICE lines. The whole process of setting identifiers is presented as a schematic diagram in Figure 2.
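The relation between ICE and PDP described above can be shown with a small model-agnostic sketch. The `toy_model` below is a hypothetical stand-in for a fitted predictor and its coefficients are invented; only the procedure (vary one feature over a grid per sample, then average the per-sample curves) reflects the text.

```python
def ice_curves(predict, rows, feature, grid):
    """One prediction curve per sample, varying only `feature`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the sample is unchanged
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def pdp_curve(curves):
    """Partial dependence: pointwise average of the ICE curves."""
    return [sum(column) / len(curves) for column in zip(*curves)]


def toy_model(row):
    """Hypothetical risk score, not the HFHRAT."""
    return 0.2 * row["ldl_c"] - 0.01 * row["age"]


samples = [{"age": 50, "ldl_c": 3.0}, {"age": 70, "ldl_c": 5.0}]
grid = [3.0, 4.0, 5.0]
ice = ice_curves(toy_model, samples, "ldl_c", grid)
pdp = pdp_curve(ice)
```

Because the procedure only calls `predict`, it applies unchanged to any fitted model, which is what makes these methods model agnostic.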
To discover additional changes these hybrids present, we have compared all 33 variables for high-risk group predictions

Tool Selection
Tools were assessed according to performance using the final dataset and were compared with the DLCN using five clinical measures. According to the DLCN assessment, dichotomized into two levels (high and low risk), the FH prevalence in our sample was 2.57%. To ensure consistency, each of the remaining assessment tools analyzed here was also converted to two levels. Additionally, without genetic test results, the Japanese FH Management Criteria (JFHMC) could not predict the FH diagnosis and was therefore excluded because it could not effectively identify high-risk cases.
The existing tools were selected on two criteria, namely sensitivity and specificity, to ensure higher accuracy of the subsequent hybrid tool. In terms of sensitivity, both the mDLCN and the LDL-C/TC tool reached 97.22%. The latter assesses only laboratory test indexes, that is, LDL-C or TC, and yielded a high AUC (92.46%) but a slightly lower specificity (87.51%) than the mDLCN (AUC 95.06%; specificity 92.90%). In terms of specificity, TW and the American Heart Association (AHA) criteria reached 100%; however, the TW criteria had the highest sensitivity (41.54%) among the tools with 100% specificity. Therefore, the mDLCN and TW were the two selected tools. Table 3 summarizes the prevalence and performance of the 10 different criteria compared with the DLCN.

Tool Hybridization: Generating Hybrid Result
The Dutch Lipid Clinic Network (DLCN), modified DLCN for China (mDLCN), and TW diagnostic criteria are three 4-level tools. With a voting strategy, their HYR was also a 4-level tool. Overall, 4,285 ASCVD patients were categorized as unlikely FH cases according to all three tools. However, different cutoff values for each tool led to inconsistent risk-level assessments of a single sample; for example, a sample may be assessed by the DLCN as probable FH, by the mDLCN as definite FH, and by TW as possible FH. To discover all high-risk patients, we further grouped the HYR into two levels, with "possible," "probable," and "definite" FH cases treated as high risk, which means that such samples were identified as high-FH-risk patients. Another group of samples was assessed by the DLCN as possible FH, by the mDLCN as probable FH, and by TW as unlikely FH. Under the voting strategy, two of the three tools indicated high risk, so these samples were also identified as high-FH-risk samples. For details of the voting strategy and the hybridizing process, see Supplementary Material Table 2.
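The two worked cases above can be reproduced with a simplified voting sketch. This assumes a plain majority vote on binarized levels ("possible" or higher counts as a high-risk vote); the full 4-level rule is given in Supplementary Material Table 2 and may differ, so the function names and the rule itself are illustrative assumptions.

```python
LEVELS = ("unlikely", "possible", "probable", "definite")


def votes_high_risk(level: str) -> bool:
    """A tool votes high risk when its level is 'possible' or above."""
    return LEVELS.index(level) >= LEVELS.index("possible")


def hybrid_result(dlcn: str, mdlcn: str, tw: str) -> int:
    """Return 1 (high risk) if at least two of three tools vote high risk.

    Simplified assumption: majority vote after binarizing each tool.
    """
    votes = sum(votes_high_risk(level) for level in (dlcn, mdlcn, tw))
    return 1 if votes >= 2 else 0


# The two examples from the text, plus a unanimous low-risk case.
case_a = hybrid_result("probable", "definite", "possible")
case_b = hybrid_result("possible", "probable", "unlikely")
case_c = hybrid_result("unlikely", "unlikely", "unlikely")
```

Under this assumed rule, a sample flagged by only one tool would remain low risk.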

Hybrid Familial Hypercholesterolemia Risk Assessment Tool With Machine Learning Algorithms and Hybrid Result
The HYR combined the three existing tools into a robust single tool; however, with a crude voting strategy, the cutoff for every single item remained unknown. When the HYR is applied, clinicians have to learn several tools simultaneously, which may introduce personal biases into FH diagnosis. Therefore, the ML model was established based on the HYR, for easier application and to clarify correlations.

Variable Selection Based on Hybrid Result
The dataset was stratified according to HYR into high-risk and low-risk groups. In all, 33 variables that differed significantly between the groups were retained for variable selection, and 14 candidate variables were highlighted by more than two of the variable selection methods. LDL-c was identified as the most correlated variable by all methods, and the highest LDL-c during admission was included in the final model.
We included three variables identified in the DLCN and TW: history of premature CHD, premature stroke, and family history of premature CHD. To avoid problems caused by overlapping information, we also eliminated variables with smaller coefficients. Finally, nine variables were left for further model development; their coefficients under each variable selection method are displayed in Table 4. The results for the 33 variables are given in Supplementary Material Table 3, and the top 20 variables from the variable selection are given in Supplementary Material Table 4.

Machine Learning Algorithms for Familial Hypercholesterolemia Risk Prediction
The HFHRAT used the aforementioned nine variables as the independent variables and the HYR as the dependent variable. The HYR divided the predictor setting dataset into two groups: 1,112 high-risk participants (level 1) and 4,485 low-risk participants (level 0). The low-risk group was thus approximately four times larger than the high-risk group. We applied ensemble learning to handle problems caused by imbalanced data. Within 10-fold cross-validation, seven folds were randomly selected and combined to create the training dataset, and model performance was tested on the remaining three folds.
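The fold arrangement described above (ten folds, seven combined for training and three for testing) can be sketched in pure Python as follows. The function name and seed are hypothetical, and the split is not stratified by risk level; whether the study stratified its folds is not stated.

```python
import random


def seven_three_split(n_samples, n_folds=10, n_train_folds=7, seed=0):
    """Shuffle indices into folds, then pick folds for train and test."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    fold_ids = list(range(n_folds))
    rng.shuffle(fold_ids)
    train = [i for f in fold_ids[:n_train_folds] for i in folds[f]]
    test = [i for f in fold_ids[n_train_folds:] for i in folds[f]]
    return train, test


# Group sizes reported in the text.
n_high, n_low = 1112, 4485
imbalance_ratio = n_low / n_high   # roughly 4 low-risk per high-risk case
train_idx, test_idx = seven_three_split(n_high + n_low)
```

Repeating the call with different seeds would give different train and test partitions, from which a mean and standard deviation of each performance index could be computed.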
The HFHRAT is composed of the stacking models with the best performance in both classifier ( 1.03]). The performance of classifiers and probability models is given in Table 5.

The Interpretation of the Hybrid Familial Hypercholesterolemia Risk Assessment Tool and Efficiency Assessment for Tools
Partial dependence plots were used to explain correlations of age, LDL-c, and lipid-lowering therapies with FH risk. According to our dataset, 1,947 patients had not received any lipid-lowering medication before admission. In total, 312, 3,210, and 128 patients had received low-, medium-, and high-potency statins before admission, respectively.
We divided the dataset into four subsets according to lipid-lowering therapies and analyzed correlations. According to the PDP, the risk of FH showed an upward trend with LDL-c in all participants and in each subset. With increasing levels of lipid-lowering therapy, the risk of FH began to rise at lower LDL-c levels. The decreasing trend of FH risk with age changed little across the different levels of therapy; however, the FH risk was higher at higher levels of lipid-lowering therapy. The high-risk FH patients without lipid therapy were mostly patients aged <70 years with LDL-c > 4.5 mmol/L. As the level of lipid therapy increased, the ranges of age and LDL-c became more concentrated in patients younger than 75 years with LDL-c > 4 mmol/L. See Figure 3 for PDPs.
To further explore the effect of hybridization on the tools, we compared the features of the high-risk groups predicted by five tools, including the three existing tools, the HYR, and the HFHRAT. Based on the voting strategy, the HYR found the median of the three tools. With the ML algorithms, the HFHRAT followed the data distribution and further adjusted the median position by enlarging the difference between the high- and low-risk groups. Of all 5,597 participants, 1 Table 3.

DISCUSSION
The hybrid FH risk assessment tool is a novel risk assessment tool based on hybridizing diagnostic tools with machine learning. It was designed for early identification of FH risk, specifically for mainland Chinese populations. The HFHRAT adjusted the median of the existing tools and avoided the personal biases that can be a problem when applying hybridized tools. Despite the rather heterogeneous population, this represents a marked improvement and supports the need for a more accurate tool for mainland Chinese populations. It is hoped that this study will not only benefit those in China by identifying FH and preventing the development of ASCVD but will also help the global research community striving to improve risk assessment tools of this type.

Prevalence of Familial Hypercholesterolemia
In our research, the overall prevalence of FH (definite and probable) in patients with ASCVD was 2.57% based on the DLCN, of which definite FH accounted for 0.52% and probable FH for 2.05%. Although FH prevalence differs between regions (42) and even between studies (43), the prevalence of heterozygous FH in ASCVD has mainly ranged from 1.17 [95% CI, 1.12-1.24] to 4.88 [95% CI, 4.17-8.33] (44). This estimate is higher than that of our research (2.57%), because the range is based on a meta-analysis consisting mostly of studies in white individuals. Because of different dietary habits, Chinese patients exhibit lower levels of TC and LDL-C than patients in Western countries (8). This has changed slightly with economic growth and increases in population cholesterol levels (5), although levels have not yet reached those in Western countries. Studies including only Chinese patients report a prevalence similar to ours: the prevalence of probable and definite FH was 3.5% (definite 1.0% and probable 2.5%) in patients with coronary artery disease (CAD) undergoing coronary angiography (45) and 4.4% in patients with premature myocardial infarction (MI) (17). The prevalence may differ because of the proportions of local and non-local patients in the study population. Patients from specialty hospitals (in previous studies) have more typical clinical symptoms than those in general hospitals (in our research), and this difference may elevate the prevalence estimated through phenotypic tools. A study population from a general hospital may be more consistent with the natural distribution of FH patients in ASCVD. This consistency may contribute to the robustness of the model, although further research is needed to confirm this. Additionally, our participants had first-ever ASCVD, and those with recurrent ASCVD were excluded. As patients with FH are at a high risk of recurrent cardiovascular events (46), this exclusion may have lowered the prevalence.

Hybrid Diagnosis Tool
The diagnosis of the study population originated from the hybrid result of the three existing tools. Each of them is applicable to diagnosing both homozygous and heterozygous FH. Homozygous FH is an orphan disease, with a low prevalence of 1 in 300,000 individuals (47). Homozygous FH patients have very high LDL-C levels from birth, accelerated arterial stenosis and atherosclerosis, and premature death in their juvenile stage (48) due to myocardial infarction/acute coronary insufficiency. Our tools were built based on patients older than 18 years. Although some patients with homozygous FH may undeniably have been included, their number is likely to be very limited. According to the prevalence, the individuals used for modeling in our research were mainly patients with heterozygous FH. Our tools are therefore for heterozygous FH likelihood assessment. We plan to recruit more patients with homozygous FH specifically in further research.
For heterozygous FH diagnosis, the DLCN, SBR, and MEDPED perform reasonably well, although they compromise a certain amount of either sensitivity or specificity. To satisfy the needs of different regions, these tools have been modified in three ways: elevating or downgrading levels of existing items, adding new items, and assessing the importance of each item using new statistical algorithms. Most of the previous modifications are in the first category, for example, adjusting the LDL-c level in the DLCN for Canadians (49) and Chinese (15,27), while LDL-c measures are superior for Italians (50). In the second category, whether lipoprotein(a) should be included in the criteria has long been discussed (51). Lp(a)+DLCN (52) has put this into practice; however, it had very little effect on our participants. Logistic regression and machine learning methods (53) are the main algorithms involved in the third category. The FindFH (21) and the FH prediction model (FAMCAT) (54) are both logistic regression tools based on clinical records from US and UK patients, respectively. Both were improved using random forest analysis (22). For the Chinese population, even the most widely used tool, the DLCN, has long been deemed unsuitable (55), because lipid levels in Chinese patients are usually lower than those observed in Western societies. The discrepancy among the different tools in a new dataset mainly depended on the different cutoff levels for premature onset or the lipid indexes (11). By assessing the FH risk of each participant with several different existing risk assessment tools and then considering their results together, the hybrid diagnosis tool could provide solutions to mitigate FH risk.
The 10 existing FH risk assessment tools we selected were from the former two categories and were used to estimate the FH risk of each participant in our dataset, which showed their consistency with and differences from the DLCN. In terms of consistency, two modified DLCN criteria developed specifically for Han Chinese remained, namely the mDLCN and TW. The mDLCN has higher sensitivity than the DLCN and the genetic test (40); however, low specificity has been its shortcoming. This mostly resulted from its modification of the LDL-c cutoff: LDL-c ≥ 8.5 mmol/L corresponds to a score of 8 in the DLCN, but the same score in the mDLCN corresponds to LDL-c ≥ 6 mmol/L. Therefore, the risk predicted using the mDLCN is higher than it ought to be for some patients. The TW criteria also performed with high sensitivity in contrast to the genetic test when using a Han Chinese dataset (56). However, although the TW had the highest sensitivity (41.54%) among the criteria with the highest specificity (100%) in our dataset, this is lower than needed to reduce the number of false negatives. This mostly resulted from a modification to the cutoff for premature onset in participants and their first-degree relatives, which changed from <55 years in men using the DLCN to <45 years using the TW, and from <60 to <55 years in women. This stricter cutoff excluded high-risk participants whose lipid-related disease was not discovered early. This means that, at present, we cannot define a unified cutoff for mainland Chinese populations. We combined these tools using the voting strategy and named the result HYR. The HYR takes the median of the aforementioned three tools and improves tool performance, although it introduces more personal bias in application and the exact cutoff value of each variable remains unknown.

Establishing the Hybrid Familial Hypercholesterolemia Risk Assessment Tool
To eliminate personal bias and obtain cutoff values, multivariable models were built as classifier and probability prediction models using both the hybrid diagnosis and machine learning algorithms; the resulting tool was named the HFHRAT. The best performing model was built with the stacking ensemble algorithm and nine variables. Its classifier AUC was 94.85 ± 0.47%, comparable to that of previous ML models based on participants from other regions (22), which ranged from 89 to 90%. All ML models performed better than the traditional logistic model.
Regarding the number of variables in the model, the FindFH model (21) for the U.S. population was a random forest model that reached an AUC of 89% with 75 variables, while another random forest model (57) based on the same dataset used 20 variables and reached an AUC of 94%. These three models also suggest that a larger number of variables does not guarantee better model performance. Using fewer variables for accurate predictions is easier and more likely to be accepted by clinicians. Our prediction model consisted of nine variables, whose relations with FH risk have been assessed in previous research and which mainly correspond to items mentioned in clinical guidelines. In addition to the dyslipidemia-related variables, personal and family histories (58) also affect the detection of patients at risk for FH. Tendon xanthomata (59) are an essential feature in FH diagnosis, although they are not often seen in patients from mainland China.

Cutoff Values for Variables in the Hybrid Familial Hypercholesterolemia Risk Assessment Tool and Assessments
Most previous classifiers preferred tree models for easier interpretation, which may compromise the accuracy of the model. The stacking ensemble learning model performed the best in our research; compared with tree models, its black box feature would have been a disadvantage before interpretable machine learning became available. As a data-derived method, machine learning derives the correlations in its models mostly from the training dataset directly, and these can hardly be derived from domain expertise, which brings potential pitfalls, such as unwanted confounding or interaction effects. The black box feature is the result of these effects. With two interpretable machine learning methods, PDP and ICE, the correlations in the model were unveiled and assessed for whether they corresponded with natural laws. We have unveiled the correlations of age, LDL-c, and the level of lipid-lowering therapy with the FH risk assessment. Given a normal LDL-c range of 2.07 to 3.11 mmol/L, participants with LDL-c ≥ 5.0 mmol/L can be defined as possible FH by the DLCN and TW, as can those with LDL-c ≥ 3.5 mmol/L by the mDLCN. In the HFHRAT, the LDL-c cutoff was adjusted through the voting strategy and machine learning methods using the LDL-c levels in the three existing risk assessment tools, namely the DLCN, mDLCN, and the Taiwan FH diagnostic criteria (TW). The LDL-c variable in the HFHRAT machine learning model is the highest low-density lipoprotein cholesterol recorded during the first-ever ASCVD admission. According to the PDP, participants with untreated LDL-c > 4.5 mmol/L and without any lipid-lowering therapy are possible FH cases, which is the median of the aforementioned three tools. The PDP and ICE also provided a first estimate of the LDL-c level of participants at various levels of lipid-lowering therapy.
The univariate analysis in Supplementary Material Table 3 shows the effect of the five tools. In this comparison, the raw tools, the tool with the hybrid diagnosis, and the tool with both the hybrid diagnosis and ML algorithms were compared with each other. Regarding the number of participants in the high-risk group, the HFHRAT identified 1,116 participants, which is lower than the mDLCN (n = 1,234) and higher than the DLCN (n = 1,076), TW (n = 959), and modified HYR (n = 1,112). The HYR found the "median" position among these tools, while the HFHRAT adjusted this "median" position based on the data distribution, a consequence of the features of the machine learning algorithms. For example, lp(a) is 199.21 ± 226.79 mg/L in the high-risk group of HYR, which is lower than that of the TW (201.52 ± 230.90 mg/L) but higher than that of the DLCN (198.39 ± 224.20 mg/L) and mDLCN (198.39 ± 225.05 mg/L). Although lp(a) is 198.04 ± 228.51 mg/L in the high-risk group of HFHRAT, so that the modification lowered the lp(a) value, high lp(a) still has a good effect on FH risk assessment (52). However, its effect may be masked when it is treated as a continuous variable (60), and because its normal cutoff value has not yet been generally agreed upon, it can hardly be converted into a categorical variable.
The hybrid FH risk assessment tool is a machine learning model based on the HYR, which is a majority testing result (also named voting strategy in biostatistics) of three tools: DLCN, mDLCN, and TW. The performance of each tool can be tested by sensitivity and specificity. Majority testing avoids the trade-off between the two and results in relatively high overall values for both test characteristics (61). In this case, HYR is more robust than DLCN. However, majority testing requires more independent tools that have been validated in several large population cohorts. Such a need can hardly be satisfied, because the existing FH risk assessment tools for specific Chinese populations are limited and lack external validation. Therefore, the data-derived machine learning model, HFHRAT, was built to adjust the sensitivity and specificity through the data distribution. For example, the average age of patients in level 1 was lower than in level 0. However, the average age of level 1 as defined by the DLCN was 57.54 ± 12.42 years, the oldest average among all five tools (mDLCN 56.54 ± 12.18 years; TW 56.93 ± 13.41 years; HYR 56.91 ± 12.63 years; and HFHRAT 57.02 ± 12.70 years). As for premature CHD prevalence, 48.3% (n = 520) of the level 1 group defined using DLCN had premature CHD [mDLCN 669 (54.2%), TW 437 (45.6%), HYR 556 (50.0%), and HFHRAT 543 (48.7%)]. Level 1 defined using HFHRAT may be higher in age, but this approach struck a compromise on the prevalence of premature CHD and several other specific clinical features of patients with FH. This gives HFHRAT unique potential in FH likelihood detection and highlights its effectiveness in early diagnosis. Nevertheless, confirming FH still requires longer patient follow-up and genetic testing. Other variables, compared between levels 1 and 0 as defined by each of the five risk assessment tools, are provided in Supplementary Material Table 3.
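The majority testing (voting) step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `majority_vote` and `sensitivity_specificity` are hypothetical, the binary flags for the three tools are made up, and the reference labels are invented for the example, since this study did not include genetic confirmation.

```python
from typing import List, Sequence, Tuple


def majority_vote(*tool_flags: Sequence[int]) -> List[int]:
    """Flag a participant as high risk (1) when more than half of the tools do."""
    n_tools = len(tool_flags)
    return [int(2 * sum(votes) > n_tools) for votes in zip(*tool_flags)]


def sensitivity_specificity(predicted: Sequence[int],
                            reference: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary flags against reference labels.

    Assumes the reference contains at least one positive and one negative.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up flags from three tools for four participants (1 = high risk).
dlcn = [1, 0, 1, 0]
mdlcn = [1, 1, 0, 0]
tw = [0, 0, 1, 0]

hyr = majority_vote(dlcn, mdlcn, tw)  # hybrid result from the vote

# Hypothetical reference labels, invented for illustration only.
reference = [1, 0, 1, 0]
sens, spec = sensitivity_specificity(hyr, reference)
print(hyr, sens, spec)
```

With an odd number of tools a tie cannot occur, which is one practical reason to vote over three tools rather than two or four.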

Study Limitations
While this study has some advantages and adds to the evidence base, it has a few limitations. First, the 5,597 participants were all from a single center. This is likely to have skewed the data to some extent, although further biases are introduced when analyzing multi-centric datasets. However, the final identifier (which is provided in the Supplementary Material) could be used to assess further datasets. Second, the results of genetic testing have not been included in this study. The HFHRAT aimed at identifying patients with a high FH likelihood so that they could be advised on early genetic tests and lipid-lowering therapies. The participants identified as high risk required further confirmatory testing in line with clinical guidelines. Third, the PDP made predictions for each sample according to the others, which may further confound findings. Large-scale studies analyzing age-stratified groups might help overcome these biases.
In conclusion, the HFHRAT, intended for early FH diagnosis in the Chinese population, is a stacking ensemble learning model built on the hybrid diagnosis (HYR), a voting strategy that combines three tools to avoid discrepancies among the DLCN, mDLCN, and TW. The mDLCN and TW were selected for consistency with the DLCN, which is most commonly used in clinical practice. The PDP and ICE revealed that Chinese participants younger than 75 years with untreated LDL-c > 4.5 mmol/L were at risk of FH. We would encourage these participants to assess their FH risk through the HFHRAT, and those at high risk should further receive genetic testing in addition to lipid-lowering therapy. We envisage that this risk assessment tool could perform efficiently in FH diagnosis in China and reduce the development of ASCVD and associated deaths.

DATA AVAILABILITY STATEMENT
The original contributions presented in this study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, China. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SZ, LW, and JG designed the work, while ZT and YJ provided valuable advice on design and interpretation. LW and JG collected and interpreted the dataset. LW wrote the code and put it into practice. SS helped to write and edit this report. All authors contributed to the article and approved the submitted version.