
ORIGINAL RESEARCH article

Front. Reprod. Health, 09 October 2025

Sec. Menopause

Volume 7 - 2025 | https://doi.org/10.3389/frph.2025.1670141

This article is part of the Research Topic: Impact of Diminished Gonadal Function on Metabolic and Cardiovascular Health

Precision prediction of hyperhomocysteinemia development in perimenopausal women using LASSO regression


Xuan Tan1,2, Mingqi Li1, Jie Wang1, Yiwei Peng1,2, Liwen Zhu1,2, Na Jiang1,2, Ling Li1* and Xiuqin Hong1,2*
  • 1Clinical Epidemiology Research Office, Hunan Provincial People’s Hospital (The First Affiliated Hospital of Hunan Normal University), Changsha, China
  • 2Key Laboratory of Molecular Epidemiology, Hunan Normal University, Changsha, China

Background: Hyperhomocysteinemia (HHcy) is associated with an increased risk of cardiovascular diseases, particularly in perimenopausal women, who are more susceptible to metabolic disorders due to declining estrogen levels. This study aimed to identify risk factors and develop a predictive model for HHcy in this population.

Methods: This retrospective study included 687 perimenopausal women, divided into a training set (n = 481) and an internal validation set (n = 206). Demographic characteristics, pregnancy-related factors, lifestyle, and dietary information were collected by questionnaire. A further 63 perimenopausal women hospitalized from March to June 2025 formed the external validation set. Least absolute shrinkage and selection operator (LASSO) regression was used to select variables. A logistic regression model was then developed to predict HHcy risk, with results visualized using a nomogram. Model performance was evaluated using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA).

Results: Of the 687 perimenopausal women, 137 (19.94%) had HHcy. LASSO regression followed by multivariate logistic regression identified four predictors, which were used to construct the nomogram: egg consumption frequency, low-density lipoprotein (LDL), total protein (TP), and cystatin C (CysC). The area under the ROC curve (AUC) was 0.765 (95% CI = 0.708–0.822) in the training set, 0.854 (95% CI = 0.781–0.928) in the internal validation set, and 0.776 (95% CI = 0.603–0.949) in the external validation set, indicating good predictive performance.

Conclusion: The nomogram demonstrated high predictive accuracy and clinical utility, providing a potential tool for HHcy risk prediction and selection of treatment strategies in perimenopausal women.

1 Introduction

Perimenopause, or the menopausal transition, refers to the period during which physiological changes signal the progression toward a woman's final menstrual period. This phase begins with the onset of menstrual disorders and continues until a woman enters menopause, or one year after amenorrhea occurs (1). Women during this period are more susceptible to certain health risks due to declining estrogen levels, and one of the important health risks is hyperhomocysteinemia (HHcy) (2). As a blood biochemical predictor, elevated levels of homocysteine (Hcy) are closely associated with the risk of cardiovascular diseases and ischemic stroke (3, 4). Additionally, an evidence-based analysis confirmed that markedly elevated Hcy levels significantly increase the risk of developing type 2 diabetes (5). Studies have shown that postmenopausal women have an increased prevalence of HHcy (2)—a finding that aligns with the physiological changes characteristic of this stage and lays the groundwork for exploring HHcy risk in the perimenopausal transition.

Hcy levels are influenced by many factors, including genetic predisposition, dietary intake, and lifestyle. Genetic factors are pivotal, with hereditary defects in metabolic and methylation pathways contributing to increased Hcy levels. Notably, the polymorphism of the methylenetetrahydrofolate reductase (MTHFR) gene, specifically its 677TT genotype, is closely associated with Hcy levels (6). Dietary intake also significantly influences Hcy levels. Vitamins B12, B6, and folic acid are essential coenzymes in Hcy metabolism, and their plasma levels are inversely correlated with plasma Hcy concentrations. Insufficient intake of these nutrients results in elevated plasma Hcy levels. Additionally, a diet rich in methionine, which is characteristic of high animal protein and low vegetable protein intake, may contribute to HHcy. Age, underlying diseases, and medication use also modulate Hcy levels, with levels generally increasing with age. Comorbidities such as hepatic or renal impairment, and the use of contraceptives, antiepileptic drugs, diuretics, and other medications, can elevate Hcy levels (7). Additionally, our prior studies indicate that, in clinical practice, estradiol (E2) is a protective factor against HHcy.

Perimenopausal women exhibit unique physiological and hormonal fluctuations, and while their risk of developing HHcy is elevated, it also shows inter-individual heterogeneity. These characteristics lead to greater variability in predictive factors among perimenopausal women; therefore, a robust variable selection method such as Least Absolute Shrinkage and Selection Operator (LASSO)-logistic regression is needed to avoid model overfitting and ensure model stability. LASSO regression is a variable selection method proposed by the statistician Robert Tibshirani in 1996 (8). Compared with traditional regression methods, LASSO regression can handle a larger number of potential predictors and select the variables most relevant to the disease, which makes it an important tool for clinical screening of influencing factors. Given the critical need for early identification of HHcy risk in perimenopausal women and the current lack of specialized predictive tools, the objective of our study was to explore associated risk factors using LASSO regression and to develop a predictive model. This model was designed to integrate demographic characteristics, lifestyle, diet, pregnancy-related factors, and biochemical indicators to identify perimenopausal women at high risk of developing HHcy before they meet the full diagnostic criteria. The development of such a model represents a crucial step toward improving the management of perimenopausal women who may develop HHcy, potentially reducing morbidity through early intervention.

2 Materials and methods

2.1 Study subjects

Perimenopausal women hospitalized at Hunan Provincial People's Hospital from January to August 2021 were consecutively recruited and further divided into the development group and the internal validation group. Perimenopausal women admitted to the Department of Cardiology of Hunan Provincial People's Hospital from March to June 2025 were selected as the external validation group. The inclusion criteria were as follows: (1) participants were perimenopausal women (40–60 years old); (2) participants consented to peripheral vein blood collection; (3) participants or family members had signed an informed consent form. The exclusion criteria were as follows: (1) patients with severe hematologic diseases, cardiac, renal, or hepatic functional diseases, and malignant tumors; (2) patients with infectious diseases, such as hepatitis B and tuberculosis; (3) patients who had recently used medications affecting hormone levels, lipid levels, and Hcy levels, or had taken fibrinolytic anticoagulant medications; and (4) women during lactation and pregnancy. Finally, a total of 687 women from the 2021 study group were enrolled and divided into a control group (n = 550) and an HHcy group (n = 137); the external validation group (enrolled from March to June 2025) included 63 participants. The study was ethically approved by the Medical Ethics Committee of Hunan Normal University (approval numbers: 2017034 and 2024061), and all study participants gave informed consent.

2.2 Diagnostic criteria

HHcy was defined as a fasting plasma Hcy level of 15 μmol/L or higher (9). Smoking was defined as regularly smoking at least one cigarette per day for a continuous period of at least six months (10). Alcohol consumption was defined as the intake of alcohol at least once per week for a minimum of six months (10). Tea consumption was defined as the consumption of tea at least three times per week for a minimum of six months (10). Macrosomia was defined as a fetal weight exceeding 4,000 g at any gestational age (11). Gestational diabetes mellitus was defined as glucose intolerance that emerges or is first detected during pregnancy (12). Pregnancy-induced hypertension (PIH) was defined as hypertension without proteinuria, occurring for the first time after 20 weeks of gestation, with a systolic blood pressure (SBP) greater than 140 mmHg and a diastolic blood pressure (DBP) greater than 90 mmHg (13). Irregular exercise was defined as engaging in physical activity three times per week or fewer, while regular exercise was defined as more than three times per week. The frequency of meat consumption was classified as follows: “Never” indicated fewer than one meat meal per month, “Occasionally” indicated one to three meat meals per week, and “Often” indicated more than three meat meals per week.

2.3 Data collection

The demographic and lifestyle characteristics of the study participants were assessed using a questionnaire. The survey encompassed four primary domains: (1) demographic characteristics (including Age, Residential Area, Educational level, Menarche Age, Hypertension, and Menopause status); (2) pregnancy-related factors (including Age at First Birth, Macrosomia, PIH, and Gestational Diabetes Mellitus); (3) lifestyles (including Physical Activity, Smoking and Drinking habits, and Sleep Duration); and (4) diets (including consumption frequency of meat, eggs, vegetables, nuts, dairy products, fruits, soy products, alcohol, tea, and coffee). Body mass index (BMI) was calculated as weight (kg) divided by height squared (m²). We also collected the following biochemical indicators at admission: Triglycerides (TG), Total Cholesterol, Low-Density Lipoprotein (LDL), High-Density Lipoprotein (HDL), Alanine Aminotransferase, Aspartate Aminotransferase (AST), E2, Testosterone, Total Protein (TP), Albumin (ALB), Globulin (GLB), Albumin/Globulin ratio (A/G), Prothrombin Time (PT), Uric Acid (UA), Glomerular Filtration Rate (GFR), Serum Creatinine (Scr), Blood Urea Nitrogen, and Cystatin C (CysC).

2.4 Data pre-processing

The clinical dataset underwent rigorous cleaning, including the removal of outliers and imputation of missing values. Indicators with more than 20% missing values were excluded from the analysis. Random-Forest Multiple Imputation (14) was used to handle missing predictor values. This method was implemented using the “mice” R package: the dataset was imputed five times, and the average of these results constituted the final imputed values. The imputed dataset was then used for subsequent statistical analyses. The data were divided into a training set (70% of the data) and an internal validation set (30% of the data) for model training and evaluation. The classification model was trained on the training set, and its performance was assessed on the internal validation set and the external validation set.
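The 70/30 partition of the 687 records can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function name `split_indices`, the fixed seed, and the plain random shuffle (with no stratification by HHcy status) are assumptions, since the text does not state how the split was drawn.

```python
import random

def split_indices(n_records, train_fraction=0.7, seed=2021):
    """Randomly partition record indices into training and validation sets.

    The seed and the unstratified shuffle are illustrative assumptions.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)         # reproducible shuffle
    n_train = round(train_fraction * n_records)  # 70% of 687 rounds to 481
    return indices[:n_train], indices[n_train:]

train_idx, valid_idx = split_indices(687)
print(len(train_idx), len(valid_idx))  # 481 206
```

With 687 records, a 70% share rounds to 481, which leaves 206 for internal validation and matches the group sizes reported in the study.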

2.5 Statistical methods

Normally distributed quantitative variables were described as mean ± standard deviation and compared using the t-test. Non-normally distributed quantitative variables were described as median and interquartile range [M (P25, P75)] and compared using the Mann–Whitney U test. Qualitative variables were described as proportions (%) and compared using the chi-square (χ²) test or Fisher's exact test.

Predictor selection and regularization were conducted using LASSO regression analysis. The “glmnet” package in R was used to perform the LASSO regression and identify clinical characteristics associated with the risk of HHcy. The variance inflation factor (VIF) was used to assess the severity of multicollinearity in the multivariate linear regression model, and significant variables were included in the multivariate logistic regression model to identify predictive factors. A VIF value < 10 was regarded as indicating the absence of high multicollinearity. Restricted cubic spline analysis was then applied to assess whether the relationship between each potential continuous predictor and HHcy risk was linear, so that only variables with appropriate linear relationships were included. A multivariate logistic regression analysis was subsequently conducted to develop a predictive model capable of discriminating between HHcy and non-HHcy participants. This model served as the basis for the nomogram. The predictive efficacy of the model was evaluated in terms of discrimination, calibration, and clinical applicability by using the Receiver Operating Characteristic (ROC) curve, calibration curve, and clinical decision curve analysis (DCA). The “pROC” package in R was used to plot the ROC curves and calculate the area under the ROC curve (AUC) values. Data were analyzed using SPSS 26.0 statistical software and R 4.4.1 software. All statistical tests were two-sided, and P < 0.05 was considered statistically significant.
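As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen HHcy case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. It is written in Python for illustration only (the study used the “pROC” package in R); the function names and the toy scores are hypothetical and are not study data.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def sensitivity_specificity(case_scores, control_scores, threshold):
    """Classify as high risk when the score is at or above the threshold."""
    sensitivity = sum(s >= threshold for s in case_scores) / len(case_scores)
    specificity = sum(s < threshold for s in control_scores) / len(control_scores)
    return sensitivity, specificity

# Toy predicted risks, not study data.
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.3, 0.2, 0.1]
print(auc_from_scores(cases, controls))               # 11 of 12 pairs ranked correctly
print(sensitivity_specificity(cases, controls, 0.4))  # (1.0, 0.75)
```

Each point on an ROC curve is one such sensitivity and specificity pair at a given threshold, and the AUC summarizes the curve across all thresholds.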

3 Results

3.1 Study population characteristics

Missingness varied across variables, ranging from 0.1% to 10.2% of values. Menarche Age had the highest percentage of missing values (10.2%), followed closely by Age at First Birth (9.5%). Lower levels of missingness were observed for variables such as Smoking (0.1%), Alcohol Consumption (0.1%), and Egg Consumption Frequency (0.3%), as shown in Supplementary Figure S1. Missing data were handled using Random-Forest Multiple Imputation. A comparison of data distribution before and after imputation revealed no statistically significant differences between the original and imputed data (P > 0.05), as detailed in Supplementary Table S1.

The basic characteristics of patients with HHcy and controls are shown in Table 1. During the study period, we enrolled 687 eligible perimenopausal women with an average age of 52.73 years. According to the diagnostic criteria, 137 women were diagnosed with HHcy, and the prevalence was 19.94%. The training group had 481 women, with an average age of 52.77 years, and 97 women with HHcy (20.17%). The internal validation group had 206 women, with an average age of 52.64 years, and 40 women had HHcy (19.42%). No significant differences were observed between the two groups with regard to participants' characteristics (P > 0.05; Supplementary Table S2). Comparative analysis of basic characteristics between HHcy and non-HHcy groups revealed significant differences in Age, Education Level, Egg Consumption Frequency, PIH, Gestational Diabetes Mellitus, Hypertension, SBP, DBP, TG, LDL, TP, GLB, A/G, PT, UA, Scr, GFR, and CysC (P < 0.05).


Table 1. Patients’ characteristics of the enrolled population.

3.2 Variable selection

LASSO regression analysis was performed to identify the potential predictive factors. As the penalty parameter λ increased, the number of variables retained in the model decreased progressively. A 10-fold cross-validation was performed, and the lambda value corresponding to the minimum cross-validation error (λ.min) was determined to be 0.015 (Figure 1). Initial LASSO regression analysis of 49 potential variables identified 16 predictors with non-zero coefficients: Education Level, Menarche Age, Egg Consumption Frequency, Nut Consumption Frequency, Coffee Consumption, PIH, Gestational Diabetes Mellitus, Heart Rate, Testosterone, LDL, HDL, AST, TP, PT, GFR, and CysC.
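To illustrate why a larger penalty leaves fewer variables in the model, the sketch below implements the soft-thresholding update at the heart of LASSO using cyclic coordinate descent. It is a simplified Python illustration: it fits a least-squares LASSO on synthetic data, not the logistic LASSO with 10-fold cross-validation that the study ran in “glmnet”, and all names and data here are hypothetical.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_sweeps=50):
    """Minimise (1/2n) * sum((y - X b)^2) + penalty * sum(|b|) by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0    # mean product of feature j with the partial residual (after / n)
            scale = 0.0  # mean square of feature j (after / n)
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                scale += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return beta

# Synthetic data: only the first of three features drives the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

print(lasso_coordinate_descent(X, y, penalty=0.1))   # irrelevant features are set to 0
print(lasso_coordinate_descent(X, y, penalty=10.0))  # a heavy penalty removes everything
```

Because the threshold returns exactly zero for weak signals, variables drop out of the model entirely rather than merely shrinking. This is the behaviour that reduced 49 candidate variables to 16 at the selected λ.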

Figure 1
Two graphs in one image. (A) A coefficient path plot showing coefficients against the log of lambda values, with lines converging as lambda increases. (B) A plot of mean-squared error versus log lambda values, with a red curve indicating error values and grey bars representing variability, showing an optimal value around log lambda equals negative four.

Figure 1. Screening predictors based on LASSO regression. (A) Ten-fold cross-validation was performed to determine the optimal value of the LASSO regression-related tuning parameter (lambda). (B) The coefficient profiles of the variables incorporated in the LASSO regression analysis were plotted against the logarithm of the lambda sequence.

3.3 Prediction model establishment using selected factors

The multicollinearity test revealed no high multicollinearity among the variables (Supplementary Table S3). Specifically, the VIF values of all 49 initial predictor variables ranged from 1.03 to 2.87, well below the widely accepted threshold of 10.

The 16 selected variables were used as independent predictors, with HHcy occurrence as the dependent variable. Multifactorial binary logistic regression analysis showed that Egg Consumption Frequency [odds ratio (OR) = 0.545, 95% CI = 0.301–0.987, P < 0.05], LDL (OR = 1.419, 95% CI = 1.017–1.978, P < 0.05), TP (OR = 1.071, 95% CI = 1.026–1.117, P < 0.05), and CysC (OR = 9.378, 95% CI = 4.582–19.193, P < 0.001) were identified as predictors for HHcy (P < 0.05; Table 2). Multifactorial binary logistic regression analysis was conducted with further adjustment of age and E2, and the results were consistent with the main finding (Supplementary Table S4).


Table 2. Multifactorial binary logistic regression analysis of independent risk factors based on LASSO.

Analysis of the continuous variables in the model revealed linear relationships between LDL, TP, CysC, and the prevalence of HHcy. Since the logistic regression model is a linear model in terms of logit, these linear relationships satisfy the basic conditions for modeling using logistic regression analysis (Supplementary Figure S2). A diagnostic model for the training group was constructed based on these four independent variables, visualized using a nomogram (Figure 2). These variables were incorporated into the development of the nomogram for HHcy risk prediction. The underlying regression equation of this nomogram is:

Logit(P) = −12.51 − 0.61 × (Egg Consumption Frequency) + 0.35 × (LDL) + 0.07 × (TP) + 2.24 × (CysC)

Figure 2
Graphical representation showing scales for various health metrics: Points, frequency of egg consumption (more than three times per week or at most three times per week), LDL (ranging from 0.5 to 6 mmol/L), TP (ranging from 40 to 85 g/L), CysC (ranging from 0 to 5 mg/L), Total Points (ranging from 0 to 120), and probability of HHcy (ranging from 0.1 to 0.99). Each parameter is represented on a linear scale.

Figure 2. Nomogram model based on LASSO logistic regression. The nomogram gives the predicted probability of HHcy from a total score ranging from 0 to 120. For each predictor, a vertical line is drawn to the points axis and the corresponding points are noted. The points for all predictors are summed, and the predicted probability of HHcy corresponding to the total score is read from the bottom of the nomogram.

Note: For the categorical variable “Egg Consumption Frequency” in the equation, the assignment is defined as follows: 1 represents egg consumption frequency < 3 times/week, and 0 represents egg consumption frequency ≥ 3 times/week.
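The regression equation can be turned into a predicted probability with the inverse logit. The Python sketch below assumes that the intercept and the egg-consumption coefficient are negative (−12.51 and −0.61), which is consistent with the reported odds ratio of 0.545 for egg consumption since exp(−0.61) ≈ 0.54. It also assumes that LDL, TP, and CysC are entered in mmol/L, g/L, and mg/L as on the nomogram axes. The function name and the example inputs are hypothetical, and the output is not a substitute for the published nomogram.

```python
import math

# Coefficients as printed in the regression equation; the two negative signs are assumed.
INTERCEPT = -12.51
COEFFICIENTS = {"egg_low": -0.61, "ldl": 0.35, "tp": 0.07, "cysc": 2.24}

def hhcy_probability(egg_low, ldl, tp, cysc):
    """Return P(HHcy) from the logit equation.

    egg_low: 1 if eggs are eaten < 3 times/week, 0 if >= 3 times/week.
    ldl in mmol/L, tp in g/L, cysc in mg/L (units assumed from the nomogram axes).
    """
    logit = (INTERCEPT
             + COEFFICIENTS["egg_low"] * egg_low
             + COEFFICIENTS["ldl"] * ldl
             + COEFFICIENTS["tp"] * tp
             + COEFFICIENTS["cysc"] * cysc)
    return 1.0 / (1.0 + math.exp(-logit))

# Each coefficient maps to an odds ratio via exp(beta).
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}
print(odds_ratios)

# Hypothetical inputs: raising CysC by 1 mg/L multiplies the odds by exp(2.24).
print(hhcy_probability(egg_low=0, ldl=3.0, tp=70.0, cysc=1.0))
print(hhcy_probability(egg_low=0, ldl=3.0, tp=70.0, cysc=2.0))
```

Exponentiating the printed coefficients recovers odds ratios close to those in Table 2 (about 1.42 for LDL, 1.07 for TP, and 9.39 for CysC), which is a quick check on the rounding of the equation.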

Each variable's values were assigned scores on the scale axis based on the magnitude of their regression coefficients. The sum of individual scores yielded a total score, and the probability of HHcy occurrence was read from the total score axis. To demonstrate the clinical utility of the nomogram, consider a hypothetical perimenopausal patient with an egg consumption frequency of more than 3 times per week, an LDL level of 6 mmol/L, a TP level of 70 g/L, and a CysC level of 1 mg/L. First, locate the patient's value for each variable on the corresponding axis of the nomogram. Next, draw a vertical line upward from each value to the “points axis” to obtain the component score: approximately 5 points for egg consumption frequency, 15 points for LDL at 6 mmol/L, 20 points for TP at 70 g/L, and 20 points for CysC at 1 mg/L. Then sum the component scores to obtain the total score (5 + 15 + 20 + 20 = 60 points). Finally, draw a vertical line downward from the total score (60 points) to the “probability axis”; the corresponding value, approximately 50%–60% in this case, is the predicted probability of HHcy for this patient.

3.4 Internal evaluation of the prediction model: accuracy and calibration

We initially plotted the ROC curve of the model in the training set (Figure 3A), with an AUC of 0.765 (95% CI = 0.708–0.822). On this curve, when the specificity reached 0.682, the corresponding sensitivity was 0.753, which reflects the trade-off between sensitivity and specificity of the model in the training set and indicates the good clinical diagnostic performance of the model. The calibration curve suggested that the mean absolute error (MAE) between the predicted and actual values was 0.012 (Figure 4A), indicating that the predicted risk closely aligns with the actual risk. As the nomogram model was constructed based on the training set, we evaluated and validated the model in the validation set, resulting in an AUC of 0.854 (95% CI = 0.781–0.928) (Figure 3B). For the internal validation set ROC curve, when the specificity was 0.753, the corresponding sensitivity was 0.850. The calibration curve showed that the MAE between the predicted values and the actual values was 0.031 (Figure 4B). The DCA results of the training set (Figure 5A) and the internal validation set (Figure 5B) showed that the predictive model occupied a high position on the decision curve. The DCA curves clearly indicated that within a specific “high-risk threshold” range, the performance of the nomogram model (the red curve) was superior to both the “intervene all” (the gray curve) and “intervene none” (the black line) strategies. In particular, when the threshold probability fell within the interval of 0.2–0.8, the standardized net benefit of the model was significantly higher, which fully demonstrated that the model had higher net benefit and clinical application value.
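For readers unfamiliar with decision curve analysis, the quantity on the vertical axis can be sketched as below. The code uses the standard net-benefit definition (true positives per patient minus false positives per patient weighted by the odds of the threshold probability) and divides by prevalence to standardize it. It is a Python illustration with made-up risks and outcomes, not a reproduction of the R analysis in the study, and the function names are hypothetical.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold."""
    n = len(outcomes)
    true_pos = sum(1 for r, o in zip(risks, outcomes) if r >= threshold and o == 1)
    false_pos = sum(1 for r, o in zip(risks, outcomes) if r >= threshold and o == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return true_pos / n - weight * false_pos / n

def standardized_net_benefit(risks, outcomes, threshold):
    """Net benefit divided by prevalence, so a perfect model scores 1."""
    prevalence = sum(outcomes) / len(outcomes)
    return net_benefit(risks, outcomes, threshold) / prevalence

# Toy data (not from the study): 2 cases among 10 women.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_risks = [0.8, 0.6, 0.5, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
intervene_all = [1.0] * 10  # everyone exceeds any threshold

print(standardized_net_benefit(model_risks, outcomes, 0.4))
print(standardized_net_benefit(intervene_all, outcomes, 0.4))
```

The “intervene none” strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only when its curve lies above both zero and the “intervene all” curve.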

Figure 3
Two ROC curve graphs labeled A and B compare sensitivity and specificity. Graph A shows an AUC of 0.765 with confidence interval 0.708 to 0.822. Graph B shows an AUC of 0.854 with confidence interval 0.781 to 0.928. Both graphs highlight specific points on the curves.

Figure 3. ROC curves for predicting risk of HHcy in perimenopausal women (A) training set (B) internal validation set.

Figure 4
Panel (A) and panel (B) show calibration plots comparing predicted probability versus actual probability. Both panels include apparent, bias-corrected, and ideal lines. Panel (A) has a mean absolute error of 0.012 with 481 samples. Panel (B) has a mean absolute error of 0.031 with 206 samples. Both plots display calibration curves with similar trends, indicating the model's predictive performance.

Figure 4. Calibration curves for predicting risk of HHcy in perimenopausal women (A) training set (B) internal validation set.

Figure 5
Two line graphs labeled A and B depict standardized net benefit against high-risk threshold. Both graphs compare three curves: Nomogram model (red), All (gray), and None (gray). The x-axis shows high-risk threshold and cost-benefit ratio, while the y-axis shows standardized net benefit ranging from zero to one. The Nomogram model in both graphs indicates higher net benefit across the threshold compared to All and None.

Figure 5. DCA curves for predicting risk of HHcy in perimenopausal women (A) training set (B) internal validation set.

3.5 External validation of the prediction model

From March to June 2025, 63 perimenopausal women were selected from those hospitalized in the Department of Cardiology of Hunan Provincial People's Hospital during this period, all of whom met the specific inclusion and exclusion criteria of the study. Among them, 11 perimenopausal women were diagnosed with HHcy, accounting for 17.46% of the total study participants. ROC curve analysis (Figure 6) showed that the AUC of the nomogram model for predicting HHcy risk in the external validation group of perimenopausal women was 0.776 (95% CI = 0.603–0.949); specifically, when the specificity reached 0.846, the corresponding sensitivity was 0.727. In addition, the MAE between the predicted values and actual values of the model was 0.055.

Figure 6
Receiver Operating Characteristic (ROC) curve illustrating sensitivity versus specificity. The plot includes a curve indicating performance, with an Area Under the Curve (AUC) of 0.776, range 0.603 to 0.949, and a specific data point marked at 0.556 (specificity 0.846, sensitivity 0.727).

Figure 6. ROC curve for external validation of the HHcy prediction model in perimenopausal women.

4 Discussion

In this study, LASSO regression identified the frequency of egg consumption, LDL, TP, and CysC as significant predictors of HHcy in perimenopausal women, and the resulting model demonstrated high predictive accuracy and clinical applicability. The AUC of the predictive model was 0.765 in the training set and 0.854 in the internal validation set. The Hosmer-Lemeshow goodness-of-fit calibration curve showed that the MAEs of the training and internal validation sets were 0.012 and 0.031, respectively. For external validation, ROC curve analysis showed that the nomogram had an AUC of 0.776 (95% CI = 0.603–0.949) for predicting HHcy risk, with an MAE of 0.055 between the model's predicted values and actual outcomes.

The findings of this study on HHcy in perimenopausal women are highly consistent with existing research conclusions regarding Hcy metabolism and its clinical significance, while also extending such knowledge. The prevalence of HHcy in this study was 19.94%, a figure consistent with the results of studies on different populations. For example, a cross-sectional survey covering 10,511 middle-aged and elderly individuals in China showed that the prevalence of HHcy in this population was 22.00% (9); another study involving Japanese patients with stroke complicated by chronic kidney disease (CKD) reported that the prevalence of HHcy among its subjects was 18.50% (15). Although there are obvious differences in population characteristics between these two studies and ours, the prevalence of HHcy in perimenopausal women in this study falls within the range of the existing results. This finding suggests that the epidemiological characteristics of HHcy in perimenopausal women are not completely independent of the metabolic patterns of the general adult population, but share certain commonalities with them. At the same time, the prevalence data of this study provide a reference for subsequent comparisons of Hcy level distribution among women in different physiological stages and health statuses, partially filling the gap in epidemiological data on HHcy in women during perimenopause.

The frequency of egg consumption was considered an important predictor in our study. Specifically, perimenopausal women who consumed eggs no more than three times per week were found to have a 0.545-fold risk of developing HHcy compared to those who consumed eggs more frequently (more than three times per week). Few studies have confirmed a direct link between egg intake and elevated Hcy levels, and existing evidence remains inconsistent. On one hand, the association between excessive egg intake and elevated Hcy levels is biologically plausible. First, eggs are an important source of methionine, which is metabolized in the body by transmethylation to form homocysteine (16). Excessive methionine intake leads to HHcy, which is a causative agent of cardiovascular disease in humans (17). Furthermore, excessive consumption of egg yolks can increase LDL levels (18). Changes in LDL levels also affect Hcy concentrations. On the other hand, a 2011 randomized controlled trial found that among participants with type 2 diabetes or impaired glucose tolerance, there was no significant association between egg consumption (two eggs per day) and Hcy levels (19). This suggests that the association between egg intake and HHcy may be influenced by glucose metabolism status while the participants in our study were mainly perimenopausal women with normal glucose metabolism. Under the state of glucose metabolic homeostasis, the methionine metabolism pathway is more susceptible to the regulation of dietary methionine intake, which may thereby strengthen the association between egg consumption and HHcy.

Although the mechanism by which LDL influences Hcy concentrations is not fully understood, LDL and its oxidized form (OxLDL) are known to accumulate in the arterial intima, triggering adaptive immunity and initiating a cascade of events that can lead to atherosclerosis and endothelial dysfunction (20). Endothelial dysfunction shares risk factors with HHcy, such as increased oxidative stress and reduced nitric oxide bioavailability. Additionally, there is a significant genetic component involved in the regulation of reactive oxygen species, Hcy levels, and atherogenesis (21). The increased correlation between HHcy and dyslipidemia in perimenopausal women may be attributed to changes in metabolism, hormonal levels, and lifestyle factors. However, it is crucial to acknowledge that this association might be influenced by unmeasured subclinical inflammation. As a common underlying condition in metabolic disorders, chronic low-grade inflammation can both disrupt lipid metabolism and promote Hcy production via pathways like oxidative stress or impaired enzyme activity in Hcy metabolism (22). Therefore, the association between LDL and HHcy may not solely reflect direct biological interactions but could also incorporate indirect effects of inflammation acting as a shared driver.

The total protein in the serum is made up of two main categories: ALB and GLB. These components are important for assessing nutritional status and diagnosing various diseases. Methionine is regenerated via the retrieval of a methyl group from 5-methyltetrahydrofolate, a process that converts 5-methyltetrahydrofolate to tetrahydrofolate; tetrahydrofolate is subsequently converted back to 5-methyltetrahydrofolate by methylenetetrahydrofolate reductase. This process is called remethylation. Alternatively, Hcy can follow the transsulfuration route, where through cystathionine-beta-synthase, it is irreversibly converted into cystathionine, a precursor of cysteine, glutathione, and other substances that are finally excreted in the urine. HHcy results from inhibition of the remethylation route, or inhibition or saturation of the transsulfuration pathway (23). Higher levels of TP may imply a more active methylation reaction in vivo, thus affecting Hcy metabolism. However, it is important to note that confounding factors may exist in the association between TP levels and HHcy: TP levels can indirectly reflect underlying nutritional status. For instance, mild malnutrition may simultaneously reduce serum TP synthesis and impair Hcy metabolism by limiting the intake of critical micronutrients essential for Hcy clearance (24). Consequently, the observed association between TP and HHcy may not represent a direct causal relationship but could, to some extent, be driven by unmeasured nutritional factors.

Serum CysC was identified as the most significant predictor of HHcy in perimenopausal women in this study. The kidney is an important site of Hcy metabolism, and Hcy levels are closely associated with renal function. Previous studies have indicated that Hcy is elevated in patients with CKD and increases as the disease progresses (25), and numerous studies have established an association between Hcy and renal function indicators (26). Notably, CysC, Scr, and GFR overlap in their biological correlations and clinical significance, while CysC also has unique advantages (27). Biologically, all three are linked to glomerular filtration function. Clinically, all three are used to assess renal function and predict renal-related complications. However, Scr-based GFR is limited by its dependence on muscle mass, dietary intake, and tubular secretion. CysC is less influenced by these factors and may therefore reflect kidney function more accurately, especially in populations such as the elderly or those with reduced muscle mass (28). CysC, a cysteine protease inhibitor, is ubiquitous in body fluids and nucleated cells throughout the human body (29). Regarding its potential role as an inflammatory marker, preclinical studies suggest that CysC may influence Hcy metabolism through inhibition of cystathionine γ-lyase, an enzyme involved in Hcy catabolism. Additionally, both CysC and Hcy have been linked to pro-inflammatory pathways, including oxidative stress and endothelial dysfunction, which could create bidirectional relationships that are difficult to disentangle in observational data (30). However, these mechanistic links remain to be validated in clinical settings, and our study design cannot establish causality.

LASSO regression is a widely used statistical method for feature selection. It adds an L1 penalty to the model's loss function, which shrinks the regression coefficients and sets those of uninformative variables exactly to 0. The advantages of this method lie in reducing overfitting and extracting significant features effectively. LASSO is also advantageous when there are many clinical parameters and a limited sample size (31). In this setting it compares favorably with stepwise logistic regression, ridge regression, and elastic net, thanks to its sparse variable selection (directly setting the coefficients of irrelevant variables to 0) and mitigation of multicollinearity (32). Furthermore, LASSO-logistic regression is an optimized extension of the traditional linear model framework. It retains the core advantages of traditional linear models, such as the ability to quantify the association between variables and outcomes and good adaptability to moderate sample sizes, while addressing their limitations in handling many variables through coefficient shrinkage. Ultimately, LASSO followed by multifactor logistic regression yielded 4 independent predictors of HHcy, meeting the clinical demand for a parsimonious and stable model.

LASSO retained fewer variables than clinical experience would suggest. This study focused specifically on perimenopausal women aged 40–60 years, and several cohort-specific characteristics may help explain the selection. First, BMI, a factor often considered clinically relevant, had a relatively narrow distribution in this cohort, which likely reduced its ability to serve as a distinct predictive marker for HHcy. Second, for traditional risk factors such as smoking, the number of smokers among the perimenopausal women was particularly small; such a small subgroup may have weakened the statistical association between smoking status and the outcome, leading to its exclusion from the final model (33). Furthermore, the hormonal fluctuations characteristic of women in this age group may have modulated the relationships between potential predictors and HHcy, further influencing which variables remained in the model. Finally, differences in the datasets and samples used, including overall sample size, participant origin, and timing of data collection, can also introduce variability in results and may have contributed to the final set of selected predictors.
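The coefficient shrinkage described above can be illustrated with a minimal sketch of L1-penalised logistic regression. The software and penalty settings used in the study are not given in this section, so everything here is an assumption for illustration: the solver is a plain proximal-gradient (ISTA) loop, the data are synthetic, and the names `soft_threshold`, `fit_l1_logistic`, and `make_toy_data` are hypothetical.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; values within [-threshold, threshold] become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_toy_data(n=200, seed=0):
    """Synthetic data (not study data): 4 features, only the first two affect the outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(4)]
        logit = -1.0 + 1.5 * row[0] - 1.0 * row[1]
        X.append(row)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=500):
    """Proximal gradient descent for logistic loss + lam * sum(|coef|).

    The intercept is left unpenalised; each coefficient takes a gradient
    step and is then soft-thresholded, which is what produces exact zeros.
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        errors = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad_intercept = sum(errors) / n
        grads = [sum(e * row[j] for e, row in zip(errors, X)) / n for j in range(p)]
        intercept -= step * grad_intercept
        coefs = [soft_threshold(c - step * g, step * lam) for c, g in zip(coefs, grads)]
    return intercept, coefs


if __name__ == "__main__":
    X, y = make_toy_data()
    for lam in (0.01, 0.1, 10.0):
        _, coefs = fit_l1_logistic(X, y, lam)
        print(lam, [round(c, 3) for c in coefs])
```

Running the script with increasing values of `lam` shows the coefficients being pulled toward zero, and a sufficiently large penalty removes every predictor. In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand.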

In this study, we established a statistical model to predict the risk of HHcy in perimenopausal women. The nomogram not only presents the independent risk factors identified in multivariate regression analysis visually but also enables prediction through simple graphics. This tool may help doctors estimate the risk of HHcy more accurately and support clinical management. To further enhance its accessibility and practicality in primary care settings, we plan to develop a user-friendly web-based calculator based on this nomogram, which will allow clinicians to calculate HHcy risk automatically by entering a patient’s egg consumption frequency, LDL level, TP level, and CysC level. Concurrently, we will conduct a pilot application in 3 hospitals to verify its usability and predictive consistency in real-world clinical scenarios. Notably, although our study population is restricted to perimenopausal women aged 40–60 years, this group remains heterogeneous. As supported by relevant studies in the field (34), this heterogeneity may lead to differential predictive performance of the nomogram across subgroups. Meanwhile, the modifiable risk factors identified by the model highlight the value of analyzing causal associations between interventions and HHcy outcomes. Within the framework of target trial emulation (35), methods such as propensity score matching and inverse probability weighting offer ways to control confounding more rigorously. This can strengthen the reliability of evidence when evaluating the link between modifying these risk factors and changes in HHcy risk, and can inform HHcy management in perimenopausal women.
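The calculation behind such a web-based calculator can be sketched as a logistic transformation of a weighted sum of the four predictors. This is only an illustrative outline: the function name `predict_hhcy_risk`, the dictionary keys, the numeric coding of egg consumption frequency, and the measurement units are all assumptions, and the coefficient values below are placeholders rather than the fitted values of the published model.

```python
import math


def predict_hhcy_risk(egg_frequency, ldl, tp, cysc, coefs):
    """Return a predicted probability from a four-predictor logistic model.

    coefs is a dict with keys 'intercept', 'egg_frequency', 'ldl', 'tp',
    'cysc'. Inputs must use the same coding and units as the data the
    coefficients were fitted on (assumed here, not specified).
    """
    linear_predictor = (
        coefs["intercept"]
        + coefs["egg_frequency"] * egg_frequency
        + coefs["ldl"] * ldl
        + coefs["tp"] * tp
        + coefs["cysc"] * cysc
    )
    # Numerically stable logistic function.
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    ez = math.exp(linear_predictor)
    return ez / (1.0 + ez)


# PLACEHOLDER coefficients: all zeros, NOT the study's estimates.
# Replace with the fitted model's coefficients before any real use.
PLACEHOLDER_COEFS = {
    "intercept": 0.0,
    "egg_frequency": 0.0,
    "ldl": 0.0,
    "tp": 0.0,
    "cysc": 0.0,
}

if __name__ == "__main__":
    # With all-zero placeholders the linear predictor is 0, so the risk is 0.5.
    print(predict_hhcy_risk(1, 3.0, 70.0, 1.0, PLACEHOLDER_COEFS))
```

A nomogram encodes the same weighted sum graphically: each predictor's axis maps to points proportional to its coefficient, and the total points map to the predicted probability.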

This study has several strengths. First, LASSO regression was used for variable screening; its main advantage over traditional univariate analysis is that variable selection is automated. Second, we collected comprehensive information, including demographic characteristics, pregnancy-related factors, lifestyles, and diets. Finally, our model was validated internally and externally and showed good accuracy and stability. It is targeted at perimenopausal women and can provide some guidance for the prevention of HHcy in this special population.

Several limitations of this study need to be recognized. First, this study is retrospective and subject to selection bias (hospital recruitment overrepresents perimenopausal women with chronic conditions and underrepresents healthy ones) and information bias (retrospective self-reported dietary data may introduce recall bias). Nevertheless, the model is built on real-world data and has clinical value in the hospital setting; future prospective studies should validate it in community cohorts and measure nutritional biomarkers (folate, vitamin B6, and vitamin B12) to better control nutritional confounding. Second, despite the use of LASSO regression for variable selection, a risk of overfitting remains because of the limited sample size and the large number of initial predictors; subsequent studies will expand the sample size, optimize the selection criteria, and use stricter validation to improve model stability. Finally, the generalizability of the study is limited because it was conducted in only one region of China, and future studies should extend it to other regions.
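The discrimination measure used in the validation, the AUC, can be read as the probability that a randomly chosen woman with HHcy receives a higher predicted risk than a randomly chosen woman without it. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the example values are hypothetical and unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    labels: 1 for cases, 0 for non-cases; scores: predicted risks.
    Tied scores count as half a correctly ranked pair.
    """
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy example: 3 of the 4 case/non-case pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Applying the same function to held-out data, or to repeated resamples of the data, is one simple way to check how much an apparent AUC depends on the sample used to fit the model.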

5 Conclusion

This study identified four independent risk factors for HHcy in perimenopausal women (egg consumption frequency, LDL, TP, and CysC) and developed a predictive risk model using LASSO regression combined with multifactorial binary logistic regression. The model showed good predictive performance and calibration. Based on these findings, regular monitoring of these factors can aid in the early detection and reduction of HHcy. The clinical implementation of this tool may contribute to reducing the prevalence of cardiovascular diseases by identifying individuals with early-stage HHcy and enabling targeted interventions.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the Medical Ethics Committee of Hunan Normal University (approval numbers: 2017034 and 2024061). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

XT: Investigation, Writing – original draft. ML: Writing – original draft. JW: Writing – original draft. YP: Writing – original draft. LZ: Writing – original draft. NJ: Writing – original draft. LL: Writing – review & editing. XH: Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the National Natural Science Foundation of China (8177120863), the Hunan Provincial Natural Science Foundation (2025JJ60519), the Key Research and Development Program of Hunan Province of China (2023SK2059), and the Healthcare and Public Health Research Project of Hunan Province (20254401).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frph.2025.1670141/full#supplementary-material

Abbreviations

HHcy, hyperhomocysteinemia; Hcy, homocysteine; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic; DCA, decision curve analysis; AUC, area under the receiver operating characteristic curve; SBP, systolic blood pressure; DBP, diastolic blood pressure; BMI, body mass index; PIH, pregnancy-induced hypertension; E2, estradiol; TG, triglycerides; LDL, low-density lipoprotein; HDL, high-density lipoprotein; AST, aspartate aminotransferase; TP, total protein; ALB, albumin; GLB, globulins; A/G, albumin/globulin ratio; PT, prothrombin time; UA, uric acid; Scr, serum creatinine; GFR, glomerular filtration rate; CysC, cystatin C; VIF, variance inflation factor; OR, odds ratio.

References

1. Tran KH, Luki J, Hanstock S, Hanstock CC, Seres P, Aitchison K, et al. Decreased GABA+ levels in the medial prefrontal cortex of perimenopausal women: a 3 T 1H-MRS study. Int J Neuropsychopharmacol. (2023) 26:32–41. doi: 10.1093/ijnp/pyac066

2. Keller AC, Klawitter J, Hildreth KL, Christians U, Putnam K, Kohrt WM, et al. Elevated plasma homocysteine and cysteine are associated with endothelial dysfunction across menopausal stages in healthy women. J Appl Physiol. (2019) 126:1533–40. doi: 10.1152/japplphysiol.00819.2018

3. Guo J, Gao Y, Ahmed M, Dong P, Gao Y, Gong Z, et al. Serum homocysteine level predictive capability for severity of restenosis post percutaneous coronary intervention. Front Pharmacol. (2022) 13:816059. doi: 10.3389/fphar.2022.816059

4. Pinzon RT, Wijaya VO, Veronica V. The role of homocysteine levels as a risk factor of ischemic stroke events: a systematic review and meta-analysis. Front Neurol. (2023) 14:1144584. doi: 10.3389/fneur.2023.1144584

5. Cheng Y, Wang C, Zhang X, Zhao Y, Jin B, Wang C, et al. Circulating homocysteine and folate concentrations and risk of type 2 diabetes: a retrospective observational study in Chinese adults and a Mendelian randomization analysis. Front Cardiovasc Med. (2022) 9:978998. doi: 10.3389/fcvm.2022.978998

6. Bhatt RD, Karmacharya BM, Shrestha A, Timalsena D, Madhup S, Shahi R, et al. Prevalence of MTHFR C677T polymorphism and its association with serum homocysteine and blood pressure among different ethnic groups: insights from a cohort study of Nepal. BMC Cardiovasc Disord. (2025) 25:235. doi: 10.1186/s12872-025-04690-z

7. Gao Y, Guo Y, Hao W, Meng J, Miao Z, Hou A, et al. Correlation analysis and diagnostic value of serum homocysteine, cystatin C and uric acid levels with the severity of coronary artery stenosis in patients with coronary heart disease. Int J Gen Med. (2023) 16:2719–31. doi: 10.2147/ijgm.S411417

8. Li Y, Lu FG, Yin YN. Applying logistic LASSO regression for the diagnosis of atypical Crohn’s disease. Sci Rep. (2022) 12:9. doi: 10.1038/s41598-022-15609-5

9. Feng H, Wang X, Yu L, Zheng Q, Wang Z. Study on the relationship between homocysteine and general metabolic indexes in healthy population in Hebei Province, China. Front Endocrinol. (2025) 16:1523157. doi: 10.3389/fendo.2025.1523157

10. Li H-l, Xu B, Zheng W, Xu W-h, Gao J, Shu X-o, et al. Epidemiological characteristics of obesity and its relation to chronic diseases among middle aged and elderly men. Zhonghua Liu Xing Bing Xue Za Zhi. (2010) 31:370–4. PMID: 20513277

11. Fotă A, Petca A. Gestational diabetes mellitus: the dual risk of small and large for gestational age: a narrative review. Med Sci. (2025) 13:144. doi: 10.3390/medsci13030144

12. Saravanan P. Gestational diabetes: opportunities for improving maternal and child health. Lancet Diabetes Endocrinol. (2020) 8:793–800. doi: 10.1016/s2213-8587(20)30161-3

13. Kintiraki E, Papakatsika S, Kotronis G, Goulis DG, Kotsis V. Pregnancy-induced hypertension. Horm Int J Endocrinol Metab. (2015) 14:211–23. doi: 10.14310/horm.2002.1582

14. Pelgrims I, Devleesschauwer B, Vandevijvere S, De Clercq EM, Vansteelandt S, Gorasso V, et al. Using random-forest multiple imputation to address bias of self-reported anthropometric measures, hypertension and hypercholesterolemia in the Belgian health interview survey. BMC Med Res Methodol. (2023) 23:15. doi: 10.1186/s12874-023-01892-x

15. Mizuno T, Hoshino T, Ishizuka K, Toi S, Takahashi S, Wako S, et al. Hyperhomocysteinemia increases vascular risk in stroke patients with chronic kidney disease. J Atheroscler Thromb. (2023) 30:1198–209. doi: 10.5551/jat.63849

16. Hirota K, Yamauchi R, Miyata M, Kojima M, Kako K, Fukamizu A. Dietary methionine functions in proliferative zone maintenance and egg production via sams-1 in Caenorhabditis elegans. J Biochem. (2024) 176:359–67. doi: 10.1093/jb/mvae054

17. Ramires OVR Jr., Silveira JS, Gusso D, Prauchner GRK, Deniz BF, de Almeida W, et al. Homocysteine decreases VEGF, EGF, and TrkB levels and increases CCL5/RANTES in the hippocampus: neuroprotective effects of rivastigmine and ibuprofen. Chem Biol Interact. (2024) 403:11. doi: 10.1016/j.cbi.2024.111260

18. Palfrey HA, Kumar A, Pathak R, Stone KP, Gettys TW, Murthy SN. Adverse cardiac events of hypercholesterolemia are enhanced by sitagliptin in Sprague Dawley rats. Nutr Metab. (2024) 21:54. doi: 10.1186/s12986-024-00817-9

19. Pearce KL, Clifton PM, Noakes M. Egg consumption as part of an energy-restricted high-protein diet improves blood lipid and blood glucose profiles in individuals with type 2 diabetes. Br J Nutr. (2011) 105:584–92. doi: 10.1017/s0007114510003983

20. Veledar E, Veledar O, Gardener H, Rundek T, Garelnabi M. Harnessing statistical and machine learning approaches to analyze oxidized LDL in clinical research. Cell Biochem Biophys. (2025):15. doi: 10.1007/s12013-025-01837-9

21. Hurjui LL, Tarniceriu CC, Serban DN, Lozneanu L, Bordeianu G, Nedelcu AH, et al. Homocysteine attack on vascular endothelium-old and new features. Int J Mol Sci. (2025) 26:6298. doi: 10.3390/ijms26136298

22. Ren L, Guo J, Zhao W, Zuo R, Guo S, Jia C, et al. Serum homocysteine relates to elevated lipid level, inflammation and major adverse cardiac event risk in acute myocardial infarction patients. Biomark Med. (2023) 17:297–306. doi: 10.2217/bmm-2023-0096

23. Rumińska M, Witkowska-Sędek E, Krajewska M, Stelmaszczyk-Emmel A, Sobol M, Pyrżak B. Evaluation of total homocysteine levels in relation to abdominal fat mass and traditional cardiovascular risk factors in overweight and obese adolescents. Life. (2025) 15:1329. doi: 10.3390/life15081329

24. Savic-Hartwig M, Kerlikowsky F, van de Flierdt E, Hahn A, Schuchardt JP. A micronutrient supplement modulates homocysteine levels regardless of vitamin B biostatus in elderly subjects. Int J Vitam Nutr Res. (2024) 94:120–32. doi: 10.1024/0300-9831/a000777

25. Cohen E, Margalit I, Shochat T, Goldberg E, Krause I. The relationship between the concentration of plasma homocysteine and chronic kidney disease: a cross sectional study of a large cohort. J Nephrol. (2019) 32:783–9. doi: 10.1007/s40620-019-00618-x

26. Niu XN, Wen H, Sun N, Yang Y, Du SH, Xie R, et al. Estradiol and hyperhomocysteinemia are linked predominantly through part renal function indicators. Front Endocrinol. (2022) 13:9. doi: 10.3389/fendo.2022.817579

27. Gadashova A, Tunçay SC, Özek G, Hakverdi G, Kansoy S, Kabasakal C, et al. Long-term kidney outcomes in children after allogeneic hematopoietic stem cell transplantation assessed with estimated glomerular filtration rate equations, creatinine levels, and cystatin C levels. J Bras Nefrol. (2023) 45:60–6. doi: 10.1590/2175-8239-JBN-2021-0231en

28. Zhang F, Sun Y, Bai Y, Yu W, Yin M, Zhong Y, et al. Association of intra-individual differences in estimated GFR by creatinine versus cystatin C with incident cardiovascular disease. Nutr Metab Cardiovasc Dis. (2025) 35:104034. doi: 10.1016/j.numecd.2025.104034

29. Peng P, Fu XC, Wang Y, Zheng X, Bian L, Zhati N, et al. The value of serum cystatin C in predicting acute kidney injury after cardiac surgery: a systematic review and meta-analysis. PLoS One. (2024) 19:e0310049. doi: 10.1371/journal.pone.0310049

30. Wu W, Guan Y, Xu K, Fu XJ, Lei XF, Lei LJ, et al. Plasma homocysteine levels predict the risk of acute cerebral infarction in patients with carotid artery lesions. Mol Neurobiol. (2016) 53:2510–7. doi: 10.1007/s12035-015-9226-y

31. Zhao F, Huang X, He J, Li J, Li Q, Wei F, et al. A nomogram for distinguishing benign and malignant parotid gland tumors using clinical data and preoperative blood markers: development and validation. J Cancer Res Clin Oncol. (2023) 149:11719–33. doi: 10.1007/s00432-023-05032-2

32. Robledo KP, Marschner IC, Grossmann M, Handelsman DJ, Yeap BB, Allan CA, et al. Predicting type 2 diabetes and testosterone effects in high-risk Australian men: development and external validation of a 2-year risk model. Eur J Endocrinol. (2025) 192:15–24. doi: 10.1093/ejendo/lvae166

33. Wang L, Wei W, Cai M. A review of the risk factors associated with endometrial hyperplasia during perimenopause. Int J Women’s Health. (2024) 16:1475–82. doi: 10.2147/ijwh.S481509

34. Yang J, Zhang B, Hu C, Jiang X, Shui P, Huang J, et al. Identification of clinical subphenotypes of sepsis after laparoscopic surgery. Laparosc Endosc Robot Surg. (2024) 7:16–26. doi: 10.1016/j.lers.2024.02.001

35. Yang J, Wang L, Chen L, Zhou P, Yang S, Shen H, et al. A comprehensive step-by-step approach for the implementation of target trial emulation: evaluating fluid resuscitation strategies in post-laparoscopic septic shock as an example. Laparosc Endosc Robot Surg. (2025) 8:28–44. doi: 10.1016/j.lers.2025.01.001

Keywords: hyperhomocysteinemia, LASSO, nomogram, perimenopausal women, factor associated

Citation: Tan X, Li M, Wang J, Peng Y, Zhu L, Jiang N, Li L and Hong X (2025) Precision prediction of hyperhomocysteinemia development in perimenopausal women using LASSO regression. Front. Reprod. Health 7:1670141. doi: 10.3389/frph.2025.1670141

Received: 23 July 2025; Accepted: 24 September 2025;
Published: 9 October 2025.

Edited by:

Andrew Libby, University of Colorado Anschutz Medical Campus, United States

Reviewed by:

Zhongheng Zhang, Sir Run Run Shaw Hospital, China
Azadeh Anna Nikouee, Loyola University Chicago, United States
Ju Gao, Suzhou Guangji Hospital, China
Héctor Emmanuel Cortés-Ferré, Monterrey Institute of Technology and Higher Education (ITESM), Mexico

Copyright: © 2025 Tan, Li, Wang, Peng, Zhu, Jiang, Li and Hong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ling Li, lilingmpd@hunnu.edu.cn; Xiuqin Hong, xiuqinhong0528@hunnu.edu.cn
