Examining the U-shaped relationship of sleep duration and systolic blood pressure with risk of cardiovascular events using a novel recursive gradient scanning model

Background Observational studies have suggested U-shaped relationships between sleep duration and systolic blood pressure (SBP) with risks of many cardiovascular diseases (CVDs), but the cut-points that separate high-risk and low-risk groups have not been confirmed. We aimed to examine the U-shaped relationships between sleep duration, SBP, and risks of CVDs and confirm the optimal cut-points for sleep duration and SBP. Methods A retrospective analysis was conducted on NHANES 2007–2016 data, which included a nationally representative sample of participants. The maximum equal-odds ratio (OR) method was implemented to obtain optimal cut-points for each continuous independent variable. Then, a novel “recursive gradient scanning method” was introduced for discretizing multiple non-monotonic U-shaped independent variables. Finally, a multivariable logistic regression model was constructed to predict critical risk factors associated with CVDs after adjusting for potential confounders. Results A total of 26,691 participants (48.66% were male) were eligible for the current study with an average age of 49.43 ± 17.69 years. After adjusting for covariates, compared with an intermediate range of sleep duration (6.5–8.0 h per day) and SBP (95–120 mmHg), upper or lower values were associated with a higher risk of CVDs [adjusted OR (95% confidence interval) was 1.20 (1.04–1.40) for sleep duration and 1.17 (1.01–1.36) for SBP]. Conclusions This study indicates U-shaped relationships between SBP, sleep duration, and risks of CVDs. Both short and long duration of sleep/higher and lower BP are predictors of cardiovascular outcomes. Estimated total sleep duration of 6.5–8.0 h per day/SBP of 95–120 mmHg is associated with lower risk of CVDs.


Introduction
Cardiovascular diseases (CVDs) continue to be the foremost cause of both morbidity and mortality on a global scale (1). Recent studies have suggested that there may be U-shaped associations between systolic blood pressure (SBP), sleep duration, and CVDs (2-8). Thus, determining the optimal range of SBP and sleep duration is crucial for reducing the risk of CVDs.
When examining the relationships between continuous explanatory variables and health-related outcomes in medical research, it is commonly recommended to investigate U-shaped relationships when non-linear effects are suspected (9-11). If we make the simplifying assumption that these continuous variables are linearly related to prognosis and enter them directly into regression models, the residuals of the regression analysis will increase substantially. As a consequence, epidemiologists may fail to find a fundamental clue for formulating intervention measures. Although Cox regression models supplemented by flexible smoothing techniques (12-14), such as penalized splines and restricted cubic splines, can handle the U-shaped effects of continuous variables, many clinical and epidemiological researchers prefer to categorize continuous explanatory variables into high-risk and low-risk groups (15, 16). Optimal cut-points can identify crucial predictor thresholds, facilitate the development of patient classification schemes, and assist in clinical treatment strategies. However, determining appropriate cut-points becomes critical when clinical reference ranges are unavailable or cannot be directly applied to populations with distinct characteristics (17-21).
Two methods are commonly used to discretize continuous independent variables in biostatistical analysis. One of them is the data-oriented cut-points approach (22, 23), which uses percentiles such as the median or quartiles of the distribution of the continuous variable. Although this method is easy to implement, it can produce arbitrary cut-points that do not consider the relationship with outcomes and may lead to inaccurate estimates of the actual effects (24). The other is the maximum statistic or minimum p-value approach (25), which, for binary outcomes, chooses the cut-point with the maximum χ² statistic as the optimal cut-point. However, both discretization approaches have a high probability of dividing individuals with similar risk into different groups, leading to inconsistent discretization results for high- and low-risk groups.
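As an illustration of the maximum statistic approach described above, the following minimal sketch scans every observed value of a predictor and keeps the single cut-point with the largest Pearson χ² statistic. The function names and toy data are assumptions made for this example, and the sketch ignores survey weights and multiple-testing corrections.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_chi_square_cutpoint(x, y):
    """Return (cut, statistic) maximizing chi-square for the split x <= cut vs x > cut."""
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(x))[:-1]:  # the largest value would leave one group empty
        a = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 0)
        stat = chi_square_2x2(a, b, c, d)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy data with a monotone effect: all events occur above x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cut, stat = max_chi_square_cutpoint(x, y)
```

Because only one threshold is returned, a predictor whose risk rises at both extremes cannot be described well by this split, which is the limitation the two-cut-point methods are meant to address.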
To address the limitations of conventional discretization methods and fulfill the requirement of identifying optimal cut-points for continuous predictors that exhibit a U-shaped relationship with outcomes, our team proposed two novel methods to discretize a single non-monotonic continuous variable, namely, the "two cut-points with maximum odds ratio (OR) value method" (26) and the "optimal equal-hazard ratio with minimum Akaike information criterion (AIC) value method" (27), both of which have been validated through peer review. The OR or RR values obtained by our original methods not only respond directly to clinical needs but also optimize the evaluation from a statistical methodology perspective. Owing to their generality, the methods were quickly applied by domestic and foreign scholars to practical problems, such as identifying optimal cut-points of biomarkers (i.e., red blood cell distribution width for prognostic significance, serum creatinine for kidney injury after lung transplantation, and hemoglobin for surgical coronary revascularization). However, extending this discretization method to multiple non-monotonic independent variables can be challenging, as it requires more complex consideration of both the theoretical assumptions and the algorithm implementation. Our team has now developed a novel "recursive gradient scanning method" for discretizing multiple non-monotonic independent variables simultaneously. This approach enables us to prioritize critical intervention measures and achieve targeted goals efficiently. The proposed method will provide a theoretical basis and algorithmic support for identifying significant influencing factors and constructing intervention programs.
This study utilized data from the National Health and Nutrition Examination Surveys (NHANES) covering 10 years (2007-2016) to examine the relationships between sleep duration, SBP, and risks of CVDs. In addition, this study aimed to identify the optimal cut-points for sleep duration and SBP with CVDs as the outcome of interest, test the new method on real-world data, and compare its performance with other existing methods for discretizing multiple non-monotonic independent variables.

Sample and design
The NHANES surveys use a complex, multistage, probability sampling design to create a representative sample of the civilian, non-institutionalized US population, and are conducted as a series of cross-sectional population-based surveys. Each year, about 5,000 individuals are examined, and data are released to the public in 2-year cycles. Detailed information on NHANES data collection procedures and analytic guidelines is provided elsewhere (28, 29). To gather information on CVDs, questionnaires were added to the surveys from 2007 to 2016, and this study used a total of five cycles (NHANES 2007-2016).

Definition of outcome
The presence of CVDs was ascertained using a combination of self-reported physician diagnoses and standardized medical status questionnaires, which were completed during individual interviews. The participants were specifically asked if a healthcare professional had ever informed them of having congestive heart failure (CHF), coronary heart disease (CHD), angina pectoris, heart attack, or stroke. Those who answered "yes" to any of the above were considered as having CVDs, and the outcome was converted to a dichotomous variable. Participants who responded with "did not know" were excluded from the analysis.
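The outcome definition above can be sketched as a small function. The field names are hypothetical (they are not NHANES variable codes), and letting a "yes" take priority over a "did not know" on another item is an assumption of this sketch, since the text does not say how mixed responses were handled.

```python
# Hypothetical field names for the five questionnaire items (not NHANES codes).
CVD_ITEMS = ("chf", "chd", "angina", "heart_attack", "stroke")


def cvd_outcome(answers):
    """Return 1 if any item is 'yes', 0 if all are 'no', None if the participant is excluded."""
    values = [answers.get(item) for item in CVD_ITEMS]
    if any(v == "yes" for v in values):
        return 1  # assumption: a single "yes" is enough, whatever the other answers
    if any(v != "no" for v in values):
        return None  # "did not know" or missing: excluded from the analysis
    return 0


all_no = {item: "no" for item in CVD_ITEMS}
with_stroke = dict(all_no, stroke="yes")
uncertain = dict(all_no, angina="did not know")
```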

Sleep duration and SBP assessment (ascertainment of exposure)
NHANES datasets use responses to the question "How much sleep do you usually get at night on weekdays or workdays?" to obtain information on sleep duration. In cases where individuals reported a sleep duration of ≥12 h, this value was coded as 12. To minimize the risk of inaccurate sleep duration data and the potential impact of poor health on the study results, we opted to exclude individuals with missing sleep duration and those who reported sleeping less than 4 h. The average of three consecutive blood pressure measurements taken after resting quietly for 5 min was used to determine the SBP and diastolic blood pressure (DBP).
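A minimal sketch of the exposure coding rules described above follows. The function names are illustrative, and averaging whichever readings are available when fewer than three were recorded is an assumption of this sketch rather than a rule stated in the text.

```python
def clean_sleep_hours(hours):
    """Return top-coded sleep duration, or None if the participant is excluded."""
    if hours is None or hours < 4:
        return None  # missing or < 4 h: excluded
    return min(hours, 12)  # reports of >= 12 h are coded as 12


def mean_sbp(readings):
    """Average the recorded readings (mmHg); None if nothing was recorded."""
    valid = [r for r in readings if r is not None]
    return sum(valid) / len(valid) if valid else None
```

For example, `clean_sleep_hours(13)` gives 12, `clean_sleep_hours(3.5)` gives `None`, and `mean_sbp([120, 118, 122])` gives 120.0.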

Statistical analysis
All statistical analyses in this study were adjusted for the complex sampling design of NHANES.

Descriptive analysis and modeling
Participant characteristics were summarized using weighted means and standard deviations for continuous variables and weighted counts and percentages for categorical variables. Differences between participants with and without CVDs were assessed using the Rao-Scott χ² test for categorical variables and independent t-tests for continuous variables. A multivariable logistic regression model was used to examine the association between sleep duration, SBP, and the risk of CVDs, with the lower risk group serving as the reference category.

Graphical diagnostic plot
Semiparametric models with penalized B-splines (P-splines) were fitted using the R package "SemiPar" (35). This approach balances goodness of fit against variance when fitting the curve and allows the statistical significance of the non-linear term to be assessed.

Find two optimal cut-points for each continuous explanatory variable as original cut-points
If the fitted curve indicated a U-shaped relationship (df > 2 in the semiparametric regression analysis), the "two cut-points with maximum OR value method" (26) was used to identify the original lower and upper cut-points of the continuous explanatory variable at which the OR reaches its maximum.
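The idea of choosing two cut-points that maximize the OR can be sketched as a grid search over (lower, upper) pairs. This is a simplified illustration and not the published algorithm of reference (26): the 0.5 continuity correction, the minimum group size, the function names, and the toy data are all assumptions of this sketch.

```python
def odds_ratio(events_out, n_out, events_in, n_in):
    """OR of the outcome, outside group vs inside (reference) group, with a 0.5 correction."""
    a = events_out + 0.5
    b = n_out - events_out + 0.5
    c = events_in + 0.5
    d = n_in - events_in + 0.5
    return (a * d) / (b * c)


def two_cutpoints_max_or(x, y, candidates, min_group=3):
    """Return (lower, upper, OR) maximizing the OR of 'outside [lower, upper]' vs 'inside'."""
    best = None
    for i, lower in enumerate(candidates):
        for upper in candidates[i + 1:]:
            inside = [yi for xi, yi in zip(x, y) if lower <= xi <= upper]
            outside = [yi for xi, yi in zip(x, y) if xi < lower or xi > upper]
            if len(inside) < min_group or len(outside) < min_group:
                continue  # skip degenerate splits
            value = odds_ratio(sum(outside), len(outside), sum(inside), len(inside))
            if best is None or value > best[2]:
                best = (lower, upper, value)
    return best


# Toy U-shaped data: four people per sleep duration, events concentrated at both extremes.
events = {4: 3, 5: 2, 6: 0, 7: 0, 8: 0, 9: 2, 10: 3}
x, y = [], []
for hours, n_events in events.items():
    x += [hours] * 4
    y += [1] * n_events + [0] * (4 - n_events)

best = two_cutpoints_max_or(x, y, candidates=list(range(4, 11)))
```

On this toy data the search selects 6 and 8 as the lower and upper cut-points, because the interval from 6 to 8 contains no events while both tails do.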

The recursive gradient scanning method for the discretization of multiple non-monotonic independent variables
The method we propose for discretizing multiple non-monotonic independent variables simultaneously is described as follows (depicted in Figure 1).
(1) If the curves depicted in the plot imply U-shaped associations between multiple independent variables and the corresponding lnOR, we used the "two cut-points with maximum OR value method" for each variable to identify its optimal cut-points, which serve as the starting points for scanning.
(2) Find the percentile rankings of the estimated lnOR values for each independent variable, which are represented as $Q_k$, $k = 1, 2, \ldots, 100$. Subsequently, draw a horizontal line (known as a "gradient") parallel to the x-axis for each percentile between the 5th and the 95th percentile of the estimated lnOR. The y-value of each of these lines is set to $Q_k$, $k = 5, 6, \ldots, 95$. Each line intersects the fitted U-shaped curve at two points.
(3) Interpolation: spline interpolation in R is used to generate new data points as candidate cut-points, resulting in a smooth curve that maintains equal lnOR values across candidate cut-points (with the constraint $|\ln \mathrm{OR}_{1k} - \ln \mathrm{OR}_{2k}| \le 0.01$ for candidate cut-points).
(4) The recursive gradient scanning method: we set up a loop program to refine the boundary points and improve the discretization accuracy. Scanning starts from the original cut-points of each variable and then moves up or down vertically, one gradient at a time, in steps of lnOR × 1/100. If the model fits increasingly well during the upward or downward scan, the scan stops at the 95th percentile ($P_{95}$) of lnOR for the upward scan and at the 5th percentile ($P_5$) for the downward scan. If the model fits increasingly worse during the upward or downward scan, the current scan is suspended and the scan continues in the opposite direction. The number of independent variables determines the number of scanning strategies (e.g., if the number of variables is $k$, there are $2^k$ scanning strategies). Specifically, if $k = 2$, there are four scanning strategies: (a) scanning upward for $X_1$ combined with downward for $X_2$; (b) scanning downward for $X_1$ combined with upward for $X_2$; (c) scanning upward for both $X_1$ and $X_2$ simultaneously; (d) scanning downward for both $X_1$ and $X_2$ simultaneously (illustrated in Figure 2).
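Steps (2) and (3) can be illustrated with a short sketch that intersects horizontal gradient lines with a fitted U-shaped curve and returns the pair of x-values sharing the same lnOR. The sketch substitutes piecewise-linear interpolation for the spline interpolation described above, and the function names, the toy curve, and the chosen levels are assumptions made for this example.

```python
def crossings(xs, lnor, level):
    """x-positions where the piecewise-linear curve through (xs, lnor) meets `level`."""
    points = []
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], lnor[i], xs[i + 1], lnor[i + 1]
        if y0 == level:
            points.append(x0)
        elif (y0 - level) * (y1 - level) < 0:  # the level lies strictly between y0 and y1
            points.append(x0 + (level - y0) * (x1 - x0) / (y1 - y0))
    if lnor[-1] == level:
        points.append(xs[-1])
    return points


def candidate_pairs(xs, lnor, levels):
    """For each gradient level, pair the leftmost and rightmost crossing (equal lnOR)."""
    pairs = {}
    for level in levels:
        pts = crossings(xs, lnor, level)
        if len(pts) >= 2:
            pairs[level] = (pts[0], pts[-1])
    return pairs


# Toy U-shaped curve: lnOR = (x - 7)^2 / 4 sampled at whole hours of sleep.
hours = [4, 5, 6, 7, 8, 9, 10]
lnor = [(h - 7) ** 2 / 4 for h in hours]
pairs = candidate_pairs(hours, lnor, levels=[0.25, 0.625, 1.0])
```

Here the level 0.625 yields the candidate pair (5.5, 8.5). In the method itself the levels would be the percentiles $Q_5, \ldots, Q_{95}$ of the estimated lnOR, and each pair becomes one candidate (lower, upper) cut-point combination for the scan in step (4).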

Measures of predictive ability
The predictive performance of logistic regression models fitted with covariates discretized by different approaches was evaluated. The areas under the curve (AUC) from receiver operating characteristic (ROC) analysis were calculated to compare the predictive capability of the different models.
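For readers unfamiliar with the AUC, the following minimal sketch computes it from its rank interpretation: the probability that a randomly chosen participant with the outcome receives a higher predicted score than a randomly chosen participant without it, with ties counted as one half. The function name and toy values are assumptions for illustration, and survey weights are ignored.

```python
def auc(scores, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities and observed outcomes.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In this toy example three of the four case/non-case pairs are ordered correctly, so the AUC is 0.75.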

Implementation in R
The minimum p-value method with log-rank statistics was implemented using the R package "maxstat". The freely available R package "SemiPar" was applied to fit logistic regression models with splines. The two-sided significance level for all tests was set at 0.05, and any p-values less than this threshold were deemed statistically significant. The R programming language, version

Study population
This study included a total of 26,691 participants from NHANES 2007-2016, with an average age of 49.4 years and 51.3% being female. Subjects younger than 20 years of age (n = 21,387) and those having missing data on SBP (n = 2,194), sleep duration (n = 74), and CVDs (n = 242) were excluded. Thus, 26,691 participants were included in the final list (Figure 3).

Characteristics of participants
The characteristics of study participants are presented in Table 1. Among all participants, 10.44% (2,786/26,691) reported having CVDs. Participants with CVDs were more likely to be older, to lack physical activity, and to have more sedentary time, a higher prevalence of comorbidities, more treatment, and low income (all p < 0.0001). In addition, participants with CVDs had significantly higher SBP, fasting blood glucose, triglyceride, CRP, and body mass index (BMI) (all p < 0.0001), indicating poor cardiometabolic risk profiles.

FIGURE 2
One of the scenarios of the scanning strategies for the recursive gradient scanning method (i.e., scanning downward for X1 combined with upward for X2).

The analyses of U-shaped relationship
As depicted in Figure 4, the semiparametric regression analysis found U-shaped associations between sleep duration, SBP, and risks of CVDs. These U-shaped relationships suggested that individuals who slept for an intermediate duration and had healthier SBP levels were at a lower risk for CVDs. To better illustrate the relationships between all continuous variables and CVDs, we plotted the curves in Supplementary Figure S1.

Relationship between sleep duration, SBP, and CVDs by univariate logistic regression
The performance of logistic regression models for various estimated cut-points is illustrated in Table 2. The original method used to identify cut-points in the semiparametric regression analysis may not have been accurate. The recursive gradient scanning method was therefore applied and improved the model fit (the new model had a larger Nagelkerke R² and smaller AIC and −2 log-likelihood values). Our findings suggested that categorizing individuals into high-risk and low-risk groups based on the optimal cut-points of the U-shaped curve may offer a more precise depiction of the relationships between sleep duration and the risk of CVDs, as well as between SBP and the risk of CVDs. Finally, we chose 6.5 and 8.0 h as optimal cut-points for sleep duration and 95 and 120 mmHg as optimal cut-points for SBP. Short or long sleep duration [OR = 1.35, 95% confidence interval (CI) = 1.15-1.58] was associated with a higher risk of CVDs (p < 0.0001). Participants with SBP outside the 95-120 mmHg range were likewise at a higher risk for CVDs than those with SBP within it (OR = 1.59, 95% CI = 1.24-2.06).

Logistic regression model results after adjusting for covariates
As shown in Table 3, after adjustment for other risk factors, the odds of CVDs for those with SBP greater than 120 or less than 95 mmHg were 1.17 times those for participants with SBP between 95 and 120 mmHg (OR = 1.17, 95% CI = 1.01-1.36; p = 0.0375). Similarly, individuals who slept more than 8.0 h per day or less than 6.5 h per day also had a higher risk for CVDs than those who slept between 6.5 and 8.0 h (OR = 1.20, 95% CI = 1.04-1.40; p = 0.0138).
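The adjusted ORs above are exponentiated logistic regression coefficients, and their Wald-type 95% CIs are exp(β ± 1.96 × SE). The sketch below shows the conversion in both directions, using the reported SBP estimate as input. The function names are illustrative, and the back-calculated standard error is only an approximation from rounded published values, not a figure reported in the study.

```python
import math


def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_beta_se(odds_ratio, lower, upper, z=1.96):
    """Approximate coefficient and standard error implied by a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported adjusted estimate for SBP outside 95-120 mmHg: OR = 1.17 (1.01-1.36).
beta, se = implied_beta_se(1.17, 1.01, 1.36)
estimate, low, high = or_with_ci(beta, se)
```

Rounding `low` and `high` to two decimals recovers the reported interval, which is a quick consistency check on published estimates.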

Predictive ability and goodness-of-fit among different methods
The present study evaluated the predictive capacity of traditional discretization methods and compared it with that of the optimal model using ROC curve analysis. The results showed that the recursive gradient scanning method had a slightly higher AUC value (0.8344) than the traditional discretization methods, indicating a better predictive capacity. Moreover, the adjusted R² value, which measures how well the model fits the data, was 0.27812 for the recursive gradient scanning method, higher than for the traditional discretization methods, indicating a better fit of the model to the data. In addition, the goodness-of-fit index AIC was evaluated for all the methods. The recursive gradient scanning method had the lowest AIC value (AIC = 5,519.389), indicating that it provided the best compromise between model complexity and goodness of fit (Table 4).

Method | AIC | AUC | Adjusted R²
The recursive gradient scanning method | 5,519.389 (1) | 0.8344 (1) | 0.27812 (1)
Minimum p-value | 5,671.354 (2) | 0.8337 (4) | 0.27666 (4)
Q1-Q3 | 5,778.388 (3) | 0.8342 (2) | 0.27795 (3)
Median | 6,658.365 (4) | 0.8341 (3) | 0.27809 (2)
(1)-(4) rank the methods according to the priority of each parameter for the model.
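The AIC comparison can be made explicit with a short sketch. AIC is defined as 2k − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood, so lower values are preferred. The dictionary below copies the AIC values reported in Table 4; the variable names and the ΔAIC summary are additions for illustration only.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# AIC values as reported in Table 4.
reported_aic = {
    "Recursive gradient scanning": 5519.389,
    "Minimum p-value": 5671.354,
    "Q1-Q3": 5778.388,
    "Median": 6658.365,
}

best_method = min(reported_aic, key=reported_aic.get)
# Difference from the best model; larger values mean a poorer trade-off.
delta_aic = {m: round(v - reported_aic[best_method], 3) for m, v in reported_aic.items()}
```

The AIC gaps between methods (about 152 or more) are much larger than the corresponding differences in AUC and adjusted R², so AIC is the criterion that separates the methods most clearly.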

Association patterns when covariates included
The following charts illustrated that additional adjustments for other covariates did not change the majority of our results. Furthermore, the associations and U-shaped trends between sleep duration, SBP, and CVDs remained similar to our main results (Supplementary Figure S2).

Discussion
In this nationally representative survey of American adults, 10.44% (sample n = 2,786) of the 26,691 participants reported having CVDs. This study suggested U-shaped relationships between sleep duration, SBP, and risk of CVDs, where both extremes of SBP and sleep duration are associated with an elevated risk of CVDs. Consequently, these results provide valuable insight into the potential impact of sleep duration and blood pressure on cardiovascular health.
In medical research, non-monotonic U-shaped dose-response relationships are increasingly common, and predictive models often involve multiple non-monotonic independent variables. If such explanatory variables are entered directly into the regression model as continuous candidate predictors, their non-monotonic features may cause them to be eliminated in the subsequent variable selection, e.g., a stepwise selection. As a result, the corresponding factors cannot be considered in the design of interventions. This study suggests discretizing factors according to their association with prognosis so that their importance in assessing prognosis becomes apparent. Our previous pioneering research has garnered citations from experts in the field, highlighting the significance of our contributions. These citations from esteemed colleagues underscore the relevance and impact of our work, positioning it at the forefront of scholarly discourse. This recognition encourages us to continue our pursuit of advancing knowledge and making meaningful contributions to the field.
Unfortunately, there is a limited amount of research on discretization methods for multiple continuous variables, particularly in cases where there are U-shaped relationships between outcomes and explanatory variables. Choosing the appropriate discretization method is essential to obtain accurate predictions from statistical models. Therefore, it is critical to develop effective strategies for discretizing multiple continuous variables to ensure that these models are reliable and can be used to guide medical research and clinical decision-making. In this regard, it is essential to compare the predictive capacity of traditional discretization methods with optimal models. The results of our research showed that the recursive gradient scanning method had a higher AUC and adjusted R² than other traditional discretization methods (including median, minimum p-value, and Q1-Q3), indicating its superior predictive capacity. Moreover, the recursive gradient scanning method had the minimum AIC among all the methods, further highlighting its efficacy in accurately predicting the outcomes of the statistical models. Overall, the results suggested that the recursive gradient scanning method was a promising approach for discretizing multiple non-monotonic independent variables, with potential applications in various fields. Further study is needed to explore the method's full potential and to compare it with other emerging discretization techniques, such as Bayesian classification (36, 37) and decision trees (38, 39).
Our study showed that having high or low SBP levels was associated with 17% higher odds of CVDs compared with SBP within the 95-120 mmHg interval. Our study reinforces the importance of maintaining stable BP levels in managing CVDs. The concept of a U-shaped association between targeted SBP and the risk of morbidity and mortality has long been suggested (40). This hypothesis is based on the assumption of an SBP threshold for autoregulation of organ blood flow, and the potential role of BP as a compensatory mechanism for preserving organ function (41). The observed link between lower SBP and increased risk of CVDs supports the previous concerns about the intensity of antihypertensive treatment in older adults (42). Notably, low blood pressure can not only be harmful in itself but can also indicate poor health status (43). Even in physically fit individuals, low SBP was found to be associated with CVDs (44). However, the currently prevailing paradigm of "the lower, the better" in hypertension management has been challenged by recent randomized clinical trials (RCTs). The ACCORD trial involving diabetic patients (45) revealed that intensive blood pressure lowering (to 120 mmHg SBP) did not result in a decreased risk of cardiovascular outcomes compared to the standard therapy group (to 140 mmHg SBP). In contrast, the link between higher SBP and CVDs has been consistent in epidemiological studies. According to published studies, hypertension was found to be linked with a greater proportion of CVDs when compared to other common risk factors like smoking, obesity, hypercholesterolemia, and diabetes (46, 47). Taking into account the overall evidence, adopting a less aggressive treatment approach may be the optimal way to manage hypertension (48). Our study expands upon these findings by showing that maintaining a stable SBP level is crucial in reducing the risk of CVDs.
Similarly, individuals who slept for more than 8 h or less than 6.5 h had a higher risk for CVDs than those who slept for 6.5-8 h. The OR for this group was 1.20, indicating that individuals who sleep outside this range have 20% higher odds of CVDs than those who sleep for the recommended duration (6.5-8.0 h). This finding supported the notion that maintaining an optimal sleep duration is critical in reducing the risk of CVDs. Short sleep duration has been consistently associated with increased risks of CVDs in observational studies (6, 49, 50). The pathophysiological mechanisms underlying this association involve abnormalities in the sympathetic nervous system, acceleration of arterial stiffening and atherosclerosis, increased inflammation, and cardiac dysfunction (5, 51, 52). Recent studies have suggested that extended sleep duration could improve cardiovascular health, particularly in college students or prehypertension participants who are often sleep-deprived (53, 54). Therefore, increasing sleep duration among individuals with short sleep may be a promising strategy to reduce the risk of CVDs. On the contrary, some studies have proposed that a longer duration of sleep is linked to a higher risk of developing cardiovascular disease and cardiometabolic disease (55, 56). Physiological changes that could happen include elevated blood pressure (57), impaired glucose metabolism (58), and a rise in cortisol levels (59). Furthermore, there was evidence indicating that a longer duration of sleep is linked to an increase in carotid intima-media thickness (60). Therefore, it may be clinically recommended to advise individuals with prolonged sleep duration to reduce their sleep time.
The current study has several strengths, including the utilization of the recursive gradient scanning method to discretize multiple non-monotonic independent variables. This method provides valuable insights into the relationship between sleep duration, SBP, and the risk of CVDs. Another strength is that we screened multiple influencing factors and then sorted them according to the size of the OR value, which corresponds to the risk level of each influencing factor. This helps to indicate priorities for formulating intervention measures. People at a greater risk of CVDs may benefit significantly from public health campaigns promoting good sleep hygiene in the future.
To properly evaluate the outcomes of this study, it is crucial to recognize several limitations. First of all, the cross-sectional design of NHANES imposes restrictions on establishing causality or accurately determining the direction of the relationship. Thus, it is important to acknowledge the limitations of observational data and the potential for reverse causation when drawing conclusions about causality. Whenever possible, randomized trials and prospective studies can provide more robust evidence for establishing causal relationships. Second, the associations may have been underestimated due to the absence of information about the changes over time in sleep duration and SBP, as we only had baseline assessments of these markers. The approach of utilizing mean values from short-term repeated BP measurements to assess CVD risk fails to account for BP variability and inadequately addresses masked hypertension and white coat hypertension. Instead, 24-h ambulatory blood pressure monitoring (ABPM) provides continuous readings, capturing natural fluctuations, nighttime levels, and patterns such as non-dipping. Consideration of a combined approach involving clinic-based and periodic ABPM may yield a more comprehensive evaluation of BP dynamics and CVD risk in certain cases. Third, the study relied solely on self-reported sleep duration data, which could lead to measurement errors and potentially impact the precision of the findings. Future research could benefit from using objective measures of sleep duration, such as polysomnography or actigraphy, to improve data reliability. Furthermore, despite careful adjustment for many potential confounding factors to ensure the validity of the key findings, residual confounding may still exist due to unmeasured risk factors. Therefore, additional research is required to replicate these associations and investigate the mechanisms underlying the results.

Conclusion
U-shaped relationships were identified between sleep duration, BP, and risk of CVDs. Both shorter and longer sleep duration/higher and lower SBP are significant predictors of CVDs in large population studies. One should consider duration of sleep and blood pressure control as additional behavioral risk factors that are heavily influenced by environmental factors and can potentially be modified through education, counseling, and public health interventions.

(5) Select the best cut-points according to model fitness: We scan and calculate from the original cut-points in each program loop until the desired results are achieved. Then the goodness-of-fit indices of the model under each hyperparameter scenario are obtained, such as AIC, Nagelkerke R², and −2 log-likelihood. Select the respective variable cut-points corresponding to the best-fit model under each parameter combination as the final cut-points, place the discretized classified variables into the regression model, and rank the influencing factors according to the magnitude of OR values.
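The selection step can be sketched as a search over combinations of candidate cut-point pairs for two variables, keeping the combination with the lowest AIC. To stay self-contained, the sketch scores each combination with a saturated four-cell model (equivalent to a logistic regression on the two high-risk indicators plus their interaction), whose maximum-likelihood fit is simply the observed event proportion in each cell. This is a simplifying assumption: the study's models also adjust for covariates and use survey weights, and all names and toy data here are hypothetical.

```python
import math
from itertools import product


def cell_log_likelihood(events, n):
    """Binomial log-likelihood at the MLE p = events / n (0 * log 0 treated as 0)."""
    ll = 0.0
    if events > 0:
        ll += events * math.log(events / n)
    if n - events > 0:
        ll += (n - events) * math.log((n - events) / n)
    return ll


def high_risk(value, lower, upper):
    """1 if the value lies outside [lower, upper], else 0."""
    return int(value < lower or value > upper)


def aic_for_cutpoints(x1, x2, y, pair1, pair2):
    """AIC of the saturated model over the cells defined by the two high-risk indicators."""
    cells = {}
    for a, b, outcome in zip(x1, x2, y):
        key = (high_risk(a, *pair1), high_risk(b, *pair2))
        events, n = cells.get(key, (0, 0))
        cells[key] = (events + outcome, n + 1)
    log_lik = sum(cell_log_likelihood(e, n) for e, n in cells.values())
    return 2 * len(cells) - 2 * log_lik  # one parameter per occupied cell


def best_cutpoints(x1, x2, y, pairs1, pairs2):
    """Combination of (pair for x1, pair for x2) with the smallest AIC."""
    return min(product(pairs1, pairs2),
               key=lambda combo: aic_for_cutpoints(x1, x2, y, *combo))


# Toy data: the outcome occurs whenever sleep or SBP is at an extreme value.
sleep, sbp, cvd = [], [], []
for h in (5, 6, 7, 8, 9):
    for p in (90, 110, 130):
        sleep.append(h)
        sbp.append(p)
        cvd.append(int(h in (5, 9) or p in (90, 130)))

best = best_cutpoints(sleep, sbp, cvd,
                      pairs1=[(6, 8), (5, 9)],
                      pairs2=[(95, 120), (85, 135)])
```

On the toy data the search returns the pairs (6, 8) for sleep and (95, 120) for SBP, the only combination that separates the low-risk cell from the rest. In practice, the candidate pairs would come from the gradient intersections of steps (2) and (3), and the fit function would be the full adjusted model.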

FIGURE 1
The process of calculation implementation.

FIGURE 3
Flow chart for population selection. This figure represents the sample selection for the analysis of sleep duration as well as SBP with CVDs.

FIGURE 4
Smoothing plot for sleep duration and SBP with risks of CVDs by semiparametric regression analysis. U-shaped relationships were shown between sleep duration (A), SBP (B), and risks of CVDs [the solid line indicates the point estimate for ln odds ratios (lnOR) of CVDs, and the dotted lines represent 95% confidence intervals (CIs)].

TABLE 1
The characteristics of participants in the NHANES 2007-2016 according to CVDs.

TABLE 2
Performance of different estimated cut-points in logistic regression. Columns: cut-points, SBP L (mmHg), SBP U (mmHg), Sleep L (h), Sleep U (h), OR BP (95% CI), OR Sleep (95% CI), AIC, R², −2 log-likelihood. L represents the lower cut-point, and U represents the upper cut-point.

TABLE 3
Multivariable logistic regression for the final model.
a Variables were treated as continuous variables.

TABLE 4
The predictive capacity and goodness-of-fit among different methods.
(1) means the highest priority.