ORIGINAL RESEARCH article

Front. Stroke, 18 November 2024

Sec. Stroke in the Young

Volume 3 - 2024 | https://doi.org/10.3389/fstro.2024.1488313

Identifying predictors of stroke in young adults: a machine learning analysis of sex-specific risk factors

  • 1. Department of Health Services Research, Management and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, FL, United States

  • 2. Department of Speech, Language and Hearing Science, College of Public Health and Health Professions, University of Florida, Gainesville, FL, United States

Abstract

Introduction:

Stroke among Americans under age 49 is increasing. While the risk factors for stroke among older adults are well-established, evidence on stroke causes in young adults remains limited. This study used machine learning techniques to explore the predictors of stroke in young men and women.

Methods:

The least absolute shrinkage and selection operator (LASSO) algorithm was applied to data from Wave V of the National Longitudinal Study of Adolescent to Adult Health (N = 12,300), a nationally representative, longitudinal panel containing demographic, lifestyle, and clinical information for individuals aged 33–43, to identify the key factors associated with stroke in men and women. The resulting LASSO model was tested and validated on an independent sample, and model performance was assessed using the area under the receiver operating characteristic curve (AUC) and calibration. For robustness, the synthetic minority oversampling technique (SMOTE) was applied to address data imbalance, and analyses were repeated on the balanced sample.

Results:

Approximately 1.1% (N = 59) of the 5,318 men and 1.3% (N = 90) of the 6,970 women in the sample reported having a stroke. LASSO was used to predict stroke using demographic, lifestyle, and clinical predictors on both balanced and imbalanced data sets. LASSO performance differed only slightly between the balanced and unbalanced data sets for women (female AUC: 0.835 vs. 0.842) and was nearly identical for men (male AUC: 0.820 vs. 0.822). Predictor identification was similar across both sets. For females, marijuana use, receipt of health services, education, self-rated health status, kidney disease, migraines, diabetes, depression, and PTSD were predictors. Among males, income, kidney disease, heart disease, diabetes, PTSD, and anxiety were risk factors.

Conclusions:

This study showed similar clinical risk factors among men and women. However, variations in the behavioral and lifestyle determinants between sexes highlight the need for tailored interventions and public health strategies to address sex-specific stroke risk factors among young adults.

Introduction

An estimated 10%–15% of all first-ever strokes occur in people aged 18–50 years (Kissela et al., 2012; Singhal et al., 2013). With a yearly stroke incidence of 15 million people worldwide, at least 1.5 million young adults are affected every year. While the incidence of stroke among the elderly is declining, stroke in younger adults is increasingly common (Wilson and Biller, 2004; Aigner et al., 2017). Additionally, although young adults are less likely to die from stroke than older adults, their risk of death remains significantly higher than that of the age-adjusted general population (Bukhari et al., 2023). One-third of young adult stroke survivors are left with moderate to severe functional impairment (Varona et al., 2004) and 40% have long-term cognitive impairment (Jia, 2015). Stroke at a young age not only results in impairment in basic daily activities but also limits participation in normal activities, such as returning to work, family, and social activities (Treger et al., 2007; Pollock et al., 2014).

Young stroke survivors are more likely to experience marital problems such as separation or divorce (Teasell et al., 2000; Banks and Pearson, 2004), unmet financial need due to the number of productive life years lost (Sultan and Elkind, 2013; Béjot et al., 2016), and long-term sequelae including permanent cognitive deficits, epilepsy, and chronic debilitating fatigue with poor functional outcomes (Schaapsmeerders et al., 2013; Maaijwee et al., 2015; Arntz et al., 2013; Amoah et al., 2024). The social and economic burdens of stroke in young adults are substantial due to loss of prime productive years, longer time spent with disability, and increased mortality (Ekker et al., 2019). More specifically, strokes in younger adults carry the potential for a greater lifetime burden of disability and may have more catastrophic consequences for people of working age (Vestling et al., 2003). Finally, stroke in young adults is more challenging because of the variability in clinical presentation and the differences relative to older adults (Bukhari et al., 2023).

In addition to age-related variation in stroke and stroke-related outcomes, research also indicates sex differences in both the incidence of stroke and its associated risk factors (Lasek-Bal et al., 2018). Although individual studies vary in their findings, a recent meta-analysis revealed that young adult women experience strokes at a 44% higher rate than men (Leppert et al., 2022). Additionally, Reeves et al. (2008) found that traditional stroke risk factors like hypertension and diabetes had different impacts on stroke risk in men and women, necessitating sex-specific preventive strategies. Moreover, a study by Bushnell et al. (2014) emphasized the importance of considering sex differences in stroke risk to improve the accuracy of predictive models and the effectiveness of interventions. These findings underscore the need for separate analyses to develop more precise, sex-specific public health policies and individualized treatment plans that can better address the unique risk factors and improve outcomes for both young men and women.

One approach that has shown promise for better understanding stroke in young adults is the use of machine learning to explore variations in stroke outcomes (Chandrabhatla et al., 2023). Machine learning (ML) is a data analytics methodology that is increasingly being used to explore relationships among, and make predictions from, variables derived from multiple data sources (Ni et al., 2018). ML uses algorithms that iteratively learn from multiple inputs of training data to determine complex relationships within the data and improve prediction on future data sources (Ni et al., 2018). ML approaches have been utilized in stroke diagnosis (Mainali et al., 2021; Dev et al., 2022), stroke risk factor identification (Hassan et al., 2024), stroke imaging (Sheth et al., 2023; Soun et al., 2021), and stroke outcome prediction (Mainali et al., 2021). Given the substantial contributions of ML to the study of stroke overall, ML approaches appear well suited to studying stroke in young adults to improve diagnosis, treatment, and patient-related outcomes (Daidone et al., 2024). Therefore, this study was designed to examine sex-specific risk factors among young adults with stroke. The team used a nationally representative sample of young adults aged 33–43 (Lee et al., 2022). Predictive modeling was utilized to identify the key sex-specific stroke risk factors and how these risk factors varied between young men and women, with the ultimate goal of characterizing the multifactorial nature of stroke incidence in younger adults.

Materials and methods

Data

Data for this study came from the National Longitudinal Study of Adolescent to Adult Health (ADD Health), a nationally representative, longitudinal survey of individuals who were in Grades 7–12 during the 1994–1995 school year in the United States. ADD Health includes longitudinal data on respondents' social, economic, psychological, and physical well-being with contextual data on the family, neighborhood, community, schools, friendships, peer groups, and romantic relationships, providing unique opportunities to study how health, social environments, and behaviors are linked over time. The initial Wave I sample (N = 20,745) represented the national cohort of adolescents in grades 7 to 12 in the US in 1995. This cohort was followed into young adulthood with five in-home interviews in 1995 (Wave I), 1996 (Wave II, N = 17,738), 2001–2002 (Wave III, N = 15,197), 2008–2009 (Wave IV, N = 15,701), and 2016–2018 (Wave V, N = 12,300) when respondents were 12–17, 13–18, 18–26, 24–32, and 33–43 years old, respectively. Each wave consisted of core household, demographic, and health information along with additional wave-specific topics.

This study used data collected in Wave V (n = 12,300), when all respondents were age 18 and above, as well as information reported in the Wave I parental survey. Wave V collected social, environmental, behavioral, and biological data with which to track the emergence of chronic disease as the cohort advanced through their 30s and early 40s. The data collection employed a mixed-mode survey design consisting of web, in-person, telephone, and mail-based questionnaires and interviews (Harris et al., 2019). Additional information on the sampling design, survey modes, instrumentation, and validation can be found at https://addhealth.cpc.unc.edu/wp-content/uploads/docs/user_guides/Add-Health-Wave-V-Sampling-and-Mixed-Mode-Survey-Design_doi.pdf. The Wave V survey was the first to include questions related to stroke and family history of stroke. Survey items included in this study are described below. Using these data, we sought to identify specific predictive factors associated with stroke among young men and women. To explore the influence of data balancing methods on the performance of the LASSO, analyses were performed on both the balanced and imbalanced data sets for both sexes.

Outcome variables

In Wave V, respondents indicated whether a doctor, nurse, or health care provider diagnosed them with a stroke. Responses were coded as either zero which represented no prior stroke diagnosis or one which represented a prior stroke diagnosis. Respondents who asked for additional clarity were told to respond affirmatively if they had been diagnosed with a stroke, ministroke, or received surgery for clogged neck arteries (including endarterectomy, bypass, angioplasty, or stent).

Candidate variables

Candidate variables included important confounding variables in stroke risk, but excluded those that were themselves potential outcomes of stroke since their inclusion would bias the focal association (Hernan, 2002; Pearl, 2009; Elwert, 2013). Therefore, to capture factors associated with stroke (Cramer and Kapusta, 2017), sets of theoretically relevant demographic, lifestyle, and clinical characteristics were chosen from those available in the ADD Health database.

Demographic factors

Demographic factors included age, sex at birth (male, female), race, ethnicity, highest educational attainment (less than a college degree, college degree or above), employment status (currently working 10+ hours per week, not working 10+ hours per week), marital status (married, not married), school enrollment (currently enrolled in an educational degree program at least part-time, not enrolled), and income level (< $75,000, ≥$75,000). Race was self-reported as White, Black, Asian or Pacific Islander, American Indian or Native American, or other. Due to sample size limitations, Asian or Pacific Islander, American Indian or Native American, and other were collapsed into a single category. Ethnicity was self-reported as either Hispanic or non-Hispanic. To account for early life and familial characteristics, indicators for parental education (high school diploma or above), parental marital status (married), and parental income (≥$75,000) were created. Additional indicators were created for residing in the South, Midwest, or West and for living within or near a neighborhood area historically classified as “definitely declining” or “hazardous” by the Homeowners Loan Corporation (HOLC), also known as a historically “red” neighborhood. Finally, indicators for prior stroke diagnosis among biological aunts/uncles, grandparents, parents, and siblings were created.

Lifestyle factors

Indicators were created from survey items capturing the frequency of engaging in health-impacting behaviors. Binary behavioral indicators were created from ordinal survey variables based on the univariate distributions. These indicators included consuming alcohol more than once monthly, smoking at least one cigarette monthly, using marijuana at least once in the past month, watching more than 20 h of television weekly, and exercising at least once weekly. Responses to individual survey items concerning use in the last month of illicit drugs, including cocaine, crystal meth, heroin, or other types of illegal drugs such as LSD, PCP, ecstasy, mushrooms, or inhalants, were combined into a single variable. Finally, an additional indicator was created for use in the past 30 days of prescription sedatives, tranquilizers, stimulants, or pain killers that were not prescribed, taken in larger amounts than prescribed, taken more often than prescribed, taken for longer periods than prescribed, or taken for the feeling or experience they caused.

Clinical factors

Health-related characteristics included self-reported health status, health services utilization, and diagnoses. Indicators for self-reported good/very good/excellent health, being obese (body mass index ≥30), having health insurance, and self-reported diagnosis of diabetes, heart disease, migraine headaches, kidney disease/kidney failure, depression, anxiety, hyperlipidemia, or high blood pressure were included. Additionally, receipt of health services was captured using indicators for receipt of mental health counseling within the last 12 months, taking at least one prescription medication regularly, having a dental exam within the last 12 months, having a regular doctor or health center, and having not received needed health services in the last year. For female respondents, indicators for taking oral contraception, and having previously had at least one live birth were created.

Data analysis approach

As previously indicated, continuous and ordinal variables were transformed into categorical indicators using established or pragmatic thresholds to simplify interpretation of findings (Bennette and Vickers, 2012; Barrio et al., 2017). Many variables have been reported to influence stroke occurrence, including age, sex, race/ethnicity, family history, genetic factors, hypertension, diabetes, heart disease, high cholesterol, smoking, obesity, physical inactivity, diet, alcohol and substance use, previous stroke or transient ischemic attack (TIA), sleep apnea, hormonal factors (e.g., oral contraceptives, hormone replacement therapy), chronic stress, and socioeconomic status (Yahya et al., 2020). To identify factors associated with young stroke, predictive modeling techniques were employed to uncover the most important predictors from a complex dataset. This approach is vital as it identifies key risk factors while allowing for more accurate model generalization. We examined a dataset encompassing 51 and 53 variables for men and women, respectively; the unequal number resulted from several female-specific characteristics related to pregnancy and childbirth. This dataset included a spectrum of demographic, lifestyle, and clinical information. However, the relationships among various social, behavioral, and health-related outcomes often require advanced approaches to identify the most important predictors without overfitting, a problem that cannot be easily rectified using standard techniques (Irvin et al., 2020; Richmond et al., 2020; Kino et al., 2021).

Regularization, a technique designed to improve model generalization in settings with many potentially important predictors, was implemented by adding a penalty to the model parameters. This approach helps the model generalize to new data rather than overfit the training set. The Least Absolute Shrinkage and Selection Operator (LASSO), a type of regularization, was utilized to minimize model overfitting by applying a penalty term (λ) to the log-likelihood function and setting the coefficients of unimportant predictors to zero (Tibshirani, 1996). LASSO simultaneously performs variable selection by identifying the most important predictors while managing model complexity. This is particularly valuable in datasets with numerous predictors, thereby enhancing our understanding of stroke risk among young adults. The approach has been used in a variety of settings with complex sets of underlying predictors (Ortega Hinojosa et al., 2014; Simeonov and Himmelstein, 2015). LASSO was executed using the glmnet package (Tay et al., 2023; Friedman et al., 2010) in R version 4.4.0 (R Core Team, 2021), incorporating a ten-fold cross-validation strategy to ascertain the optimal regularization parameter (λ). A random training set (70%) was selected to train the models and a random hold-out test set (30%) was used to assess their performance. To ensure model results were not influenced by multicollinearity between factors, variance inflation factors (VIF) were inspected. All VIFs were below five, suggesting low correlation among predictors.
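To illustrate how an L1 penalty sets coefficients exactly to zero, the following is a minimal sketch of LASSO-penalized logistic regression. It is not the study's implementation: the analysis used the glmnet package in R, which relies on coordinate descent, whereas this sketch swaps in proximal gradient descent for brevity. The function names, the toy data, and the choice of λ = 0.05 are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    """Shrink a value toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Proximal-gradient fit of logistic regression with an L1 penalty.

    Minimizes mean negative log-likelihood + lam * sum(|coef|);
    the intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_intercept, grad = 0.0, [0.0] * p
        for row, label in zip(X, y):
            z = intercept + sum(c * x for c, x in zip(coefs, row))
            resid = sigmoid(z) - label
            grad_intercept += resid / n
            for j in range(p):
                grad[j] += resid * row[j] / n
        intercept -= step * grad_intercept
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        coefs = [soft_threshold(c - step * g, step * lam)
                 for c, g in zip(coefs, grad)]
    return intercept, coefs

def make_rows(x0, label, count):
    """Hypothetical toy rows: x0 carries signal, x1 alternates +1/-1 (pure noise)."""
    return [([x0, 1 if i % 2 == 0 else -1], label) for i in range(count)]

# Predictors are coded -1/+1 here purely to keep the toy example symmetric.
data = make_rows(1, 1, 6) + make_rows(1, 0, 2) + make_rows(-1, 1, 2) + make_rows(-1, 0, 6)
X = [features for features, _ in data]
y = [label for _, label in data]

intercept, coefs = fit_lasso_logistic(X, y, lam=0.05)
print(intercept, coefs)
```

In this toy example the informative predictor keeps a positive, shrunken coefficient while the uninformative predictor is removed from the model, which is the variable-selection behavior described above.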

On the training set, 10 × 10-fold cross validation was used to select the largest lambda value whose cross-validation error was within one standard error of the minimal cross-validation error (i.e., the lambda.1se criterion) (Tibshirani, 1996). Through this procedure, we identified the indicators with non-zero coefficients in the LASSO model, and these were treated as predictors of stroke occurrence among men and women.
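The one-standard-error rule can be sketched as follows. This is an illustration of the selection logic only, not the glmnet source code; the function name and the cross-validation numbers are hypothetical.

```python
def select_lambda_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean cross-validated error; lambda_1se is the
    largest lambda whose mean error is within one standard error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    cutoff = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= cutoff]
    return lambdas[best], max(eligible)

# Hypothetical cross-validation results (mean deviance and its standard error).
lambdas = [0.001, 0.005, 0.01, 0.05, 0.1]
cv_mean = [0.112, 0.108, 0.110, 0.115, 0.140]
cv_se = [0.006, 0.006, 0.006, 0.007, 0.009]

lambda_min, lambda_1se = select_lambda_1se(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)  # 0.005 0.01
```

Choosing the larger penalty trades a small loss in cross-validated fit for a sparser model, which is why fewer predictors survive under this criterion than under the minimum-error criterion.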

To ensure that poor data quality did not degrade the final prediction, data discretization, redundant value reduction, and class balancing were performed to make the data more appropriate for mining and analysis (Fan et al., 2021). Class balancing employed the synthetic minority oversampling technique (SMOTE) (Maldonado et al., 2019) to address the imbalanced distribution of participants among the stroke and non-stroke classes. SMOTE was executed using the performanceEstimation (Torgo, 2014) package in R and generated synthetic samples by oversampling the minority class to balance the class distribution. However, since balancing a dataset can itself introduce bias (Krawczyk, 2016), analyses were conducted with both the unbalanced (original) and balanced data.
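The core idea of SMOTE, creating synthetic minority-class records by interpolating between a minority record and one of its nearest minority neighbors, can be sketched as below. This is a simplified illustration, not the performanceEstimation implementation; the function name, parameters, and toy data are hypothetical. Note that interpolating binary indicators produces fractional values, which software packages may handle differently.

```python
import random

def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority-class rows by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base row (excluding the row itself).
        neighbors = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to partner
        synthetic.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return synthetic

# Hypothetical minority-class rows with two numeric features.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
synthetic = smote_sketch(minority, n_synthetic=5)
print(len(synthetic))  # 5
```

Because every synthetic row lies on a line segment between two observed minority rows, the balanced data set contains no values outside the observed range of the stroke cohort.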

To interpret the results from the LASSO regression model, the magnitude of the coefficients was used to determine the strength of the association, with larger magnitudes indicating a stronger association or more predictive value, while the sign of the coefficient indicated the direction of the association (Wiemken and Kelley, 2020). To evaluate the performance of the model, model predictions were generated for the testing data set, and the model AUC, accuracy, precision, and recall were then calculated (Friedman et al., 2010).
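The performance measures named above can be computed as in the following minimal sketch. The labels, predicted probabilities, and the 0.5 classification threshold are hypothetical and are not taken from the study.

```python
def auc(y_true, scores):
    """AUC as the probability that a random case outranks a random non-case."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, and recall at a fixed probability threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, y_true) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(pred, y_true) if p == 0 and y == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical hold-out labels (1 = stroke) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]

model_auc = auc(y_true, scores)
metrics = classification_metrics(y_true, scores)
print(model_auc, metrics)
```

AUC depends only on how cases are ranked relative to non-cases, whereas accuracy, precision, and recall depend on the chosen threshold, so the two kinds of measure can move differently when class balance changes.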

Results

Among 12,300 respondents in the sample, 6,970 (56.67%) were female and 5,318 (43.53%) were male, with mean ages of 37.45 (SD = 1.88) and 37.71 (SD = 1.89), respectively. About one percent of the sample (N = 149; 90 female, 59 male) reported having a stroke. A comparison of demographic, lifestyle, and clinical characteristics of the female and male samples is shown in Table 1. Comparisons of the stroke cohort and the cohort without stroke within each sex are shown in Tables 2, 3. There were few differences between the balanced and unbalanced cohorts. For ease of interpretability, cohort comparison results were based on the unbalanced sample. The percentage of the stroke cohort with hypertension (male 32.20%; female 27.78%), diabetes (male 25.42%; female 21.11%), kidney disease (male 18.64%; female 14.44%), chronic migraines (male 23.73%; female 75.56%), hyperlipidemia (male 38.98%; female 24.44%), and obesity (male 55.93%; female 55.56%) was higher than in the non-stroke cohort for both the male and female samples, although not every difference reached statistical significance (e.g., hypertension and chronic migraines among males; Table 3). Similarly, the proportions of the male and female stroke cohorts reporting marijuana (male 32.20%; female 31.11%) and cigarette (male 40.68%; female 35.56%) usage were higher than those of their non-stroke counterparts, as was illegal drug usage among females (8.89%), although not among males (3.39%).

Table 1

| Characteristic | Female (N = 6,970, 56.67%) Mean | Female SD | Male (N = 5,318, 43.53%) Mean | Male SD | F-stat | Prob |
|---|---|---|---|---|---|---|
| Age (33–43) | 37.45 | 1.88 | 37.71 | 1.89 | 1.01 | 0.66 |
| Live births (0–9) | 1.69 | 1.33 | | | | |

| Characteristic | Female N | Female percent | Male N | Male percent | χ2 | p-Value |
|---|---|---|---|---|---|---|
| Stroke | 90 | 1.29 | 59 | 1.11 | 0.83 | 0.36 |
| Good self-reported health | 3,755 | 53.87 | 2,745 | 51.62 | 6.16 | 0.01 |
| Hypertension | 1,183 | 16.97 | 1,316 | 24.75 | 112.51 | < 0.0001 |
| Diabetes | 421 | 6.04 | 205 | 3.85 | 29.80 | < 0.0001 |
| Kidney disease/failure | 60 | 0.86 | 51 | 0.96 | 0.32 | 0.57 |
| Heart disease | 88 | 1.26 | 76 | 1.43 | 0.64 | 0.43 |
| Chronic migraines | 2,372 | 34.03 | 857 | 16.12 | 499.83 | < 0.0001 |
| Hyperlipidemia | 1,020 | 14.63 | 1,117 | 21 | 85.20 | < 0.0001 |
| Obese | 2,991 | 42.91 | 2,264 | 42.57 | 0.14 | 0.71 |
| Depression | 2,544 | 36.5 | 1,134 | 21.32 | 331.23 | < 0.0001 |
| Anxiety | 2,143 | 30.75 | 902 | 16.96 | 307.51 | < 0.0001 |
| PTSD | 525 | 7.53 | 293 | 5.51 | 19.86 | < 0.0001 |
| Dental appointment in past 12 months | 4,745 | 68.08 | 3,124 | 58.74 | 114.11 | < 0.0001 |
| Counseling within last 12 months | 1,145 | 16.43 | 633 | 11.9 | 49.90 | < 0.0001 |
| Health insurance | 6,496 | 93.2 | 4,798 | 90.22 | 35.97 | < 0.0001 |
| Has regular health facility | 4,208 | 60.37 | 2,538 | 47.72 | 194.90 | < 0.0001 |
| Did not receive necessary care last 12 months | 1,592 | 22.84 | 1,091 | 20.52 | 9.56 | 0.00 |
| Takes ≥1 prescription medication | 1,786 | 25.62 | 863 | 16.23 | 157.49 | < 0.0001 |
| Takes oral contraception | 3,854 | 55.29 | | | | |
| Mother had a stroke | 10 | 0.14 | 3 | 0.06 | 2.16 | 0.14 |
| Father had a stroke | 10 | 0.14 | 3 | 0.06 | 2.16 | 0.14 |
| Sibling(s) had a stroke | 9 | 0.13 | 6 | 0.11 | 0.07 | 0.80 |
| Aunt(s)/uncle(s) had a stroke | 114 | 1.64 | 89 | 1.67 | 0.03 | 0.87 |
| Grandparent(s) had a stroke | 206 | 2.96 | 163 | 3.07 | 0.12 | 0.72 |
| Parents married | 6,963 | 99.9 | 5,315 | 99.94 | 0.72 | 0.40 |
| Parents education ≥ high school | 5,915 | 84.86 | 4,569 | 85.92 | 2.67 | 0.10 |
| Parents earned >$75,000 | 629 | 9.02 | 533 | 10.02 | 3.51 | 0.06 |
| Hispanic | 1,040 | 14.92 | 789 | 14.84 | 0.02 | 0.90 |
| Black | 1,597 | 22.91 | 952 | 17.9 | 46.07 | < 0.0001 |
| Other race | 551 | 7.91 | 488 | 9.18 | 6.30 | 0.01 |
| Education level college degree or above | 2,909 | 41.74 | 1,776 | 33.4 | 88.94 | < 0.0001 |
| Married | 4,019 | 57.66 | 3,109 | 58.46 | 0.79 | 0.37 |
| Household income >$75,000 | 2,554 | 36.64 | 2,836 | 53.33 | 341.06 | < 0.0001 |
| Currently employed | 5,614 | 80.55 | 4,689 | 88.17 | 129.55 | < 0.0001 |
| Currently enrolled in school at least part-time | 635 | 9.11 | 338 | 6.36 | 31.39 | < 0.0001 |
| Exercise ≥1 time weekly | 4,746 | 68.09 | 3,082 | 57.95 | 134.08 | < 0.0001 |
| Used marijuana ≥1 time last month | 1,047 | 15.02 | 1,197 | 22.53 | 113.63 | < 0.0001 |
| Used illegal drugs ≥1 time last month | 179 | 2.57 | 247 | 4.64 | 38.77 | < 0.0001 |
| Improperly used prescription medication | 796 | 11.42 | 557 | 10.46 | 2.80 | 0.09 |
| Smokes regularly | 1,444 | 20.72 | 1,395 | 26.23 | 51.63 | < 0.0001 |
| Consumes alcohol ≥1 time weekly | 3,274 | 46.97 | 3,202 | 60.21 | 212.06 | < 0.0001 |
| Watches ≥20 h of TV weekly | 1,034 | 14.84 | 968 | 18.2 | 25.08 | < 0.0001 |
| HOLC grade declining or hazardous | 2,298 | 32.97 | 1,753 | 32.96 | 0.00 | 0.99 |
| South | 2,923 | 41.94 | 2,140 | 40.24 | 3.58 | 0.06 |
| Midwest | 1,599 | 22.94 | 1,236 | 23.24 | 0.15 | 0.70 |
| West | 1,649 | 23.66 | 1,237 | 23.26 | 0.27 | 0.61 |

Demographic, lifestyle, and clinical characteristics by sex.
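As a worked check of how the sex-difference statistics in Table 1 can be read, the sketch below computes a Pearson chi-square test for the stroke row from the reported counts (90 of 6,970 women and 59 of 5,318 men). It assumes a 2 × 2 test without continuity correction, which appears consistent with the reported values (χ2 = 0.83, p = 0.36) but is not stated in the article; the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: female, male; columns: stroke, no stroke (counts from Table 1).
stat, p_value = chi_square_2x2(90, 6970 - 90, 59, 5318 - 59)
print(round(stat, 2), round(p_value, 2))  # 0.83 0.36
```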

Table 2

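| Characteristic | No stroke (N = 6,880) Mean | No stroke SD | Stroke (N = 90) Mean | Stroke SD | F-stat | Prob |
|---|---|---|---|---|---|---|
| Age | 37.45 | 1.88 | 37.69 | 1.88 | 1.00 | 1.00 |
| Live births | 1.69 | 1.33 | 1.93 | 1.45 | 1.19 | 0.21 |

| Characteristic | No stroke N | No stroke percent | Stroke N | Stroke percent | χ2 | p-Value |
|---|---|---|---|---|---|---|
| Good self-reported health | 3,737 | 54.32 | 18 | 20 | 42.10 | < 0.0001 |
| Hypertension | 1,158 | 16.83 | 25 | 27.78 | 7.55 | 0.01 |
| Diabetes | 402 | 5.84 | 19 | 21.11 | 36.49 | < 0.0001 |
| Kidney disease/failure | 47 | 0.68 | 13 | 14.44 | 197.13 | < 0.0001 |
| Heart disease | 77 | 1.12 | 11 | 12.22 | 87.85 | < 0.0001 |
| Chronic migraines | 2,304 | 33.49 | 68 | 75.56 | 70.03 | < 0.0001 |
| Hyperlipidemia | 998 | 14.51 | 22 | 24.44 | 7.02 | 0.01 |
| Obese | 2,941 | 42.75 | 50 | 55.56 | 5.95 | 0.01 |
| Depression | 2,483 | 36.09 | 61 | 67.78 | 38.49 | < 0.0001 |
| Anxiety | 2,091 | 30.39 | 52 | 57.78 | 31.29 | < 0.0001 |
| PTSD | 503 | 7.31 | 22 | 24.44 | 37.44 | < 0.0001 |
| Dental appointment in past 12 months | 4,699 | 68.3 | 46 | 51.11 | 12.08 | 0.00 |
| Counseling within last 12 months | 1,117 | 16.24 | 28 | 31.11 | 14.32 | 0.00 |
| Health insurance | 6,417 | 93.27 | 79 | 87.78 | 4.23 | 0.04 |
| Has regular health facility | 4,157 | 60.42 | 51 | 56.67 | 0.52 | 0.47 |
| Did not receive necessary care last 12 months | 1,553 | 22.57 | 39 | 43.33 | 21.73 | < 0.0001 |
| Takes ≥1 prescription medication | 1,750 | 25.44 | 36 | 40 | 9.89 | 0.00 |
| Takes oral contraception | 3,803 | 55.28 | 51 | 56.67 | 0.07 | 0.79 |
| Mother had a stroke | 9 | 0.13 | 1 | 1.11 | 5.96 | 0.01 |
| Father had a stroke | 9 | 0.13 | 1 | 1.11 | 5.96 | 0.01 |
| Sibling(s) had a stroke | 8 | 0.12 | 1 | 1.11 | 6.82 | 0.01 |
| Aunt(s)/uncle(s) had a stroke | 112 | 1.63 | 2 | 2.22 | 0.20 | 0.66 |
| Grandparent(s) had a stroke | 199 | 2.89 | 7 | 7.78 | 7.39 | 0.01 |
| Parents earned >$75,000 | 623 | 9.06 | 6 | 6.67 | 0.62 | 0.43 |
| Parents education ≥ high school | 5,837 | 84.84 | 78 | 86.67 | 0.23 | 0.63 |
| Parents married | 6,873 | 99.9 | 0 | 0 | 0.09 | 0.76 |
| Hispanic | 1,032 | 15 | 8 | 8.89 | 2.61 | 0.11 |
| Black | 1,574 | 22.88 | 23 | 25.56 | 0.36 | 0.55 |
| Other race | 546 | 7.94 | 5 | 5.56 | 0.69 | 0.41 |
| Education level college degree or above | 2,892 | 42.03 | 17 | 18.89 | 19.57 | < 0.0001 |
| Married | 3,973 | 57.75 | 46 | 51.11 | 1.60 | 0.21 |
| Household income >$75,000 | 2,537 | 36.88 | 17 | 18.89 | 12.38 | 0.00 |
| Currently employed | 5,556 | 80.76 | 58 | 64.44 | 15.08 | 0.00 |
| Currently enrolled in school at least part-time | 625 | 9.08 | 10 | 11.11 | 0.44 | 0.51 |
| Exercise ≥1 time weekly | 4,683 | 68.07 | 63 | 70 | 0.15 | 0.70 |
| Used marijuana ≥1 time last month | 1,019 | 14.82 | 28 | 31.11 | 18.46 | < 0.0001 |
| Used illegal drugs ≥1 time last month | 171 | 2.49 | 8 | 8.89 | 14.56 | 0.00 |
| Improperly used prescription medication | 778 | 11.31 | 17 | 18.89 | 5.05 | 0.02 |
| Smokes regularly | 1,412 | 20.52 | 32 | 35.56 | 12.22 | 0.00 |
| Consumes alcohol ≥1 time weekly | 3,238 | 47.06 | 36 | 40 | 1.78 | 0.18 |
| Watches ≥20 h of TV weekly | 1,017 | 14.78 | 17 | 18.89 | 1.19 | 0.28 |
| HOLC grade declining or hazardous | 2,267 | 32.95 | 31 | 34.44 | 0.09 | 0.76 |
| South | 2,889 | 41.99 | 34 | 37.78 | 0.65 | 0.42 |
| Midwest | 1,568 | 22.79 | 31 | 34.44 | 6.82 | 0.01 |
| West | 1,634 | 23.75 | 15 | 16.67 | 2.47 | 0.12 |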

Female cohort demographic, lifestyle, and clinical characteristics by stroke status.

Table 3

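| Characteristic | No stroke (N = 5,259) Mean | No stroke SD | Stroke (N = 59) Mean | Stroke SD | F-stat | Prob |
|---|---|---|---|---|---|---|
| Age | 37.71 | 1.89 | 37.95 | 2.03 | 1.15 | 0.4003 |

| Characteristic | No stroke N | No stroke percent | Stroke N | Stroke percent | χ2 | p-Value |
|---|---|---|---|---|---|---|
| Good self-reported health | 2,730 | 51.91 | 15 | 25.42 | 16.39 | < 0.0001 |
| Hypertension | 1,297 | 24.66 | 19 | 32.2 | 1.78 | 0.18 |
| Diabetes | 190 | 3.61 | 15 | 25.42 | 74.89 | < 0.0001 |
| Kidney disease/failure | 40 | 0.76 | 11 | 18.64 | 196.46 | < 0.0001 |
| Heart disease | 64 | 1.22 | 12 | 20.34 | 151.45 | < 0.0001 |
| Chronic migraines | 843 | 16.03 | 14 | 23.73 | 2.56 | 0.11 |
| Hyperlipidemia | 1,094 | 20.8 | 23 | 38.98 | 11.62 | 0.00 |
| Obese | 2,231 | 42.42 | 33 | 55.93 | 4.36 | 0.04 |
| Depression | 1,105 | 21.01 | 29 | 49.15 | 27.54 | < 0.0001 |
| Anxiety | 875 | 16.64 | 27 | 45.76 | 35.14 | < 0.0001 |
| PTSD | 280 | 5.32 | 13 | 22.03 | 31.29 | < 0.0001 |
| Dental appointment in past 12 months | 3,098 | 58.91 | 26 | 44.07 | 5.30 | 0.02 |
| Counseling within last 12 months | 619 | 11.77 | 14 | 23.73 | 7.96 | 0.00 |
| Health insurance | 4,741 | 90.15 | 57 | 96.61 | 2.76 | 0.10 |
| Has regular health facility | 2,510 | 47.73 | 28 | 47.46 | 0.00 | 0.97 |
| Did not receive necessary care last 12 months | 1,074 | 20.42 | 17 | 28.81 | 2.52 | 0.11 |
| Takes ≥1 prescription medication | 844 | 16.05 | 19 | 32.2 | 11.20 | 0.00 |
| Takes oral contraception | 0 | 0 | 0 | 0 | 0.03 | 0.85 |
| Mother had a stroke | 3 | 0.06 | 0 | 0 | 0.03 | 0.85 |
| Father had a stroke | 3 | 0.06 | 0 | 0 | 0.03 | 0.85 |
| Sibling(s) had a stroke | 6 | 0.11 | 0 | 0 | 0.07 | 0.80 |
| Aunt(s)/uncle(s) had a stroke | 86 | 1.64 | 3 | 5.08 | 4.22 | 0.04 |
| Grandparent(s) had a stroke | 160 | 3.04 | 3 | 5.08 | 0.82 | 0.37 |
| Parents earned >$75,000 | 529 | 10.06 | 4 | 6.78 | 0.70 | 0.40 |
| Parents education ≥ high school | 4,516 | 85.87 | 53 | 89.83 | 0.76 | 0.38 |
| Parents married | 5,256 | 99.94 | 59 | 100 | 0.03 | 0.85 |
| Hispanic | 783 | 14.89 | 6 | 10.17 | 1.03 | 0.31 |
| Black | 934 | 17.76 | 18 | 30.51 | 6.45 | 0.01 |
| Other race | 483 | 9.18 | 5 | 8.47 | 0.04 | 0.85 |
| Education level college degree or above | 1,763 | 33.52 | 13 | 22.03 | 3.46 | 0.06 |
| Married | 3,083 | 58.62 | 26 | 44.07 | 5.09 | 0.02 |
| Income >$75,000 | 2,821 | 53.64 | 15 | 25.42 | 18.67 | < 0.0001 |
| Currently employed | 4,653 | 88.48 | 36 | 61.02 | 42.19 | < 0.0001 |
| Currently enrolled in school at least part-time | 334 | 6.35 | 4 | 6.78 | 0.02 | 0.89 |
| Exercise ≥1 time weekly | 3,047 | 57.94 | 35 | 59.32 | 0.05 | 0.83 |
| Used marijuana ≥1 time last month | 1,178 | 22.44 | 19 | 32.2 | 3.18 | 0.00 |
| Used illegal drugs ≥1 time last month | 245 | 4.66 | 2 | 3.39 | 0.21 | 0.65 |
| Improperly used prescription medication | 546 | 10.38 | 11 | 18.64 | 0.25 | 0.04 |
| Smokes regularly | 1,371 | 26.07 | 24 | 40.68 | 6.43 | 0.01 |
| Consumes alcohol ≥1 time weekly | 3,179 | 60.45 | 23 | 38.98 | 11.22 | 0.00 |
| Watches ≥20 h of TV weekly | 952 | 18.1 | 16 | 27.12 | 3.19 | 0.07 |
| HOLC grade declining or hazardous | 1,731 | 32.92 | 22 | 37.29 | 0.50 | 0.48 |
| South | 2,107 | 40.06 | 33 | 55.93 | 6.11 | 0.01 |
| Midwest | 1,223 | 23.26 | 13 | 22.03 | 0.05 | 0.83 |
| West | 1,230 | 23.39 | 7 | 11.86 | 4.34 | 0.04 |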

Male cohort demographic, lifestyle, and clinical characteristics by stroke status.

Table 4 provides results from predictive models of stroke risk including model performance, accuracy, feature selection, and feature coefficients. Of the 53 and 51 variables entered in the LASSO for the female and male cohorts, respectively, 10 (balanced: 6) and 8 (balanced: 6) were associated with stroke in the unbalanced female and male data sets. The most predictive clinical features for males were kidney disease, heart disease, and diabetes, and the most important clinical variables among females were kidney disease, chronic migraines, diabetes, and depression. The AUC values for the balanced and imbalanced data models were similar, indicating consistent predictor identification and model performance. Mental health variables were also identified as predictors of stroke: PTSD (Sumner et al., 2023) and anxiety (Ryder and Cohen, 2021) for males, and PTSD (Ebrahimi et al., 2021) and depression (Dong et al., 2012) for females.

Table 4

| Metric | Female unbalanced | Female balanced | Male unbalanced | Male balanced |
|---|---|---|---|---|
| BD | 0.11 | 0.47 | 0.11 | 0.44 |
| ME | 0.01 | 0.08 | 0.01 | 0.08 |
| AUC | 0.84 | 0.83 | 0.82 | 0.82 |
| MSE | 0.02 | 0.13 | 0.02 | 0.12 |
| MAE | 0.05 | 0.26 | 0.04 | 0.25 |
| Intercept | −5.06 | −2.95 | −4.77 | −2.79 |
| Total features | 10 | 6 | 8 | 6 |
| Percent correct (accuracy) | 0.99 | 0.92 | 0.99 | 0.92 |

| Female unbalanced | Coefficient | Female balanced | Coefficient | Male unbalanced | Coefficient | Male balanced | Coefficient |
|---|---|---|---|---|---|---|---|
| Kidney disease | 1.96 | Kidney disease | 1.05 | Kidney disease | 2.24 | Kidney disease | 1.92 |
| Diabetes | 0.80 | Chronic migraines | 0.69 | Heart disease | 2.05 | Diabetes | 1.65 |
| Chronic migraines | 0.76 | Diabetes | 0.63 | Diabetes | 0.41 | PTSD | 0.44 |
| Depression | 0.24 | Depression | 0.15 | Anxiety | 0.38 | Heart disease | 0.38 |
| Used marijuana | 0.15 | Used marijuana | 0.15 | PTSD | 0.20 | Anxiety | 0.23 |
| Did not receive necessary care last 12 months | 0.09 | Did not receive necessary care last 12 months | 0.02 | Consumes alcohol ≥1 time weekly | −0.03 | Income > $75,000 | −0.04 |
| Live births | 0.06 | | | Currently employed | −0.04 | | |
| PTSD | 0.06 | | | Income > $75,000 | −0.05 | | |
| Good self-reported health | −0.01 | | | | | | |
| College degree or above | −0.30 | | | | | | |

Model performance and feature selection.

BD, binomial deviance; ME, misclassification error; AUC, area under the ROC (receiver operating characteristic) curve; MSE, mean squared error; MAE, mean absolute error.
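Because the LASSO models are logistic, each coefficient in Table 4 is on the log-odds scale and can be exponentiated to give an approximate odds ratio. The short sketch below does this for a few female unbalanced coefficients. It is an interpretive aid rather than part of the original analysis: penalized coefficients are shrunk toward zero, so these values should be read as conservative approximations, not unbiased odds ratios. The variable names are hypothetical.

```python
import math

# Selected coefficients from the female unbalanced model in Table 4.
female_unbalanced = {
    "Kidney disease": 1.96,
    "Diabetes": 0.80,
    "Chronic migraines": 0.76,
    "College degree or above": -0.30,
}

# exp(coefficient) converts a log-odds coefficient into an odds ratio.
odds_ratios = {name: math.exp(beta) for name, beta in female_unbalanced.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Values above one indicate higher odds of reported stroke when the indicator is present, and values below one indicate lower odds, matching the sign-based interpretation of the coefficients.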

Non-clinical characteristics identified as predictors of stroke in females included having a college degree, not receiving necessary healthcare, use of marijuana, good self-reported health, and the number of live births. Non-clinical predictors of male stroke differed from those identified among females and included consuming alcohol, employment, and income.

Model fit parameters are depicted in Figures 1, 2 for the male and female cohorts, respectively. The AUC, deviance, mean error, positive predictive value, and accuracy of each model are also shown in Table 4. The AUC of the unbalanced female model was 0.84 (balanced 0.83) and that of the unbalanced male model was 0.82 (balanced 0.82). Although there was little difference in AUC between the models, the accuracy was lower for the balanced data models than for the unbalanced data models (male cohort 0.92 vs. 0.99; female cohort 0.92 vs. 0.99). However, the features identified by both models, as well as the order of feature importance, were highly similar. Thus, the most important predictors of stroke for each sex remained unchanged.
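One way to put the accuracy figures in context is to compare them with the accuracy obtained by always predicting the more common class. The sketch below computes that baseline from the full-sample counts in Tables 2 and 3; it assumes, hypothetically, that stroke prevalence in the 30% hold-out set is close to that of the full sample, and the function name is illustrative only.

```python
def majority_class_accuracy(n_total, n_events):
    """Accuracy achieved by always predicting the more common class."""
    return max(n_events, n_total - n_events) / n_total

# Full-sample counts: 90 strokes among 6,970 women, 59 among 5,318 men.
female_baseline = majority_class_accuracy(6970, 90)
male_baseline = majority_class_accuracy(5318, 59)

print(round(female_baseline, 3), round(male_baseline, 3))  # 0.987 0.989
```

Under that assumption, an accuracy near 0.99 on the unbalanced data is close to what the class distribution alone would yield, which is one reason accuracy and AUC can behave differently across the balanced and unbalanced models.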

Figure 1

Figure 2

Discussion

This study utilized LASSO regression, also known as L1 regularization, to examine clinical and non-clinical predictors of stroke risk among young men and women. LASSO regression is a technique used to estimate the relationships between variables and make predictions by finding a balance between model simplicity and accuracy. It achieves this by adding a penalty term to the regression model, which encourages sparse solutions where some coefficients are forced to be exactly zero. This feature makes LASSO particularly useful for feature selection, as it can automatically identify and discard irrelevant or redundant variables.

Only about one percent of the sample “self-reported” having a stroke, which agrees with recent work completed by the US Centers for Disease Control and Prevention showing that approximately one percent of individuals aged 18–44 reported having a stroke (Imoisili et al., 2024). Given the age of the population, the incidence of stroke is relatively low. To ensure that the sample imbalance did not bias model results, data balancing was performed, and the LASSO models were re-estimated on balanced data. The high degree of similarity between these sets of results suggests minimal bias in estimates.

Findings from this analysis showed some similarities as well as some variations in stroke-related risk factors between men and women. For example, kidney disease, diabetes, and post-traumatic stress disorder (PTSD) were predictors of stroke in both men and women. However, predictors specific to men included heart disease and anxiety, whereas predictors specific to women included chronic migraines and depression. Greater variation existed among non-clinical predictors: alcohol consumption, employment, and income were predictors among men, whereas education, healthcare access, self-reported health, number of live births, and marijuana use were predictors among women.

Clinical predictors

This study's findings that kidney disease (Krishna et al., 2009) and diabetes (Chen et al., 2016) are predictors of stroke in young adults regardless of sex align with previous literature. Additionally, PTSD has been consistently associated with stroke risk (Nanavati et al., 2023). However, these three factors were the only clinical characteristics identified as predictors of stroke for both men and women. Given that sex-related differences in stroke subtypes, etiology, and lateralization have been previously reported (Bonkhoff et al., 2021), and that hormonal, physiological, and lifestyle differences contribute to distinct stroke risk factors in men vs. women, we anticipated a greater number of, and more variation in, the clinical predictors for men and women. For instance, Appelros et al. (2009) demonstrated that women were more likely to experience strokes associated with migraines, autoimmune disorders, and hormonal factors, while men more commonly exhibited stroke risks related to lifestyle factors such as smoking and alcohol use. However, this study indicated that there were more predictors of stroke in women than in men, and that lifestyle factors were important determinants among both sexes.

Non-clinical predictors

The study did not identify any overlap of non-clinical predictors of stroke between men and women. The finding that a greater number of live births is a predictor of stroke aligns with previous work demonstrating that pregnancy increases the risk of stroke (Camargo and Singhal, 2021). The emergence of healthcare access and self-reported health as important non-clinical predictors of stroke among women may reflect disparities in the quality of stroke care. Prior studies have shown that women are less likely than men to receive the same level of evidence-based care for stroke, with some early stroke care disparities being attributed to differences in initial symptomology among women (Ospel et al., 2023). Additionally, women with stroke were less likely to receive diagnostic services and acute stroke intervention relative to their male counterparts (Roquer et al., 2003). While substance use was observed as a predictor in both males and females, the type of substance use differed. Alcohol consumption was a predictor among males, whereas marijuana consumption was a risk factor among females. This finding aligns with previous literature demonstrating that alcohol use and marijuana use (Jeffers et al., 2024) are associated with stroke risk and that men drink more alcohol than women (White, 2020). Lastly, alcohol use is associated with heart disease (Piano, 2017), suggesting an interrelationship between this non-clinical characteristic and the observed variation in clinical characteristics.

Other non-clinical predictors of stroke included employment and income. These findings may be partially explained by traditionally reported societal roles related to employment. Although not explored in stroke, sex differences in unemployment and mental health have been observed, with men experiencing a higher risk of mental illness with unemployment than women (Artazcoz et al., 2004). Consequently, the relationship between employment and income and stroke risk in males may occur through neurobiological pathways of stress (Booth et al., 2015). Recently, cultural gender norms have been acknowledged as important to sex and gender differences in health (Bates et al., 2022). Although we can infer the role of gender inequities such as healthcare access in our study, this study does not include additional variables such as perceptions of masculinity. Clear sex differences in stroke across all ages suggest future work should explore the influence of gender norms on sex differences.

Quality of model prediction

In machine learning-based studies of chronic disease such as stroke, it is critically important that the techniques and subsequent results be comparable across studies. Therefore, a standard set of metrics is often reported to facilitate these comparisons. In studies applying LASSO regression, the area under the ROC curve (AUC) provides a comprehensive evaluation of the model's prediction performance, with 0.5 indicating chance-level discrimination and values closer to 1 indicating stronger predictive accuracy and better overall model performance. Because the AUC summarizes overall model performance, it is a useful measure for comparing two different models. In this study, we performed 10 × 10-fold cross validation for both the balanced and unbalanced models and found the AUC to be relatively consistent, with minimal differences across models (0.82–0.84). The accuracy of the balanced models was slightly lower than that of the unbalanced models for both men and women (0.92 vs. 0.99). However, the predictive accuracy overall was high (0.92–0.99), indicating that these models performed as well as, if not better than, those in previous studies exploring stroke diagnosis (Ni et al., 2018). The identification and accuracy of these predictive models underscore the importance of predictive modeling in stroke research (Daidone et al., 2024).
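To make the distinction between accuracy and AUC concrete, the following Python sketch computes both on a hypothetical cohort with roughly the stroke prevalence reported here (about 1%). The cohort, the function names, and the "always negative" classifier are assumptions for illustration and do not describe the models in this study. The example shows why accuracy should be read alongside AUC when the outcome is rare.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the observed label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Share of true cases that are predicted as cases."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / sum(y_true)


def auc(y_true, scores):
    """Probability that a random case scores higher than a random non-case
    (ties count as 0.5); equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 1,000 people, 10 of whom report a stroke (about 1%).
y_true = [1] * 10 + [0] * 990

# A "model" that never predicts stroke and gives everyone the same risk score.
always_negative = [0] * 1000
constant_score = [0.0] * 1000

print(accuracy(y_true, always_negative))     # 0.99
print(sensitivity(y_true, always_negative))  # 0.0
print(auc(y_true, constant_score))           # 0.5
```

In this toy setting an uninformative classifier reaches an accuracy of 0.99 while identifying no cases, whereas its AUC stays at the chance level of 0.5, which is why AUC is the more informative basis for comparing models on imbalanced data.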

Limitations

Although this study provides valuable information on the predictors of stroke among young men and women, findings must be considered within the context of the following limitations. First, only 149 (59 males and 90 females) of the nearly 12,300 individuals in the ADD Health sample reported having been diagnosed with stroke. Small sample sizes have been a consistent concern among stroke studies utilizing machine learning (Lee et al., 2020), as has the generalizability of their findings (Zhi et al., 2024). Second, all information is self-reported and cannot be validated or verified as accurate. Studies have shown that certain health or behavioral conditions can suffer from underreporting, delayed reporting, and incomplete reporting. Further, survey data can also suffer from recency bias, response bias, recall bias, and favorability bias. Third, not all potential predictors of stroke were available in the ADD Health data. For example, the survey did not contain information on coagulation system disorders, antiphospholipid antibody syndrome, or sickle cell anemia. Fourth, the survey employed a complex design and sampling framework that could not be incorporated into the LASSO regression. Fifth, while the LASSO performs both variable selection and regularization to enhance the prediction accuracy and interpretability of the statistical model it produces, it has several limitations, including variable selection instability, difficulty handling multicollinearity, and limited variable selection in high-dimensional data. Sixth, Wave V included a smaller sample than prior waves, resulting in smaller samples of all population groups. Finally, the identified predictors are associations and should not be interpreted as causal factors. Further research is needed to establish causal pathways and underlying mechanisms.

In conclusion, the findings from this study provide valuable insights into the risk factors for stroke among young adults, highlighting the significance of both clinical and behavioral determinants. The application of the LASSO algorithm to a large, nationally representative dataset allowed for the identification of distinct risk profiles for men and women, underscoring the importance of tailored prevention strategies. The modest improvement in model performance with data balancing techniques like SMOTE suggests that addressing data imbalance is beneficial but not transformative. Importantly, this study emphasizes the rising incidence of stroke in younger populations and the need for surveillance and interventions to mitigate this trend. This evidence underscores the necessity of studying men and women separately to inform more effective, personalized prevention and treatment strategies. Future research should focus on refining these models, exploring causal pathways, and developing prevention programs that account for the diversity of identified risk factors.

Statements

Data availability statement

The data analyzed in this study are subject to the following licenses/restrictions. This study used ADD Health restricted-use data, which are available only by contractual agreement to certified researchers who commit themselves to maintaining limited access. To be eligible to enter into a contract, researchers must have an IRB-approval letter and a security plan for handling and storing sensitive data, and must sign a data-use contract agreeing to keep the data confidential. Requests to access these datasets should be directed to: https://data.cpc.unc.edu/projects/2/view.

Author contributions

MJ: Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing. NH: Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing. EE: Writing – original draft, Writing – review & editing. CE: Conceptualization, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1

    Aigner, A., Grittner, U., Rolfs, A., Norrving, B., Siegerink, B., Busch, M. A., et al. (2017). Contribution of established stroke risk factors to the burden of stroke in young adults. Stroke 48, 1744–1751. doi: 10.1161/STROKEAHA.117.016599

  • 2

    Amoah, D., Schmidt, M., Mather, C., Prior, S., Herath, M. P., Bird, M. L., et al. (2024). An international perspective on young stroke incidence and risk factors: a scoping review. BMC Public Health 24:1627. doi: 10.1186/s12889-024-19134-0

  • 3

    Appelros, P., Stegmayr, B., and Terént, A. (2009). Sex differences in stroke epidemiology. Stroke 40, 1082–1090. doi: 10.1161/STROKEAHA.108.540781

  • 4

    Arntz, R., Rutten-Jacobs, L., Maaijwee, N., Schoonderwaldt, H., Dorresteijn, L., van Dijk, E., et al. (2013). Post-stroke epilepsy in young adults: a long-term follow-up study. PLoS ONE 8:e55498. doi: 10.1371/journal.pone.0055498

  • 5

    Artazcoz, L., Benach, J., Borrell, C., and Cortès, I. (2004). Unemployment and mental health: understanding the interactions among gender, family roles, and social class. Am. J. Public Health 94, 82–88. doi: 10.2105/AJPH.94.1.82

  • 6

    Banks, P., and Pearson, C. (2004). Parallel lives: younger stroke survivors and their partners coping with crisis. Sex Relatsh. Ther. 19, 413–429. doi: 10.1080/14681990412331298009

  • 7

    Barrio, I., Arostegui, I., Rodríguez-Álvarez, M. X., and Quintana, J. M. (2017). A new approach to categorising continuous variables in prediction models: proposal and validation. Stat. Methods Med. Res. 26, 2586–2602. doi: 10.1177/0962280215601873

  • 8

    Bates, N., Chin, M., and Becker, T. (Eds.) (2022). Measuring Sex, Gender Identity, and Sexual Orientation. Washington, DC: National Academies Press. doi: 10.17226/26424

  • 9

    Béjot, Y., Bailly, H., Durier, J., and Giroud, M. (2016). Epidemiology of stroke in Europe and trends for the 21st century. Presse Med. 45, e391–e398. doi: 10.1016/j.lpm.2016.10.003

  • 10

    Bennette, C., and Vickers, A. (2012). Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents. BMC Med. Res. Methodol. 12:21. doi: 10.1186/1471-2288-12-21

  • 11

    Bonkhoff, A. K., Schirmer, M. D., Bretzner, M., Hong, S., Regenhardt, R. W., Brudfors, M., et al. (2021). Outcome after acute ischemic stroke is linked to sex-specific lesion patterns. Nat. Commun. 12:3289. doi: 10.1038/s41467-021-23492-3

  • 12

    Booth, J., Connelly, L., Lawrence, M., Chalmers, C., Joice, S., Becker, C., et al. (2015). Evidence of perceived psychosocial stress as a risk factor for stroke in adults: a meta-analysis. BMC Neurol. 15:233. doi: 10.1186/s12883-015-0456-4

  • 13

    Bukhari, S., Yaghi, S., and Bashir, Z. (2023). Stroke in young adults. J. Clin. Med. 12:4999. doi: 10.3390/jcm12154999

  • 14

    Bushnell, C. D., Reeves, M. J., Zhao, X., Pan, W., Prvu-Bettger, J., Zimmer, L., et al. (2014). Sex differences in quality of life after ischemic stroke. Neurology 82, 922–931. doi: 10.1212/WNL.0000000000000208

  • 15

    Camargo, E. C., and Singhal, A. B. (2021). Stroke in pregnancy. Obstet. Gynecol. Clin. North Am. 48, 75–96. doi: 10.1016/j.ogc.2020.11.004

  • 16

    Chandrabhatla, A. S., Kuo, E. A., Sokolowski, J. D., Kellogg, R. T., Park, M., Mastorakos, P., et al. (2023). Artificial intelligence and machine learning in the diagnosis and management of stroke: a narrative review of United States Food and Drug Administration-Approved Technologies. J. Clin. Med. 12:3755. doi: 10.3390/jcm12113755

  • 17

    Chen, R., Ovbiagele, B., and Feng, W. (2016). Diabetes and stroke: epidemiology, pathophysiology, pharmaceuticals and outcomes. Am. J. Med. Sci. 351, 380–386. doi: 10.1016/j.amjms.2016.01.011

  • 18

    Cramer, R. J., and Kapusta, N. D. (2017). A social-ecological framework of theory, assessment, and prevention of suicide. Front. Psychol. 8:1756. doi: 10.3389/fpsyg.2017.01756

  • 19

    Daidone, M., Ferrantelli, S., and Tuttolomondo, A. (2024). Machine learning applications in stroke medicine: advancements, challenges, and future prospectives. Neural. Regen. Res. 19, 769–773. doi: 10.4103/1673-5374.382228

  • 20

    Dev, S., Wang, H., Nwosu, C. S., Jain, N., Veeravalli, B., John, D. A., et al. (2022). predictive analytics approach for stroke prediction using machine learning and neural networks. Healthc. Anal. 2:100032. doi: 10.1016/j.health.2022.100032

  • 21

    Dong, J. Y., Zhang, Y. H., Tong, J., and Qin, L. Q. (2012). Depression and risk of stroke. Stroke 43, 32–37. doi: 10.1161/STROKEAHA.111.630871

  • 22

    Ebrahimi, R., Lynch, K. E., Beckham, J. C., Dennis, P. A., Viernes, B., Tseng, C. H., et al. (2021). Association of posttraumatic stress disorder and incident ischemic heart disease in women veterans. JAMA Cardiol. 6, 642–651. doi: 10.1001/jamacardio.2021.0227

  • 23

    Ekker, M. S., Verhoeven, J. I., Vaartjes, I., van Nieuwenhuizen, K. M., Klijn, C. J. M., and de Leeuw, F. E. (2019). Stroke incidence in young adults according to age, subtype, sex, and time trends. Neurology 92, e2444–e2454. doi: 10.1212/WNL.0000000000007533

  • 24

    Elwert, F. (2013). “Graphical causal models,” in Handbook of Causal Analysis for Social Research, ed. S. L. Morgan (Cham: Springer), 245–273. doi: 10.1007/978-94-007-6094-3_13

  • 25

    Fan, C., Chen, M., Wang, X., Wang, J., and Huang, B. (2021). A review on data preprocessing techniques toward efficient and reliable knowledge discovery from building operational data. Front. Energy Res. 9:652801. doi: 10.3389/fenrg.2021.652801

  • 26

    Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33. doi: 10.18637/jss.v033.i01

  • 27

    Harris, K., Halpern, C., Biemer, P., Liao, D., and Dean, S. (2019). Add Health Wave V Documentation: Sampling and Mixed-Mode Survey Design, 2019. Available at: http://www.cpc.unc.edu/projects/addhealth/documentation/guides/ (accessed November 12, 2023).

  • 28

    Hassan, A., Gulzar Ahmad, S., Ullah Munir, E., Ali Khan, I., and Ramzan, N. (2024). Predictive modelling and identification of key risk factors for stroke using machine learning. Sci. Rep. 14:11498. doi: 10.1038/s41598-024-61665-4

  • 29

    Hernan, M. A. (2002). Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am. J. Epidemiol. 155, 176–184. doi: 10.1093/aje/155.2.176

  • 30

    Imoisili, O. E., Chung, A., Tong, X., Hayes, D. K., and Loustalot, F. (2024). Prevalence of stroke - behavioral risk factor surveillance system, United States, 2011–2022. MMWR Morb. Mortal. Wkly. Rep. 73, 449–455. doi: 10.15585/mmwr.mm7320a1

  • 31

    Irvin, J. A., Kondrich, A. A., Ko, M., Rajpurkar, P., Haghgoo, B., Landon, B. E., et al. (2020). Incorporating machine learning and social determinants of health indicators into prospective risk adjustment for health plan payments. BMC Public Health 20:608. doi: 10.1186/s12889-020-08735-0

  • 32

    Jeffers, A. M., Glantz, S., Byers, A. L., and Keyhani, S. (2024). Association of cannabis use with cardiovascular outcomes among US adults. J. Am. Heart Assoc. 13. doi: 10.1161/JAHA.123.030178

  • 33

    Jia, J. (2015). Factors related to long-term post-stroke cognitive impairment in young adult ischemic stroke. Med. Sci. Monit. 21, 654–660. doi: 10.12659/MSM.892554

  • 34

    Kino, S., Hsu, Y. T., Shiba, K., Chien, Y. S., Mita, C., Kawachi, I., et al. (2021). A scoping review on the use of machine learning in research on social determinants of health: trends and research prospects. SSM Popul. Health 15:100836. doi: 10.1016/j.ssmph.2021.100836

  • 35

    Kissela, B. M., Khoury, J. C., Alwell, K., Moomaw, C. J., Woo, D., Adeoye, O., et al. (2012). Age at stroke. Neurology 79, 1781–1787. doi: 10.1212/WNL.0b013e318270401d

  • 36

    Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Prog. Artif. Intell. 5, 221–232. doi: 10.1007/s13748-016-0094-0

  • 37

    Krishna, P., Naresh, S., Krishna, G. S., Lakshmi, A., Vengamma, B., and Kumar, V. (2009). Stroke in chronic kidney disease. Indian J. Nephrol. 19, 5–7. doi: 10.4103/0971-4065.50672

  • 38

    Lasek-Bal, A., Kopyta, I., Warsz-Wianecka, A., Puz, P., Łabuz-Roszak, B., Zareba, K., et al. (2018). Risk factor profile in patients with stroke at a young age. Neurol. Res. 40, 595–601. doi: 10.1080/01616412.2018.1455367

  • 39

    Lee, H., Lee, E. J., Ham, S., Lee, H. B., Lee, J. S., Kwon, S. U., et al. (2020). Machine learning approach to identify stroke within 4.5 hours. Stroke 51, 860–866. doi: 10.1161/STROKEAHA.119.027611

  • 40

    Lee, Y., Tsai, C., Yen, Y., Huang, L. K., Chao, S. P., Hu, L. Y., et al. (2022). Periodontitis is a potential risk factor for transient ischemic attack and minor ischemic stroke in young adults: a nationwide population-based cohort study. J. Periodontol. 93, 1848–1856. doi: 10.1002/JPER.21-0528

  • 41

    Leppert, M. H., Burke, J. F., Lisabeth, L. D., Madsen, T. E., Kleindorfer, D. O., Sillau, S., et al. (2022). Systematic review of sex differences in ischemic strokes among young adults: are young women disproportionately at risk? Stroke 53, 319–327. doi: 10.1161/STROKEAHA.121.037117

  • 42

    Maaijwee, N. A. M. M., Arntz, R. M., Rutten-Jacobs, L. C. A., Schaapsmeerders, P., Schoonderwaldt, H. C., van Dijk, E. J., et al. (2015). Post-stroke fatigue and its association with poor functional outcome after stroke in young adults. J. Neurol. Neurosurg. Psychiatry 86, 1120–1126. doi: 10.1136/jnnp-2014-308784

  • 43

    Mainali, S., Darsie, M. E., and Smetana, K. S. (2021). Machine learning in action: stroke diagnosis and outcome prediction. Front. Neurol. 12:734345. doi: 10.3389/fneur.2021.734345

  • 44

    Maldonado, S., López, J., and Vairetti, C. (2019). An alternative SMOTE oversampling strategy for high-dimensional datasets. Appl. Soft Comput. 76, 380–389. doi: 10.1016/j.asoc.2018.12.024

  • 45

    Nanavati, H. D., Arevalo, A., Memon, A. A., and Lin, C. (2023). Associations between posttraumatic stress and stroke: a systematic review and meta-analysis. J. Trauma Stress 36, 259–271. doi: 10.1002/jts.22925

  • 46

    Ni, Y., Alwell, K., Moomaw, C. J., Woo, D., Adeoye, O., Flaherty, M. L., et al. (2018). Towards phenotyping stroke: leveraging data from a large-scale epidemiological study to detect stroke diagnosis. PLoS ONE 13:e0192586. doi: 10.1371/journal.pone.0192586

  • 47

    Ortega Hinojosa, A. M., Davies, M. M., Jarjour, S., Burnett, R. T., Mann, J. K., Hughes, E., et al. (2014). Developing small-area predictions for smoking and obesity prevalence in the United States for use in Environmental Public Health Tracking. Environ. Res. 134, 435–452. doi: 10.1016/j.envres.2014.07.029

  • 48

    Ospel, J., Singh, N., Ganesh, A., and Goyal, M. (2023). Sex and gender differences in stroke and their practical implications in acute care. J. Stroke 25, 16–25. doi: 10.5853/jos.2022.04077

  • 49

    Pearl, J. (2009). Causal inference in statistics: an overview. Stat. Surv. 3. doi: 10.1214/09-SS057

  • 50

    Piano, M. R. (2017). Alcohol's effects on the cardiovascular system. Alcohol Res. 38, 219–241. Available at: http://www.ncbi.nlm.nih.gov/pubmed/28988575

  • 51

    Pollock, A., St George, B., Fenton, M., and Firkins, L. (2014). Top 10 research priorities relating to life after stroke – consensus from stroke survivors, caregivers, and health professionals. Int. J. Stroke 9, 313–320. doi: 10.1111/j.1747-4949.2012.00942.x

  • 52

    R Core Team (2021). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

  • 53

    Reeves, M. J., Bushnell, C. D., Howard, G., Gargano, J. W., Duncan, P. W., Lynch, G., et al. (2008). Sex differences in stroke: epidemiology, clinical presentation, medical care, and outcomes. Lancet Neurol. 7, 915–926. doi: 10.1016/S1474-4422(08)70193-5

  • 54

    Richmond, H. L., Tome, J., Rochani, H., Fung, I. C. H., Shah, G. H., Schwind, J. S., et al. (2020). The use of penalized regression analysis to identify county-level demographic and socioeconomic variables predictive of increased COVID-19 cumulative case rates in the State of Georgia. Int. J. Environ. Res. Public Health 17:8036. doi: 10.3390/ijerph17218036

  • 55

    Roquer, J., Campello, A. R., and Gomis, M. (2003). Sex differences in first-ever acute stroke. Stroke 34, 1581–1585. doi: 10.1161/01.STR.0000078562.82918.F6

  • 56

    Ryder, A. L., and Cohen, B. E. (2021). Evidence for depression and anxiety as risk factors for heart disease and stroke: implications for primary care. Fam. Pract. 38, 365–367. doi: 10.1093/fampra/cmab031

  • 57

    Schaapsmeerders, P., Maaijwee, N. A. M., van Dijk, E. J., Rutten-Jacobs, L. C., Arntz, R. M., Schoonderwaldt, H. C., et al. (2013). Long-term cognitive impairment after first-ever ischemic stroke in young adults. Stroke 44, 1621–1628. doi: 10.1161/STROKEAHA.111.000792

  • 58

    Sheth, S. A., Giancardo, L., Colasurdo, M., Srinivasan, V. M., Niktabe, A., Kan, P., et al. (2023). Machine learning and acute stroke imaging. J. Neurointerv. Surg. 15, 195–199. doi: 10.1136/neurintsurg-2021-018142

  • 59

    Simeonov, K. P., and Himmelstein, D. S. (2015). Lung cancer incidence decreases with elevation: evidence for oxygen as an inhaled carcinogen. PeerJ 3:e705. doi: 10.7717/peerj.705

  • 60

    Singhal, A. B., Biller, J., Elkind, M. S., Fullerton, H. J., Jauch, E. C., Kittner, S. J., et al. (2013). Recognition and management of stroke in young adults and adolescents. Neurology 81, 1089–1097. doi: 10.1212/WNL.0b013e3182a4a451

  • 61

    Soun, J. E., Chow, D. S., Nagamine, M., Takhtawala, R. S., Filippi, C. G., Yu, W., et al. (2021). Artificial intelligence and acute stroke imaging. Am. J. Neuroradiol. 42, 2–11. doi: 10.3174/ajnr.A6883

  • 62

    Sultan, S., and Elkind, M. S. V. (2013). The growing problem of stroke among young adults. Curr. Cardiol. Rep. 15:421. doi: 10.1007/s11886-013-0421-z

  • 63

    Sumner, J. A., Cleveland, S., Chen, T., and Gradus, J. L. (2023). Psychological and biological mechanisms linking trauma with cardiovascular disease risk. Transl. Psychiatry 13:25. doi: 10.1038/s41398-023-02330-8

  • 64

    Tay, J. K., Narasimhan, B., and Hastie, T. (2023). Elastic net regularization paths for all generalized linear models. J. Stat. Softw. 106. doi: 10.18637/jss.v106.i01

  • 65

    Teasell, R. W., McRae, M. P., and Finestone, H. M. (2000). Social issues in the rehabilitation of younger stroke patients. Arch. Phys. Med. Rehabil. 81, 205–209. doi: 10.1016/S0003-9993(00)90142-4

  • 66

    Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x

  • 67

    Torgo, L. (2014). An infra-structure for performance estimation and experimental comparison of predictive models in R. arXiv [Preprint]. arXiv:1412.0436. doi: 10.48550/arXiv.1412.0436

  • 68

    Treger, I., Shames, J., Giaquinto, S., and Ring, H. (2007). Return to work in stroke patients. Disabil. Rehabil. 29, 1397–1403. doi: 10.1080/09638280701314923

  • 69

    Varona, J. F., Bermejo, F., Guerra, J. M., and Molina, J. A. (2004). Long-term prognosis of ischemic stroke in young adults. J. Neurol. 251, 1507–1514. doi: 10.1007/s00415-004-0583-0

  • 70

    Vestling, M., Tufvesson, B., and Iwarsson, S. (2003). Indicators for return to work after stroke and the importance of work for subjective well-being and life satisfaction. J. Rehabil. Med. 35, 127–131. doi: 10.1080/16501970310010475

  • 71

    White, A. (2020). Gender differences in the epidemiology of alcohol use and related harms in the United States. Alcohol Res. Curr. Rev. 40. doi: 10.35946/arcr.v40.2.01

  • 72

    Wiemken, T. L., and Kelley, R. R. (2020). Machine learning in epidemiology and health outcomes research. Annu. Rev. Public Health 41, 21–36. doi: 10.1146/annurev-publhealth-040119-094437

  • 73

    Wilson, C. M., and Biller, J. (2004). Ischemic stroke. Adv. Neurol. 92, 1147.

  • 74

    Yahya, T., Jilani, M. H., Khan, S. U., Mszar, R., Hassan, S. Z., Blaha, M. J., et al. (2020). Stroke in young adults: current trends, opportunities for prevention and pathways forward. Am. J. Prev. Cardiol. 3:100085. doi: 10.1016/j.ajpc.2020.100085

  • 75

    Zhi, S., Hu, X., Ding, Y., Chen, H., Li, X., Tao, Y., et al. (2024). An exploration on the machine-learning-based stroke prediction model. Front. Neurol. 15:1372431. doi: 10.3389/fneur.2024.1372431

Summary

Keywords

young stroke, machine learning (ML), sex, risk, behavior

Citation

Jacobs M, Hammarlund N, Evans E and Ellis Jr. C (2024) Identifying predictors of stroke in young adults: a machine learning analysis of sex-specific risk factors. Front. Stroke 3:1488313. doi: 10.3389/fstro.2024.1488313

Received

29 August 2024

Accepted

29 October 2024

Published

18 November 2024

Volume

3 - 2024

Edited by

Amre Nouh, Cleveland Clinic, United States

Reviewed by

Nandavar Shobha, Manipal Hospitals, India

Khuzeima Khanbhai, Jakaya Kikwete Cardiac Institute (JKCI), Tanzania

Copyright

*Correspondence: Molly Jacobs
