Sleep Duration/Quality With Health Outcomes: An Umbrella Review of Meta-Analyses of Prospective Studies

Background To quantitatively evaluate the evidence on the associations of sleep duration and sleep quality with multiple health outcomes. Methods This review is registered with PROSPERO, number CRD42021235587. We systematically searched three databases from inception until November 15, 2020. For each meta-analysis, we estimated the summary effect size using fixed and random effects models, the 95% confidence interval, and the 95% prediction interval; heterogeneity, evidence of small-study effects, and excess significance bias were also assessed. Based on these metrics, we evaluated the credibility of each association. Results A total of 85 meta-analyses with 36 health outcomes were included in the study. We observed highly suggestive evidence for an association between long sleep and an increased risk of all-cause mortality. Moreover, suggestive evidence supported the associations between long sleep and an increased risk of 5 health outcomes (stroke, dyslipidaemia, mortality of coronary heart disease, stroke mortality, and the development or death of stroke); short sleep and increased risk of overweight and/or obesity; and poor sleep quality and increased risk of diabetes mellitus and gestational diabetes mellitus. Conclusions Only the evidence of the association of long sleep with an increased risk of all-cause mortality was graded as highly suggestive. Additional studies are needed. Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/, identifier: CRD42021235587


INTRODUCTION
Sleep is an important and complex physiological process for maintaining optimal health. The National Sleep Foundation recommends 7-9 h of sleep for people aged 26-64 years and 7-8 h of sleep for people aged ≥65 years (1). However, because of irregular working, shift-work patterns, and unhealthy sleeping habits, the quantity and quality of sleep may be abnormal in modern society. Over the last few decades, there has been growing evidence to suggest that self-reported short or long sleep duration (often defined as <6 or 7 and >8 or 9 h, respectively) and poor sleep quality [Pittsburgh Sleep Quality Index (PSQI) > 5] may be consistently associated with adverse health outcomes [e.g., short sleep and increased risk of hypertension (2), long sleep and increased risk of chronic kidney disease (3), poor sleep quality and increased risk of preterm birth (4), etc.].
Although previous studies have examined this topic using various methodologies, a quantitative appraisal of epidemiological credibility is lacking, as are examinations of the potential bias between the quantity and quality of sleep and health-related outcomes and assessments of the most influential outcomes. Therefore, in the present study, we conducted an umbrella review of the evidence between the quantity and quality of sleep and the multiple health outcomes in systematic reviews and meta-analyses, assessed the diverse bias, and quantitatively evaluated the strength and credibility of the evidence.

METHODS
We strictly followed standardized guidelines to perform an umbrella review, which is the systematic collection and evaluation of multiple systematic reviews and meta-analyses conducted on a specific research topic (5). The umbrella review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (Supplementary Table S1) (6) and the guidance of the Meta-analysis of Observational Studies in Epidemiology statement (Supplementary Table S2) (7). The study protocol was registered in the PROSPERO database for systematic reviews and meta-analyses (registration number: CRD42021235587).

Search Strategy
The electronic databases, PubMed, EMBASE, and the Web of Science were searched systematically from inception until November 15, 2020, to identify related systematic reviews and meta-analyses of observational studies. A predefined search strategy was used, which is presented in Supplementary Table S3. In addition, we performed a manual check of reference lists from the retrieved articles for further potentially relevant articles.

Eligibility Criteria and Appraisal of Included Studies
Two authors (CG and X-YL) independently scrutinized articles based on titles and abstracts. If needed, full articles were retrieved for a final decision. Disagreements between the two reviewers were resolved by discussion and consensus with a senior advisor (T-TG). Articles were included according to the following criteria: (A) systematic reviews and meta-analyses of prospective studies on the associations between duration and quality of sleep and any health-related outcome, (B) studies that relied on data from human studies with any type of health-related outcome measure, and (C) studies that reported effect sizes such as odds ratios (ORs), relative risks (RRs), or hazard ratios (HRs) at follow-up. We also extracted further information of interest from each study, such as subgroup and dose-response analyses. If a systematic review or meta-analysis performed a subgroup analysis stratified by the study design, then the results for prospective studies were included (8)(9)(10).
We excluded studies according to the following criteria: (A) meta-analyses of case-control or cross-sectional studies, (B) studies in which sleep duration or quality was not the exposure of interest (for example, studies of sleep-disordered breathing, restless leg syndrome, or napping), (C) meta-analyses or systematic reviews that did not present study-specific data [effect sizes, 95% confidence intervals (CIs), and numbers of cases/population], (D) systematic reviews without a quantitative synthesis, or (E) other types of papers (e.g., review, abstract, non-English, or editorial). For the main analysis, whenever more than one eligible meta-analysis addressed the same association, we retained the one with the largest number of primary studies (8)(9)(10).

Data Extraction
Two investigators (X-YL and F-HL) independently extracted the related data from the included studies using a custom-made data extraction form. In the case of discrepancies, the data were subsequently verified by a third author (CG). The data-collection form included the first author, year of publication, journal of publication, exposure, outcome examined, number of included studies, case number, and study population. For each of the included studies in each eligible meta-analysis, we extracted the first author, year of publication, epidemiological design, number of cases and total population, and the maximally adjusted relative risk (ORs, RRs or HRs) along with the corresponding 95% CI.

Data Analysis
For each exposure and outcome pair, we evaluated the summary effect size and the 95% CI through both fixed and random effects models (11,12). The heterogeneity between studies was assessed with the I² metric of inconsistency and its 95% CI (13). I² ranges between 0 and 100% and quantifies the proportion of variability in effect estimates that is due to heterogeneity rather than sampling error. Values exceeding 50% were indicative of high heterogeneity, whereas values >75% implied very high heterogeneity (14). We also calculated the 95% prediction interval (PI), which further accounts for heterogeneity between studies and estimates the range in which the effect of a future study examining the same association is expected to lie (15).
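The pooling steps described above can be sketched in Python. This is a minimal illustration and not the authors' STATA code: it assumes inverse-variance weighting on the log scale and the DerSimonian-Laird estimator for the between-study variance (the text does not name the estimator). The function name `pool_effects` and the caller-supplied critical value `t_crit` are hypothetical.

```python
import math
from statistics import NormalDist


def pool_effects(log_effects, ses, t_crit):
    """Pool study-level log effect sizes (e.g., log RRs).

    log_effects: per-study effect sizes on the log scale
    ses:         per-study standard errors on the log scale
    t_crit:      0.975 quantile of Student's t with k - 2 degrees of freedom,
                 supplied by the caller (assumption: looked up from a table)
    """
    k = len(log_effects)
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 studies")

    # Fixed-effect model: inverse-variance weights.
    w = [1.0 / s ** 2 for s in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum(w)
    se_fixed = math.sqrt(1.0 / sum(w))

    # Cochran's Q and the I^2 inconsistency metric (in percent).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects model: weights also include tau^2.
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    random = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)
    se_random = math.sqrt(1.0 / sum(w_re))

    z = NormalDist().inv_cdf(0.975)
    # 95% prediction interval: adds tau^2 to the variance of the summary.
    half_pi = t_crit * math.sqrt(tau2 + se_random ** 2)
    return {
        "fixed": fixed,
        "ci_fixed": (fixed - z * se_fixed, fixed + z * se_fixed),
        "random": random,
        "ci_random": (random - z * se_random, random + z * se_random),
        "i2": i2,
        "tau2": tau2,
        "pi": (random - half_pi, random + half_pi),
    }
```

All returned values are on the log scale; applying `math.exp` to the estimates and interval limits returns them to the RR, OR, or HR scale. When the prediction interval contains the null value (0 on the log scale), a future study could plausibly show no association even if the summary estimate is significant.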
We used Egger's regression asymmetry test to identify small-study effects (16), that is, to evaluate whether smaller studies tend to give higher risk estimates than larger studies, which can indicate publication bias, other reporting biases, or other reasons for differences between small and large studies (17). We calculated the standard error of the effect size for the largest data set of each meta-analysis to determine whether small studies predicted larger effect sizes than large studies (10). Small-study effects were considered present when the P value for Egger's test was smaller than 0.10 and the largest study had a smaller effect size than the summary effect size (17).
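As an illustration of the test just described, the sketch below regresses each study's standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error) and tests whether the intercept differs from zero. It is a hedged example rather than the authors' implementation: the function names are hypothetical, the P value is obtained by numerically integrating the Student's t density, and comparing absolute log effects in `small_study_effects` is an assumption about how "smaller effect size" is operationalized.

```python
import math


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided P value for Student's t, by Simpson integration of its density."""
    t_abs = abs(t_stat)
    if t_abs == 0.0:
        return 1.0
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t_abs / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t_abs)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return min(1.0, max(0.0, 1.0 - 2.0 * area))


def egger_test(log_effects, ses):
    """Return (intercept, standard error of intercept, two-sided P value)."""
    k = len(log_effects)
    if k < 3:
        raise ValueError("Egger's test needs at least 3 studies")
    x = [1.0 / s for s in ses]                        # precision
    y = [e / s for e, s in zip(log_effects, ses)]     # standardized effect
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Residual variance with k - 2 degrees of freedom.
    s2 = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (k - 2)
    se_int = math.sqrt(s2 * (1.0 / k + mx ** 2 / sxx))
    if se_int == 0.0:
        raise ValueError("perfect fit: the intercept cannot be tested")
    return intercept, se_int, t_two_sided_p(intercept / se_int, k - 2)


def small_study_effects(p_egger, largest_study_effect, summary_effect):
    """Criterion from the text: P < 0.10 and a more conservative largest study.

    Assumption: 'smaller effect size' is compared as absolute log effects.
    """
    return p_egger < 0.10 and abs(largest_study_effect) < abs(summary_effect)
```

A positive intercept with a small P value suggests that less precise studies report larger effects than more precise ones. The Simpson integration is adequate for the small degrees of freedom typical of a single meta-analysis; a statistical library would normally be used instead.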
We applied the excess significance test to evaluate whether the observed number of studies (O) with statistically significant results among those included in a meta-analysis was larger than the expected number of studies (E) with statistically significant results (18). E was calculated as the sum of the statistical power estimates for each component study. The statistical power of each study was calculated with an algorithm using a noncentral t distribution (19). The excess significance test for single meta-analyses was considered positive at P < 0.10, given that O > E, as previously proposed (10). When standardized mean differences were reported, we planned to transform these estimates into ORs (20). The statistical analysis and the power calculations were conducted in STATA version 15.0, and all P values were two-tailed.
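The comparison of O and E can be sketched as follows. This is an assumption-laden illustration, not the authors' procedure: study power is approximated with a normal distribution rather than the noncentral t distribution used in the paper, the P value comes from a 1-degree-of-freedom chi-square statistic (one common form of the excess significance test), and the function names are hypothetical. The plausible effect passed in would typically be the estimate from the largest study.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def study_power(plausible_effect, se, alpha=0.05):
    """Power of a two-sided z-test to detect plausible_effect (log scale).

    Assumption: normal approximation instead of a noncentral t distribution.
    """
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(plausible_effect) / se
    return _NORMAL.cdf(shift - z) + _NORMAL.cdf(-shift - z)


def excess_significance(log_effects, ses, plausible_effect, alpha=0.05):
    """Return (O, E, P value, positive) for one meta-analysis."""
    n = len(log_effects)
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    # O: studies whose own result is nominally significant.
    observed = sum(1 for e, s in zip(log_effects, ses) if abs(e / s) > z)
    # E: sum of each study's power to detect the plausible effect.
    expected = sum(study_power(plausible_effect, s, alpha) for s in ses)
    if expected >= n:
        p_value = 1.0  # every study is fully powered, so no excess is possible
    else:
        diff2 = (observed - expected) ** 2
        chi2 = diff2 / expected + diff2 / (n - expected)
        # Survival function of a chi-square with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(chi2 / 2))
    # Threshold from the text: positive at P < 0.10, given that O > E.
    positive = observed > expected and p_value < 0.10
    return observed, expected, p_value, positive
```

With a plausible effect of zero, `study_power` returns the significance level itself, which is a quick sanity check on the power formula.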

Methodological Quality Appraisal
To assess the methodological quality of the included systematic reviews and meta-analyses, two investigators (CG and X-YL) independently rated each article with the Assessment of Multiple Systematic Reviews (AMSTAR-1) tool. The AMSTAR-1 tool involves dichotomous scoring (0 or 1) of 11 items that assess the methodological rigor of the included articles, such as a comprehensive search strategy or publication bias assessment. Total scores range from 0 to 11, with higher scores implying greater quality. AMSTAR-1 scores are graded as high (8-11), medium (4-7), and low quality (0-3) (21).
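The scoring rule can be written out explicitly. The short sketch below sums 11 dichotomous item scores and applies the bands stated above (8 to 11 high, 4 to 7 medium, 0 to 3 low); the function name `amstar_grade` is hypothetical.

```python
def amstar_grade(item_scores):
    """Return (total score, quality band) for 11 dichotomous AMSTAR-1 items."""
    if len(item_scores) != 11 or any(s not in (0, 1) for s in item_scores):
        raise ValueError("expected 11 item scores, each 0 or 1")
    total = sum(item_scores)
    if total >= 8:
        band = "high"
    elif total >= 4:
        band = "medium"
    else:
        band = "low"
    return total, band
```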

Grading the Evidence
Statistically significant meta-analyses (P < 0.05) were rated into four levels (convincing, highly suggestive, suggestive, and weak) using specific criteria. For convincing evidence: P < 10⁻⁶, number of cases >1,000, I² < 50%, P < 0.05 for the largest component study in the meta-analysis, 95% PI excluding the null value, absence of small-study effects (P > 0.1 for Egger's test), and no excess significance bias (P > 0.1). For highly suggestive evidence: P < 10⁻⁶, number of cases >1,000, and P < 0.05 for the largest study. For suggestive evidence: P < 10⁻³ and number of cases >1,000. For weak evidence, the sole criterion was P < 0.05 (22). When P > 0.05, the association was considered not confirmed (10). All analyses were conducted in STATA, version 15.0.
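The grading criteria amount to a cascade of checks, sketched below. The class and field names are hypothetical, the summary P value is assumed to be the random-effects one, and treating P exactly equal to 0.05 as not confirmed is an assumption because the text leaves that boundary unspecified.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysisSummary:
    p_summary: float            # P value of the summary estimate (assumed random effects)
    n_cases: int                # total number of cases
    i2: float                   # I^2 in percent
    p_largest: float            # P value of the largest component study
    pi_excludes_null: bool      # 95% prediction interval excludes the null value
    small_study_effects: bool   # flagged by Egger's test
    excess_significance: bool   # flagged by the excess significance test


def grade_evidence(m):
    """Map one meta-analysis to an evidence class using the stated criteria."""
    if m.p_summary >= 0.05:  # assumption: P == 0.05 counts as not confirmed
        return "not confirmed"
    highly = m.p_summary < 1e-6 and m.n_cases > 1000 and m.p_largest < 0.05
    if (highly and m.i2 < 50 and m.pi_excludes_null
            and not m.small_study_effects and not m.excess_significance):
        return "convincing"
    if highly:
        return "highly suggestive"
    if m.p_summary < 1e-3 and m.n_cases > 1000:
        return "suggestive"
    return "weak"
```

Because each level adds requirements to the one below it, an association that misses any single criterion for "convincing" can still be graded "highly suggestive" if the first three criteria hold.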

RESULTS

Study Selection
As reported in Figure 1, 15,669 records were retrieved from the three electronic databases, and 7,958 records remained after duplicates were removed. A total of 7,728 records were excluded after title and abstract screening, and 201 were excluded after full-text assessment (Figure 1). Ultimately, 36 articles were included in our umbrella review for analysis.

Characteristics of Included Meta-Analyses
The characteristics of these 36 articles are summarized in Table 1. All articles were published between 2009 and 2020. These included studies covered 85 meta-analyses, which reported associations between duration and quality of sleep and 36 different outcomes. The median number of original studies in each meta-analysis was 7 (range from 3 to 27), while the median number of cases was 4,848 (range from 156 to 219,518), and the median number of the total participants was 113,226 (range from 1,230 to 2,311,390). The case number exceeded 1,000 in 81 meta-analyses.

Summary Effect Size
Of the 78 meta-analyses from 32 articles regarding sleep duration, the summary random effects estimates were significant at P < 1 × 10⁻⁶ in 10 (13%) meta-analyses, and the summary fixed effects estimates were significant in 28 (36%) meta-analyses (Supplementary Table S4). Thirty-nine (50%) meta-analyses reported that the largest study effect was nominally statistically significant, with a P < 0.05, and a more conservative effect than the summary random effects was observed in 57 (73%) meta-analyses. The studies with the smallest SE for each association suggested that 30 of 78 were significant at P < 0.05.
Out of the 7 meta-analyses from 5 articles regarding sleep quality, the summary fixed-effects and random-effects estimates were significant at P < 0.05. However, when we used P < 1 × 10⁻⁶ as a threshold for significance, the summary random effects estimates were not significant in any meta-analyses, and 3 (43%) meta-analyses produced significant summary results using the fixed-effects methods (Supplementary Table S4). The studies with the smallest standard error for each association suggested that 5 of 7 were significant at P < 0.05, and a more conservative effect than the summary random effects was observed in 5 (71%) meta-analyses. The studies with the smallest SE for each association suggested that two of 7 were significant at P < 0.05.

Small-Study Effects and Excess Significance Bias
According to Egger's test, evidence of small-study effects was observed in 21 (27%) of 78 meta-analyses and 2 (29%) of 7 meta-analyses about duration and quality of sleep, respectively (Table 2). When the largest study estimate was taken as the plausible effect size, 22 (28%) of 78 meta-analyses and 3 (43%) of 7 meta-analyses about duration and quality of sleep, respectively, showed evidence of excess significance (Table 2).

Methodological Quality of the Meta-Analyses
The methodological quality of the included studies regarding sleep duration (n = 33) and sleep quality (n = 5) was assessed by AMSTAR-1, which contains 11 items for scoring. Figure 2 provides a breakdown of AMSTAR-1 levels for each included study. For sleep duration, the median AMSTAR-1 score achieved across all studies was 6 out of 11 (range from 2 to 9). The studies were rated at three levels: 15% were rated as "high," 79% as "moderate," and 6% as "low." For sleep quality, the median AMSTAR-1 score achieved across all studies was 7 (range from 5 to 8). Approximately 40% were rated as "high" and 60% as "moderate" quality, and no meta-analysis was categorized as low quality according to the AMSTAR-1 criteria. The common flaws were that gray literature was not considered in the literature search (item 4), and the list of excluded studies was not presented (item 5).

Evidence Grading
For sleep duration, no association presented convincing evidence. Only the association of long sleep duration with an increased risk of all-cause mortality was categorized as highly suggestive, and the methodological quality of the corresponding meta-analysis was moderate (Table 2). Moreover, suggestive evidence supported the associations between long sleep and increased risk of 5 health outcomes (stroke, dyslipidaemia, mortality of coronary heart disease, stroke mortality, and the development or death of stroke) and between short sleep duration and increased risk of overweight and/or obesity. In addition, 14 associations were supported by weak evidence. The remaining 31 associations were not confirmed. The detailed results of the analyses on which the evidence ratings were based are shown in Table 2.
For sleep quality, no association presented convincing or highly suggestive evidence, whereas suggestive evidence indicated that poor sleep quality was associated with an increased risk of diabetes mellitus and gestational diabetes mellitus. Moreover, 3 associations were supported by weak evidence and 2 associations were not confirmed.

DISCUSSION
In this umbrella review, to objectively assess the strength of associations between duration and quality of sleep and health outcomes, we performed a comprehensive overview by incorporating evidence from the current systematic reviews and meta-analyses of prospective studies. Overall, 85 published meta-analyses were included, and 52 (61%) were nominally statistically significant at P < 0.05 under the random-effects models. Although the study confirmed that short/long sleep duration or poor sleep quality was associated with an increase in the important health outcomes, the mechanisms do not seem straightforward.
In this umbrella review, the association of long sleep duration with an increased risk of all-cause mortality (25) was the only one categorized as highly suggestive, and the methodological quality of the corresponding meta-analysis was moderate. An association between long sleep and an increased risk of all-cause mortality was reported previously in studies with high quality and large sample sizes (59)(60)(61)(62), which was consistent with our results. Heslop and colleagues (63), however, analyzed data from a workplace-based study of Scottish men and women who were followed over a 25-year period and found that long sleep was associated with decreased risk of all-cause mortality in men. That study, however, reported RRs for only 3 quantitative categories of sleep duration and defined long sleep duration as >8 h, which may result in inaccurate evaluation of extremely long sleep. To date, no published studies have demonstrated a possible mechanism mediating the effect of long sleep as a cause of mortality. The association between a long duration of sleep and mortality may be explained by residual confounding and comorbidities (64). In particular, depressive symptoms, low socioeconomic status, low level of physical activity, unemployment, undiagnosed health conditions, poor general health, and cancer-related fatigue have all been shown to be associated with long sleep (64).
Suggestive evidence has shown that long sleep duration is positively linked with the morbidity of stroke (32) and with the mortality of stroke per 1-h increase in sleep duration (42). At present, the biological mechanisms of the relationship between long sleep and stroke are not clear. One possible biological pathway is inflammation, as long sleep has been associated with increased levels of inflammatory biomarkers, such as C-reactive protein and interleukin-6 (65-68). Interestingly, a number of studies have associated long sleep with cardiovascular conditions including atrial fibrillation, carotid artery atherosclerosis, and left ventricular mass, which may predispose individuals to stroke (69)(70)(71)(72)(73). Meanwhile, some studies suggested an association between long sleep and stroke only among those with limited physical function (74) or with a history of hypertension (75). Another possible biological pathway involves sleep disorders such as sleep-disordered breathing (76). Decreased cerebral blood flow and raised intracranial pressure occurred during apneic events in some studies (77,78), and cerebral hypoperfusion may also occur during wakefulness in sleep apnea patients (79). Klingelhofer et al. (80) found that blood flow velocity in the middle cerebral artery showed rapid increases and decreases during apneic episodes. Such changes could predispose vulnerable individuals to ischemic or hemorrhagic events (76). Several epidemiological studies have explored this association. A previous meta-analysis indicated that long sleep duration was associated with an increased risk of stroke but did not include a dose-response analysis (55). A meta-analysis by Ge showed a significantly increased risk of stroke incidence and mortality at long sleep durations in both cohort and cross-sectional studies. Their subgroup analysis also showed that long sleep duration was a statistically significant risk factor for stroke in both sexes and in Asians (81). Those results were in accordance with ours. However, the relationship between sleep duration and stroke may depend on stroke type (82), age (83), and gender and race (84), and high-quality studies are therefore needed to explore this matter.
We also found suggestive evidence that long sleep duration is associated with an increased risk of mortality of coronary heart disease (28). Khan et al. (85) found a significant association with coronary heart disease in the top quartile of sleep duration compared to the bottom quartile, which is in accordance with our results. However, further adjustment for risk factors including systolic blood pressure, history of cardiovascular disease, diabetes, smoking, alcohol use, renal function, and serum low-density lipoprotein cholesterol attenuated the associations with fatal coronary heart disease. Moreover, the average sleep duration in Khan's study was 9.1 h, with the lowest quartile being 8.2 h, which is longer than previously reported data from Western populations. Long sleep duration has been related to systemic inflammation, an increase in cytokines, and changes in several metabolic pathways (68). The lack of physiological challenge due to increased sleep has also been proposed as a mechanism that may increase mortality (86). Longer sleep duration has also been linked to depression and other psychiatric disorders, which are known to be associated with increased cardiovascular disease events (51,87). However, all these proposed mechanisms are speculative at best and require more research.
Suggestive evidence also showed that short sleep duration is linked with an increased risk of overweight or obesity, although this was observed only in children (50). There are several lines of evidence to suggest plausible mechanisms. Sleep deprivation is associated with various hormonal responses that may affect both hunger and satiety, leading to appetite dysregulation. These include lower leptin and higher ghrelin levels (88,89), which would increase appetite. Sleep deprivation also affects endocannabinoids, which regulate a variety of central nervous system processes including appetite (90). Changes in factors that affect metabolism, including insulin and glucose metabolism, cortisol, growth hormone, and thyroid stimulating hormone, are also important (48,91-95). In turn, obesity predisposes individuals to metabolic dysfunction that can cause sleep apnea, which leads to short sleep duration (96). Activation of inflammatory pathways by short sleep periods may be implicated in the development of obesity (97) and can up- and downregulate the expression of genes involved in oxidative stress and metabolism (98). Finally, insufficient sleep is associated with alterations in attention, impulse control, mood, motivation, and judgment, and all of these factors could potentially influence eating behaviors, energy intake, and ultimately BMI in children (99).
Regarding sleep quality, we found that poor sleep quality is associated with an increased risk of diabetes mellitus (56) and gestational diabetes mellitus (53). This result is consistent with the results of a previous meta-analysis, which concluded that sleep quality may be a novel and independent risk factor for poorer glycemic control in type 2 diabetes patients (100). Poor sleep quality as defined by the presence of one or more insomnia symptoms included in the Diagnostic and Statistical Manual of Mental Disorders diagnostic criteria was associated with a 40% increase in the risk of developing diabetes. Poor self-reported sleep quality may also be linked with other comorbid conditions, such as depression, undiagnosed obstructive sleep apnea and sleep deprivation, which are risk factors for diabetes. Adjusting for these covariates still resulted in a significant association between poor sleep quality and incident diabetes in several studies (101,102). Pregnant women with poor sleep quality had an increased risk of gestational diabetes mellitus (GDM). In addition to its direct effect on GDM, sleep quality has been considered a moderator of the association between sleep duration and GDM risk (103). The potential pathophysiological mechanisms between poor sleep quality and glucose intolerance have been well established, including decreased brain glucose utilization (104-106), sympathetic nervous system overactivity (107,108), alterations in the hypothalamic-pituitary-adrenal axis and growth hormone (109)(110)(111), elevated systemic inflammatory response (112,113), reduction in the percentage of slow wave sleep (114), adipocyte dysfunction (115), changes in appetite-regulating hormones (88), and increased obesity risk (95).
To our knowledge, the present study is the first umbrella review to quantitatively evaluate the existing evidence of the associations between duration and quality of sleep and health outcomes. The main strength of our umbrella review was to provide a comprehensive summary and evaluation of the credibility and validity of evidence of duration and quality of sleep and health outcomes according to the assessment results of a series of statistical analyses. In addition, we searched three databases through a rigorous strategy, and two authors independently extracted the information. Moreover, we followed the AMSTAR-1 criteria to assess the methodological quality of selected studies in our umbrella review, and most of the investigated meta-analyses achieved a moderate-to-high quality score. We used standardized criteria to explore the extent of heterogeneity and potential bias among the included studies and further assessed the strength of claimed associations to identify which was the most credible evidence. We also used the criteria of evidence grading to evaluate the evidence categorization.
Nevertheless, several limitations should be noted when interpreting the results. Firstly, we failed to find convincing evidence for the relation of sleep and health outcomes. In addition, the quality of the evidence was rated weak or not confirmed for 39% of the associations (n = 33). Thus, further research is needed for outcomes for which the certainty of evidence was rated weak or not confirmed. Secondly, owing to the limited studies, we failed to conduct subgroup analysis (e.g., exploring by age, sex, geographical location) or sensitivity analysis (e.g., excluding studies with high risks), and other relevant factors might have been missed. Many meta-analyses lacked dose-response information and compared high vs. low sleep duration without defining thresholds for these categories. Thirdly, we only evaluated published meta-analyses of prospective studies with available data; therefore, meta-analyses of randomized controlled trials were not included in our study. Fourthly, for sleep quality, only systematic reviews and meta-analyses assessing sleep quality by the PSQI questionnaire were included in this umbrella review. The PSQI is currently the only standardized clinical instrument that covers a broad range of indicators relevant to sleep quality (116). Lastly, we did not examine any error of the meta-analyses or the quality of the primary studies, as these were beyond the scope of our umbrella review. Our findings may appear convincing, but caution is needed when considering the implications of the results in the community. Although long or short sleep is associated with an increased risk of some health outcomes, there is no rigorous evidence that lengthening or shortening sleep duration can lead to a smaller frequency of these outcomes.
In conclusion, abnormal duration or quality of sleep was significantly associated with an extensive range of adverse health-related outcomes. Based on our umbrella review, although 36 studies explored 36 unique associations, the highly suggestive evidence only supported that long sleep duration was associated with an increased risk of all-cause mortality. The relationship between abnormal duration or quality of sleep and other outcomes could be genuine, but there is still limited evidence for them. Overall, this article assessed the associations between duration and quality of sleep and health outcomes based on previous studies, which is helpful for identifying at-risk groups and developing prevention strategies to counteract the effect of sleep discrepancies. Abnormal duration or quality of sleep is harmful to human health, but further high-quality prospective studies and better designed trials are needed to generate definite conclusions.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
T-TG and Q-JW contributed to the study design. CG and X-YL conducted the literature search. JG, J-LL, X-YL, F-HL, and MZ extracted the data and conducted the analyses. CG, JG, T-TG, J-LL, Y-TS, and Y-HZ wrote the first draft of the manuscript and edited the manuscript. All authors read and approved the final manuscript and accept responsibility for the integrity of the data analyzed.