The effect of aquatic exercise on bone mineral density in older adults. A systematic review and meta-analysis

Introduction: Aquatic or water-based exercise is a very popular type of exercise in particular for people with physical limitations, joint problems and fear of falling. The present systematic review and meta-analysis aimed to provide evidence for the effect of aquatic exercise on Bone Mineral Density (BMD) in adults. Methods: A systematic literature search of five electronic databases (PubMed/MEDLINE, Cochrane Library, Scopus, Web of Science and CINAHL) according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) was conducted until 2022/01/30, with an update to 2022/10/07. We included controlled trials with a duration of more than 6 months and at least two study groups, aquatic exercise (EG) versus non-training controls (CG) with no language restrictions. Outcome measures were standardized mean differences (SMD) with 95%-confidence intervals (95%-CI) for BMD changes at the lumbar spine (LS) and femoral neck (FN). We applied a random-effects meta-analysis and used the inverse heterogeneity (IVhet) model to analyze the data. Results: Excluding an outlier study with an exceptionally high effect size for LS-BMD, we observed a statistically significant (p = .002) effect (EG vs. CG) of aquatic exercise for the LS-BMD (n = 10; SMD: 0.30; 95%-CI: 0.11–0.49). In parallel, the effect of aquatic exercise on FN-BMD was statistically significant (p = .034) compared to the CG (n = 10; SMD: 0.76, 95%-CI: 0.06–1.46). Of importance, heterogeneity between the trial results was negligible for LS (I2: 7%) but substantial for FN-BMD (I2: 87%). Evidence for risks of small study/publication bias was low for LS-BMD and considerable for FN-BMD. Discussion: In summary, the present systematic review and meta-analysis provides further evidence for the favorable effect of exercise on bone health in adults. 
Due to its safety and attractiveness, we particularly recommend water-based exercise for people unable, afraid or unmotivated to conduct intense land-based exercise programs.


Introduction
Exercise is a highly effective tool for decreasing the risk of fragility fractures in older adults (Hoffmann et al., 2022). However, the application of intense weight-bearing (WBE) and dynamic resistance exercise (DRT), predominantly recommended for increasing bone strength (DVO, 2009;Beck et al., 2016;Daly et al., 2019;Kemmler and Stengel, 2019), often conflicts with the physical situation and preferences of many older and/or vulnerable cohorts (Rodrigues et al., 2017). In addition, the risk of falling, frequently prevalent during intense WBE, discourages older people from starting and maintaining exercise (Rodrigues et al., 2017). Correspondingly, for people unable, afraid or unmotivated to exercise conventionally, aquatic/water-based interventions with their reduced stress on the joints, analgesic effect (Falagas et al., 2009) and lack of fall risk might be a suitable option. On the other hand, exercise categories without relevant weight-bearing character, e.g., swimming (Gomez-Bruton et al., 2013;Mohr et al., 2015;Gomez-Bruton et al., 2016;Su et al., 2020) or cycling (Nagle and Brooks, 2011;Olmedillas et al., 2012), failed to generate positive effects on Bone Mineral Density (BMD) at the lumbar spine (LS) or proximal femur (FN), i.e., the most important sites for osteoporotic fractures (Kanis et al., 2008). Of note, however, the characteristics of aquatic exercise differ greatly from swimming. In this context, aquatic exercise should not be considered a rigid exercise category, but rather a vehicle for several types of exercise that are executed in chest-high, predominantly suitably heated water (Torres-Ronda and Del Alcazar, 2014).
Apart from a few studies that focus on aerobic (aquatic) exercise (e.g., Tsukahara et al., 1994;Pernambuco et al., 2013), the majority of studies applied resistance-type exercise using water resistance to increase exercise intensity (Rotstein et al., 2008;Borba-Pinheiro et al., 2010;Borba-Pinheiro et al., 2012;Moreira et al., 2014;Wochna et al., 2019). Considering further that many studies (e.g., Wochna et al., 2019) used devices (e.g., boards or cuffs) to increase water resistance during the movements, exercise intensity might be similar to intense conventional DRT. In their systematic review and meta-analysis, Simas et al. (2017) reported statistically significantly higher effects of land-based versus water-based exercise on BMD in middle-aged and older adults. More importantly, however, compared to sedentary controls, BMD effects of aquatic exercise were significantly higher at LS (mean difference 0.03 g/cm²; 95%-CI: 0.01-0.05 g/cm²) and FN (0.04 g/cm²; 95%-CI: 0.02-0.07 g/cm²). The authors (Simas et al., 2017) included six and five studies that focus on LS or femoral neck BMD, respectively; however, several new trials on aquatic exercise were published after the search deadline of that work. Thus, the aim of this systematic review and meta-analysis was to provide an update on the effect of aquatic exercise on BMD at the lumbar spine and proximal femur region of interest (ROI). We hypothesize that aquatic exercise statistically significantly increases BMD at the LS and proximal femur (i.e., total hip or femoral neck ROI) compared with non-training controls.

Methods
This meta-analysis is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement (Page et al., 2021). It was registered in the International Prospective Register of Systematic Reviews (PROSPERO) under ID CRD42022298321.

Information sources
An overall search was performed on five electronic databases (PubMed/MEDLINE, Cochrane Library, Scopus, Web of Science and CINAHL) for all articles published from inception up to 2022/01/30 (with an update on 2022/10/07) with no language restrictions. Two articles with full texts in Spanish (Ramirez-Villada et al., 2016) or Japanese (Wu et al., 2000) were translated using an electronic resource (DeepL translator).

Eligibility criteria
Inclusion criteria: 1) Adult participants of both genders. 2) Studies with participants on pharmaceutic osteoporosis therapy, when the number of subjects was comparable (difference <10%) between the exercise and control group. 3) Randomized and non-randomized controlled trials with at least one exercise group compared with a non-training control group. 4) A minimum of 6 months aquatic exercise intervention duration (shorter studies might not reach the full amount of mineralized bone and thus confound the BMD assessment). 5) Bone mineral density (BMD) of the lumbar spine (LS), femoral neck (FN) and/or total hip (TH) region at baseline and follow-up assessment as determined by 6) dual-energy x-ray absorptiometry (DXA), dual photon absorptiometry (DPA) or quantitative computed tomography (QCT). 7) Studies with pharmaceutic therapy on osteoporosis were included, however, only when exercise and control group were similarly provided.
Exclusion criteria: Studies that focus on 1) professional athletes or 2) animals. 3) Diseases or conditions that relevantly affect bone metabolism. 4) Double/multiple publications from one study (we included the publication with the most recent data on BMD) and preliminary data from subsequently published trials. 5) Review articles, meta-analyses, case reports, editorials, conference abstracts, and letters.

Literature search
A standard protocol for this search was developed and a controlled vocabulary (MeSH terms for MEDLINE, CINAHL® Subject Headings for CINAHL) was applied. Keywords and their synonyms were combined in the following query: ("water sports" OR "aquatics" OR "water aerobics" OR "warm water exercise" OR "aquatic weight-bearing exercises" OR "aquatic therapy" OR "hydro-gymnastics" OR "hydrogymnastics" OR "water exercise therapy" OR "water-based exercise" OR "exercise in water" OR "aqua* exercise" OR "aqua* gymnastics" OR "aqua* sports" OR "water exercise" OR "water based exercise" OR "water gymnastics" OR "hydro gymnastics" OR "swimming" OR "hydrotherapy" OR "exercise therapy") AND ("bone strength" OR "BMC" OR "bone loss" OR "bone content" OR "bone*" OR "bone mass" OR "bone status" OR "bone structure" OR "bone turnover" OR "bone metabolism" OR "bone mineral content" OR "skeleton" OR "bone mineral density" OR "BMD" OR "bone density" OR "osteoporoses" OR "osteoporosis" OR "osteopenia"). Reference lists of eligible articles or reviews (Simas et al., 2017) dealing with the effect of aquatic exercise on BMD in adults were also screened. Articles were excluded when no full text was available or the reports were unpublished. We contacted authors by email when relevant data on eligibility, methods or results were unclear or missing.
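The structure of the query above (two blocks of synonyms joined by OR, with the blocks combined by AND) can be sketched in a few lines of Python. This is an illustration only: the term lists are a shortened subset of the terms listed above, and the helper names `or_block` and `build_query` are hypothetical, not part of any database interface.

```python
# Shortened, illustrative subset of the search terms listed in the text.
exercise_terms = ["water-based exercise", "aqua* exercise", "hydrotherapy", "swimming"]
bone_terms = ["bone mineral density", "BMD", "osteoporosis", "osteopenia"]


def or_block(terms):
    """Quote each term and join all terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*blocks):
    """Combine several OR blocks with AND, mirroring the structure of the query."""
    return " AND ".join(or_block(block) for block in blocks)


query = build_query(exercise_terms, bone_terms)
print(query)
```

Keeping the term lists as data makes it easier to rerun the same search for an update, as was done here for the 2022/10/07 update.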

Data extraction
ES and SK independently reviewed titles and abstracts for eligible articles; the full-text articles were then checked by ES and SK. Study data were extracted by ES, SK and WK, with any disagreement being resolved by discussion. Data from included articles were checked using an extraction form that determined: a) publication details (e.g., first author's name, publication year, country, study design); b) participant characteristics (gender, age, age at menopause, medication, diseases, medical conditions, bone status, lifestyle including Vitamin D and calcium intake and supplementation, physical activity, training status, body mass, body height, BMI); c) study characteristics (length of the study, initial sample size, loss to follow-up); d) exercise characteristics including intervention length, type of exercise, exercise parameters (exercise frequency, intensity and volume), intensity progression, periodization, adherence, number of withdrawals, supervision of the session, adverse effects and e) supplementation with nutritional complements.

Outcome measures
Outcomes of interest were changes in (areal) Bone Mineral Density at the lumbar spine (LS) and/or total hip (TH) region, or femoral neck (FN), assessed by DXA, DPA or QCT, between baseline and study end. Importantly, because most authors reported either TH or FN, we had to conduct a joint analysis. In detail, TH-BMD data were preferentially included in the analysis; where TH results were not reported, the FN-BMD data of the study were used. Nevertheless, the data are summarized under "FN".
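The selection rule for the joint proximal femur analysis can be stated compactly. The following is a minimal sketch; the function name and the use of `None` for an unreported value are assumptions made for illustration.

```python
from typing import Optional


def proximal_femur_value(th: Optional[float], fn: Optional[float]) -> Optional[float]:
    """Return the total hip (TH) value when reported, otherwise the femoral neck (FN) value.

    None marks a value that a study did not report (an assumption of this sketch).
    """
    return th if th is not None else fn
```

For example, a study reporting both sites contributes its TH value, while a study reporting only FN contributes its FN value; both are then summarized under "FN".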

Quality assessment
To evaluate the methodologic quality of the trials, the articles were assessed independently by two reviewers (ES and SK) using the PEDro (Physiotherapy Evidence Database) scale (Sherrington et al., 2000) and the TESTEX score (Smart et al., 2015), which are specifically dedicated to physiotherapy (PEDro) and exercise (TESTEX) trials. In cases of inconsistency, a third reviewer (WK) decided.

Data synthesis
Missing standard deviations (SD) were calculated using the method detailed in the recently published comprehensive meta-analysis by Shojaa et al. (2020). If the studies presented a confidence interval (CI) or standard error (SE), these were converted to standard deviations (SD) with standard formulas (Higgins et al., 2011;Higgins et al., 2021). The subgroup analyses focused on differences in study length (i.e., <8 months versus ≥8 months).
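The conversions from SE or CI to SD can be illustrated as follows. This is a sketch of the commonly used large-sample formulas for the mean of a single group (SD = SE × √n, and SD = √n × (upper − lower) / 3.92 for a 95% CI); that exactly these forms were applied in the present analysis is an assumption, and the function names and example numbers are hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Standard deviation of a group from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Standard deviation of a group from the 95% CI of its mean.

    Uses the normal quantile (z = 1.96); for small groups a t quantile
    would be more appropriate, which is not covered by this sketch.
    """
    return math.sqrt(n) * (upper - lower) / (2 * z)


# Hypothetical group of 25 participants.
print(sd_from_se(0.01, 25))
print(sd_from_ci(0.98, 1.02, 25))
```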

Statistical analysis
We applied a random-effects meta-analysis using the metafor package (Viechtbauer, 2010) for the statistical software R (R Development Core Team, 2020). Effect size (ES) values were presented as standardized mean differences (SMDs) with a 95% confidence interval (95%-CI). We applied the inverse heterogeneity (IVhet) model proposed by Doi et al. (2015). Heterogeneity between the studies was checked using the I² statistic, with I² of 0%-40% being considered "low", 30%-60% "moderate", 50%-90% "substantial" and 75%-100% "considerable" heterogeneity (Higgins et al., 2011). To assess potential small-study/publication bias, we used funnel plots together with a regression test and a rank correlation test between effect estimates and their standard errors (using the t-test and Kendall's τ statistic, respectively), and we conducted a trim and fill analysis using the L0 estimator proposed by Duval and Tweedie (2000). Additionally, we used Doi plots and the Luis Furuya-Kanamori index (LFK index) (Furuya-Kanamori et al., 2017) to check for asymmetry. Sensitivity analyses were applied to determine whether the overall result of the analysis is robust to the choice of the imputed correlation coefficient (minimum, mean or maximum). Furthermore, we applied a sensitivity analysis without an outlying study (Ramirez-Villada et al., 2016) due to doubt as to whether the authors reported standard deviation or standard error. A p-value <0.05 was considered statistically significant for all tests. SMD values of 0.2, 0.5, and 0.8 were interpreted as small, medium, and large effects.
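To make the pooling step more concrete, the following sketch implements the IVhet estimator as we understand it from Doi et al. (2015): the pooled estimate uses normalized inverse-variance weights, while the variance of the pooled estimate is inflated by the between-study variance τ², here estimated with the DerSimonian-Laird method. It is not the R/metafor code used for the analysis; the function name and the input values are hypothetical.

```python
import math


def ivhet(effects, variances, z_crit=1.96):
    """Pool study effects (e.g., SMDs) with an inverse heterogeneity (IVhet) sketch."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance weights
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q and the DerSimonian-Laird estimate of tau^2
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # IVhet variance: normalized weights squared times (v_i + tau^2)
    var = sum((wi / sw) ** 2 * (vi + tau2) for wi, vi in zip(w, variances))
    se = math.sqrt(var)

    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "se": se,
        "ci": (pooled - z_crit * se, pooled + z_crit * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical SMDs and sampling variances of three trials.
result = ivhet([0.2, 0.4, 0.3], [0.04, 0.04, 0.04])
print(result)
```

In contrast to a conventional random-effects model, the point estimate equals the fixed-effect inverse-variance estimate, so small studies do not gain weight when heterogeneity is high; only the confidence interval widens.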

Intervention characteristics

Exercise characteristics
The length of the intervention varied from 6 (Moreira et al., 2014;Ramirez-Villada et al., 2016;Aboarrage Junior et al., 2018;Wochna et al., 2019) to 24 (Wu et al., 2000) months. Three studies did not report the pre-study exercise status of their participants (Tsukahara et al., 1994;Wu et al., 2000;Borba-Pinheiro et al., 2012). Only one study considered its cohort as being sedentary (Moreira et al., 2014), while Ramirez-Villada et al. (2016) included participants with exercise habits that might have affected the result of the later exercise intervention (Table 2). Although all studies applied "water-based exercise," the type of exercise actually prescribed varied considerably. One study applied (not only but largely) swimming exercise (Wu et al., 2000), while most of the other studies applied a mix of aerobic, jumping and resistance exercise using water resistance to increase exercise intensity (Table 2). Eight trials scheduled jumps or other explosive movements (Littrell, 2004;Borba-Pinheiro et al., 2010;Borba-Pinheiro et al., 2012;Pernambuco et al., 2013;Moreira et al., 2014;Ramirez-Villada et al., 2016;Aboarrage Junior et al., 2018;Wochna et al., 2019). Specification of training frequency/week and volume/session varied from ≥1 session of 60 min (Tsukahara et al., 1994) to 3 sessions of 90 min (Ramirez-Villada et al., 2016) per week (Table 2). Unfortunately, four studies (Wu et al., 2000;Littrell, 2004;Ramirez-Villada et al., 2016;Wochna et al., 2019) did not specify exercise intensity. Since intensity specification is based predominantly on RPE prescription, it is difficult to categorize studies with intensity progression during the intervention (Table 2). Nevertheless, with respect to studies of 12 months and longer (Tsukahara et al., 1994;Wu et al., 2000;Littrell, 2004;Borba-Pinheiro et al., 2010;Borba-Pinheiro et al., 2012), i.e., studies with increased relevance of intensity progression, only two studies (Borba-Pinheiro et al., 2010;Borba-Pinheiro et al., 2012) reported a structured intensity progression. Further, only three trials (Littrell, 2004;Moreira et al., 2014;Ramirez-Villada et al., 2016) reported attendance rates; thus, the net training frequency of most studies is not known. Loss to follow-up ranged from 0% for the 24-month study of Wu et al. (2000) to 25% for the 12-month study of Littrell (2004). However, loss to follow-up was not consistently reported by all the studies (Table 2). Nevertheless, based on attendance rates and drop-out/loss to follow-up (Table 2), aquatic exercise can be considered an attractive type of exercise (Shojaa et al., 2020;Kistler-Fischbacher et al., 2021). Finally, four studies (Littrell, 2004;Moreira et al., 2014;Ramirez-Villada et al., 2016;Aboarrage Junior et al., 2018) (Table 3) listed adverse effects. In summary, no injuries, serious medical events or other unintended side effects induced by the exercise protocol were reported.

FIGURE 1
Flow chart of the present study according to PRISMA. Adapted from (Page et al., 2021).

Frontiers in Physiology frontiersin.org

Table 3 shows the methodological quality of the included studies according to the PEDro (Sherrington et al., 2000) and TESTEX (Smart et al., 2015) score. Following PEDro and applying the classification of Ribeiro de Avila et al.
(Ribeiro de Avila et al., 2018), the methodologic quality of seven studies can be classified as low (PEDro: <5) and four studies as moderate (PEDro: 5-7). In particular, aspects related to allocation concealment or blinding were not satisfied or not reported. Another important aspect that has [...]

BMD changes at the lumbar spine region of interest
[...] (Figure 2). Heterogeneity between the trials was negligible (I²: 7%, Figure 2). Sensitivity analysis with respect to imputation of the mean correlation (see Figure 2), minimum or maximum correlation revealed roughly comparable effects.
The funnel plot with trim and fill analysis suggests no evidence for a publication/small study bias for LS-BMD (Figure 3). The LFK index (0.39) confirmed the negligible asymmetry; in parallel, the regression test (p = .833) and the rank correlation test (p = .727) did not indicate statistically significant asymmetry.

BMD changes at the proximal femur region of interest
Figure 4 displays the results on FN-BMD. In brief, the effect of aquatic exercise on FN-BMD was statistically significantly (p = .034) higher compared to the non-training CG (SMD: 0.76, 95%-CI: 0.06-1.46). Heterogeneity between the trial results was considerable (I²: 87%; Figure 4). Sensitivity analysis with respect to imputation of the mean correlation (see Figure 4), minimum or maximum correlation revealed roughly comparable effects.
The funnel plot with trim and fill analysis suggests considerable evidence for a publication/small study bias for FN-BMD (Figure 5). Two missing studies on the lower left-hand side (i.e., small studies with negative outcome) were imputed. The LFK index (3.73: major asymmetry) and the regression test (p = .020), but not the rank correlation test (p = .216), confirmed the statistically significant asymmetry of the plot.
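As a plausibility check, a pooled SMD, its 95%-CI and its p-value should be mutually consistent. The following sketch back-calculates the standard error and a two-sided p-value from the reported point estimates and intervals (LS-BMD as given in the abstract, FN-BMD as given above), assuming a normal approximation with a symmetric interval; the function name is hypothetical and the calculation is not part of the original analysis.

```python
import math


def p_from_ci(estimate: float, lower: float, upper: float, z_crit: float = 1.96):
    """Back-calculate the z statistic and two-sided p-value from a symmetric 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under the standard normal
    return z, p


z_ls, p_ls = p_from_ci(0.30, 0.11, 0.49)  # lumbar spine, outlier excluded
z_fn, p_fn = p_from_ci(0.76, 0.06, 1.46)  # proximal femur ("FN")
print(round(p_ls, 3), round(p_fn, 3))
```

Under these assumptions the recomputed values agree, up to rounding of the reported interval limits, with the reported p = .002 (LS) and p = .034 (FN).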

Subgroup-analysis on study length
Briefly, we did not observe any statistically significant differences between shorter and longer (5 comparisons each) study durations for LS-BMD (p = .809) or FN-BMD (p = .576). In detail, however, results on BMD were statistically significant only for studies ≥8 months.

Discussion
In summary, we confirmed our hypotheses by observing statistically significant effects of aquatic exercise for lumbar spine and proximal femur BMD. This finding is of particular importance since aquatic exercise is very popular in eastern and central Europe. In Germany alone, roughly 15,000 aquatic exercise groups are run under the legal requirements of § 64, German Social Security Code IX (SGB_IX, 2019). Although the evidence for positive effects of aquatic exercise on BMD had been moderate at best, the majority of these mandatorily supervised aquatic exercise groups focus on participants with osteopenia and osteoporosis. Correspondingly, the result of the present systematic review and meta-analysis on aquatic exercise and bone health is of particular interest for the non-pharmacologic treatment of osteoporosis. We are not the first to report the favorable effect of water-based exercise on bone health. In their systematic review and meta-analysis of seven comparisons (aquatic exercise vs. sedentary control), Simas et al. (2017) observed statistically significant effects of water-based exercise on LS- and FN-BMD. However, the authors also reported a statistically significant superiority of land-based versus water-based exercise programs (4 comparisons) on LS- and (less pronounced) FN-BMD. Unsurprisingly, due to the buoyancy effect, peak vertical ground reaction forces of typical water exercises (e.g., stationary running, Nordic skiing) were roughly half as high in the aquatic compared to the dry environment (Alberton et al., 2013). Undoubtedly, the main effect of aquatic exercise on bone is triggered by joint reaction forces, while the effect of ground reaction forces during (breast-high) water-based exercise is less relevant or even negligible (Alberton et al., 2013).
On the other hand, the minor impact on lower back, hip and lower limb joints, analgesic effects (Hinman et al., 2007;Falagas et al., 2009), psychological comfort (Wochna et al., 2019) and negligible fall risk/ fall consequences make aquatic exercise suitable for many people. This includes cohorts with osteoarticular limitations (Mattos de et al., 2016) or overweight (Torres-Ronda and Del Alcazar, 2014) but also physically limited older people.
Table footnote a: TESTEX awards one point for listing the eligibility criteria and, also in contrast to PEDro, a further point for the between-group comparison of at least one secondary outcome.

[...] osteopenia/osteoporosis), which vary widely between the studies (Table 1), might be candidates that modulate exercise effects on bone (Kemmler and Riedel, 1998;Kemmler, 1999). Having said that, two recent meta-analyses on exercise and BMD (Shojaa et al., 2020;Mohebbi et al., 2023) did not report statistically significant differences between the categories. Reviewing the exercise characteristics of the included studies was a daunting task because many studies (Table 2) did not completely or comprehensibly report their exercise protocols. We opted to focus our subgroup analysis on study length, as this was consistently reported by all trials and might be an important modulator of the exercise effects on BMD. Indeed, considering bone remodeling as the primary mode of bone renewal in adults (Eriksen, 2010;Erben, 2015), and taking initial familiarization and conditioning phases and regular changes in exercise intensity into account, shorter studies might not reach the full amount of mineralized bone and thus confound the BMD assessment. However, categorizing study length into <8 versus ≥8 months did not result in statistically significant differences between the subgroups. We attribute this result in part to the very complex interaction between types of exercise, exercise parameters and training principles that complicates or even prevents a reliable subgroup analysis of single exercise characteristics in comprehensive meta-analyses (Kemmler, 2013;Gentil et al., 2017). Undoubtedly, our systematic review and meta-analysis features some limitations and study particularities that should be considered to properly interpret our results.
1) First of all, we have to admit that we were not always certain whether the studies were interventional (or prospective). This refers to the studies of Tsukahara et al. (1994) and Wu et al. (2000). After internal discussion, we finally included the studies because, according to the authors, the initial (BMD) assessment and the start of the training program were closely related in time.
2) Related to this issue, unfortunately none of the authors who were contacted by email (n = 5) responded to resolve important methodological issues. This includes the simple issue of whether variations for BMD changes were given as standard error (SE) or standard deviation (SD).
3) The exclusion of the trial of Ramirez-Villada et al. (2016) is primarily related to the very extreme SMD of 10.06 (7.27-12.84) (LS-BMD, Figure 2), which does not seem realistic to us. It might be caused by a possible confusion of standard deviation and standard error for LS-BMD by the authors. Including the study of Ramirez-Villada et al. (2016) considerably increases heterogeneity (I² = 82%) and leads to non-statistically significant results for LS-BMD (SMD: 0.34; 95%-CI: 0.18-0.96).
4) The methodological quality of the studies ranged between 2 and 7 of a maximum of 10 score-points (median 3.5) for PEDro (Sherrington et al., 2000) and 3 to 12 of a maximum of 15 score-points (median 7.5) for TESTEX (Smart et al., 2015), which on average is low. Even when considering that blinding of participants and caregivers (i.e., trainers) is hardly realizable in exercise studies, exercise trials in general should be more aware of reporting standards (e.g., CONSORT; Moher et al., 2010) and methodological quality scores (PEDro; TESTEX) so as to ensure the adequacy and completeness of study information that is essential in a publication on exercise.
5) In this context, unfortunately some studies (Table 2) did not report drop-out or exercise attendance, two aspects that reflect the feasibility and attractiveness of the training protocol.
6) We included three studies (Borba-Pinheiro et al., 2010;Borba-Pinheiro et al., 2012;Pernambuco et al., 2013) that involved participants with Alendronate therapy. However, because EG and CG were similarly supplemented and no additive or synergistic effects of exercise and Bisphosphonate therapy have been reported (Klotz et al., 2022), we do not expect a relevant confounding effect.
7) Due to the remodeling issue (Eriksen, 2010;Erben, 2015) discussed above, we included only studies with a minimum of 6 months intervention duration.
8) Because we observed relevant heterogeneity among the studies in a number of meta-analyses on training studies (Hamilton et al., 2021;Mohebbi et al., 2023), we performed a random-effects meta-analysis and specifically chose the inverse heterogeneity model (IVhet) (Doi et al., 2015). This model is less prone to underestimating the statistical error and thus leads to confidence intervals that better meet the specified coverage probability (Furuya-Kanamori et al., 2017).
9) All the trials included focus on postmenopausal women; thus, the generalization of our results might be limited to this cohort.

FIGURE 2
Forest plot of meta-analysis results of all included trials at the lumbar spine. Data shown as pooled standardized mean difference (SMD) with 95%-CI for changes in the EG versus the CG.

FIGURE 3
Funnel plot of the studies that address lumbar spine BMD.

FIGURE 4
Forest plot of meta-analysis results for the proximal femur region of interest. Data shown as pooled standardized mean difference (SMD) with 95%-CI for changes in the EG versus the CG.

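The concern raised in limitation 3, that a standard error mistaken for a standard deviation produces an implausibly large SMD, can be illustrated with a small calculation. All numbers below are hypothetical and are not taken from Ramirez-Villada et al. (2016); the function `smd` uses a simple pooled-SD formula without small-sample correction, which is an assumption of this sketch.

```python
import math


def smd(mean_eg, mean_cg, sd_eg, sd_cg, n_eg, n_cg):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_eg - 1) * sd_eg ** 2 + (n_cg - 1) * sd_cg ** 2) / (n_eg + n_cg - 2)
    )
    return (mean_eg - mean_cg) / pooled_sd


n = 25                      # hypothetical participants per group
true_sd = 2.0               # hypothetical SD of the BMD change
se = true_sd / math.sqrt(n)  # the corresponding standard error of the mean

correct = smd(1.0, 0.0, true_sd, true_sd, n, n)   # SD used as intended
inflated = smd(1.0, 0.0, se, se, n, n)            # SE mistaken for SD
print(correct, inflated, inflated / correct)
```

With equal group sizes, the mistaken value is larger by a factor of √n, so a moderate effect in a study of 25 participants per group would appear five times as large.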
Summing up our results on aquatic exercise and BMD, we provided further evidence for the positive effect of this training option specifically suitable for physically limited older cohorts with low physical fitness and at risk for falls. Nevertheless, some other important research questions on aquatic exercise should be answered in the near future. This relates in particular to the validation of its positive effects in other cohorts with increased fracture risk. Further, aquatic exercise studies with multiple exercise groups should address the effects of different exercise characteristics (e.g., low vs. high exercise intensity, frequency and duration) on BMD in order to provide validated recommendations on aquatic exercise programs.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions
ES, SS, and WK initiated the present meta-analysis. The literature search was carried out by ES and SK. Data analysis, interpretation, and the initial draft of the manuscript were conducted by ES, SK, SS, MK, MU, and WK. All the authors contributed to quality assessment and revised the manuscript. WK accepts responsibility for the integrity of the data sampling, analysis and interpretation.