The Effectiveness of Post-exercise Stretching in Short-Term and Delayed Recovery of Strength, Range of Motion and Delayed Onset Muscle Soreness: A Systematic Review and Meta-Analysis of Randomized Controlled Trials

Background: Post-exercise (i.e., cool-down) stretching is commonly prescribed for improving recovery of strength and range of motion (ROM) and diminishing delayed onset muscular soreness (DOMS) after physical exertion. However, the question remains whether post-exercise stretching is better for recovery than other post-exercise modalities. Objective: To provide a systematic review and meta-analysis of supervised randomized-controlled trials (RCTs) on the effects of post-exercise stretching on short-term (≤1 h after exercise) and delayed (e.g., ≥24 h) recovery markers (i.e., DOMS, strength, ROM) in comparison with passive recovery or alternative recovery methods (e.g., low-intensity cycling). Methods: This systematic review followed PRISMA guidelines (PROSPERO CRD42020222091). RCTs published in any language and at any date were eligible, according to P.I.C.O.S. criteria. Searches were performed in eight databases. Risk of bias was assessed using Cochrane RoB 2. Meta-analyses used the inverse variance random-effects model. GRADE was used to assess the methodological quality of the studies. Results: From 17,050 records retrieved, 11 RCTs were included for qualitative analyses and 10 for meta-analysis (n = 229 participants; 17–38 years, mostly males). The exercise protocols varied between studies (e.g., cycling, strength training). Post-exercise stretching included static stretching, passive stretching, and proprioceptive neuromuscular facilitation. Passive recovery (i.e., rest) was used as comparator in eight studies, with additional recovery protocols including low-intensity cycling or running, massage, and cold-water immersion. Risk of bias was high in ~70% of the studies. Between-group comparisons showed no effect of post-exercise stretching on strength recovery (ES = −0.08; 95% CI = −0.54–0.39; p = 0.750; I² = 0.0%; Egger's test p = 0.531) when compared to passive recovery. In addition, no effect of post-exercise stretching on 24, 48, or 72-h post-exercise DOMS was noted when compared to passive recovery (ES = −0.09 to −0.24; 95% CI = −0.70–0.28; p = 0.187–0.629; I² = 0.0%; Egger's test p = 0.165–0.880). Conclusion: There was insufficient statistical evidence to reject the null hypothesis that stretching and passive recovery have equivalent influence on recovery. Data are scarce and heterogeneous, and confidence in cumulative evidence is very low. Future research should address the limitations highlighted in our review, to allow for more informed recommendations. For now, evidence-based recommendations on whether post-exercise stretching should be applied for the purposes of recovery should be avoided, as the (insufficient) data that are available do not support related claims. Systematic Review Registration: PROSPERO, identifier: CRD42020222091.


INTRODUCTION
Exercise sessions typically begin with a warm-up period, followed by the main workout, and end with a cool-down phase that includes a progressive reduction of effort and intensity (ACSM, 2018). Stretching is prescribed as an essential component of the cool-down phase by the guidelines of ACSM (2018) and the American Heart Association (2020). The main goals of stretching exercises applied during the cool-down phase (i.e., post-exercise stretching) are to enhance range of motion (ROM) and to reduce stiffness and delayed onset muscle soreness (DOMS) (Sands et al., 2013). There are different post-exercise stretching methods, including passive static, active static, dynamic, and proprioceptive neuromuscular facilitation (PNF) (Lima et al., 2019). Despite the wide adoption of post-exercise stretching in exercise protocols, its effectiveness is not well understood (Van Hooren and Peake, 2018).
Past research has yielded mixed and often contradictory results, with numerous studies indicating that post-exercise stretching is not effective for improving recovery. Indeed, in one study with 10 healthy men (Mika et al., 2007), the participants performed three sets of leg extension and flexion at 50% of maximum voluntary contraction (MVC). Post-exercise recovery protocols were used, including light-intensity cycle ergometer exercise and PNF stretching for 5 min. Light-intensity cycle ergometer exercise (10 W at 60 rpm) induced greater short-term recovery (i.e., immediately after the post-exercise protocol) than stretching as measured by MVC, total effort time, motor unit activation and EMG frequency (p < 0.05). In another study (Robey et al., 2009), club (8 men, 6 women; age: 20.2 ± 2.2 years) and elite level rowers (4 men, 2 women; age: 18.6 ± 0.8 years) performed a strenuous stair-climb running protocol. Post-exercise recovery protocols were applied at 15 min, 24 h, and 48 h, including stretching, hot/cold water immersion and passive recovery (i.e., rest). Compared to passive recovery, stretching and hot/cold water immersion induced no recovery effect on leg extension concentric peak torque, 2 km rowing ergometer times, creatine kinase levels, or DOMS, at any time-point. Further, nine physically active men (age, 23 ± 1 years) performed a fatiguing exercise protocol (i.e., 8 min of cycle ergometer exercise at 90% maximum oxygen uptake), followed by a post-exercise stretching protocol (i.e., 10 min) (Cè et al., 2013). One hour after the stretching protocol, mechanical and physiological assessments (e.g., MVC, EMG amplitude, and lactate kinetics) were similar between the stretching group and the passive recovery group.
Moreover, stretching may be ineffective in relieving perceived muscle pain or in reducing DOMS (Wessel and Wan, 1994; Cheung et al., 2003; Xie et al., 2018). Also, recovery may not simply mean a return to basal values. In other words, to be effective, post-exercise stretching should recover and improve participants' function over the basal condition (Sands et al., 2013; Van Hooren and Peake, 2018).
Furthermore, potential short-term positive effects of post-exercise stretching on recovery should be balanced with long-term adaptations. For example, Fuchs et al. (2020) recently demonstrated that post-exercise cooling (i.e., cold-water immersion) accelerated acute recovery after training sessions; however, it impaired myofibrillar protein synthesis rates after 2 weeks of training compared to not performing cold-water immersion. In this sense, to comprehensively assess the effectiveness of post-exercise stretching, both short-term and delayed recovery should probably be considered.
To bring clarity to conflicting results, systematic reviews and meta-analyses (SRMAs) are usually performed as a cornerstone of evidence-based practice (Higgins et al., 2019). Indeed, studies in the field tend to use small samples with reduced statistical power (Abt et al., 2020), whereas SRMAs provide greater statistical power. Some attempts have been made to synthesize the current literature on post-exercise stretching and recovery. An SRMA of randomized and quasi-randomized studies showed that stretching before or after exercise did not protect from DOMS (Herbert and Gabriel, 2002), and two independent updates reinforced the same conclusions (Henschke and Lin, 2011; Herbert et al., 2011). However, relevant databases such as PubMed and Web of Science were not included in the searches of the aforementioned SRMAs, and potentially relevant search terms such as "mobility" and "post-exercise" or "post-training" were not applied. Likewise, external experts were not consulted after automated searches, as suggested in high-standard protocols (Moher et al., 2009, 2015; Shea et al., 2017). Moreover, nearly a decade has passed since the publication of the aforementioned SRMAs, and a cursory search of articles in Google Scholar from 2011 to the present date suggests that several new studies have been conducted on the topic. An updated SRMA focused solely on post-exercise stretching and limited to randomized controlled trials (RCTs) may provide a more homogeneous and high-quality data set (Hariton and Locascio, 2018), while an expanded set of relevant databases and search terms may provide a more representative sample of existing studies. Therefore, our goal was to review supervised RCTs on the effects of post-exercise stretching on recovery markers (i.e., DOMS, strength, ROM), in comparison with passive recovery or alternative recovery methods (e.g., low-intensity cycling). Short-term (≤1 h after exercise) and delayed recovery (24, 48, and 72 h) markers were considered.

METHODS

Protocol and Registration
This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009, 2015), the Cochrane Collaboration guidelines for evaluation of risk of bias (RoB) in randomized studies (Sterne et al., 2019), and the AMSTAR 2 recommendations (Shea et al., 2017). Quality of studies was assessed using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) (Guyatt et al., 2011). The review methods were established before initiating the research, and protocol registration preceded the search. The protocol was published in PROSPERO with the reference CRD42020222091.

Eligibility Criteria
Studies were eligible if they consisted of original research or replication studies published in peer-reviewed journals, with full text not limited to any particular language or publication date. Beyond English, the authors also have a deep understanding of Portuguese and Spanish, as well as a good understanding of French and Italian. If studies were written in other languages, professional translators were hired. Based on scope, P.I.C.O.S. and timeframe for follow-up, Table 1 presents the inclusion and exclusion criteria. The review was limited to RCTs because randomization reduces the RoB and balances the distribution of participants between groups (Hariton and Locascio, 2018). Indeed, RCTs are the gold standard for evidence-based practices (Spieth et al., 2016). An intervention was considered supervised if explicit information was available stating that at least one qualified professional guided the post-exercise protocol. No studies were excluded on the basis of RoB as assessed through RoB 2 (Sterne et al., 2019).

Information Sources
The search was scheduled to start on January 1st, 2021, but since protocol approval occurred earlier (December 2nd, 2020), we conducted the automated searches on December 23 and 24, 2020, with search results being exported to EndNote X9 for Mac (v.9.3.3., Clarivate Analytics). The following electronic databases were searched: Cochrane Library (including CENTRAL), EBSCO (all available databases), PEDro, PubMed, Scielo, Scopus, SPORTDiscus (all databases), and Web of Science (all databases/collections). The search protocol used Boolean operators and required that the title, abstract, or keywords include ("stretch*" OR "flex*" OR "mobility" OR "range of motion") AND ("post-exerci*" OR "post-workout" OR "post-exertion" OR "post-train*" OR "after exerci*" OR "after workout" OR "after exertion" OR "after training" OR "recover*" OR "warm-down" OR "cool-down") AND "random*". Similar terms or synonyms were used to guarantee a more inclusive initial search and avoid an excessively narrow scope of analyzed studies. Searches were updated on February 16, 2021, for inclusion of records with date of entry from December 25, 2020, onwards. Where date of entry was not a feature (e.g., EBSCO, Scielo, Scopus, SPORTDiscus, Web of Science), publication date was limited to 2021, since the year 2020 would have been practically all covered by the time the initial search was completed.
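To make the structure of the search string easier to audit, the three term groups can be assembled programmatically. The following is only a minimal sketch: the helper names (`or_group`, `build_query`) are hypothetical, wildcard spacing is normalized to `term*` as an assumption, and each database would still need its own field tags and syntax, which are not described here.

```python
# Term groups taken from the search strategy described in the text.
STRETCH_TERMS = ["stretch*", "flex*", "mobility", "range of motion"]
TIMING_TERMS = [
    "post-exerci*", "post-workout", "post-exertion", "post-train*",
    "after exerci*", "after workout", "after exertion", "after training",
    "recover*", "warm-down", "cool-down",
]
DESIGN_TERMS = ["random*"]


def or_group(terms):
    """Quote each term and join with OR; parenthesize groups of 2+ terms."""
    joined = " OR ".join(f'"{term}"' for term in terms)
    return f"({joined})" if len(terms) > 1 else joined


def build_query(*groups):
    """Join the OR-groups with AND (hypothetical helper, not a database API)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(STRETCH_TERMS, TIMING_TERMS, DESIGN_TERMS)
print(query)
```

Keeping the term lists separate from the joining logic makes it straightforward to add a synonym to one group without disturbing the AND structure.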
A manual search was conducted within the reference list of the records included in the sample after full text analysis, to retrieve potentially relevant studies that had not emerged in the initial search. After completion of this stage, the list of studies, as well as inclusion and exclusion criteria were sent to eight independent experts in the field, to check if they were aware of additional papers. The experts were university professors with a Ph.D. and with peer-reviewed publications within the scope of our SRMA. Search strategy was not provided, to avoid biasing the experts' search. After the final list of studies was completed, all the databases were again consulted to retrieve errata, corrigenda/corrections, or retractions of the included studies, as some may have been found to be fraudulent or retracted (Higgins et al., 2019).

Table 1. Inclusion and exclusion criteria.

Study type
Exclusion: Conference abstracts, books and book chapters, editorials, letters to the editor, feasibility and pilot studies, trial registrations, reviews, essays, or original research in non-peer-reviewed journals.

Participants
Inclusion: Participants of any age, sex, health, and training status.
Exclusion: Non-human animals (e.g., rats).

Interventions
Exclusion: Stretching as the training intervention per se.

Comparators
Exclusion: Absence of comparators. Multimodal comparators that include stretching.

Outcomes
Inclusion (secondary outcomes): Biochemical markers of muscle damage; muscle and tendon stiffness; adverse effects from the post-exercise interventions.
Exclusion: No outcomes related to strength and/or ROM for short-term recovery AND no outcomes related to DOMS, strength and/or ROM for delayed recovery.

Timeframe for follow-up
Inclusion: Maximum 72 h post-intervention, based on the existing literature (Van Hooren and Peake, 2018).
Exclusion: No study will be excluded if presenting values >72 h, but these will not be considered for analysis.

*If an additional exercise bout or an active recovery protocol is included between the initial session and the delayed markers (e.g., application of a second exercise bout at 48 h while providing data regarding recovery from the initial bout at 72 h), then only values until that second bout (i.e., 48 h) will be considered.

Study Selection
The screening process started on January 4, 2021 for the first wave of searches. The screening process for the updated searches started on February 17, 2021. JA and FMC conducted the initial search, screening of titles and abstracts and analysis of full texts independently. HS and PM later reviewed the entire process. Next, a step-by-step comparison of the whole process was conducted, and any disagreements motivated a new analysis of the records in question. Discussion regarding the suitability of manuscripts was held among all the authors involved in the study selection process, until consensus was achieved. The same process was then used to analyze the reference lists of the included studies to verify if additional relevant studies were available. External experts were contacted to provide additional suggestions of relevant studies based on inclusion criteria and on our preliminary list. JA and FMC independently verified the list to decide on inclusion of the suggested studies. HS and PM then reviewed this process. The same process was applied to search for errata of the included studies.

Data Extraction
All extracted data were defined a priori, to avoid biased analyses (Spieth et al., 2016). Study characteristics: (i) sample size and features (e.g., age, sex, health, training status, country, continent; single or multicenter study); (ii) length and characteristics of the interventions and comparators (e.g., weekly frequency, type/modality of stretching and comparators, volume, intensity, duration, supervision ratio, qualification of supervisors, description of co-interventions); (iii) adherence rates to training (i.e., attendance percentage); (iv) funding sources and potential conflicts of interest. Data specific to crossover studies (Elbourne et al., 2002; Spieth et al., 2016): (i) length of wash-in and wash-out periods; (ii) carryover effects, if there were any. Primary outcomes for short-term recovery (≤1 h post-intervention): strength levels (e.g., maximum voluntary contraction) and joint ROM immediately or until 1 h after exertion. Primary outcomes for delayed recovery: DOMS, strength levels, and joint ROM at 24, 48, and 72 h, which are considered theoretically relevant (Van Hooren and Peake, 2018) and are commonly assessed time points in studies investigating this subject matter (Bonfim et al., 2010; Torres et al., 2013).
Secondary outcomes: Biochemical markers (e.g., plasma creatine kinase; blood lactate concentration); muscle and tendon stiffness; adverse effects during the post-exercise interventions (type, intensity or severity, time points). The timings described in the previous paragraph were considered for secondary outcomes as well.
Outcomes were only considered for analysis if there was no additional exercise bout between the initial session and the delayed recovery timeframe. For all primary and secondary outcomes, description of measurement tools and metrics was included (Higgins et al., 2019) and both significant and non-significant results were considered (Spieth et al., 2016). Furthermore, parallel and cross-over trials were combined as long as the latter did not have significant carryover effects (Elbourne et al., 2002). JA and FMC completed initial data extraction independently. HS and PM later reviewed the entire process and consensus had to be achieved. The data required for meta-analysis were compiled by JA and FMC and then reviewed by HS and PM. RRC provided a final verification of the quality of data inserted into the table.

Risk of Bias in Individual Studies
Bias refers to systematic errors that can threaten the internal validity of an RCT (Spieth et al., 2016). RoB was assessed using the revised Cochrane risk-of-bias tool for randomized trials (RoB 2) (Sterne et al., 2019), which consists of five dimensions, i.e., bias arising due to: (i) the randomization process; (ii) deviations from intended interventions; (iii) missing outcome data; (iv) measurement of the outcome; and (v) selection of the reported result. JA and FMC independently assessed RoB for all studies. After the first assessment, tables were compared and disagreements were discussed, with a subsequent re-analysis of the situation. Finally, HS and PM reviewed the assessments to ensure the quality of the evaluations. For assessing RoB in parallel trials, the Excel tool ROB2_IRPG_beta_v7 (Cochrane) was used. For crossover trials, the Excel tool ROB2.0_IRCX_beta (MRC | Hubs for Trials Methodology Research) was planned to be used. However, this tool is outdated. Following the most up-to-date Cochrane guidelines for applying RoB 2 to individual cross-over trials (Higgins et al., 2020), the five domains can be assessed following the structure of parallel trials, with the addition of an extra dimension (Domain S). Therefore, we used the ROB2_IRPG_beta_v7, with manual addition of Domain S.

Summary Measures
It is possible to use two studies in a meta-analysis (Valentine et al., 2010), but we chose to establish a minimum of three studies (Moran et al., 2018; García-Hermoso et al., 2019; Skrede et al., 2019) to avoid small sample sizes (Abt et al., 2020; Lohse et al., 2020). Pre- and post-intervention means and standard deviations (SDs) were converted to Hedges' g effect size (ES) (García-Hermoso et al., 2019; Skrede et al., 2019). When a study instead provided 95% confidence intervals (CIs) or standard errors of the mean (SEM), means and SDs were obtained from the 95% CI or SEM, using Cochrane's RevMan Calculator for Microsoft Excel (Drahota and Beller, 2020). When data for primary outcomes were presented only in graphical form, validated software (r = 0.99, p < 0.001), WebPlotDigitizer, version 4.4 (Rohatgi, 2020), was used to extract data, with all values rounded to two decimal places. In these cases, the main author extracted data from the graphs, and an outside researcher, not involved in this work (see section Acknowledgments), performed an independent data extraction. Reliability was calculated through Cronbach's alpha, using SPSS Statistics version 27 for Mac (IBM).
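The conversions described above can be illustrated with a short sketch. It assumes the standard textbook formulas: SD recovered from the SEM or from a 95% CI of a mean, and a between-group Hedges' g with pooled SD and small-sample correction. It is not a reproduction of the RevMan Calculator or of the software used for the meta-analyses; the function names and example values are hypothetical. For small samples, a t-distribution critical value would usually replace 1.96, and it can be passed through the `crit` parameter.

```python
import math


def sd_from_sem(sem, n):
    """Recover the SD from the standard error of the mean: SD = SEM * sqrt(n)."""
    return sem * math.sqrt(n)


def sd_from_ci(lower, upper, n, crit=1.96):
    """Recover the SD from a CI of a mean (normal approximation by default)."""
    return math.sqrt(n) * (upper - lower) / (2 * crit)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance of g) for two independent groups."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd                      # Cohen's d
    j = 1 - 3 / (4 * df - 1)                       # small-sample correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


# Hypothetical example values, not data from any included study.
g, var_g = hedges_g(m1=10.0, sd1=2.0, n1=10, m2=8.0, sd2=2.0, n2=10)
print(round(g, 3), round(var_g, 3))
```

The variance returned alongside g is what an inverse-variance model needs to weight each study.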
The inverse variance random-effects model for meta-analyses was used because it allocates a proportionate weight to trials based on the size of their individual standard errors (Deeks et al., 2008) and enables analysis while accounting for heterogeneity across studies (Kontopantelis et al., 2013). The ESs were presented alongside 95% CIs and interpreted using the following thresholds (Hopkins et al., 2009): <0.2, trivial; 0.2–0.6, small; >0.6–1.2, moderate; >1.2–2.0, large; >2.0–4.0, very large; >4.0, extremely large. Heterogeneity was assessed using the I² statistic, with values of <25, 25–75, and >75% considered to represent low, moderate, and high levels of heterogeneity, respectively (Higgins and Thompson, 2002). Publication bias was explored using the extended Egger's test (Egger et al., 1997). To adjust for publication bias, a sensitivity analysis was conducted using the trim-and-fill method (Duval and Tweedie, 2000), with L0 as the default estimator for the number of missing studies (Shi and Lin, 2019). Analyses were performed in the Comprehensive Meta-Analysis program (version 2; Biostat, Englewood, NJ, USA). Statistical significance was set at p ≤ 0.05.
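As an illustration of how these summary measures fit together, the sketch below pools effect sizes with inverse-variance random-effects weights, computes I², and applies the interpretation thresholds listed above. It assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify, and it is not the Comprehensive Meta-Analysis implementation. All function names and input values are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Inverse-variance random-effects pooling (DerSimonian-Laird tau^2 assumed)."""
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0        # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"es": pooled, "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "tau2": tau2, "i2": i2}


def magnitude(es):
    """Label an effect size with the thresholds quoted in the text."""
    a = abs(es)
    if a < 0.2:
        return "trivial"
    if a <= 0.6:
        return "small"
    if a <= 1.2:
        return "moderate"
    if a <= 2.0:
        return "large"
    if a <= 4.0:
        return "very large"
    return "extremely large"


def heterogeneity(i2):
    """Label I^2 (in %) as low (<25), moderate (25-75), or high (>75)."""
    if i2 < 25:
        return "low"
    return "moderate" if i2 <= 75 else "high"


# Hypothetical inputs, not data from the included studies.
result = pool_random_effects([0.0, 1.0], [0.01, 0.01])
print(result["es"], result["i2"], heterogeneity(result["i2"]))
```

When Q does not exceed its degrees of freedom, the estimated between-study variance is zero and the random-effects weights reduce to the fixed-effect weights.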

Moderator Analyses
These analyses were planned but could not be performed. Details on planned moderator analysis can be found in the Supplementary Materials.

Confidence in Cumulative Evidence
For RCTs, GRADE starts by assuming high quality, which can be downgraded according to five dimensions (Zhang et al., 2019). In addition to RoB, inconsistency (heterogeneity) and publication bias, which have already been addressed, indirectness and imprecision (using 95% CIs) were assessed independently by JA and FMC and verified by HS. These authors also estimated the overall quality and confidence in cumulative evidence.

RESULTS

Study Selection
The initial search retrieved 16,851 results [Cochrane Library: 13 reviews and 621 trials; EBSCO: 1,704; PEDro: 21; PubMed: 2,421; Scielo: 12; Scopus: 5,253; SPORTDiscus: 734; Web of Science (all collections): 6,072]. Automated removal (EndNote function) of 6,635 duplicates resulted in 10,216 records. Manual removal of additional 2,333 duplicates resulted in 7,882 records to be screened. The first stage of screening titles and abstracts was based on study type (first inclusion criterion) and resulted in the exclusion of 2,101 records. The second stage of screening started with 5,781 records, and 5,481 studies that were clearly out of scope (e.g., exercise-related studies not addressing the theme of our work, non-exercise related studies) were removed. Finally, starting with 300 records, the third stage of screening applied the PICOS criteria, and further excluded 278 studies. In this three-stage screening process, exclusion criteria were defined hierarchically, i.e., if a paper had several reasons for exclusion, its exclusion would be based on the first criterion it failed to fit. Finally, two records had untraceable full texts, with discontinued links, disappearance from the databases from which they were retrieved, and even not emerging in searches within the journals where they were supposedly published.
The updated searches retrieved 199 new records [Cochrane Library: 1 review and 8 trials; EBSCO: 49; PEDro: 0; PubMed: 53; Scielo: 3; Scopus: 25; SPORTDiscus: 7; Web of Science (all collections): 53]. Removal of duplicates resulted in 121 records, of which 14 were excluded due to not fitting the study type, 60 for being unrelated to exercise, 40 for being related to exercise but out of scope, and six for not complying with PICOS criteria. More in-depth information concerning the screening can be found in Supplementary Table 1. Therefore, 21 records were considered eligible for analysis of the full text (20 in the initial searches and one in the updated searches). While most were written in English, one was in Portuguese (Bonfim et al., 2010), one in Greek (Kokkinidis et al., 1998), and three in Korean ( et al., 2010; Oh, 2013; Kang and Park, 2018). A translator was hired for the Korean studies, and another for the Greek study.
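The record counts reported for the updated searches can be cross-checked with a few lines of arithmetic. This is only a tally sketch using the figures stated in the paragraph above; the variable names are hypothetical.

```python
# Records retrieved per database in the updated searches (figures from the text).
updated_by_database = {
    "Cochrane Library (reviews)": 1,
    "Cochrane Library (trials)": 8,
    "EBSCO": 49,
    "PEDro": 0,
    "PubMed": 53,
    "Scielo": 3,
    "Scopus": 25,
    "SPORTDiscus": 7,
    "Web of Science": 53,
}
retrieved = sum(updated_by_database.values())

# Records left after duplicate removal, and exclusions by reason.
after_duplicates = 121
exclusions = {
    "study type": 14,
    "not related to exercise": 60,
    "exercise-related but out of scope": 40,
    "PICOS criteria": 6,
}
eligible_updated = after_duplicates - sum(exclusions.values())

# Full-text eligibility combines the initial and updated searches.
eligible_initial = 20
full_text_total = eligible_initial + eligible_updated

print(retrieved, eligible_updated, full_text_total)  # 199 1 21
```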
At this stage, 12 records were excluded, with reasons. The study by Apostolopoulos et al. (2018) was excluded because the interventions were not supervised. However, it reported interesting results that we will explore briefly here. Since the authors applied stretching for three consecutive days after the eccentric exercise protocol, only results at 24 h were considered. The authors used 90% CIs (and not the more common 95% CIs) to compare low-intensity and high-intensity stretching to a control group using passive rest. Despite the authors' claims, all 90% CIs passed through zero, and no differences were observed at 24 h between the stretching groups and the controls for DOMS, eccentric and isometric peak torques of knee extensors, creatine kinase (U/L), and high-sensitivity C-reactive protein.
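The relevance of the 90% versus 95% distinction can be shown with a small sketch: for the same estimate and standard error, a 95% CI is wider than a 90% CI, so an interval that already crosses zero at 90% also crosses zero at 95%. The example assumes a normal approximation (small samples would normally use a t-distribution), and the interval used is hypothetical, not a value reported by Apostolopoulos et al. (2018).

```python
from statistics import NormalDist


def rescale_ci(lower, upper, from_level=0.90, to_level=0.95):
    """Re-express a symmetric CI at another confidence level (normal approximation)."""
    z_from = NormalDist().inv_cdf(0.5 + from_level / 2)   # about 1.645 for 90%
    z_to = NormalDist().inv_cdf(0.5 + to_level / 2)       # about 1.960 for 95%
    centre = (lower + upper) / 2
    se = (upper - lower) / (2 * z_from)                   # implied standard error
    return centre - z_to * se, centre + z_to * se


# Hypothetical 90% CI that includes zero.
low95, high95 = rescale_ci(-0.10, 0.90)
print(round(low95, 3), round(high95, 3))
```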
The study of Boobphachart et al. (2017) was excluded because the stretching intervention was performed three times per day and, furthermore, was unsupervised.
The study of Cha and Kim (2015) was excluded because both groups included some form of stretching, therefore preventing the comparison of stretching with alternative protocols and failing our PICOS criteria. The study of Duffield et al. (2014) was excluded because both the training interventions and the protocols were applied twice a day. Furthermore, one of the protocols included not only immediate measures (15-min cold-water immersion), but also ongoing measures such as 3 h of wearing full-body compression garments, plus abiding by sleep-hygiene recommendations that night. The study of Gulick et al. (1996) was excluded because randomization was compromised. The authors created seven groups with 10 participants each. When a participant quit, they simply recruited a new participant to the group, therefore compromising both randomization and baseline values for each group. In addition, no details were provided concerning how these new subjects changed the values for each variable.
The study of Kang and Park (2018) was excluded because the exercise intervention lasted 20 min, while the post-exercise stretching protocol consisted of 5 min of so-called preparation exercises, followed by 30 min of stretching, followed by 5 min of so-called clean-up exercises. Therefore, not only did post-exercise recovery last twice as long as the exercise intervention (40 vs. 20 min, thereby being akin to a stretching intervention per se and failing our inclusion criteria), but also the recovery intervention was not exclusively reliant on stretching (again, failing our inclusion criteria). The study of McGlynn et al. (1979) was excluded because stretching was applied immediately post-exercise, but also repeated at 6, 25, 30, 49, and 54 h post-exercise. Therefore, even the 24 h assessments could not be attributed to stretching performed immediately following an exercise bout. Incidentally, the authors reported that both the stretching and biofeedback groups showed a reduction in EMG muscle activity of the biceps brachii in comparison with a passive control group, but neither intervention had an effect on perceived pain.
The study of Oh (2013) was excluded because the cool-down protocols were not stretching-based. The study of Pooley et al. (2020) was a cross-over study that was excluded because randomization was compromised: while after "home" fixtures the participants were randomized to cold-water immersion or cycle ergometer, in "away" fixtures stretching was always prescribed. The study of Robey et al. (2009) was excluded because the authors detail, in the manuscript, that the crossover was only semi-randomized, and it therefore does not meet our inclusion criteria. In any case, the main characteristics and results from this study have been addressed in the introduction, which was written prior to our searches. The study of Xanthos et al. (2013) was excluded because the so-called traditional recovery group was multimodal. The study of et al. (2010) was excluded because the cool-down protocol was multimodal.
Therefore, nine studies fulfilled all inclusion criteria (Kokkinidis et al., 1998;Mika et al., 2007;Bonfim et al., 2010;Cè et al., 2013;Torres et al., 2013;McGrath et al., 2014;Muanjai and Namsawang, 2015;Cooke et al., 2018;César et al., 2021). As per protocol, in studies where the recovery methods were applied in multiple sessions (e.g., stretching after exercise and repeated at 24 and 48 h), only data before the second application was considered. To illustrate, in the studies of Cooke et al. (2018) and Kokkinidis et al. (1998), only the results at 24 h post-exercise were considered. Since a new recovery session was applied at 24 h, the results at 48 h and longer were not considered in the meta-analysis since results might not be attributable to the immediate post-exercise stretching protocol. In addition, and following protocol, multimodal recovery groups also including stretching were excluded from analysis (e.g., the group combining stretching followed by cold water immersion in the study of Muanjai and Namsawang, 2015). In the study of Torres et al. (2013), two groups were considered: the group performing eccentric exercise, and the group performing eccentric exercise followed by a single bout of stretching. The group that only performed stretching and the group that performed eccentric exercise followed by repeated bouts of stretching in the following days were excluded as they did not conform to our inclusion criteria.
A manual search within the reference lists of included studies revealed 26 potentially fitting titles (including updated searches). Of these, two had already been included in our final sample, and five had been excluded during the process. Nineteen studies had not appeared in our searches; screening of their abstracts resulted in the exclusion of five based on study type (e.g., abstract, review), and 10 based on failure to fulfill PICOS criteria. Of the four studies that required full text analysis, two fulfilled all PICOS criteria and were therefore added to our sample (Torres et al., 2005; West et al., 2014). In relation to Torres et al. (2005), and following the rules applied to Torres et al. (2013), only the two groups meeting the criteria were considered for analysis. Subsequently, eight experts were invited to contribute additional relevant studies. Two experts declined the invitation due to lack of time, while five experts did not respond. One expert responded that our list was thorough and did not make any additional recommendation. Finally, errata, corrigenda, corrections, and retractions were searched for the included studies, but none were found. Therefore, 11 studies were included for qualitative analysis (n = 289), of which 10 could be included in the quantitative analysis (n = 280; n = 229 after exclusion of groups that did not fulfill PICOS criteria). The process is summarized in the PRISMA flow diagram (Figure 1).
All studies had at least one group performing post-exercise stretching as an attempt to mitigate the negative effects of the soreness-inducing protocols. Active static stretching was used in four studies (Kokkinidis et al., 1998; Bonfim et al., 2010; West et al., 2014; Cooke et al., 2018), passive stretching in six (Torres et al., 2005, 2013; Cè et al., 2013; McGrath et al., 2014; Muanjai and Namsawang, 2015; César et al., 2021), and PNF in two (Mika et al., 2007; McGrath et al., 2014). McGrath et al. (2014) used both passive static stretching and PNF. No study used dynamic stretching. Almost all the post-exercise stretching protocols targeted the lower limbs, with one study targeting the upper limbs (César et al., 2021), and lasted between ∼1 min (McGrath et al., 2014) and 30 min (West et al., 2014; Cooke et al., 2018). Intensity of stretching was measured using only subjective feelings during the exercise, ranging from "subjects perceiving a slight feeling of stretching (...), without generating discomfort" (Bonfim et al., 2010) to "until subjects felt a maximal stretch of the hamstrings" (McGrath et al., 2014) or "until the greatest discomfort was reported by the participants" (César et al., 2021).
One study explicitly stated that there were no adverse effects to report (Muanjai and Namsawang, 2015), while the other studies made no mention of them. We further highlight that two studies had potentially relevant conflicts of interest, as the company manufacturing the anti-gravity treadmill provided financing for the research (West et al., 2014; Cooke et al., 2018).

Risk of Bias Within Studies
Cochrane's RoB 2 tool evaluates RoB in five different dimensions (Sterne et al., 2019), the second of which is subdivided into two parts. Here, an intention-to-treat analysis was considered. In terms of outcomes, RoB was only assessed for the primary outcomes (i.e., strength, ROM, and DOMS). None of the included studies had a pre-registered protocol. However, one had a specific reference to a grant (Torres et al., 2013), and another to an approval number by an Ethics Committee (Bonfim et al., 2010). In both cases, a pre-study protocol had to exist, and so we contacted the authors. The corresponding author of Bonfim et al. (2010) provided the trial protocol, which also contained a statistical analysis plan. The main author of Torres et al. (2013), who was the recipient of the grant, was contacted, but unfortunately did not have the original project, which is understandable given the timeline. Since some studies had more than one outcome, domains 4 and 5 could have multiple assessments for each study. The complete assessments (i.e., one assessment per outcome per study) can be found in Supplementary Table 2. Table 3 presents the worst-case scenario for each study, i.e., considering the outcome for which the risk of bias was higher. These results can be visualized in Figure 2, which exhibits the percentage distribution of RoB for domains 1-5 and overall bias considering the worst assessment for each study. Overall RoB was high in 72.7% of the studies and presented some concerns in 27.3%. All studies presented problems with the randomization process: there was no description of how randomization was achieved or whether the allocation sequence was properly concealed and, in 27.3% of the studies, baseline values suggested problems with the randomization process. Moreover, 72.7% of studies had high RoB in measurement of the outcome, mostly because testers were usually not blinded, and some outcomes were particularly prone to being influenced by knowledge of the intervention received.
Table footnotes: a After excluding the subjects of the multimodal recovery group, because it also included stretching and therefore had to be excluded due to PICOS; b After exclusion of Group 1, since stretching was the intervention per se, and not a post-exercise application; c After exclusion of Group 1, since stretching was the intervention per se, and not a post-exercise application, and after exclusion of group 4, which had multiple application/bouts of the recovery intervention.
Frontiers in Physiology | www.frontiersin.org
There was low RoB arising from deviations from intended interventions and from missing outcome data in 90.9% of the papers. Finally, although 90.9% of papers presented some concerns for RoB arising from selection of the reported result, this resulted mostly from lack of pre-registered protocols, and our opinion upon reading the studies is that the authors provided an honest and complete reporting. Of the crossover studies, one had high RoB for carry-over effects (Cè et al., 2013) and, following protocol, was excluded from meta-analysis. However, it was retained in the qualitative review.
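The percentages in this section can be mapped back to study counts. The sketch below assumes that every percentage is a share of the 11 included studies; the counts it prints are inferred from the percentages and are not stated explicitly in the text. Function names are illustrative.

```python
# Map the reported risk-of-bias percentages back to study counts.
# Assumption: every percentage is a share of the 11 included studies.
N_STUDIES = 11

def share(count, total=N_STUDIES):
    """Percentage of studies, rounded to one decimal as in the text."""
    return round(100 * count / total, 1)

def count_for(percentage, total=N_STUDIES):
    """Integer count whose rounded share is closest to the reported percentage."""
    return min(range(total + 1), key=lambda k: abs(share(k, total) - percentage))

for reported in (72.7, 27.3, 90.9):
    k = count_for(reported)
    print(f"{reported}% corresponds to {k} of {N_STUDIES} studies")
```

Under this assumption, the high and some-concerns shares of overall RoB (72.7% and 27.3%) account for all 11 studies.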

Results of Individual Studies
Primary outcomes were recorded in the form of means ± SDs, except for Cè et al. (2013), which used means ± SEM. This study, in particular, had a graph from which we felt we could not extract reliable data. Given also that this study could not enter the meta-analytical calculations, we chose not to extract the data from the graph, and only present the qualitative results provided by the authors. For values extracted from graphs (Mika et al., 2007; Bonfim et al., 2010; Muanjai and Namsawang, 2015; Cooke et al., 2018; César et al., 2021), Cronbach's alpha values were 0.991 (means) and 0.981 (SDs). The results of individual studies are compiled in Table 4.
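As an illustration of how agreement between graph-based extractions can be quantified, the following minimal sketch computes Cronbach's alpha. It assumes two independent extractors reading the same values; the number of extractors and the raw readings are not given in this section, so the data below are hypothetical.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for rows of values, one column per rater.

    alpha = k / (k - 1) * (1 - sum(rater variances) / variance of row sums)
    """
    n_raters = len(ratings[0])
    columns = list(zip(*ratings))
    rater_variances = [variance(col) for col in columns]  # sample variances
    total_variance = variance([sum(row) for row in ratings])
    return n_raters / (n_raters - 1) * (1 - sum(rater_variances) / total_variance)

# Hypothetical means read off the same graphs by two extractors.
extracted = [
    (101.0, 100.5),
    (88.0, 89.0),
    (95.5, 95.0),
    (110.0, 111.5),
    (79.0, 78.0),
]
alpha = cronbach_alpha(extracted)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values close to 1 indicate that the extractors' readings are nearly interchangeable.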
Primary outcomes were any assessments related to strength, ROM and/or soreness, both short-term (i.e., ≤1 h post-recovery) and delayed (24, 48, and 72 h post-recovery). These outcomes were useful only if there were pre-exercise and post-recovery assessments. Short-term effects were reported for strength-related measures in six studies (Torres et al., 2005, 2013; Mika et al., 2007; Cè et al., 2013; Muanjai and Namsawang, 2015; César et al., 2021), ROM in two studies (McGrath et al., 2014; Muanjai and Namsawang, 2015), and DOMS in three studies (Torres et al., 2005, 2013; Muanjai and Namsawang, 2015). Three studies had no short-term assessments (Kokkinidis et al., 1998; Bonfim et al., 2010; West et al., 2014). One study mentioned having data at 15 and 30 min after recovery, but that data only applied to secondary outcomes (Cooke et al., 2018). With the exception of César et al. (2021), all strength-related assessments were performed for the lower limbs, and this was also the case for delayed assessments.
Delayed assessments were performed for strength-related variables in five studies (Torres et al., 2005, 2013; West et al., 2014; Muanjai and Namsawang, 2015; Cooke et al., 2018), ROM in one (Muanjai and Namsawang, 2015), and DOMS in seven (Kokkinidis et al., 1998; Torres et al., 2005, 2013; Bonfim et al., 2010; McGrath et al., 2014; Muanjai and Namsawang, 2015; Cooke et al., 2018). Three studies did not have delayed outcomes (Mika et al., 2007; Cè et al., 2013; César et al., 2021). Although Kokkinidis et al. (1998) assessed delayed effects on strength and ROM, they presented only means, without any measure of variation that could help to better interpret the results. As previously explained, if the delayed assessments were conducted after a new bout of the recovery protocol, they would be discarded, as the effects of the first bout could no longer be assessed. Of the studies including delayed assessments, four had data for the three timepoints defined in our protocol (i.e., 24, 48, and 72 h) (Torres et al., 2005, 2013; Bonfim et al., 2010; Muanjai and Namsawang, 2015), one study had data for 24 and 48 h post-recovery protocol (McGrath et al., 2014), and three had data for 24 h post-recovery only (Kokkinidis et al., 1998; West et al., 2014; Cooke et al., 2018).
Based on their data, some studies concluded that post-exercise stretching was not an effective recovery strategy, and was not superior to comparator interventions (West et al., 2014; Cooke et al., 2018), including passive recovery, i.e., rest (Bonfim et al., 2010; Cè et al., 2013; César et al., 2021). In the study of Kokkinidis et al. (1998), the authors stated that stretching and cryotherapy were superior to passive rest, but these effects were not observed at 24 h, only at 48 h; moreover, after 24 h, the experimental groups had an additional recovery bout applied, but without the soreness-inducing exercise. In the study of McGrath et al. (2014), PNF was not superior to passive recovery, and the static stretching group was the only one not showing significant decreases in DOMS at 24 or 48 h.
In the study of Mika et al. (2007), short-term strength levels recovered faster in the low-intensity cycling group than in the stretching or passive rest groups. In two studies (Torres et al., 2005, 2013), the authors stated that post-exercise stretching did not impair recovery in terms of strength and DOMS when compared to a passive rest group, but it did not improve recovery either. Finally, Muanjai and Namsawang (2015) concluded that both stretching and cold-water immersion could be used to improve post-exercise recovery. However, this conclusion is not supported by their data, as DOMS only returned to baseline at 96 h post-recovery protocol, strength levels and ROM after 48 h, and vertical jump was still not back to baseline even after 96 h. Moreover, without a passive recovery group for comparison, no statement can be made regarding acceleration of recovery.

Synthesis of Results
As stipulated in the protocol, cross-over trials would only be combined with parallel trials if there were no significant carryover effects (Elbourne et al., 2002). This was not guaranteed in the study of Cè et al. (2013), which was therefore excluded from meta-analysis. Across the remaining 10 studies, as previously presented, there was considerable variation concerning the soreness-inducing protocols, the comparators to stretching, the outcome domains, the measurements within those outcome domains, and the timepoints of assessing the outcomes. Our protocol had stipulated three primary outcomes (strength, ROM, and DOMS) across four different timepoints (short-term, i.e., maximum 1 h after the recovery intervention; and 24, 48, and 72 h after the recovery intervention). After analyzing the outcomes and timepoints in each study, and also considering the comparator protocols, we found that only a few meta-analytical comparisons were feasible.

Short-Term Effects on Strength, Stretching vs. Passive Recovery (Rest)
Three studies had comparable data (i.e., strength measures of the knee extensors) to afford this meta-analysis (Torres et al., 2005, 2013; Mika et al., 2007). One study used PNF stretching (Mika et al., 2007) and the others used passive static stretching and compared this intervention to passive rest. Although the study of César et al. (2021) had strength assessments, they were for the upper limbs, more specifically grip strength, and so we decided not to compare it with the remaining studies. In RoB assessments considering this outcome, these studies had an overall classification of "some concerns," meaning none of the domains presented high RoB. In domain 4 (measurement of the outcome), they had low RoB. For within-group effects, three studies provided data for short-term strength recovery, involving three stretching groups (pooled n = 33). Results showed that post-exercise stretching protocols did not allow participants to recover their basal strength level (ES = −0.85; 95% CI = −1.53 to −0.17; p = 0.015; I² = 80.4%; Egger's test p = 0.396; Figure 3).
Table row (fragment): Sit-and-reach, muscle Soreness Scale (1-6) at 24 and 48 h. Sit-and-reach also immediately post-recovery. No differences between groups. No reporting of adverse effects.
Table abbreviations: CWI, Cold-water immersion; DOMS, Delayed onset muscular soreness (more is worse); HG, Handgrip; MVC, Maximum voluntary contraction; ROM, Range of motion; VAS, Visual Analog Scale (greater values mean worse outcomes). a As defined in our protocol. b 96 h not considered, as per protocol.
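To make the pooled statistics above easier to interpret, the following minimal sketch shows how effect sizes (Hedges's g) can be combined with an inverse-variance random-effects model and how I² is derived. Several points are assumptions rather than details confirmed by the text: the between-study variance uses the DerSimonian-Laird estimator, the `hedges_g` helper applies to two independent groups (not to the pre-post comparisons reported here), and the input values are hypothetical.

```python
import math

def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Hedges's g and its variance for two independent groups."""
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_a - mean_b) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b))
    return j * d, j**2 * var_d

def random_effects(effects, variances):
    """Inverse-variance random-effects pooling (DerSimonian-Laird, assumed)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical effect sizes and variances for three studies.
result = random_effects([-1.6, -0.3, -0.8], [0.10, 0.12, 0.15])
print(result)
```

A confidence interval that excludes zero on the negative side, as in the within-group estimate reported above, indicates that strength had not returned to baseline; a high I² indicates that the studies disagree on the size of that deficit.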

Delayed Effects (24 h) on Delayed Onset Muscle Soreness, Stretching vs. Passive Recovery (Rest)
Five studies had comparable data to assess DOMS at 24 h (Kokkinidis et al., 1998; Torres et al., 2005, 2013; Bonfim et al., 2010; McGrath et al., 2014). Two used active static stretching (Kokkinidis et al., 1998; Bonfim et al., 2010) and three passive static stretching (Torres et al., 2005, 2013; McGrath et al., 2014). All had at least one comparator that passively recovered (i.e., rest). The study of Bonfim et al. (2010) had two assessments of DOMS; here, we used the assessment through the visual analog scale, as the other studies also used similar scales. The four studies had high RoB in measurement of this outcome, so all results should be considered with caution.

Additional Analysis
Due to the small number of studies included in each meta-analysis, additional analyses and sensitivity analyses were not performed. In each analysis, RoB was similar in all studies, and so we decided not to assess the effects of RoB on the results. Meta-regression was not performed due to having <10 studies with sufficient commonalities.
FIGURE 3 | Forest plot denoting short-term strength recovery level in participants that completed post-exercise stretching protocols. Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result. Note: negative values denote that post-exercise stretching protocols did not allow participants to recover their basal strength level (i.e., 0.00 in the figure).
FIGURE 4 | Forest plot denoting short-term strength recovery level in participants that completed post-exercise passive recovery protocols. Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result. Note: negative values denote that post-exercise passive recovery protocols did not allow participants to recover their basal strength level (i.e., 0.00 in the figure).

Confidence in Cumulative Evidence
Confidence in cumulative evidence is equivalent to quality of the evidence (Higgins et al., 2019). GRADE assessments are presented in Table 5. Overall, we have very little confidence in the effect estimate, and the true effect is likely to be substantially different from the estimate of effect.

Summary of Evidence
Stretching has been traditionally prescribed for the cool-down phase of training sessions, under the premise that it enhances recovery (ACSM, 2018; American Heart Association, 2020). But this premise has been questioned by previous assessments of the literature (Herbert and Gabriel, 2002; Henschke and Lin, 2011; Herbert et al., 2011). Therefore, we conducted a systematic review with meta-analysis of supervised RCTs on the effects of post-exercise stretching on short-term (i.e., ≤1 h) and delayed (24, 48, and 72 h) recovery of strength levels, ROM, and DOMS. Searches were conducted in eight electronic databases post-protocol approval, on December 23 and 24, 2020, and updated on February 16, 2021. Of the 17,050 records emerging from the searches and 25 additional records emerging from manual searches within reference lists, 11 RCTs were eligible for qualitative analysis (n = 289), and 10 for quantitative analyses (n = 280, with n = 229 after excluding groups not fulfilling PICOS criteria). Due to the overall small sample size, generalization to a broader population is not advised.
Active static stretching, passive stretching and PNF were used for post-exercise recovery, but no protocol adopted dynamic stretching. Overall, analysis of individual studies showed that there was no evidence that stretching enhanced recovery in comparison to passive recovery (i.e., rest) or to alternative recovery modalities, such as cycling and cold-water immersion. There was no evidence to the contrary, i.e., that stretching impaired recovery. Even for secondary outcomes, such as blood lactate and serum creatine kinase, for example, no strong case can be made for stretching accelerating or improving recovery. Furthermore, overall RoB was high, meaning that this field of research is lacking in terms of methodological design. Especially problematic was the wide use of unblinded testers, even for outcomes with greater degree of subjectivity.
FIGURE 5 | Forest plot of changes in short-term strength recovery after participating in post-exercise stretching protocols compared to control conditions (i.e., passive recovery). Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result.
FIGURE 6 | Forest plot denoting 24-h post-exercise delayed onset of muscle soreness (DOMS) in participants that completed post-exercise stretching protocols. Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result. Note: positive values denote that post-exercise stretching protocols did not allow participants to recover their basal DOMS level (i.e., 0.00 in the figure).
Due to the diversity of outcomes and timepoints of assessments, only four meta-analytical comparisons were possible, all between stretching and passive recovery (i.e., rest): strength levels at ≤1 h, and DOMS at 24, 48, and 72 h. Overall, stretching was no more effective than passive recovery in returning strength levels and DOMS to baseline values. Heterogeneity of the meta-analysis (I²) was high for within-group (pre-post) comparisons and low for between-group comparisons for strength outcomes at ≤1 h of recovery, moderate (within) and low (between) for DOMS at 24 h, low to moderate (within) and low (between) for DOMS at 48 h, and low (within and between) for DOMS at 72 h. Information in terms of recovery of ROM after different recovery protocols was insufficient to run a meta-analysis. There was no evidence of publication bias.

Poor External Validity
Overall, the studies included in our analysis may be considered to have poor external validity. In terms of population, they only apply to adults under 40 years old, with no studies being performed in children, teenagers, or adults ≥40 years old. Only two of the 11 studies included women in their sample: 50% of the sample in one study (McGrath et al., 2014) and an unclear proportion in another (Bonfim et al., 2010). As such, current results derive mainly from studies with men. As all subjects were healthy, it is unclear how subjects with injuries and/or pathologies would respond. Furthermore, only two studies included recreationally trained subjects (West et al., 2014) or athletes (César et al., 2021).
FIGURE 7 | Forest plot denoting 24-h post-exercise delayed onset of muscle soreness (DOMS) in participants that completed passive recovery (control conditions) protocols. Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result. Note: positive values denote that passive recovery protocols did not allow participants to recover their basal DOMS level (i.e., 0.00 in the figure).
FIGURE 8 | Forest plot of changes in 24-h post-exercise delayed onset of muscle soreness (DOMS) after participating in post-exercise stretching protocols compared to control conditions (i.e., passive recovery). Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result.
The nature of the exercise protocols (pre-recovery) presents a number of problems that limit their external validity as well. While most studies used protocols that were likely to induce DOMS, in real-life settings coaches are unlikely to regularly try to elicit DOMS in their athletes or patients. And since most studies did not assess athletes, it is possible that results from the fatigue-inducing protocols were somewhat artificial, as most were conducted with populations not engaged in regular, structured physical activity, and thereby less well-adapted to the acute effects of fatiguing exercise. Lack of familiarity with the protocols may have exacerbated this effect. Moreover, the protocols were single-component or even single-exercise, while real-life exercise sessions will more likely involve multiple components and/or multiple exercises. Also, most of the knowledge derives from studies focusing on the lower limbs, with only one study having assessed the effects on the upper limbs (César et al., 2021).
With one exception (Cooke et al., 2018), the fatigue-inducing protocols had very short durations, usually well below 30 min. A real-life exercise session will hardly last ≤30 min, especially with athletic populations. Conversely, the duration of recovery protocols was excessive in many cases, even reaching 30 min (West et al., 2014; Cooke et al., 2018). The combination of very short exercise sessions with long recovery sessions does not seem practical. Also, six studies (∼55%) used individualized passive stretching. This means that one supervisor is required for every practitioner, something that will hardly be possible to implement in physical education classes, sports training, and even for the general gym-going population (exceptions would be those with access to a personal trainer).
FIGURE 9 | Forest plot denoting 48-h post-exercise delayed onset of muscle soreness (DOMS) in participants that completed post-exercise stretching protocols. Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result. Note: positive values denote that post-exercise stretching protocols did not allow participants to recover their basal DOMS level (i.e., 0.00 in the figure).
FIGURE 10 | Forest plot denoting 48-h post-exercise delayed onset of muscle soreness (DOMS) in participants that completed post-exercise passive recovery. Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result. Note: positive values denote that post-exercise passive recovery did not allow participants to recover their basal DOMS level (i.e., 0.00 in the figure).

Data Is Scarce, Heterogeneous, and Does Not Support Existing Guidelines
Considering that stretching is so often prescribed as a valid protocol for enhancing post-exercise recovery (ACSM, 2018), the reduced number of studies (n = 11) and small overall sample (n = 289) emerging from our searches, allied with a considerable diversity of exercise and post-exercise recovery protocols, demonstrate that data is too scarce and heterogeneous to support existing guidelines. Although absence of evidence is not evidence of absence, world-leading organizations should encourage further research in this field before promoting more definitive recommendations. Recommendations should not be provided in the absence of empirical support. At a minimum, guidelines should acknowledge that prescribing post-exercise stretching as a means of improving recovery is based on belief and not on data. In fact, enhancing recovery implies that recovery is accelerated and/or improved more when post-exercise stretching is applied than when passive recovery (i.e., rest) is used. Our data does not sustain this belief. Indeed, >70% of the analyzed studies had one group performing passive recovery (i.e., rest), and stretching was not shown to improve recovery when compared to those controls. Perhaps the eventual benefits of post-exercise stretching are balanced by the extra fatigue that it adds, although further research is required to better explore the mechanistic phenomena underlying these effects.
We strongly suggest that science should abide by the burden of proof. Until more (and better) data is collected, no case should be built for (or against) post-exercise stretching with the goal of improving recovery. Admittedly, post-exercise stretching may have other goals than improving recovery, but these were not addressed in our analysis.
FIGURE 11 | Forest plot of changes in 48-h post-exercise delayed onset of muscle soreness (DOMS) after participating in post-exercise stretching protocols compared to control conditions (i.e., passive recovery). Values shown are effect sizes (Hedges's g) with 95% confidence intervals (CI). The size of the plotted squares reflects the statistical weight of each study. The black diamond reflects the overall result. Note: positive values denote that post-exercise stretching protocols did not allow participants to recover their basal DOMS level (i.e., 0.00 in the figure).
What's Different in Relation to Previous Systematic Reviews on the Topic?
As mentioned in the introduction, previous SRMAs addressed the topic of post-exercise stretching (Herbert and Gabriel, 2002; Henschke and Lin, 2011; Herbert et al., 2011). However, important differences in design exist in comparison with our review, beyond the natural update: (i) these reviews assessed the effects of both post- and pre-exercise stretching, while we focused solely on post-exercise stretching; (ii) they assessed the effects of stretching on DOMS and risk of injury, while we focused on DOMS, strength levels, and ROM; (iii) they accepted non-randomized studies, while our review was limited to randomized studies; and (iv) we consulted more databases than those reviews. Therefore, it is not surprising that the list of included articles is largely different. Still, our review reinforces previous conclusions that post-exercise stretching does not confer protection from DOMS, while also showing that it neither accelerates nor impairs recovery of strength levels or ROM.

LIMITATIONS
The limited number of studies, the high RoB and high heterogeneity, together with the diversity of designs and poor external validity, advise against more definitive conclusions. Moreover, the included studies prescribed extremely varied stretching intensities, but all relied on vague instructions to convey the intended degree of stretching to the subjects. If stretching intensity is not properly described, any comparisons can be limited (Sands et al., 2013). We believe that stretching intensity could be more rigorously assessed with instruments such as the Stretching Intensity Scale (Freitas et al., 2015).

CONCLUSIONS
Overall, our data neither supports nor contradicts the utilization of post-exercise stretching. Notwithstanding, if post-exercise stretching does not seem to enhance recovery in relation to passive recovery (i.e., rest), the implementation of the former among participants or athletes is, at least, questionable. Still, data is scarce and heterogeneous, and overall confidence in cumulative evidence is very low. For now, recommendations on whether post-exercise stretching should be applied for the purposes of recovery are misleading, as the (insufficient) data that is available does not support those claims.
We suggest that future research on post-exercise recovery always pre-registers the protocol and adopts a randomized design, with proper description of how randomization was performed and whether allocation sequence was concealed. A passive recovery (i.e., rest) control group should always be included. Multi-component exercise sessions lasting ≥60 min, with recovery protocols lasting ≤15 min, would provide greater external validity to the findings. Studies with women and athletes should be reinforced, and studies with children, teenagers, adults ≥40 years, and populations with pathologies and/or injuries are lacking and should be prioritized.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding author.