The Impact of Excluding Nonrandomized Studies From Systematic Reviews in Rare Diseases: “The Example of Meta-Analyses Evaluating the Efficacy and Safety of Enzyme Replacement Therapy in Patients With Mucopolysaccharidosis”

Nonrandomized studies are usually excluded from systematic reviews. This could lead to loss of a considerable amount of information on rare diseases. In this article, we explore the impact of excluding nonrandomized studies on the generalizability of meta-analysis results in mucopolysaccharidosis (MPS). A comprehensive search of systematic reviews on MPS patients up to May 2020 was carried out (CRD42020191217). The primary endpoint was the rate of patients excluded from systematic reviews if only randomized studies were considered. Secondary outcomes included the differences in patient and study characteristics between randomized and nonrandomized studies, the methods used to combine data from studies with different designs, and the number of patients excluded from systematic reviews if case reports were not considered. More than 50% of the patients analyzed were recruited in nonrandomized studies. Patient characteristics, duration of follow-up, and the clinical outcomes evaluated differ between the randomized and nonrandomized studies. There are feasible strategies to combine the data from different randomized and nonrandomized designs. The analyses suggest the relevance of including case reports in systematic reviews, since the smaller the number of patients in the reference population, the larger the selection bias associated with excluding case reports. Our results support including nonrandomized studies in systematic reviews of MPS to increase the representativeness of the results and to avoid a selection bias. The recommendations obtained from this study should be considered when conducting systematic reviews on rare diseases.


INTRODUCTION
The randomized clinical trials provide the strongest evidence regarding the efficacy of new therapeutic interventions (Higgins et al., 2019). Thus, they must be eligible for systematic reviews. On the contrary, nonrandomized designs, in addition to different design features, are variable in their susceptibility to bias (Reeves et al., 2017). Empirical evidence suggests that the observational studies are considered less rigorous, without a preplanned statistical analysis plan, with a higher risk of selection bias, and more affected by confounding than randomized clinical trials (Higgins et al., 2019;Morgan et al., 2019). Furthermore, the systematic review guidelines recommend that both randomized and nonrandomized studies with different design features should be analyzed separately (Higgins et al., 2019). Accordingly, nonrandomized studies and case reports are excluded from systematic reviews and only considered when there are no other design alternatives (Catalá-López et al., 2017;Pérez-López et al., 2017;Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019).
In the field of rare diseases, and specifically in mucopolysaccharidosis (MPS), the low prevalence of the disease makes performing randomized clinical trials extremely difficult. Additionally, clinical trials usually include a more homogeneous population, selected according to predefined characteristics, than clinical practice. Thus, the phenotypic and genotypic heterogeneity of the disease makes it harder to generalize the efficacy results from a clinical trial (Kahan et al., 2015;Frieden, 2017;Hong et al., 2018). Previous systematic reviews have stated that properly conducted observational studies could achieve evidence equivalent to that of randomized clinical trials (Concato et al., 2000;Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019). It is increasingly recognized that subject selection criteria and monitoring conditions of clinical trials usually differ from clinical practice, while prospective studies and case reports are closer to standard clinical practice conditions (Kahan et al., 2015;Hong et al., 2018; on behalf of the ACMG Professional Practice and Guidelines Committee et al., 2020). Therefore, the exclusion of patients from nonrandomized studies and case reports could reduce the generalizability of the meta-analysis results and introduce a selection bias. Moreover, different methods have been proposed to combine information provided by clinical trials, clinical studies, and case reports (Bradley et al., 2017;Dornelles et al., 2017;Pérez-López et al., 2017). Previous systematic reviews on patients with MPS have evaluated the efficacy and safety of enzyme replacement therapies (ERTs). However, the selection strategies among different reviews and meta-analyses are quite heterogeneous. Some included only randomized trials and others allowed for the selection of different types of nonrandomized designs (Schrover et al., 2017;Gomes et al., 2019;Jameson et al., 2019).
Our proposal is to carry out a systematic review of meta-analyses that include randomized or nonrandomized studies in MPS and evaluate the efficacy or safety of ERTs. The aim of the present study is to explore the impact of excluding nonrandomized studies on the generalizability of the meta-analysis results and the probability of selection bias. Additionally, we report the methods used to combine data from different study designs.

Data Sources and Searches
A comprehensive search of systematic reviews containing clinical information of MPS patients up to May 2020 was carried out on MEDLINE, the Cochrane Library (Cochrane Database of Systematic Reviews and protocols), and the International Prospective Register of Systematic Reviews (PROSPERO) databases as well as on the Latin American and Caribbean Literature on Health Sciences (LILACS). The search strategy retrieved citations from the MEDLINE database containing the "All fields" headings: "mucopolysaccharidosis" and "systematic review". In the Cochrane Library, the strategy retrieved citations containing the "complete text" headings: "mucopolysaccharidosis". The search strategy for the PROSPERO database retrieved citations containing the heading: "mucopolysaccharidosis". Finally, the search strategy for the LILACS database retrieved citations containing the "Title, Summary, Issue" headings: "mucopolysaccharidosis" and "systematic review". There was no restriction of dose, treatment duration, administration (via intravenous or intrathecal), type of study design, or language. All published studies up to May 20, 2020, were included in the search.

Study Selection
The original articles of the systematic reviews selected by an electronic search were obtained and reviewed. We selected those systematic reviews meeting the following selection criteria:
Inclusion Criteria
a) Systematic reviews of nonrandomized and/or randomized studies conducted in patients with MPS.
b) Systematic reviews based on the assessment of ERT efficacy and safety.
Exclusion Criteria
a) Nonsystematic reviews.
b) Systematic reviews evaluating pharmacoeconomic endpoints, preclinical evaluations, or incidence of MPS.
c) Systematic reviews evaluating only hematopoietic stem cell transplantation.

Quality Assessment
The study was prospectively designed to describe the methods used in MPS systematic reviews. The current meta-analysis is reported in accordance with the preferred reporting items for systematic reviews and meta-analysis (PRISMA) and meta-analyses and systematic reviews of observational studies (MOOSE) guidelines (Stroup et al., 2000;Liberati et al., 2009;Page and Moher, 2017). The protocol was published in the PROSPERO database (CRD42020191217). Two investigators entered findings into a database, independently reviewed citations/abstracts from the database and hand-searched and selected full relevant articles and documents for data extraction using the abovementioned preset criteria. Discrepancies were resolved through discussion or input from a third reviewer. The contributions of all authors are described in the author's contributions section.

Primary Outcome
The primary outcome is the number of patients from nonrandomized studies (clinical studies and case reports) included in the MPS reviews among all patients included (i.e., patients from both randomized and nonrandomized studies). The number of patients from nonrandomized studies was calculated for each clinical type of MPS. When there was more than one systematic review for an MPS type, we selected the systematic review with more patients included.
We did not sum the patients across systematic reviews because some patients would have been counted more than once. This measure was calculated to estimate the rate of patients excluded from the meta-analyses and qualitative synthesis where only randomized studies have been considered. Statistical guidance has stated that the more missing values there are, the lower the grade of evidence obtained (Little et al., 2012;Madley-Dowd et al., 2019). Patients who were reported in more than one nonrandomized study were counted once (Bradley et al., 2017).
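The selection rule described above can be illustrated with a minimal sketch. The review labels and patient counts below are hypothetical placeholders, not data from this study, and the helper names (`largest_review_per_type`, `excluded_rate`) are introduced only for illustration. For each MPS type, the sketch keeps the systematic review with the most patients and reports the share of its patients that comes from nonrandomized studies.

```python
# Hypothetical counts, NOT study data:
# (MPS type, review label, patients in randomized studies, patients in nonrandomized studies)
reviews = [
    ("I", "review A", 45, 60),
    ("I", "review B", 45, 120),
    ("II", "review C", 96, 100),
]


def largest_review_per_type(rows):
    """Keep, for each MPS type, the review with the most patients (no summing across reviews)."""
    best = {}
    for mps_type, label, n_rand, n_nonrand in rows:
        total = n_rand + n_nonrand
        if mps_type not in best or total > best[mps_type]["total"]:
            best[mps_type] = {"review": label, "total": total, "nonrandomized": n_nonrand}
    return best


def excluded_rate(entry):
    """Share of patients lost if only randomized studies were considered."""
    return entry["nonrandomized"] / entry["total"]


selected = largest_review_per_type(reviews)
rates = {mps_type: excluded_rate(entry) for mps_type, entry in selected.items()}
```

Taking the maximum rather than the sum avoids counting the same patients several times when different reviews cover the same underlying studies.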
However, in some randomized clinical trials, the patient follow-up was extended after completing the study and the placebo arm started to receive ERT, thus switching the study design to a prospective nonrandomized design for the analysis of long-term outcomes. As the study design and the evaluations performed in the randomized and nonrandomized stages of these trials were different, their patients have been counted twice (Wraith et al., 2004;Muenzer et al., 2006;Muenzer et al., 2011;Clarke et al., 2009;STRIVE Investigators et al., 2014;Hendriksz et al., 2016).

Secondary Outcomes
1) We qualitatively described the differences in patients selected and study conduct between randomized and nonrandomized studies included in a systematic review of the same MPS type. We evaluated these differences at four levels: treatment schedule, patients' characteristics, period of follow-up, and outcomes. The recommendations for the management of missing values in clinical trials suggest that the proportion of missing values is not as relevant as the reason and pattern. Thus, it was important to explore if these missing values were at random, or if there were differences in baseline characteristics between patients included and not included in the analyses (Little et al., 2012;Madley-Dowd et al., 2019).
2) We described the methods used to combine data from different study designs.
3) We estimated the number of patients who were included in a systematic review excluding case reports among all patients included in randomized studies, nonrandomized studies, and case reports. This measure calculates the number of patients excluded from the meta-analyses and qualitative synthesis if case reports were not considered. As stated earlier, patients included in both randomized trials and prospective study extensions were counted twice (Wraith et al., 2004;Muenzer et al., 2006;Muenzer et al., 2011;Clarke et al., 2009;STRIVE Investigators et al., 2014;Hendriksz et al., 2016).

Statistical Methods
The primary variable is the rate of patients excluded from the meta-analyses and qualitative synthesis if only randomized studies were to be considered. The primary variable is reported as the percentage of patients for each MPS type. Statistical guidance for single clinical trials has stated that bias is likely in analyses with more than 10% of missing data and that if more than 40% of data are missing in important variables, results should only be considered as hypothesis-generating (Little et al., 2012;Madley-Dowd et al., 2019). Thus, we tested the null hypotheses that the rate of patients excluded when nonrandomized studies are not considered is less than or equal to 10% (the threshold for likely bias) and that it is less than or equal to 40% (the threshold at which evidence is degraded to exploratory). Both null hypotheses were tested with a fixed sequence approach. First, we tested the observed exclusion rate against the 10% null hypothesis and continued with the next analysis (40% null hypothesis) only if the first test reached statistical significance. We based our analysis on a one-sided binomial test (Sampayo-Cordero et al., 2020b). This primary objective was analyzed with a nominal alpha level of 0.025 one-sided (equivalent to 0.05 two-sided). Multiplicity issues derived from analyzing the rate of excluded patients in each MPS type were corrected based on the Bonferroni method. We multiplied the p-values by the number of MPS types analyzed (U.S. Department of Health and Human Services Food and Drug Administration et al., 2017). In addition, we also reported the 95% confidence interval for the rate of patients excluded. For each systematic review selected, we summarized the differences in patients selected and study conduct between randomized and nonrandomized studies, describing them in a narrative format for each MPS type. Additionally, the methods used to combine data from different study designs were reported in the same format.
The rate of patients excluded from the meta-analyses and qualitative synthesis, if case reports were not considered, was also analyzed in accordance with the primary endpoint methods. However, multiplicity adjustment was not performed.
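As an illustration of the testing procedure described in this section, the following minimal sketch implements an exact one-sided binomial test, a Bonferroni adjustment (multiplying each p-value by the number of MPS types), and the fixed sequence over the 10% and 40% null hypotheses. It is not the authors' code: the function names and the patient counts are hypothetical, and the exact binomial tail is computed directly with the standard library as one possible way to carry out such a test.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p0: float) -> float:
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def fixed_sequence_test(k, n, thresholds=(0.10, 0.40), alpha=0.025, n_types=1):
    """Test H0: rate <= threshold in the given order.

    Each p-value is multiplied by n_types (Bonferroni) and capped at 1.
    Testing stops at the first threshold that is not rejected.
    Returns a list of (threshold, adjusted p-value, rejected) tuples.
    """
    results = []
    for p0 in thresholds:
        p_adj = min(1.0, binom_upper_tail(k, n, p0) * n_types)
        rejected = p_adj < alpha
        results.append((p0, p_adj, rejected))
        if not rejected:
            break  # fixed sequence: later hypotheses are not tested
    return results


# Hypothetical example: 60 of 100 patients come from nonrandomized studies,
# and four MPS types are analyzed (Bonferroni factor of 4).
high_rate = fixed_sequence_test(k=60, n=100, n_types=4)

# Hypothetical example: 12 of 100 patients excluded; the 10% null is not rejected,
# so the 40% null is never tested.
low_rate = fixed_sequence_test(k=12, n=100, n_types=4)
```

For the case report analysis, where no multiplicity adjustment was applied, the same sketch would be called with `n_types=1`.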

Data Search Results
Database searches through May 20, 2020, identified 64 citations and 46 unique abstracts. Among the 46 studies identified, six were excluded after the abstract review because they were not MPS studies. A total of 23 out of the 40 communications that underwent full-text review were excluded because they did not evaluate ERT efficacy or safety (N = 16), were not systematic reviews (N = 4), or were pharmacoeconomic studies (N = 3). Thus, 17 reviews that met the inclusion criteria and evaluated ERT efficacy or safety in patients with MPS (N = 6 in type I MPS, N = 6 in type II MPS, N = 1 in type IV-A MPS, N = 3 in type VI MPS, and N = 1 including patients with type I to type VI MPS) were included in the present study. The references of all abstracts screened and the reasons for exclusion are reported in Figure 1; Supplementary Material Table S1. The publication years ranged between 2007 and 2019 (see Table 1).

Primary Outcome
The rate of patients missing from systematic reviews when nonrandomized studies were excluded was significantly higher (p < 0.001) than 40% in all MPS types (Table 1). We observed that at least 50% of MPS patients had been recruited in nonrandomized designs. According to the proposed boundaries for the rate of missing data, the results of MPS systematic reviews excluding nonrandomized studies should not be considered confirmatory but only hypothesis-generating (see Table 1) (Little et al., 2012;Madley-Dowd et al., 2019).

Differences in Patients Included and Trials Conduct Between Randomized and Nonrandomized Studies Selected in Systematic Reviews
The qualitative analysis of patients included in randomized and nonrandomized studies suggested relevant differences in dose treatment schedule, inclusion criteria, extension of follow-up, and outcomes evaluated between the two types of studies. Therefore, there were some types of patients (with less frequent phenotypes, the youngest (<6 years) and oldest patients (>50 years), and special populations (pregnant women, infants, and suitable patients for HSCT and intrathecal therapies)), dose schedules, and clinically relevant outcomes (such as mortality, long-term efficacy, and extended safety profile) that were excluded from randomized trials. Even in the studies evaluating phase III extended follow-up with the same patients included in the randomized and prospective designs (placebo arm received ERT in an extended follow-up), the nonrandomized study provided results about long-term efficacy and safety, and randomized trials excluded these relevant outcomes (see Table 2).
FIGURE 1 | Flow diagram of systematic reviews that evaluated the efficacy and safety of enzyme replacement therapies in mucopolysaccharidosis. MPS, Mucopolysaccharidosis; ERT, Enzyme replacement therapy. a One systematic review included all types of MPS patients to evaluate the pretreatment and posttreatment prevalence and severity (Pal et al., 2015). This systematic review did not include outcomes of ERT for MPSIII and MPS IV. b We included a congress communication evaluating enzyme replacement therapy safety in MPS II (Almeida et al., 2018). c This systematic review only includes patients with Morquio A syndrome.

Methods Used to Combine Data From Different Study Designs in Each Systematic Review Selected
There were four types of strategies used to combine results from studies of different designs in systematic reviews (see Table 3):
1) The first method was to select studies with the same design and to exclude the other designs (only randomized trials or only prospective nonrandomized studies). The excluded studies were considered to provide a low level of evidence or to be inappropriate for the research question (El Dib and Pastores, 2007;El Dib, 2009;Brunelli et al., 2016;da Silva et al., 2016;Jameson et al., 2019).
2) The second method was to summarize the results of the studies with different designs and discuss the results qualitatively (Alegra et al., 2013;Pal et al., 2015;Schrover et al., 2017;Gomes et al., 2019).
3) The third method was to combine the results of studies with different designs qualitatively, based on a method to assess the strength of the evidence (very low, low, moderate, and high) for each outcome. This approach allows the use of a reproducible methodology and the ranking of each outcome in accordance with its strength of evidence (Bradley et al., 2017;Dornelles et al., 2017;Pérez-López et al., 2017;Perez-Lopez et al., 2018).
4) The fourth method was to define two improvement criteria: one for individual patients included in the meta-analyses (e.g., increase in the 6-min walk test [6-MWT] over the baseline assessment) and another for the specific outcome or group of patients evaluated (e.g., a significant difference versus a 5% null hypothesis in the rate of patients with improvement in the walk test) (Almeida et al., 2018;Sampayo-Cordero et al., 2018;Kuiper et al., 2019;Sampayo-Cordero et al., 2019).

Rate of Patients Excluded From Systematic Reviews if Case Reports Were Excluded
The rate of patients missing from systematic reviews when case reports were excluded was neither statistically higher than 40% nor higher than 10%. However, the rate of patients tended to be statistically higher than 10% in two studies. Importantly, the systematic reviews that aimed to assess ERT outcomes in the overall population showed the lowest rate of patients excluded (around 3.3%) (Bradley et al., 2017;Gomes et al., 2019;Sampayo-Cordero et al., 2019). In contrast, the systematic reviews targeting a specific patient subgroup (ERT initiation in adult age) showed the highest rate of patients excluded (around 16%) (Pérez-López et al., 2017; Pérez-López et al., 2018). So, the exclusion of case reports is more likely to introduce a bias in subgroup analyses than in the results of the whole population (where the number of assessable patients is higher than in subgroup analyses) (see Table 4).
Max: The maximum number of patients included in a systematic review for this MPS type. We selected this method of aggregation over sum or mean, because different systematic reviews compare the same patients and the same evaluations.
a The studies included a randomized phase III trial and its follow-up extension. The placebo arm started to receive enzyme replacement therapy and thus, the study switched to a prospective nonrandomized design for the long-term outcomes. As the study designs and evaluations performed were different, the information provided for the same patient was different. So, these patients have been considered twice; as participants of the randomized studies and as participants of the nonrandomized studies (21-26).
b According to the study objectives, only adult patients were considered (≥18 years).
c According to the study objectives, only adult patients were considered (≥16 years).
Frontiers in Molecular Biosciences | www.frontiersin.org June 2021 | Volume 8 | Article 690615

DISCUSSION
(2015): They included different types of study designs and mucopolysaccharidosis. They combine in each mucopolysaccharidosis the results of the same outcomes and types of designs in a meta-analysis. The noncombined results were summarized and discussed.
Type I
El Dib and Pastores (2007): They selected one randomized study. They did not combine results from different study designs.
Dornelles et al. (2017): 1) They combined the median change to baseline or the incidence of some events. Comparisons between experimental and control arms were not considered. 2) Qualitative analyses included all the study designs selected. They used GRADE criteria (Guyatt et al. (2008),
They selected two studies, the phase III clinical trial and their extension. They summarized the results from these studies without combining the results. Additionally, the authors calculated the minimum clinically important difference in 6-MWT. They calculated it combining the results in a variety of diseases.
Type VI
El Dib (2009): They combined results from 2 randomized clinical trials. They did not select other study designs.
Brunelli et al. (2016): They selected one randomized study. They did not combine results from different study designs.
Gomes et al. (2019): The effectiveness of ERT was identified by a qualitative assessment of each study report. The outcomes were classified as primary and substitutive according to the authors' assessment.
MPS, mucopolysaccharidosis; ERT, enzyme replacement therapy; GRADE, grading of recommendations assessment, development and evaluation; HSCT, hematopoietic stem cell transplant; 6-MWT, 6-min walk test.
Low clinical trial accrual of patients with rare diseases is an important constraint in evidence-based clinical practice. Additionally, the phenotypic and genotypic heterogeneity also produces a greater fragmentation of the disease and makes it harder to generalize the results from a single study (Bradley et al., 2017;Dornelles et al., 2017;Rath et al., 2017).
Therefore, systematic reviews on rare diseases represent a very useful tool for the medical community to obtain maximum information from published data and identify areas for improvement (Pérez-López et al., 2017). Over recent years, our understanding of complex, heterogeneous, highly prevalent diseases has increased considerably due to significant improvements in molecular and genetic medicine. At the same time, personalized medicine has fragmented complex diseases into multiple molecular subtypes, each one representing a rare disease (Bartlett and Parelukar, 2017;Jardim et al., 2017;Klein and Gahl, 2018). Thus, the research methods derived from rare diseases and the strategies to integrate multiple heterogeneous data from the literature are highly relevant (Schork, 2015;Pérez-López et al., 2017;Klein and Gahl, 2018;Sampayo-Cordero et al., 2019). One of the most important items when conducting a systematic review is the assessment of the validity of the included studies according to stringent criteria. Reviews usually emphasize the risk of bias in their results. A common practice is to exclude nonrandomized and nonblinded designs (Higgins et al., 2019). However, our results suggest that more than 50% of available patients in the field of MPS are recruited in nonrandomized studies. Methodologies for managing missing data suggest that this rate of missing values (>40%) in a typical clinical trial allows results to be used only as hypothesis generators (Little et al., 2012;Madley-Dowd et al., 2019). It could be argued that missingness in a variable collected in a clinical trial is a different concept from the exclusion of patients from nonrandomized studies in systematic reviews. But randomized clinical trials usually provide the strongest methodology to evaluate the efficacy of an intervention (Concato et al., 2000;Higgins et al., 2019). So, it is expected that missingness rates higher than 40% could also bias the results of a systematic review.
However, we also considered that these results alone cannot be indicative of bias. Methods for missing data management have stated the relevance of the source of missing data (Little et al., 2012;Madley-Dowd et al., 2019). So, we also examined, and observed, differences in the baseline characteristics of patients, the conduct of the study, and the outcomes evaluated between randomized and nonrandomized studies. The randomized trials usually recruit pediatric patients (Wraith et al., 2004;Muenzer et al., 2006). Importantly, the clinical manifestations of MPS in adulthood are different from those in pediatric patients (Lampe et al., 2019b;Stepien et al., 2020). So, the exclusion of nonrandomized studies prevents the assessment of the efficacy and safety of ERT in adulthood. Accordingly, previous studies stated that the selection criteria and monitoring conditions of clinical trials usually differed from clinical practice. In contrast, prospective studies and case reports are the closest to standard clinical practice conditions (Kahan et al., 2015;Hong et al., 2018; on behalf of the ACMG Professional Practice and Guidelines Committee et al., 2020). In short, excluding nonrandomized studies from a systematic review leads to the exclusion of >50% of available patients and most of the data reported about the adult population, alternative treatment schedules, mortality rates, long-term efficacy, and long-term safety.
In addition, some sections of the Cochrane guidelines recommend including only studies with high evidence that "can estimate causality with minimal risk of bias except to examine the case for performing a randomized trial by describing the weakness of the available evidence" (Higgins et al., 2019). In the same direction, some methodologists do not accept that excluding nonrandomized studies from a systematic review results in a selection bias. They argue that "a biased effect estimate from a systematic review may be more harmful to future patients than no estimate at all, particularly if the people using the evidence to make decisions are unaware of its limitations" (Peto et al., 1995;Higgins et al., 2019). However, this recommendation rests on three important false assumptions.
First, it is assumed that a clinical trial is possible for any research question. For example, the long-term effects and mortality of an intervention that has improved the course of the disease cannot be assessed using a placebo control. Therefore, no new clinical trials would be performed regardless of how much emphasis is placed on successive reviews (Higgins et al., 2019).
Second, it is assumed that the lack of epidemiological data is not harmful. However, without published evidence, physicians would be guided by their intuition and their clinical experience acquired from treating a limited number of patients. So, trials with a large sample size will always have a lower risk of bias than trials with a limited number of subjects (Guyatt et al., 2008;Guyatt et al., 2011;Morgan et al., 2019). Finally, this recommendation assumes that investigators should only be informed of facts where causality can be clearly established. The clinician deals with the patient and relies to a large extent on case reports, especially in rare and ultrarare diseases. Accordingly, there is an established tendency to report the available information with its degree of evidence, rather than not reporting it. The role of nonrandomized data in medical counseling cannot be underestimated (Scarpa et al., 2011;Morgan et al., 2019;Cardoso et al., 2020).
The studies included a randomized phase III trial and its follow-up extension. The placebo arm started to receive enzyme replacement therapy. So, the study switched to a prospective nonrandomized design for the long-term outcomes. As the study designs and evaluations performed were different, the information provided for the same patient was different. So, these patients have been considered twice, as participants of the randomized studies and as participants of the nonrandomized studies.
Overall, we consider that excluding most of the data reported about the adult population, alternative treatment schedules, mortality rates, long-term efficacy, and long-term safety of a chronic treatment would produce a selection bias that could be easily quantified. Specifically, if we excluded the findings of nonrandomized studies and "low-quality" trials, only a single clinical trial, published in 2004, could be considered. This means ignoring the clinical research done on this disease in the last 16 years (Jameson et al., 2019).
HSCT represents another form of ERT. HSCT has been accepted as therapy for the severe form of MPS I early in the course of the disease. However, there is not a single RCT on HSCT in the field of MPS. Nonrandomized data have served to establish a consensus on the best target population and the most appropriate treatment strategy for MPS I patients (Scarpa et al., 2011). The evidence for these recommendations did not carry the same weight as if they were based on randomized clinical trials. However, these recommendations have a higher degree of evidence than the individual beliefs of a single investigator, and they were of great utility in summarizing the available knowledge to guide clinical practice and future investigations (Scarpa et al., 2011;Barth and Horovitz, 2018). Nowadays, HSCT is being reconsidered as therapy in other forms of neuronopathic MPS after the development of new treatment techniques and the creation of umbilical cord banks and bone marrow donor registries (Barth and Horovitz, 2018).
Accordingly, the Cochrane guidelines accept the inclusion of nonrandomized study designs in systematic reviews under some constraints: "i) when randomized trials are unable to address the effects of the intervention on harm and long-term outcomes or in specific populations or settings; or ii) for interventions that cannot be randomized (e.g., policy change introduced in a single or small number of jurisdictions)" (Higgins et al., 2019).
In addition, the Cochrane guidelines recognize that there is no general strategy for deciding which nonrandomized studies will be included in a systematic review (section 24.2.1.3). As a possible strategy, the authors suggest including either nonrandomized studies with the best available designs or nonrandomized studies with a strong design. As the authors suggest, and according to our experience of investigating the long-term outcomes in MPS, the latter strategy may lead to the exclusion of all available studies. So, they recommend giving greater emphasis to the choice of the included studies (Higgins et al., 2019).
These results highlight the relevance of rare disease registries as valuable sources of information for systematic reviews (Montaño et al., 2007;Beck et al., 2014;Wood et al., 2014;Muenzer et al., 2017a, 2017b;Lampe et al., 2019a). Although the methodology presents a higher risk of bias than clinical trials, a global registry is the best alternative for evaluating the long-term efficacy of ERTs and the survival of patients treated with these therapies (Burton et al., 2017). In the absence of data from other designs, it is reasonable to consider these data in the systematic reviews of rare diseases, with an assessment of the methodological flags and possible bias (Bradley et al., 2017;Wikman-Jorgensen et al., 2020).
Case reports are usually not included in systematic reviews, unless the study indication is an ultrarare disease with only case reports available (Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019). However, our results suggest that the smaller the number of patients in the reference population, the larger the selection bias associated with excluding case reports (Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019). Accordingly, the selection of case reports should be considered in an MPS systematic review if subgroup analyses are planned (Pérez-López et al., 2017;Pérez-López et al., 2018;Sampayo-Cordero et al., 2018). In addition, we observed that some relevant information about treatment management was usually reported in case reports (Lampe et al., 2019b;Stepien et al., 2020), and previous publications have stated the relevance of case reports for proposing clinical novelties (Nakamura et al., 2014;Sampayo-Cordero et al., 2020a).
On the other hand, including nonrandomized studies and case reports raises important questions. Nonrandomized studies have commonly been associated with biased effect estimates, low scientific evidence, and low quality (Morgan et al., 2019;Higgins et al., 2019). Additionally, the data and methods of analysis of randomized and nonrandomized studies are quite different, which makes it difficult to combine their results (Higgins et al., 2019). However, previous studies have stated that properly conducted observational studies and aggregations of case reports could reach conclusions equivalent to those of randomized clinical trials and meta-analyses of prospective studies (Concato et al., 2000;Frieden, 2017;Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019). Methods to grade the strength of evidence allow the results of nonrandomized studies to be incorporated into the data analysis and discussion (Guyatt et al., 2008;Guyatt et al., 2011;Morgan et al., 2019). Accordingly, we observed that combining results from different study designs is feasible, provided that the strength of the evidence and the possibility of bias are communicated for each recommendation (Dornelles et al., 2017;Kuiper et al., 2019). This is the only way to contrast the short-term efficacy and safety results of clinical trials with the long-term outcomes (Bradley et al., 2017), mortality rates (Burton et al., 2017), alternative treatment schedules (Pérez-López et al., 2017;Sampayo-Cordero et al., 2019), or patients with uncommon phenotypes or clinical characteristics reported in other study types (Pérez-López et al., 2017;Pérez-López et al., 2018).
Our findings describe four strategies to combine results from different study designs in systematic reviews. The first strategy was to exclude studies with different designs (El Dib and Pastores, 2007;El Dib, 2009;Brunelli et al., 2016;da Silva et al., 2016;Jameson et al., 2019); the second was to summarize the results and discuss them qualitatively (Alegra et al., 2013;Pal et al., 2015;Schrover et al., 2017;Gomes et al., 2019); the third was to combine the results of studies using a method that assesses the strength of the evidence (very low, low, moderate, and high) for each outcome (GRADE) (Bradley et al., 2017;Dornelles et al., 2017;Pérez-López et al., 2017;Pérez-López et al., 2018); and the last was to rank individual patients in each study with an improvement or deterioration criterion and combine them quantitatively (Almeida et al., 2018;Sampayo-Cordero et al., 2018;Kuiper et al., 2019;Sampayo-Cordero et al., 2019). Importantly, previous reviews have stated that the ranking and aggregation of these patients' results should be guided by the quality assessment of the selected studies and by the standardization and clear definition of the outcomes evaluated (Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019).
The second and third strategies are based on a qualitative interpretation of the results from different study designs. However, there is an important difference between the two methods. The second method proposes to summarize all study results and discuss them without a predefined criterion to combine them. Thus, the interpretation of these results is equivalent to a conventional literature review, although the bibliographic search could be reproducible and more reliable (second strategy) (Alegra et al., 2013;Pal et al., 2015;Schrover et al., 2017;Gomes et al., 2019). The third method proposes to evaluate the strength of evidence for each relevant outcome based on a predefined and widely accepted criterion stated in the protocol. Usually, the results are reported as a qualitative score for each outcome (invaluable, very low, low, moderate, and high evidence grade) (Guyatt et al., 2011;Morgan et al., 2019). The interpretation of the results and the conclusions are therefore more reproducible and evidence-based. This method could be considered a qualitative meta-analysis (third strategy) (Bradley et al., 2017;Dornelles et al., 2017;Pérez-López et al., 2017;Pérez-López et al., 2018). In accordance with our results, the first strategy can lead to biased and poorly generalizable results in the context of rare diseases. Therefore, its results could only be considered as hypothesis generators (exploratory). The second strategy interprets the results without a preplanned criterion, so its conclusions should also be considered exploratory. The third and fourth strategies conduct the analyses with a preplanned and reproducible method. Therefore, we consider them more reliable sources from which to draw conclusions about the data. It is important to consider that all four methods presented in this review assume a quality review of the selected studies (Dornelles et al., 2017;Pérez-López et al., 2017;Jameson et al., 2019;Sampayo-Cordero et al., 2019).
The first method uses this quality review to exclude nonrandomized studies and low-quality clinical trials (Jameson et al., 2019). The second method comments on differences between studies and sources of bias in the results or discussion (Dornelles et al., 2017). The third and fourth methods report this quality evaluation in the results section (Pérez-López et al., 2017;Pérez-López et al., 2018;Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019). Thus, these combination methods do not propose to pool results from randomized and nonrandomized trials without carefully evaluating the study quality and without awareness of the potential bias in efficacy estimates. On the contrary, these methods propose to evaluate both the quality and the heterogeneity of the data. In the third and fourth methods, the authors proposed to report separate estimates from randomized and nonrandomized studies in order to assess whether the results could be combined (Morgan et al., 2019;Higgins et al., 2019).
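As an illustration of reporting separate estimates by design before deciding whether to combine them, the following minimal sketch computes a crude pooled proportion of improved patients for each design. All study counts are hypothetical, the function name `pooled_by_design` is ours, and the simple count-summing with a normal-approximation interval is an assumption made for brevity; it is not the method used in the cited reviews and it does not replace a weighted meta-analysis or a formal heterogeneity assessment.

```python
from math import sqrt

# Hypothetical per-study counts: (design, patients improved, patients evaluated).
studies = [
    ("randomized", 18, 30),
    ("randomized", 12, 20),
    ("nonrandomized", 9, 20),
    ("nonrandomized", 21, 40),
]


def pooled_by_design(rows):
    """Sum improved/evaluated counts within each design and return, per design,
    the crude pooled proportion with a normal-approximation 95% interval."""
    totals = {}
    for design, improved, evaluated in rows:
        imp, n = totals.get(design, (0, 0))
        totals[design] = (imp + improved, n + evaluated)

    summary = {}
    for design, (imp, n) in totals.items():
        p = imp / n
        half_width = 1.96 * sqrt(p * (1 - p) / n)
        # Clip the interval to the valid range for a proportion.
        summary[design] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return summary


summary = pooled_by_design(studies)
for design, (p, low, high) in summary.items():
    print(f"{design}: {p:.2f} (95% CI {low:.2f} to {high:.2f})")
```

If the two intervals overlap widely, combining the designs may be more defensible; a clear separation would argue for reporting them apart and discussing the possible sources of bias.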
Additionally, conclusions from these studies should not be considered confirmatory without critical appraisal. The primary source of the data, the proper conduct of the systematic review, and the specific research context should be considered (Higgins et al., 2019).
As is the case with nonrandomized studies, methods to assess the quality of case reports, to aggregate their results, to analyze the heterogeneity, and to assess publication bias have also been described (Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019). Guidelines to homogenize and improve the quality of case reports have also been published (Gagnier et al., 2013), and methods have been proposed to develop a retrieval system that, given one case report, finds other case reports with the same or very similar main findings (Luo et al., 2020).
To analyze the rate of patients excluded when nonrandomized studies are not included, we relied on previous systematic reviews to select the individual clinical studies analyzed in this communication. We consider that this strategy provides two important quality control measures. First, the searches and study selection are less affected by the authors' assumptions. Second, we included communications whose quality was high enough for them to be selected in previously published systematic reviews. However, an important limitation is that new clinical studies, which were not included in these systematic reviews, are not considered. It is important to state that the most recent publication selected was from 2019 (Gomes et al., 2019;Sampayo-Cordero et al., 2019) and the oldest was from 2007 (El Dib, 2009). In addition, a very recent article in press describing a systematic review and meta-analysis of ERT for the treatment of Hunter disease reported results equivalent to those presented in this study. Most of the studies selected were nonrandomized, and some important outcomes such as mortality were only analyzed in prospective observational studies (Wikman-Jorgensen et al., 2020).
An important point in our strategy for data analysis is that the patients analyzed in randomized clinical trials and included in nonrandomized follow-up extensions have been counted twice, as participants of randomized and of nonrandomized trials (Wraith et al., 2004;Muenzer et al., 2006;Muenzer et al., 2011;Clarke et al., 2009;STRIVE Investigators et al., 2014;Hendriksz et al., 2016). This strategy makes sense because we are not evaluating the clinical response of each patient; we are evaluating the amount of information provided by each design. The fact that they are the same patients is irrelevant, since the design of the studies, the outcomes evaluated, and the information provided were different.
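The counting rule described above can be sketched with a short example. The study names and patient numbers below are hypothetical and chosen only to show the arithmetic: patients in an open-label extension are counted again under the nonrandomized design, so the denominator is the amount of information contributed by each design rather than the number of unique individuals.

```python
# Hypothetical studies: (study id, design, number of patients analyzed).
studies = [
    ("trial_A", "randomized", 45),
    # Same patients as trial_A, followed up in an open-label extension;
    # they are deliberately counted again under the nonrandomized design.
    ("trial_A_extension", "nonrandomized", 40),
    ("registry_B", "nonrandomized", 120),
    ("case_series_C", "nonrandomized", 15),
    ("trial_D", "randomized", 96),
]


def patients_by_design(rows):
    """Return the total number of patients analyzed under each design."""
    totals = {}
    for _study_id, design, n_patients in rows:
        totals[design] = totals.get(design, 0) + n_patients
    return totals


totals = patients_by_design(studies)
total_patients = sum(totals.values())

# Share of analyzed patients that would be lost if only randomized
# studies were eligible for the systematic review.
excluded_rate = totals["nonrandomized"] / total_patients
print(f"Excluded if only randomized studies are kept: {excluded_rate:.1%}")
```

With these illustrative numbers, 175 of the 316 patient records come from nonrandomized designs, so slightly more than half of the available information would be discarded by a randomized-only review.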
It is usually considered that a MEDLINE search alone is not adequate for answering relevant questions (Porter et al., 2020;Higgins et al., 2019). Accordingly, we have increased the scope of our search with the Cochrane and PROSPERO databases, since they are the preferred databases for publishing systematic review protocols. The publication of the protocol prior to the analysis is a key recommendation from PRISMA (Stroup et al., 2000;Liberati et al., 2009). This strategy has allowed us to find relevant congress communications (Almeida et al., 2018). Additionally, searches in the LILACS database are increasingly common in systematic reviews (Membrive-Jiménez et al., 2020;Suleiman-Martos et al., 2020). The LILACS database has previously demonstrated its utility in contributing unique content to systematic reviews, since most of its indexed journals are not indexed in other databases (Clark and Castro, 2001;Clark and Castro, 2002). Accordingly, we found in our study additional systematic reviews in LILACS that were not indexed in other databases (Pal et al., 2015). This was also the case for previous studies performed on MPS and other indications (Pérez-López et al., 2017;Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019;Mösges et al., 2019). Our study only included systematic reviews of MPS patients. It is important to note that the differences between randomized and nonrandomized studies observed in our analyses have also been described in other highly prevalent and rare diseases. In addition, research in ultrarare genetic diseases is usually based on registry studies, analyses of public databases, and communication of case reports. Some ultrarare diseases have a proportion of nonrandomized studies higher than that in our study (Nakamura et al., 2014;Kahan et al., 2015;Hong et al., 2018;Jansen-van der Weide et al., 2018; on behalf of the ACMG Professional Practice and Guidelines Committee et al., 2020).
Therefore, the recommendations obtained from this meta-analysis could be extended to other rare diseases.

CONCLUSION
More than 50% of the patients analyzed in MPS publications have been recruited in nonrandomized studies (Bradley et al., 2017;Dornelles et al., 2017). The baseline characteristics of the recruited patients, the duration of follow-up, and the clinical outcomes evaluated differed between the randomized and nonrandomized studies (El Dib, 2009;Brunelli et al., 2016;Gomes et al., 2019). Therefore, our findings support including nonrandomized studies in the systematic reviews of MPS to increase the representativeness of the results and avoid a selection bias. Despite the difficulties in analyzing the results of different designs together, previous studies have proposed multiple methods to combine them (Bradley et al., 2017;Pérez-López et al., 2017;Sampayo-Cordero et al., 2018;Kuiper et al., 2019). Additionally, our results suggest the relevance of including case reports in a systematic review, since the smaller the number of patients in the reference population, the larger the selection bias associated with excluding case reports. Therefore, the selection of case reports should be considered in an MPS systematic review if subgroup analyses are planned (Sampayo-Cordero et al., 2018;Sampayo-Cordero et al., 2019). In addition, the differences between randomized and nonrandomized studies observed in our analyses are observed in other diseases as well (Concato et al., 2000;Liberati et al., 2009;Frieden, 2017;Hong et al., 2018). Therefore, the recommendations obtained from this study should be considered when conducting systematic reviews on rare diseases.

DATA AVAILABILITY STATEMENT
All data generated or analyzed during this study are included in this published article and its information files.

AUTHOR CONTRIBUTIONS
MSC and JP-L conceived the idea. MSC, BM-H, and JP-L developed procedures for paper selection and data extraction. AM, JMP-G, AL-C, JC, and AP reviewed these procedures. MSC, BM-H, AM, JMP-G, AL-C, JC, AP, and JP-L coordinated, performed, and reviewed paper selection and data extraction. MSC and BM-H developed the statistical analysis. AM, JMP-G, AL-C, JC, AP, and JP-L reviewed statistical outputs. MSC, AM, AP, and JMP-G drafted the manuscript. All authors critically reviewed and made important intellectual contributions to this manuscript. All authors read and approved the final manuscript.