Procedural and Methodological Quality in Preclinical Stroke Research–A Cohort Analysis of the Rat MCAO Model Comparing Periods Before and After the Publication of STAIR/ARRIVE

The translation of preclinical stroke research into successful human clinical trials remains a challenging task. The first Stroke Therapy Academic Industry Roundtable (STAIR) recommendations for preclinical research and several other guidelines were published to address these challenges. Most guidelines recommend physiological monitoring to detect undesired pathologies such as subarachnoid hemorrhage and to limit the variability of the infarct volume, thereby homogenizing the experimental result; they also call for complete reporting, particularly with respect to transparency and methodological rigor. For each of the years 2009 and 2019, 100 published articles using a rat stroke model were analyzed to quantify parameters related to anesthesia, physiological monitoring, stroke model type, ischemia verification, and overall study quality over time. The frequency of cerebral blood flow (CBF) measurements did not differ significantly between the two years (28% in 2009 vs. 34% in 2019). Notably, significantly fewer studies reported temperature, blood pressure, and blood gas monitoring data in 2019 than in 2009. On the other hand, an increase in general study quality parameters (e.g., randomization, reporting of approval) was seen. In conclusion, the frequency of periinterventional monitoring has decreased over time, whereas some general aspects of methodological quality have improved. CBF measurement, the gold standard for ischemia verification, was rarely applied. Despite the growing recognition of current guidelines such as STAIR and ARRIVE (both widely approved in 2019), reporting, methods, and procedures mostly do not follow these guidelines. These deficits may contribute to the translational failure of preclinical stroke research in search for neuroprotective therapies.


INTRODUCTION
Stroke is the second most common cause of death and adult disability worldwide (1). Globally, 5.5 million people of all ages and both sexes die from stroke annually (2). According to the 2019 Global Burden of Disease Study, stroke remains the second leading cause of global disability-adjusted life-years (DALYs) in patients over age 50 since 1990 (3). Even though there has been a substantial decline in stroke age-standardized DALY rates since 1990 (3), the development of safe and effective treatments is still a major challenge for experimental and clinical neuroscience.
Preclinical stroke research has contributed substantially to a deeper pathophysiological understanding of stroke (4). Further, because of the complexity of the disease, which includes multiple interactions between different organs (such as brain-heart interactions (5)) as well as the influence of various systems such as the immune system, which shows a pronounced reaction after ischemic stroke (6), preclinical stroke research constitutes an important pillar of stroke research. However, although many experimental stroke treatments, particularly neuroprotective agents, reduce infarct size and improve clinical presentation in animal models, most ultimately fail when translated into clinical trials (7). As a consequence, both traditional animal models per se and the documentation/reporting of preclinical stroke studies have to be reviewed.
From a methodological point of view, during the last decades several reforms mostly in terms of guidelines have been made: Probably the best known guideline in this field is the Stroke Therapy Academic Industry Roundtable (STAIR) guideline published in 1999 (8) and updated in 2009 (9). In the following years, numerous other guidelines for preclinical stroke trials were presented (10)(11)(12)(13). Further, the ARRIVE criteria (Animal Research: Reporting of in vivo Experiments) depicting general recommendations to improve the reporting of research involving animals were published in 2010 and updated in 2020 (14,15). Besides, some journals such as Stroke provide a Basic Science Checklist requesting details on methodological quality such as on the randomization and blinding procedures, definition of inclusion and exclusion criteria etc., which may increase transparent reporting (16).
Undoubtedly, there are numerous potential reasons for translational failure of preclinical stroke research (particularly with respect to neuroprotectants) conditioned by the experimental setting itself, such as: Generation of a plausible hypothesis, methodological quality of study planning, adequate performance and surveillance of the experimental procedure, objective (ideally blinded) analyses of study results, and full and transparent reporting.
Thus, the experimental procedure itself, or rather the appropriateness of its performance, may contribute to translational success or failure. With respect to the most common experimental setting in preclinical stroke research in rats, middle cerebral artery occlusion (MCAO), previous publications have highlighted the need for appropriate ischemia verification using methods such as cerebral blood flow (CBF) measurement (17)(18)(19)(20)(21)(22). Model-immanent confounders and complications (such as inadvertent induction of subarachnoid hemorrhage, SAH, or insufficient MCAO) may not be prevented by additional monitoring. However, tools like CBF measurement may allow immediate detection of the experimental result, including the occurrence of undesired pathologies such as SAH. This is essential because, when the MCAO model is applied, induction of SAH instead of or in parallel to ischemia is a common phenomenon (affecting up to 30% of the experiments) (17). Thus, results may be biased by inadequate modeling of the intended pathology. As Philip and colleagues pointed out, ". . . the reliability of the model to induce ischemia and reproducibly cause infarction. . . " is hampered by a lack of CBF monitoring (23). Therefore, it is also essential to question the performance of disease models in order to allow adequate interpretation and classification of the results.
Thus, the aim of the study was to document the performance of the rat MCAO procedure over time representing the most commonly used stroke model besides the murine MCAO procedure. Considering the 2009 STAIR update as well as the publication of the ARRIVE criteria in 2010, we compared the years 2009 vs. 2019 (analyzing a sample of 100 original articles each year) with particular respect to periinterventional monitoring and methods of ischemia verification (focusing on CBF measurement). Further, aspects of methodological quality (such as sample size calculation) were evaluated over time (before and after the public awareness of STAIR and ARRIVE). Thus, the results will provide an overview of methodological and periinterventional/procedural quality control of studies applying the intraluminal rat stroke model over time. An additional quality score to assess methodological and procedural aspects is provided. Hence, conclusions may be drawn regarding the relevance of the experimental results and failure of translation may be detected due to inaccurate modeling or inappropriate methodology.

MATERIALS AND METHODS
This review followed the STROBE guidelines (Strengthening the Reporting of Observational Studies in Epidemiology) (24). Our study is exempt from ethics approval because we collected and processed data from previous animal studies in which ethics approval has already been obtained.

Search Strategy
Literature research was conducted on 09 August 2020 on the MEDLINE database via PubMed using the search strategy: ((((tMCAO) OR (transient middle cerebral artery occlusion)) OR (middle cerebral artery occlusion)) OR (MCAO)) AND (rat). A time filter was applied to the search results to select only publications from 2009 and 2019, and the results for each year were sorted in ascending order according to their publication date. The results of both years were screened with regard to our inclusion criteria. Only articles that met the following criteria were included:
• Written in English.
• Original research article.
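The screening procedure described above (sort each year's hits by ascending publication date, then include eligible articles until the sample is complete) can be sketched as follows. This is a minimal illustration, not the authors' actual tooling: the `Record` class, its field names, and the `select_sample` helper are hypothetical, and only the two inclusion criteria listed above are modeled. The target of 100 articles per year is taken from the study design.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal view of one PubMed hit (field names are assumptions)."""
    pmid: str
    published: date
    in_english: bool
    original_research: bool


def is_eligible(rec: Record) -> bool:
    # Only the two criteria listed in the text are modeled here.
    return rec.in_english and rec.original_research


def select_sample(records: Iterable[Record], year: int, target: int = 100) -> List[Record]:
    """Screen one year's hits in ascending date order until `target` articles are included."""
    hits = sorted(
        (r for r in records if r.published.year == year),
        key=lambda r: r.published,
    )
    sample: List[Record] = []
    for rec in hits:
        if len(sample) == target:
            break  # sample complete, remaining hits are not screened
        if is_eligible(rec):
            sample.append(rec)
    return sample
```

Because the hits are consumed in date order, the resulting sample is weighted toward articles published early in each year, which is relevant to the sampling limitation discussed later in the article.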

Data Extraction
The first author (JF) extracted the data, which were validated by the last author (AH). Data on the parameters listed in Table 2 were taken from each study. If no information on a parameter was available in the article or its supplementary material, the parameter was documented as "not reported." It was also documented as "not reported" if only a reference was provided without further information. Laser Doppler measurement was considered the standard measurement. Other methods potentially suitable for ischemia verification were only recorded when each animal included in the study was subjected to at least one of them. If only a proportion of the animals was analyzed, the variable "other potential suitable methods" was assessed as "none." The factor "a priori sample size calculation" was recorded as "not applicable" in the case of an exploratory study. If a study did not include different groups, "randomization" was assessed as "not applicable." Neurological assessments were only considered a potentially suitable method for ischemia verification if no treatment had been performed previously (as treatment is usually supposed to alter or improve the neurostatus and may thus mask the induced neurological deficit).
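The coding rules in this paragraph can be written down as a few small functions. The sketch below is an illustration only: all function and parameter names are hypothetical, the label "reported" for a documented item is an assumption, and the actual extraction sheet used in the study is not described in the text.

```python
from typing import Optional

NOT_REPORTED = "not reported"
NOT_APPLICABLE = "not applicable"


def code_parameter(value: Optional[str], reference_only: bool = False) -> str:
    """Missing information, or a bare reference without details, counts as not reported."""
    if value is None or not value.strip() or reference_only:
        return NOT_REPORTED
    return value.strip()


def code_sample_size_calculation(reported: bool, exploratory: bool) -> str:
    """Exploratory studies are coded as not applicable."""
    if exploratory:
        return NOT_APPLICABLE
    return "reported" if reported else NOT_REPORTED


def code_randomization(reported: bool, has_groups: bool) -> str:
    """Studies without different groups are coded as not applicable."""
    if not has_groups:
        return NOT_APPLICABLE
    return "reported" if reported else NOT_REPORTED


def code_other_verification(method: Optional[str], all_animals: bool,
                            is_neuroscore: bool = False,
                            treated_before: bool = False) -> str:
    """Record an alternative verification method only if it covered every animal.

    A neurological assessment additionally counts only if no treatment preceded it.
    """
    if not method or not all_animals:
        return "none"
    if is_neuroscore and treated_before:
        return "none"
    return method
```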
Subsequently, the impact factor of the journal in which the article was published was determined for the year of publication via Web of Science.

Analysis
We defined a quality score that includes information on anesthesia monitoring, ischemia verification, and general quality criteria (Table 1) to analyze study quality. Category 1 comprises five items with a maximum score of 5. The parameter "ventilation" was not included due to the guidance that "unnecessary use of mechanical ventilation should be avoided when a particular MCAO model is not likely to cause respiratory problems. Ventilation may be needed when the operation lasts long (>1 h) and when the ischemia affects brain stem function" (11). We did not include ventilation in the assessment of study quality as it is possible that the experimenters deliberately chose not to ventilate in accordance with this recommendation. Category 2 comprises two items with a maximum score of 3. For the item "CBF measurement" the statements "not clearly reported" and "unilateral" were given equal scores as we assumed a unilateral measurement in most cases where the CBF measurement was ambiguously described. Category 3 includes four items with a maximum score of 4. Each study was assigned a score from 0 (lowest quality) to 12 (highest quality).
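The structure of the score can be illustrated with a short sketch. Several details are assumptions rather than facts from the text, because Table 1 is not reproduced here: the point values for CBF measurement (0 for none, 1 for unilateral or not clearly reported, 2 for bilateral), the single point for a second ischemia verification item, and the function and argument names are all hypothetical. Only the category sizes (5, 3, and 4 points) and the equal scoring of "unilateral" and "not clearly reported" come from the description above.

```python
from typing import Mapping

# Assumed ternary scoring of the CBF item; only the equality of
# "unilateral" and "not clearly reported" is stated in the text.
CBF_POINTS = {
    "none": 0,
    "unilateral": 1,
    "not clearly reported": 1,
    "bilateral": 2,
}


def quality_score(monitoring: Mapping[str, bool],
                  cbf: str,
                  other_verification: bool,
                  general: Mapping[str, bool]) -> int:
    """Return a score from 0 to 12 built from three categories.

    monitoring: five binary items (category 1, max 5 points)
    cbf, other_verification: two items (category 2, max 3 points)
    general: four binary items (category 3, max 4 points)
    """
    if len(monitoring) != 5:
        raise ValueError("category 1 expects exactly five items")
    if len(general) != 4:
        raise ValueError("category 3 expects exactly four items")
    if cbf not in CBF_POINTS:
        raise ValueError(f"unknown CBF category: {cbf!r}")

    category_1 = sum(bool(v) for v in monitoring.values())
    category_2 = CBF_POINTS[cbf] + int(bool(other_verification))
    category_3 = sum(bool(v) for v in general.values())
    return category_1 + category_2 + category_3
```

Keeping the three categories as separate inputs makes it straightforward to report sub-scores per category as well as the total.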

Statistics
All statistical analyses were performed using Jamovi 1.6.15 (25) with the level of significance set at p < 0.05. For each parameter category the Fisher exact test was applied to check whether there were significant differences between 2009 and 2019. To compare the impact factors for 2009 and 2019 the Mann-Whitney test for non-parametric data was used. The quality score related to the continent of origin was analyzed using the Kruskal-Wallis test. The Spearman's correlation analysis was performed to evaluate associations between the quality score and the impact factor. Graphs were created using GraphPad Prism 9.1.1 (LaJolla, USA).
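For readers who want to check a year-to-year comparison by hand, the following is a minimal sketch of a two-sided Fisher exact test for a 2x2 table using only the Python standard library. It is not the Jamovi implementation used in the study; it follows the common convention of summing the probabilities of all tables that are no more probable than the observed one. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Unnormalized hypergeometric weights, kept as exact integers so that
    # ties with the observed table are detected without rounding error.
    weights = {
        x: comb(row1, x) * comb(row2, col1 - x)
        for x in range(lowest, highest + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, col1)


# Example: blood pressure monitoring, taking the reported share of missing
# information (78% in 2009, 94% in 2019) with 100 studies per year.
p_blood_pressure = fisher_exact_two_sided(22, 78, 6, 94)
print(f"p = {p_blood_pressure:.4f}")
```

With these counts the result should fall below the 0.05 threshold, in line with the significant difference reported for blood pressure monitoring in the Results.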

RESULTS
A PubMed search (search strategy: ((((tMCAO) OR (transient middle cerebral artery occlusion)) OR (middle cerebral artery occlusion)) OR (MCAO)) AND (rat)) revealed 433 hits for 2009 and 584 hits for 2019. The results were sorted in ascending order by their publication date and then subsequently screened for relevance and eligibility until a total of 100 studies for each year were available (a list of all studies included can be found in the Supplementary Material). The numbers of publications rejected and the reasons for exclusion are given in Figure 1. In 2009, a last author publishing two manuscripts was observed twice and publishing three manuscripts thrice. In 2019, in five cases a last author was found publishing two manuscripts. The results for each parameter analyzed from the 100 sample articles published in 2009 were compared with those published in 2019 and are presented hereinafter.

Anesthesia and Physiological Monitoring
In 2009, inhalation anesthesia was used in 51% of the studies, whereas in 38% injection anesthesia was used. In 2019, inhalation anesthesia was used significantly less often (33%, p = 0.015) and injection anesthesia was applied more frequently (57%, p = 0.011).
The majority of studies did not report the mode of ventilation. Further, the mention of heart rate monitoring was scarce at both time points (missing information in 93% in 2009, 96% in 2019). Both blood pressure monitoring (missing information in 78% in 2009, 94% in 2019, p = 0.002) and monitoring of blood gases and O2 saturation (missing information in 76% in 2009, 92% in 2019, p = 0.003) were rarely reported in general. In summary, the documentation of periinterventional physiological parameters was lower in 2019 than in 2009 (for all parameters analyzed).
Details are summarized in Table 2.

MCAO Model
In 35% (2009) and 16% (2019) of the studies, no sham group was mentioned. If sham groups were documented, definitions for "Sham" varied between the studies (surgery without filament insertion, surgery with filament insertion, no surgery at all). Of note, even if functional testing was counted among the measures for ischemia verification, 27% of the studies in 2009 and 16% in 2019 did not perform any method of ischemia verification.

Ischemia Verification
In the majority of publications, a sensorimotor score was used to assess neurological status (2009: 100%; 2019: 22.9%); complex behavioral testing was scarce (see Table 3).

Quality Standards
In 2009, only 59 studies documented the approval by responsible animal welfare authorities; 56 of them did not specify a license number. In 2019, significantly more studies reported a study approval (85%, p < 0.001). Of the approved studies, 24% provided a license number (vs. 3% in 2009; p < 0.001).
The majority of all studies did not report if an a priori sample size calculation was performed (99% in 2009, 92% in 2019, p = 0.035). Of note, both in the article samples from 2009 and 2019, a sample size calculation was presented in <10%.
A clear description of inclusion or exclusion criteria was often not reported, notably with a tendency toward poorer reporting in 2019 (49% in 2009, 32% in 2019, p = 0.021).
The results are summarized in Table 4.

Country/Continent of Origin
The studies analyzed were conducted in 20 different countries worldwide. In 2009, most studies were conducted in the USA (33%). In 2019, significantly fewer studies were conducted in the USA (13%, p = 0.001). By contrast, significantly more studies were conducted in China in 2019 (67%) than in 2009 (25%, p < 0.001) (please see Table 5).

Impact Factors
The

Quality Score
From the assessed parameters, a quality score system (consisting of items for "anesthesia monitoring" (category 1), "ischemia verification" (category 2), and "general quality standards" (category 3)) was formed in order to mirror methodological and experimental quality standards. A maximum of 12 points could be achieved (category 1: 5 points; category 2: 3 points; category 3: 4 points; see Figure 2).
Both for 2009 and 2019, quality scores did not differ dependent on the continent of origin (p = 0.087; p = 0.171).
With respect to study type (any study examining an intervention with regards to its impact on the ischemia was defined as treatment study), 32 studies used an explorative approach, whereas 168 studies evaluated a treatment. The assessed quality score did not differ significantly between exploratory and treatment studies (mean ± SD: 4. 28

DISCUSSION
Our results indicate that inadequate experimental performance, namely a lack of sufficient quality control (in terms of ischemia verification and exclusion of undesired pathologies), is still common. The reporting of some methodological aspects (such as the reporting of approval) has increased over time. However, studies largely lack an a priori sample size calculation and other essential items indicating methodological quality. Unfortunately, the same applies to other experimental issues (besides ischemia verification) such as periinterventional monitoring of physiological parameters. Thus, despite the publication of the STAIR guideline in 2009 and the original ARRIVE guidelines in 2010, there is little improvement with respect to methodology/reporting.
Stroke is a common neurological disorder resulting in a major socioeconomic burden. Thus, neuroscientific efforts have largely focused on neuroprotective therapies after stroke. However, preclinical stroke research has proven to be strikingly unsuccessful when it comes to clinical translation (7). As noted above, various guidelines have been developed in order to improve the design quality of these studies, the STAIR criteria being among the best known and most widely acknowledged (9). However, the STAIR criteria cover only some aspects of the methodology and experimental procedure, and some recommendations are expressed only vaguely. CBF measurement (or perfusion imaging) is considered to be an important measure for adequate occlusion (9). Various studies have pointed out the relevance of (bilateral) CBF measurement as a gold standard for occlusion verification but also for detection of undesired events (such as SAH or premature reperfusion) (17,18). Unfortunately, the majority of studies analyzed still lack documentation of CBF measurement (both in the samples from 2009 and 2019). It is disappointing that even in 2019, when the guideline had been widely approved, only roughly one third of the studies analyzed reported a method of CBF measurement. Compared to our sample from 2009, when suggestions from the STAIR guideline had not yet been available, the overall usage of CBF measurement has not improved.
In 2009, Philip and colleagues already demonstrated the poor methodological quality of preclinical stroke studies, pointing out the importance of CBF measurement for the reliability of the experimental result (23). However, our results indicate that there is little improvement over time with respect to CBF monitoring. Notably, bilateral CBF measurement was explicitly mentioned in only two cases (out of 100 articles published in 2019). It is not clear why CBF measurement is applied so rarely, as its advantage has already been demonstrated. Other available methods (such as MR imaging) may be regarded as substitutes. To some extent, this applies to MR imaging, as ischemia (and its extent) is verified and undesired pathologies are ruled out in parallel. However, MR imaging is rarely used (10 and 12% for 2009/2019), and other methods for ischemia verification mainly suffer from their retrospective character. Thus, entities like SAH may be overlooked when methods such as postmortem inspection or TTC staining are applied ex post. The same applies to neuroscores, which are not a reliable instrument, as they neither reflect the extent of ischemia nor rule out complications such as SAH. Further, although hemiparesis is usually observed, it is not possible to attribute the neurological deficit to a small strategic ischemia (within the basal ganglia) or a large hemispheric infarction.
FIGURE 2 | Quality score according to the three analyzed categories: "anesthesia monitoring" (category 1), "ischemia verification" (category 2), and "general quality standards" (category 3).
Lesion patterns also may vary due to different filaments used (26). Our data show that the material used varies largely, which may be a contributing factor to heterogeneity of experimental results. Additionally, definitions for "Sham" vary substantially between the studies (surgery without filament insertion, surgery with filament insertion, no surgery at all).
Further, periinterventional monitoring of physiological parameters was in general reported less commonly in the articles analyzed from 2019 than in those from 2009, although the influence of anesthesia and its monitoring on the experimental result has been demonstrated repeatedly (22,27). Similarly, Thomas and colleagues detected no improvement in the reporting of periinterventional parameters (such as ventilation, blood gas analysis, end-tidal carbon dioxide concentration, blood pressure, administration of intravenous fluids or analgesics) when comparing experimental studies from 2005 and 2015 regarding adherence to the STAIR guideline (28). This is astonishing, as the STAIR guideline (9) as well as others (11) explicitly recommend such monitoring. It is unclear (and only speculative) whether this development is due to an increase in "publication pressure" resulting in easier, less elaborate (and less expensive) experimental setups (29). The journal-specific word limit may represent a further, seemingly trivial, obstacle to detailed reporting. However, most journals offer the publication of supplementary data to allow complete reporting.
Another issue of preclinical research (and research in general) is the methodological aspect. General guidelines like ARRIVE (14,15) promote the implementation of certain methodological quality standards. Further, specific guidelines like IMPROVE (13) as well as elaborate manuals and critical reviews of rodent stroke modeling (11,23,30,31) emphasize the importance of sophisticated planning, adequate performance, and transparent reporting of experimental studies in order to create a relevant experimental result. Some general aspects of methodology (like mention of approval by authorities and randomization) have improved over time. However, the documentation of an a priori sample size calculation in particular is scarce; further, the studies often lack a clear declaration of inclusion and exclusion criteria. Given that an a priori sample size calculation is the essential foundation of most study designs, the current results are hard to comprehend. Most guidelines explicitly point out the importance of sample size calculation. Thus, the neglect of this issue is most probably not a matter of ignorance. The lack of underlying data allowing a proper sample size calculation may play a part, but, again, "publication pressure" may also play a role, as experimental performance is facilitated by low sample sizes. "Historical" sample sizes (such as treatment groups of three or five animals) usually allow much faster processing of the experiments, whereas realistic effect sizes regularly result in high sample sizes and, thus, a lengthy workflow. Low sample sizes very often lead to statistically underpowered studies, which are unable to detect true effects or, when they do find a significant result, produce inflated estimates of the true effect. Besides the scientific implication, underpowered studies and thus unreliable results also have an ethical dimension concerning the wasteful use of animals in scientific research (32).
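To make the relation between effect size and group size concrete, the sketch below uses the standard normal-approximation formula for a two-sided comparison of two group means with equal group sizes, n = 2 (z_{1-alpha/2} + z_{power})^2 / d^2, where d is the standardized effect size (Cohen's d). This is an illustration and not a calculation from the study: the function name is hypothetical, the power of 0.80 is an assumed convention, and the approximation slightly underestimates the group sizes an exact t-test based calculation would give.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(effect_size_d: float,
                      alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Approximate group size for a two-sided two-sample comparison of means."""
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


for d in (2.0, 1.0, 0.5):
    print(f"d = {d}: {animals_per_group(d)} animals per group")
```

Under these assumptions, groups of three to five animals are adequate only for very large standardized effects (around d = 2), while halving the effect size roughly quadruples the required group size.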
In general, risk of bias seems to be a major issue in experimental studies, which is not only limited to preclinical stroke research (33).
However, a guideline (such as ARRIVE and STAIR) is not necessarily a one-size-fits-all concept. ARRIVE, for example, is focused on RCTs, and some items may not apply to exploratory studies. Nevertheless, elementary methodological standards should be generally implemented. Thus, blind adherence to a specific guideline is certainly not desirable; what is needed is deliberate use, including an honest discussion of the reasons why certain criteria have not been complied with.
Another pitfall of preclinical stroke research is the experimental focus on young, male rats instead of sex-mixed groups, aged, or hypertensive animals (31)(34)(35)(36)(37). With respect to sex, it has been proven that inclusion of both sexes is highly desirable, as it decreases bias in the results and is not necessarily accompanied by an increase in sample size (38)(39)(40).
However, the vast majority of animals are still young, male rats (both in our samples from 2009 and 2019). Thus, the underlying comorbidity/foundation of the pathology is not represented, and the possible influence of sex on the specific course is not taken into account. Further, current research emphasizes the importance of the circadian rhythm even for preclinical stroke research (41). Therefore, the exact time of day may become an important aspect of reporting in the future.
Consistent with the results of Thomas and colleagues (28), a substantial increase in publications from Asia (particularly from China), in parallel with a decrease in studies from North America, was noted. The finding is in line with other data not focusing only on stroke research (42), but the predominance of studies from China (67% in 2019) compared to other data is striking (28,42,43). However, these are descriptive results. We have not correlated the country (or continent) of origin with methodological quality. Further, it has not been assessed whether "established" stroke labs comply more or less with the current guidelines. Both scenarios are conceivable, owing either to the know-how available in an experienced team or to entrenched patterns (that can no longer be changed).
In conclusion, our data show that the procedure itself varies widely, resulting in heterogeneous lesion patterns, and that both specific important aspects of the experimental procedure (such as ischemia verification) and methodological issues are still in need of improvement. There is some change over time, particularly with respect to reporting of approval and randomization, but both study planning and the procedure itself are still prone to bias. This is not only a technical and formal issue, because the results themselves are influenced by the factors already mentioned. It is rather an essential and fundamental point, as the relevance of the experimental result depends on adequate methodology and modeling. Further, it is a matter of appropriate resource management, animal rights and, finally, scientific ethics. Various articles have already pointed out the general importance of methodology and transparency in reporting (44)(45)(46). Our data show that, with respect to preclinical stroke research, there is still a long way to go. In addition, procedure-specific quality aspects (such as the essential aspect of ischemia verification and exclusion of other pathologies in parallel) are widely not applied and/or not reported. Thus, there are several methodological and procedural factors that may contribute to the translational failure of stroke research. It is therefore important to identify the reasons for this pattern and to develop strategies that may improve the quality of preclinical stroke research and standardize the disease model. Several journals nowadays demand confirmation from the authors that current guidelines have been implemented. However, adherence to the guidelines itself is rarely verified in detail. As a solution, ". . . mandatory reporting of key methodological parameters in the published article and not only during submission" has already been proposed by a group analyzing articles on experimental stroke published in Stroke (16).
Further, not only general aspects have to be taken into account but also procedure-specific ones. In stroke research, reference is commonly made to the STAIR criteria, which provide an excellent framework but lack some detailed advice (11). Finally, it is a matter of science funding and policy to set the agenda for elaborate study designs and procedures. As things have hardly changed so far, it is indispensable to put some pressure on the system. This does not imply a "science police" chasing non-adherent authors; first of all, it implies the willingness of the entire scientific system to change and, further, adequate measures. Quality scores (as presented here by way of example) may be part of the solution in order to facilitate the process for reviewers and editors, but may be too rigid for specific settings. Another important issue is to educate younger scientists accordingly and to encourage critical scrutiny. Lastly, funders and institutions should adopt a culture that values not only the mere quantity of publications but also their methodological quality.
That said, there is definitely a long way to go, and a real transformation process will depend on the willingness and engagement of the entire scientific community. The reasons for disregarding well-approved guidelines are not obvious and are most probably multifactorial. On the one hand, there seems to be a lack of pressure from institutions/funders/journals to implement certain quality standards and guideline adherence. On the other hand, historical conditions, human inertia, sometimes lack of knowledge, or an absence of (financial and time) resources may contribute to the lack of reporting and experimental quality. However, it has to be assumed that anyone in preclinical science is honestly eager to improve both their own and the general scientific quality. Therefore, any effort toward high-grade methodology and transparency has to be appreciated by the community and the entire scientific system. Emphasizing the importance of the highest possible scientific quality as well as repeated education are important tools to raise public awareness. Besides, it is necessary to repeatedly analyze the status quo of preclinical research in order to detect deficiencies and develop strategies to overcome the broad resistance.
Of note, our study has some limitations. Only one database was used for the literature search. However, our goal was not to fully evaluate the literature on experimental stroke research in the rat model for the years studied but to give an adequate overview of the general trends. Thus, we decided to analyze 100 publications for each year (2009 and 2019), sorted in ascending order according to their publication date, as a representative subset. As only 100 publications per year were analyzed, a random accumulation of methodologically inadequate studies in one year compared with the other may have biased the results. Had the publications of an entire year been evaluated, such bias could not have been excluded either, but it would have been less probable. In order to evaluate the longitudinal awareness of methodological quality aspects and the adherence to current guidelines, a prospective study would be reasonable.
Further, only two time-points were evaluated. This approach was chosen in order to depict the status quo of preclinical stroke research using the MCAO model before (2009) and after (2019) the public awareness of STAIR and ARRIVE. A further (third) time-point would have allowed a trend over time to be depicted. Data extraction and analyses were performed by only one person, but they were validated by a second author, and there was limited room for subjective interpretation because the parameters were clearly defined in advance. We defined neuroscore assessment after treatment as a doubtful indicator for ischemia evaluation (due to the supposed effect of treatment on the neuroscore). CBF measurement might also be influenced by a prior treatment; however, when it was used for ischemia verification, clear cut-offs were usually given, making it an objective indicator of a similar cerebral underperfusion. Further, like any other comparable score with ordinal data, our quality score is subject to general limitations. In most categories, we chose a binary scoring system, whereby the corresponding parameters are all rated equally in terms of their influence on study quality. Only in the case of CBF measurement did we choose a ternary decision.

CONCLUSIONS
The reporting of periinterventional parameters in experimental stroke research (particularly, the one in search for neuroprotective agents) using the MCAO model still is scarce. Some methodological aspects have improved over time (2009 compared with 2019), but essential issues (such as sample size calculations) are reported rarely. Thus, deficits in the methodological and procedural quality may contribute to the translational failure of preclinical stroke research.

DATA AVAILABILITY STATEMENT
The raw data is available from the corresponding author on request.

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because of secondary analyses of animal experiments.