A Critical Appraisal of the Quality of Glioma Imaging Guidelines Using the AGREE II Tool: A EuroAIM Initiative

Background: Following the EuroAIM initiative to assess the quality of medical imaging guidelines by using the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument, we aimed to evaluate the quality of the current imaging guidelines in patients with gliomas. Methods: A literature search was conducted to identify eligible imaging guidelines considered in the management of adult patients with gliomas. The selected guidelines were evaluated using the AGREE II instrument by four independent appraisers. The agreement among the four appraisers was estimated using the intraclass correlation coefficient (ICC) analysis. Results: Seven guidelines were selected for the appraisal. Six out of the seven guidelines showed an average level of quality, with only one showing low quality. The highest scores were found in Domain 1 “Scope and purpose” (mean score = 81.2%) and Domain 4 “Clarity of presentation” (mean score = 77.6%). The remaining domains showed a low level of quality and, in particular, Domain 5 “Applicability” was the most critical with a mean score of 41.7%, mainly owing to limited attention to barriers and facilitators as well as to the cost and resource implications of applying the guidelines. The ICC analysis showed very good agreement among the four appraisers, with ICC values ranging from 0.907 to 0.993. Conclusions: The available guidelines on glioma imaging were of average quality according to the AGREE II tool analysis. Based on these results, further efforts should be made to involve different professional bodies and stakeholders, to increase patient and public involvement in any future guideline drafting, and to improve the applicability of these guidelines in clinical practice.


INTRODUCTION
Malignant primary brain tumors still represent one of the most difficult cancers to treat, with a rather low 5-year overall survival (1). Among these, gliomas constitute the largest subgroup, with high-grade glioma, specifically glioblastoma, accounting for almost 50% of cases (2). Diagnostic imaging, particularly magnetic resonance imaging (MRI), plays a fundamental role in the diagnosis, staging, and follow-up of glioma patients (3,4). Considering the very poor prognosis of such patients and the lack of an effective treatment, especially for recurrent disease, patient management is very demanding, and major efforts are constantly made to develop more effective drug treatments (5,6) and more sensitive methods for early tumor detection, particularly of recurrent disease, as early detection appears crucial for prolonging survival. In this perspective, imaging, and especially MRI, makes a substantial contribution to the assessment of response to treatment using conventional and advanced techniques that probe the tumor biology (7). The possibility of leveraging these efforts by conducting multicenter studies in different research and clinical domains (e.g., treatment trials, identification of diagnostic and prognostic imaging biomarkers) necessitates standardization of the imaging protocols, especially in terms of clinical indications and acquisition techniques. To achieve a reasonable level of standardization, diagnostic imaging guidelines covering clinical indications, acquisition protocols, and technical details have been developed. However, the reliability of clinical practice guidelines has been questioned, and the proposed recommendation statements should be judged on the methodological rigor of their drafting process. In order to assess the quality of guidelines, several useful tools have been proposed (8).
In particular, the updated Appraisal of Guidelines for Research & Evaluation version 2.0 (AGREE II) (9, 10), first established in 1998, is the most comprehensively validated and has been widely adopted for the quality assessment of clinical practice guidelines (11). A recent initiative to meticulously assess the quality of current imaging guidelines has been promoted by the European Network for the Assessment of Imaging in Medicine (EuroAIM), founded by the European Institute for Biomedical Imaging Research (EIBIR) (12). The first evaluations conducted under this initiative revealed that the quality of imaging guidelines is heterogeneous, ranging from low to high levels (13-16). In the context of the EuroAIM initiative, we aimed to evaluate the quality of the existing guidelines on the role of imaging in glioma patients.

Literature Search
Between October and November 2018, an exhaustive literature search was conducted on PubMed using MeSH and non-MeSH terms with and without customizing the search for "Consensus Development Conference," "Guideline," "Clinical Practice Guideline," and "Government Document." The following terms and their expansions were entered: "glioma," "neoplasms," "brain tumors," "guideline," "practice guideline," "recommendations, health planning," "official positions," "diagnostic imaging," "imaging." Similarly, EMBASE, Scopus, Wiley Online Library and Google, including gray literature sources, were also searched. The search was focused on the most up-to-date version of the identified guidelines. Inclusion criteria were: (1) guidelines focused on the role of imaging in the management of primary brain tumors and specifically gliomas; (2) guidelines dealing with the adult population; and (3) papers with available English full text. Exclusion criteria were the following: (1) guidelines not developed under the auspices of recognized professional institutions, associations, and/or working groups; (2) clinical practice guidelines in which imaging was included in a wider, rather abstract context (e.g., guidelines dealing with cancer clinical management and treatment); (3) guidelines not dealing with the major imaging techniques employed for the assessment of gliomas, particularly MRI.

Guideline Evaluation
Selected papers were evaluated by four independent radiologists (VR, AS, LU, RC) with 6 to 9 years of clinical and research experience in a university hospital setting. The appraisers used the AGREE II instrument (http://www.agreetrust.org/), consisting of six quality domains, each ". . . capturing a unique dimension of guideline quality," and including a total of 23 key items (9). Specifically: domain 1 "Scope and Purpose" includes items 1 to 3; domain 2 "Stakeholder Involvement" comprises items 4 to 6; domain 3 "Rigor of Development" comprises items 7 to 14; domain 4 "Clarity of Presentation" contains items 15 to 17; domain 5 "Applicability" covers items 18 to 21; and domain 6 "Editorial Independence" includes items 22 and 23. Domains and items are summarized in Table 1. Each item is rated on a 7-point scale, ranging from "strongly disagree" (score = 1) to "strongly agree" (score = 7). Finally, an Overall Assessment section is provided to summarize the quality of the guideline comprehensively. Each appraiser was asked to assign a score to each item and to the Overall Assessment section, as well as to indicate whether he/she would recommend the use of the guideline in clinical practice. Although they had previous exposure to the AGREE II tool (15), the appraisers also completed the freely available online training tool, consisting of an overview tutorial and a practice exercise (17).
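The domain and item layout described above can be sketched as a small lookup structure. This is only an illustrative sketch: it assumes the standard 23-item AGREE II layout, in which Domain 6 comprises items 22 and 23, and the names `AGREE_II_DOMAINS` and `split_by_domain` are hypothetical helpers, not part of the AGREE II tool or of the appraisers' workflow.

```python
# Inclusive item ranges per AGREE II domain (assumed standard 23-item layout).
AGREE_II_DOMAINS = {
    "Scope and Purpose": (1, 3),
    "Stakeholder Involvement": (4, 6),
    "Rigor of Development": (7, 14),
    "Clarity of Presentation": (15, 17),
    "Applicability": (18, 21),
    "Editorial Independence": (22, 23),
}


def split_by_domain(item_ratings):
    """Group one appraiser's 23 item ratings (each 1 to 7) by domain."""
    if len(item_ratings) != 23:
        raise ValueError("expected 23 item ratings")
    if any(not 1 <= r <= 7 for r in item_ratings):
        raise ValueError("each rating must be between 1 and 7")
    # Items are numbered from 1, Python lists from 0, hence first - 1.
    return {
        name: item_ratings[first - 1:last]
        for name, (first, last) in AGREE_II_DOMAINS.items()
    }


# Hypothetical rating sheet, used only to exercise the helper.
sheet = [i % 7 + 1 for i in range(23)]
by_domain = split_by_domain(sheet)
```

Keeping the ranges in one place makes it easy to check that the six domains together cover all 23 items exactly once.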

Quality Assessment
Following the AGREE II manual instructions, domain scores were ". . . calculated by summing up all the scores of the individual items in a domain and by scaling the total as a percentage of the maximum possible score for that domain" (17). Guideline overall quality was considered "high" when 5 or more domains scored more than 60%, "average" when 3 or 4 domains scored more than 60%, and "low" when no more than two domains scored more than 60%, as previously performed (13-16). Mean scores ± standard deviations of each guideline were then calculated. Domain overall quality was assessed by calculating the mean score of each domain, which was considered good (≥80%), acceptable (60-79.9%), low (40-59.9%), or very low (<40%).
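The scoring and grading rules above can be sketched in a few lines. The sketch assumes the scaling given in the AGREE II user's manual, (obtained score - minimum possible score) / (maximum possible score - minimum possible score), and the function names and example ratings are hypothetical; it is not the authors' actual spreadsheet or script.

```python
def scaled_domain_score(ratings):
    """Scaled AGREE II domain score in percent.

    ratings: one list of item ratings (1 to 7) per appraiser, all for the
    same domain of the same guideline.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(r) for r in ratings)
    max_possible = 7 * n_items * n_appraisers
    min_possible = 1 * n_items * n_appraisers
    # Assumed manual scaling: (obtained - min) / (max - min).
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


def guideline_quality(domain_scores):
    """Overall guideline quality from its six scaled domain scores."""
    above = sum(1 for s in domain_scores if s > 60)
    if above >= 5:
        return "high"
    if above >= 3:
        return "average"
    return "low"


def domain_quality(mean_score):
    """Quality label for the mean score of one domain across guidelines."""
    if mean_score >= 80:
        return "good"
    if mean_score >= 60:
        return "acceptable"
    if mean_score >= 40:
        return "low"
    return "very low"


# Hypothetical ratings: four appraisers, three items (e.g., Domain 1).
example = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
score = scaled_domain_score(example)  # (53 - 12) / (84 - 12), about 56.9%
```

Applied to the mean domain scores reported in this study, `domain_quality(81.2)` gives "good" and `domain_quality(41.7)` gives "low".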

DOMAIN 1. SCOPE AND PURPOSE
Item 1: the overall objective(s) of the guideline is (are) specifically described
Item 2: the health question(s) covered by the guideline is (are) specifically described
Item 3: the population (patients, public, etc.) to whom the guideline is meant to apply is specifically described

Statistical Analysis
The level of agreement among the four appraisers was assessed using the intraclass correlation coefficient (ICC) analysis and rated as: poor (ICC ≤ 0.20); fair (ICC from 0.21 to 0.40); moderate (ICC from 0.41 to 0.60); good (ICC from 0.61 to 0.80); and very good (ICC ≥ 0.81) (13-16). Score collection and calculation as well as the statistical analysis were performed by an independent reviewer (SC) with 9 years of experience in scientific research and biostatistics.
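As an illustration of the agreement analysis, the sketch below computes an ICC from a table of ratings and maps it to the rating bands listed above. The ICC form used in the study is not stated; the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is assumed here purely for illustration. The function names and the example matrix are hypothetical, and values falling between two bands (e.g., 0.805) are assigned to the higher band by this sketch.

```python
def icc_2_1(scores):
    """ICC(2,1) from a two-way layout (assumed form, see lead-in).

    scores: rows are rated targets (e.g., items), columns are appraisers.
    """
    n = len(scores)       # number of targets
    k = len(scores[0])    # number of appraisers
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def rate_agreement(icc):
    """Map an ICC value to the agreement bands used in the text."""
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"


# Hypothetical ratings: six targets rated by four appraisers.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
label = rate_agreement(icc)
```

With these bands, the ICC values reported in this study (0.907 to 0.993) all fall in the "very good" category.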

Literature Search and Guidelines Selection
The literature search returned 162 records. The majority of the retrieved papers were excluded after evaluation of the title and abstract, with the 29 remaining articles extensively reviewed in full text and 7 guidelines finally eligible for the appraisal process (18-24). A flow-chart of the guideline selection process is illustrated in Figure 1. Details of the selected recommendation papers are reported in Table 2.

Statistical Analysis
The ICC analysis showed a very good agreement among the four appraisers with values ranging from 0.907 to 0.993; the ICC scores with their 95% confidence intervals are reported in Table 3.

Guideline Scores
According to the AGREE II tool, six out of seven guidelines showed an "average" quality, with one guideline demonstrating "low" quality. The highest domain scores were found in Domain 1 "Scope and purpose" (mean score = 81.2%), indicating good quality, followed by Domain 4 "Clarity of presentation" (mean score = 77.6%), suggesting acceptable quality. The remaining domains showed a low level of quality and, in particular, Domain 5 "Applicability" was the most critical with a mean score of 41.7%. Similarly, Domain 2 "Stakeholder involvement," Domain 3 "Rigor of development," and Domain 6 "Editorial independence" were considered of low quality, achieving mean scores of 52, 55.1, and 58.9%, respectively.
The highest variability in domain scores was observed in Domain 3 "Rigor of development" and Domain 6 "Editorial independence," with an SD of 21.8 and 22.7%, respectively, while the lowest variability was found in Domain 4 "Clarity of presentation," with an SD of 9.5%. In the remaining domains, the variability ranged from 12 to 14.4%. All domains and guidelines scores are shown in Table 4 and

DISCUSSION
Overall, the current imaging guidelines for the management of glioma patients showed an intermediate level of quality according to the AGREE II analytical approach. In detail, six out of the seven guidelines showed an average level of quality with only one revealing low quality.

Domain Scores
Domain 1 "Scope and purpose" and Domain 4 "Clarity of presentation" had the highest scores, as these are the aspects guideline developers primarily consider when defining the objectives and conveying the recommendations. Domain 4 was the only one scoring higher than 60% in all investigated guidelines. The remaining domains received lower mean scores, ranging from 41.7% (Domain 5) to 58.9% (Domain 6). In particular, Domain 2 "Stakeholder involvement" performed poorly (mean score = 52%), as not all relevant professional groups (e.g., medical and/or radiation oncologists) were involved in the guideline drafting. In almost all cases, the authors were radiologists along with neurosurgeons. Moreover, the views and preferences of the target patient group, e.g., in terms of experiences and expectations, were not considered. Although this issue may appear unusual for medical/radiological guidelines, the AGREE II tool provides suggestions on how to facilitate patient and public involvement (e.g., by consulting patients beforehand to understand the main issues, by using interviews or literature reviews on their preferences, or through external review of the draft by stakeholders). Of note, target users were rarely specified; this issue could be easily addressed by clearly indicating which professionals are meant to use the guideline. Domain 3 "Rigor of development" assesses the methodology by which the guideline is developed and unfortunately obtained a low mean score (55.1%), ranging from 20.3 to 73.4% with an SD of 21.8%. The high variability observed is due to a lack of transparency in the methodology used for evidence search and evaluation, which is usually carried out through systematic literature reviews. Only Thust et al. specifically discussed the possible methodological limitations (24).
Furthermore, methods for formulating the recommendations were not always clearly named, and structured techniques (e.g., the Delphi method) to reach a final consensus were not used. The most critical results were found in Domain 5 "Applicability," in which none of the guidelines achieved a score higher than 60%. It should be noted that the issues addressed in this domain are typically difficult to address, given that resources and costs are heterogeneous among different countries and national healthcare systems. This domain also contains very specific criteria, such as the inclusion of dedicated sections providing solutions identified through barrier analysis, tools to capitalize on guideline facilitators, or methods by which the cost information was sought. Finally, Domain 6 "Editorial independence," even though it showed a low-quality mean score (58.9%), was not as critical as in previous AGREE II evaluations (13,14). In almost all papers, conflicts of interest and funding disclosures were stated, as the majority of guidelines are published in peer-reviewed journals, which require such statements. However, none of the guidelines included the declaration "the views of the funding body have not influenced the content of the guideline."

Considerations
General remarks can be made in light of the present appraisal, especially regarding the overall average scores of the evaluated guidelines, none of which reached high quality. While in certain guidelines the recommendations were well underpinned by a rigorous systematic review of the literature, they were weakened by the lack of contributions from international expert panels and other disciplines of interest, and by limited consideration of the routine application of the recommended procedures. On the other hand, European guidelines did not describe in detail the evidence search and synthesis methodology. It is worth mentioning that 4 out of 7 guidelines came close to a final "high" rating, each having 4 domains scoring higher than 60%, but overall scored moderately because Domain 2 "Stakeholder involvement" and Domain 5 "Applicability" performed poorly. Thus, placing emphasis on these domains would substantially improve the quality of future guidelines. The results of the current guideline appraisal are better than those of a previous AGREE II evaluation of clinical practice and management guidelines in glioma patients (25).
In the previous evaluation, tangible concerns were voiced for Domain 2 "Stakeholder involvement," Domain 3 "Rigor of development," Domain 5 "Applicability," and Domain 6 "Editorial independence." A further AGREE II evaluation of the clinical practice guidelines for rehabilitation of brain tumor patients also showed moderate quality results (26). Furthermore, an improvement in guideline quality over time seems to emerge from our analysis. Based on the aforementioned results, the quality of future imaging guidelines in gliomas could be further improved by summarizing the key evidence elements derived from literature review and expert consultation and reporting them close to the final recommendations. Data related to any guideline external review and update process should also be provided.

Limitations
The heterogeneity of the selected guidelines, dealing either with the definition of a standardized MRI protocol or with clinical indications for techniques other than MRI, poses an inherent limitation in our evaluation. We attempted to mitigate this risk by using a universal and robust appraisal tool such as AGREE II; we acknowledge, however, that the AGREE II instrument does not directly assess the quality of the guideline content (8). Furthermore, it is reasonable to expect that guidelines aiming to assess the role of imaging in the management of glioma patients might differ in terms of tumor sub-types (i.e., low- vs. high-grade) or overall setting, or might even be published as appendices to wider clinical guidelines. This makes broad-based acceptance difficult, as is evident in the paper by Thust et al. (24), who probed the adherence of European centers to the "mainstay" glioma MRI protocol proposed by Ellingson et al. (19). A further limitation of our study might be the exclusion of guidelines in languages other than English. Finally, while initial evidence suggests weighting the domain scores for the overall quality assessment (27), we decided not to adopt this approach. Nevertheless, the possibility of prioritizing one domain over another could be considered in future AGREE II appraisals.

CONCLUSIONS
The existing guidelines on the role of imaging in glioma patients showed an overall intermediate level of quality according to the AGREE II tool evaluation. The fairly high number of available guidelines highlights the profound interest of the oncological and radiological communities in significantly improving patient management in terms of clinical indications, protocol appropriateness, and acquisition techniques. In this perspective, the issues and suggestions arising from this appraisal could be taken into account to improve the quality of imaging guidelines in neuro-oncology.

CONTRIBUTION TO THE FIELD STATEMENT
The quality of imaging guidelines in terms of methodological rigor has been recently questioned and found to be heterogeneous, thus potentially affecting the reliability of the guidelines themselves. The use of imaging guidelines is crucial for the assessment of glioma patients, especially in the context of multicenter studies and clinical trials for new drug development and the assessment of response to treatment. We therefore joined a recent initiative of the European Network for the Assessment of Imaging in Medicine (EuroAIM) and assessed the quality of imaging guidelines focused on glioma using the Appraisal of Guidelines for Research & Evaluation version 2.0 (AGREE II) tool. According to our results, existing guidelines on glioma imaging were of average quality. We also provided suggestions to further increase the quality of future guidelines on glioma imaging on the basis of the criticisms raised.

AUTHOR CONTRIBUTIONS
VR, SC, and EI performed the literature search. VR, AS, LU, and RC evaluated the guidelines. SC provided the statistical analysis. The manuscript was drafted by VR, AS, LU, RC, SC, and AB. Data curation was carried out by VR and SC. Critical revision was made by AB and SB. SB was responsible for project administration, study conception and design. All authors revised and approved the manuscript.