The Quality of Clinical Practice Guidelines and Consensuses on the Management of Primary Aldosteronism: A Critical Appraisal

Background: Several guidelines and expert consensuses have been developed for the management of primary aldosteronism (PA). Understanding their detailed recommendations and quality can help physicians make informed and reliable decisions. Methods: PubMed, EMBASE, and three websites were searched for practice guidelines or consensuses on PA from inception to January 24, 2019. We summarized the major recommendations on the management of PA from these guidelines and consensuses. The Appraisal of Guidelines for Research and Evaluation II (AGREE II) instrument was used to assess the quality of the included guidelines and consensuses. Results: We identified three clinical practice guidelines and three consensus statements. Most of their recommendations on the diagnosis and treatment of PA were consistent, although minor conflicts were found regarding patient screening and confirmatory testing. All included guideline documents had good quality (score, >70%) for scope and purpose (mean score, 81.02%) and clarity of presentation (mean score, 86.88%). However, reporting of stakeholder involvement (mean score, 54.32%) and applicability (mean score, 47.92%) was insufficient, and most guideline documents lacked rigor in their development process (mean score, 45.56%). The Endocrine Society practice guideline 2016 ranked highest in quality (score, 81.13%). Conclusions: Existing guideline documents provide valuable recommendations on the management of PA, but further efforts are needed to improve their methodological quality. The Endocrine Society practice guideline 2016 is recommended for use.


INTRODUCTION
Primary aldosteronism (PA) is a group of disorders caused by autonomous, excessive production of aldosterone that escapes regulation by angiotensin or plasma potassium concentration (1). Excess aldosterone increases urinary potassium excretion; therefore, PA patients commonly present with hypokalemia, severe resistant hypertension, and metabolic alkalosis (2). Patients with PA may have a higher risk of cardiovascular and cerebrovascular events than those with essential hypertension (3)(4)(5). However, this excess risk may be mitigated by appropriate treatment, for example, adrenalectomy for unilateral aldosterone-producing adenomas (6). Proper management of PA patients is therefore important for their prognosis (7).
Clinical practice guidelines are developed to provide physicians and/or patients with an actionable basis for the entire spectrum of clinical decision-making, from prevention, screening, diagnosis, and treatment to rehabilitation, with the aim of improving healthcare (8). The potential benefits to healthcare providers and recipients depend largely on the quality of the guideline itself. Trustworthy guidelines are developed systematically on the basis of reliable evidence, with patient-oriented recommendations and informative disclosure (9).
During the past decades, an increasing number of clinical practice guidelines and consensuses have been developed for the management of PA, such as the Endocrine Society Clinical Practice Guideline, the Chinese Endocrine Society consensus, and the Japanese Endocrine Society guideline (10)(11)(12). These guidelines and consensuses form a strong basis of evidence-based recommendations for physicians treating PA. However, some recommendations differ across guidelines. For example, the international Endocrine Society recommended that hypertensive patients with sustained blood pressure above 150/100 mm Hg be screened for case detection (10), whereas the Chinese Endocrine Society recommended screening patients with sustained blood pressure of 160/100 mm Hg or greater (11). Understanding the major discrepancies among these guidelines and consensuses, as well as their quality, may help physicians in clinical practice.
To help physicians make informed and reliable decisions, in this article we summarized the major recommendations and potential discrepancies of current PA guidelines and consensuses and critically appraised their quality.

METHODS
Eligibility Criteria, Literature Search, and Screening
We considered both expert consensuses and clinical practice guidelines for the management of PA. Definitions of expert consensus and clinical practice guideline are available elsewhere (13); in brief, a guideline is generally developed on the basis of existing evidence, whereas a consensus may rely largely on expert experience. We excluded consensuses or guidelines whose primary objective was outside the scope of PA management. For example, some guidelines for the management of hypertension contain a few recommendations on resistant hypertension caused by PA; these were not considered in the current article. In addition, when a guideline had been updated, only the latest version was included for assessment [e.g., the Endocrine Society Clinical Practice Guideline (10)].
PubMed and EMBASE were searched for guidelines or consensuses on PA from inception to January 24, 2019. We also searched the websites of the National Guideline Clearinghouse (https://www.ahrq.gov/gam/index.html), the International Network of Agencies for Health Technology Assessment (http://www.inahta.org/), and the Guideline International Network (https://www.g-i-n.net/) for potential unpublished guidelines. The search strategy was developed using MeSH terms and keywords related to primary aldosteronism, hyperaldosteronism, Conn's syndrome, guidelines, and expert consensus (Supplementary Material 1).
Literature screening was conducted by two authors, one (Z.M.) acting as a clinical expert and the other (C.X.) providing the methodological perspective of evidence-based practice. Titles and abstracts retrieved from the systematic search were scanned, and clearly irrelevant records were excluded; full texts of the remaining potentially eligible publications were then obtained and assessed against the eligibility criteria for a final decision. Any disagreements were resolved through discussion between the two authors.

The Appraisal Instrument and Quality Assessment
The Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument was used for the quality assessment (14). It was developed by the AGREE Next Steps Consortium as an update of AGREE I (15). We chose AGREE II because it is regarded as the most comprehensive and rigorous quality assessment tool (16). AGREE II includes 23 items structured in six domains: scope and purpose (domain 1), stakeholder involvement (domain 2), rigor of development (domain 3), clarity of presentation (domain 4), applicability (domain 5), and editorial independence (domain 6) (Supplementary Material 2). Each item was scored from 1 (strongly disagree) to 7 (strongly agree) according to the extent of adherence (14). The score for each domain was derived from the obtained score (the sum of all raters' scores for the items in the domain), the maximum possible score (all items rated strongly agree), and the minimum possible score (all items rated strongly disagree) (14).
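The domain score described above follows the standard AGREE II scaling, in which the obtained score is rescaled between the minimum and maximum possible scores: scaled score = (obtained - minimum possible) / (maximum possible - minimum possible) x 100%. A minimal Python sketch, assuming every rater scores every item in the domain on the 1 to 7 scale; the function name and the example ratings are hypothetical and are not data from this study:

```python
def scaled_domain_score(ratings: list[list[int]]) -> float:
    """Scaled AGREE II domain score (%) for one guideline and one domain.

    ratings[r][i] is rater r's score (1-7) for item i of the domain.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one rater and one item")
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every rater must score every item in the domain")
    if any(not 1 <= score <= 7 for row in ratings for score in row):
        raise ValueError("AGREE II item scores range from 1 to 7")

    n_raters = len(ratings)
    obtained = sum(sum(row) for row in ratings)  # sum over all raters and items
    max_possible = 7 * n_items * n_raters         # every item "strongly agree"
    min_possible = 1 * n_items * n_raters         # every item "strongly disagree"
    return 100 * (obtained - min_possible) / (max_possible - min_possible)


if __name__ == "__main__":
    # Hypothetical ratings: two raters scoring a three-item domain.
    score = scaled_domain_score([[7, 6, 5], [6, 6, 7]])
    print(f"{score:.1f}%")  # 86.1%
```

In this hypothetical example the obtained score is 37, the minimum and maximum possible scores are 6 and 42, and the scaled score is 31/36, or about 86.1%. Rescaling by the minimum possible score means that a domain rated "strongly disagree" throughout scores 0% rather than 14%.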
The quality assessment expert group was responsible for assessing the included guidelines and consensuses. The group consisted of two physicians (L.Z., Z.D.), three surgeons specializing in PA (Z.M., G.Q., M.P.), and one methodologist (C.X.). Before the assessment, each group member was trained via teleconference by the principal investigator (Z.M.) and the methodologist (C.X.) according to the AGREE II user's manual.
The members then independently assessed quality according to the AGREE II instrument and recorded their ratings in separate Excel 2010 sheets (Microsoft, Redmond, WA, USA). Each rater's results were blinded to the other members.

Data Analysis
We summarized the recommendations of each guideline and consensus on screening, diagnosis, and treatment, and described the major discrepancies among them. For quality, the obtained score from each rater and the maximum and minimum possible scores of each domain were summarized and used to calculate the scaled score for each domain (17). A domain score above 70% was regarded as good quality, 50 to 70% as moderate quality, and below 50% as poor quality (17). The mean score of the six domains was further calculated as a measure of the overall quality of each guideline. A guideline with a mean six-domain score above 70% and scores above 70% for both domains 3 and 4 was regarded as having good quality and could be recommended for use. We prespecified domains 3 and 4 as the most important because they were regarded as indicative of good overall quality and of suitability for recommendation, respectively. The intraclass correlation coefficient (ICC) was calculated for each domain; an ICC of 0.91 to 1.00 was regarded as excellent, 0.76 to 0.90 as good, 0.51 to 0.75 as moderate, and less than 0.50 as poor reliability (18). Data analysis was conducted using Excel 2010 (Microsoft).
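The quality bands and the recommendation rule above can be expressed as a short Python sketch. The function names and the example domain scores are hypothetical; treating exactly 50% and exactly 70% as moderate is an assumption, since the text defines the moderate band only as 50 to 70%.

```python
def classify_domain(score: float) -> str:
    """Quality band for a scaled AGREE II domain score (%)."""
    if score > 70:
        return "good"
    if score >= 50:  # assumption: "50 to 70%" is inclusive at both ends
        return "moderate"
    return "poor"


def assess_guideline(domain_scores: dict[int, float]) -> tuple[float, bool]:
    """Return (six-domain mean score, recommended for use?).

    domain_scores maps AGREE II domain numbers 1-6 to scaled scores (%).
    Recommendation requires the six-domain mean AND the scores of
    domain 3 (rigor of development) and domain 4 (clarity of
    presentation) to exceed 70%.
    """
    if set(domain_scores) != set(range(1, 7)):
        raise ValueError("scores for all six AGREE II domains are required")
    overall = sum(domain_scores.values()) / 6
    recommended = overall > 70 and domain_scores[3] > 70 and domain_scores[4] > 70
    return overall, recommended
```

For example, hypothetical domain scores of 80, 60, 75, 90, 55, and 80% give a six-domain mean of about 73.3%, and because domains 3 and 4 also exceed 70%, the guideline would be recommended. Lowering domain 3 to 65% blocks the recommendation even though the mean (about 71.7%) still exceeds 70%, which reflects the extra weight given to domains 3 and 4.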

RESULTS
We obtained 298 records from the literature search and one additional guideline from the Guideline International Network (https://www.g-i-n.net/). After excluding duplicates and records that did not meet the criteria, we identified 14 potentially eligible articles (Figure 1). Of these, the 2016 Endocrine Society Clinical Practice Guideline was an update of the 2008 version; the consensus of the French Endocrinology Society (SFE), in collaboration with the French Hypertension Society (SFHTA) and the Francophone Endocrine Surgery Association (AFCE), was divided into seven separate articles by topic, from epidemiology to treatment; and the consensus of the Taiwan Society of Aldosteronism was divided into two separate articles, one focused on screening and diagnosis and the other on treatment. After retaining only the latest Endocrine Society version and treating each multipart consensus as a single document, we finally included six guideline documents (Figure 1): three clinical practice guidelines and three consensus statements.

Brief Summary of the Management of PA
A brief summary of the management of PA is presented in Table 1. The baseline prevalence of PA among hypertensive patients reported in the guideline documents ranged from 5 to 18%. In general, current guidelines and consensuses gave consistent recommendations on the diagnosis and treatment of the different types of PA. All of them recommended the plasma aldosterone/renin ratio (ARR) for screening, computed tomography for subtype classification, laparoscopic adrenalectomy for unilateral PA, and a mineralocorticoid receptor antagonist (spironolactone) for bilateral adrenal disease and for patients who are unable or unwilling to undergo surgery. There were, however, several conflicts regarding screening and confirmatory testing. The Endocrine Society (2016), the Chinese Endocrine Society, the France SFE/SFHTA/AFCE, and the Taiwan Society of Aldosteronism recommended screening patients at high risk of PA (e.g., those with sustained hypertension) (10,11,19,22), whereas the Italian Society of Hypertension and the Japanese Endocrine Society recommended screening all hypertensive patients because of the high prevalence of PA in their countries (12,21). Regarding the specific target population, the Endocrine Society (2016) and the Taiwan Society of Aldosteronism suggested screening patients with sustained blood pressure greater than 150/100 mm Hg (10,19), whereas the Chinese Endocrine Society set this cutoff at 160/110 mm Hg and the France SFE/SFHTA/AFCE set it at 180/110 mm Hg (11,22). The Endocrine Society (2016) and the Chinese Endocrine Society also suggested screening hypertensive patients with sleep apnea, whereas the other guidelines and consensuses made no such recommendation (10,11).
There was no consensus on the specific ARR cutoff value indicating PA. Five documents recommended confirmatory testing (e.g., the sodium loading test or saline infusion test) for patients with a positive ARR, whereas the Italian Society of Hypertension did not recommend confirmatory testing because these tests could cause many curable cases to be missed (21). All documents except those of the Italian Society of Hypertension and the Japanese Endocrine Society (12,21) recommended genetic testing for young PA patients (<20 years) and for those with a family history of PA or of stroke at a young age (<40 years).

Table 2 presents the score of each domain for the guidelines and consensuses according to AGREE II. Reliability among the six raters was good (ICC range, 0.77 to 0.88), indicating good agreement in our quality assessment. For the two most important domains (3 and 4), four guideline documents had poor quality on domain 3, one had moderate quality (Taiwan consensus), and one had good quality (Endocrine Society practice guideline 2016), whereas all of them had good quality on domain 4 (score, >70%; mean score, 86.88%). All guideline documents also had good quality on domain 1 (mean score, 81.02%). None of the guideline documents sufficiently reported stakeholder involvement (domain 2; mean score, 54.32%) or applicability (domain 5; mean score, 47.92%). For editorial independence (domain 6; mean score, 62.04%), only the Endocrine Society practice guideline 2016 and the France SFE/SFHTA/AFCE consensus reached good quality.
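The reliability statement above relies on the ICC bands defined under Data Analysis. The article does not state which ICC form was used or how the rating matrix was arranged, so the Python sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), with rows as rated targets (for example, AGREE II items) and columns as raters. These choices, the function names, and the example matrix are assumptions for illustration only. Because the reported bands leave small gaps (e.g., between 0.90 and 0.91), the sketch assigns gap values to the higher band and treats 0.50 itself as poor, which is a further assumption.

```python
def icc_2_1(matrix: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    matrix[i][j] is the score that rater j gave to target i.
    """
    n = len(matrix)  # number of rated targets
    if n < 2:
        raise ValueError("need at least two targets")
    k = len(matrix[0])  # number of raters
    if k < 2 or any(len(row) != k for row in matrix):
        raise ValueError("need a complete matrix with at least two raters")

    grand_mean = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]

    # Two-way ANOVA sums of squares with one observation per cell.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in matrix for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined for this matrix (zero denominator)")
    return (ms_rows - ms_error) / denominator


def classify_icc(icc: float) -> str:
    """Reliability band using the cut-offs listed under Data Analysis."""
    # Assumption: gap values go to the higher band; 0.50 itself is "poor".
    if icc > 0.90:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc > 0.50:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Illustrative matrix: 6 targets (rows) scored by 4 raters (columns).
    ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    icc = icc_2_1(ratings)
    print(f"ICC(2,1) = {icc:.2f} ({classify_icc(icc)})")  # ICC(2,1) = 0.29 (poor)
```

Under these assumed bands, every value in the reported range of 0.77 to 0.88 falls in the good category, consistent with the text. Other ICC forms (for example, consistency rather than absolute agreement, or average-rater reliability) can give noticeably different values on the same matrix.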

Quality of Each Guideline and Recommendation for Use
The mean score of each guideline and consensus across the six domains ranged from 50.24 to 81.13%. The Endocrine Society practice guideline 2016 ranked highest in overall quality, whereas the document of the Italian Society of Hypertension ranked lowest. The Endocrine Society practice guideline 2016 was also the only document rated good on both of the two most important domains (3 and 4). Based on its overall quality and its scores on domains 3 and 4, the Endocrine Society practice guideline 2016 was recommended for use, although it still needs improvement, especially in the stakeholder involvement and applicability domains. Among the three consensus statements, the consensus of the Taiwan Society of Aldosteronism had the highest quality (mean score, 68.96%) and showed some potential for recommendation, but future versions will need improvements (e.g., in rigor of development) to make it more reliable for clinical practice. The remaining four guidelines or consensuses were of suboptimal quality because of unsatisfactory scores on domain 3 and/or domain 4 and low overall quality.

DISCUSSION
In the current report, we summarized the recommendations on the management of PA from existing guideline documents and evaluated their overall quality and suitability for use in clinical practice.
To the best of our knowledge, this is the first quality appraisal of PA guidelines. Overall, most of the recommendations in these guideline documents were consistent, although some minor conflicts existed. Based on AGREE II, our findings suggest that stakeholder involvement and applicability were insufficiently reported in the existing guideline documents. Except for the Endocrine Society practice guideline 2016 (10), which had good quality, the development process appeared to lack adequate rigor.

Several reasons may explain the conflicts among recommendations for the management of PA. First, the prevalence of PA differs by region, which leads to different screening recommendations. Second, and perhaps most importantly, high-quality evidence in this area is lacking. A brief look at the evidence cited in these guidelines shows that most of it came from observational studies or expert experience; such evidence is susceptible to bias and may therefore lead to conflicting recommendations. Third, medical care resources and economic status differ across regions. For example, robot-assisted surgery is used for PA in some regions but is not available in others. Fourth, what constitutes a positive screening result for PA may differ across guidelines and remains debatable.
We observed that the guideline development process was suboptimal because several documents failed to employ rigorous development methods; Kent et al. (29) reported a similar finding. As emphasized by AGREE II, practical recommendations should be derived from a comprehensive literature search, clear selection criteria, and appropriate methods for formulating recommendations, and should take both the benefits and harms of interventions into account (16,17). Indeed, a rigorous development process is the foundation of trustworthy guideline recommendations and the key step in building a "bridge" from high-quality evidence to healthcare practice.
We also observed that stakeholder involvement was underreported in these guidelines and consensuses. This might be due to insufficient collection of patients' views and preferences during guideline development. Similarly suboptimal reporting of stakeholder involvement has been documented in previous studies (29)(30)(31). Although such information may contribute little to the reliability of recommendations from the perspective of physicians and surgeons, incorporating patients' opinions may help support informed decision-making in guideline development.
The current study critically appraised the quality of PA guidelines on the basis of a comprehensive literature search and a well-established instrument. Our findings have several implications for future PA guidelines. First, guidelines should clearly describe how the evidence was searched, assessed, and linked to the recommendations. Second, the facilitators of and barriers to applying the recommendations should be reported. Third, stakeholder involvement and editorial independence should be reported more informatively.

CONCLUSIONS
In summary, the recommendations on the management of PA were largely consistent among existing guidelines and consensuses, although some minor conflicts were found. The overall quality of PA guidelines and consensuses is suboptimal, and further efforts are needed to improve it. Taking into account overall quality and the scores of domains 3 and 4, the Endocrine Society practice guideline 2016 has the highest quality and can be recommended for use.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
ZM, LZ, and XW conceived the idea. ZM searched and screened the literature and drafted the manuscript. CX screened the literature and analyzed the data. ZM, LZ, ZD, CX, GQ, and MP assessed the quality. XW, YZ, and JK revised the manuscript.