A Call for Research on the Prognostic Role of Follow-Up Histology in Celiac Disease: A Systematic Review

Background: Convincing evidence is lacking on the benefit of follow-up biopsy in celiac disease. Regardless, achieving mucosal recovery (MR) has remained a desirable goal of therapy. We aimed to conduct a systematic review to determine whether MR is a protective factor and whether persisting villous atrophy (PVA) has negative consequences on long-term outcomes of celiac disease. Methods: Seven databases were searched for articles discussing celiac patients on a gluten-free diet who had a follow-up biopsy and in which clinical and laboratory characteristics were reported by follow-up histology (MR vs. PVA). Outcomes included clinical symptoms, mortality, malignant tumors, nutritional parameters, and metabolic bone disease. Comparative and descriptive studies were included. Since the data proved to be unsuitable for meta-analysis, the evidence was synthesized in a systematic review. Results: Altogether, 31 studies were eligible for systematic review. Persisting symptoms were more frequently associated with PVA than with MR, although many symptom-free patients had PVA and many symptomatic patients achieved MR. PVA might be a risk factor for lymphomas, but mortality and the overall rate of malignant tumors seemed independent of follow-up histology. Patients with PVA tended to develop metabolic bone disease more often, although fracture risk remained similar in the groups except for hip fractures, for which PVA was a risk factor. Reports on nutritional markers are only anecdotal. Conclusions: The limited evidence calls for high-quality prospective cohort studies to be arranged to clarify the exact role of follow-up histology in celiac disease.


INTRODUCTION
Celiac disease (CeD) is a systemic disorder with an increasing worldwide prevalence of ∼1% (Di Sabatino and Corazza, 2009; Catassi et al., 2014). In CeD, the immune-mediated destruction of small intestinal villous architecture is triggered by dietary gluten (Reilly et al., 2017).

Rationale of the Study
Although the value of intestinal biopsy at diagnosis is beyond dispute (Bai et al., 2013; Rubio-Tapia et al., 2013; Ludvigsson et al., 2014), the role of follow-up biopsy is a matter of controversy (Pekki et al., 2017). While the restitution of intestinal villi is expected on a strict gluten-free diet, the mucosa fails to restore entirely in a considerable fraction of patients (Szakacs et al., 2017). Recent guidelines recommend a follow-up biopsy if signs and symptoms persist or relapse despite strict adherence to a gluten-free diet (Bai et al., 2013; Rubio-Tapia et al., 2013; Ludvigsson et al., 2014). However, reports proposed that neither the resolution of symptoms (Bardella et al., 2007; Biagi et al., 2014; Fang et al., 2017; Mahadev et al., 2017) nor strict dietary adherence (Szakacs et al., 2017) guarantees mucosal recovery (MR). One might expect that CeD patients with persistent villous atrophy (PVA) experience a less favorable disease course than those achieving MR (Haines et al., 2008), although convincing evidence is lacking. Yet, achieving MR has remained a desirable goal in CeD. The importance of the topic is rooted in the burden imposed by the endoscopic procedures and duodenal histological sampling as well as in the subsequent clinical decisions based on follow-up histology.

Objective of the Study
With this systematic review, we aimed to be the first to systematically collect all available evidence on the impact of follow-up histology (MR vs. PVA) on the disease characteristics and clinical course of CeD.

Search Strategy
We performed a systematic literature search in MEDLINE (via PubMed), Embase, Cochrane Central Register of Controlled Trials (CENTRAL), Web of Science, Scopus, WHO Global Index Medicus, and ClinicalTrials.gov from inception up to 14th September 2019 for relevant articles. Free-text terms and Medical Subject Headings were combined into a query, as follows: celiac AND ("mucosal recovery" OR "mucosal healing" OR "mucosal atrophy" OR "intestinal atrophy" OR "duodenal atrophy" OR "villous atrophy" OR "persistent mucosal damage" OR "follow-up biopsy" OR "followup duodenal biopsy" OR "follow-up intestinal biopsy" OR "follow-up small intestinal biopsy" OR "follow-up histology" OR "repeated biopsy" OR "repeated histology" OR "control biopsy" OR "control histology"). No filters were imposed upon the search.
Relevant cited articles were explored by reviewing the reference lists of included papers. Citing articles were identified with Google Scholar.
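As an illustration only, the Boolean query quoted above can be assembled programmatically, which makes it easier to reuse the same term list across databases. The following is a minimal sketch: the helper name `build_query` and the variable names are hypothetical and not part of the original search protocol, and the phrases are copied as printed in the text.

```python
# Hypothetical helper (not from the original protocol): rebuilds the
# Boolean search string from a list of free-text phrases.
TERMS = [
    "mucosal recovery", "mucosal healing", "mucosal atrophy",
    "intestinal atrophy", "duodenal atrophy", "villous atrophy",
    "persistent mucosal damage", "follow-up biopsy",
    "followup duodenal biopsy", "follow-up intestinal biopsy",
    "follow-up small intestinal biopsy", "follow-up histology",
    "repeated biopsy", "repeated histology",
    "control biopsy", "control histology",
]  # phrases copied as printed in the Search Strategy text


def build_query(disease: str, phrases: list[str]) -> str:
    """Quote each phrase, join the phrases with OR, and AND them to the disease term."""
    alternatives = " OR ".join(f'"{phrase}"' for phrase in phrases)
    return f"{disease} AND ({alternatives})"


QUERY = build_query("celiac", TERMS)
print(QUERY)
```

Individual databases may require their own field tags or syntax adjustments, which this sketch does not attempt to model.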

Eligibility Criteria
Eligible papers discussed CeD patients on a gluten-free diet with an available record of duodenal follow-up histology, and reported on disease and patient characteristics by follow-up histology. Analytical and descriptive full-text articles or conference abstracts, but not case studies, were included without language restriction to reduce publication bias.
Disease and patient characteristics included signs and symptoms, vitamin and mineral levels, anemia, body mass index, metabolic bone disease, malignant tumors and other co-morbid conditions, and long-term mortality.

Selection and Data Collection
Records were combined in reference management software (EndNote X7.4, Clarivate Analytics, Philadelphia, PA, USA) to remove duplicates and overlapping database content. Then, the standard three-step selection was performed by title, abstract, and full text. Each step was carried out by two investigators in duplicate. Discrepancies were resolved by third-party arbitration. The kappa (κ) statistic was used to measure the agreement between the investigators after each step.
Numeric and text data were extracted by two investigators onto a pre-defined Excel sheet; discrepancies were resolved by consensus. Although we contacted the authors of the original studies for further raw data via email, we discarded these data from the systematic review when we realized that the material was unsuitable for meta-analysis for several reasons (as detailed later).
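To illustrate the agreement statistic mentioned above, the sketch below computes Cohen's kappa for two raters. It is an assumption that Cohen's variant was the one used, since the text does not specify it; the function name and the screening decisions are hypothetical and serve only as a worked example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical title-screening decisions for ten records (illustrative only).
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 records (observed agreement 0.80) while chance agreement is 0.52, so kappa is lower than the raw agreement would suggest.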

Design of the Studies Included and Quality Assessment
First, the design of the included papers was identified. Then, quality indicators were chosen based on the Quality in Prognostic Studies tool (Hayden et al., 2006), as follows:
• way of recruitment,
• diagnosis of CeD (only biopsy-verified cases or not),
• the recency of diagnosis (newly diagnosed patients or treated patients were included),
• representativeness of study population to the general CeD population (based on the inclusion and exclusion criteria of the individual studies),
• timing of follow-up biopsy (taken prospectively after enrolment or earlier),
• time elapsed between the diagnosis of CeD and the follow-up biopsy, and that between the follow-up biopsy and the measurement of outcomes,
• definition of PVA (histological classification),
• biopsy sampling site,
• timing of outcome assessment (prospectively after enrolment or earlier),
• definitions of outcomes (with cut-off values if applicable),
• blinding,
• adherence to a gluten-free diet (strict or not), and
• statistical considerations (the analysis directly compared the clinical characteristics by MR and PVA or not and the analysis was adjusted for reasonable confounding factors or not).

Highlights of the Last Five Years
In this subchapter, we summarize the findings of the studies which investigated the prognostic role of follow-up histology and were published after January 2015 (Lebwohl et al., 2013a, 2015a; Pekki et al., 2015, 2017; Haere et al., 2016; Fang et al., 2017; Mahadev et al., 2017; Emilsson et al., 2018; Kurien et al., 2018; Ludvigsson et al., 2018). Eight studies were conducted in Scandinavia, one in the USA, and there was another multicenter study from Europe and North America. Evidence from univariate analysis suggested a borderline association of PVA with persisting symptoms (OR = 1.656 with CI: 0.949-2.889; p = 0.076 favoring mucosal recovery) (Fang et al., 2017), which was not confirmed by co-variate-adjusted analysis of the baseline cohort of patients of a multicenter randomized trial on symptomatic CeD patients (Mahadev et al., 2017). Conclusions from three Scandinavian studies on symptom and well-being scores corroborated these findings (Pekki et al., 2015, 2017; Haere et al., 2016). PVA might be associated with lower lumbar T-score measured at 1 year after diagnosis, although neither the risk of osteoporosis at 5 years after diagnosis nor that of fractures at 2 years after diagnosis was higher in this group (Pekki et al., 2015, 2017). Decreased vitamin D and calcium levels might contribute to the impaired bone mineral density (Fang et al., 2017). Regarding other nutrients and minerals, zinc deficiency should be highlighted (39 vs. 14% in patients with PVA and MR, respectively; p = 0.0005). The effect of PVA did not manifest itself regarding erythropoiesis: three studies reported similar hemoglobin levels and rate of anemia in patients with PVA and MR (Pekki et al., 2015; Fang et al., 2017; Mahadev et al., 2017). Rate of malignant tumors and that of lymphomas were not significantly different in histology groups in two studies 5 and 8 years after diagnosis (Pekki et al., 2015, 2017).
Similar neutral associations were found on the rate of cardiovascular diseases (Lebwohl et al., 2015a; Mahadev et al., 2017), serious infections including sepsis, streptococcal, pneumococcal, influenza, herpes zoster, and Clostridium difficile infections (Emilsson et al., 2018), and adverse pregnancy outcomes (Lebwohl et al., 2015b). MR seemed to be a protective factor against respiratory and dermatological diseases (Pekki et al., 2017). In contrast, anxiety and depression co-occurred more frequently with MR as did epilepsy (Kurien et al., 2018). With respect to overall mortality, a higher rate was reported with PVA at 8-year follow-up (14 vs. 9%), but the difference was not statistically significant (p = 0.259).
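As a reading aid for the effect estimates quoted in this subchapter, the sketch below shows how a reported ratio estimate and its confidence interval can be checked against the accompanying p-value. It assumes a 95% confidence level and a Wald-type (normal approximation on the log scale) interval; neither assumption is stated in the source, and the function name is hypothetical. The input values are the OR and CI reported for persisting symptoms (Fang et al., 2017).

```python
import math
from statistics import NormalDist


def p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Two-sided Wald p-value implied by a ratio estimate (OR/HR) and its CI.

    Assumes the interval is symmetric on the log scale (assumption, not
    confirmed by the source).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # Standard error of the log ratio, recovered from the CI width.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Values as reported in the text: OR = 1.656, CI: 0.949-2.889.
se, z, p = p_from_ratio_ci(1.656, 0.949, 2.889)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the implied p-value is close to the reported p = 0.076, and the interval includes 1, which matches the description of the association as borderline.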

Narrative Review
In this subchapter, we summarize all available evidence published on the effect of MR and PVA on the outcomes of CeD.

Metabolic Bone Disease and Bone Fractures
Four studies assessed bone mineral density in univariate analysis (Valdimarsson et al., 1994; Walters et al., 1995; Kaukinen et al., 2007; Pekki et al., 2015) (Table 5). Patients with PVA tended to have lower forearm, femoral and trochanter Z-scores (Valdimarsson et al., 1994), and a lower femoral T-score with similar femoral Z- and lumbar T- and Z-scores than those with MR (Pekki et al., 2015). One study showed a reduced bone mineral density with PVA (OR = 24.5, p < 0.0275 favoring MR) (Walters et al., 1995). Two studies investigated the long-term risk of osteoporosis and reported conflicting results: a case-control study reported an increased frequency of osteoporosis with PVA (Kaukinen et al., 2007), but a cohort study failed to confirm this (Pekki et al., 2015).
Three studies (Valdimarsson et al., 1994; Lebwohl et al., 2014; Pekki et al., 2017) reported no association between fractures and follow-up histology, except for hip fractures, the frequency of which was seemingly increased with PVA after a 5-year follow-up (adjusted HR = 2.18, CI: 1.17-4.05) (Lebwohl et al., 2014) (Table 6).
Table footnotes: Multivariate analyses are highlighted with bold. a Symptoms included typical (chronic diarrhea, weight loss, anemia), GERD-like, and lower abdominal symptoms (abdominal pain, constipation). b Analysis was adjusted for age, gender, duration of GFD, diagnostic histological severity. c Symptoms included abdominal pain, bloating, nausea, fatigue, diarrhea, bloody stool, steatorrhea, and weight loss. d Symptoms were assessed 4-5 years after the follow-up biopsy. e Symptoms were evaluated after follow-up biopsy (interval undetermined). f Symptoms included gastrointestinal and malabsorptive symptoms, such as diarrhea, borborygmi, abdominal pain, fatty stool, and anemia. g The analysis was adjusted for age, gender, body mass index, duration of GFD, medications, and laboratory tests. (?) indicates uncertainty. CeD, celiac disease; CI, confidence interval; GERD, gastroesophageal reflux disease; GFD, gluten-free diet; m, month; min, minimum; OR, odds ratio; pts, patients; NS, non-significant; SD, standard deviation; Q1-Q3, 25% and 75% quartiles; y, year.
Frontiers in Physiology | www.frontiersin.org
Table footnotes: All measurements were performed with dual-energy X-ray absorptiometry except for forearm bone mineral density in the study of Valdimarsson et al. (1994), which used single-photon absorptiometry. a The number of patients was not reported for atrophic and recovery groups separately. b Parameter was assessed years after follow-up biopsy. (?) indicates uncertainty. GFD, gluten-free diet; m, month; min, minimum; NS, non-significant; pts, patients; Q1-Q3, 25% and 75% quartiles; SD, standard deviation; y, year.

Micro- and Macronutrients
Table 7 summarizes the laboratory findings. The association of water-soluble vitamins and follow-up histology is understudied. We found the numerical values of vitamin B2 and B6 levels or their laboratory indicators in patients with MR and PVA, but the studies did not perform statistical comparison (Dickey et al., 2008). Regarding vitamin B12 and B9 (folic acid) levels, studies found no difference between groups by histology (Pekki et al., 2015; Fang et al., 2017). No reports are available on other water-soluble vitamins.
Reports on fat-soluble vitamins are scarce. Only numerical data are available on vitamin A, without statistical comparison made by the authors (Valdimarsson et al., 1994). Patients with PVA tended to have lower 25-hydroxyvitamin D levels [and total calcium (Fang et al., 2017)], though serum parathyroid hormone (Valdimarsson et al., 1994; Pekki et al., 2015), ionized calcium (Valdimarsson et al., 1994; Pekki et al., 2015) and alkaline phosphatase (Valdimarsson et al., 1994) seemed similar in patients with MR and PVA.
Zinc but not copper deficiency may be associated with PVA (39 vs. 14% for zinc deficiency with PVA and MR, respectively, p = 0.0005) (Fang et al., 2017).

Body Mass Index
Of the six studies reporting body mass index, three found no statistically significant difference between groups, while the other three reported numerical values without analysis (Valdimarsson et al., 1994; Kaukinen et al., 2007; Dickey et al., 2008; Tuire et al., 2012; Pekki et al., 2015; Cornell et al., 2016) (Table 9).
Incidence of lymphomas was 1.4 vs. 1.6% (p = 0.968) with PVA vs. MR in univariate analysis (Pekki et al., 2015). In a registry analysis, PVA had an adjusted HR of 2.26 (CI: 1.18-4.34) for the overall rate of lymphomas, being true for the subgroup of non-Hodgkin lymphomas but not for the subset of T-cell lymphomas (Lebwohl et al., 2013b) (Table 10).
Sporadic records concerned other co-morbid conditions. Cardiovascular diseases including heart failure and atrial fibrillation (Lebwohl et al., 2015a), hypertension (Mahadev et al., 2017), serious infections (including sepsis, streptococcal, pneumococcal, influenza, herpes zoster, and Clostridium difficile infections) (Emilsson et al., 2018) and adverse pregnancy outcomes (Lebwohl et al., 2015b) were not influenced by follow-up histology. However, there was a significant reduction in the frequency of respiratory and dermatological diseases (Pekki et al., 2017) and in the rate of elevated aspartate and alanine transaminase levels with MR (Mahadev et al., 2017). In contrast, MR predisposed to developing anxiety and depression as well as epilepsy (Kurien et al., 2018). The effect of follow-up histology on immune-mediated co-morbidities including dermatitis herpetiformis is severely understudied (Valdimarsson et al., 1994; Tuire et al., 2012; Mahadev et al., 2017).
Table footnotes: a Only the total number of pts was given. b Parameter was measured years after the follow-up biopsy. c Parameter was measured within 1 month of follow-up biopsy. d Values were within the normal range for all patients and not reported for recovery and atrophic groups separately. (?) indicates uncertainty. CeD, celiac disease; EGRAC, erythrocyte glutathione reductase activation coefficient; GFD, gluten-free diet; m, month; min, minimum; NS, non-significant; Q1-Q3, 25% and 75% quartiles; SD, standard deviation; y, year.

Summary of Findings
Whether a follow-up biopsy is needed in asymptomatic CeD is uncertain, and, to date, no systematic review has addressed this question. Achieving MR is a desirable goal of treatment; however, the findings of reports that investigated its beneficial effects on disease course are inconsistent. In line with our previous findings, MR rates ranged from 9 to 97% across the studies included (Szakacs et al., 2017). The fact that a considerable fraction of CeD patients does not achieve MR underlines the importance of investigating the potential prognostic role of follow-up histology. One important conclusion of this review is that many asymptomatic patients do not achieve MR, and vice versa, many symptomatic patients do achieve MR. Analogously, MR cannot guarantee that symptoms disappear, whereas PVA is often associated with persisting symptoms, even on long-term follow-up exceeding 1 year (Tables 2, 3). Persisting symptoms may indicate poor dietary adherence (Abdulkarim et al., 2002; Leffler et al., 2007; Haere et al., 2016) or gluten-independent food intolerance (Carroccio et al., 2008), but other diseases, such as pancreatic insufficiency or small intestinal bacterial overgrowth, might be in the background (Fine et al., 1997). It is noteworthy that a few studies included patients with questionable adherence to a gluten-free diet, although a top-quality study established a direct relationship between persisting symptoms and PVA in co-variate-adjusted analysis of patients with good adherence (Carroccio et al., 2008). Studies used various definitions for assessing symptoms; some did not report how symptoms were specified at all. Although a smaller study favored MR regarding a set of individual symptoms (Carroccio et al., 2008), the most extensive study including only symptomatic patients found significant associations in univariate analysis (with an unexpected inverse association between PVA and heartburn).
Interestingly, all associations proved to be nonsignificant after adjusting for co-variates (Mahadev et al., 2017). Self-reporting of subjective complaints may contribute to the discrepant results. Use of symptom scales (Svedlund et al., 1988) provided a comparable measure with homogeneous results: none of the studies attributed higher scores to patients with PVA than to those with MR (Table 4). The same applies to quality of life, although this is based on small sets of patients (Table 4). Altogether, the diagnostic accuracy of persisting symptoms seems insufficient for indicating a follow-up biopsy. Evidence from the two prospective studies is not enough to decide whether PVA precisely predicts the long-term persistence of symptoms (Pekki et al., 2015, 2017) (Table 4). PVA appears to be associated with radiologically detected metabolic bone disease in univariate analysis (Table 5), which translates into an increased risk of hip fractures (as an independent predictor) but not into that of overall fractures (Table 6). Calcium malabsorption (Fang et al., 2017) together with vitamin D deficiency (Valdimarsson et al., 1994; Fang et al., 2017) are likely to be causative factors. Length of follow-up (not standardized among studies) may be insufficient for restoring bone mineral density (Pekki et al., 2015), especially in those with severe bone impairment at diagnosis. Body mass index, as an important co-variate, seems to be independent of mucosal status (Table 9).
Laboratory studies describe that vitamin and mineral levels tend to recover on an adequate diet. Patients on a long-term strict gluten-free diet may suffer from vitamin deficiencies (Hallert et al., 2002, 2009); this phenomenon might be explained by the vitamin- and micronutrient-poor nature of the gluten-free diet as compared to a balanced gluten-containing diet (Thompson, 1999). None of the studies adjusted for dietary adherence and vitamin supplementation, which might counteract malabsorption and mask the effects of PVA. Variability in clinical and histological severity at diagnosis might delay villous restitution, thereby affecting vitamin status (not taken into account in the studies) (Kemppainen et al., 1998). Besides, improvement of histology might lag behind the recovery of laboratory values (Pekki et al., 2015), although the length of follow-up exceeded 1 year in all studies (Table 7).
Regarding mortality, the adjusted HRs calculated in the individual studies attributed a neutral effect to MR and PVA (Table 11) (Rubio-Tapia et al., 2010; Lebwohl et al., 2013a). Findings were similar on malignant tumors, except in certain lymphoproliferative diseases based on an extensive registry analysis (it is noteworthy that data were not adjusted for dietary adherence) (Lebwohl et al., 2013b). A possible explanation could be that the increment in rates of mortality and tumors with PVA was counteracted by a decreased risk of ovarian, endometrial and breast cancer, likely due to lower body weight and/or earlier menopause (Ludvigsson et al., 2012; Lebwohl et al., 2013a).
Table footnotes: Multivariate analyses are highlighted with bold. a The analysis was adjusted for age and gender. b Total mortality of the cohort of patients was 8.0%. c The analysis was unadjusted. d The analysis was adjusted for age, gender, calendar period of diagnosis, education level, and length of follow-up. CeD, celiac disease; CI, confidence interval; GFD, gluten-free diet; HR, hazard ratio; pts, patients; m, month; y, year.

Strengths and Limitations
Our study has several strengths. The question we raised is unique, with no known previous systematic review. We carried out an extensive systematic search with high coverage of patient-important outcomes in a transparent manner by independent (and pre-trained) investigators with good inter-rater agreement. The quality of the included papers was rigorously assessed. Although the amount of data presented in the tables would be enough for the meta-analytical aggregations that we initially intended to perform (as declared in the protocol of the work, see the PROSPERO record), our team finally decided not to do so because of several concerns. The decision-making algorithm is presented in Figure 2.
We are aware that the evidence suffers from several limitations. (1) Most studies did not recruit patients immediately after the diagnosis of CeD (29/31, 94%), did not take the follow-up biopsy after recruitment (23/31, 74%) and did not assess the disease course prospectively. (2) Although the diagnosis of CeD was mainly biopsy-proven, only two papers included newly diagnosed cases and observed them longitudinally until the follow-up biopsy. (3) Details of histological sampling and processing (e.g., sampling site, number of tissue samples, orientation, staining) were not consistently reported across papers, while the procedures often deviated from recent gold standards (e.g., samples were taken from the jejunum, histological classification systems were not specified). Similarly, the definition of villous atrophy varied across the studies. (4) Observational studies are vulnerable to bias: in registry analyses, data of the deceased were often missing (survivorship bias), attendance at regular control visits was incomplete (attrition bias), baseline differences between groups of patients with MR and those with PVA were rarely balanced with co-variate-adjusted analyses (selection bias), study samples were not taken consistently from the general CeD population with consecutive recruitment (selection bias), and investigators assessing the outcomes were rarely blinded to mucosal status (performance bias, detection bias). (5) Most evidence came from studies focusing on adults, while an inverse association between age and rate of MR is well-known (Szakacs et al., 2017). (6) A strict gluten-free diet was not a criterion in several studies, and the length of diet varied. (7) Publication bias cannot be reliably assessed in systematic reviews.

Implications for Practice
The results of publications on the prognostic role of follow-up histology (that is, MR and PVA) are not in agreement. Some adverse outcomes (e.g., persistent symptoms and metabolic bone disease) may be more common with PVA; however, achieving MR alone cannot guarantee a favorable clinical course to our current knowledge. The question as to whether taking a follow-up biopsy is beneficial has remained a matter of debate.
FIGURE 2 | Decision-making algorithm. Based on several arguments, we decided not to perform a meta-analysis.

Implications for Research
With a view to the future, we urge that prospective cohort studies be organized to collect decisive evidence on the prognostic role of follow-up histology.

DATA AVAILABILITY STATEMENT
All datasets analyzed for this study are included in the manuscript/supplementary files.

AUTHOR CONTRIBUTIONS
ZS and JB conceptualized the study. AM, DD, VB, LS, AV, and RH collected the data. NG performed the formal analysis and designed the figures. ZS assessed the quality of studies. ZG, DC, and MS interpreted the results. ÁV, ZS, JB, and BE drafted the manuscript. ÁV and PH supervised the work. All authors were involved in the study design and edited, read, and approved the final manuscript. Sponsors had no effect on study design, data collection, analysis, interpretation of the findings, or preparation of the manuscript.