Diagnostic Biomarkers to Diagnose Acute Allograft Rejection After Liver Transplantation: Systematic Review and Meta-Analysis of Diagnostic Accuracy Studies

Objective: A systematic review and meta-analysis of diagnostic biomarkers for noninvasive diagnosis of acute allograft rejection following liver transplantation. Background: Noninvasive blood and urine markers have been widely explored in recent decades for diagnosing acute rejection after liver transplantation. However, none have been translated into routine clinical use so far due to uncertain diagnostic accuracy, and liver biopsy remains the gold standard. Methods: Systematic literature searches of Medline, Cochrane and Embase were conducted up to February 2019 to identify studies evaluating the use of noninvasive markers in diagnosing allograft rejection following liver transplantation. Meta-analysis was performed using a random effects model with DerSimonian–Laird weighting and the hierarchical summary receiver operating curve. Results: Of 560 identified studies, 15 studies (1,445 patients) met the inclusion criteria. The following markers were tested: acid labile nitroso-compounds (NOx), serum amyloid A protein, procalcitonin, peripheral blood eosinophil count, peripheral blood T-cell activation and interleukin 2 (IL-2) receptor, guanylate-binding protein-2 mRNA, graft-derived cell-free DNA, pi-glutathione S-transferase, alpha-glutathione S-transferase and serum HLA class I soluble antigens. Only eosinophil count was tested in multiple studies, and they demonstrated high heterogeneity (I2 = 72% [95% CI: 0.5–0.99]). IL-2 receptor demonstrated the highest sensitivity (89% [95% CI: 0.78–0.96]) and specificity (81% [95% CI: 0.69–0.89]). Conclusion: IL-2 receptor expression demonstrated the highest diagnostic accuracy, while the peripheral eosinophil count was the only marker tested in more than one study. Presently, liver biopsy remains superior to noninvasive diagnostic biomarkers as most studies exhibited inferior designs, hindering possible translation into clinical application.


INTRODUCTION
Liver transplantation (LT) exhibits 20-year survival rates of up to 50% (1) and is of great clinical importance. LT is the treatment of choice for acute or acute-on-chronic liver failure, while organ replacement therapies have yet to achieve a breakthrough (2). LT is superior to liver resection in suitable patients with hepatocellular carcinoma in cirrhosis, the most common primary liver cancer and the second leading cause of cancer-related deaths worldwide (3,4).
Within the first few weeks after transplantation, patients are at high risk of acute rejection (AR), with the incidence ranging from 50 to 70% (5), depending on the immunosuppressive regimen selected. AR can be described as an immune response against donor tissues resulting from T-cell recognition of alloantigens. This overwhelming immune response compromises graft integrity and can lead to life-threatening graft loss. Thus, AR is the most common cause of transplant failure and the most common indication for re-transplantation. Indeed, the early diagnosis of AR is crucial for successful anti-rejection therapy and maintenance of graft function and integrity. The importance of prompt AR diagnosis and management is increased by organ shortage and an increasing proportion of marginal organs.
The liver graft and liver function can be monitored by standard blood tests such as total bilirubin, alanine aminotransferase, aspartate aminotransferase, γ-glutamyl transpeptidase, and alkaline phosphatase. Leukocytosis and eosinophilia are also frequently present (6). In addition, the trough blood levels of immunosuppressive drugs can be monitored and may predict AR risk (7). Nevertheless, standard laboratory tests are nonspecific and are not suitable for the efficient and timely diagnosis of AR (8). In the case of suspected AR, liver biopsy with histologic specimen diagnosis and grading remains the diagnostic tool of choice in routine clinical practice. Nonetheless, liver biopsy is an invasive procedure that can be associated with severe complications and is performed primarily by trained staff in transplant centers.
The concept of noninvasive measurement applies to biomarkers in the saliva, peripheral blood, urine or other body fluids (e.g., cytokines or surface proteins of different immune cells) (8). Such diagnostic biomarkers have been explored to replace liver biopsy and are predominantly used in the field for diagnostic and monitoring purposes. The first attempt to noninvasively diagnose allograft rejection in LT patients occurred 30 years ago (9). Since then, numerous studies have been published, but a single method has not yet been adopted into routine clinical use. New biomarkers are compared with liver biopsy in terms of their usefulness for diagnosing AR.
Markers for AR must be judged by their sensitivity and specificity, statistical measures that determine the applicability of diagnostic tests. Sensitivity, the true positive rate, refers to how well a test identifies transplant patients who are suffering from AR. Specificity is the true negative rate and is of particular interest because most known markers are not able to discriminate AR from liver dysfunction, cytomegalovirus infection and hepatitis C virus infections (8). Indeed, a diagnostic tool must be validated for accuracy, otherwise its significance remains uncertain. In addition, the test should be easily performed with results that are available on the same day so that anti-rejection treatment may be initiated as soon as possible.
Therefore, the aim of this study was to perform a systematic review and meta-analysis evaluating the diagnostic accuracy of noninvasive markers in the diagnosis of AR compared with conventional liver biopsy in patients following LT.

METHODS
The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (10) guidelines and was registered with the International Prospective Register of Systematic Reviews (PROSPERO: CRD42017072425).

Literature Search
The authors searched MEDLINE (via PubMed), EMBASE and the Cochrane Library electronic databases. No date restriction was applied, and the literature search was performed in August 2017 and updated in February 2019. The search terms were initially reviewed by the author group and were sent to an expert librarian to ensure completeness and accuracy according to the PRESS Guideline Statement (11). The specific search strategy for MEDLINE/PubMed conducted in January 2018 was: Search (((((((((((((("acute rejection") OR ("acute allograft rejection") OR ("acute graft rejection") OR ("acute liver allograft rejection") OR ("acute reject*") OR ("acute cellular rejection") OR (ACR) OR ("early rejection") OR ("early reject*") OR ("early allograft rejection") OR ("early graft rejection") OR ("early liver allograft rejection") OR ("early liver graft rejection") OR ("early cellular rejection")))) AND ((biomarker OR biomarkers OR marker OR non-invasive OR liquid-biopsy))) AND (((liver transplantation) OR (liver graft*) OR (liver transplant*)))) AND ( ))))))))))). The same search strategy was applied to the Cochrane Library and EMBASE.

Selection Criteria
Studies were included in the systematic review if they met the following criteria:
• The study was designed as a diagnostic accuracy trial testing a noninvasive biomarker(s) of AR in LT patients.
• AR was defined according to the International Consensus Document on Terminology of Hepatic Allograft Rejection (6).
• The index test was primarily used to diagnose AR.
• The study used liver biopsy and histopathological grading as the reference test to diagnose AR.
• The study presented sufficient data to create a 2 × 2 contingency table.
• The study enrolled adult patients (≥18 years of age).
• The study was published in English.
Studies that predicted AR at some time in the future or graded patient risk for AR at any time point following LT were excluded, because predicting the likelihood of future AR is different from diagnosing AR in clinical practice, where anti-rejection therapy must be initiated on the same day as diagnosis. Therefore, studies needed to specify the time interval between sampling and confirmation of biopsy-proven acute rejection (BPAR), ensuring that test samples were temporally linked to BPAR for analysis.

Study Selection and Data Extraction
All articles identified by the search were screened and excluded if they did not meet the inclusion criteria. The full texts of all potentially relevant studies were reviewed in detail. Data were extracted using a predetermined standardized form and included the following information: study design (cohort/single-gate or case-control/multi-gate), characteristics of participants, time of follow-up and regimen of immunosuppressive therapy. Test validity was assessed by the total number of patients with AR/no AR, prevalence of AR in the sample, and test parameters including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), likelihood ratio (LR), and area under the curve (AUC).

Risk of Bias Assessment
The Revised Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) was used to appraise the reliability and applicability of the study findings (Supplementary Table 1) (12). The signaling questions were answered independently by two reviewers (FK and EK). Any discrepancy between the reviewers was resolved through discussion until a common conclusion was achieved. The assessment covered patient selection, index test, reference standard, and study flow and timing. Each item was answered with yes, no or unclear, indicating a low, high or unclear risk of bias, respectively.

Statistical Analysis
Contingency tables were constructed based on the values for sensitivity, specificity and corresponding sample size given in the studies. The extracted data were then used to calculate the PPV, NPV, and the positive and negative LR for each test when not reported in the manuscript. Confidence intervals were displayed in forest plots (13).
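As a hedged illustration of this step, the following Python sketch rebuilds a 2 × 2 table from a reported sensitivity, specificity and group sizes, and then derives PPV, NPV and both likelihood ratios. The function name `accuracy_from_summary` and the rounding of cell counts to whole samples are assumptions made for illustration; this is not the authors' code (the analyses were run in R). The usage example takes the GBP2 figures reported by Kobayashi et al. (sensitivity 63%, specificity 85%, 19 AR cases, 46 samples) and assumes that the remaining 27 samples were AR-negative.

```python
import math


def _ratio(num, den):
    # Return NaN instead of raising when a measure is undefined.
    return num / den if den else math.nan


def accuracy_from_summary(sensitivity, specificity, n_ar, n_no_ar):
    """Rebuild a 2 x 2 table and derive accuracy measures.

    Cell counts are rounded to whole samples (an assumption).
    """
    tp = round(sensitivity * n_ar)      # true positives
    fn = n_ar - tp                      # false negatives
    tn = round(specificity * n_no_ar)   # true negatives
    fp = n_no_ar - tn                   # false positives

    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_pos": _ratio(sens, 1 - spec),
        "lr_neg": _ratio(1 - sens, spec),
    }


# GBP2 (Kobayashi et al.); 27 AR-negative samples is an assumption.
gbp2 = accuracy_from_summary(0.63, 0.85, n_ar=19, n_no_ar=27)
print(gbp2)
```

Under these assumptions the derived likelihood ratios (about 4.26 and 0.43) agree closely with the +LR of 4.2 and -LR of 0.43 stated for GBP2.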
After critical appraisal, a meta-analysis was conducted to pool study estimates of specific markers. All pooled outcome measures were determined using the random effect model described by DerSimonian and Laird. The risk ratio (RR) of patients positive for eosinophilia against the risk of patients negative for eosinophilia to suffer from AR after transplantation confirmed by liver biopsy was estimated for each study. Heterogeneity among studies was quantified using the I² statistic, which describes the percentage of variation across studies due to heterogeneity rather than chance (14). The RR of each study was plotted against the respective measures of study size to investigate any existing bias (15) and was visualized through a funnel plot. Lastly, a hierarchical summary receiver operating curve (HSROC), as described by Rutter and Gatsonis, was used to simultaneously estimate the summary receiver operating curve (SROC) and the expected operating point on the curve for the diagnostic accuracy studies testing eosinophilia (16). All statistical analyses were conducted using statistical software R (www.r-project.org), and P < 0.05 was considered significant.
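The pooling step can be sketched in Python as follows. This is a minimal illustration of DerSimonian-Laird random-effects pooling of risk ratios with the I² statistic, not the authors' R code. The function name `dersimonian_laird` is hypothetical, the study counts in the usage example are invented, and the sketch assumes at least two studies with no zero cells.

```python
import math


def dersimonian_laird(studies):
    """Pool risk ratios with DerSimonian-Laird random-effects weights.

    studies: list of (events_pos, n_pos, events_neg, n_neg), e.g. AR
    events among marker-positive and marker-negative patients.
    """
    log_rr, var = [], []
    for a, n1, c, n2 in studies:
        log_rr.append(math.log((a / n1) / (c / n2)))
        var.append(1 / a - 1 / n1 + 1 / c - 1 / n2)  # variance of log RR

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in var]
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))
    df = len(studies) - 1

    # Between-study variance (tau^2) and I^2 in percent.
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_term)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * y for wi, y in zip(w_re, log_rr)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "tau2": tau2,
        "i2": i2,
    }


# Invented counts for illustration only; not data from the included studies.
example = [(30, 100, 10, 100), (12, 100, 12, 100), (40, 200, 15, 200)]
print(dersimonian_laird(example))
```

When the studies agree closely, tau² is estimated as zero and the result reduces to the fixed-effect estimate; larger between-study variation widens the confidence interval and raises I².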

RESULTS
Of the initial 560 references retrieved from the databases, 104 were filtered for full-text review after the titles and abstracts were screened. Of these, only 15 studies fulfilled all inclusion criteria and were included in the systematic review. A PRISMA flow chart depicting the flow of information through the different phases of the literature review is shown in Figure 1. All studies were published between 1994 and 2019.

Baseline Study Characteristics
The lead author's name, study center, study design, studied index test, follow-up period and sample size are listed for each included study in Table 1. A risk of bias assessment using the QUADAS-2 tool was carried out for all studies (12), and only 6 studies exhibited low bias in all four parameters (Table 2). The overall allocation of risk of bias of the included studies is graphically depicted in Supplementary Figure 2. In total, 12 studies were prospective consecutive cohort trials, 2 studies identified markers by exploration and validation, and 1 study was retrospective in design. Overall, 1,446 patients were analyzed by the included studies, and all patients underwent LT due to different underlying liver diseases.
Index tests were measured by taking peripheral blood samples at specific pre- or post-transplant time points, and liver biopsies were performed in close temporal proximity to the index tests. Notably, liver biopsy was used as the reference standard in all included studies (n = 15). The markers identified by our review were eosinophilia (4 studies, 805 patients), serum amyloid A protein (1 study, 12 patients), nitric oxide (1 study, 50 patients), alpha-glutathione S-transferase (2 studies, 67 patients), pi-glutathione S-transferase (1 study, 44 patients), peripheral T-cells and soluble IL-2 receptor (1 study, 119 patients), guanylate-binding protein 2 mRNA (1 study, 46 patients), graft-derived cell-free DNA (1 study, 115 patients), procalcitonin (1 study, 20 patients), and serum HLA class I soluble antigens (1 study, 14 patients). Serum proteome characterization and subsequent validation through ELISA were performed in two studies. The study by Massoud et al. (24) identified the following seven markers: serum amyloid A protein (SAA), complement 4 (C4), fibrinogen, complement 1q (C1q), complement 3, heat shock protein 60, and heat shock protein 70. The study by Okubo et al. (20) identified autoantibodies in sera and selected the following markers by fold change and intensity: multivesicular body protein 2B, potassium channel tetramerization domain containing 14, voltage-gated subfamily A regulatory beta subunit 3, and triosephosphate isomerase 1.
The primary assessed outcome was index test accuracy (sensitivity, specificity, PPV, NPV, +LR, and -LR) in AR diagnosis confirmed by liver biopsy. The index test was usually performed on the same day or one day before liver biopsy. In 7 studies, the diagnostic accuracy was tested by the combination of the index test and an additional liver function parameter.
Systematic literature searches (Medline, Cochrane Library, and Embase) were conducted to identify studies that evaluated biomarkers to diagnose allograft rejection in patients following liver transplantation. Studies were included when the non-invasive index test(s) and reference test (liver biopsy) were performed at the same time and the sensitivity and specificity were given (n = 15).

Noninvasive Marker Test Accuracy and Meta-Analysis
All test parameters are listed in Table 3. When insufficient data were provided in the studies, test parameters were calculated using a conventional contingency table. Out of all the parameters explored by the 15 included studies, soluble IL-2R, studied by Lun et al. (22), demonstrated superior accuracy; in 119 patients a threshold value of >631 IU/ml predicted AR with a high sensitivity (81%) and a specificity of 89%. In addition, the study was assessed as having a very low risk of bias according to the QUADAS-2 tool and had a considerable sample size (Table 2). Peripheral blood eosinophil count was analyzed in 4 studies that included 805 patients and 1,076 sample points (21,23,26) for meta-analysis. The studies were pooled in an HSROC to illustrate the trade-off between sensitivity and specificity (Figure 2). Blood eosinophilia demonstrated a pooled sensitivity of 50% (95% CI: 0.18-0.78) and specificity of 80% (95% CI: 0.62-0.92). The DerSimonian-Laird random-effects method was used to test the overall effect of peripheral blood eosinophilia on accurately diagnosing AR, and single-study RRs were calculated: patients positive for eosinophilia had a 1.56 times higher risk (95% CI: 1.21-2.02) of AR compared with patients negative for eosinophilia (p = 0.0006). The results are graphically illustrated in a forest plot in Table 4. The heterogeneity among the studies was moderately high (I² = 72%). Furthermore, the RRs were plotted against the respective measures of study size in a funnel plot to reveal possible asymmetry between the studies. The empty left side of the plot indicates bias, the origin of which may have been publication bias, clinical heterogeneity or methodological heterogeneity (Figure 3).
The sample size varied among the studies that tested for eosinophilia and was not equally distributed. Rodriguez-Peralvarez et al.
The analysis found that Barnes
The sample size of the study was small and demonstrated a sensitivity of 66.7% (95% CI: 0.35-0.9) and specificity of 96% (95% CI: 0.97-1). There was no pre-specified threshold, and a cutoff point of 17 mg/dl was derived from the ROC curve. Interestingly, no statistical correlation was found between SAA and CRP (n = 37; r = 0.237; p = 0.157).
Dickson et al. (28) (14 AR cases, 44 samples) studied alpha-glutathione S-transferase (alpha-GST) and pi-glutathione S-transferase (pi-GST) as potential markers of hepatocyte and biliary epithelial cell injury, which are considered possible indicators of AR. Alpha-GST was found to have a positive LR of 7.5 and a negative LR of 0.54, with a sensitivity of 50% (95% CI: 0.23-0.76) and specificity of 93% (95% CI: 0.78-0.99). Patients with no and mild rejection were grouped and compared with those with moderate and severe rejection; however, mean values of alpha-GST and pi-GST were indistinguishable between the two groups (data not shown in the manuscript).
Barnes
They demonstrated that NOx increased during AR (p < 0.0001) in association with histopathological grading and decreased after administration of glucocorticoids. It is unclear from the manuscript whether a threshold was predetermined for NOx, which would interfere with the reliability of the test accuracy. In addition, the authors studied the relationship between NOx and circulating TNF-alpha and IL-2R, with a predetermined threshold for circulating TNF-alpha.
Non-invasive markers were tested in patients who underwent liver transplantation to diagnose allograft rejection. Test parameters were derived from study records, whenever possible. In the case of missing parameters, data were calculated from contingency tables (Note: imputed data is marked in bold).
The studies by Okubo et al. (20) (20 AR cases, 80 samples) and Massoud et al. (24) (33 AR cases, 62 samples) were similarly designed. The authors used proteomics and ELISA to test blood samples to discover possible markers of AR. In both studies, the discovery set was composed of patients undergoing LT due to hepatitis C infection (with or without histopathological signs of AR). Notably, Okubo et al. also included patients with liver dysfunction and healthy volunteers without any signs of AR. Next, a completely separate group of patients was set up for validation (ELISA) of the markers that were revealed in the discovery set. The validation panel was again sub-grouped into patients with AR and patients without AR after LT. Massoud et al. identified 41 proteins, of which C4 and C1q were both independent predictors of AR with sensitivities of 97% (95% CI: 0.79-0.99) and 56% (95% CI: 0.35-0.76) and specificities of 62% (95% CI: 0.38-0.81) and 86% (95% CI: 0.64-0.97), respectively. A noteworthy secondary outcome was the increased marker specificity of 81% and sensitivity of 96% when C4 (cut-off < 0.31 g/L) was combined with ALT (cut-off > 70 IU/ml). Taken together, C4 demonstrated the best test accuracy in differentiating patients with and without AR. Okubo et al. performed microarray analysis and identified 57 autoantibodies that were upregulated in the AR group; the authors then selected four autoantibodies (multivesicular body protein 2B [CHMP2B], KCTD14, voltage-gated subfamily A regulatory beta subunit 3 [KCNAB3] and triosephosphate isomerase 1 [TPI1]) by fold change and antibody intensity. KCNAB3 was found to be significantly higher in the AR group compared with the group with liver dysfunction and no AR (+LR 2.25; -LR 0.17). CHMP2B and TPI1 were both significantly overexpressed in the AR group compared with the other control groups. The levels of these antibodies increased only around the time of acute cellular rejection, making them good candidates for diagnostic molecular markers.
CHMP2B showed outstanding performance, with an AUC of 0.86 (95% CI: 0.75-0.97) and 80% specificity and 80% sensitivity at a cut-off value of 0.33.
Kuse et al. (19) (10 AR cases, 40 samples) tested whether procalcitonin (PCT) allowed differentiation between infection and AR in cases of fever following LT. The authors demonstrated that PCT had a high predictive value in differentiating between AR and infection, with an AUC of 0.93. The highest sensitivity and specificity values were found at a cut-off of 5.9 ng/ml, with 100% sensitivity (95% CI: 0.71-1) and 75% specificity (95% CI: 0.56-0.9). Renna Molajoni et al. (30) (8 AR cases, 16 samples) studied whether serum HLA class I antigen was associated with AR: a cut-off value of >2.1 g/ml yielded a sensitivity of 100% (95% CI: 0.63-1) and a specificity of 75% (95% CI: 0.35-0.97). However, both studies had low reliability due to underpowered sample sizes and restricted inclusion criteria.
Nagral et al. (29) (38 AR cases, 56 samples) evaluated the efficacy of alpha-GST as a marker of AR in comparison with standard liver function tests (ALT and bilirubin). At a cut-off value of >11.4, alpha-GST demonstrated a sensitivity of 63.1% (95% CI: 0.45-0.78) and a specificity of 38.8% (95% CI: 0.17-0.64), whereas ALT (cut-off >40 IU/ml) yielded a sensitivity of 97.4% and a specificity of 16.6%. Alpha-GST was also tested as a marker of successful anti-rejection therapy in 16 patients. However, alpha-GST only decreased in 7 of 10 cases following anti-rejection therapy (p = 0.9) and, thus, was not linked to histological improvement.
Kobayashi et al. (5) (19 AR cases, 46 samples) analyzed the diagnostic efficacy of guanylate-binding protein 2 mRNA (GBP2) and interferon regulatory factor 1 mRNA (IRF1) as markers of AR using real-time PCR. Patients with liver dysfunction (LD) were included and further subgrouped into LD with AR vs. LD without AR patients. Although both IRF1 and GBP2 were higher in patients with LD (independent of AR) than in controls, only GBP2 was higher in LD with AR patients compared to LD without AR patients (+LR 4.2; -LR 0.43). A cut-off value of 20 produced a sensitivity and specificity of 63% (95% CI: 0.38-0.83) and 85% (95% CI: 0.66-0.95), respectively. A noteworthy result was that GBP2 was unable to distinguish between AR and HCV recurrence (p = 0.2). The study was high in bias for patient selection because of inappropriate patient exclusion (subjective exclusion criteria, e.g., "rejection, infection, or recurrence of primary disease"), which may have led to an overestimation of study findings (Table 2).

DISCUSSION
The present systematic review and meta-analysis has summarized the current status of noninvasive diagnostic biomarkers to diagnose AR following LT and, in doing so, has identified the best-evaluated diagnostic parameters. In total, 10 blood markers were identified to diagnose AR; the absolute eosinophil count (AEC) was evaluated by the most studies, and soluble IL-2R exhibited superior study results. In the following paragraphs, we will discuss the most important challenges currently limiting the establishment of noninvasive diagnostic markers in the diagnosis of AR after liver transplantation.
A summary of the study exclusion criteria can be found in Supplementary Figure 1. When we initially designed the present systematic review, we did not anticipate the number of studies that would not consider test accuracy (e.g., sensitivity and specificity) or temporally link the reference test to the index test (16% and 8%, respectively). Test accuracy is essential in assuring study comparability and performance. The temporal link between the reference test and the index test is also important, as anti-rejection therapy should be initiated on the same day as diagnosis. Therefore, appropriate methods and statistical tests remain of utmost importance for the translation of findings into routine clinical practice. However, when we reviewed the literature, more than 95% of all screened studies did not consider these two fundamental characteristics.
It appears plausible that among the 10 identified markers, IL-2R expression in peripheral blood leukocytes, with a sensitivity, specificity, NPV and PPV all above 80% and a considerable sample size (119 patients), demonstrated the most promising diagnostic accuracy for translation into clinical use. This was underscored by the trial setup, patient criteria and very low risk of bias. Although it did not show as strong a diagnostic accuracy, peripheral blood eosinophilia was studied by Hughes et al.  (23). The discrepancy in study results for the utility of peripheral blood eosinophilia in accurately differentiating between different grades of rejection and for predicting histopathological improvement after appropriate anti-rejection therapy may be due to the considerable difference between the two studies' sample sizes, 487 vs. 45. Another contributing factor may be the sample grouping into no-mild rejection and moderate-severe rejection groups, which could have led to bias. In conclusion, peripheral blood eosinophilia was the most frequently tested marker, but its clinical use as a single marker of AR is not supported due to low diagnostic accuracy. However, eosinophilia might be useful as a complementary test to indicate the need for closer noninvasive monitoring of transplant patients for possible liver biopsy.
Schütz et al. (27) studied graft-derived cell-free DNA in the largest cohort study but also excluded patients for undefined reasons. Test accuracy was high, although HCV+ patients were excluded from stable patients in the sensitivity and specificity analyses. Indeed, test accuracy decreased significantly when HCV+ patients were included, yielding a sensitivity of 75% and specificity of 84.2%. Taken together, the concept of graft-derived cell-free DNA is promising, although the study design was imprecise.
In a few studies, true positive, false positive, true negative and false negative values were used to calculate diagnostic accuracy, and the sensitivity, specificity, PPV, NPV, LR, and RR were either not stated at all or only partially stated. Furthermore, Hughes et al. and Rodriguez et al. showed inconsistency in the stated and derived PPV and NPV values, raising concerns about the reliability of the reported AEC data (21,25). Therefore, the statistical error obviously increased as we constructed contingency tables for our own calculations with missing accuracy values. Indeed, calculated sensitivity, specificity, PPV and NPV values did not always match the stated values (21,22,25).
None of the pediatric studies (n = 27) matched our inclusion criteria. Most reviewed studies did not include adults and children at the same time, so any particular finding would not be applicable to both young and old transplant recipients. The incidence of acute rejection differs with increasing age (31). In addition, the immune response changes with age, which is why the levels of some mediators (e.g., cytokines) are low at a young age but high at an old age (31). Furthermore, the immunosuppressive regimen is more complex in pediatric patients, with a longer expected period of intense immunosuppression, and pharmacokinetics change in older patients (32). Therefore, we did not consider pediatric patients in our systematic review.
In addition to the study design, another important factor currently limiting the success of recent and past studies is the lack of a robust endpoint, which plays a crucial role in objectively assessing the diagnostic efficacy of a particular biomarker. Liver biopsy is the gold standard for the diagnosis of acute rejection, and the Banff Working Group on Liver Allograft Pathology has defined the histopathological findings of AR (33). Nevertheless, percutaneous biopsy and diagnosis of AR are characterized by a lack of reproducibility between experienced and inexperienced pathologists (34,35). Most diagnostic biomarkers consist of proteins, which are elevated not only in AR but also during inflammation and infectious diseases, and thus greatly lack specificity (e.g., IL-2). Bacterial, viral and fungal infections are among the most common and challenging complications following LT. Because the immunosuppressive regimens do not vary substantially between different solid organ transplantations, similar infection patterns and pathogens can be encountered (36). The postoperative course of liver transplant patients is often complicated by various infections, such as reinfection with hepatitis B and C viruses. In addition, every infection can overlap in time with AR episodes, rendering differential diagnosis and treatment difficult. Most studies do not address the problem of differentiating AR from infection. For instance, AR and recurrent hepatitis C often coexist (37). A multicenter study performed by Regev et al. tested the reliability of histopathologic findings in distinguishing recurrent HCV infection from AR (38). The authors found low interobserver and intraobserver agreement rates, with a kappa score below 0.4.
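To make the agreement measure concrete, the following Python sketch computes Cohen's kappa from a rater-by-rater contingency table. It is only an illustration: the function name `cohen_kappa` is hypothetical, the counts are invented and are not data from Regev et al., and the cited study may have used a different kappa variant.

```python
def cohen_kappa(table):
    """Cohen's kappa for two raters.

    table[i][j] = number of biopsies that rater A assigned to category i
    and rater B assigned to category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# Invented example: two pathologists classify 100 biopsies as AR vs. recurrent HCV.
ratings = [[20, 15],
           [15, 50]]
print(round(cohen_kappa(ratings), 2))  # 0.34
```

In this invented example the raters agree on 70% of biopsies, yet kappa is only about 0.34 because 54.5% agreement would be expected by chance alone.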
The same diagnostic uncertainty exists with recurrence of autoimmune hepatitis (AIH) and the occurrence of de novo autoimmune hepatitis (dnAIH) after liver transplantation, both of which can lead to graft dysfunction if not treated promptly. Interestingly, a history of HCV infection and interferon gamma therapy are both related to dnAIH occurrence post-transplantation (39). Moreover, a study of the inflammatory infiltrates in the livers of pediatric transplant patients showed that antibodies against T-bet (transcription factor of T helper 1 cells) were lower in the dnAIH group than in the AR and AIH groups (40). The risk of AIH recurrence post-transplantation is related to risk factors such as HLA-DR3 or HLA-DR4 positivity (41) and early withdrawal of corticosteroids (42,43). Taken together, autoimmunity and infection can change histological findings and make it difficult to diagnose AR by histology. By analogy, an ideal biomarker must not only diagnose AR but also correctly differentiate it from coexisting inflammatory states. Notably, failure to differentiate between infection, recurrent HCV and AR may have fatal consequences, because immunosuppression alters the immune response and perifocal immunological changes (44, 45).
A stratification analysis was considered in order to examine the studies in separate sample size strata to test for heterogeneity. However, because only 4 studies were included in the meta-analysis, stratification would produce subgroups with very few studies. Since the estimation of the degree of heterogeneity in a random-effects model depends to a large extent on the number of studies, stratification means smaller subsets with worse estimates of the amount of heterogeneity (46). The sample sizes of the studies included in the meta-analysis were 71, 40, 275 and 690. Hence, a stratification into two or more groups cannot be made meaningfully. Therefore, we decided to quantify the heterogeneity between the studies using the random-effects model described by DerSimonian and Laird (47).

CONCLUSION
The present review and meta-analysis provides a systematic overview of research on the use of noninvasive markers to diagnose AR after LT. Interestingly, although only tested in one study, IL-2R exhibited superior sensitivity and specificity, underscored by a decent sample size and low bias for all screened parameters. Nevertheless, liver biopsy remains superior to the noninvasive approach of diagnostic biomarkers, as most of the marker study designs were inferior, hindering the possible translation of this noninvasive technique into routine clinical use at this time. This is complicated by the fact that a robust endpoint for AR is missing and the evaluation of histopathological findings of AR is influenced by intra- and interobserver variability and concurrent infections. Therefore, an appropriate clinical trial design for validating the diagnostic accuracy of a potential noninvasive marker for diagnosing AR remains indispensable.

AUTHOR CONTRIBUTIONS
FK and EK developed the hypothesis, constructed the search algorithm and performed the systematic literature search. FK, EK, MS, and AL wrote the manuscript. SG and KK contributed substantially to the acquisition of a large part of the extracted data. IS and KS critically revised the manuscript and interpreted the data. EK and KS performed the statistical analysis. JP and LF added important intellectual content to the manuscript. FK, EK, MS and AL edited the revision of the manuscript. All authors read and approved the final version of the manuscript.