Comparison of different predictive biomarker testing assays for PD-1/PD-L1 checkpoint inhibitors response: a systematic review and network meta-analysis

Background Accurate prediction of the efficacy of programmed cell death 1 (PD-1)/programmed cell death ligand 1 (PD-L1) checkpoint inhibitors is of critical importance. To address this issue, we conducted a network meta-analysis (NMA) comparing the common existing assays for predicting the curative effect of PD-1/PD-L1 monotherapy.
Methods We searched PubMed, Embase, the Cochrane Library database, and relevant clinical trials for studies published before Feb 22, 2023 that used PD-L1 immunohistochemistry (IHC), tumor mutational burden (TMB), gene expression profiling (GEP), microsatellite instability (MSI), multiplex IHC/immunofluorescence (mIHC/IF), other immunohistochemistry and hematoxylin-eosin staining (other IHC&HE) and combined assays to determine objective response rates to anti-PD-1/PD-L1 monotherapy. Study-level data were extracted from the published studies. The primary objective was to evaluate the predictive efficacy of these assays and rank them, mainly by NMA; the secondary objective was to compare them in subgroup analyses. Heterogeneity assessment, quality assessment, and result validation were also conducted by meta-analysis.
Findings A total of 144 diagnostic index tests in 49 studies covering 5322 patients were eligible for inclusion. mIHC/IF exhibited the highest sensitivity (0.76, 95% CI: 0.57-0.89), the second-highest diagnostic odds ratio (DOR) (5.09, 95% CI: 1.35-13.90), and the second-highest superiority index (2.86). MSI had the highest specificity (0.90, 95% CI: 0.85-0.94) and DOR (6.79, 95% CI: 3.48-11.91), especially in gastrointestinal tumors. Subgroup analyses by tumor type found that mIHC/IF and other IHC&HE demonstrated high predictive efficacy for non-small cell lung cancer (NSCLC), while PD-L1 IHC and MSI were highly efficacious in predicting response in gastrointestinal tumors. When PD-L1 IHC was combined with TMB, meta-analysis across all studies showed a noticeable improvement in sensitivity (0.89, 95% CI: 0.82-0.94).
Interpretation Considering the statistical results of the NMA and clinical applicability, mIHC/IF appeared to have superior performance in predicting response to anti-PD-1/PD-L1 therapy. Combined assays could further improve the predictive efficacy. Prospective clinical trials involving a wider range of tumor types are needed to establish a definitive gold standard in the future.


Introduction
Since the approval of PD-1/PD-L1 inhibitors for the treatment of melanoma in 2014, the overall survival of patients has improved significantly. However, anti-PD-1/PD-L1 immunotherapy still has many shortcomings, such as PD-1/PD-L1 inhibitor-induced immune-related adverse events (irAEs) and hyperprogression (1). Predicting patients' response to PD-1/PD-L1 immunotherapy is therefore important, not least from the standpoint of medical economics.
Various testing assays have been approved to predict the response to anti-PD-1/PD-L1 immunotherapy. The Food and Drug Administration (FDA) has approved PD-1/PD-L1 IHC, TMB, proficient mismatch repair (pMMR) proteins, deficient mismatch repair (dMMR), and MSI-high (MSI-H) for specific tumor types and drugs as companion or complementary diagnostics (2). Similarly, European Communities (CE) and the National Medical Products Administration (NMPA) have established their own standards for companion diagnostics and the application of prediction assays.
PD-L1 IHC, the first approved companion diagnostic biomarker, aims to detect PD-1/PD-L1 expression on tumor cells or inflammatory cells. However, the performance of IHC may be influenced by the experience of pathologists, the tumor types examined, and the scoring methods used. Researchers are now exploring the optimal detection assay and scoring methods for specific tumors (3).
TMB has been found to increase neoantigens of major histocompatibility complexes (MHC) in various cancers, which leads to better immunotherapy response in patients. Increasing evidence indicates that TMB levels vary across tumor types. TMB is usually assessed by next-generation sequencing (NGS) platforms, though thresholds and application methods need to be defined precisely to enhance accuracy across different tumor types. This would entail considerations such as genome coverage, workflow, and appropriate cutoff values (4). MSI and GEP reflect differences in gene expression as well. The MSI-H phenotype arises from numerous frameshift mutations due to deficits of the MMR system (5). Patients with MSI-H are more likely to suffer from various cancers, including colorectal cancer. MMR proteins, which can be detected by IHC, polymerase chain reaction (PCR), and gene sequencing, are now being used to identify MSI-H patients in various cancer types.
Detection and evaluation of the tumor microenvironment (TME) have also been explored in recent years (6). For example, researchers have found that epithelial-mesenchymal transition (EMT)- and stroma-related gene expression status is related to tumorigenesis and drug resistance (7,8). mIHC/IF and gene sequencing techniques could offer more opportunities to verify these findings (9). GEP could also allow the integration of different gene signatures and training models to predict prognosis and drug response based on the results of DNA microarray and RNA sequencing (RNA-Seq) (10-12). Some researchers have also explored combined approaches, such as TMB+GEP or TMB+IHC, since such predictors could work through different mechanisms or may be positively correlated with each other. All the biomarker assays mentioned above present novel opportunities to predict the response rate to PD-1/PD-L1 inhibitors.
Assessment and evaluation of diagnostic tests could also benefit from the increasing number of diagnostic test accuracy (DTA) studies and the continuous development of statistical methods. In the era of evidence-based medicine, meta-analysis plays an important role in integrating different studies of pairs of interventions using various methodological approaches. To enable the comparison of different assays with limited data and generate an overall ranking, NMA has proved to be a better tool to indirectly compare and jointly analyze three or more DTA studies simultaneously.
In this study, we compared the diagnostic accuracy of seven biomarker testing assays, namely PD-L1 IHC, TMB, GEP, MSI, mIHC/IF, other IHC&HE, and combined assays, for predicting response to anti-PD-1/PD-L1 immunotherapy. The diagnostic accuracy measures used in this study included sensitivity, specificity, relative sensitivity, relative specificity, positive predictive value (PPV), negative predictive value (NPV), relative predictive values, DOR, and superiority index (13). We believe that the NMA performed here could provide stronger clinical evidence for current medical practice.

Methods
This NMA was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) NMA checklist.

Eligibility criteria
The research articles included in this study were based on real-world data and had English translations available. Studies were required to administer PD-1/PD-L1 monotherapy and to use at least two predictive biomarker testing assays on pre-treatment tissue samples. These assays could include PD-L1 IHC, TMB, GEP, MSI, mIHC/IF, HE for tumor-infiltrating lymphocytes (TIL), or other IHC methods. Each biomarker testing assay had to provide sufficient information to determine the objective response rate (ORR) or non-progression rate (NPR) and to allow the calculation of sensitivity and specificity. Any testing assay with fewer than 15 tissue samples was not considered. Hematologic cancers and flow cytometry studies on tumor lysates were excluded.

Search strategy and data collection
We systematically searched PubMed, Embase, and the Cochrane Library database for relevant studies and their errata (up to February 2023). Additionally, we manually searched articles related to relevant clinical trials. For example, the search formula for Embase was: ("Immunohistochemistry" OR "Tumor mutational burden" OR "gene expression profiling" OR "multiplex immunofluorescence" OR "neoantigen load" OR "Immunofluorescence") [Find articles with these terms] AND ("Pembrolizumab" OR "Nivolumab" OR "Durvalumab" OR "Toripalimab" OR "Camrelizumab" OR "Atezolizumab" OR "Avelumab" OR "Avelumab" OR "Budigalimab") [Title, abstract or author-specified keywords] AND (Research articles) [Filter]. The complete search formula and results are provided in the Supplementary material.
Necessary information from eligible studies was extracted by three researchers independently, and all inconsistencies were settled by discussion. The trial name, first author, year of publication, sample size, trial phase, tumor type, PD-1/PD-L1 antibody, and index test assay were recorded. To calculate sensitivity and specificity for each index test, we organized ORR-related information into a 2x2 table. When multiple thresholds were reported, we used Youden's index, which combines sensitivity and specificity to indicate test accuracy, to select the best-performing threshold. If a clinical trial had multiple publications, the one with the most complete information was adopted.
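The threshold selection described above can be illustrated with a minimal sketch. It assumes the standard definition of Youden's index (J = sensitivity + specificity - 1); the class name, function name, threshold labels, and counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2x2 table of index test result versus objective response."""
    tp: int  # test-positive responders
    fp: int  # test-positive non-responders
    fn: int  # test-negative responders
    tn: int  # test-negative non-responders

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def youden(self) -> float:
        # Youden's index J = sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1.0


def best_threshold(tables: dict) -> str:
    """Return the threshold label whose 2x2 table maximizes Youden's index."""
    return max(tables, key=lambda label: tables[label].youden)


# Hypothetical counts for one assay reported at two thresholds.
tables = {
    "TPS>=1%": TwoByTwo(tp=18, fp=30, fn=2, tn=50),
    "TPS>=50%": TwoByTwo(tp=12, fp=10, fn=8, tn=70),
}
chosen = best_threshold(tables)
```

In this made-up example the lower cutoff gains more sensitivity than it loses in specificity, so it is the one retained.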

Statistical analysis and quality assessment
The main outcomes were calculated by NMA. For the Bayesian NMA, the ANOVA model made it possible to use the original data in an arm-based (AB) model (14). The AB model is superior to contrast-based (CB) models in that it accommodates more complex variance-covariance structures. NMA was mainly performed with the R package "Rstan" (R version 4.2.2). To improve accuracy and compare the diagnostic assays one by one, the calculations were repeated 7 times (model_code = model, chains = 2, iterations = 10000, warmup = 5000, thin = 5), and league tables were then drawn for relative comparisons. Given the numerical variance, we chose the median of sensitivity, specificity, PPV, NPV, SROC, and superiority index.
The Midas module for DTA meta-analysis facilitated validation of the results and assessment of heterogeneity by forest plot and I² analysis for each of the 7 biomarker modalities. Sensitivity, specificity, DOR, and summary receiver operating characteristic (SROC) curves and their associated area under the curve (AUC) were analyzed by Midas, which employs a bivariate mixed-effects logistic regression modeling framework and empirical Bayesian predictions. Publication bias was also evaluated by Deeks' funnel plot asymmetry test (p<0.05 indicating significant asymmetry). The network graphs package in Stata was used to draw the network graphs. Meta-analysis and figure drawing were carried out in Stata (17.0 MP-Parallel Edition).
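As an illustration of the I² statistic mentioned above, the following minimal sketch computes Cochran's Q and the Higgins I² for a set of study-level effect estimates (for example, log DORs) under inverse-variance weighting. This is the generic textbook calculation and is not claimed to reproduce what Midas computes internally; the function name and the example numbers are hypothetical.

```python
def i_squared(effects, variances):
    """Higgins I² (in percent) from study effects and their variances.

    Q = sum(w_i * (y_i - y_pooled)^2) with w_i = 1 / var_i,
    I² = max(0, (Q - df) / Q) * 100, where df = number of studies - 1.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0.0:
        return 0.0  # identical estimates: no observed heterogeneity
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log-DOR estimates and variances from four studies.
example = i_squared([0.4, 1.9, 0.7, 2.3], [0.10, 0.15, 0.12, 0.20])
```

Values above 50% are conventionally read as substantial heterogeneity, which is the cutoff referred to in this study.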
The QUADAS-C (Quality Assessment of Diagnostic Accuracy Studies-Comparative) tool was used to assess the risk of bias and applicability concerns in each selected study. Risk of bias was assessed in 4 domains: patient selection, index test, reference standard, and flow and timing; concerns regarding applicability were assessed in 3 domains: patient selection, index test, and reference standard.
Table 1 also revealed that the PPV for each assay was below 0.60, indicating that positive results may not correctly predict the response to PD-1/PD-L1 checkpoint inhibitors. MSI (0.56, 95% CI: 0.45-0.67) had the highest PPV, while GEP (0.33, 95% CI: 0.28-0.38) had the lowest. However, all assays performed relatively well in NPV, with even the lowest being near 0.80 (GEP: 0.80, 95% CI: 0.77-0.83). This suggested that these assays were useful in providing evidence for withholding immunotherapy, given their accuracy in identifying non-responsive patients.
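The pattern of low PPV alongside high NPV is what Bayes' theorem predicts when the response rate is low. The sketch below shows the standard relationship between sensitivity, specificity, prevalence (here the underlying response rate), and the predictive values; the function name and all input values are hypothetical and do not come from Table 1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence.

    PPV = Se*p / (Se*p + (1 - Sp)*(1 - p))
    NPV = Sp*(1 - p) / (Sp*(1 - p) + (1 - Se)*p)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test, evaluated at a 50% and a 20% response rate.
ppv_balanced, npv_balanced = predictive_values(0.8, 0.8, 0.5)
ppv_low_orr, npv_low_orr = predictive_values(0.8, 0.8, 0.2)
```

With identical sensitivity and specificity, lowering the assumed response rate from 50% to 20% reduces the PPV and raises the NPV, so a modest PPV does not by itself imply a poor assay.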

Rankings, DOR and superiority index
Relative sensitivity, relative specificity, relative PPV, and relative NPV are shown in the league table (Table 2). From the league table for relative sensitivity (lower triangle of Table 2(A)), we can see that mIHC/IF, other IHC&HE, and PD-L1 IHC had similar efficacy and performed better than TMB, GEP, combined assays, and MSI according to the relative risk (RR) values. The upper triangle of Table 2(A) represents the relative specificity: MSI and combined assays were superior to the others, while the remaining tests exhibited comparable efficacy. Similarly, MSI and combined assays demonstrated higher relative PPVs among assays, as shown in the lower triangle of Table 2(B). There was no difference among relative NPVs (upper triangle of Table 2(B)).
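To make the league-table layout concrete, the following sketch builds a small table with relative sensitivity in the lower triangle and relative specificity in the upper triangle. It is a simplification under stated assumptions: ratios are taken as row assay over column assay from point estimates, whereas a Bayesian NMA would derive them from posterior draws, and the orientation used in Table 2 is not specified here. The assay labels and values are hypothetical.

```python
def league_table(names, sensitivity, specificity):
    """Build a square league table as a list of rows.

    Diagonal: assay name.
    Lower triangle (row > column): sensitivity[row] / sensitivity[column].
    Upper triangle (row < column): specificity[row] / specificity[column].
    """
    n = len(names)
    table = [[None] * n for _ in range(n)]
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            if i == j:
                table[i][j] = row
            elif i > j:
                table[i][j] = round(sensitivity[row] / sensitivity[col], 2)
            else:
                table[i][j] = round(specificity[row] / specificity[col], 2)
    return table


# Hypothetical point estimates for three assays.
names = ["assay A", "assay B", "assay C"]
sens = {"assay A": 0.80, "assay B": 0.40, "assay C": 0.60}
spec = {"assay A": 0.50, "assay B": 0.75, "assay C": 0.60}
table = league_table(names, sens, spec)
```

A ratio above 1 favors the row assay for that measure, and a ratio whose interval covers 1 would be read as no clear difference.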
Flowchart showing literature search and study selection. The study process followed the PRISMA guidelines. NMA, network meta-analysis.

Table 1 presents the odds of response in test-positive patients versus the odds of response in test-negative patients, as measured by the DOR. MSI (6.79, 95% CI: 3.48-11.91) had the highest DOR owing to its high specificity, followed by mIHC/IF (4.44, 95% CI: 3.19-5.93), largely driven by its high sensitivity. In contrast, the DOR for gene expression profiling (GEP) was noticeably lower at 1.81 (95% CI: 1.31-2.40). A high superiority index indicates that a biomarker modality performs comparatively well in both sensitivity and specificity. In contrast, a low superiority index indicates poor performance in at least one assessment measure. As Table 1 summarizes, the ranking by superiority index from highest to lowest was TMB, mIHC/IF, other IHC&HE, MSI, PD-L1 IHC, combined assays, and GEP.
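For readers less familiar with the DOR, the sketch below computes it for a single 2x2 table, together with an approximate 95% confidence interval on the log scale (Woolf's method), and shows the equivalent formula in terms of sensitivity and specificity. It is a single-study illustration only: the pooled DORs in this study come from the Bayesian NMA, not from this calculation. The function names and counts are hypothetical.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Return (DOR, lower, upper) for one 2x2 table.

    DOR = (tp * tn) / (fp * fn). If any cell is zero, 0.5 is added to
    every cell (Haldane-Anscombe correction) before computing.
    """
    cells = [float(tp), float(fp), float(fn), float(tn)]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    tp, fp, fn, tn = cells
    dor = (tp * tn) / (fp * fn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z * se_log)
    upper = math.exp(math.log(dor) + z * se_log)
    return dor, lower, upper


def dor_from_rates(sensitivity, specificity):
    """DOR expressed as the ratio of positive to negative odds."""
    odds_pos = sensitivity / (1.0 - sensitivity)
    odds_false_pos = (1.0 - specificity) / specificity
    return odds_pos / odds_false_pos


# Hypothetical table: 12 TP, 10 FP, 8 FN, 70 TN.
dor, lower, upper = diagnostic_odds_ratio(12, 10, 8, 70)
same_dor = dor_from_rates(12 / 20, 70 / 80)
```

Because the DOR combines both error types into one number, a test can reach a high DOR through high specificity alone, as described for MSI, or through high sensitivity alone, as described for mIHC/IF.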

Heterogeneity and quality assessment
To further validate the present results, a meta-analysis was conducted, which revealed the same rankings of sensitivity, specificity, and DOR as the NMA (Table 3). The values of sensitivity and specificity were very similar, indicating reliable results from the ANOVA model used in the NMA. SROC curves generated through meta-analysis displayed the AUC for each biomarker testing assay. mIHC/IF had the largest AUC (0.80), GEP exhibited the smallest (0.61), and the AUCs of all others were close to 0.70 (Figure 3). Ranking trends for AUC and DOR were similar, supporting the reliability of our NMA ranking results.
However, the heterogeneity for each biomarker was high owing to the absence of testing standards and the variety of tumor types and thresholds. Although we chose the best-performing threshold, I² was higher than 50% (Supplementary Figure 1). Nonetheless, publication bias was not evident (p>0.1), according to Supplementary Figure 2. The QUADAS-C tool was used to evaluate study quality (Supplementary Table 3).
Our conclusion is in alignment with those of previous meta-analyses that addressed similar topics (63,64), which indicated that mIHC/IF was superior to PD-L1 IHC, TMB and GEP in predicting response to PD-1/PD-L1 checkpoint inhibitors and that combinatorial assays could improve predictive efficacy. Yet, to the best of our knowledge, our study is the first to use NMA to demonstrate the objective benefits of mIHC/IF in predicting patients' response to PD-1/PD-L1 checkpoint inhibitors. Upon stratifying by tumor type, we also observed that mIHC/IF had both remarkable sensitivity and specificity in NSCLC. PD-L1, mIHC/IF and IHC also manifested relatively high DOR and superiority index in gastrointestinal cancers, which further substantiated the strengths of mIHC/IF.
To address the challenge of ranking multiple diagnostic tests simultaneously, statisticians have developed several new Bayesian models for NMA of DTA studies (65), since traditional meta-analysis and NMA of interventions were not efficient enough to handle this issue. Multivariate extensions of meta-analysis models of DTA have been applied to NMA. In addition, the ANOVA model used in this NMA allows ORR to be compared indirectly and testing assays to be ranked directly (14). Researchers can also compare multiple thresholds per testing assay using certain models (66).
The high sensitivity, DOR, and AUC of mIHC/IF collectively indicate its superiority in identifying patients who may benefit most from immunotherapy. mIHC/IF facilitates the acquisition of quantitative multiplexed data, which plays a pivotal role in deciphering the intricate relationship between tumor cells, their microenvironment, and antigen expression at the single-cell level. This capability is of paramount importance in understanding tumorigenesis, cancer progression, and immunotherapy responses. In all instances of mIHC/IF index testing, CD8 was included, and T cell antigen expression was examined. Various studies have established a link between T cells' cytotoxicity and pro-inflammatory activity and patients' prognosis, mediated by the regulation of inherent immunological function by tumor antigens such as CD8 or PD-1 (67-70), which further supports the potency of antigens on tumor-infiltrating lymphocytes (TILs). However, false negative results obtained from mIHC/IF screening may exclude some patients who could benefit from immunotherapy, suggesting the need to explore additional proteins and combined assays to improve specificity. To enhance the precision of staining scoring, many researchers have incorporated artificial intelligence with mIHC/IF, rendering it a relatively convenient and cost-effective method compared with combined assays (71). Thus, our study concluded that mIHC/IF had the best performance and a broad range of applications.
PD-L1 IHC, the most widely used assay, exhibited suboptimal performance in sensitivity, specificity, and DOR. As previously mentioned, the TME is too intricate and heterogeneous to be comprehensively elucidated by a single mechanism. Furthermore, expression of PD-1 and PD-L1 exhibits considerable interpatient variability. These two factors collectively contribute to the suboptimal performance of PD-L1 IHC as a predictive marker. Other possible reasons for such unsatisfactory results include pathologists' lack of experience, the sample type examined, and the IHC assays used (72). A meta-analysis that compared different IHC assays using the tumor proportion score (TPS) revealed that the sensitivity and specificity values were similar, except for SP142, which had lower sensitivity (73). The quantification and assessment of PD-1 protein expression through scoring methods varied among different assays, such as TPS, combined positivity score (CPS), and immune cell (IC) score (3).

FIGURE 3 SROC plot of "mIHC/IF", "combined assays", "MSI", "TMB", "other IHC&HE", "PD-L1 IHC", and "GEP" by meta-analysis. SROC, Summary receiver operating characteristic curves; AUC, Area under the curve; PD-L1 IHC, Programmed cell death ligand 1 immunohistochemistry; TMB, Tumor mutational burden; GEP, Gene expression profiling; MSI, Microsatellite instability; mIHC/IF, Multiplex immunohistochemistry/immunofluorescence; other IHC&HE, Other immunohistochemistry and hematoxylin-eosin staining.

Gastrointestinal tumors are characterized by the highest proportions of MSI-H/dMMR; therefore, MSI status detection could be a reasonable approach to predict the response to immunotherapy. Subgroup analysis of gastrointestinal tumors indicated that MSI detection offered a valuable method for ruling out non-responsive patients due to its high specificity. MSI detection was also conducted in other solid tumors, including endometrial cancer, adrenocortical carcinomas, and multiple endocrine neoplasias (MENs). The high specificity, DOR, and AUC of MSI suggest its potential applications in some other tumor types.
Regrettably, generalization of MSI detection to a wider range of tumors may be limited by the fact that most tumors exhibit microsatellite stable (MSS) status.
Our efficacy rankings placed TMB and other IHC&HE in the middle, while GEP was ranked last, although they are closely related to crucial aspects of tumor immunology such as neoantigens, the TME, and inflammatory gene signatures. Nevertheless, it is important to note that MSI status, TMB, and GEP serve as indicators of the gene phenotype, which is less directly associated with the primary mechanism of PD-1/PD-L1 immunotherapy than protein expression is. The measurements obtained through MSI, TMB, and GEP reflect events upstream of protein expression, which may diminish their predictive efficacy. Uncovering specific and precise gene pathways solely through these indicators can prove challenging. Whereas thresholds for TMB and GEP were mainly determined by proportions, other IHC&HE methods typically detected CD8 and TILs with different methods. This suggests that some immature tests are unlikely to be applicable to all tumor types.
Combined assays provide more opportunities to improve prediction accuracy in the current challenging scenario. When TMB was combined with PD-L1 IHC, sensitivity improved noticeably without sacrificing specificity. Ricciuti, B. et al. explored the association of high TMB with other biomarkers and found that high TMB was related to higher proportions of tumor-infiltrating CD8+, PD1+ T cells and high PD-L1 expression in cancer cells (74). Fumet, J.-D. et al. reported that tumors displaying high PD-L1/low CD8 TILs developed microenvironments conducive to tumor proliferation and exhibited poor outcomes (75). This may explain the enhanced efficacy of combined assays. Yet, the shortcomings of combined assays are their high cost and technical complexity.
Despite nearly a decade of research on companion or complementary diagnostics for prediction purposes, the most effective indicators for PD-1/PD-L1 inhibitors have not yet been established for most tumors. While some testing assays such as mIHC/IF and combined tests hold potential value, no test in our analysis achieved satisfactory sensitivity and specificity simultaneously. Consequently, clinicians should exercise appropriate caution when testing predictive biomarkers and interpreting the associated results. Additionally, we believe that our NMA could provide supporting evidence to researchers and clinicians for the improvement of predictive tests in the future.

Limitations
It is crucial to note that a high ORR does not necessarily translate into a high overall survival (OS). Care is needed when interpreting results from studies that relied solely on ORR, which may not account for OS or progression rate. With regard to bias, the threshold we chose using Youden's index may favor higher sensitivity and specificity. Only articles with two or more biomarker tests were selected, which may introduce bias by excluding some robust data for individual tests. Moreover, there was a significant disparity between the number of studies conducted on PD-L1 IHC and on mIHC/IF. Last but not least, although our study mainly covered 15 types of tumors, the generalization of the conclusion still requires deliberation.

Conclusion
Various large prospective and retrospective studies have investigated biomarkers for predicting response to PD-1/PD-L1 checkpoint inhibitors. According to our network meta-analysis, mIHC/IF had the best performance and a wide range of applications. Given the diverse use of mIHC/IF with different biomarkers across studies, further investigations involving precise combinations are warranted to enhance prognostic prediction. When selecting specific markers, it is crucial to take into account not only their efficiency and cost-effectiveness but also supporting evidence from molecular mechanisms. Combined assays warrant further exploration, given the high efficacy of TMB+PD-L1 IHC. Currently, there is a lack of studies or consensus regarding the workflow of companion or complementary diagnostics in this context. The existing approach is primarily based on clinicians' judgment, and we anticipate that future research will provide more foundational evidence to support these practices. Moreover, more evidence-based medicine studies are needed to determine detailed testing modalities and thresholds for all types of tumors, e.g. advanced ovarian cancer. Clinicians should bear in mind that the prognostic accuracy of each index test must be interpreted in its particular clinical context.

FIGURE 2 Network Plot. Both nodes and lines are weighted according to the number of studies involved in each treatment and direct comparison, respectively. PD-L1 IHC, Programmed cell death ligand 1 immunohistochemistry; TMB, Tumor mutational burden; GEP, Gene expression profiling; MSI, Microsatellite instability; mIHC/IF, Multiplex immunohistochemistry/immunofluorescence; other IHC&HE, Other immunohistochemistry and hematoxylin-eosin staining.

YZ: Conceptualization, Data curation, Formal Analysis, Supervision, Writing - review & editing. TD: Conceptualization, Funding acquisition, Software, Supervision, Writing - review & editing.

Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by Innovation and Development Joint Funds of Natural Science Foundation of Shandong Province [ZR2021LZL009] and National Natural Science Foundation of China [82303956].

TABLE 3
Result validation by meta-analysis.

TABLE 2
Relative sensitivity, relative specificity, relative PPV, and relative NPV by network meta-analysis.

TABLE 4
Subgroup analysis of NSCLC by network meta-analysis.