Deep learning-based assessment of PD-L1 expression in NSCLC predicts outcome for patients treated with anti-PD-1 immunotherapy

Peroz, Morgane; Roussot, Nicolas; Ilie, Alis; Rageot, David; Derangere, Valentin; Truntzer, Caroline; Ghiringhelli, François

doi:10.3389/fimmu.2026.1750816

ORIGINAL RESEARCH article

Front. Immunol., 13 February 2026

Sec. Cancer Immunity and Immunotherapy

Volume 17 - 2026 | https://doi.org/10.3389/fimmu.2026.1750816

This article is part of the Research Topic Artificial Intelligence Advancing Lung Cancer Screening and Treatment.

Morgane Peroz¹*, Nicolas Roussot¹,², Alis Ilie¹, David Rageot¹, Valentin Derangere¹,³, Caroline Truntzer¹,³*, François Ghiringhelli¹,²,³*

¹Université Bourgogne Europe, Centre Georges-François Leclerc, Unicancer, Cancer Biology Transfer Platform, UMR INSERM 1231, Therapies and Immune Response in Cancers (TIRECs) team, Dijon, France
²Department of Medical Oncology, Centre Georges-François Leclerc, Dijon, France
³Genetic and Immunology Medical Institute, Dijon, France

Background: PD-L1 expression is widely used as a predictive biomarker for anti-PD-1 therapies in non-small cell lung cancer (NSCLC). However, its prognostic value remains controversial. Here, we investigated whether deep learning (DL) applied to PD-L1 immunohistochemistry (IHC) slides could identify histological patterns predictive of outcome in patients treated with anti-PD-1 therapy.

Methods: We analyzed two independent NSCLC cohorts: MSK (n=182, training) and CGFL (n=108, validation). Tumor regions were manually annotated, tiled, stain-normalized, and processed through the UNI foundation model to extract deep features. Clustering of tiles from 10 extreme-outcome MSK cases identified histology-based subgroups. These were then applied to the remaining patients by projection and majority voting. Associations with progression-free survival (PFS) and overall survival (OS) were assessed. DL groups were integrated with clinical covariates in a multivariate model.

Results: Clustering revealed two distinct DL-defined groups (DL^High vs. DL^Low). In the MSK cohort, DL^High patients had significantly longer PFS than DL^Low patients (median 5.7 vs. 2.5 months; HR = 0.63, 95% CI 0.44–0.89; p=0.01). This prognostic value was independently confirmed in the CGFL cohort (median PFS 15.2 vs. 6.2 months; HR = 0.59, 95% CI 0.36–0.96; p=0.03). OS, available in the CGFL cohort only, was numerically longer in DL^High patients, but the difference did not reach significance. DL^High classification was associated with a higher PD-L1 tumor proportion score (TPS). Discordance between DL groups and TPS was observed, and the DL model further stratified outcomes among patients with TPS ≥50%. A combined model integrating DL groups with clinical variables improved prediction of PFS compared with clinical features alone, and the resulting combined score stratified PFS in both cohorts (HR = 0.50, 95% CI 0.33–0.75; p<0.001 in MSK; HR = 0.54, 95% CI 0.31–0.91; p=0.02 in CGFL).

Conclusions: Deep learning applied to PD-L1 IHC slides identifies reproducible histomorphological patterns associated with outcomes in anti-PD-1–treated NSCLC patients. This approach provides prognostic information beyond conventional PD-L1 scoring and enhances predictive accuracy when combined with clinical factors.

Introduction

Immune checkpoint inhibitors (ICIs) targeting the programmed death 1 (PD-1)/programmed death-ligand 1 (PD-L1) axis have revolutionized the treatment of advanced and metastatic non-small cell lung cancer (NSCLC), offering durable clinical benefit and improved survival for a subset of patients (1–4). These agents are now commonly used as first-line therapy, either as monotherapy or in combination with chemotherapy, in patients without targetable oncogenic driver alterations (5). Despite these advances, only approximately 20–30% of patients receiving ICI monotherapy experience meaningful responses, highlighting the critical need for more accurate predictive biomarkers to guide patient selection and optimize therapeutic efficacy (6, 7).

Currently, PD-L1 protein expression, typically measured by immunohistochemistry (IHC) on tumor biopsies, is the only standard biomarker used to select patients for anti-PD-1/PD-L1 therapies (8). In particular, the choice between immunotherapy alone and chemoimmunotherapy is mainly based on PD-L1 status assessed with the Tumor Proportion Score (TPS): when TPS is ≥50%, immunotherapy alone may be used instead of chemoimmunotherapy (9). However, PD-L1 expression is an imperfect indicator of response for several reasons. First, technical variability arising from different antibody clones (e.g., 22C3, SP263, QR1), staining protocols, and interobserver interpretation can lead to inconsistent scoring (10). In addition, PD-L1 is expressed not only by tumor cells but also by immune cells, and intratumoral heterogeneity as well as dynamic changes induced by prior treatments or the tumor microenvironment further complicate accurate assessment (11). Clinically, some patients with low or negative PD-L1 expression respond to ICIs, whereas a substantial proportion of patients with high PD-L1 levels do not achieve clinical benefit.

In this context, the emergence of artificial intelligence (AI) and deep learning approaches in computational pathology offers promising solutions to overcome these challenges. Deep convolutional neural networks can analyze whole-slide images and extract subtle histopathologic and spatial features beyond the capabilities of traditional microscopy (12, 13). These models have been demonstrated to provide reproducible and objective PD-L1 quantification across different assays and institutions (14–16). Moreover, deep learning algorithms can integrate information on tumor-infiltrating immune cells, tumor architecture, and stromal components that are critical determinants of immunotherapy response, but difficult to quantify manually (17, 18).

Building on these technological advances, our study aims to develop and externally validate a deep learning–based model for the assessment of PD-L1 expression in NSCLC. We hypothesize that this approach will not only refine the accuracy and consistency of PD-L1 scoring but will also improve the prediction of clinical outcomes for patients treated with anti-PD-1 immunotherapy compared to conventional methods.

Methods

Patient selection

The first cohort was a public dataset of 182 patients downloaded from Synapse (https://www.synapse.org/Synapse:syn26722053) and previously published (19). Inclusion criteria for this cohort were stage IV NSCLC, initiation of anti-PD-(L)1 blockade therapy between 2014 and 2019 at the study institution, and availability of a baseline CT scan, a baseline PD-L1 IHC assessment, and next-generation sequencing by MSK-IMPACT. The second cohort comprised 108 NSCLC tumor biopsies collected between 2015 and 2024 in the Department of Pathology of the Centre Georges-François Leclerc (CGFL) in Dijon, France. Inclusion criteria for this cohort were stage IV NSCLC and initiation of anti-PD-(L)1 blockade therapy between January 2017 and December 2023 at the study institution.

Ethics committee approval

Only patients from whom informed consent was obtained were included in this retrospective study. The present study was approved by the CNIL (French national commission for data privacy) and the Georges François Leclerc Cancer Center (Dijon, France) local ethics committee, and was performed in accordance with the Helsinki Declaration and European legislation. This study falls within the scope of the biobanking authorization registered under the registration number AC-2014-2260.

Histological staining

MSK Cohort: IHC was performed on 4-μm FFPE tumor tissue sections using a standard PD-L1 antibody (E1L3N; dilution 1:100, Cell Signaling Technology) validated in the clinical laboratory at the study institution. Staining was performed on an automated immunostaining platform (Bond III, Leica) with heat-based antigen retrieval in a high-pH buffer (Epitope Retrieval Solution 2, Leica) for 30 min. A polymeric secondary kit (Refine, Leica) was used to detect the primary antibody.

CGFL Cohort: PD-L1 protein expression in tumor cells was assessed by immunohistochemistry using a ready-to-use commercial PD-L1 kit with the QR1 or 22C3 antibody. Tonsil tissue served as a positive control.

Image digitalization

MSK Cohort: PD-L1 IHC-stained diagnostic slides were digitally scanned at a minimum of ×20 magnification using an Aperio Leica Biosystems GT450 v.1.0.0.

CGFL Cohort: PD-L1 IHC-stained diagnostic slides were digitized with an Evident VS200 scanner (Evident) at 20× magnification to generate whole-slide image (WSI) files in .vsi format.

Image analysis procedure

For all slides, tumor areas were manually selected in QuPath and divided into 100 × 100 µm square tiles. Tile colors were normalized with the Macenko algorithm (20), and tiles were then processed with the UNI foundation model (21) to extract high-dimensional feature vectors (Figure 1A).
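The Macenko step can be illustrated with a minimal NumPy sketch of the published algorithm (20). It is not the authors' code: the parameters `Io`, `alpha` and `beta` take values commonly used in public implementations (an assumption), angles are measured relative to the mean projected direction to avoid wrap-around at ±π (a small robustness tweak of ours), and for PD-L1 IHC the two recovered stain vectors are expected to be hematoxylin and DAB rather than hematoxylin and eosin.

```python
import numpy as np


def optical_density(img, Io=240):
    """Convert an RGB uint8 image (H x W x 3) to per-pixel optical density (N x 3)."""
    return -np.log((img.reshape(-1, 3).astype(float) + 1) / Io)


def macenko_stain_matrix(img, Io=240, alpha=1, beta=0.15):
    """Estimate a 3 x 2 stain matrix (one unit OD vector per column)."""
    od = optical_density(img, Io)
    od_hat = od[np.all(od > beta, axis=1)]          # drop near-transparent pixels
    # Plane spanned by the two leading eigenvectors of the OD covariance
    _, eigvecs = np.linalg.eigh(np.cov(od_hat.T))   # eigenvalues in ascending order
    plane = eigvecs[:, 1:3]
    proj = od_hat @ plane
    # Angles relative to the mean direction, wrapped to [-pi, pi)
    mean_dir = proj.mean(axis=0)
    ref = np.arctan2(mean_dir[1], mean_dir[0])
    phi = np.arctan2(proj[:, 1], proj[:, 0]) - ref
    phi = (phi + np.pi) % (2 * np.pi) - np.pi
    # Robust extreme angles approximate the two pure-stain directions
    lo, hi = np.percentile(phi, alpha), np.percentile(phi, 100 - alpha)
    v1 = plane @ np.array([np.cos(ref + lo), np.sin(ref + lo)])
    v2 = plane @ np.array([np.cos(ref + hi), np.sin(ref + hi)])
    # Put the vector with the larger red-channel OD (hematoxylin) first
    stains = np.stack([v1, v2], axis=1) if v1[0] > v2[0] else np.stack([v2, v1], axis=1)
    return stains / np.linalg.norm(stains, axis=0)


def macenko_normalize(img, ref_stains, ref_max_conc, Io=240, alpha=1, beta=0.15):
    """Re-render img with a reference stain matrix and reference max concentrations."""
    h, w, _ = img.shape
    od = optical_density(img, Io)
    stains = macenko_stain_matrix(img, Io, alpha, beta)
    conc = np.linalg.lstsq(stains, od.T, rcond=None)[0]   # 2 x N stain concentrations
    max_conc = np.percentile(conc, 99, axis=1)
    conc *= (ref_max_conc / max_conc)[:, None]            # match reference intensity range
    out = Io * np.exp(-ref_stains @ conc)
    return np.clip(out, 0, 255).T.reshape(h, w, 3).astype(np.uint8)
```

In a tiling pipeline, `ref_stains` and `ref_max_conc` would be estimated once on a chosen reference tile and reused for every tile, so that all tiles share the same color basis before feature extraction.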

Figure 1

Figure consists of four panels. Panel A shows a workflow diagram for processing PD-L1 stained tissue samples, including manual tumor region selection via QuPath software, tile export, Macenko normalization, and feature extraction. Panel B presents a scatterplot with responders in green and non-responders in red, showing their distribution along two dimensions with cluster symbols. Panels C and D display two-dimensional scatterplots with three clusters marked by distinct colors and convex hulls, labeled by cluster numbers.

Figure 1. Feature analysis and cluster creation. (A) Flowchart of the study design, from PD-L1 staining to UNI feature extraction. (B) Principal component analysis (PCA) of tile-level features extracted from 5 responder and 5 non-responder MSK patients. PCA was computed using all tiles from these patients (n=681), followed by hierarchical clustering on principal components (HCPC), resulting in three clusters. Different shapes indicate cluster membership, while tile color reflects response status (green: responders; red: non-responders). The numbers of tiles per cluster were 100, 349, and 232, respectively. Tiles from one MSK patient (C) and one CGFL patient (D) were projected a posteriori onto the PCA space trained on the MSK cohort without re-estimating either the PCA loadings or the cluster structure. Colored regions correspond to the convex hull enclosing all MSK tiles belonging to each cluster and are used solely to visualize the spatial extent of each group. Blue dots indicate tiles from the projected patient.

To harmonize features across datasets, we normalized each feature using the MSK cohort as the reference. For each of the 1,024 features, we computed the mean and standard deviation across all MSK patients. In the CGFL cohort, each patient's value for that feature was then centered on the MSK mean and divided by the MSK standard deviation. Each feature is therefore expressed relative to its distribution in the MSK cohort.
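The reference-based scaling can be sketched in a few lines of NumPy, assuming features are stored as a matrix with one row per patient (or tile) and one column per feature. Whether the population or sample standard deviation was used is not stated, so `ddof=1` is an assumption, as is the guard against zero-variance features.

```python
import numpy as np


def fit_reference_scaler(ref_features):
    """Per-feature mean and SD computed on the reference (MSK) cohort only."""
    mean = ref_features.mean(axis=0)
    sd = ref_features.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # guard for constant features (assumption, not in the text)
    return mean, sd


def apply_reference_scaler(features, mean, sd):
    """Express every feature in units of the reference cohort's distribution."""
    return (features - mean) / sd
```

Fitting the statistics on MSK alone keeps the validation cohort from influencing the reference, so CGFL features are compared against a fixed anchor rather than re-centered on themselves.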

Statistical analysis

Quantitative variables are described as median and interquartile range (IQR), and qualitative variables as number and percentage. Patient characteristics were compared between the MSK and CGFL cohorts (and summarized for the whole cohort) using the chi-squared or Fisher's exact test for qualitative variables and the Wilcoxon rank-sum test for continuous variables, as appropriate.

Survival analysis was performed using the survival R package. The prognostic value of the different variables for PFS was tested using univariate or multivariate Cox models when model validity conditions were met. The proportional hazards assumption was tested using Schoenfeld residuals. When it was not satisfied, we fitted an extended Cox model with time-dependent coefficients for the relevant variables, the time-varying coefficient being described by a parametric function of time. Survival probabilities were estimated using the Kaplan–Meier method, and survival curves were compared using the log-rank test when appropriate. When the proportional hazards assumption was not satisfied, groups of interest were compared using the restricted mean survival time (RMST) at 24 months (survRM2 R package (22)). P-values less than 0.05 were considered statistically significant.
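To make the RMST comparison concrete, the following minimal NumPy sketch computes the Kaplan–Meier estimator and the RMST as the area under the KM step curve up to τ (24 months in this study). It is illustrative only: the analyses themselves used the survival and survRM2 R packages, which also provide variance estimates and between-group tests omitted here, and τ is assumed not to exceed the follow-up available in each group.

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    times = np.unique(time[event])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)


def rmst(time, event, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    times, surv = kaplan_meier(time, event)
    keep = times < tau
    # Step function: S = 1 before the first event, then constant between events
    grid = np.concatenate([[0.0], times[keep], [tau]])
    levels = np.concatenate([[1.0], surv[keep]])
    return float(np.sum(np.diff(grid) * levels))
```

Without censoring, the RMST reduces to the mean of min(T, τ); for event times 1, 2, 3 and 4 with τ = 4 it equals 2.5.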

Statistical analyses were performed using the R software (http://www.R-project.org/) and graphs were drawn using GraphPad Prism version 9.0.2.

Results

Patient selection and characteristics

We used a public dataset of patients treated for NSCLC with PD-(L)1-blockade-based therapy at Memorial Sloan Kettering Cancer Center (MSK) between 2014 and 2019 (cohort characteristics are shown in Table 1). The second dataset, used as the validation cohort, consists of patients treated for NSCLC with PD-(L)1-blockade-based therapy or chemoimmunotherapy at the Centre Georges-François Leclerc (CGFL) in France between 2015 and 2024. In the total population, male patients outnumbered female patients, most patients were current or former smokers, and the predominant histological type was adenocarcinoma. When pooling both cohorts, in first line, 212 patients were treated with anti-PD-1-blockade-based therapy and 76 with chemoimmunotherapy. Immunotherapy was used in first line for 146 (66%) patients. PD-L1 TPS was 0% in 66 patients, between 1% and 49% in 74 patients, and ≥50% in 150 patients.

Table 1

Table 1. Patient clinical characteristics in MSK, CGFL and whole cohort.

Comparison of clinical variables between the two cohorts showed differences for all available characteristics except histological type and PD-L1 TPS status (50% cutoff), indicating substantial heterogeneity between the two datasets.

Generation of the deep learning procedure

Ten patients from the MSK cohort were then selected to train the model: the five patients with the longest progression-free survival (PFS) who did not progress, and the five patients with the shortest PFS who progressed. This corresponds to 361 tiles associated with response and 350 tiles associated with absence of response. Using principal component analysis (PCA) followed by hierarchical clustering, tiles were separated into three clusters. Cluster 1 consisted of responders only, cluster 2 was a mixture of responders and non-responders, and cluster 3 was enriched in non-responders (Figures 1A, B).
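The clustering step can be sketched in NumPy as PCA followed by agglomerative clustering with Ward's criterion, which is the core of HCPC. This is a simplification under stated assumptions: FactoMineR's HCPC additionally consolidates the tree cut with k-means, and the number of retained components (`n_components`) is a hypothetical parameter, not a value reported in the study. The naive merge loop recomputes an n × n cost matrix at each step and is meant for a few hundred tiles, the size of the training set.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA on tile features X (n_tiles x n_features): return mean and principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)  # rows of vt sorted by variance
    return mean, vt[:n_components]


def pca_transform(X, mean, axes):
    """Project tiles onto the principal axes learned on the training tiles."""
    return (X - mean) @ axes.T


def ward_clustering(Z, k):
    """Naive agglomerative clustering with Ward's criterion; returns labels and centroids."""
    centroids = [z.astype(float) for z in Z]
    sizes = [1] * len(Z)
    members = [[i] for i in range(len(Z))]
    while len(centroids) > k:
        C = np.array(centroids)
        n = np.array(sizes, dtype=float)
        sq = (C ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * C @ C.T
        # Ward cost: increase in within-cluster sum of squares if two clusters merge
        cost = n[:, None] * n[None, :] / (n[:, None] + n[None, :]) * d2
        np.fill_diagonal(cost, np.inf)
        a, b = sorted(np.unravel_index(np.argmin(cost), cost.shape))
        centroids[a] = (sizes[a] * centroids[a] + sizes[b] * centroids[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        members[a] += members[b]
        del centroids[b], sizes[b], members[b]
    labels = np.empty(len(Z), dtype=int)
    for c, idx in enumerate(members):
        labels[idx] = c
    return labels, np.array(centroids)
```

Cluster indices returned by this sketch are arbitrary; matching them to clusters 1 to 3 of the text would require inspecting their responder composition.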

To illustrate which histological patterns distinguish the DL^High and DL^Low groups, Figure 2A shows representative tiles of each cluster. Morphologically, cluster 1 corresponded to tiles with poorly cohesive epithelial cells showing negative or extremely weak PD-L1 staining. Cluster 2 corresponded to tiles of tumor epithelial cells with or without adjacent connective tissue; in this cluster, PD-L1 staining ranged from weak to strong and was localized on tumor cells (TC) or immune cells (IC). Finally, cluster 3 mainly comprised tiles of epithelial tumor cells with strong PD-L1 staining.

Figure 2

Panel A shows three clusters of tissue sections with varying morphology and PD-L1 staining: Cluster 1 presents non-cohesive cells with weak or absent PD-L1 staining, Cluster 2 features epithelium and connective tissue with variable PD-L1 staining, and Cluster 3 displays mainly epithelium with strong PD-L1 staining. Panel B is a box plot comparing PD-L1 TPS scores across clusters; Cluster 1 scores lowest, Cluster 2 is intermediate, and Cluster 3 scores highest, with significant differences indicated by asterisks.

Figure 2. Cluster interpretation. (A) Representative tiles of each cluster with corresponding visual descriptions. (B) Boxplots of the PD-L1 TPS score in Clusters 1 (n=1,923 tiles), 2 (n=21,193 tiles) and 3 (n=28,747 tiles) for the pooled cohort. ***p-value<0.001.

These observations were concordant with the quantitative PD-L1 TPS evaluation of each cluster (Figure 2B).

For all remaining patients, each tile was projected onto the training PCA space (Figures 1C, D) and assigned the label of the closest cluster centroid. We then counted the number of tiles assigned to each of the three clusters and, by majority voting, assigned the patient to the cluster with the most tiles. This procedure was applied to the remaining 172 patients of the MSK cohort and to the 108 patients of the CGFL validation cohort (Supplementary Figure S1).
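The patient-level assignment can be sketched as below, assuming the PCA mean, axes and cluster centroids computed on the ten training patients are available as NumPy arrays. Euclidean distance to the centroids and lowest-index tie-breaking are our assumptions, since the text specifies neither the distance metric nor how ties were resolved.

```python
import numpy as np


def classify_patient(tile_features, mean, axes, centroids):
    """Assign each tile to its nearest training centroid, then the patient by majority vote.

    mean, axes: PCA fitted on the training tiles only (never re-estimated).
    centroids: cluster centroids in that PCA space (n_clusters x n_components).
    """
    Z = (tile_features - mean) @ axes.T                        # project into training PCA space
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    tile_labels = d2.argmin(axis=1)                            # nearest centroid per tile
    counts = np.bincount(tile_labels, minlength=len(centroids))
    return int(counts.argmax()), counts                        # ties go to the lowest index
```

With clusters indexed 0, 1 and 2 for clusters 1, 2 and 3 of the text, a returned index of 2 would correspond to DL^Low and any other index to DL^High.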

Prognostic role of the deep learning model

Patients assigned to clusters 1 and 2 had similar PFS (results not shown) and were therefore grouped together into the so-called DL^High group, while cluster 3 constituted the DL^Low group. In the MSK training cohort, 67 patients were assigned to the DL^High group and 115 to the DL^Low group. Regarding response, there were 2 complete responses (CR) and 14 partial responses (PR) in the DL^High group versus 3 CR and 18 PR in the DL^Low group (chi-squared test p=0.01). With PFS as the endpoint, patients in the DL^High group had longer PFS than those in the DL^Low group (HR = 0.63 [0.44, 0.89]; p=0.01), with a median PFS of 5.7 vs 2.5 months. Overall survival was not available for this cohort (Figures 3A, B).
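The Kaplan–Meier curves comparing DL groups are accompanied by log-rank tests. The sketch below shows the textbook two-group log-rank statistic in NumPy; it is an illustration rather than the code used in the study (which relied on the R survival package), and it assumes that both groups contribute patients at risk at some event time, so that the variance is positive.

```python
import math

import numpy as np


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)          # True = group 1 (e.g. DL^High)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        died = (time == t) & event
        d, d1 = died.sum(), (died & group).sum()
        o_minus_e += d1 - d * n1 / n                # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2))
```

Two groups with identical event times give a statistic of 0 and a p-value of 1, while fully separated groups give a very small p-value.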

Figure 3

Seven-panel figure comparing clinical outcomes between DLHigh and DLLow groups. Panels A and C are stacked bar charts showing proportions of responders and non-responders, with A indicating a significant difference (p=0.01) and C showing a non-significant result (p=0.22). Panels B, D, F, and G are Kaplan-Meier survival curves for progression-free or overall survival by group, each displaying hazard ratios, confidence intervals, p-values, and numbers at risk. Panel E is an overall survival plot with restricted mean survival times and p-value. Legend identifies responders and non-responders in blue shades.

Figure 3. Association between survival and DL model derived groups. Barplots comparing the proportion of responders (Complete Response and Partial Response) and non-responders (Stable Disease and Progressive Disease) according to DL model derived classifier for the MSK (A) and the CGFL (C) cohorts. Kaplan-Meier curves with patients stratified according to the DL model derived classifier for progression-free survival for the MSK (B) and the CGFL (D) cohorts. (E) Kaplan–Meier curves with patients stratified according to the DL model derived classifier for overall survival for the CGFL cohort. Kaplan-Meier curves with patients stratified according to the DL model derived classifier for progression-free survival for the pooled cohort in patients treated with immunotherapy alone (F) and chemoimmunotherapy (G). DL, Deep Learning.

When the DL model was applied to the validation cohort, 58 patients were assigned to the DL^High group and 50 to the DL^Low group. There were 11 CR and 25 PR in the DL^High group versus 6 CR and 16 PR in the DL^Low group (chi-squared test p=0.22). Patients classified as DL^High had longer PFS than those classified as DL^Low (HR = 0.59 [0.36, 0.96]; p=0.03), with a median PFS of 15.2 vs 6.2 months. Overall survival (OS) did not differ significantly between DL^High and DL^Low patients (RMST: DL^Low 14.55 [12.03; 17.07] vs DL^High 16.98 [14.51; 19.45] months; p=0.17), with a median OS of 37.7 months in the DL^High group vs 15.2 months in the DL^Low group (Figures 3C–E).

Finally, patients from both cohorts were pooled and stratified by treatment. Among patients treated with immunotherapy alone, the DL model identified subgroups with significantly different PFS (Figure 3F), whereas it did not discriminate outcomes among patients treated with chemoimmunotherapy (Figure 3G).

Correlation with PD-L1 TPS score

We examined the association between DL model groups and PD-L1 TPS. In each cohort and in the pooled cohort, PD-L1 TPS was significantly higher in the DL^High group (Figures 4A–C). However, agreement between the two classifications was incomplete: some DL^High patients had a TPS of 0%, whereas some DL^Low patients had a TPS above 50%.

Figure 4

Four-panel figure showing three box plots labeled A, B, and C, each comparing PD-L1 TPS scores between DL Low and DL High groups with significant differences indicated by triple asterisks. Panel D is a Kaplan-Meier curve depicting progression-free survival over twenty-four months, with DL High group showing improved outcomes; hazard ratio, confidence interval, and p-value are displayed, along with risk table beneath the graph.

Figure 4. Link between PD-L1 TPS score and DL model derived groups. Boxplots of the PD-L1 TPS score in DL^Low and DL^High groups for MSK (A), CGFL (B) and whole (C) cohorts. (D) Kaplan-Meier curves with patients stratified according to the DL model derived classifier for progression-free survival for the pooled cohort in the high PD-L1 TPS score group. ***p-value<0.001. DL, Deep Learning.

Moreover, when patients were stratified into high (≥50%) and low (<50%) PD-L1 TPS groups, the DL model identified subgroups with significantly different PFS among patients with high TPS (Figure 4D), thereby refining their stratification. In the low TPS group (<50%), the DL model did not significantly discriminate outcomes (results not shown).

DL score improves patient prediction in multivariate model

Clinical variables associated with PFS were selected using univariate Cox models, and a multivariate clinical model was then fitted with the variables having p-values <0.1 (Figure 5A; Table 2). Because PD-L1 TPS is correlated with the DL groups, it was excluded from the multivariate model. WHO performance status, smoking status, treatment and line of therapy were retained. The variables of the clinical model and the DL group variable were then combined in a single multivariate survival model, named the “combined model”, and a combined score was computed as the linear predictor of this model. Using the median as the cutoff, patients with a low score had better PFS than those with a high score (HR = 0.50 [0.33; 0.75]; p<0.001; Figure 5B). Similar results were observed in the validation cohort using a cohort-specific threshold (HR = 0.54 [0.31; 0.91]; p=0.02; Figure 5C). In the pooled cohort, the AUCs of the DL, clinical and combined models were 0.36, 0.66 and 0.71, respectively. A likelihood-ratio test showed that the DL score significantly added prognostic value to the clinical model (p=0.03 for the comparison of the clinical and combined models).
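The construction of the combined score and the likelihood-ratio comparison can be sketched as follows. The coefficients `beta` and the log-likelihoods passed in are placeholders: in practice they come from the fitted Cox models, and the numbers used below are invented for illustration. Because the DL group enters as a single binary covariate, the comparison of the clinical and combined models has one degree of freedom, which allows a closed-form chi-square tail via `math.erfc`.

```python
import math

import numpy as np


def combined_score(X, beta):
    """Linear predictor of a fitted Cox model: higher values mean higher hazard."""
    return np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)


def median_split(score, cutoff=None):
    """Label patients 'high' above the cutoff (default: cohort median), else 'low'."""
    cutoff = np.median(score) if cutoff is None else cutoff
    return np.where(score > cutoff, "high", "low")


def likelihood_ratio_pvalue(loglik_nested, loglik_full, df=1):
    """LR test for nested models; closed-form chi-square tail only for df=1."""
    if df != 1:
        raise ValueError("this sketch only handles one added parameter")
    stat = 2.0 * (loglik_full - loglik_nested)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
```

For instance, with hypothetical log-likelihoods of -400.0 (clinical model) and -398.0 (combined model), the statistic is 4.0 and `likelihood_ratio_pvalue` returns about 0.0455. Passing an explicit `cutoff` mirrors the cohort-specific threshold used in the validation cohort.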

Figure 5

Figure consisting of three panels: Panel A displays a forest plot comparing hazard ratios for several clinical variables, with separate markers for univariate and multivariate analyses, indicating statistical significance with asterisks; Panels B and C present Kaplan-Meier curves of progression-free survival over twenty-four months, stratified by combined low and high groups, with hazard ratios, confidence intervals, p-values, and numbers at risk shown below each x-axis.

Figure 5. Survival analysis of clinical variables and DL model. (A) Forest plots representing hazard ratios and confidence intervals for univariate and multivariate Cox models for Progression-Free Survival estimated using clinical variables. *p-value<0.1. Kaplan-Meier curves with patients stratified according to the combined score for progression-free survival for the MSK (B) and the CGFL (C) cohorts. DL, Deep Learning.

Table 2

Table 2. Univariate and multivariate Cox models for progression-free survival (PFS) in the MSK cohort. Only characteristics associated with PFS are reported.

Discussion

The integration of ICIs into the treatment of advanced and metastatic NSCLC has transformed patient care by offering durable responses and improved survival for some patients (1–4). However, despite the revolutionary impact of agents targeting the PD-1/PD-L1 axis, clinical benefit remains limited to approximately 20–30% of patients when all-comers are treated, reflecting the underlying heterogeneity of NSCLC and the complex nature of antitumor immunity (7). This limitation underscores the critical need for reliable and robust biomarkers to optimize patient selection, guide therapeutic strategies, and ultimately enhance the efficacy of ICIs.

Current clinical decision-making relies heavily on the assessment of PD-L1 expression by IHC, with TPS guiding the choice between ICI monotherapy and chemoimmunotherapy (8, 9). While patients with high PD-L1 TPS (≥50%) may be offered immunotherapy alone, this biomarker is imperfect (10). As shown in recent reviews and practice guidelines, PD-L1 expression is subject to challenges such as technical variability among antibody clones and platforms, subjective interpretation, and spatial as well as temporal heterogeneity within tumors. Furthermore, discordance between PD-L1 status and response is well-documented: some patients with high PD-L1 expression achieve little clinical benefit, while others with low or undetectable PD-L1 respond to ICIs. These shortcomings have driven active research into alternative and complementary biomarkers, including circulating tumor DNA, tumor mutational burden, gene expression signatures, and features derived from the tumor microenvironment. However, the clinical utility of these emerging biomarkers remains under investigation, and none have yet supplemented PD-L1 in routine practice.

In this context, AI and deep learning technologies are emerging as powerful tools in computational pathology. By analyzing digitized histopathology slides, deep learning models can extract high-dimensional features beyond the limits of human interpretation, offering more objective, reproducible, and potentially more informative assessments of the tumor immune landscape. Several studies have developed deep learning models to evaluate or predict PD-L1 expression from either H&E or PD-L1 IHC slides and have reported strong explanatory and predictive performance (23–30).

In addition, some reports support the capacity of deep learning models to predict outcome in NSCLC from H&E slides (31–34). The present study describes the development and validation of a deep learning-based approach to assess PD-L1 IHC slides and predict outcomes with anti-PD-1 therapy in NSCLC. The deep learning model may provide more consistent scoring than traditional IHC-based TPS and may also capture contextual information, such as spatial patterns of immune infiltration, that is difficult to quantify manually, which could explain the improved prognostic stratification of patients with PD-L1 TPS ≥50%. We hypothesize that our deep learning approach adds morphological information that is not captured by PD-L1 protein expression alone.

The clinical utility of this approach is supported by its independent prognostic value in both the training and the external validation cohorts. Notably, patients classified as DL^High had significantly longer PFS than those in the DL^Low group in both cohorts, whereas the longer OS observed in the CGFL cohort did not reach statistical significance. Importantly, although DL^High status was significantly associated with higher PD-L1 TPS, notable discordance remained, supporting the notion that deep learning captures complementary, and perhaps more clinically relevant, biological information. The value of the deep learning model for prognostic stratification was further confirmed in patients with high PD-L1 TPS.

These findings align with a growing body of literature advocating for the integration of digital pathology and machine learning into predictive biomarker development for immunotherapy response. AI models enabling clinically relevant risk stratification for cancer immunotherapy beyond conventional PD-L1 TPS have been proposed (31, 34). Some tools for mechanistic interpretability have been designed to extract interpretable spatial features from imaging data (34, 35). The ability of AI-driven models to standardize and enhance the interpretation of complex histological and immunological features represents a major step forward, potentially paving the way for more precise, individualized immunotherapy in lung cancer and beyond.

Nevertheless, several limitations of our study should be acknowledged. First, the choice to derive the clusters from only 10 patients may be debated. We deliberately selected patients with extreme outcomes so that the clusters would capture the most contrasting patterns of response, but such a small and selected training set may limit the generalizability of the model. Second, the manual annotation of tumor regions by pathologists is inherently subjective and may introduce observer-dependent bias. Third, the study is retrospective and the sample size used for model training was limited. Consequently, extensive validation in larger, prospective, multi-institutional cohorts is warranted before definitive clinical translation can be considered.

Additionally, although the DL model was built on digitized PD-L1 IHC slides, integration with other multi-omic and microenvironmental features, such as genomics, transcriptomics and spatial immune profiling, may further improve predictive power and should be explored in future studies. Finally, future work could strengthen the mechanistic interpretability of our DL model through quantification of tissue heterogeneity and organizational complexity (35).

In summary, this study provides compelling evidence that deep learning models applied to routine histopathology can overcome the technical and biological limitations inherent to traditional PD-L1 assessment, offering a pragmatic and scalable approach to refining immunotherapy selection in NSCLC. As the field moves toward increasingly data-driven and personalized cancer care, such innovations are poised to play a critical role in optimizing outcomes for patients receiving ICIs.

Data availability statement

The MSK cohort data are available at https://www.synapse.org/Synapse:syn26722053. The CGFL cohort data are available upon request.

Ethics statement

The studies involving humans were approved by CNIL (French national commission for data privacy). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

MP: Formal Analysis, Methodology, Visualization, Writing – original draft. NR: Data curation, Writing – review & editing. AI: Data curation, Writing – review & editing. DR: Data curation, Writing – review & editing. VD: Data curation, Writing – review & editing. CT: Conceptualization, Formal Analysis, Methodology, Supervision, Validation, Writing – original draft. FG: Conceptualization, Supervision, Validation, Visualization, Writing – original draft.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Acknowledgments

We wish to thank Fiona Ecarnot (EA3920, University of Franche-Comté, Besançon, France) for correcting the manuscript and for helpful comments.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2026.1750816/full#supplementary-material

Supplementary Figure 1 | Workflow of the tile-based clustering and classification. From the MSK cohort (n=182), 10 patients (five responders and five non-responders) were used to perform Hierarchical Clustering on Principal Components (HCPC), identifying two IHC-based clusters. Tiles from the remaining MSK patients (n=172) and the independent CGFL cohort (n=108) were then projected onto this reference space and assigned to the closest cluster; each patient was then assigned to a group by majority vote over their tiles.
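The reference-clustering and projection workflow in Supplementary Figure 1 can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' pipeline: it assumes tile features have already been extracted (for example, by the UNI foundation model), approximates HCPC with a PCA followed by naive Ward agglomerative clustering written in NumPy, assigns projected tiles to the nearest cluster centroid, and takes the patient-level group as the majority vote over tiles. The function names, the number of components, and the synthetic data are all assumptions. HCPC as implemented in R's FactoMineR may additionally consolidate clusters with k-means, which this sketch omits.

```python
import numpy as np


def fit_pca(x, n_components):
    """Center the reference tile features and keep the top principal axes."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, axes):
    """Project tile features onto the reference principal-component space."""
    return (x - mean) @ axes.T


def ward_clustering(scores, n_clusters):
    """Naive agglomerative clustering using Ward's merge criterion."""
    clusters = [[i] for i in range(len(scores))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = scores[clusters[a]].mean(axis=0)
                cb = scores[clusters[b]].mean(axis=0)
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(scores), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def assign_to_nearest(scores, centroids):
    """Assign each projected tile to the closest cluster centroid."""
    d = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def patient_group(tile_labels):
    """Patient-level group: the most frequent tile cluster (majority vote)."""
    return int(np.bincount(tile_labels).argmax())


rng = np.random.default_rng(0)
# Synthetic stand-in for foundation-model tile embeddings (hypothetical data):
# two well-separated groups of reference tiles.
ref = np.vstack([rng.normal(0.0, 1.0, (20, 8)), rng.normal(5.0, 1.0, (20, 8))])
mean, axes = fit_pca(ref, n_components=2)
ref_scores = project(ref, mean, axes)
ref_labels = ward_clustering(ref_scores, n_clusters=2)
centroids = np.array([ref_scores[ref_labels == k].mean(axis=0) for k in range(2)])

# A new patient whose tiles mostly resemble the second reference group.
new_tiles = np.vstack([rng.normal(5.0, 1.0, (15, 8)), rng.normal(0.0, 1.0, (5, 8))])
tile_labels = assign_to_nearest(project(new_tiles, mean, axes), centroids)
print("patient group:", patient_group(tile_labels))
```

Freezing the PCA axes and centroids on the reference tiles means new patients are only projected, never re-clustered, so every cohort is described in the same reference space.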

References

1. Brahmer J, Reckamp KL, Baas P, Crinò L, Eberhardt WEE, Poddubskaya E, et al. Nivolumab versus docetaxel in advanced squamous-cell non–small-cell lung cancer. N Engl J Med. (2015) 373:123–35. doi: 10.1056/NEJMoa1504627

2. Borghaei H, Paz-Ares L, Horn L, Spigel DR, Steins M, Ready NE, et al. Nivolumab versus docetaxel in advanced nonsquamous non–small-cell lung cancer. N Engl J Med. (2015) 373:1627–39. doi: 10.1056/NEJMoa1507643

3. Herbst RS, Baas P, Kim DW, Felip E, Pérez-Gracia JL, Han JY, et al. Pembrolizumab versus docetaxel for previously treated, PD-L1-positive, advanced non-small-cell lung cancer (KEYNOTE-010): a randomised controlled trial. Lancet. (2016) 387:1540–50. doi: 10.1016/S0140-6736(15)01281-7

4. Mok TSK, Wu YL, Kudaba I, Kowalski DM, Cho BC, Turna HZ, et al. Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial. Lancet. (2019) 393:1819–30. doi: 10.1016/S0140-6736(18)32409-7

5. Hendriks LE, Kerr KM, Menis J, Mok TS, Nestle U, Passaro A, et al. Non-oncogene-addicted metastatic non-small-cell lung cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol. (2023) 34:358–76. doi: 10.1016/j.annonc.2022.12.013

6. Gandhi L, Rodríguez-Abreu D, Gadgeel S, Esteban E, Felip E, De Angelis F, et al. Pembrolizumab plus chemotherapy in metastatic non–small-cell lung cancer. N Engl J Med. (2018) 378:2078–92. doi: 10.1056/NEJMoa1801005

7. Mountzios G, Remon J, Hendriks LEL, García-Campelo R, Rolfo C, Van Schil P, et al. Immune-checkpoint inhibition for resectable non-small-cell lung cancer - opportunities and challenges. Nat Rev Clin Oncol. (2023) 20:664–77. doi: 10.1038/s41571-023-00794-7

8. Hirsch FR, McElhinny A, Stanforth D, Ranger-Moore J, Jansson M, Kulangara K, et al. PD-L1 immunohistochemistry assays for lung cancer: results from phase 1 of the Blueprint PD-L1 IHC assay comparison project. J Thorac Oncol. (2017) 12:208–22. doi: 10.1016/j.jtho.2016.11.2228

9. Reck M, Rodríguez-Abreu D, Robinson AG, Hui R, Csőszi T, Fülöp A, et al. Pembrolizumab versus chemotherapy for PD-L1–positive non–small-cell lung cancer. N Engl J Med. (2016) 375:1823–33. doi: 10.1056/NEJMoa1606774

10. Büttner R, Gosney JR, Skov BG, Adam J, Motoi N, Bloom KJ, et al. Programmed death-ligand 1 immunohistochemistry testing: A review of analytical assays and clinical implementation in non-small-cell lung cancer. J Clin Oncol. (2017) 35:3867–76. doi: 10.1200/JCO.2017.74.7642

11. McLaughlin J, Han G, Schalper KA, Carvajal-Hausdorf D, Pelekanou V, Rehman J, et al. Quantitative assessment of the heterogeneity of PD-L1 expression in non–small-cell lung cancer. JAMA Oncol. (2016) 2:46–54. doi: 10.1001/jamaoncol.2015.3638

12. Baxi V, Edwards R, Montalto M, and Saha S. Digital pathology and artificial intelligence in translational medicine and clinical practice. Mod Pathol. (2022) 35:23–32. doi: 10.1038/s41379-021-00919-2

13. Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med. (2018) 24:1559–67. doi: 10.1038/s41591-018-0177-5

14. Hondelink LM, Hüyük M, Postmus PE, Smit VTHBM, Blom S, von der Thüsen JH, et al. Development and validation of a supervised deep learning algorithm for automated whole-slide programmed death-ligand 1 tumour proportion score assessment in non-small cell lung cancer. Histopathology. (2022) 80:635–47. doi: 10.1111/his.14571

15. Huang Z, Chen L, Lv L, Fu CC, Jin Y, Zheng Q, et al. A new AI-assisted scoring system for PD-L1 expression in NSCLC. Comput Methods Programs Biomed. (2022) 221:106829. doi: 10.1016/j.cmpb.2022.106829

16. Shmatko A, Ghaffari Laleh N, Gerstung M, and Kather JN. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat Cancer. (2022) 3:1026–38. doi: 10.1038/s43018-022-00436-4

17. Saltz J, Gupta R, Hou L, Kurc T, Singh P, Nguyen V, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. (2018) 23:181–193.e7. doi: 10.1016/j.celrep.2018.03.086

18. Zhang J, Choi H, Kim Y, Park J, Cho S, Kim E, et al. Artificial intelligence-based digital pathology using H&E-stained whole slide images in immuno-oncology: from immune biomarker detection to immunotherapy response prediction. J Immunother Cancer. (2025) 13:e011346. doi: 10.1136/jitc-2024-011346

19. Vanguri RS, Luo J, Aukerman AT, Egger JV, Fong CJ, Horvat N, et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat Cancer. (2022) 3:1151–64. doi: 10.1038/s43018-022-00416-8

20. Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, et al. A method for normalizing histology slides for quantitative analysis. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. (2009). p. 1107–10. doi: 10.1109/ISBI.2009.5193250

21. Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Song AH, et al. Towards a general-purpose foundation model for computational pathology. Nat Med. (2024) 30:850–62. doi: 10.1038/s41591-024-02857-3

22. Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol. (2014) 32:2380–5. doi: 10.1200/JCO.2014.55.2208

23. Ge C, Shi Y, Wang W, Zhang A, Huang M, Zhao F, et al. Artificial Intelligence-driven image analysis for standardised programmed death-ligand 1 expression evaluation in non-small cell lung cancer. Diagn Pathol. (2025) 20:1–12. doi: 10.1186/s13000-025-01707-1

24. Shamai G, Livne A, Polónia A, Sabo E, Cretu A, Bar-Sela G, et al. Deep learning-based image analysis predicts PD-L1 status from H&E-stained histopathology images in breast cancer. Nat Commun. (2022) 13:6753. doi: 10.1038/s41467-022-34275-9

25. Sha L, Osinski BL, Ho IY, Tan TL, Willis C, Weiss H, et al. Multi-field-of-view deep learning model predicts nonsmall cell lung cancer programmed death-ligand 1 status from whole-slide hematoxylin and eosin images. J Pathol Inform. (2019) 10:24. doi: 10.4103/jpi.jpi_24_19

26. Herbst RS, Prizant H, Ruderman D, Conway J, Shamshoian J, Koeppen H, et al. Digital versus manual PD-L1 scoring in advanced NSCLC from the IMpower110 and IMpower150 trials. J Thorac Oncol. (2025) 20:1778–90. doi: 10.1016/j.jtho.2025.07.131

27. Wu L, Wei D, Chen W, Wu C, Lu Z, Li S, et al. Comprehensive potential of artificial intelligence for predicting PD-L1 expression and EGFR mutations in lung cancer: A systematic review and meta-analysis. J Comput Assist Tomogr. (2025) 49:101. doi: 10.1097/RCT.0000000000001644

28. Plass M, Olteanu GE, Dacic S, Kern I, Zacharias M, Popper H, et al. Comparative performance of PD-L1 scoring by pathologists and AI algorithms. Histopathology. (2025) 87:90–100. doi: 10.1111/his.15432

29. Kim H, Kim S, Choi S, Park C, Park S, Pereira S, et al. Clinical validation of artificial intelligence–powered PD-L1 tumor proportion score interpretation for immune checkpoint inhibitor response prediction in non–small cell lung cancer. JCO Precis Oncol. (2024) 8:e2300556. doi: 10.1200/PO.23.00556

30. Molero A, Hernandez S, Alonso M, Peressini M, Curto D, Lopez-Rios F, et al. Assessment of PD-L1 expression and tumour infiltrating lymphocytes in early-stage non-small cell lung carcinoma with artificial intelligence algorithms. J Clin Pathol. (2025) 78:456–64. doi: 10.1136/jcp-2024-209766

31. Rakaee M, Tafavvoghi M, Ricciuti B, Alessi JV, Cortellini A, Citarella F, et al. Deep learning model for predicting immunotherapy response in advanced non–small cell lung cancer. JAMA Oncol. (2025) 11:109–18. doi: 10.1001/jamaoncol.2024.5356

32. Tourniaire P, Ilie M, Mazières J, Vigier A, Ghiringhelli F, Piton N, et al. WhARIO: whole-slide-image-based survival analysis for patients treated with immunotherapy. J Med Imaging. (2024) 11:037502. doi: 10.1117/1.JMI.11.3.037502

33. Captier N, Lerousseau M, Orlhac F, Hovhannisyan-Baghdasarian N, Luporsi M, Woff E, et al. Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer. Nat Commun. (2025) 16:614. doi: 10.1038/s41467-025-55847-5

34. Li X. Deciphering cell to cell spatial relationship for pathology images using SpatialQPFs. Sci Rep. (2024) 14:29585. doi: 10.1038/s41598-024-81383-1

35. Li X, Ren X, and Venugopal R. Entropy measures for quantifying complexity in digital pathology and spatial omics. iScience. (2025) 28:112765. doi: 10.1016/j.isci.2025.112765

Keywords: biomarker, deep learning - artificial intelligence, histopathological, lung, predictive model

Citation: Peroz M, Roussot N, Ilie A, Rageot D, Derangere V, Truntzer C and Ghiringhelli F (2026) Deep learning-based assessment of PD-L1 expression in NSCLC predicts outcome for patients treated with anti-PD-1 immunotherapy. Front. Immunol. 17:1750816. doi: 10.3389/fimmu.2026.1750816

Received: 20 November 2025; Revised: 26 January 2026; Accepted: 28 January 2026;
Published: 13 February 2026.

Edited by:

Sunyi Zheng, Tianjin Medical University Cancer Institute and Hospital, China

Reviewed by:

Wei Zhang, The University of Utah, United States
Xiao Li, Roche Diagnostics, United States

Copyright © 2026 Peroz, Roussot, Ilie, Rageot, Derangere, Truntzer and Ghiringhelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Morgane Peroz, mperoz@cgfl.fr; Caroline Truntzer, ctruntzer@cgfl.fr; François Ghiringhelli, fghiringhelli@cgfl.fr
