Integration of single cell and bulk transcriptomic analyses identifies FAM189A2 as a key prognostic gene in lung cancer

Guo, Peng; Xiang, Mengqi; Chen, Jing; Zhang, Huachuan

doi:10.3389/fimmu.2025.1701806

ORIGINAL RESEARCH article

Front. Immunol., 06 January 2026

Sec. Cancer Immunity and Immunotherapy

Volume 16 - 2025 | https://doi.org/10.3389/fimmu.2025.1701806

This article is part of the Research TopicPredictive Biomarkers to Immune Checkpoint Inhibitors in Lung CancerView all 14 articles

Integration of single cell and bulk transcriptomic analyses identifies FAM189A2 as a key prognostic gene in lung cancer

Peng Guo^1†

Mengqi Xiang^2†

Jing Chen^2*

Huachuan Zhang^3*

¹Department of Pathology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
²Department of Medical Oncology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
³Department of Thoracic Surgery, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China

Background: Lung cancer remains the leading cause of cancer−related mortality. Single−cell RNA sequencing (scRNA−seq) enables high−resolution mapping of tumor microenvironment, but how malignant cell states connect to patient outcomes and therapeutic vulnerabilities is not fully understood.

Methods: We analyzed scRNA−seq profiles from 13 primary lung tumors and matched normal lung tissues (48,149 cells). Malignant versus non−malignant epithelial cells were defined by inferCNV, co−expression programs resolved with Hotspot, transcription factor regulons inferred by pySCENIC, and differentiation potential estimated by CytoTRACE and diffusion pseudotime. Key malignant programs were cross−checked in an independent LUAD scRNA−seq cohort (GSE131907). Program signatures were projected into TCGA−LUAD to build an eight−gene LASSO−Cox risk model, which was further validated in two external NSCLC/LUAD cohorts (GSE30219 and GSE31210). Immune contexture was inferred by CIBERSORT and TIDE, drug sensitivities predicted with pRRophetic, and the function of FAM189A2 tested by knockdown and overexpression in A549 and NCI−H23 cells.

Results: We delineated 12 major cell populations and six malignant epithelial subclusters, including an early GPRC5A⁺ subpopulation with high developmental potential. Malignant programs corresponding to cell−cycle and inflammatory states (Modules 8, 14 and 16) were associated with worse overall survival, whereas Module 15, preferentially active in the GPRC5A⁺ state, was linked to better outcomes. An eight−gene signature (LAMA5, ACTB, B4GALT1, KLF5, KRT18, FAM189A2, SLC34A2, S100A11) robustly stratified patients into high− and low−risk groups across TCGA−LUAD and two validation cohorts. High−risk tumors displayed an immune−enriched yet matrix−restrained microenvironment with higher CD8⁺ T cells, whereas low−risk tumors showed greater predicted sensitivity to Aurora kinase, IGF−1R, mTOR and TGF−β inhibition. FAM189A2 was down−regulated in tumors, higher expression predicted better survival, and bidirectional perturbation in A549 and H23 cells demonstrated that loss of FAM189A2 enhanced, while overexpression attenuated, migration and invasion in vitro.

Conclusions: Integrating single−cell and bulk transcriptomes links malignant epithelial state programs to prognosis, yields a practical eight−gene risk model validated in multiple LUAD cohorts, and nominates FAM189A2 as a putative tumor suppressor and potential biomarker in lung cancer. These findings suggest testable strategies for risk stratification and therapy selection that warrant prospective evaluation.

1 Introduction

Lung cancer is the leading cause of cancer−related mortality worldwide (1). Although the adoption of molecularly targeted agents and immune−checkpoint inhibitors has improved outcomes for selected subgroups, the overall 5−year survival rate for lung cancer still hovers below 20 % (2, 3). Two factors largely account for this shortfall. First, most patients present with advanced or metastatic disease, precluding curative surgery (4). Second, current systemic therapies benefit only those tumors that harbor specific driver alterations or an inflamed immune milieu, and even in these cases resistance inevitably develops (5, 6). Pinpointing new molecular vulnerabilities therefore remains a high priority.

scRNA−seq provides a powerful means to chart the cellular landscape of tumors at unprecedented resolution. By capturing transcriptomes one cell at a time, scRNA−seq distinguishes rare populations and transitional states that bulk RNA analyses obscure, shedding light on mechanisms of tumor evolution, immune evasion and drug resistance. Recent scRNA−seq studies in lung cancer have delineated immune−cell infiltration patterns, mutation−specific transcriptional programs and early premalignant niches (7–9). Yet most of these efforts have centered on a limited number of molecular subtypes or immune components.

Here, we analyzed single−cell transcriptomes from 13 primary lung cancer and matched normal lung tissue. Using harmony−integrated clustering, inferCNV, Hotspot and CytoTRACE, we mapped cellular diversity, identified six malignant epithelial subclusters and highlighted a GPRC5A⁺ malignant epithelial subpopulation with high developmental potential. Leveraging bulk RNA−seq data from the TCGA−LUAD cohort, we built an 8−gene risk model via LASSO−Cox regression and functionally validated one key component, FAM189A2, as a tumor suppressor in vitro. These findings enrich our understanding of lung cancer biology and provide a practical framework for risk stratification and therapeutic targeting.

2 Materials and methods

2.1 Data collection and preprocessing

scRNA-seq data for lung cancer were obtained from GEO (GSE196303, 13 samples, accessed May 26, 2025;GSE131907,58 samples; accessed Nov 20, 2025).

Count matrices were processed with Scanpy v1.9.1 (10). Doublets were identified using Scrublet and excluded from downstream analyses (11). Bulk RNA-seq data and corresponding clinical annotations for LUAD were retrieved from TCGA (accessed May 23, 2025) and analyzed with limma v3.52 (12).

2.2 scRNA-seq clustering, visualization and cell annotation

Cells with <300 or >8,000 detected genes, mitochondrial transcripts >10%, or <1,000 UMIs were filtered out. Library sizes were normalized to 10,000 counts per cell (sc.pp.normalize_total, target_sum=1e4) and log-transformed. Highly variable genes (n=2,500) were selected (sc.pp.highly_variable_genes), followed by PCA (40 components). Batch effects were mitigated using Harmony integration (13). Graph-based clustering was performed with the Leiden algorithm (resolution=0.2) (14), and embeddings were visualized with UMAP. Cluster annotation was guided by literature-curated markers and refined by differential expression using sc.tl.rank_genes_groups (Wilcoxon rank-sum).

2.3 InferCNV analysis

To infer large-scale copy-number alterations at single-cell resolution, we used inferCNVpy (v0.3.0) on the lung cancer scRNA-seq dataset. Gene-level genomic coordinates were derived from the GRCh38 GTF annotation using the genomic_position_from_gtf function, and raw UMI count matrices together with cell-type labels were supplied as inputs. Immune cell populations (T cells, B cells, monocytes, and NK cells) were specified as non-malignant reference categories. Copy-number profiles were smoothed along each chromosome using a sliding window of 101 genes (window_size=101), while other normalization and denoising parameters followed the inferCNVpy defaults.

For each cell, inferCNVpy computed a continuous CNV-burden score (cnv_score) summarizing the magnitude of inferred copy-number deviations relative to the reference cells. Inspection of the CNV-score distribution across the cohort revealed a low-scoring mode corresponding to reference-like cells and a higher-scoring tail. We therefore used a threshold of 0.025. Malignant epithelial cells were then subjected to dimensionality reduction (PCA and UMAP), k-nearest-neighbor graph construction, and Leiden clustering to resolve intratumoral CNV-defined subpopulations.

2.4 Gene set enrichment analysis and gene ontology enrichment analysis

Gene set enrichment analysis (GSEA) and GO enrichment were performed with clusterProfiler (15). Ranked gene lists were queried against curated pathway collections; GO analysis was restricted to the Biological Process ontology. Multiple testing was controlled by the Benjamini–Hochberg method, and pathways with adjusted P < 0.05 were considered significant.

2.5 Subcluster stemness analysis and trajectory analysis

We applied CytoTRACE2 to estimate per-cell differentiation (higher score = less differentiated) across subclusters (16). Scores were summarized per subcluster (median, IQR) and compared using the Kruskal–Wallis test with Benjamini–Hochberg correction for post hoc contrasts. Cells in the upper decile of CytoTRACE2 defined the putative root population.

2.6 Transcription factor regulon analysis

Transcription factor (TF) regulons in malignant epithelial cells were reconstructed using the three−step pySCENIC workflow (17). In the first step, gene regulatory networks were inferred with the pyscenic grn command using the GRNBoost2 method and a curated list of 1,839 human TFs as candidate regulators. In the second step, co−expression modules were pruned to direct TF–target regulons by motif enrichment analysis with the human hg38 cisTarget ranking databases and the corresponding motif annotation table (motifs−v10nr_clust−nr.hgnc−m0.001−o0.0.tbl), using the pyscenic ctx command with the –mask_dropouts option. These databases restrict motif ranking to genomic regions approximately ±10 kb around annotated transcription start sites. We retained only “activating” modules that did not contain “weight>50.0%” flags in the context annotation. For motif enrichment, we required a normalized enrichment score (NES) ≥ 3.0 and either direct gene annotation or orthology to a directly annotated gene, and we discarded regulons containing fewer than 10 target genes. In the third step, regulon activity was quantified at single−cell resolution using AUCell, yielding an AUC matrix for all regulons across all malignant epithelial cells. AUC scores were binarized with AUCell−derived thresholds to define “on/off” regulon states, and both the binarized regulon matrix and AUC scores were imported into the AnnData. Cluster−specific regulons among malignant epithelial subclusters were identified by computing Z−scored regulon activity per cluster relative to the global mean and standard deviation; regulons with Z ≥ 0.2 and high regulon specificity scores (RSS) were considered cluster−enriched. These regulons were visualized using heatmaps of Z−scores and RSS−based rank plots, and TF–target networks were inspected to highlight dominant regulatory programs.

2.7 Construction and validation of a novel prognostic risk model

The top 200 genes from the GPRC5A⁺ epithelial cluster were subjected to univariate Cox regression against overall survival in TCGA-LUAD; genes with P < 0.05 proceeded to LASSO-Cox modeling. Patient-level risk scores were computed as, Risk score = Σ(Coef_i × Expr_i). Patients were dichotomized at the median risk score into high- and low-risk groups. Prognostic performance was evaluated by Kaplan–Meier analysis with log-rank testing and time-dependent ROC curves.

2.8 Immune infiltration analysis and functional enrichment analysis

Bulk expression profiles were transformed to log2(TPM + 1) and deconvolved with CIBERSORT using the LM22 leukocyte signature (1,000 permutations) (18). To evaluate immune evasion potential, TIDE (Tumor Immune Dysfunction and Exclusion) scores were computed for each sample under default parameters.

2.9 Drug sensitivity assessment

Predicted drug responses were inferred with pRRophetic (v0.5) trained on the GDSC2_Expr reference set (19). Study expression matrices were converted to log2(TPM + 1), mapped to HGNC symbols, and gene-wise z-scored to match the training distribution. Batch effects were corrected using the package’s built-in ComBat option. For each compound, ridge-regression models estimated individual IC50 (log10 µM) values; predictions were repeated across random seeds and summarized by the median per sample.

2.10 Cell line culture, FAM189A2 knockdown/overexpression, and qPCR validation

A549 cells (Wuhan Pricella) were cultured in Ham’s F−12K supplemented with 10% FBS (Gibco) and 1% penicillin/streptomycin at 37 °C in 5% CO₂ (~95% humidity); NCI−H23 cells were maintained in RPMI−1640 with 10% FBS under identical conditions. Cell identity was confirmed by STR profiling and all lines tested mycoplasma−negative prior to use. For loss−of−function, shRNAs targeting FAM189A2 (sh−1: GCTCGAATCAAAGGTGTGGAA; sh−2: GCCTTGAAACTCTTTCCGGTT) were cloned into pLKO.1−TRC, packaged in HEK293T cells, and viral supernatants collected at 48 h. A549 cells were transduced (8 µg/mL polybrene) and selected with puromycin (2 µg/mL) to establish stable knockdown pools; a non−targeting shRNA served as negative control. Knockdown efficiency was verified by qPCR. For gain−of−function, the 1,824−bp FAM189A2 coding sequence was inserted into pcDNA3.4 to generate pcDNA3.4−FAM189A2; empty vector (EV) was used as a matched control. Transient overexpression was performed at 80% confluence in 6−well plates. Where indicated, stable pools were generated by G418 selection at the pre−determined kill concentration (600 µg/mL for 3 days) and maintained thereafter at half concentration; overexpression was confirmed by qPCR. Total RNA was extracted with TRIzol, reverse−transcribed from 1 µg RNA, and quantified by SYBR−based qPCR on a 96−well platform. Primer sequences (5′ to 3′) were: FAM189A2_F, CGGAATTCGGTCGCCACCATGTACTCCTGGTAAACCTCTTTGTG; FAM189A2_R, CGTCGACTCACAGGACAGTCTCTCGGATGAC; ACTB_F, GAAGTGTGACGTTGACATCCG; ACTB_R, GCCTAGAAGCATTTGCGGTG. Melt−curve analysis confirmed single products. Relative expression was calculated by 2^−ΔΔCt using ACTB as the endogenous control. All qPCRs were run in technical triplicates with ≥3 biological replicates. Unless otherwise specified, antibiotics were removed 24 h before functional assays.

2.11 Wound healing assay

For scratch assays, confluent monolayers (90%) in 6-well plates were wounded with a sterile 200 µL pipette tip, rinsed twice with PBS, and replenished with fresh medium. Images at 0, 24, and 48 h were analyzed to quantify scratch closure relative to baseline.

2.12 Cell invasion assay

Transwell inserts (8 µm) were coated with Matrigel and solidified at 37°C for 6 h, then pre-wetted with 50 µL complete medium for 30 min. After serum starvation (5% FBS medium), 1.5×10⁵ FAM189A2-NC or knockdown cells in 200 µL serum-free medium were added to the upper chamber; the lower chamber contained 600 µL medium with 15% FBS as chemoattractant. After 48 h at 37 °C, non-invading cells were removed. Cells on the underside were fixed and stained with 0.5% crystal violet. Five random fields per insert were imaged at 200× for quantification.

2.13 Western blot analysis of EMT markers

Cells were lysed in RIPA buffer with protease/phosphatase inhibitors; protein was quantified by BCA, resolved (20 µg/lane) by SDS–PAGE, and transferred to PVDF membranes. After blocking (5% BSA or 5% non−fat milk, 1 h), membranes were incubated overnight at 4 °C with primary antibodies against E−cadherin (CST <ns/>3195, 1:1000), N−cadherin (CST <ns/>13116, 1:1000), with α/β−Tubulin as loading controls (1:3000–1:5000), followed by HRP−conjugated secondary antibodies (1:5000, 1 h, RT) and ECL detection. Band intensities were quantified in ImageJ, normalized to the loading control, then to the matched control group (shNT for knockdown; EV for overexpression). Data are presented as mean ± SEM from ≥3 biological replicates.

2.14 Statistical analysis

Analyses were performed in R v4.2.1. Non-normally distributed variables were compared with the Wilcoxon rank-sum test. Overall survival was assessed using Kaplan–Meier curves with log-rank tests. Two-sided P < 0.05 was considered statistically significant.

3 Results

3.1 Single−cell TME landscapes of lung cancer

We assembled scRNA−seq profiles from 13 lung cancer related samples (10 patients) and, after standard QC and integration, retained 48,149 cells for analysis (Figure 1; Supplementary Table S1). Unsupervised clustering followed by UMAP revealed 12 transcriptionally distinct clusters that map to three major compartments (Figures 2A, B), lymphoid (T, B, NK and plasma cells), myeloid (monocytes, macrophages, mast cells) and stromal/epithelial (vascular and lymphatic endothelial cells, fibroblasts, and lung epithelial cells including AT2, ciliated epithelium and PNEC).

Figure 1

Flowchart depicting the analysis of 13 samples through scRNA sequencing and cell annotation to identify malignant epithelial cells. It utilizes CytoTrace 2, PySCENIC, and Hotspot to produce an eight-gene prognostic signature. This signature is used to assess high and low risk, influence FAM189A2 knockdown, and perform analyses such as immune infiltration, molecular, drug sensitivity, wound healing, and Transwell assays. Accompanied by visual data from these analyses.

Figure 1. Schematic of overall experimental workflow.

Figure 2

Diagram illustrating cell type distribution and expression. Panel A shows a UMAP plot with clusters numbered, each representing different cell types. Panel B labels cell types on another UMAP plot. Panel C displays a bubble heatmap of gene expression and cell fractions. Panel D features a stacked bar chart of cell type percentages in normal versus tumor samples. Panel E shows a stacked bar chart of cell type percentages across multiple patients. A legend links colors to cell types.

Figure 2. Single-cell landscape of immune and stromal cells in lung cancer (A) UMAP of all cells colored by cluster;(B) UMAP by annotated cell type. (C) Dot plot of canonical marker gene expression. (D) Stacked barplot of cell-type composition in tumor vs. normal sample. (E) Per-patient cell-type abundances.

Canonical markers supported these annotations (Figure 2C; Supplementary Table S2), T cells (CD3D, CD3E), NK cells (NKG7, KLRD1, GZMA), B cells (CD79A, IGKC), macrophages (CD68, MARCO), monocytes (S100A8, AIF1), mast cells (TPSAB1, CPA3), endothelial cells (VWF, AQP1), lymphatic endothelium (PROX1, ECSCR), fibroblasts (ACTA2, TAGLN), alveolar type 2 (AT2)−like epithelial cells (SFTPA2, KRT18), ciliated cells (RSPH1, CFAP53) and PNEC (CHGA, SCGN).

We next compared cellular compositions across tissue sources and individuals. Tumor samples showed a relative expansion of fibroblasts and PNEC, whereas monocytes tended to be more abundant in adjacent normal lung, overall patterns were reproducible across patients (Figures 2D, E). This single−cell atlas establishes the cellular baseline of the lung cancer TME and motivates a focused interrogation of the epithelial compartment to distinguish malignant from non−malignant epithelial cells in the next section.

3.2 Identification and functional characterization of malignant epithelial cells

Using inferCNV on the epithelial compartment, we observed broad, segment−level copy−number shifts in tumor−derived epithelium but not in epithelium from non−tumor sources (Figure 3A). A per−cell CNV score projected onto UMAP cleanly separated these two states (Figure 3B), enabling a stringent classification of malignant and non−malignant epithelial cells (Figure 3C).

Figure 3

Panel A shows a heatmap with cell types and gene expression levels. Panel B features a UMAP plot with color gradients. Panel C presents a UMAP plot distinguishing malignant and non-malignant cells. Panel D displays a line graph indicating enrichment scores for different pathways. Panel E includes a bar chart of cellular responses and adjusted P-values.

Figure 3. Identification of malignant cell subpopulations in lung cancer (A) Heatmap of inferred copy-number variation (CNV) profiles across epithelial cells. (B) UMAP colored by CNV score. (C) UMAP with malignant vs. non-malignant epithelial cells labeled. (D) GSEA comparing malignant vs. normal epithelial cells. (E) GO Biological Process terms upregulated in malignant cells.

Transcriptomically, malignant epithelial cells upregulated proliferation and stress−adaptation programs, whereas non−malignant cells retained lineage and homeostatic functions. Hallmark GSEA (malignant vs. non−malignant) highlighted Inflammatory Response and Interferon−γ Response as preferential in non−malignant epithelium, while malignant cells showed positive enrichment for Spermatogenesis (Figure 3D). GO−BP analysis of malignant−up genes emphasized proteostasis and metabolic rewiring (unfolded−protein response, protein catabolic processes, mitochondrial/oxidative programs), whereas non−malignant−up genes mapped to ciliation and surfactant biology (Figure 3E).

Consistent with these pathway−level differences, malignant cells expressed higher TOP2A, MKI67, STMN1, and NUPR1, while non−malignant epithelial cells maintained differentiation markers such as SFTPC and FOXJ1. Together, these results define a robust malignant epithelial state marked by CNV burden and proliferative/metabolic programs, setting the stage to ask which malignant transcriptional modules track with patient prognosis in bulk lung cancer cohorts.

3.3 Co−expression modules in malignant cells and their prognostic significance

To resolve malignant cell programs, we applied Hotspot to malignant epithelial cells and identified 18 robust co−expression modules (Figure 4A, Supplementary Table S3). These modules represent recurrent transcriptional programs across tumor cells. Among them, four modules showed particularly clear biology. Module 16 corresponds to a canonical cell−cycle/DNA−replication program, marked by genes such as CENPK, E2F1, TYMS, TOP2A, NUSAP1 and other mitotic regulators. Module 8 represents an inflammatory and secretory epithelial program enriched for mucins, chemokines and proteases (PIGR, MUC4, MUC5AC, CXCL1/3/5/8, LCN2, MMP7, PLAU), together with extracellular−matrix−remodeling factors. Module 14 is a smaller module containing genes linked to cytoskeletal organization and small secreted peptides (ARPC5L, MYL6B, TMSB15A, CENPH, NPPB). Module 15 captures a signaling and cytoskeletal−remodeling program featuring RTK/IGF1R−related signaling, actin dynamics and vesicle trafficking components, as reflected by genes such as PLOD2, IGF1R, CRK, VASP, SKAP2 and SNX1.

Figure 4

Heatmap and Kaplan-Meier plots are shown. Panel A is a heatmap of module correlations with Z-scores, ranging from blue to red. Panels B to E display Kaplan-Meier survival plots with two groups, high and low, over time. The p-values range from 0.027 to 0.00025, indicating statistical significance.

Figure 4. Hotspot analysis (A) Hotspot analysis of malignant epithelial cells, showing identified gene co-expression modules in UMAP space. (B–E) KM survival curves for TCGA-LUAD patients stratified by high vs. low module scores for module 8, 14, 15, and 16 respectively, demonstrating prognostic impacts.

We next projected each module’s gene set into the TCGA−LUAD cohort and computed per−tumor module scores. Kaplan–Meier analyses showed that high scores for the cell−cycle/DNA−replication module (Module 16), for the inflammatory/secretory epithelial module (Module 8), and for the smaller cytoskeletal/secreted−peptide module (Module 14) were each associated with significantly worse overall survival, consistent with their proliferative and tissue−remodelling biology. In contrast, higher activity of Module 15, the RTK/IGF1R−linked signaling and cytoskeletal−remodeling program correlated with improved survival (Figures 4B–E).

3.4 Heterogeneity of malignant epithelial subpopulations and identification of a key early GPRC5A+ subcluster

Re−clustering of malignant epithelial cells resolved 6 robust subpopulations, labelled by top markers as ADGRG6⁺, GPRC5A⁺, H3F3B⁺, MT1H⁺, PCSK1⁺ and PON1⁺ (Figure 5A). These phenotypes likely represent distinct differentiation lineages or metabolic adaptations within the malignant epithelial compartment. To compare developmental potential, we first computed CytoTRACE scores. The GPRC5A⁺ subpopulation showed the highest CytoTRACE values (least differentiated), whereas PON1⁺ and MT1H⁺ cells scored lower, consistent with a more differentiated state (Figures 5B–D). To further place these states along a continuum, we applied diffusion−pseudotime (DPT) analysis. The GPRC5A⁺ cluster occupied the earliest pseudotime positions with the lowest dpt_pseudotime values, from which more proliferative or stress−adapted clusters appeared to emerge (Supplementary Figure S1A). Together, CytoTRACE and DPT support the view that GPRC5A⁺ cells represent an early, primed malignant state with high developmental potential rather than a terminally differentiated endpoint.

Figure 5

Clustered UMAP plots show cell types labeled ADGRG6+, GPRC5A+, H3F3B+, MT1H+, PCSK1+, and PON1+. CytoTRACE analysis in panel B illustrates differentiation. Panel E displays motif enrichment linked to transcription factors. Panel D features box plots of developmental potency scores by phenotype. Module score maps in panel F highlight gene expression variations.

Figure 5. Malignant epithelial subpopulations analysis (A) UMAP of malignant epithelial cells, colored by subcluster. (B) CytoTRACE scores of each subcluster (higher = more stem-like). (C) Cell-cycle phase distribution by subcluster. (D) boxplot of top differentially expressed genes for each subcluster. (E) Transcription factor regulon activity scores per subcluster. (F) Module score profile of each subcluster.

Regulatory network inference by pySCENIC further distinguished the six phenotypes, recovering transcription−factor regulons with subtype−specific activity, including CUX1, EGR1, ETV2, TBX1, ZNF507 and others (motif logos and regulon AUC heatmap in Figure 5E). These regulons offer hypotheses for upstream drivers of the alveolar−like GPRC5A⁺ state versus the more proliferative or hypoxic programs in other clusters. We next asked how the prognostic Hotspot modules map onto these subpopulations. UMAP overlays revealed that Module 15—the RTK/IGF1R−linked signaling and cytoskeletal−remodeling program—is maximally enriched within the GPRC5A⁺ region, whereas Modules 8, 14 and 16, which capture inflammatory/secretory and cell−cycle programs, preferentially mark other malignant subpopulations and show little overlap with GPRC5A⁺ cells (Figure 5F). This spatial pattern links the GPRC5A⁺ early malignant state to the favorable−prognosis Module−15 program and, conversely, associates Modules 8/14/16 with more aggressive malignant states.

To assess whether a similar architecture exists in lung adenocarcinoma, we analyzed an independent LUAD scRNA−seq dataset (GSE131907). After quality control, clustering and identification of malignant epithelial cells, we projected the Module−15 gene set onto this dataset. Module−15 scores showed clear spatial heterogeneity on the LUAD UMAP (Supplementary Figure S1B) and were highest in malignant cluster 11 (Supplementary Figure S1C). This cluster also exhibited high similarity to our discovery−cohort GPRC5A⁺ signature, indicating an analogous GPRC5A−high population in LUAD. Thus, the coupling between an early GPRC5A−high malignant state and the Module−15 signaling program is reproducible across independent single−cell datasets, supporting the relevance of these carcinoid−derived malignant programs to LUAD biology.

Together, these analyses define a continuum of malignant epithelial states in lung cancer and LUAD, from proliferative and stress−adapted subpopulations dominated by Modules 8, 14 and 16 to an early GPRC5A⁺, stem−leaning subpopulation that preferentially engages the Module−15 signaling program.

3.5 Prognostic risk model derived from the GPRC5A+ tumor subcluster

Guided by the GPRC5A+ subcluster, we used its top genes to build a transcriptomic risk model in TCGA−LUAD. LASSO−Cox identified 8 genes with non−zero coefficients, LAMA5, ACTB, B4GALT1, KLF5, KRT18, FAM189A2, SLC34A2, S100A11, using 10−fold cross−validation for λ selection (Figure 6A). A risk score was computed per patient as Σ(β_i × expression_i), and the cohort was median−split into High− and Low−risk groups. This signature robustly stratified survival: the High−risk group had significantly worse OS than the Low−risk group (P < 0.001; Figure 6B). Time−dependent ROC analysis yielded AUC ≈ 0.70 at 1, 3, and 5 years, indicating stable performance across follow−up windows (Figure 6D). To further assess the robustness and generalizability of this 8−gene signature, we applied it to two independent NSCLC/LUAD microarray cohorts: GSE30219 (NSCLC, n = 293 tumors) and GSE31210 (stage I–II LUAD, n = 226 patients). In each cohort, patients were stratified into high− and low−risk groups according to the 8−gene risk score, and the low−risk group consistently showed significantly better OS than the high−risk group (Supplementary Figures S2A, B). These external validations support the prognostic value of the 8−gene signature beyond the TCGA training set.

Figure 6

Chart composite detailing analysis of clinical data: A) Two plots showing Lasso regression results, with coefficients against log lambda and cross-validated errors against log lambda. B) Kaplan-Meier survival curves illustrating significant differences in survival probability between high risk (red) and low risk (blue) groups (p<0.001). C) Forest plot presenting hazard ratios for genes and clinical covariates, including significant associations for LAMA3, ACTB, and Stage 2. D) Receiver Operating Characteristic (ROC) curves with area under the curve (AUC) values at 1, 3, and 5 years, each at 0.7. E) Enrichment plot for GO biological processes in high risk group, highlighting processes like intermediate filament organization. F) Enrichment plot for KEGG pathways in high risk group, emphasizing pathways like biosynthesis of cofactors and cell cycle.

Figure 6. Construction and evaluation of the prognostic risk model. (A) LASSO coefficient profiles of candidate genes (left) and ten-fold cross-validation error plot (right). (B) KM overall survival curves for TCGA-LUNG CANCER patients stratified by the 8-gene risk score. (C) Multivariable Cox regression forest plot of the 8 genes in the signature. (D) Time-dependent ROC curves for 1, 3, and 5-year OS prediction by the 8-gene risk score. (E) GSEA of GO Biological Process terms. (F) GSEA of KEGG pathways highlighting differences.

To dissect the relative contributions of individual genes and clinical variables, we fitted a multivariable Cox model including the 8 signature genes together with age, gender and stage (Figure 6C). In this model, most genes had modest effect sizes with hazard ratios close to 1.0 per unit increase in expression, whereas advanced stage (III–IV vs. I–II) showed the expected strong risk effect. FAM189A2 displayed a borderline protective tendency (HR ≈ 0.96, 95% CI 0.91–1.00, P = 0.081), which is directionally consistent with its enrichment in the favorable Module−15 RTK/IGF1R−linked signaling program within the GPRC5A⁺ subcluster.

Collectively, the 8−gene signature translates our single−cell findings into a clinically interpretable tool that separates lung cancer into prognostically distinct groups.

3.6 Transcriptomic, immune, and drug−sensitivity differences between high− and low−risk groups

Having stratified TCGA−LUAD patients with the 8−gene signature, we compared the two risk groups across transcriptional programs, tumor immunity, somatic alterations and predicted drug response. Pathway comparison further clarified their biology. High−risk tumors were enriched for keratinisation, keratinocyte differentiation and intermediate filament cytoskeleton (Figure 6E), and for estrogen signaling, pentose and glucuronate interconversions and porphyrin metabolism (Figure 6F). These results suggest a shift toward epithelial remodeling and specific metabolic reprogramming in high−risk disease, whereas low−risk tumors retain more alveolar−differentiation−like features. Deconvolution of bulk expression revealed marked differences in immune composition between risk groups. High−risk tumors showed higher fractions of CD8⁺ T cells, activated memory CD4⁺ T cells, macrophages (M0/M1) and resting NK cells, whereas low−risk tumors were enriched for plasma cells, resting memory CD4⁺ T cells, resting dendritic cells and resting mast cells (Figure 7A). At first glance, this immune−inflamed phenotype appears at odds with the poorer survival of the high−risk group. Side−by−side oncoplots showed broadly similar driver mutation landscapes across groups (Figures 7B, C). Microsatellite instability (MSI) tended to be higher in low−risk tumors (Figure 7D). High−risk tumors also exhibited higher TIDE and Exclusion scores, indicating greater potential for T−cell exclusion.

Figure 7

Five panels illustrating various data analyses in box plots and graphs. Panel A shows box plots comparing high and low risk groups across different cell types with statistical significance marked. Panels B and C display mutation landscapes in high and low risk groups respectively, using stacked bar graphs. Panel D contains three box plots comparing MSI, TIDE, and Exclusion metrics between low and high risk groups. Panel E presents box plots of estimated IC50 values for different drugs across high and low risk groups, highlighting differences in drug response.

Figure 7. Functional differences between high- and low-risk groups. (A) Boxplot of CIBERSORT-estimated abundances of 22 immune cell types. (B, C) Oncoprint-style mutation plots for High-risk (B) and Low-risk (C) groups, showing the top mutated genes in each. (D) Boxplots of MSI scores and TIDE scores in High vs. Low groups. (E) Boxplots of predicted drug sensitivities (IC50) for selected compounds. Significance levels were denoted as *P < 0.05, **P < 0.01, ****P < 0.0001.

To further reconcile the apparent discrepancy between an immune−rich high−risk group and its poor survival, we quantified T−cell exhaustion and stromal signatures in TCGA−LUAD. At the single−gene level, several checkpoint/exhaustion markers (PDCD1, LAG3, CTLA4, TIGIT, HAVCR2, PDCD1LG2 and CD274) showed only modest, directionally variable differences between high− and low−risk tumors (Supplementary Figure S3A). Consistently, an exhaustion−related ssGSEA signature was not significantly different between risk groups (P = 0.11), and a myeloid/bone−marrow ssGSEA score also showed no clear separation (P = 0.22; Supplementary Figure S3B). In contrast, a matrix/cancer−associated fibroblast (CAF)−related ssGSEA signature was significantly enriched in high−risk tumors (P = 2.8 × 10^-4; Supplementary Figure S3B), indicating a more pronounced stromal/ECM activation in this group. Taken together with the higher TIDE Exclusion scores, these findings suggest that high−risk tumors are immune−enriched but embedded in a matrix−rich, CAF−dominated microenvironment, compatible with an “inflamed yet restrained” TME rather than a uniformly favorable immune−hot state.

Using expression−based pharmacogenomic modelling, low−risk tumors showed lower estimated IC₅₀ values for several agents targeting cell−cycle and growth−factor pathways, ZM447439 and Tozasertib (Aurora kinase inhibitors), BMS−754807 (IGF−1R/IR inhibitor), AZD8055 (mTOR inhibitor) and SB505124 (TGF−βRI inhibitor), indicating higher predicted sensitivity (Figure 7E). High−risk tumors were relatively resistant to these agents, in line with their proliferative and stromally remodeled phenotype. Overall, these orthogonal readouts help explain the survival difference between risk groups and point to distinct therapeutic strategies: low−risk LUADs may benefit more from targeting Aurora kinase, IGF−1R, mTOR and TGF−β pathways, whereas high−risk tumors are characterized by greater immune restraint within a dense matrix and may require combination strategies that simultaneously modulate the stroma and reinvigorate anti−tumor immunity.

3.7 Identification of FAM189A2 as a hub gene in the risk signature

To pinpoint potential drivers within the signature, we trained a Random Forest classifier on the 8 signature genes together with closely co−expressed epithelial markers. The model reached a stable out−of−bag error with 1,000 trees (Figure 8A) and consistently ranked KRT7, FAM189A2 and B4GALT1 as the top predictors. Expression analyses using TCGA confirmed tumor downregulation of FAM189A2 and tumor−upregulation of KRT7 and B4GALT1 (Figure 8B). In an independent GEO dataset (GSE17558), FAM189A2 expression was likewise higher in adjacent non−tumor tissue than in matched tumor samples (Supplementary Figure S2C), providing external support for its tumor−suppressive pattern. Kaplan–Meier curves further showed that high FAM189A2 expression predicts better survival (P = 0.003), whereas high KRT7 and high B4GALT1 expression predict worse survival (P = 0.0077 and P = 0.00038, respectively; Figure 8C). Together, the machine−learning importance ranking, tumor–normal contrasts across TCGA and GSE17558, and survival analyses converge on FAM189A2 as a protective hub within the risk framework, in line with its enrichment in the GPRC5A⁺ malignant subpopulation.

Figure 8

Panel A shows a decision tree error rate graph and a variable importance bar chart, highlighting KRT7, FAM189A2, and B4GALT1. Panel B features box plots comparing normal and tumor expression levels for KRT7, FAM189A2, and B4GALT1 genes with statistical significance. Panel C presents Kaplan-Meier survival curves for high and low expression groups of the same genes, with p-values indicating significance. Panel D is a volcano plot displaying gene expression changes between FAM189A2 low and high groups, focusing on chondroitin sulfate degradation.

Figure 8. Identification of FAM189A2 as a key hub gene. (A) Variable importance scores from Random Forest analysis of candidate genes. (B) Boxplots comparing expression of ACTB, FAM189A2, B4GALT1 in normal lung vs. LUAD tumor tissues. (C) KM overall survival curves for patient subsets with high vs. low expression of the three genes. Significance levels were denoted as **P < 0.01, ****P < 0.0001.

3.8 Functional characterization of FAM189A2 in lung cancer cell migration and invasion

We first tested whether FAM189A2 constrains motility and invasiveness in LUAD cells by knocking it down in two cell lines using independent shRNAs. Efficient knockdown was confirmed by qPCR (Supplementary Figure S3C). In A549 cells, FAM189A2−silenced cells closed scratch wounds markedly faster than non−targeting controls at 24 h, indicating enhanced migratory capacity (Figure 9A). Consistently, knockdown significantly increased the number of invading A549 cells in Matrigel Transwell assays (Figure 9B). The corresponding experiments in NCI−H23 yielded similar results: FAM189A2 knockdown accelerated wound closure and increased invasion compared with shRNA controls (Figures 9C, D). In all assays, non−targeting shRNA and parental cells were indistinguishable, suggesting that the phenotype is specific to FAM189A2 depletion.

Figure 9

A set of four panels showing wound healing and cell invasion assays. Panel A: Images of A549 cells at 0 and 24 hours for NC, shRNA1, and shRNA2, with a bar graph showing increased wound closure percentage for shRNA-treated cells compared to NC. Panel B: Microscopy images of invaded A549 cells and a bar graph indicating increased cell invasion for shRNA-treated groups. Panel C: Images of H23 cells at 0 and 24 hours, with a bar graph showing higher wound closure percentages for shRNA treatments. Panel D: Microscopy images and a bar graph depicting increased invaded H23 cells for shRNA groups. Statistical significance is denoted by asterisks.

Figure 9. Effects of FAM189A2 knockdown on lung cancer cell migration and invasion. (A, B) A549 cells transduced with non−targeting control (shNC) and two independent FAM189A2 shRNAs (sh−1, sh−2). (A) Wound−healing assay with representative images at 0 and 24 h (B) Matrigel Transwell invasion assay with representative images of crystal−violet−stained invaded cells and quantification of invaded cells per field. (C, D) NCI−H23 cells with the same shRNA constructs. (C) Wound−healing assay. (D) Matrigel invasion assay. *P < 0.05; **P < 0.01; ***P < 0.001; ns, not significant (*P ≥ 0.05); two−sided tests.

To examine whether the opposite perturbation produces reciprocal effects, we next overexpressed FAM189A2 in both A549 and H23 cells using a pcDNA3.4−FAM189A2 construct with matched empty−vector controls. Overexpression was verified by qPCR (Supplementary Figure S3C). In A549, FAM189A2 overexpression slowed scratch−wound closure and reduced invasion through Matrigel relative to empty−vector cells (Supplementary Figures S4A, B). In H23, which is more aggressive at baseline, FAM189A2 overexpression likewise decreased wound closure and significantly reduced the number of invading cells (Supplementary Figures S4C, D). Thus, across two LUAD lines and two perturbation directions, the pattern was consistent: lower FAM189A2 was associated with faster migration and greater invasion, whereas higher FAM189A2 dampened these traits.

We further investigated EMT markers by Western blot (Figures 10A, B). In A549 knockdown cells, E−cadherin tended to decrease and N−cadherin to increase relative to non−targeting controls, but these changes did not reach statistical significance in our small−n setting and are therefore interpreted as trends (Figure 10A). In H23, FAM189A2 overexpression significantly increased E−cadherin protein levels, whereas N−cadherin showed a modest downward trend without achieving formal significance (Figure 10B). These EMT−marker patterns, together with the robust effects on migration and invasion, are compatible with a partial EMT shift upon FAM189A2 loss and a converse shift toward a more epithelial phenotype upon its gain, but we regard them as supportive rather than definitive mechanistic proof.

Figure 10

Western blot and bar graphs showing E-Cadherin and N-Cadherin expression in A549 and H23 cells. Panels A and B compare expression levels under different treatments: NC, shRNA-1, shRNA-2, EV, and OE. Bars indicate relative expression levels with significant differences marked by asterisks. Tubulin is used as a loading control.

Figure 10. Effects of FAM189A2 knockdown and overexpression on E−cadherin and N−cadherin expression. (A) A549 and (B) NCI−H23 cells. For each panel, the left subpanel shows Western blots of N−cadherin, E−cadherin and Tubulin in NC, two FAM189A2 shRNAs, EV and FAM189A2−OE cells. The right subpanels show densitometric quantification of E−cadherin and N−cadherin, normalized to Tubulin and then to NC or EV. Significance levels were denoted as *P < 0.05, ns indicates no significant difference.

Collectively, bidirectional perturbation of FAM189A2 in two LUAD cell lines demonstrates that reduced FAM189A2 activity is associated with enhanced migratory and invasive behavior, whereas restoring or increasing FAM189A2 dampens these traits. These functional data support the view that FAM189A2 acts as a negative regulator of metastatic potential in lung cancer cells and are directionally consistent with its association with the favorable early GPRC5A⁺ malignant program in our single−cell analyses.

4 Discussion

In this study, we integrated single−cell and bulk transcriptomic data to dissect malignant epithelial states in lung cancer and link them to prognosis and potential therapeutic vulnerabilities. At single−cell resolution we identified at least six malignant epithelial subpopulations and several recurrent co−expression modules, ranging from highly proliferative and inflammatory/secretory programs (Modules 8, 14 and 16) to an early GPRC5A⁺ state with high developmental potential that preferentially engages a signaling and cytoskeletal−remodeling program (Module 15). The relative prevalence of these malignant states was then translated into an eight−gene expression signature derived from the GPRC5A⁺ program that stratifies patients into high− and low−risk groups with distinct tumor microenvironments and predicted drug sensitivities.

Our single−cell analysis confirms that intratumoral heterogeneity in lung cancer extends beyond genomic alterations to the level of cell−state diversity (20). Prior studies have shown that lung tumors harbor epithelial cells spanning a continuum from normal−like to fully transformed states. Kim and colleagues profiled lung cancers from early to advanced stages and observed the emergence of undifferentiated cell clusters associated with metastasis (21). Han et al. recently described KRT8⁺ alveolar intermediate cells as a transitional state between normal cells and malignant cells, with stem−like and immune−evasive features and adverse prognosis (22). In our advanced tumors we likewise detected a GPRC5A⁺ subpopulation with high CytoTRACE and early pseudotime, consistent with high developmental potential. However, tumors with stronger engagement of the GPRC5A⁺/Module−15 program showed better outcome, whereas enrichment of proliferative and inflammatory/secretory program (Modules 8, 14 and 16) portended poor survival. One plausible explanation is that, in established tumors, the presence of an early malignant state may indicate that a substantial fraction of cells has not yet fully transitioned into highly aggressive phenotypes, whereas tumors dominated by plastic, KRT8−high transitional or proliferative states are more lethal (22, 23). These nuances underscore that the clinical impact of a given cell state depends on its prevalence and context within a tumor. Our work reinforces the notion that lung cancer is not a single disease but a collection of phenotypically diverse cell populations whose proportions vary between tumors, driving divergent clinical trajectories and treatment responses (24). From a therapeutic standpoint, selectively targeting the most aggressive subpopulations (with strong Module−8/14/16 activity) while sparing or even stabilizing less aggressive states could be beneficial. Single−cell approaches, as used here, are essential for mapping such sublineages and designing state−targeted interventions (25).

An important conceptual question is whether malignant program derived from pulmonary carcinoid tumors are relevant to LUAD. Our discovery scRNA−seq dataset (GSE196303) comprises neuroendocrine tumors and matched normal lung from LUAD patients, whereas our prognostic modelling was performed in LUAD cohorts. We therefore explicitly tested for recurrence of the key malignant program in a histology−matched LUAD single−cell dataset (GSE131907). In that cohort we identified a GPRC5A−high malignant cluster in which the Module−15 signaling program was again preferentially active, mirroring the pattern seen in the discovery dataset. In addition, the eight−gene signature derived from the GPRC5A⁺ program stratified survival not only in TCGA−LUAD but also in two independent NSCLC/LUAD cohorts (GSE30219 and GSE31210). Taken together, these observations suggest that the GPRC5A⁺/Module−15 axis captures a malignant state that is shared across lung tumor histologies, although we acknowledge that cross−histology projection may still introduce bias and therefore interpret these links with appropriate caution.

By combining single−cell insights with bulk deconvolution, we also showed that the high− and low−risk groups differ markedly in immune contexture. High−risk tumors displayed higher fractions of CD8⁺ T cells, activated CD4⁺ memory T cells, NK cells and macrophages, whereas low−risk tumors contained fewer effector lymphocytes and more resting immune populations. At face value this immune−enriched profile seems inconsistent with the poorer survival of the high−risk group. Our additional analyses help reconcile this. Checkpoint and exhaustion markers (PDCD1, LAG3, HAVCR2 and others) and an exhaustion ssGSEA score were not strongly different between risk groups, but a matrix/CAF−related ssGSEA signature and TIDE Exclusion scores were significantly higher in high−risk tumors, indicating dense stromal remodeling and potential physical or biochemical barriers to T−cell function. Thus, the high−risk group appears “inflamed yet restrained”, lymphocytes are present but embedded in a matrix−rich, CAF−dominated milieu, which may blunt their anti−tumor effects (26). This is consistent with pan−cancer analyses showing that immune−rich but stroma−dominated TMEs can fare worse than truly inflamed, T−cell–effective tumors (27). Conversely, the low−risk group is relatively immune−quiescent but less stromally remodeled. These patterns highlight that immune quantity and immune quality are distinct, and that integrating tumor−intrinsic and microenvironmental features may yield more informative biomarkers than PD−L1 expression alone (21, 27, 28). In this context, our eight−gene signature—or the underlying malignant states from which it was derived—could potentially help stratify patients both by prognosis and by likelihood of benefiting from immunotherapy, although this will require prospective validation.

In addition, pharmacogenomic modelling suggested that low−risk tumors are more sensitive to Aurora kinase, IGF−1R, mTOR and TGF−β pathway inhibition, whereas high−risk tumors are relatively resistant. To begin exploring whether these predictions might be reflected at the cell−line level, we performed dose–response assays for three of the predicted agents—Tozasertib (Aurora kinase inhibitor), BMS−754807 (IGF−1R/IR inhibitor) and AZD8055 (mTOR inhibitor), in A549 and NCI−H23 cells. Across a range of concentrations, H23 cells exhibited lower concentration values and greater growth inhibition than A549 for all three drugs (Supplementary Table S4), indicating that certain LUAD contexts are intrinsically more susceptible to simultaneous targeting of Aurora, IGF−1R and mTOR signaling. While these in vitro experiments are limited to two cell lines and do not directly map onto the high− versus low−risk patient groups, they provide preliminary experimental support for the pharmacogenomic predictions and motivate further testing of pathway−targeted combinations in molecularly defined LUAD subsets. Together with the stromal and immune differences between risk groups, these data may guide hypothesis−generating combination strategies, for example, pairing stroma−modulating agents with immunotherapy in high−risk disease and pathway−targeted agents in tumors with a low−risk−like transcriptional profile.

A particularly novel aspect of our study is the focus on FAM189A2. This gene emerged from our single−cell−derived signature and Random Forest analysis as a candidate protective hub: it is down−regulated in tumors compared with adjacent lung, higher expression is associated with better survival, and this pattern is reproduced in an independent dataset. In vitro, bidirectional perturbation in two LUAD cell lines showed that FAM189A2 knockdown consistently enhanced, whereas overexpression dampened, migration and invasion. Changes in EMT markers (decreased E−cadherin and increased N−cadherin upon knockdown, with the converse trend upon overexpression) were modest but directionally compatible with a partial EMT shift. Taken together, these data support a model in which FAM189A2 restrains motility and invasive potential, possibly by stabilizing an epithelial state, although the precise downstream pathways remain to be defined. We therefore regard FAM189A2 as a promising biomarker and putative tumor suppressor that warrants deeper mechanistic and in vivo investigation rather than a fully validated therapeutic target at this stage.

This study has several limitations. First, the discovery scRNA−seq cohort (GSE196303) comprises pulmonary neuroendocrine (carcinoid) tumors and matched normal lung from LUAD patients, whereas our prognostic modelling was performed in LUAD, although we observed a similar GPRC5A−high/Module−15 malignant state in an independent LUAD single−cell dataset and validated the eight−gene signature in two external NSCLC/LUAD cohorts, cross−histology projection may still introduce bias. Second, the eight−gene risk model and pharmacogenomic predictions were derived from retrospective cohorts and, despite multivariable adjustment and external validation, overfitting or cohort−specific effects cannot be excluded. Finally, functional data for FAM189A2 are limited to in vitro perturbation in two LUAD cell lines, in vivo models and deeper mechanistic studies will be required to fully establish its role in lung cancer progression.

By integrating single−cell and bulk transcriptomes, we delineate malignant epithelial programs in lung cancer, identify an early GPRC5A⁺ malignant state linked to a favorable Module−15 signaling program, and derive an interpretable eight−gene signature that stratifies risk in LUAD across multiple cohorts. We further nominate FAM189A2 as a putative tumor suppressor whose reduced expression and activity are associated with a more migratory and invasive phenotype in vitro. These results provide a reusable analytic framework and generate testable hypotheses about immune contexture, stromal restraint and pathway−specific vulnerabilities. Prospective validation in histology−matched, independent cohorts, together with spatial profiling and mechanistic studies, will be essential to confirm clinical utility and to refine FAM189A2 and the GPRC5A⁺/Module−15 axis as biomarkers and potential therapeutic entry points.

Data availability statement

The datasets analyzed in this study are publicly available in the NCBI Gene Expression Omnibus (GEO) database under accession numbers GSE131907 (bulk RNA-seq) and GSE196303 (single-cell RNA-seq). Related metadata are provided in the Supplementary Material.

Ethics statement

Ethical approval was not required for the studies on humans in accordance with the local legislation and institutional requirements because only commercially available established cell lines were used.

Author contributions

PG: Writing – original draft, Writing – review & editing. MX: Writing – review & editing. JC: Writing – review & editing. HZ: Writing – review & editing, Conceptualization.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2025.1701806/full#supplementary-material

References

1. Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell. (2012) 150:1107–20. doi: 10.1016/j.cell.2012.08.029

PubMed Abstract | Crossref Full Text | Google Scholar

2. Russell PA, Wainer Z, Wright GM, Daniels M, Conron M, and Williams RA. Does lung adenocarcinoma subtype predict patient survival? A clinicopathologic study based on the new international association for the study of lung cancer/american thoracic society/european respiratory society international multidisciplinary lung adenocarcinoma classification. J Thorac Oncol. (2011) 6:1496–504. doi: 10.1097/JTO.0b013e318221f701

PubMed Abstract | Crossref Full Text | Google Scholar

3. Siegel RL, Kratzer TB, Giaquinto AN, Sung H, and Jemal A. Cancer statistics, 2025. Ca-a Cancer J Clin. (2025) 75:10–45. doi: 10.3322/caac.21871

PubMed Abstract | Crossref Full Text | Google Scholar

4. Dragani TA, Muley T, Schneider MA, Kobinger S, Eichhorn M, Winter H, et al. Lung adenocarcinoma diagnosed at a younger age is associated with advanced stage, female sex, and ever-smoker status, in patients treated with lung resection. Cancers. (2023) 15:2395. doi: 10.3390/cancers15082395

PubMed Abstract | Crossref Full Text | Google Scholar

5. Singh SS, Dahal A, Shrestha L, and Jois SD. Genotype driven therapy for non-small cell lung cancer: resistance, pan inhibitors and immunotherapy. Curr Med Chem. (2020) 27:5274–316. doi: 10.2174/0929867326666190222183219

PubMed Abstract | Crossref Full Text | Google Scholar

6. Keibler MA, Wasylenko TM, Kelleher JK, Iliopoulos O, Vander Heiden MG, and Stephanopoulos G. Metabolic requirements for cancer cell proliferation. Cancer Metab. (2016) 4:16–6. doi: 10.1186/s40170-016-0156-6

PubMed Abstract | Crossref Full Text | Google Scholar

7. Wang Z, Li Z, Zhou K, Wang C, Jiang L, Zhang L, et al. Deciphering cell lineage specification of human lung adenocarcinoma with single-cell RNA sequencing. Nat Commun. (2021) 12:6500. doi: 10.1038/s41467-021-26770-2

PubMed Abstract | Crossref Full Text | Google Scholar

8. Li Q, Wang R, Yang Z, Li W, Yang J, Wang Z, et al. Molecular profiling of human non-small cell lung cancer by single-cell RNA-seq. Genome Med. (2022) 14:87. doi: 10.1186/s13073-022-01089-9

PubMed Abstract | Crossref Full Text | Google Scholar

9. He D, Wang D, Lu P, Yang N, Xue Z, Zhu X, et al. Single-cell RNA sequencing reveals heterogeneous tumor and immune cell populations in early-stage lung adenocarcinomas harboring EGFR mutations. Oncogene. (2021) 40:355–68. doi: 10.1038/s41388-020-01528-0

PubMed Abstract | Crossref Full Text | Google Scholar

10. Wolf FA, Angerer P, and Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. (2018) 19:15. doi: 10.1186/s13059-017-1382-0

PubMed Abstract | Crossref Full Text | Google Scholar

11. Wolock SL, Lopez R, and Klein AM. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. (2019) 8:281–+. doi: 10.1016/j.cels.2018.11.005

PubMed Abstract | Crossref Full Text | Google Scholar

12. Ritchie ME, Phipson B, Wu D, Hu YF, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. (2015) 43:e47. doi: 10.1093/nar/gkv007

PubMed Abstract | Crossref Full Text | Google Scholar

13. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. (2019) 16:1289–+. doi: 10.1038/s41592-019-0619-0

PubMed Abstract | Crossref Full Text | Google Scholar

14. Traag VA, Waltman L, and van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. (2019) 9:1–12. doi: 10.1038/s41598-019-41695-z

PubMed Abstract | Crossref Full Text | Google Scholar

15. Wu TZ, Hu EQ, Xu SB, Chen MJ, Guo PF, Dai ZH, et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation. (2021) 2. doi: 10.1016/j.xinn.2021.100141

PubMed Abstract | Crossref Full Text | Google Scholar

16. Kang M, Armenteros JJA, Gulati GS, Gleyzer R, Avagyan S, Brown EL, et al. Mapping single-cell developmental potential in health and disease with interpretable deep learning. BioRxiv: Preprint Server Biol. (2024). doi: 10.1101/2024.03.19.585637

PubMed Abstract | Crossref Full Text | Google Scholar

17. Aibar S, González-Blas CB, Moerman T, Van AHT, Imrichova H, Hulselmans G, et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods. (2017) 14:1083–+. doi: 10.1038/nmeth.4463

PubMed Abstract | Crossref Full Text | Google Scholar

18. Chen B, Khodadoust MS, Liu CL, Newman AM, and Alizadeh AA. Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol Biol (Clifton NJ). (2018) 1711:243–59. doi: 10.1007/978-1-4939-7493-1_12

PubMed Abstract | Crossref Full Text | Google Scholar

19. Geeleher P, Cox N, and Huang RS. pRRophetic: an R package for prediction of clinical chemotherapeutic response from tumor gene expression levels. PloS One. (2014) 9:e107468. doi: 10.1371/journal.pone.0107468

PubMed Abstract | Crossref Full Text | Google Scholar

20. Maroni G, Bassal MA, Krishnan I, Fhu CW, Savova V, Zilionis R, et al. Identification of a targetable KRAS-mutant epithelial population in non-small cell lung cancer. Commun Biol. (2021) 4:370. doi: 10.1038/s42003-021-01897-6

PubMed Abstract | Crossref Full Text | Google Scholar

21. Wu F, Fan J, He Y, Xiong A, Yu J, Li Y, et al. Single-cell profiling of tumor heterogeneity and the microenvironment in advanced non-small cell lung cancer. Nat Commun. (2021) 12:2540. doi: 10.1038/s41467-021-22801-0

PubMed Abstract | Crossref Full Text | Google Scholar

22. Han G, Sinjab A, Rahal Z, Lynch AM, Treekitkarnmongkol W, Liu Y, et al. : An atlas of epithelial cell states and plasticity in lung adenocarcinoma. Nature. (2024) 627:656–63. doi: 10.1038/s41586-024-07113-9

PubMed Abstract | Crossref Full Text | Google Scholar

23. Chen H, Chen X, Pan B, Zheng C, Hong L, and Han W. KRT8 serves as a novel biomarker for LUAD and promotes metastasis and EMT via NF-κB signaling. Front Oncol. (2022) 12. doi: 10.3389/fonc.2022.875146

PubMed Abstract | Crossref Full Text | Google Scholar

24. Yu T, Gao X, Zheng Z, Zhao X, Zhang S, Li C, et al. Intratumor heterogeneity as a prognostic factor in solid tumors: A systematic review and meta-analysis. Front Oncol. (2021) 11. doi: 10.3389/fonc.2021.744064

PubMed Abstract | Crossref Full Text | Google Scholar

25. Xu J, Zhang Y, Li M, Shao Z, Dong Y, Li Q, et al. A single-cell characterised signature integrating heterogeneity and microenvironment of lung adenocarcinoma for prognostic strati fi cation. Ebiomedicine. (2024) 102. doi: 10.1016/j.ebiom.2024.105092

PubMed Abstract | Crossref Full Text | Google Scholar

26. Chakravarthy A, Khan L, Bensler NP, Bose P, and De Carvalho DD. TGF-β-associated extracellular matrix genes link cancer-associated fibroblasts to immune evasion and immunotherapy failure. Nat Commun. (2018) 9:4692. doi: 10.1038/s41467-018-06654-8

PubMed Abstract | Crossref Full Text | Google Scholar

27. Bagaev A, Kotlov N, Nomie K, Svekolkin V, Gafurov A, Isaeva O, et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell. (2021) 39:845–+. doi: 10.1016/j.ccell.2021.04.014

PubMed Abstract | Crossref Full Text | Google Scholar

28. Patkar S, Chen A, Basnet A, Bixby A, Rajendran R, Chernet R, et al. Predicting the tumor microenvironment composition and immunotherapy response in non-small cell lung cancer from digital histopathology images. NPJ Precis Oncol. (2024) 8:280. doi: 10.1038/s41698-024-00765-w

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: FAM189A2, lung cancer, prognostic signature, risk model, scRNA-seq, tumor microenvironment

Citation: Guo P, Xiang M, Chen J and Zhang H (2026) Integration of single cell and bulk transcriptomic analyses identifies FAM189A2 as a key prognostic gene in lung cancer. Front. Immunol. 16:1701806. doi: 10.3389/fimmu.2025.1701806

Received: 09 September 2025; Accepted: 11 December 2025; Revised: 29 November 2025;
Published: 06 January 2026.

Edited by:

Alessandro Russo, Humanitas Catanese Clinical Institute, Italy

Reviewed by:

Parvez Khan, University of Nebraska Medical Center, United States
Xinyu Song, Harvard Medical School, United States

Copyright © 2026 Guo, Xiang, Chen and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Huachuan Zhang, emhhbmdodWFjaHVhbjc3Nzc3QDE2My5jb20=; Jing Chen, Y2hlbmppbmdtYW9tYW8xMTExMUAxNjMuY29t

†These authors share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.