Intratumor Heterogeneity in Early Lung Adenocarcinoma

Lung cancer is one of the deadliest diseases in the world and the leading cause of cancer-related deaths. Among its histological types, adenocarcinoma is the most common, and it is characterized by a high degree of heterogeneity at many levels, including clinical, behavioral, cellular and molecular. While most lung cancers are known for their aggressive behavior, up to 18.5% of lung cancers detected by CT screening are indolent, putting patients at risk of overdiagnosis and overtreatment. The cellular and molecular underpinnings of tumor behavior remain largely unknown. In recent years, the study of intratumor heterogeneity has become an attractive strategy for understanding tumor progression. This review summarizes some of the currently known determinants of lung adenocarcinoma behavior and discusses recent efforts to dissect its intratumor heterogeneity.


INTRODUCTION
Over the last decades, several efforts have been made to reduce mortality among lung cancer patients. While advances in diagnostics and therapeutics have occurred, long-term survival rates have barely improved compared with those of other cancers (1). Therefore, new approaches are needed. In the context of lung adenocarcinoma (ADC), this is of great importance because of the high rate of overdiagnosis and the limited accuracy in predicting indolent vs. aggressive tumor behavior (2). To better predict disease behavior, it is crucial to understand the cellular and molecular underpinnings of the tumor. Thus, the study of intratumor heterogeneity and its clonal composition has become an attractive strategy to understand tumor progression and behavior (3)(4)(5)(6)(7). In recent years, emerging single-cell analysis platforms have allowed deep profiling of the tumor microenvironment (TME) and are promising approaches for dissecting tumor heterogeneity (8).

AN OVERVIEW OF LUNG ADC
Adenocarcinoma is a subtype of non-small cell lung cancer (NSCLC) that develops within the glandular cells of the smaller airways along the outer edges of the lungs. It is the most common histological type, accounting for about 40% of all lung cancer cases. This type of lung cancer occurs mostly among current or former smokers; however, it is also the most prevalent type of lung cancer in non-smokers (1). Thus, exposure to environmental carcinogens combined with genetic susceptibility may also play an important role in the development of the disease (9,10).
The survival rate for lung cancer depends mostly on the stage at diagnosis. On average, the current 5-year survival rate is about 18%, but early detection leads to a better prognosis, with a 5-year survival rate of 54% for localized disease (1). However, only 15% of all cases are diagnosed at an early stage, while the vast majority (57%) are diagnosed at a late stage (11). Therefore, screening for lung cancer in high-risk individuals is important.
In recent years, numerous randomized trials have assessed the power of lung cancer screening, showing that it is possible to detect lung cancer at an early stage in more than 40% of cases (12,13). Furthermore, the 5- and 10-year survival rates among lung cancer patients enrolled in screening programs were close to 90%, which is very reassuring (14). The largest lung cancer screening trial to date, the National Lung Screening Trial (NLST), enrolled 53,452 individuals at high risk for lung cancer across 33 U.S. medical centers and reported a 20% relative reduction in mortality with low-dose computed tomography (LDCT) screening compared with chest radiography (CXR) screening (15). Despite these encouraging statistics, it is worth mentioning that 96% of the nodules detected through CT screening were benign. Moreover, confirmed lesions detected through CT screening ranged from very indolent to severely aggressive cancers. Screening, which by definition seeks to detect malignant nodules in asymptomatic individuals, therefore carries the inherent risk of overdiagnosis. Overdiagnosis can be defined as the detection of a cancer that would otherwise not have become clinically evident; it represents a serious drawback of lung cancer screening because it leads to unnecessary treatment, morbidity, additional expenses, and anxiety and distress for the patient. A subsequent study estimating overdiagnosis in the NLST reported an 18.5% probability that any lung cancer detected by LDCT was an overdiagnosis, and probabilities of 22.5% for non-small cell lung cancer and 78.9% for adenocarcinoma in situ (2). Careful assessment of the images is therefore crucial to ensure a more accurate prognosis. Additionally, ongoing research into new biomarkers offers a promising avenue to assist, or eventually guide, the screening and diagnosis of high-risk individuals.

THE MOLECULAR LANDSCAPE OF LUNG ADC
Genomic alterations occur and accumulate over time, and in some cases these alterations may lead to oncogenesis. Somatic genomic alterations that are involved in cancer development are known as "driver alterations", and those that are not are known as "passenger alterations" (16). Lung ADC has one of the highest mutational burdens of all cancers (17,18). These high rates of somatic alterations and genomic rearrangements include a large load of passenger events per tumor genome, which makes the identification of driver alterations even more challenging (19). Despite these difficulties, several genomic alterations have been described in recent years; some are now known as canonical driver alterations, while others have been reported only recently and may be novel driver events (19)(20)(21)(22).
Driver genomic alterations in lung ADC are generally associated with events that lead to the constitutive activation of signaling proteins, most commonly in oncogenes of the receptor tyrosine kinase (RTK)/RAS/RAF pathway (23). In the TCGA study, 62% of the tumors harbored such alterations (21). KRAS driver mutations were reported in 32% of TCGA samples, and mutations in the other RAS family members, HRAS and NRAS, in 0.9% (21). RAS proteins play an important role in the regulation of signaling pathways that control cell proliferation (24). Additionally, KRAS mutations are highly correlated with poor prognosis in early lung ADC (25). Cancer-associated mutations in EGFR were present in 11% of TCGA samples (21). EGFR and another member of the EGFR family, the oncogene HER2 (mutated in 1.7% of samples), are involved in the regulation of several cellular processes, including cell motility, angiogenesis, cell proliferation and apoptosis (26). In contrast to KRAS mutations, some EGFR mutations are associated with an improved prognosis (27). Another important oncogene is BRAF, which acts downstream of RAS proteins and has a crucial role in the RAS-MAPK pathway. Driver mutations in this gene were present in 7% of TCGA samples and are not known to be associated with prognosis (21,28). MAP2K1 encodes a protein that operates downstream of BRAF and was mutated in 0.9% of TCGA samples (21). MET exon 14 skipping, another driver event, results in the loss of a negative regulatory site and occurred in 4.3% of TCGA samples (21). Gene fusions were reported for ROS1, ALK, and RET, which were altered in 1.7, 1.3, and 0.9% of TCGA samples, respectively (21,23,29).
In addition to the drivers described above, the TCGA study proposed previously unrecognized driver genes that might be involved in RTK/RAS/RAF pathway activation in the 38% of samples that did not carry a driver oncogene mutation (21). In these oncogene-negative samples, they identified significant amplification events of HER2 and MET. Higher MET copy number in primary lung ADC at the time of diagnosis has been associated with poor prognosis (28). NF1, a tumor suppressor that negatively regulates RAS, was mutated in 8.3% of the samples (21,30). RIT1 is mutated in 2.2% of ADC cases and has been identified as a new driver oncogene, as its mutations have been shown to activate MAPK and PI(3)K signaling in NIH3T3 cells (21,31).
Besides the RTK/RAS/RAF pathway, other relevant somatic genomic alterations have been identified. TP53 was mutated in 46% of the samples (21). PIK3CA, a crucial positive regulator of the PI(3)K-mTOR pathway, was mutated in 7% of cases, and STK11, a tumor suppressor in the same pathway, was mutated in 17% of cases (21). Other mutated tumor suppressors were KEAP1 (17%), RB1 (4%), and CDKN2A (4%). In a large-scale project that characterized copy-number alterations in lung ADC, the most common amplification was found at chromosome 14q13.3, which corresponds to NKX2-1 (TTF1), a transcription factor involved in lung development (20). Inhibition of this gene reduced cell viability and colony formation in lung ADC cell lines (20). NKX2-1 was also reported to be amplified in 14% of TCGA samples (21). Other significant amplifications in the TCGA study involved the telomerase reverse transcriptase TERT (18%) and MDM2 (8%), a negative regulator of p53 (21). The most significant deletion (19%) was at the CDKN2A locus, which encodes p16 and p14arf, two important tumor suppressors and cell cycle regulators of the RB1 and TP53 pathways, respectively (21,32). Some of the alterations described above are depicted in Figure 1.
Our understanding of the molecular alterations of lung ADC has significantly improved patient survival in recent years through the development of targeted therapies. Patients with advanced or metastatic tumors bearing EGFR mutations, EML4-ALK rearrangements or ROS1 fusions have benefited from these therapies. Erlotinib, gefitinib and afatinib are some of the drugs currently used to treat patients with EGFR exon 19 deletions or exon 21 mutations (33)(34)(35). Alectinib, ceritinib, and crizotinib have shown effectiveness in patients with ALK alterations, and crizotinib is also used in patients with ROS1 translocations (36)(37)(38)(39). Advances in the genomic characterization of ADC have also benefited the development of immunotherapy.
In a healthy individual, the immune checkpoint receptor PD-1, expressed on T cells, protects against autoimmunity and inflammation. In cancer, PD-L1 expressed on tumor cells binds to PD-1, resulting in immunosuppression and immune evasion. Nivolumab, pembrolizumab, and atezolizumab are among the FDA-approved PD-1/PD-L1 inhibitors that have shown improved survival compared with standard therapies in patients with advanced non-small cell lung cancer (NSCLC) (40)(41)(42). Another immune checkpoint under investigation is CTLA-4. Two clinical trials (NCT02000947, NCT02352948) are currently investigating dual checkpoint inhibition with durvalumab and tremelimumab, PD-L1 and CTLA-4 inhibitors, respectively. However, early results suggest that this combination did not significantly improve overall survival, although treatment with durvalumab alone provided a significant overall survival improvement (43,44). These and other targeted therapies have been extensively reviewed previously (30,45,46).
More recently, the molecular characterization of early lung ADC lesions has also provided insights into tumor behavior. A recent study from our group characterized 21 adenocarcinomas in situ (AIS), 27 minimally invasive adenocarcinomas (MIA) and 54 invasive adenocarcinomas using deep targeted genome sequencing (47). This work uncovered molecular features associated with aggressive clinical behavior and disease progression in early ADC. Most genomic alterations found in invasive ADC were already present in AIS, and 21 significantly mutated genes, including known drivers such as KRAS, EGFR and TP53, were shared among the three groups, suggesting a step-wise role in malignant transition. An APOBEC mutational signature was associated with worse survival than a DNA mismatch repair signature, and KRAS codon 12 mutations were associated with aggressive tumor behavior. Finally, an ensemble-level progression model based on phylogenetic analysis inferred the role of many known alterations in lung ADC progression and introduced several new players, such as EPPK1, ATM, SMAD4, KMT2C, and KMT2D, which deserve further investigation. These findings provide new insights into the distinction between indolent and aggressive tumor behavior and may have future implications for the clinical management of early ADC.

INTRATUMOR HETEROGENEITY AND CLONAL ARCHITECTURE
Intratumor heterogeneity is a highly complex phenomenon and represents a major challenge in the assessment of cancer, as it acts as a confounding factor that can lead to inaccurate diagnosis, prognosis and treatment of the disease (3). As mentioned above, lung ADC is a very heterogeneous disease with one of the highest mutational burdens across cancer types (17,18). A comprehensive understanding of the natural history of these tumors is therefore urgently needed.
The study of tumor growth from an evolutionary perspective is not a new approach. In the early 1970s, Alfred Knudson proposed that for a cell to become cancerous, both alleles of a given tumor suppressor gene must be mutated, a proposal known as the "two-hit hypothesis" (48). In 1976, Nowell applied evolutionary models to study tumor progression and treatment failure, and proposed a clonal evolution model in which a tumor arises from a single mutated cell (a "clone") and progresses through the acquisition of subsequent alterations, with fitter and more aggressive clones replacing the original clone cells (49). This linear evolution model was supported mostly by early studies that focused on a single gene rather than the whole genome, and clonal diversity was therefore underestimated (50). Advances in sequencing technologies have since enabled genome-wide sequencing, which has revealed a more complex clonal structure than previously thought (18).
In recent years, other evolutionary models have been derived by applying phylogenetic inference to next-generation sequencing data. In neutral evolution, all driver alterations are thought to be present in the original neoplastic cell and subsequent alterations are neutral; this model is thus characterized by the absence of selection, and heterogeneity arises from stochastic processes as a byproduct of tumor progression (51). In punctuated evolution, tumor heterogeneity is postulated to be generated early in the development of the neoplasia as a punctuated burst, followed by neutral evolution (52,53). Branching evolution, also known as the trunk-branch model, is defined by the gradual accumulation of driver mutations in subclonal populations (54). In this model, the "trunk" of the tumor consists of progenitor clones bearing early somatic alterations that drive tumorigenesis; these early alterations are potentially ubiquitous throughout the tumor. Conversely, somatic events that occur later are heterogeneous, are present only in the subclones that make up the "branches" of the tumor, and act as drivers of tumor progression.
Multiregion sequencing has been the most successful strategy to date for investigating intratumor heterogeneity and clonal evolution in lung ADC (4-6). The studies conducted by De Bruin and colleagues, Zhang and colleagues, and most recently Jamal-Hanjani and colleagues provide evidence suggesting that intratumor heterogeneity and branched evolution might be a universal phenomenon across lung ADC (Figure 2). Most known driver alterations (21,23) were mapped to the trunks of the tumors, which suggests that these canonical alterations occur early in tumor evolution. Truncal driver mutations almost always occurred before genome doubling, suggesting a particular role in tumorigenesis. Moreover, truncal genome doubling events occurred before subclonal diversification but after the acquisition of driver mutations, which suggests that chromosomal instability may be a crucial step that induces copy number alterations followed by a burst of mutational heterogeneity (Figure 2). Furthermore, the association of chromosomal instability with drug resistance and patient relapse (55) supports the hypothesis that, by generating extensive subclonal divergence, chromosomal instability could compromise the effectiveness of therapeutic strategies that target truncal driver mutations, because clonal heterogeneity is already present and overlooked (4). In addition, data from these studies suggest that certain alterations in non-canonical cancer genes may also drive tumor development and subclonal diversification.
FIGURE 2 | Branching process of tumor evolution in lung ADC. A tumor is depicted as a tree structure, with the trunk representing ubiquitous (clonal) mutations present in all tumor regions (blue), shared branches representing heterogeneous (subclonal) mutations present in some tumor regions (purple), and private branches (also subclonal) representing unique mutations present in only one tumor region (green). The blue right triangle shows that subclonal diversification is triggered as chromosomal instability increases. The bottom bar indicates that the smoking signature is associated with early events, whereas the APOBEC signature is associated with late events.
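The trunk, shared-branch and private-branch categories of Figure 2 can be illustrated with a minimal Python sketch. It assumes that each mutation has already been called as present or absent in each sequenced region; the region names and mutation calls below are hypothetical. The cited studies relied on cancer cell fraction estimates and phylogenetic reconstruction rather than this simple presence/absence rule, so the sketch only illustrates the categories themselves.

```python
from typing import Dict, Set

def classify_mutation(present_in: Set[str], all_regions: Set[str]) -> str:
    """Place one mutation on the trunk, a shared branch, or a private branch,
    based only on the tumor regions in which it was detected."""
    if not present_in or not present_in <= all_regions:
        raise ValueError("regions must be a non-empty subset of the sampled regions")
    if present_in == all_regions:
        return "trunk"    # ubiquitous (clonal): detected in every region
    if len(present_in) == 1:
        return "private"  # subclonal: detected in a single region only
    return "shared"       # subclonal: detected in some, but not all, regions

def classify_all(calls: Dict[str, Set[str]], all_regions: Set[str]) -> Dict[str, str]:
    """Classify every mutation in a {mutation: regions-detected} mapping."""
    return {mut: classify_mutation(regions, all_regions) for mut, regions in calls.items()}

# Hypothetical four-region tumor and mutation calls, for illustration only
sampled = {"R1", "R2", "R3", "R4"}
calls = {
    "driver_A": {"R1", "R2", "R3", "R4"},
    "mut_B": {"R1", "R2"},
    "mut_C": {"R3"},
}
print(classify_all(calls, sampled))
# {'driver_A': 'trunk', 'mut_B': 'shared', 'mut_C': 'private'}
```

With a single sampled region every mutation would be labeled as truncal, which is one way of seeing why single-region data cannot distinguish clonal from subclonal events.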
Another important feature of the disease addressed by these groups was the influence of smoking status on the clonal history of the tumors. The smoking signature (signature 4) is characterized by a high proportion of C>A transversions (18). In these studies, tumors from former and current smokers showed a lower proportion of C>A transversions among subclonal mutations than among early (truncal) mutations, suggesting a relative decrease in smoking-associated mutagenesis during tumor development (4)(5)(6). Moreover, the decrease in C>A transversions was accompanied by an increase in C>T and C>G mutations, which are indicative of APOBEC cytidine deaminase activity (18). This suggests that APOBEC mutagenesis may play a role in the subclonal expansion of these tumors. In addition, two groups reported a prolonged tumor latency period (4,6). In the study conducted by De Bruin and colleagues, a tumor from a patient who had ceased smoking 20 years before surgery bore the smoking signature in more than 30% of its truncal mutations, suggesting that these events occurred in a smoking-related tumorigenic setting more than 20 years earlier (4). Likewise, Jamal-Hanjani and colleagues reported that 7 patients who had been former smokers for several years before surgery presented a smoking mutational signature, suggesting tumor latency of several years before clinical manifestation of the disease (6). Furthermore, Zhang and colleagues and Jamal-Hanjani and colleagues found an association between the proportion of subclonal genomic alterations and recurrence (5,6). In the cohort studied by the first group, the three patients who relapsed had a significantly higher proportion of subclonal mutations than the patients without relapse, suggesting that the degree of subclonal divergence may be associated with postsurgical relapse (5).
In contrast, the second group did not find a significant association between the proportion of subclonal mutations and disease recurrence in their cohort, but found that patients with a high proportion of subclonal copy-number alterations were at higher risk of relapse or death than patients with a low proportion (6). Additionally, this group found that many late driver mutations corresponded to alterations previously reported in other tumor types, most of which are involved in genome maintenance processes such as the DNA damage response, chromatin remodeling and histone methylation. They hypothesized that late mutations may provide advantages to emerging subclones and enable the late stages of the disease by removing tissue-specific constraints on the neoplastic genome (6).
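The shift described above, from smoking-associated C>A transversions among truncal mutations toward APOBEC-associated C>T and C>G mutations among subclonal mutations, can be made concrete with a short Python sketch. It assumes hypothetical single-base substitution calls that have already been labeled as truncal or subclonal. It is deliberately simplified: real signature analyses use the trinucleotide context of each substitution (APOBEC, for example, preferentially mutates cytosines in a TpC context) and decompose the resulting spectra into reference signatures, which this sketch does not attempt.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Watson-Crick complements, used to collapse the two DNA strands
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def pyrimidine_class(ref: str, alt: str) -> str:
    """Report a substitution relative to the pyrimidine (C or T) reference base,
    so that G>T on one strand is counted as C>A on the other."""
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def class_fractions(subs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of each substitution class among a set of (ref, alt) mutations."""
    counts = Counter(pyrimidine_class(ref, alt) for ref, alt in subs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical (ref, alt) calls for illustration; not data from the cited studies
truncal = [("C", "A"), ("G", "T"), ("C", "A"), ("C", "T")]
subclonal = [("C", "T"), ("G", "C"), ("C", "A"), ("G", "A")]

t, s = class_fractions(truncal), class_fractions(subclonal)
# Smoking-like C>A fraction falls from truncal to subclonal mutations...
print("C>A:", t.get("C>A", 0.0), "->", s.get("C>A", 0.0))
# ...while the APOBEC-like C>T + C>G fraction rises (sequence context ignored here)
print("C>T+C>G:", t.get("C>T", 0.0) + t.get("C>G", 0.0), "->",
      s.get("C>T", 0.0) + s.get("C>G", 0.0))
```

Collapsing each substitution onto the pyrimidine strand is what allows a G>T call and a C>A call to be counted together as the same C>A class.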
These studies raise the question of whether a single-region biopsy is informative enough to help clinicians make accurate treatment decisions. Intratumor heterogeneity has proven to be intrinsic to lung ADC, and it may compromise the ability of a single biopsy to describe the complexity of the disease comprehensively and accurately enough for optimal cancer control. In a handful of cases, a large proportion of subclonal events were found in a single region but were absent from other regions of the same tumor, evidencing the limitations of single-region biopsies in capturing the clonal architecture of the tumor and highlighting the power of multiregion sequencing to better capture tumor clonality, which could help to prioritize drug targets (4)(5)(6). Nonetheless, in the study conducted by Zhang et al., although multiregion sequencing proved a better strategy for understanding intratumor heterogeneity, increasing the sequencing depth (from 277x to 863x) allowed the identification of most of the driver mutations in the tumors studied, and many subclonal mutations were detectable in all regions of individual tumors. This suggests that single-biopsy analysis might be sufficient if the sequencing depth is increased (5).
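Why greater depth can reveal low-frequency subclonal mutations in a single sample can be illustrated with a simple read-sampling model. The Python sketch below assumes that the number of variant-supporting reads follows a binomial distribution, with the variant allele fraction (VAF) as the per-read probability, and that a variant is called when at least a fixed number of supporting reads is observed. The 1% VAF and the five-read threshold are hypothetical choices; real variant callers also model sequencing error, tumor purity and copy number, and this is not the pipeline used in (5).

```python
from math import comb

def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) when each of `depth` reads
    independently carries the variant with probability `vaf` (binomial model)."""
    # Probability of observing fewer supporting reads than the calling threshold
    p_miss = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss

# Hypothetical subclonal variant at 1% VAF, called with >= 5 supporting reads
for depth in (277, 863):
    print(depth, round(detection_probability(depth, 0.01, 5), 3))
```

Under these assumptions, the probability of calling the same low-frequency variant rises steeply between the two depths, which is consistent with the observation that deeper sequencing of a single region recovers more subclonal events.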

THE TUMOR MICROENVIRONMENT OF LUNG ADC
The immune microenvironment is known to play a pivotal role in lung ADC development, and it may therefore also shape intratumor heterogeneity. Neoantigen presentation is an important step for the cytolytic T cell response and is mediated by human leukocyte antigen (HLA) class I molecules, which present intracellular peptides on the cell surface for recognition by T cell receptors (56). A person's genome contains up to six different HLA class I alleles, encoded by the genes HLA-A, HLA-B and HLA-C. Each HLA allotype presents peptide antigens based on specific anchor residues within the peptide sequence that are required for binding. Therefore, loss of heterozygosity (LOH) at an HLA locus results in the loss of an HLA allotype, and thus in the loss of the ability to present peptides whose anchor residues bind only to the lost HLA molecule; as a result, fewer neoantigens can be presented to T cells. Impaired tumor neoantigen presentation as a consequence of LOH in HLA class I was recently proposed as a mechanism of immune evasion in NSCLC (57). In this study, both lung adenocarcinoma and squamous cell carcinoma tumors with HLA LOH presented a higher mutational burden than tumors without HLA LOH, with a significant increase in subclonal mutations. Furthermore, tumors harboring HLA LOH were enriched in neoantigens predicted to bind the missing HLA alleles and showed high PD-L1 staining on immune cells. This mechanism may facilitate the subclonal expansion of cells harboring previously antigenic mutations that have become undetectable to the immune system. A subsequent study from the same group found that the immune microenvironment tends to be highly heterogeneous both between and within patients, with distinct regions showing different levels of immune evasion within individual tumors (58).
Additionally, tumors showing high immune infiltration and preserved HLA alleles presented neoantigen depletion, suggesting that immune evasion occurs through either HLA LOH or neoantigen suppression. One possible mechanism for the latter is promoter hypermethylation, which accounted for 23% of the neoantigens included in this study, suggesting that other mechanisms must also be at play. Further elucidation of the mechanisms involved in neoantigen-associated immune escape could have important clinical implications for therapy selection and response prediction.
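The reasoning by which HLA LOH reduces the neoantigens available to T cells can be expressed as a simple set operation. In the Python sketch below, the HLA genotype, the alleles lost through LOH and the predicted peptide-HLA binding pairs are all hypothetical; in practice, binding is predicted computationally and LOH is inferred from sequencing data. A neoantigen is treated as presentable if it is predicted to bind at least one retained allele.

```python
from typing import Dict, Set

def presentable(binders: Dict[str, Set[str]], alleles: Set[str]) -> Set[str]:
    """Neoantigens predicted to bind at least one of the given HLA class I alleles."""
    return {pep for pep, hla in binders.items() if hla & alleles}

# Hypothetical HLA class I genotype: two alleles each of HLA-A, HLA-B and HLA-C
genotype = {"A*01:01", "A*02:01", "B*07:02", "B*08:01", "C*07:01", "C*07:02"}
# Hypothetical LOH event removing one haplotype (one allele of each gene)
lost = {"A*02:01", "B*07:02", "C*07:02"}
retained = genotype - lost

# Hypothetical predicted binders: peptide -> HLA alleles it can bind
binders = {
    "pep1": {"A*02:01"},
    "pep2": {"A*02:01", "B*08:01"},
    "pep3": {"B*07:02"},
    "pep4": {"C*07:01"},
}

# Neoantigens that were presentable before LOH but not after it
hidden = presentable(binders, genotype) - presentable(binders, retained)
print(sorted(hidden))  # ['pep1', 'pep3']
```

In this toy example, pep2 remains presentable because it can also bind a retained allele; only peptides restricted to the lost allotypes escape presentation, mirroring the anchor-residue argument above.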
In recent years, more studies of the TME have begun to use single-cell technologies, which can resolve tumor heterogeneity at high resolution by profiling cells individually instead of measuring a bulk signal, and which yield large amounts of information (Figure 3). Using single-cell mass cytometry with paired tumor tissue, normal lung tissue and peripheral blood, Lavin and colleagues aimed to provide an innate immune cell atlas of early lung ADC (59). In this study, early lesions were shown to bear a unique, TNM stage-independent immune signature, with a subset of tumor-infiltrating myeloid cells distinct from that of normal lung (enrichment of PPARγ-high macrophages and depletion of CD141+ dendritic cells [DC]), which could compromise T cell immunity and may offer a new avenue of intervention for T cell immunotherapies. PPARγ is a transcription factor known to drive an immunosuppressive program (60). Lymphotoxin beta, an inducer of inflammatory responses, has previously been shown to act on high endothelial venules (HEV) to promote lymphocyte homing to peripheral lymph nodes in vivo (61). The authors found that the CD141+ DC subset expressed lymphotoxin beta transcripts in lung tumor tissue, which suggests that CD141+ DC contribute to tertiary lymphoid structure formation, likely through HEV-mediated recruitment of lymphocytes. Therefore, inducing the expansion of intratumoral CD141+ DC may be a potential strategy to enhance anti-tumor immunity. This study highlights the importance of paired analysis for distinguishing tumor-associated immune alterations from normal tissue imprinting. Another study, also focused on tumor-infiltrating myeloid cells (TIM), used single-cell RNA-seq to profile and compare TIM populations between mice and humans in the context of NSCLC (62).
Although the goal of this study was to establish similarities between mouse and human TIM expression programs, its comprehensive annotation of myeloid populations is an important contribution to future studies on the clinical implications of the heterogeneity of these cell types. The authors reported that mouse and human TIM subsets show one-to-one equivalence and that blood myeloid cells poorly reflect TIM states. Because TIM states overlapped between patients, they assessed the association between patient survival and the expression of genes specific to each subpopulation. They identified three conserved subsets of neutrophils: N1, which expresses canonical neutrophil markers; N2, which is tumor specific and promotes tumor growth; and N2, which has an expression signature of type I interferon response. They found that marker genes of human neutrophil subsets N2 and N5 were associated with poor survival. Conversely, the marker genes of human DC subset 2, which preferentially interacts with CD4+ T cells, showed a positive association with survival. Guo and colleagues also investigated the immune landscape of NSCLC with single-cell RNA-seq, focusing on the T cell subpopulations of 14 patients (63). They identified two new pre-exhausted CD8+ T cell subsets, which, together with the presence of highly migratory effector T cells, may explain positive responses to immunotherapy. When they interrogated lung ADC TCGA data with their expression signatures, they found that patients clustered mainly into two groups: one enriched in pre-exhausted CD8+ T cells, non-activated Tregs and activated CD4+ T cells, and the other enriched in exhausted T cells and activated Tregs. Patients in the first group had a significantly better prognosis than patients in the second, suggesting that T cell composition could be a potential clinical biomarker for lung ADC patients.
In a different study, Lambrechts and colleagues used single-cell RNA sequencing to build a comprehensive 52,698-cell catalog of the TME transcriptome of lung cancer samples, most of which came from lung ADC patients (64). They identified 52 stromal subtypes, including distinct populations of cancer-associated fibroblasts, endothelial cells and infiltrating immune cells, some of which were further validated by immunofluorescence. Further analysis of TCGA data indicated that the abundances of some subpopulations, and their correlation with patient survival, differed between ADC and squamous cell carcinoma (SCC) and were influenced by clinical characteristics such as stage. Low expression of marker genes of CD8+ T cell cluster 8 was positively associated with survival in ADC patients and negatively associated with survival in SCC patients. This cluster represented CD8+ cytotoxic T cells, based on their high granzyme and IFN expression, and was characterized by high expression of the T cell exhaustion marker LAG3. These and other gene expression changes in the tumor stroma reveal potential new directions for intervention.
In conclusion, the TME represents an important component of tumor heterogeneity in lung ADC and is strongly associated with disease progression and predicted outcome. Although the different flavors of bulk tumor profiling still provide a significant amount of information, it is important to acknowledge that single-cell approaches offer a new level of granularity that is allowing us to dissect lung ADC heterogeneity in depth and better understand its implications in the early stages of the disease. Nevertheless, such techniques are costly, which currently limits the number of samples per study. A combination of bulk and single-cell approaches, as reported in some of the studies mentioned above, may be a suitable way to get the most out of the data while state-of-the-art techniques become more affordable.

CONCLUSIONS
Lung ADC is a devastating disease, and despite ongoing research efforts, overall survival rates have barely improved in recent years. While screening programs have been shown to significantly increase the chance of survival in high-risk individuals, they also carry a high probability of overdiagnosis. Therefore, the molecular determinants of early tumor development and behavior need to be further investigated. In recent years, it has become increasingly evident that profiling the intratumor heterogeneity of lung ADC is a powerful strategy for understanding tumor progression. In this context, the rapidly evolving field of single-cell technologies offers a novel set of tools that is unraveling the complexity of lung ADC and other cancers at unprecedented resolution.

AUTHOR CONTRIBUTIONS
M-FS wrote the manuscript with support from PM. PM conceived the original structure of the manuscript and supervised the project.

FUNDING
M-FS and PM were supported by CA196405 and CA152662.