Clinical and Biological Interpretation of Survival Curves of Cancer Patients, Exemplified With Stage IV Non-Small Cell Lung Cancers With Long Follow-up

Worldwide, 18.1 million new invasive cancers and 9.9 million cancer deaths occurred in 2020. Lung cancer is the second most frequent (11.4%) and, with 1.8 million deaths, remains the leading cause of cancer mortality. About 1.7 million of the lung cancers are of the non-small cell lung cancer (NSCLC) subtype, and of these, 60%–70% are in advanced stage IV at the time of diagnosis. Thus, the annual worldwide number of new NSCLC stage IV patients is about 1 million, and they have a very poor prognosis. Indeed, 25%–30% die within 3 months of diagnosis. However, the survival duration of the remaining 700,000 new patients per year who survive >3 months varies enormously. Surprisingly little research has been done to explain these survival differences, but it was recently found that classical patient, tumour and treatment features cannot accurately distinguish short-term from very long-term survivors. What then are the causes of these bewildering survival variations amongst “the same cancers”? Clonality, proliferation differences, neovascularization, intra-tumour heterogeneity, genetic inhomogeneity and other cancer hallmarks play important roles. Considering each of these, singly or combined, can greatly improve our understanding. Another technique is analysis of the survival curve of a seemingly homogeneous group of cancer patients. This can give valuable information about the existence of subgroups and their biological characteristics. Different basic survival curves, and what their shapes tell about the biological properties of these invasive cancers, are discussed. Application of this analysis technique to the survival curve of 690 stage IV NSCLC patients with a 3.2–120.0-month survival suggests that this seemingly homogeneous group of patients probably consists of 4–8 subgroups with very different survival. A subsequent detailed mathematical analysis shows that a model of 8 subgroups gives a very good match with the original survival curve of the whole group.
In conclusion, the survival curve of a seemingly homogeneous group of cancer patients can give valuable information about the existence of subgroups and their biological characteristics. Application of this technique to 690 NSCLC Stage IV patients makes it probable that 8 different subgroups with very different survival rates exist in this group of cancers.

INTRODUCTION
Worldwide, an estimated 18.1 million new invasive cancer cases and almost 9.9 million cancer deaths occurred in 2020. The average mortality rate of invasive cancers is therefore about 50%. Lung cancer, the second most frequently occurring cancer at 11.4%, with an estimated 1.8 million deaths, remains by far the leading cause of cancer mortality (1). Similar rates for lung cancer are found in the People's Republic of China (2). About 1.7 million of the lung cancers are of the non-small cell lung cancer (NSCLC) subtype, and of these, 60%-70% are in advanced stage IV at the time of diagnosis. Thus, the annual worldwide number of new NSCLC stage IV patients is about 1 million, and they are generally regarded as having a very poor prognosis. Indeed, 25%-30% die within 3 months of diagnosis. On the other hand, the survival duration of the remaining approximately 700,000 new patients per year who survive >3 months can vary enormously. In a recent large observational study, median survival was 23.3 months; the 1-, 2- and 5-year survival rates were 74%, 49% and 16%, respectively, and 4%-5% survived 10 years or longer (3). The same surprisingly large survival variation can be found in patients with cancers from other organ sites, even if they have the same histological type, stage and other important prognostic characteristics.
What are the causes of these bewildering survival variations amongst "the same cancers"?
An important aspect is clonality. Cancer is an evolutionary process, driven by stepwise, somatic cell mutations with sequential, sub-clonal selection (4). Normal, polyclonal cells have approximately the same proliferation rate. However, sometimes genetic hits occur and change the polyclonal parent cell into neoplastic daughter cells with a new genetic make-up plus growth (proliferation) advantage. As a result, a very small nodule arises, consisting of cells which are genetically somewhat more unstable. Consequently, the risk of the development of another new cell clone with even more genetic instability and higher proliferation, and eventually invasive capacity, increases. These new tumour cells grow in densely packed populations that develop into spheroid or ellipsoid aggregates.
Another important aspect is neovascularization. This is an event that separates the development of any solid tumour into two stages: the avascular stage and the vascular stage. Because of this, angiogenesis plays a critical role in the biology of solid neoplasms. The two stages can be dissociated under experimental conditions. When this is accomplished and capillaries are prevented from penetrating the 1-mm tumour, the tumour becomes dormant (5,6). This led to the concept of dormant cancers (7). Dormant solid tumours were produced in vivo by prevention of neovascularization. The beginning of an exponential volume increase was shown to coincide with vascularization of the implant. Although dormant in terms of expansion, these avascular tumours contained a population of viable and mitotically active tumour cells.
The transition from polyclonal to neoplastic cells probably occurs quite often. How long it takes to change from a 1-mm diameter dormant tumour (consisting of approximately 1 million cancer cells) to a clinically detectable proliferating invasive cancer of approximately 10 (7.5-15.0) mm (10^9 tumour cells) is less certain. The progression from there to lethal metastases of 1,000 g (rough estimate), or 10^12 cells, depends amongst other factors on intra-tumour heterogeneity (ITH) (8). Genomic diversity within single tumours has been recognized as "genetic inhomogeneity" (9). Since next-generation sequencing studies have become available, the full extent of genomic ITH is becoming apparent. The degree of ITH can be highly variable, with between 0 and over 8,000 coding mutations found to be heterogeneous within primary tumours or between primary and metastatic or recurrence sites (10). These findings make it more than likely that especially seemingly homogeneous late-stage cancers are, in fact, genetically widely heterogeneous, and also heterogeneous in their clinical behaviour. The latter can be observed in the survival curve of these cancers.
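The cell numbers quoted above (about 1 million, 10^9 and 10^12 cells) can be translated into numbers of population doublings. The following minimal sketch does this arithmetic; it assumes, purely for illustration, that every doubling is a net doubling (no cell loss), and the constant and function names are hypothetical.

```python
import math

# Orders of magnitude quoted in the text (not measurements).
DORMANT_CELLS = 1e6      # ~1 mm avascular, dormant tumour
DETECTABLE_CELLS = 1e9   # ~10 mm clinically detectable cancer
LETHAL_CELLS = 1e12      # ~1,000 g of lethal metastases (rough estimate)


def doublings(start_cells: float, end_cells: float) -> float:
    """Number of net population doublings needed to grow from start to end."""
    return math.log2(end_cells / start_cells)


to_detection = doublings(DORMANT_CELLS, DETECTABLE_CELLS)
to_lethal = doublings(DETECTABLE_CELLS, LETHAL_CELLS)

print(f"dormant -> detectable: {to_detection:.1f} doublings")
print(f"detectable -> lethal:  {to_lethal:.1f} doublings")
```

Each thousand-fold increase corresponds to about 10 doublings, so under this simplification the clinically visible phase spans roughly as many doublings as the preceding occult phase.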
It is of obvious clinical and therapeutic significance to understand why patients with seemingly homogeneous cancers have such different survival rates. Of course, age, gender, performance status, histologic subtype, no/minimal versus heavy smoking and different treatment modalities are often strongly prognostic. In pulmonary adenocarcinoma, the mean number of clonal and sub-clonal non-silent mutations in non-smokers is much smaller than in smokers (8). However, even when all these well-established prognostic factors are considered, including in a multivariate manner, it may not be possible to explain why certain patients die within a rather short time while others survive for (very) many years, as we recently found. It is important to emphasize that the number of these patients worldwide is very large indeed.
We recently worked on an article on the survival prediction accuracy of prognostic factors in a seemingly homogeneous group of 690 stage IV NSCLC patients surviving between 3.2 and 120.0 months. In the original manuscript, we hypothesized that this group in fact consisted of several subgroups with widely varying survival rates. This hypothesis was based on the interpretation of the survival curve of the patients (3).
Survival curves can give valuable information about the clinical behaviour and the biological characteristics of a group of cancer patients. Such biological interpretation of survival curves was common knowledge in the last 2-3 decades of the 20th century. In fact, the first author of the current manuscript taught this knowledge as a standard part of the curriculum for medical students in Amsterdam. However, the reviewers' comments on our recent manuscript (3), in which we proposed identifying different prognostic subgroups by analysis of the survival curve, made it clear that this knowledge is not as well known as we thought. Rather than our adding a long new section to that article to explain how we had arrived at the hypothesis of 4-8 subgroups with different survival rates, the Acting Editor of the revised manuscript advised that the topic of Clinicobiologic Interpretation of Survival Curves was interesting enough for a separate manuscript. This article will first describe different types of survival curves and how to analyse and classify them using essential hallmarks of cancer. Secondly, we will perform quantitative model studies to show that in seemingly homogeneous stage IV NSCLC patients with a 3.2-120.0-month follow-up, about 4-8 subgroups with very different survival rates occur.

DIFFERENT TYPES OF SURVIVAL CURVES AND THEIR CLINICO-BIOLOGIC INTERPRETATION
We will only consider tumours diagnosed as invasive carcinomas. Please remember that although the examples are hypothetical, many "real-world" examples can be found for each of them.
Basically, 2 fundamental types of survival curves exist. The first one (Group A) is shown in Figure 1. The cancer can be detected by the patients themselves when the tumour reaches a certain size, or by radiologic and other screening methods. When the tumours are removed, histopathologic examination shows the invasive nature of the tumour. In the following years, none of the patients develops distant metastases; all survive without evident distant metastases.
The second, extreme type of survival curve is that of the patients with cancers shown in Figure 2 (Group B). One can think of small cell lung cancer. All have died from their metastases by the end of the observation period (which in the current hypothetical example was set at 23 months but can also be set at 6 or 12 months). They seem homogeneous at the start of follow-up, yet they differ considerably in survival: 60% have died by the 6-month follow-up, 20% between 6 and 12 months and the last 20% between 12 and 23 months.
Figure 3 shows the third type of survival curve, which occurs quite often. Of this group, 30% die within 6 months, 10% between 6 and 12 months and 12% between 12 and 23 months. The remaining 45% of the patients survive until the end of the observation period. Such a curve is found when patients from the 2 different groups A and B are taken together.
It can be concluded from the shape of the survival curve shown in Figure 3 that a group whose curve has an initial steep decline followed by a horizontal plateau consists of one subgroup with a long survival without distant metastases and 3 other subgroups with a very poor, poor and less poor survival.
The most basic biologic interpretation of Groups A and B is that they both show invasion, as they are pathologically diagnosed as invasive cancers.
They also all have clonal expansion. Clonal expansion is not limited to invasive cancers but also occurs in non-invasive neoplasias, such as endometrial intraepithelial neoplasia (11).
Cancers from Group B patients not only have invasive and clonal expansion properties, just like those from Group A, but also all have distant metastases at the time of diagnosis. That the net growth of these metastases differs among the Group B cancers is clear from the shape of the survival curve, as most patients die from their metastases once these have reached a certain lethal level (on average 1-2 kg, although much greater weights can be found in individual patients). Some patients will die from much smaller tumours if these are located at vitally essential sites, but these are exceptions. The 60% of patients who die in the first 6 months have on average reached their lethal metastatic mass within 6 months. Of course, the original volume at diagnosis may have varied, but the most important feature of these 60% of the tumours, compared with the 20% dying between 6 and 12 months, is their higher net growth (the balance between the proliferation rates and death rates of the tumour cells). Likewise, the tumours of patients dying between 12 and 23 months have a still lower net growth rate. One can thus conclude that Group B tumours are invasive, clonally expanding and metastatic. One can further conclude from the survival curve of Group B that it is not completely homogeneous but consists of at least 3 subgroups with different proliferation rates (net growth speeds): very fast, fast and less fast.
A fourth type of survival curve, that of a hypothetical Group D, is shown in Figure 4. At no point is a horizontal plateau found in the survival curve. Instead, at the end of the observation period 50% have died from distant metastases. On the other hand, the slope of the survival curve is much less steep than in the first, second and third subgroups of Group B. The conclusion is that patients of Group D all have distant metastases at the time of diagnosis, but with much lower net growth speeds than those of the subgroups of Group B. Alternatively, one could argue that the metastatic load of patients from Group B was much larger at the time of diagnosis. These 2 explanations cannot be distinguished on the basis of the survival curves alone.
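Why these two explanations cannot be separated from a survival curve can be illustrated with a small calculation. The sketch below assumes simple exponential net growth up to the rough lethal burden of 10^12 cells mentioned earlier; the two hypothetical patients, their starting loads and their doubling times are invented for illustration only.

```python
import math

LETHAL_CELLS = 1e12  # rough lethal tumour burden quoted earlier in the text


def months_to_lethal(initial_cells: float, doubling_time_months: float) -> float:
    """Months until an exponentially growing burden reaches LETHAL_CELLS."""
    n_doublings = math.log2(LETHAL_CELLS / initial_cells)
    return n_doublings * doubling_time_months


# Hypothetical patient 1: large metastatic load at diagnosis, slow net growth.
large_load_slow = months_to_lethal(initial_cells=1e11, doubling_time_months=6.0)

# Hypothetical patient 2: small metastatic load at diagnosis, faster net growth.
small_load_fast = months_to_lethal(initial_cells=1e9, doubling_time_months=2.0)

print(f"large load, slow growth: {large_load_slow:.1f} months")
print(f"small load, fast growth: {small_load_fast:.1f} months")
```

Both hypothetical patients reach the lethal burden at practically the same time, so the time of death alone does not reveal whether the initial load or the net growth rate was responsible.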
Of course, such a linear curve can be found with different follow-up times, for example 10, 20 and 30 years. Examples are Hodgkin-type lymphomas and certain breast cancers.
substage, the number of PBT cycles and TKI-TT had independent predictive value. However, with the multivariate combination of these features, identification of short-term non-survivors and long-term survivors was poor. The shape of the survival graph of the 690 patients (Figure 5, left part) is curved. As described above, this suggests that the seemingly homogeneous group of 690 patients is in fact heterogeneous, i.e., comprises different subgroups with widely different survival.
A closer inspection of Figure 5, left part, shows the following:

1. The survival line is almost straight and decreases steeply from 100% survival probability at the 3-month follow-up to 70% at 12 months.
2. From that point, the survival curve still goes down, but less steeply. This second nearly straight line runs between 70% at 12 months and 45% survival probability at the 30-month follow-up.
3. Then, after another bend, the curve is nearly straight between 45% and 28% survival probability (the latter is at about 45 months of follow-up). The slope of this third line is again less steep.
4. Between 28% and 18% survival probability (follow-up at 45 and around 55-60 months), another nearly straight line can be discerned.
5. Then, a somewhat less straight line from 18% to 10% can be observed from approximately 60 to 96 months of follow-up, respectively.
6. Beyond the 96-month follow-up, the line is somewhat irregular, but roughly horizontal from 10% to 5% (at the 120-month follow-up).
The right part of Figure 6 shows this approximation graphically. For the sake of clarity, we have drawn only 4 tangent lines instead of 6.
The quantitative and graphical analyses described above suggest that the total group of 690 patients probably consists of different subgroups with different biological behaviour and survival rates.
It is important to note that these 690 individuals had the same histological type (NSCLC) and stage (IV). Thus, all had metastases at the time of diagnosis and also at entry into this study, at least 3 months after the diagnosis. Yet some died very quickly (within 12 months), and others survived for a very long time (5-10 years). The "homogeneous" group of 690 patients was, in retrospect, heterogeneous, i.e., it consisted of subgroups with different survival rates.
How many different subgroups exist in the 690 patients?
The Actual Observed survival curve gives important clues. Remember that 261 stage IV NSCLC patients from the same observation period had already died before 3 months of follow-up and are not considered in the current study. This explains why the survival curve of the 690 patients in Figure 7 starts at 100% survival at the 3-month follow-up. This point is called P.
Closer observation shows that there are typical points in the graph at which the slope of the survival curve shows a subtle change and becomes less steep. These points are shown in Figure 7 and are located at:
1. Q: 70% survival/14-month follow-up,
2. R: 40%/30 months,
3. S: 18%/54-60 months,
4. T: 10%-5%/90-108 months.
Note that the number of patients becomes quite low after 90 months of follow-up, which could have caused the less smooth shape of the curve between 90 and 120 months of follow-up.
Linear (straight) lines can be drawn between these points (i.e., P-Q, Q-R, R-S, S-T). These are shown in Figure 8. These lines are slightly shifted in the figure, to make them more visible.
Of course, the subgroups which these lines represent did not start to exist at their respective starting points Q, R, S but were all present in the total group at the start of study (i.e., at point P). Consequently, the lines from Figure 7 can be extrapolated from point P, as lines with the same slope, to the points where they cross the x-axis. These lines are shown in Figure 8 and represent the hypothetical subgroups.
From Figure 9, we can determine the Median Survival Time and Overall Survival Time, and the percentage and total number Alive With Disease for the 4 hypothetical subgroups. Table 1 and Figure 10 show these data.
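The graphical steps described above (straight lines between the bend points, then lines with the same slopes redrawn from point P down to the x-axis) can be sketched numerically. This is only an illustration of the geometry, not the computation behind Table 1: S and T are given as ranges in the text, so the midpoints used here are an assumption, and the function names are hypothetical.

```python
# Bend points as (months of follow-up, % surviving), taken from the text.
# S and T are reported as ranges; the midpoints below are an assumption.
POINTS = {
    "P": (3.0, 100.0),
    "Q": (14.0, 70.0),
    "R": (30.0, 40.0),
    "S": (57.0, 18.0),  # midpoint of 54-60 months
    "T": (99.0, 7.5),   # midpoints of 90-108 months and 10%-5%
}


def segment_slope(a, b):
    """Slope in percentage points per month between two (month, %) points."""
    return (b[1] - a[1]) / (b[0] - a[0])


def extrapolate_from_p(slope, p=POINTS["P"]):
    """Month at which a line through P with the given slope reaches 0%."""
    return p[0] - p[1] / slope


def linear_median(end_month, p=POINTS["P"]):
    """Month at which a straight line from P to 0% at end_month has lost half of its patients."""
    return p[0] + (end_month - p[0]) / 2.0


names = list(POINTS)
results = {}
for first, second in zip(names, names[1:]):
    slope = segment_slope(POINTS[first], POINTS[second])
    end = extrapolate_from_p(slope)
    results[f"{first}-{second}"] = (slope, end, linear_median(end))

for label, (slope, end, median) in results.items():
    print(f"{label}: slope {slope:.2f} %/month, "
          f"reaches 0% at {end:.1f} months, median at {median:.1f} months")
```

Because each hypothetical subgroup is represented by a straight line starting at P, its median survival time lies halfway between 3 months and the point where the line crosses the x-axis.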
As the match of 4 hypothetical subgroups was not perfect, we then repeated the modelling study for 8 subgroups. Table 2 shows the total number of 690 patients and their characteristics.
The 8 hypothetical subgroups were also determined by the linear tangent method used in Figures 7 and 8. Figure 11 shows the survival curves of these 8 hypothetical subgroups. Figure 12 shows that the match of the theoretical line with the original survival curve of the 690 patients is close to perfect.
In summary, the above shows, firstly, that a combination of patient subgroups with linear (non-curved) survival curves and different survival rates can result in a curved survival line that is very close to the Actual Observed survival curve of the 690 patients. Secondly, it is highly probable that at least 4, and more likely 8, different subgroups with very different survival rates exist among the 690 NSCLC Stage IV patients.
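The first point of this summary can be illustrated with a minimal sketch in which each subgroup has a straight survival line starting at 100% at 3 months, and the curve of the whole group is the weighted sum of these lines. The subgroup fractions and end points below are hypothetical, chosen only for illustration; they are not the values of Table 1 or Table 2.

```python
START_MONTH = 3.0  # every subgroup is assumed to start at 100% at 3 months

# Hypothetical subgroups: (fraction of all patients, month at which the
# subgroup's straight survival line reaches 0%). Illustrative values only.
SUBGROUPS = [
    (0.40, 24.0),
    (0.30, 48.0),
    (0.20, 84.0),
    (0.10, 240.0),
]


def subgroup_survival(month, end_month):
    """Straight-line survival fraction of one subgroup, clipped to [0, 1]."""
    fraction = 1.0 - (month - START_MONTH) / (end_month - START_MONTH)
    return min(1.0, max(0.0, fraction))


def combined_survival(month):
    """Survival fraction of the whole group: weighted sum of the subgroup lines."""
    return sum(weight * subgroup_survival(month, end) for weight, end in SUBGROUPS)


months = [3, 12, 24, 36, 48, 60, 84, 120]
curve = [combined_survival(m) for m in months]

for m, s in zip(months, curve):
    print(f"{m:>3} months: {100 * s:5.1f}% alive")
```

Each time a subgroup has died out, the combined curve bends and continues with a less steep slope, which is the kind of bend pattern described for the observed curve.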

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
JB: concept of study and article, analysis, interpretation of results; drafting of the article and revising it critically for important intellectual content; final approval. HL: revising of the manuscript critically for important intellectual content; final approval. HG: conception and design of the study; analysis and analysis support; interpretation of data; drafting of the article or revising it critically for important intellectual content; final approval. All authors contributed to the article and approved the submitted version.

FUNDING
This study was funded by a personal grant no. 2021-177 to JB from Medical Practice Dr. Jan Baak Inc., Tananger, Norway, to participate in this study and for the translation correction and publication costs.