The Relationship Between Short-Term Surrogate Endpoint Indicators and mPFS and mOS in Clinical Trials of Malignant Tumors: A Case Study of Approved Molecular Targeted Drugs for Non-Small-Cell Lung Cancer in China

Objective: Since the launch of the priority review program in China, many antitumor drugs have been approved for marketing on the basis of phase II clinical trials and short-term surrogate endpoint indicators. Using targeted drugs approved in China for the treatment of non-small-cell lung cancer (NSCLC) as an example, this study evaluated the association between short-term surrogate endpoints [objective response rate (ORR) and disease control rate (DCR)] and median progression-free survival (mPFS) and median overall survival (mOS). Methods: Five databases (MEDLINE, Embase, the Cochrane Library, China National Knowledge Infrastructure (CNKI), and Wanfang Data) were searched for phase II or phase III clinical trials of all molecular targeted drugs marketed in China for the treatment of NSCLC. After literature screening and data extraction, univariate and multivariate linear regression analyses were performed to explore the relationships of the short-term surrogate indicators with mPFS and mOS. Results: A total of 63 studies were included (25 studies reporting only ORR, DCR, and mPFS and 39 studies reporting ORR, DCR, mPFS, and mOS). For targeted drugs for NSCLC, the linear relationship between DCR and mOS was good but not excellent (0.4 < R²adj = 0.5653 < 0.6); all other short-term surrogate endpoint indicators had excellent linear relationships with mPFS and mOS (R²adj ≥ 0.6), and mPFS and mOS had the strongest linear relationship (R²adj = 0.8036). Conclusion: For targeted drugs for the treatment of NSCLC, short-term surrogate endpoint indicators such as ORR and DCR may be reliable surrogates for mPFS and mOS. However, whether short-term surrogate endpoint indicators can be used to predict final endpoints remains to be verified.


INTRODUCTION
Malignant tumors are a major cause of death and severely hinder increases in the average life expectancy of the population (1). They are the leading cause of death in the urban population: in 2019, approximately 25.73% of urban deaths in China were caused by malignant tumors, with a mortality rate of approximately 161.56 per 100,000 people (2). In 2018, there were 3.804 million new cases of malignant tumors in China, accounting for more than 20% of global cases; the incidence was 278.07 per 100,000 people and the mortality rate was 167.89 per 100,000 people (Ma and Yu, 2020). Malignant tumors seriously threaten people's lives and health. In terms of disease burden, malignant tumors cause a substantial loss of disability-adjusted life years (DALYs). Studies have shown that trachea, bronchus, and lung cancers account for 4.1% of total DALYs, ranking fourth after stroke (11.9%), ischemic heart disease (8.1%), and chronic obstructive pulmonary disease (5.5%) (3). In terms of economic burden, the average medical costs for patients with malignant tumors are increasing year by year: the average cost of a single hospitalization for discharged patients in China was 10,777 yuan (RMB) in 2005, rising to 13,322 yuan in 2011, 15,672 yuan in 2013, and 17,567 yuan in 2016 (Wei-jing and xiao-lu, 2019). To improve patients' access to new drugs and their quality of life, the National Medical Products Administration (NMPA) of China launched a priority review program so that innovative drugs can be approved as soon as possible to benefit patients with malignant tumors. The NMPA priority review program mainly comprises three procedures: a review procedure for breakthrough therapeutic drugs, a procedure for the conditional approval of drugs for marketing, and a priority review procedure for drug marketing authorization (Adminstration, 2020).
The priority review program has greatly shortened the time to market for some new anticancer drugs, which often act on rare targets, and many of these drugs lack abundant clinical data from Chinese patients. The supporting clinical studies are often single-arm, with small sample sizes and short follow-up, and some do not even report primary endpoint indicators such as progression-free survival (PFS) and overall survival (OS). Table 1 summarizes the reporting status of clinical trial indicators for anticancer drugs approved in China from 2017 to November 2021. An increasing number of drugs have been approved using only short-term surrogate endpoint indicators, and only 16 new drugs reported both PFS and OS data. The lack of primary endpoint indicators makes it difficult to reliably determine the safety and efficacy of anticancer drugs and likewise poses a significant challenge for economic evaluations. In the economic evaluation of anticancer drugs, the partitioned survival model (PSM) and the Markov model are the most popular model types (Rui et al., 2021), and the construction of both requires mature PFS and OS data (6). Therefore, when only short-term surrogate endpoint indicators are available, it is worth investigating whether there is a significant relationship between such indicators and primary endpoint indicators.
To address this question, this study uses clinical trials of targeted therapies approved in China for the treatment of NSCLC as an example to examine the correlation between short-term surrogate endpoint indicators and primary endpoint indicators.

Literature Search Strategy
Chinese and English databases were searched for phase II or phase III clinical trials of molecular targeted drugs for the treatment of NSCLC. The Chinese databases were China National Knowledge Infrastructure (CNKI) and Wanfang Data; the English databases were MEDLINE (via the PubMed platform), Embase, and the Cochrane Library. Each database was searched from its inception to 20 March 2021 using a combination of subject headings and free-text terms. Search terms in both languages included non-small-cell lung cancer and clinical trial, among others. The English search strategy is provided in the Supplementary Materials.

Inclusion and Exclusion Criteria
The inclusion criteria for this study were as follows: (1) phase II or phase III clinical trials, including single-arm and placebo-controlled clinical trials; (2) patients diagnosed with NSCLC by laboratory tests, imaging examinations, and clinical signs and symptoms; (3) interventions that included molecular targeted drugs for the treatment of NSCLC approved for marketing in China as of March 2021, including gefitinib, erlotinib, icotinib, crizotinib, dacomitinib, afatinib, osimertinib, almonertinib, alectinib, ceritinib, brigatinib, lorlatinib, selpercatinib, entrectinib, dabrafenib + trametinib, erlotinib + linsitinib, erlotinib + pazotinib, erlotinib + sorafenib, and anlotinib; (4) short-term surrogate endpoint indicators that included ORR or DCR; and (5) primary endpoint indicators that included median progression-free survival (mPFS) and median overall survival (mOS). The exclusion criteria were as follows: (1) duplicate literature; (2) non-Chinese or non-English literature; (3) conference abstracts; (4) trials other than phase II or phase III clinical trials; (5) no simultaneous reporting of DCR, ORR, and mPFS; and (6) interventions that combined molecular targeted drugs with other types of therapeutic measures.

Literature Screening and Data Extraction
Two researchers independently screened the literature and extracted the data, then cross-checked each other's results. Disagreements were resolved through consultation with a third researcher. Data extraction mainly included ① basic characteristics of the included studies (title, authors, year, etc.); ② sample size of each group; ③ treatment measures and their usage and dosage; ④ key elements of bias risk assessments; and ⑤ endpoint indicators (ORR, DCR, mPFS, and mOS).

Quality of the Included Studies
Two investigators independently evaluated the quality of the included studies and cross-checked the results. For randomized controlled trials (RCTs), risk of bias was assessed using the risk of bias tool for RCTs recommended in the Cochrane Handbook (Higgins et al., 2011). The Newcastle-Ottawa Scale (NOS), recommended by the Cochrane Non-Randomized Studies Methods Group (NRSMG), was used to evaluate the quality of single-arm clinical trials (Margulis et al., 2014).

Data Processing
This study used STATA 15.1 to perform univariate and multivariate linear regression analyses of the relationships of DCR and ORR with mPFS and with mOS. When the linear relationship between a short-term surrogate endpoint indicator and a primary endpoint indicator was poor, the primary endpoint indicator was ln-transformed, and the linear relationship between the short-term surrogate endpoint indicator and ln (primary endpoint indicator) was explored. To account for differences in drugs, dosages, and treatment durations among the included studies, treatments were categorized and entered as covariates in the multivariate regression analysis.
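The analyses themselves were run in STATA 15.1; the Python sketch below only illustrates the two model forms described above on made-up numbers. It assumes an unweighted ordinary least squares fit, ln-transformation of the endpoint (as in the Results), and treatment categories coded as 0/1 indicator columns against a reference category. The function names `ols` and `design`, the labels `drug_A`/`drug_B`, and all data values are hypothetical.

```python
import math


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y.

    Each row of X must already start with 1.0 for the intercept. The system is
    solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def design(surrogate, treatments=None, reference=None):
    """Rows of [1, surrogate, dummy_1, ...].

    Without `treatments` this is the univariate model; with them, each
    treatment category other than `reference` gets a 0/1 indicator column.
    """
    levels = [] if treatments is None else sorted(set(treatments) - {reference})
    rows = []
    for i, x in enumerate(surrogate):
        dummies = [1.0 if treatments[i] == lvl else 0.0 for lvl in levels]
        rows.append([1.0, float(x)] + dummies)
    return rows


# Hypothetical trial-level data (NOT taken from the included studies).
orr = [10, 20, 30, 40, 50, 60]            # ORR, %
arm = ["drug_A"] * 3 + ["drug_B"] * 3     # treatment category
mpfs = [math.exp(1.0 + 0.02 * x + (0.3 if a == "drug_B" else 0.0))
        for x, a in zip(orr, arm)]        # mPFS, months

ln_mpfs = [math.log(m) for m in mpfs]     # ln-transform the primary endpoint
b_uni = ols(design(orr), ln_mpfs)                      # ln(mPFS) ~ ORR
b_multi = ols(design(orr, arm, "drug_A"), ln_mpfs)     # ln(mPFS) ~ ORR + treatment
print("univariate:", b_uni)
print("multivariate:", b_multi)
```

Because the `drug_B` arms in the made-up data also have higher ORR, the univariate slope absorbs part of the treatment effect, whereas the model with treatment indicators recovers the generating slope; this is the kind of confounding that adding treatment categories is meant to reduce.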
In addition, some studies have shown that OS is strongly affected by the number of previous lines of treatment, with patients who have received more lines of treatment often having a worse prognosis (Gisselbrecht et al., 2010; Rule et al., 2017). Therefore, in the univariate linear regression analysis, subgroup analyses were performed for first-line treatment and for second-line or later treatment, separating patients by treatment line to reduce heterogeneity. Scatter plots of DCR and ORR vs. mPFS and mOS were drawn using Microsoft Excel. The adjusted coefficient of determination (R²adj) was used to evaluate model fit. Following Lassere et al. (Lassere et al., 2012), R²adj ≥ 0.6 indicates excellent goodness-of-fit, 0.4 ≤ R²adj < 0.6 good, 0.2 ≤ R²adj < 0.4 fair, and R²adj < 0.2 poor.
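The adjusted R² and the grading thresholds above can be written out explicitly. The sketch below assumes an unweighted univariate OLS fit within each treatment-line subgroup, computes R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1) with p predictors, and maps the result to the Lassere et al. categories. The function names, record fields (`line`, `orr`, `mos`), and data values are hypothetical.

```python
def fit_line(x, y):
    """Univariate OLS; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def adjusted_r2(y, fitted, p):
    """R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1) for p predictors."""
    n = len(y)
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def lassere_grade(r2_adj):
    """Goodness-of-fit categories used in this study (Lassere et al., 2012)."""
    if r2_adj >= 0.6:
        return "excellent"
    if r2_adj >= 0.4:
        return "good"
    if r2_adj >= 0.2:
        return "fair"
    return "poor"


def subgroup_fit(studies, line):
    """Fit mOS ~ ORR within one treatment-line subgroup; return (R2_adj, grade)."""
    sub = [s for s in studies if s["line"] == line]
    x = [s["orr"] for s in sub]
    y = [s["mos"] for s in sub]
    a, b = fit_line(x, y)
    r2a = adjusted_r2(y, [a + b * xi for xi in x], p=1)
    return r2a, lassere_grade(r2a)


# Hypothetical study-level records (ORR in %, mOS in months); NOT study data.
studies = [
    {"line": "first", "orr": 20, "mos": 10}, {"line": "first", "orr": 40, "mos": 14},
    {"line": "first", "orr": 60, "mos": 18}, {"line": "first", "orr": 80, "mos": 22},
    {"line": "second+", "orr": 10, "mos": 8}, {"line": "second+", "orr": 20, "mos": 7},
    {"line": "second+", "orr": 30, "mos": 11}, {"line": "second+", "orr": 40, "mos": 10},
]
for line in ("first", "second+"):
    print(line, subgroup_fit(studies, line))
```

The penalty term (n − 1)/(n − p − 1) matters most for small subgroups: with the four made-up second-line records, an unadjusted R² of 0.5 shrinks to an R²adj of 0.25 ("fair").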

Literature Screening Results
A total of 5,058 articles were retrieved in the preliminary searches, and 4,547 articles remained for preliminary screening after duplicates were removed. After title and abstract screening, 4,019 articles were excluded and 528 articles were retained for full-text review. After full-text review of these 528 articles, 63 articles were included in the final quantitative analysis of DCR, ORR, mPFS, and mOS. The literature screening process is shown in Figure 1.
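The counts above imply how many records were removed at each stage of screening. A short arithmetic check (variable names are illustrative) derives them:

```python
identified = 5058               # records from the preliminary searches
after_dedup = 4547              # records screened by title and abstract
excluded_title_abstract = 4019  # excluded at title/abstract screening
included_final = 63             # included in the quantitative analysis

duplicates = identified - after_dedup
full_text = after_dedup - excluded_title_abstract
excluded_full_text = full_text - included_final
print(duplicates, full_text, excluded_full_text)
```

That is, 511 duplicates were removed before screening, and 465 of the 528 full-text articles were excluded.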

Evaluation of the Quality of the Included Studies
Risk of bias in RCTs: The risk of bias results for the 42 two-arm or multi-arm RCTs are presented in Figure 2. "Selective reporting," "incomplete outcome data," and "random sequence generation" had a low risk of bias, whereas "blinding of outcome assessment" and "blinding of participants and personnel" had a high risk of bias. The risk of bias for "allocation concealment" and "other bias" was unclear.
Quality evaluation results for single-arm trials: The NOS scores for the 21 included single-arm clinical trials are shown in Figure 3. Scores ranged from 4 to 6, with a mean of 5.6, indicating that the overall quality of the studies was high. One study scored four points, six studies scored five points, and the remaining 14 studies scored six points.
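The reported mean NOS score can be reproduced from the score distribution above, and the 21 single-arm trials plus the 42 RCTs account for all 63 included studies. A brief check (variable names are illustrative):

```python
# NOS score -> number of single-arm trials with that score (from the text above)
nos_counts = {4: 1, 5: 6, 6: 14}
n_single_arm = sum(nos_counts.values())
mean_nos = sum(score * count for score, count in nos_counts.items()) / n_single_arm
n_rct = 42  # two-arm or multi-arm RCTs assessed with the Cochrane tool

print(n_single_arm, round(mean_nos, 2), n_rct + n_single_arm)
```

The exact mean is 118/21, about 5.62, which rounds to the reported 5.6.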

RELATIONSHIP BETWEEN SHORT-TERM SURROGATE ENDPOINT INDICATORS AND PRIMARY ENDPOINT INDICATORS
Analysis of ORR and mPFS
With mPFS log-transformed, the univariate regression between ORR and ln (mPFS) showed excellent adjusted goodness-of-fit (R²adj = 0.7356 > 0.6; Figure 4 and Supplementary Table S4). After controlling for treatment factors, the multivariate regression between ORR and ln (mPFS) also showed excellent adjusted goodness-of-fit (R²adj = 0.7772 > 0.6; Supplementary Table S5).
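Because mPFS was modeled on the log scale, fitted coefficients act multiplicatively on mPFS once back-transformed. Writing β0 and β1 for the intercept and slope of the univariate model (their estimates are in Supplementary Table S4 and are not reproduced here), the model and its back-transformation are:

```latex
\ln(\mathrm{mPFS}_i) = \beta_0 + \beta_1\,\mathrm{ORR}_i + \varepsilon_i
\quad\Longrightarrow\quad
\widehat{\mathrm{mPFS}}(\mathrm{ORR}) = \exp\!\left(\hat{\beta}_0 + \hat{\beta}_1\,\mathrm{ORR}\right),
\qquad
\frac{\widehat{\mathrm{mPFS}}(\mathrm{ORR} + \Delta)}{\widehat{\mathrm{mPFS}}(\mathrm{ORR})} = e^{\hat{\beta}_1 \Delta}.
```

For illustration only, a hypothetical slope of 0.02 per percentage point would mean that a 10-point higher ORR corresponds to e^0.2, about 1.22 times the predicted mPFS. Simple exponentiation returns the geometric rather than the arithmetic mean of the predicted values, so it ignores retransformation bias.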

Analysis of DCR and mPFS
With mPFS log-transformed, the univariate regression between DCR and ln (mPFS) showed excellent adjusted goodness-of-fit (R²adj = 0.7642 > 0.6; Figure 5 and Supplementary Table S6). After controlling for treatment factors, the multivariate regression between DCR and ln (mPFS) also showed excellent adjusted goodness-of-fit (R²adj = 0.7806 > 0.6; Supplementary Table S7).

Analysis of ORR and mOS
The univariate regression between ORR and mOS showed excellent adjusted goodness-of-fit (R²adj = 0.7633 > 0.6; Figure 6 and Supplementary Table S8). After controlling for treatment factors, the multivariate regression between ORR and mOS also showed excellent adjusted goodness-of-fit (R²adj = 0.7813 > 0.6; Supplementary Table S9).

Analysis of DCR and mOS
With mOS log-transformed, the univariate regression between DCR and ln (mOS) showed good adjusted goodness-of-fit (R²adj = 0.5653 > 0.4; Figure 7 and Supplementary Table S10). After controlling for treatment factors, the multivariate regression between DCR and ln (mOS) showed excellent adjusted goodness-of-fit (R²adj = 0.6331 > 0.6; Supplementary Table S11).

Analysis Results of mPFS and mOS
The univariate regression between mPFS and mOS showed excellent adjusted goodness-of-fit (R²adj = 0.7616 > 0.6; Figure 8 and Supplementary Table S12). After controlling for treatment factors, the multivariate regression between mPFS and mOS also showed excellent adjusted goodness-of-fit (R²adj = 0.8036 > 0.6; Supplementary Table S13).

Results for First-Line Treatment Only
With mPFS log-transformed, the univariate regression between ORR and ln (mPFS) showed excellent adjusted goodness-of-fit (R²adj = 0.6188 > 0.6; Figure 9), as did the univariate regression between DCR and ln (mPFS) (R²adj = 0.7128 > 0.6; Figure 10). The univariate regression between ORR and mOS also showed excellent adjusted goodness-of-fit (R²adj = 0.7074 > 0.6; Figure 11).
The univariate regression between mPFS and mOS showed excellent adjusted goodness-of-fit (R²adj = 0.7764 > 0.6; Figure 12).

Results for Second-Line or Later Treatment Only
With mPFS log-transformed, the univariate regression between ORR and ln (mPFS) showed excellent adjusted goodness-of-fit (R²adj = 0.6926 > 0.6; Figure 13). Taking the natural logarithm of mPFS, the univariate regression between mPFS and mOS showed excellent adjusted goodness-of-fit (R²adj = 0.7497 > 0.6; Figure 14).
The univariate regression between ORR and mOS showed excellent adjusted goodness-of-fit (R²adj = 0.7324 > 0.6; Figure 15).

DISCUSSION
This study summarized all clinical trials of molecular targeted drugs approved for marketing in China for the treatment of NSCLC as of March 2021. Studies that reported DCR, ORR, and mPFS concurrently, and studies that reported DCR, ORR, mPFS, and mOS concurrently, were extracted for linear regression analysis. In total, 25 articles reported DCR, ORR, and mPFS concurrently, and 39 articles reported DCR, ORR, mPFS, and mOS concurrently. Both DCR and ORR had excellent linear relationships with ln (mPFS) (R²adj > 0.6), whereas their linear relationships with untransformed mPFS were slightly weaker. With respect to mOS, the linear relationship between DCR and mOS or ln (mOS) was good but not excellent (0.4 < R²adj < 0.6). mPFS and mOS had the strongest linear relationship (R²adj = 0.8036). Cooper et al. (Cooper et al., 2020) suggested that short-term surrogate endpoint indicators such as ORR and complete response (CR) cannot replace primary endpoint indicators such as PFS and OS and that the correlation between the two is weak and unstable; our results are inconsistent with theirs. Cooper et al. reported no significant correlation between short-term surrogate endpoint indicators and primary endpoint indicators, but this conclusion was likely influenced by the wide range of disease types and treatment regimens included in their study. Moreover, Cooper et al. did not examine whether such a correlation exists in clinical trials of specific types of anticancer drugs, which may be one of the main reasons for the difference between their results and ours.
The results of this study suggest that short-term surrogate endpoint indicators (ORR and DCR) may have linear relationships with mPFS and mOS, potentially allowing them to be used to predict mPFS and mOS. In the pharmacoeconomic evaluation of cancer treatments, PFS and OS are the most important indicators for verifying drug efficacy and determine whether a pharmacoeconomic model can be constructed. In traditional pharmacoeconomic models for advanced cancer, 3-state models (progression-free, progressed disease, and death) are commonly used to construct Markov models or PSMs (5). Markov models derive the transition probabilities between health states from the PFS and OS curves, whereas PSMs use the PFS and OS curves to partition the area under the survival curve into three regions, one for each health state. If short-term surrogate endpoint indicators are used to predict mPFS and mOS, only two median values are obtained, which poses a challenge for pharmacoeconomic evaluations. When only mPFS and mOS are available, we recommend assuming that PFS and OS follow exponential distributions and using mPFS and mOS to construct exponential survival curves, which allows a Markov model to be built and an economic evaluation to be performed (Latimer, 2011). Although this approach relies on strong assumptions, it can provide a useful reference when data are lacking.
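Under the exponential assumption described above, each median fixes a constant hazard λ = ln 2 / median, from which both model types can be parameterized. The sketch below illustrates this; the function names and the 10- and 25-month medians are hypothetical, and it assumes mPFS < mOS, with the horizon and cycle length expressed in the same time unit as the medians.

```python
import math


def rate_from_median(median):
    """Constant hazard implied by a median under an exponential model.

    S(t) = exp(-rate * t) and S(median) = 0.5  =>  rate = ln(2) / median.
    """
    return math.log(2) / median


def survival(rate, t):
    """Exponential survival function S(t)."""
    return math.exp(-rate * t)


def psm_state_probs(mpfs, mos, t):
    """Partitioned-survival state membership at time t from two medians.

    progression-free = S_PFS(t); progressed = S_OS(t) - S_PFS(t); dead = 1 - S_OS(t).
    S_PFS is capped at S_OS so the three shares always sum to 1.
    """
    s_os = survival(rate_from_median(mos), t)
    s_pfs = min(survival(rate_from_median(mpfs), t), s_os)
    return s_pfs, s_os - s_pfs, 1.0 - s_os


def restricted_mean(median, horizon):
    """Area under the exponential survival curve from 0 to horizon."""
    rate = rate_from_median(median)
    return (1.0 - math.exp(-rate * horizon)) / rate


def cycle_exit_probability(median, cycle_length):
    """Markov per-cycle probability of leaving a state with exponential dwell time."""
    return 1.0 - math.exp(-rate_from_median(median) * cycle_length)


# Hypothetical medians in months (e.g. predicted from ORR/DCR); not study results.
mpfs, mos, horizon = 10.0, 25.0, 120.0
pf_time = restricted_mean(mpfs, horizon)            # area in the progression-free state
pd_time = restricted_mean(mos, horizon) - pf_time   # area between the OS and PFS curves
print(psm_state_probs(mpfs, mos, 12.0))
print(round(pf_time, 2), round(pd_time, 2))
print(cycle_exit_probability(mpfs, 1.0))            # monthly cycle
```

The per-cycle probability only describes leaving the progression-free state; splitting that exit between progression and death in a Markov model would need further assumptions that two medians alone cannot supply.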
This study has some limitations. First, the molecular targeted drugs included were limited to those approved for marketing in China for the treatment of NSCLC as of March 2021; many targeted drugs approved in other countries were not included, which limits the generalizability of the results. Second, although only molecular targeted drugs approved in China were included, the racial distribution of the included patients was not considered in the analyses. Many of these drugs, especially recently approved ones, were approved on the basis of published clinical data from international populations plus unpublished clinical data from small samples of Chinese patients. Third, because the focus was on establishing the statistical relationship between short-term surrogate endpoint indicators and primary endpoint indicators, this study did not use large amounts of real-world data for prediction and validation. Finally, in addition to short-term surrogate endpoint indicators, other factors, such as the choice of subsequent treatment, can have a significant impact on mOS; however, the regression models used in this study did not include influencing factors other than treatment.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
Conception of the study: MR, ZW, and HL; literature search: MR, ZF and ZW; data extraction: MR, ZW, ZF, and LS; statistical analysis: MR, YW, and YCW; drafting the manuscript: MR, YCW, and YW; revising and completion of final work: LS, YS, ZF, and HL; all authors reviewed and approved the final manuscript.

FUNDING
The publishing fees were funded by the corresponding author.