Basket trials in oncology: a systematic review of practices and methods, comparative analysis of innovative methods, and an appraisal of a missed opportunity

Background Basket trials are increasingly used in oncology drug development for early signal detection, accelerated tumor-agnostic approvals, and prioritization of promising tumor types in selected patients with the same mutation or biomarker. Participants are grouped into so-called baskets according to tumor type, allowing investigators to identify tumors with promising responses to treatment for further study. However, it remains unclear whether, and to what extent, the adoption of basket trial designs in oncology has translated into patient benefits, an increased pace and scale of clinical development, and de-risking of downstream confirmatory trials.
Methods Innovation in basket trial design and analysis includes methods that borrow information across tumor types to increase the quality of statistical inference within each tumor type. We build on the existing systematic reviews of basket trials in oncology to discuss current practices and the landscape. We conceptually illustrate recent innovative methods for basket trials, with application to actual data from recently completed basket trials. We explore and discuss the extent to which innovative basket trials can be used to de-risk future trials through their ability to aid prioritization of promising tumor types for subsequent clinical development.
Results We found increasing adoption of basket trial designs in oncology, but largely in the design of single-arm phase II trials, with very low adoption of innovative statistical methods. Furthermore, the current practice of basket trial design, which does not consider its impact on the clinical development plan, may lead to a missed opportunity to improve the probability of success of a future trial. Gating phase II with a phase Ib basket trial reduced the size of phase II trials, and losses in the probability of success as a result of not using innovative methods may not be recoverable by running a larger phase II trial.
Conclusion Innovative basket trial methods can reduce the size of early phase clinical trials, with sustained improvement in the probability of success of the clinical development plan. We need to do more as a community to improve the adoption of these methods.


Introduction
Advances in genomics technology have enabled innovation in oncology drug development over the last decade. A growing adoption of precision medicine, with the aim of identifying and developing effective targeted therapies, characterizes the current landscape of pharmaceutical drug development. Precision medicine requires that a new treatment address not only a disease defined by the histology and anatomical site from which it arose, but also its specific molecular, genetic, or immunologic subtype (1). To date, the U.S. Food and Drug Administration (FDA) has approved six tumor-agnostic therapies (2): selpercatinib was approved in September 2022 for patients with locally advanced or metastatic rearranged during transfection (RET) fusion-positive solid tumors (3); dabrafenib was approved in June 2022 in combination with trametinib for patients with unresectable or metastatic solid tumors with the BRAF V600E mutation (4); dostarlimab, a programmed cell death protein 1 (PD-1) inhibitor, was approved in August 2021 for adult patients with mismatch repair deficient (dMMR) recurrent or advanced solid tumors (5); entrectinib, a tyrosine kinase inhibitor (TKI), was approved in August 2019 for adult and pediatric patients 12 years of age and older with solid tumors that have a neurotrophic tyrosine receptor kinase (NTRK) gene fusion without a known acquired resistance mutation (6); larotrectinib, a tropomyosin receptor kinase (TRK) inhibitor, was approved in 2018 for adult and pediatric patients with unresectable or metastatic solid tumors with an NTRK gene fusion (8); and pembrolizumab, a PD-1 inhibitor, received accelerated approval in 2017 for adult and pediatric patients with unresectable or metastatic microsatellite instability-high (MSI-H) or dMMR solid tumors (7).
Innovation in biotechnology and clinical trials, matched with advanced computational tools, holds promise to accelerate the discovery and development of new targeted therapies. One example of methodological innovation is the master protocol, a trial that simultaneously evaluates the effect of multiple investigational drugs and/or multiple cancer types under a single overarching protocol (9). Master protocols in oncology allow detection of specific signaling pathways strongly associated with driver gene mutations, cancer cell growth, and progression (10). A basket trial is a particular type of master protocol that evaluates the efficacy and safety of a targeted therapy in multiple diseases that share a common molecular alteration (11) or a tumor-agnostic effect. Basket trials can also be used to identify the tumor types in which a drug is active, under a single, operationally efficient protocol. Of the six tumor-agnostic therapies approved by the FDA, two development programs, selpercatinib (NCT03157128) and dabrafenib (NCT02465060), used a basket trial.
A recent systematic review of basket trial master protocols identified a large increase in the number of basket trials in the past 14 years, from 1 basket trial in 2009 to 49 trials in 2019 (11). However, it remains unclear whether the increasing adoption of basket trials has translated into patient benefit, either through higher response rates from precision treatment or through more treatment options for rare tumor types that are underrepresented in oncology drug development (12). There are also challenges in evaluating treatment efficacy in basket trials, arising from the disconnect between the implied homogeneity of responses (based on the expectation of a tumor-agnostic effect driven by one common molecular alteration) and the heterogeneity observed between the different tumor types included in basket trials. Statistical methods developed in recent years to address these challenges can be classified broadly into tumor-specific analysis, pruning-and-pooling methods, Bayesian hierarchical modeling approaches, and model averaging methods. In a process referred to as information borrowing, many of the proposed statistical methods allow data characterizing the effect of a therapy in one tumor type to inform, to some degree, the effect in a different, but possibly similar, tumor type. This can increase the amount of information available for inference, such as testing whether each tumor-specific treatment effect exceeds a context-specific threshold. Despite the statistical advantages of these information borrowing methods, their uptake in the design and analysis of basket trials has been slow (13).
To make innovative basket trial methods more accessible to the clinical and research community, we conducted a systematic review to understand current basket trial practice in oncology and to identify barriers to more frequent use of these methods. We also reviewed recently proposed statistical methodology, focusing on a conceptual understanding of the methods rather than an in-depth review of their mathematical and statistical details. For the clinical community to adopt these advanced statistical methods, we believe it is critical to build a conceptual understanding of the ideas in sufficient depth to engage in meaningful discussion with statisticians during the basket trial planning stage. Through case studies, we illustrate the performance of advanced information borrowing methods using actual data from completed basket trials. We also provide a glimpse into the future of basket trials in oncology by showing how a basket trial might be used as a strategic component of a hypothetical clinical development plan (CDP) that includes phase Ib and phase II oncology trials.

Methods
The systematic literature reviews of basket trial practice and basket trial methods were conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (14). The EQUATOR checklist for the systematic review is provided in Supplementary Table S4 in Supplementary File 1.

Data sources and searches
Systematic searches were conducted on February 20, 2023, in MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials. The search strategy mirrored the approach of Park et al. (11) with minor modifications, such as restricting search terms to basket trials only, and we supplemented the search with a review of the bibliographies of included publications and of trial registries (ClinicalTrials.gov) for registered basket trial protocols. Further details on the number of hits from each database are presented in Supplementary Tables S1-S3 in Supplementary File 1.

Study inclusion and exclusion criteria
In addition to the trials already identified by Park et al. (11), we searched for unique basket trials referenced in abstracts and papers published from January 2019 through February 2023 for the systematic review of basket trial practice. The search for basket trial methods covered January 2001 to February 2023. Table 1 provides the inclusion and exclusion criteria according to PICOS (population, intervention, comparator, outcomes, study design). Eligible abstracts and papers were restricted to the English language. Three reviewers (SJH, NB, and AK) independently reviewed all abstracts identified in the literature searches and assessed their eligibility. SJH and NB identified individual basket trials and extracted their key characteristics from the eligible abstracts, the corresponding full-text publications, bibliographies of the published literature, protocols, and trial registries. Discrepancies in trial selection were resolved by discussion with a third investigator (AK) or with the wider team, including MP and HZ.

Data extraction
Abstracts were reviewed to assess whether the studies met the eligibility criteria for basket trial practice or basket trial methods. Basket trial practice papers included any publication that reported on the design and findings of a basket trial, and basket trial methods papers were defined as any paper that proposed innovative statistical methodology for basket trial design and/or analysis. Full texts were then reviewed to decide whether to extract data from each paper. SJH, NB, and AK independently screened the abstracts, reviewed full texts, and extracted data using bespoke data extraction templates developed by the team.

Data synthesis
We describe basket trial practices in oncology using narrative synthesis and descriptive statistics. No meta-analysis was performed. Further information about the methods is presented in Supplementary File 1.

Results
We identified 468 unique abstracts from our database searches. From these and our search of ClinicalTrials.gov, we identified 234 trials that warranted an in-depth review against the inclusion and exclusion criteria. In total, 146 trials met our inclusion criteria for the review of basket trial practice: 138 trials had ClinicalTrials.gov NCT numbers, whereas 8 trials were not registered in ClinicalTrials.gov. For the methods search, 41 publications met our inclusion criteria for the review of statistical methods for the design and analysis of basket trials. The PRISMA diagrams for the basket trial practice and basket trial methods systematic reviews are provided in Supplementary Figures S1 and S2 in Supplementary File 1, and complete lists of the trials and methodological papers are provided in Supplementary Files 2 and 3, respectively.

Review of basket trial practices
A systematic review of how master protocols are reported identified inconsistencies in how studies self-identified as master protocols, with considerable variability in the definitions of basket, umbrella, and platform trials (15). A systematic review of basket trials by Park et al. (11) discussed the landscape of basket trials with a focus on trends, trial and disease characteristics, and regional representation. Another review (18) discussed the importance of tissue origin and molecular target, noting that differences in response rates depended on tumor type. For example, breast and ovarian cancers were likely to have higher response rates than sarcoma or head and neck cancers. Our current review of basket trial practices and methods complements these existing reviews by focusing on the type of design and the analytic framework used for analysis.
Of the 146 trials, 7 were not yet recruiting, 38 were ongoing and recruiting, 52 were active with recruitment closed, and 32 were completed; 17 trials had been terminated, withdrawn, or listed with unknown status (see Figure 1). We found that 75% (109/146) of trials investigated a monotherapy, while 25% (37/146) investigated a combination therapy. Most basket trials were phase II (73%, 107/146), whereas 23% (34/146) were the dose expansion portion of a phase I study (i.e., phase Ib or phase I/II), and 3% (5/146) were designated as phase II/III (see Figure 1). Consistent with Park et al. (11) and Haslam et al. (12), all phase Ib or phase II basket trials identified in this review were single-arm, nonrandomized, open-label studies, and 91% (133/146) of the trials had objective response rate as either a primary or secondary endpoint. Further information about the review of basket trials in practice and the risk of bias is presented in Supplementary File 1.

Risk of bias
To assess the risk of bias for each of the completed basket trials, we used the Risk of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool developed for Cochrane systematic reviews (19). Because the objective response rate (ORR), assessed with the standardized RECIST criteria (20), is common practice in the design of early-phase oncology trials, we do not discuss risks of bias that relate to the design of each trial (e.g., confounding, missing outcomes). Instead, we focus on risks of reporting bias associated with the publication of results. According to ROBINS-I, bias can occur when reported results are selected from (1) multiple outcome measurements within the outcome domain, (2) multiple analyses of the intervention-outcome relationship, or (3) different subgroups. The risks of bias from the first two sources are minimal in basket trials because trial endpoints are defined at the design stage and analysis methods are specified prior to data analysis. However, bias can be introduced through the selection of which tumor-specific results are reported. Of the completed basket trials, approximately 38% (12/32) are subject to reporting bias, for reasons including reporting only the pooled response rate (8/32), reporting results for only some tumor types (1/32), or not providing any publicly available results (3/32). Further, only 55% (16/29) of the completed trials on ClinicalTrials.gov had published results within the registry, whereas 38% (11/29) only had references to publications with results and 7% (2/29) did not provide results in any form. To reduce reporting bias and increase the transparency of trial results, we recommend that results for every tumor type be published directly in ClinicalTrials.gov, regardless of the strength of evidence supporting treatment efficacy.

Figure 1: The left panel shows the number of trials by status: 'N Recruiting' denotes trials that were not yet recruiting; 'Recruiting', ongoing trials with active recruitment; 'C Recruiting', active trials with recruitment closed; 'Completed', completed trials; and 'Inactive', trials that have been terminated, suspended, withdrawn, or have unknown status. The right panel shows the number of trials by clinical phase.

Review of basket trial methods
An underlying assumption of statistical methods for basket trial design is that response to the targeted therapy is determined by a biomarker and is not heavily influenced by tumor histology (21), which suggests that all tumor types in a basket trial share a similar response rate. As a result, tumor types are sometimes naïvely pooled together to estimate a single response rate in the final analysis of a basket trial. Such an approach can increase the type I error rate (i.e., the false positive or false discovery rate) at the trial level. The tumor-level type I error rate is defined as the rate of falsely progressing a tumor type to the next phase of clinical development when the treatment is not active in that tumor type (hereafter, an inactive tumor type), while the trial-level type I error rate is defined as the rate of falsely progressing any of the tumor types in a basket trial when all tumor types are inactive.
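To make the two error rates concrete, the Python sketch below assumes a hypothetical design in which each tumor type is analyzed independently with an exact one-sided binomial test of H0: ORR <= p0, using the smallest response cutoff whose tumor-level type I error does not exceed alpha. The sample sizes, p0 = 0.10, alpha = 0.05, and all function names are illustrative assumptions, not values from any reviewed trial. Under the global null (all tumor types inactive) with independent baskets, the trial-level type I error is one minus the product of the per-basket probabilities of correctly not progressing.

```python
from math import comb


def binom_tail(n: int, p: float, r: int) -> float:
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def smallest_cutoff(n: int, p0: float, alpha: float) -> int:
    """Smallest response count r with P(X >= r | p0) <= alpha (exact test)."""
    for r in range(n + 2):
        if binom_tail(n, p0, r) <= alpha:
            return r
    return n + 1  # unreachable: binom_tail(n, p0, n + 1) is 0


def error_rates(sizes: list[int], p0: float, alpha: float):
    """Tumor-level and trial-level type I error when every basket is inactive."""
    tumor_level = []
    for n in sizes:
        r = smallest_cutoff(n, p0, alpha)
        tumor_level.append(binom_tail(n, p0, r))  # exact error at the cutoff
    # Independent baskets: P(no false progression) is a product over baskets.
    p_none = 1.0
    for a in tumor_level:
        p_none *= 1 - a
    return tumor_level, 1 - p_none


if __name__ == "__main__":
    # Hypothetical design: five baskets, null ORR of 10%, one-sided alpha of 5%.
    per_tumor, trial = error_rates([10, 15, 20, 25, 30], p0=0.10, alpha=0.05)
    print([round(a, 3) for a in per_tumor], round(trial, 3))
```

Because the baskets are tested separately, the trial-level error always exceeds the largest tumor-level error; five independent tests that each had exactly a 5% error would give a trial-level error of 1 - 0.95^5, about 22.6%.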
We identified 41 methodological papers, of which 32% (13/41) used frequentist methods and 68% (28/41) used Bayesian methods. Frequentist methods typically rely on p-values for hypothesis testing objectives (e.g., whether tumor-specific response rates exceed a prespecified threshold), whereas Bayesian methods combine prior information with trial results as part of a continual data stream in which inferences are updated each time new data become available (22). Only 5% (2/41) of papers proposed methods that analyze the data for each tumor type independently (23,24), without borrowing information across tumor types. Each of the remaining papers proposes a method that incorporates information borrowing, most of which can be grouped into one of three classes based on the borrowing mechanism: pruning-and-pooling methods, Bayesian hierarchical models, and model averaging methods.
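As a minimal sketch of an independent Bayesian analysis without borrowing (the kind of IND analysis used later in this review), the Python code below assumes a Beta(1, 1) prior for each tumor-specific ORR, so that r responses among n patients give a Beta(1 + r, 1 + n - r) posterior. The prior and the function names are illustrative assumptions. Because both Beta parameters are integers, the posterior probability P(ORR > p0 | data) can be computed exactly with a binomial sum instead of a special-function library.

```python
from math import comb


def posterior_prob_exceeds(r: int, n: int, p0: float) -> float:
    """P(ORR > p0 | r responses in n patients) under a Beta(1, 1) prior.

    The posterior is Beta(1 + r, 1 + n - r). For integer Beta parameters,
    P(ORR > p0) equals P(Y <= r) with Y ~ Binomial(n + 1, p0).
    """
    m = n + 1
    return sum(comb(m, j) * p0**j * (1 - p0) ** (m - j) for j in range(r + 1))


def independent_analysis(data: dict[str, tuple[int, int]], p0: float):
    """Analyze each tumor type on its own (no information borrowing)."""
    out = {}
    for tumor, (r, n) in data.items():
        post_mean = (1 + r) / (2 + n)  # mean of Beta(1 + r, 1 + n - r)
        out[tumor] = (post_mean, posterior_prob_exceeds(r, n, p0))
    return out
```

For example, posterior_prob_exceeds(2, 2, 0.5) returns 0.875 (that is, 1 - 0.5^3), the probability that a Beta(3, 1) variable exceeds 0.5.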

Pruning-and-pooling methods
Of the proposed methods, 22% (9/41) suggest a two-stage design using a frequentist pruning-and-pooling approach. Under the most basic two-stage design, an interim analysis is performed at the end of the first stage to determine which tumor types are active (i.e., the treatment has an effect) and which are inactive. Enrollment in inactive tumor types is stopped, or "pruned", and the active tumor types that pass the interim analysis are then pooled together to estimate the overall response rate or the tumor-specific response rates. Proposed adaptations of the pruning-and-pooling approach include designs that accommodate any type of endpoint (25)(26)(27)(28) or that are restricted to either a binary endpoint (29)(30)(31)(32) or a time-to-event endpoint (33). Only two pruning-and-pooling methods were published with publicly available software, and this lack of software could hinder accessibility and ease of use for the wider community.
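The basic two-stage logic can be sketched in Python as follows. This is a hypothetical illustration, not any of the published designs (25-33): the pruning rule (fewer than prune_below responses at the interim), the one-sided exact binomial test on the pooled data, and all names are assumptions. In particular, it omits the adjustment for interim selection that published designs use to control the type I error, so its naive pooled p-value would be anti-conservative.

```python
from math import comb


def binom_tail(n, p, r):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def prune_and_pool(stage1, stage2, prune_below, p0, alpha):
    """Hypothetical two-stage pruning-and-pooling analysis (binary endpoint).

    stage1, stage2: dict mapping tumor type -> (responses, patients).
    prune_below: a tumor type with fewer stage-1 responses than this is pruned.
    Returns the retained tumor types and whether the pooled test rejects H0.
    """
    kept = [t for t, (r, _) in stage1.items() if r >= prune_below]
    if not kept:
        return [], False
    # Pool stage-1 and stage-2 data from the retained tumor types only.
    r_pool = sum(stage1[t][0] + stage2.get(t, (0, 0))[0] for t in kept)
    n_pool = sum(stage1[t][1] + stage2.get(t, (0, 0))[1] for t in kept)
    p_value = binom_tail(n_pool, p0, r_pool)  # naive one-sided exact p-value
    return kept, p_value <= alpha
```

With stage1 = {"A": (3, 10), "B": (0, 10), "C": (2, 10)}, stage2 = {"A": (4, 10), "C": (3, 10)}, prune_below=1, and p0 = 0.10, tumor type B is pruned and the pooled 12/40 responses in A and C are tested against the 10% null rate.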

Bayesian hierarchical models
A common method for borrowing information across tumor types is a Bayesian hierarchical model (BHM), which assumes that the response rates for all tumor types arise from a common underlying (bell-shaped) distribution, as illustrated in Figure 2. Scenario 1 shows an example of a basket trial in which the response rates come from different locations of the bell-shaped distribution without any obvious pattern. We refer to this as the exchangeability assumption, and we refer to the tumor types as being exchangeable with one another. However, the exchangeability assumption may not always be valid in practice. As an example, Scenario 2 illustrates a case in which the response rates for two tumor types are more similar to each other than to that of the third tumor type.

Figure 2: A conceptual illustration of a hierarchical model for the design and analysis of basket trials. The blue curve shows the assumed common distribution for treatment response rates. Scenario 1 assumes the tumor types are exchangeable, and Scenario 2 shows a situation where response rates may be more similar for some tumor types than others.
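The borrowing mechanism of a BHM can be approximated in a few lines. The Python sketch below is not a full BHM fit (which would normally use MCMC); it is a normal-normal empirical Bayes approximation on the log-odds scale, in which the between-tumor variance tau2 is treated as fixed and a 0.5 continuity correction is applied, both assumptions for illustration. Each tumor-specific log-odds is pulled toward a precision-weighted common mean, and tumor types with fewer participants (larger sampling variance) are pulled harder.

```python
from math import exp, log


def logit_shrinkage(data, tau2):
    """Approximate BHM-style shrinkage of tumor-specific ORRs (log-odds scale).

    data: list of (responses, patients); tau2: fixed between-tumor variance.
    Illustrative empirical-Bayes sketch, not a full Bayesian fit.
    """
    theta, v = [], []
    for r, n in data:
        theta.append(log((r + 0.5) / (n - r + 0.5)))  # observed log-odds
        v.append(1 / (r + 0.5) + 1 / (n - r + 0.5))   # its approximate variance
    w = [1 / (vk + tau2) for vk in v]
    mu = sum(wk * tk for wk, tk in zip(w, theta)) / sum(w)  # common mean
    shrunk = []
    for tk, vk in zip(theta, v):
        b = vk / (vk + tau2)  # shrinkage factor: near 1 means heavy borrowing
        t = (1 - b) * tk + b * mu
        shrunk.append(1 / (1 + exp(-t)))  # back to the probability scale
    return shrunk
```

Setting tau2 = 0 forces complete pooling (all estimates equal), while a very large tau2 leaves each tumor type essentially on its own, which mirrors the sensitivity of the amount of borrowing to the between-tumor variance.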
Of the proposed methods, 39% (16/41) used a variation of the BHM. Thall et al. (34) first proposed the use of the BHM for trials with multiple disease types, and Berry et al. (35) extended this application to basket trials in oncology. Several methods relax the exchangeability assumption of the BHM, for example by first testing for heterogeneity in response and fitting a BHM only if the tumor types are deemed sufficiently homogeneous (36). In other approaches, tumor types are assumed to be exchangeable with one another with some probability that is either prespecified (37) or estimated from the data (38). Another approach clusters similar tumor types into subgroups and then fits a separate BHM within each cluster (39)(40)(41)(42)(43). Chu et al. (44) proposed a calibrated BHM (CBHM), which, rather than placing a prior on the between-tumor variance, sets it as a function of an observed homogeneity statistic, with the function calibrated via simulation studies so that the amount of borrowing is small when a large degree of heterogeneity is observed between tumor-specific response rates. Other variations of the BHM include joint modeling of toxicity and efficacy (45), using a continuous biomarker to define subgroups of participants (46,47), and conditioning information borrowing on the correlation between response rates (48).
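The calibration idea behind the CBHM can be sketched as a mapping from an observed homogeneity statistic to the between-tumor variance. As we understand Chu et al. (44), the variance is set to tau^2 = exp(a + b log T), where T is a chi-square statistic for homogeneity of the tumor-specific response rates and a and b are calibrated by simulation; the Python below uses the Pearson statistic with placeholder values of a and b, which are assumptions rather than calibrated values. With b > 0, more observed heterogeneity yields a larger variance and therefore less borrowing.

```python
from math import exp, log


def homogeneity_chisq(data):
    """Pearson chi-square statistic for equal response rates across tumor types."""
    r_tot = sum(r for r, _ in data)
    n_tot = sum(n for _, n in data)
    p = r_tot / n_tot  # pooled response rate
    if p in (0.0, 1.0):
        return 0.0  # no responders or all responders: no evidence of heterogeneity
    return sum((r - n * p) ** 2 / (n * p * (1 - p)) for r, n in data)


def calibrated_tau2(data, a, b, t_floor=1e-6):
    """Between-tumor variance tau^2 = exp(a + b * log(T)); a, b are placeholders."""
    t = max(homogeneity_chisq(data), t_floor)  # avoid log(0) when T = 0
    return exp(a + b * log(t))
```

The resulting variance could then be plugged into a hierarchical model in place of a prior on the between-tumor variance; in the CBHM, a and b are chosen so that borrowing is strong under homogeneity and weak under heterogeneity.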

Model averaging methods
Of the 41 proposed methods, 12% (5/41) use model averaging to facilitate information borrowing across tumor types, based on the general idea that different models are defined to represent distinct scenarios of how the underlying tumor-specific response rates may relate to one another. Each model is fit to estimate the ORR for a subset of tumor types, and a weighted average of these model-specific results is then calculated to obtain the overall ORR estimate for each tumor type.
The simplest model averaging method assumes that heterogeneity in rates can be captured with only two models (49): a model assuming that all response rates are equal, and a model that groups tumor types as either active or inactive. An alternative approach is to classify each tumor type into either an inactive group with a low response rate or an active group with a high response rate (50). Psioda et al. (51) propose an approach that considers all possible classifications of tumor types into subsets, where tumor types within a subset are assumed to share a response rate that differs from the response rates of other subsets (see Figure 3). The method does not force the subsets into arbitrary groups of active or inactive tumor types, but instead allows the data to dictate which tumor types are similar enough to borrow information from each other. Hobbs et al. (52) proposed a version that restricts model averaging to pairwise combinations of tumor types.
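To show how averaging over all partitions works, the Python sketch below enumerates every way of splitting the tumor types into subsets, scores each partition by its Beta-binomial marginal likelihood, and averages the partition-specific posterior means with the resulting posterior weights. It follows the general idea described for Psioda et al. (51) but is not their implementation: the Beta(a, b) prior, the uniform prior over partitions, and the function names are illustrative assumptions. Because the number of partitions grows as the Bell numbers (15 for four tumor types, 4,140 for eight), full enumeration is only practical for a modest number of baskets.

```python
from math import exp, lgamma


def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn, or into a new block.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def bma_posterior_means(data, a=1.0, b=1.0):
    """Posterior mean ORR per tumor type, averaged over all partitions.

    data: list of (responses, patients). Tumor types in the same block share
    one response rate with a Beta(a, b) prior; all partitions are equally
    likely a priori. Binomial coefficients cancel across partitions.
    """
    k = len(data)
    log_w, means = [], []
    for part in set_partitions(list(range(k))):
        lw, est = 0.0, [0.0] * k
        for block in part:
            r = sum(data[i][0] for i in block)
            n = sum(data[i][1] for i in block)
            lw += log_beta(a + r, b + n - r) - log_beta(a, b)  # block marginal
            for i in block:
                est[i] = (a + r) / (a + b + n)  # shared posterior mean in block
        log_w.append(lw)
        means.append(est)
    m = max(log_w)
    w = [exp(x - m) for x in log_w]  # normalize on the log scale for stability
    total = sum(w)
    return [
        sum(wj * ests[i] for wj, ests in zip(w, means)) / total for i in range(k)
    ]
```

With identical data in every tumor type, all estimates coincide; with a single tumor type, the function reduces to the Beta posterior mean (a + r) / (a + b + n).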

Other design types and methods
Other design methods incorporate information borrowing by pooling all tumor types if they are deemed homogeneous at the interim analysis (53), pooling within subgroups defined using a clustering algorithm (54), clustering tumor types (55)(56)(57)(58), modeling patient-level data with a tree ensemble method (59), and using Bayesian commensurate priors (60). Baumann et al. (61) propose conditions that can improve the behavior of posterior probabilities when applied to various Bayesian approaches for basket trials. Further information on these methods, including their limitations and software availability, is provided in Supplementary File 3.

Advantages and disadvantages of the methods
An advantage of most pruning-and-pooling methods lies in the simplicity of their design and analysis. This simplicity, however, brings several limitations. Because tumor types are pruned at the interim analysis before the remaining tumor types are pooled in the second stage, a tumor type may be pruned prematurely on the basis of the limited data available for it at the time of the interim analysis. Further, these methods implicitly assume that all tumor types can be categorized as either active or inactive, which does not allow for potential differences in the magnitudes of tumor-specific response rates.
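The risk of premature pruning can be quantified with a binomial calculation. Assuming a hypothetical interim rule that prunes a tumor type with fewer than c responses among its first n1 patients, the probability of pruning a truly active tumor type with response rate p is P(X < c) for X ~ Binomial(n1, p). The rule, the 30% response rate, and the function name below are illustrative assumptions.

```python
from math import comb


def prob_pruned(n1: int, p: float, c: int) -> float:
    """P(fewer than c responses among n1 patients) when the true ORR is p."""
    return sum(comb(n1, k) * p**k * (1 - p) ** (n1 - k) for k in range(c))


if __name__ == "__main__":
    # Hypothetical active tumor type (true ORR 30%), pruned if < 2 responses.
    for n1 in (5, 10, 15, 20):
        print(n1, round(prob_pruned(n1, 0.30, 2), 3))
```

For a truly active tumor type with a 30% response rate and a rule requiring at least 2 responses, the probability of being pruned is about 0.53 with 5 interim patients and about 0.15 with 10, which shows how limited per-tumor data at the interim can discard active tumor types.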
The conventional BHM discussed by Thall et al. (34) and Berry et al. (35) is motivated by the exchangeability assumption, which may not be appropriate if, for example, a single tumor type has high activity while the others have none. The amount of information borrowing is also sensitive to the choice of prior for the between-tumor variance. Several of the proposed extensions to the BHM relax the exchangeability assumption, allowing one to better discern the extent to which information should be borrowed across tumor types, at the cost of increased computational intensity.
A major strength of some model averaging methods, such as the approaches of Psioda et al. (51) and Hobbs et al. (52), is the flexibility to consider all possible structures of heterogeneity among tumor types.With the increased model flexibility, however, comes greater computational intensity.

Comparative analysis of basket trial methods
The previous section reviewed the proposed methods for basket trial design and analysis by highlighting their key features. Most methods were initially proposed by demonstrating better performance than one or two existing methods with respect to selected operating characteristics (e.g., increased statistical power, lower type I error rates), primarily through simulation studies. When simulation studies are used to highlight the benefits of newly proposed methods, the relative performance of the methods can sometimes be exaggerated, depending on the simulation setting. In this section, we illustrate the application of two different classes of information borrowing methods, Bayesian hierarchical modeling and model averaging, to data published from completed trials. To the best of our knowledge, there have been very few cases in which the results of these methods were compared using actual data from a representative collection of basket trials.
For the comparative analysis of basket trial methods in this section, we compare and contrast the inferences from the selected methods with respect to the probability that tumor-specific ORRs exceed a given threshold. We note that it is not possible to assess the type I error rate, power, or other measures of statistical performance (e.g., gains in precision for estimates of ORR) by analyzing data from a specific trial in which the ultimate truth is not known. For any clinical development plan, it is important to assess type I error rates and type II error rates (i.e., false negative rates), at both the tumor and trial levels, using simulation studies that imitate clinical scenarios under both the null hypothesis (to assess type I error rates) and the alternative hypothesis (to assess power and type II error rates). The same applies to the design of basket trials. Supplementary File 3 lists which methods have been compared with one another via simulation studies in methodological papers, and we refer readers to these papers for a more comprehensive understanding of the operating characteristics of the different methods in a variety of scenarios.
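A minimal Monte Carlo sketch of such a simulation study is shown below in Python, for an analysis without information borrowing only. The Beta(1, 1) prior, the decision rule (declare a tumor type active if P(ORR > p0 | data) exceeds a threshold), the equal sample size per tumor type, and the default number of simulations are assumptions for illustration; an information borrowing method would replace the per-tumor decision step. Running it with all true rates equal to p0 estimates the tumor-level and trial-level type I error rates, and running it with some true rates above p0 estimates power for those tumor types.

```python
import random
from math import comb


def posterior_prob_exceeds(r, n, p0):
    """P(ORR > p0 | data) under a Beta(1, 1) prior (integer-parameter identity)."""
    m = n + 1
    return sum(comb(m, j) * p0**j * (1 - p0) ** (m - j) for j in range(r + 1))


def simulate(true_rates, n, p0, threshold, n_sims=2000, seed=1):
    """Monte Carlo operating characteristics of an independent analysis.

    Returns the per-tumor rate of declaring activity and the rate of declaring
    any tumor type active. Under the global null these estimate tumor-level
    and trial-level type I error; for active tumor types, per-tumor power.
    """
    rng = random.Random(seed)
    hits = [0] * len(true_rates)
    any_hit = 0
    for _ in range(n_sims):
        declared = False
        for k, p in enumerate(true_rates):
            r = sum(rng.random() < p for _ in range(n))  # simulated responders
            if posterior_prob_exceeds(r, n, p0) > threshold:
                hits[k] += 1
                declared = True
        any_hit += declared
    return [h / n_sims for h in hits], any_hit / n_sims
```

For example, simulate([0.10] * 4, n=20, p0=0.10, threshold=0.95) estimates the error rates under the global null, and the trial-level rate it returns will be noticeably larger than each tumor-level rate.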
We analyzed data from 12 completed basket trials with published ORRs per tumor type (Supplementary File 4). For each trial, we estimated the tumor-specific ORRs and the probability that each exceeds the response rate under the standard of care (SoC) or a prespecified clinically meaningful threshold, using three methods: a Bayesian model without information borrowing (IND), the calibrated Bayesian hierarchical model (CBHM) proposed by Chu et al. (44), and the Bayesian model averaging (BMA) approach of Psioda et al. (51). From the class of Bayesian hierarchical models, we selected the CBHM because of its demonstrated ability to control the tumor-specific type I error rate compared with other variations of the BHM (44), and from the class of model averaging methods we selected the BMA approach because of its demonstrated comparability with the CBHM in terms of statistical performance (51). While we use these two methods to represent their respective classes of information borrowing methods, we do not intend to advocate for one method being uniformly better than another. In fact, the authors believe no such method exists.

Homogeneous basket trials
A common assumption in the analysis of basket trials is a similar response rate across tumor types. While this assumption is often unrealistic, a phase I/II study of the hRS7-SN38 antibody-drug conjugate in patients with epithelial cancers (NCT01631552) serves as an example in which homogeneity in the estimated tumor-specific response rates was observed (Table 2). The tumor types had unequal sample sizes, with the triple-negative breast cancer (TNBC) cohort being the largest (108 participants) and the metastatic urothelial cancer (mUC) cohort the smallest (45 participants). Both the BMA approach and the CBHM are expected to borrow information across tumor types to augment the information available for tumor types with smaller numbers of participants. The estimated ORR for mUC from the BMA approach and the CBHM borrowed information from the TNBC cohort and from the cohort with hormone receptor-positive (HR+) and human epidermal growth factor receptor 2-negative (HER2-) disease. In agreement with the findings of Psioda et al. (51), the BMA approach and the CBHM provided similar results, both in terms of response rates and in terms of the posterior probability in support of treatment efficacy. We observe similar results with the IND approach because of the large sample size in each tumor type; however, we would expect the advantages of the information borrowing methods to become more apparent as sample sizes decrease. In that case, the IND estimates would be subject to greater variability because of limited enrollment in some tumor types, whereas the CBHM and the BMA approach could estimate the tumor-specific ORRs with greater precision by borrowing information across tumor types that demonstrate similar treatment effects.

Basket trials with non-zero response rate in only one tumor type
Analyses of a phase II trial of a cyclin-dependent kinase inhibitor in patients with cancer (NCT01037790) and of a study assessing the safety and efficacy of a second mitochondria-derived activator of caspases (SMAC) mimetic (NCT04122625) are presented in Table 3. The BMA approach produced results consistent with the analysis without information borrowing (IND), an indication that the BMA approach borrowed little or no information across tumor types. However, the impact of small numbers of participants can be seen in the results: the more participants a tumor type had with no responders, the closer its estimated response rate was to zero. The CBHM treated the response rates as homogeneous, particularly in NCT04122625, where the tumor types have the same estimated response rates because information was pooled to compensate for the small number of participants per tumor type. This also means that the variability of the response rate across tumor types was very small, an indication that the homogeneity measure in the CBHM could not discriminate between the tumor types. All methods provided weak evidence in support of treatment efficacy and are likely to lead to the same conclusion.

Acronyms: CRU-GCT (cisplatin-refractory unresectable germ cell tumors), SCLC (small cell lung cancer), H&N SCC (squamous cell carcinoma of the head and neck). Note that the posterior probability is P(ORR > ORR_SoC | Data).

Basket trials with zero response rate in only one tumor type
Table 4 presents the results of (1) a phase Ia/IIa trial investigating the safety, tolerability, and antitumor activity of a monoclonal antibody mixture targeting MET in patients with advanced solid tumor malignancies (NCT02648724); (2) a modular phase II study to link targeted therapy to patients with activated tumors (NCT01885195); and (3) studies of temozolomide in combination with topotecan in refractory and relapsed paediatric solid tumors (NCT00918320). In NCT02648724, the results from the BMA approach and the CBHM were generally consistent with the results from the analysis without information borrowing. In this example, there was no obvious impact of information borrowing because the homogeneous subset of NSCLC MET-amplified and NSCLC METEx14DEL tumors had similar response rates and comparable sample sizes. However, the BMA approach was more conservative than the other methods when estimating the response rates for multiple myeloma and acute myeloid leukaemia in NCT01885195, owing to a large difference in the number of participants per tumor type. The results from the CBHM were not influenced by this large difference in the number of participants, possibly because the homogeneity measure in the CBHM is not affected by small numbers of participants. NCT00918320 presents an interesting case in which the response rates within the homogeneous subgroup were not the same. The BMA approach borrowed information from the neuroblastoma tumor type, which had the largest number of participants, resulting in a smaller posterior probability in support of treatment efficacy for central nervous system tumors than with the CBHM or the analysis without information borrowing. These results illustrate that an information borrowing method can increase or decrease the level of confidence in treatment efficacy, depending on the strength of the treatment effect and the number of participants per tumor type.

Basket trials with heterogeneous response rates
Most basket trials are likely to have heterogeneous response rates, and the degree of information borrowing across tumor types will depend on the level of heterogeneity that is anticipated and/or observed. Figure 4 shows two examples of completed basket trials with heterogeneous response rates. The first is a cross-tumoral phase II clinical trial exploring crizotinib in patients with advanced tumors induced by causal alterations of ALK and/or MET (NCT01524926). The trial had twelve baskets defined by tumor type and ALK/MET alteration, all sharing a common standard-of-care threshold or a common clinically meaningful threshold. Furthermore, the number of participants differs substantially between the baskets. The results show no evidence in support of a tumor-agnostic effect, regardless of the method used, although the IMFT (ALK+) and PRCC1 (MET+) baskets exceeded the targeted response rate of 30% under all methods. The CBHM borrowed little or no information across the baskets because of the high level of heterogeneity between response rates. The amount of information borrowed by the BMA approach depends on the homogeneous subsets. Interestingly, the response rates for IMFT (ALK+) and PRCC1 (MET+) estimated by the BMA approach were smaller than those estimated by both the analysis without information borrowing and the CBHM, owing to the small number of participants in these baskets (see Supplementary Table S5 in Supplementary File 1). Unlike NCT01524926, a phase Ib study of pembrolizumab in participants with select advanced solid tumors (NCT02054806) had heterogeneous response rates with a comparable sample size of around 20 participants per tumor type. The CBHM results aligned closely with those from the analysis without information borrowing, an indication that the CBHM borrowed little or no information given the heterogeneous response rates. Compared with the CBHM, the BMA approach overestimated or underestimated the response rates depending on the subsets of homogeneous tumor types; for example, the response rates for esophageal SCC/ACA, nasopharyngeal carcinoma, and small-cell lung cancer (SCLC) were estimated to be approximately 18% instead of the observed 30%, 26%, and 33%, respectively. The response rates estimated by the analysis without information borrowing lay between those from the BMA approach and the CBHM, so they are hard to distinguish in the plots of Figure 4. In most cases, the posterior probabilities from all the methods would lead to similar conclusions in support of treatment efficacy for all the tumor types. Additional results are presented in Supplementary Table S5 of Supplementary File 1.
Acronyms: NSCLC (non-small cell lung cancer), CNS (central nervous system). Note that the posterior probability is P(ORR > ORR_SoC | Data).
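The contrast drawn above, in which the CBHM borrows little when response rates are heterogeneous, follows from how the CBHM sets its degree of shrinkage. As we understand the method, a homogeneity statistic T (the Pearson chi-square statistic across tumor types) is mapped to the between-tumor variance of the logit response rates through sigma^2 = exp(a + b log T), with a and b calibrated by simulation at the design stage, so that a larger T gives a larger variance and hence less borrowing. The sketch below shows only that mapping; the calibration constants A_HYP and B_HYP and the responder counts are hypothetical, and this is not a full CBHM fit.

```python
import math


def chi_square_homogeneity(responders, sizes):
    """Pearson chi-square statistic T for the 2 x K table of responders and
    non-responders across K tumor types (larger T = more heterogeneity).
    Assumes the pooled response rate is strictly between 0 and 1."""
    pooled = sum(responders) / sum(sizes)
    t_stat = 0.0
    for x, n in zip(responders, sizes):
        for observed, expected in ((x, n * pooled), (n - x, n * (1 - pooled))):
            t_stat += (observed - expected) ** 2 / expected
    return t_stat


def cbhm_between_tumor_variance(t_stat, a, b, floor=1e-8):
    """Map T to the shrinkage parameter sigma^2 = exp(a + b * log(T)).

    `a` and `b` are calibration constants (hypothetical here); the small
    `floor` avoids log(0) when all observed response rates are identical.
    """
    return math.exp(a + b * math.log(max(t_stat, floor)))


# Hypothetical calibration constants and data (10 participants per tumor type).
A_HYP, B_HYP = -1.0, 1.5
homogeneous = chi_square_homogeneity([3, 2, 3], [10, 10, 10])
heterogeneous = chi_square_homogeneity([0, 2, 7], [10, 10, 10])
for label, t in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    print(label, round(t, 3), round(cbhm_between_tumor_variance(t, A_HYP, B_HYP), 3))
```

With b > 0, the heterogeneous counts produce a much larger T and therefore a larger between-tumor variance, which is the mechanism behind the CBHM estimates staying close to the analysis without information borrowing in NCT02054806.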

Missed opportunity?
The current basket trial designs and practices focus on an individual phase of a CDP, particularly phase Ib or phase II. As a core part of developing innovative statistical methodology for study design or data analysis, simulation studies are typically undertaken to compare the performance of the chosen method with other approaches with respect to some operating characteristic (e.g., the chance of a false efficacy conclusion, or the type I error rate at the tumor-type and trial levels). Simon (62) discusses the possibility of conducting an extended phase II or a phase III trial following a basket trial for tumor types that are not rare; however, that work does not systematically investigate the degree to which including a basket trial in a CDP might add value.
To address this gap, we provide a brief illustration of how incorporating a basket trial into a larger CDP might affect the success of the subsequent development program. For ease of exposition, we consider a simple, hypothetical CDP consisting of a single-arm phase Ib basket trial with five tumor types (10 participants per tumor type), followed by a randomized phase II trial for each tumor type in which a signal is detected in the phase Ib basket trial. Each phase II trial assumes a total of 124 participants, with equal allocation of 62 participants per arm (chosen to ensure at least 80% power to detect a risk difference of 20%). Both the phase Ib and phase II trials use objective response as the primary endpoint and investigate the same line of therapy, with a minimum medicinal profile of a 30% response rate against a response rate of 10% for the standard of care (a minimum medicinal profile is a company's projection of the smallest benefit an investigational product must show to achieve both commercial and regulatory success).
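The text does not state how 62 participants per arm was derived. One calculation that reproduces it, assuming a two-sided 5% significance level and the standard normal-approximation formula for comparing two proportions (our assumption, not necessarily the authors' method), is sketched below for a 30% versus 10% response rate.

```python
import math
from statistics import NormalDist


def two_proportion_n_per_arm(p_treat: float, p_control: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for comparing two proportions with the standard
    normal-approximation formula (two-sided alpha, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_treat + p_control) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p_treat * (1 - p_treat) + p_control * (1 - p_control)))
    return math.ceil(root ** 2 / (p_treat - p_control) ** 2)


n_per_arm = two_proportion_n_per_arm(0.30, 0.10)
print(n_per_arm, 2 * n_per_arm)  # per-arm and total phase II sample size
```

Under these assumptions the unrounded value is about 61.6, which rounds up to 62 per arm (124 in total); a continuity correction or a one-sided test would give a different number.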
We evaluated the impact of the basket trial on the subsequent phase II trials using simulation studies, considering six scenarios for the underlying tumor-specific response rates in the phase Ib trial. Specifically, we varied the number of active tumor types (underlying ORR of 30%) relative to inactive tumor types (underlying ORR of 10%) from 0 to 5, where the scenario with five active tumor types is motivated by trial NCT01631552 and the heterogeneous scenarios are motivated by trial NCT01885195. For any tumor type in which a signal is detected in phase Ib, the treatment arm of the subsequent phase II trial is assumed to have the same underlying tumor-specific ORR as in the phase Ib trial under the given scenario, whereas the control arm is assumed to have an ORR of 10%, reflecting the SoC.
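The six scenarios can be written down directly. The sketch below (hypothetical helper names) encodes each scenario as the vector of true ORRs for the five tumor types, with active tumor types listed first for convenience, together with the phase II arm ORRs assumed for a tumor type that progresses.

```python
ACTIVE_ORR = 0.30     # minimum medicinal profile
INACTIVE_ORR = 0.10   # standard of care
N_TUMOR_TYPES = 5


def build_scenarios(n_types: int = N_TUMOR_TYPES) -> dict:
    """Map the number of active tumor types (0..n_types) to the vector of true
    phase Ib ORRs; active tumor types are listed first."""
    return {k: [ACTIVE_ORR] * k + [INACTIVE_ORR] * (n_types - k)
            for k in range(n_types + 1)}


def phase_ii_arms(true_orr: float, control_orr: float = INACTIVE_ORR) -> dict:
    """The phase II treatment arm inherits the tumor-specific phase Ib ORR;
    the control arm reflects the SoC."""
    return {"treatment": true_orr, "control": control_orr}


scenarios = build_scenarios()
for k, orrs in scenarios.items():
    print(f"{k} active:", orrs, [phase_ii_arms(orr) for orr in orrs[:1]])
```

Writing the scenarios this way makes explicit that the phase II trials differ only through the treatment-arm ORR, since the control arm is fixed at the SoC rate in every scenario.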
We simulated 100,000 hypothetical phase Ib and phase II trials under each scenario. For each phase Ib dataset, we applied both an independent (IND) Bayesian analysis for each tumor type without information borrowing and the BMA approach to compute the posterior probability that the response rate in a given tumor type was greater than the response rate under the SoC. For each approach, we declared the treatment successful (i.e., signal detected) for a given tumor type if the associated posterior probability exceeded 80%. In the phase II analysis, the two arms were compared using a standard Bayesian analysis of the difference between two proportions, and the treatment was declared successful in phase II if the posterior probability that the treatment response rate exceeded the control response rate was greater than 90%. After repeating this process for each set of simulated studies, we calculated the phase Ib probability of success (PoS) for each tumor type as the percentage of simulated phase Ib trials in which the treatment was declared successful for that tumor type, and the joint PoS as the percentage of simulated studies in which the treatment was declared successful in both the phase Ib and phase II trials. Additional details of the simulation study design can be found in Supplementary File 1.
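For the IND analysis with 10 participants per tumor type, the phase Ib decision rule reduces to a minimum number of responders, so its PoS can be computed exactly rather than simulated. The sketch below assumes a Beta(1, 1) prior for each tumor type (the prior used in the simulation study is described in Supplementary File 1 and may differ); the function names are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_prob_exceeds(x: int, n: int, p0: float,
                           prior_a: int = 1, prior_b: int = 1) -> float:
    """IND analysis: P(ORR > p0 | x of n), Beta(prior_a, prior_b) prior with integer
    parameters, using P(Beta(a, b) > t) = P(Bin(a + b - 1, t) <= a - 1)."""
    a, b = prior_a + x, prior_b + n - x
    return sum(binom_pmf(i, a + b - 1, p0) for i in range(a))


def min_responders_for_signal(n: int, p0: float = 0.10, threshold: float = 0.80):
    """Smallest responder count whose posterior probability exceeds `threshold`."""
    for x in range(n + 1):
        if posterior_prob_exceeds(x, n, p0) > threshold:
            return x
    return None  # no possible outcome declares a signal


def phase_ib_pos_ind(true_orr: float, n: int = 10, p0: float = 0.10,
                     threshold: float = 0.80) -> float:
    """Exact phase Ib PoS for one tumor type under the IND analysis."""
    x_min = min_responders_for_signal(n, p0, threshold)
    if x_min is None:
        return 0.0
    return sum(binom_pmf(x, n, true_orr) for x in range(x_min, n + 1))


print(min_responders_for_signal(10))     # responders needed out of 10
print(round(phase_ib_pos_ind(0.30), 4))  # active tumor type
print(round(phase_ib_pos_ind(0.10), 4))  # inactive tumor type
```

Under these assumptions, two or more responders out of 10 declare a signal, giving a phase Ib PoS of about 0.85 for an active tumor type and about 0.26 for an inactive one. Because each tumor type is analysed alone, these values do not change across the six scenarios; the BMA probabilities, by contrast, depend on the other tumor types and need simulation.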
Figure 5 presents the simulation results. In each scenario, both the phase Ib PoS and the joint PoS for active tumor types are greater when the BMA approach is used in phase Ib than when the IND approach is used. As the number of active tumor types increases, both measures of PoS under the IND approach remain constant, whereas under the BMA approach they increase because of information borrowing across additional active tumor types, ultimately resulting in a higher probability that an active tumor type successfully passes both the phase Ib and phase II trials. The increase in information borrowing also increases the type I error rate for inactive tumor types; however, the joint PoS for inactive tumor types is only marginally greater when the BMA approach is used in phase Ib than when the IND approach is used. Thus, while information borrowing approaches moderately increase the probability that an inactive tumor type continues into phase II, the probability of it progressing into a more expensive phase III trial remains low. These approaches can also improve precision when estimating tumor-specific ORRs (see Supplementary File 1 for results). By evaluating the impact of a basket trial on multiple phases of a CDP rather than focusing only on the operating characteristics of the basket trial itself, sponsors can better understand the benefits and risks of the basket trial design and the modeling approach under different scenarios, and thus make more informed decisions about progressing tumor types into later, more expensive phase III trials.
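The joint PoS can be decomposed: because phase II data are generated independently of the phase Ib data given a tumor type's true ORR, the joint PoS equals the phase Ib PoS multiplied by the phase II PoS. The sketch below computes both stages for the IND approach under our own assumptions: Beta(1, 1) priors, the 2-of-10 signal rule implied by them, and a normal approximation to the phase II posterior probability P(p_T > p_C | data) in place of exact integration. It is an illustration, not the simulation code behind Figure 5.

```python
import math
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def phase_ib_pos_ind(true_orr, n=10, min_responders=2):
    """IND phase Ib PoS; 2 of 10 responders is the signal rule implied by a
    Beta(1, 1) prior, p0 = 0.10 and an 80% posterior threshold."""
    return sum(binom_pmf(x, n, true_orr) for x in range(min_responders, n + 1))


def prob_treatment_better(x_t, x_c, n, prior_a=1.0, prior_b=1.0):
    """Approximate P(p_T > p_C | data): normal approximation to the difference
    of two independent Beta posteriors (a simplification)."""
    moments = []
    for x in (x_t, x_c):
        a, b = prior_a + x, prior_b + n - x
        moments.append((a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1))))
    (m_t, v_t), (m_c, v_c) = moments
    return STD_NORMAL.cdf((m_t - m_c) / math.sqrt(v_t + v_c))


def phase_ii_pos(orr_t, orr_c=0.10, n_per_arm=62, threshold=0.90):
    """P(phase II declares success), summing over every possible outcome."""
    pos = 0.0
    for x_t in range(n_per_arm + 1):
        w_t = binom_pmf(x_t, n_per_arm, orr_t)
        for x_c in range(n_per_arm + 1):
            if prob_treatment_better(x_t, x_c, n_per_arm) > threshold:
                pos += w_t * binom_pmf(x_c, n_per_arm, orr_c)
    return pos


for label, orr in (("active", 0.30), ("inactive", 0.10)):
    pos_ib, pos_ii = phase_ib_pos_ind(orr), phase_ii_pos(orr)
    # Phase II outcomes are independent of phase Ib outcomes given the true
    # ORR, so the joint PoS is the product of the two stage-wise PoS values.
    print(label, round(pos_ib, 3), round(pos_ii, 3), round(pos_ib * pos_ii, 3))
```

Under these assumptions, the phase II rule rarely passes an inactive tumor type, so the joint PoS for inactive tumor types stays small even though roughly a quarter of them pass phase Ib; this mirrors the observation that a moderately inflated phase Ib type I error rate translates into a low chance of reaching phase III.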
To further highlight the advantages of assessing multiple phases within a CDP, we illustrate in Figure 6 how the choice of modeling approach in phase Ib and the cost of the CDP (measured by the phase II sample size) influence the joint PoS. In each scenario, the joint PoS under the IND approach quickly plateaus despite increases in the planned phase II sample size per arm, remaining well below the joint PoS obtained using the BMA approach in phase Ib with a phase II sample size of 62 participants per arm (denoted in Figure 6 by a horizontal dashed line). While the observed differences in joint PoS are driven primarily by the phase Ib PoS under each approach, these results show that, for these specific scenarios, further investment in phase II following the IND approach in phase Ib will not make up the difference in joint PoS between the two basket trial modeling approaches. If we were instead to increase the sample size of each tumor type in phase Ib (e.g., to 20 per tumor type), the joint PoS trajectory under the IND approach might eventually reach the joint PoS obtained with the BMA approach and 62 participants per arm in phase II, although a much larger phase II sample size may be required following the IND approach to obtain an equivalent joint PoS (see the additional simulation results in Supplementary File 1 for phase Ib sample sizes of 20 per tumor type). This highlights that the impact of a basket trial design with information borrowing does not stop at the end of phase Ib; it has a cascade effect on the joint probability of success and on cost in phase II. This holistic view of the clinical development plan, and of the impact of early-phase decisions on later, more expensive phases, motivates the use of information borrowing and basket trial designs in early-phase oncology, where companies rarely invest in moderate to large numbers of participants.
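The plateau has a simple explanation: the joint PoS equals the phase Ib PoS times the phase II PoS, so no phase II sample size can push it above the phase Ib PoS. The sketch below makes this explicit for the IND approach. Its assumptions are ours rather than the simulation study's: Beta(1, 1) priors, the signal rules implied by them (2 of 10 or 3 of 20 responders), and a frequentist normal approximation to phase II power, treating the 90% posterior threshold as roughly a one-sided 10% test.

```python
import math
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def phase_ib_pos_ind(true_orr, n, min_responders):
    """Exact IND phase Ib PoS for a given signal rule (min responders out of n)."""
    return sum(binom_pmf(x, n, true_orr) for x in range(min_responders, n + 1))


def approx_phase_ii_power(n_per_arm, p_t=0.30, p_c=0.10, alpha_one_sided=0.10):
    """Normal-approximation power of a one-sided two-proportion comparison."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha_one_sided)
    p_bar = (p_t + p_c) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar))
    se_alt = math.sqrt(p_t * (1 - p_t) + p_c * (1 - p_c))
    return STD_NORMAL.cdf(((p_t - p_c) * math.sqrt(n_per_arm) - z_alpha * se_null) / se_alt)


# Signal rules implied by a Beta(1, 1) prior, p0 = 0.10 and an 80% threshold:
# 2 of 10 or 3 of 20 responders (our derivation, not stated in the text).
designs = {"IND, 10 per tumor type": (10, 2), "IND, 20 per tumor type": (20, 3)}
for label, (n_ib, rule) in designs.items():
    ceiling = phase_ib_pos_ind(0.30, n_ib, rule)
    joint = [ceiling * approx_phase_ii_power(n) for n in (31, 62, 124, 248)]
    print(label, "ceiling", round(ceiling, 3), [round(j, 3) for j in joint])
```

With 10 participants per tumor type, the IND joint PoS for an active tumor type cannot exceed about 0.85 however large phase II becomes, whereas 20 per tumor type raises that ceiling to about 0.96. This mirrors the point that enlarging phase Ib, rather than phase II, is what could let the IND approach catch up.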

Discussion
Our systematic review of basket trials in practice shows increasing use of basket trial designs in oncology, consistent with the trend reported by other recent reviews (11)(12)(13). Current practice is largely in the design of single-arm phase II trials, although basket designs are also becoming more common in phase Ib trials. The increasing adoption of basket trial designs is a positive shift for phase Ib and phase II trials, which have historically involved small numbers of participants. Most basket trials in practice use ORR or another binary endpoint, and few use innovative information borrowing methods. This represents a significant gap in basket trial design in oncology, given the lack of sensitivity of ORR and its potential disconnect from overall survival (63). An important area for research in basket trial design and analysis is the extension of innovative information borrowing methods beyond single-arm phase Ib and phase II trials, including methods that address the challenges associated with longer-term endpoints, such as large numbers of participants, data maturity, treatment switching, and other intercurrent events.
Our review of basket trials in practice and of proposed statistical methodology has limitations common to systematic reviews. When searching for abstracts relevant to basket trials, only abstracts that met our defined search inclusion criteria were deemed eligible, so some relevant abstracts may not have been identified. For example, abstracts that discuss the results of a basket trial without using the phrase "basket trial" or "basket clinical trial" were not returned by the search; hence, the number of basket trials identified in our review is likely underestimated. Additionally, trials and methodological papers were considered only if they met the respective inclusion criteria for each review as defined by the team. While these criteria were defined to allow objective decisions about the inclusion of trials and methods, they may differ from the criteria used in other systematic reviews of either the practice of basket trials or the statistical methodology. Because we included only studies that investigated a single treatment regimen across multiple tumor types or biomarker-defined subgroups within the same tumor, studies combining features of basket and umbrella trials were excluded from our review. We also noticed inconsistencies in the reporting of basket trial results, specifically investigators selectively publishing results for the most promising tumor types. Results for all tumor types in basket trials should be posted directly on ClinicalTrials.gov to reduce reporting bias and increase the transparency of trial results.

FIGURE 6
Planned phase II sample size per arm versus the joint PoS for all six scenarios, in which the number of active tumor types (ORR of 30%) ranges between 0 and 5. The black horizontal dashed line for the first five scenarios corresponds to the joint PoS obtained for active baskets when using the BMA approach in phase Ib and a planned phase II sample size of 62 per arm (i.e., the sample size chosen to ensure at least 80% power to detect a risk difference of 20% in a phase II trial).
Several information borrowing methods assume that tumor types in a basket trial will read out at around the same time and that data from all tumor types will be available for pooled analysis or information borrowing at both the interim and final analyses. However, this may not always be the case, owing to factors such as differences in the prevalence of each tumor type or varying recruitment rates. Although tumor types in a basket trial are not required to have the same number of participants, it is important to assess the feasibility of recruitment at the design stage. Where a basket trial design with information borrowing may delay the clinical development plan, the cost of the delay (for both patients and the company) may outweigh the benefit of information borrowing. Note that the BMA approach is adaptable, allowing sponsors to impose a tumor-specific minimum number of participants and to determine at each interim look whether enough data have been collected from a tumor type to assess futility or efficacy.
A comparative analysis of basket trial methods using data from completed trials indicates that neither completely homogeneous nor completely heterogeneous response rates are a practical assumption for the design of a basket trial in oncology. We recommend the use of innovative information borrowing methods for both the design and analysis of a basket trial to minimize the risk of false positive or false negative conclusions, particularly when small and/or unequal numbers of participants per tumor type are anticipated. We also observed a disconnect between the biological expectation of a tumor-agnostic effect and the heterogeneous response rates observed between tumor types with a common molecular alteration or biomarker. Most current methods focus on a single endpoint with no covariate adjustment to reduce the heterogeneity in the outcome data. This approach ignores the different sources of variability, which include tumor-to-tumor heterogeneity, tumor-site interactions, and clinical heterogeneity between patients. Future research that adjusts for known sources of variability using non-targeted biomarkers and other prognostic factors may help to explain what currently appears as heterogeneity between tumor types in basket trials. Our empirical comparison of the methods is specific to the setting of each trial; in practice, innovative methods should be tailored to each trial. Simulation provides a flexible framework for comparing the performance of different methods, and existing papers on basket trial methods often report a simulation study comparing their method with other selected methods.
Moreover, the current practice of basket trial design, which does not consider its impact on the CDP, may lead to a missed opportunity to improve the probability of success of a future trial. As highlighted by our simulation study, gating phase II with a phase Ib basket trial can reduce the size of phase II trials, and losses in the joint PoS for phase Ib and phase II that result from not using an advanced inference method in phase Ib may not be recoverable by running a larger phase II trial. Simulation studies should be used to investigate the impact of basket trial designs, and of other study design methods used in an earlier clinical phase, on future confirmatory phase III trials. Among other things, future research should quantify the degree to which a basket trial de-risks a future trial and how this degree of de-risking differs between information borrowing methods and independent analyses. It is also important to extend the current framework of basket trials to CDPs in which the basket trial and future confirmatory trials have different endpoints (e.g., binary versus time-to-event).
Park et al. (16) presented an overview of precision oncology basket and umbrella trials for clinicians, illustrating both designs with examples. Meyer et al. (17) reviewed the evolution of master protocol clinical trial designs, reporting that most basket trials had a binary primary endpoint, no control group, and analyses that used frequentist methods. A more recent systematic review by Haslam et al. (12) focused on basket trials in oncology with published results and characterized the correlation between the size of a basket and the incidence of the respective tumor. Another recent review by Haslam et al. (

FIGURE 3
An illustrative example of all possible ways of classifying three tumor types into subgroups, where each circle represents a unique classification and colors correspond to different subgroups within a classification. Under the BMA approach of Psioda et al. (51), each classification corresponds to a unique model in which tumor types within a subset are assumed to have equivalent response rates that differ from those of tumor types in other subsets. With three tumor types, one model assumes all three tumor types have the same response rate, three models assume that two tumor types share a response rate distinct from that of the third, and one model assumes all three tumor types have distinct response rates.
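Under the BMA approach, the model space is the set of all partitions of the tumor types, which grows quickly (5 models for three tumor types, 52 for five). The sketch below enumerates the partitions and performs a simplified model average in the spirit of Psioda et al. (51). It assumes equal prior probabilities for all models and a Beta(1, 1) prior on each shared response rate; the published method uses its own prior specification, so this illustrates the averaging step only and is not a reimplementation. The data are hypothetical.

```python
from math import comb, exp, lgamma


def set_partitions(items):
    """Yield every partition of `items` into non-empty subsets (one BMA model each)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):  # add `first` to an existing subset
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition      # or give `first` its own subset


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_sf_int(t, a, b):
    """P(Beta(a, b) > t) for integer a, b >= 1, via a binomial CDF identity."""
    m = a + b - 1
    return sum(comb(m, i) * t**i * (1 - t) ** (m - i) for i in range(a))


def bma_posterior_probs(x, n, p0, a0=1, b0=1):
    """Simplified BMA: equal prior weight on every partition and a shared
    Beta(a0, b0) response rate within each subset. Returns P(ORR_j > p0 | data)
    for each tumor type and the posterior model weights."""
    models = list(set_partitions(list(range(len(x)))))

    def pooled(subset):
        successes = sum(x[j] for j in subset)
        return successes, sum(n[j] for j in subset) - successes

    # Log marginal likelihood of each model; binomial coefficients are identical
    # across models and cancel when the weights are normalised.
    log_ml = [sum(log_beta(a0 + s, b0 + f) - log_beta(a0, b0)
                  for s, f in map(pooled, m)) for m in models]
    top = max(log_ml)
    raw = [exp(v - top) for v in log_ml]
    total = sum(raw)
    weights = [r / total for r in raw]
    probs = [0.0] * len(x)
    for w, model in zip(weights, models):
        for subset in model:
            s, f = pooled(subset)
            tail = beta_sf_int(p0, a0 + s, b0 + f)
            for j in subset:
                probs[j] += w * tail
    return probs, weights


# Hypothetical data: a small tumor type next to two larger ones.
probs, weights = bma_posterior_probs(x=[1, 6, 6], n=[5, 20, 20], p0=0.10)
print(len(weights), [round(p, 3) for p in probs])
```

In this hypothetical example, the small tumor type (1 responder of 5) borrows from the larger ones, and its posterior probability of exceeding 10% rises above the value of about 0.886 it would have in an independent analysis under the same prior. The same mechanism can pull a probability down when a small tumor type is grouped with a larger, less responsive one, as described for the CNS tumors in NCT00918320.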

FIGURE 4
Examples of trials with heterogeneous response rates: NCT01524926 (left panel) and NCT02054806 (right panel). The red dotted line denotes the response rate under standard of care or a clinically meaningful threshold. The solid blue line represents the targeted response rate to justify further development of the investigational product.

FIGURE 5
Phase Ib PoS and joint PoS for active tumor types (left panel, ORR of 30%) and inactive tumor types (right panel, ORR of 10%) across scenarios that differ in the number of active tumor types.

TABLE 2
Example of a basket trial with homogeneous response rates across tumor types (ClinicalTrials.gov ID: NCT01631552). Acronyms: TNBC (triple negative breast cancer), HR+ (hormone receptor positive), HER2- (human epidermal growth factor receptor 2 negative), mBC (metastatic breast cancer), mUC (metastatic urothelial cancer). Note that the posterior probability is P(ORR > ORR_SoC | Data).
*Trial criteria values not reported in trial publications; values instead assumed by study authors for the comparative analysis.

TABLE 3
Examples of basket trials in which all but one tumor type have zero response rates.
*Trial criteria values not reported in trial publications; values instead assumed by study authors for the comparative analysis.

TABLE 4
Basket trials in which only one tumor type has a zero or close-to-zero response rate.
*Trial criteria values not reported in trial publications; values instead assumed by study authors for the comparative analysis.