TMBcat: A multi-endpoint p-value criterion on different discrepancy metrics for superiorly inferring tumor mutation burden thresholds

Tumor mutation burden (TMB) is a widely recognized stratification biomarker for predicting the efficacy of immunotherapy; however, the number and universal definition of the categorizing thresholds remain debatable due to the multifaceted nature of efficacy and the imprecision of TMB measurements. From the perspective of differentiating comprehensive therapeutic advantages, we propose a minimal joint p-value criterion, termed TMBcat, which optimizes TMB categorization across distinct cancer cohorts and surpasses known benchmarks. The statistical framework applies to multidimensional endpoints and is fault-tolerant to TMB measurement errors. To explore the association between TMB and various immunotherapy outcomes, we performed a retrospective analysis of 78 patients with non-small cell lung cancer and 64 patients with nasopharyngeal carcinoma who underwent anti-PD-(L)1 therapy. The stratification results of TMBcat confirm that the relationship between TMB and immunotherapy outcome is non-linear, i.e., treatment gains do not inherently increase with higher TMB, and the pattern varies across carcinomas. Thus, multiple TMB classification thresholds can distinguish patient prognoses flexibly. These findings were further validated in an assembled cohort of 943 patients drawn from 11 published studies. In conclusion, our work presents a general criterion and an accessible software package; together, they enable optimal TMB subgrouping. Our study has the potential to yield innovative insights into therapeutic selection and treatment strategies for patients.


Model specification
Assume the observed data were collected from n patients, and suppose there are K unique response states for the ORR endpoint. Let R_i denote the response for patient i. Given the covariates X_i, the conditional probability for R_i is defined as

P(R_i = k | X_i) = exp{g_k(X_i)} / Σ_{l=0}^{K−1} exp{g_l(X_i)}, k = 0, 1, ..., K−1,

where g_k(·) denotes the linear predictor function for response k and g_0(X_i) = 0, so that R_i = 0 is set to be the anchoring group. Due to the limited number of clinically available CR patients in practice, this paper solely considers the case of K = 3, i.e., R_i = 2, 1, 0 denoting CR&PR, SD and PD, respectively.
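As a concrete illustration, the K = 3 response probabilities can be computed as below. This is a minimal sketch in which the linear predictors of the two non-reference states are assumed to be given; the function name is illustrative, not part of the TMBcat software.

```python
import numpy as np

def multinomial_probs(eta1, eta2):
    """Three-state response probabilities with R_i = 0 as the anchoring group.

    eta1, eta2 are the linear predictors of the two non-reference states."""
    denom = 1.0 + np.exp(eta1) + np.exp(eta2)
    pi1 = np.exp(eta1) / denom   # P(R_i = 1 | X_i)
    pi2 = np.exp(eta2) / denom   # P(R_i = 2 | X_i)
    pi0 = 1.0 - pi1 - pi2        # anchoring group, P(R_i = 0 | X_i)
    return pi0, pi1, pi2
```

Setting both predictors to zero returns equal probabilities of 1/3, as the anchoring constraint implies.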
Let T*_i denote the underlying time to event for patient i. Under the usual right-censoring mechanism, we observe T_i = min(T*_i, C_i), where C_i denotes the censoring time. Meanwhile, we define the event indicator as δ_i = I(T*_i ≤ C_i), where I(·) is the indicator function that takes the value 1 if the event is observed and 0 otherwise. Given the covariates X_i, the hazard function for the ith patient at time t is defined as

λ_i(t | X_i) = λ_0(t) exp{β_3^⊤ X_i},

where λ_0(t) is the baseline hazard function. The observations consist of D = {R_i, T_i, δ_i, X_i}, i = 1, ..., n. Random effects are employed to model the links between the various clinical outcomes. Specifically, we follow the Generalized Linear Mixed Model (GLMM) framework to introduce random effects into the multinomial logistic regression model and the proportional hazards function:
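The right-censoring mechanism can be sketched numerically as follows; this is a toy illustration, with the exponential event times and the uniform censoring window borrowed from the simulation settings later in this supplement.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
t_star = rng.exponential(scale=500.0, size=n)  # latent event times T*_i
c = rng.uniform(50.0, 500.0, size=n)           # censoring times C_i

t_obs = np.minimum(t_star, c)                  # observed times T_i = min(T*_i, C_i)
delta = (t_star <= c).astype(int)              # event indicators delta_i = I(T*_i <= C_i)
```

When delta_i = 1 the observed time equals the true event time; otherwise only the censoring time is seen.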
P(R_i = k | x_i, U_1i, U_2i) = exp{g_k(x_i) + U_ki} / (1 + Σ_{l=1}^{2} exp{g_l(x_i) + U_li}), k = 1, 2,
λ_i(t | x_i, V_i) = λ_0(t) exp{β_3^⊤ x_i + V_i},

where g_0(x_i) = 0 denotes the reference group, x_i denotes the vector of covariates, β_k = (β_Rk, γ_k), k = 1 or 2, denotes the corresponding logistic regression coefficients for response k, and β_3 denotes the risk-factor effects on the hazard function. A positive coefficient in β_k implies a higher probability of response k associated with the corresponding covariate, while a positive coefficient in β_3 implies a higher risk of failure associated with the risk factor. U_1i, U_2i and V_i, i = 1, ..., n, denote the random effects of cluster i on the respective endpoints, which are assumed to follow the multivariate normal distribution N(0, Σ), where Σ is the 3 × 3 covariance matrix with diagonal elements θ_1, θ_2, θ_3 and pairwise correlations ρ_1, ρ_2, ρ_3. Write u_1 = (U_11, ..., U_1n)^⊤, u_2 = (U_21, ..., U_2n)^⊤, v = (V_1, ..., V_n)^⊤, and a = (u_1^⊤, u_2^⊤, v^⊤)^⊤. Then the vector a follows a normal distribution with mean zero and covariance matrix Ω = Σ ⊗ I_n, where I_n is the n × n identity matrix and ⊗ denotes the Kronecker product of two matrices. We further define the linear predictors as

η_1i = g_1(x_i) + U_1i, η_2i = g_2(x_i) + U_2i, η_3i = β_3^⊤ x_i + V_i.

Following the GLMM formulation, the random effects are treated as conditionally fixed, and the best linear unbiased prediction (BLUP) type log-likelihood can be written as the sum of the following three terms: the multinomial logistic log-likelihood, the survival log-likelihood, and the log-density of the random effects, where λ_0(t) is the baseline hazard function and Λ_0(t) = ∫_0^t λ_0(s) ds denotes the corresponding cumulative baseline hazard. To simplify the estimation procedure, the baseline hazard can be eliminated in the conditional partial likelihood function ℓ_S. Following the marginal likelihood argument, we arrange the distinct failure times in increasing order t_1 < t_2 < ⋯ < t_n, and the survival log-likelihood with the random effects held conditionally fixed, ℓ_S, can be rewritten as

ℓ_S = Σ_{i=1}^{n} δ_i [η_3i − log Σ_{j ∈ R(t_i)} exp(η_3j)],

where R(t) = {j : T_j ≥ t} is the risk set at t−.
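The risk-set form of the survival term can be sketched as below. This is a generic Cox partial log-likelihood without tie handling, with the linear predictors assumed to be given and the function name chosen for illustration.

```python
import numpy as np

def cox_partial_loglik(t_obs, delta, eta3):
    """Partial log-likelihood with the baseline hazard eliminated:
    sum over observed events of eta3_i - log(sum_{j in R(t_i)} exp(eta3_j))."""
    ll = 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = t_obs >= t_obs[i]                       # risk set R(t_i)
        ll += eta3[i] - np.log(np.exp(eta3[at_risk]).sum())
    return ll
```

With three distinct uncensored times and zero predictors, the risk sets shrink from three patients to one, so the value is −(log 3 + log 2).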
So the BLUP-type log-likelihood is formulated as the sum of the three terms above. Remark: when the response degrades to a dichotomous indicator (clinically, the tumor status is frequently reduced to response versus non-response to simplify the study), the multinomial logistic regression in the proposed model degenerates to a binary logistic regression.

Estimation procedure
To estimate the regression coefficient vectors β_1, β_2, the risk effect β_3, the conditionally fixed random effects u_1, u_2, v, and the variance component parameters (θ_1, θ_2, θ_3, ρ_1, ρ_2, ρ_3), the estimation can be performed under the GLMM framework. Using a GLMM avoids the intractable high-dimensional integration involved in marginal likelihood methods and the time-consuming Monte Carlo approximations in the E-step of maximum likelihood estimation. The original work of McGilchrist extended the BLUP method for linear mixed models (LMMs) to the GLMM setting (with additional random-effect terms in the linear predictor) through residual maximum likelihood (REML) estimation, which involves an iterative numerical procedure: [i] For given values of θ_1, θ_2, θ_3 and ρ_1, ρ_2, ρ_3, the estimates of β_1, β_2, β_3, u_1, u_2 and v are obtained by maximizing the BLUP log-likelihood in eq(S5) via Newton-Raphson iteration.
[ii] Given the estimates from step [i], update the variance component parameters θ_1, θ_2, θ_3 and ρ_1, ρ_2, ρ_3 using the simplified REML estimators; their standard errors need only be computed once after convergence.
[iii] Repeat steps [i] and [ii] until the convergence criteria are fulfilled.
As the above procedure involves manipulations of large matrices, the computational complexity and cost of the proposed method are high. For example, the update based on eq(S6) in step [i] occurs in every iteration and involves computing the matrices I and I^{-1}, and calculating the standard errors in step [ii] is likewise expensive, although it is needed only once after the simplified REML estimators have been obtained. Furthermore, predictions of the endpoints can be obtained using the empirical Bayes (EB) method to estimate the point estimates and variances of the random effects.
Following the developments in McGilchrist (1993), a simplified version of the above REML estimators for the variance component parameters and their asymptotic standard errors can be obtained.
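To illustrate the Newton-Raphson update in step [i], the sketch below fits the degenerate binary-logistic case mentioned in the Remark, with fixed effects only and no random-effect penalty. All names are illustrative and this is not the actual TMBcat implementation.

```python
import numpy as np

def newton_logistic(X, y, n_iter=25):
    """Newton-Raphson for binary logistic regression.

    Each step solves I(beta) * step = score(beta), i.e. beta <- beta + I^{-1} score."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # fitted probabilities
        score = X.T @ (y - p)                         # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information matrix I
        beta = beta + np.linalg.solve(info, score)
    return beta

# Simulated check: recover a known coefficient vector.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
true_beta = np.array([0.5, -1.0])
y = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat = newton_logistic(X, y)
```

In the full procedure the same update is applied to the penalized log-likelihood, so the information matrix additionally contains the Ω^{-1} contribution of the random effects.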

Measurement errors
To reduce the bias caused by measurement errors and obtain a more robust TMB threshold, we integrated the widely applicable corrected-score approach with the joint model, yielding approximately consistent estimators based on the observed data. A corrected score is a function of the observed, error-prone covariate that is conditionally unbiased for the true-data score function, by the property of conditional expectation. Corrected scores thus provide an approach to reducing the bias incurred by covariate measurement error.
Based on the corrected score, the complex variate TMB*_{j,i} = TMB*_i + √−1 ξ_{j,i} is introduced, where ξ_{j,i} is a normal random variable with zero mean and variance σ_e². Since the logistic function does not satisfy the smoothness condition required by corrected-score theory, it admits no exact corrected score, but an approximation can be obtained by means of complex-variable extrapolation, averaging the real part of the score evaluated at the complex variates over j = 1, ..., J Monte Carlo draws. In this way, the approximately consistent estimators θ̃, ũ_i and ṽ_i can be derived.
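The complex-variable device can be illustrated on a toy target where the correction is exact: for W = X + e with e ~ N(0, σ_e²), averaging Re{exp(W + √−1 ξ_j)} over draws ξ_j ~ N(0, σ_e²) removes the multiplicative bias exp(σ_e²/2) of the naive plug-in exp(W). This is a self-contained sketch of the mechanism, not the model's actual score function.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_e = 1.0                   # standard deviation of the measurement error
n, J = 50000, 10                # sample size and Monte Carlo draws

x = rng.normal(0.0, 1.0, size=n)                 # true covariate (e.g. log-TMB)
w = x + rng.normal(0.0, sigma_e, size=n)         # error-prone observation W

# Naive plug-in: biased upward, estimates exp(sigma_e^2 / 2) * E[exp(X)].
naive = np.exp(w).mean()

# Complex-variate correction: average Re{exp(W + i * xi_j)} over J draws.
xi = rng.normal(0.0, sigma_e, size=(J, n))
corrected = np.real(np.exp(w + 1j * xi)).mean()

true_val = np.exp(0.5)                           # E[exp(X)] for X ~ N(0, 1)
```

Conditionally on X, E[Re exp(W + iξ)] = exp(X) exactly, which is precisely the conditional-unbiasedness property required of a corrected score.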

Simulation study without considering TMB measurement errors
To assess the performance of the proposed estimation procedure, we conducted a series of simulation studies whose primary objective was to evaluate the fixed-effect estimates and the variances of the random effects. Data are simulated in an oncology-trial context, with random effects correlated among each cluster's multiple endpoints. In the simulations, we assume 200 patients, i.e., i = 1, ..., 200. For each cluster i, we generate the random effects U_1i, U_2i and V_i from a multivariate normal distribution with zero mean, θ_1 = θ_2 = θ_3 = 1.0, and general correlations ρ_1, ρ_2, ρ_3. We consider three distinct tumor response states: CR&PR (k = 0, set as the reference), SD (k = 1) and PD (k = 2). The response data are generated from the multinomial logistic probabilities π_0 = 1 − (π_1 + π_2), π_1 = exp(η_1) / (1 + exp(η_1) + exp(η_2)), π_2 = exp(η_2) / (1 + exp(η_1) + exp(η_2)). The event time for each patient is generated from the probability density function λ_0(t) exp(η_3i) S_0(t)^{exp(η_3i)}, where the baseline event-time distribution is exponential with mean 500 (i.e., constant baseline hazard λ_0(t)), and S_0(t) = exp(−∫_0^t λ_0(s) ds) is the baseline survival function. The censoring time C is generated from the uniform distribution U(50, 500).
Furthermore, we set β_R1 = 1.6, γ_1 = −0.2, β_R2 = 1.0, γ_2 = −0.8, β_T = −1.2, and θ_1 = θ_2 = θ_3 = 1.0. Each correlation ρ_m (m = 1, 2, 3) is chosen from −0.9, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8 and 0.9 in order to evaluate the performance of the estimators at different correlation levels. In the negative-correlation scenarios, only two of the three values of ρ_m can be set negative, given the positive-definiteness constraint on the covariance matrix of the multivariate Gaussian distribution. With the specified parameters, 1000 replications are simulated in each setting. Results of the simulations are presented in Table S1. We report the fitted value, average bias, SE and SD for each parameter, where SE is the average of the estimated standard errors and SD is the empirical standard deviation of the estimates over the 1000 simulations.
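One replicate of this simulation can be sketched as follows, under stated assumptions: β_Rk are treated as response-specific intercepts, γ_k and β_T as coefficients of a single standard-normal covariate, and the pairwise correlations are fixed at an illustrative 0.2, since the text does not fully pin down the covariate design.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Correlated random effects (U_1i, U_2i, V_i): unit variances theta_m = 1.0
# and illustrative pairwise correlations rho_m = 0.2 (assumption).
Sigma = np.array([[1.0, 0.2, 0.2],
                  [0.2, 1.0, 0.2],
                  [0.2, 0.2, 1.0]])
U1, U2, V = rng.multivariate_normal(np.zeros(3), Sigma, size=n).T

x = rng.normal(size=n)                 # covariate (assumed design)
eta1 = 1.6 - 0.2 * x + U1              # beta_R1 + gamma_1 * x + U_1i
eta2 = 1.0 - 0.8 * x + U2              # beta_R2 + gamma_2 * x + U_2i
eta3 = -1.2 * x + V                    # beta_T * x + V_i

# Multinomial response via the logistic probabilities pi_0, pi_1, pi_2.
denom = 1.0 + np.exp(eta1) + np.exp(eta2)
probs = np.column_stack([1.0 / denom, np.exp(eta1) / denom, np.exp(eta2) / denom])
response = np.array([rng.choice(3, p=p) for p in probs])

# Exponential baseline with mean 500; under proportional hazards the event
# time is exponential with scale 500 / exp(eta3) (inverse-transform sampling).
t_star = rng.exponential(scale=500.0 / np.exp(eta3))
c = rng.uniform(50.0, 500.0, size=n)
t_obs, delta = np.minimum(t_star, c), (t_star <= c).astype(int)
```

Repeating this generator 1000 times and refitting the model in each replicate yields the bias and SD summaries of the kind reported in Table S1.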
The results in Table S1 indicate that the regression parameter estimates in the three model components perform fairly well under the various multivariate correlation settings, given that the hazard is estimated semi-parametrically. The biases in estimating the variance component parameters θ_1, θ_2, θ_3, ρ_1, ρ_2, ρ_3 are likewise relatively small. In general, the results support the applicability of the proposed multi-endpoint joint model with correlated random effects. Comparing SE and SD, the accuracy of the reported standard errors is generally acceptable, though slightly underestimated when ρ_m approaches the limiting values ±0.9. In particular, the sample standard deviations SD of the 1000 correlation estimates are small due to the limited one-way volatility of correlation estimates at the boundaries (i.e., the estimates can only be larger than −1 or smaller than 1), whereas the standard-error estimation method still assumes two-way variation. As a result, when ρ_m is close to the limits, bootstrap estimates of the standard error of the correlation parameter could be employed.
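Such a bootstrap standard error for a correlation estimate near the boundary can be sketched with generic resampling code (not from the paper's software; names are illustrative):

```python
import numpy as np

def bootstrap_corr_se(pairs, n_boot=500, seed=0):
    """Bootstrap standard error of the sample correlation of an (n, 2) array."""
    rng = np.random.default_rng(seed)
    n = pairs.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        sample = pairs[idx]
        reps[b] = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]
    return reps.std(ddof=1)

# Example: strongly correlated pairs (rho = 0.9), where the usual symmetric
# standard error can misbehave near the boundary.
rng = np.random.default_rng(4)
data = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=200)
se = bootstrap_corr_se(data)
```

Because resampled correlations cannot exceed the ±1 boundary, the bootstrap distribution directly reflects the one-sided volatility described above.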

Simulation study considering TMB measurement errors
To assess the performance of the proposed joint model with the corrected-score function, we also conducted a series of simulation studies under TMB measurement errors. The simulation setting is similar to that of Section 2.1, with β_Rz = −1.8, β_Rm = 0.3, β_Tz = 2.2, β_Tm = −0.4. The variance of the error term is set to 0.5, 0.75 and 1.0, respectively, to evaluate the performance of the estimators at different measurement error levels. With the specified parameters, the true-data estimator, the naive estimator and the corrected-score estimator with Monte Carlo approximation J = 10 were computed under the joint model over 1000 replications. For comparison, the true-data estimator and the naive estimator were also computed under standard separate regression models. Results of the simulations are presented in Table S2. We report the fitted value, average bias, SD and SE for each parameter, where SD is the empirical standard deviation of the estimates over the 1000 simulations and SE is the average of the estimated standard errors.
According to Table S2, the regression parameter estimates for the two model components perform reasonably well across a variety of measurement error conditions. In the absence of measurement errors, the joint model outperforms ordinary regression models in estimating the regression coefficients because it more precisely reflects the potential connections among the endpoints. At the different levels of measurement error, the corrected-score estimator performed significantly better than the naive estimator and only marginally worse than the true-data estimator. Clearly, the performance of the naive estimator deteriorates with increasing error magnitude, which further indicates that measurement error introduces a substantial bias in the parameter estimates. Overall, the simulation results support the proposed joint multi-endpoint model, the iterative numerical estimation procedure, and the applicability of the correlated random effects. Additionally, comparing SE and SD, the precision of the reported standard errors is generally satisfactory. The biases of the joint estimates relative to the standard separate regressions emphasize that the dependence among clinical endpoints can be an important and non-negligible confounder in analyzing the factors determining the treatment effect.