PET-based radiomic feature based on the cross-combination method for predicting the mid-term efficacy and prognosis in high-risk diffuse large B-cell lymphoma patients

Objectives This study aimed to develop a 7×7 machine-learning cross-combination method for selecting and classifying radiomic features, and to use it to construct a Radiomics Score (RadScore) for predicting mid-term efficacy and prognosis in high-risk patients with diffuse large B-cell lymphoma (DLBCL). Methods We retrospectively recruited 177 high-risk DLBCL patients from two medical centers between October 2012 and September 2022 and randomly divided them into a training cohort (n=123) and a validation cohort (n=54). We extracted 110 radiomic features, along with SUVmax, MTV, and TLG, from baseline PET. The 49 feature selection-classification pairs were compared to identify the optimal LASSO-LASSO model, which yielded 11 key radiomic features for the RadScore. Logistic regression was employed to identify independent predictors among RadScore, clinical and PET factors. The models were evaluated using receiver operating characteristic (ROC) curves and calibration curves. Decision curve analysis (DCA) was conducted to assess the net clinical benefit of the models. The prognostic power of RadScore was assessed using Cox regression (COX) and Kaplan-Meier (KM) plots. Results A total of 177 patients (mean age, 63 ± 13 years; 129 men) were evaluated. Multivariate analyses showed that gender (OR, 2.760; 95% CI: 1.196, 6.368; p=0.017), B symptoms (OR, 4.065; 95% CI: 1.837, 8.955; p=0.001), SUVmax (OR, 2.619; 95% CI: 1.107, 6.194; p=0.028), and RadScore (OR, 7.167; 95% CI: 2.815, 18.248; p<0.001) were independent risk factors for mid-term outcome. The AUC values of the combined model in the training and validation cohorts were 0.846 and 0.724, respectively, outperforming the clinical model (0.714; 0.556), PET-based model (0.664; 0.589), NCCN-IPI model (0.523; 0.406) and IPI model (0.510; 0.412) in predicting mid-term treatment outcome. DCA showed that the combined model incorporating RadScore, clinical risk factors, and PET metabolic metrics had the highest net clinical benefit. Cox regression indicated that the high RadScore group had worse progression-free survival (PFS) (HR, 2.1737; 95% CI: 1.2983, 3.6392) and overall survival (OS) (HR, 2.1356; 95% CI: 1.2561, 3.6309) than the low RadScore group. KM survival analysis was consistent with the Cox results. Conclusion The combined model incorporating RadScore, sex, B symptoms and SUVmax demonstrated a significant improvement in predicting mid-term efficacy and prognosis in high-risk DLBCL patients. A RadScore built with the 7×7 machine-learning cross-combination method for feature selection and classification holds promise for evaluating mid-term treatment outcome and prognosis in high-risk DLBCL patients.


Introduction
Diffuse large B-cell lymphoma (DLBCL) is a highly heterogeneous and aggressive B-cell lymphoma, accounting for 30%-40% of newly diagnosed non-Hodgkin's lymphomas (NHL) (1). The first-line immunochemotherapy is R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine and prednisone) or an R-CHOP-like regimen (2,3). Clinically, 30%-40% of patients undergoing this therapy develop relapsed or refractory disease (4,5). This could be attributed to tumor heterogeneity, which reduces sensitivity to chemotherapy (6,7). Patients classified as high-risk have poorer survival (8). Gene expression profiling of DLBCL defined three primary subtypes based on "cell of origin" (COO): germinal center B cell-like (GCB), activated B cell-like (ABC), and not otherwise specified (NOS). This molecular subclassification could account for some of the heterogeneity in the clinical outcomes of DLBCL (9). Numerous prognostic tools have been identified through large-scale retrospective studies. The International Prognostic Index (IPI), proposed in 1993, incorporates five risk factors: age, lactate dehydrogenase (LDH), Eastern Cooperative Oncology Group (ECOG) performance status (PS), Ann Arbor stage, and extranodal involvement (10). The National Comprehensive Cancer Network IPI (NCCN-IPI), proposed in 2014, forms four risk groups based on scores ranging from 0 to 8 and provides more accurate identification of intermediate-high (scores 4-5) and high-risk (scores 6-8) DLBCL patients (11). However, because both the IPI and the NCCN-IPI focus on clinical and biologic indicators, they cannot comprehensively assess the tumor heterogeneity of DLBCL (12,13).
18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) is widely utilized for early DLBCL diagnosis, staging, and assessment of chemotherapy response (14). The maximum standardized uptake value (SUVmax), metabolic tumor volume (MTV) and total lesion glycolysis (TLG) are commonly used PET metrics. These metabolic indicators reflect tumor malignancy and are valuable for baseline assessment and for improving response prediction (15). In previous research, SUVmax has been the most widely used index (16). MTV and TLG are associated with tumor burden, as well as with progression-free survival (PFS) and overall survival (OS) (17). Vercellino et al. found that integrating baseline total metabolic tumor volume (TMTV) with parameters of tumor load distribution has the potential to enhance the accuracy of risk stratification for DLBCL patients (18). Nevertheless, these indicators are limited in describing tumor heterogeneity. Radiomics has been used to assess tumor heterogeneity and to assist in the prediction of clinical outcomes. PET radiomic features are promising biomarkers for predicting treatment outcome and prognosis in DLBCL (19).
Machine learning is commonly used for radiomic feature identification and classification (20). Several studies have investigated risk stratification and efficacy prediction with PET radiomics. Lue et al. used the least absolute shrinkage and selection operator (LASSO) regression method and found that the baseline 18F-FDG PET radiomic feature RLN GLRLM is an independent prognostic factor for survival outcomes (21). However, these studies utilized a limited number of machine learning methods (22,23). Other studies have reported the outcome and prognostic value of radiomic features using cross-combination methods (24), but these methods have not yet been applied to high-risk DLBCL patients. In this paper, we therefore employed a cross-combination of seven machine learning methods to select and classify PET radiomic features associated with intratumoral heterogeneity. Furthermore, we established a tool that serves as an early prognostic biomarker, predicting mid-term treatment outcome and prognosis and identifying high-risk DLBCL patients who are unresponsive to the R-CHOP regimen.

Patient data collection
This study followed the principles outlined in the Declaration of Helsinki. Ethical approval for this retrospective analysis was obtained from the Ethics Committees of the two medical centers. Written consent was not required for this study. A total of 177 patients with DLBCL classified as intermediate-high/high-risk according to an NCCN-IPI score of 4-8 were enrolled between October 2012 and September 2022. Among them, 125 patients were from Nanjing Drum Tower Hospital of Nanjing University Medical School, and 52 patients were from West China Hospital of Sichuan University. The patients were randomly divided into a training cohort (n=123) and a validation cohort (n=54) at a 7:3 ratio. Inclusion criteria were as follows: (I) confirmed DLBCL with NCCN-IPI ≥4, (II) [18F]-FDG PET/CT performed at baseline before treatment, (III) treatment with R-CHOP-like regimens, and (IV) age ≥18 years at the time of diagnosis. Exclusion criteria were as follows: (I) primary central nervous system lymphoma, (II) a history of other tumors, (III) incomplete clinical data, (IV) previous treatment such as chemotherapy, radiotherapy, or surgery, and (V) loss to follow-up.
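As an illustration of the 7:3 random division described above, the following minimal sketch reproduces the cohort sizes. The function name `split_cohort`, the fixed seed, and the use of integer patient identifiers are assumptions made for this example; the study does not report how the randomization was implemented.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and validation lists.

    The seed and the truncation of the training size are assumptions of this
    sketch, not details taken from the study.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_train = int(len(ids) * train_fraction)  # 177 * 0.7 truncates to 123
    return ids[:n_train], ids[n_train:]


train_ids, validation_ids = split_cohort(range(1, 178))
print(len(train_ids), len(validation_ids))
```

With 177 patients, truncating 177 × 0.7 gives 123 training patients and leaves 54 for validation, matching the reported cohort sizes.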
The datasets included patient clinical data such as gender, age, B symptoms, ECOG PS, IPI, NCCN-IPI, LDH, Ann Arbor stage, extranodal involvement, and bone marrow involvement. Patient follow-up data were collected through electronic medical records or telephone interviews. Mid-term PET scans scored on the Deauville 5-point scale were used as study endpoints for mid-term efficacy and prognosis in DLBCL patients. A score of 1-3 was defined as complete metabolic remission (CMR), and a score of 4-5 was defined as partial metabolic remission (PMR), stable disease (SD), or progressive disease (PD) (25). Accordingly, the patients were divided into a CR group (CMR) and a non-CR group (PMR, SD, or PD). Figure 1 illustrates the baseline and mid-term PET of non-CR and CR patients.

PET/CT scanning protocol
All patients fasted for more than 6 hours before PET/CT scans, and their fasting blood glucose levels were below 8.7 mmol/L. Patients were injected with 18F-FDG (3.70-5.18 MBq/kg; Fludeoxyglucose [18F] Injection; AMS Limited) via a superficial forearm vein and rested quietly for 60 minutes before PET/CT. CT scanning conditions included a tube voltage of 120 kV, a tube current of 100 mA, and a slice thickness of 2 mm (Philips). PET scanning conditions included acquisition of 7-10 bed positions, with each bed position lasting 2 minutes (Philips2). At the end of acquisition, line-of-response image reconstruction was performed to obtain transverse, coronal, and sagittal PET and CT images, which were then corrected for attenuation. Image reconstruction was performed using voxels of 4 × 4 × 4 mm³ with three iterations and 33 subsets.

VOI drawing and radiomics processing
The PET images were processed using LIFEx (Local Image Feature Extraction) software (version 7.3.0) (26). (I) A semiautomatic segmentation method with a voxel boundary threshold of 41% of SUVmax was used to outline the volume of interest (VOI) (15). (II) Non-lymphoma 18F-FDG uptake was manually excluded; in case of disagreement, a senior nuclear medicine physician was consulted to jointly determine the VOI. (III) The metabolic metrics, including SUVmax, MTV, and TLG, were determined for each lesion. SUVmax was the maximum standardized uptake value in the tumor lesions. MTV was the volume of the tumor lesion within a single VOI, and TLG was calculated as the sum of the products of SUVmean and MTV for the lesions (TLG = Σ [SUVmean × MTV]). Lesions with an MTV smaller than 10 cm³ were not included. All radiomics features complied with the benchmarks of the Image Biomarker Standardization Initiative (IBSI) (27).
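The metabolic metrics defined above can be sketched in a few lines. This is a simplified illustration, not the LIFEx implementation: it assumes each lesion is given as a flat list of voxel SUVs, that voxels at or above 41% of the lesion SUVmax are retained, and that lesions with an MTV below 10 cm³ are skipped. The function names and the inclusive comparisons are assumptions of this example.

```python
def lesion_metrics(suv_values, voxel_volume_ml, threshold_fraction=0.41):
    """Return (SUVmax, MTV in mL, TLG) for a single lesion.

    suv_values: SUVs of the voxels in the region drawn around the lesion.
    voxel_volume_ml: volume of one voxel in mL (a 4 x 4 x 4 mm voxel is 0.064 mL).
    """
    suv_max = max(suv_values)
    cutoff = threshold_fraction * suv_max
    # Keep voxels at or above 41% of SUVmax (inclusive bound is an assumption).
    kept = [v for v in suv_values if v >= cutoff]
    mtv = len(kept) * voxel_volume_ml
    suv_mean = sum(kept) / len(kept)
    return suv_max, mtv, suv_mean * mtv


def total_tlg(lesions, voxel_volume_ml, min_mtv_ml=10.0):
    """Sum SUVmean x MTV over lesions, skipping lesions smaller than min_mtv_ml."""
    total = 0.0
    for suv_values in lesions:
        _, mtv, tlg = lesion_metrics(suv_values, voxel_volume_ml)
        if mtv >= min_mtv_ml:
            total += tlg
    return total


# Hypothetical voxel values, for illustration only.
print(lesion_metrics([10.0, 8.0, 5.0, 4.0, 2.0], voxel_volume_ml=1.0))
```

In the hypothetical call, the cut-off is 4.1, so three voxels are retained and the two lowest are discarded.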
PET radiomic features were extracted from baseline PET images using the open-source software package LIFEx (www.lifexsoft.org). (I) Wavelet and Laplacian of Gaussian (LoG) transforms were applied to the original PET image to obtain the corresponding wavelet and LoG images. (II) Three types of features were then extracted: first-order statistical features (e.g., maximum, minimum), shape features (e.g., roundness, extensibility), and texture features. Figure 2 illustrates the workflow of the radiomic analysis.

Radiomics feature selection and RadScore construction
The extracted radiomic features were screened and classified using a cross-combination of seven machine learning methods: Gradient Boosted Decision Tree (GBDT), Extremely Randomized Trees (Extra-Trees, ET), Random Forest (RF), Adaptive Boosting (AdaBoost), Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine (SVM) and Logistic Regression (LR). GBDT (28) uses decision trees as its base learners and sums the predictions from a series of trees. RF (29) is an ensemble of decision trees whose results are voted upon or averaged to obtain the final prediction. ET (30) served as the model underlying the recursive feature elimination algorithm, which fits the dataset, obtains a weight for each feature, and then sequentially removes the features with the smallest absolute weights from the feature set. AdaBoost (31) adapts to different datasets by adjusting the weights of the training samples and combines multiple classifiers linearly to enhance their performance. LASSO (32) is a classical regression method that shrinks regression coefficients and retains in the model only the variables with non-zero coefficients. SVM (33) is a powerful method for building classifiers that establishes a decision boundary between two categories, enabling label prediction based on feature vectors. LR (34) is a generalized linear model used for classification tasks that quantifies the effect of each independent variable on the classification result.
This paper evaluated feature selection-classification pairs drawn from the 7×7 possible combinations, such as LASSO-LASSO, SVM-SVM and SVM-LASSO. Seven machine learning methods were used to select features, and the same seven methods were used to classify them. Subsequently, the optimal candidate pair was used to build the Radiomics Score (RadScore). RadScore was defined as the sum of the products of the selected radiomic features and their corresponding feature weights. The identification of the best candidate model involved six steps utilizing fivefold cross-validation: (I) The patient data were randomly divided into training (n=123) and validation (n=54) cohorts. (II) For the training cohort, we employed seven feature selection models,

FIGURE 2 Analysis workflow in this study. SVM, support vector machine; GBDT, gradient boosting decision tree; RF, random forest; ET, extra-trees; LASSO, least absolute shrinkage and selection operator; LR, logistic regression; AdaBoost, adaptive boosting.

Development and validation of the models
Univariate and multivariate logistic regression were used to identify potential independent risk factors in the training cohort and to construct predictive models for the mid-term treatment outcome. Clinical and PET factors that were statistically significant in the univariate analysis were entered separately into the multivariate analysis. Independent clinical predictors were used to develop the clinical model, while independent PET predictors were used to create the PET-based model. Subsequently, all independent clinical predictors, PET predictors, and RadScore were assembled into a combined model. NCCN-IPI and IPI models were also developed.

Clinical benefit analysis based on the models
All models were assessed in both the training and validation cohorts using receiver operating characteristic (ROC) curves and calibration curves. Additionally, decision curve analysis (DCA) was employed to evaluate the net clinical benefit of these models.
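For readers unfamiliar with DCA, the net benefit at a given risk threshold is commonly computed as the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold. The sketch below uses that standard definition; the function names and the toy data are hypothetical, and treating the outcome label 1 as the event of interest is an assumption of this example.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold.

    threshold must lie strictly between 0 and 1.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight given to a false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in which every patient is treated as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data, for illustration only.
outcomes = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.1]
print(net_benefit(outcomes, predicted, 0.2), net_benefit_treat_all(outcomes, 0.2))
```

A decision curve plots these quantities over a range of thresholds; a model is preferred where its curve lies above both the treat-all and treat-none (net benefit of zero) references.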

Statistical analysis
All data were analyzed using SPSS 25.0 (IBM Corp, Armonk, NY, USA) and R statistical software (version 4.2.2). A P value less than 0.05 was considered statistically significant. The χ² test was used to compare clinical characteristics and PET metabolic metrics between the training and validation cohorts. Nomograms were used to show the scores of the independent risk factors. ROC curves were used to determine the optimal thresholds for SUVmax, MTV, TLG, and RadScore in predicting mid-term efficacy, PFS and OS. Logistic regression analyses were employed to identify independent predictors. Calibration curves, ROC curves, and DCA were computed for the models in both the training and validation cohorts. Survival analysis was conducted using Cox regression and Kaplan-Meier (KM) analysis.

Patient characteristics
A total of 177 patients (mean age, 63 ± 13 years; 129 men) were included. Table 1 summarizes the baseline characteristics of patients in the training and validation cohorts. The χ² test revealed no statistically significant difference between the two cohorts (P>0.05). The median follow-up time for the training and validation cohorts was 30.5 and 30.8 months, respectively. In the training cohort, 62 individuals experienced disease relapse or progression, resulting in 42 deaths. The 1-year, 3-year, and 5-year PFS rates were 89.6%, 72.2%, and 56.2%, while the 1-year, 3-year, and 5-year OS rates were 89.3%, 67.9%, and 63.4%. Likewise, in the validation cohort, disease relapse or progression occurred in 24 individuals, leading to 13 deaths. The 1-year, 3-year, and 5-year PFS rates were 79.7%, 55.3%, and 38.0%, and the 1-year, 3-year, and 5-year OS rates were 90.9%, 75.8%, and 70.0%.

Radiomics feature selection and RadScore construction
Using the 49 machine learning feature selection-classification pairs, we screened the 110 radiomic features and identified LASSO-LASSO as the optimal model (AUC=0.74) (Figure 3). The LASSO-LASSO model selected 11 key radiomic features for constructing RadScore (Table 2). We used ROC curves to identify the optimal cut-off for dichotomizing the variables, which corresponds to the point with the maximum Youden index. The Youden index is defined as sensitivity + specificity − 1. Table 3 shows that RadScore cut-off thresholds of 2.0, 2.2 and 2.2 were optimal for predicting mid-term efficacy, PFS and OS, respectively.
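The two calculations in this subsection, the weighted-sum RadScore and the Youden-index cut-off, can be sketched as follows. The feature names, weights and scores are hypothetical placeholders rather than the values in Tables 2 and 3, and calling a patient positive when the score is at or above the cut-off is an assumption of this example.

```python
def rad_score(features, weights):
    """Sum of each selected feature value multiplied by its weight."""
    return sum(weights[name] * features[name] for name in weights)


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    labels: 1 for the event of interest, 0 otherwise.
    A patient is called positive when score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):  # every observed score is a candidate
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical feature names and values, for illustration only.
print(rad_score({"feature_a": 2.0, "feature_b": 3.0},
                {"feature_a": 0.5, "feature_b": -1.0}))
print(youden_cutoff([1.0, 1.5, 2.1, 2.4, 3.0, 1.8], [0, 0, 1, 1, 1, 0]))
```

In the toy data the two groups are perfectly separated, so the Youden index reaches its maximum of 1 at the lowest score among the positive cases.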

Univariate and multivariate analysis results
Table 4 shows the between-group differences in clinical characteristics and PET metabolic indices with respect to mid-term efficacy.

Assessment and validation of models built for predicting mid-term efficacy
To predict mid-term efficacy, we developed a combined model that used the independent clinical predictors (gender, B symptoms), the independent PET predictor (SUVmax) and RadScore (Figure 4; Table 5).
We also created a separate clinical model, PET-based model, IPI model and NCCN-IPI model (Table 5).
Nomograms visualized the scores of the risk factors for mid-term efficacy. The calibration curves, obtained after 1000 bootstrap repetitions for each model, showed satisfactory agreement between the estimated and observed values in both the training and validation cohorts for the combined model (Figure 4).
The ROC curves of the models for predicting mid-term response in the training and validation cohorts showed that the AUC values of the combined model of clinical factors, PET metabolic parameters and RadScore (0.846; 0.724) were higher than those of the clinical model (0.714; 0.556), PET-based model (0.664; 0.589), NCCN-IPI model (0.523; 0.406) and IPI model (0.510; 0.412) (Figure 5).
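To make the AUC comparison concrete, the AUC can be read as the probability that a randomly chosen patient with the event receives a higher predicted score than a randomly chosen patient without it. The sketch below computes it directly from that pairwise definition; the function name and the toy data are hypothetical, and this is not necessarily the routine used in the study.

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (event, non-event) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only.
print(auc([0.9, 0.8, 0.3, 0.6, 0.2], [1, 0, 1, 1, 0]))
```

Under this definition a value of 0.5 corresponds to chance-level ranking, so an AUC below 0.5, such as the 0.406 reported for the NCCN-IPI model in the validation cohort, means the score ranked patients worse than chance in that cohort.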

Performance analysis of the combined models in clinical use
The DCA results are shown in Figure 6. These analyses demonstrated that the combined model outperformed the clinical model, PET-based model, IPI model and NCCN-IPI model in terms of net benefit across most risk thresholds in both the training and validation cohorts.

Survival analysis in the training and validation cohorts
To confirm the added prognostic value of RadScore, we compared the low RadScore and high RadScore groups. The low- and high-risk groups identified using the RadScore cut-off threshold showed distinct PFS and OS in both the training and validation cohorts (Figure 7; Table 6). The prognosis of the low RadScore group was better than that of the high RadScore group.

FIGURE 3 Heatmap of the AUC of the cross-combinations of feature selection methods (columns) and classification models (rows) in predicting mid-term response (A). Histogram of the selected features (IBSI names) and their weights used to build the optimal candidate model (B).

Kaplan-Meier analysis gave consistent results for PFS and OS in both the training and validation cohorts: the probability of adverse prognostic events was lower in the low RadScore group than in the high RadScore group (Figure 7).
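The Kaplan-Meier curves compared above are built with the product-limit estimator, which multiplies the survival probability by (1 − events / number at risk) at each time an event occurs. The sketch below is a minimal illustration with hypothetical follow-up data; in practice each RadScore group would be passed through the estimator separately.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up in months; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


# Hypothetical follow-up data, for illustration only.
print(kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0]))
```

Censored patients contribute to the number at risk up to their last follow-up but never lower the curve themselves, which is why the curve only steps down at observed events.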

Discussion
In this retrospective study utilizing real-world data, we found that a combined model incorporating RadScore outperformed the clinical, PET, and NCCN-IPI models in predicting the mid-term efficacy and prognosis of DLBCL patients. This combined model can serve as a valuable tool for individualized outcome prediction and for guiding treatment decisions for early-stage, high-risk DLBCL patients.
Accurately predicting the mid-term outcomes of DLBCL patients is crucial for optimizing treatment strategies. Numerous studies have evaluated the predictive value of PET radiomic features for DLBCL. Santiago et al. (35) demonstrated that a radiomics-based model accurately predicted refractory DLBCL. Their study employed RF as a classifier, randomly assigning patients to training (70%) and independent test (30%) cohorts; the AUCs of the two cohorts were 0.83 and 0.79, respectively. Coskun et al. found that texture features extracted from baseline PET predicted insensitivity to R-CHOP chemotherapy in DLBCL patients with an accuracy of 0.87 (AUC=0.81). Notably, SUVmax and the differences in the gray-level co-occurrence matrix (GLCM) played crucial roles in predicting chemotherapy insensitivity (36). Consistent with prior studies, our study found that the RadScore based on PET radiomic features was independently associated with mid-term outcomes in high-risk DLBCL patients (OR=7.167; 95% CI: 2.815-18.248; P<0.001). The RadScore built on 11 key radiomic features obtained from PET was valuable in predicting the mid-term efficacy of high-risk DLBCL patients. This is likely attributable to the close association between radiomic features and tumor heterogeneity (37,38), which is a prognostic determinant of patient survival (39,40).

FIGURE 4 Nomogram to predict the patient mid-term efficacy risk (A). Calibration curves of the model for predicting mid-term response in the training (B) and validation (C) cohorts.

Machine learning techniques are increasingly used to extract and classify image features. The LASSO model is a selection method that effectively shrinks and regresses a large pool of potentially multicollinear variables to obtain a set of relevant predictors (32). Many studies have employed LASSO to identify and classify data features. However, existing studies often employ a single machine learning method for radiomic feature selection and model construction. In clinical practice, a machine learning approach that combines feature classification with cross-validation can enhance the accuracy and generalization of the predicted results (41). In this study, we cross-combined seven machine learning methods to generate 49 selection-classification pairs and determined the optimal pair based on the maximum AUC to obtain the final RadScore. This approach is intended to make RadScore more robust and reproducible than scores built with a single machine learning method. Figure 3 illustrates that the ET-LASSO model had a poor AUC (0.370), whereas the AUC of the LASSO-LASSO model for predicting mid-term efficacy was 0.74. Additionally, our study revealed that the optimal LASSO-LASSO model selected radiomic features from shape features (Surface Area), first-order features (e.g., Global Intensity Peak) and texture features (GLCM), which indicates that first-order and texture features in particular have good ability to discriminate high-risk patients.
[18F]-FDG PET/CT can provide information about tumor biology by measuring cellular glucose metabolism. Our study demonstrated that SUVmax was an independent predictor of mid-term efficacy (OR=2.619; 95% CI: 1.107-6.194; P=0.028), consistent with previous studies (42). We developed a user-friendly model that integrated RadScore, PET metabolic factors, and clinical risk factors and compared it with other models (the clinical model, PET-based model, IPI model and NCCN-IPI model). The ROC curves and DCA results demonstrated that the combined model outperformed the other models, whereas the performance of the IPI and NCCN-IPI models was unsatisfactory in both the training and validation cohorts. Additionally, the combined model showed good calibration and a clear advantage in terms of AUC. These results indicate that the combined model is more suitable and practical for predicting the mid-term outcome of DLBCL. Consistent with previous studies (43,44), our results suggest that the IPI and NCCN-IPI may require improvement in identifying intermediate-high/high-risk DLBCL patients who would benefit from non-first-line treatment. Furthermore, our results support combining a RadScore of radiomic features (shape, first-order and GLCM) with SUVmax and clinical predictors to accurately identify intermediate-high/high-risk DLBCL patients, in line with the findings of Jiang et al. (24,45).
One limitation of our study was its retrospective design. We collected patient data from two medical centers, but future studies should include data from additional centers to ensure clinical generalizability. In DLBCL, the distribution of nodal and/or extranodal lesions is highly variable and heterogeneous, and the morphological and textural features of the lesions are highly sensitive to the tumor segmentation method. We therefore employed the 41% of SUVmax tumor segmentation method recommended by the European Association of Nuclear Medicine, which may be more practical and straightforward to implement in clinical practice. Additionally, the use of mid-term PET as the study endpoint may lead to false-positive interpretations. To justify therapeutic decisions, complementary studies using end-of-treatment PET should be conducted in the future. However, a major strength of our study lies in the homogeneity of the included patients, as they all had newly diagnosed DLBCL histology and received R-CHOP-like regimens as standard treatment. The methodology employed also supports the general applicability of our model.

Conclusion
The RadScore, obtained through the 7×7 machine learning cross-combination of feature selection and classification methods and comprising shape, first-order and texture (GLCM) features, can serve as a predictor of both mid-term efficacy and prognosis in DLBCL patients. In addition, the combined model, which integrates the RadScore, a PET metabolic indicator (SUVmax), and clinical risk factors (sex, B symptoms), can aid rational risk stratification and facilitate the selection of appropriate treatment regimens for intermediate-high/high-risk DLBCL patients in the early stages.

FIGURE 5 Receiver operating characteristic curves of the models for predicting mid-term response in the training (A) and validation (B) cohorts.

FIGURE 6 Decision curve analysis for the models in the training (A) and validation (B) cohorts.

FIGURE 7 Kaplan-Meier plots according to RadScore for patients' progression-free survival and overall survival in the training (A) and validation (B) cohorts.

FIGURE 1 Baseline and mid-term 18F-FDG PET/CT of the patients. Baseline (A) and mid-term (B) images of a patient without complete remission (non-CR), and baseline (C) and mid-term (D) images of a patient with complete remission (CR).
radiomics features and obtained the corresponding feature weights after dimensionality reduction. Based on these feature weights, we trained feature selection models by recursively considering subsets of radiomic features. The feature selection model with the largest area under the curve (AUC) was identified as the most important one. (III) Fivefold cross-validation was then applied to the reduced training cohort, dividing it into approximately equal-sized groups, with four groups used for training and one group for testing. (IV) The seven feature classification models were each developed on the four training groups. The feature classification model with the largest AUC was identified as the most important one. (V) We calculated the AUC of each feature selection-classification model and output the average AUC. The model with the largest average AUC was selected as the optimal candidate model. (VI) Finally, we validated the optimal model in the test group.
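The ranking logic of steps (II) to (V) can be sketched as a loop over all 7×7 selector-classifier pairs that averages the per-fold AUC and keeps the pair with the largest mean. In this sketch the model fitting itself is abstracted behind a caller-supplied function, `fold_aucs`, which is a hypothetical interface; the demonstration function returns made-up numbers and does not reproduce any result of the study.

```python
from itertools import product
from statistics import mean

# The seven methods named in the text; each serves as selector and classifier.
METHODS = ["GBDT", "ET", "RF", "AdaBoost", "LASSO", "SVM", "LR"]


def rank_pairs(fold_aucs, methods=METHODS):
    """Return (best_pair, mean_auc_by_pair) over all selector-classifier pairs.

    fold_aucs(selector, classifier) is a caller-supplied (hypothetical)
    function that fits the pair on each cross-validation split and returns
    the list of per-fold AUC values.
    """
    mean_auc = {
        (selector, classifier): mean(fold_aucs(selector, classifier))
        for selector, classifier in product(methods, repeat=2)
    }
    best_pair = max(mean_auc, key=mean_auc.get)
    return best_pair, mean_auc


def demo_fold_aucs(selector, classifier):
    """Placeholder returning made-up fold AUCs; NOT results from the study."""
    centre = 0.70 if (selector, classifier) == ("LASSO", "LASSO") else 0.55
    return [centre - 0.02, centre, centre + 0.02, centre, centre]


best, table = rank_pairs(demo_fold_aucs)
print(best, len(table))
```

Separating the ranking from the model fitting keeps the 49-pair comparison independent of any particular library; a real `fold_aucs` would run the feature selector and the classifier inside each of the five folds so that no test-fold information leaks into feature selection.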

TABLE 1
Demographics and clinical characteristics of the study population.

TABLE 2
The 110 radiomic features extracted from PET and the 11 key features* for constructing RadScore in this study.

TABLE 3
Optimal cut-off thresholds and area under the curve (AUC) of SUVmax, MTV, TLG and RadScore for mid-term outcome, progression-free survival and overall survival in the training and validation cohorts.

TABLE 4
Univariate and multivariate analyses of factors predictive of mid-term treatment outcome in the training cohort.

TABLE 5
The mid-term treatment outcome prediction models included in this study.