18F-FDG PET/CT radiomics signature and clinical parameters predict progression-free survival in breast cancer patients: A preliminary study

Introduction: This study aimed to investigate the feasibility of predicting progression-free survival (PFS) in breast cancer patients using a pretreatment 18F-fluorodeoxyglucose positron emission tomography/computed tomography (FDG PET/CT) radiomics signature and clinical parameters.
Methods: Breast cancer patients who underwent 18F-FDG PET/CT imaging before treatment from January 2012 to December 2020 were eligible for study inclusion. Eighty-seven patients were randomly divided into training (n = 61) and internal test (n = 26) sets, and an additional 25 patients were used as the external validation set. Clinical parameters, including age, tumor size, molecular subtype, clinical TNM stage, and laboratory findings, were collected. Radiomics features were extracted from preoperative PET/CT images. The least absolute shrinkage and selection operator (LASSO) was applied to reduce the number of features and build a predictive radiomics signature. Univariate and multivariate Cox proportional hazards models and Kaplan-Meier analysis were used to assess the association of rad-score and clinical parameters with PFS. Nomograms were constructed to visualize survival prediction. The C-index and calibration curve were used to evaluate nomogram performance.
Results: Eleven radiomics features were selected to generate the rad-score. The clinical model comprised three parameters: clinical M stage, CA125, and pathological N stage. Rad-score and the clinical model were significantly associated with PFS in the training set (P < 0.01) but not in the test set. The integrated clinical-radiomics (ICR) model was significantly associated with PFS in both the training and test sets (P < 0.01). The ICR model nomogram had a significantly higher C-index than the clinical model and rad-score in the training and test sets. The C-index of the ICR model in the external validation set was 0.754 (95% confidence interval, 0.726–0.812). PFS significantly differed between the low- and high-risk groups stratified by the nomogram (P = 0.009). The calibration curve indicated the ICR model provided the greatest clinical benefit.
Conclusion: The ICR model, which combined clinical parameters and preoperative 18F-FDG PET/CT imaging, was able to independently predict PFS in breast cancer patients and was superior to the clinical model alone and rad-score alone.


Introduction
Breast cancer is the most prevalent cancer and the leading cause of cancer death in women (1). Although adjuvant therapy has improved survival, 5-year overall relative survival rates for locally advanced and metastatic breast cancer are 55% and 18%, respectively (2). Determining predictors of survival is essential for developing individualized treatment strategies and improving prognosis.
High intratumoral heterogeneity in breast cancer is associated with worse prognosis (3,4) and is difficult to ascertain using typical invasive biopsy techniques. Clinicopathological parameters including age, tumor size and stage, and metastasis status are conventional prognostic factors for breast cancer (5). However, clinical outcomes may vary because of high heterogeneity, and these factors alone may not provide accurate prognostic information.
Radiomics can noninvasively characterize intratumoral heterogeneity by extracting multiple high-dimensional quantitative features from medical images. This approach can reveal the biological behavior of the entire tumor and has great potential to predict prognosis (16-19).
Therefore, this study aimed to develop and validate model nomograms to predict progression-free survival (PFS) in breast cancer patients using clinical parameters and PET/CT radiomics features.

Study population
This retrospective study was approved by the Ethics Committee of the Union Hospital of Tongji Medical College of Huazhong University of Science and Technology, and the requirement for written informed consent was waived. We retrospectively analyzed 87 female breast cancer patients (51.8 ± 12.9 years, range 25.0-81.0) who underwent 18F-FDG PET/CT imaging before treatment in our institution (first center) from January 2012 to December 2020. Patients were randomly divided into a training set (n = 61) and an internal test set (n = 26). An additional 25 patients (female, 55.9 ± 11.1 years, range 35.0-82.0) from the first center (Wuhan Union Hospital) and second center (Taizhou Hospital) were collected as an external validation set.
Patients who underwent treatment before PET/CT and those with a history of other cancer, unknown molecular subtype, or blood glucose concentration > 11.1 mmol/L before 18F-FDG injection were excluded. We also excluded patients with missing data and those lost to follow-up. A study flowchart is shown in Figure 1A.

Clinical evaluation
Clinical parameters, including age, tumor size, molecular subtype, TNM stage, and concentrations of pretreatment carcinoembryonic antigen (CEA), carbohydrate antigen 125 (CA125), and carbohydrate antigen 15-3 (CA15-3), were recorded.
18F-FDG was synthesized using 18F produced by a cyclotron (MINItrace®, GE Healthcare, Milwaukee, WI, USA) with radiochemical purity >95%. All patients were required to fast for at least 6 hours before 18F-FDG injection. Blood glucose concentration was measured prior to injection (only patients with a concentration ≤ 11.1 mmol/L were included). Intravenous 18F-FDG (3.70-5.55 MBq/kg) was administered and PET/CT was performed approximately 60 minutes later using a Discovery VCT® system (GE Healthcare). PET/CT acquisition and reconstruction parameters are shown in Additional file 1.

Delineation and segmentation of PET/CT images
The radiomics workflow is shown in Figure 1B. 18F-FDG PET/CT Digital Imaging and Communications in Medicine (DICOM) images were retrieved and loaded into ITK-SNAP software (www.itksnap.org) for manual segmentation. Before PET image segmentation, 40% maximum standardized uptake value threshold mapping was calculated using LIFEx (https://www.lifexsoft.org/). Delineation of the region of interest (ROI) was performed manually by a nuclear medicine physician with 3 years of experience (XX). All ROIs were segmented by two nuclear medicine physicians with more than 15 years of experience (XS and XL). Repeatability of the parameters extracted from the ROIs segmented by these two physicians was evaluated using the intraclass correlation coefficient (ICC), and parameters with an ICC greater than 0.6 were retained.
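As an illustration of the repeatability filter, the following minimal sketch computes a one-way random-effects ICC for each feature and keeps those above 0.6. The text does not state which ICC form was used, so the one-way form, the function names, and the demo values are all assumptions made for illustration.

```python
from statistics import mean

def icc_one_way(ratings):
    """One-way random-effects ICC(1,1) for a single feature (assumed ICC form).

    ratings: one row per lesion, each row holding the value from every reader.
    """
    n = len(ratings)      # number of lesions
    k = len(ratings[0])   # number of readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-lesion and within-lesion mean squares
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((v - m) ** 2 for row, m in zip(ratings, row_means) for v in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def retain_repeatable(feature_ratings, threshold=0.6):
    """Return the names of features whose ICC exceeds the threshold."""
    return [name for name, rows in feature_ratings.items() if icc_one_way(rows) > threshold]

# Hypothetical values: two features, two readers, four lesions
demo = {
    "feature_a": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]],  # readers agree closely
    "feature_b": [[1.0, 3.0], [2.0, 1.0], [3.0, 1.5], [2.5, 3.5]],  # readers disagree
}
print(retain_repeatable(demo))
```

Only the feature on which the two readers agree closely survives the filter in this toy example.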

Radiomics features extraction
The PyRadiomics feature package imported into Anaconda prompt software (github.com/Radiomics/pyradiomics, version 4.2.0) was used to extract radiomics features according to the feature guide of the Image Biomarker Standardization Initiative (IBSI). The categories and number of extracted radiomics features are detailed in Additional file 1.

Features screening and models construction
Continuous variables were centered and standardized. Eighty-seven patients were randomly divided into training and test sets at a ratio of 7:3.
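The split and standardization steps can be sketched as follows. This is a minimal illustration under assumptions: the seed, the helper names, and the simulated age values are hypothetical, and estimating the scaling parameters on the training set only is a common convention that the text does not explicitly confirm.

```python
import random
from statistics import mean, stdev

def split_train_test(ids, train_fraction=0.7, seed=0):
    """Shuffle patient ids and split them into training and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def fit_scaler(values):
    """Return the mean and sample standard deviation of the given values."""
    return mean(values), stdev(values)

def standardize(values, mu, sigma):
    """Center by mu and scale by sigma (z-score)."""
    return [(v - mu) / sigma for v in values]

patient_ids = list(range(87))
train_ids, test_ids = split_train_test(patient_ids)
print(len(train_ids), len(test_ids))  # 61 26

# Hypothetical continuous variable (simulated ages), scaled with training-set statistics
rng = random.Random(1)
age = {pid: rng.uniform(25, 81) for pid in patient_ids}
mu, sigma = fit_scaler([age[p] for p in train_ids])
train_z = standardize([age[p] for p in train_ids], mu, sigma)
test_z = standardize([age[p] for p in test_ids], mu, sigma)
```

Reusing the training-set mean and standard deviation for the test set keeps information from the test patients out of model building.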

Radiomics signature (Rad-score)
The minimal redundancy maximal relevance (mRMR) algorithm (31), which can improve the accuracy of feature selection and classification, was used to select the initial features in the training set. The least absolute shrinkage and selection operator (LASSO) was used to screen features. Parameters corresponding to the minimum penalty and weight coefficients were selected to construct the radiomics signature. The radiomics signature was calculated for each patient by a linear combination of selected features weighted by their respective coefficients.
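The linear combination described above can be written in a few lines. In this sketch the coefficient values and patient feature values are hypothetical placeholders, not the fitted LASSO coefficients; only the two feature names are taken from the study's selected features.

```python
def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of the selected features weighted by their coefficients."""
    return intercept + sum(coef * features[name] for name, coef in coefficients.items())

# Hypothetical coefficients; the real values come from the fitted LASSO Cox model
coefficients = {
    "original_shape_Elongation.PET": 0.5,
    "wavelet_HHL_firstorder_Mean.CT": -0.2,
}

# Hypothetical standardized feature values for one patient
patient = {
    "original_shape_Elongation.PET": 1.2,
    "wavelet_HHL_firstorder_Mean.CT": 0.5,
    "unselected_feature": 3.0,  # no coefficient, so it does not contribute
}

print(rad_score(patient, coefficients))
```

Features that LASSO shrinks to zero simply do not appear in the coefficient table, which is how the penalty performs variable selection.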

Clinical model
In the training set, univariate and multivariate Cox proportional hazard regression were used to analyze and screen clinical features. Features were selected using the minimum Akaike information criterion to avoid overfitting. Furthermore, associations between the clinical parameters and PFS in the training set were evaluated and then verified in the test set.

Integrated clinical-radiomics model
Clinical features and rad-score were used to create a multivariate Cox proportional hazard regression model.
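For readers unfamiliar with Cox regression, the sketch below shows the partial log-likelihood that such a model maximizes (Breslow form for tied event times). It is an illustrative stand-in, not the study's code: the analyses were run in R, and the toy data and the single coefficient compared here are hypothetical.

```python
import math

def cox_partial_log_likelihood(times, events, linear_predictors):
    """Breslow-form partial log-likelihood of a Cox proportional hazards model."""
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through the risk sets
        # Risk set: everyone still under observation at time t_i
        risk_set = [lp for t, lp in zip(times, linear_predictors) if t >= t_i]
        total += linear_predictors[i] - math.log(sum(math.exp(lp) for lp in risk_set))
    return total

# Hypothetical toy data: one covariate x, PFS time in months, event indicator
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
x = [2.0, 1.5, 0.5, 0.2]

ll_null = cox_partial_log_likelihood(times, events, [0.0 * v for v in x])
ll_beta1 = cox_partial_log_likelihood(times, events, [1.0 * v for v in x])
print(ll_null, ll_beta1)
```

In this toy data, patients with larger x progress earlier, so a positive coefficient yields a higher partial log-likelihood than the null model.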

Evaluation of model performance
To evaluate model performance, the radiomics nomogram, clinical nomogram, and ICR model nomogram were built in the training set, then evaluated in the internal test set, and verified in the external validation set.
The concordance index (C-index), which measures the proportion of the predicted results consistent with the actual results in all patient pairs, was used to evaluate discriminating ability. A C-index between 0.50 and 0.70 indicated poor accuracy, a value between 0.71 and 0.90 indicated moderate accuracy, and a value above 0.90 indicated high accuracy (32). Bootstrap validation (2000 bootstrap resamples) was performed on the training and test sets to calculate the relative corrected C-index.
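The pairwise definition of the C-index can be made concrete with a short sketch of Harrell's estimator. The handling of ties shown here (tied risk scores count as half, tied times are skipped) is one common convention and is an assumption; the study computed the C-index in R, and the example data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still event-free at a strictly later time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / comparable

# Hypothetical data: higher risk score should mean earlier progression
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # 1.0
print(concordance_index(times, events, [1, 2, 3, 4]))  # 0.0
```

A value of 0.5 corresponds to random ranking, which is why the accuracy bands quoted above start at 0.50.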
A calibration curve was used to validate the ICR model nomogram performance, which used bootstrap resampling to evaluate the original data. Integrated area under curve (iAUC) of the receiver operating characteristic (ROC) curve was used to evaluate predictive performance of the combined model.

Outcome evaluation
Follow-up was conducted by clinic visits or telephone. The study endpoint was PFS, defined as the time from the date of initial PET/CT to the date of disease progression, recurrence, death from any cause, or last follow-up. Patients who did not have progression or recurrence at the date of their last clinical follow-up were treated as censored.
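The endpoint definition translates into a time and an event indicator for each patient. The sketch below is a minimal illustration; the function name, the use of days as the time unit, and the example dates are assumptions.

```python
from datetime import date

def pfs_record(pet_ct_date, event_date, last_follow_up):
    """Return (days, event) for one patient.

    event_date is the date of progression, recurrence, or death from any cause,
    or None if no such event was observed; such patients are censored (event = 0)
    at their last follow-up.
    """
    if event_date is not None:
        return (event_date - pet_ct_date).days, 1
    return (last_follow_up - pet_ct_date).days, 0

# Hypothetical patients
print(pfs_record(date(2015, 1, 1), date(2015, 1, 31), date(2016, 1, 1)))  # (30, 1)
print(pfs_record(date(2015, 1, 1), None, date(2016, 1, 1)))               # (365, 0)
```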

Statistical analysis
Categorical variables were compared using Pearson's chi-square test. Continuous variables were compared using the unpaired two-tailed Student's t-test or the Wilcoxon rank sum test, as appropriate. P < 0.05 was considered significant. ROC curve analysis was used to determine the rad-score threshold and divide patients into high- and low-risk groups. Survival was analyzed using the Kaplan-Meier method. Survival curves were compared using the log-rank test. Statistical analyses were performed using R software version 3.6.4 (www.r-project.org). The packages used included lattice, usethis, devtools, tidyverse, caret, publish, survival, glmnet, ggpubr, survminer, rolr, survIDINRI, survAUC, rms, and dca.
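To make the Kaplan-Meier method concrete, the following sketch computes the product-limit survival estimate from times and event indicators. It is a plain-Python illustration with hypothetical data; the study itself used R packages for this step.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)                       # still under observation
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - n_events / at_risk                            # product-limit update
        curve.append((t, survival))
    return curve

# Hypothetical data: event at t=1, censoring at t=2, event at t=3
for t, s in kaplan_meier([1, 2, 3], [1, 0, 1]):
    print(t, round(s, 4))
```

Censored patients never lower the curve directly; they only shrink the at-risk denominator for later event times.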

Patient characteristics
A total of 112 newly diagnosed breast cancer patients were included for analysis. The clinicopathological characteristics of patients in the training (n = 61) and internal test (n = 26) sets are shown in Table 1. Characteristics of the 25 patients in the external validation set are summarized in Supplementary Table S1.

PFS
All patients underwent breast-conserving surgery or mastectomy. The details of adjuvant therapy (including radiotherapy, chemotherapy, and endocrine therapy) are shown in Table 1.

Radiomics signature construction and testing
Based on the training set, a total of 1920 PET/CT radiomics features were extracted. A LASSO Cox regression was performed to achieve regression coefficient compression and select variables (Figures 2A, B).
After screening, 11 radiomics features were included in the final model: original_shape_Elongation.PET, wavelet_HHL_gldm_DependenceVariance.PET, log_sigma_5_0_mm_3D_glcm_ClusterShade.CT, wavelet_LLL_glcm_InverseVariance.CT, log_sigma_2_0_mm_3D_glszm_LargeAreaLowGrayLevelEmphasis, wavelet_LLH_glszm_SizeZoneNonUniformityNormalized.CT, wavelet_LLH_glszm_SmallAreaEmphasis.PET, log_sigma_5_0_mm_3D_gldm_GrayLevelVariance.CT, wavelet_HHL_firstorder_Mean.CT, log_sigma_4_0_mm_3D_firstorder_Median, and wavelet_HLL_glszm_SmallAreaEmphasis.PET. Rad-score was calculated for each patient using a linear combination of the selected features weighted by their respective coefficients (Figure 2C). The scores of patients in the training and test sets were calculated through the constructed radiomics signature. Patients were divided into high- and low-risk groups based on the optimal cutoff determined by ROC curve analysis. In the training set, PFS was significantly shorter in patients with a higher rad-score (P < 0.001; Figure 3A). In the test set, the difference was not significant (P = 0.260; Figure 3B).
The ICR model equation was as follows:
$$1.345 \times \text{Initial M staging} + 0.0019 \times \text{CA125} + 0.293 \times \text{pathological N staging} + 1.87 \times \text{Rad-score}$$
PFS significantly differed between the high- and low-risk groups in both the training and test sets (P < 0.001 and P = 0.003, respectively; Figures 3E, F). The ICR model was examined for correlation between parameters using Spearman analysis; parameters with the same trend were examined through unsupervised cluster analysis. A hierarchically clustered heatmap of the feature correlation matrix is shown in Figure 4. Features with an inter-correlation above the selected threshold (≥0.7) were removed from the dataset.
In the ICR model, mean iAUC in the training and test sets was 0.835 and 0.826, respectively (Figure 6A). To assess consistency between predicted and actual PFS, calibration curves of the ICR model in the training and test sets were plotted (Figure 6B). Agreement between the predicted and observed curves was good, and the bias curves in both sets were close to the ideal line.
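The ICR equation reported in this section can be evaluated directly for a given patient. The sketch below uses the four coefficients from the text; the numeric coding of the stage variables (for example M0 = 0, M1 = 1), the units of CA125, the dictionary keys, and the example patients are assumptions made for illustration.

```python
import math

# Coefficients as reported for the ICR model; key names are hypothetical
ICR_COEFFICIENTS = {
    "initial_m_stage": 1.345,
    "ca125": 0.0019,
    "pathological_n_stage": 0.293,
    "rad_score": 1.87,
}

def icr_linear_predictor(patient):
    """Weighted sum of the four ICR model terms for one patient."""
    return sum(coef * patient[name] for name, coef in ICR_COEFFICIENTS.items())

def relative_hazard(patient_a, patient_b):
    """Hazard of patient_a relative to patient_b under a Cox model."""
    return math.exp(icr_linear_predictor(patient_a) - icr_linear_predictor(patient_b))

# Hypothetical patients; stage coding and values are assumed
high = {"initial_m_stage": 1, "ca125": 100.0, "pathological_n_stage": 2, "rad_score": 0.5}
low = {"initial_m_stage": 0, "ca125": 20.0, "pathological_n_stage": 0, "rad_score": -0.2}

print(round(icr_linear_predictor(high), 3))  # 3.056
print(relative_hazard(high, low) > 1.0)      # True
```

Because the baseline hazard cancels in a ratio, the linear predictor alone is enough to rank patients or compare two of them.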

Models constructed based on PET or CT alone
Rad-score was also constructed based on PET and CT images alone. The regression coefficients and variable selection are shown in Supplementary Figure S1. Compared with PET/CT, the rad-score and ICR model constructed from PET or CT alone performed worse (Supplementary Figures S2, S3).

Performance in the external validation set
To fully evaluate the ICR model performance, external validation was performed. The ICR model nomogram yielded a favorable C-index value in the external validation set (0.754; 95% CI, 0.726-0.812). PFS significantly differed between the low- and high-risk groups stratified by the nomogram (P = 0.009; Figure 6C), suggesting good prognostic value.

Discussion
In this study, we retrospectively analyzed newly diagnosed breast cancer patients and developed models based on 18 F-FDG PET/CT imaging and clinical parameters before treatment to predict PFS. Through internal and external validation, we demonstrated that the ICR model could predict PFS well. Moreover, the ICR model was significantly better than models comprised solely of clinicopathologic variables or PET/CT imaging data. This emphasizes and supports the importance of multidisciplinary collaboration and indicates that integration of clinical parameters and PET/CT imaging features can better predict breast cancer progression and improve prognosis. Our model provides a simple and easily used tool for breast cancer patients with strong heterogeneity, aiding clinicians in rapidly evaluating the probability of progression. However, it still needs to be validated in large prospective studies.
The ICR model was able to predict PFS of breast cancer patients with a higher C-index and better calibration than the radiomics signature or clinical model. It took advantage of the synergy of rad-score and clinical features, which concurred well with the results of previous studies (3,22,24,26,30,33-35). Our results also showed that the addition of rad-score to clinical data might be used for risk assessment. PET/CT radiomics has shown considerable potential for prognostication in breast cancer patients. In our study, rad-score comprised four PET radiomics features and seven CT features. Most were derived texture features, including GLCM, GLDM, and GLSZM. These features reflect the interaction between adjacent pixels and are therefore appropriate for quantifying textural heterogeneity of tumors. The prognostic value of these features in breast cancer has been reported and emphasized in previous studies (22,30,36,37).
Figure caption (heatmap): Hierarchically clustered heatmap of the feature correlation matrix. Features with an inter-correlation above the selected threshold (≥0.7) were removed from the dataset.
Figure caption (nomograms): Clinical (A) and ICR (B) model nomograms to predict survival using the training set. Drawing a vertical line to the points axis from specific variables determined the number of points toward the probability of progression-free survival. The process was repeated for each variable and the points for each risk factor were added. The final total was then located on the total points axis.
In this study, rad-score was an independent predictor of PFS in the training set but not the test set, although rad-score was higher in patients who experienced tumor progression. The results of previous studies that examined PET/CT radiomics in breast cancer prognostication were also inconsistent. However, most yielded promising findings, suggesting that rad-score is an independent prognostic factor (3,22,27). Similar to our study, Groheux et al. (38) found that the entropy value derived from PET/CT imaging could predict event-free survival in locally advanced breast cancer (P < 0.050); however, in multivariate analysis, PET texture analysis had no added value. There are several likely reasons for our finding. First, to reduce overfitting, redundant and noisy radiomics features were removed when constructing the rad-score model in the training set. Second, because of our small sample size, the amount of data available for model training was limited, which might have reduced performance. Furthermore, our model might be affected by heterogeneity between different datasets and research methodologies.
Similar to rad-score, the clinical model alone did not independently predict PFS in the test set, which suggests that clinical parameters alone do not accurately reflect heterogeneity and the risk of progression. Among clinical parameters, N and M stage are well-known conventional prognostic factors (11,39). In addition, 18F-FDG PET/CT has the ability to detect distant metastases, which adds to its value in prognostic evaluation.
This study had several limitations. It was retrospective in design and had both a small sample size and relatively short follow-up. In addition, ROI delineation and calculation of imaging parameters were not automatically performed. Prospective large-scale multicenter studies are warranted to validate our models and expand the application of PET/CT radiomics in breast cancer.
In conclusion, our ICR model, which combines clinical parameters with radiomics score, shows considerable promise in predicting PFS in breast cancer patients and deserves further study.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Committee of Union Hospital, Tongji Medical College. The patients/participants provided their written informed consent to participate in this study.