ORIGINAL RESEARCH article

Front. Oncol., 13 August 2024

Sec. Gynecological Oncology

Volume 14 - 2024 | https://doi.org/10.3389/fonc.2024.1400109

Hematological indicator-based machine learning models for preoperative prediction of lymph node metastasis in cervical cancer

  • 1. School of Medical Imaging, Bengbu Medical University, Bengbu, Anhui, China

  • 2. Department of Gynecology and Oncology, First Affiliated Hospital, Bengbu Medical University, Bengbu, Anhui, China

Abstract

Background:

Lymph node metastasis (LNM) is an important prognostic factor for cervical cancer (CC) and determines the treatment strategy. Hematological indicators have been reported as being useful biomarkers for the prognosis of a variety of cancers. This study aimed to evaluate the feasibility of machine learning models characterized by preoperative hematological indicators to predict the LNM status of CC patients before surgery.

Methods:

The clinical data of 236 patients with pathologically confirmed CC were retrospectively analyzed at the Gynecology Oncology Department of the First Affiliated Hospital of Bengbu Medical University from November 2020 to August 2022. The least absolute shrinkage and selection operator (LASSO) was used to select 21 features from 35 hematological indicators and for the construction of 6 machine learning predictive models, including Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GNB), and Logistic Regression (LR), as well as Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost). Evaluation metrics of predictive models included the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, and F1-score.

Results:

RF had the best overall predictive performance in ten-fold cross-validation on the training set. The specific performance indicators of RF were AUC (0.910, 95% confidence interval [CI]: 0.820–1.000), accuracy (0.831, 95% CI: 0.702–0.960), specificity (0.835, 95% CI: 0.708–0.962), sensitivity (0.831, 95% CI: 0.702–0.960), and F1-score (0.829, 95% CI: 0.696–0.962). RF also had the highest AUC in the testing set (AUC = 0.854).

Conclusion:

RF based on preoperative hematological indicators that are easily available in clinical practice showed superior performance in the preoperative prediction of CC LNM. However, investigations on larger external cohorts of patients are required for further validation of our findings.

Introduction

Cervical cancer (CC) is one of the most common gynecological malignancies, with approximately 600,000 new cases and 340,000 deaths reported worldwide in 2020 (1). Multiple studies have demonstrated that lymph node metastasis (LNM) is an important independent risk factor affecting the prognosis of patients with CC and remains the major cause of mortality in CC patients (2, 3). The 5-year overall survival rate of CC patients without LNM is 80–90%, whereas in patients with LNM it is reduced to 50–65% (4–6). Therefore, the 2018 International Federation of Gynecology and Obstetrics (FIGO) officially incorporated LNM into the CC staging system (7). The importance of LNM in the diagnosis, treatment decisions and prognostic assessment of CC is increasing. For early-stage CC patients without LNM, radical hysterectomy is recommended (8); for CC patients with LNM, radiotherapy or chemotherapy is the recommended treatment (9). Therefore, the accurate preoperative evaluation of LNM status in CC patients is essential for treatment decisions and prognostic assessment.

Lymph node biopsy is the gold standard for diagnosing LNM status (10); however, it is invasive and can cause complications, such as pain and lymphedema (11). Currently, imaging examination is the conventional method for the preoperative and noninvasive evaluation of LNM status. Common imaging examinations include computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography-CT (PET-CT) (10, 12). However, the detection of metastatic lymph nodes via CT and MRI mainly relies on morphological criteria and has relatively low sensitivity (38–56%) (13). Although PET-CT is considered the most effective method for detecting CC LNM, it has a high false-positive rate (14–16). By challenging the limits of traditional imaging examinations, emerging radiomics can further improve the accuracy of preoperative prediction of CC LNM (17, 18). However, the current research on radiomics for the preoperative prediction of CC LNM is still in its initial stages, and there is still a gap in knowledge from a practical application standpoint.

In recent years, with the development of artificial intelligence technology, machine learning (ML) has been playing an increasingly important role in the identification of LNM status in a variety of cancers, including breast cancer, kidney cancer, colon cancer, lung cancer, and cervical cancer (19–23). For example, Arezzo et al. (23) developed an Extreme Gradient Boosting (XGBoost) model based on clinical features and pelvic MRI features for the prediction of LNM in patients with advanced CC. The results of the study showed that the XGBoost model exhibited good predictive performance (89% accuracy, 83% precision, 83% recall, 0.79 AUC). Yu et al. (19) used the Random Forest (RF) algorithm to select MRI radiomics features and establish a Support Vector Machines (SVM) model for predicting axillary lymph node status in breast cancer. The results showed that the AUCs of the SVM model in the training cohort and the external validation cohort were 0.90 and 0.91, respectively. All of the above studies show that ML models have some potential in predicting cancer LNM status.

Hematological indicators are quantifiable indicators that are clinically accessible. Previous studies have suggested associations between some hematological indicators and CC LNM. For example, increased preoperative plasma squamous cell carcinoma antigen (SCC-Ag) levels may predict an increased incidence of CC LNM (24, 25). Moreover, Gavrilescu et al. (26) demonstrated that CC patients without LNM had a significantly higher neutrophil-lymphocyte ratio (NLR) than CC patients with LNM. To our knowledge, no studies have used pure hematological indicators to build machine learning models for the preoperative prediction of LNM status in CC patients. Therefore, this study aimed to evaluate the feasibility of machine learning models characterized by preoperative hematological indicators to predict the LNM status of CC patients before surgery.

Methods

Participant characteristics

The clinical data of CC patients who were admitted to the Department of Gynecology and Oncology of the First Affiliated Hospital of Bengbu Medical University (Anhui, China) from November 2020 to August 2022 were retrospectively analyzed. The inclusion criteria were as follows: (1) patients who were first diagnosed with CC; (2) patients who met the indications for CC radical surgery and underwent radical hysterectomy and pelvic lymph node dissection; and (3) patients with CC confirmed via postoperative pathology. The exclusion criteria were as follows: (1) patients with other concurrent malignancies; and (2) patients with missing clinical or pathological data.

This retrospective study was approved by the Clinical Medical Research Ethics Committee of The First Affiliated Hospital of Bengbu Medical University (Bengbu, Anhui, China) (registration number: 2021KY010). The experiments were performed in strict accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. The requirement for written informed consent was waived by the Clinical Medical Research Ethics Committee of The First Affiliated Hospital of Bengbu Medical University.

Data collection and feature selection

Clinical features and hematological indicators were collected from the clinical data of patients with CC. Hematological indicators included routine blood indicators, routine biochemical indicators, coagulation function indicators, and tumor markers. Routine blood indicators included white blood cell count (WBC), percentage of neutrophils (NEUT %), percentage of lymphocytes (LYM %), percentage of monocytes (MON %), hemoglobin (HGB), platelet large cell ratio (PLCR), etc.; routine biochemical indicators included alanine aminotransferase (ALT), aspartate aminotransferase (AST), prealbumin (PAB), total protein (TP), albumin (ALB), globulin (GLB), total cholesterol (TCHO), low-density lipoprotein (LDL), cystatin C (Cys C), C-reactive protein (CRP), superoxide dismutase (SOD), etc.; coagulation function indicators included prothrombin time (PT), fibrinogen (FIB), D-dimer (DD), thrombin time (TT), activated partial thromboplastin time (APTT), international normalized ratio of prothrombin time (PT-INR), and prothrombin activity (PTA); tumor markers included squamous cell carcinoma antigen (SCC-Ag). The hematological indicators were measured by using a BC-6000plus automated hematology analyzer (Mindray, Shenzhen, China), a Sysmex CS5100 automatic blood coagulation analyzer (Sysmex, Kobe, Honshu Island, Japan), and an automatic biochemical analyzer (MEDATC, Shanghai, China).

Data set division and data class balance

The training set and the testing set were randomly generated in a ratio of 7:3. However, the sample categories in the training dataset were imbalanced (21% of the samples with LNM and 79% of the samples without LNM), which can lead to a large bias in the classification results of machine learning models (27). Common class balancing methods include random oversampling, random undersampling and synthetic sampling. Both random oversampling and random undersampling balance the distribution of sample classes in the dataset and thereby alleviate the data imbalance problem. However, random oversampling repeats minority class samples many times, which can easily lead to overfitting of the model, whereas random undersampling removes some majority class samples, which leads to information loss. Synthetic sampling methods are an improvement on random sampling methods, and the most classic and popular of them is the synthetic minority over-sampling technique (SMOTE) (28). SMOTE constructs new, non-repeating samples at random positions on the line segments connecting neighboring samples of the minority class, which reduces the overfitting of the model and enhances its generalization ability. SMOTE can therefore compensate for the shortcomings of random oversampling to some extent. In this study, the SMOTE method was used to balance the classes of the training set prior to feature selection.
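The interpolation idea behind SMOTE can be illustrated with a minimal sketch. This is not the implementation used in the study, which the text does not specify; in practice a library implementation (for example the SMOTE class in imbalanced-learn) would normally be preferred. The function name, the toy data and the neighbor count `k` below are assumptions chosen only for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (made up): 3 minority samples versus 7 majority samples, 2 features.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0]]
majority_count = 7
new_points = smote_like_oversample(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority), "minority samples after balancing")
```

Because every synthetic point lies on a segment between two real minority samples, the new samples are not exact duplicates, which is the property the paragraph above contrasts with random oversampling.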

Feature selection

In biological data, the performance of machine learning classifiers depends heavily on the selection of important features. Feature selection methods are categorized into ranking-based and subset-based methods (29). Ranking-based feature selection methods do not depend on the performance of the algorithm, are computationally fast and less prone to overfitting, and can rank the importance of all features. Popular ranking-based methods include information gain, Fisher score, chi-square and minimum redundancy maximum relevance (30). However, ranking-based methods do not consider the joint importance of features and lack a threshold to determine the optimal number of features. Therefore, ranking-based feature selection was not used in this study. Subset-based methods determine thresholds based on certain criteria to select the optimal subset of features (31). Popular subset-based methods include the least absolute shrinkage and selection operator (LASSO) and Recursive Feature Elimination (RFE) (32). However, RFE is a feature selection method that relies on a particular machine learning model (such as XGBoost, RF, or SVM). To avoid the influence of the base model used for RFE on the results of the study, only the LASSO method was used for feature selection in this study.
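As a hedged illustration of why LASSO acts as a subset-based selector, the sketch below fits an L1-penalized least-squares model by coordinate descent and keeps only the features whose coefficients are non-zero. It is a didactic sketch, not the study's code: the objective (1/2n)·||y − Xb||² + alpha·||b||₁, the value of `alpha`, the feature names and the toy data are all assumptions, and the features are assumed to be centered so that no intercept is needed.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][m] * coef[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, alpha) / (z / n)
    return coef


# Toy data (made up): feature_a drives y, feature_b is unrelated to it.
names = ["feature_a", "feature_b"]
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [2.0 * row[0] for row in X]

coef = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [name for name, c in zip(names, coef) if c != 0.0]
print(coef, selected)
```

The L1 penalty drives the coefficient of the uninformative feature to exactly zero, so the selected subset and its size follow from the penalty strength rather than from a separate ranking threshold.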

Establishment and evaluation of machine learning models

Following the recommendations made by the Scikit-Learn developers, we used six supervised machine learning models to predict CC LNM. The six machine learning models were Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GNB), Logistic Regression (LR), RF, SVM and XGBoost.

In this study, accuracy, specificity, sensitivity, F1-score and the area under the receiver operating characteristic curve (AUC) were used as assessment metrics to compare the performance of the models. Ten-fold cross-validation was performed in the training set, and the AUC of the ten-fold cross-validation was used as the main evaluation metric to identify the machine learning model with the best prediction performance. The prediction performance of the six machine learning models in the testing set was evaluated by using receiver operating characteristic (ROC) curves.
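For readers less familiar with these metrics, the following sketch computes them from binary labels using their standard textbook definitions. It is an illustrative assumption rather than the study's evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are made up, and the sketch does not guard against empty classes (which would cause a division by zero).

```python
def classification_metrics(y_true, y_pred):
    """Standard metrics from a binary confusion matrix (1 = LNM, 0 = no LNM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # recall for the positive class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative
    (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made up): 3 positives, 5 negatives, threshold 0.5 on the score.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print(metrics, auc)
```

Unlike the other metrics, the AUC is computed from the continuous scores and so does not depend on the chosen decision threshold, which is one reason it is convenient as a main comparison metric.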

Python (version 3.9) was used to build and verify the machine learning models. The flowchart for building and validating the machine learning models is shown in Figure 1.

Figure 1

Statistical analysis

Data are presented in three main forms: mean ± standard deviation (SD) for normally distributed continuous data, median [interquartile range (IQR)] for non-normally distributed continuous data, and count (percentage) for categorical data. The Shapiro-Wilk test was used to examine the normality of the continuous data. For the comparison of all variables between CC patients with and without LNM, the independent sample t test and the Mann-Whitney U test were used to analyze the normally and non-normally distributed continuous data, respectively, and the chi-square test was used to analyze the categorical data. The DeLong test was used to compare the differences between the ROC curves of the six machine learning models (33). Statistical analysis was performed by using SPSS Statistics 26.0 (IBM Corp., Chicago, Illinois, United States of America) software and MedCalc 20.1.0 (Solvusoft., Las Vegas, Nevada, United States of America) software. P values less than 0.05 (P < 0.05) were considered to be statistically significant.

Results

Participant characteristics

The clinical characteristics of the CC patients are shown in Table 1. A total of 236 patients with CC were enrolled in this study, and the mean age and body mass index (BMI) of the patients were 53.6 ± 10.5 years and 24.7 ± 3.1 kg/m², respectively. All of the CC patients were classified into two groups (LNM group, n = 49; non-LNM group, n = 187) according to the results of histopathological examinations. The results of the independent sample t test and the chi-square test showed that there were no significant differences in age, BMI, menopausal status, tubal ligation, diabetes, hypertension, histological subtypes of cervical cancer and lymphovascular space invasion between the LNM group and the non-LNM group (P > 0.05). There was a significant difference in FIGO staging between the LNM group and the non-LNM group (P < 0.05).

Table 1

Characteristics | Total (N = 236) | LNM (N = 49) | Non-LNM (N = 187) | P value
Age (years) | 53.6 ± 10.5 | 54.5 ± 10.3 | 53.3 ± 10.6 | 0.501
BMI (kg/m²) | 24.7 ± 3.1 | 25.1 ± 2.6 | 24.6 ± 3.3 | 0.268
Menopausal status | | | | 0.590
  Premenopausal | 98 (41.5%) | 22 (44.9%) | 76 (40.6%) |
  Postmenopausal | 138 (58.5%) | 27 (55.1%) | 111 (59.4%) |
Tubal ligation | | | | 0.800
  Negative | 129 (54.7%) | 26 (53.1%) | 103 (55.1%) |
  Positive | 107 (45.3%) | 23 (46.9%) | 84 (44.9%) |
Diabetes | | | | 0.829
  Negative | 225 (95.3%) | 47 (95.9%) | 178 (95.2%) |
  Positive | 11 (4.7%) | 2 (4.1%) | 9 (4.8%) |
Hypertension | | | | 0.379
  Negative | 192 (81.4%) | 42 (85.7%) | 150 (80.2%) |
  Positive | 44 (18.6%) | 7 (14.3%) | 37 (19.8%) |
FIGO stage | | | | <0.001
  I | 133 (56.4%) | 15 (30.6%) | 118 (63.1%) |
  II | 87 (36.8%) | 20 (40.8%) | 67 (35.8%) |
  III | 16 (6.8%) | 14 (28.6%) | 2 (1.1%) |
  IV | 0 (0%) | 0 (0%) | 0 (0%) |
LVSI | | | | 1.000
  Negative | 3 (1.3%) | 0 (0%) | 3 (1.6%) |
  Positive | 233 (98.7%) | 49 (100%) | 184 (98.4%) |
Histological subtypes | | | | 0.163
  Adenocarcinoma | 28 (11.9%) | 3 (6.1%) | 25 (13.4%) |
  SCC | 208 (88.1%) | 46 (93.9%) | 162 (86.6%) |

Clinical characteristics of cervical cancer patients.

Values are expressed as the number of patients (percentages) or mean ± SD. P values refer to the results of independent samples t test and chi-square test.

N, number of individuals; LNM, lymph node metastasis; BMI, body mass index; FIGO, International Federation of Gynecology and Obstetrics; LVSI, lymphovascular space invasion; SCC, squamous cell carcinoma.

Table 2 shows the basic descriptive statistics of the hematological indicators in CC patients, as well as the results of the independent sample t test and the Mann-Whitney U test. In the univariate analyses, 8 hematological indicators, including SCC-Ag, DD, HGB, PAB, TP, ALB, TCHO, and LDL, were significantly different between the LNM group and the non-LNM group (P < 0.05). These results were based on the raw data analysis of 236 CC patients, whereas the feature selection was based on the processed data. Class balancing of the training data by using SMOTE resulted in 130 CC patients with LNM and 130 CC patients without LNM in the training set.

Table 2

Characteristics | Total (N = 236) | LNM (N = 49) | Non-LNM (N = 187) | P value
SCC-Ag | 1.70 [3.93] | 6.00 [12.83] | 1.40 [2.71] | <0.001
PT | 10.83 ± 0.59 | 10.79 ± 0.57 | 10.84 ± 0.60 | 0.605
PT-INR | 0.94 ± 0.05 | 0.93 ± 0.05 | 0.94 ± 0.05 | 0.318
PTA | 111.55 ± 10.78 | 110.95 ± 10.89 | 111.71 ± 10.77 | 0.661
TT | 18.27 ± 1.34 | 18.35 ± 1.59 | 18.25 ± 1.26 | 0.703
APTT | 25.24 ± 2.12 | 24.94 ± 2.03 | 25.31 ± 2.14 | 0.270
FIB | 2.93 ± 0.79 | 2.83 ± 0.71 | 2.95 ± 0.81 | 0.349
DD | 0.33 [0.32] | 0.47 [0.49] | 0.32 [0.25] | 0.003
WBC | 5.92 ± 1.73 | 5.55 ± 1.39 | 6.01 ± 1.80 | 0.092
NEUT | 3.23 [1.59] | 3.03 [2.04] | 3.26 [1.47] | 0.222
LYM | 1.84 ± 0.59 | 1.72 ± 0.54 | 1.87 ± 0.61 | 0.117
MON | 0.40 [0.15] | 0.41 [0.13] | 0.40 [0.16] | 0.359
NEUT % | 58.22 ± 9.34 | 58.29 ± 10.38 | 58.21 ± 9.08 | 0.959
LYM % | 31.96 ± 8.99 | 32.02 ± 9.86 | 31.94 ± 8.79 | 0.955
MON % | 7.16 ± 2.90 | 7.35 ± 1.85 | 7.11 ± 3.12 | 0.613
HGB | 121.21 ± 14.49 | 117.00 ± 16.17 | 122.31 ± 13.86 | 0.022
PLT | 243.90 ± 65.46 | 241.86 ± 61.43 | 244.44 ± 66.63 | 0.807
MPV | 10.80 [1.48] | 10.70 [1.65] | 10.80 [1.50] | 0.423
PCT | 0.26 ± 0.07 | 0.26 ± 0.07 | 0.26 ± 0.07 | 0.652
PDW | 12.70 [2.88] | 12.50 [2.90] | 12.70 [2.90] | 0.429
PLCR | 32.14 ± 9.21 | 31.11 ± 9.75 | 32.41 ± 9.07 | 0.378
ALT | 14.00 [5.00] | 14.00 [6.00] | 14.00 [5.00] | 0.475
AST | 20.06 ± 11.41 | 21.22 ± 14.36 | 19.76 ± 10.52 | 0.425
PAB | 264.15 ± 59.61 | 246.35 ± 48.24 | 268.81 ± 61.51 | 0.019
TP | 68.39 ± 4.99 | 67.00 ± 4.82 | 68.76 ± 4.99 | 0.028
ALB | 42.31 ± 3.46 | 41.30 ± 3.00 | 42.57 ± 3.53 | 0.022
GLB | 26.08 ± 4.57 | 25.69 ± 4.69 | 26.18 ± 4.55 | 0.508
A/G | 1.60 [0.50] | 1.60 [0.35] | 1.60 [0.50] | 0.892
TCHO | 4.30 ± 0.91 | 4.01 ± 0.76 | 4.38 ± 0.93 | 0.011
TG | 1.11 [0.91] | 1.12 [0.80] | 1.11 [0.98] | 0.830
HDL | 1.13 ± 0.26 | 1.07 ± 0.23 | 1.14 ± 0.26 | 0.101
LDL | 2.57 ± 0.64 | 2.41 ± 0.54 | 2.61 ± 0.66 | 0.032
Cys C | 0.80 [0.20] | 0.80 [0.23] | 0.80 [0.20] | 0.994
CRP | 1.60 [2.21] | 1.50 [2.66] | 1.60 [2.20] | 0.643
SOD | 170.97 ± 24.48 | 166.20 ± 17.46 | 172.22 ± 25.90 | 0.126

Association between hematologic indicators and lymph node metastasis status.

Values are expressed as mean ± SD or median [interquartile range (IQR)]. P values refer to the results of the independent samples t test or the Mann-Whitney U test. Bold values indicate a P value of less than 0.05.

N, number of individuals; LNM, lymph node metastasis; SCC-Ag, squamous cell carcinoma antigen; PT, prothrombin time; PT-INR, international normalized ratio of prothrombin time; PTA, prothrombin activity; TT, thrombin time; APTT, activated partial thromboplastin time; FIB, fibrinogen; DD, D-dimer; WBC, white blood cell; NEUT, neutrophils; LYM, lymphocytes; MON, monocytes; NEUT %, percentage of neutrophils; LYM %, percentage of lymphocytes; MON %, percentage of monocytes; HGB, hemoglobin; PLT, platelets; MPV, mean platelet volume; PCT, the product of MPV and PLT; PDW, platelet distribution width; PLCR, platelet large cell ratio; ALT, alanine aminotransferase; AST, aspartate aminotransferase; PAB, prealbumin; TP, total protein; ALB, albumin; GLB, globulin; A/G, albumin to globulin ratio; TCHO, total cholesterol; TG, triglyceride; HDL, high-density lipoprotein; LDL, low-density lipoprotein; Cys C, cystatin C; CRP, C-reactive protein; SOD, superoxide dismutase.

Feature selection

In this study, the LASSO feature selection technique was applied to select 21 features from the 35 features in the training dataset. Figure 2 illustrates the features selected by LASSO and their estimated coefficients. The 21 selected hematological indicators, ranked by coefficient from high to low, were TCHO, SCC-Ag, DD, FIB, NEUT %, CRP, LYM %, AST, APTT, TT, TP, ALT, PLCR, GLB, PTA, WBC, MON %, PAB, SOD, HGB and Cys C. The absolute value of each coefficient reflects the feature importance of the corresponding hematological indicator.
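The ranking step described above can be sketched in a few lines, assuming the fitted coefficients are available as a name-to-value mapping. The indicator names and coefficient values below are placeholders invented for illustration; they are not the coefficients reported in Figure 2.

```python
# Hypothetical LASSO coefficients (made-up values, not taken from Figure 2).
coefficients = {
    "indicator_a": -0.31,
    "indicator_b": 0.27,
    "indicator_c": 0.12,
    "indicator_d": 0.0,   # shrunk to zero by the L1 penalty, so not selected
    "indicator_e": -0.02,
}

# Keep features with non-zero coefficients, then rank by absolute value so that
# strongly negative and strongly positive coefficients both count as important.
selected = {name: c for name, c in coefficients.items() if c != 0.0}
ranking = sorted(selected, key=lambda name: abs(selected[name]), reverse=True)
print(ranking)
```

Ranking on the absolute value matters because the sign of a coefficient only indicates the direction of the association, not its strength; the comparison is meaningful only when the features were put on a common scale before fitting.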

Figure 2

Establishment and evaluation of machine learning models

The results of ten-fold cross-validation in the training set show that the RF model outperformed the other five machine learning models (AdaBoost, GNB, LR, SVM, and XGBoost) in accuracy, specificity, sensitivity, F1-score and AUC (Table 3). The specific performance indicators of the RF model were AUC (0.910, 95% confidence interval [CI]: 0.820–1.000) (Figure 3), accuracy (0.831, 95% CI: 0.702–0.960), specificity (0.835, 95% CI: 0.708–0.962), sensitivity (0.831, 95% CI: 0.702–0.960), and F1-score (0.829, 95% CI: 0.696–0.962).

Table 3

Model | Accuracy | Specificity | Sensitivity | F1-score | PPV | NPV | AUC
AdaBoost | 0.785 (0.145) | 0.797 (0.147) | 0.785 (0.145) | 0.782 (0.146) | 1.000 (0.000) | 0.992 (0.000) | 0.831 (0.130)
GNB | 0.611 (0.081) | 0.687 (0.101) | 0.612 (0.081) | 0.558 (0.104) | 0.794 (0.024) | 0.793 (0.034) | 0.786 (0.122)
LR | 0.719 (0.097) | 0.723 (0.100) | 0.719 (0.097) | 0.718 (0.098) | 0.796 (0.024) | 0.760 (0.023) | 0.793 (0.102)
RF | 0.831 (0.129) | 0.835 (0.127) | 0.831 (0.129) | 0.829 (0.133) | 1.000 (0.000) | 0.986 (0.007) | 0.910 (0.090)
SVM | 0.719 (0.100) | 0.731 (0.107) | 0.719 (0.100) | 0.716 (0.101) | 0.663 (0.011) | 0.763 (0.024) | 0.782 (0.124)
XGBoost | 0.826 (0.129) | 0.832 (0.129) | 0.827 (0.129) | 0.826 (0.129) | 1.000 (0.000) | 0.992 (0.000) | 0.901 (0.107)

Ten-fold cross-validated predictive performance of the six models in the training set.

Values are expressed as mean (standard deviation).

AdaBoost, Adaptive Boosting; GNB, Gaussian Naive Bayes; LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machines; XGBoost, Extreme Gradient Boosting; PPV, Positive Predictive Value; NPV, Negative Predictive Value; AUC, Area under receiver operating characteristic curve.

Figure 3

Figure 4 shows the ROC curves of the six machine learning models for predicting CC LNM in the testing set. Among them, RF had the highest AUC (AUC = 0.854), the key metric for assessing the performance of predictive models, and this value was significantly higher than those of the other five models (all P values < 0.05, DeLong test). In the testing set, the accuracy, specificity, sensitivity, F1-score, and AUC of the RF model were all above 0.8, and the RF model showed the best performance among the six machine learning algorithms (Table 4). Therefore, the RF model was determined to be the best model in this study.

Figure 4

Table 4

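Model | Accuracy | Specificity | Sensitivity | F1-score | PPV | NPV | AUC
AdaBoost | 0.577 | 0.491 | 0.929 | 0.642 | 0.309 | 0.966 | 0.729
GNB | 0.761 | 0.789 | 0.643 | 0.709 | 0.429 | 0.900 | 0.707
LR | 0.789 | 0.825 | 0.643 | 0.723 | 0.474 | 0.904 | 0.777
RF | 0.817 | 0.807 | 0.857 | 0.831 | 0.523 | 0.958 | 0.854
SVM | 0.648 | 0.579 | 0.929 | 0.713 | 0.351 | 0.971 | 0.776
XGBoost | 0.774 | 0.754 | 0.857 | 0.802 | 0.500 | 0.957 | 0.842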

Predictive performance of six machine learning models in the testing set.

AdaBoost, Adaptive Boosting; GNB, Gaussian Naive Bayes; LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machines; XGBoost, Extreme Gradient Boosting; PPV, Positive Predictive Value; NPV, Negative Predictive Value; AUC, Area under receiver operating characteristic curve.

Discussion

In this study, six machine learning models were used to predict LNM status in CC patients. The machine learning models were based on a variety of preoperative hematological indicators, including routine blood indicators, routine biochemical indicators, coagulation function indicators, and tumor markers. The results of ten-fold cross-validation showed that the overall prediction performance of the RF model was better than that of the other five models, thus indicating that the model had the best stability.

In recent years, ML techniques have been widely used to identify LNM in CC patients. For example, Liu et al. (34) collected clinical features and MRI radiomics features of 180 CC patients and established 7 ML models. The results showed that among the 7 ML models, Multinomial Naive Bayes (MNB) had the most robust predictive performance, with an AUC of 0.745, an accuracy of 0.778, and a specificity of 0.900. Compared with the present study, that model leaves room for improvement in predictive accuracy, and its MRI-based method is more costly and time-consuming. Guan et al. (35) collected preoperative 5-minute electrocardiograms from 292 CC patients and developed 6 ML models based on 32 heart rate variability parameters. The results showed that among the 6 ML models, the RF model had the best predictive performance (AUC of 0.852, accuracy of 0.744, sensitivity of 0.783 and specificity of 0.785). In comparison, the RF model characterized by hematological parameters in this study showed improved AUC, accuracy, sensitivity and specificity (AUC of 0.854, accuracy of 0.817, sensitivity of 0.857 and specificity of 0.807).

To improve the interpretability of the machine learning models, we used the LASSO coefficients to represent the feature importance of each hematological indicator. Higher feature importance indicates that the feature is more useful for predicting CC LNM. In this study, TCHO showed the highest feature importance. Increased serum TCHO levels have been reported to be a risk factor for the development of certain cancers, and serum TCHO levels have been associated with LNM in a variety of cancers, such as esophageal cancer, gastric cancer and pancreatic cancer. Sako et al. (36) found that TCHO levels in esophageal cancer patients with LNM were significantly higher than those in patients without LNM. Wu et al. (37) demonstrated that TCHO levels in pancreatic cancer patients were significantly correlated with tumor grade and LNM. Kitayama et al. (38) reported that patients with early gastric cancer who suffered from hypercholesterolemia (TCHO ≥ 220 mg/dl) had a significantly higher rate of LNM. It has been shown that T lymphocytes play a major role in killing malignant cells, but their activity is influenced by the tumor microenvironment. High cholesterol levels upregulate the expression of immune checkpoints in T lymphocytes, which weakens the anti-tumor function of T cells (39). In addition, Mahmoud et al. (40) found that prostate cancer cells store cholesterol and use it as energy for growth. Therefore, it is possible that elevated levels of TCHO promote malignant tumor growth and thus LNM. To the best of our knowledge, there are no previous studies on the correlation between LNM and TCHO in CC. The results of this study showed that TCHO levels in CC patients were significantly associated with LNM. However, the exact mechanism of TCHO as a predictor of LNM in CC patients is unclear and requires further study.

In this study, SCC-Ag was ranked second in terms of feature importance. SCC-Ag is a specific antigen produced by squamous cell carcinoma (SCC) that has good application value for predicting LNM in cervical cancer derived from squamous cells (41, 42). Preoperative serum SCC-Ag is the tumor marker that is commonly used to predict squamous cell CC LNM (43, 44). Previous studies have suggested that high preoperative SCC-Ag levels may be associated with CC LNM (45–47). Wei et al. (48) found that cancer-associated fibroblasts (CAFs) in patients with cervical squamous cell carcinoma impaired lymphatic endothelial barriers by activating the integrin-FAK/Src-VE-cadherin signaling pathway in lymphatic endothelial cells, consequently enhancing CC LNM.

In this study, coagulation function indicators (such as DD and FIB) also showed high feature importance. Previous studies have indicated that the coagulation function of patients with malignant tumors exhibits different degrees of abnormality (49–51). This may be related to tumor cells causing changes in coagulation function through various pathways to promote tumor growth, infiltration, and metastasis (52). Similarly, the hyperactivation of the coagulation system in CC patients can promote LNM development (53, 54). In this study, the univariate analysis confirmed that the DD levels of CC patients with LNM were significantly higher than those of CC patients without LNM (P = 0.003). Notably, in our study, hematological indicators such as TT, APTT, PT-INR, TP, and NEUT % were also confirmed to contribute to the construction of machine learning models. However, the specific mechanism of the above-mentioned indicators as predictors of LNM in CC patients is unclear, and further studies are warranted. Furthermore, it has been suggested that some hematological parameters that were not used in this study, such as carbohydrate antigen 125 (CA125), carbohydrate antigen 19-9 (CA19-9), α-fetoprotein (AFP), and alkaline phosphatase (ALP), may also be associated with LNM in CC patients, which may provide a feasible direction for future research (55).

However, this study also had some limitations. First, the present study was a retrospective analysis from a single center with a relatively small sample size. Therefore, further validation of the predictive models will need to be conducted in a larger multicenter study to establish the robustness of the current findings. Second, hematological indicators are always affected to varying degrees by testing equipment and testing reagents. Thus, hematological indicators will need to be collected under different conditions in the future to verify the generalizability of the predictive model. Third, CC often occurs in remote areas with limited medical care, leading to some difficulties in collecting the required hematological parameters (e.g., SCC-Ag). In the future, there will be a need to use fewer hematological indicators for modeling while maintaining the performance of the ML model, to improve the usability of the ML model in most areas.

Conclusion

In conclusion, we established six machine learning models based on preoperative hematological indicators for the preoperative prediction of LNM status in CC patients. Ten-fold cross-validation showed that the RF model had higher stability. The higher AUC value of the RF model in the testing set indicates better generalization performance. Our results suggest that the RF model based on preoperative hematological indicators has great potential in clinical practice. Through further validation and refinement, the RF model has the potential to help develop more effective treatment plans for cervical cancer patients through preoperative diagnosis.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

Ethics statement

This retrospective study was approved by the Clinical Medical Research Ethics Committee of The First Affiliated Hospital of Bengbu Medical University (Bengbu, Anhui, China) (registration number: 2021KY010). The studies were conducted in accordance with the local legislation and institutional requirements. The human samples used in this study were acquired as part of a previous study for which ethical approval was obtained. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

HZ: Formal analysis, Writing – original draft. YLW: Data curation, Writing – original draft. YS: Data curation, Writing – original draft. YQW: Writing – original draft. BS: Conceptualization, Methodology, Writing – review & editing. JL: Writing – review & editing. SZ: Conceptualization, Methodology, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the “512” Outstanding Talents Fostering Project of Bengbu Medical University (grant number BY51201312), the Natural Science Research Project of Anhui Educational Committee (grant number 2022AH051471), and the Research Project of Bengbu Medical University (grant number 2021byzd057).

Acknowledgments

We thank all of the patients who provided their clinically relevant data for this study, as well as the surgical teams who facilitated this work.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


Summary

Keywords

cervical cancer, lymph node metastasis, machine learning, hematological indicators, preoperative prediction

Citation

Zhao H, Wang Y, Sun Y, Wang Y, Shi B, Liu J and Zhang S (2024) Hematological indicator-based machine learning models for preoperative prediction of lymph node metastasis in cervical cancer. Front. Oncol. 14:1400109. doi: 10.3389/fonc.2024.1400109

Received

13 March 2024

Accepted

29 July 2024

Published

13 August 2024

Volume

14 - 2024

Edited by

Giuseppe Vizzielli, University of Udine, Italy

Reviewed by

Cristina Taliento, University of Ferrara, Italy

Veronica Tius, University of Udine, Italy

*Correspondence: Sai Zhang; Jian Liu

†These authors have contributed equally to this work
