A comprehensive approach for osteoporosis detection through chest CT analysis and bone turnover markers: harnessing radiomics and deep learning techniques

Purpose The main objective of this study is to assess the possibility of using radiomics, deep learning, and transfer learning methods for the analysis of chest CT scans. An additional aim is to combine these techniques with bone turnover markers to identify and screen for osteoporosis in patients. Method A total of 488 patients who had undergone chest CT and bone turnover marker testing, and had known bone mineral density, were included in this study. ITK-SNAP software was used to delineate regions of interest, while radiomics features were extracted using Python. Multiple 2D and 3D deep learning models were trained to identify these regions of interest. The effectiveness of these techniques in screening for osteoporosis in patients was compared. Result Clinical models based on gender, age, and β-cross achieved an accuracy of 0.698 and an AUC of 0.665. Radiomics models, which utilized 14 selected radiomics features, achieved a maximum accuracy of 0.750 and an AUC of 0.739. The test group yielded promising results: the 2D Deep Learning model achieved an accuracy of 0.812 and an AUC of 0.855, while the 3D Deep Learning model performed even better with an accuracy of 0.854 and an AUC of 0.906. Similarly, the 2D Transfer Learning model achieved an accuracy of 0.854 and an AUC of 0.880, whereas the 3D Transfer Learning model exhibited an accuracy of 0.740 and an AUC of 0.737. Overall, the application of 3D deep learning and 2D transfer learning techniques on chest CT scans showed excellent screening performance in the context of osteoporosis. Conclusion Bone turnover markers may not be necessary for osteoporosis screening, as 3D deep learning and 2D transfer learning techniques utilizing chest CT scans proved to be equally effective alternatives.


Background
Bone turnover markers (BTM) are biochemical substances produced during the dynamic process of bone remodeling, providing a timely and accurate reflection of bone turnover in the human body (1). These markers play a pivotal role in the diagnosis and treatment of osteoporosis, a prevalent bone disorder (2). However, the diagnosis of osteoporosis primarily relies on dual-energy X-ray absorptiometry (DXA) (3). Although there is some correlation between BTM and bone mineral density (BMD), this correlation is not robust enough for the diagnosis of osteoporosis (4). These limitations have hindered the widespread use of these markers. The diagnosis of osteoporosis currently requires quantitative CT (QCT) or DXA examinations, which add costs (5). Furthermore, the availability of these devices, particularly QCT, is limited in many medical centers. At the same time, the screening rate for osteoporosis remains unsatisfactory, likely indicating a lack of comprehensive understanding of the disease (6).
Chest CT is a crucial and commonly performed medical checkup. Regular chest CT scans are recommended for certain populations, particularly the elderly, who are considered to be at higher risk for lung cancer (7). If this examination can successfully diagnose osteoporosis, it may potentially eliminate the need for DXA scans, thereby reducing radiation exposure. Deep learning and radiomics make this idea feasible. Radiomics refers to the extraction of analyzable data from medical imaging, and it has been extensively applied to enhance the accuracy of medical diagnosis, prognosis, and clinical decision-making. Its application aims to achieve precise medical treatment (8). This technology has gained widespread adoption and its efficacy has been validated (9,10). Deep learning is also extensively utilized in the field of medicine. It is applied not only for disease diagnosis but also for the automatic segmentation of medical images (11). It has also been employed in the diagnosis of osteoporosis and has yielded promising outcomes. Previously reported studies primarily focused on analyzing 2D images such as lumbar and hip X-rays, using deep learning techniques to diagnose osteoporosis in patients (12). However, in medical imaging, three-dimensional images such as CT scans and MRI scans are more commonly used. In this regard, employing pretrained 3D deep learning models can significantly enhance the analysis of such medical images.
In this study, we aim to develop a comprehensive screening model for osteoporosis by integrating patient demographics and bone turnover markers. We also utilize radiomics techniques and both 2D and 3D deep learning algorithms to analyze chest CT scans and identify potential cases of osteoporosis. Transfer learning will be employed to extract deep features. Transfer learning enables the acquisition of valuable features from a source domain, encoding these features, and transferring them from the source domain to the target domain, thus effectively enhancing the performance of the target domain task (13).
This study aims to identify the best methods for osteoporosis screening utilizing chest CT scans. It will explore and compare various techniques including radiomics, 2D and 3D deep learning, and 2D and 3D transfer learning. Additionally, these methods will be compared to conventional bone turnover markers for their efficacy in osteoporosis screening.

Participants in the study and development of clinical models
This study retrospectively analyzed a population of patients who underwent both chest CT scans and DXA bone density testing at a large hospital from January 2019 to May 2023. Patients with the following conditions were excluded from the study: severe scoliosis; severe, uncorrectable compression fractures of both the eleventh and twelfth thoracic vertebrae; fixed artifacts affecting the feature extraction area; and no bone turnover marker results. Approval was obtained from the Hospital Institutional Review Board, and the study was conducted in compliance with the principles outlined in the Declaration of Helsinki. Almost all patients underwent chest CT scans and bone metabolism marker detection. The BTM included in the analysis were vitamin D, total type I collagen amino-terminal extension peptide (TPINP), and β-CrossLaps (β-CTX). The gold standard for distinguishing osteoporosis was the result of DXA, whereby a T-score of -2.5 or less indicated the presence of the condition (12).
The patients were randomly assigned to training and testing sets, and their baseline data are shown in Table 1. The clinical characteristics of the patients were analyzed using either an independent-samples t-test or a chi-square test, depending on the type of data.

Clinical signature
The training set data underwent initial univariate analysis to identify factors with a p-value less than 0.05, indicating their significance. These selected clinical factors were then used for subsequent multivariate analysis.

Delineation of ROI
The preprocessing step involved adjusting the window width and window level of the images to bone windows, as well as standardizing the resolution of the images. Furthermore, all images were adjusted to ensure consistent slice thickness and spacing. Radiomics feature extraction primarily focused on the twelfth thoracic vertebra. In cases where measurement difficulties were encountered, the eleventh thoracic vertebra was selected instead to minimize potential deviations. This approach considered the physical susceptibility of the thoracolumbar region, a common site for osteoporotic fractures resulting from movement and pressure (24). Axial images from the chest CT scans were chosen for analysis, and image reconstruction and delineation of the region of interest (ROI) were performed using ITK-SNAP software (25). Typically, the chest CT scans captured both vertebral bodies. Anatomical markers, such as the twelfth rib, the lower edge of the scapula, and the spinous process of the seventh cervical vertebra, were used to locate the segment of the thoracic spine for ROI delineation. The process of ROI drawing is illustrated in Figure 1A.
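The bone-window adjustment described above can be illustrated with a minimal sketch. The study does not report the window level and width it used, so the values below (level 300 HU, width 1500 HU) are assumptions chosen only for illustration, and the function name apply_window is hypothetical.

```python
def apply_window(hu_values, level, width):
    """Clip Hounsfield units to a display window and rescale them to [0, 1]."""
    if width <= 0:
        raise ValueError("window width must be positive")
    low = level - width / 2.0
    high = level + width / 2.0
    scaled = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)  # values outside the window saturate
        scaled.append((clipped - low) / (high - low))
    return scaled


# Assumed bone-window settings; the paper does not state the actual values.
BONE_LEVEL = 300
BONE_WIDTH = 1500

row = [-1000, -450, 300, 1050, 2000]  # one row of voxels, in HU
print(apply_window(row, BONE_LEVEL, BONE_WIDTH))
```

Applying the same window to every scan puts all images on a common intensity scale before feature extraction or model training.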

Intra- and inter-observer variability
The intra- and inter-observer variability of the ROI delineation on the CT images was evaluated using the intraclass correlation coefficient (ICC). One researcher defined the ROI, while another researcher with more than 10 years of experience in orthopedics randomly selected 30 cases and redefined the ROI. Both researchers were unaware of each other's results. The ICC values were calculated based on these 30 cases. Prior to feature selection, intra-observer variability was assessed on the extracted radiomics features of all patients. Features with ICCs exceeding 0.9 were deemed reliable and proceeded to subsequent analyses.
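As a rough illustration of the reliability check, the sketch below computes a single-rater, two-way random-effects, absolute-agreement ICC, often written ICC(2,1). The paper does not state which ICC form was used, so this choice is an assumption, and the function name icc_2_1 is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per case and one column per rater.

    Assumes at least two cases, at least two raters, and some variation
    in the data (otherwise the denominator is zero).
    """
    n = len(ratings)      # number of cases
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# One radiomics feature measured on four cases by two observers (toy data).
feature_by_observer = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
icc = icc_2_1(feature_by_observer)
print(icc, "reliable" if icc > 0.9 else "excluded")
```

A feature would be retained only if its ICC exceeds the 0.9 threshold stated in the text.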

Feature selection
To identify the features most relevant to the presence of osteoporosis, a stepwise feature selection process was implemented. Initially, the U test (p < 0.05) was employed to identify features that differed significantly between the osteoporosis and non-osteoporosis groups. Furthermore, to ensure the inclusion of only statistically significant and reliable features, those with ICCs lower than 0.9 were excluded at this step. This approach reduced the number of features while maintaining their predictive power. To address multicollinearity, Pearson correlation analysis was conducted to examine the relationships between features. Feature pairs with correlation coefficients ≥0.9 or ≤-0.9 were identified. In such cases, only the feature demonstrating superior diagnostic performance was retained, thereby preventing redundancy introduced by highly correlated features. The minimum redundancy maximum relevance (mRMR) algorithm was then applied, retaining only the top 20 most informative features. To further refine the feature set, least absolute shrinkage and selection operator (LASSO) logistic regression was employed. The same machine learning models used for analyzing clinical features were used to analyze the extracted radiomics features.
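The correlation-based redundancy step can be sketched as follows. This is a greedy illustration under stated assumptions, not the authors' implementation: the per-feature score used to decide which member of a correlated pair is kept (for example a univariate AUC) is hypothetical, as are the names pearson and drop_correlated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, scores, threshold=0.9):
    """Keep a feature only if |r| < threshold against every feature already kept.

    features: dict mapping feature name -> list of values (one per patient)
    scores:   dict mapping feature name -> assumed diagnostic score (higher is better)
    """
    # Visiting features from best to worst score keeps the stronger one of each pair.
    ordered = sorted(features, key=lambda name: scores[name], reverse=True)
    kept = []
    for name in ordered:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


toy_features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],   # almost a linear copy of "a"
    "c": [5, 1, 4, 2, 3],
}
toy_scores = {"a": 0.70, "b": 0.80, "c": 0.60}
print(drop_correlated(toy_features, toy_scores))
```

In this toy example "a" is dropped because it is nearly collinear with the higher-scoring "b", while "c" survives.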

2D deep learning
The slice with the maximum cross-sectional area of the ROI was selected first, as it represents the most prominent area of the thoracic vertebral body. These areas were then cropped from the original CT image using Python. The source code for the cropping process is open source and can be obtained from the CSDN website (https://blog.csdn.net/).
In this study, a pre-trained model was employed, and the researchers made no alterations to its parameters. Consequently, the study lacked a validation group, comprising solely a training group and a testing group (26). The division into these groups was the same as that used for the clinical and radiomics models.
The cropped image serves as the input for the deep learning algorithms. To update the model parameters, the stochastic gradient descent (SGD) optimizer is utilized. The training process consists of 100 epochs, each containing 1800 iterations. A batch size of 32 is used during these iterations. Each slice of the cropped image is treated as an independent input for the deep learning model.
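A minimal sketch of the slice selection and cropping described above is given here. It is not the code referenced on CSDN; it assumes the ROI is available as a binary mask with the same shape as the CT volume (slices, rows, columns), and the function names are hypothetical.

```python
def largest_roi_slice(mask):
    """Index of the axial slice whose ROI mask covers the most pixels."""
    areas = [sum(sum(row) for row in axial) for axial in mask]
    return max(range(len(areas)), key=areas.__getitem__)


def crop_to_roi(image, mask_slice):
    """Crop a 2D image to the bounding box of the nonzero mask pixels."""
    rows = [r for r, row in enumerate(mask_slice) if any(row)]
    cols = [c for c in range(len(mask_slice[0]))
            if any(row[c] for row in mask_slice)]
    r0, r1 = rows[0], rows[-1]
    c0, c1 = cols[0], cols[-1]
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


# Toy volume: 3 slices of 3 x 4 pixels; pixel value encodes its position.
volume = [[[z * 100 + r * 10 + c for c in range(4)] for r in range(3)]
          for z in range(3)]
mask = [
    [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
    [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]],
]

best = largest_roi_slice(mask)
patch = crop_to_roi(volume[best], mask[best])
print(best, patch)
```

If several slices share the same maximum area, this sketch returns the first of them.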

3D deep learning
The complete region of interest (ROI) is extracted and serves as both the training and testing dataset for the 3D deep learning model. To update the model parameters, the stochastic gradient descent (SGD) optimizer is utilized. The training process consists of 100 epochs, each containing 1800 iterations. A batch size of 4 is used during these iterations. Each slice of the cropped image is treated as an independent input for the deep learning model. The 8 most common deep learning neural network architectures are used for learning and recognizing images of these patients. These models are DenseNet121, ResNet10, ResNet101, ResNet152, ResNet18, ResNet34, ResNet50, and ShuffleNet. The parameter settings for 3D deep learning mirror those utilized in 2D deep learning.

Transfer learning extraction
After completing both 2D and 3D deep learning, the best-performing deep learning model was chosen for feature extraction.
Once the feature extraction was finalized, these features underwent a screening process identical to that used for radiomics features. Additionally, the same machine learning models were employed for training and testing these features.

Statistical analysis
The study assessed the efficacy of osteoporosis screening through a comparative analysis of radiomics models, deep learning models, transfer learning models, and clinical models. Ultimately, we identified the model that demonstrated the highest screening efficiency.
To evaluate the performance of each model, data from the test set were used. The effectiveness of the model was assessed using the area under the curve (AUC) (27), a commonly employed metric in evaluating the performance of predictive models. The AUC provides a comprehensive measure of the model's discriminatory ability and was used to determine the overall quality of the predictions made by the model.
The patients' baseline data were analyzed using SPSS (version 20.0) and Python. Continuous variables were presented as mean ± standard deviation, while categorical variables were described using frequencies and percentages. To assess the distribution of continuous variables, the Kolmogorov-Smirnov (KS) test (28) was employed. Additionally, the Levene test (29) was used to evaluate the homogeneity of variances of continuous variables. To compare inter-group differences, the Mann-Whitney U test or Student's t-test was used, depending on the distribution of the variables. For categorical variables, the chi-squared test or Fisher's exact test was employed. Statistical significance was defined as a p-value < 0.05. The AUC was used to evaluate the performance of predictive models, and the 95% confidence interval (CI) of the AUC was calculated using the bootstrap method with 1000 resamples. To compare the AUCs of different models, the DeLong test was applied, enabling a statistical assessment of the differences in performance between the models (30). The study compared the performance of radiomics features, transfer learning features, and clinical features across different machine learning models. The most effective model was then compared with the deep learning models in order to identify the optimal method for screening patients for osteoporosis using chest CT scans. Models were selected according to their performance in the test group, prioritizing accuracy and AUC.
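The AUC and its bootstrap confidence interval can be sketched as below. The paper does not say which bootstrap variant was used, so the percentile interval here is an assumption; the function names and the toy data are hypothetical, and labels are assumed to be coded 0 (no osteoporosis) and 1 (osteoporosis).

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Tied scores count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_resamples))]
    upper = stats[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lower, upper


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))               # 0.75
print(bootstrap_auc_ci(toy_labels, toy_scores))  # interval depends on the seed
```

Comparing two correlated AUCs with the DeLong test requires additional covariance estimates and is not covered by this sketch.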

Results
A total of 488 patients were included in the study and randomly divided into a training group and a testing group. In the training group, 170 patients did not have osteoporosis, while 222 patients were identified as having osteoporosis. Similarly, in the test group, 40 patients were found to be free from osteoporosis, while 56 patients were diagnosed with the condition. Figures 1B-E show the process of feature extraction for clinical models, radiomics models, and 2D/3D transfer learning models, respectively.

Screening of risk factors for osteoporosis
In the univariate analysis, the p-values of gender, age, vitamin D, TPINP, and β-CrossLaps were found to be less than 0.05. These indicators were subsequently chosen for the multivariate analysis. In the multivariate analysis, the p-values of gender, age, and β-CrossLaps remained below the 0.05 threshold. Based on these results, these indicators were selected as the foundation for establishing clinical models. The outcomes of both the univariate and multivariate analyses are presented in Table 2. Table 3 and Figure 2A display the performance of these features in the machine learning models. In the testing group, the AdaBoost model exhibited the highest performance. The accuracy of this model is 0.698, and the AUC is 0.665.

Establishment of radiomics model
A total of 1834 radiomics features were extracted for each patient. After feature selection, 14 features were included in the study. Table 4 and Figure 2B display the performance of these radiomics features in the machine learning models. Figure 2C shows the importance ranking of the selected radiomics features among six tree models. In the testing group, the LR model showed the best performance. The accuracy of this model in the test group is 0.750, and the AUC is 0.739.

Efficiency of 2D deep learning models
After performing image processing and inputting the data, a total of 24 2D deep learning models were employed to detect osteoporosis using the maximum cross-sectional area of the ROI in chest CT scans. These findings are summarized in Table 5 and visually presented in Figure 3. Additionally, the visualization results of the model can be observed in Figure 4. Among the various models tested for screening osteoporosis through chest CT, ResNet152 exhibited the best performance. The accuracy of this model in the test group is 0.812, and the AUC is 0.855.

Efficiency of 3D deep learning models
After performing image processing and inputting the data, a total of 8 3D deep learning models were employed to detect osteoporosis using the entire region of interest (ROI) in chest CT scans. These findings are summarized in Table 6 and visually presented in Figure 5. Among the various models tested for screening osteoporosis through chest CT, ResNet10 exhibited the best performance, with an accuracy of 0.854 and an AUC of 0.906 in the test group.

Extraction and efficiency of 2D transfer learning
ResNet152, the best-performing 2D deep learning model, was selected for feature extraction in 2D deep transfer learning based on the results of the previous step. In the testing group, the SVM model showed the best performance. The accuracy of this model in the test group is 0.854, and the AUC is 0.880. Table 7 and Figure 2D display the performance of these features.

Extraction and efficiency of 3D transfer learning
ResNet10, the best-performing 3D deep learning model, was selected for feature extraction in 3D deep transfer learning based on the results of the previous step. In the testing group, the MLP model showed the best performance. The accuracy of this model in the test group is 0.740, and the AUC is 0.737. Table 8 and Figure 2E display the performance of these features.

Comparison of the effectiveness of screening osteoporosis
Table 9 presents the comparison of the effectiveness of various features in screening for osteoporosis among machine learning models. In the LR and AdaBoost models, the radiomics features were found to be more effective in screening for osteoporosis than clinical features. However, in the other models, there was no statistically significant difference between the effectiveness of the two feature types. On the other hand, the 3D transfer learning model was not superior to clinical and radiomics features in any of the models. Furthermore, in all models, the 2D transfer learning features were superior to clinical features in screening for osteoporosis. Additionally, 2D transfer learning was found to be superior to 3D transfer learning in all models. Moreover, in seven models (LR, NaiveBayes, SVM, KNN, LightGBM, GradientBoosting, and MLP), 2D transfer learning was superior to radiomics features in screening for osteoporosis.

Assessing the effectiveness of the optimal machine learning model and deep learning technology for osteoporosis screening
The optimal machine learning models for screening osteoporosis based on each feature type were chosen as the reference for comparison with deep learning techniques. When utilizing clinical features for osteoporosis screening, the AdaBoost model demonstrates the highest performance. The LR model, on the other hand, shows the best performance when employing radiomics features. For 2D transfer learning, the SVM model performs best, while for 3D transfer learning, the MLP model shows the highest performance. Among the various 2D deep learning models, ResNet152 exhibited the best performance. Among the various 3D deep learning models, ResNet10 exhibited the best performance. The comparison between these models is presented in Table 10. In the test group, the AUC did not differ significantly between the 2D deep learning and 3D deep learning methods. However, the AUC of 3D deep learning was significantly better than that of the clinical models, radiomics models, and 3D transfer learning models. There was no statistical difference in AUC between 3D deep learning and the 2D transfer learning models. The AUC of 2D deep learning was superior to that of the clinical models, but there was no statistically significant difference between the AUC of 2D deep learning and that of the radiomics models, 2D transfer learning models, or 3D transfer learning models.

Discussion
Our study provides initial evidence supporting the potential of using chest CT for osteoporosis screening. Moreover, we observed that deep learning and transfer learning based on chest CT are more effective than bone turnover markers for screening osteoporosis. Typically, in a tertiary hospital in China, the cost of a chest CT examination is around $26, while a DXA examination costs approximately $23. On the other hand, a bone turnover marker examination is priced at around $48. Osteoporosis is a silent and widespread condition, making screening crucial for identifying potential patients early on (31). Our findings suggest that conducting bone turnover marker testing solely for the purpose of osteoporosis screening may not be necessary. On the other hand, chest CT scans serve multiple purposes such as lung tumor screening and exclusion of pneumonia (32). Elderly individuals and women, particularly Asian women, are disproportionately susceptible to lung cancer. Consequently, some experts advocate for the inclusion of chest CT scans as part of routine health screenings for these groups (33). Interestingly, our study also found a correlation between age, gender, and osteoporosis, which coincides with the population commonly advised to undergo regular chest CT examinations. Older age and female gender have been consistently identified as risk factors for osteoporosis in various studies (34). Specifically, postmenopausal women in the older age group are considered a high-risk population for this condition (35). Regular chest CT examinations are often recommended for individuals in this group (36). As DXA screening for osteoporosis has not been widely adopted due to limited awareness regarding the risks associated with osteoporosis (37), utilizing chest CT for osteoporosis screening can not only benefit potential patients but also help save a substantial amount of money for medical insurance funds. By combining osteoporosis screening with routine chest CT scans, we can effectively identify at-risk individuals and allocate resources more efficiently.
There are various reasons why bone turnover markers cannot be used for osteoporosis screening. In our study, we examined three different BTM: vitamin D, TPINP, and β-CrossLaps. Initially, in the univariate regression analysis, all three markers were found to be associated with the occurrence of osteoporosis. However, in the subsequent multivariable analysis, only β-CrossLaps showed a significant relationship with the occurrence of osteoporosis, along with age and gender. Sufficient levels of vitamin D have been shown to enhance the absorption of calcium and facilitate the process of bone mineralization (38). Vitamin D deficiency is a prevalent issue that warrants attention, and it is not limited to individuals with osteoporosis (39). Furthermore, many osteoporosis patients are
This also provides a preliminary screening for the presence of osteoporosis for patients who undergo regular chest CT examinations. The first features to be used were extracted through radiomics methods. As a newly developed technology, computed tomography (CT) radiomics has the ability to identify radiomic features that are challenging to recognize visually. This approach offers a convenient, comprehensive, and accurate method for diagnosing osteoporosis (44). In our study, the radiomics model demonstrated an accuracy of 0.750 and an AUC of 0.739 for recognizing osteoporosis, with a 95% confidence interval of 0.6321-0.8456. Compared to clinical models, radiomics models have shown better potential for osteoporosis screening in some machine learning models. However, compared with previous studies, such as one using HU values on chest CT to screen for osteoporosis with an accuracy of 0.831 and an AUC of 0.972 (45), the effectiveness of chest CT radiomics for osteoporosis screening in this study is still unsatisfactory.
Deep learning technology has emerged as a valuable tool for the diagnosis of osteoporosis, with numerous studies demonstrating its effectiveness (46). In our study, we employed both 2D and 3D deep learning models to screen for osteoporosis using chest CT scans. Specifically, we utilized 24 widely used 2D deep learning models and 8 commonly used 3D deep learning models. The 2D and 3D deep learning models based on chest CT scans screened for osteoporosis significantly more effectively than the clinical models that rely on bone turnover markers. While there was no statistically significant difference in performance between 2D and 3D deep learning models in the test group, the 3D deep learning model outperformed the radiomics models. Feature extraction through 2D transfer learning has been shown to improve the effectiveness of disease prediction (47). In our study, the 2D transfer learning model showed good performance, with an accuracy of 0.854 and an AUC of 0.880. However, it is worth noting that the 3D transfer learning model did not demonstrate a better AUC, possibly
Recognizing several limitations of this study is of utmost importance. Firstly, the absence of external validation is a noteworthy concern and should be given priority in future research efforts. Secondly, it is worth noting that the ROI

FIGURE 2
Effectiveness of radiomics models, clinical models, 2D transfer learning models, and 3D transfer learning models. (A) Effectiveness of clinical models. (A.1) Utilizing Machine Learning Models for Osteoporosis Screening Based on Clinical Features. (A.2) Assessing the Accuracy of Machine Learning Models for Osteoporosis Screening using Clinical Features. (A.3) Evaluation of Machine Learning Models in the Testing Group, Leveraging Clinical Features for Osteoporosis Screening: AUC and Sensitivity Analysis. (B) Effectiveness of radiomics models. (B.1) Utilizing Machine Learning Models for Osteoporosis Screening Based on Radiomics Features. (B.2) Assessing the Accuracy of Machine Learning Models for Osteoporosis Screening using Radiomics Features. (B.3) Evaluation of Machine Learning Models in the Testing Group, Leveraging Radiomics Features for Osteoporosis Screening: AUC and Sensitivity Analysis. (C) Weights of radiomics features in tree models. (D) Effectiveness of 2D Transfer Learning Model. (D.1) Utilizing Machine Learning Models for Osteoporosis Screening Based on 2D Transfer Learning. (D.2) Assessing the Accuracy of Machine Learning Models for Osteoporosis Screening using 2D Transfer Learning. (D.3) Evaluation of Machine Learning Models in the Testing Group, Leveraging 2D Transfer Learning for Osteoporosis Screening: AUC and Sensitivity Analysis. (E) Effectiveness of 3D Transfer Learning Model. (E.1) Utilizing Machine Learning Models for Osteoporosis Screening Based on 3D Transfer Learning. (E.2) Assessing the Accuracy of Machine Learning Models for Osteoporosis Screening using 3D Transfer Learning. (E.3) Evaluation of Machine Learning Models in the Testing Group, Leveraging 3D Transfer Learning for Osteoporosis Screening: AUC and Sensitivity Analysis.

FIGURE 3
Effectiveness of 2D deep learning models.

FIGURE 4
Visualization of 2D deep learning models.

FIGURE 5
Effectiveness of 3D deep learning models in test group.

TABLE 1
The baseline clinical characteristics of patients.

TABLE 2
Screening of risk factors for osteoporosis and establishment of clinical models.
TPINP, total type I collagen amino-terminal extension peptide; β-CTX, β-CrossLaps.

TABLE 3
Effectiveness of clinical model.

TABLE 4
Effectiveness of radiomics model.
β-CrossLaps is a marker that indicates the level of bone resorption activity by osteoclasts (42). It is widely recognized as the most effective bone turnover marker for identifying the presence of osteoporosis in patients (43). In our research, β-CrossLaps is believed to indicate the occurrence of osteoporosis and forms a clinical model together with two other variables, gender and age. However, the clinical efficacy of this model is not entirely satisfactory, with an accuracy of 0.698 and an AUC of 0.665. In contrast, the imaging features obtained through chest CT greatly improve the accuracy of identifying osteoporosis.

TABLE 8 Continued
due to the overfitting phenomenon caused by the recognition of excessive image information by the extracted 3D deep learning model (47). The development of enhanced deep learning models based on 3D medical images holds the potential to mitigate this phenomenon. Although 2D transfer learning models demonstrate better AUC when compared to standard 2D deep learning models, they do not significantly outperform 3D deep learning models. Therefore, we suggest that both 3D deep learning and 2D transfer learning should be prioritized when utilizing chest CT scans for osteoporosis screening.

TABLE 9
Comparison of the effectiveness of screening osteoporosis through clinical models and radiomics, 2D deep learning features, and 3D deep learning features by DeLong test.

TABLE 10
Comparison of the optimal machine learning model and deep learning technology for osteoporosis screening by DeLong test.
delineation utilized in this study employed a combination of manual and semi-automatic methods. In conclusion, our study indicates that bone turnover markers may not be necessary for osteoporosis screening. Instead, a combination of 3D deep learning and 2D transfer learning techniques based on chest CT scans can be considered as effective alternatives for osteoporosis screening.