A radiomics approach to the diagnosis of femoroacetabular impingement

Introduction: Femoroacetabular impingement (FAI) is a hip pathology characterized by impingement of the femoral head-neck junction against the acetabular rim, due to abnormalities in bone morphology. FAI is normally diagnosed by manual evaluation of morphologic features on magnetic resonance imaging (MRI). In this study, we assess, for the first time, the feasibility of using radiomics to detect FAI by automatically extracting quantitative features from images. Material and methods: Seventeen patients diagnosed with unilateral FAI underwent pre-surgical MR imaging, including a 3D Dixon sequence of the pelvis. An expert radiologist drew regions of interest on the water-only Dixon images outlining the femur and acetabulum in both joints with impingement (IJ) and healthy joints (HJ). 182 radiomic features were extracted for each hip. The size of the dataset was increased 60-fold with an ad-hoc data augmentation tool. Features were subdivided by type and region into 24 subsets. For each subset, a univariate ANOVA F-value analysis was applied to find the 5 features most correlated with IJ based on p-value, yielding 24 additional subsets for a total of 48 subsets. For each subset, a K-nearest neighbor model was trained to differentiate between IJ and HJ using the values of the radiomic features in the subset as input. The training was repeated 100 times, randomly subdividing the data with a 75%/25% training/testing split. Results: The texture-based gray level features yielded the highest maximum prediction accuracy (0.972) with the smallest subset of features. This suggests that the gray image values are more homogeneously distributed in the HJ than in the IJ, which could be due to stress-related inflammation resulting from impingement. Conclusions: We showed that radiomics can automatically distinguish IJ from HJ using water-only Dixon MRI. To our knowledge, this is the first application of radiomics for FAI diagnosis. We report an accuracy greater than 97%, which is higher than the 90% accuracy for detecting FAI reported for standard diagnostic tests. Our proposed radiomic analysis could be combined with methods for automated joint segmentation to rapidly identify patients with FAI, avoiding time-consuming radiological measurements of bone morphology.

Introduction
Femoroacetabular impingement (FAI) is a common cause of hip pain in young adults with an estimated incidence of 54.4 per 100,000 person-years (1). FAI is characterized by impingement of the femoral head-neck junction against the acetabular rim during hip joint motion due to morphologic abnormalities of the proximal femur and acetabulum (2)(3)(4). There are two distinct pathoanatomic types of FAI, although mixed types are commonly detected at arthroscopy (5). Cam FAI is caused by decreased offset and asphericity of the femoral head-neck junction, while Pincer FAI is due to focal or generalized acetabular over-coverage (3,4). Although the natural history of FAI is unknown, early diagnosis and appropriate surgical treatment of the condition have been shown to reduce symptoms and improve function, at least in the short term (6).
Imaging plays an important role in the diagnosis of FAI as distinguishing the disorder from other causes of hip pain is challenging using clinical history and physical examination (7). Quantitative measures of bone shape on radiographs including the alpha angle for Cam impingement and the center edge angle for Pincer impingement are typically used for the initial diagnosis of FAI (3,4). However, radiographic measures of bone shape may be influenced by technical factors during image acquisition (8)(9)(10), and three-dimensional (3D) bone morphology may not be reliably assessed on two-dimensional (2D) radiographs (11,12). Thus, computed tomography (CT) is commonly used for pre-operative planning to provide the most accurate assessment of 3D bone shape (3,4). While CT provides high spatial resolution and excellent tissue contrast for evaluating bone, it may result in potentially harmful ionizing radiation exposure to the pelvis (13).
Recent literature (3,4,14,15) has focused on 3D MR imaging for FAI diagnosis, which can enable radiologists to detect the typical osseous pathological conditions in FAI with accuracy, sensitivity, and specificity around 90% (3,4). These analyses are usually based on metrics derived from the shape of the hip structures or from range of motion simulations of the hip joint (6,7,15-17).
Radiomics has gained increasing popularity in recent years as a diagnostic image analysis method to predict and characterize a wide variety of pathologic conditions (18)(19)(20)(21)(22). Radiomics involves the high-throughput extraction of quantitative features from medical imaging studies such as CT and MRI (19-21). The assumption of radiomics is that image features quantify crucial information regarding pathologic conditions through intra-region heterogeneity (19). Several studies have used radiomics to evaluate musculoskeletal diseases of soft tissue and bone (23). However, to our knowledge, no previous work has investigated the use of radiomics to diagnose FAI (24). Thus, our study was performed to investigate the feasibility of using radiomics on 3D-MRI to distinguish between hips with and without symptomatic impingement in patients with FAI.

Image data
The study group consisted of 17 patients (13 females and four males with mean age of 37.1 ± 5.7 years) with unilateral FAI diagnosed at hip arthroscopy who underwent an MRI examination of the hip prior to surgery. One patient was diagnosed with isolated Cam FAI, while the remaining 16 patients were diagnosed with mixed Cam and Pincer FAI at arthroscopy. Three patients underwent a follow-up MRI examination one year after surgery. All MRI examinations were performed on a 3T scanner (Skyra, Siemens Healthineers, Erlangen, Germany) and included an axial dual-echo T1-weighted 3D fast low angle shot (FLASH) sequence of the pelvis with Dixon fat-water separation and the following imaging parameters: repetition time = 10 ms, echo times = 2.4 ms and 3.7 ms, field of view = 32 cm, acquisition matrix = 320 × 320, and slice thickness = 1 mm.
For each MRI dataset, a fellowship-trained musculoskeletal radiologist with 20 years of clinical experience delineated regions of interest (ROIs) for the femur and acetabulum on each water-only image slice of the 3D-FLASH sequence using an open-source software viewer (ITK-SNAP v3.8.0; www.itksnap.org) (25). The ROIs were drawn using the automatic 3D seed-based segmentation tool available in ITK-SNAP and then manually fine-tuned slice by slice in the three main visualization axes: axial, coronal, and sagittal.
Left and right hip femur and acetabulum ROIs for the 17 patients were subdivided into healthy joints (HJs) and joints with impingement (IJs) according to the surgical reports. The IJs of the three patients with follow-up MRI examinations were excluded as the femur and acetabulum were surgically remodeled during arthroscopy. This resulted in a total of 37 segmented femoral and acetabular ROIs, which included 17 HJs and 17 IJs from the pre-operative MRI examinations and three HJs from the post-operative MRI examinations. Figure 1 shows representative examples of segmented femoral and acetabular ROIs from HJs and IJs.

Data augmentation
To increase sample size, a data augmentation method was used that provided rototranslated pairs of images and ROIs that were sampled at different resolutions. Directly applying a rototranslation and subsequently changing the resolution of the image could result in erroneously labeled pixels in the transformed ROIs due to the interpolation process after the rototranslation, or in pixels affected by partial volume averaging. The developed data augmentation technique instead transformed every label map ROI into a collection of meshes, one per value of the map, and then transformed them along with the corresponding image. The transformations were applied in the non-gridded space of the meshes and then rasterized in the desired space. The output coordinate system could also be customized by setting the origin, direction, resolution, and size of the output grid space. Data augmentation was implemented in ITK4 (26) and a containerized version of the software has been made freely available at https://hub.docker.com/r/erosmontin/daug.
As described in the workflow diagram in Figure 2, the 37 labeled and segmented femur and acetabulum ROIs were augmented by a factor of 60 for a total of 2,220 datasets. The 2,220 augmented datasets were obtained by applying uniformly distributed random rototranslations, with rotations between −5° and 5° for the first two Euler angles (left/right and anterior/posterior) and between −15° and 15° for the third Euler angle (inferior/superior), and with random translations between −5 and 5 mm. The resulting images were re-sampled using two output coordinate systems: an isotropic grid with 1 mm resolution and 120 voxels per dimension, and an anisotropic grid with a resolution of 0.4 × 0.4 × 1.2 mm and a matrix size of 320 × 320 × 120. To keep the anatomical shape of the hips as realistic as possible, no scaling was applied to the datasets.
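The sampling of the rigid transformations described above can be illustrated with a minimal sketch. It is not the ITK-based dAug implementation: the function names, the axis order used to compose the Euler rotations, and the use of one parameter set per augmented copy are all assumptions made for illustration. The text does not state how the factor of 60 is divided between the two output grids, so the sketch only draws the parameters and applies them to a mesh vertex in continuous (non-gridded) space.

```python
import math
import random

AUGMENTATION_FACTOR = 60    # from the text
N_SEGMENTED_DATASETS = 37   # from the text


def sample_rototranslation(rng):
    """Draw one rigid transform within the ranges given in the text."""
    # Degrees: +/-5 for the first two Euler angles, +/-15 for the third.
    angles_deg = (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-15, 15))
    # Millimetres: +/-5 along each axis.
    translation_mm = tuple(rng.uniform(-5, 5) for _ in range(3))
    return angles_deg, translation_mm


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(angles_deg):
    """Compose R = Rz * Ry * Rx (axis order is an assumption of this sketch)."""
    ax, ay, az = (math.radians(a) for a in angles_deg)
    rx = [[1, 0, 0],
          [0, math.cos(ax), -math.sin(ax)],
          [0, math.sin(ax), math.cos(ax)]]
    ry = [[math.cos(ay), 0, math.sin(ay)],
          [0, 1, 0],
          [-math.sin(ay), 0, math.cos(ay)]]
    rz = [[math.cos(az), -math.sin(az), 0],
          [math.sin(az), math.cos(az), 0],
          [0, 0, 1]]
    return matmul3(rz, matmul3(ry, rx))


def transform_vertex(vertex, rotation, translation_mm):
    """Apply the rigid transform to one mesh vertex in continuous space."""
    return tuple(
        sum(rotation[i][k] * vertex[k] for k in range(3)) + translation_mm[i]
        for i in range(3)
    )


rng = random.Random(0)
params = [sample_rototranslation(rng) for _ in range(AUGMENTATION_FACTOR)]
total_datasets = N_SEGMENTED_DATASETS * AUGMENTATION_FACTOR

# Move two vertices with the first sampled transform; a rigid transform
# must preserve the distance between them.
angles, shift = params[0]
rotation = rotation_matrix(angles)
moved_a = transform_vertex((10.0, 0.0, 0.0), rotation, shift)
moved_b = transform_vertex((0.0, 0.0, 0.0), rotation, shift)
```

Because the vertices are transformed before rasterization, no label interpolation is involved at this stage, which is the motivation given in the text for working on meshes rather than on the label map grid.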

Radiomic features extraction
For each pair consisting of an image and one associated femur or acetabulum ROI in the augmented dataset, 182 features were extracted using a previously described radiomic feature extractor (19), including 91 features for the femur and 91 features for the acetabulum. The 91 features could be classified into three main classes: (i) intensity and histogram based first order statistics (FOS) features, (ii) texture features, and (iii) shape and size features. A complete list of the 91 features extracted from the augmented datasets is provided in Table 1.

Frontiers in Radiology
For each femur and acetabulum ROI, the 12 signal FOS features were extracted from the water-only 3D-FLASH grayscale image values in the ROIs. The next 25 histogram FOS features described the complexity of the shape of the histogram distribution of the grayscale values in the ROIs. The histogram settings for all feature classes were set to 32 bins with a marginal scale of 0.5 and minimum and maximum equal to 0 and 200, respectively. Together, these two subsets made up the FOS features. Texture features were based on the gray level co-occurrence matrices (GLCM) and gray level run length matrices (GLRLM) (27), calculated in 26 directions, one for every neighbor of a voxel in 3D space with a radius set to one pixel. Each GLCM and GLRLM feature was averaged over the 26 directions to obtain 23 GLCM features and 11 GLRLM features per ROI. Lastly, 20 shape and size features were extracted from the ROI mesh of the femur and acetabulum separately.
The resulting 182 features were subdivided into 24 subsets with a variable number of features, grouped by feature type and by femur or acetabulum ROI. For each subset, a univariate ANOVA F-value analysis was applied to find the five most pertinent features based on p-values. The feature selection was repeated 100 times, each time using 90% of the dataset, and the five features most frequently selected by the F-contrast rank were retained. This yielded 24 additional F-contrast subsets with five features each, for a total of 48 subsets.
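The repeated F-value selection described above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: the function names, the toy data, and the random seed are hypothetical. Because every feature is scored on the same samples, all F-statistics share the same degrees of freedom, so ranking by the lowest p-value is equivalent to ranking by the highest F-value; the sketch therefore ranks by F directly.

```python
import random
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # F is undefined without at least two groups
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, y, k=5):
    """Indices of the k features with the highest F-value."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def stable_top_k(X, y, k=5, repeats=100, fraction=0.9, seed=0):
    """Repeat the selection on random 90% subsamples and keep the k
    features that were selected most often."""
    rng = random.Random(seed)
    votes = Counter()
    n = len(X)
    for _ in range(repeats):
        idx = rng.sample(range(n), int(fraction * n))
        votes.update(top_k_features([X[i] for i in idx],
                                    [y[i] for i in idx], k))
    return [j for j, _ in votes.most_common(k)]


# Toy data (hypothetical): 40 samples, 8 features, of which only
# features 0 and 3 depend on the label.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[10 * label + rng.gauss(0, 1) if j in (0, 3) else rng.gauss(0, 1)
      for j in range(8)] for label in y]
selected = stable_top_k(X, y, k=2)
```

Voting over subsamples favors features whose ranking does not depend on which samples happen to be included in a given draw.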

Machine learning model training and evaluation
A K-nearest neighbor machine learning model was used to identify the features most pertinent to differentiate IJs from HJs. From the available data, 240 augmented datasets consisting of two HJs and two IJs were randomly selected as a hold-out testing dataset for model evaluation. The remaining 1,980 augmented datasets, consisting of 900 datasets from 15 IJs and 1,080 datasets from 18 HJs, were used for model training and validation. For each of the 48 feature subsets, a K-nearest neighbor model (k = 3) was trained and validated using 100-fold cross-validation, in which the data were randomly re-split 75/25 into training and validation sets at each repetition. During each split, all augmented images of a given patient were assigned to a single group, either training or testing. The inputs of each model were the z-scored values of the radiomic features in the corresponding subset, and the outputs were the labels HJ and IJ. The trained model with the highest prediction accuracy was selected as the final model for the particular subset of features and was evaluated against the hold-out testing dataset to assess its performance in differentiating IJs from HJs. The process resulted in one trained model for each subset of features.
FIGURE 2 Schematic representation of the data workflow. Data was pre-processed, and images and regions of interest (ROIs) from a total of 33 datasets [18 healthy joints (HJs) and 15 joints with impingement (IJs)] were used for the model training phase, while four hold-out testing datasets (two healthy joints and two joints with impingement) were used for model evaluation. The size of the training and validation datasets was augmented by a factor of 60 using a data augmentation (dAug) method. 48 subsets of features were created from a randomly selected 75% of the training data. For each subset of features, a KNN machine learning process was repeated 100 times and the most accurate model was selected for each case. Finally, the performance of the best model for each subset was assessed on the hold-out testing dataset.
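The training procedure described above can be approximated with the following sketch: a k = 3 nearest-neighbor classifier on z-scored features, evaluated over 100 random 75/25 splits in which all samples of one patient stay on the same side of the split. It is a simplified illustration, not the authors' pipeline. The function names and toy data are hypothetical, splitting 75/25 by patient (rather than by sample) is an assumption, and so is fitting the z-score statistics on the training portion only, which the text does not specify.

```python
import math
import random
from collections import Counter


def zscore_fit(rows):
    """Per-feature mean and standard deviation (a zero std is replaced by 1)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return means, stds


def zscore_apply(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training samples closest to x."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def grouped_split(groups, test_fraction, rng):
    """Random split that keeps every patient entirely in one partition."""
    patients = sorted(set(groups))
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_patients]
    test_idx = [i for i, g in enumerate(groups) if g in test_patients]
    return train_idx, test_idx


def repeated_validation(X, y, groups, repeats=100, k=3, seed=0):
    """Accuracy of the KNN model over repeated grouped 75/25 splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        train_idx, test_idx = grouped_split(groups, 0.25, rng)
        means, stds = zscore_fit([X[i] for i in train_idx])
        train_X = zscore_apply([X[i] for i in train_idx], means, stds)
        test_X = zscore_apply([X[i] for i in test_idx], means, stds)
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, x, k) == y[i]
                      for x, i in zip(test_X, test_idx))
        accuracies.append(correct / len(test_idx))
    return accuracies


# Toy data (hypothetical): 8 patients, one HJ (0) and one IJ (1) each,
# 5 augmented copies per joint, 2 well-separated features.
rng = random.Random(1)
X, y, groups = [], [], []
for patient in range(8):
    for label in (0, 1):
        for _ in range(5):
            X.append([10 * label + rng.gauss(0, 1),
                      10 * label + rng.gauss(0, 1)])
            y.append(label)
            groups.append(patient)

accuracies = repeated_validation(X, y, groups)
best_accuracy = max(accuracies)
```

Keeping a patient's augmented copies together matters because augmented images of the same joint are highly similar; letting them appear on both sides of a split would make a nearest-neighbor model look better than it is.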
Montin et al. 10.3389/fradi.2023.1151258

Results
Table 1 shows the mean values of each feature distribution in the femur and acetabulum for HJs and IJs and the corresponding p-values for the Wilcoxon rank sum tests comparing differences in values between groups. The results show that 116 features out of the total 182 features could differentiate IJs from HJs (p < 0.05, hereinafter indicated by *). Out of these 116 features, 45 features (39%) belonged to the intensity-based FOS group [16 signal (14%) and 29 histogram (25%)], 33 (28%) to the shape and size group, and 38 (33%) to the textural features group [28 GLCM (24%) and 10 GLRLM (9%)]. Among the 45 statistically significant FOS features, 24 features were from the femur (8 signal and 16 histogram), 21 from the acetabulum (8 signal and 13 histogram), and 20 were from both the femur and acetabulum (8 signal and 12 histogram).
Table 2 shows the diagnostic performance of the machine learning models for differentiating IJs from HJs using the hold-out testing dataset. For each subset of features, the accuracy, specificity, sensitivity, and AUC of the models were reported along with the number of features in the training subset. The table had 48 entries, 24 reporting the performance of the model trained using all the features in a specific subset and 24 reporting the performance of the model trained using only the five most pertinent features in the specific subset with the lowest F-contrast p-values. The top performing models analyzed all GLCM texture features from the femur and acetabulum, followed by the models analyzing all intensity-based FOS features from the femur and acetabulum, all shape and size features from the femur and acetabulum, and all intensity-based histogram FOS features of the femur.
The model trained with all GLCM texture features from the femur and acetabulum had the highest diagnostic performance for differentiating IJs from HJs with 0.977 accuracy, 0.977 specificity, 0.976 sensitivity, and 0.977 AUC. Figure 3 shows the diagnostic performance of the machine learning models for differentiating IJs from HJs during the 100-fold cross-validation training phase. Models trained with femur intensity-based FOS and GLCM texture features all had accuracies above 0.95, while most models trained with acetabular intensity-based FOS and GLCM texture features had accuracies under 0.95. The differences were more notable for the F-contrast models trained using the five most pertinent features with the lowest F-contrast p-values, where three of the four models with the highest accuracy used features from the femur. As shown in Table 2, differences in model performance were also confirmed using the hold-out testing dataset, where the model trained with 91 features from the femur had higher diagnostic performance (0.977 accuracy, 0.977 sensitivity, 0.976 specificity, and 0.977 AUC) when compared to models trained with all 182 features from the femur and acetabulum (0.976 accuracy, 0.980 specificity, 0.971 sensitivity, and 0.975 AUC) and models trained with 91 features from the acetabulum (0.963 accuracy, 0.965 specificity, 0.962 sensitivity, and 0.963 AUC). In particular, the models trained with femur features had higher accuracy than those trained with acetabulum features (rank-sum test, p < 0.05), even in the F-contrast subsets (bottom subplot). Figure 4 shows the five most pertinent features with the lowest F-contrast p-values for each feature class, while Figure 5 shows z-scored values of each feature for the femur and acetabulum.
For the femur, the five most pertinent features were three textural features (GLRLM Long Run Low Gray Level Emphasis*, GLRLM Short Run Low Gray Level Emphasis*, and GLRLM Low Gray Level Run Emphasis*) and two shape and size features (SS Area*, SS Volume*). The values of the three GLRLM features of the femur and the area and volume of the femur were higher in IJs than HJs. The importance of the three GLRLM features of the femur was further confirmed by the results in Table 2 and Supplementary Table S1, which showed that the five most pertinent features with the lowest F-contrast p-values in the model trained with all 182 features from the femur and acetabulum included GLRLM Long Run Low Gray Level Emphasis*, GLRLM Short Run Low Gray Level Emphasis*, and GLRLM Low Gray Level Run Emphasis* of the femur.
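For reference, the performance metrics reported above can be computed from predicted and true labels as in the sketch below. The function name and the toy labels are hypothetical, and treating IJ as the positive class is an assumption. The text does not state how AUC was obtained; the sketch uses the fact that, for a classifier producing hard labels, the area under the resulting single-point ROC curve equals the mean of sensitivity and specificity.

```python
def binary_metrics(y_true, y_pred, positive="IJ"):
    """Accuracy, sensitivity, specificity, and hard-label AUC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)   # fraction of IJs correctly detected
    specificity = tn / (tn + fp)   # fraction of HJs correctly detected
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Area under the ROC curve defined by a single operating point.
        "auc": (sensitivity + specificity) / 2,
    }


# Toy example (hypothetical labels): one IJ is missed, no HJ is misclassified.
y_true = ["IJ"] * 5 + ["HJ"] * 5
y_pred = ["IJ"] * 4 + ["HJ"] + ["HJ"] * 5
metrics = binary_metrics(y_true, y_pred)
```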

Discussion and conclusions
Our study was performed to investigate the feasibility of using radiomics of 3D-MRI to distinguish between hips with and without symptomatic impingement in patients with FAI. Our results showed some of the highest diagnostic performance for differentiating IJs from HJs using imaging studies reported in the literature. The top performing radiomic model in our study analyzed all GLCM texture features from the femur and acetabulum on 3D-MRI, followed by models analyzing all intensity-based FOS features from the femur and acetabulum, all shape and size features from the femur and acetabulum, and all histogram FOS features of the femur.
FAI is characterized by impingement of the femoral head-neck junction against the acetabular rim due to morphologic abnormalities of the proximal femur and acetabulum (2)(3)(4). In our study, the radiomic model trained with all shape and size features from the femur and acetabulum on 3D-MRI yielded high diagnostic performance, with 0.970 accuracy, 0.968 specificity, 0.972 sensitivity, and 0.970 AUC for differentiating IJs from HJs. The model had higher diagnostic performance for detecting FAI than currently used quantitative measures of bone shape on radiographs, CT, and MRI. For example, studies have shown that the alpha angle has sensitivities between 0.360 and 0.920 and specificities between 0.620 and 0.950 for detecting cam impingement (28)(29)(30)(31)(32), while the center edge angle has sensitivities between 0.820 and 0.842 and specificities between 0.390 and 1.00 for detecting pincer impingement (32,33). Furthermore, a high prevalence of abnormal quantitative measures of proximal femur and acetabulum shape has been described in healthy subjects with no clinical evidence of FAI, which raises questions regarding the high specificities of these metrics reported in some studies (34).
Although FAI is a condition caused by morphological abnormalities of bone, our study found that the radiomic model analyzing all GLCM texture features of the femur and acetabulum on 3D-MRI had the highest diagnostic performance for differentiating IJs from HJs. GLCM features are calculated over the co-occurrence matrix, which highlights how spread out the image pixel signal intensity values are around a given pixel in a square matrix. If all the pixels in the ROI had the same grayscale value (i.e., pixel signal intensity values were homogeneous), the normalized co-occurrence matrix would have a single bin, corresponding to that intensity pair, set to 1 and all the other bins set to 0. The presence of multiple peaks in the co-occurrence matrix implies heterogeneity in image pixel signal intensity. If the imaged tissue is mildly heterogeneous, the values in the co-occurrence matrix are less sparse and closer to each other, whereas if the pixel values are completely random, the co-occurrence matrix will have sparser peaks (27). In our study, the model trained with all GLCM texture features from the femur and acetabulum had 0.977 accuracy, 0.977 specificity, 0.976 sensitivity, and 0.977 AUC for distinguishing between IJs and HJs. The five most pertinent features of this model were GLCM Max Probability, GLCM1 Energy, and GLCM1 Correlation of the femur and GLCM Correlation and GLCM Inverse Variance of the acetabulum. All these features were higher in the IJ than the HJ, indicating that FAI leads to a more heterogeneous distribution of image pixel signal intensity values. The femur and acetabulum primarily consist of trabecular and cortical bone, hematopoietic cells, and fat with little if any water content. As the water-only 3D-FLASH images used for radiomic analysis in our study reflect the presence of water within each image pixel, the greater heterogeneity of pixel signal intensity values in the IJs likely results from increased water content in some pixels. This may be due to subtle and nonuniform bone inflammation caused by impingement of the femoral head-neck junction against the acetabular rim, which cannot be detected in the image by the human eye.
FIGURE 3 Diagnostic performance of the machine learning models for differentiating IJs from HJs using each subset of features during model training. The histogram bars represent the distribution of the prediction metrics during the 100-fold cross-validation.
FIGURE 4 Radar charts for shape and size (SS), gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), and intensity-based first order statistics (FOS) features of the acetabulum (left) and the femur (right). The five spokes represent the five most informative features in the group (F-contrast), and the radial length of each spoke is proportional to the magnitude of the value of the associated feature. The spokes are normalized so that the difference between hip joints with impingement (IJ, blue line) and the healthy ones (HJ, orange line) is emphasized. For example, the SS Acetabulum radar plot (first plot on the left) shows that four feature values are higher for the healthy joints than for the injured ones, while the mean normal0 feature values are higher in the injured acetabulum than in the healthy ones.
FIGURE 5 Heat map of the values of the features for the acetabulum (top) and femur (bottom). Each row corresponds to one patient and each column corresponds to one normalized (z-score) radiomic feature. HJ or IJ before the patient number refers to a healthy joint and joint with impingement, respectively. The heat map shows that the GLCM Correlation feature of both the femur and the acetabulum has a higher value for HJ than for IJ.
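The behavior of the co-occurrence matrix described above can be checked on small examples. The sketch below is deliberately simplified relative to the study: it builds a 2D, single-offset, non-symmetric GLCM on integer gray levels, whereas the features in this work were computed in 3D and averaged over 26 directions. The function names and the toy images are hypothetical.

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalized gray level co-occurrence matrix for one pixel offset."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def energy(p):
    """Sum of squared entries; equals 1 when a single bin holds all pairs."""
    return sum(v * v for row in p for v in row)


def max_probability(p):
    return max(v for row in p for v in row)


# Homogeneous region: every pixel has gray level 2.
homogeneous = [[2] * 4 for _ in range(4)]
# Heterogeneous region: gray level changes at every horizontal step.
heterogeneous = [[0, 1, 2, 3] for _ in range(4)]

p_homogeneous = glcm(homogeneous, levels=4)
p_heterogeneous = glcm(heterogeneous, levels=4)
```

In the homogeneous case all 12 horizontal pixel pairs fall in the bin (2, 2), so that bin equals 1 and the energy is 1. In the heterogeneous case the pairs are spread evenly over three bins, so the maximum probability and the energy both drop to 1/3.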
Our study has shown that it is possible to create machine learning models to differentiate IJ from HJ with a high diagnostic performance using only a small subset of radiomic features on 3D-MRI. For each feature class, there was a relatively small decrease in model performance when using the five most pertinent features with the lowest F-contrast p-values compared to the full model analyzing all features from the femur and acetabulum. For example, the F-contrast model for GLCM texture features had 0.972 accuracy, 0.977 specificity, 0.966 sensitivity, and 0.972 AUC for differentiating IJs from HJs compared to 0.977 accuracy, 0.977 specificity, 0.976 sensitivity, and 0.977 AUC for the full model. Radiomic models analyzing a smaller number of features are better suited for widespread use in clinical practice as they are quicker and easier to create and are likely more reproducible across different MRI scanners, sequences, and imaging parameters.
Our study had several limitations. One limitation was the small number of subjects used for model training and evaluation. The problem of model training with a small number of subjects was overcome by using a novel data augmentation framework to create pseudo-plausible image data that magnified the pattern in the feature space between the IJs and HJs. Furthermore, our models were created using a simple K-nearest neighbor method to focus attention on the information content of the image features rather than the accuracy of the models per se. However, the relative simplicity of our machine learning approach may improve the reproducibility of the models and indirectly determines the lower bound of model performance, as sensitivity and specificity could likely be improved with the use of more sophisticated machine learning methods and larger training datasets. A final limitation was that our study could not assess model generalizability, as model training and evaluation were performed using homogeneous image datasets acquired on the same MRI scanner with the same sequence and imaging parameters.
In conclusion, our study has documented the feasibility of using radiomics of 3D-MRI to distinguish between hips with and without symptomatic impingement in patients with FAI. Our radiomic models analyzed intensity-based FOS features, shape and size features, and texture features and had some of the highest diagnostic performance for differentiating IJs from HJs using imaging studies reported in the literature. Additional studies are needed to investigate the use of more sophisticated machine learning approaches and larger training datasets to optimize model performance and to evaluate model generalizability using more heterogeneous patient populations imaged with different MRI scanners and imaging protocols.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Institutional Review Board (IRB). The patients/participants provided their written informed consent to participate in this study.