- 1Technische Universität München, School of Computation, Information and Technology, Munich, Germany
- 2Isomics, Inc, Cambridge, MA, United States
- 3Alkalay Spine Biomechanics Laboratory, Beth Israel Deaconess Medical Center, Boston, MA, United States
- 4Beth Israel Deaconess Medical Center, Cancer Center, Boston, MA, United States
- 5EBATINCA, S.L., Las Palmas de Gran Canaria, Spain
- 6Department of Radiology, Beth Israel Deaconess Medical Center, Boston, MA, United States
- 7Department of Radiation Oncology, Brigham and Women’s Hospital, Boston, MA, United States
- 8Rocky Vista University - Colorado Campus, Englewood, CO, United States
- 9Department of Radiology, Brigham and Women’s Hospital, Boston, MA, United States
Introduction: Given the high prevalence of vertebral fractures following radiotherapy in patients with metastatic spine disease, torso muscle segmentation is necessary for biomechanical modeling of vertebral loading, permitting individualized evaluation of fracture risk.
Methods: In this study, we developed and validated a deep-learning model for full volumetric segmentation of the thoracic and abdominal spinal musculature in cancer patients with metastatic spine disease from sparsely annotated clinical CT image data. We obtained CT data for 148 metastatic spine disease patients undergoing radiotherapy treatment and an external set of 30 randomly selected subjects from the National Lung Screening Trial. We extracted 1924 axial CT images at the midpoint of each vertebral level (T4 to L4) and manually labeled the key extensor and flexor muscles (up to 8 muscles per side) at each level. We trained a 2D nnU-Net deep-learning (DL) model to segment each muscle and, using these sparse annotations, to segment each muscle’s 3D volume per spine. Two experienced radiologists independently and blindly evaluated the anatomical fidelity of the segmentations using a Likert scale for 1) manual segmentations, 2) DL segmentations, 3) random test samples from the muscles’ 3D volumes, and 4) external NLST CT data.
Results: The DL method achieved performance comparable to manual segmentation, with a mean Dice score above 0.769. Wilcoxon signed-rank test analysis showed that the radiologist ratings of DL-generated muscle segmentations were non-inferior to those of the manual segmentations for all muscles except the rectus abdominis.
Discussion: The DL model demonstrated excellent performance for rapid, high-anatomical-fidelity 3D segmentation of the main flexor, extensor, and stabilizing thoracolumbar muscles from clinical CT scans. This development holds significant potential for reducing the manual effort required to generate individualized musculoskeletal models in cancer patients.
1 Introduction
With cancer therapy extending patients’ life expectancy and improving cancer prognosis (Wewel and O’Toole, 2020; Allaire et al., 2023), the incidence of metastatic spine disease (MSD) (Morimoto et al., 2024; Santucci et al., 2020), which affects 30%–70% of cancer patients in the US, continues to increase (Van den Brande et al., 2022). Vertebral fracture (VF), affecting up to 16% (Van den Brande et al., 2022) of the 5.4 million cancer patients with MSD in the US (2022) (US Cancer Statistics Working Group, 2022), causes catastrophic complications, including debilitating pain, spinal cord and nerve root compression, and paralysis (Van den Brande et al., 2022; Amelot et al., 2024), shortening patient survival (Oefelein et al., 2002; Oster et al., 2013; Saad et al., 2007) and 3-year life expectancy (Oefelein et al., 2002; Pond et al., 2014). Prior investigations of the pathomechanics of pathologic vertebral fracture focused exclusively on the effect of bone metastasis on vertebral compressive strength and stiffness (Alkalay and Adamson, 2008; Alkalay et al., 2022; Alkalay and Harrigan, 2016; Burke et al., 2017; Burke et al., 2016; Soltani et al., 2024; Tamada et al., 2005; Tschirhart et al., 2004). However, this singular assessment cannot account for the complex vertebral loading produced by the trunk and abdominal muscles (Arjmand and Shirazi-Adl, 2006; Christophy et al., 2012; Ignasiak et al., 2016; Rajaee et al., 2021; Wang W. et al., 2021; Malakoutian et al., 2022; Mokhtarzadeh et al., 2021) acting to balance task-specific external loads (Martuscello et al., 2013; Andersson et al., 1977) while providing the mechanical stability needed to maintain spinal posture (Cholewicki et al., 1997; McGill et al., 2003).
Given the knowledge gaps regarding the pathomechanics underlying the risk of VF in cancer patients (Alkalay and Adamson, 2008; Alkalay et al., 2022; Alkalay and Harrigan, 2016; Burke et al., 2017; Burke et al., 2016; Soltani et al., 2024; Tamada et al., 2005; Tschirhart et al., 2004), there is a need for a greater understanding of the systemic effects of cancer on spinal musculoskeletal health to develop more physiological and objective tools for VF risk assessment.
Patient-specific musculoskeletal simulations of the spine provide insight into the relative magnitudes of vertebral loading conditions (force- and moment-based) that cannot be measured non-invasively in patients (Bruno et al., 2015). Such models can be improved and personalized by using detailed geometry and density properties of the thoracic and lumbar musculature and the osseous spine derived from clinical imaging (Bruno et al., 2015). Compositional analysis yielding the skeletal muscle index (SMI) has been shown clinically to be significantly associated with poor functional outcomes in cancer patients (Shachar et al., 2016). However, because SMI is derived from the total cross-sectional area of the intra-abdominal musculature segmented from a single axial CT slice (at the mid-level of the L3 vertebra), it provides no information on the effect of cancer on the individual muscles’ anatomy or on their physiological cross-sectional area (PCSA), a good proxy for the muscle’s maximum isometric force (Christophy et al., 2012; Bruno et al., 2015), at any of the lumbar and thoracic vertebral levels. This information is critical for establishing spinal musculoskeletal models to investigate the effect of cancer on the patient’s functional performance, frailty as it relates to the risk of falls, and vertebral fracture risk.
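To make the contrast concrete: SMI is conventionally computed by dividing the L3 skeletal muscle cross-sectional area by height squared, whereas a muscle’s maximum isometric force is commonly estimated as its PCSA multiplied by a specific tension. The minimal sketch below shows both relations; the function names are hypothetical, and the specific tension is a placeholder parameter, not a value used in this study or in the cited models.

```python
def skeletal_muscle_index(l3_muscle_area_cm2: float, height_m: float) -> float:
    """SMI in cm^2/m^2: total L3 muscle cross-sectional area divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return l3_muscle_area_cm2 / height_m ** 2


def max_isometric_force(pcsa_cm2: float, specific_tension_n_per_cm2: float) -> float:
    """Maximum isometric force (N) estimated as PCSA times specific tension.

    The specific tension is an assumed model parameter (reported values differ
    between musculoskeletal models), so it is passed in explicitly.
    """
    return pcsa_cm2 * specific_tension_n_per_cm2


# Hypothetical inputs for illustration only (not patient data).
smi = skeletal_muscle_index(l3_muscle_area_cm2=150.0, height_m=1.75)
force = max_isometric_force(pcsa_cm2=20.0, specific_tension_n_per_cm2=50.0)
print(f"SMI = {smi:.1f} cm^2/m^2, F_max = {force:.0f} N")
```

The SMI needs only one summed area at one level, while the force estimate needs a per-muscle PCSA, which is why per-muscle segmentation at every level is required for musculoskeletal modeling.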
Muscle segmentation, which specifies the area or volume of the features of interest by delineating (labeling) a 2D region of interest (ROI) or a 3D volume of interest (VOI), is a critical step in establishing the muscle’s spatial location (Bruno et al., 2015), PCSA (Bruno et al., 2015), and density (Wang L. et al., 2021) as inputs to musculoskeletal models. Historically, ROIs/VOIs were contoured manually or semi-manually with segmentation software (Heckel et al., 2014) by a trained observer or an expert clinician from imaging data (CT, MRI, PET, ultrasound). However, manual segmentation is operator-dependent, error-prone, and highly labor- and time-intensive (Preim et al., 2014), making it prohibitive for cohorts larger than a few patients. Deep learning (DL) methods employing 2D U-Net models are increasingly advocated for high-throughput automated segmentation of the abdominal spinal musculature from clinical CT (Hemke et al., 2020; Edwards et al., 2020; Ackermans et al., 2021; Shen et al., 2023) and MRI (Kim et al., 2024) data. U-Net models have achieved a Dice score of 0.95 for agreement with expert segmentation of the L3 abdominal musculature cross-section on axial CT (Hemke et al., 2020; Edwards et al., 2020; Ackermans et al., 2021; Shen et al., 2023). However, these approaches did not differentiate between individual muscles. To extend such models to volumetric muscle anatomy, successive slice-by-slice application of a 2D U-Net (Kamiya, 2018) yielded 82.8% agreement with manually derived volumetric segmentation of the erector spinae. The efficacy of tools such as nnU-Net depends on the availability of high-quality manual segmentations for training.
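The Dice score used throughout this work measures the overlap between a predicted mask A and a reference mask B as 2|A∩B|/(|A|+|B|), ranging from 0 (no overlap) to 1 (identical masks). The NumPy sketch below computes it per label for multi-label maps in which each muscle has its own integer label; the label values and the choice to return NaN for a label absent from both masks are our assumptions.

```python
from typing import Dict, Iterable

import numpy as np


def dice_per_label(pred: np.ndarray, ref: np.ndarray, labels: Iterable[int]) -> Dict[int, float]:
    """Dice coefficient for each integer label in two label maps of equal shape."""
    if pred.shape != ref.shape:
        raise ValueError("prediction and reference must have the same shape")
    scores = {}
    for label in labels:
        p = pred == label
        r = ref == label
        denom = p.sum() + r.sum()
        # A label absent from both masks gets NaN rather than a perfect score,
        # so it is not silently averaged into a mean Dice.
        scores[label] = float(2.0 * np.logical_and(p, r).sum() / denom) if denom else float("nan")
    return scores


# Toy 2D example: label 1 = hypothetical left muscle, label 2 = right muscle.
ref = np.array([[1, 1, 0, 2],
                [1, 1, 0, 2]])
pred = np.array([[1, 1, 0, 2],
                 [1, 0, 0, 0]])
print(dice_per_label(pred, ref, labels=[1, 2]))
```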
However, curated CT- or MRI-based 3D datasets of individual spinal muscles are not available for either non-cancer or cancer subjects. To address this challenge, self-supervised and semi-supervised methods have been developed that leverage limited annotated data to achieve accurate segmentations (Ouassit et al., 2022). Lei et al. (2024) used scribble annotations and propagated them from annotated to unannotated regions. Cai et al. (2023) leveraged sparse annotation by cross-teaching 3D and 2D networks: two 2D networks are trained on the transverse and coronal slices, and their coinciding predictions are used as pseudo labels for the 3D network, achieving an 82.67% Dice score. To date, however, most of these models were developed on lower-resolution CT data or focused on single slices, typically at the lumbar level, yielding muscle segmentations unsuitable for musculoskeletal spinal modeling.
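As a simplified illustration of the agreement step described for Cai et al. (2023), the sketch below keeps a voxel as a pseudo label only where two predicted label volumes (for example, from networks trained on transverse and coronal slices, resampled to the same 3D grid) coincide, and marks disagreements with an ignore value. The ignore value 255 and the function names are hypothetical, and the full cross-teaching training loop is not shown.

```python
import numpy as np

IGNORE = 255  # hypothetical "ignore" label, assumes fewer than 255 classes


def consensus_pseudo_labels(pred_a: np.ndarray, pred_b: np.ndarray, ignore: int = IGNORE) -> np.ndarray:
    """Keep labels where two predicted label volumes agree; mark the rest as ignore."""
    if pred_a.shape != pred_b.shape:
        raise ValueError("predictions must be on the same voxel grid")
    return np.where(pred_a == pred_b, pred_a, ignore).astype(np.uint8)


def usable_fraction(pseudo: np.ndarray, ignore: int = IGNORE) -> float:
    """Fraction of voxels that carry a usable pseudo label."""
    return float(np.mean(pseudo != ignore))


a = np.array([[[0, 1], [2, 2]]])
b = np.array([[[0, 1], [1, 2]]])
pseudo = consensus_pseudo_labels(a, b)
print(pseudo.tolist(), usable_fraction(pseudo))
```

During training, the ignore value would be excluded from the loss (for example, as the ignore index of a cross-entropy loss), so only voxels on which both 2D networks agree supervise the 3D network.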
2 Materials and methods
2.1 Study aims
This study consisted of three specific aims.
a. To adapt and validate a deep-learning (DL) algorithm for volumetric segmentation of the main flexor, extensor and stabilizing musculature in the thoracic and lumbar regions in a cohort of cancer patients with metastatic spine disease planned for radiotherapy treatment.
b. Based on a radiological review of the segmentations’ anatomical fidelity, compare the DL model performance with the manual segmentations (training set) and assess its performance in a sample selected from the volumetric muscle segmentation (test data).
c. Evaluate the DL model generalizability in external CT data of subjects from the National Lung Screening Trial (ACRIN LSSgLatACoRIN, 2009), representing out-of-sample data.
Figure 1 provides a flow chart summary of the study’s patient selection, deep-learning model, and steps performed for model performance evaluation based on a radiological review.
Figure 1. A flow chart summary of the study patient selection, deep-learning model and steps performed for model performance evaluation based on radiological review.
2.2 Study sites
The study was conducted as part of NIH grant R01AR075964 at the Radiation Oncology Department at Brigham and Women’s Hospital (BWH)/Dana Farber Cancer Institute’s (DFCI) Joint Cancer and the Center for Advanced Orthopedic Studies at Beth Israel Deaconess Medical Center, Boston, MA. Patients enrolled in the study were treated with radiotherapy for spinal metastatic disease between September 2020 and July 2023. The recruitment and enrollment process occurred during in-hospital evaluations, where clinicians confirmed a patient’s eligibility for study enrollment. Patients who agreed to participate had previously consented to the Broadband biorepository research project (MGB IRB 2016P001582). The Broadband research coordinators performed the consent process for the participants.
2.3 Inclusion and exclusion criteria and recruitment and enrollment
This study comprised 148 patients (49 females and 99 males, Table 1) who were older than 18 years at presentation and had a confirmed diagnosis of cancer with metastatic spread to the thoracic, thoracolumbar, or lumbar spine. Study inclusion criteria were 1) histologically or cytologically documented stage IV bone metastases (BM) and radiographic evidence of BM (computed tomography [CT] scan or bone scan) and 2) a Karnofsky Performance Status (Peus et al., 2013) >70, selected to enhance the likelihood of patient participation and follow-up. Patients were excluded if they had 1) a history of prior spine surgery, 2) radiation within the previous 6 months for metastatic disease in the area currently targeted for radiation treatment, or 3) vertebral augmentation at the site of radiation or adjacent levels. Patients with diseases of abnormal bone metabolism (Paget disease and untreated hyperthyroidism, hyperprolactinemia, or Cushing disease) were also excluded.
Table 1. Demographic characteristics of the cancer patients recruited as part of NIH grant R01AR075964.
2.4 Data collection
This study collected CT data as part of the standard protocol for simulation in patients’ radiotherapy treatment planning. Simulations were performed using the Siemens SOMATOM Confidence (Siemens Healthcare GmbH, Erlangen, Germany) or the GE Lightspeed (General Electric Medical System, Waukesha, WI) CT scanner. The CT scans were not gated, and breath-hold techniques were not used. The CT images in the collected datasets had a section thickness of 0.5 or 1.25 mm. Table 2 details the simulation scan parameters. Images were de-identified using the anonymization feature in MIM (version 7.1.12, MIM Software Inc., Cleveland, OH) and further anonymized by study staff who removed all dates from scans.
2.5 Ground-truth muscle annotations
From the 148 patients, we extracted 1924 axial CT image slices corresponding to the centroids of the T4 to L4 vertebral bodies. Following a published protocol (Allaire et al., 2023), manual (ground truth) muscle segmentations were performed in Analyze™ (Biomedical Imaging Resource, Mayo Clinic, Rochester, MN) (Robb, 2001) by a single research associate, who was specifically trained for this task through guided analysis before the study. After being instructed by an analyst on how to perform the segmentations using a curated set of muscle annotations (Allaire et al., 2023), the trainee performed unsupervised manual segmentation of the complete thoracoabdominal musculature in 12 training CT scan volumes (anonymized, full skin-to-skin, T4–L4). The resulting segmentations were visually reviewed and scored for accuracy by the analyst and compared with existing curated segmentations. This training method is similar to that reported in Johannesdottir et al. (2018), and this particular analyst’s reliability results were reported in Allaire et al. (2023). Based on inter-rater intraclass correlation coefficients (ICCs) against the gold-standard segmentations, the research associate’s ICC scores ranged from 0.85 to 0.99 for muscle area and from 0.84 to 0.99 for muscle position, depending on the muscle group measured.
a. Segmentation of patient data: For each patient’s CT volume, a single axial CT slice at the mid-vertebral level of each vertebra (T4–L4) was extracted using Analyze™. The Object Extractor module (Analyze™) was used to manually trace the major spinal flexor, extensor, and lateral stabilizer muscles; the muscles segmented depended on the CT image field of view (FOV) and the level viewed. Figure 2 illustrates a manual segmentation and the associated labels for the L3 and T5 levels, and Table 3 details the muscles segmented per vertebral level. Each manual muscle segmentation required 1–2 min per muscle (initial tracing and required corrections), or 14–28 min per axial CT slice (left and right sides, 14 muscles on average), leading to segmentation times of 3–6.1 h per subject for 13 axial CT slices.
b. Data preparation: The individual muscle boundaries at each vertebral level were further processed using a Sigma filter (Lee, 1983) to reduce noise while preserving the edges of the muscle tissue boundaries. Based on the protocol published by Johannesdottir et al. (2018), a hydroxyapatite phantom (Image Analysis, Inc., Lexington, KY, United States), scanned asynchronously on each CT system before the patient’s simulation CT, was used to standardize the CT image, and each muscle’s contour was processed to exclude voxels outside the 50 to 150 HU range, removing voxels associated with pure fat, tendon, and bone from each muscle contour’s margin. The asynchronous CT scan was required because of strict limitations on adding a phantom to the cancer patient’s radiotherapy planning CT. The resulting data file containing binary labels at each vertebral level (each muscle’s cross-sectional area (CSA) and the corresponding vertebral body CSA) was exported in Analyze™ 7.5 format.
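The two preprocessing steps in the data preparation above can be sketched in NumPy. The first function is a simplified version of Lee’s sigma filter: each pixel is replaced by the mean of the neighbors in a small window whose intensities lie within k·σ (k = 2) of the center value, so averaging does not cross strong edges. The second restricts a muscle label to the 50 to 150 HU range. The window radius, the noise σ, and the omission of Lee’s fallback for windows with too few accepted pixels are our assumptions; the study used Analyze™, and the phantom calibration step is not shown.

```python
import numpy as np


def sigma_filter(image: np.ndarray, noise_sigma: float, radius: int = 2, k: float = 2.0) -> np.ndarray:
    """Simplified Lee sigma filter for a 2D image (edge-preserving smoothing)."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, radius, mode="edge")
    out = np.empty_like(img)
    size = 2 * radius + 1
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + size, j:j + size]
            center = img[i, j]
            # Average only neighbours within k * sigma of the centre pixel;
            # the centre itself always qualifies, so the selection is never empty.
            accepted = window[np.abs(window - center) <= k * noise_sigma]
            out[i, j] = accepted.mean()
    return out


def restrict_to_hu_range(label_mask: np.ndarray, hu: np.ndarray, lo: float = 50.0, hi: float = 150.0) -> np.ndarray:
    """Drop voxels of a muscle label whose HU lies outside [lo, hi] (e.g., fat and bone)."""
    return np.logical_and(label_mask, (hu >= lo) & (hu <= hi))


# Toy slice: muscle-like values (about 60-70 HU) next to fat (-100 HU) and bone (400 HU).
hu = np.array([[-100.0, 62.0, 70.0, 400.0],
               [-100.0, 65.0, 68.0, 400.0]])
muscle_label = np.array([[True, True, True, True],
                         [False, True, True, True]])
smoothed = sigma_filter(hu, noise_sigma=5.0, radius=1)
kept = restrict_to_hu_range(muscle_label, smoothed)
print(kept.astype(int))
```

Because the filter only averages values close to the center pixel, the fat and bone pixels keep their original values and are then removed by the HU window, while the muscle pixels are smoothed and retained.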
Figure 2. Graphic illustration (RNA) of the muscle groups segmented in the lumbar and thoracic regions.
Table 3. Thoracic and lumbar muscles measured from axial CT scans at each vertebral level.
2.6 Establishing deep learning muscle segmentation
At present, no curated 3D volume annotations are available for flexor, extensor and stabilizing spine muscles in the thoracic, thoracolumbar and lumbar regions from CT data. We therefore employed a two-stage approach for building an efficient, fully automated pipeline for accurate segmentation of this musculature.
a. 2D segmentation: Using the nnU-Net v2 package from the German Cancer Research Center (DKFZ) without modifications for training and inference (Isensee et al., 2021), we trained the nnU-Net exclusively on the mid-vertebral 2D slices, with the manual segmentations as ground truth. We treated these slices as representative samples of the whole torso, allowing the model to learn meaningful features without requiring densely annotated 3D volumes; we posited that the resulting model would generalize to CT slices that are not at the mid-vertebral levels. Data preparation followed the nnU-Net preprocessing pipeline, with model planning and training conducted using the 2D nnU-Net configuration. Figure 3B illustrates the initial iteration of the model. We adopted the default 5-fold cross-validation scheme, with one-fifth of the training data held out for validation in each fold. Model training and validation were performed on the Jetstream2 cloud computing environment (https://jetstream-cloud.org/) at Indiana University (Boerner et al., 2023; Hancock et al., 2021).
b. 2D to volumetric segmentation: The volumetric thoracic and abdominal muscle segmentation was performed using the nnU-Net sparse annotation option (Gotkowski et al., 2024), with the 2D annotations used as dense annotations. In effect, by successively applying the trained 2D model to the stack of axial CT images within the CT volume, a complete segmentation of the muscle volumes is achieved (Figure 4). The model required, on average, 1 min to process all slices (600–900) of a volume. The default data augmentation for training the nnU-Net model includes flipping the images left-right and front-back. However, we found that this caused the model to assign the left/right laterality of the segmented muscles incorrectly. We therefore retrained the model without any left-right or front-back flipping during data augmentation, which eliminated the laterality labeling errors in the segmentation outputs. The trained model weights from the 2D model are compatible with the “3d_fullres” configuration of nnU-Net v2 and have been made available, along with installation and usage instructions, at the GitHub repository linked below.
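The slice-by-slice inference and the averaging of the five fold models described above can be sketched as follows. Each `predict` callable stands in for one trained fold and is hypothetical: it maps an (H, W) axial slice to per-class probabilities of shape (C, H, W). This is not nnU-Net’s inference API, only an illustration of the stacking and ensembling logic.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical per-fold predictor: (H, W) slice -> (C, H, W) class probabilities.
SlicePredictor = Callable[[np.ndarray], np.ndarray]


def segment_volume_slicewise(volume: np.ndarray, folds: Sequence[SlicePredictor]) -> np.ndarray:
    """Label a (Z, H, W) CT volume by running 2D fold models on every axial slice."""
    if not folds:
        raise ValueError("at least one fold model is required")
    labels = np.empty(volume.shape, dtype=np.int16)
    for z in range(volume.shape[0]):
        axial_slice = volume[z]
        # Ensemble: average class probabilities over the fold models.
        probs = np.mean([predict(axial_slice) for predict in folds], axis=0)
        # The most probable class per pixel becomes this slice's label map.
        labels[z] = np.argmax(probs, axis=0)
    return labels


def toy_fold(axial_slice: np.ndarray) -> np.ndarray:
    """Stand-in model: class 1 ("muscle") where intensity exceeds 0, else class 0."""
    muscle = (axial_slice > 0).astype(float)
    return np.stack([1.0 - muscle, muscle])


volume = np.random.default_rng(0).normal(0.0, 50.0, size=(4, 8, 8))
label_volume = segment_volume_slicewise(volume, [toy_fold] * 5)
print(label_volume.shape, int(label_volume.sum()))
```

Because each slice is labeled independently, this loop does not itself enforce smoothness between neighboring slices. Separately, left-right mirroring augmentation conflicts with lateralized labels: a mirrored training image places the left muscle’s label on the anatomical right side, so the network sees the same label on both sides unless the left and right labels are also swapped.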
Figure 3. (A) Training process for fold 0, typical of the results across all five folds. (B) Graphic comparison of the manual and DL segmentations for an axial slice; the magnified sections compare the DL segmentation quality for the extensor [i] and oblique [ii] musculature.
Figure 4. The 2D segmentation per vertebral level and the resulting 3D volume when inferring every slice of the volume. Horizontal banding, visible in the abdominal region in the frontal view (lower right), is believed to be due to breathing artifacts during CT acquisition, as no breath-hold control was performed during the radiation planning CT scan, but may merit further investigation.
2.7 Model performance assessment
Two experienced radiologists independently evaluated the quality of each muscle segmentation contour (Table 3), blinded to whether the segmentation was generated manually or by deep learning. This evaluation used a 0 to 5 Likert scale for the clinical acceptability of the muscle segmentation contour: 0, unacceptable; 1, poor (<50% matches muscle anatomy); 2, inadequate (<75% matches muscle anatomy); 3, acceptable (>75% but <90% matches muscle anatomy); 4, good (small differences from the muscle anatomy); and 5, best (completely matches the muscle anatomy). We applied this scale to the following groups.
Group 1: Manual segmentation: We selected 30 CT images from thoracic [T4: n = 2, T5: n = 22, and T6: n = 6] and lumbar [L2: n = 4 and L3: n = 26] levels, yielding 757 individual muscle segmentations.
Group 2: DL-segmentation: The DL segmentations were performed for group 1 CT data (n = 757 individual muscle segmentations).
Group 3: Assessment of the volumetric segmentation: To evaluate the model’s ability to segment muscle anatomy throughout the segmented muscle volumes, we randomly selected axial CT slices at non-mid-vertebral levels from the 5-fold ensemble used to create the full volumetric muscle segmentation in the thoracic and lumbar regions and extracted the corresponding 2D axial CT image data. The resulting dataset included thoracic [T1: n = 3, T2: n = 3, T4: n = 3, T5: n = 7, T6: n = 7, and T7: n = 14] and lumbar [L1: n = 6, L2: n = 2, L3: n = 8, L4: n = 3, and L5: n = 11] slices, yielding 704 individual muscle segmentations.
Group 4: External data: To demonstrate the model’s generalizability, we selected CT data from 30 subjects in the National Lung Screening Trial (NLST) (ACRIN LSSgLatACoRIN, 2009). The subjects’ CT data were acquired with the following parameters (mean (SD)): tube voltage (kVp), 120.4 (2.9); tube current (mA), 107.5 (45.1); in-plane pixel size (mm), 0.66 (0.64); and slice thickness (mm), 2.21 (0.61). We performed volumetric (3D) DL muscle segmentation and evaluated the model performance for 30 thoracic [T1: n = 4, T2: n = 3, T4: n = 4, T5: n = 4, T6: n = 4, and T7: n = 11] and 30 lumbar [L1: n = 2 and L2: n = 28] slices, yielding 673 individual muscle segmentations. Supplementary Table S1 details the CT imaging parameters for the NLST subjects (ACRIN LSSgLatACoRIN, 2009).
2.8 Data analysis
a. Interobserver variability: To evaluate the radiologists’ inter-rater reliability, we calculated Gwet’s AC2 statistic with ordinal weights. Compared with Cohen’s kappa, this statistic accounts for misclassification errors in the presence of high agreement and is not based on the assumption of independence between raters, making it a more robust measure. We cross-tabulated the scores assigned by the two radiologists (Table 4) and calculated the AC2 agreement coefficient with ordinal weights using the irrCAC package (Gwet, 2019).
b. DL segmentation model performance: The two radiologists’ ratings were averaged for each CT image and for each left and right muscle segmentation to create a single average rating score in each CT dataset (manual segmentation, DL segmentation, test sample, and external data). For each muscle and segmentation method, we verified that the ratings did not differ significantly between the left and right sides; the scores of the two sides were then collapsed into a single array of scores. We computed each muscle’s grand mean and corresponding standard deviation per evaluation group (manual segmentation, DL segmentation, test sample, and external data). To compare the rater scores between the DL and manual segmentations, we used the Wilcoxon signed-rank test with a non-inferiority margin of 0.25, chosen because the data were paired (DL and manual segmentations were applied to the same set of muscles). We concluded that the DL segmentation was non-inferior to the manual segmentation if the difference (rating score for DL segmentation − rating score for manual segmentation) was significantly greater than −0.25.
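The non-inferiority decision rule above can be sketched as a one-sided Wilcoxon signed-rank test on the paired differences shifted by the margin: testing whether DL − manual + 0.25 is centered above zero is equivalent to testing whether the difference exceeds −0.25. This pure-Python sketch uses the large-sample normal approximation, average ranks for ties with the standard tie correction, and drops zero differences (Wilcoxon’s convention); these choices are assumptions and may differ from the software used in the study, and the ratings in the example are invented.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_noninferiority(dl: Sequence[float], manual: Sequence[float],
                            margin: float = 0.25) -> Tuple[float, float, float]:
    """One-sided signed-rank test of H0: median(dl - manual) <= -margin.

    Returns (W+, z, p); a small p supports non-inferiority of the DL ratings.
    """
    if len(dl) != len(manual):
        raise ValueError("ratings must be paired")
    # Shift the paired differences by the margin; rounding tames float noise.
    shifted = [round(a - b + margin, 10) for a, b in zip(dl, manual)]
    d = [x for x in shifted if x != 0.0]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all shifted differences are zero")
    # Rank |d| with average ranks for ties, accumulating the tie correction.
    order = sorted(range(n), key=lambda idx: abs(d[idx]))
    ranks = [0.0] * n
    tie_correction = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_correction += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_correction / 48.0
    z = (w_plus - mean) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail P(Z >= z)
    return w_plus, z, p


# Invented averaged ratings for ten paired muscle segmentations.
manual = [5.0, 4.5, 5.0, 4.0, 4.5, 5.0, 4.5, 4.0, 5.0, 4.5]
dl = [5.0, 4.5, 4.5, 4.0, 5.0, 5.0, 4.5, 4.5, 5.0, 4.5]
w_plus, z, p = wilcoxon_noninferiority(dl, manual)
print(f"W+ = {w_plus}, z = {z:.2f}, one-sided p = {p:.4f}")
```

Shifting by the margin before ranking is what turns the usual "no difference" signed-rank test into a non-inferiority test; without the shift, identical ratings would produce zero differences that carry no evidence either way.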
2.9 Code availability
All codes and processing scripts are freely available at https://github.com/Spine-Biomechanics-Group-Alkalay-Lab/Spine-Muscle-Segmenter.git.
3 Results
3.1 DL-muscle segmentation
Using a single g3.xl Jetstream2 instance (A100, 40 GB video RAM, https://jetstream-cloud.org/), model training took 7 days, with the nnU-Net achieving a mean Dice score across the five folds greater than 0.769. Training took approximately 1 min per epoch and ran for 2,000 epochs per fold; training stabilized quickly, with subsequent epochs adding little to the result (Figures 3A,B). Once training was completed, slice-by-slice nnU-Net inference required, on average, 1 min per CT volume to generate a continuous volumetric (3D) segmentation of the upper body muscles (Figure 4).
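As a consistency check on the reported training time, and assuming the five folds were trained one after another on the single instance (the scheduling is not stated), the per-epoch time and epoch count imply a total close to the reported 7 days:

```latex
5\ \text{folds} \times 2000\ \frac{\text{epochs}}{\text{fold}} \times 1\ \frac{\text{min}}{\text{epoch}}
  = 10\,000\ \text{min} \approx 166.7\ \text{h} \approx 6.9\ \text{days}
```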
3.2 Interobserver variability
Based on the cross-tabulation of the scores between the raters (Table 4), the majority of the scores (83.6%) were 4 or 5, and the two raters’ scores agreed perfectly for 60.1% of the segmentations. Gwet’s AC2 statistic calculated with ordinal weights was estimated to be 0.91 (95% confidence interval: 0.90–0.92), indicating good agreement between the two radiologists in rating the segmentations.
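For readers reproducing the agreement analysis, Gwet’s AC2 for two raters can be computed from the q × q cross-tabulation. The weighted observed agreement is p_a = Σ_k Σ_l w_kl p_kl; the chance agreement is p_e = T_w / (q(q − 1)) · Σ_k π_k(1 − π_k), where π_k averages the two raters’ marginal proportions for category k and T_w is the sum of all weights; and AC2 = (p_a − p_e)/(1 − p_e). The pure-Python sketch below implements these formulas with Gwet’s ordinal weights. It is an independent illustration rather than the irrCAC code used in the study, and the example table is invented (it is not Table 4).

```python
from typing import List, Sequence


def ordinal_weights(q: int) -> List[List[float]]:
    """Gwet's ordinal weights for q ranked categories: 1 - C(|k-j|+1, 2) / C(q, 2)."""
    if q < 2:
        raise ValueError("need at least two categories")
    max_m = q * (q - 1) / 2.0
    weights = []
    for k in range(q):
        row = []
        for j in range(q):
            n_kj = abs(k - j) + 1
            row.append(1.0 - (n_kj * (n_kj - 1) / 2.0) / max_m)
        weights.append(row)
    return weights


def gwet_ac2(table: Sequence[Sequence[float]]) -> float:
    """Gwet's AC2 for two raters from a q x q count table (rows: rater A, columns: rater B)."""
    q = len(table)
    w = ordinal_weights(q)
    n = float(sum(sum(row) for row in table))
    p = [[table[k][j] / n for j in range(q)] for k in range(q)]
    # Weighted observed agreement p_a.
    p_a = sum(w[k][j] * p[k][j] for k in range(q) for j in range(q))
    # pi_k: average of the two raters' marginal proportions for category k.
    pi = [(sum(p[k]) + sum(p[r][k] for r in range(q))) / 2.0 for k in range(q)]
    # Chance agreement p_e = T_w / (q (q - 1)) * sum_k pi_k (1 - pi_k).
    t_w = sum(sum(row) for row in w)
    p_e = t_w * sum(x * (1.0 - x) for x in pi) / (q * (q - 1))
    return (p_a - p_e) / (1.0 - p_e)


# Invented 6 x 6 table for Likert scores 0-5 (not the study's data).
example = [
    [1, 0, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0],
    [0, 1, 3, 2, 0, 0],
    [0, 0, 2, 10, 5, 1],
    [0, 0, 0, 6, 40, 12],
    [0, 0, 0, 1, 10, 60],
]
print(round(gwet_ac2(example), 3))
```

With identity weights (1 on the diagonal, 0 elsewhere), T_w equals q and the same formula reduces to Gwet’s AC1; the ordinal weights give partial credit to near-miss ratings such as 4 versus 5.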
3.3 Muscle segmentation model validation
a. Manual vs. DL muscle segmentation: We first compared the nnU-Net model’s ability to delineate individual muscle anatomy with that of manual segmentation, based on the radiologists’ scores. The Wilcoxon test showed that the average radiologist rating score for the DL segmentation was non-inferior to that for the manual segmentation (the difference between DL and manual segmentation was significantly greater than −0.25; p < 0.001) (Table 5). The same conclusion held for the individual muscle segmentations, except for the rectus abdominis (p = 0.058). Plotting the radiologist scores by vertebral level (Figure 5) suggested that the lower scores occurred predominantly at the higher thoracic levels. Figure 6 presents examples of DL segmentations corresponding to high and low radiologist scores.
b. Assessment of the volumetric segmentation: The radiologists’ review scores for the model’s muscle volume segmentations, derived from assessment of 2D axial CT slices extracted from the thoraco-abdominal volume inferred by the model (Group 3), are summarized in Table 6. Overall, 65.7% of the average segmentation quality scores equaled or exceeded 4.6.
c. Out-of-study muscle segmentations: The radiologists’ review scores for the model’s segmentation of the external data (Group 4) are summarized in Table 6; 64.3% of the segmentation quality scores equaled or exceeded 4.6. The lowest review score in Group 4, 4.12, was for the psoas major muscle.
Table 5. Summary statistics (mean, standard deviation) for the manual and corresponding DL segmentations of the cancer patient data (group 1, manual segmentation, and group 2, DL segmentation).
Figure 5. Scatter plots of radiologist assessment scores of the manual and DL segmentations by muscle and vertebral level. The radiologists’ scores indicated that segmentation accuracy was not uniform across spinal muscles and was lower at the upper thoracic levels. Minimal jitter was added to the marker positions within each sub-figure to improve readability.
Figure 6. Examples of DL muscle segmentations: (A) a successful segmentation that received high scores from the radiologist raters, and segmentations that received low scores because of poor (B) or partial and missing (C) segmentation.
Table 6. Average radiologist ratings for the cancer patients (group 3) and the external CT database subjects (group 4).
4 Discussion
This study demonstrated that an nnU-Net DL model, trained on sparsely annotated 2D muscle segmentations from axial CT data, can successfully segment the volumes of the individual flexor, extensor, and stabilizing thoracic and lumbar muscles from the clinical CT data of cancer patients with metastatic spine disease. Based on the radiologists’ assessments, the DL-generated muscle segmentations showed high anatomical fidelity in the lumbar and thoracic regions for both the training and test data. However, the DL model’s performance was not uniform across muscles, with challenges observed in segmenting thin muscles at the higher thoracic levels, reflecting the difficulty of separating muscles at these loci. Our novel approach advances the possibility of efficiently creating spinal musculoskeletal models in large cancer and non-cancer cohorts, to better understand the role of disease in the patient’s fracture risk and functional alterations, as well as in the response to therapy.
Accurate and robust medical image segmentation is crucial for establishing patient-specific musculoskeletal models. Deep learning (DL) methods are increasingly being advocated for high-throughput automated segmentation of the abdominal spinal musculature from clinical CT (Hemke et al., 2020; Edwards et al., 2020; Ackermans et al., 2021; Shen et al., 2023) and MRI (Kim et al., 2024) data. High-quality, fully labeled training datasets remain a key requirement for these advances. Obtaining annotations for medical images is, however, costly and time-consuming, particularly for 3D volumetric data, and requires specialized expertise to delineate each case. Sparse annotations, in which only a few image slices or a few structures within a slice are labeled, have been shown to preserve accurate boundaries for different structures (Gotkowski et al., 2024; Gao et al., 2022). Our approach leveraged the sparse annotation option (Gotkowski et al., 2024) of the well-characterized nnU-Net DL model (Isensee et al., 2021), employing manual 2D annotations of 12 axial CT slices as dense annotations, approximately 2% of the, on average, 600 axial CT slices within the T4–L4 volume. Based on 5-fold cross-validation (80% of the data used for training and 20% for validation), the mean Dice score across the five folds exceeded 0.769, suggesting strong agreement between the model-based and manual segmentations. The radiologists’ independent review, comparing manual with DL-generated segmentations for 757 individual 2D muscle segmentations across the lumbar and thoracic spine, found the DL segmentations to be non-inferior to the manual segmentations for the majority of the muscles reviewed, which appears to confirm this finding.
However, this review highlighted lower anatomical fidelity of the model segmentation for the rectus abdominis muscle and, although not statistically significant, for the transversospinalis and the internal and external obliques, occurring predominantly at the higher thoracic levels. The predominant causes of error were oversegmentation and undersegmentation, as shown in Figure 6, a finding similarly observed by Edwards et al. (2020) when segmenting combined muscle areas in the thoracic region. These errors likely reflect the difficulty of delineating the boundaries of thin, elongated muscles, for example, the internal and external obliques and the rectus abdominis. Although our data preparation aimed to exclude bone, muscles in close contact with the complex osseous structures of the posterior elements, for example, the transversospinalis located deep to the erector spinae, and the serratus anterior, may present greater difficulty for the model because of partial-volume effects at clinical CT resolution. Specific to the study cohort, patients with metastatic spine disease are likely to be at advanced stages of cancer and to present greater muscle damage due to chemotherapy-induced peripheral neuropathy (Kolb et al., 2016; Ward et al., 2014) and cachexia, which affects approximately 20% of prostate and breast cancer patients and around 40%–50% of colorectal and lung cancer patients (Hebuterne et al., 2014). Such patients experience dramatic rates of sarcopenia (Pamoukdjian et al., 2018), resulting in a loss of muscle quality (characterized on CT by lower Hounsfield unit (HU) values) due to fat infiltration (myosteatosis) within the muscle tissue and a loss of area due to atrophy (Martin and Freyssenet, 2021).
Combined with the presence of intramuscular fat or fatty-replaced muscle, these processes reduce the textural contrast between muscle and neighboring tissue, leaving the network unable to fully segment muscle that was included in the manual reference segmentation.
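The two error types discussed above can be separated quantitatively by computing, per muscle, the fraction of predicted voxels lying outside the reference (oversegmentation) and the fraction of reference voxels the prediction misses (undersegmentation). This is a minimal sketch with illustrative names and toy masks; it is not the assessment used in the study, which relied on radiologists' Likert scoring.

```python
from typing import Tuple

import numpy as np


def over_under_fractions(pred: np.ndarray, ref: np.ndarray) -> Tuple[float, float]:
    """Return (oversegmentation, undersegmentation) fractions for binary masks.

    oversegmentation  = predicted voxels outside the reference / predicted voxels
    undersegmentation = reference voxels missed by the prediction / reference voxels
    An empty mask yields 0.0 for the corresponding fraction (a convention assumed here).
    """
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    n_pred, n_ref = pred.sum(), ref.sum()
    over = float((pred & ~ref).sum() / n_pred) if n_pred else 0.0
    under = float((ref & ~pred).sum() / n_ref) if n_ref else 0.0
    return over, under


# Toy slice: the model misses a strip of the muscle, e.g. a fat-infiltrated margin.
ref = np.zeros((6, 6), dtype=bool)
ref[1:5, 1:5] = True                 # 16 reference voxels
pred = ref.copy()
pred[1:5, 4] = False                 # 4 reference voxels missed
print(over_under_fractions(pred, ref))   # (0.0, 0.25)
```

Reporting the two fractions separately, rather than a single Dice value, shows whether a muscle tends to be clipped (as expected for low-contrast, fat-infiltrated tissue) or to bleed into neighbors.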
Applying the trained 2D model weights to the full CT volume resulted in segmentation of the muscles' complete thoraco-abdominal volumes between the T4 and L4 levels. For a random selection of 704 DL-inferred individual muscle segmentations across the lumbar and thoracic spines, drawn from CT volumes not used for training (test set), the radiologists' review found that 65.7% of the average segmentation quality scores equaled or exceeded 4.6 out of 5 on the Likert scale (Table 6), suggesting that the model-inferred segmentations maintain high anatomical fidelity throughout the muscle volume. Our evaluation of the DL model on an independent set of subjects from the National Lung Screening Trial (NLST) (ACRIN LSSgLatACoRIN, 2009), for which 673 individual muscle segmentations were evaluated, showed that 64.3% of the segmentation quality scores equaled or exceeded 4.6 (Table 6). This level of inference performance suggests that the model generalizes to a non-cancer population, a finding recently demonstrated in 40 non-cancer subjects evaluated for abdominal surgery in our group (unpublished data).
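The summary statistic reported above, the share of segmentations whose average quality score reaches 4.6, can be sketched as follows. How each "average" score was formed (across raters, criteria or both) is an assumption here, so the function simply averages each row of a ratings matrix; the ratings are invented toy values, not study data.

```python
import numpy as np


def share_at_or_above(scores: np.ndarray, threshold: float = 4.6) -> float:
    """Fraction of segmentations whose mean Likert rating is >= threshold.

    `scores` has one row per segmentation and one column per rating
    (e.g. per rater or per criterion; this layout is an assumption).
    """
    mean_per_segmentation = scores.mean(axis=1)
    return float(np.mean(mean_per_segmentation >= threshold))


# Invented toy ratings for four segmentations, five ratings each.
toy = np.array([
    [5, 5, 5, 4, 5],   # mean 4.8
    [5, 4, 4, 5, 4],   # mean 4.4
    [5, 5, 5, 5, 5],   # mean 5.0
    [4, 4, 5, 4, 4],   # mean 4.2
])
print(share_at_or_above(toy))   # 0.5
```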
Although this study did not develop a new DL model for muscle segmentation, it demonstrates an approach to achieve rapid, muscle-specific segmentation with high anatomical fidelity throughout the thoraco-abdominal volume using only a small set of sparsely sampled, densely annotated slices. In the absence of gold-standard volumetric segmentations for the spinal muscle group evaluated in this study, creating a manual segmentation for a single T4–L4 spine would require 204–408 h [(14 muscles per CT slice × 25 CT slices per vertebra × 7–14 levels × 5 min per segmentation)/60 min per h], a clearly prohibitive effort that, owing to the labor and cost involved, has so far precluded such work. Our study therefore provides a novel methodology, critical for creating anatomically accurate musculoskeletal spinal models for research and for assessing clinical outcomes in large cohorts of cancer patients. Muscle attenuation, related to fat infiltration and tissue density, is of ongoing interest as a possible marker of aging, muscle strength, and physical function (Wang L. et al., 2021; Johannesdottir et al., 2018; Soufi et al., 2025). The methods developed here may enable rapid, muscle-specific assessment of attenuation. In elderly non-cancer subjects, forces generated by spinal muscles were found to be associated with vertebral material properties (Chalhoub et al., 2018), low-back pain (Fortin and Macedo, 2013; Greig et al., 2014) and the incidence of fragility-based vertebral fractures (Cangussu-Oliveira et al., 2020; Mokhtarzadeh and Anderson, 2016). Biomechanically, a vertebral fracture may initiate when the applied loading, largely produced by the thoracic and abdominal muscles during daily activities, exceeds vertebral strength (Alkalay and Adamson, 2008).
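Because each muscle receives its own label, the muscle-specific attenuation assessment mentioned above reduces to averaging CT numbers within each label mask. The sketch assumes a CT array already in HU, an integer label map with 0 as background, and hypothetical label values; the optional HU window (for example -29 to 150 HU, a range often used for skeletal muscle in body-composition work) is an assumed parameter, not a value from this study.

```python
from typing import Dict, Optional, Tuple

import numpy as np


def mean_hu_per_label(ct_hu: np.ndarray, labels: np.ndarray,
                      hu_range: Optional[Tuple[float, float]] = None) -> Dict[int, float]:
    """Mean attenuation (HU) inside each nonzero label of a segmentation.

    If `hu_range` is given, only voxels within [low, high] HU are averaged,
    e.g. to exclude intramuscular fat (an optional, assumed filter).
    """
    result: Dict[int, float] = {}
    for lab in np.unique(labels):
        if lab == 0:                        # 0 = background (assumption)
            continue
        mask = labels == lab
        if hu_range is not None:
            low, high = hu_range
            mask &= (ct_hu >= low) & (ct_hu <= high)
        if mask.any():
            result[int(lab)] = float(ct_hu[mask].mean())
    return result


# Toy 2x3 slice: label 1 contains one fat-like voxel (-80 HU).
ct = np.array([[40.0, 50.0, -80.0],
               [60.0, 10.0, -100.0]])
seg = np.array([[1, 1, 1],
                [2, 2, 0]])
print(mean_hu_per_label(ct, seg, hu_range=(-29, 150)))   # {1: 45.0, 2: 35.0}
```

Without the window, the fat-like voxel pulls the mean of label 1 down to about 3.3 HU, which illustrates why lower muscle HU is read as a sign of myosteatosis.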
Vertebral fracture risk can be expressed as a load-to-strength ratio (LSR), with risk indicated when the LSR exceeds one [4]. Recent studies (Anderson et al., 2022; Anderson et al., 2025) reported that metastatic spine disease alters LSR, with osteolytic vertebrae having higher LSRs and osteosclerotic vertebrae having lower LSRs than an age- and sex-matched normative control group (Anderson et al., 2022). Notably, this study also found that vertebrae without radiographic evidence of BM had higher LSRs than healthy normative values (Anderson et al., 2022), suggesting that cancer has a systematic effect on the spinal column's biomechanical properties. Understanding the role of patient- and task-specific LSR in PVF risk may have important implications for determining its clinical utility for a more comprehensive assessment of this risk in patients with metastatic spine disease. Enabling rapid, muscle-specific segmentation with high anatomical fidelity in large volumes of interest from clinical imaging is critical, as cancer patients are living longer with a greater metastatic burden owing to advances in cancer treatment, with therapies focused on palliation and quality of life being key tenets of multidisciplinary care coordination and survivorship efforts (Massaad et al., 2021; Rothrock et al., 2021).
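The LSR criterion described above can be sketched in a few lines: each vertebral level carries an estimated load and an estimated strength, and levels whose ratio exceeds one are flagged. The class, field names and numbers below are hypothetical; in practice the load would come from a musculoskeletal model and the strength from a separate CT-based estimate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VertebralLevel:
    name: str
    load_n: float       # estimated compressive load in newtons (hypothetical input)
    strength_n: float   # estimated vertebral strength in newtons (hypothetical input)

    @property
    def lsr(self) -> float:
        """Load-to-strength ratio; a value > 1 means the load exceeds the strength."""
        return self.load_n / self.strength_n


def flag_at_risk(levels: List[VertebralLevel]) -> List[str]:
    """Names of levels whose LSR exceeds one."""
    return [lv.name for lv in levels if lv.lsr > 1.0]


# Invented example values, not study data.
levels = [
    VertebralLevel("T12", load_n=2400.0, strength_n=3000.0),   # LSR 0.8
    VertebralLevel("L1", load_n=2600.0, strength_n=2000.0),    # LSR 1.3
]
print(flag_at_risk(levels))   # ['L1']
```

Because the muscle segmentation feeds the load estimate, errors in muscle cross-sectional area propagate directly into the numerator of this ratio.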
This study had several limitations. The thoracolumbar muscles chosen for annotation did not include the transversus abdominis, as the segmentation was strictly limited to the muscles required for the musculoskeletal model of spinal loading (Anderson et al., 2022). Although applying the same approach to MR images of the torso would be appealing, we have not assessed its performance with this modality. A prior study of MR-based lumbar paraspinal muscle segmentation and subsequent modeling (Hess et al., 2022) showed that lumbar loading estimates from models created with automatic vs. manual segmentation were correlated; however, this agreement varied by vertebral level, from L5 (R = 0.55) to L2 (R = 0.87). Although not yet tested in this way, the current DL muscle segmentation should be better suited to creating musculoskeletal models, as it incorporates all key trunk muscles across the full thoracolumbar spine. We acknowledge that there is no true gold standard for these segmentations, whose evaluation ultimately relies on expert subjective assessment, and it is presently unclear what would constitute a more reliable standard. However, the strong agreement between the automated and manual techniques suggests that they are of comparable accuracy.
Our approach used the nnU-Net 3d_fullres configuration with the sparse annotation option (Gotkowski et al., 2024). Although the model accepts 3D data, it predicts each slice in 2D, which simplifies the pipeline. Because the CT slice stack is continuous, the outcome is a volumetric segmentation. However, with this option the model does not incorporate 3D information from the CT stack during inference, resulting in a "staggered" segmentation along the Z (caudal-cranial) axis, the degree of which depends on the scan slice thickness. This may be problematic for accurately representing the muscles' 3D volume in spinal conditions involving large curvature changes in both the sagittal and coronal planes, as well as vertebral rotations, as in scoliotic spines (Jiang et al., 2018), which produce significant 3D deformity of the thoracolumbar region. Our group is currently developing a fully 3D segmentation approach to handle such conditions.
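The slice-wise behavior described above can be sketched as follows: a 2D predictor is applied to each axial slice independently and the results are stacked, so nothing enforces continuity along Z. The predictor here is a hypothetical stand-in (a simple threshold), not nnU-Net, and the adjacent-slice Dice is one illustrative way, assumed here, to expose the resulting staggering.

```python
from typing import Callable

import numpy as np


def segment_slicewise(volume: np.ndarray,
                      predict_slice: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Apply a 2D predictor independently to each axial slice (axis 0 = Z) and stack."""
    return np.stack([predict_slice(volume[z]) for z in range(volume.shape[0])], axis=0)


def adjacent_slice_dice(mask: np.ndarray) -> np.ndarray:
    """Dice between consecutive slices of a binary mask; dips flag Z-axis staggering.

    A pair of empty slices is scored 1.0 (treated as consistent, by assumption).
    """
    a = mask[:-1].astype(bool).reshape(mask.shape[0] - 1, -1)
    b = mask[1:].astype(bool).reshape(mask.shape[0] - 1, -1)
    inter = (a & b).sum(axis=1)
    denom = a.sum(axis=1) + b.sum(axis=1)
    return np.where(denom > 0, 2.0 * inter / np.maximum(denom, 1), 1.0)


# Toy volume: a structure on slices 0-1 that shifts abruptly on slice 2.
vol = np.zeros((3, 4, 4))
vol[0:2, 0:2, 0:2] = 1.0
vol[2, 2:4, 0:2] = 1.0
seg = segment_slicewise(vol, lambda s: s > 0.5)   # threshold stands in for a 2D network
print(adjacent_slice_dice(seg))                    # [1. 0.]
```

With thicker slices, anatomy moves further between neighboring slices, so a purely slice-wise predictor produces larger steps of this kind; a fully 3D model can use inter-slice context to smooth them.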
5 Conclusion
This CT DL muscle segmentation model achieved segmentation of spinal muscle anatomy comparable to that of human raters with substantially higher efficiency, enabling automated segmentation of the complete muscle volume of the human thoracic and lumbar regions. This work represents a significant step toward automated musculoskeletal modeling in cancer patients, potentially enabling routine assessment of vertebral fracture risk in clinical settings, and may have applications in other fields of medicine. This advancement could help identify high-risk patients earlier, allowing more timely interventions and improved patient outcomes.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors upon request.
Ethics statement
The studies involving humans were approved by the BROADBAND biorepository research project (MGB IRB 2016P001582). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
VH: Data curation, Formal Analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review and editing. SP: Data curation, Formal Analysis, Methodology, Software, Visualization, Writing – review and editing, Validation. JJ: Data curation, Investigation, Methodology, Writing – review and editing. DA: Formal Analysis, Investigation, Supervision, Writing – review and editing. CP: Software, Validation, Writing – review and editing. YC: Data curation, Investigation, Writing – review and editing. AB: Investigation, Writing – review and editing. DK: Investigation, Writing – review and editing. PD: Investigation, Writing – review and editing. SC: Investigation, Writing – review and editing. HK: Investigation, Writing – review and editing. TB: Investigation, Project administration, Supervision, Funding acquisition, Writing – review and editing. AS: Investigation, Writing – review and editing. MK: Data curation, Formal Analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review and editing. RK: Formal Analysis, Methodology, Software, Validation, Writing – review and editing. DH: Funding acquisition, Methodology, Validation, Writing – review and editing. RA: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. The authors acknowledge the financial support of the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) under award numbers R01AR075964 and 3R01AR075964-03S1, the National Institute of Biomedical Imaging and Bioengineering under award numbers P41EB015902 and P41EB028741, the National Cancer Institute under award number R01CA235589, and the National Cancer Data Ecosystem, Task Order No. 413 HHSN26110071 under Contract No. HHSN261201500003l. This work used Jetstream2 at Indiana University through allocation CIS230102 from the Advanced Cyberinfrastructure Coordination Ecosystem Services and Support (ACCESS) program, which is supported by National Science Foundation grant numbers #2138259, #2138286, #2138307, #2137603, and #2138296. The authors would like to acknowledge the BROADBAND Research Project at the Brigham and Women's Hospital Department of Radiation Oncology for providing regulatory and personnel support for this project. The BROADBAND Project was partly made possible by the generous donations of Stewart Clifford, Fredric Levin, and their families.
Conflict of interest
Author SP was employed by Isomics, Inc. Author CP was employed by EBATINCA, S.L.
The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors RA and DA declared that they were editorial board members of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Ackermans, L., Volmer, L., Wee, L., Brecheisen, R., Sanchez-Gonzalez, P., Seiffert, A. P., et al. (2021). Deep learning automated segmentation for muscle and adipose tissue from abdominal computed tomography in polytrauma patients. Sensors (Basel) 21 (6), 2083. doi:10.3390/s21062083
Alkalay, R. N., and Adamson, R. (2008). Un-contained osteolytic defect within the vertebral body of a human spine alters its structural behavior in response to complex loading: an experimental study (Lucerne, Switzerland: International Society for the Study of the Lumbar Spine).
Alkalay, R. N., and Harrigan, T. (2016). Mechanical assessment of the effects of metastatic lytic defect on the structural response of human thoracolumbar spine. J. Orthop. Res. 34 (10), 1808–1819. doi:10.1002/jor.23154
Alkalay, R. N., Groff, M. W., Stadelmann, M. A., Buck, F. M., Hoppe, S., Theumann, N., et al. (2022). Improved estimates of strength and stiffness in pathologic vertebrae with bone metastases using CT-derived bone density compared with radiographic bone lesion quality classification. J. Neurosurg. Spine 36 (1), 113–124. doi:10.3171/2021.2.SPINE202027
Allaire, B. T., Mousavi, S. J., James, J. N., Bouxsein, M. L., and Anderson, D. E. (2023). Dependence of trunk muscle size and position on age, height, and weight in a multi-ethnic cohort of middle-aged and older men and women. J. Biomech. 157, 111710. doi:10.1016/j.jbiomech.2023.111710
Amelot, A., Terrier, L. M., Farah, K., Aggad, M., Le Nail, L. R., Francois, P., et al. (2024). Impact of metastatic epidural spinal cord compression (MESCC) and pathological vertebral compression fracture (pVCF) in neurological and survival prognosis. Eur. J. Surg. Oncol. 50 (2), 107935. doi:10.1016/j.ejso.2023.107935
Anderson, D. E., Groff, M. W., Flood, T. F., Allaire, B. T., Davis, R. B., Stadelmann, M. A., et al. (2022). Evaluation of load-to-strength ratios in metastatic vertebrae and comparison with age and sex matched healthy individuals. Front. Bioeng. Biotechnol. 10, 866970. doi:10.3389/fbioe.2022.866970
Anderson, D. E., Keko, M., James, J., Allaire, B. T., Kozono, D., Doyle, P. F., et al. (2025). Metastatic spine disease alters spinal load-to-strength ratios in patients compared to healthy individuals. medRxiv [Preprint], 2025.01.06.25320075. doi:10.1101/2025.01.06.25320075
Andersson, G. B., Ortengren, R., and Herberts, P. (1977). Quantitative electromyographic studies of back muscle activity related to posture and loading. Orthop. Clin. North Am. 8 (1), 85–96. doi:10.1016/s0030-5898(20)30938-x
Arjmand, N., and Shirazi-Adl, A. (2006). Model and in vivo studies on human trunk load partitioning and stability in isometric forward flexions. J. Biomech. 39 (3), 510–521. doi:10.1016/j.jbiomech.2004.11.030
Boerner, T., Deems, S., Furlani, T., Knuth, S., and Towns, J. (2023). ACCESS: advancing innovation: NSF’s advanced cyberinfrastructure coordination ecosystem: services and support, 173–176.
Bruno, A. G., Bouxsein, M. L., and Anderson, D. E. (2015). Development and validation of a musculoskeletal model of the fully articulated thoracolumbar spine and rib cage. J. Biomech. Eng. 137 (8), 081003. doi:10.1115/1.4030408
Burke, M. V., Atkins, A., Akens, M., Willett, T. L., and Whyne, C. M. (2016). Osteolytic and mixed cancer metastasis modulates collagen and mineral parameters within rat vertebral bone matrix. J. Orthop. Res. 34 (12), 2126–2136. doi:10.1002/jor.23248
Burke, M., Atkins, A., Kiss, A., Akens, M., Yee, A., and Whyne, C. (2017). The impact of metastasis on the mineral phase of vertebral bone tissue. J. Mech. Behav. Biomed. Mater 69, 75–84. doi:10.1016/j.jmbbm.2016.12.017
Cai, H., Qi, L., Yu, Q., Shi, Y., and Gao, Y. (2023). "3D medical image segmentation with sparse annotation via cross-teaching between 3D and 2D networks," in International conference on medical image computing and computer-assisted intervention.
Cangussu-Oliveira, L. M., Porto, J. M., Freire Junior, R. C., Capato, L. L., Gomes, J. M., Herrero, C., et al. (2020). Association between the trunk muscle function performance and the presence of vertebral fracture in older women with low bone mass. Aging Clin. Exp. Res. 32 (6), 1067–1076. doi:10.1007/s40520-019-01296-2
Chalhoub, D., Boudreau, R., Greenspan, S., Newman, A. B., Zmuda, J., Frank-Wilson, A. W., et al. (2018). Associations between lean mass, muscle strength and power, and skeletal size, density and strength in older men. J. Bone Min. Res. 33 (9), 1612–1621. doi:10.1002/jbmr.3458
Cholewicki, J., Panjabi, M. M., and Khachatryan, A. (1997). Stabilizing function of trunk flexor-extensor muscles around a neutral spine posture. Spine (Phila Pa 1976) 22 (19), 2207–2212. doi:10.1097/00007632-199710010-00003
Christophy, M., Faruk Senan, N. A., Lotz, J. C., and O'Reilly, O. M. (2012). A musculoskeletal model for the lumbar spine. Biomech. Model Mechanobiol. 11 (1-2), 19–34. doi:10.1007/s10237-011-0290-6
Edwards, K., Chhabra, A., Dormer, J., Jones, P., Boutin, R. D., Lenchik, L., et al. (2020). Abdominal muscle segmentation from CT using a convolutional neural network. Proc. SPIE Int. Soc. Opt. Eng. 11317, 113170L. doi:10.1117/12.2549406
Fortin, M., and Macedo, L. G. (2013). Multifidus and paraspinal muscle group cross-sectional areas of patients with low back pain and control patients: a systematic review with a focus on blinding. Phys. Ther. 93 (7), 873–888. doi:10.2522/ptj.20120457
Gao, F., Hu, M., Zhong, M.-E., Feng, S., Tian, X., Meng, X., et al. (2022). Segmentation only uses sparse annotations: unified weakly and semi-supervised learning in medical images. Med. Image Anal. 80, 102515. doi:10.1016/j.media.2022.102515
Gotkowski, K., Lüth, C., Jäger, P. F., Ziegler, S., Krämer, L., Denner, S., et al. (2024). Embarrassingly simple scribble supervision for 3D medical segmentation. arXiv preprint arXiv:2403.12834.
Greig, A. M., Briggs, A. M., Bennell, K. L., and Hodges, P. W. (2014). Trunk muscle activity is modified in osteoporotic vertebral fracture and thoracic kyphosis with potential consequences for vertebral health. PLoS One 9 (10), e109515. doi:10.1371/journal.pone.0109515
Gwet, K. L. (2019). irrCAC: computing chance-corrected agreement coefficients (CAC). R. Package Version 1, 2019.
Hancock, D., Fischer, J., Lowe, J., Snapp-Childs, W., Pierce, M., Marru, S., et al. (2021). Jetstream2: accelerating cloud computing via jetstream, 1–8.
Hebuterne, X., Lemarie, E., Michallet, M., de Montreuil, C. B., Schneider, S. M., and Goldwasser, F. (2014). Prevalence of malnutrition and current use of nutrition support in patients with cancer. JPEN J. Parenter. Enter. Nutr. 38 (2), 196–204. doi:10.1177/0148607113502674
Heckel, F., Moltz, J. H., Meine, H., Geisler, B., Kiessling, A., D'Anastasi, M., et al. (2014). On the evaluation of segmentation editing tools. J. Med. Imaging (Bellingham) 1 (3), 034005. doi:10.1117/1.JMI.1.3.034005
Hemke, R., Buckless, C. G., Tsao, A., Wang, B., and Torriani, M. (2020). Deep learning for automated segmentation of pelvic muscles, fat, and bone from CT studies for body composition assessment. Skelet. Radiol. 49 (3), 387–395. doi:10.1007/s00256-019-03289-8
Hess, M., Allaire, B., Gao, K. T., Tibrewala, R., Inamdar, G., Bharadwaj, U., et al. (2022). Deep learning for multi-tissue segmentation and fully automatic personalized biomechanical models from BACPAC clinical lumbar spine MRI. Pain Med. 24 (Suppl. 1), S139–S148. doi:10.1093/pm/pnac142
Ignasiak, D., Dendorfer, S., and Ferguson, S. J. (2016). Thoracolumbar spine model with articulated ribcage for the prediction of dynamic spinal loading. J. Biomech. 49 (6), 959–966. doi:10.1016/j.jbiomech.2015.10.010
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J., and Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18 (2), 203–211. doi:10.1038/s41592-020-01008-z
Jiang, W. W., Cheng, C. L. K., Cheung, J. P. Y., Samartzis, D., Lai, K. K. L., To, M. K. T., et al. (2018). Patterns of coronal curve changes in forward bending posture: a 3D ultrasound study of adolescent idiopathic scoliosis patients. Eur. Spine J. 27 (9), 2139–2147. doi:10.1007/s00586-018-5646-5
Johannesdottir, F., Allaire, B., Anderson, D. E., Samelson, E. J., Kiel, D. P., and Bouxsein, M. L. (2018). Population-based study of age- and sex-related differences in muscle density and size in thoracic and lumbar spine: the framingham study. Osteoporos. Int. 29 (7), 1569–1580. doi:10.1007/s00198-018-4490-0
Kamiya, N. (2018). Muscle segmentation for orthopedic interventions. Adv. Exp. Med. Biol. 1093, 81–91. doi:10.1007/978-981-13-1396-7_7
Kim, H. B., Kim, H. S., Kim, S. J., and Yoo, J. I. (2024). Spine muscle auto segmentation techniques in MRI imaging: a systematic review. BMC Musculoskelet. Disord. 25 (1), 716. doi:10.1186/s12891-024-07777-4
Kolb, N. A., Smith, A. G., Singleton, J. R., Beck, S. L., Stoddard, G. J., Brown, S., et al. (2016). The association of chemotherapy-induced peripheral neuropathy symptoms and the risk of falling. JAMA Neurol. 73 (7), 860–866. doi:10.1001/jamaneurol.2016.0383
Lee, J.-S. (1983). Digital image smoothing and the sigma filter. Comput. Vision, Graphics, Image Processing 24 (2), 255–269. doi:10.1016/0734-189x(83)90047-6
Lei, W., Su, Q., Jiang, T., Gu, R., Wang, N., Liu, X., et al. (2024). One-shot weakly-supervised segmentation in 3D medical images. IEEE Trans. Med. Imaging 43 (1), 175–189. doi:10.1109/TMI.2023.3294975
Malakoutian, M., Sanchez, C. A., Brown, S. H. M., Street, J., Fels, S., and Oxland, T. R. (2022). Biomechanical properties of paraspinal muscles influence spinal Loading-A musculoskeletal simulation study. Front. Bioeng. Biotechnol. 10, 852201. doi:10.3389/fbioe.2022.852201
Martin, A., and Freyssenet, D. (2021). Phenotypic features of cancer cachexia-related loss of skeletal muscle mass and function: lessons from human and animal studies. J. Cachexia Sarcopenia Muscle 12 (2), 252–273. doi:10.1002/jcsm.12678
Martuscello, J. M., Nuzzo, J. L., Ashley, C. D., Campbell, B. I., Orriola, J. J., and Mayer, J. M. (2013). Systematic review of core muscle activity during physical fitness exercises. J. Strength Cond. Res. 27 (6), 1684–1698. doi:10.1519/JSC.0b013e318291b8da
Massaad, E., Saylor, P. J., Hadzipasic, M., Kiapour, A., Oh, K., Schwab, J. H., et al. (2021). The effectiveness of systemic therapies after surgery for metastatic renal cell carcinoma to the spine: a propensity analysis controlling for sarcopenia, frailty, and nutrition. J. Neurosurg. Spine 35 (3), 356–365. doi:10.3171/2020.12.SPINE201896
McGill, S. M., Grenier, S., Kavcic, N., and Cholewicki, J. (2003). Coordination of muscle activity to assure stability of the lumbar spine. J. Electromyogr. Kinesiol 13 (4), 353–359. doi:10.1016/s1050-6411(03)00043-9
Mokhtarzadeh, H., and Anderson, D. E. (2016). The role of trunk musculature in osteoporotic vertebral fractures: implications for prediction, prevention, and management. Curr. Osteoporos. Rep. 14 (3), 67–76. doi:10.1007/s11914-016-0305-4
Mokhtarzadeh, H., Anderson, D. E., Allaire, B. T., and Bouxsein, M. L. (2021). Patterns of load-to-strength ratios along the spine in a population-based cohort to evaluate the contribution of spinal loading to vertebral fractures. J. Bone Min. Res. 36 (4), 704–711. doi:10.1002/jbmr.4222
Morimoto, T., Toda, Y., Hakozaki, M., Paholpak, P., Watanabe, K., Kato, K., et al. (2024). A new era in the management of spinal metastasis. Front. Oncol. 14, 1374915. doi:10.3389/fonc.2024.1374915
Oefelein, M. G., Ricchiuti, V., Conrad, W., and Resnick, M. I. (2002). Skeletal fractures negatively correlate with overall survival in men with prostate cancer. J. Urol. 168 (3), 1005–1007. doi:10.1097/01.ju.0000024395.86788.cc
Oster, G., Lamerato, L., Glass, A. G., Richert-Boe, K. E., Lopez, A., Chung, K., et al. (2013). Natural history of skeletal-related events in patients with breast, lung, or prostate cancer and metastases to bone: a 15-year study in two large US health systems. Support Care Cancer 21 (12), 3279–3286. doi:10.1007/s00520-013-1887-3
Ouassit, Y., Ardchir, S., Ghoumari, M. Y. E., and Azouazi, M. (2022). A brief survey on weakly supervised semantic segmentation. Int. J. Online Biomed. Eng. 18, 83–113. doi:10.3991/ijoe.v18i10.31531
Pamoukdjian, F., Bouillet, T., Levy, V., Soussan, M., Zelek, L., and Paillaud, E. (2018). Prevalence and predictive value of pre-therapeutic sarcopenia in cancer patients: a systematic review. Clin. Nutr. 37 (4), 1101–1113. doi:10.1016/j.clnu.2017.07.010
Peus, D., Newcomb, N., and Hofer, S. (2013). Appraisal of the karnofsky performance status and proposal of a simple algorithmic system for its evaluation. BMC Med. Inf. Decis. Mak. 13, 72. doi:10.1186/1472-6947-13-72
Pond, G. R., Sonpavde, G., de Wit, R., Eisenberger, M. A., Tannock, I. F., and Armstrong, A. J. (2014). The prognostic importance of metastatic site in men with metastatic castration-resistant prostate cancer. Eur. Urol. 65 (1), 3–6. doi:10.1016/j.eururo.2013.09.024
Preim, B., and Botha, C. (2014). “Chapter 4 - image analysis for medical visualization,” in Visual computing for medicine. (Boston: Morgan Kaufmann), 111–175.
Rajaee, M. A., Arjmand, N., and Shirazi-Adl, A. (2021). A novel coupled musculoskeletal finite element model of the spine - critical evaluation of trunk models in some tasks. J. Biomech. 119, 110331. doi:10.1016/j.jbiomech.2021.110331
Robb, R. A. (2001). The biomedical imaging resource at Mayo clinic. IEEE Trans. Med. Imaging 20 (9), 854–867. doi:10.1109/42.952724
Rothrock, R. J., Barzilai, O., Reiner, A. S., Lis, E., Schmitt, A. M., Higginson, D. S., et al. (2021). Survival trends after surgery for spinal metastatic tumors: 20-year cancer center experience. Neurosurgery 88 (2), 402–412. doi:10.1093/neuros/nyaa380
Saad, F., Lipton, A., Cook, R., Chen, Y. M., Smith, M., and Coleman, R. (2007). Pathologic fractures correlate with reduced survival in patients with malignant bone disease. Cancer 110 (8), 1860–1867. doi:10.1002/cncr.22991
Santucci, C., Carioli, G., Bertuccio, P., Malvezzi, M., Pastorino, U., Boffetta, P., et al. (2020). Progress in cancer mortality, incidence, and survival: a global overview. Eur. J. Cancer Prev. 29 (5), 367–381. doi:10.1097/CEJ.0000000000000594
Shachar, S. S., Williams, G. R., Muss, H. B., and Nishijima, T. F. (2016). Prognostic value of sarcopenia in adults with solid tumours: a meta-analysis and systematic review. Eur. J. Cancer 57, 58–67. doi:10.1016/j.ejca.2015.12.030
Shen, H., He, P., Ren, Y., Huang, Z., Li, S., Wang, G., et al. (2023). A deep learning model based on the attention mechanism for automatic segmentation of abdominal muscle and fat for body composition assessment. Quant. Imaging Med. Surg. 13 (3), 1384–1398. doi:10.21037/qims-22-330
Soltani, Z., Xu, M., Radovitzky, R., Stadelmann, M. A., Hackney, D., and Alkalay, R. N. (2024). CT-based finite element simulating spatial bone damage accumulation predicts metastatic human vertebrae strength and stiffness. Front. Bioeng. Biotechnol. 12, 1424553. doi:10.3389/fbioe.2024.1424553
Soufi, M., Otake, Y., Iwasa, M., Uemura, K., Hakotani, T., Hashimoto, M., et al. (2025). Validation of musculoskeletal segmentation model with uncertainty estimation for bone and muscle assessment in hip-to-knee clinical CT images. Sci. Rep. 15 (1), 125. doi:10.1038/s41598-024-83793-7
Tamada, T., Sone, T., Jo, Y., Imai, S., Kajihara, Y., and Fukunaga, M. (2005). Three-dimensional trabecular bone architecture of the lumbar spine in bone metastasis from prostate cancer: comparison with degenerative sclerosis. Skelet. Radiol. 34 (3), 149–155. doi:10.1007/s00256-004-0855-x
Tschirhart, C. E., Nagpurkar, A., and Whyne, C. M. (2004). Effects of tumor location, shape and surface serration on burst fracture risk in the metastatic spine. J. Biomechanics 37 (5), 653–660. doi:10.1016/j.jbiomech.2003.09.027
US Cancer Statistics Working Group (2022). “US cancer statistics data visualizations tool bosd,” in Cancer facts & figures 2022. (American cancer society).
Van den Brande, R., Cornips, E. M., Peeters, M., Ost, P., Billiet, C., and Van de Kelft, E. (2022). Epidemiology of spinal metastases, metastatic epidural spinal cord compression and pathologic vertebral compression fractures in patients with solid tumors: a systematic review. J. Bone Oncol. 35, 100446. doi:10.1016/j.jbo.2022.100446
Wang, W., Wang, D., Falisse, A., Severijns, P., Overbergh, T., Moke, L., et al. (2021). A dynamic optimization approach for solving spine kinematics while calibrating subject-specific mechanical properties. Ann. Biomed. Eng. 49 (9), 2311–2322. doi:10.1007/s10439-021-02774-3
Wang, L., Yin, L., Zhao, Y., Su, Y., Sun, W., Chen, S., et al. (2021). Muscle density, but not size, correlates well with muscle strength and physical performance. J. Am. Med. Dir. Assoc. 22 (4), 751–9 e2. doi:10.1016/j.jamda.2020.06.052
Ward, P. R., Wong, M. D., Moore, R., and Naeim, A. (2014). Fall-related injuries in elderly cancer patients treated with neurotoxic chemotherapy: a retrospective cohort study. J. Geriatr. Oncol. 5 (1), 57–64. doi:10.1016/j.jgo.2013.10.002
Keywords: cancer, deep learning, muscle, segmentation, spine biomechanics, thoracolumbar sparse annotations
Citation: Hong V, Pieper S, James J, Anderson DE, Pinter C, Chang YS, Bulent A, Kozono D, Doyle P, Caplan S, Kang H, Balboni T, Spektor A, Keko M, Kikinis R, Hackney DB and Alkalay RN (2026) Automated segmentation of trunk musculature with a deep CNN trained from sparse annotations in radiation therapy patients with metastatic spine disease: an observational study. Front. Bioeng. Biotechnol. 13:1707724. doi: 10.3389/fbioe.2025.1707724
Received: 17 September 2025; Accepted: 29 December 2025;
Published: 02 February 2026.
Edited by:
Fabiano Bini, Sapienza University of Rome, Italy
Reviewed by:
Mazen Soufi, University of Miyazaki, Japan
Syed Furqan Qadri, BGI Life Sciences Institute, China
Copyright © 2026 Hong, Pieper, James, Anderson, Pinter, Chang, Bulent, Kozono, Doyle, Caplan, Kang, Balboni, Spektor, Keko, Kikinis, Hackney and Alkalay. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ron Noah Alkalay, rn_alkalay@bidmc.harvard.edu