- 1 Core Research & Development Center, Korea University Ansan Hospital, Ansan, Republic of Korea
- 2 Coreline Soft Co., Ltd., Seoul, Republic of Korea
- 3 Department of Pediatrics, College of Medicine, Korea University, Seoul, Republic of Korea
- 4 Department of Convergence Medicine, College of Medicine, Korea University, Seoul, Republic of Korea
- 5 Miso Information Technology Co., Ltd., Seoul, Republic of Korea
Introduction: Identifying the thoracic vertebrae visible on chest radiographs is standard practice for assessing the proper position of tube and catheter tips within their designated anatomical target regions in critically ill newborn infants. We introduce a fully automated deep learning system based on the nnU-Net architecture for segmenting and labeling the T1, T7, and T12 vertebrae in neonatal chest radiographs.
Methods: We retrospectively collected 14,660 neonatal chest radiographs from 11 university hospitals in Korea, including both infants with tubes or catheters and those without. All images were deidentified and annotated for the T1, T7, and T12 vertebrae using quadrilateral polygonal labels, which were validated by pediatricians. We split the dataset into training (11,860), validation (1,400), and test (1,400) sets, maintaining an even distribution of gestational age and birth weight.
Results: The automatic segmentation algorithm demonstrated excellent agreement with human-annotated segmentation for the T1, T7, and T12 vertebrae [Dice similarity coefficient (DSC): 0.8327, 95% CI: 0.8237–0.8418; 0.8322, 95% CI: 0.8213–0.8432; 0.7998, 95% CI: 0.7864–0.8133, respectively]. For identifying the approximate location of each vertebra, a relatively modest DSC threshold of 0.50 or 0.60 already yielded accuracy above 90% for T1, T7, and T12.
Conclusion: Our deep learning-based automated algorithm built on the nnU-Net framework could accurately segment and label the T1, T7, and T12 thoracic vertebrae in neonatal chest radiographs. This artificial intelligence-driven approach can map anatomical target regions based on thoracic vertebrae for assessing the positioning of tube and catheter tips in the neonatal intensive care unit.
Introduction
An endotracheal tube and various intravascular catheters, such as umbilical artery and/or vein catheters, are commonly used in neonatal intensive care units (NICUs) for life-supporting purposes, especially in critically ill newborn infants. Tips of tubes and/or catheters should be placed in specific anatomic positions to ensure proper operation and to minimize the risk of related complications (1–3).
In neonates, identifying the thoracic vertebrae visible on chest radiographs is standard practice for assessing the proper position of tube and catheter tips within their designated anatomical target regions (4). For example, the tip of an endotracheal tube should be placed between the first thoracic vertebra (T1) and the second thoracic vertebra (T2). The tip of an umbilical artery catheter in the high position should be placed at T6–T9, and the tip of an umbilical vein catheter should be placed at T8–T10. Additionally, to determine whether inspiration is adequate in critically ill infants, the diaphragm's position is identified at the level of the posterior ribs of thoracic vertebrae T7–T9.
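For illustration only, these placement rules can be expressed as a simple lookup table. The sketch below is hypothetical; the device keys and the range-checking helper are not drawn from any clinical software and merely restate the rules above in code.

```python
# Hypothetical sketch: the vertebral target regions described above as a
# lookup table. Device names and the helper function are illustrative only.
TARGET_REGIONS = {
    "endotracheal_tube": ("T1", "T2"),
    "umbilical_artery_catheter_high": ("T6", "T9"),
    "umbilical_vein_catheter": ("T8", "T10"),
}

def tip_in_target(device: str, tip_level: int) -> bool:
    """Return True if a tip at thoracic level `tip_level` (e.g., 9 for T9)
    falls within the device's target range."""
    lo, hi = TARGET_REGIONS[device]
    return int(lo[1:]) <= tip_level <= int(hi[1:])

print(tip_in_target("umbilical_vein_catheter", 9))  # True: T9 lies in T8–T10
```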
Recent advances in deep learning for medical image analysis have enabled the development of various approaches demonstrating high accuracy in detecting and classifying tubes and catheters on neonatal chest radiographs (5, 6). However, the lack of research on anatomical target regions in chest radiographs for evaluating the proper placement of tube and catheter tips poses challenges for clinical application in neonates.
This study aimed to address this gap by providing information on anatomical target regions based on thoracic vertebrae in neonatal chest radiographs. To achieve this, we developed a fully automated deep learning system utilizing the nnU-Net architecture to segment and label T1, T7, and T12. The nnU-Net framework, a self-configuring deep learning segmentation system widely used for medical imaging, was selected to ensure robust and reproducible performance across diverse neonatal radiographs. By defining these vertebrae as consistent anatomical reference points, the proposed system seeks to facilitate accurate and objective evaluation of tube and catheter positioning in neonatal clinical practice.
Methods
Data collection and dataset division
We retrospectively collected 14,660 neonatal chest radiographs (October 2022–February 2023) from 11 university hospitals in Korea, including both infants with tubes or catheters and those without. This study was approved by the Institutional Review Board (IRB) of each participating hospital (approval no. 2022AS0056) with a waiver of informed consent.
All images were deidentified and annotated for T1, T7, and T12 vertebrae using quadrilateral polygonal labels to reflect vertebral orientation in rotated or tilted infants. Annotations were reviewed independently by two pediatricians and finalized by consensus. We split the dataset into training (11,860), validation (1,400), and test (1,400) sets at the patient level, ensuring balanced distributions of gestational age (GA) and birth weight (BW). The hospital-wise dataset composition, including GA and BW group distributions, is summarized in Table 1. To reduce identifiability, GA was categorized into three groups: <28 weeks, 28–32 weeks, and ≥33 weeks, and BW was also categorized into three groups: <1,000 g, 1,000–1,500 g, and ≥1,500 g. Additional steps, including the removal of patient identifiers and DICOM metadata, were taken to ensure anonymization.
Each pixel was labeled into five classes: background (0), lung (1), T1 (2), T7 (3), and T12 (4). This labeling provides a foundation for detailed segmentation and analysis of key structures involved in this study.
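As a minimal illustration of this encoding (assuming NumPy; the miniature array size and region placements are arbitrary), each vertebral class can be separated into its own binary mask for per-class evaluation later:

```python
import numpy as np

# Illustrative miniature label map using the class indices defined above:
# 0 background, 1 lung, 2 T1, 3 T7, 4 T12. Real masks are radiograph-sized.
CLASSES = {2: "T1", 3: "T7", 4: "T12"}

label_map = np.zeros((8, 8), dtype=np.uint8)
label_map[1:7, 1:4] = 1   # lung region (contextual class)
label_map[1, 5:7] = 2     # T1
label_map[4, 5:7] = 3     # T7
label_map[6, 5:7] = 4     # T12

# One binary mask per vertebra, as used for per-class DSC computation
vertebra_masks = {name: label_map == idx for idx, name in CLASSES.items()}
```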
Deep learning model
The nnU-Net framework was employed in its two-dimensional U-Net configuration to segment the T1, T7, and T12 vertebrae. The model was trained on a high-performance workstation with GPU acceleration (7). Figure 1 shows an overview of the proposed network architecture. We used a batch size of 12 and a patch size of 448 × 576 pixels, leveraging nnU-Net's automated configuration for medical image segmentation. For training, we used a composite loss function combining Dice loss and cross-entropy loss. The segmentation model was optimized using stochastic gradient descent (SGD) with a Nesterov momentum of 0.99. A polynomial learning rate scheduler was used, starting with an initial learning rate of 0.01. The model was trained for 1,000 epochs to obtain final weights, and no early stopping was applied. In all experiments, a fixed random seed (102) was used to ensure reproducibility.
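A minimal PyTorch sketch of this training configuration is given below. The soft-Dice formulation and the polynomial decay exponent of 0.9 follow common nnU-Net defaults and should be read as assumptions rather than the study's exact implementation; `model` is a stand-in for the actual 2D U-Net.

```python
import torch
import torch.nn as nn

def soft_dice_loss(logits, target, eps=1e-5):
    """Mean soft Dice loss over classes; `target` is an (N, H, W) index map."""
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = nn.functional.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * onehot).sum(dims)
    denom = probs.sum(dims) + onehot.sum(dims)
    return 1.0 - ((2 * intersection + eps) / (denom + eps)).mean()

ce_loss = nn.CrossEntropyLoss()

def composite_loss(logits, target):
    # Dice + cross-entropy, as described in the text
    return soft_dice_loss(logits, target) + ce_loss(logits, target)

model = nn.Conv2d(1, 5, kernel_size=1)  # placeholder for the 2D U-Net
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.99, nesterov=True)

MAX_EPOCHS = 1000
def poly_lr(epoch, initial_lr=0.01, exponent=0.9):
    """Polynomial learning-rate decay (exponent 0.9 is an assumed default)."""
    return initial_lr * (1 - epoch / MAX_EPOCHS) ** exponent
```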
To support vertebral localization, the model was trained as a five-class segmentation task comprising background, lung, and the T1, T7, and T12 vertebrae. Lung was included as a contextual class, serving as a consistent anatomical landmark across variable neonatal postures and imaging fields, as illustrated in Supplementary Figure S1.
Data augmentation
Various data augmentation techniques, including rotation, scaling, and noise injection, were applied to enhance the model's robustness and generalization. By introducing these transformations, the model could better handle diverse clinical scenarios without overfitting. Augmentation details are shown in Table 2.
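The sketch below illustrates how such paired image–label augmentations might be implemented with SciPy. The parameter ranges (±15° rotation, 0.9–1.1 scaling, Gaussian noise with σ = 0.05) are illustrative assumptions, not the exact settings of Table 2.

```python
import numpy as np
from scipy import ndimage

def augment(image: np.ndarray, labels: np.ndarray, rng: np.random.Generator):
    """Apply random rotation, scaling, and noise injection to an image and
    its label map; geometric transforms are shared, noise is image-only."""
    angle = rng.uniform(-15, 15)   # degrees (assumed range)
    scale = rng.uniform(0.9, 1.1)  # assumed range
    # Linear interpolation for the image, nearest-neighbor for the labels
    image = ndimage.rotate(image, angle, reshape=False, order=1)
    labels = ndimage.rotate(labels, angle, reshape=False, order=0)
    image = ndimage.zoom(image, scale, order=1)
    labels = ndimage.zoom(labels, scale, order=0)
    image = image + rng.normal(0.0, 0.05, image.shape)  # noise injection
    return image, labels
```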
Pre- and post-processing
We performed Z-score normalization and resized all images to have a uniform resolution. This ensured consistency across the dataset, allowing the model to focus on relevant features rather than variations in image intensity or scale.
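A minimal sketch of this preprocessing follows, assuming scikit-image for resampling and a target size chosen here to mirror the reported patch size (the study does not state a per-image target, so this is an assumption):

```python
import numpy as np
from skimage.transform import resize

TARGET_SHAPE = (576, 448)  # (height, width); assumed to mirror the patch size

def preprocess(image: np.ndarray) -> np.ndarray:
    """Z-score intensity normalization followed by resizing to a uniform
    resolution."""
    image = image.astype(np.float32)
    image = (image - image.mean()) / (image.std() + 1e-8)  # Z-score
    return resize(image, TARGET_SHAPE, order=1, preserve_range=True)
```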
Post-processing was performed using Connected Component Labeling (CCL) to eliminate spurious fragmented predictions and retain only the largest connected component for each vertebral class (T1, T7, T12). This step reduced noise and improved the anatomical plausibility of the segmented output, as shown in Supplementary Figure S2.
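A minimal sketch of this step using SciPy's connected component labeling, with class indices following the scheme defined above (2 = T1, 3 = T7, 4 = T12):

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(pred: np.ndarray,
                           vertebra_classes=(2, 3, 4)) -> np.ndarray:
    """For each vertebral class, keep only the largest connected component
    and reset fragmented false positives to background."""
    cleaned = pred.copy()
    for cls in vertebra_classes:
        mask = pred == cls
        labeled, n = ndimage.label(mask)  # connected component labeling (CCL)
        if n <= 1:
            continue
        sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
        largest = int(np.argmax(sizes)) + 1
        cleaned[mask & (labeled != largest)] = 0
    return cleaned
```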
Dice similarity coefficient (DSC)
To evaluate concordance between automated and manual segmentations, the Dice similarity coefficient (DSC) was used as the primary metric. It was calculated as DSC = 2|AS ∩ MS| / (|AS| + |MS|), where AS and MS denote the automated and manual segmentations, respectively. DSC values range from 0 to 1: a value of 0 indicates complete discordance between automated and manual segmentations, while a value of 1 signifies perfect concordance with identical segmentation shapes (8). Because each thoracic vertebra (T1, T7, T12) was segmented as a separate class, DSC values were calculated for each class individually (DSC_T1, DSC_T7, DSC_T12). In addition, we computed intersection-over-union (IoU) values, 95% Hausdorff distances (HD95; mm), and mean surface distances (MSD; mm); all metrics are summarized in Table 3. Supplementary Figure S3 illustrates qualitative differences across DSC thresholds (0.3–0.8) and supports 0.5 as a practical cutoff for acceptable segmentation quality. Because the manual ground truth used quadrilateral polygon labels, reported DSC values may be conservative when predictions are smaller than, or differ in shape from, the annotated polygons. For patient-level reporting, image-level metrics were averaged per patient before summarizing.
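These overlap metrics follow directly from their set definitions; a minimal NumPy sketch for binary masks of a single vertebra:

```python
import numpy as np

def dsc(auto: np.ndarray, manual: np.ndarray) -> float:
    """Dice similarity coefficient: 2|AS ∩ MS| / (|AS| + |MS|)."""
    intersection = np.logical_and(auto, manual).sum()
    return 2.0 * intersection / (auto.sum() + manual.sum())

def iou(auto: np.ndarray, manual: np.ndarray) -> float:
    """Intersection-over-union: |AS ∩ MS| / |AS ∪ MS|."""
    intersection = np.logical_and(auto, manual).sum()
    return intersection / np.logical_or(auto, manual).sum()
```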
Table 3. Overall segmentation performance at the patient level in neonatal chest radiographs, including Dice similarity coefficients (DSCs), intersection-over-union (IoU) values, 95% Hausdorff distances (HD95; mm), and mean surface distances (MSD; mm) for T1, T7, and T12.
Results
Training behavior and convergence
During training, both training and validation losses decreased steadily and converged, indicating minimal risk of overfitting; the learning curves are shown in Supplementary Figure S4. Notably, the validation Dice similarity coefficient (DSC) converged within a range of 0.80–0.85 across the T1, T7, and T12 vertebrae, mirroring the trend observed in the training set. Because the dataset was balanced across gestational-age and birth-weight groups (Table 1), potential bias related to demographic variation was mitigated, and training behavior and validation performance remained stable across hospitals.
Furthermore, the final performance on the test set closely matched the validation results, with DSC values remaining within the same range, reinforcing the robustness of the trained model under real-world conditions.
Quantitative segmentation performance
The performance of the segmentation model was evaluated using test datasets. Overall segmentation performance is summarized in Table 3, including DSC values (all >0.79), Intersection-over-Union (IoU) values, 95% Hausdorff distances (HD95; mm), and mean surface distances (MSD; mm), each reported with narrow confidence intervals across T1, T7, and T12. This accuracy would allow clinicians to identify key anatomical landmarks reliably.
Threshold-based accuracy for clinical localization
To complement the primary evaluation, Table 4 presents accuracy rates at a range of DSC thresholds (e.g., 0.50, 0.60, 0.70), each applied to determine whether a vertebra segmentation was acceptably accurate at the patient level. Given that our primary objective was to identify the approximate location of each vertebra, a relatively modest threshold of 0.50 or 0.60 already yielded an accuracy above 90% for T1, T7, and T12. This level of performance is generally sufficient for clinical tasks in which exact pixel-level concordance is less critical.
Table 4. T1, T7, and T12 accuracies at the patient level in neonatal chest radiographs at different Dice similarity coefficient (DSC) cutoff values.
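The accuracies in Table 4 amount to the fraction of patients whose per-patient DSC meets each cutoff; a minimal sketch with hypothetical DSC values:

```python
import numpy as np

def accuracy_at_threshold(patient_dscs: np.ndarray, threshold: float) -> float:
    """Fraction of patients whose (image-averaged) DSC meets the cutoff."""
    return float(np.mean(patient_dscs >= threshold))

# Hypothetical per-patient DSC values for one vertebral class
dscs = np.array([0.83, 0.78, 0.91, 0.55, 0.88])
for t in (0.5, 0.6, 0.7, 0.8):
    print(f"DSC >= {t:.2f}: {accuracy_at_threshold(dscs, t):.0%}")
```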
Qualitative assessment of segmentation
To qualitatively evaluate the model's segmentation performance across various scales and cases, representative results are presented in Figure 2, where automatic predictions (red), manual ground truth (green), and overlapping areas (yellow) are visualized. Supplementary Figure S3 further illustrates qualitative differences across DSC thresholds (0.3–0.8), showing that DSC ≥ 0.5 corresponds to visually acceptable localization. Although DSC values varied, the automatic segmentation demonstrated strong visual concordance with the actual thoracic vertebral bodies, providing insight into model performance and potential directions for refinement.
Figure 2. Representative visualization of segmentation results for three cases. Automatic segmentation results are shown in red. Manual segmentation ground truth is shown in green. Overlapping areas indicating concordance are shown in yellow. Dice Similarity Coefficients (DSCs) were used to quantify agreement between automated and manual segmentations.
Discussion
Overall clinical implications of vertebral labeling
To the best of our knowledge, this is the first study to demonstrate a fully automated deep-learning system designed to accurately segment and label thoracic vertebrae in neonatal chest radiographs, particularly for assessing the proper positioning of tube and catheter tips and the adequacy of inspiration.
In this study, the accuracy of our deep-learning system in labeling each vertebra (T1, T7, T12) was evaluated using the DSC. Patient-level accuracy at various DSC cutoff values is presented in Table 4.
In clinical applications, anatomical information on thoracic vertebrae in neonatal chest radiographs can be obtained using various DSC thresholds depending on the clinical purpose. For the simpler task of identifying and numbering the thoracic vertebrae on neonatal chest radiographs, high accuracy can be achieved with a relatively low cutoff value such as 0.50.
However, in infants with very low birth weight, accurate identification of anatomical target regions is essential for assessing the proper positioning of tube and catheter tips. In these cases, precision depends not only on the location but also on the shape and size of the vertebrae. To achieve this level of precision, a higher DSC cutoff value is necessary, with 0.80 being clinically appropriate. Nonetheless, the algorithm used in this study demonstrated accuracy that would be clinically acceptable.
Labeling strategy and multi-metric evaluation
During this study, trained experts performed manual segmentation of the thoracic vertebrae; however, several limitations were encountered. Owing to constraints of the initial segmentation tools, annotations were limited to rectangular regions, and manual segmentation was performed by several individuals using variously sized rectangles. We found that automated segmentation was generally accurate, whereas the manual results varied in consistency and precision, which lowered concordance and reduced DSC scores.
In our revised labeling strategy, vertebral regions were defined using quadrilateral polygons to better reflect vertebral orientation and reduce annotation–prediction shape mismatch. Accordingly, we reported standard segmentation metrics, including DSC and IoU with 95% confidence intervals, HD95, and MSD, as primary outcomes (Table 3), while threshold-based accuracies (Table 4) provided a task-oriented supplementary view for clinical localization. Patient-level scores were calculated as per-patient averages of image-level metrics to support consistent interpretation across hospitals. These refinements collectively improved reproducibility and inter-hospital consistency.
Therefore, independent use of the automated deep learning system is expected to improve patient-level accuracy, with higher accuracy observed at elevated DSC thresholds reflecting more precise concordance. The inclusion of the lung as a contextual segmentation class and the application of connected component labeling (CCL) during post-processing improved localization and reduced fragmented predictions (Supplementary Figures S1, S2), suggesting that the system is reliable and well suited for clinical use.
Methodological considerations and limitations of the nnU-Net framework
Recent advancements in digital image analysis utilizing deep learning have positioned artificial intelligence as an indispensable tool in the medical field. Specifically, medical image classification has proven valuable in aiding healthcare professionals to make more accurate clinical decisions and reduce the risk of misdiagnosis (9, 10).
In recent years, deep convolutional neural networks (CNNs) have demonstrated exceptional efficacy in various pattern recognition tasks, including object detection, semantic segmentation, and classification in medical imaging, underscoring their potential to revolutionize the analysis and interpretation of medical images (11, 12). Deep classification methods are well suited to settings in which only image-level labels are available for training, minimizing the need for detailed annotations (13). For pixel-level tasks such as ours, architectures such as nnU-Net offer robust segmentation performance owing to their self-configuring nature and adaptability to various datasets (7).
In our study, the nnU-Net framework delivered remarkable performance for thoracic vertebra segmentation. However, it focuses primarily on pixel-level segmentation without directly integrating anatomical priors or clinical workflows. This limitation means that while the model can accurately delineate vertebral shapes, it cannot fully account for their anatomical context or the nuances of clinical decision-making. In future studies, incorporating rule-based approaches or additional imaging modalities (e.g., CT or MRI) could enhance the model's ability to capture the spatial relationships and clinical significance of segmented structures.
Future directions for clinical translation
Clinically, our study facilitated precise determination of the thoracic vertebrae's location and size in neonatal chest radiographs, laying the groundwork for additional applications. Based on these findings, we aim to develop an integrated algorithm that not only can segment vertebrae, but also can identify adjacent anatomical regions to verify positions of tubes or catheter tips. Such an algorithm would be particularly beneficial for extremely low birth weight infants, in whom even minor deviations in tube or catheter placement can have significant clinical implications. Extending this approach to include other relevant structures (e.g., ribs, lung fields) could further augment computer-aided diagnostic systems, ultimately improving the safety and effectiveness of NICU interventions.
In conclusion, we propose a fully automated deep learning-based algorithm built on the nnU-Net architecture and designed to accurately segment and label thoracic vertebrae in neonatal chest radiographs. This artificial intelligence-driven approach can map anatomical target regions based on thoracic vertebrae for assessing the positioning of tube and catheter tips.
In future prospective studies, deep learning algorithms for multi-class classification of neonatal tubes and catheters should be integrated with these anatomical target regions based on thoracic vertebrae. Incorporating this approach into computer-aided diagnostic systems in the NICU may enhance the accuracy of tube and catheter tip localization, thus enabling neonatologists to perform more timely and precise assessments in clinical practice.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
Ethics statement
The studies involving humans were approved by Korea University Ansan Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and institutional requirements.
Author contributions
SJ: Software, Formal analysis, Writing – original draft, Methodology, Data curation, Writing – review & editing, Supervision. HY: Writing – original draft, Investigation, Writing – review & editing, Data curation, Methodology, Resources, Validation, Formal analysis. HC: Writing – original draft, Supervision, Methodology, Data curation, Investigation, Validation. JK: Formal analysis, Validation, Conceptualization, Supervision, Writing – review & editing, Funding acquisition. DY: Supervision, Software, Formal analysis, Writing – original draft, Data curation. JS: Project administration, Supervision, Validation, Conceptualization, Software, Investigation, Writing – original draft. BC: Conceptualization, Writing – review & editing, Validation, Investigation, Funding acquisition, Project administration, Supervision, Formal analysis, Resources, Writing – original draft.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported by a grant from Korea University Ansan Hospital (grant number: K2409201) and by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2024-00457819).
Conflict of interest
HY and DY were employed by Coreline Soft Co., Ltd.; JS was employed by Miso Information Technology Co., Ltd.
The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fped.2026.1673925/full#supplementary-material
References
1. Concepcion NDP, Laya BF, Lee EY. Current updates in catheters, tubes and drains in the pediatric chest: a practical evaluation approach. Eur J Radiol. (2017) 95:409–17. doi: 10.1016/j.ejrad.2016.06.015
2. Hermansen MC, Hermansen MG. Intravascular catheter complications in the neonatal intensive care unit. Clin Perinatol. (2005) 32(1):141–56. doi: 10.1016/j.clp.2004.11.005
3. MacDonald MG, Ramasethu J, Rais-Bahrami K. Atlas of Procedures in Neonatology. 5th ed. Philadelphia: Lippincott Williams & Wilkins (2013).
4. Henderson RDE, Padash S, Adams SJ, Augusta C, Yi X, Babyn P. Neonatal catheter and tube placement and radiographic assessment statistics in relation to important anatomic landmarks. Am J Perinatol. (2024) 41:e2299–306. doi: 10.1055/s-0043-1771051
5. Henderson RDE, Yi X, Adams SJ, Babyn P. Automatic detection and classification of multiple catheters in neonatal radiographs with deep learning. J Digit Imaging. (2021) 34:888–97. doi: 10.1007/s10278-021-00473-y
6. Yi X, Adams SJ, Henderson RDE, Babyn P. Computer-aided assessment of catheters and tubes on radiographs: how good is artificial intelligence for assessment? Radiol Artif Intell. (2020) 2:e190082. doi: 10.1148/ryai.2020190082
7. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. (2021) 18:203–11. doi: 10.1038/s41592-020-01008-z
8. Zou KH, Warfield SK, Bharatha A, Tempany CMC, Kaus MR, Haker SJ, et al. Statistical validation of image segmentation quality based on a spatial overlap index 1. Acad Radiol. (2004) 11:178–89. doi: 10.1016/S1076-6332(03)00671-8
9. Tang H, Hu Z. Research on medical image classification based on machine learning. IEEE Access. (2020) 8:93145–54. doi: 10.1109/ACCESS.2020.2993887
10. Zhou SK, Greenspan H, Davatzikos C, Duncan JS, Ginneken BV, Madabhushi A, et al. A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises. Proc IEEE Inst Electr Electron Eng. (2021) 109:820–38. doi: 10.1109/JPROC.2021.3054390
11. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. (2017) 42:60–88. doi: 10.1016/j.media.2017.07.005
12. Ravì D, Wong C, Deligianni F, Berthelot M, Andreu-Perez J, Lo B, et al. Deep learning for health informatics. IEEE J Biomed Heal Informatics. (2017) 21:4–21. doi: 10.1109/JBHI.2016.2636665
Keywords: automatic segmentation, chest radiographs, labeling, neonate, thoracic vertebrae
Citation: Jung S, Yun H, Cho HW, Kim J, Yu D, Son J and Choi BM (2026) Automatic segmentation and labeling of T1, T7, and T12 thoracic vertebrae in neonatal chest radiographs: a deep learning approach using nnU-Net framework. Front. Pediatr. 14:1673925. doi: 10.3389/fped.2026.1673925
Received: 26 July 2025; Revised: 8 December 2025;
Accepted: 12 January 2026;
Published: 5 February 2026.
Edited by:
Moulay Akhloufi, Université de Moncton, Canada
Reviewed by:
Bojan Žlahtič, University of Maribor, Slovenia
Rakesh Choudhary, SUNY Upstate Medical University, United States
Copyright: © 2026 Jung, Yun, Cho, Kim, Yu, Son and Choi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Byung Min Choi, cbmin@korea.ac.kr
†These authors share first authorship