AUTHOR=Wang Aobo, Wang Tianyi, Liu Xingyu, Fan Ning, Yuan Shuo, Du Peng, Zou Congying, Chen Ruiyuan, Xi Yu, Gu Zhao, Song Hongxing, Fei Qi, Zhang Yiling, Zang Lei TITLE=Automated diagnosis and grading of lumbar intervertebral disc degeneration based on a modified YOLO framework JOURNAL=Frontiers in Bioengineering and Biotechnology VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2025.1526478 DOI=10.3389/fbioe.2025.1526478 ISSN=2296-4185 ABSTRACT=Background: The high prevalence of low back pain has led to an increasing demand for the analysis of lumbar magnetic resonance (MR) images. This study aimed to develop and evaluate a deep-learning-assisted automated system for diagnosing and grading lumbar intervertebral disc degeneration based on lumbar T2-weighted sagittal and axial MR images. Methods: This study included a total of 472 patients who underwent lumbar MR scans between January 2021 and November 2023, with 420 in the internal dataset and 52 in the external dataset. The MR images were evaluated and labeled by experts according to current guidelines, and the results were considered the ground truth. The annotations included the Pfirrmann grading of disc degeneration, disc herniation, and high-intensity zones (HIZ). The automated diagnostic model was based on the YOLOv5 network, modified by adding an attention module in the Cross Stage Partial part and a residual module in the Spatial Pyramid Pooling-Fast part. The model’s diagnostic performance was evaluated by calculating the precision, recall, F1 score, and area under the receiver operating characteristic curve. Results: In the internal test set, the model achieved precisions of 0.78–0.91, 0.90–0.92, and 0.82 and recalls of 0.86–0.91, 0.90–0.93, and 0.81–0.88 for disc degeneration grading, disc herniation diagnosis, and HIZ detection, respectively. 
In the external test set, the precision values for disc degeneration grading, herniation diagnosis, and HIZ detection were 0.73–0.87, 0.86–0.92, and 0.74–0.84, and the recalls were 0.79–0.87, 0.88–0.91, and 0.77–0.78, respectively. Conclusion: The proposed model demonstrated relatively high diagnostic and classification performance and exhibited considerable consistency with expert evaluation.
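
The precision, recall, and F1 score named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name per_class_metrics and the toy label lists are hypothetical, and the sketch assumes one predicted label per disc compared directly with one expert label. A detection pipeline such as YOLOv5 would normally first match predicted boxes to ground-truth boxes, a step omitted here.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen.

    Hypothetical helper for illustration only; assumes y_true and y_pred
    are equal-length sequences of single class labels.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class was wrong
            fn[truth] += 1      # true class was missed

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy data (invented for this example): integer grades 1-5 for eight discs.
y_true = [1, 2, 2, 3, 3, 3, 4, 5]
y_pred = [1, 2, 3, 3, 3, 4, 4, 5]

metrics = per_class_metrics(y_true, y_pred)
for grade, (p, r, f1) in metrics.items():
    print(f"grade {grade}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Reporting the metrics per class, as in this sketch, is one way to arrive at ranges such as those quoted in the abstract, where each task has a lowest and a highest per-class value.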