AUTHOR=Zhu Yixin, Wu Ji, Long Qiongxian, Li Yan, Luo Hao, Pang Lu, Zhu Lin, Luo Hui
TITLE=Multimodal deep learning with MUF-net for noninvasive WHO/ISUP grading of renal cell carcinoma using CEUS and B-mode ultrasound
JOURNAL=Frontiers in Physiology
VOLUME=Volume 16 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2025.1558997
DOI=10.3389/fphys.2025.1558997
ISSN=1664-042X
ABSTRACT=Objective: This study aimed to develop and validate a multimodal deep learning model that uses preoperative grayscale and contrast-enhanced ultrasound (CEUS) video data for noninvasive WHO/ISUP nuclear grading of renal cell carcinoma (RCC). Methods: In this dual-center retrospective study, CEUS videos from 100 patients with RCC collected between June 2012 and June 2021 were analyzed. A total of 6,293 ultrasound images were categorized into low-grade (G1-G2) and high-grade (G3-G4) groups. A novel model, the Multimodal Ultrasound Fusion Network (MUF-Net), integrated B-mode and CEUS modalities to extract and fuse image features using a weighted sum of predicted weights. Model performance was assessed using five-fold cross-validation and compared to single-modality models. Grad-CAM visualization highlighted key regions influencing the model’s predictions. Results: MUF-Net achieved an accuracy of 85.9%, outperforming B-mode (80.8%) and CEUS-mode (81.8%, P < 0.05) models. Sensitivities for MUF-Net, B-mode, and CEUS-mode were 85.1%, 80.2%, and 77.8%, while specificities were 86.0%, 82.5%, and 82.7%, respectively. The AUC of MUF-Net (0.909, 95% CI: 0.829-0.990) was superior to B-mode (0.838, 95% CI: 0.689-0.988) and CEUS-mode (0.845, 95% CI: 0.745-0.944). Grad-CAM analysis revealed distinct and complementary salient regions across modalities. Conclusion: MUF-Net provides accurate and interpretable RCC nuclear grading, surpassing unimodal approaches, with Grad-CAM offering intuitive insights into the model’s predictions.
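
The abstract says only that MUF-Net fuses B-mode and CEUS features "using a weighted sum of predicted weights"; it does not give the architecture. As a rough illustration of the general idea of weighted fusion across two modalities, the sketch below combines per-class probabilities from two branches using weights obtained by softmax-normalizing per-branch scores. The function names (`softmax`, `fuse_predictions`), the softmax normalization, and all numbers are assumptions made for illustration, not the authors' implementation or data.

```python
import math


def softmax(values):
    """Normalize arbitrary scores into positive weights that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(branch_probs, weight_scores):
    """Weighted sum of per-class probabilities across modality branches.

    branch_probs  -- one probability vector per branch (hypothetical outputs
                     of a B-mode branch and a CEUS branch)
    weight_scores -- one unnormalized score per branch; in a trained network
                     these would be predicted, here they are supplied by hand
    """
    weights = softmax(weight_scores)
    n_classes = len(branch_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, branch_probs))
        for c in range(n_classes)
    ]
    return fused, weights


# Hypothetical probabilities for [low-grade, high-grade]; not from the study.
p_bmode = [0.70, 0.30]
p_ceus = [0.40, 0.60]

# Equal scores give equal weights, so the fused output is the plain average.
fused, weights = fuse_predictions([p_bmode, p_ceus], [0.0, 0.0])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the weights sum to 1 and each branch output is a probability vector, the fused output is also a valid probability vector; raising one branch's score shifts the fused prediction toward that modality.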