AUTHOR=Zhou Qian, Zou Hua
TITLE=A layer-wise fusion network incorporating self-supervised learning for multimodal MR image synthesis
JOURNAL=Frontiers in Genetics
VOLUME=13
YEAR=2022
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.937042
DOI=10.3389/fgene.2022.937042
ISSN=1664-8021
ABSTRACT=Magnetic resonance (MR) imaging plays an important role in medical diagnosis and treatment; different MR modalities provide rich, complementary information that can improve diagnostic accuracy. However, due to limitations of scanning time and medical conditions, certain MR modalities may be unavailable or of low quality in clinical practice. In this paper, we propose a new multimodal MR image synthesis network to generate missing MR images. The proposed model contains three stages: feature extraction, feature fusion, and image generation. During feature extraction, 2D and 3D self-supervised pretext tasks are introduced to pretrain the backbone for better representations of each modality. A channel attention mechanism is then used during feature fusion so that the network can adaptively weight different fusion operations to learn common representations of all modalities. Finally, a generative adversarial network is adopted as the basic framework for image generation, in which a feature-level edge information loss is combined with a pixel-wise loss to ensure consistency between the synthesized and real images in terms of anatomical characteristics. The 2D and 3D self-supervised pre-training improves feature extraction, retaining more details in the synthetic images. Moreover, the proposed multimodal attention feature fusion block (MAFFB), used within a well-designed layer-wise fusion strategy, can model both common and unique information across all modalities, consistent with clinical analysis. We also perform an interpretability analysis to confirm the rationality and effectiveness of our method. Experimental results demonstrate that our method can be applied to both single-modal and multimodal synthesis with high robustness and outperforms other state-of-the-art approaches both objectively and subjectively.
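
The abstract names two concrete mechanisms: a channel-attention fusion block (MAFFB) that adaptively weights fusion operations, and a training objective combining a pixel-wise loss with a feature-level edge information loss. The paper's exact implementation is not reproduced here; the following PyTorch sketch is a hypothetical illustration under stated assumptions: the fusion block follows a standard squeeze-and-excitation attention pattern over two simple fusion candidates (sum and max), and the edge term uses fixed Sobel kernels applied directly to images, whereas the paper computes edge information at the feature level. All names (`AttentionFusion`, `sobel_edges`, `synthesis_loss`, `edge_weight`) are illustrative, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Hypothetical channel-attention fusion in the spirit of the MAFFB:
    candidate fusion operations are re-weighted channel-wise before merging."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Squeeze-and-excitation style gate over the concatenated candidates.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat_a, feat_b):
        # Two simple fusion candidates: element-wise sum and element-wise max.
        fused = torch.cat([feat_a + feat_b, torch.maximum(feat_a, feat_b)], dim=1)
        # Channel attention decides how much each candidate contributes.
        return self.merge(fused * self.gate(fused))


def sobel_edges(img):
    """Per-channel horizontal/vertical edge maps from fixed Sobel kernels."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    kernels = torch.stack([kx, kx.t()]).unsqueeze(1)      # (2, 1, 3, 3)
    c = img.shape[1]
    kernels = kernels.repeat(c, 1, 1, 1).to(img)          # (2c, 1, 3, 3)
    return F.conv2d(img, kernels, padding=1, groups=c)


def synthesis_loss(fake, real, edge_weight=0.1):
    """Pixel-wise L1 term plus an edge-consistency term (weight is illustrative)."""
    pixel = F.l1_loss(fake, real)
    edge = F.l1_loss(sobel_edges(fake), sobel_edges(real))
    return pixel + edge_weight * edge
```

As a usage sketch, `AttentionFusion(channels=64)` would fuse two (B, 64, H, W) feature maps from two modality-specific encoders into one (B, 64, H, W) map, and `synthesis_loss` would supplement the adversarial loss of the GAN framework described in the abstract.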