AUTHOR=Li Feng, Luo Jiusong, Wang Lingling, Liu Wei, Sang Xiaoshuang TITLE=GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition JOURNAL=Frontiers in Neuroscience VOLUME=Volume 17 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2023.1183132 DOI=10.3389/fnins.2023.1183132 ISSN=1662-453X ABSTRACT=Emotion recognition plays an essential role in interpersonal communication. However, most existing emotion recognition systems use features from only a single modality, ignoring the interaction between multimodal information. To improve the accuracy of speech emotion recognition, this paper proposes a Global-aware Cross-modal feature Fusion Network (GCF2-Net). First, we introduce a residual cross-modal fusion attention module (ResCMFA) and a Global-Aware block to fuse information from multiple modalities and capture global information. Then, we use transfer learning to extract wav2vec 2.0 speech features and text features, which are fused by the ResCMFA module. The fused multimodal features are fed into the Global-Aware block to capture the most important emotional information at the global level. Finally, experimental results show that our proposed method has significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets. Our model improves WA by 1.65% and UA by 1.10% on the IEMOCAP dataset, and improves accuracy by 1.90% and weighted average F1 by 1.10% on the MELD dataset.