AUTHOR=Jin Nani, Ye Renjia, Li Peng
TITLE=Diagnosis of depression based on facial multimodal data
JOURNAL=Frontiers in Psychiatry
VOLUME=16
YEAR=2025
URL=https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2025.1508772
DOI=10.3389/fpsyt.2025.1508772
ISSN=1664-0640
ABSTRACT=
Introduction: Depression is a serious mental health disorder. Traditional scale-based diagnostic methods suffer from strong subjectivity and high misdiagnosis rates, so developing automatic diagnostic tools based on objective indicators is particularly important.
Methods: This study proposes a deep learning method that fuses multimodal data to automatically diagnose depression from facial video and audio. A spatiotemporal attention module enhances the extraction of visual features, while a Graph Convolutional Network (GCN) combined with a Long Short-Term Memory (LSTM) network analyzes the audio features. Through multimodal feature fusion, the model effectively captures the distinct feature patterns associated with depression.
Results: We conduct extensive experiments on a publicly available clinical dataset, the Extended Distress Analysis Interview Corpus (E-DAIC). The model achieves robust accuracy on E-DAIC, with a Mean Absolute Error (MAE) of 3.51 when estimating PHQ-8 scores from recorded interviews.
Discussion: Compared with existing methods, our model shows excellent performance in multimodal information fusion, making it well suited to early assessment of depression.
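
The abstract describes an audio branch (GCN followed by LSTM), an attention-based visual branch, and a fusion head regressing the PHQ-8 score. The sketch below is a minimal, hypothetical NumPy illustration of that pipeline shape only; all dimensions, the frame-adjacency graph, the attention pooling stand-in for the spatiotemporal attention module, and the linear regression head are assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gcn_layer(A, X, W):
    """One graph convolution: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

def lstm_last_hidden(X, Wx, Wh, b):
    """Minimal LSTM over sequence X (T, d); returns the final hidden state."""
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x in X:
        z = x @ Wx + h @ Wh + b                     # all four gates at once
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

# Hypothetical sizes: T frames, audio/visual feature dims, hidden size.
T, d_a, d_v, H = 20, 16, 32, 24

# --- Audio branch: GCN over a frame-similarity graph, then LSTM ---
A = (rng.random((T, T)) > 0.7).astype(float)        # toy adjacency over frames
A = np.maximum(A, A.T)                              # make it symmetric
Xa = rng.standard_normal((T, d_a))                  # per-frame audio features
Xg = gcn_layer(A, Xa, rng.standard_normal((d_a, H)) * 0.1)
h_audio = lstm_last_hidden(
    Xg,
    rng.standard_normal((H, 4 * H)) * 0.1,
    rng.standard_normal((H, 4 * H)) * 0.1,
    np.zeros(4 * H),
)

# --- Visual branch: attention-weighted pooling as a stand-in for the
# spatiotemporal attention module ---
Xv = rng.standard_normal((T, d_v))                  # per-frame visual features
scores = Xv @ rng.standard_normal(d_v) * 0.1        # hypothetical attention scores
w = np.exp(scores - scores.max())
w /= w.sum()                                        # softmax over frames
h_vis = w @ Xv

# --- Fusion: concatenate modalities, regress a PHQ-8 estimate ---
fused = np.concatenate([h_audio, h_vis])
phq8 = float(fused @ (rng.standard_normal(fused.shape[0]) * 0.05))
print(f"predicted PHQ-8 (untrained toy weights): {phq8:.3f}")
```

With random weights the output is meaningless; the point is only the data flow — graph convolution and recurrence on audio, attention pooling on video, then a single fused regression target, which is the structure the MAE of 3.51 on E-DAIC is reported for.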