
ORIGINAL RESEARCH article

Front. Big Data

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/fdata.2025.1670833

Decoding Deception: State-of-the-Art Approaches to Deepfake Detection

Provisionally accepted
Dr. Tarak Hussain1*, B. Tripathi Reddy2, Kondaveti Phanindra2
  • 1Department of Computer Science & Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522302, India
  • 2KL Deemed to be University, Vijayawada, India

The final, formatted version of the article will be published soon.

Abstract: Deepfake technology is evolving at an alarming pace, threatening information integrity and social trust. We present a new multimodal deepfake detection framework that exploits cross-domain inconsistencies by analyzing audio-visual consistency. Its core is a Synchronization-Aware Feature Fusion (SAFF) architecture combined with Cross-Modal Graph Attention Networks (CM-GAN), both of which explicitly address temporal misalignments to improve detection accuracy. Evaluated against eight models on five benchmark datasets comprising 93,750 test samples, the framework achieves 98.76% accuracy and remains robust across multiple compression levels. Statistical analysis shows that synchronized audio-visual inconsistencies are highly discriminative (Cohen's d = 1.87). The contributions center on a cross-modal feature extraction pipeline, a graph-based attention mechanism for inter-modal reasoning, and extensive ablation studies validating the fusion strategy; the paper also provides statistically grounded insights to guide future work in this area. With a 17.85% generalization advantage over unimodal methods, the framework sets a new state of the art and introduces a self-supervised pretraining strategy that requires 65% less labeled data.
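To make the fusion idea concrete, the following is a minimal PyTorch sketch of audio-visual fusion via cross-modal attention. The abstract does not disclose the SAFF or CM-GAN internals, so the layer choices, dimensions, and use of nn.MultiheadAttention here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of cross-modal audio-visual fusion for deepfake
# detection. Layer sizes and the attention mechanism are assumptions;
# the paper's actual SAFF/CM-GAN architecture is not specified here.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Audio queries attend over visual keys/values and vice versa,
        # so temporally misaligned (desynchronized) content can be
        # weighted as evidence of manipulation.
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(2 * dim),
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),  # real-vs-fake logit
        )

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio:  (batch, T_a, dim) frame-level audio embeddings
        # visual: (batch, T_v, dim) frame-level visual embeddings
        a_ctx, _ = self.a2v(audio, visual, visual)  # audio attends to video
        v_ctx, _ = self.v2a(visual, audio, audio)   # video attends to audio
        # Pool each attended stream over time, then fuse and classify.
        fused = torch.cat([a_ctx.mean(dim=1), v_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused).squeeze(-1)


if __name__ == "__main__":
    model = CrossModalAttentionFusion()
    audio = torch.randn(2, 50, 256)   # 2 clips, 50 audio frames
    visual = torch.randn(2, 30, 256)  # 2 clips, 30 video frames
    print(model(audio, visual).shape)  # torch.Size([2])
```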

Keywords: deepfake detection, multimodal analysis, audio-visual synchronization, cross-modal graph attention networks, statistical validation, algorithmic robustness, self-supervised learning

Received: 23 Jul 2025; Accepted: 20 Oct 2025.

Copyright: © 2025 Hussain, Reddy and Phanindra. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Tarak Hussain, tariqsheakh2000@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.