- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
Introduction: Fake news has become a significant threat to public discourse due to the swift spread of online content and the difficulty of detecting and distinguishing it from real news. This challenge is further amplified by society's increasing dependence on online social networks. Many researchers have developed machine learning and deep learning models to combat the spread of misinformation and identify fake news. However, most of these studies focus on a single language and report low accuracy, especially for Arabic, which faces challenges due to resource constraints and linguistic intricacies.
Methods: This paper introduces an effective deep-learning technique for fake news detection (FND) in Arabic and English. The proposed model integrates a multi-channel Convolutional Neural Network (CNN) and a dual Bidirectional Long Short-Term Memory (BiLSTM) network, which capture, in parallel, semantic and local textual features from text embedded by a pre-trained FastText model. Subsequently, a global max-pooling layer reduces dimensionality and extracts salient features from the sequential output. Finally, the model classifies news as fake or real. The model is trained and evaluated on three benchmark datasets: two Arabic datasets, AFND and ANS, and one English dataset, WELFake.
Results: Experimental results highlight the model's effectiveness and performance superiority over state-of-the-art (SOTA) approaches, with accuracies of (94.43 ± 0.19)%, (71.63 ± 1.45)%, and (98.85 ± 0.03)% on AFND, ANS, and WELFake, respectively.
Discussion: This work provides a robust approach to combating misinformation, offering practical applications in enhancing the reliability of information on social networks.
1 Introduction
Online social networks (OSNs) like Facebook, Twitter, and Instagram have emerged as the major sources of information in recent years. They make sharing news articles and trending topics easy. However, this ease of sharing information brought a major challenge: the credibility of that information. Fake news is fabricated or misleading information that presents itself as an actual news item and gets circulated through various channels (Lazer et al., 2018) to mislead people and influence their opinions or decisions (Giglou et al., 2020). Fake news is additionally referred to as misinformation, disinformation, hoax, and rumor in the relevant literature, all accounting for various types of false information (Nasir et al., 2021). During the 2016 U.S. presidential elections, fake news widely proliferated (Samadi et al., 2021; Alghamdi et al., 2024). During that time, the fake accounts thronged social media platforms like Twitter and Facebook and indulged in spreading adverse information about political candidates Hillary Clinton and Donald Trump.
Recent studies indicate a growing interest in FND, as researchers seek to develop effective systems to detect misleading news in multiple languages (Faustini and Covões, 2020), including English and others. This also applies to Arabic-speaking communities, especially in political, health, and celebrity news. Nonetheless, Arabic is a uniquely complex language; researchers face additional challenges when dealing with fake news in Arabic compared to other languages (Aljohani, 2024; Touahri and Mazroui, 2024). Arabic is a rich morphological language, considering its intricate grammar and multiple dialects; thus, it poses a further challenge for the natural language processing (NLP) models to perceive and analyze the Arabic text correctly. An Arabic word can change entirely due to adding prefixes and suffixes. Different contexts reflect different meanings with every word. For example, the term “ذهب” means “gold” if it comes as a noun, but could refer to “to go” when it is used as a verb. Moreover, the researchers face a shortage of resources in Arabic compared to English, which further limits the ability to develop models that could identify fake news effectively (Al Anezi, 2022; Alnabrisi and Saad, 2024). At the core of all these challenges, the unavailability of proper databases for training such models in Arabic remains a significant challenge. Therefore, due to these challenges, the domain of FND in Arabic continues to see insufficient study and unsatisfactory outcomes, presenting a substantial problem for scholars. To systematically illustrate these challenges and to highlight the specific architectural considerations required to address them, Table 1 provides a detailed breakdown of key linguistic complexities inherent in Arabic. 
It presents each challenge with a practical example, compares it to English to clarify the distinction for a broader audience, outlines its direct implications for FND, and introduces the fundamental architectural principle needed for its mitigation.
Early studies in FND leaned on traditional, frequency-based methods for representing text. Aljwari et al. (2022) and Faustini and Covões (2020) used Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) to extract features. Subsequently, machine learning (ML) models were employed to detect whether news articles are real or fake. Sedik et al. (2022) and Ouassil et al. (2022) identified credible news by combining Word2Vec and GloVe word embeddings with deep learning (DL) methods, including Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks. Other studies, AlEsawi and Haqi Al-Tai (2024) and Samadi et al. (2021), have endeavored to integrate transformer pre-trained models, including BERT and GPT-2, with DL-based models. This transition to DL allowed models to autonomously extract more significant features from the data, resulting in progress in detecting bogus news.
Although significant progress has been made in FND for English, very few studies combine CNN and LSTM to evaluate the validity of news articles in Arabic. Integrated CNN and LSTM models have been found to perform differently on different datasets (Verma et al., 2021). This gap stems from the absence of exhaustive experimentation with influential hyperparameters. Hyperparameters such as the number of LSTM layers, CNN channels, and filter dimensions are not well studied, and settings tuned on one fake news dataset often transfer poorly to datasets of different sizes. This lack of adaptability limits model generalizability. To tackle these problems, the proposed method employs an ensemble of dual BiLSTM and multi-channel CNN for FND in Arabic and English.
This paper introduces an effective model based on pre-trained word embeddings and an ensemble of DL models for FND. The main innovation of our study is a four-layered model comprising an input layer, a word embedding layer, a feature extraction layer, and a classification layer. First, news articles are fed into the system and preprocessed to filter out irrelevant information. Next, the word embedding layer represents the text as numerical vectors using pre-trained FastText embeddings. The core of the model is the feature representation layer, which runs two components in parallel: a dual BiLSTM and a multi-channel CNN. The BiLSTM captures sequential information and long-range dependencies in text, while the multi-channel CNN captures local features using different filter sizes (2, 3, and 4), enabling it to detect patterns of variable lengths. The resulting representations are concatenated and fed into a global max-pooling layer, which reduces dimensionality and extracts the most salient features from the concatenated outputs. Finally, in the classification layer, the global max-pooling output is fed to dense layers to produce the final prediction: whether the input text is “Real” or “Fake”. The architecture combines the advantages of BiLSTM and CNN to capture multiple facets of textual data, improving the model's efficacy in FND.
Despite this progress, the present literature shows a critical gap. Many of the dominant models adopt DL architectures such as CNNs or LSTMs, individually or sequentially, which may limit their ability to exploit a range of subtle textual features simultaneously. Specifically, CNNs are particularly effective at detecting local features (e.g., important phrases and n-grams), while BiLSTMs are particularly effective at detecting long-distance contextual relations in sequences. The potential to combine these respective strengths in a parallel architecture remains understudied, particularly in this context. This is the crux of our key contribution: we present an original hybrid design combining a Multi-Channel CNN (to extract features of mixed length) with a Dual BiLSTM (to achieve a richer contextual understanding) running in parallel. This design allows the model to leverage the advantages of both approaches simultaneously and to construct a more robust and thorough feature representation that addresses the identified gap.
The rest of the paper is structured as follows: Section 2 examines the pertinent literature on FND. Section 3 delineates the design technique of the proposed model. Section 4 presents the experimental results, alongside substantial comparison to the SOTA results, and discusses an ablation study. Conclusion and possibilities for future enhancements are presented in Section 5.
2 Related work
While considerable studies exist on FND, the Arabic language has received limited attention in this area, resulting in a noticeable gap. To better understand the landscape of existing approaches, the literature review is categorized into FND for English and Arabic languages.
2.1 English fake news detection
Ahmed et al. (2017) suggested a model for N-gram analysis. They examined feature extraction techniques [Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF)] and six different ML approaches to online FND. Using TF-IDF for feature extraction and a Linear Support Vector Machine (LSVM) for classification attained the best accuracy. Nasir et al. (2021) investigated whether CNN-LSTM was efficient for Twitter FND. Adding a one-dimensional CNN after the word embedding in the LSTM model improved accuracy to 80%. FNDNet, introduced by Kaliyar et al. (2020), takes advantage of the GloVe pre-trained model for creating word embedding vectors. The architecture consists of three parallel CNNs with varying kernel sizes to extract multiscale information from the text. Goldani et al. (2021) presented a new method for FND based on capsule neural networks (CapsNet). It uses two architectures to handle news statements of varying lengths and employs non-static embeddings during learning.
Saleh et al. (2021) implemented an optimized CNN architecture in which TF-IDF was employed to perform feature extraction, after which six layers were added to extract both low- and high-level features. At each level, the parameters were tuned using Hyperopt optimization techniques. Mohapatra et al. (2022) proposed a hybrid model for FND consisting of a BiLSTM network alongside a self-attention mechanism. The combined structure of the BiLSTM, self-attention, and other layers can identify fake news articles and distinguish them from real ones. Khan et al. (2022) introduced BiCHAT for detecting hate speech. It combines BERT embeddings with BiLSTMs, deep CNNs, and hierarchical attention. Meanwhile, Fazil et al. (2023) introduced a new approach for hate speech prediction by leveraging the combined strengths of multi-channel CNNs and BiLSTM cells with an attention mechanism.
In 2025, Jain et al. (2025) suggested a hybrid CNN-BiLSTM model enhanced by Harris Hawks Optimization (HHO) for selecting features in FND. The model includes data pre-processing, extraction of 80 linguistic features, optimization by HHO reducing redundancy, and classification by CNN and sequential dependence by BiLSTM. The model is tested using ISOT, Kaggle, ConFake, and McIntire datasets, and outperforms SOTA methods. Chen and Yin (2025) tested the performance of DistilBERT and CNN-LSTM variants using GloVe embeddings on a Kaggle dataset and reported 99.65% accuracy for DistilBERT while beating baseline models, but being highly computationally costly and restricted to the English language. Finally, Raza et al. (2025) proposed a BERT-based framework with progressive training for FND, using the WELFake dataset. Methods include BERT fine-tuning with episode-based learning for textual analysis. The model achieved 95.3% accuracy, outperforming baselines. Table 2 gives a description of the most important studies about the English FND.
2.2 Arabic fake news detection
Alzanin and Azmi (2019) presented a system for early Arabic tweet rumor detection to identify rumors before official clarifications. They used an Anti-Rumor dataset without denial cases and incorporated predefined features of sensitive content and novel features into the credibility of followers. Semi-supervised and unsupervised EM algorithms were used. The best results were obtained with semi-supervised EM when sufficient training data was available. The system was limited as it depended on a private dataset and a single source. Nassif et al. (2022) developed a method based on deeply contextualized embedding for Arabic FND. In this study, they translated an English fake news dataset into Arabic. Various models were tested, such as MARBERT, RoBERTa, AraELECTRA, AraBERT, GigaBERT, QaribBert, ARBERT, and Arabic-BERT. The ARBERT model achieved outstanding results, attaining an accuracy of 98%, significantly better than the performance of other advanced models.
Wotaifi and Dhannoon (2023a) presented a hybrid deep neural network. They combined CNN for feature extraction and LSTM to capture long-term dependency in the textual sequences. Trained on the AraNews dataset, it outperforms Text-CNN and LSTM models individually concerning accuracy, demonstrating the effectiveness of combining these architectures for improved prediction of Arabic fake news. Fouad et al. (2022) explored various DL methods over the Arabic text for FND. The authors consider a real-life dataset, which is manually collected, along with a benchmark dataset, and combine them into one huge merged dataset to test CNN, LSTM, BiLSTM, CNN+LSTM, and CNN+BiLSTM models. Their findings showed BiLSTM achieved the highest accuracy on all the datasets, highlighting its effectiveness in FND in Arabic. Focusing on Arabic, Himdi et al. (2022) proposed an innovative approach by collecting a dataset of crowdsourced Arabic articles related to the Hajj. They utilized a custom-built NLP tool to extract textual features, namely part-of-speech tags, syntactic-semantic roles, emotional expressivity, and contextual polarity. The results obtained by various supervised ML models, comprising SVM, RF, and NB, trained on these features, reveal that RF attained an accuracy score of 78%. Wotaifi and Dhannoon (2023b) introduced an attention-based Bi-LSTM trained on the large-scale AFND dataset. Their approach, which outperforms previous models and baseline models, including Simple RNN, LSTM, and GRU, shows the efficiency of attention mechanisms for enhancing FND in Arabic.
Bahurmuz et al. (2022) developed an Arabic rumor detection model using AraBERT and MARBERT, two BERT models pre-trained on large Arabic corpora, achieving a high accuracy of up to 97% across three datasets. Their work highlights the power of transformer-based models for effectively identifying rumors in Arabic social media content. AlEsawi and Haqi Al-Tai (2024) proposed an attention mechanism-based BiLSTM model for detecting Arabic misinformation. They have used AraBERT for feature extraction. Their model has outperformed all SOTA methods on the AraNews and ArCovid19-Rumors datasets, achieving 90% and 96% accuracy, respectively. The work results concluded that including an attention mechanism and contextual embeddings enhances the performance of Arabic misinformation detection. Khalil et al. (2023) proposed an enhanced hybrid CNN-BiLSTM model for Arabic FND. Their model leveraged the concatenation of the GloVe and FastText embeddings, multiple two-dimensional convolutions, and Bi-LSTM with auxiliary output layers to make further improvements. Using the large Arabic Fake News Dataset (AFND), their model achieved 88% and 78% accuracy for binary and multi-class classification, respectively. In other research, Azzeh et al. (2024) addressed the challenge of limited Arabic data by creating a large, annotated corpus from diverse sources. They then used this corpus to test the effectiveness of pre-trained transformers like ARBERT, AraBERT, and CAMeLBERT. The results proved that the highest accuracy for text representation corresponded to CAMeLBERT and that transformer-based models were better compared to traditional ML classifiers.
In 2025, Merzah et al. (2025) suggested a hybrid DL model that integrated two BiGRU along with a self-attention mechanism and FastText word embedding for Arabic FND. The model was evaluated on the AFND dataset. Finally, Albtoush et al. (2025) evaluated ML, DL, and Arabic transformer-based models (AraBERT, AraELECTRA) for FND of Arabic headlines in AFND and ANS datasets. Transformers outperformed, with AraBERTv02 achieving 70.41% accuracy in AFND. Table 3 gives a description of the most important studies about the Arabic FND.
Analysis of previous studies suggests that traditional methods like BoW and TF-IDF contain grave limitations. Traditional methods treat the text as a bag of words; hence, they only capture word frequency without semantic meaning representation. While effective independently, CNNs and BiLSTMs alone fail to capture all relevant features in text sequences. Models that did attempt to combine CNN and LSTM architectures often did so sequentially, which risks information loss between stages. Our research distinguishes itself by introducing a parallel hybrid architecture that extracts these different feature types independently before merging them. We posit that this parallel approach provides a richer, more holistic text representation, leading to improved accuracy in distinguishing fake from real news.
3 Materials and methods
This section introduces the proposed model for FND in Arabic and English. Figure 1 depicts a schematic diagram of our model architecture, and Figure 2 shows the layered architecture. The proposed model includes four layers: an input layer, a word-embedding layer, a feature extraction layer, and a classification layer. The model is built as a unified pipeline in which each component addresses a specific difficulty of FND, especially in low-resource languages like Arabic. For example, raw data contains noise and linguistic variation (rich Arabic morphology and different dialects), so it requires dedicated preprocessing before it can be embedded and before significant features can be extracted. The following subsections describe the task of each layer.
3.1 Datasets description
The model was evaluated on three benchmark datasets: two in Arabic (AFND, ANS) and one in English (WELFake). The first dataset, AFND, is a sizable Arabic dataset containing 606,912 Arabic news articles from 134 public news websites across 19 Arab countries. This dataset, provided by Khalil et al. (2022), offers a diverse representation of the Arabic news landscape. Misbar, an Arabic fact-checking platform, manually categorized these news sources into three labels: credible, not credible, or undecided. Figure 3A illustrates the count of news articles across classes. As the proposed method employs binary classification, only the credible (real) and not credible (fake) labels were used, while undecided articles were excluded.
The second dataset, the Arabic News Stance (ANS) corpus (Khouja, 2020), contains 4,547 claims created by paraphrasing or contradicting headlines from trustworthy news agencies. This dataset presented a notable class imbalance, with 3,072 “not fake” instances (67.6%) and 1,475 “fake” instances (32.4%), as shown in Figure 3B. To address this imbalance, we employed an oversampling technique. This method entails duplicating random examples from the minority class until equilibrium between the classes is achieved, thereby mitigating bias toward the majority class and enhancing the model's resilience in managing imbalanced conditions encountered in real-world applications.
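The random oversampling described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `random_oversample` and the fixed seed are assumptions, and only the class counts (3,072 and 1,475) come from the text.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List, List]:
    """Duplicate random minority-class examples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility

    # Group the examples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# ANS class counts from the text: 3,072 "not fake" (0) and 1,475 "fake" (1).
labels = [0] * 3072 + [1] * 1475
claims = list(range(len(labels)))  # stand-ins for the claim texts
balanced_claims, balanced_labels = random_oversample(claims, labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 3072 3072
```

In a cross-validation setting, it is common to oversample only the training portion of each fold so that duplicated examples do not leak into the validation data; the text does not state which variant was used.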
Finally, WELFake (Verma et al., 2021) is a large English news dataset curated to be well-balanced and unbiased, which helps ensure high-quality training data and reliable results. Although multiple free datasets can be used for FND research, most are limited in size, biased, or restricted in their classification scheme. WELFake addresses these limitations by combining four published datasets (McIntire, Reuters, Kaggle, and BuzzFeed), for two main reasons. Firstly, these datasets have similar structures, consisting of a two-class classification: real and fake. Secondly, merging the datasets reduces the biases and limitations of each individual source. The resulting WELFake dataset contains 72,134 news articles: 35,028 real and 37,106 fake. The dataset comprises three columns: text, title, and label, where the label column holds the binary class (real or fake). Figure 3C shows the well-balanced distribution of real and fake news in the WELFake dataset.
3.2 Preprocessing
The datasets were cleaned and preprocessed to remove noise and irrelevant content. Preprocessing was performed in multiple steps. For the Arabic datasets, these include text cleaning by removing punctuation, non-Arabic characters (Elnagar et al., 2020), and stop words, followed by tokenization and normalization using the Farasa library. The English dataset was preprocessed by converting text to lowercase and filtering out punctuation, hashtags, URLs, non-English characters, and stop words, followed by tokenization and lemmatization using the NLTK library.
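The English cleaning steps can be sketched with the standard library alone. This is an illustrative approximation, not the authors' pipeline: the tiny `STOP_WORDS` set stands in for NLTK's stop-word list, lemmatization is omitted, and treating digits as non-English characters is an assumption.

```python
import re
from typing import List

# Tiny illustrative stop-word list (assumption); the paper uses NLTK's list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}


def preprocess_english(text: str) -> List[str]:
    """Lowercase, strip URLs/hashtags/non-letters, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"#\w+", " ", text)                   # remove hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # punctuation, digits, non-English chars
    tokens = text.split()                               # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess_english("Breaking: The vaccine is FAKE! #hoax https://example.com/x"))
# ['breaking', 'vaccine', 'fake']
```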
3.3 Word embedding
Following preprocessing, the text data is passed to the word embedding layer. Word embedding techniques such as Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) treat each word as a single, indivisible unit, which limits their effectiveness on languages with complex morphology. Arabic, which is built on roots and has complex morphology, is particularly problematic for these approaches. FastText, developed by Facebook AI Research (Joulin et al., 2016) and used in our model, resolves these limitations with a sub-word embedding approach. FastText provides pre-trained word vectors for 157 languages, trained on Wikipedia and Common Crawl. This approach has shown superior performance in morphologically rich languages like Arabic because it encodes sub-word information effectively: it breaks words into character n-grams (e.g., handling prefixes/suffixes in words like ‘كتابي' as ‘ي' + ‘كتاب'), capturing dialectal variations better than traditional embeddings like Word2Vec. This layer is directly connected to the feature representation layer and provides robust vectors from which the BiLSTM and CNN extract long-distance dependencies and local patterns.
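To make the sub-word idea concrete, the sketch below extracts FastText-style character n-grams, with `<` and `>` marking word boundaries. The function name `char_ngrams` and the n-gram lengths used here are illustrative assumptions; the pre-trained vectors used in the paper may have been built with different lengths.

```python
from typing import List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return the character n-grams of a word wrapped in boundary markers."""
    token = f"<{word}>"  # boundary markers distinguish prefixes from suffixes
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

# A suffixed form shares most of its trigrams with its base form, so the two
# words receive related vectors even if one of them is rare in the corpus.
shared = set(char_ngrams("كتابي", 3, 3)) & set(char_ngrams("كتاب", 3, 3))
print(len(shared))  # 3
```

In FastText, a word's vector is built by summing the vectors of its n-grams (together with a vector for the whole word), which is why an unseen inflected form still gets a meaningful representation.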
3.4 Feature representation
The core of our proposed architecture is the feature representation layer. The proposed model processes the input article news, embedded by FastText pre-trained, using a dual BiLSTM and multi-channel CNN working in parallel to take different features out of the input text. The logical rationale for employing dual BiLSTM and multi-channel CNN in parallel stems from their complementary strengths in handling textual data, particularly for FND. Traditional sequential hybrids (e.g., CNN followed by LSTM) may lead to information loss during transitions, as local features extracted by CNN could be diluted in LSTM's sequential processing. In contrast, a parallel architecture allows independent feature extraction: BiLSTM captures bidirectional long-range dependencies and contextual nuances (essential for Arabic's morphological complexity and dialectal variations), while multi-channel CNN identifies local patterns via varying filter sizes (2, 3, 4), addressing variable-length n-grams common in both Arabic and English misinformation. The multichannel CNN and dual BiLSTM outputs are merged in the concatenation layer and then passed to a global max-pooling layer to extract salient features, mitigate overfitting, and enhance generalization.
3.4.1 Dual BiLSTM
Bidirectional Long Short-Term Memory (BiLSTM) networks represent an advanced extension of the Long Short-Term Memory (LSTM) architecture. LSTM networks were first introduced to resolve the issues of classical RNNs, including vanishing gradients. LSTMs have memory cells that retain and process information over long sequences (Méndez et al., 2023). An LSTM cell consists of three gates, an input gate $i_t$, a forget gate $f_t$, and an output gate $o_t$, in addition to a memory cell state $C_t$. The input gate regulates how much new information enters the cell state, using Equation 1. The amount of information to be discarded at time $t$ is determined by the forget gate using Equation 2. The candidate cell value, $\tilde{C}_t$, is computed using Equation 3. Similarly, Equations 4–6 describe the calculation of the cell state $C_t$, the output gate's output $o_t$, and the LSTM cell's final output $h_t$ at time $t$, respectively. In these equations, $W$, $\sigma$, $b$, and $\tanh$ represent the weight matrix, sigmoid function, bias vector, and hyperbolic tangent function, respectively. Meanwhile, $F_t$ is the input for the BiLSTM at timestamp $t$. Additionally, $\otimes$ denotes element-wise multiplication.
Bidirectional LSTMs can extract more contextual information than regular LSTMs (Tabrizchi et al., 2023). This is often important for many tasks related to NLP. A BiLSTM consists of a pair of LSTMs, a forward LSTM and a backward LSTM. The network processes the sequence in both the forward and backward directions, so the representation at the current timestamp draws on both past and future context, which enables more accurate predictions. This process produces two hidden representations, $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, as in Equations 7 and 8. Furthermore, the BiLSTM computes the final representation by concatenating the outputs of the two LSTM networks, as demonstrated in Equation 9. Our model uses a dual BiLSTM whose two layers have 64 and 128 units, respectively. After the pair of BiLSTMs, a max-pooling operation with pool size 2 is performed to reduce the sequence length, emphasizing key features. Finally, the proposed model passes this encoded information to a concatenation layer for merging with the output of the multi-channel CNN.
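Equations 1–6 are not reproduced in this text. For reference, the standard LSTM formulation, written with the symbols defined above, is given below. The subscripted weights and biases ($W_i$, $b_i$, and so on) and the bracket notation for concatenating $h_{t-1}$ with $F_t$ are conventional assumptions, and the numbering follows the order in which the paragraph describes the equations.

```latex
\begin{align}
i_t &= \sigma\left(W_i\,[h_{t-1}, F_t] + b_i\right) \tag{1}\\
f_t &= \sigma\left(W_f\,[h_{t-1}, F_t] + b_f\right) \tag{2}\\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, F_t] + b_C\right) \tag{3}\\
C_t &= f_t \otimes C_{t-1} + i_t \otimes \tilde{C}_t \tag{4}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, F_t] + b_o\right) \tag{5}\\
h_t &= o_t \otimes \tanh\left(C_t\right) \tag{6}
\end{align}
```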
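Equations 7–9 are likewise not reproduced in this text. A conventional way to write the bidirectional computation, assuming the forward LSTM reads the sequence left to right and the backward LSTM reads it right to left, is:

```latex
\begin{align}
\overrightarrow{h_t} &= \overrightarrow{\mathrm{LSTM}}\left(F_t, \overrightarrow{h}_{t-1}\right) \tag{7}\\
\overleftarrow{h_t} &= \overleftarrow{\mathrm{LSTM}}\left(F_t, \overleftarrow{h}_{t+1}\right) \tag{8}\\
h_t &= \left[\overrightarrow{h_t}\,;\,\overleftarrow{h_t}\right] \tag{9}
\end{align}
```

Because the two directions are concatenated, a BiLSTM layer with 128 units per direction emits 256 features per timestep.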
3.4.2 Multichannel CNN
CNN is a variant of the DL model employed for image classification. However, in recent years, researchers have used it for NLP in areas such as text classification, text summarization, and question answering. Three layers make up the foundation of CNN's architecture, namely a convolution, a pooling, and a fully connected layer (Amiri et al., 2023). These layers work sequentially to extract important local features from the input. The central layer of a CNN is the convolutional layer, which excels at feature detection via matrix computation. This layer uses kernels (or filters) to operate on small chunks of the input data and introduces an activation function called ReLU (Rectified Linear Unit) to introduce non-linearity (Trueman et al., 2021). After the convolutional layer, the max pooling operation decreases the data dimensionality and highlights the most important features.
A one-dimensional convolution is applied in the current approach, as each word embedding is treated as a row vector and the kernel slides along the word sequence. The model uses three parallel convolutional layers with 128 filters each and kernel sizes of 2, 3, and 4, which enables it to learn n-gram patterns of varying lengths. A max-pooling operation (pool size = 2) is performed after each convolutional layer. Finally, the outputs of the three channels are merged with the output of the dual BiLSTM for further processing.
3.4.3 Global max-pooling layer
The proposed model passes the merged outputs for the multi-channel CNN and dual BiLSTM in the concatenation layer to the global max-pooling layer. This layer reduces the feature map's spatial dimensions to a fixed size, independent of the input size, but retains the most significant feature from each feature map. This will be very useful in converting variable-length sequences to fixed-size vectors, returning just one vector of 640 dimensions. The vector effectively summarizes the most salient features from the whole sequence.
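Global max-pooling itself is a small operation: for every feature, keep the largest value observed across all timesteps. The pure-Python sketch below (the function name `global_max_pool` is illustrative) shows why the output size depends only on the number of features and not on the sequence length.

```python
from typing import List, Sequence


def global_max_pool(sequence: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (timesteps x features) sequence into one vector of per-feature maxima."""
    # zip(*sequence) iterates over features (columns) instead of timesteps (rows).
    return [max(column) for column in zip(*sequence)]


short_seq = [[1.0, 5.0], [3.0, 2.0], [0.0, 4.0]]       # 3 timesteps, 2 features
long_seq = short_seq + [[9.0, 1.0], [2.0, 2.0]]        # 5 timesteps, 2 features
print(global_max_pool(short_seq))  # [3.0, 5.0]
print(global_max_pool(long_seq))   # [9.0, 5.0]
```

The 640 dimensions stated above are consistent with the layer sizes given in the text: the second BiLSTM contributes 2 × 128 = 256 features, and the three CNN channels contribute 3 × 128 = 384.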
The BiLSTM and the multi-channel CNN operate in parallel so that long-range context (from the BiLSTM) is combined with local patterns (from the CNN with kernel sizes 2, 3, and 4). The LSTM cells mitigate problems, such as vanishing gradients, that plain RNNs face, while the multiple kernel sizes ease the handling of variable-length patterns in long texts. The outputs are concatenated and refined with global max-pooling to obtain meaningful features and reduce overfitting, which is a particular risk when training on imbalanced datasets. This complementarity leads to a more comprehensive text representation, especially given the complexity of Arabic grammar.
3.5 Classification layer
To tackle the problem of overfitting, a dropout layer with a rate of 0.5 is employed at the start of the classification layer, so that 50% of its input units are randomly dropped at each training step. The first dense layer, fed from the dropout layer, reduces the 640-dimensional input to 256 features and learns to combine the features extracted by the previous layers. The second dense layer reduces the dimensionality further to 64 features, refining the learned features. Finally, the output layer produces a single value, which is passed through a sigmoid activation function to yield a probability for binary classification; thresholding this probability gives the final decision, “Real” or “Fake”.
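Putting Sections 3.3 to 3.5 together, the architecture can be sketched in Keras as below. This is a reconstruction from the textual description, not the authors' released code. Several details are assumptions: the embedding is randomly initialized here instead of loading pre-trained FastText vectors, the convolutions use `padding="same"` so that both branches have equal sequence lengths before concatenation, the dense layers use ReLU, and `vocab_size`, `seq_len`, and `embed_dim` are placeholder values.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(vocab_size: int = 20000, seq_len: int = 100, embed_dim: int = 300) -> tf.keras.Model:
    inputs = layers.Input(shape=(seq_len,), dtype="int32")
    # Assumption: trainable random embedding; the paper uses pre-trained FastText vectors.
    x = layers.Embedding(vocab_size, embed_dim)(inputs)

    # Branch 1: dual (stacked) BiLSTM with 64 and 128 units, then max-pooling of size 2.
    b = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    b = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(b)
    b = layers.MaxPooling1D(pool_size=2)(b)

    # Branch 2: three parallel Conv1D channels (128 filters; kernel sizes 2, 3, 4).
    channels = []
    for kernel_size in (2, 3, 4):
        c = layers.Conv1D(128, kernel_size, padding="same", activation="relu")(x)
        c = layers.MaxPooling1D(pool_size=2)(c)
        channels.append(c)

    # Merge along the feature axis: 256 (BiLSTM) + 3 * 128 (CNN) = 640 features.
    merged = layers.Concatenate(axis=-1)([b] + channels)
    pooled = layers.GlobalMaxPooling1D(name="global_max")(merged)

    # Classification layer: dropout, two dense layers, sigmoid output.
    h = layers.Dropout(0.5)(pooled)
    h = layers.Dense(256, activation="relu")(h)  # ReLU is an assumption
    h = layers.Dense(64, activation="relu")(h)
    outputs = layers.Dense(1, activation="sigmoid")(h)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_model()
model.summary()
```

With these choices, the global max-pooling layer outputs the 640-dimensional vector described in Section 3.4.3, which gives some confidence that the sketch matches the described layer sizes.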
3.6 Experimental and hyperparameter settings
To evaluate the proposed model's efficacy and account for class imbalance, we employed stratified 5-fold cross-validation (CV). We executed all experiments using Python on Google Colab. The DL models were implemented using the TensorFlow and Keras libraries. Libraries such as Pandas, NLTK, and Farasa were used for preprocessing. The two-layer stacked BiLSTM network used in the model contains 64 and 128 memory cells, respectively, followed by max-pooling of size 2. The multi-channel CNN has three parallel convolutional layers, each with 128 filters, with kernel sizes of 2, 3, and 4, respectively, each followed by a max-pooling operation with a pool size of 2. To prevent overfitting, a dropout rate of 0.5 was used. The Adam optimizer was used with a batch size of 32 across all CV runs. Table 4 summarizes all hyperparameter values used for the experimental evaluation. Building on the stratified 5-fold CV setup, we reproduced strong transformer baselines to ensure fair comparison under identical preprocessing and data splits. For the Arabic AFND and ANS datasets, AraBERT and MARBERT were implemented using the Hugging Face Transformers library with the same tokenization, a batch size of 32, and the Adam optimizer with early stopping based on F1-score. For the English WELFake dataset, BERT and DistilBERT were similarly reproduced with equivalent settings.
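Stratified k-fold splitting keeps the class ratio of every fold close to that of the full dataset. The dependency-free sketch below illustrates the idea; the function name `stratified_kfold` and the seed are assumptions, and in practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[Hashable], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) pairs that preserve class proportions."""
    rng = random.Random(seed)

    # Collect the indices belonging to each class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the validation set.
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val


labels = [0] * 30 + [1] * 20
for train_idx, val_idx in stratified_kfold(labels, k=5):
    fake_share = sum(labels[i] for i in val_idx) / len(val_idx)
    print(len(train_idx), len(val_idx), fake_share)
```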
3.7 Performance evaluation metrics
To measure the effectiveness of the proposed model, four standard metrics were applied: recall (Rc), accuracy (Ac), F1-score (F1), and precision (Pr). These metrics are commonly used for FND and are calculated based on the confusion matrix components: True-Positive (TP), True-Negative (TN), False-Positive (FP), and False-Negative (FN). Pr indicates the percentage of correctly classified fake news articles to the total number of articles classified as fake news shown in Equation 10. Rc, however, represents the percentage of correctly classified fake news from the total number of fake news labels, as defined in Equation 11. F1, also called the F-measure, is the harmonic mean of precision and recall, as shown in Equation 12. Finally, Ac denotes the proportion of accurately identified real or fake news relative to the total labeled articles in the dataset, the mathematical formulation of which can be seen in Equation 13.
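For reference, the standard confusion-matrix definitions of these four metrics are written out below. They are assumed to correspond to Equations 10–13, with “Fake” treated as the positive class, as the description above implies.

```latex
\begin{align}
\mathrm{Pr} &= \frac{TP}{TP + FP} \tag{10}\\
\mathrm{Rc} &= \frac{TP}{TP + FN} \tag{11}\\
\mathrm{F1} &= \frac{2 \times \mathrm{Pr} \times \mathrm{Rc}}{\mathrm{Pr} + \mathrm{Rc}} \tag{12}\\
\mathrm{Ac} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{13}
\end{align}
```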
Additionally, to address class imbalance sensitivity, we computed macro-F1, which is the unweighted average of F1 across all classes, treating each class equally regardless of size. For binary classification, macro-F1 is calculated as the mean of the two per-class scores: macro-F1 = (F1_Fake + F1_Real) / 2.
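A minimal sketch of the macro-F1 computation follows, assuming string labels "Fake" and "Real"; the function names and the zero-division convention (a score of 0.0 when a class is never predicted or never present) are assumptions made for illustration.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score obtained when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, classes=("Fake", "Real")):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
```

Because each class contributes equally, a model that ignores the minority class is penalized even when its overall accuracy looks acceptable.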
We also report the Area Under the Curve-Precision-Recall (AUC-PR), which evaluates the trade-off between precision and recall across various thresholds.
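One common way to estimate the area under the precision-recall curve is average precision, sketched below. Whether the reported AUC-PR values were computed with this step-wise estimator or with trapezoidal interpolation is not stated, so the choice here is an assumption; ties between scores are simply broken by sort order.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    y_true: 1 for the positive class (assumed here to be Fake), 0 otherwise.
    scores: predicted probability of the positive class for each item.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    # Rank items from most to least confident.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at the point where recall increases
    return total / n_pos
```

Unlike accuracy, this score depends only on how well the positive class is ranked, which is why it is informative under class imbalance.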
4 Results
This section reports the quantitative results of the experiments carried out to assess the suggested model.
4.1 Model performance on benchmark datasets
The proposed model was trained and evaluated using the AFND, ANS, and WELFake datasets. For the Arabic AFND dataset, the proposed method achieved a mean Ac of (94.43 ± 0.19) %, Pr of (95.4 ± 0.2) %, Rc of (94.5 ± 0.2) %, and F1 of (94.95 ± 0.2) %. For the ANS dataset, our model achieved a mean Ac of (71.63 ± 1.45) %, Pr of (77.42 ± 1.32) %, Rc of (81.9 ± 0.91) %, and F1 of (79.6 ± 1.42) %. On the English WELFake dataset, the model achieved its highest scores, with a mean Ac of (98.85 ± 0.03) %, Pr of (98.8 ± 0.19) %, Rc of (98.84 ± 0.16) %, and F1 of (98.82 ± 0.03) %. Training and validation accuracy and loss curves (averaged across stratified 5-fold CV runs) on the AFND, ANS, and WELFake datasets are shown in Figures 4A–C, respectively.
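The “mean ± standard deviation” format used for these results can be produced from per-fold scores as sketched below. The sketch assumes the sample standard deviation (n − 1 denominator); the text does not state whether the sample or population form was used. The function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_folds(fold_scores, digits=2):
    """Format per-fold scores (in percent) as '(mean ± std) %'."""
    m = mean(fold_scores)
    s = stdev(fold_scores)  # sample standard deviation, an assumption
    return f"({m:.{digits}f} ± {s:.{digits}f}) %"


# Placeholder fold accuracies for illustration only, not the paper's fold results.
example = summarize_folds([94.2, 94.4, 94.5, 94.3, 94.6])
```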
Figure 4. Training and validation accuracy and loss curves. (A) AFND dataset. (B) ANS dataset. (C) WELFake dataset.
To ensure a fair and comprehensive comparison, we reproduced several strong transformer baselines: AraBERT and MARBERT for Arabic, and BERT and DistilBERT for English. All baselines were run within our experimental framework using the same preprocessing, data splits, and evaluation metrics, so that their results are directly comparable with those of the proposed model.
4.2 Comparison with state-of-the-art methods
To evaluate the effectiveness of our proposed model, we carried out a two-part comparison. First, we compared our results against those reported for existing SOTA methods. Second, for a fair and transparent comparison, we implemented SOTA transformer baselines (BERT and DistilBERT for English, AraBERT and MARBERT for Arabic) and tested them in our own experimental setup using the same stratified cross-validation protocol.
For the AFND dataset, we compare our model with the following SOTA methods: Capsule Network (Khalil et al., 2021), BiLSTM-Attention (Wotaifi and Dhannoon, 2023b), CNN-BiLSTM (Khalil et al., 2023), Ensemble (Fares et al., 2024), LSTM (Abd Elminaam et al., 2023), and WLT-araBERT+BiLSTM (Turki et al., 2025). Table 5 reports the comparison. Our method achieved a mean Ac of (94.43 ± 0.19) %, Pr of (95.4 ± 0.2) %, Rc of (94.5 ± 0.2) %, and F1 of (94.95 ± 0.2) %. While the models by Abd Elminaam et al. (2023) and Khalil et al. (2023) were strong competitors, our model surpassed their reported Ac by 4.29 and 6.94 percentage points, respectively. Furthermore, our model improved F1 by 7.95 percentage points over the CNN-BiLSTM method (Khalil et al., 2023). Compared with our reproduced baselines, our model also achieved higher accuracy, exceeding AraBERTv2 by 1.07 and MARBERT by 2.18 percentage points.
The ANS dataset is a challenging case because it is imbalanced. Our model was therefore trained on data balanced with random over-sampling, which reduces the bias toward the majority class. Nevertheless, all reported performances are evaluated on the original imbalanced test set to mimic a realistic use case. This strategy proved effective, as the results below show.
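A minimal sketch of random over-sampling is given below, under the assumption that minority-class items are duplicated with replacement until every class matches the majority count. It is meant to be applied to the training portion of each fold only, so that the test portion keeps its original class distribution; the function name and seed are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=42):
    """Duplicate randomly chosen minority-class items until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        # Draw with replacement; the majority class needs zero extra items.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_texts.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_texts, out_labels
```

Over-sampling before splitting would leak duplicated items into the test set, which is why the balancing step belongs inside each training fold.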
On the ANS dataset, we compare our model with the following SOTA methods: AraGPT2 (Al-Yahya et al., 2021), BERT (Khouja, 2020), LSTM-CNN (Sorour and Abdelkader, 2022), JointBERT (Shishah, 2022), and APBTM (Abdelhakim Othman et al., 2024). As can be seen in Table 6, our hybrid model achieved a mean Ac of (71.63 ± 1.45) % and an F1 of (79.6 ± 1.42) %. This is an improvement of 0.21 percentage points in Ac over the previous DL SOTA, the APBTM model (Abdelhakim Othman et al., 2024), which reported an Ac of 71.42%. Our model also outperformed the directly comparable reproduced transformer baselines, AraBERTv2 and MARBERT, by 0.99% and 4.9%, respectively. The model's AUC-PR score of 81.85% deserves special mention, since it suggests that the hybrid architecture and the over-sampling training strategy together yield a model that is resilient to severe class imbalance.
For the WELFake dataset, the SOTA models compared with ours are: SVM (Verma et al., 2021), N-Gram with TF-IDF and BERT (Kausar et al., 2022), Attention-based BiLSTM (Padalko et al., 2024), CNN-BiLSTM (Ouassil et al., 2022), and BERT+BiLSTM (Al-Quayed et al., 2024). Table 7 shows that the proposed model outperforms all previous models on the key metrics. Our model obtains the highest mean Ac of (98.85 ± 0.03) %, ahead of BERT+BiLSTM (Al-Quayed et al., 2024) and CNN-BiLSTM (Ouassil et al., 2022), which attained an Ac of 98.1% and 97.74%, respectively. It also consistently outperformed our reproduced transformer baselines, improving Ac over BERT and DistilBERT by 0.82% and 1.0%, respectively. Although the margins on this English benchmark are small, the model's extremely high AUC-PR of 99.9% and its minuscule standard deviation (±0.03 for Ac) support our claim that it is robust and dependable in identifying English fake news. A graphical comparison of the accuracy of the proposed model and SOTA methods on the AFND, ANS, and WELFake datasets is shown in Figure 5.
Table 7. Comparison of the proposed model against baselines and SOTA methods on the WELFake dataset.
Figure 5. A graphical comparison of the accuracy of the proposed method with SOTA. (A) AFND dataset. (B) ANS dataset. (C) WELFake dataset.
4.3 Ablation study
An ablation study was performed by systematically removing or simplifying components to assess the efficacy of the proposed architecture. The proposed DL model incorporates three neural network components: a deep CNN (3 layers), a dual BiLSTM, and a global max-pooling layer. To understand the impact of each component, we removed one element at a time: the deep CNN, the stacked BiLSTM, or the global max-pooling layer. We also evaluated a model using a single CNN and a single BiLSTM. The models are:
• Model 1 (our model without dual BiLSTM): In this model, the dual BiLSTM, which is responsible for capturing semantic features and contextual information, was removed from our model to evaluate the impact of its removal on the model.
• Model 2 (our model without multi-CNN): The multi-CNN networks, which effectively explore local features by using multiple filter sizes that enable the network to detect patterns of varying lengths, were removed.
• Model 3 (our model without global max-pooling): The global max-pooling layer, responsible for dimensionality reduction and extracting prominent features, is replaced with a flattened layer to convert the features to a vector for passing to the dense layer.
• Model 4 (single BiLSTM and single CNN): Instead of dual BiLSTM and multichannel CNN, a single BiLSTM with 128 neurons and a single CNN with 128 filters of size three were used to compare performance with the proposed model.
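The contrast tested by Model 3 can be illustrated with a small sketch. Here a feature map is a list of timesteps, each a list of channel activations; the function names are hypothetical, and the sketch only shows the shape behavior of the two operations, not the trained layers.

```python
def global_max_pool(feature_map):
    """Collapse a (timesteps x channels) matrix to one maximum per channel."""
    return [max(column) for column in zip(*feature_map)]


def flatten(feature_map):
    """Concatenate all timesteps into a single long vector."""
    return [value for row in feature_map for value in row]


# Three timesteps, two channels (illustrative values).
fm = [[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]]
pooled = global_max_pool(fm)   # length equals the number of channels
flat = flatten(fm)             # length equals timesteps * channels
```

Global max-pooling keeps only the strongest response of each channel, so its output size is independent of the sequence length, whereas the flattened vector grows with the sequence and passes many more inputs to the dense layer.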
Table 8 presents an ablation study analyzing the impact of each model component on performance across the AFND, ANS, and WELFake datasets.
The complete model achieves the highest performance on the AFND dataset, with a mean Ac of (94.43 ± 0.19) % and an F1 of (94.95 ± 0.2) %. Excluding the dual BiLSTM has the largest impact, reducing Ac to (91.68 ± 0.3) % and F1 to (92.29 ± 0.3) %. Removing the multi-CNN network also degrades performance, with Ac dropping to (92.77 ± 0.43) % and F1 to (92.29 ± 0.3) %. Furthermore, removing the global max-pooling layer decreases Ac and F1 to (92.46 ± 0.2) % and (93.09 ± 0.2) %, respectively. Finally, the configuration using a single CNN and a single BiLSTM reduces Ac and F1 to (92.74 ± 0.25) % and (93.41 ± 0.25) %, respectively.
On the ANS dataset, our complete model again achieved the highest performance with a mean Ac of (71.63 ± 1.45) %, outperforming the four ablation configurations (Models 1–4) by margins of 4.9%, 5.19%, 5.06%, and 6.95%, respectively.
A similar study on the WELFake dataset confirms that the proposed model outperforms all other configurations, achieving a mean Ac of (98.85 ± 0.03) % and an F1 of (98.82 ± 0.03) %. In the ablated models (Models 1–4), the accuracy decreased to (97.51 ± 0.2) %, (97.79 ± 0.31) %, (97.99 ± 0.18) %, and (98.1 ± 0.09) %, respectively.
4.4 Error analysis
To surface concrete failure modes without relying on aggregate numbers, we qualitatively traced each misclassification in AFND, ANS, and WELFake to the textual cues most likely responsible for the model's decision. In AFND (Figure 6), the classifier overweighted sensational morphology and authority framing. The true headline “عقارب وبعوض تغزو أريزونا” (Scorpions and mosquitoes invade Arizona) was judged Fake; when the verb is neutralized (“تنتشر” instead of “تغزو”), the same content is classified correctly, indicating a causal dependence on aggressive morphology rather than the corroborating details in the body. Conversely, the fake denial “التحالف: «لم ننفذ أي عمليات…»” (The coalition: We did not conduct any operations…) was accepted as True: the formal register and institutional attribution acted as sufficient signals despite the absence of dates, places, or external sources.
Figure 6. Sample news articles from the AFND dataset where the proposed model misclassified the content.
In ANS (Figure 7), two mechanisms recur: domain/locale under-coverage and a prior for bureaucratic phrasing. The true economic headline about gold prices (“فشل … يؤدي إلى انخفاض الذهب” – Failure … leads to a decline in gold) was labeled Fake even though the body encodes canonical economic causality (“فشل التحفيز → انخفاض السعر”) and contains no rumor markers; this pattern does not appear on comparable English economic items, pointing to sparse representation of Arabic financial news rather than lexical ambiguity. By contrast, plainly phrased policy claims such as “رفع الرسوم الحكومية في دبي” (Raising government fees in Dubai) were granted True despite lacking citations or quotations, revealing a stylistic prior that equates administrative tone with credibility.
Figure 7. Sample news articles from the ANS dataset where the proposed model misclassified the content.
In WELFake (Figure 8), errors reflect missing title–body entailment and sensitivity to stylistic packaging. The fake headline “How Congress finally killed No Child Left Behind” was predicted True while its body pivots to unrelated campaign coverage; a simple entailment probe—presence of the bill, legislative actors, or a repeal timeline—would yield a non-support signal. Mentions of reputable outlets also misled the classifier: “GOP hits another roadblock on Obamacare repeal” was judged True likely due to the “POLITICO has learned” cue, whereas all-caps and bracketed media tokens in similar items are down-weighted as stylistic noise. These cases jointly isolate key failure modes—aggressive-morphology priors, authority-framing bias, missing title–body entailment, and an Arabic domain-coverage gap—explaining the mistakes without aggregate statistics.
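To make the idea of a title–body support probe concrete, the sketch below scores how many content words of a headline reappear in the body. This is a crude lexical proxy and not a true entailment model, and it is not part of the proposed system; the stopword list, function names, and any decision threshold are assumptions chosen for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = frozenset({"a", "an", "the", "of", "on", "in", "to", "and", "how", "for", "is"})


def content_terms(text):
    """Lowercased word set of `text` with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def title_support(title, body):
    """Fraction of the title's content words that also occur in the body."""
    terms = content_terms(title)
    if not terms:
        return 0.0
    return len(terms & content_terms(body)) / len(terms)
```

For the headline “How Congress finally killed No Child Left Behind”, a body that only discusses a campaign rally would score near zero, while a body that mentions Congress and the bill by name would score high. A low score could be passed to the classifier as an additional non-support signal.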
Figure 8. Sample news articles from the WELFake dataset where the proposed model misclassified the content.
5 Discussion
This section interprets and analyzes the results presented above. The comparative results show that our proposed hybrid DL model outperforms the current state-of-the-art methods for FND on both Arabic and English datasets. Notably, the model surpassed transformer-based and ensemble approaches, demonstrating its strength and flexibility. The considerable gain in Pr, Rc, and F1, particularly on the AFND dataset, indicates the model's ability to generalize well over varied linguistic structures and news domains. Our model also outperformed the transformer-based hybrid approach WLT-araBERT+BiLSTM proposed by Turki et al. (2025). On a small sample of 30,000 records, their model obtained an accuracy of 89.91% under rigorous preprocessing conditions. Although the highest accuracy reported in their study was 93.83%, this result was obtained with punctuation removal as the only preprocessing step. This distinction highlights our model's strong performance under deep data cleaning, where it outperforms contemporary transformer-based models. We attribute the improvement to the combination of multi-channel CNNs and dual BiLSTM layers, which allows the model to learn both local textual patterns and long-range contextual dependencies. Global max-pooling further helps by extracting the most salient features, while pre-trained FastText embeddings offer strong word representations, with particular benefits for morphologically rich and low-resource languages such as Arabic.
The adoption of stratified 5-fold CV provides a more reliable performance estimate, with low standard deviations (e.g., ±0.19% for Ac on AFND) indicating model stability across data splits. The high AUC-PR scores (e.g., 98.8 ± 0.01 on AFND) and macro-F1 (94.31 ± 0.2) % confirm the model's effectiveness in imbalanced scenarios, where precision-recall trade-offs and equal class treatment are critical for minimizing false positives in fake news detection. The reproduction of transformer baselines under identical conditions further validates the proposed model's superiority, with consistent outperformance across metrics like AUC-PR and macro-F1. For instance, on AFND, the model achieves a higher Ac (94.43 ± 0.19) % than AraBERT (93.36 ± 0.35) %, underscoring the hybrid DL approach's robustness in handling class imbalance, paving the way for future cross-lingual extensions.
The ablation study conducted in the current research provides important insight into the contribution of each element in the proposed model framework for false news detection on both AFND (Arabic) and WELFake (English) datasets. Removing the dual BiLSTM layer (Model 1) resulted in the strongest decrease in accuracy and F1-score on both datasets. This finding highlights the importance of the role played by the BiLSTM in identifying sequential dependencies and contextual information in news reports. The ability of the BiLSTM to encode long-range relations in text is particularly important in detecting subtle cues distinguishing real from fake news, especially in morphologically rich languages like Arabic.
Likewise, the removal of the multi-channel CNN (Model 2) resulted in an apparent drop in performance. This highlights the value of having multiple convolutional filters in order to capture varied local features and patterns of different lengths in the text. The multi-CNN framework allows the model to identify a broad variety of linguistic and stylistic features that are typically characteristic of fake news. The deletion of the global max-pooling layer (Model 3) also led to degraded performance, signifying its usefulness in extracting the most important features from the concatenated outputs of the CNN and BiLSTM layers. It assists in dimensionality reduction while retaining the most informative parts of the feature maps.
In addition, the utilization of just one CNN and one BiLSTM (Model 4), as opposed to the stacked and multi-channel strategy, led to worse performance. This indicates that the combination of multiple channels and stacked layers offers a dramatic improvement and enables the model to learn local as well as global textual patterns more efficiently. Each component, as demonstrated by its corresponding ablation model, contributes uniquely to the model's ability to generalize across different languages and datasets. The findings accentuate the need for a balanced and carefully designed architecture, rather than any single approach, to address the complex problem of fake news detection.
A deeper analysis of the misclassified samples reveals that the errors extend beyond general linguistic complexity. For example, a large percentage of the real news items misclassified as fake used sensationalist or “clickbait-style” headlines (e.g., “آفات غريبة تغزو أريزونا”). The model appears to have over-indexed on this stylistic feature, penalizing legitimate news for adopting a tone commonly associated with fake news. Conversely, many fake items misclassified as real successfully mimicked a formal, official tone (e.g., “نفي التحالف قيامه بعمليات عسكرية”). In these cases, the model was deceived by the professional-sounding language, demonstrating a vulnerability to sophisticated fakes that lack typical stylistic red flags. This suggests the model relies more on stylistic heuristics than on a deeper semantic analysis of the content.
On WELFake, the low false positive and false negative rates indicate a somewhat balanced model. Fake items misclassified as real often exhibited a complete semantic disconnect between the title and the article's body, a tactic used to generate plausible-sounding but incoherent fake news. The model failed to detect this logical mismatch, especially when the text mentioned a credible source like “POLITICO,” which appeared to give the content a false sense of authenticity. Furthermore, real items misclassified as fake frequently featured stylistic choices in their headlines, such as rhetorical questions or a blog-like tone (e.g., “TRUMP SWEEPS FIVE STATES...”). This confirms a cross-lingual weakness: the model is overly sensitive to stylistic cues across both Arabic and English, sometimes at the expense of factual content.
In summary, though dataset imbalance is a factor, our error analysis suggests that the model's biggest weakness is its dependence on stylistic and structural heuristics instead of semantic coherence and factuality. Future research should concentrate on training the model to be more resilient against sensationalist language, to better identify logical discrepancies between titles and content, and to achieve a more complex understanding of source credibility than just keyword recognition in order to reduce these misclassifications.
6 Conclusion
This work presents an effective DL model for FND in both English (a high-resource language) and Arabic (a low-resource language). Pre-trained FastText word representations are used to produce the input word vectors. The resulting embedding matrix is fed in parallel to a multi-channel CNN and a dual BiLSTM to capture local patterns and semantic features. Our model was evaluated on three benchmark datasets, achieving accuracies of (94.43 ± 0.19) % on AFND, (71.63 ± 1.45) % on ANS, and (98.85 ± 0.03) % on WELFake, outperforming SOTA methods by up to 4.29%, 0.21%, and 0.75%, respectively. Moreover, our hybrid DL model outperforms the transformer baselines reproduced under the same stratified 5-fold CV framework. Specifically, it outperforms AraBERTv2 by 0.88% in F1 on AFND and 0.8% in F1 on ANS, and BERT by 0.82% in F1 on WELFake. These results, together with the macro-F1 and AUC-PR scores, underscore the model's robustness in FND and its potential for real-world applications. Additionally, an ablation study was conducted to examine the contribution of each component of the proposed model. It further supports the design, with the full model improving F1-scores by 1.47% on AFND, 2.84% on ANS, and 0.74% on WELFake, highlighting the hybrid architecture's effectiveness in addressing bilingual challenges. Although the proposed model has shown promising performance, some limitations and challenges must be acknowledged. One is the intricacy of the Arabic language, including its rich morphology, dialect diversity, and precise grammar. While pre-trained FastText word embeddings address some of these challenges, they remain insufficient to handle linguistic details such as dialect variation. Another limitation is that the efficacy of the model was tested on just two languages, Arabic and English.
Furthermore, the model's focus on textual data alone is a limitation, as real-world fake news often includes multimodal content, such as images and videos, which the current model does not consider. Lastly, the model faces difficulties in identifying nuanced forms of fake news, such as satire or sarcasm, which share linguistic features with genuine news and require deeper contextual understanding. In the future, we plan to expand our model to work with multiple languages. We will evaluate it on datasets for low-resource languages like Persian or Turkish, using techniques such as transfer learning, which is a promising research direction. Additionally, we aim to extend our model to a multimodal framework, incorporating images and videos using approaches like vision transformers. This will address the limitations of text-only detection in real-world misinformation scenarios. Given its efficacy on real-world datasets, the proposed model can be applied by OSN service providers for FND.
Data availability statement
Publicly available datasets were analyzed in this study. The AFND dataset can be found at: https://data.mendeley.com/datasets/67mhx6hhzd/1. The ANS dataset can be found at: https://github.com/latynt/ans. The WELFake dataset can be found at: https://zenodo.org/records/4561253.
Author contributions
BM: Formal analysis, Methodology, Conceptualization, Investigation, Writing – original draft, Software, Visualization. JR: Supervision, Writing – review & editing, Project administration, Methodology, Validation. ZS: Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that Gen AI was used in the creation of this manuscript. The author(s) acknowledge that they utilized artificial intelligence (AI) language models (OpenAI's GPT-4 and Google's Gemini Pro model) during the preparation of this manuscript. The use of these tools was limited to improving the grammar, editing, and proofreading of the text.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abd Elminaam, D. S., Abdelaziz, A., Essam, G., and Mohamed, S. E. (2023). “AraFake: a deep learning approach for Arabic fake news detection,” in 2023 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC) (Cairo: IEEE), 1–8.
Abdelhakim Othman, N., Elzanfaly, D. S., and Elhawary, M. M. M. (2024). Arabic fake news detection using deep learning. IEEE Access 12, 122363–122376. doi: 10.1109/ACCESS.2024.3451128
Ahmed, H., Traore, I., and Saad, S. (2017). “Detection of online fake news using N-gram analysis and machine learning techniques,” in Computational Data and Social Networks (Cham: Springer International Publishing), 127–138.
Al Anezi, F. Y. (2022). Arabic hate speech detection using deep recurrent neural networks. Appl. Sci. 12:6010. doi: 10.3390/app12126010
Albtoush, E. S., Gan, K. H., and Ahmad Alrababah, S. A. (2025). Evaluation of machine learning and deep learning models for fake news detection in Arabic headlines. IEEE Access 13, 162009–162026. doi: 10.1109/ACCESS.2025.3606114
AlEsawi, B., and Haqi Al-Tai, M. (2024). Detecting Arabic misinformation using an attention mechanism-based model. Iraqi J. Comput. Sci. Math. 5, 285–298. doi: 10.52866/ijcsm.2024.05.01.020
Alghamdi, J., Lin, Y., and Luo, S. (2024). Unveiling the hidden patterns: a novel semantic deep learning approach to fake news detection on social media. Eng. Appl. Artif. Intell. 137:109240. doi: 10.1016/j.engappai.2024.109240
Aljohani, E. (2024). Enhancing Arabic fake news detection: evaluating data balancing techniques across multiple machine learning models. Eng. Technol. Appl. Sci. Res. 14, 15947–15956. doi: 10.48084/etasr.8019
Aljwari, F., Alkaberi, W., Alshutayri, A., Aldhahri, E., Aljojo, N., and Abouola, O. (2022). Multi-scale machine learning prediction of the spread of Arabic online fake news. Postmod. Openings 13, 01–14. doi: 10.18662/po/13.1Sup1/411
Alnabrisi, I. K., and Saad, M. K. (2024). Detect Arabic fake news through deep learning models and transformers. Expert Syst. Appl. 251:123997. doi: 10.1016/j.eswa.2024.123997
Al-Quayed, F., Javed, D., Jhanjhi, N. Z., Humayun, M., and Alnusairi, T. S. (2024). A hybrid transformer-based model for optimizing fake news detection. IEEE Access 12, 160822–160834. doi: 10.1109/ACCESS.2024.3476432
Al-Yahya, M., Al-Khalifa, H., Al-Baity, H., AlSaeed, D., and Essam, A. (2021). Arabic fake news detection: comparative study of neural networks and transformer-based approaches. Complexity 2021:5516945. doi: 10.1155/2021/5516945
Alzanin, S. M., and Azmi, A. M. (2019). Rumor detection in Arabic tweets using semi-supervised and unsupervised expectation–maximization. Knowl. Based Syst. 185:104945. doi: 10.1016/j.knosys.2019.104945
Amiri, R., Razmara, J., Parvizpour, S., and Izadkhah, H. (2023). A novel efficient drug repurposing framework through drug-disease association data integration using convolutional neural networks. BMC Bioinformatics 24:442. doi: 10.1186/s12859-023-05572-x
Azzeh, M., Qusef, A., and Alabboushi, O. (2024). Arabic fake news detection in social media context using word embeddings and pre-trained transformers. Arab J. Sci. Eng. 49, 5293–5304. doi: 10.1007/s13369-024-08959-x
Bahurmuz, N. O., Amoudi, G. A., Baothman, F. A., Jamal, A. T., Alghamdi, H. S., and Alhothali, A. M. (2022). Arabic rumor detection using contextual deep bidirectional language modeling. IEEE Access 10, 114907–114918. doi: 10.1109/ACCESS.2022.3217522
Chen, Y., and Yin, B. (2025). Transformer-based fake news classification: evaluation of DistilBERT with CNN-LSTM and GloVe embedding. Informatica 49, 172–184. doi: 10.31449/inf.v49i25.7710
Elnagar, A., Al-Debsi, R., and Einea, O. (2020). Arabic text classification using deep learning models. Inf. Process. Manag. 57:102121. doi: 10.1016/j.ipm.2019.102121
Fares, A., Chougui, R. Y., Drif, A., and Giordano, S. (2024). “An ensemble deep learning models based on metadata for measuring Arabic fake news uncertainty,” in 2024 IEEE International Conference on Advanced Systems and Emergent Technologies (IC_ASET) (Hammamet: IEEE), 1–6.
Faustini, P. H. A., and Covões, T. F. (2020). Fake news detection in multiple platforms and languages. Expert Syst. Appl. 158:113503. doi: 10.1016/j.eswa.2020.113503
Fazil, M., Khan, S., Albahlal, B. M., Alotaibi, R. M., Siddiqui, T., and Shah, M. A. (2023). Attentional multi-channel convolution with bidirectional LSTM cell toward hate speech prediction. IEEE Access 11, 16801–16811. doi: 10.1109/ACCESS.2023.3246388
Fouad, K. M., Sabbeh, S. F., and Medhat, W. (2022). Arabic fake news detection using deep learning. Comput. Mater. Continua 71, 3647–3665. doi: 10.32604/cmc.2022.021449
Giglou, H. B., Razmara, J., Rahgouy, M., and Sanaei, M. (2020). “LSACoNet: a combination of lexical and conceptual features for analysis of fake news spreaders on Twitter,” in CLEF (Working Notes). Aachen: CEUR-WS.org.
Goldani, M. H., Momtazi, S., and Safabakhsh, R. (2021). Detecting fake news with capsule neural networks. Appl. Soft Comput. 101:106991. doi: 10.1016/j.asoc.2020.106991
Himdi, H., Weir, G., Assiri, F., and Al-Barhamtoshy, H. (2022). Arabic fake news detection based on textual analysis. Arab J. Sci. Eng. 47, 10453–10469. doi: 10.1007/s13369-021-06449-y
Jain, M. K., Gopalani, D., and Meena, Y. K. (2025). Hybrid CNN-BiLSTM model with HHO feature selection for enhanced fake news detection. Soc. Netw. Anal. Min. 15:43. doi: 10.1007/s13278-025-01455-6
Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., and Mikolov, T. (2016). FastText.zip: compressing text classification models. arXiv [Preprint]. arXiv:1612.03651.
Kaliyar, R. K., Goswami, A., Narang, P., and Sinha, S. (2020). FNDNet – a deep convolutional neural network for fake news detection. Cogn. Syst. Res. 61, 32–44. doi: 10.1016/j.cogsys.2019.12.005
Kausar, N., Ali Khan, A., and Sattar, M. (2022). Towards better representation learning using hybrid deep learning model for fake news detection. Soc. Netw. Anal. Min. 12:165. doi: 10.1007/s13278-022-00986-6
Khalil, A., Jarrah, M., and Aldwairi, M. (2023). Hybrid neural network models for detecting fake news articles. Human-Centric Intelligent Systems 4, 136–146. doi: 10.1007/s44230-023-00055-x
Khalil, A., Jarrah, M., Aldwairi, M., and Jaradat, M. (2022). AFND: Arabic fake news dataset for the detection and classification of articles credibility. Data Brief 42:108141. doi: 10.1016/j.dib.2022.108141
Khalil, A., Jarrah, M., Aldwairi, M., and Jararweh, Y. (2021). “Detecting Arabic fake news using machine learning,” in 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA), (Tartu: IEEE), 171–177.
Khan, S., Fazil, M., Sejwal, V. K., Alshara, M. A., Alotaibi, R. M., Kamal, A., et al. (2022). BiCHAT: BiLSTM with deep CNN and hierarchical attention for hate speech detection. J. King Saud Univ. Comput. Inf. Sci. 34, 4335–4344. doi: 10.1016/j.jksuci.2022.05.006
Khouja, J. (2020). “Stance prediction and claim verification: an Arabic perspective,” in Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER) (Stroudsburg, PA, USA: Association for Computational Linguistics), 8–17.
Lazer, D. M. J., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., et al. (2018). The science of fake news. Science 359, 1094–1096. doi: 10.1126/science.aao2998
Méndez, M., Merayo, M. G., and Núñez, M. (2023). Long-term traffic flow forecasting using a hybrid CNN-BiLSTM model. Eng. Appl. Artif. Intell. 121:106041. doi: 10.1016/j.engappai.2023.106041
Merzah, B. M., Razmara, J., and Karimpour, J. (2025). Self-attention enhanced dual BiGRU for Arabic fake news detection. Mesopotamian J. Comput. Sci. 2025, 247–257. doi: 10.58496/MJCSC/2025/016
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv [Preprint]. arXiv:1301.3781.
Mohapatra, A., Thota, N., and Prakasam, P. (2022). Fake news detection and classification using hybrid BiLSTM and self-attention model. Multimed. Tools Appl. 81, 18503–18519. doi: 10.1007/s11042-022-12764-9
Nasir, J. A., Khan, O. S., and Varlamis, I. (2021). Fake news detection: a hybrid CNN-RNN based deep learning approach. Int. J. Inf. Manag. Data Insights 1:100007. doi: 10.1016/j.jjimei.2020.100007
Nassif, A. B., Elnagar, A., Elgendy, O., and Afadar, Y. (2022). Arabic fake news detection based on deep contextualized embedding models. Neural Comput. Appl. 34, 16019–16032. doi: 10.1007/s00521-022-07206-4
Ouassil, M.-A., Cherradi, B., Hamida, S., Errami, M., Gannour, O., and El Raihani, A. (2022). A fake news detection system based on combination of word embedded techniques and hybrid deep learning model. Int. J. Adv. Comput. Sci. Appl. 13:61. doi: 10.14569/IJACSA.2022.0131061
Padalko, H., Chomko, V., and Chumachenko, D. (2024). A novel approach to fake news classification using LSTM-based deep learning models. Front. Big Data 6:1320800. doi: 10.3389/fdata.2023.1320800
Pennington, J., Socher, R., and Manning, C. (2014). “GloVe: global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Stroudsburg, PA, USA: Association for Computational Linguistics), 1532–1543.
Raza, N., Abdulkadir, S. J., Abid, Y. A., Albouq, S. S., Alwadain, A., Rehman, A. U., et al. (2025). Enhancing fake news detection with transformer-based deep learning: a multidisciplinary approach. PLoS ONE 20:e0330954. doi: 10.1371/journal.pone.0330954
Saleh, H., Alharbi, A., and Alsamhi, S. H. (2021). OPCNN-FAKE: optimized convolutional neural network for fake news detection. IEEE Access 9, 129471–129489. doi: 10.1109/ACCESS.2021.3112806
Samadi, M., Mousavian, M., and Momtazi, S. (2021). Deep contextualized text representation and learning for fake news detection. Inf. Process. Manag. 58:102723. doi: 10.1016/j.ipm.2021.102723
Sedik, A., Abohany, A. A., Sallam, K. M., Munasinghe, K., and Medhat, T. (2022). Deep fake news detection system based on concatenated and recurrent modalities. Expert Syst. Appl. 208:117953. doi: 10.1016/j.eswa.2022.117953
Shishah, W. (2022). JointBert for detecting Arabic fake news. IEEE Access 10, 71951–71960. doi: 10.1109/ACCESS.2022.3185083
Sorour, S. E., and Abdelkader, H. E. (2022). AFND: Arabic fake news detection with an ensemble deep CNN-LSTM model. J. Theor. Appl. Inf. Technol. 100, 5072–5086.
Tabrizchi, H., Razmara, J., and Mosavi, A. (2023). Thermal prediction for energy management of clouds using a hybrid model based on CNN and stacking multi-layer bi-directional LSTM. Energy Reports 9, 2253–2268. doi: 10.1016/j.egyr.2023.01.032
Touahri, I., and Mazroui, A. (2024). Survey of machine learning techniques for Arabic fake news detection. Artif. Intell. Rev. 57:157. doi: 10.1007/s10462-024-10778-3
Trueman, T. E., Jayaraman, A. K., Narayanasamy, P., and Vidya, J. (2021). Attention-based C-BiLSTM for fake news detection. Appl. Soft Comput. 110:107600. doi: 10.1016/j.asoc.2021.107600
Turki, H. M., Al Daoud, E., Samara, G., Alazaidah, R., Qasem, M. H., Aljaidi, M., et al. (2025). Arabic fake news detection using hybrid contextual features. Int. J. Electr. Comput. Eng. 15:836. doi: 10.11591/ijece.v15i1.pp836-845
Verma, P. K., Agrawal, P., Amorim, I., and Prodan, R. (2021). WELFake: word embedding over linguistic features for fake news detection. IEEE Trans. Comput. Soc. Syst. 8, 881–893. doi: 10.1109/TCSS.2021.3068519
Wotaifi, T. A., and Dhannoon, B. N. (2023b). Attention mechanism based on a pre-trained model for improving Arabic fake news predictions. Iraqi J. Sci. 64, 6041–6054. doi: 10.24996/ijs.2023.64.11.45
Keywords: deep learning, fake news detection, multi-channel CNN, dual BiLSTM, transformers
Citation: Merzah BM, Razmara J and Salmanian Z (2026) Hybrid deep learning models for fake news detection: case study on Arabic and English languages. Front. Big Data 8:1683786. doi: 10.3389/fdata.2025.1683786
Received: 11 August 2025; Revised: 17 October 2025; Accepted: 25 November 2025; Published: 06 January 2026.
Edited by:
Jinjia Zhou, Hosei University, Japan
Reviewed by:
Ajey Kumar, Symbiosis International (Deemed University), India
Iman Abduljaleel, University of Basrah, Iraq
Ahmed Mahfouz, Arab Open University, Oman
Copyright © 2026 Merzah, Razmara and Salmanian. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jafar Razmara, razmara@tabrizu.ac.ir
Zolfaghar Salmanian