- 1Department of Mathematics, College of Science and Humanity, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- 2Department of Computer Science, College of Information Technology, Amman Arab University, Amman, Jordan
- 3Department of Information Systems, Sohag University, Sohag, Egypt
- 4Department of Computer Engineering and Information College of Engineering - Wadi Addawasir, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- 5Faculty of Computers and Information, Damanhour University, Damanhour, Egypt
Introduction: The growing prevalence of fraud and malware, fueled by increased online activity and digital transactions, has exposed the shortcomings of conventional detection systems, particularly in handling novel or obfuscated threats, class imbalance, and high-dimensional data with many irrelevant features. This underscores the need for robust and adaptive detection methodologies.
Methods: This study proposes an advanced Fraud Detection (FD) methodology, BKOA-GOBL, that enhances the Binary Kepler Optimization Algorithm (BKOA) by integrating Ghost Opposition-Based Learning (GOBL) to improve Feature Selection (FS). The BKOA dynamically models gravitational attraction, planetary motion mechanics, and cyclic control to maintain a balance between exploration and exploitation. At the same time, the GOBL broadens search diversification and prevents premature convergence, allowing local optima to be avoided. The Random Under-Sampling (RUS) technique is utilized to mitigate the class imbalance in fraud benchmarks.
Results and discussion: Experimental validation is conducted on five real-world benchmarks, including the Australian, European, CIC-MalMem-2022, Synthetic Financial Transaction Log, and Real vs Fake Job Postings datasets, using k-Nearest Neighbors (K-NN) and XGBoost (Xgb-tree) classifiers. The BKOA-GOBL achieves outstanding performance, reaching classification accuracies up to 99.96% in some benchmarks and corresponding feature reduction rates up to 81.82%. Precision, recall, ROC_AUC, and F1-scores were consistently high across most benchmarks, demonstrating reliable and balanced detection. However, some challenging benchmarks—such as the Real vs Fake Job Postings dataset using k-NN classifier—returned lower scores (Precision = 76.14%, Recall = 66.55%, F1-score = 71.00%, and ROC_AUC = 74.15%), reflecting the difficulty of the problem. Comparative analyses against 12 recent Metaheuristic Algorithms (MHAs) and Machine Learning (ML) classifiers confirmed BKOA-GOBL's dominance in terms of accuracy and computational efficiency. Its statistical superiority is confirmed by the Wilcoxon rank-sum test, underscoring its robustness, adaptability, and effectiveness in high-dimensional fraud and malware detection tasks and real-world fraud and malware detection scenarios.
1 Introduction
The rise in the number of computers and mobile devices in recent years has resulted in improvements in computer network processes. Accordingly, there has been an alarming uptick in the frequency of network attacks. The ingenuity and intricacy of the attacks have been increasing, leading to a rise in the profile of network security (Adil et al., 2020; Almaiah and Almomani, 2020).
There are three primary strategies for network security: prevention, detection, and mitigation. The primary focus is on prevention, a proactive approach that aims to make it hard for attacks to succeed. If prevention fails to protect the network, detection strategies are used to watch for potential threats. Finally, mitigation strategies ensure that devices continue to function even during an attack. Detection strategies can be split into two types: network-based, which monitors the entire network, and host-based, which focuses on individual devices. Two methods can also be employed for detection: signature-based, which identifies known threats, and anomaly-based, which detects unusual behavior (Rajadurai and Gandhi, 2022).
Intrusion Detection Systems (IDS) and malware detection are key applications that rely on network traffic classification. Host-based detection monitors a computer's internal activities, while network-based systems analyze real-time traffic logs for potential intrusions. One effective method is signature detection, which identifies known attack patterns but struggles to detect new threats (Singh A. P. et al., 2022).
Anomaly-based detection sets a threshold for expected network behavior and triggers an alarm for any deviations. It classifies data as normal or abnormal, but current Intrusion Detection Systems (IDSs) struggle with low detection accuracy and high false alarm rates (Chiba et al., 2019).
The COVID-19 pandemic has led to a significant increase in demand for online purchases of essential goods, which in turn has driven greater use of online payment methods and increased fraud and malware. With the expansion of online commerce, many enterprises have switched to credit cards for transactions. However, this increase in credit card use for online shopping has opened new avenues for criminals to exploit and steal customers' credit card information (Fanai and Abbasimehr, 2023).
Fraud in economic activities poses significant challenges across education, regulation, and business. It harms both service providers and their customers. This issue is particularly critical in the financial industry, as it affects daily financial transactions worldwide. Fraud involves using money or assets illegally for personal gain, eroding trust in financial institutions, and raising living costs. Economic fraud encompasses various harmful practices, including bank fraud, financial statement manipulation, insurance fraud, communications fraud, and illicit actions in commodity and stock markets. These fraudulent activities disrupt the global economy, push services online, and reveal recent weaknesses in the sector (Singh A. et al., 2022; Wahid et al., 2023).
Between 2000 and 2015, losses from debit and Credit Card Fraud (CCF) increased significantly. Although unauthorized transactions and fake cards accounted for only a small number of cases, they accounted for the majority of the financial losses. This issue has led both public and private sectors to invest more in advanced FD systems. These trends highlight the pressing need for robust FD strategies in the financial and e-commerce industries (Rodrigues et al., 2022).
Fraud is an illegal act, and CCF occurs when someone illegally obtains cardholder information through means such as phone calls, letters, or cyberattacks to commit financial crimes. These fraudulent activities are usually carried out using special software controlled by the perpetrator. The CCF identification process begins when a customer makes a transaction that requires verification of their credentials (Alamri and Ykhlef, 2022; Asha and KR, 2021).
The increasing prevalence of Android malware poses significant challenges for effective, efficient detection. Although traditional detection techniques, including static and dynamic analyses, have been essential for identifying malicious applications, cybercriminals have adopted evasion methods such as encryption, polymorphism, code obfuscation, and dynamic code loading to circumvent them. While dynamic analysis provides some protection against obfuscation, it struggles to scale to keep pace with the rapidly increasing volume of malicious Android applications.
As the use of Android devices increases, enhancing their security against malware threats has become critical. To address this problem, ML techniques were applied, focusing on both dynamic behaviors and static properties to detect malware. Despite this progress, there remains a need for more effective features to further enhance detection accuracy. Recently, researchers have begun exploring sonification techniques, which convert data into audio signals to reveal unique acoustic fingerprints. This innovative approach may reveal malicious features that are difficult to detect using traditional analysis methods. Additionally, sonification offers advantages such as increased processing speed, improved code coverage, and reduced resource consumption (Firdaus et al., 2018).
With the advent of big data and large datasets in areas such as fraud and malware detection, problems often involve many features that individually have low discriminative power, making it difficult to achieve satisfactory classification accuracy. Classifiers tend to perform sub-optimally when faced with high-dimensional, low-quality features, so besides using highly representative features, it is also necessary to improve them through techniques such as FS and hyperparameter optimization. FS addresses the challenges of data classification in high-dimensional environments by identifying the most relevant features and discarding irrelevant, redundant, or noisy ones, thereby reducing the input feature space and helping ensure effective discrimination between benign and malicious transactions (El-Mageed et al., 2025). Exhaustive search for the optimal feature set is rarely practical in high-dimensional feature spaces, which has prompted research into wrapper-based, biologically inspired MHAs that reduce the time required to find an optimal solution (Hussien et al., 2024; El-Mageed et al., 2024; Abd El-Mageed et al., 2023). However, studies evaluating the performance of such algorithms in fraud and malware detection remain scarce.
Taken together, financial fraud prevention, Android malware analysis, and intrusion detection all exemplify the broader challenge of high-dimensional imbalanced classification. Every domain requires models to sift through vast feature spaces dominated by benign activity, while rare but critical malicious events must be accurately identified. By framing these diverse applications under a shared methodological foundation, we highlight the continuity of challenges across domains and emphasize the importance of advanced FS and optimization strategies to improve detection accuracy and reduce false alarms.
1.1 Motivations
FS plays a pivotal role in ML and data mining, especially when dealing with high-dimensional datasets, as it significantly improves classification model performance by identifying the most relevant features. The inherent complexity and large search spaces associated with FS require the use of efficient optimization algorithms (Abdel-Basset et al., 2023a). The Kepler Optimization Algorithm (KOA) stands out as a notable solution; it uses concepts from planetary motion to model gravitational interactions between potential features (Russell, 1964; Stephenson, 2012). This enables the KOA to skillfully balance exploration of the search space and exploitation of promising solutions (Hu et al., 2024; Houssein et al., 2024).
The most noteworthy constraint in FS is scalability, due to the challenges posed by dimensionality, which can hinder traditional search methods. The KOA excels at tackling large-scale FS issues due to its innovative structure, which enables it to navigate large search areas via typical gravitational interactions efficiently. The KOA has proven its effectiveness in tackling a variety of optimization challenges, making it a valuable tool for FS, reliably delivering high-quality solutions (Hu et al., 2024; Russell, 1964; Stephenson, 2012). The performance of KOA is characterized by fast convergence and high accuracy, which are essential for FS tasks that aim to reduce the feature set while maintaining classification accuracy (Mohamed et al., 2024). Moreover, the dynamic search mechanism in the KOA plays a pivotal role in mitigating the risk of overfitting, as it enhances feature diversity (Abdel-Basset et al., 2024).
The KOA has proven exceptionally effective in solving FS optimization problems, inspired by celestial mechanics (Abdel-Basset et al., 2023b). Its unique ability to simultaneously explore globally and exploit locally makes it a strong candidate for tackling large search spaces and high-dimensional optimization tasks. The KOA exhibits remarkable robustness in terms of solution quality, is flexible enough to adapt to various FS challenges, and is particularly effective at accurately identifying a compact set of suitable attributes. However, applying the KOA to binary optimization and FS requires adjustments to mitigate issues such as premature convergence and to preserve population diversity.
1.2 Contributions
This study proposes a novel BKOA-GOBL to address the FD problem by enhancing the capabilities of FS. This algorithm leverages planetary motion mechanics and integrates GOBL to escape local optima. The contribution of this study extends beyond a simple integration. The novelty of the proposed BKOA-GOBL approach lies in several methodological and conceptual enhancements that significantly improve optimization behavior, exploration-exploitation balance, and FD performance. The primary contributions of this study, which highlight the innovation of the proposed BKOA-GOBL methodology for tackling FD via FS, are summarized as follows:
• An enhanced solution updating mechanism is introduced in the proposed BKOA-GOBL, which dynamically models a gravitational-orbital updating rule that integrates planetary motion parameters (gravitational attraction, orbital velocity, and planetary distance) to refine feature subset selection. The incorporation of a cyclic control parameter and a local escaping operator boosts convergence stability and search efficiency, prevents oscillations, and improves FS accuracy–an advancement over the standard KOA.
• The class imbalance problem inherent in FD datasets is addressed by integrating the RUS technique with the proposed BKOA-GOBL methodology to achieve scalable and real-time FD. This ensures balanced training without excessive preprocessing, mitigates model bias toward the majority class, and improves sensitivity in detecting fraudulent transactions.
• The GOBL strategy integrated within BKOA differs from traditional opposition-based learning methods: it enhances global exploration by generating ghost-based solutions beyond the central search region using adaptive relations among the present, best, and proposed solutions. This approach enables broader and more flexible exploration, mitigates premature convergence to escape local optima, and diversifies the population, leading to more robust solutions.
• The continuous BKOA-GOBL model was redesigned for binary FS tasks via an effective threshold-based transformation and a multi-objective fitness function that simultaneously reduces the number of selected features while increasing classification accuracy–tailored for high-dimensional fraud datasets.
• Extensive evaluation across five diverse benchmarks (Australian, European, Synthetic Financial Transaction Log, CIC-MalMem-2022, and Real vs. Fake Job Postings Prediction) demonstrates the proposed BKOA-GOBL's superiority in terms of various evaluation metrics, including classification accuracy, fitness, feature reduction, precision, recall, F-score and ROC_AUC.
• Statistical validation based on the test of Wilcoxon rank-sum (5% significance level) confirms the significant superiority of BKOA-GOBL over state-of-the-art MHAs. Its consistent performance across diverse scenarios and observed improvements in convergence rate, robustness, and accuracy affirm its adaptability, robustness, and practical effectiveness for real-world FD challenges.
1.3 Structure
The remainder of this paper is organized as follows. Section 2 analyzes the current research in the area of fraud and malware classification using MHTs. Section 3 explains and outlines the steps of the proposed BKOA-GOBL method to address FS issues related to fraud and malware detection. Section 4 presents the empirical findings of the recommended BKOA-GOBL and its peers. The conclusions, along with open problems for future investigation, are given in Section 6.
2 Literature review
This section presents recent ML, DL, and metaheuristics techniques for classifying fraud and malware.
Tarwireyi et al. (2024) investigated the use of audio features for detecting Android malware. Their study involved extracting 191 static audio features from Android micro APK datasets and evaluating fourteen different MHAs to ensure the efficiency of FS. These selected features were then used to train a light gradient-boosted classification model. The results showed that this method had high discriminatory power, with the genetic MHA achieving a significant 50.26% feature reduction and boosting classification accuracy to 99.72%.
Toğaçar and Ergen (2024) utilized the CIC-Evasive-PDFMal2022 dataset designed by the Canadian Cybersecurity Institute, which classified PDFs into benign and malicious classes. During the preprocessing phase, parameters from text-based PDFs were transformed into 2D barcode representations. Several 2D Convolutional Neural Network (CNN) models, including ShuffleNet, ResNet18, and MobileNetV2, were trained on this data to extract distinct feature sets. The Honey Badger optimizer was employed to identify the most effective feature set, which was then classified using the softmax method, yielding a remarkable accuracy of 99.73%.
Kaplan and Babalik (2025) employed various MHAs, including Artificial Bee Colony optimizer, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), while also introducing a novel GA-PSO algorithm aimed at improving task scheduling efficiency within cloud computing, particularly under adversarial conditions such as DDoS attacks that could compromise system performance. The findings underscored the potential of advanced scheduling methods to enhance the sustainability of cloud computing while providing practical solutions to real-world security threats.
Alashjaee (2023) proposed a new technique to improve intrusion detection (ID) called the Remora Optimization Algorithm-Levy Flight (ROA-LF). This method aims to enhance the original ROA by using Levy Flights for better performance. To test the effectiveness of ROA-LF, the researchers used various performance measures on five benchmark datasets for ID. These datasets come from data mining competitions, the ID Evaluation benchmark, and network security labs. Besides ID, ROA-LF was also applied to solve three engineering problems: pressure vessel design, three-bar truss, and cantilever beam design. Comparison showed that their proposed methodology outperformed its peers, including Particle Swarm Optimization (PSO), the Salp Swarm Algorithm (SSA), the original ROA, and the snake optimizer.
Kale et al. (2024) developed a method that combines Black Widow Optimization (BWO) with Generative Adversarial Networks (GANs) to enhance cryptojacking detection. By optimizing features with Hybrid BWO and augmenting the dataset using GANs, they enriched the training data, resulting in a detection accuracy of 98.02%. Their approach significantly outperformed existing methods and provides a valuable framework for addressing digital security challenges.
Ghaleb et al. developed a spam detection system that combines six types of Advanced Grasshopper Optimization Algorithms (AGOA) with a Multilayer Perceptron (MLP). This system, called AGOAMLPs, effectively classifies emails as spam or not spam. Using datasets such as UK-2011 Webspam, SpamAssassin, and SpamBase, the results showed that the MLP with AGOA techniques outperformed other methods in terms of detection rate, accuracy, and reducing false alarms (Ghaleb et al., 2021).
Ramesh et al. (2025) presented an innovative approach to cybersecurity through Enhanced Threat Intelligence for Cybersecurity Using an Ensemble of DL Models with MHAs (ETIC-EDLMHAs). It aimed to detect and effectively address network attacks. The process began with data preprocessing, which involved preparing the input data for analysis using the Word2vec model for feature extraction. In the classification phase, an ensemble of DL models was employed, notably recurrent neural networks, long short-term memory networks, and conditional variational autoencoders. Hyperparameter tuning was performed using the Wolverine optimization algorithm. Extensive simulations demonstrated that the ETIC-EDLMHAs model surpassed existing methods, achieving a remarkable accuracy of 98.51% on the CybAttT dataset.
Mosa et al. (2024) created a framework that integrates MHAs with ML models to enhance the accuracy of fraud prediction while tackling data imbalances. They utilized 15 MHTs for FS and evaluated predictive performance using Random Forest (RF) and Support Vector Machine (SVM). Working with a Kaggle dataset containing 284,807 European card transactions, they implemented an under-sampling technique to achieve data balance. Their findings indicated that the Sailfish Optimizer, in combination with RF, achieved a classification accuracy of 97%, significantly reducing the feature set by up to 90% and improving computational efficiency.
Prabhakaran and Nedunchelian (2023) introduced an FS method for CCF detection based on oppositional cat swarm optimization. This approach combines ML and DL techniques to improve accuracy. They employed the Oppositional Cat Swarm Optimization (OCSO) for FS. They utilized a bidirectional gated recurrent unit model for classification, along with the chaotic krill herd algorithm for hyperparameter tuning. Their research analyzed a Kaggle dataset comprising 284,807 transactions, with only 0.172% identified as fraudulent. This effectively addressed the significant class imbalance using the SMOTE technique. The results demonstrated a remarkable classification accuracy of 99.97%, surpassing traditional approaches such as Decision Trees (DTs) and RFs.
Sorour et al. (2024) developed a CCF detection framework that utilizes the Brown Bear Optimization (BBO) algorithm to improve FS and classification accuracy while reducing dimensionality. They introduced a Binary BBO Algorithm designed to optimize feature dimensionality and used three ML classifiers–SVM, K-NN, and XGBoost–to detect fraudulent transactions. The framework was assessed using the Australian Credit Approval dataset and further validated on ten benchmark datasets, achieving a classification accuracy of up to 91% and reducing feature dimensionality by 67%, which significantly improved computational efficiency. Performance evaluations demonstrated that the method significantly outperformed ten other MHTs.
Mniai et al. (2023) developed a framework designed to improve CCF detection by addressing the issue of imbalanced data and optimizing classification through FS and hyperparameter tuning. They implemented an undersampling technique to create a balanced dataset and used the Support Vector Data Description (SVDD) algorithm for classification. To enhance the hyperparameters of SVDD, they introduced a modified Polynomial Self-Learning PSO (PSLPSO) algorithm. Utilizing the Kaggle European Credit Card dataset, which consisted of 284,807 transactions with only 0.172% being fraudulent, the framework achieved a classification accuracy of 93%, outperforming models like RF, DTs, Logistic Regression (LR), and K-Nearest Neighbors (K-NN). This framework not only provided effective FD but also lowered computational complexity and enhanced model generalization. However, it had some drawbacks, including dataset limitations and the risk of overfitting due to the undersampling approach.
3 The suggested BKOA-GOBL methodology to improve FD via FS
Several interconnected stages determine this BKOA-GOBL methodology's ability to enhance FD, including managing unbalanced data, KOA-driven solution initialization and enhancement, hybridization with the GOBL strategy, and binary alteration and fitness assessment. The subsequent subsections describe these stages.
3.1 Addressing unbalanced FD data with the RUS technique
FD datasets typically suffer from severe class imbalance, where non-fraud transactions overwhelmingly outnumber fraudulent ones. Typical classifiers may be biased toward forecasting the majority class (non-fraud transactions) as a result of this imbalance, which can result in inadequate detection of the critical minority class (fraud transactions). Resampling techniques (Elsoud et al., 2024) are frequently employed to balance class distributions and overcome class imbalance. RUS, one of the most popular resampling techniques, was chosen for this study due to its computational simplicity, effectiveness, scalability, and suitability for large-scale financial datasets. In FD, where the number of legitimate transactions can exceed that of fraudulent ones by several orders of magnitude, efficient preprocessing becomes crucial to maintaining real-time detection capabilities.
RUS (Yap et al., 2014) is a procedure to balance the dataset and equalize class distributions by randomly removing samples from the largest class (non-fraud transactions) to equal the samples in the minority class (fraud transactions). This helps reduce the bias that classifiers often develop toward the majority class in unbalanced datasets. By balancing the data, the model can better learn to detect fraudulent transactions. Although RUS may discard some useful non-fraud data, it is effective when working with large datasets where the majority class dominates. Overall, RUS enhances the system's ability to detect infrequent instances of fraud by increasing its sensitivity to the minority class. RUS is a sensible option in this case, as it minimizes the size of the largest class, which accelerates model training and eliminates unnecessary complexity. Its major advantages include:
• Computational efficiency and dataset size: the dataset utilized in this study is massive and high-dimensional. RUS substantially reduces the overall size of the training dataset, accelerating the training process of classifiers and metaheuristic optimization algorithms. This makes the training of complex models (such as BKOA-GOBL) computationally feasible and efficient without compromising the ability to identify the complex patterns of the minority class.
• Memory economy: by working on a smaller dataset, RUS minimizes storage and memory requirements, which is essential when dealing with big data environments.
• Reduction of model bias: by balancing the class proportions, RUS helps mitigate the bias of classifiers toward the dominant (non-fraud) class.
• Preventing noise and distribution shift: RUS uses only real, observed instances from the dataset, ensuring that the model is trained on genuine data points, thus mitigating the risk of introducing synthetic noise or overfitting.
• Ease of integration: RUS can be directly applied before FS or model training without introducing additional parameters or synthetic data generation, making it robust and implementation-friendly. In our experiments, pairing RUS with FS yielded results superior or comparable to those of hybrid sampling methods; this empirical evidence confirmed RUS as the most practical and effective balancing technique for our specific problem and model architecture.
RUS has the drawback of discarding some informative majority-class instances, which can slightly limit model generalization. To mitigate this limitation, several enhanced sampling techniques (Altalhan et al., 2025; Nguyen et al., 2024) have been developed, including Synthetic Minority Oversampling Technique (SMOTE), and Ensemble-based resampling techniques such as EasyEnsemble and BalancedBaggingClassifier. Despite these synthetic and ensemble techniques enhancing data diversity and learning balance, they require greater computational resources, increased memory usage, and more extensive parameter tuning–factors that substantially increase complexity when applied to high-dimensional and large-scale fraud datasets. Consequently, RUS was adopted in this study as a practical and computationally lightweight technique that allows the proposed BKOA-GOBL framework to focus on feature optimization and classification accuracy without incurring excessive preprocessing overhead. This choice achieves a well-balanced trade-off between computational efficiency and FD sensitivity, supporting the framework's objective of building a scalable and real-time FD system.
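A minimal sketch of the RUS step described above is given below (assuming NumPy; the function and variable names are illustrative and not part of the proposed framework):

```python
import numpy as np

def random_under_sample(X, y, majority_label=0, seed=42):
    """Randomly drop majority-class rows until both classes are the same size."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    majority_idx = np.flatnonzero(y == majority_label)
    minority_idx = np.flatnonzero(y != majority_label)
    # Keep every minority (fraud) sample and an equally sized random subset of
    # the majority (non-fraud) samples; only observed rows are retained, so no
    # synthetic noise is introduced.
    kept_majority = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    keep = np.concatenate([kept_majority, minority_idx])
    rng.shuffle(keep)
    return X[keep], y[keep]
```

For the large fraud benchmarks used here, this one-pass reduction of the majority class is what keeps the downstream wrapper-based FS computationally feasible.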
3.2 Solution initialization and enhancement using the suggested KOA
This stage is carried out using the KOA (Abdel-Basset et al., 2023a), a physics-inspired MHA founded on Kepler's laws of planetary motion (Russell, 1964). The search region is modeled as a solar system, where the Sun represents the KOA's optimal solution and planets symbolize the KOA's potential solutions. The KOA is guided by Kepler's three rules. The first rule states that planets move in elliptical orbits around the Sun, with the Sun at a single focus. The second rule describes the variation in a planet's speed as it revolves around the Sun: it moves quickly when it is nearer the Sun and slowly when it is farther away. The third rule states that the square of the orbital period is directly proportional to the cube of the orbit's semi-major axis, establishing a connection between a planet's orbital period and the size of its orbit. According to these rules, a planet's trajectory is influenced by its mass, position, orbital speed, and gravitational force. These factors form the foundation of the KOA's mathematical modeling, which properly balances exploration and exploitation during optimization. Theoretically, planetary locations and speeds can be predicted using Kepler's laws. The KOA's main steps are described in depth in the subsequent subsections.
3.2.1 Solution initialization
Every planet in KOA stands for a solution within the algorithm's population. A set of N planets, representing the population size, are created at the start of the search process to act as potential solutions in the search space. A d-dimensional vector, where d signifies the dataset's feature count, is used to represent each solution. These potential solutions are initialized randomly within their defined lower and upper boundaries. The random initialization is performed using the following formula:
Here, Xi,j refers to the jth decision variable (j = 1, 2, ..., d) of the ith initial solution, while rand is a value created at random inside the interval [0, 1]. The terms UBj and LBj represent the upper and lower boundaries of each variable j, respectively. Additionally, the normal distribution is used to randomly select the orbital period of each planet. The eccentricity ei of each planet's orbit is arbitrarily selected from the interval [0, 1].
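A minimal sketch of this initialization step, assuming the standard uniform rule Xi,j = LBj + rand × (UBj − LBj) implied by the description (NumPy-based; names are illustrative):

```python
import numpy as np

def initialize_population(N, d, LB, UB, seed=0):
    """Create N candidate solutions uniformly at random inside [LB, UB] in each dimension."""
    rng = np.random.default_rng(seed)
    X = LB + rng.random((N, d)) * (UB - LB)   # X[i, j] = LB_j + rand * (UB_j - LB_j)
    e = rng.random(N)                          # orbital eccentricities drawn from [0, 1]
    T = np.abs(rng.normal(size=N))             # orbital periods drawn via a normal distribution
    return X, e, T
```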
3.2.2 Attraction of gravity computation
This step calculates the gravitational attraction between each planet and the Sun. Each planet has a unique gravitational pull, influenced by its mass and the distance between the Sun and the planet. As the planet's orbital speed increases, it gets closer to the Sun, and vice versa. The gravitational attraction between the Sun (the optimal solution) and each planet (a prospective solution) can be estimated as follows:
where t is the existing generation's number, and ei is an arbitrary value inside [0, 1] that indicates a planet's eccentricity of orbit. To prevent the error of dividing by zero, ϵ represents a tiny value, and rand1 is an arbitrary value within [0, 1], which gives the gravity values more variation throughout the optimization process. The masses of the Sun XBest and every planet Xi are denoted by MBest and mi, respectively, and are determined as follows:
The normalized values of MBest and mi are used in this computation. A random number rand2 in [0, 1] is introduced to diversify the mass values among different planets. The worst and best solutions at the tth generation are XWorst and XBest, respectively, the kth solution is Xk, and the current ith solution at the tth generation is Xi. The worst (maximum) and best (minimum) fitness function values at generation t are given by fit(XWorst) and fit(XBest), respectively. Ri is the Euclidean distance between XBest and Xi, and is calculated by:
The normalized value of Ri is used in the gravity computation. To assure the precision of the search, μt represents the global gravity constant, which decreases exponentially with each generation t. This μt is calculated as follows:
where Tmax is the allowed number of generations, γ is a constant value, and μ0 is an initial value.
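A short sketch of the distance term and the decaying gravity constant described above; the exponential-decay form is an assumption consistent with the statement that μt decreases exponentially with the generation t (μ0 and γ are supplied by the user):

```python
import numpy as np

def euclidean_distance(X_best, X_i):
    """R_i: Euclidean distance between the Sun (best solution) and planet i."""
    return np.linalg.norm(np.asarray(X_best) - np.asarray(X_i))

def gravity_constant(t, T_max, mu0, gamma):
    """Global gravity constant mu_t, decaying exponentially as generations progress."""
    return mu0 * np.exp(-gamma * (t / T_max))
```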
3.2.3 Planet's speed measurement
A planet's position in relation to the Sun determines its speed. For a planet near the Sun, the gravitational pull is exceedingly powerful, so the planet attempts to speed away to avoid being drawn toward the Sun. Conversely, a planet's speed diminishes as the distance from the Sun increases, which decreases the Sun's gravitational influence. The mathematical formulation applied to measure the planet's speed around the Sun at the tth generation is shown in Equation 7. The evaluation of planetary speed in KOA can be understood through two complementary search scenarios inspired by planetary motion around the Sun, as follows:
1. Planets close to the Sun: if the normalized distance of Ri falls below the defined threshold, the planet is considered to be near the Sun. In this scenario, due to the Sun's gravitational pull, the planet attempts to accelerate and push itself away in an effort to escape being pulled inward. In optimization terms, this condition represents a situation where a solution is in a dense or critical region of the search space, and stronger movement is needed to discover better areas. Mathematically, the speed in this scenario is influenced by either the distance between two randomly chosen solutions or the distance between the current solution and a randomly selected solution. This scenario serves to diversify the search behavior of KOA and corresponds to an exploration-oriented behavior, ensuring the algorithm does not become trapped early. However, this strategy may result in reduced speed for the planets when population diversity is limited, potentially hindering the search process. To counteract this effect and maintain adequate movement throughout the optimization, this component incorporates the difference between the search space's upper and lower bounds, which helps maintain speed and prevents premature convergence to a local optimum.
2. Planets away from the Sun: if the normalized distance of Ri exceeds this threshold, the planet is considered to be far from the Sun. In this scenario, the gravitational force is weaker, and the planet correspondingly reduces its speed. In optimization terms, this condition reflects that a solution is in a relatively stable and less critical region of the search space. Mathematically, the speed is decreased in this scenario depending on the distance between the present solution and a randomly selected one. While this scenario promotes exploitation, its primary drawback is that solutions may remain largely unchanged, which can make it harder for the algorithm to break out of a local optimum. To address this issue, the difference between the search space's upper and lower bounds is also integrated, thereby enhancing planet mobility even in low-diversity settings.
Here, Xa and Xb represent two arbitrarily chosen candidate solutions (the ath and bth) at generation t. The scalars rand3 and rand4 are random values drawn uniformly from the interval [0, 1], and an additional random vector with elements in the same range is also used. The parameters ℓ, ℑ, and ζ are calculated using the following expressions:
To determine the proportion of movement or step size for each planet, Equation 10 is employed.
In Equation 10, a random vector with elements between 0 and 1 is used. At generation t, the orbital semi-major axis of planet i is denoted by ai and is calculated using:
The orbital period of planet i, denoted as Ti, is computed as the absolute value of a randomly generated number, i.e., Ti = |rand|. The values of the control parameters U1 and U2 are defined as follows:
To lessen the possibility that planets get stuck in a local optimum, a directional flag г is introduced. This flag alters the search direction, thereby enhancing the algorithm's ability to explore the search region thoroughly. The mechanism is defined as follows:
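The two speed scenarios described above can be illustrated structurally as follows; this is only a sketch of the branching behavior (the distance threshold, random coefficients, and step terms are placeholders, not the paper's Equation 7):

```python
import numpy as np

def planet_speed_sketch(X, i, R_norm, LB, UB, threshold, rng=None):
    """Structural sketch: exploration-style step near the Sun, damped step far from it."""
    rng = rng or np.random.default_rng()
    N, d = X.shape
    # Bound-difference term that keeps planets moving when population diversity is low.
    bounds_term = rng.random(d) * (UB - LB)
    if R_norm < threshold:
        # Near the Sun: push away using the gap between two randomly chosen solutions.
        a, b = rng.choice(N, size=2, replace=False)
        step = rng.random(d) * (X[a] - X[b])
    else:
        # Far from the Sun: reduced speed based on the gap to one randomly chosen solution.
        a = rng.integers(N)
        step = 0.5 * rng.random(d) * (X[a] - X[i])   # 0.5 is an illustrative damping factor
    return step + bounds_term
```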
3.2.4 Exploration and exploitation optimization
The KOA simulates how planets move through space as they move closer and farther from the Sun by alternating between exploration and exploitation capabilities. By investigating planets farther from the Sun to detect new candidate solutions (exploration case) and intensifying the search close to the Sun to refine and improve existing solutions (exploitation case), the KOA imitates this pattern.
The following mathematical equation represents the improved solution for each planet i farther from the Sun in the exploration case:
In KOA, a planet's speed enables exploration when it is farther from the Sun, while the Sun's gravity encourages a planet to exploit regions near the Sun (the optimal solution). If the Sun represents a local optimum, a planet can increase its speed to escape, helping the algorithm avoid that local optimum. Thus, the Sun's gravity drives exploitation, and the planet's speed ensures balanced exploration. Furthermore, to enhance the KOA's exploration and exploitation capabilities, the dynamic variation in distance between planets and the Sun is simulated. The KOA promotes exploration while planets are farther from the Sun and exploitation when they are closer. This behavior is adjusted by a dynamic controlling parameter h: larger values enhance exploration, while smaller values favor exploitation. The stochastic alternation between this behavior and Equation 18 strengthens the capacity of the KOA to move away from local optima and toward global solutions. This phenomenon is represented mathematically below.
During the optimization process, the cyclic control parameter a2 drops gradually from −1 to −2 over TC cycles.
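One plausible realization of this cyclic decrease is sketched below; it simply satisfies the stated behavior of a2 dropping gradually from −1 toward −2 within each of the TC cycles and is not the paper's exact expression:

```python
def cyclic_a2(t, T_max, TC):
    """a2 decreases linearly from -1 toward -2 within each of the TC cycles."""
    cycle_len = T_max / TC
    phase = (t % cycle_len) / cycle_len   # position within the current cycle, in [0, 1)
    return -1.0 - phase
```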
3.3 GOBL strategy incorporation
To enhance the KOA's capacity to escape local optima, this paper incorporates a GOBL strategy. Unlike traditional opposition-based learning methods (Tizhoosh, 2005; Mahdavi et al., 2018), which rely on a fixed central point within the search space and generate opposite solutions confined to the midpoint region, GOBL introduces greater flexibility and spatial diversity. Traditional opposition-based learning is centered around a fixed midpoint within the search space: opposite solutions are generated based on a static rule that reflects the current position around the center of the exploration range. As a result, the newly generated solutions tend to cluster near this midpoint, and their spatial extent typically does not surpass the distance between the present solution and the central point. This makes it challenging for the algorithm to investigate regions distant from the central area, where the global optimum may be located, and causes it to struggle to escape local optima.
In contrast, GOBL dynamically combines information from the present individual, a proposed individual, and the best individual found so far to replace poor proposed positions with newly generated ghost possible solutions. These ghost solutions are designed to extend beyond the conventional bounds set by the midpoint, thereby enabling broader and more adaptive exploration and increasing the chances of escaping local optima, especially when the global optimum is far from the search center. To better illustrate the GOBL strategy, suppose a space with two dimensions defined by the X-axis and Y-axis. The X-axis defines the search boundaries [LB, UB]. Within this space, let Xnew denote the position of a newly generated possible solution with a height hnew. The best solution discovered is projected onto the X-axis at XBest position with a hBest height. Also, the present possible solution has a projection Xi with height hi. Using these reference points, the position of the ghost xi with height hi is calculated, as follows:
Let Pi = (xi, hi) represent the ghost position, where xi represents the X-axis projection and hi is the height. In this context, the Y-axis is used metaphorically as a convex lens to simulate optical imaging. When Pi passes through the lens, it produces a genuine image Pi* = (xi*, hi*), where xi* corresponds to the opposite solution of xi. Hence, the relationship between the ghost position and its genuine image is defined by:
As a result, the GOBL calculation can be derived from the previous equation to generate opposition-based solutions that go beyond traditional midpoint reflections, as follows:
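For intuition, the classical convex-lens (lens-imaging) opposition around a fixed center c = (LB + UB)/2 is sketched below; as explained above, GOBL departs from this fixed-center rule by building the reference point adaptively from the present, proposed, and best solutions, so the sketch only illustrates the underlying imaging relation:

```python
def lens_opposition(x, LB, UB, k=1.0):
    """Classical lens-imaging opposite of x around the midpoint of [LB, UB].
    From similar triangles through the lens, (center - x) / (x_star - center) = k,
    so with k = 1 this reduces to standard opposition: x_star = LB + UB - x."""
    center = (LB + UB) / 2.0
    return center + (center - x) / k
```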
3.4 Binary alteration and assessment of continuous solution
In FS, the goal is to lessen the number of features while retaining classification effectiveness. Achieving this requires careful selection of the most relevant features and discarding those that negatively impact the classification accuracy. In binary optimization, FS problems require encoding solution representations as binary vectors. Since the KOA operates in a continuous domain, which is incompatible with binary FS problems, it must be adapted by converting the continuous values to a binary format. Each solution is represented as a one-dimensional binary vector, where 1 indicates a selected feature and 0 indicates exclusion. The transformation of a continuous solution into its binary counterpart Xbinary, using a random threshold thrrand in [0, 1], is defined by the following rule:
thrrand should be chosen with consideration of the problem context, as this directly influences the FS behavior. Different problem scenarios may require different threshold settings to achieve optimal performance.
To assess a solution's quality, two conflicting objectives must be balanced: increasing the accuracy of classification and minimizing the number of chosen features. While high accuracy ensures reliable predictive performance and utility of models, aggressive feature reduction can lead to performance degradation. Therefore, a well-balanced fitness function is essential. It incorporates both the size of the feature subset and the accuracy of classification, and is described mathematically as:
where (1 − classification accuracy) denotes the misclassification error rate, D refers to the total number of features, and d* denotes the number of selected features. The weights w1 and w2 signify the contributions of the accuracy term and the cardinality of the feature subset, respectively, with w1 ∈ [0, 1] and w2 = 1 − w1.
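A minimal sketch of the threshold-based binarization and the weighted-sum fitness implied by the definitions above (the classifier's accuracy is computed externally; the default w1 value is a placeholder):

```python
import numpy as np

def binarize(X_continuous, thr=None, rng=None):
    """Map a continuous solution to a 0/1 feature mask using a random threshold in [0, 1]."""
    rng = rng or np.random.default_rng()
    thr = rng.random() if thr is None else thr
    return (np.asarray(X_continuous) > thr).astype(int)

def fitness(accuracy, feature_mask, w1=0.99):
    """fitness = w1 * (1 - accuracy) + w2 * (d_star / D), with w2 = 1 - w1 (lower is better)."""
    mask = np.asarray(feature_mask)
    w2 = 1.0 - w1
    D = mask.size                 # total number of features
    d_star = int(mask.sum())      # number of selected features
    return w1 * (1.0 - accuracy) + w2 * (d_star / D)
```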
After presenting the fundamental stages of the suggested BKOA-GOBL in the previous subsections, the BKOA-GOBL's pseudo-code is summarized in Algorithm 1. Additionally, the whole process and key stages of the BKOA-GOBL are also illustrated in the flowchart in Figure 1.
3.5 Computational complexity of the BKOA-GOBL methodology
3.5.1 Time computational complexity of the BKOA-GOBL methodology
The time computational complexity of the proposed BKOA-GOBL methodology can be evaluated by analyzing its core stages that collectively contribute to its performance improvement in FD. These stages include addressing class imbalance using the RUS technique, generating and refining solutions via the KOA, incorporating the GOBL strategy, and performing binary alterations on solutions, as well as evaluating the fitness function. The total time computational complexity, expressed in big-O notation, Otime(BKOA−GOBL), is derived as follows:
• RUS technique: balances the dataset by randomly removing samples from the majority class to equalize the number of minority and majority samples. This process operates linearly with respect to the number of instances S in the dataset, resulting in a time complexity of Otime(S).
• Solution generation and refinement: Generates an initial population of N candidate solutions, each represented in a d-dimensional search space. The time complexity of this process is Otime(N×d). After that, each candidate's position is updated iteratively based on gravitational and orbital dynamics across Gmax generations. The time complexity of this iterative update is Otime(Gmax×N×d).
• GOBL strategy: generates ghost-based opposition solutions to replace inferior candidates and maintain diversity. The computational cost of this step is proportional to both the population size and the problem dimension, resulting in a time complexity of Otime(N×d).
• Binary alteration and fitness function assessment: converts continuous feature representations into binary form during each iteration for all individuals to adapt the continuous KOA for discrete FS. This step has a time complexity of Otime(Gmax×N×d). Then, the classification-based fitness for each solution has been computed at every iteration. Assuming each fitness computation depends primarily on model accuracy using M classifier evaluations, this step has a time complexity of Otime(Gmax×N×M), which simplifies to Otime(Gmax×N) when M is constant.
where N is the number of individuals in the population, Gmax is the maximum number of allowed iterations, and d is the dimensionality of the problem space. The overall time computational complexity of the BKOA-GOBL can then be determined by combining all stages: Otime(BKOA−GOBL) = Otime(S) + Otime(N × d) + Otime(Gmax × N × d) + Otime(N × d) + Otime(Gmax × N × d) + Otime(Gmax × N).
After simplification, and since the one-time Otime(S) resampling cost does not dominate the iterative search, the overall time computational complexity is governed by the iterative BKOA-GOBL-driven solution update and binary conversion processes, resulting in Otime(Gmax × N × d).
This time complexity Otime(Gmax×N×d) is consistent with other population-based metaheuristic algorithms used for FS. While the inclusion of GOBL and binary alteration increases computational demand, these additions significantly enhance exploration and exploitation balance, reducing the risk of premature convergence and improving FS quality. The trade-off between computational cost and detection accuracy is justified, as BKOA-GOBL achieves superior convergence, robustness, and scalability across high-dimensional datasets for FD. Additionally, the algorithm can benefit from parallel and distributed implementations, where the evaluation of candidate solutions can be executed concurrently, effectively mitigating computational overhead in large-scale applications.
3.5.2 Space computational complexity of the BKOA-GOBL methodology
The space computational complexity reflects the memory usage or storage space required for the BKOA-GOBL algorithm to handle a problem as the input size grows. It includes the memory required to store all input variables, internal vectors, temporary structures, and auxiliary states used during the optimization process. The following analysis divides the total required memory into two main components:
• Memory space complexity of input parameters: this refers to the memory needed to store the algorithm's input parameters to operate. The proposed BKOA-GOBL framework (as shown in Algorithm 1) utilizes eight input variables: N, Tmax, d, μ0, γ, TC, LB, and UB. Each variable is stored as a single numerical value requiring 4 bytes of memory. Thus, the total memory footprint for the input variables is: (8 × 4 = 32 bytes), and therefore the input values space complexity contributes only constant memory.
• Memory space complexity of contributory parameters: this refers to the additional temporary storage required by the algorithm during optimization for internal computations. It consists of the following components:
- Population vector X: the BKOA-GOBL maintains a population of N continuous candidate solutions, each of dimension d. Each entry is a floating-point value requiring 4 bytes; therefore, the memory space required is (4 × N × d) bytes. This contributes linear space complexity in terms of N × d.
- Binary population vector Xbinary: After binarization, each candidate solution is represented as a binary vector of dimension d, consuming 1 byte per entry, but approximated by standard 4-byte allocation for uniformity. Thus, the memory space for binary vectors: (4 × N×d) bytes, which is linear in N×d.
- Fitness values and scalar variables: the algorithm stores the fitness of all individuals (fit(Xi), fit(Xk)), optimal and worst solutions (fit(XBest), fit(XWorst)), orbital quantities (MBest, mi, Fi, Ri, ai, ϵ, Ti, ei, Vi, ℑ, г, ζ, ℓ, , , , U2), control parameters (μ, μ0, γ, Tmax, LB, UB, h, TC, a2), and random coefficients (rand, rand1, rand2, rand3, rand4, , ). In total, the algorithm uses 37 such scalar variables, each requiring 4 bytes: (37 × 4 bytes = 148 bytes), which corresponds to constant space.
- Population vectors: the population in the algorithm consists of eight vectors: Xi, Xk, XBest, XWorst, Xa, Xb, Xnew, . Each vector has a dimensionality of d. Since every position requires 4 bytes of memory, each vector occupies (4 × d) bytes. Therefore, the total complexity of the memory space required for all eight vectors is: 8 × 4 × d bytes = 32 × d bytes. This results in a linear space complexity with respect to dimensionality d.
Thus, the total memory space required for the previous contributory parameters is (4 × N × d) + (4 × N × d) + 148 + (32 × d) bytes = (8 × N × d) + (32 × d) + 148 bytes.
Putting everything together with the 32 bytes of input parameters, the total memory space of the BKOA-GOBL methodology is (8 × N × d) + (32 × d) + 180 bytes.
Ignoring all constants, the big-O notation Ospace(BKOA−GOBL) for the total memory space complexity of the BKOA-GOBL becomes Ospace(N × d + d).
Therefore, the overall space computational complexity of the BKOA-GOBL methodology is Ospace(N × d).
4 Experimental results and analysis
The experimental results for the proposed BKOA-GOBL methodology, in comparison to various alternative algorithms, are described in detail in this section. The presented technique is verified utilizing five distinct benchmark datasets from multiple sources. The average and Standard Deviation (STD) of the evaluation metrics were estimated and presented. Information regarding the benchmark datasets and the parameters for MHTs can be found in Sections 4.1, 4.2, respectively. Performance metrics are explained in Subsection 4.3. The results of the recommended BKOA-GOBL via k-NN and Xgb-tree classifiers are discussed in Subsection 4.4. The findings of the BKOA-GOBL against its peers are studied in Sections 4.6, 4.7. The convergence graphs are also depicted in Section 4.8. Finally, Wilcoxon's test determines the differences in the values of fitness between the proposed BKOA-GOBL and its competitors.
4.1 Benchmarks description
In this section, we examine five publicly accessible datasets that are frequently utilized in the creation and assessment of classification models for FD, cybersecurity, and financial decision-making. These datasets encompass a range of domains, including credit approval, transaction fraud, malware analysis, economic simulation, and employment fraud. Each dataset is distinguished by its complexity of features, number of records, class distribution, domain specificity, and availability. Table 1 provides a summary of the key attributes of each dataset, along with direct access links for reproducibility and further investigation.
4.2 Parameters configuration
The BKOA-GOBL was assessed alongside several binary versions of different MHTs, which included the original BKOA and ten recent MHTs. Each algorithm was tested thirty times per dataset to account for variability, and average performance metrics were provided for equitable comparisons. To ensure fairness, all MHTs were governed by a population size of 10 and a limit of 100 generations. The attributes of the datasets indicated the scale of the problem, while the continuous search domain was set to [−1, 1] to create a broad yet controlled search space.
A 10-fold cross-validation method was employed for evaluating the generalizability and robustness of the BKOA-GOBL and its competitors. The datasets were split into 80% training and 20% testing subsets. The training subset was utilized to fine-tune the classifiers, while the testing subset was used to assess the effectiveness of the chosen features. Parameter configurations for each technique adhered to the original specifications established in foundational studies, with a summary presented in Table 2. The experiments were conducted in a Python environment on a high-performance computing system equipped with 256 GB of RAM and a Dual Intel Xeon Gold 5115 CPU, running on Microsoft Windows Server 2022.
Table 3 presents the main coefficients of the ML classifiers employed in this study.
4.3 Evaluation measures
We utilize a wide range of evaluation measures to measure the efficacy of the suggested BKOA-GOBL for fraud and malware detection. These measures are essential for evaluating the model's predictive ability, stability, and effectiveness.
• Accuracy (AC): accuracy evaluates how correct the model is by determining the ratio of instances that have been classified correctly, as in Equation 27.
where:
- True positives (TP): unauthorized transactions accurately recognized as Fraud.
- True negatives (TN): trustworthy transactions accurately recognized as authentic.
- False positives (FP): trustworthy transactions inaccurately recognized as fraud.
- False negatives (FN): unauthorized transactions inaccurately recognized as trustworthy.
An increased accuracy reflects improved model performance.
• Fitness function: the fitness function of the KOA optimizer estimates the performance of the model by balancing classification accuracy against the number of selected features.
• Size of chosen attributes: this metric reflects the total number of features retained after applying the KOA-based FS. Minimizing the number of selected features while preserving high accuracy enhances both the efficiency and interpretability of the model.
• Precision (P): precision (De Medeiros et al., 2007) measures the percentage of accurately identified fraudulent transactions out of all transactions that were classified as fraudulent, as in Equation 28.
• Recall (R): recall (Amigó et al., 2009) estimates the ability of the model to identify fraud transactions accurately, as in Equation 29.
• F1-Score: the F1-score (Amigó et al., 2011) is the harmonic mean of recall and precision, providing a single measure that balances both, as in Equation 30.
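A compact sketch of how these confusion-matrix-based measures are computed from binary predictions (NumPy-based; ROC_AUC, which is also reported in the experiments, typically requires predicted scores rather than hard labels and is therefore omitted here):

```python
import numpy as np

def detection_metrics(y_true, y_pred, fraud_label=1):
    """Accuracy, precision, recall, and F1-score derived from TP, TN, FP, and FN."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == fraud_label) & (y_true == fraud_label))
    tn = np.sum((y_pred != fraud_label) & (y_true != fraud_label))
    fp = np.sum((y_pred == fraud_label) & (y_true != fraud_label))
    fn = np.sum((y_pred != fraud_label) & (y_true == fraud_label))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1
```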
In the following subsections, we will thoroughly review and investigate the experimental outcomes, highlighting the significant results in bold.
4.4 Empirical outcomes of two ML models (Xgb-tree, and K-NN) and the suggested BKOA-GOBL
This section compares the results of the K-NN and Xgb-tree models with the proposed BKOA-GOBL. It focuses on assessing their effectiveness by examining average classification accuracy and the average number of features selected, enabling us to gauge the impact of the BKOA-GOBL method.
Table 4 illustrates the performance metrics for the proposed BKOA-GOBL alongside the primary K-NN, focusing on mean accuracy and the size of the selected features. As depicted in Table 4, the proposed BKOA-GOBL combined with K-NN has significantly enhanced classification accuracy across five benchmark datasets, achieving an increase of 11.52% in the Australian dataset, 44.76% in the European dataset, 3.56% in the Synthetic Financial Transaction Log dataset, 31.37% in the Real vs. Fake Job Postings Prediction dataset, and a slight improvement of 0.05% in the CIC-MalMem-2022 dataset. Furthermore, the BKOA-GOBL method has led to a reduction in the number of features selected from the benchmark datasets, with decrease rates of 58.57% in the Australian dataset, 62.77% in the European dataset, 80.00% in the CIC-MalMem-2022 dataset, 81.82% in the Synthetic Financial Transaction Log dataset, and 48.96% in the Real vs. Fake Job Postings Prediction dataset.
Table 4. Outcomes of the basic K-NN classifier and the suggested BKOA-GOBL concerning mean accuracy and size of picked features.
Additionally, Table 5 illustrates the performance metrics for the proposed BKOA-GOBL alongside the primary Xgb-tree, focusing on mean accuracy and the size of the selected features. As shown in Table 5, the suggested BKOA-GOBL combined with Xgb-tree has significantly improved accuracy across five benchmark datasets, achieving an increase of 5.94% in the Australian dataset, 2.70% in the European dataset, 4.08% in the Real vs. Fake Job Postings Prediction dataset, and a slight improvement of 0.0600% in the CIC-MalMem-2022 dataset and 0.0504% in the Synthetic Financial Transaction Log dataset. Furthermore, the BKOA-GOBL method has led to a reduction in the number of features selected from the benchmark datasets, with decrease rates of 55.50% in the Australian dataset, 53.33% in the European dataset, 83.45% in the CIC-MalMem-2022 dataset, 60.91% in the Synthetic Financial Transaction Log dataset, and 50.22% in the Real vs. Fake Job Postings Prediction dataset.
Table 5. Outcomes of the basic Xgb-tree classifier and the suggested BKOA-GOBL concerning mean accuracy and size of picked features.
Finally, the BKOA-GOBL method surpassed the basic ML models (K-NN and Xgb-tree) in terms of mean accuracy and the number of selected attributes across the five datasets, demonstrating its potential effectiveness for FS compared to these basic ML models.
4.5 Comparative evaluation of the suggested BKOA-GOBL under various resampling techniques
To further validate the robustness and adaptability of the proposed BKOA-GOBL framework, additional experiments were performed using several resampling techniques to handle the class imbalance challenge commonly observed in FD datasets. Specifically, the RUS technique adopted in the primary BKOA-GOBL framework was compared with two Ensemble-based resampling techniques, namely EasyEnsemble and BalancedBaggingClassifier. Each resampling method was applied at the preprocessing stage before FS and classification to ensure a fair comparative assessment across all experiments.
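A minimal sketch of this preprocessing comparison, assuming the imbalanced-learn package and a generic synthetic tabular dataset, is given below; the estimator settings and data are illustrative assumptions and do not reproduce the study's exact configuration.

```python
# Comparing RUS with two Ensemble-based resampling strategies at the
# preprocessing stage (imbalanced-learn); data and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import KNeighborsClassifier
from imblearn.under_sampling import RandomUnderSampler
from imblearn.ensemble import EasyEnsembleClassifier, BalancedBaggingClassifier

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.97, 0.03], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# 1) RUS: balance the training set, then fit any downstream classifier.
X_rus, y_rus = RandomUnderSampler(random_state=42).fit_resample(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_rus, y_rus)

# 2) Ensemble-based alternatives handle the imbalance internally.
easy = EasyEnsembleClassifier(random_state=42).fit(X_tr, y_tr)
bbag = BalancedBaggingClassifier(random_state=42).fit(X_tr, y_tr)

for name, model in [("RUS + K-NN", knn), ("EasyEnsemble", easy), ("BalancedBagging", bbag)]:
    print(name, roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```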
Table 6 presents the comparative performance metrics obtained across the examined datasets, including accuracy, fitness score, number of selected features, precision, recall, F1-score, and ROC_AUC. The results clearly show that RUS delivers the strongest and most consistent performance across the majority of the evaluation measures. When paired with the BKOA-GOBL optimization framework, RUS achieves the highest or near-highest accuracy and ROC_AUC values on most datasets, while also generating smaller feature subsets and lower fitness values, indicating more efficient and effective feature selection. These findings demonstrate that RUS provides a robust balance between predictive performance and computational efficiency. Although the Ensemble-based methods yield competitive results in specific cases, they do not consistently outperform the RUS configuration and typically introduce additional computational cost due to increased sample size or multiple resampling stages. Overall, the comparative analysis confirms that BKOA-GOBL coupled with RUS represents the most effective configuration for addressing class imbalance within the tested fraud detection datasets.
Table 6. Outcomes of the suggested BKOA-GOBL under different resampling techniques in terms of the average classification accuracy, fitness, selected features, precision, recall, F-score and ROC_AUC.
The comparative findings demonstrate that ensemble-based resampling techniques can enhance minority-class sensitivity and classification stability, yet they do so at the expense of increased computational and memory complexity. Conversely, the RUS-based implementation of BKOA-GOBL presents a strategic compromise, delivering reliable performance with minimal preprocessing overhead.
This efficiency allows the proposed BKOA-GOBL framework to maintain scalability, fast convergence, and real-time applicability while preserving balanced FD performance. Therefore, the choice of RUS in this study reflects a deliberate trade-off between generalization and computational economy, aligning with the overarching goal of developing a robust and deployable FD system for big data financial environments. Future research directions may investigate hybrid resampling schemes that combine RUS with adaptive ensemble techniques to further improve detection sensitivity without sacrificing runtime efficiency.
4.6 Experimental outcomes of the proposed BKOA-GOBL vs. various recent MHTs employing K-NN classifier
Table 7 presents an evaluation of the proposed BKOA-GOBL against several recent MHTs using the K-NN classifier on five benchmark datasets (Australian, European, Synthetic Financial Transaction Log, Real vs. Fake Job Postings Prediction, and CIC-MalMem-2022). The essential measures examined include classification accuracy, fitness, number of selected features, precision, recall, F1-score, and ROC_AUC.
Table 7. Outcomes of the suggested BKOA-GOBL and various MHTs using the K-NN classifier concerning the average classification accuracy, fitness, selected features, precision, recall, F-score, and ROC_AUC.
Table 7 presents the outcomes of the K-NN-based BKOA-GOBL and its peers in terms of classification accuracy across the five datasets (Australian, European, Synthetic Financial Transaction Log, Real vs. Fake Job Postings Prediction, and CIC-MalMem-2022). The performance of each algorithm is evaluated using the average accuracy and SD over multiple runs, shedding light on their reliability and effectiveness. The proposed BKOA-GOBL ranked first, achieving the highest average accuracy and smallest SD across all datasets, which reflects its stability and exceptional performance. For instance, on the Australian benchmark, BKOA-GOBL records a mean accuracy of 0.9051 with an SD of 0.0022, while on CIC-MalMem-2022 it achieves a near-perfect accuracy of 0.9996 with an SD of just 0.0002. BAVO follows closely, consistently ranked second in average accuracy across most datasets and providing competitive SD values, including an average accuracy of 0.9024 on the Australian dataset and 0.9460 on the European dataset.
In addition, the proposed BKOA-GOBL ranked first, achieving the highest average precision, recall, and F1-scores with the smallest SD across most benchmark datasets, which reflects its stability and exceptional performance. BAVO follows closely, consistently ranked second in average precision, recall, and F1-score across most datasets while providing competitive SD values. These measures emphasize that the proposed BKOA-GOBL delivers balanced performance and strong reliability: precision demonstrates its success in minimizing false positives, recall reflects its sensitivity to true positives, and the F1-score combines both, reflecting overall classification quality. The consistently low SD values indicate BKOA-GOBL's stability and effectiveness across numerous runs, resulting in minimal variability and reduced risk of performance decline.
Moreover, the proposed BKOA-GOBL obtains the smallest selected feature subsets on four of the five benchmark datasets, demonstrating its effectiveness in selecting the most relevant attributes while ensuring high classification accuracy. BKOA-GOBL achieves the smallest mean feature subset size on the Australian (5.80), European (11.17), Synthetic Financial Transaction Log (2.00), and CIC-MalMem-2022 (11.00) datasets, significantly reducing the number of chosen attributes compared to other MHTs. The ranking demonstrates BKOA-GOBL's superiority with three wins, one tie, and one loss, making it one of the most effective techniques for FS across the benchmark datasets. Finally, the proposed BKOA-GOBL ranked first, achieving the smallest fitness values and smallest SD across all benchmark datasets, which reflects its stability and exceptional performance.
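The fitness values discussed here follow the usual wrapper-FS trade-off between classification error and subset size. The sketch below uses a common weighted formulation of this kind, assumed here only for illustration; the exact weighting used in the study may differ.

```python
# Typical wrapper-FS fitness: weighted sum of classification error and
# relative subset size. The weighting alpha = 0.99 is a common choice and
# is assumed here for illustration only; smaller fitness is better.
def fitness(error_rate: float, n_selected: int, n_total: int, alpha: float = 0.99) -> float:
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

print(fitness(error_rate=0.05, n_selected=6, n_total=14))  # hypothetical error and subset size
```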
The ROC_AUC results further confirm the superiority and consistency of the proposed BKOA-GOBL among other competing MHTs across all benchmark datasets. It achieves the highest mean AUC with the lowest standard deviation in nearly all cases, demonstrating exceptional ability to distinguish fraudulent from legitimate instances under varying decision thresholds. For example, on the Synthetic Financial Transaction Log (0.9935) and CIC-MalMem-2022 (0.9998) datasets, BKOA-GOBL reaches near-perfect AUC values with extremely small variability, reflecting remarkable reliability and robustness. Even on the more challenging Real vs. Fake Job Postings dataset, it still secures the highest mean AUC while maintaining competitive SD values. Its low SD demonstrates strong reliability and minimal sensitivity to data variation. Competing algorithms such as BMOA and BAVO rank closely behind but consistently show higher variability, further reinforcing the strong stability and discriminative capability of the proposed approach. The ranking clearly confirms the superiority of BKOA-GOBL, achieving three wins, two ties, and no losses, positioning it as the most effective FS technique across all benchmark datasets.
4.7 Experimental outcomes of the proposed BKOA-GOBL vs. various recent MHTs employing Xgb-tree classifier
Table 8 presents an evaluation of the proposed BKOA-GOBL against several recent MHTs using the Xgb-tree classifier on the five benchmark datasets (Australian, European, Synthetic Financial Transaction Log, Real vs. Fake Job Postings Prediction, and CIC-MalMem-2022). The essential measures examined include classification accuracy, fitness, number of selected features, precision, recall, F1-score, and ROC_AUC.
Table 8. Outcomes of the suggested BKOA-GOBL and various MHTs using Xgb-tree classifier concerning the average classification accuracy, fitness, selected features, precision, recall, F-score and ROC_AUC.
Table 8 presents the outcomes of the Xgb-tree-based BKOA-GOBL and its peers concerning classification accuracy across the five datasets (Australian, European, Synthetic Financial Transaction Log, Real vs. Fake Job Postings Prediction, and CIC-MalMem-2022). The performance of each algorithm is evaluated using the average accuracy and SD over multiple runs, shedding light on their reliability and effectiveness. The proposed BKOA-GOBL with Xgb-tree ranked first, achieving the highest average accuracy and smallest SD across all datasets, which reflects its stability and exceptional performance. For instance, on the Australian benchmark, BKOA-GOBL records a mean accuracy of 0.9135 with an SD of 0.0061, while on CIC-MalMem-2022 it achieves a near-perfect accuracy of 0.9997 with an SD of just 0.0001. BAVO follows closely, consistently ranked second in average accuracy across most datasets and providing competitive SD values, including an average accuracy of 0.9582 on the Australian dataset and 0.9460 on the European dataset.
In addition, the proposed BKOA-GOBL with Xgb-tree ranked first, achieving the highest average precision, recall, and F1-scores with the smallest SD across most benchmark datasets, which reflects its stability and exceptional performance. BAVO follows closely, consistently ranked second in average precision, recall, and F1-score across most datasets while providing competitive SD values. These measures emphasize that the proposed BKOA-GOBL with Xgb-tree delivers balanced performance and strong reliability: precision demonstrates its success in minimizing false positives, recall reflects its sensitivity to true positives, and the F1-score combines both, reflecting overall classification quality. The consistently low SD values across numerous runs indicate that BKOA-GOBL exhibits stability and effectiveness, with minimal variability and reduced risk of performance decline.
Moreover, the proposed BKOA-GOBL with Xgb-tree obtains the smallest selected feature subsets on all benchmark datasets, establishing its effectiveness in selecting the most relevant attributes while ensuring high classification accuracy. BKOA-GOBL with Xgb-tree achieves the smallest mean feature subset size on the Australian (6.27), European (13.13), Synthetic Financial Transaction Log (4.00), Real vs. Fake Job Postings Prediction (709.8), and CIC-MalMem-2022 (9.10) datasets, significantly reducing the number of chosen attributes compared to other MHTs. The ranking demonstrates the superiority of BKOA-GOBL with Xgb-tree, with four wins, one tie, and zero losses, making it one of the most effective techniques for FS across all benchmark datasets. Finally, the proposed BKOA-GOBL with Xgb-tree ranked first, achieving the smallest fitness values and smallest SD across all benchmark datasets, which reflects its stability, exceptional performance, and ability to balance accuracy against the number of selected features.
Regarding the ROC_AUC results, the proposed BKOA-GOBL with Xgb-tree consistently achieves the highest ROC_AUC scores and the lowest SD across most benchmark datasets, reflecting its strong ability to distinguish fraudulent from legitimate instances under varying threshold settings. For example, it secures a mean AUC of 1.0000 with an SD of just 0.0001 on the CIC-MalMem-2022 dataset and records leading performance on the Synthetic Financial Transaction Log (0.9990) and European (0.9795) datasets as well. Competing algorithms rank noticeably lower, with fewer wins and higher variability, reinforcing the superior and stable discriminative power of BKOA-GOBL with the Xgb-tree classifier. The ranking clearly confirms the superiority of BKOA-GOBL, achieving three wins, two ties, and no losses, positioning it as the most effective FS technique across all benchmark datasets.
4.8 Convergence investigation
The convergence behavior of the proposed approaches (BKOA-GOBL with K-NN and BKOA-GOBL with Xgb-tree) is examined in this section for fraud and malware classification on the five datasets. The aim is to evaluate convergence performance, as shown in Figures 2, 3. These figures demonstrate that the suggested BKOA-GOBL with the K-NN and Xgb-tree classifiers converges rapidly to high-quality solutions on all datasets, outperforming the other MHTs under the same population size and number of iterations.
Figure 2. Convergence curves of the suggested BKOA-GOBL and its peers concerning the K-NN classifier via the datasets: (a) Australian, (b) European, (c) Synthetic Financial Transaction Log, (d) CIC-MalMem-2022, and (e) Real vs Fake Job Postings Prediction.
Figure 3. Convergence curves of the suggested BKOA-GOBL and its peers concerning the Xgb-tree classifier via the datasets: (a) Australian, (b) European, (c) Synthetic Financial Transaction Log, (d) CIC-MalMem-2022, and (e) Real vs Fake Job Postings Prediction.
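For readers who wish to produce convergence plots of this kind, the sketch below draws best-fitness-per-iteration traces with matplotlib; the fitness histories are synthetic placeholders, not the study's recorded values.

```python
# Plotting convergence curves (best fitness vs. iteration) for two optimizers;
# the histories below are synthetic placeholders for illustration only.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
iterations = np.arange(1, 101)
histories = {
    "BKOA-GOBL":    np.minimum.accumulate(0.30 * np.exp(-0.05 * iterations) + 0.02 * rng.random(100)),
    "Baseline MHT": np.minimum.accumulate(0.30 * np.exp(-0.02 * iterations) + 0.02 * rng.random(100)),
}
for name, curve in histories.items():
    plt.plot(iterations, curve, label=name)
plt.xlabel("Iteration"); plt.ylabel("Best fitness"); plt.legend(); plt.tight_layout()
plt.savefig("convergence_curves.png")
```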
4.9 Precision-recall analysis
The precision-recall curves provide a detailed view of the classification performance of the proposed BKOA-GOBL framework under varying discrimination thresholds. Unlike accuracy, which can be misleading on highly imbalanced datasets, precision-recall curves focus on two measures critical for FD: precision (the ability to avoid false alarms) and recall (the ability to detect true fraud). As shown in Figures 4, 5, the superiority of the suggested BKOA-GOBL with the K-NN and Xgb-tree classifiers is especially pronounced on most datasets, where most alternative approaches show sharp declines in precision as recall increases. The consistently smooth and high-positioned curves reinforce that the suggested BKOA-GOBL effectively avoids local optima and yields reliable features that distinguish fraudulent patterns even under difficult data conditions.
Figure 4. Precision-Recall curves of the suggested BKOA-GOBL and its peers concerning the K-NN classifier via the datasets: (a) Australian, (b) European, (c) Synthetic Financial Transaction Log, (d) CIC-MalMem-2022, and (e) Real vs Fake Job Postings Prediction.
Figure 5. Precision-Recall curves of the suggested BKOA-GOBL and its peers concerning the Xgb-tree classifier via the datasets: (a) Australian, (b) European, (c) Synthetic Financial Transaction Log, (d) CIC-MalMem-2022, and (e) Real vs Fake Job Postings Prediction.
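A minimal sketch of how such precision-recall curves can be generated with scikit-learn is shown below; the labels and fraud scores are hypothetical rather than the study's outputs.

```python
# Precision-recall curve for one classifier's fraud scores (illustrative data).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(1)
y_true  = rng.integers(0, 2, size=1000)                          # hypothetical labels
y_score = np.clip(y_true * 0.6 + rng.random(1000) * 0.5, 0, 1)   # hypothetical fraud scores

precision, recall, _ = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision, label=f"AP = {average_precision_score(y_true, y_score):.3f}")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.tight_layout()
plt.savefig("precision_recall_curve.png")
```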
4.10 Wilcoxon's rank-sum test
The Wilcoxon signed-rank test was used to perform a statistical analysis comparing the fitness values obtained by BKOA-GOBL and the other algorithms, as shown in Tables 9, 10 (Derrac et al., 2011). The aim was to determine whether there are significant differences between them.
Table 9. Wilcoxon's test for the average classification error of the proposed BKOA-GOBL and its peers concerning K-NN.
Table 10. Wilcoxon's test for the average classification error of the proposed BKOA-GOBL and its peers concerning Xgb-tree.
The Wilcoxon signed-rank test is a non-parametric method used in hypothesis testing to compare two related samples. It involves calculating the differences between paired results for a set of problems and ranking these differences by their absolute values. The process then computes the totals of ranks for positive differences (R+) and negative differences (R−), identifying the smaller of the two. The significance of the test is determined using a p-value; if it is below 0.05, it indicates that the differences between the two approaches are statistically significant, suggesting strong evidence against the null hypothesis.
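The test procedure described above can be reproduced with SciPy as sketched below; the paired error values are hypothetical placeholders rather than the study's recorded results.

```python
# Wilcoxon signed-rank test on paired classification-error values
# from two methods over the same benchmark runs (hypothetical numbers).
from scipy.stats import wilcoxon

errors_bkoa_gobl  = [0.041, 0.052, 0.004, 0.210, 0.095, 0.033, 0.120, 0.061, 0.008, 0.145]
errors_competitor = [0.070, 0.083, 0.009, 0.260, 0.130, 0.055, 0.152, 0.094, 0.015, 0.190]

stat, p_value = wilcoxon(errors_bkoa_gobl, errors_competitor)
print(f"W = {stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
```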
The analysis of the results in Tables 9, 10 demonstrates that the BKOA-GOBL method significantly outperforms the other methods when implemented with either the K-NN or Xgb-tree classifier across all test scenarios. The p-values in the tables are consistently below the 0.05 threshold, indicating that the improvements provided by BKOA-GOBL are statistically significant rather than coincidental. These results confirm the superior performance of the BKOA-GOBL method compared to the other alternatives.
In summary, the results of the Wilcoxon test showcase the strong performance of the BKOA-GOBL algorithm, reaffirming its statistical superiority. The consistent rejection of the null hypothesis indicates that the enhancements made by BKOA-GOBL are significant and worthwhile.
4.11 Real-time integration feasibility
The BKOA-GOBL framework shows real promise for real-time FD, particularly in environments where decisions must be made within milliseconds. Reducing the number of features accelerates the entire scoring pipeline: less data means faster scoring and fewer opportunities for system slowdowns, which matters when fraud must be caught before a transaction is processed. These improvements make the framework a strong candidate for integration into live monitoring systems, where speed and accuracy are non-negotiable.
Reducing the feature set does not only help with speed; it also reduces memory usage, which is a significant advantage when deploying models in production. Whether the model runs on a cloud server or on a small edge device such as a payment terminal, leaner models are easier to manage. BKOA-GOBL is also flexible: for smaller workloads it runs smoothly on standard CPUs, while for massive volumes, such as those in an extensive financial network, it can scale up with GPUs or distributed systems to maintain speed and responsiveness.
Another strength of BKOA-GOBL is its ability to work in both batch and streaming setups. Batch processing is ideal for periodically retraining and updating the model, while streaming enables real-time decisions as transactions occur. Although this study did not simulate live data streams directly, the observed performance gains suggest that the system is well-equipped for such an environment. Testing it on actual transaction flows is a logical next step and would show how well it holds up under operational load. A minimal scoring sketch is given below.
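The sketch scores a single incoming transaction using only the reduced feature subset, illustrating the low-latency argument above. The feature mask, stand-in classifier, and data are illustrative assumptions, not the study's actual artifacts.

```python
# Low-latency scoring with a reduced feature subset: only the columns kept by
# the FS stage are extracted before scoring. Mask, model, and data are
# illustrative assumptions.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # stand-in for the trained classifier

n_total_features = 30
selected_mask = np.zeros(n_total_features, dtype=bool)
selected_mask[[0, 3, 7, 12, 21]] = True              # hypothetical FS result (5 of 30 features)

rng = np.random.default_rng(0)
X_train = rng.random((2000, n_total_features))
y_train = rng.integers(0, 2, size=2000)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train[:, selected_mask], y_train)

incoming_transaction = rng.random(n_total_features)   # one live transaction's raw feature vector
start = time.perf_counter()
fraud_probability = model.predict_proba(incoming_transaction[selected_mask].reshape(1, -1))[0, 1]
latency_ms = (time.perf_counter() - start) * 1000
print(f"fraud probability = {fraud_probability:.3f}, scoring latency = {latency_ms:.2f} ms")
```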
5 Practical deployment considerations
Although the proposed BKOA-GOBL has demonstrated significant efficacy in benchmark datasets, transitioning from experimental validation to deployment in real-world financial systems necessitates addressing multiple operational and systemic challenges. This section outlines the key considerations for effective integration.
5.1 Integration into financial infrastructure
Implementing the proposed BKOA-GOBL system within current financial ecosystems requires seamless integration with existing legacy systems, the establishment of secure data transmission channels, and adherence to prevailing regulatory frameworks. The principal integration challenges include:
• Achieving compatibility among the diverse data formats and sources used across different institutions.
• Tuning the algorithm for low-latency environments where FD must occur instantly.
• Maintaining data confidentiality and adhering to regulatory and security standards.
5.2 Scalability and computational efficiency
Financial institutions process vast volumes of transactions daily, so BKOA-GOBL must scale efficiently. Key considerations include:
• Optimizing parallel processing and memory usage for large-scale deployment.
• Leveraging distributed architectures to support scalable and resilient operations.
• Automating updates to maintain performance as fraud patterns evolve.
5.3 Interpretability and regulatory compliance
In FD, interpretability is crucial for establishing trust, ensuring auditability, and maintaining legal accountability. To meet these needs:
• Providing clear insights into which features influenced detection decisions.
• Integrating post-hoc interpretability methods such as Explainable AI to visualize decision boundaries and model behavior.
• Allowing analysts to validate and override automated decisions when necessary.
5.4 Operational monitoring and maintenance
Long-term success of BKOA-GOBL depends on robust operational support:
• Monitoring for changes in data distribution that may degrade model performance.
• Prioritizing alerts to reduce false positives and analyst fatigue.
• Incorporating user feedback to refine model accuracy and relevance.
6 Conclusion and future directions
This study introduced BKOA-GOBL, a robust and adaptive FS-based FD methodology with improved convergence behavior. The method effectively balances exploration and exploitation through planetary motion-inspired dynamics and addresses class imbalance using RUS. Two classifiers, K-NN and Xgb-tree, were employed to assess the classification accuracy of the selected feature subsets. Comprehensive experiments across five diverse, real-world datasets demonstrated that BKOA-GOBL consistently outperforms traditional classifiers and twelve state-of-the-art MHAs on several performance indicators, including accuracy, feature reduction, and fitness. Specifically, the proposed methodology achieved classification accuracies of up to 99.96% and feature reduction rates of up to 81.82%, while maintaining high precision, recall, and F1-scores (exceeding 0.95 on most datasets). BKOA-GOBL exhibited superior exploration and exploitation compared to its counterparts, and the statistical significance of its superiority was confirmed using Wilcoxon's rank-sum test at a 5% significance level. These results affirm the proposed model's adaptability, efficiency, and robustness, making it a promising tool for real-world FD applications in high-dimensional and imbalanced data environments. The proposed BKOA-GOBL, while effective, has several limitations: RUS helps balance the datasets but may discard valuable information and reduce classification accuracy; the integration of BKOA and mutation strategies enhances FS efficiency but adds computational complexity relative to simpler models; its success depends heavily on optimal parameter tuning, requiring extra effort in hyperparameter optimization; and although validated on five benchmark datasets, its applicability to real-time, large-scale transaction data across diverse regions and industries remains to be investigated.
Looking ahead, future research can focus on enhancing the capabilities of BKOA-GOBL through hybridization with other swarm-based or evolutionary algorithms to improve its global search ability and convergence behavior. Another direction is adapting BKOA-GOBL to real-time FD systems in streaming data environments, where latency and adaptability are critical. Incorporating online learning mechanisms into the framework would allow it to update dynamically as new transaction patterns emerge, enhancing its responsiveness to evolving fraud tactics. Integrating BKOA-GOBL with advanced classification techniques, such as DL and neural networks, may yield further improvements. Furthermore, exploring multi-objective extensions of BKOA-GOBL could allow simultaneous optimization of multiple conflicting goals, such as maximizing accuracy while minimizing computational cost or energy consumption. These directions offer valuable opportunities to evolve BKOA-GOBL into a more powerful and versatile optimization framework.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.
Author contributions
RE: Writing – review & editing, Resources, Funding acquisition, Writing – original draft, Visualization. AAE-M: Methodology, Supervision, Data curation, Writing – review & editing, Writing – original draft. MG: Project administration, Writing – original draft, Validation, Writing – review & editing, Formal analysis. AA: Conceptualization, Investigation, Software, Writing – original draft, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was funded by the Prince Sattam bin Abdulaziz University through the project number (PSAU/2024/01/31333).
Conflict of interest
The author(s) declared that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abd El-Mageed, A. A., Abohany, A. A., and Elashry, A. (2023). Effective feature selection strategy for supervised classification based on an improved binary Aquila optimization algorithm. Comput. Ind. Eng. 181:109300. doi: 10.1016/j.cie.2023.109300
Abdel-Basset, M., Mohamed, R., Alrashdi, I., Sallam, K. M., and Hameed, I. A. (2024). CNN-IKOA: convolutional neural network with improved Kepler optimization algorithm for image segmentation: experimental validation and numerical exploration. J. Big Data 11:13. doi: 10.1186/s40537-023-00858-6
Abdel-Basset, M., Mohamed, R., Azeem, S. A. A., Jameel, M., and Abouhawwash, M. (2023a). Kepler optimization algorithm: a new metaheuristic algorithm inspired by Kepler's laws of planetary motion. Knowl. Based Syst. 268:110454. doi: 10.1016/j.knosys.2023.110454
Abdel-Basset, M., Mohamed, R., Hezam, I. M., Sallam, K. M., Alshamrani, A. M., Hameed, I. A., et al. (2023b). A novel binary Kepler optimization algorithm for 0-1 knapsack problems: methods and applications. Alex. Eng. J. 82, 358–376. doi: 10.1016/j.aej.2023.09.072
Adil, M., Almaiah, M. A., Omar Alsayed, A., and Almomani, O. (2020). An anonymous channel categorization scheme of edge nodes to detect jamming attacks in wireless sensor networks. Sensors 20:2311. doi: 10.3390/s20082311
Alamri, M., and Ykhlef, M. (2022). Survey of credit card anomaly and fraud detection using sampling techniques. Electronics 11:4003. doi: 10.3390/electronics11234003
Alashjaee, A. M. (2023). An efficient approach based on remora optimization algorithm and levy flight for intrusion detection. Intell. Autom. Soft Comput. 37, 235–254. doi: 10.32604/iasc.2023.036247
Almaiah, A., and Almomani, O. (2020). An investigator digital forensics frequencies particle swarm optimization for dectection and classification of apt attack in fog computing enviroment (IDF-FPSO). J. Theor. Appl. Inf. Technol. 15:98.
Altalhan, M., Algarni, A., and Alouane, M. T.-H. (2025). Imbalanced data problem in machine learning: a review. IEEE Access 13, 13686–13699. doi: 10.1109/ACCESS.2025.3531662
Amigó, E., Gonzalo, J., Artiles, J., and Verdejo, F. (2009). A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. Boston. 12, 461–486. doi: 10.1007/s10791-008-9066-8
Amigó, E., Gonzalo, J., Artiles, J., and Verdejo, F. (2011). Combining evaluation metrics via the unanimous improvement ratio and its application to clustering tasks. J. Artif. Intell. Res. 42, 689–718. doi: 10.1613/jair.3401
Asha, R., and Kumar, K. R. S. (2021). Credit card fraud detection using artificial neural network. Glob. Tran. Proc. 2, 35–41. doi: 10.1016/j.gltp.2021.01.006
Canadian Institute for Cybersecurity (CIC) University of New Brunswick (UNB). (2022). CIC-MalMem-2022 Dataset. Available online at: https://www.unb.ca/cic/datasets/malmem-2022.html (Accessed January 10, 2025).
Chiba, Z., Abghour, N., Moussaid, K., El Omri, A., and Rida, M. (2019). “An efficient network ids for cloud environments based on a combination of deep learning and an optimized self-adaptive heuristic search algorithm,” in Networked Systems: 7th International Conference, NETYS 2019, Marrakech, Morocco, June 19-21, 2019, Revised Selected Papers 7 (Cham: Springer), 235–249. doi: 10.1007/978-3-030-31277-0_15
De Medeiros, A. K. A., Guzzo, A., Greco, G., Van Der Aalst, W. M., Weijters, A., Van Dongen, B. F., et al. (2007). “Process mining based on clustering: a quest for precision,” in International Conference on Business Process Management, eds. A. ter Hofstede, B. Benatallah, and H. Y. Paik (Cham: Springer), 17–29. doi: 10.1007/978-3-540-78238-4_4
Derrac, J., García, S., Molina, D., and Herrera, F. (2011). A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 1, 3–18. doi: 10.1016/j.swevo.2011.02.002
El-Mageed, A. A. A., Abohany, A. A., and Hosny, K. M. (2025). Enhanced binary kepler optimization algorithm for effective feature selection of supervised learning classification. J. Big Data 12:93. doi: 10.1186/s40537-025-01125-6
El-Mageed, A. A. A., Elkhouli, A. E., Abohany, A. A., and Gafar, M. (2024). Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data. J. Big Data 11:46. doi: 10.1186/s40537-024-00902-z
Elsoud, E. A., Hassan, M., Alidmat, O., Al Henawi, E., Alshdaifat, N., Igtait, M., et al. (2024). Under sampling techniques for handling unbalanced data with various imbalance rates: a comparative study. Int. J. Adv. Comput. Sci. Appl. 15, 1274–1284. doi: 10.14569/IJACSA.2024.01508124
Fanai, H., and Abbasimehr, H. (2023). A novel combined approach based on deep autoencoder and deep classifiers for credit card fraud detection. Expert Syst. Appl. 217:119562. doi: 10.1016/j.eswa.2023.119562
Firdaus, A., Anuar, N. B., Karim, A., and Razak, M. F. A. (2018). Discovering optimal features using static analysis and a genetic search based method for android malware detection. Front. Inf. Technol. Electron. Eng. 19, 712–736. doi: 10.1631/FITEE.1601491
Ghaleb, S. A., Mohamad, M., Fadzli, S. A., and Ghanem, W. A. H. (2021). Training neural networks by enhance grasshopper optimization algorithm for spam detection system. IEEE Access 9, 116768–116813. doi: 10.1109/ACCESS.2021.3105914
Houssein, E. H., Abdalkarim, N., Samee, N. A., Alabdulhafith, M., and Mohamed, E. (2024). Improved kepler optimization algorithm for enhanced feature selection in liver disease classification. Knowl. Based Syst. 297:111960. doi: 10.1016/j.knosys.2024.111960
Hu, G., Gong, C., Li, X., and Xu, Z. (2024). CGKOA: an enhanced Kepler optimization algorithm for multi-domain optimization problems. Comput. Methods Appl. Mech. Eng. 425:116964. doi: 10.1016/j.cma.2024.116964
Hussien, R. M., Abohany, A. A., Abd El-Mageed, A. A., and Hosny, K. M. (2024). Improved binary meerkat optimization algorithm for efficient feature selection of supervised learning classification. Knowl. Based Syst. 292:111616. doi: 10.1016/j.knosys.2024.111616
Kaggle (2013). European Credit Card Transactions Dataset. Kaggle. Available online at: https://www.kaggle.com/mlg-ulb/creditcardfraud (Accessed January 6, 2025).
Kaggle (2016). Synthetic Financial Transaction Log for Fraud Detection. Kaggle. Available online at: https://www.kaggle.com/ealaxi/paysim1 (Accessed January 6, 2025).
Kaggle (2020). Real vs Fake Job Postings Prediction Dataset. Kaggle. Available online at: https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction (Accessed January 10, 2025).
Kale, M. R., Anantha, N. L., Rao, V. S., Godla, S. R., Thenmozhi, E., et al. (2024). Enhancing cryptojacking detection through hybrid black widow optimization and generative adversarial networks. Int. J. Adv. Comput. Sci. Appl. 15, 871–884. doi: 10.14569/IJACSA.2024.0150387
Kaplan, F., and Babalik, A. (2025). Performance analysis of cloud computing task scheduling using metaheuristic algorithms in ddos and normal environments. Electronics 14:1988. doi: 10.3390/electronics14101988
Mahdavi, S., Rahnamayan, S., and Deb, K. (2018). Opposition based learning: a literature review. Swarm Evol. Comput. 39, 1–23. doi: 10.1016/j.swevo.2017.09.010
Mniai, A., Tarik, M., and Jebari, K. (2023). A novel framework for credit card fraud detection. IEEE Access 11, 112776–112786. doi: 10.1109/ACCESS.2023.3323842
Mohamed, R., Abdel-Basset, M., Sallam, K. M., Hezam, I. M., Alshamrani, A. M., Hameed, I. A., et al. (2024). Novel hybrid kepler optimization algorithm for parameter estimation of photovoltaic modules. Sci. Rep. 14:3453. doi: 10.1038/s41598-024-52416-6
Mosa, D. T., Sorour, S. E., Abohany, A. A., and Maghraby, F. A. (2024). CCFD: efficient credit card fraud detection using meta-heuristic techniques and machine learning algorithms. Mathematics 12:2250. doi: 10.3390/math12142250
Nguyen, Q. T., Tran, M. P., Prabhakaran, V., Liu, A., and Nguyen, G. H. (2024). Compact machine learning model for the accurate prediction of first 24-hour survival of mechanically ventilated patients. Front. Med. 11:1398565. doi: 10.3389/fmed.2024.1398565
Prabhakaran, N., and Nedunchelian, R. (2023). Oppositional cat swarm optimization-based feature selection approach for credit card fraud detection. Comput. Intell. Neurosci. 2023:2693022. doi: 10.1155/2023/2693022
Rajadurai, H., and Gandhi, U. D. (2022). A stacked ensemble learning model for intrusion detection in wireless network. Neural Comput. Appl. 34, 15387–15395. doi: 10.1007/s00521-020-04986-5
Ramesh, S. N., Al Fardan, B. M. M., Anupama, C., Kumar, K. V., Cho, S., Acharya, S., et al. (2025). Leveraging cyberattack news tweets for advanced threat detection and classification using ensemble of deep learning models with wolverine optimization algorithm. IEEE Access 13, 48343–48358. doi: 10.1109/ACCESS.2025.3550378
Rodrigues, V. F., Policarpo, L. M., da Silveira, D. E., da Rosa Righi, R., da Costa, C. A., Barbosa, J. L. V., et al. (2022). Fraud detection and prevention in e-commerce: a systematic literature review. Electron. Commer. Res. Appl. 56:101207. doi: 10.1016/j.elerap.2022.101207
Russell, J. L. (1964). Kepler's laws of planetary motion: 1609-1666. Br. J. Hist. Sci. 2, 1–24. doi: 10.1017/S0007087400001813
Singh, A., Jain, A., and Biable, S. E. (2022). Financial fraud detection approach based on firefly optimization algorithm and support vector machine. Appl. Comput. Intell. Soft Comput. 2022:1468015. doi: 10.1155/2022/1468015
Singh, A. P., Kumar, S., Kumar, A., and Usama, M. (2022). “Machine learning based intrusion detection system for minority attacks classification,” in 2022 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES) (Greater Noida: IEEE), 256–261. doi: 10.1109/CISES54857.2022.9844381
Sorour, S. E., AlBarrak, K. M., Abohany, A. A., and El-Mageed, A. A. A. (2024). Credit card fraud detection using the brown bear optimization algorithm. Alex. Eng. J. 104, 171–192. doi: 10.1016/j.aej.2024.06.040
Stephenson, B. (2012). Kepler's Physical Astronomy, Volume 13. Cham: Springer Science & Business Media.
Tarwireyi, P., Terzoli, A., and Adigun, M. O. (2024). Meta-sonifieddroid: metaheuristics for optimizing sonified android malware detection. IEEE Access 12, 134779–134808. doi: 10.1109/ACCESS.2024.3415355
Tizhoosh, H. R. (2005). “Opposition-based learning: a new scheme for machine intelligence,” in International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06), Volume 1 (Vienna: IEEE), 695–701. doi: 10.1109/CIMCA.2005.1631345
Toğaçar, M., and Ergen, B. (2024). Processing 2d barcode data with metaheuristic based cnn models and detection of malicious pdf files. Appl. Soft Comput. 161:111722. doi: 10.1016/j.asoc.2024.111722
Wahid, A., Msahli, M., Bifet, A., and Memmi, G. (2023). NFA: a neural factorization autoencoder based online telephony fraud detection. Digit. Commun. Netw. 10, 158–167. doi: 10.1016/j.dcan.2023.03.002
Yap, B. W., Rani, K. A., Rahman, H. A. A., Fong, S., Khairudin, Z., Abdullah, N. N., et al. (2014). “An application of oversampling, undersampling, bagging and boosting in handling imbalanced datasets,” in Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013) (Cham: Springer), 13–22. doi: 10.1007/978-981-4585-18-7_2
Keywords: feature selection, fraud detection, ghost opposition-based learning (GOBL), Kepler optimization algorithm, machine learning, metaheuristic algorithms
Citation: Egami RH, Abd El-Mageed AA, Gafar M and Abohany AA (2026) Tackling fraud detection with an enhanced Kepler optimization and ghost opposition-based learning. Front. Artif. Intell. 8:1710387. doi: 10.3389/frai.2025.1710387
Received: 24 September 2025; Revised: 26 November 2025;
Accepted: 04 December 2025; Published: 09 January 2026.
Edited by:
Ehi Eric Esoimeme, James Hope University, Nigeria
Reviewed by:
Shadab Alam, Jazan University, Saudi Arabia
Wence Nwoga, James Hope University, Nigeria
Zakhele Hlophe, South African Government, Department of Health, South Africa
Sreekanth Rallapalli, Nitte Meenakshi Institute of Technology, India
Copyright © 2026 Egami, Abd El-Mageed, Gafar and Abohany. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Mona Gafar, m.gafar@psau.edu.sa
Amr A. Abohany