
ORIGINAL RESEARCH article

Front. Big Data

Sec. Data Mining and Management

This article is part of the Research Topic Machine Learning for Large-Scale Data Processing: Algorithms and Applications

A Genetic Algorithm-Based Framework for Online Sparse Feature Selection in Data Streams

Provisionally accepted
Guanyu Liu1,2, Jinhang Liu1, Guifan He1, Yifan Liu1, Huabo Bai1, Zhou Min1*
  • 1Southwest University, Chongqing, China
  • 2PetroChina Qinghai Oilfield Company, Qinghai, China

The final, formatted version of the article will be published soon.

Applications that process high-dimensional streaming data commonly rely on online streaming feature selection (OSFS) techniques. In practice, however, data are often incomplete because of equipment failures and technical constraints, which poses a significant challenge. Online Sparse Streaming Feature Selection (OS2FS) addresses this issue by imputing missing data via latent factor analysis. Nevertheless, existing OS2FS approaches have considerable limitations in how they evaluate features, which degrades their performance. To address these shortcomings, this paper introduces GA-OS2FS, a novel genetic algorithm-based method for online sparse streaming feature selection in data streams that integrates two key innovations: 1) imputation of missing values using a latent factor analysis model, and 2) application of a genetic algorithm to assess feature importance. Comprehensive experiments on six real-world datasets show that GA-OS2FS outperforms state-of-the-art OSFS and OS2FS methods, consistently achieving higher accuracy by selecting optimal feature subsets.
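The abstract names two stages: latent factor analysis (LFA) to fill missing values, then a genetic algorithm (GA) to score candidate feature subsets. The following is a minimal pure-Python sketch of that two-stage idea under stated assumptions, not the authors' GA-OS2FS. It assumes LFA is a low-rank factorization fit by SGD on observed entries only, and that GA fitness is leave-one-out 1-nearest-neighbour accuracy minus a small subset-size penalty. It also works on one batch rather than an arriving stream. All function names, hyperparameters, and the toy data are hypothetical.

```python
import random


def lfa_impute(X, rank=2, lr=0.01, reg=0.05, epochs=300, seed=0):
    """Fill None entries of X (list of equal-length rows) with a latent-factor estimate.

    Hypothetical sketch: row factors P and column factors Q are fit by SGD on
    observed entries only; each missing X[i][j] becomes the dot product P[i] . Q[j].
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m) if X[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = X[i][j] - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)  # L2-regularized gradient step
                Q[j][k] += lr * (err * p - reg * q)
    # Observed values are kept as-is; only missing cells are replaced.
    return [[X[i][j] if X[i][j] is not None
             else sum(P[i][k] * Q[j][k] for k in range(rank))
             for j in range(m)] for i in range(n)]


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-nearest-neighbour accuracy using only features with mask[j] == 1."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for t in range(len(X)):
            if t != i:
                d = sum((X[i][j] - X[t][j]) ** 2 for j in feats)
                if d < best_d:
                    best_d, best_label = d, y[t]
        correct += best_label == y[i]
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, alpha=0.01, seed=0):
    """Evolve binary feature masks; fitness = LOO 1-NN accuracy - alpha * fraction kept."""
    rng = random.Random(seed)
    m = len(X[0])
    cache = {}  # memoize fitness per mask, since masks repeat across generations

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_1nn_accuracy(X, y, mask) - alpha * sum(mask) / m
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)[:]]  # elitism: carry the best mask forward
        while len(nxt) < pop_size:
            # Tournament selection (size 3) picks each parent.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, m) if m > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


# Toy data: feature 0 separates the two classes; features 1 and 2 do not.
y = [i % 2 for i in range(20)]
X = [[y[i] + 0.01 * i, (i // 2) * 0.1, ((i + 1) // 2) * 0.1] for i in range(20)]
X[3][1] = None   # simulate missing entries in the uninformative features
X[10][2] = None
X_full = lfa_impute(X)
mask = ga_select(X_full, y)
```

The size penalty `alpha` is one simple way to prefer smaller subsets when accuracies tie. On this toy data the informative feature 0 is expected to be selected. A real OS2FS setting would also have to handle features arriving over time, which this batch sketch does not attempt.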

Keywords: Feature Selection, Genetic Algorithm, Latent Factor Analysis, Missing Data, Online Learning

Received: 07 Jan 2026; Accepted: 20 Jan 2026.

Copyright: © 2026 Liu, Liu, He, Liu, Bai and Min. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Zhou Min

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.