Finding the Needle in the Haystack - An Interpretable Sequential Pattern Mining Method for Classification Problems

Grote, Alexander; Hariharan, Anuja; Weinhardt, Christof

doi:10.3389/fdata.2025.1604887

ORIGINAL RESEARCH article

Front. Big Data

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/fdata.2025.1604887

Provisionally accepted

Alexander Grote*

Anuja Hariharan

Christof Weinhardt

Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

The final, formatted version of the article will be published soon.

The analysis of discrete sequential data, such as event logs and customer clickstream data, is hindered by the vast number of possible sequential patterns, which makes it difficult to identify meaningful sequences and extract valuable insights. To address this challenge, we propose a novel feature selection algorithm that combines unsupervised sequential pattern mining with supervised machine learning to identify meaningful sequences and correlate them with classification objectives. Unlike existing approaches in interpretable machine learning, our algorithm determines important sequential patterns during the mining process, eliminating the need for subsequent classification to estimate their importance. Moreover, compared to existing interestingness measures utilised in significant pattern mining, we provide an inherently interpretable, local, class-specific interestingness measure, r1. We evaluate our algorithm on three diverse datasets covering churn prediction, malware sequence analysis, and synthetic data, which vary in size, application area, and feature complexity. Our results show that our algorithm achieves classification performance comparable to existing feature selection algorithms while maintaining interpretability and requiring fewer computational resources. Our work provides a practical and efficient approach for practitioners to uncover important sequences in classification problems, offering an alternative to existing interpretable machine learning solutions and paving the way for new research opportunities in sequential data analysis.

Keywords: sequential pattern mining, feature selection, sequence classification, interpretable machine learning, categorical time series

Received: 02 Apr 2025; Accepted: 11 Sep 2025.

Copyright: © 2025 Grote, Hariharan and Weinhardt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Alexander Grote, alexander.grote@kit.edu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.