
ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 21 August 2025

Sec. Mathematics of Computation and Data Science

Volume 11 - 2025 | https://doi.org/10.3389/fams.2025.1632218

CUR matrix approximation through convex optimization for feature selection

  • 1Department of Mathematics, University of Maryland, College Park, MD, United States
  • 2Research Computing, University of Virginia, Charlottesville, VA, United States

The singular value decomposition (SVD) is commonly used in applications that require a low-rank matrix approximation. However, the singular vectors cannot be interpreted in terms of the original data. For applications requiring this type of interpretation, e.g., selection of important data matrix columns or rows, the approximate CUR matrix factorization can be used. Work on the CUR matrix approximation has generally focused on algorithm development, theoretical guarantees, and applications. In this study, we present a novel deterministic CUR formulation and algorithm with theoretical convergence guarantees. The algorithm utilizes convex optimization, finds important columns and rows separately, and allows the user to control the number of important columns and rows selected from the original data matrix. We present numerical results and demonstrate the effectiveness of our CUR algorithm as a feature selection method on gene expression data. These results are compared to those using the SVD and other CUR algorithms as the feature selection method. Finally, we present a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM), and compare the performance of multiple CUR algorithms in this application.

1 Introduction

Low-rank matrix approximations are common tools in many applications, including principal component analysis (PCA), signal denoising, and least squares. While the truncated singular value decomposition (SVD) is the optimal approximation in terms of matrix reconstruction (Eckart-Young theorem), the singular vectors cannot be interpreted in terms of the original data. Mahoney and Drineas [1] provided an example of this: [(1/2)age − (1/2)height + (1/2)income] is an eigenvector for a dataset of features about people that “is not particularly informative or meaningful.” However, the approximate CUR matrix factorization can be interpreted in terms of the original data, making it an attractive low-rank approximation option, especially for applications that seek important matrix columns or rows. Several of these applications exist [2], e.g., selecting important genes from gene expression data to cluster patients by cancer type [1], and more broadly can be considered feature selection applications.

The approximate CUR factorization of X ∈ ℝm × n is generally computed in three steps, but steps (1) and (2) can also be computed simultaneously: (1) select c ∈ ℕ columns of X and let C ∈ ℝm × c contain these columns, (2) select r ∈ ℕ rows of X and let R ∈ ℝr × n contain these rows, and (3) compute U ∈ ℝc × r, so that CUR is a good approximation to X. The result is a matrix approximation

X_{m \times n} \approx C_{m \times c}\, U_{c \times r}\, R_{r \times n},

where generally c ≪ n and r ≪ m. Hence, CUR maintains the structure of the data, for example, sparsity or non-negativity, and C and R can be viewed as containing the most important columns and rows of the original data, respectively. CUR matrix approximation has been successfully used for feature selection in applications such as document clustering [1], gene expression data clustering [1, 2], image classification [3], and sensor selection and channel assignment [4]. CUR has also been used for simultaneous feature selection and sample selection for active learning [5].
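As a brief illustration of steps (1)-(3), the MATLAB sketch below assembles a CUR approximation once index sets IC and IR have been chosen by some selection rule; the variable names are illustrative, and U = C^+XR^+ is the common choice discussed in Section 2.

% Minimal sketch: assemble C, U, R from chosen column/row index sets IC, IR.
% X, IC, and IR are assumed to exist in the workspace.
C = X(:, IC);                               % m x c, selected columns of X
R = X(IR, :);                               % r x n, selected rows of X
U = pinv(C) * X * pinv(R);                  % U = C^+ X R^+
relerr = norm(X - C*U*R, 'fro') / norm(X, 'fro');   % relative reconstruction error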

Hamm and Huang [6] provided a history of CUR and mentioned that recent work on the CUR approximation most likely began with developments in the mid-to-late 1990s by Goreinov et al., e.g., [7]. Since then, several CUR algorithms have been developed; some are randomized, e.g., [1, 8, 9], and others are deterministic, e.g., [2, 10]. Work on CUR includes proving accuracy and/or other theoretical guarantees for algorithms, e.g., [1, 8], and also performance of CUR algorithms in practice without theoretical guarantees, e.g., [11]. In this study, we are particularly interested in deterministic CUR algorithms that can independently select columns of X without simultaneous selection of its rows due to the fact that (1) several applications exist that seek important matrix columns or rows and not a full matrix factorization [2], and (2) for a practical application, a randomized CUR will likely produce a different set of important columns and/or rows in each run of the algorithm, which may not be desirable to the scientist [11, 12].

One deterministic approach to computing a CUR approximation is to select columns and rows of X for inclusion in C and R using convex optimization with regularization, e.g., [11, 12]. In this study, we propose a novel CUR algorithm utilizing convex optimization with contributions in the formulation, implementation, and application of CUR. The main contributions of the study include (1) a novel convex optimization formulation for CUR, (2) an algorithm utilizing convex optimization that solves for C and R separately and allows the user to select c and r, and (3) an implementation utilizing the “surrogate functional” technique of Daubechies et al. [13], which we adapt for use with a new penalty function. We also note that our CUR algorithm and implementation can accommodate a variety of penalty functions, allowing the user a flexible framework. We provide numerical results that compare our CUR algorithm with the SVD and other deterministic CUR algorithms that select C and R separately, allowing the user to select c and r. Specifically, we show that our CUR algorithm performs very well as a feature selection method in an extension of an experiment by Sorenson and Embree [2] on gene expression data, in which important genes are selected to cluster patients into two classes - those with and without a lung tumor.

Another main contribution of the study is a novel application of CUR for feature selection. We adapt the clustering analysis of Higuera et al. [14] in which Self-Organizing Maps (SOMs)1 and the Wilcoxon rank-sum test were used to determine proteins that critically affect learning in wild-type and trisomic (Down syndrome) mice. Specifically, we use CUR as the feature selection method instead of the Wilcoxon rank-sum test. We show that CUR can be used effectively in this application and compare the performance of our CUR algorithm to that of other deterministic CUR algorithms that select C and R separately and allow the user to select c and r. This is not only a novel application of CUR, but to the best of our knowledge, also the first use of CUR on protein expression data.

The remainder of this article is organized as follows: we present related work in Section 2, our novel CUR algorithm utilizing convex optimization in Section 3, the theoretical foundations of the algorithm in Section 4, numerical experiments in Section 5, a novel application of CUR as a feature selection method in protein expression discriminant analysis in Section 6, and a conclusion in Section 7. Throughout this study, we use MATLAB notation to denote rows and columns of matrices, e.g., row i of X is denoted X(i, :) and column j of X is denoted X(:, j). In addition, the set {1, 2, ..., n} is denoted [n].

2 Related work

As mentioned in Section 1, work on the CUR matrix approximation has generally focused on algorithm development, theoretical guarantees, and applications. In this section, we focus on related work in three areas: (1) deterministic CUR algorithms that solve for C and R separately and allow the user to select c and r, (2) CUR algorithms that use convex optimization with regularization to select columns and rows of the data matrix for inclusion in C and R, and (3) CUR feature selection applications. Since we mentioned a number of feature selection applications in the introduction, we will now provide more details. For the interested reader, Dong and Martinsson [15] provide a survey of CUR algorithms (including those that do not fit the criterion for inclusion in this section).

Deterministic CUR algorithms that solve for C and R separately and allow the user to select c and r include a leverage score approach [1], a discrete empirical interpolation method (DEIM) approach [2], and a pivoted QR approach [10]. The leverage score approach by Mahoney and Drineas [1] is a frequent point of comparison in the CUR literature. This approximation is randomized, and columns and rows are sampled based on their “normalized statistical leverage scores,” which capture information on how much a column or row contributes to the optimal low-rank approximation to the data matrix, the rank-k SVD, where k is a rank parameter chosen by the user. However, a deterministic variant of this algorithm is to select the columns and rows with the largest leverage scores for inclusion in C and R, respectively. In the DEIM approach by Sorenson and Embree [2], columns are chosen for inclusion in C and rows are chosen for inclusion in R using the discrete empirical interpolation method (DEIM) on the top-k right and left singular vectors of the data matrix, respectively, where k = c = r. In the pivoted QR approach by Stewart [10], columns are selected for inclusion in C and rows are selected for inclusion in R using a pivoted QR factorization of X and X^T, respectively. Sorenson and Embree [2] present a slight adaptation of this approach in which rows are selected for inclusion in R using a pivoted QR factorization of C^T. In each of these four CUR algorithms, U is computed as U = C^+XR^+, i.e., the minimizer of ||X − CUR||_F [10], where X^+ denotes the Moore-Penrose generalized inverse or pseudoinverse of X.
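For concreteness, a hedged MATLAB sketch of the deterministic leverage score variant described above is given below; the function name leverage_cur and the use of svds and maxk are our illustrative choices, not the reference implementation of Mahoney and Drineas [1].

function [C, U, R] = leverage_cur(X, k, c, r)
% Sketch of a deterministic leverage-score CUR: score columns and rows by
% their normalized leverage scores from the rank-k SVD and keep the c
% columns and r rows with the largest scores.
    [Uk, ~, Vk] = svds(X, k);           % top-k left and right singular vectors
    colScores = sum(Vk.^2, 2) / k;      % leverage score of each column of X
    rowScores = sum(Uk.^2, 2) / k;      % leverage score of each row of X
    [~, ic] = maxk(colScores, c);
    [~, ir] = maxk(rowScores, r);
    C = X(:, ic);
    R = X(ir, :);
    U = pinv(C) * X * pinv(R);          % U = C^+ X R^+
end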

Since the CUR algorithm that we present in this study uses convex optimization with regularization to select columns and rows of the data matrix for inclusion in C and R, we also note related work in this area. In 2010, Bien et al. [12] related CUR to sparse PCA and used the following convex minimization problem to find C:

B^* = \underset{B \in \mathbb{R}^{n \times n}}{\operatorname{argmin}} \; \|X - XB\|_F + \lambda \sum_{i=1}^{n} \|B(i,:)\|_2, \quad (1)

where λ > 0 is a regularization parameter. The indices of non-zero rows of B* are the indices of columns to choose from X for inclusion in C. R can be found using a similar optimization problem; however, the computation of U is not discussed. Mairal et al. [11] formulated CUR as a convex optimization problem that selects columns and rows of X at the same time:

W^* = \underset{W \in \mathbb{R}^{n \times m}}{\operatorname{argmin}} \; \|X - XWX\|_F^2 + \lambda_{\mathrm{row}} \sum_{i=1}^{m} \|W(:,i)\|_\infty + \lambda_{\mathrm{col}} \sum_{j=1}^{n} \|W(j,:)\|_\infty, \quad (2)

where λrow and λcol > 0 are regularization parameters. Similar to Bien et al. [12], the non-zero row indices of W* are the indices of columns to select from X, and the non-zero column indices of W* are the indices of rows to select from X. U is computed as U = C+XR+. In both Bien et al. [12] and Mairal et al. [11], the convex optimization CUR algorithm achieves similar matrix reconstruction accuracy to that of the leverage score-based CUR [1] in numerical experiments. In addition, Ida et al. [16] presented a method to speed up the coordinate descent algorithm for solving Equation 1 as presented in Bien et al. [12] and claimed it can be extended to solve Equation 2 as well.

In 2018, Peng et al. [17] used an optimization problem with regularization terms that simultaneously performed a CUR approximation of a network node attribute matrix (to choose representative nodes and attributes at the same time) and residual analysis to detect anomalies on attributed networks. The part of the optimization formulation related to CUR is similar to Equation 2, but uses the ℓ2 norm rather than the ℓ∞ norm in the regularization terms. This optimization problem was solved using alternating convex optimizations, and parameters were chosen using a grid search in experimental results. In each of the convex optimization CUR approaches mentioned above [11, 12, 16, 17], there is no built-in algorithmic control for selecting c columns and r rows of the data matrix.

Li et al. [5] used a convex optimization CUR to simultaneously select features and representative data samples to perform feature selection and active learning at the same time. The optimization problem used is similar to Equation 2 except that the regularization terms use the ℓ2 norm rather than the ℓ∞ norm, and there is an additional regularization term that provides “local linear reconstruction.” Parameters were chosen by grid search in the experimental results, and after the optimization problem is solved, the indices of the c rows of W* with the largest ℓ2 norms are the indices of columns selected from the data matrix for inclusion in C. The indices of r rows to include in R are found similarly.

We provided examples of applications that use CUR for feature selection in Section 1, and here provide more details for those examples. Mahoney and Drineas [1] applied CUR to a term-document matrix and used the results to cluster the documents into two topics, with interpretation provided by selection of the five most important terms by CUR. The clustering provided by CUR outperformed that provided by the truncated SVD. They also similarly applied CUR to gene expression data to cluster patients by cancer type. Clustering performance was equivalent to that of using the truncated SVD, but CUR provided insight into which genes are most important to the clustering, and of the 12 selected, some are known to be medically associated with cancer. Sorenson and Embree [2] also used CUR on gene expression data to discover genes that cluster patients into those with and without a tumor. They compared results using their DEIM CUR and the deterministic leverage score CUR of Mahoney and Drineas [1]. While the DEIM CUR reconstructed the original data matrix better than the deterministic leverage score CUR, the genes selected by the leverage score CUR performed much better in separating patients with and without a tumor.

Liu and Shao [3] leveraged CUR for feature selection to improve image classification accuracy. CUR performed the best of the dimensionality reduction methods used, which included PCA. Esmaeili et al. [4] used CUR for cognitive radio sensor selection and channel assignment. Specifically, sensors were chosen using the selected columns from C, and channels were selected using the highest magnitude elements of U for each chosen sensor, and the resulting samples were interpolated to create the spectrum map. They tested various CUR algorithms, including the leverage score CUR [1], and showed that CUR is more effective than random uniform sampling (the prior method) in recreating the spectrum map. As mentioned earlier, Li et al. [5] used CUR for simultaneous feature selection and sample selection for active learning to classify synthetic data, gene expression data, molecular data, image and video data, and human activity recognition data. They demonstrated that their convex optimization CUR almost always outperformed other feature selection and active learning methods, including the randomized leverage score CUR [1], in terms of classifier accuracy.

3 CUR algorithm

Let X ∈ ℝm × n be the matrix we wish to approximate as X≈CUR. Our formulation of CUR using convex optimization builds upon ideas from Bien et al. [12], Mahoney and Drineas [1], and Mairal et al. [11]. To select a subset of columns from X to form the matrix C, we solve

W^* = \underset{W \in \mathbb{R}^{n \times m}}{\operatorname{argmin}} \; \|X - XWX\|_F^2 + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_\infty, \quad (3)

for a given λ_C ∈ ℝ_{≥0}. Then C = X(:, I_C), where I_C is the set of indices of non-zero rows in W*. Hence, λ_C controls how many columns are selected from X, i.e., the larger the value of λ_C, the more rows of W are forced to 0, and the fewer columns of X are selected.

After C has been calculated, we select a set of rows from X to form the matrix R by solving

W^* = \underset{W \in \mathbb{R}^{c \times m}}{\operatorname{argmin}} \; \|X - CWX\|_F^2 + \lambda_R \sum_{j=1}^{m} \|W(:,j)\|_\infty, \quad (4)

for a given λ_R ∈ ℝ_{≥0}. Then, R = X(I_R, :), where I_R is the set of indices of non-zero columns in W*, and λ_R controls the number of rows selected from X.

Input to our algorithm includes c and r, the number of columns and rows to be selected from X for C and R, respectively. For a given λ_C, it is unknown in advance how many columns will be selected from X by the solution of Equation 3. Hence, we utilize bisection on λ_C with multiple iterates of the column selection procedure to find a selection of exactly c columns. We use a similar process with λ_R and the row selection procedure to find a selection of exactly r rows. To complete the algorithm, U is computed as U = C^+XR^+. The pseudocode for our CUR approximation is given in Algorithm 1.

Algorithm 1

Algorithm 1. CUR through convex optimization.
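A hedged MATLAB sketch of the first bisection loop of Algorithm 1 (column selection) is given below. Here surrogate_min stands for a solver of the line 5 minimization (a sketch is given in Section 3.1), and the iteration cap, tolerance, and zero-row threshold are illustrative assumptions.

% Bisection on lambdaC to select exactly c columns of X.
lamLo = 0;
lamHi = 2 * norm(X'*X*X', inf);        % critical value lambdaC^* (Theorem 3.2)
mu = 1.01 * norm(X, 2)^4;              % mu > ||X||_2^4, as required in Section 4
for iter = 1:50
    lam = (lamLo + lamHi) / 2;
    W = surrogate_min(X, lam, mu, 20, 1e-6);
    IC = find(vecnorm(W, 2, 2) > 1e-8);    % indices of non-zero rows of W
    if numel(IC) == c, break; end
    if numel(IC) > c
        lamLo = lam;                       % too many columns selected: increase lambdaC
    else
        lamHi = lam;                       % too few columns selected: decrease lambdaC
    end
end
C = X(:, IC);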

The initial minimum value for λ_C in the bisection method is 0, which corresponds to potentially all columns of X being selected for the matrix C. The initial maximum value for λ_C in the bisection method is the smallest value of λ_C that forces zero columns of X to be selected, i.e., the solution to Equation 3 to be W* = 0. We call this the critical value of λ_C and denote it λ_C^*. The range of values for λ_R in the bisection method with Equation 4 is set similarly, with the critical value of λ_R denoted as λ_R^*. To derive the exact values of λ_C^* and λ_R^*, we first provide a helpful lemma. We will also use the matrix identity vec(AXB) = (B^T ⊗ A)vec(X), in which ⊗ denotes the Kronecker product and vec(X) is a column-stacked version of X, so that vec(X) ∈ ℝ^{mn} for X ∈ ℝ^{m×n}. Using this identity, Equations 3, 4 can be reshaped as

w^* = \underset{w \in \mathbb{R}^{mn}}{\operatorname{argmin}} \; \|(X^T \otimes X)w - b\|_2^2 + \lambda_C \sum_{i=1}^{n} \max_{1 \le j \le m} |w_{i+(j-1)n}|, \quad (5)

and

w^* = \underset{w \in \mathbb{R}^{mc}}{\operatorname{argmin}} \; \|(X^T \otimes C)w - b\|_2^2 + \lambda_R \sum_{j=1}^{m} \max_{1 \le i \le c} |w_{i+(j-1)c}|, \quad (6)

where b = vec(X) ∈ ℝ^{mn}, w = vec(W) ∈ ℝ^{mn} (Equation 5) or w = vec(W) ∈ ℝ^{mc} (Equation 6), and w^* = vec(W^*).

Lemma 3.1. Let v ∈ ℝ^m be fixed and λ ∈ ℝ. If ⟨v, x⟩ ≤ (λ/2)||x||_∞ for all x ∈ ℝ^m, then λ ≥ 2||v||_1.

Proof.

\langle v, x\rangle = \sum_{k=1}^{m} v_k x_k \le \Big(\sum_{k=1}^{m} |v_k|\Big) \max_{1 \le k \le m} |x_k| = \|v\|_1\, \|x\|_\infty.

Hence, for x_k = sign(v_k), ⟨v, x⟩ = ||v||_1 ||x||_∞. Since the condition ⟨v, x⟩ ≤ (λ/2)||x||_∞ must hold for all x ∈ ℝ^m, in particular for this choice of x, it follows that λ ≥ 2||v||_1.

Theorem 3.2. Let M = reshape((X^T ⊗ X)^T b, n, m), i.e., (X^T ⊗ X)^T b reshaped from ℝ^{mn} to ℝ^{n×m} so that (X^T ⊗ X)^T b = vec(M). Then,

\lambda_C^* = 2 \max_{1 \le i \le n} \|M(i,:)\|_1 = 2\|M\|_\infty.

Similarly, let N = reshape((X^T ⊗ C)^T b, c, m). Then,

\lambda_R^* = 2 \max_{1 \le j \le m} \|N(:,j)\|_1 = 2\|N\|_1.

Proof. Let A = (X^T ⊗ X) and the objective function of Equation 5 be J(w):

J(w) = \|Aw - b\|_2^2 + \lambda_C \sum_{i=1}^{n} \max_{1 \le j \le m} |w_{i+(j-1)n}|.

We want to find the smallest λ_C^* > 0 such that, for all λ_C ≥ λ_C^*, argmin_{w ∈ ℝ^{mn}} J(w) = 0. We have

\begin{aligned}
J(w) &= \|Aw\|_2^2 - 2w^T A^T b + \|b\|_2^2 + \lambda_C \sum_{i=1}^{n} \max_{1 \le j \le m} |w_{i+(j-1)n}| \\
&= \|Aw\|_2^2 + \|b\|_2^2 + \lambda_C \sum_{i=1}^{n} \max_{1 \le j \le m} |w_{i+(j-1)n}| - 2\sum_{k=1}^{mn} (A^T b)_k w_k \\
&= \|Aw\|_2^2 + \|b\|_2^2 + \lambda_C \sum_{i=1}^{n} \max_{1 \le j \le m} |W_{ij}| - 2\sum_{i=1}^{n}\sum_{j=1}^{m} M_{ij} W_{ij},
\end{aligned}

where W = reshape(w, n, m). If ∀i,

\frac{\lambda_C}{2} \max_{1 \le j \le m} |W_{ij}| - \sum_{j=1}^{m} M_{ij} W_{ij} \ge 0, \quad (7)

then

\sum_{i=1}^{n} \left[ \frac{\lambda_C}{2} \max_{1 \le j \le m} |W_{ij}| - \sum_{j=1}^{m} M_{ij} W_{ij} \right] \ge 0,

and

\lambda_C \sum_{i=1}^{n} \max_{1 \le j \le m} |W_{ij}| - 2\sum_{i=1}^{n}\sum_{j=1}^{m} M_{ij} W_{ij} \ge 0.

Hence, J(w) ≥ J(0) = ||b||_2^2 for all w, and thus argmin_{w ∈ ℝ^{mn}} J(w) = 0. By Lemma 3.1, Equation 7 holds for all w only if, for every i, λ_C ≥ 2||M(i, :)||_1, and the bound in the proof of Lemma 3.1 shows that this condition is also sufficient. Thus, λ_C^* = 2 max_{1≤i≤n} ||M(i,:)||_1 = 2||M||_∞.

A similar proof can be used to show the result for λ_R^*, letting A = (X^T ⊗ C) and J(w) be the objective function of Equation 6:

J(w) = \|Aw - b\|_2^2 + \lambda_R \sum_{j=1}^{m} \max_{1 \le i \le c} |w_{i+(j-1)c}|.

By Theorem 3.2 and the fact that b = vec(X), we can calculate

\begin{aligned}
\lambda_C^* &= 2\|\mathrm{reshape}((X^T \otimes X)^T \mathrm{vec}(X), n, m)\|_\infty \\
&= 2\|\mathrm{reshape}((X \otimes X^T)\,\mathrm{vec}(X), n, m)\|_\infty \\
&= 2\|\mathrm{reshape}(\mathrm{vec}(X^T X X^T), n, m)\|_\infty \\
&= 2\|X^T X X^T\|_\infty,
\end{aligned}

where the third line follows from the same matrix identity used above. A similar calculation shows that λ_R^* = 2||C^T X X^T||_1.
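In MATLAB, these critical values can be computed directly, since norm(A, inf) and norm(A, 1) return the induced ∞- and 1-norms of a matrix A:

lamCstar = 2 * norm(X'*X*X', inf);     % lambdaC^* = 2*||X^T X X^T||_inf
lamRstar = 2 * norm(C'*X*X', 1);       % lambdaR^* = 2*||C^T X X^T||_1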

3.1 Implementation for minimization problems

For X of small2 dimensions, we can solve the minimization problems on lines 5 and 19 of Algorithm 1 using the reshaped versions (Equations 5, 6) and a convex programming solver, such as the CVX package in MATLAB [18, 19]. However, using a solver for X of larger dimensions becomes infeasible due to the use of the Kronecker product in the reshaped problems. For example, to store the genetics dataset referenced in Section 5, a dense double array X ∈ ℝ^{107 × 22,283}, 19.07 MB is used; to store X^T ⊗ X ∈ ℝ^{2,384,281 × 2,384,281}, 45.48 TB is used.
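The storage figures above follow from 8-byte double-precision entries; a quick MATLAB check (assuming 1 MB = 10^6 bytes and 1 TB = 10^12 bytes):

bytesX    = 8 * 107 * 22283;      % = 19,074,248 bytes, about 19.07 MB for X
bytesKron = 8 * 2384281^2;        % about 4.548e13 bytes, i.e., 45.48 TB for X^T kron X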

To accommodate large-scale problems, we solve these minimizations in Algorithm 1 using an extension of an iterative method by Daubechies et al. [13] that utilizes a “surrogate functional” to solve regularized least squares minimization problems in which weighted ℓp-norm penalty functions are used, for 1 ≤ p ≤ 2. This technique decouples a large minimization problem into smaller, easy-to-solve problems. We extend the results of Daubechies et al. [13] to apply to our penalty functions, e.g., Σ_{i=1}^n ||W(i,:)||_∞. We demonstrate the method for the line 5 minimization in Algorithm 1. The line 19 minimization is handled similarly.

Let the objective function of Equation 3 be denoted by

J(W) = \|X - XWX\|_F^2 + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_\infty,

and the corresponding surrogate functional by

\hat{J}(W, Z) = \|X - XWX\|_F^2 + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_\infty + \mu\|W - Z\|_F^2 - \|XWX - XZX\|_F^2,

where Z ∈ ℝ^{n×m} and μ > 0. For any such Z and μ, we have

\begin{aligned}
\hat{J}(W, Z) &= \mu\|W\|_F^2 - 2\,\mathrm{tr}\{W(\mu Z^T + XX^TX - XX^TZ^TX^TX)\} + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_\infty + \|X\|_F^2 + \mu\|Z\|_F^2 - \|XZX\|_F^2 \\
&= \sum_{i=1}^{n} \Big[ \mu\|W(i,:)\|_2^2 - 2\langle W(i,:), L^T(i,:)\rangle + \lambda_C\|W(i,:)\|_\infty \Big] + \|X\|_F^2 + \mu\|Z\|_F^2 - \|XZX\|_F^2,
\end{aligned}

where L = μZ^T + XX^TX − XX^TZ^TX^TX. Hence,

\begin{aligned}
\underset{W \in \mathbb{R}^{n \times m}}{\operatorname{argmin}}\, \hat{J}(W, Z) &= \underset{W \in \mathbb{R}^{n \times m}}{\operatorname{argmin}}\, \mu \sum_{i=1}^{n} \Big[ \big\|W(i,:) - \tfrac{1}{\mu}L^T(i,:)\big\|_2^2 - \big\|\tfrac{1}{\mu}L^T(i,:)\big\|_2^2 + \tfrac{\lambda_C}{\mu}\|W(i,:)\|_\infty \Big] \\
&= \underset{W \in \mathbb{R}^{n \times m}}{\operatorname{argmin}}\, 2\mu \sum_{i=1}^{n} \Big[ \tfrac{1}{2}\big\|W(i,:) - \tfrac{1}{\mu}L^T(i,:)\big\|_2^2 + \tfrac{\lambda_C}{2\mu}\|W(i,:)\|_\infty \Big].
\end{aligned}

Thus, we can easily minimize Ĵ over W by computing the proximal operator of the ℓ∞ norm,

\operatorname{prox}_{\alpha\|\cdot\|_\infty}(x) = \underset{y \in \mathbb{R}^m}{\operatorname{argmin}} \left( \tfrac{1}{2}\|y - x\|_2^2 + \alpha\|y\|_\infty \right) \quad (8)

for x ∈ ℝ^m and α ≥ 0, applied to each row of W. To find a minimizer of J, we utilize the minimization of Ĵ in the iterative process in Algorithm 2. In Section 4, we will show that W* is a minimizer of J, where W* is the output of Algorithm 2.
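The ℓ∞ proximal operator in Equation 8 is not given by a simple elementwise formula, but it can be evaluated through the Moreau decomposition prox_{α||·||∞}(x) = x − Π(x), where Π is the Euclidean projection onto the ℓ1 ball of radius α. The hedged MATLAB sketch below uses the standard sort-based ℓ1-ball projection; the function name prox_linf is our illustrative choice.

function p = prox_linf(x, alpha)
% Proximal operator of alpha*||.||_inf (Equation 8) via the Moreau
% decomposition: prox(x) = x - (projection of x onto the l1 ball of radius alpha).
    if alpha <= 0
        p = x;  return;
    end
    if norm(x(:), 1) <= alpha
        p = zeros(size(x));             % x is already in the l1 ball, so the prox is 0
        return;
    end
    u = sort(abs(x(:)), 'descend');     % sorted magnitudes
    css = cumsum(u);
    idx = (1:numel(u))';
    rho = find(u - (css - alpha)./idx > 0, 1, 'last');
    theta = (css(rho) - alpha) / rho;   % soft-threshold level for the projection
    proj = sign(x) .* max(abs(x) - theta, 0);
    p = x - proj;
end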

Algorithm 2

Algorithm 2. Convex optimization using the surrogate functional.
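A hedged MATLAB sketch of the iteration in Algorithm 2 is given below, using the prox_linf helper sketched above; the stopping rule and the zero initialization are illustrative assumptions.

function W = surrogate_min(X, lambdaC, mu, maxIter, tol)
% Sketch of Algorithm 2: the fixed-point iteration W <- T(W) of Sections 3.1
% and 4 for min_W ||X - X*W*X||_F^2 + lambdaC*sum_i ||W(i,:)||_inf.
    [m, n] = size(X);
    W = zeros(n, m);
    XXt = X*X';  XtX = X'*X;  XXtX = XXt*X;        % reused in every iteration
    for k = 1:maxIter
        L = mu*W' + XXtX - XXt*W'*XtX;             % L(W), an m x n matrix
        Wnew = zeros(n, m);
        for i = 1:n
            % row i of the update is the l_inf prox of (1/mu)*L^T(i,:)
            Wnew(i,:) = prox_linf(L(:,i)'/mu, lambdaC/(2*mu));
        end
        if norm(Wnew - W, 'fro') <= tol * max(1, norm(W, 'fro'))
            W = Wnew;  return;                     % iterates have converged
        end
        W = Wnew;
    end
end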

3.2 Complexity

For this analysis, we will assume that r ≪ m and c ≪ n, and rename μ as μ_C to avoid confusion with μ_R, a similar parameter needed to solve the line 19 minimization. We provide computational complexities in Table 1 that help determine the overall complexity of Algorithm 1. Quantities that are used in each iteration of one of the bisection loops should be computed once before the loop begins. For the first bisection loop on lines 3-13, this includes XX^TX (which is the transpose of the matrix computed to find λ_C^* in line 1), XX^T, X^TX, and μ_C > ||X||_2^4, which is an input to Algorithm 2. For the second bisection loop on lines 17-27, this includes XX^TC (which is the transpose of the matrix computed to find λ_R^* in line 15), XX^T (which was already computed before the first bisection loop), C^TC, and μ_R > ||X||_2^2||C||_2^2 (an input to the algorithm that solves the line 19 minimization), in which ||X||_2 was already computed as well.

Table 1

Table 1. Computational complexities of quantities used in Algorithm 1.

Before we analyze the entirety of Algorithm 1, we will analyze the complexity of Algorithm 2, which solves the minimization problem on line 5 of Algorithm 1. In each iteration, the most expensive steps are the computation of L and the n proximal operators. L can be computed in O(mn(m+n)) time and each proximal operator can be computed in O(m log m) time. Determining if the sequence has converged in line 9 can be completed in O(mn) time. Hence, the total time for Algorithm 2, assuming k iterations are completed, is O(kmn(m+n)). A similar analysis shows that the complexity of the algorithm for solving the minimization problem on line 19 of Algorithm 1 is O(k̂cm(m+c)), assuming k̂ iterations are completed. For each minimization problem, we have found that using a small maximum number of iterations in practice, e.g., 20, is sufficient for use in the CUR algorithm (Algorithm 1). For the remainder of this analysis, we will assume that the number of iterations for each minimization problem is a small constant.

In Algorithm 1, the bisection loop in lines 3–13 dominates the computational complexity. This loop runs in O(ℓmn(m+n)) time, assuming ℓ iterations. This is due to the computation time of Algorithm 2 and the fact that finding the set I_C can be completed in O(mn) time. Using a similar analysis, the second bisection loop on lines 17–27 is an order of magnitude less expensive, i.e., O(ℓ̂cm(m+c)) assuming ℓ̂ iterations. The computation of U involves two pseudoinverses, C^+ and R^+, which can be computed in O(cm min(c, m)) and O(nr min(n, r)) time, respectively. The product U = C^+XR^+ can be computed in O(cn(m+r)) time if m ≥ n, or O(mr(n+c)) time if m < n. Hence, the total computational complexity for Algorithm 1 is O(ℓmn(m+n)).

3.3 Generalizations

Domain expertise can be incorporated into Equations 3, 4 using fixed relative column/row weights, e.g.,

\underset{W \in \mathbb{R}^{n \times m}}{\min}\; \|X - XWX\|_F^2 + \lambda_C \sum_{i=1}^{n} \omega_i \|W(i,:)\|_\infty,

where ωi is the provided expert weight for X(:, i). The column/row weights reflect the relative importance of each column/row according to the expert. The implementation and theory provided in this manuscript require very minor modifications to apply to this generalization.

To form the matrix C, our algorithm and implementation can also accommodate objective functions of the form

\|X - XWX\|_F^2 + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_p^p,

for 1 ≤ p ≤ 2. The theory for using the surrogate functional technique with these choices of objective functions is already complete [13], and the only change to the implementation detailed above is that the proximal operator in Algorithm 2 would be that of the ℓ_p-norm. Closed-form solutions for the proximal operator of the ℓ1 and ℓ2 norms exist, making these easy choices to implement (a short sketch is given at the end of this subsection). Similar penalty function adaptations can be made to the objective function used to form the matrix R. Hence, our algorithm and implementation provide a CUR framework. We also note that the objective function could be generalized as

\underset{W \in \mathbb{R}^{n \times m}}{\min}\; \|X - XWX\|_p^p + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_\infty,

where p ∈ [1, ∞], which remains a potential area for future work.
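The closed-form proximal operators mentioned above are standard; a hedged MATLAB sketch of the ℓ1 (soft-thresholding) and ℓ2 (block shrinkage) cases, which could replace prox_linf in the sketch of Algorithm 2, is:

prox_l1 = @(x, a) sign(x) .* max(abs(x) - a, 0);             % prox of a*||.||_1
prox_l2 = @(x, a) max(1 - a / max(norm(x, 2), eps), 0) .* x; % prox of a*||.||_2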

4 Theoretical foundation

In this section, we follow the theoretical approach of Daubechies et al. [13] to prove the correctness of Algorithm 2 for solving the minimization on line 5 of Algorithm 1, i.e., that

W^* = \underset{W \in \mathbb{R}^{n \times m}}{\operatorname{argmin}}\; \|X - XWX\|_F^2 + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_\infty,

where W* is the output of Algorithm 2. The correctness of the algorithm for solving the minimization on line 19 of Algorithm 1 can be proved similarly.

For constant X, μ, and λ_C, let the non-linear operator T_{X,μ,λ_C}: ℝ^{n×m} → ℝ^{n×m} be defined as Z ↦ T_{X,μ,λ_C}(Z) = W, given by:

1. construct L(Z) = μZ^T + XX^TX − XX^TZ^TX^TX,

2. for each row of LT (i.e., ∀i ∈ [n ]), solve

W(i,:) = \underset{y \in \mathbb{R}^m}{\operatorname{argmin}} \left[ \tfrac{1}{2}\big\|y - \tfrac{1}{\mu}L^T(i,:)\big\|_2^2 + \tfrac{\lambda_C}{2\mu}\|y\|_\infty \right],

and

3. reassemble W from its rows,

so that we can write each W^k produced in Algorithm 2 as W^k = T^k_{X,μ,λ_C}(W^0). The primary result that we will prove in this section is that the sequence {W^k}_{k∈ℕ} converges to a minimizer of ||X − XWX||_F^2 + λ_C Σ_{i=1}^n ||W(i,:)||_∞. This is formally stated below.

Theorem 4.1. Let λ_C ∈ ℝ_{≥0}, μ ∈ ℝ_{>0}, X ∈ ℝ^{m×n}, and W, Z ∈ ℝ^{n×m}. Define

J(W) = \|X - XWX\|_F^2 + \lambda_C \sum_{i=1}^{n} \|W(i,:)\|_\infty,

and the non-linear operator T_{X,μ,λ_C}: ℝ^{n×m} → ℝ^{n×m}, Z ↦ T_{X,μ,λ_C}(Z) = W, as above.

a. If μ > ||X||_2^4, then the sequence {W^k = T^k_{X,μ,λ_C}(W^0)}_{k∈ℕ} produced by Algorithm 2 converges to a fixed point of T_{X,μ,λ_C}.

b. A fixed point of T_{X,μ,λ_C} is a minimizer of J(W).

c. J(W) has a unique minimizer if X is square and full rank.

To condense notation for the remainder of this section, we write T = T_{X,μ,λ_C}.

4.1 Convergence to a fixed point of T

We first provide six lemmas to assist in the proof of Theorem 4.1, part a.

Lemma 4.2. (X^TX) ⊗ (XX^T) is symmetric and positive semidefinite.

Proof. (X^TX) ⊗ (XX^T) is clearly symmetric. We will show that ⟨((X^TX) ⊗ (XX^T))v, v⟩ ≥ 0 for all v ∈ ℝ^{mn}, thus proving the lemma. Fix v ∈ ℝ^{mn} and V ∈ ℝ^{m×n}, such that vec(V) = v.

\begin{aligned}
\langle ((X^TX) \otimes (XX^T))v, v\rangle &= \langle \mathrm{vec}((XX^T)V(X^TX)), \mathrm{vec}(V)\rangle \\
&= \langle (XX^T)V(X^TX), V\rangle_F \\
&= \mathrm{tr}\{(XX^T)V(X^TX)V^T\} \\
&= \mathrm{tr}\{(XX^T)^{1/2}V(X^TX)^{1/2}(X^TX)^{1/2}V^T(XX^T)^{1/2}\} \\
&= \|(XX^T)^{1/2}V(X^TX)^{1/2}\|_F^2 \ge 0,
\end{aligned}

where ⟨·, ·⟩_F is the Frobenius inner product.

Lemma 4.3. \|(X^TX) \otimes (XX^T)\|_2 \le (\sigma_{\max}(X))^4 = \|X\|_2^4, where σ_max(X) is the largest singular value of the matrix X.

Proof. By Lemma 4.2, the matrix (XTX)⊗(XXT) is symmetric and positive semidefinite. Thus, the eigenvalues and singular values of (XTX)⊗(XXT) are the same. Hence,

\begin{aligned}
\|(X^TX) \otimes (XX^T)\|_2 &= \sigma_{\max}((X^TX) \otimes (XX^T)) \\
&= \max_{\|v\|_2=1} \langle ((X^TX) \otimes (XX^T))v, v\rangle \\
&= \max_{\|V\|_F=1} \|(XX^T)^{1/2}V(X^TX)^{1/2}\|_F^2,
\end{aligned}

where the last equation follows from the proof of Lemma 4.2.

\begin{aligned}
\|(XX^T)^{1/2}V(X^TX)^{1/2}\|_F &\le \|(XX^T)^{1/2}\|_2\, \|V(X^TX)^{1/2}\|_F \\
&= \|(XX^T)^{1/2}\|_2\, \|(X^TX)^{1/2}V^T\|_F \\
&\le \|(XX^T)^{1/2}\|_2\, \|(X^TX)^{1/2}\|_2\, \|V^T\|_F \\
&= \sigma_{\max}((XX^T)^{1/2})\, \sigma_{\max}((X^TX)^{1/2})\, \|V^T\|_F \\
&= (\sigma_{\max}(X))^2\, \|V^T\|_F.
\end{aligned}

Thus,

\max_{\|V\|_F=1} \|(XX^T)^{1/2}V(X^TX)^{1/2}\|_F^2 \le (\sigma_{\max}(X))^4,

proving the result.

Lemma 4.4. Let μ ≥ ||X||_2^4 and define the linear operator ℒ: ℝ^{m×n} → ℝ^{m×n} by ℒ(U) = μU − XX^TUX^TX. Then, the operator norm of ℒ with respect to the Frobenius norm satisfies ||ℒ|| ≤ μ.

Proof. The operator norm of ℒ is

\begin{aligned}
\|\mathcal{L}\| &= \max_{\|U\|_F=1} \|\mathcal{L}(U)\|_F \\
&= \max_{\|U\|_F=1} \|\mathrm{vec}(\mu U - XX^TUX^TX)\|_2 \\
&= \max_{\|U\|_F=1} \|(\mu I - (X^TX) \otimes (XX^T))\,\mathrm{vec}(U)\|_2 \\
&= \sigma_{\max}(\mu I - (X^TX) \otimes (XX^T)).
\end{aligned}

By Lemmas 4.2 and 4.3, (X^TX) ⊗ (XX^T) is a symmetric positive semidefinite matrix with 2-norm bounded above by ||X||_2^4. Hence,

\|\mathcal{L}\| = \sigma_{\max}(\mu I - (X^TX) \otimes (XX^T)) \le \mu.

Lemma 4.5. Let μ ≥ ||X||_2^4. Then, the mapping T is non-expansive, i.e., for all Z, Z′ ∈ ℝ^{n×m},

\|T(Z) - T(Z')\|_F \le \|Z - Z'\|_F.

Proof.

By the fact that prox_{λ_C/(2μ)||·||_∞} is non-expansive [20] and Lemma 4.4, we have

\begin{aligned}
\|T(Z) - T(Z')\|_F^2 &= \sum_{i=1}^{n} \Big\| \operatorname{prox}_{\frac{\lambda_C}{2\mu}\|\cdot\|_\infty}\!\Big(\tfrac{1}{\mu}[L(Z)]^T(i,:)\Big) - \operatorname{prox}_{\frac{\lambda_C}{2\mu}\|\cdot\|_\infty}\!\Big(\tfrac{1}{\mu}[L(Z')]^T(i,:)\Big) \Big\|_2^2 \\
&\le \sum_{i=1}^{n} \Big\| \tfrac{1}{\mu}[L(Z)]^T(i,:) - \tfrac{1}{\mu}[L(Z')]^T(i,:) \Big\|_2^2 \\
&= \tfrac{1}{\mu^2}\|L(Z) - L(Z')\|_F^2 \\
&= \tfrac{1}{\mu^2}\|\mathcal{L}((Z - Z')^T)\|_F^2 \\
&\le \tfrac{1}{\mu^2}\,\mu^2\,\|Z - Z'\|_F^2 \\
&= \|Z - Z'\|_F^2.
\end{aligned}

We omit proofs of the next two lemmas, as they largely mirror those of Daubechies et al. [13].

Lemma 4.6. Let μ > ||X||_2^4. Then, the mapping T is asymptotically regular, i.e., for all Z ∈ ℝ^{n×m},

\lim_{k \to \infty} \|T^{k+1}(Z) - T^k(Z)\|_F = 0.

Lemma 4.7. Let the mapping T: ℝ^{n×m} → ℝ^{n×m} be non-expansive and asymptotically regular. Then, the sequence {W^k = T^k(W^0)}_{k∈ℕ} converges to a fixed point of T.

We are now ready to prove Theorem 4.1, part a.

Proof of Theorem 4.1, part a. By Lemmas 4.5 and 4.6, T is non-expansive and asymptotically regular. By Lemma 4.7, the result is proven.

4.2 Convergence to a minimizer of J

We now focus on proving Theorem 4.1, part b, and begin by establishing three lemmas.

Lemma 4.8. Let f: I → ℝ be convex, with I = (a, b). If f(x) = f_1(x) + αx^2, where α > 0 and f_1(x) is piecewise linear, then f_1 is convex, and therefore f is strictly convex.

Proof. Let x_0, x_1, x_2, ..., x_n, x_{n+1} ∈ I, such that a = x_0 < x_1 < x_2 < ... < x_n < x_{n+1} = b, and for all k ∈ [n+1], f_1 restricted to (x_{k−1}, x_k) is linear and given by f_1(x) = a_k x + b_k. Due to the convexity of f, f_1 is continuous. Hence, a_k x_k + b_k = a_{k+1} x_k + b_{k+1}, for every k ∈ [n].

We show that a_1 ≤ a_2 ≤ ... ≤ a_{n+1}, which implies that f_1 is convex. Consider the inequality a_k ≤ a_{k+1} for any k ∈ [n]. Let h ∈ ℝ, such that 0 ≤ h < min(x_k − x_{k−1}, x_{k+1} − x_k). The convexity of f implies that

\frac{f(x_k - h) + f(x_k + h)}{2} \ge f(x_k).

Thus,

\frac{a_k(x_k - h) + b_k + \alpha(x_k - h)^2 + a_{k+1}(x_k + h) + b_{k+1} + \alpha(x_k + h)^2}{2} \ge \frac{a_k x_k + b_k + \alpha x_k^2 + a_{k+1} x_k + b_{k+1} + \alpha x_k^2}{2},

where we have used the continuity of f1 at xk on the right hand side of the inequality. Hence,

(a_{k+1} - a_k)h + 2\alpha h^2 \ge 0. \quad (9)

(a_{k+1} − a_k)h + 2αh^2 is a convex quadratic in h with roots 0 and −(a_{k+1} − a_k)/(2α). This leads to three cases:

1. The root −(a_{k+1} − a_k)/(2α) = 0, i.e., 0 is a root of multiplicity two. This implies a_k = a_{k+1}, and Equation 9 holds.

2. The root −(a_{k+1} − a_k)/(2α) < 0. This implies a_k < a_{k+1}, and Equation 9 holds for h ≥ 0.

3. The root −(a_{k+1} − a_k)/(2α) > 0. This implies a_k > a_{k+1}. For h such that 0 < h < min(x_k − x_{k−1}, x_{k+1} − x_k, −(a_{k+1} − a_k)/(2α)), Equation 9 does not hold.

Thus, Equation 9 holds for all h such that 0 ≤ h < min(x_k − x_{k−1}, x_{k+1} − x_k) if and only if a_k ≤ a_{k+1}. Hence, a_1 ≤ a_2 ≤ ... ≤ a_{n+1}, which implies that f_1 is convex.

Lemma 4.9. Let I = (a, b) with 0 ∈ I, and let f: I → ℝ be defined as f(α) = F(α) + (1/2)α^2, with f(α) ≥ 0 on I. If F is continuous, piecewise linear, convex, and F(0) = 0, then F(α) ≥ 0.

Proof. Let α0, α1, α2, ..., αn, αn+1I, such that a = α0 < α1 < α2 < ... < αn < αn+1 = b, and ∀k ∈ [n+1], F(α) = akα+bk, if αk−1 ≤ α ≤ αk. We will use two cases to prove the result.

Case 1: Suppose 0 ∈ (α_{k−1}, α_k) for some k ∈ [n+1]. Since F(0) = 0, b_k = 0. In addition, a_k = 0, which we will show by contradiction. Suppose a_k > 0 and α_{k−1} < ε < 0, such that |ε| < 2a_k. Then, f(ε) = a_k ε + (1/2)ε^2 < 0, which is a contradiction. The case in which a_k < 0 similarly leads to a contradiction. Hence, F(α) = 0 on [α_{k−1}, α_k]. Due to the convexity of F, a_1 ≤ a_2 ≤ ⋯ ≤ a_n ≤ a_{n+1}. Therefore, F′(α) ≤ 0 for α ≤ α_{k−1} and F′(α) ≥ 0 for α ≥ α_k. Hence, F(α) ≥ 0.

Case 2: Suppose α_k = 0 for some k ∈ [n]. Since F(0) = 0, F(α) = a_k α and f(α) = a_k α + (1/2)α^2 on α_{k−1} ≤ α ≤ α_k = 0. We will show that a_k ≤ 0. Suppose a_k > 0. Then, f(α) = α(a_k + (1/2)α) < 0 on −2a_k < α < 0, which is a contradiction. Similarly, F(α) = α a_{k+1} and f(α) = α a_{k+1} + (1/2)α^2 on 0 = α_k ≤ α ≤ α_{k+1}. We will show that a_{k+1} ≥ 0. Suppose a_{k+1} < 0. Then, f(α) = α(a_{k+1} + (1/2)α) < 0 on 0 < α < 2|a_{k+1}|, which is a contradiction. Due to the convexity of F, a_1 ≤ a_2 ≤ ⋯ ≤ a_n ≤ a_{n+1}. Therefore, F′(α) ≤ a_k ≤ 0 for α ≤ α_k and F′(α) ≥ a_{k+1} ≥ 0 for α ≥ α_k. Hence, F(α) ≥ 0.

Lemma 4.10. Let X ∈ ℝ^{m×n}, W, Z ∈ ℝ^{n×m}, μ > 0, λ_C ≥ 0, and for all i ∈ [n]

W(i,:) = \underset{y \in \mathbb{R}^m}{\operatorname{argmin}} \left[ \tfrac{1}{2}\big\|y - \tfrac{1}{\mu}L^T(i,:)\big\|_2^2 + \tfrac{\lambda_C}{2\mu}\|y\|_\infty \right],

where L(Z) = μZ^T + XX^TX − XX^TZ^TX^TX. Then, for every H ∈ ℝ^{n×m},

\hat{J}(W + H, Z) \ge \hat{J}(W, Z) + \mu\|H\|_F^2.

Proof. By definition,

\begin{aligned}
\hat{J}(W+H, Z) &= \|X - X(W+H)X\|_F^2 + \lambda_C \sum_{i=1}^{n} \|(W+H)(i,:)\|_\infty + \mu\|W+H-Z\|_F^2 - \|X(W+H)X - XZX\|_F^2 \\
&= \hat{J}(W, Z) + \lambda_C \sum_{i=1}^{n} \big(\|(W+H)(i,:)\|_\infty - \|W(i,:)\|_\infty\big) + 2\,\mathrm{tr}\{H(\mu W^T - L)\} + \mu\|H\|_F^2 \\
&= \hat{J}(W, Z) + \sum_{i=1}^{n} \Big[ \lambda_C\big(\|(W+H)(i,:)\|_\infty - \|W(i,:)\|_\infty\big) + 2\mu\big\langle H(i,:), \big(W - \tfrac{1}{\mu}L^T\big)(i,:)\big\rangle \Big] + \mu\|H\|_F^2.
\end{aligned}

To prove the result, we need to show that

\sum_{i=1}^{n} \Big[ \lambda_C\big(\|(W+H)(i,:)\|_\infty - \|W(i,:)\|_\infty\big) + 2\mu\big\langle H(i,:), \big(W - \tfrac{1}{\mu}L^T\big)(i,:)\big\rangle \Big] \ge 0.

Hence, it suffices to show that ∀i ∈ [n],

\frac{\lambda_C}{2\mu}\big(\|(W+H)(i,:)\|_\infty - \|W(i,:)\|_\infty\big) + \big\langle H(i,:), \big(W - \tfrac{1}{\mu}L^T\big)(i,:)\big\rangle \ge 0. \quad (10)

To simplify notation, let w = W(i, :), h = H(i, :), and ℓ = L^T(i, :). Let

F(h) = \frac{\lambda_C}{2\mu}\big(\|w + h\|_\infty - \|w\|_\infty\big) + \big\langle h, w - \tfrac{1}{\mu}\ell\big\rangle,

where h = αu and u ∈ ℝ^m is an arbitrary unit vector. Then, we can denote F(h) = F(α, u), and fix u, so that F is only a function of α:

F(\alpha) = \frac{\lambda_C}{2\mu}\big(\|w + \alpha u\|_\infty - \|w\|_\infty\big) + \alpha\big\langle u, w - \tfrac{1}{\mu}\ell\big\rangle.

Let G(w) = \tfrac{1}{2}\big\|w - \tfrac{1}{\mu}\ell\big\|_2^2 + \tfrac{\lambda_C}{2\mu}\|w\|_\infty. Then,

G(w + \alpha u) = \tfrac{1}{2}\big\|w + \alpha u - \tfrac{1}{\mu}\ell\big\|_2^2 + \tfrac{\lambda_C}{2\mu}\|w + \alpha u\|_\infty,

and

\begin{aligned}
G(w + \alpha u) - G(w) &= \tfrac{1}{2}\big\|w + \alpha u - \tfrac{1}{\mu}\ell\big\|_2^2 - \tfrac{1}{2}\big\|w - \tfrac{1}{\mu}\ell\big\|_2^2 + \tfrac{\lambda_C}{2\mu}\|w + \alpha u\|_\infty - \tfrac{\lambda_C}{2\mu}\|w\|_\infty \\
&= \tfrac{1}{2}\Big(\big\|w - \tfrac{1}{\mu}\ell\big\|_2^2 + 2\alpha\big\langle u, w - \tfrac{1}{\mu}\ell\big\rangle + \|\alpha u\|_2^2 - \big\|w - \tfrac{1}{\mu}\ell\big\|_2^2\Big) + \tfrac{\lambda_C}{2\mu}\big(\|w + \alpha u\|_\infty - \|w\|_\infty\big) \\
&= \tfrac{\lambda_C}{2\mu}\big(\|w + \alpha u\|_\infty - \|w\|_\infty\big) + \alpha\big\langle u, w - \tfrac{1}{\mu}\ell\big\rangle + \tfrac{1}{2}\alpha^2.
\end{aligned}

Thus, G(w + αu) − G(w) = F(α) + (1/2)α^2. Let f(α) = F(α) + (1/2)α^2. We note that f is convex since f(α) = G(w + αu) − G(w) and G is convex. To use Lemma 4.8, we also need to show that F(α) is piecewise linear in α. F(α) has a constant term, −(λ_C/(2μ))||w||_∞, and a linear term, α⟨u, w − (1/μ)ℓ⟩. The remaining term, (λ_C/(2μ))||w + αu||_∞, is piecewise linear in α, since as α increases

\|w + \alpha u\|_\infty = \max(w_1 + \alpha u_1, ..., w_m + \alpha u_m, -w_1 - \alpha u_1, ..., -w_m - \alpha u_m),

and the maximum of a set of linear functions is piecewise linear. Thus, F(α) is piecewise linear, and by Lemma 4.8, F(α) is convex.

The remaining step is to show that F(α) ≥ 0, which will establish the claim in Equation 10 and thus the result. Since w = argmin_y G(y), we have f(α) = G(w + αu) − G(w) ≥ 0. We also know that F(α) is continuous and piecewise linear in α, convex, and F(0) = 0. Hence, by Lemma 4.9, F(α) ≥ 0.

Lemma 4.10 is applied with W equal to a fixed point of T to prove Theorem 4.1, part b. However, the proof mirrors that of Daubechies et al. [13], so it is omitted. To complete the proof of Theorem 4.1, we provide the proof of part c.

Proof of Theorem 4.1, part c. The minimizer of J(W) is unique if ||X − XWX||_F^2 is strictly convex in W. Since

\|X - XWX\|_F^2 = \|(X^T \otimes X)w - b\|_2^2,

where b = vec(X) ∈ ℝ^{mn} and w = vec(W) ∈ ℝ^{mn}, we need to guarantee that (X^T ⊗ X) has full rank. Since rank(X^T ⊗ X) = rank(X^T)rank(X) = (rank(X))^2, (X^T ⊗ X) has full rank if X is square and full rank, proving the result.

Remark 4.11. For the minimization problem on line 19 of Algorithm 1, we note that ||X − CWX||_F^2 + λ_R Σ_{j=1}^m ||W(:,j)||_∞ has a unique minimizer if X ∈ ℝ^{m×n} is full rank and m ≤ n. This can be proven similarly to Theorem 4.1, part c.

5 Numerical experiments

We demonstrate our CUR algorithm on two datasets: (1) a document-term matrix, and (2) a gene expression dataset. CUR has been previously applied to these types of datasets, e.g., [1, 2, 9] for document-term matrices and [1, 11] for gene expression data. We compare performance of our CUR algorithm to that of (1) the leverage score CUR [1] (the randomized version and deterministic variant), (2) the DEIM CUR [2], (3) the QR CUR variant described in Sorenson and Embree [2], and (4) the low-rank SVD. In the remainder of this article, we denote our CUR algorithm as SF CUR (for surrogate functional CUR), and the deterministic (randomized) leverage score CUR as LS-D (LS-R) CUR. For a brief summary of these methods, see Table 2, and for more details, see Section 2. While each CUR method selects C and R separately and allows the user to select c and r, the LS-R CUR is not deterministic. We include results from the LS-R CUR in comparisons of accuracy and computation time since this method is often compared to in the literature, but exclude it from feature selection performance experiments. Comparisons with the SVD are included as a baseline.

Table 2

Table 2. Summary of methods that we compare to SF CUR in this section.

Experiments were performed in MATLAB R2023b on the University of Virginia's High-Performance Computing system, Rivanna. We used one (CPU) node, using 8 cores of an Intel(R) Xeon(R) Gold 6248 CPU at 2.50 GHz, and 72 GB of RAM. Code for all experiments performed in this study is provided at https://github.com/klinehan1/cur_feature_selection.

5.1 Document-term matrix

This first experiment serves to compare the accuracy and computation time of the SF CUR with those of other CUR algorithms and the SVD on a document-term matrix, T ∈ ℝ^{2,389 × 21,238}. Accuracy is given by the relative error in the Frobenius norm of each approximation, e.g., ||T − CUR||_F/||T||_F. T is sparse with 0.23% non-zero entries and was downloaded and created from the 20 Newsgroups dataset [21] using the scikit-learn package [22]. The documents include the training set documents for the four recreation categories; headers, footers, quotes, and a list of English stop words were removed from the text. The documents were vectorized using TFIDF and the resulting matrix rows were normalized using the ℓ2 norm.3 T is a rank-deficient matrix; rank(T) = 2,295.

Figure 1 presents the relative error and Figure 2 presents the computation time for each CUR approximation in which c = r and c, r vary over {200, 400, ..., 2,200, 2,295}, and for the rank-k SVD in which k = c = r. Since the LS-R CUR is randomized, we ran five experiments for each value of c, r and reported the average relative error and time with the standard deviations given by error bars. We note that due to sampling, the LS-R CUR may have chosen a number of columns and/or rows slightly more or less than c and/or r. For both LS-D and LS-R CUR, the rank parameter for leverage score computation was 10.

Figure 1

Figure 1. Relative error of CUR approximations and the rank-k SVD on a document-term matrix. The rank of the SVD approximation is the same as the number of selected columns/rows for the CUR approximations.

Figure 2

Figure 2. Computation time of CUR approximations and the rank-k SVD on a document-term matrix. The rank of the SVD approximation is the same as the number of selected columns/rows for the CUR approximations.

In general, the SF CUR and LS-D CUR achieve similar relative errors, as do the DEIM CUR and QR CUR, which achieve lower relative errors than those of the SF CUR and LS-D CUR. However, for c, r ≥ 400, the DEIM CUR has greater computation time than the QR CUR. For c, r ≥ 1,600, the DEIM CUR has computation times larger than 100 s as compared to computation times of < 20 s for all values of c, r for the QR CUR. The LS-R CUR has the smallest computation time for c, r ≥ 400, but does not perform well in relative error as c, r increase. Clearly, the SVD achieves the lowest relative error and has computation times of approximately 10 s. The LS-D CUR is relatively fast with computation times less than those of the SF CUR, DEIM CUR, QR CUR, and SVD. The SF CUR is the slowest of the algorithms, with computation times that are generally larger than 1,000 s (16.67 min). While the SF CUR is the most expensive algorithm in terms of computation time, we will demonstrate its effectiveness as a feature selection method in the next experiment.

5.2 Gene expression data

We compare the relative error and feature selection performance of the SF CUR algorithm with that of other CUR algorithms and the SVD on the National Institutes of Health (NIH) gene expression dataset, GSE10072 [23]. We repeat and extend the experiment of Sorenson and Embree [2], who compared the performance of the DEIM CUR and the LS-D CUR on this dataset by (1) calculating the error of each CUR approximation for varying values of c, r, and (2) assessing if the top 15 probes selected by each CUR algorithm separate the patients into those with and without a tumor. We extend this experiment by adding (1) relative error results for the SF CUR, QR CUR, and SVD, (2) computation times for each matrix approximation, (3) metrics to compare the overall probe selection of each matrix approximation method, and (4) results when selecting the top 5, 10, ..., 100 probes. We also include relative error and computation times for the LS-R CUR on this dataset, but exclude it from the probe selection comparison since it is not deterministic.

The GSE10072 dataset, G ∈ ℝ^{22,283 × 107}, contains gene expression data for 107 patients, of which 58 have a lung tumor and 49 do not. All entries of G are positive, and larger entries represent a greater reaction to a probe. Each row of G is centered using its mean. We approximate G^T (so that probes, i.e., columns, are selected first in the SF algorithm) and again use the Frobenius norm for the relative error calculation (instead of ||G − CUR||_2 as in Sorenson and Embree [2]). To assess how well a probe separates the patients into two classes, the number of patients in each class with a (mean-centered) entry in G^T greater than one for that probe is counted. As mentioned in Sorenson and Embree [2], there are 23 probes for which at least 30 patients with a tumor have an entry greater than one, and 95 probes for which at least 30 patients without a tumor have an entry greater than one. No probe is included in both of these sets.

Figure 3 presents the relative error and Figure 4 presents the computation time for each CUR and SVD approximation. Values of c, r vary over {5, 10, ..., 105, 107} and for each CUR approximation c = r. For the rank-k SVD, k = c = r. We report the average relative error and computation time over five runs for the LS-R CUR, along with the corresponding standard deviations. For both LS-D and LS-R CUR, the rank parameter for leverage score computation was 2. The SF CUR and LS-D CUR have similar relative errors for all values of c, r; however, the SF CUR generally takes about 5 s to compute, whereas the LS-D CUR generally takes about 0.1 s. The QR CUR achieves lower relative error than the SF CUR for c, r ≥ 20, and the DEIM CUR achieves the lowest relative error of the CUR approximations for every c, r value. The LS-R CUR has an average relative error lower than that of the SF CUR for c, r ≤ 65. The SF CUR takes the longest to compute, while the other methods have relatively similar computation times under 0.5 s. These trends are fairly similar to the relative error and computation time trends seen in the previous experiments of Section 5.1.

Figure 3

Figure 3. Relative error of CUR approximations and the rank-k SVD on gene expression data. The rank of the SVD approximation is the same as the number of selected columns/rows for the CUR approximations.

Figure 4

Figure 4. Time of CUR approximations and the rank-k SVD on gene expression data. The rank of the SVD approximation is the same as the number of selected columns/rows for the CUR approximations.

Next, we determine probe selection performance for each CUR and SVD approximation. For each CUR approximation, we set c to the corresponding number of probes, i.e., 5, 10, ..., 100, and report the selected probes (i.e., columns of G^T). For the rank-k SVD, we perform PCA using a rank-2 SVD since the two classes (tumor and no tumor) are separated well when the data are projected onto the leading two principal axes [2],4 and then select the c probes that have the largest correlation (in absolute value) with either the first or second principal component. To compare the probe selection performance of the five methods, we compute the absolute value of the difference between the number of entries greater than one in G^T for patients with and without a tumor for each selected probe in each method. See Table 3 for an example of probe selection results using c = 15.5 To quantify the performance of each method, we calculate the median and mean of the c differences (reported in Table 4) and their standard deviation (reported in Table 5).
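A hedged MATLAB sketch of this separation metric is given below. Here Gc is the mean-centered data with patients as rows and probes as columns (i.e., G^T after row-centering G), tumor is a logical vector over the 107 patients, and probes holds the c column indices selected by a given method; these variable names are illustrative.

hits     = Gc(:, probes) > 1;              % (mean-centered) entries greater than one
nTumor   = sum(hits(tumor, :), 1);         % counts among patients with a tumor
nHealthy = sum(hits(~tumor, :), 1);        % counts among patients without a tumor
diffs    = abs(nTumor - nHealthy);         % per-probe difference between classes
summary  = [median(diffs), mean(diffs), std(diffs)];   % metrics of Tables 4 and 5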

Table 3

Table 3. Probe selection results for c = 15.

Table 4

Table 4. Comparison of probe selection methods using the metrics of median (MDN) and mean difference between classes.

Table 5

Table 5. Standard deviation results for probe selection methods.

The probes selected by SF CUR and LS-D CUR perform very well in separating patients with a tumor from those without a tumor, as they have larger median and mean differences and smaller standard deviations of differences than the probes selected by the other methods. The probes selected by SVD also perform fairly well in this task due to their fairly large median and mean differences, but they exhibit standard deviations that are generally double those of the SF CUR and LS-D CUR probes. The probes selected by DEIM CUR and QR CUR perform poorly in median and mean difference and exhibit standard deviations that are generally double those of the SF CUR and LS-D CUR probes as well.

While the SF CUR and LS-D CUR methods achieve very similar results,6 the SF CUR outperforms the LS-D CUR in this experiment. For three out of 20 values of c, the probes selected by the two methods produce equal values for the median and mean difference, and standard deviation of differences (since they select the same exact probes). However, the SF CUR probes achieve a median difference greater than or equal to that of the LS-D CUR probes for all values of c, and a mean difference greater than or equal to that of the LS-D CUR probes for 12 out of 20 values of c. Additionally, the standard deviation of differences for the SF CUR probes is less than that of the LS-D CUR probes for 15 out of 20 values of c.

6 Protein expression discriminant analysis with CUR

Finally, we present a novel application of CUR as a feature selection method for a clustering algorithm on protein expression data. We modify the clustering analysis of Higuera et al. [14] in which discriminant proteins that critically affect learning in wild type and trisomic mice were discovered in biologically relevant pairwise class comparisons with clustering provided by SOMs and feature selection provided by the Wilcoxon rank-sum test. Specifically, we demonstrate the use and effectiveness of CUR as the feature selection method in a subset of these computational experiments on the same dataset used by Higuera et al. [24]. We compare the performance of multiple CUR algorithms in this application.

6.1 Prior computational experiments

In this section, we provide a summary of the dataset used and computational experiments (MATLAB R2011b) performed in Higuera et al. [14].

6.1.1 Data

The protein expression data used by Higuera et al. was created in prior research [25, 26]. The data were measured from two groups of mice, control and trisomic. Each mouse was exposed to one option from each of two treatments:

1. Context fear conditioning (CFC), an associative learning assessment task, of either context-shock (CS) or shock-context (SC), and

2. an injection of memantine, a drug known to treat learning impairment, or saline.

The CFC task consisted of placing a mouse in a novel cage and allowing it to explore. The context-shock option involves giving the mouse an electric shock after a few minutes of cage exploration, whereas the shock-context option involves an immediate shock to the mouse before exploration. Control mice given the context-shock option will learn an association between the cage and shock, whereas those given the shock-context option will not. However, trisomic mice given the context-shock option will not learn the association between the cage and shock. Thus, the second treatment, an injection of memantine or saline, is given prior to the CFC task. Trisomic mice injected with memantine before the CFC context-shock task will learn the association as the control mice do. Learning in control mice is not affected by memantine injection. Table 6 summarizes the eight classes of mice and presents class size and type of learning. For the remainder of this study, we will refer to classes by their names in Table 6.

Table 6

Table 6. Classes of 72 total mice.

Data consist of expression levels for 77 proteins measured from the brains of the 72 mice represented in Table 6. Each protein was measured 15 times for each mouse, giving a total of 1,080 measurements of 77 proteins, resulting in a data matrix that is 1,080 × 77. For each of the 1,080 observations, the mouse ID and class of the mouse that produced the measurements are also provided. The dataset is available in the supporting information of Higuera et al. [14] and in the University of California, Irvine Machine Learning Repository [24].

Missing data arises as a consequence of the protein measurement process. Higuera et al. processed the data by (1) removing an outlier mouse with mainly missing data, (2) filling in missing entries, and (3) normalizing the data. For any mice in class c missing data for protein p, the missing entries were replaced with the average expression level of protein p from the reported (non-missing) entries for mice in class c. Min-max normalization was then applied to each column of the data, i.e., for each protein. When we analyzed the raw data, we could not identify the outlier mouse and suspected that the data for this mouse had already been removed from the dataset before it was posted for download. We found that 1.7% of entries in the data were missing, with only six out of 77 proteins missing 20 or more measurements out of the 1,080 total. We also discovered that two columns of the raw data are equal; these columns correspond to the proteins ARC_N and pS6_N. These columns are clearly the same after the data are processed as well, and thus make the matrix of protein expression data rank deficient.

6.1.2 Methodology

We first give a high-level overview of the methodology and then follow with more details. SOMs and the Wilcoxon rank-sum test7 were used to discover discriminant proteins in biologically relevant pairwise class comparisons. An SOM is an unsupervised neural network clustering method that can identify the topology and distribution of data such that clusters that exist close together in the topology should cluster similar data points. It is useful for dimension reduction and provides a 2D visualization of the data. An SOM is used in this case to cluster mice with similar protein expression levels to discover protein expression patterns among classes. The data provided to the SOM included the protein expression data, but not the class of each mouse. The Wilcoxon rank-sum test is then used to find discriminant proteins between pairs of SOM “class-specific clusters” of mice (details below). This non-parametric test checks for equal medians between two independent samples of data, which are not necessarily of the same size.

Higuera et al. used this method on (1) data from the four control classes, (2) data from the four trisomic classes, and (3) a mixture of the two (c-CS-s, c-CS-m, t-CS-m, t-CS-s, and c-SC-s). We explored feature selection with CUR instead of the Wilcoxon rank-sum test on data from the four control classes only. The computational experiments by Higuera et al. on the four control classes are detailed below.

Initially, ten 7 × 7 SOMs are computed on the processed data from the four control classes, a 570 × 77 dataset. Since SOM neuron weights are initialized randomly, each SOM instance will most likely be different. The average quantization error, q, is measured for each SOM as

q = \frac{1}{n}\sum_{i=1}^{n} \|d_i - w_{BMU(d_i)}\|_2,

where n is the number of observations, d_i ∈ ℝ^{77} is an observation (a row of the data matrix), and w_{BMU(d_i)} ∈ ℝ^{77} is the weight vector of the Best Matching Unit (BMU) or closest neuron to d_i. The SOM with the smallest average quantization error is then used to identify “class-specific clusters," defined in Higuera et al. [14] as “(i) two or more adjacent [neurons] that contain mice of the same class and no mice from other classes, or (ii) a single [neuron] that contains ≥ 80% (or ≥ 12 of 15) of the measurements of one mouse and no measurements of mice from any other class.” While the class of each mouse was not used in the learning process of the SOM, the class of each mouse is used to determine the class-specific clusters. Two class-specific clusters can be compared using the weight vectors of the neurons included in the cluster and 77 instances of the Wilcoxon rank-sum test, one for each protein (each neuron weight vector is length 77). For example, to compare levels of the protein in column 5 of the dataset, the Wilcoxon rank-sum test would use two samples: one created from the fifth element of each neuron weight vector for the neurons in the first class-specific cluster, and one created from the fifth element of each neuron weight vector for the neurons in the second class-specific cluster. Those proteins for which the Wilcoxon rank-sum test returns a p-value of < 0.05 are considered to be the discriminant proteins between the two class-specific clusters and thus the two classes they represent.
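A hedged MATLAB sketch of the average quantization error for one trained SOM is given below; D is the 570 x 77 data matrix, Wsom is the 49 x 77 matrix of neuron weight vectors (one row per unit of the 7 x 7 map), and both names are illustrative.

nObs = size(D, 1);
q = 0;
for i = 1:nObs
    d = vecnorm(D(i,:) - Wsom, 2, 2);   % distance from observation i to every neuron
    q = q + min(d);                     % distance to the Best Matching Unit
end
q = q / nObs;                           % average quantization error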

A new 7 × 7 SOM is then created using data for the discriminant proteins only (a 570 × k matrix, where k is the number of discriminant proteins). Class-specific clusters are identified in this SOM. The discriminant proteins are validated through a qualitative analysis of this SOM, which uses the number of mixed-class neurons and the number of observations in mixed-class neurons as metrics of how well the discriminant proteins cluster the data.

In particular, Higuera et al. found the common discriminant proteins in four pairwise comparisons involving successful learning (c-CS-s vs. c-SC-s, c-CS-m vs. c-SC-m, c-CS-m vs. c-SC-s, and c-CS-s vs. c-SC-m) and then used these results in two other computational experiments.

1. Experiment 1 discriminant proteins: The union of those between c-CS-m and c-CS-s and the common discriminant proteins between the four successful learning comparisons.

2. Experiment 2 discriminant proteins: The union of those between c-SC-m and c-SC-s and the common discriminant proteins between the four successful learning comparisons.

Each experiment produced an SOM that was qualitatively analyzed to validate the selection of discriminant proteins. In the following two subsections of this article, we describe how we used CUR as a feature selection method in this methodology and provide results for its use in Experiments 1 and 2.

6.2 Feature selection using CUR

To use CUR as a feature selection method between two class-specific clusters, we construct a matrix D that contains pairwise differences between neuron weight vectors in opposite clusters. For example, if cluster A contains a neurons (call their weight vectors A1, ..., Aa) and cluster B contains b neurons (call their weight vectors B1, ..., Bb), then D ∈ ℝab × 77, and

D(j + b(i-1), :) = A_i - B_j,

for i ∈ [a] and j ∈ [b]. We then compute 77 CUR approximations of D; one for each possible number of columns to select for the matrix C, i.e., 1, 2, ..., 77.8 For this particular application, we are only interested in the subset of columns selected for C, and in each CUR approximation that we use, the columns are chosen first, independently of the rows. Hence, we could select any number of rows for the matrix R. We chose to use all rows and set R = D. To select a CUR approximation from the 77 calculated, we compute the Akaike information criterion (AIC) [27] and the Bayesian information criterion (BIC) [28] for each CUR model as given in the formulas below. Let D ∈ ℝm × 77, and CUR be the CUR approximation to D, where C ∈ ℝm × c. Then,

\mathrm{AIC} = 2(mc + 3) + 77m \ln\left( \frac{\| D - CUR \|_F^2}{77m} \right),

and

\mathrm{BIC} = (mc + 3)\ln(77m) + 77m \ln\left( \frac{\| D - CUR \|_F^2}{77m} \right).

The CUR model with the lowest AIC score and the CUR model with the lowest BIC score are selected. The columns chosen to be in the matrix C of each CUR correspond to the discriminant proteins. We then train two SOMs: the first using the discriminant proteins from the CUR model with the lowest AIC and the second using the discriminant proteins from the CUR model with the lowest BIC. Hence, using CUR for feature selection will result in two possibilities for the set of discriminant proteins, which can be compared through a qualitative assessment of the SOMs trained on each set.
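A minimal MATLAB sketch of this selection procedure is given below, assuming clusterA (a × 77) and clusterB (b × 77) hold the neuron weight vectors of the two class-specific clusters and that select_columns(D, c) is a placeholder for any of the column-selection routines (SF, LS-D, DEIM, or QR CUR). With R = D, we compute the approximation as the projection C*pinv(C)*D; the U used by a particular algorithm may differ in detail.

% Sketch of CUR-based selection of discriminant proteins (not the authors' code).
% select_columns(D, c) is a hypothetical placeholder returning c column indices.
a = size(clusterA, 1);  b = size(clusterB, 1);
D = zeros(a * b, 77);
for i = 1:a
    for j = 1:b
        D(j + b * (i - 1), :) = clusterA(i, :) - clusterB(j, :);
    end
end

m = size(D, 1);
AIC = inf(1, 77);  BIC = inf(1, 77);  cols = cell(1, 77);
for c = 1:77
    cols{c} = select_columns(D, c);   % hypothetical column-selection call
    C = D(:, cols{c});
    approx = C * (pinv(C) * D);       % with R = D, CUR reduces to C*pinv(C)*D
    rss = norm(D - approx, 'fro')^2;
    AIC(c) = 2 * (m * c + 3) + 77 * m * log(rss / (77 * m));
    BIC(c) = (m * c + 3) * log(77 * m) + 77 * m * log(rss / (77 * m));
end
[~, cAIC] = min(AIC);  proteinsAIC = cols{cAIC};   % discriminant proteins (AIC)
[~, cBIC] = min(BIC);  proteinsBIC = cols{cBIC};   % discriminant proteins (BIC)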

6.3 Results

We repeated Experiments 1 and 2 of Higuera et al. with two exceptions: (1) we used CUR as the feature selection process instead of the Wilcoxon rank-sum test, and (2) each time an SOM was needed, we trained 10 SOMs and chose the one with the minimum number of mixed-class neurons. If multiple SOMs had the minimum number of mixed-class neurons, we chose the SOM with the minimum number of observations in mixed-class neurons. We focused on these metrics because of their importance in the qualitative analysis of the discriminant protein SOMs in Higuera et al. [14]. We compared the performance of four CUR algorithms in this feature selection task (SF CUR, LS-D CUR, DEIM CUR, and QR CUR), as well as that of the Wilcoxon rank-sum test. In all experiments, the LS-D CUR rank parameter for leverage score computation was 2. All experiments were run in MATLAB R2023b on a laptop with 16 GB of RAM and a 2.20 GHz Intel Core i7-1360P processor.9
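As a rough sketch of this SOM selection loop (not the authors' code), the following MATLAB snippet uses selforgmap, train, and vec2ind from the Deep Learning Toolbox with default parameters, as noted in footnote 9; data (observations × proteins) and labels (a vector of mouse classes) are assumed variable names.

% Train 10 SOMs and keep the one with the fewest mixed-class neurons,
% breaking ties by the fewest observations in mixed-class neurons.
bestNeurons = inf;  bestObs = inf;  bestNet = [];
for trial = 1:10
    net = selforgmap([7 7]);               % 7 x 7 SOM, default parameters
    net = train(net, data');               % toolbox expects features x samples
    winners = vec2ind(net(data'));         % winning neuron for each observation
    nMixed = 0;  nMixedObs = 0;
    for neuron = unique(winners)
        inNeuron = (winners == neuron);
        if numel(unique(labels(inNeuron))) > 1   % more than one class present
            nMixed = nMixed + 1;
            nMixedObs = nMixedObs + nnz(inNeuron);
        end
    end
    if nMixed < bestNeurons || (nMixed == bestNeurons && nMixedObs < bestObs)
        bestNeurons = nMixed;  bestObs = nMixedObs;  bestNet = net;
    end
end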

Figure 5 presents the SOM using all 77 proteins. Neurons are labeled with the classes of mice they contain in sorted order, i.e., the first class listed is the majority class, and the number of observations contained in each neuron is indicated. Neurons are colored by their majority class—c-CS-m: yellow, c-CS-s: green, c-SC-m: tan, c-SC-s: orange—and a bold outline of a neuron represents class-specific cluster membership—c-CS-m class cluster: red outline, c-CS-s class cluster: green outline, c-SC-m class cluster: brown outline, c-SC-s class cluster: black outline. This color scheme is based on that in Higuera et al. [14].

Figure 5. SOM using all 77 proteins. Each square in the 7 × 7 grid represents a neuron, labeled with the classes of mice and the number of observations it contains, colored by majority mouse class, and outlined (by color) according to class-specific cluster membership.

We define a mixed-CS-class neuron as a mixed-class neuron that includes either c-CS-m or c-CS-s observations, and a mixed-SC-class neuron as a mixed-class neuron that includes either c-SC-m or c-SC-s observations. As a reference for comparison in the results of Experiments 1 and 2, the SOM in Figure 5 has five mixed-CS-class neurons, which contain 84 observations, and four mixed-SC-class neurons, which contain 54 observations. The goal of Experiment 1 is to select discriminant proteins such that when an SOM is trained on the discriminant protein data only, the SOM improves, i.e., has a smaller number of mixed-CS-class neurons and observations contained within those neurons, as compared to the SOM in Figure 5 that was trained on all protein data. The goal of Experiment 2 is similar, except that the discriminant protein SOM should have a smaller number of mixed-SC-class neurons and observations contained within those neurons. Since each CUR algorithm results in two potential sets of discriminant proteins (one for the CUR with minimum AIC and one for the CUR with minimum BIC), we present results for nine feature selection methods: (1) Wilcoxon rank-sum test, (2-3) SF CUR AIC/BIC, (4-5) LS-D CUR AIC/BIC, (6-7) DEIM CUR AIC/BIC, and (8-9) QR CUR AIC/BIC.

6.3.1 Experiment 1

For each feature selection method, we (1) identified the common discriminant proteins between the four successful learning comparisons (c-CS-s vs. c-SC-s, c-CS-m vs. c-SC-m, c-CS-m vs. c-SC-s, and c-CS-s vs. c-SC-m) and (2) identified the discriminant proteins between the c-CS-m and c-CS-s classes. The union of these two sets of proteins is the set of discriminant proteins. We present the results of Experiment 1 in Table 7. In addition, Figure 6 contains the discriminant protein SOMs for the Wilcoxon rank-sum test and each CUR algorithm using the AIC model selection criterion. Since the CUR algorithms using the AIC criterion generally outperformed those using the BIC criterion, the discriminant protein SOMs for the CUR algorithms using the BIC model selection criterion are found in Supplementary Figure S1.

Table 7. Experiment 1 results.

Figure 6. Experiment 1 discriminant protein SOMs for the Wilcoxon rank-sum test and CUR algorithms using the AIC model selection criterion: (a) Wilcoxon rank-sum test, (b) SF CUR, AIC, (c) LS-D CUR, AIC, (d) DEIM CUR, AIC, and (e) QR CUR, AIC. Each panel is a 7 × 7 SOM grid with neurons labeled, colored, and outlined as in Figure 5.

The Wilcoxon rank-sum test performs the best of the feature selection methods, resulting in two mixed-CS-class neurons and 19 observations in those neurons, which is by far the minimum number of observations in mixed-CS-class neurons. Notably, the Wilcoxon rank-sum test not only performs the best but also selects the fewest discriminant proteins. Among the CUR algorithms, QR CUR—AIC performs the best, resulting in only three mixed-CS-class neurons containing 47 observations. All CUR algorithms except QR CUR—AIC, QR CUR—BIC, and DEIM CUR—AIC perform worse than the baseline of no feature selection. The QR CUR—BIC results are mixed: it produces two more mixed-CS-class neurons than the baseline, but those neurons contain fewer observations. While CUR-based feature selection did not perform as well as the Wilcoxon rank-sum test in this experiment, we will see a different result in Experiment 2.

6.3.2 Experiment 2

For each feature selection method, we (1) again identified the common discriminant proteins between the four successful learning comparisons (c-CS-s vs. c-SC-s, c-CS-m vs. c-SC-m, c-CS-m vs. c-SC-s, and c-CS-s vs. c-SC-m) and (2) identified the discriminant proteins between the c-SC-m and c-SC-s classes. The union of these two sets of proteins is the set of discriminant proteins. Results for Experiment 2 are presented in Table 8. The discriminant protein SOMs for the Wilcoxon rank-sum test and CUR algorithms using the AIC model selection criterion are given in Figure 7. The discriminant protein SOMs for CUR algorithms using the BIC model selection criterion are given in Supplementary Figure S2.

Table 8. Experiment 2 results.

Figure 7. Experiment 2 discriminant protein SOMs for the Wilcoxon rank-sum test and CUR algorithms using the AIC model selection criterion: (a) Wilcoxon rank-sum test, (b) SF CUR, AIC, (c) LS-D CUR, AIC, (d) DEIM CUR, AIC, and (e) QR CUR, AIC. Each panel is a 7 × 7 SOM grid with neurons labeled, colored, and outlined as in Figure 5.

The best-performing feature selection methods are DEIM CUR—AIC and DEIM CUR—BIC, each resulting in two mixed-SC-class neurons containing 42 observations. The Wilcoxon rank-sum test also performs well, resulting in two mixed-SC-class neurons containing 47 observations. While SF CUR—AIC results in three mixed-SC-class neurons, it achieves the minimum number of observations in mixed-SC-class neurons with 37; however, it selects 49 discriminant proteins, at least 20 more than any other feature selection method. QR CUR—AIC performs relatively well, though not as well as the Wilcoxon rank-sum test, as it results in three mixed-SC-class neurons containing 52 observations. SF CUR—BIC, LS-D CUR—AIC, and LS-D CUR—BIC perform poorly compared to the other feature selection methods. In addition, these three methods and QR CUR—BIC perform worse than the baseline of no feature selection.

The results of Experiments 1 and 2 demonstrate that CUR-based feature selection can be effective for this application, but performance is both data- and algorithm-dependent. Nonetheless, we demonstrated that DEIM CUR is an excellent option for feature selection in Experiment 2.

7 Conclusion

We have presented SF CUR, a novel CUR matrix approximation method based on convex optimization, together with supporting theory and numerical experiments. Specifically, the SF CUR algorithm uses the surrogate functional [13] to solve the convex optimization problems that arise in the method. To the best of our knowledge, this is the only CUR method using convex optimization that solves for C and R separately and allows the user to choose the number of columns and rows for inclusion in C and R, respectively. In addition, we extended the theory of the surrogate functional technique to apply to SF CUR. We numerically demonstrated the use of SF CUR on sparse and dense data. Specifically, we (1) calculated its relative error and computation time on a document-term dataset and a gene expression dataset, and (2) used it as a feature selection method on the gene expression dataset to classify patients as those with or without a tumor, as in Sorensen and Embree [2]. We compared the performance of SF CUR on these numerical tasks to that of the SVD and other complementary CUR approximations. We found that while the SVD provides the optimal approximation to each dataset, SF CUR performed the best in selecting probes to separate patient classes on the gene expression dataset, with LS-D CUR a close second in performance. However, the computational time of the SF CUR is about three orders of magnitude higher than that of the LS-D CUR and at least one order of magnitude higher than that of the other CUR approximations used in this feature selection experiment. The computational time of the SF CUR is a current limitation of the method; hence, we recommend using SF CUR on small to medium datasets for which computational resources and time constraints are not an issue.

We also presented a novel application of CUR to determine discriminant proteins when clustering protein expression data in an SOM. These computational experiments were based on those in Higuera et al. [14], with the exception that CUR was used as the feature selection method instead of the Wilcoxon rank-sum test. We compared the performance of SF CUR with that of the Wilcoxon rank-sum test and other complementary CUR approximations. We performed two computational experiments and found that performance varied between datasets and CUR algorithms. While CUR-based feature selection performance was generally poor in Experiment 1, multiple CUR algorithms performed well in Experiment 2. In fact, DEIM CUR—AIC and DEIM CUR—BIC were the best-performing feature selection methods in Experiment 2. In addition, to the best of our knowledge, this was the first use of CUR on protein expression data.

While we have shown the effectiveness of CUR as a feature selection method in numerical experiments on gene and protein expression datasets, its performance in these applications is both data- and algorithm-dependent. SF CUR and LS-D CUR performed very well on the gene expression dataset, whereas DEIM CUR and QR CUR did not. On the protein expression dataset, QR CUR—AIC performed moderately well in Experiment 1 while all other CUR algorithms performed poorly, and DEIM CUR performed very well in Experiment 2 whereas LS-D CUR, for example, did not. This naturally leads to the research question of how to choose a CUR algorithm for a given dataset and/or task. We have begun to explore CUR algorithm performance in terms of dataset properties such as sparsity and spectrum, but we do not yet have conclusive results. Future work may include further investigation into why certain CUR algorithms perform better than others in particular applications or on particular datasets, which could lead to a CUR algorithm selection framework. Other potential areas for future work include generalizing the SF CUR objective function, as mentioned in Section 3.3, and speeding up the SF CUR implementation.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

KL: Conceptualization, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. RB: Conceptualization, Formal analysis, Methodology, Supervision, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. Radu Balan was partially supported by NSF under DMS-2108900.

Acknowledgments

The authors acknowledge Research Computing at the University of Virginia for providing computational resources that have contributed to the results reported within this article (URL: https://rc.virginia.edu) and thank Sallie Keller for her support in the early stages of the research.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fams.2025.1632218/full#supplementary-material

Footnotes

1. ^Also known as Kohonen Maps.

2. ^Small is relative to the user's computer memory size.

3. ^The data were downloaded using the function sklearn.datasets.fetch_20newsgroups and vectorized using sklearn.feature_extraction.text.TfidfVectorizer. The processing was completed as described using options in these functions. The stop word list used was the built-in list provided in scikit-learn, and the TFIDF calculations were based on the default parameter settings.

4. ^Sorensen and Embree cite [29] for this result; however, this preprint was withdrawn from the arXiv.

5. ^For this particular example, SF CUR and LS-D CUR returned the exact same set of 15 probes, hence these results are reported together in Table 3a.

6. ^These two methods select many of the same probes for each value of c. See Supplementary Table S1.

7. ^Also called the Mann–Whitney U-test.

8. ^Since two columns of the raw protein expression data are equal, D also has this property. Thus, the SF CUR may fail to produce a selection of columns for particular values of the number of columns to select. In these cases, we ignore these values of the number of columns to select.

9. ^The SOM implementation in MATLAB's Deep Learning Toolbox was used with the default parameters except the SOM size.

References

1. Mahoney MW, Drineas P. CUR matrix decompositions for improved data analysis. Proc Nat Acad Sci. (2009) 106:697–702. doi: 10.1073/pnas.0803205106

2. Sorensen DC, Embree M. A DEIM induced CUR factorization. SIAM J Sci Comput. (2016) 38:A1454–82. doi: 10.1137/140978430

3. Liu Y, Shao J. High dimensionality reduction using CUR matrix decomposition and auto-encoder for web image classification. In: Qiu G, Lam KM, Kiya H, Xue XY, Kuo CCJ, Lew MS, editors. Advances in Multimedia Information Processing - PCM 2010. Berlin, Heidelberg: Springer Berlin Heidelberg (2010). p. 1–12. doi: 10.1007/978-3-642-15696-0_1

4. Esmaeili A, Joneidi M, Salimitari M, Khalid U, Rahnavard N. Two-way spectrum pursuit for CUR decomposition and its application in joint column/row subset selection. In: 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). Gold Coast: IEEE (2021). p. 1–6. doi: 10.1109/MLSP52302.2021.9596233

5. Li C, Wang X, Dong W, Yan J, Liu Q, Zha H. Joint active learning with feature selection via CUR matrix decomposition. IEEE Trans Pattern Anal Mach Intell. (2019) 41:1382–96. doi: 10.1109/TPAMI.2018.2840980

6. Hamm K, Huang L. Perspectives on CUR decompositions. Appl Comput Harmon Anal. (2020) 48:1088–99. doi: 10.1016/j.acha.2019.08.006

7. Goreinov SA, Tyrtyshnikov EE, Zamarashkin NL. A theory of pseudoskeleton approximations. Linear Algebra Appl. (1997) 261:1–21. doi: 10.1016/S0024-3795(97)80059-6

8. Drineas P, Kannan R, Mahoney MW. Fast Monte Carlo algorithms for matrices III: computing a compressed approximate matrix decomposition. SIAM J Comput. (2006) 36:184–206. doi: 10.1137/S0097539704442702

9. Drineas P, Mahoney MW, Muthukrishnan S. Relative-error CUR matrix decompositions. SIAM J Matrix Anal Appl. (2008) 30:844–81. doi: 10.1137/07070471X

10. Stewart GW. Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix. Numer Math. (1999) 83:313–23. doi: 10.1007/s002110050451

11. Mairal J, Jenatton R, Obozinski G, Bach F. Convex and network flow optimization for structured sparsity. J Mach Learn Res. (2011) 12:2681–720.

12. Bien J, Xu Y, Mahoney MW. CUR from a sparse optimization viewpoint. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1. NeurIPS'10. Red Hook, NY: Curran Associates Inc. (2010). p. 217–25.

13. Daubechies I, Defrise M, De Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun Pure Appl Math. (2004) 57:1413–57. doi: 10.1002/cpa.20042

14. Higuera C, Gardiner KJ, Cios KJ. Self-organizing feature maps identify proteins critical to learning in a mouse model of Down syndrome. PLoS ONE. (2015) 10:e0129126. doi: 10.1371/journal.pone.0129126

15. Dong Y, Martinsson PG. Simpler is better: a comparative study of randomized pivoting algorithms for CUR and interpolative decompositions. Adv Comput Math. (2023) 49:66. doi: 10.1007/s10444-023-10061-z

16. Ida Y, Kanai S, Fujiwara Y, Iwata T, Takeuchi K, Kashima H. Fast deterministic CUR matrix decomposition with accuracy assurance. In: Daumé III H, Singh A, editors. Proceedings of the 37th International Conference on Machine Learning, Vol. 119. PMLR (2020). p. 4594–603.

17. Peng Z, Luo M, Li J, Liu H, Zheng Q. ANOMALOUS: a joint modeling approach for anomaly detection on attributed networks. In: Lang J, editor. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. International Joint Conferences on Artificial Intelligence Organization (2018). p. 3513–9. doi: 10.24963/ijcai.2018/488

18. Grant M, Boyd S. CVX: Matlab Software for Disciplined Convex Programming, version 2.1. (2014). Available online at: http://cvxr.com/cvx (Accessed January 24, 2024).

19. Grant M, Boyd S. Graph implementations for nonsmooth convex programs. In: Blondel V, Boyd S, Kimura H, editors. Recent Advances in Learning and Control. Lecture Notes in Control and Information Sciences. Cham: Springer-Verlag Limited (2008). p. 95–110. doi: 10.1007/978-1-84800-155-8_7

20. Parikh N, Boyd S. Proximal algorithms. Found Trends Optim. (2014) 1:127–239. doi: 10.1561/2400000003

21. Rennie J. 20 Newsgroups. (2008). Available online at: http://people.csail.mit.edu/jrennie/20Newsgroups/ (Accessed March 7, 2024).

22. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. (2011) 12:2825–30.

23. Landi M, Dracheva T, Rotunno M, Figueroa J, Liu H, Dasgupta A, et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS ONE. (2008) 3:e1651. doi: 10.1371/journal.pone.0001651

24. Higuera C, Gardiner K, Cios K. Mice protein expression. UCI Machine Learning Repository. (2015). doi: 10.24432/C50S3Z

25. Ahmed MM, Dhanasekaran AR, Block A, Tong S, Costa ACS, Gardiner KJ. Protein profiles associated with context fear conditioning and their modulation by memantine. Mol Cell Proteom. (2014) 13:919–37. doi: 10.1074/mcp.M113.035568

26. Ahmed MM, Dhanasekaran AR, Block A, Tong S, Costa ACS, Stasko M, et al. Protein dynamics associated with failed and rescued learning in the Ts65Dn mouse model of Down syndrome. PLoS ONE. (2015) 10:1–25. doi: 10.1371/journal.pone.0119491

27. Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F, editors. 2nd International Symposium on Information Theory. Budapest, Hungary: Akadémiai Kiadó (1973). p. 267–81.

28. Schwarz G. Estimating the dimension of a model. Ann Stat. (1978) 6:461–4. doi: 10.1214/aos/1176344136

29. Kundu A, Nambirajan S, Drineas P. Identifying influential entries in a matrix. arXiv [Preprint]. (2013). arXiv:1310.3556. doi: 10.48550/arXiv.1310.3556

Keywords: CUR matrix approximation, convex optimization, low-rank matrix approximation, feature selection, interpretation

Citation: Linehan K and Balan R (2025) CUR matrix approximation through convex optimization for feature selection. Front. Appl. Math. Stat. 11:1632218. doi: 10.3389/fams.2025.1632218

Received: 20 May 2025; Accepted: 15 July 2025;
Published: 21 August 2025.

Edited by:

Lixin Shen, Syracuse University, United States

Reviewed by:

Jianqing Jia, Syracuse University, United States
Rongrong Lin, Guangdong University of Technology, China

Copyright © 2025 Linehan and Balan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kathryn Linehan, kjl5t@virginia.edu
