AUTHOR=Mhaskar Hrushikesh N., Cheng Xiuyuan, Cloninger Alexander

TITLE=A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials

JOURNAL=Frontiers in Applied Mathematics and Statistics

VOLUME=Volume 6 - 2020

YEAR=2020

URL=https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2020.00031

DOI=10.3389/fams.2020.00031

ISSN=2297-4687

ABSTRACT=In machine learning, we are given a dataset of the form $\{(x_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $\mu$; the marginal distribution of the $x_j$'s is $\mu^*$, and the marginals $\mu_k^*(x)$ of the individual classes may overlap. We address the problem of detecting, with a high degree of certainty, for which $x$ we have $\mu_k^*(x)>\mu_i^*(x)$ for all $i\neq k$. We propose that, rather than using a positive kernel such as the Gaussian to estimate these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multivariate Hermite polynomials for this purpose, and prove optimal and local approximation results in the supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a ``witness function'' in classification problems. Thus, if the value of this estimator at a point $x$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs. out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. We demonstrate this on a number of real-world data sets, including MNIST, CIFAR10, Science News documents, and LaLonde data sets.