AUTHOR=Mhaskar Hrushikesh N., Cheng Xiuyuan, Cloninger Alexander TITLE=A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials JOURNAL=Frontiers in Applied Mathematics and Statistics VOLUME=Volume 6 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2020.00031 DOI=10.3389/fams.2020.00031 ISSN=2297-4687 ABSTRACT=In machine learning, we are given a dataset of the form $\{(x_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $\mu$, where the marginal distribution of the $x_j$'s is $\mu^*$ and the marginals $\mu_k^*(x)$ of the $k^{\mathrm{th}}$ class may overlap. We address the problem of detecting, with a high degree of certainty, for which $x$ we have $\mu_k^*(x)>\mu_i^*(x)$ for all $i\neq k$. We propose that, rather than using a positive kernel such as the Gaussian to estimate these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multivariate Hermite polynomials for this purpose, and prove optimal and local approximation results in a supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a ``witness function'' in classification problems. Thus, if the value of this estimator at a point $x$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs. out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. We demonstrate this on a number of real-world data sets, including the MNIST, CIFAR10, Science News documents, and LaLonde data sets.