AUTHOR=Luck Stanley

TITLE=Inverse problems in covariate data analysis

JOURNAL=Frontiers in Applied Mathematics and Statistics

VOLUME=Volume 11 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2025.1646650

DOI=10.3389/fams.2025.1646650

ISSN=2297-4687

ABSTRACT=The fact that Pearson's correlation coefficient and effect size are perspective functions of covariance parameters demonstrates that the definition of covariance is one of the most important issues in data analysis. We suggest that covariance analyses for pairwise numeric, categorical, and mixed numeric-categorical data types are mathematically distinct problems. This is because of the disparate algebraic properties and systematic effects associated with numeric and categorical quantities. We examine the weighted least squares (WLS) formulation of linear regression and obtain definitions for heteroscedastic covariance and variance. Covariance and variance, as functions of centered variable vectors, are instrumental quantities. It is therefore essential that the instrumental effects cancel when the covariance is divided by the variance to estimate the slope in linear regression. The tensor product form of the covariance demonstrates that the composite properties of variable vectors are intrinsic to covariate data analysis. The solution of the inverse problem for linear regression takes the form of a relation between slope and covariance parameters, and requires the specification of an error model for the data; otherwise, the inverse problem is ill-posed. We propose that, in current practice, the term “effect size” is ambiguous because it does not distinguish between the different algebraic components of the inverse problem in a case-control data analysis. It is therefore necessary to identify the analogs of WLS covariance for case-control data and to distinguish between covariance and functional parameters in effect size analysis. The development of effect size methodology for studies of complex systems is complicated by the fact that the functional inverse problem is ill-posed.
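
The relation between slope, covariance, and variance in WLS can be illustrated with a minimal sketch. It uses the textbook weighted covariance and variance, which may differ in detail from the heteroscedastic definitions derived in the article. The function names, the sample data, and the choice of inverse-variance weights are assumptions made for illustration only. The sketch shows that the slope is a ratio of two quantities built from the same centered vectors, so the shared normalization by the total weight cancels.

```python
def weighted_mean(values, weights):
    """Weighted average of values (hypothetical helper, not from the article)."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def wls_covariance(x, y, weights):
    """Textbook weighted covariance of x and y about their weighted means."""
    x_bar = weighted_mean(x, weights)
    y_bar = weighted_mean(y, weights)
    total = sum(weights)
    return sum(
        w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y)
    ) / total


def wls_slope(x, y, weights):
    """WLS slope as weighted covariance divided by weighted variance of x."""
    return wls_covariance(x, y, weights) / wls_covariance(x, x, weights)


# Illustrative data lying exactly on y = 1 + 2x, with unequal assumed errors.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
sigma = [0.1, 0.2, 0.4, 0.8]                # assumed per-point standard errors
weights = [1.0 / s**2 for s in sigma]       # assumed inverse-variance weights

slope = wls_slope(x, y, weights)
intercept = weighted_mean(y, weights) - slope * weighted_mean(x, weights)

# Rescaling every weight by a constant leaves the slope unchanged, because
# the common factor cancels between the covariance and the variance.
rescaled_slope = wls_slope(x, y, [10.0 * w for w in weights])
```

Because the sample points are exactly collinear, any positive weights recover the same line. With noisy data the estimate depends on the weights, which is where the choice of error model enters.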