# Frontiers in Applied Mathematics and Statistics

## Original Research Article

Provisionally accepted. The full text will be published soon.

Front. Appl. Math. Stat. | doi: 10.3389/fams.2017.00024

# Better Autologistic Regression

• Shanghai Center for Mathematical Sciences, Fudan University, China

Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine) to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent.

Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding (the two numbers used to represent the two possible states of the variables) might differ. Common coding choices are (zero, one) and (minus one, plus one). Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modelling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.
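To make the coding issue concrete, the following minimal sketch enumerates a tiny standard-form autologistic model by brute force under both codings. It is an illustration only, not the article's own code. The assumed form is P(z) proportional to exp(sum_i alpha_i z_i + lam * sum over edges of z_i z_j); the function names, the three-node chain graph, and the parameter values are all hypothetical choices made for the example.

```python
import itertools
import math


def autologistic_pmf(alpha, lam, edges, coding):
    """Brute-force PMF of an assumed standard-form autologistic model.

    alpha  : list of unary parameters, one per node
    lam    : single pairwise association parameter (assumed shared by all edges)
    edges  : list of (i, j) index pairs defining the graph
    coding : (low, high) numeric values for the two states, e.g. (0, 1) or (-1, 1)
    """
    low, high = coding
    n = len(alpha)
    weights = {}
    for z in itertools.product((low, high), repeat=n):
        unary = sum(a * zi for a, zi in zip(alpha, z))
        pairwise = lam * sum(z[i] * z[j] for i, j in edges)
        # Key each configuration by its low/high pattern (0 = low, 1 = high)
        # so that results from different codings can be compared directly.
        key = tuple(int(zi == high) for zi in z)
        weights[key] = math.exp(unary + pairwise)
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def marginal_high(pmf, i):
    """Probability that node i is in the 'high' state."""
    return sum(p for key, p in pmf.items() if key[i] == 1)


# Hypothetical example: three nodes in a chain, no unary effect, positive association.
alpha = [0.0, 0.0, 0.0]
lam = 0.5
edges = [(0, 1), (1, 2)]

pmf_zero_one = autologistic_pmf(alpha, lam, edges, coding=(0, 1))
pmf_plus_minus = autologistic_pmf(alpha, lam, edges, coding=(-1, 1))

p_high_zero_one = marginal_high(pmf_zero_one, 0)
p_high_plus_minus = marginal_high(pmf_plus_minus, 0)

print("P(node 0 high), (0, 1) coding:  ", p_high_zero_one)
print("P(node 0 high), (-1, +1) coding:", p_high_plus_minus)
```

With zero unary parameters, the plus/minus coding treats the two states symmetrically, while the zero/one coding lets the pairwise term reward only the "high-high" pairs and so pushes the marginal probability above one half. This shows only that the same parameter values are not interchangeable across codings. It does not reproduce the article's theoretical argument that the resulting regression models are non-nested.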

Keywords: probabilistic graphical models, Markov random fields, logistic regression, correlated binary random variables, spatial statistics

Received: 22 Mar 2017; Accepted: 10 Nov 2017.

Edited by:

Hau-tieng Wu, Duke University, United States

Reviewed by:

Antonio Calcagnì, University of Trento, Italy
John Hughes, University of Colorado Denver, United States

Copyright: © 2017 Wolters. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Mark A. Wolters, Fudan University, Shanghai Center for Mathematical Sciences, Shanghai, China, mark@mwolters.com