Degenerate Beta autoregressive model for proportion time-series with zeros or ones: An application to antimicrobial resistance rate using R shiny app

Lobo, Jevitha; Kamath, Asha; Kalwaje Eshwara, Vandana

doi:10.3389/fpubh.2022.969777

ORIGINAL RESEARCH article

Front. Public Health, 10 January 2023

Sec. Infectious Diseases: Epidemiology and Prevention

Volume 10 - 2022 | https://doi.org/10.3389/fpubh.2022.969777

This article is part of the Research TopicAntimicrobial Resistance and Antimicrobial AlternativesView all 12 articles

Degenerate Beta autoregressive model for proportion time-series with zeros or ones: An application to antimicrobial resistance rate using R shiny app

Updated

A correction has been applied to this article in:

Corrigendum: Degenerate Beta autoregressive model for proportion time-series with zeros or ones: an application to antimicrobial resistance rate using R shiny app
1. Read correction

Jevitha Lobo¹

Asha Kamath¹^*

Vandana Kalwaje Eshwara²

¹Department of Data Sciences, Prasanna School of Public Health, Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India
²Department of Microbiology, Kasturba Medical College of Manipal, Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India

Background: Antimicrobial resistance has emerged as one of the foremost public health troubles of the 21st century. This has ended in a public health disaster of the global situation, which threatens the exercise of present-day remedy. There is an urgent requirement for a cost-effective strategy to reduce antimicrobial resistance. Infectious disease control researchers most often analyze and predict antimicrobial resistance rate data that includes zeros or ones. Commonly used time-series analysis such as autoregressive moving average model is inappropriate for such data and may arrive at biased results.

Objective: This study aims to propose a time-series model for continuous rates or proportions when the interval of series includes zeros or ones and compares the model with existing models.

Data: The Escherichia coli, isolated from blood cultures showing variable susceptibility results to different antimicrobial agents, has been obtained from a clinical microbiology laboratory of a tertiary care hospital, Udupi district, Karnataka, during the years between 2011 and 2019.

Methodology: We proposed a Degenerate Beta Autoregressive model which is a mixture of continuous and discrete distributions with probability mass at zero or one. The proposed model includes autoregressive terms along with explanatory variables. The estimation is done using maximum likelihood with a non-linear optimization algorithm. An R shiny app has been provided for the same.

Results: The proposed Degenerate Beta Autoregressive model performed well compared to the existing autoregressive moving average models. The forecasted antimicrobial resistance rate has been obtained for the next 6 months.

Conclusion: The findings of this article could be beneficial to the infectious disease researchers to use an appropriate time-series model to forecast the resistance rate for the future and to have better or advance public health policies to control the rise in resistance rate.

1. Introduction

Antimicrobial resistance (AMR) is a serious problem in many developing countries. The World Health Organization (WHO) has categorized AMR as a serious health problem affecting the patients of various countries. In 2015, the WHO surveyed in its six regions called country situation analysis to examine the current practices and to determine the gaps which increase antimicrobial resistance. Eleven and eight countries from Asian and African continents, respectively, from low- to middle-income participated in the study. The analysis showed that AMR is a major issue in both the continents, and in the South-East Asia, nosocomial infections are of particular concern. The main cause of resistance is the inappropriate use of antimicrobial medicines and poor healthcare facilities (1).

In the past, to study antibiotic usage and resistance time-series, models such as Box-Jenkins ARIMA (autoregressive-integrated moving average) (2) and transfer function models have been used. Some studies used time-series to forecast trend and seasonality of the data with the exponential smoothing technique.

However, there is a limitation of these models in terms of accuracy of prediction or violation of assumptions in the applicability. Hence, this study proposes to develop and test stochastic models to predict antibiotic resistance rate, which helps us in understanding the pattern of resistance to plan strategies for the rational use of antibiotics.

Oftentimes, the data like antimicrobial resistance include data in the interval [0, 1) or (0, 1], when the bacteria is highly susceptible or resistant to the antibiotic. To model data with rates/proportions regression, models are proposed by Ferrari and Cribari-Neto (3), Mitnik and Baek (4), Pumi et al. (5), and Artur and Bazán (6) and the time-series models are proposed by Rocha and Cribari-Neto (7) and Bayer et al. (8). But these models are not appropriate for the proportion data with zeros or ones. To model such kind of data, we look for a mixture of distributions. Ospina and Ferrari (9) and Cribari-Neto and Santos (10) introduced inflated Beta distributions and inflated Kumaraswamy distributions which is a mixture of discrete and continuous distributions. Ospina and Ferrari (11) and Bayer et al. (12) introduced the inflated Beta regression model and inflated Kumaraswamy regression model. Currently, there is scope to use a time-series model for proportion data in the interval [0, 1) or (0, 1].

This paper proposes to model the time-series data in the interval [0, 1) or (0, 1] using a mixture of Degenerate and Beta distributions through a frequentist approach. This study is an extension of βARMA model proposed by Rocha and Cribari-Neto (7), where instead of Beta distribution, inflated Beta distribution is incorporated. The proposed model is compared with the ARIMA model.

The developed model is illustrated with an application based on antimicrobial resistance (AMR) data. Rates of Escherichia coli (E. coli) isolated from blood cultures showing variable susceptibility results to different antimicrobial agents are considered for the modeling in this study. E. coli is a gram-negative bacteria, most regularly isolated in patients with blood stream infection (BSI), and in severe instances, it may cause loss of life. The rates of BSI have accelerated steadily in recent years (13). However, knowledge of future resistance rate using forecasting may help in recommending new interventions or policy recommendations in hospital settings.

2. Materials and methods

2.1. Data description

The variable susceptibility of Escherichia coli (E. coli), isolated from blood cultures showing variable susceptibility results to different antimicrobial agents, has been obtained from a clinical microbiology laboratory of a tertiary care hospital, Udupi district, Karnataka, during the years between 2011 and 2019. Institutional ethical clearance was obtained from Kasturba Medical College and Kasturba Hospital Institutional Ethics Committee (IEC no. 832/2019). The laboratory generally receives more than 10,000 blood culture tests annually from patients who have been suspected of bacteremia/sepsis. The blood cultures yielded positive results in 20–30% of cases and E. coli is the most common among several other bacteria causing bacteremia in our patient population. We retrieved the data of antimicrobial susceptibility test (AST) results of this bacteria from the electronic records of the laboratory. Antimicrobial susceptibility tests were performed using the Kirby Bauer disc diffusion method until 2014 and by the Vitek-2 automated method after 2015, both of which are accepted and standardized test methods according to the Clinical Laboratory Standards Institute (Ref:CLSI supplement M100. Wayne, PA: Clinical and Laboratory Standards Institute; 2012). In the former method, the bacteria was considered resistant to the antimicrobial agent amoxicillin-clavulanic acid, based on the ESBL phenotypic test, although the individual agent appeared susceptible. However, in the latter method, results were reported unchanged as detected in the automated system. Laboratory had tested E. coli against different antimicrobial agents and the results were interpreted as either susceptible or resistant. The data include information on the monthly number of E. coli isolates obtained and the total number of isolates resistant to the different antibiotics which includes a total of 108 time points. The resistance proportion is calculated as the number of E. coli isolates that were resistant to a particular antimicrobial agent/total number of E. coli tested during that time interval. This study considered the data on the antibiotic amoxicillin-clavulanic acid (AMC) and Cefoperazone-sulbactam (CSL) since it met the requirements of the model (i.e., time dependency and in the interval [0,1) or (0,1]).

2.2. Degenerate Beta autoregressive (DeβAR) model

2.2.1. Beta distribution

The parameterized probability density function of random variable X of Beta distribution is

\begin{array}{l} f (x; μ, ζ) = \frac{Γ (ζ)}{Γ (μ ζ) Γ ((1 - μ) ζ)} x^{μ ζ - 1} {(1 - x)}^{(1 - μ) ζ - 1}, 0 < x < 1 & (1) \end{array}

Here, 0 < μ < 1 and ζ > 0. The mean and variance of the random variable X is, E(X = x) = μ and $V (X = x) = \frac{V (μ)}{1 + ζ}$ ; where V(μ) = μ(1 − μ). Hence, here μ and ζ act as distribution mean and precision parameters. Here, when value of μ is fixed, as the value of ζ increases, the variance of x decreases.

2.2.2. Degenerate distribution

A Degenerate distribution is a one-point distribution where a random variable X has a single possible value.

The probability mass function of random variable X can be written as follows:

\begin{array}{l} P (X = x) = {\begin{array}{l} 1 & if x = c; \\ 0 & elsewhere \end{array} & (2) \end{array}

That is, a random variable, X, is degenerate if, for some constant, c, P(X = c) = 1.

This distribution has a single parameter, c, and it ranges from - ∞ to ∞.

2.2.3. Inflated Beta distribution

Ospina and Ferrari (9) introduced inflated Beta distributions which is a mixture of Degenerate and Beta distributions. The probability density function is

\begin{array}{l} b i_{c} (x; ω, μ, ζ) = {\begin{array}{l} ω & if x = c; \\ (1 - ω) f (x; μ, ζ) & if x \in (0,1) \end{array} & (3) \end{array}

where, 0 < ω < 1 is the mixture parameter, f(x; μ, ζ) is the p.d.f of Beta distribution and c=0 or 1 known value which follows Degenerate distribution. The cumulative distribution function (c.d.f) of mixture distribution is given by

B I_{c} (x; ω, μ, ζ) = ω I_{[c, 1]} (x) + (1 - ω) F (x; μ, ζ)

Where, $I_{A} (x)$ is an indicator function that equals 1 if x ∈ A and 0 if x ∉ A. Here, F(.; μ, ζ) is the c.d.f. of the beta distribution.

The r^th moment and variance of inflated beta distribution is

E (x^{r}) = ω c + (1 - ω) μ_{r}

V a r (x) = (1 - ω) \frac{V (μ)}{ζ + 1} + ω (1 - ω) {(c - μ)}^{2}

where, μ_r = (μζ)_r/ζ_r.

2.2.4. Beta autoregressive moving average model

Rocha and Cribari-Neto (7) introduced βARMA model to fit continuous time-series data in the interval (0, 1), which follows Beta distribution.

The proposed βARMA(p, q) model is

\begin{array}{l} g (μ_{t}) = β_{0} + s_{t}^{'} β + Σ_{i = 1}^{p} ϕ_{i} {g (x_{t - i}) - s_{t - i}^{'} β} + Σ_{j = 1}^{q} α_{j} r_{t - j} & (4) \end{array}

where, g(.) is the link function, β₀ is the intercept term, s_t's are the regressor variables, and $β = {(β_{1}, β_{2}, \dots, β_{k})}^{'}$ are set of parameters of regressors. The ϕ's and the α's are the autoregressive (AR) and moving average (MA) parameters, p and q are the AR and MA orders, and r_t is an error term, respectively.

In case of a real-life scenario when time-series data includes zeros or ones, it is challenging to use this model by assuming Beta distribution. To model such kind of continuous time-series data, we replaced Beta distribution with inflated Beta distribution proposed by Ospina and Ferrari (9). The density function of inflated Beta distribution with time can be written as follows:

Let X_t t=1,2…n be the response variable of proportion data which includes zeros or ones. We assume that the proportion series is conditionally distributed as InBE (μ_t, ζ, ω) (where, InBE stands for "Inflated Beta") with probability density function defined as:

\begin{array}{l} f_{X_{t}} (x_{t} | F_{t - 1}) = ω I_{(x_{t} = c)} + (1 - ω) \frac{Γ (ζ)}{Γ (μ_{t} ζ) Γ ((1 - μ_{t}) ζ)} \\ x_{t}^{μ_{t} ζ - 1} {(1 - x_{t})}^{(1 - μ_{t}) ζ - 1} & (5) \end{array}

or equivalently

\begin{array}{l} f_{X_{t}} (x_{t} | F_{t - 1}) \\ = \begin{array}{l} {ω i f x = c; \\ (1 - ω) \frac{Γ (ζ)}{Γ (μ_{t} ζ) Γ ((1 - μ_{t}) ζ)} x_{t}^{μ_{t} ζ - 1} {(1 - x_{t})}^{(1 - μ_{t}) ζ - 1} i f x \in (0, 1) \end{array} \end{array}

which is a mixture of Beta and Degenerate distributions. Here, $F_{t - 1}$ is the previous information set of response series. When the mixture parameter ω = 0, inflated Beta distribution reduces to Beta distribution. Based on Equations (3) and (4), the mean and variance of distribution can be written as follows:

For [0,1),

E (X_{t} | F_{t - 1}) = (1 - ω) μ_{t}

V (X_{t} | F_{t - 1}) = (1 - ω) \frac{μ_{t} (1 - μ_{t})}{ζ + 1} + {ω (1 - ω) μ_{t}^{2}}

For (0,1],

E (X_{t} | F_{t - 1}) = ω + (1 - ω) μ_{t}

V (X_{t} | F_{t - 1}) = (1 - ω) \frac{μ_{t} (1 - μ_{t})}{ζ + 1} + {ω (1 - ω) {(1 - μ_{t})}^{2}}

2.2.5. Proposed model

The proposed Degenerate Beta Autoregressive (DeβAR) model for the parameters of mixture distribution is

\begin{array}{l} η_{t} = g (μ_{t}) = s_{t}^{'} β + Σ_{i = 1}^{p} ϕ_{i} {g (x_{t - i}) - s_{t - i}^{'} β} & (6) \end{array}

where, η_t is the linear or non-linear predictor of the model, where g(.) is the link function (we used logit), s_t's are the regressor variables, and $β = {(β_{1}, β_{2}, \dots, β_{k})}^{'}$ are the unkonwn parameters of regressor variables. $ϕ_{i} = {(ϕ_{1}, ϕ_{2}, \dots, ϕ_{p})}^{'}$ are the autoregressive parameters with order p. Let $θ = {(β^{'}, ζ, ω, ϕ_{i}^{'})}^{'}$ be vector of unknown parameters with length k+p+2.

2.2.6. Parameter estimation

The parameters of the model are estimated by maximizing log-likelihood function.

Here, let X_t, t=1,2,…,n be a random variable and $F_{t - 1}$ be the set of past information. Then, the likelihood function of the parameters θ conditioning on the past p observations can be written as follows:

\begin{array}{l} L (θ; x_{t}) = \prod_{t = p + 1}^{n} f_{X_{t}} (x_{t} | F_{t - 1}) \end{array}

\begin{array}{l} L (θ; x_{t}) = \prod_{t = p + 1}^{n} b i_{c} (z_{t}; ω, μ_{t}, ζ) = L_{1} (ω) L_{2} (μ_{t}, ζ) \end{array}

where,

\begin{array}{l} L_{1} (ω) = \prod_{t = p + 1}^{n} ω^{I_{x_{t} = c}} {(1 - ω)}^{I_{x_{t} ϵ (0, 1)}} \end{array}

\begin{array}{r} L_{2} (μ_{t}, ζ) = \prod_{t = p + 1}^{n} {\frac{Γ (ζ)}{Γ (μ_{t} ζ) Γ ((1 - μ_{t}) ζ)} x_{t}^{μ_{t} ζ - 1} (1 - x_{t}) \\ ^{(1 - μ_{t}) ζ - 1}}^{ℐ_{x_{t} ϵ (0, 1)}} \end{array}

The likelihood function for the parameters of Degenerate Beta AR model is given by,

\begin{array}{r} L (θ; x_{t}) = \prod_{t = p + 1}^{n} {ω ℐ_{x_{t} = c} + ℐ_{x_{t} ϵ (0, 1)} (1 - ω) \\ \frac{Γ (ζ)}{Γ (μ_{t} ζ) Γ ((1 - μ_{t}) ζ)} x_{t}^{μ_{t} ζ - 1} (1 - x_{t})^{(1 - μ_{t}) ζ - 1}} \end{array}

Then, the log-likelihood of model is

\begin{array}{l} l o g (L (θ; x)) = l (θ) \\ = \sum_{t = p + 1}^{n} I_{x_{t} = c} l o g (ω) + \sum_{t = p + 1}^{n} I_{x_{t} ϵ (0, 1)} l o g {(1 - ω) \\ \frac{Γ (ζ)}{Γ (μ_{t} ζ) Γ ((1 - μ_{t}) ζ)} x_{t}^{μ_{t} ζ - 1} {(1 - x_{t})}^{(1 - μ_{t}) ζ - 1}} \end{array}

Here, take $I_{x_{t} = c} = x_{c t}$ , then l(θ) equals to

\begin{array}{l} \sum_{t = p + 1}^{n} x_{c t} l o g (ω) + \sum_{t = p + 1}^{n} (1 - x_{c t}) l o g {(1 - ω) \\ \frac{Γ (ζ)}{Γ (μ_{t} ζ) Γ ((1 - μ_{t}) ζ)} x_{t}^{μ_{t} ζ - 1} {(1 - x_{t})}^{(1 - μ_{t}) ζ - 1}} \end{array}

Then, the score function is given by

U (θ) = \frac{\partial}{\partial θ} l (θ) = 0

i.e., for l=1,2…k

\frac{\partial l (θ)}{\partial β_{l}} = \sum_{t = p + 1}^{n} \frac{\partial l (θ)}{\partial μ_{t}} \frac{\partial μ_{t}}{\partial η_{t}} \frac{\partial η_{t}}{\partial β_{l}}

Note that, $\frac{\partial μ_{t}}{\partial η_{t}} = μ_{t} (1 - μ_{t})$ and $\frac{\partial η_{t}}{\partial β_{l}} = s_{t l} - \sum_{i = 1}^{p} ϕ_{i} s_{(t - i) l}$ .

Then,

\begin{array}{l} \frac{\partial l (θ)}{\partial β_{l}} = \sum_{t = p + 1}^{N} (1 - x_{c t}) ζ [l o g (\frac{x_{t}}{1 - x_{t}}) \\ - {ψ (μ_{t} ζ) - ψ ((1 - μ_{t}) ζ)}] μ_{t} (1 - μ_{t}) (s_{t l} - \sum_{i = 1}^{p} ϕ_{i} s_{(t - i) l}) \end{array}

Here, let $x_{t}^{*} = l o g (\frac{x_{t}}{1 - x_{t}})$ if x_tϵ(0, 1) else $x_{t}^{*} = 0$ (11, 14) and $ψ (μ_{t} ζ) - ψ ((1 - μ_{t}) ζ) = μ_{t}^{*}$ , where ψ(.) is a digamma function. Then,

\frac{\partial l (θ)}{\partial β_{l}} = \sum_{t = p + 1}^{n} (1 - x_{c t}) ζ (x_{t}^{*} - μ_{t}^{*}) μ_{t} (1 - μ_{t}) (s_{t l} - \sum_{i = 1}^{p 1} ϕ_{i} s_{(t - i) l})

Similarly,

\begin{array}{l} \frac{\partial l (θ)}{\partial ζ} = \sum_{t = p + 1}^{n} ζ (1 - x_{c t}) {μ_{t} (x_{t}^{*} - μ_{t}^{*}) + l o g (1 - x_{t}) \\ - ψ ((1 - μ_{t}) ζ) + ψ (ζ)} \end{array}

\frac{\partial l (θ)}{\partial ω} = \sum_{t = p + 1}^{n} \frac{(x_{c t} - ω)}{ω (1 - ω)}

For i=1,2…p

\frac{\partial l (θ)}{\partial ϕ_{i}} = \sum_{t = p + 1}^{n} (1 - x_{c t}) ζ_{t} (x_{t}^{*} - μ_{t}^{*}) μ_{t} (1 - μ_{t}) (g (x_{t - i}) - s_{t - i}^{'} β)

The maximum likelihood estimator of θ is obtained by equating U(θ) = 0. Since there exists no closed form solution for these equations, a non-linear optimization algorithm like Newton's method or a Quasi-Newton algorithm such as limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (L-BFGS-B) has been used (15, 16).

Practically, we can use the gamlss function in R software in the package GAMLSS (generalized additive models for location scale and shape) (17) to get the initial values for estimating the parameters.

In this study, we have used the Quasi-Newton method (18) under which we performed the L-BFGS-B algorithm to obtain the optimum solution for the parameters.

Large sample inference: If the model specified by Equation (5) follows the regularity condition of maximum likelihood estimation (MLE) then, MLE of θ and J(θ) (Fisher information matrix) are consistent. Assuming that $I (θ) = {lim}_{n \to \infty} {n^{- 1} J (θ)}$ exists and is nonsingular, we have $\sqrt{n} (\hat{θ} - θ)$ converges in distribution to N(0, I(θ)⁻¹).

Note: The proposed DeβAR model is applicable when $x_{t}^{*}$ is converted to 0 as mentioned above. To overcome with this limitation Bayer et al. (19) proposed Inflated beta autoregressive moving average models, which are more suitable when interval data includes 0 or 1.

2.3. Simulation study

In this simulation study, we featured finite-sample performance of the MLE. Due to time consumption, the simulated time-series data generated only for DeβAR with lag 1 and lag 2.

η_{t} = l o g i t (μ_{t}) = β_{0} + ϕ_{1} g (x_{t - 1})

η_{t} = l o g i t (μ_{t}) = β_{0} + ϕ_{1} g (x_{t - 1}) + ϕ_{2} g (x_{t - 2})

Here, ζ and ω are constant for all observations. We took β₀ = 1.2, ϕ₁ = −0.8, ϕ₂ = −0.2, ζ = 50, and ω = 0.5 as true parameters.

The Monte Carlo simulation with 15,000 replications was carried out each with sample size n = 100, 250, and 500. Of the 15,000 replicants, 5,000 were the burn out. Parameter estimates were obtained as the convergent outcome of the remaining 10,000. The parameters are estimated by maximizing the log-likelihood function using the L-BFGS-B algorithm. The bias and root mean square error of the estimates are reported.

Table 1 represents the bias and root mean square error of the parameters. Here, we can observe that, as the sample size increases, the algorithm converged for all the samples. The mean of β₀, ϕ₁, ϕ₂, and ω are close to the true values or the initial values. Also, the root mean square errors of all the estimators are decreased as the size of the sample increases, as anticipated.

TABLE 1

Table 1. Simulation results.

2.4. Forecast evaluation criteria

Selecting a proper model among several competing candidates is a hassle in a lot of time-series analysis. In many cases, to look for out-of-sample forecast accuracy, the MAPE (“Mean absolute percentage error") is used for model selection or comparison. However, when data are close to zero, other forecast measurement criteria can be used, such as MAE (“mean absolute error"), MSE (“mean square error"), RMSE (“root mean square error"), and ex post forecast error (20). In the case of time-series analysis, the ACF plot is one of the model selection criteria, and the AIC (“Akaike information criterion") and the BIC (“Bayesian information criterion") will be used for an in-sample accuracy check.

2.5. R shiny application

R shiny is a website application available freely to build under R-studio. It is an interactive website application that requires no web development skills¹. R shiny web app can be accessed at DeBAR.app².

3. Results

3.1. Exploratory data analysis

To understand the characteristics of the response series, the descriptive statistics for the data are represented in Table 2. Figure 1 represents the time-series plot of the bacteria E. coli isolated from blood cultures resistant to the antimicrobial agents AMC and CSL, where the sudden change between the years 2015 and 2016 is due to the improvisation in the testing method in April 2015, which involved a shift from manual testing method (i.e., Kirby Bauer disc diffusion method) to Vitek-2 automated method from the year 2015. This change point has been considered as a covariate in the model and represented as an indicator variable I_t. Figure 2 displays the histogram of the same. The proposed model is applicable to model stationary time-series data. The ACF plot will be used to identify the stationarity in the series. If non-stationarity exists, then significant deterministic elements can be added as a regressor variable in the model. As the current study data include change points in the series, stationarity cannot be identified through the ACF plot.

TABLE 2

Table 2. Descriptive statistics for the data on rate of E. coli resistant to AMC and CSL.

FIGURE 1

Figure 1. Time plot of proportion of E. coli in blood resistant to antibiotics AMC and CSL.

FIGURE 2

Figure 2. Histogram of proportion of E. coli in blood resistant to antibiotics AMC and CSL.

3.2. Time-series analysis

The data have been modeled using Degenerate Beta Autoregressive (DeβAR(p)) model proposed in Section 2 and the analysis has been carried out using the software R. Quasi-Newton algorithm (L-BFGS-B) has been used to maximize the partial log-likelihood function.

First, the following DeβAR(p) models, where p=1, 2, 3,…,12 fitted to the AMC data,

M o d e l 1 : η_{t} = l o g i t (μ_{t}) = β_{0} + α I_{t} + ϕ_{1} {g (x_{t - 1})}

M o d e l 2 : η_{t} = l o g i t (μ_{t}) = β_{0} + α I_{t} + Σ_{i = 1}^{2} ϕ_{i} {g (x_{t - i})}

M o d e l 3 : η_{t} = l o g i t (μ_{t}) = β_{0} + α I_{t} + Σ_{i = 1}^{3} ϕ_{i} {g (x_{t - i})}

M o d e l 12 : η_{t} = l o g i t (μ_{t}) = β_{0} + α I_{t} + Σ_{i = 1}^{12} ϕ_{i} {g (x_{t - i})}

where, x_t−i i=1,2,…,12 is the lagged response series and I_t is the change point (Indicator) variable with coefficient α. Among the 12 models, the best model is selected using the AIC and the BIC for which these values are minimum. The selected model is then compared with the existing ARIMA model.

The AIC and the BIC select Model 1 as the best among the 12 models as its values are minimum compared to the others (Table 4). From the significance test (Table 3), it can be seen that for the data AMC, lag-1 is significant at 5% level of significance (l.o.s) and for data CSL, lag-1 is significant at 10% level of significance (l.o.s) along with the indicator variable and parameters of the model, respectively. Next, the model is compared with the ARIMA model. Autocorrelation function (ACF) plots for residuals, forecast accuracy criteria (MAE, MSE, and MAPE), and information criteria (AIC and BIC) were used to select the best model for the given data.

TABLE 3

Table 3. Fitted DeβAR model for rate of E. coli resistant to AMC and CSL.

Models considered were,

Model 1: DeβAR(p), with lag at 1

η_{t} = l o g i t (μ_{t}) = β_{0} + α I_{t} + ϕ_{1} g (x_{t - 1})

Model 2: ARIMA(0, 0, 1) model using Arcsine transformation

y_{t} = β_{0} + α I_{t} - θ_{1} ϵ_{t - 1}

In the analysis, as a final step, data have been forecasted by keeping out the last six observations (hold-out data) from the time-series, and Model 1 and Model 2 have fitted to the selected (test data) series and forecasted for the next 6 months. The out-of-sample forecast accuracy are calculated using the hold-out and forecasted data from the models and comparisons between models have been carried out. The hold-out series of AMC is 0.545, 0.592, 0.7, 0.469, 0.667, and 0.6. The forecasted series from Model 1 is 0.597, 0.594, 0.6, 0.595, 0.595, and 0.6. The forecasted series from Model 2 is 0.602, 0.579,0.579, 0.579, 0.579, and 0.579. Similarly, the hold-out series of CSL is 0.205, 0.184, 0.133, 0.184, 0.294, and 0.067. The forecasted series from Model 1 is 0.192, 0.194, 0.193, 0.193, 0.194, and 0.195. The forecasted series from Model 2 is 0.223, 0.215, 0.213, 0.214, 0.214, and 0.213.

The study found that for both the data, the forecast accuracy for Model 1 is better compared to the Model 2 (Table 5) and residual plot of Model 1 follows white noise assumptions (i.e., the residual series should have mean 0 and no autocorrelation within the series), whereas for AMC data (Figure 3), Model 2 is serially correlated at lag 4. Thus, we select Model 1 as the best fit model for both the data.

FIGURE 3

Figure 3. Sample ACF of the residuals obtained from the fitted models for AMC and CSL.

Thus, the estimated DeβAR(1) model for AMC is

\hat{μ_{t}} = \frac{e x p (β_{0} + α I_{t} + ϕ_{1} g (x_{t - 1}))}{1 + e x p (β_{0} + α I_{t} + ϕ_{1} g (x_{t - 1}))}

E (x) = \hat{ω} + (1 - \hat{ω}) \hat{μ_{t}}

where, $(\hat{β_{0}}, \hat{α}, \hat{ϕ_{1}}, \hat{ω})$ = (1.68, –1.34, –0.11, 0.03). The forecasted series for next 6 months is 0.604, 0.587, 0.613, 0.587, 0.604, and 0.582.

Similarly, the estimated DeβAR(1) model for CSL is

\hat{μ_{t}} = \frac{e x p (β_{0} + α I_{t} + ϕ_{1} g (x_{t - 1}))}{1 + e x p (β_{0} + α I_{t} + ϕ_{1} g (x_{t - 1}))}

E (x) = (1 - \hat{ω}) \hat{μ_{t}}

where, $(\hat{β_{0}}, \hat{α}, \hat{ϕ_{1}}, \hat{ω})$ = (–2.185,0.877, –0.021, 0.108). The forecasted series for next 6 months is 0.19, 0.20, 0.19, 0.194, 0.192, and 0.189.

The R code of the analysis has been provided in the supporting file.

4. Discussion

AMR is difficult to control and expensive to deal with, and the outcome is that it can lead to death or severe disability. Many researchers have used different statistical techniques to look at the relationship between antimicrobial use and resistance. To list a few of them, a study by Athanasiou and Kopsini (21) in 2018 systematically reviewed the statistical methods used to analyze the AMR rate time-series data. Many of the studies used the ARIMA model (22) and the transfer function model and few studies used the multiple linear regression model. Infectious disease control researchers most often analyze and predict antimicrobial resistance rate data, which includes zeros or ones [resistance rate with zeros can be seen in the figure of the article by Lopez-Lozano et al. (23)]. The commonly used ARIMA model is inappropriate for non-Gaussian (not normally distributed) time-series data and may arrive at biased results. Rocha and Cribari-Neto (7) and Bayer et al. (8) introduced the Beta autoregressive moving average model and the Kumaraswamy autoregressive moving average model to fit rate/proportion time-series data in the interval (0, 1). To analyze proportion data with zeros or ones, Ospina and Ferrari (11) proposed an inflated beta regression model, and as an alternative, Bayer et al. (12) proposed an inflated Kumaraswamy regression model. But none of the studies addressed the autocorrelation in the response series in the interval [0, 1) or (0, 1]. So, to fill this gap, in this article, we proposed the Degenerate Beta Autoregressive (DeβAR) model.

For this intention, we collected time-series data on E.coli isolates resistant to the different antimicrobial agents, and for the current study, only amoxicillin–clavulanic acid (AMC) and Cefoperazone–sulbactam (CSL) were considered. The primary focus of this study is to forecast the AMR rate for the time-series data in the interval (0, 1] or [0, 1). The proposed DeβAR(p) models (where p=1,2,…,12) fitted to the data and best order for “p” was selected for which the values of the AIC and the BIC were minimum. The best model among these was then compared with the existing ARIMA model and the best among both was decided based on the forecasting evaluation criteria (MSE, MAE, and MAPE).

From Table 4, we can see that for the data AMC, the values of the AIC and the BIC are minimum for Model 1 (i.e., AIC = −126.19 and BIC = −112.78) compared to other models. The study found that the DeβAR (1) model performed well compared to the remaining 11 models. Next, the selected DeβAR(1) model is compared with the existing ARIMA (0, 0, 1) model selected from the autogeneration. From Table 5, we can see the values of MAE, MSE, and MAPE are minimum for the proposed Model 1 compared to the existing Model 2 along with the AIC and BIC. Hence, the study selected the DeβAR(1) model as the best among all. Similar procedure followed for the data of CSL.

TABLE 4

Table 4. AIC and BIC for DβAR(p) models for AMC and CSL data.

TABLE 5

Table 5. Model selection criterion.

By using the proposed DeβAR(1) model, the AMR rate was forecasted for next 6 months. The study results indicate the forecasted resistance rate of E. coli to the antimicrobial AMC ranges between 58 and 62% for the next 6 months, implying constant variations in the resistance rate.

The DeβAR(p) model introduced in this article would be beneficial to healthcare providers to implement early public health measures to control and prevent the rise in resistance rate.

5. Conclusion and future direction

Antimicrobial resistance is an emerging issue of public health. However, taking appropriate precautions or interventions in advance may help in reducing the resistance rate. Forecasting using an appropriate time-series model may help in predicting the expected resistance rate for the future which is further useful for policy-making.

This study proposed the Degenerate Beta Autoregressive model (DeβAR) to model antimicrobial resistance rate data, which is an extension of the βARMA model proposed by Rocha and Cribabri-Neto (2009). The proposed model can be used to fit continuous time-series data in the interval [0, 1) and (0, 1], for example, rates or proportions. The model is applicable when the series is stationary in nature. The parameters of the model are estimated by maximizing the likelihood function and closed-form solutions for the score function are obtained by using the non-linear optimization algorithm (L-BFGS-B). The application of the model is presented using AMR data.

The outcome from a time-series model helps the healthcare policymakers to implement an appropriate intervention in advance to reduce the risk of rise in resistance rate.

In future, the study results can be improvised by considering antimicrobial consumption as a regressor variable and the proposed model can be improvised by incorporating moving average terms and seasonal components.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving human participants were reviewed and approved by Institutional Ethical Clearance was obtained from Kasturba Medical College and Kasturba Hospital Institutional Ethics Committee (IEC No. 832/2019). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions

JL, AK, and VK framed the objective of the study. VK collected the data. JL conducted the analysis, drafted initial manuscript, and developed R shiny app. AK and VK revised the manuscript. All authors have read and approved the final manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2022.969777/full#supplementary-material

Footnotes

1. ^https://shiny.rstudio.com/

2. ^https://jevithalobo.shinyapps.io/column/

References

1. WHO. Worldwide Country Situation Analysis: Response to Antimicrobial Resistance. Geneva: WHO (2015). Available online at: https://ghc.fiu.edu/wpcontent/uploads/sites/38/2018/04/Supplement-Material.pdf

Google Scholar

2. Zeng S, Xu Z, Wang X, Liu W, Qian L, Chen X, et al. Time series analysis of antibacterial usage and bacterial resistance in China: observations from a tertiary hospital from 2014 to 2018. Infect Drug Resist. (2019) 12:2683. doi: 10.2147/IDR.S220183

PubMed Abstract | CrossRef Full Text | Google Scholar

3. Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. J Appl Stat. (2004) 31:799–815. doi: 10.1080/0266476042000214501

CrossRef Full Text | Google Scholar

4. Mitnik PA, Baek S. The Kumaraswamy distribution: median-dispersion re-parameterizations for regression modeling and simulation-based estimation. Stat Pap. (2013) 54:177–92. doi: 10.1007/s00362-011-0417-y

CrossRef Full Text | Google Scholar

5. Pumi G, Rauber C, Bayer FM. Kumaraswamy regression model with Aranda-Ordaz link function. Test. (2020) J3:1-21. doi: 10.1007/s11749-020-00700-8

CrossRef Full Text | Google Scholar

6. Lemonte AJ, Bazán JL. New class of Johnson distributions and its associated regression model for rates and proportions. Biometr J. (2016) 58:727–46. doi: 10.1002/bimj.201500030

PubMed Abstract | CrossRef Full Text | Google Scholar

7. Rocha AV, Cribari-Neto F. Beta autoregressive moving average models. Test. (2009) 18:529. doi: 10.1007/s11749-008-0112-z

CrossRef Full Text | Google Scholar

8. Bayer FM, Bayer DM, Pumi G. Kumaraswamy autoregressive moving average models for double bounded environmental data. J Hydrol. (2017) 555:385–96. doi: 10.1016/j.jhydrol.2017.10.006

CrossRef Full Text | Google Scholar

9. Ospina R, Ferrari SL. Inflated beta distributions. Stat Pap. (2010) 51:111. doi: 10.1007/s00362-008-0125-4

CrossRef Full Text | Google Scholar

10. Cribari-Neto F, Santos J. Inflated Kumaraswamy distributions. Anais da Academia Brasileira de Ciências. (2019) 91:955. doi: 10.1590/0001-3765201920180955

PubMed Abstract | CrossRef Full Text | Google Scholar

11. Ospina R, Ferrari SL. A general class of zero-or-one inflated beta regression models. Comput Stat Data Anal. (2012) 56:1609–23. doi: 10.1016/j.csda.2011.10.005

CrossRef Full Text | Google Scholar

12. Bayer FM, Cribari-Neto F, Santos J. Inflated Kumaraswamy regressions with application to water supply and sanitation in Brazil. Stat Neerlandica. (2021) 75: 453–81. doi: 10.1111/stan.12242

CrossRef Full Text | Google Scholar

13. Daga AP, Koga VL, de Matos CM, Perugini MR, Pelisson M, Kobayashi RK, et al. Escherichia coli bloodstream infections in patients at a university hospital: virulence factors and clinical characteristics. Front Cell Infect Microbiolgy. (2019) 9:191. doi: 10.3389/fcimb.2019.00191

PubMed Abstract | CrossRef Full Text | Google Scholar

14. 14. Benjamin MA, Rigby RA, Stasinopoulos MD. Fitting non-Gaussian time series models. In: InCOMPSTAT: Proceedings in Computational Statistics 13th Symposium held in Bristol, Great Britain, 1998. (1998). p. 191–6.

Google Scholar

15. Nocedal J, Wright SJ. Numerical Optimization. New York, NY: Springer (1999).

Google Scholar

16. Xiao Y, Wei Z, Wang Z. A limited memory BFGS-type method for large-scale unconstrained optimization. Comput Math Appl. (2008) 56:1001–9. doi: 10.1016/j.camwa.2008.01.028

CrossRef Full Text | Google Scholar

17. Stasinopoulos DM, Rigby RA. Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Softw. (2008) 23:1–46. doi: 10.18637/jss.v023.i07

CrossRef Full Text | Google Scholar

18. Schoenberg R. Optimization With the Quasi-Newton Method. Maple Valley, WA: Aptech Systems, Inc. (2001).

Google Scholar

19. Bayer FM, Pumi G, Pereira TL, Souza TC. Inflated beta autoregressive moving average models. Comput Appl Math. (2023) 42:183. doi: 10.1007/s40314-023-02322-w

Crossref Full Text | Google Scholar

20. Doszyń M. Forecasting randomly distributed zero-inflated time series. Folia Oeconomica Stetinensia. (2017) 17:7–19. doi: 10.1515/foli-2017-0001

CrossRef Full Text | Google Scholar

21. Athanasiou CI, Kopsini A. Systematic review of the use of time series data in the study of antimicrobial consumption and Pseudomonas aeruginosa resistance. J Glob Antimicrob Resist. (2018) 15:69–73. doi: 10.1016/j.jgar.2018.06.001

PubMed Abstract | CrossRef Full Text | Google Scholar

22. Choi B. ARMA Model Identification. Berlin: Springer Science & Business Media (2012).

Google Scholar

23. Lopez-Lozano JM, Monnet DL, Yagüe A, Burgos A, Gonzalo N, Campillos P, et al. Modelling and forecasting antimicrobial resistance and its dynamic relationship to antimicrobial use: a time series analysis. Int J Antimicrob Agents. (2000) 14:21–31. doi: 10.1016/S0924-8579(99)00135-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: Beta distribution, time-series model, mixture distribution, rates, proportions, inflated distribution, AMR, resistance

Citation: Lobo J, Kamath A and Kalwaje Eshwara V (2023) Degenerate Beta autoregressive model for proportion time-series with zeros or ones: An application to antimicrobial resistance rate using R shiny app. Front. Public Health 10:969777. doi: 10.3389/fpubh.2022.969777

Received: 15 June 2022; Accepted: 20 December 2022;
Published: 10 January 2023.

Edited by:

Marwan Osman, Cornell University, United States

Reviewed by:

Rohan Kumar Raman, ICAR-Research Complex for Eastern Region, India
Imad Al Kassaa, Fonterra, New Zealand
Ananda Kosuvaripalli, JSS Banashankari Arts, Commerce and Shanthi Kumar Gubbi Science College, India

Copyright © 2023 Lobo, Kamath and Kalwaje Eshwara. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Asha Kamath, yes YXNoYS5rYW1hdGhAbWFuaXBhbC5lZHU=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.