Forecasting of Milk Production in Northern Thailand Using Seasonal Autoregressive Integrated Moving Average, Error Trend Seasonality, and Hybrid Models

Milk production in Thailand has increased rapidly, though excess milk supply is one of the major concerns. Forecasting can reveal the important information that can support authorities and stakeholders to establish a plan to compromise the oversupply of milk. The aim of this study was to forecast milk production in the northern region of Thailand using time-series forecast methods. A single-technique model, including seasonal autoregressive integrated moving average (SARIMA) and error trend seasonality (ETS), and a hybrid model of SARIMA-ETS were applied to milk production data to develop forecast models. The performance of the models developed was compared using several error matrices. Results showed that milk production was forecasted to raise by 3.2 to 3.6% annually. The SARIMA-ETS hybrid model had the highest forecast performances compared with other models, and the ETS outperformed the SARIMA in predictive ability. Furthermore, the forecast models highlighted a continuously increasing trend with evidence of a seasonal fluctuation for future milk production. The results from this study emphasizes the need for an effective plan and strategy to manage milk production to alleviate a possible oversupply. Policymakers and stakeholders can use our forecasts to develop short- and long-term strategies for managing milk production.


INTRODUCTION
Thailand's dairy industry is one of the economically important agricultural sectors. In 2018, 1.29 million tons of raw milk were produced in the country (1). According to data from May 2020, there were 24,229 dairy farmers operating dairy farms and the number of dairy cattle in Thailand was 8,11,756 head. The majority of dairy farmers were smallholders who utilized the tie-stall system for farming. The primary breed of dairy cows was Holstein with a small fraction of other breeds. The three most extensive dairy farming regions are central, northeast, and north (2). In the northern region, dairy farms are located in six provinces and the majority of them are in Chiang Mai (3). The average milk production in the north is 437 tons per day (4). Dairy farmers in the north, similar to those in the rest of Thailand, are members of dairy cooperatives that manage milk collection centers (5,6). Bulk tank milk from dairy farms is collected, cooled, stored at milk collection centers, and then transported to milk processing plants (7).
In recent years, an oversupply of milk has been a matter of concern among policymakers and stakeholders in Thailand (8,9). Even though the milk supply has risen, the dairy sector struggles to increase milk consumption in the country owing to cultural hurdles (9). In 2020, Thailand's per capita for dairy products was ∼17 L per year, which is quite low, compared with other Asian countries such as Japan (31 L per year) and India (49 L per year) (10). The oversupply of milk may pose challenges to several sectors in the supply chain. To control situations of oversupply, authorities and stakeholders need to formulate effective short-and long-term plans to manage milk supply based on accurate estimations of milk production derived from well-accepted prediction methods.
Forecasting is a vital element in economic decision-making. Of the several time-series forecast methods demonstrated in the literature (11)(12)(13)(14)(15), the autoregressive integrated moving average (ARIMA) is one of the most commonly used. This method has been extensively used in various domains such as economic (16,17), human medicine (18)(19)(20), veterinary science (21,22), and agriculture science (23)(24)(25). ARIMA is extended by the seasonal autoregressive integrated moving average (SARIMA) method which is suitable for modeling time-series data with a seasonal trend (26)(27)(28). Moreover, another commonly used method for analyzing seasonality time-series data is the exponential smoothing (ES) (15,27). Within a state-space framework, the ES method can combine error (E), trend (T), and seasonal (S) components in a smoothing calculation which is referred to as an error trend seasonality (ETS) model (29). A total of 30 possible ETS combinations within the state-space framework render the ability to analyze any time-series data, even with both heterogeneity and non-linearity (30). Furthermore, in the last two decades, a hybrid modeling which combines forecasting methods has been used in many disciplines to enhance the accuracy of forecast models (11,15,31). For example, a study of healthcare data demonstrates that combining SARIMA and ETS (SARIMA-ETS hybrid) methods offers better performance and forecast accuracy than models developed from a single method (32). At present, numerous hybrid models have been proposed; however, selecting methods to be combined as a hybrid model necessitates an understanding of the nature of each timeseries data (27). Interestingly, although hybrid models are widely used with a variety of datasets such as medical (33,34) and environmental data (15,35), their application to milk production data is very limited.
Several researchers have applied time-series methods to forecast milk production in Bangladesh (36), China (37), Ethiopia (38), and India (39), but the methods used in those studies were limited to ARIMA (11). Although several forecast methods such as artificial neural network (ANN) and support vector machines (SVM) are widely used in time-series models (40,41), this study focused primarily on the methods that are capable of dealing with seasonality time-series data since milk production is likely to follow a seasonal pattern (39,(42)(43)(44)(45). As a result, the SARIMA, ETS, and SARIMA-ETS hybrid models were selected to analyze the milk production data in this study.
Milk production forecasting based on well-accepted statistical methods is essential for government authorities, decision-makers and stakeholders in Thailand to establish short-and long-term plans to deal with future milk production, particularly when an oversupply is anticipated. Hence, the objective of this study was to develop forecast models to predict milk production in northern Thailand using SARIMA, ETS, and SARIMA-ETS hybrid models. Meanwhile, the predictive performances of forecast models were evaluated in order to identify the most accurate model, which could subsequently be used to estimate future milk production.

Study Area and Milk Production Data
The milk production data from all dairy cooperatives and private milk collection centers in all northern provinces (n = 6) were collected monthly and then verified by livestock authorities from the Department of Livestock Development. Furthermore, data on milk production were summarized in order to represent the monthly milk production of the northern region's dairy sector. The provinces in northern Thailand from which milk production data were collected are shown in Figure 1.
We have two different approaches for data management and data analysis based on the objectives of the study. The first approach was to develop forecast models and predict the milk production using the entire dataset (full dataset). The second approach was to build forecast models and then compared forecast values with the actual data value by holding the last 12 months of the data as a validation data (validation dataset) and developing models from the remaining data (training dataset). Hence, milk production data from the full dataset (January 2016-December 2020) was divided into training (January 2016-December 2019) and validation datasets (January 2020-December 2020). The training dataset was used for model development. Furthermore, the final models obtained from this training dataset were evaluated for their performance using the validation dataset. By this process, we could compare the model performances among models developed. Similar to the process using the training dataset, the full dataset was used to develop models to forecast milk production for the period of January 2021-December 2022. Notably, the advantage of using the full dataset was its update. If the training dataset was used to forecast, the models would be trained only for 2016-2019 data, which might affect the forecast of milk production for 2021-2022 due to the long-term forecast effect that may contribute to a decrease in forecast accuracy. Since we used the full dataset to developed the models, the forecast for the period of 2021-2022 is less likely to be affected by this constraint, providing the relevant users and policy makers a more accurate prediction.

Statistical Analysis
Data were analyzed using R statistical software version 4.0.4 and relevant packages (46). Raw milk production data were converted Frontiers in Veterinary Science | www.frontiersin.org to time-series data using functions from the "xts" package and then decomposed with functions from "TSstudio" packages. To develop forecast models, several functions from a "forecast" package were used.

Time-Series Data Decomposition
The time-series milk production data were decomposed into trend, season, and error to facilitate separate examination for each of them.
Additive time-series data consisted of trend, season, and irregular (error) components, and the model is given as follows (26): where y t is the milk production value at time t, T t is the trendcycle component at time t, S t is the seasonal component at time t, and I t is the irregular (remainder) component at time t.

Model Development
For the pre-modeling steps, we examined the milk production data by testing the following assumptions including (i) the stationary of time-series data using Augmented Dickey-Fuller Test, (ii) the heteroscedasticity in a time series using an autoregressive conditionally heteroscedastic test (ARCH test), and (iii) autocorrelation of time-series data using autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.

SARIMA Model
The SARIMA model (26) is expressed as where φ and θ are parameters of the autoregressive and moving average, whereas and are parameters of the seasonal autoregressive and seasonal moving average, respectively. Meanwhile, p, d, and q are the order of autoregressive, degree of differencing, and order of moving average, respectively. Additionally, P, D, and Q are the orders of the seasonal autoregressive, degree of seasonal differencing, and order of seasonal moving average, respectively. S is the seasonal length, y t is predicted variable, and ε t is a random error at time t.
The function auto.arima from the "forecast" package (47) was applied to the milk production data to determine the best fitting ARIMA model. The auto.arima algorithm performed various steps to select the model (26). In brief, the steps were given as follows-(i) if the data were non-stationary, the algorithm would determine the number of differences in the range of 0 ≤ d ≤ 2 using repeated Kwiatkowski-Phillips-Schmidt tests until they became stationary, (ii) the values of p and q were chosen by minimizing the corrected Akaike Information Criterion (AICc) after differencing the data d times, (iii) the algorithm searched for a combination of p and q using a stepwise approach, and (iv) the algorithm repeated the search to evaluate AICc, and variations could be found until they reach the lowest AICc. The model with the lowest AICc was defined as the final model and used for further predictions (26,47).

ETS Model
For ETS methods, the forecast was made considering a weighted average of past observation. The latest observation is given exponentially more weight than earlier observations (48). The state-space model of the ETS was defined as ETS (., ., .) for (Error, Trend, Seasonal). The possibilities for each component are Error = (A, M), Trend = (N, A, A d ), and Seasonal = (N, A, M). The letters A, M, N, and A d refer to additive, multiplicative, none, and additive damped, respectively (29). For instance, the ETS (A, A, M) was the method with the additive trend, multiplicative seasonality and additive errors (25,26). By using the state-space structure, the optimal exponential smoothing model could be determined. According to the model building algorithm from the "forecast" package, the ets function was utilized to determine the final ETS model, defined as the model with the lowest AICc value among models developed (26).

Hybrid Model
Several functions from the "forecast" and "forecastHybrid" packages (26,49) were employed to develop a hybrid model of SARIMA and ETS (SARIMA-ETS). Since functions from "forecastHybrid" R package developed by Shaub and Ellis offer the automated procedure to obtain the best fitting model (27); thus, the hybridModel function from this package was employed to develop the SARIMA-ETS model. According to this procedure, forecast values generated from auto.arim and ets functions were combined equally to develop the hybrid model. The automatic methodology from the hybridModel function provided results obtained by optimizing the prediction features of the modelbased minimizing error (27). The detail of these procedures was previously described (49,50).

Model Assumption Diagnostics
Residuals from all final models were tested for model assumptions. The Ljung-Box Q test would be used to diagnose if the residual error sequence from the final model was a white-noise sequence (51). In addition, residuals from the final model were plotted to examine autocorrelation and partial autocorrelation functions (26). All of these methods for checking residuals were performed using a function checkresiduals from the "forecast" package, which produced a time plot, ACF plot, and histogram of the residuals (with an overlaid normal distribution for comparison) and conducted a Ljung-Box test (26).

Model Performance and Forecast
With the validation dataset, error matrices that were commonly used including mean absolute error (MAE), root mean square error (RMSE), and mean absolute percent error (MAPE) were calculated to determine forecast performances among the developed models (26,52). It is generally accepted that the lower the measure error matric values, the better the method (33).
As described previously, the forecasts of milk production were estimated from the models developed from training and full datasets. Thus, we forecasted milk production for the periods of January-December 2020 and January 2021-December 2022 using training and full datasets, respectively. The performance of forecast models was determined by comparing the milk production forecast values from forecast models applied to the training dataset with actual milk production values from the validation dataset. In addition, forecast values for milk production for the period of January 2021-December 2022 were computed using forecast models applied to the full dataset.

Decomposition
Overall, the actual monthly milk production showed an increasing trend with the existence of seasonal fluctuation (Figure 2). Upon decomposition of the actual data into trends, it was revealed that a consistently increasing trend in milk production could be observed over the period 2017-2018, and the milk production was gradually raised from 2019 to 2020. When the data was decomposed into a seasonal component, a seasonal pattern was clearly shown, with the predominant peak in milk production occurring between March and May every year (Figure 2).

Models
Based on the ARIMA approach, the SARIMA (1,0,0) (1,1,0) 12 was defined as the final model. This model is interpreted as follows: the number of lag observations included in the model or lag order is equal to one (p = 1), the degree of differences is equal to zero (d = 0), the order of moving average is equal to zero (q = 0), the last seasonally offset observation is used in the model (P =1), seasonal differences are equal to one (D = 1) and moving average order in seasonality is equal to zero (Q =0), and yearly seasonal is set (m

Model Performance
The SARIMA-ETS model performed better than other models as it had the lowest MAE, RMSE, and MAPE values ( Table 1). This finding implied that the hybrid model approach had a better forecast accuracy compared with the single model approach. The actual and predicted values for milk production based on the validation dataset are shown in Figure 3. It was demonstrated that several SARIMA-ETS forecast values were remarkedly closer to the actual values of milk production than those from other models. Notably, for the last six months of the testing dataset, the hybrid model predicted milk production with a high degree of accuracy, whereas the ETS appeared to perform well in prediction over the course of a year.

Milk Production Forecast
According to the ETS, SARIMA-ETS and SARIMA models, milk production was expected to increase by 3.2, 3.4 and 3.6% per year between 2021 and 2022, respectively. The forecast values of milk production from all model are illustrated in Figure 4 and presented in the Supplementary Material S2. Results from the best performance model (SARIMA-ETS) highlighted an increasing trend of milk production with fluctuation due to the seasonality. The SARIMA-ETS hybrid model delivered forecast values that were all in the middle of forecast values from SARIMA and ETS methods. Notably, the SARIMA provided the highest forecast values for 14 of the 24 months of the prediction.

DISCUSSION
Oversupply of milk is a matter of serious concern for Thailand's dairy industry (7,53). Therefore, establishing accurate forecast models for milk production becomes the cornerstone of efficient planning and management. In this study, we developed and evaluated forecast models using various time-series statistical methods to predict milk production. The novelty of this study lies in the fact that milk production data were firstly analyzed using the hybrid forecasting model.
It is generally accepted that there is no one-size-fits-all forecast method for various data types and each one comes with its own merits (27,54). Thus, numerous models based on several forecast methods to predict milk production were developed. The findings from this study demonstrated that the ETS model performed better than the SARIMA model. This finding can be explained by the fact that each method has a different capacity for dealing with the data. Although both methods have a great capability to analyze data with seasonal patterns, the SARIMA has difficulties in detecting and considering the nonlinear pattern of the data (15,55) which may exist in our data. On the other hand, the state-space methods of the ETS can capture both linear and non-linear patterns of the time-series data (32,56). Hence, it provided a higher forecast accuracy. Our study also highlighted that the hybrid model outperformed the single model, which has been supported by numerous previous studies (33,52,57,58). In this study, the SARIMA tend to overestimate the milk productions, but the ETS appeared to underestimate them. The most accurate forecast model was obtained from the SARIMA-ETS hybrid model. The better forecast performance of the hybrid model over the single model observed in this study was likely attributed to its capability to capture irregular fluctuations, seasonality patterns, and other data behaviors (27,52) because the hybrid model combined strengths of the single model. While the SARIMA performed well in dealing with the autocorrelation in data, the ETS has a great capability to deal with the trend and irregular patterns (59). Indeed, the hybrid model combines the strengths of each model that can work on its own expertise. In addition, the hybrid model can make the model work together to overcome other weaknesses (60). Thus, the hybrid model was expected to deliver more accurate predictions compared to the single model (61) as observed in the present and numerous previous studies (11,55,57,60).
The present study provided the estimation of future milk production from the northern region, which policymakers could combine with other milk production data from other regions to establish a plan for managing the overall milk production in Thailand. In this study, we identified the major challenges regarding the future milk production in the north FIGURE 3 | The actual milk production (blue circles) from the full dataset (January 2016-December 2020) and forecast milk production values (red circles) for January-December 2020 derived from seasonal autoregressive integrated moving average (SARIMA), error trend seasonality (ETS), and SARIMA-ETS hybrid models applied to the training dataset. The performance of time series models was measured by comparing forecasted and milk production values from January to December 2020.
Frontiers in Veterinary Science | www.frontiersin.org including the increasing trend of milk production for the period of 2021-2022 and the fluctuation in the production due to seasonality (Figure 4). The upward trend of milk production poses a challenge to authorities and stakeholders along the milk supply chain to formulate a strategic plan to organize future milk production. Identifying new potential target consumers or expanding milk markets may be a part of the plan. Moreover, our prediction results suggested that the milk production during some months was relatively high compared with other months due to the seasonality. Thus, we suggested that milk marketing should be planned to manage this fluctuation.
Regarding previous reports from other countries such as Bangladesh, India, and Pakistan, milk production forecasts are limited to ARIMA models (43,62,63). Accordingly, the lack of the application of other classical and hybrid models to fit the data may result in some knowledge gaps in the pursuit of a better forecasting strategy for this type of data. To the best of our knowledge, this was the first study to forecast milk production utilizing ETS and hybrid models. However, it's worth noting that the best-fitting model in this study may not be well-suited for all milk production data. Thus, for future studies, we suggest applying several appropriate forecast models to a particular milk production dataset and then determine the best-suited model. For a follow-up study in Thailand, we recommend establishing forecast models for milk production using nationwide data. Herein, it was the first study for our ongoing project, and the method selected to be utilized was based on the nature of milk production data in Thailand, which obviously has a seasonal pattern; however, an application of other advanced forecast models is warranted for follow-up studies.
In summary, we applied time-series statistical methods including SARIMA, ETS, and SARIMA-ETS hybrid models to analyze and forecast milk production. Our findings indicated a continuously increasing trend of milk production in the northern region of Thailand from 2016 to 2020. The most preferred model was the SARIMA-ETS model yielded the lowest values of error matrices. Our results have provided policymakers and stakeholders with useful information for developing an effective strategic plan for managing milk volumes in the coming years. This approach is not limited to the data used in this study; it can be applied to an updated dataset to produce ongoing forecasts.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Materials, further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
VP, KJ, and CS designed the study, analyzed the data, and wrote the main manuscript text. VP, CS, and KK collected, managed, and visualized the data. VP provided oversight for the study and draft the manuscript. VP, KJ, KK, and CS contributed to the interpretation of the result and reviewed the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was funded by Chiang Mai University (Grant numbers: R000026522 and R000026062). The funder had no role in the study design, data analysis, decision to publish, or manuscript preparation.

ACKNOWLEDGMENTS
This work was supported by staff from the Research group for Veterinary Public Health, Faculty of Veterinary Medicine, Chiang Mai University.