Development of a Conjunctivitis Outpatient Rate Prediction Model Incorporating Ambient Ozone and Meteorological Factors in South Korea

Ozone (O3) is a well-known air pollutant that causes adverse health effects. This study developed a multi-level prediction model for conjunctivitis in outpatients exposed to O3, using 3 years of ambient O3 data, meteorological data, and hospital data from Seoul, South Korea. We confirmed that the rate of conjunctivitis in outpatients (conjunctivitis outpatient rate) was correlated with O3 (R2 = 0.49), temperature (R2 = 0.72), and relative humidity (R2 = 0.29). A multi-level regression model for the conjunctivitis outpatient rate was then developed by adding sex and age as higher-level factors. This model will contribute to the prediction of the conjunctivitis outpatient rate for each sex and age group from O3 and meteorological data.


INTRODUCTION
Air pollution is a significant global issue that has substantial effects on air quality, human health, the Earth's hydrological cycle, and climate change (Correia et al., 2013; Lelieveld et al., 2015; Sicard et al., 2016; Duan et al., 2017). The Clean Air Act requires the U.S. Environmental Protection Agency (EPA) to set National Ambient Air Quality Standards for "six criteria air pollutants," which include particulate matter (PM), carbon monoxide (CO), sulfur dioxide (SO2), nitrogen dioxide (NO2), lead, and ozone (O3) (U.S. Environmental Protection Agency, 2010). The six criteria air pollutants are known to cause a wide range of health effects, including respiratory (Guan et al., 2016), cardiovascular (Franklin et al., 2015), eye (Szyszkowicz et al., 2018), and skin diseases (Eastham et al., 2018). Among the six criteria air pollutants, O3 is commonly regarded as the most toxic component produced by photochemical reactions in the atmosphere (Seinfeld and Pandis, 2006). Bell et al. (2004) revealed the relationship between O3 and short-term mortality in 95 communities in the United States.
Previous epidemiological studies have associated exposure to O3 with significant adverse human health effects (Fann et al., 2012). While much attention has been focused on the effect of O3 on respiratory diseases (Sousa et al., 2013; Karakatsani et al., 2017; Stergiopoulou et al., 2018), less effort has been devoted to discerning its role in eye disease. The effects of O3 on eye disease have been investigated in epidemiological studies (Hong et al., 2016; Hwang et al., 2016). Hong et al. (2016) studied the relationships of air pollutants (SO2, NO2, O3, PM10, PM2.5) and meteorological data with allergic conjunctivitis outpatients in a retrospective registry study. However, that study was limited in its analysis of the multi-level effect of air pollutants and meteorological data on the conjunctivitis outpatient rate because it considered only the relationships between outpatients and individual factors. Hwang et al. (2016) found, using multivariable regression analysis, that the dry eye disease outpatient rate was associated with high ozone concentration and low relative humidity.
The goal of this study was to develop a multi-level prediction model for the conjunctivitis outpatient rate according to O3 and meteorological factors in Seoul, South Korea. Three years of O3 data, meteorological factors, and conjunctivitis outpatient rates in Seoul are reported. The subsequent discussion focuses on the development and validation of a conjunctivitis outpatient prediction model with those data.

Hospitalization Data
Conjunctivitis outpatient statistics for Seoul between January 1, 2011 and December 31, 2013 were obtained from the Korea Health Insurance Review and Assessment Service (KHIRAS) for research purposes. The KHIRAS provided the number of ophthalmology outpatients based on diagnostic codes, excluding patients' personal information. In total, 97.2% of Korean residents are covered by Korea National Health Insurance Service (KNHIS) health insurance (Korean National Health Insurance Services, 2016). All hospitals in Korea are required to submit claim documents for medical services. Based on disease codes, we obtained data for 48,344 conjunctivitis patients, excluding waterborne and chronic conjunctivitis patients. To normalize the data, the conjunctivitis outpatient rate for each age range and gender was calculated as the number of outpatients divided by the population.
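As a minimal illustration of the normalization described above, the following sketch divides outpatient counts by the population of each stratum. The counts, populations, and the function name `outpatient_rate` are hypothetical and are not taken from the KHIRAS data.

```python
def outpatient_rate(outpatients: int, population: int) -> float:
    """Return the number of outpatients per resident for one stratum."""
    if population <= 0:
        raise ValueError("population must be positive")
    return outpatients / population


# Hypothetical strata: (gender, age range) -> (weekly outpatients, population)
strata = {
    ("female", "20-29"): (30, 600_000),
    ("male", "20-29"): (20, 500_000),
}

# Normalized rate for each stratum, comparable across groups of different size
rates = {key: outpatient_rate(n, pop) for key, (n, pop) in strata.items()}
```

Dividing by the stratum population makes groups of different sizes directly comparable, which is the stated purpose of the normalization.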

Air Pollutants and Meteorological Data
Hourly measurements of O3 from January 1, 2011 to December 31, 2013 were obtained from 40 ground-based air pollutant monitoring sites operated by the city of Seoul, South Korea (Figure 1). To determine how meteorological factors are related to the conjunctivitis outpatient rate, hourly temperature and relative humidity data were obtained at the co-located sites. We used weekly averages of patient visits and meteorological factors to avoid statistical errors caused by the absence of patient visits on weekends.
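The weekly averaging step can be sketched as follows. This is only an illustration: it assumes that weeks are ISO calendar weeks, which the text does not specify, and it uses made-up hourly O3 values from a single hypothetical site. The function name `weekly_means` is likewise an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_means(records):
    """Average (timestamp, value) pairs by ISO (year, week)."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        iso = timestamp.isocalendar()
        buckets[(iso[0], iso[1])].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(buckets.items())}


# Hypothetical hourly O3 readings (ppm) for two full weeks,
# starting on Monday, January 3, 2011
start = datetime(2011, 1, 3)
hourly = [(start + timedelta(hours=h), 0.02 if h < 168 else 0.04) for h in range(336)]

weekly_o3 = weekly_means(hourly)
```

The same function could be applied to temperature and relative humidity series before they are paired with weekly patient counts.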

Model Development
A multi-level regression model (two-level regression model) was developed for the prediction of the conjunctivitis outpatient rate. The structure of the model is shown in Figure 2. The level 1 regression model describes the relationship between the level 1 independent variables and the conjunctivitis outpatient rate. Four air pollutants (PM10, NO2, SO2, and O3) and two meteorological factors (temperature and humidity) were considered as candidate level 1 independent variables. Correlations between these factors and the conjunctivitis outpatient rate were calculated. PM10, NO2, and SO2 were removed from the level 1 regression model because of their negative correlations. The level 1 regression model was developed for each age range and gender, and its form varied with age range and gender. The coefficients of the level 1 regression model can be explained by level 2 independent variables. ANOVA was performed on the level 1 regression models and the multi-level regression models. The detailed analysis and results are shown in the next section.
Figure 3 shows the weekly trends of meteorological factors, O3, and conjunctivitis outpatient rates between 2011 and 2013. The highest and lowest seasonal averages of O3 concentrations from the sampling sites were 0.27 ppm (April-June) and 0.12 ppm (October-December), respectively. The July-September data contained the highest values for temperature (24.7 °C), humidity (70.7%), and number of conjunctivitis outpatients (359.5), while the January-March data had the lowest values for temperature (−0.8 °C), humidity (51.2%), and number of conjunctivitis outpatients (267.0). The number of conjunctivitis outpatients was positively correlated with temperature (R2 = 0.72) and humidity (R2 = 0.29). The correlation coefficient between the number of conjunctivitis outpatients and O3 was 0.49. We developed a regression model based on the relationships between the number of conjunctivitis outpatients and these factors.
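The pairwise correlations reported above can be computed as in the sketch below, which assumes that R2 denotes the squared Pearson correlation coefficient. The temperature and outpatient values are invented for illustration and are not the study data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical weekly averages: temperature (deg C) and outpatient counts
temperature = [-1.0, 5.0, 12.0, 20.0, 25.0]
outpatients = [260, 280, 310, 340, 360]

r = pearson_r(temperature, outpatients)
r_squared = r ** 2  # share of variance explained by a simple linear fit
```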

RESULTS AND DISCUSSION
In previous research (Hong et al., 2016), the effect of each factor on conjunctivitis was examined individually. In contrast, in this study, regression models were developed with five independent factors (temperature, humidity, O3, sex, and age) in order to consider these factors concurrently. First, the regression models for temperature, humidity, and O3 were developed; then the sex and age factors were added by multi-level regression modeling. All regression models were developed in R 3.2.3 with the MASS library. The response variable and independent variables for the developed regression models were as follows: y: outpatient rate per week (the number of outpatients per week divided by the population); X1: average temperature per week + 20 (°C); X2: average humidity per week (%); X3: average O3 per week (ppm).
The estimated coefficients of each model and the test results are shown in Table 1. One week in every 3 weeks over the 156 weeks was randomly selected and used solely for model validation (out-of-sample test). The other 2 weeks in every 3 weeks were used for model development and validation (in-sample test). All three models were significant based on their small p values. However, model 2 was the best model because it had better R2 and adjusted R2 values in both the in-sample and out-of-sample tests. Figure 4 shows the normal probability plot for model 2. Most residuals in the graph are located near the diagonal line, which indicates that the residuals are approximately normally distributed.
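The hold-out scheme described above can be sketched as follows. The sketch assumes that the 156 weeks are grouped into consecutive blocks of 3 weeks and that one week is drawn at random from each block; the text does not state how the blocks were formed, and the seed and the function name `split_weeks` are hypothetical.

```python
import random


def split_weeks(n_weeks, block=3, seed=0):
    """Hold out one randomly chosen week from each consecutive block of weeks."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    held_out = []
    for start in range(0, n_weeks, block):
        end = min(start + block, n_weeks)
        held_out.append(rng.randrange(start, end))
    held = set(held_out)
    in_sample = [week for week in range(n_weeks) if week not in held]
    return in_sample, held_out


# 156 weekly observations: 104 for model development, 52 for out-of-sample testing
in_sample, out_of_sample = split_weeks(156)
```

Drawing one week from each block keeps the held-out weeks spread across all seasons, so the out-of-sample test is not concentrated in one part of the year.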
Model 2 can predict the outpatient rate from temperature, humidity, and O3. The out-of-sample test shows the prediction accuracy of the regression model because the sample for the out-of-sample test was not used for model development. Figure 5 shows an example of outpatient rate prediction with model 2: the estimated outpatient rate as a function of O3 for three different temperature and humidity combinations (Temperature, Humidity). In South Korea, temperature and humidity increase during the summer and decrease during the winter. The three temperature and humidity combinations (high, average, and low) were determined based on the average temperature and humidity over the test time periods; these were 12.34 °C and 58.5%, respectively. The outpatient rate increased with increased temperature and humidity. In contrast, the dry eye disease outpatient rate increased with reduced relative humidity (Hwang et al., 2016). This is presumably due to multiple factors rather than the simple effect of relative humidity.
The regression models including sex and age were developed based on model 2. The additional independent variables for the regression model were defined as follows: Sex: 0 for male and 1 for female; Age: 1 (0-9 years old), 2 (10-19 years old), 3 (20-29 years old), . . . , 9 (> 80 years old). Figure 6 shows the average outpatient rate over 156 weeks for each sex and age. The outpatient rates decrease until the 20-29 years old group, then typically increase for the older age groups, for both males and females. The female outpatient rates are higher than those for males for all age ranges except 0-9 years old. Regression models were developed for each sex and age combination, as shown in Table 2. However, sex and age can instead be treated as independent variables by assuming that each coefficient of model 2 is a function of sex and age.
Assuming that β0 and βij in Table 2 are functions of sex and age, denoted g0(sex, age) and gij(sex, age), the regression model can be represented as follows:
$$
y = g_0(\text{sex}, \text{age}) + g_{11}(\text{sex}, \text{age})\, X_1 + g_{12}(\text{sex}, \text{age})\, \ln(X_1) + g_{21}(\text{sex}, \text{age})\, X_2 + g_{22}(\text{sex}, \text{age})\, \ln(X_2) + g_{31}(\text{sex}, \text{age})\, X_3 + g_{32}(\text{sex}, \text{age})\, \ln(X_3) + \varepsilon
$$
This is a multi-level regression model; thus, model 2 is a first-level regression model, and g0(sex, age) and gij(sex, age) are second-level regression models (Gelman and Hill, 2007). This model is applicable when there is a hierarchical structure among the independent variables. In this study, sex and age were considered higher-level independent variables. Because the effect of age is nonlinear, as shown in Figure 6, the regression model for g0(sex, age) was developed using the following relationship: sex + age + sex · age + ln(age) + sex · ln(age) + exp(age) + sex · exp(age). In order to develop a simple model, the model selected for gij(sex, age) was one of the following relationships: 1) sex + age + sex · age, 2) sex + ln(age) + sex · ln(age), 3) sex + exp(age) + sex · exp(age).
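The two-level structure can be sketched as follows. For brevity, the sketch uses the first candidate relationship (sex + age + sex · age) with an assumed intercept for every coefficient function, including g0, which is a simplification of the text. All coefficient values, the dictionary keys, and the function names `level2` and `predict_rate` are hypothetical; the fitted values are those in the paper's tables, which are not reproduced here.

```python
from math import log


def level2(coefs, sex, age):
    """Second-level model: c0 + c_sex*sex + c_age*age + c_inter*sex*age."""
    c0, c_sex, c_age, c_inter = coefs
    return c0 + c_sex * sex + c_age * age + c_inter * sex * age


def predict_rate(coef_table, sex, age, temperature, humidity, o3):
    """First-level model whose coefficients are functions of sex and age."""
    x1 = temperature + 20.0  # X1: weekly average temperature + 20, as defined in the text
    x2 = humidity            # X2: weekly average humidity (%)
    x3 = o3                  # X3: weekly average O3 (ppm)
    if min(x1, x2, x3) <= 0:
        raise ValueError("X1, X2, and X3 must be positive for the log terms")
    terms = {
        "g0": 1.0,
        "g11": x1, "g12": log(x1),
        "g21": x2, "g22": log(x2),
        "g31": x3, "g32": log(x3),
    }
    return sum(level2(coef_table[name], sex, age) * value
               for name, value in terms.items())


# Hypothetical coefficients, chosen only so the arithmetic is easy to follow
zero = (0.0, 0.0, 0.0, 0.0)
coef_table = {
    "g0": (1e-4, 1e-5, 0.0, 0.0),
    "g11": zero, "g12": zero,
    "g21": zero, "g22": zero,
    "g31": (1e-3, 0.0, 0.0, 0.0), "g32": zero,
}

# Sex = 1 (female), Age = 3 (20-29 years old), at the average test conditions
example = predict_rate(coef_table, sex=1, age=3,
                       temperature=12.34, humidity=58.5, o3=0.03)
```

With these illustrative values, only the intercept and the O3 term contribute, so the prediction reduces to (1e-4 + 1e-5) + 1e-3 × 0.03.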

CONCLUSIONS
The weekly average O3 concentrations were correlated with meteorological factors and numbers of outpatients. This study provides models for predicting conjunctivitis outpatient rates by using multiple concurrent independent variables, such as temperature, humidity, and O3. The developed regression model confirms the effect of O3: when O3 increases, the outpatient rate also increases. A method to develop a multi-level regression model for the conjunctivitis outpatient rate is provided. Sex and age factors were added to the developed regression model by using multi-level regression modeling. This enabled us to predict the conjunctivitis outpatient rate by using five independent factors concurrently. The developed models can be used to identify the characteristics of the conjunctivitis outpatient rate on the basis of each independent variable. Test results for the developed models and their prediction examples are provided. Other pollutants can be included in future research. In future studies, we will apply the multi-level regression model to other environmental diseases.

AUTHOR CONTRIBUTIONS
SP and C-KJ supervised the overall research. J-WS contributed to paper writing and model development. J-SY performed the air pollutant and meteorological data analysis.