Could the Recent Zika Epidemic Have Been Predicted?

Given knowledge at the time, the recent 2015–2016 zika virus (ZIKV) epidemic probably could not have been predicted. Without the prior knowledge of ZIKV being already present in South America, and given the lack of understanding of key epidemiologic processes and long-term records of ZIKV cases in the continent, the best related prediction could be carried out for the potential risk of a generic Aedes-borne disease epidemic. Here we use a recently published two-vector basic reproduction number model to assess the predictability of the conditions conducive to epidemics of diseases like zika, chikungunya, or dengue, transmitted by the independent or concurrent presence of Aedes aegypti and Aedes albopictus. We compare the potential risk of transmission forcing the model with the observed climate and with state-of-the-art operational forecasts from the North American Multi Model Ensemble (NMME), finding that the predictive skill of this new seasonal forecast system is highest for multiple countries in Latin America and the Caribbean during the December-February and March-May seasons, and slightly lower—but still of potential use to decision-makers—for the rest of the year. In particular, we find that above-normal suitable conditions for the occurrence of the zika epidemic at the beginning of 2015 could have been successfully predicted at least 1 month in advance for several zika hotspots, and in particular for Northeast Brazil: the heart of the epidemic. Nonetheless, the initiation and spread of an epidemic depends on the effect of multiple factors beyond climate conditions, and thus this type of approach must be considered as a guide and not as a formal predictive tool of vector-borne epidemics.


INTRODUCTION
Zika virus (ZIKV, family Flaviviridae, genus flavivirus) disease is a viral illness transmitted primarily by the Aedes aegypti and Aedes albopictus mosquitoes (Abushouk et al., 2016). ZIKV has recently emerged as a major epidemic in Latin America and the Caribbean, with 738,783 suspected and confirmed cases reported to date (PAHO, 2017). Prior studies from Yapp Island suggest that the majority of ZIKV infections are asymptomatic or result in mild disease (Duffy et al., 2009), and initial studies from Latin America suggest that the ZIKV infections are less severe and less febrile than chikungunya (CHIKV) or dengue (DENV) infections (Waggoner et al., 2016). The spread of ZIKV has been accompanied by severe neurological complications, including children born with microcephaly (Calvet et al., 2016;Schuler-Faccini et al., 2016) and people with Guillain-Barré syndrome (Cao-Lormeau et al., 2016;PAHO, 2016b).
In a previous study (Muñoz et al., 2016a), our team analyzed the potential contribution of climate signals acting at different timescales in creating the environmental scenario for the current ZIKV epidemic. We found that suitable climate conditions were present, due to the co-occurrence of anomalously high temperatures and persistent below-normal rainfall in several regions of South America, especially in Brazil, the heart of the epidemic.
These suitable conditions are not only favorable for ZIKV, but in general enhance the probability of both Aedes sp. reproduction and viral replication. Due to the fact that ZIKV, DENV, and CHIKV share the same mosquito vectors and seem to have similar temperature dependence for their extrinsic incubation periods (Mordecai et al., 2017), there are advantages in considering the overall eco-epidemiological conditions for the potential risk of transmission of Aedes-borne arboviruses rather than focusing on the risk of transmission of only one disease. The effect of rainfall on Aedes sp. is more complex than temperature (e.g., Stewart-Ibarra and Lowe, 2013;, because Aedes vectors breed in domestic water containers which are more abundant during droughts and water shortages (Chretien et al., 2007). Their presence is also known to increase following unusually high rainfall when peri-domestic breeding sites (discarded containers, flower pots, tires, etc.) are filled with water.
The study of the different environment-virus-vector-human interactions in this field is normally performed using a diversity of mathematical models. Most of them are based on the Ross-McDonald model (Smith et al., 2012) or its generalizations. These models are commonly referred to as compartmental models, normally stratifying the population in susceptible (S), infected (I) and recovered (R) individuals (so-called SIR models). A set of coupled differential equations is used to describe the evolution of each compartment (Anderson and May, 1991;Murray, 2002). These models vary in complexity, and tend to be classified as homogeneous or heterogeneous models; for further details, see for example (Moreno et al., 2002).
Although these models are most frequently used to diagnose past or present epidemics, they can also be used in predictive mode, even at seasonal scale (see Thomson et al., 2006). Predicting conditions of environmental suitability presents a complex problem, but it is indeed less complex than predicting the occurrence and transmission of the diseases in human populations. The complexity resides in the nonlinear interactions between the different components of the coupled disease model system in consideration, in which the effects of population immunity and susceptibility, or different possible immunological interactions between the diseases (e.g., co-infections of DENV and ZIKV) are still not well understood. Nonetheless, some new studies are already considering some of these interactions (for recent ZIKV examples, see Ferguson et al., 2016;Lourenco et al., 2017;Perkins, 2017), underscoring-in addition to the role of climate-the importance of herd immunity and the frequency of viral re-introductions in the modulation of potential future outbreaks.
Here, we develop a new seasonal forecast system to assess suitable climate conditions for the transmission risk of Ae. aegypti-and Ae. albopictus-borne diseases. We use a twovector one-host basic reproduction number model driven by state-of-the-art climate forecasts to assess its predictive skill, and we discuss the implications for Latin America and the Caribbean. For brevity, in the following pages we will use "potential risk of transmission" to refer to potential transmission associated with climate conditions suitable for transmission of the aforementioned diseases. Data and general methods are presented in Section Data and Methods, the basic reproduction number model is discussed in Section Two-Vector One-Host Ento-Epidemiological Model, the skill assessment for different seasons of the year is analyzed in Section Skill Assessment and DJF 2014-2015 Forecast, and the concluding remarks are presented in Section Concluding Remarks.

DATA AND METHODS
The domain of study includes Latin America and the Caribbean, and is defined by the boundaries 120-25 • W and 60 • S-32 • N.
The observed monthly temperature and rainfall fields for the period 1950-2015 were obtained from the University of East Anglia Climate Research Unit product version 3.4 (CRUv3.4; Harris et al., 2014), available at a horizontal resolution of 0.5 degrees. These datasets were selected to be consistent with our previous study on a similar topic (Muñoz et al., 2016a). Tests indicated that the results are consistent with other large scale gridded climate datasets, such as the Climate Anomaly Monitoring System (CAMS, Global Historical Climatology Network version 2 Fan and van den Dool, 2008) used in Caminade et al. (2017).
State-of-the-art temperature and rainfall forecasts at monthly timescales were obtained from the North American Multi-Model Ensemble project (NMME; Kirtman et al., 2014), at a common horizontal resolution of 1 • × 1 • degrees. The total of 116 members available was used for the hindcast 1 period of 1982-2010, but only 104 members were used for the December-February 2014-15 forecast due to data availability (no members from the NCAR-CESM1 and NASA-GMAO models). Hindcasts and forecasts correspond to the month prior to the target season; for example, for the December-February season, the hindcast and forecast of November was used.
The vector model used in this work was recently developed by Caminade et al. (2017). For the sake of organization, the basic reproduction number model equations are presented in the next section. The model requires climate information, and thus the observations and NMME forecasts mentioned above were used, the first one for diagnostics and baseline validation, and the second one for the prognostic set up. The model was coded and executed in Matlab at a monthly timescale for a total of 792 months when forcing it with observed data, and 348 months per member when using the NMME hindcasts; each member was run independently before computing the ensemble and seasonal averages. The basic reproduction number model output, forced with both climate observations and hindcasts, is available online at the Latin American Observatory's Datoteca (Muñoz et al., 2010(Muñoz et al., , 2012(Muñoz et al., , 2016bChourio, 2016): http://datoteca.ole2.org/maproom/Sala_ de_Salud-Clima/ContexHist-Map-1/index.html.es. When analyzing the model forced with observations, standardization was performed with respect to the 1950-2015 period. Anomalies are defined as the value of the variable being analyzed minus its 1950-2015 average. To analyze inter-annual variability, a 12-month running average was computed. A linear detrending was used.
Skill was assessed using both Kendall's τ and the 2AFC score (Mason and Weigel, 2009), computed using the International Research Institute for Climate and Society (IRI) Climate Predictability Tool, CPT (Mason and Tippet, 2016), version 15.4.7. Kendall's τ is a non-parametric rank correlation coefficient used here to measure the overall association between observations and model output, with positive values indicating that the forecasts are better than using the average expected value (negative values imply that it is better to use the average expected value). The 2AFC score indicates the probability of correctly discriminating an observation in a higher category from one in a lower (e.g., an "above-normal" observation from a "normal" observation) given the forecasts expressed in deterministic form (i.e., the actual model values, and not the probabilities associated with them). The following four seasons were considered: December-February, DJF, March-May, MAM, June-August, JJA, and September-November, SON. A crossvalidation window of 5 years was used, for the 1982-2010 period. For each iteration, 5 years were left out and the remainder years were used to build the statistical model, forecasting the middle year of the 5-year window. This window is shifted 1 year into the future for the next iteration, and so on. The skill reported is the average of the metric computed for each iteration, and it was assessed after magnitude and spatial biases were corrected using a simple Model Output Statistics approach involving a Principal Component Regression (PCR; Mason and Baddour, 2008;Jolliffe and Stephenson, 2012), an option available in the CPT software. For further details see (Mason and Baddour, 2008).
Maps showing the 2AFC score computed using this methodology were produced for each of the seasons considered. Categories for above normal, normal, and below normal were identified in the vector model output using the typical 33.33 and 66.66% thresholds in the corresponding probability density function. Forecast probabilities for each category were computed using the PCR model built with the CPT package.

TWO-VECTOR ONE-HOST ENTO-EPIDEMIOLOGICAL MODEL
Both Ae. aegypti and Ae. albopictus are considered the most important vectors in Latin America and the Caribbean for the transmission of ZIKV, CHIKV, and DENV (e.g., Lambrechts et al., 2010;Li et al., 2012;Grard et al., 2014;Chouin-Carneiro et al., 2016;Gardner et al., 2016;Muñoz et al., 2016a;Mordecai et al., 2017) These vectors are known to have different susceptibilities to these diseases, as well as different feeding characteristics (Caminade et al., 2017). While Ae. aegypti and Ae. albopictus are considered to be a domestic and peri-domestic mosquito, respectively, it is in principle possible to find them coexisting in the same place (Li et al., 2014;Kraemer et al., 2015), something that is expected to be even more common in the near future due to global warming (Gardner et al., 2016;Lessler et al., 2016). Hence, we consider that an actionable seasonal forecast system should involve at least these two species for Latin America and the Caribbean. This section presents the model equations used by the prediction system.
As it has been shown by other authors (Turner et al., 2013;Caminade et al., 2017) the equations for the dynamics of a two-vector one-host SIR model, a generalization of the standard Ross-McDonald model (Smith et al., 2012), are where S H , I H , and R H are the number of susceptible, infectious and recovered hosts, respectively, associated with the Aedesborne disease of interest. S i , L i, and I i are the number of susceptible, latent and infectious vectors of kind i = 1,2 (Ae. aegypti and Ae. albopictus, respectively). In addition, and a i is the daily biting rate (a function of temperature), b i is the vector-to-host transmission probability, φ i quantifies the vector's preference for humans, m i is the vector-to-host ratio (a function of both temperature and rainfall; see Caminade et al., 2017 for details), β i is the host-to-vector transmission probability, r is the daily recovery rate, and ν i and µ i are the inverse of the extrinsic incubation period of the virus in days and the mortality rate, respectively, both a function of temperature. As in (Caminade et al., 2017), the vector-to-host-ratio m i is defined in terms of the probability of occurrence of the vectors (multiplied by 1,000), which was obtained in (Kraemer et al., 2015) using maximum and minimum annual rainfall to account for the presence of waterfilled containers, and other environmental variables involving temperature and urbanization; for details see the Materials and Methods section in (Kraemer et al., 2015). H and N i are the total number of hosts and the total number of the i-th kind of vector, respectively. This is a 5-compartmental model which includes infectious human host, latent Ae. aegypti vectors, latent Ae. albopictus vectors, infectious Ae. aegypti vectors and infectious Ae. albopictus vectors. If and are the new infectious rate appearing in a compartment and the rate at which individuals leave said compartment, respectively, then The basic reproduction number R 0 is the dominant eigen-value of the next-generation matrix (Caminade et al., 2017) for m,l = 1..5 identifying the different compartments, x, being a vector with the number of individuals in each compartment, and x 0 denoting the disease-free equilibrium state. The only nonzero elements K ml (new infections in compartment m produced by infectious individuals in compartment l) of K are K 13 = a 2 b 2 φ 2 ν 2 (ν 2 + µ 2 )µ 2 (13) R 0 is the largest eigenvalue solution of the eigenvalue problem |K − R 0 I| = 0: where, as the indices suggest, the first term in the square root corresponds to Ae. aegypti and the second one to Ae. albopictus. As in (Caminade et al., 2017), we set R 0 =0 in all locations and times for which the total monthly rainfall has not been at least 80 mm during a minimum of 5 months, a condition for stable transmission. This model has been reported (Caminade et al., 2017) to reproduce well the observed basic reproduction number obtained when using the relatively short record of ZIKV cases available in Latin America. Because of this, we have chosen the same values of the parameters and functional dependence on temperature and rainfall that was used in that study (Caminade et al., 2017).
The basic reproduction number can be understood as the expected number of new cases generated by a single (typical) infection in a completely susceptible population. It is a dimensionless number that can be associated with the potential risk of transmission of the disease, considering only basic environmental, entomological, and epidemiological information. Only values of R 0 >1, which are related to spreading of the epidemic in a fully susceptible population, were considered in this study.
The temperature dependence of certain parameters in the model (for example, the mortality rate µ i ; see Figure 1) strongly controls the spatial and temporal distribution of R 0 . Most of Latin America and the Caribbean typically exhibits high values of R 0 (Figure 2). The potential risk of transmission of Aedes-borne diseases is higher for the northern half of South America, especially in Brazil, most of Colombia, Venezuela, Guyana, Suriname, and the French Guyana, coastal Ecuador and the Ecuadorian and Peruvian Amazon. Central America and the Caribbean, although to a lesser degree, also exhibit high values of R 0 . Furthermore, with the increasing occurrence of high-temperature records, the frontier is extending farther into southern South America, in countries like Uruguay, which reported the first cases of autochthonous dengue fever in 2016 (WHO, 2016). Nonetheless, places that are too hot decrease the life expectancy of the vectors (roughly speaking, when temperatures exceed 40 • C, see Figure 1), and thus some regions in the future could start seeing a relative decrease in vector abundance if temperatures keep increasing.
An analysis of the evolution of the suitable conditions for transmission during 2013-2015 (Figure 3) complements the study on the associated temperature and rainfall anomalies  The neutral standardized anomalies in the Brazilian Nordeste (Northeast), one of the most affected regions in terms of the 2015 ZIKV outbreak, are attributed to the buffering role of the Atlantic Ocean in controlling the local temperatures. Still, neutral standardized anomalies in Nordeste are associated with R 0 ranging between 3.5 and 5.5, indicating a very high potential risk of transmission.
The high values of the 2015 standardized anomalies ( Figure 3C) are also consistent with the observed burden of other diseases like dengue; for example, the reported number of dengue cases for Ecuador in 2015 (42,667) was about 3 times larger than the average number of cases for 2011-2014 (14,467.5); for details see (PAHO, 2016a). Nonetheless, unpublished work of our team in Machala (coastal Ecuador) suggests that a high percentage of the 2015 dengue cases reported there are likely to be chikungunya cases. Even if that is the case, the model was able to capture enhanced conditions leading to a larger burden of Aedes-borne diseases.
The evolution of the spatially-averaged R 0 standardized anomalies for Latin America and the Caribbean exhibits a clear trend between 1950 and 2015 (black curve in Figure 3D), as reported by (Caminade et al., 2017), that is consistent with the persistent increase in temperatures observed in the region. Once the longer-term signals are filtered-out, the inter-annual component of the R 0 standardized anomalies (filled curved in Figure 3D), show a peak in 2015 that is the second-highest on record, following the largest one occurred during 1998. This slightly contrasts with the analysis performed by (Caminade et al., 2017); overall Figure 3D is telling the same story as Figure 3 in (Caminade et al., 2017), the main differences due to the use of a different dataset and mostly to the use of a 12-month running average in our case (see Section Data and Methods above). Our interpretation is consistent with our previous study on the 2015 climate conditions (Muñoz et al., 2016a): a superposition of longterm, decadal and inter-annual signals was responsible for the 2015 absolute maximum in the unfiltered time series (black curve in Figure 3D). Although most likely the 2015 El Niño had an important contribution, the maximum cannot be explained only by this inter-annual phenomenon.

SKILL ASSESSMENT AND DJF 2014-2015 FORECAST
A new seasonal forecast system for potential risk of transmission of Aedes-borne diseases can be developed by driving the R 0 model discussed in the previous section with a multi-model ensemble of climate predictions at seasonal scale. For this purpose, we have selected the set of coupled global models participating in the North American Multi-Model Ensemble project (Kirtman et al., 2014). Although, our focus is Latin America and the Caribbean, the same system can be used for other regions of the world, and a subset of the NMME models or a completely different Black empty curve and filled curve show the raw and linearly detrended standardized anomalies, respectively. A 12-month running average filter was applied to both curves to better capture the inter-annual variability. There is no data over the oceans. seasonal climate forecast system can be used straightforwardly if that provides higher skill for the particular region of interest.
In brief, the system uses the monthly climate information from each one of the 116 (or 104, if the target period is between 2010 and 2015) realizations of the NMME models to compute the associated value of the basic reproduction number for each grid box in our geographical domain. Although, the forecast horizon is typically 9 months after the initialization month, skill is normally higher for the first few seasons; to illustrate the approach here we focus on the first season starting immediately after the initialization month (e.g., JJA for forecasts initialized in May). After the multi-model ensemble and the seasonal average is computed, the output is corrected using a simple Principal Component Regression, which provided better results than other methods like Canonical Correlation Analysis or the use of the raw model output. For additional details, see Section Data and Methods.
The cross-validated analysis shows that there is relatively high skill (>60%, as measured by the 2AFC metric) for R 0 for all the seasons over the northern half of South America and several regions of Central and North America, and some Caribbean nations (Figure 4). Overall, the skill is higher in DJF and MAM (with Kendall's τ of 0.199 and 0.191, respectively), and minimum in JJA (0.123), SON being in the middle (0.146). These values of Kendall's τ are typical for rainfall predictions in the region, as can be seen in the Validation Maproom of the Latin American Observatory's Datoteca (Muñoz et al., 2010(Muñoz et al., , 2012Chourio, 2016): http://datoteca.ole2.org/maproom/Sala_de_Validacion/.
Regionally speaking, skill is higher in Mexico in JJA, especially in the south (Figure 4). Central American countries exhibit high skill (above 70% for most of them) for DJF and MAM, with the unskilled values (<50%) occurring in JJA and SON for Panama and Costa Rica. The western Caribbean tends to show higher skill during JJA, while the Central Caribbean and Lesser Antilles during MAM.
The northern part of South America shows relatively high skill (>70%) all year around, with the exception of some regions such as Ecuador, northern Peru, southwestern Colombia, northeastern Venezuela, and northern Guyana which show no skill during JJA and SON (Figure 4). The forecast system has in general low skill or no skill at all for southern South America, with some exceptions, e.g., the Bolivian Amazon in DJF, Paraguay and northern Argentina in SON, and northwestern Uruguay in DJF. Most of Brazil exhibits values of the 2AFC metric that are above 50% in all seasons, although southern Brazil has very low skill in MAM. In general, Chile and central and southern Argentina do not show potential risk of transmission with this model, and thus those regions appear in white in our skill maps (Figure 4).
To illustrate an example of the bias-corrected probabilistic forecasts produced by our system, we now consider the season preceding the first reported case of ZIKV in Brazil (May 2015 Faria et al., 2016Faria et al., , 2017Kindhauser et al., 2016): DJF 2014-2015. The probabilistic prediction indicates that there were mostly conditions for above-normal risk of transmission in eastern Brazil, which is similar to the observed conditions ( Figure 5). Nonetheless, below-normal conditions were in general no forecast in the ZIKV hotspot places (in Brazil, for example), and as discussed above, the normal category the northern half of South America is already conducive to epidemic conditions. Hence, we claim that this particular forecast, even if not perfect, could have been useful for decision-makers at the time (November 2014), assuming that they already knew that ZIKV was already circulating in the region, which was of course not the case.
The previous example also illustrates why this tool can only be used as a guide for the local and international experts, as these diseases involve complex interactions beyond the presence or not of enhanced environmental (climatic) conditions suitable for the occurrence and transmission of Aedes-borne epidemics.

CONCLUDING REMARKS
We have discussed the development and predictive skill of a new probabilistic forecast system to estimate climatic suitable conditions for potential risk of transmission of diseases like ZIKV, DENV, and CHKV. To the best of our knowledge this is the first seasonal forecast system of this type for Latin America and the Caribbean, although it is conceptually similar to a malaria forecast system developed for Africa years ago (Thomson et al., 2006). Instead of focusing on the different Aedes-borne diseases separately (for which some model parameters are still uncertain or are actually unknown, as for example in the case of zika), our approach addresses suitable conditions for the risk of transmission of these diseases as a whole. This idea is consistent with the information required by international health agencies and general health practitioners. As a matter of fact, although the two-vector model used in this study was developed by Caminade et al. (2017) for zika using some denguelike parameters, they reported notorious epidemic hotspots for 2015 for Angola and the Democratic Republic of Congo (which reported very active circulation of yellow fever), and for India (which reported high number of dengue cases in the south of the country). Although further verification studies are needed, these results seem to support our argument for a generalized potential risk of transmission of Aedes-borne diseases. From the regional perspective, this forecast system has the potential to help the Pan-American Health Organization (PAHO), the World Health Organization (WHO) and other decision-makers to prepare more detailed epidemiological alerts and guides for zika's surveillance and other arboviruses; to calculate different levels of population at risk and incidence rates for regional assessment, to prepare vector control guidelines for a more integrated management; to plan and support vector control resources an equipment; to organize and program activities and resource mobilization, as well as improve risk communication materials. One of the co-authors (PN) has already started to explore ways to take advantage of this forecast system at PAHO/WHO.
Our system is a first attempt to provide predictive tools for health practitioners and decision-makers interested in Aedesborne diseases in Latin America and the Caribbean, and can be considered an additional step in the direction followed by other research groups (Kraemer et al., 2015;Carlson et al., 2016;Lessler et al., 2016;Messina et al., 2016;Monaghan et al., 2016;Samy et al., 2016;Caminade et al., 2017).
Indeed, forecasts of health events are designed to change human behavior. Nonetheless, as with the practice of medicine, there are ethical issues to consider. It is possible that there might be negative consequences from an epidemic risk forecast (i.e., incidence, or cases), even if the prediction is skillful. To illustrate this idea, consider that a forecast for ZIKV is provided to the community, indicating that there is above 80% probability of acquiring the disease in Rio de Janeiro during a certain season, but less than 10% probability of infection in Montevideo. People-some of whom could already be infected with ZIKV, or even with a different disease-might decide to travel to Montevideo instead of Rio de Janeiro because of that forecast, thus igniting or being part of a new focus of an epidemic there, that was not predicted and that is partially caused by the original prediction itself. This is an important caveat to be considered by the decision-makers. Another consideration is that a ZIKV forecast may have negative consequences for tourism, leading to livelihood impacts that may have negative health consequences.
There are a number of important limitations related to our forecast system. As indicated earlier in this paper, by itself this kind of system cannot forecast the occurrence and spread of new epidemics, but only partial conditions for that to happen. The model employed here only considers the effect of climatic conditions, through temperature and rainfall, on disease transmission via the vectors and viruses of interest. Direct human-to-human transmission via sexual intercourse and blood transfusion are outside the scope of this modeling approach. Also, the present version of the model cannot simulate co-infections or mixed states (e.g., a fraction of the population recovered from dengue but still susceptible to zika infections).
One particular way in which the model needs to be improved involves how rainfall is considered. The present version of the model only uses rainfall in a rather simplistic way, without really considering its seasonal characteristics. There are examples in the scientific literature that could be used to improve the representation of rainfall in this type of model (see for example, Magori et al., 2009;Santos et al., 2009;Morin and Comrie, 2010). In addition, it is key to have a good representation in the model of immunity and viral re-introduction (see Ferguson et al., 2016;Lourenco et al., 2017). There is also room to consider a better set of realizations in the ensemble of simulations, varying the ento-epidemiological parameters of the model. These options for further model development will be explored in the near future.

AUTHOR CONTRIBUTIONS
ÁM and MT established the concept of the study. ÁM obtained the data. All authors undertook the analysis and interpretation of results. ÁM, MT, and AS drafted the manuscript. All authors critically reviewed and revised the manuscript and agreed the final submission.