<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Sustain. Food Syst.</journal-id>
<journal-title>Frontiers in Sustainable Food Systems</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Sustain. Food Syst.</abbrev-journal-title>
<issn pub-type="epub">2571-581X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fsufs.2020.00052</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Sustainable Food Systems</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Data Science for Weather Impacts on Crop Yield</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Konduri</surname> <given-names>Venkata Shashank</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/625563/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Vandal</surname> <given-names>Thomas J.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/727551/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Ganguly</surname> <given-names>Sangram</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Ganguly</surname> <given-names>Auroop R.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Sustainability and Data Sciences Laboratory, Department of Civil and Environmental Engineering, Northeastern University</institution>, <addr-line>Boston, MA</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>NASA Ames Research Center, Bay Area Environmental Research Institute</institution>, <addr-line>Moffett Field, CA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Naoki Abe, IBM Research, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Molly E. Brown, University of Maryland, United States; Haishun Yang, University of Nebraska-Lincoln, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Venkata Shashank Konduri <email>konduri.v&#x00040;northeastern.edu</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Climate-Smart Food Systems, a section of the journal Frontiers in Sustainable Food Systems</p></fn></author-notes>
<pub-date pub-type="epub">
<day>19</day>
<month>05</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>4</volume>
<elocation-id>52</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>02</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>04</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Konduri, Vandal, Ganguly and Ganguly.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Konduri, Vandal, Ganguly and Ganguly</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Private businesses in sectors such as food, energy, and retail, as well as public-sector and federal agencies, are interested in a predictive understanding of weather impacts on crop yield, an important aspect of food security. The scientific literature has mainly examined how crop yield is impacted by growing season-averaged weather indices. Although a few studies did consider weather extremes in their analysis, their scope was either restricted to measuring the conditional relationship of extremes with yield, or the extreme event types considered were limited. The selection of regression models, whether the more commonly used linear approaches or nonlinear methods, has not been appropriately justified in this context. Here, we develop data-driven methods to examine two inter-related hypotheses for improved scientific understanding and enhanced predictive modeling. The first hypothesis, that extreme weather indices carry statistically significant information content, is found to be valid based on linear and nonlinear measures of pairwise dependence. The second hypothesis examines the value added by nonlinear regression methods and suggests that linear approaches alone may not be adequate. The results of this study can inform scientific understanding, the generation and relevance of indices, and end-to-end risk assessment systems in the context of climate impacts on crop yield. An immediate application may be in the context of the NASA Earth Exchange (NEX), which facilitates the generation and dissemination of impacts-relevant weather data and indices using a multitude of satellite-derived data sets and model outputs.</p></abstract>
<kwd-group>
<kwd>crop yield</kwd>
<kwd>weather indices</kwd>
<kwd>nonlinear regression</kwd>
<kwd>pairwise dependence</kwd>
<kwd>food security</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="2"/>
<equation-count count="13"/>
<ref-count count="49"/>
<page-count count="11"/>
<word-count count="6857"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Several studies have shown that global food production would have to double by 2050 to meet the needs of a rising population and shifting diets (Bruinsma, <xref ref-type="bibr" rid="B7">2009</xref>; Tilman et al., <xref ref-type="bibr" rid="B43">2011</xref>; OECD and Food and Agriculture Organization of the United Nations, <xref ref-type="bibr" rid="B33">2012</xref>). However, a prior study found that the current growth rates in yield for the major cereals grown across the globe are insufficient to achieve this target (Ray et al., <xref ref-type="bibr" rid="B36">2013</xref>). According to the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC), surface temperature is projected to rise over the twenty-first century under all assessed emission scenarios, with a high likelihood of increases in the intensity and duration of heat waves and extreme precipitation events in many regions (IPCC, <xref ref-type="bibr" rid="B14">2013b</xref>). This is expected to cause a significant decline in global crop production (Gourdji et al., <xref ref-type="bibr" rid="B10">2013</xref>; Deryng et al., <xref ref-type="bibr" rid="B8">2014</xref>), making the world more food insecure in the future. <xref ref-type="fig" rid="F1">Figures 1A,B</xref>, taken from the IPCC AR5 Working Group 2 report on Food Security and Food Production Systems (IPCC, <xref ref-type="bibr" rid="B13">2013a</xref>), summarize results from several studies on the impact of climate change on yields for four major crops grown in different regions of the world. 
An overwhelming majority of these studies show a declining trend in yields over the historical period 1960&#x02013;2013 (shown in <xref ref-type="fig" rid="F1">Figure 1A</xref>), with several of them also projecting major declines across different regions of the globe, especially toward the end of the twenty-first century (shown in <xref ref-type="fig" rid="F1">Figure 1B</xref>). The threat to food security from climate change is a critical issue for a number of industries, including food and beverage, retail, agriculture, insurance, biofuels, transportation, and weather analytics. With the world population expected to reach 9 billion by 2050, governments across the globe need to be well-equipped to deal with supply shocks in major cereals.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>As per the Fifth Assessment Report (AR5) of the IPCC (IPCC, <xref ref-type="bibr" rid="B13">2013a</xref>), changes in temperature and precipitation patterns are expected to cause a significant decline in global crop production. Climate change is also expected to increase the inter-annual variability in yields across different regions. <bold>(A)</bold> Summary of estimated impacts of historical changes in climate (1960-2013) on yields for four major crops grown in different regions across the globe. Numbers in brackets for each category represent the number of studies. <bold>(B)</bold> Summary of projected changes in yield over the twenty-first century. This includes projections for different emission scenarios, for temperate and tropical regions, with and without adaptation.</p></caption>
<graphic xlink:href="fsufs-04-00052-g0001.tif"/>
</fig>
<p>Statistical models and, more recently, tools from machine learning have been used to model crop yield variability using weather indices as inputs. Previous studies have shown the importance of growing season-averaged temperature and precipitation in explaining crop yield variability (Schlenker and Roberts, <xref ref-type="bibr" rid="B39">2009</xref>; Lobell and Burke, <xref ref-type="bibr" rid="B22">2010</xref>; Lobell and Field, <xref ref-type="bibr" rid="B23">2011</xref>; Lobell et al., <xref ref-type="bibr" rid="B25">2011b</xref>; Urban et al., <xref ref-type="bibr" rid="B45">2012</xref>; Osborne and Wheeler, <xref ref-type="bibr" rid="B31">2013</xref>; Moore and Lobell, <xref ref-type="bibr" rid="B29">2014</xref>; Ray et al., <xref ref-type="bibr" rid="B35">2015</xref>). However, extreme weather events from the recent past, such as the droughts in Russia in 2010&#x02013;2011 and in the United States (U.S.) in 2012, and their impact on regional crop production and global commodity markets, have clearly made the case for also considering weather extremes in crop yield modeling (Otto et al., <xref ref-type="bibr" rid="B32">2012</xref>). Winter wheat, for example, has been shown to be particularly susceptible to freezing temperatures during fall and to heat stress during grain filling and stem elongation (Tack et al., <xref ref-type="bibr" rid="B42">2015</xref>). This vulnerability to extreme temperatures is believed to be the reason behind a decline in wheat yields across Europe (Brisson et al., <xref ref-type="bibr" rid="B6">2010</xref>). According to another study (Schauberger et al., <xref ref-type="bibr" rid="B38">2017</xref>), each day above 30&#x000B0;C reduces maize and soybean yields by up to 6% under rainfed conditions. Similarly, interannual variation in rainfall plays a crucial role in crop growth. 
Although a few studies did consider extreme weather indices in their analysis, their scope was either restricted to measuring their conditional relationship with yields (Troy et al., <xref ref-type="bibr" rid="B44">2015</xref>), or the extreme event types considered were limited (Lobell and Burke, <xref ref-type="bibr" rid="B22">2010</xref>; Lesk et al., <xref ref-type="bibr" rid="B20">2016</xref>). Nonlinear and threshold-type relationships have been shown to exist between yields and weather indices (Schlenker and Roberts, <xref ref-type="bibr" rid="B39">2009</xref>; Lobell et al., <xref ref-type="bibr" rid="B21">2011a</xref>; Troy et al., <xref ref-type="bibr" rid="B44">2015</xref>). However, most previous studies have modeled this nonlinearity using regression models with quadratic terms for mean weather indices, without appropriate justification. Understanding the exact relationship between weather outcomes and yield is essential, given that a prior study reported significant stagnation or decline in yield for major cereal crops on more than a quarter of global croplands (Ray et al., <xref ref-type="bibr" rid="B37">2012</xref>).</p>
</sec>
<sec id="s2">
<title>2. Research Questions and Hypotheses</title>
<p>This study addresses the following two research questions:</p>
<list list-type="order">
<list-item><p>Are extreme weather indices relevant in crop yield modeling?</p></list-item>
<list-item><p>Are nonlinear regression models better at capturing crop yield variability than linear approaches?</p></list-item>
</list>
<p>Using linear and nonlinear measures for pairwise dependence along with a suite of linear and nonlinear regression models, this study tries to understand the nature of the crop yield-weather relationship with the hypotheses that extreme weather indices have a statistically significant information content and that nonlinear regression models capture yield variability better than linear approaches.</p>
</sec>
<sec id="s3">
<title>3. Data</title>
<p>In addition to mean weather indices, such as growing season-averaged maximum and minimum temperature and growing season-averaged precipitation, this study also considered extreme weather indices, as defined by the CCI/CLIVAR/JCOMM Expert Team on Climate Change Detection and Indices (ETCCDI) (Karl et al., <xref ref-type="bibr" rid="B17">1999</xref>), as predictors in the regression models. <xref ref-type="table" rid="T1">Table 1</xref> provides the list of mean and extreme weather indices along with their definitions. The crop considered for this study was corn (maize), a major agricultural input to food production. The U.S. is the largest producer and exporter of this crop, accounting for 36% of the world&#x00027;s production (Schlenker and Roberts, <xref ref-type="bibr" rid="B39">2009</xref>). The majority of U.S. corn production takes place in the Midwest region (also known as the &#x0201C;Corn Belt&#x0201D;). The county of Cerro Gordo, situated in the state of Iowa in the U.S. Midwest, was chosen as the area of interest for this study. Yearly values for corn yield (measured in bushels/acre) were collected for this county over a 76-year period, from 1940 to 2015, from the NASS portal of the USDA (USDA, <xref ref-type="bibr" rid="B46">2010</xref>). The time series of corn yield over this period, shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, has a strong positive trend due to advancements in farming technology over the years. In order to account for this trend, the year corresponding to the yield was used as one of the predictors in the regression model.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Predictor variables used for studying the impact of mean and extreme weather on corn yield.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Type</bold></th>
<th valign="top" align="left"><bold>Predictor</bold></th>
<th valign="top" align="left"><bold>Definition</bold></th>
</tr>
</thead>
<tbody>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">Year</td>
<td valign="top" align="left">The year was included as one of the predictors in order to account for the time series trend due to technological advances</td>
</tr> <tr>
<td valign="top" align="left">Mean Weather Indices</td>
<td valign="top" align="left">Growing Season Precipitation (<italic>GSP</italic>)</td>
<td valign="top" align="left">Precipitation averaged over the growing season</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Growing Degree Days (<italic>GDD</italic>)</td>
<td valign="top" align="left">It is a heat index that can be used to predict when a crop will reach maturity. Each day&#x00027;s GDD is calculated by subtracting the reference temperature (10&#x000B0;C) from the mean temperature for that day. GDD for the growing season is found by adding all the daily GDDs.</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Growing Season T<sub><italic>max</italic></sub> (<italic>GST</italic><sub><italic>max</italic></sub>)</td>
<td valign="top" align="left">Daily maximum temperature (T<sub><italic>max</italic></sub>) averaged over the growing season</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">Growing Season T<sub><italic>min</italic></sub> (<italic>GST</italic><sub><italic>min</italic></sub>)</td>
<td valign="top" align="left">Daily minimum temperature (T<sub><italic>min</italic></sub>) averaged over the growing season</td>
</tr> <tr>
<td valign="top" align="left">Extreme Weather Indices</td>
<td valign="top" align="left">Frost Days</td>
<td valign="top" align="left">Number of days during the growing season when T<sub><italic>min</italic></sub> &#x0003C; 0&#x000B0;C</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Summer Days</td>
<td valign="top" align="left">Number of days during the growing season when T<sub><italic>max</italic></sub> &#x0003E; 25&#x000B0;C</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Heat Wave Index</td>
<td valign="top" align="left">Maximum number of consecutive days during the growing season when the T<sub><italic>max</italic></sub> for a particular day is greater than the calendar-day 90th percentile for the base period 1961&#x02013;1990</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Cold Wave Index</td>
<td valign="top" align="left">Maximum number of consecutive days during the growing season when the T<sub><italic>min</italic></sub> for a particular day is less than the calendar-day 10th percentile for the base period 1961&#x02013;1990</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Longest Dry Spell</td>
<td valign="top" align="left">Maximum number of consecutive days when precipitation &#x0003C;1 mm</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Longest Wet Spell</td>
<td valign="top" align="left">Maximum number of consecutive days when precipitation &#x0003E; 1 mm</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">95<sup>th</sup> percentile precipitation (<italic>prcp</italic>95<italic>p</italic>)</td>
<td valign="top" align="left">No. of days during the growing season when the precipitation is greater than the 95<sup>th</sup> percentile of the base period 1961&#x02013;1990.</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The weather indices used in this study were chosen from a list of 27 indices that were compiled by the CCI/CLIVAR/JCOMM Expert team on Climate Change Detection and Indices (ETCCDI) (Karl et al., <xref ref-type="bibr" rid="B17">1999</xref>)</italic>.</p>
</table-wrap-foot>
</table-wrap>
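<p>As a concrete illustration (not the authors' code), the indices in Table 1 can be computed from arrays of daily growing-season values as sketched below. The function name is hypothetical; the percentile thresholds are assumed to be precomputed from the 1961&#x02013;1990 base period, and the daily GDD contribution follows the table's definition literally, without flooring negative values at zero.</p>

```python
import numpy as np

def weather_indices(tmax, tmin, precip, t90=None, t10=None, p95=None):
    """Growing-season indices from daily arrays (one entry per day).

    tmax, tmin in deg C; precip in mm. t90/t10/p95 are optional scalar or
    per-day thresholds, assumed precomputed from the 1961-1990 base period.
    """
    tmax, tmin, precip = map(np.asarray, (tmax, tmin, precip))

    def longest_run(mask):
        # Length of the longest run of consecutive True values.
        best = cur = 0
        for m in mask:
            cur = cur + 1 if m else 0
            best = max(best, cur)
        return best

    idx = {
        "GSP": float(precip.mean()),
        # Daily GDD = (daily mean temperature) - 10 deg C, summed over the
        # season. (Some conventions floor negative daily values at zero;
        # the definition in Table 1 does not, so none is applied here.)
        "GDD": float(((tmax + tmin) / 2.0 - 10.0).sum()),
        "GSTmax": float(tmax.mean()),
        "GSTmin": float(tmin.mean()),
        "frost_days": int((tmin < 0.0).sum()),
        "summer_days": int((tmax > 25.0).sum()),
        "longest_dry_spell": longest_run(precip < 1.0),
        "longest_wet_spell": longest_run(precip > 1.0),
    }
    if t90 is not None:
        idx["heat_wave_index"] = longest_run(tmax > t90)
    if t10 is not None:
        idx["cold_wave_index"] = longest_run(tmin < t10)
    if p95 is not None:
        idx["prcp95p"] = int((precip > p95).sum())
    return idx
```

<p>For a four-day toy season with one sub-zero night and three days above 25&#x000B0;C, the function returns one frost day and three summer days, alongside the season-aggregated GSP and GDD values.</p>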
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Time series of yearly corn yield (bushels/acre) for Cerro Gordo county over the period 1940&#x02013;2015. The strong positive trend in the time series can be attributed to advancements in farming technology over the years.</p></caption>
<graphic xlink:href="fsufs-04-00052-g0002.tif"/>
</fig>
<p>Data for three weather variables: daily maximum temperature (<italic>T</italic><sub><italic>max</italic></sub>) in &#x000B0;C, daily minimum temperature (<italic>T</italic><sub><italic>min</italic></sub>) in &#x000B0;C and daily precipitation (<italic>Precip</italic>) in mm were collected for the period of interest for three weather stations within the county from the Global Historical Climatology Network (GHCN) daily database (Menne et al., <xref ref-type="bibr" rid="B27">2012</xref>) using the Climate Data Online portal of the National Oceanic and Atmospheric Administration (NOAA) (NOAA, <xref ref-type="bibr" rid="B30">2018</xref>). The county-averaged time series of weather was created by averaging the daily data from the three stations, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. May 10<sup>th</sup> and Oct 20<sup>th</sup> were chosen as the start (sowing) and end (harvesting) dates for the growing season and were kept constant over the entire period of interest. Any fluctuations in weather occurring outside the growing period were assumed to have no impact on crop growth. The predictor and response variables were normalized prior to their use by subtracting the mean and dividing by the standard deviation.</p>
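<p>The station averaging and normalization steps above amount to a per-day mean across the three stations followed by a z-score transform; a minimal sketch (with hypothetical values, not the actual GHCN records) is shown below.</p>

```python
import numpy as np

# Hypothetical daily Tmax (deg C) from three stations, one row per station
station_tmax = np.array([
    [21.0, 23.5, 25.0, 27.0],
    [20.0, 24.0, 26.0, 28.0],
    [22.0, 23.0, 24.0, 26.0],
])

# County-averaged daily series: mean across the three stations for each day
county_tmax = station_tmax.mean(axis=0)

def normalize(x):
    """z-score: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()

z = normalize(county_tmax)  # zero mean, unit standard deviation
```

<p>The same transform is applied to each predictor and to the yield series before fitting the regression models.</p>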
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Data for daily maximum and minimum temperature and daily precipitation was collected from three weather stations in Cerro Gordo county, Iowa over the period 1940&#x02013;2015. Daily county-averaged values for these variables were generated by taking an average over the daily values from the three stations.</p></caption>
<graphic xlink:href="fsufs-04-00052-g0003.tif"/>
</fig>
</sec>
<sec sec-type="methods" id="s4">
<title>4. Methods</title>
<sec>
<title>4.1. Correlation Between Yield and Weather Indices</title>
<p>Previous studies have used linear correlation measures, such as the Pearson correlation coefficient, to estimate the dependence of yield on weather indices. However, multiple studies have shown that this relationship is actually nonlinear and is characterized by the existence of critical thresholds. This study, therefore, uses a measure that captures the overall dependence (linear and nonlinear) between yield and each of the mean and extreme weather indices. This measure, Mutual Information, is defined in the following section.</p>
<sec>
<title>4.1.1. Mutual Information</title>
<p>The basic intuition behind information theory is the idea of characterizing the &#x0201C;unpredictability&#x0201D; of a random variable, also known as <italic>information entropy</italic>. For a random variable <italic>X</italic> which takes on values in the set &#x003C7; = {<italic>x</italic><sub>1</sub>,<italic>x</italic><sub>2</sub>,&#x02026;,<italic>x</italic><sub><italic>n</italic></sub>} with a probability mass function <italic>p</italic>(<italic>x</italic>), the entropy <italic>H</italic>(<italic>X</italic>) can be formulated as</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>H</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mi>&#x003B5;</mml:mi><mml:mi>&#x003C7;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The negative sign ensures that entropy is always nonnegative. <italic>H</italic>(<italic>X</italic>) can be interpreted as the average amount of information gained from observing one instance of the random variable <italic>X</italic>. Low-probability outcomes carry high information content, and vice versa.</p>
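<p>As an illustrative sketch (not part of the original analysis), the entropy of a discrete distribution can be computed directly from Equation (1); the function below uses the natural logarithm and adopts the convention 0 log 0 = 0.</p>

```python
import math

def entropy(pmf):
    """Shannon entropy H(X) = -sum_x p(x) log p(x), natural log."""
    # Terms with p(x) = 0 contribute nothing (0 log 0 := 0).
    return -sum(p * math.log(p) for p in pmf if p > 0)

# A fair coin is maximally unpredictable among two-outcome variables,
# so its entropy exceeds that of a biased coin.
h_fair = entropy([0.5, 0.5])    # log 2
h_biased = entropy([0.9, 0.1])  # smaller: outcomes are more predictable
```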
<p>Mutual Information (MI) measures how much a random variable tells us about another and is closely related to the concept of entropy. MI for two random variables <italic>X</italic> and <italic>Y</italic>, denoted by <italic>I</italic>(<italic>X</italic>; <italic>Y</italic>) can be stated as</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>;</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>|</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>H</italic>(<italic>X</italic>|<italic>Y</italic>) is the conditional entropy of <italic>X</italic> given <italic>Y</italic>. <italic>I</italic>(<italic>X</italic>; <italic>Y</italic>) measures the average reduction in uncertainty about <italic>X</italic> that results from learning the value of <italic>Y</italic> (MacKay, <xref ref-type="bibr" rid="B26">2003</xref>). It is a more general measure than the correlation coefficient, capturing the overall dependence (linear and nonlinear) between two variables (Fraser and Swinney, <xref ref-type="bibr" rid="B9">1986</xref>). The larger the value of MI, the stronger the dependence between the two variables. It is an important statistic when analyzing time series from nonlinear systems (Moon et al., <xref ref-type="bibr" rid="B28">1995</xref>). The MI between two random variables <italic>X</italic> and <italic>Y</italic> with joint probability mass function <italic>p</italic>(<italic>x, y</italic>) and marginal probability density functions (<italic>pdfs</italic>) <italic>p</italic>(<italic>x</italic>) and <italic>p</italic>(<italic>y</italic>) is defined as</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>;</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mi>&#x003B5;</mml:mi><mml:mi>&#x003C7;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mi>&#x003B5;</mml:mi><mml:mi>&#x003A5;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
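<p>Equation (3) can be evaluated directly for a small discrete joint distribution; the sketch below (illustrative only, not the authors' code) recovers <italic>I</italic>(<italic>X</italic>; <italic>Y</italic>) = 0 for independent variables and <italic>I</italic>(<italic>X</italic>; <italic>Y</italic>) = <italic>H</italic>(<italic>X</italic>) when <italic>Y</italic> determines <italic>X</italic> exactly.</p>

```python
import math

def mutual_information(joint):
    """I(X;Y) per Equation (3): joint is a 2-D list of probabilities p(x, y)."""
    px = [sum(row) for row in joint]        # marginal p(x): sum over y
    py = [sum(col) for col in zip(*joint)]  # marginal p(y): sum over x
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi
```

<p>For the independent distribution with all four cells equal to 0.25 the MI is zero, while for the perfectly dependent distribution with 0.5 on each diagonal cell the MI equals log 2, the entropy of a fair coin.</p>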
</sec>
<sec>
<title>4.1.2. Estimate for Mutual Information</title>
<p>Estimates for MI were obtained using a procedure similar to the one used by Khan et al. (<xref ref-type="bibr" rid="B18">2006</xref>). The estimation of MI requires the estimation of joint and marginal <italic>pdfs</italic>, which were approximated using kernel density estimators (KDE).</p>
<p>For any bivariate dataset (<italic>X, Y</italic>) of size <italic>N</italic>, the estimate for MI, <inline-formula><mml:math id="M4"><mml:mover accent="false"><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>;</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, is given as</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="false"><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>;</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>X</mml:mi><mml:mi>Y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>Y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M6"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>X</mml:mi><mml:mi>Y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is the estimated joint <italic>pdf</italic> and <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>Y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> are the estimated marginal <italic>pdfs</italic> at (<italic>x</italic><sub><italic>i</italic></sub>, <italic>y</italic><sub><italic>i</italic></sub>) (Khan et al., <xref ref-type="bibr" rid="B18">2006</xref>).</p>
<p>A Gaussian kernel was used for the multivariate kernel density estimator, which is defined as</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:msup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mo>|</mml:mo><mml:mi>S</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:msup><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:msup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>N</italic> is the number of data points; <italic>x</italic> and <italic>x</italic><sub><italic>i</italic></sub> are <italic>d</italic>-dimensional vectors; <italic>S</italic> is the covariance matrix of the <italic>x</italic><sub><italic>i</italic></sub>; and <italic>h</italic> is the kernel bandwidth. For this study, the kernel bandwidth is chosen as <inline-formula><mml:math id="M10"><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>4</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>4</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>4</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:math></inline-formula>. 
The MI estimates were obtained by first estimating <inline-formula><mml:math id="M11"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="M12"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>Y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="M13"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>X</mml:mi><mml:mi>Y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> using Equation (5) and then using them in Equation (4). The value of MI can vary from 0 to &#x0221E;. In order to compare the linear and nonlinear dependence measures, a scaled estimate for MI, denoted as <inline-formula><mml:math id="M14"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and ranging from 0 to 1 (Joe, <xref ref-type="bibr" rid="B15">1989</xref>; Granger and Lin, <xref ref-type="bibr" rid="B11">1994</xref>), is defined as</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>2</mml:mn><mml:mover accent="false"><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>;</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:msqrt></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
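<p>As a concrete illustration, the estimators in Equations (4)&#x02013;(6) can be sketched in a few lines of Python. This is a minimal sketch, not the authors&#x00027; code: it uses SciPy&#x00027;s <italic>gaussian_kde</italic>, whose default bandwidth rule stands in for the bandwidth formula given above, and the function and variable names are illustrative.</p>

```python
import numpy as np
from scipy.stats import gaussian_kde

def scaled_mutual_information(x, y):
    """Scaled mutual information between two 1-D samples (Equations 4-6).

    Densities are estimated with Gaussian kernels; SciPy's default
    bandwidth rule stands in for the paper's bandwidth choice.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    p_x = gaussian_kde(x)(x)                   # marginal density at each x_i
    p_y = gaussian_kde(y)(y)                   # marginal density at each y_i
    xy = np.vstack([x, y])
    p_xy = gaussian_kde(xy)(xy)                # joint density at (x_i, y_i)
    mi = np.mean(np.log(p_xy / (p_x * p_y)))   # Equation (4)
    mi = max(mi, 0.0)                          # KDE noise can dip below zero
    return np.sqrt(1.0 - np.exp(-2.0 * mi))    # Equation (6), in [0, 1]
```

<p>For strongly dependent samples the scaled estimate approaches 1, while for independent samples it stays near 0, which is what makes it directly comparable with a correlation coefficient.</p>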
<p>The Pearson correlation coefficient (&#x003C1;), defined in Equation (7), was used to measure the linear dependence between two random variables <italic>X</italic> and <italic>Y</italic>.</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x003C1;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0002D;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>&#x00233;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0002D;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msqrt><mml:msqrt><mml:mrow><mml:mstyle 
displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>&#x00233;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>In statistics, it is common to estimate the bias and standard error of an estimate. The bias-corrected estimates for <inline-formula><mml:math id="M17"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> and &#x003C1; were obtained using jackknife resampling. Resampling was performed using 100 samples of size 0.8 N. The bias for <inline-formula><mml:math id="M18"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> was calculated as <inline-formula><mml:math id="M19"><mml:mover accent="false"><mml:mrow><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> = <inline-formula><mml:math id="M20"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi><mml:mo>&#x0002A;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> - <inline-formula><mml:math id="M21"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>, where <inline-formula><mml:math id="M22"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is the original estimate for scaled MI calculated using all <italic>N</italic> observations and <inline-formula><mml:math id="M23"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi><mml:mo>&#x0002A;</mml:mo></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is the mean of all jackknife replications. 
The bias-corrected estimator, <inline-formula><mml:math id="M24"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>&#x0002D;</mml:mo></mml:mover></mml:math></inline-formula>, was defined as <inline-formula><mml:math id="M25"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>&#x0002D;</mml:mo></mml:mover></mml:math></inline-formula> = <inline-formula><mml:math id="M26"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> - <inline-formula><mml:math id="M27"><mml:mover accent="false"><mml:mrow><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>. The lower and upper 90% confidence bounds were defined as the 5% and 95% quantiles of the 100 jackknife samples, respectively (Khan et al., <xref ref-type="bibr" rid="B18">2006</xref>). The same method was used to obtain the bias-corrected estimate and error bounds for &#x003C1;.</p>
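<p>The resampling scheme described above (100 subsamples of size 0.8<italic>N</italic>, a bias estimate from the replicate mean, and quantile-based bounds) can be sketched as follows. This is an illustrative reimplementation, not the study&#x00027;s code, and the function names are hypothetical.</p>

```python
import numpy as np

def bias_corrected(estimator, x, y, n_rep=100, frac=0.8, seed=0):
    """Bias-corrected estimate and 90% bounds via the delete-d jackknife
    scheme described in the text (100 resamples of size 0.8*N)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    full = estimator(x, y)                     # estimate on all N points
    reps = []
    for _ in range(n_rep):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        reps.append(estimator(x[idx], y[idx]))
    reps = np.asarray(reps)
    bias = reps.mean() - full                  # bias-hat = mean(replicates) - full
    corrected = full - bias                    # bias-corrected estimator
    lo, hi = np.quantile(reps, [0.05, 0.95])   # 90% confidence bounds
    return corrected, (lo, hi)
```

<p>The same helper works for either dependence measure; passing a Pearson-correlation estimator reproduces the procedure used for &#x003C1;.</p>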
</sec>
</sec>
<sec>
<title>4.2. Linear Regression</title>
<p>Prior to fitting a Multiple Linear Regression (MLR) model with <italic>P</italic> predictors, the Pearson correlation coefficient (&#x003C1;) was calculated between each pair of predictor variables to measure pairwise linear dependence, as shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. Many pairs of mean and extreme weather indices were found to have a high absolute value of &#x003C1; with one another, implying a strong positive or negative linear relationship between them. Notable among these are <italic>GST</italic><sub><italic>max</italic></sub>, <italic>GST</italic><sub><italic>min</italic></sub>, Summer Days, and the Heat Wave index, which are strongly positively correlated with one another. On the other hand, indices like Frost Days and <italic>GST</italic><sub><italic>min</italic></sub> have a strong negative linear relationship. Multicollinearity is quite common in weather data and needs to be addressed prior to fitting a linear regression model, as it inflates the standard errors of the regression coefficients, making them highly sensitive to minor changes in the model.</p>
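<p>A screening step like the one behind Figure 4 can be sketched as follows; this is a minimal illustration, not the study&#x00027;s code, and the index names and the 0.8 threshold are hypothetical.</p>

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """Flag predictor pairs whose |rho| exceeds a threshold.

    X is an (n_samples, n_predictors) array; names labels its columns.
    """
    corr = np.corrcoef(X, rowvar=False)        # pairwise Pearson matrix
    pairs = []
    p = corr.shape[0]
    for i in range(p):
        for j in range(i + 1, p):              # upper triangle only
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(corr[i, j], 2)))
    return pairs
```

<p>Pairs flagged this way are exactly the ones that make ordinary least squares unstable, which motivates the two remedies explored next: principal component regression and ridge regression.</p>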
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Pearson correlation coefficient was used to measure pairwise linear dependence between predictors. Many pairs of mean and extreme weather indices were found to have high absolute values of correlation, indicating the presence of a strong positive/negative linear relationship. The problem of multicollinearity is quite common in weather data and needs to be addressed before fitting any statistical model.</p></caption>
<graphic xlink:href="fsufs-04-00052-g0004.tif"/>
</fig>
<sec>
<title>4.2.1. Principal Component Regression</title>
<p>In order to address the issue of multicollinearity, dimensionality reduction was performed using Principal Component Analysis (PCA). PCA performs feature extraction by projecting the data onto axes of maximum variance (principal components), which are mutually uncorrelated (Jolliffe, <xref ref-type="bibr" rid="B16">1986</xref>). Principal Component Regression (PCR) uses these principal components (PCs) as inputs instead of the original correlated features. The appropriate number of PCs to use as inputs for the MLR model was determined from a cumulative plot of the proportion of variance explained, as shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. Setting a threshold of 95% for the cumulative explained variance, the number of components chosen for the regression was 8. After randomly shuffling the data, about 80% (60 out of 76 samples) was used for fitting the linear model, with the rest used for testing. The resulting PCR model is shown in Equation (8)</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M28"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mi>P</mml:mi><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003B2;<sub><italic>j</italic></sub> are the regression coefficients.</p>
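<p>The PCR pipeline described above (standardize, retain the components explaining 95% of the variance, then fit a linear model on an 80/20 split) can be sketched with scikit-learn, which the study also used for its other models. The synthetic data here are an illustrative stand-in for the 76 samples of weather indices.</p>

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(76, 12))             # 76 samples, 12 weather indices
y = X @ rng.normal(size=12) + 0.1 * rng.normal(size=76)

# PCA(n_components=0.95) keeps the smallest number of components
# explaining at least 95% of the variance, mirroring Figure 5.
pcr = make_pipeline(StandardScaler(),
                    PCA(n_components=0.95),
                    LinearRegression())
pcr.fit(X[:60], y[:60])                   # ~80% train split (60 of 76)
score = pcr.score(X[60:], y[60:])         # R^2 on the held-out 20%
```

<p>Because the regression is fit on the retained components rather than the raw indices, the coefficient estimates are no longer inflated by the collinearity seen in Figure 4.</p>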
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>An important step in Principal Component Regression is deciding how many principal components are needed to describe the data. This can be determined from a plot of cumulative explained variance as a function of the number of principal components. Setting a threshold of 95%, the number of principal components selected for the study was 8.</p></caption>
<graphic xlink:href="fsufs-04-00052-g0005.tif"/>
</fig>
</sec>
<sec>
<title>4.2.2. Ridge Regression</title>
<p>Ridge regression is a technique for fitting a multiple regression model to data that are highly correlated (Hoerl and Kennard, <xref ref-type="bibr" rid="B12">1970</xref>). By adding a small amount of bias to the model coefficients, ridge regression reduces their variance, thus giving more reliable estimates. Equation (9) represents a multiple linear regression model between corn yield and the 12 predictors, with &#x003B2;<sub><italic>j</italic></sub> representing the coefficients. In addition to minimizing the deviation from <italic>y</italic><sub><italic>i</italic></sub>, the objective function for ridge regression, shown in Equation (10), also includes a penalty term that shrinks the coefficient values toward the &#x0201C;true&#x0201D; population parameters. This penalty term, also referred to as <italic>L</italic>2 regularization, equals the square of the magnitude of the coefficients. The tuning parameter (&#x003BB;) controls the strength of regularization. When &#x003BB; = 0, ridge regression reduces to multiple linear regression, and as &#x003BB; &#x02192; &#x0221E;, all of the coefficients shrink to 0.</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M29"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E10"><label>(10)</label><mml:math id="M30"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:msubsup><mml:mrow><mml:msup><mml:mstyle mathsize="140%" displaystyle="true"><mml:mo>&#x02211;</mml:mo></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext></mml:msup></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mrow><mml:msup><mml:mstyle mathsize="140%" displaystyle="true"><mml:mo>&#x02211;</mml:mo></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext></mml:msup></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msubsup><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:msubsup><mml:mrow><mml:msup><mml:mstyle mathsize="140%" displaystyle="true"><mml:mo>&#x02211;</mml:mo></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext></mml:msup></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msubsup><mml:msubsup><mml:mi>&#x003B2;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>One of the drawbacks of ridge regression is the need to estimate the value of &#x003BB;. Multiple values of &#x003BB; (ranging from 0.1 to 10) were considered, and the optimal value of &#x003BB; = 5 was chosen using 5-fold cross validation. Ridge regression was implemented in Python using the scikit-learn package (Pedregosa et al., <xref ref-type="bibr" rid="B34">2011</xref>).</p>
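<p>The &#x003BB; search described above can be sketched with scikit-learn&#x00027;s <italic>RidgeCV</italic> (where &#x003BB; is called <italic>alpha</italic>); the deliberately collinear synthetic data and the candidate grid are illustrative assumptions, not the study&#x00027;s setup.</p>

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
base = rng.normal(size=(76, 4))
# Build deliberately collinear predictors, like the weather indices.
X = np.hstack([base,
               base + 0.05 * rng.normal(size=(76, 4)),
               base + 0.05 * rng.normal(size=(76, 4))])
y = X.sum(axis=1) + rng.normal(size=76)

# RidgeCV evaluates each candidate penalty by cross validation
# (cv=5 mirrors the study's 5-fold scheme) and refits with the best one.
model = RidgeCV(alphas=np.linspace(0.1, 10, 50), cv=5).fit(X, y)
best_lambda = model.alpha_
```

<p>Unlike PCR, which drops low-variance directions outright, ridge keeps all 12 predictors and only shrinks their coefficients, so the two remedies trade off interpretability differently.</p>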
</sec>
</sec>
<sec>
<title>4.3. Nonlinear Regression</title>
<sec>
<title>4.3.1. Support Vector Regression</title>
<p>Support Vector Machine (SVM), introduced by Vladimir Vapnik and his colleagues in 1992, is a popular machine learning tool for classification (Vapnik, <xref ref-type="bibr" rid="B47">2013</xref>). Support Vector Regression (SVR), which builds on the same principles as SVM, aims at finding the best possible continuous-valued function that balances model complexity and prediction error (Awad and Khanna, <xref ref-type="bibr" rid="B2">2015</xref>). In other words, the goal of Vapnik&#x00027;s &#x003F5;-insensitive approach (Vapnik, <xref ref-type="bibr" rid="B48">1995</xref>) is to find a function <italic>f</italic>(<italic>x</italic>) that deviates from the individual points <italic>y</italic><sub><italic>i</italic></sub> by at most &#x003F5; while not overfitting the data. Deviations smaller than &#x003F5; do not contribute to the regression fit, while data points with an absolute deviation greater than this threshold, called support vectors, contribute an amount that scales linearly with the deviation (Smola and Sch&#x000F6;lkopf, <xref ref-type="bibr" rid="B41">2004</xref>; Kuhn and Johnson, <xref ref-type="bibr" rid="B19">2013</xref>).</p>
<p>The general form of the regression equation for SVR is shown in Equation (11), where &#x0003C;., .&#x0003E; denotes the dot product and &#x003B2; is a vector of coefficients. The objective function for this model is shown in Equation (12). Model complexity can be controlled by seeking a small &#x003B2;. This can be ensured by minimizing the norm ||&#x003B2;||<sup>2</sup> = &#x0003C; &#x003B2;, &#x003B2;&#x0003E;.</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M31"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>,</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>b</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E12"><label>(12)</label><mml:math id="M32"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable><mml:mtr><mml:mtd columnalign="left"><mml:mtext>Minimize&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mo>&#x02016;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mtext>Subject&#x000A0;to&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>.</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>b</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02264;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003F5;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>.</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>b</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02264;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003F5;</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The constraints in Equation (12) may be too strict in some situations, making the optimization problem infeasible. Hence, it is common practice to introduce slack variables &#x003BE;<sub><italic>i</italic></sub> and <inline-formula><mml:math id="M33"><mml:msubsup><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> into the constraints. The objective function then takes the form shown in Equation (13). The constant <italic>C</italic> is a positive value that determines the trade-off between model complexity and the extent to which deviations larger than &#x003F5; are tolerated.</p>
<disp-formula id="E13"><label>(13)</label><mml:math id="M34"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd columnalign="left"><mml:mtable><mml:mtr><mml:mtd columnalign="left"><mml:mtext>Minimize&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mo stretchy="false">&#x02016;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mi>C</mml:mi><mml:mo>.</mml:mo><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd 
columnalign="left"><mml:mtext>Subject&#x000A0;to&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>.</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>b</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02264;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003F5;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo>.</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>b</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02264;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003F5;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd 
columnalign="left"><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x02265;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>SVR was implemented in Python using the scikit-learn package (Pedregosa et al., <xref ref-type="bibr" rid="B34">2011</xref>). The values of the hyperparameters (type of kernel function, cost parameter <italic>C</italic>, and error tolerance &#x003F5;) were determined using a grid search over a range of possible values for each parameter with 5-fold cross validation on the shuffled dataset. <italic>C</italic> = 0.1, &#x003F5; = 0.15, and a linear kernel were chosen as the hyperparameters for the study. With a linear kernel, the dot product is simply taken in the original space instead of transforming the data into a higher dimension. This way, the predictors enter in the form of a quadratic polynomial of weather indices, something that has been considered by past studies.</p>
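<p>The grid search over kernel, <italic>C</italic>, and &#x003F5; can be sketched with scikit-learn&#x00027;s <italic>GridSearchCV</italic>; the candidate grids and the synthetic data below are illustrative stand-ins, not the study&#x00027;s exact search space.</p>

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(76, 12))              # stand-in for the weather indices
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=76)

# Each grid point is scored by 5-fold cross validation, as in the study.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1.0, 10.0],
    "epsilon": [0.05, 0.15, 0.5],
}
search = GridSearchCV(SVR(), param_grid, cv=5).fit(X, y)
best = search.best_params_                 # best kernel / C / epsilon combo
```

<p>With a small dataset like this one, keeping the grid coarse matters: each added candidate multiplies the number of cross-validated fits.</p>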
</sec>
<sec>
<title>4.3.2. Random Forest Regression</title>
<p>Random Forest (Breiman, <xref ref-type="bibr" rid="B4">2001</xref>), which builds on Classification and Regression Trees (CART) (Breiman et al., <xref ref-type="bibr" rid="B5">1984</xref>), is one of the most commonly used machine learning models for classification and regression. Using just one decision tree often creates a model that is unstable, meaning a small change in the data can lead to a significant change in the tree structure. Random Forest, on the contrary, is an ensemble model that makes predictions by combining the predictions of multiple decision trees using a technique called Bootstrap aggregation, or Bagging (Breiman, <xref ref-type="bibr" rid="B3">1996</xref>). Bootstrapping involves randomly sampling the data with replacement and helps control model variance (overfitting). Training a Random Forest involves training each decision tree on a randomly sampled subset of the features and data. The final prediction is produced by averaging the outputs of the individual trees. Random Forest is good at handling tabular data with numerical features and at capturing nonlinear interactions between the response variable and the predictors.</p>
<p>Random Forest Regression was implemented in Python using the scikit-learn package (Pedregosa et al., <xref ref-type="bibr" rid="B34">2011</xref>). The values of hyperparameters such as the number of trees, the maximum tree depth, the maximum number of features considered at each split, and the minimum number of samples required at each split were determined using grid search with cross validation. The model was trained on 80% of the data and tested on the remaining 20%.</p>
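<p>The Random Forest setup described above (80/20 split, grid-searched hyperparameters) can be sketched as follows; the grids, the nonlinear synthetic target, and the random seeds are illustrative assumptions, not the study&#x00027;s values.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(76, 12))
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=76)  # nonlinear signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
param_grid = {
    "n_estimators": [100, 300],          # number of trees
    "max_depth": [None, 5],              # maximum tree depth
    "max_features": ["sqrt", 1.0],       # features considered at each split
    "min_samples_split": [2, 5],         # minimum samples to split a node
}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5).fit(X_tr, y_tr)
test_r2 = search.score(X_te, y_te)       # R^2 on the held-out 20%
```

<p>The quadratic term in the synthetic target is the kind of nonlinearity a linear model misses but an ensemble of trees can pick up, which is the motivation for including Random Forest alongside the linear baselines.</p>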
</sec>
</sec>
</sec>
<sec id="s5">
<title>5. Results and Discussion</title>
<p><xref ref-type="fig" rid="F6">Figure 6</xref> shows the bias-corrected estimates of <inline-formula><mml:math id="M35"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> and &#x003C1; between corn yield and each of the 11 mean and extreme weather indices. The shaded areas in blue and red represent the 90% confidence bounds (5% and 95% quantiles) for the bias-corrected estimates generated using jackknife resampling. For some indices, like GSP, the Cold Wave index, and Longest Wet Spell, the gap between <inline-formula><mml:math id="M36"><mml:mover accent="false"><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> and &#x003C1; is narrow, which shows that the variation of yield with respect to these indices is mostly linear in nature. Mean weather indices like GDD, GST<sub><italic>max</italic></sub>, and GST<sub><italic>min</italic></sub> and extreme weather indices like Summer Days, Longest Dry Spell, and <italic>prcp</italic>95<italic>p</italic> have a strong nonlinear relationship with yield even though the absolute value of their linear dependence is weak. It is interesting to note that certain extreme weather indices, like Summer Days, the Heat Wave index, and Longest Wet Spell, contain more information than the mean weather indices, thus making the case for their inclusion as predictors in regression models.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Unbiased estimates for Pearson correlation coefficient (which measures the linear dependence) and scaled Mutual Information (which measures the overall dependence) between corn yield and weather indices. The shaded regions represent the 90% confidence bounds (5% and 95% quantiles) for the unbiased estimates calculated using jackknife resampling. The information contained in certain extreme weather indices, like Summer Days, Heat Wave Index, Longest Dry Spell and Longest Wet Spell is greater than or equal to that contained in the mean weather indices, thus making the case for their inclusion in regression models for predicting crop yield.</p></caption>
<graphic xlink:href="fsufs-04-00052-g0006.tif"/>
</fig>
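The contrast between the linear and overall dependence measures above can be made concrete with a toy sketch on synthetic data. It assumes the Joe (1989) scaling lambda = sqrt(1 - exp(-2I)) of Mutual Information I, under which lambda equals the absolute Pearson correlation for a bivariate Gaussian; the estimator and data here are illustrative, not the study's computation.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(42)
x = rng.normal(size=2000)
# Purely nonlinear (quadratic) relationship: rho is near zero by symmetry
y = x ** 2 + rng.normal(scale=0.3, size=2000)

rho = pearsonr(x, y)[0]                       # linear dependence only
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
lam = np.sqrt(1.0 - np.exp(-2.0 * mi))        # scaled MI in [0, 1]

# rho stays near 0 while lam is large: MI exposes the nonlinear link
# that the Pearson correlation coefficient misses entirely
```

The same gap between the two measures is what the figure shows for indices such as GDD and Summer Days, whose relationship with yield is strong but largely nonlinear.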
<p>The results obtained here indicate a high degree of susceptibility of crop yield to extreme weather, consistent with key insights from past research (Lobell et al., <xref ref-type="bibr" rid="B25">2011b</xref>, <xref ref-type="bibr" rid="B24">2013</xref>). Many previous studies did not include extreme weather indices in their regression models, most commonly because daily weather data were unavailable (Lobell et al., <xref ref-type="bibr" rid="B25">2011b</xref>). In addition, some of these studies assessed the impact of climate change on crop yield using temperature and precipitation derived from Global Circulation Models (GCMs). The outputs from the current generation of GCMs, however, are generally not considered credible at the spatiotemporal resolutions required to directly capture the effect of weather extremes on crop yield. Including extreme weather indices is crucial because they capture the variability of weather within the growing season, which mean weather indices do not. For example, the same average growing season temperature may arise from two very different seasons, one with little temperature variation and the other with wide fluctuations in temperature. A growing season with widely varying temperatures can result in increased exposure to extreme conditions, which may critically impact yields. The insights from this work also agree with those from a separate study on the negative impact of temperature on crop yield (Zhao et al., <xref ref-type="bibr" rid="B49">2017</xref>), which estimated that each 1&#x000B0;C increase in global mean temperature would reduce global maize yield by about 7.4% (without considering adaptation strategies or the effects of CO<sub>2</sub> fertilization).</p>
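The point that identical seasonal means can hide very different extreme exposure can be illustrated numerically. The sketch below is a toy example; the index definitions (growing degree days with a 10&#x000B0;C base, Summer Days above 30&#x000B0;C) are common conventions assumed here for illustration, and the two synthetic seasons are constructed, not observed data.

```python
import numpy as np

def gdd(temp, base=10.0):
    """Growing degree days: sum of daily temperature above a base."""
    return float(np.sum(np.maximum(temp - base, 0.0)))

def summer_days(temp, threshold=30.0):
    """Count of days with temperature above a threshold."""
    return int(np.sum(temp > threshold))

days = 120
# Season 1: constant 25 C every day
stable = np.full(days, 25.0)
# Season 2: same 25 C mean, but swinging +/- 10 C (6 full cycles)
volatile = 25.0 + 10.0 * np.sin(
    np.linspace(0, 12 * np.pi, days, endpoint=False))

# Identical means and identical GDD...
g1, g2 = gdd(stable), gdd(volatile)
# ...but the volatile season repeatedly crosses the 30 C threshold
sd1, sd2 = summer_days(stable), summer_days(volatile)
```

Both seasons accumulate the same growing degree days, yet only the volatile one registers any Summer Days, which is exactly the information a mean index cannot carry.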
<p><xref ref-type="table" rid="T2">Table 2</xref> compares the performance of linear and nonlinear regression models using metrics like <italic>R</italic><sup>2</sup> and RMSE. Among the linear models, PCR and Ridge regression had <italic>R</italic><sup>2</sup> values of 0.89 and 0.88, and RMSE values of 0.32 and 0.33, respectively. Nonlinear regression methods like SVR and Random Forest performed slightly better: <italic>R</italic><sup>2</sup> values were 0.90 and 0.93 for SVR and Random Forest, respectively, with corresponding RMSE values of 0.32 and 0.25. Overall, Random Forest regression had the best <italic>R</italic><sup>2</sup> and RMSE, which could be attributed to its robustness to multicollinearity in the data and its ability to capture nonlinear interactions. The existence of nonlinear relationships between crop yield and weather indices is not new and has been confirmed by multiple studies in the past (Schlenker and Roberts, <xref ref-type="bibr" rid="B39">2009</xref>; Lobell et al., <xref ref-type="bibr" rid="B21">2011a</xref>).</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Comparison of linear and nonlinear regression approaches to model crop yield using weather indices.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="left"><bold>Regression model</bold></th>
<th valign="top" align="center"><bold><italic>R</italic><sup>2</sup></bold></th>
<th valign="top" align="center"><bold>RMSE</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Linear</td>
<td valign="top" align="left">Principal component</td>
<td valign="top" align="center">0.89</td>
<td valign="top" align="center">0.32</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">Ridge</td>
<td valign="top" align="center">0.88</td>
<td valign="top" align="center">0.33</td>
</tr>
<tr>
<td valign="top" align="left">Nonlinear</td>
<td valign="top" align="left">Support vector</td>
<td valign="top" align="center">0.90</td>
<td valign="top" align="center">0.32</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Random forest</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.25</td>
</tr>
</tbody>
</table>
</table-wrap>
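The kind of comparison summarized in Table 2 can be sketched as follows. This is a minimal illustration on synthetic data; the model settings (ridge penalty, number of principal components, SVR cost, forest size) are assumptions for the sketch, not the tuned values used in the study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 11))  # stand-in for the 11 weather indices
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "PCR": make_pipeline(StandardScaler(), PCA(n_components=8),
                         LinearRegression()),
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# R^2 and RMSE on the held-out test set, as in Table 2
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    scores[name] = (model.score(X_te, y_te),
                    mean_squared_error(y_te, pred) ** 0.5)
```

Because the synthetic response is partly nonlinear in its inputs, the nonlinear models tend to score higher here, mirroring the pattern reported in the table.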
<p>Results from this study could help researchers interested in understanding the impact of environmental factors on crop production. Mechanistic crop simulation models have traditionally been used to model crop growth and yield and to understand patterns of crop yield response to climate change. However, gaps exist in our understanding of crop growth and development processes; one example is the effect of extreme temperatures on crop growth. Asseng et al. (<xref ref-type="bibr" rid="B1">2013</xref>) simulated climate change impacts on future global wheat yields and concluded that a greater proportion of the uncertainty was due to variations among mechanistic crop models than to variations among downscaled climate models. Insights from this study could contribute toward a better understanding of the relevant predictors in crop yield modeling and improve our existing knowledge of the precise nature of the crop-weather relationship.</p>
<p>Future studies should expand the scope of this work in terms of both the number of crops considered and the spatial extent of the analysis. When performing this analysis for a broader region, care should be taken to account for effects such as the spatial autocorrelation of environmental variables. Irrigation has been shown to negate some of the effects of extreme heat stress on crop growth (Siebert et al., <xref ref-type="bibr" rid="B40">2017</xref>) and should therefore also be considered. This study has several limitations. First, the way in which some of the weather indices are computed can have a sizeable impact on the results. A separate analysis was performed to test the sensitivity of some of the extreme weather indices to the specific threshold values, as shown in <xref ref-type="supplementary-material" rid="SM1">Figures S1, S2</xref>. With two indices as test cases (Summer Days and <italic>p</italic>th percentile precipitation), it was found that the choice of threshold can have a substantial impact on the value of the index, a problem that has also been acknowledged in previous studies. According to Tack et al. (<xref ref-type="bibr" rid="B42">2015</xref>), when calculating growing degree days, including information on the distribution of temperature within each day provides a statistically significant improvement in capturing yield variability. For this study, data on intraday variability in temperature were not available and therefore not used. Second, different crop growth stages have different sensitivities to an extreme event; although this study included extreme weather indices, it did not consider the specific crop growth stage affected by each event. Third, this study included only temperature- and precipitation-based indices, whereas other environmental factors, such as relative humidity, ozone and CO<sub>2</sub> concentration, have also been shown to affect yield.</p>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusions</title>
<p>Changes in mean and extreme weather pose a major risk to governments and businesses across the globe. With corn as a test case, the aim of this study was to develop a systematic approach to understanding the nature of the crop yield-weather relationship and to determine whether extreme weather indices are relevant for yield modeling. Using Mutual Information as a metric for pairwise dependence, it can be concluded that the yield-weather relationship is indeed nonlinear. The information contained in certain extreme weather indices, such as Summer Days, Heat Wave Index, Longest Dry Spell and Longest Wet Spell, was found to be greater than or equal to that contained in the mean weather indices, making a case for their inclusion as predictors in crop yield modeling. The results also suggest that Mutual Information can be a better metric for covariate selection than the Pearson correlation coefficient, as it measures the overall relationship (linear and nonlinear) between the predictor and response variables. Using a combination of mean and extreme weather indices as inputs, the nonlinear regression models were found to fit slightly better than the linear models, with Random Forest regression giving the best fit and the least error on the test set. Future studies should expand the scope of this analysis, both in spatial scale and in the number of crops considered. The implications of this work are important for researchers, businesses and government agencies, and especially for platforms like NASA Earth Exchange, which facilitate the generation and dissemination of impacts-relevant weather data and indices using a multitude of satellite-derived datasets and model outputs.</p>
</sec>
<sec sec-type="data-availability-statement" id="s7">
<title>Data Availability Statement</title>
<p>The data that support the findings of this study are openly available at <ext-link ext-link-type="uri" xlink:href="https://www.ncdc.noaa.gov/cdo-web/">https://www.ncdc.noaa.gov/cdo-web/</ext-link> and <ext-link ext-link-type="uri" xlink:href="https://quickstats.nass.usda.gov/">https://quickstats.nass.usda.gov/</ext-link>.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>VK, TV, SG, and AG contributed to the conceptualization of this study. VK, TV, and AG contributed to the methodology. VK led the preparation of the manuscript with guidance from AG. VK, TV, and AG edited the manuscript.</p>
</sec>
<sec id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack><p>The authors would like to thank the data providers, including NOAA and USDA NASS, and the reviewers for their valuable feedback and suggestions.</p>
</ack>
<sec sec-type="supplementary-material" id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fsufs.2020.00052/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fsufs.2020.00052/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asseng</surname> <given-names>S.</given-names></name> <name><surname>Ewert</surname> <given-names>F.</given-names></name> <name><surname>Rosenzweig</surname> <given-names>C.</given-names></name> <name><surname>Jones</surname> <given-names>J. W.</given-names></name> <name><surname>Hatfield</surname> <given-names>J. L.</given-names></name> <name><surname>Ruane</surname> <given-names>A. C.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Uncertainty in simulating wheat yields under climate change</article-title>. <source>Nat. Clim. Change</source> <volume>3</volume>, <fpage>827</fpage>&#x02013;<lpage>832</lpage>. <pub-id pub-id-type="doi">10.1038/nclimate1916</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Awad</surname> <given-names>M.</given-names></name> <name><surname>Khanna</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Support vector regression</article-title>, in <source>Efficient Learning Machines</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>67</fpage>&#x02013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4302-5990-9</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>1996</year>). <article-title>Bagging predictors</article-title>. <source>Mach. Learn</source>. <volume>24</volume>, <fpage>123</fpage>&#x02013;<lpage>140</lpage>. <pub-id pub-id-type="doi">10.1007/BF00058655</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn</source>. <volume>45</volume>, <fpage>5</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1017934522171</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name> <name><surname>Friedman</surname> <given-names>J.</given-names></name> <name><surname>Olshen</surname> <given-names>R.</given-names></name> <name><surname>Stone</surname> <given-names>C.</given-names></name></person-group> (<year>1984</year>). <source>Classification and Regression Trees</source>. <publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Taylor &#x00026; Francis</publisher-name>.</citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brisson</surname> <given-names>N.</given-names></name> <name><surname>Gate</surname> <given-names>P.</given-names></name> <name><surname>Gouache</surname> <given-names>D.</given-names></name> <name><surname>Charmet</surname> <given-names>G.</given-names></name> <name><surname>Oury</surname> <given-names>F.-X.</given-names></name> <name><surname>Huard</surname> <given-names>F.</given-names></name></person-group> (<year>2010</year>). <article-title>Why are wheat yields stagnating in Europe? A comprehensive data analysis for France</article-title>. <source>Field Crops Res</source>. <volume>119</volume>, <fpage>201</fpage>&#x02013;<lpage>212</lpage>. <pub-id pub-id-type="doi">10.1016/j.fcr.2010.07.012</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bruinsma</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <article-title>The resource outlook to 2050: by how much do land, water and crop yields need to increase by 2050</article-title>, in <source>Expert Meeting on How to Feed the World</source> <italic>in</italic>, Vol. <volume>2050</volume> (<publisher-loc>Rome</publisher-loc>), <fpage>24</fpage>&#x02013;<lpage>26</lpage>.</citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deryng</surname> <given-names>D.</given-names></name> <name><surname>Conway</surname> <given-names>D.</given-names></name> <name><surname>Ramankutty</surname> <given-names>N.</given-names></name> <name><surname>Price</surname> <given-names>J.</given-names></name> <name><surname>Warren</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Global crop yield response to extreme heat stress under multiple climate change futures</article-title>. <source>Environ. Res. Lett</source>. <volume>9</volume>:<fpage>034011</fpage>. <pub-id pub-id-type="doi">10.1088/1748-9326/9/3/034011</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fraser</surname> <given-names>A. M.</given-names></name> <name><surname>Swinney</surname> <given-names>H. L.</given-names></name></person-group> (<year>1986</year>). <article-title>Independent coordinates for strange attractors from mutual information</article-title>. <source>Phys. Rev. A</source> <volume>33</volume>:<fpage>1134</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevA.33.1134</pub-id><pub-id pub-id-type="pmid">9896728</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gourdji</surname> <given-names>S. M.</given-names></name> <name><surname>Sibley</surname> <given-names>A. M.</given-names></name> <name><surname>Lobell</surname> <given-names>D. B.</given-names></name></person-group> (<year>2013</year>). <article-title>Global crop exposure to critical high temperatures in the reproductive period: historical trends and future projections</article-title>. <source>Environ. Res. Lett</source>. <volume>8</volume>:<fpage>024041</fpage>. <pub-id pub-id-type="doi">10.1088/1748-9326/8/2/024041</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Granger</surname> <given-names>C.</given-names></name> <name><surname>Lin</surname> <given-names>J.-L.</given-names></name></person-group> (<year>1994</year>). <article-title>Using the mutual information coefficient to identify lags in nonlinear models</article-title>. <source>J. Time Series Anal</source>. <volume>15</volume>, <fpage>371</fpage>&#x02013;<lpage>384</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9892.1994.tb00200.x</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hoerl</surname> <given-names>A. E.</given-names></name> <name><surname>Kennard</surname> <given-names>R. W.</given-names></name></person-group> (<year>1970</year>). <article-title>Ridge regression: biased estimation for nonorthogonal problems</article-title>. <source>Technometrics</source> <volume>12</volume>, <fpage>55</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1080/00401706.1970.10488634</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><collab>IPCC</collab></person-group> (<year>2013a</year>). <source>Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change</source>. <publisher-loc>Cambridge; New York, NY</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>, <fpage>485</fpage>&#x02013;<lpage>533</lpage>.</citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><collab>IPCC</collab></person-group> (<year>2013b</year>). <source>Summary for Policymakers, Book Section SPM</source>. <publisher-loc>Cambridge; New York, NY</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>, <fpage>1</fpage>&#x02013;<lpage>30</lpage>.</citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joe</surname> <given-names>H.</given-names></name></person-group> (<year>1989</year>). <article-title>Relative entropy measures of multivariate dependence</article-title>. <source>J. Am. Stat. Assoc</source>. <volume>84</volume>, <fpage>157</fpage>&#x02013;<lpage>164</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.1989.10478751</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jolliffe</surname> <given-names>I. T.</given-names></name></person-group> (<year>1986</year>). <article-title>Principal components in regression analysis</article-title>, in <source>Principal Component Analysis</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>129</fpage>&#x02013;<lpage>155</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4757-1904-8</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="editor"><name><surname>Karl</surname> <given-names>T. R.</given-names></name> <name><surname>Nicholls</surname> <given-names>N.</given-names></name> <name><surname>Ghazi</surname> <given-names>A.</given-names></name></person-group> (eds.). (<year>1999</year>). <article-title>Clivar/GCOS/WMO workshop on indices and indicators for climate extremes workshop summary</article-title>, in <source>Weather and Climate Extremes</source> (<publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>3</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1007/978-94-015-9265-9</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khan</surname> <given-names>S.</given-names></name> <name><surname>Ganguly</surname> <given-names>A. R.</given-names></name> <name><surname>Bandyopadhyay</surname> <given-names>S.</given-names></name> <name><surname>Saigal</surname> <given-names>S.</given-names></name> <name><surname>Erickson</surname> <given-names>D. J.</given-names></name> <name><surname>Protopopescu</surname> <given-names>V.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Nonlinear statistics reveals stronger ties between ENSO and the tropical hydrological cycle</article-title>. <source>Geophys. Res. Lett</source>. <volume>33</volume>:<fpage>L24402</fpage>. <pub-id pub-id-type="doi">10.1029/2006GL027941</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kuhn</surname> <given-names>M.</given-names></name> <name><surname>Johnson</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <source>Applied Predictive Modeling</source>, Vol. <volume>26</volume>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-1-4614-6849-3</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lesk</surname> <given-names>C.</given-names></name> <name><surname>Rowhani</surname> <given-names>P.</given-names></name> <name><surname>Ramankutty</surname> <given-names>N.</given-names></name></person-group> (<year>2016</year>). <article-title>Influence of extreme weather disasters on global crop production</article-title>. <source>Nature</source> <volume>529</volume>:<fpage>84</fpage>. <pub-id pub-id-type="doi">10.1038/nature16467</pub-id><pub-id pub-id-type="pmid">26738594</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lobell</surname> <given-names>D. B.</given-names></name> <name><surname>B&#x000E4;nziger</surname> <given-names>M.</given-names></name> <name><surname>Magorokosho</surname> <given-names>C.</given-names></name> <name><surname>Vivek</surname> <given-names>B.</given-names></name></person-group> (<year>2011a</year>). <article-title>Nonlinear heat effects on African maize as evidenced by historical yield trials</article-title>. <source>Nat. Clim. Change</source> <volume>1</volume>, <fpage>42</fpage>&#x02013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1038/nclimate1043</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lobell</surname> <given-names>D. B.</given-names></name> <name><surname>Burke</surname> <given-names>M. B.</given-names></name></person-group> (<year>2010</year>). <article-title>On the use of statistical models to predict crop yield responses to climate change</article-title>. <source>Agric. For. Meteorol</source>. <volume>150</volume>, <fpage>1443</fpage>&#x02013;<lpage>1452</lpage>. <pub-id pub-id-type="doi">10.1016/j.agrformet.2010.07.008</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lobell</surname> <given-names>D. B.</given-names></name> <name><surname>Field</surname> <given-names>C. B.</given-names></name></person-group> (<year>2011</year>). <article-title>California perennial crops in a changing climate</article-title>. <source>Clim. Change</source> <volume>109</volume>, <fpage>317</fpage>&#x02013;<lpage>333</lpage>. <pub-id pub-id-type="doi">10.1007/s10584-011-0303-6</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lobell</surname> <given-names>D. B.</given-names></name> <name><surname>Hammer</surname> <given-names>G. L.</given-names></name> <name><surname>McLean</surname> <given-names>G.</given-names></name> <name><surname>Messina</surname> <given-names>C.</given-names></name> <name><surname>Roberts</surname> <given-names>M. J.</given-names></name> <name><surname>Schlenker</surname> <given-names>W.</given-names></name></person-group> (<year>2013</year>). <article-title>The critical role of extreme heat for maize production in the United States</article-title>. <source>Nat. Clim. Change</source> <volume>3</volume>:<fpage>497</fpage>. <pub-id pub-id-type="doi">10.1038/nclimate1832</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lobell</surname> <given-names>D. B.</given-names></name> <name><surname>Schlenker</surname> <given-names>W.</given-names></name> <name><surname>Costa-Roberts</surname> <given-names>J.</given-names></name></person-group> (<year>2011b</year>). <article-title>Climate trends and global crop production since 1980</article-title>. <source>Science</source> <volume>333</volume>, <fpage>616</fpage>&#x02013;<lpage>620</lpage>. <pub-id pub-id-type="doi">10.1126/science.1204531</pub-id><pub-id pub-id-type="pmid">21551030</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>MacKay</surname> <given-names>D. J.</given-names></name></person-group> (<year>2003</year>). <source>Information Theory, Inference and Learning Algorithms</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Menne</surname> <given-names>M. J.</given-names></name> <name><surname>Durre</surname> <given-names>I.</given-names></name> <name><surname>Vose</surname> <given-names>R. S.</given-names></name> <name><surname>Gleason</surname> <given-names>B. E.</given-names></name> <name><surname>Houston</surname> <given-names>T. G.</given-names></name></person-group> (<year>2012</year>). <article-title>An overview of the global historical climatology network-daily database</article-title>. <source>J. Atmos. Ocean. Technol</source>. <volume>29</volume>, <fpage>897</fpage>&#x02013;<lpage>910</lpage>. <pub-id pub-id-type="doi">10.1175/JTECH-D-11-00103.1</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moon</surname> <given-names>Y.-I.</given-names></name> <name><surname>Rajagopalan</surname> <given-names>B.</given-names></name> <name><surname>Lall</surname> <given-names>U.</given-names></name></person-group> (<year>1995</year>). <article-title>Estimation of mutual information using kernel density estimators</article-title>. <source>Phys. Rev. E</source> <volume>52</volume>:<fpage>2318</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevE.52.2318</pub-id><pub-id pub-id-type="pmid">9963673</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname> <given-names>F. C.</given-names></name> <name><surname>Lobell</surname> <given-names>D. B.</given-names></name></person-group> (<year>2014</year>). <article-title>Adaptation potential of European agriculture in response to climate change</article-title>. <source>Nat. Clim. Change</source> <volume>4</volume>:<fpage>610</fpage>. <pub-id pub-id-type="doi">10.1038/nclimate2228</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><collab>NOAA</collab></person-group> (<year>2018</year>). <source>Climate Data Online</source>. <publisher-name>National Climatic Data Center</publisher-name> (accessed April 30, 2018).</citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Osborne</surname> <given-names>T. M.</given-names></name> <name><surname>Wheeler</surname> <given-names>T. R.</given-names></name></person-group> (<year>2013</year>). <article-title>Evidence for a climate signal in trends of global crop yield variability over the past 50 years</article-title>. <source>Environ. Res. Lett</source>. <volume>8</volume>:<fpage>024001</fpage>. <pub-id pub-id-type="doi">10.1088/1748-9326/8/2/024001</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Otto</surname> <given-names>F. E.</given-names></name> <name><surname>Massey</surname> <given-names>N.</given-names></name> <name><surname>Oldenborgh</surname> <given-names>G.</given-names></name> <name><surname>Jones</surname> <given-names>R.</given-names></name> <name><surname>Allen</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>Reconciling two approaches to attribution of the 2010 Russian heat wave</article-title>. <source>Geophys. Res. Lett</source>. <volume>39</volume>:<fpage>L04702</fpage>. <pub-id pub-id-type="doi">10.1029/2011GL050422</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><collab>OECD and Food and Agriculture Organization of the United Nations</collab></person-group> (<year>2012</year>). <source>OECD-FAO Agricultural Outlook 2012</source>. <fpage>286</fpage>. <pub-id pub-id-type="doi">10.1787/agr_outlook-2012-en</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Pedregosa</surname> <given-names>F.</given-names></name> <name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Gramfort</surname> <given-names>A.</given-names></name> <name><surname>Michel</surname> <given-names>V.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name> <name><surname>Grisel</surname> <given-names>O.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Scikit-learn: machine learning in Python</article-title>. <source>J. Mach. Learn. Res</source>. <volume>12</volume>, <fpage>2825</fpage>&#x02013;<lpage>2830</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://jmlr.org/papers/v12/pedregosa11a.html">http://jmlr.org/papers/v12/pedregosa11a.html</ext-link></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ray</surname> <given-names>D. K.</given-names></name> <name><surname>Gerber</surname> <given-names>J. S.</given-names></name> <name><surname>MacDonald</surname> <given-names>G. K.</given-names></name> <name><surname>West</surname> <given-names>P. C.</given-names></name></person-group> (<year>2015</year>). <article-title>Climate variation explains a third of global crop yield variability</article-title>. <source>Nat. Commun</source>. <volume>6</volume>:<fpage>5989</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms6989</pub-id><pub-id pub-id-type="pmid">25609225</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ray</surname> <given-names>D. K.</given-names></name> <name><surname>Mueller</surname> <given-names>N. D.</given-names></name> <name><surname>West</surname> <given-names>P. C.</given-names></name> <name><surname>Foley</surname> <given-names>J. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Yield trends are insufficient to double global crop production by 2050</article-title>. <source>PLoS ONE</source> <volume>8</volume>:<fpage>e66428</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0066428</pub-id><pub-id pub-id-type="pmid">23840465</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ray</surname> <given-names>D. K.</given-names></name> <name><surname>Ramankutty</surname> <given-names>N.</given-names></name> <name><surname>Mueller</surname> <given-names>N. D.</given-names></name> <name><surname>West</surname> <given-names>P. C.</given-names></name> <name><surname>Foley</surname> <given-names>J. A.</given-names></name></person-group> (<year>2012</year>). <article-title>Recent patterns of crop yield growth and stagnation</article-title>. <source>Nat. Commun</source>. <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1038/ncomms2296</pub-id><pub-id pub-id-type="pmid">23250423</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schauberger</surname> <given-names>B.</given-names></name> <name><surname>Archontoulis</surname> <given-names>S.</given-names></name> <name><surname>Arneth</surname> <given-names>A.</given-names></name> <name><surname>Balkovic</surname> <given-names>J.</given-names></name> <name><surname>Ciais</surname> <given-names>P.</given-names></name> <name><surname>Deryng</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Consistent negative response of US crops to high temperatures in observations and crop models</article-title>. <source>Nat. Commun</source>. <volume>8</volume>, <fpage>1</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1038/ncomms13931</pub-id><pub-id pub-id-type="pmid">28102202</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schlenker</surname> <given-names>W.</given-names></name> <name><surname>Roberts</surname> <given-names>M. J.</given-names></name></person-group> (<year>2009</year>). <article-title>Nonlinear temperature effects indicate severe damages to US crop yields under climate change</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>106</volume>, <fpage>15594</fpage>&#x02013;<lpage>15598</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0906865106</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Siebert</surname> <given-names>S.</given-names></name> <name><surname>Webber</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>G.</given-names></name> <name><surname>Ewert</surname> <given-names>F.</given-names></name></person-group> (<year>2017</year>). <article-title>Heat stress is overestimated in climate impact studies for irrigated agriculture</article-title>. <source>Environ. Res. Lett</source>. <volume>12</volume>:<fpage>054023</fpage>. <pub-id pub-id-type="doi">10.1088/1748-9326/aa702f</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smola</surname> <given-names>A. J.</given-names></name> <name><surname>Sch&#x000F6;lkopf</surname> <given-names>B.</given-names></name></person-group> (<year>2004</year>). <article-title>A tutorial on support vector regression</article-title>. <source>Stat. Comput</source>. <volume>14</volume>, <fpage>199</fpage>&#x02013;<lpage>222</lpage>. <pub-id pub-id-type="doi">10.1023/B:STCO.0000035301.49549.88</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tack</surname> <given-names>J.</given-names></name> <name><surname>Barkley</surname> <given-names>A.</given-names></name> <name><surname>Nalley</surname> <given-names>L. L.</given-names></name></person-group> (<year>2015</year>). <article-title>Effect of warming temperatures on US wheat yields</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>112</volume>, <fpage>6931</fpage>&#x02013;<lpage>6936</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1415181112</pub-id><pub-id pub-id-type="pmid">25964323</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tilman</surname> <given-names>D.</given-names></name> <name><surname>Balzer</surname> <given-names>C.</given-names></name> <name><surname>Hill</surname> <given-names>J.</given-names></name> <name><surname>Befort</surname> <given-names>B. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Global food demand and the sustainable intensification of agriculture</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>108</volume>, <fpage>20260</fpage>&#x02013;<lpage>20264</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1116437108</pub-id><pub-id pub-id-type="pmid">22106295</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Troy</surname> <given-names>T.</given-names></name> <name><surname>Kipgen</surname> <given-names>C.</given-names></name> <name><surname>Pal</surname> <given-names>I.</given-names></name></person-group> (<year>2015</year>). <article-title>The impact of climate extremes and irrigation on US crop yields</article-title>. <source>Environ. Res. Lett</source>. <volume>10</volume>:<fpage>054013</fpage>. <pub-id pub-id-type="doi">10.1088/1748-9326/10/5/054013</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Urban</surname> <given-names>D.</given-names></name> <name><surname>Roberts</surname> <given-names>M. J.</given-names></name> <name><surname>Schlenker</surname> <given-names>W.</given-names></name> <name><surname>Lobell</surname> <given-names>D. B.</given-names></name></person-group> (<year>2012</year>). <article-title>Projected temperature changes indicate significant increase in interannual variability of US maize yields</article-title>. <source>Clim. Change</source> <volume>112</volume>, <fpage>525</fpage>&#x02013;<lpage>533</lpage>. <pub-id pub-id-type="doi">10.1007/s10584-012-0428-2</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="web"><person-group person-group-type="author"><collab>USDA</collab></person-group> (<year>2010</year>). <source>Quick Stats NASS USDA</source> (accessed April 30, 2018).</citation></ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vapnik</surname> <given-names>V.</given-names></name></person-group> (<year>2013</year>). <source>The Nature of Statistical Learning Theory</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer Science &#x00026; Business Media</publisher-name>.</citation></ref>
<ref id="B48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vapnik</surname> <given-names>V. N.</given-names></name></person-group> (<year>1995</year>). <source>The Nature of Statistical Learning Theory</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>.</citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>B.</given-names></name> <name><surname>Piao</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Lobell</surname> <given-names>D. B.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Temperature increase reduces global yields of major crops in four independent estimates</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>114</volume>, <fpage>9326</fpage>&#x02013;<lpage>9331</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1701762114</pub-id><pub-id pub-id-type="pmid">28811375</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> Funding for this work was provided by four National Science Foundation (NSF) projects: NSF BIGDATA under grant number 1447587, NSF CyberSEES under grant number 1442728, NSF CISE Expeditions in Computing under grant number 1029711, and NSF CRISP Type 2 under grant number 1735505. This work was also supported by the NASA Earth Exchange.</p>
</fn>
</fn-group>
</back>
</article>