Evaluation of the FLake Model in ERA5 for Lake Champlain

Global model reanalyses of temperature and radiation are used for many purposes because of their spatial and temporal homogeneity. However, they use sub-models for lakes that are smaller than the model grid. This paper compares the simplified small-lake model, known as FLake, used in the European Centre global reanalysis known as ERA5, with observations made in and near Lake Champlain in northern Vermont. Lake Champlain is a challenging test for the ERA5 FLake model. The lake, which extends over several grid cells, is the lowest region at 30 m above sea level within complex mountain topography. The smoothing of the adjacent mountain topography means that the ERA5 grid cells containing the lake have mean elevations higher than 30 m, and this contributes to a small cool bias in FLake mid-summer temperatures. The seasonal cycle of FLake temperatures has a sharper peak than the observed lake temperatures. In winter, lake temperatures are close to 3°C, while the 30 m deep FLake mixed layer (ML) is near freezing. In May and June, FLake maintains a deep ML, while lake profiles are generally strongly stratified with peak temperatures near the surface several degrees above the model ML. One possible contributing reason is that inflowing river temperatures, which are not considered by FLake, are as much as 5°C above the lake surface temperature from April to June. The lake does develop a ML structure as it cools from the temperature peak in August, but the FLake ML cools faster and grows deeper in fall. We conclude that the vertical mixing in the FLake ML is stronger than the vertical mixing in Lake Champlain.


INTRODUCTION
This paper will compare in-situ data for Lake Champlain, which is bordered by the states of Vermont and New York and the province of Quebec, with the sub-grid-scale lake model FLake (Mironov, 2008; Dutra et al., 2010; Mironov et al., 2010) used in the current reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF), known as ERA5 (C3S: Copernicus Climate Change Service, 2017). Exchanges of energy and water differ greatly between land and water surfaces. At the land-ocean boundary, global models explicitly handle this transition using a land-sea grid-box fraction. Over land, both large lakes that are resolved by the model grid and the large numbers of unresolved smaller lakes are modeled in ERA5 using the one-dimensional FLake model, which computes the diurnal and seasonal cycle of lake temperature profiles and the lake contribution to the mean grid-box surface fluxes. This study will focus on Lake Champlain, but small lakes are extensive over the continents. For example, Canada has about 31,000 small lakes with areas between 1 and 100 km², which substantially impact surface temperature (Verseghy and MacKay, 2017).
The broader context is a University of Vermont project called Basin Resilience to Extreme Events (BREE), funded by the National Science Foundation to understand the ecohydrology and economic impacts of the lake as climate and extreme events change. Toxic blooms of blue-green algae in summer already contaminate the shallow lake waters near the urban area of Burlington, Vermont, impacting local health and tourism (Isles et al., 2015). The BREE project is developing an integrated assessment model for the Lake Champlain region (Zia et al., 2016) with an atmospheric model (Huang et al., 2019) driving a lake circulation model, coupled to a biogeochemistry model (e.g., Isles et al., 2017), and to land-use and governance issues (Bitterman and Koliba, 2020; Doran et al., 2020).
This study, however, which started as a BREE student summer project by the second author (DR), has a limited scope. We compare the simplified 1-D FLake model from ERA5 with surface and profile measurements that are available for several years at two sites on Lake Champlain.

ERA5 Domain and Observation Sites
The operational ECMWF analysis-forecast system is under continual development with significant upgrades typically twice a year. For historic reanalysis a frozen version of the model is used. This paper uses the latest reanalysis, ERA5, based on model cycle Cy41r2, which was introduced operationally in 2016. Extensive details of the representation of physical processes, including the surface parameterization and parameter tables, are available in Hersbach et al. (2020) and Cy41r2 (2016). Here we give a very brief overview.
The land-surface model in ERA5, known as HTESSEL (Balsamo et al., 2009), represents each grid-box in terms of the fraction of eight tiles, one of which is FLake for subgrid-scale lakes. Note that the tiles at the interface of the soil-atmosphere are in energy and hydrological contact with one single atmospheric profile above and one single soil profile below. Each grid-box is divided into eight fractions: two vegetated fractions (high and low vegetation without snow), one bare soil fraction, three snow/ice fractions (snow on bare ground/low vegetation, high vegetation with snow beneath, and lake-ice), and two water fractions (interception reservoir, and sub-grid lakes, which have a specific sub-model, FLake, described in the next section). The distinction between low and high vegetation is particularly important for snow, because exposed snow has a high albedo, whereas a canopy with snow underneath has a low albedo (Betts and Ball, 1997; Betts et al., 2001). The vegetation characteristics in ERA5 are defined by fractional cover and the type of the dominant high and low vegetation, which are based on the Global Land Cover Characterization (GLCC) data set derived from 1 km AVHRR (Advanced Very High Resolution Radiometer) satellite observations (Loveland et al., 2000). For each vegetation type, Leaf Area Index (LAI) has an annual cycle, which comes from a satellite-derived monthly climatology and which modulates evapotranspiration.
We use ERA5 grid-boxes that are 0.25 × 0.25 degrees, corresponding to about 27.8 km in latitude and 20 km in longitude at 44°N, and therefore an area of about 550 km². Lakes with an area >1% grid-box cover are represented by FLake, but they are aggregated to a single lake tile, which communicates with the single grid-scale atmospheric profile. Figure 1A shows the mean topography of the 0.25 degree ERA5 grid as a square pattern, showing the north-south chain of the Green Mountains in central Vermont to the east of Lake Champlain, and the higher Adirondack mountains to the west in New York. ERA5 also represents the sub-grid-scale orography (not shown) to improve estimates of the surface stress. The New York-Vermont border runs through the lake (black line) north to the Canadian border at 45°N. Figure 1B shows the ERA5 grid-boxes, which are rectangular in geographic coordinates, superimposed on a map of the sites around Lake Champlain where there are observations for comparison. This paper is a direct comparison of the ERA5 FLake tile model data and the ERA5 grid-mean data with observational data, primarily from the Diamond Island (green diamond) and Colchester Reef (red triangle) sites. We will use ERA5 data from 2012 to 2017. Recent work over the central Canadian Prairies (Betts et al., 2019) showed that the near-surface air temperature bias in ERA5 is small, typically within ±1°C for the April to October warm season with no snow. This is much less than in the earlier reanalysis known as ERA-Interim (Betts and Beljaars, 2017).
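The grid-box dimensions quoted above follow from simple spherical geometry. The short sketch below reproduces them, assuming a spherical Earth with a mean radius of 6371 km; the function name and the radius value are illustrative choices, not taken from the ERA5 documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def gridbox_dimensions_km(lat_deg, dlat_deg=0.25, dlon_deg=0.25):
    """Return (north-south km, east-west km, area km^2) of a lat-lon grid-box."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0  # length of one degree of latitude
    ns_km = dlat_deg * km_per_degree
    # East-west extent shrinks with the cosine of latitude.
    ew_km = dlon_deg * km_per_degree * math.cos(math.radians(lat_deg))
    return ns_km, ew_km, ns_km * ew_km


ns_km, ew_km, area_km2 = gridbox_dimensions_km(44.0)
print(f"{ns_km:.1f} km x {ew_km:.1f} km, area {area_km2:.0f} km^2")
```

Under these assumptions the result is close to 27.8 km by 20 km, consistent with the area of about 550 km² quoted in the text.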

ERA5 FLake Tile Model
The representation of inland water bodies (lakes, reservoirs, rivers, and coastal waters) is important in order to account for the thermal inertia effects, albedo and roughness characteristics of open water and to account for phase change during freezing/melting. This is simulated in ERA5 by the Freshwater Lake model FLake [Mironov (2008), Mironov et al. (2010)], which was chosen for its intermediate complexity, particularly adapted for numerical weather prediction and climate applications. Moreover, FLake benefits from a large research community effort, contributing to validation and development [FLake (2017)]. Its use and evaluation as the tile representing sub-grid-scale lakes in the ECMWF HTESSEL land surface model (Balsamo et al., 2009, 2012) is discussed in Dutra et al. (2010) and Balsamo (2013).
The FLake model was developed to predict the surface temperature in small lakes of various depths on time scales from a few hours to a year, specifically for numerical weather prediction. Key parameters are lake fraction and lake depth, which are mapped from global datasets (see Chapter 11.11 in Cy41r2, 2016). The global lake depth and coverage datasets were developed by Kourzeneva (2010), Kourzeneva et al. (2012), and Choulga et al. (2014). The FLake model is based on a two-layer parameterization of the lake temperature profile, with an upper mixed layer (ML) above the stratified lake thermocline extending down to the lake bottom. These are described using the concept of self-similarity for the evolving temperature profile. Figure 2A is a schematic of this parameterization, adapted from Mironov (2008). The model is forced at the surface by the wind at the lowest model level, as well as by temperature, humidity, precipitation, and the shortwave and longwave radiation; and it adjusts to a new equilibrium profile on timescales of a few hours. Full details are available in Chapter 8.8 in Cy41r2 (2016). The key parameters are ML temperature (MLT) and ML depth (MLD), bottom layer temperature (BLT), and a profile shape factor for the lower layer. ERA5 provides hourly data, which we have averaged to UTC (Universal Time) daily means. Observations made in Eastern Standard Time (EST) will be converted to the same time-base: UTC = EST + 5 h. Our climate analysis begins with daily and monthly timescales, which are longer than the FLake adjustment time. Figure 2B shows the tight coupling on daily timescales (R² ≈ 0.95) between ML temperature and ML depth for August 2015 and 2016 for the ERA5 grid-box centered on 44.25°N, −73.25°W. August has the most linear structure because it is near the time of maximum temperature, and 2016 (a warmer summer than 2015) has a slightly warmer and deeper ML.
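As an illustration of the R² statistic quoted for Figure 2B, the sketch below computes the coefficient of determination for a simple linear fit between two daily series. The function name is hypothetical, and the six paired values are invented placeholders, not ERA5 output; they serve only to make the example runnable and imply nothing about the sign or strength of the real relationship.

```python
def linear_r_squared(x, y):
    """R^2 of a least-squares straight-line fit of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Placeholder daily values (NOT real data): ML temperature (deg C) and ML depth (m).
mlt_placeholder = [21.0, 21.4, 21.9, 22.3, 22.6, 23.0]
mld_placeholder = [9.1, 9.6, 10.4, 10.7, 11.5, 11.8]

r2 = linear_r_squared(mlt_placeholder, mld_placeholder)
print(f"R^2 = {r2:.3f}")
```

For a one-predictor linear fit, R² equals the square of the correlation coefficient, which is why no explicit regression coefficients are needed here.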

Observations
We compare observations and ERA5 for the seasonal cycle of lake temperature (T_water) and air temperature (T_air) for two key sites run by the Forest Ecosystem Monitoring Cooperative (FEMC, 2019). Diamond Island (green diamond in Figure 1B) at 44.237°N, 73.333°W has 15-min observations for 2012-2017, including T_water at 3 m depth, and T_air at 42.6 m above mean sea-level (MSL) (see Duncan and Waite, 2017). The mean elevation of Lake Champlain is 29.9 m (98 ft) MSL, with a typical annual variation that can be as large as ±1 m. We compare the Diamond Island data with the ERA5 grid-box centered at 44.25°N, −73.25°W, which has a mean elevation of 208.3 m MSL. For this grid-box the ERA5 lake cover is 4% and the lake depth is 48.6 m; the FLake model limits lake depth to 50 m. Lake temperature profiles to a depth of 85 m for the Otter Creek Segment (OCS) near Diamond Island are available about 10 times a year for 2012-2017 from the Vermont Department of Environmental Conservation Lake Monitoring Program (VTDEC, 2019). Inflowing river temperatures for the Otter Creek (OCRT, 2019) are also available about 10 times a year for the same period, except that there are no data for 2014.
Colchester Reef at 44.555°N, 73.329°W (red triangle in Figure 1B; FEMC, 2019) also has 15-min T_water at 3 m depth and T_air at 47.1 m MSL for 2015-2017, which we will compare with the ERA5 grid-box centered at 44.5°N, −73.25°W, which has a mean elevation of 148.4 m MSL. For this grid-box the ERA5 lake cover is 14% and the lake depth is 33.8 m, but we have no comparison lake profile data.

ERA5 Processing
The hourly ERA5 data were accessed at quarter degree resolution from the Copernicus Climate Data Store (C3S: Copernicus Climate Change Service, 2017). We used the 6-18 h short-term forecasts that are initialized from the 0 and 12 UTC analyses. This resolves the diurnal cycle well and removes the initial spin-up in the first 6 h of the forecast (Betts et al., 2019). These short-range forecasts are close to the analyzed large-scale flow, but they already contain any systematic errors in the land surface model (Haiden et al., 2016).
For the seasonal cycle, the ERA5 hourly grid point data were reduced to daily means in UTC days. The observations are 15-min means, and from these we also computed daily (UTC) and monthly means.
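The reduction of 15-min observations recorded in EST to daily means on UTC days can be sketched as follows. This is a minimal illustration using only the offset UTC = EST + 5 h stated earlier; the function name and the eight sample values are hypothetical, and the actual processing may have treated missing data differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EST_TO_UTC = timedelta(hours=5)  # UTC = EST + 5 h


def daily_utc_means(records):
    """records: iterable of (naive EST datetime, value). Returns {UTC date: mean value}."""
    sums = defaultdict(lambda: [0.0, 0])  # UTC date -> [running total, sample count]
    for est_time, value in records:
        utc_day = (est_time + EST_TO_UTC).date()
        sums[utc_day][0] += value
        sums[utc_day][1] += 1
    return {day: total / count for day, (total, count) in sorted(sums.items())}


# Hypothetical 15-min samples straddling 19:00 EST, which is 00:00 UTC the next day.
start = datetime(2016, 8, 1, 18, 0)
samples = [(start + timedelta(minutes=15 * i), 20.0 if i < 4 else 22.0) for i in range(8)]
means = daily_utc_means(samples)
for day, value in means.items():
    print(day, value)
```

The samples taken from 19:00 EST onward fall in the following UTC day, which is the detail that matters when matching observations to ERA5 daily means.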

Air and Water Temperature Comparison
We will directly compare the seasonal cycle of air and water temperature between ERA5 and FLake and the observations. There are some issues. The mean height of the surface of Lake Champlain (above MSL) is 29.9 m. In times of major flood, like Tropical Storm Irene in 2011, it rose above 31 m. Lake Champlain at 30 m elevation is surrounded by higher terrain, shown in Figure 1A. The Adirondack Mountains to the west have many peaks above 1,200 m, and the Green Mountains to the east have peaks above 1,000 m. The ERA5 native resolution of 31 km (sampled at a quarter degree) smooths the topography. For example, Mount Marcy is in the grid-box at 44°N, 74°W, only 40 km west of the lake, with a peak elevation above 1,500 m, while this ERA5 grid-box has a mean elevation around 600 m. The smoothed ERA5 topography does not represent the mountain peaks, nor the smaller hills that surround the lake. As a result, all the ERA5 grid-boxes that include parts of the lake have mean elevations higher than 30 m; and this height difference increases southward.
FLake is a simplified model with a specified fixed lake area and depth for each grid-box. There is no water flow or water balance equation, so the transfer of heat and water by rivers and lake circulations is not represented. Figure 3 compares the mean annual cycle of T_air and T_water (at 3 m below the mean lake surface) for 2012-2017 for Diamond Island and 2015-2017 for Colchester Reef, with the ERA5 2-m mean air temperature (T2m) and the FLake model MLT on the corresponding ERA5 grid-boxes. The right-hand scale shows the mean seasonal cycle of the FLake MLD, which is constrained by the specified model lake depths, which are 48.6 m and 33.8 m for the southern and northern grid-boxes.
The Diamond Island air temperature (Figure 3A) is warmer than ERA5 by 1.1 ± 0.3°C in the warm season (April to September) and 1.6 ± 0.4°C for the cold season (October to March). The elevation difference between model topography and measured air temperature (at 42.6 m MSL, 12.7 m above the lake) is 165.7 m, and a nominal correction for this elevation difference, using the standard atmosphere lapse rate of −6.5°C km⁻¹, is 1.1°C, comparable to the warm season bias. However, it should be noted that the model 2-m temperatures are computed to represent synoptic measurements above a grass plot, while the observations are on a small island tower at 12.7 m above the lake surface. In addition, we are averaging over day and night with substantially different boundary layers. The Colchester Reef air temperature (Figure 3B) is warmer than ERA5 by 0.4 ± 0.5°C in the warm season and 1.5 ± 0.5°C for the cold season. In April and May the two air temperatures are very close. The elevation difference between the model topography of 148.4 m for this grid-box at 44.5°N and the sensor height of 47.1 m MSL is 101 m, giving a nominal correction using the standard atmosphere lapse rate of 0.66°C, which is comparable to the summer bias.
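The nominal elevation corrections quoted above can be reproduced with a few lines of arithmetic. The sketch below applies the standard atmosphere lapse rate to the difference between the ERA5 grid-box mean elevation and the sensor height; the function and variable names are illustrative only.

```python
STANDARD_LAPSE_RATE_C_PER_KM = 6.5  # magnitude of the standard atmosphere lapse rate


def elevation_correction_c(model_elev_m, sensor_elev_m,
                           lapse_rate_c_per_km=STANDARD_LAPSE_RATE_C_PER_KM):
    """Nominal warming (deg C) expected at the lower sensor relative to the model surface."""
    return (model_elev_m - sensor_elev_m) / 1000.0 * lapse_rate_c_per_km


# Elevations (m MSL) as given in the text.
diamond_island = elevation_correction_c(208.3, 42.6)   # elevation difference 165.7 m
colchester_reef = elevation_correction_c(148.4, 47.1)  # elevation difference 101.3 m

print(f"Diamond Island: {diamond_island:.2f} C, Colchester Reef: {colchester_reef:.2f} C")
```

These values round to the 1.1°C and 0.66°C corrections given in the text. A fixed lapse rate is only a nominal estimate, since the real near-surface temperature gradient varies with season and time of day.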
The measured water temperatures for Diamond Island (left) are warmer than the FLake MLT. The difference is largest in May and June (4.1 ± 0.8°C), smallest at the peak lake temperatures in August (1.2 ± 0.2°C), and the difference is again large in December and January (3.3 ± 0.8°C). Only at the peak in August is the difference in lake temperatures the same as the difference in air temperatures, which is likely connected to the higher elevation of the ERA5 grid-box above the lake.
In winter, the Diamond Island water temperatures at 3 m depth remain above freezing. In contrast, the FLake MLT falls to 0°C in January and stays at 0°C through March with a surface ice thickness in February and March that ranges from 20 to 76 cm. This same unrealistic 0°C ML with the FLake model was seen in an earlier study of Sparkling Lake in northern Wisconsin, which was part of the Lake Model Intercomparison Project. For Lake Champlain, the two coldest winters are 2014 and 2015, when the FLake model has the thickest February-March ice layer (66 and 76 cm, respectively) and Lake Champlain froze over on February 12 and 14, respectively. For the other four warmer winters the FLake ice depth was between 20 and 37 cm, and Lake Champlain did not freeze over.
The warming of the FLake MLT from its frozen state is slow in spring, and the FLake ML also cools faster in the fall than the Diamond Island water temperatures. For the grid-point to the north including Colchester Reef, MLT rises faster in spring and falls a little faster in fall. This is related to the smaller specified depth in the lake model. As a result, in May and June the difference between measured T_water and MLT is 1.9 ± 1.2°C (smaller than for Diamond Island), with the smallest difference in July (1.0 ± 0.6°C) and a similar large difference in December and January (3.3 ± 0.8°C). The lake temperature comparisons near Diamond Island are discussed further in the next section.

Seasonal Comparison With Otter Creek Segment Profiles
Profiles of lake temperature with depth are made at several locations on Lake Champlain. The Otter Creek Segment (OCS) profiles down to 85 m are close to Diamond Island. Figure 4A shows the mean temperature profiles for 2012-2017, binned in 4 m ranges of depth down to 50 m, from May to October. Two late April profiles (from 2010 and 2013) show almost constant temperatures in the range 3.5-3.8°C with depth, just below the temperature of maximum density of water (3.98°C). The lake warms from the surface in May, June and July (red curves), reaching its maximum temperature in August (heavy black line) and then cooling in September and October (blue curves). We show only monthly mean profiles, which are 6-year averages, because the data are heterogeneous. There are typically only 2 profiles in May and as many as 4 profiles in August; and profiles are on different days in different years. During the warming phase, there is a strong stratification with depth in the mean, as well as in most individual profiles (see the 2016 profiles below), with no suggestion of a ML. However, after August as the lake is cooled from the surface, these mean profiles show the development of a ML, which is also seen in individual profiles. There are no cold season profile data. Figure 4B is the Diamond Island comparison just for water temperatures. It shows the annual cycle of the ERA5 MLT and also BLT (bottom layer temperature at 48.6 m depth), along with the Diamond Island T_water at 3 m below the surface. From the profiles in the left panel, May to October Otter Creek Segment means have been calculated. T6:OCS is the mean for the near-surface layers down to 12 m. The close agreement between T_water (fully sampled at 15 min) and T6:OCS (sampled only a few times a month) is encouraging. Lake Champlain for the Otter Creek segment is much deeper than 50 m, and the OCS profiles go as deep as 90 m.
So we also show the 50 m comparison of T50:OCS, which corresponds to the depth of ERA5 BLT, as well as T86:OCS, an 86 m mean at lake bottom. It is clear there is consistency between the BLT and the poorly sampled deep layer OCS temperatures from May to October. In winter, the BLT is close to 3°C. The temperature, T:OCRT (in green), comes from another dataset for the Otter Creek river temperatures (OCRT, 2019) measured shortly before the river enters Lake Champlain. The annual sampling is poor, the scatter is large and there are no data for 2014. However, these temperatures of the inflowing nearby river are around 5°C warmer in spring than the Diamond Island T_water.
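The depth binning and layer means described in this section can be sketched as below. This is only an assumed procedure: it takes unweighted averages of whatever samples fall in each 4 m bin, or above a chosen depth, and the text does not state whether the original analysis weighted or interpolated the profiles. The function names and the sample profile are hypothetical.

```python
from collections import defaultdict


def bin_profile(depths_m, temps_c, bin_width_m=4.0, max_depth_m=50.0):
    """Average temperatures in depth bins; returns {(bin_top_m, bin_bottom_m): mean_c}."""
    bins = defaultdict(list)
    for depth, temp in zip(depths_m, temps_c):
        if 0.0 <= depth <= max_depth_m:
            bins[int(depth // bin_width_m)].append(temp)
    return {(i * bin_width_m, (i + 1) * bin_width_m): sum(v) / len(v)
            for i, v in sorted(bins.items())}


def layer_mean(depths_m, temps_c, bottom_m):
    """Unweighted mean of all samples between the surface and bottom_m."""
    values = [t for d, t in zip(depths_m, temps_c) if 0.0 <= d <= bottom_m]
    return sum(values) / len(values)


# Hypothetical stratified profile (NOT observed data).
depths = [1.0, 3.0, 5.0, 7.0, 10.0, 20.0, 40.0]
temps = [18.0, 17.0, 15.0, 13.0, 11.0, 8.0, 6.0]

binned = bin_profile(depths, temps)
near_surface_mean = layer_mean(depths, temps, 12.0)  # analogous to a mean down to 12 m
print(binned)
print(near_surface_mean)
```

Bins with no samples are simply absent from the result, which matters for sparse profiles like those available here.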
It is clear that the ERA5 MLT is cooler than the near-surface lake observations. In mid-summer the small differences of order 1°C are probably connected to cooler air temperatures of the ERA5 grid-box, which has a mean elevation over 170 m above the elevation of Lake Champlain. However, MLT is 4°C cooler than Diamond Island T_water in May and June. As discussed in the previous section, the FLake MLT falls to 0°C in January and stays at 0°C through March with a surface ice layer. As ML depth increases to a peak in May, MLT rises much more slowly than T_water and T6:OCS. The OCS mean profile observations show that Lake Champlain is strongly stratified in spring. It does not have the ML that is imposed in the FLake model. The T_water observations in winter and the few T6:OCS in April (not shown) suggest the lake is close to maximum density near 3°C. Stepanenko et al. (2010) noted this same behavior and suggested that vertical mixing was too strong in spring in FLake and two other models. In the fall, the ERA5 ML reaches almost the full FLake model depth in November, and then cools through 3°C, typically in mid-December.
A separate issue is that FLake cannot represent the rivers that run into Lake Champlain year-round. The warmer river inflow in spring contributes to the stratification of the lake (Morrill et al., 2005), as suggested by the higher Otter Creek river temperatures T:OCRT shown in Figure 4B.
We next illustrate the seasonal differences between the ERA5 and the OCS profiles for a single year: May to October 2016. Figure 5 compares ERA5 profiles with OCS profiles (solid) down to 50 m for the same dates in 2016 to illustrate differences between the warming and the cooling period, and the differences between the FLake ML and observations. The plots again show that ERA5 tends to underestimate the temperature of the lake, especially in the spring and late fall. Figure 5A shows that the measured OCS profiles in late May are strongly stratified in the first 10 m. In contrast, ERA5 has a deep cold ML down to 40 m. The model ML temperature, which was at 0°C into March, reaches 2°C on April 22, and climbs roughly 1°C every 10 days, reaching 5°C on May 24. The OCS profiles are all stratified in June and July as well, with one exception, June 14, which shows a 23 m deep ML after a few days with strong winds. At depths of 40-50 m the observed profiles and ERA5 agree well. Figure 5B compares daily profiles from the peak lake surface temperature in August through the cooling period in September and October. During the cooling period, most of the OCS profiles show a ML, so the agreement with ERA5 is better, although the OCS profiles are mostly warmer than the ERA5 profiles. As mentioned earlier, this may be partially due to the higher elevation of the lake surface in ERA5 (178 m above Lake Champlain), and warmer temperatures in inflowing rivers may play a role in August. After October, when we have no profiles, Figure 4B shows that the deep ERA5 ML continues to cool faster than the Diamond Island water temperatures, as it cools toward 0°C in mid-winter.

DISCUSSION AND CONCLUSIONS
Lake Champlain is a challenging test for the FLake model in ERA5, which has a native resolution of 31 km that we have sampled at a quarter degree. The lake is the lowest region at 30 m MSL within complex mountain topography (Figure 1A) and extends over several grid cells. For the Colchester Reef and Diamond Island sites, where we have extensive comparison data, the mean ERA5 grid-box elevations are above the lake surface by 118 m and 178 m, respectively. This contributes to mean near-surface air temperatures in ERA5 that are cooler than observations by about 1°C. We compared the seasonal cycle of grid-box air temperature and lake temperature from FLake with a range of observations. The FLake model gives reasonable peak summer temperatures, consistent with the higher mean elevation and cooler air temperatures for the ERA5 grid boxes.
However, the seasonal cycle of FLake temperatures has a sharper peak than observed lake temperatures. In winter, lake temperatures are close to 3°C, not far below the temperature of maximum density, while the deep FLake ML cools to 0°C in February and March with a surface ice cover ranging from 20 to 76 cm thickness in warm and cold years, respectively. The recovery from this deep cold ML is slow in spring. In May and June, while FLake maintains a deep ML, the lake profiles are generally strongly stratified with peak temperatures near the surface several degrees above the model ML. One possible contribution is that inflowing river temperatures, which are not considered by FLake, are as much as 5°C above the lake surface temperature from April to June. The lake does develop a ML structure as it cools from the temperature peak in August. However, the FLake ML cools faster and grows deeper in fall as the model lake returns to a deep near-freezing mixed layer in winter. For the Diamond Island site comparison, the model lake bottom temperatures at 48.6 m correspond closely to observed lake temperatures at 50 m from May to October.
Our conclusion is that the vertical mixing in the FLake ML is stronger than the vertical mixing in Lake Champlain. Higher spatial resolution would reduce the small cool bias in FLake midsummer temperatures associated with the high bias of the ERA5 grid-box elevations from the smoothing of the adjacent mountain topography. Choulga et al. (2019) are working on improving the resolution of both the orography and the depth topography of the lake.

DATA AVAILABILITY STATEMENT
The ERA5 reanalysis data are available from the Copernicus data store at: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form; The Diamond Island dataset is available from Duncan and Waite (2017

AUTHOR CONTRIBUTIONS
Conceptualization, analysis, writing, and editing: AB. Analysis, writing, and review: DR. Data curation, supervision, and review: CC. All authors contributed to the article and approved the submitted version.