Seasonality in Human Interest in Berry Plants Detection by Google Trends

The phenology of berry-producing plants, particularly their harvest season, is of human interest and also reflects the ecosystem’s response to the changing environment. We investigated the seasonal dynamics of human interest in berries growing in boreal, subarctic and Arctic ecosystems, mainly in Russia, based on internet search data via Google Trends. There is a typical and culture-specific pattern of seasonal variations in search volume concerning berries across Russia, Finland, and Canada. Generally, the seasonal peak of search corresponds to the common berry harvest season across these countries. We discussed the potential and limitation for detecting ecological factors from the internet search data, in which physical phenomena and socio-cultural aspects are fundamentally superimposed, and its applicability to phenological studies.


INTRODUCTION
Plant phenology, or seasonality, can affect human behavior in various ways, reflecting human interaction with specific plants (Platton and Henfrey, 2009; Spark et al., 2012). Wild berries growing in the boreal, subarctic and Arctic ecosystems benefit both humans and wildlife (e.g., Nestby et al., 2019). They are part of essential ecosystem services and provide material resources such as non-wood forest products (e.g., Turtiainen and Nuutinen, 2012; Sorrenti, 2017) and socio-cultural services of recreation, culture, and livelihoods when harvested by rural and urban residents (Kangas and Markkanen, 2001; Stryamets et al., 2015). During the harvest period of wild berries, which ranges from early summer to autumn depending on species, they are foraged for both self-consumption and sale, with accompanying outdoor activity and seasonal jobs (e.g., Kolosova et al., 2020). Which berries are eaten raw, or processed and preserved for winter, depends on the species and the culture of the region. Ongoing and projected climate changes in high latitudes may, however, change these functions as plants respond to a changing environment (Gauthier et al., 2015). Although the extended growing season may improve yields (Holmberg et al., 2019), the sensitivity of these plants to climate and environmental changes (Boulanger-Lapointe et al., 2017; Anderson et al., 2018; Herman-Mercer et al., 2020) may lead to changes in the areas where they are found. The influence of climate change on berry abundance has been recognized by local residents in northeastern Siberia (Ksenofontov et al., 2018) and Alaska (Hupp et al., 2015). Therefore, mapping and monitoring berry plants, and especially their fruiting phenology, are essential to understand and predict the human-nature interaction in this region.
Various methods are used to observe spatial and temporal variation in terrestrial ecosystem phenology. Satellite and UAV remote sensing is practical for monitoring ecosystem dynamics over extended spatial and temporal scales (e.g., Klosterman et al., 2018). With these technologies it is, however, difficult to monitor the fruiting of shrub and forest floor vegetation that is shaded by overstory canopies or by its own stems and branches, and in situ observation (i.e., field survey and fixed-point time-lapse images) is needed (Nagai et al., 2018). Even coverage across the extensive boreal region is further hindered by remote sites that are difficult to access. An alternative source of ecological information has been social sensing data (Barve, 2014; Silva et al., 2018). Massive databases have been created by diverse crowd users who act as sensors by posting their experiences of and interest in environmental phenomena, including vegetation dynamics, as internet-based platforms have expanded and become popular worldwide (Kirilenko et al., 2015; Lopez et al., 2019). This information records environmental changes in real time.
One example of a social sensing data source of human interest is Google Trends (GT) (Google Inc.), which provides internet search statistics. These record how often users search for a particular term via Google over a specific period and geographical region, and cover fields such as business marketing, public health, tourism, and communication (e.g., Jun et al., 2018). In the fields of ecology and environmental research, published studies have focused on the trend of human interest in bio-environmental issues with temporal variations in searched keywords, such as "climate change," "biodiversity," "wildlife," and "conservation" (Ficetola, 2013; Mccallum and Bury, 2013; Nghiem et al., 2016; Brodeur et al., 2018). Interest in specific plants and animals that may harm people, such as plant species with allergenic pollen (Bousquet et al., 2018; Hall et al., 2020), mosquitoes, invasive species (Proulx et al., 2014), and rodent outbreaks (Szymkowiak and Kuczyński, 2015), is reflected in the spatial and temporal variation in internet searches. Interest in organisms such as fireflies and beetles (Takada, 2012), and in flowering times and autumn leaf coloring, also shows seasonality. Seasonal variations in biological processes are therefore likely to be detectable in internet search data.
This study aimed to explore the potential of using internet search data to understand the temporal dynamics of berry plants, which are of human interest as seasonal delicacies, as part of recreation activities, and sometimes for subsistence. We investigated the temporal variation in human interest in berries in Russia via GT, focusing on its seasonality and regional differences in Russia and other countries such as Finland and Canada. We considered the socio-cultural and geographical background of GT data in discussing the benefits and limitations of internet search data for phenological studies.

MATERIALS AND METHODS
GT provides a time series of relative search volume (RSV), which indicates the ratio of the number of queries that include the examined keywords to the total number of queries sent from a specific location (selectable from worldwide/national/state/city) from 2004. The value of RSV ranges from 0 to 100 because it is normalized so that the maximum within the investigated period is 100, and RSV = 0 when the number of queries is less than a certain threshold (Stephens-Davidowitz and Varian, 2015). The magnitude of RSV is therefore not proportional to the absolute number of queries. RSV is calculated with a sample taken from the total Google search corpus, which consists of "billions of searches per day" and is cached every day (Stephens-Davidowitz and Varian, 2015). Hereafter, terms in italics indicate those originally used in GT.
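The normalization described above can be illustrated with a minimal Python sketch. It is not Google's actual implementation: the function name, the example counts, and the `min_count` threshold are hypothetical, since the real threshold and sampling procedure are not published.

```python
from typing import List


def relative_search_volume(
    keyword_counts: List[int],
    total_counts: List[int],
    min_count: int = 5,  # hypothetical threshold; the real value is not public
) -> List[int]:
    """Illustrative RSV: share of all queries per period, rescaled to a peak of 100."""
    shares = []
    for k, t in zip(keyword_counts, total_counts):
        # Periods with too few keyword queries are reported as zero.
        shares.append(k / t if (t > 0 and k >= min_count) else 0.0)
    peak = max(shares, default=0.0)
    if peak == 0.0:
        return [0 for _ in shares]
    # Rescale so that the largest share in the investigated period equals 100.
    return [round(100 * s / peak) for s in shares]


# Raw counts of 50 and 200 differ fourfold, but the total traffic also differs,
# so the resulting RSV values are 50 and 100.
print(relative_search_volume([10, 50, 200, 3], [1000, 1000, 2000, 1000]))
```

Because each value is a share rescaled to the period maximum, the same week can receive a different RSV when the investigated period changes.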
There are two ways of searching GT: search term and search topic (details are given in the Supplementary Text). In this study, we conducted GT searches mainly in the search topic option, with the topic of "Plants" to filter out queries not relevant to berries as a plant (e.g., trade names, titles of movies and books, and place names). The terms we examined were given in English (search topic option is executed for any language, as noted in the Supplementary Text). We checked the related queries (given in corresponding languages, i.e., Russian, Finnish, and English, for Russia, Finland, and Canada, respectively) and related topics (given in English) attributed to the examined keyword. GT returned them with a relative score; the most frequently searched query and topic were given 100.
We analyzed the period from 2011 to 2020 as there was a noisy time series of RSV for the examined terms in Russia before 2011, probably owing to the improved geographical assignment of GT data collection system from January 1, 2011 (i.e., localization of search volume; Bokelmann and Lessmann, 2019). We found there was a reduced share of each keyword, as search populations expanded and diversified owing to the popularization of internet use worldwide (Stephens-Davidowitz and Varian, 2015); this has been particularly remarkable in searches on science and environmental issues (Ficetola, 2013;Nghiem et al., 2016). We applied corrections to RSV, as proposed by Nghiem et al. (2016), in which RSV was normalized with several benchmark terms, and found no evidence of effect from the changing search population (details are given in the Supplementary Material). Thus, we assumed that the uncertainty due to technical problems in the search engine (as noted above) might be minimized by analyzing a recent decade, and so we analyzed the GT data without correction.
The calculation time steps depend on the analysis period: hourly data available for the last 7 days; daily data for < 270 days; weekly data for < 5 years; and monthly data for longer than 5 years (confirmed in February 2021). To detect seasonal patterns in RSV, we used the RSV of each year from January to December at weekly intervals (i.e., N = 52 or 53 per year). As a result, we were able to obtain normalized seasonal variation, in which each year's seasonal peak took a maximum of RSV = 100. Although the time series of daily and monthly RSV intervals could also be obtained from the analyzed period (Supplementary Text), we chose to use weekly data to detect plant phenology because of the tradeoff between temporal resolution and noise contamination. Execution of GT was carried out using the R package gtrendsR (Massicotte and Eddelbuettel, 2020) on March 6 and 7, 2021. We based GT data collection on standard time (not local time) because of its negligible effect on seasonal analysis. We examined 10 terms for berry plants that are popular and generally used in Russia (Strakhov and Pisarenko, 1996) and in Nordic countries (Hjalmarsson and Ortiz, 2001; Li et al., 2016): lingonberry, cranberry, bog bilberry, European blueberry, currant, gooseberry, cloudberry, raspberry, honeysuckle, and sea buckthorn (Table 1). The ripening season of these berries ranges from July to October depending on species and region (Table 1). Google search frequency for these terms was relatively higher in Russia than in other countries (Table 1). Common names were examined in our GT analysis rather than scientific names. The single exception was the term "gooseberry," which did not return corresponding Russian queries; when we examined its scientific name, Ribes uva-crispa, accurate results were returned. In the case of "raspberry" searches, the topic "Fruit" instead of "Plants" was the applicable search topic option.
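As a rough illustration of how the time step follows from the length of the requested period, the rule stated above can be written as a small Python function. The function name is hypothetical, the cut-offs are those reported in the text (as confirmed in February 2021), and GT itself may apply different or additional rules; in particular, hourly data are only offered for the most recent days, which this sketch does not check.

```python
from datetime import date


def gt_time_step(start: date, end: date) -> str:
    """Return the time step expected for a requested period (illustrative rule only)."""
    days = (end - start).days + 1  # inclusive length of the period
    if days <= 7:
        return "hourly"   # only for the last 7 days (recency not checked here)
    if days < 270:
        return "daily"
    if days < 5 * 365:
        return "weekly"
    return "monthly"


print(gt_time_step(date(2015, 1, 1), date(2015, 12, 31)))   # one calendar year
print(gt_time_step(date(2011, 1, 1), date(2020, 12, 31)))   # the full 10-year period
```

Requesting each calendar year separately therefore yields weekly values that are each normalized to that year's own maximum.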
These 10 terms were also examined for Finland and Canada and compared in terms of the 10-year mean RSV ensemble. When considering habitat distribution of common species, in the case of Canada we replaced "bog bilberry" and "European blueberry" with "bilberry" and "blueberry, " respectively.
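The 10-year mean RSV ensemble can be sketched as a week-by-week average over the yearly series. This is a minimal illustration, not the authors' code: the function name and the handling of 53-week years (truncation to the shortest year) are assumptions.

```python
from statistics import mean
from typing import Dict, List


def weekly_ensemble_mean(rsv_by_year: Dict[int, List[float]]) -> List[float]:
    """Average weekly RSV across years, aligned by week index within each year."""
    # Assumption: years with 53 weeks are truncated to the shortest series.
    n_weeks = min(len(series) for series in rsv_by_year.values())
    return [
        mean(series[w] for series in rsv_by_year.values())
        for w in range(n_weeks)
    ]


# Toy example with two short "years"; real input would hold 52 or 53 weekly values.
print(weekly_ensemble_mean({2011: [0, 50, 100], 2012: [20, 100, 60, 10]}))
```

Since each yearly series peaks at 100, the ensemble mean only approaches 100 when the peak falls in the same week every year; a flat ensemble indicates large interannual scatter in peak timing.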

Search Results
Supplementary Table 1 shows a list of queries for the examined keywords obtained with the search topic option. Names of the berry in question, in the language of the corresponding country, were found in the most popular queries. An exception was found in the results for raspberry, in which the English word "raspberry" and "raspberry pi" (presumably the trade name of the computer released in 2012) scored higher than the Russian term " (raspberry)" (the third most frequent query, with a score of 14). These queries were not excluded, even by the search topic option. Generally, other irrelevant queries found with the search term option (Supplementary Table 2) were excluded in Supplementary Table 1; for example, the name of a company like " (Lingonberry Ekaterinburg)" for lingonberry and " (Bank Cranberry)" for cranberry, and literature such as " (Gooseberry Chekhov)" for gooseberry. The RSV based on search term exhibited a larger scatter than that based on the search topic option, particularly during periods of relatively small RSV, probably because searches for these queries did not refer to berries as plants of the season (Figure 1).
Supplementary Table 3 shows a list of topics linked to posted queries for each keyword. These keywords were frequently searched for in the context of other kinds of berries. For example, queries for cranberry, European blueberry, and cloudberry were searched for in the common topic "lingonberry." Various topics indicated processed products such as "Pie," "Sauce," "Mors (boiled berry drink in Russian)," and "Varenye (berry preserve or jam in Russian)" in queries for most keywords, except for raspberry and sea buckthorn. "Winter" and "Sugar" might have reflected preservation for the winter season (Stryamets et al., 2015). We found unique topics for specific berries: "Leaf" was the most frequent topic for lingonberry, which may indicate the popularity of using the leaves as pharmaceuticals (Shikov et al., 2017; Raudone et al., 2019). Several topics relating to horticulture, such as "cultivar," "pruning," "reproduction," and "transplants," may connect to searches of honeysuckle. Some berry plants are popularly cultivated in private gardens or farms (Stammler and Sidorova, 2015; Rusanov, 2019). In contrast, searches for sea buckthorn concentrated on topics related to its oil, which has been used in traditional botanical medicine (e.g., Bal et al., 2011; Olisova et al., 2018). Searches relevant to a computer ("raspberry pi") remained, even in search topic limited to "Plants."
[Table 1 notes, partially recovered] (Hummer et al., 2006; Russia, Sakhalin); c (Hjalmarsson and Ortiz, 2001; Scandinavia); d (Li et al., 2016; Canada, Labrador); e (Barney, 1997). *Based on Google Trends data from 2011 to 2020.
Frontiers in Forests and Global Change | www.frontiersin.org

Seasonal Variation in Relative Search Volume in Russia
There was similar seasonal variation in RSV with different time scale settings. Figure 2 shows the lingonberry RSV across Russia for 3 years in daily, weekly, and monthly time steps. These were obtained from time settings of May-December (daily) and January-December (weekly) in each year, and through the 10 years from 2011 to 2020 (monthly). The weekly RSV variation maintained the general features of daily variation. Because monthly RSV was expressed relative to the maximum over the 10 years, the seasonal change in monthly RSV was small in years when RSV was relatively low (as in 2011). Weekly and monthly results in the case of Russia are presented in Figures 1, 3, respectively. Seasonal peaks in monthly RSV generally appeared in the same month in all years. Positive trends (derived from seasonal and trend decomposition; Cleveland et al., 1990) arose in most examined terms. There was a marked maximum RSV in 2020 for some results. In the detrended RSVs, the largest annual amplitude still occurred in 2020 for some of them (Supplementary Figure 1). A seasonal cycle with a maximum in summer to autumn was a common feature for each species (Figure 4). After the seasonal peak of lingonberry, cranberry, and sea buckthorn, RSV gradually decreased until a rise occurred in the next season. The other species, mainly European blueberry, currant, gooseberry, and honeysuckle, showed relatively distinctive seasonal peaks with small and stable RSV during the other period (i.e., out of the growing season). In contrast, a relatively large variation in RSV throughout a year (except for the peak season) in the first few years of the observation period was remarkable, particularly in raspberry, cloudberry, and some other berries (e.g., cranberry, bog bilberry, and sea buckthorn) (Figure 1). Around May (before the increase to the seasonal peak), a slight increase in RSV was found for bog bilberry, currant, and gooseberry, despite a large scatter among years.
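To make the idea of separating a long-term trend from the seasonal cycle concrete, the following Python sketch uses a classical centered moving average. This is a deliberately simpler stand-in, not the loess-based seasonal-trend decomposition of Cleveland et al. (1990) cited above; function names are hypothetical and an even period (12 months) is assumed.

```python
from typing import List, Optional


def centered_trend(monthly: List[float], period: int = 12) -> List[Optional[float]]:
    """Centered moving average over one full (even) period; None where undefined."""
    half = period // 2
    trend: List[Optional[float]] = [None] * len(monthly)
    for i in range(half, len(monthly) - half):
        window = monthly[i - half : i + half + 1]
        # The two end points get half weight, so every calendar month counts once.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    return trend


def detrend(monthly: List[float], period: int = 12) -> List[Optional[float]]:
    """Subtract the moving-average trend, leaving seasonal plus irregular parts."""
    trend = centered_trend(monthly, period)
    return [None if t is None else x - t for x, t in zip(monthly, trend)]


# Synthetic monthly series: linear upward trend plus a zero-mean seasonal cycle.
season = [0, 0, 0, 0, 0, 10, 30, 10, 0, 0, 0, -50]
series = [i + season[i % 12] for i in range(36)]
print(centered_trend(series)[6], detrend(series)[18])
```

In the synthetic example the moving average recovers the linear trend and the detrended value in the peak month equals the seasonal amplitude, which is the property needed to compare annual amplitudes between years with different baseline RSV.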
These trends probably relate to berry cultivation, found in related topics (i.e., cultivar, pruning, and landing) in spring months (Supplementary Table 4).
A distinct period of seasonal maximum (RSV = 100) was found for each berry (Figure 5). The middle date of the week with maximum RSV (hereafter, maximum RSV date) was late June for honeysuckle (166.5, 6.6; 10 years mean and standard deviation of day of year); around July for raspberry (191.5, 46.9), cloudberry (198.0, 9.
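The maximum RSV date defined above (the middle date of the week with the highest RSV, expressed as day of year) and its multi-year mean and standard deviation can be sketched as follows. Function names are hypothetical; ties are resolved by taking the first week that reaches the maximum, and a 7-day week labeled by its first day is assumed.

```python
from datetime import date, timedelta
from statistics import mean, stdev
from typing import List, Tuple


def max_rsv_doy(week_starts: List[date], rsv: List[float]) -> float:
    """Day of year of the middle day of the week with the highest RSV."""
    # max() returns the first index at which the maximum occurs.
    i = max(range(len(rsv)), key=lambda k: rsv[k])
    mid = week_starts[i] + timedelta(days=3)  # middle day of a 7-day week
    return float(mid.timetuple().tm_yday)


def peak_summary(doys: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of yearly maximum RSV dates."""
    return mean(doys), stdev(doys)


# Toy example: four weeks starting 2 June 2019; the peak is reached in week 2.
starts = [date(2019, 6, 2) + timedelta(weeks=k) for k in range(4)]
print(max_rsv_doy(starts, [20, 100, 60, 100]))
print(peak_summary([160.0, 170.0]))
```

A large standard deviation across years, as reported for raspberry, suggests that the yearly maximum is often set by searches unrelated to ripening rather than by a shifting harvest date.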

Regional Differences (Russia, Finland, and Canada)
The annual patterns of 10-year mean RSV were generally similar in summer (Figure 6), and seasonal peaks (maximum RSV dates) overlapped among the three regions (Figure 5). The range of 10-year variation in the seasonal peak was extensive for some species, such as honeysuckle, raspberry, and sea buckthorn in Finland, and honeysuckle, sea buckthorn, and lingonberry in Canada. This resulted in an unclear seasonal pattern in the 10-year mean (Figure 6). For cloudberry, raspberry, and sea buckthorn, topics irrelevant to plants (i.e., lacquer, raspberry pi, and buckthorn oil; Supplementary Table 3) produced a scattered background.
In the case of cranberry in Canada, two distinct seasonal peaks (the largest in the third week of December and the second in the first or second week of October; Figure 6) corresponded to Christmas and Thanksgiving, respectively. There was a seasonal peak in December in Finland, but a gradual second peak may have overlapped the harvest season, as in Russia. Related topics in each month supported contrasting search tendencies in Russia and Canada (Supplementary Table 4). While "Mors (berry drink in Russian)" was regularly a top topic throughout the year in Russia, there was a seasonal change in popular topics in Canada; that is, "Juice" during March to September, and "Recipe" or "Sauce" (probably relating to cooking) in the other months (Supplementary Table 4).

Seasonal Human Interest in Berries Detected by GT
A seasonal pattern of human interest in the topic of berry species was detected from web search data via annual (i.e., weekly resolution) GT analysis. Among the 10 species tested, a remarkable rise in RSV during the berry harvest season (ripening season) was detectable for several species. Another noteworthy pattern was found when cultural customs were seen to affect RSV (i.e., cranberry during the Christmas period in Canada). Most species in the study area exhibited a seasonal peak in RSV with a steep increase, whereas differences among species and countries appeared in the period when RSV decreased and in winter. There was interest in some berries after the harvest season but no interest in other species, reflecting the different uses of each berry (they may be a seasonal delicacy or useful over winter). Interest in the seasonality of plants generally tended to be more common in residents of high latitudes (Proulx et al., 2014; Mittermeier et al., 2019), which may support the clear seasonality projected in the berry searches in our study area. Our results present evidence of seasonality in human interest, through the internet search volume, with fundamental socio-cultural aspects superimposed on this. Consequently, searches for terms directly mentioning the use of a plant (e.g., food product, berry harvest) can reflect the seasonality of the berries. The motivations and context of the searches for "berries" were diverse. The topics popularly related to berries depended on species and region; some commonly indicated products (either purchased or homemade) and cultivation (either household or commercial).
[Figure caption fragment] (1) Honeysuckle, (2) cloudberry, (3) raspberry, (4) currant, (5) (European) blueberry, (6) gooseberry, (7) (bog) bilberry, (8) sea buckthorn, (9) lingonberry, and (10) cranberry are presented in order of increasing RSV date in Russia.
The multiple contexts of a search for "berries" meant that simple interpretation of the results, as validating a single piece of evidence, was far from straightforward. This contrasted with previous studies of the pollen release season, which showed that web searches for major allergenic pollen species (Bousquet et al., 2018) and the general term "pollen" (Hall et al., 2020), which are mainly driven by concerns about health risk, coincided with seasonal changes in pollen concentrations. Such a keyword with relatively clear associations with specific phenomena enables temporal and spatial variation in RSV to be validated with biological observation (Proulx et al., 2014). However, the time series or statistical data usable as a validation source for the seasonal pattern of human interest in berry species are only indirect explanatory factors: for example, supply in the agricultural markets and retail stores, routine observation at botanical gardens or institutes (e.g., Roslin et al., 2021), and climate conditions affecting berry growth (e.g., Krebs et al., 2009; Nestby et al., 2019; Tahvanainen et al., 2019) in the target area. Event dates that cue people to make an internet search for berries (e.g., festivals featuring specific berries and announcement of wild berry harvesting periods from local government in Russia) are considered a regionally specific condition. Collections of field survey information on people's behavior, such as local customs and knowledge of where and when berries are available (e.g., Everett, 2007; Ksenofontov et al., 2018), can be sources of validation.
We carried out a national-scale analysis for the selected three countries. Previous studies have revealed the reliability of spatial analysis at the sub-national scale, such as by state (the United States; Proulx et al., 2014), by province (France; Bousquet et al., 2019), and in the US Designated Market Area (Hall et al., 2020). Thus, the value of GT data as geographical data has been suggested (Proulx et al., 2014). In Russia and Finland, however, typical seasonal patterns in RSV found in the national-level analysis were not sufficiently detected at a sub-national scale, except for relatively highly populated districts (e.g., Moscow). A less-populated district was even unable to provide a continuous time series, probably because of low search volume beneath a threshold. The search population in a limited area and period, with a bias in individual attributes (e.g., place of residence, age, and occupation), and the motivation for and context of searches, introduced uncertainty into the RSV (Proulx et al., 2014; Toivonen et al., 2019). The share of users who rely on other search engines also affects population bias (Dergiades et al., 2018). In addition, spatial heterogeneity inside the target area can result in unclear trends. For instance, the Sakha Republic (the largest subdivision of Russia, covering 3 million km² from the southern forest region to the coast of the Arctic Ocean) has 30% of its population concentrated in the capital city in the forest area. Diversity in climate and ecoregion inside the examined area can be reflected in people's behavior and lead to mixed results. Successful spatial analysis in previous studies in the United States and Europe may partly have resulted from a sufficiently large population and a certain level of geographical uniformity at the sub-national scale.
Although this study focused on the shape of the seasonal cycle and its maximum, the interannual variation in RSV throughout the study period (Figure 3) provided decadal-scale changes in human interest. Multiple backgrounds, including food demand, cultivation, and picking activity (as both livelihoods and recreation), can affect public concern about specific plants, particularly edible/usable ones such as berry plants. Learning how these search motivations varied at a decadal scale is an essential perspective in understanding human perception of and response to the seasonality of berry plants. Previous studies, however, raised questions about uncertainty embedded in the long-term trend in GT data due to changing search populations since 2004 (Ficetola, 2013; Nghiem et al., 2016; Bokelmann and Lessmann, 2019; Supplementary Text). In contrast to previous studies in various fields such as tourism (Bokelmann and Lessmann, 2019), conservation, and environmental issues (Ficetola, 2013; Nghiem et al., 2016), our 10-year RSV time series did not present a downward trend, and we suspect this was not affected by changes in absolute search volume, irrespective of correction (Supplementary Text and Supplementary Figure 3). The internet prevalence rate in the Russian Federation has almost doubled in the last decade: in 2011 and 2019 the percentage of subscribers to internet access via fixed broadband was 12.2 and 22.2%, and access via mobile devices was 47.8 and 96.4%, respectively (Federal State Statistics Service, 2020). We may attribute the positive trend in some berries (Figure 3), therefore, to increased fruit consumption in the Russian Federation, reflecting health and food choices, despite declining disposable income (USDA, 2020).

Application to Phenology Study
As mentioned in 3.2, some berry species exhibited a remarkable peak in RSV during the berry harvest season, although others were strongly affected by abiological factors. These peaks generally corresponded to the period of ripening, although the nation-scale analysis did mean there was some uncertainty (Figure 5). It was expected, for example, that interannual variation in the seasonal peak of human interest could be evaluated with the weekly time resolution of GT analysis, as conducted in this study. To apply GT data to phenology studies, some of the remaining (and critical) limiting factors are considered below.
Uncertainty from multiple meanings irrelevant to the research objective, which affects results such as the detected growing (ripening) season of berries, can be overcome if (1) appropriate terms and topics are assigned and (2) the period being analyzed is set in a way that eliminates searches irrelevant to plant dynamics (e.g., avoiding the dormant season). These two recommendations will need to be based on knowledge of phenological events, such as the typical length of the growing season, and of specific regional social/cultural events. To address the limited spatial resolution of GT analysis (including in less-populated regions), integrating various data sources will help reveal the spatial and temporal dynamics of phenological phenomena. Photographs and videos provide, sometimes without the intention of users, the in situ reality of species habitats (ElQadi et al., 2017), phenology (Silva et al., 2018), and ecosystem services such as recreational and aesthetic landscapes (Richards and Friess, 2015; Vaz et al., 2019). Posted texts in social media also provide location-specific events about vegetation phenology (Silva et al., 2018). GT data time series compensate for the deficiency in temporal continuity and regional representativeness of these spot data, and can be applied to mathematical modeling as an alternative to traditional ecological data (Capinha, 2019).
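Recommendation (2) can be illustrated by restricting the time series to a seasonal window before locating the peak. The sketch below is only an illustration: the function name is hypothetical, and the default window of 1 May to 31 October is an assumed example that would need to be set from regional knowledge of the growing season.

```python
from datetime import date
from typing import List, Tuple


def within_season(
    samples: List[Tuple[date, float]],
    start_md: Tuple[int, int] = (5, 1),    # assumed window start: 1 May
    end_md: Tuple[int, int] = (10, 31),    # assumed window end: 31 October
) -> List[Tuple[date, float]]:
    """Keep only (date, RSV) pairs whose month and day fall inside the window."""
    return [(d, v) for d, v in samples if start_md <= (d.month, d.day) <= end_md]


# Toy weekly values in which a December holiday peak exceeds the autumn peak.
weekly = [
    (date(2019, 7, 14), 40.0),
    (date(2019, 10, 6), 80.0),
    (date(2019, 12, 22), 100.0),
]
season_only = within_season(weekly)
peak_date, peak_value = max(season_only, key=lambda item: item[1])
print(peak_date, peak_value)
```

With the window applied, the December maximum is ignored and the autumn peak is selected, which mirrors the cranberry case in Canada where a holiday-driven peak would otherwise mask any harvest-related signal.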
Use of these social sensing data in satellite remote sensing analysis is an expected application of the technology. Recently developed satellite observations with spatially high-resolution sensors (e.g., Sentinel-2) produce precise mapping of phenological timing, including autumn coloring of (mainly) crown leaves. However, detecting floor vegetation dynamics remains to be addressed, even in a boreal ecosystem with a relatively sparse canopy structure. As berry plants on the forest floor experience climate change just as the overstory species do (Selås et al., 2015), the harvest season of forest berries can be affected by canopy-forming plants, which are directly detected by remote sensing observation. Linking the characteristics of overstory and floor vegetation enables the extraction of floor vegetation information from remote sensing observation. To achieve this aim, information based on social sensing data is essential, as well as on the relationship between floor vegetation and overstory canopy structure (Miller et al., 1997; Barbier et al., 2008; Majasalmi and Rautiainen, 2020), and spectral characteristics of several berries on the forest floor (Rautiainen et al., 2011; Forsström et al., 2019).

CONCLUSION
We explored the possibility of using internet search data to detect human interest in berry species in high-latitude countries, especially in Russia. There is a typical and culture-specific pattern of seasonal variation in search volume on berries across Russia, Finland, and Canada. Although it is important to recognize the limitations inherent in GT data (such as validity in decadal-scale trends, the bias of the search population, and multiple meanings of a search term), the seasonal search peak corresponds to the common berry harvest season across these countries. If the characteristics of the search population (e.g., residence area, age structure, and occupation) and basic geographical and ecological information about the target area are available in advance, it will be possible to interpret the seasonal patterns. Combining this information with other social sensing data and satellite remote sensing observation with high spatial resolution is expected to improve the spatial evaluation of berry plant dynamics, especially in sparsely populated areas.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
AK, NS, and ST contributed to the conception and design of the study. AK, TG, and AM organized the database. AK and NS wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.
[Supplementary figure caption fragment] Black lines indicate RSV without correction (same as Figure 3), and blue (computer), cyan (software), green (life), and red (love) lines are RSVs with benchmark correction.
Supplementary Table 1 | Related queries for given keyword with the limited topic "Plants" (search topics option) (score > 20).
Supplementary Table 3 | Related topics for given keyword with the limited topic "Plants" (search topics option) (score > 20).
Supplementary Table 4 | Monthly based related topics for given keyword with the limited topic "Plants" (search topics option). Numbers of the corresponding months are noted in parentheses.