Ten Years of Landscape Genomics: Challenges and Opportunities

Landscape genomics is a relatively new discipline that aims to reveal the relationship between adaptive genetic imprints in genomes and environmental heterogeneity among natural populations. Although the interest in landscape genomics has increased since this term was coined, studies on this topic remain scarce. Landscape genomics has become a powerful method to scan and determine the genes responsible for the complex adaptive evolution of species at population (mostly) and individual (more rarely) level. This review outlines the sampling strategies, molecular marker types and research categories in 37 articles published during the first 10 years of this field (i.e., 2007–2016). We also address major challenges and future directions for landscape genomics. This review aims to promote interest in conducting additional studies in landscape genomics.


INTRODUCTION
Rapid global climate change is an important factor that affects biodiversity (Hoffmann and Sgrò, 2011). Adjusting their distribution range or local adaptation is the usual coping strategy of species toward rapid climate change (Aitken et al., 2008). Local adaptation requires the species to face long-term spatial environmental heterogeneity and eventually leads to adaptive differentiation of phenotypes. These changes might be due to phenotypic plasticity or heritable phenotypic variation. Exploring the adaptive evolution of species in response to spatial environmental heterogeneity will be useful in understanding initial adaptive divergence and evolutionary potential of a target species (Pluess et al., 2016). Landscape genomics is a powerful research field for investigating the adaptive evolution of species in response to spatial environmental heterogeneity (Vincent et al., 2013). Joost et al. (2007) proposed landscape genomics as a relatively new discipline that aims to reveal the relationship between the adaptive genetic imprints in genomes and the environmental heterogeneity. Different from landscape genetics, landscape genomics requires a sufficient number of molecular markers to cover the entire genome. Emphasis is placed on adaptive evolution at the genome level . Landscape genetics, however, is biased toward using a relatively small number of molecular markers to reveal the relationship between environmental factors and the spatial genetic structure of populations (Dionne et al., 2008;Poelchau and Hamrick, 2012;Manel and Holderegger, 2013). Landscape genomic studies on many plant and animal species have been recently conducted (Berg et al., 2015;Manthey and Moyle, 2015;Leamy et al., 2016;Vangestel et al., 2016). These studies have achieved considerable progress on understanding of the relative roles of adaptive and non-adaptive processes in shaping patterns of genomic variation and the effects of environmental variables on adaptive differentiation at the genomic level. Although landscape genomics has been pursued for a decade, the studies, basic theoretical frameworks, and universal hypotheses in this field are still scarce. Thus, additional landscape genomic studies are needed to assist the construction of basic theoretical frameworks and formulation of universal hypotheses. This review summarizes the progress of landscape genomic studies such as conceptual and methodological developments as well as applied contributions during the previous decade. We then outline expected future directions in the field and encourage researchers to participate in this field.

METHODS
By searching the theme "landscape genomics" in the database of web of science 1 , and further looking through the related papers carefully, 37 articles focused on adaptive genetic imprints in genomes driven by environmental factors were finally selected. Supplementary Table S1 lists the molecular markers, sampling strategies, statistical methods, and research categories addressed in these articles.

SAMPLING STRATEGIES IN LANDSCAPE GENOMICS
Sampling strategies in landscape genomics are divided into two major categories: random and stratified. The random sampling design includes scattered and clustered sampling. In the scattered sampling design, samples are randomly collected from across the species distribution range, while in the clustered sampling design, populations are divided into clusters according to environmental or genetic factors and samples are randomly taken from each cluster. A stratified sampling design can be performed to capture the range of variability across landscape variable(s) of interest. Thus, this sampling design requires a large amount of biological and environmental information of a target species. The optimal sampling scheme will be obtained by model calculation (Manel et al., 2012). The two sampling strategies mentioned above can be implemented at the individual or population level. The advantage of population sampling is more conducive to detect variation in gene frequency among populations than individual sampling. The most controversial topic in population sampling is multiple samples in fewer populations versus fewer samples in multiple populations. The former strategy is more representative in landscape genomic studies, but its accuracy in estimating genetic parameters is often questioned. In population genetics studies, the minimum sample size of a population should not be less than 20 individuals, 25-30 individuals are considered to be more reasonable (Hale et al., 2012). Therefore, it is necessary to ensure a minimum population sample size in the landscape genomic studies. Compared to population-based sampling, the application of individual sampling in landscape genomic studies is relatively scarce, but nonetheless suitable for clinal populations or those with unclear population structure (Jones et al., 2013).

MOLECULAR MARKERS IN LANDSCAPE GENOMICS
Landscape genomic studies require molecular markers that are sufficiently spread throughout the genome (Balkenhol et al., 2009). However, most non-model species do not have established genomic information to appropriately place sufficient markers across the genome. Therefore, two characteristics, i.e., no requirement for a priori genome knowledge and a high covering density in genomes, are indispensable for the use of these molecular markers in landscape genomics .
Two types of molecular markers are suitable for landscape genomic studies. Type-I markers have no DNA sequence information, such as amplified fragment length polymorphisms, inter-simple sequence repeats, and start codon targeted polymorphisms. Type-II markers contain DNA sequence information, such as single-nucleotide polymorphisms (SNPs). Type-I markers require low generation cost but have few defects. Although these type-I markers may allow detecting loci potentially responsible for adaptation using outlier locus detection and environmental association analysis (EAA), the gene function of those loci cannot be easily validated, and thus might be false-positives. Type-II markers usually display high scanning density but have higher generation cost than type-I markers. However, type-II markers exhibit several advantages because these markers contain DNA sequence information. This information can help us annotate and map these markers on the genome. Based on the landscape genomic studies that we have selected (see Supplementary Table S1), SNP genotyping was mainly achieved through DNA microarrays. However, the use of DNA microarrays requires a large amount of prior gene information (Teng and Xiao, 2009). The recently developed reduced-representation genome sequencing (RRGS) is based on next-generation sequencing (NGS), which includes genotyping by sequencing (Elshire et al., 2011), restricted site associated DNA (Miller et al., 2007), and specific-locus amplified fragment sequencing (Sun et al., 2013). RRGS reduces the cost of sequencing, maintains high coverage of the genome, and does not require a priori genomic information. Thus, the use of RRGS is beneficial in landscape genomic research (Brauer et al., 2016). Most of the RRGS methods are currently based on Illumina sequencing platforms, which have an advantage of high accuracy and throughput and a disadvantage of short reading lengths. Third generation sequencing (TGS), such as MinION device by Oxford Nanopores and PacBio Sequel by Pacific BioSciences, has been recently developed to compensate for the short reading length of NGS. Although TGS maintains the speed and flux advantages of NGS, this method still exhibits some problems, such as high cost and error rate, which must be addressed (Mikheyev and Tin, 2014). In summary, the use of type-II markers to conduct landscape genomic studies can facilitate the indirect validation of loci potentially responsible for adaptation.

MAJOR RESEARCH CATEGORIES IN LANDSCAPE GENOMICS
There are a wide variety of questions in ecology and evolution that can be addressed using a landscape genomic approach. We group these questions under two major research categories: (1) quantifying influence of spatial environmental variables on genomic divergence; (2) uncovering the environmental factors that shape adaptive genetic variation and the genetic basis of adaptive change.

QUANTIFYING INFLUENCE OF SPATIAL ENVIRONMENTAL VARIABLES ON GENOMIC DIVERGENCE
Isolation by distance (IBD) describes the local accumulation of genetic differences when dispersal between populations is geographically restricted (Slatkin, 1993). For IBD, gene flow path is assumed to be in a linear geographic distance. However, in natural landscapes, the paths of gene flow between populations are often non-linear and complex. In fact, populations that have identical habitats or small distances may also diverge when intervening landscape features inhibit dispersal between them (Isolation by Resistance, IBR;McRae, 2006;Ruiz-Gonzalez et al., 2015). Nevertheless, local genetic adaptation can also reduce gene flow among natural populations. This adaptive reduction in the effective rate of gene flow can contribute to a pattern of "isolation by environment" (IBE; Wang and Summers, 2010;Wang and Bradburd, 2014;Mosca et al., 2016). A strong pattern of IBE indicates that divergent selection is maintaining population differentiation in the face of possible dispersal (Schluter, 1998;Kawecki and Ebert, 2004). IBE can also arise from divergent habitat choice or other forms of biased dispersal (Armsworth and Roughgarden, 2008;Bolnick and Otto, 2013). Therefore, these different processes affect the spatial distribution of genetic variation and landscape genetic structure. Recently developed analytical methods can partition the often confounded patterns of IBD, IBE, and IBR when explaining genetic divergence across a landscape (Supplementary Table S1). The basic strategy is to use IBD as a null hypothesis against which IBE (or IBR) can be tested. Partial Mantel tests have been widely used in landscape genomics studies to evaluate the relative influence of different ecological and evolutionary factors on genetic differentiation. However, such tests have low statistical power and are prone to false positives (Guillot and Rousset, 2013). Recently, structural equation modelling (SEM)  and multiple matrix regression with randomization (MMRR) (Wang, 2013) have been used to quantitatively compare how much genetic divergence depends on IBD versus IBE (Zhang et al., 2016). In addition, the BEDASSLE package (Bradburd et al., 2013) is also used to estimate the relative contributions of IBD and IBE to genetic differentiation This Bayesian method models the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance (Bradburd et al., 2013). In landscape genomic studies, multivariate statistical models are more appropriate when multidimensional niches are analyzed to identify ecological drivers of population genetic variation (Orsini et al., 2013). Redundancy analysis (RDA) (Legendre and Legendre, 2012) and canonical correlation analysis (CCA) (Parisod and Christin, 2008;Hecht et al., 2015) are commonly used to estimate the relative contribution of spatial and environmental variables. The CCA method can control for demographic effects if spatial autocorrelation is included in the model design, while RDA and partial RDA analyses are alternative and robust approaches that can control for spatial effects while analyzing others (Sork et al., 2013).

UNCOVERING THE ENVIRONMENTAL FACTORS THAT SHAPE ADAPTIVE GENETIC VARIATION AND THE GENETIC BASIS OF ADAPTIVE CHANGE
Polymorphic sites across species genomes will establish their adaptive differentiation to acclimatize to the heterogeneous environment. Landscape genomics attempts to detect these adaptive loci under selection and reveal potential environmental drivers of selection by using correlative methods. The detection of loci responsible for adaptation usually involves two steps. One is to detect the outlier loci; and the other is to associate the outlier loci with environment variables, referred to as EAA. The commonly used methods for detecting the outlier loci are ARLEQUIN (Excoffier et al., 2009), BAYESCAN (Foll and Gaggiotti, 2008), FLK (Bonhomme et al., 2010), and spatial ancestry analysis (SPA) (Yang et al., 2012). ARLEQUIN is applied to simulate a null distribution of F ST values under a hierarchical island model, which is insensitive to the hierarchically subdivided population samples or those with a recently shared history. BAYESCAN is an F ST -based model to identify outlier loci according to Bayesian posterior probability. FLK deals with variation in effective population size and historical branching of populations by incorporating a population kinship matrix into the Lewontin and Krakauer (LK) statistic (Lewontin and Krakauer, 1973). SPA is a probabilistic model for the spatial structure of genetic variation that is used to identify loci showing extreme patterns of spatial differentiation. Compared with the two F ST -based approaches, SPA is particularly sensitive to strong spatial patterns in allele frequency and works at the individual level rather than at the population level. These methods are usually combined to distinguish the selected loci from the neutral loci and thus effectively reduce the false-positive rate . EAA, followed by outlier analysis, will be conducted to test whether these outlier loci are associated with particular environmental factors and under adaptive evolution.
The methods for conducting EAA can be divided into five broadly defined categories, including categorical tests, logistic regressions, matrix correlations, general linear models, and mixed effects models (Rellstab et al., 2015). A first category contains categorical tests, which compares allele frequencies of individuals or populations from different types of environments. The different types of environment are introduced as categorical variables in parametric or nonparametric tests. A second category comprises the statistical methods of logistic regressions, such as SAM, Samβada. The spatial analysis method (SAM; Joost et al., 2007) is the first implementation of logistic regression in EAA. SAM can compute multiple simultaneous univariate logistic regressions to test for association between allelic frequencies and environmental variables. However, this approach ignores neutral genetic structure, possibly leading to high false-positive rates under various demographic scenarios (De Mita et al., 2013;Frichot et al., 2013). Nevertheless, an extended version of SAM, Samβada (Joost et al., 2007) improves the performance of this method by adding neutral genetic structure as an additional factor (Rellstab et al., 2015). A third category contains a linear approach, matrix correlations, in which the effects of environmental factors and neutral genetic structure on allele frequencies are simultaneously estimated. The most widely used methods include a simple Mantel test and the partial Mantel test (Mantel, 1967). However, variations of the (partial) Mantel test may circumvent certain bias and autocorrelation problems (Legendre, 1993;Legendre et al., 2002). A fourth important category of statistical methods is general linear models in which a response variable is modeled as a linear function of some set of explanatory variables. The general linear model framework can be extended to models with multivariate response variables to account for the polygenic architecture of adaptive traits (Rellstab et al., 2015). The statistical methods include multiple linear regressions and univariate general linear models (Carl and Kuhn, 2007;Eckert et al., 2009) and canonical correlations and multivariate linear regressions, e.g., CCA (ter Braak and Smilauer, 2002;Legendre and Legendre, 2012) and RDA (Legendre and Legendre, 2012;Hecht et al., 2015). A fifth important category of statistical methods comprises the mixed effects models, such as BAYENV (Coop et al., 2010), LFMMs (Frichot et al., 2013), TASSEL (Bradbury et al., 2007), and EMMA (Kang et al., 2008). These approaches provide a unified statistical framework for controlling for the effects of neutral genetic structure (Rellstab et al., 2015). For example, BAYENV, based on a Bayesian generalized linear mixed model, is applied to test the correlation between allelic frequencies and environmental variables after correcting for population structure and size (Günther and Coop, 2013). Latent factor mixed models (LFMMs) implemented fast algorithms using a hierarchical Bayesian mixed model based on a variant of principal component analysis (PCA), in which the residual population structure is introduced via unobserved or latent factors (Frichot et al., 2013;Caye et al., 2016). In addition, a linear mixed-model method implemented in TASSEL (Bradbury et al., 2007) is used to identify candidate loci responsible for adaptation according to the association between the genotypes and climate variables (Yoder et al., 2014). Based on linear mixed models, Kang et al. (2008) developed an efficient mixed-model association (EMMA) method. As previously mentioned, in order to reduce the false-positive rate, it is desirable to combine more than two statistical methods to identify the environment-associated loci .

MAJOR CHALLENGES
Although great progress in landscape genomics has been achieved in the past decade, two major challenges remain to be solved in the future. One is the presence of false positives, which have been a major problem in landscape genomics because of the lack of validation for adaptive loci. Three solutions will help solve this major challenge. First, robust detection methods must be developed, and multiple detection methods must be used to reduce the false-positive rates. Second, type-II markers that contain DNA sequence information must be selected. Although type-I markers may allow detecting loci potentially responsible for adaptation, the gene function of these detected loci are difficult to be validated. Type-II markers have DNA sequence information, which can be indirectly validated through the annotation of gene function. Third, a part of the loci responsible for adaptation must be validated using gene transfer and gene knockout technologies. Since most of previous landscape genomics studies have focused on non-model species, the detected loci responsible for adaptation do not have functional verification. Thus, in future, more experiments are needed to validate the function and adaptive generality of the detecting loci responsible for adaptation. In addition, most previous studies have showed great concern on gene differentiation rather than phenotypic differentiation (Manthey and Moyle, 2015;Di Pierro et al., 2016). The acquisition of adaptive phenotypic data has been conducted in a few recent landscape genomic studies (De Kort et al., 2014;Roschanski et al., 2016). Thus, obtaining the phenotypic data through common garden experiments and reciprocal transplant experiments should be considered in future.

RECOMMENDATIONS FOR FUTURE RESEARCH
The present landscape genomics mainly addresses two issues, i.e., influence of spatial environmental variables on genomic divergence and effects of the environmental factors on adaptive genetic variation. The following concerns need to be addressed in landscape genomic studies. (1) Previous studies have determined the specific genes that undergo adaptive changes and the environmental factors that contribute to these changes. However, the specific reason why these particular genes or environmental variables exhibit these functions remains unknown. (2) Type-II markers can help us reveal these specific genes. However, the metabolic pathways of the involved genes and the adaptive phenotypes controlled by these genes need to be identified. (3) Regional species in extreme environments usually establish some convergent adaptive changes in their genes or phenotypes. However, most regional species not living in extreme environments have various adaptive differentiations. Thus, the commonalities behind these diverse adaptive differentiations must be determined. (4) The distribution range of species and their ability to respond to climate change largely depend on their landscape adaptability, which is usually determined by the potential adaptive differentiation of the genome and the gene dispersal ability of the species. Thus, a landscape adaptation index must be established to measure the adaptability of species. In summary, landscape genomics is an efficient method to study the adaptive evolution of species. We hope that this review of studies on landscape genomics over the past 10 years will assist in promoting future research in this field.

AUTHOR CONTRIBUTIONS
X-XZ and R-LM wrote the original draft of article; JY, C-YM, and ZL revised this article; Y-XQ and YL conceived the ideas and contributed to substantial revisions; all authors read and approved the final version of the manuscript.