ORIGINAL RESEARCH article
Sec. Evolutionary and Population Genetics
Volume 13 - 2022 | https://doi.org/10.3389/fgene.2022.1100440
The power of geohistorical boundaries for modeling the genetic background of human populations: The case of the rural catalan Pyrenees
- 1Department of Basic Medical Sciences, University of Lleida, Lleida, Spain
- 2Institute of Biomedical Research of Lleida (IRBLleida), Lleida, Spain
- 3CNAG-CRG, Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- 4Department of Geography and Sociology, University of Lleida. Pl. Víctor Siurana, Lleida, Spain
- 5Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute for Science and Technology, Barcelona, Spain
- 6Hospital Sant Bernabé, Berga, Spain
- 7Fundació Sant Hospital La Seu d’Urgell, La Seu d'Urgell, Spain
- 8Hospital d’Olot i Comarcal de la Garrotxa, Olot, Girona, Spain
- 9Hospital de Campdevànol, Campdevànol, Girona, Spain
- 10Servei d’atenció primària Lleida Nord, Gerència Territorial Alt Pirineu i Aran, Institut Català de la Salut, Tremp, Spain
- 11Laboratori d’Anàlisis Clíniques, Hospital Comarcal del Pallars, Tremp, Spain
- 12Department of Evolutionary Biology, Ecology and Environmental Sciences, Faculty of Biology, Biodiversity Research Institute, University of Barcelona, Barcelona, Spain
The genetic variation of the European population at a macro-geographic scale follows genetic gradients which reflect main migration events. However, less is known about factors affecting mating patterns at a micro-geographic scale. In this study we have analyzed 726,718 autosomal single nucleotide variants in 435 individuals from the catalan Pyrenees covering around 200 km of a vast and abrupt region in the north of the Iberian Peninsula, for which we have information about the geographic origin of all grand-parents and parents. At a macro-geographic scale, our analyses recapitulate the genetic gradient observed in Spain. However, we also identified the presence of micro-population substructure among the sampled individuals. Such micro-population substructure does not correlate with geographic barriers such as the expected by the orography of the considered region, but by the bishoprics present in the covered geographic area. These results support that, on top of main human migrations, long ongoing socio-cultural factors have also shaped the genetic diversity observed at rural populations.
The current genetic variation of European human populations at a macro-geographic scale is mostly driven by isolation by distance patterns (Lao et al., 2008) and relatively modern massive migrations (Hellenthal et al., 2014). These results have been also observed at a country-based geographic scale (Lao et al., 2013; Leslie et al., 2015; Bycroft et al., 2019; Raveane et al., 2019). Nevertheless, despite this general trend, some populations appear as genetic outliers, mostly related to geographically isolated rural and religious areas (Xue et al., 2017), as well as regions showing peculiar orographic events, such as islands (Biagini et al., 2019; Font-Porterias et al., 2019; Marcus et al., 2020). Self-identified ethnicity (i.e., Romani people (Font-Porterias et al., 2019)) and ethnolinguistic background (such as Basques (Flores-Bello et al., 2021)) can also promote genetic barriers at different scales.
A basic question is whether there are socio-cultural factors in Europe that remained immutable over time and affected mate preferences at a microgeographic scale. This question is complex, as the recent demography of European populations has been shaped by a progressive regional concentration (Martí-Henneberg, 2005). This phenomenon is mainly explained by the continuous migration from rural areas towards industrialized urban areas during the last century (Brown, 2011). Consequently, micro-geographic patterns can be expected to be currently blurred at urban areas, and patterns shaped by cultural factors present at rural areas obscured by their recent depopulation.
Studying a relatively isolated European rural geographic region showing geohistorical singularities that maintain over time political and cultural borders could help understanding the role of different socio-cultural factors in mating patterns during the last centuries. Within this context, the catalan Pyrenees appear as a good candidate. First, the catalan Pyrenees constitute a vast and abrupt region in the north of the Iberian Peninsula, delimited to the east by the Mediterranean Sea, to the west by the Aigüestortes National Park and the Noguera Ribagorçana river valley, to the north by the border with France and Andorra and to the south by the central depression of Catalonia. A marked orography is distributed on a west-east axis along the Prepyrenees massifs (Figure 1).
FIGURE 1. Orographic map of the catalan Pyrenean region considered in this study, depicting the most relevant orographic elements as massifs, rivers and depressions. Labeled by purple line the main orographic barriers defined by mountains higher than 1,500 m. Insert: location of villages (gray dotted points), comarca capitals (red dotted points), and recruitment centers (white-red circles).
Most of the Pyrenean valleys run from north to south, except for the maritime ends and the Puigcerdà-La Seu d’Urgell axis. The Pyrenean valleys, of river origin, run in parallel to the watershed, with narrow gorges, which does not facilitate communications between valleys (Gimenez, 2005). Along its entire extension, north-south communication axes are imposed, following the fluvial courses, which force its residents to move to the lower regions to ensure intra-Pyrenees relations. Furthermore, communications between valleys have been severely limited over time. It is at the end of the 19th and beginning of 20th century in which the construction of roads begins, first following the course of the rivers in a north-south axis, and much later the first road connections between the valleys were established. The limitations in the connections are strongly marked on the western slope, valleys of the Noguera Pallaresa and Segre rivers, while on the eastern slope, much less steep, the communications network is denser and earliest (Font, 1993) (see Supplementary Figure S1).
Second, the sum of all these mountain territories accounts for approximately a quarter of the surface of Catalonia, while the current population (70,000 people) barely exceeds 1% of the total catalan population. Throughout history, the demography of the Pyrenean region has undergone marked variations. The times of maximum splendor go back to the Middle Ages, followed by successive demographic oscillations. In Modern times, the maximum population was reached in the 1880s, followed by a progressive population decline until today (Gallart, 2002). Due to the orography of the region, a distinctive feature of the Pyrenees settlements is its scattering, with villages mainly centered around the rivers and lower part of the valleys (see insert Figure 1).
Finally, from a political point of view there are six Pyrenean comarques of Catalonia, corresponding to Alt Urgell, Alta Ribagorça, Val d’Aran, Pallars Jussà, Pallars Sobirà and Cerdanya and the northern part of another four more, Solsonès, Berguedà, Ripollès and Garrotxa (Supplementary Figure S3F). The Iberian peninsula, and in particular the ancient Pyrenees, are considered to have been culturally and genetically heterogeneous (Martin and Vaquer, 1995; Olalde et al., 2019). Before the mid-sixth millennium BC, the region was inhabited by Mesolithic hunter-gatherer communities (Oms et al., 2016). Several pre-Roman Pyrenean tribes mentioned by ancient sources are already distributed roughly following a West-East axis, such as the Airenosini (Pallars, Alt Urgell), Bergistani (Alt Urgell, Berguedà) and Castellani (Ripollès, Garrotxa). It is estimated that the Roman and Visigoth people only represented a 2.2%–4.4% of the Pyrenean population, while the Islamic conquest lasted only 80 years (Riu, 1995). Administrative-social boundaries have remained stable practically since the Middle Ages. The oldest administrative divisions correspond to the limits established between bishoprics, with the bishopric of Urgell (Diœcesis Urgellensis) being the oldest (531 A.D.), followed by the Diocese of Vic (Diœcesis Vicensis, 561 AD), Girona (Diœcesis Gerundensis, 4th century), Lleida (Diœcesis Ilerdensis, 12th century) and Solsona (Diœcesis Celsonensis, 16th century) (Supplementary Figure S2A). Other political-administrative divisions have been superimposed on these administrative divisions, such as the Middle-Age counties (11th century) (Supplementary Figure S2B), Vegueries (12th-18th centuries) (Supplementary Figure S3C), Corregimientos (18th century) (Supplementary Figure 2D), Provinces and Comarques (19th century) (Supplementary Figure S2E) (Burgueño, 1991, 1992). Despite these multiple denominations, the orographic backbone has imposed that they present great coincidences in their limits, so we could consider that these divisions have perpetuated the limits established from the old bishoprics with few variations (Supplementary Figure S2A).
In a previous study using whole genome sequencing on a limited sample size of 29 individuals (Maceda et al., 2021), we showed that populations from the catalan Pyrenees were interesting for understanding the genetic diversity of Spanish rural populations. More importantly, we observed the presence of micro-population substructure in this area, despite just covering populations in a region of around 200 km. Nevertheless, the lack of a continuous geographic sampling over the whole area of that study and the low sample size prevented identifying the atomic units of such substructure, as well as testing the factors that shaped the distribution of their genetic variation in the last millennia.
In the present study we have conducted a large genotyping study on 435 samples, analyzing and interpreting their genetic diversity in orographic and cultural terms.
We have collected samples from five reference hospital centers belonging to three health regions expanding the catalan Pyrenees. Recruitment period was extended from July 2014 to October 2016. In total, we recruited 435 samples (51.3% females) of subjects born and with all their grandparents born in the Pyrenean region (see insert Figure 1).
The data set represented an oldest extract of the population, with an average age of 55.6 years and equal proportion of sexes (Table 1). Therefore, this sample should be considered moderately affected by the demographic changes that occurred during the 19th and 20th centuries. All subjects signed an informed consent, and the study had the approval of the Ethics Committee of the University of Barcelona.
For each participant, we picked up demographic data such as sex, age and place of birth of the participant and place of birth their father and mother and for all their four grandparents. Birth places were assigned to geographic coordinates using Vissir3, the official viewer of the Cartographic and Geological Institute of Catalonia (Vidal, 2012). Assignment of centroid location of participants was done by calculating the mean longitude and latitude of grandparents (see Supplementary Table S1). Regional cartographic base maps and the terrain elevation model were from the Cartographic and Geological Institute of Catalonia. Maps were made with ArcGis 8.1 software (ESRI, Redlands, CA: Environmental Systems Research Institute), using facilities at the service of Cartography & SIG, of the University of Lleida (UdL).
We collected 1 ml of peripheral blood from each participant in EDTA-coated tubes. All participants were genotyped using Applied Biosystems Axiom™ Precision Medicine Diversity Array (PMDA) on GeneTitan™ Multi-Channel Instrument (Thermo Fisher, CA, United States) at Centro Nacional de Genotipado facilities (CEGEN, nodo Santiago de Compostela, Spain). Genotype clustering was analyzed using the Axiom Analysis Suite 6.0 software adjusting for BestWorkflow settings. Quality control (QC) was tuned to sample calling rate >0.95, marker calling rate >0.90, and Hardy-Weinberg equilibrium (HWE) >10–5. After removing Indel and copy number variation (CNV), and QC, 780,310 SNPs (726,718 at autosomes, 49,319 at chromosome X, 406 at chromosome Y, 3,457 at pseudoautosomal region and 410 at mitochondrial genome) on 435 subjects were available for subsequent analyses (thereafter GENPIR data set).
Relatedness among individual participants
We estimated relatedness among samples by using SNPRelate R package (Zheng et al., 2012). Identical by descent (IBD) analysis reveals a degree of relatedness among several individuals (Supplementary Figure S3A). In order to consider unrelated individuals for further analyses, we selected one individual from each pair of individuals having a kinship ≥0.09 (Supplementary Figure S3B). This yields a final set of 397 unrelated individuals.
Data phasing and IBD tract estimation
IBD tracts between pairs of chromosomes from different individuals and homozygote by descent (HBD) fragments between chromosomes from the same individual were inferred with Beagle 4.0 (IBD_Beagle) (Browning and Browning, 2013), using the human recombination map provided by the same software (Browning and Browning, 2013).
Merging with reference populations
To make comparisons with other populations, GENPIR genotype data was merged with genotype data from samples of the 1,000 Genomes Phase 3 project (1KG3) (2,504 individuals) (Auton et al., 2015) and Human Genome Diversity Panel (HGDP) (929 individuals) (Bergström et al., 2020). After merging with the GENPIR dataset, a total of 3,868 individuals and 738,061 single nucleotide variants (SNVs) were selected with a genotyping rate of 0.989 (690830 at autosomes, 46,947 at chromosome X, 280 at chromosome Y). We also applied unrelatedness criteria to the merged data set resulting in a final worldwide set of 3,805 unrelated individuals that were used for further analyses.
Feature extraction analyses
Principal component analysis (PCA)
PCA was done on the unrelated set of GENPIR and worldwide merged datasets using plink1.9 software (Chang et al., 2015), after performing QC that comprised: excluding markers with minor allele frequency (MAF) <0.05 and pruning to-indep-pairwise 200 25 0.3. After QC passing, PCA analysis was done on 3,805 subjects (2026 males, 1779 females) and 175,277 autosomal markers. PCA QC control was also performed to a data set obtained after extracting subjects belonging to GENPIR and European populations including Finland (FIN, n = 99), Russia (RUS, n = 25); Great Britain and Scotland (GBR, n = 91); Orcadian (Scotland) (ORC, n = 15); Utah residents (CEPH) with Northern and Western European ancestry (CEU, n = 93); French Basques (FRB, n = 23); France (FRE, n = 28); Iberian Peninsula (Spain) (IBS, n = 107); Bergamo (Italy) (BGT, n = 12) and Toscana (Italy) (TSI, n = 115), (1,005 samples, 492 males and 513 females; 152,717 autosomal SNVs) in order to perform independent PCA analyses. European populations of Adygey and Sardinia were not included in the PCA analysis as they are genetically isolated populations which can bias the results.
ADMIXTURE v1.30 (Alexander et al., 2009) was used to analyze the population structure and calculate the corresponding cross-validation error value (CV) of the considered clusters (K = 2–4). ADMIXTURE analysis was done on merged GENPIR and European populations (1,005 samples, 492 males and 513 females; 152,717 autosomal SNVs) and applying the same QC control as for PCA analysis. Distribution of derived European components from K = 3, classified as North (N) and Central (C), as from K = 4, classified as Southwest (SW) and Southeast (SE), were compared among Pyrenean regions (West, Mid, East) by pairwise t-test.
Multiple correspondence analysis
The bishopric origin of the grandparents from a given sample was projected in two dimensions using a multiple correspondence analysis (Greenacre, 2007) as implemented in the package FactoMiner (Lê et al., 2008) in R software (Team R core, 2020).
Given the regionality of the sampled dataset, the presence of population substructure must be relatively small. Since the IBD statistic is sensitive towards subtle hidden genetic relationships between individuals, we computed an IBD matrix with Beagle (Browning and Browning, 2013). We projected the IBD_Beagle matrix in a lower dimensional space after transforming it into dissimilarity by means of a non-metric ordinal SMACOF analysis (Leeuw and Mair, 2009) implemented in R software.
Estimating ancestry proportions across geographic space
We evaluated the genetic ancestry across geographic space on GENPIR population autosomal data set by tess3r software (Caye et al., 2018), applying the following parameters: method = “projected.ls”, ploidy = 2, max.iteration = 200, rep = 10, keep = “best”, tolerance = 1e−05. We kept the most highly supported model (i.e., “best” based on cross-entropy scores) within each of the 10 replicates. Two subjects were excluded lacking centroid assignment. We also excluded markers with MAF<0.05 as well as after pruning to-indep-pairwise 200 25 0.3.
Mantel test and procrustes
The relationship between genetic variation and cultural or political units while controlling by the putative confounder effect of geography was tested by means of a partial Mantel test (Lichstein, 2007) implemented in the R package vegan (Oksanen et al., 2020).
A Procrustes analysis (Peres-Neto and Jackson, 2001) between geography and the coordinates estimated by PCA or SMACOF using genetic variation was computed using the R package vegan.
Unsupervised clustering of the individuals using the first two dimensions computed by SMACOF was conducted by means of the MCLUST package (Scrucca et al., 2016) implemented in R. Mclust models the points as a finite mixture of Gaussian distributions. Ascertainment of the best number of clusters was conducted based on BIC metrics.
Results and discussion
The genetic diversity of the catalan Pyrenees in the context of worldwide populations
FIGURE 2. First two PCs of PCA analyses of worldwide populations (A), and European populations (B) including the GENPIR population. (A) PCA analysis of world-wide superpopulations, African (AFR, n = 736); Amerindians (AMR, n = 372); Central Asia (CSA, n = 194); Europeans (EUR, n = 651); East Asia (EAS, n = 715); Middle east (MED, n = 161); South Asia (SAS, n = 475); Oceania (OCN, n = 20) and catalan Pyrenean (PIR, n = 397) samples. (B) PCA analysis of European populations, Finland (FIN, n = 99), Russia (RUS, n = 25); Great Britain and Scotland (GBR, n = 89); Orcadian (Scotland) (ORC, n = 14); Utah residents (CEPH) with Northern and Western European ancestry (CEU, n = 95); French Basques (FRB, n = 23); France (FRE, n = 28); Iberian Peninsula (Spain) (IBS, n = 107); Bergamo (Italy) (BGT, n = 12) and Toscana (Italy) (TSI, n = 115).
The first two PCA components (PCs), explaining 11.3% of variance, show that the considered catalan Pyrenees samples are genetically related to other samples with European ancestry (Figure 2A). Within Europe, the catalan Pyrenees samples fall within the first two PCs (explaining 0.7%) close to samples from Southern Europe, close to individuals from Iberian and French Basque origin (Figure 2B). Scatterplots of the top 4 PCs with proportional variance explained included are also presented in Supplementary Figure S4.
Given these results, we run an unsupervised ADMIXTURE analysis focusing on the European region for K ranging from one to four in order to quantify the European ancestry proportions in GENPIR samples, revealing a minimum cross validation error value for K= 2–3 (Supplementary Figure S5).
As shown in Figure 3A, for K = 2 a North (red) to South (blue) clinal component was observed that was extended to an additional Central (green) component at K = 3. Finally, for K = 4, Southern component was split into a Southeastern (orange) and Southwestern (blue) component. Considering these four major components maximizing in present-day Southwestern European populations as “Northern (N)”, “Southeastern (SE)”, “Southwestern (SW)” and “Central (C)” (Figure 3A), we estimated their proportion in our samples. Figure 3B, C shows a decreasing gradient for the contribution of N, C and SE components being highest at the East, intermediate at the Mid and lowest at the West Pyrenean regions. In contrast, the SW European component shows an increasing gradient being lowest at the East, intermediate at the Mid and highest at the West Pyrenean regions. These results agree with the longitudinal gradient observed on the same region by other studies using classical markers (Calafell and Bertranpetit, 1994), as well as with the main axes of genetic differentiation within the Iberian Peninsula (Bycroft et al., 2019). Therefore, all these results support that the catalan Pyrenees fall within the genetic continuum within the Iberian Peninsula. These overall patterns have been previously interpreted in terms of ancient migrations (Calafell and Bertranpetit, 1994), as well as on the more recent migrations of Christian kingdoms and linguistic differences during the middle ages (Bycroft et al., 2019). In particular, the West to East pattern observed in the Iberian Peninsula has been explained by the admixture between Paleolithic and Neolithic populations and further geographic isolation, independently of which was the cause of that isolation (i.e., Catholic Kingdoms or linguistic differences) (Ferreiro et al., 2021).
FIGURE 3. (A) Admixture analysis of European populations and catalan Pyrenean populations for K = 2 to K = 4. At K = 2 we showed a North (N) to South (S) gradient component in the European samples (red and blue labeled, respectively). At K = 3 a newly component appears that should be assigned to Central (C) European component (green labeled). Finally, at K = 4 the Southern component splits into a Southeastern (SE) component (orange labeled) and a Southwestern (SW) component (green labeled). (B) Distribution of European components K = 3 (N, C) and K = 4 (SE, SW) on GENPIR individual samples according to their geographical location. (C) Box plot distribution of European components K = 3 (N, C) and K = 4 (SE, SW) in the West (n = 168), Mid (n = 83) and East (n = 146) catalan Pyrenean regions, including p-values of pairwise t-test comparisons.
At a local level, the presence of a higher proportion of SW European ancestry in the West region of the catalan Pyrenees supports that this region has been more accessible for migrations than other populations from the Iberian Peninsula. In addition, the higher contribution of N, C and SE European components at the East region of the catalan Pyrenees reflects a major permeability of this Pyrenean region, where geographical barriers are much weaker and with a much richer communications network, that has allowed contact with the nearby regions of northern and eastern Europe. We noted that the SE component, being significant on the Mid and East Pyrenean regions, reflects potential migratory events from Southeast Mediterranean coasts.
Population substructure is present at a microscale level within the catalan Pyrenees
Given the observed differences in ancestry among the GENPIR samples, we conducted a PCA only using GENPIR samples (Figure 4A; 0.5% of variance).
FIGURE 4. (A) First two PCs, explaining 0.6% of variance of a PCA of GENPIR individuals. (B) First two dimensions out of the four-set obtained by projecting an IBD matrix between pairs of individuals using an ordinal SMACOF analysis (stress = 0.4). GENPIR samples are assigned to three catalan Pyrenees regions according to their centroid assignment. (C) Map denoting assignment of bishoprics at each region: WEST (blue, bishoprics of Lleida and Urgell); MIDDLE (green, bishopric of Solsona); EAST (red, bishoprics of Vic and Girona). (D) Map denoting assignment of comarques at each region: WEST (blue, comarques of Pallars, Ribagorça, Alt Urgell and Cerdanya); MIDDLE (green, comarques of Solsonès and Berguedà); EAST (red, comarques of Ripollès and Garrotxa). In d, purple line denotes main orographic barriers.
We observed that the genetic map defined by the two PCs supports the presence of a non-random distribution of GENPIR individuals. Individuals from West, East and Middle Pyrenean regions tend to be grouped into three prominent clusters (Figure 4A). Since shared haplotype tracts are more sensitive for identifying subtle population substructure (Lawson et al., 2012), we wondered if the pattern observed by the PCA could be enhanced by projecting the IBD matrix between pairs of individuals estimated from Beagle 4.0 (Browning and Browning, 2013). We applied an ordinal SMACOF approach, in which the distance between pairs of individuals is interpreted in rank terms rather than by its magnitude. This decision was taken due to the fact that the overall genetic differences between all individuals are expected to be limited. Therefore, the presence of few outliers showing a higher degree of relationship (even after excluding highly related individuals) could distort the projection towards these relationships rather than describing the patterns observed among all samples. The first two dimensions from the SMACOF analysis considering 15 dimensions (stress = 0.4) produce a similar picture as observed by PCA (Figure 4B). However, it enhances the differences between the three different clusters of populations, supporting micro substructure in the Pyrenees.
Furthermore, this micro-population substructure can be interpreted in geographic terms. In particular, a procrustes analysis between the geographic coordinates of the samples and the genetic map defined by the first two PCA dimensions showed a symmetric correlation of 0.61 (p-value after 10,000 permutations <9.999e-05).
Taking into account the orography of the region, the Cadi Massif constitutes a clear geographic barrier detaching the Western to the Middle and Eastern regions while no clear geographic barrier appears to detach the Middle to the Eastern regions (Supplementary Figure S1). When applying actual (comarques) and historical (bishoprics) administrative-religious boundaries, the West cluster comprises Pallars, Ribagorça, Cerdanya and Alt Urgell comarques as well as Lleida and Urgell bishoprics; the Middle cluster comprises Solsonès and Berguedà comarques and Solsona bishopric and the East cluster includes Ripollès and Garrotxa comarques as well as Vic and Girona bishoprics (Figures 4C, D).
Micro-population substructure within the catalan Pyrenees is better explained by geohistorical organization (religious) structures
Given these results supporting a geographic interpretation of the genetic variation, we wondered whether models incorporating geographic information could identify genetic barriers present between the samples. The ancestry map generated by tess3r at K = 2 to K = 4 (Figure 5) identifies a strong longitudinal component that agrees with ADMIXTURE analyses and procrustes, and splits individuals following a geographic partition that matches the genetic differentiation observed in previous analyses.
FIGURE 5. Spatial distribution analysis of the ancestry of GENPIR individuals. Barplot (A) (C), (E) and Geographical (B) (D), (F) pattern distribution of ancestry components of GENPIR samples assuming K = 2 to K = 4. In (A) (C) and (E), GENPIR samples are assigned to WEST, MIDDLE or EAST catalan Pyrenees regions, comarques (PAR, Pallars-Ribagorça; AUC, Alt Urgell-Cerdanya; BBG, Berguedà and Solsonès; RIP, Ripollès and GAR, Garrotxa) and bishoprics (LLE, Lleida; URG, Urgell; SOL, Solsona; VIC, Vic and GIR, Girona), according to their centroid assignment. In (B) (D) and (F), individual values of the ancestry components of GENPIR samples for K= 2 to 4 are plotted over the geographic location of each subject. Purple line denotes main orographic barriers.
Interestingly, these divisions do not fully agree on orographic events, such as the presence of mountains, that could act as genetic barriers. Particularly, the division that is observed between East and West at K = 2 is set in a region lacking clear geographic barriers. Therefore, as previously considered, other factors correlating with geography must be shaping the observed genetic differentiation across the region. We also point out that at K = 3 the algorithm mainly reproduces the West, Middle and East clustering, which we previously observed with both PCA and SMACOF analysis.
We wondered if such genetic clusters identified in previous analyses were shaped by current administrative divisions (comarques), or if they were influenced by other long term ongoing cultural factors such as the limits defined by middle age-founded bishoprics. We applied a Mclust analysis using the first two dimensions from SMACOF in order to identify clusters showing a similar probability distribution function. The optimal number of clusters identified by Mclust was six (BIC = 406.33) that accurately reflect a hardest effect of geographic barriers that is shaped by administrative divisions, both comarques and bishopric (Figure 6). Cluster I includes subjects from the comarques of Pallars and bishopric of Lleida, cluster II subjects from the comarques of Pallars and Alt Urgell as well as bishopric of Urgell, cluster III mainly includes subjects from the comarca of Alt Urgell and bishopric of Urgell, cluster IV subjects from comarca of Solsones and Berguedà as well as bishopric of Solsona, cluster V subjects from comarca of Ripollès and bishopric of Vic and cluster VI mainly includes subjects from comarca of Garrotxa and bishopric of Girona (Figure 6).
FIGURE 6. (A) Suggested clusters (ellipses I to VI) identified by applying the unsupervised clustering Mclust analysis. Colors identify bishoprics (Girona, red; Lleida, brown; Solsona, green; Urgell, blue and Vic, purple). Shape identify comarques (circles AUC, Alt Urgell-Cerdanya; triangles BBG, Berguedà-Solsonès; squares GAR, Garrotxa; crosses PAR, Pallars Jussà-Pallars-Sobirà-Ribagorça; crosses in squares RIP, Ripollès). (B) Colored bishoprics according to clustering, colors as in A. (C) Labeled comarques according to clustering, (circles AUC, Alt Urgell-Cerdanya; triangles BBG, Berguedà-Solsonès; squares GAR, Garrotxa; crosses PAR, Pallars Jussà-Pallars-Sobirà-Ribagorça; crosses in squares RIP, Ripollès).
Interestingly, such bishopric division from the SMACOF bi-plot is also observed within the geographic region (depicting ∼70 km far away) that contains the intersection of four bishoprics (Figure 7).
FIGURE 7. (A) First two dimensions of the SMACOF analysis depicted in Figure 6 focusing on individuals (N = 230) from the geographic region that contains the intersection of four bishoprics. (B) Detailed orography of the focused region. Despite the small distance considered, the influence of bishoprics (C) and comarques (D) is observed. In (A) Colors identify bishoprics (Urgell, blue; Solsona, green; Vic, purple and Girona, red). Shape identify comarques (circles AUC, Alt Urgell-Cerdanya; square BBG, Berguedà-Solsonès; triangles GAR, Garrotxa and diamond RIP, Ripollès). (B) Orography of the detailed region. (C) Colored bishoprics according to clustering, colors as in (A). (D) Colored comarques according to clustering, (AUC, Alt Urgell-Cerdanya, blue; BBG, Berguedà-Solsonès, green; RIP, Ripollès, purple; GAR, Garrotxa, red).
Since bishoprics and comarques depend on geography, the identified geographic clusters by tess3r could reflect cultural and/or political barriers. We conducted a partial Mantel test between the Euclidean distance estimated from the first two PCs, and the assignment to the same administrative division or bishoprics, controlled by the geographic distance. Pearson’s correlation between PC based distances and bishoprics after controlling by geography in a partial Mantel test was 0.434 (p-value <1e-04 using 9,999 permutations). This degree of association is higher than the observed between genetics and administrative divisions of comarques (Pearson’s correlation = 0.324, p-value <1e-04 using 9,999 permutations). When considering a classification based on bishoprics and administrative divisions at the same time, Pearson’s correlation decreased to 0.294 (p-value <1e-04 using 9,999 permutations). Similar results supporting higher correlation between bishopric divisions and genetic diversity were obtained when using the first two dimensions from the SMACOF analysis conducted with the IBD distance matrix. The degree of correlation between genetics and bishoprics after controlling by geographic distance was 0.563 (p-value <1e-04 using 9,999 permutations). This correlation decreases to 0.382 (p-value <1e-04 using 9,999 permutations) when considering administrative divisions and it is similar when considering both bishopric and administrative divisions (R = 0.361, p-value <1e-04 using 9,999 permutations). Overall, these results suggest that bishoprics play a role in shaping the genetic variation of GENPIR samples by modifying their mating patterns.
We supported such a hypothesis based on genetic data by taking advantage of the sampling strategy conducted in this study, which incorporated the geographic birthplace of the four grandparents (Figure 8, Supplementary Table S1).
FIGURE 8. Geographic location of the individuals with regards to the mating relationship of grandparents. Each propositus is linked to their grandparents generating a complex network centered on bishoprics. Red lines delimit the bishopric boundaries. Purple lines indicate the presence of orographic events preventing direct routes of contact between regions.
Figure 8 shows the grandparent to propositus network in which each propositus is linked to their four grandparents generating a complex network centered on bishopric boundaries. We can also observe that grandparent and parent couples come from the same bishopric (over 80% of couples) or to a nearest bishopric (near 10% of couples) (Figure 9A). The overall relationship between bishopric and mating was summarized by a multiple correspondence analysis (MCA) using the bishopric of each grandparent, parent and the considered sample. The relationship defined by the bishopric origin of the grandparents and samples is summarized by a triangle in the first two dimensions (explaining 39.3% of total inertia; Figure 9B) of the MCA. The first vertex contained relationships from the bishoprics of Urgell and Lleida. The second vertex was determined by the relationship between members from the bishopric of Solsona and the third one by the bishopric of Girona. The grandparents and current samples from the bishopric of Vic fall within the axis defined by Solsona and Girona, indicating a higher number of bishopric-admixed grandparent couples with the bishoprics of Solsona and Girona (Figure 9B). The center of the triangle depicted relationships where at least one of the grandparents is out of the bishoprics. Overall, the observed mating relationships agree with the geographic proximity between bishoprics (Lleida and Urgell, Vic between Solsona and Girona), as expected given the dependence between geography and these geohistorical organizations and highlight the bishopric-based mating choice of the individuals from GENPIR.
FIGURE 9. (A) Relationship of marriages depending on the bishoprics of parents and grandparents couples. (B) Multiple Correspondence Analysis between the bishopric of the sample and her/his parents and grandparents.
The ecclesiastical division of the modern era responds to human spheres of multisecular relationships. For this reason, it is not surprising that the bundles of family relationships roughly coincide with the bishoprics: a central Pyrenean space ‒bishopric of Urgell‒with abundant interactions (although it is capable of being subdivided into comarques); an eastern area (Girona); and a southern territory or central Catalonia (bishopric of Solsona, perhaps undifferentiated from that of Vic).
Traditionally, the main force used for explaining biases in mating choice in Spanish rural areas compared to urban areas has been geography (Fuster and Colantonio, 2004). Poor transportation facilities and limited communications - which in turn depend on orography- enhance the practice of consanguineous marriages and inbreeding in Spain. In particular, in small rural and dispersed localities, a limited number of suitable local partners conditioned marriage, and marrying a distant relative was a likely option when few other partners were available. This situation was even more common when the population started increasing due to medical advances, but mobility remained restricted (Gamella and Nez-Negrillo, 2019). Furthermore, in a sociocultural context, aunt-nephew and uncle-niece marriages could have been practiced in rural areas in order to maintain or enlarge the family inheritance (Fuster and Colantonio, 2004). Similarly, the regional practice in Northern and Northwestern Spain from the 18th to the 20th century of a complex household system could have favored first cousin mating in these rural areas (Fuster and Colantonio, 2004). Overall, these socio-cultural processes could explain the overall patterns of isolation by distance that we detected in our different analyses. However, our study also supports that bishopric can act on shaping the genetic variation of rural areas. This observation can be interpreted in the socio-cultural context of the catalan Pyrenees. In this region, the old urban centers of cities and towns with episcopal seats are the first location of markets or fairs. Cities such as La Seu d’Urgell (bishopric of Urgell), Besalú (bishopric of Girona), Berga (bishopric of Solsona) or Vic (bishopric of Vic) have documented markets or fairs since XII and XIII centuries. In rural areas, markets and fairs act as a space for sociability promoting personal contacts between relatives and acquaintances. Social relationships were established or renewed, labor contracts were established, and marriages were arranged (Gallart, 2004).
Humans, as social animals, are subject to rules that emanate from power, economic, political or religious, which beyond allowing the proper functioning of social structures, also modulate interpersonal relationships. The role of the church has been prominent throughout history, acting as a central and determining factor of interpersonal relationships in past societies. In particular, the Catholic Church, with its rigid pyramidal structure, has left an imprint on societies that go beyond religious aspects. As our results show, the genetic structure of the rural populations of the catalan Pyrenees, in addition to the geographical factor, has also been modulated by administrative-social factors, which have established limits to the social relationships of individuals, being the limits established by the bishoprics one of the determining factors.
In this context, the bishoprics can be considered as the most representative territorial organization frameworks, historically very stable and cohesive, and necessarily well adapted to the strong geographical conditions that mark the Pyrenean environment.
A singular orography that has determined the establishment of administrative-religious limits, perpetuated unalterably for centuries, and the rigid social structure enforced by the Catholic Church over long periods of time, have shaped the genetic structure of the catalan Pyrenees population. It remains to be determined to what extent this structure may have generated interpopulation differences able to be reflected in differences on genetic susceptibility to diseases.
Data availability statement
The data presented in the study are deposited in the Figshare repository, accession numbers https://figshare.com/articles/dataset/Genotype_files/21404643 and https://figshare.com/articles/dataset/Code_for_figures_and_statistics/21406935.
The studies involving human participants were reviewed and approved by the Ethics Committee of the University of Barcelona. The patients/participants provided their written informed consent to participate in this study.
JF, PM and OL contributed to the study conception and design. Material preparation, recruitment and data collection were performed by AC, JFa, JFe, JG, JR and JS. Analysis and interpretation of the data was performed by JF, IM, ML, MMA and OL. Geohistorical contextualization was performed by MG, JB, JF and OL. The first draft of the manuscript was written by JF and OL and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
This work was supported by the Diputació of Lleida grant to JF. OL and IM acknowledge the support from Spanish Ministry of Science and Innovation to the EMBL partnership, the Centro de Excelencia Severo Ochoa, CERCA Program/Generalitat de Catalunya, Spanish Ministry of Science and Innovation through the Instituto de Salud Carlos III, Generalitat de Catalunya through Departament de Salut and Departament d’Empresa i Coneixement, Co-financing with funds from the European Regional Development Fund by the Spanish Ministry of Science and Innovation corresponding to the Programa Operativo FEDER Plurirregional de España (POPE) 2014–2020 and by the Secretaria d’Universitats i Recerca, Departament d’Empresa i Coneixement of the Generalitat de Catalunya corresponding to the Programa Operatiu FEDER de Catalunya 2014–2020. OL gratefully acknowledges the financial support from Ministerio de Economía, Industria y Competitividad BFU 201568759-P and Generalitat de Catalunya (Government of Catalonia)—GRC 2017 SGR 937. IM gratefully acknowledges the financial support from the Government of Catalonia | Agència de Gestió d’Ajuts Universitaris i de Recerca (Agency for Management of University and Research Grants)—GRC 2014 SGR 615. MMA was supported by a FPU15/01251 from Ministerio de Ciencia, innovación y Universidades. MMA acknowledges the financial support from Precipita-FECYT (FBG 309307 project). The samples analyzed in this study are part of a much larger sample whose collection was supported by the project CGL 2011–27866 of the Ministerio de Ciencia y Tecnologia, the Fundació Moret i Marguí to PM.
We thank all the volunteers who donated DNA samples and who were involved in the collection of samples and genealogical data (Hospital Sant Bernabé, Berguedà; Fundació Sant Hospital, Alt Urgell; Hospital d’Olot i Comarcal, Garrotxa; Hospital de Campdevànol, Ripollès and Hospital Comarcal, Pallars).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2022.1100440/full#supplementary-material
Bergström, A., McCarthy, S. A., Hui, R., Almarri, M. A., Ayub, Q., Danecek, P., et al. (2020). Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012. doi:10.1126/science.aay5012
Biagini, S. A., Solé-Morata, N., Matisoo-Smith, E., Zalloua, P., Comas, D., and Calafell, F. (2019). People from ibiza: An unexpected isolate in the western mediterranean. Eur. J. Hum. Genet. 27, 941–951. doi:10.1038/s41431-019-0361-1
Brown, D. L. (2011). “Migration and rural population change: Comparative views in more developed nations,” in International handbook of rural demography international handbooks of population. Editors C. KJ, 35–48. doi:10.1007/978-94-007-1842-5_4
Burgueño, J. (1991). Del corregiment a la comarca de muntanya. Les divisions territorials al Pirineu (1). Treballs de la Societat Catalana de Geografia 28/29, 55–89. Available at: http://hdl.handle.net/10459.1/47855.
Burgueño, J. (1992). Del Corregiment a la Comarca de muntanya. Les divisions territorials al Pirineu (i11). Treballs de la Societat Catalana de Geografia 32, 73–101. Available at: http://hdl.handle.net/10459.1/47854.
Bycroft, C., Fernandez-Rozadilla, C., Ruiz-Ponte, C., Quintela, I., Carracedo, Á., Donnelly, P., et al. (2019). Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula. Nat. Commun. 10, 551. doi:10.1038/s41467-018-08272-w
Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., and Lee, J. J. (2015). Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4, 7. doi:10.1186/s13742-015-0047-8
Ferreiro, D., Núñez-Estévez, B., Canedo, M., Branco, C., and Arenas, M. (2021). Evaluating causes of current genetic gradients of modern humans of the Iberian Peninsula. Genome Biol. Evol. 13, evab071. doi:10.1093/gbe/evab071
Flores-Bello, A., Bauduer, F., Salaberria, J., Oyharçabal, B., Calafell, F., Bertranpetit, J., et al. (2021). Genetic origins, singularity, and heterogeneity of Basques. Curr. Biol. 31, 2167–2177.e4. e4. doi:10.1016/j.cub.2021.03.010
Font-Porterias, N., Arauna, L. R., Poveda, A., Bianco, E., Rebato, E., Prata, M. J., et al. (2019). European Roma groups show complex West Eurasian admixture footprints and a common South Asian genetic origin. Plos Genet. 15, e1008417. doi:10.1371/journal.pgen.1008417
Fuster, V., and Colantonio, S. (2004). Socioeconomic, demographic, and geographic variables affecting the diverse degrees of consanguineous marriages in Spain. Hum. Biol. 76, 1–14. doi:10.1353/hub.2004.0021
Gallart, C. B. (2004). Fires i mercats: Factors de dinamisme econòmic i centres de sociabilitat (segles XI a XV). Rafael Dalmau Available at: https://books.google.es/books?id=8I1IAAAACAAJ.
Lao, O., Altena, E., Becker, C., Brauer, S., Kraaijenbrink, T., Oven, M. V., et al. (2013). Clinal distribution of human genomic diversity across The Netherlands despite archaeological evidence for genetic discontinuities in Dutch population history. Investig. Genet. 4, 9. doi:10.1186/2041-2223-4-9
Lao, O., Lu, T. T., Nothnagel, M., Junge, O., Freitag-Wolf, S., Caliebe, A., et al. (2008). Correlation between genetic and geographic structure in Europe. Curr. Biol. CB 18, 1241–1248. doi:10.1016/j.cub.2008.07.049
Loh, P.-R., Danecek, P., Palamara, P. F., Fuchsberger, C., Reshef, Y. A., Finucane, H. K., et al. (2016). Reference-based phasing using the haplotype reference consortium panel. Nat. Genet. 48, 1443–1448. doi:10.1038/ng.3679
Maceda, I., Álvarez, M. M., Athanasiadis, G., Tonda, R., Camps, J., Beltran, S., et al. (2021). Fine-scale population structure in five rural populations from the Spanish Eastern Pyrenees using high-coverage whole-genome sequence data. Eur. J. Hum. Genet. 29, 1557–1565. doi:10.1038/s41431-021-00875-0
Marcus, J. H., Posth, C., Ringbauer, H., Lai, L., Skeates, R., Sidore, C., et al. (2020). Genetic history from the middle neolithic to present on the mediterranean island of Sardinia. Nat. Commun. 11, 939. doi:10.1038/s41467-020-14523-6
Martin, A., and Vaquer, J. (1995). “El poblament dels Pirineus a l’Holocè, del Mesolític a l’edat del Bronze,” in Muntanyes i població. El Passat dels Pirineus des d’una perspectiva multidisciplinària. Editors J. Bertranpetit, and E. Vives (Andorra la Vella, 35–73. I Simposi de Poblament dels Pirineus.
Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., et al. (2020). vegan: Community ecology package. R package version 2.5-7. Available at: https://CRAN.R-project.org/package=vegan.
Olalde, I., Mallick, S., Patterson, N., Rohland, N., Villalba-Mouco, V., Silva, M., et al. (2019). The genomic history of the Iberian Peninsula over the past 8000 years. Science 363, 1230–1234. doi:10.1126/science.aav4040
Oms, F. X., Martín, A., Esteve, X., Mestres, J., Morell, B., Subirà, M. E., et al. (2016). The neolithic in northeast iberia: Chronocultural phases and 14C. Radiocarbon 58, 291–309. doi:10.1017/rdc.2015.14
Peres-Neto, P. R., and Jackson, D. A. (2001). How well do multivariate data sets match? The advantages of a procrustean superimposition approach over the Mantel test. Oecologia 129, 169–178. doi:10.1007/s004420100720
Raveane, A., Aneli, S., Montinaro, F., Athanasiadis, G., Barlera, S., Birolo, G., et al. (2019). Population structure of modern-day Italians reveals patterns of ancient and archaic ancestries in Southern Europe. Sci. Adv. 5, eaaw3492. doi:10.1126/sciadv.aaw3492
Riu, M. (1995). “El poblament dels Pirineus, segles VII-XIV,” in Muntanyes i població. El Passat dels Pirineus des d’una perspectiva multidisciplinària. Editors J. Bertranpetit, and E. Vives (Andorra la Vella: I Simposi de Poblament dels Pirineus), 195–218.
Scrucca, L., Fop, M., Murphy, T. B., and Raftery, A. E. (2016). Mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R. J. 8, 289–317. doi:10.32614/rj-2016-021
Team Core R (2020). R: A language and environment for statistical computing. Viena, Austria: R Foundation for Statistical Computing Available at: http://www.R-project.org/.
Vidal, A. (2012). “Vissir3: Nuevas posibilidades de visualización e interacción con la cartografía del ICC,” in VI jornadas de SIG libre (Girona: Universitat de Girona). Available at: https://dugi-doc.udg.edu/bitstream/handle/10256/4195/7Art-Vissir3.pdf?sequence=1&isAllowed=y.
Xue, Y., Mezzavilla, M., Haber, M., McCarthy, S., Chen, Y., Narasimhan, V., et al. (2017). Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations. Nat. Commun. 8, 15927. doi:10.1038/ncomms15927
Zheng, X., Levine, D., Shen, J., Gogarten, S. M., Laurie, C., and Weir, B. S. (2012). A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328. doi:10.1093/bioinformatics/bts606
Keywords: genetic variation, rural population, inbreeding, population differentiation, bishopric boundaries, rural catalan Pyrenees
Citation: Fibla J, Maceda I, Laplana M, Guerrero M, Álvarez MM, Burgueño J, Camps A, Fàbrega J, Felisart J, Grané J, Remón JL, Serra J, Moral P and Lao O (2023) The power of geohistorical boundaries for modeling the genetic background of human populations: The case of the rural catalan Pyrenees. Front. Genet. 13:1100440. doi: 10.3389/fgene.2022.1100440
Received: 16 November 2022; Accepted: 12 December 2022;
Published: 10 January 2023.
Edited by:Miguel Arenas, University of Vigo, Spain
Reviewed by:Veronica Fernandes, Universidade do Porto, Portugal
Hannah Moots, The University of Chicago, United States
Copyright © 2023 Fibla, Maceda, Laplana, Guerrero, Álvarez, Burgueño, Camps, Fàbrega, Felisart, Grané, Remón, Serra, Moral and Lao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
†Present address: Oscar Lao, Institut de Biologia Evolutiva (UPF-CSIC), Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain