Genetic Differentiation of North-East Argentina Populations Based on 30 Binary X Chromosome Markers

Alu insertions, INDELs, and SNPs on the X chromosome can be useful not only for revealing relationships among populations but also for identification purposes. We present data on 10 Alu insertions, 5 INDELs, and 15 SNPs of the X chromosome from three cities in north-eastern Argentina in order to gain insight into the genetic diversity of the X chromosome in this region of the country. A total of 198 unrelated individuals from Posadas, Corrientes, and Eldorado were genotyped for the Ya5DP62, Yb8DP49, Ya5DP3, Ya5NBC37, Ya5DP77, Ya5NBC491, Ya5DP4, Ya5DP13, Yb8NBC634, and Yb8NBC102 Alu insertions, for the MID193, MID1705, MID3754, MID3756, and MID1540 INDELs, and for the rs6639398, rs5986751, rs5964206, rs9781645, rs2209420, rs1299087, rs318173, rs933315, rs1991961, rs4825889, rs1781116, rs1937193, rs1781104, rs149910, and rs652 SNPs. No deviations from Hardy-Weinberg equilibrium were observed for Posadas and Corrientes. Eldorado, however, showed significant deviations and was found to have internal substructure, with two groups of different origin: one more similar to European countries and the other more similar to Posadas and Corrientes. Pairwise Fst genetic distances revealed differentiation for some markers among the studied populations, and also between our data and those from other countries and continents. Of particular interest, Alu insertions showed the greatest differences and could be useful in ancestry studies of these populations, while INDEL and SNP variation was informative for differentiation within the country.


INTRODUCTION
The human X chromosome contains multiple types of non-coding markers distributed along its sequence, including Alu insertions, insertion-deletion polymorphisms (INDELs), and single nucleotide polymorphisms (SNPs). Alu insertions are short interspersed nuclear elements classified into 12 subfamilies that appeared at different times during primate evolution (Kapitonov and Jurka, 1996). Alu sequences are about 300 bp long; they were ancestrally derived from the 7SL RNA gene and are inserted into the genome through a single-stranded RNA intermediate transcribed by RNA polymerase III (Batzer and Deininger, 1991). These polymorphisms consist of the presence or absence of an Alu element at a particular locus. They are usually selectively neutral and, because their location rarely changes or rearranges, each insertion is considered to derive from a single event, with the absence of the insertion being the ancestral state (Batzer et al., 1994). These distinctive features make human Alu insertion polymorphisms a good tool for studying the genetic variation and evolutionary relationships of human populations (Stoneking et al., 1997). INDELs and SNPs have several advantages for population studies: (i) they are widely spread throughout the genome, including the X chromosome; (ii) most of these polymorphisms derive from a single mutation event; (iii) their mutation rates are much lower than those of repetitive markers; (iv) they can show significant allele frequency differences among geographically distant populations; and (v) they can be easily genotyped, even from degraded DNA samples, given the short length of their amplicons (Tomas et al., 2008; Pereira et al., 2009; Ribeiro-Rodrigues et al., 2009; Casto et al., 2010; Li et al., 2010).
Because of its transmission properties, the X chromosome is gaining importance in population and forensic genetic studies. The mammalian X chromosome has many unique features. Females inherit an X chromosome from each parent, whereas males inherit a single, maternal X chromosome. In the female germ line, both X chromosomes undergo recombination, whereas in males recombination is restricted to the short pseudoautosomal regions at both tips of the X chromosome, which recombine with equivalent segments on the Y chromosome. Several interesting characteristics of the X chromosome stem from this special transmission pattern: (i) it passes between both sexes in each generation and therefore records a history that differs from that of uniparental genomes; (ii) its overall genetic diversity within populations is lower than that of autosomes because of its reduced effective population size, which makes it more sensitive to the effects of genetic drift, population substructure, and selective sweeps; (iii) its lower rate of recombination increases the levels of linkage disequilibrium (LD) compared with autosomes; and (iv) the hemizygous state in males provides direct access to haplotypes (Hamosh et al., 2000; Schaffner, 2004).
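The reduced effective population size mentioned in point (ii) can be made concrete with the standard expressions for a population with N_m breeding males and N_f breeding females: Ne(X) = 9 N_m N_f / (4 N_m + 2 N_f) and Ne(autosomes) = 4 N_m N_f / (N_m + N_f). The short sketch below assumes an idealized population (random mating, constant size, no selection); the function names and the census figures are illustrative only and are not taken from this study.

```python
def ne_autosomal(n_males: float, n_females: float) -> float:
    """Effective size for an autosomal locus in an idealized population."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x_linked(n_males: float, n_females: float) -> float:
    """Effective size for an X-linked locus: females carry two copies, males one."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


if __name__ == "__main__":
    # Hypothetical population with equal numbers of breeding males and females
    n_m, n_f = 500, 500
    ratio = ne_x_linked(n_m, n_f) / ne_autosomal(n_m, n_f)
    print(f"Ne(X) / Ne(autosomes) = {ratio:.2f}")  # prints 0.75
```

With equal sex ratios the X chromosome has three quarters of the autosomal effective size, so drift acts on it faster; the ratio shifts when the numbers of breeding males and females differ.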
In Argentina, the populations of different provinces have multiple, distinct origins. In the Argentinian Mesopotamia (Misiones, Corrientes, and Entre Ríos provinces), the Guaraní people, who came from Amazonia, settled in the current Argentinian territory between the end of the fifteenth and the beginning of the sixteenth century (Heguy, 2012). Their arrival was violent and led to continual conflict with the native non-Guaraní populations of the region over access to resources (Vara, 1985). About 100 years later, the Jesuits arrived during the European invasion of the area and established Jesuit missions. This halted the expansion of the Portuguese, who were hunting the aboriginal people. Guaraní military men also took part in numerous punitive campaigns against other aboriginal tribes of the Gran Chaco region, such as the Guaykurú, the Payagú, and the Mbyá. As a consequence of the Bourbon reforms, the Jesuits were expelled from America, and some natives emigrated to Corrientes and Santa Fe to work as craftsmen or farmers, while others returned to the jungle (González and Pérez, 2000; Heguy, 2012).
European immigration in Misiones, on the other hand, was characterized by two types of settlement: official and private. Private colonization took place in Eldorado, Puerto Rico, and Montecarlo, in northern Misiones, where companies of European origin promoted the arrival of foreigners, preferably Germans (Junta de Estudios Históricos del Municipio de Eldorado, 2015; Poenitz, 2015).
Finally, at the end of the fourteenth century the Corrientes province had substantial ethnic diversity (Caingang, Abipon, Chaná, Caracará, Mocoretá), which changed with the arrival and expansion of the Guaraní people. As a consequence of this expansion, the language and customs of the existing communities were replaced by those of the Guaraní. In particular, the city of San Juan de Vera de las Siete Corrientes was founded by Spaniards and their American-born descendants (dubbed "criollos"), who used these lands as farms for breeding and exploiting cattle and horses. The initial settlers of the city of Corrientes gradually intermixed with the local Amerindian population (Vara, 1985). In contrast with Misiones, there were no Jesuit missions in the province of Corrientes, except for a few on the western bank of the Uruguay River and the Franciscan mission of Itatí near the city of Corrientes (Pérez, 1936; Heguy, 2012).
In this work, we studied 10 Alu insertions, 5 INDELs, and 15 SNPs to gain insight into the genetic composition of these particular populations. First, we focused on discerning the relative contributions of European and Native American genetic components to the studied populations, in order to understand the diversity they represent. Second, we analyzed the information that each type of marker can provide. This is the first study of South American populations to use this set of 30 markers.

DNA Samples
We analyzed a total of 198 samples from healthy, unrelated individuals of both sexes from three locations in Argentina: the capital city of Corrientes (Corrientes; n = 92; 32 females, 60 males), the capital city of Misiones (Posadas; n = 52; 28 females, 24 males), and another important city of Misiones (Eldorado; n = 54). The geographical location of these cities is shown in Figure 1. Samples from Posadas were obtained in public hospitals, samples from Eldorado were collected in a private clinical analysis laboratory, and samples from Corrientes were collected during a campaign by physician Darío Martín González. The current geographical location of each donor and the geographical origin of their parents and grandparents were taken into account. The Eldorado samples were separated into two groups according to the reported origin of the grandparents: donors who knew that all four of their grandparents were German and/or Swiss were labeled Eldorado A (n = 27; 13 females, 14 males), and donors who did not know the origin of all four grandparents were labeled Eldorado B (n = 27; 11 females, 16 males). All samples were obtained with informed consent and analyzed anonymously; the project was approved by the Ethics Committee at IMBICE. DNA was isolated from buccal cells and peripheral blood samples following Gemmell and Akiyama (1996).
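Because males are hemizygous, the number of X chromosomes sampled in each group is 2 × females + males rather than 2 × individuals. The sketch below recomputes these totals from the sample sizes reported above, assuming every individual was successfully typed at a given locus; the Eldorado A and B subgroups are used because only their sex breakdown is given.

```python
# Sample sizes (females, males) as reported in the DNA Samples section
SAMPLES = {
    "Corrientes": (32, 60),
    "Posadas": (28, 24),
    "Eldorado A": (13, 14),
    "Eldorado B": (11, 16),
}


def x_chromosome_count(n_females: int, n_males: int) -> int:
    """Females contribute two X chromosomes each, hemizygous males one."""
    return 2 * n_females + n_males


def summarize(samples):
    """Return ({group: (individuals, X chromosomes)}, total individuals, total X chromosomes)."""
    table = {g: (f + m, x_chromosome_count(f, m)) for g, (f, m) in samples.items()}
    total_individuals = sum(n for n, _ in table.values())
    total_x = sum(x for _, x in table.values())
    return table, total_individuals, total_x


if __name__ == "__main__":
    table, n_ind, n_x = summarize(SAMPLES)
    for group, (n, x) in table.items():
        print(f"{group:<11} individuals={n:>3}  X chromosomes={x:>3}")
    print(f"{'Total':<11} individuals={n_ind:>3}  X chromosomes={n_x:>3}")
```

These counts are the maximum denominators for per-locus allele frequencies; missing genotypes at a locus would lower them.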

Statistical Analysis
Allele frequencies were calculated using RStudio v.0.99.893 (R Core Team, 2008). Heterozygosity and Hardy-Weinberg (HW) equilibrium were assessed only in the female subsamples, whereas LD was estimated in the male subsamples. Fst values were estimated from both female and male data with ARLEQUIN v.3.5 (Excoffier and Lischer, 2010). The program Past was used to produce multidimensional scaling (MDS) plots (Hammer et al., 2001). To infer population structure, STRUCTURE v.2.1 (Hubisz et al., 2009) was run assuming k population groups, with k from 1 to 5; all runs used a burn-in period of 100,000 iterations followed by 100,000 iterations, with 10 replicate runs. The best k in each case was chosen with Structure Harvester (Earl and vonHoldt, 2012). Intercontinental comparisons were performed using data from Athanasiadis et al. (2007) and Gayà-Vidal et al. (2009), including sub-Saharan Africa (Ivory Coast), North Africa (Moroccan High Atlas, Siwa Oasis in Egypt, and Tunisia), Greece (Crete Island), Spain (Basque Country), and Native Americans from Bolivia (Aymará and Quechua speakers from the Andean region). These populations were used because no data were available in the scientific literature for other European or Native American communities. Finally, the forensic parameters PIC (polymorphic information content), PE (power of exclusion), and PD (power of discrimination) were calculated with the online ChrX-STR.org 2.0 Calculator (http://www.chrx-str.org/) (Szibor et al., 2006). The locations of the Alu polymorphisms were obtained from Callinan et al. (2003), and those of the INDEL and SNP polymorphisms from NCBI (1988).
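For an X-linked marker, the power of discrimination differs between the sexes: females carry genotypes, whereas males carry single alleles. The sketch below implements commonly used expressions for a marker with allele frequencies p_i: a Botstein-type PIC = 1 - Σp_i² - Σ_(i<j) 2p_i²p_j², the female discrimination power PDf = 1 - 2(Σp_i²)² + Σp_i⁴, and the male discrimination power PDm = 1 - Σp_i². It is our assumption that these match what the ChrX-STR.org calculator computes, not a description of its code; PE is omitted because the exclusion formula used is not specified, and the example frequencies are hypothetical.

```python
from itertools import combinations


def _check(freqs):
    """Reject frequency vectors that do not sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")


def pic(freqs):
    """Polymorphic information content (Botstein-type formula)."""
    _check(freqs)
    sum_p2 = sum(p * p for p in freqs)
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - sum_p2 - pair_term


def pd_female(freqs):
    """Discrimination power in females: 1 - sum of squared HWE genotype frequencies."""
    _check(freqs)
    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p4 = sum(p ** 4 for p in freqs)
    return 1 - 2 * sum_p2 ** 2 + sum_p4


def pd_male(freqs):
    """Discrimination power in hemizygous males: 1 - sum of squared allele frequencies."""
    _check(freqs)
    return 1 - sum(p ** 2 for p in freqs)


if __name__ == "__main__":
    # Hypothetical biallelic marker with equal allele frequencies
    f = [0.5, 0.5]
    print(f"PIC={pic(f):.3f}  PDf={pd_female(f):.3f}  PDm={pd_male(f):.3f}")
    # prints PIC=0.375  PDf=0.625  PDm=0.500
```

For a biallelic marker these are the maximum attainable values (reached at p = 0.5), which is why many binary markers are needed before they become useful for identification.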

RESULTS
The DNA data corresponding to this work are deposited at http://hdl.handle.net/10915/66905. Genotype frequencies are included as supplemental material (Tables S5-S7). All four populations were in HW equilibrium for most of the polymorphisms analyzed (Table 2; Tables S8-S11). Two of the Alu insertions, Ya5NBC491 and Ya5DP4, were monomorphic in two and three populations, respectively, while Ya5DP13 was monomorphic in all of them. The average observed heterozygosity (OH) was also calculated for each population, yielding values of 0.43, 0.35, 0.40, and … (Table 3).
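Because HW equilibrium and heterozygosity were assessed only in females, a test for one biallelic marker reduces to comparing the three female genotype counts with the expectations p², 2pq, and q². The sketch below uses a chi-square goodness-of-fit test with 1 degree of freedom; this is a simplification, since programs such as ARLEQUIN use an exact test, which is preferable for small samples like these. The function name and the genotype counts are hypothetical.

```python
CHI2_CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def female_hwe(n_aa: int, n_ab: int, n_bb: int):
    """HWE check for one biallelic X-linked marker using female genotypes only.

    Returns (p, observed heterozygosity, expected heterozygosity,
    chi-square statistic, True if chi-square exceeds the 5% critical value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no female genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Empty expected classes (e.g. a monomorphic marker) contribute nothing
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    obs_het = n_ab / n
    exp_het = 2 * p * q
    return p, obs_het, exp_het, chi2, chi2 > CHI2_CRITICAL_1DF_05


if __name__ == "__main__":
    # Hypothetical counts for 25 women: 9 AA, 12 AB, 4 BB
    print(female_hwe(9, 12, 4))
```

A monomorphic marker such as Ya5DP13 yields a statistic of zero here, so it simply carries no information about HW equilibrium.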
Fst values and p-values for the Alu insertions were also calculated against other populations from previous works (Athanasiadis et al., 2007; Gayà-Vidal et al., 2009) (Table 4).
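As a simplified illustration of differentiation at a single biallelic locus such as an Alu insertion, the sketch below computes Nei's G_ST = (H_T - H_S)/H_T from population allele frequencies with equal weights. ARLEQUIN instead uses an AMOVA-based estimator that corrects for sample size and assesses significance by permutation, so values from this sketch are not expected to reproduce Table 4; the frequencies are hypothetical.

```python
def gst_biallelic(freqs):
    """Nei's G_ST for one biallelic locus.

    freqs: frequency of the same allele (e.g. the Alu insertion) in each
    population, all populations weighted equally.
    Returns 0.0 when the locus is monomorphic in the pooled sample.
    """
    if not freqs:
        raise ValueError("need at least one population")
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical insertion frequencies in two populations
    print(round(gst_biallelic([0.2, 0.6]), 4))
```

Identical frequencies give 0, and populations fixed for alternative alleles give 1.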
An MDS plot was constructed from Reynolds distances (Figure 2). The first dimension clearly separates the Bolivian samples from the rest of the populations, while the second dimension highlights the differentiation of two samples: Crete Island and Siwa Oasis. The remaining samples cluster in an intermediate position; our Argentinian samples, except Posadas, group together with a European and a North African sample. When these populations were grouped into Europeans, Africans, Bolivians, and Argentinians, the exact test showed significant differentiation among them (p < 0.05; data not shown), probably reflecting the differentiation of the Bolivian samples on one side, and the inclusion of Siwa Oasis in the African group and of Crete Island in the European group on the other. The outlying position of these two samples may be an artifact of the relatively low number of loci analyzed. The proximity of the Basque data to our populations can be explained by the fact that Argentina is the country with the highest rate of Basque immigration in the world (Eusko Jaurlaritza-Gobierno Vasco, 2000; Ezkerro, 2002), while the similarity of the Moroccan sample might reflect an ancient relationship between North Africa and southern Spain.
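Reynolds' distance is defined as D = -ln(1 - θ), where θ is a coancestry coefficient estimated from allele frequencies. The sketch below uses a simplified large-sample multilocus form, θ = Σ_l Σ_u (x_lu - y_lu)² / [2 Σ_l (1 - Σ_u x_lu y_lu)], for biallelic markers, without any sample-size correction; it is not the exact computation behind Figure 2, and the population names and frequencies are hypothetical. A pairwise matrix of such distances is the kind of input an MDS program such as Past takes.

```python
import math


def reynolds_distance(freqs_x, freqs_y):
    """Reynolds' distance between two populations typed for the same biallelic loci.

    freqs_x, freqs_y: frequency of the same allele at each locus in populations X and Y.
    Uses a large-sample coancestry estimate (no sample-size correction).
    """
    if len(freqs_x) != len(freqs_y):
        raise ValueError("populations must be typed for the same loci")
    numerator = 0.0
    denominator = 0.0
    for x, y in zip(freqs_x, freqs_y):
        xs, ys = (x, 1 - x), (y, 1 - y)  # both alleles of the biallelic locus
        numerator += sum((a - b) ** 2 for a, b in zip(xs, ys))
        denominator += 2 * (1 - sum(a * b for a, b in zip(xs, ys)))
    if denominator == 0:  # every locus fixed for the same allele in both samples
        return 0.0
    theta = numerator / denominator
    if theta >= 1:  # fixed differences at every informative locus
        return math.inf
    return -math.log(1 - theta)


def distance_matrix(pops):
    """Pairwise Reynolds distances for a {name: [freqs, ...]} dictionary."""
    names = list(pops)
    return names, [[reynolds_distance(pops[a], pops[b]) for b in names] for a in names]


if __name__ == "__main__":
    # Hypothetical insertion frequencies at three Alu loci
    pops = {
        "Pop1": [0.10, 0.50, 0.30],
        "Pop2": [0.15, 0.45, 0.35],
        "Pop3": [0.60, 0.20, 0.90],
    }
    names, d = distance_matrix(pops)
    for name, row in zip(names, d):
        print(name, [round(v, 4) for v in row])
```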
Each of the analyzed populations presented a different LD pattern: Posadas showed the highest LD, followed by Corrientes (Tables S12-S15). Possible internal structure within each population was examined with STRUCTURE. The results are shown in Figure 3, where similarity can be observed between the Corrientes and Eldorado B populations, probably due to their common origin. The INDELs showed a marked differentiation of Eldorado A from the other populations, and the Fst values obtained (Table 3) agree with the differentiation shown in the STRUCTURE plots. A comparison with the eight previously analyzed world populations, using the same parameters, is shown in Figures 4 and 5. Both figures show a clear differentiation of the Bolivian Native populations from all the others. Figure 4 shows a close relationship between Eldorado A and the populations of European origin, while Eldorado B seems to share much less European genetic background. Finally, the forensic parameters PIC, PE, PDf, and PDm were calculated for the analyzed populations using the 30 markers (Tables S16-S18). Among the SNP markers, the highest values were found for rs1299087, rs933315, and rs1781104 in Corrientes; rs5986751, rs2209420, and rs933315 in Posadas; rs5986751, rs1991961, and rs1781104 in Eldorado A; and rs5964206, rs933315, rs1991961, rs1781104, and rs149910 in Eldorado B.
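Because males are hemizygous, each male sample is an observed X-chromosome haplotype, so LD between two biallelic loci can be summarized directly without phasing. The sketch below computes the usual D, signed D', and r² statistics from male haplotypes coded 0/1; it shows only the summary measures, not the likelihood-ratio significance test used by programs such as ARLEQUIN, and the function name and haplotypes are hypothetical.

```python
def ld_from_male_haplotypes(haplotypes):
    """Pairwise LD between two biallelic X-linked loci from male haplotypes.

    haplotypes: list of (allele_locus1, allele_locus2) tuples coded 0/1, one per male.
    Returns (D, signed D', r^2).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # haplotype 1-1
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    r2 = d * d / denom
    # Normalize D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max, r2


if __name__ == "__main__":
    # Hypothetical haplotypes of ten males at two SNPs
    haps = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]
    print(ld_from_male_haplotypes(haps))
```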

DISCUSSION
In this work we analyzed the differentiation among populations from three localities in north-eastern Argentina: the capital cities of Posadas and Corrientes, and the city of Eldorado. Previous studies include autosomal SNPs within coding regions for Corrientes city (Lopez-Soto and …) and X-chromosome STRs for Corrientes and Posadas (Glesmann et al., 2011). The report on autosomal SNPs showed clear differences between Corrientes and other populations of the world, while the report on X-chromosome STRs showed some differentiation between Corrientes and Posadas (Di Santo et al., 2015). Information on uniparental variability is scarce, although preliminary data on Corrientes city showed a high Amerindian mitochondrial DNA component and a very low proportion of native Y chromosomes. Because females carry two X chromosomes and males only one, females contribute twice as much as males to the X-chromosome variation of current populations; we can therefore infer that an important native component is present in X-chromosome variation at this location (Golpe et al., 2017a,b). The genetic diversity found in this work in the four populations (Posadas, Corrientes, and the two Eldorado groups) was similar to that reported in the literature (Callinan et al., 2003; Athanasiadis et al., 2007; Gayà-Vidal et al., 2009; Rocañín-Arjó et al., 2013), except for two Alu insertions that were monomorphic in some populations and the Alu insertion Ya5DP13, which was invariant, as in previous data for European populations (Callinan et al., 2003; Athanasiadis et al., 2007). According to the INDEL results (Figure 2B), the Eldorado population was clearly separated into two subpopulations (dubbed A and B), as reflected in the number of ancestral populations obtained (k = 2). Figure 2B also showed a separation between Eldorado A on one side and Posadas and Corrientes on the other; this differentiation was also evident in the rest of the comparisons.
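The statement that females influence X-chromosome variation twice as much as males can be expressed quantitatively. Under a common simplification (a single admixture event followed by random mating, no selection), the expected long-run ancestry from a given source on the X chromosome is (2·s_f + s_m)/3, whereas on autosomes it is (s_f + s_m)/2, where s_f and s_m are the fractions of female and male founders from that source. The contributions in the example are hypothetical values chosen only to mimic the pattern described for Corrientes (high Amerindian mtDNA, low native Y chromosome); they are not estimates from these data.

```python
def x_ancestry(s_female: float, s_male: float) -> float:
    """Expected long-run X-chromosome ancestry from one source population.

    Two thirds of X chromosomes are carried by females, one third by males.
    """
    return (2 * s_female + s_male) / 3


def autosomal_ancestry(s_female: float, s_male: float) -> float:
    """Expected autosomal ancestry: both sexes contribute equally."""
    return (s_female + s_male) / 2


if __name__ == "__main__":
    # Hypothetical founder contributions of a Native American source
    s_f, s_m = 0.9, 0.1
    print(f"X: {x_ancestry(s_f, s_m):.3f}  autosomes: {autosomal_ancestry(s_f, s_m):.3f}")
    # prints X: 0.633  autosomes: 0.500
```

With a strongly female-biased native contribution, the native share is higher on the X chromosome than on autosomes, which is the intuition behind the inference in the text.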
Consistently, Eldorado A participants reported a specific European ancestry, while Posadas, Corrientes, and Eldorado B participants reported ancestry from scattered places in Europe or an unknown origin. This suggests possible endogamy within Eldorado A. Such a process has been observed in other immigrant communities, in which individuals tend to marry partners who share their cultural customs (Junta de Estudios Históricos del Municipio de Eldorado, 2016). However, Eldorado A showed the lowest LD, although this result could be biased by the small sample size. The high LD obtained for Posadas is harder to explain, as this capital city was expected to have received a considerable migratory flow, in agreement with its average OH being slightly (though not significantly) higher than the expected heterozygosity (EH). This could reflect internal structure not detected by the HW analysis; collecting further data from this population might help to clarify the issue. The four populations studied did not show strong differentiation from one another in Alu insertion variation, reflecting that they share, at least partially, a common origin in European immigration. This absence of marked interpopulation differences is consistent with the lower mutation rate of Alu insertions, which makes them, compared with other markers, more useful for analyzing relationships between populations further back in time (see e.g., Callinan et al., 2003; Athanasiadis et al., 2007; Gayà-Vidal et al., 2009, for structure analyses of populations from other continents and/or countries).
Concerning intercontinental comparisons, the results clearly indicated that the Eldorado A population was closer to Europeans. This higher European ancestry is consistent with the self-reported origin of the participants, who in most cases answered that their parents or grandparents came from Switzerland or Germany. Comparisons with data from sub-Saharan Africa (Ivory Coast) separated that sample from our four populations. This might indicate a low level of slave trade in this region in the past, unlike the more substantial flow of slaves across the central and north-west regions of Argentina (Morales and Alfaro, 2000).
A notable differentiation was found between the geographically closer Native Americans from Bolivia and our four populations (Figures 4, 5). The differences might be partly caused by genetic drift acting on Native groups throughout South America, which has generated distinctive genetic variation in each Native community (Cavalli-Sforza et al., 1996; Zago et al., 1996). There are several examples of Native communities that underwent a considerable reduction of variability as a result of isolation and small size, such as the Native peoples of the Gran Chaco (Catanesi et al., 2007; Glesmann et al., 2013) and Amazonia (Zago et al., 1996). However, genetic drift was not the only process that increased the separation among different tribes. On the one hand, the Guaraní predominated in North-east Argentina in the past. Between 1609 and 1767 they were brought into Jesuit evangelization missions, generating an admixed context that integrated them into European civilization. At the beginning of the evangelization process the Guaraní were separated from the Spanish in full-service towns, and the Jesuits promoted a technical organization. The rights of the Guaraní people were progressively guaranteed, favoring their population growth. After the Jesuit priests were expelled from South America, the Guaraní, instead of returning to the forest, started integrating with the European colonies, giving rise to the current admixed population. Nowadays, some Guaraní communities remain in Misiones province, but those of Corrientes province gradually lost their seminomadic habits and their identity, and finally integrated into the urban populations (Heguy, 2012). On the other hand, the Bolivian Andes region was part of the Inca empire, which reached its limit in the north-eastern Gran Chaco, where the Incas faced the native warriors living there (Ibanez, 2008).
It is therefore reasonable to find differentiation between the Aymará-Quechua and the communities of North-east Argentina, since the latter underwent a process called Guaranization and currently have a considerable Guaraní component of Mbyá origin, which is neither Aymará nor Quechua (Magrassi, 1989). The X chromosome binary markers used in this work proved informative for differentiating the populations of North-east Argentina, where different levels of admixture with Native people could be observed. Future analyses including a wider range of X chromosome markers in newly sampled individuals should clarify some unresolved aspects. So far, the Alu insertions proved useful for comparing distantly related populations, as they clearly revealed the European origin of Eldorado A. INDEL and SNP variation, in turn, was more informative for revealing differentiation within North-east Argentina, as expected.