Migration Route Out of Africa Unresolved by 225 Egyptian and Ethiopian Whole Genome Sequences

Population structure is a fundamental part of population genetics. In coalescent theory, the impact of population structure or a restriction of gene flow is well-studied (Hudson, 1990; Nordborg, 2003). Admixture is inter-mating between previously isolated populations, although the biological characteristics of genetically diverged parental populations can be debated. The pairwise sequentially Markovian coalescent model (Li and Durbin, 2011) and the multiple sequentially Markovian coalescent model (Schiffels and Durbin, 2014), both recently developed methods designed for whole genome sequence analysis, do not model admixture in a formal sense. However, simulations have shown that these models are sensitive to admixture (Li and Durbin, 2011), because admixture increases heterozygosity and consequently appears as an increase in the effective population size. The issue of ancient vs. recent admixture, and the actual time depths, is of concern due to the potentially obscuring effects of a range of evolutionary processes. Both of these models divide time into intervals, theoretically permitting detection of events at different time depths. Consequently, these genetic models have the potential to complement anthropological and archeological studies of the distant past.
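The claim that admixture increases heterozygosity can be illustrated with a minimal single-locus sketch. It assumes one biallelic locus, Hardy-Weinberg proportions within each population, and a single pulse of admixture with proportion m from the first parental population; the function names and allele frequencies are hypothetical and do not come from the cited models. Under these assumptions, the heterozygosity of the admixed population exceeds the admixture-weighted mean of the parental heterozygosities by 2m(1 - m)(p1 - p2)^2, which is zero only when the parental allele frequencies are equal.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def admixed_heterozygosity(p1, p2, m):
    """Heterozygosity after mixing parental frequencies p1 and p2.

    m is the (hypothetical) admixture proportion from parental population 1.
    """
    p_admixed = m * p1 + (1.0 - m) * p2
    return heterozygosity(p_admixed)


def weighted_parental_heterozygosity(p1, p2, m):
    """Admixture-weighted mean of the two parental heterozygosities."""
    return m * heterozygosity(p1) + (1.0 - m) * heterozygosity(p2)


def admixture_excess(p1, p2, m):
    """Excess heterozygosity due to parental allele frequency divergence."""
    return 2.0 * m * (1.0 - m) * (p1 - p2) ** 2


# Illustrative values only: diverged parental frequencies, equal mixing.
p1, p2, m = 0.1, 0.7, 0.5
h_admixed = admixed_heterozygosity(p1, p2, m)
h_parental = weighted_parental_heterozygosity(p1, p2, m)
print(h_admixed, h_parental, admixture_excess(p1, p2, m))
```

The sketch describes only the expected heterozygosity at one locus; it does not reproduce how either coalescent model converts heterozygosity along the genome into effective population size estimates over time intervals.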

about what constitutes African. Progress has been made: analyses of the genetic structure of autosomal data from global surveys of thousands of individuals have revealed multi-way ancestral compositions at a sub-continental level of resolution, likely reflecting evolution of local or regional populations (Tishkoff et al., 2009; Shriner et al., 2014). Three limitations of these types of studies are (1) the extent to which convenience samples are used, in comparison to a complete catalog of all ethno-linguistic or biogeographical groups (since ethno-linguistic groups have varying time depths), (2) the extent to which populations in such studies are arbitrary constructs (Gannett, 2003), and (3) the appropriateness of divergence by isolation to model the genealogical relationships among ancestries. Given these caveats, the ancestral compositions of samples of modern Egyptians and Ethiopians, as well as the reference CEU sample, have been previously estimated (Shriner et al., 2014) and are summarized in Table 1. Notably, the two Egyptian samples we used include a low level of Cushitic ancestry but no Nilo-Saharan ancestry. This absence implies a lack of coverage of the full geographical range of Egyptians, including Nubians who today speak a Nilo-Saharan language (Dobon et al., 2015). There is also no evidence of coverage of individuals representing the Egyptian or Coptic language. Similarly, Figure 1B of Pagani et al. (2015) depicts "East African" ancestry, similar to the ancestry of the Gumuz (who speak a Nilo-Saharan language), constituting <10% of the Egyptians.
Regardless of the labels given to ancestries, which typically are presumed to be geographically or linguistically based, there are two problems with the data of Pagani et al. (2015). One problem is that their sample of modern Egyptians, like ours, does not reflect all modern Egyptians. Furthermore, it has not been established that original Nile Valley inhabitants are in some sense covered. The genetic compositions of core Afroasiatic (including Egyptian) and Nilo-Saharan speakers are not known fully. The authors chose a subset of five Ethiopian samples from a larger set (Pagani et al., 2012) on the basis of maximizing genetic and cultural diversity. This approach led to a choice of samples all containing substantial ancestral heterogeneity (Table 1), which confounds inference. We believe a better design principle for sample selection is to minimize ancestral heterogeneity, e.g., as used by Tishkoff et al. (2009) in their supervised clustering analysis. Of the Pagani et al. (2012) samples, better choices are Somali, rather than Ethiopian Somali, to represent Cushitic ancestry; Ari Blacksmith, rather than Wolayta, to represent Omotic ancestry; and South Sudanese, rather than Gumuz, to represent Nilo-Saharan ancestry (Table 1). Additionally, samples from Arabian, Levantine, and Maghrebi populations should have been included.
A second problem is that, of all the ancestries present in the Egyptian and Ethiopian samples, ancestry unique to and common in Ethiopians who currently speak an Omotic language is the most divergent (Shriner et al., 2014). Consequently, both "African" and "non-African" genomes are expected a priori to be more similar to the "African" component of Egyptian genomes than to the "African" component of Ethiopian genomes, solely on the basis of genetic distance and independent of genealogical relationships among ancestries. To see this, suppose that East African ancestry in the Egyptians and Ethiopians is identical. Then, comparison of "non-Africans" to this East African component will be inconclusive. On the other hand, suppose that East African ancestry is a combination of Nilo-Saharan and Cushitic ancestries in the Egyptians with an additional Omotic contribution in the Ethiopians (Pagani et al., 2012). Then, given that Omotic ancestry is essentially restricted to Ethiopia, "non-Africans" will be more similar to the Egyptians' East African ancestry than to the Ethiopians' East African ancestry.
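The second scenario can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that the distance from a "non-African" reference genome to a mixed East African component is the proportion-weighted mean of its distances to the constituent ancestries. All distances and proportions are hypothetical; the only property taken from the argument above is that Omotic ancestry is the most divergent.

```python
# Hypothetical genetic distances from a "non-African" reference genome to each
# ancestry. The numbers are invented; only the ordering (Omotic largest)
# reflects the argument in the text.
DISTANCE_TO_REFERENCE = {"Nilo-Saharan": 0.10, "Cushitic": 0.08, "Omotic": 0.15}


def component_distance(proportions, distances=DISTANCE_TO_REFERENCE):
    """Proportion-weighted mean distance from the reference to a component.

    proportions maps ancestry name to its share of the component; the shares
    are normalized so that they need not sum to one.
    """
    total = sum(proportions.values())
    return sum(share * distances[name] for name, share in proportions.items()) / total


# Hypothetical compositions of the "East African" component in each sample.
egyptian_east_african = {"Nilo-Saharan": 0.5, "Cushitic": 0.5}
ethiopian_east_african = {"Nilo-Saharan": 0.3, "Cushitic": 0.4, "Omotic": 0.3}

d_egyptian = component_distance(egyptian_east_african)
d_ethiopian = component_distance(ethiopian_east_african)
print(d_egyptian < d_ethiopian)
```

With any Omotic contribution restricted to the Ethiopian component, the Ethiopian distance is larger whatever the genealogical relationships among the ancestries, which is why the comparison cannot by itself discriminate between routes. Under the first scenario, identical compositions give identical distances.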
With respect to time, the authors assume that "modern African populations are representative of those at the time of the exit" (Pagani et al., 2015). This assumption may be problematic because it rests on typological thinking: it treats geographically or linguistically defined populations as if the same genetic patterns would manifest in any sample from the geographical range or branch of the language family across time. More directly, it would have been useful if the authors had estimated the split time between the African components of the modern Egyptian and Ethiopian genomes. If this split time postdates Out of Africa, then we may infer that the African ancestors of the modern Egyptians and Ethiopians were not genetically differentiated at the time of exit and therefore that a northern route and a southern route are indistinguishable.
There are five lines of evidence against the assumption of representativeness. One, the authors assessed the split times of the "African" components of the modern Egyptian and Ethiopian genomes compared to a "non-African" CEU genome. Despite overlapping time intervals and a lack of formal statistical assessment, the authors inferred a higher similarity between "non-African" and Egyptian "African" components; we find the results to be inconclusive. Two, the authors assessed the split times of the "African" components of the modern Egyptian and Ethiopian genomes compared to a West African YRI genome and an East African Gumuz genome. The split times compared to the YRI genome were 21,000 and 37,000 years ago for Egyptians and Ethiopians, respectively, and even more recent compared to the Gumuz genome. Thus, the African ancestors of the West African YRI, the East African Gumuz, and the "African" components of the modern Egyptian and Ethiopian genomes had not split at the time of exit. Three, reconstruction of the phylogenetic history of autosomal ancestries showed that none of the autosomal ancestries of modern Egyptians and modern Ethiopians had yet diverged at the time of exit (Shriner et al., 2014). Four, in the authors' Supplement, the "African" component includes Y haplogroups A3b2, B2, and E, whereas the "non-African" component includes descendants of Y haplogroup F (specifically G, J, L, R, and T), which is not descended from A3b2, B2, or E. Five, also in the authors' Supplement, the mitochondrial DNA haplogroup L3, the ancestor of M and N haplogroups, is present in both modern Egyptians and modern Ethiopians. Thus, both Y and mitochondrial DNA are inconclusive.
[Table 1 note (Shriner et al., 2014): The Egyptian samples were described in Henn et al. (2012) and Behar et al. (2010); the Gumuz, Amhara, Oromo, Wolayta, Ethiopian Somali, Somali, Ari Blacksmith, and South Sudanese samples were described in Behar et al. (2010) and Pagani et al. (2012); and the CEU sample was described by The International HapMap Consortium (2005).]
Ancient DNA might help to resolve the question of the route out of Africa, if temporally appropriate specimens can be found. The individual named Bayira, discovered in the Mota Cave in Ethiopia, was dated to ∼4500 years ago (Gallego Llorente et al., 2015), which is not old enough. Also, Bayira was determined to be ancestrally homogeneous for Omotic ancestry (Gallego Llorente et al., 2015). By comparison, our data set contains the equivalent of 69 individuals ancestrally homogeneous for Omotic ancestry (Shriner et al., 2014), reflecting the ability of ancestry analysis to disentangle recent admixture. Taken together, the autosomal, Y chromosome, and mitochondrial DNA data support the conclusion that the indigenous African components of the specific samples of modern Egyptians and modern Ethiopians studied by Pagani et al. (2015) are uninformative with respect to the origin of non-Africans. The available data suggest that the separation of ancient Egyptians and ancient Ethiopians postdates Out-of-Africa. In the absence of ancient DNA specimens, estimation of genetic profiles of core Afroasiatic and Nilo-Saharan speakers requires phylogenetic techniques to reconstruct ancestral states.