Comprehensive Insights Into Forensic Features and Genetic Background of Chinese Northwest Hui Group Using Six Distinct Categories of 231 Molecular Markers

The Hui minority is predominantly composed of Chinese-speaking Islamic adherents distributed throughout China, of which the individuals are mainly concentrated in Northwest China. In the present study, we employed the length and sequence polymorphisms-based typing system of 231 molecular markers, i.e., amelogenin, 22 phenotypic-informative single nucleotide polymorphisms (PISNPs), 94 identity-informative single nucleotide polymorphisms (IISNPs), 24 Y-chromosomal short tandem repeats (Y-STRs), 56 ancestry-informative single nucleotide polymorphisms (AISNPs), 7 X-chromosomal short tandem repeats (X-STRs), and 27 autosomal short tandem repeats (A-STRs), into 90 unrelated male individuals from the Chinese Northwest Hui group to comprehensively explore its forensic characteristics and genetic background. Total of 451 length-based and 652 sequence-based distinct alleles were identified from 58 short tandem repeats (STRs) in 90 unrelated Northwest Hui individuals, denoting that the sequence-based genetic markers could pronouncedly provide more genetic information than length-based markers. The forensic characteristics and efficiencies of STRs and IISNPs were estimated, both of which externalized high polymorphisms in the Northwest Hui group and could be further utilized in forensic investigations. No significant departure from the Hardy–Weinberg equilibrium (HWE) expectation was observed after the Bonferroni correction. Additionally, four group sets of reference population data were exploited to dissect the genetic background of the Northwest Hui group separately from different perspectives, which contained 26 populations for 93 IISNPs, 58 populations for 17 Y-STRs, 26 populations for 55 AISNPs (raw data), and 109 populations for 55 AISNPs (allele frequencies). As a result, the analyses based on the Y-STRs indicated that the Northwest Hui group primarily exhibited intimate genetic relationships with reference Hui groups from Chinese different regions except for the Sichuan Hui group and secondarily displayed close genetic relationships with populations from Central and West Asia, as well as several Chinese groups. However, the AISNP analyses demonstrated that the Northwest Hui group shared more intimate relationships with current East Asian populations apart from reference Hui group, harboring the large proportion of ancestral component contributed by East Asia.


INTRODUCTION
For recent years, the extensive applications of DNA analysis technologies have made it a crucial tool in forensic investigations. To date, the genomic DNA from biological samples is predominantly dissected by PCR and capillary electrophoresis (CE)-based method to reveal the length variations in genetic markers, such as short tandem repeats (STRs). Further, the DNA sequencing technology also serves as an important role in providing comprehensive information of the target DNA in forensic applications. Conventional Sanger sequencing, initially introduced in the 1970s, has enabled enormous progress in the fields of molecular biology, genomics, and genetics and been regarded as the standard sequencing technology in the forensic investigations (Sanger et al., 1977). Yet, the shortcomings of Sanger sequencing technology, such as low throughput and sensitivity, have hampered its utilization in the more in-depth and intricate genome analyses, which facilitate the exploration of other high-throughput DNA technologies in forensic studies (Fullwood et al., 2009). Massively parallel sequencing (MPS), or next-generation sequencing (NGS), has become an emerging tool commonly utilized in forensic genetic fields (Børsting and Morling, 2015;Li et al., 2017). MPS technology has advantages such as simultaneous sequencing of multiple types of genetic molecular markers and detecting samples at an extraordinarily high throughput capacity, which make it possible to yield forensic data containing more information in a single reaction (England et al., 2020). In addition, the polymorphisms of sequence variations contained in different genetic molecular markers which are undetectable by traditional CE technology, like STRs, are easily identified using MPS technology platforms, thus increasing the possibility of discovering new STR alleles (Gettings et al., 2015;Churchill et al., 2016;Novroski et al., 2016). MPS platforms also maintain the conventional abilities to identify STRs length polymorphisms, facilitating the compatibility of currently forensic DNA data generated by PCR and CE-based platform (Parson et al., 2016;Bruijns et al., 2018).
The ForenSeq ™ DNA Signature Prep Kit (Verogen Inc., San Diego, CA, United States), a newly developed MPS-based commercial kit, is a length and sequence polymorphismsbased typing system for three kinds of STRs and three types of single nucleotide polymorphisms (SNPs), which can simultaneously detect 231 genetic molecular markers in a single reaction on MiSeq FGx ™ Forensic Genomics System (Verogen Inc., San Diego, CA, United States). This kit provides two different primer mixes, including primer mixes A and B. Primer mix A is designed to detect amelogenin, 27 autosomal STRs (A-STRs), 24 Y-chromosomal STRs (Y-STRs), 7 X-chromosomal STRs (X-STRs), and 94 identity informative SNPs (IISNPs); and primer mix B contains primer mix A plus the primers for 22 phenotypic informative SNPs (PISNPs) and 56 biogeographical ancestry informative SNPs (AISNPs) (Jäger et al., 2017). To date, the system performance of the ForenSeq ™ DNA Signature Prep Kit has been comprehensively evaluated by investigating the reproducibility, sensitivity, concordance, casework-type sample, and inter-laboratory comparison and validation, which has proven it to be a promising tool in forensic applications over recent years (Just et al., 2017;Xavier and Parson, 2017;Köcher et al., 2018). Further, several studies have demonstrated that this kit is suitable for challenging samples such as forensically degraded and mixed samples from crime scenes (Fattorini et al., 2017;Sharma et al., 2020), and also performs well in the kinship analyses , phenotypic and biogeographical ancestry predictions (Sharma et al., 2019), and in the exploratory studies of population genetics (Delest et al., 2020).
The Chinese Hui ethnic group is one of the national minorities officially recognized by the People's Republic of China and widespread throughout China, including 34 provincial-level administrative regions. According to the population distribution, the Chinese Hui group is more concentrated in Northwest China. Intriguingly, the Hui group occupies a special existence among the Chinese ethnic minorities, of which the individuals are portrayed as Muslims who appear culturally and linguistically similar to the Han population. Currently, population genetic studies concentrated on limited kinds of genetic markers have been conducted on Hui groups from Chinese different regions. For example, Zou et al. detected the genetic polymorphisms of 30 insertion/deletion (InDels) in the Guangxi Hui group (Zou et al., 2020). HLA class I polymorphisms were investigated in the Hui group from Chinese Qinghai province by Hong et al. (2007). The genomic makeup and ancestry background of the Hui group from Sichuan province were explored using over 700K SNPs by Liu et al. (2021). Guo et al. investigated the population structure of the Hui group from Chinese Liaoning province based on 17 Y-STR loci (Guo, 2017). Yao et al. and Liu et al. explored the genetic background of the Hui group from Chinese Gansu province using 15 autosomal STRs (Yao et al., 2016) and 27 Y-STRs  loci, respectively. The genetic polymorphisms of the Hui group from Chinese Ningxia Hui (NXH) autonomous region were estimated by various research teams using 15 STRs , 24 Y-STRs (Zhu et al., 2014), 30 InDels , 12 X-STRs , and 17 Y-STR loci (Guo et al., 2008), respectively. Thirty InDels , 30 AISNPs , 22 STRs , and 39 ancestryinformative marker (AIM) InDel loci (Xie et al., 2020) were also separately applied in other Hui groups from Northwest China.
In fact, different opinions on the genetic origin and ancestral history of the Chinese Hui group have existed for years. Two kinds of hypotheses have been proposed to infer the historical formation of the Chinese Hui group, which are demic diffusion and cultural diffusion hypotheses (Liu et al., 2021). In addition, due to the limited types of genetic markers utilized separately in previous studies mentioned above, it might be more necessary to simultaneously enroll various types of genetic markers to provide more extensive information for exploring the genetic structure and ancestral origin of the Chinese Hui group. Thus, in the present study, we implemented 231 genetic markers (amelogenin,94 IISNPs,22 PISNPs,and 56 AISNPs) from the ForenSeq ™ DNA Signature Prep Kit into the Hui group from Northwest China. Both length-and sequencebased polymorphisms of genetic markers were utilized into 90 unrelated male individuals recruited from the Northwest Hui group, with the intention of comprehensively detecting the forensic characteristics and genetic background of the Northwest Hui group and subsequently enriching the genetic information of the Chinese populations.

Sample Information
In total, 90 blood samples from unrelated healthy Hui male individuals from Northwest China were collected. All the participants provided their written informed consents prior to sample collection. The present research was conducted according to the ethical guidelines of the Xi'an Jiaotong University Health Science Center and further authorized by the Ethical committees of the Xi'an Jiaotong University Health Science Center (approval number: 2019-1039; 2020-1382).

Library Preparation
Library preparation was performed in accordance with the recommendation of the ForenSeq ™ DNA Signature Prep Kit (Illumina Inc., CA, United States) (Illumina, 2015). The first stage was to amplify and tag the DNA targets. One disc containing a 1.2 mm-diameter blood sample was directly amplified without DNA extraction in the ForenSeq ™ Sample Plate. PCR was conducted on the GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, CA, United States) according to the following parameters: 98°C for 3 min; eight cycles of 96°C for 45 s, 80°C for 30 s, 54°C for 2 min, and 68°C for 2 min; 10 cycles of 96°C for 30 s, 68°C for 3 min; 68°C for 10 min; and hold at 10°C. The first-round PCR products were subsequently utilized in the second-stage PCR to enrich the targets. The index adapters and sequences required for cluster amplification were also added in the second-stage PCR. The detailed PCR parameters were as follows: 98°C for 30 s; 15 cycles of 98°C for 20 s, 66°C for 30 s, and 68°C for 90 s; 68°C for 10 min; and hold at 10°C. All samples were amplified with DNA primer mix B. The amplified DNA libraries were then purified from the remaining reaction components using purification beads, and the concentrations of the libraries were normalized to ensure a consistent cluster density. The normalized libraries required to sequence on the same flow cell were then mixed in equal volumes during the library pooling stage. The mixed libraries were then diluted in a hybridization buffer, added the human sequencing control and denatured at 96°C in preparation for sequencing. Finally, after denaturation and dilution, the mixed libraries were sequenced on the MiSeq Desktop Sequencer using the MiSeq FGx Forensic Genomics System (Illumina Inc., CA, United States) (Illumina, 2015).

Data Analyses and Interpretation
The generated data were processed using the ForenSeq ™ Universal Analysis Software based on the default thresholds. In detail, the analytical threshold (AT), interpretation threshold (IT), stutter filter (SF), and intra-locus balance (IB) were utilized to estimate the data quality (Illumina, 2015;Köcher et al., 2018). The optimized AT values were 1.5% for all loci except for DYS635 (3.3%), DYS389II (5%), and DYS448 (3.3%) loci, which represented the lower boundary for the valid results accounting for the entire read count per locus. The default IT values were 4.5% for all loci apart from DYS635 (10%), DYS389II (15%), and DYS448 (10%) loci, indicating the upper boundary of the uncertainty range. When the resulting values are between the AT and IT, the user should identify whether an actual variant has occurred. The SF values for all STR loci were extended from 7.5% to 50% with the average value as 19%. The general settings of IB values were 60% for STRs and 50% for SNPs. As for the valid sequencing depth, the minimum depths were set as 10× for STRs and 5× for SNPs. For A-STRs, the parameters for both sequenceand length-based polymorphisms were calculated using STRAF software v1.0.5 (Gouy and Zieger, 2017), including observed heterozygosity (H obs ), expected heterozygosity (H exp ), polymorphism information content (PIC), power of discrimination (PD), exclusion probability (PE), match probability (PM), typical paternity index (TPI), and the Hardy-Weinberg equilibrium (HWE). Specifically, the H obs was utilized to measure the genetic polymorphic degree of a specific loci, which is described as the observed proportion of heterozygous genotypes detected in a population (Sheriff and Alemayehu, 2018). The H exp , a fundamental statistical parameter used to estimate genetic diversity within populations, is referred to as the expected proportion of heterozygotes under HWE (Nei, 1973). The PIC has been reported as a statistical indicator for measuring the polymorphisms of genetic markers in a population (Botstein et al., 1980;Shete et al., 2000), which is actually determined using heterozygosity and number of alleles (Aljumaah et al., 2012). The PM is defined as the probability of a match between two unrelated individuals selected at random (Fisher, 1951;Malaspinas et al., 2011). Closely related to PM, the PD is the probability of distinguishing two unrelated individuals (Tillmar, 2010). The PE can be treated as the probability of Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 705753 excluding a man falsely indicated as the biological father, which is a measure of efficiency in paternity testing (Cifuentes et al., 2006;Tillmar, 2010;Vandeputte, 2012). For X-STRs and Y-STRs, the gene diversity (GD), PM, haplotype diversity (HD), and haplotype discrimination capacity (DC) were calculated based on the methods of previous reports (Nei, 1987;Liu et al., 2020). In terms of the IISNPs, the STRAF software v1.0.5 was again utilized to estimate the forensic parameters, including H obs , H exp , PIC, PD, PE, PM, TPI, and HWE.

Population Genetic Analyses
A total of four group sets of previously published population data were exploited as references to perform population genetic analyses and further dissect the ancestral components of the Northwest Hui group. The employed data are as follows: raw data of 26 populations for 93 IISNPs enlisted from the 1000 Genomes Project (https://www.internationalgenome.org/data), raw data of 17 Y-STRs for 58 populations collected from previous studies or the Y Chromosome Haplotype Reference Database (YHRD) database (https://yhrd.org/), raw data of 26 populations for 55 AISNPs gathered from the 1000 Genomes Project, and allele frequencies of 109 populations for 55 AISNPs assembled from previously published studies (Kidd et al., 2014;Pakstis et al., 2015;Pakstis et al., 2017;Pakstis et al., 2019a). The detailed reference population information and citations are listed in Supplementary Table S1. Initially, population genetic analyses of the Northwest Hui group and the 26 reference populations (Auton et al., 2015) were performed based on the 93 overlapping IISNPs. A heatmap of allele frequencies for these 93 IISNPs in the Northwest Hui and 26 reference populations was plotted using the R software v3.3 (Team, 2016), which was hierarchically clustered based on the Euclidean distances. Principal component analysis (PCA) of these populations was conducted using the XLSTAT program (https:// www.xlstat.com/en/) based on the allele frequencies of these 93 IISNPs. We also conducted a PCA plot of these populations at the individual level using the PLINK software v1.9 based on the raw data of these 93 IISNPs (Chang et al., 2015). The pairwise D A genetic distances of these populations were calculated using the DISPAN program (http://www.personal.psu.edu/nxm2/dispan2. htm), and a rooted neighbor-joining (NJ) tree was further constructed using the MEGA software v6.0 (Tamura et al., 2013) based on the D A genetic distances. Moreover, the population-specific divergence values of each loci in different intercontinental population sets were estimated using the informativeness for assignment (I n ) statistic based on the allele frequencies of 93 IISNPs (Phillips, 2015). The I n statistical analysis was originally introduced by Rosenberg et al. (2003), with the purpose of determining the amount of ancestry information provided by biallelic or multiallelic markers, which could be further utilized to evaluate the effectiveness of genetic markers in differentiating various populations. Next, the population genetic relationships between the Hui group and 58 worldwide populations were revealed from a paternal perspective based on 17 overlapping Y-STRs. The pairwise R st values based on Y-STR haplotypes were generated using an online AMOVA tool (https://yhrd.org/amova). R st is developed based on a stepwise mutation model to assess the genetic differentiations among populations (Balloux and Lugon-Moulin, 2002). Both heatmaps and histograms of the R st values were performed using R software v3.3 (Team, 2016). Multidimensional scaling (MDS) was yielded based on R st values using the Statistical Package for the Social Sciences (SPSS) 16.0 software. Eventually, the ancestral structure of the Northwest Hui group based on 55 AISNPs was explored using two sets of population data as the references. The PCA plot on individual level and STRUCTURE analysis were performed using the XLSTAT program (https://www.xlstat.com/en/) and ADMIXTURE software v1.3 (Alexander et al., 2009), respectively, based on the 55 AISNPs genotypes of the Northwest Hui group and the 26 reference populations. The ancestral component analysis of the Northwest Hui group was performed using ADMIXTURE software v1.3 (Alexander et al., 2009) on the basis of 55 AISNPs genotypes. The PCA, pairwise D A genetic distances, and the rooted NJ tree of the Northwest Hui and other 109 reference populations were conducted using SPSS 16.0 software, the DISPAN program, and MEGA software v6.0 (Tamura et al., 2013), respectively, using the allele frequencies of 55 overlapping AISNPs.

MPS Results for Three Genres of STRs and Three Kinds of SNPs
The MPS genetic data of the 90 male Hui individuals were genotyped using the primer mix B reagent of the ForenSeq ™ DNA Signature Prep Kit. The genotyping results for three genres of STRs (27 A-STRs, 24 Y-STRs, and 7 X-STRs) based on both length-and sequence-based polymorphisms are presented in Supplementary Table S2. The total success rates of 27 A-STRs, 24 X-STRs, and 7 Y-STRs were 99.96%, 99.84%, and 99.81%, respectively. A total of 54 out of 58 STR loci were efficiently genotyped with a success rate of 100%, while the success rates of the remaining PentaE, DYS392, DYS448, and DXS7132 loci were 98. 89, 98.89, 96.67, and 98

Forensic Parameters for Various Genres of Genetic Markers
For the Northwest Hui group, the allelic polymorphisms and forensic parameters of 27 A-STR loci calculated on the basis of sequence-and length-based polymorphisms are presented in Supplementary Table S5. The number of length-based alleles fluctuated from 5 at D3S1358 and D4S2408 loci to 15 at D18S51, FGA, and Penta E loci, whereas the number of sequence-based alleles varied from 6 at D22S1045, TH01, and TPOX loci to 33 at D12S391 locus. HWE tests were conducted separately on sequence-and length-based A-STR loci. Before Bonferroni correction, it was observed that the TH01 locus deviated from the HWE for both sequence-and length-based polymorphisms; the vWA locus departed from the HWE for length-based polymorphisms; and the remaining 25 A-STR loci were in accordance with the HWE. The other forensic parameters are graphically presented in Figure 1. and 1-1.0266 × 10 -12 , respectively. The PD values of all A-STR loci were larger than 0.7760 at TPOX locus for both length-and sequence-based polymorphisms, while all the PD values were less than 0.9741 at PentaE locus based on length polymorphisms and less than 0.9788 at D21S11 locus based on sequence polymorphisms. Additionally, the combined PD value observed at the length and sequence levels were 1-3.0634 × 10 -30 and 1-8.0118 × 10 -33 , respectively. The PM values were in the range from 0.0259 (PentaE) to 0.2240 (TPOX) for length-based STRs and from 0.0212 (D21S11) to 0.2240 (TPOX) for sequence-based STRs. The TPOX locus showed the lowest TPI value as 1.2500 for both length-and sequence-based STRs, and the D12S391 locus presented the highest TPI values as 6.4286 for length-based STRs and 11.25 for sequence-based STRs. Further, the haplotypic results and forensic parameters of 24 Y-STRs revealed from the perspective of length and sequence levels are presented in Supplementary Table S6. The GD values for length-based polymorphisms spanned from 0.4793 at DYS391 to 0.9718 at DYS385a-b loci, and for sequence-based polymorphisms from 0.4793 at DYS391 to 0.9903 at DYF387S1 locus. Eleven of the 24 Y-STRs exhibited discrepancies in GD values between length-and sequence-based Y-STRs, of which the GD values increased from 0.6324 to 0.6425 at DYS389I, 0.7813 to 0.9311 at DYS389II, 0.7610 to 0.7893 at DYS390, 0.5181 to 0.6774 at DYS437, 0.5912 to 0.6372 at DYS438, 0.7279 to 0.7391 at DYS439, 0.7498 to 0.8656 at DYS448, 0.8209 to 0.8272 at DYS570, 0.8599 to 0.8754 at DYS612, 0.8237 to 0.8569 at DYS635, and 0.6857 to 0.6932 at Y-GATA-H4 locus. No discrepancies were observed between the length-and sequence-based results for the DC, PM, and HD values, which were 1.0000, 0.0111, and 1.000, respectively. The length-and sequence-based haplotypes and forensic parameters for seven X-STRs in males are displayed in Supplementary Table S7. The GD values fluctuated from 0.5316 at DXS7423 to 0.8986 at DXS10135 locus for length-based polymorphisms and from 0.5316 at DXS7423 to 0.9176 at DXS10135 locus for sequencebased polymorphisms. For seven X-STR loci, different GD values for length-and sequence-based STRs were detected at four loci, including DXS8378, DXS7132, DXS10103, and DXS10135. The length-based polymorphisms were disclosed the same DC, PM, and HD values as the sequence-based polymorphisms, which were 1.0000, 0.0111, and 1.0000, respectively. The allele frequencies and forensic parameters of 94 IISNPs are presented in Supplementary Table S8. The minor allele frequencies (MAF) approximately ranged from 0.1056 (rs2040411, T; rs733164, G) to 0.4944 (rs1028528, A). The MAF emerged 32.98% of the total loci for A alleles, 22.34% for C alleles, 20.21% for G alleles, and 24.47% for T alleles. The H obs and H exp values varied from 0.1889 (rs1355366, rs2056277, rs2107612, and rs733164) to 0.5778 (rs722290) and from 0.1900 (rs2056277 and rs740910) to 0.5030 (rs3780962, rs891700, rs2269355, and rs907100) with the average values of 0.4249 (±0.0958) and 0.4330 (±0.0842), respectively. The PIC values varied from 0.1710 at rs2056277 and rs740910 loci to 0.3750 at rs891700 locus with an average value of 0.3343 (±0.0532). The PD values spanned from 0.3242 (rs2056277) to 0.6649 (rs214955) with average value of 0.5708 (±0.0795), and the combined PD value for the 94 IISNPs was 1-7.3652 × 10 -36 . The PE values ranged from 0.0268 (rs1355366, rs2056277, rs2107612, and rs733164) to 0.2651 (rs722290) with average value of 0.1403 (±0.0579). The lowest PM value was 0.3351 at rs214955 locus, whereas the highest value was 0.6758 at rs2056277 locus; and the average value was 0.4292 (±0.0795). The TPI values were in the range from 0.6164 (rs1355366, rs2056277, rs2107612, and rs733164) to 1.1842 (rs722290), and the average value was 0.8916 (±0.1354).

STR Sequence Variations Observed Using the MPS Method
The sequence-and length-based alleles and corresponding frequencies of 58 STRs (27 A-STRs, 24 Y-STRs, and 7 X-STRs) for the Northwest Hui group are listed in Supplementary Table S9. Additional alleles with sequence variants were revealed by using the MPS platform, which were indistinguishable using the CE method. In contrast to the lengthbased alleles, increasing rates of detected alleles on the sequence level were observed, which are presented in Supplementary  Table S10. On the basis of length polymorphism alone, 451 distinct alleles were identified from 58 STRs in 90 unrelated Hui individuals. When the sequence variants were taken into consideration, the allelic diversities of 31 STRs, including 15 A-STRs, 12 Y-STRs, and 4 X-STRs, increased obviously, leading to a total allele number of 652 for 58 STRs in the Northwest Hui group. In detail, the allele numbers for A-STRs in the Hui group ranged from 5 (D3S1358 locus) to 15 (D18S51, FGA and PentaE loci) on the length polymorphism level, whereas 6 (D22S1045, TH01 and TPOX loci) to 33 (D12S391 locus) alleles were observed on the sequence level, which contributed 1 to 23 additional alleles. The D12S391 locus exhibited the most diversity with 230% increase in ratio. The allele numbers in four STRs, including D2S1338 (170.00%), D21S11 (158.33%), D13S317 (125.00%), and D3S1358 (120.00%) loci, increased by more than double. For the Y-STRs loci, a total of 155 and 231 different alleles were identified on the basis of length and sequence polymorphisms, respectively. The highest allelic numbers were 12 at DYS385a-b loci for the length-based level and 33 at DYF387S1 locus for the sequence-based level. The increased ratios of allelic numbers in three Y-STRs exceeded 100%, including DYF387S1 (266.67%), DYS389II (242.86%), and DYS448 (150.00%) loci. Nine Y-STRs displayed 16.67-83.33% extra sequence-based alleles compared with length-based alleles. Four X-STRs, DXS10135, DXS10103, DXS8378, and DXS7132 loci, showed both length and sequence polymorphisms with the increasing ratios ranging from 14.29% to 57.89%. No additional alleles were identified based on sequence polymorphisms in 12 A-STRs, 12 Y-STRs, and 3 X-STRs compared with the length-based polymorphisms, thus providing the same recognition capability as the traditional CE approach.

Phenotypic Predictive Analysis
The phenotypic SNPs utilized in this study mainly focus on pigmentation, including hair and eye colors. The predicted phenotype results for male individuals from the Northwest Hui group are displayed in Supplementary Table S11 and Supplementary Figure S2. The hair colors contained four possible traits, including brown, red, black and blond colors, and the eye colors encompassed three possible traits, including brown, blue and intermediate colors. The tested 90 male individuals from the Northwest Hui group were predominantly revealed with black hair and brown eyes, and the percentages of black hair and brown eyes for all individuals were in the range from 52% to 96%, and 87% to 100%, respectively. The predicted results are roughly in accordance with the phenotypes of the studied Hui group.

Genetic Differentiation Analyses Between the Northwest Hui Group and Reference Populations Based on IISNPs
The genetic differentiations between the Northwest Hui group and worldwide reference populations were analyzed based on the overlapping IISNPs loci. The reference data were obtained from 2,504 individuals of 26 populations in the 1000 Genomes Project.
In contrast with the 94 IISNPs provided by the ForenSeq ™ DNA Signature Prep Kit, one locus named rs938283 was not found in the 1000 Genomes Project, thus leading to the use of 93 overlapping IISNPs in the following analyses. Serial plots, including a heatmap of ancestral allele frequencies, a NJ tree, two PCA plots, and a plot of population-specific divergences based on I n statistic, were constructed, with the intention of determining the genetic discrimination abilities of these 93 IISNPs among the Northwest Hui group and reference populations, as well as the individual identification performances of these IISNPs in the Northwest Hui group. The allele frequencies of 93 IISNPs were utilized to generate a cluster heatmap, facilitating the intuitive visualization of polymorphic distributions for these 93 same IISNPs in the Northwest Hui group and reference populations. As presented in Figure 2, the deeper blue indicated lower allelic frequency values, whereas the deeper red represented higher-frequency values. Populations were divided by different continental origins, including African, American, East Asian, European, and South Asian clusters, whereas some clusters exhibited different patterns of allelic frequency distributions. In particular, the frequency distributions in African populations were evidently distinct from those in the other four intercontinental clusters. For example, one set of IISNP loci marked using the dotted box in Figure 2, including the rs917118, rs1382387, rs1015250, rs1028528, rs1335873, rs1528460, and rs722098 loci, overwhelmingly displayed red in African populations, while they presented blue in the remaining populations from the other four continents. Additionally, the allelic frequency distribution in the Northwest Hui group showed the most similar pattern to East Asian populations, in accordance with the dendrogram classifications based on Euclidean distances in Figure 2. The MAF values of the Northwest Hui group and reference populations are plotted based on the allelic frequencies of these IISNPs in Supplementary Figure S3.
To further inspect whether the IISNP genetic markers could reveal the genetic relationships between the Northwest Hui group and worldwide reference populations, two PCA plots and one NJ tree were constructed. As presented in Figure 3A, the genetic relationships among populations were revealed by the first and second principal components accounted for 48.58% and 20.22%, respectively. According to the first discrimination component, all the populations were segregated into two group sets, i.e., the African group (deep blue) and the non-African group. From the perspective of the second discrimination component, the non-African populations were further categorized into four subgroups. One subgroup was distributed in the top right corner of the first quadrant, including populations from East Asia (yellow) and the Northwest Hui group (red); South Asian populations (green) belonging to another subgroup were located directly beneath the East Asian populations in the bottom right corner of the first quadrant; European populations (pink) were clustered as a subgroup in the lower right corner of the fourth quadrant; and American populations (purple) were scattered between the South Asian and European subgroups in the fourth quadrant. In addition, to dissect the genetic relationships more deeply, a PCA plot on the individual level was constructed. As shown in Figure 3B, the dots in various colors represented the individuals deriving from different biogeographical regions. All individuals were divided into two large cluster groups on the first principal component. Africans in deep blue occupied the right-hand side of the plot as the first cluster. The other non-African cluster was further dispersed into three sub-clusters on the second principal component, namely, East Asians in orange, South Asians in green, and Europeans in pink, which superimposed without a clear boundary. The Americans in yellow were mainly clustered with South Asians and Europeans. Individuals from the Northwest Hui group (red) predominantly overlapped with East Asians, while sporadic Hui individuals were scattered among the South Asians. A NJ tree coupled with a histogram was subsequently constructed based on the D A values in Figure 3C. All populations were divided into four distinct sub-branches, which were the European and American populations for the first, South Asian populations  Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 705753 8 for the second, the East Asian populations for the third, and the African populations for the fourth sub-branch. The cluster distributions of all populations were roughly concordant with their geographical regions, except for four American populations. Further, the Northwest Hui group was evidently clustered with East Asian populations and displayed the lowest divergencies with Han populations, including the CHB (D A 0.0068) and CHS (D A 0.0069) populations. Obviously, populations from Africa revealed the largest genetic distances from the Northwest Hui group with the D A values ranging from 0.0199 to 0.0307.
Eventually, to determine the ability of the 93 IISNP loci on distinguishing populations, a plot of population-specific divergences for the five intercontinental clusters was constructed based on the I n statistic. In Figure 4, populations from one continent had one corresponding I n value at each IISNP locus, thus, there were five intercontinental I n values at each IISNP locus. African populations generated the largest I n values at 30 IISNP loci, ranging from 0.0200 at rs901398 locus to 0.2105 at rs1335873 locus, in comparison with the remaining intercontinental populations. African populations with relatively high I n values always tended to separate from American, East Asian, European, and South Asian populations, especially at four loci (rs1335873, rs1528460, rs722098, and rs1028528) revealing the highest discrepancy I n values (I n > 0.1). Additionally, American populations encompassed 13 loci showing the largest discrepancy values, and the I n values of American populations at these 13 loci varied from 0.0004 at rs13218440 locus to 0.0219 at rs354439 locus. A total of 19 loci showed the largest discrepancy values in East Asian populations, ranging from 0.0005 at rs576261 locus to 0.0715 at rs2040411 locus. European and South Asian populations exhibited the largest discrepancy values at 11 and 20 loci, respectively, of which the I n values were in the range from 0.0015 (rs6955448) to 0.0623 (rs1886510) and from 0.0018 (rs722290) to 0.0403 (rs987640), respectively. The I n values in American populations were in the range from 0.0000 at six loci (rs873196, rs914165, rs2399332, rs10092491, rs763869, and rs2056277) to 0.0389 at rs740910 locus with an average value of 0.0052 (±0.0084). East Asian populations generated I n values ranging from 0.0000 at seven loci (rs8078417, rs2046361, rs13218440, rs1058083, rs1336071, rs1109037, and rs9951171) to 0.0715 at rs2040411 locus with an average value of 0.0091 (±0.0157). Nine loci, including rs2831700, rs2107612, rs445251, rs2342747, rs13218440, rs321198, rs993934, rs3780962, and rs2830795, presented the lowest discrepancy values (I n 0.0000) in European populations, while rs722098 locus yielded the highest discrepancy value (I n 0.0858), and the average I n value was 0.0245 (±0.0190). The I n values in South Asian populations varied from 0.0000 at eight loci (rs6444724, rs873196, rs917118, rs1413212, rs2830795, rs2269355, rs576261, and rs717302) to 0.0500 at rs1335873 locus with an average value of 0.0140 (±0.0118). The African populations offered the lowest discrepancy value (I n 0.0000) at eight loci (rs10488710, rs891700, rs338882, rs1493232, rs576261, rs445251, rs159606, and rs13218440), whereas the highest value was observed at rs1335873 locus (I n 0.2105), and the average value was 0.0198 (±0.0373).

The Comparisons of Y-STR Haplotype Polymorphisms Between the Northwest Hui Group and Reference Populations
The patrilineal genetic data of 59 populations, the Northwest Hui group and 58 reference populations, were employed in this study, which could be classified into six different biogeographical group sets, including 44 populations from East Asia, seven populations from West Asia, three populations from Central Asia, two populations from Africa, two populations from Europe, and one population from Australia. A total of 17 shared Y-STR loci were employed from these 59 populations, including DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a, DYS385b, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, and YGATAH4. It is worth mentioning that three loci (DYS393, DYS456, and DYS458) from the abovementioned 17 Y-STR loci were absent in the 24 Y-STR markers from the ForenSeq ™ DNA Signature Prep Kit. The genotype profiles of these three loci in the Northwest Hui group were replenished using the PCR and CE-based method.
As shown in Figure 5, a heatmap was constructed using the pairwise R st values calculated based on the raw Y-STR data from the worldwide populations. The gradually deepening yellow indicated the increasing R st values, whereas the deepening blue indicated the decreasing R st values. In the heatmap, populations were labeled according to different linguistic families, containing eight populations from the Altaic family in yellow, two populations from the Afro-Asiatic family in grey, five populations from the Indo-European family in red, 42 populations from the Sino-Tibetan family in blue, and two populations from the Niger-Congo family in pink. All the Hui groups from Chinese different regions, including the studied Northwest Hui group (NWH), Ningxia Hui (NXH), Shaanxi Hui (SAXH), Henan Hui (HNH), Gansu Hui (GSH), Qinghai Hui (QHH) and Yunnan Hui (YNH), except for Sichuan Hui (SCH) were primarily clustered together, then gathered with the populations from East Asia, and subsequently grouped with Central and West Asian populations in deeper blue color.
To display the genetic relationships more clearly, a histogram of pairwise R st values between the Northwest Hui group and the reference populations was plotted on the right-hand side of the heatmap. The Northwest Hui group retained extremely low genetic distances (R st < 0.01) with other reference Hui groups from Chinese different regions, with the exception of SCH. In detail, four reference Hui groups from Chinese different regions, including GSH (R st −0.0068), HNH (R st −0.0022), QHH (R st −0.0082), and YNH (R st −0.0081), displayed negative R st values; two reference Hui groups, NXH (R st 0.0041) and SAXH (R st 0.0062), showed extremely low genetic distances; and one reference Hui group, SCH (R st 0.0371), exhibited the lower genetic differentiation (0.01 < R st < 0.05) from the Northwest Hui group. Intriguingly, the studied Hui group revealed clearly lower genetic differentiations from the Central and West Asian populations, especially for East Anatolian Turkey (EAT) (R st 0.0038). It is worth mentioning that one Chinese group named Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 705753 Gansu Dongxiang (GSDX) (R st 0.0072) also exhibited extremely low genetic distance with the Northwest Hui group.
To further verify the genetic relationships between the Northwest Hui group and the reference populations, we conducted a MDS analysis for all the populations without the reference Hui groups based on pairwise R st values. In Figure 6, the populations were roughly divided into several clusters as follows. Two African populations (blue) were scattered in the bottom of the plot, three European populations (yellow) were in the upper left corner, and the majority of the East Asian populations (green) were gathered in the upper right side. Evidently, the Northwest Hui group was located between East and West Asian populations, but was more prone to blend with West Asian (purple), Central Asian (red), and several East Asian (green) populations, including the Arab (SACA and SASA), Turkish (EAT, TKT, and SAT), and Iranian (FIR and GIR) populations from West Asia; the Northwest Kazakh (NWK) and Northwest Mongolian (NWM) groups from Central Asia; and Hulunbuir Mongolian (HLBM), Hulunbuir Daur (HLBD), GSDX, and other populations from East Asia.

Population Genetic Analyses Based on AISNPs
In the present study, we employed two sets of population data on the basis of the 55 overlapping AISNPs to uncover biogeographical ancestral information for the Northwest Hui group, including 26 reference populations from the 1000 Genomes Project and 109 reference populations from previous studies.
The PCA analysis of the Northwest Hui group and the 26 reference populations was constructed, from which the raw population data of 55 same AISNPs were available. To obtain a more intuitive display, the first two principal components were adopted to perform the PCA plot. As presented in Figure 7, the clustering pattern of individuals from five continents were revealed, preliminarily clarifying the ancestral determination performance of these 55 AISNPs. Africans (blue) located on the right-hand side of the plot were clearly distinguished from the other populations based on the first principal component. When the second principal component was taken into consideration, three clusters of individuals belonging to different continental regions were observed, which were East Asians (orange), South Asians (pink), and Europeans (purple). However, individuals from America (green) were largely overlapped with individuals from South Asia and Europe. The studied Hui individuals (red) were mainly superimposed with the East Asians.
To explicitly demonstrate the ancestral admixture patterns, we executed a STRUCTURE algorithm from K 2 to K 6 for the 26 reference populations and the Northwest Hui group based on the raw population data of 55 AISNPs in Figure 8. The different K values were equivalent to the number of predicted ancestral   Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 705753 11 components in different colors. Each individual was denoted by a vertical line partitioned into several segments corresponding to the contributions of different ancestral components. For example, when K 2 (top line, Figure 8) was taken into consideration, all individuals were composed of two presumed ancestral components (yellow and green). Only populations from Africa, mainly covered by the green component, were distinctly separated from the non-African populations indicated by the large proportion of yellow. An additional ancestral component in purple was added at K 3, which was the optimal number of FIGURE 8 | A STRUCTURE plot for the 26 reference populations and the Northwest Hui group based on the raw population data for 55 AISNPs from K 2 to K 6. The K values are equivalent to the predicted ancestral components shown in different colors. AISNPs, ancestry-informative single-nucleotide polymorphisms.  Based on the results of the STRUCTURE analyses, we subsequently conducted the prediction of ancestral components for the Northwest Hui group in Figure 9. A total of 90 Hui individuals represented by blue dots exhibited the lowest African ancestral component. On the contrary, 90 Hui individuals (pink dots) revealed the highest East Asian ancestral component, and the percentages of East Asian components exceeded 75% in the vast majority of Hui individuals. The Hui individuals shown with red dots disclosed small proportions of European ancestral components, with the percentages generally below 25%.
To comprehensively dissect the genetic structure of the Northwest Hui group, we subsequently implemented the data from 109 worldwide populations as the references, from which the population frequency data of 55 AISNPs were available. The Nei's D A values among the studied Hui group and the 109 reference populations were evaluated and utilized to construct a NJ tree. As shown in Figure 10, the NJ tree was encircled by the histogram of D A values between the Northwest Hui group and the reference populations. For the NJ plot, populations from different continents tended to congregate corresponding to their biogeographical regions, mainly including populations from East Asia (green sub-branches), America (orange and brown subbranches), South Asia (dark green, pink and light blue subbranches), Europe (purple sub-branches), and Africa (red subbranches). The Northwest Hui group was chiefly assembled with populations from East Asia. In detail, many Chinese populations exhibited extremely low genetic differentiations (D A < 0.01) with the Northwest Hui group. The smallest genetic distance was observed between the studied Hui group and the reference Hui group with D A 0.0029, followed by the Xibe group with D A 0.0050, the Tu group with D A 0.0052, the Southern Han population with D A 0.0059, the Mongolian group with D A 0.0060, the Yi group with D A 0.0080, the Tibetan group with D A 0.0081, the Hakka group with D A 0.0083, and the Bai group with D A 0.0091. In addition, three Southeast Asian populations (Vietnamese, Lao Loum, and Khmer) and two East Asian populations (Japanese and Korean) also displayed extremely low genetic distances with the Northwest Hui group. Populations from North America, South America, North Asia, and Europe predominantly manifested moderate differentiations (0.05 < D A < 0.15) from the Northwest Hui group, whereas most populations from Africa primarily differentiated from the Northwest Hui group with large genetic distances (0.15 < D A < 0.25).
To gain more insight into the population clustering pattern, a PCA plot of the Northwest Hui group and 109 reference populations was performed based on the frequency data of 55 identical AISNPs? ( Figure 11A). The first principal component occupied 40.60% of the total variations, followed by the second principal component accounting for 37.97% and the third principal component accounting for 8.48%. Five population clusters, including populations from Africa (light blue), South Central Asia (light purple), Southwest Asia (orange), Europe (deep purple) America (yellow and grey), were clearly distinguished from each other. However, populations from East Asia (red), Southeast Asia (green), and Central Asia (blue) partially overlapped in the bottom left corner. To obtain clear insights of the relationships between the Northwest Hui group and the neighboring populations in the plot, another PCA plot was constructed ( Figure 11B), concentrating on the populations from East, Southeast, and Central Asia. As anticipated, the studied Hui group was clustered with the majority of the East Asian populations and, more specifically, with the reference Hui and Tu groups.

Sequencing Performance
Considering all 231 genetic markers, the sequencing depths for Northwest Hui individuals roughly fluctuated from 77.14 (±21.23) at rs1736442 IISNP locus to 12,535.64 (±3,565.09) at TH01 locus. A disequilibrium in sequencing depth was clearly observed among the different kinds of genetic markers. These sequencing results were within the acceptable criteria defined by the manufacturer and approximately consistent with previous MPS-related studies (Churchill et al., 2017;Guo et al., 2017;Köcher et al., 2018;Hollard et al., 2019;Hussing et al., 2019;Fan et al., 2020). Specifically, partial genetic markers presenting the lowest sequencing depths in this study, such as DYS460, DXS10103, and rs1736442, were also reported to perform poorly in other studies (Almalki et al., 2017;Guo et al., 2017;Jäger et al., 2017;Silvia et al., 2017;Xavier and Parson, 2017;Hollard et al., 2019;Fan et al., 2020). It was speculated that the constant low detection of these markers might be in relation to amplicon length, low (below recommendations) copy number input DNA, inaccurate library quantification, or overmultiplexing of samples .

Forensic Parameter Efficiencies for Various Genres of Genetic Markers
No significant departure from the HWE expectation was observed following the Bonferroni correction of 27 A-STR loci. In terms of the other forensic parameters of 27 A-STRs shown in Figure 1, the sequence-based polymorphisms provided relatively higher median values in comparison with the length-based polymorphisms, except for the PM parameter, which indicated that the A-STR loci detected at the sequence level might contain more genetic information compared to those at the length level. Previous reports in the literature corroborate that genetic markers with heterozygosity values over 0.5 within populations are appropriate for genetic diversity studies (Dávila et al., 2009;Sheriff and Alemayehu, 2018), illustrating that these 27 A-STR loci with high H obs and H exp values at both length and sequence levels were suitably informative for genetic studies. Additionally, we also discovered that the average H obs values were close to, although slightly lower than, the average H exp values from either the length or sequence perspective, possibly indicating no overall loss in heterozygosity (Araújo et al., 2006). The PIC values at the length and sequence levels were both greater than 0.5; according to Marshall et al., a locus revealing a PIC >0.5 could be treated as highly informative and as a polymorphic marker for genetic characterization and diversity studies (Marshall et al., 1998). The average PM value generated using the sequence-based method was lower than that generated using the length-based method, indicating that sequence-based genotypes might contain more genetic polymorphisms than length-based genotypes. The combined PD value observed at the sequence level was higher than that at the length level, denoting that sequence-based STRs could provide more discrimination power for individual identifications. The combined PE value for the sequence-based genotypes was greater than that for the length-based genotypes, indicating that more genetic variations revealed by sequencing increased the exclusive ability for parentage identification.
When analyzing 94 IISNPs in the Northwest Hui group, only three loci, rs1360288, rs2107612, and rs214955, significantly deviated from the HWE expectations, which were further conformed to the HWE expectations after Bonferroni corrections. The average values of other forensic parameters, including H obs , H exp , PIC, PD, PE, and TPI, were overwhelmingly lower than those of A-STRs. However, the combined PD value for the 94 IISNPs was higher than those of the 27 A-STRs at both the length and sequence levels. Previous reports also claimed that these 94 IISNPs could be comparable with the 27 A-STRs considering the discrimination power for individual identification (Wendt et al., 2016;Delest et al., 2020;Fan et al., 2020).

STR Sequence Variations Observed Using the MPS Method
Compared with STR alleles characterized by the lengths of repeat regions, the detection of possible sequence variations in these repeat regions can effectively increase the polymorphism level of these alleles. In this study, the Northwest Hui group presented plentiful sequence variations on the basis of the MPS data. Specifically, the numbers of sequence-based alleles at eight STR loci were more than doubled in comparison with those of the length polymorphic alleles, including D3S1358, D13S317, D21S11, D2S1338, D12S391, DYS448, DYS389II, and DYF387S1, which were partially consistent with previously published studies (Gettings et al., 2015;Gettings et al., 2016;Novroski et al., 2016;Delest et al., 2020). For example, Novroski et al. claimed that D2S1338, D12S391, and D21S11 loci provided the most significant contribution to increasing allele diversity via sequence variations in repeat regions, focusing on African American, Caucasian, Hispanic, and Chinese populations . D12S391 and DYF387S1 were determined to be the most polymorphic loci for the A-STRs and Y-STRs in French populations, respectively (Delest et al., 2020). These sequence variations enhanced the forensic efficiencies of these STR markers in the forensic applications, such as individual identification and kinship testing, which were generally consistent with previous studies (Hussing et al., 2019;Khubrani et al., 2019;Delest et al., 2020).

Genetic Relationships Between the Northwest Hui Group and Reference Populations
The Performances of IISNPs in Population Genetics Figure 2, certain discrepancies the allelic frequency distributions among the intercontinental populations were observed, especially for partial IISNP loci marked by the dotted box. However, the SNP loci utilized for forensic individual identifications are reported to display little allele frequency differentiation and high heterozygosity among the applied populations (Kidd et al., 2006;Yousefi et al., 2018). Further, MAF can also be considered when screening the SNP loci for individual identifications, which is defined as the frequency of the least common allele for each genetic marker in a given population Yousefi et al., 2018). Various MAF thresholds were implemented for the selection of optimal IISNPs in previously published research. For example, Yousefi et al. chose the SNP loci with MAF value of more than 0.2 as one of the criteria for the individual identification (Yousefi et al., 2018). Huang et al. conducted a genome-wide SNP screening based on the HapMap and 1000 Genomes databases for forensic individual identifications, of which the MAF values were in the range from 0.35 to 0.43 . Briefly, SNP loci with higher MAF values are recommended for individual identifications, which also tend to produce higher heterozygosity values. Thus, for the purpose of individual identifications, SNP loci showing relatively homogeneous frequency distributions as well as high MAF values among tested populations are recommended.

As depicted in
In order to determine whether these studied IISNPs have abilities of population differentiations, we delineated two PCA plots based on the population and individual levels. The Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 705753 population discrepancies on the PCA plots were readily discerned, particularly for African populations that were always clearly distinguished from the other four intercontinental populations. Conversely, populations from East Asia, America, South Asia, and Europe were prone to exhibit relatively close genetic relationships in comparison with the African populations based on these 93 IISNPs. This speculation is clearly verified by Figure 3B, where all the individuals other than the Africans formed inconspicuous clusters. For the NJ tree presented in Figure 3C, the Northwest Hui group was initially grouped with East Asian populations and exhibited the furthest genetic relationships from African populations. The abovementioned results indicated that the population genetic structures revealed by these 93 shared IISNPs conveyed certain disparities, which might be attributed to the uneven frequency distributions of some IISNP loci among intercontinental populations, especially between African and non-African populations.
To validate the notion mentioned above, we executed an I n statistical analysis to evaluate the distinguishing abilities of 93 IISNPs across five intercontinental population groups. In the present study, African populations showed the largest discrepancy values at 30 of the 93 IISNP loci in comparison with the other four intercontinental population groups, in which the loci rs1335873, rs1528460, rs722098, and rs1028528 ranked in the top four with the discrepancy I n values greater than 0.1. According to a previous report, molecular markers with high I n values are more liable to infer population structure than those with low I n values (Rosenberg et al., 2003). However, when the I n statistic was utilized to screen ancestral information loci, the explicit threshold value was not clearly unified across different studies (Rosenberg et al., 2003;Xu et al., 2008;Phillips, 2015;Zeng et al., 2016;Jin et al., 2020). The recommended I n threshold value (I n > 0.1) for selecting ancestral markers could be acquired from a previous study , of which the criteria indicated that the four loci mentioned above might be suitable for ancestral information inferences. In contrast with the African populations, these 93 IISNPs provided overwhelmingly lower I n values in the American, European, East Asian, and South Asian populations. The smaller I n values denote that the tested genetic markers exhibit more similar allelic frequency distributions in the applied populations. In combination with the ubiquitously higher MAF values among the non-African populations in Supplementary Figure S3, we thus speculated that these IISNPs might yield a slightly better performance in the non-African populations on the purpose of individual identifications. Eventually, Figure 3B showed that the Hui individuals predominantly overlapped with East Asians; in combination with the high combined PD value of these IISNPs in the Northwest Hui group, these 93 IISNPs could perform well in identifying Northwest Hui individuals.

The Genetic Differentiations Assessed by Y-STRs Between the Northwest Hui and Reference Populations
To comprehensively dissect the patrilineal genetic landscape of the Hui group, we enrolled the reference population data of 17 shared Y-STRs from previously published research and the YHRD database to assess the population differentiations among the Northwest Hui group and reference populations.
A heatmap of pairwise R st values among the applied 59 populations in Figure 5 illustrated that all Hui groups from Chinese different regions displayed relatively close genetic relationships with East Asian populations, as well as some Central and West Asian populations. To be more specific, the histogram of pairwise R st between the Northwest Hui group and the reference populations demonstrated that the Northwest Hui group revealed the lowest genetic differentiations from the reference Hui groups, except for SCH, which was congruous with the previous study (Xie et al., 2019). It was reported that the separated distribution of SCH and the Northwest Hui group was largely attributed to the different frequency distributions of the D, E, D, G, H, J1, R1a, R1b, and R2 sub-haplogroups (Xie et al., 2019). Intriguingly, according to the histogram, relatively intimate genetic relationships might exist between the Northwest Hui group and populations from Central and West Asia, as well as several East Asian populations, like GSDX. In the light of the MDS plot, the Northwest Hui group might have close genetic relationships with some Central and West Asian populations as well as several East Asian populations, such as the Dongxiang, Kazakh, Mongolian, Arab, Turkish, and Iranian populations. From the linguistic perspective, the abovementioned Arab populations belong to the Afro-Asiatic language family; Iranian populations belong to the Indo-European language family; and Turkish, Dongxiang, Kazakh, and Mongolian populations belong to the Altaic language family. The formation of a population is always accompanied by the corresponding establishment of its language family. Although the language family may not be able to exhaustively reflect the exact formation history of a population, from cultural and historical perspectives, it can still provide certain evidence as to the eventual formation history of the Hui group. In the present study, the Northwest Hui group appertained to the Sino-Tibetan family with Chinese as the predominant language (Gladney, 2020), which was more likely to cluster with populations from the Altaic and Indo-European language families based on the Y-STR analyses. Another study also reported that Chinese Hui groups maintained several Arabic and Persian phrases in their language (Dillon, 1999).
In light of historical documents, various ancestries may have participated in the eventual formation of the Chinese Hui group. Partial Islamic adherents from Central and West Asia, such as Persians, Arabs, and Turks, were encouraged to migrate to China during several medieval dynasties, especially the Yuan (1,271-1,368 AD) Dynasty ruled by the Mongolians. It was believed that these Muslims entered China via two routes: from Northwest China via the Silk Road and from Southeastern coastal areas via the Maritime Silk Road. The Silk Road served as a series of extensive inland trade routes connecting East and West Eurasia and promoted the exchanges of economies, cultures, politics, and religions between civilizations during the 2nd century BCE to the 18th century (Elisseeff, 2000;Gan et al., 2009 Road and intermarried with local indigenous peoples, facilitating the ultimate formation of the Chinese-speaking Hui group (Elisseeff, 2000;Gan et al., 2009). The opinions on the ethnic origin of the Chinese Dongxiang group are relatively divergent according to different historians. In short, the Dongxiang was commonly described as the descendants of Mongolian troops (1,162-1,227 AD) who settled in the Hezhou region and possibly mixed with Sarts (Arab traders and Turkic-speaking city dwellers from Central Asia) (Schwarz, 1984;Wang et al., 2018). Accordingly, it is reasonable that the Northwest Hui group externalized genetically intimate relationships with the populations from Central and West Asia, as well as Dongxiang and Mongolian groups from China.

Ancestral Components of the Northwest Hui Group Revealed Using AISNPs
During the formation of a population, the genetic structure might be changed due to different historical events, such as migrations or intermarriages (Mellars, 2006;Duda and Jan Zrzavý, 2016), and changes in the genetic landscape of one population might distinguish it from other populations. Therefore, partial genetic variations found in modern populations can be treated as ancestral informative markers, such as AIM-STRs (Pereira et al., 2011;Phillips et al., 2011), AISNPs (Phillips et al., 2012;Kidd et al., 2014;Phillips et al., 2014;Jin et al., 2020), and AIM-InDels (Romanini et al., 2015), shedding light on the population evolutionary process. The raw data of 26 reference populations were initially utilized to estimate the performance of 55 AISNPs in ancestral estimation and further disclose the genetic structure and ancestral bioinformation of the Northwest Hui group. As presented in Figure 7, the PCA result at the individual level indicated that individuals from East Asia, South Asia, Europe, and Africa could evidently be separated, while individuals from America were substantially overlapped with South Asians and Europeans. As depicted in a previous study, American populations displaying affinities with European populations might be largely attributed to the immigration of Europeans to the American continent since the discovery of America by Christopher Columbus in 1492 (Jordan et al., 2019). Since the 1700s, a large proportion of South Asians have immigrated to America and intermarried with indigenous Americans; consequently, over 3.4 million Americans can trace their ancestry back to South Asia according to the 2010 census (Perez and Hirschman, 2009). These historical records might partially explain why the Americans overlapped the Europeans and South Asians in the PCA plot. Other studies also reported that American populations from the 1000 Genomes Project might retain ancestral components from European, African, and indigenous American populations, which posed challenges in the ancestral analyses of American populations (Pakstis et al., 2019b;Jin et al., 2020). STRUCTURE analyses in Figure 8 supported the abovementioned opinion, in which the genetic components revealed in American populations partially coincided with those from European and South Asian populations at K 3 or K 4. Thus, the above results demonstrated that these 55 AISNPs could perform well in the estimation of the individual ancestral components, especially for individuals from East Asia, South Asia, Africa, and Europe.
In terms of the studied Hui group, the Hui individuals were largely assembled with East Asians in the PCA plot, demonstrating that the Northwest Hui group might retain a significant amount of genetic components from East Asia. This opinion was also congruent with the results of the STRUCTURE analyses. At K 3, the studied Hui group shared the similar proportion of genetic components with East Asian populations with a large area of yellow component; a trace amount of genetic component in purple which was mainly found in European populations was also detected in the studied Hui group. We further performed ancestral component prediction of the 90 Hui individuals. In Figure 9, the Northwest Hui group demonstrated the major genetic component from East Asia and the minor component from Europe, which were roughly consistent with the genetic component distributions of the Northwest Hui group at K 3 in the STRUCTURE analysis ( Figure 8).
In order to have a more comprehensive genetic interpretation of the Northwest Hui group, a phylogenetic tree was further delineated based on the allelic frequency data of the Northwest Hui group and 109 reference populations. In the phylogenetic tree plot, the Northwest Hui group was initially clustered with East Asian populations, exhibiting extremely low genetic differentiations from the Tu, Xibe, Yi, Bai, Han, Hakka, and Mongolian groups, etc. The intimate genetic relationships between the Hui group and some abovementioned Chinese populations were also reported in previous studies (Hong et al., 2007;Fang et al., 2018;He et al., 2018;Lan et al., 2018;Xie et al., 2018;Wang et al., 2019;Xie et al., 2019;Chen et al., 2020;Zhou et al., 2020). For example, the close genetic relationships between Hui groups and Han populations have been commonly reported from different perspectives, including Y-STRs and Y-SNPs Xie et al., 2019), STRs , AISNPs (He et al., 2018), InDels Zhou et al., 2020), and mitochondrial DNA . Other populations, such as Mongolian (Hong et al., 2007) and Xibe , were also reported to possibly have undergone gene exchange with Hui groups to some extent.
The relatively intimate genetic relationships between the Northwest Hui and East Asian populations were subsequently certificated by two PCA plots. The genetic affinities between the Hui group and East Asian populations were further supported by previously published studies (Yao et al., 2016;He et al., 2018;Xie et al., 2018;Wang et al., 2019;Xie et al., 2019;Jin et al., 2020;Zhou et al., 2020). In detail, He et al. investigated the genetic background of the NXH group using 165 AISNPs and discovered that the genetic components of the NXH group was predominantly contributed by the East Asian ancestral component (He et al., 2018). Another study based on InDels loci also claimed that East Asian populations provided a large proportion of ancestral component for the NXH group . The GSH group was also explored by researchers and was found to exhibit substantial genetic intimacy with East Asian populations (Yao et al., 2016;Xie et al., 2018;Jin et al., 2020). To be more precise, the studied Hui group might have closer genetic relationships with the reference Hui and the Tu groups. The Chinese Tu group, predominately dwelling in Northwestern China, was officially recognized as one of the 56 Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 705753 ethnic groups in 1953. The majority of the Tu people speak the Monguor language, which pertains to the family of Mongolic language, one of the largest sub-branches of the Altaic language family. There are different opinions on the ethnogenesis of the Tu group, and various studies indicated that its genetic origin could be traced back to Tuyuhun Xianbei, to the Mongolian troop that came to the current Qinghai-Gansu region during the Mongolian conquest, or to the Han population (Schwarz, 1984;Hu, 2010). The Chinese Hui group was once conquered by the Yuan Dynasty (Schwarz, 1984;Xie and Shan, 2002) as aforementioned, and according to previous studies, the Hui group might exchange a large proportion of genes with the Chinese Han population due to the long-term intermarriages (Schwarz, 1984;Chen et al., 2020). These historical records and genetic studies might partially explain the intimate relationship between the Northwest Hui and Tu group revealed by the PCA plot. However, the genetic background of the Northwest Hui group estimated using AISNPs showed certain divergence from that assessed using Y-STRs. In addition to the reference Hui groups from Chinese different regions, the Northwest Hui group depicted using Y-STRs was more prone to display closer genetic relationships with populations from Central and West Asia, as well as several Chinese groups. Additionally, our previous research based on the complete mitochondrial genome illustrated that the Northwest Hui group typically exhibited closer genetic relationships with East Asian populations, roughly concordant with the genetic result provided by 55 AISNPs in this study . Briefly, our previous study indicated that the genetic component of the maternal lineage that appeared in the Northwest Hui group might be predominantly derived from East Asia ; the Y chromosome strictly follows the paternal inheritance with little recombination during the genetic process, so the paternal lineage of the Northwest Hui group may still retain some ancient genetic imprints closely related to Central or West Asian populations (Gladney, 1998). Considering the ancestral component prediction, the Northwest Hui group revealed a large proportion of ancestral components from East Asia and relatively little from Europe. Thus, we speculated that a sex bias phenomenon might exist in the long-term intermarriage process of the Northwest Hui group, denoting that males from West or Central Asia might intermarry females from East Asia, and eventually facilitating the formation of the current Hui group. This perspective is also supported by previous research  and historical material (Schwarz, 1984).

CONCLUSION
In this study, a total of 231 genetic markers were originally applied in 90 Hui male individuals dwelling in Northwest China. According to the current research, the studied genetic markers displayed satisfactory forensic performances in the Northwest Hui group, especially for the application of sequence polymorphisms which significantly enhanced the genetic diversities of STR genetic markers. Both 27 A-STRs and 94 IISNPs were polymorphic enough and could yield high forensic efficiencies in the forensic applications, such as paternity testings and individual identifications. Four of 94 IISNP loci, the rs1335873, rs1528460, rs722098, and rs1028528 loci, exhibited relatively larger discrepancies in the distributions of allelic frequencies among intercontinental populations, demonstrating that these loci might contain certain potential for the ancestry inferences. Studies based on the Y-STRs illustrated that the Northwest Hui group initially presented closest genetic relationships with reference Hui groups from Chinese different regions except for SCH and also externalized closer genetic relationships with populations from Central and West Asia, as well as several Chinese groups. Yet, the genetic structure revealed using the AISNPs indicated that the Northwest Hui group evidently displayed genetic affinities with populations from East Asia rather than those from Central or West Asia. In combination with the ancestral component estimation, it was proven that the Northwest Hui group contained a large proportion of ancestral components from East Asia and relatively little from Europe. The aforementioned results possibly indicated that the sex bias phenomenon of intermarriages might have existed in the formation of the Northwest Hui group, involving more intermarriages between males from Central and West Asia and females from East Asia.

DATA AVAILABILITY STATEMENT
The data is available in NCBI PRJNA738655.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of Xi'an Jiaotong University Health Science Center (approval number: 2019-1039; 2020-1382). The participants provided their written informed consents to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.