Delineation of Mitochondrial DNA Variants From Exome Sequencing Data and Association of Haplogroups With Obesity in Kuwait

Background/Objectives Whole-exome sequencing is a valuable tool to determine genetic variations that are associated with rare and common health conditions. A limited number of studies demonstrated that mitochondrial DNA can be captured using whole-exome sequencing. Previous studies have suggested that mitochondrial DNA variants and haplogroup lineages are associated with obesity. Therefore, we investigated the role of mitochondrial variants and haplogroups contributing to the risk of obesity in Arabs in Kuwait using exome sequencing data. Subjects/Methods Indirect mitochondrial genomes were extracted from exome sequencing data from 288 unrelated native Arab individuals from Kuwait. The cohort was divided into obese [body mass index (BMI) ≥ 30 kg/m2] and non-obese (BMI < 30 kg/m2) groups. Mitochondrial variants were identified, and haplogroups were classified and compared with other sequencing technologies. Statistical analysis was performed to determine associations and identify mitochondrial variants and haplogroups affecting obesity. Results Haplogroup R showed a protective effect on obesity [odds ratio (OR) = 0.311; P = 0.006], whereas haplogroup L individuals were at high risk of obesity (OR = 2.285; P = 0.046). Significant differences in mitochondrial variants between the obese and non-obese groups were mainly haplogroup-defining mutations and were involved in processes in energy generation. The majority of mitochondrial variants and haplogroups extracted from exome were in agreement with technical replica from Sanger and whole-genome sequencing. Conclusions This is the first to utilize whole-exome data to extract entire mitochondrial haplogroups to study its association with obesity in an Arab population.


INTRODUCTION
Mitochondria play a role in generating cellular energy via oxidative phosphorylation (OXPHOS), reactive oxygen species production, and apoptosis. Human mitochondrial DNA (mtDNA) is circular, double-stranded, and 16,569 base pairs (bp) in size and contains 37 genes that code for 22 transfer RNAs, two ribosomal RNAs that are necessary for protein synthesis, and 13 messenger RNAs that are required for OXPHOS (Anderson et al., 1981;Andrews et al., 1999). Each mitochondrion contains several copies of mtDNA, and each cell contains many mitochondria (Hosgood et al., 2010). mtDNA contains a major non-coding region called the control/D-loop region, which regulates mitochondrial transcription and replication. The mitochondrial control region is located at mitochondrial nucleotide positions 16,024-576 and is susceptible to a high rate of mtDNA alterations, particularly at the hypervariable regions (Greenberg et al., 1983) as well as under conditions of increased oxidative stress (Clayton, 2000). mtDNA variants are maternally inherited without recombination and may accumulate over time. A mitochondrial haplogroup is a group of individuals who share the same accumulated mtDNA variants and can be geographically restricted, making then traceable via maternal linage. Different haplogroups form diverse branches of a mitochondrial phylogenetic tree. The sub-Saharan Africans are characterized by L0-L6; the South Asians by haplogroups R5-R8, M2-M6, and M4-67; the Europeans, Southwest Asians, and North Africans by haplogroups U, HV, JT, N1, N2, and X; and the East Asians by haplogroups A-G, Z, and M7-M9 (Loogvali et al., 2004;Chaubey et al., 2007;Soares et al., 2010;Kivisild, 2015).
Sanger sequencing is considered the gold standard for detecting mtDNA variants. This approach has progressed to next-generation sequencing (NGS) platform, as it provides highthroughput sequence data for large cohort studies and is less labor-intensive and time-consuming than Sanger sequencing (Calvo et al., 2012;Chinnery et al., 2012;Tang et al., 2013;Wong, 2013). Recently, a number of studies demonstrated that whole-genome sequencing and off-target exome sequencing are able to target both nuclear DNA and mtDNA for the diagnosis of monogenic cases and association studies for multifactorial disorders (Picardi and Pesole, 2012;Delmiro et al., 2013;Samuels et al., 2013;Griffin et al., 2014;Li et al., 2014). Particularly interesting studies include the following: Wagner et al. (2019) evaluated if mtDNA analysis can be performed using exome data; Diroma et al. (2014) extracted mtDNA sequences from exome data to reconstruct human population history using mtDNA variant as marker and to illustrate the involvement of mtDNA in pathology; and Patowary et al. (2017) analyzed the mtDNA sequence derived from whole-exome sequencing and studied haplogroup and variant association in autism spectrum disorder. Nevertheless, to the best of our knowledge, there is no demonstration in the literature on the efficiency of mitochondrial variant calling from whole genome and exome data when compared with the calling using Sanger sequencing data. In addition, there are no studies on mitochondrial haplogroup and variant association with obesity using exome data with potential significant results.
Obesity has become a worldwide epidemic, particularly among Arab populations. In Kuwait, the prevalence of obesity ranges from 37 to 50% (Ng et al., 2014;World Health Organization [WHO], 2018). While obesity has a large heritable component, elucidating these determinants is complicated by the complex interplay between environmental, behavioral, and genetic factors. Genetic studies into obesity have identified monogenic genes using linkage analysis and common variants using genome-wide association studies (GWASs) (Ramachandrappa and Farooqi, 2011). However, obesity-associated genetic loci are often identified in nuclear DNA and have a modest effect that cannot explain the high heritability estimates, and well-defined genetic loci are often from rare familial syndromes (Stunkard et al., 1990;Bouchard and Perusse, 1993;Sorensen et al., 1998).
Mitochondrial variants, haplogroups, and copy number variations have been proposed as potential causative or protective factors for complex and multifactorial disorders and can explain the missing heritability for obesity.
The present study investigates the role of mitochondrial variants and haplogroups contributing to the risk of obesity in Arabs in Kuwait. Arabian Peninsula populations present unique features in the context of the worldwide genetic diversity (Alsmadi et al., 2013(Alsmadi et al., , 2014Thareja et al., 2015): (1) they resulted from an old and continuous admixture between African, European, and Asian ancestries; (2) the high level of consanguineous mating increases frequencies of rare variants and extends stretches of homozygous chromosomal fragments. Further, the Arabian Peninsula, by virtue of being the exit point for the Southern Route of Africa, was indeed the first staging post in the spread of modern humans around the world . Hence, the characterization of Arabian exome variant data potentiates the easy detection of functional variants, contributing information to discover new disease mechanisms.
The present study examined the association of mitochondrial haplogroups and variants with obesity using off-target wholeexome data from a Kuwaiti population. The study used wholegenome data and Sanger sequencing data as quality control samples for mitochondrial variants called from exome reads.

Exome Data
We analyzed 288 exomes from Kuwaiti individuals who were included in a previously published study (John et al., 2018). Samples from the individuals were divided into two groups according to body mass index (BMI): obese (BMI ≥ 30 kg/m 2 ; n = 152) and non-obese (<30 kg/m 2 ; n = 136). Samples were sequenced using two different exome kits: samples from 160 individuals were sequenced using the TruSeq Exome Enrichment kit, and samples from the remaining 128 individuals were sequenced using the Nextera Rapid Capture Exome kit, both using the Illumina HiSeq platform (Illumina Inc. United States) (John et al., 2018). Target files of both exome kits show that both contain the same 11 mtDNA regions where each target region covers an average of 1,000 bp. Whole genomes from three of the 288 individuals were sequenced in our previous studies (Alsmadi et al., 2014;Thareja et al., 2015), and mtDNA sequences extracted from these individuals were used as quality control samples for mitochondrial variants called from exome reads. Furthermore, we previously sequenced mtDNA D-loops from 173 individuals using conventional DNA Sanger sequencing (Eaaswarkhanth et al., 2019), and variants called using the control regions were used as quality controls for mitochondrial variant calling using exome sequences.

Mitochondrial DNA Sequences, Variant Calling, and Annotation
Raw paired-end reads (100 bp) from exome sequencing were mapped to human genome assembly GRCh37 using Burrows-Wheeler Aligner (BWA-MEM version v07-17) with default mapping options (Li, 2013). Duplicate reads were removed using Picard version 2.20.2 1 . The revised Cambridge Reference Sequence (rCRS) (Andrews et al., 1999) for human mtDNA as deposited in the GenBank NCBI database under accession number NC_012920.1 was extracted using SAMtools version 0.1.19 (Li et al., 2009), and the average mtDNA coverage was calculated using Genome Analysis Toolkit (GATK) version v3.8-1-0 (McKenna et al., 2010). mtDNA BAM files were generated for each sample. We subsequently used the GATK haplocaller with default parameters on the extracted mtDNA BAM files to generate variants for each sample in Genomic Variant Call Format (GVCF). All the GVCF files were combined into a single GVCF that was subsequently used to genotype the mtDNA variants. Annotation of the variants was performed using Ensembl Variant Effect Predictor (McLaren et al., 2016) and mitomap/mitomaster 2 (Lott et al., 2013).

Haplogroup Assignment
We used raw variant calling format files for whole mtDNA from 288 samples from Kuwaiti individuals and three whole-genome (technical replica) samples to determine their maternal haplogroups. Haplogroup calling was performed using HaploGrep 2.0 (Weissensteiner et al., 2016) based on PhyloTree Build 17 (accessed on 19 December 2019). To determine the accuracy of mitochondrial haplogroup prediction from exome data, we compared the results with a matched mitochondrial haplogroup profiling of 173 samples from Kuwaiti individuals whose mitochondrial control region variants were called using Sanger sequencing in our previously study (Eaaswarkhanth et al., 2019). We also assessed the agreement in assignment of major mitochondrial haplogroups between whole-exome and whole-genome samples. Further, the graphical phylogenetic trees for the haplogroups R and L were generated from HaploGrep 2.0. The Median-Joining networks of R and L haplogroups were constructed using PopART version 1.7 (Leigh and Bryant, 2015).

Statistical Analyses
Statistical analysis of clinical characteristics was performed using R Project for Statistical Computing software (version 3.6.2) 3 . Quantitative clinical variables (assuming continuous values) were ascertained for normality assumption using Shapiro-Wilk test and presented as either mean ± standard deviation or median and interquartile range. Non-parametric Mann-Whitney U test was used to compare age and BMI scores (which may have skewed distribution) between obese and non-obese groups. In the cases of categorical variables, descriptive statistics were presented as number and percentage, and chi-square test was applied to find associations or significant differences between them.
The differences in the counts of non-synonymous over synonymous mutations between the R (protective) and L (risk) haplogroups were examined using Fisher exact test. Further, the significance of the differences in dN/dS ratio between the two haplogroups was calculated using unpaired Wilcoxon rank sum test available in R software.
Principal component analysis (PCA) was conducted to determine whether the mtDNA profiles could cluster the samples based on obese/non-obese categorizations and assigned haplogroups. We used the PCA tools package of the R software to perform the PCA. Fisher exact test was used to investigate the differences in the distribution of mitochondrial haplogroups and variants between the obese and non-obese groups. Additionally, logistic regression analysis was performed to determine haplogroup association (adjusted for age and sex) with traits using IBM SPSS statistical software (version 25). PLINK software (version 1.9) (Chang et al., 2015) was used to test mtDNA variant association (adjusted for age, sex, and maternal haplogroups) with traits. A two-tailed P-value < 0.05 was considered statistically significant. Table 1 shows the descriptive statistics for the clinical characteristics of the study cohort and subcohorts of 152 (52.8%) obese and 136 (47.2%) non-obese Kuwaiti individuals. There were no significant differences between the obese and non-obese groups in terms of sex; however, the mean age was significantly higher in the obese group compared with the non-obese group (P > 0.001). This difference was in agreement with the results reported in our recent study on Arab population from Kuwait (Eaaswarkhanth et al., 2019). As expected, BMI was significantly different between obese and non-obese groups (P > 0.001).

Mitochondrial DNA Coverage and Variants
The average coverage of extracted mtDNA sequences from offtarget whole-exome samples in our study cohort was 50 × using Nextera Rapid Capture Exome kit and 8 × using TruSeq Exome Enrichment kit. The average coverage of mtDNA sequences from whole-genome (technical replica) samples was 2,491×. The coverage of mtDNA sequences was expected to be high using whole-genome sequencing as the large number of mitochondria present in the cytoplasm contributed to a greater number of reads.
A total of 1,241 mtDNA single-nucleotide polymorphisms (SNPs) and insertion/deletion (INDELs) variants were identified among the 288 whole-exome samples. A comparison of the detected mtDNA variants (SNPs only) from whole-exome data with variants called from the Sanger sequenced reads for the corresponding samples revealed that 77% of the variants were common. Nevertheless, a higher detection rate of variants was observed in Nextera Rapid Capture Exome kit samples (87%) compared with the TruSeq Exome Enrichment kit samples (70%). Some variants were detected by only Sanger sequencing-such variants were MT:71, 209, 235, 311, 315, 398, 411, 523, 524, 571, 573, 582, 16086, 16186, 16188, 16207, 16217, 16249, 16256, 16293, 16351, and 16399. On the other hand, some mtDNA variants (such as MT:513 and MT:16183) were detected only in wholeexome data. Inconsistencies were seen in the called genotypes at MT:302, MT:309, and MT:310 between Sanger and exome sequencing. Furthermore, all the mtDNA variants identified from whole exomes were detected in whole genomes of the same samples. Only four variants (including a mtDNA variant at position MT:3492) were detected in the whole genomes, but not in whole exomes.

Mitochondrial Haplogroups Associated With Obesity
A total of 12 maternal haplogroups (H, HV, J, K, L, M, N, R, T, U, W, and X) were identified from the mitochondrial variants extracted from the whole-exome samples. The most common maternal haplogroups among the 288 Kuwaiti individuals were J (19%), H (16%) L (13%), R (11%), and U (11%) (Figure 1). Good concordance was observed in haplogroup calling using whole exomes versus whole genomes versus Sanger sequenced reads. Among the 173 samples used for both exome sequencing and Sanger sequencing, 123 had the same major maternal haplogroups detected in both exomes and Sanger sequences (Supplementary Table 1). Further, same haplogroups (even at the resolution of subclade) were detected in both exomes and whole genomes in three samples that were analyzed using both sequencing techniques. One sample that had the same mitochondrial haplogroup detected using whole-exome and whole-genome sequencing differed in Sanger sequencing reads.
To assess the amount of variation observed in mtDNA that could be attributed to BMI classification, PCA was performed for the 288 samples, in which 152 (52.8%) were classified as obese and 136 (47.2%) were non-obese (Figure 2). The samples did not cluster based on BMI classification or sex (data not shown), which may indicate that the genetic heritability of obesity in mtDNA is overestimated and/or our data are too small to demonstrate (Figure 2A). However, the samples clustered well based on haplogroup origin, emphasizing the importance of mtDNA when studying the maternal relatedness between individuals and populations ( Figure 2B).
Supplementary Table 2 lists the variants used to assign haplogroup for each sample and the haplogroup assigned to the sample along with the HaploGrep 2 score (Weissensteiner et al., 2016). Table 2 shows the frequencies of each haplogroup in the obese and non-obese groups. The results indicated that individuals with the R haplogroup are at low risk of obesity [odds ratio (OR) = 0.4; P = 0.017)] and remained significant after adjusting the model for age and sex [OR = 0.311; 95% confidence interval (CI) = 0.135-0.717; and P = 0.006]. In addition, males with haplogroup R had a greater likelihood of being non-obese (OR = 4.84; P = 0.035) than obese (data not shown). On the other hand, haplogroup L individuals had a twofold increased risk of obesity (OR = 1.94), which was not significant (P = 0.074) but became significant after adjusting for age and sex using multivariate logistic regression (OR = 2.285; 95% CI = 1.02-5.14; and P = 0.046) ( Table 2). The frequencies of haplogroups H and L differed significantly between obese and non-obese groups, where haplotype R was more frequent in the non-obese group and L was more frequent in the obese group (Figure 3). The complete phylogeny and Median-Joining networks of these obesity riskassociated haplogroups R and L along with their subclades in Kuwaiti individuals are presented in Supplementary Figures 1,2 and Figures 4A,B, respectively.

dN/dS Ratio Between the R and L Haplogroups
Upon performing Fisher exact test on the counts of nonsynonymous and synonymous substitutions between R and L haplogroups, we did not find any significant difference (OR = 0.636; CI = 0.14-2.77; P = 0.547) in the distribution of non-synonymous and synonymous substitutions between them. However, upon computing the dN/dS ratios, we observed the median (IQR) of dN/dS ratio as 0.6 (0.425) and 0.364 (0.196) for R and L, respectively. A statistical test using unpaired Wilcoxon rank sum test between dN/dS ratio of R and L suggested significant differences (P = 0.0024) among them (Figure 5).

Mitochondrial DNA Variants Associated With Obesity
Significant associations (P > 0.05) with BMI classifications were found for 14 mtDNA variants (Table 3); however, three of these associations were no longer significant after adjusting for age, sex, and maternal haplogroups using multivariate logistic regression (Table 3). In addition, nine mtDNA variants were found when the model was corrected for age, sex, and maternal haplogroups. Thus, a total of 20 SNPs were correlated with obesity, among which 11 were positively (OR > 1) correlated with obesity. The missense variant MT:5460G > A (Ala331Thr) in the MT-ND2 gene showed the most significant correlation (P = 0.009) and was associated with a threefold increased risk of obesity. Among the nine negatively (OR > 1) correlated SNPs, the upstream variant MT:16362T > C in the MT-TP gene (encoding microsomal triglyceride transfer protein) showed the most significant (P = 0.007; OR = 0.38) negative association with obesity. Functional analysis of the consequences of these 20 variants revealed that 12 were located in coding exonic regions, four were in non-coding regions, and four were in gene upstream regions. The SIFT and PolyPhen-2 tools that assess the impact of variants on the protein structure and function predicted these variants as "tolerated" and "benign, " respectively. None of these variants was annotated as pathogenic for obesity by ClinVar, Mitomaster, and Mitomap databases. Nevertheless, the MT:5460G > A missense variant, which is positively correlated with obesity, has been associated with Alzheimer's disease and Parkinson's disease (Lin et al., 1992;Schnopp et al., 1996), and the MT:16362T > C, which was negatively correlated with obesity, was shown to be associated with lower mtRNA expression levels and affect uncoupled mitochondrial respiration (Zhou et al., 2017).
Nine SNPs were detected in only one of the BMI groups.

DISCUSSION
Previous studies have identified mitochondrial haplogroups associated with obesity. Mitochondrial group T was associated with an increased risk of obesity in Austrian (Ebner et al., 2015) and southern Italian (Nardelli et al., 2013) populations. Mitochondrial haplogroups X and H were reported to decrease risk of obesity in Caucasians of northern European origin in the United States (Yang et al., 2011) and Arabs from Kuwait (Eaaswarkhanth et al., 2019), respectively. It should be noted that these significant mitochondrial haplogroup studies did not follow the same approach and that there were differences in the age of the participants (adults versus children), BMI grouping, number of mtDNA variants, and regions studied. Differences in the region studied may explain why some studies, such as those conducted in European-American and African-American populations, found no association between mitochondrial variants and obesity (Grant et al., 2012).
In the present study cohort, the distribution of maternal linage frequency was 75% Western Eurasian, 12.5% African, and 12.5% Asian. This distribution is consistent with previous published distributions in Kuwait (Scheible et al., 2011) as well as neighboring countries, such as Iraq (Al-Zahery et al., 2003) and Saudi Arabia (Abu-Amero et al., 2008).
The frequency of mitochondrial haplogroup R in non-obese group was significantly higher than that in obese group. In the present study, most individuals (85%) in haplogroup R belonged to the R0a clade ( Figure 4A  multivariate analyses showed that these defining mutations were also negatively correlated with obesity. The same was also observed for MT:11719A > G, which is the defining mutation for an ancestor haplogroup R0. It is important to note that R0a is the most frequent sub-haplogroup in the Arabian Peninsula, with frequency of 5-30%, and it has been speculated that several founders of R0a are present in southern Arabia (Cerny et al., 2011;Scheible et al., 2011). The overall frequency of the R0a haplogroup in our samples was 10%, which is in agreement with the frequency range in the Arabian Peninsula.
The frequency of mitochondrial haplogroup L in the obese group was significantly higher than that in the non-obese group after adjusting for age and sex. Half of the individuals in haplogroup L belonged to the L3 clade ( Figure 4B and Supplementary Table 2), which is associated with out-of-Africa migration into Asia (Cabrera et al., 2018). Within the human mtDNA tree, haplogroup L3 encompasses not only many sub-Saharan Africans but also all ancient non-African lineages. The similarity of the age of L3 to its two non-African daughter haplogroups, M and N, suggested that the same process was likely responsible for both the L3 expansion in Eastern Africa and the dispersal of a small group of modern humans out of Africa to settle the rest of the world (Soares et al., 2012). The defining mutations for African subclade L3, MT:769G > A and MT:1018G > A (van Oven and Kayser, 2009), were positively correlated with risk of obesity after adjusting for age, sex, and maternal haplogroup. Mutations from other subclades of haplogroup L were also positively correlated with   We observed that mitochondrial haplogroup T, which is known to increase risk of obesity (Nardelli et al., 2013;Ebner et al., 2015), showed a higher frequency in the obese group compared with the non-obese group, but this was not significant. However, its defining mutations, MT:11812A > G and MT:14233A > G, correlated positively with risk of obesity (P = 0.029 and P = 0.032, respectively). Examination of the mitomap database (Lott et al., 2013) revealed another mutation, MT:10609T > C, which is a marker for a subclade of haplogroup F, which was negatively correlated with risk of obesity. Interestingly, this SNP was associated with athlete status and sprint performance in a Korean population (Hwang et al., 2019).
The metric of evolutionary rate ratio dN/dS (ratio of nonsynonymous to synonymous substitution rates) indicates how quickly a protein's constituent amino acids change relative to synonymous changes. A value of <1 indicates purifying selection, =1 indicates evolving neutrally, and >1 indicates positive (diversifying) selection (Spielman and Wilke, 2015). For both the R and L haplogroups that we observed in our study as associated with obesity, the median dN/dS ratio was <1 (0.600 and 0.364, respectively), indicating that both the haplogroups undergo purifying selection in the Kuwaiti population; however, the lower ratio in L haplogroup suggested that the L haplogroup (risk effect on obesity) experienced more purifying selection (or negative selection) than the R haplogroup (protective effect on obesity) by purging deleterious mutations in the process of evolution.
We found that several variants located in nicotinamide adenine dinucleotide (NADH) dehydrogenase subunit (MT-ND1, MT-ND4, and MT-ND5) genes, respiratory complex I, and mitochondrial 12S and 16S ribosomal RNA (MT-RNR1 and MT-RNR2) genes were significantly positively or negatively correlated with risk of obesity. In the MT-RNR2 gene, MT:2758G > A was only identified in the obese group, whereas MT:3197T > C was only identified in the non-obese group. NADH dehydrogenase is required for energy generation in the cell; therefore, variants within its seven encoding genes could result in metabolic disorders including obesity (Flaquer et al., 2014). The MT-RNR1 gene encodes MOTS-C protein that regulates insulin sensitivity and metabolic homeostasis and plays a protective role against diet-induced obesity . Furthermore, MT-RNR2 encodes Humanin, which plays a protective role against oxidative stress (Voigt and Jelinek, 2016). Thus, variants within these genes could potentially interfere with their function, resulting in an increased or decreased risk of obesity.
To prioritize the significant variants identified in our study, we focused on missense mutations leading to amino acid substitutions that were unique to either the obese or non-obese group. The missense variant MT:5460G > A from the MT-ND2 gene was only positively correlated with obesity. This finding was in agreement with findings from other studies that reported that variants within the MT-ND2 gene were associated with body fat mass (Yang et al., 2011) and increased BMI (Flaquer et al., 2014). The MT-ND4L gene has been associated with obesity (Flaquer et al., 2014) and is a mitochondrial encoding subunit of respiratory complex I. In the present study, the missense mutation MT:10609T > C in the MT-ND4L gene was negatively correlated with risk of obesity. Cytochrome c oxidase subunit gene 2 (MT-CO2), which is an important regulator of the OXPHOS system, was also negatively correlated with risk of obesity. We found that MT-CO2 harbored a missense mutation, MT:7853G > A, which exhibited a protective role for obesity. A previous study reported that MT-CO2 (Kraja et al., 2019) and variants within this gene were associated with obesity, but not after adjusting for multiple testing (Liu et al., 2012).
Off-target whole-exome sequencing for the entire mitochondrial genome revealed a good variable coverage depending on the exome capture kit used. The non-uniformity of mitochondrial coverage between the two exome kits could have been due to differences in design and target sequences. This may explain why we observed a higher overlap in variants between sequencing using Sanger technology and sequencing by Nextera Rapid Capture Exome kit compared with sequencing with the TruSeq Exome Enrichment kit (obsolete). Nevertheless, mtDNA variants from both the exome capture kits detected almost all the mtDNA variants identified using indirect whole-genome sequencing of replicated samples. Thus, whole-exome sequencing is a cost-and timeeffective alternative for mitochondrial monogenic (Griffin et al., 2014) and association studies (Li et al., 2014) compared with whole-genome sequencing. The reasons for the difference in detection of variants between Sanger and exome variants include the following: (1) low read coverage of exome data at the start and end of the mitochondrial genome (especially when the average mtDNA coverage is <10); (2) repeated poly-C sequencing error using exome data; and (3) the Sanger variant identification pipeline (Eaaswarkhanth et al., 2019) that uses a predicted mtDNA sequence from a sequence browser with manual adjustment could result in a number of false-positive variants.
PCA with the mitochondrial haplogroup profiling from the 288 whole-exome study samples showed a good clustering of haplogroups, which validates the bioinformatics pipeline used in the present study. Moreover, we observed a high concordance (71%) of mitochondrial haplogroup profiling between variants from whole-exome data and the D-loop region data from conventional Sanger sequencing. One sample that was sequenced with whole-genome and exome kits displayed the same assignment of major haplogroup; however, the D-loop Sanger sequencing of the same sample predicted a different haplogroup. This could have an impact on the significant haplogroups identified in previous studies for complex disorders including obesity due to lower resolution or low number of variants used in the studies.
High-throughput NGS of the mitochondrial genome has advantages compared with Sanger sequencing. However, a technical comparison of both the technologies is required to fully understand and unify their results. The present study compared both technologies and found that some variants were only detected by Sanger sequencing and not NGS; however, this discrepancy could be due to low coverage, the whole-exome capture kit design, target sequences, and machine-specific and human error. Nevertheless, other studies observed the same phenomena despite good mitochondrial sequence coverage. For example, variants at positions MT:16183 (Griffin et al., 2014) and MT:523-524 (Park et al., 2017) were only detected by Sanger sequencing. This could have been due to INDEL alignment errors, as its corresponding position in NGS is MT:513. We also found variants that were incorrectly reported between Sanger sequencing and NGS on the same samples at positions MT:302, MT:309, and MT:310 (Park et al., 2017), which could be also due to alignment errors resulting from the complexity of the region. Interestingly, we also observed a sequencing error variant at position MT:3492 that was only detected by wholegenome sequencing and not exome sequencing. This discrepancy may have been an NGS whole-genome sequencing error (Li et al., 2010).
The present study has some limitations. First, the study did not explore mtDNA heteroplasmic variants within the obese and non-obese groups, as these require high coverage sequences. Second, in order to increase the number of study samples, the study utilized data (generated in our previous studies) obtained using two different exome capture kits from Illumina; the Nextera Rapid Capture Exome kit gave a coverage of 50×, while the TruSeq Exome Enrichment kit gave a coverage of mere 8×; having the second one with a very low coverage can weaken the results by not capturing the variants. Third, we divided our study population into obese and nonobese non-obese groups, and the resultant subcohorts were small in size. Despite these limitations, this study paves the way for a larger study to investigate common complex disorders including obesity using whole mtDNA extracted from whole-exome data with greater coverage exome capture kit.

CONCLUSION
Indirect whole-exome sequencing of 288 Kuwaiti individuals revealed negative and positive associations of mitochondrial haplogroups R and L, respectively, with obesity. We also identified significantly distributed mtDNA variants among the obese and non-obese groups that were mostly haplogroupdefining mutations. We identified several variants of the NADH dehydrogenase subunit that were significantly positively or negatively correlated with risk of obesity. The present study is the first to utilize whole-exome data to extract entire mitochondrial haplogroups and determine their association with obesity in the Arabian Peninsula.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Dasman Diabetes Institute. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
MD, FA-M, and TT designed and performed the study and wrote the manuscript. RN and MM performed the sequencing. HA and PS performed the statistical analyses. ME, SJ, and PH participated in data analysis. All authors contributed to the article and approved the submitted version.