Screening for thalassemia carriers among the Han population of childbearing age in Southwest China

Purpose: Thalassemia is a severe hereditary blood disorder that causes substantial mortality and disability and is one of the most prevalent monogenic diseases worldwide. The aim of this study was to analyze the molecular epidemiology of thalassemia among Han individuals of childbearing age in Southwest China and to explore the application of next-generation sequencing (NGS) technology in screening for thalassemia carriers. Methods: The participants were Han males and females of childbearing age who sought medical advice at the West China Second University Hospital, Sichuan University from June 2022 to June 2023. We detected α- and β-thalassemia mutations using full-length capture of the thalassemia genes and NGS technology. Results: In a cohort of 1,093 participants, 130 thalassemia carriers were identified, with an overall detection rate of 11.89% (130/1,093). Among all participants, 0.91% (10/1,093) had mutations that could not be detected using traditional PCR techniques. The proportions of carriers with α-thalassemia, β-thalassemia, and combined α- and β-thalassemia gene mutations were 7.68% (84/1,093), 3.93% (43/1,093), and 0.27% (3/1,093), respectively. We identified a novel HBA2 c.166del variant that has not been previously reported. Conclusion: Using NGS technology, we found that the carrier rate of thalassemia gene mutations was 11.89% in the Han population of childbearing age in Southwest China. Compared with traditional PCR techniques, NGS detected rare genetic variants in an additional 0.91% (10/1,093) of participants. NGS technology should be utilized as the primary screening method for thalassemia carriers among Han people of childbearing age in Southwest China.


Introduction
Thalassemia, also known as Mediterranean anemia, is an autosomal recessive genetic disease and one of the most common monogenic genetic diseases worldwide (Taher et al., 2018). It belongs to the category of hereditary hemoglobinopathies. The most important forms of thalassemia are α-thalassemia and β-thalassemia, which are primarily caused by an imbalance in the production of α-like and non-α-like globin chains. This imbalance leads to a series of symptoms including red blood cell hemolysis, ineffective erythropoiesis, and iron overload (Kattamis et al., 2022). About 1.5% of the global population carry a β-thalassemic allele, and about 5% carry an α-thalassemic allele (Piel and Weatherall, 2014; Kattamis et al., 2020). Currently, the treatment of patients with thalassemia relies on supportive therapies such as blood transfusion and iron chelation. Only a few patients receive curative treatment, and allogeneic hematopoietic stem cell transplantation (allo-HSCT) remains the only curative method. However, allo-HSCT has not been widely implemented because of challenges in donor matching, high costs, and unsatisfactory outcomes (Rachmilewitz and Giardina, 2011; Kattamis et al., 2022). Therefore, premarital, pre-pregnancy, and prenatal thalassemia gene screening in areas with a high incidence of thalassemia is the most effective measure to avoid the birth of children with severe thalassemia (Ip and So, 2013).
Geographical heterogeneity is a prominent characteristic of thalassemia. Globally, thalassemia carriers are concentrated in the Mediterranean, the Middle East, Central Asia, India, and southern China (Colah et al., 2010). In China, thalassemia is prevalent in the southern regions, particularly in the Guangdong, Guangxi, Fujian, Yunnan, Guizhou, and Sichuan provinces (Huang et al., 2019). Southwest China, including Chongqing and the provinces of Sichuan, Guizhou, and Yunnan (He et al., 2017; Li et al., 2021; Tan et al., 2021; Li et al., 2022), has a high incidence of thalassemia. Mutations in the thalassemia genes are usually population-specific; each population has a unique spectrum of mutations (Williams and Weatherall, 2012). Currently, data on large samples of thalassemia carriers in the childbearing-age Han population in Southwest China are unavailable. Hence, it is crucial to explore the local epidemiological characteristics of thalassemia for effective targeted genetic detection and counseling.
At present, routine blood count indicators, such as mean corpuscular volume (MCV) and mean corpuscular hemoglobin (MCH), and hemoglobin component analysis (using high-performance liquid chromatography or capillary electrophoresis) are often used as first-line screening protocols for thalassemia (Ryan et al., 2010; Aliyeva et al., 2018). Although these are the fastest and most cost-effective screening protocols for thalassemia, they are susceptible to false negatives. Genetic testing is the gold standard for the diagnosis of thalassemia, identification of hemoglobin variants, clarification of complex cases, and prenatal diagnosis. Globally, gap-PCR and PCR-reverse dot blotting (PCR-RDB) are commonly used to diagnose thalassemia. However, the detection ranges of gap-PCR and PCR-RDB are limited; these methods cover only a small number of thalassemia gene mutation types (three common deletional α-thalassemia mutations, three common non-deletional α-thalassemia mutations, and 17 common β-thalassemia mutations). In comparison, next-generation sequencing (NGS) has a substantially wider detection range and higher accuracy and can process more samples simultaneously. However, NGS has disadvantages in detecting highly homologous regions, gene rearrangements, and large fragment deletions/duplications (Aliyeva et al., 2018). Although long-read third-generation sequencing has the advantage of detecting copy number variations (CNVs) and mutations in homologous genes (Hassan et al., 2023), it is not widely used in clinical practice because of its high cost. Considering the possible limitations of NGS technology in detecting the homologous regions of HBA1 and HBA2, we tested all participants using both NGS and a fluorescence PCR melting curve method. Additionally, Sanger sequencing, quantitative real-time polymerase chain reaction (qRT-PCR), and PCR-electrophoresis were employed to validate the point mutations and α-globin gene triplications detected by NGS.
In this study, we aimed to analyze the molecular epidemiological characteristics of thalassemia among the childbearing-age Han population in Southwest China to provide a reference for prenatal screening, prenatal diagnosis, and birth defect prevention and control. We also explored the application of NGS for the detection of thalassemia.

NGS
Genomic DNA was extracted from whole blood samples using a nucleic acid extraction kit (magnetic bead method) (MyGenostics, Chongqing, China). The extracted DNA was required to have a concentration ≥25 ng/μL and a 260/280 ratio between 1.7 and 2.0. Library preparation was performed using an enzyme digestion method with a V5.0 library preparation kit (MyGenostics). Library products were required to have a fragment length distribution of 200-500 bp. Then, 8-12 samples were pooled into one hybridization unit, with 600 ng per sample. Probes and buffer from the GenCap® universal nucleic acid fragment enrichment purification kit V2.3 (MyGenostics) were added to each hybridization unit, and hybridization enrichment was carried out for 12-24 h. The probes were double-stranded DNA covering the full length of the HBA1, HBA2, and HBB genes. The concentration of each hybridization unit was measured by absolute quantitative qPCR, and pooling was performed based on the qPCR results. Sequencing was performed on the Illumina NextSeq 500 platform (Illumina, San Diego, CA, USA) using the NextSeq CN500 Middle Output Kit (Illumina). After sequencing, the preprocessed data were aligned to the human reference genome GRCh37/hg19 using BWA (version 0.7.10; https://www.plob.org/tag/bwa). GATK (version 4.0.8.1; https://www.broadinstitute.org/gatk/) was used for base recalibration, and the single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) were annotated using the ANNOVAR software (version 1; http://annovar.openbioinformatics.org/en/latest/) (McCombie et al., 2019). Pathogenicity analysis was performed according to the sequence variant interpretation standards and guidelines recommended by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (Richards et al., 2015).
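The acceptance criteria stated in this section can be written as simple checks. The following Python sketch is illustrative only: the class, field, and function names are hypothetical and are not part of any MyGenostics or Illumina software, and the thresholds are copied from the text above.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    """Hypothetical record of one extracted DNA sample."""
    sample_id: str
    concentration_ng_per_ul: float
    a260_a280: float


def passes_extraction_qc(sample: DnaSample) -> bool:
    """Concentration >= 25 ng/uL and a 260/280 ratio between 1.7 and 2.0."""
    return (sample.concentration_ng_per_ul >= 25.0
            and 1.7 <= sample.a260_a280 <= 2.0)


def hybridization_unit_ok(n_samples: int) -> bool:
    """One hybridization unit pools 8-12 samples."""
    return 8 <= n_samples <= 12


def total_input_ng(n_samples: int, ng_per_sample: float = 600.0) -> float:
    """Total DNA input of a unit at 600 ng per sample."""
    return n_samples * ng_per_sample


# Made-up example values, not data from the study.
sample = DnaSample("S001", concentration_ng_per_ul=31.2, a260_a280=1.85)
print(passes_extraction_qc(sample), hybridization_unit_ok(10), total_input_ng(10))
```

Keeping the thresholds in one place makes it easy to audit which samples were excluded before library preparation and why.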

Validation experiments
All DNA samples were tested for three deletional α-thalassemia mutations common in the Chinese population: the Southeast Asian deletion (--SEA/), the leftward deletion (-α4.2/), and the rightward deletion (-α3.7/). DNA was tested using a fluorescent PCR melting curve method (deletion α-thalassemia kit, Zeesan, Xiamen, China) on the cobas z 480 analyzer (Roche, Switzerland). Samples in which NGS detected HBA1, HBA2, or HBB gene mutations (such as SNPs and InDels) were validated using Sanger sequencing. After primer design and PCR amplification, Sanger sequencing was performed, and the sequencing results were analyzed using Chroms software (version 2.4.1; Biosoft).
Samples showing increased copy numbers of HBA1/HBA2 (unknown samples) in the NGS analysis were subjected to qRT-PCR, and the results were quantified with the 2^(−ΔΔCt) method (Livak and Schmittgen, 2001). HBA1 and HBA2 contain three groups of highly homologous nucleotide segments, labeled the X, Y, and Z segments (Farashi and Harteveld, 2018). Based on the molecular structures of αααanti3.7, αααanti4.2, and HKαα, we measured the copy number of the Z segment to determine the copy number of HBA1/HBA2 (Long and Liu, 2021). Primers were designed for exon 3 of HBA2 within the Z segment, and the β-actin gene was selected as the internal control gene. We ensured that the product of each primer pair had a unique sequence. A sample with two copies of HBA1/HBA2 was selected as the calibrator sample. Threshold cycle (Ct) values were measured on a 7500 FAST Dx Real-Time PCR platform (Thermo Fisher Scientific Inc, Waltham, MA, USA). Each sample was tested three times. For the unknown samples and the calibrator sample, ΔCt was calculated as follows: ΔCt = Ct(HBA2) − Ct(β-actin). For each unknown sample, ΔΔCt was calculated as follows: ΔΔCt(unknown sample) = ΔCt(unknown sample) − ΔCt(calibrator sample). The relative quantitative value, 2^(−ΔΔCt), is the ratio of the unknown sample to the calibrator sample. The 2^(−ΔΔCt) value of the unknown samples can also be obtained directly from the 7500 FAST Dx Real-Time PCR platform. The genotype was then verified using PCR-electrophoresis (6 α-thalassemia gene detection kit, Yaneng, Shenzhen, China). Several representative qRT-PCR and PCR-electrophoresis results are presented in Supplementary Figure S1. All samples identified as thalassemia carriers by NGS were validated using a second method. Primer information is shown in Supplementary Table S1.
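As a minimal sketch of the relative quantification described above, the following Python snippet computes ΔCt, ΔΔCt, and 2^(−ΔΔCt) from replicate Ct values. The function names and the Ct values are illustrative assumptions, not the study's software or data. Replicates are averaged before subtraction, which is one common convention and is assumed here rather than taken from the text.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Delta Ct = mean Ct(target gene) - mean Ct(internal control)."""
    return mean(ct_target) - mean(ct_reference)


def relative_quantity(sample_target, sample_reference,
                      calibrator_target, calibrator_reference):
    """Return 2^(-ddCt), the ratio of the unknown sample to the calibrator."""
    dd_ct = (delta_ct(sample_target, sample_reference)
             - delta_ct(calibrator_target, calibrator_reference))
    return 2 ** (-dd_ct)


# Made-up triplicate Ct values (HBA2 target, beta-actin internal control).
calibrator = {"HBA2": [24.0, 24.1, 23.9], "ACTB": [22.0, 22.1, 21.9]}
unknown = {"HBA2": [23.4, 23.5, 23.3], "ACTB": [22.0, 22.0, 22.0]}

ratio = relative_quantity(unknown["HBA2"], unknown["ACTB"],
                          calibrator["HBA2"], calibrator["ACTB"])
print(f"relative quantity = {ratio:.2f}")
```

With these made-up values, ΔΔCt is −0.6, so the unknown sample gives about 1.5 times the calibrator signal. How such a ratio maps to a genotype depends on the copy number of the calibrator and on the validation assays described in this section.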
We identified a novel mutation (HBA2 c.166del, p.Val56LeufsTer12). To the best of our knowledge, the mutation has not been recorded in ClinVar, the Human Gene Mutation Database, or PubMed, and it has not been reported in the literature. The mutation is predicted to cause a frameshift that introduces a premature termination codon. In addition, we identified six rare thalassemia genotypes: HBA2 c.272_279delAGCTTCGG heterozygote, HBB c.180G>C heterozygote, HBB c.341T>A heterozygote, HBB c.68A>C heterozygote, HBB c.68A>G heterozygote, and heterozygous deletion of HBB exons 1-3. Heterozygous deletion of HBB exons 1-3 was verified using multiplex ligation-dependent probe amplification, whereas the remaining rare thalassemia genotypes were verified using Sanger sequencing (Figure 1). NGS results of 25 samples indicated increased copy numbers of HBA1/HBA2, which were verified using qRT-PCR and PCR-electrophoresis. Among the 25 samples, six different genotypes were detected (Table 2). The most common genotype was αααanti3.7/αα, accounting for 48% (12/25) of all α-globin gene triplication cases. One genotype was more cryptic than the others and required several analyses. This case was identified as -α3.7/αα using NGS. The fluorescence PCR melting curve method showed both wild-type and deletion peaks in the A-ROX and B-ROX channels, indicating the simultaneous presence of heterozygous -α3.7 and -α4.2 deletions. However, the result did not correspond to the -α3.7/-α4.2 peak pattern. Gap-PCR electrophoresis showed the simultaneous presence of -α3.7/, -α4.2/, and a normal band. The qRT-PCR results indicated that the copy number of HBA1/HBA2 was 1. PCR-electrophoresis showed a positive HKαα band and a normal band. Taking all the results into account, we identified this genotype as HKαα/-α4.2 (Figure 2). We identified seven couples at high risk of thalassemia in this study (Supplementary Table S2).
Frontiers in Genetics frontiersin.org

Discussion
In this study, the carrier rate of α-thalassemia was higher than that of β-thalassemia in Southwest China. In addition, the mutations in α-thalassemia carriers were mainly --SEA/αα and -α3.7/αα, whereas those in β-thalassemia carriers were mainly codon 41/42 (-TTCT), codon 17 (A>T), and IVS-II-654 (C>T). Both findings are consistent with previous reports from various regions of China (He et al., 2017; Zhuang et al., 2020; Li et al., 2021; Tan et al., 2021; Li et al., 2022; Wei et al., 2022; Xian et al., 2022; Yu et al., 2022; Xi et al., 2023; Yang et al., 2023) (Table 3). Li et al. reported a thalassemia carrier rate of 2.6% in the Sichuan region, which is lower than that in this study. This may be because, of the 42,155 participants in their study, only the 2,430 hematologically positive individuals underwent molecular diagnosis (Li et al., 2021). Baise, Guangxi, is a multi-ethnic area with a high frequency of thalassemia genes in the population, and its thalassemia carrier rate is higher than that found in this study (Wei et al., 2022). The participants in Hainan, Guangdong, and Fujian (Quanzhou) had positive hematological results and belonged to high-risk populations, resulting in a much higher carrier rate of thalassemia than that in this study. The thalassemia carrier rate in our study was higher than that reported in Hunan (Xi et al., 2023). Nevertheless, the carrier rate of thalassemia in cities in Hunan varies according to geographical location. Although the data in this study were similar to those from Chongqing, Guizhou (Qianxinan), and Jiangxi (Ganzhou), there were slight differences in the characteristics of the mutation spectra.
This study identified a novel mutation (HBA2 c.166del), which has not been previously reported. In addition, this study identified six rare thalassemia gene mutations. Among them, HBA2 c.272_279delAGCTTCGG was first reported in the Guangxi Zhuang Autonomous Region, China, and the phenotype of the HBA2 c.272_279delAGCTTCGG heterozygote was speculated to be similar to that of -α3.7/αα, showing silent thalassemia (Li et al., 2019). According to the Database of Human Hemoglobin Variants and Thalassemia Mutations, individuals heterozygous for HBB c.180G>C, c.341T>A, c.68A>C, or c.68A>G have a normal presentation. Deletion of HBB exons 1-3 has been reported outside China (Hogan et al., 2018) but not previously in China. In the two cases of heterozygous HBB exons 1-3 deletion in this study, hematological tests showed normal levels of hemoglobin (HGB), decreased MCV or MCH, and elevated fetal hemoglobin.
The rate of α-globin gene triplication or quadruplication in this study was 2.28%, close to the 1.99% reported in Hunan Province (Xi et al., 2023). Of the 25 α-globin gene triplication carriers, 15 did not undergo hematological testing (including one with --SEA/αααanti3.7 and one with HKαα/-α4.2), nine had normal results in routine blood counts or hemoglobin electrophoresis, and one showed mild anemia with an HGB of 105 g/L, an MCV of 63.7 fL, and an MCH of 18.8 pg (pre-pregnancy). The NGS result of the participant with mild anemia was αααanti4.2/αα combined with heterozygous HBB c.316-197C>T, and her hemoglobin continued to decrease during pregnancy. At 29+1 weeks of pregnancy, routine blood counts revealed an HGB level of 87 g/L, indicating moderate anemia. Thus, patients with β-thalassemia combined with α-globin gene triplication probably have an aggravated clinical phenotype, and pregnancy may worsen the severity of anemia and increase the risk of anemia-related obstetric comorbidities and complications. Therefore, it is important to perform thalassemia gene testing in women before conception. In this study, seven high-risk couples were identified. The women in the F03, F05, and F07 families were pregnant. After genetic counseling, all three families chose to undergo amniocentesis for the prenatal diagnosis of thalassemia.
Currently, hematological testing (routine blood counts combined with hemoglobin component analysis) is commonly used as the first-line screening method for thalassemia in clinical practice (Ryan et al., 2010; Aliyeva et al., 2018). However, this approach misses thalassemia carriers whose hematological test results are negative. In this study, the false-negative rates when using only routine blood counts or only hemoglobin electrophoresis were 30.61% (15/49) and 56.41% (22/39), respectively. When both methods were used, the false-negative rate was 19.35% (6/31) (Table 4). The main methods used for the detection of thalassemia genes in clinical practice include Southern blot hybridization, high-resolution melting curve analysis, array-based detection, amplification refractory mutation system PCR (ARMS-PCR), real-time PCR, gap-PCR, and PCR-RDB (Vrettou et al., 2003; Chan et al., 2004; Huang et al., 2017; Hassan et al., 2023). Southern blot hybridization is considered the gold standard for detecting large fragment deletions, especially in the α-globin genes. However, it requires a large sample volume and involves complex procedures, limiting its clinical application.
The thalassemia genes HBA1 and HBA2 analyzed in this study are highly homologous, which imposes certain limitations on NGS (Aliyeva et al., 2018). The capture probe used in this study was a double-stranded DNA probe covering the full lengths of the HBA1, HBA2, and HBB genes, which could stably capture the target genes with good repeatability. Probe density was increased locally in regions with common deletions and duplications. In addition, probe density and length were adjusted according to the GC content and melting temperature of special regions to ensure higher capture efficiency and more accurate CNV analysis. The NGS results of α-thalassemia carriers were consistent with the corresponding validation experiments. Based on these results, detection using full-length capture of the thalassemia genes and NGS technology is accurate and reliable. In this study, the NGS results of one sample indicated that the copy number of HBA1/HBA2 was four, but the genotype could not be distinguished as ααααanti3.7/αα or αααanti3.7/αααanti3.7. PCR-electrophoresis confirmed the genotype to be ααααanti3.7/αα. Therefore, NGS can detect an increase in gene copy number but cannot accurately assess reproductive risk in such cases because it cannot resolve complex rearrangement genotypes. Although this situation is very rare, when clinical testing cannot determine the genotype, other methods, such as PCR-electrophoresis or third-generation sequencing, are still needed for further validation.
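The false-negative rates quoted for hematological screening in this section follow directly from the reported counts. The Python sketch below reproduces them. The helper name and dictionary labels are illustrative, and sensitivity is derived here as 1 minus the false-negative rate, which is the standard definition but is not a value copied from Table 4.

```python
def false_negative_rate(missed: int, carriers: int) -> float:
    """Fraction of NGS-confirmed carriers that the screening test missed."""
    return missed / carriers


# (missed carriers, carriers tested), as reported in the text.
counts = {
    "routine blood count only": (15, 49),
    "hemoglobin electrophoresis only": (22, 39),
    "both methods": (6, 31),
}

for method, (missed, carriers) in counts.items():
    fnr = false_negative_rate(missed, carriers)
    print(f"{method}: false-negative rate {fnr:.2%}, sensitivity {1 - fnr:.2%}")
```

Combining both methods gives the lowest false-negative rate of the three, yet roughly one in five tested carriers would still be missed.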

Conclusion
In summary, there is a high incidence of thalassemia among people of childbearing age in Southwest China. Our results confirmed that NGS technology can serve as an economical, rapid, high-throughput, and efficient strategy for screening thalassemia carriers.

FIGURE 2
All experimental results of one sample with genotype HKαα/-α4.2. S, sample; PC, positive control; NC, normal control; NTC, no template control; M, marker. (A) NGS result. (B, C) Results of the fluorescent PCR melting curve method: (B) A-ROX channel, (C) B-ROX channel; the green peak pattern is the sample and the red peak pattern is the normal control. (D) qRT-PCR result; PC is --SEA/αα. (E) Gap-PCR result; PC is -α3.7/αα. (F) PCR-electrophoresis result; SA is the reagent quality control and PC is HKαα/αα.

TABLE 1
The thalassemia screening results of the 1,093 participants.
a HGVS, Human Genome Variation Society. b Hb, hemoglobin.

TABLE 2
The α-Globin gene triplication results of the 1,093 participants.
a NGS, next-generation sequencing. b qRT-PCR, quantitative real-time polymerase chain reaction.

TABLE 3
Comparison of thalassemia mutations in different regions of Southern China.

TABLE 4
Sensitivity and specificity of HbA2 a and MCV b + MCH c levels for thalassemia carriers screened by NGS d.

High-resolution melting curve analysis, array-based detection, ARMS-PCR, real-time PCR, gap-PCR, and PCR-RDB have limited detection ranges and can only detect known thalassemia gene variants. Gap-PCR and PCR-RDB are the most commonly used methods worldwide for diagnosing point mutations of thalassemia genes and deletions of α-globin genes. However, with gap-PCR combined with PCR-RDB alone, the detection rate of thalassemia in our study would have been 10.97% (120/1,093). Two samples carrying mutations of the α-globin genes (HBA2 c.166del and c.272_279del) and eight samples carrying mutations of the β-globin gene (HBB exons 1-3del, c.*110T>C, c.-100G>A, c.180G>C, c.341T>A, c.68A>C, and c.68A>G) could not be detected.