Mucopolysaccharidosis Type I in the Russian Federation and Other Republics of the Former Soviet Union: Molecular Genetic Analysis and Epidemiology

Mutations in the IDUA gene cause deficiency of the lysosomal enzyme alpha-L-iduronidase (IDUA), which leads to a rare disease known as mucopolysaccharidosis type I. More than 300 pathogenic variants of the IDUA gene have been reported to date, but little is known about the distribution of mutations in different populations and ethnic groups because of the low prevalence of the disease. This article presents the results of a molecular genetic study of 206 patients with mucopolysaccharidosis type I (MPS I) from the Russian Federation (RF) and other republics of the former Soviet Union. Among them, there were 173 Russian (Slavic) patients, 9 Tatars, and 24 patients of different nationalities from other republics of the former Soviet Union. Seventy-three different pathogenic variants in the IDUA gene were identified. The common variant NM_000203.5:c.208C>T was the most prevalent mutant allele among Russian and Tatar patients. The common variant NM_000203.5:c.1205G>A accounted for only 5.8% of mutant alleles in Russian patients. Both mutations were very rare or absent in patients from other populations. The pathogenic variant NM_000203.5:c.187C>T was the major allele in patients of Turkic origin (Altaians, Uzbeks, and Kyrgyz). Population-specific pathogenic alleles in the IDUA gene were identified in each of these ethnic groups. The identified features are important for understanding the molecular origin of the disease, predicting the risk of its development, and creating optimal diagnostic and treatment tools for specific regions and ethnic groups.

BACKGROUND
MPS I is a rare lysosomal storage disease that results from pathogenic nucleotide alterations in the IDUA gene. The IDUA gene encodes the lysosomal enzyme alpha-L-iduronidase (IDUA; EC 3.2.1.76), which is involved in glycosaminoglycan (GAG) metabolism. IDUA deficiency leads to the accumulation of two types of GAGs, heparan sulfate and dermatan sulfate, in different tissues and organs, resulting in the development of progressive multisystem pathology (Campos and Monaga, 2012). Three subtypes of the disease are traditionally distinguished: the severe form (Hurler syndrome; MPS IH; MIM#607014), the intermediate form (Hurler/Scheie syndrome; MPS IH/S; MIM#607015), and the mild form (Scheie syndrome; MPS IS; MIM#607016). However, no easily measurable biochemical differences have been identified among patients with different MPS I syndromes, and the clinical findings overlap (Muenzer, 2004). It is now recognized that MPS I exists as a spectrum of disorders from the attenuated form to the severe form, with many phenotypes in between. The clinical symptoms include coarse facial features, growth retardation, corneal clouding, contractures of the joints, kyphoscoliosis, dysostosis multiplex, hearing loss, thickening of the heart valves, hepatosplenomegaly, diffuse muscle hypotension, umbilical and inguinal hernias, and cardiomyopathy. The manifestation and severity of symptoms vary depending on the severity of the disease. Cognitive and developmental delays are observed in patients with the severe form of the disease (Neufeld et al., 2001; Hampe et al., 2020).
The first step in diagnosing MPS I involves qualitative and quantitative analysis of urine GAGs and measurement of the residual alpha-L-iduronidase activity. Enzyme activity can be measured in plasma or leukocyte homogenate of patients, using phenyl-iduronide or 4-methylumbelliferyl iduronide as a substrate (Hall and Neufeld, 1973; Hopwood et al., 1979; Hopwood and Harrison, 1982; Stone, 1998). More recently, enzyme activity has been measured in dried blood spots (DBS) by tandem mass spectrometry (MS/MS) (Kumar et al., 2015). The second step, which is considered definitive for confirming the disease, is the molecular genetic analysis of the IDUA gene.
The IDUA gene is located at 4p16.3 on chromosome 4 and consists of 14 exons and 13 introns. The gene is transcribed into a 2.3-kb cDNA, which encodes a 653-residue glycopeptide (Scott et al., 1991; Scott et al., 1992). Three hundred nineteen variants in the IDUA gene have been reported in the Human Gene Mutation Database (HGMD; www.hgmd.cf.ac.uk). Of these, 86 are nonsense and missense mutations, 49 are splicing substitutions, 47 are small deletions, 23 are small insertions, four are small indels, 10 are gross deletions, two are gross insertions, three are complex rearrangements, and one is a regulatory substitution (data as of November 2021). Frequencies of mutations differ across populations.
The most common pathogenic alleles worldwide are NM_000203.5:c.1205G>A and NM_000203.5:c.208C>T. The most recent investigation of the global distribution of common mutations in the IDUA gene showed that NM_000203.5:c.1205G>A was the major allele among patients with MPS I from most European countries, America, and Australia. The common allele NM_000203.5:c.208C>T was found mostly in Northern and Eastern Europe. The accumulation of unique pathogenic alleles is characteristic of individual population groups. In different populations, the frequency of MPS I ranges from 0.11:100,000 to 1.85:100,000 newborns (Khan et al., 2017; Poletto et al., 2018).
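To give a sense of what the incidence range quoted above implies, the following minimal Python sketch converts a birth incidence into an approximate carrier frequency. It assumes Hardy-Weinberg equilibrium and autosomal recessive inheritance, which is an idealization that founder effects and consanguinity can violate; the function name `carrier_frequency` is illustrative and not part of the study.

```python
import math


def carrier_frequency(incidence):
    """Approximate carrier frequency 2pq for a recessive disease.

    Assumes Hardy-Weinberg equilibrium, so that incidence = q**2,
    where q is the combined frequency of all pathogenic alleles.
    """
    q = math.sqrt(incidence)  # pathogenic allele frequency
    p = 1.0 - q               # normal allele frequency
    return 2.0 * p * q


# Incidence range quoted in the text: 0.11 to 1.85 per 100,000 newborns
for per_100k in (0.11, 1.85):
    freq = carrier_frequency(per_100k / 100_000)
    print(f"{per_100k}:100,000 -> about 1 carrier in {round(1 / freq)}")
```

Under these assumptions, the carrier frequency is several hundred times higher than the incidence, which is one reason population-specific allele data matter for screening design.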
Specific treatment options available for this disorder are enzyme replacement therapy and allogeneic hematopoietic stem cell transplantation (Concolino et al., 2018; Kubaski et al., 2020).
Genotype-phenotype correlations in MPS I, as well as in other hereditary diseases, are not obvious. However, in some cases, a clear relationship between pathogenic variants and clinical manifestations can be traced (Clarke et al., 2019). Understanding genotype-phenotype correlations may be useful for clinical management and treatment decisions.
Newborn screening for MPS I has now been implemented, allowing for early identification of patients and timely treatment (Clarke et al., 2017). For screening to be effective, it is necessary to know the incidence of the disease in the population. The incidence of MPS I varies between populations because of differences in ethnicity and/or founder effects. In addition, local ethnic groups still retain their unique gene pools.
Knowledge of the prevalence of MPS I and the identification of genetic characteristics of each ethnic group are the prerequisites for the development of optimal methods of diagnosis, treatment, and prediction of disease risk for specific regions and ethnic groups.
Of the 256 patients from different regions of Russia and the former Soviet Union diagnosed with MPS I in the last 35 years, DNA analysis was performed in 206 patients from 201 families.
The aim of the study was to perform a comprehensive DNA analysis of the IDUA gene and to study genotype-phenotype correlations and the characteristics of pathogenic variants among patients with MPS I from different ethnic groups.

MATERIALS AND METHODS
Patients
A total of 256 patients (134 male and 122 female) were diagnosed with MPS I from 1985 through 2020. For 206 patients from 201 families, DNA samples were available, and the analysis of the IDUA gene was performed.

DNA Analysis
DNA was extracted with the DIAtom™ DNA Prep100 kit (Isogene Lab. Ltd., Russia) following the manufacturer's protocol.
The 14 exons and exon-intron boundaries of the IDUA gene were amplified from DNA samples. Primers and PCR reaction conditions have been previously described (Beesley et al., 2001). Sanger sequencing of each one of the 14 exons was performed according to the manufacturer's protocol on an ABI Prism 3500XL (Applied Biosystems). PCR products containing mutations were re-sequenced in both directions. The mutations were further confirmed where possible by restriction analysis (data not shown).
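Re-sequencing in both directions means that each substitution is read once on each strand, where it appears as its complement. The short Python sketch below illustrates the idea with a hypothetical seven-base context; the sequence is invented for illustration and is not the real IDUA sequence, and the helper names are assumptions.

```python
# Complement table for the four DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as it would be read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def substitution_on_reverse_strand(ref, alt):
    """Return how a ref>alt substitution appears in the reverse read."""
    return ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)


# Hypothetical context with a C>T change at the fourth base
forward_ref = "GAGCAGC"
forward_alt = "GAGTAGC"

print(reverse_complement(forward_ref))
print(reverse_complement(forward_alt))
print(substitution_on_reverse_strand("C", "T"))
```

A C>T change seen in the forward read should therefore show up as G>A in the reverse read; a variant visible in only one direction would be a candidate sequencing artifact.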

Ethics Statements
Written informed consent was obtained from patients and their parents or legal guardians. Molecular research was approved by the ethics committee of the Federal State Budgetary Scientific Institution "Research Center for Medical Genetics" (Moscow, Russia). All procedures were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration.

RESULTS
Electrophoresis of urine GAGs and measurement of lysosomal enzyme activity in peripheral blood leukocytes or DBSs were performed for all patients with suspected MPS I. All patients with MPS I had hyperexcretion of urinary heparan sulfate and dermatan sulfate. Residual IdA activity in leukocytes varied from zero to 18.7 nmol/18 h/mg. Residual IdA activity in DBS was always below 0 μmol/h/L blood (control values, 1-7 μmol/h/L blood). No dependence of IdA activity on the severity of the clinical manifestation of the disease was observed (Table 1).

DNA Analysis
DNA sequencing analysis revealed 73 different mutations in various combinations. Of these, 14 were nonsense mutations, 31 were missense mutations, 15 were small deletions, five were small insertions, three were small indels, and five were splice-site mutations. Forty-one mutations were well known or previously described. Thirty-two mutant alleles had not been described before, and data on these nucleotide substitutions are not available in the HGMD or ClinVar databases (Table 2).
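The variant names used throughout this article follow the HGVS coding-DNA convention (transcript, then `c.`, then the change). As a rough illustration of how such names map onto the DNA-level categories listed above, here is a simplified Python sketch. It is not a full HGVS parser, the function name `classify` is hypothetical, and it cannot separate nonsense from missense substitutions, which requires the predicted protein consequence.

```python
import re

# Simplified pattern: RefSeq transcript, then a coding-DNA change
HGVS_RE = re.compile(r"^(?P<transcript>NM_\d+\.\d+):c\.(?P<change>.+)$")


def classify(variant):
    """Return (transcript, change, kind, intronic) for a c. variant name."""
    match = HGVS_RE.match(variant.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognized c. variant: {variant!r}")
    change = match.group("change")
    if "delins" in change:
        kind = "delins"
    elif "dup" in change:
        kind = "duplication"
    elif "del" in change:
        kind = "deletion"
    elif "ins" in change:
        kind = "insertion"
    elif ">" in change:
        kind = "substitution"
    else:
        kind = "unknown"
    # An offset such as 1403-1 or 972+2 places the change in an intron
    intronic = re.match(r"^\d+[+-]\d+", change) is not None
    return match.group("transcript"), change, kind, intronic


for name in (
    "NM_000203.5:c.208C>T",
    "NM_000203.5:c.1403-1G>T",
    "NM_000203.5:c.1451_1480del",
    "NM_000203.5:c.510delinsAAGTTCCA",
    "NM_000203.5:c.878_889dup",
):
    print(classify(name))
```

Intronic substitutions within one or two bases of an exon boundary, such as c.1403-1G>T, are the ones counted here as splice-site mutations.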
A total of 409 mutant alleles were identified. The common mutation NM_000203.5:c.208C>T was prevalent in the patient cohort and represented 55.0% of the total number of patient alleles. The NM_000203.5:c.1205G>A variant, which is widespread throughout the world, was detected in only 12 patients (22 of 409 alleles) and accounted for 5.38% of mutant alleles. A similar pattern was observed for the previously described mutation NM_000203.5:c.1139A>G (21 of 409 alleles; 5.1%). The recurrent mutations (detected twice or more) were as follows: NM_000203.    c.1882C>T (2/409; 0.49%). Fifty mutant alleles were unique, that is, occurring in only one individual (Table 1 and Table 2). The novel mutations included two nonsense mutations, 11 missense mutations, 10 small deletions, four small insertions, three small delins, and two splice-site substitutions. Five small deletions, two insertions, and all delins were frameshift mutations. For the newly found mutations, frequencies were estimated and in silico analysis was performed with bioinformatics tools (MutationTaster, PolyPhen-2, SIFT, PROVEAN). Mutations were also classified according to the ACMG criteria. All novel mutations were considered to be pathogenic or likely pathogenic (Table 3).
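The allele percentages above are simple proportions of the 409 identified mutant alleles. The Python sketch below recomputes three of them from the counts given in the text and, as an addition that is not part of the original analysis, attaches a Wilson 95% confidence interval to show how much uncertainty small counts carry. The function name `allele_frequency` is illustrative.

```python
import math


def allele_frequency(count, total, z=1.96):
    """Return (proportion, lower, upper) with a Wilson score interval."""
    p = count / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, center - half, center + half


TOTAL_ALLELES = 409  # identified mutant alleles in the cohort

# Allele counts taken from the text
counts = {"c.1205G>A": 22, "c.1139A>G": 21, "c.1882C>T": 2}

for variant, n in counts.items():
    p, low, high = allele_frequency(n, TOTAL_ALLELES)
    print(f"{variant}: {100 * p:.2f}% (95% CI {100 * low:.2f}-{100 * high:.2f}%)")
```

The interval is only a rough guide here, because alleles from related patients and homozygotes are not independent observations.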

Genotype-Phenotype Correlation
Two IDUA variants were identified in 98.5% of patients (203 of 206). In three patients (1.5%), only one mutant allele was found. Ninety-three different genotypes were detected, with 74 genotypes being unique (35.9% of all patients). One hundred and fifty-seven patients were classified as having a severe phenotype and 49 as having an attenuated phenotype (Table 1).

Patients With a Severe Phenotype (MPS IH)
There were 59 individual genotypes represented in the 157 patients with a severe phenotype; 14 genotypes were recurrent and 45 genotypes were unique. The most common genotypes in the patients were NM_000203

Turkic Origin Patients
Uzbeks, Kyrgyz, and Altaians are indigenous peoples of Turkic origin living in Central Asia. On the assumption of a single common ancestor, we assigned these patients to a group of Turkic origin. In this group, the NM_000203.5:c.187C>T mutation prevailed. It was found in the homozygous state in six patients (five Uzbeks and one Altaian) and in the heterozygous state in one Uzbek and one Kyrgyz. The frequency of NM_000203.5:c.187C>T was 77.7%. This mutation was first described by the authors in their previous study and has not been reported by anyone else (Voskoboeva et al., 1998). All patients had the Hurler phenotype (Table 1).

Kazakh Population
In three unrelated Kazakh patients, the prevalent mutation was NM_000203.5:c.1403-1G>T (four of the six alleles). The remaining alleles were NM_000203.5:c.1205G>A and a novel small deletion, NM_000203.5:c.1451_1480del. All patients had a severe form of the disease (Table 1).

Ukrainian Patients
In two Ukrainian patients with MPS IH, the NM_000203.5:c.208C>T allele was found in the homozygous state and in combination with the splice-site substitution NM_000203.5:c.972+2T>C (Table 1).

DNA Analysis and Epidemiology
The Soviet Union was a state in Eurasia that existed from 1922 to 1991. In addition to the Russian Republic, there were 14 other republics, each with its own national composition. Representatives of more than 200 different nationalities (ethnic groups) live in today's Russia. About 80% of the population of Russia are Russians. There were no representatives of ethnic groups from Russian regions among the examined Russian patients, with the exception of one Altaian (Altai Republic) and one Avar (Dagestan Republic). Thus, the group of Russian patients was represented by Russians of Slavic origin. Patients' families lived in different regions of the country. Unfortunately, there was no information on possible resettlement of the families.
To simplify the analysis, we grouped patients' places of residence according to the federal districts of the RF (Table 1).
The Tatars are the second largest nation in the RF after the Russians. Mutation NM_000203.5:c.208C>T was found to be predominant among Russian and Tatar patients. Two siblings and 62 unrelated Russian patients and three unrelated Tatar patients were homozygous for NM_000203.5:c.208C>T. Eighty-four unrelated Russian and four Tatar patients were heterozygous for NM_000203.5:c.208C>T.
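The homozygote and heterozygote counts above translate directly into allele counts, since each homozygote carries two copies and each heterozygote one. The Python sketch below does this arithmetic for the Russian and Tatar groups. The resulting shares are derived here for illustration and are not figures reported by the study; the denominators assume two alleles per patient (173 Russian and 9 Tatar patients), so they include any alleles that remained unidentified, and the two siblings are counted as separate patients.

```python
def allele_count(homozygotes, heterozygotes):
    """Number of copies of an allele carried by a group of patients."""
    return 2 * homozygotes + heterozygotes


def allele_share(homozygotes, heterozygotes, n_patients):
    """Share of all patient chromosomes that carry the allele."""
    return allele_count(homozygotes, heterozygotes) / (2 * n_patients)


# Counts for NM_000203.5:c.208C>T taken from the text
russian_hom, russian_het, russian_n = 2 + 62, 84, 173
tatar_hom, tatar_het, tatar_n = 3, 4, 9

print("Russian alleles:", allele_count(russian_hom, russian_het))
print("Tatar alleles:", allele_count(tatar_hom, tatar_het))
print(f"Russian share: {allele_share(russian_hom, russian_het, russian_n):.1%}")
print(f"Tatar share: {allele_share(tatar_hom, tatar_het, tatar_n):.1%}")
```

With only nine Tatar patients, the Tatar share rests on 18 chromosomes and should be read with corresponding caution.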
NM_000203.5:c.208C>T is one of the most common pathogenic variants in the IDUA gene, accounting for 19%-62% of pathogenic alleles among North and East European or Scandinavian patients with MPS I. The frequency of NM_000203.5:c.208C>T decreases from north to south across Europe (Khan et al., 2017; Poletto et al., 2018). This distribution is explained by the possible Viking origin of the allele (Poletto et al., 2018). Scandinavian colonial expansion is thought to have begun in the eighth century, moving mainly along the coasts of the Baltic and North Seas. The Vikings also migrated eastward across the territories of present-day Russia. At the same time, the eastern Slavs inhabited a large part of the East European Plain, reaching Lake Ilmen in the north. According to the current hypothesis, the historical settlements of the Scandinavians may have looked as shown in Figure 1A.
The hypothesis of the origin of NM_000203.5:c.208C>T is consistent with the observed pattern of allele accumulation in the Central, Northwestern, and Volga territories of modern Russia, with decreasing frequency in Siberia and the Far East (Figure 1B). It is possible that the high accumulation of NM_000203.5:c.208C>T homozygotes is explained by the founder effect and that the historical migration of the population to Siberia and the East has diluted the prevalence of homozygotes. Similar data were obtained in our first study (Voskoboeva et al., 1998). Tatar patients were few, so the frequency of NM_000203.5:c.208C>T in this group may be overestimated. However, the accumulation of NM_000203.5:c.208C>T in Tatar patients could also be attributed to descent from a common ancestor.
On the other hand, Vazna et al. showed that mutation NM_000203.5:c.208C>T might have arisen more than once (Vazna et al., 2009). Thus, it could be assumed that NM_000203.5:c.208C>T has a different origin in the population of Russians and, especially, Tatars.
In contrast to NM_000203.5:c.208C>T, the common allele NM_000203.5:c.1205G>A, which is found with a high frequency among various populations in Europe, North America, and Australia, was identified in only 11 Russian patients and only once in the homozygous state (Clarke and Scott, 1993; Poletto et al., 2018). A very similar pattern was observed for the NM_000203.5:c.1139A>G allele. The frequencies of these mutations did not exceed 5%. The variant NM_000203.5:c.1139A>G has been described in several patients of European origin and was predominantly (10%) encountered in patients with MPS I from the Czech Republic and Slovakia (Scott et al., 1995; Venturi et al., 2002; Matte et al., 2003; Vazna et al., 2009). The low frequency of these mutations is probably due to the insignificant resettlement of the European population from the west, which led to allele dilution in the Russian population. Allele NM_000203.5:c.1115A>G was the fourth most common in the Russian population (2.6%). The mutation NM_000203.5:c.1115A>G has been detected in Ukrainian patients and a patient from India (Trofimova, 2016; Uttarilli et al., 2016). There have been no reports of this mutation in other populations.
Turkic peoples are diverse ethnic groups defined by Turkic languages. According to a recent study, Kyrgyz, Kazakhs, Uzbeks, and Turkmens share more of a gene pool with various East Asian and Siberian populations than with West Asian or European populations (Yunusbayev et al., 2015). Another study suggests that Mongolian expansion has left a strong mark on the gene pool of Turkic peoples (Zerjal et al., 2002). The presence of a common ancient ancestor for certain Turkic-speaking groups cannot be excluded. Variant NM_000203.5:c.187C>T might have arisen from a common ancestor and be a founder mutation for patients of Turkic origin.
A specific mutation pattern was found in the patients of the Armenian and Kazakh populations. Although only a few patients were diagnosed, some features can be noted: 1. the absence of the common alleles NM_000203.5:c.208C>T and NM_000203.5:c.1205G>A in patients of these population groups; 2. recurrence of the NM_000203.5:c.1A>C mutation among Armenians; 3. the prevalence of NM_000203.5:c.510delinsAAGTTCCA among Armenians and of NM_000203.5:c.1403-1G>T among Kazakhs. These findings are in agreement with the data on the specificity of the genetic background of MPS I in each population (Lee et al., 2004; Wang et al., 2012; Atçeken et al., 2016; Poletto et al., 2018). Mutation NM_000203.5:c.1A>C has been reported in Turkish, Chinese, and Spanish populations (Bertola et al., 2011; Wang et al., 2012; Shafaat et al., 2019) and was most common in Iranian patients (Atçeken et al., 2016). The nucleotide variant NM_000203.5:c.1403-1G>T has been described only in Chinese patients with MPS I (Pollard et al., 2013).
A recurrent mutation, especially in the homozygous state, can be caused by consanguinity. In unrelated families, a recurrent mutation can be a "hot spot" or founder mutation. The pattern of distribution of mutant alleles worldwide suggests that the accumulation of IDUA mutations is probably due to the founder effect. Although this is most likely true for NM_000203.5:c.208C>T in Russians and possibly in Tatars, the question remains open for mutations found in other populations. We can assume, on the basis of the different places of residence, that the patients were not related. However, this information was not obtained from all parents. Therefore, there is a possibility that the frequencies of homozygotes are associated with consanguineous marriages.

Genotype-Phenotype Correlation
Because the material was collected over a long period of time, it was difficult in many cases to obtain detailed information about patients' phenotypes. Therefore, the analysis of genotype-phenotype correlation was performed in a simplified manner, as has been done by Clarke et al. (Clarke et al., 2019). Two groups of patients were formed: patients with a severe phenotype (MPS IH) and patients with an attenuated phenotype (MPS IH/S), with the exception of a few patients who were definitively classified as MPS IS (Table 1). In general, our data are in agreement with the data presented by others (Venturi et al., 2002; Vazna et al., 2009; Bertola et al., 2011; Prommajan et al., 2011; Clarke et al., 2019). All patients homozygous for two "null" alleles had the Hurler phenotype. Most patients with an attenuated phenotype had at least one allele represented by a missense mutation. Phenotype divergence was observed in patients with NM_000203.5:c.
Patients heterozygous for NM_000203.5:c.878_889dup in combination with NM_000203.5:c.208C>T and NM_000203.5:c.1598C>T (#152 and #105) had an attenuated form of the disease. Moreover, patient #105 had an extremely mild form of MPS I. She is now 42 years old and has given birth to two children. Professionally, she has a degree in geography. The patient was first described in our study 23 years ago (Voskoboeva et al., 1998) (Table 1).
Both groups of researchers who described the NM_000203.5:c.1115A>G mutation reported it in patients with a severe phenotype (Trofimova, 2016; Uttarilli et al., 2016). Trofimova et al. suggested that the NM_000203.5:c.1115A>G substitution leads to a change in the splice site, but no functional study has been reported. Three of our patients heterozygous for NM_000203.5:c.1115A>G had the attenuated form of the disease (#95 to #97). Two other heterozygous patients (#98 and #100) and two homozygous siblings (#99 and #99a) were classified as MPS IS (Table 1).
We were able to identify the genetic features of MPS I among the patients of such a multipopulation country as the former Soviet Union. Knowledge of the genetic background of MPS I in each population is very important for providing patients with the right care. Determination of prevalent mutations will allow the creation of cost-effective test systems and the avoidance of unnecessary testing for a multitude of rare variants. It may also help in developing national screening programs or designing new genotype-specific treatments.
To highlight some of the findings, our data show the following: 1. the standard approach to DNA analysis of the IDUA gene identified 98.5% of the genotypes; 2. an accumulation of the NM_000203.5:c.208C>T mutation among Russian patients was detected, which is probably attributable to the founder effect. The frequency of NM_000203.5:c.208C>T is very close to that in Scandinavian countries, which is consistent with the existing hypothesis of a Viking origin of NM_000203.5:c.208C>T; 3. the common NM_000203.5:c.208C>T and NM_000203.5:c.1205G>A alleles were rare or absent among patients from other ethnic groups (except Tatars and Ukrainians). Instead, alleles unique to each group prevailed among these patients. These results are in agreement with those of other researchers; 4. the analysis of genotype-phenotype correlations did not reveal any principal discrepancies with the conclusions of other researchers. A significant discrepancy occurred only for NM_000203.5:c.1115A>G.
This study also has a number of limitations: 1. 76.2% of the patients in the cohort had a severe phenotype and thus clearly marked clinical manifestations. It cannot be excluded that patients with an attenuated form of the disease remain underdiagnosed; 2. at least one study reported a possible non-single origin of NM_000203.5:c.208C>T, which calls into question the founder mutation effect associated with Viking ancestry; 3. in many cases, data on clinical phenotypes were poor and often determined by the subjective opinion of the physician, making it difficult to perform genotype-phenotype correlation analysis; 4. the frequencies of unique alleles in the populations examined may be overestimated because few patients were diagnosed; 5. analysis of novel mutations was performed only in silico; 6. not all patients' parents' DNA was available for testing.
A more careful analysis of the patient history, possibly based on defined clinical criteria, is needed to allow the physician to distinguish between MPS IH, MPS IH/S, and MPS IS. A functional analysis of the detected mutations in the IDUA gene, especially missense variants, is required to evaluate their actual effect on enzyme function. Parental DNA testing is necessary to confirm inheritance of the disease. When a recurrent mutation is observed in unrelated patients, a detailed analysis of polymorphic IDUA gene variants and haplotypes is needed to distinguish a "hot spot" from a founder mutation.
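The haplotype comparison proposed above can be pictured with a small Python sketch. It assumes that each mutation-carrying chromosome has been phased and typed at a few polymorphic sites flanking the mutation; the marker alleles, the 0.8 threshold, and the function names are all hypothetical and chosen only for illustration. Real analyses would also need to account for recombination and for the population frequency of the background haplotype.

```python
from collections import Counter


def shared_haplotype_fraction(haplotypes):
    """Return the most common haplotype and the fraction of carriers with it."""
    counts = Counter(haplotypes)
    top, n = counts.most_common(1)[0]
    return top, n / len(haplotypes)


def interpret(haplotypes, threshold=0.8):
    """Crude reading of the data; the threshold is an arbitrary assumption."""
    _, fraction = shared_haplotype_fraction(haplotypes)
    if fraction >= threshold:
        return "consistent with a founder mutation"
    return "consistent with a recurrent (hot spot) mutation"


# Hypothetical alleles at three flanking markers on carrier chromosomes
same_background = [("A", "G", "T")] * 9 + [("A", "G", "C")]
mixed_background = [("A", "G", "T"), ("C", "G", "T"), ("A", "A", "C"), ("C", "A", "T")]

print(interpret(same_background))
print(interpret(mixed_background))
```

A mutation sitting on one shared background in unrelated patients points toward common descent, whereas the same mutation on several unrelated backgrounds points toward independent recurrence.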

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee at the Federal State Budgetary Scientific Institution "Research Center for Medical Genetics", Moscow, Russia, ethics committee chairman Kurilo L.F. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin. Written informed consent was obtained from the individual(s)' and minor(s)' legal guardian/next of kin, for the publication of any potentially identifiable images or data included in this article.
Frontiers in Molecular Biosciences | www.frontiersin.org January 2022 | Volume 8 | Article 783644

AUTHOR CONTRIBUTIONS
All of the authors have read and approved the final version of the manuscript. EV performed the DNA analysis to search for disease-associated variants in the IDUA gene, interpreted the data, performed the final analysis of the results, designed the article, and wrote the first draft of the manuscript. TB determined the activity of the lysosomal enzyme IDUA in leukocytes and analyzed data. AS, NV, and SM examined patients, recruited the patient cohort in the RF and the former Soviet Union, and searched for and selected the literature. GB determined the activity of the lysosomal enzyme IDUA in DBSs and analyzed data. SK and EZ analyzed all laboratory data obtained and organized the discussion.

FUNDING
This work was supported by the Ministry of Science and Higher Education of the Russian Federation (the Federal Scientific-technical program for genetic technologies development for -2027, agreement No. 075-15-2021-1061).