CLCC1 c. 75C>A Mutation in Pakistani Derived Retinitis Pigmentosa Families Likely Originated With a Single Founder Mutation 2,000–5,000 Years Ago

Background: A CLCC1 c.75C > A (p.D25E) mutation has been associated with autosomal recessive retinitis pigmentosa in patients in and from Pakistan. CLCC1 is ubiquitously expressed, and knockout models of this gene in zebrafish and mice are lethal in the embryonic period, suggesting that retinitis pigmentosa mutations in this gene might be limited to those leaving partial activity. In agreement with this hypothesis, the mutation is the only CLCC1 mutation associated with retinitis pigmentosa to date, and all identified patients with this mutation share a common SNP haplotype surrounding the mutation, suggesting a common founder.

Methods: SNPs were genotyped by a combination of WGS and Sanger sequencing. The original founder haplotype and the recombination pathways were delineated by inspection, minimizing the number of recombination events required. Mutation age was estimated by four methods: an explicit solution, an iterative approach, a Bayesian approach, and an approach based solely on ancestral segment lengths using high density SNP data.

Results: All members of each of the nine families studied shared a single autozygous SNP haplotype for the CLCC1 region ranging from approximately 1–3.5 Mb in size. The haplotypes shared by the families could be derived from a single putative ancestral haplotype with at most two recombination events. Based on the haplotype and Gamma analysis, the estimated age of the founding mutation varied from 79 to 196 generations, or approximately 2,000–5,000 years, depending on the markers used in the estimate. The DMLE (Bayesian) estimates ranged from 2,160 generations assuming a population growth rate of 0% to 309 generations assuming a population growth rate of 2%, with broad 95% confidence intervals.

Conclusion: These results provide insight into the origin of the CLCC1 mutation in the Pakistani population. This mutation is estimated to have occurred 2,000–5,000 years ago and has been transmitted to affected families of Pakistani origin in geographically dispersed locations around the world. This is the only mutation in CLCC1 identified to date, suggesting that the CLCC1 gene is under a high degree of constraint, probably imposed by functional requirements for this gene during embryonic development.


INTRODUCTION
Retinitis pigmentosa (RP [MIM 268000]) is a clinically and genetically heterogeneous disorder affecting approximately one in 4,000 individuals worldwide (Hartong et al., 2006). Clinically, patients initially exhibit night blindness followed by progressive loss of peripheral visual fields, eventually culminating in compromise or even complete loss of central vision. Typical fundus changes include bone spicule-like pigmentation in the mid-peripheral retina, waxy pallor of the optic discs, and attenuation of retinal blood vessels. Since RP initially affects the rod photoreceptors, followed by the degeneration of cone photoreceptors, patients often have severely diminished or extinguished rod response in electroretinography (ERG) even in early stages of the disease, while the cone response is relatively preserved initially but decreases and becomes undetectable as the disease progresses (Bird, 1995). Genetic inheritance patterns of RP include autosomal-dominant (about 30-40% of cases), autosomal-recessive (50-60%), and X-linked (5-15%) inheritance (Bunker et al., 1984; Rivolta et al., 2002). More than 82 causative genes have been identified for RP so far, of which 58 genes have been identified in families with autosomal recessive RP (arRP) (Daiger et al., 2021).
At least in part reflecting social and economic considerations, the frequency of consanguineous marriages in Pakistan is among the highest in the world (Bittles, 2001), ranging from 15 to 35% (Hamamy et al., 2011). In reviewing 146 genetically resolved arRP Pakistani families, Khan et al. found only 4 (2.7%) with compound heterozygous mutations (Khan et al., 2014), emphasizing the role of consanguinity in the incidence of arRP in this population. Not only does the high frequency of consanguinity in the Pakistani population bring out autosomal recessive alleles, but it also increases the likelihood that a variant shared by different families was derived from a common ancestor, especially if the families that share the variant also share a common intragenic SNP haplotype for the associated gene.
Chloride channel CLIC like 1 (CLCC1) is a transmembrane channel protein with high permeability for anions, in particular chloride, localized to the endoplasmic reticulum (ER) and, in some cell types, possibly the Golgi apparatus and nucleus (Nagasawa et al., 2001). The CLCC1 gene spans 33 kb, comprising 13 exons encoding a 551 amino acid protein. Li et al. (Li et al., 2018) demonstrated that Clcc1 is highly expressed in the mouse retina, and modestly expressed in the iris, optic nerve, sclera, and cornea. Immunohistochemistry in the normal adult human eye demonstrated CLCC1 expression extensively in the retina and optic nerve, suggesting a physiologic role of CLCC1 in retinal function. Within the retina, CLCC1 staining was more intense in the lamina cribrosa, optic nerve, ganglion cell layer, inner and outer nuclear layers, and retinal pigment epithelium (RPE). The NM_145543.2:c.75C > A (p.D25E) missense mutation in CLCC1 was found in seven Pakistani families and one British-Bangladeshi family with arRP mapping to chromosome 1p13 (RP32; MIM 609913). Recent additional screening has found one new family (61334) carrying the same mutation, bringing the total number of families to nine and accounting for about 6% of genetic cases of arRP in Pakistani families.
The present study was undertaken to investigate the possible common ancestry of the nine Pakistani and Pakistani-derived families carrying the c.75C > A mutation, to define the likely recombination and mutational events that would be required to occur if they did have a common founder, to estimate the approximate age of the putative founder mutation and to correlate the history and geographic distribution of this mutation with the population history of Pakistan. To achieve these goals, we performed haplotype analysis of 99 intragenic SNPs flanking the c.75C > A CLCC1 mutation, derived the recombinational pathways requiring the fewest recombination events to yield the currently observed haplotypes, and estimated the number of generations that have occurred since the original mutation in the founder.

Patients and DNA Samples
This study was approved by the Institutional Review Boards (IRB) of the National Centre of Excellence in Molecular Biology, Lahore, Pakistan, and the CNS IRB at the National Institutes of Health, and consent was obtained in accordance with the Declaration of Helsinki. Patients were diagnosed with RP on the basis of clinical features as previously described (Li et al., 2018). Blood samples were collected from potentially informative family members, and genomic DNA was extracted from leukocytes according to standard protocols (Smith et al., 1989).
Nine retinitis pigmentosa families carrying the c.75C > A mutation, all of Pakistani (Punjab) origin, were identified. Eight families (FAM1, FAM2, FAM3, 61030, 61031, 61224, 61244 and 61328) have been reported previously (Li et al., 2018). Family 61334 is newly identified.

TABLE 1 | In Table 1 in the Supplementary Material, the original risk haplotype is shown in blue, and recombined haplotypes held in common by pairs of families showing identical recombination points are highlighted in various shades of green. The haplotype blocks composed of SNPs showing no recombination in affected members of the nine families are shown in alternating orange and yellow in the marker column (GRCh38.p13). The genomic position of the CLCC1 mutation is shown in bold type.

Position (GRCh38)  Intermarker distance  Marker      Change  FAM3  FAM1  FAM2  61224  61244  61030  61334  61328  61031
107,600,949                              rs3828085           T/T   T/T   T/T   T/T    T/T    T/T    T/T    C/C    T/T
107,642,995        3,770                 rs4550085
107,899,020        10,700                rs17020294  T > C   T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
107,905,580        6,560                 rs10494083          T/T   T/T   T/T   T/T    T/T    T/T    -/-    T/T    T/T
107,995,780        7,591                 rs12563016          T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
108,198,443        14,676                rs594397            T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
108,566,144        34,873                rs1353721
108,722,941        5,206                 rs4970808   C > T   T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
108,760,685        37,744                rs4970811           T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
108,963,067        12,691                rs550743            T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
109,206,449        8,523                 rs3197233           T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
109,369,429        31,330                rs17646665          T/T   T/T   T/T   T/T    T/T    T/T
109,473,110        92,055                rs10494040  C > T   T/T   T/T   T/T   T/T    T/T    T/T    T/T    T/T    T/T
109,505,814        32,704                rs3738772           T/T   T/T   T/T   T/T    T/T    T/T
109,519,003        5,499                 rs534135

Haplotype Analysis and Age Estimation
SNP haplotypes in individuals carrying the c.75C > A mutation were delineated by inspection, as most individuals were homozygous for the region immediately surrounding the mutation. The haplotype was extended until only two families maintained the original founder haplotype. The ancestral haplotype was determined by identifying the common haplotype shared by all patients and extending it with the adjacent SNP genotypes shared by the largest number of samples. In three cases, two alternate haplotypes diverging at the same SNP were shared by two families, and those pairs were considered to share a common ancestor at the divergence point.
Haplotypes were sorted by inspection into those requiring one or two recombination events in their descent from the ancestral haplotype, minimizing the number of recombination events required. The mutation age in generations was estimated using four approaches. The first estimate used Equation (1) of Risch et al. (Risch et al., 1995; Colombo, 2000): $g = \log(\delta)/\log(1-\theta)$, where $\theta$ is the recombination fraction between the marker and the mutation and $\delta$ is the linkage disequilibrium constant, $\delta = (P_D - P_N)/(1 - P_N)$, with $P_D$ being the frequency of the allele on chromosomes of mutation carriers and $P_N$ being the frequency of the allele in the control population. Genotypes of unaffected individuals were taken from the 1000 Genomes database (96 samples in the South Asian population) (1000 Genomes Project Consortium et al., 2015). Haplotypes were estimated using the EM (expectation maximization) and CHM (composite haplotype) methods as implemented in Golden Helix SVS (Golden Helix, Bozeman, MT, United States). The map distances were inferred from the physical distances given in GRCh38/hg38 from the UCSC Genome Browser, assuming that 1 Mb corresponds to 1 cM. Alleles at SNPs showing no recombination in the cases were collapsed into a single haplotype block and analyzed as a single marker in the two-marker approach described in Equation (2) of Risch et al.

In the second approach, marker-specific values of $g$ were estimated using the iterative method described in Equation (1) of Goldstein et al. (Goldstein et al., 1999): $K = cR + \mu M + (1 - c - \mu)I$, where $c$ and $\mu$ are the recombination and mutation rates, respectively; $R$ is a 2 × 2 matrix with $R_{11} = R_{12} = a$ and $R_{21} = R_{22} = 1 - a$, where $a$ is the frequency of the ancestral allele in the control population; $M$ is a 2 × 2 matrix of mutation probabilities with $M_{11} = 0$, $M_{12} = 1/3$, $M_{21} = 1$, and $M_{22} = 2/3$, modified from Goldstein et al. to account for the frequency with which a mutation in a SNP might remove an ancestral allele (all possible bases) or move to an ancestral allele (one of three possible bases); and $I$ is the 2 × 2 identity matrix. The original frequency vector is (1, 0), as the mutation occurs on the founder haplotype, and this association is reduced by multiplying by $K$ at each generation ($g$) until the current frequency of the ancestral allele is reached. Once more, alleles at SNPs showing no recombination in the cases were collapsed into a single haplotype block and analyzed as a single marker.
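The two estimators described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the input values (carrier frequency 0.5, control frequency 0.1, recombination fraction 0.01) are hypothetical, and the explicit solution is written in the standard form $\delta = (1-\theta)^g$. The 25 years per generation used for the conversion to years is the value assumed elsewhere in this article.

```python
import math


def risch_age(p_d: float, p_n: float, theta: float) -> float:
    """Explicit solution: g = log(delta) / log(1 - theta)."""
    delta = (p_d - p_n) / (1.0 - p_n)  # linkage disequilibrium constant
    return math.log(delta) / math.log(1.0 - theta)


def goldstein_age(p_d: float, a: float, c: float, mu: float = 0.0,
                  max_generations: int = 100_000) -> int:
    """Iterate v <- K v from v = (1, 0) until the ancestral-allele
    frequency on mutation-bearing chromosomes has decayed to p_d."""
    # K = c*R + mu*M + (1 - c - mu)*I, written out entry by entry.
    k11 = c * a + (1.0 - c - mu)                          # M11 = 0
    k12 = c * a + mu / 3.0                                # M12 = 1/3
    k21 = c * (1.0 - a) + mu                              # M21 = 1
    k22 = c * (1.0 - a) + mu * 2.0 / 3.0 + (1.0 - c - mu)  # M22 = 2/3
    anc, non = 1.0, 0.0  # founder haplotype carries the ancestral allele
    for g in range(1, max_generations + 1):
        anc, non = k11 * anc + k12 * non, k21 * anc + k22 * non
        if anc <= p_d:
            return g
    raise ValueError("observed frequency not reached; check the inputs")


def generations_to_years(g: float, years_per_generation: float = 25.0) -> float:
    """Convert an age in generations to years."""
    return g * years_per_generation


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    g_explicit = risch_age(p_d=0.5, p_n=0.1, theta=0.01)
    g_iterative = goldstein_age(p_d=0.5, a=0.1, c=0.01)
    print(f"explicit:  {g_explicit:.1f} generations")
    print(f"iterative: {g_iterative} generations")
    print(f"years:     {generations_to_years(g_explicit):.0f}")
```

With the mutation rate set to zero, the iteration reduces to the explicit solution rounded up to a whole generation, which is consistent with the close agreement between the two approaches for SNP markers.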
In the third approach, the marginal posterior probability distribution of the age (Slatkin and Rannala, 2000) of the c.75C > A mutation was estimated using the DMLE+2.3 software developed by Reeve and Rannala (Reeve and Rannala, 2002). This program estimates the age in generations by comparing the observed haplotypes in chromosomes from affected and unaffected sample sets, considering the map distances, the population growth rate (gen r), and the proportion of the mutation-bearing chromosomes sampled, but it depends strongly on assumptions regarding the population history of Pakistan. Somewhat arbitrarily, population growth was modeled using growth rates of 0.5, 1.0, 1.5 and 2.0%, as the population history of Pakistan and especially the Indus valley, from which most of our families are drawn, is complicated by a decline around 1800 BC, the influx of new populations from around 2000-1500 BC, and invasion by a number of foreign powers, including the army of Alexander the Great, the Arab and Muslim conquests and the British Indian Empire, each of which affected the population number and composition.
The final approach used the Gamma method described by Gandolfo et al. (Gandolfo et al., 2014), as implemented on the WEHI Bioinformatics website: https://shiny.wehi.edu.au/rafehi.h/mutation-dating/. It is based on the length of conserved ancestral segments surrounding the mutation and requires few additional assumptions, being specifically applicable to dense SNP haplotypes and small sample sizes.
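The idea behind a segment-length estimate can be illustrated with a simplified sketch. It assumes that the genetic length of each conserved arm (from the mutation to the first breakpoint on one side) is exponentially distributed with a rate equal to the mutation age in generations, and that the arms are independent, so their sum follows a Gamma distribution. It is not the code behind the WEHI website, it omits the correction for correlated genealogies, and the function name and arm lengths are hypothetical.

```python
def gamma_age_independent(arm_lengths_cm: list[float]) -> float:
    """Age in generations from conserved arm lengths (in cM).

    If k independent arm lengths (in Morgans) are exponential with rate
    tau, their sum S is Gamma(k, tau) and (k - 1) / S is an unbiased
    estimate of tau.
    """
    lengths_morgans = [length / 100.0 for length in arm_lengths_cm]
    k = len(lengths_morgans)
    if k < 2:
        raise ValueError("at least two arm lengths are required")
    return (k - 1) / sum(lengths_morgans)


if __name__ == "__main__":
    # Hypothetical arm lengths (cM): two arms for each of two chromosomes.
    arms = [1.0, 2.0, 0.5, 1.5]
    print(f"{gamma_age_independent(arms):.0f} generations")
```

Because the estimate depends only on the summed arm length, shorter conserved segments imply an older mutation, which matches the intuition that a short shared haplotype points to an old founder.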

Families With the c.75C > A Mutation
The present study includes nine RP families carrying the c.75C > A mutation. The majority of families, including FAM1, FAM2, 61224, 61244, 61030, 61328 and 61031 from Pakistan as well as a British-Bangladeshi family (FAM3), have been described previously (Li et al., 2018). One new family from Pakistan with the CLCC1 mutation (61334) has been identified more recently and met the same criteria for diagnosis of arRP. The SNP haplotypes of the affected families are shown in Table 1 and the family structures are shown in Figure 1.
Whole genome sequencing and SNP mapping of these families initially identified a 1.5 Mb region of homozygosity on chromosome 1p13.3 common to affected family members of all nine families (flanked by markers rs3828085 and rs1030926216), suggesting a single ancestral founder mutation as the cause of the condition. Further SNP mapping of all individuals in all nine families narrowed the autozygous region common to all affected family members to a 158 kb interval flanked by rs10857972 and rs570812 (chr1:108,814,814-108,973,113, hg38). All affected members of all the families shared the identical SNP haplotype (Table 1), which has an estimated frequency of 0.01 in the Pakistani population (Lahore) by the EM algorithm as incorporated into the Golden Helix SVS program (Bozeman, MT), strongly suggesting that arRP in all nine families arose through autozygosity for the same ancestral mutation ($p = 2 \times 10^{-11}$). This includes the British-Bangladeshi family (FAM3), which is also likely to be of Pakistani (Punjab) origin.

Haplotype Analysis
The SNP haplotypes from the Pakistani arRP patients harboring the c.75C > A mutation included in this study were extended until the common haplotype was lost (that is, only two families showed the identical conserved haplotype) and aligned as shown in Table 1. The haplotype shaded in blue in homozygotes was the most common in affected members of our families and was assumed to be the ancestral haplotype. Divergent regions of the patient haplotypes are unshaded, except for three pairs of families (FAM2 and FAM3; 61030 and 61334; 61031 and 61328) in which the haplotypes diverge at the same SNP and the divergent haplotypes are identical within each pair beyond the divergent SNP. These families are assumed to represent offspring of a single divergent ancestor and are colored various shades of green. Two arRP-associated mutations are segregating in family 61334 (Figure 1). The CLCC1 c.75C > A mutation is homozygous in individuals 1, 4, 6 and 10, and the CDH23 c.1595C > T mutation is homozygous in individuals 1 and 5, so that individual 1 has arRP on the basis of mutations in both genes.

FIGURE 1 | Pedigrees of the nine consanguineous families with the CLCC1 mutation, with genotypes of available individuals. The squares and the circles represent males and females, respectively. The black-filled symbols indicate patients with retinitis pigmentosa, and a symbol with a diagonal line indicates a deceased family member. The candidate variants are listed under each pedigree, and the genotypes of the individuals for the variants are marked below each symbol. The c.75C > A mutation is shown in the figure as NC_000001.11:g.108950376G > T for consistency with the genomic description of the SNP in databases and in Figure 2. "T/T" indicates homozygous variant, "G/T" or "C/T" indicate heterozygous variants, and "G/G" or "C/C" indicate homozygous reference alleles for CLCC1 and CDH23, respectively. CDH23 alleles are shown for Family 61334 because individual 5 is affected on the basis of inheriting two variant alleles for CDH23 and is only heterozygous for the CLCC1 variant allele, while individual 1 in that family is homozygous for both variants. Linkage results and SNP haplotypes are shown in Li et al. (Li et al., 2018).

Frontiers in Genetics | www.frontiersin.org March 2022 | Volume 13 | Article 804924
The arRP-associated haplotypes from the nine affected individuals are shown in Figure 2, ordered into three levels based on the number of recombination events required to derive each haplotype from the founder haplotype, which is shown at the top (level 0). Haplotype blocks are designated by the first SNP from the c.75C > A mutation in the block. A single recombination event can generate the haplotypes shown in level 1, four different recombination events (one in families 61030 and 61334, a second in families 61031 and 61328 and a third in FAM1) above and two (one in FAM2 and FAM3 and one in 61244) below the mutation. From five of these haplotypes the remainder, seen on level 2, can be generated by a second recombination on the opposite side of the mutation from the first. Thus, all haplotypes bearing the NC_000001.11:g. 108950376G > T mutation can be generated from the founder haplotype by a maximum of two recombination events, with family 61224 maintaining the founder haplotype and 61244 showing only a single recombination.

Age Estimation of the c. 75C > A CLCC1 Mutation
The age of the c.75C > A mutation was estimated in the Pakistani population using the approaches described by Risch et al. (Risch et al., 1995; Colombo, 2000) and Goldstein et al. (Goldstein et al., 1999), as well as Bayesian disequilibrium estimates of the marginal posterior density as implemented in the DMLE+2.3 program (Reeve and Rannala, 2002). A summary of these results, with the age given in generations, is shown in Tables 2, 3 and Figures 3, 4. The multiple-marker approach described by Risch et al., with nonrecombinant markers combined into haplotype blocks, provides estimates ranging from 79 generations (1,975 years) to 196 generations (4,900 years). The method described by Goldstein et al. gave similar ranges for the shared haplotypes: 79 to 196 generations (1,975-4,900 years). The Risch and Goldstein approaches thus gave similar estimates, although there is some variability depending on how the haplotype frequencies were estimated, and substantial variability depending on which haplotype block was used to estimate the age.

FIGURE 2 | Haplotypes of the families and their derivation from the original risk haplotype, as shown by the compressed haplotype blocks labeled by the first SNP from the c.75C > A mutation (shown in the figure as NC_000001.11:g.108950376G > T for consistency with other SNPs). The original risk haplotype is shown in dark blue, and recombined haplotype blocks are shown in light blue. The mutation is shown in red, and haplotypes in families 61328 and 61244 with the identical initial allele but part of a haplotype differing from those of families 61224 (the founder haplotype) and 61030/61334, respectively, are shown in gold.
Analysis of the mutation age (Tables 2, 3; Figure 4) using DMLE+2.3 showed that variations in the probability of an independent identical mutation do not have a major influence on the age estimate, whereas variations in the assumed growth rate of the Pakistani population are crucial. When the assumed population growth rate was 0%, the mean age estimate was 2,160 generations (95% CI: 211-5,924). When the population growth rate was increased to 2%, the mean estimated mutation age fell to 309 generations (95% CI: 116-326). The median estimated mutation age was somewhat lower than the mean, reflecting the effect of the skewed distributions, which is greater at lower population growth rates. Estimation of the mutation age was also carried out using the Gamma method proposed by Gandolfo et al. (Gandolfo et al., 2014), based on ancestral segment lengths estimated from high density SNP data (Table 2). While the estimates (104 generations for independent ancestry since the founder and 99 generations if some sampled individuals have common ancestors since the founder) were somewhat lower than those obtained with DMLE, they still have overlapping confidence limits and are similar to some of the lower estimates from the Risch and Goldstein approaches.

TABLE 2 | Mutation age estimates in generations of the c.75C > A mutation in the South Asian population by the methods described by Goldstein et al. and Risch et al., by Bayesian estimation, and by Gamma estimation. Fn(EM): frequency of the observed haplotype in the normal population as estimated by the expectation-maximization method; Fn(CHM): frequency of the observed haplotype in the normal population as estimated by the composite haplotype method; Theta: estimated recombination frequency of the haplotype block to the mutation; Pr(EM): probability of the haplotype in the affected family haplotypes as estimated by the EM method; Pr(CHM): probability of the haplotype in the affected family haplotypes as estimated by the CHM method; Gamma Ind: estimation from segment lengths assuming an 'independent' genealogy; Gamma Cor: estimation assuming a correlated genealogy. The c.75C > A mutation is alternately referred to as NC_000001.11:g.108950376G > T for consistency with other SNPs.

DISCUSSION
In the current study, we have investigated nine unrelated Pakistani-derived arRP patients (and families) carrying an identical c.75C > A CLCC1 mutation. A common shared haplotype of 99 SNP markers covering a region of approximately 3.5 Mb surrounding the c.75C > A CLCC1 mutation on chromosome 1p13.3 clearly indicates that these families originate from a common ancestor, with $p < 2 \times 10^{-11}$ for the original eight families reported (Li et al., 2018) and $p < 6 \times 10^{-13}$ with the additional family reported here.
Depending on the assumptions in the model related to assured phases in families, mutation frequencies at the different SNPs, recombination frequencies between SNP markers, and the likelihood of an independent identical CLCC1 mutation occurring in these families, we estimate that the mutation may have originated 79 to 196 generations ago, or approximately 1,975-4,900 years ago if an average generation is assumed to be 25 years (Risch et al., 1995). A Bayesian approach yielded somewhat older estimates (309-2,160 generations), although the wide 95% confidence limits resulted in overlap of the ranges of the predictions for most analyses.
CLCC1 is a transmembrane protein functioning as an anion, and particularly chloride, channel in the ER and possibly the Golgi apparatus and nucleus in some cell types (Nagasawa et al., 2001; Jia et al., 2015; Li et al., 2018). Notably, downregulation of this gene sensitizes cultured cells to chemically induced ER stress. Furthermore, loss of CLCC1 function in vivo results in upregulated expression of UPR target genes and the accumulation of ubiquitin-positive inclusions in neurons before their degeneration, suggesting that disruption of chloride/anion concentrations in the ER leads to loss of ER homeostasis and eventual neuron death (Jia et al., 2015). CLCC1 appears to interact with the mitochondrial microprotein PIGBOS to prevent induction of the UPR and cell death (Chu et al., 2019). Similarly, knockdown of CLCC1 by siRNA results in apoptotic cell death in cultured ARPE19 cells (Li et al., 2018). Li et al. (Li et al., 2018) previously identified the homozygous missense alteration (c.75C > A, p.D25E) in CLCC1 associated with autosomal recessive retinitis pigmentosa (arRP) in eight consanguineous families of Pakistani descent that were suggested to originate from a common founder on the basis of sharing a common SNP haplotype in the CLCC1 gene region.
The c.75C > A CLCC1 mutation is a moderately common cause of arRP in the Pakistani population. So far it has been identified in nine different families from or originating in the Punjab region of Pakistan, making it the cause of arRP in roughly 6% of families. While the relatively short length of the haplotype shared by the affected families suggested that the original mutation might be old, incomplete haplotypes and lack of recombinant SNP genotypes at the ends of the haplotype in some families prevented a formal estimation of the mutation age. Completion and extension of the SNP haplotypes in all family members allowed estimation of the population age and history of the c.75C > A mutation. The mutation age was estimated using four approaches, all based on the recombination frequencies between the mutation and markers in affected individuals, marker mutation rates, allele frequencies in the general population and affected family members, and the genetic distance of markers from the mutation. These included an analytical approach as described by Risch et al. (Risch et al., 1995), an iterative approach as described by Goldstein et al. (Goldstein et al., 1999), a Gamma method based on SNP haplotypes (Gandolfo et al., 2014) and a posterior Bayesian distribution approach as implemented in the program DMLE+2.3 (Reeve and Rannala, 2002). The estimates made using the first two approaches agreed closely, usually to within a generation for each marker. There was some variation generated by the estimated haplotype frequencies (using the CHM or EM method), although it was relatively small for most markers, usually within 10-15% of the estimated mutation age. The mutation rates, and especially the back-mutation rates, are much smaller for SNP markers than for microsatellites and have only a minimal effect, especially as there is no large variation influenced by the number of repeat units, as occurs with microsatellites (Chakraborty et al., 1997; Eckert and Hile, 2009).
As is apparent from Tables 2, 3, the major source of variation lies in uncertainty regarding the recombination frequencies between each haplotype block and the c.75C > A mutation. This probably relates to the conversion value of 1 Mb ≈ 1 cM, which is only approximate, and can vary significantly in different regions of the chromosomes as well as between males and females (Broman et al., 1998).
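A short sketch shows why the assumed conversion between physical and genetic distance matters so much. It uses the standard relation $\delta = (1-\theta)^g$ and treats 1 cM as a recombination fraction of 0.01, which is reasonable only for short distances. The function names, the disequilibrium value of 0.5, the 1 Mb marker distance, and the alternative conversion rates are all hypothetical and chosen only for illustration.

```python
import math


def recombination_fraction(distance_bp: float, cm_per_mb: float = 1.0) -> float:
    """Map a physical distance to a recombination fraction.

    Treats 1 cM as theta = 0.01, an approximation for short distances.
    """
    return distance_bp / 1e6 * cm_per_mb / 100.0


def age_for_rate(delta: float, distance_bp: float, cm_per_mb: float) -> float:
    """Age in generations from delta = (1 - theta)^g for a given map rate."""
    theta = recombination_fraction(distance_bp, cm_per_mb)
    return math.log(delta) / math.log(1.0 - theta)


if __name__ == "__main__":
    # Hypothetical marker 1 Mb from the mutation with delta = 0.5.
    for rate in (0.5, 1.0, 2.0):
        age = age_for_rate(delta=0.5, distance_bp=1_000_000, cm_per_mb=rate)
        print(f"{rate:.1f} cM/Mb -> {age:.0f} generations")
```

For small recombination fractions the estimated age is close to inversely proportional to the assumed rate, so a regional rate that is half the genome average roughly doubles the estimate.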
Estimation of the mutation age using a posterior Bayesian distribution, while using all the information in the haplotypes, was strongly dependent on the population growth estimates. These are difficult to ascertain, especially since the population growth of the Punjab region has not been constant over the estimated age of the mutation. Additionally, populations derived from a region including southwestern Iran flourished in the Punjab region until approximately 1800 BC, at which time the population constricted, coinciding with the influx of new population groups from the central Asian steppes (McElreavey and Quintana-Murci, 2005; Narasimhan et al., 2019). In addition, the region experienced subsequent invasions, including those of Alexander the Great, the Umayyad and later Arab and Muslim conquests. Each of these would have affected the Indus valley population growth and composition in ways that are difficult to predict. For this reason, DMLE modeling was carried out under a series of assumed population growth rates ranging from 0 to 2%. The estimated mutation age was maximal at 2,160 generations under the 0% growth model, decreasing to 309 generations when the assumed growth rate was set at 2%, a reasonable estimate of the population growth of Pakistan as a whole since 1900, although growth in some recent years has been higher, especially between 1965 and 2005 (World Population Prospect, 2019). Finally, the Gamma method is particularly appealing for our families since it is intended for use with small sample numbers and, being based solely on the length of the conserved ancestral segment as estimated by a dense SNP haplotype, requires few assumptions regarding population structure and growth rate. In our families, as indicated in Figure 2, the analysis based on a correlated ancestry is probably the correct one to use, but the independent analysis is included for comparison, and the two estimates are quite close.
There is the possibility that the high rate of consanguinity in the Pakistan population might affect the mutation age estimates by all approaches. While recombination events would be undetectable in individuals homozygous for the risk allele (and thus affected) decreasing the apparent recombination rate, these would be fairly infrequent as affected individuals comprise only one fourth of the offspring of two heterozygous individuals, and as can be seen from the pedigree structure most matings in our families are not between two homozygotes, even though the pedigrees are selected for having a high number of affected individuals.
In conclusion, our results provide evidence that the c.75C > A mutation in the CLCC1 gene is significantly associated with arRP in the Pakistani population. Moreover, haplotype analysis using high-density SNP data, as well as variant age estimates, strongly supports a founder origin and linear pattern of descent for this mutation rather than multiple independent occurrences. The identification of founder mutations, such as the one reported here, may contribute to the development of more cost-efficient screening.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.