Exome Sequencing Analysis Identifies Rare Variants in ATM and RPL8 That Are Associated With Shorter Telomere Length

Telomeres are important for maintaining genomic stability. Telomere length has been associated with aging, disease, and mortality and is highly heritable (∼82%). In this study, we aimed to identify rare genetic variants associated with telomere length using whole-exome sequence data. We studied 1,303 participants of the Erasmus Rucphen Family (ERF) study, 1,259 of the Rotterdam Study (RS), and 674 of the British Heart Foundation Family Heart Study (BHF-FHS). We conducted two analyses, first we analyzed the family-based ERF study and used the RS and BHF-FHS for replication. Second, we combined the summary data of the three studies in a meta-analysis. Telomere length was measured by quantitative polymerase chain reaction in blood. We identified nine rare variants significantly associated with telomere length (p-value < 1.42 × 10–7, minor allele frequency of 0.2–0.5%) in the ERF study. Eight of these variants (in C11orf65, ACAT1, NPAT, ATM, KDELC2, and EXPH5) were located on chromosome 11q22.3 that contains ATM, a gene involved in telomere maintenance. Although we were unable to replicate the variants in the RS and BHF-FHS (p-value ≥ 0.21), segregation analysis showed that all variants segregate with shorter telomere length in a family. In the meta-analysis of all studies, a nominally significant association with LTL was observed with a rare variant in RPL8 (p-value = 1.48 × 10−6), which has previously been associated with age. Additionally, a novel rare variant in the known RTEL1 locus showed suggestive evidence for association (p-value = 1.18 × 10–4) with LTL. To conclude, we identified novel rare variants associated with telomere length. Larger samples size are needed to confirm these findings and to identify additional variants.


INTRODUCTION
Telomeres are DNA structures located at the ends of chromosomes and consist of tandem hexanucleotide sequence repeats (TTAGGG) (Blackburn and Gall, 1978). They are important for maintaining genomic stability by preventing DNA degradation and chromosomal fusions (Blackburn, 2000). Telomeres are shortened with each cell division due to the inability of DNA polymerase to fully extend the 3 end of the DNA strand during replication. When the telomeres reach a critical length, this leads to cellular senescence and ultimately to cell death, making them regulators of the replicative capacity of a cell and markers of biological age (Lindsey et al., 1991;Allsopp et al., 1992).
Up until now, no systemic whole exome or genome screen for rare variants has been published, despite the fact that these may explain part of the heritability (Manolio et al., 2009). Rare variants are not well captured by microarrays used for GWAS and remain difficult to impute, despite the recent improvements in imputation panels (McCarthy et al., 2016). Next generation sequencing technologies, such as whole-exome sequencing (WES), are better suited to study rare variants. In this study, we present a dual analysis. First, we conducted a genomewide WES analysis of LTL in 1,303 individuals of the Dutch Erasmus Rucphen Family (ERF) study to search for rare genetic variants associated with LTL. The advantage of a family-based study is that the segregation of rare variants can be studied. We performed a replication analysis in the Rotterdam Study (RS) and the British Heart Foundation Family Heart Study (BHF-FHS). Next, we pooled the data together and conducted a meta-analysis of the association results of all three cohorts.

RESULTS
Descriptive statistics of the family-based and population-based studies are provided in Table 1. Mean age at LTL measurement was 49 years (SD = 15.0) in the ERF study and 61% of the study participants were female. The RS participants were older (x age = 75 years, SD = 7.7) and 57% of the participants were female, while mean age in the BHF-FHS was 58 years (SD = 8.2) and most study participants were male (26% female). Mean LTL values, measured in each participant using a quantitative polymerase chain reaction (qPCR) based technique as the ratio of telomere repeat length to copy number of the single copy gene 36B4, were higher in the ERF study (x LTL = 1.85, SD = 0.35) than in the RS (x LTL = 0.94, SD = 0.18) and BHF-FHS (x LTL = 1.37, SD = 0.22). After adjusting LTL values for age and sex, mean LTL was comparable between studies ( Table 1).
The Manhattan plot and the distribution of the test statistic [quantile-quantile (QQ) plot, λ = 1.04] of the WES analysis in the ERF study are presented in Figures 1, 2, respectively. We observed significant association of nine rare variants [Minor Allele Frequency (MAF) between 0.2% and 0.5%] with LTL as shown in Table 2. The significance threshold (pvalue < 1.42 × 10 −7 ) was adjusted for multiple testing using Bonferroni correction based on the number of variants in the analysis (0.05/353,075 variants). Each variant was negatively associated with LTL and the estimated effects of the minor allele of these variants were large (−2.18 < standardized β < −1.34), suggesting a significant decrease in LTL for each minor allele.
The top eight variants are located in a dense region on chromosome 11q22.3 (position range: 108004687 -108384666, Figure 3) and appear to be a part of a haplotype that spans the C11orf65, ACAT1, NPAT, ATM, KDELC2, and EXPH5 genes. A haplotype can describe a pair of genes inherited together from one parent on one chromosome, or it can describe all of the genes on a chromosome that were inherited together from a single parent. This haplotype segregates with shorter LTL in a family (Supplementary Figure 1), where it is carried by 14 individuals, 11 of whom were related within 4 generations according to the  pedigree data (Figure 4). Further, the genetic kinship estimates show that the other three individuals are also related within 3-4 generations. These 8 variants are in strong linkage disequilibrium (pairwise LD: r 2 between 0.93 and 1.00, D = 1) and show very similar p-values. The top variant rs185270276 is located in an intron of the C11orf65 gene (MAF = 0.5%, β = −1.34, SE = 0.25, p-value = 7.99 × 10 −8 ). The next six variants significantly associated with LTL (MAF = 0.5%, β = −1.38, SE = 0.26, p-value = 1.21 × 10 −7 ) are located in the ACAT1, NPAT, ATM/C11orf65, KDELC2, and EXPH5 genes. Two of these six FIGURE 2 | Quantile-quantile plot of the association analysis with LTL in the ERF study. The QQ-plot shows the observed test statistics (y-axis) plotted against the expected values of the test statistics (x-axis) (X 2 -distribution). The red line shows the distribution under the null hypothesis.
For replication analysis, we used WES data from two independent cohorts of European ancestry, the RS and the BHF-FHS. Results of the replication analysis are shown in Table 3, together with the results of the meta-analysis of summary statistics from all three cohorts for these variants. Six out of nine rare variants significantly associated with LTL in the ERF study (located in ACAT1, NPAT, ATM/C11orf65, EXPH5, and BTN3A1) were present in the RS and/or the BHF-FHS but were not significantly (p-value ≥ 0.21) associated with LTL (p-value < 0.025, 0.05/2 independent tests). The direction of effect of most variants was similar in the ERF study and the RS, while the BHF-FHS showed opposite direction of effect for most variants. Three variants, located in C11orf65, KDELC2, and ATM, were not present in the RS and BHF-FHS data. As these are unique to an isolated population, we were unable to confirm or reject their association with LTL in the replication cohorts.
Finally, to increase the statistical power, we performed an inverse-variance weighted meta-analysis of the association results from all three cohorts using METAL software. Variants were included in the meta-analysis if they were present in at least two out of three cohorts and had a minimum minor allele count of five in one or more cohorts, resulting in a multiple testing corrected significance threshold of 3.02 × 10 −7 (0.05/165,311 variants). The top results of the meta-analysis (p-value < 3.02 × 10 −4 ) are  available in Table 4 and in Supplementary Table 1, which also contains cohort specific information. The Manhattan and QQplots are provided in Supplementary Figures 3, 4, respectively. The λ of 0.97 suggests the power has been low. Although there were no variants genome-wide significantly associated with LTL in the meta-analysis after adjusting for multiple testing, many of the top findings show a consistent effect across cohorts. The variant most significantly associated with LTL was a highly conserved synonymous variant (PhastCons score = 0.999) located in the RPL8 gene on chromosome 8 (p-value = 1.48 × 10 −6 ), which is predicted to be deleterious (CADD score = 15.11). Additionally, we used the meta-analysis results to perform a look-up of variants in loci identified by previous European ancestry GWASs (Supplementary Table 2). There were 153 variants present in these loci and we found suggestive evidence (p-value meta < 3.02 × 10 −4 ) for a positive association of a rare variant with LTL (rs181080831, β = 0.74, SE = 0.19, p-value = 1.18 × 10 −4 ) in the known RTEL1 locus.
FIGURE 4 | Segregation plot of the rare variants on chromosome 11q22.3 in the ERF study. The carriers of the rare variants located on chromosome 11q22.3 are depicted in red. The T/S ratio is added below the individuals of whom whole-exome sequencing data was available. Squares are males and rounds are females. Deceased individuals are denoted by a line through that specific individual. One person is twice in the pedigree, this is shown with the dotted line.

DISCUSSION
In the family-based ERF study, we identified nine rare variants (MAF between 0.2% and 0.5%) associated with LTL by performing a WES association analysis. The eight most significantly associated variants are located in a region on chromosome 11q22.3 and segregate together with shorter LTL in one family. This region contains the ATM locus that has previously been shown to be involved in telomere maintenance and genomic stability and is thus an obvious candidate gene. In the meta-analysis of discovery and replication cohorts, we identified another rare missense variant in the RPL8 gene strongly associated with LTL. Although we were not able to replicate either of the associations, both ATM and RPL8 have been previously found to be strong predictors of telomere length (ATM) and chronological age (ATM and RPL8).
Interestingly, we identified three unique rare variants in the ATM (Ataxia Telangiectasia Mutated) gene associated with LTL in the ERF study. ATM is the homolog of the Tel1 gene in yeast (Greenwell et al., 1995) and has been implicated in important telomere maintenance processes (Pandita, 2002;Lee et al., 2015;Tong et al., 2015). The ATM protein kinase is a master controller of cell cycle checkpoint signaling pathways required for cell response to DNA damage and for genomic stability (https: //www.ncbi.nlm.nih.gov/gene/?term=472). Additionally, ATM kinase is necessary for telomere elongation Tong et al., 2015). ATM is involved in the genetic disorder ataxia telangiectasia (AT), which is characterized by cerebellar ataxia, oculocutaneous telangiectasia, immunodeficiency, and a predisposition to cancer (Savitsky et al., 1995). Cells of AT patients also show telomeric fusions and have accelerated telomere shortening with increasing age (Metcalfe et al., 1996). Furthermore, ATM was significantly associated with  chronological age in a meta-analysis of gene expression profiles, showing lower transcript abundance in older individuals (Peters et al., 2015). Although genetic variants in ATM have been associated with various cancers (Barrett et al., 2011;Helgason et al., 2015;Ransohoff et al., 2017;Scelo et al., 2017), only one genetic variant in ATM, rs227080, was genome-wide significantly associated with LTL in a Singaporean Chinese population (Dorajoo et al., 2019). This variant was not significantly associated with LTL in the ERF study, the Rotterdam Study or the BHF-FHS, implicating it was not driving the association observed in the current study. Segregation analysis of the variants within the chromosome 11q22.3 region in the ERF study showed that the variants segregate with shorter LTL in one family. We did not detect a specific disease that segregates in this family, which may be explained by the relatively young age of most of the carriers. In the replication analysis, we were not able to confirm the association of the nine variants associated with LTL in the RS and BHF-FHS, due to lack of association (six variants) or absence of the variant in the replication cohorts (three variants). However, it is possible this signal comes from variants that are family-specific and thus may not be transferable to the general population. Likewise, it may also be that the whole haplotype has an effect on telomere length, an haplotype that is most likely unique to the isolated ERF population. At this locus, the ATM gene is currently the most likely causative gene. Lastly, we performed a meta-analysis of the three cohorts. The top variant of the meta-analysis is located in RPL8 (Ribosomal Protein L8). This is interesting as RPL8, together with six other RPL genes, was negatively associated with chronological age (Peters et al., 2015). In this transcriptomic study, Peters et al. identified 1,497 genes whose expression level changes associated with age; this gene list includes RPL8 and ATM. We additionally found suggestive evidence for an association of a novel rare variant in the known RTEL1 locus. It would be interesting to perform a WES or whole-genome sequencing meta-analysis with larger sample size to increase the statistical power.
The advantage of our study is that we used a family-based setting that allowed us to show segregation of the variants located on chromosome 11q22.3 in a family. The ERF study population has shown a low immigration rate and a high level of inbreeding, which has increased the frequency of many rare alleles (Pardo et al., 2005). Another advantage is that all three studies used the standardized qPCR method to quantify LTL. However, there are also several limitations to our study. The first limitation is that our findings in the ERF study are not easy to generalize to the general population as these findings may be family-specific. However, we identified genes that are known to be related to telomere processes and thus are plausible candidate genes. The second limitation of our study may be that we used blood measurements of telomere length instead of tissue specific measurements, although previous studies have shown that mean telomere length in blood and other tissues are highly correlated (Okuda et al., 2002;Wilson et al., 2008) and, therefore, we expect this did not influence our findings. The third limitation is the difference in LTL distribution between the three studies, which may be explained by the different age distributions in the studies.   Mean age is lowest in the ERF study (49 years), highest in the RS (75 years) and mean age in the BHF-FHS is in between (58 years). As LTL decreases significantly with age (Lindsey et al., 1991;Slagboom et al., 1994;Blackburn, 2001) and is associated with mortality (Cawthon et al., 2003;Kimura et al., 2008;Fitzpatrick et al., 2011;Honig et al., 2012;Deelen et al., 2014;Rode et al., 2015;Marioni et al., 2016), the variation in LTL becomes less with the aging populations. Mean LTL values were comparable between studies after adjusting LTL values for age and sex. To further standardize across the three cohorts, a z-transformation of the LTL values was performed resulting in comparable distributions with mean of zero and standard deviation of one. Nevertheless, these differences between studies, together with the small sample size of the BHF-FHS cohort, could potentially explain the lack of replication. The fourth limitation is that we were unable to calculate the effect of the variants on telomere length in base pairs because of the quantification method of LTL in our study together with the z-transformation that was applied to the LTL values to standardize across the cohorts.
To conclude, this first study using WES data to search for rare genetic variants associated with LTL has identified interesting variants and genes associated with shorter LTL. Eight out of nine rare variants associated with LTL are located on chromosome 11q22.3 and all variants segregate within an ERF family. As we were not able to replicate findings, future studies should further investigate this region and the other genes identified in this study to confirm their involvement in telomere length regulation.

Study Populations
Our discovery population consisted of participants from the family-based Erasmus Rucphen Family (ERF) study. The ERF study comprises approximately 3,000 inhabitants of a recent genetically isolated community in the Southwest of the Netherlands, studied as part of the Genetic Research in Isolated Population program (Pardo et al., 2005). All ERF participants are descendants of 22 founder couples who had at least six children baptized in the community church in the 18th century, or their spouses. Baseline data collection, including blood withdrawal, took place between 2002 and 2005. As the ERF population shows a low rate of immigration and a high level of inbreeding, the frequency of several rare alleles is increased in this population (Pardo et al., 2005). The ERF study was approved by the Medical Ethics Commitee of the Erasmus Medical Center (MC), Rotterdam, Netherlands. All participants provided written informed consents and all investigations were carried out in accordance with the Declaration of Helsinki.
The replication cohorts included participants from the Rotterdam Study (RS) and the British Heart Foundation Family Heart Study (BHF-FHS). The RS is a prospective cohort study ongoing since 1990 in the well-defined Ommoord district in Rotterdam, The Netherlands. The original RS cohort (RS-I) included 7,983 individuals of 55 years of age or over. At baseline, participants were interviewed at home and had an extensive set of examinations, which were repeated every 3-4 years (Ikram et al., 2017). The Rotterdam Study has been approved by the Medical Ethics Committee of the Erasmus MC and by the Dutch Ministry of Health, Welfare and Sport. The Rotterdam Study has been entered into the Netherlands National Trial Register (NTR; www.trialregister.nl) and into the WHO International Clinical Trials Registry Platform (ICTRP; www.who.int/ictrp/ network/primary/en/) under shared catalogue number NTR6831. All participants provided written informed consent to participate in the study and to have their information obtained from treating physicians.
The British Heart Foundation Family Heart study recruited families with at least two siblings diagnosed with premature (<66 years) coronary artery disease (CAD) within the United Kingdom between 1998 and 2003. Full details are provided elsewhere (Samani et al., 2005(Samani et al., , 2007.

Telomere Length
Mean LTL values in the ERF study and the BHF-FHS study were measured using a qPCR method in all samples (Cawthon, 2002). The measurements were performed in Leicester, United Kingdom, and details of the measurements were previously described (Codd et al., 2010. In summary, mean LTL was measured in leukocytes and expressed as the ratio (T/S ratio) of telomere repeat length (T) to the copy number of a single-copy gene, 36B4 (S). Samples were quantified relative to a calibrator sample used on each run (DNA from the K562 cell line) (Codd et al., 2010).
In the RS, mean LTL values were also measured using a qPCR assay based on the method described by Cawthon (2002) with minor modifications. For each sample the telomere and 36B4 assay were run in separate wells but in the same 384 wells PCR plate. Each reaction contained 5 ng DNA, 1 uM of each of the telomere primers (tel1b-forward: GGTT TGTTTGGGTTTGGGTTTGGGTTTGGGTTTGGGTT, tel2breverse: GGCTTGCCTTACCCTTACCCTTACCCTTACCCTTA CCCT) or 250 nM of the 36B4 primers (36B4u-forward: CAG CAAGTGGGAAGGTGTAATCC, 36B4d-reverse: CCCATTCT ATCATCAACGGGTACAA) and 1x Quantifast SYBR green PCR Mastermix (Qiagen). The reactions for both assays were performed in duplicate for each sample in a 7900HT machine (Applied Biosystems). Ct values and PCR efficiencies were calculated per plate using the MINER algorithm (Zhao and Fernald, 2005). Duplicate Ct values that had a Coefficient of Variance (CV) of more than 1% were excluded from further analysis. Using the average Ct value per sample and the average PCR efficiency per plate the samples were quantified using the formula Q = 1/(1 + PCR eff)ˆCt. The relative telomere length was calculated by dividing the Q of the telomere assay by the Q of the 36B4 assay. To validate the assay 96 random samples were run twice and the CV of that experiment was 4.5%.

Exome Sequencing
The exomes of 1,336 ERF participants were sequenced at the Erasmus Center for Biomics of the department of Cell Biology, Erasmus MC, The Netherlands. The exomes of a randomly selected subset of 2,628 individuals from the RS-I population were sequenced at the Human Genotyping facility of the Department of Internal Medicine, Erasmus MC, The Netherlands. Details of the methods and quality control for ERF and RS are described elsewhere (Amin et al., 2016;van Rooij et al., 2017). In total, 1,303 ERF participants and 1,257 RS participants had both exome sequence and LTL data available and were included in this analysis. A subset of the BHF-FHS, comprising of 674 unrelated individuals of Caucasian ancestry who had previously undergone exome sequencing as part of the Leicester Myocardial Infarction Study (Khera et al., 2017) and had LTL data available (Codd et al., 2010) were included in this analysis.

Statistical Analyses
For each individual cohort quantitative trait association analysis was performed using Rare Variant tests (RVtests) software, which supports the analysis of related individuals (Zhan et al., 2016). Association analysis was performed using a score test, assuming an additive model, suitable for analysis with related and unrelated individuals. We applied a z-transformation of LTL values for the three cohorts separately to standardize values across cohorts. All analyses were adjusted for age, sex and batch effects (if needed). Furthermore, we adjusted for familial relationships in ERF using the kinship matrix estimated from the genotyped data, while in the RS we corrected for the first four principal components as the fourth principal component was significantly associated with LTL. Only variants with a minor allele count ≥ 5 were included.
In the ERF study, we calculated the pairwise LD (r 2 and D ) between the top eight variants on chromosome 11 that were significantly associated with LTL using the -ld command of PLINK 1.9 software (Chang et al., 2015) (www.coggenomics.org/plink/1.9/). Additionally, we performed an inversevariance weighted meta-analysis using METAL software (Willer et al., 2010). In the analysis using data of the family-based ERF study, we corrected the significance thresholds for multiple testing using Bonferroni correction, resulting in a significance threshold of 1.42 × 10 −7 (0.05/353,075). In the replication analysis, the multiple testing corrected p-value thresholds was 0.025 (0.05/2 independent tests). In the meta-analysis, we adjusted for the number of variants tested, resulting in a significance threshold of 3.02 × 10 −7 (0.05/165,311).

DATA AVAILABILITY STATEMENT
The datasets analyzed for each individual cohort can be requested by contacting the responsible Principal Investigator. Because of restrictions based on privacy regulations and informed consent of the participants, data cannot be made freely available in a public repository. For the Rotterdam Study data, requests should be directed toward the management team of the Rotterdam Study (secretariat.epi@erasmusmc.nl), which has a protocol for approving data requests.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by their respective Medical Ethics Boards and all investigations were carried out in accordance with the Declaration of Helsinki. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
AS, NA, and CD designed the study and wrote the manuscript. AS, SW, CN, and SA performed the analyses. LB, PA, RB, MD, MH, JR, RK, and NA were involved in the data collection and provided the data. WI, NS, MI, AU, and CD were involved in the supervision of individual cohorts. AS, SW, CN, DV, NS, VC, NA, and CD interpreted the results. All authors critically reviewed and approved the manuscript. www.bbmri.nl; project number CP2010-41). The Rotterdam Study was funded by Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. The BHF-FHS study was funded by the British Heart Foundation (BHF). Exome sequencing of the Leicester Myocardial Infarction Study was supported by grant 5U54HG003067 from the National Institutes of Health. CN and NS were supported by the BHF. NS is a NIHR Senior Investigator.