The Ethnic-Specific Spectrum of Germline Nucleotide Variants in DNA Damage Response and Repair Genes in Hereditary Breast and Ovarian Cancer Patients of Tatar Descent

The Russian population consists of more than 100 ethnic groups, presenting a unique opportunity for the identification of hereditary pathogenic mutations. To gain insight into the landscape of hereditary pathogenic variants, we employed targeted next-generation sequencing to analyze the germline mutation load in the DNA damage response and repair genes of hereditary breast and ovarian cancer syndrome (HBOCS) patients of Tatar ethnicity (Tatars represent ~4% of the total Russian population). Several pathogenic mutations were identified in DNA double-strand break repair genes, and the spectrum of these markers in Tatar patients differed from that previously reported for patients of Slavic ancestry. The CDK12 gene encodes cyclin-dependent kinase 12, a key transcriptional regulator of genes involved in DNA damage response and repair. CDK12 analysis in a cohort of HBOCS patients of Tatar descent identified a c.1047-2A>G nucleotide variant in the CDK12 gene in 8 of the 106 cases (7.5%). The c.1047-2A>G variant was also identified in 1 of the 93 (1.1%) HBOCS patients of mixed or unknown ethnicity and in 1 of the 238 (0.42%) healthy controls of mixed ethnicity (Tatars and non-Tatars) (Tatar and non-Tatar HBOCS patients vs. healthy controls: p = 0.0066, OR = 11.18, 95% CI = 1.53–492.95). Among 93 patients of mixed ethnicity from Tatarstan with sporadic breast and/or ovarian cancer, the variant was detected in 2 (2.2%). In a cohort of participants of Slavic descent from Moscow, comprising 95 HBOCS patients, 80 patients with sporadic breast and/or ovarian cancer, and 372 healthy controls, the variant was absent. Our study demonstrates that the CDK12 c.1047-2A>G variant confers a strong predisposition to HBOCS in patients of Tatar ethnicity and identifies CDK12 as a novel gene involved in HBOCS susceptibility.

Keywords: breast cancer, BRCA1, BRCA2, CDK12, homologous recombination repair, next-generation sequencing, ovarian cancer

INTRODUCTION
Ovarian cancer (OC) and breast cancer (BC) are among the leading causes of cancer mortality in women worldwide (1). Both cancers are highly heterogeneous and have a strong hereditary component: ∼10-15% of OC and 5-7% of BC cases are hereditary (2). The hereditary predisposition to these cancers (hereditary breast and ovarian cancer syndrome, HBOCS) is caused by germline mutations in several genes, primarily those involved in DNA damage recognition and repair. Because early diagnosis reduces disease-associated mortality, genetic testing for HBOCS predisposition, which can identify at-risk individuals before the disease develops, would be a beneficial addition to routine clinical practice.
Currently, genetic risk assessment for HBOCS profiles pathogenic DNA nucleotide variants in a panel of candidate genes. This approach allows stratification of patients into subgroups for tailored therapy and identification of individuals at risk of HBOCS before the disease manifests clinically (3). Importantly, the distribution of pathogenic DNA nucleotide variants may differ significantly across ethnic populations because of the "founder effect" (4), so genetic tests developed for European populations may be clinically uninformative for patients of non-European ancestry. Therefore, genetic testing of patients from diverse ethnic backgrounds should be performed using a panel of markers established specifically for their ethnic group. In Russia, most genetic risk assessment tests for HBOCS include a panel of pathogenic nucleotide variants common among patients of European descent, such as the 5382insC, C61G, 185delAG, 4154delA, and 2080delA variants in the BRCA1 gene. Although these nucleotide variants have been comprehensively characterized in Russian Slavic populations (2,5-7), recent data indicate that many of them are absent in patients of Tatar ethnic origin (8). There is therefore a clear clinical need to identify novel HBOCS-predisposing nucleotide variants specific to the Tatar population.
Genomic instability is a hallmark of cancer (9). Defects in DNA damage recognition and repair are associated with a plethora of malignancies, including prostate cancer, ovarian cancer, leukemia, and breast cancer (10-13). In hereditary cancers, a major cause of genomic instability is the inability of the cell to repair DNA damage properly due to germline mutations in genes encoding DNA repair proteins.
In mammals, the major DNA repair pathways are base-excision repair (BER), nucleotide-excision repair (NER), non-homologous end joining (NHEJ), and homologous recombination repair (HRR) (14). DNA double-strand breaks (DSBs) are repaired by NHEJ and HRR. The NHEJ pathway re-ligates DSB ends after removal of damaged nucleotides (15), whereas the HRR pathway repairs DSBs using undamaged homologous DNA as a template. NHEJ is therefore less accurate, while HRR is characterized by high fidelity and is essential for the maintenance of genomic integrity. For many of the genes involved in the HRR pathway, an association with tumorigenesis has been clearly demonstrated in both sporadic and hereditary cancers.
The role of DSB repair pathway genes in susceptibility to breast and ovarian cancer has been extensively investigated. The panel of genes contributing to HBOCS includes several DSB repair genes, such as BRCA1, BRCA2, and others (16-19). Mutations in the BRCA1 and BRCA2 genes, which inactivate the corresponding proteins and compromise HRR, account for ∼20-25% of HBOCS cases (20,21). The remaining cases comprise patients with functional BRCA1 and BRCA2 proteins (BRCA1/2-negative HBOCS). For many of these cases, none of the currently used diagnostic markers is present, and the predisposing genes remain unknown.
A number of publications indicate that cyclin-dependent kinase 12 (CDK12), also known as KIAA0904, CRK7, CRKR, or CRKRS, is involved in human tumorigenesis (22). Recurrent somatic CDK12 mutations have been identified in OC (23). Moreover, somatic mutations that inactivate CDK12 are associated with genomic instability in OC (24). CDK12 is also an emerging candidate tumor suppressor gene in BC (25).
CDK12 is a serine/threonine protein kinase of the cyclin-dependent kinase family. It is a multifunctional protein involved in many cellular processes, such as alternative last-exon mRNA splicing (21), embryonic stem cell self-renewal (26), the cellular stress response (27), and regulation of global transcription by targeting RNA polymerase II, the polymerase that transcribes mRNA of protein-coding genes (28). Importantly, CDK12 is a key regulator of the expression of DNA damage response genes. Although CDK12 depletion does not significantly affect global transcription, it dramatically diminishes transcription of genes involved in DNA damage response and repair pathways, including BRCA1, an established HBOCS predisposition gene. Furthermore, CDK12-depleted cells are more sensitive to DNA-damaging agents and exhibit a higher rate of spontaneous DNA damage (29). Thus, CDK12 plays a pivotal role in the maintenance of genomic stability (30). However, little is currently known about the role of CDK12 germline mutations in HBOCS pathogenesis. We therefore propose CDK12 as a candidate gene for HBOCS predisposition.
The aim of this study was to identify a panel of DNA nucleotide variant markers for HBOCS genetic screening in patients of Tatar ethnic origin. Using targeted next-generation sequencing, we tested a panel of markers in the ATM, BARD1, BRCA1, BRCA2, CDH1, CDK4, CDK12, CDKN2A, CFTR, CHEK1, CHEK2, CTNNA1, EPCAM, FANCI, FANCJ/BRIP1, FANCL, MLH1, MSH2, MSH6, MUTYH, PALB2, PARP1, PDGFRA, PMS2, PPP2R2A, PRSS1, RAD51B, RAD51C, RAD51D, RAD54L, SPINK1, STK11, TP53, and XRCC3 genes in 199 HBOCS patients (Tatars and non-Tatars from the Volga District, Tatarstan Republic). Pathogenic nucleotide variant markers were identified in the BRCA1, BRCA2, CDH1, CDK12, CHEK2, FANCI, MUTYH, MSH2, and RAD51C genes. The marker distribution profile in Tatars differed from that in the Slavic group, with a relatively low prevalence of the BRCA1 and BRCA2 founder mutations common in Slavic populations. This suggests that HBOCS genetic predisposition tests for Tatar patients should differ from those used for Slavic populations. We found a novel c.1047-2A>G nucleotide variant of the CDK12 gene that was strongly associated with HBOCS and found predominantly in HBOCS patients of Tatar ethnic origin. To the best of our knowledge, this is the first study to show that the CDK12 c.1047-2A>G nucleotide variant is associated with HBOCS predisposition, implicating CDK12 in HBOCS.

MATERIALS AND METHODS
The study cohort comprised female patients with a family history of OC and/or BC (HBOCS), as well as healthy donors without a family history of OC and/or BC, obtained from the Republican Clinical Oncology […]. Participants in the Tatar group self-identified as Tatars. The non-Tatar group included participants of unknown or mixed ancestry from the Volga District of the Tatarstan Republic. Participants in the Slavic group, recruited in Moscow, Russian Federation, self-identified with one or more Slavic ethnicities. All participants provided informed consent.

DNA Isolation
Whole blood samples were collected from all study participants. Genomic DNA was isolated from the blood using the QIAamp DNA Blood Mini QIAcube Kit (Qiagen) and quantified using the NanoVue Plus Spectrophotometer (GE Healthcare).

Targeted Next-Generation Sequencing (NGS)
Targeted NGS was performed in a cohort of 199 HBOCS patients from the Volga District of the Tatarstan Republic. DNA (100 ng) was used to generate sequencing libraries. The NimbleGen SeqCap EZ Choice kit (Roche) was used for target enrichment, and sequencing was performed on an Illumina MiSeq (Illumina) following the manufacturer's protocol. Raw reads were aligned to the human reference genome (hg19) using BWA (MEM algorithm), with BamQC, FastQC, and NGSrich quality control checks. GATK Haplotype v3.

RT-PCR Assay
Real-time PCR (RT-PCR) was used to assess the presence or absence of the CDK12 c.1047-2A>G nucleotide variant in 93 patients with sporadic OC and/or BC and 238 healthy participants of mixed ethnicity from the Republic of Tatarstan, and in 95 HBOCS patients, 80 patients with sporadic OC and/or BC, and 372 healthy participants of Slavic ethnic origin from Moscow. RT-PCR was performed using TaqMan probes (FAM-atttcCtAcTgGaAaa-BHQ-1 for the wild-type allele and VIC-atttcCtAcCgGaAaa-BHQ-2 for the c.1047-2A>G allele) and the following primers: forward 5′-TGGCACTTAATCTATTTTACA-3′ and reverse 5′-GGATCTCTTCTTTTTACTATGA-3′. Reactions were run on a StepOnePlus thermal cycler (Applied Biosystems, USA) in a 10 µL final volume containing TurboBuffer (Evrogen, Russia), 400 nM forward and reverse primers, 150 nM probes, 1.5 units of Taq DNA polymerase, and 20-50 ng of genomic DNA. Thermocycling conditions were an initial step at 95 °C for 2 min, followed by 40 cycles of 94 °C for 10 s and 56 °C for 90 s. The PCR product size was 200 bp. Amplification products were analyzed by end-point detection using the thermocycler's built-in software tools (SDS version 1.4). Positive control DNA was analyzed in parallel with all samples to validate assay sensitivity. In the HBOCS cohort from Tatarstan, the presence of the CDK12 c.1047-2A>G variant was determined by targeted NGS and confirmed by RT-PCR.
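As a quick sanity check on the assay design, the two probe sequences can be compared directly. The minimal Python sketch below (helper names are ours, not part of any published pipeline) confirms that the probes differ at a single base, T in the wild-type probe and C in the mutant probe. Their reverse complements carry A and G at that position, matching the A>G change in c.1047-2A>G; this is consistent with, but does not by itself prove, the probes being complementary to the coding strand. Read on that assumption, the wild-type sequence shows a pyrimidine-rich stretch followed by AG, the canonical splice-acceptor motif, and the variant turns that AG into GG. The mixed case of the published probe sequences is normalised to upper case here because the text does not state what it denotes.

```python
# Probe sequences from the Methods, normalised to upper case (the meaning of
# the mixed case in the original is not stated, so it is ignored here).
WT_PROBE = "atttcCtAcTgGaAaa".upper()
MUT_PROBE = "atttcCtAcCgGaAaa".upper()

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatch_positions(s1: str, s2: str) -> list[int]:
    """0-based positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(s1, s2)) if x != y]


if __name__ == "__main__":
    diffs = mismatch_positions(WT_PROBE, MUT_PROBE)
    i = diffs[0]
    print("probes differ at", diffs, ":", WT_PROBE[i], "->", MUT_PROBE[i])

    wt_rc, mut_rc = reverse_complement(WT_PROBE), reverse_complement(MUT_PROBE)
    j = len(WT_PROBE) - 1 - i  # the same base, indexed on the opposite strand
    print("wild type (reverse complement):", wt_rc, "dinucleotide", wt_rc[j:j + 2])
    print("mutant    (reverse complement):", mut_rc, "dinucleotide", mut_rc[j:j + 2])
```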

Statistical Analysis
Standard statistical tests were used to analyze the data, including a two-tailed Fisher exact test performed in R (v3.3). Statistical significance was defined as p < 0.05. P values, odds ratios, and 95% confidence intervals were obtained from the fisher.test function.
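For readers without R, the sketch below reimplements in plain Python what fisher.test reports for a 2×2 table, assuming the table layout shown in the comments: the two-sided p value (the sum of probabilities of all tables with the same margins that are no more likely than the observed one), the conditional maximum-likelihood odds ratio, and the exact confidence interval obtained by inverting two one-sided tests under Fisher's noncentral hypergeometric distribution. Function names are hypothetical, and bisection is our choice of root finder rather than the one R uses internally, so the last digits may differ slightly. Note that the conditional MLE is not the same as the simple cross-product (sample) odds ratio.

```python
from math import comb, exp, inf, log

# 2x2 layout (our convention): rows are groups, columns are carrier status.
#   [[carriers_in_patients, non_carriers_in_patients],
#    [carriers_in_controls, non_carriers_in_controls]]
TABLE = [[9, 190], [1, 237]]  # 9/199 HBOCS patients vs. 1/238 healthy controls


def _pmf(table, log_psi=0.0):
    """Noncentral hypergeometric pmf of the top-left cell with fixed margins.

    psi is the odds ratio; log_psi = 0 gives the ordinary (null) hypergeometric.
    """
    (a, b), (c, d) = table
    n1, n2, m = a + b, c + d, a + c
    ks = list(range(max(0, m - n2), min(m, n1) + 1))
    logw = [log(comb(n1, k)) + log(comb(n2, m - k)) + k * log_psi for k in ks]
    top = max(logw)  # subtract the maximum before exp() to avoid overflow
    w = [exp(v - top) for v in logw]
    total = sum(w)
    return ks, [x / total for x in w]


def fisher_p_two_sided(table):
    """Sum the probabilities of all tables no more likely than the observed one."""
    ks, probs = _pmf(table)
    p_obs = probs[ks.index(table[0][0])]
    # Small relative tolerance so floating-point ties are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def _solve_increasing(f, target, lo=-30.0, hi=30.0, iters=200):
    """Bisection on log(psi) for f increasing in log(psi); returns psi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return exp((lo + hi) / 2)


def conditional_mle(table):
    """Odds ratio at which E[X; psi] equals the observed top-left count."""
    a = table[0][0]
    ks, _ = _pmf(table)
    if a == ks[0]:
        return 0.0
    if a == ks[-1]:
        return inf

    def mean(log_psi):
        support, probs = _pmf(table, log_psi)
        return sum(k * p for k, p in zip(support, probs))

    return _solve_increasing(mean, a)


def exact_ci(table, level=0.95):
    """Invert the two one-sided exact tests, each at (1 - level) / 2."""
    a = table[0][0]
    ks, _ = _pmf(table)
    alpha = (1 - level) / 2

    def upper_tail(log_psi):  # P(X >= a), increases with psi
        support, probs = _pmf(table, log_psi)
        return sum(p for k, p in zip(support, probs) if k >= a)

    def neg_lower_tail(log_psi):  # -P(X <= a); P(X <= a) decreases with psi
        support, probs = _pmf(table, log_psi)
        return -sum(p for k, p in zip(support, probs) if k <= a)

    lower = 0.0 if a == ks[0] else _solve_increasing(upper_tail, alpha)
    upper = inf if a == ks[-1] else _solve_increasing(neg_lower_tail, -alpha)
    return lower, upper


if __name__ == "__main__":
    (a, b), (c, d) = TABLE
    lower, upper = exact_ci(TABLE)
    print(f"sample odds ratio:          {a * d / (b * c):.2f}")
    print(f"two-sided p value:          {fisher_p_two_sided(TABLE):.4f}")
    print(f"conditional MLE odds ratio: {conditional_mle(TABLE):.2f}")
    print(f"95% CI:                     {lower:.2f}-{upper:.2f}")
```

Run on the 9/199 vs. 1/238 comparison, the output should land close to the values reported in this study; small differences in the final digits would come from the root finder and rounding.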

RESULTS
In a group of 199 HBOCS patients from the Volga District, Republic of Tatarstan (106 of Tatar ancestry and 93 of mixed or unknown ancestry), targeted NGS of a 33-gene panel detected a total of 38 germline nucleotide variant markers in 8 genes. The frequencies of the markers are shown in Table 3.
We also performed targeted NGS of the CDK12 gene and identified the c.1047-2A>G nucleotide variant in 8 of the 106 patients of Tatar descent. The presence of c.1047-2A>G in the CDK12 gene, identified by targeted NGS, was confirmed by RT-PCR (data not shown). In a cohort of Slavic participants from Moscow, RT-PCR showed that this variant was absent in 95 patients with HBOCS, 80 patients with sporadic BC and/or OC, and 372 healthy controls. In the cohort of participants from the Volga District, Republic of Tatarstan, the frequency of the c.1047-2A>G variant was significantly higher in HBOCS patients than in healthy controls (9/199 vs. 1/238, p = 0.0066, OR = 11.18, 95% CI = 1.53-492.95, Table 4). The cohort of HBOCS patients from the Republic of Tatarstan included 106 patients of Tatar ethnicity and 93 patients of non-Tatar, mixed, or unknown ethnicity. Because Tatars are one of the largest ethnic groups in the Republic of Tatarstan, constituting almost 50% of the total population, we assume that about half of the healthy donors randomly recruited to this study in the Tatarstan Republic were also of Tatar ancestry. All HBOCS patients carrying CDK12 variants predicted in silico to be pathogenic were HER2 negative.
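The carrier frequencies quoted in this section and in the Discussion follow directly from the reported counts. The short Python sketch below recomputes them exactly with `fractions.Fraction`; the cohort labels are our own shorthand, and the counts are those given in the text.

```python
from fractions import Fraction

# (carriers, individuals tested) for CDK12 c.1047-2A>G, as reported in the text.
# The short cohort labels are our own shorthand.
COHORTS = {
    "Tatarstan HBOCS, Tatar": (8, 106),
    "Tatarstan HBOCS, non-Tatar": (1, 93),
    "Tatarstan HBOCS, all": (9, 199),
    "Tatarstan sporadic BC/OC": (2, 93),
    "Tatarstan healthy controls": (1, 238),
    "Moscow HBOCS (Slavic)": (0, 95),
    "Moscow sporadic BC/OC (Slavic)": (0, 80),
    "Moscow healthy controls (Slavic)": (0, 372),
}


def carrier_percent(carriers: int, tested: int) -> Fraction:
    """Exact carrier frequency, in percent."""
    return Fraction(100 * carriers, tested)


if __name__ == "__main__":
    for label, (k, n) in COHORTS.items():
        print(f"{label:34s} {k:>2d}/{n:<3d} {float(carrier_percent(k, n)):6.2f}%")
```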
We also found several other nucleotide variants in the CDK12 gene in the group of HBOCS patients (Table 5), with a greater than 90% deleterious prediction of pathogenicity by in silico tools (SIFT, PolyPhen2, MutationTaster, CADD, DANN, REVEL). Among the HBOCS patients harboring CDK12 nucleotide variants classified as pathogenic, 21% also had pathogenic nucleotide variants in the BRCA1 gene.
Forty-three percent of the patients in the HBOCS cohort were HER2 positive, but all patients carrying the CDK12 c.1047-2A>G nucleotide variant were HER2 negative (Table 5).
We hypothesized that the c.1047-2A>G nucleotide variant in the CDK12 gene could affect splicing. In silico splice site prediction with MaxEnt, NNSPLICE, and HSF suggested that the variant is a substitution in the acceptor splice site of intron 1, likely resulting in skipping of exon 2. Therefore, the CDK12 c.1047-2A>G variant may lead to production of a shorter alternative transcript.
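Why c.1047-2A>G lands in a splice acceptor follows from HGVS coding-DNA notation: a position written c.N-k lies k nucleotides into the intron upstream of exonic nucleotide c.N, that is, at the 3' end of the preceding intron, where the invariant AG acceptor dinucleotide sits (-2 = A, -1 = G). Positions written c.N+k lie downstream of an exon, at the GT donor (+1 = G, +2 = T). Variants at these ±1/±2 positions are the "canonical splice site" variants that the ACMG/AMP guidelines list among potential null variants. The Python sketch below parses only simple substitutions of this form (it is not a complete HGVS parser, and the class and function names are hypothetical), reports the region, and checks whether the reference base matches the canonical dinucleotide. It does not reproduce the MaxEnt, NNSPLICE, or HSF predictions, which score the surrounding sequence context.

```python
import re
from dataclasses import dataclass

# Simplified pattern for coding-DNA substitutions with an optional intronic
# offset, e.g. "c.1047-2A>G" or "c.99+1G>T". UTR positions (c.-N, c.*N) and
# other variant types are deliberately not supported.
_HGVS_SUB = re.compile(
    r"^c\.(?P<pos>\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

# Invariant splice-site bases: acceptor AG at -2/-1, donor GT at +1/+2.
_CANONICAL_BASE = {-2: "A", -1: "G", 1: "G", 2: "T"}


@dataclass(frozen=True)
class CodingSubstitution:
    position: int  # exonic coding-DNA position the offset is counted from
    offset: int    # 0 = exonic; negative = upstream intron; positive = downstream
    ref: str
    alt: str

    @property
    def splice_region(self) -> str:
        if self.offset == 0:
            return "exonic"
        site = ("splice acceptor (intron 3' end)" if self.offset < 0
                else "splice donor (intron 5' end)")
        kind = "canonical" if abs(self.offset) <= 2 else "non-canonical intronic"
        return f"{kind} {site}"

    @property
    def ref_matches_consensus(self) -> bool:
        """True if the reference base is the invariant AG/GT base at this offset."""
        return _CANONICAL_BASE.get(self.offset) == self.ref


def parse_substitution(hgvs: str) -> CodingSubstitution:
    match = _HGVS_SUB.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS string: {hgvs!r}")
    return CodingSubstitution(
        position=int(match["pos"]),
        offset=int(match["offset"] or 0),
        ref=match["ref"],
        alt=match["alt"],
    )


if __name__ == "__main__":
    variant = parse_substitution("c.1047-2A>G")
    print(variant)
    print(variant.splice_region, "| ref matches consensus:", variant.ref_matches_consensus)
```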

DISCUSSION
The Russian population includes many ethnicities and is characterized by great genetic diversity. Slavic and non-Slavic ethnicities in Russia may have different profiles of nucleotide variants conferring HBOCS predisposition. Therefore, identification of novel ethnicity-specific markers could reduce false-negative results of genetic risk assessment. The frequency of HBOCS-associated nucleotide variants in the BRCA1 and BRCA2 genes varies among non-Caucasian populations (31,32). Indeed, one of the most common markers in European populations, BRCA1 5382insC, was not found in hereditary BC patients from several non-Slavic indigenous populations in Russia (Altaians, Buryats, and Tuvinians) (31). Our previously published data on germline BRCA1 and BRCA2 nucleotide variants in a small group of Tatar patients with BC indicated the same trend (8). To the best of our knowledge, however, no data exist on the broader spectrum of disease-associated nucleotide variants in HBOCS patients of Tatar descent.
We tested a multigene panel for the presence of HBOCS predisposition markers in Tatar patients and detected several germline nucleotide variants in the BRCA1, BRCA2, CDK12, CDH1, CHEK2, FANCI, MUTYH, MSH2, and RAD51C genes, including some pathogenic variants previously reported in other populations. Strikingly, their prevalence and spectrum in Tatar HBOCS patients differed from those reported in European populations, particularly in Russia (2,6,32).
Currently, nucleotide variants in the CDK12 gene are not included in panels of HBOCS predisposition markers, although several lines of evidence strongly suggest CDK12 involvement in OC and BC pathogenesis. CDK12 is one of the most frequently mutated genes in high-grade serous OC, harboring mutations in 3% of cases (23). In OC, CDK12 mutations deregulate expression of HRR pathway genes (33). In BC, CDK12 is frequently co-amplified with the oncogene ERBB2, and such amplification may contribute to BC pathogenesis (34). Recent […] (34). We hypothesized that CDK12 is involved in HBOCS and used a targeted NGS-based approach to identify disease-associated nucleotide variants of the CDK12 gene in the Tatar population.
In this study, we detected a novel germline nucleotide variant, c.1047-2A>G, in the CDK12 gene in a group of Tatar patients with HBOCS. The carrier frequency of CDK12 c.1047-2A>G among Tatar and non-Tatar HBOCS patients combined (106 and 93 patients assessed, respectively) was 4.5% (9/199), significantly higher than the 0.42% (1/238) observed in healthy donors of mixed or unknown ancestry (Tatar and non-Tatar) from the same geographical region. One potential weakness of this study is that the healthy control group might consist primarily of non-Tatar participants; in that case, the difference in c.1047-2A>G frequency between the HBOCS and control groups could arise solely because the variant is more frequent in the Tatar population. However, because Tatars are one of the major ethnic groups in the Republic of Tatarstan, comprising almost 50% of the total population, we assume that about half of the healthy donors are of Tatar ethnicity. We also recruited a relatively large healthy control group (238 participants) so that it would better represent the general population. The frequency of the CDK12 c.1047-2A>G variant in Tatar patients is relatively high and similar to that of BRCA1 5382insC, a founder mutation present in many Russian populations. The c.1047-2A>G variant was detected in patients from apparently unrelated families. Therefore, CDK12 c.1047-2A>G may be a founder mutation in the Tatar population, at least in the Tatar sub-population of the Kazan region. Notably, the carrier of the CDK12 c.1047-2A>G variant among the non-Tatar HBOCS patients from the Volga District was Chuvash; the Chuvash, like the Tatars, are a Turkic people and are closely related to them.
Overall, we conclude that CDK12 is a candidate gene for HBOCS. To date, there is only one other report of a cancer patient carrying the CDK12 c.1047-2A>G nucleotide variant; notably, that patient also had OC and was identified in a cohort of OC patients in the USA (35). We propose that CDK12 is involved in the pathogenesis of other malignancies characterized by impaired HRR (10,12) and that c.1047-2A>G may be associated with such diseases. This suggests that CDK12 c.1047-2A>G could be used as a diagnostic marker.
Frequencies of this variant in the Exome Aggregation Consortium database (http://gnomad.broadinstitute.org/) (36) are extremely low (Table 6). Nevertheless, it is present in several populations, with the highest frequency (0.1%) in South Asian populations. We determined the frequency of the CDK12 c.1047-2A>G variant in healthy participants from the Volga District of the Republic of Tatarstan to be 0.42%. This raises the question of whether c.1047-2A>G should be classified as a mutation or a nucleotide polymorphism (37). We therefore refer to c.1047-2A>G as a nucleotide variant and classify it as pathogenic in accordance with the recommendations of the American College of Medical Genetics and Genomics (ACMG) (38).
The Tatar population in the Volga region shows low interpopulation differentiation (39), which suggests that the results of the current study may be extrapolated to the whole Tatar population in the Volga region of the Republic of Tatarstan. Importantly, Tatars living in the eastern regions of Tatarstan are genetically similar to the Bashkirs (39). We therefore expect that the c.1047-2A>G nucleotide variant in the CDK12 gene might also be involved in HBOCS in individuals of Bashkir ethnicity, which should be addressed in further studies. The relatively high percentage of c.1047-2A>G carriers among healthy participants in our study may have several explanations. First, asymptomatic carriers of c.1047-2A>G may not have developed the disease yet but may do so later. Alternatively, c.1047-2A>G may confer a "disease-predisposing" phenotype, with a second mutation, present in patients but absent in healthy controls, required to trigger the disease, as delineated by the "two-hit" hypothesis (40). Finally, carriers of the c.1047-2A>G nucleotide variant in the healthy group may also harbor as yet unknown "protective" nucleotide variant(s) that neutralize the pathogenic effect of c.1047-2A>G (41). Identifying such protective variants could open an avenue for new therapeutic strategies.
The CDK12 gene is located on chromosome 17q12 and comprises 14 exons. Currently, there are two identified […] (42,43). In BC, pharmacological inhibition of CDK12 reverses PARP inhibitor resistance in both BRCA wild-type and BRCA-mutant cells (44). If carriers of the c.1047-2A>G nucleotide variant have non-functional CDK12 protein, they may exhibit increased sensitivity to PARP inhibitors. The CDK12 gene is located in close proximity to the oncogene ERBB2, also known as HER2, and in BC, CDK12 is frequently co-amplified with HER2 (34). A correlation between HER2 status and CDK12 levels was previously found in a cohort of BC patients: in most HER2-amplified tumors, CDK12 mRNA and protein levels were high, and absence of CDK12 was rarely observed (45). In our cohort, 43% of patients were HER2 positive, yet all patients harboring pathogenic nucleotide variants in CDK12 were HER2 negative. Whether HER2-negative status is a functional consequence of pathogenic CDK12 variants is beyond the scope of the current study but should be addressed in future work.
Overall, our study demonstrates that the prevalence of disease-associated mutations in the BRCA1 and BRCA2 genes in the Russian population differs significantly between patients of Tatar and Slavic ethnic origin. We identified the c.1047-2A>G germline nucleotide variant in the CDK12 gene, which may result in an alternative CDK12 splice variant and is strongly associated with HBOCS. We recommend that this variant become part of the standard testing panel for HBOCS susceptibility markers in Tatar patients with a family history of OC and/or BC. Incorporating the c.1047-2A>G marker into this genetic diagnostic panel may also lead to improved therapeutic strategies, such as stratification of patients according to potential sensitivity to PARP inhibitors. This finding also supports the role of CDK12 as a candidate gene for HBOCS predisposition.