Germline landscape of RPA1, RPA2 and RPA3 variants in pediatric malignancies: identification of RPA1 as a novel cancer predisposition candidate gene

Replication Protein A (RPA) is a single-strand DNA binding protein that plays a key role in the replication and repair of DNA. RPA is a heterotrimer made of 3 subunits: RPA1, RPA2, and RPA3. Germline pathogenic variants affecting RPA1 were recently described in patients with Telomere Biology Disorders (TBD), also known as dyskeratosis congenita or short telomere syndrome. Premature telomere shortening is a hallmark of TBD and results in bone marrow failure and predisposition to hematologic malignancies. Building on the finding that somatic mutations in RPA subunit genes occur in ~1% of cancers, we hypothesized that germline RPA alterations might be enriched in human cancers. Because germline RPA1 mutations are linked to early onset TBD with predisposition to myelodysplastic syndromes, we interrogated pediatric cancer cohorts to define the prevalence and spectrum of rare/novel and putative damaging germline RPA1, RPA2, and RPA3 variants. In this study of 5,993 children with cancer, 75 (1.25%) harbored heterozygous rare (non-cancer population allele frequency (AF) < 0.1%) variants in the RPA heterotrimer genes, of which 51 cases (0.85%) had ultra-rare (AF < 0.005%) or novel variants. Compared with Genome Aggregation Database (gnomAD) non-cancer controls, there was significant enrichment of ultra-rare and novel RPA1, but not RPA2 or RPA3, germline variants in our cohort (adjusted p-value < 0.05). Taken together, these findings suggest that germline putative damaging variants affecting RPA1 are found in excess in children with cancer, warranting further investigation into the functional role of these variants in oncogenesis.


Introduction
Maintenance of genome integrity requires efficient DNA repair. The perturbation of processes engaged in repair of DNA damage by somatic mutations is a well-known mechanism for oncogenesis. Germline biallelic inactivation of genes governing DNA repair leads to classic cancer predisposition syndromes such as Fanconi anemia, ataxia telangiectasia and Bloom syndrome, among others (1)(2)(3)(4). Monoallelic mutations impacting some of these genes can also increase the risk for cancer (5)(6)(7)(8)(9). We recently discovered that germline heterozygous mutations in the Replication Protein A1 (RPA1) gene cause Telomere Biology Disorder (TBD), a hereditary condition classically associated with pathological shortening of telomeres resulting in bone marrow failure (BMF), pulmonary and liver fibrosis, mucocutaneous fragility, and predisposition to solid tumors, myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML) (10).
The RPA1 protein is the largest subunit of Replication Protein A (RPA), a heterotrimeric complex consisting of RPA1 (RPA70), RPA2 (RPA32) and RPA3 (RPA14). As a complex, RPA tightly binds single-strand DNA (ssDNA) to protect it from nucleases while maintaining DNA accessible to essential DNA-DNA and DNA-protein interactions. Consistent with the ubiquitous and ongoing formation of ssDNA, RPA is present and required across almost all cellular processes during replication, recombination, and repair of DNA. In fact, RPA is involved in all ssDNA repair pathways (nucleotide excision, base excision, mismatch) and double strand DNA repair mechanisms (homologous recombination, non-homologous end joining) (11)(12)(13). RPA participates in such diverse pathways through its ability to dynamically bind ssDNA while facilitating DNA repair and cell cycle protein interactions (11).
Given its essential role in DNA repair, RPA might be expected to be mutated in cancers. By mining the Catalogue Of Somatic Mutations In Cancer (COSMIC) database (14), we found that somatic mutations in RPA1, RPA2, and RPA3 are found in 1.4%, 0.5%, and 0.9% of human cancers, respectively. In our previously published cohort of 4 patients with TBD, one patient who carried a germline RPA1 p.V227A mutation developed advanced MDS requiring hematopoietic stem cell transplantation. All 3 RPA1 germline mutations (p.V227A, p.E240K, p.T270A) identified in the 4 cases were missense and 2 out of 3 exerted a gain-of-function effect, resulting in increased binding to single strand and telomeric DNA (10). Besides these descriptions associating germline RPA1 variants with bone marrow failure or hematologic malignancies, the RPA2 and RPA3 genes have not been linked to any human diseases thus far. Moreover, the landscape of germline variants in the RPA heterotrimer in malignancies has not been systematically assessed. To address this knowledge gap, we investigated the occurrence of novel and rare germline variants in the RPA1, RPA2 and RPA3 genes in a cohort of 5,993 children with cancers. We found that ultra-rare and novel germline variants in the RPA1 gene were significantly more common among pediatric cancer patients than non-cancer controls. Furthermore, we examined a separate cohort of 41 young adults with AML and identified potentially deleterious RPA1 germline variants in 3 cases. Our studies indicate that the RPA1 gene may be a novel risk factor for malignancies.

Variant calling and filtering
Variant calling and genotyping were performed using the Genome Analysis Toolkit (GATK) best practices workflow with modifications as described previously (19). We retained high quality variants that passed filtering using the following criteria: allelic balance > 0.2, genotype quality > 20, variant allelic frequency (VAF) for heterozygous variants between 20-80%, minimum of 10 alternate reads supporting single nucleotide variants (SNVs) and 7 alternate reads supporting InDels, and missingness < 25% of samples. We performed variant annotation using ANNOtate VARiation (ANNOVAR) and Variant Effect Predictor (VEP) tools (20). We also annotated all the variants using InterVar (21) automated clinical interpretation based on the American College of Medical Genetics and Genomics (ACMG) guidelines (22).
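The quality filters listed above can be sketched as a single predicate. This is an illustrative sketch only, not the pipeline used in the study: the `Call` record and its field names are hypothetical, and it assumes that allelic balance and VAF are both computed as the alternate-read fraction, so the allelic balance threshold (> 0.2) sets the lower bound and the VAF window sets the upper bound (80%).

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One heterozygous genotype call (hypothetical record layout)."""
    is_indel: bool            # True for InDels, False for SNVs
    ref_reads: int            # reads supporting the reference allele
    alt_reads: int            # reads supporting the alternate allele
    genotype_quality: float   # GQ value reported by the caller
    site_missingness: float   # fraction of samples with no call at this site


def passes_qc(call: Call) -> bool:
    """Apply the filtering thresholds stated in the text to one call."""
    depth = call.ref_reads + call.alt_reads
    if depth == 0:
        return False
    alt_fraction = call.alt_reads / depth      # assumed definition of VAF / allelic balance
    min_alt = 7 if call.is_indel else 10       # 7 alt reads for InDels, 10 for SNVs
    return (
        alt_fraction > 0.20                    # allelic balance > 0.2
        and alt_fraction <= 0.80               # upper end of the 20-80% VAF window
        and call.genotype_quality > 20         # genotype quality > 20
        and call.alt_reads >= min_alt          # minimum alternate read support
        and call.site_missingness < 0.25       # missingness < 25% of samples
    )
```

For example, an SNV with 12 reference and 12 alternate reads, genotype quality 50 and 1% missingness passes, whereas the same call with only 9 alternate reads does not.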

Computational analysis of RPA mutations
We performed local coordinate minimization followed by global side-chain optimization with the Atomic Multipole Optimized Energetics for Biomolecular Applications (AMOEBA) polarizable force field (25) on 5 high resolution structures of RPA fragments collectively comprising the 7 modular domains of the RPA heterotrimer. These included X-ray structures of the DNA binding domains A and B, DBD-A and DBD-B (PDB: 1JMC) (26), and the RPA trimerization core composed of DBD-C, D and E (PDB: 1L1O) (27), and NMR structures of DBD-F (PDB: 5N8A) (28) and the wing helix domain (PDB: 1DPU) (29). Prior to minimization, the ssDNA was removed from the 1JMC structure and bound peptides were removed from the 2 NMR structures. We then used our optimized structures to predict protein stability differences (DDG Fold) with DDG untrained (DDGun) (30). DDGun estimates the DDG Fold of missense variants from a linear regression of sequence and biochemical features determined from the protein structure. Destabilizing DDG Fold values indicate a decrease in the ratio of folded to unfolded protein due to the mutation (we define negative DDG Fold values as stabilizing and positive DDG Fold values as destabilizing). We established DDG Fold cut-offs for mutations highly likely to impact protein folding. Our cut-offs were determined based on a DDG Fold that affects the ratio of folded to unfolded protein 12-fold (~1.5 kcal/mol) for both stabilizing and destabilizing mutations.
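The correspondence between the ~1.5 kcal/mol cut-off and a roughly 12-fold change in the folded-to-unfolded ratio follows from the standard two-state relation, ratio change = exp(|DDG Fold| / RT). The sketch below works through that arithmetic. The temperature of 298.15 K is an assumption (the text does not state one), and the function names and category labels are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_ratio_change(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Factor by which the folded:unfolded ratio changes for a given DDG Fold.

    Two-state relation exp(|DDG| / RT); 298.15 K is an assumed temperature.
    """
    rt = R_KCAL_PER_MOL_K * temperature_k
    return math.exp(abs(ddg_kcal_per_mol) / rt)


def classify_ddg(ddg_kcal_per_mol: float, cutoff: float = 1.5) -> str:
    """Label a DDG Fold value using the sign convention stated in the text."""
    if ddg_kcal_per_mol > cutoff:
        return "destabilizing"      # positive values are destabilizing
    if ddg_kcal_per_mol < -cutoff:
        return "stabilizing"        # negative values are stabilizing
    return "within cut-off"


if __name__ == "__main__":
    print(f"1.5 kcal/mol -> {fold_ratio_change(1.5):.1f}-fold")
```

Under this assumption, 1.5 kcal/mol corresponds to a change of between 12- and 13-fold, in line with the approximate 12-fold figure quoted above.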

Statistical analysis
We performed rare-variant burden tests for RPA1, RPA2, and RPA3 variants using 5,993 cases from all pediatric cancers in our cohorts (pan-cancer) and within each sub-class of cancers, namely, hematologic (n = 3,452), solid (n = 1,974), and central nervous system (CNS) (n = 1,068) malignancies. For the control set, we retrieved all variants across RPA1, RPA2, and RPA3 from the gnomAD v2 non-cancer subset containing 134,187 individuals with no reported malignancy (23). All variants from the control dataset were processed through the same variant annotation and filtering workflow as our cancer cohort (AF < 0.5%). Enrichment tests for cases with and without germline ultra-rare (AF < 0.005%) plus novel (AF 0%) and rare (AF < 0.1%) variants in the 3 genes were performed using both two- and one-sided Fisher exact tests using the statistical package R (v4.3), as described in previous studies (31,32). We used Bonferroni correction to adjust for multiple testing with a significance cutoff of adjusted p-value < 0.05.
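The burden test described above compares carriers and non-carriers between cases and controls in a 2x2 table. The analysis itself was run in R; the following is only a minimal Python sketch of a Fisher exact test (two-sided and one-sided for enrichment in cases) with Bonferroni adjustment, written with the standard library. Function names are hypothetical, and no counts from the study are embedded in the code.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact(a: int, b: int, c: int, d: int) -> tuple:
    """Fisher exact test on the table [[a, b], [c, d]].

    a, b = case carriers, case non-carriers
    c, d = control carriers, control non-carriers
    Returns (two_sided_p, one_sided_p) where the one-sided alternative
    is enrichment of carriers among cases.
    """
    cases, carriers, total = a + b, a + c, a + b + c + d
    k_min = max(0, cases + carriers - total)
    k_max = min(cases, carriers)
    denom = _log_comb(total, cases)
    # Hypergeometric probability of each possible number of case carriers.
    probs = {
        k: math.exp(_log_comb(carriers, k)
                    + _log_comb(total - carriers, cases - k) - denom)
        for k in range(k_min, k_max + 1)
    }
    p_obs = probs[a]
    one_sided = sum(p for k, p in probs.items() if k >= a)
    # Two-sided: sum all tables no more probable than the observed one.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, two_sided), min(1.0, one_sided)


def bonferroni(p_values: list) -> list:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In this layout, a per-gene test would be called as `fisher_exact(case_carriers, 5993 - case_carriers, control_carriers, 134187 - control_carriers)`, with the carrier counts taken from the annotated case and control variant sets, and the resulting p-values passed to `bonferroni` before applying the 0.05 cutoff.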

Results
Variants identified among the RPA heterotrimer genes
Within the pan-cancer cohort, we identified 80 cases with 55 germline heterozygous RPA1, RPA2 or RPA3 variants meeting the criteria of AF < 0.5% in the gnomAD non-cancer cohort and CADD score > 15 for candidate variant selection (Figure 1A). Specifically, 40 RPA1, 7 RPA2 and 8 RPA3 unique heterozygous germline variants were identified in 63, 7 and 10 cases, respectively (Figure 1B). All variants were classified as variants of uncertain significance (VUS) according to the ACMG criteria (Tables 1-3). The majority of the variants (92% of RPA1, 71% of RPA2 and all RPA3 variants) had CADD scores > 20, indicating a higher probability of a deleterious effect (Tables 1-3). In addition, looking at variant burden in the population, we found that 98% (54/55) of the identified RPA heterotrimer variants had AF < 0.1% (this includes rare, very rare, ultra-rare, and novel variants, Figure 1B). All RPA1, RPA2 and RPA3 variants were mutually exclusive and no cases with compound heterozygous or homozygous variants were identified.
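The candidate selection rule and the allele frequency tiers used throughout the Results can be summarized in a short sketch. It is illustrative only: the function names are hypothetical, allele frequencies are expressed as fractions rather than percentages, and the "very rare" tier shown in Figure 1B is folded into "rare" because its threshold is not stated in the text.

```python
# Thresholds from the text, converted from percentages to fractions.
CANDIDATE_MAX_AF = 0.005      # AF < 0.5% in gnomAD non-cancer
RARE_MAX_AF = 0.001           # AF < 0.1%
ULTRA_RARE_MAX_AF = 0.00005   # AF < 0.005%
CANDIDATE_MIN_CADD = 15       # CADD score > 15


def is_candidate(af: float, cadd: float) -> bool:
    """Candidate variant selection: AF < 0.5% and CADD score > 15."""
    return af < CANDIDATE_MAX_AF and cadd > CANDIDATE_MIN_CADD


def frequency_tier(af: float) -> str:
    """Assign a gnomAD non-cancer allele frequency to a tier."""
    if af == 0:
        return "novel"        # absent from the control population
    if af < ULTRA_RARE_MAX_AF:
        return "ultra-rare"
    if af < RARE_MAX_AF:
        return "rare"         # includes the unstated "very rare" sub-tier
    return "not rare"
```

A variant with AF 0.2% and CADD 22 would therefore be selected as a candidate but fall outside the rare tier, which is the situation of the single variant (1/55) with AF between 0.1% and 0.5%.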

RPA1 germline variants and cancers
RPA1 (616 amino acids, 70 kDa) is the largest of the 3 subunits of the RPA heterotrimer. We found that 1.05% (63/5993) of the cohort harbored heterozygous germline RPA1 variants (Figure 1C; Tables 1-3), a frequency that was not statistically significant compared to gnomAD non-cancer controls for all cancers and cancer subtypes (Table 4). RPA1 has 4 modular oligosaccharide binding-fold domains commonly referred to as functional DNA binding domains (DBD): F, A, B and C, spanning the N- to C-terminal regions of the protein. RPA1 variants were found across all 4 DBDs as follows: 6 in DBD-F, 15 in DBD-A, 10 in DBD-B, and 26 in DBD-C (Figure 1C). Of note, 6 cases were found to have RPA1 variants in the linker regions between 2 DBDs. All RPA1 variants were missense (Figure 1C) except for p.L53lfs*53 within DBD-F, which was found in 1 case. Three recurrently mutated amino acids were discovered in RPA1 domains DBD-A (p.V286, 9 cases), DBD-B (p.R389, 5 cases), and DBD-C (p.G437, 5 cases). We next focused specifically on novel and ultra-rare RPA1 variants (33), present in 14 and 21 cases, respectively (Figure 1B; Tables 1-3). Notably, we found significant enrichment of RPA1 novel and ultra-rare variants in our cohort (adjusted p-value < 0.05, Table 4).
Prediction of variant structural effect was performed by calculating protein stability change scores (DDG Fold), using a cut-off that corresponds to a 12-fold change in the ratio of folded to unfolded protein (~1.5 kcal/mol) for both stabilizing and destabilizing mutations. Significant scores (> 1.5 kcal/mol) were demonstrated for 4 variants (p.M46T in DBD-F, p.R234G in DBD-A, p.W361L in DBD-B, p.V594G in DBD-C), all of which were novel or ultra-rare (Tables 1-3). RPA1 p.M46T is likely to destabilize folding of DBD-F, resulting in the loss of multiple important protein-protein interactions (11). W361 is a key DNA binding residue in DBD-B, and human cells with W361A support normal replication but are deficient in DNA repair (12, 34), suggesting that p.W361L may destabilize DBD-B folding, resulting in hypomorphic RPA.
We next assessed which types of malignancies were present in patients with RPA1 variants (Figure 1A). We found comparable frequency of cases with RPA1 variants across solid tumors (n = 22, 1.1%), CNS cancers (n = 12, 1.1%) and hematological malignancies (n = 29, 0.8%). Among solid tumor cases with RPA1 variants, 31.8% (7/22) presented with sarcomas and 27.3% (6/22) were diagnosed with neuroblastoma. Notably, the 7 sarcoma cases carried 6 unique RPA1 variants (n = 2 novel and n = 1 ultra-rare) and only one was noted to have a concomitant germline mutation (Table 2). All 6 neuroblastoma cases were found to have unique RPA1 variants (n = 3 novel, n = 2 ultra-rare), of which half were found to have germline variants reported in PALB2/NDRG4, MDC1, or TP53 genes (Table 2). Three cases of retinoblastoma harbored unique RPA1 variants (2 ultra-rare), with 2 cases having a concomitant germline RB1 mutation (Table 2). Two cases of Wilms tumor were identified to have germline RPA1 variants. Among the single cases of solid tumors (germ cell tumor, melanoma, nasopharyngeal carcinoma, and papillary thyroid carcinoma), 2 ultra-rare and 1 novel RPA1 variants were found. Among cases with CNS tumors, 4 patients with medulloblastoma harbored novel (n = 3) or ultra-rare (n = 1) RPA1 variants. Each of these cases also carried other germline mutations (PBRM1, C7 and MYH9, BRCA1, ANKRD26), of which BRCA1 and ANKRD26 are cancer predisposition genes (Table 3). Furthermore, 4 cases with high grade glioma harbored 3 RPA1 variants (n = 1 novel, n = 1 ultra-rare), all clustering within the DBD-C domain of RPA1. These patients had no other potentially causative germline variants reported in other predisposition genes. Among the 3 low grade glioma cases, 3 RPA1 variants (one ultra-rare) were identified, with one case harboring other germline mutations in SDHA and RUNX1. Lastly, one ultra-rare RPA1 variant was identified in a case of ependymoma without other germline mutations (Table 3).
Frontiers in Oncology frontiersin.org

Discussion
The RPA heterotrimer is an essential protein for binding ssDNA encountered in cellular transactions to facilitate DNA-DNA and DNA-protein interactions during DNA replication, repair, recombination, RNA transcription, and telomere maintenance. As such, mutations in this genome maintenance protein have been linked to cancer formation in mice (35) and are acquired in up to ~1% of human cancers (14). We recently demonstrated that the heterozygous germline RPA1 mutations c.680T>C p.V227A, c.718G>A p.E240K and c.808A>G p.T270A in DBD-A are associated with TBD, which predisposes to hematologic and solid tumors. In that study, one patient with RPA1-related TBD developed MDS (10). Based on these data, we reasoned that germline defects in RPA1 and possibly also the other 2 components of the RPA heterotrimer (RPA2 and RPA3) might be associated with cancer development. To this end, we investigated comprehensive germline genomic data for the presence of heterozygous variants in RPA1, RPA2 and RPA3 across a large series of pediatric hematologic, solid and CNS malignancies. We discovered significant enrichment of ultra-rare and novel RPA1 germline variants in our pediatric cancer cohort compared to non-cancer controls, positioning RPA1 as a novel candidate predisposition gene. Moreover, in an additional cohort of 41 patients with AML, we identified 3 heterozygous germline RPA1 variants (c.460G>A, p.T154A; c.1397C>G, p.A466G; c.1538G>A, p.R513H) with potential pathogenic effect.
RPA1 harbored the most variants, likely because of its larger size compared to RPA2 and RPA3. Although we did not observe a statistically significant enrichment of putative damaging variants in RPA2 and RPA3, some of the identified variants were novel or ultra-rare and could possibly have a deleterious effect. Thus, RPA2 and RPA3 could be considered as genes of unknown significance (GUS) yet potentially important in tumor formation. All 3 proteins are required to fold properly to form a functional RPA heterotrimer (13). For this reason, we calculated stabilities of the RPA modular
In our discovery cohort, we identified 5 AML cases with germline RPA1 variants. One had an ultra-rare RPA1 p.L58F variant in DBD-F and the remaining 4 had variants affecting nucleotide 856 within the DBD-A domain (c.G856A, p.V286I in 3 cases and c.G856T, p.V286F in one case). The resulting amino acid changes do not differ in size or charge from wild-type valine and have a neutral protein folding score of 0.6. However, these mutations may disrupt protein-protein or protein-DNA interactions, or post-translational modifications, which are known mechanisms implicated in the pathogenicity of RPA1 variants in various experimental models (10)(11)(12)(13)35). Additionally, the TBD-associated pathogenic RPA1 variants p.V227A, p.E240K and p.T270A have protein folding scores of 1.4, 0.1 and 0.2 (consistent with normal protein folding shown in biochemical assays) yet were shown to exert a gain-of-function effect on DNA binding and melting of telomeric G-quadruplexes (10). Three of the 4 AML cases with RPA1 variants in the DBD-A domain had additional germline variants in genes (NOTCH2, FANCD2, MLL, HIP1) which, together with RPA1, may have an epistatic effect to cause overall genomic instability. Corroborating data from a small cohort of 41 AML patients, in which 3 patients carried RPA1 variants (p.T154A in the linker region; p.A466G and p.R513H in DBD-C), deserve further investigation. Beyond RPA1 in the AML cohort, we also found a novel germline missense variant in RPA3 in an infant with AML who also harbored a germline truncating variant in the DNA helicase RTEL1, which is associated with TBD (36,37). More functional studies are needed to determine the pathogenicity of RPA1 V286I/F alterations and their role in hematologic malignancy.
Among the 13 CNS tumors with variants in RPA heterotrimer genes, 9 cases were high grade neoplasms, including medulloblastoma and high-grade glioma. Interestingly, 3 of the 5 medulloblastoma cases had novel germline RPA1 variants, one had a very rare RPA1 variant, and one had an ultra-rare RPA3 variant. Notably, even though variants in other unrelated genes were also found in 4 of the 5 medulloblastoma cases, none of these genes have been previously associated with medulloblastomas in the literature. Other studies have identified germline defects in DNA repair genes in medulloblastoma (38,39). It would stand to reason that germline mutations in the RPA heterotrimer, which functions in almost all DNA repair pathways, could potentiate oncogenic transformation. Further investigation should focus on assessing the function of RPA mutant proteins in DNA repair and their contribution to tumor biology.
Our study has several limitations. Although all cases were assessed using a uniform pipeline, the cohort is skewed towards cases with B-ALL (~4-fold higher number of B-ALL compared to solid and CNS cancers). We included all germline and somatic mutations per case that were reported in previously published studies; however, this information was unavailable for a proportion of cases and therefore we cannot make definitive conclusions about RPA variants being the sole germline driver in these cancers. Although ultra-rare and novel heterozygous germline variants in RPA1 were significantly enriched in pediatric cancers, it is difficult to ascertain pathogenicity and clinical relevance without functional follow-up, which falls beyond the scope of this study. It is plausible that variants with high in-silico protein folding energy, ultra-rare and/or novel allelic frequency and high pathogenicity scores may be clinically relevant and should be among the top variants to explore in future studies.
In summary, evasion of DNA repair mechanisms is a common theme among cancers. RPA is an essential protein for DNA replication and repair. Our study describes novel and rare variants with potentially deleterious effect in the RPA1, RPA2 and RPA3 genes in pediatric malignancies. Moreover, we have identified enrichment of RPA1 variants in cancer cases compared to non-cancer controls, suggesting that this gene potentially acts as a novel cancer driver. We plan to exploit our findings and perform further functional and biochemical characterization of recurrent cancer associated RPA1 variants to assess their potential use as targets for future cancer therapies.

FIGURE 1 Germline heterozygous variants in the RPA heterotrimer in pediatric cancers. (A) Number of pediatric cancer cases with either RPA1, RPA2 or RPA3 heterozygous germline variants. B-ALL (B cell acute lymphoid leukemia), T-ALL (T cell acute lymphoid leukemia), AML (acute myeloid leukemia). Unique cancers are identified by different colors represented in the legend. (B) Number of cancer cases with either novel (pink), ultra-rare (gold), very rare (light teal), or rare (blue) germline variants in RPA1, RPA2 or RPA3 according to gnomAD allelic frequency. Schematic of human RPA1 (DNA binding domain (DBD-F, A, B, C)) (C), RPA2 (DBD-D) (D) and RPA3 (DBD-E) (E) proteins with germline variants denoted. Blue and red lettering represents missense and frameshift variants, respectively. Numbers within circles represent the number of cases that harbored that variant, while lack of numbering denotes one case per variant. Variants found in hematologic cases are represented on top and solid (intra and extra cranial) malignancies are denoted at the bottom of each protein map. * = ultra-rare variant allelic frequency (< 0.005%), # = novel variants.

TABLE 2
Germline heterozygous variants found in RPA1, RPA2 and RPA3 in extra-cranial solid tumors.

TABLE 3
Germline heterozygous variants found in RPA1, RPA2 and RPA3 in CNS tumors.

TABLE 4
Statistical analysis using two- and one-sided Fisher exact tests of ultra-rare plus novel and rare germline heterozygous variants in RPA1, RPA2 and RPA3 across hematologic, extra-cranial solid and CNS tumors.