New Pathogenic Germline Variants in Very Early Onset and Familial Colorectal Cancer Patients

A genetic diagnosis facilitates personalized cancer treatment and clinical care of at-risk relatives. However, although 25% of colorectal cancer cases are familial, around 95% of the families remain genetically unresolved. In this study, we performed germline gene panel analysis of 32 established or candidate colorectal cancer predisposing genes in 149 individuals from either families with an accumulation of colorectal cancers or families with only one sporadic case of very early onset colorectal cancer (≤40 years at diagnosis). We identified pathogenic or likely pathogenic variants in 10.1% of the participants, in genes such as APC, POLE, MSH2 and PMS2. The MSH2 variant c.2168C>T, p.(Ser723Phe) was previously described as a variant of unknown significance, but we have now reclassified it as likely pathogenic. The POLE variant c.1089C>A, p.(Asn363Lys) was identified in a patient with three metachronous colorectal cancers from age 28 and turned out to be de novo. One pathogenic PMS2 variant was novel. We also identified several highly interesting variants of unknown significance in APC, BUB1, TP53 and RPS20. The RPS20 variant is novel and was found in a large Amsterdam I positive family with a multi-tumor phenotype, including 12 cases of colorectal cancer from as early as age 24. The variant segregated with cancer in the family, and multiple in silico tools predict it to be pathogenic. Our data further support the shift from phenotype-based cancer panels to large panels including all established genes involved in hereditary cancer syndromes, or to (targeted) whole genome sequencing. Additionally, the identification of a likely disease-predisposing variant in RPS20 expands the phenotypic spectrum of RPS20-related cancers and emphasizes that this gene is relevant to include in colorectal cancer gene panels.


INTRODUCTION
Colorectal cancer (CRC) is one of the most frequent cancers worldwide and is now the second most common cancer in Denmark. Approximately 20% to 30% of patients report a family history of CRC; however, in more than 95% of these familial cases a genetic etiology cannot be identified (Schubert et al., 2019). The families are highly heterogeneous with respect to phenotype, inheritance pattern and overall lifetime cancer risk, which makes genetic counseling and surveillance challenging. An established genetic diagnosis facilitates personalized cancer treatment and surveillance of affected and unaffected carriers, which underlines the importance of identifying the genetic background of the disease.
Traditionally, Danish patients or families suspected of having hereditary CRC, for example due to familial aggregation of CRC, early-onset disease or multiple primary hereditary non-polyposis colorectal cancer (HNPCC)-associated tumors, have been offered genetic counseling and genetic testing of the Lynch Syndrome-predisposing mismatch repair (MMR) genes (MLH1, MSH2, MSH6, PMS2) and EPCAM, with APC and MUTYH added in cases with several colonic adenomas. In the past 5 years, a larger panel of 17 CRC-predisposing genes (APC, AXIN2, BMPR1A, EPCAM, GREM1, MLH1, MSH2, MSH3, MSH6, MUTYH, NTHL1, PMS2, POLD1, POLE, PTEN, SMAD4 and STK11) has usually been analyzed; however, the great majority of cases remain genetically unresolved.
The lack of a genetic diagnosis is a worldwide issue in familial and early onset CRC and has prompted many studies using larger cancer panels in search of genetic explanations. The number of analyzed genes has varied, but depending on the cohort and previous genetic analyses, a monogenic etiology (often including variants in genes with uncertain clinical impact, and low- or moderate-risk variants) has been identified in up to 22% of the patients analyzed, with the highest diagnostic yield in younger patients (Chubb et al., 2015; Mork et al., 2015; Hansen et al., 2017; Pearlman et al., 2017; Yurgelun et al., 2017; Dominguez-Valentin et al., 2018; Martin-Morales et al., 2018; Stoffel et al., 2018). However, some studies have shown a very limited diagnostic yield when analyzing well-established cancer predisposition genes, suggesting that other genetic inheritance patterns or mechanisms should be sought. Several possible disease-causing genetic mechanisms have been proposed, including variants in not yet identified highly penetrant cancer genes, mosaicism, regulatory and deep intronic variants in known cancer genes, epigenetic alterations, and di-, oligo- or polygenic inheritance (Schubert et al., 2019); further studies exploring these mechanisms are warranted.
In this study, we aimed to identify rare or novel germline variants in 32 established or suggested cancer predisposition genes in a cohort of highly selected Danish patients with either very early onset sporadic CRC (i.e., diagnosed at ≤40 years of age) or familial CRC without identified MMR deficiency.

Patients
The participants were recruited from two cohorts: (1) families with familial CRC (the 'Familial CRC cohort') and (2) families with only one case of early onset CRC (the 'Early onset CRC cohort'). Family data were extracted from the Danish Hereditary Non-Polyposis Colorectal Cancer (HNPCC) registry (Clinical Research Centre, Copenhagen University Hospital, Hvidovre, Denmark). The registry covers all of Denmark and has, since 1991, recorded all families with, or suspected of having, hereditary CRC. In addition, some patients/families were identified through genetic counseling at the Department of Clinical Genetics, Rigshospitalet, and invited to participate; they fulfilled the same inclusion criteria. For both cohorts, we included patients without known Lynch Syndrome or not previously tested. Patients/families with previously identified variants of unknown significance in cancer genes were kept in the study in order to search for alternative explanations, whereas previous identification of pathogenic variants in other CRC-predisposing genes led to exclusion.
The patients included in this study had gene panel analyses performed by January 1st, 2020. A flow diagram of the inclusion process is shown in Figure 1.

The Familial CRC Cohort
We received data on all Amsterdam I or II positive families without, or not tested for, Lynch Syndrome. Such families have at least three relatives with CRC (Amsterdam I) or with HNPCC-associated cancer (CRC or cancer of the endometrium, small intestine, ureter or renal pelvis; Amsterdam II), affecting at least two successive generations, with at least one relative diagnosed before the age of 50 years; one affected relative should be a first-degree relative of the other two, and familial adenomatous polyposis (FAP) should be excluded. A total of 249 families fulfilled the search criteria as of March 24th, 2015. Based on pedigrees, pathology reports and previous molecular analyses, we selected the families most likely to carry a monogenic, non-MMR high-risk variant, i.e., families with a high number of individuals affected by CRC (preferably synchronous/metachronous CRC), multiple primary cancers or colonic adenomas (preferably advanced adenomas, i.e., size ≥10 mm, high-grade dysplasia or villous/tubulovillous morphology), young age at onset and a clearly dominant inheritance pattern without skipped generations, and with available DNA or a living affected individual. A total of 181 families were selected for inclusion. To be recruited, a subject had to fulfill our inclusion criteria by having (1) any type of cancer or (2) colonic adenomas (either ≥3 colorectal adenomas or ≥1 advanced colorectal adenoma).
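The Amsterdam I/II rules used to define this cohort can be expressed as a small checker. The sketch below is a simplified illustration, not the registry's selection procedure: the `Case` record, the `first_degree` set of relative pairs and the `fap_excluded` flag are hypothetical inputs, and generations are assumed to be numbered consecutively within the pedigree.

```python
from dataclasses import dataclass

# Sites counted under Amsterdam II: CRC plus the HNPCC-associated extracolonic sites
HNPCC_SITES = {"colorectal", "endometrium", "small_intestine", "ureter", "renal_pelvis"}


@dataclass(frozen=True)
class Case:
    person: str      # hypothetical pedigree identifier
    site: str        # tumor site, e.g. "colorectal"
    age: int         # age at diagnosis (years)
    generation: int  # generation number, consecutive within the pedigree


def meets_amsterdam(cases, first_degree, version="I", fap_excluded=True):
    """Return True if the affected relatives fulfil Amsterdam I or II.

    first_degree is a set of frozenset({a, b}) pairs of first-degree relatives.
    """
    sites = {"colorectal"} if version == "I" else HNPCC_SITES
    affected = [c for c in cases if c.site in sites]
    people = {c.person for c in affected}
    # at least three affected relatives, and FAP must have been excluded
    if len(people) < 3 or not fap_excluded:
        return False
    # at least two successive generations affected
    generations = {c.generation for c in affected}
    if not any(g + 1 in generations for g in generations):
        return False
    # at least one relative diagnosed before age 50
    if not any(c.age < 50 for c in affected):
        return False
    # one affected relative must be a first-degree relative of two others
    return any(
        frozenset((hub, a)) in first_degree and frozenset((hub, b)) in first_degree
        for hub in people
        for a in people
        for b in people
        if len({hub, a, b}) == 3
    )


# Hypothetical family: two CRC cases and one endometrial cancer
family = [
    Case("father", "colorectal", 61, 1),
    Case("son", "colorectal", 45, 2),
    Case("daughter", "endometrium", 52, 2),
]
fdr = {frozenset(p) for p in [("father", "son"), ("father", "daughter"), ("son", "daughter")]}
print(meets_amsterdam(family, fdr, "I"))   # False: only two relatives with CRC
print(meets_amsterdam(family, fdr, "II"))  # True
```

The example shows why Amsterdam II captures more families than Amsterdam I: the endometrial cancer only counts toward the three required relatives under the broader HNPCC-associated site list.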

The Early Onset CRC Cohort
We received data on all families with only a single case of CRC before age 50 and no family history of CRC in first-degree relatives or grandparents (n = 596 patients as of February 15th, 2017). The search criteria only included patients without genetically identified Lynch Syndrome, or not previously tested. Patients diagnosed with CRC between 18 and 40 years of age (n = 198) were recruited.

Inclusion and Follow Up
After the pedigrees were updated with relevant clinical information, some families no longer fulfilled the inclusion criteria. In the early onset CRC cohort this could be due to a new case of CRC in the family, resulting in CRC in a first- or second-degree relative, and in the familial CRC cohort it could be due to originally inadequate family data, such as a case of polyposis not reported to the registry. Since family histories change over time, we kept these families in the study.
All living patients received written and oral information as well as genetic counseling, and written informed consent was obtained. Ethical approval was obtained from the Danish Committee on Health Research Ethics (reference: H-4-2014-050). In total, we included 149 individuals: 50 patients with early onset CRC and 99 patients from 85 families with familial CRC. Characteristics of the participants are shown in Table 1.

DNA Extraction
Genomic DNA was extracted from whole blood samples with the ReliaPrep Large Volume HT gDNA Isolation Kit (Promega, Madison, WI, United States) on a Tecan Freedom EVO HSM2.0 Workstation according to the manufacturer's instructions.

Sequencing
The following 32 genes were examined by next-generation sequencing (NGS): APC (NM_000038), AXIN2 (NM_004655), […]. Target DNA sequences were captured using biotinylated oligos provided by Roche NimbleGen (Roche, Basel, Switzerland). The oligos were designed to capture all exons, including 50 bp of flanking intronic sequence. Libraries were constructed from 1400 ng of genomic DNA. The DNA was fragmented to an average size of 400 bp using a Covaris S2 AFA ultrasonicator. The trimming, 3′-adenylation and adaptor ligation steps were performed on a Sciclone G3 robot (Perkin Elmer, Waltham, MA, United States) using Illumina-compatible KAPA library DNA adaptors (Roche Diagnostics, Basel, Switzerland). Sequence capture was performed using the single capture protocol described by Roche NimbleGen, in which 6 to 12 samples are multiplexed before hybridization. Finally, 2 × 151-bp paired-end sequencing was performed on the Illumina MiSeq platform to an average depth of >50X (range 54.2-5893.4X), with a coverage of at least 20X in >98% and at least 30X in >97% of the targeted nucleotides.
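The coverage metrics reported above (mean depth and the share of targeted nucleotides reaching 20X and 30X) can be computed from per-base read depths. The sketch below assumes a plain list of depths over all targeted positions of one sample, for example exported from the alignment by a separate tool; the function names are hypothetical, and using the reported coverage levels as a pass/fail cut-off is an assumption made for illustration.

```python
def coverage_summary(depths, thresholds=(20, 30)):
    """Summarize per-base read depths over all targeted nucleotides of one sample."""
    if not depths:
        raise ValueError("no targeted positions supplied")
    n = len(depths)
    summary = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        # percentage of targeted nucleotides covered by at least t reads
        summary[f"pct_ge_{t}x"] = 100 * sum(d >= t for d in depths) / n
    return summary


def meets_reported_coverage(summary):
    """Hypothetical QC check reusing the coverage levels reported in this section."""
    return summary["pct_ge_20x"] > 98 and summary["pct_ge_30x"] > 97


# Toy example: 99 positions at 100X and one position at 25X
example = coverage_summary([100] * 99 + [25])
print(example)  # mean 99.25, 100% of positions at >=20X, 99% at >=30X
print(meets_reported_coverage(example))  # True
```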

Data Processing
Sequencing reads were trimmed and mapped to the human reference genome hg19/GRCh37 using BWA-MEM v0.7.15 software (Li, […]) […] (Poplin et al., 2017). Variant files were filtered to exclude variants covered by <10 reads or called in <20% of the sequencing reads.
Footnotes to Table 1 (characteristics of the participants): Cases of basal cell carcinoma were not included in this table.
1: Adenocarcinomas located in the cecum, appendix, ascending or transverse colon.
2: Adenocarcinomas located in the splenic flexure, descending colon or sigmoid colon.
3: Only adenocarcinomas.
4: One vulvar cancer (no other cancers), four endometrial cancers (two persons also had CRC, one person also had CRC and breast cancer, and one person also had thyroid cancer) and two ovarian cancers (one person also had CRC and breast cancer).
5: One person also had CRC, one person also had endometrial cancer and one person also had both CRC and ovarian cancer.
6: One duodenal cancer, one ileal cancer, one esophageal cancer (squamous cell carcinoma), one carcinoma at the major duodenal papilla and one hepatic cancer (hepatocellular carcinoma). All of these patients also had CRC except the patient with duodenal cancer, who had >100 colonic adenomas.
7: Two persons had bladder cancer, one person had prostate cancer, one person had chronic lymphatic leukemia and one person had melanoma; all of them also had CRC. One person had thyroid cancer and endometrial cancer.
8: Another person from the same family with three metachronous CRCs was also included.
9: Two persons from the same family had >100 and 25 colonic adenomas, respectively. Another person had 20 colonic adenomas.
10: Two persons with lack of PMS2 expression had pathogenic PMS2 variants.
11: In seven persons, IHC analysis revealed absence of one or more MMR proteins; however, all of them had a relative with a normal IHC analysis. Two persons had MLH1 promoter methylation.

Ingenuity Variant Analysis
Called variants were filtered using Ingenuity Variant Analysis (IVA). First, variants with call quality <20, read depth <10 or variant allele frequency (VAF) <15% were disregarded. Second, variants with an allele frequency >5% in the public variant databases, including the 1000 Genomes Project, ExAC, gnomAD and the NHLBI ESP exomes, were excluded unless established as a pathogenic common variant. Variants with a minor allele frequency (MAF) between 0.5% and 5% in any gnomAD subpopulation were not analyzed further unless a class 4 or 5 variant was detected in the patient in a gene known to cause autosomal recessive cancer. Third, variants in coding regions (including missense variants regardless of in silico prediction, and synonymous variants) and splice-site variants (±10 bp) were kept for further analysis, as were variants listed in ClinVar, variants with gain of function established in the literature and variants with a CADD score >20.
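The three filtering steps can be mirrored in plain Python to make the order of the cut-offs explicit. This is a sketch, not IVA: IVA is a commercial application and nothing below represents its interface. The `Variant` fields, the `CODING` consequence set, the `recessive_gene_hit` flag and reading "±10 bp" as distance from the nearest exon boundary are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels treated as "coding"
CODING = {"missense", "synonymous", "nonsense", "frameshift", "inframe_indel"}


@dataclass
class Variant:
    qual: float                 # call quality
    depth: int                  # read depth at the site
    vaf_pct: float              # variant allele frequency (%)
    max_pop_af: float           # highest AF in 1000G/ExAC/gnomAD/ESP (fraction)
    max_subpop_maf: float       # highest MAF in any gnomAD subpopulation (fraction)
    consequence: str            # e.g. "missense", "intronic"
    splice_distance: Optional[int] = None  # bp from the nearest exon boundary
    in_clinvar: bool = False
    gain_of_function: bool = False
    cadd: float = 0.0
    common_pathogenic: bool = False  # established pathogenic despite a high AF


def step1_quality(v):
    return v.qual >= 20 and v.depth >= 10 and v.vaf_pct >= 15


def step2_frequency(v, recessive_gene_hit=False):
    # recessive_gene_hit: the patient carries a class 4/5 variant in an AR cancer gene
    if v.max_pop_af > 0.05 and not v.common_pathogenic:
        return False
    if 0.005 <= v.max_subpop_maf <= 0.05 and not recessive_gene_hit:
        return False
    return True


def step3_relevance(v):
    coding_or_splice = v.consequence in CODING or (
        v.splice_distance is not None and v.splice_distance <= 10
    )
    return coding_or_splice or v.in_clinvar or v.gain_of_function or v.cadd > 20


def filter_variants(variants, recessive_gene_hit=False):
    return [
        v for v in variants
        if step1_quality(v) and step2_frequency(v, recessive_gene_hit) and step3_relevance(v)
    ]


rare_missense = Variant(60, 80, 48.0, 0.0, 0.0, "missense")
low_vaf = Variant(60, 80, 8.0, 0.0, 0.0, "missense")
moderately_common = Variant(60, 80, 50.0, 0.01, 0.02, "missense")
deep_intronic = Variant(60, 80, 50.0, 0.0, 0.0, "intronic", splice_distance=150)
kept = filter_variants([rare_missense, low_vaf, moderately_common, deep_intronic])
print(len(kept))  # 1: only the rare missense variant survives all three steps
```

Passing `recessive_gene_hit=True` would also keep `moderately_common`, which reflects the exception for a possible second hit in an autosomal recessive cancer gene.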

Variant Classification
All variants remaining after IVA processing were reviewed and classified according to the ACMG-AMP guidelines (Richards et al., 2015), using the following five-class system: 1 = not pathogenic/no clinical significance, 2 = likely not pathogenic/little clinical significance, 3 = uncertain, 4 = likely pathogenic, 5 = pathogenic (Plon et al., 2008). Classifications by expert panels such as ENIGMA or InSiGHT were followed unless new knowledge had emerged. All variants were analyzed manually and evaluated using tools and databases such as Alamut Visual (including in silico splice prediction), LOVD, ClinVar, COSMIC and literature searches in PubMed.
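To illustrate how ACMG-AMP evidence combines into the five classes, the sketch below encodes the combining rules of Richards et al. (2015). It is a simplification and not the authors' workflow, which was manual: criteria are parsed only by their strength prefix (PVS, PS, PM, PP, BA, BS, BP), adjusted strengths are not handled, and treating "contradictory" evidence as both the pathogenic and the benign side reaching a classification is an assumption. The example criteria are hypothetical and are not the criteria applied to any variant in this study.

```python
from collections import Counter

# Evidence strength implied by the prefix of an ACMG-AMP criterion code
PREFIX_STRENGTH = [
    ("PVS", "very_strong"), ("PS", "strong"), ("PM", "moderate"), ("PP", "supporting"),
    ("BA", "stand_alone"), ("BS", "benign_strong"), ("BP", "benign_supporting"),
]


def strength(code):
    for prefix, level in PREFIX_STRENGTH:
        if code.startswith(prefix):
            return level
    raise ValueError(f"unrecognized criterion: {code}")


def classify(criteria):
    """Return class 1-5 for a list of met criteria, e.g. ["PVS1", "PM2"]."""
    n = Counter(strength(c) for c in criteria)
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]

    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m == 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m == 2 and p >= 2) or (m == 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s == 1 and m >= 1)
        or (s == 1 and p >= 2)
        or m >= 3
        or (m == 2 and p >= 2)
        or (m == 1 and p >= 4)
    )
    benign = n["stand_alone"] >= 1 or n["benign_strong"] >= 2
    likely_benign = (
        (n["benign_strong"] == 1 and n["benign_supporting"] >= 1)
        or n["benign_supporting"] >= 2
    )

    path_call = 5 if pathogenic else 4 if likely_pathogenic else None
    benign_call = 1 if benign else 2 if likely_benign else None
    if path_call and benign_call:
        return 3  # contradictory evidence: uncertain significance
    return path_call or benign_call or 3


print(classify(["PS3", "PM2", "PP3"]))  # 4: one strong plus one moderate criterion
print(classify(["PM2", "PP3"]))         # 3: not enough evidence either way
```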

Validation of Variants
All pathogenic and likely pathogenic variants, as well as all variants in Table 3, were validated by visual inspection in the Integrative Genomics Viewer (Robinson et al., 2011), and all variants used in a clinical setting were validated in a new blood sample by either NGS or Sanger sequencing.

RPS20 Segregation Analysis
In one member of family 92, the RPS20 c.98A>T variant was identified in a clinical setting at another department of clinical genetics while our study was running.
In the other tested family members, segregation analysis of RPS20 c.98A>T was performed by Sanger sequencing (primer sequences are available upon request) when possible. For some family members, only non-malignant formalin-fixed paraffin-embedded (FFPE) tissue samples were available, and some of these were of a quality that did not allow Sanger sequencing, so NGS was used instead. One tissue sample failed with both methods.

Long Range PCR
For verification of the PMS2 variant c.2275+1G>C, long range PCR (LR-PCR) was performed according to the instructions provided with the LR-PCR Kit (TaKaRa, Tokyo, Japan), using primers and conditions as previously described (Vaughn et al., 2010). A second set of primers was used for nested PCR to avoid polymorphisms located in the primer sequences. The primer sequences were checked using SNPCheck and are available upon request.

Immunohistochemical Analysis
Formalin-fixed, paraffin-embedded samples of CRC or adenomas were selected for immunohistochemical studies. Immunohistochemical evaluation of 3 µm thick sections was performed with the following ready-to-use antibodies from Agilent, according to the manufacturer's instructions: MLH1 (clone ES05), PMS2 (clone EP51), MSH2 (clone FE11) and MSH6 (clone EP49). Staining was performed on the Agilent Omnis platform using the EnVision Flex+ detection kit (GV800). The sections were counterstained with hematoxylin. Some samples had been analyzed in other departments of pathology, where historically only MLH1 and MSH2, or MLH1, MSH2 and MSH6, were analyzed. These samples may have been analyzed using other kits; however, all samples were analyzed as part of a clinical evaluation by experienced pathologists.

Interpretation of IHC Stainings
The IHC stainings were interpreted by trained colorectal pathologists as either positive (retained nuclear staining in any number of tumor cells) or negative (complete loss of nuclear staining in all tumor cells). Normal colonic crypt epithelium adjacent to the tumor, lymphoid cells and stromal cells served as internal positive controls. In addition, on-slide positive controls are routine practice in our IHC laboratory.
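The scoring rule above can be written as a per-stain decision. This sketch reduces the pathologist's visual assessment to a count of stained tumor nuclei plus a flag for the internal positive control; both inputs, the function names and the "not evaluable" outcome when the internal control fails are assumptions not stated in the text.

```python
def interpret_stain(stained_tumor_nuclei, internal_control_ok):
    """Score one MMR protein stain: 'retained', 'lost' or 'not evaluable'."""
    if not internal_control_ok:
        # assumption: without a positive internal control, a loss cannot be called
        return "not evaluable"
    # retained = nuclear staining in any number of tumor cells; lost = none at all
    return "retained" if stained_tumor_nuclei > 0 else "lost"


def lost_proteins(panel):
    """panel maps protein name -> (stained tumor nuclei, internal control ok)."""
    return sorted(
        protein for protein, (nuclei, control_ok) in panel.items()
        if interpret_stain(nuclei, control_ok) == "lost"
    )


# Hypothetical tumor with isolated PMS2 loss; the MSH6 internal control failed
panel = {"MLH1": (120, True), "PMS2": (0, True), "MSH2": (300, True), "MSH6": (0, False)}
print(lost_proteins(panel))  # ['PMS2']
```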

RESULTS
We identified 12 pathogenic or likely pathogenic variants in 15 patients (10.1% of the included patients); these are listed in Table 2. High-risk variants in MSH2, POLE and APC were identified in 3 of 85 families with familial CRC (3.5%), and variants in PMS2 were identified in 2 of 50 patients (4%) from the early onset CRC cohort. A total of 119 variants of unknown significance (VUS) were detected and are listed in Supplementary Table S1; the 19 most interesting VUS, selected on the basis of allele frequency and CADD score, are listed in Table 3. Variant filtering is summarized in Figure 2.
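The percentages above follow directly from the stated counts. A minimal check, using only the numbers in this paragraph; note that the denominators differ (patients for the whole study and the early onset cohort, families for the familial cohort).

```python
def yield_pct(carriers, total):
    """Diagnostic yield as a percentage, rounded to one decimal."""
    return round(100 * carriers / total, 1)


reported = {
    "all participants (patients)": (15, 149),
    "familial CRC cohort, high-risk variants (families)": (3, 85),
    "early onset CRC cohort, PMS2 variants (patients)": (2, 50),
}
for label, (k, n) in reported.items():
    print(f"{label}: {k}/{n} = {yield_pct(k, n)}%")
```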

Pathogenic and Likely Pathogenic High-Risk Variants
The missense variant c.289G>A, p.(Gly97Arg) in APC was identified in two siblings (no. #14;16 and #14;20) with an attenuated familial adenomatous polyposis (AFAP) phenotype. Segregation analysis revealed that a third sibling with AFAP also carried the variant. The variant has previously been reported in a Chinese patient with mild FAP (Wang et al., 2019). It creates a cryptic acceptor splice site and interrupts normal splicing; this family and the results of the functional analyses have already been published (Djursby et al., 2020). Based on these data, we consider the APC c.289G>A variant likely pathogenic.
One patient (no. #27;10) had a likely pathogenic variant in POLE: c.1089C>A, p.(Asn363Lys). This variant is not reported in population allele frequency databases but has been identified in two large families with multi-tumor phenotypes (Rohlin et al., 2014; Vande Perre et al., 2019). The variant affects the highly conserved amino acid Asn-363 in the exonuclease domain of POLE, and so far only missense variants in this domain have been confirmed pathogenic (Bellido et al., 2016). Segregation analysis in the parents, performed on healthy FFPE tissue, indicates that the variant arose de novo in our patient, which considerably strengthens the evidence for pathogenicity. Family data are shown in Figure 3, pedigree A. Based on in silico data, our clinical data and co-segregation data in the two large published families, we classify this variant as likely pathogenic.
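The de novo conclusion rests on a trio comparison: the variant is present in the patient and absent in both parents, with enough parental coverage to trust the absence. The sketch below encodes that logic with hypothetical genotype strings and a hypothetical minimum parental depth. It also reflects that ACMG-AMP counts de novo evidence as PS2 when maternity and paternity are confirmed and as PM6 when they are assumed; whether parentage was confirmed here is not stated in the text.

```python
def trio_call(child_gt, mother_gt, father_gt, mother_depth, father_depth,
              min_depth=20, parentage_confirmed=False):
    """Classify a variant seen in a child from parental genotypes ("0/0", "0/1", "1/1")."""
    if child_gt == "0/0":
        return "absent in child"
    if mother_gt != "0/0" or father_gt != "0/0":
        return "inherited"
    # hypothetical threshold: require enough parental reads to trust the absence;
    # low-level parental mosaicism can still escape detection above this depth
    if min(mother_depth, father_depth) < min_depth:
        return "inconclusive (low parental coverage)"
    return "de novo (PS2)" if parentage_confirmed else "apparent de novo (PM6)"


print(trio_call("0/1", "0/0", "0/0", mother_depth=60, father_depth=45))
# apparent de novo (PM6)
```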
We identified two PMS2 variants in two patients with early onset CRC (no. #309;10 and no. #409;10). The first was an indel, c.736_741delinsTGTGTGTGAAG, p.(Pro246Cysfs*3), which is classified as pathogenic by InSiGHT and has been identified in several Danish patients (Okkels et al., 2019). The second was a splice site variant, c.2275+1G>C, which to our knowledge has not previously been reported. To avoid analysis of the PMS2 pseudogenes, the variant was confirmed using LR-PCR. Immunohistochemical analysis (IHC) also showed loss of the PMS2 protein in the tumor. Since the patient is deceased, it has not been possible to perform mRNA analyses, but the variant is predicted to completely disrupt normal splicing by five out of five in silico splicing programs in Alamut, and we consider it likely pathogenic.
In family 165 (Figure 3, pedigree B), we identified the MSH2 variant c.2168C>T, p.(Ser723Phe). The variant had already been detected in the family (Nilbert et al., 2009) but was considered a variant of unknown significance (a classification also supported by InSiGHT), and the family was included in this study to search for alternative explanations. The index person (no. #165;10), who carries the MSH2 c.2168C>T variant, has had two primary cancers: colon cancer (IHC: lack of MSH2 expression; microsatellite instability (MSI) status unknown) and an adenocarcinoma at the major duodenal papilla (IHC: lack of MLH1/PMS2, and methylation of the MLH1 promoter). His daughter had rectal cancer at age 25 [IHC: normal, but MSI-high (MSI-H)] and also carries the MSH2 variant. His mother developed colon cancer at age 44 (IHC and MSI status unavailable) but does not carry the variant. However, the family history is complex, as the mother is also predisposed to CRC through another branch of the family, and the father of the index person died at only 31 years of age of non-malignant disease. Ser-723 is a highly conserved amino acid, and the variant is predicted to be disease-causing by several in silico prediction tools.
Several groups have evaluated this variant using in vitro MMR activity assays, yeast assays, or murine or human embryonic stem cells; all studies indicate that the variant interrupts normal mismatch repair function and is pathogenic (Gammie et al., 2007; Drost et al., 2012; Houlleberghs et al., 2016; Rath et al., 2019). Based on the functional effect suggested by these four studies, in combination with our data showing co-segregation in two individuals with early onset CRC, we consider the MSH2 c.2168C>T variant likely pathogenic.
Notes to Table 3: Variants with a CADD score >25 and an allele frequency <0.001 in the population with the highest alternate allele frequency in gnomAD exomes are listed. BRCA1, BRCA2, CDH1, MMR and PTEN variants categorized as benign or likely benign by expert panels (such as ENIGMA or InSiGHT) have been discarded. *Impact on splicing was evaluated using Alamut. Only differences in canonical splice site strength >10% (based on MaxEntScan) were noted. Cryptic splice sites are only mentioned if at least 4/5 in silico programs indicated that a new splice site was created. CDS, cryptic donor site.
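The splice-reporting criteria used for Table 3 (a change in canonical splice site strength of more than 10% by MaxEntScan, and a cryptic site reported only when at least four of the five Alamut programs agree) reduce to two checks. In this sketch the scores and per-program calls are assumed to be read off Alamut by hand, the function names are hypothetical, and the relative change is assumed to be computed against the reference score.

```python
def canonical_change_reported(ref_score, alt_score, threshold_pct=10.0):
    """True if the MaxEntScan score of a canonical site changes by more than threshold_pct %."""
    if ref_score == 0:
        raise ValueError("a reference score of 0 gives no relative change")
    change_pct = 100 * abs(alt_score - ref_score) / abs(ref_score)
    return change_pct > threshold_pct


def cryptic_site_reported(program_calls, min_agreeing=4):
    """program_calls: one boolean per in silico program (five in Alamut)."""
    return sum(program_calls) >= min_agreeing


print(canonical_change_reported(8.5, 7.0))                     # True (about 17.6% weaker)
print(cryptic_site_reported([True, True, True, True, False]))  # True (4/5 agree)
```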

Variants of Unknown Significance
In family 92, a large Amsterdam I positive family with 13 cases of CRC (IHC analysis of two tumors from two different patients showed normal expression of the MMR proteins) in addition to other cancers, we identified the RPS20 variant c.98A>T, p.(Glu33Val). This variant is absent from all population allele frequency databases and has, to our knowledge, not previously been reported in the literature. Glu-33 is a highly conserved amino acid, and the variant is predicted to be pathogenic by several in silico programs. Four out of five in silico splicing programs in Alamut also predict that it affects splicing by creating a new cryptic donor site, which may lead to a frameshift through the loss of seven base pairs. Segregation analysis showed that five relatives with CRC, diagnosed between 24 and 73 years of age, also carried the variant (Figure 3, pedigree C).
In one person (no. #121;10), we identified the likely pathogenic TP53 variant c.814G>A, p.(Val272Met). The family history comprises three cases of CRC, two hematological cancers and one case of prostate cancer; all cancers were diagnosed after age 50. The variant had a VAF of 24% in two separate blood samples, suggesting either a hematopoietic clone or classic mosaicism (the patient's tumor had been removed surgically >10 years earlier, which almost certainly excludes circulating tumor DNA from this tumor as a source). To clarify this, we sequenced tumor tissue and healthy non-malignant tissue; the variant was found in 8% of the reads in the tumor and was absent from healthy tissue, indicating that it most likely represents a hematopoietic clone, detected in the tumor because of lymphocyte infiltration.
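The reasoning about the TP53 VAF can be made explicit. For a heterozygous variant in diploid cells, the fraction of cells carrying it is roughly twice the VAF, so a VAF of 24% corresponds to about 48% of the sampled blood cells, well below the ~100% expected for a constitutional heterozygous variant. The triage function below is a rough sketch: the 0.35 cut-off for "consistent with germline" is an assumption, and absence from a single non-hematopoietic tissue does not strictly exclude mosaicism.

```python
def carrier_cell_fraction(vaf):
    """Approximate fraction of diploid cells carrying a heterozygous variant (vaf in 0-1)."""
    return min(1.0, 2 * vaf)


def triage_low_vaf(blood_vaf, normal_tissue_vaf, germline_min_vaf=0.35):
    """Rough triage of a blood variant using a non-hematopoietic normal tissue (VAFs in 0-1)."""
    if blood_vaf >= germline_min_vaf:
        return "consistent with a constitutional heterozygous variant"
    if normal_tissue_vaf > 0:
        return "suggests constitutional mosaicism"
    return "suggests clonal hematopoiesis"


print(carrier_cell_fraction(0.24))  # ~0.48
print(triage_low_vaf(blood_vaf=0.24, normal_tissue_vaf=0.0))  # suggests clonal hematopoiesis
```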
We identified six VUS in the MMR genes and eleven VUS in APC (listed in Supplementary Table S1), of which two are particularly interesting based on allele frequency: MSH6 c.3232G>C, p.(Val1078Leu) and APC c.4318C>T, p.(Pro1440Ser). The MSH6 c.3232G>C variant was detected in a patient with CRC from the early onset cohort (no. #329;10), who also had a pathogenic monoallelic MUTYH variant. The tumor was microsatellite stable with normal MSH6 expression, and together with benign in silico predictions this indicates a low probability that the variant is pathogenic. The APC c.4318C>T variant has not previously been reported and affects a highly conserved amino acid. The patient (no. #1;52) had colon cancer at age 52 in addition to three small tubular adenomas. He had two brothers with childhood leukemia and adult-onset serrated polyposis syndrome, respectively, and is predisposed to CRC on the paternal side (however, the variant does not segregate with CRC in a 4th-degree paternal relative with CRC at age 49 and nine adenomas). According to the family, the maternal family history is positive, with two cases of brain tumors in addition to a verified case of follicular thyroid carcinoma. Unfortunately, no functional data on the variant are available.
In one patient (no. #106;10), we identified the BUB1 variant c.1321A>G, p.(Thr441Ala). The variant is predicted to introduce a cryptic acceptor splice site 45 base pairs into exon 12, which could lead to an in-frame loss of 15 amino acids in the protein. In addition to colon cancer, BUB1 variants have been associated with mosaic variegated aneuploidy syndrome (MVAS) and dysmorphic features, owing to BUB1's role as a component of the spindle assembly checkpoint. Data on dysmorphic features in our patient were not available, and the variant does not segregate with CRC in a 4th-degree relative with CRC at age 48.
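Whether a predicted loss of coding sequence shifts the reading frame depends only on its length modulo 3: the 45 bp predicted for BUB1 is divisible by 3 and removes 15 codons, whereas the 7 bp loss predicted for RPS20 c.98A>T is not and would shift the frame. A minimal helper (hypothetical name); it ignores the possibility that an in-frame loss spanning codon boundaries also alters a flanking residue or creates a stop codon.

```python
def exonic_loss_consequence(n_bp_lost):
    """Reading-frame consequence of losing n_bp_lost nucleotides from the coding sequence."""
    if n_bp_lost <= 0:
        raise ValueError("n_bp_lost must be positive")
    if n_bp_lost % 3 == 0:
        return f"in-frame loss of {n_bp_lost // 3} amino acids"
    return "frameshift"


print(exonic_loss_consequence(45))  # in-frame loss of 15 amino acids (BUB1 prediction)
print(exonic_loss_consequence(7))   # frameshift (RPS20 prediction)
```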
In FAN1, we identified the frameshift variant c.922_923del, p.(Val308Cysfs*5) in a person with metachronous CRC (no. #143;8), a second-degree relative in an Amsterdam I positive family. The variant is reported in gnomAD exomes with an allele frequency of 0.023% in non-Finnish Europeans and has been classified as likely pathogenic in ClinVar. However, it does not segregate with another case of rectal cancer at age 49 or with a cancer of unknown origin at age 59 in the family.
In addition, we identified several missense variants in CHEK2 and one in POLD1 that might have a moderate impact on CRC risk (Table 3).

Monoallelic Pathogenic Variants in Genes With Autosomal Recessive Inheritance
We detected the previously reported NTHL1 nonsense variant c.268C>T, p.Gln90* in one patient (no. #136;12). To our knowledge, the significance of monoallelic pathogenic NTHL1 variants is currently unknown (Weren et al., 2015). Another patient (no. #84;14) was heterozygous for the likely pathogenic MSH3 variant c.2319-1G>A (Adam et al., 2016). The patient also carried a second MSH3 intronic variant, c.2436-13G>T, which is not predicted to affect splicing by in silico analysis (Alamut). Because of the method used in this study (short-read sequencing), it was not possible to determine whether the variants are in cis or in trans. Since MSH3-related CRC is inherited in an autosomal recessive pattern, the variants cannot explain the apparently dominant inheritance pattern in the family.

DISCUSSION
In this study, we performed gene panel analysis of 32 CRC-associated genes in two cohorts of patients: (1) patients with early onset CRC (n = 50) and (2) patients from families with familial CRC (n = 99 patients from 85 families). Based on IHC analysis, the great majority of patients had MMR-proficient tumors, and all patients (n = 7) from the familial CRC cohort with abnormal IHC expression had a relative with a normal IHC analysis. This was consistent with the results of previous genetic testing of the MMR genes, in which ∼90% of the participants, or an affected relative, had undergone MMR analysis without identification of a pathogenic or likely pathogenic variant (Table 1).
In the familial CRC cohort, we identified three high-risk pathogenic variants in APC, MSH2 and POLE. The MSH2 variant c.2168C>T, p.(Ser723Phe) had been identified previously, but since new knowledge has emerged, it was reclassified from a VUS to likely pathogenic. The likely pathogenic APC variant c.289G>A, p.(Gly97Arg) was, for unknown reasons, not identified or not interpreted as pathogenic when APC analysis was performed 20 years ago; since the family fulfils the APC testing criteria, the variant would normally have been detected through routine genetic work-up. The APC screening was originally performed using Sanger sequencing only, and these two cases emphasize the importance of regularly reassessing genetically unresolved families with apparently inherited cancer, either by re-evaluating variants of unknown significance or by repeating NGS-based analyses. The POLE c.1089C>A, p.(Asn363Lys) variant was found in an Amsterdam I positive family in which two affected persons turned out to be phenocopies. They had milder phenotypes, with only one tumor each at an older age (67 and 50 years, respectively), compared with three synchronous and metachronous CRCs at ages 28 and 40 in the index patient. This case clearly illustrates the importance of choosing the most severely affected family member for genetic analysis.
Thus, although we identified pathogenic variants in 8 of 85 families (9.4%), only three variants were clinically actionable, and two variants were, or should have been, detected previously.
In the early onset cohort, we identified two pathogenic/likely pathogenic PMS2 variants in addition to several low- or moderate-risk variants, including a pathogenic MUTYH variant identified in two unrelated CRC patients. The clinical impact of pathogenic PMS2 variants is currently debated, but it is widely accepted that PMS2 variants confer a much lower cancer risk than variants in the other MMR genes. For now, however, the families are being managed as classic Lynch Syndrome families. Monoallelic MUTYH variants are associated with an approximately two-fold risk of developing CRC, and in Denmark, carriers of monoallelic pathogenic MUTYH variants who have a first-degree relative with CRC are offered colonoscopy surveillance every 5 years. Thus, the diagnostic yield of clinically actionable variants in the early onset cohort was 8% (4 of 50 patients).
The genetic background of CRC for a large proportion of the patients in our study is still unresolved, which can have several explanations.
Firstly, our gene panel consisted only of CRC-related genes, and we might have had a higher diagnostic yield if the panel had included more known cancer predisposition genes.
Secondly, we did not include copy number variation (CNV) analysis in our study. CNVs have been estimated to account for up to 10% of all pathogenic variants (with large differences between genes), suggesting that we might have missed at least one or a few CNVs in our cohort (LaDuca et al., 2019). Clinical follow-up of the patients revealed that one patient had in fact been diagnosed with juvenile polyposis syndrome due to a deletion of exons 9 to 10 in SMAD4. She developed colon cancer at age 36 and had had three colonic adenomas removed (one with high-grade dysplasia). After the diagnosis, she had another polyp removed, which was interpreted as either an inflammatory or a juvenile polyp. In total, about one in three patients had had CNV analysis performed in a clinical setting, but no further CNVs had been detected (unpublished data). The SMAD4 case also exemplifies the overlapping features of CRC-related syndromes: the patient was strongly suspected of having HNPCC/Lynch Syndrome but ended up with a diagnosis of juvenile polyposis.
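The expectation of missed CNVs can be made concrete. If CNVs make up a fraction f of all pathogenic variants and a sequence-only analysis detects d variants, roughly d·f/(1 − f) CNVs would be expected to be missed. With the 12 pathogenic/likely pathogenic variants reported in the Results and f = 0.10 (the upper estimate cited above, which varies strongly between genes), this gives about 1.3. The calculation assumes that the cohort's variant spectrum resembles that underlying the cited estimate; the function name is hypothetical.

```python
def expected_missed_cnvs(detected, cnv_fraction):
    """Expected CNVs missed when CNVs are cnv_fraction of all P/LP variants
    and only `detected` sequence variants were found."""
    if not 0 <= cnv_fraction < 1:
        raise ValueError("cnv_fraction must be in [0, 1)")
    return detected * cnv_fraction / (1 - cnv_fraction)


print(round(expected_missed_cnvs(12, 0.10), 2))  # 1.33
```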
Thirdly, some patients probably have pathogenic variants that our analysis was not designed to capture, such as deep intronic or regulatory variants in known cancer genes, or variants in genes not yet identified or associated with CRC. Many genes have been suggested as candidate CRC genes; POLE2, MRE11 and POT1 appear particularly interesting, as do epigenetic changes in PTPRJ (Venkatachalam et al., 2010; Chubb et al., 2016; Terradas et al., 2020). Another possibility is a non-Mendelian predisposition to CRC. Polygenic inheritance has been shown to explain up to ∼15% of the familial CRC risk, and at least two cases of CRC with possible digenic and oligogenic inheritance have been reported; these patients had variants in MUTYH and OGG1 (both involved in the base excision repair pathway) and in APC, OGG1, EXO1 and POLQ, respectively (Morak et al., 2011; Ciavarella et al., 2018). OGG1 was not part of our gene panel, and its role in hereditary CRC/polyposis is controversial (Smith et al., 2013; Mur et al., 2018). Intriguingly, we identified the same EXO1 variant in a patient with rectal cancer at age 31 (no. #397;10) from the early onset CRC cohort, who also carried a monoallelic pathogenic MUTYH variant. Although EXO1 (mainly involved in mismatch repair) and MUTYH (mainly involved in base excision repair) act in different pathways, both are involved in DNA repair. To determine whether she carries other low- or moderate-risk variants, and thus could represent a case of oligogenic inheritance, we plan to perform whole genome sequencing (WGS) as the next step.
As expected, we detected a high number of VUS, and some may, as more data become available, be reclassified as either class 1/2 or class 4/5. One VUS with a high potential for reclassification as (likely) pathogenic is the RPS20 c.98A>T, p.(Glu33Val) variant. Only two families have been reported previously, one large family with multiple cases of CRC and a truncating RPS20 variant (c.147dupA, p.Val50SerfsTer23) and one small family with a splice site variant shown to disrupt normal splicing (c.177+1G>A), in addition to two individuals with early onset CRC and no published family cancer history (Nieminen et al., 2014;Broderick et al., 2017;Thompson et al., 2020). The latter individuals had a missense variant, p.Val54Leu, and a frameshift variant, p.Leu61GlufsTer11, respectively. All analyzed tumors, including those in our family, have shown an MMR-proficient phenotype on IHC analysis. RPS20 encodes a ribosomal protein that is part of the 40S subunit. RPS20 has been suggested to be involved in cell proliferation and regulation and to act as a stabilizer of p53 (Nieminen et al., 2014), but more recently Krishnan et al. (2018) provided evidence of a critical interaction between RPS20 and GNL1, a nucleolar GTPase also involved in cell cycle regulation, that regulates and promotes the G1/S transition, thereby documenting a possible link to tumorigenesis. Several factors support pathogenicity of the c.98A>T variant: strong segregation with disease in the family presented in our study, in silico splicing predictions, and the fact that the variant has not previously been reported or cataloged in gnomAD. RPS20 is a very promising colorectal candidate gene, and the identification of three families with variants segregating with disease is strong evidence. However, since the role of RPS20 in CRC is not yet fully established, we classify the variant as a VUS.
Three of the four previously published RPS20 variants were located in exon 3 or close to the exon 3/4 boundary. These variants have only been associated with CRC. The variant identified in the family presented in our study is located in exon 2 (close to the exon 2/intron 2 junction), and this family's pedigree includes 12 cases of CRC as well as other cancer types such as early onset vulvar cancer, melanoma, breast cancer, and esophageal/gastric cancer (Figure 2, pedigree C). Our family also includes by far the youngest affected RPS20 carrier, a woman diagnosed with CRC at age 24. The other families also had early onset CRC cases, at ages 38, 39, and 41, respectively. Nieminen et al. (2014) did not provide detailed data on age at onset, but the mean age at onset of CRC was 52.3 years. Given the scarcity of families with RPS20 variants, conclusions on genotype-phenotype correlations are premature, but if the c.98A>T variant turns out to be pathogenic, it will not only confirm the role of RPS20 in hereditary CRC but also significantly expand the phenotypic spectrum of RPS20-related cancer. Given the very high probability that RPS20 is truly a new cancer gene, we recommend including RPS20 in cancer gene panels.
An example of a VUS that is complicated to interpret is the TP53 c.814G>A variant. The family does not meet TP53 testing criteria [i.e., the 2015 version of the Chompret criteria (Bougeard et al., 2015)], and the variant was identified only because we performed multigene panel analysis. TP53 variants with a reduced mutant to wild-type allele ratio (MWTAR), for example <25% or <30%, are a common issue in multigene panel analysis, as discussed by Weitzel et al. (2018); their study also showed that TP53 variants with reduced MWTAR were more likely to represent hematopoietic clones when identified in multigene panels and in older patients, which was also the case in our family.
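To illustrate how a reduced allele ratio might be flagged during panel analysis, the sketch below computes a mutant to wild-type allele ratio from read counts and marks variants below a chosen cut-off as possibly somatic (for example, clonal hematopoiesis) rather than germline. The read counts, the function names, and the default 30% cut-off are hypothetical; the ratio is computed literally as mutant reads divided by wild-type reads, and the exact definition and thresholds used by Weitzel et al. (2018) should be checked before any real use.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    name: str
    mutant_reads: int
    wild_type_reads: int


def mwtar(call: VariantCall) -> float:
    """Mutant to wild-type allele ratio, taken here as mutant reads / wild-type reads."""
    if call.wild_type_reads == 0:
        return float("inf")  # no wild-type reads observed at this position
    return call.mutant_reads / call.wild_type_reads


def flag_possible_somatic(call: VariantCall, cutoff: float = 0.30) -> bool:
    """True if the ratio is below the cut-off, i.e. the variant is under-represented
    relative to the roughly 1:1 ratio expected for a heterozygous germline variant."""
    return mwtar(call) < cutoff


# Hypothetical read counts, for illustration only.
calls = [
    VariantCall("germline-like", mutant_reads=98, wild_type_reads=102),
    VariantCall("low-ratio", mutant_reads=20, wild_type_reads=180),
]
for c in calls:
    print(c.name, f"{mwtar(c):.2f}", flag_possible_somatic(c))
```

In this sketch the flag only prompts further workup; it does not classify the variant.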
We detected several other pathogenic variants and interesting VUS in moderate-penetrance genes such as CHEK2 and GALNT12; however, until the consequences of combinations of low- and moderate-risk variants are better understood, we consider clinical genetic testing of these genes premature.
We identified only one FAN1 variant with an allele frequency of <0.1%, namely c.922_923del. This variant has been reported in two patients with a suspected genetic predisposition to cancer, but none of their first- or second-degree relatives were affected by CRC (Fievet et al., 2019). FAN1 was proposed as a CRC candidate gene in 2015 (Seguí et al., 2015), but the evidence for its role in hereditary CRC is still very limited (Broderick et al., 2017;Fievet et al., 2019). Our data further question the role of FAN1 in hereditary cancer, and like Fievet et al. (2019), we suggest excluding this gene from cancer gene panels.
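The <0.1% allele frequency cut-off mentioned above can be expressed as a simple filter. The sketch below keeps variants in a given gene whose population allele frequency is below 0.001; the gene names, variant records, and frequencies are hypothetical placeholders (not the FAN1 data from this study), and treating a variant absent from population data as rare is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    gene: str
    hgvs_c: str
    population_af: Optional[float]  # None = not observed in the reference population


def rare_variants(variants, gene: str, max_af: float = 0.001):
    """Return variants in `gene` with allele frequency < max_af (0.001 = 0.1%).

    Variants absent from the reference population (af is None) are kept as rare.
    """
    return [
        v for v in variants
        if v.gene == gene and (v.population_af is None or v.population_af < max_af)
    ]


# Hypothetical records for illustration only.
example = [
    Variant("GENE_A", "c.100A>G", 0.0200),  # common: filtered out
    Variant("GENE_A", "c.200del", 0.0004),  # rare: kept
    Variant("GENE_A", "c.300C>T", None),    # absent from population data: kept
    Variant("GENE_B", "c.50G>A", 0.0001),   # rare, but a different gene
]
print([v.hgvs_c for v in rare_variants(example, "GENE_A")])
```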
In general, our study, a retrospective cohort study consisting primarily of highly selected MMR-proficient individuals, showed the highest diagnostic yield in families with a high burden of adenomas or an exceptionally early age at onset. In the families with (possible) high-risk variants, the youngest variant carriers with CRC were diagnosed at a mean age of 28.4 years (ages: 28 [POLE, c.1089C>A], 25 [MSH2, c.2168C>T], 29 [PMS2, c.736_741delinsTGTGTGTGAAG], 36 [PMS2, c.2275+1G>C], and 24 [RPS20, c.98A>T]), compared with a mean age of 40.5 years for the youngest person diagnosed with CRC in the remaining participants' families (only first- to third-degree relatives were included). In total, only eight families had a member with CRC before age 30, and we found a (possible) genetic explanation in four of them (50%). Thus, when re-evaluating families/patients who have had previous MMR analysis and have no polyposis, our data suggest that a limited number of genes, such as APC (primarily to identify AFAP families) and MUTYH (due to the high population carrier frequency), is sufficient to capture the majority of families with a hereditary cancer predisposition syndrome. The exceptions appear to be families with very early onset CRC and families in which the personal or family history suggests a syndromic etiology, such as Peutz-Jeghers syndrome, juvenile polyposis syndrome, or Cowden syndrome. The age limit for very early onset CRC is arbitrary, but 40 years could be proposed. These families would benefit from a larger gene panel analysis.
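The summary figures in the paragraph above can be recomputed from the listed ages. The sketch below derives the mean age of the youngest CRC-affected carriers and the share of the eight families with CRC before age 30 that received a (possible) genetic explanation. The variable names are illustrative, and it assumes that the explained families among the eight are those whose youngest affected carrier was diagnosed before 30.

```python
from statistics import mean

# Age at CRC diagnosis of the youngest variant carrier per family (from the text).
youngest_carrier_age = {
    "POLE c.1089C>A": 28,
    "MSH2 c.2168C>T": 25,
    "PMS2 c.736_741delinsTGTGTGTGAAG": 29,
    "PMS2 c.2275+1G>C": 36,
    "RPS20 c.98A>T": 24,
}

mean_age = mean(youngest_carrier_age.values())
print(f"Mean age of youngest affected carriers: {mean_age:.1f} years")  # 28.4

# Assumption: the explained families among the eight with CRC before age 30
# are those whose youngest affected carrier was diagnosed before 30.
families_with_crc_before_30 = 8
explained_before_30 = sum(age < 30 for age in youngest_carrier_age.values())
print(f"{explained_before_30}/{families_with_crc_before_30} = "
      f"{explained_before_30 / families_with_crc_before_30:.0%}")  # 4/8 = 50%
```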
Since most patients referred for genetic evaluation have not been genetically tested previously, the approach in these families needs to be different. In Denmark, whose public health care system gives all citizens access to relevant health care regardless of income or insurance, both cohorts are eligible for genetic counseling and testing, and the majority are offered analysis of a clinical CRC gene panel consisting of 17 CRC- and polyposis-related genes (as described in the section "Introduction"). The Collaborative Group of the Americas on Inherited Gastrointestinal Cancer recently published a recommendation on gene panel testing in hereditary CRC or polyposis, recommending that multigene testing include, as a minimum, the MMR genes, EPCAM, APC, MUTYH, BMPR1A, SMAD4, PTEN, and STK11 (Heald et al., 2020). Although some of these genes are primarily relevant in polyposis (BMPR1A, SMAD4, PTEN, STK11), implementing this gene panel would probably be the lowest-cost approach. Another approach is testing one large cancer gene panel irrespective of phenotype. Studies analyzing very large cancer gene panels not based on phenotype have widened the phenotypic spectra of a number of cancer genes (Espenschied et al., 2017;Rohlin et al., 2017;LaDuca et al., 2019), and this approach would also capture rare causes of CRC. Although large gene panels generate more VUS that can be challenging to interpret and handle clinically, they are more efficient in terms of price and time. Third and fourth approaches are targeted WES and WGS, respectively. WGS has the advantage of generating the greatest amount of data, including (i) promoter, regulatory, and deep intronic variants, which are very relevant to look for in established cancer genes; (ii) reliable CNV data; (iii) data on structural rearrangements; (iv) data on all SNPs, making it possible to calculate polygenic risk scores; and (v) readily available data on new genes/variants as new knowledge emerges.
On the other hand, WGS is the most expensive and time-consuming analysis (when it comes to variant interpretation). The costs of WGS are currently approaching those of WES, but both are high-cost approaches, and the gray zone between clinical evaluation and research, as well as the ethical dilemmas of performing large genomic analyses, should be discussed. A paradigm shift from phenotype-based cancer gene panels to larger genomic analyses nevertheless seems inevitable.
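The discussion of WGS above mentions that genome-wide SNP data make it possible to calculate polygenic risk scores. In their common additive form, such a score is the sum over risk SNPs of the effect-allele dosage (0, 1, or 2) multiplied by a per-allele weight, usually a log odds ratio from a genome-wide association study. The sketch below shows only that arithmetic; the SNP identifiers and weights are hypothetical placeholders, not a validated CRC score, and real scores also require quality control, handling of missing genotypes, and calibration against a reference population.

```python
import math
from typing import Dict

# Hypothetical per-allele weights (log odds ratios); NOT a real CRC score.
WEIGHTS: Dict[str, float] = {
    "snp_1": math.log(1.10),
    "snp_2": math.log(1.20),
    "snp_3": math.log(0.90),  # protective allele: negative weight
}


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Additive score: sum of weight * effect-allele dosage over scored SNPs.

    SNPs missing from `dosages` are skipped (a simplification; real
    pipelines typically impute or mean-fill missing genotypes).
    """
    score = 0.0
    for snp, beta in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue
        if dosage not in (0, 1, 2):
            raise ValueError(f"{snp}: dosage must be 0, 1 or 2")
        score += beta * dosage
    return score


individual = {"snp_1": 2, "snp_2": 1, "snp_3": 0}
prs = polygenic_score(individual, WEIGHTS)
# Under this multiplicative model, exp(score) is the odds relative to an
# individual carrying zero effect alleles at every scored SNP.
print(f"PRS = {prs:.3f}, relative odds = {math.exp(prs):.3f}")
```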
The combination of young age at onset and a lack of CRC in the family history would suggest recessive inheritance or high-risk de novo variants, and the complete absence of high-risk variants other than the PMS2 variants in our early onset CRC cohort was unexpected. In the familial cancer cohorts, a genetic cancer syndrome is indeed highly suspected in a number of the families, and although novel cancer genes probably account for only a very small percentage of cases (Chubb et al., 2016), further studies are warranted to elucidate the genetic background of hereditary cancer and to identify novel cancer genes. To search for other explanations in our cohorts, selected individuals are now undergoing WGS, which will hopefully help to clarify the disease-causing mechanisms in a larger proportion of the individuals.

DATA AVAILABILITY STATEMENT
The datasets for this article are not publicly available because the data are the property of a third party. All class 3, 4, and 5 variants are included in the article, and relevant variants will also be submitted to ClinVar. Requests to access the datasets should be directed to the corresponding author, and certain parts can be shared upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Danish Committee on Health Research Ethics, reference: H-4-2014-050. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
MD, KW, TO, A-MG, CT, and MN initiated and designed the project. MD included the great majority of the patients and wrote the first draft of the manuscript together with KW, A-MG, and TO. MM, LB, HO, GW, JH, and FW contributed to the methods section. MM and LB were responsible for the laboratory analyses and interpreted data together with MD, KW, TO, JF, and A-MG. HO and FW also contributed to the laboratory analyses and interpretation of data. The RPS20 family was clinically managed by A-BS. GW and JH performed the pathological analyses and reviewed tissue samples. All authors critically read the manuscript and approved the final version.