Whole genome sequencing of Malaysian colorectal cancer patients reveals specific druggable somatic mutations

The incidence of colorectal cancer (CRC) is continuously increasing in some areas of the world, including Malaysia. In this study, we aimed to characterize the landscape of somatic mutations using whole-genome sequencing and to identify druggable somatic mutations specific to Malaysian patients. Whole-genome sequencing was performed on genomic DNA obtained from the tissues of 50 Malaysian CRC patients. The top significantly mutated genes were APC, TP53, KRAS, TCF7L2 and ACVR2A. Four novel, non-synonymous variants were identified in three genes: KDM4E, MUC16 and POTED. At least one druggable somatic alteration was identified in 88% of our patients. Among them were two frameshift mutations in RNF43 (G156fs and P192fs) predicted to confer responsiveness to the Wnt pathway inhibitor. We found that exogenous expression of mutant RNF43 in CRC cells resulted in increased cell proliferation, increased sensitivity to LGK974 treatment and G1 cell cycle arrest. In conclusion, this study uncovered the genomic landscape and druggable alterations of our local CRC patients. It also highlighted the role of specific RNF43 frameshift mutations, which unveils the potential of an alternative treatment targeting the Wnt/β-catenin signalling pathway that could be beneficial, especially to Malaysian CRC patients.


Introduction
Colorectal cancer (CRC) is among the top three most common cancers worldwide, with 1.93 million new cases and 900,000 deaths reported in 2020 (Sung et al., 2021; Xi and Xu, 2021). Asian countries have also experienced a significant rise in CRC incidence over the past 20 years, especially with changes in lifestyle and diet (Arnold et al., 2017). In Malaysia, CRC is the most common cancer among men and the second most common among women (Hashimah et al., 2019). According to the National Cancer Patient Registry-Colorectal Cancer, 4,501 cases of CRC were reported from 2008 to 2013, most of which were in Chinese patients, followed by Malays and Indians (Abu Hassan et al., 2016).
Substantial efforts have been made to understand the basic molecular mechanisms of CRC through profiling of somatic mutations, including the International Cancer Genome Consortium (Hudson et al., 2010), The Cancer Genome Atlas (TCGA) (The Cancer Genome Atlas Network, 2012) and the Pan-Cancer Analysis of Whole Genomes (PCAWG) (Campbell et al., 2020). However, how to use this publicly available information to treat CRC patients effectively is still poorly understood, and its potential clinical significance remains largely unexplored. Prognostication and treatment decision-making have been improved by numerous biomarkers discovered through comprehensive molecular profiling (Sveen et al., 2020). Several biomarkers with prognostic value, such as KRAS and EGFR, have been widely studied (Grady and Pritchard, 2014). Despite many studies on KRAS as a biomarker, the Ras protein has not yielded any therapeutic intervention due to the absence of a suitable site to which drugs could bind (McCormick, 2015; Liu et al., 2019). Alternatively, studies have focused on blocking the pathways downstream of RAS, especially the RAF-MAPK pathway and the PI3 kinase pathways, to provide clinical benefit for patients with Ras-associated cancer (McCormick, 2015). Thus, by exploring the landscape of alterations in cancer patients, new possible therapeutic targets and clinically relevant somatic mutations may be identified.
One of the most important signalling pathways implicated in CRC pathogenesis is the Wnt/β-catenin signalling pathway (Cheng et al., 2019), which is involved in various physiological and developmental processes such as proliferation, differentiation, apoptosis, migration, invasion and tissue homeostasis (Clevers and Nusse, 2012; Ng et al., 2019). Dysregulation of the pathway may contribute to the development and progression of specific solid tumours and haematological malignancies (Cheng et al., 2019; Zhang and Wang, 2020). There has been increasing evidence supporting the potential relevance of the Wnt/β-catenin signalling pathway as a therapeutic target in cancer treatment (Blagodatski et al., 2014; Zhang and Wang, 2020). One of the components of this pathway is RNF43 (E3 ubiquitin-protein ligase RNF43), a type of ubiquitin ligase located in the transmembrane region (Zebisch and Jones, 2015). In cancer cells, Wnt signalling is activated through loss of function of RNF43 via mutations, leading to a decrease in the degradation of Frizzled (Serra and Chetty, 2018). Studies have shown that RNF43 mutations can have dual roles, acting as either a negative or a positive regulator of the Wnt/β-catenin signalling pathway, depending on the type and location of the mutations in the gene (Cho et al., 2022; Fang et al., 2022). Somatic mutations in RNF43 have been associated with increased sensitivity to compounds that target the Wnt pathway, such as the porcupine (PORCN) inhibitor LGK974.
LGK974 impairs the PORCN protein that will subsequently suppress the posttranslational acylation of Wnt-ligands and inhibit their secretion. Consequently, it prevents the activation of Wnt ligands, dysregulates the Wnt-mediated signalling, and inhibits cell growth in Wnt-driven tumours (Liu et al., 2013). Therefore, as the PORCN inhibitors and other upstream inhibitors advance into clinical trials, it is essential to identify the suitable patients to be treated with these Wnt inhibitors. Hence, a comprehensive map of druggable mutations is required.
Whole-genome sequencing (WGS) can provide insight into the mutational spectra of cancers across the entire genome. In the past decades, several new promising therapeutic targets have been discovered through this approach. Extensive reviews and studies on how germline and somatically derived variants can guide therapeutic decisions have been carried out, which highlighted the importance of genome profiling of cancer (Jia et al., 2014;Chan et al., 2019;Yang H. et al., 2019). Moreover, personal genome sequencing may become essential for diagnosing, preventing, and treating human diseases, particularly cancer (Cragun et al., 2016). Patient care can also be improved by transforming genomic research into personalized medicine applications by developing new and better genomics-based diagnostic tests. In this study, we employed WGS to characterize the landscape of somatic alterations in 50 Malaysian CRC patients, identify somatic alterations suitable for anticancer drug treatment, and predict the drug response. In addition, we functionally characterized two novel RNF43 variants and demonstrated that these are potentially clinically relevant variants worth exploring in future studies.

Clinical materials
A total of 50 Malaysian CRC patients were enrolled from 2010 to 2018. All the individuals gave their written informed consent, and the study was approved under UKM PPI/111/8/JEP-2017-583. All the patients were categorized according to clinicopathological characteristics such as age at diagnosis, ethnicity, gender, TNM classification, metastasis status, differentiation, tumour localization and survival status. Fifty colorectal carcinoma samples, each paired with corresponding blood DNA or adjacent normal tissue, were collected. The collected tissues were subjected to H&E staining, and only tissues with 80% of tumour cells, as confirmed by the pathologist, were selected for use in the present study. DNA extraction was performed using the AllPrep DNA/RNA/miRNA Universal Kit (Qiagen, Germany) according to the manufacturer's protocol. The quantity of the extracted DNA was assessed using a Qubit Fluorometer (Thermo Fisher Scientific, United States). The quality of the extracted DNA was evaluated by agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, United States). To confirm the identity of each tumour and blood or normal tissue sample pair and to rule out contamination, we profiled the DNA based on 15 polymorphic STR markers using Investigator® IDplex Plus (Qiagen, Germany). The microsatellite status of each patient was determined using the MSI Analysis System, Version 1.2 (Promega Corporation, United States) according to the manufacturer's protocol. The amplified fragments were detected on the 3130xl Genetic Analyzer (Applied Biosystems, United States).

Library construction and whole-genome sequencing
Libraries for WGS were constructed using the TruSeq® Nano DNA HT Library Prep Kit (Illumina, United States) according to the manufacturer's protocol. One microgram (1 μg) of genomic DNA was randomly fragmented using the S220 Covaris instrument (Covaris, United Kingdom) following the manufacturer's protocol. The fragmented DNA was viewed using gel electrophoresis, purified using the AxyPrep Mag PCR clean-up kit (Thermo Fisher Scientific, United States) and then subjected to end-repair, phosphorylation and A-tailing reactions. WGS was performed with 150 bp paired-end reads, at an average coverage of at least 30×, on the Illumina HiSeq X Ten (Illumina, United States).

Single nucleotide variant and indel prioritization
Variants with a quality score above Q30 were considered for further analysis. Common variants, defined as those with a minor allele frequency (MAF) of more than 5% in the 1000 Genomes Project, ExAC and ESP6500 databases, were removed. In addition, variants that did not result in amino acid changes and/or were located in unannotated genes (unknown) or non-exonic regions (based on Ensembl) were also removed. Somatic variants were identified by excluding those detected in both tumour and normal samples, and a variant was considered truly novel if it had not been reported in either the dbSNP or the COSMIC database. For each novel somatic mutation candidate, we ensured that the corresponding normal sample had at least ten reads covering the position, with zero variant reads. For the resulting somatic mutation candidates, the alignment of each sample was manually examined for possible mapping ambiguities and sequencing artefacts using the Integrative Genomics Viewer (IGV). Finally, we assessed the potential functional effects of each identified somatic variant using the protein impact prediction tools SIFT and PolyPhen2.
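
The prioritization criteria described above can be read as a sequence of simple filters. The following is a minimal Python sketch under stated assumptions, not the pipeline used in this study: the `Variant` record, its field names and the three helper functions are hypothetical and introduced only for illustration, while the thresholds (Q30, 5% allele frequency, at least ten normal reads with zero variant reads) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one annotated variant call."""
    gene: str               # empty string if the gene is unannotated
    qual: float             # Phred-scaled quality score
    max_pop_af: float       # highest allele frequency across population databases
    is_exonic: bool
    changes_protein: bool   # results in an amino acid change
    tumour_alt_reads: int
    normal_depth: int       # reads covering the position in the matched normal
    normal_alt_reads: int   # variant-supporting reads in the matched normal
    in_dbsnp: bool
    in_cosmic: bool


def passes_prioritization(v: Variant, min_qual: float = 30.0,
                          max_af: float = 0.05) -> bool:
    """Keep rare, exonic, protein-altering variants in annotated genes."""
    return (v.qual > min_qual
            and v.max_pop_af <= max_af
            and bool(v.gene)
            and v.is_exonic
            and v.changes_protein)


def is_somatic(v: Variant) -> bool:
    """Seen in the tumour but not in the matched normal sample."""
    return v.tumour_alt_reads > 0 and v.normal_alt_reads == 0


def is_novel_candidate(v: Variant, min_normal_depth: int = 10) -> bool:
    """Somatic, absent from dbSNP and COSMIC, with adequate normal coverage."""
    return (is_somatic(v)
            and not v.in_dbsnp
            and not v.in_cosmic
            and v.normal_depth >= min_normal_depth)
```

Candidates that pass all three checks would still require the manual IGV review and the SIFT and PolyPhen2 assessment described above, which this sketch does not attempt to model.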

Druggable and tumour driver alterations
We employed Cancer Genome Interpreter (Tamborero et al., 2018) to assess the relevance of the shortlisted somatic alterations as biomarkers of drug response and identify possible tumour driver alterations. Colorectal adenocarcinoma (COREAD) was selected as a cancer type for annotation.

Variant validation by Sanger sequencing
All the shortlisted SNVs were validated by Sanger sequencing on both tumour and matched blood samples. Primers corresponding to the selected locations were designed using PrimerQuest (Integrated DNA Technologies, United States). PCR products were generated and purified using the QIAquick PCR Purification Kit (Qiagen, Germany), and cycle sequencing was performed using the BigDye™ Terminator v3.1 reagent (Applied Biosystems, United States). The cycle sequencing products were then processed using ethanol precipitation, and sequencing was carried out on the 3130xl Genetic Analyzer (Applied Biosystems, United States). The results were analyzed using the Sequence Scanner software (Applied Biosystems, United States).

Lentiviral vectors construction
The RNF43 coding sequences harbouring the mutations of interest, p.G156Afs and p.P192Gfs, were purchased from Origene Technologies (United States). The wild-type RNF43 coding sequence was purchased from Genscript (United States). These wild-type and mutant RNF43 coding sequences were amplified to incorporate the 3x FLAG tag upstream of the start codon, and the EcoRI and XbaI restriction sites at the 5′ and 3′ ends, respectively, for sub-cloning into the expression vector pLVX-Puro (Clontech Laboratories Inc., United States). The RNF43 coding sequences were ligated into the expression vector using T4 Ligase (New England Biolabs) according to the manufacturer's recommendation and transformed into chemically competent DH5α E. coli.

Cell lines, lentiviral transduction and transient transfection
HEK293T cells were used for lentivirus production and cultured in DMEM (Nacalai Tesque, Japan) supplemented with 10% (v/v) fetal bovine serum (FBS) (Nacalai Tesque, Japan) and 1% (v/v) penicillin-streptomycin mixed solution (Nacalai Tesque, Japan). HEK293T cells were seeded at 1.2 × 10^6 cells/well in a six-well plate a day before transfection. The RNF43-expressing plasmid was co-transfected into the HEK293T cells along with the packaging plasmids psPAX2 (Addgene, United States) and pMD2.G (Addgene, United States) using Lipofectamine 2000 according to the manufacturer's recommendation. The media containing the lentivirus were collected 72 h post-transfection and filtered through a 0.45 μm PVDF sterile filter (Merck Millipore, Germany). For the lentiviral transduction, the SW48 cells were seeded into a six-well plate at a density of 2.5 × 10^6 cells/well and transduced with the collected supernatant on the following day in the presence of 8 μg/ml polybrene (Merck Millipore, Germany).

Gene expression and protein analysis
Total RNA was extracted from the cell line using the AllPrep DNA/RNA Mini Kit (Qiagen, Germany) according to the manufacturer's protocol, and cDNA was synthesized using the iScript cDNA synthesis kit (Bio-Rad Laboratories Inc., United States). Quantitative PCR (qPCR) was performed using SsoAdvanced™ Universal SYBR® Green Supermix (Bio-Rad Laboratories Inc., United States) and run on the CFX96 Real-Time PCR Detection System (Bio-Rad Laboratories Inc., United States). GAPDH and β-actin were used as normalization controls, and fold change was calculated as 2^(−ΔΔCt) (Schmittgen and Livak, 2008).
Total protein from cell cultures was extracted using RIPA buffer and resolved on 10% acrylamide gel. The target protein was analyzed against the following primary antibody: Anti Flag M2 (Sigma Aldrich, United States, 1:1,000) and the secondary antibody was Rabbit anti-mouse IgG/HRP conjugated (Dako, Denmark, 1:1,000).
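
For the qPCR analysis, the 2^(−ΔΔCt) calculation can be illustrated with a short worked example. The Python sketch below is hedged: the function name and the example Ct values are hypothetical, and averaging the Ct values of the two reference genes (GAPDH and β-actin) into one reference Ct is an assumption about how the two controls were combined, which the text does not specify.

```python
from statistics import mean


def fold_change(target_ct_sample, ref_cts_sample,
                target_ct_control, ref_cts_control):
    """Relative expression by the 2^(-ddCt) method.

    ref_cts_* are the Ct values of the reference genes in the same well
    group; they are averaged here (an assumption, see lead-in).
    """
    # dCt: target Ct normalized to the mean reference Ct
    d_ct_sample = target_ct_sample - mean(ref_cts_sample)
    d_ct_control = target_ct_control - mean(ref_cts_control)
    # ddCt: sample dCt relative to the control (calibrator) dCt
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, with unchanged reference genes.
fc = fold_change(22.0, [18.0, 18.0], 24.0, [18.0, 18.0])
```

With these made-up values, ΔCt is 4 in the sample and 6 in the control, so ΔΔCt is −2 and the fold change is 2^2 = 4.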

Ki67 proliferation assay
The Muse® Ki67 Proliferation Kit (Merck Millipore, Germany) was used to determine the percentage of proliferating cells based on Ki67 expression according to the manufacturer's protocol. The stained cells were analyzed on the Muse® Cell Analyzer (Merck Millipore, Germany), followed by data analysis using Muse 1.7 Analysis software (Merck Millipore, Germany).

LGK974 drug sensitivity assay
The SW48 cells expressing wild-type (Flagged-RNF43 WT) and mutant RNF43 proteins (Flagged-RNF43 p.G156fs and Flagged-RNF43 p.P192fs) were treated with different concentrations of LGK974 ranging from 10 to 100 µM for 48 h. Control groups, consisting of cells treated with 0.1% and 1% (v/v) DMSO, were included in each experiment. Cell viability was analyzed using the XTT Cell Viability Assay Kit (Biotium, Germany) according to the manufacturer's user guide to assess the sensitivity of the cells to the drug treatment.

Cell cycle assay
Cell cycle analysis was performed on SW48 cells expressing wild-type (Flagged-RNF43 WT) and mutant RNF43 proteins (Flagged-RNF43 p.G156fs and Flagged-RNF43 p.P192fs) treated with 50 µM LGK974 for 48 h. The harvested cells were stained with the propidium iodide provided in the BD Cycletest™ Plus DNA Reagent Kit (BD Biosciences, United States) according to the manufacturer's instructions. The DNA content of at least 10,000 cells was analyzed on a FACS Aria II flow cytometer (BD Biosciences, United States) for each experiment, and the data were then analyzed using ModFit LT 5.0 (Verity Software House, United States).

Patient characteristics
The characteristics of all 50 patients are listed in Table 1. Most patients (70%, n = 35) were in stage 3. The average age of the patients was approximately 64 years (range 30-89 years). The samples comprised equal numbers of well-differentiated and moderately differentiated adenocarcinomas. Of these 50 patients, 36% (n = 18) had died and 54% (n = 27) were still alive when the data were collected in 2018.

Whole-genome sequencing analysis and coverage
A total of 100 tumour and normal samples (50 pairs) were sequenced using the Illumina HiSeq X Ten platform. At least 730 million reads were generated for each sample, producing approximately 30× to 50× sequencing depth. Reference mapping of the data against the human genome hg19 and alignment refinement were carried out based on GATK Best Practices. On average, ~99% of the reads aligned to hg19, with at least 83% achieving more than 20× coverage (Table 2).
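
As a rough consistency check on the figures above, mean sequencing depth can be approximated as the number of reads multiplied by the read length and divided by the genome size. The short Python sketch below applies this to the reported minimum of 730 million reads of 150 bp. It assumes an approximate hg19 size of 3.1 Gb and that the read count refers to individual reads rather than read pairs; neither assumption is stated in the text, and the function name is hypothetical.

```python
def mean_depth(n_reads: float, read_length_bp: int, genome_size_bp: float) -> float:
    """Approximate mean depth = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


HG19_SIZE_BP = 3.1e9  # assumed approximate size of the hg19 assembly

# 730 million reads of 150 bp, the minimum reported per sample
depth = mean_depth(730e6, 150, HG19_SIZE_BP)
print(f"{depth:.1f}x")
```

Under these assumptions the estimate is about 35×, which lies within the reported 30× to 50× range. The estimate ignores duplicate, unmapped and low-quality reads, so realized coverage would be somewhat lower.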

Mutation rate and microsatellite status
In this study, we defined a high mutation rate, or hypermutation, as >12 mutations/Mb, in concordance with the TCGA study (The Cancer Genome Atlas Network, 2012). The somatic mutation rates varied among the samples. The average mutation rate was 13.11 per Mb, with a range of 0.83-243.97 mutations per Mb. Of the 50 patients tested for microsatellite instability, 10% (n = 5) were classified as microsatellite instability-high (MSI-H), 10% (n = 5) as microsatellite instability-low (MSI-L) and the remaining 80% (n = 40) as microsatellite stable (MSS). The median mutation rates in the MSI-H, MSI-L and MSS groups were 57.7/Mb, 4.1/Mb and 3.8/Mb, respectively. The median mutation rate was significantly higher in MSI-H patients than in MSI-L patients (p < 0.05). Four of the five MSI-H patients (80%), all except C289T, were classified as hypermutated, and the association of MSI-H status with a high mutation rate was statistically significant (p < 0.001). In addition, one of the MSS tumours, from patient C569T, was hypermutated with a mutation rate of 243.97/Mb (Figure 1).
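
The hypermutation definition used here reduces to a threshold on the per-megabase mutation rate, which can then be tabulated by microsatellite status. The Python sketch below is illustrative only: the function names and the per-sample input format are hypothetical, and only the threshold of 12 mutations/Mb comes from the text.

```python
from collections import defaultdict

HYPERMUTATION_THRESHOLD = 12.0  # mutations per Mb, as defined in the text


def is_hypermutated(mutations_per_mb: float,
                    threshold: float = HYPERMUTATION_THRESHOLD) -> bool:
    """A sample is hypermutated if its rate is strictly above the threshold."""
    return mutations_per_mb > threshold


def hypermutated_fraction_by_group(samples):
    """samples: iterable of (msi_status, mutations_per_mb) pairs.

    Returns a dict mapping each MSI status to the fraction of its
    samples classified as hypermutated.
    """
    totals = defaultdict(int)
    hyper = defaultdict(int)
    for status, rate in samples:
        totals[status] += 1
        if is_hypermutated(rate):
            hyper[status] += 1
    return {status: hyper[status] / totals[status] for status in totals}
```

Applied to the extremes reported above, a rate of 243.97/Mb is classified as hypermutated and a rate of 0.83/Mb is not.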

Somatic mutation landscape in Malaysian colorectal cancer
The total number of somatic variants detected in each CRC patient ranged between 2,587 and 756,750. Single nucleotide variants (SNVs) were the most common variant type detected (Figure 2A), with missense mutations being the most common variant class (Figure 2B). We analyzed the mutational signatures underlying CRC development in our local patients and discovered three signatures similar to COSMIC signatures 6, 10 and 1, with cosine similarities of 0.949, 0.904 and 0.83, respectively (Figure 3). COSMIC signature 6 is related to defective DNA mismatch repair. COSMIC signature 10 is associated with defects in polymerase POLE, while signature 1 pertains to spontaneous deamination of 5-methylcytosine. The most frequently mutated genes were APC, TP53, KRAS, MUC4, TCF7L2, CCDC168, FAT3, KMT2C, LRP1B, PCLO, SCN1A and SPEG. Mutations in three well-established CRC genes, APC, TP53 and KRAS, were present in 70%, 66% and 34% of the patients, respectively (Figure 4). Using MutSigCV, we identified significantly mutated genes; the numbers of genes with p-values less than 0.001, 0.01 and 0.05 were 15, 25 and 113, respectively. Among these, the top significantly mutated genes (p < 0.001 and q < 1.0) were APC, TP53, KRAS, TCF7L2 and ACVR2A. Of these, APC, KRAS and TP53 were mutated in more than 30% of the patients. The remaining two, TCF7L2 (20%, p < 0.0001, q = 0.02) and ACVR2A (8%, p < 0.0001, q = 0.63), were mutated in 20% or fewer of the patients. The significantly mutated genes (SMGs) identified (p < 0.01 and q < 1.0) are summarized in Table 3.
Upon variant prioritization, 64 variants in 54 genes were identified as recurrent, each occurring in two to six patients. As expected, most of the recurrent variants were present in well-established CRC genes such as KRAS, APC and TP53, for which nearly all (92%) are known variants reported in the dbSNP or COSMIC database. KRAS G12D was the most frequent variant, observed in 12% of the patients (6/50), followed by ACVR2A K435fs (8%, 4/50 patients) and TP53 R175H (8%, 4/50 patients). Eleven clinically significant variants, classified as pathogenic, were identified in five genes, which were KRAS (rs121913529, rs112445441, rs121913529), APC (rs587781392, rs587782518, rs121913332), TP53 (rs28934576, rs121912651), PIK3CA (rs104886003) and BRAF (rs113488022). All of the mentioned variants are listed in Table 4. We identified 20 candidate driver genes using the oncodrive function in maftools v2.0.16. However, only two genes were significantly mutated (FDR < 0.1), which were KRAS (G12D) and ACVR2A (K435fs). Remarkably, even with a mutation frequency of less than 20%, the ACVR2A gene was identified as one of the driver genes and was significantly mutated, suggesting its possible role in CRC tumorigenesis in our local patients. The list of identified cancer driver genes is shown in Table 5.
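
The cosine similarity used to match the extracted mutational signatures to COSMIC signatures compares two non-negative vectors of mutation-type proportions. A value of 1 indicates identical profiles and a value of 0 indicates no shared mutation types. The Python sketch below is a generic implementation for illustration, not the code used in this study; in practice each vector would typically hold the 96 trinucleotide-context proportions of a signature, which is an assumption here since the text does not describe the vector layout.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the measure depends only on the direction of the vectors, it is unaffected by whether the profiles are expressed as raw counts or as proportions.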

Distribution of KDM4E, MUC16 and POTED hotspot and novel mutations
Four novel, non-synonymous variants were identified in three genes: KDM4E R100H, MUC16 L12755F and L12755S, and POTED E172Q. At the time of analysis, these variants had not been reported in either COSMIC or dbSNP. The mutation hotspots of the genes were analyzed using the cBioPortal web-based tool (https://www.cbioportal.org/) (Cerami et al., 2012; Gao et al., 2013). The lollipop plots in Figures 5A-C show the distribution and classes of hotspot mutations in these three genes across eight different CRC

Druggable somatic alterations
Based on the clinical annotation using Cancer Genome Interpreter, 88% (44/50) of the patients harboured at least one (range from 1 to 16) predicted candidate of druggable alterations. These alterations were either the targets of existing therapies (FDA guidelines or NCCN guidelines) or are currently being investigated in clinical trials (case reports, early trials, late trials and pre-clinical). Among them were various APC variants, detected in 72% (36/50) of the patients and predicted to respond against tankyrase inhibitors at the pre-clinical level. KRAS G12D was detected in 12% (6/50) of the patients and these patients were predicted to be resistant to several EGFR monoclonal antibody inhibitors (Panitumumab and Cetuximab) and ERBB2 monoclonal antibody inhibitor (Trastuzumab and Lapatinib). Six other KRAS variants were also observed in 10 (20%) different patients, which were predicted to be responsive to the combination of monoclonal antibody inhibitors such as MEK and PIK3 pathway inhibitors and MEK and MEK BCL-XL inhibitors.
Moreover, 14% (7/50) of CRC patients, whose tumours possess PIK3CA variants, were predicted to respond to the PI3K pathway inhibitor. However, these patients may not benefit from cetuximab therapy due to these variants. Four patients with different variants in the POLE gene might be suitable candidates for immunotherapy using an immune checkpoint inhibitor, namely a PD-1 antibody inhibitor. In addition, two RNF43 mutations were discovered in one of the patients with a hypermutated phenotype, C474T. This patient is likely to be responsive to the Wnt pathway inhibitor, also known as the porcupine inhibitor.

RNF43 G156Afs mutation promotes colorectal cancer cell proliferation
One commonly used marker for active cell proliferation is the Ki67 protein. To test the effect of the G156Afs and P192Gfs mutations on the proliferative capacity of CRC cells, we performed Ki67 FACS on SW48 cells transduced with wild-type or mutant RNF43. This was to compare the percentages of non-proliferating (Ki67−) and proliferating (Ki67+) cells. We found that the expression of truncated RNF43 G156Afs promoted SW48 cell proliferation (78% Ki67+ cells) as compared to SW48 cells transduced with empty vector (51.67% Ki67+ cells) and wild-type RNF43 (57.4% Ki67+ cells). However, we did not observe any significant change in SW48 proliferative capacity between the cells expressing wild-type RNF43 and those expressing the RNF43 p.P192fs mutation (Figure 6).

RNF43 G156Afs and P192Gfs mutations increase sensitivity to LGK974 treatment
Several studies have shown that cells carrying inactivating RNF43 mutations are more sensitive to the porcupine inhibitor LGK974 (Jiang et al., 2013; Tu et al., 2019; Zhong et al., 2019). Based on these reports, we were prompted to assess whether the expression of the RNF43 G156Afs and P192Gfs mutations would sensitize SW48 cells to LGK974 treatment. To this end, we performed a cell viability assay after treating SW48 cells that expressed empty vector, wild-type RNF43 or one of the two RNF43 mutations with increasing concentrations of LGK974 (10-100 µM). We found that the cells that expressed these mutations were more sensitive to the higher concentrations of LGK974 (50 µM and 100 µM) than the cells that expressed empty vector and wild-type RNF43 (Figure 7). We did not, however, observe any significant difference or additive effect in terms of drug sensitivity between the 50 µM and 100 µM LGK974 treatments. Therefore, we used 50 µM LGK974 in the subsequent cell cycle arrest assay.

RNF43 G156Afs and P192Gfs mutations induce G1 cell cycle arrest upon LGK974 treatment
Since LGK974 is known to affect the cell cycle, we examined the effect of LGK974 treatment on the cell cycle of each of these transduced SW48 cell lines. We treated these cells with 50 µM LGK974 for 48 h and assessed the cell cycle phases using FACS and the BD Cycletest™ Plus DNA Reagent Kit (BD Biosciences, United States). FACS analysis revealed a significantly higher percentage of SW48-RNF43-p.G156fs and SW48-RNF43-p.P192fs cells at the G0/G1 phase as compared to SW48-RNF43 wild-type cells, showing that the RNF43 mutated

Discussion
In the present study, we performed WGS on 50 tumour tissues paired with their corresponding blood DNA or adjacent normal tissues from Malaysian CRC patients. The comprehensive analysis of the WGS data, which consisted of SNVs and indels, resulted in the discovery of recurrent and novel variants in Malaysian CRC patients. The somatic mutation rate varied between the CRC patients. However, nearly all patients with hypermutated tumours had microsatellite instability (MSI). In the TCGA study, more than half of the hypermutated tumours had high levels of MSI (MSI-H) due to somatic mutation in mismatch repair genes, MLH1 methylation or the CpG island methylation phenotype (CIMP) (The Cancer Genome Atlas Network, 2012). The determination of MSI status is essential, especially in metastatic CRC (mCRC), because of its prognostic and therapeutic implications. MSI status has also been considered a biomarker of response to immune checkpoint inhibitor treatment (Nojadeh et al., 2018). Two of the FDA-approved immune checkpoint inhibitors targeting programmed cell death-1 protein (PD-1), pembrolizumab and nivolumab, have shown survival benefits in patients with mCRC and MSI-H (Le et al., 2015; Overman et al., 2017). Therefore, we postulated that patient C420T, who has a high level of MSI and a mutation in the DNA polymerase epsilon (POLE) gene (R573W), might benefit from immune checkpoint inhibitor therapy. A recent study reported a favourable clinical response to pembrolizumab in CRC patients who have metastatic disease and are intractable to FOLFOX and FOLFIRI treatments. These patients were characterized by an MSS phenotype and POLE mutation, which highlighted the importance of genomic profiling and the determination of microsatellite status for effective therapy (Gong et al., 2017). In addition, POLE mutations can also serve as a prognostic marker.
Patients carrying these mutations have a significantly better overall survival than those with wild-type POLE, regardless of their microsatellite status and tumour mutation burden. POLE mutations also predict a good response to immune checkpoint inhibitor treatment. Based on this evidence, a clinical trial of toripalimab in patients with several solid tumours, including CRC, with POLE mutations and non-MSI-H status has been initiated (NCT03810339). Through our druggable alteration analysis, we found that patient C569T, who has a hypermutated tumour, an MSS phenotype and a POLE mutation, is likely to respond to an immune checkpoint inhibitor.
Our genome data can be classified into three mutation signatures, signatures 1, 6 and 10, which were supported by several other studies on sporadic CRCs (Jia et al., 2014; Nagahashi et al., 2016; Tubbs and Nussenzweig, 2017). Signature 1 is strongly associated with an endogenous mutational process initiated by spontaneous deamination of 5-methylcytosine due to the ageing process (Tubbs and Nussenzweig, 2017). This is reflected in our patients' age: 92% (n = 46) of the recruited CRC patients were above 50 years old, with an average age of 64. Signatures 6 and 10, on the other hand, are associated with defective MMR and defective exonuclease activity of POLE, respectively. We observed that 10% (n = 5) of the recruited patients were categorized as MSI-H, with all of them having at least one known somatic mutation in either MMR or POLE genes, which may lead to impaired MMR and exonuclease activity of POLE, respectively. The most frequently mutated genes identified in our CRC patient cohort were APC, KRAS, TP53 and MUC4, which have also been reported in multiple studies (The Cancer Genome Atlas Network, 2012; Abdul et al., 2017; Chang et al., 2019; Mohd Yunos et al., 2019). High mutation frequency was also observed in several other genes, including CCDC168, FAT3, KMT2C, LRP1B, PCLO, SCN1A and SPEG. Based on the MutSigCV analysis, we also identified TCF7L2 and ACVR2A among the significantly mutated genes in our CRC patients. However, these two genes were not among the top ten frequently mutated genes. In MutSigCV, SMGs are defined as genes that are mutated more often than expected by chance given the background mutation processes. Our analysis indicated that most of the top ten frequently mutated genes were not statistically significant when mutational heterogeneity was considered. Despite their high mutation frequency in CRC, these genes may not be functionally important for tumorigenesis. Compared with other studies, the mutation frequencies of APC and TP53 in the Malaysian population were similar, whereas that of KRAS was much lower (The Cancer Genome Atlas Network, 2012; Abdul et al., 2017; Tanaka et al., 2017). Our previous genomic alteration profiling of Malaysian CRC patients also revealed that APC was among the most frequently mutated genes, with a mutation frequency between 60% and 70% (Chang et al., 2019).

FIGURE 6
Percentage of proliferating cells assessed by Ki67 proliferation assay, 48 h post cell seeding. The percentage of Ki67+ cells was significantly higher in SW48-RNF43-p.G156fs cells as compared to both SW48 empty vector and SW48 wild-type cells (two-way ANOVA with Tukey′s range test, mean + SEM, n = 2, ***p < 0.005, ****p < 0.0001).

FIGURE 7
Mutant RNF43 promotes a reduction in cell viability. Statistical significance in all cases was measured by two-way ANOVA with Tukey′s range test (*p < 0.05), n = 3. Error bars represent average ± SD.

Frontiers in Molecular Biosciences
frontiersin.org
On top of that, we identified four novel, non-synonymous variants that led to amino acid substitutions in three genes: KDM4E, MUC16 and POTED. The non-synonymous KDM4E R100H variant was identified in two patients, C434T and C569T. KDM4 family proteins function as histone lysine demethylases that remove methyl groups from lysine residues in the histone tail, thereby controlling the transcriptional activity of target genes (Chen et al., 2006). The KDM4 protein family consists of four paralogues, namely KDM4A-KDM4D, and two pseudogenes, KDM4E and KDM4F. While KDM4A and KDM4B are the most widely studied members of the KDM4 subfamily, the roles of KDM4E in cancers have rarely been reported (Wang et al., 2022a). Genomic alterations and overexpression of the KDM4 family have been reported in different breast cancer subtypes. Several KDM4 inhibitors have already been used as anticancer drugs for breast cancers in vitro (Ye et al., 2015; Varghese et al., 2021). However, none of these drugs have undergone clinical trials yet (Varghese et al., 2021). The bioinformatics analysis demonstrated that the intronless KDM4E and KDM4F are expressed similarly to KDM4D. Because of their architecture and lack of expression, KDM4E and KDM4F are referred to as pseudogenes (Berry and Janknecht, 2013). Growing evidence that pseudogenes have a variety of biological roles and that their dysregulation is frequently linked to human disorders like cancer signifies their potential as therapeutic targets (Prensner and Chinnaiyan, 2011; Wahlestedt, 2013; Sisu, 2021). Several genomic alterations of pseudogenes in CRC have been identified. For instance, the pseudogenes DUXAP8, MSTO2P and MYLKP1 are involved in supporting CRC progression and enhancing cancer risk (Lynn et al., 2018; He et al., 2020; Guo and Zhang, 2022). Hence, it is worth exploring the molecular characteristics and functional relevance of the identified recurrent KDM4E R100H mutation to unravel its potential as a therapeutic target in CRC.

FIGURE 8
Distribution of SW48 transduced cells across the different cell cycle phases upon treatment with 50 μM of LGK974. Statistical significance in all cases was measured by mixed-effects analysis with Tukey′s range test (*p < 0.05), n = 4. Error bars represent average ± SD.
In the present study, we identified a recurrent, non-synonymous MUC16 L12755S mutation that was predicted to be deleterious by the SIFT and PolyPhen-2 tools. Located within the tandem repeat domain, this particular mutation has not been previously reported in CRC, and its functional relevance is worth exploring in future studies. The MUC16 gene encodes a highly glycosylated protein that consists of two primary domains: a tandem repeat domain (interspersed with SEA domains) containing the CA-125 epitope, and a transmembrane domain (Hattrup and Gendler, 2008; Felder et al., 2014). CA-125 is an FDA-approved serum biomarker used in monitoring cancer progression and treatment response, particularly in ovarian cancer (Bottoni and Scatena, 2015; Li et al., 2018; Charkhchi et al., 2020). MUC16 is the most frequently mutated gene in endometrial cancer (Hu and Sun, 2018), and its oncogenic properties have been investigated in several other cancers such as glioblastoma (Yang C. et al., 2019), gastric cancer (Huang et al., 2021) and colorectal cancer (Björkman et al., 2019). Meanwhile, knocking down MUC16 in CRC cells impaired their growth and metastatic capability through deregulation of the JAK2-STAT3 signalling pathway (Liu et al., 2022). Furthermore, a significant correlation of MUC16 mutation with tumour mutational burden and microsatellite status has been shown in patients with gastric cancer, colorectal cancer and melanoma (Wang et al., 2022b), which supports the use of immune checkpoint inhibitors (ICIs) in the treatment regimen.
To our knowledge, our study is the first to report the novel POTED E172Q mutation in CRC, which was discovered in two of our patients. The POTE gene family has at least ten paralogs, which encode cancer testis antigens (CTAs) that are expressed in the germ cells of the adult testis, fetal ovary, prostate and placenta. Moreover, the POTE gene family has been associated with the pathogenesis of various human cancers, in which their expression is higher in cancer tissues than in normal tissues (Coulie et al., 2014; Sharma et al., 2019). Due to their low expression in normal tissues, POTEs are potential biomarker candidates for cancer progression and potential therapeutic targets (Redfield et al., 2013). POTED, also known as ANKRD21, is one of the POTE paralogs and is located on chromosome 21. This gene is part of a 45-gene signature for predicting metastasis in triple-negative breast cancer (TNBC), in which high expression of POTED was associated with poor prognosis (Kuo et al., 2012). Nevertheless, the mechanism regulating POTED expression in cancer remains to be elucidated. In 2019, Shen et al. demonstrated that aberrant expression of POTEE, another paralog of the POTE gene family, perturbed the SPHK1/p65 signalling axis and consequently promoted tumorigenesis by inhibiting apoptosis in CRC cells. Their study highlighted the potential role of POTEE as a novel biomarker for the diagnosis and intervention of CRC (Shen et al., 2019). However, the functional roles of other paralogs of the POTE gene family, such as POTED, remain elusive and are worth pursuing.
The Wnt signalling pathway is activated in most CRC cases due to loss-of-function mutations in the APC gene. APC mutations were discovered to be one of the potential biomarkers for sensitivity to tankyrase inhibitors in CRC. Tankyrase inhibitors enhance the degradation of β-catenin and inhibit cell proliferation in CRC cell lines that harbour APC mutations (Schatoff et al., 2019; Jang et al., 2020). We analyzed the druggability of the identified mutations, which were expected to be the targets of therapies that either already exist or are currently being investigated in clinical trials. In this study, we identified four previously reported pathogenic APC truncating mutations, namely R223X, R213X, Q1406X and R1450X, which were predicted to be sensitive to tankyrase inhibitors. APC truncating mutations, such as Q1405X, have been shown in an in vivo model to confer sensitivity to the tankyrase inhibitor G007-LK through WNT suppression due to tankyrase synthase inhibition (Schatoff et al., 2019). This finding demonstrates the importance of these APC mutations in CRC, and an investigation into how these mutations can be translated into targeted molecular therapeutics is warranted.
Besides APC, we also identified two N-terminal truncating mutations in RNF43, specifically G156Afs and P192Gfs. These variants were found in patient C474T, whose tumour had wild-type APC, KRAS and TP53, was hypermutated and had an MSI-H phenotype. From our druggable alterations analysis, patients with RNF43 mutations were predicted to be responsive to the porcupine inhibitor LGK974. Even with a prevalence of less than 20% in CRC, RNF43 has been described as one of the emerging predictive markers for treatment selection, especially in patients with BRAF V600E mutations and MSI-H tumours with low MLH1 expression (Jiang et al., 2013; Giannakis et al., 2014; Tu et al., 2019; Yunos et al., 2020). The RNF43 gene has been functionally characterized in multiple cancers, such as pancreatic cancer (Jiang et al., 2013), gastric cancer (Niu et al., 2015) and hepatocellular carcinoma (Xing et al., 2013). Depending on the type and position of the mutations in the gene, RNF43 mutations can function as either positive or negative regulators of the Wnt/β-catenin signalling pathway (Yu et al., 2020). The most widely reported RNF43 mutations in CRC are R117fs and G659fs, which are commonly observed in serrated CRCs with mutated BRAF and MSI (Bond et al., 2016). R117fs, along with another RNF43 mutation, P441fs, acts as a positive regulator of the Wnt/β-catenin signalling pathway because the presence of these mutations results in FZD accumulation on the surface of CRC cells. Furthermore, treatment with LGK974 decreased the Wnt/β-catenin activity induced by these mutations (Cho et al., 2022). In contrast, a study by Tu et al. showed that the G659fs mutation does not confer any dominant-negative activity and is unlikely to play a role in supporting CRC pathogenesis (Tu et al., 2019). This finding was supported by several independent studies showing that the same G659fs C-terminal truncation did not affect Wnt/β-catenin signalling (Cho et al., 2022).
A more recent study has shed light on this observation, showing that G659fs actually promoted CRC cell growth via the PI3K/mTOR pathway instead of the Wnt signalling pathway (Fang et al., 2022). Collectively, these observations indicate that different RNF43 mutations possess different molecular properties, and that each specific mutation warrants comprehensive investigation in order to understand its role in CRC. Comprehensive screening of 135 RNF43 missense and frameshift mutations in multiple human cancers revealed that all of the frameshift mutations and almost all of the missense mutations are located in the RING domain. This resulted in RNF43 loss of function and, subsequently, increased Wnt/β-catenin activity. Cho et al. demonstrated that even though RNF43 R117fs could interact with FZD5, RNF43 with this specific mutation could not ubiquitinate FZD5 due to the lack of the RING domain. This suggests that the RING domain of RNF43 is vital for regulating the Wnt/β-catenin signalling pathway (Cho et al., 2022). Herein, we investigated the role of two RNF43 mutations, G156Afs and P192Gfs, identified from our WGS data, in the SW48 CRC cell line. The G156fs mutation is located in the protease-associated (PA) domain, and a mutation in this domain may affect the RING domain. Our observations revealed that the RNF43 G156Afs mutation, but not P192Gfs, promoted SW48 cell proliferation. Nevertheless, both mutations conferred higher sensitivity to LGK974 treatment, which manifested as reduced cell viability and cell cycle arrest at the G0/G1 phase 48 h post-treatment. We have successfully characterized the potential roles of these truncating RNF43 mutations in CRC pathogenesis, which can be further explored for the development of novel therapeutic targets in CRC. Moreover, it is essential to further validate individual mutations identified from any genomic profiling study to confirm their involvement in tumorigenesis.
Altogether, the analysis of druggable variants from our WGS data, supported by functional characterization, enhanced our understanding of the value of genomics and of translating genomic findings into precision medicine.

Data availability statement
The data presented in the study are deposited in the NCBI SRA repository, accession number PRJNA928101.

Ethics statement
The studies involving human participants were reviewed and approved by UKM. The patients/participants provided their written informed consent to participate in this study.

Author contributions
RM performed the experiments, data analysis, interpretation, and manuscript drafting. KJS performed the bioinformatics analysis and visualization of the sequencing data. N-SA was involved in data interpretation, drafting the manuscript, and overseeing the experiments. SSa, MI, NM, NM and MA were heavily engaged in sample QC, DNA extraction and variant validation. SSa and FT were involved in the determination of the microsatellite status. SSy was involved in the optimization of vector construction and cell-based assays. NA gave insight into the functional analyses. LM and IS are the colorectal surgeons involved in specimen collection and IR is a pathologist. RJ was involved in the critical review of the manuscript. All authors read and approved the final manuscript.