Deep Transcriptome Sequencing of Pediatric Acute Myeloid Leukemia Patients at Diagnosis, Remission and Relapse: Experience in 3 Malaysian Children in a Single Center Study

Among the many types of leukemia, acute myeloid leukemia (AML) affects 20% of diagnosed hematological malignancies in pediatric patients (Meshinchi and Arceci, 2007; de Rooij et al., 2015). Standard chemotherapy regimen remains as the first line treatment for pediatric AML, however nearly 40% of AML patients may suffer from relapse and eventually die from the disease (de Rooij et al., 2015). Similarly, it has been reported that 50% of the pediatric AML relapsed within 12–18 months of diagnosis and 45% of those relapsed were not expected to survive (Creutzig et al., 2014). Despite advances in cytogenetic analysis through fluorescence in situ hybridization and multiplex PCR, there is still a need for a better and comprehensive molecular profiling. For instance, microarray has long been used to study the gene expression profiles of AML patients. The different profile of gene expression has enabled clinicians to tailor better treatment for patients and predict whether patients have the tendency to relapse (Goswami et al., 2009). In a recent study, Handschuch et al. reported that three genes, ANXA3, S100A9, and WT1 can differentiate between different prognostic types of AML (Handschuh et al., 2018). The study outcome was in agreement with another study conducted by Shimada et al. (2012), where a high expression of WT1 gene showed prognostic impact in pediatric AML (Shimada et al., 2012). Another study by Jo et al. (2015) reported that high expression of EVI1 and MEL1 could predict the prognosis of pediatric AML (Jo et al., 2015). However, none of the biomarkers identified from these studies have been translated into clinical use. Therefore, the search continues for additional promising biomarkers, notably novel transcripts, novel fusion genes and non-coding RNAs which are not represented in the microarray platform. Transcriptome sequencing through next generation sequencing represents an effective approach to discover


INTRODUCTION
Among the many types of leukemia, acute myeloid leukemia (AML) affects 20% of diagnosed hematological malignancies in pediatric patients (Meshinchi and Arceci, 2007;de Rooij et al., 2015). Standard chemotherapy regimen remains as the first line treatment for pediatric AML, however nearly 40% of AML patients may suffer from relapse and eventually die from the disease (de Rooij et al., 2015). Similarly, it has been reported that 50% of the pediatric AML relapsed within 12-18 months of diagnosis and 45% of those relapsed were not expected to survive (Creutzig et al., 2014). Despite advances in cytogenetic analysis through fluorescence in situ hybridization and multiplex PCR, there is still a need for a better and comprehensive molecular profiling. For instance, microarray has long been used to study the gene expression profiles of AML patients. The different profile of gene expression has enabled clinicians to tailor better treatment for patients and predict whether patients have the tendency to relapse (Goswami et al., 2009). In a recent study, Handschuch et al. reported that three genes, ANXA3, S100A9, and WT1 can differentiate between different prognostic types of AML (Handschuh et al., 2018). The study outcome was in agreement with another study conducted by Shimada et al. (2012), where a high expression of WT1 gene showed prognostic impact in pediatric AML (Shimada et al., 2012). Another study by Jo et al. (2015) reported that high expression of EVI1 and MEL1 could predict the prognosis of pediatric AML (Jo et al., 2015). However, none of the biomarkers identified from these studies have been translated into clinical use. Therefore, the search continues for additional promising biomarkers, notably novel transcripts, novel fusion genes and non-coding RNAs which are not represented in the microarray platform. Transcriptome sequencing through next generation sequencing represents an effective approach to discover new genetic information on gene expression which may contribute to tumorigenesis. Notably, several novel and rare fusion transcripts have been identified from AML patients via RNA-sequencing (Padella et al., 2015). A recent study combining whole genome sequencing, whole exome sequencing and RNA sequencing in pediatric cancers has identified 240 pathogenic variants with increased sensitivity (Rusch et al., 2018). Previous studies in relapsed AML have shown that the cells acquired additional genetic mutations that were either different or evolved from subclones of diagnostic blasts cells (Padella et al., 2015;Rusch et al., 2018). Nevertheless, little is known about the genetic changes at the transcriptomic level at diagnostic, remission and relapse stages of the same patients, especially in the Malaysian population.

VALUE OF DATA
• AML is the second commonest hematological malignancy affecting children worldwide and more research at transcriptome level is needed to help to improve the survival rates. • This data is of value to understand further the molecular landscape underpinning de novo and relapsed pediatric AML in the Malaysian population. • Most of the data available only report the molecular profiles at diagnosis and relapse stages. In this study we performed the sequence at remission as well. This finding is important for researchers to understand the changes in the progression of the disease at all stages. • Deep transcriptome sequencing will allow users to not only obtain the gene expression profile, but also for fusion gene identification and mutation analysis.

DATA
In this experimental design, we successfully sequenced the RNA of three Malaysian pediatric AML patients at three different stages; diagnosis, remission, and relapse. Table 1 displays the demographic information of the patients involved in this study.
Two out of three patients had a RUNX1-RUNXITI translocation while the other patient was cytogenetically normal. The RUNXI-RUNXITI translocation is one of the most widely identified chromosomal aberration in AML patients. Moreover, it has been reported that patients with this translocation have a higher change of gaining relapse (Christen et al., 2019). All patients relapsed within one year after remission. Based on our next generation sequencing results, for PAML1, the relapse (RL) sample yielded the highest number of reads but with lower total percentage of mapped reads at 78.47% as compared to diagnosis (DX) and remission (RM) stages with 81.33% and 80.72%, respectively. For PAML2 and PAML3, the remission samples resulted in the highest number of reads and percentage total mapped as compared to diagnosis and relapse samples. Collectively, as shown in Table 2, PAML3 had relatively higher reads as compared to PAML 1 and PAML2 for all three stages. This subsequently resulted in a higher percentage of total mapped reads in PAML3. The raw sequences for each sample were submitted to Sequence Read Archive (SRA), with accession number PRJNA509497. All the samples were at least mapped >73% to the exonic region of the genome except for PAML1-DX sample.
One of the analyses that can be used with this data is the identification of differentially expressed genes. For instance, here, we compare the gene expression profile for PAML1, using three different group comparisons (1) Diagnosis vs. Relapse, (2) Diagnosis vs. Remission, and (3) Relapse vs, Remission (Further analysis of PAML2 and PAML3 can be found in Data Sheet 1). Table 3 lists the top 10 differentially expressed genes in all comparison groups. The top genes were different between each group which shows that the regulation of gene expression in each stage of the patient is also distinct. Furthermore, we performed KEGG pathway enrichment analysis based on the  differentially expressed genes, as shown in Figure 1. In the relapse vs. diagnosis comparison, the most enriched pathway is the systemic lupus erythematosus pathway followed by viral carcinogenesis, and cell cycle. Similarly, the same pathways were also enriched in the diagnosis vs. remission comparison.
There have been studies reporting on the association between systemic lupus erythematosus, or any autoimmune diseases with the risk of developing AML (Tsunematsu et al., 1984;Ramadan et al., 2012). Nevertheless, AML is usually attributed as the effect of administering cytotoxic drugs in autoimmune diseases (Ramadan et al., 2012). The causal link between AML and systemic lupus erythematosus needs further elucidation, even though, there is indeed a link in terms of molecular structure between these two diseases. Moreover, the cell cycle pathway has also been previously reported to be involved in the hematopoietic landscape of AML (Yagi et al., 2003;Handschuh, 2019). Whereas for the relapse vs remission comparison, the most enriched pathway is the transcriptional misregulation in cancer. This was in concordance with a different study conducted on AML, where the authors found that this pathway was among the topmost enriched pathways (Roushangar and Mias, 2019). This observation also implicates that the genes that are being regulated from remission to relapse are different than at the diagnosis stage. Nevertheless, these data need to be used with caution since it is a high throughput sequencing, further validation is recommended.

RNA Extraction
Total RNA from isolated MNC was extracted and purified according to the standard protocol of AllPrep DNA/RNA/ miRNA Universal Kit (Qiagen, Germany). The quality and quantity of the total RNA were checked using NanoDrop ND-1000 (NanoDrop Technologies, USA) and Qubit 2.0 Fluorometer (Invitrogen, Life Technologies, USA). Total RNA integrity was then assessed using the Agilent 2100 Bio-Analyzer (Agilent Technologies, USA). Only those bone marrow aspirate (BMA) with RNA integrity number (RIN) > 7.0 were included in transcriptome sequencing (RNA-Seq).

Library Preparation and RNA Sequencing
One microgram of total RNA was used to remove ribosomal RNA (rRNA) using the Ribo-Zero rRNA Removal Kit (Illumina, USA). Purified rRNA-depleted RNA was subjected to RNA library preparation using the ScriptSeqTM v2 RNA-Seq Library Preparation Kit according to the manufacturer instructions. After library construction, the library was diluted to 1.5 ng/µl after preliminary quantitation by Qubit 2.0 and insert size by Agilent 2100. Qpcr was used to accurately quantify the library effective concentration (> 2nM), in order to ensure the library quality. The libraries were then sequenced on Illumina HiSeq 2500 (Illumina, USA).

Bioinformatics Analysis
Paired-end sequences were individually obtained as FASTQ files from the images by CASAVA base recognition software. We later filtered the raw reads to remove adaptor sequences, reads containing undetermined bases >10%, low quality reads having QScore of over 50% bases of the read is < = 5. After obtaining the clean reads, we further mapped the reads against the human reference genome (GRCh38) using TopHat2 (Kim et al., 2013). For the differentially expressed genes analysis, the read count was adjusted using trimmed mean of M values (TMM), then the differential expression analysis was conducted using the DEGseq R package (Wang et al., 2010). The pathway enrichment analysis was performed based on the KEGG database (Kanehisa et al., 2019). KEGG enriched pathway results were demonstrated by using Scatterplot. The KEGG enriched pathway results were evaluated by the Rich factor value, Qvalue and the number of differentially expressed genes involved in the enriched pathways. The rich factor value refers to the ratio of the number of differentially expressed genes in the pathway and the number of all genes annotated in the same pathway. The bigger the value of the rich factor is, the more significant the enrichment degree is.

DATA AVAILABILITY STATEMENT
All of the sequencing reads from this project have been uploaded to NCBI with the BioProject ID PRJNA509497 (https://www. ncbi.nlm.nih.gov/sra/?term=PRJNA509497).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by National University of Malaysia Ethics Committee. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.