No evidence on infectious DNA-based agents in pediatric acute lymphoblastic leukemia using whole metagenome shotgun sequencing

The etiology of pediatric acute lymphoblastic leukemia (ALL) is still unclear. Whole-metagenome shotgun sequencing of bone marrow samples from patients with treatment-naïve ALL (n=6) was performed for untargeted investigation of bacterial and viral DNA. The control group consisted of healthy children (n=4) and children with non-oncologic diseases (n=2) undergoing bone marrow sampling. Peripheral blood of all participants was investigated at the same time. After bioinformatic elimination of potential contaminants by comparison with the employed controls, no significant amounts of microbial or viral DNA were identified.


Introduction
Acute lymphoblastic leukemia (ALL) is the most common childhood cancer (Pui, 1995). Nevertheless, its etiology is still unknown. Derived from epidemiological studies, infection-related hypotheses have frequently been discussed in the past (Greaves and Alexander, 1993; Kinlen et al., 1995). However, no specific causative agent has been identified so far (O'Connor and Boneva, 2007). Using next-generation sequencing (NGS) techniques for untargeted amplification of the whole viral and bacterial metagenome in pre-treatment bone marrow samples, a higher viral load has been identified in pediatric patients with ALL compared to acute myeloid leukemia (AML) (Francis et al., 2014). To further investigate the infection-related hypothesis in pediatric ALL using novel sequencing techniques, we performed a pilot trial to search for microbial and viral DNA using whole metagenome sequencing (WMS) of peripheral blood and bone marrow samples of six patients with treatment-naïve ALL and six children without oncologic diseases acting as negative controls.

Collection of samples
All samples were collected during planned bone marrow punctures of the iliac crest after disinfection of the skin and under sterile conditions using an aspiration cannula. Bone marrow punctures were scheduled in six patients for initial diagnosis of ALL, for bone marrow donation in four healthy participants and for other reasons in two participants (diagnostic purpose in one patient with severe aplastic anemia (SAA) and gathering of an autologous backup ahead of bone marrow transplantation in one patient suffering from sickle cell anemia). All work was conducted with the formal approval of the institutional ethics committee (University Tuebingen, approval number 561/2020BO1, granted 11/2020) and in accordance with the Declaration of Helsinki. Blood and bone marrow samples were collected after obtaining written informed consent of the participants' guardians and informing study participants according to the corresponding age group.
The sample collection for this study was performed in the middle or at the end of the bone marrow aspiration, not at the beginning, to minimize the risk of contamination through commensal bacteria of the skin. A total of five milliliters of bone marrow was drawn, plus one additional milliliter of bone marrow for microbial cultivation as contamination control. Five milliliters of peripheral blood were collected through a peripheral venous blood draw or a central venous catheter (Hickman line). In patients below three years of age, two milliliters of blood were taken. Collection of bone marrow and peripheral blood was performed simultaneously whenever possible.
The collected sample material was placed in DNA stabilizer tubes (Streck Cell-free DNA BCT, CE-IVD, Streck, La Vista, NE, USA) and sent to the laboratory at room temperature. After centrifugation for separation of plasma and peripheral blood cells, the plasma was stored at -80°C until further processing. Bone marrow samples were treated in the same manner.

Metagenomic next-generation sequencing
Blood samples and bone marrow samples were processed in a blinded manner and therefore identically. Cells were separated from the supernatant by centrifugation at 1,600 × g for 10 min at 4°C and the supernatant was transferred to a fresh reaction tube. Then, a second centrifugation step at 16,000 × g for 10 min at 4°C was performed, supernatants were again transferred, and aliquots were further stored at -80°C. Nucleic acid isolation, quality controls and library preparation were carried out as previously described (Grumaz et al., 2016). NGS was performed on an Illumina NextSeq 550 instrument. Adequate positive and negative controls accompanied all laboratory and sequencing procedures. Raw sequencing data were subjected to several quality control (QC) steps comprising PHRED score filtering, adapter trimming, complexity filtering as well as k-mer based classification and contamination screening. To pass the quality filter, read quality needed to surpass a PHRED score of 20 and reads needed a minimal length of 50 bp after quality control. All data generated were analyzed using Noscendo's DISQVER platform, which integrates the CE-IVD assay for pathogen detection from blood. The DISQVER platform comprises a curated microbial genome reference database of over 16,000 microbial species covering >1,500 pathogens and can detect bacteria, DNA viruses, fungi, and parasites while differentiating contaminating commensals from infective agents via run controls and dedicated control collectives. For each pathogen type (bacterium, virus, fungus, parasite) a dedicated minimum threshold for reporting was set, which is further refined by incorporating data for each species individually regarding their observation in control cohorts. The limit of detection (LOD) was determined by bioinformatic and wet-lab analytical validation procedures, which are further described in Supplementary Material 1. For the principal component analysis (PCA), raw sequencing data after removal of human sequences and after classification of the non-human fraction were used for each sample. Read counts were log10 transformed prior to PCA. The positive control was used after bioinformatic exclusion of the known, spiked-in microbial sequences, in order not to bias the results by taking these into account.
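The read-level quality filter described above (PHRED score above 20, minimal length of 50 bp) can be illustrated with a minimal Python sketch. This is not the DISQVER implementation, which is not described in detail here. It assumes Phred+33 encoded FASTQ quality strings and interprets the quality criterion as the mean per-read score. The function names `mean_phred` and `passes_qc` are hypothetical.

```python
MIN_PHRED = 20   # quality threshold stated in the text
MIN_LENGTH = 50  # minimal read length in bp stated in the text


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding is an assumption)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(sequence: str, quality: str,
              min_phred: int = MIN_PHRED, min_length: int = MIN_LENGTH) -> bool:
    """Return True if a read is long enough and its mean quality surpasses min_phred.

    Using the per-read mean is an assumption made for illustration; the text
    does not state how the score is aggregated over a read.
    """
    if len(sequence) < min_length or len(sequence) != len(quality):
        return False
    return mean_phred(quality) > min_phred


if __name__ == "__main__":
    # Hypothetical reads: 'I' encodes Phred 40 and '+' encodes Phred 10 in Phred+33.
    reads = [
        ("read_long_good", "A" * 60, "I" * 60),
        ("read_short", "A" * 40, "I" * 40),
        ("read_low_quality", "A" * 60, "+" * 60),
    ]
    for name, seq, qual in reads:
        print(name, passes_qc(seq, qual))
```

Adapter trimming, complexity filtering and k-mer based classification are separate steps that this sketch does not cover.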

Statistical methods
Statistical analyses were conducted using SPSS 27 (Armonk, New York) and RStudio 2022.12.0 (Boston, MA). Descriptive statistics are reported as median [range], if not otherwise specified. For comparison of cfDNA levels between the control and ALL cohorts, the Wilcoxon rank sum test was used as values were not normally distributed. For comparison of read mapping proportions between the different groups, the Kruskal-Wallis test was used.
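For group sizes as small as in this study, the Wilcoxon rank sum test can be evaluated exactly by enumerating all possible group assignments. The following Python sketch illustrates the principle only; the analyses themselves were run in SPSS and RStudio. The function names and the example values are hypothetical and are not study data.

```python
from itertools import combinations


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def exact_rank_sum_test(x, y):
    """Return (rank sum of x, exact two-sided p-value) by full enumeration.

    Feasible only for small groups: 6 versus 6 samples give 924 assignments.
    """
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n_x = len(x)
    observed = sum(ranks[:n_x])
    expected = n_x * (len(pooled) + 1) / 2
    observed_deviation = abs(observed - expected)
    total = 0
    as_extreme = 0
    for subset in combinations(range(len(pooled)), n_x):
        total += 1
        deviation = abs(sum(ranks[i] for i in subset) - expected)
        if deviation >= observed_deviation - 1e-12:
            as_extreme += 1
    return observed, as_extreme / total


if __name__ == "__main__":
    # Hypothetical cfDNA concentrations in ng/µl, for illustration only.
    group_a = [5.1, 3.9, 6.4, 2.2, 7.0, 4.3]
    group_b = [1.8, 2.5, 0.9, 3.1, 1.2, 2.0]
    rank_sum, p_value = exact_rank_sum_test(group_a, group_b)
    print(f"rank sum = {rank_sum}, exact two-sided p = {p_value:.4f}")
```

With six samples per group, the smallest attainable two-sided p-value is 2/924, which bounds how strong any single comparison in a cohort of this size can be.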

Participant characteristics
As shown in Table 1, participants were well balanced concerning gender and age. Common type ALL was the most frequent diagnosis in the ALL group, whereas controls were mainly healthy children and young adolescents willing to donate bone marrow for a sibling.
In the ALL group, four out of six patients received antibiotic treatment ahead of bone marrow aspiration (BMA). In three patients, intravenous Piperacillin/Tazobactam or Cefuroxime plus Clindamycin was administered just one day ahead of BMA. The remaining case with fever and mild pancytopenia was treated intravenously for six days with Cefuroxime and then discharged. Fourteen days later, ALL was diagnosed via BMA.
In the control group, only the two patients suffering from SAA or sickle cell anemia received antibiotics ahead of BMA: Piperacillin/Tazobactam in the SAA patient two days ahead of BMA, and long-term Penicillin prophylaxis in the sickle cell patient.

Results of WMS
In total, a mean of 4.8 ng/µl of cell-free DNA (cfDNA) could be isolated from the bone marrow of the ALL patients, compared to 2 ng/µl in the control group (Table 1, p=0.18). In peripheral blood, mean cfDNA concentration was 2.1 ng/µl in ALL and 0.3 ng/µl in the control group (p=0.08).
The cfDNA samples were processed in six different batches with different reagent lots due to logistical requirements. The batches separate into different clusters (Figure 1A), in contrast to the plot showing the different groups (patients, controls, synthetic sequencing run controls; Figure 1B), where no separation based on any of these groups was possible. In the read mapping statistics regarding the proportion of human versus non-human reads (Table 2), no significant differences between the groups were discernible (Figure 2A), with the exception of the synthetic control, which shows a statistically significant increase in the non-human proportion due to the synthetic microbial sequences (Figure 2B). In blood and bone marrow samples of patients or controls, no significant amounts of viral DNA were identified. DNA fragments of Torque Teno virus were found in the raw data of two patients, but below the LOD and reporting thresholds.
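The batch clustering reported here rests on a PCA of log10-transformed read counts. A minimal, dependency-free Python sketch of that idea follows. It is not the pipeline used in the study. The pseudocount of 1 (to avoid log10 of zero), the power-iteration approach, the function names and the example counts are all assumptions made for illustration.

```python
import math


def log10_transform(counts, pseudocount=1.0):
    """log10-transform read counts; the pseudocount is an assumption to handle zeros."""
    return [[math.log10(c + pseudocount) for c in row] for row in counts]


def center_columns(matrix):
    """Subtract the per-taxon (column) mean from every sample (row)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def first_component(centered, iterations=500):
    """Approximate the first principal axis by power iteration on the covariance matrix."""
    n = len(centered)
    p = len(centered[0])
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Non-uniform start vector, so it is unlikely to be orthogonal to the main axis.
    vec = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec


def pca_first_component_scores(counts):
    """Project each sample onto the first principal component of its log10 counts."""
    centered = center_columns(log10_transform(counts))
    axis = first_component(centered)
    return [sum(v * a for v, a in zip(row, axis)) for row in centered]


if __name__ == "__main__":
    # Hypothetical non-human read counts (rows: samples, columns: taxa).
    # The first three samples mimic one batch, the last three another.
    counts = [
        [1000, 10, 5], [1200, 12, 4], [900, 8, 6],
        [10, 900, 5], [12, 1100, 6], [9, 1000, 4],
    ]
    for score in pca_first_component_scores(counts):
        print(round(score, 3))
```

In such a projection, samples sharing a reagent-lot-specific background land on the same side of the first axis, which is the kind of pattern that points to batch effects rather than biological group differences.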

Discussion
In this pilot trial, WMS of bone marrow samples in pediatric patients with treatment-naïve ALL revealed no specific infectious agents and no differences in the viral or bacterial metagenome when compared to peripheral blood of the same patients (internal validation) or in comparison to peripheral blood and bone marrow of healthy children or children with non-oncologic diseases (external validation). As a limitation, only DNA viruses could have been identified, which could explain the opposing results of a higher viral infection load in ALL previously reported (Francis et al., 2014), where RNA has been amplified. The absence of a discernible bacterial metagenome in the peripheral blood and bone marrow samples after implementation of rigorous controls could also be linked to the recent discussion about the hypothesized "human blood microbiome", which has been postulated by several small trials using NGS techniques on peripheral blood samples (McLaughlin et al., 2002; Moriyama et al., 2008; Païssé et al., 2016), reviewed by Castillo et al. (2019). A frequent issue that arises from the combination of high-throughput sequencing and low-biomass samples (Castillo et al., 2019) is a contamination of the used DNA extraction kits with commensal bacteria ("kitome") as well as cross-contamination between different samples in the laboratory ("splashome") that might emulate a "microbiome". This has been demonstrated by Olomu et al. (2020), who opposed the findings of a hypothetical placental microbiome (Bassols et al., 2016; Zheng et al., 2017) by adjusting for these confounders. Just recently, the idea of a "human blood microbiome" has been rendered unlikely by the findings of a population-based study including nearly 10,000 healthy subjects (Tan et al., 2023). However, the small cohort size of our pilot study as well as the fact that several patients received antibiotic treatment ahead of inclusion into this trial have to be acknowledged as important limitations in interpreting our data.
In conclusion, further unbiased investigation of a potential infection-related etiology of pediatric ALL using untargeted DNA amplification methods should always implement rigorous controls against potential confounders, and might consider the detection of both viral DNA and RNA in larger sample sizes.

FIGURE 1
PCA plot visualizing the clustering of datasets based on the six investigated batches (A) compared to the distribution in the PCA plot based on the different groups (patients, controls and synthetic run controls) (B).

TABLE 2
Read mapping statistics of whole metagenome shotgun sequencing for all patient samples (n=23) in comparison to synthetic control samples.