Utility of clinical metagenomics in diagnosing malignancies in a cohort of patients with Epstein-Barr virus positivity

Background Differentiation between benign and malignant diseases in EBV-positive patients poses a significant challenge due to the lack of efficient diagnostic tools. Metagenomic Next-Generation Sequencing (mNGS) is commonly used to identify pathogens in patients with fever of unknown origin (FUO). Recent studies have extended the application of Next-Generation Sequencing (NGS) to identifying tumors in body fluids and cerebrospinal fluid. In light of these findings, we conducted this study to develop and apply metagenomic methods and to validate their role in identifying EBV-associated malignant disease. Methods We enrolled 29 patients with positive EBV results from the FUO cohort of the Department of Infectious Diseases of Huashan Hospital affiliated with Fudan University from 2018 to 2019. Upon enrollment, these patients were grouped into benign diseases, chronic active EBV (CAEBV), and malignant diseases according to their final diagnosis, and copy number variation (CNV) analysis was retrospectively performed in 2022 using samples from 2018 to 2019. Results Among the 29 patients, 16 were diagnosed with benign diseases, 3 with CAEBV, and 10 with malignant diseases. Twenty-nine blood samples from the 29 patients were tested by mNGS. Among the 10 patients with a malignant diagnosis, CNV analysis suggested neoplasms in 9. Of the 19 patients with a benign or CAEBV diagnosis, 2 showed abnormal CNV results. The sensitivity and specificity of CNV analysis for the identification of tumors were 90% and 89.5%, respectively. Conclusions The application of mNGS could assist in the identification of microbial infection and malignancies in EBV-related diseases. Our results demonstrate that CNV detection through mNGS is faster than conventional oncology tests. Moreover, the convenient collection of peripheral blood samples adds to the advantages of this approach.


Introduction
Epstein-Barr virus (EBV) is a γ-herpesvirus with a double-stranded DNA genome that infects over 90% of the human population worldwide (de-The et al., 1975). It is mainly transmitted through saliva. Once inside the body, it invades the pharyngeal lymphatic tissue and B lymphocytes, leading to lifelong latent infection. EBV-related diseases often manifest as prolonged fever or fever of unknown origin (FUO). The initial infection with EBV typically results in infectious mononucleosis, characterized by symptoms such as persistent fever, swollen lymph nodes, fatigue, and malaise (Dunmire et al., 2015). However, persistent EBV infection can have more serious consequences, including the development of Burkitt lymphoma, Hodgkin lymphoma, nasopharyngeal carcinoma, gastric carcinoma, and various malignancies in individuals with compromised immune systems due to either inherited or acquired immunodeficiency (Old et al., 1966; zur Hausen et al., 1970; Shibata et al., 1991; Hjalgrim et al., 2003; Thorley-Lawson and Gross, 2004). In addition to cancer, chronic active Epstein-Barr virus (CAEBV) is a progressive disorder associated with persistently high levels of EBV DNA in the blood and infiltration of organs by EBV-positive lymphocytes. Patients with CAEBV lack evidence of a known underlying immunodeficiency and are unable to control acute infection with the virus (Kimura, 2006; Kimura and Cohen, 2017). Moreover, they often respond poorly to antiviral therapy, interferon, intravenous immunoglobulin, and conventional chemotherapy, resulting in limited chances for improvement. In some cases, patients with CAEBV may develop EBV-positive B, T, or NK cell lymphomas (Okano, 2002). Given the severity of EBV-related diseases, it is crucial to diagnose them promptly and accurately to ensure appropriate treatment and care.
Methods used for the detection and monitoring of EBV include the heterophile antibody test, the EBV viral capsid antigen (VCA) IgG and IgM antibody tests, EBV nuclear antigen (EBNA) antibodies, EBV-DNA qPCR, and Epstein-Barr encoding region (EBER) in situ hybridization (Linderholm et al., 1994; Okano et al., 2005; Hurt and Tammaro, 2007; Gulley and Tang, 2008; De Paschale and Clerici, 2012). However, traditional microbiological tests that measure the pathogen load of EBV in blood cannot shed light on whether the patient's symptoms are of an infectious or malignant etiology. Additionally, the pathogen load of EBV cannot simply be used to develop threshold values or patterns for medical intervention since it varies significantly between individuals and assay platforms (Dunmire et al., 2018; Wu et al., 2019); thus, more focused clinical examinations are necessary. A pathological biopsy is the gold standard for malignancy diagnosis, but the procedure can be higher-risk, time-consuming, laborious, and costly. More convenient and rapid tests are therefore needed to aid diagnosis in clinical practice.
Metagenomic Next-Generation Sequencing (mNGS) is a DNA sequencing-based approach for the detection of clinically relevant microorganisms. It has been commonly used in patients with suspected infectious diseases when a definitive etiological diagnosis is elusive (Chiu and Miller, 2019). Moreover, mNGS has clinical utility that extends beyond pathogen detection (Leary et al., 2012). For instance, two recent studies used mNGS to identify neoplasms from body fluids and cerebrospinal fluid with an overall sensitivity of 87% and specificity of 100% (Gu et al., 2021a; Gu et al., 2021b). This was achieved by analysis of chromosomal copy numbers of the host and detection of possible cancer-related copy number variations (CNVs). Many other studies have shown that CNVs are a pathogenic factor in genetic diseases and could be used as a screening tool (Yang et al., 2013; Retterer et al., 2016; Trujillano et al., 2017; Zampaglione et al., 2020). Genomic instability is also one of the hallmarks of cancer cells (Shlien and Malkin, 2009), and multiple studies have shown that CNVs are an important component of genetic variation involved in the development and progression of tumors. Chromosomal CNVs have recently been used for early diagnosis of colorectal cancer, breast cancer, and hematologic malignancies (Kumaran et al., 2017; Xu et al., 2018; Lenaerts et al., 2019). For instance, pyothorax-associated lymphoma has been found to have the following characteristics: a monoclonal pattern of EBV infection, complicated chromosomal abnormalities with numerous structural and numerical abnormalities, and occasional but distinct genome instability. These observations show the potential utility of mNGS in diagnosing both microbial infections and cancer in a single test in patients who are positive for EBV (Takakuwa et al., 2003). In view of these findings, we developed and applied a metagenomic approach in this study to validate its role in identifying EBV-related malignant diseases.

Study design and participants
We enrolled patients with positive EBV results from the FUO cohort of the Department of Infectious Diseases of Huashan Hospital affiliated with Fudan University from 2017 to 2019. Peripheral blood samples were collected from the FUO patients on the first day of admission for routine tests and pathogen detection, and were stored at the same time. EBV positivity was defined as either (1) plasma or whole blood positive for EBV by PCR or mNGS or (2) a positive EBV viral capsid antigen (VCA) IgM antibody test. The CNV analysis was performed retrospectively in 2022 using the original samples collected from these EBV-positive patients from 2017 to 2019. The patients were then classified into benign diseases, CAEBV, and malignant diseases based on their final diagnosis (Figure 1A). The final diagnosis was confirmed by 2 independent physicians with a follow-up after 5-7 years, and the diagnostic process is described in the Supplementary Information. This study was approved by the Ethics Committee of Huashan Hospital (Approval number: KY2017-338).

DNA extraction, library preparation and metagenomic sequencing
For DNA extraction, we used a kit from Matridx (Cat# MAR002) and followed the standard operating procedures (SOPs) provided by the manufacturer. Peripheral blood samples were centrifuged at 16000g for 10 min and cell-free DNA (cfDNA) was extracted from plasma. DNA libraries were prepared with NGSmaster, a device that automatically completes nucleic acid extraction, PCR-free library preparation (enzymatic fragmentation of genomic DNA, end repair, terminal adenylation, and adaptor ligation), and purification (Luan et al., 2021). Sequencing libraries were quantified by real-time PCR (KAPA) and pooled. Shotgun sequencing was carried out on an Illumina NextSeq. Approximately 20 million 50-bp single-end reads (with dual barcode sequencing to minimize index hopping) were generated for each library. Bioinformatic analysis was conducted as described in a previous report (Shen et al., 2020). Briefly, sequences of human origin were filtered (GRCh38.p13) and the remaining reads were aligned to a reference database (NCBI nt, GenBank, and an in-house curated genomic database) to identify the microbial species and read counts. For each sequencing run, a negative control (culture medium containing 10 (Hjalgrim et al., 2003) Jurkat cells/mL) was included.

mNGS reporting criteria
Pathogens were reported if: 1) the sequencing data passed quality control filters (library concentration > 50 pM, Q20 > 85%, Q30 > 80%); and 2) the negative control (NC) in the same sequencing run did not contain the species, or the ratio RPM (sample)/RPM (NC) was ≥ 5, a cutoff determined according to previous studies for discriminating true positives from background contamination (Schlaberg et al., 2017; Wilson et al., 2019; Luan et al., 2021).
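The two reporting criteria can be expressed as a short decision rule. The following is a minimal sketch rather than the pipeline used in the study: the names `RunQC`, `rpm`, and `is_reportable` are hypothetical, and RPM is assumed to mean reads per million sequenced reads.

```python
from dataclasses import dataclass


@dataclass
class RunQC:
    """Quality metrics for one library (hypothetical container)."""
    library_conc_pm: float  # library concentration in pM
    q20_pct: float          # percentage of bases with quality >= Q20
    q30_pct: float          # percentage of bases with quality >= Q30


def passes_qc(qc: RunQC) -> bool:
    # Criterion 1: thresholds as stated in the text.
    return qc.library_conc_pm > 50 and qc.q20_pct > 85 and qc.q30_pct > 80


def rpm(read_count: int, total_reads: int) -> float:
    # Assumption: RPM = species reads per million total reads.
    return read_count * 1e6 / total_reads


def is_reportable(qc: RunQC, sample_rpm: float, nc_rpm: float) -> bool:
    if not passes_qc(qc):
        return False
    # Criterion 2: species absent from the negative control,
    # or at least 5-fold enriched relative to it.
    if nc_rpm == 0:
        return True
    return sample_rpm / nc_rpm >= 5
```

For example, a species at 10 RPM in the sample and 2 RPM in the negative control meets the ratio cutoff exactly, whereas 9 RPM against 2 RPM does not.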

CNVs analysis
DNA sequences were aligned to the human hg19 (GRCh37) reference genome. Guanine-cytosine (GC) content bias was corrected using LOESS regression. For each bin (bin size 100 kb), normalized read counts and the standard deviation of the read fold change were obtained. CNVs were called with XHMM and CANOES, using reference data of normal chromosomal copy numbers (Fromer et al., 2012; Backenroth et al., 2014). The sequencing depth was approximately 20M (20 million) reads to ensure that the coverage of the host chromosomes was comparable among specimens (>95% of sequencing reads were of human origin). The presence of large or multiple CNVs is a strong indication of tumor cells since inherited CNVs are usually small (i.e., the largest inherited CNV is trisomy 21). We inferred the possibility of neoplasms (presence of tumor cells) when the bioinformatic pipeline detected large CNVs (> 10 Mbp), which were unlikely to be caused by inherited disorders (Gu et al., 2021a). In addition, a machine learning-based algorithm was applied to indicate the risk of tumors as previously described (Guo et al., 2021).
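The size criterion for a suspicious CNV can be illustrated with a simplified sketch. This is not XHMM or CANOES: it replaces the model-based callers with a plain threshold on per-bin log2 copy ratios, and the threshold of 0.2, the function names, and the input format are assumptions made for illustration only. The 100 kb bin size and the 10 Mbp cutoff are taken from the text.

```python
BIN_SIZE = 100_000          # bin size from the text (100 kb)
LARGE_CNV_BP = 10_000_000   # size cutoff from the text (> 10 Mbp)


def call_segments(log2_ratios, threshold=0.2):
    """Merge consecutive gained or lost bins of one chromosome into segments.

    `log2_ratios` holds one GC-corrected, reference-normalized log2 copy
    ratio per bin. `threshold` is an assumed illustrative value.
    Returns (start_bp, end_bp, kind) tuples with kind "dup" or "del".
    """
    segments = []
    start = None
    state = 0  # +1 gain, -1 loss, 0 neutral
    # A trailing neutral sentinel closes a run that reaches the last bin.
    for i, ratio in enumerate(list(log2_ratios) + [0.0]):
        if ratio > threshold:
            current = 1
        elif ratio < -threshold:
            current = -1
        else:
            current = 0
        if current != state:
            if state != 0:
                kind = "dup" if state > 0 else "del"
                segments.append((start * BIN_SIZE, i * BIN_SIZE, kind))
            start = i
            state = current
    return segments


def large_cnvs(segments):
    """Keep only segments longer than the 10 Mbp cutoff."""
    return [seg for seg in segments if seg[1] - seg[0] > LARGE_CNV_BP]
```

With 150 consecutive gained bins (15 Mb) and 20 consecutive lost bins (2 Mb) on the same chromosome, only the 15 Mb duplication passes the size filter.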

Statistical analysis
The CNV results were compared with the final clinical diagnosis for each patient. Sensitivity, specificity, efficiency, positive predictive value (PPV), and negative predictive value (NPV) were calculated to evaluate the ability of the CNV test to distinguish EBV-related benign diseases from malignant diseases. In addition, we calculated Youden's index to provide a comprehensive evaluation of the diagnostic indicator.

Results
The median (IQR) depth of sequencing was 11.9 (10.4-15.0) million reads. All 29 samples generated interpretable copy ratio plots (Figure S1). Of the 29 patients, 10 were diagnosed with malignant diseases. Among these 10 patients, 9 had positive CNV results (Figure 1B), including 5 cases of lymphoma, 1 of nasopharyngeal carcinoma, 1 of gastric carcinoma, 1 of liver cancer, and 1 of polyneuropathy, organomegaly, endocrinopathy, M-protein, skin changes (POEMS) syndrome. Of the 19 patients with a benign or CAEBV diagnosis, 2 showed abnormal CNV results. Thus, the sensitivity and specificity of CNV analysis for the identification of tumors were 90% and 89.5%, respectively (Figure 1B), and the efficiency was 89.7%. Overall, Youden's index of the CNV analysis was 0.76.
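The performance figures follow from the 2×2 table implied by the counts above: 9 true positives, 1 false negative, 2 false positives, and 17 true negatives. The sketch below recomputes them using the standard definitions; the function name `diagnostic_metrics` is hypothetical, and "efficiency" is taken to mean overall accuracy. With these counts, Youden's index (sensitivity + specificity − 1) evaluates to approximately 0.79, so the 0.76 reported in the text may reflect rounding or a different calculation.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Efficiency is assumed here to be overall accuracy.
        "efficiency": (tp + tn) / (tp + fn + fp + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden": sensitivity + specificity - 1,
    }


# Counts taken from the text: 10 malignant (9 CNV-positive),
# 19 benign or CAEBV (2 CNV-positive).
metrics = diagnostic_metrics(tp=9, fn=1, fp=2, tn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```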
Next-generation sequencing revealed the presence of more than 10 chromosomal copy number variations (CNVs) affecting over 10 million bases (10 Mb) of genomic regions in patients 390 and 2206, both diagnosed with malignant neoplasms. Specifically, patient 390 was diagnosed with lymphoma while patient 2206 was diagnosed with nasopharyngeal carcinoma (Figures 2A, B). Notably, four patients were thought to have CAEBV or benign diseases, but the CNV analysis suggested the possibility of cancer, which was confirmed by tissue biopsy or clinicians' consensus in the follow-up evaluations. Among these four patients, CNV analysis indicated a deletion of -19 (mos) in a patient diagnosed with POEMS syndrome; a duplication of +8 in a patient with liver cancer; duplication of 6p22.2-p12.3 (dup_23.5Mb) and deletion of 6p25.3-p22.2 (del[mos]_26.6Mb) in a patient with gastric carcinoma; and deletions of 6q11.1-q27 (del[mos]_109.2Mb), 17p13.3-p11.1 (del[mos]_22.3Mb), and 19p13.3-p11 (del[mos]_24.5Mb), together with duplication of 17q11.1-q25.3 (dup[mos]_55.9Mb), in a patient with lymphoma (Figures 2C-F). Among these chromosomal variations, abnormalities of chromosome 17 have been found to be related to lymphoma, loss and gain of chromosome 6 have frequently been reported in gastric carcinoma, and upregulated expression of chromosome 8 has been found in liver cancer (Jendiroba et al., 1995; Parada et al., 1998; Li et al., 2003; Kang et al., 2014; Li et al., 2022). These findings indicate the predictive and early diagnostic value of CNVs in malignant diseases.
Of the 11 patients who showed positive CNVs, two were finally diagnosed with benign diseases, indicating a positive predictive value (PPV) of 81.8%. Patient 3081 was a child diagnosed with superfemale syndrome, and CNV analysis indicated deletion of Xp22. and duplication of Xq11.1-q28 (dup_93.3Mb) (Figure 2G). On the other hand, patient 895 had a negative CNV result but was diagnosed with lymphoma, yielding a negative

Discussion
Although chromosomal CNVs do not directly indicate cancer, they can provide valuable diagnostic clues for clinicians to order appropriate diagnostic tests. This can potentially save time, money, and effort spent on searching for an infectious etiology, especially when microbiological tests are negative and empirical antibiotics have had little effect. Differential diagnosis can be challenging and time-consuming, as different diseases may exhibit overlapping clinical manifestations. For instance, both benign and malignant diseases can result in elevated levels of serum inflammatory and tumor markers. Moreover, lymphoma can be misdiagnosed due to its ability to affect multiple organs and present with symptoms that resemble infection rather than cancer. Under such circumstances, CNVs detected by mNGS can prompt a more detailed clinical investigation for malignant diseases. Indeed, various studies have reported chromosomal abnormalities in cases of EBV-related malignancy, such as EBV-positive nodular sclerosis-type Hodgkin lymphoma, EBV-positive plasmablastic lymphoma (PL) carrying immunoglobulin (IG)/MYC rearrangements, which are regarded as a diagnostic hallmark of Burkitt lymphoma, and EBV-positive PL with a higher frequency of mutations in JAK-STAT3 pathway genes (Taddesse-Heath et al., 2010; Valera et al., 2010; Castillo et al., 2015; Hayashida et al., 2017; Aukema et al., 2021; Ramis-Zaldivar et al., 2021). EBV-related malignant diseases are serious conditions, but they respond to treatment, and missed diagnosis and misdiagnosis cause comparable harm (Fugl and Andersen, 2019). CNV analysis, with an efficiency of 89.7%, could be helpful in daily clinical practice. In addition, Youden's index for this test was 0.76, which indicates high diagnostic value.
Clinicians often encounter challenges when diagnosing and treating patients with fever of unknown origin, particularly when considering different potential causes such as microbial infections, malignancies, and collagen vascular diseases. These conditions often present with overlapping symptoms, making it difficult to determine the underlying cause. As a result, febrile cancer patients may receive empirical antibiotic treatment even in the absence of microbial infection because a timely and efficient etiological diagnosis is delayed. Therefore, a diagnostic tool such as mNGS can be helpful to detect both pathogens and neoplasms and provide results in hours. One advantage of mNGS is that it does not require intact cells, whereas pathological tests rely on the quantity and integrity of cells for proper analysis. In our study, the cell-free DNA (cfDNA) in plasma was processed for sequencing, which enables retrospective testing of stored specimens. By leveraging mNGS technology, clinicians can overcome the challenges of diagnosing FUO patients by obtaining comprehensive and rapid results that cover both infectious and malignant etiologies. This approach saves time, enables more targeted treatments, and offers the potential for improved patient outcomes.
The assay used in our study also has limitations. First, CNV analysis using mNGS data can only identify the gain or loss of genetic material. Gene mutations (such as point mutations of oncogenes) cannot be detected due to limited sequencing depth (typically ~20 million reads for mNGS tests). In addition, unlike karyotyping, structural variations that do not involve numerical changes of gene copies, such as balanced translocations and inversions, cannot be detected. As a result, tumors cannot be ruled out by a negative result. Second, experimental and quality control standards are currently lacking for mNGS, which is technically challenging to perform. The test is recommended for febrile or immunocompromised patients in whom a chronic EBV infection has been diagnosed. Third, the CNVs detected by mNGS can only suggest the presence of tumor DNA but cannot pinpoint the location of the tumor or determine whether it is benign or malignant, and therefore should be corroborated by additional diagnostic tests. Finally, the sample size in our study was limited, and larger clinical studies could yield more definitive evidence on the utility of mNGS in real-world settings.
In conclusion, the application of mNGS could assist in the identification of microbial infection and malignancies in EBV-related diseases. Our results demonstrate that CNV detection through mNGS is faster than conventional biopsy tests. Moreover, the convenient collection of peripheral blood samples adds to the advantages of this approach. On the other hand, it is important to note that this study is preliminary due to the limited sample size. Conducting prospective diagnostic trials on a larger scale would provide more conclusive evidence and enhance the validity of the findings.
FIGURE 1 Schematic of the CNV test and its overall performance. (A) An mNGS test assessing aneuploidy and pathogens was performed on blood samples from EBV-positive patients in the FUO (fever of unknown origin) cohort. (B) Twenty-nine patients underwent the mNGS test; CNV results are shown by group. CNVs, copy number variations.
FIGURE 2 Plots showing patients with abnormal genomic copy numbers. (A) Copy ratio plot of Patient 390: lymphoma. (B) Copy ratio plot of Patient 2206: nasopharyngeal carcinoma. (C) Copy ratio plot of Patient 2998: POEMS syndrome. (D) Copy ratio plot of Patient 2339: liver cancer. (E) Copy ratio plot of Patient 2060: gastric cancer. (F) Copy ratio plot of Patient 3871: lymphoma. (G) Copy ratio plot of Patient 3081: superfemale syndrome. (H) Copy ratio plot of Patient 1849: latent infection of EBV.

TABLE 1
Patients' characteristics and CNV results in the study.