Strategy and Performance Evaluation of Low-Frequency Variant Calling for SARS-CoV-2 Using Targeted Deep Illumina Sequencing

The ongoing COVID-19 pandemic, caused by SARS-CoV-2, constitutes a tremendous global health issue. Continuous monitoring of the virus has become a cornerstone to make rational decisions on implementing societal and sanitary measures to curtail the virus spread. Additionally, emerging SARS-CoV-2 variants have increased the need for genomic surveillance to detect particular strains because of their potentially increased transmissibility, pathogenicity and immune escape. Targeted SARS-CoV-2 sequencing of diagnostic and wastewater samples has been explored as an epidemiological surveillance method for the competent authorities. Currently, only the consensus genome sequence of the most abundant strain is taken into consideration for analysis, but multiple variant strains are now circulating in the population. Consequently, in diagnostic samples, potential co-infection(s) by several different variants can occur or quasispecies can develop during an infection in an individual. In wastewater samples, multiple variant strains will often be simultaneously present. Currently, quality criteria are mainly available for constructing the consensus genome sequence, and some guidelines exist for the detection of co-infections and quasispecies in diagnostic samples. The performance of detection and quantification of low-frequency variants using whole genome sequencing (WGS) of SARS-CoV-2 remains largely unknown. Here, we evaluated the detection and quantification of mutations present at low abundances using the mutations defining the SARS-CoV-2 lineage B.1.1.7 (alpha variant) as a case study. Real sequencing data were in silico modified by introducing mutations of interest into raw wild-type sequencing data, or by mixing wild-type and mutant raw sequencing data, to construct mixed samples subjected to WGS using a tiling amplicon-based targeted metagenomics approach and Illumina sequencing. As anticipated, higher variation and lower sensitivity were observed at lower coverages and allelic frequencies. We found that detection of all low-frequency variants at an abundance of 10, 5, 3, and 1%, requires at least a sequencing coverage of 250, 500, 1500, and 10,000×, respectively. Although increasing variability of estimated allelic frequencies at decreasing coverages and lower allelic frequencies was observed, its impact on reliable quantification was limited. This study provides a highly sensitive low-frequency variant detection approach, which is publicly available at https://galaxy.sciensano.be, and specific recommendations for minimum sequencing coverages to detect clade-defining mutations at certain allelic frequencies. This approach will be useful to detect and quantify low-frequency variants in both diagnostic (e.g., co-infections and quasispecies) and wastewater [e.g., multiple variants of concern (VOCs)] samples.

1 Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium, 2 Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium, 3 VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium, 4 Quality of Laboratories, Sciensano, Brussels, Belgium, 5 Expertise and Service Provision, Sciensano, Brussels, Belgium The ongoing COVID-19 pandemic, caused by SARS-CoV-2, constitutes a tremendous global health issue. Continuous monitoring of the virus has become a cornerstone to make rational decisions on implementing societal and sanitary measures to curtail the virus spread. Additionally, emerging SARS-CoV-2 variants have increased the need for genomic surveillance to detect particular strains because of their potentially increased transmissibility, pathogenicity and immune escape. Targeted SARS-CoV-2 sequencing of diagnostic and wastewater samples has been explored as an epidemiological surveillance method for the competent authorities. Currently, only the consensus genome sequence of the most abundant strain is taken into consideration for analysis, but multiple variant strains are now circulating in the population. Consequently, in diagnostic samples, potential co-infection(s) by several different variants can occur or quasispecies can develop during an infection in an individual. In wastewater samples, multiple variant strains will often be simultaneously present. Currently, quality criteria are mainly available for constructing the consensus genome sequence, and some guidelines exist for the detection of co-infections and quasispecies in diagnostic samples. The performance of detection and quantification of low-frequency variants using whole genome sequencing (WGS) of SARS-CoV-2 remains largely unknown. Here, we evaluated the detection and quantification of mutations present at low abundances using the mutations defining the SARS-CoV-2 lineage B.1.1.7 (alpha variant) as a case study. Real sequencing data were in silico modified by introducing mutations of interest into raw wild-type sequencing data, or by mixing wild-type and mutant raw sequencing data, to construct mixed samples subjected to WGS using a tiling ampliconbased targeted metagenomics approach and Illumina sequencing. As anticipated, higher variation and lower sensitivity were observed at lower coverages and allelic frequencies. We found that detection of all low-frequency variants at an abundance of 10, 5, 3, and 1%, requires at least a sequencing coverage of 250, 500, 1500, and 10,000×, respectively. Although increasing variability of estimated allelic frequencies at

INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the ongoing COVID-19 pandemic . To limit the spread of disease, governments were forced to take drastic measures due to the high potential for human-to-human transmission and the lack of immunity in the population (Leclerc et al., 2020). SARS-CoV-2 spreads very easily during close person-to-person contact (Azuma et al., 2020). Consequently, the individual diagnostic testing for SARS-CoV-2 on respiratory samples using reverse transcription quantitative polymerase chain reaction (RT-qPCR) is essential for the diagnosis of patients presenting COVID-19 symptoms for appropriate clinical treatment and isolation, as well as for tracing potential contact transmissions, including asymptomatic individuals. Systematic individual SARS-CoV-2 diagnostics are also used to test certain population cohorts, such as primary caregivers, to avoid transmission of the virus to vulnerable people, such as the elderly (Bayle et al., 2021).
Data from individual diagnostics are also collected and analyzed for surveillance by National Reference Centres to assist governments to monitor the epidemiological situation. The efficiency of this strategy for epidemiological monitoring depends greatly on the extent of testing the complete population. Additionally, it may be biased by the willingness of individuals, covering all population ages, to get tested, whether individuals are aware of being infected, and visitors to a certain country not always being included in the testing strategy. Moreover, despite having a relatively low per-sample cost, the high volume of required tests incurs substantial costs for public health systems for which testing capacities can be exceeded during periods of intense circulation of the virus (Contreras et al., 2021). The detection of newly emerging SARS-CoV-2 strains may be delayed by the lack of testing during such periods. As SARS-CoV-2 viral particles and mRNA have been isolated from feces of COVID-19 patients (Wu et al., 2020b;Zhang et al., 2020), monitoring of wastewater for SARS-CoV-2 has been explored as a complementary and independent alternative for epidemiological surveillance for the competent authorities (European Commission, 2021). Various studies have observed an association between an increase in reported COVID-19 cases and an increase of SARS-CoV-2 RNA concentrations in wastewater (Ahmed et al., 2020;Medema et al., 2020;Wu et al., 2020a). Wastewater-based monitoring could therefore be a cost-effective, non-invasive, easy to collect, and unbiased approach to track circulating virus strains in a community (Thompson et al., 2020). Compared to clinical surveillance, wastewater surveillance could also provide opportunities to estimate the prevalence of the virus and assess its geographic distribution and genetic diversity (Sinclair et al., 2008;Xagoraraki and O'Brien, 2020), and can be used as a non-invasive early-warning system for alerting public health authorities to the potential (re-)emergence of COVID-19 infections (Panchal et al., 2021). Alternatively, the absence of the virus in wastewater surveillance could indicate that an area can be considered at low risk for SARS-CoV-2 infections (European Commission, 2021).
Although the mutation rate of SARS-CoV-2 is estimated as being low compared to other RNA viruses (Duchene et al., 2020), several new variants carrying multiple mutations have already emerged. Some of these variants are characterized by a potential enhanced transmissibility, and can cause more severe infections and/or potential vaccine escape (SAGE-EMG, SPI-B, Tranmission Group, 2020; Davies et al., 2021; GOV.UK -Scientific Advisory Group for Emergencies, 2021; Greaney et al., 2021;Hoffmann et al., 2021). Consequently, monitoring current and potential future variants is crucial to control the epidemic by taking timely measures because these variants can affect epidemiological dynamics, vaccine effectiveness and disease burden.
To monitor SARS-CoV-2 variants, RT-qPCR methods were designed to detect a selection of the mutations that define specific variants of concern (VOCs). VOCs are, however, defined by a combination of multiple mutations and only few mutations can be targeted by RT-qPCR assays. This approach is not sustainable because it is likely that the ongoing vaccination and increased herd immunity will result in the selection of new mutations and emergence of new VOCs (Gómez et al., 2021), as has been observed with other viruses (Boni, 2008;Shao et al., 2017). Since only a few mutations can be targeted by a RT-qPCR assay, an additional step of whole genome sequencing (WGS) is required to fully confirm the variant's sequence (Bal et al., 2021).
Whole genome sequencing has been used to understand the viral evolution, epidemiology and impact of SARS-CoV-2 resulting in, as of July 2021, more than 2,000,000 publicly available SARS-CoV-2 genome sequences, mainly derived from respiratory samples that are frequently submitted to the Global Initiative on Sharing Avian Influenza Data (GISAID) database (Shu and McCauley, 2017). Most of these sequences were obtained using amplicon sequencing in combination with the Illumina or Nanopore technology, with Illumina still being the most commonly used method (Shu and McCauley, 2017;Charre et al., 2020). This large amount of genomes allows reliable detection of variants based on the consensus genome sequence in patient samples (van Dorp et al., 2020;Firestone et al., 2021;Hartley et al., 2021;Lin et al., 2021). The European Centre for Disease Prevention and Control (ECDC) has defined several quality criteria for diagnostic samples depending on the application. For most genomic surveillance objectives, a consensus sequence of the (near-)complete genome is sufficient and a minimal read length of 100 bp and minimal coverage of 10× across more than 95% of the genome is recommended. To reliably trace direct transmission and/or reinfection, a higher sequencing coverage of 500× across more than 95% of the genome is recommended for determining lowfrequency variants (LFV) that can significantly contribute to the evidence for reinfection or direct transmission. In-depth genome analysis, including recombination, rearrangement, haplotype reconstruction and large insertions and deletions (indel) detection, should be investigated using long-read sequencing technologies with a recommended read length of minimally 1000 bp and a sequencing coverage of 500× across more than 95% of the genome (ECDC, 2021). A few studies evaluated quasispecies in diagnostic samples by only evaluating positions with a minimum depth of 100× (Lythgoe et al., 2021), by employing a minimum AF of 2% and a minimum depth of 500× (Siqueira et al., 2021) or by using Lofreq with a false discovery rate cut-off of 1%, minimum coverage of 10×, dynamic Bonferroni correction for variant quality and strand bias filtering (Karim et al., 2021). Due to the high cost of sequencing large quantities of samples from individual patients, samples that tested positive for a selection of mutations related to VOCs using RT-qPCR and have a sufficiently high viral load are typically sequenced. Consequently, only a subset of all circulating variants is detected during routine clinical surveillance. Since wastewater samples contain both SARS-CoV-2 RNA from symptomatic and asymptomatic individuals, sequencing wastewater samples can provide a more comprehensive picture of the genomic diversity of SARS-CoV-2 circulating in the population compared to individual diagnostic testing and sequencing. Wastewater surveillance of SARS-CoV-2 may therefore be of considerable added value for SARS-CoV-2 genomic surveillance by providing a cost-effective, rapid, and reliable source of information on the spread of SARS-CoV-2 variants in the population.
Sequencing of wastewater samples is, however, currently mainly used to reconstruct the consensus genome sequence of the most prevalent SARS-CoV-2 strain in the sample and LFV are often not investigated (Nemudryi et al., 2020;Bar-Or et al., 2021;Crits-Christoph et al., 2021;Sharif et al., 2021). This consensus sequence can be useful to demonstrate that the detected strain in wastewater corresponds to the dominant strain that circulates in individuals within the same community (Crits-Christoph et al., 2021). However, similarly to diagnostic samples, only limited quality criteria are in place when sequencing wastewater samples and those available often only apply for consensus sequence construction. The EU recommends the generation of one million reads per sample and a read length of more than 100 bp (European Commission, 2021). A few studies evaluated LFV in wastewater samples, by using local haplotype reconstruction with ShoRAH (Jahn et al., 2021) or iVar and setting up a minimum coverage of 50×, Phred score of ≥30 and a minimal allelic frequency (AF) of 10% (Izquierdo-Lara et al., 2021) or a minimum base quality filter of 20 with a minimum coverage of 100× (Rios et al., 2021). However, none of these studies evaluated their approach on well-defined populations nor determined detection thresholds for retaining LFV. Since multiple VOCs may co-circulate in a given population, their relative abundance is expected to vary and potentially be very low in wastewater samples. While genome consensus variant calling workflows can only identify mutations present at high AFs, LFV calling methods have been specifically designed to call mutations at lower-than-consensus AFs, and are required to detect VOCs in wastewater samples that are present at an AF below 50%. Appropriate tools and statistical approaches should be provided to ensure reliable and comparable collection and analysis of data, because the detection of LFV is challenging due to the drop in confidence of called mutations at low AFs and sequencing coverages (Macalalad et al., 2012;Wilm et al., 2012;Isakov et al., 2015). High-quality sequencing reads are required to ensure that single nucleotide variants (SNVs) and indels can be reliably called and quantified. Most LFV calling algorithms therefore consider multiple sequencing characteristics such as strand bias, base quality, mapping quality, sequence context, and AF (McCrone et al., 2016) to delineate true variants from sequencing errors. Although the viral diversity in multiple WGSbased studies has been explored using several variant calling methods (Kundu et al., 2013;Rogers et al., 2015;Simon et al., 2019), they are often not benchmarked against defined viral populations, rendering the feasibility of using these methods for detecting SARS-CoV-2 VOCs in mixed samples for wastewater surveillance largely unknown.
In this study, we evaluate the performance of LFV detection and quantification based on targeted SARS-CoV-2 sequencing for mutations present at low abundances via the Illumina technology. We used mutations that define the B.1.1.7 lineage as a proof-of-concept. Using two real sequencing datasets that were in silico modified by either introducing mutations of interest into raw wild-type sequencing datasets or mixing wildtype and mutant raw sequencing data, we provide guidelines for minimum sequencing coverages to detect clade-defining mutations at specific AFs. This approach can be used to detect and quantify LFV in diagnostic samples (e.g., to detect coinfections and quasispecies) and wastewater samples (e.g., to detect multiple strains circulating in the population).

Employed Sequencing Data and Generation of Consensus Genome Sequences
SARS-CoV-2 raw sequencing data from 316 samples was downloaded from the Sequence Read Archive (SRA) (Leinonen et al., 2011). A random selection of samples was done on the 27 th of January 2021 from the COVID-19 Genomics UK (COG-UK) consortium (PRJEB37886) including only samples with a submission date in January 2021, sequenced with Illumina Novaseq 6000 and using an amplicon-based enrichment strategy (Supplementary File 1).
To ensure correct pairing of fastq files, all samples were re-paired using BBMap v38.89 repair.sh with default settings (Bushnell and BBMap, 2021) (Figure 1: Step 1). The consensus genome sequences were generated for all these samples (Figure 1: Step 2). The workflow was built using the Snakemake workflow management system using python 3.6.9 (Mölder et al., 2021). Next, the re-paired paired-end reads were trimmed using Trimmomatic v0.38 (Bolger et al., 2014) setting the following options: "LEADING:10", "TRAILING:10", "SLIDINGWINDOW:4:20" and "MINLEN:40". As reference genome for read mapping of the SRA samples, the sequence with GISAID accession number EPI_ISL_837246 was used for the wild-type samples, while EPI_ISL_747518 was used for the mutant samples. Both references were chosen based on the fact that they should have a complete genome according to GISAID. Additionally, these were chosen to be as close to the SRA data as possible based on their location of sampling (i.e., United Kingdom), sampling date that was in the same period as the data obtained from SRA (i.e., December 2020-January 2021), and whether or not it was classified as belonging to the B.1.1.7 lineage. These reference genomes were indexed using Bowtie2build v2.3.4.3 (Langmead and Salzberg, 2012). Trimmed reads were aligned to their respective reference genomes using Bowtie2 v2.3.4.3 using default parameters. The resulting SAM files were converted to BAM files using Samtools view v1.9 (Danecek et al., 2021) and sorted and indexed using the default settings of respectively Samtools sort and Samtools index v1.9. Using the sorted BAM file, a pileup file was generated with Samtools mpileup v1.9 using the options "--count-orphans" and "--VCF." Next, the variants were called with bcftools call v1.9 using the options "-O z", "--consensus-caller", "--variants-only" and "-ploidy 1", and converted and indexed to uncompressed VCF files with respectively bcftools view v1.9 using the options "--outputtype v" and bcftools index v1.9 using the option "--force." Lastly, a temporary consensus sequence was generated using bcftools consensus v1.9 with default settings, providing the reference genome and produced VCF file as inputs. Afterward, the previous steps were repeated once with the same options using the generated temporary consensus sequence as fasta reference to generate the final consensus sequence. These sequences were used to confirm either the presence or absence of the cladedefining mutations of the B.1.1.7 mutant for both the mutant and wild-type samples respectively (Table 1). To extract the sequencing coverage for each position and subsequently calculate the median coverage for each sample, Samtools depth v1.9 was used on the BAM files. Additionally, bamreadcount v0.8.0 1 was run on all samples using the BAM files to determine the coverage at each position.
From the initial 316 samples, ten mutant samples were selected that presented similar coverage depths at the positions of interest 1 https://github.com/genome/bam-readcount after normalization (see below). These samples contained the mutations assigned to the B.1.1.7 variant. Ten wild-type samples were also chosen that did not contain any of these mutations (Tables 1, 2) and also presented similar coverage depth at the positions of interest after normalization. Lineage B.1.1.7, termed Variant of Concern (VOC) 202012/01 by Public Health England (PHE) (Public Health England, 2021), 20I/501Y.V1 by Nextstrain (Centers for Disease Control and Prevention [CDC], 2021) and alpha variant by the World Health Organization (WHO, 2021), was first reported in the United Kingdom but became the dominant strain in many European countries until the emergence of the delta variant since mid-April 2021 (Mishra et al., 2021). The B.1.1.7 variant was found to be more transmissible (Davies et al., 2021) and may cause more severe infections (SAGE-EMG, SPI-B, Tranmission Group, 2020; GOV.UK -Scientific Advisory Group for Emergencies, 2021). Lineage B.1.1.7 is defined by multiple spike protein changes, including deletion 69-70 and deletion 144 in the N-terminal domain, amino changes N501Y in the receptorbinding domain, and amino acid changes A570D, P681H, T716I, S982A, D1118H, as well as mutations in other genomic regions (Rambaut et al., 2021). More recently PHE has reported B.1.1.7 cases with an additional mutation, E484K (Public Health England, 2021). Median coverages of the selected samples were consistently high (minimum 13,848×; maximum 36,255×) and median read lengths were always 221 and 201 for the forward and reverse reads respectively ( Table 2). Additionally, as suggested by ECDC, more than 95% of the genome was covered by reads with a minimal coverage of 500× (ECDC, 2021).

Low-Frequency Variants Detection
The absence of pre-existing wild-type and mutant LFV at the positions defining lineage B.1.1.7 (Table 1) was verified in both the mutant and wild-type samples (Figure 1: Step 3), respectively, by calling all LFV in these samples and subsequently checking the positions of interest. Python 3.6.9 was used with the packages pysam 0.16.0.1 (Li et al., 2009) and numpy 1.19.5 (Harris et al., 2020). Each generated (final) consensus FASTA file for each sample coming from SRA was used as reference for its respective sample and indexed using Samtools faidx v1.9 and Bowtie2-build v2.3.4.3. Bowtie2 v2.3.4.3 was then used to align the reads of each sample to its reference sequence, producing a SAM file that was converted into BAM using Samtools view v1.9. Next, reads were sorted using Picard SortSam v2.18.14 2 with the option "SORT_ORDER = coordinate" and Picard CreateSequenceDictionary v2.18.14 (Broad Institute) 3 was used to generate a dictionary of the reference FASTA file. Picard AddOrReplaceReadGroups v2.18.14 (Broad Institute) (see footnote 3) was afterward run on the reads with the flags "LB", "PL", "PU", and "SM" set to the arbitrary placeholder value "test." The resulting BAM files were indexed using Samtools index v1.9 and used as input for GATK RealignerTargetCreator 3.7 (McKenna et al., 2010), which was followed by indel realignment using GATK IndelRealigner v3.7 ( McKenna et al., 2010). Next, generated BAM files were indexed using Samtools index v1.9. The call function of the LoFreq v2.1.3.1 package (Wilm et al., 2012) was used to call LFV in the BAM files and generate a VCF file using the options "--call-indels" and "--no-default-filter" and using the consensus sequence as reference to call LFV. Next, the unfiltered VCF The first, second, and third columns present respectively the gene name, cDNAlevel mutation and protein-level mutation. The last column describes whether the position is covered by one or two amplicons from the enrichment panel (Supplementary Table 1). (*) One adaptation was observed for position 26 801.
In the wild-type strains a G was observed in contrast to Rambaut et al. (2021) where a T was observed. (**) Due to the tiled amplicon approach used to amplify the samples prior to sequencing, the regions where amplicons overlapped resulted in a double coverage. Mutation C27972T was positioned in such an overlap in the wild-type, but not in the mutant. WT, wild-type.
file was filtered using the filter function of the LoFreq v2.1.3.1 package, setting the strand bias threshold for reporting a variant to the maximum allowed value by using the option "--sb-thresh 2147483647" to allow highly strand-biased variants to be retained, to account for the non-random distribution of reads due to the design of the amplification panel. All employed scripts are available in Supplementary File 2. Additionally, the workflow is also available at the public Galaxy instance of our institute at https://galaxy.sciensano.be as a free resource for academic and non-profit usage. The presence of the nucleotides assigned to the B.1.1.7 lineage or the wild-type (Table 1) was verified for the mutant and wild-type samples, respectively. Additionally, it was checked that there were no LFV at these positions, so that the wild-type nucleotide or mutant nucleotide was always present at 100% for the retained 10 WT and 10 mutant samples.
v2.27.1 (Quinlan and Hall, 2010). Finally the LFV detection workflow (Figure 1: Step 3) described in section "Low-Frequency Variants Detection" was used on these 10 samples for all 364 conditions using the FASTA file that was generated for the wildtype samples from SRA as reference with LoFreq.

Dataset 2: Introduction of Mutations of Interest by Mixing Wild-Type and Mutant Raw Sequencing Read Datasets
For the second dataset (Figure 1: Step 5), the coverage of all 20 samples ( Table 2) was normalized to 5000× using BBMap v38.89 bbnorm.sh (Bushnell and BBMap, 2021) with the options "target = 5000", "mindepth = 5", "fixspikes = f ", "passes = 3" and "uselowerdepth = t". However, due to the tiled amplicon approach used to amplify these samples prior to sequencing, regions where amplicons overlapped subsequently had double coverage resulting in two coverages, i.e., 5000 and 10,000×, after normalization (Supplementary Table 1). In silico datasets were then generated by mixing the appropriate number of reads for every combination of the ten wild-type and ten mutant samples, resulting in a total of 100 mixed samples, which were down-sampled using "seqtk sample" (with option "-s100") to the appropriate fractions for the required combination of 13 final coverages (100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000×) and 26 AF (mutant: 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 20, 30, 40, 50, and 100%). This resulted in 100 mixed samples at 338 conditions (i.e., combination of coverage and AF). Finally, the LFV detection workflow (Figure 1: Step 3) described in section "Low-Frequency Variants Detection" was used on these samples for all conditions using the FASTA file that was generated for the wild-type samples from SRA as reference, except for samples with 100% AF for the mutant positions where the FASTA file of the mutant sample was used. Although the second dataset was normalized for total coverage at every genomic position, the tiled amplicon approach resulted in some genomic positions being covered by two overlapping amplicons. Two groups of mutations were therefore obtained for every coverage ( Table 2), i.e., for a targeted coverage of 5000×, 17 mutations were present at ∼5000× (C913T, C3267T, C5388A, C5986T, T6954C, 11288-11296 deletion, C14676T, C15279T, 21765-21770 deletion, A23063T, C23271A, C23604A, C23709T, T24056G, G26801C, G28048T, and C28977T) and 7 mutations were present at ∼10,000× (T16176C, 21991-21993 deletion, G24914C, A28111G, G28280C, A28281T, and T28282A). Mutation C27972T was excluded from further analysis, because this position in the wild-type samples was located in a region where amplicons overlapped resulting in a coverage of approximately 10,000×, while in mutant samples it was in a region with no overlap and where a coverage of 5000× was therefore observed (Supplementary Table 1). For further analysis, the results were pooled together per theoretical coverage resulting in 24 mutations per coverage but only 17 and 7 mutations at the lowest (i.e., 100×) and highest (i.e., 10,000×) coverage, respectively (Supplementary Table 2). The actual median coverage was calculated per theoretical targeted coverage using the output of bamreadcount v0.8.0 of each sample. Using this output, the coverage of each position of interest was extracted (Supplementary Table 2).

Qualitative Evaluation of Detection of B.1.1.7 at Different Abundances
Since samples of Dataset 1 were normalized for the total median coverage, different individual positions of interest could exhibit deviating coverages. For the qualitative evaluation of LFV detection (i.e., can mutant positions of interest be correctly detected?), the number of false negatives was counted per condition (i.e., combination of AF and coverage) and divided by the total number of observations [i.e., the number of samples (n = 10) and number of mutations considered for that condition (n = 25)]. A mutant position of interest was considered as correctly detected as soon as it was detected by LoFreq, irrespective of its estimated AF.
Dataset 2 was subjected to the same qualitative evaluation as described for Dataset 1. The number of false negatives per condition was divided by the number of observations (i.e., the number of samples (n = 100) and number of mutations considered for that condition [either n = 7, 17, or 24)].
The visualization of the qualitative evaluation was performed using a contour plot from the R package plotly (RStudio 1.0.153; R3.6.1) (Sievert, 2020). The false negative (FN) proportion in the qualitative evaluation plots ranged from 0 to 1 with a step size of 0.1.

Quantitative Evaluation of Detection of B.1.1.7 at Different Abundances
For the quantitative evaluation of LFV detection (i.e., is the estimated AF of correctly detected mutant positions of interest close to the true AF?) of both datasets, FN values were considered as 'below the quantification limit' with the quantification limit equal to the lowest recorded value for that condition (i.e., combination of AF and coverage). Outliers were identified for each condition using the Grubbs test that was sequentially applied by first searching for two outliers at the same side, followed by a search for exactly one outlier. If the p-value of the Grubbs test was below 0.05, outliers were excluded. The standard deviation (SD) and mean value of AF for every condition were estimated by a maximum likelihood model based on the normal distribution that took the FN into account as censor data. Data were modeled according to a normal distribution. If the percentage of FN results was above 75%, the condition was, however, excluded from quantitative evaluation. Finally, a performance metric describing closeness to the true AF was calculated for each targeted AF individually by dividing each pooled squared SD by the maximal pooled squared SD. This metric will range between 0, relatively the closest to the targeted AF, and 1, relatively the furthest from the targeted AF.
As described for the qualitative evaluation, contour plots from the R package plotly (RStudio 1.0.153; R3.6.1) were used for the visualization of the quantitative evaluation. The performance metric in the quantitative evaluation plots ranged from 0 to 1 with a step size of 0.1. To construct samples using targeted SARS-CoV-2 sequencing with a VOC present at low abundances in the viral population, B.1.1.7 clade-defining mutations were first in silico introduced at well-defined AFs and coverages in real sequencing data ("Dataset 1") of ten wild-type samples, without, however, using any coverage normalization so that individual mutations could be present at higher or lower coverages compared to the total median genomic coverage due to unevenness of coverage. To assess whether introduced mutations were correctly detected, or alternatively missed as FN, samples of this dataset were analyzed using a LFV calling workflow based on LoFreq. Figure 2A depicts the proportion of FN observations, and corresponding values are presented in Table 3, for all evaluated coverages and targeted AFs until 20%. Results for all targeted AFs (including higher values) are presented in Supplementary  Figure 1 and Supplementary Table 3. All LFV could be detected at an AF of 1% at a median coverage of 10,000×. As the coverage decreased, the AF threshold at which no single FN occurred (i.e., perfect sensitivity) increased to 1.5% at 5000×, 3% at 1000×, 5% at 500×, 9.5% at 250×, and 20% at 100×. When allowing a maximum of 10% FN (i.e., sensitivity of 90%), the AF thresholds decreased substantially to 1% at 5000×, 1.5% at 1000×, 2.5% at 500×, 4% at 250×, and 7.5% at 100×. No false positive mutations related to the mutant and wild-type were observed at, respectively, 0 and 100% AF.

Qualitative Evaluation
A second approach was also considered for constructing samples using targeted SARS-CoV-2 virus sequencing with a VOC present at low abundances, by in silico mixing real raw sequencing reads from ten B.1.1.7 samples into ten wild-type samples ("Dataset 2") for a total of 100 mixes at well-defined AFs and coverages, while applying coverage normalization so that individual mutations were present at approximately similar coverages for all B.1.1.7 clade-defining positions. Figure 2B depicts the proportion of FN observations, and actual values are presented in Table 4, for all evaluated coverages and targeted AF until 20%. Results for higher targeted AF are presented in Supplementary Figure 2 and Supplementary  Table 4. All LFV could be detected at an AF of 1% at a median coverage of 9792×. As the coverage decreased, the AF thresholds at which no single FN occurred (i.e., perfect sensitivity) increased to 1.5% at 4851×, 3.5% at 969×, 4% at 482×, 7% at 237×, and 20% at 97×. However, when allowing a maximum of 10% FN (i.e., reducing the sensitivity to 90%), the AF thresholds decreased substantially to 1% at 4851×, 2% at 969×, 3% at 482×, 4% at 237×, and 7% at 97×. No false positive mutations related to the mutant and wild-type were observed at 0 and 100%, respectively. Overall, the results for Dataset 1, using the median coverages, and Dataset 2, using the coverages at the positions of interest, were qualitatively similar. To evaluate the possibility of quantifying LFV in both datasets, the SDs of available observations were first evaluated for each condition (i.e., combination of AF and coverage). This provisional analysis indicated that for both Dataset 1 (Supplementary File 3) and Dataset 2 (Supplementary File 4), the SD systematically decreased per target AF as coverage increased. This provisional analysis also indicated that for both datasets, irrespective of coverage, the SD generally increased between a targeted AF of 1 to 10%, after which it plateaued for targeted AFs above 20%. We therefore employed the squared SD per AF divided by the maximal squared SD per target AF to describe closeness of observed AF to the true AF, for which results are presented in Figure 3A for Dataset 1. As expected, the variation in AF estimates fluctuates in function of the median coverage and targeted AF, with variation decreasing per target AF as coverage increased, but also variation being generally more pronounced at low AFs irrespective of coverage. Notwithstanding, even for regions in Figure 3A exhibiting high variation, the variability overall remained small (Supplementary  File 3). The interquartile range (IQR) (Supplementary File 3D) of the observed AF was still limited at the various targeted AF ranging from 0.62-6.26% at an AF of 50%, 0.36-3.49% at an AF of 10% and 0.27-2.07% at an AF of 5% with the highest IQR observed at lower coverages.

Quantitative Evaluation Demonstrates
Results for the quantitative evaluation of Dataset 2 are presented in Figure 3B, and are in accordance with the trends observed for Dataset 1 with the variation decreasing per target AF as coverage increased, and lower target AFs exhibiting increasing variation irrespective of coverage. Notwithstanding, similarly to Dataset 1, the observed total variation remained small (Supplementary File 4). The IQR (Supplementary File 4D) of the observed AF was limited at the various targeted AF ranging from 0.73-3.93% at an AF of 50%, 0.41-3.93% at an AF of 10% and 0.29-2.27% at an AF of 5% with the highest IQR observed at lower coverages.

DISCUSSION
Whole genome sequencing is a more powerful approach than RT-qPCR to track both existing and newly emerging SARS-CoV-2 variants. WGS is currently, however, mainly used to construct the consensus genome sequence and determine the most prevalent strain in communities, but interest exists in its potential for detecting LFV both within diagnostic samples to detect co-infections and quasispecies, and wastewater samples to determine all circulating variants in a population FIGURE 2 | Qualitative evaluation of Dataset 1 (A) and Dataset 2 (B) based on false negative proportions per condition until a targeted mutant AF of 20%. Orange and red dots represent conditions with a FN proportion between 0 and 0.1, and between 0.1 and 1, respectively. The percentage of FN is colored ranging from 0 (dark) to 1 (light) in intervals of 0.1 as extrapolated using a contour plot in the R package plotly (Sievert, 2020) (actual FN proportions are presented in Table 3 for Dataset 1 and Table 4 for Dataset 2). Results for targeted mutant AF values > 20% are presented in Supplementary Figure 1 for Dataset 1 and Supplementary  Figure 2 for Dataset 2. Both the X-and Y -axis follow a logarithmic scale.
(European Commission, 2021). To evaluate the potential of targeted amplicon-based SARS-CoV-2 WGS to detect and quantify LFVs at low abundances, we assessed the performance of a workflow designed for LFV detection in WGS data. Mutations defining lineage B.1.1.7 were employed as a proofof-concept using an approach based on in silico modifying real sequencing data to construct two datasets with the Illumina technology. These two datasets comprise in total 35,100 different samples, which results in a thorough in silico analysis requiring a considerable amount of computational calculation hours to validate this approach. For the first dataset, lineage B.1.1.7defining mutations were introduced in silico into raw wildtype sequencing datasets. For the second dataset, the same mutations were introduced by mixing wild-type and B.1.1.7 raw sequencing datasets. In Dataset 1, the coverage profiles of samples corresponded to a typical real dataset including large fluctuations in sequencing coverage at certain positions. In Dataset 2, sequencing coverages were normalized, which allowed evaluating with high precision how reliable AF detection is at specific coverages. Afterward, the ability to both detect and quantify LFV was evaluated. Results demonstrated that WGS enabled detecting LFV with very high performance. As expected, lower coverages and AFs resulted in lower sensitivity and higher variability of estimated AFs. We found, employing the most conservative thresholds from either Datasets 1 or 2, that a sequencing coverage of 250, 500, 1500, and 10,000× is required to detect all LFV at an AF of 10, 5, 3, and 1%, respectively (Tables 3, 4). For quantification of variants, the variability remained overall small for all conditions respecting the thresholds above, resulting in reliable abundance estimations, despite the variability of estimated AF increasing at lower coverages and AF. Of note, it was observed that the profile of the genome coverage differed at some positions between wild-type and mutant samples indicating that the amplicon-based enrichment approach could possibly introduce a bias. Consequently, this should be considered when examining and quantifying the proportion of mutants in samples. Our results can serve as a reference for the scientific community to select appropriate thresholds for the AF and coverage. These could also be context-specific as a smaller or larger degree of false negatives might be warranted for specific applications, and can also be used as a baseline for determining the number of samples that can be multiplexed per run to optimize costefficiency of WGS.
With respect to diagnostic samples, this study illustrates it is feasible to use targeted amplicon-based metagenomic approaches to detect co-infections and quasispecies in diagnostic samples. There are currently only limited guidelines available regarding the coverage and AF for such samples and those criteria were not assessed using predefined populations. ECDC has provided limited quality criteria regarding the sequencing coverage, namely 500× across 95% of the genome to detect LFV, but has not indicated the corresponding AF thresholds this corresponds to for reliable LFV detection (ECDC, 2021). Based on the results obtained in this study, a coverage of 500× allowed to detect LFV until an AF of 5% with perfect sensitivity and would therefore be less suited to detect LFV at lower AFs. Lythgoe et al. (2021) recommended a depth of at least 100 reads with an AF of at least 3% to detect the LFV in diagnostic samples with high viral loads (50,000 uniquely mapped reads), while Siqueira et al. (2021) used an AF threshold of 2% and a minimal depth coverage of 500 reads and Karim et al. (2021) adopted an AF of 1% and a minimal depth coverage of 10×. Based on the results in this study, these recommendations appear not sufficiently strict, since we observed that an AF of 1, 2, and 3%, requires at least a sequencing coverage of 10,000, 2500, and 1500× to detect all LFV or 3500, 1000, and 500× to detect 90% of LFV, respectively. However, our study is limited to in silico modified data from real diagnostic samples, so these results will need to be validated using real samples with well-established existing LFV in future research.
With respect to wastewater samples, our findings also corroborate the feasibility of using targeted amplicon-based metagenomics approaches for wastewater surveillance, as such samples comprise a collection of different strains, among which the dominant strain will define the consensus sequence of the sample and the detected LFV will represent the circulating strains present at lower frequencies. Only very limited recommendations regarding wastewater sequencing are available by the competent authorities. The EU has recommended the generation of one million reads per sample with a read length of minimum 100 bp which corresponds to a minimum coverage of 3333× using the Lander/Waterman equation (European Commission, 2021). Based on the results obtained in this study, a coverage of 3000 and 3500× allowed to detect LFV until an AF of 2 and 1.5% respectively with perfect sensitivity. Other studies that investigated LFV in wastewater have provided limited quality criteria regarding the coverage and AF. Furthermore, the quality criteria in those studies were not evaluated using a defined population (Izquierdo-Lara et al., 2020;Jahn et al., 2021). Izquierdo-Lara used a minimum depth coverage of 50× and minimum AF of 10% (Izquierdo-Lara et al., 2021), while Rios et al. (2021) adopted a minimum depth coverage of 100× without indicating an AF threshold. Based on the results in this study, these recommendations appear not sufficiently strict as a sequencing coverage of 100× and 250× at an AF of 20 and 10% respectively was required to observe all LFV.
Obtaining high-quality sequencing reads for wastewater samples may, however, be challenging under real-world conditions. In contrast to diagnostic samples in which viral loads are typically high, ranging from 10 4 to 10 7 copies/mL (Pan et al., 2020), viral RNA loads in wastewater samples are often low, ranging from 10 −1 to 10 3.5 copies/mL (Saawarn and Hait, 2021). This renders it more challenging to sequence samples with a low viral load in addition to the RNA degradation that occurs in wastewater samples. Additionally, variants circulating at low frequencies in a community are expected to be present at a low AF in wastewater samples. Nevertheless, employing the most conservative thresholds from either Datasets 1 or 2, 90% of LFV present at an AF of 10, 5, 3, and 1% were still detected at a sequencing coverage of 100, 250, 500, and 2500×, respectively (Tables 3, 4).
This study focused on the sensitivity of LFV detection and did not explore the false positive rates (i.e., specificity). Although our recommendations for AFs and coverages ensure high sensitivity, often an inverse relationship exists between sensitivity and specificity and we can therefore not exclude that false positives occur for AF and coverage combinations considered as providing qualitative results in this study. A false positive detection is, however, typically less problematic compared to a false negative result as the former can still be discovered in follow-up investigation in contrast to the latter. Additionally, false positive observations typically occur randomly over the genome (McCrone et al., 2016) and it is unlikely that all VOCdefining mutations would be simultaneously falsely detected, even at low AFs and coverages. The issue of low viral loads, low expected AF and potential false positives could be mitigated by sequencing samples in duplicate when necessary. Possible false positive results could be investigated using RT-qPCR or RT-ddPCR assays that target those specific positions.  The percentage of FN is colored ranging from 0 (dark) to 1 (light) according to the gradient depicted in Figure 2A.  The percentage of FN is colored ranging from 0 (dark) to 1 (light) according to the gradient depicted in Figure 2B.
Frontiers in Microbiology | www.frontiersin.org FIGURE 3 | Quantitative evaluation of Dataset 1 (A) and Dataset 2 (B) using the squared SD divided by the maximal squared SD per targeted AF. The figure is colored ranging from 0 (dark) to 1 (light) in intervals of 0.1 as extrapolated using a contour plot in the R package plotly (Sievert, 2020) (actual values are presented in Supplementary File 3 for Dataset 1 and Supplementary File 4 for Dataset 2). Both the X-and Y -axis follow a logarithmic scale. Conditions with a FN proportion higher than 75% were excluded and correspond to the white plane in the lower left corner.
In this study, the B.1.1.7 variant and a WT (i.e., non-VOC) background of the same time period and location were used as a proof-of-concept, but can be considered to also apply to other combinations (e.g., two VOCs), since additional VOCs in the sample material will translate into more VOC-defining mutations in the background genomic material that will be independently identified by the variant calling engine. In the presence of multiple VOCs, the VOCs can be identified by composing all possibly existing combinations of LFV as a conservative strategy, although multiple VOCs in one sample will also make the estimation of the relative abundance of each VOC more complicated. If multiple VOCs with partially overlapping defining mutations would be present in a wastewater sample, some mutations of interest would consequently be present at different AFs. Haplotype reconstruction methods could be used in such situations to delineate VOCs. However, most haplotype reconstruction programs perform poorly under higher levels of diversity, and haplotype populations with rare haplotypes are often not recovered (Eliseev et al., 2020). Although haplotype reconstruction has been described for short reads, Nanopore sequencing might offer a substantial advantage for such cases due to its longer reads, despite their higher error rate, to perform haplotype estimation to delineate actual VOCs.

CONCLUSION
There exists a pressing need for recommendations for detecting LFV for both diagnostic samples and wastewater surveillance. Further investigation will be required to investigate the specificity and possibility to detect VOCs instead of just mutations, including for other existing and employed methodologies such as probe-based capture, other amplicon-based methods, and Nanopore sequencing. Nevertheless, using in silico modified data derived from WGS of real diagnostic samples, this study demonstrates the feasibility of a targeted metagenomics approach for highly sensitive LFV detection with acceptable relative abundance estimations using a tiled-amplicon enrichment based on the Illumina technology. This approach enables the detection of mutations associated with specific VOCs. Our approach could be used to evaluate the potential occurrence of coinfections with other SARS-CoV-2 variants with different strains in diagnostic samples. It can also be employed to detect multiple strains for wastewater surveillance, although several additional challenges exist for wastewater samples such as low viral load and potential RNA degradation. Since in this study, high-quality data from diagnostic samples was used and modified in silico to construct datasets to provide guidelines for sequencing wastewater and diagnostic samples with coinfections, future work will need to consider data coming from samples that are closer to real data from actual diagnostic and wastewater surveillance. In light of the pandemic urgency, and the multiple SARS-CoV-2 wastewater surveillance initiatives that are being established and also being integrated into overarching coordination and preparedness initiatives such as the recently announced European Health Emergency Preparedness and Response Authority (European Commission, 2021), we hope that our results will help establishing guidance and recommendations for wastewater surveillance and other relevant applications.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
NR, KV, and XS: conceptualization. NR: project administration. LV: data curation, investigation, and visualization. LV, TD, WC, SD, NR, and KV: methodology. LV, TD, and WC: software and formal analysis. LV and TD: validation. LV, TD, NR, and KV: writing -original draft preparation. NR and PH: funding acquisition. NR and KV: supervision. All authors: writingreview and editing.

FUNDING
This study was financed by Sciensano through COVID-19 special funding.