Rapid and Affordable High Throughput Screening of SARS-CoV-2 Variants Using Denaturing High-Performance Liquid Chromatography Analysis

Mutations in the receptor binding domain (RBD) of SARS-CoV-2 alter the infectivity, pathogenicity, and transmissibility of new variants of concern (VOCs). In addition, those mutations cause immune escape, undermining the population immunity induced by ongoing mass vaccination programs. There is an urgent need for novel strategies and techniques aimed at the surveillance of the active emergence and spread of the VOCs. The aim of this study was to provide a quick, cheap and straightforward denaturing high-performance liquid chromatography (DHPLC) method for the prompt identification of the SARS-CoV-2 VOCs. Two PCRs were designed to target the RBD region, spanning residues N417 through N501 of the Spike protein. Furthermore, a DHPLC screening analysis was set up. The screening consisted of mixing the unknown sample with a standard sample of a known variant, denaturing at high temperature, renaturing at room temperature, and then performing a 2-minute run on the WAVE DHPLC system to detect the heteroduplexes which invariably form whenever the unknown sample has a nucleotide difference with respect to the standard used. The workflow readily detected all the variants, including the B.1.1.7, P.1, B.1.585 and B.1.617.2 lineages, at a very affordable cost. The DHPLC analysis was robust, identifying variants even in samples with very unbalanced target concentrations, including samples at the limit of detection. This approach has the potential to greatly expedite surveillance of the SARS-CoV-2 variants.


INTRODUCTION
The SARS-CoV-2 genome is more stable than those of other RNA viruses thanks to the proofreading activity of a 3'-to-5' exoribonuclease (nsp14-ExoN) during replication, which reduces the error rate of the RNA polymerase 100-1000-fold. This allows the virus to maintain its 30,000 nt genome without catastrophic mutational events compromising its integrity (1)(2)(3). Nonetheless, errors still occur in the SARS-CoV-2 genome at a higher rate than in eukaryotic cells and, together with high replication rates, allow for the accumulation of mutations in the viral genome including amino acid changes, truncations, or the loss of viral proteins (4)(5)(6). These changes may impact infectivity, pathogenicity, and transmissibility, and they could lead to higher fitness and undergo positive selection (5)(7)(8)(9).
The D614G substitution of the Spike (S) protein is the first and most investigated of the positively selected mutations. This mutation arose in the first months of 2020 and rapidly became prevalent worldwide due to the increased infectivity of the strains carrying it (4)(10)(11)(12).
In immunologically naïve COVID-19 patients, such as those prevailing at the beginning of the pandemic, the immune response exerts limited selection pressure on virus transmission. However, as the COVID-19 pandemic continues and vaccination programs expand, the rapid rise of population-level immunity is expected to exert a strong positive selection pressure for those mutations, or combinations thereof, responsible for immune escape. In other words, the speed at which resistance against acquired immunity develops in the population increases substantially as the number of infected or vaccinated individuals increases. As most vaccines exploit the immunogenicity of the Spike protein, due to its pivotal role in binding to the ACE2 cell receptor and entry into the host cell, the strict monitoring of mutations in the domains targeted by the neutralising antibodies should be implemented worldwide (13)(14)(15)(16)(17). Substitutions in the receptor binding domain (RBD) have emerged, and they are of particular concern due to the possibility of being responsible for immune escape (17). In fact, after the D614G variant, at least five variants (B.1.1.7 also known as the Alpha variant, B.1.351 also known as the Beta variant, P.1 also known as the Gamma variant, B.1.617.2 also known as the Delta variant, and B.1.1.529 also known as the Omicron variant) carrying several mutations in the RBD of the S protein (9,18) have emerged, impacting the transmission dynamics and causing successive epidemic waves. More specifically, these variants carry mutations of concern or mutations of interest in the 417, 440, 452, 477, 478, 484, 493, 496, 498, 501 and 505 codons of the S protein which have strong implications for infectivity and immune evasion (19,20).
Variants of concern (VOCs) which might escape vaccine immunity are being monitored by genomic surveillance based on next generation sequencing (NGS) implemented in wealthier and many developing countries. Mass-scale sequencing has had a dramatic impact on the response to the pandemic; it has enabled epidemiologic surveillance, allowing phylogenetic analysis to trace the origin and the spread, and to predict the evolution, of the epidemic waves (18)(21)(22)(23)(24)(25). However, the sequencing of all, or even most, of the isolates is still far from a reality. To date, only 21 of 160 countries have sequenced more than 1% of the confirmed cases, and 83 of the 160 have sequenced less than 0.1% of the positive cases (19,20). The countries with limited diagnostic capability and even more limited sequencing capacity are precisely those with less vaccinated populations and those in which new VOCs are more likely to emerge.
There is an urgent need for methods capable of permitting affordable genomic surveillance in the real world. Alternatives have been proposed to be integrated with sequencing in order to more effectively monitor the VOCs (26)(27)(28)(29). Heteroduplex analysis using denaturing high-performance liquid chromatography (DHPLC) is a fast, very sensitive, low cost and reliable technique for screening nucleic acid variations. It has been used for two decades in many different applications but, for the most part, for detecting cancer somatic mutations owing to its extreme analytical sensitivity (30). DHPLC discovers DNA variations by separating heteroduplex and homoduplex DNA fragments using ion-pair reverse-phase liquid chromatography (30). This technique is based on the principle that when two distinct PCR-amplified DNA targets containing nucleotide variations are denatured by heating and then left to renature by cooling, heteroduplexes are formed, in addition to homoduplexes, through cross-hybridisation between the mismatched strands. Heteroduplexes and homoduplexes are bound to a stationary phase and eluted by a denaturing gradient at a constant, slightly denaturing temperature. Since heteroduplexes have a lower stability, they are eluted before homoduplexes, and they are detected by ultraviolet absorption as different peaks modifying the chromatogram shape.
The aim of this study was to set up and evaluate a very quick, accurate and straightforward DHPLC method for scanning the RBD of SARS-CoV-2 isolates and, ultimately, for readily selecting the SARS-CoV-2 VOCs that should undergo full genome sequencing. The proposed workflow is intended to screen all the PCR positive samples using DHPLC in addition to those randomly undergoing full genome sequencing. All positive samples are examined with two overlapping endpoint PCRs targeting the RBD. The amplicons are then mixed with the standard represented by the predominant variant and assayed using DHPLC. Those samples positive for heteroduplexes at DHPLC then undergo sequencing of the amplicons and/or full genome sequencing. The overall workflow of the screening proposed is represented in Figure 1.

Samples and Experimental Design
The experimental layout consisted of a preliminary setup of the DHPLC method using open-label lineage-assigned SARS-CoV-2 isolates, combined with a validation phase using blind-label samples. All the RT-PCR-positive samples had been lineage assigned by means of sequencing using either the Oxford Nanopore or the Illumina platform.

RNA Purification and Endpoint RT-PCR
For the DHPLC screening, RNA was re-extracted starting from 200 µL of UTM-RT (Copan) samples using a commercial kit (Maxwell® RSC Blood RNA Kit) with an automatic instrument (Maxwell RSC).
The iScript cDNA Synthesis kit (Bio-Rad) was used to reverse transcribe the RNA samples to cDNA. In brief, 10 µL of RNA were mixed with 4 µL of iScript Reaction mix (containing MMLV RNase H, dNTPs, oligo(dT)s and random primers), 1 µL of iScript Reverse Transcriptase and nuclease-free water to a final volume of 20 µL, and incubated using the following protocol: 5 minutes at 25°C for priming, 20 minutes at 46°C for reverse transcription and 1 minute at 95°C for reverse transcriptase inactivation. The cDNA thereby obtained was subsequently used as a template for PCR.
Briefly, the PCR reactions consisted of 1X Buffer (Phusion HF Buffer, Life Technologies), 0.2 mM MgCl2, 350 nM each of the forward and reverse primers, 250 µM deoxynucleotide triphosphates (dNTPs), 0.4 U of Phusion Taq DNA polymerase (ThermoFisher) and 2 µL of cDNA template, brought up to a final volume of 20 µL with molecular biology grade water. The cycling programme for both conventional PCRs consisted of the following steps: 98°C for 30 s, 40 cycles at 98°C for 5 s, 63°C for 5 s and 72°C for 10 s, and a final elongation step at 72°C for 10 min.
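As a rough check on bench timing, the volumes and hold times quoted in the two preceding paragraphs can be tallied as in the sketch below. It only adds up the programmed hold times; ramping between temperatures depends on the thermal cycler and is not included, and the function names are illustrative and do not belong to any instrument software.

```python
# Hold times and volumes as stated in the text; ramping between
# temperatures is instrument dependent and deliberately ignored here.

RT_STEPS_MIN = [5, 20, 1]    # priming, reverse transcription, inactivation
PCR_INITIAL_S = 30           # initial denaturation at 98 °C
PCR_CYCLE_S = [5, 5, 10]     # 98 °C, 63 °C, 72 °C within each cycle
PCR_CYCLES = 40
PCR_FINAL_S = 10 * 60        # final elongation at 72 °C


def water_to_add_ul(final_volume_ul, component_volumes_ul):
    """Volume of water needed to reach the final reaction volume."""
    water = final_volume_ul - sum(component_volumes_ul)
    if water < 0:
        raise ValueError("components exceed the final volume")
    return water


def rt_hold_minutes():
    """Sum of the programmed reverse transcription holds, in minutes."""
    return sum(RT_STEPS_MIN)


def pcr_hold_seconds():
    """Sum of the programmed PCR holds, in seconds."""
    return PCR_INITIAL_S + PCR_CYCLES * sum(PCR_CYCLE_S) + PCR_FINAL_S


if __name__ == "__main__":
    # RT mix: 10 µL RNA + 4 µL reaction mix + 1 µL reverse transcriptase in 20 µL
    print("water for RT (µL):", water_to_add_ul(20, [10, 4, 1]))
    print("RT hold time (min):", rt_hold_minutes())
    print("PCR hold time (min):", round(pcr_hold_seconds() / 60, 1))
```

With these figures the programmed holds amount to 26 min for reverse transcription and just under 24 min for PCR, before any ramping time is added.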

DHPLC
In this study, heteroduplexes, if any, were generated by mixing the amplicons of the above-described endpoint PCRs of a standard sample (reference) and a test sample (unknown), denaturing the mixed fragments at 95°C for 2 min, and then allowing them to renature at room temperature for 15 min. The reference sample may conveniently be the predominant variant identified in a given period. In this study, either B.1, B.1.177 or B.1.617.2 was used as the standard. Each test sample was then run both as such and mixed with the reference on an automated DHPLC apparatus (WAVE DHPLC system, ADS Biotec) equipped with a proprietary column (DNASep, ADS Biotec) which uses alkylated non-porous polystyrene-divinylbenzene copolymer hydrophobic beads for high performance nucleic acid separations. Separation of the products was carried out with a mobile phase obtained by continuously mixing buffer B (0.1 mol/L triethylammonium acetate with 25% v/v acetonitrile in water) with buffer A (0.1 mol/L triethylammonium acetate), according to a gradient (Table 1) calculated by Navigator™ Software (ADS Biotec) and experimentally confirmed. Likewise, the optimal oven temperatures for heteroduplex separation were determined using Navigator Software (ADS Biotec), which gave a computer-assisted determination of the melting profile and analytical conditions for each fragment; these were then experimentally verified. In this study, partially denaturing temperatures of 55.5°C and 56.6°C were used for the DHPLC screening of both the RBD_1 and the RBD_2 amplicons.
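Because buffer A contains no acetonitrile and buffer B contains 25% v/v, the acetonitrile content of the mobile phase follows directly from the proportion of buffer B at any point of the gradient. The sketch below illustrates that relationship with a linear interpolation between gradient points. The gradient points used are hypothetical placeholders and are not the values of Table 1, and the function names are illustrative only.

```python
ACN_IN_BUFFER_B = 25.0  # % v/v acetonitrile in buffer B (from the text); buffer A has none


def acetonitrile_percent(percent_b):
    """Acetonitrile content (% v/v) of an A/B mixture containing percent_b of buffer B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must be between 0 and 100")
    return ACN_IN_BUFFER_B * percent_b / 100.0


def percent_b_at(time_min, gradient):
    """Linearly interpolate %B at a given time.

    gradient: list of (time_min, percent_b) points with strictly increasing times.
    Times outside the programmed range return the first or last %B value.
    """
    if time_min <= gradient[0][0]:
        return gradient[0][1]
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= time_min <= t1:
            return b0 + (b1 - b0) * (time_min - t0) / (t1 - t0)
    return gradient[-1][1]


# HYPOTHETICAL gradient points for illustration, not the values of Table 1.
EXAMPLE_GRADIENT = [(0.0, 50.0), (2.0, 60.0)]

if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0):
        b = percent_b_at(t, EXAMPLE_GRADIENT)
        print(f"t={t:.1f} min  %B={b:.1f}  acetonitrile={acetonitrile_percent(b):.2f}% v/v")
```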
In the DHPLC system, amplicons are screened for chromatogram shape and, in particular, for the presence or absence of more than one peak with respect to the reference control. Since the method is qualitative rather than quantitative, the peak height, and therefore peak quantitation, is not relevant. Moreover, the inherently high analytical sensitivity, as low as 1%, allows a heteroduplex peak to be detected even if the amounts of the mixed amplicons are not quantified and normalised.
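The tolerance to unbalanced inputs can be illustrated with an idealised reannealing model. The sketch below assumes, as a simplification not taken from this study, that after denaturation the strands of the two amplicons re-pair at random, so that a test amplicon making up a fraction p of the mixture yields a heteroduplex fraction of 2p(1 - p). It further assumes that the 1% figure can be read as the smallest detectable heteroduplex fraction. Both assumptions are for illustration only.

```python
import math


def duplex_fractions(p):
    """Expected duplex fractions for a test-amplicon fraction p (0..1),
    assuming purely random strand re-pairing (idealised model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return {
        "homoduplex_reference": (1.0 - p) ** 2,
        "homoduplex_test": p ** 2,
        "heteroduplex": 2.0 * p * (1.0 - p),
    }


def min_test_fraction(detectable_heteroduplex):
    """Smallest p giving at least the stated heteroduplex fraction.

    Solves 2p(1 - p) = h for the smaller root; h cannot exceed 0.5,
    which is the maximum reached by an equimolar mixture.
    """
    h = detectable_heteroduplex
    if not 0.0 <= h <= 0.5:
        raise ValueError("heteroduplex fraction must be between 0 and 0.5")
    return (1.0 - math.sqrt(1.0 - 2.0 * h)) / 2.0


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01):
        het = duplex_fractions(p)["heteroduplex"]
        print(f"test fraction {p:.2f} -> heteroduplex fraction {het:.4f}")
    print("minimum test fraction for 1% heteroduplex:", round(min_test_fraction(0.01), 5))
```

Under this model an equimolar mix gives the largest heteroduplex signal (half of all duplexes), and a 1% heteroduplex fraction is still reached when the test amplicon is only about 0.5% of the mixture.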
Of course, to readily identify additional peaks revealing a nucleotide variation with respect to the standard used, the test sample should be adequately amplified. To avoid the need for quantifying and normalising the nucleic acid of the variants and, hence, to speed up the workflow, a serial dilution experiment was carried out by spiking human saliva with the viral mRNA of a B.1.617.2 variant sample serially diluted 1:10 in molecular biology grade water from 3.5 × 10^8 to 3.5 × 10^3 copies/mL. The copy number was quantified using a digital PCR method (31). The mRNA was then purified from the spiked saliva samples and retrotranscribed according to the above-mentioned methods. The cDNA samples were then PCR amplified, checked for the presence of an amplification band, and tested using DHPLC to assess the last dilution yielding a distinct heteroduplex peak.
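The dilution series and the nominal template input per PCR can be sketched as follows. The input volumes (200 µL extracted, 10 µL of RNA in a 20 µL reverse transcription, 2 µL of cDNA per PCR) are taken from the methods above, assuming the spiked saliva was processed with the same volumes. The elution volume and the extraction recovery are not stated in the text, so they are left as arguments the caller must supply, and complete reverse transcription is assumed.

```python
SAMPLE_UL = 200.0     # volume extracted (from the text)
RT_INPUT_UL = 10.0    # RNA volume added to the RT reaction (from the text)
RT_TOTAL_UL = 20.0    # final RT volume (from the text)
PCR_INPUT_UL = 2.0    # cDNA volume added to each PCR (from the text)


def tenfold_series(mantissa=3.5, high_exp=8, low_exp=3):
    """Concentrations (copies/mL) of a 1:10 series from 10^high_exp down to 10^low_exp."""
    return [mantissa * 10 ** e for e in range(high_exp, low_exp - 1, -1)]


def copies_per_pcr(copies_per_ml, elution_ul, recovery=1.0):
    """Nominal template copies per PCR.

    elution_ul and recovery are NOT given in the text and must be supplied;
    reverse transcription is assumed to be complete.
    """
    if elution_ul <= 0:
        raise ValueError("elution_ul must be positive")
    copies_extracted = copies_per_ml * SAMPLE_UL / 1000.0 * recovery
    copies_in_rt = copies_extracted * RT_INPUT_UL / elution_ul
    return copies_in_rt * PCR_INPUT_UL / RT_TOTAL_UL


if __name__ == "__main__":
    PLACEHOLDER_ELUTION_UL = 100.0  # hypothetical value, for illustration only
    for conc in tenfold_series():
        n = copies_per_pcr(conc, PLACEHOLDER_ELUTION_UL)
        print(f"{conc:.1e} copies/mL -> {n:g} copies per PCR")
```

With the placeholder elution volume of 100 µL and full recovery, the lowest point of the series would correspond to 7 copies per PCR, the same order as the 7 copies/reaction quoted in the Results; since the actual elution volume and recovery are not reported here, this should be read as an illustration and not as a reconstruction of the experiment.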
Before each run, the column was prepared according to the manufacturer's instructions. In particular, the Wave Low-range mutation control standard and the Wave High-range mutation control standard were used to check the apparatus before each run. One reference control was included in each assay run as well as a blank sample.
The data analysis was carried out using Navigator Software (ADS Biotec). Following the DHPLC screening, the same amplicons were purified and sequenced using an ABI310 automated sequencer (ThermoFisher).
According to the workflow in Figure 1, all positive samples should undergo the screening.

RESULTS
The endpoint PCR products, analysed by agarose gel electrophoresis, showed discrete bands without smearing or additional nonspecific bands (Figure 2A).
Even samples with a very low viral load, close to the limit of detection of the diagnostic PCR, could be amplified using endpoint PCR and, notably, they could be readily screened using DHPLC (Figure 3). The method, carried out by mixing the B.1.117 and the emerging B.1.617.2 variants, allowed the heteroduplexes to be identified even at a concentration of 7 copies/reaction. Likewise, DHPLC identified the heteroduplexes when the D614G variant (B.1) was mixed with the B.1.1.7 variant (Alpha) as a reference (Figure 2B). Then, using B.1.1.7 as a reference, the DHPLC assay identified the emerging VOCs P.1 (Gamma) and B.1.585 (Beta) (Figure 4), and B.1.617.2 (Delta) (blind group #1; Figure 5).
Complete findings of the latter are reported in Table 2.

DISCUSSION
Variant surveillance is crucial for assessing whether there are emerging mutations which might make SARS-CoV-2 more contagious or virulent, or capable of escaping natural or vaccine-induced immunity. This surveillance is of paramount importance for researchers, public health authorities and policy makers to implement actions aimed at mitigating the effect of the pandemic on the healthcare system. In particular, the transmission chains and local outbreaks in hospital settings may be investigated very effectively using massive sequencing, shedding light on how to prevent transmission (22)(23)(24)(25). More broadly, NGS on a global scale may serve to monitor emerging variants capable of escaping immunity and, to some extent, to predict epidemic waves or assess pathogenicity (18)(22)(23)(24)(25).
In this regard, an unprecedented effort to build up networks, platforms, and facilities capable of handling this great workload has been undertaken in developed and, to some extent, in developing countries. As of January 15, 2022, 7,181,951 SARS-CoV-2 full genome sequences had been filed out of the 331,009,268 confirmed cases (2.1%) (19,20). However, after two years of the pandemic and the epidemic waves caused by the spread of new, more transmissible variants, the majority of healthcare systems worldwide have been overwhelmed once again by the spread of the Omicron variant (B.1.1.529 and sublineages).
Two main limitations of NGS, namely promptness (the ability to sequence isolates effectively and in a timely manner, which is useful for managing single clusters of VOCs) and comprehensiveness (the ability to sequence all the positive samples in a timely manner), may hamper the possibility of readily undertaking actions. These actions could include nonpharmaceutical interventions (NPIs) intended to put more stringent contact tracing and isolation procedures in place so as to control the outbreak of new variants as much as possible. This became dramatically evident during the spread of the Omicron variants in Italy. In late November 2021, when the Omicron variants were beginning to emerge, Italy had a low level of SARS-CoV-2 circulation, recording fewer than 50/100,000 cases daily (32). In that critical phase, the Epidemiology and Disease Control Division (EDCD) issued a warning (33) to reinforce genomic surveillance. Although the first Omicron case identified in Italy on 26 November was promptly sequenced, and accurate contact tracing with sequencing was carried out (34), no impact of the sequencing program was observed on the epidemic curve, which followed those of neighbouring countries with a delay of only a few days. This led to the loss of control of the epidemic spread in only a few days, as occurred during the months of December 2021 and January 2022 in Italy, with millions of new cases deeply impacting the health care system (32) (Supp. Figure 1). In Italy, a network named I-Co-Gen (Italian COVID-19 Genomic), commissioned to coordinate genome sequencing surveillance, was implemented by the policy maker. The consortium started operating in April 2021, and an increase in the absolute number of sequenced isolates was achieved in the following months.
However, the number of positive isolates sequenced in Italy was still less than 1% of the number of positive cases (82,793 filed genome sequences of the 8,706,915 confirmed cases) or less than 3% of the 3,019,676 cases reported to the Surveillance system (34, 35) (Supp. Figure 1). Higher rates of sequencing were achieved when the number of daily cases was low; however, the percentage eventually dropped below 0.5% as the incidence increased (34, 35) (Supp. Figure 1). As a matter of fact, genomic surveillance carried out at the level reported, even in developed countries, does not represent a measure which can support contact tracing and isolation procedures, and does not allow effective management of discrete clusters. Technological advances are ongoing, raising expectations that the above will be possible in the future (36).
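The sequencing coverage figures quoted in this section follow directly from the counts given in the text, as the short calculation below shows. Only the counts come from the text; the labels and the function name are illustrative.

```python
def percent_sequenced(sequences, cases):
    """Share of cases with a filed genome sequence, as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * sequences / cases


# (filed sequences, cases) as quoted in the text
FIGURES = {
    "worldwide, as of 15 January 2022": (7_181_951, 331_009_268),
    "Italy, confirmed cases": (82_793, 8_706_915),
    "Italy, cases reported to the surveillance system": (82_793, 3_019_676),
}

if __name__ == "__main__":
    for label, (sequences, cases) in FIGURES.items():
        print(f"{label}: {percent_sequenced(sequences, cases):.2f}%")
```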
However, it would be helpful to acknowledge that NGS has not been able to address this issue so far, and that it would be more practical to build up affordable platforms placed upstream from the sequencing facilities. Those platforms should be capable of investigating all positive samples in order to pre-select those samples with mutations in critical regions of the viral genome. To date, all the VOCs responsible for the epidemic waves have had a mutation of concern in the RBD, with no exceptions (19,20). The above would allow sequencing efforts to be focused on samples of concern in order to immediately establish NPIs limited to specific cases. Denaturing high-performance liquid chromatography has been demonstrated to have valuable features as a screening step in different settings. This study demonstrated that DHPLC readily detected mutations in selected SARS-CoV-2 genome regions which might be of concern. To that end, DHPLC exploits the different retention times of homoduplexes and heteroduplexes to detect their multiple retention peaks. A sample of a known sequence was used as a standard and was mixed with the unknown samples. Whenever more than one retention peak was evident at DHPLC, it indicated that at least one mismatch existed between the standard and the sample and, hence, that one or more mutations were present in the test sample. In other words, DHPLC is able to distinguish whether the test sample is different from the reference in the RBD. To achieve adequate separation of the elution peaks, DHPLC is best suited to screening PCR fragments of relatively limited length, ranging from 150 to 400 bp (30). To cover almost the entire RBD of the Spike protein, two PCRs were designed, and the respective amplicons were screened for mutations using DHPLC. An inherent limitation of the technique is that including additional viral targets would require additional studies and additional PCR reactions.
To obtain sharp and easily interpretable peaks at DHPLC analysis, PCR reactions should avoid the formation of nonspecific amplicons and smears (Figure 2A). The high-fidelity Phusion Taq polymerase is well-suited for this purpose; furthermore, the PCR reactions using Phusion Taq are very rapid, thus reducing the duration of the PCR step. Moreover, the Phusion buffer does not contain any detergent which would be detrimental for downstream DHPLC analysis.
In this study, it was demonstrated that DHPLC analysis, by screening for the presence of mutations of concern in the RBD, would have been able to detect all the VOCs which have emerged in the two years of the pandemic. In the study presented here, we were also able to identify a Delta variant (B.1.617.2) having a mutation of concern which is rare in this lineage (E484A). However, sublineage assessment relies on full genome sequencing, and DHPLC cannot reveal sublineages unless hallmark mutations are present in the RBD. For instance, the BA.4 and BA.5 Omicron sublineages cannot be distinguished using DHPLC on the RBD since no differences exist at this level (Suppl. Figure 2). Overall, the process could be completed in a few hours. Potentially, all new positives could be screened on the same day as diagnosis using a dedicated thermal cycler and the DHPLC system (Figure 1). In addition to the cost of the [...] (31). Our results showed that DHPLC was able to assess the VOCs with a viral load two or three orders of magnitude lower than the minimal number of viral copies required for an effective amplicon-based full genome sequencing (37).
FIGURE 4 | Selected example of the detection of rare variants of concern using the Denaturing High-Performance Liquid Chromatography system. Using the B.1.1.7 strain as a standard, the rarer B.1.585 (Nigerian) and P.1 (Brazilian) variants could be detected and confirmed.
The S protein has different hotspots of mutation and deletion; the most likely candidates for immune escape are those within the RBD, such as K417N/T, E484K, and N501Y; N439K, N440K, G446S, L452R, Y453F, S477N, T478K, Q493R, G496S, N498Y, N501Y and Y505H may have a negative impact on the effectiveness of the vaccination (4,5,10,19,20)(38)(39)(40). Alarmingly, a convergent evolution has led to diverse groupings of those mutations among different clades (14).
Using known sequenced strains, DHPLC is able to rapidly screen a large number of samples, establishing whether or not they share the same combination of mutations in the RBD. In the latter case, additional combinations may be rapidly ascertained in a few minutes by mixing and denaturing at high temperatures, allowing renaturing, and running in DHPLC with amplicons from known mutation combinations. Other possible targets could also be added to the analysis when their role emerges in vaccination escape in other domains (41,42). It is important to check for the PCR amplicon presence in the test sample using a fragment analyser or an agarose gel; this step ensures that the presence of a unique peak in the DHPLC reflects the perfect sequence identity of the test sample with the standard and it is not an artifact due to the analysis of the standard sample alone.
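The interpretation rule described here, including the amplicon check that guards against a single peak produced by the standard alone, can be written as a small decision function. This is a sketch of the logic only, under the assumption that peak calling has already been done by the operator or the instrument software; the class and function names are hypothetical and do not correspond to the Navigator Software.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REPEAT_PCR = "no amplicon detected: repeat amplification before interpreting"
    MATCHES_STANDARD = "single peak: same RBD sequence as the standard"
    SEND_TO_SEQUENCING = "additional peak(s): differs from the standard"


@dataclass
class MixedRun:
    """One DHPLC run of a test amplicon mixed with the standard."""
    amplicon_present: bool  # band seen on agarose gel or fragment analyser
    peak_count: int         # number of peaks called in the chromatogram


def interpret(run):
    """Classify a single mixed run."""
    if not run.amplicon_present:
        # A single peak would only reflect the standard run alone.
        return Outcome.REPEAT_PCR
    if run.peak_count < 1:
        raise ValueError("a run with an amplicon must show at least one peak")
    if run.peak_count == 1:
        return Outcome.MATCHES_STANDARD
    return Outcome.SEND_TO_SEQUENCING


def triage_sample(runs):
    """Combine the runs of one sample (for example RBD_1 and RBD_2)."""
    outcomes = [interpret(run) for run in runs]
    if Outcome.SEND_TO_SEQUENCING in outcomes:
        return Outcome.SEND_TO_SEQUENCING
    if Outcome.REPEAT_PCR in outcomes:
        return Outcome.REPEAT_PCR
    return Outcome.MATCHES_STANDARD


if __name__ == "__main__":
    sample = [MixedRun(True, 1), MixedRun(True, 3)]
    print(triage_sample(sample).value)
```

Giving priority to an additional peak in either amplicon reflects the screening purpose: one mismatch in any RBD fragment is enough to refer the sample for sequencing, whereas a match is only reported when every fragment was amplified and showed a single peak.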
As compared to other screening methods, such as multiplex real-time PCR genotyping (43), or multiplex amplification refractory mutation system (ARMS) PCR (44), the assay has some drawbacks and many advantages. The main drawbacks are the need for additional instrumentation, whereas the other methods are run on widely available qPCR thermal cyclers, and the additional costs with respect to the use of diagnostic multiplex PCR with software-based alerting of "suspect samples".
On the other hand, DHPLC screening may detect all the variants, both known and still unknown, while the former techniques are strictly sequence specific and can only detect known variants for which specific probes have been designed. With those techniques, only unexpected or inconsistent findings due to mismatches restricted to the probe sequences raise the suspicion of unknown variants and prompt in-depth analysis of the sample. DHPLC does not require complicated procedures for establishing or maintaining analytical performance, such as designing and validating an increasing number of probes, or cumbersome technical replicates. Other sequence non-specific methods, such as High-Resolution Melting (HRM), may be valid alternatives. Nonetheless, HRM has been limited by inherently shorter amplicons and a longer running time (45)(46)(47)(48)(49)(50). Furthermore, HRM approaches are largely limited to discrete targeted mutations in order to achieve consistency and reliability in performance (45)(46)(47)(48)(49)(50).
In conclusion, the flexible, cost-effective and streamlined approach described herein, which integrates DHPLC with Sanger sequencing and, eventually, NGS, offers advantages for healthcare systems and should be considered a valuable alternative for variant screening in genomic surveillance.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of AUSL Romagna under the protocol code "COVdPCR" of 07/02/2020 (it includes appropriate approvals or waivers). Written informed consent for participation was not provided by the participants' legal guardians/next of kin because it is not applicable. The samples included in this study were sent to the Unit of Microbiology, Greater Romagna Area Hub Laboratory, Cesena, Italy, for routine diagnostic purposes, and the laboratory data results were reported as an answer to clinical suspicion. As such, informed consent from patients was not required. Each sample included was preventively anonymized.

FUNDING
Partial financial support was received from AUSL Romagna under the protocol code "COVdPCR".