Genomic Modeling as an Approach to Identify Surrogates for Use in Experimental Validation of SARS-CoV-2 and HuNoV Inactivation by UV-C Treatment

Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2) is responsible for the COVID-19 pandemic that continues to pose significant public health concerns. While research to deliver vaccines and antivirals are being pursued, various effective technologies to control its environmental spread are also being targeted. Ultraviolet light (UV-C) technologies are effective against a broad spectrum of microorganisms when used even on large surface areas. In this study, we developed a pyrimidine dinucleotide frequency based genomic model to predict the sensitivity of select enveloped and non-enveloped viruses to UV-C treatments in order to identify potential SARS-CoV-2 and human norovirus surrogates. The results revealed that this model was best fitted using linear regression with r2 = 0.90. The predicted UV-C sensitivity (D90 – dose for 90% inactivation) for SARS-CoV-2 and MERS-CoV was found to be 21.5 and 28 J/m2, respectively (with an estimated 18 J/m2 obtained from published experimental data for SARS-CoV-1), suggesting that coronaviruses are highly sensitive to UV-C light compared to other ssRNA viruses used in this modeling study. Murine hepatitis virus (MHV) A59 strain with a D90 of 21 J/m2 close to that of SARS-CoV-2 was identified as a suitable surrogate to validate SARS-CoV-2 inactivation by UV-C treatment. Furthermore, the non-enveloped human noroviruses (HuNoVs), had predicted D90 values of 69.1, 89, and 77.6 J/m2 for genogroups GI, GII, and GIV, respectively. Murine norovirus (MNV-1) of GV with a D90 = 100 J/m2 was identified as a potential conservative surrogate for UV-C inactivation of these HuNoVs. This study provides useful insights for the identification of potential non-pathogenic (to humans) surrogates to understand inactivation kinetics and their use in experimental validation of UV-C disinfection systems. This approach can be used to narrow the number of surrogates used in testing UV-C inactivation of other human and animal ssRNA viral pathogens for experimental validation that can save cost, labor and time.


INTRODUCTION
Coronaviruses belong to the family of Coronaviridae, comprising of 26 to 30 kb, positive-sense, single-stranded RNA, in an enveloped capsid (Woo et al., 2010). Coronaviruses can cause severe infectious diseases in human and vertebrates, being fatal in some cases. Severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV-1), a β-coronavirus emerged in Guangdong, southern China, in November, 2002(Guan et al., 2003, and the Middle East respiratory syndrome (MERS) coronavirus (MERS-CoV), was first detected in Saudi Arabia in 2012 (Alagaili et al., 2014). Since late December 2019, a novel β-coronavirus (2019-nCoV or SARS-CoV-2) has been responsible for the pandemic coronavirus disease  with >7.2 million confirmed cases throughout the world, and a fatality rate of approximately 5.7% as of 11 June, 2020 (World Health Organization [WHO], 2020a). This 2019-nCoV is thought to have originated from a seafood market of Wuhan city, Hubei province, China, and has spread rapidly to other provinces of China and other countries (Zhu et al., 2020).
According to current evidence documented by the World Health Organization [WHO] (2020a,b), SARS-CoV-2 virus (2019-nCoV) is transmitted between humans through respiratory droplets and contact (person-to-person, fomites, etc.) routes (World Health Organization [WHO], 2020b). van Doremalen et al. (2020) reported that SARS-CoV-2 remained viable in aerosols throughout the 3 h duration of the experiment and more stable on plastic and stainless steel than on copper and cardboard, and virus was detected up to 72 h after the application to these surfaces at 21-23 • C and 40% relative humidity. Given the ability of these viruses to survive in the environment, appropriate treatment strategies are needed to inactivate SARS-CoV-2. As per WHO recommendations, SARS-CoV-2 may be inactivated using chemical disinfectants. As of 07 April, 2020, the United States Environmental Protection Agency [USEPA] (2020) has announced a list of 428 registered chemical disinfectants that have been approved for use against SARS-CoV-2 (United States Environmental Protection Agency [USEPA], 2020). On the other hand, physical disinfection method "ultraviolet light (UV) treatment" (with germicidal UV-C at wavelengths from 100 to 280 nm) can be an effective approach to inactivate SARS-CoV-2 on surface areas and in the air. UV inactivates a broad spectrum of microorganisms by damaging the DNA or RNA and thereby prevents and/or alters cellular functions and replication . UV-C inactivation of various microorganisms such as pathogenic bacteria, spores, protozoa, algae and viruses has been reported (Malayeri et al., 2016;Bhullar et al., 2019;Gopisetty et al., 2019;Pendyala et al., 2019Pendyala et al., , 2020Patras et al., 2020). Because UV inactivation studies with SARS-CoV-2 requires specifically trained and skilled personnel working under biosafety level 3 (BSL-3) laboratory containment conditions, the use of surrogate coronaviruses has the potential to cross these hurdles for experimental validation of designed UV systems. Based on the biophysical properties and genomic structure, literature studies on testing the efficacy of disinfectants against coronaviruses used the following surrogates; murine hepatitis virus (MHV), Human coronavirus 229 E, transmissible gastroenteritis virus (TGEV), and feline infectious peritonitis virus (FIPV) (Kumar et al., 2020). However, the selection of potential surrogates to SARS-CoV-2 requires a comparative evaluation of UV-C sensitivity between these viruses. As of date, the precise experimental UV-C susceptibility (D 90 value) of SARS-CoV/SARS-CoV-2 is not reported.
Human noroviruses (HuNoVs) cause >80% of global non-bacterial gastroenteritis that can be spread through contamination of food, water, fomites, or direct contact, and also via aerosolization (Fankhauser et al., 2002;Widdowson et al., 2005;Godoy et al., 2006). HuNoVs are also single-stranded RNA viruses that are small 27 to 32 nM in size that belong to the Caliciviridae family. However, HuNoVs are enclosed in a non-enveloped capsid, unlike SARS-CoV-2 that is enveloped. UV-C inactivation data on the HuNoV genogroups is limited due to the lack of available cultivation methods to obtain high infectious titers (Doultree et al., 1999;Ettayebi et al., 2016;Estes et al., 2019). Thus, reverse transcription quantitative polymerase chain reaction (RT-qPCR) is widely used for assessing survivor populations of HuNoVs after treatment. However, research studies showed overestimation of survivors with RT-qPCR in comparison to virus infectivity plaque assays (Rönnqvist et al., 2014;Wang and Tian, 2013;Walker et al., 2019). As an alternative, cultivable animal viruses [caliciviruses, echoviruses and murine norovirus (MNV)] have been used as surrogates to determine UV-C inactivation of HuNoVs (Thurston-Enriquez et al., 2003;de Roda Husman et al., 2004;Lee et al., 2008;Park et al., 2011), but proper selection of surrogates which mimic the UV-C inactivation characteristics of HuNoVs is required to evaluate kinetics and scale up validation studies. Furthermore, it is well known that microorganisms respond to UV exposure at rates defined in terms of UV rate constants . The slope of the logarithmic decay curve is defined by the rate constant, which is designated as k. The UV rate constant k has units of cm 2 /mJ or m 2 /J and is also known as the UV susceptibility. It can be also defined as D 90 or D 10 [dose for 90% inactivation or 10% survival] as the primary indicator of UV susceptibility. UV dose is expressed as J/m 2 or mJ/cm 2 . The varied microbial sensitivity to ultraviolet light (UV) among species of microbes, is due to several intrinsic factors including physical size, presence of chromophores or UV absorbers, presence of repair enzymes or dark/light repair mechanisms, hydrophilic surface properties, relative index of refraction, specific UV spectrum (broad band UVC/UVB versus narrow band UVC), genome based parameters; molecular weight of nucleic acids, DNA conformation (A or B), G+C%, and % of potential pyrimidine or purine dimerization (Kowalski et al., 2009).
The physical size of a virus bears no clear relationship with UV susceptibility, except that for the largest viruses, as size increases, the UV rate constant tends to decrease slightly (which is likely the result of UV scattering) (Kowalski et al., 2009). There is no thorough literature available on the above-mentioned optical parameters, hydrophilic surface properties and repair mechanisms relating to UV sensitivity. On the other hand, genome sequences of UV susceptibility can be easily retrieved from genome databases and the development of genomic models based on the above mentioned genome-based parameters is feasible to predict the UV susceptibility of ssRNA viruses, which include human pathogenic novel viruses (such as SARS-CoV-2) and cultivation-challenging HuNoVs.
Our hypothesis is that predicting UV-C inactivation based on genomic modeling, will enable the determination of surrogates to be used in UV-C validation studies. In the present study, we attempted to develop a genomic model to predict and compare the UV sensitivity of enveloped SARS-CoV-2 and non-enveloped HuNoVs and to determine their suitable surrogates for use in UV-C process validation.

Collection of Reported ssRNA Viruses UV 254 Sensitivity (D 90 Values)
We collected UV-C sensitivity of ssRNA viruses form published studies and carefully selected D 90 values ( Table 1). The selection was based on the careful assessment of methods that were used to determine UV-C sensitivity. The selected UV-C sensitivity of an ssRNA virus is determined via the standard method (Bolton and Linden, 2003), with the log 10 survivors as a function of UV dose and represented as D 90 .

Determination of Genomic Parameters;
Genome Size, and Pyrimidine Dinucleotide Frequency Value (PyNNFV) The molecular size and nucleotide sequences of genomes used in this study were directly obtained from available NCBI genome database ( Table 2 and Table 5). PyNNFV model was developed based on the frequency of each type of pyrimidine dinucleotides (TT, TC, CT, and CC) which varies based on genome sequences. Pyrimidines are almost 10 times more susceptible to photoreaction (Smithyman and Hanawalt, 1969), while strand breaks, inter-strand cross links and DNA-protein cross links form with less frequency (1:1000 of the number of dimers and hydrates) (Setlow and Carrier, 1966). Three simple rules were formulated for sequencedependent dimerization (Becker and Wang, 1989); "(i) When two or more pyrimidines are neighboring to one another, photoreactions are observed at both pyrimidines, (ii) Nonadjacent pyrimidines exhibit little or no photoreactivity, and (iii) Purines form UV photoproducts when they are flanked at 5 side by two or more adjacent pyrimidine residues." Therefore, we considered 100% probability of formation of photoreaction products when PyNN are flanked by pyrimidines on both sides and 50% probability when PyNN are flanked by purine on either side. The individual PyNNs were counted by the exclusive method (each pyrimidine considered in one PyNN combination only). Research studies showed the proportion of photoreaction products in the order of TT > TC > CT > CC (Douki, 2013), thus same sequence was followed in counting individual PyNNs. Table 3 shows the method used for PyNNFV calculation in this study. A mathematical function was written to calculate PyNNFV from the potential PyNNs to exist in the genome of RNA (Eq. 1).
The PyNNFVs from complete genome sequences of 16 ssRNA viruses and corresponding reported D 90 values were used to plot a model graph. Then, the correlation between PyNNFVs and D 90 values was analyzed by fitting the appropriate regression model (linear regression). Table 1 shows the median D 90 values collected from UV-C inactivation studies of various ssRNA viruses. The data was selected from the studies conducted with uniform viral suspensions in transparent medium (water or phosphate buffer saline), followed standard method for UV dose calculation (Bolton and Linden, 2003). The D 90 values reported for ssRNA viruses ranged from 18 J/m 2 for SARS-CoV-1 to 190 J/m 2 for murine sarcoma virus. Genomic parameters; genome size,

Genomic Models to Predict UV-C Sensitivity of ssRNA Viruses
To determine the relationship between genome size and UV-C sensitivity, the D 90 values were plotted against the genome size of various ssRNA viruses (Figure 1). The data were best fitted to log linear regression model with r 2 = 0.63. The results revealed that there was a decisive relationship between genome size and UV sensitivity across the range of 3569-29751 bp. Further to evaluate the influence of base composition and sequence along with genome size on UV-C sensitivity, the D 90 values were plotted versus PyNNFV (Figure 2). Linear regression model was best fitted with r 2 = 0.90. Therefore, based on the value of r squared a moderate positive relationship was found between PyNNFV and UV-C sensitivity of the virus. The following linear regression equation shows the correlation between D 90 values and PyNNFV. y = 19984x + 10.409 (2) Also, to predict the distribution of UV-C sensitivities and estimates of the true population mean using this model, 95% prediction and confidence intervals were shown in Figure 2. To confirm the adequacy of the fitted model, studentized residuals versus run order were tested and the residuals were observed to be scattered randomly, suggesting that the variance was constant. It can be indicated from Figure 3 that predicted values were in close agreement with the experimental values and were found to be not significantly different at p > 0.05 using a paired t-test. Despite some variations, results obtained predicted model and actual experimental values showed that the established models reliably predicted the D 90 value. Therefore, the predictive performance of the established model can be considered acceptable. The applicability of the models was also quantitatively evaluated by comparing the bias and accuracy factors ( Table 4 and Eqs 3 and 4).
The average mean deviation (E%) were used to determine the fitting accuracy of data (Eq. 5). Where, n e is the number of experimental data, V E is the experimental value and V P is the predicted value.
In most cases, as shown in Table 4, the accuracy factor (AF) values for the genomic model were close to 1.00, except for Measles virus (0.83), Semliki forest virus (0.86). The bias factor (BF) values for the predicted models were also close to 1.00, ranging from 1.02 to 1.21 for the parameter studied. These results clearly indicate that there was a good agreement between predicted and observed D 90 values. Ross et al. (2000) stated that predictive models ideally would have an AF = BF = 1.00, indicating a perfect model fit where the predicted and actual response values are equal. However, typically, the AF of a fitted model will increase by 0.10-0.15 units for each predictive variable in the model (Ross et al., 2000). Genomic model, as in this study, that forecasts a response may be expected to have AF and BF   values ranging from 0.83 to 1.21 or an equivalent percentage error range of 0.82-25.67%.

Prediction of UV Sensitivity of Various Corona Viruses and Human Noroviruses
Owing to good model fitting, the PyNNFV genomic model was used to predict UV sensitivity of coronaviruses including SARS-CoV-2 and different HuNoV genogroups. PyNNFV values of target viruses were calculated from genomic sequences obtained from the NCBI database. The UV sensitivities were predicted by substituting PyNNFV value in Eq. 2. Table 5 shows PyNNFV values and corresponding predicted D 90 values of target viruses. Predicted D 90 of SARS-CoV-2 virus (21.5 J/m 2 ) ( Table 5) is closer to the estimated D 90 of SARS-CoV-1 (18 J/m 2 ) from the experimental study (Table 1). Kariwa et al. (2006) irradiated 2 mL of SARS-CoV-1 in 3-cm petri dishes without stirring UV-C light at 134 µW/cm 2 for 15 min, and observed reduction in infectivity from 3.8 × 10 7 to 180 TCID 50 /mL with equivalent to D 90 value of 226 J/m 2 . In contrast, Darnell et al. (2004) showed 4 log reduction of SARS-CoV-1 at UV-C exposure of 4016 µW/cm 2 for 6 min which is equivalent to D 90 value of 3610 J/m 2 . The authors conducted the experiment in a 24 well plate containing 2 mL virus aliquots without mixing. These two studies neither calculate the average irradiance nor provide conditions for uniform UV-C dose distribution throughout the test fluid and thereby reported higher values. The model predicted D 90 value of MERS-CoV (28.1 J/m 2 ) that is found to be higher than SARS-COV-2, whereas murine hepatitis coronavirus (MHV) strains showed similar UV-C sensitivity (D 90 values = 20.3 to 21 J/m 2 ). For αand γ-coronaviruses, the predicted D 90 values (17.8 to 18.3 J/m 2 ) were lower than the β-coronaviruses (Table 5). Saknimit et al. (1988) demonstrated the efficiency of UV-C irradiation on the inactivation of MHV and CCV coronaviruses using 15 W UV-C lamp at a distance of 1 m and reported efficient UV-C inactivation after 15 min treatment. From this data, the estimated D 90 values for MHV and CCV (γ-coronavirus) were 17 and 15 J/m 2 , respectively, and observed to be slightly lower (∼20%) than the model predicted values (Table 5). Overall the results show Values in parenthesis denote estimated D 90 values from experimental study (Saknimit et al., 1988).
that coronaviruses are highly sensitive to UV-C light than other ssRNA viruses reported in Table 1. From the UV sensitivity data obtained using the genomic model, it was observed that UV doses ranging from 90 to 141 J/m 2 are required for 5 log reduction of human pathogenic coronaviruses (SARS-CoV-1, MERS-CoV, 2019-nCoV). Here we demonstrate an example of UV exposure using a low-pressure mercury lamp. If the UV-C lamp source provides an average irradiance of 0.4 mW/cm 2 or 4 W/m 2 (under uniform dose distribution conditions), a mere 35 s treatment is adequate to inactivate β-coronaviruses (99.999% or 5 log reduction). Since the developed model relies on total PyNNFV (not on specific gene sequences), slight viral mutations should not cause significant variations in UV sensitivity. For instance, if the PyNNFV value of SARS-CoV-2 changes up to ±10%, the model predicted UV sensitivity (D 90 value) ranges from 20.4 to 22.6 J/m 2 with the change of just ±2.6%. The predicted D 90 values of HuNoVs are 69.1, 89, and 77.6 J/m 2 for genogroups, GI, GII, and GIV, respectively ( Table 5). The results revealed that the UV-C sensitivity of GII was lower with higher predicted D 90 value in comparison to GI and GIV. To the best of our knowledge, limited experimental data is currently available on UV-C sensitivity of HuNoVs. Some research studies used RT-qPCR method to estimate MNV survivors and validated with virus infectivity assay (Wang and Tian, 2013;Rönnqvist et al., 2014;Walker et al., 2019). The reported validation results showed that the values obtained with RT-qPCR method are overestimated compared to standard virus infectivity assays (Wang and Tian, 2013;Rönnqvist et al., 2014;Walker et al., 2019). For instance, Rönnqvist et al. (2014) reported 4-log reduction of MNV at a UV dose of 60 mJ/cm 2 with the infectivity assay, whereas just 2-log decline of MNV and HuNoV RNA levels was found at a UV dose of 150 mJ/cm 2 by the RT-qPCR method. The experimental D 90 values of conservative surrogates (MNV, echovirus and caliciviruses) obtained via viability assay are reported to be in the range of 60-100 J/m 2 ( Table 1).

Identification of Potential Surrogates for UV-C Inactivation
Validation of the UV-C inactivation kinetics of specific pathogens such as SARS-CoV-2 is not possible (without the use of appropriate surrogates) because of the need for sophisticated biosafety level (BSL)-3 containment, and to protect the researchers, and the public from health risk in environmental settings. For HuNoV, research on reproducible cultivable systems that obtain high titers are still on-going. Hence, criteria for the selection and application of surrogates are required to ensure that the surrogates mimic the behavior of the SARS-CoV-2 or HuNoVs under specific treatment conditions, while ensuring safety of personnel and also decreasing labor, cost and time. Also, surrogates are useful in process validation studies at scale up that can reduce the uncertainties linked with UV-C dose measurement.
For HuNoVs, the predicted D 90 values of all genogroups (69-89 J/m 2 ) were higher than D 90 values of the reported caliciviruses (60-67 J/m 2 ) in our study, echoviruses (75 J/m 2 ), except being lower than MNV-1 (100 J/m 2 ) (Tables 1, 5). Use of surrogates that exhibit similar or slightly higher D 90 values to target pathogens can avoid the risk associated with improper inactivation, hence our results indicate that MNV-1 is the better choice (though conservative) to validate UV-C inactivation of all HuNoVs under laboratory experimental setup conditions.
In conclusion, a predictive genomic-modeling method was developed for estimating the UV sensitivity of SARS-CoV-2 and HuNoVs. Results of the model validation showed that the developed model had acceptable predictive performance, as assessed by mathematical and graphical model performance indices. We predicted the D 90 values by conducting extensive genomic modeling. Although the parameters reported here may suffice to estimate the UV sensitivity, experimental research directed to address various knowledge gaps identified in this study is required to maximize the accuracy of predicted models. Additional parameters will be computed to the predictive model as needed, including terms for the presence of chromophores or UV absorbers and for possible UV scattering.

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article/supplementary material.

AUTHOR CONTRIBUTIONS
BPe and AP conceived of the presented idea and wrote the manuscript. BPe developed the theory and performed the computations. BPo contributed to statistical analysis. BPe, AP, and DD'S contributed to the interpretation and discussion of the results. All authors contributed to the article and approved the submitted version.