Genomic modeling as an approach to identify surrogates for use in experimental validation of SARS-CoV-2 and HuNoVs inactivation by UV-C treatment

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the COVID-19 pandemic, which continues to pose significant public health concerns. While research to deliver vaccines and antivirals is being pursued, effective technologies to control its environmental spread are also being targeted. Ultraviolet light (UV-C) technologies are effective against a broad spectrum of microorganisms, even when used on large surface areas. In this study, we developed a pyrimidine dinucleotide frequency based genomic model to predict the sensitivity of select enveloped and non-enveloped viruses to UV-C treatments in order to identify potential surrogates for SARS-CoV-2 and human noroviruses. The model was best fitted using linear regression with r² = 0.90. The predicted UV-C sensitivity (D90, the dose for 90% inactivation) for SARS-CoV-2 and MERS-CoV was 21 and 28 J/m², respectively (with an estimated 18 J/m² as published for SARS-CoV-1), suggesting that coronaviruses are highly sensitive to UV-C light compared to the other ssRNA viruses used in this modeling study. Murine hepatitis virus (MHV) A59 strain, with a D90 of 21 J/m² close to that of SARS-CoV-2, was identified as a suitable surrogate to validate SARS-CoV-2 inactivation by UV-C treatment. Furthermore, the non-enveloped human noroviruses (HuNoVs) had predicted D90 values of 69.1, 89 and 77.6 J/m² for genogroups GI, GII and GIV, respectively. Murine norovirus (MNV-1) of GV, with a D90 of 100 J/m², was identified as a potential conservative surrogate for UV-C inactivation of these HuNoVs. This study provides useful insights for the identification of potential nonpathogenic surrogates to understand inactivation kinetics and for their use in experimental validation of UV-C disinfection systems. The approach can also be used to narrow the number of surrogates tested for UV-C inactivation of other human and animal ssRNA viral pathogens, which can save cost, labor and time in experimental validation.

nCoV is thought to have originated from a seafood market in Wuhan city, Hubei province, China, and has spread rapidly to other provinces of China and other countries (Zhu et al., 2020). According to current evidence documented by the World Health Organization (WHO), SARS-CoV-2 virus (2019-nCoV) is transmitted between humans through respiratory droplets and contact (person-to-person, fomites, etc.) routes (WHO, 2020b). van Doremalen et al. (2020) reported that SARS-CoV-2 remained viable in aerosols throughout the 3 h duration of the experiment and was more stable on plastic and stainless steel than on copper and cardboard; virus was detected up to 72 hours after application to these surfaces at 21-23°C and 40% relative humidity. Given the ability of these viruses to survive in the environment, appropriate treatment strategies are needed to inactivate SARS-CoV-2. As per WHO recommendations, SARS-CoV-2 may be inactivated using chemical disinfectants. As of 7 April 2020, the United States Environmental Protection Agency (USEPA) has announced a list of 428 registered chemical disinfectants that have been approved for use against SARS-CoV-2 (USEPA, 2020). However, chemical disinfection requires considerable labor and product to treat large surface areas. As an alternative, ultraviolet light (UV) technology (with germicidal UV-C at wavelengths from 100 nm to 280 nm) can be an effective approach to inactivate SARS-CoV-2 on large surface areas and in the air (regardless of humidity levels) with less labor. UV inactivates a broad spectrum of microorganisms by damaging their DNA or RNA, thereby preventing and/or altering cellular functions and replication.
UV-C inactivation of various microorganisms such as pathogenic bacteria, spores, protozoa, algae and viruses has been reported (Malayeri et
from genome databases and the development of genomic models based on the above mentioned genome-based parameters is feasible to predict the UV susceptibility of ssRNA viruses, which include pathogenic novel viruses (such as SARS-CoV-2) and cultivation-challenging HuNoVs.
Our hypothesis is that predicting UV-C inactivation based on genomic modeling will enable the determination of surrogates to be used in UV-C validation studies. In the present study, we attempted to develop a genomic model to predict and compare the UV sensitivity of enveloped SARS-CoV-2 and non-enveloped HuNoVs and to determine their suitable surrogates for use in UV-C process validation.

(Table 1). The selection was based on a careful assessment of the methods that were used to generate the UV dose-response curves. The UV sensitivity of an ssRNA virus is determined via a dose-response curve, with the log10 survivors plotted as a function of UV dose, and is represented as D90.

2.2 Determination of genomic parameters: genome size and pyrimidine dinucleotide frequency value (PyNNFV)
Genome sizes were obtained directly from the NCBI genome database (Table 2). The PyNNFV model was developed based on the frequency of each type of pyrimidine dinucleotide (TT, TC, CT and CC), which varies with genome sequence. Pyrimidines are almost 10 times more susceptible to photoreaction than purines (Smithyman and Hanawalt, 1969), while strand breaks, inter-strand cross links and DNA-protein cross links form less frequently (1:1000 of the number of dimers and hydrates) (Setlow and Carrier, 1966).
Three simple rules were formulated for sequence-dependent dimerization (Becker and Wang, 1989): "i) When two or more pyrimidines are neighboring to one another, photoreactions are observed at both pyrimidines, ii) Non-adjacent pyrimidines exhibit little or no photoreactivity, and iii) Purines form UV photoproducts when they are flanked at 5' side by two or more adjacent pyrimidine residues". Therefore, we assumed a 100% probability of photoreaction product formation when PyNN are flanked by pyrimidines on both sides and a 50% probability when PyNN are flanked by a purine on either side. The individual PyNNs were counted by the exclusive method (each pyrimidine considered in one PyNN combination only). Studies have shown that the proportion of photoreaction products follows the order TT > TC > CT > CC (Douki, 2013); the same order was therefore followed when counting individual PyNNs. PyNNFV.
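The exclusive counting method described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `count_pynn_exclusive` is hypothetical, the treatment of U as T for RNA genomes is an assumption, and the 100%/50% flank weighting and the final PyNNFV formula are deliberately left out because the source text does not state them completely.

```python
from collections import Counter

# Counting order taken from the text: TT > TC > CT > CC.
PYNN_ORDER = ("TT", "TC", "CT", "CC")


def count_pynn_exclusive(seq: str) -> Counter:
    """Count pyrimidine dinucleotides so that each base is used at most once.

    Dinucleotide types are scanned in PYNN_ORDER; a base that has already
    been assigned to one dinucleotide is not reused in a later one.
    """
    # Assumption: RNA sequences are handled by mapping U to T.
    seq = seq.upper().replace("U", "T")
    used = [False] * len(seq)
    counts = Counter({dinuc: 0 for dinuc in PYNN_ORDER})

    for dinuc in PYNN_ORDER:
        i = 0
        while i < len(seq) - 1:
            if not used[i] and not used[i + 1] and seq[i:i + 2] == dinuc:
                used[i] = used[i + 1] = True
                counts[dinuc] += 1
                i += 2  # both bases are consumed
            else:
                i += 1
    return counts


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real genome.
    print(count_pynn_exclusive("TTCCAGCTTC"))
```

Scanning one dinucleotide type at a time, in the stated order, is one straightforward reading of "the same order was followed in counting"; other readings (for example a single left-to-right pass) would give different counts for some sequences.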

Genomic models to predict UV-C sensitivity of ssRNA viruses
To determine the relationship between genome size and UV-C sensitivity, the D90 values were plotted against the genome size of various ssRNA viruses (Figure 1). The data were best fitted to a log-linear regression model with r² = 0.63. The results revealed a relationship between genome size and UV sensitivity across the range examined (3569-29751 bp).

To further evaluate the influence of base composition and sequence, along with genome size, on UV-C sensitivity, the D90 values were plotted against the pyrimidine dinucleotide frequency value (PyNNFV) (Figure 2). A linear regression model gave the best fit, with r² = 0.90. The results therefore show a good relationship between PyNNFV and the UV-C sensitivity of the virus. The following linear regression equation shows the correlation between D90 values and PyNNFV.

[Equation 2: linear regression equation relating D90 to PyNNFV; the coefficients are garbled in the source and cannot be recovered.]
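For readers who want to reproduce this kind of fit, an ordinary least-squares line and its r² can be computed as in the sketch below. The function name `fit_line` is hypothetical and the demonstration numbers are arbitrary placeholders, not values from Table 1 or from the fitted model; the sketch only illustrates how a slope, intercept and r² are obtained for a D90 versus PyNNFV plot.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination computed as 1 - SS_res / SS_tot.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


if __name__ == "__main__":
    # Placeholder data only: x stands in for PyNNFV, y for D90 (J/m^2).
    x_demo = [1.0, 2.0, 3.0, 4.0]
    y_demo = [2.0, 4.1, 5.9, 8.0]
    slope, intercept, r2 = fit_line(x_demo, y_demo)
    print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.4f}")
```

The same function applies to the log-linear genome-size model if the dependent variable is transformed first (for example by passing log10 of D90 as `y`), under the assumption that "log linear" refers to a logarithmic transform of one axis.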
Also, to predict the distribution of UV-C sensitivities and to estimate the true population mean using this model, 95% prediction and confidence intervals are shown in Figure 2. To confirm the adequacy of the fitted model, studentized residuals were plotted against run order; the residuals were scattered randomly, suggesting that the variance was constant. Figure 3 indicates that the predicted values were in close agreement with the experimental values and were not significantly different (p > 0.05, paired t-test). Despite some variation, comparison of the predicted and experimental values showed that the established models reliably predicted the D90 value. Therefore, the predictive performance of the established model can be considered acceptable. The applicability of the models was also quantitatively evaluated by comparing the bias and accuracy factors (Table 4,
The average mean deviation (E %) was used to determine the fitting accuracy of the data (equation 5), where n_e is the number of experimental data points, V_E is the experimental value and V_P is the predicted value.
In most cases, as shown in
(Table 1) observed to be slightly lower (~20 %) than the model predicted values (Table 5). Overall, the results show that coronaviruses are more sensitive to UV-C light than the other ssRNA viruses reported in Table 1. From the UV sensitivity data obtained using the genomic model, UV doses ranging from 90 to 141 J/m² are required for a 5 log reduction of human pathogenic coronaviruses (SARS-CoV-1, MERS-CoV, 2019-nCoV). Here we demonstrate an example of UV exposure using a low-pressure mercury lamp. If the UV-C lamp source provides an average irradiance of 0.4 mW/cm² or 4 W/m² (under uniform dose distribution conditions), a treatment of about 35 seconds is adequate to inactivate β-coronaviruses (99.999% or 5 log reduction).
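The lamp example above is a direct application of dose = irradiance × time. The short sketch below reproduces that arithmetic; the function names are hypothetical, and scaling D90 linearly to a 5 log dose assumes log-linear (first-order) inactivation with no shoulder or tailing, which is an assumption of this sketch.

```python
def dose_for_log_reduction(d90_j_per_m2: float, log_reduction: float) -> float:
    """UV dose (J/m^2) for a given log reduction, assuming log-linear kinetics."""
    return d90_j_per_m2 * log_reduction


def exposure_time_s(dose_j_per_m2: float, irradiance_w_per_m2: float) -> float:
    """Exposure time (s) needed to deliver a dose at a uniform irradiance."""
    return dose_j_per_m2 / irradiance_w_per_m2


if __name__ == "__main__":
    # Unit conversion used in the text: 1 mW/cm^2 = 10 W/m^2.
    irradiance = 0.4 * 10.0  # 0.4 mW/cm^2 expressed in W/m^2

    # Upper end of the 5 log dose range quoted in the text (141 J/m^2).
    print(exposure_time_s(141.0, irradiance))

    # Predicted D90 of 21 J/m^2 for SARS-CoV-2, scaled to a 5 log reduction.
    dose_5log = dose_for_log_reduction(21.0, 5)
    print(dose_5log, exposure_time_s(dose_5log, irradiance))
```

At 4 W/m², the 141 J/m² upper bound corresponds to 35.25 s, which matches the roughly 35 second figure in the text. Real installations rarely achieve a perfectly uniform dose distribution, so such a figure is best read as a lower bound on exposure time.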
The predicted D90 values of HuNoVs are 69.1, 89 and 77.6 J/m² for genogroups GI, GII and GIV, respectively (Table 5). The results revealed that GII was less sensitive to UV-C, with a higher predicted D90 value than GI and GIV. To the best of our knowledge, limited experimental data are currently available on the UV-C sensitivity of HuNoVs. Some research studies used RT
(Table 1).

Identification of potential surrogates for UV-C inactivation
Validation of the UV-C inactivation kinetics of specific pathogens such as SARS-CoV-2 is not feasible in most settings because it requires sophisticated biosafety level (BSL)-3 containment to protect researchers and the public from health risk in environmental settings. For HuNoV, research on reproducible cultivation systems that yield high titers is still ongoing. Hence, criteria for the selection and application of surrogates are required to ensure that the surrogates mimic the behavior of SARS-CoV-2 or HuNoV under specific treatment conditions, while ensuring the safety of personnel and decreasing labor, cost and time. Surrogates are also useful in scale-up process validation studies, where they can reduce the uncertainties linked with UV-C dose measurement.
As seen from Table 5
including SARS-CoV-1, MERS-CoV, and SARS-CoV-2 (2019-nCoV) were not as resistant to UV-C treatments as the non-enveloped HuNoVs and caliciviruses used in this modeling study.

In conclusion, a predictive genomic-modeling method was developed for estimating the UV sensitivity of SARS-CoV-2 and HuNoVs. Results of the model validation showed that the developed model had acceptable predictive performance, as assessed by mathematical and graphical model performance indices. We predicted the D90 values by conducting extensive genomic modelling. Although the parameters reported here may suffice to estimate the UV sensitivity, experimental research directed at the knowledge gaps identified in this study is required to maximize the accuracy of the predictive models. Additional parameters will be added to the predictive model as needed, including terms for the presence of chromophores or UV absorbers and for possible UV scattering.
In the future, we plan to validate these data by experimentally determining the UV-C sensitivity (D90 values) of SARS-CoV-2 in containment laboratories with biosafety level 3 (BSL3) features, and of HuNoVs when suitable cultivation systems that reproducibly provide high viral titers become readily available in most laboratories.

Data availability statement
The genome sequences used in this study can be found in the NCBI nucleotide database.