Needed: More Reliable Bioeffects Studies at “High Band” 5G Frequencies

One major source of controversy related to possible health effects of radiofrequency radiation (RFR) is the large number of reported statistically significant effects of exposure, over the entire RF part of the spectrum and over a wide range of exposure levels, even as health agencies do not find clear evidence for health hazards of exposure at levels within current IEEE and ICNIRP exposure limits. This Perspective considers 31 studies related to genetic damage produced by exposure to RFR at frequencies above 6 GHz, including at millimeter-wave (mm-wave) frequencies. Collectively, the papers report many statistically significant effects related to genetic damage, many at exposure levels below current exposure limits. However, application of five risk of bias (RoB) criteria, together with other considerations, suggests that the studies in many cases are vulnerable to false discovery (nonreplicable results). The authors call for improvements in study design, analysis and reporting in future bioeffects research to provide more reliable information for health agencies and regulatory decision makers. This Perspective is a companion to another Perspective by Mattsson et al. elsewhere in this volume (Mattsson et al., 2021).


INTRODUCTION
The possible biological and health effects of radiofrequency (RF) energy from wireless communications have been debated by scientists and the public for many years, with particularly vociferous public debate about the safety of 5G (more accurately, 5G New Radio or 5G NR) systems that are currently being rolled out around the world. While several thousand bioeffects studies have been conducted, nearly all have been done at frequencies below 6 GHz, where most present communications systems operate.
Some scientists have pointed to the many reported statistically significant effects of RF exposure as evidence that RF fields over wide ranges of exposure parameters damage genetic material. For example, Ruediger (2009) commented: "101 publications . . . have studied genotoxicity of radiofrequency electromagnetic fields (RF-EMF) in vivo and in vitro. Of these 49 reported a genotoxic effect and 42 do not. In addition, 8 studies failed to detect an influence on the genetic material but showed that RF-EMF enhanced the genotoxic action of other chemical or physical agent . . . there is ample evidence that RF-EMF can alter the genetic material of exposed cells in vivo and in vitro and in more than one way" (Ruediger, 2009). Lai (2021), in a comprehensive review of genetic damage studies, commented: "[I]n the studies reviewed . . . approximately 70% of them showed effects. One could say that EMF exposure can lead to genetic changes. Some genetic damages could eventually lead to detrimental health effects. . . . knowing the mechanism is not necessary to accept that the data are valid."
By contrast, in reviewing the same evidence, official and health agency-sponsored expert reviews have expressed a more cautious view. A critical review by the Scientific Committee on Emerging and Newly Identified Health Risks (SCENIHR), under the auspices of the European Commission, concluded in 2015: ". . . taken together, the in vitro studies differ greatly for exposure characteristics and duration, cell type, biological endpoint and do not allow for any conclusion. Concerning genotoxicity, due to the close correlation between DNA damage and cancer occurrence, and the importance of genomic instability in assessing the potential health effects of radiation, the conflicting results presented here deserve future attention" (Scientific Committee on Emerging and Newly Identified Health Risks (SCENIHR), 2015, p. 69). Similarly, critical reviews by individual scientists find at best weak evidence of genotoxic effects of exposure to RFR even as they note many reports of such effects in the literature (Verschaeve et al., 2010; Vijayalaxmi and Prihoda, 2019; Karipidis et al., 2021 for laboratory studies; for a much broader review see IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 2013). Demonstration of genotoxicity of RFR at exposure levels within current safety limits (International Commission on Non-Ionizing Radiation Protection (ICNIRP), 2020; Institute of Electrical and Electronics Engineers (IEEE), 2019) would be extremely important for carcinogenic risk assessment.
Other recent reviews of the bioeffects literature above 6 GHz show many reports of effects of exposure for many endpoints, many at exposure levels below international limits such as those of the Institute of Electrical and Electronics Engineers (IEEE, 2019) or the International Commission on Non-Ionizing Radiation Protection (ICNIRP, 2020) (Simkó and Mattsson, 2019; Leszczynski, 2020).
This Perspective addresses potential reasons for the disparity between concern about "many effects" of exposure to RFR, on the one hand, and the conclusions of other experts and health agencies, which fail to find convincing evidence for harmful effects of RFR at exposure levels below current safety limits, on the other. The present focus is on technical weaknesses in study design and analysis. It is not intended as a critical or systematic review, for which a different analytical approach would be needed.

METHODS
We presently consider 31 genetic damage studies on animal and human cells exposed in vitro or in vivo to RF energy over a wide range of exposure parameters at frequencies above 6 GHz. The papers had been extracted from a recent review by one of us of English-language papers on genetic damage studies involving exposures to RFR between 0.3 MHz and 300 GHz (Vijayalaxmi and Prihoda, 2019), and published between 1990 and 2017. The papers had been identified from an extensive search of standard databases, and at the time of the study were as complete as possible a collection of genetic damage studies involving human and animal cells.
The 31 papers described a total of 175 different experiments involving RFR exposures to animals (10 studies, in vivo or in vitro exposures) or humans (21 studies, in vitro exposures). Most of the experiments compared RFR-exposed to sham controls; a few compared RFR + a known genotoxic agent such as X-rays. RFR exposures were restricted to frequencies >6 GHz, which is the transition frequency in both the IEEE and ICNIRP limits at which the dosimetric quantity changes from specific absorption rate (SAR) to absorbed power density at the surface of the tissue, reflecting the increasingly shallow penetration depth of the radiation in tissue at higher frequencies.
We summarized effect sizes for 157 individual experiments in terms of Cohen's d (the remaining papers did not provide sufficient information to determine d). Cohen's d, a standard measure of effect size, is defined as the difference in means of the exposed and control groups divided by a pooled standard deviation:

$$d = \frac{\text{mean}_{\text{exposed}} - \text{mean}_{\text{control}}}{\sqrt{0.5\left(\text{SD}_{\text{exposed}}^{2} + \text{SD}_{\text{control}}^{2}\right)}} \tag{1}$$

where mean and SD refer to the mean and standard deviation of the respective group. For a two-sample t-test with n observations and equal standard deviations in each group, Cohen's d is related to the t-statistic by

$$t = d\sqrt{n/2} \tag{2}$$

In addition, we applied the Student's t-test (one-sided for independent samples, assuming equal variances in each group) to the group means and standard deviations obtained from the papers, using p < 0.05 as the criterion for statistical significance. Calculations were done using Matlab (The Mathworks, Natick MA). The results were partly recalculated using a statistics software package to confirm the results from Matlab.
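The calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' Matlab code: the function names and the summary numbers are hypothetical, and the critical value 2.132 is the standard one-sided 5% point of Student's t with 4 degrees of freedom (two groups of n = 3). It uses the definition of d given above and the standard identity that, for equal group sizes n and a pooled standard deviation, the two-sample t statistic equals d multiplied by the square root of n/2.

```python
import math

def cohens_d(mean_exp, sd_exp, mean_ctl, sd_ctl):
    """Cohen's d: difference in group means over the pooled SD (equal group sizes)."""
    pooled_sd = math.sqrt(0.5 * (sd_exp ** 2 + sd_ctl ** 2))
    return (mean_exp - mean_ctl) / pooled_sd

def t_from_d(d, n):
    """Two-sample t statistic for n observations per group and a pooled SD."""
    return d * math.sqrt(n / 2.0)

# Hypothetical summary statistics, chosen only for illustration.
d = cohens_d(mean_exp=12.0, sd_exp=3.0, mean_ctl=9.0, sd_ctl=3.0)
t = t_from_d(d, n=3)

# One-sided 5% critical value of Student's t with 2n - 2 = 4 degrees of freedom.
T_CRIT_DF4 = 2.132
significant = t > T_CRIT_DF4

print(f"d = {d:.2f}, t = {t:.2f}, significant = {significant}")
```

With these made-up numbers, a difference of one full pooled standard deviation between groups does not reach significance at n = 3, which illustrates how little such small experiments can resolve.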
Finally, we evaluated each of the 31 studies using the five risk of bias (RoB) criteria defined in Table 1. RoB criteria have come into increasing use to assess internal validity of studies for use in systematic reviews and meta-analyses of both laboratory and epidemiology studies (National Toxicology Program Office of Health Assessment and Translation (OHAT), 2015). Criteria similar to (but more extensive than) those presently used have been applied by an expert group at Aachen University in systematic reviews of the bioeffects literature (e.g., Bodewein et al., 2019) and, less formally, by health agencies such as the Swedish Radiation Safety Authority (SSM, 2019) in triaging excessively flawed studies from their literature reviews.
The RoB assessments and the statistical significance testing had significant uncertainties, for two main reasons.
First, many of the papers lacked sufficient documentation to permit more than a rough evaluation of RoB, a problem that is hardly unique to this set of papers. Exposure characterization, in particular, is a difficult technical problem at frequencies >6 GHz due to the short energy penetration depth in tissue and other factors. It is difficult to evaluate a study that simply states that exposure (the specific absorption rate or SAR) was calculated using a software package without further elaboration (e.g., Karaca et al., 2012).
Second, many of the papers lacked sufficient information to allow an assessment of the correctness of the statistical analysis or even, in some cases, an independent application of a significance test. For a fuller discussion of the problems in extracting statistical data from genetic toxicology studies see Vijayalaxmi and Prihoda (2008). Comparison of the studies is limited by the diversity of assays used (Supplementary Material), which however are all sensitive measures of genetic damage.
Figure 1 shows the distribution of the number of statistically independent samples or animals exposed in the 175 experiments. Consistent with Organisation for Economic Cooperation and Development (OECD) (2016) recommendations, the exposed animal was considered to be the experimental unit; typically investigators scored many cells per exposed animal. The median "n" was 3, indicating that most of the studies had extremely limited statistical power.
In Figure 3, statistically significant and nonsignificant effects are indicated by + and o, respectively. Statistically significant effects are scattered over the whole range of exposures. Most of the results correspond to twofold or less variations in damage measures between exposed and control samples. This is comparable to variations in spontaneously occurring chromosome abnormalities and micronuclei endpoints in human cells (Vijayalaxmi and Prihoda, 2012).
Figure 4A shows the distribution of the d values for all 157 experiments. The effect sizes clustered near d ≈ 0 (no effect) with the large majority in the range −2 < d < 2, possibly due to sampling effects. However, a number of outliers appear with higher d values. These are older studies (Garaj-Vrhovac et al., 1991; Zotti-Martelli et al., 2000; Kesari and Behari, 2009; Shckorbatov et al., 2010; Karaca et al., 2012). Their results are not consistent with subsequent studies on similar endpoints and may be in error. Figure 4B shows the distribution of d for the 30 experiments that showed statistically significant effects; these values are consistently higher than those in the full set of experiments in Figure 4A. In part this reflects a well-known tendency of null hypothesis significance testing to exaggerate effect sizes (Gelman and Carlin, 2014). This is a trivial effect of selecting "statistically significant" results, which selects datasets with t (and hence d) above a critical level (Eq. 2). With underpowered studies, as in the present case, this exaggeration can be quite large. In addition, some of the studies may have appreciable systematic errors.
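The selection effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the reviewed studies: the true effect size (d = 0.5), the number of simulated experiments, and the function name are hypothetical, and the data are drawn from normal distributions with unit standard deviation. Each simulated experiment has n = 3 samples per group, the median reported in the papers, and is tested one-sided at p < 0.05.

```python
import math
import random
import statistics

def simulate(true_d, n, trials, t_crit, seed=0):
    """Simulate two-group experiments; return (power, mean d among significant results)."""
    rng = random.Random(seed)
    sig_d = []
    for _ in range(trials):
        ctl = [rng.gauss(0.0, 1.0) for _ in range(n)]
        exp = [rng.gauss(true_d, 1.0) for _ in range(n)]
        pooled_sd = math.sqrt(0.5 * (statistics.variance(exp) + statistics.variance(ctl)))
        d = (statistics.mean(exp) - statistics.mean(ctl)) / pooled_sd
        # One-sided test: t = d * sqrt(n / 2) must exceed the critical value.
        if d * math.sqrt(n / 2.0) > t_crit:
            sig_d.append(d)
    return len(sig_d) / trials, statistics.mean(sig_d)

TRUE_D = 0.5   # hypothetical true effect size
N = 3          # samples per group (median n in the reviewed experiments)
T_CRIT = 2.132 # one-sided 5% critical t for 2N - 2 = 4 degrees of freedom

power, mean_sig_d = simulate(TRUE_D, N, trials=10000, t_crit=T_CRIT)
print(f"estimated power: {power:.2f}")
print(f"mean d among significant results: {mean_sig_d:.2f} (true d = {TRUE_D})")
```

Because a result is counted as significant only when d exceeds T_CRIT multiplied by the square root of 2/N (about 1.74 here), every "significant" experiment in this sketch necessarily reports an effect more than three times the assumed true value.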

Effect Sizes
Most of the data in Figure 4A are consistent with no effect (or at best small effects relative to the natural background variability of the endpoints) of exposure. A more detailed analysis (systematic review or meta-analysis) with detailed evaluation of the individual studies with respect to each endpoint is clearly needed, but the sparse and very uneven quality of the presently considered literature limits what can be concluded from such an analysis. Karipidis et al. (2021) found "no confirmed evidence" for genotoxic or other hazardous effects of RFR > 6 GHz.

Risk of Bias
The RoB criteria were satisfied by about half of the studies. The most common deficiency was lack of positive controls, which are needed to assess the proper functioning of an assay. Failure to satisfy other RoB criteria (lack of appropriate sham controls, blinded study design and adequate dosimetry) would be fatal to the validity of a study.
While failure to meet RoB criteria raises concerns about the possibility of systematic errors, the converse is not true: the RoB criteria do not establish internal validity of a study. To reliably measure effects of the magnitude reported in most of these studies, which were comparable to natural background variation, would require extraordinary measures to control experimental errors. If such measures were taken in any of the presently considered studies, they were not described in the papers. Some studies found quite large effects, but their consistency with other studies and replicability would need to be considered.

Other Criteria: Flexibility in Data Collection and Statistical Analysis
Simmons et al. (2011) attributed a major cause of nonreplicable science to "flexibility in data collection and analysis" which "allows presenting anything as significant." This refers to investigator degrees of freedom in arranging the conduct of a study, selecting data to present and analyze, choosing which comparisons to make, etc. A simple (but unethical) example would be to disregard data as they are being collected that appear inconsistent with what the investigator believes are reasonable results, without a formal procedure for managing erroneous data.
These authors recommended reducing this flexibility by: "list[ing] all variables collected in a study . . . and report[ing] all experimental conditions, including failed manipulations. . . . If observations are eliminated, authors must also report what the statistical results are if those observations are included." None of the 31 papers described such precautions. The OECD protocols for chemical toxicity testing [Organisation for Economic Cooperation and Development (OECD), 2014; Organisation for Economic Cooperation and Development (OECD), 2016 for genetic toxicology studies] include extensive precautions related to study design and evaluation and reporting of results to reduce this flexibility. None of the 31 studies considered here appears to have been compliant with OECD guidelines (although some authors did follow OECD guidelines in the number of animals used and/or in the number of cells examined).
A final consideration is the high rate of false discovery due to naïve use of null hypothesis significance testing (NHST), a problem that has been pointed out many times by statisticians but remains the default approach to analyzing many experimental studies. [For an extensive recent review see Colling and Szucs, 2021]. Gelman (2018) noted: "Null hypothesis significance testing (NHST) only works when you have enough accuracy that you can confidently reject the null hypothesis. You get this accuracy from a large sample of measurements with low bias and low variance. But you also need a large effect size. Or, at least, a large effect size, compared to the accuracy of your experiment." In the presently considered collection of studies, those conditions are clearly not satisfied.
Needless to say, NHST with p < 0.05 is the default statistical approach used in virtually all RF bioeffects studies to identify "effects" of exposure. It is profoundly misleading to retrospectively focus on statistically significant results in a collection of studies without concern for nonsignificant results that were also reported, study validity, and the size and biological significance of reported effects.
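As a rough illustration of why naïve NHST in underpowered studies can produce a high rate of false discovery, the expected fraction of "significant" results that are false positives can be computed from the significance level, the statistical power, and the fraction of tested hypotheses that correspond to real effects. The sketch below uses this standard relationship; the function name and all input values, in particular the assumed prior fraction of real effects, are hypothetical and are not estimates from the reviewed papers.

```python
def false_discovery_rate(alpha, power, prior_true):
    """Expected fraction of significant results that are false positives.

    alpha      -- significance threshold (probability of a false positive under the null)
    power      -- probability of detecting a real effect
    prior_true -- assumed fraction of tested hypotheses that are real effects
    """
    false_pos = alpha * (1.0 - prior_true)
    true_pos = power * prior_true
    return false_pos / (false_pos + true_pos)

# Hypothetical scenarios: 10% of tested hypotheses are real, p < 0.05 criterion.
fdr_low_power = false_discovery_rate(alpha=0.05, power=0.2, prior_true=0.1)
fdr_high_power = false_discovery_rate(alpha=0.05, power=0.8, prior_true=0.1)

print(f"FDR at 20% power: {fdr_low_power:.2f}")
print(f"FDR at 80% power: {fdr_high_power:.2f}")
```

Under these assumed inputs, most significant results from the low-power scenario would be false discoveries, and raising the power lowers that fraction without eliminating it.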

DISCUSSION AND CONCLUSION
Because of their small size and other limitations, many of the 31 presently considered studies can best be described as pilot studies. Because of the public interest in possible health hazards of 5G NR communications and the paucity of quality bioeffects studies at frequencies >6 GHz, further studies are warranted (but not necessarily a full range of studies).
In its 2019 review of the bioeffects literature related to possible health effects of 5G NR technology in three bands, the French agency ANSES (Agence nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES), 2019) concluded that (in English translation) "the data are not sufficient to conclude on the existence or not of health effects related to exposure to electromagnetic fields in the band of frequencies around 26 GHz". ANSES offered a laundry list of suggested studies in this band, emphasizing studies on the skin, in vitro genotoxicity studies, and possible behavioral and neurophysiological effects, all to be done with "rigorous quality methods".
For such studies, major improvements in quality are needed relative to the presently considered set of studies. These include:
1. Better exposure assessment, which is a difficult problem at the frequency range >6 GHz due to the small penetration depth of RFR into tissue at those frequencies.
2. Larger studies with reasonable statistical power.
3. Stronger study design with due attention to the RoB criteria; see also Zeni and Scarfi (2012) and Vijayalaxmi (2016).
4. Use of currently accepted best practices to reduce effects of investigator degrees of freedom, e.g., "publish[ing] pre-study power calculations and effect sizes, including negative findings. Hypothesis-testing studies should be pre-registered and optimally raw data published." (Szucs and Ioannidis, 2017).
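A pre-study power calculation of the kind recommended above can be sketched as follows. This is a minimal example using the common normal approximation for a two-group comparison, not a prescribed procedure: the function name, the 80% target power, and the example effect sizes are assumptions chosen for illustration. The normal approximation somewhat underestimates the sample size needed for a t-test when n is small.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.8, one_sided=True):
    """Approximate samples per group needed to detect effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha if one_sided else 1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 / d ** 2)

# Hypothetical target effect sizes (Cohen's d).
for d in (0.5, 1.0, 2.0):
    print(f"d = {d}: about {n_per_group(d)} samples per group")
```

Even for a large assumed effect of d = 1, this approximation calls for roughly a dozen independent samples per group, well above the median n of 3 in the experiments considered here.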
INTERPHONE is one potential model for such studies (Cardis et al., 2007). This set of coordinated epidemiological studies in several countries was funded jointly by industry and government, and was designed to address public concerns about possible links between use of mobile phones and brain cancer. At present, such controversies have not yet developed with respect to high-band 5G NR handsets (few if any are presently on the market in any event). However, the model for this program, with joint industry-government funding but governance free of industry influence, might be useful for more diverse studies on mm-wave bioeffects such as those recommended by ANSES.
The EMF-RAPID Program is perhaps a more useful model. The program was set up by an act of [U.S.] Congress in 1992 to study the potential health impacts of extremely low frequency (ELF-EMF) fields from powerlines, responding to a growing public controversy at that time [The National Academies of Sciences Engineering and Medicine (NASEM), 1999]. The 5-year program had three basic components: 1) a program of bioeffects research on a range of endpoints; 2) information compilation and public outreach; and 3) a health assessment for evaluation of any potential hazards arising from exposure to ELF-EMF. In addition, it sponsored extensive surveys of population exposures to ELF-EMF from various sources. The program had multiple levels of oversight with government officials as well as representatives of public interest groups. Studies were selected for support "for their potential to provide solid, scientific data on whether ELF-EMF exposure represents a human health hazard, and if so, whether risks are increased under exposure conditions in the general population." Given the clear need of the public for reliable information about 5G NR telecommunications technologies and the difficult and only partly solved problem of assessing RFR exposure to the user of a handset and from base stations, a program such as EMF-RAPID would be a promising approach.
Short of such large programs, funding agencies and journal editors can increase the reliability of the bioeffects literature by supporting quality studies with adequate funding and by raising acceptance standards for bioeffects papers, a "carrot and stick" approach (Vijayalaxmi and Foster, 2021).
Unfortunately, after many years of debate about biological effects of RFR, health agencies are clearly losing enthusiasm for such a program and they may have to make do with a series of smaller investigator-generated studies, hopefully of better quality than presently available.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.