Validation, Optimization, and Application of the Zebrafish Developmental Toxicity Assay for Pharmaceuticals Under the ICH S5(R3) Guideline

The zebrafish has been extensively investigated as an alternative animal model for developmental toxicity testing, but its assay protocol has not yet been harmonized. This study validated and optimized the zebrafish developmental toxicity assay previously reported by multiple inter-laboratory studies in the United States and Europe. Using this classical protocol, of 31 ICH-positive compounds, 23 (74.2%) were teratogenic in zebrafish, five had false-negative results, and three could be classified as neither teratogenic nor non-teratogenic under the protocol standard; of 14 ICH-negative compounds, 12 (85.7%) were non-teratogenic in zebrafish and two had false-positive results. We then made four changes: we added a teratogenic index (TI) value at 2 dpf alongside the original value at 5 dpf; proposed a new category, uncategorized compounds, for compounds with TI values below the cutoff at both 2 dpf and 5 dpf that nevertheless induced toxic phenotypes; refined the testing concentration ranges; and lowered the TI cut-off value from ≥ 10 to ≥ 3 for compounds with refined testing concentrations. The optimized assay reached 90.3% sensitivity (28/31 positive compounds were teratogenic in zebrafish) and 88.9% (40/45) overall predictability. Our results strongly support the use of zebrafish as an alternative in vivo method for screening and assessing the teratogenicity of candidate drugs for regulatory acceptance.


INTRODUCTION
Developmental toxicity and teratogenicity are a serious safety problem: teratogenic agents cause approximately 5-10% of the congenital abnormalities in human newborns (Seiler et al., 2009). After the developmental effects of thalidomide were recognized in 1966, the Food and Drug Administration (FDA) established protocols for assessing drug effects on reproduction and development prior to approval for human use (Marathe and Thomas, 1990). In addition, there are currently more than 143,835 preregistered chemicals that could contaminate food and the environment (Rovida and Hartung, 2009; Kim et al., 2016), but about 86% of these and other existing chemicals have no safety testing data (Selderslaghs et al., 2012).
Due to concerns about the safety of chemicals, the European Registration, Evaluation, Authorization and restriction of Chemicals (REACH) program established protocols to collect such data for all chemicals produced or marketed in quantities of more than 1 t per year (Selderslaghs et al., 2012). It was estimated that 5816 traditional testing animals and a £1,883,200 cost would be required for the assessment of the developmental toxicity of one chemical according to the Organization for Economic Co-operation and Development (OECD) guidelines TG 414 (Organization for Economic Co-operation, and Development [OECD], 2001a), TG 416 (Organization for Economic Co-operation, and Development [OECD], 2001b), TG 421 (Organization for Economic Cooperation and Development [OECD], 1995), and TG 422 (Organization for Economic Co-operation and Development [OECD], 1996), no matter how time-consuming (Fleischer, 2007). This situation has urged research into alternative methods for developmental toxicity testing, namely, the whole embryo culture (WEC) test (Webster et al., 1997), the mammalian micromass (MM) test (Flint, 1993), and the embryonic stem cell test (EST; Spielmann et al., 1997). The former two models still use intact mammals to serve as test systems; the last one has always been a controversial area. Indeed, these tests do not cover the whole period of embryo development (Spielmann et al., 2006).
The zebrafish, a non-mammalian vertebrate, offers several advantages for alternative toxicity assays, such as economical husbandry, embryonic transparency, high fecundity, and compatibility with high-throughput screening in 6- to 384-well plates (Brannen et al., 2010; Lantz-Mcpeak et al., 2015). Furthermore, zebrafish appears to be an applicable model for developmental toxicity testing, and the developmental toxicity of drugs and compounds has been evaluated in zebrafish over the past 10 years, as summarized in Table 1 (Organization for Economic Cooperation, and Development [OECD], 2011; Organization for Economic Co-operation, and Development [OECD], 2012). In one blinded study in four laboratories, 20 non-proprietary compounds were tested in zebrafish for developmental toxicity; each of the testing laboratories achieved similar overall concordance with the mammalian data (60-70%; Gustafson et al., 2012). In the second phase of that project, after experimental parameters were optimized and zebrafish embryo uptake was taken into consideration, 38 proprietary pharmaceutical compounds were evaluated in two laboratories, and 62-82% total concordance was achieved (Ball et al., 2014). In other studies, the zebrafish developmental toxicity assay achieved an overall predictive value of 50-60% at the Flemish Institute for Technological Research and 92% at Phylonix Pharmaceutical (Haldi et al., 2011; Selderslaghs et al., 2012). These reports were clearly not uniform: the definition of teratogenicity, the predictability of results, multiple details of experimental conditions and data analysis, and the rationale for concentration settings were not fully optimized and validated.
Publication and implementation of the International Conference on Harmonization (ICH) Harmonized Guideline S5(R3) (International Conference on Harmonization [ICH], 2017, 2020) is driving the progress of alternative methods for developmental toxicity testing in China. The introduction of alternative test systems in the ICH S5(R3) step 2 draft guideline (International Conference on Harmonization [ICH], 2017), in whose revision China participated, is one of the biggest updates, and it is also the first time that alternative test methods have been covered at length in the ICH safety evaluation guidelines. The ICH guidelines proposed the use of alternative in vitro and non-mammalian in vivo reproduction tests for embryofetal developmental toxicity (EFD) risk assessment. The ICH S5(R3) final version (2020) clearly states that data generated from qualified alternative assays, conducted alone or in conjunction with one or more in vivo studies, can be used to support hazard identification and risk assessment under limited circumstances. However, although data from the zebrafish developmental toxicity assay have been used for investigational new drug (IND) applications domestically and internationally, they have not yet been officially recognized by the drug regulatory agency in China.
In this study, we set out to validate and optimize the internationally established preliminary zebrafish developmental toxicity assay as an alternative in vivo assay, with the aim of regulatory acceptance by the drug regulatory agency in China. Following the recommendation of the ICH Reference Compound List (Accessory 11.3.4, Table 9-6 Reference Compounds for Qualifying Alternative Assays, S5(R3) step 2 draft guideline, International Conference on Harmonization [ICH], 2017), 45 of the 66 compounds were selected based on their categories for the teratogenicity experiments: 31 out of 50 positive controls and 14 out of 16 negative controls. We assessed and validated the zebrafish developmental toxicity assay protocol originally reported by multiple inter-laboratory studies (Gustafson et al., 2012; Ball et al., 2014), here designated the "classical protocol." Combining our experience with that of other studies (Haldi et al., 2011; Selderslaghs et al., 2012), we optimized the classical protocol and further improved its predictability for pharmaceutical use under the ICH S5(R3) guideline.
Additionally, three non-ICH reference compounds, chlorogenic acid, triptolide, and aconitine, were applied to test the optimum zebrafish developmental toxicity assay. The results derived from this study indicated that the zebrafish developmental toxicity assay optimized and validated in this report is a reliable and reproducible non-mammalian in vivo method for screening and assessing the teratogenicity (i.e., Malformations or Embryo-Fetal Lethality, MEFL) of candidate compounds.

Zebrafish Husbandry and Egg Collection
Adult AB strain zebrafish were obtained from the China Zebrafish Resource Center (Shanghai, China) and housed in a light-and temperature-controlled aquaculture facility with a standard 14:10-h light/dark photoperiod and fed with live brine shrimp twice daily and dry flake once a day. Four to five pairs of zebrafish were set up for natural mating every time.
On average, 200-300 embryos were generated. Embryos were maintained at 28 °C in fish water (0.2% Instant Ocean Salt in deionized water, pH 6.9-7.2, conductivity 480-510 mS·cm⁻¹, and hardness 53.7-71.6 mg/L CaCO3). The embryos were washed and staged at 6 and 24 h post-fertilization (hpf) (Kimmel et al., 1995). The zebrafish facility at Hunter Biotechnology, Inc., is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC) International (No. 001458, Zhou et al., 2015; Zhu et al., 2016), the China National Accreditation Service for Conformity Assessment (CNAS), and the China Inspection Body and Laboratory Mandatory Approval (CMA). After experiments, all the zebrafish were anesthetized and euthanized with 0.25 g/L tricaine methanesulfonate, which conforms to the American Veterinary Medical Association (AVMA) requirements for euthanasia by anesthetic (Shen et al., 2015). This study was approved by the Institutional Animal Care and Use Committee (IACUC) at Hunter Biotechnology, Inc., and the IACUC approval numbers were IACUC-2018-017, IACUC-2018-043, and IACUC-2020-117.

Selection of Test Compounds
The 45 designated compounds from the ICH Reference Compound List were selected following the ICH S5(R3) Step 2 Draft (2017), which stated, "to be appropriate for regulatory use, the alternative assay(s) should be characterized using the ICH Reference Compound List"; "at least 45 compounds in total should be tested"; and "all classes should be tested (at least two or three compounds from each class). An approximate 2:1 ratio of positive to negative compounds should be tested because it is important to identify positive compounds." These ICH-positive (teratogenic) and ICH-negative (non-teratogenic) compounds were evaluated to validate the proposed zebrafish developmental toxicity assay (Gustafson et al., 2012; Ball et al., 2014). The ICH reference compounds were classified into 10 categories based on their mechanisms of action and are summarized in the Supplementary Material. The compounds were dissolved in 100% DMSO, and various volumes of the master solutions were then added directly to the testing fish water to reach the designated testing concentrations, with a final DMSO concentration of 0.5% (v/v). All master solutions were prepared freshly right before each experiment, and pH values were checked without any artificial adjustment.
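The dilution arithmetic implied by a fixed 0.5% (v/v) final DMSO concentration can be sketched as follows. This is only an illustration under one simplifying assumption that the text does not state: all DMSO in the well comes from a single addition of master solution, with no separate DMSO top-up. The function names are hypothetical.

```python
def stock_concentration_um(final_um, dmso_fraction=0.005):
    """Master-solution concentration (µM, in 100% DMSO) that yields final_um
    when diluted so that DMSO makes up dmso_fraction (v/v) of the well.
    Assumes a single-step dilution (an assumption, not stated in the paper)."""
    return final_um / dmso_fraction


def stock_volume_ul(well_volume_ml=1.0, dmso_fraction=0.005):
    """Volume of master solution (µL) to add to a well of well_volume_ml."""
    return well_volume_ml * 1000.0 * dmso_fraction


if __name__ == "__main__":
    # The five nominal test concentrations used in the initial screen.
    for final in (0.1, 1, 10, 100, 1000):
        print(final, "µM final needs a", stock_concentration_um(final), "µM stock")
    print("volume per 1 ml well:", stock_volume_ul(1.0), "µL")
```

Under this assumption, each final concentration needs a master solution 200 times more concentrated, added at 5 µL per 1 ml well.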

Compound Treatment
The evaluation of malformation and mortality in zebrafish after exposure to compounds followed previous reports from multiple inter-laboratory studies (Gustafson et al., 2012; Ball et al., 2014; Raghunath et al., 2019). At 4-6 hpf, zebrafish were manually transferred into a 24-well plate (Nest Biotech, Shanghai, China) containing negative compounds, positive compounds, or vehicle (0.5% DMSO), with 12 zebrafish per well in 1 ml of fish water. We started treatment at 4-6 hpf because most unfertilized zebrafish eggs can be easily identified and removed at 4 hpf, and most zebrafish laboratories in China, Europe, and America perform egg cleaning and staging at 4-6 hpf. As reported in our previous studies and by others, using high-quality fertilized eggs at 4-6 hpf for the developmental toxicity assay produces more reliable results. Untreated (fish water) control zebrafish were examined in parallel. A uniform range of five concentrations (0.1, 1, 10, 100, and 1000 µM) was tested to assess the teratogenicity of each test compound. Exposure was continuous and static without feeding; dead zebrafish were recorded and removed from the solution during daily observations, and the morphological characteristics of each individual zebrafish were evaluated at 48 hpf (2 days post-fertilization, dpf) and at 120 hpf (5 dpf) under a stereomicroscope (Nikon, SMZ645, Tokyo, Japan). Tests of false-positive, false-negative, and uncategorized compounds were repeated at least once.

Determination of LC 25 and NOAEL
Cumulative mortality was counted at 2 dpf and 5 dpf, and the 25% lethal concentration (LC25) was calculated for each time point from the lethality curve. If lethality at the highest tested concentration was <25%, the LC25 value was by default set to be higher than the highest tested concentration in subsequent applications (Gustafson et al., 2012). Determination of the no observed adverse effect level (NOAEL) involved assessing the concentration-response relationship of abnormal effects; an effect was considered adverse only if it showed a graded concentration-response. The NOAEL was the highest concentration at which compound-related anomalies were equal to or below those observed in the vehicle and negative controls. Two NOAELs were obtained, one from the 2 dpf and one from the 5 dpf observations under a stereomicroscope.
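The two rules above can be illustrated with a minimal sketch. It assumes anomaly counts per concentration are compared directly with the control count, which simplifies the graded concentration-response judgment described in the text; the function names are hypothetical. The example counts follow the aconitine observations at 2 dpf reported later in this paper, with zero anomalies at 0.1 and 1 µM assumed for illustration.

```python
def noael(concs_um, affected_counts, control_affected):
    """Highest tested concentration below the first one whose anomaly count
    exceeds the control count. Returns None when even the lowest tested
    concentration exceeds the control (NOAEL below the tested range)."""
    result = None
    for conc, affected in sorted(zip(concs_um, affected_counts)):
        if affected > control_affected:
            break
        result = conc
    return result


def lc25_above_tested_range(dead_at_highest, n_per_group):
    """True when lethality at the highest tested concentration is < 25%,
    in which case LC25 is set to '> highest tested concentration'."""
    return dead_at_highest / n_per_group < 0.25


if __name__ == "__main__":
    concs = [0.1, 1, 10, 100, 1000]
    affected = [0, 0, 0, 2, 7]  # counts at 0.1 and 1 µM are assumed
    print("NOAEL (µM):", noael(concs, affected, control_affected=0))
    print("LC25 above range:", lc25_above_tested_range(0, 12))
```

Returning `None` rather than a number keeps the "below the tested range" case explicit, so that a later TI calculation cannot silently treat it as an exact value.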

Developmental Toxicity Assessment
In the original classical protocol, zebrafish developmental toxicity was only assessed at 5 dpf, which led to higher false-negative results. In our pilot study, we found that zebrafish developmental toxicity assessed at both 2 dpf and 5 dpf could give a better predictability for positive teratogenic compounds. After compound treatment, zebrafish were immobilized using 3% methylcellulose and photographed for morphological anomalies using a 1 × DF PLAPO objective (Olympus, Japan) and a VertA1 camera (Sony Exmor CCD Sensor). Developmental malformations of structure/organ in morphology observed in zebrafish are summarized in Table 2. The assessed tissues and organs included heart, circulation, hemorrhage/thrombosis, head, pharyngeal arches/jaw, eye, sacculi/otoliths, liver, kidney, swim bladder, intestinal tract, notochord/somites/tail, trunk muscle, pectoral fins, and body pigmentation (Haldi et al., 2011;Gustafson et al., 2012;Ball et al., 2014;He et al., 2014). In the previous work, we found that swim bladder loss or delayed yolk sac absorption happened quite a lot in zebrafish embryos exposed to various pharmaceuticals and environmental agents, not specific for developmental toxicity. Therefore, swim bladder loss and yolk sac absorption were recorded in Table 2, but not used for developmental toxicity assessment and TI calculation.

Teratogenicity and Teratogenic Index Calculation
STEP 1: Validation of the Classical Protocol (Gustafson et al., 2012; Ball et al., 2014)
Based on the NOAEL and LC25 values, a teratogenic index (TI) was calculated as the ratio LC25/NOAEL for each time point (Gustafson et al., 2012; Ball et al., 2014). In the classical protocol, compounds were classified as teratogens or non-teratogens according to the TI value at 5 dpf. Compounds with a TI at 5 dpf greater than or equal to 10 were predicted to be teratogens; compounds with a TI <10 were predicted to be non-teratogens; and compounds for which the LC25-to-NOAEL ratio could not be determined because both values were >1000 µM were also predicted to be non-teratogens (Gustafson et al., 2012; Ball et al., 2014).
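As a minimal sketch of the classical decision rule, the TI calculation and the ≥ 10 cutoff can be written as below. The function name and the flag arguments used to mark ">1000 µM" values are illustrative choices, not part of the published protocol.

```python
def classify_classical(lc25_um, noael_um,
                       lc25_above_range=False, noael_above_range=False,
                       cutoff=10.0):
    """Classify a compound from its 5 dpf values under the classical protocol.

    When a value was reported as '> highest tested concentration', pass that
    highest concentration and set the matching *_above_range flag.
    """
    # Ratio cannot be determined when both values exceed the tested range.
    if lc25_above_range and noael_above_range:
        return "non-teratogen"
    ti = lc25_um / noael_um  # teratogenic index = LC25 / NOAEL
    return "teratogen" if ti >= cutoff else "non-teratogen"


if __name__ == "__main__":
    print(classify_classical(1000, 10))    # TI = 100
    print(classify_classical(100, 50))     # TI = 2
    print(classify_classical(1000, 1000, True, True))
```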

STEP 2: Optimizations of the Test Concentration Range and Teratogenic Criteria
The optimizations of the classical protocol included calculating an additional TI value at 2 dpf, adding a new category of the uncategorized compounds, narrowing down the testing concentration ranges for some compounds identified in the initial tests and proposing a new teratogenic criterion.
(1) The calculation method of the classical protocol, based on LC25 and NOAEL, was also used to obtain a TI value for compound-treated zebrafish at 2 dpf, and a TI value greater than or equal to 10 at either 2 dpf or 5 dpf indicated a teratogenic compound. To reduce possible false-negative results, compounds that showed obvious morphological effects on zebrafish but had TI values <10 were placed in a new category of uncategorized compounds, in which toxic potential could mask teratogenic potency.
(2) Uncategorized and false-positive compounds identified in the initial tests were re-tested using narrower concentration ranges, optimized to obtain more accurate LC25 and NOAEL values. Ceritinib was re-tested at concentrations of 5, 10, 15, 20, and 25 µM and warfarin at concentrations of 10, 25, 50, 75, and 100 µM. Cyproheptadine hydrochloride was re-tested at concentrations of 5, 10, 25, 50, and 100 µM, and cyclobenzaprine hydrochloride at concentrations of 62.5, 125, 250, 500, and 1000 µM. In addition, because the categorized positive teratogenic compound aspirin showed cardiovascular toxicity at 2 dpf that had completely recovered by 5 dpf, it was re-tested at lower concentrations of 10, 25, 50, 75, and 100 µM.
The concentration-mortality curve was calculated using OriginPro 8.0 software (Zheng et al., 2020) to obtain the best LC 25 and NOAEL at 5 dpf and, correspondingly, a more realistic TI value. These compounds re-tested at refined concentrations were judged as teratogens if TI at 5 dpf was ≥ 3.
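The optimized criteria described in Step 2 can be sketched as a small decision function. The handling of a re-tested compound whose refined TI falls below 3 is not spelled out in the text, so the fallback used here (keep it uncategorized if it shows a phenotype, otherwise non-teratogen) is an assumption, as are the function and argument names.

```python
def classify_optimized(ti_2dpf, ti_5dpf, has_phenotype, refined_ti_5dpf=None):
    """Sketch of the optimized classification.

    ti_2dpf, ti_5dpf : TI values from the initial 10-fold concentration series
    has_phenotype    : True if obvious morphological effects were observed
    refined_ti_5dpf  : TI at 5 dpf from the refined concentration re-test, if run
    """
    # A TI >= 10 at either time point marks a teratogen.
    if ti_2dpf >= 10 or ti_5dpf >= 10:
        return "teratogen"
    # Re-tested compounds are judged against the lowered cutoff of 3.
    if refined_ti_5dpf is not None and refined_ti_5dpf >= 3:
        return "teratogen"
    # Assumed fallback: phenotype without a sufficient TI stays uncategorized.
    return "uncategorized" if has_phenotype else "non-teratogen"


if __name__ == "__main__":
    # A compound with a very high TI at 2 dpf but TI = 1 at 5 dpf.
    print(classify_optimized(1000, 1, True))
    # Hypothetical low initial TIs, then a refined TI of 3.5.
    print(classify_optimized(2, 2, True))
    print(classify_optimized(2, 2, True, refined_ti_5dpf=3.5))
```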

Verification Tests of the Optimized Zebrafish Developmental Toxicity Assay
After the validation and optimization of the classical protocol, we selected three compounds to test their developmental toxicity and teratogenicity in zebrafish using our optimized protocol and standards developed from this study. Zebrafish were treated with triptolide, aconitine, or chlorogenic acid at concentrations of 0.1, 1, 10, 100, and 1000 µM, and LC 25 and NOAEL at 2 dpf and 5 dpf were calculated based on their respective lethality curve. After getting an initial TI value, the testing concentrations were further refined and optimum TI values were obtained.

Comparison to Human Data
The ICH Reference Compound List was selected based on data available in 2017 and derived from Detection of Toxicity to Reproduction for Human Pharmaceuticals. We further identified the relevant literature on the developmental toxicity of most selected compounds from searches of the PubMed (Medline) database. References were also identified from databases such as the Developmental and Reproductive Toxicology Database and the Hazardous Substances Data Bank. The results from the zebrafish developmental toxicity tests were compared with human teratogenicity data in the ICH Harmonized Guideline S5(R3) (International Conference on Harmonization [ICH], 2017, 2020) to obtain the overall predictability of the zebrafish developmental toxicity and teratogenicity evaluation. A compound was considered to be a developmental toxicant in conventional mammalian studies (rat, mouse, rabbit, etc.) if it caused an increase in the occurrence of one of four manifestations: functional deficits, altered growth, structural abnormalities, or death (Schwetzm and Harris, 1993). Where conflicting conclusions were reported, we assumed the compound in question to be teratogenic. The literature shows that, for most compounds, teratogenicity data were available for different strains of a species or for multiple laboratory animal species. The overall concordance in this report was obtained by calculating the sensitivity (teratogens), specificity (non-teratogens), and overall predictability between human/mammal and zebrafish data (Selderslaghs et al., 2012).

Assay Performance and Data Analysis
The study was conducted in accordance with the Basic & Clinical Pharmacology & Toxicology policy for experimental and clinical studies (Pernille et al., 2018). Successful experiments met all the following quality control milestones: (1) zebrafish natural death in untreated and vehicle-treated groups was ≤ 10%; (2) there was no statistical difference (p > 0.05) in assessed endpoints or signals between untreated and vehicle-treated groups; (3) intra- and inter-group coefficient of variation (CV) was ≤ 25%.
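The three quality control milestones can be checked mechanically. The sketch below is illustrative: the function names are hypothetical, the p-value is taken as an input rather than computed, and CV is computed with the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    return stdev(values) / mean(values) * 100.0


def passes_qc(untreated_mortality_pct, vehicle_mortality_pct,
              p_untreated_vs_vehicle, cv_values_pct):
    """True when all three quality control milestones are met."""
    mortality_ok = (untreated_mortality_pct <= 10.0
                    and vehicle_mortality_pct <= 10.0)
    controls_equivalent = p_untreated_vs_vehicle > 0.05
    variation_ok = all(cv <= 25.0 for cv in cv_values_pct)
    return mortality_ok and controls_equivalent and variation_ok


if __name__ == "__main__":
    # Hypothetical run: 1 of 12 control deaths (8.3%), p = 0.62, CVs under 25%.
    print(passes_qc(8.3, 0.0, 0.62, [12.0, 18.5]))
```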
Mortality data were imported into Origin (OriginPro, version 8.0, 2007) and fitted to a sigmoidal equation with variable slope, thus creating concentration-response curves. These concentration-response curves were required to determine NOAEL and LC 25 values. TI was calculated as the ratio of LC 25 /NOAEL for each time point. The correct classifications of positive and negative predictive values were imported into GraphPad (GraphPad Prism, version 5.0, 2003) for chi-square test, and p < 0.05 was considered significant.
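To show how an LC25 follows from a fitted sigmoidal curve, the sketch below uses a two-parameter Hill form with mortality running from 0% to 100%. The exact equation fitted in Origin is not given in the text, so this form, the function names, and the parameter values in the example are all assumptions; the fitting step itself is not reproduced here.

```python
def hill_mortality(conc_um, lc50_um, slope):
    """Mortality (%) predicted by a 0-100% sigmoidal curve with variable slope."""
    return 100.0 / (1.0 + (lc50_um / conc_um) ** slope)


def lethal_concentration(lc50_um, slope, percent=25.0):
    """Invert the curve: concentration giving the requested mortality.

    Derived from percent = 100 / (1 + (LC50 / c) ** slope), which gives
    c = LC50 * (percent / (100 - percent)) ** (1 / slope).
    """
    return lc50_um * (percent / (100.0 - percent)) ** (1.0 / slope)


if __name__ == "__main__":
    # Hypothetical fitted parameters: LC50 = 50 µM, slope = 2.
    lc25 = lethal_concentration(50.0, 2.0)
    print("LC25 (µM):", lc25)
    print("mortality at LC25 (%):", hill_mortality(lc25, 50.0, 2.0))
```

With steep curves the LC25 sits close to the LC50, which is one reason a 10-fold concentration series can place the LC25 and NOAEL in the same interval.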
The assay's performance was evaluated by overall concordance. True positive (Y) and true negative (N) compounds were compounds that had zebrafish teratogenicity classification consistent with mammalian data. False-positive (FP) and false-negative (FN) were compounds that had zebrafish teratogenicity classification inconsistent with mammalian results. Uncategorized (U) meant compounds that induced visually observable malformation(s) on zebrafish, but TI values were <10. The analyses included determining the following endpoints: sensitivity for detecting teratogens = Y/(Y + FN + U) × 100%; specificity for detecting non-teratogens = N/(N + FP) × 100%; and overall concordance = (Y + N)/45 × 100%.
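The three endpoint formulas can be checked against the counts reported in this study. The sketch below is illustrative, with a hypothetical function name. The counts before optimization are 23 Y, 5 FN, 3 U, 12 N, and 2 FP; after optimization, 28 Y and 12 N are reported, and the split of the remaining three positives into 2 FN (cytarabine, ribavirin) and 1 U (hydroxyurea) is read from the Results text.

```python
def assay_performance(y, fn, u, n, fp):
    """Return (sensitivity, specificity, overall concordance) in percent."""
    total = y + fn + u + n + fp          # 45 compounds in this study
    sensitivity = y / (y + fn + u) * 100.0
    specificity = n / (n + fp) * 100.0
    concordance = (y + n) / total * 100.0
    return sensitivity, specificity, concordance


if __name__ == "__main__":
    for label, counts in (("before optimization", (23, 5, 3, 12, 2)),
                          ("after optimization", (28, 2, 1, 12, 2))):
        sens, spec, conc = assay_performance(*counts)
        print(f"{label}: sensitivity {sens:.1f}%, "
              f"specificity {spec:.1f}%, concordance {conc:.1f}%")
```

Because U counts against sensitivity in the denominator, moving a compound from U to Y raises sensitivity just as correcting a false negative does.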

Developmental Toxicity of the ICH Reference Compounds in Zebrafish as Validation of the Classical Protocol
Developmental toxicity of 45 ICH categorized positive (teratogenic) or negative (non-teratogenic) compounds was assessed at 2 dpf to 5 dpf zebrafish using parameters presented in Tables 3, 4, and Supplementary Tables 2-4, and LC 25 , NOAEL, and TI were calculated as described in Materials and Methods.
Based on the methods described in Step 1 in Materials and Methods, as shown in the "Before optimization" results in Tables 3, 5, of 31 ICH-positive compounds, 23 (74.2%) were teratogenic in zebrafish and eight had false-negative results; of 14 ICH-negative compounds, 12 (85.7%) were non-teratogenic in zebrafish and two had false-positive results; and overall concordance was 77.8% (35/45). Table 4 contains a total of 14 negative controls. The effects of 12 non-teratogenic compounds on zebrafish were highly consistent with the ICH classification; the exceptions were cyproheptadine hydrochloride and cyclobenzaprine hydrochloride, which were classified as negative controls by ICH but gave false-positive results in the zebrafish tests (Supplementary Figure 1). Cyproheptadine hydrochloride produced significant teratogenic phenotypes, including pericardial edema, bradycardia, absent blood flow, oversized jaw, small eyes, liver degeneration, yolk sac absorption delay, renal edema, and swim bladder loss, at a concentration of 10 µM with 25% zebrafish death, and 100% death at 100 and 1000 µM. Cyclobenzaprine hydrochloride induced apparent liver degeneration and oversized jaw at 100 µM with 8.3% death, and 100% death at 1000 µM.
Meanwhile, Table 3 contains a total of 31 positive controls. The effects of 23 teratogenic compounds on zebrafish were highly consistent with the ICH classification; the exceptions were aspirin, acitretin, isotretinoin, ceritinib, hydroxyurea, warfarin, cytarabine, and ribavirin, which had false-negative results with TI below 10, as shown in the "Before optimization" results in Table 3. Repeated experiments were performed on all 10 false-positive and false-negative compounds, and the results were reproducible.
The morphological and functional abnormalities induced by teratogenic compounds at 2 dpf to 5 dpf in zebrafish are shown in Supplementary Tables 2-4. The most observed defects were pericardial edema, bradycardia, oversized jaw, yolk sac absorption delay, renal edema, and missing swim bladder. Aspirin had a typical dysplastic phenotype of brain hemorrhage; phenytoin showed a tachycardia; whereas the zebrafish treated with carbamazepine and cisplatin were developmentally much delayed and still in the chorion at the end of treatment (5 dpf).

Optimization for Detection Time and Definition for Uncategorized Compounds
We found that the TI value of acitretin was >1000 at 2 dpf, indicating marked teratogenicity, although its TI was 1 at 5 dpf. It was therefore judged a teratogen, in line with the new teratogenicity standard set in this study: a TI value greater than or equal to 10 at either 2 dpf or 5 dpf indicates a teratogenic compound. The TI values of aspirin, acitretin, and isotretinoin at 2 dpf were greater than 10, 1000, and 100, respectively, and thus all three were re-categorized as teratogens, whereas cytarabine and ribavirin remained false negatives based on TI values at both 2 dpf and 5 dpf and on morphology.
Ceritinib caused a toxic reaction manifested as missing swim bladder at 10 µM and 100% death at 100 and 1000 µM, but its TI was below 10; warfarin caused 75% swim bladder loss and 100% yolk sac absorption delay at 10 µM and 100% death at 100 and 1000 µM, but its TI was below 10; and hydroxyurea caused 100% swim bladder loss at 1000 µM, but its TI was also less than 10 (Supplementary Figure 2). Thus, hydroxyurea, warfarin, and ceritinib were classified as uncategorized compounds (Table 3): their TI values were smaller than 10 at both 2 dpf and 5 dpf, but all three induced morphological abnormalities.

Test Concentration Refinement and TI Value Optimization
Concentration refinement and TI optimization were performed for one positive compound, two uncategorized compounds, and two false-positive compounds. As indicated in Supplementary Figure 3, the TI value of the positive compound aspirin was optimized to 3.9 in the refined concentration test. No toxic phenotype was found at 10 µM; one zebrafish had pericardial edema, yolk sac absorption delay, and swim bladder loss at 25 µM; five zebrafish showed delayed yolk sac absorption, swim bladder loss, cardiovascular toxicity, and renal edema; and 12 were dead when treated at 75 µM. After refining the concentrations and fitting the curves, the TI values of the uncategorized compounds ceritinib and warfarin were 3.5 and 23.9, respectively. With ceritinib, three zebrafish showed yolk sac absorption delay and swim bladder loss at 15 µM, and 12 were dead at 20 µM. With warfarin, cardiovascular toxicity was seen at 10 µM, and four zebrafish were dead at 50 µM. The TI value at 5 dpf was smaller than 10 but greater than 3 for ceritinib and greater than 10 for warfarin. With the TI cut-off value lowered from 10 to 3 on the basis of these refined concentration experiments, ceritinib and warfarin were re-categorized from uncategorized compounds to teratogens.
Table notes: Y = yes, FN = false negative, and U = uncategorized. √: correct prediction; ×: incorrect prediction. (U): an uncategorized compound judged only based on the TI value at Step 2 of the optimization. *: a false-negative compound corrected to a teratogenic compound after adding a TI at 2 dpf along with the original TI at 5 dpf. #: after refining the testing concentration range and optimizing TI parameters, a false-negative or uncategorized compound was finalized as a teratogenic compound.
Under the refined testing concentrations, the TI values of the two false-positive compounds cyproheptadine hydrochloride and cyclobenzaprine hydrochloride were 13.8 and 11.7, respectively. Even in the optimized concentration tests, the TI values of both compounds were still ≥ 10, and thus they remained positive, whether as true or false positives, in the zebrafish developmental toxicity assay.
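The effect of the cut-off value on the five re-tested compounds can be tabulated from the refined TI values reported in this section. The sketch below simply compares each reported value against the two cutoffs; the dictionary and function names are illustrative.

```python
# Refined 5 dpf TI values as reported in the text for the re-tested compounds.
REFINED_TI = {
    "aspirin": 3.9,
    "ceritinib": 3.5,
    "warfarin": 23.9,
    "cyproheptadine hydrochloride": 13.8,
    "cyclobenzaprine hydrochloride": 11.7,
}


def positives(ti_by_compound, cutoff):
    """Names of compounds whose TI meets or exceeds the cutoff, sorted."""
    return sorted(name for name, ti in ti_by_compound.items() if ti >= cutoff)


if __name__ == "__main__":
    for cutoff in (10, 3):
        print(f"TI >= {cutoff}:", ", ".join(positives(REFINED_TI, cutoff)))
```

Lowering the cutoff from 10 to 3 changes the call only for aspirin and ceritinib; the two ICH-negative compounds exceed both cutoffs, so the change does not affect specificity in this data set.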
In summary, as shown in Tables 3, 5, after these optimizations the predictions for five positive compounds changed from inconsistent to consistent with the ICH classification. The sensitivity of the zebrafish assay for detecting teratogens increased from 74.2% (23/31) to 90.3% (28/31).

Application of the Optimized Zebrafish Developmental Toxicity Assay
To verify the optimized zebrafish developmental toxicity assay, three non-ICH compounds were tested in zebrafish using the new protocol and standards. As shown in Table 6 and Supplementary Table 5, at 2 dpf, no compound-related toxic phenotypes or deaths were seen in zebrafish treated with triptolide at 0.1 µM, but 12 zebrafish had pericardial edema at 1 µM, and 12 zebrafish died at 10 µM. At 5 dpf, 0.1 µM triptolide led to renal edema, pericardial edema, and cardiovascular toxicity in 12 zebrafish; at 1 µM, three zebrafish died and nine showed deformities. Therefore, the LC25 was 1 µM and the NOAEL was <0.1 µM. With aconitine at 2 dpf, there was no compound-related dysplasia at 10 µM; two zebrafish had pericardial edema and cardiovascular toxicity, with no deaths, at 100 µM; and seven zebrafish had pericardial edema and cardiovascular toxicity, with no deaths, at 1000 µM. The LC25 was >1000 µM and the NOAEL was 10 µM. At 5 dpf, 10 µM aconitine resulted in 12 zebrafish having renal edema and delayed yolk sac absorption; at 100 µM, two zebrafish died and 10 had renal edema and pericardial edema. The LC25 was >100 µM and the NOAEL was 1 µM.
With chlorogenic acid at 2 dpf, no compound-related toxicity or deaths were seen at 10 µM; at 100 µM, two zebrafish died, but no other dysplasia was observed. The LC25 was >100 µM and the NOAEL was 10 µM. At 5 dpf, no toxicity was detectable at 10 µM; at 100 µM, two zebrafish died, six showed swim bladder loss, and five had delayed yolk sac absorption. The LC25 was >100 µM and the NOAEL was 10 µM. The TI values of triptolide, aconitine, and chlorogenic acid were >10, >100, and >10, respectively, and all three compounds were categorized as teratogenic agents in zebrafish. These results were further confirmed with the refined concentration tests, as indicated in Figure 1 and Table 4.
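When the LC25 is only known as a lower bound (for example >100 µM) or the NOAEL only as an upper bound (for example <0.1 µM), the ratio LC25/NOAEL is itself a lower bound on TI. The sketch below reproduces the reported ">10, >100, >10" values from the 5 dpf LC25 and NOAEL values given above; the function name and flag arguments are illustrative.

```python
def format_ti(lc25_um, noael_um,
              lc25_is_lower_bound=False, noael_is_upper_bound=False):
    """TI = LC25 / NOAEL as text, prefixed with '>' when either input is a bound.

    A larger true LC25 or a smaller true NOAEL can only increase the ratio,
    so the computed value is a lower bound on TI in those cases.
    """
    ratio = lc25_um / noael_um
    prefix = ">" if (lc25_is_lower_bound or noael_is_upper_bound) else ""
    return f"{prefix}{ratio:g}"


if __name__ == "__main__":
    # 5 dpf values as reported in the text.
    print("triptolide:", format_ti(1, 0.1, noael_is_upper_bound=True))
    print("aconitine:", format_ti(100, 1, lc25_is_lower_bound=True))
    print("chlorogenic acid:", format_ti(100, 10, lc25_is_lower_bound=True))
```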

DISCUSSION
Zebrafish have similar physiology, morphology, and functions with mammals and have been recognized as valuable in vivo models for toxicity and safety assessment of drug candidates and chemicals (Dooley and Zon, 2000;Mcgrath and Li, 2008;Macrae and Peterson, 2015). In recent years, zebrafish developmental toxicity assays and zebrafish embryotoxicity tests have been reported from several laboratories in the United States and Europe with inconsistent experimental methods, parameters, and sensitivity of prediction (Table 1).
Most zebrafish developmental toxicity assays require the exposure of zebrafish embryos to up to 5 dpf for viability and morphology assessment. According to the European Union Directive (2010) on protection of laboratory animals, independent feeding is the stage at which zebrafish embryos are subject to regulations for animal experimentation (Gustafson et al., 2012;Strahle et al., 2012). Conventionally, post-hatched embryos >5 dpf are considered protected since the swim bladder is inflated, enabling free swimming and self-feeding (Belanger et al., 2010). Therefore, the design of this whole-organism assay is in compliance with definitions associated with the use of non-protected species (Schwetzm and Harris, 1993;Gustafson et al., 2012). With reference to the internationally reported zebrafish developmental toxicity and teratogenicity evaluation methods, we intended to validate and optimize a quick and reliable alternative method for the IND regulatory use hopefully in China and in the world.
Step 1 mainly followed the method reported by AstraZeneca et al. (Gustafson et al., 2012; Ball et al., 2014). This method is high-throughput and saves time and cost in new drug research and development; however, in our validation study this classical protocol failed to identify eight of the 31 ICH-positive compounds as teratogenic (five false negatives and three that could be categorized as neither teratogenic nor non-teratogenic), so that only 74.2% of the positive compounds were correctly predicted. We found that: (1) using only the TI value at 5 dpf, as in the classical protocol, could produce false-negative results. For example, isotretinoin, a relatively strong teratogenic drug, induced a high percentage of zebrafish death at 5 dpf and thus lowered the calculated TI value. In addition, we and others found that the zebrafish cardiovascular system was more sensitive to toxic agents at 2 dpf (Zhu et al., 2014); in this study, for example, aspirin induced pericardial edema at 2 dpf, but this cardiovascular toxicity had resolved by 5 dpf; (2) the 10 × concentration scale of the classical protocol allows a large number of compounds to be screened quickly in one test, but it could easily miss compounds with a narrow safety window, leading to false-negative results. For example, ceritinib induced zebrafish malformation, but its LC25 and NOAEL values were quite close; (3) the TI ≥ 10 teratogenicity cutoff proposed in the classical protocol could mistakenly categorize some teratogenic compounds as negative, and a smaller TI cut-off value could be more reasonable.
Based on our experience with zebrafish toxicity and safety assays and on the initial validation data, we added a new TI value at 2 dpf; compounds were assessed as teratogenic if the TI at 2 dpf, at 5 dpf, or at both time points met the cut-off value. This newly added TI at 2 dpf could help reduce false negatives.
The classical protocol had only two categories, teratogenic and non-teratogenic, so compounds with TI values smaller than 10 at both 2 dpf and 5 dpf that nevertheless induced toxic phenotypes could not be placed in either. We therefore defined these compounds as uncategorized. In the further optimization study, we found that the uncategorized compounds could be divided into two sub-categories. For the first sub-category, both LC25 and NOAEL were greater than 1000 µM, i.e., there were no observable effects on zebrafish at 1000 µM; the LC25 of these compounds was too high for their developmental toxicity to be assessed in zebrafish, and other types of animal tests could be needed. For the second sub-category, with LC25 or NOAEL smaller than 1000 µM, we performed new experiments using testing concentrations at smaller scales and obtained optimum TI values. After the TI cut-off value was optimized from ≥ 10 to ≥ 3, the second sub-category of uncategorized compounds was re-categorized as positive (teratogenic).
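The categorization rules described above can be sketched as a small decision function. This is a minimal illustration, not the authors' implementation: the function and constant names are hypothetical, TI is assumed to be LC25/NOAEL, and the "non-teratogenic" branch (no TI reaching the cutoff and no toxic phenotype) is an assumption about how negatives are assigned.

```python
# Illustrative sketch only; names and the non-teratogenic rule are assumptions.
CLASSICAL_CUTOFF = 10.0  # TI >= 10 in the classical protocol
OPTIMIZED_CUTOFF = 3.0   # TI >= 3 after refined concentration testing
HIGH_CONC_UM = 1000.0    # concentration bound quoted in the text (µM)


def teratogenic_index(lc25_um: float, noael_um: float) -> float:
    """Return TI, assumed here to be the ratio LC25 / NOAEL."""
    if noael_um <= 0:
        raise ValueError("NOAEL must be positive")
    return lc25_um / noael_um


def categorize(ti_2dpf: float, ti_5dpf: float,
               toxic_phenotype: bool, cutoff: float) -> str:
    """Categorize a compound from its TI values at 2 dpf and 5 dpf."""
    # Teratogenic if either time point reaches the cutoff.
    if ti_2dpf >= cutoff or ti_5dpf >= cutoff:
        return "teratogenic"
    # Below the cutoff at both time points but still toxic: uncategorized.
    if toxic_phenotype:
        return "uncategorized"
    # Assumed rule: no TI at the cutoff and no toxic phenotype.
    return "non-teratogenic"


def uncategorized_followup(lc25_um: float, noael_um: float) -> str:
    """Suggest the next step for an uncategorized compound."""
    # First sub-category: both values above 1000 µM.
    if lc25_um > HIGH_CONC_UM and noael_um > HIGH_CONC_UM:
        return "other animal tests may be needed"
    # Second sub-category: retest on a finer concentration scale.
    return "retest at refined concentrations, then apply TI >= 3"


# Hypothetical compound with TI 5 at 2 dpf and TI 4 at 5 dpf.
print(categorize(5.0, 4.0, True, CLASSICAL_CUTOFF))
print(categorize(5.0, 4.0, True, OPTIMIZED_CUTOFF))
```

With these hypothetical TI values the compound is uncategorized under the TI ≥ 10 cutoff and teratogenic under TI ≥ 3, which mirrors the re-categorization of the second sub-category. In practice the TI ≥ 3 cutoff would be applied to TI values obtained from the refined concentration tests, not to the original 10 × series.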
In addition to the TI values, morphological observations of developmental toxicity and teratogenicity played an important role in judging drug toxicity, especially drug-related morphological abnormalities. For the NOAEL biomarkers, we summarized the commonly and rarely observed morphological abnormalities of zebrafish developmental toxicity and teratogenicity in Table 2. From our experience, if the zebrafish only showed delayed yolk sac absorption and/or swim bladder loss after a compound treatment at 2 dpf and/or 5 dpf, these two malformations might not be used as the NOAEL cutoff, as we did in this study. We postulated that delayed yolk sac absorption and/or swim bladder loss were most likely related to developmental delay rather than teratogenicity.
After the validation and optimization of the classical protocol, we selected three non-ICH compounds (triptolide, aconitine, and chlorogenic acid) from our customer compound library isolated from Chinese herbs and tested them in the zebrafish developmental toxicity assay using the optimized protocol. Triptolide was toxic to human and male mouse reproductive systems (Ni et al., 2008), and aconitine was embryotoxic to rats in vitro (Xiao et al., 2007). To the best of our knowledge, the developmental toxicity of chlorogenic acid has not yet been reported. Our data demonstrated that all three compounds were teratogenic in zebrafish; the results for triptolide and aconitine were consistent with the literature, whereas the results for chlorogenic acid should be confirmed in other tests in future investigations.
We must point out that the 45 compounds used in this study, categorized by ICH as positive (teratogenic) or negative (non-teratogenic), were selected from the ICH Reference Compound List for qualifying alternative assays in the S5(R3) draft guideline published in 2017 (International Conference on Harmonization [ICH], 2017). This draft version listed 16 negative compounds and 50 positive compounds. However, the final ICH version promulgated in February 2020 (International Conference on Harmonization [ICH], 2020) recommended only three negative compounds and 29 positive compounds, and clarified the teratogenic effects of each of them in rats, rabbits, and humans. Of the 14 negative compounds tested in the current study, only two, cetirizine and vildagliptin, were included in the final version, and both were also non-teratogenic in zebrafish. Of the 31 selected positive compounds, 18 were in the 2020 version. Of these 18, 15 were teratogenic and three (cytarabine, ribavirin, and hydroxyurea) could not be assessed as teratogens in zebrafish. These results suggest that the zebrafish developmental toxicity assay may not be suitable for some types of compounds, or that compound delivery by injection into zebrafish may be needed. Further investigation of compound structure-activity relationships and further assay optimization could clarify these and other issues. A more extensive study is being planned to determine whether similar concordance can be reached with a larger set of pharmaceutical compounds, using mass spectrometry to quantify compound stability, absorption, metabolism, and excretion.
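The counts for the compounds retained in the 2020 final list can be turned into concordance figures with simple arithmetic. The sketch below is illustrative only: the helper name is hypothetical, the percentages are derived from the counts stated above and are not reported by the authors, and it assumes that the three positive compounds that could not be assessed are counted as not detected.

```python
def percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 1)


# Counts taken from the text for compounds also in the 2020 final list.
positives_total, positives_teratogenic = 18, 15
negatives_total, negatives_non_teratogenic = 2, 2

# Assumption: the three unassessable positives count as not detected.
sensitivity = percent(positives_teratogenic, positives_total)
specificity = percent(negatives_non_teratogenic, negatives_total)
overall = percent(positives_teratogenic + negatives_non_teratogenic,
                  positives_total + negatives_total)

print(sensitivity, specificity, overall)  # 83.3 100.0 85.0
```

With only two negative compounds in this subset, the specificity figure in particular should be read as indicative and not as a stable estimate.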
As indicated in ICH S5(R3), the use of qualified alternative assays, such as non-mammalian in vivo assays, can reduce animal use while preserving the ability to detect relevant reproductive risks, and can be an appropriate approach for risk assessment under certain circumstances where they are interpreted in conjunction with routine in vivo reproductive testing. In addition, qualified alternative assays can be a potential approach to defer in vivo testing as part of an integrated testing strategy. Our results strongly support the zebrafish developmental toxicity assay as a predictive non-mammalian in vivo method for screening and assessing the teratogenicity of candidate compounds, and this zebrafish assay could be a promising alternative test system for regulatory acceptance. A multi-laboratory validation study is being planned and will be reported in the future.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The animal study was reviewed and approved by the zebrafish facility at Hunter Biotechnology, Inc., which is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC) International, the China National Accreditation Service for Conformity Assessment (CNAS), and the China Inspection Body and Laboratory Mandatory Approval (CMA).