In silico Prediction of Skin Sensitization: Quo vadis?

Direct skin contact with chemical or physical substances can lead to allergic contact dermatitis (ACD), which produces allergic reactions such as rash, blisters, or itching in the contacted skin area. ACD can be triggered through various highly complex adverse outcome pathways (AOPs), whose causal roles must be understood to warrant biosafety. Consequently, commercial products such as ointments and cosmetics must satisfy topical safety requirements, including those for allergy, in animal and non-animal models. Europe, however, has banned animal tests for the safety evaluation of cosmetic ingredients since 2013, and other countries have followed. A variety of non-animal in vitro tests addressing different key events of the AOP, namely the direct peptide reactivity assay (DPRA), KeratinoSens™, LuSens, the human cell line activation test (h-CLAT), and U-SENS™, have been developed and adopted in OECD test guidelines to identify skin sensitizers. Other methods, such as SENS-IS, are not yet fully validated or accepted by regulators. Alternatively, a broad spectrum of in silico models for predicting skin sensitization has emerged, built on various animal and non-animal data using assorted modeling schemes. In this article, we extensively summarize a number of skin sensitization predictive models that can be used in the biopharmaceutical and cosmeceutical industries, together with their future perspectives, and discuss the underlying challenges.


INTRODUCTION
Skin is a protective barrier against the external environment that guards internal organs, bones, and muscles. The skin is an organ of the integumentary system composed of multiple layers, including the epidermis (surface layer) and the dermis (deeper layer) (Rehfeld et al., 2017). The topical application and transepidermal delivery of natural or synthetic chemicals are attractive approaches for physicians and pharmacologists in the discovery and development of drugs and medicines (Alkilani et al., 2015), and for dermatologists in maintaining healthy skin in general (Kraft and Lynde, 2005; Spada et al., 2018). Skin contact with chemicals or substances, whether non-allergenic or allergenic, can cause hypersensitivity in the subject (Lee and Thomson, 1999). Fundamentally, skin hypersensitivity is classified into four types based on the immunologic mechanism that mediates the disease: type I (immediate/IgE-related), in which the cutaneous skin test reaction peaks at 2 h; type II (antibody- and complement-related cytotoxicity); type III (antigen-antibody complex-mediated); and type IV, or delayed-type hypersensitivity (DTH), which can occur within 48-72 h (Lee and Thomson, 1999; Posadas and Pichler, 2007). Notably, skin sensitization, or allergic contact dermatitis (ACD), is a type IV DTH, or type IV allergy (Ouyang et al., 2014). ACD can substantially reduce patients' quality of life through uncomfortable symptoms such as skin rash, blisters, and/or swelling, which can persist for a lifetime in some cases (Strickland et al., 2016). Epidemiologically, ACD affects more than 20% of the population of North America and Western Europe based on data collected from all age groups, and contact allergy tends to be more prevalent in younger children than in adults (Thyssen et al., 2007).
The adverse outcome pathways (AOPs) of skin sensitization are the sequential events running from the initial skin exposure to chemicals through the downstream cascade, which comprises the induction and elicitation phases. The chemical sensitization pathway (CSP) is initiated by adduct formation, viz. a covalent bond between skin proteins and chemicals, which subsequently forms a full antigen. Moreover, skin sensitizers share the physicochemical properties of haptens, which are either electrophilic per se (Roberts and Lepoittevin, 1998) or lead to the formation of free radicals (Gäfvert et al., 1994). Some skin sensitizers are termed prehaptens: they are initially neither electrophilic nor radical but can be activated by air exposure, photoactivation, or bacterial degradation on the skin surface (Karlberg et al., 1992; Sköld et al., 2002). Others are termed prohaptens, which are activated through metabolic pathways (Nilsson et al., 2005; Gerberick et al., 2008; van Eijl et al., 2012). In adduct formation, the skin sensitizer thus acts as an electrophile, whereas the skin protein functions as a nucleophile. More specifically, nucleophilic amino acids within skin proteins, such as cysteine (thiol), histidine, lysine (primary amine), methionine, and tyrosine, can react with electrophilic haptens (Ahlfors et al., 2003; Gerberick et al., 2004; Schwöbel et al., 2011). Reaction with cysteine and/or lysine forms covalent bonds and produces a hapten-protein complex, which is subsequently processed by both epidermal and dermal dendritic cells (DCs), which constitute the skin immune system (Ochoa et al., 2008; Clausen and Stoitzner, 2015). The DCs then present part of this protein complex (the antigen) on the major histocompatibility complex (MHC) and activate naïve T lymphocytes in the lymph node (Huppert et al., 2018; Johnson et al., 2020).
In addition, this can induce the differentiation and proliferation of T cells, which in turn propagate the inflammatory response throughout the whole body (Gefen et al., 2015). After the initial exposure, a secondary exposure to the same allergen initiates the elicitation phase, in which the activated T cells are triggered to secrete specific cytokines that attract inflammatory cells into the epidermis of the affected area, causing rash, itching, and burning on the exposed skin surface. The induction and elicitation phases have been described in detail elsewhere (OECD, 2012). Notably, the immune response in the elicitation phase is faster than that in the induction phase (OECD, 2012).
Complications of whitening cosmetics have been documented in recent years, since cosmetic ingredients are not only a major concern of the beauty industry but also a critical factor in human health. Some ingredients in skin-brightening cosmetics, such as hydroquinone, corticosteroids, and mercury, can cause severe complications. For instance, chronic application of hydroquinone-containing cosmetics can result in exogenous ochronosis or "fish odor syndrome"; the accumulation of mercury can lead to increased pigmentation, nail discoloration, and ACD; and the accumulation of corticosteroids can produce "steroid addiction syndrome" or induce acne on the anterior chest (Olumide et al., 2008; Ladizinski et al., 2011; Mahé, 2014). Cosmeceuticals are a burgeoning industry in which cosmetic products exert therapeutic effects (Martin and Glaser, 2011). Vitamins, hydroxy acids, growth factors, peptides, and botanicals, for example, are considered cosmeceutical ingredients (Martin and Glaser, 2011). Some skincare products also include ingredients with pharmaceutical properties, as exemplified by Oz.Or. Oil 30, which not only softens the skin but also shows antibacterial potential and improves dermal wound healing (Serio et al., 2017). Sargafuran, extracted from a marine brown alga, is a promising compound for use in skincare cosmetics to prevent acne because of its antibacterial properties (Kamei et al., 2009). In addition, the US Food and Drug Administration (FDA) has already approved antibiotics such as quinupristin-dalfopristin, linezolid, and daptomycin for the treatment of skin-structure infections (Schweiger and Weinberg, 2004). Under FDA regulations, a product can be both a cosmetic and a drug if it meets the definitions of both. However, the FDA does not recognize "cosmeceuticals" as a category.
In contrast, cosmeceuticals are considered a subclass of cosmetics in Europe and Japan and a subclass of drugs in the UK (Pandey et al., 2021). Moreover, the criteria for classifying compounds or raw materials from the same plant vary among countries. For instance, the European Food Safety Authority (EFSA) has listed raw materials of C. limon as natural ingredients that may pose a threat to human health, whereas the FDA classifies the oil and extracts of this species as safe (Klimek-Szczykutowicz et al., 2020), suggesting that these criteria are not universally applicable.
Skin sensitization is an increasingly important issue, as reflected by the number of publications on the topic illustrated in Figure 1. The number of published articles has gradually increased in recent years, with a particularly dramatic increase after 2000. Consumption of and interest in cosmetics have grown by about 5% every year, and the market is expected to reach 31.75 billion US dollars by 2023 (Kumar, 2005; Orbis-Research, 2018). The potential benefits and demand remain high, and information about the toxicity, physicochemical, and bioactivity properties of cosmetic ingredients needs to be expanded (Kumar, 2005; Panico et al., 2019). The growth of the global cosmetics market is updated annually at http://www.statista.com/statistics/297070/growth-rate-of-the-global-cosmetics-market/.

SKIN SENSITIZATION ASSAY
Various tests have been devised to evaluate the human skin sensitization potential of new substances; they can be broadly classified into human tests, animal tests, and non-animal tests, as listed in Table 1. Each category is discussed in more detail below.

Human Tests
Human tests for skin sensitization include the human repeat insult patch test (HRIPT) and the human maximization test (HMT) (Kligman, 1966; Kligman and Epstein, 1975; Marzulli and Maibach, 1974). In both tests, the skin reaction is recorded after the second contact between the test substance and human skin. In the HMT system, the response to a test substance is classified into five levels according to the incidence of positive responses among the test subjects: weak (0-2/25), mild (3-7/25), moderate (8-13/25), strong (14-20/25), or extreme sensitizer (21-25/25) (Kligman, 1966). The HRIPT classification system is based on the grade of skin reaction: 1) erythema; 2) erythema and induration; 3) vesiculation; and 4) bulla formation; only substances producing grade 1 reactions qualify as non-sensitizers (Marzulli and Maibach, 1974). To date, the classification systems used in human tests are not consistent and often depend on the subjective judgment of experts (Gerberick et al., 2001; Roberts, 2018). Under the globally harmonized system of classification and labelling of chemicals (GHS), chemicals are classified as sub-category 1A if their HRIPT or HMT values are ≤500 μg/cm² and as sub-category 1B if these values are >500 μg/cm². Both sub-categories are considered skin sensitizers; non-sensitizers are not classified in this system (ICCVAM, 2011; United Nation, 2013). Another classification system has also been proposed, in which chemicals are assigned to six skin sensitization categories based on the no observed effect level (NOEL) values of the HRIPT, as listed in Table 2 (Basketter et al., 2014). Nowadays, human tests are implemented only to confirm the skin sensitization of test chemicals under specific conditions, and for ethical reasons maximal concentrations may not be applied.
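The two human-test rules above reduce to simple threshold lookups. The sketch below encodes them as written in this section; the function names are hypothetical, and the GHS helper assumes the chemical has already been established as a sensitizer, since non-sensitizers are not classified in that system.

```python
def hmt_grade(positives: int, subjects: int = 25) -> str:
    """Map the number of HMT responders in a 25-subject panel to Kligman's class."""
    if subjects != 25:
        raise ValueError("Kligman's scale is defined for panels of 25 subjects")
    if not 0 <= positives <= subjects:
        raise ValueError("positives must lie between 0 and 25")
    # Inclusive upper bound of each class, in increasing order of potency.
    for upper, label in ((2, "weak"), (7, "mild"), (13, "moderate"), (20, "strong")):
        if positives <= upper:
            return label
    return "extreme"  # 21-25 responders


def ghs_subcategory_from_human(dose_ug_per_cm2: float) -> str:
    """GHS sub-category of a confirmed sensitizer from its HRIPT/HMT dose per area."""
    if dose_ug_per_cm2 <= 0:
        raise ValueError("dose per skin area must be positive")
    return "1A" if dose_ug_per_cm2 <= 500 else "1B"


print(hmt_grade(10))                    # moderate
print(ghs_subcategory_from_human(250))  # 1A
```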

Animal Tests
Besides human tests, various animal tests have been used to evaluate the human skin sensitization potential of new substances, namely the local lymph node assay (LLNA), the guinea pig maximization test (GPMT), and the Buehler test; the AOP key event addressed by the LLNA is listed in Table 1 (OECD, 2012). Of these animal assays, the LLNA (OECD, 2010a; 2010b), which measures the extent of the proliferative response induced in the draining lymph nodes after topical exposure of mice to chemicals (expressed as a stimulation index, SI), is the preferred animal model and has been adopted by various regulatory agencies (Cockshott et al., 2006; Gerberick et al., 2007a). The LLNA is designed to determine the substance concentration at which lymphocyte proliferation in the lymph node is three-fold higher than that of vehicle-treated controls, viz. SI ≥ 3; this concentration is defined as the LLNA EC3 value. Skin sensitizers are assigned to potency classes according to their LLNA EC3 values, as summarized in Table 3 (ICCVAM, 2011). Since 2001, LLNA EC3 values have been converted from percentages to µg/cm² so that they can be correlated with NOEL values from human tests, namely the HRIPT and HMT. Because EC3 values correlate highly with NOELs, they can be used to estimate human skin sensitization potency quantitatively (Gerberick et al., 2001), a finding later confirmed by Api et al. (2015).
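In practice, EC3 is rarely a tested dose; it is usually read off the dose-response series by linear interpolation between the two concentrations whose SI values bracket 3. The sketch below shows that interpolation, the percent-to-µg/cm² conversion, and a potency call. The factor of 250 µg/cm² per 1% assumes 25 µL of dosing solution spread over roughly 1 cm² of ear skin, and the 1A/1B cut-off of EC3 ≤ 2% is the GHS criterion as we understand it, which may not match the classes in Table 3; the dose series is made up.

```python
from typing import Optional, Sequence

# Assumed conversion: 25 uL of dosing solution over ~1 cm2 of mouse ear,
# so 1% (w/v) corresponds to 250 ug/cm2.
UG_PER_CM2_PER_PERCENT = 250.0


def ec3_percent(concs: Sequence[float], sis: Sequence[float]) -> Optional[float]:
    """Linearly interpolate the concentration (%) at which SI first reaches 3.

    EC3 = c + (3 - d) / (b - d) * (a - c), where (c, d) is the tested point just
    below the first crossing and (a, b) the next point at or above SI 3.
    Returns None if SI never crosses 3 (no extrapolation is attempted).
    """
    points = sorted(zip(concs, sis))
    for (c, d), (a, b) in zip(points, points[1:]):
        if d < 3.0 <= b:
            return c + (3.0 - d) / (b - d) * (a - c)
    return None


def ec3_to_ug_per_cm2(ec3: float) -> float:
    return ec3 * UG_PER_CM2_PER_PERCENT


def ghs_subcategory_from_llna(ec3: float) -> str:
    # Assumed GHS cut-off: EC3 <= 2% -> 1A, otherwise 1B (may differ from Table 3).
    return "1A" if ec3 <= 2.0 else "1B"


concs = [0.5, 1.0, 2.5, 5.0, 10.0]  # % (w/v), hypothetical dose series
sis = [1.2, 1.8, 2.6, 4.2, 7.0]     # hypothetical stimulation indices
ec3 = ec3_percent(concs, sis)
print(round(ec3, 3))                   # 3.125
print(round(ec3_to_ug_per_cm2(ec3)))   # 781
print(ghs_subcategory_from_llna(ec3))  # 1B
```

Returning None instead of extrapolating keeps the sketch conservative: a series that never reaches SI 3 gives no EC3 at the tested doses.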
The GPMT is another popular animal model, in which intradermal injection and/or epidermal application is used during the induction period to expose guinea pig skin to the test substance. After 10-14 days, the animals are exposed to a challenge dose of the test substance, and the skin reactions of the test animals to the challenge are compared with those of untreated control animals (OECD, 1992). The skin sensitization potential of chemicals is classified as extreme, strong, moderate, or weak depending on the induction concentration and the incidence of responding animals, as listed in Table 4 (ICCVAM, 2011). The Buehler test is another method that uses guinea pig skin as a model. The key difference between the GPMT and the Buehler test lies in sample preparation: in the GPMT the test substance is mixed with Freund's complete adjuvant (FCA), whereas this step is absent in the non-adjuvant Buehler test (OECD, 1992). Moreover, the LLNA and GPMT can be used in a complementary, tiered fashion: once a test substance is found positive in the LLNA, there is no need to carry out the GPMT or Buehler test for further validation, whereas a substance found negative in the LLNA is subjected to further evaluation by the GPMT or Buehler test (OECD, 1992).
The LLNA and GPMT cannot completely serve as surrogates for predicting human skin sensitization potential, since they accurately predict only about 70% of human test outcomes, and the LLNA and GPMT do not always agree with each other. In cases of discordance between the two assays, the LLNA has been shown to predict human skin sensitization better than the GPMT (Dean et al., 2001). Because of these limitations, animal tests cannot completely replace their human counterparts. Surprisingly, one-third of strong sensitizers in human tests were predicted to be weak sensitizers by the LLNA, even though the LLNA is commonly regarded as the gold standard for human skin sensitization testing (ICCVAM, 2011; Strickland et al., 2017; Roberts and Api, 2018), indicating a discrepancy between the two assay systems. LLNA predictions correlate well with human tests as long as the sensitizers lie within the applicability domain of the LLNA model, and the inconsistent predictions are not random. More specifically, the underestimation of human skin sensitization potency by the LLNA can be attributed principally to test chemicals that contain electrophilic aromatic Schiff bases or impurities (Roberts and Api, 2018). Conversely, the LLNA tends to overestimate potency relative to human tests if the test chemicals undergo autoxidation under LLNA conditions or have cutaneous pharmacological activities other than skin sensitization (Roberts and Api, 2018). Furthermore, pre- and pro-haptens can be falsely predicted (Roberts and Api, 2018). Moreover, animal tests for skin sensitization, although adopted for a long time, still raise controversial issues concerning their effectiveness and ethics (Rollin, 2003).
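Figures such as "about 70% of human results" are concordance statistics computed from paired classifications of the same chemicals. A minimal sketch, assuming both methods are reduced to binary sensitizer/non-sensitizer calls; the function name and the ten paired calls are hypothetical.

```python
from typing import Dict, Sequence


def concordance(reference: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity of binary predictions vs. a reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # human sensitizers flagged by the test
        "specificity": ratio(tn, tn + fp),  # human non-sensitizers cleared by the test
    }


human = [True, True, True, False, False, True, False, True, False, False]
llna = [True, False, True, False, True, True, False, True, False, False]
print(concordance(human, llna))  # {'accuracy': 0.8, 'sensitivity': 0.8, 'specificity': 0.8}
```

Reporting sensitivity and specificity alongside accuracy matters here, because the under- and over-prediction patterns described above affect the two error types differently.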
In 2017, the FDA established the Predictive Toxicology Roadmap, which evaluates new methods and technologies that can expand the predictive capabilities of toxicology and reduce the use of animal testing. With the same goals, in vitro testing methods intended to replace the rabbit vaginal irritation (RVI) test have been evaluated by a consortium comprising the Institute for In Vitro Sciences, Inc. (IIVS), the Consumer Healthcare Products Association (CHPA), and the PETA International Science Consortium (PETA-ISC) (Costin et al., 2020). More broadly, there is a growing tendency to use non-animal tests as alternative approaches for assessing skin sensitization (Doke and Dhawale, 2015).

Non-animal Assays
Animal testing of cosmetic products has been banned in Europe since 2013 on animal rights and welfare grounds under the 7th amendment to the EU Cosmetics Directive (European Union, 2003; European Commission, 2013a). Notably, several non-animal testing methods have been developed. The methods accepted by the Organization for Economic Co-operation and Development (OECD) each focus on a single AOP key event or on the activation of specific genes, such as the depletion of cysteine- and/or lysine-containing peptides in the Direct Peptide Reactivity Assay (DPRA) (OECD, 2015a), CD86 and CD54 overexpression in the human cell line activation test (h-CLAT) (OECD, 2018c), induction of the nuclear factor-erythroid 2-related factor 2 (Nrf2)-Kelch-like ECH-associated protein 1 (Keap1)-antioxidant/electrophile response element (ARE) pathway in KeratinoSens™ (OECD, 2015b), and CD86 overexpression in the U-SENS™ test (OECD, 2016b). The SENS-IS test, which is still under validation, measures the expression of antioxidation, inflammation, and cell migration genes (Cottrez et al., 2016).
Table 2 (excerpt). Skin sensitization categories based on HRIPT NOEL values (Basketter et al., 2014).
Category 3: substances known as contact allergens that produce sensitization in 0.01-0.1% of those exposed. NOEL: between 500 and 2,500 μg/cm².
Category 4: chemicals that require prolonged exposure to higher dose levels to produce sensitization and are rarely regarded as important clinical allergens. NOEL: more than 2,500 μg/cm².
Category 5: very low intrinsic ability to cause skin sensitization; even in highly selected patient groups, the incidence should not exceed 1%. NOEL: variable or absent, because a threshold cannot be determined accurately.
Category 6: free from skin sensitization activity.
DPRA is a non-animal method focused on the interaction between haptens and proteins following skin exposure to chemical substances. It was first proposed by Gerberick et al. and accepted by OECD in 2015 (Gerberick et al., 2004; Gerberick et al., 2007b; OECD, 2015a). In this assay, synthetic cysteine- (Ac-RFAACAA-COOH) and lysine-containing (Ac-RFAAKAA-COOH) peptides (R: arginine, F: phenylalanine, A: alanine, C: cysteine, K: lysine) are incubated with the test substance, and the remaining peptide concentrations are then determined from their chromatographic absorption peaks at 220 nm. Test chemicals are generally divided into four classes: negative (no or minimal reactivity) or positive with low, moderate, or high reactivity (OECD, 2015a). Nevertheless, DPRA can be limited by solubility and by complex mixtures (Gerberick, 2016). In addition, measurement accuracy can be compromised when test chemicals co-elute with the peptide (Natsch et al., 2007; Natsch and Gfeller, 2008). The DPRA has since been refined into other versions, such as the amino acid derivative reactivity assay (ADRA), which prevents the co-elution of test chemicals and nucleophilic reagents (Wanibuchi et al., 2019; Imamura et al., 2021); this method has been accepted by OECD (OECD, 2020). Another modified DPRA, the kinetic direct peptide reactivity assay (kDPRA), in which several concentrations of the test compound are incubated with the synthetic peptide for different incubation times, has also been accepted by OECD (OECD, 2020). The matrix of depletion values over incubation times and concentrations is used to determine the maximum rate constant (log kmax), and test compounds are then assigned to GHS classification categories according to their log kmax values (Natsch et al., 2020). The intra- and inter-laboratory reproducibility of this method reached 96% and 88%, respectively.
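The DPRA readout reduces to percent peptide depletion relative to reference controls, followed by a lookup into the four reactivity classes. The sketch below assumes the prediction model as we recall it from OECD TG 442C: the mean of cysteine and lysine depletion, with negative depletions counted as zero, compared against fixed cut-offs, plus a cysteine-only model when lysine data are unusable. The cut-off values should be checked against the current guideline; the peak areas are made up.

```python
from typing import Optional, Sequence, Tuple


def percent_depletion(sample_area: float, reference_areas: Sequence[float]) -> float:
    """Peptide depletion (%) from HPLC peak areas relative to the mean reference control."""
    mean_ref = sum(reference_areas) / len(reference_areas)
    return (1.0 - sample_area / mean_ref) * 100.0


# Cut-offs (inclusive upper bounds, %) as recalled from OECD TG 442C; verify before use.
CYS_LYS_CUTS = (6.38, 22.62, 42.47)    # mean of Cys and Lys depletion
CYS_ONLY_CUTS = (13.89, 23.09, 98.24)  # Cys depletion alone
CLASSES = ("no or minimal", "low", "moderate", "high")  # "no or minimal" = negative


def dpra_reactivity(cys: float, lys: Optional[float] = None) -> Tuple[str, float]:
    """Return (reactivity class, value used); negative depletions count as zero."""
    if lys is None:
        value, cuts = max(cys, 0.0), CYS_ONLY_CUTS
    else:
        value, cuts = (max(cys, 0.0) + max(lys, 0.0)) / 2.0, CYS_LYS_CUTS
    for cut, label in zip(cuts, CLASSES):
        if value <= cut:
            return label, value
    return CLASSES[-1], value


ref = [100.0, 102.0, 98.0]          # hypothetical reference-control peak areas
cys = percent_depletion(40.0, ref)  # about 60% cysteine depletion
lys = percent_depletion(90.0, ref)  # about 10% lysine depletion
label, value = dpra_reactivity(cys, lys)
print(label, round(value, 1))       # moderate 35.0
```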
The KeratinoSens™ method takes a different approach by focusing on the second AOP key event (Table 1), keratinocyte activation, i.e., the inflammatory responses and gene expression associated with specific cell signaling pathways such as ARE-dependent pathways. In the uninduced state, Keap1 binds the transcription factor Nrf2 and promotes its Cul3-mediated ubiquitination, which in turn leads to the degradation of Nrf2 by the proteasome (de Freitas Silva et al., 2018). However, once a small molecule such as a sensitizer forms a covalent bond with Keap1, Keap1 can no longer bind Nrf2, and Nrf2 protein accumulates in the nucleus. The released Nrf2 binds to ARE sequences in the promoter regions of detoxification, antioxidant, and anti-inflammatory genes, triggering the expression of these target genes (de Freitas Silva et al., 2018). The mechanism of Nrf2-ARE pathway activation is illustrated in Figure 2 of de Freitas Silva et al. (2018). Accordingly, in the KeratinoSens™ test, human HaCaT keratinocytes are stably transfected with a plasmid containing the ARE sequence, the SV40 promoter, and the luciferase gene (luc2) (Emter et al., 2010; Steinberg, 2013). Test chemicals are classified as sensitizers if they induce luciferase activity more than 1.5-fold relative to the negative control, and as non-sensitizers otherwise (OECD, 2015b). The LuSens assay, which is based on the same concept as KeratinoSens™, has also been accepted by OECD (Ramirez et al., 2014; OECD, 2018b).
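The KeratinoSens™ decision rule can be sketched as three steps: normalize luciferase signals to the solvent control, take the maximal fold induction (Imax), and interpolate the concentration at which induction first reaches 1.5-fold (often called EC1.5). This is a simplified illustration using only the 1.5-fold rule stated above; the guideline adds further acceptance criteria (for example, on cell viability and reproducibility across repetitions) that are omitted here. Function names and luminescence values are hypothetical.

```python
from typing import List, Optional, Sequence


def fold_induction(treated: Sequence[float], solvent_controls: Sequence[float],
                   blank: float = 0.0) -> List[float]:
    """Blank-corrected luciferase signal relative to the mean solvent control."""
    control = sum(solvent_controls) / len(solvent_controls) - blank
    return [(x - blank) / control for x in treated]


def ec_at_threshold(concs: Sequence[float], folds: Sequence[float],
                    threshold: float = 1.5) -> Optional[float]:
    """Interpolated concentration where induction first reaches the threshold (EC1.5)."""
    points = sorted(zip(concs, folds))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo < threshold <= f_hi:
            return c_lo + (threshold - f_lo) / (f_hi - f_lo) * (c_hi - c_lo)
    return None


def keratinosens_call(folds: Sequence[float], threshold: float = 1.5) -> str:
    # Simplified rule from the text: sensitizer if Imax exceeds 1.5-fold.
    return "sensitizer" if max(folds) > threshold else "non-sensitizer"


concs = [10.0, 50.0, 100.0, 200.0]  # uM, hypothetical
folds = fold_induction([1100.0, 1300.0, 1700.0, 2600.0], [1000.0, 1000.0])
print(keratinosens_call(folds))                 # sensitizer
print(round(ec_at_threshold(concs, folds), 1))  # 75.0
```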
The activation of DCs is the AOP key event investigated in the h-CLAT method (Table 1). In the induction phase of skin sensitization, the co-expression of CD86 and CD54 on Langerhans cells is used as an indicator of the antigen-presenting process (Nuriya et al., 1996; Reiser and Schneeberger, 1996; Tuschl et al., 2000). Thus, in the h-CLAT method, the expression of CD86 and CD54 is measured on THP-1 cells, a human monocytic leukemia cell line, after exposure to the test chemical (Sakaguchi et al., 2006; Sakaguchi et al., 2009). Upregulation of these markers indicates DC activation and hence skin sensitization activity of the test chemical. Note that this test classifies chemicals only as sensitizers or non-sensitizers (OECD, 2018c). In addition, the U-SENS™ test method is based on the expression of the CD86 cell surface marker on U937 cells, a human histiocytic lymphoma cell line (Piroird et al., 2015). Briefly, a compound is considered a sensitizer when CD86 expression in U937 cells is 1.5-fold higher than in the untreated control, and a non-sensitizer otherwise. This method has been submitted to OECD, and the draft proposal has been published on the OECD website (OECD, 2016b; OECD, 2018c). It has recently been used to evaluate the role of nanomaterials in skin sensitization (Bezerra et al., 2021).
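Both cell-based assays reduce flow-cytometry data to a ratio against the control and apply fixed cut-offs. The sketch below assumes the criteria as we recall them from OECD TG 442E: h-CLAT is positive when the relative fluorescence intensity (RFI) of CD86 is at least 150% and/or that of CD54 is at least 200%, evaluated only at concentrations with at least 50% cell viability; U-SENS™ is positive when the CD86 stimulation index is at least 150%. The guideline's requirement for concordant results across independent runs is omitted, and all names and values are illustrative.

```python
from typing import Optional


def rfi(mfi_treated: float, iso_treated: float,
        mfi_control: float, iso_control: float) -> float:
    """Relative fluorescence intensity (%): isotype-corrected MFI, treated vs. control."""
    return (mfi_treated - iso_treated) / (mfi_control - iso_control) * 100.0


def hclat_positive(rfi_cd86: float, rfi_cd54: float, viability: float) -> Optional[bool]:
    """One h-CLAT run; None when viability is too low for the RFIs to be interpreted."""
    if viability < 50.0:  # assumed minimum viability (%)
        return None
    return rfi_cd86 >= 150.0 or rfi_cd54 >= 200.0


def usens_positive(cd86_stimulation_index: float) -> bool:
    """U-SENS: CD86 stimulation index of at least 150% (1.5-fold) of the control."""
    return cd86_stimulation_index >= 150.0


cd86 = rfi(620.0, 20.0, 320.0, 20.0)
print(cd86)                               # 200.0
print(hclat_positive(cd86, 120.0, 85.0))  # True
print(usens_positive(140.0))              # False
```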
SENS-IS is another non-animal method for measuring the skin sensitizing potential of chemicals. It uses commercially available reconstituted human skin (EpiSkin) (Netzlaff et al., 2005; Cottrez et al., 2015) and measures the expression levels of Redox and SENS-IS genes. The former group comprises 17 genes containing an ARE in their promoters (Cottrez et al., 2016), which are targets modulated by the Nrf2-Keap1-ARE signaling pathway, whereas the latter comprises 21 genes that are linked to DC activity and associated with inflammation, danger signals, and cell migration. The genes in the SENS-IS group can be triggered by sensitizers but are not under the control of the Nrf2-Keap1-ARE pathway (Cottrez et al., 2015; Cottrez et al., 2016). The test chemical is applied to the artificial skin at four concentrations (50, 10, 1, and 0.1%), after which mRNA is collected from the EpiSkin cells and gene expression is analyzed by reverse transcription polymerase chain reaction (RT-PCR). When more than 7 but fewer than 20 genes are expressed in both groups, the test chemical is defined as a sensitizer and is then categorized as a weak, moderate, strong, or extreme sensitizer depending on the concentration, a scheme similar to the classification adopted for the LLNA. When 20 genes are expressed, a condition termed overexpression, the test concentration is lowered. A chemical is considered negative if it fails the criteria at all tested concentrations (Cottrez et al., 2016). This method is still under validation, as announced at https://tsar.jrc.ec.europa.eu/.
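The SENS-IS decision logic described above can be sketched as a scan over the tested concentrations. This is only an illustration of the rule as worded in this section (more than 7 and fewer than 20 expressed genes in both groups; 20 or more treated as overexpression); the published protocol (Cottrez et al., 2016) should be consulted for the exact cut-offs. The mapping from the lowest positive concentration to a potency class is an assumption made by analogy with the LLNA-like scheme, and all names and gene counts are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from the lowest positive concentration (%) to a potency
# class, assumed by analogy with the LLNA-like scheme described above.
POTENCY = {50.0: "weak", 10.0: "moderate", 1.0: "strong", 0.1: "extreme"}


def sens_is_call(counts: Dict[float, Tuple[int, int]],
                 lower: int = 7, upper: int = 20) -> Tuple[str, Optional[float]]:
    """counts maps concentration (%) -> (Redox genes expressed, SENS-IS genes expressed).

    A concentration is positive when both groups have more than `lower` and fewer
    than `upper` expressed genes; a count of `upper` or more is treated as
    overexpression, i.e. uninterpretable at that concentration.
    """
    lowest_positive = None
    for conc in sorted(counts, reverse=True):  # from 50% down to 0.1%
        redox, sens = counts[conc]
        if redox >= upper or sens >= upper:
            continue  # overexpression: retest at a lower concentration
        if lower < redox < upper and lower < sens < upper:
            lowest_positive = conc  # later (lower) positives overwrite earlier ones
    if lowest_positive is None:
        return "negative", None
    return POTENCY.get(lowest_positive, "positive"), lowest_positive


example = {50.0: (12, 15), 10.0: (9, 11), 1.0: (5, 6), 0.1: (2, 3)}
print(sens_is_call(example))  # ('moderate', 10.0)
```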
Various research groups have published assay data obtained with the above-mentioned methods; these are summarized in Table 5 and provide a wealth of data for building in silico models. Many researchers are still endeavoring to improve the accuracy of in vitro assays for assessing skin sensitization potential, for example by finding new biomarkers for predicting skin sensitization (Hirota and Moro, 2006) or by developing novel assays such as the Genomic Allergen Rapid Detection (GARD™) to define skin sensitization activity with a single assay (Roberts, 2018). GARD™ relies on changes in gene expression when myeloid cells are exposed to chemicals (Johansson et al., 2013; Johansson et al., 2019; Johansson et al., 2011; Roberts, 2018; Stevenson et al., 2019). In 2019, this method was validated in multiple laboratories with an inter-laboratory reproducibility of 92.0% (Johansson et al., 2019), and it has been under peer review for EURL ECVAM validation, as announced at https://tsar.jrc.ec.europa.eu/. Furthermore, conformal prediction has been implemented in the GARD™ protocol, with an accuracy of 88%. In 2021, however, Masinja et al. used GARD™ to predict the skin sensitization potential of agrochemical active ingredients, and only 7 of 12 GARD™ results agreed with mammalian data, suggesting that GARD™ still needs to be improved to assess the skin sensitization potential not only of cosmetics but also of other active ingredients (Masinja et al., 2021). Another approach is to modify or improve current non-animal methods to further increase their accuracy. For instance, the spectro-DPRA method, designed in 2014, uses 5,5′-dithiobis(2-nitrobenzoic acid) or fluorescamine™ as the detection reagent to quantify the unreacted peptide. Its accuracy was shown to reach 91.5% and 94.9% when compared with LLNA and human data, respectively (Cho et al., 2014; Cho et al., 2019).
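The text does not describe how conformal prediction is implemented in GARD™, so the following is only a generic sketch of class-conditional (Mondrian) inductive conformal prediction, assuming an underlying classifier that outputs a probability for each class. Nonconformity is taken as one minus the predicted probability of a label; each candidate label receives a p-value from the calibration compounds of that class, and labels whose p-value exceeds the chosen significance level form the prediction set. All names and scores are hypothetical.

```python
from typing import Dict, Sequence, Set


def conformal_p_values(cal_scores: Dict[str, Sequence[float]],
                       test_scores: Dict[str, float]) -> Dict[str, float]:
    """Class-conditional (Mondrian) inductive conformal p-values.

    cal_scores[label]: nonconformity scores of calibration compounds whose true
    class is `label`; test_scores[label]: nonconformity of the new compound if it
    were assigned `label` (here, 1 - the classifier's probability for that label).
    """
    p_values = {}
    for label, alpha in test_scores.items():
        scores = cal_scores[label]
        at_least_as_strange = sum(1 for s in scores if s >= alpha)
        p_values[label] = (at_least_as_strange + 1) / (len(scores) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """Labels not rejected at the chosen significance level (tolerated error rate)."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical calibration scores and a new compound with P(sensitizer) = 0.75.
cal = {"sensitizer": [0.1, 0.2, 0.3, 0.4], "non-sensitizer": [0.05, 0.15, 0.5, 0.6]}
test = {"sensitizer": 1 - 0.75, "non-sensitizer": 0.75}
p = conformal_p_values(cal, test)
print(p)                       # {'sensitizer': 0.6, 'non-sensitizer': 0.2}
print(prediction_set(p, 0.2))  # {'sensitizer'}
```

Unlike a plain classifier, this can return both labels (an ambiguous compound) or neither, which is how conformal predictors make their error rate explicit.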
Most non-animal tests, such as U-SENS™, h-CLAT, and KeratinoSens™, are qualitative per se: compounds are divided into skin sensitization positives and negatives, viz. a binary classification (OECD, 2015b; 2016b; 2018c), whereas DPRA and SENS-IS are basically quantitative, in that they determine the level of skin sensitization potential (OECD, 2015a). Additionally, in Europe, non-animal tests such as DPRA, h-CLAT, and KeratinoSens™ are routinely used for preliminary screening, whereas others such as U-SENS™ and SENS-IS can be used to further characterize the nature of skin sensitization (OECD, 2016b; OECD, 2019). Non-animal models for skin sensitization have been accepted for several years, the first being DPRA, which was accepted by OECD in 2015. However, not all chemicals can be assessed by DPRA, such as insoluble chemicals, pro-haptens, and chemicals that co-elute with the model peptide, which may severely limit its application. These chemicals, nevertheless, can be evaluated by in silico models in the preliminary phase (Urbisch et al., 2016). In silico models are therefore expected to be useful for predicting the skin sensitization of novel chemicals in this respect.
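Because most of these screening assays return binary calls, their results are often combined by majority vote. OECD Guideline 497 includes a "2 out of 3" defined approach built on assays for these key events; the sketch below shows only the majority-vote core, assuming one call each from DPRA, KeratinoSens™, and h-CLAT, and ignores the guideline's rules for borderline results and applicability. The function name and the calls are hypothetical.

```python
from typing import Optional, Sequence


def majority_call(calls: Sequence[Optional[bool]]) -> Optional[bool]:
    """Combine binary assay calls (True = sensitizer, None = no valid result).

    Two concordant results decide the outcome; otherwise the call is inconclusive.
    """
    positives = sum(1 for c in calls if c is True)
    negatives = sum(1 for c in calls if c is False)
    if positives >= 2:
        return True
    if negatives >= 2:
        return False
    return None


# Order: DPRA, KeratinoSens, h-CLAT (hypothetical results)
print(majority_call([True, False, True]))  # True
print(majority_call([False, None, True]))  # None
```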

IN SILICO MODELS

Data Source
Since 2013, Europe has banned animal tests for verifying the safety of cosmetic products with respect to endpoints such as repeated-dose toxicity, skin sensitization, reproductive toxicity, carcinogenicity, and toxicokinetics (European Commission, 2013a). As alternatives, various non-animal tests, namely DPRA, KeratinoSens™, and h-CLAT, have been developed and accepted by OECD (vide supra). In addition, various published skin sensitization datasets, comprising animal and non-animal data as well as chemical structure information, are listed in Table 5. Online skin sensitization data sources that collect assay data and structural alerts and can be used to build predictive models are listed in Table 6. Among the skin sensitization databases, SkinSensDB, which collects animal and non-animal test results and contains 710 unique chemicals (Wang et al., 2017), is freely accessible. The non-confidential substance data submitted to the European Chemicals Agency (ECHA) under the Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) regulation are publicly available free of charge; this dataset covers many types of assays and study categories. Another available source is the Cosmetic Ingredient database (CosIng), a European Commission database that includes information on cosmetic substances and ingredients (Table 6). A further source is the Chemical Evaluation and Risk Estimation System (CERES), developed by the FDA's Center for Food Safety and Applied Nutrition, which contains toxicity data including skin sensitization hazard and potency (Ghosh et al., 2020). Various predictive models and packages for skin sensitization have been published (Wilm et al., 2018). Some models provide structural alerts based on the analysis of chemical characteristics responsible for skin sensitization (Sushko et al., 2011).
ToxAlerts, established in 2012, serves as a valuable data source for developing models that predict chemical toxicity. It initially provided 600 structural alerts for carcinogenicity, mutagenicity, skin sensitization, acute aquatic toxicity, and potential idiosyncratic drug toxicity (Sushko et al., 2012), and the number has since grown to more than 3,000. The Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) is a permanent committee of the National Toxicology Program (NTP) that is responsible for evaluating toxic potential, developing and validating toxicological methods, and collecting data to strengthen the scientific basis of risk assessment. ICCVAM has established a skin sensitization database covering 1,060 chemicals tested in the LLNA and 208 chemicals tested in the GPMT and Buehler tests (ICCVAM, 2011). Vitic is a commercial toxicity database and information management system developed by Lhasa, comprising more than 38,000 skin sensitization data points for more than 10,000 structures. eChemPortal, developed by OECD, is a free public source that provides information on the physical-chemical properties, environmental fate and behavior, ecotoxicity, and toxicity of chemicals. Chemical information can be searched by chemical name and number or by GHS classification.

Commercial Package
Numerous commercially available packages and/or models have been published and are listed in Table 7. The Computer Automated Structure Evaluation (CASE) Ultra program is a commercial package that issues structural alerts: molecular structures are divided into subunits, and those responsible for a specific activity are identified and termed biophores (Klopman, 1992). Different subunits can give rise to different biophore activities, and a subunit that increases or decreases the activity is termed synergistic or biophobic, respectively (Graham et al., 1996; Chakravarti et al., 2012). The package can therefore predict the skin sensitization potential of a given compound, and it has been validated by predicting various adverse drug effects, namely carcinogenicity, hepatotoxicity, cardiotoxicity, renal toxicity, and reproductive toxicity. Saiakhov et al. carried out a pilot study using CASE Ultra to analyze further adverse effects, including skin sensitization (Saiakhov et al., 2013). Because a non-sensitizer may be converted into a sensitizer through biodegradation or metabolism (Jaworska et al., 2002), metabolic simulation is also relevant. CATABOL (http://oasis-lmc.org/products/models/environmental-fate-and-ecotoxicity/catabol-301c.aspx) is an online package that simulates the metabolic pathways of chemicals by predicting abiotic transformations, spontaneous reactions, enzyme-catalyzed metabolic reactions (reductive, hydrolytic, oxidative, redox, and conjugative), and reactions with skin protein nucleophiles (Jaworska et al., 2002).
The tissue metabolism simulator (TIMES) model, based on CATABOL predictions, consists of two steps: 1) a microbial metabolism simulator generates metabolic maps for the training chemicals; and 2) skin sensitization potential is evaluated from these metabolic maps and structural alerts (Dimitrov et al., 2005). The TIMES model for skin sensitization (TIMES-SS) is commercially available, and information about skin metabolism associated with skin sensitization is available online (http://oasis-lmc.org/products/models/metabolism-simulators/skin-metabolism.aspx). The training data were taken from LLNA, GPMT, and human datasets (Dimitrov et al., 2005; Mekenyan et al., 2012). In 2020, Ivanova et al. extended TIMES-SS with a kinetic component that accounts for the kinetics of biotic transformations when predicting skin sensitization potential. The initial predictions were consistent with the experimental data for the compounds tested (Ivanova et al., 2020).
The skin sensitization activity of a chemical also depends on whether it can be transformed from a prohapten into a hapten, because some sensitizers are not electrophilic per se. They can, however, undergo enzymatic or oxidative processes that make them electrophilic, which in turn allows them to react with skin proteins and form antigens (Aptula et al., 2005). Unlike the other packages, Computer Aided Discovery and Redesign-Skin Sensitization (CADRE-SS) focuses on such transformations and comprises three modules that analyze each step: I) skin permeability; II) haptenation and hapten-activation mechanisms; and III) conjugation with protein. Module II analyzes the potential for interaction between chemicals and skin protein using SMILES ARbitrary Target Specification (SMARTS) patterns, and chemicals identified as potential haptens are passed to module III for further analysis. The key event in this process is adduct formation between the chemical and the Keap1 protein, which contains highly reactive cysteine and lysine residues (Kostal and Voutchkova-Kostal, 2016).
SMARTS patterns have also been used in ToxTree (Enoch et al., 2008) to identify skin sensitization potential. A series of SMARTS patterns has been defined for previously identified mechanisms of action, namely aromatic nucleophilic substitution (SNAr), Schiff base formation (SB), Michael-type addition (MA), aliphatic nucleophilic substitution (SN2), and acylation (Ac) (Aptula et al., 2005), through which a covalent bond can form between skin protein and a sensitizer (Enoch et al., 2008; Enoch et al., 2011). In total, 104 structural alerts were released in 2011, and the most up-to-date version is commercially available through the Daylight Toolkit at https://www.daylight.com/products/toolkit.html.
Deductive Estimation of Risk from Existing Knowledge (DEREK) is a commercial, expert knowledge-based predictive package in which structural alerts are used to predict the potential of electrophilic chemicals to bind skin protein. Initially, only 40 structure-activity rules for skin sensitization were issued, in 1994 (Barratt et al., 1994), and this number had increased to 70 by 2006 (Langton et al., 2006), derived from GPMT data. A modified version of Derek Nexus was released in 2017 using LLNA EC3 values for over 650 compounds in the Lhasa EC3 dataset (https://www.lhasalimited.org/products/skin-sensitization-assessment-using-derek-nexus.htm) instead of the GPMT data used in previous versions. This version provides a qualitative prediction of mammalian skin sensitization and a quantitative EC3 prediction for skin sensitizers (Canipa et al., 2017). The number of skin sensitization structural alerts in Derek Nexus increased from 73 to 90 between 2014 and 2018, and their performance was validated against a dataset of over 2,500 chemicals with LLNA and/or GPMT data (Chilton et al., 2018).
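The EC3 value that Derek Nexus predicts is, experimentally, the estimated LLNA test concentration giving a stimulation index (SI) of 3. As a minimal sketch, assuming the widely used linear interpolation between the two doses that bracket SI = 3 and the commonly cited GHS (EC3 ≤ 2 % for sub-category 1A) and ECETOC potency cut-offs, an EC3 value can be derived from dose-response data and mapped to potency classes. The function names and data are hypothetical; this is not Derek Nexus code.

```python
from typing import Optional, Sequence


def ec3_by_interpolation(concs: Sequence[float], sis: Sequence[float]) -> Optional[float]:
    """Estimate EC3 (% concentration giving SI = 3) by linear interpolation.

    Uses EC3 = c + (3 - d) / (b - d) * (a - c), where (c, d) is the
    concentration/SI point just below SI 3 and (a, b) the point just above it.
    Returns None when the SI = 3 crossing is not bracketed by the data.
    """
    points = sorted(zip(concs, sis))  # ascending concentration
    for (c, d), (a, b) in zip(points, points[1:]):
        if d < 3.0 <= b:
            return c + (3.0 - d) / (b - d) * (a - c)
    return None


def ghs_subcategory(ec3: float) -> str:
    """GHS sub-category from LLNA EC3 (%): 1A if EC3 <= 2 %, otherwise 1B."""
    return "1A" if ec3 <= 2.0 else "1B"


def ecetoc_potency(ec3: float) -> str:
    """ECETOC-style potency class (assumed cut-offs: <0.1, <1, <10, otherwise weak)."""
    if ec3 < 0.1:
        return "extreme"
    if ec3 < 1.0:
        return "strong"
    if ec3 < 10.0:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Hypothetical LLNA dose-response: 0.5 %, 1 %, 2.5 % with SI 1.5, 2.0 and 5.0.
    ec3 = ec3_by_interpolation([0.5, 1.0, 2.5], [1.5, 2.0, 5.0])
    print(ec3, ghs_subcategory(ec3), ecetoc_potency(ec3))
```

For the hypothetical data above the interpolated EC3 is 1.5 %, which falls into GHS sub-category 1A and the "moderate" ECETOC class.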

Models Based on Animal Tests
References | Model
Commercial package
Klopman (1992) | CASE
Jaworska et al. (2002) | CATABOL (http://oasis-lmc.org/products/models/environmental-fate-and-ecotoxicity/catabol-301c.aspx)
Dimitrov et al. (2005) | TIMES-SS (http://oasis-lmc.org/products/models/metabolism-simulators/skin-metabolism.aspx)
Kostal and Voutchkova-Kostal (2016) | CADRE-SS
Models based on animal tests
Enoch et al. (2008) | ToxTree
Barratt et al. (1994), Langton et al. (2006) | Derek Nexus
Wilm et al. (2018) | CAESAR

Computer Assisted Evaluation of Industrial Chemical Substances According to Regulations (CAESAR) was developed according to the QSAR validation principles issued by the OECD. This model was built by the EU (Cassano et al., 2010; Wilm et al., 2018) and is freely available (http://www.caesar-project.eu). CAESAR provides QSAR models for five endpoints, namely skin sensitization, carcinogenicity, mutagenicity, bioconcentration factor, and developmental toxicity. The CAESAR skin sensitization model was derived from 209 compounds taken from a previous study (Gerberick et al., 2005) to classify compounds as sensitizers or non-sensitizers (http://www.caesar-project.eu/index.php?page=results&section=endpoint&ne=2). Subsequently, the VEGA (Virtual models for property Evaluation of chemicals within a Global Architecture) model derived from CAESAR was developed to predict skin sensitization from LLNA data. This freely accessible binary classifier can be downloaded at http://www.vega-qsar.eu, and the latest version can be found at https://www.vegahub.eu/. Fitzpatrick et al. compared the performance of VEGA, TIMES-SS, and Derek Nexus for skin sensitization using 1,249 substances from the eChemPortal skin sensitization dataset (http://www.echemportal.org/echemportal/index.action) and 515 substances from the LLNA database of the NTP Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) (https://ntp.niehs.nih.gov/pubhealth/evalatm/test-method-evaluations/immunotoxicity/index.html).
The results showed that the accuracy of each expert model was about 65%; for substances within the applicability domain of TIMES-SS, however, its accuracy reached 79% and 82% on the two datasets (Fitzpatrick et al., 2018). This comparison is consistent with the observation of Teubner et al. that TIMES-SS performed better than other models such as VEGA and DEREK (Teubner et al., 2013). Another model, developed by the Istituto di Ricerche Farmacologiche Mario Negri (IRCCS) and the Joint Research Center (JRC), is available at https://www.vegahub.eu/. It is a decision tree model built on 8 descriptors, which are listed at https://www.vegahub.eu/vegahub-dwn/qmrf/QMRF_SKIN_JRC.pdf, and its endpoint is skin sensitization in mice (LLNA). In external validation, this model achieved accuracy, specificity, and sensitivity of 71%, 82%, and 65%, respectively. All 75 in silico models embedded in VEGA have been implemented in the OECD QSAR Toolbox.
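The accuracy, sensitivity, and specificity figures quoted above, including the higher accuracy reported for compounds inside a model's applicability domain, all come from confusion-matrix arithmetic. A minimal sketch with hypothetical labels (1 = sensitizer, 0 = non-sensitizer) and a hypothetical per-compound in-domain flag:

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for labels 1 = sensitizer, 0 = non-sensitizer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true negative rate
    }


def metrics_in_domain(y_true: Sequence[int], y_pred: Sequence[int],
                      in_domain: Sequence[bool]) -> Dict[str, float]:
    """Same metrics, restricted to compounds inside the model's applicability domain."""
    kept = [(t, p) for t, p, d in zip(y_true, y_pred, in_domain) if d]
    return binary_metrics([t for t, _ in kept], [p for _, p in kept])
```

Reporting metrics both overall and within the domain, as Fitzpatrick et al. did for TIMES-SS, makes clear how much of a model's accuracy depends on staying inside that domain.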
The Correlation and Logic (CORAL) package has been used for QSAR analyses (Toropov et al., 2013) and was later used to develop a tool for predicting skin sensitization (Toropova and Toropov, 2017). Various QSAR models were built with the Monte Carlo technique from 204 compounds with local lymph node assay results, using hybrid descriptors calculated from both the SMILES representation and the molecular graph. The model is available at http://www.insilico.eu/coral. More recently, Wilm et al. developed the first ternary predictive model, Skin Doctor CP (available at https://nerdd.zbh.uni-hamburg.de/skinDoctorII/), based on the LLNA database. Its most distinguishing characteristic is that a first classifier categorizes compounds as sensitizers or non-sensitizers, and a second classifier further groups the predicted sensitizers into weak to moderate and strong to extreme sensitizers. The model showed accuracies of 0.90 and 0.73 and efficiencies of 0.42 and 0.90 at significance levels of 0.10 and 0.30, respectively. However, the ternary classification was less reliable: at the significance level of 0.30, the validity values were 0.70, 0.58, and 0.63 for non-sensitizers, weak to moderate sensitizers, and strong to extreme sensitizers, respectively (Wilm et al., 2020).
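The significance level, validity, and efficiency figures reported for Skin Doctor CP are conformal prediction quantities. The sketch below shows one binary stage of a Mondrian (class-conditional) inductive conformal predictor; this is an assumption about the general technique, not the Skin Doctor CP code, and the nonconformity scores (for example, 1 minus a classifier's probability for a class) are hypothetical. Validity is taken here as the share of prediction sets containing the true label and efficiency as the share of single-label sets, which are common definitions.

```python
from typing import Dict, Sequence, Set, Tuple


def mondrian_p_values(calibration: Dict[str, Sequence[float]],
                      test_nonconformity: Dict[str, float]) -> Dict[str, float]:
    """Class-conditional (Mondrian) inductive conformal p-values.

    calibration[c]: nonconformity scores of calibration compounds whose true class is c.
    test_nonconformity[c]: nonconformity of the test compound if it were labelled c.
    """
    p_values = {}
    for label, scores in calibration.items():
        alpha = test_nonconformity[label]
        at_least_as_strange = sum(1 for s in scores if s >= alpha)
        p_values[label] = (at_least_as_strange + 1) / (len(scores) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """All labels whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def validity_and_efficiency(sets: Sequence[Set[str]],
                            truth: Sequence[str]) -> Tuple[float, float]:
    """Validity: share of sets containing the true label; efficiency: share of singletons."""
    n = len(sets)
    validity = sum(1 for s, y in zip(sets, truth) if y in s) / n
    efficiency = sum(1 for s in sets if len(s) == 1) / n
    return validity, efficiency
```

A higher significance level removes more labels from each set, so efficiency rises while validity is allowed to fall, which is the trade-off visible in the Skin Doctor CP results at 0.10 and 0.30.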

Models Based on Non-animal Tests
Unlike the predictive models above, which rely on a single data type, some predictors make decisions based on multiple data types. For instance, Otsubo et al. built a binary classifier based on KeratinoSens™ and h-CLAT in which chemicals are designated as skin sensitizers if either assay gives a positive result and as non-sensitizers otherwise. This rule yielded sensitivities of 93.4% and 94.4% relative to LLNA and human data, respectively (Otsubo et al., 2017). Using data from three non-animal assays, namely DPRA, KeratinoSens™, and h-CLAT, a majority voting system was also built in which compounds were defined as sensitizers when at least two of the three assays gave positive results. This model achieved 90% accuracy relative to human data, whereas the accuracy of the LLNA alone relative to human data was only about 80% (Urbisch et al., 2015). These results suggest that models combining multiple data types can outperform their single-data-type counterparts.
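Both combination rules described above reduce to a few lines of logic on already-thresholded assay calls. A minimal sketch; the function names and the dictionary input format are hypothetical:

```python
from typing import Mapping


def any_positive(calls: Mapping[str, bool]) -> bool:
    """'Either positive' rule (e.g. KeratinoSens and h-CLAT): sensitizer if any assay is positive."""
    return any(calls.values())


def two_out_of_three(calls: Mapping[str, bool]) -> bool:
    """Majority vote over exactly three assays (e.g. DPRA, KeratinoSens, h-CLAT)."""
    if len(calls) != 3:
        raise ValueError("expected results from exactly three assays")
    return sum(calls.values()) >= 2  # True counts as 1


print(two_out_of_three({"DPRA": True, "KeratinoSens": False, "h-CLAT": True}))
```

The "any positive" rule favors sensitivity, since a single positive assay suffices, whereas the 2-out-of-3 vote tolerates one discordant assay in either direction.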
Asturiol et al. took a different approach, developing a qualitative skin sensitization predictive model using a decision tree (DT) (Asturiol et al., 2016). The model combined three non-animal test data types, namely DPRA, KeratinoSens™, and h-CLAT, and its accuracy was assessed against the LLNA classification (sensitizer/non-sensitizer). The model showed 93% accuracy, 98% sensitivity, and 85% specificity for 269 chemicals (Asturiol et al., 2016). Another approach built various DTs from non-animal test results, in which compounds were classified as sensitizers when DPRA gave positive results. Compounds that were negative in DPRA were further evaluated by h-CLAT and classified as skin sensitizers if h-CLAT was positive and as non-sensitizers otherwise. Additionally, various models were developed from binary combinations of the three non-animal tests, namely DPRA, h-CLAT, and KeratinoSens™, and the combination of DPRA and h-CLAT performed best in distinguishing sensitizers from non-sensitizers. More importantly, all models based on combinations of non-animal tests generally performed better than their single-test counterparts, consistent with the previous observations (vide supra) (Urbisch et al., 2016; Otsubo et al., 2017). This suggests that predictive models based on a single non-animal test are not sufficient to capture the complicated skin sensitization process comprehensively (Adler et al., 2011; Hartung et al., 2011; Wilm et al., 2018; Madden et al., 2020).
Frontiers in Pharmacology | www.frontiersin.org May 2021 | Volume 12 | Article 655771
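Written as code, the tiered DPRA-then-h-CLAT tree is a short-circuit rule: a positive DPRA call ends the evaluation, and h-CLAT is consulted only for DPRA negatives. In this sketch (hypothetical names; calls are taken as already-thresholded booleans, and `run_hclat` is a hypothetical callback standing in for performing or looking up the h-CLAT result), the final decision is logically the same as "DPRA positive or h-CLAT positive"; the tiering only avoids running h-CLAT when DPRA is already positive.

```python
from typing import Callable, List, Tuple


def tiered_dpra_then_hclat(dpra_positive: bool,
                           run_hclat: Callable[[], bool]) -> Tuple[bool, List[str]]:
    """Two-step decision tree: DPRA first, h-CLAT only when DPRA is negative.

    Returns (is_sensitizer, assays_actually_run).
    """
    assays = ["DPRA"]
    if dpra_positive:
        return True, assays  # positive DPRA: classified as sensitizer, stop here
    assays.append("h-CLAT")
    return run_hclat(), assays  # otherwise h-CLAT decides


print(tiered_dpra_then_hclat(False, lambda: True))
```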

Models Based on Mixed Test Types
Most published packages or models are binary classification systems, i.e., sensitizer vs. non-sensitizer, based on one or more non-animal tests. Integrated approaches to testing and assessment (IATA) take a different approach by combining animal tests, non-animal tests, and in silico models to predict skin sensitization potential (OECD, 2016a). IATA range from flexible, non-formalized, judgment-based approaches (e.g., grouping and read-across) to more structured, rule-based approaches such as integrated testing strategies (ITS) (OECD, 2016a). An ITS can combine in chemico, in vitro, and in silico data (Jaworska et al., 2011), for example DPRA, KeratinoSens™, and h-CLAT (Jaworska et al., 2015; Urbisch et al., 2015); DPRA, SENS-IS, and/or h-CLAT (Clouet et al., 2017); or two of the three tests DPRA, KeratinoSens™, and h-CLAT (Otsubo et al., 2017). Models based on this strategy have shown better performance than single-test models. For instance, Otsubo et al. built models from combinations of two or three of the non-animal datasets DPRA, KeratinoSens™, and h-CLAT; the model using all three datasets displayed the highest sensitivity but the lowest specificity compared with the two-dataset combinations (Otsubo et al., 2017). In 2017, Douglas Connect Integrated Testing Strategy (DC ITS) SkinSens was launched to provide access to the integrated testing strategy developed by Jaworska. The latest version of DC ITS SkinSens is SaferSkin™ (https://saferworldbydesign.com/saferskin/). SkinSensPred, a skin sensitization prediction tool developed in 2019 on the basis of SkinSensDB, is freely accessible at https://cwtung.kmu.edu.tw/skinsensdb/predict (Tung et al., 2018; Tung et al., 2019).
This multitask learning model uses data on three AOP key events, namely protein binding (DPRA), keratinocyte activation, and dendritic cell activation, together with human skin sensitization test results, to classify chemicals as sensitizers or non-sensitizers in humans. The model analyzes its applicability domain (AD) and structural alerts (SA) to predict the human sensitization potential of a chemical. When applied to novel chemicals within the defined AD, it reached an accuracy of 84.3% (Tung et al., 2019). In addition, a majority voting model (2 out of 3) (Urbisch et al., 2015) and a DT model can be implemented as read-across prediction methods.
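One common way to define an applicability domain is by structural similarity to the training set: a query compound is "in domain" if its nearest training neighbor is sufficiently similar. The sketch below assumes fingerprints represented as sets of on-bit indices, Tanimoto similarity, and a hypothetical cut-off of 0.35; it illustrates the general idea and is not the SkinSensPred AD definition.

```python
from typing import FrozenSet, Iterable, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits, e.g. from a substructure fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_applicability_domain(query: Fingerprint, training: Iterable[Fingerprint],
                            threshold: float = 0.35) -> Tuple[bool, float]:
    """In domain if the nearest training neighbour is at least `threshold` similar.

    The 0.35 default is a hypothetical cut-off, not a published value.
    """
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best >= threshold, best


training_set = [frozenset({1, 2, 3, 4}), frozenset({7, 8})]
print(in_applicability_domain(frozenset({1, 2, 3}), training_set))
```

Predictions for compounds that fall outside the domain are still produced by most models, but, as the accuracy figures reported for SkinSensPred and TIMES-SS suggest, they deserve less confidence.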
In 2018, Del Bufalo et al. developed an alternative integrated testing strategy for skin sensitization combining three in vitro methods (DPRA, KeratinoSens™, and U-SENS™), two in silico tools (TIMES-SS and ToxTree), and physicochemical parameters (volatility and pH). These data were fed into five classification models (boosting, naive Bayes, support vector machine (SVM), sparse partial least squares discriminant analysis, and expert scoring), and the validation results of these models were combined in a stacking meta-model to evaluate skin sensitization potential (Del Bufalo et al., 2018). The predictions achieved accuracies of 93% and 91% for the training and test sets, respectively, relative to LLNA hazard data. In 2020, the model was validated on a larger dataset, with version 5 of the defined approach used to evaluate the skin sensitization potential of 219 compounds (Tourneix et al., 2020).
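In stacking, a meta-model learns how to weight the outputs of several base classifiers. A minimal sketch, assuming a logistic-regression meta-learner trained by batch gradient descent on base-model scores; this is one standard choice of meta-learner and not necessarily the one used by Del Bufalo et al. The base-model scores are hypothetical and, in proper stacking, should be out-of-fold predictions.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(base_outputs: Sequence[Sequence[float]], labels: Sequence[int],
                     lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-model by batch gradient descent.

    base_outputs[i][k]: score of base model k for compound i. These should be
    out-of-fold predictions, so the meta-model never sees scores that a base
    model produced on its own training compounds.
    """
    n, k = len(base_outputs), len(base_outputs[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(base_outputs, labels):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Meta-model probability that the compound is a sensitizer."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

The learned weights indicate how much the meta-model trusts each base classifier, which is the main advantage of stacking over a fixed majority vote.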
Pred-skin, accessible at http://predskin.labmol.com.br/, is a consensus naïve Bayes model that combines multiple QSAR models built on human, LLNA, and non-animal data to predict skin sensitization. This model performed well in predicting human skin sensitization, with a sensitivity of 94% and a specificity of 84%. When applied to 11 new potential sensitizers not included in the dataset, Pred-skin identified nine of them as sensitizers (Borba et al., 2020a; Braga et al., 2017). Zang et al. published an in silico model that uses non-animal data (DPRA, h-CLAT, and KeratinoSens™) and six physicochemical properties (octanol/water partition coefficient, water solubility, vapor pressure, melting point, boiling point, and molecular weight) as descriptors to predict LLNA and human outcomes. The model classifies compounds as sensitizers or non-sensitizers and further divides sensitizers into the GHS subcategories 1A (strong) and 1B (weak). It achieved accuracies of 88% for LLNA outcomes and 81% for human test outcomes.
Ohtake et al. published a predictive model based on highly heterogeneous data, namely in silico Derek Nexus, in chemico DPRA, and in vitro h-CLAT results. The DPRA and h-CLAT results were each scored from 0 to 3, the Derek Nexus outcome was scored 0 or 1, and the total score (maximum 7) was obtained by summing these scores. A compound is defined as a strong sensitizer when its total score is 6 or 7, and as a weak sensitizer when its total score is between 2 and 5 (Ohtake et al., 2018). A distinctive feature of this model is that skin sensitizers are further divided into strong and weak ones, although this classification does not map directly onto the animal or human test classifications. However, only nine isocyanates were included, and the results indicated that the model underestimated skin sensitization potential compared with LLNA data.
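The score-summing scheme can be written out directly. This sketch takes the per-assay scores (DPRA 0-3, h-CLAT 0-3, Derek Nexus 0 or 1) as given inputs, and assumes the cut-offs used for this integrated testing strategy in OECD Guideline No. 497 (6-7 strong/1A, 2-5 weak/1B, 0-1 not classified); note that the maximum possible total is 3 + 3 + 1 = 7. How each assay result is converted into its score is not shown, and the function names are hypothetical.

```python
def its_total_score(dpra_score: int, hclat_score: int, derek_score: int) -> int:
    """Sum of per-assay scores: DPRA 0-3, h-CLAT 0-3, Derek Nexus alert 0 or 1."""
    if not (0 <= dpra_score <= 3 and 0 <= hclat_score <= 3):
        raise ValueError("DPRA and h-CLAT scores must lie between 0 and 3")
    if derek_score not in (0, 1):
        raise ValueError("Derek Nexus score must be 0 or 1")
    return dpra_score + hclat_score + derek_score  # at most 7


def its_category(total: int, strong_min: int = 6, weak_min: int = 2) -> str:
    """Map the total score to a potency category (assumed cut-offs, see lead-in)."""
    if total >= strong_min:
        return "strong (1A)"
    if total >= weak_min:
        return "weak (1B)"
    return "not classified"


print(its_category(its_total_score(2, 3, 1)))
```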
In 2020, Silva et al. built models predicting skin sensitization potential from different combinations of in vitro (human information), in chemico (DPRA), and/or in silico (the formation descriptor calculated by TIMES-SS) data. The combination of in vitro, in chemico, and in silico data gave the best predictions, reaching 100% accuracy in differentiating sensitizers from non-sensitizers. The same approach also achieved accuracies of 98.8% and 97.5% when applied to compounds classified by GHS (3-level scale) and by human data (6-level scale), respectively (Silva et al., 2020).

Machine Learning-Based Models
Currently, a variety of machine learning approaches, including DT, artificial neural networks (ANN), support vector machines (SVM), AdaBoost, the iterative least squares linear discriminant (TILSQ), logistic regression (LR), the K-step yard sampling (KY) method (U.S. Patent No. 7725413) (Kohtarou, 2010), consensus methods, and Bayesian networks, have been adopted to build skin sensitization predictive models. Several of these machine learning schemes are described and illustrated in a review by Tarca et al., and the performance of several defined approaches for skin sensitization prediction was evaluated by Kleinstreuer et al. (Tarca et al., 2007; Kleinstreuer et al., 2018). The SH test, which measures changes in cell-surface thiols on hapten-treated cells, was combined with h-CLAT data to develop the first version of the ANN-based iSENS for predicting skin sensitization (Suzuki et al., 2009; Hirota et al., 2013). A second version was released later, combining ARE assay data with the n-octanol/water partition coefficient (log P) (Natsch and Emter, 2008; Tsujita-Inoue et al., 2014). Further extended versions of the ANN model were based on various combinations of h-CLAT, DPRA, KeratinoSens™, and the SH test (Hirota et al., 2015). The performance of an ANN model was found to depend on the combination of data types. For instance, the ANN model combining h-CLAT and DPRA correlated better with the LLNA than other combinations such as DPRA and the ARE assay or the SH test and the ARE assay. Models based on three data types, such as h-CLAT, DPRA, and the ARE assay or h-CLAT, the SH test, and the ARE assay, produced higher correlation coefficients (r values) and smaller prediction errors than their two-data-type counterparts (Hirota et al., 2015).
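The r values and prediction errors used to compare ANN models are computed from predicted versus observed potency values (for example log-transformed EC3; the choice of target is an assumption here). A small sketch of Pearson's r and the root-mean-square error:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square prediction error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))
```

The two measures are complementary: r captures whether a model ranks compounds in the right order, whereas RMSE penalizes systematic offsets that leave r unchanged.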

More recently, Macmillan and Chilton combined Derek Nexus with the non-animal KeratinoSens™, h-CLAT, DPRA, and U-SENS™ tests to develop a DT model. The resulting DT model achieved accuracies of 73% and 76% for GHS potency classification relative to LLNA and human data, respectively (Macmillan and Chilton, 2019). Various machine learning schemes, namely ANN, SVM, AdaBoost, and TILSQ, were employed to build skin sensitization predictive models based on linear and non-linear discriminant analyses of 291 samples. SVM and AdaBoost models based on 32 descriptors encoding 2-D and 3-D structural characteristics performed best, classifying both positive and negative compounds with 100% accuracy (Sato et al., 2009). This investigation was later extended to more samples (593 compounds) using the novel KY scheme (Sato et al., 2012). Unlike binary classification models, this approach allots compounds to negative, positive, and gray zones through multiple steps. Compounds in the gray zone, where positives and negatives overlap, were repeatedly reclassified into the positive and negative zones until none remained, and all 593 compounds were classified correctly in three steps (Sato et al., 2012). Strickland et al. adopted LR and SVM to develop predictive models based on non-animal tests (DPRA, h-CLAT, and KeratinoSens™) together with six physicochemical properties, namely log P, water solubility, vapor pressure, melting point, boiling point, and molecular weight. Among the physicochemical properties, log P was the most important factor in determining skin sensitization, a finding later supported by Gleeson and Gleeson (2020). Of the various combinations of non-animal test data, models that included DPRA and h-CLAT produced the highest accuracy.
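The gray-zone idea behind the KY scheme can be sketched as a loop: fit a linear discriminant, assign compounds scoring above every negative (pure positive zone) or below every positive (pure negative zone), and refit on what remains. In this sketch the discriminant is a simple stand-in (the direction joining the class centroids), not the discriminant used by Sato et al., and all names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (descriptor vector, label 1 = sensitizer)


@dataclass
class Step:
    w: Optional[List[float]]        # discriminant direction; None for a pure final step
    lower: float = 0.0              # scores below this occur only for negatives
    upper: float = 0.0              # scores above this occur only for positives
    pure_label: Optional[int] = None


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def _centroid(xs: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(xs) for col in zip(*xs)]


def ky_style_fit(samples: Sequence[Sample], max_steps: int = 10):
    """Iteratively peel off pure-positive and pure-negative zones.

    Returns (steps, unresolved); unresolved samples remain in the gray zone.
    """
    remaining = list(samples)
    steps: List[Step] = []
    while remaining and len(steps) < max_steps:
        pos = [x for x, y in remaining if y == 1]
        neg = [x for x, y in remaining if y == 0]
        if not pos or not neg:  # only one class left: the rest is a pure zone
            steps.append(Step(w=None, pure_label=1 if pos else 0))
            return steps, []
        # Stand-in discriminant: direction joining the class centroids.
        w = [p - q for p, q in zip(_centroid(pos), _centroid(neg))]
        upper = max(_dot(w, x) for x in neg)  # anything above is certainly positive
        lower = min(_dot(w, x) for x in pos)  # anything below is certainly negative
        gray = [(x, y) for x, y in remaining if lower <= _dot(w, x) <= upper]
        if len(gray) == len(remaining):
            break  # no compound could be assigned at this step
        steps.append(Step(w=w, lower=lower, upper=upper))
        remaining = gray
    return steps, remaining


def ky_style_predict(steps: Sequence[Step], x: Sequence[float]) -> Optional[int]:
    """Walk the steps in order; None means the compound stays in the gray zone."""
    for step in steps:
        if step.w is None:
            return step.pure_label
        s = _dot(step.w, x)
        if s > step.upper:
            return 1
        if s < step.lower:
            return 0
    return None
```

By construction, every training compound assigned at a step receives its true label, which is why such schemes can classify a training set completely in a few steps; performance on new compounds still has to be checked separately.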
Pre- and pro-hapten sensitizers, which must undergo chemical transformation through air exposure (Karlberg et al., 1992; Sköld et al., 2002) or metabolism (Nilsson et al., 2005; van Eijl et al., 2012) before they can sensitize, reduce the accuracy of current in vitro assays. Accordingly, a novel tri-culture assay system has been proposed that combines MUTZ-3-derived Langerhans cells, HaCaT keratinocytes, and primary dermal fibroblasts and measures cytokine secretion after the cells are exposed to test compounds (sensitizers or non-sensitizers). Numerous SVM models were developed from the stimulation indices (SI) of 27 human cytokines, namely IL-1β, IL-1ra, IL-2, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-12, IL-13, IL-15, IL-17, eotaxin, basic FGF, G-CSF, GM-CSF, IFN-γ, IP-10, MCAF, MIP-1α, MIP-1β, PDGF-BB, RANTES, TNF-α, and VEGF, to identify the cytokines most strongly associated with skin sensitization. The SVM model based on the three top-ranking biomarkers in the tri-culture assay, namely IL-8, MIP-1β, and GM-CSF, performed best, with a prediction accuracy of 91%, and it also improved the detection of pre- and pro-haptens (Lee et al., 2018).
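A stimulation index is the cytokine level after exposure to the test compound divided by the level after exposure to the vehicle control, and biomarker selection then ranks cytokines by how well their SIs separate sensitizers from non-sensitizers. Lee et al. ranked cytokines with SVM models; the sketch below instead uses a Welch-type t statistic as a simple, clearly substituted ranking criterion, and all SI values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence


def stimulation_index(treated: float, vehicle_control: float) -> float:
    """SI = cytokine level after test-compound exposure / level after vehicle control."""
    return treated / vehicle_control


def rank_cytokines(si_sensitizers: Dict[str, Sequence[float]],
                   si_non_sensitizers: Dict[str, Sequence[float]],
                   top_k: int = 3) -> List[str]:
    """Rank cytokines by a Welch-type t statistic between the two groups of SIs.

    Each group needs at least two SI values per cytokine (for the standard deviation).
    """
    scores = {}
    for name, a in si_sensitizers.items():
        b = si_non_sensitizers[name]
        diff = abs(mean(a) - mean(b))
        se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
        scores[name] = diff / se if se > 0 else (math.inf if diff > 0 else 0.0)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A univariate ranking like this ignores correlations between cytokines, which is one reason model-based ranking such as the SVM approach of Lee et al. can select a different panel.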
Matsumura et al. performed a study using a QSAR-like deep neural network (DNN) and a light gradient boosting machine (LightGBM) to evaluate skin sensitization potential. Physical and structural properties of chemicals were used as input variables, with GHS-based sensitizer/non-sensitizer classifications as the outcome. The dual-input LightGBM model (74%) and dual-input DNN model (72%) were moderately accurate compared with traditional approaches (Matsumura, 2020).
In addition, other algorithms have been adopted to improve classification performance. For instance, Abdallh et al. used a binary crow search algorithm (BCSA), based on the crow search algorithm proposed by Askarzadeh in 2016 (Askarzadeh, 2016), to select the most relevant descriptors for model development, and the selected descriptors were then used to classify compounds as sensitizers or non-sensitizers.
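For descriptor selection, each crow's position is a binary mask over the descriptors. In the crow search algorithm, a crow either follows another crow toward its remembered best position or, if that crow is "aware" (probability `awareness_prob`), moves to a random position. The sketch below binarizes the continuous move with a sigmoid transfer function, which is one common choice and not necessarily the one used by Abdallh et al. The fitness function is hypothetical; in practice it would typically be the cross-validated accuracy of a classifier trained on the selected descriptors.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[bool]


def binary_crow_search(fitness: Callable[[Sequence[bool]], float], n_features: int,
                       n_crows: int = 10, n_iter: int = 50,
                       awareness_prob: float = 0.1, flight_length: float = 2.0,
                       seed: int = 0) -> Tuple[Mask, float]:
    """Maximize `fitness` over descriptor subsets (True = descriptor selected)."""
    rng = random.Random(seed)
    positions = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_crows)]
    memory = [p[:] for p in positions]  # best subset each crow has found so far
    memory_fit = [fitness(m) for m in memory]
    for _ in range(n_iter):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i follows a randomly chosen crow j
            if rng.random() >= awareness_prob:
                # Continuous CSA move towards j's memory, then sigmoid transfer to bits.
                new = []
                for d in range(n_features):
                    x_id = int(positions[i][d])
                    step = x_id + rng.random() * flight_length * (int(memory[j][d]) - x_id)
                    new.append(rng.random() < 1.0 / (1.0 + math.exp(-step)))
            else:
                # Crow j noticed it was followed: crow i moves to a random subset.
                new = [rng.random() < 0.5 for _ in range(n_features)]
            positions[i] = new
            f = fitness(new)
            if f > memory_fit[i]:  # update crow i's memory only if it improved
                memory[i], memory_fit[i] = new[:], f
    best = max(range(n_crows), key=lambda k: memory_fit[k])
    return memory[best], memory_fit[best]
```

Because each crow keeps its best subset in memory, the returned fitness can never be worse than the best randomly initialized subset.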

FUTURE PERSPECTIVES
Dermatology research has shifted into a new paradigm since the introduction of artificial intelligence. Most applications involve image analysis, but they also include analyses of the physicochemical properties of substances (Gomolin et al., 2020). Applications to toxicity and environmental hazard endpoints, for instance, illustrate the great diversity of QSAR models (Chinen and Malloy, 2020). Data quality plays a critical role in model development: it is almost impossible to build a sound in silico model from contaminated or impure data, especially a quantitative one (Cherkasov et al., 2014). Because the accuracy of an in silico model depends on data quality, the lack of guidance on how to report and publish toxicogenomic studies can hinder the use of such data in building computational models (FitzGerald, 2020). Broadly, there are two strategies for building an in silico model. The first is to collect large amounts of experimental data, extend data coverage, and apply big data approaches. The second, which can be more effective, relies on a deep understanding of the biological mechanism to predict the biochemical processes and bioactivity of novel compounds; a mechanism-based model does not need a large training set, but it does need highly accurate experimental data (Kostal and Voutchkova-Kostal, 2020). Accordingly, data curation is necessary before model development, for example removing assay data obtained from impure substances or mixtures, to maintain data integrity.
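A basic curation pass of the kind described above can be sketched as follows. The record format `(SMILES, label, flagged_as_impure)` is hypothetical, and the sketch assumes that SMILES have already been canonicalized with a cheminformatics toolkit (not shown), so that identical structures compare equal as strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, int, bool]  # (canonical SMILES, label 1 = sensitizer, flagged impure/mixture)


def curate(records: Iterable[Record]) -> Tuple[Dict[str, int], List[str]]:
    """Return (clean structure -> label, structures dropped for conflicting labels)."""
    labels_by_structure: Dict[str, Set[int]] = defaultdict(set)
    for smiles, label, flagged in records:
        if flagged or "." in smiles:
            # Skip entries flagged as impure and multi-component SMILES (mixtures).
            # A real pipeline would first strip counter-ions so that salts are kept.
            continue
        labels_by_structure[smiles].add(label)
    clean: Dict[str, int] = {}
    conflicting: List[str] = []
    for smiles, labels in labels_by_structure.items():
        if len(labels) == 1:
            clean[smiles] = labels.pop()  # duplicates with agreeing labels are merged
        else:
            conflicting.append(smiles)  # contradictory results: exclude
    return clean, conflicting
```

Dropping conflicting duplicates is the most conservative option; an alternative is to keep them and resolve the label by a majority of independent test results.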
In 2020, Golden et al. compared the accuracy of eight in silico models (PredSkin, Toxtree, QSAR Toolbox, Danish QSAR database, CAESAR, REACHAcross™, TIMES-SS, and Derek Nexus) against human datasets. Most models showed accuracies of 70-80% on the human datasets, suggesting that in silico models can be a convenient and inexpensive tool for identifying human skin sensitizers (Golden et al., 2021). An in silico model based on human data would undoubtedly be more realistic and is much needed. However, the scarcity of consistent human data in the public domain, with few data points and limited structural diversity, has created an insurmountable hurdle for building a sound predictive model (Cherkasov et al., 2014). The relatively ample LLNA data make the LLNA a better alternative, because LLNA results correlate well with human tests in most cases and the LLNA has commonly been recognized as the gold standard for predicting human skin sensitization. However, the LLNA is inevitably prone to errors for some chemotypes, suggesting that different predictive models should be developed for different chemotypes to accommodate variations in the skin sensitization mechanism. Problems remain unresolved when a test compound contains more than one chemotype or lies outside the applicability domain of the derived model. To improve the accuracy of skin sensitization tests, Leontaridou et al. identified a borderline range (BR) around the classification thresholds of DPRA, LuSens, h-CLAT, and LLNA. For substances whose results fall within the BR, another available test method is required to determine a positive or negative outcome (Leontaridou et al., 2017).
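A borderline range turns a single cut-off into three outcomes, with borderline results deferred to another test method. In the sketch below, the DPRA threshold of 6.38 % mean cysteine/lysine depletion is taken from OECD TG 442C, whereas the borderline bounds used in the example (5.0-8.0 %) are purely illustrative and are not the values derived by Leontaridou et al. Function names are hypothetical.

```python
from typing import Optional


def call_with_borderline(value: float, threshold: float,
                         lower: float, upper: float) -> str:
    """Positive/negative call, or 'borderline' when value lies within [lower, upper]."""
    if not lower <= threshold <= upper:
        raise ValueError("borderline range must contain the threshold")
    if lower <= value <= upper:
        return "borderline"
    return "positive" if value > threshold else "negative"


def resolve(primary_call: str, follow_up_positive: Optional[bool] = None) -> str:
    """Keep clear calls; for borderline ones defer to another test (None = not available)."""
    if primary_call != "borderline":
        return primary_call
    if follow_up_positive is None:
        return "inconclusive"
    return "positive" if follow_up_positive else "negative"


# Illustrative DPRA example: 7.0 % mean depletion lies inside the illustrative band.
print(resolve(call_with_borderline(7.0, 6.38, 5.0, 8.0), follow_up_positive=True))
```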
The use of animal tests for cosmetic products has been prohibited in Europe since 2013 (European Commission, 2013a), and some other countries have since accepted OECD non-animal methods for testing skin sensitization (Strickland et al., 2019). Nevertheless, animal tests, especially the GPMT and LLNA, are still accepted and required in numerous countries such as Canada, China, Brazil, Japan, and the United States (Daniel et al., 2018). In addition, animal tests are still needed and accepted in many industries, even in Europe, for pesticides, plant protection products, pharmaceuticals, household products, art materials, industrial chemicals, medical devices, and workplace chemicals (Daniel et al., 2018). These data are a valuable resource for building in silico models to assess skin sensitization. Moreover, skin sensitization QSAR models can be applied not only to cosmetic ingredients but also to compounds that can affect ecosystems, such as dye pollutants, personal care products affecting aquatic species, and plasticizers (Arulanandam et al., 2021; Funar-Timofei and Ilia, 2020; Khan et al., 2020), as well as to the pharmacokinetic profiles of low molecular weight oligohydroxyalkanoates (Roman et al., 2020), suggesting that skin sensitization models have a wide range of applications. To date, most published skin sensitization models make qualitative predictions, i.e., binary classification into sensitizers or non-sensitizers. However, in drug-induced liver injury (DILI) prediction, a quaternary predictive model was observed to perform better than its ternary counterpart, which in turn performed better than a binary one (Weng and Leong, 2020). Accordingly, it is plausible that a multiple-class qualitative model could predict skin sensitization better than a two-class one.
Predictive models based on a single type of assay data can take into account only one pathway, and no single non-animal test can comprehensively capture the whole complex skin sensitization process (Adler et al., 2011; Hartung et al., 2011). Additionally, some cosmetic and other commercial products currently contain sensitizers (Robinson et al., 2000) that do not trigger adverse reactions in the induction and elicitation phases when the applied doses are low. Moreover, the amount applied of cosmetic products in direct and persistent contact with the skin, e.g., cream or foundation, differs from that of products that are washed or rinsed off, e.g., shampoo or shower gel, and the difference in exposure dose between these groups can be up to 30-fold (Fewings and Menné, 1999; Frosch et al., 1995; Robinson et al., 2000). Therefore, qualitative in silico or non-animal models are of limited use for assessing weak or moderate sensitizers in the pharmaceutical and cosmeceutical markets, where a quantitative prediction model would be truly useful. To address this problem, the SpheraCosmolife package, recently developed and implemented in VEGAHUB, can process the various ingredients in a product. It predicts mutagenicity, genotoxicity, and skin sensitization based on the concentration and the product type (e.g., lotion, shampoo, or shower gel) recorded in its internal database (Regulation (EC) No. 1223/2009 of the European Parliament and of the Council of 30 November 2009 on cosmetic products). The software is available at https://www.vegahub.eu/download/sphera-cosmolife-download/ and is implemented in VEGA. However, challenges remain, since it cannot predict skin sensitization caused by metals (Biswas et al., 2020).
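The dependence on product type is usually expressed as a consumer exposure level per unit area of skin. A minimal sketch, assuming a simple formula (daily product amount × ingredient concentration × retention factor ÷ exposed skin area) and assumed retention factors of 1.0 for leave-on and 0.01 for rinse-off products, as commonly applied in SCCS guidance. All product amounts and skin areas below are hypothetical.

```python
# Assumed retention factors (fraction of applied product remaining on the skin).
RETENTION_FACTORS = {"leave-on": 1.0, "rinse-off": 0.01}


def consumer_exposure_level(amount_g_per_day: float, concentration_fraction: float,
                            retention_factor: float, skin_area_cm2: float) -> float:
    """Dermal exposure per unit area in ug/cm2/day.

    amount_g_per_day: product applied per day (g); concentration_fraction: ingredient
    weight fraction (0.01 = 1 %); skin_area_cm2: exposed skin surface area.
    """
    ug_per_day = amount_g_per_day * concentration_fraction * retention_factor * 1e6  # g -> ug
    return ug_per_day / skin_area_cm2


# Hypothetical products, each containing the same ingredient at 1 %.
cream = consumer_exposure_level(1.0, 0.01, RETENTION_FACTORS["leave-on"], 500.0)
shower_gel = consumer_exposure_level(10.0, 0.01, RETENTION_FACTORS["rinse-off"], 1000.0)
print(cream, shower_gel)
```

Even with these made-up inputs the leave-on cream delivers far more ingredient per square centimeter than the rinse-off gel, which is why a binary hazard call alone cannot say whether a given use is acceptable.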
Based on the principles published by the International Cooperation on Cosmetic Regulation (ICCR), Gilmour et al. presented a next generation risk assessment (NGRA) framework for skin sensitizers in 2020, which is illustrated in Figure 1 of their publication (Gilmour et al., 2020). Built on the four elements of risk assessment, namely consumer exposure, hazard identification, hazard characterization, and establishment of a dose response, their workflow comprises three tiers and integrates all relevant information using a weight-of-evidence approach to predict whether a chemical is a skin sensitizer or a non-sensitizer (Gilmour et al., 2020).
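The idea of integrating evidence from assays that address different AOP key events can be sketched with a simple majority rule over three binary assay calls, similar in spirit to the "2 out of 3" defined approach described in OECD Guideline No. 497. This sketch is not Gilmour et al.'s tiered NGRA workflow: it ignores exposure, potency, and borderline or inconclusive results, and the function name and assay calls are hypothetical.

```python
from typing import Optional


def two_out_of_three(first: bool, second: bool,
                     third: Optional[bool] = None) -> Optional[bool]:
    """Integrate binary calls from assays addressing different AOP key events.

    True = positive (sensitizer), False = negative. Returns None when the
    first two results disagree and the third assay has not been run yet.
    """
    if first == second:
        return first   # two concordant results are sufficient
    if third is None:
        return None    # discordant pair: the third assay is needed
    return third       # the third result forms the 2-of-3 majority


# Hypothetical calls for one chemical from DPRA, KeratinoSens and h-CLAT.
calls = {"DPRA": True, "KeratinoSens": False, "h-CLAT": True}
print(two_out_of_three(calls["DPRA"], calls["KeratinoSens"], calls["h-CLAT"]))  # True
```

Running the assays sequentially in this way means the third test is needed only when the first two disagree.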
The number of in silico models for predicting skin sensitization has increased over the past few years. These models have been built on in vivo, in vitro, in chemico, and/or in silico data. Johnson et al. defined rules and principles for developing effective in silico skin sensitization models, based on the skin sensitization mechanism and the strengths and limitations of each experimental method, in order to facilitate the implementation and acceptance of in silico approaches. The standardization of this hazard assessment framework has further strengthened the use and application of in silico tools in agencies and industries (Johnson et al., 2020).
To provide a single web portal for predicting the outcomes of six acute toxicity tests, namely acute oral toxicity, acute dermal toxicity, acute inhalation toxicity, skin irritation and corrosion, eye irritation and corrosion, and skin sensitization, Borba et al. developed a package called Systemic and Topical chemical Toxicity (STopTox), which is available at https://stoptox.mml.unc.edu/ (Borba et al., 2020b).

CONCLUSION
In vitro skin sensitization tests alone cannot replace human and animal tests because each focuses on only a single pathway in the AOP. In silico approaches, by contrast, have an advantage over in vitro tests because they can take into account more than one AOP key event by combining various in chemico, in vitro, and in vivo data simultaneously. To date, most published in silico models only classify chemicals as sensitizers or non-sensitizers. This binary classification system has severely limited the use of weak or moderate sensitizers in commercial products. Multiclass in silico models could be highly useful in practical applications, as suggested by the DILI study (vide supra), and quantitative models would be even more so. Several quantitative in silico models have been developed recently, yet they inevitably suffer from major limitations, indicating that they still need to become more accurate and more widely applicable for evaluating skin sensitization potential. The development of robust and accurate in silico models for skin sensitization prediction remains a long and winding road for the molecular modeling community.

FUNDING
This work was supported by the Ministry of Science and Technology, Taiwan.