Artificial Intelligence in Cutaneous Oncology

Skin cancer, long regarded as a disease of Western countries, is becoming more common in Asian countries. Skin cancer differs from other carcinomas in that it is visible to the naked eye. Although skin biopsy is essential for the diagnosis of skin cancer, the decision on whether to conduct a biopsy is made by an experienced dermatologist. From this perspective, photographs are easy to obtain and store using a smartphone, and artificial intelligence technologies developed to analyze these photographs can serve as a useful tool to complement the dermatologist's expertise. In addition, the universal use of dermoscopy, which allows non-invasive inspection of skin lesions down to the upper dermis at a typical 10-fold magnification, adds to the available image storage and analysis techniques, foreshadowing breakthroughs in skin cancer diagnosis. Current problems include the inaccuracy of the available technology and the resulting legal liabilities. This paper presents a comprehensive review of the clinical applications of artificial intelligence and a discussion of how it can be implemented in the field of cutaneous oncology.


INTRODUCTION
The increasing incidence of skin cancer is a global trend. Skin cancer, which was previously known to be a common disease in Western countries, is occurring more frequently in South Korea. According to the Korean Statistical Information Service, the number of patients with non-melanoma skin cancer in 2015 was 4,804 (9.4 per 100,000), an increase over 1,960 in 2005 and 3,270 in 2010. The increase in incidence rate is thought to be due to the aging population, the increased popularity of outdoor activities, increased ultraviolet exposure, improved access to medical services, and increased awareness of skin cancer among patients (1).
Skin biopsy and histopathologic evaluation are essential in confirming skin cancer. However, it is impossible to confirm all pigmented lesions by biopsy because of pain and scarring. Therefore, whether a biopsy is required must first be established through visual inspection by an experienced dermatologist. Furthermore, dermatologists need a device that can detect changes in skin lesions over time and record the lesions in detail so that wrong-site surgery does not occur (2,3).
With the development of imaging technologies, methods and devices for recording and analyzing what doctors see have progressed rapidly. Dermoscopic imaging, now in universal use, irradiates light onto the upper dermal layer to observe and record pigment changes in greater detail. In recent years, the development of high-resolution non-invasive diagnostic devices (e.g., confocal microscopy, multiphoton microscopy) that can examine skin lesions at the cellular level without biopsy has also accelerated (4)(5)(6). In addition, diagnoses of such skin images using artificial intelligence (AI) have been shown to outperform the average diagnostic performance of doctors. These developments are expected to have a significant impact on the diagnosis of skin cancer, the accurate recording of changes in suspicious lesions, and the effectiveness of follow-up skin cancer surgery. For user convenience, applications suitable for general smartphones have become available; however, these are not sufficiently supported by scientific evidence.
In this review, we introduce the basic concepts and clinical applications of AI via a literature review and discuss how these can be implemented in the field of dermatological oncology.

BASIC CONCEPTS OF ARTIFICIAL INTELLIGENCE
AI is a field of computer science that solves problems by imitating human intelligence; these problems typically require the recognition of patterns in various data. Conventional machine learning refers to machine learning methods that do not involve deep learning; these methods extract features such as colors, textures, and edges. In conventional machine learning, precise engineering knowledge and extensive experience are required to design feature extractors capable of extracting suitable features. Using these features, conventional machine learning can derive various results and identify correlations.
Deep learning uses deep neural networks to learn features, which are obtained by designing simple but non-linear modules for each layer. Using deep neural networks, very complex functions can be learned. For example, in the field of computer vision, a deep neural network's first layer typically learns the presence of edges at particular orientations and locations within the image. Larger combinations of such edges are identified in the next layer. As the layers become deeper, they learn larger and more specific features (7). Figure 1 shows the relationship between AI, machine learning, and deep learning: deep learning falls within the category of machine learning, which falls within the category of AI. In this figure, the examples for conventional machine learning and deep learning are classifications of acral lentiginous melanoma (ALM) and benign nevus (BN) in dermoscopy images. Conventional machine learning extracts specific features from dermoscopy images; for example, the gray-level co-occurrence matrix (GLCM) is used to extract texture features (8). The conventional machine learning method then trains classifiers on the extracted features to classify ALM and BN. In contrast, deep learning methods learn by extracting various features through deep neural networks. The main difference between conventional machine learning and deep learning is that deep learning extracts various features per layer, without human intervention (9).
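As an illustration of hand-crafted feature extraction, the GLCM mentioned above can be computed directly in a few lines of NumPy. This is a minimal sketch for a single pixel offset on a tiny quantized patch, not the pipeline used in any of the reviewed publications:

```python
import numpy as np

def glcm(patch, levels=4, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one (dx, dy) offset,
    normalized to a joint probability distribution."""
    h, w = patch.shape
    M = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            M[patch[y, x], patch[y + dy, x + dx]] += 1
    return M / M.sum()

def contrast(P):
    # Haralick contrast: sum over (i - j)^2 * P(i, j)
    i, j = np.indices(P.shape)
    return ((i - j) ** 2 * P).sum()

# toy 4-level quantized grayscale patch
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
P = glcm(patch)
print(round(contrast(P), 3))  # → 0.583
```

A conventional pipeline would compute such statistics (contrast, homogeneity, energy, etc.) over several offsets and feed the resulting vector to a classifier such as an SVM.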
We divided the cutaneous oncology publications into those evaluating melanoma and those evaluating non-melanoma skin cancers. In addition, each publication was categorized as using machine learning (excluding deep learning), deep learning, or a hybrid method (a combination of machine learning and deep learning) (Figure 2).
In terms of machine learning methods, most publications use a feature extractor to extract features from an image and then train a classifier model using these features (e.g., malignant melanoma (MM) vs. BN). Recently, deep convolutional neural networks (DCNNs) have been implemented in many medical-imaging studies (10)(11)(12). DCNNs use convolution operations to compensate for the multi-layer perceptron's (MLP's) neglect of pixel locality and local correlations. Thus, deep learning can be used to train a robust classifier model with a variety of data. Figure 3 shows an example of a DCNN for classifying ALM and BN in dermoscopic images. The DCNN feature extractor repeatedly applies convolution and max-pooling operations (the latter retaining the largest activation in each region) to the layer input. This process generates a feature map. The feature map is fed to a classifier via global average pooling over each channel. The classifier finally outputs probabilities for ALM and BN. The result is then compared with the actual label, and the parameters are updated via backpropagation. However, DCNN operations require highly powerful graphics processing units to manage the complex computations and large datasets involved. Although DCNN learning capacity can be limited by insufficient medical-image data, it is possible to fine-tune state-of-the-art deep learning models that show high performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), making them suitable for medical purposes (13). In the hybrid method, an ensemble classifier is designed by combining a conventional machine learning method and a deep learning method. For example, after extracting the features of an image using a conventional machine learning method, these extracted features can be used as inputs to a DCNN. Another example is training a support vector machine (SVM) on a feature map obtained through a DCNN (14).
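The three operations described above (convolution, max-pooling, and global average pooling) can each be sketched in a few lines of NumPy. This toy forward pass uses a hand-picked edge kernel rather than learned weights, and is intended only to show what each stage computes:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (cross-correlation, as in most DL frameworks)."""
    h, w = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (x[i:i + k.shape[0], j:j + k.shape[1]] * k).sum()
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max-pooling: keep the largest activation per region."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h * s, :w * s].reshape(h, s, w, s).max(axis=(1, 3))

def global_avg_pool(x):
    """Collapse one feature map to a single scalar for the classifier."""
    return x.mean()

x = np.arange(36, dtype=float).reshape(6, 6)  # toy "image" with a constant gradient
edge = np.array([[-1.0, 1.0]])                # horizontal edge-detector kernel
fmap = np.maximum(conv2d(x, edge), 0)         # convolution + ReLU activation
pooled = max_pool(fmap)                       # downsample the feature map
print(pooled.shape, global_avg_pool(pooled))  # → (3, 2) 1.0
```

In a real DCNN, many such kernels are learned per layer, and stacking convolution/pooling blocks yields progressively smaller maps with more channels, exactly as in Figure 3.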
One publication showed that hybrid models outperform both deep learning and conventional machine learning models (15); another highlighted the limitations of deep learning and stated a need for hybrid models to overcome them (16). Thus, the two methods can be combined effectively to create more accurate models.
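One simple way to combine a conventional classifier with a deep model is soft voting over their predicted class probabilities. The sketch below uses made-up probabilities (the model names in the comments are illustrative, not taken from the reviewed publications):

```python
import numpy as np

def soft_vote(p_conventional, p_deep, w=0.5):
    """Weighted average of class probabilities from two models."""
    p = w * p_conventional + (1 - w) * p_deep
    return p / p.sum(axis=1, keepdims=True)

# probabilities for [benign, malignant] on three hypothetical lesions
p_ml = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])  # e.g. SVM on texture features
p_dl = np.array([[0.7, 0.3], [0.5, 0.5], [0.1, 0.9]])  # e.g. DCNN softmax output
p = soft_vote(p_ml, p_dl)
print(p.argmax(axis=1))  # → [0 1 1] : final benign/malignant calls
```

Other hybrid designs in the reviewed literature instead feed hand-crafted features into a DCNN, or train an SVM on DCNN feature maps; soft voting is merely the simplest ensemble to illustrate.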
Every year, the number of articles describing AI implementations in the field of cutaneous oncology increases. Observing the trends of the discipline, studies using conventional machine learning have been decreasing.

APPLICATION OF ARTIFICIAL INTELLIGENCE IN THE DIAGNOSIS OF MALIGNANT SKIN CANCERS

Melanoma
A total of 18 publications were identified: six described the use of conventional machine learning, nine the use of deep learning, and two the use of hybrid models. Among the 18 publications, 14 used dermoscopic images as the dataset, and the remainder used unspecified or clinical images; nine used more than 500 datasets, and the remainder used <500 datasets. Moreover, in five of the publications, other skin lesion data such as seborrheic keratosis (SK) and basal cell carcinoma (BCC) were used alongside malignant melanomas and nevi. Seven publications presented the area under the curve (AUC) as a performance indicator of the model; the remainder presented accuracy (Acc), sensitivity (Sen), and specificity (Spe) (Tables 1-3).

FIGURE 3 | Example of a DCNN for classifying ALM and BN in dermoscopic images. In the feature extractor, each layer performs a convolution operation on the input data and then a max-pooling operation, thereby reducing the image size and increasing the number of channels. The feature extractor generates a feature map by repeating this process for each layer. After the global average pooling operation, the feature map is used as the input of the classifier layer (fully-connected layer). Finally, the output of the fully-connected layer appears as a probability of ALM or BN.

Deep Learning
Among the deep learning algorithms discussed in the literature, five were fine-tuned from pre-trained models; the remainder were fully trained as new models. In four publications, preprocessing was performed prior to model training. One publication (Premaladha and Ravichandran) compared the conventional machine learning method 'Hybrid Adaboost-SVM' and a deep learning-based neural network on the same dataset; they showed that the deep learning-based neural network delivered superior performance. Moreover, one publication (Cui et al.) demonstrated that when more data were used, deep learning outperformed conventional machine learning methods.

Conventional Machine Learning
Of the conventional machine learning publications, four of the five performed feature extraction and then built a classifier. Two of these used an SVM as the classifier, one used multivariable linear regression, and one used a layered model. In three publications, artifact removal or lesion segmentation was performed prior to feature extraction. The remaining publication (Marchetti, Codella et al.) presented a new model using a fusion method, developed by 25 teams participating in the International Symposium on Biomedical Imaging (ISBI) 2016.

Hybrid (Deep Learning + Machine Learning)
In the publications using hybrid methods, one (Jafari, Nasr-Esfahani et al.) preprocessed the input images, extracted patches, and performed segmentation using a convolutional neural network (CNN). In another (Xie, Fan et al.), segmentation was performed after preprocessing using a self-generating neural network (SGNN); they then presented an ensemble network by designing a feature extractor and classifier. Furthermore, in one publication (Sabbaghi et al.), a deep auto-encoder combined with a bag of features (BoF) outperformed models using a BoF or a deep auto-encoder alone.

Non-melanoma Skin Cancer: BCC, Squamous Cell Carcinoma (SCC)
We identified seven deep learning publications, three machine learning publications, and three hybrid publications on non-melanoma skin cancer. Several publications also discussed MM; however, all of them discussed BCC and three discussed SCC, so we classified the publications into these categories. The results are organized in Tables 4-6.
The results of all publications were presented using an accuracy indicator, and some also used a variety of indicators such as specificity, sensitivity, precision, and F1 score. The datasets used in each publication differed, making direct comparison impossible.

Deep Learning
Rezvantalab et al. compared the abilities of deep learning models against the performance of highly trained dermatologists. This publication presented outcomes from various deep learning models. In BCC classification, the highest AUC reported was 99.3%, using DenseNet 201. When compared against dermatologists (AUC 88.82%), the results of deep learning were found to be superior.
Five publications used datasets of dermoscopic images. One used full-field optical coherence tomography (FFOCT) images, and Yap et al. used different forms of data, including metadata, macroscopic images, and dermoscopic images. They then trained a deep learning model using fusion techniques, in which image feature vectors were concatenated with the metadata feature vectors. Two publications by Zhang et al., written in 2017 and 2018, showed interesting results; the 2018 publication improved on the previous year's algorithm by utilizing medical information. Their results showed an average improvement of 0.7% over those of the previous year.
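Concatenation-based fusion of image and metadata features, as described above, reduces to joining the two vectors before the classifier. A minimal sketch with synthetic feature vectors (the dimensions and names are illustrative assumptions, not those of the cited work):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical per-lesion feature vectors for a batch of 8 lesions
img_features = rng.random((8, 128))   # e.g. a DCNN embedding of each image
meta_features = rng.random((8, 5))    # e.g. encoded age, sex, body site

# fusion by concatenation: one joint vector per lesion for the classifier
fused = np.concatenate([img_features, meta_features], axis=1)
print(fused.shape)  # → (8, 133)
```

The downstream classifier then sees a single feature space combining visual and clinical information.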

Conventional Machine Learning
We identified four publications that used only machine learning techniques: three used dermoscopic images and one used polarization-sensitive optical coherence tomography (PS-OCT) images. Each used different methods and features.
Marvdashti et al. performed feature extraction and classification using multiple machine learning methods [SVM, k-nearest neighbor (KNN)]. Kharazmi et al. segmented vascular structures using independent component analysis (ICA) and k-means clustering, then classified them using a random forest classifier. Kefel et al. introduced automatically generated borders using geodesic active contours (GAC) and Otsu's threshold for the detection of pink blush, a feature commonly seen in BCCs. They then classified lesions using logistic regression based on features such as smoothness and brightness.
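Otsu's threshold, used by Kefel et al. for border generation, picks the gray level that maximizes between-class variance in the image histogram. A self-contained sketch on a toy bimodal "image" (a simplification of its use inside an active-contour pipeline):

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    hist, edges = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for t in range(1, bins):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:t] * centers[:t]).sum() / w0  # class means
        mu1 = (p[t:] * centers[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
        if var > best_var:
            best_var, best_t = var, centers[t - 1]
    return best_t

# toy bimodal image: dark lesion pixels near 50, bright skin pixels near 200
img = np.concatenate([np.full(100, 50.0), np.full(100, 200.0)])
t = otsu_threshold(img)
print(50 < t < 200)  # → True : the threshold falls between the two modes
```

Pixels below the threshold would then be labeled as lesion, giving an initial border for refinement.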

Hybrid (Deep Learning + Machine Learning)
Three publications implementing hybrid methods were identified, each using a different dataset. One publication used optical coherence tomography (OCT) images. In one publication, all images from the students' training session were also used to retrain the last layer of the "GoogLeNet Inception v3" neural network, without any kind of test-set augmentation (4,000 epochs, learning rate 0.001, batch size 50).
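Retraining only the last layer of a pre-trained network, as described above, amounts to fitting a small linear classifier on frozen feature vectors. The sketch below uses synthetic features as a stand-in for Inception v3 embeddings (whose real dimensionality is 2048) and trains a logistic-regression "last layer" by gradient descent; it illustrates the principle, not the cited setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-ins for frozen pre-trained feature vectors
n, d = 200, 16
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)       # binary labels (e.g. BCC vs. benign)

# retrain only the last layer: logistic regression via gradient descent
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid output
    w -= lr * X.T @ (p - y) / n          # gradient of the cross-entropy loss
    b -= lr * (p - y).mean()

acc = (((1 / (1 + np.exp(-(X @ w + b)))) > 0.5) == y).mean()
print(acc)  # training accuracy of the retrained last layer
```

Because the backbone stays frozen, only d + 1 parameters are updated, which is why fine-tuning works even with small medical-image datasets.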

IMPLEMENTATION IN SMARTPHONES
With the spread of smartphones, the mobile application market has expanded rapidly. Applications are used in various fields, and in dermatology in particular through the use of smartphone cameras. Owing to the ubiquity of smartphones, easily accessible mobile apps can make it more efficient to detect and monitor skin cancers during the early stages of development. In addition, with the recent development of smartphone processors and cameras, machine learning techniques can be applied and skin cancer diagnoses conducted on smartphones. Table 7 shows that considerable research and development on smartphone implementation is being carried out. AI technology relevant to skin cancer diagnosis is anticipated to eventually be implemented in smartphones, reducing unnecessary hospital visits. Many types of mobile health application are already available.

Types and Accuracies of Diagnostic Applications Using a Smartphone
According to a recent review (53,54), numerous applications have already been released, seven of which use image analysis algorithms. Four of the seven are not supported by scientific evidence, and these four have been deleted from the app store since the review was conducted; the other three apps are still available. Table 7 provides a summary of the apps. SkinScan, SkinVision, and SpotMole are currently available. SkinVision uses machine learning algorithms, whereas SkinScan and SpotMole use the ABCDE rule (that is, asymmetry, border irregularity, color that is not uniform, diameter >6 mm, and evolving size, shape, or color). Thus, only one application employs a machine learning technique. The sensitivity and specificity of these applications are shown in the table.
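A rule-based checker in the style of the ABCDE rule can be sketched as a set of per-criterion tests. The thresholds, field names, and referral cutoff below are illustrative assumptions only, not taken from any of the apps in Table 7:

```python
# Hypothetical ABCDE-style rule check; all thresholds are illustrative.
def abcde_flags(lesion):
    return {
        "asymmetry": lesion["asymmetry_score"] > 0.5,    # A: shape asymmetry
        "border": lesion["border_irregularity"] > 0.5,   # B: irregular border
        "color": lesion["num_colors"] >= 3,              # C: non-uniform color
        "diameter": lesion["diameter_mm"] > 6.0,         # D: diameter > 6 mm
        "evolving": lesion["changed_recently"],          # E: evolving lesion
    }

lesion = {"asymmetry_score": 0.7, "border_irregularity": 0.3,
          "num_colors": 4, "diameter_mm": 7.5, "changed_recently": False}
flags = abcde_flags(lesion)
print(sum(flags.values()))  # → 3 criteria met for this toy lesion
```

A real app would estimate these quantities from the camera image (e.g., asymmetry from the segmented lesion mask) and advise a dermatologist visit when enough criteria are met.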
Most diagnostic applications are not accurate (55). Furthermore, only a few inform users through image analysis and machine learning. Most apps are not supported by scientific evidence and require further research.

Problems and Possible Solutions
Inaccuracies in medical applications can result in problems of legal liability. In addition, the transmission of patient information may constitute telemedicine practice, for which there are certain legal restrictions; these include information protection regulations to prevent third parties from accessing data during transmission. Even if accuracy improves, the advertisements embedded in some applications suggest that the technology could be exploited commercially, for example, to attract patients. To solve this problem, a supervisory institution in which doctors participate is required, along with a connection to remote medical care services. The United States has been steadily promoting telemedicine since its early stages to address the issue of access to healthcare. Since the establishment of the American Telemedicine Association (ATA), a telemedicine research institute, in 1993, legislation including the Federal Telemedicine Act has been enacted. Telemedicine has been applied to more than 50 detailed medical subjects, including heart disease, and has been successfully implemented in rural areas, prisons, homes, and schools (56).
To obtain good results, it is necessary to focus on securing high-quality data, to form a consensus between the patient and the doctor, and to actively participate in development.
In summary, the evidence for the diagnostic accuracy of smartphone applications is still lacking because few mHealth apps offer such services. In addition, because services and algorithms change faster than the peer-review publishing process, it is difficult to compare different apps accurately.

Risks of Smartphone Applications
Smartphone applications pose some risk to users, especially if the algorithm returns falsely negative results and delays the detection and treatment of undiagnosed skin cancer. It is very difficult to study false-negative rates because there is no histological evidence. Users may not be able to assess all skin lesions, especially those located in areas that are difficult to reach or see. Given the generally low specificity of current applications, false positives are likely; these would put unnecessary stress on the user and result in unnecessary visits to the dermatologist. Furthermore, owing to limited trust and awareness, users may not follow the advice provided by the smartphone application.
Chao et al. described the ethical and privacy issues of smartphone applications (57). Whilst applications have the potential to improve the provision of medical services, there are important ethical concerns regarding patient confidentiality, informed consent, transparency in data ownership, and protection of data privacy. Many apps require users to agree to their data policies; however, the ways in which patient data are externally mined, used, and shared are often not transparent. Therefore, if a patient's data are stored on a cloud server or released to a third party for data analysis, assessing liability in the event of a breach of personal information is a challenge. In addition, it is unclear how responsibility for medical malpractice will be determined if the patient is injured as a result of inaccurate information.

CONCLUSION
In this review, we analyzed a total of 35 publications. Studies on skin lesions were divided into those assessing malignant melanomas and those assessing non-melanoma skin cancers. In addition, studies involving clinical data and OCT images were considered alongside those involving the dermoscopic images widely used in dermatology. Because the datasets differed between publications, it was impossible to determine how best to perform the analysis. However, as seen in the publication by Cui et al., deep learning methods obtain better results than conventional machine learning methods if the dataset is large. Certain publications have also reported results comparable or superior to those of dermatologists (26,58). Therefore, in the future, computer-aided diagnostics in dermatology will rely more heavily on deep learning methods. For the convenience of users, smartphone implementation is necessary; however, accuracy has been limited on smartphones. This problem stems from hardware limitations, which have forced implementations to use conventional machine learning techniques such as SVM rather than deep learning. Recently, however, MobileNet has made it possible to run deep learning methods on IoT devices, including smartphones (59). This enables deep learning to be applied to IoT devices with faster performance than large networks, which will lead to more active research into skin lesion detection using applications.
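MobileNet's efficiency comes from factorizing each standard convolution into a depthwise convolution followed by a pointwise (1 x 1) convolution. The parameter savings can be checked with simple arithmetic (layer sizes below are illustrative, not a specific MobileNet configuration):

```python
def standard_conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution layer (biases omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """MobileNet-style factorization: depthwise k x k + pointwise 1 x 1."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 128)        # 3*3*128*128 = 147,456
sep = depthwise_separable_params(3, 128, 128)  # 1,152 + 16,384 = 17,536
print(std, sep, round(std / sep, 1))           # → 147456 17536 8.4
```

For a 3 x 3 kernel, the factorized layer uses roughly 8 to 9 times fewer parameters (and multiply-adds), which is what makes on-device deep learning feasible.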
Application inaccuracies can lead to legal problems. To solve this problem, doctors and patients must participate together in the development stage, and an institution for managing and supervising this process is also required.

AUTHOR CONTRIBUTIONS
BO and SY contributed to the conception and design of the study, wrote sections of the manuscript, and contributed to manuscript revision. YC and HA collected data and wrote the first draft of the manuscript. All authors read and approved the submitted version.