Research in the application of artificial intelligence to lung cancer diagnosis

The morbidity and mortality rates of lung cancer are high worldwide. Early diagnosis and personalized treatment are important to manage this public health issue. In recent years, artificial intelligence (AI) has played increasingly important roles in early screening, auxiliary diagnosis, and prognostic assessment. AI uses algorithms to extract quantitative feature information from high-volume, high-dimensional data and learns from existing data to predict disease outcomes. In this review, we describe the current uses of AI in lung cancer-focused pathomics, imageomics, and genomics applications.


Introduction to AI
The terms artificial intelligence (AI) and machine intelligence were initially used in 1956 to refer to computer "intelligence," which involves mathematics, computer science, and computational neuroscience. AI simulates, extends, or expands human intelligence, aiding the completion of specific tasks. Research focused on computer-aided detection/diagnosis (CADe/CADx) intensified in the 1990s, and CADe/CADx of pulmonary nodules is becoming increasingly accurate. Machine learning is the core component of AI; it refers to the use of computer algorithms and models to draw inferences, recognize data patterns, and accomplish specific tasks without human intervention. The algorithms include shallow learning, deep learning, and other routines. Although shallow learning aids classification and regression, it requires large training datasets and does not adequately manage complex problems. Deep learning, also known as deep neural network learning, uses multi-level feature learning, involving convolutional neural networks (CNNs), deep belief networks, and autoencoders for deep mining of big data; thus, it can solve complex multi-classification problems. Deep learning based on CNNs accurately detects cancerous tissues and malignant tumors, predicts survival, and finds common mutations.
In recent years, imageomics has come to the forefront of AI research. This approach extracts quantitative features from a large number of medical images that reveal tumor morphology, size, and texture, thereby facilitating tumor diagnosis and treatment using new images (9). The steps include image acquisition, lesion outlining, feature extraction, feature selection, and machine modeling (10). Overall, imageomics constitutes task-driven re-use of existing human-labeled image data via both AI and statistical analyses (11); it has applications in imaging suites and nuclear medicine.
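The feature extraction and feature selection steps listed above can be illustrated with a minimal sketch. It assumes a 2D image patch and a binary lesion mask given as nested lists; the function names, the choice of first-order features, and the variance filter are illustrative assumptions, not the pipeline used in the cited studies.

```python
import math
from statistics import mean, pstdev


def first_order_features(image, mask):
    """Compute simple first-order features over pixels where mask == 1."""
    values = [
        pixel
        for img_row, mask_row in zip(image, mask)
        for pixel, inside in zip(img_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    # Shannon entropy of the intensity histogram (one bin per distinct value).
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "area_px": n,
        "mean": mean(values),
        "std": pstdev(values),
        "entropy": entropy,
    }


def select_features(rows, min_std=1e-6):
    """Keep feature names whose values vary across lesions (variance filter)."""
    return [k for k in rows[0] if pstdev([r[k] for r in rows]) > min_std]


# Hypothetical 4x4 CT patch (Hounsfield-like values) and lesion outline.
image = [
    [-800, -790, -780, -800],
    [-790, -300, -280, -795],
    [-785, -310, -290, -790],
    [-800, -795, -790, -800],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
features = first_order_features(image, mask)
```

In practice, the selected features would then be passed to a classifier or regression model; that modeling step is omitted here.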
AI-assisted diagnosis has led to the concept of pathomics, which involves the transformation of pathology images to high-fidelity, high-throughput mineable datasets with quantitative textural, morphological, and edge gradient features, as well as biological characteristics. These datasets are used to quantify pathological diagnoses and predict disease prognoses (12). Applications of pathomics include cytology, primary screening, quantitative morphological and histopathological diagnoses, and auxiliary prognostic judgments (13). The findings can be compared and integrated with the results of genomic and histological analyses; such comparisons are regarded as multi-omics crossover studies.
Genomics refers to the exploration of relationships between genes and functions using genome-scale assays (14). AI facilitates the complex data analyses that are necessary for variant identification and classification, imaging and genetic diagnoses, and genotype-phenotype predictions.

Applications of AI in lung cancer diagnosis

Early differentiation of lung cancer imaging
Approximately 70% of lung cancer patients have middle-stage and late-stage disease at the time of diagnosis; this delayed diagnosis hinders treatment and reduces survival. Early screening is important to reduce mortality. Lung nodules are early signs of lung cancer; early and accurate detection of benign and malignant nodules facilitates early treatment, improving prognosis and survival. Imaging physicians assess nodule location, size, and differences in density relative to surrounding structures. Such evaluations are subjective; the American Lung Cancer Screening Program found that approximately 8.9% of lung cancer cases were missed on baseline CT (15). Zhang et al. (16) compared AI and manual assessments of the chest CT scans of 60 patients with early-stage lung cancer and 50 patients with benign lung nodules (both diagnoses were determined via pathological analyses). The sensitivity of AI in terms of early-stage lung cancer detection was better than the human sensitivity, indicating that the use of AI improved diagnostic sensitivity for early-stage lung cancer and provided rapid results. Clinical implementation was suggested.
Currently, CADe/CADx schemes are used to reduce the false-positive lung nodule rate. CADe first selects all suspected nodules, classifies them as nodules or non-nodules, and removes the non-nodules with high sensitivity. The CADe system is based on deep learning; the Lung Image Database Consortium dataset is used to analyze nodule morphology, density, and texture. Some variations among studies are apparent, which may reflect the nodules examined. Some nodules, such as ground glass nodules (GGNs) and apposed vascular nodules, are associated with high false-positive rates; screening must be improved to enhance detection. CNNs are widely used to screen pulmonary nodules. For example, Mahmood et al. (17) developed a system based on AlexNet, a novel CNN in terms of layer ordering; it exhibits improved hyperparameters and functions. An image segmentation algorithm processes the lung scan sequence to generate a lung area map, from which the lung region image is generated. The lung region image and the nodule labeling information are then used to generate nodule region images, on which a CNN-based pulmonary nodule segmenter is trained; this segmenter is applied to the image to obtain suspected pulmonary nodule areas. Once a suspected pulmonary nodule is located, a 3D CNN classifies it to obtain the location and confidence of true pulmonary nodules (18). An entire dataset is pre-processed, segmented, normalized, and zero-centered. The accuracy is 98.7%, the sensitivity is 98.6%, and the specificity is 98.9%; overall, performance is excellent. Compared with a two-dimensional CNN, a three-dimensional (3D) CNN more effectively analyzes spatial information, but its computational burden is larger and a more complex network is involved. Importantly, the sensitivity is higher. Yang et al. (19) used a squeeze excitation module for feature fusion, to solve the problem posed by the different views of a multi-view model; they constructed a 3D multi-view squeeze excitation CNN. The accuracy and sensitivity in terms of classifying benign and malignant pulmonary nodules were 96.04% and 98.59%, respectively; the 94.8% agreement with pathological diagnoses was higher than the agreement achieved by other methods. The model effectively learned the spatial heterogeneities of pulmonary nodules and solved the multi-view discrepancy problem; it will provide substantial assistance to physicians. Ma et al. (20) developed an improved 3D Mask regional CNN for the detection and classification of pulmonary GGNs using a pre-trained 3D U-Net to extract the lung parenchyma, a 3D target detection network to locate the lesion and determine its malignancy status, and a feature-based weighted clustering algorithm to remove false positives. The mean detection accuracy was high, and the "false alarm" rate was low. The feature-based weighted clustering algorithm allowed this deep learning-based framework to automatically detect and classify pulmonary GGNs and to accurately locate and classify nodules; it will be useful for physicians. Liu et al. (21) subjected the images of 262 patients with pulmonary GGNs to physician and AI review. The AI sensitivity and missed-diagnosis rates were lower, but the misdiagnosis rate was higher, compared with the rates of the physicians.
Further cooperation between AI developers and physicians is needed. Wu et al. (22) used AI to analyze images of 175 patients with pulmonary nodules who attended regular follow-up; they found that nodule growth was influenced by original nodule size, mean CT value, surface signs, and malignancy probability. The authors recommended that clinicians consider these factors when choosing follow-up times; such considerations will facilitate early detection of nodule growth and timely initiation of treatment.
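The accuracy, sensitivity, and specificity values quoted in this section, along with the missed-diagnosis ("leakage") and misdiagnosis rates, can all be derived from the four cells of a confusion matrix. The sketch below shows those relationships under two stated assumptions: the missed-diagnosis rate is taken as the false-negative rate and the misdiagnosis rate as the false-positive rate. The individual studies may define these terms differently, and the counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return common screening metrics from confusion-matrix counts.

    tp/fn: malignant nodules called malignant/benign.
    tn/fp: benign nodules called benign/malignant.
    Assumes at least one malignant and one benign case.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed definitions, see the lead-in paragraph.
        "missed_diagnosis_rate": 1 - sensitivity,
        "misdiagnosis_rate": 1 - specificity,
    }


# Hypothetical counts, not taken from any cited study.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

Under these definitions, a system cannot lower both its sensitivity and its missed-diagnosis rate at once, which is why the two error rates are usually reported together with the decision threshold.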

Lung cancer diagnosis
AI diagnoses of lung cancer include imaging, pathological, and genetic diagnoses. A large number of images are subjected to multilevel quantitative feature extraction, which aids the diagnosis of benign and malignant tumors while predicting histological type and invasiveness; such feature extraction assists clinical diagnosis (23). Yu et al. (24) applied a machine learning tool to lung cancer CT images and pathology slides, with a focus on lung nodules and segmentation of cell nuclei, as well as digital pathology results. The findings were useful in lung nodule and nuclear screening. The positive diagnostic accuracy was similar to the accuracy of less senior doctors, and the mean screening time was shortened by 58%; thus, the tool improved efficiency, reduced doctor workloads, and supported clinical diagnoses.

AI and CT recognition
AI analyses of medical images improve imaging efficiency and image processing; they also allow remote consultations by primary hospitals (25). Feng et al. (26) evaluated 90 lung cancer patients using a Mask-CNN for image segmentation and a dual path network for nodule detection. In terms of detecting lung lesions, the dual path network accuracy was 88.74%; the accuracy of CT diagnosis was 88.37%, and the sensitivity and specificity were 82.91% and 87.43%, respectively. Notably, CT image examination based on deep learning combined with serum markers exhibited an accuracy of 97.94%, with a sensitivity of 98.12% and a specificity of 100%. Shi et al. (27) used a semi-supervised deep transfer learning framework (known as SDTL) for the diagnosis of benign and malignant lung nodules among 3,038 nodules with pathologically confirmed benign or malignant labels and 14,735 unlabeled nodules. They found that SDTL exhibited superior diagnostic performance. Transfer learning improved the accuracy by 2%, and semi-supervised learning improved the accuracy by an additional 2.9%. Such effective classifications are expected to have applications in clinical practice. Fang et al. (28) analyzed CT images of 224 hairy nodules and GGNs from 210 patients. AI identified GGNs using quantitative parameters and CT signs. The AI parameters identified the early lung adenocarcinoma subtypes, and the combination of AI data with CT signs led to improved diagnostic efficacy.

AI and pathological diagnosis
Many studies have shown that AI can help pathologists to rapidly determine lung cancer types, identify mutations in specific genes, predict prognoses, and make clinical decisions, thereby significantly improving diagnostic efficiency and reducing workloads (29). In a study of 101 patients who had been clinically diagnosed with lung cancer, Wang et al. (30) used an AI cytopathological diagnostic system and rapid on-site evaluation to assess 94 lung tissue biopsy specimens, 6 pleural effusion specimens, and 1 ascites specimen. The AI cytopathological diagnostic accuracy was lower than the accuracy of pathologists but similar to the accuracy of rapid on-site evaluation; this AI system reduced workloads, improved efficiency, and will be useful in lung cancer diagnosis. Chen et al. (31) evaluated 110 lung adenocarcinoma pleural fluid specimens and 20 non-cancerous pleural fluids (controls). They trained the Yolo V4 model using suspected and confirmed lung cancer cells; they trained the Inception V3 model on various classifications of cells. The trained Yolo V4 model identified and labeled both suspected and confirmed lung adenocarcinoma cells in pleural fluids with a mean accuracy of 20% for all classes; in contrast, the accuracy of the trained Inception V3 model was 98%. The dispersal of cell clumps into single cells improved efficiency and accuracy. Thus, AI detected and classified lung adenocarcinoma cells in pleural fluid; this ability aids lung cancer diagnosis and is a good example of a practical application for deep learning.

AI and genotyping
AI can manage the complex big data produced by genomics analyses; it can also assist geneticists and clinicians in accurately evaluating sequencing results. However, AI applications in this field remain limited, and further optimization is needed. Liu et al. (32) used a Kullback-Leibler (KL) divergence-based gene selection method to identify high-divergence genes. They used the focal loss function to establish a deep neural network and then used k-fold cross-validation (k = 5) to select the best model. The area under the curve (AUC) was 0.99 for a validation set that explored KL divergence-based gene selection, indicating improved lung cancer prediction accuracy. Wang et al. (33) used a fully automated AI system to extract whole-lung information from CT images and predict the prognoses of patients with EGFR genotypes who had been treated with EGFR-tyrosine kinase inhibitors. The images were collected from 18,232 lung cancer patients, and the EGFR gene sequences were collected from nine cohorts in China and the United States. Kaplan-Meier analysis revealed that the AUC of fully automated AI systems ranged from 0.748 to 0.813 among the different cohorts, which was better than the AUCs of commonly used tumor-focused deep learning models. The 29 prognostic deep learning features of fully automated AI systems identified patients with EGFR mutations that exhibited a high risk of tyrosine kinase inhibitor resistance.
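The three ingredients named for the gene-selection study (KL divergence, focal loss, and 5-fold cross-validation) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the cited authors' implementation: genes are ranked by the KL divergence between hypothetical tumor and normal expression histograms, the focal loss is the standard binary form with commonly used defaults (gamma = 2, alpha = 0.25), and folds are contiguous index blocks. All names and data are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def focal_loss(prob_positive, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample; label is 0 or 1.

    Well-classified samples (p_t near 1) are down-weighted by (1 - p_t)**gamma.
    """
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous validation folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical expression histograms: (tumor, normal) per gene.
histograms = {
    "GENE_A": ([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]),
    "GENE_B": ([0.4, 0.3, 0.3], [0.3, 0.4, 0.3]),
}
ranked = sorted(
    histograms, key=lambda g: kl_divergence(*histograms[g]), reverse=True
)
folds = k_fold_indices(12, k=5)
```

Genes at the top of `ranked` differ most between the two groups and would be retained as model inputs; each fold in `folds` serves once as the validation set while the remaining folds are used for training.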

Lung cancer staging
Accurate tumor staging helps clinicians choose appropriate treatment plans. Compared with AI, imaging physicians assess tumor stages more accurately because the physicians explore both local invasion and distant metastases using images collected via multiple modalities. AI lung cancer staging (11) principally focuses on lymph node and distant metastases based on primary tumor characteristics; it can also focus solely on lymph nodes to determine whether they are benign or malignant. Although imaging physicians are currently irreplaceable, there is evidence (34)(35)(36) that AI can provide assistance for those physicians. Yang et al. (37) evaluated 96 lung cancer patients with surgically and pathologically confirmed peripheral non-small cell lung cancer, all of whom underwent preoperative multi-detector CT and were staged using the TNM system. Two methods of maximum lesional diameter assessment (manual physician measurement and AI measurement) were used for preliminary preoperative T staging. The physician accuracy was 67.71%, whereas the AI accuracy was 83.33%. Pathological T staging consistency between the two methods was good; however, the AI CT-based T staging was more accurate, reproducible, and stable.
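Preliminary T staging from maximum lesional diameter, as described above, amounts to a threshold lookup followed by a comparison with the pathological stage. The sketch below assumes the size cut-offs of the 8th edition TNM classification (3, 5, and 7 cm); the source does not state which edition was used, and full T staging also depends on local invasion and other descriptors that are ignored here. The measurements and reference stages are hypothetical.

```python
def size_based_t_stage(max_diameter_cm):
    """Map maximum tumor diameter (cm) to a size-only T category.

    Cut-offs assume the 8th edition TNM classification; invasion-based
    criteria are deliberately not modeled.
    """
    if max_diameter_cm <= 0:
        raise ValueError("diameter must be positive")
    if max_diameter_cm <= 3:
        return "T1"
    if max_diameter_cm <= 5:
        return "T2"
    if max_diameter_cm <= 7:
        return "T3"
    return "T4"


def staging_accuracy(predicted, reference):
    """Fraction of cases whose predicted stage matches the reference stage."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical diameters (cm) and pathological T stages for four patients.
diameters = [2.4, 3.6, 5.2, 7.8]
pathological = ["T1", "T2", "T2", "T4"]
predicted = [size_based_t_stage(d) for d in diameters]
accuracy = staging_accuracy(predicted, pathological)
```

Because the stage changes abruptly at each cut-off, small measurement differences near 3, 5, or 7 cm can shift the T category, which is one reason reproducible automated measurement may improve staging consistency.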

Lung cancer treatment
Watson for Oncology, an outstanding medical AI application that became available in China in 2017, rapidly and accurately develops standardized treatment plans for tumor patients; such plans are consistent with the plans established by clinical oncologists (38). Lee et al. (39) developed Seq2Seq, a deep learning algorithm that predicts weekly anatomical changes in lung tumors and the esophagus during radical radiotherapy; it incorporates potential tumor shrinkage into predictive treatment planning. The trained Seq2Seq algorithm was used to evaluate 60 patients, and its performance was very good. Wei-Ning et al. (41) used an AI-assisted diagnostic system to extract features from lung CT images of 162 patients with adenocarcinomas who exhibited GGNs. The 5-year postoperative overall survival and relapse-free survival were better in a pure GGN group than in a mixed GGN group. Imaging microfeatures (e.g., microvascular clusters, nodule volumes, nodule lengths, and nodule diameters) were independent risk factors for poor survival. These features, as well as nodule central density and lymph node metastasis, were also independent risk factors for poor relapse-free survival, confirming that AI-assisted diagnosis could effectively predict the prognosis of GGN-type lung adenocarcinoma and support personalized treatment planning. Such studies will also facilitate lung cancer prevention.

Applications of AI in other cancers
Artificial intelligence software for skin cancer diagnosis, developed by a research team at Stanford University in the United States, was described in the journal Nature in January 2017 (42). The software was evaluated on three diagnostic tasks: identifying keratinocyte carcinoma, identifying melanoma, and classifying melanoma using dermoscopic images. The researchers measured the performance of the algorithm using a sensitivity-specificity curve. In all three diagnostic tasks, the AI performed at a level close to that of human dermatologists, with a sensitivity of 91%. The development of such software is expected to bring revolutionary change to digital pathology research.
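A sensitivity-specificity curve of the kind mentioned above is obtained by sweeping the decision threshold over a classifier's output scores and recomputing both metrics at each threshold. The following minimal sketch uses hypothetical scores and labels; it is not the evaluation code of the cited study.

```python
def sensitivity_specificity_curve(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A case is called positive when its score is >= threshold.
    Assumes labels contain at least one 1 and one 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    curve = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        curve.append((threshold, tp / positives, tn / negatives))
    return curve


# Hypothetical malignancy scores and ground-truth labels (1 = malignant).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
curve = sensitivity_specificity_curve(scores, labels)
```

Each human reader corresponds to a single point in the same sensitivity-specificity plane, so an algorithm can be compared with clinicians by checking whether its curve passes above or below those points.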
An AI-assisted diagnostic system developed by Dutch researchers can identify and delineate the extent of lesions in Barrett's esophagus (43). In the research and development stage, under the guidance of endoscopy experts, the location of the lesion in each endoscopic image was identified in combination with the histopathological diagnosis, and the AI-assisted diagnosis system was then trained using deep learning. Finally, the system was compared with 53 endoscopists from 4 countries in the evaluation of 160 lesions. The results showed that the AI-assisted diagnosis system predicted lesion extent more accurately than the 53 endoscopists; all lesion ranges depicted by the system overlapped with the ranges depicted by the endoscopists, and the optimal biopsy site was determined in more than 92% of cases.

Discussion
AI detection models for lung cancer have been widely developed. These models offer an objective, efficient, multi-angle, and repeatable technical means of diagnosing lung cancer; they can greatly alleviate the work pressure on clinicians and reduce misdiagnoses caused by physician fatigue. They may also change the current medical model, such that doctors are expected to act primarily as decision-makers.
Although the effectiveness of AI in the diagnosis of lung cancer has been preliminarily verified, it remains in the stage of clinical exploration, and many aspects need to be improved. (1) AI-assisted diagnosis systems based on CT images still produce missed diagnoses and misdiagnoses when distinguishing benign from malignant lesions. These errors may be caused by incomplete extraction of information (e.g., image features and typical manifestations) from the CT imaging data of lung cancer patients, and they may affect the system's differential diagnosis of lung cancer types. (2) In daily imaging diagnosis, radiologists must comprehensively evaluate lung CT image data, including determining whether pulmonary nodules, emphysema, pneumonia, mediastinal lymph node enlargement, and lesions of the heart and great vessels are present. A single-task AI-assisted diagnosis system for lung cancer cannot meet the needs of such comprehensive clinical assessment; multi-task, multi-threaded diagnosis systems suited to real-world use must be developed. (3) CT image data are the core resource required by AI algorithms, but the degree of data sharing and interoperability among hospitals is currently low, and it is difficult for AI research and development companies to obtain large-scale multi-center data. Moreover, the annotation of CT images is often controversial: different countries, international organizations, societies, and hospitals may implement different standards, and doctors do not interpret signs uniformly. Therefore, it is imperative to establish a standard CT image database. (4) Current law does not clearly specify the scope of responsibility that AI should bear. At present, responsibility for mistakes made by AI in medical services is borne by doctors, but doctors cannot assess product quality, which makes it more difficult to assign responsibility to AI. In the future, the responsibilities of hospitals, doctors, and AI companies should be divided at the legal level, so as to urge enterprises to improve product performance and protect patient privacy.
Similarly, AI-based pathological diagnosis remains mostly in the laboratory research stage and has not truly entered the clinic. Its limitations are as follows. (1) Data quality: specimen processing, section staining, and image annotation have not yet been standardized, and the amount of data available for AI training is insufficient, which affects diagnostic reliability. (2) Data integration: the data used by AI models come mainly from pathological sections and are not combined with patients' symptoms, signs, and other test results, which weakens diagnostic accuracy.
Thus far, AI is not widely used in China; however, AI is increasingly used to aid medical care, develop medicines, optimize health insurance, assist image recognition, support pathological and auxiliary diagnoses, predict prognoses, and construct disease databases. AI improves diagnostic accuracy and thus alleviates the imbalances affecting medical and healthcare supply and demand in China. More cooperation among researchers is needed to promote rapid AI development. Although AI cannot completely replace clinical decision-making, it can serve as a valuable clinical assistant. Doctors must build appropriate emerging technologies into their clinical practices to promote the synergistic development of AI and medical care. In the near future, AI will presumably facilitate major medical breakthroughs.
AI facilitates early and accurate identification and diagnosis of individuals at high risk of lung cancer, and it also accurately predicts prognoses, aiding treatment selection. Miller et al. (40) used a new, integrated machine-learning approach to accurately predict patient survival based on metabolomics data from tumor core biopsies. Wei-Ning et al. (