AUTHOR=Liu Zhou, Li Li, Li Tianran, Luo Douqiang, Wang Xiaoliang, Luo Dehong TITLE=Does a Deep Learning–Based Computer-Assisted Diagnosis System Outperform Conventional Double Reading by Radiologists in Distinguishing Benign and Malignant Lung Nodules? JOURNAL=Frontiers in Oncology VOLUME=Volume 10 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2020.545862 DOI=10.3389/fonc.2020.545862 ISSN=2234-943X ABSTRACT=Background: In differentiating indeterminate pulmonary nodules, multiple studies have suggested that deep learning-based computer-aided diagnosis systems (DL-CADx) outperform conventional double reading by radiologists, but this superiority still requires external validation. Our aim was therefore to externally validate the performance of a commercial DL-CADx in differentiating benign from malignant lung nodules. Methods: In this retrospective study, 233 patients with 261 pathologically confirmed lung nodules were enrolled. Each nodule was rated by double reading on a four-category malignancy scale: unlikely (0-25%), malignancy cannot be completely excluded (25-50%), highly likely (50-75%), and considered malignant (75-100%), with any disagreement resolved through discussion. DL-CADx automatically assigned each nodule a malignancy likelihood from 0% to 100%, which was then divided into the same four categories. The intraclass correlation coefficient (ICC) was used to evaluate agreement in malignancy risk rating between DL-CADx and double reading, with ICC values of < 0.5, 0.5 to 0.75, 0.75 to 0.9, and > 0.9 indicating poor, moderate, good, and perfect agreement, respectively. Using a malignancy likelihood > 50% as the cut-off for malignancy and pathological results as the gold standard, sensitivity, specificity, and accuracy were calculated separately for double reading and DL-CADx. Results: Of the 261 nodules, DL-CADx successfully detected 247 (detection rate, 94.7%).
Regarding malignancy rating, DL-CADx showed moderate agreement with double reading (ICC = 0.555, 95% confidence interval: 0.424 to 0.655). DL-CADx misclassified 40 malignant nodules as benign and 30 benign nodules as malignant, yielding a sensitivity, specificity, and accuracy of 79.2%, 45.5%, and 71.7%, respectively. In contrast, double reading performed better, misclassifying 16 malignant nodules as benign and 26 benign nodules as malignant, for a sensitivity, specificity, and accuracy of 91.7%, 52.7%, and 83.0%, respectively. Conclusions: Compared with double reading, the DL-CADx evaluated in this study still shows inferior performance in differentiating malignant from benign nodules.
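The scoring and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes (1) that each category interval is closed on the right, i.e. [0, 25], (25, 50], (50, 75], (75, 100], so that the "> 50%" cut-off coincides with the top two categories (the abstract does not state how boundary values are assigned), and (2) hypothetical class totals of 192 malignant and 55 benign nodules among the 247 detected nodules. These totals are not given in the abstract; they are back-calculated from the reported percentages (for example, 152/192 ≈ 79.2% and 25/55 ≈ 45.5%). The helper names `quadrichotomize`, `is_malignant`, and `metrics_from_errors` are illustrative only.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Four-category scale from the abstract. Assumption: intervals are closed on
# the right, [0, 25], (25, 50], (50, 75], (75, 100], so that the "> 50%"
# malignancy cut-off matches the top two categories exactly.
CATEGORIES = (
    "unlikely",
    "malignancy cannot be completely excluded",
    "highly likely",
    "considered malignant",
)
UPPER_BOUNDS = (25.0, 50.0, 75.0)


def quadrichotomize(likelihood: float) -> str:
    """Map a 0-100% malignancy likelihood to one of the four categories."""
    if not 0.0 <= likelihood <= 100.0:
        raise ValueError("likelihood must lie in [0, 100]")
    # bisect_left places a value equal to a bound in the lower interval
    return CATEGORIES[bisect_left(UPPER_BOUNDS, likelihood)]


def is_malignant(likelihood: float) -> bool:
    """Binary call used for the metrics: likelihood > 50% means malignant."""
    return likelihood > 50.0


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    accuracy: float


def metrics_from_errors(n_malignant: int, n_benign: int,
                        false_neg: int, false_pos: int) -> Metrics:
    """Sensitivity, specificity and accuracy from class totals and error counts."""
    tp = n_malignant - false_neg  # malignant nodules correctly called malignant
    tn = n_benign - false_pos     # benign nodules correctly called benign
    return Metrics(
        sensitivity=tp / n_malignant,
        specificity=tn / n_benign,
        accuracy=(tp + tn) / (n_malignant + n_benign),
    )


# Hypothetical class totals, back-calculated from the reported percentages;
# the abstract itself does not state them.
N_MALIGNANT, N_BENIGN = 192, 55

for reader, fn, fp in (("DL-CADx", 40, 30), ("Double reading", 16, 26)):
    m = metrics_from_errors(N_MALIGNANT, N_BENIGN, fn, fp)
    print(f"{reader}: sens {m.sensitivity:.1%}, spec {m.specificity:.1%}, "
          f"acc {m.accuracy:.1%}")
# prints:
# DL-CADx: sens 79.2%, spec 45.5%, acc 71.7%
# Double reading: sens 91.7%, spec 52.7%, acc 83.0%
```

Under these assumptions, the error counts reported for both readers reproduce all six published percentages. This suggests that double reading was also scored on the same 247 nodules that DL-CADx detected.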