<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2024.1366415</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Toward explainable AI in radiology: Ensemble-CAM for effective thoracic disease localization in chest X-ray images using weak supervised learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Aasem</surname> <given-names>Muhammad</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2622794/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/data-curation/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/validation/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Javed Iqbal</surname> <given-names>Muhammad</given-names></name>
<role content-type="https://credit.niso.org/contributor-roles/conceptualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/formal-analysis/"/>
<role content-type="https://credit.niso.org/contributor-roles/investigation/"/>
<role content-type="https://credit.niso.org/contributor-roles/methodology/"/>
<role content-type="https://credit.niso.org/contributor-roles/project-administration/"/>
<role content-type="https://credit.niso.org/contributor-roles/resources/"/>
<role content-type="https://credit.niso.org/contributor-roles/software/"/>
<role content-type="https://credit.niso.org/contributor-roles/supervision/"/>
<role content-type="https://credit.niso.org/contributor-roles/validation/"/>
<role content-type="https://credit.niso.org/contributor-roles/visualization/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff><institution>Department of Computer Science, University of Engineering and Technology</institution>, <addr-line>Taxila</addr-line>, <country>Pakistan</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Hariharan Shanmugasundaram, Vardhaman College of Engineering, India</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Yuanda Zhu, Independent Researcher, Atlanta, United States</p>
<p>Eugenio Vocaturo, National Research Council (CNR), Italy</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Muhammad Aasem <email>muhammadaasem&#x00040;gmail.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>05</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>7</volume>
<elocation-id>1366415</elocation-id>
<history>
<date date-type="received">
<day>06</day>
<month>01</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>04</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2024 Aasem and Javed Iqbal.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Aasem and Javed Iqbal</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract>
<p>Chest X-ray (CXR) imaging is widely employed by radiologists to diagnose thoracic diseases. Recently, many deep learning techniques have been proposed as computer-aided diagnostic (CAD) tools to assist radiologists in minimizing the risk of incorrect diagnosis. From an application perspective, these models have exhibited two major challenges: (1) They require large volumes of annotated data at the training stage and (2) They lack explainable factors to justify their outcomes at the prediction stage. In the present study, we developed a class activation mapping (CAM)-based ensemble model, called Ensemble-CAM, to address both of these challenges via weakly supervised learning by employing explainable AI (XAI) functions. Ensemble-CAM utilizes class labels to predict the location of disease in association with interpretable features. The proposed work leverages ensemble and transfer learning with class activation functions to achieve three objectives: (1) minimizing the dependency on strongly annotated data when locating thoracic diseases, (2) enhancing confidence in predicted outcomes by visualizing their interpretable features, and (3) optimizing cumulative performance via fusion functions. Ensemble-CAM was trained on three CXR image datasets and evaluated through qualitative and quantitative measures via heatmaps and Jaccard indices. The results reflect the enhanced performance and reliability in comparison to existing standalone and ensembled models.</p></abstract>
<kwd-group>
<kwd>explainable artificial intelligence</kwd>
<kwd>class activation maps</kwd>
<kwd>weak supervised learning</kwd>
<kwd>computer aided diagnosis</kwd>
<kwd>ensemble learning</kwd>
<kwd>transfer learning</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="6"/>
<equation-count count="13"/>
<ref-count count="60"/>
<page-count count="14"/>
<word-count count="8991"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Medicine and Public Health</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1 Introduction</title>
<p>The healthcare industry plays a pivotal role in ensuring the wellbeing of individuals and communities. Despite rapid advancements in technology, most of the industry still relies heavily on manual procedures including, but not limited to, diagnosis and treatment. These manual procedures can be time-consuming and prone to errors resulting from workload and a lack of facilities. Such factors may further lead to serious consequences such as misdiagnosis, incorrect treatment, and adverse patient outcomes (Silva et al., <xref ref-type="bibr" rid="B48">2022</xref>). To overcome these challenges, various approaches have been explored to assist caregivers in decision-making through Computer Aided Diagnosis (CAD) (Doi, <xref ref-type="bibr" rid="B9">2007</xref>). Among fuzzy logic (Kovalerchuk et al., <xref ref-type="bibr" rid="B23">1997</xref>), rule-based (Ion et al., <xref ref-type="bibr" rid="B19">2009</xref>), and other predictive models (Yanase and Triantaphyllou, <xref ref-type="bibr" rid="B58">2019</xref>), machine learning (ML) has established outstanding potential for CAD systems (Reyes et al., <xref ref-type="bibr" rid="B39">2020</xref>). The most highlighted approach in machine learning is deep learning (DL), known for its ability to learn complex and meaningful patterns from large volumes of data (LeCun et al., <xref ref-type="bibr" rid="B25">2015</xref>; Voulodimos et al., <xref ref-type="bibr" rid="B52">2018</xref>; Shrestha and Mahmood, <xref ref-type="bibr" rid="B46">2019</xref>; Georgiou et al., <xref ref-type="bibr" rid="B13">2020</xref>; Mahony et al., <xref ref-type="bibr" rid="B27">2020</xref>). In spite of its success in disease classification and localization, deep learning faces many internal and external challenges (Aasem et al., <xref ref-type="bibr" rid="B1">2022</xref>). Internal challenges include the appropriate selection of hyperparameters and interpretability. Similarly, external challenges include the demands for high computational resources and large volumes of training data.</p>
<p>Advancements in hardware technology, such as graphics processing units (GPUs), tensor processing units (TPUs), and application-specific integrated circuits (ASICs), have largely addressed deep learning&#x00027;s demand for high computational power (Mittal and Vaishay, <xref ref-type="bibr" rid="B28">2019</xref>; Hu et al., <xref ref-type="bibr" rid="B17">2022</xref>; Nikoli&#x00107; et al., <xref ref-type="bibr" rid="B29">2022</xref>). However, the acquisition of large volumes of data with task-specific annotation is still a challenge (Aasem et al., <xref ref-type="bibr" rid="B1">2022</xref>). This becomes even harder when annotation requires the specialized skills and experience of radiologists. This study exploits weakly supervised learning to deal with the annotation issue for disease localization in chest X-ray images using deep learning. In general, X-ray images are examined by radiologists who specialize in interpreting such reports in relation to diagnoses of the chest, lungs, heart, and related disorders. In routine tasks, they can identify the patterns of a related disorder just by visual examination. In some cases, multiple radiologists are engaged to discuss a given report because of its complexity and criticality (Siegel, <xref ref-type="bibr" rid="B47">2019</xref>). Such cases may not be concluded easily and may remain subject to misperception. To resolve them, a majority vote, weighting of senior opinions, or further testing is considered. Moreover, conclusive inferences are still made in conjunction with additional information such as patient history and current condition (Prevedello et al., <xref ref-type="bibr" rid="B33">2019</xref>). This complexity makes the annotation process hard to accomplish for a large volume of images. This study discusses an indirect approach to localization, thereby aiming to overcome such dependency issues in weakly supervised learning.</p>
<p>Furthermore, deep learning models have been deemed untrustworthy due to their unjustified inferences (Adabi and Berrada, <xref ref-type="bibr" rid="B2">2018</xref>; Sheu and Pardeshi, <xref ref-type="bibr" rid="B44">2022</xref>). Such behavior is critical for CAD systems and creates a major bottleneck for their practical application in the healthcare industry (Reyes et al., <xref ref-type="bibr" rid="B39">2020</xref>; Elhalawani and Mak, <xref ref-type="bibr" rid="B11">2021</xref>; Yu et al., <xref ref-type="bibr" rid="B59">2022</xref>; Park et al., <xref ref-type="bibr" rid="B31">2023</xref>). Overlooking the need for a model&#x00027;s self-justification, such systems are typically evaluated based only on their performance metrics for given datasets. As highlighted by Wagstaff (<xref ref-type="bibr" rid="B53">2012</xref>), models must be measured beyond benchmarked datasets and quantitative metrics. Predicting a medical image as positive or negative for a disorder does not provide a complete answer from the radiologist&#x00027;s perspective. &#x0201C;How was the prediction inferred?&#x0201D; is also a matter of interest from the viewpoints of transparency and reliability (Adabi and Berrada, <xref ref-type="bibr" rid="B2">2018</xref>). To address the transparency concern, the proposed work employs CAM as a function. The existing literature has discussed CAM and its variants for single-model interpretability within a limited scope, i.e., visual evaluation. The proposed framework is referred to as Ensemble-CAM because it extends the current scope in two directions: First, it allows multiple models in the ensemble learning paradigm to generate a single set of interpretable features. Second, it evaluates the intermediate and final outcomes using a quantitative metric, i.e., the Jaccard index or Intersection over Union (IoU). An intuitive illustration of the proposed framework is given in <xref ref-type="fig" rid="F1">Figure 1</xref>. It depicts a weakly supervised pipeline in which an image classifier is trained on X-ray images in the first phase. Up to this phase, the model is a black box, capable only of predicting a class value. The next block consists of a CAM function that generates a heatmap and reveals the activated features. The heatmap is then converted into spatial information in the form of bounding box coordinates.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Extracting localization details via classification skills.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1366415-g0001.tif"/>
</fig>
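As a rough sketch of the weakly supervised localization step of Figure 1, the snippet below thresholds a CAM heatmap at a fraction of its peak activation and returns the bounding box of the activated region. The function name and the 0.5 threshold are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def heatmap_to_bbox(heatmap, threshold=0.5):
    # Pixels whose activation exceeds `threshold` * peak value form the
    # activated (disease) region; the box tightly encloses that region.
    mask = heatmap > threshold * heatmap.max()
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # nothing activated above the threshold
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy 5x5 heatmap with a hot 2x2 region (rows 1-2, columns 2-3)
cam = np.zeros((5, 5))
cam[1:3, 2:4] = 1.0
print(heatmap_to_bbox(cam))  # (2, 1, 3, 2) as (x_min, y_min, x_max, y_max)
```

In practice the heatmap would first be upsampled to the input image resolution before the box coordinates are read off.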
<p>The rest of the study is organized into four main sections. Section 2 provides a brief overview of the related literature, serving as a foundation for the proposed methodology outlined in Section 3. This methodology includes details on the datasets utilized, the proposed technique, and the experimental setup. The results and discussion are presented in Section 4, providing insights into the outcomes of the study. Finally, Section 5 concludes the study with a summary of the findings and directions for future research.</p></sec>
<sec id="s2">
<title>2 Literature review</title>
<p>Deep learning has revolutionized computer-aided diagnosis (CAD) in medical imaging, marking significant progress over the last decade (Ma et al., <xref ref-type="bibr" rid="B26">2021</xref>). Its successful integration into various medical fields, particularly radiology (Reyes et al., <xref ref-type="bibr" rid="B39">2020</xref>; Chandola et al., <xref ref-type="bibr" rid="B6">2021</xref>), dermatology (Esteva et al., <xref ref-type="bibr" rid="B12">2017</xref>; Rezvantalab et al., <xref ref-type="bibr" rid="B40">2018</xref>; Jeong et al., <xref ref-type="bibr" rid="B22">2022</xref>), and cardiology, demonstrates its versatility and effectiveness. In radiology, deep learning models such as DenseNet (ea Shortliffe, <xref ref-type="bibr" rid="B10">1975</xref>) and ResNet have been instrumental in enhancing the detection and diagnosis of abnormalities in chest X-ray images, evolving from traditional rule-based methods to more advanced, reliable solutions (Doi, <xref ref-type="bibr" rid="B9">2007</xref>). These models have not only improved diagnostic accuracy but also introduced flexibility, making them adaptable across different imaging modalities. Despite their success, these deep learning approaches face challenges such as data dependency and interpretability, necessitating a balanced evaluation of their impact on medical imaging and patient care.</p>
<p>Explainable AI (XAI) techniques in medical imaging have gained traction for enhancing the transparency and trustworthiness of deep learning models (Giuste et al., <xref ref-type="bibr" rid="B14">2023</xref>). Tools such as Grad-CAM (Yan et al., <xref ref-type="bibr" rid="B57">2018</xref>; Guan et al., <xref ref-type="bibr" rid="B15">2020</xref>) provide visual explanations of model decisions, particularly in chest X-ray analysis, by highlighting the relevant areas influencing the diagnostic outcome. This advancement is crucial in radiology, where understanding the rationale behind AI predictions is as important as the predictions themselves. Shi et al. (<xref ref-type="bibr" rid="B45">2021</xref>) further emphasize the role of XAI in combating pandemics, showcasing how these methods can bridge the trust gap in clinical decision-making during critical health crises. Although XAI has empowered radiologists with better interpretative insights, it still faces challenges, such as the potential for misinterpretation and the need for improved methods to accurately reflect the underlying model logic. The integration of XAI in medical imaging thus represents a pivotal step toward more reliable and interpretable diagnostic systems, fostering greater acceptance and confidence among medical professionals (Szegedy et al., <xref ref-type="bibr" rid="B50">2013</xref>; Rao et al., <xref ref-type="bibr" rid="B38">2020</xref>). Rani et al. (<xref ref-type="bibr" rid="B36">2022a</xref>) proposed the CovidScanner model, which detects COVID-19 in chest radiographs through a multi-modal system combining bone suppression, lung segmentation, and classification, further utilizing GradCAM&#x0002B;&#x0002B; for feature visualization.</p>
<p>Similarly, Caroprese et al. (<xref ref-type="bibr" rid="B5">2022</xref>) explore argumentation approaches in XAI, offering structured justifications for medical decisions and thereby improving explainability and transparency. The integration of argumentation theory into medical imaging thus complements visual explanation methods such as Grad-CAM, representing a further step toward reliable and interpretable diagnostic systems.</p>
<p>Weakly supervised learning has emerged as a promising approach in chest X-ray image analysis, addressing the scarcity of finely annotated medical images (Islam et al., <xref ref-type="bibr" rid="B21">2017</xref>; Ouyang et al., <xref ref-type="bibr" rid="B30">2020</xref>). Unlike strongly supervised methods that require detailed annotations, weak supervision leverages image-level labels to localize and identify pathological features, thereby mitigating the extensive effort and expertise needed for detailed labeling. Despite its cost-effectiveness and reduced annotation requirements, weakly supervised models often face challenges in achieving the high precision and specificity seen in fully supervised systems. The balance between model performance and the availability of limited annotated data is critical, making weakly supervised learning a key area of research for improving accessibility and efficiency in medical diagnostics (Rozenberg et al., <xref ref-type="bibr" rid="B41">2020</xref>; Wehbe et al., <xref ref-type="bibr" rid="B55">2021</xref>). This approach not only broadens the applicability of deep learning in resource-constrained settings but also encourages advancements in algorithmic efficiency and interpretability.</p>
<p><xref ref-type="table" rid="T1">Table 1</xref> presents an overview of abnormality detection approaches for X-ray images. The comparative analysis of deep learning methods in medical imaging, especially in chest X-ray analysis, reveals a diverse landscape of methodologies ranging from traditional machine learning to advanced deep learning and weakly supervised models (Rajpurkar et al., <xref ref-type="bibr" rid="B35">2017</xref>; An et al., <xref ref-type="bibr" rid="B3">2022</xref>). Each method presents its own set of advantages and limitations. For instance, while deep learning models such as DenseNet and ResNet have shown remarkable success in accuracy and reliability, they require substantial data and computational resources (ea Shortliffe, <xref ref-type="bibr" rid="B10">1975</xref>). The SFRM-GAN (Rani et al., <xref ref-type="bibr" rid="B37">2022b</xref>) enhances bone suppression while preserving image quality and spatial resolution. On the other hand, weakly supervised approaches offer a solution to limited-data scenarios but may compromise on localization precision (Ouyang et al., <xref ref-type="bibr" rid="B30">2020</xref>). The critique of these methods underscores the need for a balanced approach that considers both the technical and practical aspects of medical image analysis. It emphasizes the importance of interpretability, resource efficiency, and adaptability to varying clinical needs, guiding future research toward more holistic and context-aware diagnostic solutions (Yan et al., <xref ref-type="bibr" rid="B57">2018</xref>; Ponomaryov et al., <xref ref-type="bibr" rid="B32">2021</xref>).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Summary of relevant approaches for detection of abnormalities in X-ray images.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>References</bold></th>
<th valign="top" align="left"><bold>Methodology</bold></th>
<th valign="top" align="left"><bold>Ensembled</bold></th>
<th valign="top" align="left"><bold>Interpretability</bold></th>
<th valign="top" align="left"><bold>Localization</bold></th>
<th valign="top" align="left"><bold>Evaluation</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Rajpurkar et al. (<xref ref-type="bibr" rid="B35">2017</xref>)</td>
<td valign="top" align="left">DenseNet-121</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">Grad-CAM</td>
<td valign="top" align="left">Heatmap</td>
<td valign="top" align="left">Visual</td>
</tr> <tr>
<td valign="top" align="left">Islam et al. (<xref ref-type="bibr" rid="B21">2017</xref>)</td>
<td valign="top" align="left">ResNet-50, ResNet-101, ResNet-152</td>
<td valign="top" align="left">Yes</td>
<td valign="top" align="left">Convnet up-sample</td>
<td valign="top" align="left">Heatmap</td>
<td valign="top" align="left">Occlusion sensitivity</td>
</tr> <tr>
<td valign="top" align="left">Rozenberg et al. (<xref ref-type="bibr" rid="B41">2020</xref>)</td>
<td valign="top" align="left">Specialized loss function, anti-aliasing filters, and conditional random field layers</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">IoU</td>
</tr> <tr>
<td valign="top" align="left">An et al. (<xref ref-type="bibr" rid="B3">2022</xref>)</td>
<td valign="top" align="left">ResNet &#x0002B; channel attention</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">Channel attention</td>
<td valign="top" align="left">No</td>
</tr> <tr>
<td valign="top" align="left">Yan et al. (<xref ref-type="bibr" rid="B57">2018</xref>)</td>
<td valign="top" align="left">DenseNet, squeeze-and-excitation block, multi-map transfer layer, max-min pooling operator</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">Grad-CAM&#x0002B;&#x0002B;</td>
<td valign="top" align="left">Heatmap</td>
<td valign="top" align="left">Visual</td>
</tr> <tr>
<td valign="top" align="left">Guan et al. (<xref ref-type="bibr" rid="B15">2020</xref>)</td>
<td valign="top" align="left">AG-CNN (Global block, Local block, Fusion)</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">Grad-CAM</td>
<td valign="top" align="left">Heatmap</td>
<td valign="top" align="left">Visual</td>
</tr> <tr>
<td valign="top" align="left">Wehbe et al. (<xref ref-type="bibr" rid="B55">2021</xref>)</td>
<td valign="top" align="left">DeepCOVID-XR (DenseNet-121, ResNet-50, InceptionV3, Inception-ResNetV2, Xception, EfficientNet-B2)</td>
<td valign="top" align="left">Yes</td>
<td valign="top" align="left">Grad-CAM</td>
<td valign="top" align="left">Heatmap</td>
<td valign="top" align="left">Visual</td>
</tr> <tr>
<td valign="top" align="left">Ouyang et al. (<xref ref-type="bibr" rid="B30">2020</xref>)</td>
<td valign="top" align="left">Foreground, positive, and abnormality attentions</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">Grad-CAM</td>
<td valign="top" align="left">BBox</td>
<td valign="top" align="left">IoU</td>
</tr> <tr>
<td valign="top" align="left">Wu et al. (<xref ref-type="bibr" rid="B56">2020</xref>)</td>
<td valign="top" align="left">6-region-slice, U-Net</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">BBox</td>
<td valign="top" align="left">IoU</td>
</tr> <tr>
<td valign="top" align="left">Ponomaryov et al. (<xref ref-type="bibr" rid="B32">2021</xref>)</td>
<td valign="top" align="left">X-Ray CAD (DenseNet-201, ResNet-50, EfficientNet)</td>
<td valign="top" align="left">Yes</td>
<td valign="top" align="left">Grad-CAM</td>
<td valign="top" align="left">Heatmap</td>
<td valign="top" align="left">Visual</td>
</tr> <tr>
<td valign="top" align="left">Rani et al. (<xref ref-type="bibr" rid="B36">2022a</xref>)</td>
<td valign="top" align="left">Multi-modal bone suppression, lung segmentation</td>
<td valign="top" align="left">No</td>
<td valign="top" align="left">Grad-CAM&#x0002B;&#x0002B;</td>
<td valign="top" align="left">Heatmap</td>
<td valign="top" align="left">Visual</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The comparison of different methodologies in the literature.</p>
</table-wrap-foot>
</table-wrap>
<p>Current trends in medical imaging, particularly in chest X-ray analysis, indicate a growing emphasis on addressing the challenges of labeled data acquisition, transparency, and reliability (Irvin et al., <xref ref-type="bibr" rid="B20">2019</xref>; Wu et al., <xref ref-type="bibr" rid="B56">2020</xref>). The acquisition of labeled data remains a significant bottleneck, with efforts such as CheXpert (Irvin et al., <xref ref-type="bibr" rid="B20">2019</xref>) aiming to expand the availability of annotated datasets for training more robust models. Transparency in AI decisions is another critical aspect, where models such as U-Net and RetinaNet are being adapted to provide clearer insights into diagnostic decisions (Wu et al., <xref ref-type="bibr" rid="B56">2020</xref>). However, the reliability of these AI systems, especially in the face of noisy or limited data, continues to be a concern (Szegedy et al., <xref ref-type="bibr" rid="B50">2013</xref>; Rao et al., <xref ref-type="bibr" rid="B38">2020</xref>). The end goal is to develop AI systems that not only perform well under various constraints but also earn the trust of medical professionals through transparent and interpretable outputs. Addressing these challenges requires ongoing innovation in machine learning techniques and a deeper understanding of the clinical context to ensure that the development of AI in medical imaging aligns with the real-world needs of healthcare providers and patients.</p></sec>
<sec sec-type="materials and methods" id="s3">
<title>3 Materials and methods</title>
<p>The proposed model consists of three main components: classification, class activation mapping, and aggregation. It also employs two supporting components, referred to as class-finalizer and heatmap-generator. The architecture follows ensemble learning at the classification and localization stages and is named Ensemble-CAM. As illustrated in <xref ref-type="fig" rid="F2">Figure 2</xref>, it requires no localization annotations at the training phase, yet it is capable of producing bounding box and segmentation details in an explainable format. The output of Ensemble-CAM consists of an aggregated class value, bounding boxes, a mask, and heatmaps that explain how the result was formed.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Block diagram of Ensemble-CAM for localizing abnormalities in the X-ray image with interpretable outcomes.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1366415-g0002.tif"/>
</fig>
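The aggregation stage can be sketched as follows. This is a minimal illustration assuming simple mean fusion of per-model class probabilities and CAM heatmaps; the function and variable names are hypothetical, and the actual class-finalizer and heatmap-generator may use other fusion functions:

```python
import numpy as np

def fuse_outputs(class_probs, heatmaps):
    # Average the positive-class probabilities of the member models
    # (a stand-in for the class-finalizer).
    fused_prob = float(np.mean(class_probs))
    # Average the per-model CAM heatmaps into a single map
    # (a stand-in for the heatmap-generator).
    fused_map = np.mean(np.stack(heatmaps), axis=0)
    # Rescale to [0, 1] so the fused map can be rendered as a heatmap.
    rng = fused_map.max() - fused_map.min()
    if rng > 0:
        fused_map = (fused_map - fused_map.min()) / rng
    return fused_prob, fused_map

# Three hypothetical member models voting on one image
probs = [0.9, 0.8, 0.7]
maps = [np.random.rand(7, 7) for _ in probs]
prob, fused = fuse_outputs(probs, maps)
```

The fused map can then be thresholded to obtain a single bounding box and mask, as in the single-model case.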
<p>This section explains the methodology of the proposed work in detail. First, the properties of the datasets used in the experiments are described, followed by a list of the deep learning classifiers. Next, conceptual definitions are established for class activation mapping and heatmap generation. Finally, Ensemble-CAM is defined and demonstrated on test data.</p>
<sec>
<title>3.1 Dataset</title>
<p>Three datasets have been considered to validate the performance of the proposed approach. To classify and localize pneumonia, the RSNA pneumonia detection dataset (Anouk Stein, <xref ref-type="bibr" rid="B4">2018</xref>) has been used, with 14,864 images to train the classifiers. In this dataset, 6,012 images have been marked positive for pneumonia, while 8,851 show no related symptoms. For all pneumonia-confirming images, the dataset also offers bounding box ground truth, which was not used during the training phase. Similarly, the Chest-Xray-14 dataset (Wang and Peng, <xref ref-type="bibr" rid="B54">2017</xref>) has been considered to detect cardiomegaly. The classifiers have been trained on only 9,628 images, of which 4,000 show an enlarged heart. The dataset contains a small subset of images with bounding box annotations, which were ignored during training of the classifier but considered in testing. The third dataset contains radiographs that have been tagged as confirmed COVID-19 cases (Chowdhury et al., <xref ref-type="bibr" rid="B8">2020</xref>; Rahman et al., <xref ref-type="bibr" rid="B34">2021</xref>). Unlike the previous two datasets, this one has no bounding box annotations. Therefore, a quantitative localization metric has not been applied to demonstrate the model&#x00027;s performance on it. <xref ref-type="table" rid="T2">Table 2</xref> shows the distribution of the given datasets between training and validation during the training phase.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Datasets for demonstration of Ensemble-CAM performance.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>DATASET</bold></th>
<th valign="top" align="left"><bold>TARGET</bold></th>
<th valign="top" align="center"><bold>TRAIN</bold></th>
<th valign="top" align="center"><bold>VALID</bold></th>
<th valign="top" align="center"><bold>TOTAL</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">RSNA</td>
<td valign="top" align="left">Pneumonia</td>
<td valign="top" align="center">11,891</td>
<td valign="top" align="center">2,972</td>
<td valign="top" align="center">14,864</td>
</tr> <tr>
<td valign="top" align="left">Chest X-Ray14</td>
<td valign="top" align="left">Cardiomegaly</td>
<td valign="top" align="center">5,477</td>
<td valign="top" align="center">1,369</td>
<td valign="top" align="center">6,846</td>
</tr> <tr>
<td valign="top" align="left">COVID-19</td>
<td valign="top" align="left">COVID-19</td>
<td valign="top" align="center">7,703</td>
<td valign="top" align="center">1,925</td>
<td valign="top" align="center">9,628</td>
</tr></tbody>
</table>
</table-wrap></sec>
<sec>
<title>3.2 Methods for evaluation</title>
<p>The performance evaluation metrics in this study have been split into two groups by task. For the classification task, accuracy (<xref ref-type="disp-formula" rid="E1">Equation 1</xref>), recall (<xref ref-type="disp-formula" rid="E2">Equation 2</xref>), and precision (<xref ref-type="disp-formula" rid="E3">Equation 3</xref>) were computed. For the localization task, Intersection over Union (<xref ref-type="disp-formula" rid="E4">Equations 4</xref>, <xref ref-type="disp-formula" rid="E5">5</xref>), also known as the Jaccard index, was used to measure quality. The base components of all these metrics are as follows:</p>
<list list-type="bullet">
<list-item><p>True positive: output that correctly indicates the presence of a condition.</p></list-item>
<list-item><p>True negative: output that correctly indicates the absence of a condition.</p></list-item>
<list-item><p>False positive: output that wrongly indicates the presence of a condition.</p></list-item>
<list-item><p>False negative: output that wrongly indicates the absence of a condition.</p></list-item>
</list>
<p>Accuracy: accuracy is a primary metric that refers to the ratio of the number of correct predictions to the total number of input samples.</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Accuracy</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">number of correct predictions</mml:mtext></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">total number of predictions made</mml:mtext></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Recall: recall is the proportion of actual positive cases that are correctly identified.</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Recall</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">true positive</mml:mtext></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">true positive</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext class="textrm" mathvariant="normal">false negative</mml:mtext></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Precision: precision, also known as positive predictive value (PPV), refers to the proportion of predicted positive cases that are actually positive.</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Precision</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">true positive</mml:mtext></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">true positive</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext class="textrm" mathvariant="normal">false positive</mml:mtext></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Intersection-over-Union: this metric is well known from the object detection task in strongly supervised learning. It quantifies the degree of overlap between the predicted and ground-truth boxes. Its values range from 0 to 1, where 0 denotes no overlap and 1 denotes perfect overlap.</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">IoU</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">area of overlap</mml:mtext></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">area of union</mml:mtext></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>In terms of the confusion matrix, it can be expressed as follows:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">IoU</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">TP</mml:mtext></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">TP</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext class="textrm" mathvariant="normal">FP</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext class="textrm" mathvariant="normal">FN</mml:mtext></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
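As an illustration only (not the authors' code), the four metrics above can be computed from raw confusion-matrix counts and corner-format boxes as follows; the function names and the (x1, y1, x2, y2) box convention are assumptions:

```python
def accuracy(tp, tn, fp, fn):
    # Eq. (1): correct predictions over all predictions
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    # Eq. (2): fraction of actual positives that are recovered
    return tp / (tp + fn)

def precision(tp, fp):
    # Eq. (3): fraction of predicted positives that are correct
    return tp / (tp + fp)

def iou_boxes(a, b):
    # Eq. (4): boxes given as (x1, y1, x2, y2) corner coordinates
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # union = sum of areas minus the double-counted overlap
    return inter / (area_a + area_b - inter)
```

Equation 5 is the same quantity expressed through counts: a predicted box that covers the overlap (TP), its excess area (FP), and the missed ground-truth area (FN).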
<p>The key point for IoU in weakly supervised learning is the unavailability of ground-truth values, which makes it challenging to validate the performance of a given model. To quantify the proposed model&#x00027;s performance with IoU, this study includes two datasets with bounding-box-annotated ground truth. These annotations were not exposed during training and were used only at test time.</p></sec>
<sec>
<title>3.3 Ablation study for classification task</title>
<p>The ablation study conducted as part of this research aimed to evaluate a comprehensive range of deep learning image classifiers for the task of disease localization in chest X-ray images. Included in this assessment were AlexNet (Krizhevsky et al., <xref ref-type="bibr" rid="B24">2017</xref>), VGG-16 &#x00026; VGG-19 (Simonyan and Zisserman, <xref ref-type="bibr" rid="B49">2014</xref>), ResNet-50 (He et al., <xref ref-type="bibr" rid="B16">2016</xref>), EfficientNetB1 (Tan and Le, <xref ref-type="bibr" rid="B51">2019</xref>), NasNetMobile (Zoph et al., <xref ref-type="bibr" rid="B60">2018</xref>), MobileNetV2 (Sandler et al., <xref ref-type="bibr" rid="B42">2018</xref>), DenseNet169 (Huang et al., <xref ref-type="bibr" rid="B18">2017</xref>), and DenseNet121 (Huang et al., <xref ref-type="bibr" rid="B18">2017</xref>).</p>
<p>The common hyperparameters employed in training these models are detailed in <xref ref-type="table" rid="T3">Table 3</xref>. The experiments were executed on a 64-bit Ubuntu 20.04.5 LTS platform, powered by an Intel<sup>&#x000AE;</sup> Core i5-3470 CPU &#x00040; 3.20 GHz x 4 and an NVIDIA GeForce GTX 1080 GPU, utilizing Python 3.9.12 with tensorflow 2.4.1 and keras-gpu 2.4.3.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Configurations for training the image classifiers.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="left"><bold>Key</bold></th>
<th valign="top" align="left"><bold>Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Dataset</td>
<td valign="top" align="left">Split</td>
<td valign="top" align="left">Ratio: 70/30</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Color mode</td>
<td valign="top" align="left">RGB</td>
</tr> <tr>
<td valign="top" align="left">Callback</td>
<td valign="top" align="left">Model checkpoint</td>
<td valign="top" align="left">Monitor: validation accuracy. Mode: Max</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Early stopping</td>
<td valign="top" align="left">Monitor: validation loss. Min_delta: 0.01. Patience: 6. Mode: auto. Baseline: None</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Reduce LR on plateau</td>
<td valign="top" align="left">Monitor: validation loss. Factor: 0.01. Patience: 4. Mode: auto. Min_delta:0.001</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Others</td>
<td valign="top" align="left">TerminateOnNaN</td>
</tr> <tr>
<td valign="top" align="left">Hyper-parameter</td>
<td valign="top" align="left">Max. Epoch</td>
<td valign="top" align="center">50</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Optimizer</td>
<td valign="top" align="left">Adam</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Loss</td>
<td valign="top" align="left">Categorical Crossentropy</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Initial weights</td>
<td valign="top" align="left">Imagenet</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Output layer</td>
<td valign="top" align="left">Softmax</td>
</tr></tbody>
</table>
</table-wrap>
<p>The initial phase of this study revealed that the models with fewer layers, such as AlexNet, VGG-16, VGG-19, and NasNetMobile, did not perform optimally on the chest X-ray datasets, which characteristically exhibit less feature variation than other types of image datasets. Thus, these models were excluded from the subsequent training rounds. Deeper and more complex architectures were then subjected to a rigorous second round of training.</p>
<p>The subsequent evaluations led to the selection of the DenseNet models and Xception for their exemplary performance metrics, while ResNet-50, InceptionV3, and MobileNetV2 were phased out owing to their denser, more complex architectures. This selection process was instrumental in constructing an Ensemble-CAM framework composed of classifiers that excel not only in image-level classification but also in generating precise heatmaps for disease localization.</p>
<p>The experimental iterations on the given datasets with the specified hyperparameters converged on DenseNet169, DenseNet121, InceptionResnetV2, and Xception, as detailed in <xref ref-type="table" rid="T4">Table 4</xref>. These models, particularly the DenseNet architectures, excelled at localizing cardiomegaly within the Chest-Xray14 dataset and pneumonia in the RSNA dataset, while InceptionResnetV2 demonstrated exceptional precision across multiple conditions. Notably, for COVID-19 detection, DenseNet121 and InceptionResnetV2 demonstrated high accuracy and precision, highlighting their capacity for reliable pattern identification.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Performance of classifiers on given datasets.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Target class</bold></th>
<th valign="top" align="left"><bold>Classifier</bold></th>
<th valign="top" align="center"><bold>Acc</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Cardiomegaly (Chest-Xray14)</td>
<td valign="top" align="left">DenseNet169</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.90</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">DenseNet121</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">0.89</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">InceptionResnetV2</td>
<td valign="top" align="center">0.96</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.94</td>
</tr> <tr>
<td valign="top" align="left">Pneumonia (RSNA)</td>
<td valign="top" align="left">DenseNet169</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.88</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Xception</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.90</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">InceptionResnetV2</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.90</td>
<td valign="top" align="center">0.87</td>
</tr> <tr>
<td valign="top" align="left">COVID-19 (COVID-19)</td>
<td valign="top" align="left">DenseNet121</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.95</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">InceptionResnetV2</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">0.96</td>
<td valign="top" align="center">0.97</td>
</tr>
 <tr>
<td/>
<td valign="top" align="left">Xception</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.94</td>
</tr></tbody>
</table>
</table-wrap>
<p>The classifiers ultimately incorporated into Ensemble-CAM were deliberately chosen to strike a balance between localization performance and computational demand. While the selected models (DenseNet169, DenseNet121, InceptionResnetV2, and Xception) require considerable computational resources due to their complexity, they also significantly enhance localization accuracy, which is essential for clinical applications where diagnostic precision is paramount. The selection prioritized models that brought substantial improvements in localization accuracy without disproportionately increasing computational cost. This ensures that Ensemble-CAM delivers high diagnostic value while remaining practical for diverse clinical environments, including those where computational resources are limited.</p></sec>
<sec>
<title>3.4 Application of CAM</title>
<p>Ensemble-CAM utilizes class activation mapping techniques to achieve two objectives: (1) to generate heatmaps that make the outcome interpretable and (2) to extract spatial information for the localization task. While employing the CAM technique, the design goals were to avoid model alteration and re-training while improving the visibility of detected objects. Three variants of CAM were considered in the ablation study, namely Vanilla CAM (Definition 1), Grad-CAM (Definition 2), and Grad-CAM&#x0002B;&#x0002B; (Definition 3). Two limitations of Vanilla CAM were identified for the proposed framework: coarse heatmap visuals and the need to alter the model with a global average pooling layer. To address these challenges, Grad-CAM (Selvaraju and Batra, <xref ref-type="bibr" rid="B43">2020</xref>) was evaluated next, as it offers better interpretability without trading off model structure or performance. Grad-CAM extracts a raw feature map during forward propagation; this tensor is then backpropagated to the desired rectified convolutional feature maps, which collectively yields the coarse Grad-CAM localization explaining where the model looks to make a specific decision. During experiments on X-ray images, Grad-CAM&#x00027;s ability to properly localize areas of interest was observed to degrade when the same class occurs multiple times in an image. The main reason for this decrease is an emphasis on global information in which local differences vanish. This impact is minimized in Grad-CAM&#x0002B;&#x0002B; (Chattopadhay et al., <xref ref-type="bibr" rid="B7">2018</xref>), which enhances the output map for multiple occurrences of the same object in a single image. Specifically, it emphasizes the positive influences of neurons by considering higher-order derivatives.</p>
<p><bold>Notation</bold>. Let us denote a convolutional neural network as <italic>Y</italic> &#x0003D; <italic>f</italic>(<italic>X</italic>), with input <italic>X</italic> &#x02208; &#x0211D;<sup><italic>d</italic></sup> and output <italic>Y</italic> a probability distribution. We define <italic>Y</italic><sup><italic>c</italic></sup> as the probability of being classified as class <italic>c</italic>. For a specified layer <italic>l</italic>, let <italic>A</italic><sub><italic>l</italic></sub> refer to the activation of layer <italic>l</italic>. Specifically, if <italic>l</italic> is a convolution layer, then <inline-formula><mml:math id="M6"><mml:mrow><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> denotes the activation of the <italic>k</italic>-th channel. Let <italic>W</italic><sub><italic>l</italic>,<italic>l</italic>&#x0002B;1</sub> denote the weights connecting layers <italic>l</italic> and <italic>l</italic>&#x0002B;1.</p>
<p><bold>Definition 1 (Class Activation Map)</bold>. Using the defined notation, consider a model <italic>f</italic> consists of a global pooling layer <italic>l</italic> that takes the output from the last convolutional layer <italic>l</italic> &#x02212; 1 and feeds the pooled activation to a fully connected layer <italic>l</italic> &#x0002B; 1 for classification. For a class of interest <italic>c</italic>, <inline-formula><mml:math id="M7"><mml:mrow><mml:msubsup><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">CAM</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> can be defined in <xref ref-type="disp-formula" rid="E6">Equation 6</xref> as:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">CAM</mml:mtext></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">ReLU</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p><inline-formula><mml:math id="M18"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">where:</mml:mtext></mml:mtd><mml:mtd><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></inline-formula></p>
<p><inline-formula><mml:math id="M19"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></inline-formula> is the weight for the <italic>k</italic>-th neuron after global pooling at layer <italic>l</italic>.</p>
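Definition 1 can be sketched numerically as follows, assuming the activations of the last convolutional layer and the weights of the dense layer after global average pooling have already been extracted from a trained model (a minimal illustration, not the authors' implementation; names and shapes are assumptions):

```python
import numpy as np

def vanilla_cam(feature_maps, fc_weights, c):
    """Eq. (6): feature_maps has shape (H, W, K), the output of the
    last conv layer l-1; fc_weights has shape (K, num_classes), the
    dense layer after global average pooling; c is the class index."""
    a = fc_weights[:, c]            # a_k^c = W_{l,l+1}^c[k]
    cam = feature_maps @ a          # weighted sum over the K channels
    return np.maximum(cam, 0.0)     # ReLU keeps positive evidence only
```

In practice the resulting (H, W) map is upsampled to the input resolution and overlaid as a heatmap.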
<p><bold>Definition 2 (Grad-CAM)</bold>. Using the stated notation, suppose a model <italic>f</italic> and class of interest <italic>c</italic>, Grad-CAM is defined in <xref ref-type="disp-formula" rid="E7">Equation 7</xref> as:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">gradCAM</mml:mtext></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">ReLU</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where:</p>
<disp-formula id="E9"><mml:math id="M11"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup></mml:mtd><mml:mtd><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">GP</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msup><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">GP</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext>&#x0200A;</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mtext class="textrm" mathvariant="normal">denotes the global pooling operation.</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
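A minimal sketch of the Grad-CAM weighting in Equation 7, assuming the gradients of the class score with respect to the layer activations have already been computed (e.g., with an automatic-differentiation tape); array shapes and names are assumptions:

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Eq. (7): grads holds dY^c / dA_l with the same (H, W, K) shape
    as feature_maps. The channel weights a_k^c come from global
    pooling (GP) of the gradients over the spatial dimensions."""
    a = grads.mean(axis=(0, 1))     # GP: one weight per channel k
    cam = feature_maps @ a          # weighted sum over channels
    return np.maximum(cam, 0.0)     # ReLU
```

Unlike Vanilla CAM, no architectural change is needed: the weights are derived from gradients rather than from a dedicated dense layer.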
<p><bold>Definition 3 (Grad-CAM&#x0002B;&#x0002B;)</bold>. Using the stated notation, suppose a model <italic>f</italic> and class of interest <italic>c</italic>, Grad-CAM&#x0002B;&#x0002B; is defined in <xref ref-type="disp-formula" rid="E10">Equation 8</xref> as:</p>
<disp-formula id="E10"><label>(8)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">gradCAM&#x0002B;&#x0002B;</mml:mtext></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">ReLU</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where:</p>
<disp-formula id="E11"><mml:math id="M13"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup></mml:mtd><mml:mtd><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>Z</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msup><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>Z</mml:mi></mml:mtd><mml:mtd><mml:mtext class="textrm" mathvariant="normal">is a constant that refers to the number of pixels in the</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">activation map.</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></sec>
<sec>
<title>3.5 Ensemble-CAM using interpretable features</title>
<p>The input image consists of three types of features: (1) noise, (2) relevant features, and (3) salient features. Noise induces distraction in the classification task and can be removed by techniques such as Gaussian blur and median filtering. Relevant features pertain to the domain of interest, which here is identifying a legitimate frontal-view chest X-ray (CXR) image. Salient features are the class-specific sub-parts of the relevant features.</p>
<p>In this study, CNN models have been targeted during the classification task for extracting salient features using the class activation mapping technique. As explained in Section 3.4, CAM identifies the parts of the image that contribute most to the target class. The interpretability of a CAM arises from the fact that it provides a visual representation of the input-image features that the CNN model considers important for its classification decision. The heatmap generated by the CAM highlights the regions of the image most relevant to the model&#x00027;s prediction and can be used to identify the key features that distinguish between classes, providing valuable insight into the model&#x00027;s decision-making process and into which image features matter most for a diagnosis.</p>
<p>Ensemble-CAM offers a fusion scheme to highlight prominent sub-regions in the X-ray image. It consolidates the activation maps generated by multiple image classifiers in heatmap format, and the resulting heatmaps are intersected by a high-confidence function. Formally:</p>
<p><bold>Definition 4 (Ensemble-CAM)</bold>. Suppose ensemble learning as a function <italic>g</italic> such that it produces a set of <italic>n</italic> number of heatmaps <italic>H</italic> (<xref ref-type="disp-formula" rid="E12">Equation 9</xref>); through models <italic>M</italic>; for a given input image <italic>x</italic>:</p>
<disp-formula id="E12"><label>(9)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>H</mml:mi><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>M</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where:</p>
<list list-type="bullet">
<list-item><p><italic>g</italic>() symbolizes the ensemble function.</p></list-item>
<list-item><p><italic>M</italic>() refers to the set of models <italic>m</italic><sub>1</sub>, <italic>m</italic><sub>2</sub>, &#x02026;, <italic>m</italic><sub><italic>n</italic></sub> that predict class <italic>c</italic>.</p></list-item>
<list-item><p><italic>c</italic> denotes either a user-specified class or the most frequently predicted class.</p></list-item>
<list-item><p><italic>H</italic> denotes the set of generated heatmaps <inline-formula><mml:math id="M15"><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>.</p></list-item>
</list>
<p>Then, Ensemble-CAM is defined in <xref ref-type="disp-formula" rid="E13">Equation 10</xref> as the intersection of <italic>H</italic> such that</p>
<disp-formula id="E13"><label>(10)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ensembleCAM</mml:mtext></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02229;</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02229;</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>&#x02229;</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
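The paper leaves the high-confidence intersection function of Equation 10 abstract; one plausible realization, shown here purely as a sketch, thresholds each normalized heatmap and keeps the elementwise minimum where all models agree (the threshold value and the min-based fusion are assumptions, not the authors' specification):

```python
import numpy as np

def ensemble_cam(heatmaps, thresh=0.5):
    """Eq. (10): intersect per-model heatmaps. Each heatmap is assumed
    normalized to [0, 1]; a pixel survives only where every model's
    activation exceeds `thresh` (the high-confidence criterion), and
    the retained intensity is the elementwise minimum."""
    stack = np.stack(heatmaps)              # shape (n, H, W)
    mask = (stack > thresh).all(axis=0)     # intersection of confident regions
    return np.where(mask, stack.min(axis=0), 0.0)
```

The intersection suppresses regions that only one model activates on, which is the mechanism by which the ensemble filters out spurious single-model activations.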
<p>The proposed model is also expressed in <xref ref-type="table" rid="T7">Algorithm 1</xref>. First, the input radiograph <italic>x</italic> is classified by all the given image classifiers <italic>m</italic><sub>1</sub>, <italic>m</italic><sub>2</sub>, &#x02026;, <italic>m</italic><sub><italic>n</italic></sub> to predict the class values &#x00109;<sub>1</sub>, &#x00109;<sub>2</sub>, &#x02026;, &#x00109;<sub><italic>n</italic></sub>. The majority vote among these predictions determines the final predicted class, such that <italic>c</italic>&#x02190;argmax([&#x00109;<sub>1</sub>, &#x00109;<sub>2</sub>, &#x02026;, &#x00109;<sub><italic>n</italic></sub>]). The final class value <italic>c</italic>, along with the original radiograph <italic>x</italic>, is provided to the <monospace>cam</monospace> function (Grad-CAM&#x0002B;&#x0002B;) as input. Each classifier then generates a heatmap image, yielding <inline-formula><mml:math id="M17"><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>.</p>
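The voting step of Algorithm 1 reduces to a majority vote over the per-model predictions; a minimal sketch, with tie-breaking by first occurrence as an assumption the paper does not fix:

```python
from collections import Counter

def majority_class(predictions):
    """Steps 1-10 of Algorithm 1: each classifier contributes its
    predicted class label; the most frequent label becomes the final
    class c. Ties resolve to the first-encountered label (an
    assumption, not specified in the paper)."""
    return Counter(predictions).most_common(1)[0][0]
```

A user-supplied class, if given, bypasses this vote, matching the null check in steps 8-10.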
<table-wrap position="float" id="T7">
<label>Algorithm 1</label>
<caption><p>Ensemble-CAM.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace> Require: &#x000A0;Image <italic>X</italic>&#x02208;<italic>R</italic><sup><italic>d</italic></sup>, target class <italic>c</italic>, models = [<italic>m</italic>1, <italic>m</italic>2, <italic>m</italic>3], cam=[gradcam]</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> Ensure: &#x000A0;Heatmap <italic>H</italic>, predicted class <italic>c</italic>, Bounding box (x, y, width, height)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 1: &#x000A0;number_of_models &#x02190; count(models)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 2: &#x000A0;Clist &#x02190;[]</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 3: &#x000A0;for i &#x02190; 1 to number_of_models <bold>do</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 4: &#x000A0; <italic>mi</italic>&#x02190; models[i]</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 5: &#x000A0; <italic>ci</italic>&#x02190; mi.predict_class(X)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 6: &#x000A0; push(<italic>ci</italic>, Clist)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 7: &#x000A0;end <bold>for</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 8: &#x000A0;if c = null <bold>then</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 9: &#x000A0; c &#x02190; argmax (Clist)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 10: &#x000A0;end <bold>if</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 11: &#x000A0;for i &#x02190; 1 to number_of_models <bold>do</bold></monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 12: &#x000A0; <italic>mi</italic>&#x02190; models[i]</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 13: &#x000A0; <italic>H</italic><sub><italic>mi</italic></sub>&#x02190; mi.predict_map(X,c)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 14: &#x000A0; gray &#x02190; extract_channel(<italic>H</italic><sub><italic>mi</italic></sub>, &#x00027;red&#x00027;)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 15: &#x000A0; ret, thresh = threshold(gray,127,255,0)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 16: &#x000A0; contours, hierarchy = findContours(thresh)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 17: &#x000A0; rect = minAreaRect(contours)</monospace> </td></tr>
<tr><td align="left" valign="top"><monospace> 18: &#x000A0;end <bold>for</bold></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
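<p>Steps 14&#x02013;17 of Algorithm 1 can be sketched in a few lines of Python. The fragment below is an illustrative, NumPy-only approximation (it derives an axis-aligned box from the thresholded red channel rather than OpenCV&#x00027;s rotated <monospace>minAreaRect</monospace>; the helper name and the toy heatmap are hypothetical):</p>

```python
import numpy as np

def heatmap_to_bbox(heatmap_rgb, thresh=127):
    """Threshold the red channel of a colormapped heatmap and return
    the bounding box (x, y, width, height) of the activated pixels."""
    red = heatmap_rgb[..., 0]        # step 14: extract the red channel
    mask = red > thresh              # step 15: binarize at 127
    ys, xs = np.nonzero(mask)        # steps 16-17: box around contour pixels
    if xs.size == 0:
        return None                  # nothing activated for this class
    x, y = int(xs.min()), int(ys.min())
    return (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)

# Toy 8x8 heatmap with a hot 3x3 patch in the red channel:
hm = np.zeros((8, 8, 3), dtype=np.uint8)
hm[2:5, 3:6, 0] = 200
print(heatmap_to_bbox(hm))  # (3, 2, 3, 3)
```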
 <p>The aim of Ensemble-CAM is to increase the probability of true positive inferences at the pixel level by reducing noise and irrelevant regions. Hence, it produces more reliable spatial regions within the X-ray image. This study demonstrates the outcome of Ensemble-CAM for estimating bounding boxes without being trained on bounding box annotations (x, y, w, h). As discussed previously, the class of a disorder at the image level is predicted by three models independently. The top-ranking class is declared final automatically by the maximum voting scheme; any class of interest can also be selected manually for analysis, if needed. Next, Grad-CAM&#x0002B;&#x0002B; generates a class-oriented heatmap in the jet-colormap scheme, and the red channel is sliced for the corresponding visual semantics.</p></sec></sec>
<sec id="s4">
<title>4 Results and discussion</title>
<p>The aim of Ensemble-CAM is to offer reliable and interpretable localization details without being explicitly trained on localization data. It supports the adoption of existing state-of-the-art image classification models and a CAM function, requiring no alteration of their architecture. In addition to extending image classifier capabilities to the localization task, the framework presents the outcome in an explainable layout.</p>
<p>In this study, multiple deep learning models have been trained on three datasets of X-ray images (see <xref ref-type="table" rid="T4">Table 4</xref>).</p>
<p>During the testing phase, it was observed that different image classifiers may not always predict the same class for the same input X-ray image. This exposes the unreliability of employing a single model for the diagnosis task. Such behavior motivates the adoption of an ensembling approach to reduce the probability of false predictions. Subsequently, the classification task with ensemble learning improved the overall performance.</p>
<p>Alongside classification, we also expect the model to justify its outcome. In traditional machine learning models such as decision trees, one could find such justifications in if-then hierarchies. Deep learning models are considered too opaque for if-then justifications in the image classification task. One alternative for such reasoning is supervised localization, where areas of interest are highlighted either by masking or by bounding boxes. This option depends on richly annotated data, which are difficult to acquire in high quantity with adequate quality. Another alternative is to leverage the classification knowledge for localization as a weakly supervised learning approach. We opted for the latter option, with class activation mapping techniques, to achieve two objectives, i.e., localization and interpretation. Among CAM variants, Grad-CAM&#x0002B;&#x0002B; was found best suited for its ability to be adapted without altering the model while extracting finer localization information in the case of multiple instances. Equally, it has been found a useful visual explainer for interpreting the outcome intuitively when heatmap images are generated.</p>
<p>As discussed, <italic>n</italic> classifiers produce <italic>n</italic> predictions in ensemble models. An aggregating function is therefore required to draw a single conclusion. Likewise, the localization task follows the same process, i.e., <italic>n</italic> classifiers produce <italic>n</italic> heatmaps, which further require aggregation. We employed a maximum voting function to obtain a confident value for the final localization. This function is applied at the pixel level where maximum intersection occurs. Finally, a minimum-area rectangle is formed from the qualified pixels&#x00027; left-top and right-bottom coordinates.</p>
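<p>The pixel-level intersection of Equation (10) and the subsequent minimum-area rectangle can be sketched as follows (an illustrative NumPy fragment using an axis-aligned rectangle; the function name and toy masks are hypothetical):</p>

```python
import numpy as np

def ensemble_bbox(masks):
    """Intersect per-classifier binary activation masks and return the
    minimum axis-aligned rectangle (x, y, w, h) around the pixels that
    survive in every mask, i.e. where maximum intersection occurs."""
    inter = np.logical_and.reduce(masks)   # pixel-level intersection
    ys, xs = np.nonzero(inter)
    if xs.size == 0:
        return None                        # classifiers share no pixels
    # left-top and right-bottom coordinates of the qualified pixels
    x, y = int(xs.min()), int(ys.min())
    return (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)

# Toy masks from three hypothetical classifiers:
m1 = np.zeros((6, 6), bool); m1[1:5, 1:5] = True
m2 = np.zeros((6, 6), bool); m2[2:6, 2:6] = True
m3 = np.zeros((6, 6), bool); m3[2:5, 1:4] = True
print(ensemble_bbox([m1, m2, m3]))  # (2, 2, 2, 3)
```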
<p>This study demonstrates the performance of the proposed model on three chest X-ray datasets for detecting three different disorders: pneumonia, COVID-19, and cardiomegaly. The RSNA pneumonia detection dataset was used for training and validation. Although the dataset offers ground truth bounding boxes, they were not used during training, in keeping with the weakly supervised approach: image classifiers were trained using image-level class labels only. Once trained, the model was asked to classify and localize images from the test set. The results were compared with the ground truth values to calculate the Jaccard index. The same strategy was followed for the detection of cardiomegaly using the Chest-Xray14 dataset. As both datasets possess bounding-box-level ground truth labels, the Jaccard index could be calculated. For detecting COVID-19, the model was likewise trained only on images with class labels; however, the Jaccard index was not computed because bounding box annotations for this dataset are not available.</p>
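<p>The Jaccard index used throughout the evaluation is the standard intersection-over-union of two rectangles; a minimal Python sketch (the function name is ours):</p>

```python
def jaccard_index(box_a, box_b):
    """IoU of two (x, y, w, h) boxes: intersection area / union area."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # x-overlap
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))   # y-overlap
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Two 4x4 boxes overlapping in a 2x2 patch: IoU = 4 / 28
print(round(jaccard_index((0, 0, 4, 4), (2, 2, 4, 4)), 4))  # 0.1429
```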
<p><xref ref-type="fig" rid="F3">Figure 3</xref> shows the generated heatmaps in the BGR color scheme, referring to the intensity of activation from highest to lowest. The green color serves as the border between the highest (blue) and lowest (red) values. To form an estimated mask, a contour is drawn by connecting the green pixels as a convex hull. The resultant polygon serves as the mask when filled with binary 1, while the rest is marked as binary 0. Though the mask offers better localization, we proceeded to generate bounding boxes. The first reason is to demonstrate the model&#x00027;s capability for predicting bounding boxes. The second is to evaluate the localization performance against the available annotations. An example ground-truth BBox is shown in <xref ref-type="fig" rid="F3">Figure 3</xref> in black as a reference, while the computed Jaccard index is displayed at the top.</p>
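<p>The convex hull connecting the boundary pixels can be computed with Andrew&#x00027;s monotone chain algorithm; the sketch below is an illustrative, dependency-free version (in practice a library routine such as OpenCV&#x00027;s <monospace>convexHull</monospace> would be used):</p>

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices of 2-D points, CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):  # z-component of (a-o) x (b-o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints shared, drop duplicates

# Interior point (1, 1) is discarded; the square's corners remain:
print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```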
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Ensemble-CAM generates localization information from image level class labels while making the process interpretable. Reference to labeled data, IoU has been computed for quantification of the result.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1366415-g0003.tif"/>
</fig>
<p>The detection and localization of cardiomegaly are shown in <xref ref-type="fig" rid="F4">Figure 4</xref> for a few samples. The model consists of three classifiers, namely DenseNet169, DenseNet121, and InceptionResNet. The size of the BBox among these classifiers can be observed as the first visual difference. DenseNet121 and DenseNet169 belong to the same family of architectures and form smaller and medium-sized BBoxes, respectively. InceptionResNet comparatively creates larger BBoxes with the least accuracy in the collection. The consolidation step aggregates all three BBoxes into a single BBox to form a conclusive outcome. As the dataset is furnished with a small set of ground-truth BBox annotations, quantitative results have also been computed using the Jaccard index. <xref ref-type="table" rid="T5">Table 5</xref> presents the computed values for the listed sample images. The same values can also be visualized at the center-top of each image under the classifier column. The performance varies from image to image among the classifiers. In the case of cardiomegaly, DenseNet121 consistently outperforms the others across all radiographs, while DenseNet169 and InceptionResNetV2 alternate for second place. This also causes the ensembled outcome to have a comparatively coarse IoU because it considers cumulative intersections. For such a configuration, a practitioner can give more weight to the best classifier&#x00027;s predictions. However, there exist scenarios where a single classifier may not always point to the right locations. Such scenarios are demonstrated for the detection of pneumonia in the next model.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Estimation of bounding box annotation for cardiomegaly localization quantified by IoU scores.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1366415-g0004.tif"/>
</fig>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>IoU detection score for the detection of cardiomegaly (ref. <xref ref-type="fig" rid="F4">Figure 4</xref>).</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Input instance</bold></th>
<th valign="top" align="center"><bold>DenseNet 169</bold></th>
<th valign="top" align="center"><bold>DenseNet 121</bold></th>
<th valign="top" align="center"><bold>Inception ResNetv2</bold></th>
<th valign="top" align="center"><bold>Finalized</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">a</td>
<td valign="top" align="center">0.48</td>
<td valign="top" align="center">0.53</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">0.46</td>
</tr> <tr>
<td valign="top" align="left">b</td>
<td valign="top" align="center">0.48</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="center">0.46</td>
</tr> <tr>
<td valign="top" align="left">c</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="center">0.51</td>
</tr> <tr>
<td valign="top" align="left">d</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.69</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.53</td>
</tr> <tr>
<td valign="top" align="left">e</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.50</td>
</tr></tbody>
</table>
</table-wrap>
<p>Therefore, the next-best classifier was employed, which architecturally belongs to the InceptionResNet family. Ensemble-CAM is agile enough to replace any of its components when required, without any further alteration to the framework. In this instance of the model, it can be observed that the performance of pneumonia detection is not as good as in the cardiomegaly instance. The reason can be traced by observing the generated heatmaps on different radiographs. For instance, we found that DenseNet169 consistently highlights the lower part of radiographs, interpreting their opacity as pneumonia. Once the learning issue was identified, we had the options of fine-tuning it by changing the hyperparameters, performing further training with filtered data, or combining the best of both. Nevertheless, we replaced it with another successor owing to its availability. Regarding model performance, none of the sub-models shows consistency in producing finer localization for all given radiographs. This can be observed quantitatively in <xref ref-type="table" rid="T6">Table 6</xref>. Xception shows the best IoU on inputs f and g and the least for i and j. Likewise, DenseNet121&#x00027;s best IoU is for j, while InceptionResNetV2 ranks first for h. Visual conformance with the stated scenario is illustrated in <xref ref-type="fig" rid="F5">Figure 5</xref>, where the black outlining box serves as ground truth for the generated boxes. This creates the need to collect the proposals from all classifiers and form a single one that honors their mutual/intersected arguments. The last use case is demonstrated in <xref ref-type="fig" rid="F6">Figure 6</xref>, reflecting the fact that some datasets may not offer any bounding box information even for test purposes, while the detection task is still demanded. It illustrates the application of the proposed model for the detection of COVID-19 symptoms. The associated dataset does not provide ground truth values; therefore, quantitative results were not computed with the Jaccard index. For visual analysis, the model is expected to highlight ground-glass opacity in the lung area. Since pneumonia and COVID-19 share similar characteristics, we adapted the pneumonia-detecting classifiers for COVID-19. The combined results for pneumonia and COVID-19 show high variance in performance: the classifiers are not fully consistent in mutual agreement, resulting in poor performance.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>IoU detection score for the detection of pneumonia (ref. <xref ref-type="fig" rid="F5">Figure 5</xref>).</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left"><bold>Input Instance</bold></th>
<th valign="top" align="center"><bold>DenseNet 121</bold></th>
<th valign="top" align="center"><bold>Inception ResnetV2</bold></th>
<th valign="top" align="center"><bold>Xception</bold></th>
<th valign="top" align="center"><bold>Finalized</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">f</td>
<td valign="top" align="center">0.21</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.25</td>
<td valign="top" align="center">0.34</td>
</tr> <tr>
<td valign="top" align="left">g</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">0.16</td>
<td valign="top" align="center">0.17</td>
<td valign="top" align="center">0.17</td>
</tr> <tr>
<td valign="top" align="left">h</td>
<td valign="top" align="center">0.23</td>
<td valign="top" align="center">0.30</td>
<td valign="top" align="center">0.24</td>
<td valign="top" align="center">0.27</td>
</tr> <tr>
<td valign="top" align="left">i</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.31</td>
</tr> <tr>
<td valign="top" align="left">j</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.35</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.32</td>
</tr></tbody>
</table>
</table-wrap>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Estimation of bounding box annotation for Pneumonia localization quantified by IoU scores.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1366415-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Qualitative illustration of predicting bounding boxes for COVID-19 cases.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-07-1366415-g0006.tif"/>
</fig>
<p>The proposed study differs from localization techniques such as YOLO and SSD in terms of supervision, i.e., strong vs. weak. It induces explainability while extracting interpretable features for the localization task using CAM. To enhance the reliability of predictions, it offers ensembling strategies for classifiers and localizers without alteration of the base models. The IoU performance can be discussed from two perspectives. Weakly supervised approaches may not reach the benchmark set by strongly supervised approaches; however, the latter are highly dependent on spatially annotated data. To overcome this dependency, weakly supervised learning offers localization as an alternative approach with lower IoU, requiring only image-level labels during training. The cumulative results of Ensemble-CAM on the given datasets are promising in localizing abnormalities within chest radiographs. The framework is based on loosely coupled components that are replaceable and extendable to tune the overall performance. Moreover, it offers interpretability for debugging training deficiencies as well as for justification at the prediction stage. Leveraging its interpretability features, the model also exhibits favorable results for the estimation of mask and bounding box annotations despite being trained only on class labels. Taking these capabilities into account, Ensemble-CAM can play a vital role in assisting reliable diagnosis in clinical practice. Although it eliminates the need for strong annotation during training, it requires more computational resources for both training and prediction.</p>
<p>To further advance the capabilities of our Ensemble-CAM framework, we are committed to addressing the current limitations and exploring new dimensions in thoracic disease analysis. Future efforts will include the adoption of additional quantitative metrics, such as the DICE coefficient and Precision, to enhance the evaluation of localization and detection accuracy. These metrics will provide deeper insights into the model&#x00027;s performance and its effectiveness in clinical settings. Moreover, we are planning to improve the system&#x00027;s architecture by integrating unified classifiers designed to process a broader spectrum of thoracic diseases. This development aims to achieve a more comprehensive and efficient diagnostic tool, capable of providing robust analyses from chest X-ray images. By pursuing these enhancements, we intend to not only refine the diagnostic accuracy of our system but also to broaden the scope of its applicability in medical imaging, ensuring that our research contributes continuously to the evolving field of AI in healthcare.</p></sec>
<sec sec-type="conclusions" id="s5">
<title>5 Conclusion</title>
<p>The diagnosis of thoracic diseases using chest X-ray images is a critical and sensitive area. It carries many risks of incorrect conclusions due to workload, skillset, and other subjective errors. Assisting medical professionals with AI-powered computer-aided systems using deep learning faces multiple challenges. This study focuses on the challenges of inadequate data and interpretable inference for deep learning models and presents Ensemble-CAM. It has been formulated as a unified model that utilizes existing classifiers and class activation mapping to detect and localize thoracic disease in chest X-ray images. Three independent experiments on respective chest X-ray datasets were conducted. During the training phase, no localization details were used, yet the model predicts bounding boxes. The generated heatmaps were evaluated both visually and quantitatively. In comparison to existing standalone models, Ensemble-CAM carries the lowest risk of incorrect classification when it encounters noisy features in X-ray images. This enhances the overall confidence in deep learning models for clinical practice. The theoretical contribution of Ensemble-CAM is envisioned in the explainable AI and weakly supervised learning spaces, which further elevates the confidence for deep learning models to be employed in medical practice. In future studies, we aim to broaden the research scope by incorporating more image classifiers, exploring different CAM variants, and refining ensemble strategies, as well as adding metrics such as the DICE coefficient and Precision and developing unified classifiers. These enhancements are expected to provide deeper insights and higher accuracy, further leveraging the potential of AI in medical imaging and continuing the evolution of reliable, interpretable diagnostic tools for clinical practice.</p></sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.</p></sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>MA: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing &#x02013; original draft. MJ: Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing &#x02013; review &#x00026; editing.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aasem</surname> <given-names>M.</given-names></name> <name><surname>Iqbal</surname> <given-names>M. J.</given-names></name> <name><surname>Ahmad</surname> <given-names>I.</given-names></name> <name><surname>Alassafi</surname> <given-names>M. O.</given-names></name> <name><surname>Alhomoud</surname> <given-names>A.</given-names></name></person-group> (<year>2022</year>). <article-title>A survey on tools and techniques for localizing abnormalities in X-ray images using deep learning</article-title>. <source>Mathematics</source> <volume>10</volume>:<fpage>4765</fpage>. <pub-id pub-id-type="doi">10.3390/math10244765</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adabi</surname> <given-names>A.</given-names></name> <name><surname>Berrada</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Peeking inside the black-box: a survey on explainable artificial intelligence</article-title>. <source>IEEE Access</source> <volume>6</volume>, <fpage>52138</fpage>&#x02013;<lpage>52160</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2018.2870052</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>An</surname> <given-names>L.</given-names></name> <name><surname>Peng</surname> <given-names>K.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name> <name><surname>Huang</surname> <given-names>P.</given-names></name> <name><surname>Luo</surname> <given-names>Y.</given-names></name> <name><surname>Feng</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>E-TBNET: light deep neural network for automatic detection of tuberculosis with X-ray DR imaging</article-title>. <source>Sensors</source> <volume>22</volume>:<fpage>821</fpage>. <pub-id pub-id-type="doi">10.3390/s22030821</pub-id><pub-id pub-id-type="pmid">35161567</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stein</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <source>RSNA Pneumonia Detection Challenge</source>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caroprese</surname> <given-names>L.</given-names></name> <name><surname>Vocaturo</surname> <given-names>E.</given-names></name> <name><surname>Zumpano</surname> <given-names>E.</given-names></name></person-group> (<year>2022</year>). <article-title>Argumentation approaches for explanaible ai in medical informatics</article-title>. <source>Intell. Syst. Appl</source>. <volume>16</volume>:<fpage>200109</fpage>. <pub-id pub-id-type="doi">10.1016/j.iswa.2022.200109</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chandola</surname> <given-names>Y.</given-names></name> <name><surname>Virmani</surname> <given-names>J.</given-names></name> <name><surname>Bhadauria</surname> <given-names>H.</given-names></name> <name><surname>Kumar</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Chapter 1 - Introduction,&#x0201D;</article-title> in <source>Deep Learning for Chest Radiographs, Primers in Biomedical Imaging Devices and Systems</source>, eds <person-group person-group-type="editor"><name><surname>Chandola</surname> <given-names>Y.</given-names></name> <name><surname>Virmani</surname> <given-names>J.</given-names></name> <name><surname>Bhadauria</surname> <given-names>H.</given-names></name> <name><surname>Kumar</surname> <given-names>P.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Academic Press</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1016/B978-0-323-90184-0.00003-5</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chattopadhay</surname> <given-names>A.</given-names></name> <name><surname>Sarkar</surname> <given-names>A.</given-names></name> <name><surname>Howlader</surname> <given-names>P.</given-names></name> <name><surname>Balasubramanian</surname> <given-names>V. N.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;GRAD-CAM&#x0002B;&#x0002B;: generalized gradient-based visual explanations for deep convolutional networks,&#x0201D;</article-title> in <italic>2018 IEEE Winter Conference on Applications of Computer Vision (WACV)</italic> (<publisher-loc>Lake Tahoe, NV</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>839</fpage>&#x02013;<lpage>847</lpage>. <pub-id pub-id-type="doi">10.1109/WACV.2018.00097</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chowdhury</surname> <given-names>M. E. H.</given-names></name> <name><surname>Rahman</surname> <given-names>T.</given-names></name> <name><surname>Khandakar</surname> <given-names>A.</given-names></name> <name><surname>Mazhar</surname> <given-names>R.</given-names></name> <name><surname>Kadir</surname> <given-names>M. A.</given-names></name> <name><surname>Mahbub</surname> <given-names>Z. B.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Can AI help in screening viral and COVID-19 pneumonia?</article-title> <source>IEEE Access</source> <volume>8</volume>, <fpage>132665</fpage>&#x02013;<lpage>132676</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3010287</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doi</surname> <given-names>K.</given-names></name></person-group> (<year>2007</year>). <article-title>Computer-aided diagnosis in medical imaging: historical review, current status and future potential</article-title>. <source>Comput. Med. Imaging Graph</source>. <volume>31</volume>, <fpage>198</fpage>&#x02013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1016/j.compmedimag.2007.02.002</pub-id><pub-id pub-id-type="pmid">17349778</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shortliffe</surname> <given-names>E.</given-names></name></person-group> (<year>1975</year>). <article-title>A model of inexact reasoning</article-title>. <source>Math. Biosci</source>. <volume>23</volume>, <fpage>1</fpage>&#x02013;<lpage>379</lpage>. <pub-id pub-id-type="doi">10.1016/0025-5564(75)90047-4</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elhalawani</surname> <given-names>H.</given-names></name> <name><surname>Mak</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Are artificial intelligence challenges becoming radiology&#x00027;s new &#x0201C;bee&#x00027;s knees&#x0201D;?</article-title> <source>Radiol. Artif. Intell</source>. <volume>3</volume>:<fpage>e210056</fpage>. <pub-id pub-id-type="doi">10.1148/ryai.2021210056</pub-id><pub-id pub-id-type="pmid">34138989</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Esteva</surname> <given-names>A.</given-names></name> <name><surname>Kuprel</surname> <given-names>B.</given-names></name> <name><surname>Novoa</surname> <given-names>R. A.</given-names></name> <name><surname>Ko</surname> <given-names>J.</given-names></name> <name><surname>Swetter</surname> <given-names>S. M.</given-names></name> <name><surname>Blau</surname> <given-names>H. M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Dermatologist-level classification of skin cancer with deep neural networks</article-title>. <source>Nature</source> <volume>542</volume>, <fpage>115</fpage>&#x02013;<lpage>118</lpage>. <pub-id pub-id-type="doi">10.1038/nature21056</pub-id><pub-id pub-id-type="pmid">28117445</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Georgiou</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Lew</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision</article-title>. <source>Int. J. Multimed. Inf. Retr</source>. <volume>9</volume>, <fpage>135</fpage>&#x02013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1007/s13735-019-00183-w</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giuste</surname> <given-names>F.</given-names></name> <name><surname>Shi</surname> <given-names>W.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Naren</surname> <given-names>T.</given-names></name> <name><surname>Isgut</surname> <given-names>M.</given-names></name> <name><surname>Sha</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>Explainable artificial intelligence methods in combating pandemics: a systematic review</article-title>. <source>IEEE Rev. Biomed. Eng</source>. <volume>16</volume>, <fpage>5</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1109/RBME.2022.3185953</pub-id><pub-id pub-id-type="pmid">35737637</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guan</surname> <given-names>Q.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Zhong</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>L.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Thorax disease classification with attention guided convolutional neural network</article-title>. <source>Pattern Recognit. Lett</source>. <volume>131</volume>, <fpage>38</fpage>&#x02013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1016/j.patrec.2019.11.040</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Identity mappings in deep residual networks,&#x0201D;</article-title> in <italic>Computer Vision &#x02013; ECCV 2016</italic> (Cham: Springer), <fpage>630</fpage>&#x02013;<lpage>645</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-46493-0_38</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;A survey on convolutional neural network accelerators: GPU, FPGA and ASIC,&#x0201D;</article-title> in <italic>2022 14th International Conference on Computer Research and Development (ICCRD)</italic> (<publisher-loc>Shenzhen</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>100</fpage>&#x02013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1109/ICCRD54409.2022.9730377</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>G.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Van Der Maaten</surname> <given-names>L.</given-names></name> <name><surname>Weinberger</surname> <given-names>K. Q.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Densely connected convolutional networks,&#x0201D;</article-title> in <italic>Proceedings of the IEEE conference on computer vision and pattern recognition</italic> (<publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>4700</fpage>&#x02013;<lpage>4708</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2017.243</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ion</surname> <given-names>A.</given-names></name> <name><surname>Udristoiu</surname> <given-names>S.</given-names></name> <name><surname>Stanescu</surname> <given-names>L.</given-names></name> <name><surname>Burdescu</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Rule-based methods for the computer assisted diagnosis of medical images,&#x0201D;</article-title> in <italic>International Conference on Advancements of Medicine and Health Care through Technology</italic> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>247</fpage>&#x02013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-04292-8_55</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Irvin</surname> <given-names>J.</given-names></name> <name><surname>Rajpurkar</surname> <given-names>P.</given-names></name> <name><surname>Ko</surname> <given-names>M.</given-names></name> <name><surname>Yu</surname> <given-names>Y.</given-names></name> <name><surname>Ciurea-Ilcus</surname> <given-names>S.</given-names></name> <name><surname>Chute</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Chexpert: a large chest radiograph dataset with uncertainty labels and expert comparison</article-title>. <source>Proc. AAAI Conf. Artif. Intell</source>. <volume>33</volume>, <fpage>590</fpage>&#x02013;<lpage>597</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.3301590</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Islam</surname> <given-names>M. T.</given-names></name> <name><surname>Aowal</surname> <given-names>M. A.</given-names></name> <name><surname>Minhaz</surname> <given-names>A. T.</given-names></name> <name><surname>Ashraf</surname> <given-names>K.</given-names></name></person-group> (<year>2017</year>). <article-title>Abnormality detection and localization in chest X-rays using deep convolutional neural networks</article-title>. <source>arXiv</source> [Preprint]. arXiv:1705.09850. <pub-id pub-id-type="doi">10.48550/arXiv.1705.09850</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jeong</surname> <given-names>H. K.</given-names></name> <name><surname>Park</surname> <given-names>C.</given-names></name> <name><surname>Henao</surname> <given-names>R.</given-names></name> <name><surname>Kheterpal</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>Deep learning in dermatology: a systematic review of current approaches, outcomes and limitations</article-title>. <source>JID Innov</source>. <volume>3</volume>:<fpage>100150</fpage>. <pub-id pub-id-type="doi">10.1016/j.xjidi.2022.100150</pub-id><pub-id pub-id-type="pmid">36655135</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kovalerchuk</surname> <given-names>B.</given-names></name> <name><surname>Triantaphyllou</surname> <given-names>E.</given-names></name> <name><surname>Ruiz</surname> <given-names>J. F.</given-names></name> <name><surname>Clayton</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>Fuzzy logic in computer-aided breast cancer diagnosis: analysis of lobulation</article-title>. <source>Artif. Intell. Med</source>. <volume>11</volume>, <fpage>75</fpage>&#x02013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.1016/S0933-3657(97)00021-3</pub-id><pub-id pub-id-type="pmid">9267592</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2017</year>). <article-title>Imagenet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>&#x02013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1145/3065386</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep learning</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>436</fpage>&#x02013;<lpage>444</lpage>. <pub-id pub-id-type="doi">10.1038/nature14539</pub-id><pub-id pub-id-type="pmid">26017442</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>Y.</given-names></name> <name><surname>Niu</surname> <given-names>B.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <source>Survey of image classification algorithms based on deep learning</source> <volume>11911</volume>, <fpage>422</fpage>&#x02013;<lpage>427</lpage>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mahony</surname> <given-names>N. O.</given-names></name> <name><surname>Campbell</surname> <given-names>S.</given-names></name> <name><surname>Carvalho</surname> <given-names>A.</given-names></name> <name><surname>Harapanahalli</surname> <given-names>S.</given-names></name> <name><surname>Velasco-Hernandez</surname> <given-names>G.</given-names></name> <name><surname>Krpalkova</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>&#x0201C;Deep learning vs. traditional computer vision,&#x0201D;</article-title> in <italic>Advances in Computer Vision. CVC 2019. Advances in Intelligent Systems and Computing, Vol 943</italic> (Cham: Springer). arXiv:1910.13796. <pub-id pub-id-type="doi">10.1007/978-3-030-17795-9</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mittal</surname> <given-names>S.</given-names></name> <name><surname>Vaishay</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>A survey of techniques for optimizing deep learning on GPUs</article-title>. <source>J. Syst. Architect</source>. <volume>99</volume>:<fpage>101635</fpage>. <pub-id pub-id-type="doi">10.1016/j.sysarc.2019.101635</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nikoli&#x00107;</surname> <given-names>G. S.</given-names></name> <name><surname>Dimitrijevi&#x00107;</surname> <given-names>B. R.</given-names></name> <name><surname>Nikoli&#x00107;</surname> <given-names>T. R.</given-names></name> <name><surname>Stojcev</surname> <given-names>M. K.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;A survey of three types of processing units: CPU, GPU and TPU,&#x0201D;</article-title> in <italic>2022 57th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST)</italic> (Ohrid: IEEE), <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/ICEST55168.2022.9828625</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ouyang</surname> <given-names>X.</given-names></name> <name><surname>Karanam</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>T.</given-names></name> <name><surname>Huo</surname> <given-names>J.</given-names></name> <name><surname>Zhou</surname> <given-names>X. S.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Learning hierarchical attention for weakly-supervised chest X-ray abnormality localization and diagnosis</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>40</volume>, <fpage>2698</fpage>&#x02013;<lpage>2710</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2020.3042773</pub-id><pub-id pub-id-type="pmid">33284748</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>S. H.</given-names></name> <name><surname>Han</surname> <given-names>K.</given-names></name> <name><surname>Jang</surname> <given-names>H. Y.</given-names></name> <name><surname>Park</surname> <given-names>J. E.</given-names></name> <name><surname>Lee</surname> <given-names>J.-G.</given-names></name> <name><surname>Kim</surname> <given-names>D. W.</given-names></name> <etal/></person-group>. (<year>2023</year>). <article-title>Methods for clinical evaluation of artificial intelligence algorithms for medical diagnosis</article-title>. <source>Radiology</source> <volume>306</volume>, <fpage>20</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1148/radiol.220182</pub-id><pub-id pub-id-type="pmid">36346314</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ponomaryov</surname> <given-names>V. I.</given-names></name> <name><surname>Almaraz-Damian</surname> <given-names>J. A.</given-names></name> <name><surname>Reyes-Reyes</surname> <given-names>R.</given-names></name> <name><surname>Cruz-Ramos</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <source>Chest X-ray classification using transfer learning on multi-GPU</source> <volume>11736</volume>, <fpage>111</fpage>&#x02013;<lpage>120</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prevedello</surname> <given-names>L. M.</given-names></name> <name><surname>Halabi</surname> <given-names>S. S.</given-names></name> <name><surname>Shih</surname> <given-names>G.</given-names></name> <name><surname>Wu</surname> <given-names>C. C.</given-names></name> <name><surname>Kohli</surname> <given-names>M. D.</given-names></name> <name><surname>Chokshi</surname> <given-names>F. H.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Challenges related to artificial intelligence research in medical imaging and the importance of image analysis competitions</article-title>. <source>Radiol. Artif. Intell</source>. <volume>1</volume>:<fpage>e180031</fpage>. <pub-id pub-id-type="doi">10.1148/ryai.2019180031</pub-id><pub-id pub-id-type="pmid">33937783</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rahman</surname> <given-names>T.</given-names></name> <name><surname>Khandakar</surname> <given-names>A.</given-names></name> <name><surname>Qiblawey</surname> <given-names>Y.</given-names></name> <name><surname>Tahir</surname> <given-names>A.</given-names></name> <name><surname>Kiranyaz</surname> <given-names>S.</given-names></name> <name><surname>Kashem</surname> <given-names>S. B. A.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images</article-title>. <source>Comput. Biol. Med</source>. <volume>132</volume>:<fpage>104319</fpage>. <pub-id pub-id-type="doi">10.1016/j.compbiomed.2021.104319</pub-id><pub-id pub-id-type="pmid">33799220</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rajpurkar</surname> <given-names>P.</given-names></name> <name><surname>Irvin</surname> <given-names>J.</given-names></name> <name><surname>Zhu</surname> <given-names>K.</given-names></name> <name><surname>Yang</surname> <given-names>B.</given-names></name> <name><surname>Mehta</surname> <given-names>H.</given-names></name> <name><surname>Duan</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Chexnet: radiologist-level pneumonia detection on chest X-rays with deep learning</article-title>. <source>arXiv</source> [Preprint]. arXiv:1711.05225. <pub-id pub-id-type="doi">10.48550/arXiv.1711.05225</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rani</surname> <given-names>G.</given-names></name> <name><surname>Misra</surname> <given-names>A.</given-names></name> <name><surname>Dhaka</surname> <given-names>V. S.</given-names></name> <name><surname>Buddhi</surname> <given-names>D.</given-names></name> <name><surname>Sharma</surname> <given-names>R. K.</given-names></name> <name><surname>Zumpano</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2022a</year>). <article-title>A multi-modal bone suppression, lung segmentation, and classification approach for accurate COVID-19 detection using chest radiographs</article-title>. <source>Intell. Syst. Appl</source>. <volume>16</volume>:<fpage>200148</fpage>. <pub-id pub-id-type="doi">10.1016/j.iswa.2022.200148</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rani</surname> <given-names>G.</given-names></name> <name><surname>Misra</surname> <given-names>A.</given-names></name> <name><surname>Dhaka</surname> <given-names>V. S.</given-names></name> <name><surname>Zumpano</surname> <given-names>E.</given-names></name> <name><surname>Vocaturo</surname> <given-names>E.</given-names></name></person-group> (<year>2022b</year>). <article-title>Spatial feature and resolution maximization gan for bone suppression in chest radiographs</article-title>. <source>Comput. Methods Programs Biomed</source>. <volume>224</volume>:<fpage>107024</fpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2022.107024</pub-id><pub-id pub-id-type="pmid">35863123</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>C.</given-names></name> <name><surname>Cao</surname> <given-names>J.</given-names></name> <name><surname>Zeng</surname> <given-names>R.</given-names></name> <name><surname>Chen</surname> <given-names>Q.</given-names></name> <name><surname>Fu</surname> <given-names>H.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>A thorough comparison study on adversarial attacks and defenses for common thorax disease classification in chest X-rays</article-title>. <source>arXiv</source> [Preprint]. arXiv:2003.13969. <pub-id pub-id-type="doi">10.48550/arXiv.2003.13969</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reyes</surname> <given-names>M.</given-names></name> <name><surname>Meier</surname> <given-names>R.</given-names></name> <name><surname>Pereira</surname> <given-names>S.</given-names></name> <name><surname>Silva</surname> <given-names>C. A.</given-names></name> <name><surname>Dahlweid</surname> <given-names>F.-M.</given-names></name> <name><surname>Tengg-Kobligk</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>On the interpretability of artificial intelligence in radiology: challenges and opportunities</article-title>. <source>Radiol. Artif. Intell</source>. <volume>2</volume>:<fpage>e190043</fpage>. <pub-id pub-id-type="doi">10.1148/ryai.2020190043</pub-id><pub-id pub-id-type="pmid">32510054</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rezvantalab</surname> <given-names>A.</given-names></name> <name><surname>Safigholi</surname> <given-names>H.</given-names></name> <name><surname>Karimijeshni</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Dermatologist level dermoscopy skin cancer classification using different deep learning convolutional neural networks algorithms</article-title>. <source>arXiv</source> [Preprint]. arXiv:1810.10348. <pub-id pub-id-type="doi">10.48550/arXiv.1810.10348</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Rozenberg</surname> <given-names>E.</given-names></name> <name><surname>Freedman</surname> <given-names>D.</given-names></name> <name><surname>Bronstein</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Localization with limited annotation for chest X-rays,&#x0201D;</article-title> in <source>Proceedings of the Machine Learning for Health NeurIPS Workshop</source>, eds. <person-group person-group-type="editor"><name><surname>Dalca</surname> <given-names>A. V.</given-names></name> <name><surname>McDermott</surname> <given-names>M. B. A.</given-names></name> <name><surname>Alsentzer</surname> <given-names>E.</given-names></name> <name><surname>Finlayson</surname> <given-names>S. G.</given-names></name> <name><surname>Oberst</surname> <given-names>M.</given-names></name> <name><surname>Falck</surname> <given-names>F.</given-names></name> <name><surname>Beaulieu-Jones</surname> <given-names>B.</given-names></name></person-group>, <fpage>52</fpage>&#x02013;<lpage>65</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://proceedings.mlr.press/v116/rozenberg20a/rozenberg20a.pdf">http://proceedings.mlr.press/v116/rozenberg20a/rozenberg20a.pdf</ext-link></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sandler</surname> <given-names>M.</given-names></name> <name><surname>Howard</surname> <given-names>A.</given-names></name> <name><surname>Zhu</surname> <given-names>M.</given-names></name> <name><surname>Zhmoginov</surname> <given-names>A.</given-names></name> <name><surname>Chen</surname> <given-names>L.-C.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Mobilenetv2: inverted residuals and linear bottlenecks,&#x0201D;</article-title> in <italic>Proceedings of the IEEE conference on computer vision and pattern recognition</italic> (Salt Lake City, UT: IEEE), <fpage>4510</fpage>&#x02013;<lpage>4520</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00474</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Selvaraju</surname> <given-names>R. R.</given-names></name> <name><surname>Cogswell</surname> <given-names>M.</given-names></name> <name><surname>Das</surname> <given-names>A.</given-names></name> <name><surname>Vedantam</surname> <given-names>R.</given-names></name> <name><surname>Parikh</surname> <given-names>D.</given-names></name> <name><surname>Batra</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Grad-CAM: visual explanations from deep networks via gradient-based localization</article-title>. <source>Int. J. Comput. Vis</source>. <volume>128</volume>, <fpage>336</fpage>&#x02013;<lpage>359</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-019-01228-7</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sheu</surname> <given-names>R.-K.</given-names></name> <name><surname>Pardeshi</surname> <given-names>M. S.</given-names></name></person-group> (<year>2022</year>). <article-title>A survey on medical explainable AI (XAI): recent progress, explainability approach, human interaction and scoring system</article-title>. <source>Sensors</source> <volume>22</volume>:<fpage>8068</fpage>. <pub-id pub-id-type="doi">10.3390/s22208068</pub-id><pub-id pub-id-type="pmid">36298417</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>W.</given-names></name> <name><surname>Tong</surname> <given-names>L.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>M. D.</given-names></name></person-group> (<year>2021</year>). <article-title>Covid-19 automatic diagnosis with radiographic imaging: explainable attention transfer deep neural networks</article-title>. <source>IEEE J. Biomed. Health Inform</source>. <volume>25</volume>, <fpage>2376</fpage>&#x02013;<lpage>2387</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2021.3074893</pub-id><pub-id pub-id-type="pmid">33882010</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shrestha</surname> <given-names>A.</given-names></name> <name><surname>Mahmood</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Review of deep learning algorithms and architectures</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>53040</fpage>&#x02013;<lpage>53065</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2912200</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Siegel</surname> <given-names>E. L.</given-names></name></person-group> (<year>2019</year>). <article-title>Making AI even smarter using ensembles: a challenge to future challenges and implications for clinical care</article-title>. <source>Radiol. Artif. Intell</source>. <volume>1</volume>:<fpage>e190187</fpage>. <pub-id pub-id-type="doi">10.1148/ryai.2019190187</pub-id><pub-id pub-id-type="pmid">33937807</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>W.</given-names></name> <name><surname>Gon&#x000E7;alves</surname> <given-names>T.</given-names></name> <name><surname>H&#x000E4;rm&#x000E4;</surname> <given-names>K.</given-names></name> <name><surname>Schr&#x000F6;der</surname> <given-names>E.</given-names></name> <name><surname>Obmann</surname> <given-names>V. C.</given-names></name> <name><surname>Barroso</surname> <given-names>M. C.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Computer-aided diagnosis through medical image retrieval in radiology</article-title>. <source>Sci. Rep</source>. <volume>12</volume>:<fpage>20732</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-022-25027-2</pub-id><pub-id pub-id-type="pmid">36456605</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Very deep convolutional networks for large-scale image recognition</article-title>. <source>arXiv</source> [Preprint]. arXiv:1409.1556. <pub-id pub-id-type="doi">10.48550/arXiv.1409.1556</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C.</given-names></name> <name><surname>Zaremba</surname> <given-names>W.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Bruna</surname> <given-names>J.</given-names></name> <name><surname>Erhan</surname> <given-names>D.</given-names></name> <name><surname>Goodfellow</surname> <given-names>I.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Intriguing properties of neural networks</article-title>. <source>arXiv</source> [Preprint]. arXiv:1312.6199. <pub-id pub-id-type="doi">10.48550/arXiv.1312.6199</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tan</surname> <given-names>M.</given-names></name> <name><surname>Le</surname> <given-names>Q.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Efficientnet: rethinking model scaling for convolutional neural networks,&#x0201D;</article-title> in <italic>Proceedings of the 36th International Conference on Machine Learning</italic> (PMLR), <fpage>6105</fpage>&#x02013;<lpage>6114</lpage>.</citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Voulodimos</surname> <given-names>A.</given-names></name> <name><surname>Doulamis</surname> <given-names>N.</given-names></name> <name><surname>Doulamis</surname> <given-names>A.</given-names></name> <name><surname>Protopapadakis</surname> <given-names>E.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep learning for computer vision: a brief review</article-title>. <source>Comput. Intell. Neurosci</source>. <volume>2018</volume>:<fpage>7068349</fpage>. <pub-id pub-id-type="doi">10.1155/2018/7068349</pub-id><pub-id pub-id-type="pmid">29487619</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wagstaff</surname> <given-names>K.</given-names></name></person-group> (<year>2012</year>). <article-title>Machine learning that matters</article-title>. <source>arXiv</source> [Preprint]. arXiv:1206.4656. <pub-id pub-id-type="doi">10.48550/arXiv.1206.4656</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Peng</surname> <given-names>Y. L.</given-names></name> <name><surname>Lu</surname> <given-names>L.</given-names></name> <name><surname>Bagheri</surname> <given-names>Z.</given-names></name> <name><surname>Summers</surname> <given-names>R. M.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,&#x0201D;</article-title> in <italic>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</italic> (Honolulu, HI: IEEE), <fpage>3462</fpage>&#x02013;<lpage>3471</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2017.369</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wehbe</surname> <given-names>R. M.</given-names></name> <name><surname>Sheng</surname> <given-names>J.</given-names></name> <name><surname>Dutta</surname> <given-names>S.</given-names></name> <name><surname>Chai</surname> <given-names>S.</given-names></name> <name><surname>Dravid</surname> <given-names>A.</given-names></name> <name><surname>Barutcu</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Deepcovid-XR: an artificial intelligence algorithm to detect COVID-19 on chest radiographs trained and tested on a large us clinical data set</article-title>. <source>Radiology</source> <volume>299</volume>, <fpage>E167</fpage>&#x02013;<lpage>E176</lpage>. <pub-id pub-id-type="doi">10.1148/radiol.2020203511</pub-id><pub-id pub-id-type="pmid">33231531</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Gur</surname> <given-names>Y.</given-names></name> <name><surname>Karargyris</surname> <given-names>A.</given-names></name> <name><surname>Syed</surname> <given-names>A. B.</given-names></name> <name><surname>Boyko</surname> <given-names>O.</given-names></name> <name><surname>Moradi</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>&#x0201C;Automatic bounding box annotation of chest X-ray data for localization of abnormalities,&#x0201D;</article-title> in <italic>2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI)</italic> (Iowa City, IA: IEEE), <fpage>799</fpage>&#x02013;<lpage>803</lpage>. <pub-id pub-id-type="doi">10.1109/ISBI45749.2020.9098482</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname> <given-names>C.</given-names></name> <name><surname>Yao</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>R.</given-names></name> <name><surname>Xu</surname> <given-names>Z.</given-names></name> <name><surname>Huang</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Weakly supervised deep learning for thoracic disease classification and localization on chest X-rays,&#x0201D;</article-title> in <italic>BCB &#x00027;18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics</italic> (New York, NY: ACM), <fpage>103</fpage>&#x02013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1145/3233547.3233573</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yanase</surname> <given-names>J.</given-names></name> <name><surname>Triantaphyllou</surname> <given-names>E.</given-names></name></person-group> (<year>2019</year>). <article-title>A systematic survey of computer-aided diagnosis in medicine: past and present developments</article-title>. <source>Expert Syst. Appl</source>. <volume>138</volume>:<fpage>112821</fpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2019.112821</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>A. C.</given-names></name> <name><surname>Mohajer</surname> <given-names>B.</given-names></name> <name><surname>Eng</surname> <given-names>J.</given-names></name></person-group> (<year>2022</year>). <article-title>External validation of deep learning algorithms for radiologic diagnosis: a systematic review</article-title>. <source>Radiol. Artif. Intell</source>. <volume>4</volume>:<fpage>e210064</fpage>. <pub-id pub-id-type="doi">10.1148/ryai.210064</pub-id><pub-id pub-id-type="pmid">35652114</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zoph</surname> <given-names>B.</given-names></name> <name><surname>Vasudevan</surname> <given-names>V.</given-names></name> <name><surname>Shlens</surname> <given-names>J.</given-names></name> <name><surname>Le</surname> <given-names>Q. V.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Learning transferable architectures for scalable image recognition,&#x0201D;</article-title> in <italic>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</italic> (Salt Lake City, UT: IEEE), <fpage>8697</fpage>&#x02013;<lpage>8710</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00907</pub-id></citation>
</ref>
</ref-list>
</back>
</article> 