COVID-Net CXR-2: An Enhanced Deep Convolutional Neural Network Design for Detection of COVID-19 Cases From Chest X-ray Images

As the COVID-19 pandemic continues its global devastation, the use of chest X-ray (CXR) imaging as a complementary screening strategy to RT-PCR testing continues to grow given its routine clinical use for respiratory complaints. As part of the COVID-Net open source initiative, we introduce COVID-Net CXR-2, an enhanced deep convolutional neural network design for COVID-19 detection from CXR images built using a greater quantity and diversity of patients than the original COVID-Net. We also introduce a new benchmark dataset composed of 19,203 CXR images from a multinational cohort of 16,656 patients from at least 51 countries, making it the largest, most diverse COVID-19 CXR dataset in open access form. The COVID-Net CXR-2 network achieves sensitivity and positive predictive value of 95.5% and 97.0%, respectively, and was audited in a transparent and responsible manner. Explainability-driven performance validation was used during auditing to gain deeper insights into its decision-making behavior and to ensure clinically relevant factors are leveraged for improving trust in its usage. Radiologist validation was also conducted, where select cases were reviewed and reported on by two board-certified radiologists with over 10 and 19 years of experience, respectively, and showed that the critical factors leveraged by COVID-Net CXR-2 are consistent with radiologist interpretations.


INTRODUCTION
As the global devastation of the coronavirus disease 2019 (COVID-19) pandemic continues, the need for effective screening methods has grown. A crucial step in the containment of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus causing the COVID-19 pandemic is effective screening of patients in order to provide immediate treatment, care, and isolation precautions. While the main screening method is reverse transcription polymerase chain reaction (RT-PCR) testing (1), recent studies have shown that the sensitivity of such tests can be relatively low and highly variable depending on how or when the specimen was collected (2)(3)(4)(5).
Chest X-ray (CXR) radiography is a complementary screening method to RT-PCR that has seen growing interest and increased usage in clinical institutes around the world. Studies have shown characteristic pulmonary abnormalities in SARS-CoV-2 positive cases such as ground-glass opacities, bilateral abnormalities, and interstitial abnormalities (6)(7)(8)(9)(10). Compared to other imaging modalities, CXR equipment is readily available in many healthcare facilities, is relatively easy to decontaminate, and can be used in isolation rooms to reduce transmission risk (11) due to the availability of portable CXR imaging systems (12). More importantly, CXR imaging is a routine clinical procedure for respiratory complaints (13), and thus is frequently conducted in parallel with viral testing to reduce patient volume.
Despite the growing interest and usage of CXR radiography screenings in the COVID-19 clinical workflow, a challenge faced by clinicians and radiologists during CXR screenings is differentiating between SARS-CoV-2 positive and negative infections. More specifically, it has been found that potential indicators for SARS-CoV-2 infections may also be present in non-SARS-CoV-2 infections, and the differences in how they present can also be quite subtle. As such, computer-aided screening systems are highly desired for assisting frontline healthcare workers to streamline the COVID-19 clinical workflow by more rapidly and accurately interpreting CXR images to screen for COVID-19 cases.
In this work, we introduce COVID-Net CXR-2, an enhanced deep convolutional neural network design for COVID-19 chest X-ray detection built on a greater quantity and diversity of patients than the original COVID-Net network design (14). To facilitate this, we introduce a benchmark dataset that is, to the best of the authors' knowledge, the largest, most diverse open access COVID-19 CXR cohort, with patients from at least 51 countries. We leverage explainability-driven performance validation to audit COVID-Net CXR-2 in a transparent and responsible manner to ensure the decision-making behavior is based on relevant visual indicators for improving trust in its usage. Furthermore, radiologist validation was conducted where select cases were reviewed and reported on by two board-certified radiologists with over 10 and 19 years of experience, respectively.
While not a production-ready solution, we hope the open-source, open-access release of COVID-Net CXR-2 and the respective CXR benchmark dataset will help encourage researchers, clinical scientists, and citizen scientists to accelerate advancements and innovations in the fight against the pandemic.
The paper is organized as follows. Section 2 describes the underlying methodology behind the construction of the proposed COVID-Net CXR-2 as well as the preparation of the benchmark dataset. Section 3 presents and discusses the efficacy and decision-making behavior of COVID-Net CXR-2 from both a quantitative perspective as well as a qualitative perspective. Finally, conclusions are drawn in Section 4.

METHODOLOGY
In this study, we introduce COVID-Net CXR-2, an enhanced deep convolutional neural network design for detection of COVID-19 from chest X-ray images. To train and test the network, we further introduce a new CXR benchmark dataset which represents the largest, most diverse open access COVID-19 CXR dataset available, spanning a multinational patient cohort from at least 51 countries. All methods and experimental protocols in this study were carried out in accordance with the Tri-Council Policy Statement (TCPS2) and the University of Waterloo Research Integrity guidelines. The study has received ethics clearance from the University of Waterloo (42235). Data used in this study was curated by several organizations and initiatives from around the world with their own respective ethics clearance and informed consent. The details regarding data preparation, network design, and explainability-driven performance validation are described below.

Benchmark Dataset Preparation
To train and evaluate COVID-Net CXR-2, we first created a new CXR benchmark dataset with example images shown in Figure 1, unifying patient cohorts from several organizations and initiatives from around the world (50)(51)(52)(53)(54)(55)(56). The new CXR benchmark dataset comprises 19,203 CXR images from a multinational cohort of 16,656 patients from at least 51 countries, making it the largest, most diverse COVID-19 CXR dataset available in open access form to the best of the authors' knowledge. In terms of data and patient distribution, there are a total of 5,210 images from 2,815 SARS-CoV-2 positive patients and 13,993 images from 13,851 SARS-CoV-2 negative patients. The negative patient cases comprise both no pneumonia and non-SARS-CoV-2 pneumonia patient cases, with 8,418 no pneumonia images from 8,300 patients and 5,575 non-SARS-CoV-2 pneumonia images from 5,551 patients. The distribution of CXR images in the benchmark dataset for SARS-CoV-2 negative and positive cases is shown in Figure 2, with the respective patient distribution shown in Figure 3. Select patient cases from the benchmark dataset were reviewed and reported on by two board-certified radiologists with over 10 and 19 years of experience, respectively.
The COVID-Net CXR-2 network is evaluated on a balanced test set of 200 SARS-CoV-2 positive images from 178 patients and 200 SARS-CoV-2 negative images from 100 no pneumonia and 100 non-SARS-CoV-2 pneumonia patient cases. The test images were randomly selected from international patient cohorts curated by the Radiological Society of North America (RSNA) (50,51), with the cohorts collected and expertly annotated by an international group of scientists and radiologists from different institutes around the world. The test set was selected in such a way as to ensure no patient overlap between the training and test sets. Table 1 summarizes the demographic variables and imaging protocol variables of the CXR data in the benchmark dataset. It can be observed that the patient cases in the cohort used in the benchmark dataset are distributed across the different age groups, with a mean age of 46.89 years and the largest number of patients falling between the ages of 50 and 59.
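The no-overlap requirement can be met by splitting at the patient level rather than the image level. The following is a minimal sketch of that idea, not the authors' preparation script: the record layout `(patient_id, image_name)`, the function name `split_by_patient`, and the plain seeded shuffle are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_patient(records, n_test_patients, seed=0):
    """Split (patient_id, image_name) records so no patient spans both sets."""
    by_patient = defaultdict(list)
    for patient_id, image_name in records:
        by_patient[patient_id].append(image_name)

    patient_ids = sorted(by_patient)          # sort first for reproducibility
    random.Random(seed).shuffle(patient_ids)
    test_ids = set(patient_ids[:n_test_patients])

    train = [(p, img) for p in patient_ids if p not in test_ids
             for img in by_patient[p]]
    test = [(p, img) for p in patient_ids if p in test_ids
            for img in by_patient[p]]
    return train, test


# Hypothetical records: patient "p1" has two images, the others one each.
records = [("p1", "a.png"), ("p1", "b.png"), ("p2", "c.png"), ("p3", "d.png")]
train, test = split_by_patient(records, n_test_patients=1)
train_patients = {p for p, _ in train}
test_patients = {p for p, _ in test}
```

Because whole patients are assigned to one side or the other, every image of a given patient ends up in the same set, whichever patient the shuffle selects.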
The benchmark dataset, along with all data generation and preparation scripts, is available in an open source manner at http://www.covid-net.ml.

Network Design and Learning
Leveraging the aforementioned benchmark dataset, we built COVID-Net CXR-2 to be tailored for COVID-19 case detection from CXR images using machine-driven design. The machine-driven design exploration strategy automatically discovers highly customized and unique macroarchitecture and microarchitecture designs to optimize the trade-off between accuracy and efficiency. More specifically, the concept of generative synthesis (57) was leveraged, where the macroarchitecture and microarchitecture designs of a tailored deep neural network architecture are determined by an optimal generator G whose generated deep neural network architectures {N_s | s ∈ S} maximize a universal performance function U [e.g., (58)], with constraints imposed on a set of operational requirements as defined by an indicator function 1_r(·), where S denotes a set of seeds to the generator. For the purpose of building COVID-Net CXR-2, the constraints imposed via the indicator function 1_r(·) were: (1) sensitivity ≥ 95%, and (2) positive predictive value (PPV) ≥ 95%. A number of observations can be made about the proposed COVID-Net CXR-2 deep convolutional neural network architecture design shown in Figure 4. It can be observed that the proposed COVID-Net CXR-2 deep convolutional neural network possesses a light-weight network architecture design that exhibits notable diversity from both a macroarchitecture and microarchitecture design perspective. More specifically, the COVID-Net CXR-2 architecture design possesses a diverse mix of point-wise and depth-wise convolutions, and a very sparing use of conventional convolutions at the input stage of the architecture. In particular, the network design leverages light-weight design patterns in the form of projection-replication-projection-expansion (PRPE) patterns to provide enhanced representational capabilities while maintaining low architectural and computational complexities.
More specifically, the PRPE design pattern replicates the input feature representations and disentangles these learned features through the use of depthwise convolutions before mixing them via a series of pointwise convolutions. This allows for more efficient representation learning while maintaining strong representational performance. The key difference between PRPE and PRPE-S blocks is that PRPE-S blocks further reduce spatial dimensionality through the introduction of a strided depthwise convolution, improving computational efficiency at appropriate positions in the network architecture. Furthermore, sparse use of long-range connections can also be observed in the network architecture design to strike a good balance between architectural and computational efficiency and representational capacity. The strong balance between efficiency and accuracy achieved by the proposed network highlights the utility of machine-driven design exploration for tailored architectures beyond the capabilities of manual, human designs. It is important to note that, compared to the COVID-Net network architecture (14), the COVID-Net CXR-2 architecture design has very different macroarchitecture and microarchitecture designs. These differences include the use of replicators within a PRPE design pattern to greatly reduce computational complexity, the use of two different forms of design patterns (PRPE and PRPE-S) as opposed to the more limited design patterns in COVID-Net, and far fewer parameters at each stage of the network architecture for greater computational and architectural efficiency. Finally, it is important to note that the network design leverages a more complex flatten layer as opposed to a less complex global average pooling layer, which illustrates how the machine-driven design exploration strategy takes into account different factors when optimizing for overall trade-offs between accuracy and efficiency.
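To see why a mix of depthwise and pointwise convolutions keeps architectural complexity low, the weight count of a conventional convolution can be compared with that of a depthwise convolution followed by a pointwise one. This is a generic illustration and not the PRPE block itself: the kernel size and channel counts are arbitrary assumptions chosen only for the arithmetic, the function names are hypothetical, and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a conventional k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_pointwise_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed example sizes: 3 x 3 kernel, 128 input and 128 output channels.
standard = conv_params(3, 128, 128)
separable = depthwise_pointwise_params(3, 128, 128)
ratio = separable / standard
print(standard, separable, round(ratio, 3))
```

Under these assumed sizes the depthwise-plus-pointwise pair needs roughly an eighth of the weights of the conventional layer, which is the kind of saving that light-weight design patterns exploit.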
Training was conducted using a binary cross-entropy loss and Adam optimization with a learning rate of 1e-5 and a batch size of 8 for 40 epochs. The final model was selected by tracking the validation accuracy throughout training and employing early stopping. All construction, training, and evaluation were conducted in the TensorFlow deep learning framework. As a pre-processing step, the CXR images were cropped (top 8% of the image) prior to training and testing to better mitigate commonly found embedded textual information. The CXR images were then resampled to 480 × 480 and normalized to the range [0, 1] via division by 255. Furthermore, data augmentation was leveraged during training with the following augmentation types: translation (±10% in x and y directions), rotation (±10°), horizontal flip, zoom (±15%), and intensity shift (±10%). A batch re-balancing strategy was introduced to promote better distribution of SARS-CoV-2 positive cases and SARS-CoV-2 negative cases at a batch level.
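The pre-processing steps and the idea of batch re-balancing can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the released TensorFlow pipeline: the function names are hypothetical, nearest-neighbour resampling is assumed because the text does not name an interpolation method, and the 50% positive share per batch is an assumed value that the text does not specify.

```python
import random


def preprocess(image, size=480, crop_top=0.08):
    """Crop the top rows, resample to size x size, and scale to [0, 1].

    `image` is a list of rows of 8-bit grayscale values. Nearest-neighbour
    resampling is an assumption; the text does not name the interpolation.
    """
    start = int(len(image) * crop_top)        # drop the top 8% of rows
    cropped = image[start:]
    h, w = len(cropped), len(cropped[0])
    return [
        [cropped[r * h // size][c * w // size] / 255.0 for c in range(size)]
        for r in range(size)
    ]


def balanced_batch(positives, negatives, batch_size=8, pos_fraction=0.5,
                   rng=random):
    """Draw a batch with a fixed share of positive cases (share is assumed)."""
    n_pos = int(batch_size * pos_fraction)
    batch = rng.sample(positives, n_pos) + rng.sample(negatives,
                                                      batch_size - n_pos)
    rng.shuffle(batch)
    return batch


# Synthetic 100 x 120 image with constant intensity 255.
dummy = [[255] * 120 for _ in range(100)]
out = preprocess(dummy)

rng = random.Random(0)
batch = balanced_batch([("pos", i) for i in range(20)],
                       [("neg", i) for i in range(50)], rng=rng)
```

Sampling each class separately fixes the class mix of every batch regardless of how imbalanced the full training set is.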
The COVID-Net CXR-2 network and associated scripts are available in an open source manner at http://www.covid-net.ml.

FIGURE 4 | The proposed COVID-Net CXR-2 architecture design. The COVID-Net CXR-2 design exhibits high architectural diversity and sparse long-range connectivity, with macro and microarchitecture designs tailored specifically for the detection of COVID-19 from chest X-ray images. The network design leverages light-weight design patterns in the form of projection-replication-projection-expansion (PRPE) patterns to provide enhanced representational capabilities while maintaining low architectural and computational complexities.

Explainability-Driven Performance Validation
The trained COVID-Net CXR-2 network was audited to gain deeper insights into its decision-making behavior and ensure that it is driven by clinically relevant indicators rather than erroneous cues such as imaging artifacts and embedded metadata. We leveraged GSInquire (59) to conduct explainability-driven performance validation as it was shown to provide state-of-the-art explanations compared to other methods in the literature, including gradient-based explainability methods such as Expected Gradients, which has itself been shown to outperform methods like Grad-CAM (60). In this work, we define explainability as the ability to obtain an explanation of the key factors from the input data that the model relied on to produce an output prediction and decision, presented in a way that a human can understand and interpret the results. More specifically, GSInquire takes advantage of the generative synthesis (57) strategy leveraged during machine-driven design exploration to identify and visualize the critical factors that COVID-Net CXR-2 uses to make predictions. Insights are gained through an inquisitor I within a generator-inquisitor pair {G, I}, where the generator G is the optimal generator used to generate COVID-Net CXR-2 as shown in Equation (1). More specifically, the inquisitor function is defined as I(G; θ_I), parameterized by θ_I, that, given the generator G, produces a set of parameter changes denoted by θ_G = I(G). The insights gained by the inquisitor I are not only used to improve the generated deep neural networks but can also be leveraged to interpret decisions made by the generated network.
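Equation (1) is referenced above but is not reproduced in this text. Based on the description of generative synthesis in the Network Design and Learning section, the constrained optimization can be written as below. This is a reconstruction from the prose, so the exact notation in the original formulation (57) may differ.

```latex
G = \max_{G} \; U\big(G(s)\big)
\quad \text{subject to} \quad 1_r\big(G(s)\big) = 1, \quad \forall s \in S
\tag{1}
```

Here $N_s = G(s)$ is the network generated from seed $s$, and for COVID-Net CXR-2 the indicator $1_r(\cdot)$ equals 1 only when both sensitivity ≥ 95% and PPV ≥ 95% hold.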
Compared to other explainability methods in literature such as Grad-CAM (60) that produce relative heat maps that visualize variations in potential importance within an image, GSInquire has a unique capability of surfacing specific critical factors within an image that quantitatively impact the decisions made by the deep neural network. This makes the explanations easier to interpret objectively and better reflects the decision-making process of the deep neural network for validation purposes.
Explainability-driven performance validation is crucial for improved transparency and trust, particularly in healthcare applications such as clinical decision support. It can also help clinicians to uncover new insights into key visual indicators associated with COVID-19 to improve screening accuracy. To further audit the results for COVID-Net CXR-2, select patient cases from the explainability-driven performance validation were further reviewed and reported on by two board-certified radiologists (A.S. and A.A.). The first radiologist (A.S.) has over 10 years of experience, while the second radiologist (A.A.) has over 19 years of radiology experience.

RESULTS AND DISCUSSION
To explore and evaluate the efficacy of the proposed COVID-Net CXR-2 deep convolutional neural network design for detecting COVID-19 cases from CXR images, we conducted a quantitative performance analysis to assess its architectural and computational complexity as well as its detection performance on the benchmark dataset. We further explored its decision-making behavior using an explainability-driven performance validation approach to audit COVID-Net CXR-2 in a transparent and responsible manner. The quantitative and qualitative results are presented and discussed in detail below.

Quantitative Analysis
Let us first explore the quantitative performance and underlying complexity of the proposed COVID-Net CXR-2 deep neural network architecture tailored for the detection of COVID-19 cases from CXR images. For comparison purposes, we also provide quantitative results on the test data for COVID-Net (14), which was shown to provide state-of-the-art performance for COVID-19 detection when compared to other methods in research literature, and other state-of-the-art architectures commonly leveraged in computer vision including InceptionResNetV2 (61), ResNet-50 (62), InceptionV3 (63), and DenseNet201 (64). The COVID-Net CXR-2 network builds upon the originally proposed COVID-Net architecture by offering a tailored network for SARS-CoV-2 detection of lower complexity and higher detection performance as a result of training on a larger, more diversified dataset. In addition, the COVID-Net CXR-2 network is leveraged for binary SARS-CoV-2 positive and negative detection in regard to physician priorities, while the original COVID-Net network is utilized for normal, pneumonia, and SARS-CoV-2 multi-class classification. The state-of-the-art deep neural networks referenced in this study were trained using the same proposed CXR benchmark dataset, with the same hyperparameters including binary cross-entropy loss and Adam optimizer tuned to a learning rate of 1e-5 and batch size of 8 for 40 epochs for optimal performance. The architectural and computational complexity of COVID-Net CXR-2 in comparison is shown in Table 2, with quantitative performance results shown in Table 3. It can be observed from the results that the COVID-Net CXR-2 network achieves overall the highest SARS-CoV-2 sensitivity, area under ROC curve (AUC), and accuracy in comparison to other state-of-the-art architectures while maintaining significantly lower network complexity. 
Specifically, the COVID-Net CXR-2 network achieved an architectural complexity of 8.8M parameters and a computational complexity of 5.6G MACs, ∼25% and ∼84% lower than the least and most complex comparison architectures (COVID-Net and InceptionResNetV2, respectively). In addition, the proposed COVID-Net CXR-2 architecture achieved the highest test accuracy of 96.3%, highest area under ROC curve (AUC) of 99.4%, and highest SARS-CoV-2 sensitivity of 95.5%. In comparison to the other networks, COVID-Net CXR-2 achieved 2% higher sensitivity than the next best-performing architecture, COVID-Net, and 10.2% higher than the ResNet-50 deep network. With respect to positive predictive value (PPV), the COVID-Net CXR-2 network achieved a lower performance than the COVID-Net network at 97.0% test PPV, but was still able to outperform the other comparison architectures by at least 1.6%. This trade-off of higher sensitivity gained by COVID-Net CXR-2 compared to COVID-Net in exchange for a decrease in PPV (which is still quite high for COVID-Net CXR-2 at 97.0%) is a reasonable one given that a higher sensitivity results in fewer missed SARS-CoV-2 positive patient cases during the screening process. This is very important from a clinical perspective in controlling the spread of the SARS-CoV-2 virus during the ongoing COVID-19 pandemic in light of the new highly infectious variants. Finally, Table 4 and Figure 5 provide a more detailed picture of the performance of COVID-Net CXR-2 via the confusion matrix and receiver operating characteristic (ROC) curve.
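The relationship between the reported sensitivity, PPV, and accuracy can be checked with a few lines of arithmetic. Table 4 is not reproduced in this text, so the counts below are inferred from the reported rates and the balanced 200/200 test set rather than quoted from the paper; the function name is hypothetical.

```python
def screening_metrics(tp, fn, fp, tn):
    """Sensitivity, positive predictive value, and accuracy from raw counts."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, ppv, accuracy


# Counts inferred (not quoted) from the reported rates on 200 + 200 test images.
sensitivity, ppv, accuracy = screening_metrics(tp=191, fn=9, fp=6, tn=194)
print(f"{sensitivity:.3f} {ppv:.3f} {accuracy:.4f}")
```

With these inferred counts, 9 missed positive cases and 6 false alarms are consistent with the reported 95.5% sensitivity, 97.0% PPV, and approximately 96.3% accuracy.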

Qualitative Analysis
Examples of patient cases and the associated critical factors identified by GSInquire as the driving factors behind the decision-making behavior of COVID-Net CXR-2 are shown in Figure 6. It can be observed that the network primarily leverages areas in the lungs of the CXR images and is not relying on incorrect factors such as artifacts outside of the body, motion artifacts, and embedded markup symbols. From further investigation into the correctly detected COVID-19 cases, the critical factors typically identified correspond to clinically relevant visual cues such as ground-glass opacities, bilateral abnormalities, and interstitial abnormalities. These observations indicate that the network's decision-making process is generally consistent with clinical interpretation.
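One simple way to put a number on the observation that the critical factors fall mainly within the lungs is to measure how much of a critical-factor mask overlaps a lung mask. This check is a hypothetical illustration and is not part of GSInquire or the audit described in this study: the binary masks, the function name, and the toy data are all assumptions.

```python
def fraction_inside(critical_mask, lung_mask):
    """Share of critical-factor pixels that fall inside the lung mask."""
    inside = total = 0
    for crit_row, lung_row in zip(critical_mask, lung_mask):
        for crit, lung in zip(crit_row, lung_row):
            if crit:
                total += 1
                inside += 1 if lung else 0
    return inside / total if total else 0.0


# Toy 3 x 4 masks: three highlighted pixels, two of them inside the lungs.
critical = [[0, 1, 0, 0],
            [0, 1, 0, 1],
            [0, 0, 0, 0]]
lungs = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
share = fraction_inside(critical, lungs)
```

A share close to 1 across many cases would support the qualitative finding, while a low share would flag cases where the network may be relying on cues outside the lungs.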
This explainability-driven performance validation process is important for a number of reasons from the perspectives of transparency, dependability, and trust. First, this process enabled us to audit and validate that COVID-Net CXR-2 exhibits dependable decision-making behavior since it is not only guided by clinically relevant visual indicators, but more importantly it is not dependent on erroneous visual indicators such as imaging artifacts, embedded markup symbols, and embedded text in the CXR images. This ensures the network does not make the right decisions for the wrong reasons. Second, this validation process allows for the discovery and identification of potential new insights into what types of clinically relevant visual indicators are particularly useful for differentiating between SARS-CoV-2 infections and non-SARS-CoV-2 infections. Such discoveries could be useful information for aiding clinicians and radiologists in better detecting SARS-CoV-2 infection cases during the clinical decision process. Finally, by validating the behavior of COVID-Net CXR-2 in a transparent and responsible manner, one can provide greater transparency and garner greater trust from clinicians and radiologists during usage in their screening process to make faster yet accurate assessments.
These quantitative and qualitative results show that COVID-Net CXR-2 not only provides strong COVID-19 detection performance, but also exhibits clinically relevant decision-making behavior.

Radiologist Analysis
The expert radiologist findings and observations with regard to the critical factors identified by GSInquire for select patient cases shown in Figure 6 are as follows. In both cases, COVID-Net CXR-2 detected them to be patients with SARS-CoV-2 infection, which were clinically confirmed.
Case 1. According to radiologist findings, it was observed by both radiologists that there is an opacity at the right lung base, which is consistent with one of the identified critical factors leveraged by COVID-Net CXR-2. Additional imaging would be recommended by both radiologists.
Case 2. According to radiologist findings, it was observed by both radiologists that there are opacities in the right midlung and left paratracheal region that coincide with the identified critical factors leveraged by COVID-Net CXR-2 in that region. Additional imaging would be recommended by one of the radiologists.
As such, based on the radiologist findings and observations on the two patient cases, it was demonstrated that several of the identified critical factors leveraged by COVID-Net CXR-2 are consistent with radiologist interpretation.

CONCLUSION
In this study, we introduced COVID-Net CXR-2, an enhanced deep convolutional neural network design tailored for COVID-19 detection from CXR images that is built based on a greater quantity and diversity of patient cases than the original COVID-Net. A new benchmark dataset of CXR images representing a multinational cohort of 16,656 patients from at least 51 countries was also introduced, which is the largest, most diverse COVID-19 CXR dataset in open access form to the best of the authors' knowledge. Experimental results demonstrate that COVID-Net CXR-2 can not only achieve strong COVID-19 detection performance in terms of accuracy, sensitivity, and PPV, but also exhibit behavior consistent with clinical interpretation during an explainability-driven performance validation process, which was further validated based on radiologist interpretation. The hope is that the release of COVID-Net CXR-2 and its respective benchmark dataset in an open source manner will help encourage researchers, clinical scientists, and citizen scientists to accelerate advancements and innovations in the fight against the pandemic. Potential limitations of the proposed work include demographic imbalances that can affect how the network makes decisions for particular patient groups, and limited data quantity in the current benchmark dataset that may lead to potential biases in the network's decision-making process. Further work involves the continued improvement of the benchmark dataset and architecture design, as well as exploration into other clinical workflow tasks (e.g., severity assessment, treatment planning, and resource allocation) and other imaging modalities (e.g., computed tomography and point-of-care ultrasound).
Furthermore, we aim to conduct more comprehensive auditing of both the benchmark dataset and the deep neural network to identify potential decision-making biases and potential gaps in the trustworthiness of the decision-making process, as well as to conduct cross-validation experiments on a larger benchmark dataset as it becomes available.

DATA AVAILABILITY STATEMENT
The benchmark dataset, along with all data generation and preparation scripts, is available in an open source manner at: http://www.covid-net.ml.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the University of Waterloo (42235). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
MP, NT, AC, and AW conceived the experiments. MP, NT, AZ, SS, HA, and HG conducted the experiments. AA and AS reviewed and reported on select patient cases and corresponding explainability results illustrating the model's decision-making behavior. All authors analyzed the results. All authors reviewed the manuscript. All authors contributed to the article and approved the submitted version.