An AI-Based Algorithm for the Automatic Classification of Thoracic Radiographs in Cats

An artificial intelligence (AI)-based computer-aided detection (CAD) algorithm to detect some of the most common radiographic findings in the feline thorax was developed and tested. The training database comprised radiographs acquired at two different institutions, and only correctly exposed and positioned radiographs were included. The presence of several radiographic findings was recorded, and the findings sufficiently represented to be used for training were: no findings, bronchial pattern, pleural effusion, mass, alveolar pattern, pneumothorax, and cardiomegaly. Multi-label convolutional neural networks (CNNs) were used to develop the CAD algorithm, and the performance of two CNN architectures, ResNet 50 and Inception V3, was compared. Both architectures had an area under the receiver operating characteristic curve (AUC) above 0.9 for alveolar pattern, bronchial pattern, and pleural effusion; above 0.8 for no findings and pneumothorax; and above 0.7 for cardiomegaly. The AUC for mass was low (above 0.5) for both architectures. No significant differences in diagnostic accuracy were found between the two architectures.


INTRODUCTION
Plain radiographs are a widely used diagnostic imaging tool in routine veterinary practice for investigating the thorax of small animals. Despite the increasing availability of more advanced imaging techniques, such as computed tomography, plain radiography is in most cases the first screening technique for thoracic disease, and the decision whether to perform additional, more advanced imaging is often based on its results. The correct interpretation of plain radiographs is therefore paramount for prescribing successful treatment. However, in human medicine, the reported incidence of interpretation errors among trained radiologists is still around 10-15% (1)(2)(3), whereas the incidence of interpretation errors in veterinary medicine has not yet been reported.
Several strategies to reduce the incidence of interpretation errors have been proposed. Non-technological solutions, such as structured reports, reduced multitasking, and double reading, alone or combined, have been reported to decrease inattention-related errors, while technological solutions such as eye-tracking technologies and computer-aided detection (CAD) software have also been proposed (3). The increasing availability of high-performance computers has driven current research toward the development of artificial intelligence (AI)-based CAD systems (4,5). Indeed, AI applications in radiology are a major field of research attracting massive ongoing investment (6,7). In human medicine, the main applications of AI to plain radiographs currently concern the automatic detection of findings or pathologies (7,8). In the veterinary field, the potential of AI-based algorithms to detect radiographic findings has been explored in dogs over the last few years (9)(10)(11)(12), but not yet in cats.
Therefore, the aims of this study were: (1) to develop an AI-based CAD algorithm to automatically detect some of the most common radiographic findings in feline thoracic radiographs; and (2) to compare the diagnostic accuracy of two of the most commonly used convolutional neural network (CNN) architectures on our database.

MATERIALS AND METHODS

Database Creation
The picture archiving and communication systems (PACS) of both institutions were queried for all feline thoracic radiographs acquired at the Veterinary Teaching Hospital of the University of Padua between June 2010 and March 2021 and at the Pedrani Veterinary Clinic between December 2018 and November 2019. Three different X-ray units were used. At the Veterinary Teaching Hospital of the University of Padua, a Kodak Point of Care CR-360 System (Carestream Health Inc.) was used from June 2010 to June 2018, whereas a FDR D-EVO 1200 G43 digital radiography (DR) system (Fujifilm Corporation) is currently in use. At the Pedrani Veterinary Clinic, an Isomedic RT 800 MA X-ray unit (Isomedic S.r.l.) was used.

Radiographic Findings
Each image was evaluated by two of the authors, TB and AZ, with over 10 and 20 years' experience in small animal diagnostic imaging, respectively. The two authors reviewed the radiographs together, and the interpretation was agreed upon following a consensus discussion. Only correctly positioned and exposed radiographs of skeletally mature cats were included in the database. The radiographic findings were annotated for each radiograph in a standardized fashion. In particular, the presence of the following findings was recorded: alveolar pattern, bronchial pattern, interstitial pattern, cardiomegaly, pleural effusion, pneumothorax, mass, fracture, hernia, megaoesophagus, pneumomediastinum, and subcutaneous emphysema (pneumoderma). If a radiograph was within normal limits, a "no findings" tag was applied. Bronchial and interstitial patterns were recorded as either present or absent, without a grading score. Cardiomegaly was defined according to the recommendations reported in the literature (13); in particular, all cats with abnormalities in both the size and shape of the cardiac silhouette (e.g., bulging of the right or left atrium) were classified as having cardiomegaly. When possible, the cardiac silhouette was evaluated on both lateral and ventrodorsal or dorsoventral projections. If cardiomegaly was detected in any of the available projections, all radiographs of the same animal were labeled as cardiomegaly, even if it was not evident in every projection. Both diffuse and segmental megaoesophagus were classified as megaoesophagus. The site of fractures was not recorded. Radiographs showing fractures of the hind limbs were discarded.
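The per-animal cardiomegaly rule can be expressed as a small label-propagation step. The sketch below is illustrative only: the record layout (keys `animal_id`, `projection`, `findings`) and the function name `propagate_cardiomegaly` are hypothetical, and removing the "no findings" tag from propagated radiographs is an assumption made here to keep the labels consistent, not a step stated in the study.

```python
def propagate_cardiomegaly(records):
    """Label every radiograph of an animal with cardiomegaly if any of its
    projections shows cardiomegaly.

    Each record is a dict with hypothetical keys 'animal_id', 'projection'
    and 'findings' (a set of tags). Returns new records; inputs are not modified.
    """
    # Animals with cardiomegaly on at least one available projection.
    affected = {r["animal_id"] for r in records if "cardiomegaly" in r["findings"]}
    propagated = []
    for r in records:
        findings = set(r["findings"])  # copy so the input set is untouched
        if r["animal_id"] in affected:
            findings.add("cardiomegaly")
            # Assumption: a radiograph labeled with cardiomegaly cannot keep "no findings".
            findings.discard("no findings")
        propagated.append({**r, "findings": findings})
    return propagated


records = [
    {"animal_id": "cat-01", "projection": "LL", "findings": {"no findings"}},
    {"animal_id": "cat-01", "projection": "VD", "findings": {"cardiomegaly"}},
    {"animal_id": "cat-02", "projection": "LL", "findings": {"bronchial pattern"}},
]
for r in propagate_cardiomegaly(records):
    print(r["animal_id"], r["projection"], sorted(r["findings"]))
# cat-01 LL ['cardiomegaly']
# cat-01 VD ['cardiomegaly']
# cat-02 LL ['bronchial pattern']
```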

Image Analysis
The images were stored in the lossless MHA format before being fed to the data loader. The processing pipeline started by resizing the images to 224 × 224 pixels and then normalizing them to the 0-1 range. Classification was performed using convolutional neural networks (CNNs), a group of deep-learning architectures used for image classification, segmentation, and registration. Two CNN architectures were evaluated, namely ResNet-50 (14) and Inception V3 (IncV3) (15). The network weights were initialized by pre-training on the ImageNet database and then fine-tuned on our database. A multi-label approach was chosen because different radiographic findings are often present simultaneously, and binary cross-entropy was used as the cost function. Both networks shared the same training hyperparameters, and training was performed using the Adam optimizer with an exponentially decaying learning rate until convergence. The model state from the epoch with the lowest validation loss was chosen for testing. The training cases were augmented by random cropping, affine warping, flips, and contrast changes. These augmentations apply random transformations to increase dataset diversity, a standard technique for improving the generalizability of deep networks by reducing the risk of overfitting the training set. The images were randomly split into training, validation, and test sets with an 8:1:1 ratio, using an algorithm that maintained the same ratio among the different tags in each set. Information regarding the institution of origin was not used for training. The performance of the trained model on the test set is reported; no cross-validation was used. A purpose-built deep-learning workstation equipped with four graphics processing units was used.
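The multi-label set-up described above can be made concrete with a few plain-Python helpers. This is a minimal sketch, not the authors' code: it assumes per-image min-max scaling for the 0-1 normalization (the exact normalization is not specified), encodes each radiograph's tags as a multi-hot vector over the seven findings retained for training, computes the mean binary cross-entropy of independent per-finding sigmoid outputs, and shows an exponential learning-rate schedule and the lowest-validation-loss checkpoint rule. The initial learning rate and decay rate in the usage lines are hypothetical; the study does not report them.

```python
import math

# The seven findings retained for training, one output unit per finding.
FINDINGS = [
    "no findings", "bronchial pattern", "pleural effusion", "mass",
    "alveolar pattern", "pneumothorax", "cardiomegaly",
]


def min_max_normalize(pixels):
    """Rescale pixel intensities to [0, 1] (per-image min-max is an assumption)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # constant image: avoid division by zero
        return [0.0] * len(pixels)
    return [(p - lo) / (hi - lo) for p in pixels]


def multi_hot(tags):
    """Encode the set of findings on one radiograph as a 0/1 target vector."""
    return [1.0 if finding in tags else 0.0 for finding in FINDINGS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over independent per-finding sigmoid outputs."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clip to keep log() finite
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(targets)


def exp_decay_lr(initial_lr, decay_rate, epoch):
    """Exponentially decaying learning rate: initial_lr * decay_rate ** epoch."""
    return initial_lr * decay_rate ** epoch


def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss (the checkpoint kept for testing)."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# Usage: a radiograph showing both bronchial pattern and pleural effusion.
target = multi_hot({"bronchial pattern", "pleural effusion"})
print(target)                       # [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(bce_loss([0.0] * 7, target))  # ln 2, about 0.693, for an all-zero output
print(exp_decay_lr(1e-3, 0.9, 10))  # hypothetical schedule values
```

Because each finding has its own sigmoid output, the probabilities do not have to sum to one, which is what allows several findings to be predicted on the same radiograph. In a deep-learning framework the same loss is usually taken from a built-in, for example PyTorch's `torch.nn.BCEWithLogitsLoss`, which combines the sigmoid and the cross-entropy.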

RESULTS

Database
A total of 1,637 latero-lateral (LL) and 1,105 ventro-dorsal (VD) radiographs were retrieved. Of these, 575 LL and 426 VD radiographs were discarded because of poor positioning or incorrect exposure, so the final database comprised 1,062 LL and 679 VD radiographs. Because of the limited number of available VD radiographs, the CNNs were trained only on the LL radiographs. The number of radiographs showing each radiographic finding is reported in Table 1.
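Applied to the 1,062 LL radiographs, the 8:1:1 split described in the image-analysis section corresponds to roughly 850 training, 106 validation, and 106 test images. The study does not specify its ratio-preserving algorithm; the sketch below swaps in a simple label-powerset stratification, in which each distinct combination of findings is treated as one stratum and split 8:1:1 independently. This is only one possible approach, and the function name `stratified_split`, the fixed seed, and the toy records are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(items, key, ratios=(8, 1, 1), seed=0):
    """Split items into (train, validation, test) lists.

    Items are grouped into strata by key(item), e.g. the frozenset of findings
    on a radiograph, and each stratum is shuffled and split according to ratios,
    so tag proportions stay approximately equal across the three subsets.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    total = sum(ratios)
    train, val, test = [], [], []
    for members in strata.values():  # insertion order, so the split is deterministic
        rng.shuffle(members)
        n = len(members)
        n_train = round(n * ratios[0] / total)
        n_val = round(n * ratios[1] / total)
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])
    return train, val, test


# Usage with hypothetical records: (image_id, frozenset of findings).
radiographs = [(i, frozenset({"no findings"})) for i in range(60)] + \
              [(i, frozenset({"pleural effusion", "mass"})) for i in range(60, 100)]
train, val, test = stratified_split(radiographs, key=lambda r: r[1])
print(len(train), len(val), len(test))  # 80 10 10
```

Grouping by the full label combination keeps co-occurring findings together, but rare combinations with only one or two radiographs all end up in the training set; dedicated multi-label methods such as iterative stratification handle such cases better.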

Selection of Radiographic Findings
Some of the recorded radiographic findings (fracture, hernia, megaoesophagus, interstitial pattern, pneumomediastinum, and pneumoderma) were scarcely represented in the database and were therefore excluded from training. Consequently, the findings included for training were: no findings, bronchial pattern, pleural effusion, mass, alveolar pattern, pneumothorax, and cardiomegaly (Figure 1).
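Excluding under-represented findings amounts to dropping tags whose frequency falls below a cut-off. The study does not report a numeric threshold, so `min_count` below is a hypothetical parameter, and the function name `select_findings` is illustrative. The sketch also exposes a side effect worth checking: a radiograph whose only tags are excluded findings is left with an empty label set.

```python
from collections import Counter


def select_findings(label_sets, min_count):
    """Keep only findings present on at least min_count radiographs.

    label_sets: one set of finding tags per radiograph.
    Returns the sorted list of retained findings and the filtered label sets.
    """
    counts = Counter(tag for labels in label_sets for tag in labels)
    kept = sorted(tag for tag, n in counts.items() if n >= min_count)
    kept_set = set(kept)
    filtered = [set(labels) & kept_set for labels in label_sets]
    return kept, filtered


labels = [
    {"bronchial pattern"},
    {"bronchial pattern", "hernia"},
    {"no findings"},
    {"no findings"},
    {"fracture"},
]
kept, filtered = select_findings(labels, min_count=2)
print(kept)         # ['bronchial pattern', 'no findings']
print(filtered[4])  # set(): the only tag on this radiograph was excluded
```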

Classification Results
The complete classification results on the test set for ResNet 50 and IncV3 are reported in Tables 2 and 3, respectively. The DeLong test showed no significant differences between the performances of the two architectures for any of the included radiographic findings. The overall accuracy on the test set was 81.8% for IncV3 and 84.1% for ResNet 50. A visual representation of the analysis results is provided in Figure 2.
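The DeLong test compares two correlated AUCs computed on the same cases, which fits this design, where both architectures were evaluated on the same test set for each finding. The study does not state which implementation was used; the sketch below is a plain-Python version of the standard paired DeLong procedure: AUCs from pairwise placement values, the variance of the AUC difference from their covariances, and a two-sided p-value from the normal distribution. The function and argument names are hypothetical, and the code assumes at least two positive and two negative cases.

```python
import math


def _placements(pos, neg):
    """DeLong placement values for positive and negative cases."""
    def psi(x, y):  # 1 if the positive outranks the negative, 0.5 for ties
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance (n - 1 denominator)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(labels, scores_a, scores_b):
    """Paired DeLong test for two models scored on the same cases.

    labels: 1 if the finding is present, 0 otherwise.
    Returns (auc_a, auc_b, z, two_sided_p).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    a10, a01 = _placements([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    b10, b01 = _placements([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a = sum(a10) / len(a10)
    auc_b = sum(b10) / len(b10)
    m, n = len(pos), len(neg)
    # Variance of (auc_a - auc_b) from the placement-value covariances.
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    if var <= 0:
        z = 0.0 if auc_a == auc_b else math.copysign(math.inf, auc_a - auc_b)
    else:
        z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy example (not study data): four test cases, two with the finding.
auc_a, auc_b, z, p = delong_paired([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2], [0.9, 0.1, 0.3, 0.2])
print(auc_a, auc_b, z, round(p, 3))  # 1.0 0.5 1.0 0.317
```

The pairwise loop is O(m·n), which is fine for a test set of about a hundred radiographs per finding; faster implementations compute the same placement values from ranks.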

DISCUSSION
An AI-based algorithm for the automatic detection of some of the most common radiographic findings in cats was developed. The high classification accuracy on the test set for some of the included radiographic findings, in particular alveolar pattern, bronchial pattern, no findings, pleural effusion, and pneumothorax, suggests that the developed CAD algorithm could potentially assist veterinarians in interpreting feline thoracic radiographs. To fully assess the usefulness of the proposed CNN, the error rate of veterinarians in detecting these radiographic findings should also be investigated. Interestingly, the accuracy of this CAD in detecting these findings was comparable to the results reported for dogs (9,10) and humans (16,17), even though the training database was considerably smaller. A possible explanation is that the greater homogeneity of cats in body size and shape might have reduced the intrinsic variability of the database, enabling the CAD to achieve high accuracy on the test set despite the smaller database. The accuracy for mass detection was low for both CNN architectures. Similarly, in dogs (9,10) the accuracy of CNNs in detecting masses is lower than for the other radiographic findings, whereas it is reported to be high in human studies (16). In the authors' opinion, this difference is most likely due to the presence of several mass-like structures (e.g., nipples, degeneration of costochondral joints) in normal canine and feline thoracic radiographs. Another possible explanation is that the low accuracy might be related to the variable size and location of masses within the thorax, combined with the limited size of the training database.
Accuracy in detecting cardiomegaly was also lower than for the other radiographic findings. The radiographic identification of cardiomegaly in cats is challenging and, although some guidelines are available (13), its interpretation is often very subjective, especially in mild cases. Left atrial enlargement (the so-called "valentine" heart) is a common finding in cats with cardiac disease (13) and is often better detected on dorsoventral than on lateral projections. The low accuracy in detecting cardiomegaly in this study might be related to the fact that the CNN was trained only on lateral images, so information from ventrodorsal or dorsoventral projections was unavailable during training. More generally, current guidelines on the classification, diagnosis, and management of cardiomyopathies in cats (18) state that radiography is an insensitive method for detecting cardiac disease in cats and that cats with congestive heart failure may present with radiologically normal cardiac silhouettes. One way to overcome this limitation could be to train a CNN on feline thoracic radiographs labeled according to the results of echocardiographic examinations, and then to test whether this CNN provides more accurate results than an experienced operator.
A recent study (19) highlighted that, when CNNs are trained on databases from different institutions, their generalization performance depends on the disease prevalence in each database. The same study also showed that CNNs trained on pooled data from different sites performed better on data from those sites but not on external data. In the present study, the training database contained pooled data from two institutions using three different X-ray units. Because of the limited size of the available database, differences in CAD performance between the data from each institution were not tested. However, training models on pooled data from different institutions has been reported to provide better generalization than training on data from a single institution (19).
The two CNN architectures tested in this study, ResNet 50 and IncV3, have been widely used in both human (20) and veterinary medicine (21,22) for the classification of diagnostic images. Both architectures were engineered for the classification of everyday images and do not contain any radiology-specific features. To improve performance, both CNNs were pre-trained on ImageNet (www.image-net.org), a large-scale database of everyday images, and then fine-tuned on the feline database. In the authors' opinion, the high classification accuracy achieved on the test set for several of the included radiographic findings might be at least partially due to the high standardization of radiographic images. Everyday images are often messy: the same subject may appear in different sizes, and objects of different shapes and colors may be in the foreground or background. Radiographs, by contrast, are acquired by skilled personnel in a highly standardized fashion. Notably, no statistically significant differences were evident between the performances of the two CNN architectures for any of the included radiographic findings.
A limitation of this study is that, because of the small database size, fewer radiographic findings were included in training the CAD algorithm than in canine (9)(10)(11)(12) and human studies (16). In the authors' opinion, at this stage of development, the proposed CAD could be most useful in emergency settings, where the prompt identification of some of the included findings, in particular alveolar pattern, pleural effusion, and pneumothorax, is very important. The main advantage of using CNNs to develop CAD systems is that they are relatively easy to implement: once the parsing modes have been defined, individual radiographic findings can be directly selected or excluded for training.
To improve classification performance, only correctly positioned and exposed radiographs were included in the training database. Therefore, the performance of the developed algorithm might differ slightly when it is used on technically incorrect images. Another limitation is that cross-validation was not used and, given the limited size of the available database, different results are to be expected with other random splits. On the other hand, cross-validation is not commonly used when developing CAD systems for the automatic classification of thoracic radiographs, even when the training databases are small (8,12).

CONCLUSIONS
A CAD algorithm for the automatic detection of some radiographic findings in feline thoracic radiographs is proposed. The CAD showed high accuracy in identifying alveolar pattern, bronchial pattern, no findings, pleural effusion, and pneumothorax. The accuracy in identifying cardiomegaly was moderate, whereas the accuracy in identifying masses was low. Using a larger training database could potentially provide more accurate results. The developed CAD can be easily upgraded by simply adding new images to the training, validation, and test sets. Further testing on images acquired with different X-ray equipment will provide more insight into the performance of the developed CAD.

DATA AVAILABILITY STATEMENT
The data sets generated and analyzed during the current study are not publicly available because they are the property of the Veterinary Teaching Hospital of the University of Padua, but they are available from the corresponding author on reasonable request.

ETHICS STATEMENT
This study was conducted in compliance with Italian law 26/2014 (which transposes EU directive 2010/63/EU). As the data used in this study were collected during routine clinical activity, no Ethics Committee approval was needed. Informed consent regarding the processing of personal data was obtained from the owners.

AUTHOR CONTRIBUTIONS
TB conceived the study, evaluated the radiographs, and drafted the manuscript. MW and HM developed the CNNs and drafted the manuscript. AZ, FT, CD, and FS evaluated the radiographs and drafted the manuscript. All authors contributed to the article and approved the submitted version.