Edited by: Satrajit S. Ghosh, Massachusetts Institute of Technology (MIT), USA
Reviewed by: Graham J. Galloway, Translational Research Institute, Australia; Ender Konukoglu, ETH Zurich, Switzerland
*Correspondence: Ricardo A. Pizarro
Xi Cheng
Venkata S. Mattay
†These authors have contributed equally to this work and shared first authorship.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI and yield irreproducible results through both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm to the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that predicts the category of test datasets based on knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach was around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.
High-resolution T1-weighted structural three-dimensional brain magnetic resonance imaging (3D-MRI
Currently, the primary method to assess the quality of 3D-MRI is visual inspection, which can be subjective. Gardner et al. (
Mortamet et al. (
We investigated the utility of a machine-learning algorithm on multi-dimensional, non-linear classification to overcome errors that can arise from a univariate approach. Specifically, the application of a supervised classification algorithm based on a support vector machine (SVM) (Burges,
The analysis pipeline is outlined in Figure
The visual inspection procedure was performed in two stages as shown in Figure
In Stage II, the green and red 3D-MRI volumes were labeled “usable” and “not-usable,” respectively. A group of 5–9 investigators further classified the yellow 3D-MRI volumes as “usable” or “not-usable” based on a majority vote. At the end of Stage II, the 1457 3D-MRI volumes of the dataset were classified by the visual inspection into 1267 “usable” 3D-MRIs and 190 “not-usable” 3D-MRIs. The final evaluation was used as the reference point in order to train and assess the performance of the SVM classifier.
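The Stage II labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `majority_label` and `stage_two_label` are hypothetical, and returning `None` on a tie is an assumption, since the text does not say how tied votes were resolved.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most raters, or None on a tie.

    Tie handling is an assumption; the source does not describe it.
    """
    counts = Counter(votes)
    usable = counts["usable"]
    not_usable = counts["not-usable"]
    if usable == not_usable:
        return None
    return "usable" if usable > not_usable else "not-usable"


def stage_two_label(stage_one_color, votes=()):
    """Map a Stage I color (plus rater votes for yellow volumes) to a final label."""
    if stage_one_color == "green":
        return "usable"
    if stage_one_color == "red":
        return "not-usable"
    # Yellow volumes are decided by the group of investigators.
    return majority_label(votes)


if __name__ == "__main__":
    votes = ["usable", "usable", "not-usable", "usable", "not-usable"]
    print(stage_two_label("yellow", votes))
```

With five raters, as in the usage lines, three "usable" votes are enough to label a yellow volume "usable".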
In the remainder of the Methods,
The accuracy of the classification depends on the features used; hence, defining features is a crucial step in SVM classification. These features fall into two broad categories: volumetric features and artifact-specific features. After the 1457 3D-MRI volumes were preprocessed, these features were computed and used in the SVM classifier algorithm.
First, three different features were computed to quantify the artifacts in each 3D-MRI. The initial goal was to define a volumetric measure that quantifies artifacts throughout the 3D-MRI volume, similar to Mortamet et al. (
A histogram of the intensity values over each 3D-MRI was computed. The histogram of a 3D-MRI has profiles with peaks representing the gray matter, white matter, cerebrospinal fluid (CSF), skull, fat, and background. We expected high-contrast 3D-MRIs to have two well-defined and distinct gray matter and white matter profiles in the histogram, and low-contrast 3D-MRIs to have indistinguishable gray matter and white matter voxels, resulting in overlapping profiles in the histogram.
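The expectation above can be illustrated with synthetic data. The sketch below is not the feature used in this study: the intensities are drawn from two Gaussians standing in for gray and white matter, and `intensity_histogram`, `profile_overlap`, `synthetic_tissue`, and all numeric settings are hypothetical choices made for illustration.

```python
import random


def intensity_histogram(intensities, n_bins=64, lo=0.0, hi=255.0):
    """Count values per equal-width bin over [lo, hi]; out-of-range values are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for value in intensities:
        if lo <= value <= hi:
            index = min(int((value - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


def profile_overlap(gray, white, n_bins=64, lo=0.0, hi=255.0):
    """Fraction of voxels shared by the two tissue histograms (0 = distinct, 1 = identical)."""
    hist_gray = intensity_histogram(gray, n_bins, lo, hi)
    hist_white = intensity_histogram(white, n_bins, lo, hi)
    shared = sum(min(g, w) for g, w in zip(hist_gray, hist_white))
    return shared / min(len(gray), len(white))


def synthetic_tissue(gray_mean, white_mean, spread, n_voxels, seed=0):
    """Draw synthetic gray and white matter intensities (illustrative only)."""
    rng = random.Random(seed)
    gray = [rng.gauss(gray_mean, spread) for _ in range(n_voxels)]
    white = [rng.gauss(white_mean, spread) for _ in range(n_voxels)]
    return gray, white


if __name__ == "__main__":
    high_contrast = synthetic_tissue(100.0, 150.0, 8.0, 2000, seed=1)
    low_contrast = synthetic_tissue(120.0, 130.0, 15.0, 2000, seed=1)
    print("high-contrast overlap:", profile_overlap(*high_contrast))
    print("low-contrast overlap:", profile_overlap(*low_contrast))
```

Well-separated tissue profiles give an overlap near zero, whereas profiles with close means and wide spread share most of their voxels. In real data the tissue labels would have to come from a segmentation step, which this sketch does not attempt.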
The 3D-MRI histogram,
Three class probability maps,
The class map histogram,
The third volumetric feature is a single statistic, denoted as
Artifact-specific features were defined to target the three most prominent artifacts: eye movement, ringing, and aliasing. As shown in Table
| Artifact category | 3D-MRI volumes (n) | Percentage (%) |
| --- | --- | --- |
| No artifacts | 665 | 46 |
| Eye movement or ringing or aliasing | 746 | 51 |
| Other artifacts | 46 | 3 |
| Total | 1457 | 100 |
A
Eye-movement generated artifacts consist of excess noise both inside the brain, and in front of the eyes. To quantify the amount of noise generated by eye movement, a region located directly in front of the eyes was masked. To that end, the location and dimensions of the eye sockets in normalized space (
The
Ringing artifacts contain noise around the top of the brain and outside the head as well. To quantify the ringing artifact, the
The
The aliasing artifact causes the nose and other facial features to contaminate the posterior part of the brain, as illustrated in Figure
Each iteration of the supervised classification procedure consisted of a training stage and a testing stage and is illustrated in Figure
First, the reliability of the artifact-specific features (ASFs) was investigated to assess how well these features captured the artifacts. Second, SVM performance was analyzed by computing the accuracy of the correctly predicted 3D-MRI when compared to the visually inspected category.
ASFs were developed using 3D-MRI volumes that were tagged yellow after Stage I of the visual inspection procedure
ASF1 was compared to the four-level rating scale by summarizing the distributions as a bar plot in Figure
Furthermore, a second look at the 3D-MRI volumes that had a high ASF1 value but were originally labeled by the human experts as having slight or no eye-movement artifacts revealed that they did, in fact, contain heavy eye-movement artifacts.
ASF2 was compared to the four-level rating scale by summarizing the distributions as a box plot in Figure
In addition, a second visual inspection of the 3D-MRI volumes that had a high ASF2 value but were labeled by the human experts as having slight or no ringing revealed that these volumes did, in reality, contain heavy ringing artifacts.
Finally, ASF3 was compared to the four-level rating scale by summarizing the distributions as a bar plot in Figure
Several combinations of the developed features were used as input to the SVM. The combination with the highest accuracy was chosen as the winning set. Accuracy was computed as the percentage of 3D-MRI volumes whose predicted category matched the visual-inspection category. The combinations yielding the highest accuracies are displayed in Figure
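The selection of a winning feature set can be sketched as an exhaustive search over feature combinations, each scored by test accuracy. To keep the example free of external libraries, a nearest-centroid classifier is swapped in for the SVM. It is a stand-in only and not the classifier used in this study. The feature names, the synthetic data, and every function name below are hypothetical.

```python
import random
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subset_accuracy(train, test, columns):
    """Fit the nearest-centroid stand-in on `columns`; return test accuracy in percent."""
    def select(features):
        return [features[c] for c in columns]

    labels = {label for _, label in train}
    centroids = {
        label: centroid([select(f) for f, lab in train if lab == label])
        for label in labels
    }
    correct = 0
    for features, label in test:
        x = select(features)
        predicted = min(centroids, key=lambda lab: squared_distance(x, centroids[lab]))
        correct += predicted == label
    return 100.0 * correct / len(test)


def best_feature_set(train, test, names):
    """Score every non-empty feature combination and return (accuracy, feature names)."""
    scored = []
    for size in range(1, len(names) + 1):
        for combo in combinations(range(len(names)), size):
            scored.append((subset_accuracy(train, test, combo), combo))
    accuracy, combo = max(scored)
    return accuracy, [names[i] for i in combo]


def synthetic_volumes(n_per_class, seed=0):
    """Two informative features and one pure-noise feature (illustrative data only)."""
    rng = random.Random(seed)
    rows = []
    for label, shift in (("usable", 0.0), ("not-usable", 2.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)]
            rows.append((features, label))
    rng.shuffle(rows)
    return rows


if __name__ == "__main__":
    names = ["feature_a", "feature_b", "noise"]  # hypothetical feature names
    data = synthetic_volumes(200, seed=3)
    train, test = data[:300], data[300:]
    print(best_feature_set(train, test, names))
```

Because the search covers every single-feature subset as well, the winning combination can never score below the best individual feature on the same test split.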
Next, the distribution of the ASF results was compared to a random permutation, shown in Figure
Compared to a univariate quality assessment approach that generates a single number, the multivariate approach (together with the global and ROI automated image quality features) presented in this paper is more informative as it provides details that categorize, localize, and quantify the extent of noise in the data. These parameters are key tools for assessing the quality of 3D-MRI in a neuroimaging database, where the brain 3D-MRI volumes are indexed and can be queried according to the artifact type. Moreover, since the features used in the classifier are regional, the affected regions can be highlighted in an automated plugin added to a 3D imaging viewer. This plugin can then be used to improve the efficiency of the quality assessment procedure of 3D-MRIs through visual inspection by helping investigators to focus in on the problematic regions. Such a task is not possible with a global feature that is estimated from the entire volume.
The categories generated by visual inspection were used as the gold standard when developing the automated classification procedure. In visual inspection, an investigator first looks for regions that may contain artifacts, visually rates the level of artifact as heavy, moderate, slight, or none, and then reaches an overall conclusion with an appropriate label (red, yellow, green). The features that produced the highest accuracy were based on first quantifying the artifacts, then using the machine-learning algorithm to classify the 3D-MRIs as usable or not-usable. The multivariate classifier estimates a discriminant hyperplane by incorporating all features rather than attempting to classify by each feature individually. Classifying on each feature individually could produce conflicting decisions, resulting in lower accuracy. The winning automated procedure most closely emulates the visual-inspection procedure by first objectively quantifying each artifact, then estimating the most discriminative hyperplane to classify each 3D-MRI as usable or not-usable. This could potentially explain why the volumetric features, such as the histogram-based features or the gw-t-score, did not perform as well as the artifact-specific features.
The SVM classification approach correctly classified 70.1% of the 3D-MRIs that were assigned “not-usable” by investigator-based visual inspection. In addition, it correctly classified 88.2% of the 3D-MRIs that were assigned “usable” by investigator-based visual inspection. Based on these sensitivity and specificity estimates, the accuracy of our SVM approach and automated image quality features was computed to be around 80%. The remainder of this section is devoted to identifying and discussing the strengths and weaknesses of the current approach and possible reasons for the discrepancy between the sensitivity and specificity measures, with the goal of identifying areas where the current methodology can be improved.
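As a worked illustration of how the two rates relate to a single accuracy figure, the sketch below computes sensitivity and specificity from paired label lists, treating “not-usable” as the positive class, and then averages them. Averaging the two rates (balanced accuracy) is an assumption about how the figure of around 80% was obtained, since the text does not state the formula. The function names are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="not-usable"):
    """Return (sensitivity, specificity) with `positive` as the class of interest."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(sensitivity, specificity):
    """Unweighted mean of the two class-wise rates."""
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Rates reported in the text, in percent.
    print(balanced_accuracy(70.1, 88.2))
```

Under that assumption, the mean of 70.1% and 88.2% is 79.15%, which is consistent with the reported accuracy of around 80%.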
Misclassification can result from human error, SVM classification error, or a combination of the two. Misclassification due to human error is an important motivation for this study and can occur for a number of reasons. Visual inspection of structural 3D-MRI volumes in large databases may need to be shared among a number of experts rather than performed by an individual investigator, which can introduce inter-rater variability in how the 3D-MRI volumes are labeled. Even if the task were assigned to a single investigator, intra-individual variability, though probably smaller than inter-individual variability, is still an issue that needs to be addressed.
Misclassification of a 3D-MRI due to machine error suggests that the developed methods can be further refined to improve performance. One problem that came to light through this work arises when an artifact is not present in the air background and exists outside the targeted region. The ringing artifact, for example, can be present entirely inside the brain without causing any changes in the background part of the image. In this case the value of ASF2 would be small, leading the SVM classifier to infer minimal or no ringing when in reality ringing is corrupting the 3D-MRI. To address this issue, additional image processing tools would need to be developed to distinguish this artifact from the brain.
The power of the artifact-specific features is the automated and flexible nature of these metrics. ASF3 was developed and automated to target aliasing in the coronal direction. However, aliasing can also be present in the sagittal and axial directions. In some of the 3D-MRIs of the dataset analyzed, there was aliasing in the axial direction, with 2–3 axial neck slices wrapping around to the top. The
While the artifact-specific features were developed to target artifacts found in 3D-MRI volumes acquired using a spoiled gradient recalled (SPGR) sequence, they can be used for assessing 3D-MRIs acquired using other pulse sequences. To extract meaningful features, investigators can follow a two-step approach: (1) identify recurring artifacts specific to the acquisition pulse sequence and (2) adapt the automated artifact-specific features to target the most prominent artifacts.
As illustrated in Table
In the current study, the proposed objective metrics were developed to emulate the visual inspection process. It should therefore be noted that even if the proposed method agreed with visual inspection 100% of the time, it would still inherit the limitations of visual inspection. This is because visual inspection, besides being a subjective and inconsistent process, is also limited by the inability of humans to pick up subtle artifacts. For instance, the visual inspection process may not identify 3D-MRIs that cause automatic segmentation algorithms to fail due to contrast or bias artifact issues in the MR images. This is important in algorithms that have multiple steps and extract finely detailed measurements, like FreeSurfer (Fischl,
The automated package presented in this paper can potentially be used for pre-screening, to reduce the amount of work and time investigators spend rating the 3D-MRIs visually. Toward this end, a confidence level can be computed for each classification result so that, if the confidence falls below a chosen threshold, an investigator can visually inspect that dataset.
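The pre-screening idea can be sketched as a simple triage step. The sketch assumes that each classification comes with a confidence value between 0 and 1. How such a value would be derived from the classifier is not specified in the text, and the name `triage` and the default threshold of 0.8 are hypothetical.

```python
def triage(predictions, threshold=0.8):
    """Split classifications into automatically accepted and flagged-for-review lists.

    `predictions` is an iterable of (volume_id, label, confidence) tuples.
    """
    accepted, needs_review = [], []
    for volume_id, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((volume_id, label))
        else:
            needs_review.append((volume_id, label))
    return accepted, needs_review


if __name__ == "__main__":
    results = [
        ("volume_001", "usable", 0.95),
        ("volume_002", "not-usable", 0.55),
        ("volume_003", "usable", 0.80),
    ]
    accepted, needs_review = triage(results)
    print("accepted:", accepted)
    print("needs visual inspection:", needs_review)
```

Raising the threshold sends more volumes to visual inspection, so the choice trades investigator time against the risk of accepting a wrong automated label.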
A novel method to automatically assess the quality of images using a multivariate classifier has been proposed. To our knowledge, this is one of the first studies to use a multidimensional machine-learning algorithm based on automatically extracted regional features to automate quality assessment of structural magnetic resonance images (MRI). Compared to a univariate quality assessment approach that generates a single number, the approach presented in this paper is more informative as it provides details that categorize, localize, and quantify the extent of noise in the data. By breaking the problem down into smaller problems, features have been developed that individually quantify each of the artifacts and can be used as inputs to the SVM classifier. These parameters are key tools for assessing the quality of 3D-MRI in a neuroimaging database where the brain images are indexed and can be queried according to the different types of artifacts. Moreover, since the features used in the classifier are regional and localized, affected regions can be automatically marked in an imaging viewer, improving the efficiency of the human visual inspection procedure, a task not possible with a feature that is estimated from the entire 3D-MRI. The accuracy is close to 80% and could increase with additional work to include more features that account for the other artifacts not yet explored.
RP, XC, VM, and QL made substantial contribution to the concept and design of the work. KB, JC, DW, VM, BV, AG, and EX made substantial contribution to the acquisition of data for the work. RP, XC, AB, HL, BV, AG, and VM made substantial contribution to the analysis and interpretation of data for the work. RP, XC, and VM drafted the manuscript critically. All co-authors revised the manuscript critically for important intellectual content. All co-authors approved the final version to be published. All co-authors agreed to be accountable for all aspects of the work ensuring the accuracy and integrity of the work have been appropriately investigated and resolved.
This research was supported by (1) the Intramural Research Program of the National Institute of Mental Health, NIH, Bethesda, MD 20892, USA, (2) the Office of Science Management and Operations (OSMO) of the NIAID, Bethesda, MD 20892, USA, and (3) the Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA. RP, Ph.D. was supported (in part) by Award Number R25GM083252 from the National Institute of General Medical Sciences.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication. The opinions or assertions contained herein are the private views of the author, and are not to be construed as official, or as reflecting true views of the Department of the Army or the Department of Defense.
The authors thank Drs. Heike Toste, Hao Yang Tan, Lukas Kemph, Fabio Sambataro, Roberta Rasetti, Eugenia Radulescu, Shabnam Hakimi and Mr. Saumitra Das (Genes, Cognition and Psychosis Program, NIMH) for their part in the visual inspection and quality ratings of the 3D-MRIs; and Mr. Brad Zoltick and Dr. Yunxia Tong (Genes, Cognition and Psychosis Program, NIMH) for data-basing and IT support.
¹ 3D-MRI and 3D-MRI volume are used interchangeably throughout the manuscript.