ADHD classification by a texture analysis of anatomical brain MRI data

The ADHD-200 Global Competition provides an excellent opportunity for building diagnostic classifiers of Attention-Deficit/Hyperactivity Disorder (ADHD) based on resting-state functional MRI (rs-fMRI) and structural MRI data. Here, we introduce a simple method to classify ADHD based on morphological information without using functional data. Our test results show that the accuracy of this approach is competitive with methods based on rs-fMRI data. We used isotropic local binary patterns on three orthogonal planes (LBP-TOP) to extract features from MR brain images. Subsequently, support vector machines (SVM) were used to develop classification models based on the extracted features. In this study, a total of 436 male subjects (210 with ADHD and 226 controls) were analyzed to show the discriminative power of the method. To analyze the properties of this approach, we tested disparate LBP-TOP features from various parcellations and different image resolutions. Additionally, morphological information using a single brain tissue type (i.e., gray matter (GM), white matter (WM), and CSF) was tested. The highest accuracy we achieved was 0.6995. The LBP-TOP was found to provide better discriminative power using whole-brain data as the input. Datasets with higher resolution can train models with increased accuracy. The information from GM plays a more important role than that of other tissue types. These results and the properties of LBP-TOP suggest that most of the disparate feature distribution comes from different patterns of cortical folding. Using LBP-TOP, we provide an ADHD classification model based only on anatomical information, which is easier to obtain in the clinical environment and which is simpler to preprocess compared with rs-fMRI data.


INTRODUCTION
Attention-Deficit/Hyperactivity Disorder (ADHD) is a multifactorial and clinically heterogeneous disorder, which is highly prevalent in children worldwide. It is estimated that 5-10% of school-age children and 4% of adults suffer from ADHD (Biederman, 2005). The negative impact of ADHD on patients, their families, and society make ADHD a major public health problem (Ferguson, 2000). However, an objective biological tool to diagnose ADHD is still unavailable. Foreseeing the importance, the organizers of the ADHD-200 Global Competition have collected functional and anatomical ADHD MRI datasets of an unprecedented scale, which are accessible via the Internet (http://fcon_1000.projects.nitrc.org/indi/adhd200/). This work provides an important opportunity for researchers all over the world to study brain changes in ADHD subjects based on numerous brain MRI images.
During the ADHD-200 global competition, we tried many strategies to discriminate ADHD patients from control subjects based on resting-state fMRI (rs-fMRI) and anatomical brain image data. We found that the brain morphological changes described by a 3D texture analysis can be used to distinguish children with ADHD from typically developing children (TDC).
These structural image-based models demonstrated similar accuracy compared with our models based on rs-fMRI data. In the present study, we describe and analyze the 3D texture analysis method.
It is not easy to construct a classification rule to distinguish ADHD from TDC subjects. ADHD is a complex disorder with a composite etiology (Faraone and Mick, 2010). No simple existing indicators can be used to diagnose ADHD at present. Currently, the Diagnostic and Statistical Manual of Mental Disorders, 4th ed., text revision (DSM-IV-TR) is most often used for diagnostic criteria for ADHD. Some ADHD criteria are based on subjective descriptions by a child's parents or teachers and not on objective analysis tools. Recent research has demonstrated that using different versions of the DSM or disparate sources of collateral information can significantly affect the calculated prevalence of ADHD (Polanczyk et al., 2007). Moreover, both sex and age play important roles in the development of ADHD. These factors also increase the complexity of building a diagnostic tool (Biederman, 2005). All the aforementioned factors make it challenging to build an efficient classification model for ADHD.
Additionally, approaches based on rs-fMRI data suffer from unstable echo planar imaging (EPI) and involve sophisticated data

LBP-TOP
LBP is a simple and efficient image texture operator introduced by Ojala et al. (1996Ojala et al. ( , 2002. Figure 1 shows the three steps for computing LBP on 2D images. The LBP P,R operator can be defined as where v c and v p are the values of the center pixel and neighborhood pixels with radius R, respectively, P is the total number of neighborhood pixels, and R is the radius in pixel. After the LBP codes for all voxels in an image are computed, the histogram of the codes computed over specific regions or over the whole image can be used as a texture descriptor. Therefore, each bin of the histogram can be regarded as a "micro-texton" encoded by LBP (Hadid et al., 2004). Figure 2 demonstrates patterns encoded by these histogram bins. Any morphological changes would modify the distribution of the codes, resulting in alterations to the histogram. Therefore, the histogram of the computed LBP codes is a good descriptor for comparing changes between images. For 3D data, Zhao et al. have proposed simplifying spatiotemporal descriptors by concatenating LBP on three orthogonal planes (LBP-TOP), i.e., the xy, xt, and yt planes (Zhao and Pietikainen, 2007). Here, we used LBP-TOP to describe brain volume data. Therefore, we replaced the t dimension with the z dimension. We propose using the same radius for x, y, and z for LBP-TOP. Figure 3 illustrates the specific steps for computing LBP-TOP on 3D-volume data.
The direct output of LBP-TOP is an over-complete set of features. One method for selecting the most informative bins from the raw histogram is to map "uniform patterns." Figure 4 shows the rule of uniform patterns and some examples. Ojala et al. observed that in image classification problems, such as face and texture classification, the major patterns are uniform patterns. Non-uniform patterns rarely exist (Ojala et al., 2002). In uniform LBP mapping, there is a separate output label for each uniform pattern, and all non-uniform patterns are assigned to a single label. In the case of LBP with eight neighbors, after mapping uniform patterns, the length of the histogram bins for whole-brain images was reduced from 255 to 259.

PARTICIPANTS
To best demonstrate the discriminative power of LBP-TOP, only male subjects were used to control for the known sexbased differences in ADHD subjects (Biederman, 2005;Polanczyk et al., 2007). Male data from the Kennedy Krieger Institute (KKI), the NeuroIMAGE sample (NeuroIMAGE), the New York University Child Study Center (NYU), Oregon Health and Science University (OHSU), Peking University (Peking_1, Peking_2, and Peking_3), and the University of Pittsburgh (Pittsburgh) were selected for analysis in this study. We ruled out using the dataset from Washington University because it was Frontiers in Systems Neuroscience www.frontiersin.org September 2012 | Volume 6 | Article 66 | 2 FIGURE 1 | Steps for computing the local binary pattern (LBP) operator.
The LBP of 2D images aims to map all pixels from the grayscale pattern space onto the binary pattern space in three steps.
Step (I): A small window is defined by a radius R and the total number of neighborhood pixels P (in this case, p = 8).
Step (II): The neighborhood of each pixel is thresholded by the value of the center pixel, resulting in a binary number. Then, the resulting code of the center pixel is given as a weighted sum of the binary number of its neighbors.
Step (III): After all the LBP codes of an image are computed, the texture features are defined by building a histogram over specific regions or over the whole image. not in the test set of the ADHD-200 global competition and no ADHD subject in it. Five subjects (0010016, 0010027, 0010055, 0010098, and 0010127) in the NYU dataset were excluded because no anatomical data existed for them. Subject 0010013 in the NYU dataset was also excluded because some of the brain in the anatomical image was cropped during the face removal process. ADHD hyperactive-type subjects were excluded due to the small number of such subjects in the dataset. Therefore, the ADHD subjects in this study were of both the ADHD combined type and the ADHD inattentive type. A total of 436 male subjects (210 ADHD subjects and 226 TDC, mean age = 12.12 ± 2.95) were used in this study. The distributions of subjects by age and by type of ADHD are shown in Tables S1 and S2. A list of all subjects can be found in Table S3. The detailed phenotype of each subject can be found on the website for the ADHD-200 global competition (http://fcon_1000.projects.nitrc.org/indi/adhd200/).

FIGURE 3 | The steps of computing LBP on three orthogonal planes (LBP-TOP). Step (I):
A small window is defined by a radius R and the total number of neighborhood pixels P.
Step (II): LBP codes are computed for three orthogonal directions (x, y, z). Each voxel is encoded based on the three orthogonal planes (xy, yz, and xz). After the LBP of each direction is computed, the texture features can be computed by building the histogram over a specific volume or over the whole-brain volume.
Step (III): The histograms from the three planes are combined to build the resulting histogram, which represents the texture features of the selected volume.

DATA PREPROCESSING
An overview of the data analysis procedure is shown in Figure 5.
The details of each step are described below.

Registration methods
Linear registrations with 9 degree of freedom (9-DOF) and 12-DOF were performed using the linear multimodality registration method developed by Oxford FSL FLIRT (Jenkinson et al., 2002;Smith et al., 2004). All images were transformed to standard MNI152 space by FLIRT with 6-DOF (rigid-body transformation). The results of FLIRT with 6-DOF were then linearly transformed by FLIRT with 9-DOF (rigid-body + independent scaling). The results of FLIRT with 9-DOF were also linearly transformed by FLIRT with 12-DOF (rigid-body + scales + FIGURE 4 | Uniform patterns. Using circular neighborhoods, U is calculated as the number of bitwise transitions from 0 to 1 or vice versa. An LBP pattern is considered uniform if U is less than or equal to 2. Here, we show three simple examples of uniform and non-uniform patterns. A list of all uniform LBP patterns with eight neighbors has previously been published (Inen et al., 2011). skews). Non-linear normalization procedures were performed using the automated registration tool (ART). ART was developed by Ardekani et al. (2005) and can be downloaded from http:// www.nitrc.org/projects/art/. Klein and colleagues demonstrated that ART provides better efficiency and consistency than other non-linear registration methods (Klein et al., 2009).
FIGURE 5 | Overview of the data analysis procedure. Preprocessing involves three steps. First, all raw data are transformed into a 1 mm isotropic volume using the rigid body transformation as performed by FLIRT with 6 degrees of freedom (6-DOF). Second, brain images are registered to standard MNI152 space by linear (FLIRT with 9-DOF and FLIRT with 12-DOF) and non-linear (ART) registration methods. To achieve better registration results, the registration parameters were obtained by transforming the skull-stripped brains to the standard MNI152 brain template. Third, we computed the LBP-TOP histograms based on the registered images with various spatial context information (i.e., brain mask, AAL, and CC200). Following these steps, classification models can be trained by directly using the resulting histogram or by using a subset of data after applying the feature selection algorithm. (The major process flow is denoted by the thick line. The minor process flow is denoted by the thin line. The square boxes are the major steps showing how to extract features from raw data. The round boxes are the parameters or information needed for the process flow. Texts in gray color indicate the methods or the subtypes the process step used.)

3D skull striping
To obtain better registration results, skull stripping was performed prior to using the registration algorithm, using the 3DSkullStrip algorithm (Smith, 2002) developed by AFNI (Cox, 1996). 3DSkullStrip has proven to be a relatively robust skullstripping algorithm (Iglesias et al., 2011). However, it is not a perfect tool. Incomplete skull stripping can result in a loss of information from some brain regions. Hence, we applied the transformation parameters for skull-stripped brains to the original whole-brain images to create the input for the LBP-TOP algorithm.

Spatial context information and brain parcellations
To examine different spatial context information, we performed three separate parcellations in this study. First, a simple brain mask in MNI152 space, as provided by FSL, was used to compute the total histogram of the wholebrain volume. Second, to introduce spatial context based on brain anatomical information, the widely used automated anatomical labeling (AAL) template with 116 regions was used (Tzourio-Mazoyer et al., 2002). Finally, we also used an atlas derived from functionally parcellating the resting state data (Craddock et al., 2012). A 200 ROI version with 190 regions of spatially constrained parcellation (CC200) was used to introduce the spatial context information based on rs-fMRI data. The CC200 functional parcellation template made for the competition was kindly provided by Cameron Craddock. Details of the construction of CC200 have been previously published (Craddock et al., 2012) and can also be found on the Athena preprocessing strategies page of the ADHD-200 preprocessed data website: http://www.nitrc. org/plugins/mwiki/index.php/neurobureau:AthenaPipeline.

COMPUTATION OF LBP-TOP
The LBP-TOP algorithm was implemented using Java to build the LBP-TOP map from the structural image. All resulting LBP-TOP histograms were mapped for the detection of uniform patterns. Preliminary testing (not shown) demonstrated that only the LBP-TOP with eight neighbors provided sufficient information to classify ADHD within a reasonable processing time. Therefore, only tests with eight neighbors are shown here.

CLASSIFIERS
A k nearest neighbor classifier (KNN, K = 1) was used to show the baseline of the discriminative power of LBP-TOP. Moreover, an efficient and widely used classifier, SVM, was used in this work (Boser et al., 1992). SVM maps training data into highdimensional feature space to find the separating hyperplane with the maximal margin. Due to the large feature size of LBP-TOP results, we used linear SVM for greater efficiency. LIBLINEAR (Fan et al., 2008) was chosen for use because of its optimization for linear SVM.

FEATURE SELECTION
After introducing spatial context information, the LBP-TOP histogram bins become an over-completed feature set. To build a more efficient and robust classification model, a feature selection method is needed. Moreover, by only selecting the most important features, we can combine features from various points of view. For example, we can combine features from different LBP-TOP results based on dissimilar radii.
Feature selection based on the linear SVM has proven to be efficient and useful for gene selection, document classification, and many other applications (Brank et al., 2002;Chang and Lin, 2008).
For any test subject x, the decision function of linear SVM is where x is the feature vector, b is a constant, and w is the weight vector. Each value of w denotes the weight of each feature. The larger the absolute value of w j , the more important the j th feature is in deciding the result.
After training a linear SVM model, the w in (2) can be used as a relative importance index. Therefore, we can build a simpler model using the top n important features.
For combining features from different point of views, we first trained a linear SVM model using feature groups and ranked features by the absolute weights of the model. Only half of the features remained. Then, we combine these features with the features from a second feature group and trained another linear SVM model. Similarly, only half of the features were chosen to be merged into the next feature group. Using this iterative procedure, we combined various feature groups and found the most important features among these feature groups. Algorithm 1 shows steps of this iteration. Given a set of N subjects and K different feature groups, for each training dataset of our 10-fold crossvalidation, we use Algorithm 1 to select and combine the most important features.

BRAIN SEGMENTATION
FSL's automated segmentation toolbox (FAST) was used to segment raw brain images into GM, WM and CSF (Zhang et al., 2001;Smith et al., 2004). Figure 6 shows an example of a resulting probability map. The three tissue probability maps were analyzed following the same procedure described in

REFERENCE MODELS BASED ON rs-fMRI FEATURES
To compare the results of discriminative models based on rs-fMRI data, we used a simple and easily repeatable approach. Briefly, for the preprocessing of rs-fMRI data, we used the extracted timecourses from the Athena preprocessed data, which can be download from the ADHD-200 Preprocessed Data website. Details of the specific preprocessing steps can be found on the website. The timecourses of the AAL and CC200 parcellations used in LBP-TOP study were chosen for comparison. The extracted timecourses files, ADHD200_AAL_TCs_filtfix.tar.gz and ADHD200_CC200_TCs_filtfix.tar.gz, can be found on the ADHD-200 Preprocessed Data website. The correlation coefficients between each pair of regions were computed based on their extracted timecourses. For example, there are 116 regions in the AAL parcellation. Therefore, 6670 correlation coefficients can be computed based on the 6670 ROI pairs. All the correlation coefficients were used as features for the linear SVM ADHD classifier.
The results of each model were validated using the same crossvalidation settings used in the LBP-TOP studies. As described on the ADHD-200 Preprocessed Data website, the nuisance variance for the extracted time series of each region was removed, with or without use of a band-pass filter (0.009 Hz-0.08 Hz), and blurred with a 6 mm FWHM Gaussian filter. Both time series, with or without filtering by a band-pass filter, were tested.

EVALUATION
All tests in this study were evaluated by 10-fold cross-validation. We randomly partitioned the 436 subjects into 10 subgroups. For each step of cross-validation, one subgroup was used as a test data set, and the remaining nine subgroups were pooled as a training data set. After 10 cross-validations, the test results of all 10 subgroups were combined to build the accuracy of the estimation of each model. To facilitate comparison of the results, the same 10-fold cross-validation set was used in all evaluations.
We used grid searching to find the best penalty parameter C for linear SVM for each training dataset. That is, another 10-fold cross-validation was applied to each training dataset with several candidate values of C, and we chose the parameter C that led to the highest accuracy.
While performing feature selection, the assignment of optimal feature weights can be achieved when the optimal value of C is chosen during each round of cross-validation. After that, we evaluated the effect of feature number using each testing dataset. Then, we combined the results of 10 test dataset to build the accuracy of different feature numbers.

STATISTICAL TESTS
To show the classifier has learned a structure in the data, we compute the p-value against the null distribution using permutation tests (Good, 2000;Ojala and Garriga, 2010). The null hypothesis of permutation test is that the labels are independent of the features. Therefore, one can learn almost same accuracy using random labeled data set. By randomly permuting the labels of the data set, permutation tests can measure how likely the observed accuracy is learned by chance. The permutation-based p-value is defined by where D is the original labeled data, e(c, D) denotes the error of classifier c learned from D, and D is a set of k randomized versions D of D (Ojala and Garriga, 2010). In this work, the e(c, D) was estimated by same 10-fold cross-validation with other tests. One hundred randomized sets of each test were used to estimate the p-values (k = 100).
To compare different approaches of this work, McNemar's tests were applied to compute p-values between two approaches (Everitt, 1992;Dietterich, 1998). While comparing two different approaches, confusion matrices of each approach were estimated by same 10-fold cross-validations. Then we compute the p-values of McNemar's tests using R (Team, 2012). Table 2 shows the 10-fold cross-validation results for different radii (1 mm, 2 mm, and 3 mm) for the LBP-TOP, various parcellations, linear registrations, and non-linear registrations, respectively. Table 2A shows the baseline accuracy which LBP-TOP can provide with the simple 1 NN classifier. Comparing the results of Tables 2A and 2B, we can find the linear-SVM classifiers can provide better accuracy than 1 NN classifiers. Moreover, some of the properties changed while using different classifiers. The LBP-TOP with a radius equal to 3 mm provided better accuracy than the LBP-TOP for the other two radii in most cases while using linear-SVM classifiers. The same properties cannot be found while using 1 NN approaches. However, there are nonsignificant between different radii in NcNemar's test ( Table S4). As expected, brain data with ART non-linear registration showed the highest accuracy in almost all cases, especially while using 1 NN as classifiers. Notably, using linearly registered brain data did not greatly reduce accuracy. After apply NcNemar's test, there is no significant difference between registration methods in any cases with linear-SVM classifiers. And only few cases show significant difference between registration methods while using 1 NN approaches (Table S5).

LBP-TOP
Although the resulting feature sizes varied widely (from 177 to 33630 features), accuracy across disparate parcellations was not greatly affected. Models using a histogram computed from the whole-brain region had higher accuracies than models based on other parcellations. Only considering the results of the AAL and CC200 parcellations, the CC200 showed better results most often. This finding may be the result of the greater number of utilized features or the greater number of homogeneous areas in the CC200 parcellation.
The results of reference models based on rs-fMRI features are shown in Table 2. These data indicate that simple approaches to analyzing rs-fMRI data do not discriminate as well as models based on structural information. The McNemar's test between structural features and rs-fMRI features also show significant difference in most cases (  ADHD-200 Global Competition, different preprocessing settings can affect the resulting accuracy. Moreover, combining the results of different rs-fMRI approaches can provide better discriminative power. The results of these simple approaches can be viewed as the baseline of discriminative power that rs-fMRI data can achieve.

PERMUTATION TEST OF BASIC MODELS
The results of permutation test in Table 4 shows each approach can learn the class structure in the data. Classifiers based on  LBP-TOP features show more significant than approaches based on rs-fMRI data.

FEATURE SELECTION
The feature selection results with the ART non-linear registration methods are shown in Table 5. When introducing spatial context information (the AAL and CC200 parcellations), only a few features are needed to build a sufficiently accurate classification model. In most cases, using the same number of features but combining features from all the radii of LBP-TOP (R1+R2+R3) improves the accuracy of the resulting model. After combining all features based on different parcellations and various radii, we achieved a model with greater accuracy compared with the AAL or CC200 parcellations alone. However, the accuracy of the combined model did not surpass that of the model based   (177 features on the histogram of the whole-brain region. Figure 7 shows the test results from using different feature groups based on AAL parcellation. Table 6 shows the accuracy of models based on various brain image resolutions. Models utilizing higher resolutions usually had better accuracy. However, models based on the CC200 parcellation had greater accuracy when using 3 × 3 × 3 mm resolution. Nevertheless, higher resolution data generally provided more information for the discrimination of ADHD from TDC subjects.

TISSUE TYPES
To determine the most discriminative tissue type within the brain, models based on GM, WM, and CSF probability maps were tested. These results are shown in Table 7. In most cases, the structural differences found in the GM data provided the highest discriminative power for separating ADHD from TDC subjects. The McNemar's test between different tissue types do not show significant difference while using whole brain and AAL parcellations, but show significant difference in some cases using CC200 parcellations (Table S6).

DISCUSSION
The prevalence of ADHD around the world is highly heterogeneous. Polanczyk et al. (2007) have shown that this variability may be explained primarily by the use of differing ADHD diagnostic criteria and collateral sources of information. Additionally, geographic location also plays a role in the variability of ADHD prevalence around the world (Faraone et al., 2003;Polanczyk et al., 2007).
Based on the research of Polanczyk et al., estimations of ADHD prevalence rates using the DSM-III-R or ICD-10 criteria are significantly lower than when using other criteria, such as those of the DSM-IV. Additionally, the use of different collateral sources of information, such as parents, teachers, subjects, the best-estimate procedure, the "and rule (parent and teacher)," or the "or rule (parent or teacher)," can also significantly affect the estimate of ADHD (Polanczyk et al., 2007).
The ADHD-200 global competition dataset was pooled from research sites all over the world. The organizers of the competition went to great lengths to maintain the consistency of the dataset. Nevertheless, for various historical reasons, including the use of different benchmarks at each site, it is difficult to use the same procedure to diagnose ADHD around the world (Table 1). However, the worldwide diagnosis of ADHD reflects an objective reality from which ADHD classification models can be built and evaluated.
While constructing classification models based on machine learning approaches, the inconsistency of diagnostic criteria may introduce so-called class label noise, which may seriously diminish accuracy. Class label noise may be the most important contributor to low accuracy in the ADHD-200 Global Competition.
While constructing our ADHD diagnostic tool based on brain images, we found it difficult to compare the rs-fMRI data from different research sites due to differences in image resolution, slice thickness, time points utilized, and image quality. Moreover, the complex preprocessing steps of fMRI data analysis also introduce hardships that can affect the results. Finding the optimal preprocessing strategy to provide the most useful information for building a classifier is a time-consuming process. Therefore, we chose anatomical data rather than rs-fMRI data to mine useful information from brain morphological changes. The resulting classification model based on morphological changes was found to be competitively accurate in discriminating ADHD from TDC subjects. Our results demonstrate that using features based on LBP-TOP data to train the linear SVM can result in greater discriminative power than using features based on rs-fMRI data. The resulting accuracies based on LBP-TOP features are better than those based on rs-fMRI data ( Table 2).

Frontiers in Systems Neuroscience
www.frontiersin.org September 2012 | Volume 6 | Article 66 | 9 FIGURE 7 | Feature selection results of ADHD-TDC classification accuracy based on AAL parcellation and the ART non-linear registration method. R1, R2, and R3 denote the LBP-TOP radii in mm. R1 + R2 + R3 denotes the combination all features from R1, R2, and R3. All combine refers to the combination of all features from different parcellations (i.e., brain mask, AAL, and CC200) and various radii.

ROBUST TO REGISTRATION METHOD
The robustness of the registration methods when using LBP-TOP features with ADHD data is notable. Although the model based on the ART non-linear registration method proved to be the most accurate, the models based on linear registrations (FLIRT with 9-DOF and 12-DOF) also performed well in our tests ( Table 2). This finding demonstrates the stability of the LBP-TOP to registration methods. Due to the large interindividual variability of the human brain, the registration step of MRI brain data analysis is both critical and challenging (Uylings et al., 2005). Aside from the linear registration method, more than a dozen non-linear registration methods have been developed in recent years, but a perfect registration method does not yet exist (Klein et al., 2009).
However, after performing a perfect registration, no structural differences should exist between subjects. Therefore, a good index for morphological changes should not be based on perfect nonlinear registration methods. This property of LBP-TOP might provide a simple and efficient way to compare brain morphology with linearly registered brains.

GLOBAL EFFECTS OF ADHD?
To introduce different spatial context information, we utilized several parcellation strategies in this study. Unexpectedly, the models using only the distribution of whole-brain features usually demonstrated the highest accuracy in our tests (Tables 2, 3, and 4). Adding parcellation information did not improve the resulting models.
Our results imply that morphological changes in the ADHD brain may affect the whole-brain texture distribution. Further research should be performed to confirm these findings. Theoretically, introducing spatial context information can provide higher accuracy if there are significant structural brain changes in several brain regions. Published structural imaging studies, summarized in two meta-analyses (Valera et al., 2007;Ellison-Wright et al., 2008), have failed to find robust brain changes between ADHD and control subjects. Meta-analyses can help in identifying brain regions that may be the most abnormal in ADHD subjects. However, it is difficult to build a robust discriminative model of ADHD based only on such selected regions.

COMBINING MODELS USING FEATURE SELECTION
Consider the results of 1 NN and linear-SVM in Table 2. 1 NN uses features as they have same weights, whereas the linear-SVM assigns various weights to them. The results might imply that, with linear registration, use all features with same weight (1 NN) cannot provide good results. However, we can make some features more important to make a better classifier (linear-SVM). Only few features might be needed to build a sufficient good classifier in this problem.
To find the most important features and to improve the robustness and efficiency of our model, we used linear-SVM to rank the overall extracted features, and we made an effort to choose the most important features from which to build a better classification model. Moreover, using the feature selection method, we combined models from different point of view to construct a more general model. The results of our tests show that it is useful to combine features to build better models (Tables 5  and 6). Moreover, we only need few features to build sufficient good classifiers (Table 5). To build a simpler and more robust model, we combined different LBP-TOP features to provide better accuracy. However, when dealing with too many features, the over-fitting effect came into play due to the insufficient number of subjects in this study (436 subjects). In most cases, greater accuracy was not gained by combining more than 4096 features.

MOST DISCRIMINATIVE TISSUE
To determine the most useful brain tissue for discriminating ADHD from TDC subjects, models based on GM, WM, and CSF probability maps were tested. These results are shown in Table 7. In most cases, GM-based structural difference provided the greatest discriminative power.
LBP-TOP extracted morphological data based on the distribution of various curvatures, edges, dots, corners, and the content size of the specific region (Figure 2). Most of this information may come from the complex patterns of cortical folding, which essentially dominates GM morphology. Therefore, we suggest that the primary morphological information utilized by our model may come from gyrification patterns. Wolosin et al. have previously shown different folding indices for ADHD compared with control subjects (Wolosin et al., 2009).

CONCLUSIONS
In this study, we approached the ADHD classification problem by working to find a simple method that could provide sufficient discriminative power. We determined that information derived from texture analysis of brain morphology could be used to distinguish ADHD from TDC subjects. An approach based on structural images is simpler than one based on functional data, and the data are easier to obtain making such an approach potentially more useful in the clinical environment. Our results demonstrate that structural brain data may be another treasure-trove in the ADHD-200 global competition dataset.
Although the accuracy of the models presented in this study are far from being useful clinically, texture difference-based feature extraction may point the way toward a simple and efficient method for determining morphological brain changes. We have demonstrated that LBP-TOP is a good candidate to build a discriminative classification model based on structural brain changes.

ACKNOWLEDGMENTS
This study is financially supported by Excellent Research Projects of National Taiwan University, under the contract 101R891602.