Automated measurement of the human corpus callosum using MRI

Herron, Timothy  J; Kang, Xiaojian; Woods, David  L

doi:10.3389/fninf.2012.00025

ORIGINAL RESEARCH article

Front. Neuroinform., 12 September 2012

Volume 6 - 2012 | https://doi.org/10.3389/fninf.2012.00025

Timothy J. Herron¹*

Xiaojian Kang²

David L. Woods¹,²,³

¹Human Cognitive Neurophysiology Laboratory, Research Service, US Veterans Affairs, Northern California Health Care System, Martinez, CA, USA
²Department of Neurology and Center for Neuroscience, University of California, Davis, CA, USA
³Center for Mind and Brain, University of California, Davis, CA, USA

The corpus callosum includes the majority of fibers that connect the two cortical hemispheres. Studies of cross-sectional callosal morphometry and area have revealed developmental, gender, and hemispheric differences in healthy populations and callosal deficits associated with neurodegenerative disease and brain injury. However, accurate quantification of the callosum using magnetic resonance imaging is complicated by intersubject variability in callosal size, shape, and location and often requires manual outlining of the callosum in order to achieve adequate performance. Here we describe an objective, fully automated protocol that utilizes voxel-based images to quantify the area and thickness both of the entire callosum and of different callosal compartments. We verify the method's accuracy, reliability, robustness, and multisite consistency and make comparisons with manual measurements using public brain-image databases. An analysis of age-related changes in the callosum showed increases in length and reductions in thickness and area with age. A comparison of older subjects with and without mild dementia revealed that reductions in anterior callosal area independently predicted poorer cognitive performance after factoring out Mini-Mental State Examination scores and normalized whole brain volume. Open-source software implementing the algorithm is available at www.nitrc.org/projects/c8c8.

Introduction

Neuroimaging of the corpus callosum has attracted great interest in both the medical and neuroscience literature in the past few decades. Callosal changes due to brain atrophy have been characterized in Alzheimer's disease (Tomaiuolo et al., 2007; Di Paola et al., 2010; Frederiksen et al., 2011b), multiple sclerosis (Hasan et al., 2012b), and Huntington's disease (Di Paola et al., 2012), and callosal morphology has been related to symptom severity. Abnormalities in callosal morphology have also been reported in neuropsychiatric diseases including schizophrenia, bipolar disorder, and depression (Sun et al., 2009; Walterfang et al., 2009b; Bearden et al., 2011). In addition, developmental disorders (Paul, 2011) including Williams syndrome (Luders et al., 2007a; Sampaio et al., 2012), autism (Tepest et al., 2010), attention-deficit/hyperactivity disorder (Luders et al., 2009; Gilliam et al., 2011), and dyslexia (Hasan et al., 2012a) are associated with callosal abnormalities. The corpus callosum is also vulnerable to diffuse axonal injury and atrophy following traumatic brain injury (Maller et al., 2010). Finally, callosal changes are found during human development and aging (Sullivan et al., 2002; Hasan et al., 2008b; Luders et al., 2010b), with callosal morphology reflecting hemispheric asymmetries as well as gender differences (Bishop and Wahlsten, 1997; Luders et al., 2003, 2010a,b; Gurd et al., 2012).

The quantitative morphological analysis of the midsagittal corpus callosum is complicated by the interindividual variability of its size and shape (Thompson et al., 2003). Standard automated morphometric analyses that rely on standard whole-brain normalization of T1-weighted (T1W) images to a common stereotaxic template (Ashburner and Friston, 2000; Wang et al., 2009) often result in the imprecise alignment of the callosum because of variability in size and shape with respect to other brain structures (Dougherty et al., 2005; Chaim et al., 2007; Wang et al., 2009). More sophisticated deformation-based image normalization and coregistration techniques (Shen and Davatzikos, 2003; Huang et al., 2005; Sun et al., 2007; Tomaiuolo et al., 2007; Wang et al., 2009) have also been used to accurately map white matter (WM) into the same space. Strong deformation mapping into a unified space has the advantage that no callosal segmentation needs to be performed in individual images, only on the template or mean image. However, the robustness of deformation-based callosal analysis in multisite studies is not clear. Further, whole head coregistration based on a general-purpose optimization function lacks the flexibility that might be useful in mapping callosum subregions across subjects.

A variety of techniques have been introduced to accurately segment, align and measure the callosum (Bookstein, 2003; Thompson et al., 2003; Luders et al., 2007a; Sun et al., 2007; Wang et al., 2009; Adamson et al., 2011). However, most of these techniques are not completely automated and require manual intervention to outline the callosal boundaries (Ballmaier et al., 2008), correct WM segmentation (Walterfang et al., 2009a; Adamson et al., 2011), identify the tips of the callosum (Peters et al., 2002) or label other seed points (Niogi et al., 2007).

One valuable approach to automatically segmenting and measuring the callosum is provided by boundary-based callosal segmentation (Brejl and Sonka, 2000; Van Ginneken et al., 2002; Xu et al., 2007) and unified measurement protocols (Kubicki et al., 2008; Rotarska-Jagiela et al., 2008; Luders et al., 2010a; Frederiksen et al., 2011b). These algorithms generally require a training set of hand-segmented callosa to define a population-specific atlas of callosal templates. The templates can either be warped onto T1 callosal images to try to match new callosa, or can be used to define a shape or appearance model of the callosum. For example, in the algorithm of Styner and colleagues (Styner et al., 2005; Kubicki et al., 2008), the training set is used to encode a boundary shape model parameterized by complex Fourier coefficients. The range of such coefficients then constrains the possible callosal boundaries that can be identified in a new subject. Best fit boundaries are aligned across subjects using a Procrustean alignment (Peterson et al., 2001; Bookstein, 2003) between evenly spaced boundary points determined over the Fourier-parameterized boundaries. Another sophisticated boundary- and atlas-based callosal segmentation and measurement system, developed by Stegmann and colleagues (Stegmann et al., 2004; Ryberg et al., 2007), was recently used in a multisite study of the effects of aging on the callosum (Ryberg et al., 2008; Frederiksen et al., 2011a). Boundary-based methods using atlases have the advantage of superior performance in automatically segmenting the callosum from the fornix and pericallosal artery because permissible callosal shapes can be strongly constrained by the atlas or its derived shape models. The main disadvantage of boundary-based methods is the necessity of developing a population-specific atlas defining permissible callosal shapes and subsequent potential inaccuracies in quantifying callosa with unusual shapes that may occur in other patient populations.

Rule-based callosal segmentation algorithms are capable of automatically segmenting the callosum without manually defined templates (Lee et al., 2000). For example, Schönmeyer and colleagues (Schönmeyer et al., 2007; Rotarska-Jagiela et al., 2008) recently developed a rule-based image processing algorithm that uses relatively homogeneous image intensity values to define image objects that are then used for callosal segmentation. Such image objects must be present in certain absolute locations of the image and have particular positions with respect to each other for the rules to properly segment the callosum. The rules include explicit procedures for detaching the fornix. However, Schönmeyer's algorithm has not been tested on a large database of images from different sources and does not provide measures of callosal thickness or overall length. Rule-based callosal measurement methods have several advantages. First, they are computationally simple and hence can be executed rapidly. Second, because they do not depend on parameterizing the upper and lower callosal boundaries, they are less impacted by boundary errors (Lee et al., 2000) than template-based approaches. Third, rule-based algorithms are less vulnerable to measurement errors in subjects whose callosa have unusual shapes or are divided into multiple clusters. The main disadvantage of rule-based algorithms is that they contain only relatively crude implicit shape information, and hence can be vulnerable to segmentation errors, particularly in failing to accurately segment the callosum from the fornix and pericallosal arteries.

In the current study, we introduce a novel, fast, fully automated rule-based technique that does not require manual callosal segmentation. We introduce methods for (1) automatically isolating and parcellating the callosum, (2) defining standard locations along the length of the midsagittal corpus callosum, and (3) estimating callosal thickness centered on those standard locations as well as quantifying areas within geometrically defined callosal compartments (Witelson, 1989; Hofer and Frahm, 2006). Then we validate the performance of these automated procedures, collectively named C8, in four separate tests using publicly available structural brain imaging datasets (Marcus et al., 2007; Biswal et al., 2010): (1) We compare the results of our method with the results obtained using manual callosal segmentation. (2) We evaluate the robustness of the method to variations in image preprocessing procedures and to variations in image resolution. (3) We compare the test–retest reliability of our method with that of manually segmented images in subjects who underwent repeated scans. And finally, (4) we test the influences of different scanners on callosal morphology. These tests establish that C8 provides accurate callosal measurements regardless of image preprocessing or image resolution and that these measures show high test–retest reliability and comparability across different scanners.

We then use C8 to characterize callosal changes in normal aging and mild dementia. First, we analyze healthy control data from subjects of different ages to examine the influences of age, sex, and intracranial volume (ICV) on callosal size and morphology. The results show significant changes with both age and ICV. Second, we compare callosal size and shape in older, mildly demented adults with matched controls and find that the size of anterior callosal compartments adds predictive information about cognitive outcome.

Materials and Methods

Our overall approach is to first use standard algorithms to produce whole-brain WM segmentations that are then used for callosal quantification: standard spatial affine normalization algorithms applied to the T1W image are used to warp the WM segmentations into Montreal Neurological Institute (MNI) space. Specialized callosal cluster detection algorithms are then used to define the cross-sectional midline portion of the callosum along its full extent. Finally, voxel-based measurements are taken by summing the segmentation values in 2D (areas) and along line segments in 1D (thicknesses) in MNI space and inverting the normalization to obtain original image space values. It is worth noting that although we identify a callosum boundary during the clustering step, we do not parameterize the boundary curves in order to define superior and inferior callosal surfaces for use in normalization or for making measurements (Sun et al., 2007; Kubicki et al., 2008; Ryberg et al., 2008; Wang et al., 2009; Luders et al., 2010b; Adamson et al., 2011). In this sense C8 is a voxel-based method as opposed to a surface-based method (Clarkson et al., 2011).
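
The area measurement described above can be illustrated with a minimal Python sketch. It is not the C8 implementation: the function names, the 1 mm² voxel area, and the simplification that the affine maps the sagittal plane onto itself (so that a 2 × 2 in-plane matrix captures the area scaling) are all assumptions made for illustration.

```python
def callosal_area_mm2(wm_prob, voxel_area_mm2=1.0):
    """Sum WM probabilities over a 2D slice and scale by the in-plane voxel area."""
    return voxel_area_mm2 * sum(sum(row) for row in wm_prob)


def native_area_mm2(mni_area_mm2, in_plane_linear):
    """Map an MNI-space area back to native space.

    `in_plane_linear` is the (hypothetical) 2x2 in-plane linear part of the
    native-to-MNI affine; areas scale by the absolute value of its determinant.
    """
    (a, b), (c, d) = in_plane_linear
    return mni_area_mm2 / abs(a * d - b * c)


# Toy slice of WM probabilities (invented values, not study data).
toy_slice = [[0.0, 0.5, 0.0],
             [0.2, 1.0, 0.3]]
area_mni = callosal_area_mm2(toy_slice)
area_native = native_area_mm2(area_mni, ((1.1, 0.0), (0.0, 1.25)))
```

Because partial-volume voxels contribute fractional values to the sum, no subvoxel boundary has to be drawn to obtain an area.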

Thickness Definition and Rationale

Defining callosal thickness is complicated by the shape variability of the callosum. For example, one early definition of callosal thickness used the length of line segments stretching between two sets of corresponding, evenly spaced anchor points on the superior and inferior callosal boundaries (Peters et al., 2002; Luders et al., 2003). However, beyond the key requirement to accurately define callosal endpoints that separate the inferior and superior boundaries, this definition results in inflated thickness values if the superior and inferior boundaries have different curvatures and lengths that introduce offsets in corresponding anchor points. Figure 1 shows several possible defining properties of thickness and the potential problems each property has due to varying callosal shape. For example, it is problematic to require that thickness defining line segments be perpendicular with respect to either the boundaries (Figure 1B), because boundaries may not be parallel in corresponding locations; or with respect to an interior line (Figure 1C), because interior path location or curvature errors can also inflate thickness values.

FIGURE 1

Figure 1. Complexities in defining thickness (using dotted line segments) shown on a cartoon posterior callosum. (A) Problem with definition by minimal traversal distance (vertical line is shorter). (B) Line segments defining thickness cannot always be perpendicular to both boundaries simultaneously. (C) Sensitivity of thickness to median anchor point (solid lines) when perpendicularity to the median line is required. (D) High boundary curvature on only one surface causes fans of thickness-defining lines that nearly intersect; complicating attempts to define thickness using uniformly spaced grid lines.

We chose here to use the minimum traversal distance to define thickness: namely, the length of the shortest line segment across the callosum that intersects a given interior anchor point on a median line defined along the length of the callosum (see section “Measuring Thickness, Area, and Length” for details). Interior anchor points, as opposed to boundary anchor points, are used in order to minimize the incidence of inappropriately small values (Figure 1A). An important aspect of the minimal traversal distance is that it is a fully local definition: the thickness estimates in one location are not dependent upon the shape of the corpus callosum in distant areas.
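
A minimal Python sketch of the minimum traversal distance follows. It assumes a 2D map of WM probabilities that contains only the callosal cluster, and it approximates each traversal by summing bilinearly interpolated probabilities along a line through the anchor point. The number of angles, the half-length of the probe line, and the sampling step are illustrative choices, not values taken from C8.

```python
import math


def bilinear(img, y, x):
    """Bilinearly interpolated value of img (a list of rows) at (y, x); 0 outside."""
    h, w = len(img), len(img[0])
    if y < 0 or x < 0 or y > h - 1 or x > w - 1:
        return 0.0
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def min_traversal_thickness(wm_prob, anchor, n_angles=36, half_len=10.0, step=0.25):
    """Smallest summed WM probability over lines through `anchor` (row, col)."""
    ay, ax = anchor
    n_steps = int(half_len / step)
    best = float("inf")
    for k in range(n_angles):
        theta = math.pi * k / n_angles  # line orientations cover 180 degrees
        dy, dx = math.sin(theta), math.cos(theta)
        total = 0.0
        for i in range(-n_steps, n_steps + 1):
            total += bilinear(wm_prob, ay + i * step * dy, ax + i * step * dx) * step
        best = min(best, total)
    return best


# Toy callosum: a horizontal band four voxels thick (rows 8 to 11).
band = [[1.0 if 8 <= r <= 11 else 0.0 for _ in range(40)] for r in range(20)]
thickness = min_traversal_thickness(band, anchor=(9.5, 20.0))
```

For this toy band the sketch returns a value close to the 4-voxel band height, because every oblique line crosses more WM than the vertical one. The estimate depends only on voxels near the anchor point, which is the locality property noted above.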

A challenge in making a fully automated algorithm to isolate and measure the corpus callosum is to make the method robust to segmentation failures. In particular, it is often challenging to algorithmically disconnect the callosum from the fornix (Lee et al., 2000; Schönmeyer et al., 2007) and from the pericallosal artery (Figure 2C). Although we implemented several methods to remove the fornix and pericallosal arteries from the callosum WM cluster (see section “Callosal Identification”), we were not successful in removing them in all cases. However, because our method provides local measures, thickness misestimations are limited to those localities where the callosum is connected to another structure like the fornix. Further, our minimum traversal distance definition of thickness also will tend to avoid using fornix WM or blood vessel voxels that fail to be properly excised from the callosum cluster because the shortest line segment across the callosum will generally avoid callosal attachments.

FIGURE 2

Figure 2. Preprocessing of T1 images. Each T1-weighted MR image (A) was normalized into MNI space (B) and segmented into white matter (C) and gray matter (D). Labels: FX = fornix and PCA = peri-callosal arteries. Arrows indicate incorrectly segmented WM voxels at the callosal boundary in the GM partition.

Brain Images

For the evaluation of our method as well as for sample applications, we used images taken from two public T1W image databases. First, we used the OASIS high-resolution anatomical image database (Marcus et al., 2007)¹. There are a total of 416 right-handed subjects in the database, including 152 young normals (age 18–39; 20 with repeated scans), 98 demented older subjects (age 60–96), 100 cognitively normal older subjects (age 60–94), and 66 middle-aged controls (age 40–59). All subjects underwent three or four anatomical T1W sagittal scans (1.5T MP-RAGE, voxel size 1.0 mm × 1.25 mm × 1.0 mm with TR/TE/FA = 9.7 ms/4.0 ms/10°) that were averaged together to create one high resolution T1W image for analysis. Every subject's age and sex were recorded. For older subjects, education, socioeconomic status, the Mini-Mental State Examination (MMSE) score and the clinical dementia rating (CDR) (Morris, 1993) were also obtained. Finally, a set of machine-generated callosal segmentations for all 316 healthy controls from the Automatic Registration Toolbox (ART) project² were hand-corrected for segmentation errors and served as a performance reference.

Second, T1W whole-brain image data from 1231 subjects from the 1000 Functional Connectomes Project (FCP) database were also analyzed (Biswal et al., 2010)³. The subset of this database that we analyzed originated from 25 different sites (see Supplementary Table S1 for details) and included age and sex as covariates for all subjects: 54% female, age range 13–85 (71% age 18–29), and ~95% right-handed. Each utilized subject's dataset included one high-resolution T1W image with an in-plane resolution of 1.2 mm or less. We excluded 24 anatomical images because either normalization to MNI space or whole head tissue segmentation failed using site-specific scripts.

Image Preprocessing

T1W images were segmented into gray matter (GM), WM, and cerebrospinal fluid (CSF) compartments using SPM5⁴, which assigns probabilities to each voxel that reflect the likelihood that the voxel belongs to GM, WM, or CSF (Figure 2). SPM5 tissue segmentation uses a clustering analysis that starts with an a priori template of tissue locations (warped from MNI space) and iteratively solves for mixtures of tissue types present in each voxel (Ashburner and Friston, 1997). In order to examine the influence of different automated segmentation algorithms, the OASIS images from young normal subjects were also segmented with SPM8's unified segmentation (Ashburner and Friston, 2005), with FreeSurfer's⁵ WM segmentation algorithm (Dale et al., 1999), and with the expectation-maximization algorithm EMS (Van Leemput et al., 1999)⁶. SPM8's unified segmentation is similar in overall approach to SPM5's cluster-based design but incorporates nonlinear registration of prior probability maps during classification, giving it somewhat improved performance (Salvado et al., 2007; Klauschen et al., 2009; De Bresser et al., 2011). FreeSurfer's segmentation algorithm specializes in labeling WM for the purpose of generating cortical surfaces by combining sophisticated spatial and histogram intensity normalization with specialized boundary detection algorithms. Overall, FreeSurfer's segmentations have excellent quality but tend to slightly over-assign voxels to WM when compared to expert segmentations (Klauschen et al., 2009). The EMS algorithm uses an iterative expectation-maximization algorithm to predict tissue types based on image intensity, starting from a T1W image template, while also imposing Markov random field structure to enhance neighborhood agreement on tissue type. EMS has been demonstrated to achieve nearly the same performance as SPM5 but with different technical characteristics; in particular, EMS WM segmentations tend to assign too few brain voxels to WM (Salvado et al., 2007).

For all of the above segmentation algorithms, default parameters were used except for SPM5 and SPM8. For those, it was found that reducing the brain voxel sample spacing parameter to 2 mm (default: 3 mm) substantially improved the accuracy of the resulting segmentation. Unfortunately, the tissue segmentations performed by all tested image preprocessing packages contain voxels that, because of partial voluming effects, are miscategorized as GM (particularly evident on the inferior callosal surface in Figure 2D). Thus, we expected that our callosal measurements would underestimate the thickness of the callosum in comparison with measurements based on manual callosal segmentation.

Finally, the T1W images were also normalized to MNI space using a 12-parameter affine transformation in SPM5 (Figure 2B). The segmented images were then normalized to MNI space and resliced to isotropic 1 mm³ voxels (or isotropic 8 mm³ voxels for some analyses) using trilinear interpolation.

Morphometric Analysis of the Corpus Callosum

C8's fully automated analysis procedure was used to identify and measure the corpus callosum by analyzing the normalized WM segmentations generated as described above. The callosum on the mid-sagittal plane and on each of two adjacent parasagittal slices (at x = ±1 mm in MNI space) were isolated and analyzed separately. Thus, the analysis procedures described below were applied to all three para-midsagittal slices, and median values of the final derived quantities were used in order to increase the robustness of the overall procedure. Multiple callosal slice analysis has been used previously (Rotarska-Jagiela et al., 2008) and improves performance because the callosal shape profile changes only gradually away from the midline (Luders et al., 2006a).
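
The effect of taking the median over the three para-midsagittal slices can be shown with a short Python sketch. The thickness values are invented for illustration (the third slice mimics a measurement inflated by attached non-callosal WM), and the function name is hypothetical.

```python
from statistics import median


def robust_estimate(values_by_slice):
    """Median of a derived quantity across the analyzed sagittal slices."""
    return median(values_by_slice.values())


# Toy thickness estimates (mm) keyed by MNI x coordinate of the slice.
thickness_by_slice = {-1: 6.1, 0: 6.3, 1: 9.8}
estimate = robust_estimate(thickness_by_slice)
```

With three slices, a single aberrant value cannot become the final estimate, whereas a mean would be pulled toward it.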

Callosal identification

Within each slice, a bounding box was defined in MNI space using a probabilistic map of the mid-sagittal callosum based on post-mortem brains (Burgel et al., 2006). C8 initially searched for callosal clusters within this box. Using seed points dropped down from a callosum's superior surface plus a WM cluster growing procedure, contiguous sets of WM voxels were selected on the sagittal plane and the boundaries of the callosal clusters were identified using a fixed WM segmentation value threshold. Note that this procedure allows for the possibility of generating multiple clusters of midsagittal callosal voxels that could occur in cases of disease, malformation, or the rare instance of a normal subject having a very thin isthmus that appears to separate the callosum into two parts.
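
The seed-and-grow step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the C8 code: the threshold of 0.5, the use of 4-connectivity, and the function names are hypothetical, and the bounding box is passed in as plain row and column limits instead of being derived from a probabilistic atlas.

```python
from collections import deque


def drop_seed(wm_prob, col, row_top, row_bottom, threshold):
    """Walk down one column from the top of the box; return the first WM voxel."""
    for row in range(row_top, row_bottom + 1):
        if wm_prob[row][col] >= threshold:
            return (row, col)
    return None


def grow_cluster(wm_prob, seed, threshold):
    """Breadth-first growth over 4-connected voxels at or above the threshold."""
    h, w = len(wm_prob), len(wm_prob[0])
    cluster, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in cluster
                    and wm_prob[nr][nc] >= threshold):
                cluster.add((nr, nc))
                queue.append((nr, nc))
    return cluster


def find_clusters(wm_prob, box, threshold=0.5):
    """Return every distinct WM cluster reached from seeds dropped inside `box`."""
    row_top, row_bottom, col_left, col_right = box
    clusters, seen = [], set()
    for col in range(col_left, col_right + 1):
        seed = drop_seed(wm_prob, col, row_top, row_bottom, threshold)
        if seed is None or seed in seen:
            continue
        cluster = grow_cluster(wm_prob, seed, threshold)
        clusters.append(cluster)
        seen |= cluster
    return clusters


# Toy slice with two disconnected WM blobs (invented values).
toy = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.9, 0.9, 0.0, 0.8, 0.8, 0.0],
       [0.0, 0.9, 0.2, 0.0, 0.0, 0.8, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
clusters = find_clusters(toy, box=(0, 3, 0, 6))
```

Returning a list of clusters, rather than a single one, mirrors the point made above that a callosum may appear as more than one piece on the midsagittal plane.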

We used four techniques to reduce the incidence of apparent fornix or pericallosal artery attachments. First, prior to identifying callosal clusters, any segmented WM voxel that could not be placed on some locally linear WM path within 45 degrees of the medial-to-lateral direction (i.e., “X” direction in MNI space) was removed. This reflects the fact that callosal fibers are expected to primarily traverse the callosum in a mediolateral direction, a fact used previously by others in analyzing diffusion images to isolate and measure the callosum (Hasan et al., 2008b). Second, the aforementioned analysis of three para-midsagittal slices often resulted in only one of those callosal clusters containing fornix WM. In such cases, the faulty measurements from that single slice were significantly discounted in the final estimate by using median values of all estimates. Third, the use of a fixed WM segmentation threshold to define the interior of the callosal clusters often helped assign the fornix and other non-callosal structures to separate clusters that could then be ignored by restricting analysis to the largest (and longest along the anterior–posterior axis) cluster. Fourth, after obtaining the putative callosal WM cluster, we erased any WM cluster branch (a WM cluster segment separated by non-WM voxels) that was inferior to the main body of the cluster along the callosal mid body within a specified MNI range. The use of these four techniques generally limited fornix contamination to only a small part of the fornix remaining attached to the callosum.
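
The third technique, restricting analysis to the largest and longest cluster, might look like the following Python sketch. The ranking rule (voxel count first, anterior–posterior extent as a tie-breaker) and the assumption that column indices run along the anterior–posterior axis are illustrative guesses; the text does not specify how the two criteria are combined.

```python
def ap_extent(cluster):
    """Number of columns spanned by a cluster of (row, col) voxels.

    Assumes columns run along the anterior-posterior axis of the slice.
    """
    cols = [col for _, col in cluster]
    return max(cols) - min(cols) + 1


def main_cluster(clusters):
    """Pick the cluster with the most voxels, breaking ties by A-P extent."""
    return max(clusters, key=lambda c: (len(c), ap_extent(c)))


# Toy clusters: a long, thin "callosum" and a short, vertical "fornix" fragment.
callosum = {(r, c) for r in (5, 6) for c in range(10, 60)}
fornix = {(r, 30) for r in range(8, 14)}
selected = main_cluster([fornix, callosum])
```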

Defining standard callosal partitions

The geometric partitioning schemes proposed by Hofer and Frahm (Hofer and Frahm, 2006) and Witelson (Witelson, 1989) were used to segment the CC into topographic compartments. The maximum extent of the CC along its anterior–posterior axis was identified, and parcellated into five or six compartments based on geometric ratios (Figure 3). The Hofer and Frahm parcellation incorporates a representation of five subregions of the human callosum based on diffusion imaging fiber tractography (Hofer and Frahm, 2006). The cortical parcellation is as follows: Compartment 1 to prefrontal cortex, Compartment 2 to premotor and supplementary motor cortex, Compartment 3 to primary motor cortex, Compartment 4 to primary sensory cortex, and Compartment 5 to parietal, temporal, and occipital cortices. This geometric parcellation is similar to the scheme introduced by Witelson, based on non-human primate data (Witelson, 1989), that has been widely used to assess callosal pathology (Thompson et al., 2003).
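
A geometric parcellation of this kind can be sketched in Python as below. The boundary fractions 1/6, 1/2, 2/3, and 3/4 of the anterior–posterior extent are stated here as an assumption about the Hofer and Frahm scheme, since the present text does not list them; the function names are hypothetical.

```python
# Assumed Hofer and Frahm boundaries, as fractions of the A-P extent
# measured from the anterior tip (not stated in this article's text).
HF_FRACTIONS = (1 / 6, 1 / 2, 2 / 3, 3 / 4)


def hf_compartment(y, y_anterior, y_posterior, fractions=HF_FRACTIONS):
    """Return the 1-based compartment index for an A-P coordinate y."""
    frac = (y - y_anterior) / (y_posterior - y_anterior)
    return 1 + sum(frac >= f for f in fractions)


def compartment_areas(voxels, y_anterior, y_posterior):
    """Sum WM probabilities per compartment; `voxels` holds (y, probability) pairs."""
    areas = [0.0] * (len(HF_FRACTIONS) + 1)
    for y, prob in voxels:
        areas[hf_compartment(y, y_anterior, y_posterior) - 1] += prob
    return areas


# Toy example with MNI-like coordinates (anterior tip at y = 30, posterior at y = -40).
labels = [hf_compartment(y, 30.0, -40.0) for y in (25.0, 10.0, -10.0, -20.0, -35.0)]
```

Because the fraction is computed from signed coordinates, the same function works whether the anterior tip has the larger or the smaller coordinate.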

FIGURE 3

Figure 3. Quantification of callosal area and thickness. (A) Average callosal structure of 152 young subjects from the OASIS database, averaged in MNI space. Each subject's callosum was subdivided into five compartments along the anterior–posterior axis using geometric ratios following Hofer and Frahm (H&F, top of panel A) (Hofer and Frahm, 2006) and divided into six compartments following Witelson (W, bottom of panel A) (Witelson, 1989). (B) Callosal boundaries were defined with reference to a series of radial lines (three shown) emanating from a centroid. (C) Radial lines intersecting the callosum were oriented vertically. This unwrapped the callosum to define a median line and measure thickness. The same three lines intersecting the callosum in (B) are shown. The light gray line shows the median location of WM probabilities (dark gray) considered vertically. Callosal thickness was computed at each point using the shortest line segment connecting the superior and inferior surfaces through that point (five shown, short thin white).

Measuring thickness, area, and length

The thickness at each point along the length of the corpus callosum was computed as the minimum distance between the probabilistic boundaries of the callosum measured with line segments cutting across the callosum that intersected points on a median line (defined below) in the sagittal plane. Sums of automatically generated segmentation probabilities are commonly used to compute brain volumes, e.g., as in Kruggel (2006), and the summing technique allowed us to compute thicknesses and areas while avoiding the difficult task of defining callosal boundaries to subvoxel accuracy. Similarly, defining corpus callosum thickness as the minimum distance computed using variously angled short line segments passing through one point avoids the difficult problem of defining the correct perpendicular line with respect to the callosal boundaries or to the median itself. A similar technique has been successfully used to produce reliable cortical thickness measurements (Fischl and Dale, 2000) although it may produce slight underestimates of thickness due to image noise or mismatched boundary shapes (Figure 1). The median callosal line was determined over the entire length of the callosum (Figure 3) by using a series of radial lines at 1.65° intervals emanating from a centroid located halfway between the most anterior and posterior extents of the callosum and along the inferior–superior axis at the most inferior extent of the splenium. Our centroid is slightly superior to the Hampel centroid often used to divide the callosum into partitions radially (Hampel et al., 1998). A median callosal point was defined along each radial line as the median WM location using WM segmentation probabilities squared as median weights within an 11.55° neighborhood. Thicknesses were measured through each of these median points and then interpolated to obtain values at 50 equal-angle spaced points from the anterior tip to the posterior tip of the callosum.
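
The weighted median used to place points on the median line can be illustrated with a minimal Python sketch. The samples are assumed to be (radius, WM probability) pairs pooled from the radial lines in the neighborhood; note that 11.55° equals seven steps of 1.65°, although how C8 pools the neighboring lines is not specified here. The function names are hypothetical.

```python
def weighted_median(positions, weights):
    """Smallest position at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(positions, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        return None  # no WM along this line
    running = 0.0
    for pos, w in pairs:
        running += w
        if running >= total / 2:
            return pos


def median_point(samples):
    """Median WM radius from (radius, probability) samples, weighted by probability squared."""
    radii = [r for r, _ in samples]
    weights = [p * p for _, p in samples]
    return weighted_median(radii, weights)


# Toy profile across the callosum along one radial line (invented values).
symmetric = [(10, 0.1), (11, 0.9), (12, 1.0), (13, 0.9), (14, 0.1)]
skewed = [(10, 0.1), (11, 0.2), (12, 1.0), (13, 1.0), (14, 1.0)]
r_sym = median_point(symmetric)
r_skew = median_point(skewed)
```

Squaring the probabilities reduces the influence of low-probability partial-volume voxels, so the median point stays inside the body of the callosum.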

Mean thickness (in mm) and total callosal area (in mm²) were then computed for each of the five callosal compartments. All measurements of thickness and area, performed within standard MNI space, were transformed back to native anatomical space by inverting the affine spatial normalization transformation computed for each individual brain. Thus, C8 provides both MNI space values and original anatomical space values; each has its use depending upon the application (Jäncke and Steinmetz, 2003; Luders et al., 2006b). In this manuscript we will use native space estimates unless otherwise indicated. Finally, the 50 anchor-point standardized median line allows us to compute an estimate of the total internal callosal length in MNI space by summing the lengths separating adjacent median line anchor points and inverting the affine transform to provide native space length estimates.
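
The length estimate can be sketched in Python as a polyline length computed after mapping the anchor points back to native space. The sketch assumes that only the 2 × 2 in-plane linear part of the native-to-MNI affine matters (translations do not change lengths, and out-of-plane components are ignored); the matrix values and function names are hypothetical.

```python
import math


def polyline_length(points):
    """Sum of Euclidean distances between consecutive (y, z) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def invert_2x2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def apply_linear(m, point):
    """Apply a 2x2 matrix to a (y, z) point."""
    (a, b), (c, d) = m
    y, z = point
    return (a * y + b * z, c * y + d * z)


def native_length(points_mni, native_to_mni_linear):
    """Median-line length after mapping MNI-space anchor points to native space."""
    inverse = invert_2x2(native_to_mni_linear)
    return polyline_length([apply_linear(inverse, p) for p in points_mni])


# Toy median line with three anchor points (C8 uses 50).
anchors = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
length_mni = polyline_length(anchors)
length_native = native_length(anchors, ((2.0, 0.0), (0.0, 2.0)))
```

Transforming the points before measuring, rather than rescaling the MNI-space length by a single factor, keeps the result correct when the affine scales the two in-plane axes differently.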

Method Accuracy, Reliability, Robustness

C8's accuracy and reliability were first evaluated by making callosal measurements on structural MRI data from the 152 normal control subjects contained in the publicly available OASIS high-resolution anatomical image database (Marcus et al., 2007). Visually inspecting the callosal segmentations isolated by C8 suggested that when fornix or pericallosal artery adhesions were evident on multiple para-midsagittal slices they were generally limited to a few voxels and would therefore have little effect on regional thickness and area measurements.

We performed several comparisons to evaluate the reliability of our morphometric procedures. We analyzed CC areas within the Witelson partitions (see Figure 3A) for each of the 152 healthy young OASIS controls (OASIS-152) and compared them with the results of previous studies which used expert manual CC delineation on similar datasets from young, healthy right-handed subjects (Jäncke et al., 1997; Bermudez and Zatorre, 2001; Luders et al., 2003, 2006a; John et al., 2008). We also evaluated the effects of image resolution by comparing C8 measurements performed on three resolutions of affine normalized segmentation images—0.125, 1, and 8 mm³ isotropic voxels. Finally, we compared the C8 callosal estimates with those computed using expert-corrected callosal segmentations from the ART database. ANOVA statistical comparisons and power estimates, using non-central F distribution models, were computed using CLEAVE⁷.

The OASIS database also contains repeated anatomical scans for a subset of 20 young, normal subjects (OASIS-20) that were used to estimate the scan-to-scan reliability of C8 measurements. We further manually delineated the callosa within these 40 images as an additional test of C8's accuracy. The manual segmentation was done on anonymized OASIS-20 T1W images (40 total) affine-normalized to MNI space by a trained member of our laboratory not otherwise affiliated with this study. In addition to correlations between repeated scans, we also computed fractional Dice coefficients (Crum et al., 2006) to measure the overlap between automated and manually corrected segmentations.
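
A fractional Dice coefficient between a probabilistic segmentation and a manual one can be sketched in Python as below. Taking the voxelwise minimum as the fuzzy intersection is an assumption about the formulation; the function name and the toy arrays are illustrative only.

```python
def fractional_dice(seg_a, seg_b):
    """Dice overlap for maps with values in [0, 1].

    Uses min(a, b) per voxel as the fuzzy intersection (an assumed convention).
    """
    intersection = sum(min(a, b)
                       for row_a, row_b in zip(seg_a, seg_b)
                       for a, b in zip(row_a, row_b))
    total = sum(map(sum, seg_a)) + sum(map(sum, seg_b))
    return 2.0 * intersection / total if total else 1.0


# Toy example: probabilistic automated map versus a binary manual mask.
automated = [[1.0, 1.0, 0.0],
             [0.0, 0.5, 0.0]]
manual = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
overlap = fractional_dice(automated, manual)
```

For binary inputs this reduces to the ordinary Dice coefficient, so the same function can be applied to thresholded and unthresholded segmentations.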

A third set of tests checked the robustness of C8 to different segmentation algorithms. We compared results from the OASIS-152 dataset within the Hofer and Frahm (H&F) partitions using the SPM5, SPM8, FreeSurfer, and EMS segmentation algorithms. Mean values of area and thickness are reported within three H&F partitions as well as correlations between these values across segmentation types.

A fourth set of tests aimed to validate the consistency of the area and thickness measurements. We used the estimated distances in native space between adjacent median line thickness measurement locations combined with local thickness measures to produce local area measurements that should sum to the total callosal area measurement. Thus, this test verified how well area values (2D sums of segmentation probabilities) compared with thickness values generated by searching for minimal 1D sums of (interpolated) segmentation values across the callosum.
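
This consistency check can be sketched in Python as follows. The trapezoidal rule used to combine adjacent thickness values is an assumption, as are the function names and the toy numbers.

```python
def area_from_thickness(thicknesses, spacings):
    """Approximate area from local thicknesses and distances between adjacent points.

    Uses the trapezoidal rule: each gap contributes its length times the
    mean of the two neighboring thickness values.
    """
    if len(spacings) != len(thicknesses) - 1:
        raise ValueError("need one spacing per pair of adjacent points")
    return sum(0.5 * (t0 + t1) * d
               for t0, t1, d in zip(thicknesses, thicknesses[1:], spacings))


def relative_difference(area_1d, area_2d):
    """Signed difference of the thickness-derived area relative to the 2D sum."""
    return (area_1d - area_2d) / area_2d


# Toy callosum: 50 median-line points, constant 5 mm thickness, 1.5 mm apart.
area_1d = area_from_thickness([5.0] * 50, [1.5] * 49)
mismatch = relative_difference(area_1d, 367.5)
```

A relative difference near zero indicates that the 1D thickness sums and the 2D area sums describe the same structure; a systematic offset would point to a bias in one of the two measures.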

Finally, we evaluated the reliability of C8 across image sets by analyzing T1 image data taken from 14 different MR scanners within the FCP image database (Biswal et al., 2010) that contain comparable young, healthy subjects of both sexes (group mean age <36 y.o., sex ratio between 1:2 and 2:1). First, we compared regional mean callosal thickness values across the groups. Second, we performed ANOVAs and multivariate linear regressions using the MATLAB Statistics toolbox⁸ in order to measure variation in callosum areas due to scanner/group differences vs. those due to age, sex, overall brain volume, and image quality. Image quality was parameterized in two ways: first by voxel size (Y: anterior/posterior and Z: superior/inferior) in the sagittal plane and second by computing the image entropy, −∫_{v∈ICV} v · ln v dv, over all intracranial voxel (ICV) intensity values v. Intracranial voxels are defined as voxels primarily segmented as being in WM, GM, or CSF in the preprocessing stage. Image entropy measures the overall lack of distinctness in the image as reflected in the histogram of image values; we normalize the maximum entropy value, given by a uniform distribution of values, to 1.
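One way to compute a normalized image entropy of this kind is sketched below. It assumes that the entropy is the Shannon entropy of the binned intensity histogram of intracranial voxels, divided by its maximum possible value (the natural log of the number of bins) so that a uniform histogram scores 1. The bin count and the function name `normalized_entropy` are illustrative assumptions, not details of the published implementation.

```python
import math


def normalized_entropy(intensities, n_bins=64):
    """Shannon entropy of an intensity histogram, scaled so a uniform histogram gives 1."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return 0.0  # a single intensity value carries no histogram spread
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in intensities:
        # Clamp so that the maximum intensity falls in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(intensities)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return entropy / math.log(n_bins)


# Hypothetical intracranial voxel intensities: two equally populated intensity levels.
print(normalized_entropy([0, 0, 1, 1], n_bins=4))
```

Under these assumptions, an image whose tissue classes form sharp, well-separated histogram peaks yields a value well below 1, whereas a flat histogram approaches 1.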

Applications

Age-related changes in the callosum

We first analyzed C8 callosum measurements from all FCP subjects (25 sites, 1231 subjects) using linear regression to evaluate the effects of age, sex, total brain volumes estimated from the accompanying segmentation images, and image quality. We computed quadratic regression curves for effects of age on both callosal thickness and median length. We also repeated the analysis using all 316 normal controls from the OASIS dataset for both automatically segmented callosa and expert-corrected segmentations. We then computed the full Spearman correlation matrix for the above values in order to view the regional callosal area and thickness covariates of age, sex, and brain volume. In these statistical analyses, we discarded four obvious outlying measurements of older subjects from each of the FCP and OASIS datasets. The outlier data were due to poor image contrast, unusual spatial fluctuations in anatomical image values, or excessive WM hypointensities that interfered with the clustering algorithm of SPM5's whole-brain segmentation.
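For readers unfamiliar with rank correlation, the sketch below shows how a single entry of a Spearman correlation matrix can be computed: values are converted to ranks (ties receive the average rank) and a Pearson correlation is taken on the ranks. The helper names and the example ages and thicknesses are hypothetical and serve only to illustrate the statistic.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: thickness falls nonlinearly but monotonically with age.
age = [20, 30, 40, 50, 60]
thickness = [6.0, 5.9, 5.5, 4.8, 3.0]
print(spearman(age, thickness), pearson(age, thickness))
```

Because it depends only on rank order, the Spearman coefficient captures monotonic but nonlinear age effects that a Pearson coefficient would understate.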

Callosal alterations in dementia

Our second analysis compared CC values of 98 mild Alzheimer-related dementia cases from the OASIS dataset (age 76.7 ± 7.1 years, 58 female) with those of 98 age-matched older controls (age 75.9 ± 9.0 years, 72 female) to evaluate whether callosal measurements contain information helpful for classifying mild dementia. These data were previously used to show (Marcus et al., 2007) that normalized whole brain volume (nWBV), the fraction of brain matter contained within the total ICV, was a useful predictor of mild dementia as defined by the CDR score (Morris, 1993). Here, we measured the callosum areas within the five Hofer and Frahm partitions to see whether they provide additional power to distinguish normal aging from mild dementia. We used subject demographic information of age, sex, and education (five-point scale) along with the MMSE score (Folstein et al., 1975), nWBV, total ICV, and the five H&F partition areas within an ordinal logistic regression, using the Design library in R ver. 2.13⁹, to predict the CDR score: 0 for controls; or 0.5 (very mild) and 1.0 (mild) for dementia patients.

Results

Method Accuracy, Reliability, Robustness

Table 1 shows the measurements of corpus callosum area obtained using C8 fully automatically or when using expert-corrected segmentations. The results are comparable with earlier reports using manual callosal tracing, as shown in Table 1, albeit with the expected small underestimation of CC area in the automatically segmented C8 data. Similarly, thickness measures produced by the minimum traversal distance definition used above were slightly smaller than those reported with manual callosal delineation. For example, the average thickness in H&F compartments 2 and 3 for the 152 young normal subjects from the OASIS database was 5.3 ± 0.7 mm. Mean thicknesses for the callosal body computed in prior studies using manual tracing (not using a minimal traversal distance definition) averaged 6–7 mm for two groups of young controls reported by Luders and colleagues (Luders et al., 2007a,b) and 7.2 ± 1.9 mm for young controls reported by Raine and colleagues (Raine et al., 2003). However, a previous study using semi-automated methods applied to voxel intensity-based WM segmentations produced mid-body callosal thicknesses of approximately 6.0 mm (Walterfang et al., 2009a,b), closer to our own.

TABLE 1

Table 1. Corpus callosum area measurements (mean and standard deviations) for callosal Witelson (W) compartments (see Figure 3A) obtained from the OASIS-152 anatomical image database (Marcus et al., 2007) using the present method.

In direct comparisons of the OASIS-152 subjects using fully automated segmentations vs. expert-corrected segmentations, thickness correlations were fairly high across most of the median callosal locations (Figure 4). Similarly, Pearson correlations between the automated and expert segmentations within Hofer and Frahm partitions were 0.87, 0.86, 0.68, 0.89, and 0.95, from anterior to posterior (and 0.92 for total area). Thus, only the thin H&F3 callosal compartment adjacent to the fornix and the tip of the splenium had relatively reduced correlations.

FIGURE 4

Figure 4. Mean and standard deviation (error bars) thickness at 50 equal angle spaced locations using automated segmentation (purple solid) and expert-corrected segmentations (gold dashed). The red dotted line indicates the Pearson correlations between the two values at each location.

Repeated-scan reliability of the mean area and thickness measurements is reported in Table 2, using the OASIS-20 subjects who underwent two scanning sessions. Intersession variability (~5%) was markedly less than the intersubject variability (>15%) shown in Table 1. We have found no previously published callosal intersession variability values for comparison; however, comparable volume variability in healthy controls for the similarly shaped subcortical structures of the caudate and hippocampus ranges from about 1% to 3% (Jovicich et al., 2009).
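A scan-to-scan reliability figure of this type can be computed as in the sketch below. It assumes that each subject's absolute between-session difference is expressed as a percentage of that subject's two-session mean; the text does not state the exact normalization, so this choice, the function name, and the example areas are assumptions.

```python
def mean_abs_percent_difference(session1, session2):
    """Mean over subjects of |a - b| as a percentage of the two-session mean."""
    if len(session1) != len(session2) or not session1:
        raise ValueError("need one measurement per subject in each session")
    diffs = []
    for a, b in zip(session1, session2):
        mean_ab = (a + b) / 2.0
        diffs.append(100.0 * abs(a - b) / mean_ab)
    return sum(diffs) / len(diffs)


# Hypothetical total callosal areas (mm^2) for three subjects scanned twice.
areas_scan1 = [600.0, 520.0, 710.0]
areas_scan2 = [630.0, 520.0, 690.0]
print(mean_abs_percent_difference(areas_scan1, areas_scan2))
```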

TABLE 2

Table 2. Mean absolute % differences in thickness and area within the five Hofer and Frahm (H&F) compartments across two imaging sessions for 20 subjects who underwent repeated scanning in the OASIS-20 database.

Manually delineated total CC areas for the OASIS-20 images correlated strongly with C8 estimates, as shown in Figure 5. C8's automated method produced a consistent and accurate (r = 0.90, Pearson correlation overall) estimate of callosal areas. This correlation with the gold standard of hand-segmenting the T1W images (as well as the 0.92 derived previously for the OASIS-152 data) compares favorably with a Pearson correlation of 0.80 obtained by Hasan and colleagues using a semi-automated DTI-based method (Hasan et al., 2008a). However, C8's correlation with manual segmentation was lower than most inter-rater correlations of total area measurements obtained when different experts hand-segment the same images, which often exceed 0.95 (Bermudez and Zatorre, 2001; Dorion et al., 2002; Horton et al., 2004; Wang et al., 2006; Ballmaier et al., 2008; Tepest et al., 2010). The 40 automated callosal segmentations and hand-segmented callosa overlapped with a mean fractional Dice coefficient of 0.89 (0.04 SD), which is not as high as a previously reported value of 0.97 (Schönmeyer et al., 2007), although that study discarded one of 50 images as an outlier. Higher overlap values with C8 would not be expected because SPM5's whole-brain segmentation algorithm assigns some of the callosal boundary to the discarded GM partition (Figure 2).

FIGURE 5

Figure 5. Scatterplot comparing manually delineated corpus callosum total cross-sectional areas (y-axis) with C8 area estimates (x-axis) for 20 normal subjects in the OASIS database who underwent repeated scans. Diamonds are from image session 1 and asterisks are for session 2. Thin dotted lines connect results for the same subject's two sessions, while the thick dotted line is the area diagonal.

Table 3 shows the effects of different segmentation preprocessing algorithms on C8. Correlations between the area and thickness values produced by processing different WM segmentations were high, with thickness generally showing higher cross-segmentation correlations across the callosum. Finally, processing more than three adjacent mid-sagittal slices also made only minor differences in mean values of area or thickness (see Supplementary Table S3).

TABLE 3

Table 3. Pearson correlation coefficients for area (upper triangular, bold) and thickness (lower triangular, italic) values between C8 computations applied to the OASIS-152 dataset using four different segmentation preprocessing methods (see the Image Preprocessing section and Supplemental Table S2) within three of the H&F partitions (see Figure 3).

Next, we checked the consistency of thickness measurements with respect to area measurements by computing local thickness values multiplied by local median line distance measurements. We compared the sums of all such locally computed area values with the total callosal area measured simply by summing the segmentation values. We found that the local thickness times median length values had a mean area deficit of only −0.9% (SD 1.6%). Thus, assuming that accurate callosal median length values were used, thickness measurements showed only a slight negative bias. However, there was a slight correlation (Pearson r = −0.17) between the bias and the size of the callosum, with larger CCs having larger discrepancies (see Supplementary Figure S4).

In order to test C8 on multisite data, we compared callosal estimates across different scanners by using 14 image sets containing young, normal, gender-balanced subjects within the FCP database (864 subjects total, mean age 24.2, SD 6.0). We obtained regional group mean thickness values at 50 equal-angle-spaced locations along the CC shown in Figure 6. Mean thickness values were similar across sites although the mid-body values had higher relative variance.

FIGURE 6

Figure 6. Mean mid-sagittal callosal thicknesses from 14 different scanners (black dotted lines) with young (age means 21–33 years), mixed gender, healthy subjects taken from the FCP database (see Supplemental Table S1). The thick gray line shows the mean and standard deviation.

ANOVAs performed on mean callosal thickness, median length, and total area all showed statistically significant omnibus differences across the 14 groups: F_{(13, 850)} = 3.4, 12.3, and 7.1, respectively (all p < 0.001, Greenhouse-Geisser corrected). However, the effect sizes of the group differences were fairly small: intergroup standard deviations were 3.5, 2.9, and 5.5% for thickness, median length, and area, compared to mean intersubject standard deviations of 11.6, 7.3, and 13.5%, respectively. These site differences largely reflected demographic differences in subject populations and image quality. When the potentially relevant covariates of age, sex, brain volume (from WM and GM segmentations), voxel size, image entropy, and indicator variables for group membership were included within a regression of callosum measurements (Table 4), no site's values were statistically different (Bonferroni corrected) from those of the most average group. In all cases, however, image voxel dimensions and image entropy differed between groups [entropy ANOVA: F_{(13, 850)} = 11.6, p < 0.0001]. Thus, the group variation was accounted for by these two other sources, with image entropy being the most significant. Recomputing the regressions after discarding the group indicator variables resulted in small but significant effects of Z voxel dimension on mean callosal thickness (t₈₄₄ = 3.9) and of Y voxel dimension on callosal length (t₈₄₄ = 4.1).
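The degrees of freedom reported above follow from the design: with 14 groups and 864 subjects, a one-way ANOVA has 14 − 1 = 13 between-group and 864 − 14 = 850 within-group degrees of freedom. The sketch below shows a plain one-way ANOVA F computation on toy data; it is an illustrative stand-in, not the MATLAB routine used in the study, and it omits any sphericity or covariate adjustment.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual measurements around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy data: three hypothetical "sites" with three thickness values (mm) each.
sites = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova(sites))
```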

TABLE 4

Table 4. Regression equation coefficients and t-values computed from young, normal FCP data from 14 groups for overall thickness, median length, and area (columns).

The results in Table 1 and Figure 4 show that the variance of C8 measurements of Witelson partition areas fell within the range of variance obtained in studies using manual callosal segmentation. We next compared the statistical power of C8 and manual analysis in two ANOVAs: one comparing male and female callosal OASIS-152 measurements and another comparing OASIS-152 subjects divided into two groups with above- and below-median ICVs. The gender comparison yielded non-significant differences with either type of segmentation (e.g., males had 2.0% greater total area, p > 0.2). However, as suggested by Table 4, subjects with larger ICVs had increased callosal length. This effect was detected with similar statistical power using C8's automated segmentations [by 5.7%, F_{(1, 150)} = 33.2, p < 0.0001; power: 50 subjects give a 90% chance of p < 0.05] and manually corrected segmentations [by 5.4%, F_{(1, 150)} = 29.0, p < 0.0001; power: 57 subjects give a 90% chance of p < 0.05]. Mean callosal thickness also increased with higher ICV, both with C8 [by 4.7%, F_{(1, 150)} = 7.5, p < 0.01; power: 242 subjects give a 90% chance of p < 0.05] and manually corrected segmentations [by 4.4%, F_{(1, 150)} = 6.8, p < 0.01; power: 268 subjects give a 90% chance of p < 0.05].