A Fully-Automated Subcortical and Ventricular Shape Generation Pipeline Preserving Smoothness and Anatomical Topology

In this paper, we present a fully-automated subcortical and ventricular shape generation pipeline that acts on structural magnetic resonance images (MRIs) of the human brain. Principally, the proposed pipeline consists of three steps: (1) automated structure segmentation using the diffeomorphic multi-atlas likelihood-fusion algorithm; (2) study-specific shape template creation based on the Delaunay triangulation; (3) deformation-based shape filtering using the large deformation diffeomorphic metric mapping for surfaces. The proposed pipeline is shown to provide high accuracy, sufficient smoothness, and accurate anatomical topology. Two datasets focused upon Huntington's disease (HD) were used for evaluating the performance of the proposed pipeline. The first of these contains a total of 16 MRI scans, each with a gold standard available, on which the proposed pipeline's outputs were observed to be highly accurate and smooth when compared with the gold standard. Visual examinations and outlier analyses on the second dataset, which contains a total of 1,445 MRI scans, revealed 100% success rates for the putamen, the thalamus, the globus pallidus, the amygdala, and the lateral ventricle in both hemispheres and rates no smaller than 97% for the bilateral hippocampus and caudate. Another independent dataset, consisting of 15 atlas images and 20 testing images, was also used to quantitatively evaluate the proposed pipeline, with high accuracy having been obtained. In short, the proposed pipeline is herein demonstrated to be effective, both quantitatively and qualitatively, using a large collection of MRI scans.


INTRODUCTION
Analyzing the shape of subcortical and ventricular structures subjected to brain disorders is an area of ever growing importance, especially in the fields of neurodegenerative diseases such as Alzheimer's disease (Qiu et al., 2009b;Wang et al., 2011;Shi et al., 2013Shi et al., , 2015Tang et al., 2014Tang et al., , 2015bMiller et al., 2015), Huntington's disease (HD) (van den Bogaard et al., 2011;Younes et al., 2014;Faria et al., 2016), and Parkinson's disease (Sterling et al., 2013;Nemmi et al., 2015) as well as various neurodevelopmental disorders (Knickmeyer et al., 2008;Rimol et al., 2010;Seymour et al., 2017). The anatomical shapes of the structures of interest in those cases are usually represented using a mesh that can be created from the corresponding structural volumetric segmentation. In more detail, generating a segmentation-based shape representation of a specific structure of interest (such as the left hippocampus) consists of two steps: (1) segmenting that structure of interest from a structural magnetic resonance image (MRI), resulting in a 3D volumetric segmentation; (2) converting that volumetric segmentation into a smooth surface representing the structural segmentation's boundary (Levine et al., 2012).
The fully automated segmentation of subcortical and ventricular structures, based on structural MRIs, is a wellestablished field of research, with a variety of highly accurate algorithms having already been developed (Barra and Boire, 2001;Khan et al., 2008;Powell et al., 2008;Patenaude et al., 2011;Chakravarty et al., 2013;Tang et al., 2015c). As for the generation of surfaces, image-based meshing is typically employed, especially when creating computer models for computational fluid dynamics and finite element analysis (Young et al., 2008;Chen et al., 2013;Chernikov et al., 2013;Foteinos and Chrisochoides, 2013;Zhang, 2013). More recently, segmentation based meshing has also been applied to the medical imaging field, see Zhang (2013) for a general introduction. One of the most representative meshing techniques is the marching cubes algorithm, which has been incorporated into a number of commercial and non-commercial software packages. The marching cubes algorithm takes a 3D segmentation image as its input and outputs surface data in the form of a triangulated mesh, represented using vertices and faces.
Combining what we have just outlined leads to an "automated volume segmentation + marching cubes based surface generation" pipeline for subcortical and ventricular structures. Such a procedure may well be vulnerable to noise induced by inaccurate segmentations, resulting in disconnected regions or holes within the surface (Qiu and Miller, 2008). In addition, it is plausible that the marching cubes algorithm is liable to miss thin subregions of a structure of interest such as the thin "bridge" connecting the inferior horn and the main body of the lateral ventricle (Qiu and Miller, 2008). In other words, the resulting surface may not have the correct anatomical topology. Furthermore, even for a structure of interest with a highly accurate segmentation and an "easy" topology (a relatively simple shape), it is likely that the marching cubes algorithm will not deliver surfaces of a sufficient smoothness. Indeed, it is a most challenging task to extract the structure of interest's surface with high accuracy, correct anatomical topology, and sufficient smoothness in the same instance. To ensure a high degree of accuracy in the surface, a precise volumetric segmentation and a high fidelity in the surface with respect to the corresponding volumetric segmentation is required. Naturally, to ensure a correct anatomical topology, a surface generation approach that is devised around the notion of preserving the anatomical topology of the structure of interest is needed. Meanwhile, the classic filtering and smoothing approaches may not be sufficient to ensure the required smoothness without sacrificing the fidelity to the corresponding volumetric segmentation.
Alternatives to the aforementioned combination are certainly possible and there are numerous existing pipelines that can generate smooth subcortical structural shapes directly from MRIs. In contrast to a binary segmentation procedure for shape generation, those pipelines generally employ shape modeling for their segmentation purposes (Heimann and Meinzer, 2009;Patenaude et al., 2011). In other words, the structural shapes were not created from the binary segmentation, but directly from the dense MR images. The main limitation of these shapemodeling based approaches is the lack of flexibility in relation to individual components; one may desire the ability to utilize a more accurate segmentation algorithm or a more sophisticated meshing algorithm.
It is in the context of all of the above that we propose a fullyautomated subcortical and ventricular shape generation pipeline which satisfies the demand for accuracy (both topological and otherwise) and smoothness in four steps: (1) automatically segment the subcortical and ventricular structures of interest using the raw structural MRI data acquired from a scanner; (2) create a study-specific template shape with the correct anatomical topology and sufficient surface smoothness; (3) create a triangulated mesh from each binary segmentation obtained in step (1) using the marching cubes algorithm; (4) filter and smooth the surfaces generated in step (3) in a deformation based manner.
To perform the initial segmentation, we employ a fullyautomated segmentation pipeline, the diffeomorphic multi-atlas likelihood fusion (MALF) algorithm (Tang et al., 2013), the accuracy of which in segmenting subcortical and ventricular structures has been validated on a variety of MRI datasets (Tang et al., 2015c). Instead of applying the marching cubes algorithm directly, to generate a corresponding triangulated mesh from the segmentation of MALF with the desired properties, we rely on deformation based shape generation in the setting of large deformation diffeomorphic metric mapping (LDDMM) for surfaces (Vaillant and Glaunès, 2005). Given a pre-defined triangulated surface of a specific structure of interest, LDDMM is capable of preserving the topology and smoothness of that surface when registering it to a target surface. In other words, if we register a template surface with the correct anatomical topology and a high degree of smoothness to a target surface using LDDMM, the deformed surface is guaranteed to inherit that topology and smoothness from the template while being as similar as possible to the target surface. This is essentially due to the properties of diffeomorphic transformations and the capability of LDDMM to deliver the accurate diffeomorphisms needed for surface registration (Vaillant and Glaunès, 2005).
In this paper, we will first detail each of the above steps in the proposed pipeline. We then proceed to evaluate the proposed pipeline quantitatively and qualitatively using three MRI datasets. There are 16 structural MRIs in the first dataset, for each of which we manually segmented the subcortical and ventricular structures, with a view to quantitatively evaluating the performance of the proposed pipeline by comparison with the gold standard. Within the second dataset, there are a total of 1,445 structural MRIs, on which we qualitatively examine the surfaces delivered by the proposed pipeline. For the third dataset, there are 15 atlas structural MRIs and 20 testing structural MRIs, with the structures of interest being the subcortical structures that have been manually delineated. We also compared our results with those from a well-established pipeline that outputs smooth subcortical surfaces directly from dense MRIs, namely the FSL-FIRST pipeline (Patenaude et al., 2011). Three aspects were examined; the accuracy based on quantitative evaluation, the anatomy topology based on visual examination, and the smoothness based on quantitative assessment.

PREDICT-HD
The first two datasets that feature in this work are both part of the PREDICT-HD study (https://www.predict-hd.net/) where all enrolled subjects were at risk of HD and had previously undergone elective predictive genetic testing. Subjects labeled as premanifest HD (pre-HD) are those who were found to be "gene expanded, " possessing a cytosine-adenine-guanine (CAG) ≥ 36 but not exhibiting the motor criteria consistent with a diagnosis of HD (The Huntington's Disease Collaborative Research Group, 1993). A control group was defined as subjects who were deemed "non-gene expanded, " possessing a CAG ≤ 30. Participants of PREIDCT-HD were recruited from 32 sites across the United States, Canada, Europe, and Australia and underwent longitudinal study visits consisting of a neurological motor examination, cognitive assessment, brain MRI, psychiatric and functional assessment, and blood testing for genetic and biochemical analyses. Informed written consent was obtained from all subjects before participating in this study.
Subjects with pre-HD were further divided into three subgroups ("low-HD, " "mid-HD, " and "high-HD") based on their CAP scores, a function of their CAG repeat length and current age given by CAP = (age at study entry) × (CAG -33.66) (Zhang et al., 2011). The three subgroups are defined according to CAP < 290 (the low-HD group), 290 ≤ CAP ≤ 368 (the mid-HD group), and CAP > 368 (the high-HD group).

Subjects
In the first dataset, there are a total of 16 subjects (3 males and 13 females, mean age = 42.1 ± 10.1 years), including 6 control subjects, 4 low-HD subjects, 3 mid-HD subjects, and 3 high-HD subjects. Only one scan of each subject was selected, resulting in a total of 16 MRI scans in the first dataset.
For the second dataset, there are a total of 169 control subjects, including 106 females (mean age at baseline = 48.3 ± 11.2 years) and 63 males (mean age at baseline = 48.6 ± 14.8 years). Within the control group, 59 subjects had only 1 scan, 43 subjects had 2 scans, 27 subjects had 3 scans, 16 subjects had 4 scans, 15 subjects had 5 scans, 7 subjects had 6 scans, and 1 subject had 7 scans, resulting in a total of 414 MRI scans, with the average interval between two consecutive scans being 1.1 years. Within the low-HD group, there are a total of 113 subjects, including 85 females (mean age at baseline = 33.1 ± 9.1 years) and 28 males (mean age at baseline = 35.7 ± 10.8 years). In the low-HD group, 52 subjects had only 1 scan, 35 subjects had 2 scans, 12 subjects had 3 scans, 8 subjects had 4 scans, 3 subjects had 5 scans, 2 subjects had 6 scans, and 1 subject had 8 scans, resulting in a total of 225 MRI scans, with the average interval between two consecutive scans being 0.8 years. Within the mid-HD group, there are a total of 141 subjects, including 98 females (mean age at baseline = 42.1 ± 10.2 years) and 43 males (mean age at baseline = 42.4 ± 11.2 years). In the mid-HD group, 62 subjects had only 1 scan, 36 subjects had 2 scans, 14 subjects had 3 scans, 17 subjects had 4 scans, 5 subjects had 5 scans, 6 subjects had 6 scans, and 1 subject had 7 scans, resulting in a total of 312 MRI scans, with the average interval between two consecutive scans being 0.8 years.
Within the high-HD group, there are a total of 227 subjects, including 136 females (mean age at baseline = 49.3 ± 10.9 years) and 91 males (mean age at baseline = 50.0 ± 11.1 years). In the high-HD group, 99 subjects had only 1 scan, 68 subjects had 2 scans, 26 subjects had 3 scans, 17 subjects had 4 scans, 8 subjects had 5 scans, 8 subjects had 6 scans, and 1 subject had 8 scans, resulting in a total of 477 MRI scans, with the average interval between two consecutive scans being 0.9 years. There are another 4 females (mean age at baseline = 44.6 ± 9.9 years) that were not identified as belonging to any group. Among those 4 subjects, 3 had been scanned once while the remainder had been scanned twice, resulting in a total of 5 MRI scans. There are another 12 MRI scans for which we could not identify their demographic and clinical information. However, given that the goal of this paper is to evaluate a surface generation pipeline rather than to compare groups of different clinical states, we retained all of the 1,445 scans from the second dataset for pipeline validation. A summary of this dataset is tabulated in Table 1.
High resolution anatomical MR images of the first two datasets were used in this study. Given that the PREDICT-HD study was both multi-centered and longitudinal in nature, the image acquisition procedures were heterogeneous, including multiple vendors (GE, Phillips, and Siemens), different field strengths (1.5 Tesla and 3 Tesla), and more than 20 different MR acquisition protocols (due to issues with transmission and receiver hardware). Detailed scanning information for each of the 1,445 MR scans can be found in the Supplementary Material 1.
The third dataset used in this study includes 35 brain MRI scans from the OASIS project. The manual segmentations of these images were produced by Neuromorphometrics, Inc. (http://Neuromorphometrics.com/) using the brainCOLOR labeling protocol. The data were applied in the 2012 MICCAI Multi-Atlas Labeling Challenge and are publicly accessible (https://masi.vuse.vanderbilt.edu/workshop2012/index.php/ Main_Page). In the challenge, 15 subjects were used as atlases and the remaining 20 images were used for testing. For this dataset, our structures of interest are the 12 subcortical regions. No. of scans = 4 17 17 No. of scans = 5 5 8 No. of scans = 6 6 8 No. of scans = 7 1 0 No. of scans = 8 0 1 Average inter-scan interval 0.8 years 0.9 years

Automated Structure Segmentation
As shown in Figure 1 (the work flow of the proposed pipeline), one can view this pipeline as having two major components; automated structure segmentation and surface filtering. The subcortical and ventricular structures, in both hemispheres, were extracted from each T1-weighted image using a fullyautomated structure segmentation pipeline (Tang et al., 2015c) itself consisting of two steps, skull-stripping and brain structure segmentation. The underlying theoretical basis of this approach is multi-atlas likelihood-fusion (MALF) in the framework of a random deformable template model (Tang et al., 2013). This segmentation pipeline has been tested and validated on a number of datasets with relevance to various brain structures, particularly the subcortical and ventricular structures (Liang et al., 2015;Tang et al., 2015a). In this study, the 16 T1-weighted images of the first dataset served as the atlases used in MALF to perform the automated structure segmentation for the first and the second datasets. Each structure of interest, such as the left hippocampus, was manually delineated in all 16 atlases by a team of neuroanatomists at Johns Hopkins University with more than 15 years' experience in manually tracing subcortical structures. Various sets of subcortical and ventricular atlases, used in our other studies, were all created by the same team and have proven their reliability (Tang et al., 2013(Tang et al., , 2015c(Tang et al., , 2016Seymour et al., 2017). Intra-and inter-rater reliability of manual delineations by this team have been quantified in earlier studies; intra-class correlation (ICC) statistics revealed high rates of intra-and interrater reliability (intra-rater ICC ranges between 0.96 and 0.98; inter-rater ICC ranges between 0.9 and 0.93) (Qiu et al., 2009a).
To evaluate the proposed pipeline's handling of the first dataset, we adopted a leave-one-out strategy; one atlas image was treated as the to-be-segmented image while the remainder served as the atlas set used in segmenting that excluded image. When evaluating the second dataset, we continued to use these 16 atlases for segmentation via MALF. For the third dataset, the 15 atlas images were used to segment the subcortical structures in each of the 20 testing images.

Surface Generation
With the binary segmentation of the structures of interest completed using the structure segmentation procedure discussed above, we proceeded to create a triangulated mesh contouring the boundary of the segmentation using the marching cubes algorithm. The marching cubes algorithm yields triangulated surfaces with a high fidelity to the segmentation. Thus, when the segmentation is lacking accuracy, the marching cubes algorithm will be incapable of correcting the mistakes incurred during the segmentation step. In addition, the resulting surface may well be insufficiently smooth for our purposes. To overcome these limitations, one potential approach is to register a template surface to a target surface (the raw structure surface created from the marching cubes algorithm). The template surface is supposed to have correct anatomical topology and sufficient smoothness. The deformed template surfaces are therefore expected to have geometric characteristics identical to those of the target surfaces while possessing the topology and connectivity of the template surface.
In our pipeline, the template surface came from one of the 16 subjects in the first dataset. The 14 structures of interest for the selected subject were manually delineated with care taken to ensure both segmentation accuracy and boundary smoothness during the manual delineation. That specific subject was chosen based on three considerations: (1) the area of the subject's surface should be close to the mean area across all 16 surfaces from the manual segmentations; (2) the geometry and topology of the subject's surface should be correct based on visual examination; (3) the selected surface should be sufficiently smooth quantitatively and qualitatively.
In creating the template surface, instead of using the marching cubes algorithm, we adopted the Delaunay algorithm for triangulation (Lee and Schachter, 1980;Shewchuk, 2002) to guarantee further smoothness. We have noticed, however, that the Delaunay algorithm is much less stable than that of the marching cubes, even though it yields smoother results. This is our rationale for using marching cubes for the triangulation of the raw structure surfaces rather than the Delaunay algorithm.
With the template surface and target surfaces for each structure of interest created, we performed a rigid alignment of the surfaces and then proceeded to the LDDMM surface registration (Vaillant and Glaunès, 2005). Specifically, the template surface was rigidly aligned (rotation and translation) to the target surface, with the optimal rigid transformation between the vertex sets of the two surfaces obtained by minimizing a score that combines registration and soft assignment. After that, the LDDMM surface registration was performed from the rigidly aligned template surface to the target surface. Details on the "rigid + LDDMM" surface registration pipeline can be found in our previous work (Tang et al., 2014). After obtaining all of the rigid and diffeomorphic transformations between the template surface and the target surfaces, we applied these transformations in turn to the template surface, generating a deformed template surface for each structure of interest in each subject MRI. This deformed template surface is the result of our proposed pipeline, a smooth surface of a subcortical and ventricular structure of interest in an individual MRI scan.

Evaluation Criteria
As we have the gold standard-manual segmentations-at our disposal for the first and the third datasets, we quantitatively computed the accuracy and reliability of the proposed pipeline through the use of the following evaluation metrics: where V(A) and V(B) are the volumetric measurements of segmented images A and B. For example, A may represent the binary segmentation of the left hippocampus from manual delineation while B represents the corresponding automated segmentation from MALF.
• Absolute volume difference (AVD) where V(A) and V(B) are again the volumetric measurements of segmented images A and B.
• Correlation coefficient For the third quantitative comparison metric, we employed the Pearson product-moment correlation coefficient (PCC) between the volumetric measurements of the two segmentations in comparison, for example those of the manual segmentation and the MALF-derived automated segmentation.
In addition to evaluating the segmentation accuracy using the first and the third datasets, we also assessed the smoothness of the resulting surfaces quantitatively and qualitatively (through visual examination by several raters) using all three datasets. The smoothness of a surface was quantified using the following metric: • Geometric Laplacian (GL) where n(v) is the index set of the vertices v i which are themselves the direct neighbors of v, and l i is the Euclidean distance from v to v i . GL(v) represents a kind of measure of roughness: the higher it is, the rougher is the surface around v. The GL of a surface is computed as the sum of the norm of all vertex-wise GL vectors,

Group Comparisons
In our first experiment, we compared results from the proposed pipeline, in terms of both volumetric segmentations and triangulated surfaces, with those before filtering (obtained from MALF) using all three datasets. Their results were also compared to the gold standard of the first and the third datasets. In the first experiment, our structures of interest included all the 14 subcortical and lateral ventricle structures for the first two datasets and the 12 subcortical structures for the third dataset.
In the second experiment, we performed a comparison with a state-of-the-art pipeline, FSL-FIRST, that outputs volumetric segmentations as well as smooth triangulated surfaces of subcortical structures as well. This experiment was conducted on the first dataset and analyzed the 12 subcortical structures only, as FSL-FIRST does not output lateral ventricle results. Student's t-tests were employed to evaluate the significance of a group difference in all settings.

The First Experiment
In Tables 2-4, we respectively detail the mean and standard deviations of the DSCs, the AVDs, and the PCCs for each of the 14 structures of interest of the first dataset when calculated under the three possible comparisons; the raw automated segmentations from MALF vs. the manual segmentations, the raw automated segmentations from MALF vs. the filtered automated segmentations, as well as the filtered automated segmentations vs. the manual ones. The corresponding results on   (Table S1). Please note, the filtered automated segmentations were generated from the smoothly deformed surfaces via nearest neighbor assignment. As shown in the first column of each of the three tables, the raw automated segmentations obtained from MALF are highly accurate when compared to the gold standard. This illustrates the accuracy of the first step of our surface generation pipeline. For the second step, generating a smoothed version of the raw surface, we achieved a high fidelity, as is demonstrated in the second column in each of the three tables. Comparing the final results, the filtered surface based segmentations, with the gold standard, the accuracy is again high (the third column of each of the three tables) and indeed similar to that of the raw accuracy.
Results on comparing the smoothness of the surfaces of those three approaches for the first and the third datasets are respectively demonstrated in Table 5 and the Supplementary Material 2 (Table S2). Clearly, for each of the structures of interest, surfaces from the proposed pipeline are significantly smoother (p << 1E −10 ) than not only the raw automated results from MALF but also the manual results. In Figure 2, we present comparison results for the three methods (manual, raw automated, and filtered automated), in terms of segmentations that are superimposed on the structural MR image (for better visualization) and the corresponding surfaces, for one representative subject. Evidently, the proposed method is capable of capturing thin regions of a structure of interest, such as in the lateral ventricle, and thus preserving the structure's anatomical topology. Furthermore, even when compared with the gold standard surfaces created from the marching cubes algorithm, the surfaces delivered by the proposed pipeline are much smoother.
In Figure 3, we illustrate the smoothness comparison results of both datasets before and after deformation based filtering, from which a significant increase in smoothness was observed for each structure in both datasets. In addition to smoothness, the segmentation accuracy of the second dataset were also visually examined independently by three experienced raters. We found that on the bilateral putamen, globus pallidus, amygdala, thalamus, and lateral ventricle, the proposed pipeline delivered sufficiently well-generated surfaces for all 1,445 scans. In other words, the failure rate for any of those 5 structures in both hemispheres is 0%. For the other subcortical structures the number of surfaces found to be flawed were as follows: 19 out of 1,445 surfaces of the left caudate (failure rate being 1.31%), 15 out of 1,445 surfaces of the right caudate (failure rate being 1.04%), 7 out of 1,445 surfaces of the left hippocampus (failure rate being 0.48%), and 33 out of 1,445 surfaces of the right hippocampus (failure rate being 2.28%). We also note that the 19 left caudate surfaces with flaws were generated from the scans of 16 subjects while the 15 right caudate surfaces came from 9 subjects, the 7 left hippocampus surfaces came from 4 subjects, and the 33 right hippocampus surfaces came from 14 subjects. Such observations suggest that a failure for the proposed pipeline is more likely to recur in longitudinal scans of the same subject than on the dataset as a whole. In Figures 4, 5, we present the outputs in representative failure cases for the caudate (both left and right) and the hippocampus (both left and right) respectively.
In addition to qualitative assessment, we also conducted outlier analysis based on each surface's GL value. To be specific, outliers were defined as those whose GL values were outside the range Q 1 − 1.5(Q 3 − Q 1 ), Q 1 + 1.5(Q 3 − Q 1 ) , where Q 1 and Q 3 respectively denote the 25 percentile and the 75 percentile  of all structure-specific GL values. From this outlier analysis, we detected 15 outliers for the left caudate, 9 outliers for the right caudate, 6 outliers for the left hippocampus, and 26 outliers for the right hippocampus. These numbers agree well with our qualitative assessment results.

The Second Experiment
The mean values and standard deviations of GL for the 12 subcortical surfaces, delivered by FSL-FIRST, are also listed in Table 5, from which we observed a similar level of smoothness as results from the proposed pipeline, both being significantly smoother than those from the gold standard and MALF. Comparing between the proposed pipeline and FSL-FIRST, the bilateral amygdalar surfaces from the proposed pipeline are much smoother than those from FSL-FIRST whereas an opposite pattern was observed for the bilateral hippocampal surfaces. Overall, those two methods have similar performance in terms of surface smoothness. With regards to the segmentation accuracy, as quantified by the DSCs (Table 6), the AVDs (Table 7), and PCCs (Table 8), the proposed pipeline significantly outperformed FSL-FIRST.

DISCUSSION
In this paper, we have developed a fully-automated shape generation pipeline for subcortical and ventricular structures of the human brain which preserves smoothness and anatomical topology in the surfaces. The performance of the pipeline has been validated on three datasets, both quantitatively and qualitatively. We found that, without sacrificing the accuracy, the resultant surfaces have high smoothness and correct anatomical topology. Based on visual examinations and outlier analyses on a large number of surfaces (1,445 in total for each structure), the pipeline has a very low rate of failure; to be specific, the failure rate is 0% for the putamen, the globus pallidus, the amygdala, the thalamus, and the lateral ventricle in both hemispheres, 1.31% for the left caudate, 1.04% for the right caudate, 0.48% for the left hippocampus, and 2.28% for the right hippocampus. As is exemplified in Figures 4, 5, the main cause of failure for the caudate and the hippocampus is segmentation inaccuracy incurred in the MALF based automated segmentation. Those two structures are both adjacent to the cerebrospinal fluid and it has been found that this makes them more susceptible to inaccuracy (Tang et al., 2013). Even for those two structures, the failure rates on the first and the third datasets are 0% while those on the second dataset are < 3% and we consider such results to be a strong indicator of the pipeline's capacity for high performance.
There are three main components in the pipeline: automated structure segmentation; creation of study-specific template shapes; and LDDMM-based shape filtering. For automated structure segmentation, we utilized a well-developed algorithm of our own group's creation, the diffeomorphic multi-atlas  likelihood fusion. Using the first and the third datasets, which have the manual segmentations available, we again validated the performance of the MALF algorithm in terms of the automated segmentation of subcortical and ventricular structures. For this component, we can also use other automated structure segmentation algorithms, as long as the accuracy is sufficient, such as FreeSurfer (Fischl et al., 2002) and FSL-FIRST (Patenaude et al., 2011). FreeSurfer based segmentations have also been used for surface generation in existing works (Qiu and Miller, 2008;Qiu et al., 2009b;Tang et al., 2014). In FSL-FIRST, the  segmentation of a subcortical structure of interest is actually obtained from its corresponding smooth surface. In other words, FSL-FIRST outputs both smooth surfaces and segmentations for subcortical structures. In that sense, it may be redundant to perform another round of surface generation based on segmentations from FSL-FIRST.
In this work, we did not compare the surface results from the proposed pipeline with those obtained from replacing our segmentation module with another one since that is essentially a comparison of various segmentation algorithms, which is not the goal of this paper. With that being said, we did validate the segmentation accuracy of our pipeline using the gold standard of the first dataset, with the DSCs ranging between 0.87 and 0.93 (Table 2), the AVDs ranging between 0.04 and 0.1 (Table 3), and the PCCs ranging between 0.72 and 1 (Table 4), as well as the third dataset (see Table S1 in the Supplementary Material 2). For the second step, the creation of study-specific template shapes, we applied the Delaunay algorithm (Lee and Schachter, 1980;Shewchuk, 2002) for triangulating a carefully-selected manual segmentation for each structure of interest. The reason for using a manually created segmentation is 2-fold: firstly, a manual segmentation can guarantee correct anatomy and smoothness to some degree; secondly, we had previously generated the manual segmentations to serve as atlases in our automated structure segmentation phase, meaning no additional effort was required here. With that being said, we can also create a template shape based on an automated segmentation with sufficient accuracy, correct anatomy, and sufficient smoothness. The Delaunay algorithm is superior to the marching cubes algorithm in terms of smoothness of the resultant surfaces though it can fail in some cases, especially when the segmentation is flawed. Therefore, in this case, we were well-placed to generate the template shapes using the Delaunay algorithm since we could pay special attention to those surfaces. Meanwhile the marching cubes algorithm was better suited for the target segmentations.
In practice, there are two guiding rules in selecting the template surface: (1) the same definitions should be used in the automated segmentations of the target MRIs as in the segmentation of the template surface. For example, in this work, all automated segmentations of the first two datasets were obtained by using the atlases of the 16 subjects while the template surface was also obtained from this 16-subject pool. It may be inappropriate to use a template surface from a MALF-based segmentation definition to smooth an automated segmentation from FSL-FIRST; (2) It is better to select a template surface from the same study sample. In other words, it may be inappropriate to use a template surface from our HD study to smooth an automated segmentation from another study.
For the third step, LDDMM-based shape filtering, the key idea is to use a diffeomorphic transformation that can accurately deform the template shape to be very close to the target one while preserving the smoothness and topology of the template shape. LDDMM-surface is a validated algorithm that has been shown to yield sophisticated diffeomorphisms that can accurately register a pair of surfaces (Vaillant and Glaunès, 2005). According to our experiments on all three datasets, the deformed results, based on LDDMM-surface matching, are very close to the raw data (the target segmentations for which we aim to create their corresponding surfaces) while preserving the topology and smoothness of the template shapes. The high fidelity of the resulting surfaces to the target segmentations is somewhat of a double-edge sword; on the one hand, it guarantees high accuracy while on the other, it causes sensitivity to the inaccuracy induced in the segmentation process. In other words, when the segmentations are noisy (like those from the second dataset that the pipeline failed on), the resulting surfaces will inherit the noise (inaccuracy) of the segmentations from MALF. A potential solution is to utilize a much more robust variant of the LDDMM-surface matching, such as the one proposed by Tward and colleagues . Investigation of more advanced surface matching algorithms that are capable of maintaining a high fidelity to the segmentation while being robust to noisy subregions of the segmentations will be one of our future efforts. Furthermore, there are wholly separate registration approaches that can be applied to deforming surfaces, such as the 14 methods compared in (Klein et al., 2009). We did not compare here the surfaces generated by using different surface deformation approaches as that goes beyond the scope of this paper; to formulate the proposed pipeline.
This work was strongly motivated by the ongoing search for simpler, more effective, and more flexible pipelines capable of generating subcortical and ventricular surfaces with high smoothness and correct anatomy. According to our comparison results with another popular pipeline that directly outputs binary segmentations and smooth triangulated surfaces, namely FSL-FIRST, the surface results from the proposed pipeline have a similar degree of smoothness as those from FSL-FIRST, whereas the proposed pipeline's segmentation accuracy is significantly higher than FSL-FIRST for almost each of the 12 subcortical structures, which agrees with our previous findings (Tang et al., 2015c). This again may suggest a superiority of the proposed pipeline, although we must be aware of the potential unfairness given that a specific structure's definition may differ significantly for atlases used in MALF and those in FSL-FIRST. Compared with existing pipelines, the main contribution of this work, aside from the pipeline performance, is to have provided a general framework that can be easily adopted or modified according to one's own purpose; for example, to replace MALF with another segmentation algorithm that one favors more or to choose a template surface that one considers to be more suitable for a specific study.
One potential limitation of the proposed pipeline is that it is difficult to be sure that no subtle disease-related features were lost during this surface generation process. A way to partially address this question is to compare the disease-related features (via group comparison to a control group) obtained by using a set of surfaces created manually (to ensure accuracy) and those obtained by using a set of surfaces created from the proposed pipeline. However, given the lack of such a set of manually created surfaces involving both control and disease subjects, it is not possible to conduct such an experiment at this moment. We anticipate that as one of our future endeavors.
The statistical shape analysis of subcortical and ventricular structures of the human brain has become a topic of most considerable interest in contemporary research (Styner et al., 2003;Qiu and Miller, 2008;Qiu et al., 2010). We are confident that the proposed pipeline will further the development of this research field, especially in the investigations of HD.

AUTHOR CONTRIBUTIONS
XT and MM: Contributed to the design of the entire pipeline; YL, ZC, and NH: Contributed to the analysis and evaluation experiments; HJ and JP: Contributed to the data acquisition; XT: Wrote the paper. All authors revised the manuscript critically for important intellectual content.